【架构实战】海量数据存储:分库分表中间件实战

张开发
2026/4/4 20:42:00 15 分钟阅读
【架构实战】海量数据存储:分库分表中间件实战
一、什么时候需要分库分表单库单表的瓶颈问题表现阈值参考单表数据量过大查询变慢索引失效单表1000万行单库连接数不足连接池耗尽并发1000单库磁盘空间不足写入失败磁盘80%单库CPU/内存不足响应变慢CPU70%分库分表的目标将数据分散到多个库/表降低单库压力提升查询性能提高系统可用性二、分库分表策略1. 垂直分库按业务模块拆分单体数据库 ├── 用户表 ├── 订单表 ├── 商品表 └── 支付表 ↓ 垂直分库 用户库 订单库 商品库 ├── 用户表 ├── 订单表 ├── 商品表 └── 地址表 └── 订单详情表 └── 商品分类表2. 水平分表按数据量拆分同一张表order表1亿行 ↓ 水平分表按user_id取模 order_0user_id % 4 0 order_1user_id % 4 1 order_2user_id % 4 2 order_3user_id % 4 33. 分片键选择好的分片键数据分布均匀查询时能精确定位分片不会频繁变更常见分片键业务分片键分片方式订单user_id取模用户user_id取模日志时间按月/年商品category_id取模三、ShardingSphere分库分表实战1. 依赖配置dependencygroupIdorg.apache.shardingsphere/groupIdartifactIdshardingsphere-jdbc-core/artifactIdversion5.3.0/version/dependency2. 分库分表配置spring:shardingsphere:datasource:names:ds0,ds1ds0:type:com.zaxxer.hikari.HikariDataSourcedriver-class-name:com.mysql.cj.jdbc.Driverjdbc-url:jdbc:mysql://192.168.1.100:3306/order_db_0username:rootpassword:passwordds1:type:com.zaxxer.hikari.HikariDataSourcedriver-class-name:com.mysql.cj.jdbc.Driverjdbc-url:jdbc:mysql://192.168.1.101:3306/order_db_1username:rootpassword:passwordrules:sharding:tables:order:actual-data-nodes:ds${0..1}.order_${0..3}# 分库策略按user_id取模database-strategy:standard:sharding-column:user_idsharding-algorithm-name:db-inline# 分表策略按order_id取模table-strategy:standard:sharding-column:order_idsharding-algorithm-name:table-inline# 分布式IDkey-generate-strategy:column:order_idkey-generator-name:snowflakesharding-algorithms:db-inline:type:INLINEprops:algorithm-expression:ds${user_id % 2}table-inline:type:INLINEprops:algorithm-expression:order_${order_id % 4}key-generators:snowflake:type:SNOWFLAKEprops:worker-id:1props:sql-show:true3. 自定义分片算法ComponentpublicclassOrderShardingAlgorithmimplementsStandardShardingAlgorithmLong{OverridepublicStringdoSharding(CollectionStringavailableTargetNames,PreciseShardingValueLongshardingValue){LonguserIdshardingValue.getValue();// 按user_id取模intindex(int)(userId%availableTargetNames.size());returnavailableTargetNames.stream().filter(name-name.endsWith(String.valueOf(index))).findFirst().orElseThrow(()-newRuntimeException(找不到分片));}OverridepublicCollectionStringdoSharding(CollectionStringavailableTargetNames,RangeShardingValueLongshardingValue){// 范围查询返回所有分片returnavailableTargetNames;}OverridepublicPropertiesgetProps(){returnnewProperties();}Overridepublicvoidinit(Propertiesprops){}OverridepublicStringgetType(){returnORDER_SHARDING;}}4. 绑定表配置# 绑定表order和order_item使用相同的分片键避免跨库JOINspring:shardingsphere:rules:sharding:binding-tables:-order,order_itemtables:order:actual-data-nodes:ds${0..1}.order_${0..3}table-strategy:standard:sharding-column:order_idsharding-algorithm-name:table-inlineorder_item:actual-data-nodes:ds${0..1}.order_item_${0..3}table-strategy:standard:sharding-column:order_idsharding-algorithm-name:table-inline5. 广播表配置# 广播表在所有分片中都有完整数据如字典表spring:shardingsphere:rules:sharding:broadcast-tables:-dict_data-region四、分库分表的挑战1. 跨库JOIN问题分库后无法直接JOIN解决方案// 方案1应用层JOINpublicOrderDetailgetOrderDetail(LongorderId){// 查询订单OrderorderorderMapper.selectById(orderId);// 查询用户可能在不同库UseruseruserMapper.selectById(order.getUserId());// 应用层组装returnnewOrderDetail(order,user);}// 方案2冗余字段// 在order表中冗余user_name字段避免跨库查询2. 分布式事务问题跨库操作无法使用本地事务解决方案// 使用Seata分布式事务GlobalTransactionalpublicvoidcreateOrder(OrderRequestrequest){// 扣减库存商品库inventoryService.deductStock(request.getProductId(),request.getQuantity());// 创建订单订单库orderService.createOrder(request);// 扣减余额用户库accountService.deductBalance(request.getUserId(),request.getAmount());}3. 分页查询问题跨分片分页查询性能差解决方案// 方案1禁止深度分页// 只允许查询前N页// 方案2游标分页publicListOrdergetOrdersByUserId(LonguserId,LonglastOrderId,intpageSize){// 使用游标分页避免OFFSETreturnorderMapper.selectByUserIdAndLastId(userId,lastOrderId,pageSize);}// 方案3ES辅助// 将需要分页的数据同步到ES通过ES分页4. 全局唯一ID// 使用雪花算法生成全局唯一IDServicepublicclassOrderService{AutowiredprivateSnowflakeIdGeneratoridGenerator;publicOrdercreateOrder(OrderRequestrequest){OrderordernewOrder();order.setOrderId(idGenerator.nextId());// 全局唯一IDorder.setUserId(request.getUserId());// ...returnorder;}}五、数据迁移方案1. 双写迁移阶段1双写写新库旧库读旧库 阶段2双写写新库旧库读新库 阶段3单写只写新库读新库代码实现ServicepublicclassOrderService{Value(${migration.phase:1})privateintmigrationPhase;publicOrdercreateOrder(OrderRequestrequest){OrderorderbuildOrder(request);// 阶段1和2双写if(migrationPhase1){newOrderMapper.insert(order);}if(migrationPhase2){oldOrderMapper.insert(order);}returnorder;}publicOrdergetOrder(LongorderId){// 阶段1读旧库if(migrationPhase1){returnoldOrderMapper.selectById(orderId);}// 阶段2和3读新库returnnewOrderMapper.selectById(orderId);}}2. 数据校验ComponentpublicclassDataMigrationValidator{Scheduled(fixedRate60000)publicvoidvalidate(){// 抽样校验ListLongsampleIdsgetSampleOrderIds(1000);intmismatchCount0;for(Longid:sampleIds){OrderoldOrderoldOrderMapper.selectById(id);OrdernewOrdernewOrderMapper.selectById(id);if(!Objects.equals(oldOrder,newOrder)){mismatchCount;log.error(数据不一致: orderId{},id);}}doublemismatchRate(double)mismatchCount/sampleIds.size();if(mismatchRate0.001){alertService.alert(数据迁移不一致率超过0.1%);}}}六、总结分库分表是解决海量数据存储的有效方案垂直分库按业务模块拆分水平分表按数据量拆分ShardingSphere成熟的分库分表中间件挑战跨库JOIN、分布式事务、分页查询实施建议优先考虑单库优化索引、缓存确实需要时再分库分表选择合适的分片键做好数据迁移方案思考题你们系统有没有做分库分表遇到了哪些挑战个人观点仅供参考

更多文章