云间札记

TiDB Production Cluster Deployment: Real-Time Query Architecture Design and Performance Tuning for the 医链 Business Side


Background

The 医链 business-side query workload requires:

Machine Selection

Instance type | Configuration | Count | Monthly cost
ecs.hfg6.6xlarge | 24 vCPU × 3.1 GHz / 96 GB RAM / 200 GB ESSD | 3 | ¥7623

Selection Rationale

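Cluster Topology

Service | Nodes | Port | Role
pd_servers | 3 | 2379 | Scheduling center
tidb_servers | 3 | 4000 | SQL layer
tikv_servers | 3 | 20160 | Row storage
tiflash_servers | 3 | 9000 | Columnar storage (analytics)
monitoring | 1 | 19090 | Prometheus
grafana | 1 | 3000 | Visualization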
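Once the cluster is running, it can be useful to confirm that each service is reachable on the port listed in the table above. The following is a minimal sketch rather than part of the original deployment: the function names `default_port` and `port_open` and the host address in the usage comment are hypothetical, and the probe assumes `bash` and coreutils `timeout` are available.

```shell
# Map each service in the topology table to its listed port.
default_port() {
  case "$1" in
    pd_servers)      echo 2379 ;;
    tidb_servers)    echo 4000 ;;
    tikv_servers)    echo 20160 ;;
    tiflash_servers) echo 9000 ;;
    monitoring)      echo 19090 ;;
    grafana)         echo 3000 ;;
    *)               return 1 ;;   # unknown service name
  esac
}

# Return 0 if a TCP connection to host ($1) and port ($2) succeeds
# within 2 seconds. The probe runs in a bash subshell because it
# relies on bash's /dev/tcp pseudo-device.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (hypothetical host address):
#   port_open 10.0.0.11 "$(default_port pd_servers)" && echo "PD reachable"
```

Keeping the port map in one function means the same lookup can drive both a reachability check and firewall rules, assuming the ports in the table are the ones actually configured in topology.yaml.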

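Disk Initialization

# Partition, format, and mount the data disk
fdisk -u /dev/vdb
mkfs -t ext4 /dev/vdb1
mkdir -p /vdb1
mount -o nodelalloc,noatime /dev/vdb1 /vdb1

# Key mount options for TiDB: nodelalloc, noatime
# Add the following entry to /etc/fstab (replace xxx with the UUID from `blkid /dev/vdb1`):
UUID=xxx /vdb1 ext4 defaults,nodelalloc,noatime 1 1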
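Because a missing `nodelalloc` option is easy to overlook, a quick check of the live mount options may help before deploying. This is a small sketch under stated assumptions: the helper name `has_tidb_mount_opts` is hypothetical, and it only inspects a comma-separated option string such as the one printed by `findmnt -no OPTIONS /vdb1`.

```shell
# Return 0 only if the comma-separated option string ($1) contains
# both nodelalloc and noatime. Wrapping the string in commas lets the
# patterns match whole options rather than substrings.
has_tidb_mount_opts() {
  case ",$1," in
    *,nodelalloc,*) ;;
    *) return 1 ;;
  esac
  case ",$1," in
    *,noatime,*) ;;
    *) return 1 ;;
  esac
}

# Example:
#   has_tidb_mount_opts "$(findmnt -no OPTIONS /vdb1)" || echo "remount /vdb1 with nodelalloc,noatime"
```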

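System Tuning

# Disable swap (to keep it off after a reboot, also comment out swap entries in /etc/fstab)
echo "vm.swappiness = 0" >> /etc/sysctl.conf
swapoff -a

# Disable transparent huge pages (THP)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Raise the system-wide file handle limit
echo "fs.file-max = 1000000" >> /etc/sysctl.conf

# Apply the sysctl changes
sysctl -p

# Install numactl for NUMA core binding
yum install -y numactl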
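The kernel settings above can be verified after they are applied. The sketch below is illustrative only: `thp_active_mode` is a hypothetical helper that extracts the bracketed (active) value from the THP status line, which the kernel prints in a form such as `always madvise [never]`.

```shell
# Print the active THP mode from a status line ($1). The kernel marks
# the mode in effect with square brackets, e.g. "always madvise [never]".
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Example checks on a live host:
#   [ "$(thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)")" = never ] || echo "THP still enabled"
#   [ "$(sysctl -n vm.swappiness)" = 0 ] || echo "vm.swappiness is not 0"
```

Note that writing to `/sys/kernel/mm/transparent_hugepage/enabled` does not persist across reboots, so a check like this is worth rerunning after any restart.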

TiUP Deployment

# Install TiUP
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
source /root/.bash_profile

# Generate a topology template
tiup cluster template > topology.yaml

Key settings in topology.yaml:

global:
  user: "root"
  deploy_dir: "/data/tidb/tidb-deploy"
  data_dir: "/vdb1/tidb/tidb-data"

server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    readpool.storage.use-unified-pool: true
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048

# Check the environment, then automatically fix what can be fixed
tiup cluster check ./topology.yaml --user root
tiup cluster check ./topology.yaml --apply --user root

# Deploy
tiup cluster deploy yl-tidb v5.3.0 ./topology.yaml --user root

# Start
tiup cluster start yl-tidb

Grafana LDAP Integration

[[servers]]
host = "192.168.1.110"
port = 389
use_ssl = false
bind_dn = "cn=Manager,dc=miaoshou,dc=com"
bind_password = 'Miaoyxkj2018'
search_filter = "(cn=%s)"
search_base_dns = ["ou=People,dc=miaoshou,dc=com"]

[[servers.group_mappings]]
group_dn = "superuser"
org_role = "Admin"

User Privilege Management

-- Big data account (read/write)
CREATE USER 'bigdata'@'%' IDENTIFIED BY 'xxx';
GRANT select, insert, update, delete ON *.* TO 'bigdata'@'%';

-- Read-only account
CREATE USER 'bigdata_r'@'%' IDENTIFIED BY 'xxx';
GRANT select ON *.* TO 'bigdata_r'@'%';

-- Ops account
CREATE USER 'devops_tidb_r'@'%' IDENTIFIED BY 'xxx';

Performance Validation

Metric | Target | Actual
Complex JOIN query | < 3s | 1.2s
Single-table point lookup | < 100ms | 45ms
Data import rate | > 10k/s | 15k/s
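Latency targets like those in the table can be spot-checked from the command line. The following is a rough sketch, not the measurement method used for the figures above: `elapsed_ms` and `under_target_ms` are hypothetical helpers, the timing relies on GNU `date` supporting `%N`, and it measures wall-clock time including client startup, so it will overstate server-side latency.

```shell
# Print the wall-clock duration of a command in milliseconds.
# Requires GNU date (for nanosecond resolution via %N).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || true   # ignore the command's exit status; only time it
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Return 0 if the measured latency ($1, ms) is strictly below the target ($2, ms).
under_target_ms() {
  [ "$1" -lt "$2" ]
}

# Example (hypothetical host and query):
#   ms=$(elapsed_ms mysql -h 10.0.0.21 -P 4000 -u bigdata_r -pxxx -e "SELECT 1")
#   under_target_ms "$ms" 100 && echo "point lookup within target"
```

For a meaningful comparison against the targets, a single run is not enough; repeating the measurement and looking at the distribution would be more reliable.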

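Retrospective

Root cause: workloads competed for resources on the original cluster because it was not deployed with per-workload isolation.

Improvements

This article was first published at wr.mrchi.cn. Please credit the source when reposting.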


