Skip to content
云间札记
Go back

CDH 6.3.2 生产集群部署与磁盘扩容实战

Updated:

背景

2021 年生产环境 CDH 6.3.2 集群节点磁盘告警(使用率 > 90%),需完成:

  1. 集群初始部署(5 节点:2 Master + 3 Slave)
  2. 在线磁盘扩容(1TB → 2TB)
  3. HDFS 数据平衡(节点间 + 节点内多磁盘)

集群规划

IP主机名角色
192.168.234.10master-1NN(Active)/JHS/MySQL
192.168.234.11master-2NN(Standby)/JN
192.168.234.12slave-1NM/DN
192.168.234.13slave-2RM(Standby)/NM/DN
192.168.234.14slave-3RM(Active)/NM/DN

前置优化

1. 关闭透明大页面

echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# 持久化
cat >> /etc/rc.local <<'EOF'
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
EOF
chmod +x /etc/rc.d/rc.local

2. 内核参数调优

cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
fs.file-max = 655350
vm.swappiness = 1
EOF
sysctl -p

3. 免密登录

ssh-keygen -t rsa
for host in 192.168.234.{11..14}; do
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host
done

CM 部署流程

1. 本地 YUM 源

mkdir -p /var/www/html/cloudera-repos/
cd /var/www/html/cloudera-repos/ && createrepo .

2. 数据库初始化

CREATE DATABASE scm DEFAULT CHARACTER SET utf8;
CREATE DATABASE amon DEFAULT CHARACTER SET utf8;
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8;
-- ... 共 10 个库

GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'Data@2019';
-- ... 对应授权

3. CM Server/Agent 安装

# Master
yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

# Slaves
yum install cloudera-manager-daemons cloudera-manager-agent

4. CM 高可用(Haproxy + NFS)

# Haproxy 负载均衡
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
listen cmf :7180
    mode tcp
    option tcplog
    server cmfhttp1 master-1:7180 check
    server cmfhttp2 master-2:7180 check
EOF

# NFS 共享 CM 元数据
mkdir -p /media/cloudera-scm-server
echo "/media/cloudera-scm-server master-1(rw,sync) master-2(rw,sync)" >> /etc/exports

磁盘扩容实战

在线扩容(不停机)

# 安装扩容工具
yum install -y cloud-utils-growpart

# 扩容分区
growpart /dev/vdb 1
resize2fs /dev/vdb1

# 验证
df -Th

HDFS 数据平衡

节点间平衡(Balancer)

# CM 配置参数
Balancing Threshold: 5
Rebalancing Policy: DataNode
dfs.datanode.balance.bandwidthPerSec: 10MB/s

迭代执行流程:

  1. 计算集群使用量均值,与阈值比较
  2. 划分 4 类节点(高负载/低负载)
  3. 构造 Dispatcher,初始化 mover/dispatcher 线程池
  4. 确认候选块,移动到目标节点

节点内多磁盘平衡(DiskBalancer)

<property>
  <name>dfs.disk.balancer.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.disk.balancer.max.disk.throughputInMBperSec</name>
  <value>50</value>
</property>
# 生成平衡计划
hdfs diskbalancer -plan hadoop1

# 执行
hdfs diskbalancer -execute /system/diskbalancer/hadoop1.plan.json

# 查询状态
hdfs diskbalancer -query hadoop1

关键配置项

配置说明建议值
Balancing Threshold平衡阈值5%
bandwidthPerSec平衡带宽10MB/s
moverThreads移动线程池1000
dispatcherThreads调度线程池200
max concurrent moves并发移动数50

复盘

问题根因:生产环境磁盘增长超预期,未提前规划扩容窗口

改进措施

本文首发于 wr.mrchi.cn,转载请注明出处。



Previous Post
AWS S3 挂载 EC2 实战:s3fs-fuse 与实时同步监控
Next Post
ELK 7.13.1 日志系统搭建:Filebeat + Kafka + Logstash + Elasticsearch 采集实战