云间札记

OP Stack L2 Testnet Fork Handling: Data Sync and Traffic Switchover in Practice


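Background

While the K2 testnet was running, the block heights of the primary node (Sequencer) and the replica nodes (Replica) drifted so far apart that the replicas could no longer sync normally. Emergency handling was needed to keep the service available.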

Environment

Node        IP             Role
Sequencer   10.7.113.106   Primary node (sequencer)
Replica 1   10.7.66.87     Replica node
Replica 2   10.7.102.137   Replica node + Nginx
ETH Node    10.7.95.144    Self-hosted L1 node
Explorer    10.7.120.156   Block explorer

Symptoms

Troubleshooting

1. KeyStore connection issue

2. P2P sync anomaly

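Emergency Handling Plan

Phase 1: Nginx traffic switchover (5 minutes)

Goal: route all RPC traffic to the Sequencer node so that every user sees the same chain data

# Log in to the Nginx node (10.7.102.137)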
vim /usr/local/openresty/nginx/conf/vhosts/nal/testnet-rpc.nal.network.conf

Configuration before the change:

upstream testnet-rpc {
    # Load-balance across the replica nodes
    server 10.7.66.87:8545 weight=50 max_fails=2 fail_timeout=30s;
    server 10.7.102.137:8545 weight=50 max_fails=2 fail_timeout=30s;
}

Configuration after the change:

upstream testnet-rpc {
    # Replica nodes commented out; traffic goes to the Sequencer only
    #server 10.7.66.87:8545 weight=50 max_fails=2 fail_timeout=30s;
    #server 10.7.102.137:8545 weight=50 max_fails=2 fail_timeout=30s;

    # Point only at the primary node
    server 10.7.113.106:8545;
}

server {
    listen 80;
    server_name testnet-rpc.nal.network;
    index index.html index.htm;
    ssl_session_timeout 5m;
    ssl_protocols TLSv1.1 TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://testnet-rpc;
        proxy_http_version    1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Port $server_port;
        error_log /data/wwwlogs/testnet-rpc-nal-network/http_error.log error;
        access_log /data/wwwlogs/testnet-rpc-nal-network/http_access.log access;
    }
}

# Validate the configuration, then reload
/usr/local/openresty/nginx/sbin/nginx -t
/usr/local/openresty/nginx/sbin/nginx -s reload
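
During an incident it is easy to reload a half-edited file. The following is a sketch of one way to make the edit-and-reload step safer: keep a copy of the current vhost file and restore it if `nginx -t` rejects the new one. The function name `apply_conf`, the `.bak` suffix, and the `NGINX_BIN` override are assumptions for illustration and are not part of the original procedure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install an edited vhost file, roll back if validation fails.
# NGINX_BIN defaults to the OpenResty binary path used in this post.
NGINX_BIN="${NGINX_BIN:-/usr/local/openresty/nginx/sbin/nginx}"

apply_conf() {
  local conf="$1"       # live config file
  local candidate="$2"  # edited copy to install
  cp -p "$conf" "$conf.bak" || return 1
  cp "$candidate" "$conf" || return 1
  if "$NGINX_BIN" -t; then
    "$NGINX_BIN" -s reload
  else
    # Validation failed: put the previous file back and report failure.
    cp -p "$conf.bak" "$conf"
    return 1
  fi
}

# Example (the edited copy's path is hypothetical):
#   apply_conf /usr/local/openresty/nginx/conf/vhosts/nal/testnet-rpc.nal.network.conf \
#              /tmp/testnet-rpc.sequencer-only.conf
```

Editing a copy rather than the live file means the running configuration is only replaced in one step, and only stays replaced if it validates.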

Phase 2: Data sync (30 minutes)

Goal: copy the full chain data from the Sequencer to the Replica nodes

1. Prepare the data on the Sequencer node

# Log in to the Sequencer (10.7.113.106)
cd /data/deploy/op-geth/datadir/

# Create the sync directory
mkdir -pv /data/nfs/op/20240813/datadir

# Copy the full data (chaindata + lightchaindata)
cp -r /data/deploy/op-geth/datadir/* /data/nfs/op/20240813/datadir/

# Sanity-check the copy (directory contents and size)
ls -lh /data/nfs/op/20240813/datadir/geth/
du -sh /data/nfs/op/20240813/datadir/geth/chaindata
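
`ls` and `du` show that the copy exists and is roughly the right size, but not that every file arrived. A minimal sketch of a stricter check compares the file count and total byte size of the two trees. `dir_fingerprint` and `same_fingerprint` are hypothetical helper names, and the comparison is only meaningful if nothing writes to the source directory while it runs, which is an assumption here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<file count> <total bytes>" for a directory tree.
dir_fingerprint() {
  find "$1" -type f -exec ls -ln {} + \
    | awk '{ n++; s += $5 } END { printf "%d %.0f\n", n, s }'
}

# Succeeds only when both trees have the same file count and byte total.
same_fingerprint() {
  [ "$(dir_fingerprint "$1")" = "$(dir_fingerprint "$2")" ]
}

# Example with the paths from this post:
#   same_fingerprint /data/deploy/op-geth/datadir /data/nfs/op/20240813/datadir \
#     && echo "copy matches" || echo "copy differs"
```

This is cheaper than checksumming every file of a large chaindata directory, at the cost of missing corruption that leaves sizes unchanged.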

2. Rebuild the Replica nodes

Replica 1 (10.7.66.87):

# 1. Stop the services
supervisorctl stop test-replica1-op-geth
supervisorctl stop test-replica1-op-node

# 2. Move the old data aside (kept for later analysis)
cd /data/deploy/op-geth/
mv datadir datadir_bad_0813

# 3. Copy the Sequencer's data from NFS
cp -r /data/nfs/op/20240813/datadir ./

# 4. Check the copied data
du -sh datadir/geth/chaindata
ls datadir/geth/

# 5. Start the services
supervisorctl start test-replica1-op-geth
supervisorctl start test-replica1-op-node

# 6. Check the sync status
supervisorctl status
tail -f /data/logs/op-node/test-replica1-op-node-out.log

Replica 2 (10.7.102.137):

# Same steps as for Replica 1
supervisorctl stop test-replica2-op-geth
supervisorctl stop test-replica2-op-node

cd /data/deploy/op-geth/
mv datadir datadir_bad_0813
cp -r /data/nfs/op/20240813/datadir ./

supervisorctl start test-replica2-op-geth
supervisorctl start test-replica2-op-node
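
Both replicas go through the same sequence and differ only in the supervisor program prefix. The steps above could be wrapped in one function so that each host runs them identically. This is a sketch: `rebuild_replica`, `run`, and the `DRY_RUN` switch are hypothetical names, and dry-run is the default, so commands are only printed until it is explicitly set to 0.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the manual rebuild steps.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

rebuild_replica() {
  local name="$1"                                       # e.g. test-replica1
  local snapshot="${2:-/data/nfs/op/20240813/datadir}"  # copy taken from the Sequencer
  local deploy="${3:-/data/deploy/op-geth}"

  # Each step runs only if the previous one succeeded.
  run supervisorctl stop "${name}-op-geth" &&
  run supervisorctl stop "${name}-op-node" &&
  run mv "${deploy}/datadir" "${deploy}/datadir_bad_0813" &&
  run cp -r "${snapshot}" "${deploy}/" &&
  run du -sh "${deploy}/datadir/geth/chaindata" &&
  run supervisorctl start "${name}-op-geth" &&
  run supervisorctl start "${name}-op-node"
}

# Print the plan for Replica 2 without touching anything:
#   DRY_RUN=1 rebuild_replica test-replica2
```

Chaining the steps with `&&` stops the rebuild at the first failure, so the services are never restarted on top of a partial copy.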

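Phase 3: Restore load balancing (after verification)

Once the replica nodes have finished syncing, restore the Nginx load-balancing configuration: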

upstream testnet-rpc {
    server 10.7.66.87:8545 weight=50 max_fails=2 fail_timeout=30s;
    server 10.7.102.137:8545 weight=50 max_fails=2 fail_timeout=30s;
    server 10.7.113.106:8545 backup;  # Sequencer as backup
}
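
After editing the upstream block back, it can help to confirm which `server` lines are live and which are still commented out before reloading. A sketch using awk follows. `active_upstreams` is a hypothetical helper, and it assumes one `server` directive per line, as in the configs shown in this post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list uncommented "server" entries of one upstream block.
# Usage: active_upstreams <upstream-name> < config-file
active_upstreams() {
  awk -v name="$1" '
    $1 == "upstream" && $2 == name { inside = 1; next }  # block start
    inside && $1 == "}"            { inside = 0 }        # block end
    inside && $1 == "server" {
      sub(/;.*$/, "")    # drop the ";" and any trailing comment
      $1 = ""            # drop the "server" keyword
      sub(/^ +/, "")     # trim the leading space left behind
      print
    }
  '
}

# Example:
#   active_upstreams testnet-rpc \
#     < /usr/local/openresty/nginx/conf/vhosts/nal/testnet-rpc.nal.network.conf
```

For the Phase 3 configuration this should list both replicas plus the Sequencer entry marked `backup`.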

Verification Commands

Check block height

# Query the Sequencer height
curl -s -H "Content-Type: application/json" -X POST \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://10.7.113.106:8545 | jq .result

# Query the Replica 1 height
curl -s -H "Content-Type: application/json" -X POST \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://10.7.66.87:8545 | jq .result

# Query the Replica 2 height
curl -s -H "Content-Type: application/json" -X POST \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://10.7.102.137:8545 | jq .result
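
`eth_blockNumber` returns a hex string such as `"0x1a"`, which is awkward to compare by eye across three nodes. The sketch below converts the result to decimal and prints one line per node. `hex_to_dec` and `block_height` are hypothetical helper names, the 5-second timeout is an arbitrary choice, and the node list reuses the IPs from the environment table.

```bash
#!/usr/bin/env bash
# Convert a 0x-prefixed hex quantity (as returned by eth_blockNumber) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Query one node and print its height in decimal (requires curl and jq).
block_height() {
  local hex
  hex="$(curl -s --max-time 5 -H 'Content-Type: application/json' -X POST \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | jq -r .result)"
  # An empty or null result means the node did not answer as expected.
  if [ -z "$hex" ] || [ "$hex" = "null" ]; then
    return 1
  fi
  hex_to_dec "$hex"
}

# Example:
#   for ip in 10.7.113.106 10.7.66.87 10.7.102.137; do
#     printf '%-14s %s\n' "$ip" "$(block_height "http://$ip:8545")"
#   done
```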

Check P2P connections

# Replica node log
grep "connected to peer" /data/logs/op-node/test-replica1-op-node-out.log

# Expected output
t=2024-08-13T02:39:40+0000 lvl=info msg="connected to peer"   peer=16Uiu2HAmK8oVSbeJjgtfUAVEVfC75GFnSVgMZQ5VfZgm6NX8XWE3   addr=/ip4/10.7.113.106/tcp/9003
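
To check specifically that a replica has peered with the Sequencer, the `addr=` field can be pulled out of the matching log lines. This is a sketch that assumes the log format shown in the expected output above; `peer_addrs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the unique multiaddrs from "connected to peer" lines.
# Reads log text on stdin.
peer_addrs() {
  grep 'msg="connected to peer"' | grep -o 'addr=[^ ]*' | cut -d= -f2- | sort -u
}

# Example: succeed only if the Sequencer (10.7.113.106) appears among the peers.
#   peer_addrs < /data/logs/op-node/test-replica1-op-node-out.log \
#     | grep -q '/ip4/10.7.113.106/'
```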

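Retrospective

Root Causes

  1. A network partition interrupted the P2P connections
  2. The replica nodes were unable to sync for a long time, so the block gap kept growing
  3. No automatic alerting was in place, so the gap was already too large by the time it was noticed

Resolution

Improvements

This article was first published on wr.mrchi.cn. Please credit the source when reposting.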
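
The third root cause points at a periodic height comparison. Below is a minimal sketch of such a check, suitable for running from cron. `lag_status` and the default threshold of 100 blocks are hypothetical, and the heights are passed in as the hex strings that `eth_blockNumber` returns.

```bash
#!/usr/bin/env bash
# Hypothetical check: compare two hex block heights and flag a large gap.
# Prints "OK <lag>" or "ALERT <lag>"; returns non-zero on ALERT.
lag_status() {
  local seq_h rep_h max_lag lag
  seq_h="$(printf '%d' "$1")"   # Sequencer height, e.g. 0x10f447
  rep_h="$(printf '%d' "$2")"   # Replica height
  max_lag="${3:-100}"           # assumed threshold in blocks; tune per chain
  lag=$(( seq_h - rep_h ))
  if [ "$lag" -gt "$max_lag" ]; then
    echo "ALERT $lag"
    return 1
  fi
  echo "OK $lag"
}

# Example: lag_status 0x64 0x60   prints "OK 4"
```

The non-zero exit status on `ALERT` lets a cron wrapper or monitoring agent react without parsing the output.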


