# OpenStack Production-Grade High-Availability Deployment Architecture

## Overall Architecture

```
         ┌───────────────────────────────────────┐
         │    Load balancer layer (HAProxy)      │
         │      VIP: 10.0.0.10 (Keepalived)      │
         └───────────────────┬───────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
┌────────▼───────┐  ┌────────▼───────┐  ┌────────▼───────┐
│ Controller 1   │  │ Controller 2   │  │ Controller 3   │
│ controller-1   │  │ controller-2   │  │ controller-3   │
│                │  │                │  │                │
│ keystone-api   │  │ keystone-api   │  │ keystone-api   │
│ nova-api       │  │ nova-api       │  │ nova-api       │
│ neutron-server │  │ neutron-server │  │ neutron-server │
│ cinder-api     │  │ cinder-api     │  │ cinder-api     │
│ glance-api     │  │ glance-api     │  │ glance-api     │
│ heat-api       │  │ heat-api       │  │ heat-api       │
│ horizon        │  │ horizon        │  │ horizon        │
└────────┬───────┘  └────────┬───────┘  └────────┬───────┘
         │                   │                   │
┌────────▼───────────────────▼───────────────────▼─────────────┐
│                     MySQL Galera Cluster                      │
│         controller-1    controller-2    controller-3         │
│  (synchronous multi-master replication, any node writable)   │
└────────┬───────────────────┬───────────────────┬─────────────┘
         │                   │                   │
┌────────▼───────────────────▼───────────────────▼─────────────┐
│                       RabbitMQ Cluster                        │
│         controller-1    controller-2    controller-3         │
│            (mirrored queues, persistent messages)             │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│                    Compute nodes (N hosts)                    │
│      compute-1   compute-2   compute-3  ...  compute-N        │
│     nova-compute   neutron-openvswitch-agent   libvirt        │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
│                         Storage layer                         │
│                Ceph Cluster (MON × 3 + OSD × N)               │
│             volumes pool / images pool / vms pool             │
└───────────────────────────────────────────────────────────────┘
```
## Controller Node HA

### HAProxy Configuration

```
global
    log /dev/log local0
    maxconn 4096

defaults
    log global
    mode http
    option httplog
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend keystone_public
    bind *:5000
    default_backend keystone_public_back

backend keystone_public_back
    balance roundrobin
    option httpchk GET /v3
    server controller-1 10.0.0.1:5000 check inter 2s
    server controller-2 10.0.0.2:5000 check inter 2s
    server controller-3 10.0.0.3:5000 check inter 2s

frontend nova_api
    bind *:8774
    default_backend nova_api_back

backend nova_api_back
    balance roundrobin
    option httpchk GET /
    server controller-1 10.0.0.1:8774 check inter 2s
    server controller-2 10.0.0.2:8774 check inter 2s
    server controller-3 10.0.0.3:8774 check inter 2s

frontend rabbitmq
    bind *:5672
    mode tcp
    default_backend rabbitmq_back

backend rabbitmq_back
    mode tcp
    balance roundrobin
    server controller-1 10.0.0.1:5672 check inter 2s
    server controller-2 10.0.0.2:5672 check inter 2s
    server controller-3 10.0.0.3:5672 check inter 2s
```
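It is worth syntax-checking the file before reloading; a quick sketch, assuming the distro-default config path:

```bash
# Validate the configuration before applying it
haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload without dropping established connections
systemctl reload haproxy
```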
### Keepalived VIP

```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass openstack
    }
    virtual_ipaddress {
        10.0.0.10/24
    }
    track_script {
        chk_haproxy
    }
}
```
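Because `chk_haproxy` adjusts the node's effective priority, killing HAProxy on the MASTER should move the VIP to a backup. A minimal failover drill, assuming the `eth0` interface configured above:

```bash
# On the current MASTER: simulate an HAProxy failure
systemctl stop haproxy

# On each node: check which one now holds the VIP
ip addr show eth0 | grep 10.0.0.10

# Restore the service afterwards
systemctl start haproxy
```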
## MySQL Galera Cluster

```ini
[mysqld]
binlog_format = ROW
default-storage-engine = innodb
innodb_autoinc_lock_mode = 2
bind-address = 0.0.0.0

wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "openstack_galera"
wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
wsrep_node_address = "10.0.0.1"
wsrep_node_name = "controller-1"
wsrep_sst_method = rsync

innodb_buffer_pool_size = 8G
innodb_log_file_size = 512M
max_connections = 1000
```
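A sketch of the initial bootstrap sequence, assuming MariaDB under systemd (the `galera_new_cluster` helper ships with the MariaDB packages):

```bash
# On controller-1 only: bootstrap a brand-new cluster
galera_new_cluster

# On controller-2 and controller-3: join normally
systemctl start mariadb

# Verify that all three nodes have joined
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"   # expect: 3
```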
### Database Connection Pooling (oslo.db)

```ini
[database]
connection = mysql+pymysql://nova:password@10.0.0.10/nova
max_pool_size = 30
max_overflow = 60
pool_timeout = 30
connection_recycle_time = 600
```
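These limits multiply across workers and services: each worker process can hold up to `max_pool_size + max_overflow = 90` connections, which has to be weighed against Galera's `max_connections = 1000`. A back-of-the-envelope check, where the worker and service counts are purely illustrative assumptions:

```bash
# Hypothetical sizing: 6 API services x 4 workers x 3 controllers,
# each worker worth up to 30 + 60 = 90 connections in the worst case
echo $(( 6 * 4 * 3 * (30 + 60) ))    # 6480 potential vs max_connections = 1000

# Compare against what the cluster has actually seen
mysql -e "SHOW STATUS LIKE 'Max_used_connections';"
```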
## RabbitMQ Cluster

```bash
# On controller-2 / controller-3: join the cluster formed on controller-1
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@controller-1
rabbitmqctl start_app

# Mirror every queue across all nodes, with automatic synchronization
rabbitmqctl set_policy ha-all "^" \
  '{"ha-mode":"all","ha-sync-mode":"automatic"}'

rabbitmqctl cluster_status
```
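To confirm the policy took effect and queues picked it up:

```bash
# List configured policies (should include ha-all)
rabbitmqctl list_policies

# Show which policy governs each queue
rabbitmqctl list_queues name policy
```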
Each OpenStack service then lists all three brokers in its `transport_url`:

```ini
[DEFAULT]
transport_url = rabbit://openstack:password@10.0.0.1:5672,openstack:password@10.0.0.2:5672,openstack:password@10.0.0.3:5672/
```
## Compute Node High Availability

### VM Evacuation

When a compute node goes down, its VMs can be rebuilt on other nodes (with shared storage such as Ceph, disk contents are preserved):
```bash
# Evacuate every VM from the failed host in one command
nova host-evacuate compute-1
```
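`nova host-evacuate` comes from the legacy novaclient; the same flow can be spelled out per VM with standard CLI commands, disabling the dead service first so the scheduler ignores the host:

```bash
FAILED_HOST=compute-1

# Keep the scheduler from placing anything on the failed host
openstack compute service set --disable "$FAILED_HOST" nova-compute

# Rebuild each of its VMs elsewhere (disks survive on shared Ceph storage)
for id in $(openstack server list --all-projects --host "$FAILED_HOST" \
            -f value -c ID); do
  nova evacuate "$id"
done
```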
### Automatic Evacuation Script

```python
import openstack
import time

conn = openstack.connect(cloud='mycloud')

def check_and_evacuate():
    services = conn.compute.services()
    for svc in services:
        if svc.binary == 'nova-compute' and svc.state == 'down':
            # Wait and re-check so we don't react to a transient blip
            time.sleep(30)
            svc_recheck = conn.compute.get_service(svc.id)
            if svc_recheck.state == 'down':
                print(f"Node {svc.host} is down, starting evacuation...")
                servers = conn.compute.servers(
                    all_projects=True,
                    host=svc.host
                )
                for server in servers:
                    conn.compute.evacuate_server(server.id)
                    print(f"  Evacuated VM: {server.name}")
```
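The function only makes a single pass, so it needs a driver. A minimal sketch, assuming the script is saved as `/usr/local/bin/check_and_evacuate.py` (a hypothetical path) and ends with a call to `check_and_evacuate()`:

```bash
# Hypothetical supervisor loop: poll for dead compute nodes every 60 s
while true; do
  python3 /usr/local/bin/check_and_evacuate.py
  sleep 60
done
```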
## Cross-AZ Disaster Recovery Architecture

```
Region: RegionOne
│
├── AZ: az1 (datacenter A)
│   ├── controller-1,2,3
│   ├── compute-1 ~ compute-20
│   └── Ceph Cluster A
│
└── AZ: az2 (datacenter B)
    ├── compute-21 ~ compute-40
    └── Ceph Cluster B (async replica of A)

Cross-AZ networking:
- Control plane: dedicated-line interconnect (low latency)
- Storage replication: Ceph RBD Mirror (asynchronous)
- VM network: VXLAN stretched across AZs
```
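In Nova, availability zones are realized as host aggregates with a zone name attached; wiring up `az2` might look like the following sketch (the aggregate name is an assumption):

```bash
# Create an aggregate exposed as availability zone az2
openstack aggregate create --zone az2 agg-az2

# Attach that AZ's compute hosts
openstack aggregate add host agg-az2 compute-21
openstack aggregate add host agg-az2 compute-22   # ... through compute-40

# Confirm the zone is visible
openstack availability zone list
```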
### Ceph RBD Cross-AZ Mirroring

```bash
# Enable image-mode mirroring on the volumes pool
rbd mirror pool enable volumes image

# Enable mirroring for an individual image
rbd mirror image enable volumes/volume-xxx

# On the primary site: create a bootstrap token for the peer
rbd mirror pool peer bootstrap create volumes > bootstrap-token

# On the secondary site: import the token to establish the peering
rbd mirror pool peer bootstrap import volumes bootstrap-token
```
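Replication health can then be watched from either site:

```bash
# Summary of mirroring for the whole pool
rbd mirror pool status volumes --verbose

# Per-image replication state and lag
rbd mirror image status volumes/volume-xxx
```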
## Key Configuration Checklist

```bash
# OpenStack service health
openstack compute service list
openstack network agent list
openstack volume service list

# Galera cluster size (expect 3)
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

# RabbitMQ cluster membership
rabbitmqctl cluster_status | grep running_nodes

# HAProxy backend status via the stats socket
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | \
  cut -d',' -f1,2,18 | grep -v "^#"

# Ceph health
ceph health detail
ceph osd stat
```
## Capacity Planning Reference

| Scale | Controller nodes | Compute nodes | Storage |
|-------|------------------|---------------|---------|
| Small (< 100 VMs) | 3 × 8C16G | 10 × 32C256G | 3-node Ceph |
| Medium (< 1000 VMs) | 3 × 16C32G | 50 × 64C512G | 10+ node Ceph |
| Large (> 1000 VMs) | 3 × 32C64G + Cells v2 | 200+ × 64C512G | 30+ node Ceph |
The controller bottleneck is usually the database; in larger deployments, run the MySQL cluster on dedicated nodes and front it with ProxySQL for read/write splitting, as sketched below.
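A minimal ProxySQL sketch for the Galera cluster above, issued through ProxySQL's admin interface (default port 6032). The hostgroup numbers and admin credentials are assumptions, and a matching `mysql_users` entry with `default_hostgroup = 10` is still needed:

```bash
mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'SQL'
-- hostgroup 10 = writer, hostgroup 20 = readers
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (10, '10.0.0.1', 3306),
  (20, '10.0.0.2', 3306),
  (20, '10.0.0.3', 3306);

-- locking reads stay on the writer; plain SELECTs go to the readers
INSERT INTO mysql_query_rules
  (rule_id, active, match_digest, destination_hostgroup, apply) VALUES
  (1, 1, '^SELECT.*FOR UPDATE', 10, 1),
  (2, 1, '^SELECT', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME;  SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;  SAVE MYSQL QUERY RULES TO DISK;
SQL
```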