Heat Deep Dive: OpenStack's Orchestration Service and Infrastructure as Code

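Positioning and Responsibilities

Heat is OpenStack's orchestration service. It implements Infrastructure as Code (IaC):

  • Describes an entire infrastructure (VMs, networks, storage, security groups) in a template
  • Creates, updates, or deletes a whole application stack (Stack) with a single command
  • Supports auto scaling
  • Supports conditions, parameterization, and output values
  • Accepts AWS CloudFormation template syntax in addition to its native format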

Architecture Overview

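heat-api (native REST API)
heat-api-cfn (CloudFormation-compatible API, optional)
    │ RPC
    ▼
heat-engine (core engine)
├── Template parsing
├── Dependency graph construction (DAG)
├── Parallel resource creation
└── Calls to the APIs of the other OpenStack services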

HOT Template Syntax

HOT (Heat Orchestration Template) is Heat's native template format:

heat_template_version: 2021-04-16

description: A complete web application stack example

# Parameters (supplied at deployment time)
parameters:
  image_id:
    type: string
    description: VM image ID (or name)
    default: ubuntu-22.04

  flavor:
    type: string
    description: VM flavor
    default: m1.small
    constraints:
      - allowed_values: [m1.small, m1.medium, m1.large]

  key_name:
    type: string
    description: Name of the SSH key pair

  db_password:
    type: string
    description: Database password
    hidden: true  # masked when the stack's parameters are displayed

# Conditions
conditions:
  is_production:
    equals: [{get_param: flavor}, m1.large]

# Resource definitions
resources:
  # Security group
  web_security_group:
    type: OS::Neutron::SecurityGroup
    properties:
      name: web-sg
      rules:
        - protocol: tcp
          port_range_min: 80
          port_range_max: 80
          remote_ip_prefix: 0.0.0.0/0
        - protocol: tcp
          port_range_min: 443
          port_range_max: 443
          remote_ip_prefix: 0.0.0.0/0
        - protocol: tcp
          port_range_min: 22
          port_range_max: 22
          remote_ip_prefix: 0.0.0.0/0

  # Network
  private_net:
    type: OS::Neutron::Net
    properties:
      name: private-net

  private_subnet:
    type: OS::Neutron::Subnet
    properties:
      network_id: {get_resource: private_net}
      cidr: 192.168.100.0/24
      dns_nameservers: [8.8.8.8]

  # Router
  router:
    type: OS::Neutron::Router
    properties:
      external_gateway_info:
        network: public

  router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router_id: {get_resource: router}
      subnet_id: {get_resource: private_subnet}

  # Virtual machine
  web_server:
    type: OS::Nova::Server
    properties:
      name: web-server
      image: {get_param: image_id}
      flavor: {get_param: flavor}
      key_name: {get_param: key_name}
      security_groups:
        - {get_resource: web_security_group}
      networks:
        - network: {get_resource: private_net}
      user_data_format: RAW
      user_data: |
        #!/bin/bash
        apt-get update
        apt-get install -y nginx
        systemctl start nginx

  # Floating IP
  floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network: public

  floating_ip_assoc:
    type: OS::Neutron::FloatingIPAssociation
    properties:
      floatingip_id: {get_resource: floating_ip}
      port_id: {get_attr: [web_server, addresses, private-net, 0, port]}

  # Block storage volume
  data_volume:
    type: OS::Cinder::Volume
    properties:
      size: 100
      volume_type: ssd

  volume_attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      volume_id: {get_resource: data_volume}
      instance_uuid: {get_resource: web_server}

# Outputs
outputs:
  server_ip:
    description: Public IP of the server
    value: {get_attr: [floating_ip, floating_ip_address]}

  server_private_ip:
    description: Private IP of the server
    # first_address is a deprecated convenience attribute; the networks attribute is the alternative
    value: {get_attr: [web_server, first_address]}
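
The parameters section of the template combines defaults with constraints. As a rough illustration of how such values could be resolved before a stack is built, here is a minimal Python sketch. The helper `resolve_parameters` and its error messages are hypothetical and are not Heat's validation code; it only handles `default` and `allowed_values`.

```python
def resolve_parameters(schema, supplied):
    """Merge user-supplied values with defaults and check allowed_values.

    `schema` mirrors the `parameters` section of a HOT template as a dict.
    Hypothetical helper for illustration only; not part of Heat.
    """
    unknown = set(supplied) - set(schema)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    resolved = {}
    for name, spec in schema.items():
        if name in supplied:
            value = supplied[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
        # Only the allowed_values constraint is modelled here
        for constraint in spec.get("constraints", []):
            allowed = constraint.get("allowed_values")
            if allowed is not None and value not in allowed:
                raise ValueError(f"{name}={value!r} not in {allowed}")
        resolved[name] = value
    return resolved


# A subset of the parameters from the example template
SCHEMA = {
    "image_id": {"type": "string", "default": "ubuntu-22.04"},
    "flavor": {
        "type": "string",
        "default": "m1.small",
        "constraints": [{"allowed_values": ["m1.small", "m1.medium", "m1.large"]}],
    },
    "key_name": {"type": "string"},
}

params = resolve_parameters(SCHEMA, {"key_name": "mykey"})
print(params)
```

Because `key_name` has no default, omitting it raises an error, which matches why the `stack create` command later in this article has to pass it explicitly.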

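Built-in Functions

Function       Purpose                              Example
get_param      Read a parameter value               {get_param: flavor}
get_resource   Get a resource's ID                  {get_resource: my_server}
get_attr       Read a resource attribute            {get_attr: [server, first_address]}
str_replace    String substitution                  Generating user_data dynamically
list_join      Join list items with a delimiter     {list_join: [',', [a, b, c]]}
if             Conditional selection                {if: [is_prod, large, small]}
repeat         Expand a template over list values   Generating similar list entries, e.g. security group rules

Note that repeat produces list values inside a property. To create many copies of the same resource, use OS::Heat::ResourceGroup instead.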
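
To make the semantics of a few of these functions concrete, the following Python sketch evaluates `get_param`, `list_join`, `str_replace`, and `if` over a nested dict. It is a simplified illustration, not Heat's resolver: the function name `resolve` and the way conditions are passed in as a plain dict of booleans are assumptions made for the example.

```python
def resolve(node, params, conditions):
    """Recursively evaluate a few HOT-style intrinsic functions.

    Supports get_param, list_join, str_replace and if. Simplified sketch;
    Heat itself handles many more functions and edge cases.
    """
    if isinstance(node, list):
        return [resolve(item, params, conditions) for item in node]
    if not isinstance(node, dict):
        return node  # plain scalar
    if len(node) == 1:
        (fn, arg), = node.items()
        if fn == "get_param":
            return params[arg]
        if fn == "list_join":
            delimiter, items = arg
            return delimiter.join(str(i) for i in resolve(items, params, conditions))
        if fn == "str_replace":
            text = arg["template"]
            for key, value in resolve(arg["params"], params, conditions).items():
                text = text.replace(key, str(value))
            return text
        if fn == "if":
            name, if_true, if_false = arg
            chosen = if_true if conditions[name] else if_false
            return resolve(chosen, params, conditions)
    # Ordinary mapping: resolve each value
    return {k: resolve(v, params, conditions) for k, v in node.items()}


params = {"flavor": "m1.large", "app": "shop"}
conditions = {"is_prod": params["flavor"] == "m1.large"}

joined = resolve({"list_join": ["-", [{"get_param": "app"}, "web"]]}, params, conditions)
script = resolve(
    {"str_replace": {"template": "echo $NAME", "params": {"$NAME": {"get_param": "app"}}}},
    params, conditions,
)
size = resolve({"if": ["is_prod", "large", "small"]}, params, conditions)
print(joined, script, size)
```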

Auto Scaling

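resources:
  # Scaling group
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: {get_param: image_id}
          flavor: m1.small
          # ...

  # Scale-up policy (adds 1 instance each time)
  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1

  # Scale-down policy (removes 1 instance each time)
  scale_down_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: -1

  # CPU alarm that triggers scale-up (works together with Ceilometer/Gnocchi and Aodh).
  # This alarm type also requires resource_type and query; the query below assumes the
  # group's servers carry the metadata key metering.server_group set to the stack ID.
  # Whether the cpu_util metric is available depends on the telemetry release in use.
  cpu_alarm_high:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      metric: cpu_util
      aggregation_method: mean
      resource_type: instance
      threshold: 80
      comparison_operator: gt
      evaluation_periods: 3
      query:
        str_replace:
          template: '{"=": {"server_group": "stack_id"}}'
          params:
            stack_id: {get_param: "OS::stack_id"}
      alarm_actions:
        - {get_attr: [scale_up_policy, signal_url]}

  # CPU alarm that triggers scale-down (same assumptions as above)
  cpu_alarm_low:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      metric: cpu_util
      aggregation_method: mean
      resource_type: instance
      threshold: 20
      comparison_operator: lt
      evaluation_periods: 5
      query:
        str_replace:
          template: '{"=": {"server_group": "stack_id"}}'
          params:
            stack_id: {get_param: "OS::stack_id"}
      alarm_actions:
        - {get_attr: [scale_down_policy, signal_url]}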
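
The interplay of `min_size`, `max_size`, `scaling_adjustment`, and `cooldown` is easier to see in a toy model. The Python sketch below is not Heat code: the class `ScalingGroup` is hypothetical, it models only `change_in_capacity` adjustments, and it assumes a single cooldown timer shared by both policies, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Toy model of a scaling group driven by change_in_capacity signals."""
    min_size: int
    max_size: int
    size: int
    cooldown: float                      # seconds; shared by all policies here
    last_scaled: float = float("-inf")   # time of the last successful adjustment

    def signal(self, adjustment: int, now: float) -> bool:
        """Apply an adjustment unless cooling down; return True if size changed."""
        if now - self.last_scaled < self.cooldown:
            return False  # still inside the cooldown window: ignore the signal
        # Clamp the requested size to [min_size, max_size]
        new_size = max(self.min_size, min(self.max_size, self.size + adjustment))
        if new_size == self.size:
            return False  # already at a bound, nothing to do
        self.size = new_size
        self.last_scaled = now
        return True


group = ScalingGroup(min_size=2, max_size=10, size=2, cooldown=60)
results = [
    group.signal(+1, now=0),     # scale up: 2 -> 3
    group.signal(+1, now=30),    # ignored: inside the 60 s cooldown
    group.signal(-1, now=100),   # scale down: 3 -> 2
    group.signal(-1, now=200),   # ignored: already at min_size
]
print(results, group.size)
```

The cooldown is what keeps a flapping alarm from adding or removing instances faster than new servers can start reporting metrics.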

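Dependency Graph and Parallel Execution

The Heat engine analyzes the dependencies between resources, builds a DAG (directed acyclic graph), and creates resources that do not depend on each other in parallel: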

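private_net ─────────────────────────► web_server
private_subnet ──► router_interface ─► web_server
router ──────────► router_interface
web_security_group ──────────────────► web_server
web_server ──────────────────────────► floating_ip_assoc
floating_ip ─────────────────────────► floating_ip_assoc

# Simplified sketch of the idea behind DependencyTaskGroup (heat/engine/scheduler.py).
# The real class steps cooperative tasks; a thread pool is used here only for brevity.
from concurrent.futures import ThreadPoolExecutor, wait

class DependencyTaskGroup:
    def __init__(self, deps, task):
        self.deps, self.task, self.done = deps, task, set()  # deps: {name: set(required)}

    def __call__(self):
        with ThreadPoolExecutor() as executor:
            while len(self.done) < len(self.deps):
                # Find every task that has no unfinished dependencies
                ready = [n for n in self.deps
                         if n not in self.done and self.deps[n] <= self.done]
                # Run the ready batch in parallel (assumes the graph has no cycles)
                wait([executor.submit(self.task, n) for n in ready])
                self.done.update(ready)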
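
As a worked example of the dependency analysis, the sketch below groups the resources of the web stack template into "waves" that could be created in parallel, using Python's standard `graphlib` module. The `DEPENDS_ON` map is written by hand from the explicit `get_resource` and `get_attr` references in that template; Heat may add further implicit dependencies, and it starts each resource as soon as its own dependencies finish rather than waiting for a whole wave, so treat this as an approximation. The function name `creation_waves` is made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# resource -> resources it references via get_resource / get_attr
DEPENDS_ON = {
    "web_security_group": set(),
    "private_net": set(),
    "router": set(),
    "floating_ip": set(),
    "data_volume": set(),
    "private_subnet": {"private_net"},
    "router_interface": {"router", "private_subnet"},
    "web_server": {"private_net", "web_security_group"},
    "floating_ip_assoc": {"floating_ip", "web_server"},
    "volume_attachment": {"data_volume", "web_server"},
}


def creation_waves(depends_on):
    """Return lists of resources; each list only depends on earlier lists."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


waves = creation_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With these edges, five resources have no prerequisites and can start immediately, the subnet and server follow, and the three association-style resources come last.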

Stack Lifecycle

CREATE_IN_PROGRESS → CREATE_COMPLETE
                   → CREATE_FAILED

UPDATE_IN_PROGRESS → UPDATE_COMPLETE
                   → UPDATE_FAILED (or, when rollback is enabled, ROLLBACK_IN_PROGRESS → ROLLBACK_COMPLETE)

DELETE_IN_PROGRESS → DELETE_COMPLETE
                   → DELETE_FAILED
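
Stack status strings follow an ACTION_STATE pattern, which makes them easy to handle in scripts that wait for an operation to finish. The Python sketch below splits a status and classifies it. The helper names are hypothetical, and the `ACTIONS` set covers only the actions discussed in this section plus ROLLBACK; Heat defines additional actions that are left out here.

```python
ACTIONS = {"CREATE", "UPDATE", "DELETE", "ROLLBACK"}  # subset, for illustration
STATES = {"IN_PROGRESS", "COMPLETE", "FAILED"}


def parse_stack_status(status):
    """Split e.g. 'UPDATE_IN_PROGRESS' into ('UPDATE', 'IN_PROGRESS')."""
    action, _, state = status.partition("_")  # split at the first underscore
    if action not in ACTIONS or state not in STATES:
        raise ValueError(f"unrecognized stack status: {status!r}")
    return action, state


def is_finished(status):
    """True once the current action has stopped, successfully or not."""
    return parse_stack_status(status)[1] != "IN_PROGRESS"


def is_failed(status):
    """True if the current action ended in failure."""
    return parse_stack_status(status)[1] == "FAILED"


for s in ("CREATE_IN_PROGRESS", "UPDATE_COMPLETE", "DELETE_FAILED"):
    print(s, parse_stack_status(s), is_finished(s), is_failed(s))
```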

Update Strategy

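# Preview an update (nothing is actually changed)
openstack stack update --dry-run -t template.yaml mystack

# Update the stack (Heat computes the differences and only updates resources that changed)
openstack stack update -t template.yaml mystack

# Cancel an in-progress update; by default this rolls the stack back to its previous state
openstack stack cancel mystack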

Key Source Code Paths

heat/
├── api/
│   └── openstack/v1/             # REST API
├── engine/
│   ├── service.py                # heat-engine main service
│   ├── stack.py                  # Stack object (core)
│   ├── resource.py               # Resource base class
│   ├── scheduler.py              # Parallel task scheduler
│   └── resources/                # Resource type implementations
│       ├── openstack/
│       │   ├── nova/server.py    # OS::Nova::Server
│       │   ├── neutron/net.py    # OS::Neutron::Net
│       │   └── cinder/volume.py  # OS::Cinder::Volume
│       └── aws/                  # CloudFormation-compatible resources
└── common/
    └── template_format.py        # Template parsing (YAML/JSON loading)

Tips for Production Use

# Create a stack
openstack stack create -t webapp.yaml \
  --parameter image_id=ubuntu-22.04 \
  --parameter key_name=mykey \
  mystack

# List stack events (to find out why something failed)
openstack stack event list mystack --nested-depth 3

# Check resource status
openstack stack resource list mystack

# Show an output value
openstack stack output show mystack server_ip

# Validate a template
openstack orchestration template validate -t template.yaml
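
When a stack fails, the event list is usually the quickest route to the cause. The Python sketch below filters a list of event records down to the failed ones. It assumes the events were exported as JSON (for example with the CLI's `-f json` output format) and that each record has `resource_name`, `resource_status`, and `resource_status_reason` fields; those field names and the sample data are assumptions for illustration, so check them against the actual output of your deployment.

```python
import json


def failed_events(events):
    """Return (resource_name, reason) for every event whose status ends in _FAILED."""
    return [
        (event["resource_name"], event.get("resource_status_reason", ""))
        for event in events
        if event.get("resource_status", "").endswith("_FAILED")
    ]


# Made-up sample in the assumed shape of the JSON event list
SAMPLE = json.loads("""
[
  {"resource_name": "private_net", "resource_status": "CREATE_COMPLETE",
   "resource_status_reason": "state changed"},
  {"resource_name": "web_server", "resource_status": "CREATE_FAILED",
   "resource_status_reason": "example failure reason"},
  {"resource_name": "mystack", "resource_status": "CREATE_FAILED",
   "resource_status_reason": "Resource CREATE failed"}
]
""")

failures = failed_events(SAMPLE)
for name, reason in failures:
    print(f"{name}: {reason}")
```

The first failed resource in chronological order is normally the root cause; later failures, including the stack's own, tend to be consequences of it.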