Placement Deep Dive: OpenStack Resource Tracking and Scheduling Optimization
Why Placement Is Needed
Before Placement (introduced in the Newton release and extracted into an independent project in Stein), the Nova Scheduler queried the resource state of every nova-compute node directly, which caused the following problems:
- Every scheduling decision had to query all compute nodes, which performed poorly
- Resource types were fixed (CPU/RAM/Disk) and could not be extended
- Complex resources such as NUMA, GPU, and SR-IOV could not be tracked precisely
- Resource allocation had no atomicity guarantee, so concurrent scheduling could oversubscribe a node
Placement solves these problems with centralized resource inventory management.
Core Concepts
Resource Provider
```text
ResourceProvider
├── Represents an entity that can provide resources
├── Can be: a compute node, a NUMA node, a GPU, an SR-IOV PF
└── Supports tree structures (nested resource providers)

Example:
compute-node-1 (root RP)
├── compute-node-1_NUMA0 (NUMA-node RP)
│   ├── compute-node-1_NUMA0_PF0 (SR-IOV physical function RP)
│   └── compute-node-1_NUMA0_GPU0 (GPU RP)
└── compute-node-1_NUMA1 (NUMA-node RP)
```
Resource Class
```text
# Standard resource classes (defined by Placement)
VCPU
MEMORY_MB
DISK_GB
VGPU
SRIOV_NET_VF
PCI_DEVICE
NUMA_TOPOLOGY
FPGA

# Custom resource classes (names must start with CUSTOM_)
CUSTOM_BAREMETAL_SMALL
CUSTOM_LICENSE_WINDOWS
```
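Placement enforces a naming rule for custom resource classes: the name must begin with `CUSTOM_` and may contain only uppercase letters, digits, and underscores. A minimal validator for that rule (the helper name `is_valid_custom_rc` is mine, not Placement's API):

```python
import re

# Custom resource class names must start with "CUSTOM_" and contain
# only uppercase letters, digits, and underscores.
CUSTOM_RC_PATTERN = re.compile(r"^CUSTOM_[A-Z0-9_]+$")

def is_valid_custom_rc(name: str) -> bool:
    """Return True if `name` is a well-formed custom resource class."""
    return bool(CUSTOM_RC_PATTERN.match(name))

print(is_valid_custom_rc("CUSTOM_BAREMETAL_SMALL"))  # True
print(is_valid_custom_rc("baremetal_small"))         # False (missing prefix)
```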
Inventory
Each ResourceProvider maintains an inventory for every ResourceClass it exposes:

```text
Inventory {
    resource_provider: compute-node-1
    resource_class: VCPU
    total: 32              # total physical CPUs
    reserved: 2            # reserved for the host
    min_unit: 1            # minimum allocation unit
    max_unit: 32           # maximum allocation unit
    step_size: 1           # allocation step
    allocation_ratio: 16.0 # overcommit ratio: (32 - 2) * 16 = 480 allocatable
}
```
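How Placement turns an inventory record into allocatable capacity can be sketched in a few lines. `usable_capacity` and `fits` are hypothetical helper names, but the formula `(total - reserved) * allocation_ratio` and the min_unit/max_unit/step_size checks follow the field semantics above:

```python
def usable_capacity(total: int, reserved: int, allocation_ratio: float) -> int:
    # reserved is subtracted before the overcommit ratio is applied
    return int((total - reserved) * allocation_ratio)

def fits(requested: int, used: int, total: int, reserved: int,
         allocation_ratio: float, min_unit: int, max_unit: int,
         step_size: int) -> bool:
    # A request must respect the unit constraints...
    if not (min_unit <= requested <= max_unit):
        return False
    if requested % step_size != 0:
        return False
    # ...and must not push usage past the derived capacity.
    return used + requested <= usable_capacity(total, reserved, allocation_ratio)

print(usable_capacity(32, 2, 16.0))         # 480
print(fits(4, 476, 32, 2, 16.0, 1, 32, 1))  # True  (476 + 4 == 480)
print(fits(4, 477, 32, 2, 16.0, 1, 32, 1))  # False (481 > 480)
```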
Allocation
```text
Allocation {
    resource_provider: compute-node-1
    resource_class: VCPU
    consumer_uuid: <instance-uuid>
    used: 4
}
```
Data Model
```text
resource_providers (uuid, name, generation, parent_provider_id)

inventories (resource_provider_id, resource_class_id, total, reserved,
             min_unit, max_unit, step_size, allocation_ratio)

allocations (resource_provider_id, resource_class_id, consumer_id, used)

traits (name)
resource_provider_traits (resource_provider_id, trait_id)
```
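The relationships between these tables can be exercised with a toy sqlite3 version of the schema (column types and ids are simplified; the real models live in Placement's SQLAlchemy layer):

```python
import sqlite3

# Toy versions of the tables above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY, uuid TEXT, name TEXT,
    generation INTEGER, parent_provider_id INTEGER);
CREATE TABLE inventories (
    resource_provider_id INTEGER, resource_class_id INTEGER,
    total INTEGER, reserved INTEGER, allocation_ratio REAL);
CREATE TABLE allocations (
    resource_provider_id INTEGER, resource_class_id INTEGER,
    consumer_id TEXT, used INTEGER);
""")
db.execute("INSERT INTO resource_providers VALUES (1, 'uuid-1', 'compute-node-1', 0, NULL)")
db.execute("INSERT INTO inventories VALUES (1, 0, 32, 2, 16.0)")  # class 0 = VCPU
db.execute("INSERT INTO allocations VALUES (1, 0, 'inst-a', 4)")
db.execute("INSERT INTO allocations VALUES (1, 0, 'inst-b', 8)")

# Usage per provider/class: sum the allocations against the inventory capacity.
row = db.execute("""
    SELECT rp.name,
           CAST((i.total - i.reserved) * i.allocation_ratio AS INTEGER),
           COALESCE(SUM(a.used), 0)
    FROM inventories i
    JOIN resource_providers rp ON rp.id = i.resource_provider_id
    LEFT JOIN allocations a
      ON a.resource_provider_id = i.resource_provider_id
     AND a.resource_class_id = i.resource_class_id
    GROUP BY i.resource_provider_id, i.resource_class_id
""").fetchone()
print(row)  # ('compute-node-1', 480, 12)
```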
Placement in the Scheduling Flow
```python
# Simplified nova-scheduler flow (pseudocode)
def select_destinations(self, context, spec_obj, ...):
    # 1. Ask Placement for allocation candidates that satisfy the request
    request_groups = spec_obj.requested_resources
    alloc_reqs, provider_summaries, allocation_request_version = \
        self.placement_client.get_allocation_candidates(
            context, request_groups)

    # 2. Apply scheduler filters to the candidate hosts
    hosts = self._get_filtered_hosts(hosts, spec_obj, ...)

    # 3. Weigh the remaining hosts
    weighed_hosts = self._get_weighed_hosts(hosts, spec_obj)

    # 4. Atomically claim the resources on the selected host
    self.placement_client.claim_resources(
        context, instance_uuid, alloc_req)
```
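The atomicity of `claim_resources` rests on the `generation` column of each resource provider: a writer submits the generation it last read, and Placement rejects the write if another writer has bumped it in the meantime. A toy sketch of that optimistic compare-and-swap (the `Provider`/`claim` names are mine, not Placement's API):

```python
class ConflictError(Exception):
    pass

class Provider:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.generation = 0

def claim(provider, seen_generation, amount):
    # Optimistic concurrency: a generation mismatch means another writer
    # got there first, so the caller must refresh and retry.
    if provider.generation != seen_generation:
        raise ConflictError("provider generation conflict, retry")
    if provider.used + amount > provider.capacity:
        raise ConflictError("insufficient capacity")
    provider.used += amount
    provider.generation += 1  # every successful write bumps the generation

p = Provider(capacity=480)
gen = p.generation
claim(p, gen, 4)       # succeeds, generation becomes 1
try:
    claim(p, gen, 4)   # stale generation -> rejected
except ConflictError as e:
    print(e)
```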
Placement API Query Example
```text
GET /placement/allocation_candidates?
    resources=VCPU:4,MEMORY_MB:8192,DISK_GB:50
    &required=HW_CPU_X86_AVX2,!CUSTOM_MAINTENANCE   # "!" marks a forbidden trait (microversion >= 1.22)
    &group_policy=isolate

Response:
{
  "allocation_requests": [
    {
      "allocations": {
        "compute-node-1-uuid": {
          "resources": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 50}
        }
      }
    }
  ],
  "provider_summaries": {
    "compute-node-1-uuid": {
      "resources": {
        "VCPU": {"capacity": 512, "used": 100},
        "MEMORY_MB": {"capacity": 524288, "used": 65536}
      }
    }
  }
}
```
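Building that query string programmatically is mostly a matter of joining `RESOURCE_CLASS:amount` pairs. The helper below is a hypothetical sketch, assuming a microversion (>= 1.22) where forbidden traits are passed with a `!` prefix inside `required`:

```python
from urllib.parse import urlencode

def allocation_candidates_query(resources, required=(), forbidden=()):
    """Build the query string for GET /allocation_candidates (sketch)."""
    params = {
        "resources": ",".join(
            f"{rc}:{amt}" for rc, amt in sorted(resources.items())),
    }
    # Forbidden traits ride along in `required` with a "!" prefix.
    traits = list(required) + [f"!{t}" for t in forbidden]
    if traits:
        params["required"] = ",".join(traits)
    # Keep ",", ":" and "!" readable instead of percent-encoding them.
    return urlencode(params, safe=",:!")

q = allocation_candidates_query(
    {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 50},
    required=["HW_CPU_X86_AVX2"],
    forbidden=["CUSTOM_MAINTENANCE"],
)
print(q)
# resources=DISK_GB:50,MEMORY_MB:8192,VCPU:4&required=HW_CPU_X86_AVX2,!CUSTOM_MAINTENANCE
```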
Trait Mechanism
Traits are capability labels on resource providers, used for fine-grained scheduling:
```bash
# List the standard traits
openstack trait list | grep HW_CPU

# Attach traits to a resource provider
openstack resource provider trait set \
  --trait HW_CPU_X86_AVX2 \
  --trait HW_CPU_X86_AVX512F \
  --trait STORAGE_DISK_SSD \
  <compute-node-uuid>

# Require a trait through flavor extra specs
openstack flavor set gpu-flavor \
  --property trait:CUSTOM_GPU_NVIDIA_A100=required
```
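The effect of required and forbidden traits on candidate selection reduces to set operations. A minimal sketch (the provider data is made up for illustration):

```python
def matches_traits(provider_traits, required=(), forbidden=()):
    # A provider qualifies only if it carries every required trait
    # and none of the forbidden ones.
    traits = set(provider_traits)
    return set(required) <= traits and not (set(forbidden) & traits)

providers = {
    "node-1": {"HW_CPU_X86_AVX2", "STORAGE_DISK_SSD"},
    "node-2": {"HW_CPU_X86_AVX2", "CUSTOM_MAINTENANCE"},
    "node-3": {"STORAGE_DISK_SSD"},
}
ok = [name for name, traits in providers.items()
      if matches_traits(traits,
                        required=["HW_CPU_X86_AVX2"],
                        forbidden=["CUSTOM_MAINTENANCE"])]
print(ok)  # ['node-1']
```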
Nested Resource Providers (Nested RP)
Used to model complex hardware resources precisely:
```text
compute-node-1
├── VCPU: 64, MEMORY_MB: 262144
├── NUMA_NODE_0
│   ├── VCPU: 32, MEMORY_MB: 131072
│   └── GPU_0
│       └── VGPU: 8 (each GPU partitioned into 8 vGPUs)
└── NUMA_NODE_1
    ├── VCPU: 32, MEMORY_MB: 131072
    └── SRIOV_PF_0
        └── SRIOV_NET_VF: 16
```
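Rolling up inventories across such a tree is a simple recursive walk. The sketch below mirrors the diagram, putting each inventory on the provider that owns it (here the root's 64 VCPUs are modeled as the sum of its NUMA children rather than duplicated on the root):

```python
# Toy nested-provider tree following the diagram above.
tree = {
    "compute-node-1": {"inv": {}, "children": ["NUMA_NODE_0", "NUMA_NODE_1"]},
    "NUMA_NODE_0": {"inv": {"VCPU": 32, "MEMORY_MB": 131072},
                    "children": ["GPU_0"]},
    "GPU_0": {"inv": {"VGPU": 8}, "children": []},
    "NUMA_NODE_1": {"inv": {"VCPU": 32, "MEMORY_MB": 131072},
                    "children": ["SRIOV_PF_0"]},
    "SRIOV_PF_0": {"inv": {"SRIOV_NET_VF": 16}, "children": []},
}

def subtree_total(tree, root, resource_class):
    # Sum one resource class over a provider and all of its descendants.
    node = tree[root]
    total = node["inv"].get(resource_class, 0)
    for child in node["children"]:
        total += subtree_total(tree, child, resource_class)
    return total

print(subtree_total(tree, "compute-node-1", "VCPU"))  # 64
print(subtree_total(tree, "compute-node-1", "VGPU"))  # 8
```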
```python
# A virt driver reports a nested provider tree (simplified)
def update_provider_tree(self, provider_tree, nodename, ...):
    for numa_node in self._get_numa_topology():
        # Create one child resource provider per NUMA node
        provider_tree.new_child(
            name=f'{nodename}_NUMA{numa_node.id}',
            parent=nodename,
            uuid=generate_uuid()
        )
        # Report the NUMA node's CPU and memory inventory on the child RP
        provider_tree.update_inventory(
            f'{nodename}_NUMA{numa_node.id}',
            {
                'VCPU': {'total': numa_node.cpus, ...},
                'MEMORY_MB': {'total': numa_node.memory_mb, ...}
            }
        )
```
Key Source Paths
```text
placement/
├── api/
│   └── handlers/
│       ├── resource_providers.py
│       ├── inventories.py
│       ├── allocations.py
│       └── allocation_candidates.py
├── db/
│   └── sqlalchemy/
│       └── api.py
└── objects/
    ├── resource_provider.py
    └── allocation_candidate.py
```
Production Operations
```bash
# Inspect providers, their inventories, and current usage
openstack resource provider list
openstack resource provider show <uuid>
openstack resource provider inventory list <uuid>
openstack resource provider usage show <uuid>

# Show the allocations held by an instance
openstack resource provider allocation show <instance-uuid>

# Create allocations for instances that are missing them
nova-manage placement heal_allocations --verbose

# Mirror Nova host aggregates into Placement aggregates
nova-manage placement sync_aggregates
```
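A resource leak (see the table below) shows up as a consumer that Placement still knows about but Nova no longer does, so a diff of the two consumer sets finds the orphans. `find_leaked_consumers` is a hypothetical helper for illustration, not a nova-manage command:

```python
def find_leaked_consumers(placement_consumers, nova_instances):
    # Any consumer present in Placement but absent from Nova still
    # holds allocations and should be cleaned up.
    return sorted(set(placement_consumers) - set(nova_instances))

leaked = find_leaked_consumers(
    placement_consumers=["inst-a", "inst-b", "inst-c"],
    nova_instances=["inst-a", "inst-c"],
)
print(leaked)  # ['inst-b']
```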
Common Problems

| Problem | Cause | Fix |
| --- | --- | --- |
| Scheduling fails with "No valid host" | Insufficient resources or mismatched traits in Placement | Inspect the allocation_candidates API response |
| Resource leak | Allocations not cleaned up after a VM was deleted | Run heal_allocations; orphaned allocations can be removed with `nova-manage placement audit --delete` (Train and later) |
| Overcommit ratio not taking effect | allocation_ratio in the inventory was not updated | Restart nova-compute to trigger a fresh resource report |