Network isolation and network_data.yaml
By default all OpenStack services will run on the provisioning network. To separate the various service types onto their own networks (recommended), OpenStack introduces the concept of network isolation. To enable network isolation the deployment command must include the following templates; these templates require no modification.
/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml
/usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml
To assign IPs/VLANs to the various operational networks edit the network_data.yaml file.
- The overcloud installer command accepts a parameter `--networks-file` to reference the `network_data.yaml` for definitions of the networks, their IP allocation pools and VLAN IDs.
- The default reference template is located @ `/usr/share/openstack-tripleo-heat-templates/network_data.yaml`; if the installer command is run without the `--networks-file` parameter then the IP/VLAN scheme in this file will be used.
Create the configuration, allocate the IP ranges and VLANs for the intended design.
- This template references the IP ranges and VLANs for the University deployment. Whilst all IPs will be statically defined, the installer still requires the allocation_pools entries (used in the 'flavours' method).
- There is a default/standard storage management network included for Ceph integration; this is not used in an external Ceph configuration but is required for the installer to complete.
- An additional non-standard network is included for Compute Instance-HA: the IPMI network.
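Rather than starting from an empty file, the shipped default can optionally be copied into place and trimmed down to the scheme below (same paths as used in the next step):

# optional: seed the custom network_data.yaml from the shipped default before editing
mkdir -p /home/stack/templates
cp /usr/share/openstack-tripleo-heat-templates/network_data.yaml /home/stack/templates/network_data.yaml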
mkdir /home/stack/templates
nano -cw /home/stack/templates/network_data.yaml
- name: Storage
  enabled: true
  vip: true
  vlan: 13
  name_lower: storage
  ip_subnet: '10.122.10.0/24'
  allocation_pools: [{'start': '10.122.10.30', 'end': '10.122.10.249'}]
  mtu: 1500
- name: StorageMgmt
  name_lower: storage_mgmt
  enabled: true
  vip: true
  vlan: 14
  ip_subnet: '10.122.12.0/24'
  allocation_pools: [{'start': '10.122.12.30', 'end': '10.122.12.249'}]
  mtu: 1500
- name: InternalApi
  name_lower: internal_api
  enabled: true
  vip: true
  vlan: 12
  ip_subnet: '10.122.6.0/24'
  allocation_pools: [{'start': '10.122.6.30', 'end': '10.122.6.249'}]
  mtu: 1500
- name: Tenant
  name_lower: tenant
  enabled: true
  vip: false
  vlan: 11
  ip_subnet: '10.122.8.0/24'
  allocation_pools: [{'start': '10.122.8.30', 'end': '10.122.8.249'}]
  mtu: 1500
- name: External
  name_lower: external
  vip: true
  vlan: 1214
  ip_subnet: '10.121.4.0/24'
  gateway_ip: '10.121.4.1'
  allocation_pools: [{'start': '10.121.4.30', 'end': '10.121.4.249'}]
  mtu: 1500
- name: IpmiNetwork
  name_lower: ipmi_network
  vip: false
  vlan: 2
  ip_subnet: '10.122.1.0/24'
  allocation_pools: [{'start': '10.122.1.80', 'end': '10.122.1.249'}]
  mtu: 1500
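A quick syntax check of the edited file before it is handed to the installer; a minimal sketch assuming python3 with PyYAML is available (typically the case on an undercloud):

python3 -c "import yaml; yaml.safe_load(open('/home/stack/templates/network_data.yaml')); print('network_data.yaml parses OK')"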
Create custom roles
The following custom roles will be created.
- Controller role without the networker functions which are to be provided by the networker role.
- Controller role with additional network for IPMI fencing.
- Compute role for server hardware A. (this will include Instance-HA capability)
- Compute role for server hardware B. (this will include Instance-HA capability)
Controller role
Find services required for a controller role without networker services. The ControllerOpenstack.yaml role contains only the controller core services (it is missing the database and messaging/queue services); this is the base role to build upon.
grep 'OS::TripleO::Services::' /usr/share/openstack-tripleo-heat-templates/roles/ControllerOpenstack.yaml > ~/ControllerOpenstack.txt ;\
grep 'OS::TripleO::Services::' /usr/share/openstack-tripleo-heat-templates/roles/Database.yaml > ~/Database.txt ;\
grep 'OS::TripleO::Services::' /usr/share/openstack-tripleo-heat-templates/roles/Messaging.yaml > ~/Messaging.txt ;\
grep 'OS::TripleO::Services::' /usr/share/openstack-tripleo-heat-templates/roles/Networker.yaml > ~/Networker.txt ;\
grep 'OS::TripleO::Services::' /usr/share/openstack-tripleo-heat-templates/roles/ControllerNoCeph.yaml > ~/ControllerNoCeph.txt
Find services required for Database, these are to be added to the custom Controller role.
diff <(sort ~/ControllerOpenstack.txt) <(sort ~/Database.txt) | grep \>
> - OS::TripleO::Services::Clustercheck
> - OS::TripleO::Services::MySQL
Find services required for Messaging, these are to be added to the custom Controller role.
diff <(sort ~/ControllerOpenstack.txt) <(sort ~/Messaging.txt) | grep \>
> - OS::TripleO::Services::OsloMessagingNotify
> - OS::TripleO::Services::OsloMessagingRpc
Find services required for Ceph storage; these are to be added to the custom Controller role for Ceph deployments (specifically external Ceph integration). (NOTE: ControllerOpenstack.txt and ControllerNoCeph.txt both contain Networker services.)
diff <(sort ~/ControllerNoCeph.txt) <(sort ~/ControllerOpenstack.txt) | grep \>
> - OS::TripleO::Services::CephGrafana
> - OS::TripleO::Services::CephMds
> - OS::TripleO::Services::CephMgr
> - OS::TripleO::Services::CephMon
> - OS::TripleO::Services::CephRbdMirror
> - OS::TripleO::Services::CephRgw
We keep/add the client services to use external Ceph. With RHOSP 16, when using external Ceph, all of the Ceph services are still required, not just the following client services.
< - OS::TripleO::Services::CephClient
< - OS::TripleO::Services::CephExternal
Find services required for Networker; these are to be removed from the custom Controller role. If you are using ControllerOpenstack.yaml as the base template they will not require removal (they are not present).
diff <(sort ~/ControllerOpenstack.txt) <(sort ~/Networker.txt) | grep \>
> - OS::TripleO::Services::IronicNeutronAgent
> - OS::TripleO::Services::NeutronDhcpAgent
> - OS::TripleO::Services::NeutronL2gwAgent
> - OS::TripleO::Services::NeutronL3Agent
> - OS::TripleO::Services::NeutronMetadataAgent
> - OS::TripleO::Services::NeutronML2FujitsuCfab
> - OS::TripleO::Services::NeutronML2FujitsuFossw
> - OS::TripleO::Services::NeutronOvsAgent
> - OS::TripleO::Services::NeutronVppAgent
> - OS::TripleO::Services::OctaviaHealthManager
> - OS::TripleO::Services::OctaviaHousekeeping
> - OS::TripleO::Services::OctaviaWorker
Create a custom roles directory and copy the default roles into it; these will be used as a base for generating the customised roles.
mkdir /home/stack/templates/roles
cp -r /usr/share/openstack-tripleo-heat-templates/roles /home/stack/templates
mv /home/stack/templates/roles/Controller.yaml /home/stack/templates/roles/Controller.yaml.orig
cp /home/stack/templates/roles/ControllerOpenstack.yaml /home/stack/templates/roles/Controller.yaml
Create the new controller role with the services that are to be added/removed.
The 'name:' key (Controller) is referenced in the scheduler_hints_env.yaml by the entry '<role name>SchedulerHints' (ControllerSchedulerHints); this binds the role to the host.
# change the 'Role:' description, the 'name:' and append/remove services listed
nano -cw /home/stack/templates/roles/Controller.yaml
###############################################################################
# Role: ControllerNoNetworkExtCeph #
###############################################################################
- name: Controller
  description: |
    Controller role that does not contain the networking
    components.
# add to role
> - OS::TripleO::Services::Clustercheck
> - OS::TripleO::Services::MySQL
# add to role
> - OS::TripleO::Services::OsloMessagingNotify
> - OS::TripleO::Services::OsloMessagingRpc
# check present/add to role
> - OS::TripleO::Services::CephGrafana
> - OS::TripleO::Services::CephMds
> - OS::TripleO::Services::CephMgr
> - OS::TripleO::Services::CephMon
> - OS::TripleO::Services::CephRbdMirror
> - OS::TripleO::Services::CephRgw
# check present/add to role
< - OS::TripleO::Services::CephClient
< - OS::TripleO::Services::CephExternal
# check present/remove from role
> - OS::TripleO::Services::IronicNeutronAgent
> - OS::TripleO::Services::NeutronDhcpAgent
> - OS::TripleO::Services::NeutronL2gwAgent
> - OS::TripleO::Services::NeutronL3Agent
> - OS::TripleO::Services::NeutronMetadataAgent
> - OS::TripleO::Services::NeutronML2FujitsuCfab
> - OS::TripleO::Services::NeutronML2FujitsuFossw
> - OS::TripleO::Services::NeutronOvsAgent
> - OS::TripleO::Services::NeutronVppAgent
> - OS::TripleO::Services::OctaviaHealthManager
> - OS::TripleO::Services::OctaviaHousekeeping
> - OS::TripleO::Services::OctaviaWorker
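A quick sanity check of the edited role before generating roles_data.yaml; the grep patterns below are illustrative, pick any of the added/removed service names:

# expect matches for the added database/messaging services
grep -E 'MySQL|Clustercheck|OsloMessaging' /home/stack/templates/roles/Controller.yaml
# expect no matches for the removed networker services
grep -E 'NeutronL3Agent|NeutronDhcpAgent|OctaviaWorker' /home/stack/templates/roles/Controller.yaml || echo "networker services removed"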
Compute role
- No customisation of services for the role is required.
- For an instance-HA deployment copy the `ComputeInstanceHA.yaml` role to `ComputeA.yaml` / `ComputeB.yaml`.
- For a standard deployment copy the `Compute.yaml` role to `ComputeA.yaml` / `ComputeB.yaml`.
- The 'name:' key (ComputeA) is referenced in the `scheduler_hints_env.yaml` by the entry '<role name>SchedulerHints' (ComputeASchedulerHints); this binds the role to the host.
Instance-HA compute role.
- Using the instance-HA compute roles without any of the environment files that enable the capability on the controllers seems to work fine; enabling instance-HA is covered further on in the document.
cp /home/stack/templates/roles/ComputeInstanceHA.yaml /home/stack/templates/roles/ComputeA.yaml
cp /home/stack/templates/roles/ComputeInstanceHA.yaml /home/stack/templates/roles/ComputeB.yaml
# edit the role
# 1) change the 'name:' key to match the scheduler hints
# 2) change the 'HostnameFormatDefault:' key to ensure hostnames do not clash for the compute instances
nano -cw /home/stack/templates/roles/ComputeA.yaml
###############################################################################
# Role: ComputeInstanceHA #
###############################################################################
- name: ComputeA
  description: |
    Compute Instance HA Node role to be used with -e environments/compute-instanceha.yaml
  CountDefault: 1
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
  #HostnameFormatDefault: '%stackname%-novacomputeiha-%index%'
  HostnameFormatDefault: '%stackname%-computeA-%index%'
# edit the role to change the 'name:' key to match the scheduler hints
nano -cw /home/stack/templates/roles/ComputeB.yaml
###############################################################################
# Role: ComputeInstanceHA #
###############################################################################
- name: ComputeB
  description: |
    Compute Instance HA Node role to be used with -e environments/compute-instanceha.yaml
  CountDefault: 1
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
  #HostnameFormatDefault: '%stackname%-novacomputeiha-%index%'
  HostnameFormatDefault: '%stackname%-computeB-%index%'
Vanilla compute role.
- Use this where you do not want instance-HA. There are some entries in the config files that can be omitted; see the Instance-HA section further on in this document.
cp /home/stack/templates/roles/Compute.yaml /home/stack/templates/roles/ComputeA.yaml
cp /home/stack/templates/roles/Compute.yaml /home/stack/templates/roles/ComputeB.yaml
# edit the role
# 1) change the 'name:' key to match the scheduler hints
# 2) change the 'HostnameFormatDefault:' key to ensure hostnames do not clash for the compute instances
nano -cw /home/stack/templates/roles/ComputeA.yaml
###############################################################################
# Role: Compute #
###############################################################################
- name: ComputeA
  description: |
    Basic Compute Node role
  CountDefault: 1
  # Create external Neutron bridge (unset if using ML2/OVS without DVR)
  tags:
    - external_bridge
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
  #HostnameFormatDefault: '%stackname%-novacompute-%index%'
  HostnameFormatDefault: '%stackname%-computeA-%index%'
# edit the role to change the 'name:' key to match the scheduler hints
nano -cw /home/stack/templates/roles/ComputeB.yaml
###############################################################################
# Role: Compute #
###############################################################################
- name: ComputeB
  description: |
    Basic Compute Node role
  CountDefault: 1
  # Create external Neutron bridge (unset if using ML2/OVS without DVR)
  tags:
    - external_bridge
  networks:
    InternalApi:
      subnet: internal_api_subnet
    Tenant:
      subnet: tenant_subnet
    Storage:
      subnet: storage_subnet
  #HostnameFormatDefault: '%stackname%-novacompute-%index%'
  HostnameFormatDefault: '%stackname%-computeB-%index%'
Create custom roles_data.yaml
Create the new roles_data.yaml with the updated controller and compute roles. This file is simply a concatenation of the role files (that we have just edited).
Note the command openstack overcloud roles generate references the parameters Controller Networker ComputeA ComputeB; each of these refers to a roles file such as /home/stack/templates/roles/ComputeB.yaml.
After the roles_data.yaml has been generated it is safe to remove /home/stack/templates/roles, though you will likely want to keep it until you get a successful deployment.
# generate '/home/stack/templates/roles_data.yaml'
openstack overcloud roles generate \
--roles-path /home/stack/templates/roles \
-o /home/stack/templates/roles_data.yaml \
Controller Networker ComputeA ComputeB
# you can remove the /home/stack/templates/roles now, it is not required for the deployment command
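Before moving on it is worth confirming the generated file contains exactly the expected role names, since these must match the scheduler hints later:

grep '^- name:' /home/stack/templates/roles_data.yaml
# expected output:
# - name: Controller
# - name: Networker
# - name: ComputeA
# - name: ComputeB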
Edit the new /home/stack/templates/roles_data.yaml to add a new IPMI service network to the controller role; this network will be required for instance-HA later in this document.
- Instance-HA simply detects if an OpenStack compute node is dead and migrates the VM instances to another compute node; this is a poor man's version of HA/DRS.
- For instance-HA the controller nodes must be able to communicate with the IPMI interfaces of the compute nodes (to check power status), so we add an additional IPMI service network to only the controllers for this purpose.
- The controllers will use the IPMI interface of the compute nodes to assist with reboot and fencing. Once a node is fenced (no VMs can be scheduled on the compute node) the 'active' controller node will send IPMI power commands to the compute nodes and determine after reboot if they can re-join the cluster and then be un-fenced.
- The entry for the new IPMI network in the role file follows the naming convention defined in the `network_data.yaml`. (Note the network cannot be named just 'IPMI'; this is used as a functional variable in the heat templates and causes non-diagnosable issues!)
nano -cw /home/stack/templates/roles_data.yaml
###############################################################################
# Role: ControllerNoNetwork #
###############################################################################
- name: Controller
  description: |
    Controller role that does not contain the networking
    roles.
  tags:
    - primary
    - controller
  networks:
    External:
      subnet: external_subnet
    InternalApi:
      subnet: internal_api_subnet
    Storage:
      subnet: storage_subnet
    StorageMgmt:
      subnet: storage_mgmt_subnet
    Tenant:
      subnet: tenant_subnet
    IpmiNetwork:
      subnet: ipmi_network_subnet
  default_route_networks: ['External']
Predictive IPs
Using the 'controlling node placement' method each node must have an IP for each defined network that it participates in. Note that the network_data.yaml still requires IP allocation ranges defined per network for the installer to run even though IPs are statically assigned; in the 'flavours' method IPs from the various ranges would be dynamically allocated.
The addition of the ipmi_network for the controller nodes is for VM instance-ha later in this document.
nano -cw /home/stack/templates/predictive_ips.yaml
# There are 24 nodes available for the computeA role, 1 node has a TPMS issue and has not been imported
#ComputeAIPs
#10.122.1.39 38:68:dd:4a:41:48 # compute node cannot boot due to tpms issue
# There are 24 nodes available for the computeB role, 3 nodes are being used whilst we await the Ceph nodes to be delivered
#ComputeBIPs
#10.122.1.55 6c:fe:54:33:4f:3c # temporary ceph1
#10.122.1.56 6c:fe:54:33:55:74 # temporary ceph2
#10.122.1.57 6c:fe:54:33:4b:5c # temporary ceph3
# the IPs listed will cover the broken/repurposed nodes ready to be integrated into the cluster
nano -cw /home/stack/templates/predictive_ips.yaml
parameter_defaults:
  ControllerIPs:
    ipmi_network:
      - 10.122.1.80
      - 10.122.1.81
      - 10.122.1.82
    external:
      - 10.121.4.20
      - 10.121.4.21
      - 10.121.4.22
    internal_api:
      - 10.122.6.30
      - 10.122.6.31
      - 10.122.6.32
    storage:
      - 10.122.10.30
      - 10.122.10.31
      - 10.122.10.32
    tenant:
      - 10.122.8.30
      - 10.122.8.31
      - 10.122.8.32
    ctlplane:
      - 10.122.0.30
      - 10.122.0.31
      - 10.122.0.32
  NetworkerIPs:
    internal_api:
      - 10.122.6.40
      - 10.122.6.41
    tenant:
      - 10.122.8.40
      - 10.122.8.41
    ctlplane:
      - 10.122.0.40
      - 10.122.0.41
  ComputeAIPs:
    internal_api:
      - 10.122.6.50
      - 10.122.6.51
      - 10.122.6.52
      - 10.122.6.53
      - 10.122.6.54
      - 10.122.6.55
      - 10.122.6.56
      - 10.122.6.57
      - 10.122.6.58
      - 10.122.6.59
      - 10.122.6.60
      - 10.122.6.61
      - 10.122.6.62
      - 10.122.6.63
      - 10.122.6.64
      - 10.122.6.65
      - 10.122.6.66
      - 10.122.6.67
      - 10.122.6.68
      - 10.122.6.69
      - 10.122.6.70
      - 10.122.6.71
      - 10.122.6.72
      - 10.122.6.73
    storage:
      - 10.122.10.50
      - 10.122.10.51
      - 10.122.10.52
      - 10.122.10.53
      - 10.122.10.54
      - 10.122.10.55
      - 10.122.10.56
      - 10.122.10.57
      - 10.122.10.58
      - 10.122.10.59
      - 10.122.10.60
      - 10.122.10.61
      - 10.122.10.62
      - 10.122.10.63
      - 10.122.10.64
      - 10.122.10.65
      - 10.122.10.66
      - 10.122.10.67
      - 10.122.10.68
      - 10.122.10.69
      - 10.122.10.70
      - 10.122.10.71
      - 10.122.10.72
      - 10.122.10.73
    tenant:
      - 10.122.8.50
      - 10.122.8.51
      - 10.122.8.52
      - 10.122.8.53
      - 10.122.8.54
      - 10.122.8.55
      - 10.122.8.56
      - 10.122.8.57
      - 10.122.8.58
      - 10.122.8.59
      - 10.122.8.60
      - 10.122.8.61
      - 10.122.8.62
      - 10.122.8.63
      - 10.122.8.64
      - 10.122.8.65
      - 10.122.8.66
      - 10.122.8.67
      - 10.122.8.68
      - 10.122.8.69
      - 10.122.8.70
      - 10.122.8.71
      - 10.122.8.72
      - 10.122.8.73
    ctlplane:
      - 10.122.0.50
      - 10.122.0.51
      - 10.122.0.52
      - 10.122.0.53
      - 10.122.0.54
      - 10.122.0.55
      - 10.122.0.56
      - 10.122.0.57
      - 10.122.0.58
      - 10.122.0.59
      - 10.122.0.60
      - 10.122.0.61
      - 10.122.0.62
      - 10.122.0.63
      - 10.122.0.64
      - 10.122.0.65
      - 10.122.0.66
      - 10.122.0.67
      - 10.122.0.68
      - 10.122.0.69
      - 10.122.0.70
      - 10.122.0.71
      - 10.122.0.72
      - 10.122.0.73
  ComputeBIPs:
    internal_api:
      - 10.122.6.80
      - 10.122.6.81
      - 10.122.6.82
      - 10.122.6.83
      - 10.122.6.84
      - 10.122.6.85
      - 10.122.6.86
      - 10.122.6.87
      - 10.122.6.88
      - 10.122.6.89
      - 10.122.6.90
      - 10.122.6.91
      - 10.122.6.92
      - 10.122.6.93
      - 10.122.6.94
      - 10.122.6.95
      - 10.122.6.96
      - 10.122.6.97
      - 10.122.6.98
      - 10.122.6.99
      - 10.122.6.100
      - 10.122.6.101
      - 10.122.6.102
      - 10.122.6.103
    storage:
      - 10.122.10.80
      - 10.122.10.81
      - 10.122.10.82
      - 10.122.10.83
      - 10.122.10.84
      - 10.122.10.85
      - 10.122.10.86
      - 10.122.10.87
      - 10.122.10.88
      - 10.122.10.89
      - 10.122.10.90
      - 10.122.10.91
      - 10.122.10.92
      - 10.122.10.93
      - 10.122.10.94
      - 10.122.10.95
      - 10.122.10.96
      - 10.122.10.97
      - 10.122.10.98
      - 10.122.10.99
      - 10.122.10.100
      - 10.122.10.101
      - 10.122.10.102
      - 10.122.10.103
    tenant:
      - 10.122.8.80
      - 10.122.8.81
      - 10.122.8.82
      - 10.122.8.83
      - 10.122.8.84
      - 10.122.8.85
      - 10.122.8.86
      - 10.122.8.87
      - 10.122.8.88
      - 10.122.8.89
      - 10.122.8.90
      - 10.122.8.91
      - 10.122.8.92
      - 10.122.8.93
      - 10.122.8.94
      - 10.122.8.95
      - 10.122.8.96
      - 10.122.8.97
      - 10.122.8.98
      - 10.122.8.99
      - 10.122.8.100
      - 10.122.8.101
      - 10.122.8.102
      - 10.122.8.103
    ctlplane:
      - 10.122.0.80
      - 10.122.0.81
      - 10.122.0.82
      - 10.122.0.83
      - 10.122.0.84
      - 10.122.0.85
      - 10.122.0.86
      - 10.122.0.87
      - 10.122.0.88
      - 10.122.0.89
      - 10.122.0.90
      - 10.122.0.91
      - 10.122.0.92
      - 10.122.0.93
      - 10.122.0.94
      - 10.122.0.95
      - 10.122.0.96
      - 10.122.0.97
      - 10.122.0.98
      - 10.122.0.99
      - 10.122.0.100
      - 10.122.0.101
      - 10.122.0.102
      - 10.122.0.103
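A quick sanity check that each role/network list holds at least as many IPs as the corresponding role count; a sketch using python3 with PyYAML (available on the undercloud):

python3 - <<'EOF'
import yaml
data = yaml.safe_load(open('/home/stack/templates/predictive_ips.yaml'))
for role, nets in data['parameter_defaults'].items():
    for net, ips in nets.items():
        print(f'{role:14s} {net:14s} {len(ips)} IPs')
EOF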
VIPs
PublicVirtualFixedIPs is very significant for TLS and external access; this VIP is the public endpoint IP referenced later in the TLS certificate SAN and the estate DNS A record.
nano -cw /home/stack/templates/vips.yaml
parameter_defaults:
  ControlFixedIPs: [{'ip_address':'10.122.0.14'}]
  PublicVirtualFixedIPs: [{'ip_address':'10.121.4.14'}]
  InternalApiVirtualFixedIPs: [{'ip_address':'10.122.6.14'}]
  RedisVirtualFixedIPs: [{'ip_address':'10.122.6.15'}]
  OVNDBsVirtualFixedIPs: [{'ip_address':'10.122.6.16'}]
  StorageVirtualFixedIPs: [{'ip_address':'10.122.10.14'}]
Scheduler hints
Using the 'controlling node placement' method the various node type 'counts' must match the number of servers, and there must be at least as many IPs available in the predictive_ips.yaml as there are servers for each role.
# view the capabilities key/value pair for a node on the undercloud
source ~/stackrc
openstack baremetal node show osctl0 -f json -c properties | jq -r .properties.capabilities
node:controller-0,profile:baremetal,cpu_vt:true,cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,cpu_txt:true
# view node name as hint to match, will need this list for the hostname map override
for i in `openstack baremetal node list -f json | jq -r .[].Name` ; do openstack baremetal node show $i -f json -c properties | jq -r .properties.capabilities | awk -F "," '{sub(/node:/,"",$1);print $1}'; done
#controller-0
#controller-1
#controller-2
#networker-0
#networker-1
#computeA-0
#.....
#computeA-22
#computeB-0
#.....
#computeB-19
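If a node is missing its placement hint it can be set directly on the undercloud. Note that --property capabilities replaces the whole string, so include any existing key/value pairs; the node name and values below are illustrative:

openstack baremetal node set computeA-0 \
  --property capabilities='node:computeA-0,profile:baremetal,cpu_vt:true,cpu_aes:true'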
The Overcloud<role>Flavor: entry relates to the undercloud node 'capabilities' key/value pair profile:baremetal.
The <role>SchedulerHints: entry relates to the undercloud node 'capabilities' key/value pair node:controller-0.
The <role>SchedulerHints: entry intelligently maps the name of the role to be used in the roles_data.yaml using the entry - name: <role>.
CAUTION: If you start renaming your roles the heat templates get in a mess quickly. (The 'composable role' documentation will not take you further than a simple example and does not explain that heat templates are full of functional variable names that can clash.)
- For example, if you name your role `- name: ControllerNoNetworkingNoCeph`, this necessitates `ControllerNoNetworkingNoCephSchedulerHints:` and `OvercloudControllerNoNetworkingNoCephFlavor:`.
- The Heat templates may not correctly attribute the role to a node if it gets too complicated.
- The named ComputeA / ComputeB roles, although basic, do not cause issues.
nano -cw /home/stack/templates/scheduler_hints_env.yaml
parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  NetworkerSchedulerHints:
    'capabilities:node': 'networker-%index%'
  ComputeASchedulerHints:
    'capabilities:node': 'computeA-%index%'
  ComputeBSchedulerHints:
    'capabilities:node': 'computeB-%index%'
  OvercloudControllerFlavor: baremetal
  OvercloudNetworkerFlavor: baremetal
  OvercloudComputeAFlavor: baremetal
  OvercloudComputeBFlavor: baremetal
  ControllerCount: 3
  NetworkerCount: 2
  ComputeACount: 22
  ComputeBCount: 21
# UPDATE THIS 24 + 24 nodes on final build
Node root password set
During the deployment each node will set up the OS, then the network, then bootstrap all the various service containers.
After the network setup stage the undercloud node will push its own public SSH key to the nodes for SSH access (ssh heat-admin@<node>).
The hostnames/IPs for the control plane interfaces are written to the undercloud /etc/hosts.
When building the cluster it is often useful to get onto a node for debug via the out-of-band management adapter (XClarity remote console for the University); this is especially useful when using custom network interfaces (that may be failing). Luckily the password is set before the interface customisation commences.
nano -cw /home/stack/templates/userdata_root_password.yaml
resource_registry:
  OS::TripleO::NodeUserData: /usr/share/openstack-tripleo-heat-templates/firstboot/userdata_root_password.yaml
parameter_defaults:
  NodeRootPassword: 'Password0'
Update the deployment command to include -e /home/stack/templates/userdata_root_password.yaml.
Custom network interface templates
From checking the undercloud inspection data we worked out that the following network scheme will be used in the templates.
Server classA: (controller, networker and computeA)
| mapping | interface | purpose |
|---|---|---|
| nic1 | eno1 | Control Plane - VLAN1 native, IPMI - VLAN2 |
| nic2 | enp0s20f0u1u6 | USB ethernet, likely from the XClarity controller |
| nic3 | ens2f0 | LACP bond, guest/storage |
| nic4 | ens2f1 | LACP bond, guest/storage |
Server classB: (computeB)
| mapping | interface | purpose |
|---|---|---|
| nic1 | enp0s20f0u1u6 | USB ethernet, likely from the XClarity controller |
| nic2 | ens2f0 | Control Plane - VLAN1 native, IPMI - VLAN2 |
| nic3 | ens2f1 | LACP bond, guest/storage |
| nic4 | ens4f0 | LACP bond, guest/storage |
Set the interface name instead of the mapping in the network interface templates; this is to assist with the two different server types and with the LACP bond configuration, which can be unreliable without carrier signal on both ports.
Custom network interface templates are required for the following reasons in the University deployment.
- IPMI network interface (type VLAN) on the Controller nodes for Instance-HA fencing.
- Specifying the 25G Ethernet interfaces for the LACP bond to host the majority of the VLAN interfaces for the various Openstack networks.
- Two classes of server hardware, where the 'nic1, nicN' interface mappings are not consistent owing to different interface enumeration on the different server hardware.

Render and edit custom network interface templates for the os-net-config run at node deployment time; these will be included in the 'openstack overcloud deploy' command via the environment file custom-network-configuration.yaml.
# render all heat templates, you can cherry pick the custom-nics files that would usually be dynamically rendered on deployment
cd /usr/share/openstack-tripleo-heat-templates
./tools/process-templates.py -o /home/stack/openstack-tripleo-heat-templates-rendered -n /home/stack/templates/network_data.yaml -r /home/stack/templates/roles_data.yaml
# create custom nics directory, copy the rendered custom-nics config files into place
# we used the 'single-nic-vlans' template in the LAB; the 'bond-with-vlans' template is the basis for University
mkdir /home/stack/templates/custom-nics ;\
cp /home/stack/openstack-tripleo-heat-templates-rendered/network/config/bond-with-vlans/controller.yaml /home/stack/templates/custom-nics/ ;\
cp /home/stack/openstack-tripleo-heat-templates-rendered/network/config/bond-with-vlans/networker.yaml /home/stack/templates/custom-nics/ ;\
cp /home/stack/openstack-tripleo-heat-templates-rendered/network/config/bond-with-vlans/computea.yaml /home/stack/templates/custom-nics/computeA.yaml ;\
cp /home/stack/openstack-tripleo-heat-templates-rendered/network/config/bond-with-vlans/computeb.yaml /home/stack/templates/custom-nics/computeB.yaml
# check that the controller custom network interface config includes the new IPMI service network in the controller.yaml
# remove the IPMI VLAN interface from the ovs_bridge and put directly under the single 1G network interface used for the control plane traffic
nano -cw /home/stack/templates/custom-nics/controller.yaml
- type: interface
  #name: nic1
  name: eno1
  mtu:
    get_param: ControlPlaneMtu
  use_dhcp: false
  addresses:
    - ip_netmask:
        list_join:
          - /
          - - get_param: ControlPlaneIp
            - get_param: ControlPlaneSubnetCidr
  routes:
    list_concat_unique:
      - get_param: ControlPlaneStaticRoutes
- type: vlan
  mtu:
    get_param: IpmiNetworkMtu
  vlan_id:
    get_param: IpmiNetworkNetworkVlanID
  #device: nic1
  device: eno1
  addresses:
    - ip_netmask:
        get_param: IpmiNetworkIpSubnet
  routes:
    list_concat_unique:
      - get_param: IpmiNetworkInterfaceRoutes
# set the interface name scheme and LACP bond options for 'controller', 'networker' and 'computeA'
# eno1 is a single physical interface with an IP on the native/untagged VLAN1 for control plane traffic
# ens2f0/1 are in an ovs bond (LACP) attached to an ovs bridge (br-ex once named by the installer process)
# bond options don't seem to be set correctly via the parameters section of the template (BondInterfaceOvsOptions), instead set them directly under 'ovs_options:'
nano -cw /home/stack/templates/custom-nics/controller.yaml
nano -cw /home/stack/templates/custom-nics/networker.yaml
nano -cw /home/stack/templates/custom-nics/computeA.yaml
- type: interface
  #name: nic1
  name: eno1
  mtu:
    get_param: ControlPlaneMtu
  use_dhcp: false
  addresses:
    - ip_netmask:
        list_join:
          - /
          - - get_param: ControlPlaneIp
            - get_param: ControlPlaneSubnetCidr
  routes:
    list_concat_unique:
      - get_param: ControlPlaneStaticRoutes
- type: ovs_bridge
  name: bridge_name
  dns_servers:
    get_param: DnsServers
  domain:
    get_param: DnsSearchDomains
  members:
    - type: ovs_bond
      name: bond1
      mtu:
        get_attr: [MinViableMtu, value]
      ovs_options:
        #get_param: BondInterfaceOvsOptions
        "bond_mode=balance-slb lacp=active other-config:lacp-fallback-ab=true other_config:lacp-time=fast other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100 other_config:bond_updelay=1000 other_config:bond-rebalance-interval=10000"
      members:
        - type: interface
          #name: nic3
          name: ens2f0
          mtu:
            get_attr: [MinViableMtu, value]
          primary: true
        - type: interface
          #name: nic4
          name: ens2f1
          mtu:
            get_attr: [MinViableMtu, value]
# set the interface name scheme and LACP bond options for 'computeB'
# ens4f0 is a single physical interface with an IP on the native/untagged VLAN1 for control plane traffic
# ens2f0/1 are in an ovs bond (LACP) attached to an ovs bridge (br-ex once named by the installer process)
# bond options don't seem to be set correctly via the parameters section of the template (BondInterfaceOvsOptions), instead set them directly under 'ovs_options:'
nano -cw /home/stack/templates/custom-nics/computeB.yaml
- type: interface
  #name: nic4
  name: ens4f0
  mtu:
    get_param: ControlPlaneMtu
  use_dhcp: false
  addresses:
    - ip_netmask:
        list_join:
          - /
          - - get_param: ControlPlaneIp
            - get_param: ControlPlaneSubnetCidr
  routes:
    list_concat_unique:
      - get_param: ControlPlaneStaticRoutes
- type: ovs_bridge
  name: bridge_name
  dns_servers:
    get_param: DnsServers
  domain:
    get_param: DnsSearchDomains
  members:
    - type: ovs_bond
      name: bond1
      mtu:
        get_attr: [MinViableMtu, value]
      ovs_options:
        #get_param: BondInterfaceOvsOptions
        "bond_mode=balance-slb lacp=active other-config:lacp-fallback-ab=true other_config:lacp-time=fast other_config:bond-detect-mode=miimon other_config:bond-miimon-interval=100 other_config:bond_updelay=1000 other_config:bond-rebalance-interval=10000"
      members:
        - type: interface
          #name: nic2
          name: ens2f0
          mtu:
            get_attr: [MinViableMtu, value]
          primary: true
        - type: interface
          #name: nic3
          name: ens2f1
          mtu:
            get_attr: [MinViableMtu, value]
# set the path for the os-net-config script that gets pushed to the nodes and subsequently provisions the network config
nano -cw /home/stack/templates/custom-nics/controller.yaml
nano -cw /home/stack/templates/custom-nics/networker.yaml
nano -cw /home/stack/templates/custom-nics/computeA.yaml
nano -cw /home/stack/templates/custom-nics/computeB.yaml
OsNetConfigImpl:
  type: OS::Heat::SoftwareConfig
  properties:
    group: script
    config:
      str_replace:
        template:
          #get_file: ../../scripts/run-os-net-config.sh
          get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
# create an environment file referencing the custom nics config files for inclusion
nano -cw /home/stack/templates/custom-network-configuration.yaml
resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/custom-nics/controller.yaml
  OS::TripleO::Networker::Net::SoftwareConfig: /home/stack/templates/custom-nics/networker.yaml
  OS::TripleO::ComputeA::Net::SoftwareConfig: /home/stack/templates/custom-nics/computeA.yaml
  OS::TripleO::ComputeB::Net::SoftwareConfig: /home/stack/templates/custom-nics/computeB.yaml
# deployment with new environment file
#
# new environment file to include in the deployment command
#-e /home/stack/templates/custom-network-configuration.yaml
#
# omit the environment file referencing the dynamically created network interface configs
#-e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
All functional config files to this point
.
├── containers-prepare-parameter.yaml
├── instackenv.json
├── templates
│ ├── custom-network-configuration.yaml
│ ├── custom-nics
│ │ ├── computeA.yaml
│ │ ├── computeB.yaml
│ │ ├── controller.yaml
│ │ └── networker.yaml
│ ├── network_data.yaml
│ ├── predictive_ips.yaml
│ ├── roles_data.yaml
│ ├── scheduler_hints_env.yaml
│ ├── userdata_root_password.yaml
│ └── vips.yaml
└── undercloud.conf
Deployment command to this point
# ensure you are in the stack home directory
cd ~/
source ~/stackrc
time openstack overcloud deploy --templates \
--networks-file /home/stack/templates/network_data.yaml \
-e /home/stack/templates/scheduler_hints_env.yaml \
-e /home/stack/templates/predictive_ips.yaml \
-e /home/stack/templates/vips.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /home/stack/templates/custom-network-configuration.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/userdata_root_password.yaml \
--roles-file /home/stack/templates/roles_data.yaml
# if a new stack fails to deploy (not due to configuration issue) the deployment command can be run again to finish off the provision
# check deployment completed
openstack overcloud status
# remove failed deployment
openstack stack list
openstack stack delete overcloud
# check all nodes are back to 'available' state before trying another deployment
openstack baremetal node list
# if not updating the stack but deploying a 'failing' fresh stack, you may need to tidy up:
# - remove the '~/overcloudrc' file
# - overcloud node entries from the /etc/hosts file '# START_HOST_ENTRIES_FOR_STACK: overcloud'
# - do not remove the undercloud host entries '# START_HOST_ENTRIES_FOR_STACK: undercloud'
Deployment problems
Logging
- Logging in TripleO OpenStack is not very clear; undercloud and overcloud deployment logging is hit and miss.
- Deployment failures with RHOSP are generally configuration issues or roles based; with open source TripleO the heat templates may be broken or containers may be failing QA.
- Once you have a running overcloud you often need to start testing services and creating client networks; the logs for the services are generated inside podman containers and exported to the host with the `k8s-file` log driver.
- Container logs can be found @ `/var/log/containers/<service>`; run `podman ps` to determine the name of the container for the service.
- Most services will run on the controller nodes; as these are pacemaker controlled (VIP service API endpoint) it is often the case that the services only run on one controller at a time, so you may have to search all 3 controllers to find the active/realtime log for any given service. (See the example after this list.)
- There are heat template parameters to extend the logging of core services and enable debug; the parameters file can be included in the deploy command with an environment configuration file `-e /home/stack/templates/debug.yaml`.
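For example, to find and follow the active Cinder API log on a controller (container and log names are illustrative, confirm them with podman ps):

ssh heat-admin@overcloud-controller-0
sudo podman ps --format '{{.Names}}' | grep -i cinder
sudo tail -f /var/log/containers/cinder/cinder-api.log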
Enabling debug
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/advanced_overcloud_customization/chap-debug_modes https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/overcloud_parameters/debug-parameters
Create the heat template environment (parameters) file.
Note we are enabling debug on only one service; debug on all services can be enabled, and although log files will be much larger, container logs do get log-rotated and compressed.
# set all of the following to 'False', 'CinderDebug: True' is for illustration
nano -cw /home/stack/templates/debug.yaml
parameter_defaults:
  # Enable debugging on all services
  Debug: False
  # Run configuration management (e.g. Puppet) in debug mode
  ConfigDebug: False
  # Enable debug on individual services
  BarbicanDebug: False
  CinderDebug: True
  GlanceDebug: False
  HeatDebug: False
  HorizonDebug: False
  IronicDebug: False
  KeystoneDebug: False
  ManilaDebug: False
  NeutronDebug: False
  NovaDebug: False
  SaharaDebug: False
It is prudent to include the debug environment file with all debug set to 'False' in the deployment command, to aid any future update that may require debug enabling. Debug can be applied during the initial deployment OR as an update. When updating an existing overcloud you must run the EXACT same deployment command as before with the addition of this file.
Updating configuration
As mentioned when enabling debug, you can update the configuration without a redeployment; you must run the EXACT same deployment command, and the overcloud stack will then show an updated status (openstack stack list).
The caveat with this is that physical network changes (physical->logical service network mapping) will not apply in many cases. There are instructions to add tags to parameters to force redeployment [UPDATE], [CREATE] of the network bridges and configuration, but you are taking a risk at this point and should have tested identical hardware in the old->new configuration states before running these on a production system.
Network changes are not recommended on a production customer system, try and steer the action towards a redeployment to minimise outages.
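After re-running the identical deployment command, the update can be confirmed from the undercloud:

source ~/stackrc
# expect the overcloud stack to report UPDATE_COMPLETE
openstack stack list
openstack overcloud status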
Ceph config
Openstack configuration for (external) Ceph RBD storage
The deployment command requires some additional heat templates; these set overrides for the various storage backends.
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml \
-e /home/stack/templates/ceph-config.yaml \
Install the ceph-ansible package on the undercloud/director node.
# The correct version has already been installed in the 'Undercloud Deployment' document
#sudo dnf install -y ceph-ansible
Create a custom environment file ceph-config.yaml and provide the parameters unique to your external Ceph cluster.
- Find the value for 'CephClusterFSID' from the command `ceph status`.
- The 'openstack' user (with capabilities) has already been created; use the command `ceph auth get client.openstack` to find the 'CephClientKey'.
- The 'CephExternalMonHost' host IPs are the cluster 'public' network IPs for each Ceph cluster node. (A sketch of gathering all three values follows this list.)
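A sketch of gathering the three values on one of the Ceph monitor nodes (output formats vary slightly between Ceph releases):

ceph status | grep 'id:'          # CephClusterFSID
ceph auth get client.openstack    # CephClientKey is on the 'key =' line
ceph mon dump | grep 'mon\.'      # monitor public IPs for CephExternalMonHost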
nano -cw /home/stack/templates/ceph-config.yaml
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderEnableNfsBackend: false
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  CinderRbdPoolName: 'volumes'
  NovaRbdPoolName: 'vms'
  GlanceRbdPoolName: 'images'
  CinderBackupRbdPoolName: 'backups'
  GnocchiRbdPoolName: 'metrics'
  CephClusterFSID: 5b99e574-4577-11ed-b70e-e43d1a63e590
  CephExternalMonHost: 10.122.10.7,10.122.10.8,10.122.10.9
  CephClientKey: 'AQCC5z5jtOmJARAAiFaC2HB4f2pBYfMKWzkkkQ=='
  CephClientUserName: 'openstack'
  ExtraConfig:
    ceph::profile::params::rbd_default_features: '1'
Openstack deployment command with Ceph RBD
source ~/stackrc
time openstack overcloud deploy --templates \
--networks-file /home/stack/templates/network_data.yaml \
-e /home/stack/templates/scheduler_hints_env.yaml \
-e /home/stack/templates/predictive_ips.yaml \
-e /home/stack/templates/vips.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /home/stack/templates/custom-network-configuration.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/userdata_root_password.yaml \
-e /home/stack/templates/debug.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml \
-e /home/stack/templates/ceph-config.yaml \
--roles-file /home/stack/templates/roles_data.yaml
# check deployment completed
openstack overcloud status
VM Instance HA fencing configuration
All configuration to this point has included changes to support instance-ha.
- `templates/network_data.yaml` = additional IPMI network defined
- `templates/roles_data.yaml` = includes the Instance-HA roles for ComputeA / ComputeB
- `templates/predictive_ips.yaml` = includes the IPMI range for the controllers
- `templates/custom-nics/controller.yaml` = has the VLAN interface for IPMI

To make the functionality active, additional environment files must be included in the deployment command to ensure the 'watcher/fencing/migration' processes start.
Create the VM instance HA fencing configuration file
- The parameter `tripleo::instanceha::no_shared_storage` must be set to 'true' if local controller backend storage is used, such as controllers presenting (non-shared-storage) LVM based disks over iSCSI to compute nodes. The LAB is using Ceph so set it to 'false'. The documentation is fairly confusing; the parameter is set 'true' by default.
- This config actually references the controller, networker and compute nodes; documentation states that all nodes should be added even if not used in fencing (controller/networker nodes do not participate in Instance HA).

The tripleo::instanceha::no_shared_storage is a seemingly simple parameter but can cause a lot of hassle whilst trying to debug failing HA; digging through the puppet module you will find the default value to be 'true'.
A Ceph RBD backend (much like NFS) is considered shared storage. Cinder is configured for the Ceph back end; the documentation is a little confusing and you may incorrectly consider Cinder a non-shared resource with a shared back end. For Ceph, explicitly set the heat template parameter tripleo::instanceha::no_shared_storage: false.
Beware the following confusing statement from Redhat (Ceph IS a shared storage backend for Cinder):
However, if all your instances are configured to boot from an OpenStack Block Storage (cinder) volume, you do not need to configure shared storage for the disk image of the instances, and you can evacuate all instances using the no-shared-storage option.
The following link shows how the migration works. Essentially a pacemaker resource configuration runs a script that loops checking for compute nodes with a non-responsive libvirt daemon; this script is configured with parameters such as --no_shared_storage=true which are used in the messages/commands issued to the nova API endpoint on the control nodes. When a non-responsive libvirt daemon is detected, a call is made to determine which VM instances reside on the broken hypervisor, another call is made to determine which other hypervisors have capacity for each VM instance, then a nova evacuate <VM Instance> <other hypervisor> command is issued for each VM instance.
To quickly ascertain if you have a shared storage issue run openstack server show test-failover -f json and look for the error '[Error: Invalid state of instance files on shared storage]'; you may also find this on the controller with the external API endpoint IP (192.168.101.190 in the LAB) when checking the nova container logs /var/log/containers/<all the various nova/scheduler container logs>.
RHOSP helper script method
RHOSP includes a nice script to build the fencing configuration directly from the instackenv.json, this seems to include more fields than listed in the documentation and has proven to be the working configuration.
cd
source ~/stackrc
openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output /home/stack/templates/fencing.yaml /home/stack/instackenv.json
# add the no_shared_storage parameter under the parameters statement.
nano -cw /home/stack/templates/fencing.yaml
parameter_defaults:
  ExtraConfig:
    tripleo::instanceha::no_shared_storage: false
  EnableFencing: true
  FencingConfig:
    devices:
      - agent: fence_ipmilan
        host_mac: 38:68:dd:4a:42:48
        params:
          ipaddr: 10.122.1.10
          lanplus: true
          login: USERID
          passwd: Password0
          privlvl: administrator
      - agent: fence_ipmilan
        host_mac: 38:68:dd:4a:55:90
        params:
          ipaddr: 10.122.1.11
          lanplus: true
          login: USERID
          passwd: Password0
          privlvl: administrator
...
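A quick sanity check that the generated file holds one fencing device per overcloud node:

grep -c 'agent: fence_ipmilan' /home/stack/templates/fencing.yaml
openstack baremetal node list -f value -c UUID | wc -l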
Deployment command
Additional environment files to be included:
- INCLUDE `-e /home/stack/templates/fencing.yaml`.
- INCLUDE `-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml`.
- INCLUDE `-e /home/stack/templates/custom-network-configuration.yaml`; this includes the IPMI VLAN network interface for the controller nodes and is already part of the current deployment command above.
source ~/stackrc
time openstack overcloud deploy --templates \
--networks-file /home/stack/templates/network_data.yaml \
-e /home/stack/templates/scheduler_hints_env.yaml \
-e /home/stack/templates/predictive_ips.yaml \
-e /home/stack/templates/vips.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /home/stack/templates/custom-network-configuration.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/userdata_root_password.yaml \
-e /home/stack/templates/debug.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml \
-e /home/stack/templates/ceph-config.yaml \
-e /home/stack/templates/fencing.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml \
--roles-file /home/stack/templates/roles_data.yaml
# check deployment completed
openstack overcloud status
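Once the deployment completes, the fencing and instance-HA pacemaker resources can be inspected from any controller (hostnames are illustrative):

ssh heat-admin@overcloud-controller-0
sudo pcs status
sudo pcs stonith status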
TLS endpoint (Dashboard/API) configuration
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/advanced_overcloud_customization/index#sect-Enabling_SSLTLS_on_the_Overcloud https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ssl.html
The general steps for TLS cert creation follow:
- Create a certificate authority. (This is a basic internal certificate with the CA signing 'usage'.)
- Create a certificate/key combination for the External endpoint. (This will technically be a SAN certificate; it will validate the DNS FQDN and IP of the endpoint, thus it will include a common_name (CN) and subject alt_names (SAN).)
- The common name will be whatever you want to include in the estate wide DNS server, e.g. `stack.university.ac.uk`; this maps to the parameter `CloudName:` in the `/home/stack/templates/custom-domain.yaml` template.
- The alt_name is the IP (you could have N entries for more IPs, or FQDNs if you required some further integration/legacy reasons) listed in the `templates/predictive_ips.yaml` entry `PublicVirtualFixedIPs`.
- Create a certificate signing request, then sign the certificate with the CA that has been created.
Set the DNS and overcloud name attributes
The University external domain is 'university.ac.uk'.
NOTE: If HostnameMap is used in the `/home/stack/templates/scheduler_hints_env.yaml` configuration, ensure any override of the node names fits the following endpoint hostname scheme.
The CloudName: stack.university.ac.uk key relates to the public VIP pointing to the external API endpoint (PublicVirtualFixedIPs); if using estate wide DNS (i.e. laptops need to get to the overcloud console) an A record for this IP/name combo should be set.
dig stack.university.ac.uk @144.173.6.71
;; ANSWER SECTION:
stack.university.ac.uk. 86400 IN A 10.121.4.14
# Preferably a PTR record should be in place; University do not have this set
dig -x 10.121.4.14 @144.173.6.71
Create the override for the endpoint naming scheme:
cp /usr/share/openstack-tripleo-heat-templates/environments/predictable-placement/custom-domain.yaml /home/stack/templates/
nano -cw /home/stack/templates/custom-domain.yaml
parameter_defaults:
  # The DNS domain used for the hosts. This must match the overcloud_domain_name configured on the undercloud.
  CloudDomain: university.ac.uk
  # The DNS name of this cloud. E.g. ci-overcloud.tripleo.org
  CloudName: stack.university.ac.uk
  # The DNS name of this cloud's provisioning network endpoint. E.g. 'ci-overcloud.ctlplane.tripleo.org'.
  CloudNameCtlplane: stack.ctlplane.university.ac.uk
  # The DNS name of this cloud's internal_api endpoint. E.g. 'ci-overcloud.internalapi.tripleo.org'.
  CloudNameInternal: stack.internalapi.university.ac.uk
  # The DNS name of this cloud's storage endpoint. E.g. 'ci-overcloud.storage.tripleo.org'.
  CloudNameStorage: stack.storage.university.ac.uk
  # The DNS name of this cloud's storage_mgmt endpoint. E.g. 'ci-overcloud.storagemgmt.tripleo.org'.
  CloudNameStorageManagement: stack.storagemgmt.university.ac.uk
  DnsServers: ["144.173.6.71", "1.1.1.1"]
Create Certificate Authority
Rather than creating a self-signed cert we can create a CA to sign any generated certificates. This simplifies client validation of any certs signed by this CA; the CA cert can be imported onto Linux client machines @ /etc/pki/tls/certs/ca-bundle.crt or into the trust store on MS machines. Alternatively you could generate a CSR and submit it to a public/verified CA (via the University security department) to then receive a certificate for the cluster; the certificate may require building into a PEM format (with any passphrases removed) depending on what is returned, and the config files will likely need the full trust chain including intermediate CA certs.
Install cfssl
sudo curl -s -L -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
sudo curl -s -L -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
sudo curl -s -L -o /usr/local/bin/cfssl-certinfo https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
sudo chmod +x /usr/local/bin/cfssl*
Generate CA
cd
mkdir -p ~/CA/config
mkdir -p ~/CA/out
nano -cw ~/CA/config/ca-csr.json
{
  "CA": {
    "expiry": "87600h",
    "pathlen": 0
  },
  "CN": "University Openstack CA",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "GB",
      "O": "UOE",
      "OU": "Cloud",
      "L": "University",
      "ST": "England"
    }
  ]
}
cfssl gencert -initca ~/CA/config/ca-csr.json | cfssljson -bare ~/CA/out/ca -
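Inspect the generated CA certificate before using it to sign anything:

openssl x509 -in ~/CA/out/ca.pem -noout -subject -issuer -dates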
Generate the external Dashboard/API endpoint certificate.
Use the same cfssl-profile.json to configure a new certificate.
The undercloud host may already have the CA certificate imported if a Quay registry was setup as per the LAB setup.
cd ~/CA/config
# create a cfssl configuration profile with a 10 year expiry that allows for certificates with multiple usage
nano -cw cfssl-profile.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "server": {
        "usages": ["signing", "digital signing", "key encipherment", "server auth"],
        "expiry": "87600h"
      }
    }
  }
}
# create a certificate CSR profile for the overcloud.local endpoint
# Openstack is a bit picky around using both a CN (common name) and a SAN (subject alternative name), populate both the CN and 'hosts' entries
# use the VIP for PublicVirtualFixedIPs as part of the SAN
nano -cw overcloud-csr.json
{
  "CN": "stack.university.ac.uk",
  "hosts": [
    "stack.university.ac.uk",
    "10.121.4.14"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "GB",
      "L": "University",
      "ST": "England"
    }
  ]
}
# generate a certificate signed by the CA; there is no need to create a certificate chain with an intermediate certificate in this scenario
cfssl gencert -ca ../out/ca.pem -ca-key ../out/ca-key.pem -config ./cfssl-profile.json -profile=server ./overcloud-csr.json | cfssljson -bare ../out/overcloud
# you will get a warning '[WARNING] This certificate lacks a "hosts" field.' owing to the CA signing cert not being able to certify a website; this is not an issue
# check cert
cfssl-certinfo -cert ../out/overcloud.pem
{
"subject": {
"common_name": "stack.university.ac.uk",
"country": "GB",
"locality": "University",
"province": "England",
"names": [
"GB",
"England",
"University",
"stack.university.ac.uk"
]
},
"issuer": {
"common_name": "University Openstack CA",
"country": "GB",
"organization": "UOE",
"organizational_unit": "Cloud",
"locality": "University",
"province": "England",
"names": [
"GB",
"England",
"University",
"Cloud",
"University Openstack CA"
]
},
"serial_number": "63583601022960320621656457322685669356580690922",
"sans": [
"stack.university.ac.uk",
"10.121.4.14"
],
"not_before": "2022-07-14T11:19:00Z",
"not_after": "2032-07-11T11:19:00Z",
"sigalg": "SHA512WithRSA",
"authority_key_id": "58:5F:BC:63:BF:22:34:5C:D1:FE:3F:61:DF:7C:FC:E6:C8:34:2D:45",
"subject_key_id": "4D:75:7D:60:CE:11:9:46:7D:6E:69:1E:96:4D:4C:5A:92:36:D7:E3",
"pem": "-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----\n"
}
# check cert, query the pem directly with openssl toolchain
openssl x509 -in ../out/overcloud.pem -text -noout
Subject: C = GB, ST = England, L = University, CN = stack.university.ac.uk
X509v3 Subject Alternative Name:
DNS:stack.university.ac.uk, IP Address:10.121.4.14
# list generated certificate/key pair
ll ../out/
-rw-r--r--. 1 stack stack 1704 Jun 27 18:47 ca.csr
-rw-------. 1 stack stack 3243 Jun 27 18:47 ca-key.pem
-rw-rw-r--. 1 stack stack 2069 Jun 27 18:47 ca.pem
-rw-r--r--. 1 stack stack 1013 Jun 27 18:51 overcloud.csr
-rw-------. 1 stack stack 1679 Jun 27 18:51 overcloud-key.pem
-rw-rw-r--. 1 stack stack 1728 Jun 27 18:51 overcloud.pem
Configure the undercloud to be able to validate the PublicVirtualFixedIPs endpoint
Ensure the DNS server (144.173.6.71) has the following A record (preferably also with PTR record). stack.university.ac.uk -> 10.121.4.14
# add host entry where DNS is not available
# updating the deployment hard-codes this entry automatically
#echo "10.121.4.14 stack.university.ac.uk" >> /etc/hosts
Import the certificate authority to the undercloud. When deploying the overcloud the undercloud will check (at the end of deployment) that the public endpoint is up; if it cannot validate the SSL certificate the installer will fail (and you will not know whether just the endpoint failed validation or there were other deployment issues).
sudo cp /home/stack/CA/out/ca.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
trust list | grep label | wc -l
147
trust list | grep label | grep -i university
label: University Openstack CA
Set the config files for TLS
Openstack external API endpoint certificate configuration file.
cp /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml /home/stack/templates/
# edit /home/stack/templates/enable-tls.yaml
# insert the contents of /home/stack/CA/out/overcloud.pem to the SSLCertificate section entry (4 space indent)
# insert the content of the /home/stack/CA/out/overcloud-key.pem to the SSLKey section entry (4 space indent)
# set PublicTLSCAFile to the path of the CA cert /etc/pki/ca-trust/source/anchors/ca.pem
# ensure the DeployedSSLCertificatePath is set to /etc/pki/tls/private/overcloud_endpoint.pem, this will be populated/updated on deployment
nano -cw /home/stack/templates/enable-tls.yaml
parameter_defaults:
  HorizonSecureCookies: True
  PublicTLSCAFile: '/etc/pki/ca-trust/source/anchors/ca.pem'
  SSLCertificate: |
    -----BEGIN CERTIFICATE-----
    MIIE4zCCAsugAwIBAgIUZUNyk+eV4aidYikN21GRWbsndJ0wDQYJKoZIhvcNAQEN
    ................................................................
    N/FgTMHNQ4qylQCRwdchkBADyjIh+dC7mwnBEY4XLaMcCh3F0dgDdp/VZX0mk9UW
    jpJD93nbqA==
    -----END CERTIFICATE-----
  SSLIntermediateCertificate: ''
  SSLKey: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEA0JacewbcVu37MGpAopX9pRakBMp+6xFPUSDEWASFx50V6VJF
    ................................................................
    VcBZsDDVEvzWQIc7d3fkRxO+r/QeSIw8IJ6aPRS7xegAEMNwD8ZXzFjEXOdN/LsM
    oUgYstUl1OwL/uupELwFpR5LdtjRszd3BoprI5ZdW0WuYmGm+YPw
    -----END RSA PRIVATE KEY-----
  DeployedSSLCertificatePath: /etc/pki/tls/private/overcloud_endpoint.pem
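The certificate and key bodies must sit four spaces under the YAML block scalars; a quick way to produce correctly indented text for pasting (paths as generated above):

sed 's/^/    /' /home/stack/CA/out/overcloud.pem
sed 's/^/    /' /home/stack/CA/out/overcloud-key.pem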
Certificate authority configuration file.
cp /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml /home/stack/templates/
# edit /home/stack/templates/inject-trust-anchor-hiera.yaml
# insert the contents of /home/stack/CA/out/ca.pem under the CAMap key; multiple CAs can be added, in this case only a single CA is used
# the certificate key name under CAMap is arbitrary, by default these are named 'first-ca-name', 'second-ca-name' for ease (8 space indent)
nano -cw /home/stack/templates/inject-trust-anchor-hiera.yaml
parameter_defaults:
  CAMap:
    first-ca-name:
      content: |
        -----BEGIN CERTIFICATE-----
        MIIFzDCCA7SgAwIBAgIUXS9uFGJSbVPt1Tj0Oc82XwlmfQMwDQYJKoZIhvcNAQEN
        ................................................................
        FkExys4JyWK3bFz3KAzYKfNb/forqoXPVEtE+v+Io3Da8yf207VchE5iOdxgNJiH
        -----END CERTIFICATE-----
Configure the public API endpoint to accept inbound connections by IP or DNS
A DNS entry and a valid certificate (with SAN entries) will allow a browser to use the dashboard by FQDN 'https://stack.university.ac.uk:443/dashboard' or by IP 'https://10.121.4.14:443/dashboard'.
Include
- If you use a DNS name for accessing the public endpoints, use `/usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml`
- If you use only an IP address for accessing the public endpoints, use `/usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml`
A client connecting to the overcloud (such as a laptop) will need to have the CA certificate in the local trust store and make DNS requests to a server with an A record entry for stack.university.ac.uk.
To avoid this, use a University wide certificate authority (frequently all University machines will have their own CA certificate pushed via group policy) or use a public certificate authority whose CA/intermediate certificate chains are distributed with the OS/browser by default. For a client-side import see the sketch below.
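For a RHEL-family client the import mirrors the undercloud steps above (a sketch; the destination filename is arbitrary):

sudo cp ca.pem /etc/pki/ca-trust/source/anchors/university-openstack-ca.pem
sudo update-ca-trust extract
trust list | grep -i 'University Openstack CA'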
Deployment command to this point
Additional environment files to be included:
- INCLUDE -e /home/stack/templates/custom-domain.yaml.
- INCLUDE -e /home/stack/templates/enable-tls.yaml.
- INCLUDE -e /home/stack/templates/inject-trust-anchor-hiera.yaml.
- INCLUDE -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml.
time openstack overcloud deploy --templates \
--networks-file /home/stack/templates/network_data.yaml \
-e /home/stack/templates/scheduler_hints_env.yaml \
-e /home/stack/templates/predictive_ips.yaml \
-e /home/stack/templates/vips.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /home/stack/templates/custom-network-configuration.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/userdata_root_password.yaml \
-e /home/stack/templates/debug.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml \
-e /home/stack/templates/ceph-config.yaml \
-e /home/stack/templates/fencing.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml \
-e /home/stack/templates/custom-domain.yaml \
-e /home/stack/templates/enable-tls.yaml \
-e /home/stack/templates/inject-trust-anchor-hiera.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
--roles-file /home/stack/templates/roles_data.yaml
# check deployment completed
openstack overcloud status
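Once the stack update completes, confirm the public endpoint presents the new certificate; a quick check with openssl against the FQDN configured above:
# show the served certificate subject, issuer and validity dates
openssl s_client -connect stack.university.ac.uk:443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates
# a 'Verify return code: 0 (ok)' indicates the chain validates against the local trust store
openssl s_client -connect stack.university.ac.uk:443 </dev/null 2>/dev/null | grep 'Verify return code'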
LDAP
The University AD servers use SSL (not TLS) and present a certificate signed by a public CA whose root already exists in the trust store of a vanilla RedHat/CentOS installation; for this reason no certificates need to be imported onto the hosts running the keystone services (controllers).
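This can be confirmed from a controller (or the undercloud) by pulling the AD server certificate over LDAPS; a quick check with openssl:
# show the AD certificate issuer and confirm the chain verifies against the OS trust store
openssl s_client -connect secureprodad.university.ac.uk:636 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates
openssl s_client -connect secureprodad.university.ac.uk:636 </dev/null 2>/dev/null | grep 'Verify return code'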
Check University domain connectivity
# find all groups
ldapsearch -LLL -o ldif-wrap=no -x \
-w "Password0" \
-b "OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(objectClass=group)" \
cn distinguishedName name sAMAccountName objectClass
# find all members of the openstack group
ldapsearch -LLL -o ldif-wrap=no -x \
-w "Password0" \
-b "OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(&(objectClass=group)(cn=ISCA-Openstack-Users))" \
member
# number of openstack users
ldapsearch -LLL -o ldif-wrap=no -x \
-w "Password0" \
-b "OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(&(objectClass=group)(cn=ISCA-Openstack-Users))" \
member | grep -v ^dn: | wc -l
# find simon/ocf account (assuming there was an account created at project inception)
ldapsearch -LLL -o ldif-wrap=no -x \
-w "Password0" \
-b "OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(&(objectClass=group)(cn=ISCA-Openstack-Users))" \
member | grep -v ^dn: | sed 's/member: //' | sed '/^$/d' > search.txt
while read i; do
ldapsearch -LLL -o ldif-wrap=no -x -w "Password0" -b "$i" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(objectClass=user)" cn displayName mail uid
done < search.txt > search1.txt
grep -i simon search1.txt
grep -i ocf search1.txt
rm -f search*txt
# no account was created
Create config file
cp /usr/share/openstack-tripleo-heat-templates/environments/services/keystone_domain_specific_ldap_backend.yaml /home/stack/templates/
nano -cw /home/stack/templates/keystone_domain_specific_ldap_backend.yaml
parameter_defaults:
  KeystoneLDAPDomainEnable: true
  KeystoneLDAPBackendConfigs:
    ldap:
      # AD domain
      url: ldaps://secureprodad.university.ac.uk:636
      user: CN=svc_iscalookup,OU=Machine Accounts,OU=Service Accounts,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk
      password: Password0
      suffix: DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk
      # scope can be set to 'one' OR 'sub': one level down the tree, or the entire subtree
      # the University directory has many tiers and many thousands of objects, so defined tree locations for users and groups are required; without these targets objects are not returned (timeout) and performance is poor
      query_scope: sub
      # user lookup
      user_tree_dn: OU=People,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk
      user_filter: (memberOf=CN=ISCA-Openstack-Users,OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk)
      user_objectclass: person
      user_id_attribute: sAMAccountName
      #user_name_attribute: sAMAccountName
      user_name_attribute: cn
      user_mail_attribute: mail
      # derive the user enabled/disabled state from the AD userAccountControl bitmask (bit 2 = ACCOUNTDISABLE, 512 = normal enabled account)
      user_enabled_attribute: userAccountControl
      user_enabled_mask: 2
      user_enabled_default: 512
      # keystone attributes to ignore on create/update (tenant ~ project)
      # when a user is auto-created on login the keystone fields listed below are not populated, for example the password is provided/passed through by LDAP
      user_attribute_ignore: password,tenant_id,tenants
      # group lookup
      group_tree_dn: OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk
      group_objectclass: group
      group_id_attribute: sAMAccountName
      group_name_attribute: cn
      group_member_attribute: member
      group_desc_attribute: cn
      # The University LDAPS connection uses SSL (not TLS) with a public CA cert that already exists in the OS trust store
      use_tls: False
      tls_cacertfile: ""
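Before deploying, the user_tree_dn/user_filter combination can be dry-run with ldapsearch; a sketch mirroring the config above (same bind credentials as the earlier queries):
# replicate the keystone user lookup: same base DN, objectclass, filter and attributes
ldapsearch -LLL -o ldif-wrap=no -x \
-w "Password0" \
-b "OU=People,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk" \
-D "svc_iscalookup@university.ac.uk" \
-H "ldaps://secureprodad.university.ac.uk" \
"(&(objectClass=person)(memberOf=CN=ISCA-Openstack-Users,OU=ISCA-Groups,OU=HPC,OU=Member Servers,DC=isad,DC=isadroot,DC=university,DC=ac,DC=uk))" \
sAMAccountName cn mail userAccountControl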
All functional config files to this point
.
├── CA
│ ├── config
│ │ ├── ca-csr.json
│ │ ├── cfssl-profile.json
│ │ └── overcloud-csr.json
│ └── out
│ ├── ca.csr
│ ├── ca-key.pem
│ ├── ca.pem
│ ├── overcloud.csr
│ ├── overcloud-key.pem
│ └── overcloud.pem
├── containers-prepare-parameter.yaml
├── instackenv.json
├── templates
│ ├── ceph-config.yaml
│ ├── custom-domain.yaml
│ ├── custom-network-configuration.yaml
│ ├── custom-nics
│ │ ├── computeA.yaml
│ │ ├── computeB.yaml
│ │ ├── controller.yaml
│ │ └── networker.yaml
│ ├── debug.yaml
│ ├── enable-tls.yaml
│ ├── fencing.yaml
│ ├── inject-trust-anchor-hiera.yaml
│ ├── keystone_domain_specific_ldap_backend.yaml
│ ├── network_data.yaml
│ ├── predictive_ips.yaml
│ ├── roles_data.yaml
│ ├── scheduler_hints_env.yaml
│ ├── userdata_root_password.yaml
│ └── vips.yaml
└── undercloud.conf
Deployment command to this point
Additional environment files to be included:
- INCLUDE -e /home/stack/templates/keystone_domain_specific_ldap_backend.yaml.
- Wrap the command in a script; every time the deployment command changes, update the 'overcloud-deploy.sh' script so a record is kept.
# ensure you are the stack user in the $HOME directory to run the deployment command
cd
touch overcloud-deploy.sh
chmod +x overcloud-deploy.sh
nano -cw overcloud-deploy.sh
#!/bin/bash
source /home/stack/stackrc
time openstack overcloud deploy --templates \
--networks-file /home/stack/templates/network_data.yaml \
-e /home/stack/templates/scheduler_hints_env.yaml \
-e /home/stack/templates/predictive_ips.yaml \
-e /home/stack/templates/vips.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-environment.yaml \
-e /home/stack/templates/custom-network-configuration.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e /home/stack/templates/userdata_root_password.yaml \
-e /home/stack/templates/debug.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible-external.yaml \
-e /home/stack/templates/ceph-config.yaml \
-e /home/stack/templates/fencing.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml \
-e /home/stack/templates/custom-domain.yaml \
-e /home/stack/templates/enable-tls.yaml \
-e /home/stack/templates/inject-trust-anchor-hiera.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
-e /home/stack/templates/keystone_domain_specific_ldap_backend.yaml \
--roles-file /home/stack/templates/roles_data.yaml
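Run the script and keep a log of each run; a suggested invocation (the log filename is arbitrary):
# run the deployment and capture a timestamped log alongside the script
./overcloud-deploy.sh 2>&1 | tee /home/stack/overcloud-deploy-$(date +%Y%m%d-%H%M).log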
Check LDAP connectivity
# load environment/credentials for the CLI to interact with the Overcloud API
source ~/overcloudrc
# check for the creation of the 'ldap' domain (present in the domain list)
openstack domain list
+----------------------------------+------------+---------+--------------------+
| ID | Name | Enabled | Description |
+----------------------------------+------------+---------+--------------------+
| 70ddca13588744a9ab3af718abaf70dc | heat_stack | True | |
| acbc81b55cc242e198648230625fbc0b | ldap | True | |
| default | Default | True | The default domain |
+----------------------------------+------------+---------+--------------------+
# get ID of the 'ldap' openstack domain
openstack domain show ldap -f json | jq -r .id
acbc81b55cc242e198648230625fbc0b
# get ID of default admin user (this is in the default openstack domain)
openstack user list --domain default -f json | jq -r '.[] | select(."Name" == "admin") | .ID'
1050084350ed4b55b43a929d29e64ac1
# get the ID of the admin role
openstack role list -f json | jq -r '.[] | select(."Name" == "admin") | .ID'
db4388c489dd4afc97dbbfea0b1dd0ac
# setup 'admin' access to the Overcloud/Cluster
# bind the (default) Openstack admin user to the admin role for the 'ldap' domain
openstack role add --domain acbc81b55cc242e198648230625fbc0b --user 1050084350ed4b55b43a929d29e64ac1 db4388c489dd4afc97dbbfea0b1dd0ac
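Confirm the assignment took effect, using the same IDs as above:
# verify the admin user now holds the admin role on the 'ldap' domain
openstack role assignment list --user 1050084350ed4b55b43a929d29e64ac1 --domain acbc81b55cc242e198648230625fbc0b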
Query Keystone for AD users/groups:
openstack user list --domain ldap | head -n 10
+------------------------------------------------------------------+----------+
| ID | Name |
+------------------------------------------------------------------+----------+
| 06bb55f37d07e62a1309cfa5bf86feec8b0af5e1c28fb64e789629fb901b485b | ptfrost |
| 1c3629955a3d5d6e90005dea89aee86970a912ed95a6a0e6c6f6eabbdf0bfdec | mcw204 |
| f1c146517abe61f67b6e89c7ee2a7a31ea5958394f0bc5e0859e5e4ed51ea3c2 | snfieldi |
| 07eeb10cfe28a37677f5001d35ce18012e3594f1028265532a3496c67c5e9bd5 | jh288 |
| 278820da9a686e2102aff305c81dac330d41060d74020cef3d2c8437d3dd4a7c | rnb203 |
| 37d181f770cd61f090d377f536a67c99286ca66ba9157afb1b31940db71a46ff | kebrown |
| ae2ec9344c05939f3a28aafc27466e3197d78fa7936c0c1cc43a01e7974e46ba | arichard |
openstack user list --domain ldap | wc -l
1851
openstack user show ptfrost --domain ldap
+---------------------+------------------------------------------------------------------+
| Field | Value |
+---------------------+------------------------------------------------------------------+
| description | Staff |
| domain_id | c0543515d22f45f88a69008b5b884ebf |
| email | P.T.Frost@university.ac.uk |
| enabled | True |
| id | 06bb55f37d07e62a1309cfa5bf86feec8b0af5e1c28fb64e789629fb901b485b |
| name | ptfrost |
| options | {} |
| password_expires_at | None |
+---------------------+------------------------------------------------------------------+
openstack group list --domain ldap
+------------------------------------------------------------------+----------------------+
| ID | Name |
+------------------------------------------------------------------+----------------------+
| 90c99abaec2f579937a6a3be1d66e35de635c28391bfaf6656e92d305e4a2660 | ISCA-Openstack-Users |
| 52120554370b8678c1893dcf2b3033c7eae27f345acb3da5aff7f7a0f5e01861 | ISCA-Admins |
| 4dc6714669c04de77b2507488930ade9acf82ceb6f69098dc8b6c36e917b8a9d | ISCA-Users |
| ec75b5f9bdcebf6681f0bc83b52f2b02f26be22a46f432f50ea7f903b047168c | ISCA-module-stata |
+------------------------------------------------------------------+----------------------+
openstack group show ISCA-Openstack-Users --domain ldap
+-------------+------------------------------------------------------------------+
| Field | Value |
+-------------+------------------------------------------------------------------+
| description | ISCA-Openstack-Users |
| domain_id | c0543515d22f45f88a69008b5b884ebf |
| id | 90c99abaec2f579937a6a3be1d66e35de635c28391bfaf6656e92d305e4a2660 |
| name | ISCA-Openstack-Users |
+-------------+------------------------------------------------------------------+
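AD users cannot consume resources until they hold a role on a project; a sketch granting access at group level (the 'demo' project and 'member' role names are assumptions, substitute the real project and role):
# grant every member of the AD group the member role on an existing project (project 'demo' is an assumption)
openstack role add --group ISCA-Openstack-Users --group-domain ldap --project demo member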
Dashboard login
# get Openstack 'admin' user password
grep OS_PASSWORD ~/overcloudrc | awk -F "=" '{print $2}'
Password0
Browse to https://stack.university.ac.uk/dashboard.
- user: admin
- password: Password0
- domain: default (for AD login the domain is 'ldap')
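CLI authentication as an AD user can also be tested end to end; a sketch, assuming the 'demo' project role assignment from the earlier step (Default domain) and substituting a real AD username/password:
# authenticate against keystone as an AD user and request a token
source ~/overcloudrc
openstack --os-username ptfrost \
--os-user-domain-name ldap \
--os-password 'TheUsersADPassword' \
--os-project-name demo \
--os-project-domain-name Default \
token issue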