initial commit

main
tseed 2022-10-26 17:58:48 +01:00
commit 8adc0397da
12 changed files with 6199 additions and 0 deletions

# Access to university Openstack
Edit your local `~/.ssh/config` and include the following entries:
```
###### university
Host university-jump
    HostName 144.173.114.20
    ProxyJump nemesis
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root

Host university-proxmox
    HostName 10.121.4.5
    ProxyJump university-jump
    #PreferredAuthentications password
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root

Host university-proxmox-dashboard
    HostName 10.121.4.5
    ProxyJump university-jump
    #PreferredAuthentications password
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root
    DynamicForward 8888

Host university-undercloud
    HostName 10.121.4.25
    ProxyJump university-jump
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User stack
    ServerAliveInterval 100
    ServerAliveCountMax 2

Host university-ceph1
    HostName 10.121.4.7
    ProxyJump university-jump
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root

Host university-ceph2
    HostName 10.121.4.8
    ProxyJump university-jump
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root

Host university-ceph3
    HostName 10.121.4.9
    ProxyJump university-jump
    IdentityFile ~/.ssh/id_rsa
    Port 22
    User root
```
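With these entries in place the undercloud is reachable in a single command, and the `DynamicForward 8888` entry provides a local SOCKS proxy for the web dashboards. A quick usage sketch (the browser proxy setting of SOCKS5 localhost:8888 is an assumption, not configured anywhere above):
```sh
# hop via the jump host straight to the undercloud as the stack user
ssh university-undercloud

# hold open a SOCKS tunnel for the dashboards, then point the browser at
# SOCKS5 localhost:8888 and browse to e.g. https://10.121.4.5:8006/
ssh -N university-proxmox-dashboard
```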
# Logins
## Switches
| IP/Login | Password | Type | Notes |
| --- | --- | --- | --- |
| cumulus@10.122.0.250 | Password0 | 100G switch | 2x CLAG bond between 100G switches, 2x Peerlink CLAG across 100G switches to university Juniper core switches |
| cumulus@10.122.0.251 | Password0 | 100G switch | 2x CLAG bond between 100G switches, 2x Peerlink CLAG across 100G switches to university Juniper core switches |
| cumulus@10.122.0.252 | Password0 | 1G switch | 2x SFP+ 10G LAG bond between management switches, 1G ethernet uplink from each 100G switch for access |
| cumulus@10.122.0.253 | Password0 | 1G switch | 2x SFP+ 10G LAG bond between management switches |
## Node OOB (IPMI / XClarity web)
| IP | Login | Password |
| --- | --- | --- |
| 10.122.1.5(proxmox) 10.122.1.10-12(controller) 10.122.1.20-21(networker) 10.122.1.30-77(compute) 10.122.1.90-92(ceph) | USERID | Password0 |
## Node Operating System
| IP | Login | Password |
| --- | --- | --- |
| 10.121.4.5 (proxmox hypervisor) | root | Password0 |
| 10.121.4.25 (undercloud VM) | stack OR root | Password0 |
| 10.122.0.30-32(controller) 10.122.0.40-41(networker) 10.122.0.50-103(compute) | root OR heat-admin | Password0 |
## Dashboards
| Dashboard | IP / URL | Login | Password | Notes |
| --- | --- | --- | --- | --- |
| Proxmox | https://10.121.4.5:8006/ | root | Password0 | |
| Ceph | https://10.122.10.7:8443/ | admin | Password0 | 10.122.10.7,8,9 will redirect to live dashboard |
| Ceph Grafana | https://10.121.4.7:3000/ | | | many useful dashboards for capacity and throughput |
| Ceph Alertmanager | http://10.121.4.7:9093/ | | | check ceph alerts |
| Ceph Prometheus | http://10.121.4.7:9095/ | | | check that Prometheus is monitoring Ceph |
| Openstack Horizon | https://stack.university.ac.uk/dashboard | admin | Password0 | domain: default (for AD login the domain is 'ldap')<br>floating ip 10.121.4.14<br>find password on undercloud `grep OS_PASSWORD ~/overcloudrc \| awk -F "=" '{print $2}'` |
# Networking
![university_Network.drawio.png](university_Network.drawio.png)
## Openstack control networks
- These networks reside on the primary 1G ethernet adapter.
- The IPMI network is usually only used by the undercloud; however, to facilitate IPMI fencing for Instance-HA, the Openstack controller nodes also have a logical interface on it.
| Network | VLAN | IP Range |
| --- | --- | --- |
| ControlPlane | 1 Native | 10.122.0.0/24 |
| IPMI | 2 | 10.122.1.0/24 |
## Openstack service networks
- The logical networks reside on an OVS bridge over an LACP bond across the 2x Mellanox 25G ethernet adapters in each node.
- The 2x Mellanox 25G ethernet adapters are cabled to 100G switch1 and 100G switch2 respectively; the switches present the LACP bond as one logical entity across both switches with a CLAG (see the checks after the table below).
| Network | VLAN | IP Range |
| --- | --- | --- |
| Storage Mgmt | 14 | 10.122.12.0/24 |
| Storage | 13 | 10.122.10.0/24 |
| InternalApi | 12 | 10.122.6.0/24 |
| Tenant | 11 | 10.122.8.0/24 |
| External | 1214 | 10.121.4.0/24 Gateway 10.121.4.1 |
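Bond health can be checked from the hosts themselves; a minimal sketch (whether a given node carries a kernel bond or an OVS bond depends on its network template, so both checks are shown):
```sh
# kernel bond (e.g. bond0 on the Proxmox host): LACP partner and per-slave state
cat /proc/net/bonding/bond0

# OVS bond (overcloud nodes with an OVS bridge): bond and LACP state as seen by Open vSwitch
ovs-appctl bond/show
ovs-appctl lacp/show
```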
## Ceph service networks
Use Openstack "Storage Mgmt" for the Ceph public network.
| Network | VLAN | IP Range |
| --- | --- | --- |
| Cluster Network | 15 | 10.122.14.0/24 |
| Public Network (Openstack Storage) | 13 | 10.122.10.0/24 |
| Management (Openstack Storage Mgmt) | 14 | 10.122.12.0/24 |

10) Outstanding issues.md
# Nodes
```
10.122.1.5 proxmox/undercloud
10.122.1.10 controller
10.122.1.11 controller
10.122.1.12 controller
10.122.1.20 networker
10.122.1.21 networker
10.122.1.30 compute SR630
10.122.1.31
10.122.1.32
10.122.1.33 faulty PSU
10.122.1.34 lost mellanox adapter
10.122.1.35
10.122.1.36
10.122.1.37 lost mellanox adapter
10.122.1.38
10.122.1.39
10.122.1.40
10.122.1.41
10.122.1.42
10.122.1.43
10.122.1.44
10.122.1.45
10.122.1.46
10.122.1.47
10.122.1.48
10.122.1.49
10.122.1.50
10.122.1.51
10.122.1.52
10.122.1.53
10.122.1.54 compute SR630v2 - expansion
10.122.1.55
10.122.1.56
10.122.1.57
10.122.1.58
10.122.1.59
10.122.1.60
10.122.1.61
10.122.1.62
10.122.1.63
10.122.1.64
10.122.1.65
10.122.1.66 faulty PSU
10.122.1.67
10.122.1.68
10.122.1.69
10.122.1.70
10.122.1.71
10.122.1.72
10.122.1.73
10.122.1.74
10.122.1.75
10.122.1.76
10.122.1.77
10.122.1.90 ceph1
10.122.1.91 ceph2
10.122.1.92 ceph3
```

2) Undercloud Deployment.md
# Proxmox installation
Proxmox hosts the undercloud node; this enables snapshots to assist in update/DR/rebuild scenarios, primarily allowing a point-in-time capture of working heat templates and containers.
> https://pve.proxmox.com/wiki/Installation
| setting | value |
| --- | --- |
| filesystem | xfs |
| swapsize | 8GB |
| maxroot | 50GB |
| country | United Kingdom |
| time zone | Europe/London |
| keyboard layout | United Kingdom |
| password | Password0 |
| email | user@university.ac.uk (this can be changed in the web console @ datacenter/users/root) |
| management interface | eno1 |
| hostname | pve.local |
| ip address | 10.122.0.5/24 |
| gateway | 10.122.0.1 (placeholder, there is no gateway on this range) |
| dns | 144.173.6.71 |
- Install from a standard version 7.2 ISO, using the settings listed above.
- Create a bridge on the 1G management interface; this is VLAN 1 native on the 'ctlplane' network with VLAN 2 tagged for IPMI traffic.
- Ensure the 25G interfaces are set up as an LACP bond, and create a bridge on the bond with the 'tenant', 'storage', 'internal-api' and 'external' VLANs tagged (the 'external' range has the default gateway).
- The Proxmox host has VLAN interfaces into each Openstack network for introspection/debug; nmap is installed.
```sh
cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback

iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
iface eno4 inet manual
iface enx3a68dd4a4c5f inet manual

auto ens2f0np0
iface ens2f0np0 inet manual

auto ens2f1np1
iface ens2f1np1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves ens2f0np0 ens2f1np1
    bond-miimon 100
    bond-mode 802.3ad

auto vmbr0
iface vmbr0 inet static
    address 10.122.0.5/24
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#vlan 1(native) 2 (tagged) ControlPlane

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#vlan 1(native) 11 12 13 1214 (tagged)

auto vlan2
iface vlan2 inet static
    address 10.122.1.5/24
    vlan-raw-device vmbr0
#IPMI

auto vlan13
iface vlan13 inet static
    address 10.122.10.5/24
    vlan-raw-device vmbr1
#Storage

auto vlan1214
iface vlan1214 inet static
    address 10.121.4.5/24
    gateway 10.121.4.1
    vlan-raw-device vmbr1
#External

auto vlan12
iface vlan12 inet static
    address 10.122.6.5/24
    vlan-raw-device vmbr1
#InternalApi

auto vlan11
iface vlan11 inet static
    address 10.122.8.5/24
    vlan-raw-device vmbr1
#Tenant
```
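Proxmox uses ifupdown2, so edits to `/etc/network/interfaces` can be applied without a reboot; a quick sketch:
```sh
# reload the interface configuration live (ifupdown2) and confirm the bridges/VLAN interfaces came up
ifreload -a
ip -br addr show | grep -E 'bond0|vmbr|vlan'
```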
Set up the no-subscription repository.
```sh
# comment/disable enterprise repo
nano -cw /etc/apt/sources.list.d/pve-enterprise.list
#deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise
# insert pve-no-subscription repo
nano -cw /etc/apt/sources.list
deb http://ftp.uk.debian.org/debian bullseye main contrib
deb http://ftp.uk.debian.org/debian bullseye-updates main contrib
# security updates
deb http://security.debian.org bullseye-security main contrib
# pve-no-subscription
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
# update
apt-get update
apt-get upgrade -y
reboot
```
Download some LXC containers.
- LXC is not used in production, but during the build LXC containers with network interfaces in all ranges (last octet suffix .6) were used to debug IP connectivity and switch configuration and to serve Linux boot images over NFS for XClarity.
```sh
pveam update
pveam available --section system
pveam download local almalinux-8-default_20210928_amd64.tar.xz
pveam download local rockylinux-8-default_20210929_amd64.tar.xz
pveam download local ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz
pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst
pveam list local
NAME SIZE
local:vztmpl/almalinux-8-default_20210928_amd64.tar.xz 109.08MB
local:vztmpl/rockylinux-8-default_20210929_amd64.tar.xz 107.34MB
local:vztmpl/ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz 203.54MB
local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst 123.81MB
```
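One of these templates can be turned into a throwaway debug container with a leg in a chosen VLAN; a minimal sketch (the VMID `106`, the `local-lvm` storage name and the hostname are assumptions, while the `.6` addresses follow the convention noted above):
```sh
# small Ubuntu 22.04 container with interfaces on the control plane (untagged)
# and the storage network (VLAN 13), last octet .6
pct create 106 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname netdebug \
  --memory 1024 --cores 1 \
  --storage local-lvm \
  --net0 name=eth0,bridge=vmbr0,ip=10.122.0.6/24 \
  --net1 name=eth1,bridge=vmbr1,tag=13,ip=10.122.10.6/24
pct start 106
pct enter 106
```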
# Undercloud VM instance
## Download RHEL 8.4 full DVD image
Select the RHEL 8.4 image and choose the full image rather than the boot image; this allows installation without registering the system during the installer. You may then attach the system to a license via the `subscription-manager` tool after the host is built.
## Install spec
- RHEL8 (RHEL 8.4 specifically)
- 1 socket, 16 core (must use HOST cpu type for nested virtualization)
- 24GB ram
- 100GB disk (/root 89GiB lvm, /boot 1024MiB, swap 10GiB lvm)
- ControlPlane network interface on vmbr0, no/native vlan, 10.122.0.25/24, ens18
- IPMI network interface on vmbr0, vlan2 (vlan assigned in proxmox not OS), 10.122.1.25/24, ens19
- External/Routable network interface on vmbr1, vlan 1214 (vlan assigned in proxmox not OS), 10.121.4.25/24, gateway 10.121.4.1, dns 144.173.6.71, 1.1.1.1, ens20
- ensure none of the network interfaces have the firewall enabled in proxmox or the OS (MAC spoofing will be required and should be allowed in the firewall if one is used)
- root:Password0
- undercloud.local
- minimal install with QEMU guest agents
- will require registering with redhat subscription service
## OCF partner subscription entitlement
Register for a partner product entitlement.
> https://partnercenter.redhat.com/NFRPageLayout
> Product: Red Hat OpenStack Platform, Standard Support (4 Sockets, NFR, Partner Only) - 25.0 Units
Once the customer has purchased the entitlement, this should be present in their own RedHat portal to consume on the production nodes.
## Register the undercloud node with the required software repositories
> [https://access.redhat.com/documentation/en-us/red\_hat\_openstack\_platform/16.2/html/director\_installation\_and\_usage/assembly_preparing-for-director-installation#enabling-repositories-for-the-undercloud](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_preparing-for-director-installation#enabling-repositories-for-the-undercloud)
Browse to:
> https://access.redhat.com/management/systems/create
Create a new system with the following attributes.
- Virtual System
- Name: university_test
- Architecture: x86_64
- Number of vCPUs: 16
- Red Hat Enterprise Linux Version: 8
- Create
Attach the following initial subscription: 'Red Hat Enterprise Linux, Self-Support (128 Sockets, NFR, Partner Only)'
Note the name and UUID of the system.
Register the system.
```sh
sudo su -
[root@undercloud ~]# subscription-manager register --name=university_test --consumerid=f870ae18-6664-4206-9a89-21f24f312866 --username=tseed@ocf.co.uk
Registering to: subscription.rhsm.redhat.com:443/subscription
Password:
The system has been registered with ID: a1b24b8a-933b-4ce8-8244-1a7e16ff51a3
The registered system name is: university_test
#[root@undercloud ~]# subscription-manager refresh
[root@undercloud ~]# subscription-manager list
+-------------------------------------------+
Installed Product Status
+-------------------------------------------+
Product Name: Red Hat Enterprise Linux for x86_64
Product ID: 479
Version: 8.4
Arch: x86_64
Status: Subscribed
Status Details:
Starts: 06/13/2022
Ends: 06/13/2023
[root@undercloud ~]# subscription-manager list
+-------------------------------------------+
Installed Product Status
+-------------------------------------------+
Product Name: Red Hat Enterprise Linux for x86_64
Product ID: 479
Version: 8.4
Arch: x86_64
Status: Subscribed
Status Details:
Starts: 06/13/2022
Ends: 06/13/2023
[root@undercloud ~]# subscription-manager identity
system identity: f870ae18-6664-4206-9a89-21f24f312866
name: university_test
org name: 4110881
org ID: 4110881
```
Add an entitlement to the license system.
```sh
# Check the entitlement/purchased-products portal
# you will find the SKU under a contract - this will help to identify the openstack entitlement if you have multiple
# find a suitable entitlement pool ID for Red Hat OpenStack Director Deployment Tools
subscription-manager list --available --all
subscription-manager list --available --all --matches="*OpenStack*"
Subscription Name: Red Hat OpenStack Platform, Standard Support (4 Sockets, NFR, Partner Only)
SKU: SER0505
Contract: 13256907
Pool ID: 8a82c68d812ba3c301815c6f842f5ecf
# attach to the entitlement pool ID
subscription-manager attach --pool=8a82c68d812ba3c301815c6f842f5ecf
Successfully attached a subscription for: Red Hat OpenStack Platform, Standard Support (4 Sockets, NFR, Partner Only)
1 local certificate has been deleted.
# set release version statically
subscription-manager release --set=8.4
```
Enable repositories, set version of container-tools, update packages.
```sh
subscription-manager repos --disable=* ;\
subscription-manager repos \
--enable=rhel-8-for-x86_64-baseos-eus-rpms \
--enable=rhel-8-for-x86_64-appstream-eus-rpms \
--enable=rhel-8-for-x86_64-highavailability-eus-rpms \
--enable=ansible-2.9-for-rhel-8-x86_64-rpms \
--enable=openstack-16.2-for-rhel-8-x86_64-rpms \
--enable=fast-datapath-for-rhel-8-x86_64-rpms ;\
dnf module disable -y container-tools:rhel8 ;\
dnf module enable -y container-tools:3.0 ;\
dnf update -y
reboot
```
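Before moving on it is worth confirming the repository and module state match what the installer expects; a quick check:
```sh
# only the intended repos should be enabled and container-tools should be pinned to 3.0
subscription-manager repos --list-enabled | grep "Repo ID"
dnf module list container-tools
```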
## Install Tripleo client
```sh
# install tripleoclient for install of the undercloud
dnf install -y python3-tripleoclient
# these packages are advised for the TLS everywhere functionality, probably not required for an external TLS endpoint but won't hurt
dnf install -y python3-ipalib python3-ipaclient krb5-devel python3-novajoin
```
Install the Ceph-Ansible packages. Even if you are not initially using Ceph it cannot hurt to have an undercloud capable of deploying Ceph; to use external Ceph (i.e. not deployed by tripleo) you will need the following package.
There are different packages for different versions of Ceph, which is especially relevant when using external Ceph.
> https://access.redhat.com/solutions/2045583
- Redhat Ceph 4.1 = Nautilus release
- Redhat Ceph 5.1 = Pacific release
```sh
subscription-manager repos | grep -i ceph
# Nautilus (default version in use with Tripleo deployed Ceph)
#subscription-manager repos --enable=rhceph-4-tools-for-rhel-8-x86_64-rpms
# Pacific (if you are using external Ceph from the opensource repos you will likely be using this version)
#dnf remove -y ceph-ansible
#subscription-manager repos --disable=rhceph-4-tools-for-rhel-8-x86_64-rpms
subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms
# install
dnf info ceph-ansible
dnf install -y ceph-ansible
```
# Configure and deploy the Tripleo undercloud
## Prepare host
Disable firewalld.
```sh
systemctl disable firewalld
systemctl stop firewalld
```
Create the stack user and sudoers entry, then push an ssh key. The sudoers entry is required by the tripleo installer.
```sh
groupadd -r -g 1001 stack && useradd -r -u 1001 -g 1001 -m -s /bin/bash stack
echo "%stack ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/stack
chmod 0440 /etc/sudoers.d/stack
passwd stack # password is Password0
exit
ssh-copy-id -i ~/.ssh/id_rsa.pub stack@university-new-undercloud
```
Local ssh config setup.
```sh
nano -cw ~/.ssh/config
Host undercloud
Hostname 10.121.4.25
User stack
IdentityFile ~/.ssh/id_rsa
```
Set the hostname, disable the firewall (leave SELinux enabled; RHOSP tripleo requires it) and install packages.
```sh
ssh undercloud
sudo su -
timedatectl set-timezone Europe/London
dnf install chrony nano -y
# replace the server/pool entries with a PHC (high precision clock device) entry to use the hypervisor's hardware clock (which in turn is synced from an online NTP pool); this should be the most accurate time source for a VM
# the LXC container running ntp (192.168.101.43) does actually use the hypervisor hardware clock; the LXC container and the VM should be on the same hypervisor if this is used
nano -cw /etc/chrony.conf
#server 192.168.101.43 iburst
#pool 2.centos.pool.ntp.org iburst
refclock PHC /dev/ptp0 poll 2
systemctl enable chronyd
echo ptp_kvm > /etc/modules-load.d/ptp_kvm.conf
# the undercloud installer should set the hostname based on the 'undercloud_hostname' entry in the undercloud.conf config file
# you can set it before deployment with the following, the Opensource tripleo documentation advises to allow the undercloud installer to set it
hostnamectl set-hostname undercloud.local
hostnamectl set-hostname --transient undercloud.local
# RHOSP hosts entry
nano -cw /etc/hosts
10.121.4.25 undercloud.local undercloud
reboot
hostname -A
hostname -s
# install some useful tools
sudo su -
dnf update -y
dnf install qemu-guest-agent nano tree lvm2 chrony telnet traceroute net-tools bind-utils python3 yum-utils mlocate ipmitool tmux wget -y
# need to shut down for the qemu-guest tools to function; ensure the VM profile on the hypervisor has guest agents enabled
shutdown -h now
```
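After the reboot you can confirm the PTP reference clock is actually in use; a quick check with chrony's own tools:
```sh
# the PHC refclock should show as the selected time source
chronyc sources -v
chronyc tracking
```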
## Build the undercloud config file
The first interface (enp6s18 on the proxmox VM instance) will be on the ControlPlane range.
- Controller nodes are in all networks but cannot install nmap; hosts in a range can be found with `for ip in 10.122.6.{1..254}; do ping -c 1 -t 1 $ip > /dev/null && echo "${ip} is up"; done`.
- Proxmox has interfaces in every network and nmap installed; `nmap -sn 10.122.6.0/24` can assist with debug.
| Node | IPMI VLAN2 | Ctrl_plane VLAN1 | External VLAN1214 | Internal_api VLAN12 | Storage VLAN13 | Tenant VLAN11 |
| --- | --- | --- | --- | --- | --- | --- |
| Proxmox | 10.122.1.54 (IPMI) (Proxmox interface 10.122.1.5) | 10.122.0.5 | 10.121.4.5 | 10.122.6.5 | 10.122.10.5 | 10.122.8.5 |
| Undercloud | 10.122.1.25 | 10.122.0.25-27 (br-ctlplane) | 10.121.4.25 (Undercloud VM) | NA | NA | NA |
| Temporary Storage Nodes | 10.122.1.55-57 | NA | 10.121.4.7-9 | NA | 10.122.10.7-9 | NA |
| Overcloud Controllers | 10.122.1.10-12 (Instance-HA 10.122.1.80-82) | 10.122.0.30-32 | 10.121.4.30-32 | 10.122.6.30-32 | 10.122.10.30-32 | 10.122.8.30-32 |
| Overcloud Networkers | 10.122.1.20-21 | 10.122.0.40-41 | NA (reserved 10.121.4.23-24) | 10.122.6.40-41 | NA | 10.122.8.40-41 |
| Overcloud Compute | 10.122.1.30-53/54,58-77 | 10.122.0.50-103 | NA | 10.122.6.50-103 | 10.122.10.50-103 | 10.122.8.50-103 |
```sh
sudo su - stack
nano -cw /home/stack/undercloud.conf
[DEFAULT]
certificate_generation_ca = local
clean_nodes = true
cleanup = true
container_cli = podman
container_images_file = containers-prepare-parameter.yaml
discovery_default_driver = ipmi
enable_ironic = true
enable_ironic_inspector = true
enable_nova = true
enabled_hardware_types = ipmi
generate_service_certificate = true
inspection_extras = true
inspection_interface = br-ctlplane
ipxe_enabled = true
ironic_default_network_interface = flat
ironic_enabled_network_interfaces = flat
local_interface = enp6s18
local_ip = 10.122.0.25/24
local_mtu = 1500
local_subnet = ctlplane-subnet
overcloud_domain_name = university.ac.uk
subnets = ctlplane-subnet
undercloud_admin_host = 10.122.0.27
undercloud_debug = true
undercloud_hostname = undercloud.local
undercloud_nameservers = 144.173.6.71,1.1.1.1
undercloud_ntp_servers = ntp.university.ac.uk,0.pool.ntp.org
undercloud_public_host = 10.122.0.26
[ctlplane-subnet]
cidr = 10.122.0.0/24
#dhcp_end = 10.122.0.140
#dhcp_start = 10.122.0.80
dhcp_end = 10.122.0.194
dhcp_start = 10.122.0.140
#dns_nameservers =
gateway = 10.122.0.25
#inspection_iprange = 10.122.0.141,10.122.0.201
inspection_iprange = 10.122.0.195,10.122.0.249
masquerade = true
```
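Before installing, it is worth sanity-checking the file against the VM; a small sketch:
```sh
# local_interface must exist on the undercloud VM and sit on the ctlplane range
ip -br addr show enp6s18
# dhcp_* and inspection_iprange must not overlap and must fall inside the ctlplane cidr
grep -E 'local_ip|cidr|dhcp_|inspection_iprange' ~/undercloud.conf
```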
## RHEL Tripleo container preparation
Generate the `/home/stack/containers-prepare-parameter.yaml` config file using the default method for a local registry on the undercloud.
```sh
sudo su - stack
openstack tripleo container image prepare default \
--local-push-destination \
--output-env-file containers-prepare-parameter.yaml
```
Add the API key used to download containers from the Red Hat public registry.
Red Hat requires containers to be pulled from its Quay-based registry using a valid API token (unique to your Red Hat account), so `containers-prepare-parameter.yaml` must be modified to include the API key.
The opensource tripleo documentation explains `containers-prepare-parameter.yaml` in more detail; for a quick deployment use the following instructions.
> https://access.redhat.com/RegistryAuthentication
Edit `containers-prepare-parameter.yaml` to include the Redhat Quay bearer token.
```sh
nano -cw /home/stack/containers-prepare-parameter.yaml
parameter_defaults:
ContainerImagePrepare:
- push_destination: true
set:
<....settings....>
tag_from_label: '{version}-{release}'
ContainerImageRegistryLogin: true
ContainerImageRegistryCredentials:
registry.redhat.io:
4110881|osp16-undercloud: long-bearer-token-here
```
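The service-account credentials can be verified before the installer tries to pull images; a quick check (the username is the service account shown above, and podman is assumed to be present from the container-tools module):
```sh
# confirm the token authenticates against the Red Hat registry
podman login registry.redhat.io --username '4110881|osp16-undercloud'
# paste the bearer token at the password prompt and expect "Login Succeeded!"
```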
## Deploy the undercloud
Shut down the Undercloud VM instance and take a snapshot in Proxmox, calling it 'pre\_undercloud\_deploy'.
```sh
openstack undercloud install --dry-run
time openstack undercloud install
#time openstack undercloud install --verbose # if there are failing tasks
##########################################################
The Undercloud has been successfully installed.
Useful files:
Password file is at /home/stack/undercloud-passwords.conf
The stackrc file is at ~/stackrc
Use these files to interact with OpenStack services, and
ensure they are secured.
##########################################################
real 31m11.191s
user 13m28.211s
sys 3m15.817s
```
> If you need to change any configuration in the undercloud.conf you can rerun the install over the top and the node **should** reconfigure itself (network changes likely necessitate redeployment; changing ipxe/inspection ranges seems to require redeployment of the VM).
```sh
# update undercloud configuration, forcing regeneration of passwords 'undercloud-passwords.conf'
openstack undercloud install --force-stack-update
```
## Output
- undercloud-passwords.conf - A list of all passwords for the director services.
- stackrc - A set of initialisation variables to help you access the director command line tools.
Load env vars specific to the undercloud for the openstack cli tool.
```sh
source ~/stackrc
```
Check the openstack undercloud endpoints; after a reboot always check the endpoints are up before performing actions.
```sh
openstack endpoint list
```

3) Overcloud Node Import.md
## Obtain images for overcloud nodes RHEL/RHOSP Tripleo
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_installing-director-on-the-undercloud#proc_single-cpu-architecture-overcloud-images_overcloud-images
Download images direct from Redhat and upload to undercloud swift API.
```sh
sudo su - stack
source ~/stackrc
sudo dnf install -y rhosp-director-images-ipa-x86_64 rhosp-director-images-x86_64
mkdir ~/images
cd ~/images
for i in /usr/share/rhosp-director-images/overcloud-full-latest-16.2.tar /usr/share/rhosp-director-images/ironic-python-agent-latest-16.2.tar; do tar -xvf $i; done
openstack overcloud image upload --image-path /home/stack/images/
openstack image list
ll /var/lib/ironic/httpboot # look for inspector ipxe config and the kernel and initramfs files
```
## Import bare metal nodes
### Build node definition list
This is commonly referred to as the `instackenv.json` file; Redhat references it as the node definition template `nodes.json`.
> the schema reference for this file:
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/environments/baremetal.html#instackenv
Gather all IP addresses for the IPMI interfaces.
- `.[].ports.address` is the MAC address for iPXE boot, typically eth0.
- `.[].pm_addr` is the IP address of the IPMI adapter.
- If the IPMI interface is shared with the eth0 control plane interface the MAC address will be used for iPXE boot.
- If the IPMI interface and eth0 interface are not shared (have different MAC address) you may have a tedious task ahead of you searching through the XClarity out of band adapters or looking through the switch MAC table and then correlating the switch port to the node to enumerate the MAC address.
- University nodes do share a single interface for IPMI and iPXE but the MAC addresses are different.
```sh
# METHOD 1 - works where the IPMI and PXE interfaces share the same MAC address; this is NOT the case
# for the University SR630 servers (the Lenovo SR630 OCP network adapter bridges the XClarity/IPMI port but presents a different MAC), so this method will not work for them
# Scan the IPMI port of all hosts.
sudo dnf install nmap -y
nmap -p 623 10.122.1.0/24
# Query the arp table to return the MAC addresses of the IPMI(thus PXE) interfaces.
ip neigh show dev enp6s19
# controller 10-12, 20-21 networker, 30-77 compute, (54 temporary proxmox, 55-57 temporary storage nodes - remove from compute range)
#ipmitool -N 1 -R 0 -I lanplus -H 10.122.1.10 -U USERID -P Password0 lan print
for i in {10..80}; do j=10.122.1.$i ; ip --json neigh show dev enp6s19 | jq -r " .[] | select(.dst==\"$j\") | \"\(.dst) \(.lladdr)\""; done | grep -v null
10.122.1.10 38:68:dd:4a:56:3c
10.122.1.11 38:68:dd:4a:55:94
10.122.1.12 38:68:dd:4a:42:4c
10.122.1.20 38:68:dd:4a:4a:34
10.122.1.21 38:68:dd:4a:52:1c
10.122.1.30 38:68:dd:4c:17:ec
10.122.1.31 38:68:dd:4c:17:b4
10.122.1.32 38:68:dd:4d:1e:84
10.122.1.33 38:68:dd:4d:0f:f4
10.122.1.34 38:68:dd:4d:26:ac
10.122.1.35 38:68:dd:4d:1b:f4
10.122.1.36 38:68:dd:4a:46:4c
10.122.1.37 38:68:dd:4d:16:7c
10.122.1.38 38:68:dd:4d:15:8c
10.122.1.39 38:68:dd:4d:1a:4c
10.122.1.40 38:68:dd:4a:75:94
10.122.1.41 38:68:dd:4d:1c:fc
10.122.1.42 38:68:dd:4d:19:0c
10.122.1.43 38:68:dd:4a:43:ec
10.122.1.44 38:68:dd:4a:41:4c
10.122.1.45 38:68:dd:4d:14:24
10.122.1.46 38:68:dd:4d:18:c4
10.122.1.47 38:68:dd:4d:18:cc
10.122.1.48 38:68:dd:4a:41:8c
10.122.1.49 38:68:dd:4c:17:8c
10.122.1.50 38:68:dd:4c:17:2c
10.122.1.51 38:68:dd:4d:1d:cc
10.122.1.52 38:68:dd:4c:17:e4
10.122.1.53 38:68:dd:4c:17:5c
10.122.1.54 38:68:dd:70:a8:e8
10.122.1.55 38:68:dd:70:a0:84
10.122.1.56 38:68:dd:70:a4:cc
10.122.1.57 38:68:dd:70:aa:cc
10.122.1.58 38:68:dd:70:a8:88
10.122.1.59 38:68:dd:70:a5:bc
10.122.1.60 38:68:dd:70:a5:54
10.122.1.61 38:68:dd:70:a2:e0
10.122.1.62 38:68:dd:70:a2:b8
10.122.1.63 38:68:dd:70:a7:10
10.122.1.64 38:68:dd:70:a2:0c
10.122.1.65 38:68:dd:70:9f:38
10.122.1.66 38:68:dd:70:a8:74
10.122.1.67 38:68:dd:70:a2:ac
10.122.1.68 38:68:dd:70:a5:18
10.122.1.69 38:68:dd:70:a7:88
10.122.1.70 38:68:dd:70:a4:d8
10.122.1.71 38:68:dd:70:a6:b0
10.122.1.72 38:68:dd:70:aa:c4
10.122.1.73 38:68:dd:70:9e:e0
10.122.1.74 38:68:dd:70:a3:40
10.122.1.75 38:68:dd:70:a2:08
10.122.1.76 38:68:dd:70:a4:a0
10.122.1.77 38:68:dd:70:a1:6c
# METHOD 2 - used for University SR630 servers
# where the IPMI interface and eth0 interface are not shared (or have different MAC addresses)
## install XClarity CLI
mkdir onecli
cd onecli
curl -o lnvgy_utl_lxce_onecli02a-3.5.0_rhel_x86-64.tgz https://download.lenovo.com/servers/mig/2022/06/01/55726/lnvgy_utl_lxce_onecli02a-3.5.0_rhel_x86-64.tgz
tar -xvzf lnvgy_utl_lxce_onecli02a-3.5.0_rhel_x86-64.tgz
## XClarity CLI - find the MAC of the eth0 device
### find all config items
./onecli config show all --bmc USERID:Password0@10.122.1.10 --never-check-trust --nolog
### find specific item
./onecli config show IMM.HostIPAddress1 --bmc USERID:Password0@10.122.1.10 --never-check-trust --nolog --quiet
./onecli config show IntelREthernetConnectionX722for1GbE--OnboardLAN1PhysicalPort1LogicalPort1.MACAddress --never-check-trust --nolog --quiet
### find MAC address for eth0 (assuming eth0 is connected)
#### for the original SR630 University nodes
for i in {10..53}; do j=10.122.1.$i ; echo $j $(sudo ./onecli config show IntelREthernetConnectionX722for1GbE--OnboardLAN1PhysicalPort1LogicalPort1.MACAddress --bmc USERID:Password0@$j --never-check-trust --nolog --quiet | grep IntelREthernetConnectionX722for1GbE--OnboardLAN1PhysicalPort1LogicalPort1.MACAddress | awk -F '=' '{print $2}' | tr '[:upper:]' '[:lower:]'); done
## SR630
# controllers
10.122.1.10 38:68:dd:4a:56:38
10.122.1.11 38:68:dd:4a:55:90
10.122.1.12 38:68:dd:4a:42:48
# networkers
10.122.1.20 38:68:dd:4a:4a:30
10.122.1.21 38:68:dd:4a:52:18
# compute
10.122.1.30 38:68:dd:4c:17:e8
10.122.1.31 38:68:dd:4c:17:b0
10.122.1.32 38:68:dd:4d:1e:80
10.122.1.33 38:68:dd:4d:0f:f0
10.122.1.34 38:68:dd:4d:26:a8
10.122.1.35 38:68:dd:4d:1b:f0
10.122.1.36 38:68:dd:4a:46:48
10.122.1.37 38:68:dd:4d:16:78
10.122.1.38 38:68:dd:4d:15:88
10.122.1.39 38:68:dd:4d:1a:48
10.122.1.40 38:68:dd:4a:75:90
10.122.1.41 38:68:dd:4d:1c:f8
10.122.1.42 38:68:dd:4d:19:08
10.122.1.43 38:68:dd:4a:43:e8
10.122.1.44 38:68:dd:4a:41:48
10.122.1.45 38:68:dd:4d:14:20
10.122.1.46 38:68:dd:4d:18:c0
10.122.1.47 38:68:dd:4d:18:c8
10.122.1.48 38:68:dd:4a:41:88
10.122.1.49 38:68:dd:4c:17:88
10.122.1.50 38:68:dd:4c:17:28
10.122.1.51 38:68:dd:4d:1d:c8
10.122.1.52 38:68:dd:4c:17:e0
10.122.1.53 38:68:dd:4c:17:58
## SR630v2 nodes have a different OCP network adapter
for i in {54..77}; do j=10.122.1.$i ; echo $j $(sudo ./onecli config show IntelREthernetNetworkAdapterI350-T4forOCPNIC30--Slot4PhysicalPort1LogicalPort1.MACAddress --bmc USERID:Password0@$j --never-check-trust --nolog --quiet | grep IntelREthernetNetworkAdapterI350-T4forOCPNIC30--Slot4PhysicalPort1LogicalPort1.MACAddress | awk -F '=' '{print $2}' | tr '[:upper:]' '[:lower:]'); done
10.122.1.54 6c:fe:54:32:b8:60
10.122.1.55 6c:fe:54:33:4f:3c
10.122.1.56 6c:fe:54:33:55:74
10.122.1.57 6c:fe:54:33:4b:5c
10.122.1.58 6c:fe:54:33:4f:d2
10.122.1.59 6c:fe:54:33:53:ae
10.122.1.60 6c:fe:54:33:4f:7e
10.122.1.61 6c:fe:54:33:97:46
10.122.1.62 6c:fe:54:33:57:18
10.122.1.63 6c:fe:54:33:4e:fa
10.122.1.64 6c:fe:54:33:53:ea
10.122.1.65 6c:fe:54:33:4d:f8
10.122.1.66 6c:fe:54:33:4d:2c
10.122.1.67 6c:fe:54:32:e8:4e
10.122.1.68 6c:fe:54:33:55:fe
10.122.1.69 6c:fe:54:33:4b:86
10.122.1.70 6c:fe:54:33:55:56
10.122.1.71 6c:fe:54:33:4e:b2
10.122.1.72 6c:fe:54:33:57:12
10.122.1.73 6c:fe:54:33:4e:d6
10.122.1.74 6c:fe:54:33:51:98
10.122.1.75 6c:fe:54:33:4d:62
10.122.1.76 6c:fe:54:33:55:50
10.122.1.77 6c:fe:54:32:f0:2a
```
Create each node configuration in the "nodes" list `/home/stack/instackenv.json`.
```json
{
"nodes": [
{
"ports": [
{
"address": "38:68:dd:4a:42:4c",
"physical_network": "ctlplane"
}
],
"name": "osctl0",
"cpu": "4",
"memory": "6144",
"disk": "120",
"arch": "x86_64",
"pm_type": "ipmi",
"pm_user": "USERID",
"pm_password": "Password0",
"pm_addr": "10.122.1.10",
"capabilities": "profile:baremetal,boot_option:local",
"_comment": "rack - openstack - location - u5"
},
{
"ports": [
{
"address": "38:68:dd:4a:4a:34",
"physical_network": "ctlplane"
}
],
"name": "osnet1",
"cpu": "4",
"memory": "6144",
"disk": "120",
"arch": "x86_64",
"pm_type": "ipmi",
"pm_user": "USERID",
"pm_password": "Password0",
"pm_addr": "10.122.1.21",
"capabilities": "profile:baremetal,boot_option:local",
"_comment": "rack - openstack - location - u9"
},
{
"ports": [
{
"address": "38:68:dd:4c:17:e4",
"physical_network": "ctlplane"
}
],
"name": "oscomp1",
"cpu": "4",
"memory": "6144",
"disk": "120",
"arch": "x86_64",
"pm_type": "ipmi",
"pm_user": "USERID",
"pm_password": "Password0",
"pm_addr": "10.122.1.31",
"capabilities": "profile:baremetal,boot_option:local",
"_comment": "rack - openstack - location - u11"
}
]
}
```
- You do not have to include capabilities here; we add these later for the overcloud deployment.
- The capabilities 'profile:flavour' and 'boot_option:local' are good defaults; more capabilities will be automatically added during introspection and manually added when binding a node to a role.
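The per-node entries can also be generated from the scanned IP-to-MAC list rather than typed by hand; a minimal sketch, assuming the list from the previous step is saved as `macs.txt` ("IP MAC" per line) and that the placeholder `oscompN` names and rack comments are adjusted afterwards:
```sh
# build the "nodes" list from an "ip mac" file; node names oscompN are placeholders
i=0
while read -r ip mac; do
  jq -n --arg mac "$mac" --arg ip "$ip" --arg name "oscomp$i" '{
    ports: [{address: $mac, physical_network: "ctlplane"}],
    name: $name, cpu: "4", memory: "6144", disk: "120", arch: "x86_64",
    pm_type: "ipmi", pm_user: "USERID", pm_password: "Password0",
    pm_addr: $ip, capabilities: "profile:baremetal,boot_option:local"
  }'
  i=$((i + 1))
done < macs.txt | jq -s '{nodes: .}' > instackenv.json
```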
## Setup RAID + Legacy BIOS boot mode
> IMPORTANT: UEFI boot does work on the SR650 as expected; however it can take a very long time to cycle through the interfaces to the PXE boot interface.
> On large deployments you may reach the timeout on the DHCP server entry; BIOS mode is quicker to get to the PXE ROM.
Use `/home/stack/instackenv.json` to start each node, log in to each node's XClarity web interface and set up a RAID1 array of the boot disks.
```sh
# check nodes power state
for i in `jq -r .nodes[].pm_addr instackenv.json`; do ipmitool -N 1 -R 0 -I lanplus -H $i -U USERID -P Password0 chassis status | grep ^System;done
# start all nodes
for i in `jq -r .nodes[].pm_addr instackenv.json`; do ipmitool -N 1 -R 0 -I lanplus -H $i -U USERID -P Password0 chassis power on ;done
for i in `jq -r .nodes[].pm_addr instackenv.json`; do ipmitool -N 1 -R 0 -I lanplus -H $i -U USERID -P Password0 chassis status | grep ^System;done
# get IP login to XClarity web console
# configure RAID1 array on each node
# set boot option from UEFI to LEGACY/BIOS boot mode
for i in `jq -r .nodes[].pm_addr instackenv.json`; do echo $i ;done
# stop all nodes
for i in `jq -r .nodes[].pm_addr instackenv.json`; do ipmitool -N 1 -R 0 -I lanplus -H $i -U USERID -P Password0 chassis power off ;done
```
## Import nodes into the undercloud
> WARNING: the capabilities keypair value 'node:compute-0, node:compute-1, node:compute-N' must be contiguous; the University has a node with broken hardware 'oscomp9' that is not in the `instackenv.json` file.
> WARNING: Each capability keypair 'node:\<type\>-#' must be in sequence; with oscomp9 removed from the `instackenv.json` we add the keypairs as so: `oscomp8 = computeA-8 AND oscomp10 = computeA-9`.
**Note the University cluster has 2 different server hardware types with different network interface mappings. The node capabilities (node:computeA-0 vs node:computeB-0) will be used in `scheduler_hints.yaml` to bind nodes to roles, and there need to be 2 roles for the compute nodes so that each server type can have its own network interface mapping scheme.**
```sh
# load credentials
source ~/stackrc
# remove nodes if not first run
#for i in `openstack baremetal node list -f json | jq -r .[].Name`; do openstack baremetal node manage $i;done
#for i in `openstack baremetal node list -f json | jq -r .[].Name`; do openstack baremetal node delete $i;done
# ping all nodes to update the arp cache
#for i in `jq -r .nodes[].pm_addr instackenv.json`; do sudo ping -c 3 -W 5 $i ;done
nmap -p 623 10.122.1.0/24
# import nodes
openstack overcloud node import instackenv.json
# set nodes to use BIOS boot mode for overcloud installation
for i in `openstack baremetal node list -f json | jq -r .[].Name` ; do openstack baremetal node set --property capabilities="boot_mode:bios,$(openstack baremetal node show $i -f json -c properties | jq -r .properties.capabilities | sed "s/boot_mode:[^,]*,//g")" $i; done
# set nodes for baremetal profile for the schedule_hints.yaml to select the nodes as candidates
for i in `openstack baremetal node list -f json | jq -r .[].Name` ; do openstack baremetal node set --property capabilities="profile:baremetal,$(openstack baremetal node show $i -f json -c properties | jq -r .properties.capabilities | sed "s/profile:baremetal[^,]*,//g")" $i; done
## where some nodes cannot deploy
# oscomp4, oscomp7 have been removed from the instackenv.json owing to network card issues
# owing to the way we are setting the node capability using a loop index we will see that the oscomp8 will be named in openstack as computeA-6
#
# openstack baremetal node show oscomp8 -f json -c properties | jq .properties.capabilities
# "node:computeA-6,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal"
#
# if you do not have a full complement of nodes ensure templates/scheduler_hints_env.yaml has the correct amount of nodes, in this case 22 computeA nodes
# ControllerCount: 3
# NetworkerCount: 2
# #2 nodes removed owing to network card issues
# #ComputeACount: 24
# ComputeACount: 22
# ComputeBCount: 24
# set 'node:name' capability to allow scheduler_hints.yaml to match roles to nodes
## set capability for controller and networker nodes
openstack baremetal node set --property capabilities="node:controller-0,$(openstack baremetal node show osctl0 -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" osctl0 ;\
openstack baremetal node set --property capabilities="node:controller-1,$(openstack baremetal node show osctl1 -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" osctl1 ;\
openstack baremetal node set --property capabilities="node:controller-2,$(openstack baremetal node show osctl2 -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" osctl2 ;\
openstack baremetal node set --property capabilities="node:networker-0,$(openstack baremetal node show osnet0 -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" osnet0 ;\
openstack baremetal node set --property capabilities="node:networker-1,$(openstack baremetal node show osnet1 -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" osnet1
## capability for compute nodes
index=0 ; for i in {0..23}; do openstack baremetal node set --property capabilities="node:computeA-$index,$(openstack baremetal node show oscomp$i -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" oscomp$i && index=$((index + 1)) ;done
## capability for *NEW* compute nodes (oscomp-24..27 are being used for temporary proxmox and ceph thus removed from the instackenv.json) - CHECK
index=0 ; for i in {24..47}; do openstack baremetal node set --property capabilities="node:computeB-$index,$(openstack baremetal node show oscomp$i -f json -c properties | jq -r .properties.capabilities | sed "s/node:[^,]*,//g")" oscomp$i && index=$((index + 1)) ;done
# check capabilities are set for all nodes
#for i in `openstack baremetal node list -f json | jq -r .[].Name` ; do echo $i && openstack baremetal node show $i -f json -c properties | jq -r .properties.capabilities; done
for i in `openstack baremetal node list -f json | jq -r .[].Name` ; do openstack baremetal node show $i -f json -c properties | jq -r .properties.capabilities; done
# output, notice the order of the nodes
#node:controller-0,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:controller-1,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:controller-2,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:networker-0,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:networker-1,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeA-0,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeA-1,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeA-2,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeA-3,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#...
#node:computeB-0,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeB-1,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeB-2,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#node:computeB-3,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
#...
#node:computeB-23,profile:baremetal,boot_mode:bios,boot_option:local,profile:baremetal
# all in one command for inspection and provisioning
#openstack overcloud node introspect --all-manageable --provide
# inspect all nodes hardware
for i in `openstack baremetal node list -f json | jq -r .[].Name`; do openstack baremetal node inspect $i;done
# if a node fails inspection
openstack baremetal node maintenance unset oscomp9
openstack baremetal node manage oscomp9
openstack baremetal node power off oscomp9 # wait for node to power off
openstack baremetal node inspect oscomp9
# wait until all nodes are in a 'manageable' state to continue, this may take around 15 minutes
openstack baremetal node list
# set nodes to the 'provide' state, which invokes node cleaning (uses the overcloud image)
for i in `openstack baremetal node list -f json | jq -r ' .[] | select(."Provisioning State" == "manageable") | .Name'`; do openstack baremetal node provide $i;done
# if a node fails provision
openstack baremetal node maintenance unset osnet1
openstack baremetal node manage osnet1
openstack baremetal node provide osnet1
# wait until all nodes are in an 'available' state to deploy the overcloud
baremetal node list
# set all nodes back to 'manage' state to rerun introspection/provide
# for i in `openstack baremetal node list -f json | jq -r .[].Name`; do openstack baremetal node manage $i;done
```
## Checking networking via inspection data
Once the node inspections complete, we can check the list of network adapters in a chassis to assist with the network configuration in the deployment configuration files.
```sh
# load credentials
source ~/stackrc
# find the UUID of a sample node
openstack baremetal node list -f json | jq .
# check collected metadata, commands will show all interfaces and if they have carrier signal
#openstack baremetal node show f409dad9-1c1e-4ca0-b8af-7eab1b7f878d -f json | jq -r .
#openstack baremetal introspection data save f409dad9-1c1e-4ca0-b8af-7eab1b7f878d | jq .inventory.interfaces
#openstack baremetal introspection data save f409dad9-1c1e-4ca0-b8af-7eab1b7f878d | jq .all_interfaces
#openstack baremetal introspection data save f409dad9-1c1e-4ca0-b8af-7eab1b7f878d | jq '.all_interfaces | keys[]'
# original server hardware SR630 (faedafa5-5fa4-432e-b3aa-85f7f30f10fb | oscomp23)
(undercloud) [stack@undercloud ~]$ openstack baremetal introspection data save faedafa5-5fa4-432e-b3aa-85f7f30f10fb | jq '.all_interfaces | keys[]'
"eno1"
"eno2"
"eno3"
"eno4"
"enp0s20f0u1u6"
"ens2f0"
"ens2f1"
# new server hardware SR630v2 (b239f8b7-3b97-47f8-a057-4542ca6c7ab7 | oscomp28)
(undercloud) [stack@undercloud ~]$ openstack baremetal introspection data save b239f8b7-3b97-47f8-a057-4542ca6c7ab7 | jq '.all_interfaces | keys[]'
"enp0s20f0u1u6"
"ens2f0"
"ens2f1"
"ens4f0"
"ens4f1"
"ens4f2"
"ens4f3"
```
Interfaces are shown in the order that they are seen on the PCI bus; modern Linux OSes have an interface naming scheme applied by udev.
This naming scheme is often described as:
- Predictable Network Interface Names
- Consistent Network Device Naming
- Persistent names (https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/)
```sh
# example interface naming scheme: enp0s10 (plus f0 on multiport cards)
#   v   --> virtual (qemu)
#   en  --> ethernet
#   p0  --> bus number (0)
#   s10 --> slot number (10)
#   f0  --> function (multiport card)
```
Openstack adopts an interface mapping scheme to help identify the network interfaces using the notation 'nic1, nic2, nicN'.
Only interfaces with a carrier signal (connected to a switch) will participate in the interface mapping scheme.
For the University nodes the following Openstack mapping scheme is created.
Server classA:
| mapping | interface | purpose |
| --- | --- | --- |
| nic1 | eno1 | Control Plane |
| nic2 | enp0s20f0u1u6 | USB ethernet, likely from the XClarity controller |
| nic3 | ens2f0 | LACP bond, guest/storage |
| nic4 | ens2f1 | LACP bond, guest/storage |
Server classB:
| mapping | interface | purpose |
| --- | --- | --- |
| nic1 | enp0s20f0u1u6 | USB ethernet, likely from the XClarity controller |
| nic2 | ens2f0 | Control Plane |
| nic3 | ens2f1 | LACP bond, guest/storage |
| nic4 | ens4f0 | LACP bond, guest/storage |
The 'Server classA' nodes will be used for the 'controller', 'networker' and 'compute' roles; the 'Server classB' hardware will be used for the 'compute' role.
The mapping 'nic1' is not consistent for the 'Control Plane' network across both classes of server hardware, necessitating multiple roles (and thus multiple network interface templates) for the compute nodes.
You may notice some LLDP information (the Cumulus switch must be running the LLDP service); this is very helpful to determine the switch port that a network interface is connected to and to verify your point-to-point list.
Owing to the name of the switch we can quickly see this is the 100G Cumulus switch.
```
"ens2f0": {
"ip": "fe80::d57c:2432:d78d:e15d",
"mac": "10:70:fd:24:62:e0",
"client_id": null,
"pxe": false,
"lldp_processed": {
"switch_chassis_id": "b8:ce:f6:18:c3:4a",
"switch_port_id": "swp9s0",
"switch_system_name": "sw100g0",
"switch_system_description": "Cumulus Linux version 4.2.0 running on Mellanox Technologies Ltd. MSN3700C",
"switch_capabilities_support": [
"Bridge",
"Router"
],
"switch_capabilities_enabled": [
"Bridge",
"Router"
],
"switch_mgmt_addresses": [
"172.31.31.11",
"fe80::bace:f6ff:fe18:c34a"
],
"switch_port_description": "swp9s0",
"switch_port_link_aggregation_enabled": false,
"switch_port_link_aggregation_support": true,
"switch_port_link_aggregation_id": 0,
"switch_port_autonegotiation_enabled": true,
"switch_port_autonegotiation_support": true,
"switch_port_physical_capabilities": [
"1000BASE-T fdx",
"PAUSE fdx"
],
"switch_port_mau_type": "Unknown"
}
},
```

4) Ceph Cluster Setup.md (diff suppressed because it is too large)

5) Overcloud Deployment.md (diff suppressed because it is too large)

6) Multi-tenancy.md (diff suppressed because it is too large)

7) Example Project.md
# Example of a new project
The following example exclusively uses the CLI for administration; this helps clarify the components in play and their interdependencies. All steps can also be performed in the web console.
## Load environment variables to use the Overcloud CLI
```sh
[stack@undercloud ~]$ source ~/stackrc
(undercloud) [stack@undercloud ~]$ source ~/overcloudrc
(overcloud) [stack@undercloud ~]$
```
## Create project
```sh
# create project
openstack project create --domain 'ldap' --description "Bioinformatics Project" bioinformatics
```
## Create an internal Openstack network/subnet for the project
```sh
openstack network create bioinformatics-network --internal --no-share --project bioinformatics
openstack subnet create bioinformatics-subnet --project bioinformatics --network bioinformatics-network --gateway 172.16.1.1 --subnet-range 172.16.1.0/16 --dhcp
```
## Create a router for the project
```sh
openstack router create bioinformatics-router --project bioinformatics
openstack router set bioinformatics-router --external-gateway provider
```
## Add an interface to the provider network to the project network
```sh
openstack router add subnet bioinformatics-router bioinformatics-subnet
```
## Create a security group named 'linux-default' to allow inbound ssh for VM instances
- a new security group injects rules on creation to allow outbound traffic by default; where multiple security groups are attached, these default rules may be removed
```sh
openstack security group create --project bioinformatics linux-default
openstack security group rule create \
--ingress \
--protocol tcp \
--ethertype IPv4 \
--remote-ip '0.0.0.0/0' \
--dst-port 22 \
$(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(.Name == "linux-default").ID')
# list security group rules
openstack security group rule list $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(."Name" == "default") | .ID')
openstack security group rule list $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(."Name" == "linux-default") | .ID') --long
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+
| ID | IP Protocol | Ethertype | IP Range | Port Range | Direction | Remote Security Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+
| 99210e25-4b7f-4125-93bb-7abea3eddf07 | None | IPv4 | 0.0.0.0/0 | | egress | None |
| adc21371-52bc-4c63-8e23-8e55a119407c | None | IPv6 | ::/0 | | egress | None |
| d327baac-bdaa-437c-b506-b90659e92833 | tcp | IPv4 | 0.0.0.0/0 | 22:22 | ingress | None |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+
```
## Set quotas for the scope of the entire project
```sh
openstack quota set --instances 50 bioinformatics ;\
openstack quota set --cores 300 bioinformatics ;\
openstack quota set --ram 204800 bioinformatics ;\
openstack quota set --gigabytes 5000 bioinformatics ;\
openstack quota set --volumes 500 bioinformatics ;\
openstack quota set --key-pairs 50 bioinformatics ;\
openstack quota set --floating-ips 50 bioinformatics ;\
openstack quota set --networks 10 bioinformatics ;\
openstack quota set --routers 5 bioinformatics ;\
openstack quota set --subnets 10 bioinformatics ;\
openstack quota set --secgroups 100 bioinformatics ;\
openstack quota set --secgroup-rules 1000 bioinformatics
```
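The applied limits can be read back in a single call; a quick check:
```sh
openstack quota show bioinformatics
```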
## Create flavours for the project
- flavours are pre-scoped specs of the instances
```sh
openstack flavor create small --ram 2048 --disk 10 --vcpus 2 --private --project bioinformatics ;\
openstack flavor create medium --ram 3072 --disk 10 --vcpus 4 --private --project bioinformatics ;\
openstack flavor create large --ram 8192 --disk 10 --vcpus 8 --private --project bioinformatics ;\
openstack flavor create xlarge --ram 16384 --disk 10 --vcpus 16 --private --project bioinformatics ;\
openstack flavor create xxlarge --ram 65536 --disk 10 --vcpus 48 --private --project bioinformatics
```
## End-user access using Active Directory groups
- In the University Prod environment you would typically create an AD group with nested AD users
- To illustrate the method, we use the pre-existing group 'ISCA-Admins'
```sh
openstack user list --group 'ISCA-Admins' --domain ldap
+------------------------------------------------------------------+--------+
| ID | Name |
+------------------------------------------------------------------+--------+
| c633f80625e587bc3bbe492af57cb99cec59201b16cc06f614e36a6b767d6b29 | mtw212 |
| 0c4e3bdacda6c9b8abcd61de94deb47ff236cec3581fbbacf2d9daa1c584a44d | mmb204 |
| 2d4338bc2ba649ff15111519e535d0fc6c65cbb7e5275772b4e0c675af09002b | rr274 |
| b9461f113d208b54a37862ca363ddf37da68cf00ec06d67ecc62bb1e5caf06d4 | dma204 |
| 0fb8469b2d7e297151102b0119a4b08f6b26113ad8401b6cb79936adf946ba19 | ac278 |
+------------------------------------------------------------------+--------+
# bind member role to users in the access group for the project
openstack role add --group-domain 'ldap' --group 'ISCA-Admins' --project-domain 'ldap' --project bioinformatics member
# bind admin role to a specific user for the project
openstack role add --user-domain 'ldap' --user mtw212 --project-domain 'ldap' --project bioinformatics admin
openstack role assignment list --user $(openstack user show --domain 'ldap' mtw212 -f json | jq -r .id) --names
+-------+-------------+-------+---------------------+--------+--------+-----------+
| Role | User | Group | Project | Domain | System | Inherited |
+-------+-------------+-------+---------------------+--------+--------+-----------+
| admin | mtw212@ldap | | bioinformatics@ldap | | | False |
+-------+-------------+-------+---------------------+--------+--------+-----------+
# bind member role for local user 'tseed' for the project
openstack role add --user-domain 'Default' --user tseed --project-domain 'ldap' --project bioinformatics member
# bind admin role for the (default) local user 'admin' for the project - we want the admin user to have full access to the project
openstack role add --user-domain 'Default' --user admin --project-domain 'ldap' --project bioinformatics admin
```
## Import a disk image to be used specifically for the project
- This can be a custom image pre-baked with specific software or any vendor OS install image
- Images should support cloud-init for the initial user login; generic distro images with cloud-init enabled should work
```sh
wget https://repo.almalinux.org/almalinux/8/cloud/x86_64/images/AlmaLinux-8-GenericCloud-8.6-20220513.x86_64.qcow2
openstack image create --disk-format qcow2 --container-format bare --private --project bioinformatics --property os_type=linux --file ./AlmaLinux-8-GenericCloud-8.6-20220513.x86_64.qcow2 alma_8.6
```
## SSH keypairs
Generate an ssh key pair; this will be used for the initial login to a VM instance.
- the keypair in this example is owned by the admin user; other users will not see the ssh keypair in the web console and will need a copy of the ssh private key (unless a password is set in cloud-init userdata)
- each user will have their own keypair that will be selected when provisioning a VM instance in the web console
- once instantiated, additional users can import ssh keys to the authorized_keys file as per typical linux host
- when generating ssh public keys Openstack requires a comment at the end of the key; when importing a keypair (even via the web console) the public key needs a comment
Generic distro (cloud-init) images generally have their own default user, typically image specific such as 'almalinux' or 'ubuntu'; you log in as this user using the ssh private key counterpart to the public key specified with the '--key-name' parameter.
Some cloud-init images use the user in the comment of the ssh key as the default user (or as an additional user).
Convention is to provision instances with cloud-init userdata with the expectation that you provide your own user and credentials.
```sh
ssh-keygen -t rsa -b 4096 -C "bioinformatics@university.ac.uk" -f ~/bioinformatics_cloud
openstack keypair create --public-key ~/bioinformatics_cloud.pub bioinformatics
```
## Cloud-init userdata
This OPTIONAL step is very useful. Typically cloud providers use userdata to set up the initial login; however userdata is much more powerful and often used to register the instance with a configuration management tool to install a suite of software (chef/puppet/ansible in pull mode) or even to embed a shell script for direct software provision (pull + start containers). Beware that userdata is limited to 64KB.
NOTE: OCF have built cloud-init userdata for Linux (and Windows in Azure) to configure SSSD to join cloud instances to Microsoft Active Directory to enable multi-user access; this is highly environment/customer specific.
- Openstack is kind: you don't have to base64 encode the userdata like some public cloud providers, it is automatic
- generally each cloud-init image will have its own default user, typically image specific such as 'almalinux' or 'ubuntu'
- the following config will replace this default user with your own bioinformatics user, password and ssh key (it also adds the universityops user to ensure an admin can get into the system)
- NOTE the ssh key entry below has had the trailing comment removed
- passwords can be in cleartext but instance users will be able to see the password in the userdata; create a hash with the command `openssl passwd -6 -salt xyz Password0`
- userdata can be added to the instance when provisioning in the web console @ Customisation Script; it is always a good idea to provide a userdata template to end users where they self-provision
```sh
nano -cw userdata.txt # yaml format
#cloud-config
ssh_pwauth: true
groups:
- admingroup: [root,sys]
- bioinformatics
- universityops
users:
- name: bioinformatics
primary_group: bioinformatics
lock_passwd: false
passwd: $6$xyz$4tTWyuHIT6gXRuzotBZn/9xZBikUp0O2X6rOZ7MDJo26aax.Ok5P4rWYyzdgFkjArIIyB8z8LKVW1wARbcBzn/
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQD4Yh0OuBTvyXObUcJKLDNjIhmSkf/RiSPhPYzNECwC7hlIms/fYcbODPQmboo8pgtnlDK0aElWr16n1z+Yb/3btzmO/G8pZEUR607VmWINuYzSJyAieL6zNPn0XC2eP9mqWJJP44SjroVKLjnhajy761FaGxXJyXr3RXmIb4xc+qW8ETJQh98ucZZZQ3X8MernjIOO+VGVObDDDTZXsaL1wih0+v/R9gMJP8AgSCpi539o0A6RgFzMqFfroUKe6uYa1ohBrjii+teKETEb7isNOZFPx459zhqRPVjFlzVXNpDBPVjz32uuUyBRW4jMlwQ/GIrhT7+fNjpxG0CrVe0c3F+BoBnqfdrsLFCJ3dg+z19lBLnC2ulp511kqEVctjG96l9DeEPtab28p22aV3fuzdnx24y3BJi8Wea79U8+RTy0fYCM0Sm8rwREUHD2bAgjtIUU8gTKnQLyeUAc5+qJCFqa3H9/DJZ44MQzk/rC0shBUU7z+IwWhftU1P9GWURko11Bmg6pq+/fdGVm/eqilDabirbZxjqnxXCBGcOM6QsPoooJ9cgCU34k9KhUxPJ34frYfwHaWkDYxe+7VBrrzPWpOnOGt04eegwdNBDMnl703wfXqobnyy8nMmzH04j2PThJ7ZrRnA6bo/dYtVZXHocfq76yPxSsmYClebJBSQ==
- name: universityops
primary_group: bioinformatics
lock_passwd: false
passwd: $6$xyz$4tTWyuHIT6gXRuzotBZn/9xZBikUp0O2X6rOZ7MDJo26aax.Ok5P4rWYyzdgFkjArIIyB8z8LKVW1wARbcBzn/
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
```
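It can be worth validating the cloud-config before provisioning, and checking cloud-init after first boot on the instance itself; a rough sketch (the schema subcommand location varies by cloud-init version):
```sh
# validate the YAML against the cloud-config schema (older releases use 'cloud-init devel schema')
cloud-init schema --config-file userdata.txt
# on the booted instance: wait for cloud-init to finish, then review what it did
cloud-init status --wait
sudo tail -n 50 /var/log/cloud-init-output.log
```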
## Create a floating ip
With the network design up to this point you can have a routable IP capable of accepting ingress traffic from the wider University estate by two methods:
1. floating IP, a '1:1 NAT' of a provider network IP mapped to the VM interface IP in the private Openstack 'bioinformatics' network
2. interface IP directly in the provider network
Floating IPs are more versatile as they can be moved between instances for all manner of blue-green scenarios, and the VM instance typically does not have to be multihomed between networks.
Floating IPs within Openstack private networks are also possible and can be just as useful in a multi-tiered application stack - think DR strategy: scripting the Openstack API to move the floating IP between instances.
However, end users may want a VM instance with only a provider network IP; such an instance would only be able to communicate with other Openstack VM instances that also have a provider IP.
```sh
# create a floating IP in the 'provider' network on the 'provider-subnet' subnet range
openstack floating ip create --project bioinformatics --description 'bioinformatics01' --subnet provider-subnet provider
openstack floating ip list --project bioinformatics --long -c 'ID' -c 'Floating IP Address' -c 'Description'
+--------------------------------------+---------------------+------------------+
| ID | Floating IP Address | Description |
+--------------------------------------+---------------------+------------------+
| 0eb3f78d-d59d-4ec6-b725-d2c1f45c9a77 | 10.121.4.246 | bioinformatics01 |
+--------------------------------------+---------------------+------------------+
```
Check the allocated 'ports'; think of these as IP endpoints for objects known to Openstack.
- VM Instance = compute:nova
- Floating IP = network:floatingip
- DHCP service = network:dhcp (most networks will have one)
- Primary router interface = network:router_gateway (usually in the provider network, for egress/SNAT access to external networks)
- Secondary router interface = network:router_interface (router interface on a private Openstack network)
```sh
openstack port list --long -c 'ID' -c 'Fixed IP Addresses' -c 'Device Owner'
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------+
| ID | Fixed IP Addresses | Device Owner |
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------+
| 108171d9-cd76-49ab-944e-751f8257c8d1 | ip_address='10.121.4.150', subnet_id='92361cfd-f348-48a2-b264-7845a3a3d592' | compute:nova |
| 3d86a21a-f187-47e0-8204-464adf334fb0 | ip_address='172.16.0.2', subnet_id='a92d2ac0-8b60-4329-986d-ade078e75f45' | network:dhcp |
| 3db3fe34-85a8-4028-b670-7f9aa5c86c1a | ip_address='10.121.4.148', subnet_id='92361cfd-f348-48a2-b264-7845a3a3d592' | network:floatingip |
| 400cb067-2302-4f8e-bc1a-e187929afbbc | ip_address='10.121.4.205', subnet_id='92361cfd-f348-48a2-b264-7845a3a3d592' | network:router_gateway |
| 5c93d336-05b5-49f0-8ad4-9de9c2ccf216 | ip_address='172.16.2.239', subnet_id='ab658788-0c5f-4d22-8786-aa7256db66b6' | compute:nova |
| 62afa3de-5316-4eb6-88ca-4830c141c898 | ip_address='172.16.1.1', subnet_id='ab658788-0c5f-4d22-8786-aa7256db66b6' | network:router_interface |
| 7c8b58c0-3ff7-44f6-9eb3-a601a139aab9 | ip_address='172.16.0.1', subnet_id='a92d2ac0-8b60-4329-986d-ade078e75f45' | network:router_interface |
| 9f41db95-8333-4f6d-88e0-c0e3f7d4b7f0 | ip_address='172.16.1.2', subnet_id='ab658788-0c5f-4d22-8786-aa7256db66b6' | network:dhcp |
| c9591f1b-8d43-4322-acd6-75cd4cce04e3 | ip_address='10.121.4.239', subnet_id='92361cfd-f348-48a2-b264-7845a3a3d592' | network:router_gateway |
| e3f35c0a-6543-4508-8d17-96de69f85a1c | ip_address='10.121.4.130', subnet_id='92361cfd-f348-48a2-b264-7845a3a3d592' | network:dhcp |
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------+
```
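The port list can also be filtered server-side, which helps on busier deployments; for example, using the standard `openstack port list` filters:
```sh
# show only floating IP ports
openstack port list --device-owner network:floatingip -c 'ID' -c 'Fixed IP Addresses'
# find the port holding a specific address
openstack port list --fixed-ip ip-address=10.121.4.148
```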
## Create disk volumes
Create volumes that will be attached on VM instantiation (bioinformatics02).
```sh
# find the image to use on the boot disk
openstack image list -c 'ID' -c 'Name' -c 'Project' --long -f json | jq -r '.[] | select(.Name == "alma_8.6").ID'
0a0d99c1-4bce-4e74-9df8-f9cf5666aa98
# create a bootable disk
openstack volume create --bootable --size 50 --image $(openstack image list -c 'ID' -c 'Name' -c 'Project' --long -f json | jq -r '.[] | select(.Name == "alma_8.6").ID') --description "bioinformatics02 boot" --os-project-domain-name='ldap' --os-project-name 'bioinformatics' bioinformatics02boot
# create a data disk
openstack volume create --non-bootable --size 100 --description "bioinformatics02 data" --os-project-domain-name='ldap' --os-project-name 'bioinformatics' bioinformatics02data
```
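A quick check that both volumes exist with the expected sizes and that only the boot disk is flagged bootable (scoping flags as used above):
```sh
openstack volume list --project bioinformatics
openstack volume show bioinformatics02boot -c status -c size -c bootable --os-project-domain-name='ldap' --os-project-name 'bioinformatics'
```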
## Create VM instances
Creating instances via the CLI can save a lot of time vs the web console if the environment is not to be initially self provisioned by the end user, allowing you to template a batch of machines quickly.
VM instances are not technically 'owned' by a user; they reside in a domain/project, are provisioned by a user (initially with a user-specific SSH key) and can be administered by users in the same project via the CLI/web-console. SSH access to the VM will be user specific unless the provisioning user adds access for other users (via password or SSH private key distribution at the operating system level). Userdata is the key to true multitenancy.
### Instance from flavour with larger disk and floating IP
The following command illustrates:
- create VM Instance in the Openstack 'bioinformatics' network with an additional floating IP
- override the instance flavour's 10GB disk with a 100GB disk; the disk is not removed when the instance is deleted
- add multiple security groups; these apply to all interfaces by default, so allowing specific ingress for only the floating IP would be achieved with a rule matching the floating IP as the destination
```sh
# create VM instance
openstack server create \
--image alma_8.6 \
--flavor large \
--boot-from-volume 100 \
--network bioinformatics-network \
--security-group $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(.Name == "default").ID') \
--security-group $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(.Name == "linux-default").ID') \
--key-name bioinformatics \
--user-data userdata.txt \
--os-project-domain-name='ldap' \
--os-project-name 'bioinformatics' \
bioinformatics01
```
Attach the floating IP:
- this command relies on the unique UUIDs of both the server and floating IP objects, as the command doesn't support the --project parameter
- we named both our floating IP and VM instance 'bioinformatics01'; this is where tags start to become useful
```sh
# attach floating IP
openstack server add floating ip $(openstack server list --project bioinformatics -f json | jq -r '.[] | select(.Name == "bioinformatics01").ID') $(openstack floating ip list --project bioinformatics --long -c 'ID' -c 'Floating IP Address' -c 'Description' -f json | jq -r '.[] | select(.Description == "bioinformatics01") | ."Floating IP Address"')
# check the IP addresses allocated to the VM instance, we see the floating IP 10.121.4.246 directly on the routable provider network
openstack server list --project bioinformatics
+--------------------------------------+------------------+--------+--------------------------------------------------+-------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------------+--------+--------------------------------------------------+-------+--------+
| ca402aed-84dd-47ad-b5ba-5fc74978f66b | bioinformatics01 | ACTIVE | bioinformatics-network=172.16.3.74, 10.121.4.246 | | large |
+--------------------------------------+------------------+--------+--------------------------------------------------+-------+--------+
```
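Moving the floating IP between instances (the blue-green/DR scenario mentioned earlier) is just a remove/add pair; a minimal sketch, assuming a hypothetical standby instance named 'bioinformatics01b' (names shown for brevity - resolve UUIDs as above when working across projects):
```sh
# move the floating IP from the active instance to the standby
openstack server remove floating ip bioinformatics01 10.121.4.246
openstack server add floating ip bioinformatics01b 10.121.4.246
```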
### 'multi-homed' Instance from flavour with manually specified disk
Create the VM instance with the disk volumes attached and network interfaces in both the project's Openstack private network and the provider network.
```sh
# create a VM instance
## -v is a debug parameter, -vv for more
openstack server create \
--volume $(openstack volume list --name bioinformatics02boot --project bioinformatics -f json | jq -r .[].ID) \
--block-device-mapping vdb=$(openstack volume list --name bioinformatics02data --project bioinformatics -f json | jq -r .[].ID):volume::true \
--flavor large \
--nic net-id=provider \
--nic net-id=bioinformatics-network \
--security-group $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(.Name == "default").ID') \
--security-group $(openstack security group list --project bioinformatics -f json | jq -r '.[] | select(.Name == "linux-default").ID') \
--key-name bioinformatics \
--user-data userdata.txt \
--os-project-domain-name='ldap' \
--os-project-name 'bioinformatics' \
bioinformatics02 -v
# remove the server
## note that the data volume has been deleted; it was attached with the 'delete-on-terminate' flag set true in the '--block-device-mapping' parameter
## the boot volume has not been removed; 'openstack server show' reports its 'delete-on-terminate' flag as false
## the web console will allow the boot volume to be delete-on-terminate; the CLI lacks this capability yet the REST API clearly supports the functionality (see the sketch after this block)
openstack server delete $(openstack server show bioinformatics02 --os-project-domain-name='ldap' --os-project-name 'bioinformatics' -f json | jq -r .id)
openstack volume list --project bioinformatics
+--------------------------------------+----------------------+-----------+------+---------------------------------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+----------------------+-----------+------+---------------------------------------------------------------+
| db137b16-67ed-4ade-8d89-fd57d463f573 | | in-use | 100 | Attached to ca402aed-84dd-47ad-b5ba-5fc74978f66b on /dev/vda |
| 1ff863bb-6cb3-4d40-8d25-06b61e974e38 | bioinformatics02boot | available | 50 | |
+--------------------------------------+----------------------+-----------+------+---------------------------------------------------------------+
```
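As noted above, the boot volume's delete-on-terminate flag is exposed by the REST API even though this CLI release cannot set it; a rough sketch of the Nova call, assuming a valid token and the correct compute endpoint (the URL and all IDs below are placeholders):
```sh
TOKEN=$(openstack token issue -f value -c id)
NOVA_URL=https://stack.university.ac.uk:13774/v2.1   # assumption - confirm with 'openstack endpoint list'
# boot from an existing volume with delete_on_termination set true on the root disk
curl -s -X POST "$NOVA_URL/servers" \
  -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
  -d '{
    "server": {
      "name": "bioinformatics03",
      "flavorRef": "<flavor-uuid>",
      "key_name": "bioinformatics",
      "networks": [{"uuid": "<bioinformatics-network-uuid>"}],
      "block_device_mapping_v2": [{
        "boot_index": 0,
        "uuid": "<boot-volume-uuid>",
        "source_type": "volume",
        "destination_type": "volume",
        "delete_on_termination": true
      }]
    }
  }'
```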
## Test access to VM instances
```sh
# check the IP addresses allocated to the VM instance
openstack server list --project bioinformatics -c 'Name' -c 'Networks' --long --fit-width
+------------------+-----------------------------------------------------------+
| Name | Networks |
+------------------+-----------------------------------------------------------+
| bioinformatics02 | bioinformatics-network=172.16.3.254; provider=10.121.4.92 |
| bioinformatics01 | bioinformatics-network=172.16.3.74, 10.121.4.246 |
+------------------+-----------------------------------------------------------+
# gain access to the instances via native provider network ip and the floating ip respectively
ssh -i ~/bioinformatics_cloud bioinformatics@10.121.4.92
ssh -i ~/bioinformatics_cloud bioinformatics@10.121.4.246
```

## Testing node evacuation
```sh
# create guest VM
cd;source ~/overcloudrc
openstack server create --image cirros-0.5.1 --flavor m1.small --network internal test-failover
openstack server list -c Name -c Status
+---------------+--------+
| Name | Status |
+---------------+--------+
| test-failover | ACTIVE |
+---------------+--------+
# find the compute node that the guest VM is running upon
openstack server show test-failover -f json | jq -r '."OS-EXT-SRV-ATTR:host"'
overcloud-novacomputeiha-3.localdomain
# login to the compute node hosting the guest VM, crash the host
cd;source ~/stackrc
ssh heat-admin@overcloud-novacomputeiha-3.ctlplane.localdomain
sudo su -
echo c > /proc/sysrq-trigger
# this terminal will fail after a few minutes, the dashboard console view of the guest VM will hang
# node hard poweroff will achieve the same effect
# check nova services
cd;source ~/overcloudrc
nova service-list
| 0ad301e3-3420-4d5d-a2fb-2f00ba80a00f | nova-compute | overcloud-novacomputeiha-3.localdomain | nova | disabled | down | 2022-05-19T11:49:40.000000 | - | True |
# check guest VM is still running, after a few minutes it should be running on another compute node
openstack server list -c Name -c Status
openstack server show test-failover -f json | jq -r .status
# the VM Instance has not yet registered as being on a down compute node
ACTIVE
# Openstack has detected a down compute node and is moving the instance; 'rebuilding' refers to the QEMU domain - there is no image rebuild and the instance's disk state is preserved
REBUILDING
# if you see an error state either IPMI interfaces cannot be contacted by the controllers or there is a storage migration issue, check with 'openstack server show test-failover'
ERROR
# you probably won't see this unless you recover from an ERROR state with 'openstack server stop test-failover'
SHUTOFF
# check VM instance is on a new node
openstack server show test-failover -f json | jq -r '."OS-EXT-SRV-ATTR:host"'
overcloud-novacomputeiha-1.localdomain
# Provided the compute node comes back up, you should see it automatically rejoin the cluster
# If it does not rejoin the cluster, try a reboot and wait a good 10 minutes
# If a node still does not come back you will have to remove it and redeploy from the undercloud - a hassle
nova service-list
| 1be7bc8f-2769-4986-ac5e-686859779bca | nova-compute | overcloud-novacomputeiha-0.localdomain | nova | enabled | up | 2022-05-19T12:03:27.000000 | - | False |
| 0ad301e3-3420-4d5d-a2fb-2f00ba80a00f | nova-compute | overcloud-novacomputeiha-3.localdomain | nova | enabled | up | 2022-05-19T12:03:28.000000 | - | False |
| c8d3cfd8-d639-49a2-9520-5178bc5a426b | nova-compute | overcloud-novacomputeiha-2.localdomain | nova | enabled | up | 2022-05-19T12:03:26.000000 | - | False |
| 3c918b5b-36a6-4e63-b4de-1b584171a0c0 | nova-compute | overcloud-novacomputeiha-1.localdomain | nova | enabled | up | 2022-05-19T12:03:27.000000 | - | False |
```
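While the failover is in progress it is easier to watch the state transitions than to poll by hand; a small helper using the test instance above:
```sh
# poll status and host placement every 5 seconds during failover
watch -n 5 "openstack server show test-failover -c status -c OS-EXT-SRV-ATTR:host -f value"
```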
Other commands to assist in debug of failover behaviour.
> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/command_line_interface_reference/server#server_migrate # great CLI reference
> https://docs.openstack.org/nova/rocky/admin/evacuate.html # older reference, prefer openstack CLI commands that act as a wrapper to nova CLI
```sh
# test that the controller nodes can run ipmitool against the compute nodes
ipmitool -I lanplus -H 10.0.9.45 -p 2000 -U USERID -P PASSW0RD chassis status
# list physical nodes
openstack host list
nova hypervisor-list
# list VMs, get compute node for an instance
openstack server list
openstack server list -c Name -c Status
nova list
openstack server show <server> -f json | jq -r '."OS-EXT-SRV-ATTR:host"'
# if you get a VM instance stuck in a power on/off state and you can't evacuate it from a failed node, issue 'openstack server stop <server>'
nova reset-state --active <server> # set to active state even if it was in error state
nova reset-state --all-tenants <server> # seems to set the node back to error state if it was in active state but failed and powered off
nova stop [--all-tenants] <server>
openstack server stop <server> # newer command line reference method, puts the node in a poweroff state, use for an ERROR in migration
# evacuate a single VM server instance to a different compute node
# not preferred - older command syntax for direct nova service control
nova evacuate <server> overcloud-novacomputeiha-3.localdomain # moves VM - pauses but doesn't shut down
nova evacuate --on-shared-storage test-1 overcloud-novacomputeiha-0.localdomain # rebuild on shared storage, the instance disk is reused
# preferred - openstack CLI native commands
openstack server migrate --live-migration <server> # moves VM - pauses but doesn't shut down, state is preserved (presumably this only works owing to ceph/shared storage)
openstack server migrate --shared-migration <server> # requires manual confirmation in web console, stops/starts VM, state not preserved
```
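Since Instance-HA fences the failed compute node via pacemaker and IPMI, the cluster view from a controller node is often the quickest way to confirm what actually happened; a brief sketch (hostnames and stonith device names will differ per deployment):
```sh
# from the undercloud, hop onto any controller (adjust the domain to match /etc/hosts)
ssh heat-admin@overcloud-controller-0.ctlplane.localdomain
sudo pcs status            # overall cluster state, including remote compute resources
sudo pcs stonith status    # state of the fence devices used to power-cycle failed nodes
sudo pcs stonith history   # recent fencing actions (pcs 0.10+)
```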

# check certificate for the Openstack Horizon dashboard
```sh
openssl s_client -showcerts -connect stack.university.ac.uk:443
Certificate chain
0 s:C = GB, ST = England, L = University, CN = stack.university.ac.uk
i:C = GB, ST = England, L = University, O = UOE, OU = Cloud, CN = University Openstack CA
```
We see the certificate is signed by the CA "University Openstack CA" created in the build guide. This is not quite a self signed certificate, but unless the CA cert is installed on the client machines it offers broadly the same level of trust.
# Check the certificate bundle received from an external signing authority
## Unpack and inspect
```sh
sudo dnf install unzip -y
unzip stack.university.ac.uk.zip
tree .
├── stack.university.ac.uk.cer full certificate chain, order: service certificate, intermediate CA, intermediate CA, top level CA
├── stack.university.ac.uk.cert.cer service certificate for stack.university.ac.uk
├── stack.university.ac.uk.csr certificate signing request (sent to public CA)
├── stack.university.ac.uk.interm.cer chain of intermediate and top level CA certificates, order: intermediate CA (Extended CA), intermediate CA, top level CA
└── stack.university.ac.uk.key certificate private key
```
## Check each certificate to determine what has been included in the bundle
Some signing authorities will not include all CA certificates in the bundle; it is up to you to inspect the service certificate and trace back through the certificate chain to obtain the various CA certificates.
### certificate information
Inspect service certificate.
```sh
#openssl x509 -in stack.university.ac.uk.cert.cer -text -noout
cfssl-certinfo -cert stack.university.ac.uk.cert.cer
```
Service certificate attributes.
```
"common_name": "stack.university.ac.uk"
"sans": [
"stack.university.ac.uk",
"www.stack.university.ac.uk"
],
"not_before": "2022-03-16T00:00:00Z",
"not_after": "2023-03-16T23:59:59Z",
```
### full certificate chain content
Copy out each certificate from the full chain file `stack.university.ac.uk.cer` to its own temp file, then run the openssl text query command `openssl x509 -in <cert.N> -text -noout` to inspect each certificate (see the sketch after the table).
The full chain certificate file is listed in the following order. From the service certificate `stack.university.ac.uk`, each certificate is signed by the preceding CA.
| Certificate context name | purpose | capability |
| --- | --- | --- |
| CN = AAA Certificate Services | top level CA | CA capability |
| CN = USERTrust RSA Certification Authority | intermediate CA | CA capability |
| CN = GEANT OV RSA CA 4 | intermediate CA | CA capability<br>extended validation capability |
| CN = stack.university.ac.uk | the service certificate | stack.university.ac.uk certificate |
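One way to split the bundle into per-certificate temp files for inspection (assuming GNU csplit is available on the undercloud):
```sh
# split the full chain into cert.00, cert.01, ... one file per certificate
csplit -s -z -f cert. stack.university.ac.uk.cer '/-----BEGIN CERTIFICATE-----/' '{*}'
# print subject and issuer for each, to trace the signing chain
for c in cert.*; do openssl x509 -in "$c" -noout -subject -issuer; done
```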
## Check that the certificate chain is present by default in the trust store on the clients
Open certmgr in Windows and check in "Trusted Root Authorities/Certificates" for each CA/Intermediate-CA certificate; all certificates will likely be present.
- look for the context name (CN)
- check the "X509v3 Subject Key Identifier" matches the "subject key identifier" from the `openssl x509 -in stack.university.ac.uk.cert.cer -text -noout` output
Windows includes certificates for "AAA Certificate Services" and "USERTrust RSA Certification Authority"; the extended validation Intermediate CA "GEANT OV RSA CA 4" may be missing. This is not an issue, as the client has the top level CAs and can validate and follow the signing chain.
For modern Linux distros we find only one intermediate CA; this should be sufficient, as any handshake using certificates signed from it will be able to validate. If the undercloud can find a CA in its trust store, the deployed cluster nodes will most likely have it too.
```sh
trust list | grep -i label | grep -i "USERTrust RSA Certification Authority"
# generally all certificates imported into the trust store get rendered into this global file
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
# search the trust store for "USERTrust RSA Certification Authority", copy the content of the certificate field into a temp file for the following stanza
nano -cw /usr/share/pki/ca-trust-source/ca-bundle.trust.p11-kit
[p11-kit-object-v1]
label: "USERTrust RSA Certification Authority"
trusted: true
nss-mozilla-ca-policy: true
modifiable: false
# check the "X509v3 Subject Key Identifier" matches the CA in the certificate chain you recieved from the signing authority.
openssl x509 -in <temp file> -text -noout | grep "53:79:BF:5A:AA:2B:4A:CF:54:80:E1:D8:9B:C0:9D:F2:B2:03:66:CB"
```
Browsers such as Edge and Chrome will use the OS trust store, Firefox distributes its own trust store.
- 3 bar burger -> settings -> security -> view certificates -> authorities -> The UserTrust Network -> USERTrust RSA Certification Authority
We find the fingerprint from the openssl command "X509v3 Subject Key Identifier" matches the certificate field "subject key identifier" in Firefox.
## Configure the undercloud to use the CAs
```sh
trust list | grep label | wc -l
148
sudo cp /home/stack/CERT/stack.university.ac.uk/stack.university.ac.uk.interm.cer /etc/pki/ca-trust/source/anchors/public_ca_chain.pem
sudo update-ca-trust extract
# although the certificate chain includes 3 certificates only 1 is imported, this is the intermediate CA "CN = GEANT OV RSA CA 4" that is not part of a default trust store
trust list | grep label | wc -l
149
# check CA/trusted certificates available to the OS
trust list | grep label | grep -i "AAA Certificate Services"
label: AAA Certificate Services
trust list | grep label | grep -i "USERTrust RSA Certification Authority"
label: USERTrust RSA Certification Authority
label: USERTrust RSA Certification Authority
trust list | grep label | grep -i "GEANT OV RSA CA 4"
label: GEANT OV RSA CA 4
```
## Configure the controller nodes to use the publicly signed certificate
NOTE: "PublicTLSCAFile" is used both by the overcloud HAProxy configuration and the undercloud installer to contact https://stack.university.ac.uk:13000
- The documentation presents the "PublicTLSCAFile" configuration item as the root CA certificate.
- When the undercloud runs various custom Openstack ansible modules, the python libraries they invoke have a completely empty trust store: they do not reference the undercloud OS trust store and do not ingest shell variables to set trust store sources.
- For the python to validate the overcloud public API endpoint, the full trust chain must be present. Python is not fussy about the order of certificates in this file; the vendor CA trust chain file in this case was ordered starting with the root CA.
Backup /home/stack/templates/enable-tls.yaml `mv /home/stack/templates/enable-tls.yaml /home/stack/templates/enable-tls.yaml.internal_ca`
Create a new `/home/stack/templates/enable-tls.yaml`; the content for each field is sourced as follows:
```
PublicTLSCAFile: '/etc/pki/ca-trust/source/anchors/public_ca_chain.pem'
SSLCertificate: content from stack.university.ac.uk.cer
SSLIntermediateCertificate: use both intermediate certificates, in the order intermediate-2, intermediate-1 (RFC 5246)
SSLKey: content from stack.university.ac.uk.key
```
The fully populated /home/stack/templates/enable-tls.yaml:
NOTE: the intermediate certificates configuration item contains both intermediate certificates.
Luckily Openstack does not validate this field and pushes it directly into the HAProxy pem file; the order of the pem is as nginx prefers (RFC 5246): service certificate, intermediate CA2, intermediate CA1, root CA.
During the SSL handshake the client will check the intermediate certificates in the response; if they are not present in the local trust store, signing will be checked up to the root CA, which will be in the client trust store.
```yaml
parameter_defaults:
# Set CSRF_COOKIE_SECURE / SESSION_COOKIE_SECURE in Horizon
# Type: boolean
HorizonSecureCookies: True
# Specifies the default CA cert to use if TLS is used for services in the public network.
# Type: string
# PublicTLSCAFile: '/etc/pki/ca-trust/source/anchors/public_ca.pem'
PublicTLSCAFile: '/home/stack/templates/stack.university.ac.uk.interm.cer'
# The content of the SSL certificate (without Key) in PEM format.
# Type: string
SSLCertificate: |
-----BEGIN CERTIFICATE-----
MIIHYDCCBUigAwIBAgIRAK55qnAAkkQKzs6cusLn+0IwDQYJKoZIhvcNAQEMBQAw
.....
+vXuwEyJ5ULoW0TO6CuQvAvJsVM=
-----END CERTIFICATE-----
# The content of an SSL intermediate CA certificate in PEM format.
# Type: string
SSLIntermediateCertificate: |
-----BEGIN CERTIFICATE-----
MIIG5TCCBM2gAwIBAgIRANpDvROb0li7TdYcrMTz2+AwDQYJKoZIhvcNAQEMBQAw
.....
Ipwgu2L/WJclvd6g+ZA/iWkLSMcpnFb+uX6QBqvD6+RNxul1FaB5iHY=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFgTCCBGmgAwIBAgIQOXJEOvkit1HX02wQ3TE1lTANBgkqhkiG9w0BAQwFADB7
.....
vGp4z7h/jnZymQyd/teRCBaho1+V
-----END CERTIFICATE-----
# The content of the SSL Key in PEM format.
# Type: string
SSLKey: |
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAqXvJwxSDfxjapmRMqFlchTPPpGUi6n0lFbJ7G2YQ+HUBwaEZ
.....
PcVhU+Ybi7ABCOyRUzZWXDlf6DxF4Kgoe/Ak99nM7v0MIndlbgZBYA==
-----END RSA PRIVATE KEY-----
# ******************************************************
# Static parameters - these are values that must be
# included in the environment but should not be changed.
# ******************************************************
# The filepath of the certificate as it will be stored in the controller.
# Type: string
DeployedSSLCertificatePath: /etc/pki/tls/private/overcloud_endpoint.pem
```
## Update the overcloud nodes to have all of the CA + Intermediate CA certificates imported into their trust stores
Whilst the overcloud nodes shouldn't use the public certificate for inter-service API communication (this is not a TLS everywhere installation), include this CA chain as a caution.
Backup /home/stack/templates/inject-trust-anchor-hiera.yaml `mv /home/stack/templates/inject-trust-anchor-hiera.yaml /home/stack/templates/inject-trust-anchor-hiera.yaml.internal_ca`
Create a new `/home/stack/templates/inject-trust-anchor-hiera.yaml`; the content for each field is sourced as follows:
```yaml
CAMap:
root-ca:
content: |
"CN = AAA Certificate Services" certificate content here
intermediate-ca-1:
content: |
"CN = USERTrust RSA Certification Authority" certificate content here
intermediate-ca-2:
content: |
"CN = GEANT OV RSA CA 4" certificate content here
```
The fully populated /home/stack/templates/inject-trust-anchor-hiera.yaml.
```yaml
parameter_defaults:
# Map containing the CA certs and information needed for deploying them.
# Type: json
CAMap:
root-ca:
content: |
-----BEGIN CERTIFICATE-----
MIIEMjCCAxqgAwIBAgIBATANBgkqhkiG9w0BAQUFADB7MQswCQYDVQQGEwJHQjEb
.....
smPi9WIsgtRqAEFQ8TmDn5XpNpaYbg==
-----END CERTIFICATE-----
intermediate-ca-1:
content: |
-----BEGIN CERTIFICATE-----
MIIFgTCCBGmgAwIBAgIQOXJEOvkit1HX02wQ3TE1lTANBgkqhkiG9w0BAQwFADB7
.....
vGp4z7h/jnZymQyd/teRCBaho1+V
-----END CERTIFICATE-----
intermediate-ca-2:
content: |
-----BEGIN CERTIFICATE-----
MIIG5TCCBM2gAwIBAgIRANpDvROb0li7TdYcrMTz2+AwDQYJKoZIhvcNAQEMBQAw
.....
Ipwgu2L/WJclvd6g+ZA/iWkLSMcpnFb+uX6QBqvD6+RNxul1FaB5iHY=
-----END CERTIFICATE-----
```
## Deploy the overcloud
The FQDN of the floating IP served by the HAProxy containers on the controller nodes must have an upstream DNS A record; this should match the `CloudName:` parameter.
The DNS hosts should return the A record; for the University, both the internal DNS server and a publicly published record resolve stack.university.ac.uk.
```sh
grep CloudName: /home/stack/templates/custom-domain.yaml
CloudName: stack.university.ac.uk
grep DnsServers: /home/stack/templates/custom-domain.yaml
DnsServers: ["144.173.6.71", "1.1.1.1"]
[stack@undercloud templates]$ grep 10.121.4.14 vips.yaml
PublicVirtualFixedIPs: [{'ip_address':'10.121.4.14'}]
dig stack.university.ac.uk @144.173.6.71
dig stack.university.ac.uk @1.1.1.1
;; ANSWER SECTION:
stack.university.ac.uk. 86400 IN A 10.121.4.14
```
Use the exact same arguments as the previous deployment to mitigate any unwanted changes to the cluster, for this build the script `overcloud-deploy.sh` should be up to date with this record.
```sh
./overcloud-deploy.sh
```
The update will complete on the overcloud nodes, however the undercloud may time out contacting the external API endpoint once the SSL certificate has changed.
The HAProxy containers on the controller nodes need to be restarted to pick up the new certificates.
If you were to run the deployment again (with no changes and the HAProxy containers restarted) it should complete without issue and leave the deployment with status 'UPDATE_COMPLETE' when checking `openstack stack list`.
## Restart HAProxy containers on the controller nodes
Follow the instructions to restart the HAProxy containers on the overcloud controller nodes once the deployment has finished updating the SSL certificate.
> [https://access.redhat.com/documentation/en-us/red\_hat\_openstack\_platform/16.2/html/advanced\_overcloud\_customization/assembly\_enabling-ssl-tls-on-overcloud-public-endpoints#proc\_manually-updating-ssl-tls-certificates\_enabling-ssl-tls-on-overcloud-public-endpoints](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/advanced_overcloud_customization/assembly_enabling-ssl-tls-on-overcloud-public-endpoints#proc_manually-updating-ssl-tls-certificates_enabling-ssl-tls-on-overcloud-public-endpoints)
```sh
grep control /etc/hosts | grep ctlplane
10.122.0.30 overcloud-controller-0.ctlplane.university.ac.uk overcloud-controller-0.ctlplane
10.122.0.31 overcloud-controller-1.ctlplane.university.ac.uk overcloud-controller-1.ctlplane
10.122.0.32 overcloud-controller-2.ctlplane.university.ac.uk overcloud-controller-2.ctlplane
# for each controller node
ssh heat-admin@overcloud-controller-0.ctlplane.university.ac.uk
sudo su -
podman restart $(podman ps --format="{{.Names}}" | grep -w -E 'haproxy(-bundle-.*-[0-9]+)?')
```
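Once the containers have restarted, confirm the public endpoints now serve the new certificate (a quick check against Horizon and the Keystone public port mentioned earlier):
```sh
# confirm the served certificate issuer and validity dates
echo | openssl s_client -connect stack.university.ac.uk:443 -servername stack.university.ac.uk 2>/dev/null | openssl x509 -noout -issuer -dates
echo | openssl s_client -connect stack.university.ac.uk:13000 -servername stack.university.ac.uk 2>/dev/null | openssl x509 -noout -issuer -dates
```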
# SSL notes
Verify a full chain of certificates easily.
```sh
openssl verify -verbose -CAfile <(cat CERT/stack.university.ac.uk/intermediate_ca_2.pem CERT/stack.university.ac.uk/intermediate_ca_1.pem CERT/stack.university.ac.uk/root_ca.pem) CERT/stack.university.ac.uk/service_cert.pem
```
Check a certificate key is valid for a certificate.
```sh
openssl x509 -noout -modulus -in CERT/stack.university.ac.uk/stack.university.ac.uk.cert.cer | openssl md5
(stdin)= 60a5df743ac212edb2b28bf315bce828
openssl rsa -noout -modulus -in CERT/stack.university.ac.uk/stack.university.ac.uk.key | openssl md5
(stdin)= 60a5df743ac212edb2b28bf315bce828
```
Format of an nginx-style combined certificate; HAProxy uses the same format. When populating Openstack configuration files with multiple intermediate certs in a single field, order them as follows.
```
# create chain bundle, order as per RFC 5246 Section 7.4.2 - search for nginx cert chain order
cat ../out/service.pem ../out/ca.pem > ../out/reg-chain.pem
# the order with multiple intermediate certs would resemble
cert
int 2
int 1
root
```

# What is this?
Openstack RHOSP 16.2 (tripleo) baremetal deployment with:
- virtual undercloud
- multiple server types
- custom roles
- ldap integration
- public SSL validation on dashboard/api
- standalone opensource Ceph cluster with erasure-coding
- Nvidia Cumulus 100G switch(es) configuration with MLAG/CLAG
- training documentation - domains, projects, groups, users, flavours, quotas, provider networks, private networks
- more rough guides for manila with ceph and quay registry integration - not present here
