## What is this?
A side project to build a CI testbench for developing the various Ansible roles that make up my employer's default HPC deployment.
## Drivers
- There is no true representation of the stack suitable for CI
- The existing testbench is slow to re-provision and very manual
- There isn't enough hardware to test multiple stacks
- Corporate hypervisors are unsuitable (a Python IPMI listener fronting the Proxmox/VMware API might work, but it would need many ancillary virtual networks that could change from stack to stack)
- Updating Ansible in vi on customer systems is tedious
## Goal
The aim is to simulate bare-metal node provisioning with XCAT -> iPXE/IPMI -> virtualBMC -> QEMU VMs, and to continue developing the Ansible roles that configure the various classes of node.
## Components
Use commodity hardware as hypervisors and to model the storage and network components.
- tested on 2 and 3 nodes, each with a single NVMe drive and a single NIC
Generate a static or dynamic Ansible inventory natively via the XCAT API.
- working model
- networks still need to be pulled from XCAT
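As a rough illustration of the idea, the inventory generated from XCAT group membership might take a shape like the following (group names, host names, and the `bmc_address` variable are all illustrative, not taken from the real deployment):

```yaml
# Sketch of an Ansible inventory derived from XCAT groups: each XCAT
# group becomes an inventory group, and per-node attributes from the
# XCAT node definitions become host variables.
all:
  children:
    compute:
      hosts:
        node001:
          bmc_address: 10.1.0.1   # e.g. pulled from the XCAT node object
        node002:
          bmc_address: 10.1.0.2
    login:
      hosts:
        login01: {}
```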
Use a dynamic role model triggered by XCAT group membership.
- existing working model
- all Ansible variables are imported under a top-level object, ready for keypairDB integration
- various helper roles deep-merge dictionaries and lists for individual site/deployment customisations
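A minimal sketch of what group-triggered roles can look like, assuming the XCAT groups are exported as inventory groups (the group and role names here are illustrative): membership alone decides which roles a node receives.

```yaml
# Each play targets an inventory group that mirrors an XCAT group,
# so adding a node to a group in XCAT is enough to change its role set.
- hosts: compute            # XCAT group exported into the inventory
  roles:
    - role: hpc_compute
- hosts: login
  roles:
    - role: hpc_login
```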
Use point-to-point VXLAN tunnels between each hypervisor to simulate the various cluster networks.
- working model that will scale to many hypervisors
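A sketch of one such point-to-point tunnel, expressed as Ansible tasks (interface name, VNI, parent device, and peer address are all illustrative); one tunnel per simulated cluster network per hypervisor pair.

```yaml
# Creates a VXLAN interface towards a single peer hypervisor.
- name: Create VXLAN interface towards the peer hypervisor
  ansible.builtin.command: >
    ip link add vxlan100 type vxlan id 100
    dev eth0 remote 192.168.0.2 dstport 4789

- name: Bring the VXLAN interface up
  ansible.builtin.command: ip link set vxlan100 up
```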
Use hyperconverged Ceph to provide RBD for VM disk images, and CephFS + Ganesha for the NFS mounts hosting scheduler/HPC software.
- recent Ceph releases are almost entirely YAML-spec driven, which enables automation; most existing Ansible tooling is behind
- cluster build automation complete
- OSD + Pools complete
- RBD complete
- NFS outstanding
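To show why the YAML-spec-driven model helps automation, here is the kind of service spec the Ceph orchestrator consumes (via `ceph orch apply -i`); the service id, host pattern, and device path are illustrative.

```yaml
# OSD service spec: the orchestrator creates OSDs on every matching
# host/device, so the whole storage layout lives in version control.
service_type: osd
service_id: testbench_osds
placement:
  host_pattern: 'hv*'      # all hypervisor nodes
spec:
  data_devices:
    paths:
      - /dev/nvme0n1       # the single NVMe in each node
```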
Deploy the XCAT container and seed it with an inventory of the to-be-provisioned VMs.
- to complete
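Seeding could look something like the following Ansible task wrapping XCAT's `mkdef` (the node name, MAC, BMC address, and group are illustrative placeholders):

```yaml
# Defines one to-be-provisioned VM as an XCAT node object, managed
# over IPMI (which virtualBMC will answer for).
- name: Define a to-be-provisioned VM in XCAT
  ansible.builtin.command: >
    mkdef -t node node001 groups=compute,all
    mac=52:54:00:12:34:56 mgt=ipmi bmc=10.1.0.1
    arch=x86_64
```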
Deploy virtualBMC
- working model
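Registering a VM with virtualBMC so XCAT can drive it over IPMI might look like this (domain name, port, and credentials are illustrative):

```yaml
# vbmc binds an IPMI listener to a libvirt domain; XCAT then
# power-controls the "bare metal" node as if it had a real BMC.
- name: Add a virtual BMC for the VM
  ansible.builtin.command: >
    vbmc add node001 --port 6230
    --username admin --password password

- name: Start the virtual BMC
  ansible.builtin.command: vbmc start node001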
Deploy QEMU with an RBD-backed disk.
- to complete
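A sketch of what booting such a VM could involve (pool/image name and resources are illustrative; in practice the VM would be defined through libvirt so virtualBMC can power-cycle it):

```yaml
# Boots a VM whose disk lives on Ceph RBD and network-boots it,
# handing control to the XCAT iPXE provisioning chain.
- name: Boot a test VM from an RBD image
  ansible.builtin.command: >
    qemu-system-x86_64 -m 4096 -smp 2 -enable-kvm
    -drive file=rbd:vms/node001,format=raw,if=virtio
    -boot n
```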