> RHOSP tripleo can also deploy Ceph
|
|
> To separate the storage deployment from the Openstack deployment and simplify any DR/Recovery/Redeployment, we will create a stand-alone Ceph cluster and integrate it with the Openstack overcloud
|
|
> Opensource Ceph can be installed for further cost saving
|
|
> [https://access.redhat.com/documentation/en-us/red\_hat\_openstack\_platform/16.1/html-single/integrating\_an\_overcloud\_with\_an\_existing\_red\_hat\_ceph\_cluster/index](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/integrating_an_overcloud_with_an_existing_red_hat_ceph_cluster/index)
|
|
|
|
# Ceph pacific setup
|
|
|
|
## Get access to ceph nodes
|
|
|
|
- Rocky linux
|
|
- 3 Physical Nodes
|
|
- 1G Ethernet for Openstack control plane management network
|
|
- 2 x 25G Ethernet LACP bond for all service networks
|
|
- 4 disk per node, 2 OS RAID1 in BIOS, 2 Ceph 960GB
|
|
- Setup OS disk as LVM boot 1GB, root 240GB, swap 4GB
|
|
- Credentials - root:Password0
|
|
|
|
# Ceph architecture
|
|
|
|
Ceph services:
|
|
- 3 monitors
- 3 managers
- 6 osd
- 3 mds (2 standby) - not being used
- 3 rgw (2 standby - fronted by LBL) - not being used
|
|
|
|
Networks:
|
|
|
|
- 'Ceph public network' (Ceph services) VLAN13, this is the same network as the 'Openstack storage network'.
|
|
- 'Ceph cluster network' (OSD replication+services) VLAN15.
|
|
- 'Openstack storage management network' VLAN14, this network is a prerequisite of the Openstack Tripleo installer; it may not be used with an external Ceph installation, but it is added to cover all bases.
|
|
- 'Openstack control plane network' VLAN1(native), this network will serve as the main ingress to the Ceph cluster nodes.
|
|
- 'Openstack external network' VLAN1214, this network has an externally routable gateway.
|
|
|
|
| Network | VLAN | Interface | IP Range | Gateway | DNS |
|
|
| --- | --- | --- | --- | --- | --- |
|
|
| Ceph public<br>(Openstack storage) | 13 | bond0 | 10.122.10.0/24 | NA | NA |
|
|
| Ceph cluster | 15 | bond0 | 10.122.14.0/24 | NA | NA |
|
|
| Openstack storage management | 14 | bond0 | 10.122.12.0/24 | NA | NA |
|
|
| Openstack control plane | 1(native) | ens4f0 | 10.122.0.0/24 | NA | NA |
|
|
| Openstack external | 1214 | bond0 | 10.121.4.0/24 | 10.121.4.1 | 144.173.6.71<br>1.1.1.1 |
|
|
|
|
IP allocation:
|
|
|
|
> For all ranges, addresses 7-13 in the last octet are reserved for Ceph; beyond the three node addresses the remainder are spare, either for additional nodes or for RGW/LoadBalancer services.
|
|
|
|
| Node | ceph1 | ceph2 | ceph3 |
|
|
| --- | --- | --- | --- |
|
|
| Ceph public<br>(Openstack storage) | 10.122.10.7 | 10.122.10.8 | 10.122.10.9 |
|
|
| Ceph cluster | 10.122.14.7 | 10.122.14.8 | 10.122.14.9 |
|
|
| Openstack storage management | 10.122.12.7 | 10.122.12.8 | 10.122.12.9 |
|
|
| Openstack control plane | 10.122.0.7 | 10.122.0.8 | 10.122.0.9 |
|
|
| Openstack external | 10.121.4.7 | 10.121.4.8 | 10.121.4.9 |
|
|
|
|
# Configure OS
|
|
|
|
> Perform all actions on all nodes unless specified.
|
|
> Substitute IPs and hostnames appropriately.
|
|
|
|
## Configure networking
|
|
|
|
Configure networking with the nmcli method. Connect to the console via the out-of-band interface and configure the management interface.
|
|
|
|
```
|
|
# likely have NetworkManager enabled on RHEL8 based OS
|
|
systemctl list-unit-files --state=enabled | grep -i NetworkManager
|
|
|
|
# create management interface
|
|
# nmcli con add type ethernet ifname ens4f0 con-name openstack-ctlplane connection.autoconnect yes ip4 10.122.0.7/24
|
|
nmcli con add type ethernet ifname ens9f0 con-name openstack-ctlplane connection.autoconnect yes ip4 10.122.0.7/24
|
|
```
|
|
|
|
Connect via SSH to configure the bond and VLANS.
|
|
|
|
```
|
|
# create bond interface and add slave interfaces
|
|
nmcli con add type bond ifname bond0 con-name bond0 bond.options "mode=802.3ad, miimon=100, downdelay=0, updelay=0" connection.autoconnect yes ipv4.method disabled ipv6.method ignore
|
|
# nmcli con add type ethernet ifname ens2f0 master bond0
|
|
# nmcli con add type ethernet ifname ens2f1 master bond0
|
|
nmcli con add type ethernet ifname ens3f0 master bond0
|
|
nmcli con add type ethernet ifname ens3f1 master bond0
|
|
nmcli device status
|
|
|
|
# create vlan interfaces
|
|
nmcli con add type vlan ifname bond0.13 con-name ceph-public id 13 dev bond0 connection.autoconnect yes ip4 10.122.10.7/24
|
|
nmcli con add type vlan ifname bond0.15 con-name ceph-cluster id 15 dev bond0 connection.autoconnect yes ip4 10.122.14.7/24
|
|
nmcli con add type vlan ifname bond0.14 con-name openstack-storage_mgmt id 14 dev bond0 connection.autoconnect yes ip4 10.122.12.7/24
|
|
nmcli con add type vlan ifname bond0.1214 con-name openstack-external id 1214 dev bond0 connection.autoconnect yes ip4 10.121.4.7/24 gw4 10.121.4.1 ipv4.dns 144.173.6.71,1.1.1.1 ipv4.dns-search local
|
|
|
|
# check all devices are up
|
|
nmcli device status
|
|
nmcli con show
|
|
nmcli con show bond0
|
|
|
|
# check LACP settings
|
|
cat /proc/net/bonding/bond0
|
|
|
|
# remove connection profiles
|
|
nmcli con show
|
|
nmcli con del openstack-ctlplane
|
|
nmcli con del ceph-public
|
|
nmcli con del ceph-cluster
|
|
nmcli con del openstack-storage_mgmt
|
|
nmcli con del openstack-external
|
|
nmcli con del bond-slave-ens3f0
nmcli con del bond-slave-ens3f1
|
|
nmcli con del bond0
|
|
nmcli con show
|
|
nmcli device status
|
|
```
|
|
|
|
## Install useful tools and enable Podman
|
|
|
|
```sh
|
|
dnf update -y ;\
|
|
dnf install nano lvm2 chrony telnet traceroute wget tar nmap tmux bind-utils net-tools podman python3 mlocate ipmitool tmux wget yum-utils -y ;\
|
|
systemctl enable podman ;\
|
|
systemctl start podman
|
|
```
|
|
|
|
## Setup hostnames
|
|
|
|
Cephadm install tool specific setup: Ceph prefers to talk to its peers using IPs (FQDNs require more setup and are not recommended in the documentation).
|
|
|
|
```sh
|
|
echo "10.122.10.7 ceph1" | tee -a /etc/hosts ;\
|
|
echo "10.122.10.8 ceph2" | tee -a /etc/hosts ;\
|
|
echo "10.122.10.9 ceph3" | tee -a /etc/hosts
|
|
|
|
hostnamectl set-hostname ceph1 # this should not be an FQDN such as ceph1.local (as recommended in ceph documentation)
|
|
hostnamectl set-hostname --transient ceph1
|
|
```
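A quick check that the short names resolve locally the way cephadm expects; a minimal sketch using standard tooling:

```sh
# confirm the static (and any transient) hostname is the short form
hostnamectl status | grep -i hostname

# confirm the peer names resolve via /etc/hosts to the Ceph public network
getent hosts ceph1 ceph2 ceph3
```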
|
|
|
|
## Setup NTP
|
|
|
|
```
|
|
dnf install chrony -y
|
|
timedatectl set-timezone Europe/London
|
|
nano -cw /etc/chrony.conf
|
|
|
|
server ntp.university.ac.uk iburst
|
|
pool 2.cloudlinux.pool.ntp.org iburst
|
|
|
|
systemctl enable chronyd
|
|
systemctl start chronyd
|
|
```
|
|
|
|
## Disable annoyances
|
|
|
|
```
|
|
systemctl disable firewalld
|
|
systemctl stop firewalld
|
|
|
|
# DO NOT DISABLE SELINUX - now a requirement of Ceph, containers will not start without SELINUX enforcing
|
|
#sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
|
|
#getenforce
|
|
#setenforce 0
|
|
#getenforce
|
|
```
|
|
|
|
## Reboot
|
|
|
|
```sh
|
|
reboot
|
|
```
|
|
|
|
# Ceph install
|
|
|
|
## Download cephadm deployment tool
|
|
|
|
```
|
|
#curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
|
|
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
|
|
chmod +x cephadm
|
|
```
|
|
|
|
## Add the Ceph yum repo and install the cephadm tool to the system, then remove the installer.
|
|
|
|
```
|
|
# this may not be required with the pacific version of cephadm
|
|
# add rockylinux / almalinux to the accepted distributions in the installer
|
|
nano -cw cephadm
|
|
|
|
class YumDnf(Packager):
|
|
DISTRO_NAMES = {
|
|
'rocky' : ('centos', 'el'),
|
|
'almalinux': ('centos', 'el'),
|
|
'centos': ('centos', 'el'),
|
|
'rhel': ('centos', 'el'),
|
|
'scientific': ('centos', 'el'),
|
|
'fedora': ('fedora', 'fc'),
|
|
}
|
|
|
|
./cephadm add-repo --release pacific
|
|
./cephadm install
|
|
which cephadm
|
|
rm ./cephadm
|
|
```
|
|
|
|
## Bootstrap the first mon node
|
|
|
|
> This action should be performed ONLY on ceph1.
|
|
|
|
- Bootstrap the mon daemon on this node, using the mon network interface (referred to as the public network in ceph documentation).
|
|
- Bootstrap will pull the correct docker image and setup the host config files and systemd scripts (to start daemon containers).
|
|
- The /etc/ceph/ceph.conf config is populated with a unique cluster fsid and the mon0 host connection profile.
|
|
|
|
```
|
|
mkdir -p /etc/ceph
|
|
cephadm bootstrap --mon-ip 10.122.10.7 --skip-mon-network --cluster-network 10.122.14.0/24
|
|
|
|
# copy the output of the command to file
|
|
|
|
Ceph Dashboard is now available at:
|
|
|
|
URL: https://ceph1:8443/
|
|
User: admin
|
|
Password: Password0
|
|
|
|
Enabling client.admin keyring and conf on hosts with "admin" label
|
|
Enabling autotune for osd_memory_target
|
|
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
|
|
|
|
sudo /usr/sbin/cephadm shell --fsid 5b99e574-4577-11ed-b70e-e43d1a63e590 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
|
|
|
|
Or, if you are only running a single cluster on this host:
|
|
|
|
sudo /usr/sbin/cephadm shell
|
|
|
|
cat /etc/ceph/ceph.conf
|
|
|
|
# minimal ceph.conf for 5b99e574-4577-11ed-b70e-e43d1a63e590
|
|
[global]
|
|
fsid = 5b99e574-4577-11ed-b70e-e43d1a63e590
|
|
mon_host = [v2:10.122.10.7:3300/0,v1:10.122.10.7:6789/0]
|
|
```
|
|
|
|
## Install the ceph cli on the first mon node
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
The CLI can also be used via a container shell without installation; the cephadm installation method configures the CLI tool to target the container daemons.
|
|
|
|
```
|
|
cephadm install ceph-common
|
|
ceph -v
|
|
|
|
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
|
|
|
|
# ceph status
|
|
ceph -s
|
|
|
|
cluster:
|
|
id: 5b99e574-4577-11ed-b70e-e43d1a63e590
|
|
health: HEALTH_WARN
|
|
OSD count 0 < osd_pool_default_size 3
|
|
|
|
services:
|
|
mon: 1 daemons, quorum ceph1 (age 2m)
|
|
mgr: ceph1.virprg(active, since 46s)
|
|
osd: 0 osds: 0 up, 0 in
|
|
|
|
data:
|
|
pools: 0 pools, 0 pgs
|
|
objects: 0 objects, 0 B
|
|
usage: 0 B used, 0 B / 0 B avail
|
|
pgs:
|
|
```
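If you prefer not to install ceph-common on a host, the same checks can be run through the cephadm container shell instead; a minimal sketch:

```sh
# run a one-off command inside the cephadm toolbox container
cephadm shell -- ceph -s

# or open an interactive shell with the cluster config and admin keyring mounted
cephadm shell
ceph -s
exit
```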
|
|
|
|
## Push ceph ssh pub key to other ceph nodes
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
```
|
|
ceph cephadm get-pub-key > ~/ceph.pub
|
|
for i in {2..3};do ssh-copy-id -f -i ~/ceph.pub root@ceph$i;done
|
|
```
|
|
|
|
Test connectivity of the ceph key.
|
|
|
|
```
|
|
ceph config-key get mgr/cephadm/ssh_identity_key > ~/ceph.pvt
|
|
chmod 0600 ~/ceph.pvt
|
|
ssh -i ceph.pvt root@ceph2
|
|
ssh -i ceph.pvt root@ceph3
|
|
```
|
|
|
|
## Add more mon nodes
|
|
|
|
> This action should be performed on ceph1.
|
|
> The `_admin` label populates the /etc/ceph config files to allow CLI usage on each host.
|
|
|
|
```
|
|
ceph orch host add ceph2 10.122.10.8 --labels _admin
|
|
ceph orch host add ceph3 10.122.10.9 --labels _admin
|
|
```
|
|
|
|
## Install the ceph cli on the remaining nodes
|
|
|
|
```
|
|
ssh -i ceph.pvt root@ceph2
|
|
cephadm install ceph-common
|
|
ceph -s
|
|
exit
|
|
|
|
ssh -i ceph.pvt root@ceph3
|
|
cephadm install ceph-common
|
|
ceph -s
|
|
exit
|
|
```
|
|
|
|
## Set the operating networks (the public and cluster networks are separate VLANs on the same bond)
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
```
|
|
ceph config set global public_network 10.122.10.0/24
|
|
ceph config set global cluster_network 10.122.14.0/24
|
|
ceph config dump
|
|
```
|
|
|
|
## Add all labels to the node
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
These are arbitrary label values to assist with service placement; however, there are special labels with built-in functionality such as '_admin'.
|
|
|
|
> https://docs.ceph.com/en/latest/cephadm/host-management/
|
|
|
|
```
|
|
ceph orch host label add ceph1 mon ;\
|
|
ceph orch host label add ceph1 osd ;\
|
|
ceph orch host label add ceph1 mgr ;\
|
|
ceph orch host label add ceph1 mds ;\
|
|
ceph orch host label add ceph1 rgw ;\
|
|
ceph orch host label add ceph2 mon ;\
|
|
ceph orch host label add ceph2 osd ;\
|
|
ceph orch host label add ceph2 mgr ;\
|
|
ceph orch host label add ceph2 mds ;\
|
|
ceph orch host label add ceph2 rgw ;\
|
|
ceph orch host label add ceph3 mon ;\
|
|
ceph orch host label add ceph3 osd ;\
|
|
ceph orch host label add ceph3 mgr ;\
|
|
ceph orch host label add ceph3 mds ;\
|
|
ceph orch host label add ceph3 rgw ;\
|
|
ceph orch host ls
|
|
|
|
HOST ADDR LABELS STATUS
|
|
ceph1 10.122.10.7 _admin mon osd mgr mds rgw
|
|
ceph2 10.122.10.8 _admin mon osd mgr mds rgw
|
|
ceph3 10.122.10.9 _admin mon osd mgr mds rgw
|
|
3 hosts in cluster
|
|
```
|
|
|
|
## Deploy core daemons to hosts
|
|
|
|
> This action should be performed on ceph1.
|
|
> More daemons will be applied as they are added.
|
|
> https://docs.ceph.com/en/latest/cephadm/services/#orchestrator-cli-placement-spec
|
|
|
|
```
|
|
#ceph orch apply mon --placement="label:mon" --dry-run
|
|
ceph orch apply mon --placement="label:mon"
|
|
ceph orch apply mgr --placement="label:mgr"
|
|
ceph orch ls # keep checking until all services are up, should be <1 minute
|
|
|
|
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
|
|
alertmanager ?:9093,9094 1/1 25s ago 36m count:1
|
|
crash 3/3 111s ago 36m *
|
|
grafana ?:3000 1/1 25s ago 36m count:1
|
|
mgr 3/3 111s ago 43s label:mgr
|
|
mon 3/3 111s ago 50s label:mon
|
|
node-exporter ?:9100 3/3 111s ago 36m *
|
|
prometheus ?:9095 1/1 25s ago 36m count:1
|
|
```
|
|
|
|
## Setup the mgr dashboard to listen on a specific IP (the only routable range in this case)
|
|
|
|
> This action should be performed on ceph1.
|
|
> https://docs.ceph.com/en/latest/mgr/dashboard/
|
|
|
|
When adding multiple dashboards only one node will host the active dashboard and the others will be in standby; should you connect to another host at https://host:8443 you will be redirected to the active dashboard node.
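To confirm which node currently holds the active dashboard, the active mgr can be queried directly; a minimal sketch (ceph2 here stands in for whichever node is currently a standby):

```sh
# show the active mgr (and therefore the active dashboard node) plus standbys
ceph mgr stat

# querying a standby node should answer with a redirect to the active dashboard, as described above
curl -ksI https://ceph2:8443/ | head -n 3
```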
|
|
|
|
```sh
|
|
# the dashboard is not run on the public_network but on the routable network; the Openstack dashboard will also live here
|
|
ceph config set mgr mgr/dashboard/ceph1/server_addr 10.121.4.7 ;\
|
|
ceph config set mgr mgr/dashboard/ceph2/server_addr 10.121.4.8 ;\
|
|
ceph config set mgr mgr/dashboard/ceph3/server_addr 10.121.4.9
|
|
|
|
# stop/start ceph
|
|
systemctl stop ceph.target;sleep 5;systemctl start ceph.target
|
|
|
|
# check service endpoints, likely the mgr service is running on ceph1 with ceph2/3 acting as standby
|
|
ceph mgr services
|
|
|
|
{
|
|
"dashboard": "https://10.122.10.7:8443/",
|
|
"prometheus": "http://10.122.10.7:9283/"
|
|
}
|
|
|
|
# the dashboard seems to listen on any interface
|
|
ss -taln | grep 8443
|
|
|
|
LISTEN 0 5 *:8443 *:*
|
|
|
|
# config confirms dashboard listening address
|
|
ceph config dump | grep "mgr/dashboard/ceph1/server_addr"
|
|
|
|
mgr advanced mgr/dashboard/ceph1/server_addr 10.121.4.7
|
|
```
|
|
|
|
Reset dashboard admin user password.
|
|
|
|
```
|
|
ceph dashboard ac-user-show
|
|
["admin"]
|
|
|
|
echo 'Password0' > password.txt
|
|
ceph dashboard ac-user-set-password admin -i password.txt
|
|
rm -f password.txt
|
|
```
|
|
|
|
The socket listing shows Grafana is also listening on ceph1.
|
|
|
|
> https://ceph1:8443/ Dashboard
|
|
> https://ceph1:3000/ Grafana
|
|
> http://ceph1:9283/ Prometheus
|
|
|
|
## Ceph OSD
|
|
|
|
### Add OSD
|
|
|
|
> The drive-groups method is a new way to specify which disks are to be made OSDs (types - data, db, wal); you can select disks by cluster node, by path, by serial number, by model or by size - this is useful for large estates and very fast.
|
|
> https://docs.ceph.com/en/latest/cephadm/services/osd/#drivegroups
|
|
> https://docs.ceph.com/en/pacific/rados/configuration/bluestore-config-ref/
|
|
> https://docs.ceph.com/en/octopus/cephadm/drivegroups
|
|
|
|
These instructions are fairly new but work with OSDs nested on LVM volumes as well as whole disks, which will probably be the standard in future.
|
|
|
|
- Perform any disk prep if required
|
|
- Enter container shell.
|
|
- Seed keyring with OSD credential.
|
|
- Prepare OSD (import into mon map with keys etc).
|
|
- Signal to the host to create OSD daemon containers.
|
|
|
|
For the production cluster build, each node will create a logical volume on each of the 8 spinning disks; the SSD will be carved into 8 logical volumes, with each volume acting as the wal/db device for one spinning disk.
|
|
|
|
Create the logical volumes on each node:
|
|
|
|
```
|
|
# find OSD disks
|
|
lsblk
|
|
|
|
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
|
|
sda 8:0 0 223.5G 0 disk
|
|
├─sda1 8:1 0 600M 0 part /boot/efi
|
|
├─sda2 8:2 0 1G 0 part /boot
|
|
└─sda3 8:3 0 221.9G 0 part
|
|
├─rl-root 253:0 0 217.9G 0 lvm /
|
|
└─rl-swap 253:1 0 4G 0 lvm [SWAP]
|
|
sdb 8:16 0 1.5T 0 disk
|
|
sdc 8:32 0 12.8T 0 disk
|
|
sdd 8:48 0 12.8T 0 disk
|
|
sde 8:64 0 12.8T 0 disk
|
|
sdf 8:80 0 12.8T 0 disk
|
|
sdg 8:96 0 12.8T 0 disk
|
|
sdh 8:112 0 12.8T 0 disk
|
|
sdi 8:128 0 12.8T 0 disk
|
|
sdj 8:144 0 12.8T 0 disk
|
|
|
|
# create volume groups on each disk
|
|
vgcreate ceph-block-0 /dev/sdc ;\
|
|
vgcreate ceph-block-1 /dev/sdd ;\
|
|
vgcreate ceph-block-2 /dev/sde ;\
|
|
vgcreate ceph-block-3 /dev/sdf ;\
|
|
vgcreate ceph-block-4 /dev/sdg ;\
|
|
vgcreate ceph-block-5 /dev/sdh ;\
|
|
vgcreate ceph-block-6 /dev/sdi ;\
|
|
vgcreate ceph-block-7 /dev/sdj
|
|
|
|
# create logical volumes on each volume group
|
|
lvcreate -l 100%FREE -n block-0 ceph-block-0 ;\
|
|
lvcreate -l 100%FREE -n block-1 ceph-block-1 ;\
|
|
lvcreate -l 100%FREE -n block-2 ceph-block-2 ;\
|
|
lvcreate -l 100%FREE -n block-3 ceph-block-3 ;\
|
|
lvcreate -l 100%FREE -n block-4 ceph-block-4 ;\
|
|
lvcreate -l 100%FREE -n block-5 ceph-block-5 ;\
|
|
lvcreate -l 100%FREE -n block-6 ceph-block-6 ;\
|
|
lvcreate -l 100%FREE -n block-7 ceph-block-7
|
|
|
|
# create volume groups on the SSD disk
|
|
vgcreate ceph-db-0 /dev/sdb
|
|
|
|
# divide the SSD disk into 8 logical volumes to provide a DB device
|
|
lvcreate -L 180GB -n db-0 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-1 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-2 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-3 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-4 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-5 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-6 ceph-db-0 ;\
|
|
lvcreate -L 180GB -n db-7 ceph-db-0
|
|
```
|
|
|
|
Write the OSD service spec file and apply it; this should only be run on a single _admin node, ceph1.
|
|
|
|
```
|
|
# enter into a container with the toolchain and keys
|
|
cephadm shell -m /var/lib/ceph
|
|
|
|
# pull credentials from the database to a file for the ceph-volume tool
|
|
ceph auth get-or-create client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
|
|
|
|
# If there is an issue ingesting a disk as an OSD, all partition structures can be destroyed with the following commands
|
|
#ceph-volume lvm zap /dev/sdb
|
|
#sgdisk --zap-all /dev/sdb
|
|
# there are a few methods to rescan disks and have the kernel re-address them; often a reboot is the quickest way to get OSDs recognised after scheduling for ingestion
|
|
# exit
|
|
# reboot
|
|
|
|
# Example methods of provisioning disk as OSD via the cli, **use the service spec yaml method**
|
|
|
|
## for LVM
|
|
#ceph-volume lvm prepare --data /dev/almalinux/osd0 --no-systemd
|
|
#ceph cephadm osd activate ceph1 # magic command that creates the systemd unit file(s) on the host to bring up an OSD daemon container
|
|
#ceph-volume lvm list
|
|
|
|
## for whole disk, manual method, this is probably a legacy method but is reliable
|
|
#ceph orch daemon add osd ceph1:/dev/sda
|
|
#ceph orch daemon add osd ceph1:/dev/sdb
|
|
|
|
# **Preferred method of provisioning using a service specification**
|
|
|
|
## service spec method
|
|
## for whole disk or LVM, new drive-groups method with a single configuration and one-shot command
|
|
# only needs to be performed on one node, ceph1
|
|
# you can perform this on the native operating system, this will help put the osd_spec.yml file in source control
|
|
# for LVM partitions on whole disks at the University this was done in the cephadm shell (cephadm shell -m /var/lib/ceph) as this is where the ceph orch command seemed to work
|
|
|
|
# for use of any kind of discovery based auto selection of the disk you can query a disk to get traits, this should work on whole disk and LVMs alike
|
|
# ceph-volume inventory /dev/ceph-block-0/block-0
|
|
#
|
|
# ====== Device report /dev/ceph-db-0/db-0 ======
|
|
#
|
|
# path /dev/ceph-db-0/db-0
|
|
# lsm data {}
|
|
# available False
|
|
# rejected reasons Device type is not acceptable. It should be raw device or partition
|
|
# device id
|
|
# --- Logical Volume ---
|
|
# name db-0
|
|
# comment not used by ceph
|
|
|
|
# create the service spec file; it will include multiple yaml documents delimited by ---
|
|
vi osd_spec.yml
|
|
|
|
---
|
|
service_type: osd
|
|
service_id: block-0
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-0/block-0
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-0
|
|
---
|
|
service_type: osd
|
|
service_id: block-1
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-1/block-1
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-1
|
|
---
|
|
service_type: osd
|
|
service_id: block-2
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-2/block-2
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-2
|
|
---
|
|
service_type: osd
|
|
service_id: block-3
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-3/block-3
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-3
|
|
---
|
|
service_type: osd
|
|
service_id: block-4
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-4/block-4
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-4
|
|
---
|
|
service_type: osd
|
|
service_id: block-5
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-5/block-5
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-5
|
|
---
|
|
service_type: osd
|
|
service_id: block-6
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-6/block-6
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-6
|
|
---
|
|
service_type: osd
|
|
service_id: block-7
|
|
placement:
|
|
hosts:
|
|
- ceph1
|
|
- ceph2
|
|
- ceph3
|
|
spec:
|
|
data_devices:
|
|
paths:
|
|
- /dev/ceph-block-7/block-7
|
|
db_devices:
|
|
paths:
|
|
- /dev/ceph-db-0/db-7
|
|
|
|
ceph orch apply -i osd_spec.yml # creates the systemd unit file(s) on the host to bring up OSD daemon containers (1 container per OSD)
|
|
|
|
# exit the container
|
|
|
|
# wait whilst OSDs are created, you will see a container per OSD
|
|
podman ps -a
|
|
ceph status
|
|
|
|
cluster:
|
|
id: 5b99e574-4577-11ed-b70e-e43d1a63e590
|
|
health: HEALTH_OK
|
|
|
|
services:
|
|
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 75m)
|
|
mgr: ceph1.fgnquq(active, since 75m), standbys: ceph2.whhrir, ceph3.mxipmg
|
|
osd: 24 osds: 24 up (since 2m), 24 in (since 3m)
|
|
|
|
data:
|
|
pools: 1 pools, 1 pgs
|
|
objects: 0 objects, 0 B
|
|
usage: 4.2 TiB used, 306 TiB / 310 TiB avail
|
|
pgs: 1 active+clean
|
|
|
|
# check OSD tree
|
|
ceph osd df tree
|
|
|
|
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
|
|
-1 309.82068 - 310 TiB 4.2 TiB 19 MiB 0 B 348 MiB 306 TiB 1.36 1.00 - root default
|
|
-3 103.27356 - 103 TiB 1.4 TiB 6.3 MiB 0 B 116 MiB 102 TiB 1.36 1.00 - host ceph1
|
|
0 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.0
|
|
4 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.4
|
|
8 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.8
|
|
11 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.11
|
|
12 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.12
|
|
16 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.16
|
|
18 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.18
|
|
23 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 1 up osd.23
|
|
-5 103.27356 - 103 TiB 1.4 TiB 6.3 MiB 0 B 116 MiB 102 TiB 1.36 1.00 - host ceph2
|
|
1 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 1 up osd.1
|
|
3 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.3
|
|
6 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.6
|
|
9 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.9
|
|
14 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.14
|
|
15 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.15
|
|
19 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.19
|
|
22 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.22
|
|
-7 103.27356 - 103 TiB 1.4 TiB 6.3 MiB 0 B 116 MiB 102 TiB 1.36 1.00 - host ceph3
|
|
2 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.2
|
|
5 hdd 12.90919 1.00000 13 TiB 180 GiB 804 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.5
|
|
7 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.7
|
|
10 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 15 MiB 13 TiB 1.36 1.00 0 up osd.10
|
|
13 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.13
|
|
17 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.17
|
|
20 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 1 up osd.20
|
|
21 hdd 12.90919 1.00000 13 TiB 180 GiB 808 KiB 0 B 14 MiB 13 TiB 1.36 1.00 0 up osd.21
|
|
TOTAL 310 TiB 4.2 TiB 19 MiB 0 B 348 MiB 306 TiB 1.36
|
|
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
|
|
```
|
|
|
|
Deleting OSDs: at least one OSD should be left for the metrics/config pools to function. Removing all OSDs will tank an install and is only useful when removing a Ceph cluster entirely; usually you would rebuild fresh.
|
|
|
|
```
|
|
# remove all OSDs, this is only useful if you intend to destroy the ceph cluster - DANGEROUS
|
|
# doesn't really work when all OSDs are removed as key operating pools are destroyed, not just degraded
|
|
|
|
#!/bin/bash
|
|
for i in {0..23}
|
|
do
|
|
ceph osd out osd.$i
|
|
ceph osd down osd.$i
|
|
ceph osd rm osd.$i
|
|
ceph osd crush rm osd.$i
|
|
ceph auth del osd.$i
|
|
ceph osd destroy $i --yes-i-really-mean-it
|
|
ceph orch daemon rm osd.$i --force
|
|
ceph osd df tree
|
|
done
|
|
ceph osd crush rm ceph1
|
|
ceph osd crush rm ceph2
|
|
ceph osd crush rm ceph3
|
|
```
|
|
|
|
## Enable autotune memory usage on OSD nodes
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
```
|
|
ceph config set osd osd_memory_target_autotune true
|
|
ceph config get osd osd_memory_target_autotune
|
|
```
|
|
|
|
## Enable placement group autoscaling for any pool subsequently added
|
|
|
|
> This action should be performed on ceph1.
|
|
|
|
```
|
|
ceph config set global osd_pool_default_pg_autoscale_mode on
|
|
ceph osd pool autoscale-status
|
|
```
|
|
|
|
# Erasure coding
|
|
|
|
## Understanding EC
|
|
|
|
The ruleset for EC is not so clear, especially for small clusters; the following explanation/rules should be followed for a small 3 node Ceph cluster. In reality you have only one available scheme: K=2, M=1.
|
|
|
|
- K - the number of chunks the original data is divided into
- M - the extra codes (basically parity) stored alongside the data
- N - the number of chunks created for each piece of data, K+M
- Crush failure domain - can be OSD, RACK, HOST (and a few more if listed in the crushmap, such as PDU, DATACENTRE); this dictates the dispersal of the M chunks (and presumably the K chunks also, to allow for larger schemes such as RACK).
- Failure domains - OSD seems to be only for testing; HOST is the most typical use case; RACK seems very sensible but requires many nodes.
- **What you won't find documented clearly is that there need to be at least as many hosts as K+M when using the HOST scheme for resiliency.**
- A 3 node cluster can only support K=2,M=1.
- In a RACK failure domain, say there are 4 racks (most likely with an equal number of nodes and OSDs), you could have K=3,M=1, allowing for 1 total rack failure.
- EC originally supported RGW object storage only; RBD pools are now supported (using ec_overwrites) but the pool metadata must still reside on a replicated pool. Openstack has an undocumented setting to use the metadata/data pool pair.
|
|
|
|
3 node configuration, OSD vs HOST, illustrating the failure domain scheme differences:
|
|
|
|
- Using K=2,M=1 and an OSD failure domain could mean host1 gets K=1,M=1 and host2 gets K=1. If host1 goes down you won't be able to recreate the data.
- Using K=2,M=1 and a HOST failure domain would mean host1 gets K=1, host2 gets K=1, host3 gets M=1 - each node gets a K or M, data and parity are dispersed equally, and 1 full node failure is tolerated (the crush rule check sketched below shows which failure domain a pool is actually using).
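The failure domain actually in force can be confirmed from the crush rule attached to the EC pool once it exists; a minimal sketch, re-using the `volumes_data` pool created later in this document:

```sh
# find the crush rule used by the EC data pool
ceph osd pool get volumes_data crush_rule

# dump the rules and check the chooseleaf step type is "host" for a k=2,m=1 profile
ceph osd crush rule ls
ceph osd crush rule dump
```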
|
|
|
|
Ceph supports many different K,M schemes; this doesn't mean they all work or offer the protection you want, and in some cases pool creation will stall where the scheme is inadvisable.

It is recommended that you never use more than 80% of the storage capacity: above 80% there are performance penalties as data is shuffled about, and at 100% the cluster goes read-only and will probably damage in-flight data, as in any filesystem.
|
|
|
|
Redhat state that where you would use K=4,M=2 you may use K=8,M=4 for greater resiliency; they do not state that 12 nodes would realistically be required for this in a HOST failure domain.

K=4,M=2 on a 12 node cluster in HOST failure domain mode would work just fine and would use less CPU/RAM when writing the data chunks to disk, but a client may get less read performance on a busy cluster as it would only pull from 50% of the cluster nodes.

Where K+M is odd and the node count is even (or vice versa), data would not be equally distributed across the cluster; with large data files such as VM images the disparity may be noticeable even after automatic re-balancing.

The same applies to plain replication: say there are 3 nodes and 2-way replication set in the crush map, large files may be written to two nodes and fill them to capacity, while considerable free space is shown as available yet is effectively unusable - re-balancing will not help.
|
|
|
|
Redhat supports the following schemes with the jerasure EC plugin (this is the default algorithm):
|
|
|
|
- K=8,M=3 (minimum 11 nodes using HOST failure domain)
|
|
- K=8,M=4 (minimum 12 nodes using HOST failure domain)
|
|
- K=4,M=2 (minimum 6 nodes using HOST failure domain)
|
|
|
|
## EC usable space
|
|
|
|
### Example 1
|
|
|
|
For illustration, each node has 4 disks (OSDs) of 12TB, i.e. 48TB raw disk per node; take the following examples:
|
|
|
|
- minimum 3 nodes K=2,M=1 - 144TB raw disk - (12 OSD * (2 K / ( 2 K + 1 M)) * 12TB OSD Size * 0.8 (80% capacity) ) - 76TB usable disk VS 3way replication ((144TB / 3) * 0.8) 38.4TB
|
|
- minimum 4 nodes K=3,M=1 - 192TB raw disk - (16 OSD * (3 K / (3 K + 1 M)) * 12TB OSD Size * 0.8) - 115TB usable disk VS 3way replication ((192TB / 3) * 0.8) 51.2TB
|
|
- minimum 12 nodes K=9,M=3 - 576TB raw disk - (48 OSD * (9 K / (9 K + 3 M)) * 12TB OSD Size * 0.8) - 345TB usable disk VS 3way replication ((576TB / 3) * 0.8) 153.6TB
|
|
|
|
### University Openstack
|
|
|
|
3 nodes, 8 disks per node (excluding SSD db/wal), 14TB disks thus 336TB raw disk.
|
|
All possible storage schemes only allow for 1 failed HOST.
|
|
|
|
- In a 3 way replication we have 336/3 = 112 * 0.8 = 89.6TB usable space
|
|
- In a 2 way replication (more prone to bitrot) we have 336/2 = 168 * 0.8 = 134.4TB usable space
|
|
- In an EC scheme of K=2,M=1 we have 24 * (2 / (2+1)) * 14 * 0.8 = 179TB usable space (see the quick check below)
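The same arithmetic as a quick sanity check; a minimal sketch using awk, with the raw capacity and 80% headroom assumptions stated above:

```sh
awk 'BEGIN {
  raw = 24 * 14                                   # 24 OSDs x 14TB disks = 336TB raw
  printf "3-way replication: %.1f TB usable\n", raw / 3 * 0.8
  printf "2-way replication: %.1f TB usable\n", raw / 2 * 0.8
  printf "EC k=2,m=1:        %.1f TB usable\n", raw * (2 / (2 + 1)) * 0.8
}'
```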
|
|
|
|
# Openstack RBD storage
|
|
|
|
> CephFS/RGW are not being used on this cluster, it is purely to be used for VM image storage.
|
|
> For further Openstack CephFS/RGW integration see the OCF LAB notes, these are a much more involved Openstack deployment.
|
|
|
|
- For RHOSP 16 the controller role must contain all of the ceph services for use with an Openstack provisioned or externally provisioned ceph cluster.
|
|
- The Roles created for the University deployment already contain the Ceph services.
|
|
|
|
## Undercloud Ceph packages
|
|
|
|
Ensure that your undercloud has the right version of `ceph-ansible` before any deployment.
|
|
|
|
Get Ceph packages.
|
|
|
|
> https://access.redhat.com/solutions/2045583
|
|
|
|
- Redhat Ceph 4.1 = Nautilus release
|
|
- Redhat Ceph 5.1 = Pacific release
|
|
|
|
```sh
|
|
sudo subscription-manager repos | grep -i ceph
|
|
|
|
# Nautilus
|
|
sudo subscription-manager repos --enable=rhceph-4-tools-for-rhel-8-x86_64-rpms
|
|
|
|
# Pacific (if you are using external Ceph from the opensource repos you will likely be using this)
|
|
#sudo dnf remove -y ceph-ansible
|
|
#sudo subscription-manager repos --disable=rhceph-4-tools-for-rhel-8-x86_64-rpms
|
|
sudo subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms
|
|
|
|
# install
|
|
sudo dnf info ceph-ansible
|
|
sudo dnf install -y ceph-ansible
|
|
```
|
|
|
|
## Create Openstack pools - University uses EC pools, skip to the next section
|
|
|
|
Listed are the recommended PG allocations using Redhat defaults; this isn't very tuned and assumes 100 PGs per OSD on a 3 node cluster with 9 disks. Opensource Ceph now allows up to 250 PGs per OSD.
|
|
|
|
As PG autoscaling is enabled, and as this release is later than Nautilus, we can avoid specifying PGs; each pool will initially be allocated 32 PGs and scale from there.
|
|
|
|
RBD pools.
|
|
|
|
```sh
|
|
# Storage for OpenStack Block Storage (cinder)
|
|
#ceph osd pool create volumes 2048
|
|
ceph osd pool create volumes
|
|
|
|
# Storage for OpenStack Image Storage (glance)
|
|
#ceph osd pool create images 128
|
|
ceph osd pool create images
|
|
|
|
# Storage for instances
|
|
#ceph osd pool create vms 256
|
|
ceph osd pool create vms
|
|
|
|
# Storage for OpenStack Block Storage Backup (cinder-backup)
|
|
#ceph osd pool create backups 512
|
|
ceph osd pool create backups
|
|
|
|
# Storage for OpenStack Telemetry Metrics (gnocchi)
|
|
#ceph osd pool create metrics 128
|
|
ceph osd pool create metrics
|
|
|
|
# Check pools
|
|
ceph osd pool ls
|
|
|
|
device_health_metrics
|
|
volumes
|
|
images
|
|
vms
|
|
backups
|
|
metrics
|
|
```
|
|
|
|
## Create Erasure Coded Openstack pools
|
|
|
|
1. create EC profile (https://docs.ceph.com/en/latest/rados/operations/erasure-code/)
|
|
2. create metadata pools with normal 3 way replication (default replication rule in the crushmap)
|
|
3. create EC pools K=2,M=1,failure domain HOST
|
|
|
|
metadata pool - replicated pool
|
|
data pool - EC pool (with ec_overwrites)
|
|
|
|
| Metadata Pool | Data Pool | Usage |
|
|
| --- | --- | --- |
|
|
| volumes | volumes_data | Storage for OpenStack Block Storage (cinder) |
|
|
| images | images_data | Storage for OpenStack Image Storage (glance) |
|
|
| vms | vms_data | Storage for VM/Instance disk |
|
|
| backups | backups_data | Storage for OpenStack Block Storage Backup (cinder-backup) |
|
|
| metrics | metrics_data | Storage for OpenStack Telemetry Metrics (gnocchi) |
|
|
|
|
Create pool example:
|
|
|
|
```sh
|
|
# if you need to remove a pool, remember to change this back to false after deletion
|
|
#ceph config set mon mon_allow_pool_delete true
|
|
|
|
# create new erasure code profile (default will exist)
|
|
ceph osd erasure-code-profile set ec-21-profile k=2 m=1 crush-failure-domain=host
|
|
ceph osd erasure-code-profile ls
|
|
ceph osd erasure-code-profile get ec-21-profile
|
|
|
|
crush-device-class=
|
|
crush-failure-domain=host
|
|
crush-root=default
|
|
jerasure-per-chunk-alignment=false
|
|
k=2
|
|
m=1
|
|
plugin=jerasure
|
|
technique=reed_sol_van
|
|
w=8
|
|
|
|
# delete an EC profile
|
|
#ceph osd erasure-code-profile rm ec-21-profile
|
|
|
|
# create the pool; it will host metadata only once rbd_default_data_pool is set below. By default the crushmap rule is replicated; the parameter is included to make explicit that metadata must be replicated, not erasure coded
|
|
ceph osd pool create volumes replicated
|
|
ceph osd pool application enable volumes rbd
|
|
ceph osd pool application get volumes
|
|
|
|
# create erasure code enabled data pool
|
|
ceph osd pool create volumes_data erasure ec-21-profile
|
|
ceph osd pool set volumes_data allow_ec_overwrites true # this must be set for RBD pools so that constantly-open disk images can be updated in place
|
|
ceph osd pool application enable volumes_data rbd # Openstack will usually ensure the pool is RBD application enabled; when specifying a data pool we must explicitly set the usage/application mode
|
|
ceph osd pool application get volumes_data
|
|
|
|
# set an EC data pool for the replicated pool; 'volumes' will subsequently host only metadata - this is a magic command that was not documented until 2022. Typically in non-RHOSP deployments each service has its own client.<service> user and EC data pool override
|
|
rbd config pool set volumes rbd_default_data_pool volumes_data
|
|
|
|
# If using CephFS with manila the pool creation is the same; however, dictating usage of the data pool is a little simpler and is specified at volume creation. allow_ec_overwrites must also be set for CephFS
|
|
#ceph fs new cephfs cephfs_metadata cephfs_data force
|
|
|
|
# Check pools; notice the 3-way replicated pools could consume at most 97TiB where the EC pools could consume 193TiB, around 179TB usable at max performance according to the EC calculation explained earlier in this document
|
|
ceph osd pool ls
|
|
|
|
device_health_metrics
|
|
volumes
|
|
images
|
|
vms
|
|
backups
|
|
metrics
|
|
|
|
ceph df
|
|
--- RAW STORAGE ---
|
|
CLASS SIZE AVAIL USED RAW USED %RAW USED
|
|
hdd 310 TiB 306 TiB 4.2 TiB 4.2 TiB 1.36
|
|
TOTAL 310 TiB 306 TiB 4.2 TiB 4.2 TiB 1.36
|
|
|
|
--- POOLS ---
|
|
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
|
|
device_health_metrics 1 1 0 B 30 0 B 0 97 TiB
|
|
volumes_data 7 32 0 B 0 0 B 0 193 TiB
|
|
volumes 8 32 0 B 1 0 B 0 97 TiB
|
|
images 9 32 0 B 1 0 B 0 97 TiB
|
|
vms 10 32 0 B 1 0 B 0 97 TiB
|
|
backups 11 32 0 B 1 0 B 0 97 TiB
|
|
metrics 12 32 0 B 1 0 B 0 97 TiB
|
|
images_data 13 32 0 B 0 0 B 0 193 TiB
|
|
vms_data 14 32 0 B 0 0 B 0 193 TiB
|
|
backups_data 15 32 0 B 0 0 B 0 193 TiB
|
|
metrics_data 16 32 0 B 0 0 B 0 193 TiB
|
|
```
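A quick way to confirm the EC wiring took effect; a minimal sketch re-using the pool names above:

```sh
# the replicated pool should point new RBD data at the EC pool
rbd config pool get volumes rbd_default_data_pool

# the EC data pool should allow overwrites and be tagged for rbd
ceph osd pool get volumes_data allow_ec_overwrites
ceph osd pool application get volumes_data
```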
|
|
|
|
Once Openstack starts to consume disk the EC scheme is apparent.
|
|
|
|
```sh
|
|
# we have created a single 10GB VM Instance, the 10GB is thin provisioned, this Instance uses 1.2GB of space
|
|
[Universityops@test ~]$ df -Th
|
|
Filesystem Type Size Used Avail Use% Mounted on
|
|
devtmpfs devtmpfs 959M 0 959M 0% /dev
|
|
tmpfs tmpfs 987M 0 987M 0% /dev/shm
|
|
tmpfs tmpfs 987M 8.5M 978M 1% /run
|
|
tmpfs tmpfs 987M 0 987M 0% /sys/fs/cgroup
|
|
/dev/vda2 xfs 10G 1.2G 8.9G 12% /
|
|
tmpfs tmpfs 198M 0 198M 0% /run/user/1001
|
|
|
|
# ceph shows some metadata usage (for the RBD disk image) and 1.3GB of data used in volumes_data, note under an EC scheme we see 2.0GB of consumed disk VS 3.9GB under a 3way replication scheme
|
|
[root@ceph1 ~]# ceph df
|
|
--- RAW STORAGE ---
|
|
CLASS SIZE AVAIL USED RAW USED %RAW USED
|
|
hdd 310 TiB 306 TiB 4.2 TiB 4.2 TiB 1.36
|
|
TOTAL 310 TiB 306 TiB 4.2 TiB 4.2 TiB 1.36
|
|
|
|
--- POOLS ---
|
|
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
|
|
device_health_metrics 1 1 0 B 30 0 B 0 97 TiB
|
|
volumes_data 7 32 1.3 GiB 363 2.0 GiB 0 193 TiB
|
|
volumes 8 32 691 B 6 24 KiB 0 97 TiB
|
|
images 9 32 452 B 18 144 KiB 0 97 TiB
|
|
vms 10 32 0 B 1 0 B 0 97 TiB
|
|
backups 11 32 0 B 1 0 B 0 97 TiB
|
|
metrics 12 32 0 B 1 0 B 0 97 TiB
|
|
images_data 13 32 1.7 GiB 220 2.5 GiB 0 193 TiB
|
|
vms_data 14 32 0 B 0 0 B 0 193 TiB
|
|
backups_data 15 32 0 B 0 0 B 0 193 TiB
|
|
metrics_data 16 32 0 B 0 0 B 0 193 TiB
|
|
|
|
|
|
```
|
|
|
|
## Create RBD user for Openstack, assign capabilities and retrieve access token
|
|
|
|
Openstack needs credentials to access disk.
|
|
Use method 3; generally this is the direction Ceph administration is going.
|
|
|
|
```sh
|
|
# 1. Redhat CLI method, one-shot command
|
|
#
|
|
ceph auth add client.openstack mgr 'allow *' mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=backups, profile rbd pool=metrics'
|
|
|
|
# 2. Manual method; you can update caps this way, however all caps must be added at once - they cannot be appended
|
|
#
|
|
ceph auth get-or-create client.openstack
|
|
ceph auth caps client.openstack mgr 'allow *' mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=backups, profile rbd pool=metrics'
|
|
|
|
# Tighter mgr access, this should be fine but not tested with Openstack (official documentation does not cover tighter security model)
|
|
#
|
|
#ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=backups, profile rbd pool=metrics' mgr 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=backups, profile rbd pool=metrics'
|
|
|
|
# 3. Config Method easier to script and backup/source-control
|
|
#
|
|
# 1) generate a keyring with no caps
|
|
# 2) add caps
|
|
# 3) import user
|
|
ceph-authtool --create-keyring ceph.client.openstack.keyring --gen-key -n client.openstack
|
|
|
|
# NON EC profile
|
|
nano -cw ceph.client.openstack.keyring
|
|
|
|
[client.openstack]
|
|
key = AQCC5z5jtOmJARAAiFaC2HB4f2pBYfMKWzkkkQ==
|
|
caps mon = 'profile rbd'
|
|
caps osd = 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images, profile rbd pool=backups, profile rbd pool=metrics'
|
|
caps mgr = 'allow *'
|
|
|
|
# EC profile
|
|
|
|
[client.openstack]
|
|
key = AQCC5z5jtOmJARAAiFaC2HB4f2pBYfMKWzkkkQ==
|
|
caps mon = 'profile rbd'
|
|
caps osd = 'profile rbd pool=volumes, profile rbd pool=volumes_data, profile rbd pool=vms, profile rbd pool=vms_data, profile rbd pool=images, profile rbd pool=images_data, profile rbd pool=backups, profile rbd pool=backups_data, profile rbd pool=metrics, profile rbd pool=metrics_data'
|
|
caps mgr = 'allow *'
|
|
|
|
ceph auth import -i ceph.client.openstack.keyring
|
|
ceph auth ls
|
|
```