Ceph

 

 
 
Ceph is a distributed, object-based filesystem.

It consists of four components:

  • Clients

The clients provide access to the filesystem.

  • Metadata servers (MDSs)

The metadata servers manage the namespace hierarchy.

  • Object-storage devices (OSDs)

The object-storage devices reliably store data in the form of objects.

  • Monitors (MONs)

The monitors manage the server cluster map.

A Ceph OSD Daemon (Ceph OSD) stores data, handles data replication, recovery, backfilling,
rebalancing, and provides some monitoring information to Ceph Monitors 
by checking other Ceph OSD Daemons for a heartbeat.
A Ceph Monitor maintains maps of the cluster state, including the monitor map,
the OSD map, the Placement Group (PG) map, and the CRUSH map. 
Ceph maintains a history (called an “epoch”) of each state change in the Ceph 
Monitors, Ceph OSD Daemons, and PGs.
A Ceph Metadata Server (MDS) stores metadata on behalf of the Ceph Filesystem
(i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). 
Ceph Metadata Servers make it feasible for POSIX file system users to execute 
basic commands like ls, find, etc. without placing an enormous burden on the Ceph Storage Cluster.
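
Each of these daemons can be inspected from any node that has an admin keyring; a minimal sketch (the output format varies between Ceph versions):

# Monitors: quorum membership and map epoch
ceph mon stat
# OSDs: how many exist, and how many are up/in
ceph osd stat
# MDS: only relevant if CephFS is deployed
ceph mds stat
# Placement groups: state summary
ceph pg stat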

 
 

Note

A Ceph cluster needs at least 3 OSDs for the default replication settings to reach a healthy state.

The Ceph REST API is a WSGI application; it listens on port 5000 by default.
 
Firewall:
Ceph Monitors communicate using port 6789 by default. Ceph OSDs communicate in a port range of 6800:7810 by default.
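
If you would rather keep firewalld running than disable it (as done later in this guide), the default ports above can simply be opened; a sketch assuming firewalld:

sudo firewall-cmd --zone=public --add-port=6789/tcp --permanent        # monitors
sudo firewall-cmd --zone=public --add-port=6800-7810/tcp --permanent   # OSDs
sudo firewall-cmd --reload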
 
 

Design the nodes

 
 

0. Docker Ceph

Ceph container

Use the official Ceph container (ceph/demo)
to play with Ceph directly.
 
Start a single-node Ceph:

docker run -d --net=host -v /etc/ceph:/etc/ceph -e MON_IP=192.168.10.149 -e CEPH_NETWORK=192.168.10.0/24 ceph/demo

 
Check the container status
docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
0afcbd9ae9e0        ceph/demo           "/entrypoint.sh"    14 hours ago        Up 14 hours                             pensive_elion

 
Get a shell inside the container
docker exec -i -t 0afcbd9ae9e0 /bin/bash

 

Ceph commands

Check the status of the whole Ceph cluster

ceph status

 
Or watch the cluster status continuously
ceph -w

 

RADOS commands

A pool is a logical concept in Ceph; different applications can use different pools.
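
For example, separate pools can be created for different applications, each with its own settings (the pool names here are made up for illustration):

# hypothetical per-application pools
ceph osd pool create app-images 64
ceph osd pool create app-backups 64
# each pool can carry its own replica count
ceph osd pool set app-backups size 3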

 
Pool-related commands

rados lspools
rbd
cephfs_data
cephfs_metadata
.rgw.root
.rgw.control
.rgw
.rgw.gc

 
# List the objects in a specific pool
rados -p .rgw ls
rados -p .rgw.root ls

 
Capacity
# Show the space currently used by the OSDs
rados df
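
ceph df gives a similar view, split into overall cluster usage and per-pool usage:

# global and per-pool capacity summary
ceph df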

 

Bucket commands

Create buckets

ceph osd tree
ID WEIGHT  TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0      up  1.00000          1.00000

 
ceph osd crush add-bucket rack01 rack
ceph osd crush add-bucket rack02 rack
ceph osd crush add-bucket rack03 rack

added bucket <RACK_NAME> type rack to crush map

 

ceph osd tree
ID WEIGHT  TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-5       0 rack rack03
-4       0 rack rack02
-3       0 rack rack01
-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0      up  1.00000          1.00000

 
Move the racks under the default root
ceph osd crush move rack01 root=default
ceph osd crush move rack02 root=default
ceph osd crush move rack03 root=default

moved item id -3 name 'rack01' to location {root=default} in crush map

 

ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0        up  1.00000          1.00000
-3       0     rack rack01
-4       0     rack rack02
-5       0     rack rack03
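
To actually use the racks for CRUSH placement, hosts would then be moved under them; a sketch using the host shown above:

# move host "node" (and its OSDs) under rack01
ceph osd crush move node rack=rack01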

 

Object operations

Create a pool

ceph osd pool create op1 128 128

pool 'op1' created

 

rados lspools
rbd
cephfs_data
cephfs_metadata
.rgw.root
.rgw.control
.rgw
.rgw.gc
op1

 
Add an object
echo "Hello Ceph, You are Awesome like MJ" > /tmp/helloceph
rados -p op1 put object1 /tmp/helloceph
rados -p op1 ls

object1

 

ceph osd map op1 object1

osdmap e22 pool 'op1' (7) object 'object1' -> pg 7.bac5debc (7.3c) -> up ([0], p0) acting ([0], p0)

 
View the object on the OSD's local filesystem

cd /var/lib/ceph/osd/
 
ls ceph-0/current/7.3c_head/
cat ceph-0/current/7.3c_head/object1__head_BAC5DEBC__7

Hello Ceph, You are Awesome like MJ

 

RBD commands

Create an image

rbd ls
rbd create ceph-client1-rbd1 --size 10240
rbd --image ceph-client1-rbd1 info
rbd image 'ceph-client1-rbd1':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1027.74b0dc51
        format: 1

 
Mount a block device
# 1. Load the rbd kernel module
modprobe rbd
 
# 2. Create a 4 GB image named `test`
rbd create --size 4096 test
 
# 3. Map the `test` image in the rbd pool to a block device
rbd map test --pool rbd
 
# 4. Now the block device can be used as usual
mkfs.ext4 /dev/rbd/rbd/test
mount /dev/rbd/rbd/test /rbd

 

1. Ceph

Deploy the nodes

Create a `ceph` user on each node

useradd -d /home/ceph -m ceph
passwd ceph
echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
chmod 0440 /etc/sudoers.d/ceph

 
Note: from now on, log in to every node as the user ceph
 

 

Firewall and SELinux

disable firewall

systemctl disable firewalld
systemctl stop firewalld

Ceph Monitors communicate using port 6789 by default.
Ceph OSDs communicate in a port range of 6800:7810 by default.

 
disable SELinux

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

 
edit the sudoers file
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

 

Shared sshkey

A couple of additional operations need to be done on the admin machine. First, configure password-less SSH for the “ceph” user: on the administration machine where you will run ceph-deploy, create the same “ceph” user. After logging in as that user, run ssh-keygen with a blank passphrase to create its SSH keys. Finally, copy the SSH key to each Ceph node.
 
on admin node

# Log in as the ceph user and generate a key pair
ssh-keygen
 
# Copy the public key to each node
ssh-copy-id ceph@osd1
ssh-copy-id ceph@osd2
ssh-copy-id ceph@mon1
ssh-copy-id ceph@mon2
..

 
vim ~/.ssh/config
Host osd1
Hostname osd1
User ceph

Host osd2
Hostname osd2
User ceph

Host mon1
Hostname mon1
User ceph

Host mon2
Hostname mon2
User ceph
...
sudo chmod 440 ~/.ssh/config

 

Provision data disk

on OSD nodes

lsblk -f
 
sudo parted /dev/sdx
(parted) mklabel gpt
(parted) mkpart primary xfs 0% 100%
(parted) quit
 
sudo mkfs.xfs /dev/sdx1
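
The same partitioning can also be done non-interactively, which is handy when several OSD disks have to be prepared; a sketch, still using the placeholder device /dev/sdx:

# script-mode parted: GPT label plus one XFS partition spanning the disk
sudo parted -s /dev/sdx mklabel gpt mkpart primary xfs 0% 100%
sudo mkfs.xfs /dev/sdx1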

 

Install ceph-deploy

on admin node

sudo vim /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

 
sudo yum install ceph-deploy

 
mkdir ceph-deploy

 
Note: run all subsequent commands from this directory, since the generated config and key files are kept here
 

 

Set up the cluster

Set up the monitor nodes, on the admin node

cd ~/ceph-deploy
ceph-deploy new mon1 mon2

 
After the monitors are set up successfully, a ceph.conf is generated.
Edit it:

vim ~/ceph-deploy/ceph.conf
[global]
...
public network = 192.168.10.0/24
cluster network = 192.168.10.0/24
#cluster network = 192.168.0.0/24
 
#Choose reasonable numbers for number of replicas and placement groups.
osd pool default size = 2 # Write an object 2 times
osd pool default min size = 1 # Allow writing 1 copy in a degraded state
osd pool default pg num = 256
osd pool default pgp num = 256
 
#Choose a reasonable crush leaf type
#0 for a 1-node cluster.
#1 for a multi node cluster in a single rack
#2 for a multi node, multi chassis cluster with multiple hosts in a chassis
#3 for a multi node cluster with hosts across racks, etc.
osd crush chooseleaf type = 1
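
For the PG counts, the usual rule of thumb (not specific to this guide) is roughly (number of OSDs * 100) / replica count, rounded up to the next power of two; a quick check in the shell:

# e.g. 2 OSDs with 2 replicas -> 100, so round up to 128
osds=2; size=2
echo $(( osds * 100 / size ))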

 

Install Ceph

Install Ceph on every node,
from the admin node

ceph-deploy install ceph-admin mon1 mon2 osd1 osd2

 
Verify that the installation succeeded on every node

ceph -v

 
Once Ceph is installed on all nodes, create the monitors and gather the keys
ceph-deploy mon create-initial


Note: if for any reason the command fails at some point, you will need to run it again, this time as

ceph-deploy --overwrite-conf mon create-initial

 
An OSD can be created with these two commands, one after the other
ceph-deploy osd prepare osd1:vdx1
ceph-deploy osd activate osd1:vdx1
...

Or the combined command:
ceph-deploy osd create osd1:vdx1
...
ceph osd tree

 
Then the MOUNTPOINT column will be populated
lsblk -f
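
If a data disk was used before, prepare may complain about existing partitions or filesystems; ceph-deploy can wipe the whole disk first (note this targets the disk, not a partition, and destroys all data on it):

ceph-deploy disk zap osd1:vdx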

 

Finalizing

To have a functioning cluster, we just need to copy the different keys and configuration files from the admin node (ceph-admin) to all the nodes

ceph-deploy admin ceph-admin mon1 mon2 osd1 osd2

Ensure the ceph.client.admin.keyring file has appropriate read permissions (e.g., chmod 644) on each node where you will run ceph commands.

/etc/ceph/ceph.client.admin.keyring
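
For example, assuming the default location shown above:

sudo chmod 644 /etc/ceph/ceph.client.admin.keyring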

 

 
The cluster is ready! You can check it from the admin node using these commands

ceph health
HEALTH_OK
ceph status

 

Create a new Ceph block volume

Remember: make sure ceph status reports HEALTH_OK,
the OSDs are up, and the PGs are active+clean.

ceph status

 
Ok, time to create the block device
rbd create myrbd --size 20480

 
Check the block device
rbd ls

myrbd

 
To retrieve information about the block device

rbd --image myrbd info
rbd image 'myrbd':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.10ba.238e1f29
        format: 1

 

RBD Kernel modules

To verify whether your kernel already supports the rbd module, try to load it

modprobe rbd

 
Let’s map the block image to a block device
# Map the myrbd image from the rbd pool to a block device
rbd map myrbd --pool rbd
# Then the block device can be formatted and mounted
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/storage

 
Unmount the filesystem and remove the RBD device
umount /mnt/storage
# "0" is the device id, i.e. /dev/rbd0
echo "0" >/sys/bus/rbd/remove

 

Online expansion

It is best to always check the actual consumption before you end up in a risky situation

# First the physical space
ceph -s
# Second, the space assigned to the RBD image
rbd --image myrbd info
# Finally, the real consumption of the thin image
rbd diff rbd/myrbd | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'

 
Expand the block device
rbd --pool=rbd --size=51200 resize myrbd

After this, on the Linux client you can resize the filesystem on the fly with

xfs_growfs /mnt/storage

 

client - block device quick start

Ensure your Ceph Storage Cluster is in an active+clean state before working with the Ceph Block Device

Install Ceph

# Verify that you have an appropriate version of the Linux kernel
lsb_release -a
uname -r
# See: http://docs.ceph.com/docs/v0.79/start/os-recommendations/
# On the admin node
 
# To install Ceph on your ceph-client node
ceph-deploy install ceph-client
 
# To copy the Ceph configuration file and
# the ceph.client.admin.keyring to the ceph-client
ceph-deploy admin ceph-client

Note: You must edit /etc/hosts first

 
Configure a block device

# On the ceph-client node
 
# Load the rbd client module
sudo modprobe rbd
 
# Create a block device image (optional)
# rbd create myrbd --size 4096 [-m {mon-IP}] [-k /etc/ceph/ceph.client.admin.keyring]
 
# Map the image to a block device
sudo rbd map myrbd --pool rbd --name client.admin [-m {mon-IP}] [-k /etc/ceph/ceph.client.admin.keyring]
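
After mapping, the quick start would typically continue by formatting and mounting the device, as in the earlier RBD sections; a sketch (device path and mount point are assumptions):

# Create a filesystem on the mapped device and mount it
sudo mkfs.ext4 -m0 /dev/rbd/rbd/myrbd
sudo mkdir -p /mnt/ceph-block-device
sudo mount /dev/rbd/rbd/myrbd /mnt/ceph-block-device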

 
 
 
 

Appendix. Common commands

ceph -v
ceph health
ceph status
or
ceph -w
ceph osd tree
ceph osd pool get rbd pg_num
ceph osd lspools
rbd ls
rbd create <RBD_NAME> --size <SIZE>
rbd --image <RBD_NAME> info
rados df
rados lspools

 

Appendix. Troubleshooting

[ceph_deploy][ERROR ] UnsupportedPlatform: Platform is not supported: CAKE 3.0

ceph-deploy detects the platform from /etc/redhat-release, so make it report a supported distribution string:
sudo vim /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

 
[ceph_deploy][ERROR ] RuntimeError: remote connection got closed, ensure requiretty is disabled for node
sudo vim /etc/sudoers
# comment out this line
Defaults    requiretty

 
[ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'
sudo yum remove ceph-release

 
[ceph2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph2.asok mon_status
[ceph2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
It seems that the names of your nodes differ from their real hostnames,
so the file /var/run/ceph/ceph-mon.combo-full.asok has the incorrect name. I changed the hostnames and that worked for me.

 
ERROR: missing keyring, cannot use cephx for authentication
librados: client.admin initialization error (2) No such file or directory
sudo chmod 644 /etc/ceph/ceph.client.admin.keyring
ceph -w

 
[ceph@combo ceph]$ ceph health
HEALTH_WARN clock skew; Monitor clock skew detected
ceph health detail
Synchronize your clocks. Running an NTP client may help.
vim /etc/ntp.conf
sudo systemctl start ntpd.service
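
To check that the monitor nodes' clocks are actually syncing, ntpq can be queried on each of them:

# a '*' in the first column marks the currently selected time source
ntpq -p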

 
HEALTH_WARN detected on mon.node; 64 pgs stuck inactive; 64 pgs stuck unclean
That sounds like there aren't any OSD processes running and connected to the cluster
If you check the output of `ceph osd tree`
[ceph@combo ceph-deploy]$ ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.03998 root default
-2 0.01999     host combo
 0 0.00999         osd.0     down        0          1.00000
 1 0.00999         osd.1     down        0          1.00000
-3 0.01999     host node
 2 0.00999         osd.2     down        0          1.00000
 3 0.00999         osd.3     down        0          1.00000
Then look at the logs in /var/log/ceph/ceph-osd* to see why the OSD isn't connecting

2015-11-16 09:54:28.621789 7ff58f19b880 -1 unable to find any IP address in networks: 192.168.0.0/24

It sounds like the cluster network in ceph.conf does not match the actual network and has to be corrected:
sudo vim /etc/ceph/ceph.conf
sudo vim /home/ceph/ceph-deploy/ceph.conf
ceph-deploy osd activate osd1:vdx1
...
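
If the nodes were deployed with ceph-deploy, the corrected ceph.conf in the ceph-deploy directory can also be pushed to every node instead of editing each /etc/ceph/ceph.conf by hand:

ceph-deploy --overwrite-conf config push mon1 mon2 osd1 osd2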

 
 
 
 
 

Not a big deal

[ceph@combo ceph-deploy]$ df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/vg_livecd-root            13G  2.7G   11G  21% /
devtmpfs                             2.0G     0  2.0G   0% /dev
tmpfs                                2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                2.0G  8.7M  2.0G   1% /run
tmpfs                                2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                2.0G   12K  2.0G   1% /tmp
/dev/mapper/vg_drbd-lv_drbd          997M   33M  965M   4% /lv_drbd
/dev/mapper/vg_livecd-mnt_storage     21G   33M   21G   1% /mnt/storage
/dev/vda1                            477M  110M  338M  25% /boot
/dev/mapper/vg_livecd-var             10G  253M  9.8G   3% /var
/dev/mapper/vg_livecd-var_lib_pgsql  1.2G  131M  1.1G  12% /var/lib/pgsql
/dev/vdb1                             10G  5.1G  5.0G  51% /var/lib/ceph/osd/ceph-0
/dev/vdc1                             10G  5.1G  5.0G  51% /var/lib/ceph/osd/ceph-1