Ceph is a distributed object-based filesystem.

It consists of four components:

  • Clients

Clients provide access to the filesystem.

  • Metadata servers (MDSs)

Metadata servers manage the namespace hierarchy.

  • Object-storage devices (OSDs)

Object-storage devices reliably store data in the form of objects.

  • Monitors (MONs)

Monitors manage the cluster map.

A Ceph OSD Daemon (Ceph OSD) stores data, handles data replication, recovery, backfilling,
rebalancing, and provides some monitoring information to Ceph Monitors 
by checking other Ceph OSD Daemons for a heartbeat.
A Ceph Monitor maintains maps of the cluster state, including the monitor map,
the OSD map, the Placement Group (PG) map, and the CRUSH map. 
Ceph maintains a history (called an “epoch”) of each state change in the Ceph 
Monitors, Ceph OSD Daemons, and PGs.
A Ceph Metadata Server (MDS) stores metadata on behalf of the Ceph Filesystem
(i.e., Ceph Block Devices and Ceph Object Storage do not use MDS). 
Ceph Metadata Servers make it feasible for POSIX file system users to execute 
basic commands like ls, find, etc. without placing an enormous burden on the Ceph Storage Cluster.



A Ceph cluster needs at least three OSDs to reach an active+clean state with the default replication settings.

The Ceph REST API is a WSGI application and it listens on port 5000 by default.
Ceph Monitors communicate using port 6789 by default. Ceph OSDs communicate in a port range of 6800:7810 by default.

Design the nodes


0. Docker Ceph


Use the official Ceph container image (ceph/demo):

docker run -d --net=host -v /etc/ceph:/etc/ceph -e MON_IP={mon-IP} -e CEPH_NETWORK={CIDR} ceph/demo

docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
0afcbd9ae9e0        ceph/demo           "/entrypoint.sh"    14 hours ago        Up 14 hours                             pensive_elion

docker exec -i -t 0afcbd9ae9e0 /bin/bash




# Check cluster status
ceph status

# Watch cluster events in real time
ceph -w





rados lspools

# List the objects in a specific pool
rados -p .rgw ls
rados -p .rgw.root ls

# Show current OSD space usage
rados df




ceph osd tree
-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0      up  1.00000          1.00000

ceph osd crush add-bucket rack01 rack
ceph osd crush add-bucket rack02 rack
ceph osd crush add-bucket rack03 rack

added bucket <RACK_NAME> type rack to crush map


ceph osd tree
-5       0 rack rack03
-4       0 rack rack02
-3       0 rack rack01
-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0      up  1.00000          1.00000

ceph osd crush move rack01 root=default
ceph osd crush move rack02 root=default
ceph osd crush move rack03 root=default

moved item id -3 name 'rack01' to location {root=default} in crush map


-1 1.00000 root default
-2 1.00000     host node
 0 1.00000         osd.0        up  1.00000          1.00000
-3       0     rack rack01
-4       0     rack rack02
-5       0     rack rack03




ceph osd pool create op1 128 128

pool 'op1' created


rados lspools

echo "Hello Ceph, You are Awesome like MJ" > /tmp/helloceph
rados -p op1 put object1 /tmp/helloceph
rados -p op1 ls



ceph osd map op1 object1

osdmap e22 pool 'op1' (7) object 'object1' -> pg 7.bac5debc (7.3c) -> up ([0], p0) acting ([0], p0)


cd /var/lib/ceph/osd/
ls ceph-0/current/7.3c_head/
cat ceph-0/current/7.3c_head/object1__head_BAC5DEBC__7

Hello Ceph, You are Awesome like MJ
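The PG id in the osd map output above (7.3c, which also names the on-disk directory 7.3c_head) is not arbitrary: Ceph hashes the object name with rjenkins (here to 0xbac5debc, as shown in the output) and folds that hash into the pool's pg_num with a "stable mod", which for a power-of-two pg_num reduces to a plain modulo. A minimal Python sketch of the folding step, taking the hash value from the output above as given:

```python
def hash_to_pg(obj_hash: int, pg_num: int) -> int:
    # for a power-of-two pg_num, Ceph's stable mod is simply
    # the low bits of the object hash
    return obj_hash % pg_num

# values taken from the `ceph osd map op1 object1` output above:
# object hash 0xbac5debc, pool `op1` created with pg_num = 128
pg = hash_to_pg(0xBAC5DEBC, 128)
print(f"{pg:x}")  # -> 3c, matching pg 7.3c (pool id 7, pg 0x3c)
```

This is why the object landed in the ceph-0/current/7.3c_head directory on the OSD.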




rbd ls
rbd create ceph-client1-rbd1 --size 10240
rbd --image ceph-client1-rbd1 info
rbd image 'ceph-client1-rbd1':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1027.74b0dc51
        format: 1
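The `order 22` line above determines the striping unit: an RBD image is split into objects of 2^order bytes, so the object count is just the image size divided by the object size. A quick check of the numbers reported above:

```python
order = 22
object_size = 2 ** order             # 4194304 bytes = the "4096 kB objects" line
image_size_mb = 10240                # the size passed to `rbd create`
num_objects = image_size_mb * 2**20 // object_size
print(object_size // 1024, num_objects)  # -> 4096 2560
```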

Mount the block device
# 1. Load the rbd kernel module
modprobe rbd
# 2. Create a 4 GB image `test`
rbd create --size 4096 test
# 3. Map the `test` image from the rbd pool
rbd map test --pool rbd
# 4. Now the block device is ready to use
mkfs.ext4 /dev/rbd/rbd/test
mount /dev/rbd/rbd/test /rbd


1. Ceph

Deploy the nodes

Create a `ceph` user on each node

useradd -d /home/ceph -m ceph
passwd ceph
echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
chmod 0440 /etc/sudoers.d/ceph

Note: from here on, log in to every node as the ceph user.


Firewall and SELinux

Disable the firewall

systemctl disable firewalld
systemctl stop firewalld

(If you keep the firewall enabled instead, open the Ceph ports: Monitors use port 6789 by default,
and OSDs use the 6800:7810 port range by default.)

Disable SELinux

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

Edit the sudoers file
sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers


Shared SSH key

A couple of additional operations are needed on the admin machine. First, configure password-less SSH for the "ceph" user: on the administration machine where you will run ceph-deploy, create the same "ceph" user, log in as that user, and run ssh-keygen with a blank passphrase to generate its SSH keys. Finally, copy the key to each Ceph node.

On the admin node:

# Log in as the ceph user, then copy the public key to each node
ssh-copy-id ceph@osd1
ssh-copy-id ceph@osd2
ssh-copy-id ceph@mon1
ssh-copy-id ceph@mon2

vim ~/.ssh/config
Host osd1
Hostname osd1
User ceph

Host osd2
Hostname osd2
User ceph

Host mon1
Hostname mon1
User ceph

Host mon2
Hostname mon2
User ceph
chmod 600 ~/.ssh/config


Provision data disk

on OSD nodes

lsblk -f
sudo parted /dev/sdx
(parted) mklabel gpt
(parted) mkpart primary xfs 0% 100%
(parted) quit
sudo mkfs.xfs /dev/sdx1


Install ceph-deploy

on admin node

sudo vim /etc/yum.repos.d/ceph.repo
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-{ceph-release}/{distro}/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

sudo yum install ceph-deploy

mkdir ceph-deploy

Note: do all subsequent work from this directory, since ceph-deploy stores the generated config and keys here.


Set up the cluster

Set up the monitor nodes (run on the admin node)

cd ~/ceph-deploy
ceph-deploy new mon1 mon2


vim ~/ceph-deploy/ceph.conf
public network = {public-network-CIDR}
cluster network = {cluster-network-CIDR}
#Choose reasonable numbers for number of replicas and placement groups.
osd pool default size = 2 # Write an object 2 times
osd pool default min size = 1 # Allow writing 1 copy in a degraded state
osd pool default pg num = 256
osd pool default pgp num = 256
#Choose a reasonable crush leaf type
#0 for a 1-node cluster.
#1 for a multi node cluster in a single rack
#2 for a multi node, multi chassis cluster with multiple hosts in a chassis
#3 for a multi node cluster with hosts across racks, etc.
osd crush chooseleaf type = 1
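The pg num values above follow the usual rule of thumb from the Ceph placement-group docs: total PGs ≈ (number of OSDs × 100) / replica count, rounded up to the next power of two. A small sketch of that calculation (the helper name is ours):

```python
def suggested_pg_num(num_osds: int, pool_size: int) -> int:
    # rule of thumb: ~100 PGs per OSD, divided by the replica count,
    # rounded up to the next power of two
    target = num_osds * 100 // pool_size
    pg = 1
    while pg < target:
        pg *= 2
    return pg

# e.g. 4 OSDs with 2 replicas -> target 200 -> 256,
# matching `osd pool default pg num = 256` above
print(suggested_pg_num(4, 2))  # -> 256
```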


Install ceph

Install Ceph on each node (run from the admin node)

ceph-deploy install ceph-admin mon1 mon2 osd1 osd2


ceph -v

Once Ceph is installed successfully on every node, create the monitors and gather the keys:
ceph-deploy mon create-initial

Note: if for any reason the command fails at some point, you will need to run it again, this time writing it as

ceph-deploy --overwrite-conf mon create-initial

An OSD can be created with these two commands, one after the other:
ceph-deploy osd prepare osd1:vdx1
ceph-deploy osd activate osd1:vdx1

Or the combined command:
ceph-deploy osd create osd1:vdx1
ceph osd tree

The MOUNTPOINT column will now be populated:
lsblk -f



To have a functioning cluster, we just need to copy the keys and configuration files from the admin node (ceph-admin) to all the nodes:

ceph-deploy admin ceph-admin mon1 mon2 osd1 osd2

Ensure the ceph.client.admin.keyring file has appropriate read permissions (e.g., chmod 644) on your client machine.



The cluster is ready! You can check it from the admin-node using these commands

ceph health
ceph status


Create a new Ceph block volume

Remember: make sure ceph status reports HEALTH_OK,
the OSDs are up, and the PGs are active+clean.

ceph status

Ok, time to create the block device
rbd create myrbd --size 20480

Check the block device
rbd ls


To retrieve information about the block device:

rbd --image myrbd info
rbd image 'myrbd':
        size 20480 MB in 5120 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.10ba.238e1f29
        format: 1


RBD Kernel modules

To verify that your kernel supports rbd, try to load the module itself:

modprobe rbd

Let’s map the block image to a block device
# Map the `myrbd` image from the rbd pool
rbd map myrbd --pool rbd
# Now the block device is ready to use
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/storage

Unmount the filesystem and remove the RBD device
umount /mnt/storage
echo "0" > /sys/bus/rbd/remove


Online expansion

Always check the actual consumption before you end up in a risky situation:

# First the physical space
ceph -s
# Second, the space assigned to the RBD image
rbd --image myrbd info
# Finally, the real consumption of the thin image
rbd diff rbd/myrbd | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
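The awk one-liner above sums the second column (extent length, in bytes) of `rbd diff`, which lists the allocated extents of the thin image. The same reduction in Python, over a made-up sample of diff output (the offsets and lengths are illustrative only):

```python
# sample `rbd diff rbd/myrbd` output: offset, length, type
sample = """\
0 4194304 data
8388608 4194304 data
20971520 1048576 data
"""

# sum the length column to get the real consumption of the thin image
used = sum(int(line.split()[1]) for line in sample.splitlines())
print(f"{used / 1024 / 1024} MB")  # -> 9.0 MB
```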

Expand the block device
rbd --pool=rbd --size=51200 resize myrbd

After this, on the Linux client you can grow the filesystem on-the-fly with:

xfs_growfs /mnt/storage


client - block device quick start

Ensure your Ceph Storage Cluster is in an active+clean state before working with the Ceph Block Device

Install Ceph

# Verify that you have an appropriate version of the Linux kernel
lsb_release -a
uname -r
# See: http://docs.ceph.com/docs/v0.79/start/os-recommendations/
# On the admin node
# To install Ceph on your ceph-client node
ceph-deploy install ceph-client
# To copy the Ceph configuration file and
# the ceph.client.admin.keyring to the ceph-client
ceph-deploy admin ceph-client

Note: You must edit /etc/hosts first

Configure a block device

# On the ceph-client node
# Load the rbd client module
sudo modprobe rbd
# Create a block device image (optional)
# rbd create myrbd --size 4096 [-m {mon-IP}] [-k /etc/ceph/ceph.client.admin.keyring]
# Map the image to a block device
sudo rbd map myrbd --pool rbd --name client.admin [-m {mon-IP}] [-k /etc/ceph/ceph.client.admin.keyring]


Appendix: Common commands

ceph -v
ceph health
ceph status
ceph -w
ceph osd tree
ceph osd pool get rbd pg_num
ceph osd lspools
rbd ls
rbd create <RBD_NAME> --size <SIZE>
rbd --image <RBD_NAME> info
rados df
rados lspools


Appendix: Troubleshooting

[ceph_deploy][ERROR ] UnsupportedPlatform: Platform is not supported: CAKE 3.0

# make the platform identify itself as a supported CentOS release
sudo vim /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

[ceph_deploy][ERROR ] RuntimeError: remote connection got closed, ensure requiretty is disabled for node
sudo vim /etc/sudoers
# comment out this line
Defaults    requiretty

[ceph_deploy][ERROR ] RuntimeError: NoSectionError: No section: 'ceph'
sudo yum remove ceph-release

[ceph2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph2.asok mon_status
[ceph2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
It usually means the node names you gave ceph-deploy differ from the nodes' real hostnames,
so the admin socket (e.g. /var/run/ceph/ceph-mon.combo-full.asok) is created under a different name than the one being probed. Renaming the hosts to match fixed it.
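The socket path in that error is derived purely from the monitor's name. A tiny sketch of the naming convention (the helper is ours, not a Ceph API):

```python
def mon_asok_path(mon_name: str) -> str:
    # ceph-mon creates its admin socket as ceph-mon.<name>.asok,
    # where <name> is the monitor's configured name
    return f"/var/run/ceph/ceph-mon.{mon_name}.asok"

# ceph-deploy probed the name it was given ("ceph2")...
print(mon_asok_path("ceph2"))
# ...but a daemon on a host actually named "combo-full" created this instead:
print(mon_asok_path("combo-full"))
```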

ERROR: missing keyring, cannot use cephx for authentication
librados: client.admin initialization error (2) No such file or directory
sudo chmod 644 /etc/ceph/ceph.client.admin.keyring
ceph -w

[ceph@combo ceph]$ ceph health
HEALTH_WARN clock skew; Monitor clock skew detected
ceph health detail
Synchronize your clocks. Running an NTP client may help.
vim /etc/ntp.conf
sudo systemctl start ntpd.service

HEALTH_WARN detected on mon.node; 64 pgs stuck inactive; 64 pgs stuck unclean
That suggests there are no OSD processes running and connected to the cluster.
Check the output of `ceph osd tree`:
[ceph@combo ceph-deploy]$ ceph osd tree
-1 0.03998 root default
-2 0.01999     host combo
 0 0.00999         osd.0     down        0          1.00000
 1 0.00999         osd.1     down        0          1.00000
-3 0.01999     host node
 2 0.00999         osd.2     down        0          1.00000
 3 0.00999         osd.3     down        0          1.00000
Then look at the logs in /var/log/ceph/ceph-osd* to see why the OSD isn't connecting

2015-11-16 09:54:28.621789 7ff58f19b880 -1 unable to find any IP address in networks:

It sounds like your cluster network setting needs to be fixed:
sudo vim /etc/ceph/ceph.conf
sudo vim /home/ceph/ceph-deploy/ceph.conf
ceph-deploy osd activate osd1:vdx1


Not a big deal

[ceph@combo ceph-deploy]$ df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/vg_livecd-root            13G  2.7G   11G  21% /
devtmpfs                             2.0G     0  2.0G   0% /dev
tmpfs                                2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                2.0G  8.7M  2.0G   1% /run
tmpfs                                2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                2.0G   12K  2.0G   1% /tmp
/dev/mapper/vg_drbd-lv_drbd          997M   33M  965M   4% /lv_drbd
/dev/mapper/vg_livecd-mnt_storage     21G   33M   21G   1% /mnt/storage
/dev/vda1                            477M  110M  338M  25% /boot
/dev/mapper/vg_livecd-var             10G  253M  9.8G   3% /var
/dev/mapper/vg_livecd-var_lib_pgsql  1.2G  131M  1.1G  12% /var/lib/pgsql
/dev/vdb1                             10G  5.1G  5.0G  51% /var/lib/ceph/osd/ceph-0
/dev/vdc1                             10G  5.1G  5.0G  51% /var/lib/ceph/osd/ceph-1
Unless otherwise noted, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License.