CAKE: Detailed GlusterFS Installation Procedure

- - - Setup procedure recorded on the three servers in the machine room - - -

Server  HostName  CAKE role    Internal (mgmt)  External (web)
No.30   c1        Center+Node  192.168.10.10    140.92.25.126
No.29   n1        Node         192.168.10.11    140.92.25.238
No.32   n2        Node         192.168.10.12    140.92.25.229
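The peer-probe and mount commands below refer to the hosts by name (c1, n1, n2), so every server must be able to resolve the others. A sketch of the matching /etc/hosts entries, assuming the mgmt addresses above are the ones Gluster traffic should use:

```
192.168.10.10  c1
192.168.10.11  n1
192.168.10.12  n2
```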

 

Current status

DHT + Replica

Striped + Replica

Benchmarks on VM

1 VM / 2 VM / 3 VM

  • read / write on VM
  • create VM
  • snapshot VM
  • restore VM

How to add / remove a compute node

What to do when an error occurs?

Operations: file system installation

 

1. Set up the GlusterFS EPEL repo

cat > /etc/yum.repos.d/Gluster.repo <<'EOF'
[glusterfs-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=1
gpgkey=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/pub.key

[glusterfs-noarch-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes.
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/noarch
enabled=1
skip_if_unavailable=1
gpgcheck=1
gpgkey=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/pub.key

[glusterfs-source-epel]
name=GlusterFS is a clustered file-system capable of scaling to several petabytes. - Source
baseurl=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/SRPMS
enabled=0
skip_if_unavailable=1
gpgcheck=1
gpgkey=http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/pub.key
EOF

 

2. Install glusterfs-server

yum install gdisk glusterfs{,-server,-fuse,-geo-replication} -y

 

3. Clean up Gluster.repo

rm /etc/yum.repos.d/Gluster.repo
yum clean all

 

4. Set up GlusterFS

CAKE storage pool
  • center+node
    • /dev/vg_livecd/lv_mnt_storage
  • node
    • /dev/vg_livecd/lv_backup
CAKE mount point
  • for all
    • /mnt/storage

 


Stop the CAKE services
- service cakeweb stop (center only)
- service caked stop
- service starterd stop (center only)
- service nfs stop (center only)
 
Unmount the mount point (for nodes)
- umount /mnt/storage
 
Start the GlusterFS service (for all)
- /etc/init.d/glusterd start or
- service glusterd start
 
Join the trusted storage pool
- gluster peer probe n1
- gluster peer probe n2
 
mount the CAKE storage as a Gluster "brick" (center)
- mkdir /mnt/gfbrick
- mount /dev/vg_livecd/lv_mnt_storage /mnt/gfbrick/
- mkdir /mnt/gfbrick/brick1
 
mount the CAKE storage as a Gluster "brick" (node)
- mkdir /mnt/gfbrick
- mount /dev/vg_livecd/lv_backup /mnt/gfbrick/
- mkdir /mnt/gfbrick/brick1
 
Create the GlusterFS DHT volume
- gluster volume create dht-vol c1:/mnt/gfbrick/brick1 n1:/mnt/gfbrick/brick1 n2:/mnt/gfbrick/brick1
- gluster volume start dht-vol
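The create command above can also be assembled from the host list, which keeps the brick path in one place when nodes are added or removed later. A minimal dry-run sketch (it only prints the command; hostnames and paths are the ones from the steps above):

```shell
# Build the brick list from the node hostnames and print the
# resulting create command (dry run: pipe the output to sh to apply).
VOL=dht-vol
BRICKS=""
for host in c1 n1 n2; do
  BRICKS="$BRICKS $host:/mnt/gfbrick/brick1"
done
CMD="gluster volume create $VOL$BRICKS"
echo "$CMD"
```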
 
Copy the data onto the Gluster volume (so it gets redistributed DHT- or stripe-style)
- mkdir /mnt/storage2
- mount -t glusterfs c1:/dht-vol /mnt/storage2/
- rsync -ravP /mnt/storage/* /mnt/storage2/
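Before unmounting it is worth sanity-checking that the copy is complete. A sketch that compares file counts between source and target, demonstrated on temporary directories (substitute /mnt/storage and /mnt/storage2 on the real system):

```shell
# Compare file counts between a source tree and its copy.
SRC=$(mktemp -d); DST=$(mktemp -d)    # stand-ins for /mnt/storage{,2}
touch "$SRC/a" "$SRC/b"               # fake payload for the demo
cp -a "$SRC/." "$DST/"                # stands in for the rsync above
SRC_N=$(find "$SRC" -type f | wc -l)
DST_N=$(find "$DST" -type f | wc -l)
[ "$SRC_N" -eq "$DST_N" ] && echo "counts match: $SRC_N files"
rm -rf "$SRC" "$DST"
```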
 
Unmount the previous Gluster volume
- umount /mnt/storage2/ (only for center)
- umount /mnt/storage/ (only for center)
 
mount Gluster volume (for all)
- mount -t glusterfs c1:/dht-vol /mnt/storage/
 
Restart the services
- service caked restart
- service cakeweb restart (center only)

Do not start starterd yet; the DFS mount option has not been added to the mount options.

 
End of story!
 


 
Note:
For now, do not reboot c1, n1, n2,
and do not start starterd (it will fail).
 

5. Iometer

Run Iometer against the volume.

 

6. IOzone

wget http://www.iozone.org/src/current/iozone3_327.tar
tar xvf iozone3_327.tar
cd iozone3_327/src/current/
make linux
# -s 30g: 30 GB test file, -i 0: write test, -i 1: read test, -r 64k: record size, -Rb: Excel-style report
./iozone -s 30g -i 0 -i 1 -r 64k -f /mnt/storage/iozone-data -Rb ./iozone.xls

 
 

  

Appendix 1 - Creating and mounting a new volume

Stop the services (center first)
- service cakeweb stop (only for center)
- service caked stop (for all)
 
Create the new volume
- Create the brick directory (for all)
- mkdir /mnt/gfbrick/brick4/
- gluster volume create str-vol stripe 3 transport tcp c1:/mnt/gfbrick/brick4/ n1:/mnt/gfbrick/brick4/ n2:/mnt/gfbrick/brick4/
- gluster volume start str-vol
 
Copy the data to the new volume
- mkdir /mnt/storage2
- mount -t glusterfs c1:/str-vol /mnt/storage2/
- rsync -ravP /mnt/storage/* /mnt/storage2/
 
Unmount the previous Gluster volume
- umount /mnt/storage2/ (only for center)
- umount /mnt/storage/ (for all)
 
Stop the old Gluster volume
- gluster volume stop dht-vol
 
mount new Gluster volume (for all)
- mount -t glusterfs c1:/str-vol /mnt/storage/
 
Restart the services
- service caked restart
- service cakeweb restart (center only)
 

Appendix 2 - Remounting the original CAKE NFS

Stop the services (center first)
- service cakeweb stop (only for center)
- service caked stop (for all)
 
umount CAKE storage
- umount /mnt/storage

Stop the Gluster volume
- gluster volume stop xxx-vol

umount Gluster bricks
- umount /mnt/gfbrick

center original mount
- mount /dev/vg_livecd/lv_mnt_storage /mnt/storage

node original nfs mount
- The NFS parameters can be taken from storage.ini
- mount -t nfs -o rw,vers=3,fg,proto=tcp,noacl,noatime,hard,intr,timeo=600 192.168.10.10:/mnt/storage /mnt/storage
 
Restart the services
- service caked restart
- service cakeweb restart (center only)
 

Appendix 3 - Testing methods from other papers

for writing measurement
- dd if=/dev/zero of=zerofile bs=4M count=2560 conv=fdatasync

for reading measurement
- dd if=zerofile of=/dev/null bs=4M count=2560
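Each dd run above moves bs × count = 4 MiB × 2560 = 10 GiB, large enough to defeat most caching. The arithmetic, useful for turning dd's elapsed time into throughput:

```shell
# Total bytes moved by one dd run (bs=4M, count=2560).
BYTES=$((4 * 1024 * 1024 * 2560))
echo "bytes per run: $BYTES"   # 10 GiB = 10737418240 bytes
# Throughput in MB/s is then BYTES / elapsed_seconds / 1000000,
# where elapsed_seconds is reported by dd on completion.
```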
 

Appendix 4 - Removing a volume (apparently not needed)

- - using dht-vol as an example - -
gluster volume stop dht-vol
gluster volume remove-brick dht-vol c1:/mnt/gfbrick/brick2 force
gluster volume remove-brick dht-vol n1:/mnt/gfbrick/brick2 force
gluster volume remove-brick dht-vol n2:/mnt/gfbrick/brick2 force
gluster volume delete dht-vol
 

Appendix 5 - Tuning

Key tuning parameters - for reference only

  • performance.write-behind-window-size 65535 (in bytes)
  • performance.cache-refresh-timeout 1 (in seconds)
  • performance.cache-size 1073741824 (in bytes)
  • performance.read-ahead off (only for 1GbE)
  • Default settings are suitable for mixed workloads

 
Disable offloading so TCP/IP functionality is NOT shifted to the LAN card. On both sender and receiver:

ethtool -K eth0 gso off
ethtool -K eth0 gro off
ethtool -K eth0 tso off

 
Turn off auto-tuning of the TCP receive buffer size. On the receiver:
sysctl net.ipv4.tcp_moderate_rcvbuf=0
or
echo 0 > /proc/sys/net/ipv4/tcp_moderate_rcvbuf

 
gluster volume set gv2 performance.io-thread-count 64  #default: 16
gluster volume set gv2 performance.write-behind-window-size 1GB  #default: 1MB
gluster volume set gv2 performance.cache-size 2GB  #default: 32MB 
gluster volume set gv2 performance.cache-max-file-size 16384PB
gluster volume set gv2 performance.cache-min-file-size 0
gluster volume set gv2 performance.cache-refresh-timeout 1
gluster volume set gv2 performance.read-ahead off  #(only for 1GbE)
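When trying several of these options at once, a loop keeps the invocations uniform. A dry-run sketch (it only echoes the commands; drop the echo to apply them; the values are the examples listed above, not recommendations):

```shell
# Print one "gluster volume set" command per tuning option (dry run).
VOL=gv2
N=0
while read -r opt val; do
  echo gluster volume set "$VOL" "$opt" "$val"
  N=$((N + 1))
done <<'EOF'
performance.io-thread-count 64
performance.write-behind-window-size 1GB
performance.cache-size 2GB
performance.read-ahead off
EOF
```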

gluster volume set gv2 cluster.stripe-block-size 128KB
cluster.self-heal-window-size 1024
cluster.self-heal-readdir-size 2KB
cluster.readdir-optimize on
# gluster volume set gluster cluster.min-free-disk 5%
# gluster volume set cluster.rebalance-stats  on
# gluster volume set cluster.readdir-optimize  on
# gluster volume set cluster.background-self-heal-count  20
# gluster volume set cluster.metadata-self-heal  on
# gluster volume set cluster.data-self-heal  on
# gluster volume set cluster.entry-self-heal  on
# gluster volume set cluster.self-heal-daemon  on
# gluster volume set cluster.heal-timeout  500
# gluster volume set cluster.self-heal-window-size  2
# gluster volume set cluster.data-self-heal-algorithm  diff
# gluster volume set cluster.eager-lock  on
# gluster volume set cluster.quorum-type  auto
# gluster volume set cluster.self-heal-readdir-size  2KB
# gluster volume set network.ping-timeout  5
>         performance.cache-size: 1gb
>         cluster.self-heal-daemon: off
>         cluster.data-self-heal-algorithm: full
>         cluster.metadata-self-heal: off
>         performance.cache-max-file-size: 2MB
>         performance.cache-refresh-timeout: 1
>         performance.stat-prefetch: off
>         performance.read-ahead: on
>         performance.quick-read: off
>         performance.write-behind-window-size: 4MB
>         performance.flush-behind: on
>         performance.write-behind: on
>         performance.io-thread-count: 32
>         performance.io-cache: on
>         network.ping-timeout: 2
>         nfs.addr-namelookup: off
>         performance.strict-write-ordering: on

 

Appendix 6 - CAKE HA setup and management

After enabling Center HA, you must check drbd (/opt/DynaVirtual_3.0/cluster/tools/commands/)
- ./check_drbd.sh
- If the current state (cs) shows Connected on both sides, everything is normal
- If it shows StandAlone/WFConnection, something is wrong (split-brain)
 
If the slave shows StandAlone, run
- ./drbdadm_discard-my-date.sh
- The state will change to WFConnection, waiting for a connection
 
Connect the master to the slave
- ./drbdadm_connect_centerha.sh
 
Check again
- ./check_drbd.sh
 

Appendix 7 - experimental results

dd


 

Appendix 8

Purge caches (sync first so dirty pages are written out; echo 3 drops both the page cache and dentries/inodes)

sync; echo 3 > /proc/sys/vm/drop_caches

 

Appendix 9

NUFA Translator

The NUFA ("Non Uniform File Access") is a variant of the DHT ("Distributed Hash Table") translator,
intended for use with workloads that have a high locality of reference.

Instead of placing new files pseudo-randomly, it places them on the same nodes where they are created so that future accesses can be made locally.

For replicated volumes, this means that one copy will be local and others will be remote;
the read-replica selection mechanisms will then favor the local copy for reads.

For non-replicated volumes, the only copy will be local.
 
 
Interface

Use of NUFA is controlled by a volume option, as follows:

gluster volume set <VOLNAME> cluster.nufa on

This will cause the NUFA translator to be used wherever the DHT translator otherwise would be. The rest is all automatic.
 

Appendix 10 - quota & auth.allow & top

gluster volume quota <VOL> enable
gluster volume quota <VOL> list
gluster volume quota <VOL> limit-usage / <size> 

gluster volume set <VOL> quota-deem-statfs on
gluster volume set <VOL> features.quota-timeout 5
gluster volume quota <VOL> alert-time 1d

gluster volume quota <VOL> remove <DIR>
gluster volume set <VOL> auth.allow <IP>
gluster volume top <VOL> open brick <brickPath> [list-cnt <value>]
gluster volume top <VOL> read brick <brickPath> [list-cnt <value>]
gluster volume top <VOL> write brick <brickPath> [list-cnt <value>]
gluster volume top <VOL> opendir brick <brickPath> [list-cnt <value>]
gluster volume top <VOL> readdir brick <brickPath> [list-cnt <value>]
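A filled-in dry-run example of the quota and auth.allow commands above (the volume name and the 100GB cap are hypothetical; the subnet is the mgmt network from the table at the top):

```shell
VOL=dht-vol            # hypothetical volume name
CAP=100GB              # hypothetical quota on /
ALLOW='192.168.10.*'   # mgmt subnet
for cmd in \
  "gluster volume quota $VOL enable" \
  "gluster volume quota $VOL limit-usage / $CAP" \
  "gluster volume set $VOL auth.allow $ALLOW"
do
  echo "$cmd"          # dry run: pipe the output to sh to apply
done
```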

 

Appendix 11 - how to access gluster from multiple networks

See: http://www.nico.schottelius.org/blog/how-to-access-gluster-from-multiple-networks/
 
Create volumes using hostnames rather than IPs.
That way, with multiple NICs (or multiple virtual NICs),
this works remarkably well.

Unless otherwise noted, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License