I'm building ZFS storage pools.
Here is the relevant dmesg output.
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <ST8000DM 005-2EH112 1520> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 152D00539000
da0: 400.000MB/s transfers
da0: 7630885MB (15628053168 512 byte sectors)
da0: quirks=0xa<NO_6_BYTE,4K>
da1 at umass-sim0 bus 0 scbus5 target 0 lun 1
da2 at umass-sim0 bus 0 scbus5 target 0 lun 2
da1: <WDC WD80 PUZX-64NEAY0 1520> Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 152D00539000
da1: 400.000MB/s transfers
da1: 7630885MB (15628053168 512 byte sectors)
da1: quirks=0x2<NO_6_BYTE>
da2: <WDC WD80 PUZX-64NEAY0 1520> Fixed Direct Access SPC-4 SCSI device
da2: Serial Number 152D00539000
da2: 400.000MB/s transfers
da2: 7630885MB (15628053168 512 byte sectors)
da2: quirks=0x2<NO_6_BYTE>
da3 at umass-sim0 bus 0 scbus5 target 0 lun 3
da3: <WDC WD80 EFZX-68UW8N0 1520> Fixed Direct Access SPC-4 SCSI device
da3: Serial Number 152D00539000
da3: 400.000MB/s transfers
da3: 7630885MB (15628053168 512 byte sectors)
da3: quirks=0x2<NO_6_BYTE>
da4 at umass-sim0 bus 0 scbus5 target 0 lun 4
da4: <WDC WD80 EFZX-68UW8N0 1520> Fixed Direct Access SPC-4 SCSI device
da4: Serial Number 152D00539000
da4: 400.000MB/s transfers
da4: 7630885MB (15628053168 512 byte sectors)
da4: quirks=0x2<NO_6_BYTE>
da5 at umass-sim0 bus 0 scbus5 target 0 lun 5
da5: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da5: Serial Number 152D00539000
da5: 400.000MB/s transfers
da5: 7630885MB (15628053168 512 byte sectors)
da5: quirks=0x2<NO_6_BYTE>
random: unblocking device.
da6 at umass-sim0 bus 0 scbus5 target 0 lun 6
da6: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da6: Serial Number 152D00539000
da6: 400.000MB/s transfers
da6: 7630885MB (15628053168 512 byte sectors)
da6: quirks=0x2<NO_6_BYTE>
Trying to mount root from zfs:zroot/ROOT/default []...
da7 at umass-sim0 bus 0 scbus5 target 0 lun 7
da7: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da7: Serial Number 152D00539000
da7: 400.000MB/s transfers
da7: 7630885MB (15628053168 512 byte sectors)
da7: quirks=0x2<NO_6_BYTE>
da8 at umass-sim0 bus 0 scbus5 target 0 lun 8
da8: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da8: Serial Number 152D00539000
da8: 400.000MB/s transfers
da8: 7630885MB (15628053168 512 byte sectors)
da8: quirks=0x2<NO_6_BYTE>
In other words:
da0: 8TB (ST8000DM)
da1: 8TB (WD80PUZX)
da2: 8TB (WD80PUZX)
da3: 8TB (WD80EFZX)
da4: 8TB (WD80EFZX)
da5: 8TB (ST8000AS)
da6: 8TB (ST8000AS)
da7: 8TB (ST8000AS)
da8: 8TB (ST8000AS)
Next, the zpool status.
# zpool status
  pool: zbackup
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 0 in 0h0m with 0 errors on Mon Apr 23 06:29:16 2018
config:

        NAME                     STATE     READ WRITE CKSUM
        zbackup                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            da8                  ONLINE       0     0     0
            5017281946433150361  UNAVAIL      0     0     0  was /dev/da4
            da0                  ONLINE       0     0     0
            da5                  ONLINE       0     0     0

errors: No known data errors

  pool: zdata
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zdata       ONLINE       0     0     0
          raidz3-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da6     ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada0p4    ONLINE       0     0     0

errors: No known data errors
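da1, da2, da3, da4, and da6 form a RAIDZ3. That's triple parity: the data survives even if up to three HDDs die at the same time. Five 8TB HDDs give 16TB of capacity.
...Hmm. Safe and all (snip), but it feels a bit too wasteful.
da0, da4, da5, and da8 form a RAIDZ1 with 24TB of storage. But it's throwing an error...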
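The 16TB and 24TB figures come from simple arithmetic: a raidz vdev with N equal disks and P parity disks holds roughly (N - P) disks' worth of data. Here is a minimal sketch of that sum. The helper name raidz_usable_tb is made up for this example, and it ignores ZFS metadata and padding overhead, so a real pool reports somewhat less.

```shell
# Rough usable capacity of a raidz vdev built from equal-sized disks.
# usage: raidz_usable_tb <disks> <parity> <size_tb>
# (hypothetical helper; ignores ZFS metadata/padding overhead)
raidz_usable_tb() {
    disks=$1
    parity=$2
    size_tb=$3
    echo $(( (disks - parity) * size_tb ))
}

raidz_usable_tb 5 3 8   # zdata:   raidz3 over five 8TB disks
raidz_usable_tb 4 1 8   # zbackup: raidz1 over four 8TB disks
```

The two calls print 16 and 24, matching the numbers above.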
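Recovering it with the zpool command would be worthwhile experience in its own right, but as a year-end splurge (whatever that means) I went and bought three Toshiba 14TB HDDs.
So, given all of the above,
what now?
First off, raidz3 does feel like slight overkill, but Seagate HDDs have burned me in the past, so I want as much redundancy as I can get.
For reference, current usage according to df -g: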
zdata 14327 8465 5861 59% /usr/home/jail/ほげほげ
So that's roughly 9TB in use (8465 GiB, about 8.3 TiB).
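Since df -g counts in 1 GiB (2^30 byte) blocks while HDD labels use decimal TB, the conversion is worth spelling out. A small sketch follows; the function names are made up for this example.

```shell
# df -g reports 1 GiB (2^30 byte) blocks; convert to TiB and to decimal TB.
# (function names are hypothetical, for illustration only)
gib_to_tib() { awk -v g="$1" 'BEGIN { printf "%.2f\n", g / 1024 }'; }
gib_to_tb()  { awk -v g="$1" 'BEGIN { printf "%.2f\n", g * 1073741824 / 1e12 }'; }

gib_to_tib 8465   # used space on zdata, in TiB
gib_to_tb 8465    # the same amount in decimal TB, the unit on HDD labels
```

For the 8465 in the df line this gives 8.27 TiB, or 9.09 TB in drive-label units.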
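Which means I can build a fresh RAIDZ3 out of the 14TB HDDs and get 14TB of stora
Hang on, you can't build a RAIDZ3 out of three disks, can you?
Oops.
...However, raidz apparently has a handy property: when the hard disks that make up an existing storage pool get bigger, the pool's capacity grows on its own. (As far as I can tell, that only kicks in once every disk in the vdev has been replaced, and the pool's autoexpand property has to be on.) Compared with the days of grinding away at newfs, that kind of convenience is unthinkable.
So let's try the following plan.
- Of da1, da2, da3, da4, and da6, which currently make up the raidz3, swap three for 14TB disks. Then, whenever money turns up, buy two more 14TB disks ( ノД`) sob sob...
- Use one of the three removed 8TB HDDs as the replacement for da4.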
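To see what the plan buys in capacity terms: in a raidz vdev every member only counts for as much as the smallest disk, so swapping three of the five disks changes nothing until the last two are replaced as well. A rough sketch of that rule, again with a made-up helper name and ignoring overhead:

```shell
# Usable capacity of a raidz vdev with mixed disk sizes: every member only
# contributes as much as the smallest one.
# usage: raidz_mixed_tb <parity> <size_tb>...
# (hypothetical helper; ignores ZFS metadata/padding overhead)
raidz_mixed_tb() {
    parity=$1
    shift
    count=$#
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo $(( (count - parity) * min ))
}

raidz_mixed_tb 3 14 14 14 8 8     # after swapping three disks
raidz_mixed_tb 3 14 14 14 14 14   # after swapping all five
```

This prints 16 for the intermediate state and 28 once all five disks are 14TB. On the real pool the growth also depends on the autoexpand property (zpool get autoexpand zdata shows it, zpool set autoexpand=on zdata enables it); I'd check that before counting on the extra space.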
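Ah, but the storage tower I'm using (one of the 裸族の-something enclosures) has every slot filled, so the procedure has to go like this.
- Physically disconnect da4 and plug in a bare 14TB HDD in its place.
- Remove any one disk from zdata.
- Put the removed HDD into zbackup.
- Add the 14TB HDD to zdata.
After that, once the rebuild finishes, I just remove any two of the 8TB disks from zdata and put 14TB HDDs in their place. Perfecto (dated slang, I know).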
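Before touching a live pool it can help to write the commands down first. The sketch below only prints the zpool commands for one disk swap and never runs them. plan_swap is a made-up helper, and the device names in the example call are placeholders, since the da numbers can change after a reboot.

```shell
# Dry run: print the zpool commands for one disk swap instead of running them.
# usage: plan_swap <pool> <old-device> <new-device>
# (hypothetical helper; device names below are placeholders)
plan_swap() {
    pool=$1
    old=$2
    new=$3
    echo "zpool offline $pool $old"
    echo "zpool replace $pool $old $new"
    echo "zpool status $pool"
}

plan_swap zdata da4 da2
```

Reading the printed lines back against the current zpool status and dmesg is a cheap way to catch a wrong device name before it matters.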
1. Detaching the failing da4
# zpool offline zbackup da4
After running something like that, zpool status shows:
  pool: zbackup
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 0 in 0h0m with 0 errors on Mon Apr 23 06:29:16 2018
config:

        NAME                     STATE     READ WRITE CKSUM
        zbackup                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            da8                  ONLINE       0     0     0
            5017281946433150361  OFFLINE      0     0     0  was /dev/da4
            da0                  ONLINE       0     0     0
            da5                  ONLINE       0     0     0
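Oh, it went offline properly.
2. Removing one disk from zdata
Doing this to a live, in-production filesystem is absolutely terrifying... Of course I know that with raidz3 pulling an HDD is fine, but it's bad for the heart.
(Which is more reliable, the ST8000AS or the ST8000DM? The AS is the archive model, so I suppose the DM performs better. But the one that crashed and caused me so much grief back then was a DM, wasn't it. Hmm... and so I spent a good few dozen minutes digging through 5ch and agonizing.)
...Hm?
Why was da4 in both storage pools? (?_?) <- only just noticed
So I took a closer look.
zbackup is a raidz1 whose remaining members are da0, da5, and da8 (the fourth, missing member is only listed as "was /dev/da4")
zdata is a raidz3 made up of da1, da2, da3, da4, and da6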
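The "was /dev/da4" note is only the last path ZFS saw that member at; da numbers are handed out at boot, so the same name can end up pointing at a different disk. The number in front of it is the member's GUID, which does not move. A small sketch that pulls those GUIDs out of zpool status text (missing_members is a name made up for this example):

```shell
# Print the GUID and the recorded old path of every pool member that zpool
# status lists by GUID ("<guid> <STATE> ... was /dev/<name>").
# Reads zpool status text on stdin. (hypothetical helper)
missing_members() {
    awk '$1 ~ /^[0-9]+$/ && $(NF-1) == "was" { print $1, $NF }'
}

# Example with a line copied from the zbackup output:
echo "5017281946433150361 UNAVAIL 0 0 0 was /dev/da4" | missing_members
```

On the real machine this would be fed from zpool status zbackup. As far as I know, zpool offline and zpool replace also accept the GUID in place of the device name, which avoids the "which da4?" ambiguity, but I have not checked that on this particular FreeBSD version.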
I've rather lost the plot, so let's just take da4 down. Here goes!
# zpool offline zdata da4
and then zpool status gives
# zpool status
  pool: zbackup
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 0 in 0h0m with 0 errors on Mon Apr 23 06:29:16 2018
config:

        NAME                     STATE     READ WRITE CKSUM
        zbackup                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            da8                  ONLINE       0     0     0
            5017281946433150361  OFFLINE      0     0     0  was /dev/da4
            da0                  ONLINE       0     0     0
            da5                  ONLINE       0     0     0

errors: No known data errors

  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        zdata                    DEGRADED     0     0     0
          raidz3-0               DEGRADED     0     0     0
            da1                  ONLINE       0     0     0
            da2                  ONLINE       0     0     0
            da3                  ONLINE       0     0     0
            7675701080755519488  OFFLINE      0     0     0  was /dev/da4
            da6                  ONLINE       0     0     0

errors: No known data errors
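...Is this going to be okay?
Anyway, at this point I'll physically pull da4 and put the first 14TB disk in instead.
The way to find it is decidedly analog:
# cat /dev/da4 > /dev/null
Run something like that and look for the HDD whose access LED is flickering away.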
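Every disk in the dmesg output reports the same serial number (152D00539000, presumably coming from the USB bridge rather than the drives), so the LED trick really is the practical way to tell them apart here. The cat version reads until interrupted; a bounded variant with dd is sketched below. blink_disk is a made-up name, and the example call uses /dev/zero so that it is harmless to try.

```shell
# Read a bounded amount from a device so its activity LED blinks for a while.
# usage: blink_disk <device> <megabytes>
# (hypothetical helper)
blink_disk() {
    dd if="$1" of=/dev/null bs=1048576 count="$2" 2>/dev/null
}

# Harmless demonstration against /dev/zero; on the real machine the argument
# would be the disk in question, e.g. /dev/da4.
blink_disk /dev/zero 16
```

A few thousand megabytes should keep the LED busy long enough to walk over to the enclosure, and the read stops by itself afterwards.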
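Good.
# shutdown -p now
and then pull the HDD out...
Wait, where did da7 go?
Right. I only just realized that /dev/da7, left over from an old gmirror setup, has been sitting idle.
So I remove da4 and da7 and plug in the 14TB disks. dmesg now shows: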
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 152D00539000
da0: 400.000MB/s transfers
da0: 7630885MB (15628053168 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
da1 at umass-sim0 bus 0 scbus5 target 0 lun 1
da1: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 152D00539000
da2 at umass-sim0 bus 0 scbus5 target 0 lun 2
da1: 400.000MB/s transfers
da1: 7630885MB (15628053168 512 byte sectors)
da1: quirks=0x2<NO_6_BYTE>
da2: <TOSHIBA MN07ACA14T 1520> Fixed Direct Access SPC-4 SCSI device
da2: Serial Number 152D00539000
da2: 400.000MB/s transfers
da2: 13351936MB (27344764928 512 byte sectors)
da2: quirks=0x2<NO_6_BYTE>
da3 at umass-sim0 bus 0 scbus5 target 0 lun 3
da3: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da3: Serial Number 152D00539000
da3: 400.000MB/s transfers
da3: 7630885MB (15628053168 512 byte sectors)
da3: quirks=0x2<NO_6_BYTE>
da4 at umass-sim0 bus 0 scbus5 target 0 lun 4
da4: <ST8000AS 0002-1NA17Z 1520> Fixed Direct Access SPC-4 SCSI device
da4: Serial Number 152D00539000
da4: 400.000MB/s transfers
da4: 7630885MB (15628053168 512 byte sectors)
da4: quirks=0x2<NO_6_BYTE>
random: unblocking device.
da5 at umass-sim0 bus 0 scbus5 target 0 lun 5
da5: <ST8000DM 005-2EH112 1520> Fixed Direct Access SPC-4 SCSI device
da5: Serial Number 152D00539000
da5: 400.000MB/s transfers
da5: 7630885MB (15628053168 512 byte sectors)
da5: quirks=0xa<NO_6_BYTE,4K>
Trying to mount root from zfs:zroot/ROOT/default []...
da6 at umass-sim0 bus 0 scbus5 target 0 lun 6
da6: <WDC WD80 PUZX-64NEAY0 1520> Fixed Direct Access SPC-4 SCSI device
da6: Serial Number 152D00539000
da6: 400.000MB/s transfers
da6: 7630885MB (15628053168 512 byte sectors)
da6: quirks=0x2<NO_6_BYTE>
da7 at umass-sim0 bus 0 scbus5 target 0 lun 7
da7: <WDC WD80 PUZX-64NEAY0 1520> Fixed Direct Access SPC-4 SCSI device
da7: Serial Number 152D00539000
da7: 400.000MB/s transfers
da7: 7630885MB (15628053168 512 byte sectors)
da7: quirks=0x2<NO_6_BYTE>
da8 at umass-sim0 bus 0 scbus5 target 0 lun 8
da8: <WDC WD80 EFZX-68UW8N0 1520> Fixed Direct Access SPC-4 SCSI device
da8: Serial Number 152D00539000
da8: 400.000MB/s transfers
da8: 7630885MB (15628053168 512 byte sectors)
da8: quirks=0x2<NO_6_BYTE>
da9 at umass-sim0 bus 0 scbus5 target 0 lun 9
da9: <TOSHIBA MN07ACA14T 1520> Fixed Direct Access SPC-4 SCSI device
da9: Serial Number 152D00539000
da9: 400.000MB/s transfers
da9: 13351936MB (27344764928 512 byte sectors)
da9: quirks=0x2<NO_6_BYTE>
Hm? They've been attached as da2 and da9. Oh well.
# zpool status
  pool: zbackup
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 0 in 0h0m with 0 errors on Mon Apr 23 06:29:16 2018
config:

        NAME                     STATE     READ WRITE CKSUM
        zbackup                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            da3                  ONLINE       0     0     0
            5017281946433150361  OFFLINE      0     0     0  was /dev/da4
            da5                  ONLINE       0     0     0
            da0                  ONLINE       0     0     0

errors: No known data errors

  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        zdata                    DEGRADED     0     0     0
          raidz3-0               DEGRADED     0     0     0
            da6                  ONLINE       0     0     0
            da7                  ONLINE       0     0     0
            da8                  ONLINE       0     0     0
            7675701080755519488  OFFLINE      0     0     0  was /dev/da4
            da1                  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada0p4    ONLINE       0     0     0

errors: No known data errors
Good, good. Right, in it goes.
Hm? Is this the command to use?
# zpool replace zdata da4 da2
And then the output became
  pool: zdata
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 30 22:24:04 2018
        32.3M scanned out of 19.5T at 2.49M/s, (scan is slow, no estimated time)
        6.22M resilvered, 0.00% done
config:

        NAME                       STATE     READ WRITE CKSUM
        zdata                      DEGRADED     0     0     0
          raidz3-0                 DEGRADED     0     0     0
            da6                    ONLINE       0     0     0
            da7                    ONLINE       0     0     0
            da8                    ONLINE       0     0     0
            replacing-3            DEGRADED     0     0     0
              7675701080755519488  OFFLINE      0     0     0  was /dev/da4
              da2                  ONLINE       0     0     0
            da1                    ONLINE       0     0     0

errors: No known data errors
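It turned out like this. Presumably this is the rebuild (a "resilver" in ZFS terms) running. (Goes off to look at the HDD access LEDs.)
With raidz3 it would probably be fine to swap the second disk at the same time, and my younger self would likely have charged ahead and done it, but now that I'm a grown-up I don't do dangerous things like that.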
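Instead of walking over to the LEDs, the progress can also be read out of the status text. A minimal sketch that extracts the "% done" figure; resilver_percent is a made-up helper, and it assumes the line format shown in the output above.

```shell
# Extract the "% done" figure from zpool status text read on stdin.
# Assumes a line of the form "<size> resilvered, <n>% done". (hypothetical helper)
resilver_percent() {
    awk '/resilvered/ && /% done/ { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

echo "6.22M resilvered, 0.00% done" | resilver_percent
```

On the real machine the input would be zpool status zdata. The older "scan: resilvered 0 in 0h0m ..." summary line is skipped because it has no "% done" part.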
Next, let's bring da4 back into zbackup.
# zpool online zbackup da4
# zpool status
  pool: zbackup
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 30 22:36:02 2018
        870M scanned out of 12.5T at 39.5M/s, 92h25m to go
        197M resilvered, 0.01% done
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da0     ONLINE       0     0     0

errors: No known data errors

  pool: zdata
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 30 22:24:04 2018
        57.1G scanned out of 19.5T at 79.0M/s, 71h33m to go
        11.4G resilvered, 0.29% done
config:

        NAME                       STATE     READ WRITE CKSUM
        zdata                      DEGRADED     0     0     0
          raidz3-0                 DEGRADED     0     0     0
            da6                    ONLINE       0     0     0
            da7                    ONLINE       0     0     0
            da8                    ONLINE       0     0     0
            replacing-3            DEGRADED     0     0     0
              7675701080755519488  OFFLINE      0     0     0  was /dev/da4
              da2                  ONLINE       0     0     0
            da1                    ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          ada0p4    ONLINE       0     0     0

errors: No known data errors
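Looks like all that's left is to wait: going by the current estimates, roughly 70 hours for zdata and a bit over 90 for zbackup.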
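The two resilvers run side by side, so the wait is set by the longer of the two estimates rather than their sum. A tiny sketch for turning the "NNhMMm" strings from zpool status into minutes so they can be compared; eta_minutes is a made-up helper and assumes exactly that format.

```shell
# Convert a zpool ETA such as "92h25m" into minutes.
# Assumes the "<hours>h<minutes>m" format shown in zpool status. (hypothetical helper)
eta_minutes() {
    h=${1%%h*}      # text before the "h"
    m=${1#*h}       # text after the "h", e.g. "25m"
    m=${m%m}        # drop the trailing "m"
    echo $(( h * 60 + m ))
}

eta_minutes 92h25m   # zbackup
eta_minutes 71h33m   # zdata
```

That prints 5545 and 4293 minutes, so zbackup is the one to wait for, at a little under four days. Both pools hang off the same umass-sim0 bus, so I'd treat these estimates as rough.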
And that's this year's work done. Phew.
Addendum, 2019/1/3.
The rebuild seems to have finished, so on to the second 14TB HDD.
# zpool replace zdata da1 da9
# zpool status zdata
  pool: zdata
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 3 05:56:07 2019
        66.1M scanned out of 19.5T at 2.00M/s, (scan is slow, no estimated time)
        12.6M resilvered, 0.00% done
config:

        NAME             STATE     READ WRITE CKSUM
        zdata            ONLINE       0     0     0
          raidz3-0       ONLINE       0     0     0
            da6          ONLINE       0     0     0
            da7          ONLINE       0     0     0
            da8          ONLINE       0     0     0
            da2          ONLINE       0     0     0
            replacing-4  ONLINE       0     0     0
              da1        ONLINE       0     0     0
              da9        ONLINE       0     0     0

errors: No known data errors
Right, now it's back to waiting for the rebuild.
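Since each swap means days of waiting before the next one, a polling loop can do the watching. A rough sketch, assuming the "resilver in progress" wording seen in the status output; both function names are made up, and the loop is only defined here, not run.

```shell
# Succeeds if the zpool status text on stdin reports a running resilver.
# (hypothetical helper; relies on the "resilver in progress" wording)
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Poll a pool until its resilver is finished.
# usage: wait_for_resilver <pool> [seconds-between-checks]
wait_for_resilver() {
    # Bail out early if the pool cannot be queried at all.
    zpool list "$1" >/dev/null 2>&1 || return 1
    while zpool status "$1" | resilver_in_progress; do
        sleep "${2:-600}"
    done
    echo "resilver on $1 finished"
}
```

Something like wait_for_resilver zdata 3600 in a tmux or screen session would then report when it is safe to start on the third disk. It is still worth reading the final zpool status by hand before pulling anything.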