ZFS Disk Mirroring, Striping and RAID-Z
This is the third in a series of tests, but this time we’re going to test how ZFS handles multiple drives natively, rather than running over an existing software RAID+LVM setup. ZFS can dynamically add disks to a pool for striping (the default), mirroring or RAID-Z (with single or double parity), which are designed to improve speed (striping), reliability (mirroring), or both (RAID-Z).
I can’t use the same hardware as before for this testing, but I do happen to have an old (10+ year) Olivetti Netstrada with four 200MHz Intel Pentium Pro processors, 256MB of RAM and five 4GB SCSI drives. This means it’s a lot slower than the previous 2.6GHz P4 system with 1GB of RAM and dual SATA drives, so the overall times for the runs are not comparable at all – if for no other reason than that Bonnie++ by default works with file sizes twice your RAM size, to eliminate as much as possible of the effect of the OS caches.
So here is a result from this system for XFS on a single drive for comparison later on. No LVM here as the box is set up purely for testing ZFS.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M            10067  33  4901  17            9505  14 129.7   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   273  16 +++++ +++   253  12   268  16 +++++ +++   135   7
netstrada,496M,,,10067,33,4901,17,,,9505,14,129.7,3,16,273,16,+++++,+++,253,12,268,16,+++++,+++,135,7
real 9m43.959s
user 0m1.650s
sys 1m20.160s
Note that this is not my patched version of Bonnie++ that uses random data rather than zeros for file content, but as I’m not going to test compression here it is unlikely to make much difference.
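For the record, every run in this post was timed the same way. A minimal sketch of the sort of invocation used – the mount point is an assumption, and the 496MB file size may simply have been what Bonnie++ picked on its own for this box – looks like this:
time bonnie++ -d /mnt/xfs -s 496 -u root
(The -u root is only needed if you run the benchmark as root, which Bonnie++ otherwise refuses to do.)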
Now we want to set up ZFS. There are 4 drives completely free on the system (sdb, sdc, sdd, sde) so we’ll just use the bare drives, no need for partition tables now.
zpool create test /dev/sdb
Now we’ll create a file system that we’re going to work in. This is all the same as before because we’re not doing anything special. Yet.
zfs create test/volume1
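If you want to confirm the file system exists and see where it has been mounted, zfs list will show it, e.g.:
zfs list -r test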
Here’s a test result from just that drive.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             3260   2  1778   2            6402   3  34.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   323   4  1067   7   301   3   329   4  1549   9   313   3
netstrada,496M,,,3260,2,1778,2,,,6402,3,34.5,0,16,323,4,1067,7,301,3,329,4,1549,9,313,3
real 16m37.150s
user 0m2.170s
sys 0m23.330s
So not quite half the speed of XFS, but close. Let’s see what happens if I add another drive as a stripe.
Striping
Striping is the simplest way of adding a drive to a ZFS pool; all it takes is:
zpool add test /dev/sdc
We can check the second drive has been added by doing zpool status, which says:
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          /dev/sdb    ONLINE       0     0     0
          /dev/sdc    ONLINE       0     0     0

errors: No known data errors
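You can also confirm that the pool has grown by the size of the second drive with:
zpool list test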
OK – now what has that done for performance?
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             3175   2  1705   2            6104   3  34.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   325   4  1069   6   315   3   331   5  1557   6   345   3
netstrada,496M,,,3175,2,1705,2,,,6104,3,34.9,0,16,325,4,1069,6,315,3,331,5,1557,6,345,3
real 16m46.772s
user 0m2.400s
sys 0m23.100s
Nothing at all, really – so how about with all 4 drives striped in the pool?
zpool add test /dev/sdd
zpool add test /dev/sde
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             3136   2  1704   2            5572   3  38.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   331   4  1055   6   330   3   333   4  1576  12   348   3
netstrada,496M,,,3136,2,1704,2,,,5572,3,38.3,0,16,331,4,1055,6,330,3,333,4,1576,12,348,3
real 16m32.234s
user 0m1.990s
sys 0m23.720s
Still nothing – very odd indeed, but this may be one of the areas where work still has to be done.
Blow it away, start again
At the moment ZFS (on Solaris or Linux) only supports removing drives that are marked as hot spares, so we’ll need to destroy this pool and start again. Once more it’s pretty easy to do (warning: no safety nets here – if you type these commands then your data will go away, pronto). First we need to remove any volumes in the pool.
zfs destroy -r test
Then we can destroy the pool itself.
zpool destroy test
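(As an aside on the hot-spare point above: a spare is currently the only kind of device you can take back out of a pool, which would look something like the sketch below – /dev/sdf is a hypothetical spare drive and not something I’ve tested here.)
zpool add test spare /dev/sdf
zpool remove test /dev/sdf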
Now we will start again at the same point as before, with just a single drive.
zpool create test /dev/sdb
zfs create test/volume1
Mirroring
To convert a single drive ZFS pool into a mirror we cannot use the zpool add command; we have to use zpool attach instead:
zpool attach test /dev/sdb /dev/sdc
If we look at what zpool status says we see:
  pool: test
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Jan 1 22:10:19 2007
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            /dev/sdb  ONLINE       0     0     0
            /dev/sdc  ONLINE       0     0     0

errors: No known data errors
So that confirms that we now have a mirror for testing – dead easy! So does it help with performance?
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             3069   2  1484   2            5634   3  31.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   331   4  1087   7   338   4   329   4  1638   8   342   3
netstrada,496M,,,3069,2,1484,2,,,5634,3,31.5,0,16,331,4,1087,7,338,4,329,4,1638,8,342,3
real 18m3.939s
user 0m2.130s
sys 0m23.560s
Er, no would appear to be the definitive answer. OK, so what about if we attach the two remaining drives to the mirror and try again?
zpool attach test /dev/sdb /dev/sdd
zpool attach test /dev/sdb /dev/sde
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             2475   1  1332   1            5638   3  29.2   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   324   4  1041   7   296   3   324   4  1424   8   307   3
netstrada,496M,,,2475,1,1332,1,,,5638,3,29.2,0,16,324,4,1041,7,296,3,324,4,1424,8,307,3
real 19m59.974s
user 0m2.500s
sys 0m24.570s
So it appears that ZFS mirroring doesn’t impart any performance benefit, but is going to be very reliable.
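Acting on that reliability is just as easy from the command line; if one side of the mirror died, you would swap it out with something along these lines (a sketch only – /dev/sdf is a hypothetical replacement drive and I haven’t tested a failure here):
zpool offline test /dev/sdc
zpool replace test /dev/sdc /dev/sdf
zpool status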
RAID-Z
To test RAID-Z I’ll destroy the existing pool and then create a new RAID-Z pool using all 4 drives.
zfs destroy -r test
zpool destroy test
zpool create test raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
zfs create test/volume1
This is reported by zpool status as:
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            /dev/sdb  ONLINE       0     0     0
            /dev/sdc  ONLINE       0     0     0
            /dev/sdd  ONLINE       0     0     0
            /dev/sde  ONLINE       0     0     0

errors: No known data errors
OK – now let’s see how that performs:
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada     496M             3148   2  1475   2            5353   2  19.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   323   4  1064   6   319   3   322   4  1407   6   316   3
netstrada,496M,,,3148,2,1475,2,,,5353,2,19.0,0,16,323,4,1064,6,319,3,322,4,1407,6,316,3
real 21m9.018s
user 0m2.250s
sys 0m23.130s
So slower again, but that’s going to be because it’s got to do parity calculations on top of its usual processing load.
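Incidentally, the double-parity variant mentioned at the start is created the same way, just with raidz2 instead of raidz (assuming your ZFS build supports it); with only four 4GB drives that would leave just two drives’ worth of usable space, so I haven’t benchmarked it:
zpool create test raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde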
Summary
The fact that you can easily make a RAID array or mirror and extend it from the command line is very nice, but the fact that adding drives to a striped pool doesn’t seem to change the performance at all is a little odd. Now it could be that my test box is underpowered and I was hitting its hardware limits, but the fact that XFS was much faster on a single drive seems to contradict that.
Looking at the results from zpool iostat -v whilst the RAID-Z case was doing its rewrite test, the I/O load appeared to be nicely balanced between the drives (see below), but it never seemed to exceed a little over 2MB/s. I did peek occasionally at zpool iostat (without the -v), and even with all 4 drives striped it didn’t break that barrier. It may be that there is a bottleneck in the code further up that needs to be fixed first, and after that the drives will hit their proper performance.
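For reference, the figures below came from watching the pool while Bonnie++ was running; giving zpool iostat an interval keeps it refreshing, so the invocation was along these lines (the 5 second interval is my assumption, not a recorded detail):
zpool iostat -v test 5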
                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
------------   -----  -----  -----  -----  -----  -----
test            663M  15.0G     12     26  1.58M  1.85M
  raidz1        663M  15.0G     12     26  1.58M  1.85M
    /dev/sdb       -      -      9     25   403K   640K
    /dev/sdc       -      -      9     25   404K   640K
    /dev/sdd       -      -      9     26   403K   640K
    /dev/sde       -      -      9     22   405K   637K
------------   -----  -----  -----  -----  -----  -----
Anyway, this is just an alpha release, so there’s much more to come!
