ZFS Disk Mirroring, Striping and RAID-Z

This is the third in a series of ZFS tests, but this time we’re going to test how it handles multiple drives natively, rather than running over an existing software RAID+LVM setup. ZFS can dynamically add disks to a pool for striping (the default), mirroring, or RAID-Z (with single or double parity), which are designed to improve speed (striping), reliability (mirroring), or both performance and reliability (RAID-Z).
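For reference, creating each of the three layouts from scratch looks roughly like this (a sketch only: the device names are placeholders, and the actual commands used on this box follow later):

# striped pool (the default): data is spread across all the devices
zpool create example /dev/sdX /dev/sdY

# mirrored pool: every block is written to both devices
zpool create example mirror /dev/sdX /dev/sdY

# RAID-Z pool: single parity (raidz2 gives double parity)
zpool create example raidz /dev/sdX /dev/sdY /dev/sdZ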

I can’t use the same hardware as before for this testing, but I do happen to have an old (10+ years) Olivetti Netstrada with four 200MHz Intel Pentium Pro processors, 256MB of RAM and five 4GB SCSI drives. This means it’s a lot slower than the previous 2.6GHz P4 system with 1GB of RAM and dual SATA drives, so the overall times for the runs are not comparable at all – if for no other reason than that Bonnie++ by default works with file sizes twice your RAM size, to eliminate as much as possible of the effect of the OS caches.

So here is a result from this system for XFS on a single drive for comparison later on. No LVM here as the box is set up purely for testing ZFS.

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M           10067  33  4901  17            9505  14 129.7   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   273  16 +++++ +++   253  12   268  16 +++++ +++   135   7
netstrada,496M,,,10067,33,4901,17,,,9505,14,129.7,3,16,273,16,+++++,+++,253,12,268,16,+++++,+++,135,7

real    9m43.959s
user    0m1.650s
sys     1m20.160s
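For what it’s worth, each timed result in this post is just Bonnie++ wrapped in time, along these lines (the exact flags are my guess rather than a record of the run, the target directory is a placeholder, and Bonnie++ chose the 496MB file size itself from the RAM it detected):

# run Bonnie++ against the filesystem under test; -u is only needed if running as root
time bonnie++ -d /path/to/test/dir -u root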

Note that this is not my patched version of Bonnie++ that uses random data rather than zeros for file content, but as I’m not going to test compression here it is unlikely to make much difference.

Now we want to set up ZFS. There are 4 drives completely free on the system (sdb, sdc, sdd, sde) so we’ll just use the bare drives, no need for partition tables now.

zpool create test /dev/sdb

Now we’ll create a file system that we’re going to work in. This is all the same as before because we’re not doing anything special. Yet.

zfs create test/volume1
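By default the new dataset is mounted automatically under a path named after the pool and dataset, so a quick check (just a sketch) shows where Bonnie++ should be pointed:

# with default settings the dataset should appear at /test/volume1
zfs list
df -h /test/volume1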

Here’s a test result from just that drive.

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            3260   2  1778   2            6402   3  34.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   323   4  1067   7   301   3   329   4  1549   9   313   3
netstrada,496M,,,3260,2,1778,2,,,6402,3,34.5,0,16,323,4,1067,7,301,3,329,4,1549,9,313,3

real    16m37.150s
user    0m2.170s
sys     0m23.330s

So between roughly a quarter and two thirds of the speed of XFS, depending on the test. Let’s see what happens if I add another drive as a stripe.

Striping

Striping is the simplest way of adding a drive to a ZFS pool; all it takes is:

zpool add test /dev/sdc

We can check the second drive has been added by doing zpool status, which says:

  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          /dev/sdb    ONLINE       0     0     0
          /dev/sdc    ONLINE       0     0     0

errors: No known data errors
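The extra capacity from the second drive also shows up at the pool level, which is a quick sanity check that the stripe really has grown (a sketch, output not reproduced here):

# the pool size should now be roughly two drives of capacity
zpool list test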

OK – now what has that done for performance?

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            3175   2  1705   2            6104   3  34.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   325   4  1069   6   315   3   331   5  1557   6   345   3
netstrada,496M,,,3175,2,1705,2,,,6104,3,34.9,0,16,325,4,1069,6,315,3,331,5,1557,6,345,3

real    16m46.772s
user    0m2.400s
sys     0m23.100s

Nothing at all really – so how about with all 4 drives striped in the pool?

zpool add test /dev/sdd
zpool add test /dev/sde

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            3136   2  1704   2            5572   3  38.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   331   4  1055   6   330   3   333   4  1576  12   348   3
netstrada,496M,,,3136,2,1704,2,,,5572,3,38.3,0,16,331,4,1055,6,330,3,333,4,1576,12,348,3

real    16m32.234s
user    0m1.990s
sys     0m23.720s

Still nothing – very odd indeed, but this may be one of the areas where work still has to be done.

Blow it away, start again

At the moment ZFS (on Solaris or Linux) only supports removing drives that are marked as hot spares, so we’ll need to destroy this pool and start again. Once more it’s pretty easy to do (warning: there are no safety nets here; if you type these commands, your data will go away, pronto). First we need to remove any volumes in the pool.

zfs destroy -r test

Then we can destroy the pool itself.

zpool destroy test
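If you want to be sure the pool really is gone before carrying on, a quick check (just a sketch) is:

# with the pool destroyed there should be nothing left to report
zpool list
zpool status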

Now we will start again at the same point as before, with just a single drive.

zpool create test /dev/sdb
zfs create test/volume1

Mirroring

To convert a single-drive ZFS pool into a mirror we cannot use the zpool add command; we have to use zpool attach instead:

zpool attach test /dev/sdb /dev/sdc

If we look at what zpool status says we see:

  pool: test
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Jan 1 22:10:19 2007
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            /dev/sdb  ONLINE       0     0     0
            /dev/sdc  ONLINE       0     0     0

errors: No known data errors

So that confirms that we now have a mirror for testing – dead easy! So does it help with performance?

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            3069   2  1484   2            5634   3  31.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   331   4  1087   7   338   4   329   4  1638   8   342   3
netstrada,496M,,,3069,2,1484,2,,,5634,3,31.5,0,16,331,4,1087,7,338,4,329,4,1638,8,342,3

real    18m3.939s
user    0m2.130s
sys     0m23.560s

Er, no would appear to be the definitive answer. OK, so what if we add the two remaining drives into the array and try again?

zpool attach test /dev/sdb /dev/sdd
zpool attach test /dev/sdb /dev/sde
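At this point zpool status should show a single mirror vdev containing all four drives, in the same shape as the two-way output above (not repeated here):

# the mirror vdev should now list /dev/sdb through /dev/sde
zpool status test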

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            2475   1  1332   1            5638   3  29.2   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   324   4  1041   7   296   3   324   4  1424   8   307   3
netstrada,496M,,,2475,1,1332,1,,,5638,3,29.2,0,16,324,4,1041,7,296,3,324,4,1424,8,307,3

real    19m59.974s
user    0m2.500s
sys     0m24.570s

So it appears that ZFS mirroring doesn’t provide any performance benefit, but it is going to be very reliable.

RAID-Z

To test RAID-Z I’ll destroy the existing pool and then create a new RAID-Z pool using all 4 drives.

zfs destroy -r test
zpool destroy test
zpool create test raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
zfs create test/volume1

This is reported by zpool status as:

  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            /dev/sdb  ONLINE       0     0     0
            /dev/sdc  ONLINE       0     0     0
            /dev/sdd  ONLINE       0     0     0
            /dev/sde  ONLINE       0     0     0

errors: No known data errors

OK – now let’s see how that performs:

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
netstrada      496M            3148   2  1475   2            5353   2  19.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   323   4  1064   6   319   3   322   4  1407   6   316   3
netstrada,496M,,,3148,2,1475,2,,,5353,2,19.0,0,16,323,4,1064,6,319,3,322,4,1407,6,316,3

real    21m9.018s
user    0m2.250s
sys     0m23.130s

So it’s slower again, but that’s most likely because it has to do parity calculations on top of its usual processing load.
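As a side note, the parity costs space as well as time: with single-parity RAID-Z roughly one drive’s worth of each stripe goes to parity, so four 4GB drives give about three drives’ worth of usable space. The difference shows up when comparing the pool view with the dataset view (a sketch, not output from this box):

# zpool list reports raw capacity including the parity space,
# while zfs list reports the space actually usable after parity
zpool list test
zfs list test/volume1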

Summary

Being able to easily create a RAID array or a mirror and extend it from the command line is very nice, but the fact that adding drives to a striped pool doesn’t seem to change performance at all is a little odd. It could be that my test box is underpowered and I was hitting its hardware limits, but XFS being much faster on a single drive seems to contradict that.

Looking at the output of zpool iostat -v while the RAID-Z pool was doing its rewrite test, the I/O load appeared to be nicely balanced between the drives (see below), but the aggregate bandwidth never exceeded much more than 2MB/s. I also peeked occasionally at zpool iostat (without the -v) during the other runs, and even with all 4 drives striped it didn’t exceed that barrier. It may be that there is a bottleneck further up in the code that needs to be fixed first, and after that the drives will hit their proper performance.

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
test           663M  15.0G     12     26  1.58M  1.85M
  raidz1       663M  15.0G     12     26  1.58M  1.85M
    /dev/sdb      -      -      9     25   403K   640K
    /dev/sdc      -      -      9     25   404K   640K
    /dev/sdd      -      -      9     26   403K   640K
    /dev/sde      -      -      9     22   405K   637K
------------  -----  -----  -----  -----  -----  -----
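For anyone wanting to watch this themselves, zpool iostat takes an interval argument, so a second terminal can sample the pool while Bonnie++ is running (a sketch):

# print per-device statistics for the test pool every 5 seconds
zpool iostat -v test 5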

Anyway, this is just an alpha release, so there’s much more to come!
