Given the price of hard drives and the number of drives you can put into a single system, whether a desktop or a server, a very common question is how to arrange the drives to improve performance. Consequently, a fairly common Linux storage question on various mailing lists is: which is better for data striping, RAID-0 with mdadm or LVM? Many people correctly point out that the comparison is somewhat pointless because the two are intended for different tasks, but the question keeps coming up. Nonetheless, in the quest for the best performance possible, there is still the question of which one is better (meaningless or not). This article contrasts the two approaches with regard to performance, with some discussion of when each is appropriate. To add at least a little chaos to the situation, some simple IOzone benchmarks of RAID-0 and LVM are also presented.
Data Striping
In a simple two-drive example (Example 1), the first data piece, A1, is sent to disk 0, the second piece, A2, is sent to disk 1, and so on.
There are two terms that help define the properties of RAID-0.
- Stripe Width: the number of stripes that can be written to or read from at the same time. Very simply, this is the number of drives used in the RAID-0 group. In Example 1 the stripe width is 2.
- Stripe Size: the size of the stripes written to each drive. The terms block size, chunk size, stripe length, and granularity are sometimes used in place of stripe size, but they are all equivalent.
RAID-0 can, in many cases, improve I/O performance because of the data striping (parallelism). If the data is smaller than the stripe size (chunk size), it is written to only one disk and takes no advantage of the striping. But if the data size is greater than the stripe size, read/write performance should increase because more than one disk can service the read or write. Increasing the stripe width adds more disks and can further improve read/write performance, as long as the data is large enough to span the additional stripes.
LVM can also stripe data across drives using its striped mapping. Striped mapping maps the physical volumes (typically the drives) onto the logical volume that is then used as the basis of the file system. LVM takes the first few stripes from the first physical volume (PV0) and maps them to the first stripes of the logical volume (LV0). It then takes the first few stripes from the next physical volume (PV1) and maps them to the next stripes in LV0. The next stripes are again taken from PV0 and mapped to LV0, and so on, alternating until all of the stripes on PV0 and PV1 have been allocated to the logical volume, LV0.
The advantage of striped mapping is similar to RAID-0: when data is read from or written to the file system and the data is large enough to span multiple stripes, both physical devices can be used, improving performance.
Contrasting RAID-0 and LVM
From the previous discussions it is obvious that both RAID-0 and LVM achieve improved performance because of data striping across multiple storage devices. So in that respect they are the same. However, LVM and RAID are used for different purposes, and in many cases are used together. Let’s look at both techniques from different perspectives.
The size (capacity) of a RAID-0 group is computed from the smallest disk size among the disks in the group, multiplied by the number of drives in the group. For example, if you have two drives where one drive is 250GB in size and the second drive is 200GB, then the RAID-0 group is 400GB in size, not 450GB. So RAID-0 does not allow you to use the entire space of each drive if they are different sizes.
On the other hand, LVM allows you to combine all of the space on all of the drives into a single virtual pool. You can use striped mapping across the drives as you would in RAID-0, with the capacity of the striped volume limited in the same way as RAID-0. However, LVM also allows you to use the remaining space for additional logical volumes (LVs).
In the case of mdadm and software RAID-0 on Linux, you cannot grow a RAID-0 group; you can only grow a RAID-1, RAID-5, or RAID-6 array. This means that you cannot add drives to an existing RAID-0 group without rebuilding the entire RAID group and restoring all of the data from a backup.
However, with LVM you can easily grow a logical volume. But you cannot use stripe mapping to add a drive to an existing striped logical volume, because the existing stripes cannot be interleaved with the new ones. This link explains it fairly concisely:
“In LVM 2, striped LVs can be extended by concatenating another set of devices onto the end of the first set. So you can get into a situation where your LV is a 2 stripe set concatenated with a linear set concatenated with a 4 stripe set.”
So, despite not being able to maintain a single striped mapping across the old and new drives, you can still easily add space to a striped logical volume, as the sketch below illustrates.
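As a rough illustration (a sketch only; the volume group, logical volume name, and size are made-up placeholders, not part of the test setup), extending a striped logical volume and then growing its ext4 file system looks something like this:

/usr/sbin/lvextend -L +100G /dev/my_vg/my_striped_lv
/sbin/resize2fs /dev/my_vg/my_striped_lv

Depending on where the free extents are, LVM may extend the volume with a differently striped or linear segment, exactly as the quote above describes.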
This article, written by the original developers of LVM for Linux, presents four advantages of LVM:
- Logical volumes can be resized while they are mounted and accessible by the database or file system, removing the downtime associated with adding or deleting storage from a Linux server
- Data from one (potentially faulty or damaged) physical device may be relocated to another device that is newer, faster or more resilient, while the original volume remains online and accessible
- Logical volumes can be constructed by aggregating physical devices to increase performance (via disk striping) or redundancy (via disk mirroring and I/O multipathing)
- Logical volume snapshots can be created to represent the exact state of the volume at a certain point-in-time, allowing accurate backups to proceed simultaneously with regular system operation
These four advantages point to the fact that LVM is designed for ease of management rather than performance.
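As a small illustration of that management focus, the snapshot capability in the list above is a single command. This is only a sketch; the volume names and snapshot size are placeholders rather than anything from the test system:

/usr/sbin/lvcreate --snapshot --size 10G --name my_lv_snap /dev/my_vg/my_lv

The snapshot can then be mounted read-only and backed up while the original logical volume stays in use.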
Performance Comparison of RAID-0 and LVM Striped Mapping
The previous section contrasted RAID-0 and LVM from a conceptual perspective, but the question of which one is faster still remains (even if the question isn’t a good one). This section will present a performance comparison of RAID-0 using mdadm and LVM. However, in the interest of time it doesn’t follow our good benchmarking guidelines (a full set of benchmarks would take over 160 hours). In this case IOzone is used as the benchmark.
IOzone was run in two ways: (1) throughput and (2) IOPS. Also, only the write, read, random write, and random read tests were run, but a range of record sizes was tested. Unlike tests in previous articles, each test was run only once, using ext4. The test system used a stock CentOS 5.3 distribution but with a 2.6.30 kernel (from kernel.org), and e2fsprogs was upgraded to the latest version as of this writing, 1.41.9. The tests were run on the following system:
- GigaByte MAA78GM-US2H motherboard
- An AMD Phenom II X4 920 CPU
- 8GB of memory
- Linux 2.6.30 kernel
- The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
- /home is on a Seagate ST1360827AS
- There are two drives for testing, Seagate ST3500641AS-RK drives with a 16MB cache each. These are /dev/sdb and /dev/sdc.
Both drives, /dev/sdb and /dev/sdc, were used for all of the tests.
To help improve run times, three threads were used on the quad-core system, leaving the fourth core for the software RAID or LVM processing. So in the IOzone command lines, the “-t 3” option means that three threads were used. In addition, each thread had a size of 3GB, resulting in a total data size of 9GB. The important point is that the total amount of data is larger than memory (9GB > 8GB).
For the throughput tests, the following IOzone command line was used.
./iozone -Rb spreadsheet_ext4_write_and_read_1K_1.wks -i 0 -i 1 -i 2 -e -+n -r 1k -s 3G -t 3 > output_ext4_write_and_read_1K_1.txt
The command line is shown with a 1KB record size.
The IOPS tests used the following IOzone command line.
./iozone -Rb spreadsheet_ext4_write_and_read_1K_1.wks -i 0 -i 1 -i 2 -e -O -+n -r 1k -s 3G -t 3 > output_ext4_write_and_read_1K_1.txt
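Since a range of record sizes was tested, the individual runs lend themselves to a simple shell loop. The following is only a sketch; the record-size list and file names are assumptions rather than the exact values used for this article:

#!/bin/bash
# Sketch: run the throughput test once per record size.
# The record-size list and file names are assumptions, not the exact ones used here.
# Adding the -O option turns these into the IOPS runs.
for r in 1k 8k 64k 1m 16m; do
    ./iozone -Rb spreadsheet_ext4_write_and_read_${r}_1.wks -i 0 -i 1 -i 2 -e -+n -r ${r} -s 3G -t 3 > output_ext4_write_and_read_${r}_1.txt
done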
The RAID-0 array was constructed relying on defaults as shown in a previous article. The command used to construct the array was the following.
[root@test64 laytonjb]# mdadm --create --verbose /dev/md0 --level raid0 --raid-devices=2 /dev/sdb1 /dev/sdc1
The “chunk size” (mdadm’s name for the stripe size) defaults to 64KB.
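For reference, a different stripe size can be set explicitly with mdadm’s --chunk option (the value is in KB). This is a sketch only and was not used for these tests:

mdadm --create --verbose /dev/md0 --level raid0 --raid-devices=2 --chunk=128 /dev/sdb1 /dev/sdc1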
To contrast RAID-0 and LVM they need to be constructed as similarly as possible. This is a bit more involved with LVM since it works differently from RAID. The basics of LVM were discussed in a previous article. After the physical volumes (PVs) were created, they were grouped into a single volume group.
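The PV creation step itself is not shown in this article; for the two test partitions it would look roughly like the following (a sketch only):

/usr/sbin/pvcreate /dev/sdb1 /dev/sdc1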
[root@test64 laytonjb]# /usr/sbin/vgcreate primary_vg /dev/sdb1 /dev/sdc1
  Volume group "primary_vg" successfully created
[root@test64 laytonjb]# /usr/sbin/vgdisplay
  --- Volume group ---
  VG Name               primary_vg
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               931.52 GB
  PE Size               4.00 MB
  Total PE              238468
  Alloc PE / Size       0 / 0
  Free PE / Size        238468 / 931.52 GB
  VG UUID               yjkNSQ-416l-f5Bt-RZLt-38NH-8LT6-QfrjeJ
The key to stripe mapping in LVM is how the logical volume is created. For this article the number of stripes (the “-i” option) was chosen to be 2, one per drive, and the stripe size (the “-I” option) was chosen to be 64KB to match the RAID-0 chunk size. The total size of the LV was arbitrarily chosen to be 465GB. The command line for creating the LV was the following.
[root@test64 laytonjb]# /usr/sbin/lvcreate -i2 -I64 --size 465G -n test_stripe_volume primary_vg /dev/sdb1 /dev/sdc1
  Logical volume "test_stripe_volume" created
[root@test64 laytonjb]# /usr/sbin/lvdisplay
  --- Logical volume ---
  LV Name                /dev/primary_vg/test_stripe_volume
  VG Name                primary_vg
  LV UUID                igTRtk-wcqn-YVzR-HNQh-Ki2b-HznC-HcW589
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                465.00 GB
  Current LE             119040
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     512
  Block device           253:0
Then the file system is created using the logical volume test_stripe_volume.
[root@test64 laytonjb]# /sbin/mkfs -t ext4 /dev/primary_vg/test_stripe_volume
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30474240 inodes, 121896960 blocks
6094848 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3720 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
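One last step before running IOzone, not shown above, is mounting the new file system. Something along these lines would do it (the mount point is an assumption):

mkdir -p /mnt/lvm_test
mount /dev/primary_vg/test_stripe_volume /mnt/lvm_test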
RAID-0 and LVM Test Results
The two tables below present the throughput and IOPS results for both RAID-0 and LVM. Table 1 contains the throughput results.
Table 1 – Throughput Tests
| Record Size | RAID-0 Write (KB/s) | RAID-0 Read (KB/s) | RAID-0 Random Read (KB/s) | RAID-0 Random Write (KB/s) | LVM Write (KB/s) | LVM Read (KB/s) | LVM Random Read (KB/s) | LVM Random Write (KB/s) |
|---|---|---|---|---|---|---|---|---|
|  | 161,898 | 145,404 | 1,378 | 3,108 | 159,412 | 109,844 | 981 | 1,751 |
|  | 186,725 | 151,225 | 5,150 | 7,976 | 183,352 | 155,871 | 4,429 | 7,233 |
|  | 183,341 | 156,619 | 16,910 | 24,748 | 185,189 | 155,294 | 20,247 | 24,106 |
|  | 182,698 | 173,024 | 33,319 | 44,386 | 188,842 | 150,967 | 30,142 | 40,277 |
|  | 182,957 | 157,612 | 59,890 | 44,386 | 189,571 | 123,128 | 34,578 | 41,502 |
|  | 189,282 | 157,612 | 134,966 | 98,189 | 184,341 | 147,475 | 87,534 | 92,821 |
|  | 191,890 | 169,098 | 187,623 | 119,255 | 187,019 | 143,780 | 115,939 | 111,758 |
|  | 186,289 | 157,202 | 194,943 | 137,252 | 182,579 | 137,214 | 143,452 | 136,715 |
|  | 184,611 | 148,623 | 203,796 | 141,693 | 187,268 | 146,750 | 238,120 | 139,860 |
|  | 186,541 | 149,814 | 223,136 | 144,753 | 187,935 | 121,341 | 199,823 | 139,860 |
Table 2 below contains the IOPS results for both RAID-0 and LVM.
Table 2 – IOPS Tests
| Record Size | RAID-0 Write (Ops/s) | RAID-0 Read (Ops/s) | RAID-0 Random Read (Ops/s) | RAID-0 Random Write (Ops/s) | LVM Write (Ops/s) | LVM Read (Ops/s) | LVM Random Read (Ops/s) | LVM Random Write (Ops/s) |
|---|---|---|---|---|---|---|---|---|
|  | 181,457 | 161,357 | 1,545 | 2,156 | 176,556 | 106,719 | 836 | 894 |
|  | 23,591 | 19,087 | 622 | 1,034 | 23,753 | 13,135 | 450 | 1,086 |
|  | 5,763 | 6,291 | 617 | 796 | 5,836 | 3,709 | 529 | 748 |
|  | 2,943 | 2,756 | 611 | 673 | 2,748 | 2,873 | 510 | 524 |
|  | 1,483 | 1,228 | 323 | 331 | 1,492 | 989 | 261 | 282 |
|  | 363 | 388 | 206 | 185 | 359 | 235 | 166 | 180 |
|  | 178 | 161 | 141 | 112 | 189 | 143 | 108 | 109 |
|  | 46 | 42 | 45 | 34 | 45 | 31 | 33 | 33 |
|  | 24 | 20 | 26 | 17 | 22 | 15 | 21 | 17 |
|  | 11 | 12 | 15 | 9 | 11 | 8 | 10 | 9 |
Even though the tests did not follow our good benchmarking habits, which really limits our ability to draw conclusions, it is interesting to make a quick comparison.
- For both RAID-0 and LVM, as the record size increases, write throughput increases slightly and read throughput remains about the same. Both random read and random write throughput increase fairly dramatically as the record size increases.
- For both RAID-0 and LVM, as the record size increases, write IOPS and read IOPS decrease dramatically (this is logical, since larger records mean fewer operations are needed to move the same amount of data). The same is true for random read and random write IOPS.
- Finally, while it is almost impossible to justify comparing RAID-0 and LVM performance, human nature pushes us to do it anyway. It appears that RAID-0 offers somewhat better throughput than LVM, particularly at the very small record sizes. The same is true for IOPS.
Summary
A fairly common question people ask is whether it is better to use data striping with RAID-0 (mdadm) or LVM. But in reality the two are different concepts. RAID is all about performance and/or data reliability while LVM is about storage and file system management. Ideally you can combine the two concepts but that’s the subject of another article or two.
In the interest of trying to answer the original question of which one is better, a quick test was run with IOzone. We did not use our good benchmarking skills in the interest of time, but the test results give some feel for the performance of both approaches. The performance was actually fairly close except at small record sizes (1KB – 8KB), where RAID-0 was much better.