ZFS is available in the following releases:

o Solaris Nevada release, build 27a, and later Solaris Express releases
o Solaris 10, starting in the Solaris 10 6/06 release
Verify file system integrity - Administrators often simply want to make sure that there is no on-disk corruption within their file systems. With most file systems, this means running fsck while the file system is offline, which can be time consuming and expensive. Instead, ZFS provides the ability to 'scrub' all data within a pool while the system is live, finding and repairing any bad data in the process. There are future plans to enhance this to enable background scrubbing.
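For example, an administrator might scrub a live pool and then check the results. This is a minimal sketch; the pool name tank is illustrative:

# zpool scrub tank
# zpool status -v tank

The status output shows scrub progress and reports any checksum errors that were found and repaired.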
Repair on-disk state - If a machine crashes, the on-disk state of some file systems will be inconsistent. The addition of journaling has solved some of these problems, but failure to roll the log may still result in a file system that needs to be repaired. In this case, there are well-known pathologies of errors, such as creating a directory entry before updating the parent link, that can be reliably repaired. ZFS does not suffer from this problem because data is always consistent on disk. A more insidious problem occurs with faulty hardware or software. Even file systems or volume managers that have per-block checksums are vulnerable to a variety of other pathologies that result in valid but corrupt data. In this case, the failure mode is essentially random, and most file systems will panic (if the corruption is in metadata) or silently return bad data to the application. In either case, an fsck utility is of little benefit. Because the corruption matches no known pathology, it will likely be unrepairable. With ZFS, these errors will be (statistically) nonexistent in a redundant configuration. In a non-redundant configuration, these errors are correctly detected, but result in an I/O error when trying to read the block. It is theoretically possible to write a tool to repair such corruption, though any such attempt would likely be a one-off special tool. Of course, ZFS is equally vulnerable to software bugs, but the bugs would have to result in a consistent pattern of corruption to be repaired by a generic tool. During the 5 years of ZFS development, no such pattern has been seen.
Why does du(1) report different file sizes for ZFS and UFS? Why doesn't the space consumption that is reported by the df command and the zfs list command match?
On UFS, the du command reports the size of the data blocks within the file. On ZFS, du reports the actual size of the file as stored on disk. This size includes metadata and reflects any compression, so the report helps answer the question "how much more space will I get if I remove this file?" As a result, even when compression is off, you will still see different results between ZFS and UFS. When you compare the space consumption that is reported by the df command with the zfs list command, consider that df reports the pool size and not just file system sizes. In addition, df doesn't understand descendent datasets or whether snapshots exist. If any ZFS properties, such as compression and quotas, are set on file systems, reconciling the space consumption that is reported by df might be difficult. Consider the following scenarios that might also impact reported space consumption:
For files that are larger than recordsize, the last block of the file is generally about half full. With the default recordsize of 128 KB, approximately 64 KB is wasted per file, which can add up across many files. The integration of RFE 6812608 would resolve this scenario. You can work around this by enabling compression: even if your data is already compressed, the unused portion of the last block is zero-filled and compresses very well.
On a RAIDZ-2 pool, every block consumes at least 2 sectors (512-byte chunks) of parity information. The space consumed by the parity information is not reported, but because it varies, and can be a much larger percentage for small blocks, the effect on reported space can be noticeable. The impact is more extreme for a recordsize of 512 bytes, where each 512-byte logical block consumes 1.5 KB (3 times the space).
Regardless of the data being stored, if space efficiency is your primary concern, you should leave the recordsize at the default (128 KB), and enable compression (to the default of lzjb).
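As a sketch, assuming a file system named tank/data, the relevant properties can be checked and compression enabled as follows:

# zfs get recordsize,compression tank/data
# zfs set compression=on tank/data

Setting compression=on uses the default lzjb algorithm.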
File system quotas (quota property) - ZFS file systems can be used as logical administrative control points, which allow you to view usage, manage properties, perform backups, take snapshots, and so on. For home directory servers, the ZFS model enables you to easily set up one file system per user. ZFS quotas are intentionally not associated with a particular user because file systems are points of administrative control. ZFS quotas can be set on file systems that could represent users, projects, groups, and so on, as well as on entire portions of a file system hierarchy. This allows quotas to be combined in ways that traditional per-user quotas cannot. Per-user quotas were introduced because multiple users had to share the same file system. ZFS file system quotas are flexible and easy to set up. A quota can be applied when the file system is created. For example:
# zfs create tank/home/users/user1
# zfs create tank/home/users/user2
# zfs list -r tank/home/users
NAME                    USED  AVAIL  REFER  MOUNTPOINT
tank/home/users        76.5K  20.0G  27.5K  /tank/home/users
tank/home/users/user1  24.5K  20.0G  24.5K  /tank/home/users/user1
tank/home/users/user2  24.5K  20.0G  24.5K  /tank/home/users/user2
ZFS quotas can be increased while the file systems are active, without any down time, when disk space is added to the ZFS storage pool.
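For example, a quota might be set when the file system is created and raised later while it remains mounted; the dataset name and sizes here are illustrative:

# zfs create -o quota=20g tank/home/users/user3
# zfs set quota=30g tank/home/users/user3
# zfs get quota tank/home/users/user3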
Reference file system quotas (refquota property) - File system quota that does not limit space used by descendents, including file systems and snapshots
User and group quotas (userquota and groupquota properties) - Limits the amount of space that is consumed by the specified user or group. The userquota or groupquota space calculation does not include space that is used by descendent datasets, such as snapshots and clones, similar to the refquota property.
In general, file system quotas are appropriate for most environments, but user/group quotas are needed in some environments, such as universities that must manage many student user accounts. RFE 6501037 was integrated into Nevada build 114 and the Solaris 10 10/09 release. An alternative to user-based quotas for containing disk space used for mail is to use mail server software that includes a quota feature, such as the Sun Java System Messaging Server. This software provides user mail quotas, quota warning messages, and expiration and purge features.
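As a sketch of per-user quotas, assuming a file system tank/home and a user named student1:

# zfs set userquota@student1=5G tank/home
# zfs get userquota@student1 tank/home
# zfs userspace tank/home

The zfs userspace command reports the space used by, and quota assigned to, each user of the file system.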
Why doesn't the space that is reported by the zpool list command and the zfs list command match?
The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. See the examples below. The zfs list command lists the usable space that is available to file systems, which is disk space minus ZFS pool redundancy metadata overhead, if any.
A non-redundant storage pool created with one 136-GB disk reports SIZE and initial FREE values as 136 GB. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.
# zpool create tank c0t6d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K  136G   0%  1.00x  ONLINE  -
# zfs list tank
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank   72K   134G    21K  /tank
A mirrored storage pool created with two 136-GB disks reports SIZE as 136 GB and initial FREE values as 136 GB. This reporting is referred to as the deflated space value. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.
# zpool create tank mirror c0t6d0 c0t7d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K  136G   0%  1.00x  ONLINE  -
# zfs list tank
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank   72K   134G    21K  /tank
A RAIDZ-2 storage pool created with three 136-GB disks reports SIZE as 408 GB and initial FREE values as 408 GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space reported by the zfs list command is 133 GB, due to the pool redundancy overhead.
# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0
# zpool list tank
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
tank   408G   286K  408G   0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  73.2K   133G  20.9K  /tank
Can I use ZFS as my root file system? What about for zones?
You can install and boot a ZFS root file system starting in the SXCE build 90 release and starting in the Solaris 10 10/08 release. For more information, see ZFS Boot. ZFS can be used as a zone root path in the Solaris 10 10/08 release, but configurations that can be patched and upgraded are limited. Additional ZFS zone root configurations that can be patched and upgraded are supported starting in the Solaris 10 5/09 release. For more information, see the ZFS Admin Guide. In addition, you cannot create a cachefs cache on a ZFS file system.
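As a rough sketch of a ZFS zone root configuration (the dataset and zone names are illustrative), the zone path is simply placed on a ZFS dataset:

# zfs create -o mountpoint=/zones rpool/zones
# zonecfg -z myzone
zonecfg:myzone> create
zonecfg:myzone> set zonepath=/zones/myzone
zonecfg:myzone> commit
zonecfg:myzone> exit
# zoneadm -z myzone install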
AVS/ZFS demonstrations are available here. Keep the following cautions in mind if you attempt to split a mirrored ZFS configuration for cloning or backup purposes:
o Support for splitting a mirrored ZFS configuration was integrated with RFE 5097228 (see the sketch after these cautions).
o You cannot remove a disk from a mirrored ZFS configuration, back up the data on the disk, and then use this data to create a cloned pool.
If you want to use a hardware-level backup or snapshot feature instead of the ZFS snapshot feature, then you will need to do the following steps:
Any attempt to split a mirrored ZFS storage pool by removing disks or changing the hardware that is part of a live pool could cause data corruption.
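Where splitting is supported (per RFE 5097228 above), the supported way to obtain a detached copy is the zpool split command rather than physically pulling a disk. A minimal sketch, with pool names illustrative:

# zpool status tank
# zpool split tank tankbackup
# zpool import tankbackup

The split detaches one side of each mirror into a new, exported pool (tankbackup here), which can then be imported for backup or cloning purposes.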
In this release, you could potentially remove the mirrored log device (mirror-1) as follows:
# zpool remove export mirror-1
# zpool status export
  pool: export
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          c1t4d0    ONLINE       0     0     0
Currently, only cache, log, and spare devices can be removed from a pool.

2. New dedup ratio property due to integration of 6677093 (zfs should have dedup capability) - The zpool list command includes dedupratio for each pool. You can also display the value of this read-only property by using the zpool get command. For example:

# zpool list
NAME     SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G  881G   5%  1.77x  ONLINE  -
rpool    928G  25.7G  902G   2%  1.40x  ONLINE  -

# zpool get dedup rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  dedupratio  1.40x  -
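Dedup itself is enabled per file system through the dedup property; a brief sketch, with the dataset name illustrative:

# zfs set dedup=on export/data
# zfs get dedup export/data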
3. The zpool list output has changed due to integration of 6897693 (deduplication can only go so far) - In previous releases, the zpool list command reported used and available physical block space, and the zfs list command reported used and available space to the file system. The previous zpool list used and available columns have changed to report allocated and free physical blocks. These changes should help clarify the accounting difference reported by the zpool list and zfs list commands.
# zpool list
NAME     SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G  881G   5%  1.77x  ONLINE  -
rpool    928G  25.7G  902G   2%  1.40x  ONLINE  -
Any scripts that utilized the old used and available properties of the zpool command should be updated to use the new naming conventions.
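For instance, a script that formerly parsed the used and available columns might instead request the allocated and free properties directly; the pool name here matches the example above:

# zpool list -H -o name,allocated,free export
# zpool get allocated,free export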
Which third party backup products support ZFS?

Computer Associates' BrightStor ARCserve product backs up and restores ZFS file systems, but ZFS ACLs are not preserved.

Does ZFS work with SAN-attached devices?
With redundancy configured in a ZFS storage pool, ZFS can detect and repair problems such as:

o Accidental overwrites or phantom writes
o Mis-directed reads and writes
o Data path errors
Keep the following points in mind when using ZFS with SAN devices:
Overall, ZFS functions as designed with SAN-attached devices, as long as all the drives are only accessed from a single host at any given time. You cannot share SAN disks between pools on the same system or different systems. This limitation includes sharing SAN disks as shared hot spares between pools on different systems.
If you expose simpler devices to ZFS, you can better leverage all available features. In summary, if you use ZFS with SAN-attached devices, you can take advantage of the self-healing features of ZFS by configuring redundancy in your ZFS storage pools even though redundancy is available at a lower hardware level.
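For example, rather than building a pool on a single hardware-RAID LUN, two SAN LUNs might be mirrored at the ZFS level so that detected checksum errors can also be repaired; the device names here are illustrative:

# zpool create santank mirror c4t0d0 c5t0d0
# zpool status santank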