ZFS is a new kind of filesystem that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. We’ve blown away 20 years of obsolete assumptions, eliminated complexity at the source, and created a storage system that’s actually a pleasure to use.
ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of filesystems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all filesystems at all times.
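The pooled model is easy to picture in code. This is a minimal sketch with hypothetical `StoragePool` and `Filesystem` classes (not the real ZFS API): filesystems have no fixed size, and every block comes out of one shared free-space pool.

```python
# Hypothetical sketch of pooled storage: many filesystems draw blocks
# from one common pool instead of pre-carved, fixed-size partitions.

class StoragePool:
    def __init__(self, total_blocks):
        self.free = total_blocks          # one shared free-space count

    def allocate(self, n):
        if n > self.free:
            raise IOError("pool out of space")
        self.free -= n

class Filesystem:
    def __init__(self, pool):
        self.pool, self.used = pool, 0    # no size set at creation time

    def write(self, n_blocks):
        self.pool.allocate(n_blocks)      # space comes from the common pool
        self.used += n_blocks

pool = StoragePool(total_blocks=1000)
filesystems = [Filesystem(pool) for _ in range(100)]  # thousands are fine
filesystems[0].write(300)                 # each consumes only what it needs
filesystems[1].write(500)
print(pool.free)                          # whatever is left is left for all
```

No filesystem strands space it isn't using; unused blocks stay available to every other filesystem in the pool.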
All operations are copy-on-write transactions, so the on-disk state is always valid. There is no need to fsck(1M) a ZFS filesystem, ever. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS will detect it and use another copy to repair it. ZFS introduces a new data replication model called RAID-Z. It is similar to RAID-5 but uses variable stripe width to eliminate the RAID-5 write hole (stripe corruption due to loss of power between data and parity updates). All RAID-Z writes are full-stripe writes. There’s no read-modify-write tax, no write hole, and — the best part — no need for NVRAM in hardware. ZFS loves cheap disks.
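The RAID-Z idea can be sketched in a few lines. This toy model uses single XOR parity and hypothetical helper functions (real RAID-Z is considerably more sophisticated), but it shows the key property: data and parity are written together as one full stripe, so there is never a window where they disagree.

```python
# Toy sketch of a full-stripe RAID-Z write with single XOR parity.
# Because parity is computed and written with the data in one stripe,
# there is no separate parity update and hence no write hole.

from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def write_stripe(data_blocks):
    """Return a full stripe: the data blocks plus their XOR parity."""
    parity = reduce(xor_blocks, data_blocks)
    return data_blocks + [parity]

def reconstruct(stripe, lost_index):
    """Rebuild one missing block by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return reduce(xor_blocks, survivors)

stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])  # width varies per write
print(reconstruct(stripe, 1))                       # recovers b"BBBB"
```

Because the stripe width matches the write, there is nothing to read back and modify; the variable stripe width is what lets every write be a full-stripe write.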
But cheap disks can fail, so ZFS provides disk scrubbing. Like ECC memory scrubbing, the idea is to read all data to detect latent errors while they’re still correctable. A scrub traverses the entire storage pool to read every copy of every block, validate it against its 256-bit checksum, and repair it if necessary. All this happens while the storage pool is live and in use.
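A scrub pass is easy to model. In this sketch the pool structures are hypothetical and SHA-256 stands in for ZFS's 256-bit block checksums: read every copy of every block, validate it, and heal damaged copies from a good one.

```python
# Sketch of a scrub over a two-way mirror: validate every copy of every
# block against its 256-bit checksum and repair latent errors in place.

import hashlib

def checksum(block):
    return hashlib.sha256(block).digest()          # 256-bit checksum

good = [b"alpha", b"beta", b"gamma"]               # logical block contents
mirror = [list(good), list(good)]                  # two mirror copies
sums = [checksum(b) for b in good]                 # checksums kept separately
mirror[1][2] = b"gamm\x00"                         # silent corruption on disk 1

repaired = 0
for i, expected in enumerate(sums):
    copies = [side[i] for side in mirror]
    good_copy = next(c for c in copies if checksum(c) == expected)
    for side in mirror:
        if checksum(side[i]) != expected:          # latent error found...
            side[i] = good_copy                    # ...healed from a good copy
            repaired += 1
print("blocks repaired:", repaired)
```

The point of scrubbing is exactly this loop run in the background: the error on disk 1 is found and fixed long before the application ever asks for that block.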
ZFS has a pipelined I/O engine, similar in concept to CPU pipelines. The pipeline operates on I/O dependency graphs and provides scoreboarding, priority, deadline scheduling, out-of-order issue and I/O aggregation. I/O loads that bring other filesystems to their knees are handled with ease by the ZFS I/O pipeline.
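The scheduling part of that pipeline can be sketched with an ordinary priority queue (a hypothetical model, not the actual ZFS issue logic): pending I/Os are issued by priority and deadline rather than arrival order, so a flood of background work cannot starve urgent requests.

```python
# Sketch of priority/deadline I/O scheduling with a heap: each pending
# I/O is ordered by (priority, deadline), lower numbers being more urgent.

import heapq

queue = []
heapq.heappush(queue, (2, 150, "async prefetch"))   # arrived first
heapq.heappush(queue, (0, 10, "sync write"))        # most urgent
heapq.heappush(queue, (1, 50, "demand read"))

issue_order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(issue_order)   # ['sync write', 'demand read', 'async prefetch']
```

Arrival order is irrelevant: the synchronous write jumps the queue, which is why latency-sensitive operations survive heavy background load.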
ZFS provides unlimited constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a filesystem, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.
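Why snapshots are cheap falls out of copy-on-write. In this sketch (hypothetical structures, with a dict standing in for a block map), a snapshot just pins the current block map, and a clone is a writable view of it; unchanged blocks are shared, not copied.

```python
# Sketch of copy-on-write snapshots and clones: copying the block map
# copies references, not data, so unchanged blocks are shared.

fs = {"README": "v1", "src/main.c": "int main(){}"}   # live block map
snapshot = dict(fs)          # snapshot: pin the current map
clone = dict(snapshot)       # clone: a writable view of the snapshot

fs["README"] = "v2"          # the live filesystem moves on
clone["notes.txt"] = "todo"  # the clone diverges independently

print(snapshot["README"])    # still "v1": the snapshot is frozen history
print(len(clone))            # shared blocks plus the clone's new one
```

A hundred workspaces cloned from one snapshot cost almost nothing until they diverge, which is what makes the mostly-shared-data use cases so cheap.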
ZFS backup and restore are powered by snapshots. Any snapshot can generate a full backup, and any pair of snapshots can generate an incremental backup. Incremental backups are so efficient that they can be used for remote replication — e.g. to transmit an incremental update every 10 seconds.
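The incremental case can be sketched as a diff between two snapshot block maps (hypothetical structures again): the stream carries only blocks added or modified since the older snapshot, which is what makes replicating every few seconds affordable.

```python
# Sketch of an incremental backup: send only the delta between two
# snapshots, not a full copy of the filesystem.

snap1 = {"a": b"one", "b": b"two", "c": b"three"}
snap2 = {"a": b"one", "b": b"TWO", "d": b"four"}   # the later snapshot

def incremental(old, new):
    """Blocks to send: everything added or modified since `old`."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = [k for k in old if k not in new]
    return changed, deleted

changed, deleted = incremental(snap1, snap2)
print(sorted(changed))   # only the delta crosses the wire
print(deleted)           # deletions travel as names, not data
```

The unchanged block "a" is never read or transmitted, so the cost of an incremental scales with the churn, not with the size of the filesystem.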
There are no arbitrary limits in ZFS. You can have as many files as you want; full 64-bit file offsets; unlimited links, directory entries, snapshots, and so on.
ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster.
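The arithmetic is easy to demonstrate with zlib standing in for ZFS's block compression (the codec is a stand-in; the principle is the same): compressible data shrinks severalfold, and every byte not written is disk bandwidth saved.

```python
# Sketch of why compression can make I/O faster: fewer bytes hit the disk.

import zlib

block = b"2007-01-01 GET /index.html 200\n" * 1000   # log-like, repetitive
compressed = zlib.compress(block)

ratio = len(block) / len(compressed)
print(f"{len(block)} -> {len(compressed)} bytes ({ratio:.1f}x)")
```

On data like this the ratio comfortably exceeds the 2-3x quoted above; when the disk is the bottleneck, writing a third of the bytes can beat writing them all, CPU cost of compression included.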
In addition to filesystems, ZFS storage pools can provide volumes for applications that need raw-device semantics. ZFS volumes can be used as swap devices, for example. And if you enable compression on a swap volume, you now have compressed virtual memory.
Sun just announced a series of open source storage appliances that use OpenSolaris and the ZFS file system. While the hardware includes some interesting options, including solid-state drives (SSDs) for improving both read and write performance, the most alluring features are the file system and the analytics made available through SNIA-standard RPC calls, built on the DTrace dynamic tracing framework included in OpenSolaris. These features are not limited to Sun hardware, making it possible to duplicate the functionality on virtually any hardware.
Among the ZFS goodies is an interesting feature called Hybrid Storage Pools, which integrates DRAM, read-optimized SSDs, write-optimized SSDs, and regular disk into a seamless whole. The SSDs replace small, expensive read and write caches with higher-capacity devices: 18GB write-biased SSDs and 100GB read-biased SSDs deliver exceptional performance at a cost that should be competitive with more basic storage systems.
Sun has done considerable work to avoid the typical SSD issue of limited life span, over-provisioning the storage and optimizing wear-leveling algorithms to ensure that the SSDs last a minimum of three years. Given how quickly SSDs are dropping in price, this seems a more than adequate lifetime.
DTrace is used to make all sorts of performance data available. The admin can drill down by file system, type of data, type of interface, and other parameters to find which application is issuing the most I/Os or using the most bandwidth, and even see the life left in the system's SSDs, which enables extremely granular optimization. No partnerships have been announced yet, but Sun is working with many storage, virtualization, and systems management vendors to ensure that data interchange works well.
Between the management capabilities, the clustering capabilities of ZFS, and the data services such as snapshots, cloning, mirroring, replication, compression, thin provisioning, and support for the iSCSI, CIFS, NFS, HTTP, and FTP protocols, the Fishworks storage system offers a lot of potential. The Sun hardware should provide good capabilities at a good price. But the best part is that the software magic is also available through the open source OpenSolaris and ZFS, as long as the hardware supports OpenSolaris.
Good detail here: http://www.techworld.com/storage/features/index.cfm?featureid=2744
Great article here: http://www.tech-recipes.com/rx/1446/zfs_ten_reasons_to_reformat_your_hard_drives/
Now, if you’re interested in trying it out (and who, after reading what it can do, isn’t?), try the following links:
- OSX – http://zfs.macosforge.org/trac/wiki
- FreeBSD – FreeBSD 7.0 now has excellent ZFS support
- Windows – oh boy… start reading again from the top.