In 2000, Microsoft introduced support for sparse files with the release of version 3.0 of the New Technology File System (NTFS). Operating systems in the Windows NT family, starting with Windows 2000, can use this file system. In this article we look at what sparse files are and how they are used.
What are Sparse Files?
A sparse file is a file that handles the empty regions within it efficiently, so that it takes up less space on disk.
In computing, a normal file may contain regions of empty blocks that hold no actual information. This empty space is filled with zero bytes and stored alongside the regions that contain actual data, so both the empty space and the actual data take up disk space. An example is a database file that stores long runs of zero bytes, either in place of deleted data or to reserve space for future data. Storing such files in the standard format takes up a great deal of disk space that could be used for other purposes.
With sparse file support, a file containing many zero bytes is marked as sparse by setting a special attribute on it. NTFS then stores the file differently from a normal file. Only the regions that contain actual data are allocated storage space on the disk volume; the ranges of zero bytes are not. The file system tracks the location of these empty ranges in metadata instead of storing the empty blocks themselves.
NTFS manages sparse files seamlessly in the background, filling the read memory buffer with zeros when a read operation tries to access the areas of the file where those zeros are located. The application is unaware of this conversion.
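The behaviour described above can be observed with a short Python sketch. It writes a small payload after a large unwritten gap, then compares the logical size of the file with the space allocated for it. The function names are made up for this illustration. Note one assumption: on NTFS the file must first be marked as sparse (for example with the command fsutil sparse setflag, or the FSCTL_SET_SPARSE control code), which this sketch does not do, so on Windows the gap may be fully allocated unless the flag is set beforehand. Many Unix file systems leave the gap unallocated without any flag.

```python
import os
import sys
import tempfile


def write_with_hole(path, hole_bytes, payload):
    """Create a file whose first `hole_bytes` bytes are never written."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # skip ahead without writing anything
        f.write(payload)    # only this region holds real data


def allocated_bytes(path):
    """Best-effort count of the bytes actually allocated on disk."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        get_size = ctypes.windll.kernel32.GetCompressedFileSizeW
        get_size.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(wintypes.DWORD)]
        get_size.restype = wintypes.DWORD
        high = wintypes.DWORD(0)
        # Error handling is omitted in this sketch (the call returns
        # 0xFFFFFFFF in the low part when it fails).
        low = get_size(path, ctypes.byref(high))
        return (high.value << 32) | low
    # POSIX systems report allocation in 512-byte units.
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.bin")
        write_with_hole(demo, 16 * 1024 * 1024, b"real data")
        print("logical size:", os.path.getsize(demo))
        print("allocated   :", allocated_bytes(demo))
        with open(demo, "rb") as f:
            # Reading inside the gap yields zeros, as the text describes.
            print("gap reads as zeros:", f.read(4096) == bytes(4096))
```

Where the file system stores the file sparsely, the allocated figure is far smaller than the logical size, yet the application still reads zeros from the gap.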
Sparse files are widely used in disk images, database files, log files and scientific applications.
Difference between File Compression and Sparse Files
Besides sparse files, NTFS also includes built-in file compression. Both features save space on the disk volume, but they achieve that goal differently. Compression shrinks the stored data itself, while a sparse file simply does not allocate space for its ranges of zeros and stores the actual data as is. The main disadvantage of file compression is that it may degrade system performance while a file is read or written, because resources are spent decompressing and compressing the data as required. Such overhead is sometimes unacceptable in critical applications.
Advantages and Disadvantages of Sparse Files
The biggest advantage of sparse files is that a user can create very large files that occupy very little storage space. Storage space is allocated automatically as data is written to the file. Large sparse files are also created relatively quickly, as the file system does not need to pre-allocate disk space and write the zeros.
The benefits of sparse files are limited to applications that support them. If a program cannot recognize or make use of sparse files, it will save a sparse file in its original, expanded state, and the advantage is lost. Users need to be careful in such situations, as a sparse file that occupies only a few megabytes can suddenly swell to several gigabytes when an application without sparse file support copies it to the destination.
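To illustrate what a sparse-aware copy does differently, here is a minimal Python sketch of one common approach: read the source block by block and seek over any block that consists only of zeros instead of writing it. This is an illustration, not the method used by any particular product. The function name and block size are arbitrary, and on NTFS it assumes the destination file has already been marked as sparse, which this sketch does not do.

```python
import os

BLOCK = 64 * 1024  # arbitrary block size chosen for this sketch


def copy_skipping_zero_blocks(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zeros = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Nothing but zeros: move the write position and leave a gap.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in a gap, set the final length explicitly.
        fout.truncate(fout.tell())
```

A copy routine that writes every block it reads, zeros included, produces the expanded file that the paragraph above warns about.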
Users can’t copy or create a sparse file if its nominal size is larger than the amount of free space (or quota size limits imposed on user accounts) available. For example, if the original size of a sparse file (with all the zero bytes) is 500MB, and there is a quota limit of 400MB on the user account used to create that sparse file, it would result in a quota-exceeded error even though the actual disk space occupied by the sparse file is only 50MB on the drive.
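The arithmetic in this example can be written out as a small Python sketch. It assumes, as the paragraph describes, that the quota is charged against the nominal size of the file rather than the space it actually occupies. The function name is made up for this illustration.

```python
MB = 1024 * 1024


def quota_allows(charged_size, quota_limit, quota_used=0):
    """True if a file charged at `charged_size` fits in the remaining quota."""
    return quota_used + charged_size <= quota_limit


nominal = 500 * MB    # size of the file including all the zero bytes
allocated = 50 * MB   # space the sparse file really occupies on the drive
quota = 400 * MB      # limit on the user account

print(quota_allows(nominal, quota))    # False: charged at nominal size
print(quota_allows(allocated, quota))  # True: what the disk usage alone would suggest
```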
Hard disks that store sparse files are also prone to disk fragmentation, as the file system will write data to sparse files as required. This may lead to performance degradation over time. In addition, certain disk management utilities may inaccurately report the amount of free space available. When a file system that contains sparse files is nearly full, it can produce some unexpected results. For example, there may be “disk-full” errors when data is copied over an existing portion of a file that has been marked as sparse.
Backing Up and Restoring Sparse Files with Backup Applications
Some backup programs cannot properly recognize, back up or restore sparse files. A sparse file backed up by such an application will take up a lot more space on the destination. Similarly, using such a program to restore a properly backed up sparse file will cause the restored file to be stored at its original, expanded size. If the drive that the file is being restored to does not have enough space to hold the full file, the restore will fail with “disk-full” errors.
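When a tool is not known to handle sparse files, one cautious approach is to check beforehand whether the file would fit at its full, expanded size. The Python sketch below is only an illustration of that idea, with a made-up function name. It compares the logical size of the file with the free space on the destination drive.

```python
import os
import shutil


def fits_when_expanded(src, dest_dir):
    """True if `src` would fit in `dest_dir` even if stored at full size."""
    nominal = os.path.getsize(src)            # logical size, zeros included
    free = shutil.disk_usage(dest_dir).free   # free bytes on the destination
    return nominal <= free
```

If the check fails, copying or restoring the file with a tool that is not sparse-aware is likely to end in the “disk-full” error described above.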
The commercial backup and synchronization programs SyncBackSE and SyncBackPro include options to back up and restore sparse files correctly. One of the three available file copying methods, the Backup read/write file copying option (under the Modify > Expert > Copy/Delete settings page), backs up sparse files correctly without further configuration.
Alternatively, users can choose either of the other two file copying methods, Standard Windows file copying or the Windows Explorer method of file copying, and then enable the option Copy NTFS sparse files using Backup Read/Write copy method (under the Copy/Delete > Advanced settings page). This option allows SyncBackSE/Pro to switch copying methods automatically whenever sparse files are found during copying. Note that it is also possible to copy sparse files without enabling either option, but those files will be backed up at their original, expanded sizes.
Sparse file copying is not supported in SyncBackFree, the freeware version of SyncBack.
Sparse files in NTFS come with benefits and drawbacks that users may want to consider before using them. Being aware of the issues sparse files can cause will help you avoid problems in the future.