Notes on File System Virtualization
What can file system virtualization do for us?
1. Virtualization works across devices
Instead of creating folders that hold files from a single server, the database administrator, for instance, can create name spaces organized around logical business subjects and assign files to them from multiple servers, even servers running different operating systems and database software. All the files a user needs can sit in a single name space. The user opens that logical folder with a single password and sees every file needed for the job. The user can then organize, access, combine and manipulate any of the data there without knowing or caring where the files are physically located.
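To make the idea concrete, here is a minimal sketch of a logical name space that maps business-oriented paths to physical locations on different servers. The class names, server names and paths are all made up for illustration; a real product would also handle authentication, permissions and the actual file I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    server: str   # hypothetical host name
    path: str     # path on that server's native file system


class LogicalNamespace:
    """Maps logical, business-oriented paths to physical file locations."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def add(self, logical_path, server, path):
        self._entries[logical_path] = PhysicalLocation(server, path)

    def resolve(self, logical_path):
        # The user only ever supplies the logical path.
        return self._entries[logical_path]

    def listing(self):
        return sorted(self._entries)


# One name space, files drawn from a Windows server and a Linux NAS.
finance = LogicalNamespace("finance")
finance.add("/finance/q1-report.xls", "win-fs01", r"D:\shares\reports\q1.xls")
finance.add("/finance/ledger.db", "linux-nas02", "/export/db/ledger.db")

print(finance.listing())
print(finance.resolve("/finance/ledger.db"))
```

The user works only with the `/finance/...` paths; moving a file to another server means changing one entry in the mapping, not every client.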
2. It simplifies management
Virtualization makes data de-duplication easier, improves server utilization, and lets data be moved for upgrades, repairs or hierarchical storage management without end users noticing.
3. Virtualization eliminates geographical issues
The need to provide duplicate copies of data to each site greatly complicates collaboration across geographical areas. Virtualization eliminates this issue. As long as the data resides somewhere on the corporate network, it can be added to the name space of any work group that needs it, and anyone with the security authorization can use it.
How is it different from a clustered file system?
A shared (clustered) file system extends the file system concept by adding a mechanism for concurrency control. It provides each device accessing the file system with a consistent and serializable view of the file system, avoiding corruption and unintended data loss. Such file systems also usually employ some sort of fencing mechanism to prevent data corruption in case of node failures. File system virtualization, by contrast, is about presenting one name space over many file systems rather than coordinating concurrent access to a single one.
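The notes do not say how fencing is implemented, and real clustered file systems often fence at a lower level, for example by cutting a failed node off from the storage. As one illustration of the idea only, the sketch below uses fencing tokens: a lock service hands out an increasing number with each lock grant, and the storage rejects writes that carry an older number. All class and variable names are hypothetical.

```python
class FencedStore:
    """Shared storage that rejects writes from nodes holding a stale token."""

    def __init__(self):
        self._data = {}
        self._highest_token = 0

    def write(self, token, key, value):
        if token < self._highest_token:
            raise PermissionError(f"stale fencing token {token}")
        self._highest_token = token
        self._data[key] = value

    def read(self, key):
        return self._data[key]


class LockService:
    """Hands out a strictly increasing token with every lock grant."""

    def __init__(self):
        self._counter = 0

    def acquire(self):
        self._counter += 1
        return self._counter


locks = LockService()
store = FencedStore()

token_a = locks.acquire()   # node A is granted the lock
token_b = locks.acquire()   # A is presumed failed; node B is granted the lock
store.write(token_b, "file.txt", "written by B")

try:
    # Node A wakes up late and tries to write with its old token.
    store.write(token_a, "file.txt", "late write from A")
except PermissionError as err:
    print("rejected:", err)

print(store.read("file.txt"))
```

The point is that a node which was declared failed cannot corrupt data after another node has taken over, even if it is still running.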
Approaches on implementation
1. Distributed File System (Software)
It is a software-based technology that lives above the native file system and acts as a proxy, presenting users with a common name space for files on different servers. An example is Microsoft DFS. Its advantages are maturity and stability. Its disadvantages are that it is restricted to Microsoft Windows servers and to one kind of file system, offers no de-duplication, has performance issues, and coordinates poorly across geographical areas.
According to MSDN, you should consider implementing DFS if:
- You expect to add file servers or modify file locations.
- Users who access targets are distributed across a site or sites.
- Most users require access to multiple targets.
- Server load balancing could be improved by redistributing targets.
- Users require uninterrupted access to targets.
- Your organization has Web sites for either internal or external use.
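Several of the criteria above (load balancing, uninterrupted access, changing file locations) come down to one logical folder being backed by more than one target. The following is a rough sketch of that idea, not the Microsoft DFS API: the class, the `referral` method and the UNC paths are all invented for illustration.

```python
class NamespaceLink:
    """One logical folder backed by several equivalent targets."""

    def __init__(self, logical_path, targets):
        self.logical_path = logical_path
        self.targets = list(targets)
        self._next = 0

    def referral(self, is_up):
        """Return the next reachable target, rotating to spread load."""
        for offset in range(len(self.targets)):
            index = (self._next + offset) % len(self.targets)
            target = self.targets[index]
            if is_up(target):
                self._next = (index + 1) % len(self.targets)
                return target
        raise LookupError(f"no target reachable for {self.logical_path}")


link = NamespaceLink(
    r"\\corp\public\tools",
    [r"\\server1\tools", r"\\server2\tools", r"\\server3\tools"],
)

# Pretend server2 is down; users should never be sent there.
down = {r"\\server2\tools"}
is_up = lambda target: target not in down

picks = [link.referral(is_up) for _ in range(4)]
print(picks)
```

Users keep using the same logical path while requests rotate over the healthy targets, so a server can be taken out or added without anyone changing their shortcuts.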
2. Global Unified Namespace – GUN (Hardware)
The approach is to put a physical device in the data stream on the network, in front of the NAS and SAN servers. That box becomes the access point for all the end clients. It provides the full benefits of file virtualization across heterogeneous populations of file servers, combining NAS and SAN access, and across any geography. The disadvantages are that the approach is less mature and less stable, fewer vendors support it, and the device is a single point of failure.
With in-band, or symmetric, virtualization, the virtualization function sits in the data path. Interoperability and security are its major merits; on the other hand, performance suffers.
With out-of-band, or asymmetric, virtualization, the virtualization function is typically executed in an appliance that lies outside the data path. The metadata and storage data traverse different paths on the network. In the case of an appliance, the metadata resides on the appliance that’s linked to the host by a separate connection, while the storage data travels across the network.
This approach reduces the data load on the network and is generally easier to scale than in-band virtualization. However, out-of-band virtualization products are typically proprietary, and execution requires installation of client software on the hosts. Out-of-band systems can also be more vulnerable to security threats, such as spoofing.
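The difference between the two modes can be sketched in a few lines. This is a toy model under obvious assumptions (in-memory "servers", invented class names, no networking): the in-band appliance relays every byte, while the out-of-band appliance answers only the "where is it?" question and the client then reads from the storage server directly.

```python
class StorageServer:
    def __init__(self, name, files):
        self.name = name
        self._files = files

    def read(self, path):
        return self._files[path]


class InBandAppliance:
    """Sits in the data path: every byte passes through it."""

    def __init__(self, mapping, servers):
        self._mapping = mapping      # logical path -> (server name, path)
        self._servers = servers
        self.bytes_relayed = 0

    def read(self, logical_path):
        server_name, path = self._mapping[logical_path]
        data = self._servers[server_name].read(path)
        self.bytes_relayed += len(data)
        return data


class OutOfBandAppliance:
    """Holds only metadata: tells the client where the data lives."""

    def __init__(self, mapping):
        self._mapping = mapping
        self.bytes_relayed = 0       # never changes: no file data passes through

    def locate(self, logical_path):
        return self._mapping[logical_path]


servers = {"nas1": StorageServer("nas1", {"/vol/a.txt": b"hello world"})}
mapping = {"/projects/a.txt": ("nas1", "/vol/a.txt")}

# In-band: the client talks only to the appliance.
in_band = InBandAppliance(mapping, servers)
in_band_data = in_band.read("/projects/a.txt")

# Out-of-band: metadata lookup first, then a direct read from storage.
out_of_band = OutOfBandAppliance(mapping)
server_name, path = out_of_band.locate("/projects/a.txt")
out_of_band_data = servers[server_name].read(path)

print(in_band.bytes_relayed, out_of_band.bytes_relayed)
```

The two-step read in the out-of-band case is the part that has to live in client software on each host, which is why that approach needs an agent installed.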
File Area Network (FAN)
FAN is one of several reference architectures (a contradiction in terms, right?) for implementing file system virtualization.
The salient features of a File Area Network are:
1. Both SAN and NAS environments are supported
2. Support for Global Unified Namespace (GUN)
3. File data optimization techniques range anywhere from duplicate data elimination via content addressed storage and commonality factoring to complex inline compression techniques that achieve maximum storage efficiency.
4. File security and DRM services
5. File management services – Quota administration, storage expansion and migration and replication services
6. The end clients could be on any type of platform or computing device.
7. Multiple ways to connect end clients.
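Feature 3 mentions duplicate data elimination via content addressed storage. A minimal sketch of that one technique follows, assuming whole-file de-duplication keyed by a SHA-256 digest; the class name and file names are made up, and commonality factoring and inline compression are not shown.

```python
import hashlib


class ContentAddressedStore:
    """Stores each distinct file body once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}    # digest -> bytes
        self._names = {}    # file name -> digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)   # a duplicate body is not stored again
        self._names[name] = digest
        return digest

    def get(self, name):
        return self._blobs[self._names[name]]

    def blob_count(self):
        return len(self._blobs)

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


cas = ContentAddressedStore()
cas.put("site-a/budget.xls", b"same bytes")
cas.put("site-b/budget-copy.xls", b"same bytes")   # identical content
cas.put("site-a/notes.txt", b"different")

print(cas.blob_count(), cas.stored_bytes())
```

Three file names end up pointing at two stored bodies, which is the storage saving that de-duplication is after.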
What is in the marketplace?
1. Users are confused by the proliferation of vendors whose products all overlap to some degree. File Area Network was developed to overcome this issue, but it has yet to become popular.
2. This technology is limited to unstructured data in file systems. There is scope to extend it to structured data such as e-mail, PDF and calendar data.
3. The virtualization function can be performed on a server attached to the switch. The basic premise here is to move more of the virtualization “intelligence” to the network level. The switch or router resides in the data path between the hosts and the storage network, intercepting commands from the hosts to the storage system(s). The advantage of switch- and router-based virtualization is that you don’t need an agent on each host. Because of the hefty horsepower that resides in many of these systems, they also have the potential to provide better performance than more traditional approaches. Switch- and router-based virtualization also ranks high when it comes to security. On the downside, a switch or router on a storage network can become a single point of failure or a performance bottleneck.
In the next part of this article, let us look at the products available in the market that address the file system virtualization requirement.