
We Need a Storage Revolution

 

Submitted by Stephen Foskett on July 2nd, 2008

Many discussions in The Future of Storage have focused on the relative merits of one protocol or another, but I have been pleased to see a few touch on the core issue at hand: we continue to patch together a system based on outdated concepts. Most storage protocols continue to mimic direct-attached storage, and most of our so-called networks act as point-to-point channels. An ultra-modern virtualized storage infrastructure with all the latest bells and whistles still holds the concepts of block and file at its core. Whenever the storage industry has tried to bring about real storage management, it has been stymied by a lack of context for data. No amount of virtualization, and no new protocol, will fix this. Put simply, we need a storage revolution.

Channels, Blocks, and Files

Most innovation in the 1980s and early 1990s focused on moving storage out of the server. SCSI allowed disks to exist in a separate cabinet, RAID allowed multiple physical disks to become a virtual one, and the two were combined to become the prototype storage array. Although SCSI allowed one-to-many connectivity, it was never a true peer-to-peer network, even once it was mixed with network concepts in the form of Fibre Channel.

Even today, SAN storage is focused on providing faster, more flexible, and feature-packed direct-attached storage. A modern virtual SAN hides a complex arrangement of caching, data protection, tiered storage, replication, and deduplication, masquerading the lot as a simple, lowly disk drive. It is sad but true that all of our work as an industry has been dedicated to recreating what we started with.

Networked file-based storage is no better. Although NAS devices have all the advanced features of their SAN cousins, they must present a simple file tree to the host to retain compatibility. File virtualization merely presents a larger homogeneous tree.

Inside the server, too, features and complexity are hidden to retain a familiar file system format. Volume managers can do anything a virtualization device can, but must present their output as a simple (though virtual) disk drive. File systems, too, have added features but still present a familiar tree of mount points, inodes, and files. Even ZFS, possibly the most advanced combination of volume management and file system technology yet, must present a simple tree of storage to applications.

The Metadata Roadblock

This outdated paradigm, of disks and file trees, is ill-suited to today’s storage challenges. Data must be categorized so actions can be taken to preserve or destroy it based on policies. Data must be searchable so users and applications can find what they want. Data must be flexible so it can be used in new ways. Our antiquated notions are not capable of meeting these challenges.

One simple problem is that we lack context for our data. Most file systems merely assign to a file a name, location, owner, and security attributes. The most advanced can contain extended metadata, but this is rarely seen in practice since many applications cannot agree on how to use this data. Microsoft’s Office suite can store and share extended file attributes, for example, but these live inside the file rather than in the file system. The promise of expanded Office attributes is only realized in conjunction with a content management system like SharePoint which lies above the lowly file system.

What if the storage system could keep this data instead? What if it could logically group files according to project or client, mining keywords and authors, and maintaining revisions? These concepts are not new: they have been implemented in content management systems for years, and certain elements have appeared in file systems, like Apple’s HFS and VMS’ Files-11, for decades.

Cut Down the Tree

File metadata would allow advanced features, but truly taking advantage of them requires a more fundamental shift in the way applications access files. Rather than sticking to a traditional hierarchy of directories in a tree (which was, after all, simply a primitive metadata system), we should remove the tree altogether. Allow files to become data objects, identified by arbitrary attributes and managed according to an overarching policy.
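To make the idea concrete, here is a minimal sketch of a store with no tree at all, where objects are found by arbitrary attributes and kept or discarded by a policy. Everything in it is a hypothetical illustration rather than any real product's interface: the `ObjectStore` and `DataObject` names, the `expires` attribute, and the `retain_until_expiry` policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataObject:
    """A stored item identified by its attributes rather than by a path."""
    payload: bytes
    attributes: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store (hypothetical): no directories, only attribute queries."""

    def __init__(self):
        self._objects = []

    def put(self, payload, **attributes):
        """Store a payload together with whatever attributes the caller supplies."""
        obj = DataObject(payload, dict(attributes))
        self._objects.append(obj)
        return obj

    def find(self, **criteria):
        """Return every object whose attributes match all of the criteria."""
        return [
            obj for obj in self._objects
            if all(obj.attributes.get(k) == v for k, v in criteria.items())
        ]

    def apply_policy(self, policy):
        """Keep only objects the policy approves; return how many were removed."""
        kept = [obj for obj in self._objects if policy(obj)]
        removed = len(self._objects) - len(kept)
        self._objects = kept
        return removed


def retain_until_expiry(today):
    """Hypothetical retention policy: drop objects whose 'expires' date has passed."""
    def policy(obj):
        expires = obj.attributes.get("expires")
        return expires is None or expires >= today
    return policy


# Example usage: the attribute names here are arbitrary, which is the point.
store = ObjectStore()
store.put(b"Q2 budget", client="acme", project="audit", expires=date(2008, 12, 31))
store.put(b"draft memo", client="acme", project="merger")
store.put(b"old log", client="initech", expires=date(2008, 1, 1))

acme_files = store.find(client="acme")
removed = store.apply_policy(retain_until_expiry(date(2008, 7, 2)))
```

The same object can be reached by client, by project, or by any other attribute, and the retention decision is made from the metadata alone, without knowing where the object "lives".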

This future vision is decidedly different from our current notion of storage, but is not so far off. Many organizations now rely on central data warehouses based on SQL-language relational databases. As many storage managers have grumbled, databases tend to ignore storage management concepts entirely, managing their own content independently.

But not all applications need a database back-end, so another initiative seeks to provide generic object storage for wider use. Called content-addressable storage or CAS, these devices have traditionally been used only for archival purposes, since that was their first market application. As vendors break free of proprietary interfaces in favor of open ones like XAM, CAS could transform storage itself by eliminating both file and block storage at once.
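The core idea of content addressing can be sketched in a few lines: the address of an object is derived from its content, so there is no file name, path, or block number to manage. This is only an illustration under stated assumptions; the `ContentAddressableStore` class is hypothetical, the choice of SHA-256 is an assumption, and it does not represent the XAM interface or any vendor's implementation.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical): objects are addressed by a digest of their content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address. SHA-256 is an assumed choice of digest."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so it is stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve data by address, verifying that it still matches its digest."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self):
        return len(self._blobs)


# Example usage: storing the same content twice yields one object and one address.
cas = ContentAddressableStore()
first = cas.put(b"contract, signed 2008-07-02")
second = cas.put(b"contract, signed 2008-07-02")
```

Because the address depends only on the content, duplicates collapse automatically and any later change to the stored bytes is detectable on read, which goes some way to explaining why archiving was a natural first market.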

Similar concepts are already at work in the so-called Web 2.0 world. Non-traditional data stores like Google Bigtable, Amazon S3, and Hadoop allow massive scalability for object storage. API-sharing initiatives among many Web 2.0 companies can be seen as similar prototypical object storage frameworks. Any of these could be leveraged to provide a new world of data storage, and many are gaining traction even now.

Although traditional block storage is here to stay for disk drives, and tree-type file systems are likely to remain the foundation of operating system storage, new object-based concepts could change the world in fundamental ways. As applications become “web aware”, they also become object aware, increasing the likelihood of such a storage revolution. For the majority of applications, this new world would be a welcome one indeed.

 

3 Responses to “We Need a Storage Revolution”

  1. Pana Says:

    User-taggable file data, heat mapping, and associative network rings; see mind mapping software. This is what has to happen to storage of data: http://www.thebrain.com

    IMHO associative visualization, and the ideas of networks of association, will cross-pollinate many disciplines in the future.

    Also, concepts from the networking world need to be brought in, like TTL (time to live). Treating files as if they are giant packets that are temporarily stored on disks would go a long way towards automating and managing data when done intelligently.

    You can’t hope to sort files in any standardized way because data is constantly changing and being updated like a language, new types of media and content are created and new things are discovered.

    Files are basically giant packets. They exist in a circuit (input/output), so they should have a “time to live”, or a time to consider before deletion or archiving, while hot files that are frequently read and written stay near the speedier core. Files that are not used that often, but are used at regular periodic intervals, are moved towards the center as the near-static periodic time draws near.

    See an example of a heatmap here: http://blogoscoped.com/click2/

  2. Joseph Hunkins Says:

    An excellent “big insight”, Stephen. There seem to be a decreasing number of reasons to confine thinking to modifying things along the traditional lines. Rather, it may be time to bite the bullet and work towards a lot more implementations and storage solutions that use the blossoming cloud computing infrastructure.

  3. The Mental Pundit Says:

    Interesting assessment of technology old and new. I am not sure, however, that I really agree with the breakdown.

    The article seems to my mind to make an apples and oranges comparison. It is comparing how data is physically stored to how data is logically addressed.

    Physically data resides on a block on a disk somewhere (as you mentioned). Logically it resides on a filesystem / database / object repository.

    So comparing Object storage to blocks/disk/filesystem is clearly misleading.

    Object Storage systems can be, and in fact currently all are, built on blocks/disks. What’s more, there is absolutely no reason why an Object Storage system can’t be built on top of a filesystem. You could, for instance, use ZFS as the system you use to manage all the underlying disks and do all the block allocations etc., and build your object system on top of ZFS. I would suspect that most Object Storage systems probably write to the disk directly, but they don’t need to.

    “Although traditional block storage is here to stay for disk drives” – this isn’t 100% correct either. Next year you may see disk drives that utilise object address commands at the SCSI level. (do a google on “SCSI Object-Based Storage Device Commands (OSD)”). And just because the hardware is object storage, it doesn’t mean that it can’t be used to create a hierarchical traditional tree filesystem. There are lots of benefits of transferring a lot of intelligence into the disk even in a conventional filesystem.

    Lastly, accessing everything through a searchable / uniquely tagged object storage system isn’t necessarily a panacea. Sure, it’s great when things are nicely tagged, but you are assuming an element of order that people don’t possess and applications are inadequate to provide. A tree filesystem, for all its flaws, provides a browsable method of navigating your way to your data, i.e. home / picture / digital camera / freds wedding. Tree structures force you to re-organise when the number of files in a directory gets too big, forcing you to create more sub-classifications (i.e. directories) and move files around. How is that going to work with data stored solely through a metadata system? Are you going to go through and add more tags / classifications to everything?

    Object-based systems are going to be part of the future, but they have some limitations also. Maybe the answer is a hybrid system breakdown.

    The Mental Pundit
    /tmp
