
Executive Viewpoint 2017 Prediction: Archive360 – Unleash the Value of Unstructured Data


Every organization faces the mundane challenge of cleaning up old, unstructured data from file shares, servers, desktops and the like.  And let's be honest, this task has been on the "back burner" for years.  With the availability of low-cost Public Cloud storage, I predict that in 2017 this trend will change and more organizations will move unstructured data off their on-premises infrastructure.

At first glance, managing unstructured data does not sound like a major issue, but consider that unstructured content accounts for 90% of all digital information.  Taking into account the cost of enterprise storage and the cost of supporting and maintaining aging document repositories, the proper management of unstructured data has a major impact on the IT budget.

The first step in cleaning up aging and potentially unwanted unstructured data is to analyze it carefully.  Take an inventory of your file shares and note the name, type, age and owner of every file.  Work with departmental business owners to classify data according to its business and legal value.  Data that has no business or legal value should be promptly removed.
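The inventory step above can be sketched in a few lines of Python.  This is a minimal illustration, not a product feature: it walks a directory tree and records the name, type, age and size of each file, which is the raw material for the classification conversation with business owners.  (Owner resolution is platform-specific, so only the numeric UID is captured here.)

```python
import os
import time

def inventory_files(root):
    """Walk a directory tree and record name, type, age, size, and
    owner info for each file (a minimal sketch of the inventory step)."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "age_days": (time.time() - stat.st_mtime) / 86400,
                "size_bytes": stat.st_size,
                # Numeric owner ID; mapping to a user name is
                # platform-specific and omitted here.
                "owner_uid": getattr(stat, "st_uid", None),
            })
    return records
```

A real sweep would also capture access times and deduplicate by content hash, but even this simple listing is enough to start sorting files by age and type.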

Data required for business reference, legal matters, audits and compliance must be preserved. The key question is whether it makes sense to move this data to a more economical storage location.  New archiving applications are emerging that manage unstructured data in the Public Cloud at very low cost and provide essential functions for indexing, search, export, disposition and access control.

2017 will be the year that organizations turn to the Public Cloud to dramatically reduce the cost of managing unstructured data.  In this article we will explore the benefits of the Public Cloud and learn why a next generation archive application is the key to unleashing the value of unstructured data.

Public Cloud

The Public Cloud is the perfect platform for managing unstructured data.  It offers virtually limitless scalability, increased flexibility and the ability to access and share data from any location.  But the most compelling reason is cost.  Amazon, Azure, Google and other leading Public Cloud vendors sell "cold" blob storage for as little as $0.01 – $0.02 per GB per month.  Compared to the total cost of enterprise storage, that can amount to a 50X saving!
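The 50X figure is easy to sanity-check with simple arithmetic.  The cold-storage price comes from the article; the $0.50/GB/month fully loaded cost of enterprise storage (hardware, support, power, administration) is an assumed illustrative figure, not a quoted benchmark.

```python
def monthly_storage_cost(gb, price_per_gb_month):
    """Monthly cost of storing `gb` gigabytes at a flat per-GB rate."""
    return gb * price_per_gb_month

# Illustrative figures only: cold blob storage at $0.01/GB/month (the
# low end quoted above) versus an ASSUMED $0.50/GB/month fully loaded
# cost for on-premises enterprise storage.
data_gb = 100_000  # 100 TB of unstructured data
cloud = monthly_storage_cost(data_gb, 0.01)    # $1,000/month
on_prem = monthly_storage_cost(data_gb, 0.50)  # $50,000/month
savings_factor = on_prem / cloud               # 50x under these assumptions
```

Your own ratio depends entirely on what your enterprise storage actually costs per GB once support and facilities are included.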

Another cost saving that is often overlooked comes from application retirement.  At a minimum, unstructured data rests on network file shares that carry costly annual support and maintenance fees.  Quite often it rests in an aging application repository or on-premises archiving application that has costly support and maintenance fees along with costly server and storage hardware.  Taken together, the total cost of keeping aging network file shares and application servers running on-premises can be shocking!

Next Generation Archiving

To reduce the cost of managing unstructured data, a next generation archiving application is necessary.  The archiving application runs 100% in the cloud and performs as a thin layer on top of the blob storage, providing collection, indexing, search and access control at minimal cost.  It runs on a virtual machine with a SQL database that stores metadata.  Indexing is a critical service for enabling content search and is provisioned "as needed" from the cloud service provider.  And finally, web services, encryption, Active Directory, business analytics and more round out the archiving application.
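The "thin layer" architecture can be sketched as follows.  This is a toy model, not any vendor's implementation: a dict stands in for the cold blob store and an in-memory SQLite table stands in for the SQL metadata database, but the division of labor is the one described above, with content in blobs and searchable metadata in SQL.

```python
import sqlite3
import hashlib

class ArchiveLayer:
    """Toy sketch of an archive as a thin layer over blob storage:
    blobs hold the content, a SQL table holds searchable metadata.
    Stand-ins: a dict for the blob store, in-memory SQLite for the DB."""

    def __init__(self):
        self.blobs = {}  # stand-in for cloud cold blob storage
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE objects (blob_id TEXT PRIMARY KEY, "
            "custodian TEXT, collected_at TEXT, source TEXT)")

    def archive(self, content: bytes, custodian, collected_at, source):
        # Content-addressed write: identical content maps to one blob.
        blob_id = hashlib.sha256(content).hexdigest()
        self.blobs[blob_id] = content
        self.db.execute("INSERT OR IGNORE INTO objects VALUES (?,?,?,?)",
                        (blob_id, custodian, collected_at, source))
        return blob_id

    def find_by_custodian(self, custodian):
        rows = self.db.execute(
            "SELECT blob_id FROM objects WHERE custodian = ?", (custodian,))
        return [r[0] for r in rows]
```

The point of the design is that the expensive part, the content, sits on $0.01/GB storage, while the small metadata table carries all the query load.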

The good news is that all services are consumed on an "as-needed" basis, thereby minimizing cost.  As storage demand scales up, low-cost storage is available automatically with unlimited capacity.  For eDiscovery, compute and indexing services can be scaled up to meet high demand and a tight deadline.  And importantly, when all the "fires" have been extinguished, storage and compute services can be immediately scaled back down to save cost.

Data Collection

Data collection is critical to the success of the archiving application.  It is a mistake to assume that files can simply be copied to the new repository.  The truth is that unstructured data comes in many formats and locations, which requires a more sophisticated approach.

Email data is a good example.  It is critical for business reference, regulatory compliance and legal discovery, and it can be found in on-premises email servers, email archives, journal archives and PST files.  Email collection tools identify active and inactive mailboxes, rehydrate email stubs where present, and migrate email in its original format to the cloud repository.
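The email collection loop described above can be sketched roughly as follows.  Everything here is a hypothetical stand-in, not a real email API: messages are plain dicts, a stub carries a callable that fetches its externalized content, and `archive` is any callable that stores a message.

```python
def collect_mailbox(messages, archive):
    """Sketch of the collection step: walk a mailbox, rehydrate any
    stubbed messages, and hand each message to the archive in its
    original form.  `messages` is a list of dicts and `archive` is a
    callable that stores one message; both are hypothetical stand-ins."""
    migrated = 0
    for msg in messages:
        if msg.get("is_stub"):
            # A stub points at content stored elsewhere (e.g. a legacy
            # archive); fetch the full body before migration so the
            # archived copy is self-contained.
            msg = {**msg, "body": msg["stub_target"](), "is_stub": False}
        archive(msg)
        migrated += 1
    return migrated
```

A real tool would also handle attachments, journal envelopes and PST extraction, but the stub-rehydration step shown here is the part most often missed when teams assume a plain file copy will do.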

SharePoint data is another good example.  SharePoint sites have undoubtedly spread throughout your organization, creating silos of information.  Much of this information is old and obsolete yet still consumes valuable enterprise storage.  For SharePoint, the collection process provides tools to discover sites and migrate content.

The location of unstructured data creates another challenge for collection.  Do you have access to the file shares, desktops and laptops where data could be hiding?  Are you aware of the business needs of each location?  Unstructured data that belongs to HR and Finance will require special handling compared with Marketing, for example.

The process of managing unstructured data begins with the analysis, identification and classification of data.  Only then can proper decisions be made about whether to delete data or preserve it in a cloud archive.

Indexing and Search

Collecting unstructured data into the cloud repository may be all that is required, provided the archiving application preserves metadata such as custodian and timestamp for each object it collects.  But you will likely also want to search the content, and before data can be searched it must be indexed.  This is a very important function to consider.

Do you want to index all the content, or only search subsets of it?  The major issue to consider is cost.  Indexing consumes compute and storage resources, and depending on the amount of content this can be very expensive.  It is better to index only the data you intend to search and conserve compute and storage cost.  A simple example is to index all content for specific custodians over a specific period of time.
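The custodian-and-date example above amounts to a simple filter applied before anything is handed to the indexer.  The sketch below assumes objects carry `custodian` and `date` metadata fields; that shape is an assumption for illustration, not any specific product's schema.

```python
from datetime import date

def select_for_indexing(objects, custodians, start, end):
    """Pick only the objects worth indexing: those belonging to the
    named custodians and dated within [start, end].  Indexing anything
    else burns compute and storage on content no one will search.
    `objects` is a list of dicts with 'custodian' and 'date' keys
    (an assumed metadata shape for illustration)."""
    return [o for o in objects
            if o["custodian"] in custodians and start <= o["date"] <= end]
```

Because the filter runs on cheap metadata rather than full content, narrowing the index this way directly cuts the compute and index-storage bill while leaving the raw blobs intact for later, broader indexing if a matter requires it.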

With search come useful features like hit highlighting, legal holds, tagging, saving and exporting.  A lightweight case management application helps organize multiple searches by matter and allows for easy access control.  Full auditing and reporting provide easy access to reports for audits and legal chain-of-custody evidence.  And automated disposition makes it easy to adhere to retention rules without incurring management overhead.


In 2017, archiving will emerge as a "killer" application for the Public Cloud.  The Public Cloud is the perfect platform for next generation archive applications that leverage its low-cost blob storage and services.  Compared to typical enterprise storage, storage cost savings of 50X can be achieved, and aging on-premises applications can be decommissioned, reducing support and maintenance costs.

The archive application performs as a thin layer on top of cold blob storage and provides the services your business requires for collection, search, indexing and access control.  It runs 100% in the Public Cloud and consumes cloud services on an "as-needed" basis, thereby optimizing cost.

If your organization is currently spending hundreds of thousands of dollars managing aging, unwanted unstructured data, then 2017 is the right time to begin shopping for a next generation archive application.

