Tuesday, Jan 16, 2018

Executive Viewpoint 2018 Prediction: Cloudian

AI/ML/DL Will Only Be as Successful as the Storage Technologies That Support Them

The effectiveness of artificial intelligence (AI) will depend on how well businesses build their storage infrastructures. Companies that develop well-architected systems with built-in scalability, a plan for growth, and easy management of data will thrive with AI; companies that treat storage as an afterthought will lag behind.

Consider the human brain as a metaphor. If AI or machine learning (ML) were brain functions, they’d probably take place in the cerebrum, the center of higher brain function. But the cerebrum can do nothing without the cerebral cortex and various components of the limbic system, which oversee the organization and retention of memory. Without the memories (and their associated data), the cerebrum has nothing to work with. It can’t recognize patterns, and it can’t adapt those patterns in response to new data if that data isn’t processed or stored efficiently.

In the same way, the most advanced AI, ML and deep learning (DL) applications will be unable to advance past their infancy without access to a very large and ever-growing data set. Storage also must be highly accessible; otherwise, the AI application will be unable to get enough data to deliver precise, up-to-date suggestions in a timely fashion. That’s why an emerging set of object storage APIs that optimize AI/ML workflows will be so important. One example is broader support for “streaming” input, which is common in AI/ML workflows; integrating Kafka or other streaming APIs with object storage opens up many new applications. Another example is data locality, meaning the specific locations where data resides within a system, which has implications for meeting emerging data security standards. APIs that let users filter data by criteria, such as by time or by metadata value, are also important; object storage APIs will need to expand their query capabilities to make processing by applications easier.
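To make the filtering idea concrete, here is a minimal sketch of selecting objects by time and by metadata value. It is purely illustrative: `StoredObject`, `filter_objects`, and the sample catalog are hypothetical names invented for this example, the store is an in-memory list, and nothing here represents the API of any particular object storage product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class StoredObject:
    """Hypothetical record describing one object held in a bucket."""
    key: str
    last_modified: datetime
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_objects(
    objects: Iterable[StoredObject],
    modified_after: Optional[datetime] = None,
    metadata_equals: Optional[Dict[str, str]] = None,
) -> Iterator[StoredObject]:
    """Yield objects newer than `modified_after` whose metadata matches every given pair."""
    wanted = metadata_equals or {}
    for obj in objects:
        # Time filter: skip objects modified at or before the cutoff.
        if modified_after is not None and obj.last_modified <= modified_after:
            continue
        # Metadata filter: every requested key must be present with the requested value.
        if any(obj.metadata.get(k) != v for k, v in wanted.items()):
            continue
        yield obj


# Assumed sample data standing in for a bucket listing.
UTC = timezone.utc
CATALOG = [
    StoredObject("frames/0001.jpg", datetime(2017, 12, 1, tzinfo=UTC), {"label": "cat"}),
    StoredObject("frames/0002.jpg", datetime(2018, 1, 5, tzinfo=UTC), {"label": "dog"}),
    StoredObject("frames/0003.jpg", datetime(2018, 1, 10, tzinfo=UTC), {"label": "cat"}),
]

recent_cats = [
    obj.key
    for obj in filter_objects(
        CATALOG,
        modified_after=datetime(2018, 1, 1, tzinfo=UTC),
        metadata_equals={"label": "cat"},
    )
]
print(recent_cats)
```

In this sketch the filtering happens on the client side; the point made above is that pushing such queries into the storage API itself would spare applications from listing and scanning everything.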

Businesses crave the wide-ranging benefits of versatile AI that can make decisions based on a broad set of data from multiple sources. To deliver that, AI needs a “memory” that is accessible, easily managed and scalable.

The brain’s function isn’t dependent on budget, and nobody has to weigh learning more against the expense of housing the resulting memories. But businesses do. It makes no sense to deploy an AI solution if its benefits are cancelled out by the expense of managing the storage of the data on which it is dependent.


  • Tim Wessels / December 21, 2017

    Well, I agree with Gary Ogasawara that any computer system that is purporting to do AI, ML or DL will need deep and cheap data storage to facilitate the process. The human brain may have up to 2.5PB of raw storage capacity, but it is not uniformly fast storage. It can take variable amounts of time to recall some data. But to answer a specific question or recommend a course of action, the brain appears to need just the data relevant to answering the question or recommending a course of action. It doesn’t need all of the data stored in the brain, just the relevant data. How does it get it quickly enough?

    The IBM Watson computer is a voracious consumer of data on many subjects, yet the size of Watson’s computational brain has decreased to the dimensions of three stacked pizza boxes. But like humans, IBM Watson needs only the data relevant to answer a particular question or recommend a course of action. It still needs capacity storage for data ingest and archiving, but it only needs the data relevant to answering a question or recommending a course of action to reside in its computational brain. The trick is getting the relevant data into its fast machine memory quickly enough to answer a question in a reasonable amount of time. The solution involves IBM Watson’s use of Unstructured Information Management Architecture (UIMA). IBM developed UIMA, and Apache UIMA is an Apache Software Foundation project.

    Both humans and machines need good search and analysis capabilities to find relevant unstructured data, along with data paths fast enough to use it to answer questions or recommend a course of action.