Seven out of ten executives whose companies had invested in artificial intelligence reported minimal or no impact from those investments, according to a 2019 research report from MIT. This isn’t because the technology isn’t there; machine learning is being applied successfully in a multitude of contexts all over the world. Two of the main factors contributing to the high failure rate of enterprise AI projects are inadequate data infrastructure and talent scarcity.
Finding the right data for a problem takes a long time, and preparing that data to feed it to a model is the most time-intensive part of the machine learning pipeline, averaging 80% of a data scientist’s time. Data scientists are rare and expensive, and many organizations interested in implementing AI aren’t hiring enough of them or giving them the infrastructure they need to make enterprise AI succeed.
But it doesn’t have to be this way. Instead of being another one of the failures, your project could become part of the 30 percent that succeeds, and you might not even need to hire more data scientists to do it. You can modernize your data infrastructure, cut down the time spent on data prep and successfully enable enterprise-wide AI, if you integrate one simple thing: a feature store.
What is a feature store?
A feature store is a repository of features, feature sets and feature values, along with their history. A set of services interacts with this repository: defining features, searching for features, retrieving the current value of features, associating metadata with those features, defining a training set from groups of features and backfilling new features into training sets. In some implementations, feature stores have user interfaces that call those services; in others, the services are exposed only as APIs.
In practice, a feature store automates the input, tracking and governance of data into machine learning models. Feature stores compute and store features, allowing them to be registered, discovered, used and shared across a company. A feature store makes sure features are always up to date for predictions and maintains the history of each feature’s values in a consistent manner, so that models can be trained and re-trained.
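To make the idea of "current value plus history" concrete, here is a minimal sketch in Python. It is not the API of any particular product: the class `TinyFeatureStore`, its method names and the integer timestamps are all assumptions made for illustration. It shows how keeping every value ever written lets the same store answer both "what is the value now?" (for predictions) and "what was the value at time t?" (for building training sets).

```python
from bisect import bisect_right
from collections import defaultdict


class TinyFeatureStore:
    """Hypothetical in-memory store that keeps the full history of every feature value."""

    def __init__(self):
        self.metadata = {}                # feature name -> human-readable description
        self._times = defaultdict(list)   # (feature, entity) -> sorted timestamps
        self._values = defaultdict(list)  # (feature, entity) -> values, same order

    def define(self, name, description):
        """Register a feature and its metadata before any values are written."""
        self.metadata[name] = description

    def write(self, name, entity_id, timestamp, value):
        """Record a value, keeping the history sorted by timestamp."""
        if name not in self.metadata:
            raise KeyError(f"feature {name!r} has not been defined")
        key = (name, entity_id)
        idx = bisect_right(self._times[key], timestamp)
        self._times[key].insert(idx, timestamp)
        self._values[key].insert(idx, value)

    def latest(self, name, entity_id):
        """Current value, as a model would need it at prediction time."""
        values = self._values.get((name, entity_id))
        return values[-1] if values else None

    def as_of(self, name, entity_id, timestamp):
        """Value that was in effect at `timestamp`, as needed for training sets."""
        key = (name, entity_id)
        idx = bisect_right(self._times.get(key, []), timestamp)
        return self._values[key][idx - 1] if idx else None


store = TinyFeatureStore()
store.define("avg_spend_30d", "Average daily spend over the last 30 days")
store.write("avg_spend_30d", "cust_1", timestamp=1, value=40.0)
store.write("avg_spend_30d", "cust_1", timestamp=5, value=55.0)

current = store.latest("avg_spend_30d", "cust_1")                  # newest value
historical = store.as_of("avg_spend_30d", "cust_1", timestamp=3)   # value in effect at t=3
```

Looking values up "as of" a past timestamp is what keeps a training set from accidentally including information that was not yet known when the labeled event happened.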
What does a feature store do?
Feature stores take the most mundane, tedious and time-intensive data tasks out of the equation so data scientists can shift their focus from rote data plumbing to model building and experimentation.
Feature stores manage data pipelines that transform raw data into feature values. These can be scheduled pipelines that aggregate petabytes of data at a time (like calculating the average 30-, 60- and 90-day spending amounts of each customer of a large retailer), or real-time pipelines that are triggered by events and update feature values instantly (such as updating the sum total of today’s spending for a particular customer every time they swipe their credit card).
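The two pipeline styles described above can be sketched side by side. This is an illustration under stated assumptions, not a production design: the function `average_daily_spend`, the class `TodaySpend` and the `(customer_id, day, amount)` transaction layout are hypothetical, and "average spending" is interpreted here as total spend divided by the number of days in the window.

```python
from collections import defaultdict
from datetime import date, timedelta


def average_daily_spend(transactions, customer_id, as_of, window_days):
    """Scheduled/batch style: scan stored transactions and aggregate over a window.

    `transactions` is an iterable of (customer_id, day, amount) tuples (assumed layout).
    The window covers `window_days` calendar days ending on `as_of`, inclusive.
    """
    start = as_of - timedelta(days=window_days - 1)
    total = sum(
        amount
        for cust, day, amount in transactions
        if cust == customer_id and start <= day <= as_of
    )
    return total / window_days


class TodaySpend:
    """Event-driven style: update a running total each time a swipe event arrives."""

    def __init__(self):
        self._totals = defaultdict(float)  # (customer_id, day) -> running sum

    def on_swipe(self, customer_id, day, amount):
        """Apply one card-swipe event and return the updated feature value."""
        self._totals[(customer_id, day)] += amount
        return self._totals[(customer_id, day)]

    def value(self, customer_id, day):
        """Read the current total without modifying it."""
        return self._totals.get((customer_id, day), 0.0)


history = [
    ("c1", date(2024, 1, 1), 60.0),
    ("c1", date(2024, 1, 30), 30.0),
    ("c2", date(2024, 1, 30), 99.0),
]
today = date(2024, 1, 30)
avg_30 = average_daily_spend(history, "c1", today, window_days=30)
avg_15 = average_daily_spend(history, "c1", today, window_days=15)

live = TodaySpend()
live.on_swipe("c1", today, 12.5)
running_total = live.on_swipe("c1", today, 7.5)
```

The batch function recomputes from stored history on a schedule, while the event-driven class does a small constant amount of work per swipe, which is what makes instant updates feasible.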
Feature stores serve a single vector of features made up of the freshest feature values to machine learning models. For example, if an application wants to recommend a particular product to a user, the model may need to know the average amount that user has spent in a particular spending category as well as the total length of time spent shopping in the last 48 hours. The feature store makes the most up-to-date values for those metrics available to the model in milliseconds, so the application does not have to run a data pipeline to calculate them at prediction time.
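A minimal sketch of that serving step follows, assuming the feature values have already been computed and written to a key-value structure. The names `ONLINE_VALUES`, `FEATURES` and `get_feature_vector`, as well as the sample numbers, are hypothetical; a real deployment would typically back this lookup with a low-latency database rather than a Python dictionary.

```python
# Precomputed feature values keyed by entity, as pipelines would have written them.
# (Illustrative data only.)
ONLINE_VALUES = {
    "user_42": {
        "avg_spend_in_category": 37.5,
        "minutes_shopping_48h": 95,
    },
}

# The model expects its inputs in a fixed order, so the order is declared once.
FEATURES = ["avg_spend_in_category", "minutes_shopping_48h"]


def get_feature_vector(online_values, entity_id, feature_names):
    """Assemble one feature vector by lookup only; nothing is recomputed here.

    Unknown entities or missing features yield None in that position.
    """
    row = online_values.get(entity_id, {})
    return [row.get(name) for name in feature_names]


vector = get_feature_vector(ONLINE_VALUES, "user_42", FEATURES)
unknown = get_feature_vector(ONLINE_VALUES, "user_999", FEATURES)
```

Because the request path is a lookup rather than a computation, its latency does not depend on how much raw data went into each feature.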
A feature store makes searching through available features and feature definitions simple and straightforward. It exposes APIs and UIs that let data scientists see the currently available features, pipelines and training datasets, whether they are used in production models or still under development. Data scientists can then pick and choose the features needed for their use case and easily incorporate them into models.
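As a rough illustration of what such a search service might do, the sketch below filters a small registry of feature definitions by keyword and lifecycle status. The `FeatureDefinition` fields, the `search_features` function and the sample entries are assumptions made for this example, not the schema of any specific feature store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical registry entry describing one feature."""
    name: str
    description: str
    status: str        # "production" or "development"
    tags: tuple = ()


def search_features(registry, keyword, status=None):
    """Return names of features whose name, description or tags contain `keyword`.

    Matching is case-insensitive; `status` optionally restricts the results.
    """
    needle = keyword.lower()
    matches = []
    for feature in registry:
        haystacks = (feature.name, feature.description) + tuple(feature.tags)
        text_match = any(needle in text.lower() for text in haystacks)
        status_match = status is None or feature.status == status
        if text_match and status_match:
            matches.append(feature.name)
    return matches


REGISTRY = [
    FeatureDefinition("avg_spend_30d", "Average daily spend, 30-day window",
                      "production", ("spending", "customer")),
    FeatureDefinition("avg_spend_90d", "Average daily spend, 90-day window",
                      "development", ("spending", "customer")),
    FeatureDefinition("minutes_shopping_48h", "Time spent shopping in the last 48 hours",
                      "production", ("engagement",)),
]

all_spend = search_features(REGISTRY, "spend")
prod_spend = search_features(REGISTRY, "spend", status="production")
```

Exposing status alongside the definition lets a data scientist tell at a glance whether a feature is already relied on by a production model or is still experimental.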
Feature stores add a much-needed structure to data engineering, allowing for reproducibility and explainability across the different teams in an enterprise. Providing a single consistent repository for features ensures that features throughout an enterprise will be calculated in the same way, so there’s no ambiguity in feature definitions across models. In addition, having access to pre-calculated features dramatically cuts down the amount of time data scientists spend on feature engineering, the single most time-intensive step of a model pipeline. Data scientists now have the time to focus on other things, which allows you to scale AI experimentation to the entire enterprise with only a handful of data scientists.
In summary …
Machine learning will only become more important. A model is only as good as the features it’s been fed, which means your company’s AI is only as good as the data you have. The quality of that data is limited by your data infrastructure and the bandwidth of your data scientists. If you want your technological investments to pay off, building a feature store is one of the best ways to optimize your machine learning operations and improve the odds that your enterprise AI is among the successful 30 percent.
Monte Zweben is CEO of Splice Machine and has been featured in eWEEK eSPEAKS.