I attended GigaOm’s Structure Data conference last year, when the event was all about the promise of Hadoop, big data and unstructured data, which was somewhat ironic for a conference with “structure” in its title.
This year’s event, March 19-20, was more about the deliverables in the form of case studies, user experiences and the limits of what to realistically expect from your big plans for big data. The change from promise to reality is welcome. Here are my five top takeaways from the first day of the event.
1. Big data and the deployment of Hadoop-type infrastructures are as much about process as they are about technology. “Hadumping” was one bit of slang making the rounds to describe customers filling their big data lakes with a bunch of data and then trying to figure out what to do with it. The term came from Colin Coleman, a former rocket scientist and now an analytics executive at Turner Broadcasting.
He was referring to the temptation to set up a Hadoop-based data infrastructure and then dump all manner of data into the system without much planning around whether you need the data, how you will extract what you need and how you will analyze the data after it’s extracted.
Old-time business intelligence techies take note: Your talents are much needed, just not on the platforms where you learned your trade. The better case studies—including a remarkable privacy initiative from MetLife and Ford’s plans to use open source to allow developers to create new applications based on masses of car data—were built around developers getting top-down approval to rethink how their companies use data and then getting the freedom to operate outside the usual new product strictures.
2. Hadoop is still not that easy to implement. In discussions with MetaScale and in a presentation by Alpine Data Labs, the focus was on removing, or at least masking, the complexity and making unstructured data easier to accumulate, integrate and query for the business executives who need answers.
Hadoop has—somewhat unfortunately—acquired the aura of being a magical term that can fulfill all your data needs. This year it’s clear that Hadoop and its associated modules are quickly evolving into a platform that holds a lot of appeal to customers but still requires the attributes that platforms need to work successfully in the enterprise.
Security, easy-to-learn tools and hooks into existing corporate systems are all evolving, but are not totally baked at this point. “[Hadoop is] going to break away from the realm of science projects, and start producing valuable insights and analytics that are actually operational,” said Steve Hillion, vice president of product at Alpine Data Labs.
For that breakaway to happen, the complexity of Hadoop (and the data processing engine Spark) must recede into the background and gain an overlay that’s accessible to techies and business executives eager to use the power of aggregated data sources.
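For readers wondering what such an overlay can look like in practice, the sketch below is a minimal, hypothetical illustration, not drawn from any vendor at the event: Spark SQL exposing a raw JSON clickstream file to plain SQL queries so an analyst never touches low-level MapReduce or RDD code. The file path and field names are assumptions for illustration only.

```python
# A rough sketch (illustrative only) of an "overlay" on Spark:
# querying semi-structured data with familiar SQL instead of low-level code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("overlay-sketch").getOrCreate()

# Load semi-structured JSON straight from storage (HDFS, S3 or local disk).
clicks = spark.read.json("clickstream.json")  # hypothetical dataset

# Register the data as a SQL view so it can be queried with plain SQL.
clicks.createOrReplaceTempView("clicks")

top_pages = spark.sql("""
    SELECT page, COUNT(*) AS visits
    FROM clicks
    GROUP BY page
    ORDER BY visits DESC
    LIMIT 10
""")
top_pages.show()

spark.stop()
```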
3. Big data challenges the big vendors. The traditional database, business intelligence and infrastructure vendors are challenged by open-source software running on commodity hardware and sold as a subscription. This is not a new phenomenon, but the struggles that companies such as Oracle and IBM are having in making the cloud transition are becoming acute.
“Big companies are very incented to fracture the market,” said Robert Bearden, CEO of Hortonworks. He explained that a fractured market gives traditional vendors more time to transition their sales and marketing teams. Enterprise customers, meanwhile, want a cohesive market free of proprietary hooks.
The arguments and concerns about the fragmentation of Hadoop distributions are real, and the possibility of getting caught by proprietary hooks you aren’t aware of is a legitimate concern for CIOs. Traditional vendors restructuring their entire sales, marketing and development organizations to adjust to the open-source reality will be the enterprise story of 2014.
4. The Internet of things may not be a Hadoop thing. In a presentation from SpaceCurve, a company still in beta, CTO Andrew Rogers contended that a new type of platform will be required for real-time data streaming within and from the Internet of things.
The sheer volume of data, the speed with which it will flow into corporate networks and the need to analyze it on the fly argue against traditional batch and transaction processing systems as well as newer models such as Hadoop. Rogers has a company to promote, but he also has a point about what happens when you try to match the amount of data heading toward corporations in a sensor-based world with current systems aimed at accumulating and managing mostly human-generated data.
5. Legacy systems aren’t going away, but they aren’t the future. No one at the GigaOm event was advocating tossing out all the relational systems and business intelligence platforms already in place at many enterprises. However, the future lies in incorporating those systems into new platforms rather than trying to reconfigure those legacy pieces to do something for which they were never intended.
Eric Lundquist is a technology analyst at Ziff Brothers Investments, a private investment firm. Lundquist, who was editor-in-chief at eWEEK (previously PC WEEK) from 1996-2008, authored this article for eWEEK to share his thoughts on technology, products and services. No investment advice is offered in this article. All duties are disclaimed. Lundquist works separately for a private investment firm, which may at any time invest in companies whose products are discussed in this article and no disclosure of securities transactions will be made.