NEW YORK—Informatica, a provider of data integration software, has announced a new streaming data collection technology to plug Hadoop into the Internet of things.
The new technology, known as Informatica Vibe Data Stream for Machine Data, is designed to simplify the collection of machine data from many sources and its delivery to Hadoop and a wide range of other targets, across any geographic boundary, the company said.
Informatica made the announcement at the O’Reilly Strata Conference and Hadoop World event here.
Informatica Vibe Data Stream for Machine Data provides high-volume, high-velocity, high-scale streaming data collection and is built on Informatica’s high-performance messaging technology. By collecting data that was previously difficult to access, customers deploying the software will be able to deliver holistic operational intelligence across the enterprise, the company said.
“Vibe Data Stream provides efficient, high-performance streaming data collection from sensors, mobile devices, log files, and generally machine data from the Internet of things,” said Ash Kulkarni, senior vice president and general manager of data integration at Informatica, in a statement. “This technology easily delivers previously difficult to collect data directly to targets in real time, enabling companies to find deeper insights in this type of machine-generated big data.”
Informatica Vibe Data Stream for Machine Data is a key new component of Informatica’s end-to-end data integration platform, which is powered by Informatica Vibe, the company’s embeddable virtual data machine (VDM). The product uses embeddable Vibe agents to collect data in real time and stream millions of records per second into big data platforms such as Hadoop and Cassandra. It also streams data directly into Informatica PowerCenter and Informatica CEP, enabling real-time event processing and analytics.
“At Cloudera, we’re seeing customers across industries come to depend on Hadoop-based platforms to create enterprise data hubs for big data projects,” Charles Zedlewski, vice president of products at Cloudera, said in a statement. “This new offering from Informatica will help organizations capture the value of all their data with an easy-to-use, powerful solution for stream data processing on Hadoop.”
Many companies across industries need to perform high-volume, high-velocity, high-scale streaming data collection efficiently over LAN and WAN environments to enable real-time big data analytics and operational intelligence.
However, current architectures for this kind of collection require large amounts of infrastructure, including servers and storage, as well as a high level of software development expertise. As a result, the capability remains beyond the reach of many companies, Informatica officials said. They also argued that competing solutions require additional coding or development to approximate the capabilities of Vibe Data Stream for Machine Data, and that those solutions typically lack its performance, reliability, efficiency and ease of use.
Informatica Plugs Hadoop Into the Internet of Things
Informatica’s Vibe Data Stream for Machine Data is designed to meet this challenge by making reliable, high-throughput streaming data collection available to a broad range of companies. A centralized interface simplifies setup, deployment, administration and monitoring, and flexible configurations can be created for a variety of source-to-target patterns.
“Informatica’s Vibe Data Stream for Machine Data is an important addition to the modern data architecture,” Shaun Connolly, vice president of corporate strategy at Hortonworks, said in a statement. “Vibe Data Stream for Machine Data ensures timely data delivery between Hadoop and the rest of the enterprise by providing reliable, high-volume stream data collection and support for large data distributions with high throughput and concurrency.”
Customers can take advantage of Informatica Vibe Data Stream for Machine Data technology by deploying data collectors or Vibe agents on various sources, which then provide streaming data collection and distribution through a high-performance messaging bus based on Informatica Ultra Messaging. Informatica Vibe Data Stream for Machine Data delivers the data directly to multiple targets for either stream processing for real-time analytics or batch processing for big data analytics and transactional applications.
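For readers who want a concrete picture of the collect, publish and deliver flow described above, here is a minimal Python sketch under stated assumptions. MessageBus, Agent, StreamTarget and BatchTarget are hypothetical names, not part of Informatica’s product or of the Ultra Messaging API. The in-process bus simply stands in for a networked messaging layer, and one stream target and one batch target show how the same records can feed both real-time and batch processing.

```python
# Hypothetical sketch of an agent -> message bus -> multi-target pipeline.
# None of these classes are Informatica APIs; they only illustrate the pattern.
from collections import defaultdict
from typing import Callable, Dict, List

Record = dict
Handler = Callable[[str, Record], None]


class MessageBus:
    """In-process publish/subscribe bus standing in for a messaging layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> None:
        # Fan out each record to every target subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(topic, record)


class Agent:
    """Collector deployed next to a source (sensor, log file, device)."""

    def __init__(self, source_id: str, topic: str, bus: MessageBus) -> None:
        self.source_id = source_id
        self.topic = topic
        self.bus = bus

    def collect(self, payload: str) -> None:
        # Wrap the raw payload with its origin and push it onto the bus.
        self.bus.publish(self.topic, {"source": self.source_id, "payload": payload})


class StreamTarget:
    """Handles each record as it arrives, e.g. for real-time analytics."""

    def __init__(self) -> None:
        self.seen: List[Record] = []

    def __call__(self, topic: str, record: Record) -> None:
        self.seen.append(record)


class BatchTarget:
    """Buffers records and emits them in fixed-size batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[Record] = []
        self.batches: List[List[Record]] = []

    def __call__(self, topic: str, record: Record) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            # Hand off a full batch and start a fresh buffer.
            self.batches.append(self.buffer)
            self.buffer = []


if __name__ == "__main__":
    bus = MessageBus()
    realtime, batch = StreamTarget(), BatchTarget(batch_size=2)
    bus.subscribe("machine-data", realtime)
    bus.subscribe("machine-data", batch)

    sensor = Agent("sensor-17", "machine-data", bus)
    weblog = Agent("web-01", "machine-data", bus)
    sensor.collect("temp=71.3")
    weblog.collect("GET /index.html 200")
    sensor.collect("temp=71.9")

    print(len(realtime.seen), len(batch.batches), len(batch.buffer))  # 3 1 1
```

In a real deployment the bus would span machines over LAN or WAN links and the batch target would write to a store such as Hadoop; the sketch deliberately omits networking, persistence and delivery guarantees.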
“As companies implement more big data solutions, the need to use high-performance message delivery with those solutions will grow,” wrote Gartner in a July report, Hype Cycle for Big Data, 2013. “Moreover, the demands of real-time systems, particularly the Internet of things, mobile devices and world-class cloud applications, will drive adoption of high-performance message delivery, even when big data database technology is not involved.”
Customers in the logistics, transportation and manufacturing industries, for example, could use Vibe Data Stream for Machine Data to collect device, sensor or machine data. Web companies and retail operations could use it for Web log data, and telecommunications network operators and utilities could use it for network or switch data, the company said.
“By using MapR and Informatica Vibe Data Stream for Machine Data, companies gain the benefit of enterprise-grade capabilities for real-time streaming into Hadoop,” Jack Norris, chief marketing officer at MapR Technologies, said in a statement. “This enables highly available, efficient, and reliable real-time data collection and streaming across a wide variety of data sources over local and wide area networks.”
Informatica is currently conducting customer trials of Vibe Data Stream for Machine Data and plans to make it generally available later in the fourth quarter of 2013. Also later in the fourth quarter, the company will provide a software development kit (SDK) that customers can use to develop agents for custom sources and targets.