Microsoft announced several major additions to its cloud-based big data processing services in advance of AzureCon, a free virtual event that kicks off Sept. 29.
Among them is Azure Data Lake Store, an expansion of Azure Data Lake, the company’s cloud-based repository for big data workloads, which was first announced at the Build conference in April. Azure Data Lake Store is aimed at simplifying big data processing and analytics for enterprises, according to T. K. Rengarajan, corporate vice president of Microsoft Data Platform.
“The Data Lake Store provides a single repository where you can easily capture data of any size, type and speed without forcing changes to your application as data scales,” stated Rengarajan in a Sept. 28 announcement. “In the store, data can be securely shared for collaboration and is accessible for processing and analytics from HDFS [Hadoop Distributed File System] applications and tools.”
HDFS is the scalable, distributed storage component of the popular Hadoop big data processing platform. Microsoft plans to offer Azure Data Lake Store as a preview later this year.
Azure Data Lake Store helps set the stage for enterprise Internet of Things (IoT) initiatives, according to Rengarajan. “For example, data can be ingested in real-time from sensors and devices for IoT solutions, or from online shopping websites into the store without the restriction of fixed limits on account or file size unlike current offerings in the market.”
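In addition, the Azure Data Lake suite is gaining an analytics service, Azure Data Lake Analytics, that is built on Apache YARN. YARN (Yet Another Resource Negotiator), sometimes referred to as MapReduce 2.0, is the resource management layer introduced in Hadoop 2, and it ranks second in popularity among data processing engines, behind Apache Spark.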
“This service will be available in preview later this year and includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code,” Rengarajan said. “U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.”
U-SQL is a new query language that melds “the ease of use of SQL with the expressive power of C#,” he went on to explain. “The U-SQL language is built on the same distributed runtime that powers the big data systems inside Microsoft.”
Finally, Microsoft announced the general availability of HDInsight on Linux. HDInsight supports a number of open-source analytics engines, including HBase, Hadoop, Spark and Storm. “We work closely with Hortonworks and Canonical to provide the HDP [Hortonworks Data Platform] distribution on the Ubuntu Operating System that powers the Linux version of HDInsight in the Data Lake,” said Rengarajan.
“This is another strategic step by Microsoft to meet customers where they are and make it easier for you to run Hadoop workloads in the cloud,” he added. The managed cluster offering is subject to a 99.9 percent uptime service-level agreement (SLA), according to Rengarajan.