What is HDInsight on AKS? (Preview)
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.
HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). It lets you deploy popular open-source analytics workloads such as Apache Spark™, Apache Flink®, and Trino without the overhead of managing and monitoring containers.
You can build end-to-end, petabyte-scale big data applications that span streaming with Apache Flink, data engineering and machine learning with Apache Spark, and querying with Trino.
These capabilities, combined with the strong developer focus of HDInsight on AKS, enable enterprises and digital natives with deep technical expertise to build and operate applications that are the right fit for their needs. Developers can access all the rich configurations provided by open-source software and extend their solutions to include other ecosystem offerings, so they can test and tune their applications to extract the best performance at optimal cost.
HDInsight on AKS integrates with the entire Azure ecosystem, shortening implementation cycles and improving time to realize value.
Technical architecture
HDInsight on AKS introduces the concepts of cluster pools and clusters, which allow you to realize the complete value of a data lakehouse. Cluster pools let you run multiple compute workloads on a single data lake, thereby removing the overhead of network management and resource planning.
- Cluster pools are a logical grouping of clusters that help build robust interoperability across multiple cluster types and allow enterprises to have the clusters in the same virtual network. Cluster pools provide rapid and cost-effective access to all the cluster types created on-demand and at scale. One cluster pool corresponds to one cluster in AKS infrastructure.
- Clusters are individual compute workloads, such as Apache Spark, Apache Flink, and Trino, that can be created in a few minutes with preset configurations.
You can create a pool with a single cluster or a combination of cluster types, depending on your needs, and you can custom-configure the following options:
- Storage
- Network
- Logging
- Monitoring
The following diagram shows the logical technical architecture of components installed in a default cluster pool. The clusters are isolated using namespaces in AKS clusters.
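The relationship between cluster pools, clusters, and namespaces described above can be sketched as a small data model. This is only an illustration, not an HDInsight on AKS API: the class names `ClusterPool` and `Cluster`, the `virtual_network` field, and the choice of one namespace per cluster named after the cluster are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Cluster types named in this article.
CLUSTER_TYPES = {"spark", "flink", "trino"}


@dataclass
class Cluster:
    """Hypothetical model of one compute workload inside a pool."""
    name: str
    cluster_type: str

    def __post_init__(self):
        if self.cluster_type not in CLUSTER_TYPES:
            raise ValueError(f"unsupported cluster type: {self.cluster_type}")


@dataclass
class ClusterPool:
    """Hypothetical model of a cluster pool (one pool maps to one AKS cluster)."""
    name: str
    virtual_network: str  # all clusters in the pool share this network
    clusters: dict = field(default_factory=dict)

    def add_cluster(self, name: str, cluster_type: str) -> Cluster:
        # Assumption: cluster names are unique within a pool.
        if name in self.clusters:
            raise ValueError(f"cluster already exists in pool: {name}")
        cluster = Cluster(name, cluster_type)
        self.clusters[name] = cluster
        return cluster

    def namespaces(self) -> list:
        # Assumption: each cluster is isolated in a namespace named after it.
        return sorted(self.clusters)


pool = ClusterPool(name="analytics-pool", virtual_network="vnet-analytics")
pool.add_cluster("etl", "spark")
pool.add_cluster("stream", "flink")
pool.add_cluster("adhoc", "trino")
print(pool.namespaces())
```

The point of the model is that the pool owns the shared concerns (here, the virtual network), while each cluster carries only its own type and name.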
Modernized cloud-native compute platform
The latest version of HDInsight is orchestrated using AKS, which makes the platform more robust and helps users manage clusters effectively. Provisioning of clusters on HDInsight on AKS is fast and reliable, making it easy to manage clusters and perform in-place upgrades. With vast SKU choices and flexible subscription models, you can modernize data lakehouses on open-source, cloud-native, and scalable infrastructure to meet your analytics needs.
Key features include:
- Fast cluster creation and scaling.
- Ease of maintenance and periodic security updates.
- Cluster resiliency powered by modern cloud-native AKS.
- Native support for modern authentication with OAuth and Microsoft Entra ID.
- Deep integration with Azure Services – Azure Data Factory (ADF), Power BI, Azure Monitor.
Connectivity to HDInsight
HDInsight on AKS connects seamlessly with HDInsight, so you can use the cluster types you need in a hybrid model. Clusters in both offerings can interoperate by using the same storage and metastore.
HDInsight offers Apache Kafka®, Apache HBase®, and other analytics workloads in a Platform as a Service (PaaS) form factor.
The following scenarios are supported:
- Apache Flink connecting to Apache HBase
- Apache Flink connecting to Apache Kafka
- Apache Spark connecting to Apache HBase
- Apache Spark connecting to Apache Kafka
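The scenarios listed above can be captured in a small lookup, which may be handy when validating a planned hybrid topology. This is a minimal sketch: the function name `is_listed_scenario` and the lowercase identifiers are assumptions for illustration, and the set reflects only the four combinations listed in this article.

```python
# Pairs of (HDInsight on AKS cluster type, HDInsight workload) listed as supported.
LISTED_SCENARIOS = {
    ("flink", "hbase"),
    ("flink", "kafka"),
    ("spark", "hbase"),
    ("spark", "kafka"),
}


def is_listed_scenario(aks_cluster_type: str, hdinsight_workload: str) -> bool:
    """Return True if the pair appears in the supported scenarios list."""
    pair = (aks_cluster_type.strip().lower(), hdinsight_workload.strip().lower())
    return pair in LISTED_SCENARIOS


print(is_listed_scenario("Flink", "Kafka"))
print(is_listed_scenario("Trino", "HBase"))
```

A combination that is not in the set simply does not appear in the list above; the sketch makes no claim about why.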
Security architecture
HDInsight on AKS is secure by default. It enables enterprises to protect their data assets with Azure Virtual Network, encryption, and integration with Microsoft Entra ID. It also meets the most popular industry and government compliance standards, upholding the Azure standards, with over 30 certifications that help protect data, along with periodic updates, health advisor notifications, and service health analytics. HDInsight on AKS offers several methods to address your enterprise security needs. For more information, see HDInsight on AKS security.
Region availability (public preview)
Asia Pacific
- Australia East
- Central India
- East Asia
- Japan East
- Korea Central
- Southeast Asia
Europe
- France Central
- North Europe
- Norway East
- Switzerland North
- UK South
- West Europe
Americas
- Brazil South
- Canada Central
- Central US
- East US
- East US 2
- East US 2 EUAP
- South Central US
- West US
- West US 2
- West US 3
Note
- The Trino brand and trademarks are owned and managed by the Trino Software Foundation. No endorsement by The Trino Software Foundation is implied by the use of these marks.
- Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation (ASF).
- Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation (ASF).
- Apache, Apache Flink, Flink and the Flink logo are trademarks of the Apache Software Foundation (ASF).
- Apache HBase, HBase and the HBase logo are trademarks of the Apache Software Foundation (ASF).
- Apache®, Apache Spark™, Apache HBase®, Apache Kafka®, and Apache Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.