Azure Databricks Architecture Part-1

 

Introduction:
This series of blog posts introduces Azure Databricks step by step. This first post is aimed at beginners who want to get started with Azure Databricks.

 

Azure Databricks High-Level Diagram:

[Figure: Azure Databricks high-level diagram]

Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud. It provides a collaborative, Apache Spark-based environment for big data processing, real-time analytics, and machine learning.

Here are some key features of Azure Databricks:

·         Optimized Apache Spark Environment: It allows you to set up an Apache Spark environment quickly, with autoscaling and auto-termination to optimize costs.

·         Collaborative Workspace: It supports multiple languages like Python, Scala, R, Java, and SQL, and integrates with tools like GitHub and Azure DevOps for version control.

·         Machine Learning Capabilities: Azure Databricks integrates with Azure Machine Learning to provide automated machine learning capabilities, making it easier to identify suitable algorithms and manage machine learning models.

·         Modern Data Warehousing: It enables you to combine data at any scale, providing insights through analytical dashboards and operational reports.

·         Open-Source Foundation: It is designed to simplify and accelerate data science on large datasets, and it is rooted in open source with a commitment to integrating with open-source tools.
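
The autoscaling and auto-termination settings mentioned in the first feature can be sketched as a cluster definition. This is a minimal sketch, assuming the field names used by the Databricks Clusters REST API (`autoscale`, `autotermination_minutes`). The cluster name, runtime version, and VM size are placeholder values, and `build_cluster_spec` is a hypothetical helper written for this post, not part of any SDK.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition with autoscaling and auto-termination.

    Hypothetical helper for illustration; it only builds the request body
    and does not call any Azure or Databricks service.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholder values: pick a runtime and VM size available in your workspace.
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        # Autoscaling: workers are added or removed within this range based on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Auto-termination: the cluster shuts down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("demo-cluster", min_workers=2, max_workers=8, idle_minutes=30)
print(json.dumps(spec, indent=2))
```

A small minimum worker count combined with a short idle timeout is what keeps costs down: the cluster grows only while there is work to do and stops when nobody is using it.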

 

What Spark Provides:

·         100% open source under the Apache License 2.0

·         Simple, easy-to-use APIs

·         In-memory processing engine

·         Distributed computing platform

·         Unified engine that supports SQL, streaming, ML, and graph processing
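
To make "distributed" and "in-memory" a little more concrete, here is a toy model in plain Python. It is not Spark code and uses no Spark API. It only mimics the idea: data is split into partitions, each partition is processed independently in memory, and the partial results are merged. All function names here are invented for this sketch, and threads stand in for the executors of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin across partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: runs independently on one partition held in memory."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine partial results from two partitions."""
    return left + right


lines = [
    "Spark is a unified engine",
    "Spark keeps data in memory",
    "a cluster runs many tasks",
]

partitions = split_into_partitions(lines, num_partitions=2)

# Threads stand in for the executors of a real cluster.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

total = reduce(merge_counts, partial_counts, Counter())
print(total["spark"], total["a"])
```

In real Spark the same pattern is expressed through its DataFrame or RDD APIs, and the partitions live on different machines rather than in one process.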

High-Level Architecture of Spark

[Figure: high-level architecture of Spark]


Note: The Catalyst optimizer builds optimized query plans, and Tungsten handles memory management.
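
Catalyst works by applying rewrite rules to a tree that represents the query. As a rough illustration of that idea only, the toy below applies one such rule (constant folding) to a tiny expression tree in plain Python. The tuple representation and the `fold_constants` function are invented for this sketch and are not Spark's internal API.

```python
def fold_constants(expr):
    """Rewrite an expression tree bottom-up, evaluating constant sub-expressions.

    Expressions are ints (literals), strs (column names), or
    tuples of the form (operator, left, right).
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)


# price * (2 + 3): the literal part can be computed once, before any row is read.
plan = ("*", "price", ("+", 2, 3))
optimized = fold_constants(plan)
print(optimized)
```

The rewritten tree gives the same result for every row but does less work per row, which is the general goal of a query optimizer.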

What Databricks Adds to Spark:

·         Cluster

·         Workspace / Notebooks

·         Administration and Controls

·         Optimized Spark runtime (reported to be up to 5x faster than open-source Spark)

·         Database / Table

·         Delta Lake

·         SQL Analytics

·         MLflow

 

What Azure Provides with Databricks:

·         Azure Active Directory (now Microsoft Entra ID)

·         Unified Azure Portal and Billing

·         Data services such as Azure Data Lake Storage, Blob Storage, Azure Cosmos DB, Azure SQL Database, and Azure Synapse

·         Messaging services such as Azure IoT Hub and Azure Event Hubs

·         Power BI

·         Azure ML

·         Azure Data Factory

·         Azure DevOps
