Azure Databricks Architecture, Part 1
Introduction:
We have decided to write a series of blog posts for learning Azure Databricks step by step. This post is aimed at beginners who want to get started with Azure Databricks.
Azure Databricks High-Level Diagram:
Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. It provides a collaborative environment with Apache Spark-based analytics for big data processing, real-time analytics, and machine learning.
Here are some key features of Azure Databricks:
· Optimized Apache Spark Environment: It lets you set up an Apache Spark environment quickly, with autoscaling and auto-termination to optimize costs.
· Collaborative Workspace: It supports multiple languages such as Python, Scala, R, Java, and SQL, and integrates with tools such as GitHub and Azure DevOps for version control.
· Machine Learning Capabilities: Azure Databricks integrates with Azure Machine Learning to provide automated machine learning capabilities, making it easier to identify suitable algorithms and manage machine learning models.
· Modern Data Warehousing: It lets you combine data at any scale and deliver insights through analytical dashboards and operational reports.
· It is designed to simplify and accelerate data science on large datasets, and it is rooted in open source, with a commitment to integrating with open-source tools.
What Spark Provides:
· 100% open source under the Apache License
· Simple, easy-to-use API
· In-memory processing engine
· Distributed computing platform
· Unified engine that supports SQL, streaming, machine learning, and graph processing
High-Level Architecture of Spark
Note: The Catalyst optimizer produces highly optimized query plans, and the Tungsten engine handles memory management and efficient execution.
What Databricks Provides to Spark:
· Clusters
· Workspace/notebooks
· Administration and controls
· Optimized Spark (advertised as up to 5x faster)
· Databases/tables
· Delta Lake
· SQL Analytics
· MLflow
What Azure Provides with Databricks:
· Azure Active Directory (now Microsoft Entra ID)
· Unified Azure portal and billing
· Data services such as Data Lake, Blob Storage, Azure Cosmos DB, SQL Database, and Synapse
· Messaging services such as Azure IoT Hub and Azure Event Hubs
· Power BI
· Azure Machine Learning
· Azure Data Factory
· Azure DevOps
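Of the data services above, Azure Data Lake Storage Gen2 is commonly addressed from Databricks with an `abfss://` URI of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`. The small sketch below only builds such a URI; the helper `abfss_uri` and the container and storage account names are hypothetical, and reading the data still requires storage access to be configured in your workspace.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI (hypothetical helper):
    abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical container ("raw") and storage account ("mydatalake") names.
uri = abfss_uri("raw", "mydatalake", "/sales/2024/")
print(uri)  # abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/

# In a Databricks notebook, once access is configured, a path like this can be
# passed to a Spark reader, for example: spark.read.parquet(uri)
```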