Azure Databricks: Accessing Data Lake (Using Access Key)

 

Introduction:
In this blog post we discuss how to access Azure Data Lake Gen2 from Azure Databricks.

 

How can we access Azure Data Lake Gen2:

Access can be gained by any of the following methods:

- Using a storage access key
- Using a shared access signature (SAS token)
- Using a service principal
- Using Azure Active Directory credential passthrough
- Using Unity Catalog

 

In this post we discuss accessing Azure Data Lake Gen2 using an access key.





Authenticating to the Data Lake with an Access Key:

- Each storage account comes with two access keys
- Each access key is 512 bits
- An access key gives full access to the storage account
- Consider it a super-user credential
- Keys can be rotated (regenerated); a rotation sketch follows this list
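Key rotation is normally done in the Azure portal, but it can also be scripted. The sketch below is only a hedged illustration using the azure-mgmt-storage Python SDK; the subscription ID and resource group are placeholder assumptions, and none of this appears in the original post.

# Sketch: rotate (regenerate) an access key with the azure-mgmt-storage SDK.
# "<subscription-id>" and "<resource-group>" are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Regenerate the first key; pass "key2" to rotate the second one.
client.storage_accounts.regenerate_key(
    "<resource-group>", "myschool", {"key_name": "key1"})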

 

Access Key Spark configuration:

Here we use a storage account named myschool (Azure Data Lake Gen2) that contains a container named bronze. The bronze container holds a CSV file named school.csv.

 

spark.conf.set(
    "fs.azure.account.key.myschool.dfs.core.windows.net",
    "<512-bit access key of the storage account>")

 

 

Microsoft recommends the ABFS (Azure Blob File System) driver to access Data Lake Gen2; its secure scheme is abfss, with URIs of the form abfss://<container>@<storage-account>.dfs.core.windows.net/<path>. For our example:

abfss://bronze@myschool.dfs.core.windows.net

Notebook command:

spark.conf.set(
    "fs.azure.account.key.myschool.dfs.core.windows.net",
    "<512-bit access key of the storage account>")

dbutils.fs.ls("abfss://bronze@myschool.dfs.core.windows.net")

This lists the files within the bronze container.
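dbutils.fs.ls returns a list of FileInfo objects, so the listing can also be inspected programmatically; a minimal sketch using the standard name and size fields:

# Print each file's name and size (in bytes) in the bronze container.
for f in dbutils.fs.ls("abfss://bronze@myschool.dfs.core.windows.net"):
    print(f.name, f.size)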

display(spark.read.csv("abfss://bronze@myschool.dfs.core.windows.net/school.csv"))

This reads and displays the school.csv file.
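By default spark.read.csv treats every row as data and names the columns _c0, _c1, and so on. If school.csv starts with a header row (an assumption; the post does not show the file), the options below yield named columns and inferred types:

# Sketch: read school.csv with a header row and inferred column types.
df = spark.read.csv(
    "abfss://bronze@myschool.dfs.core.windows.net/school.csv",
    header=True,       # first row supplies column names
    inferSchema=True)  # infer column types by sampling the file
display(df)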
