Azure Storage Options for Big Data

Microsoft Azure offers a wide array of storage options for documents, objects, data and media. Companies working with big data need repositories that can hold large amounts of data in one place. Choosing the right storage for your data in Azure can be complicated. Read on to learn about Azure’s offerings for big data, and the appropriate use case for each service.

Azure Data Storage Types

There are two main categories of storage in Azure depending on the type of data you want to store:

  • Relational data storage: relational databases organize data in tables made of rows and columns. Each row is identified by a unique key, and each column stores one attribute. You can store this data in the cloud with Azure SQL Database, which also lets you migrate on-premises SQL Server databases to Azure.
  • Non-relational data storage: these databases don't use tables. Instead, they store unstructured or semi-structured data, for example key/value pairs. This type of storage lets you store documents, media files, and objects.
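
To make the contrast concrete, the following minimal Python sketch stores the same kind of customer record both ways. It uses Python's built-in sqlite3 module as a stand-in for a relational engine and a plain dictionary of JSON strings as a stand-in for a key/value store; neither is an Azure service, and the table, column, and key names are hypothetical.

```python
import json
import sqlite3

# Relational: the schema (a table with typed columns) is declared up front,
# and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "London"))
row = conn.execute("SELECT name, city FROM customers WHERE id = ?", (1,)).fetchone()

# Non-relational (key/value): each value is addressed only by its key,
# and two values do not have to share the same fields.
kv_store = {}
kv_store["customer:1"] = json.dumps({"name": "Ada", "city": "London"})
kv_store["customer:2"] = json.dumps({"name": "Grace", "tags": ["vip"]})  # different shape is allowed

doc = json.loads(kv_store["customer:2"])
print(row)          # ('Ada', 'London')
print(doc["tags"])  # ['vip']
```

The relational side rejects a row that doesn't match its columns, while the key/value side accepts any value, which is why it suits documents, media, and objects whose shape varies.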

 

Azure SQL Database

Azure SQL Database stores relational data transactionally in the cloud, and you query and modify that data with Transact-SQL (T-SQL), Microsoft's dialect of SQL. The service takes automatic full and incremental backups, and by default the backups are replicated both within the primary region and to a paired region in another geographic location.
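
Storing data "in a transactional way" means several statements either all take effect or none do. The sketch below shows that pattern with Python's built-in sqlite3 module standing in for Azure SQL Database, only because it runs without a server; against Azure SQL you would connect through a database driver instead. The `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return False if the transaction rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit to dst was undone too.
        return False

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False
print(dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))  # {'alice': 70, 'bob': 30}
```

The failed transfer leaves both balances unchanged, even though its first statement had already run.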

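Azure SQL Database also offers advanced capabilities such as automatic tuning, which can create and drop indexes for you, and dynamic data masking. Dynamic data masking reduces exposure of sensitive data by masking it in query results for non-privileged users, while the stored data stays unchanged. Because the data is relational, you can also run complex queries that join and aggregate data across tables. In addition, Azure offers managed database services for MySQL and PostgreSQL, described in the following sections.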
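
Dynamic data masking is configured per column with a masking function, set through T-SQL (`ADD MASKED WITH`). One of those functions, `partial(prefix, padding, suffix)`, exposes only the first and last few characters. The Python function below merely imitates that output so you can see the effect; it is not how the service implements masking, and its handling of values too short to mask is a hypothetical simplification.

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Imitate the output of DDM's partial(prefix, padding, suffix):
    keep the first `prefix` and last `suffix` characters, replace the rest with `padding`."""
    if prefix + suffix >= len(value):
        # Hypothetical simplification: values too short to mask are fully replaced.
        return padding
    tail = value[len(value) - suffix:] if suffix else ""
    return value[:prefix] + padding + tail

print(partial_mask("555-123-4567", 0, "XXX-XXX-", 4))  # XXX-XXX-4567
print(partial_mask("Maayan", 1, "XXXXX", 0))           # MXXXXX
```

A privileged user querying the same column would still see the full value, which is what distinguishes masking from encryption or deletion.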

Azure Database for MySQL

Azure Database for MySQL is a fully managed service based on the open-source MySQL database, which is commonly used in LAMP (Linux, Apache, MySQL, PHP) stacks. It works with standard MySQL tools such as MySQL Workbench. It is a good fit for companies that use MySQL alongside Microsoft products and want to migrate to a cloud database.

Azure Database for PostgreSQL

Azure Database for PostgreSQL is a fully managed cloud service based on open-source PostgreSQL. Unlike MySQL, PostgreSQL is an object-relational database: it stores data in tables, but it also supports object-oriented features such as user-defined types and table inheritance, alongside standard SQL queries.

 

Azure Storage

Azure Storage is well suited to storing files and datasets that don't need advanced queries. The service offers four types of storage, each designed for a particular scenario.

Azure File Storage

Azure File Storage provides fully managed file shares that virtual machines (VMs) and applications can access over the network. It uses the Server Message Block (SMB) protocol to access, store, and share files, so you can mount a file share for applications running in VMs much like a network drive.

Azure Queue Storage

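Queue Storage is used mostly to pass data between applications. It stores messages so that applications can communicate asynchronously: a producer adds messages to the queue, and a consumer retrieves and processes them later. Message order is not guaranteed, so consumers should not depend on first-in, first-out processing.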
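
In Queue Storage, receiving a message does not remove it: the message becomes invisible for a visibility timeout, and the consumer deletes it only after processing succeeds. If the consumer fails, the message reappears and another consumer can retry it. The in-memory Python class below is a hypothetical stand-in that models only this receive, process, delete pattern; it is not the Azure SDK, and its method names are illustrative.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass

@dataclass
class Message:
    id: int
    body: str
    visible_at: float = 0.0  # time after which the message can be received again
    dequeue_count: int = 0   # how many times the message has been handed out

class InMemoryQueue:
    """Hypothetical stand-in illustrating the visibility-timeout pattern."""

    def __init__(self) -> None:
        self._messages: dict[int, Message] = {}
        self._ids = itertools.count(1)

    def send(self, body: str) -> None:
        msg_id = next(self._ids)
        self._messages[msg_id] = Message(msg_id, body)

    def receive(self, now: float, visibility_timeout: float = 30.0) -> Message | None:
        for msg in self._messages.values():
            if msg.visible_at <= now:
                # Hide the message instead of removing it, so a crashed consumer
                # does not lose it: it becomes visible again after the timeout.
                msg.visible_at = now + visibility_timeout
                msg.dequeue_count += 1
                return msg
        return None

    def delete(self, msg: Message) -> None:
        # Called only after the consumer has finished processing the message.
        self._messages.pop(msg.id, None)

q = InMemoryQueue()
q.send("resize image 42")
first = q.receive(now=0.0)              # consumer A receives the message
print(q.receive(now=10.0))              # None: still invisible to other consumers
retry = q.receive(now=31.0)             # A never deleted it, so it reappears
print(retry.body, retry.dequeue_count)  # resize image 42 2
q.delete(retry)
print(q.receive(now=100.0))             # None
```

Because a message can be delivered more than once, consumers should be written so that processing the same message twice is harmless.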

Azure Blob Storage

You can use Blob Storage to store massive amounts of unstructured data, such as videos, images and audio files. Blob Storage supports three different types of blobs:

  • Block blobs: these consist of storage units called blocks. This general-purpose blob type can store varied objects. A block can be up to 100 MiB (up to 4,000 MiB in newer service versions), and a blob can contain up to 50,000 blocks.
  • Append blobs: these also consist of blocks, but you can only add (append) new blocks to the end of the blob. They are intended for scenarios such as telemetry, streaming, and logging. Each appended block can be up to 4 MiB, and an append blob can contain up to 50,000 blocks, or roughly 195 GiB in total.
  • Page blobs: these consist of 512-byte pages optimized for random read and write operations. You can update any range of pages in place, as long as writes are aligned to 512-byte boundaries. Page blobs suit database files, virtual machine disks, and backups. The maximum size of a page blob is 8 TiB.
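
Block counts and page alignment are easy to get wrong when uploading large files. The hypothetical Python helpers below, which are not part of the Azure SDK, compute how many blocks an upload needs under the 50,000-block limit and check whether a page blob write falls on 512-byte boundaries. The 100 MiB default block size is an assumption taken from the limits above; pick the block size your service version allows.

```python
MIB = 1024 ** 2
MAX_BLOCKS = 50_000  # block-count limit per block blob (and per append blob)
PAGE_SIZE = 512      # page blobs are written in whole 512-byte pages

def blocks_needed(file_bytes: int, block_bytes: int = 100 * MIB) -> int:
    """Return how many blocks an upload of file_bytes needs; raise if it exceeds the limit."""
    count = (file_bytes + block_bytes - 1) // block_bytes  # integer ceiling division
    if count > MAX_BLOCKS:
        raise ValueError(f"{count} blocks exceeds the {MAX_BLOCKS}-block limit; use larger blocks")
    return count

def is_page_aligned(offset: int, length: int) -> bool:
    """True if a write starts on a page boundary and covers a whole number of pages."""
    return length > 0 and offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0

print(blocks_needed(250 * MIB))            # 3
print(MAX_BLOCKS * 100 * MIB / 1024 ** 4)  # largest blob with 100 MiB blocks, in TiB
print(is_page_aligned(1024, 4096))         # True
print(is_page_aligned(1000, 512))          # False
```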

Azure Disk Storage

Disk Storage provides virtual hard disks that attach to virtual machines. These disks store application data and are optimized for high-throughput operations.

 

Azure SQL Data Warehouse

Azure SQL Data Warehouse is intended for structured, relational data. It stores data in tables and supports indexes, constraints, and keys. It is optimized for reporting: you define how the data is structured before loading it into the warehouse, which ensures that the data is ready for analysis and reporting.
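
Defining the structure before loading is often called schema-on-write. The plain-Python sketch below illustrates the idea: each incoming row is checked against a declared schema and rejected before it reaches the table. The `SALES_SCHEMA` columns and the `load_rows` helper are hypothetical; in the warehouse itself, the table definition you create with T-SQL plays the role of the schema.

```python
from __future__ import annotations

from datetime import date

# Hypothetical table definition: column name -> expected Python type.
SALES_SCHEMA: dict[str, type] = {"sale_id": int, "region": str, "amount": float, "sale_date": date}

def load_rows(rows: list[dict], schema: dict[str, type]) -> tuple[list[dict], list[dict]]:
    """Split rows into (accepted, rejected) by checking them against the schema before loading."""
    accepted, rejected = [], []
    for row in rows:
        fits = row.keys() == schema.keys() and all(
            isinstance(row[col], col_type) for col, col_type in schema.items()
        )
        (accepted if fits else rejected).append(row)
    return accepted, rejected

incoming = [
    {"sale_id": 1, "region": "EU", "amount": 19.99, "sale_date": date(2020, 1, 5)},
    {"sale_id": 2, "region": "US", "amount": "n/a", "sale_date": date(2020, 1, 6)},  # wrong type
    {"sale_id": 3, "region": "APAC"},                                                # missing columns
]
table, errors = load_rows(incoming, SALES_SCHEMA)
print(len(table), len(errors))  # 1 2
```

Rejecting bad rows at load time is what lets every later report assume the data already has the expected shape.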

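The advantage of the warehouse is that it centralizes massive amounts of data. Unlike a single Azure SQL database, which has a maximum size, the warehouse offers unlimited storage for columnstore tables. It also handles complex analytical queries well, because it uses massively parallel processing (MPP) to spread each query across multiple compute nodes. In addition, it supports PolyBase, which lets you use T-SQL to query data stored in external sources, such as Azure Blob Storage or Azure Data Lake Store, as if it were a table.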
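
Parallel processing works because each table is split across many distributions (60 in a dedicated SQL pool); with a hash-distributed table, declared with `WITH (DISTRIBUTION = HASH(column))`, each row's distribution is chosen by hashing that column. The Python sketch below mimics the resulting distribute, aggregate locally, combine pattern for a per-region total. It is only an illustration: `zlib.crc32` stands in for the engine's own hash function, the loops run sequentially rather than in parallel, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # number of distributions each table is spread across

def distribution_for(sale_id: int) -> int:
    """Map a distribution-column value to a distribution (crc32 stands in for the real hash)."""
    return zlib.crc32(str(sale_id).encode("utf-8")) % NUM_DISTRIBUTIONS

def parallel_sum_by_region(rows):
    """Sum amounts per region in three MPP-style steps: distribute, aggregate locally, combine."""
    # 1. Distribute: each (sale_id, region, amount) row lands on exactly one distribution.
    shards = defaultdict(list)
    for sale_id, region, amount in rows:
        shards[distribution_for(sale_id)].append((region, amount))

    # 2. Aggregate each distribution independently (the engine would run these in parallel).
    partials = []
    for shard in shards.values():
        local = defaultdict(float)
        for region, amount in shard:
            local[region] += amount
        partials.append(local)

    # 3. Combine the partial results; one region may have rows on several distributions.
    total = defaultdict(float)
    for local in partials:
        for region, subtotal in local.items():
            total[region] += subtotal
    return dict(total)

sales = [(1, "EU", 10.0), (2, "US", 5.0), (3, "EU", 2.5), (4, "APAC", 7.0)]
print(parallel_sum_by_region(sales))
```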

 

Data Lake Store

A data lake is a repository that holds massive amounts of raw data in its native format. Unlike in a data warehouse, the data is structured only when you read it out of the repository, an approach often called schema-on-read. This makes data lakes especially useful for storing big data.

You can use a data lake to store data for later analysis. When you are ready to analyze it, you select just the data you need, transform it into a structured form with Azure Data Lake Analytics, and then load it into SQL Data Warehouse. Benefits of using a data lake include:

  • Cost-effective—reduces the cost of data ingestion because you don’t need to refine the data before storing it. One of the advantages of Azure Data Lake Store is the low cost of storage capacity.
  • Refine and analyze in batches—you can select which data you want to analyze and only refine it when you need it.
  • Offloading storage—you can use a data lake to offload legacy data from data warehouses, reducing storage costs.
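
Schema-on-read means the structure is applied at query time rather than at load time. The plain-Python sketch below assumes the raw data is a handful of CSV lines; it selects only the records an analysis needs and parses them into typed rows, skipping records that don't fit without altering the stored originals. In the workflow described above, Azure Data Lake Analytics would perform this kind of transformation at scale; `raw_records` and `read_with_schema` are hypothetical names.

```python
import csv
import io

# Raw zone of a hypothetical lake: records stay exactly as they arrived,
# with no validation at write time.
raw_records = [
    "2020-01-05,EU,19.99",
    "2020-01-06,US,not-a-number",
    "2020-01-07,APAC,7.50",
]

def read_with_schema(lines, wanted_regions):
    """Apply a schema only when reading: parse, type-convert, and keep just the needed rows."""
    result = []
    for day, region, amount in csv.reader(io.StringIO("\n".join(lines))):
        if region not in wanted_regions:
            continue  # select only the data this analysis needs
        try:
            result.append({"day": day, "region": region, "amount": float(amount)})
        except ValueError:
            continue  # malformed record: skipped by this read, still stored in the raw zone
    return result

print(read_with_schema(raw_records, {"EU", "US"}))  # only the EU record; the US record fails to parse
```

A different analysis could read the same raw records with a different schema, which is what makes it cheap to store the data before deciding how to use it.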
     

Wrap Up

Large volumes of unstructured data need dedicated storage that supports complex queries while keeping costs to a minimum. A data lake is often the most cost-effective option for storing big data in Azure, because it lets you select and analyze data only as needed. However, before committing to a storage service, assess your needs carefully: whether your data is structured, how you plan to query it, and how much of it you need to store.

 


Author Bio

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.

LinkedIn: https://www.linkedin.com/in/giladdavidmaayan/