The SQL engine has always been present in the core component of Microsoft’s Data architecture, initially as Azure SQL Data Warehouse and now under the name Azure Synapse Workspace. But in this time there have been changes on it until it becomes what it is today.
Let’s see what the SQL Pool currently presents to help us implement the Massive Parallel Processing (MPP) architecture and how it has come to separate the storage layer and give the compute layer tremendous elasticity.
If we go down to the component level, we see that the SQL Dedicated Pool is made up of four named elements:
- Control Node
- Compute Nodes
- Data Movement Services (DMS)
In the case of Distributions, SQL Dedicated Pool has a fixed number of 60 basic storage units, which means that the data is divided and stored between these distributions. Something that is especially relevant when we want to execute efficient queries. To do this, we have the ability to use three types of distribution when we create our tables.
The Control Node is the SQL Server Endpoint, what it does is store the metadata of the storages, with which when receiving the query from the user, it coordinates the execution through the Compute Nodes, which are the ones that finally make the query. It is the Azure Synapse computing unit and currently we can configure our component with a maximum of 60 Compute Nodes. Which means that the Distributions are divided by the allocated number of compute nodes.
Finally the DMS act when there are joins, group by in our queries. What they do is coordinate the movement of data between the computing nodes to compose the result.
Another concept to keep in mind is the Data Warehouse Unit (DWU), which is the set of CPU, Memory and IO, that is, the computational power of our SQL Dedicated Pool. This value can be increased or decreased depending on the needs of the organization.
Let’s see how to configure our SQL Dedicated Pool. To do this, the first thing is to have previously configured an Azure Synapse Workspace, from it, we select the SQL one from the Analytics pools area.
Once there, we see that we currently have (by default) a SQL Serverless. This only incurs in cost, if it is used, but does not increase the final account, something different happens with the SQL Dedicated Pool. That is why it is so important to “turn it off” once the work is finished.
By clicking on New we access a screen where we configure the DWU value with which we want to work.
If we take it to the maximum extreme of 30,000 DWU, in addition to supporting a cost of almost 400 euros per hour, we would have 60 computing nodes, that is, one for each distribution. Something that undoubtedly grants tremendous computing capabilities. But for us, 100 DWU will be enough. Once configured, we can begin to use its capabilities.
To connect, you only have to use the Dedicated SQL Endpoint, the user and the Password with which you configured your Azure Synapse Workspace.
And you will be ready to start your adventure creating tables, virtualizing resources through Polybase, compose your Stored Procedures, views, … The unsupported features are the creation of Triggers and queries between different databases. We must also review the part that affects constraints, identity columns and relationships as they work differently from the traditional SQL Server. Some toll must be paid.
In the next installment about Azure Synapse I will configure external tables using Polybase and I will talk about Round Robin, Hash and Reference, which are three aspects to be very clear if we want our queries to be efficient.