Copy SharePoint Lists Dynamically using Azure Data Factory and OData Query

1

Introduction:

A SharePoint list is a simple collection of data that you can easily create and share with your team members. Lists are found in both on-premises and cloud versions of SharePoint. There are many ready-to-use list templates like tasks, surveys, contacts, and issue tracking. SharePoint List is useful for both business users and technical users. In one of our current projects, we use it along with Power APPs and Power Automate for Tabular model dynamic row level security setup (this will be another topic for a future blog J).

Before SharePoint Online List connector for ADF was available, we used to use SQL Server Integration Service (SSIS) to copy SharePoint List using OData Source to connect to SharePoint List.

In this blog, I will show you how to copy SharePoint Lists using Azure Data Factory. And to make copying multiple lists more dynamic, we created control tables along with parameters. Then we assigned parameters with values of OData query statements for copying SharePoint Lists.

Prerequisites:

  1. App registration

In order to connect to SharePoint, the SharePoint List Online connector uses service principal as the authentication method. So, we need first to create an app registration in Azure Active Directory and copy Application ID, Tenant ID. Secondly, create a new client secret as show below: 

2. Grant SharePoint Online site permission to your registered application

For this step, Microsoft provides a good document. One thing I want to mention is that please make sure this step is performed by a person who has the SharePoint Online site owner permission. The developers usually don’t have that permission.

You can check it (little different from MSFT doc) by going to the site home page -> click “Settings” in the right corner -> then click “permissions” -> check “Site owners – full control” to see who is the “Owner”.

3. Create Linked Service

We use SharePoint Online List connector to create a Linked Service for SharePoint lists online, the configuration should like this.

Pipeline and dynamic loading configuration

  1. Overall high-level pipeline setting

In the main pipeline, we include logging steps to write all Azure pipeline activity into a database table, so we can refer it when we want to do error debug and diagnosis. Another useful feature we added is the email notification for pipeline running status using Azure Logic Apps. 

To copy multiple SharePoint lists using one pipeline, we created “bulk extraction pipeline” which in turn will call “copy each list to adw” (Screenshot in section 2 below).

2. Dynamic copying with Lookup and ForEach activity

In the above pipeline, we use Lookup activity to get the list of items that need to be copied. Then we use ForEach loop to copy each item one by one and loaded into the destination storage/database (Azure Synapse Analytics (ADW) in our case).

3. OData Query to fine control source data

As shown in the screenshot below, the source of Copy activity is set to use Query. In the query itself, we are using parameters to store each query. We are all familiar with SQL query for getting data, and luckily enough, like SQL query, we can use OData query to select, filter the SharePoint list data.

Here is the link for the OData query documentation. The most common ones used are $select, $filter, $top, $orderby. Using OData query can help the creation of dynamic pipelines using parameters. Another advantage of using OData query is that we can select only columns you need, not the system generated metadata columns into your destination table.

If table was used as the source for Copy activity, most of metadata columns (like Modified, CreatedById, etc. as highlighted above), that might not be needed to copy to the destination, were selected too.

While using OData query, we can select only the columns we need (see screenshot below). On top of that we filter and order data too. 

Below is a simple example of how we create a control table and how we put OData query as a parameter value inside the control table. Please add a comment if more details needed.

Summary

With more and more connectors, activities, and functions are being added to Azure Data Factory, it’s more and more easier to use ADF as a tool not only to orchestrate data flow but also to do transforms (please see my previous blog for one example). In the upcoming blogs, I will continue to summarize our real-life data solutions using Azure Data Factory as well as other Azure Data Services.

Bo Huang

About

I am a Senior Cloud Data/AI solutions Consultant and Architect with expertise in Business Intelligence, Data Modeling, Advanced Analytics, AI/ML, and Low-code/no-code app development. I have more than 14 years of experience of data collection, analytics, modeling, interpretation, and visualization to support business operations. I am well versed in the implementation of full life cycle of traditional BI as well as Advanced Analytics projects: requirements gathering, design thinking, development, maintenance of data warehouse, ETL, dimensional modeling, AI/ML model training and deployment, and data visualization. I have a master’s degree in computer information system, which gave me the foundation to be a good technology consultant. I was awarded Microsoft Certified Solutions Expert (MCSE): Data Management and Analytics with several certifications on AI/ML from Microsoft as well as from Google Cloud Platform. Throughout of my career as technology consultant, I have been recognized for the work I have done by my employers as well as my clients. I have received several prestigious awards like the Superhero Award from Avanade in 2019 and the Rock Star Award from Accenture in 2018.

More on Bo Huang.

Related Posts

Your email address will not be published. Required fields are marked *