Skip to Content

Extract nested JSON data from an API and load into Azure Synapse Analytics using Azure Data Factory

Sogeti Labs
October 15, 2021

About the author

Sogeti Labs

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.

Comments

4 thoughts on “Extract nested JSON data from an API and load into Azure Synapse Analytics using Azure Data Factory

  1. Hi Bo Huang, thank you for sharing !
    I have three questions regarding your ETL.
    First, on the step 1, I didn’t find any mention about the delete activity (Delete 1). Please, let me know If I understand, but I think doesn’t any function for delete activity. What does the activity Delete do?
    Second, on the step 2, in the sink Azure Synapse, which table are you sinking? DBO.STG_FULL ou DBO.STG_INC?
    Finally, on the last step, “PL_ETL”. What does the pipeline PL_ETL do?
    Best regards,
    Leonardo

    1. Hi Leonardo,
      Thanks for your comments. To answer your question:
      1) Delete activity is for deleting all files in Blob from pervious API call, like TRUNCATE and LOAD a table.
      2) In the sink, we load data into table DBO.STG_INC (this has just las 2 days’ data), and then using DBO.STG_INC table to update the full stage table DBO.STG_FULL.
      3) PL_ETL is for the downstream ETL steps to incorporate this REST API data with other tables to create, for example, some dimension and/or fact tables.
      Let me know if you have any more comments and suggestions.

  2. Hi Bo Huang,
    I also have a few questions regarding this setup
    I am rather new to ADF and I need to understand when you parameterize your url.
    Can you give an example what the get metadata url should look like at the end, I get confused where your base url is blue on the second line {utcnow()}&…………… whats supposed to be here. I would appreciate if you can share an example of what the end url string looks like obviously a dummy url
    Thanks
    L

    1. Hi Lucian,
      Sorry for late response but I just saw this comment.
      The blue bar covered in the third line of the url is just another param for filtering (you can think it as businessline, geography location, etc.).
      It serves as the same functionality as other params (DateRangeStart, pagesize, etc.).
      If we are running the pipeline today of 2021-12-15, the final url will look like this (if we ignore that blue bar param).
      https://{your api url}/DateRangeStart=2021-12-13&DateRangEnd=2021-12-15&pagesize=500
      Above I simplify the value for DateRangeStart and DateRangEnd, since we are using utcnow() which contains not only date but also time in UTC.

Leave a Reply

Your email address will not be published. Required fields are marked *

Slide to submit