Updated – Get SharePoint List Data … Fast

Posted by Pat Mahoney November 12, 2022 in Uncategorized

While storing your data in a database is recommended, many citizen developers do not have the access or skills to create databases. As a result, many valuable business solutions are based on top of SharePoint Lists, and all that data needs to be analyzed too. A previous post shared some M code to pull data from SharePoint lists faster than the native connector (especially V1). Please read that article to get all the context, as this post is only to explain and address two gaps in the previous code (highlighted in comments, and my own experience with it):

The query does not allow Scheduled Refresh in the Power BI Service
Duplicate items are returned for lists with deleted items

These two issues are addressed in the M code below. Actually, two versions are provided. Both are provided in case one is more easily adaptable to your use case. Also, while the first may be slightly faster, the List.Generate approach is more elegant (so I wanted to share that one too). Like before there are commented lines of code to demonstrate the needed syntax for different scenarios (e.g., when you need to expand data from Person and/or Lookup columns).

The first issue was straight forward to address, as it is a problem many users pulling web data face. Scheduled Refresh does not allow for dynamic sources, so the updated code below uses the RelativePath option in Web.Contents. This keeps the base url static for authentication, while the RelativePath updates dynamically. Note that when first setting up Scheduled Refresh, it may be necessary to check the "Skip Test Connection" box to get it to work.

To address the duplicate items issue, a bigger change was needed. The previous approach checks for the number of list items (or, with code provided in a helpful comment, the max ID) and generates a list of numbers incremented by 5000 (the max # of items that can be returned in a single call) up to that number. Each number is then used in the $skip parameter, along with a $top value of 5000. Say a list has 10,000 items, but the first two items were previously deleted. This would mean that the first remaining item has an ID value of 3 and the last item has an ID of 10,002. With a $skip value of 0 and $top of 5000, items 3 through 5002 would be returned. In the second API call ($skip = 5000), two of the same items would be returned (5001 and 5002). Instead of using $top and $skip, the new version below uses $filter on the ID column. This way guarantees that only the items between the specified ID values are returned.

let
  siteurl = "https://<your tenant url>/sites/<your site name>",  // use ... /sites/<your site name>/<your subsite name> if applicable                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
  listname = "<your list name>",
  itemcount = Json.Document(Web.Contents(siteurl, [RelativePath = "_api/web/lists/GetByTitle('" & listname & "')/items?$select=ID&$orderby=ID%20desc&$top=1", Headers = [Accept = "application/json"]]))[value]{0}[ID],
  StartIDs = List.Numbers(0, Number.RoundUp(itemcount / 5000), 5000),
  ConvertToTable = Table.FromList(StartIDs, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  Add_EndIDs = Table.AddColumn(ConvertToTable, "Addition", each [Column1] + 4999, type number),
  RenamedColumns = Table.RenameColumns(Add_EndIDs, {{"Column1", "StartID"}, {"Addition", "EndID"}}),
  #"Changed Type" = Table.TransformColumnTypes(RenamedColumns, {{"StartID", type text}, {"EndID", type text}}),
  //Comment in only one of the fieldselect lines below, defining your select and expand columns using the example syntax shown                                                                                                                            
  fieldselect = "&$top=5000",  // all fields with no expansion                                                                                                                                                                                                                         
  //fieldselect = "&$top=5000&$select=ID,Title,Date,PersonColumn,ChoiceColumn,LookupColumn", // list desired fields (no expansion) -No Spaces!                                                                                                                                            
  //fieldselect = "&$top=5000&$select=ID,Title,Date,PersonColumn/LastName,PersonColumn/FirstName,ChoiceColumn,LookupColumn/Title,LookupColumn/Project,LookupColumn/ProjectStatus&$expand=PersonColumn,LookupColumn", //expand list fields - No Spaces!                                                                                                                                                                                                                                                    
  GetData = Table.AddColumn(#"Changed Type", "Items", each Json.Document(Web.Contents(siteurl, [RelativePath = "_api/web/lists/GetByTitle('" & listname & "')/items?$filter=(ID ge " & [StartID] & ") and (ID le " & [EndID] & ")" & fieldselect, Headers = [Accept = "application/json"]]))[value]),
  #"Removed Other Columns" = Table.SelectColumns(GetData, {"Items"}),
  #"Expanded Items" = Table.ExpandListColumn(#"Removed Other Columns", "Items")
in
  #"Expanded Items"

A second (and, in my opinon, more elegant) approach is possible with List.Generate. When pagination is set to true in the API call, a link to the next set of data is provided with each call (until the last items are returned). List.Generate can be used to recursively use the nextLink URL until it is empty. Please see the version below. In my testing, data are returned quickly for both. Theoretically, the List.Generate approach would be slower as every call is sequential (the results of each call need to be checked to see if another call is needed). With the above version, API calls for different ID ranges could be made in parallel.

let
  siteurl = "https://<your tenant url>/sites/<your site name>",  // use ... /sites/<your site name>/<your subsite name> if applicable                                                                                                                                                                                                            
  listname = "<your list name>",
  //Comment in only one of the fieldselect lines below, defining your select and expand columns using the example syntax shown                                                                                                                            
  fieldselect = "&$top=5000", // all fields with no expansion                                                              
  //fieldselect = "&$top=5000&$select=ID,Title,Date,PersonColumn,ChoiceColumn,LookupColumn", // list desired fields (no expansion) -No Spaces!                                                                                                                                            
  //fieldselect = "&$top=5000&$select=ID,Title,Date,PersonColumn/LastName,PersonColumn/FirstName,ChoiceColumn,LookupColumn/Title,LookupColumn/Project,LookupColumn/ProjectStatus&$expand=PersonColumn,LookupColumn",  //expand list fields - No Spaces!                                                                                                   
  InitialWebCall = Json.Document(Web.Contents(siteurl, [RelativePath = "_api/web/lists/GetByTitle('" & listname & "')/items?$skipToken=Paged=TRUE" & fieldselect, Headers = [Accept = "application/json"]])),
  datalist = List.Generate(() => InitialWebCall, each List.Count([value]) > 0, each try Json.Document(Web.Contents(siteurl, [RelativePath = "_api" & Text.AfterDelimiter([odata.nextLink], "_api"), Headers = [Accept = "application/json"]])) otherwise [value = {}], each [value]),
  #"Converted to Table" = Table.FromList(datalist, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1")
in
  #"Expanded Column1"

If you use either version, please share your feedback in the comments.

About the Author: Pat Mahoney

Pat Mahoney is a chemist in Indiana who enjoys teaching others to use Power BI and other tools, and providing pro bono data science help/instruction to nonprofits, schools, etc.

23 Comments

Justin says:

November 17, 2022 at 8:11 pm

Thanks Pat, the List.Generate solution works great!

My final hurdle is trying to figure out how to get the text values of choice fields as the "FieldValuesAsText" option is not generated. I know I can create a relationship with another table, but is there any other way you know of as I have a large number of these?

Reply
1. Pat Mahoney says:
  
  November 18, 2022 at 2:26 am
  
  Glad it's working for you, Justin. The demo list I used in the video also includes a choice column and it returned the text value (not the index). Is it a lookup column by chance and you need to use the $expand syntax shown in the 3rd fieldselect option?
  
  Reply
  1. Justin says:
    
    November 18, 2022 at 2:25 pm
    
    Sorry, they are local MMS fields, does $expand work for those? Seems like the data in the "Label" property is the ID of the value and no way to get text?
    
    Reply
    1. Rita says:
      
      November 9, 2023 at 11:49 am
      
      Hi Justin, did you ever solve this issue?
Kevin Hoyt says:

December 8, 2022 at 9:30 pm

Thank you! Just what I needed to load my large lists. Appreciate the posting!

Reply
Kevin says:

December 20, 2022 at 11:11 pm

What credential type did you set for the data source? These work in Power BI desktop but I am not able to use it in a dataflow hosted on the PBI service.

Reply
1. Pat Mahoney says:
  
  December 21, 2022 at 3:05 am
  
  You need to use the Organization Account authentication and choose the "Skip Test Connection" option.
  
  Reply
  1. Kevin says:
    
    December 22, 2022 at 7:02 pm
    
    I don't have the "Skip Test Connection". It's giving me the "Dynamic Datasource" warnings again.
    
    I even tried using anonymous and using a token. Works fine in Desktop, or in Dataflow editor, but when publishing/saving giving the Dynamic Datasource. Darn!
    
    Reply
  2. Kevin says:
    
    December 23, 2022 at 8:24 pm
    
    I've tried both Organization and Anonymous (using bearer token), but neither have the "Skip Connection Test. They work both in the Desktop and Dataflow table editor, but cannot save it/publish it as it gives the Dynamic Dataset warning still.
    
    I'm using a list that states has an field that determine if I need to get data from another list. Thus, looping through and calling this as a function for each of those to add a column as a table to each row. I've been able to run an api call per row before but seems that could be blocked now.
    
    Reply
    1. Pat Mahoney says:
      
      December 24, 2022 at 1:57 pm
      
      Not sure why you are not seeing it. Are you going to the Edit Credentials in the Settings for your published dataset? You may want to post on community.powerbi.com so you can share more details and schreenshots.
Joan says:

January 29, 2023 at 11:28 am

Hi Pat,
Thank you!
Could this approach be re used to get SP folder contents faster than the native PBI connector?
I am failing on how to adjust the Query for this purpose.
Regards,

Reply
Karl says:

February 23, 2023 at 11:17 pm

Pat thank you so much for doing your research and posting here. I, and I'm sure thousands of others really appreciate your efforts.
I used the first option because I couldn't get the 2nd (your more elegant solution) to work in Desktop.

The first one works fine but maybe a little too well. We have a list that we update weekly and delete entries older than 2 weeks. My problem is that your query is pulling in all the deleted entries along with the current ones.

Reply
Karl says:

February 23, 2023 at 11:26 pm

Actually, I'm an idiot! I was querying the wrong list. But the thank you still stands!

Reply
Karl says:

May 30, 2023 at 3:43 am

So today I circled back to this because I had to create another query for a list. And this time your 2nd query worked fine for some reason. Except that the 2nd one only returns 5,000 items from the list. It seems it's just grabbing the top 5,000 and not paging. The first query returns all 17,000 items in the list.

Reply
peddyreddy suresh says:

May 31, 2023 at 4:41 am

Hi Pat,

I used first version refreshing is working fine Desktop and on demand.
But scheduled is getting disabled not refreshing Daily.

Reply
Eduardo Dallabrida says:

July 4, 2023 at 10:28 pm

Thanks for this!!

I opened a ticket with Microsoft to solve this, but this article got it fixed.

Reply
Jake Mannion says:

July 7, 2023 at 8:16 pm

Using this method now for pulling 300k records from SP2016 into Dataverse … working wonderfully so far and very speedy. Shared with our internal data/reporting community.

THANK YOU!!!

Reply
Jimmy says:

November 23, 2023 at 8:32 pm

Genius, was having trouble with SharePoint lists that are growing exponentially. Worked like a charm.

Reply
Derek Williams says:

January 19, 2024 at 12:32 am

I had issues with a 4 million row sharepoint list, that seemed to error out with a possible memory issue with the first solution.

Second solution worked, but albeit slower. I would advise anyone here to just use the second solution to avoid any scalability issue.

Reply
Vickneshwaran T says:

January 20, 2024 at 11:43 am

I used the second option, but seems it returns with few duplicate items as total number of list items in SharePoint list is 2851, but total rows generated is 2862.

Reply
Munib says:

January 24, 2024 at 11:53 am

Hi,
I have a query, I want to get data of lists from all subsites at once.
The below given link is provding my all the urls of my subsites using odata feed.
http://mysite/pwa/site/_api/ProjectData/Projects?$select=%20ProjectWorkspaceInternalUrl/
Successfully done: I used your code and successfully fetch record for a single subsite. By giving the subsite link.
By I am unable to get data for my all subsites at one and need to refresh it in power bi service.
My Query: Can you please help me out that how can i get concatenate the above given url reponse of table with your solution to get list data at once.
I will be very thankfull if you provide me the solution.

Reply
jon says:

February 6, 2024 at 5:15 pm

Really awesome work here! I'm looking for help with the first M-Code solution you provided. I seem to be having issues referencing the lookup columns in the list, and am not quite sure how to work around that. Any suggestions?

Reply
John says:

February 13, 2024 at 5:02 pm

Pat, I owe you a beer…or maybe a case.

I was fighting with this for nearly a year, scouring forums for an approach that the Power BI service would be able to automate refresh with…and I just. kept. striking. out.

THANK YOU for putting this out there.

Reply