This blog is brought to you by the Excel Community where you can find additional blogs, extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Filtered. There is also an online forum where you can ask questions and share ideas with other community members.
Hello all and welcome back to the Excel Tip of the Week! This week, we have a Developer level post in which we're going to take a look at starting out using Power Query to connect to and manipulate tabular data from webpages. This is just scratching the surface of data connections and connectivity - using a very simple example - but it should hopefully be illustrative in how to get started using PQ to take in data from the web and analyse it yourself.
Creating the connection
For our example, we are going to use this Wikipedia page, which includes various tables listing the highest-grossing media franchises of all time. We are going to extract the main table from this page and manipulate it in Power Query to produce an automated summary, that should be automatically update-able as and when the Wikipedia page updates.
We get started by creating our query with Data => From Web and pointing it at the webpage:
This takes us to a Wizard which lets us select how we want to access this website:
Simple anonymous web user is sufficient for our purposes - next is picking which part of the page we want. The table we're after is easily identifiable:
We can then use Transform Data to start manipulating the data import.
Manipulating the data
Once our data is in Power Query, we can start to manipulate it the same way as any other Power Query dataset. Below I run through a few example transformations that are interesting to consider for this particular example, but really this is where it's up to you to think about what you want to see!
Convert the total revenue to numbers
This one's pretty universal - the text format used by the table isn't the most helpful for comparisons and combinations. We can extract text between delimiters to get just the number part:
Note the crucial space before "billion"! After this we can just change the column type to Currency and then multiply by 1,000,000,000:
Remove unhelpful columns
The revenue breakdown column and creator column aren't of interest to me as they are just unformatted text, so I will remove them with right click - Remove Columns. The Owner(s) column will go the same way in a moment, but first...
Add a conditional column
I want to analyse this list by considering Disney-owned versus non-Disney franchises. So we create a conditional column:
This leaves us with the dataset I want, which I'll finish by exporting back into an Excel table. From here, we can create some simple PivotTables to explore further.
If you want to download my workbook, it's attached at the bottom of this post - but you could also create your own version from scratch in five minutes or less!
Previous post - SUMPRODUCT redux TOTW index Next post - Spilling errors with dynamic arrays