The Art of Data Joining: A Guide to Discovering SEO Insights

One of the most valuable things I’ve learned in my career is the benefits of joining different data tables to uncover SEO insights and opportunities that would otherwise be missed. Tools like Botify and OnCrawl use data from a physical crawl, backlinks, keyword footprint, analytics, Google Search Console, and log files, and they do a great job of presenting the data in a way that makes sense and tells a story. 

Still, these tools can be pricey and don’t allow for tweaks and customizations to their proprietary systems. However, with a free tool like Microsoft PowerBI, we can combine different data sources to discover issues and opportunities, then be better prepared to prioritize them holistically or take a deep dive into competitors’ websites. Using this guide, you can look at a website from a very high level and drill down into the issues and opportunities you find, from keywords and content opportunities to technical SEO.

the art of data joining a guide to discovering SEO insights

Need an Example of How This Would Work?

Say you’ve crawled a website and learned that 20% of the URL inventory returns a 404 not-found HTTP response. Based on this alone, we might recommend removing all internal links to 404 URLs to reduce the overall threshold of indexable to non-indexable pages. Unfortunately, this activity would take time and resources and may not produce measurable KPI value.

However, if we join the crawl data with keyword exports from SEMRush, backlinks from AHREFs, and clicks/impressions from Google Search Console, we can see which 404-status URLs have tangible SEO value. In this case, we might discover that a few URLs have a decent keyword footprint and some valuable referring domains. Perhaps one has a couple thousand impressions and a few clicks in the last 30 days. With that data, we can now prioritize reclaiming the potential SEO value already living in the currently not-found URLs.

This workflow will teach you to take CSV exports from a ScreamingFrog crawl and an Organic Search Traffic report from SEMRush. At the end of this workflow, you can view keyword data at the directory and URL levels for every page in your inventory.

Gather the Data

1. Crawl your website with ScreamingFrog and export internal_all.

2. Visit SEMRush and export Organic Search Traffic.

Import the CSV Files, One at a Time, into PowerBI

We will start with internal_all.csv

3. Open PowerBI and, in the upper left, click Get data > Text/CSV

the art of data joining: a guide to discovering SEO insights - import

4. Choose UTF-8 as the File Origin on the next screen, and click Transform.

the art of data joining: a guide to discovering SEO insights - save UTF

Clean and Join the Data

5. Right-click on the header of the Address column and choose Duplicate Column.

the art of data joining: a guide to discovering SEO insights - duplicate column

6. Right-click on the header of the new column (Address – Copy), choose Transform in the dropdown menu, and click Lowercase.

7. Repeat this step, click Trim, repeat it again, and click Clean.

the art of data joining: a guide to discovering SEO insights - lower trim clean

8. Double-click on the new column header and rename it Key.

the art of data joining: a guide to discovering SEO insights - name key

9. In the Queries panel on the left, right-click internal_all and choose Reference. This will create a copy of your internal_all table linked to the original.

the art of data joining: a guide to discovering SEO insights - reference table

10. All the way to the right of the table, right-click the header of the Key column and choose Remove Other Columns.

the art of data joining: a guide to discovering SEO insights - remove columns reference

11. Double-click  internal_all (2) in the Queries pane on the left and rename it Bridge.

the art of data joining: a guide to discovering SEO insights - rename bridge

12. Repeat the process with your SEMRush export.

  1. In SEMrush, duplicate the URL column.
  2. Repeat steps 6-10 
the art of data joining: a guide to discovering SEO insights - semrush duplicate

13. Next, we want to prevent PowerBI from loading the SEMRush reference table into the data model. This will allow PowerBI to work with a full inventory of URLs without having to load more data into the dashboard than necessary.

the art of data joining: a guide to discovering SEO insights - disable semrush load

14. With the Bridge table selected, click Append Queries in the top right.

the art of data joining: a guide to discovering SEO insights - append queries

15. Leave Two tables selected and choose SEMRush (2) from the dropdown.

the art of data joining: a guide to discovering SEO insights - append

16. At this point, your Bridge table should have a full list of every URL from both data sources. Naturally, there will be duplication. Next, click Remove Rows at the top and choose Remove Duplicates.

17. Repeat this step twice more, choosing Remove Blank Rows, then Remove Errors.

the art of data joining: a guide to discovering SEO insights - remove columns

18. Click Close & Apply in the top left.

the art of data joining: a guide to discovering SEO insights - close apply

After the loading screen has completed, your interface should look similar to this:

the art of data joining: a guide to discovering SEO insights - loaded ready

Define the Relationship

The next part is where the rubber hits the road. This involves joining the internal_all and SEMRush tables using the Bridge.

1. Click Modeling in the menu at the top. Next, click Manage Relationships, then New…
At this point, there may be predefined relationships listed. If so, delete those—we’re doing this from scratch today.

the art of data joining: a guide to discovering SEO insights - manage relationships

2. In the top dropdown, choose internal_all, and Bridge in the second dropdown. 

3. The two Key columns should be highlighted. If they are not, click on the Key column in each table.

4. Ensure the Cardinality is One to Many (1:*). If this is not an option, something went wrong. Go back and check your work.

5. Set Cross filter direction to Both.

Now, your screen should look like this:

the art of data joining: a guide to discovering SEO insights - first relationship

6. Click OK.

7. Repeat steps 1 through 6 to join the SEMRush table to the Bridge.

8. Click Close in the Manage Relationships window.

Turning Data into Information

Congratulations! Now your ScreamingFrog export is joined with SEMRush, and you can use the numbers together. Here is an example:

The Bridge table is our primary inventory of URLs. In this data model, the Bridge Key is where we always start. I created a Table from the Visualizations pane. I added the key column from the Bridge table, indexability from internal_all, and the number of keywords the URL is ranking for from SEMRush. If we have any non-indexable URLs with an organic keyword footprint, we will find them here.

the art of data joining: a guide to discovering SEO insights - example table

Keep in mind these are just the basics. In future posts, I will go into greater detail about gathering and joining tables in more complicated ways and creating snappy visualizations to help you discover and illustrate the stories hidden in your data. Stay tuned!

Scroll to Top