For a lot of websites, organic search is a very valuable traffic acquisition channel. It can bring in highly valuable users without the need for costly paid ads. Its visibility in search engine rankings can help to strengthen your website's credibility and authority, while also building brand recognition.
In this tutorial we will focus on metrics like clicks, impressions and the average position of search queries in Google’s search engine. As always, we will try to get some additional context by joining the data from Google Search Console with our GA4 session data, to get a better picture at how organic search traffic contributes to our website engagement and conversions.
What is Google Search Console (GSC)?
Google Search Console is a free tool offered by Google that allows you to track, maintain and troubleshoot your website's presence in Google's search results.
It provides data regarding clicks, impressions, and average position in the search results, as well as information on your website's performance related to queries and specific pages, and any errors or issues with your website or indexed pages.
What data is available in GSC?
There is an integration between GA4 and GSC. This makes it possible to activate some reports in the GA4 interface (that are not visible by default).
However, even if you set up this integration, the GSC data will not be included in the GA4 export data. The only way to obtain the data is via the Google Search Console API. Extracting the data from this API and sending it to BigQuery requires some data engineering work or - as we prefer - this can be handled by a third party service (see tutorial on how to use Fivetran's free plan to ingest Google Search Console data).
Dimensions & metrics
The GSC API provides different reports. For this tutorial we will use the most granular
keyword_page_report as generated by Fivetran. This report contains the following dimensions:
- search type
And the following metrics:
- (average) position
Limitations of joining GSC with GA4 data
When joining Search Console with GA4 data you can expect some issues. Due to limitations in the data that is exposed via the API, the GSC data is far from complete when zooming in on search query level.
Therefore I would not advise to use it for official reporting purposes. However, doing a deep dive in the data will probably get you some great ideas on how to improve your SEO performance.
- a possibly large chunk of missing data for search queries and landing pages, check how much data you are missing (in our case we miss roughly 80% on search query level)
- other issues causing differences between GSC clicks and GA4 sessions
- different timestamps: GSC timestamps data according to Pacific Daylight Time, whereas GA4
event_dateis based on the timezone configured in the GA4 settings (and
event_timestampis based on UTC time)
Nested and repeated fields
That being said, our goal for this tutorial is to join detailed GSC data with GA4 sessionised data. For every combination of GA4 dimensions we will get multiple rows from the GSC data. This is known as a 'many to one' relationship. We can expect this because multiple search queries can lead to one landing page.
To be able to store all this data in a meaningful and efficient way, we will create nested and repeated fields (just like the GA4
event_params are structured).
This way we can feed just one source table into a visualisation tool like Looker Studio and still be able to create visualisations respecting the different scopes (e.g. session or search query level).
We will also add some contextual data, like a field that indicates whether a search query can be considered
In this tutorial you will learn how to:
- prepare Google Search Console (GSC) data to be joined with GA4 session data
- join the data from GSC with GA4 session data to get a better understanding of how organic search traffic contributes to website engagement and conversions
- deal with limitations when joining GSC with GA4 data