GA4 | tips & tricks

Why your BigQuery results don't (exactly) match with Google Analytics reports (GA4)

Your BigQuery results are not expected to match your reports in the Google Analytics 4 user interface exactly. Don't worry about it: this can happen for a variety of reasons.

Let's start with some bad news: your Google Analytics reports are not necessarily a representation of reality. Let's assume you have the perfect implementation and can measure all behaviour on your website in detail (which is nearly impossible). Your GA reports would still suck big time. Some of the reasons:

users using and/or sharing multiple devices
usage of ad blockers
browsers implementing tracking protection mechanisms
arbitrary session timeout settings
sampling in reports
cardinality in reports
usage of count approximation algorithms in reports

With this in mind, it is safe to conclude that Google Analytics is best used as a tool to spot trends. It can handle large amounts of behavioural website data and it enables you - when interpreted in the right way - to catch the signal amid all the noise.

BigQuery results vs Google Analytics reports

Ok, now you're querying the GA export data in BigQuery and when you compare the results with the GA user interface, the numbers just don't add up. How is this possible?

First of all: don't worry about it too much. As we've concluded earlier, Google Analytics data is not exactly metaphysics. However, there are some things to keep in mind while querying GA data in BigQuery.

Definitions

The BigQuery export schema contains raw data that is collected and processed by Google Analytics. When you are used to GA reports, you'll soon find out that a lot of metrics don't exist here. You'll have to calculate them based on the definitions that Google provides. Sometimes these are crystal clear, but in a lot of cases the documentation is inconsistent or just absent. The upside is that you have the freedom to create your own definitions. Why use the default metrics when you can customise everything?
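To make the idea of calculating a metric yourself a bit more concrete, here is a minimal Python sketch on a handful of made-up rows. It assumes that engagement rate is defined as engaged sessions divided by total sessions, and that every row already carries a hypothetical `engaged` flag. The field names and the data are invented for illustration and are not the real export schema.

```python
# Toy rows, one per event, already flattened. Field names and values are hypothetical.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 100, "engaged": True},
    {"user_pseudo_id": "u1", "ga_session_id": 100, "engaged": True},
    {"user_pseudo_id": "u2", "ga_session_id": 200, "engaged": False},
    {"user_pseudo_id": "u3", "ga_session_id": 300, "engaged": True},
    {"user_pseudo_id": "u3", "ga_session_id": 301, "engaged": False},
]


def engagement_rate(rows):
    """Engaged sessions / sessions.

    Assumption: a session counts as engaged if any of its events is flagged.
    """
    sessions = {}
    for row in rows:
        key = (row["user_pseudo_id"], row["ga_session_id"])
        sessions[key] = sessions.get(key, False) or row["engaged"]
    if not sessions:
        return 0.0
    return sum(sessions.values()) / len(sessions)


rate = engagement_rate(events)
print(rate)
```

If you pick a different definition of an engaged session, the same rows will produce a different rate. That is exactly the kind of difference you can run into when comparing your own numbers with the user interface.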

Scopes

Dealing with data on different levels can be hard. In GA4 there is user, session, event and item scope data available. Therefore you'll need a basic understanding of how these scopes relate to each other to be able to generate meaningful results. Remember: BigQuery won't tell you if your results are incorrect, as long as your query is valid.
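A small Python sketch of how mixing up scopes gives a valid but wrong answer. It assumes, as I understand the export, that ga_session_id is only unique within a single user, so a session has to be identified by the combination of user_pseudo_id and ga_session_id. The rows below are made up for the example.

```python
# Toy event rows. Two different users happen to share the same ga_session_id.
events = [
    {"user_pseudo_id": "u1", "ga_session_id": 1700000000, "event_name": "session_start"},
    {"user_pseudo_id": "u1", "ga_session_id": 1700000000, "event_name": "page_view"},
    {"user_pseudo_id": "u2", "ga_session_id": 1700000000, "event_name": "session_start"},
    {"user_pseudo_id": "u2", "ga_session_id": 1700000500, "event_name": "session_start"},
]

# Event scope: every row is one event.
event_count = len(events)

# User scope: distinct users.
user_count = len({e["user_pseudo_id"] for e in events})

# Session scope, naive: distinct session ids only (undercounts here).
naive_sessions = len({e["ga_session_id"] for e in events})

# Session scope, keyed on user plus session id.
sessions = len({(e["user_pseudo_id"], e["ga_session_id"]) for e in events})

print(event_count, user_count, naive_sessions, sessions)
```

Both session counts come from perfectly valid code, but only the second one treats the two users' sessions as separate sessions.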

Query logic

'Show me your queries and I'll tell you who you are'. Even when all definitions are clear and you completely understand the ins and outs of scopes in Google Analytics, the results can differ depending on the author of the query. A rule of thumb: keep it simple, stupid. In some cases, however, this is easier said than done.

💡

As Google Analytics developer advocate Minhaz Kazi explains in this official Google article, we can't expect the results between the GA4 user interface and the BigQuery export to match, because they serve different use cases: ' (...) the standard reporting surfaces and the BigQuery export data aren't expected to be reconcilable (...)'.

Examples of GA4 specifics

Google Analytics 4 reports use a definition of active users (users who are currently engaged). An additional field is_active_user ~~will become available in the near future~~ is available.
The user interface (and API) show estimations of sessions, not the exact count of unique session ids. Here is how to replicate this approximation.
When replicating the session source/medium report in the GA4 user interface, it is nearly impossible to get the numbers to match. ~~There is no source / medium information in the session_start event.~~ (update: event parameters for session_start and first_visit were added early November, 2023) Also the session attribution used in the GA4 user interface is still a black box.
Data exported to BigQuery might show more users when compared with GA4 property reports based on Google Signals data.
Is it a bug or a feature? With GA4 you never know. This community-powered GA4 bug list also contains a few BigQuery related issues, like batch event timestamps, missing session-scope traffic source information, and the infamous google/cpc misattributed as google/organic when a gclid is available.
Also see this informational GA4 support article: make sure timezones match and check if any data streams or events are excluded from the BigQuery export.
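To illustrate the timezone point from the list above, here is a minimal Python sketch. It assumes that event_timestamp in the export is a number of microseconds in UTC, and it uses a hypothetical property timezone two hours ahead of UTC. The offset and the example timestamp are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the property reports in a timezone two hours ahead of UTC.
PROPERTY_TZ = timezone(timedelta(hours=2))


def report_date(event_timestamp_micros, tz):
    """Return the calendar date (YYYYMMDD) of an event in the given timezone."""
    utc_time = datetime.fromtimestamp(event_timestamp_micros / 1_000_000, tz=timezone.utc)
    return utc_time.astimezone(tz).strftime("%Y%m%d")


# An event at 2023-11-01 23:30:00 UTC, expressed in microseconds.
ts = int(datetime(2023, 11, 1, 23, 30, tzinfo=timezone.utc).timestamp() * 1_000_000)

utc_date = report_date(ts, timezone.utc)
local_date = report_date(ts, PROPERTY_TZ)
print(utc_date, local_date)
```

The same event lands on a different day depending on the timezone you group by, so daily totals will only line up if your query uses the same timezone as the property.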

Now it's your turn!

I hope you've enjoyed this and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.
