Let's start with some bad news: your Google Analytics reports are not necessarily a representation of reality. Let's assume you have the perfect implementation and can measure all behaviour on your website in detail (which is nearly impossible). Your GA reports would still suck big time. Some of the reasons:
- users using and/or sharing multiple devices
- usage of ad blockers
- browsers implementing tracking protection mechanisms
- arbitrary session timeout settings
- sampling in reports
- cardinality in reports
- usage of count approximation algorithm in reports
With this in mind, it is safe to conclude that Google Analytics is best used as a tool to spot trends. It can handle large amounts of behavioural website data and it enables you - when interpreted in the right way - to catch the signal between all the noise.
BigQuery results vs Google Analytics reports
Ok, now you're querying the GA export data in BigQuery and when you compare the results with the GA user interface, the numbers just don't add up. How is this possible?
First of all: don't worry about it too much. As we've concluded earlier, Google Analytics data is not exactly metaphysics. However, there are some things to keep in mind while querying GA data in BigQuery.
The BigQuery export schema contains raw data that is collected and processed by Google Analytics. When you are used to GA reports, you'll soon find out that a lot of metrics don't exist here. You'll have to calculate them based on the definitions that Google provides. Sometimes these are crystal clear, but in a lot of cases the documentation is inconsistent or just absent. The upside is that you have the freedom to create your own definitions. Why use the default metrics when you can customise everything?
Dealing with data on different levels can be hard. In GA4 there is user, session, event and item data scope data available. Therefore you'll need to have a basic understanding of how these scopes relate to each order, to be able to generate meaningful results. Remember: BigQuery won't tell you if your results are incorrect, as long as your query is valid.
'Show me your queries and I'll tell you who you are'. Even when all definitions are clear and you completely understand the ins and outs of scopes in Google Analytics, the results can differ depending on the author of the query. A rule of thumb: keep it simple, stupid. In some cases, however, this is easier said than done.
Examples of GA4 specifics
- Google Analytics 4 reports use a definition of active users (users who are currently engaged). An additional field
will become available in the near futureis available.
- The user interface (and API) show estimations of sessions, not the exact count of unique session ids. Here is how to replicate this approximation.
- When replicating the session source/medium report in the GA4 user interface, it is nearly impossible to get the numbers to match.
There is no source / medium information in the(update: event parameters for
first_visitwere added early November, 2023) Also the session attribution used in the GA4 user interface is still a black box.
- Data exported to BigQuery might show more users when compared with GA4 property reports based on Google Signals data.
- Is it a bug or a feature? With GA4 you never know. This community powered GA4 bug list also contains a few BigQuery related issues, like batch event timestamps, missing session-scope traffic source information, the infamous
- Also see this informational GA4 support article: make sure timezones match and check if any data streams or events are excluded from the BigQuery export.
Now it's your turn!
I hope you've enjoyed this and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.