#2 - SQL query support from ChatGPT, GA4 count approximation demystified
You are reading the second GA4BigQuery newsletter.
The response to the relaunch of the platform has been amazing. Every month we welcome more unique website visitors and the amount of subscribers (both free and premium) has been overwhelming.
My goal is still the same: provide valuable content to data analysts, marketeers and engineers who seek guidance on how to query the GA4 BigQuery export data set.
New content on GA4BigQuery
Since the last newsletter I've added 5 new articles:
- How to create a custom user segment based on a sequence of GA4 events in BigQuery
- How to (back)stitch your custom user id to GA4's client id in BigQuery to enable cross-device analysis
- How to create a pivot table with GA4 data in BigQuery
- How to sessionize your GA4 event data in BigQuery
- part 1: default 30-minute session timeout definition
- part 2: custom session definitions based on user action
ChatGPT vs. GA4 data export
Did you try ChatGPT? The natural language model that shocked the tech world does seem to have some knowledge about GA4, but doesn't (yet) handle requests that involve the GA4 BigQuery export. When asked, it will likely present you queries based on the Universal Analytics (GA3) data export model. This is understandable, because the ChatGPT model is only trained until 2021.
However, when you ask ChatGPT to support you in creating queries for the Firebase data export (GA4 has adopted the Firebase event model), it shows a glimpse of the future.
It might still be far from flawless in the specific context of GA4 data, but it can be a very helpful tool to fix general syntax problems in your SQL code.
GA4's unique count approximation demystified
One of the reasons that your BigQuery results don't match with the GA4 user interface is the usage of count approximation in the latter.
You can imagine I was very pleased to see a Google official source actively sharing a detailed blog post (reload the page if you get the message
Service Unavailable) about this topic. It includes instructions on how to replicate those counts in BigQuery, using the HLL++ algorithm.
Using this approach will make your results in BigQuery less precise, but if you need your tables to match with the GA4 user interface (or API), at least now you know how to accomplish that. One can only hope more in-depth articles will follow!
Relevant blog posts from the community
- Why are my intraday table timestamps set in the future when I query them?
- A step-by-step guide to training a machine learning model using GA4 and BigQuery ML
- How do I trigger a scheduled query when the GA4 daily export happens?
- Why the death of sessions is a myth
- Dataform setup and gentle intro based on GA4 BigQuery events
That's it for now. Thanks for reading and happy holidays!