Skip to main content

GA4 | tips & tricks

GA4 BigQuery export schema change log (or how to generate your own)

This blog post will guide you through how to keep track of GA4 BigQuery export changes, even when Google doesn't provide an official change log. You'll also learn how to generate your own change log.

Understanding the frequent changes in GA4 BigQuery export schema is crucial to keep your sanity. This blog post will guide you through how to keep track of these changes, even when Google doesn't provide an official change log. You'll also learn how to generate your own change log using a simple query.

Breaking changes

The GA4 BigQuery export schema can be considered a 'work in progress'. New fields are added periodically, which can cause your queries to break because the new fields are not added for historical data. BigQuery will alert you that it doesn't recognize these fields when querying a wider date range.

On top of that, Google is not very transparent about schema changes. That's why keeping track of these changes is crucial for effective data analytics. Below, you'll find an unofficial, but fairly accurate change log for your reference.

Unofficial GA4BigQuery schema change log


events_

  • 2023-10-25 - item.item_params added
  • 2023-07-17 - is_active_user added
  • 2023-05-03 - collected_traffic_source added
  • 2021-06-28 - privacy_info added
  • 2020-03-14 - ecommerce added
  • ~ 2019 - first schema release (based on Firebase export schema)

users_

  • 2023-09-18 - privacy_info added
  • ~ 2023-08 - first schema release

pseudonymous_users_

  • 2023-09-27 - privacy_info added
  • ~ 2023-08 - first schema release

Generate your own change log

Thanks to the COLOMN_FIELD_PATHS view in the INFORMATION_SCHEMA that is available to use in BigQuery, we are actually able to generate a GA4 BigQuery export schema change log for every GA4 property as well.

Use this query and you are able to see per field when it first appeared in your GA4 BigQuery export data set.

💡
Replace <gcp project> with your own project and <ga4 property id> with your GA4 property id.
select
  regexp_replace(table_name,'\\d+','') as table_name,
  field_path,
  min(parse_date('%Y%m%d',regexp_replace(table_name,'^pseudonymous_users_|^users_|^events_',''))) as first_seen,
from
  <gcp project>.analytics_<ga4 property id>.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
where
  regexp_contains(table_name,'^pseudonymous_users_|^users_|^events_\\d+')
group by
  table_name,
  field_path
order by
  first_seen desc,
  table_name

Understanding the query

  • table_name: This identifies the table name.
  • field_path: This identifies the specific field you're looking at.
  • min(parse_date(...)) as date_added: This calculates the first date the field appeared in the dataset.
  • group by field_path: This groups the results by each unique field.
  • order by date_added desc: This orders the results by the date the field was added, with the most recent first.
💡
The oldest dates in your results are not necessarily schema changes. These are the dates your dataset was first created.

Now it's your turn!

I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.