Skip to main content

GA4 | tips & tricks

How to query streaming GA4 export data and combine it with historical data in BigQuery

Businesses are finding it important to analyse data in near real-time in order to make better decisions. One way to do this is by using BigQuery streaming export to combine GA4 intraday tables with daily tables, which allows for up-to-date insights.

Analysing data in near real-time is becoming increasingly important for businesses to make the best decisions for their organisation. To get the most up-to-date insights, combining intraday tables with daily tables in Google Analytics export data in BigQuery can be a practical use case.

BigQuery streaming export (if configured in the linking process) makes data for the current day available within a few minutes.

For each day, streaming export creates one new table: events_intraday_YYYYMMDD. An internal staging table that includes records of session activity that took place during the day. Streaming export is a best-effort operation and may not include all data for reasons such as the processing of late events and/or failed uploads. Data is exported continuously throughout the day. This table can include records of a session when that session spans multiple export operations. This table is deleted when events_YYYYMMDD is complete. (source)

Limitations

Good to be aware of: the streaming export does not include user-attribution data for new users, such traffic_source.medium. For existing users, their user-attribution data is included but it takes approximately 24 hours to be fully processed. Therefore, it is recommended to not rely on the data from the streaming export and instead obtain user-attribution data from the full daily export.

Example query (intraday only)

Querying an intraday table is fairly simple. In this case we query the table for Jan 15th, 2023.

select
    event_date as date,
    count(distinct user_pseudo_id) as total_users
from
    `ga4bigquery.analytics_250794857.events_intraday_20230115`
group by
    date

Example query (combined)

The easiest way to include intraday data in your query as well as historical data is use the wildcard in combination with a wider _table_suffix filter. With this condition it will include any table in the data set that starts with events_ and contains a date with the format YYYYMMDD.

πŸ’‘
Make sure your data set doesn’t contain any other tables with a title that starts with events_, to prevent those from being included in your query results.

In the following example the current_date() is 2023-01-15. The regexp_extract(_table_suffix,'[0-9]+') in the where clause makes sure we can query the dates we specified in the dynamic date range.

If we don't apply a regexp_replace for the first column then the date value for row 1 would be intraday_20230115 instead of the format of the other dates, e.g. 20230114. This is because the _table_suffix returns all characters after events_.

select
    regexp_replace(_table_suffix,'intraday_','') as date,
    count(distinct user_pseudo_id) as total_users
from
    `ga4bigquery.analytics_250794857.events_*`
where
    regexp_extract(_table_suffix,'[0-9]+') between
    format_date('%Y%m%d',date_sub(current_date(), interval 7 day))
    and format_date('%Y%m%d', current_date())
group by
    date
order by
    date desc

Now it's your turn!

I hope you've enjoyed this and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.