Skip to main content

UA | Universal Analytics

Tutorial: How to query multiple 'ga_session' tables with '_table_suffix'

Every day of Google Analytics data in BigQuery is stored in a separate table. This is how to query multiple days at once.

πŸ’‘
This article is about GA3 - Universal Analytics

Every day of Google Analytics data in BigQuery is stored in a separate table. If you only need data from one day (let's say August 1st, 2016) the from clause in your query will look something like this:

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_20160801

But in most cases you will need to query a larger period of time. Enter _table_suffix. More details on the general usage here, but to be able to query multiple tables of Google Analytics data you only need to look at the example queries I present you here.

Note that you can use static dates, dynamic dates for a rolling period (the last 90 days) or combine static and dynamic dates. It is also possible to include the intraday table in your query.

All tables in data set

This is the easiest one. Just use the asterisk as a wildcard to match any table in the data set that starts with ga_sessions_. Note: this will also include the intraday table (more on this below).

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`

Static date range

When you perform an analysis on a static data range you should use fixed start and end dates. In this example we select August 1st 2016 to August 1st 2017.

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
  _table_suffix between '20160801' and '20170801'

Dynamic date range

In this example we select a rolling last 30 day period: today minus 30 days until and including yesterday.

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
  _table_suffix between format_date('%Y%m%d',date_sub(current_date(), interval 30 day))
  and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))

Combination of static and dynamic dates

We know we have 366 day tables in our sample data set, so we could use a fixed end date here (20170801), but usually I prefer a combination of a fixed start date and a dynamic end date (in this case: today minus one).

If new data is added to our data set it will automatically included in our query. In this example we select the data range August 1st 2016 up to and including yesterday. BigQuery will figure out if the tables actually exist and will only return data for tables that match the conditions and do exist.

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
  _table_suffix between '20160801' 
  and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))

Intraday tables

For every Google Analytics view that is exported to BigQuery, a ga_sessions_intraday_ table will be exported multiple times a day as well. Notice the different format for historical data (yesterday and beyond):

ga_sessions_yyyymmdd

versus intraday:

ga_sessions_intraday_yyyymmdd

The easiest way to include intraday data in your query as well as historical data is use the wildcard in combination with a wider _table_suffix filter. With this condition it will include any table in the data set that starts with ga_sessions_ and contains a date with the format yyyymmdd.

select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
  regexp_extract(_table_suffix, r'[0-9]+') between format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
  and format_date('%Y%m%d', current_date())

Note: make sure your data set doesn’t contain any other tables with a title that starts with ga_sessions_!