18 - Happy New Year & BigQuery updates from the latter end of 2025
Hi Folks,
First and foremost, I would like to hereby wish you a Blessed, Successful, Prosperous, Healthy and Happy New Year!
Let's recap what happened on the BigQuery front last November and December.
New on GA4BigQuery
The year end was extremely busy, so only one new article was published: a comprehensive study on controlling BigQuery costs under on-demand billing. It covers how to put some guardrails in place (this part is free) and various ways to reduce the cost of storing and querying large amounts of data. Although it is a single article, it contains a lot of information and is by far the longest one on the platform (I just did not want to split it up for the sake of it).
The LinkedIn summary of the article sparked a lot of great discussion, so stay tuned for an update as I'll be adding more sections to it, to keep everything in the same place.
Further AI (and other) functions released
Google is churning out new AI functions with lightning speed:
AI.DETECT_ANOMALIES: you supply trend data for "training" (or comparison) and then a second dataset, in which the model finds the points or intervals that do not fit the usual "trend". SQL example in this excellent LinkedIn post.
AI.EMBED: turn text, image, audio, video, or documents into embeddings.
AI.SIMILARITY: calculate similarity between pairs of text, pairs of images, or across text and images. See this post that shows how this and the above functions hang together.
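To make the embedding and similarity idea a bit more tangible, here is a minimal Python sketch. It assumes the similarity score is a cosine similarity between two embedding vectors, which is a common choice; check the documentation for what AI.SIMILARITY actually computes. The function name and the tiny three-dimensional "embeddings" are made up for illustration, and nothing here calls BigQuery.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
red_shoes = [0.9, 0.1, 0.0]
crimson_sneakers = [0.8, 0.2, 0.1]
tax_form = [0.0, 0.1, 0.9]

close_pair = cosine_similarity(red_shoes, crimson_sneakers)
distant_pair = cosine_similarity(red_shoes, tax_form)
```

With these made-up vectors, the two shoe descriptions score much higher than the shoe vs tax form pair, which is the behaviour you would expect from a similarity function working on embeddings.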
There are things happening in the good old, boring data engineering world as well: the JSON_FLATTEN function is now available to unpack consecutively nested arrays. This can work wonders if you ingest custom event-level data from a service (e.g. a Kafka topic or a CDP), but won't have much use in the GA4 export.
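For readers who have not met "consecutively nested arrays" before, the Python sketch below illustrates the idea on a small payload. It is not the BigQuery function and makes no claim about its exact semantics: it assumes that arrays nested directly inside arrays are unpacked, while objects are kept whole. The helper name and the sample payload are hypothetical.

```python
import json


def flatten_nested_arrays(value):
    """Return the non-list values found in `value`, unpacking lists that are
    nested directly inside other lists. Dicts are kept whole (assumption)."""
    if not isinstance(value, list):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten_nested_arrays(item))
    return flat


# Hypothetical event payload where items arrive as arrays of arrays.
payload = json.loads('[[{"sku": "A1"}, {"sku": "B2"}], [[{"sku": "C3"}]]]')
items = flatten_nested_arrays(payload)
```

After flattening, `items` is a single flat list of the three item objects, which is the shape you would typically want before unnesting into rows.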
New tab: Scheduling
I am not sure it's a 2025 update, as I literally noticed it yesterday, but I'm including it nevertheless. The brand new “Scheduling” tab lists all scheduled assets in a single window, across all the different types of pipelines (scheduled queries, Dataform, Notebooks, Data Prep, etc.). It also shows, for each flow, whether the last few runs were successful (possibly inspired by Airflow's DAG grid view, though this one is nowhere near as elaborate).
New Scheduling tab in BigQuery.
It can become really valuable when an org uses multiple Dataform repositories, as they can now view all their executions in the same place. In many ways it behaves like the old Scheduled Queries tab, though: you can click into any job to view all its previous runs, and clicking the “Edit” button there takes you to the appropriate place, i.e. an SQL editor window or the corresponding Dataform workspace (releases & scheduling tab).
Data Transfer updates
While we're all busy analyzing the typical digital marketing data sources, BigQuery is also positioning itself in the cloud data warehousing space by allowing new migrations from on-prem databases and CRM systems (all the below are in Preview):
Other BigQuery updates
A "query text heatmap" is available in the execution graph, highlighting the exact query text that contributes to high slot consumption. This helps optimize queries, especially when you're on slot-based pricing. The example below joins Google Ads click (GCLID) data to GA4 as a preliminary step before channel grouping (article here). A longer summary is available in this LinkedIn post.
Query text heatmap example in execution graph
If you switch on Gemini for Google Cloud, you can use the "Data Insights" feature to generate metadata, i.e. table and column descriptions, questions the table can answer, and the SQL to answer them. The novelty is that you can now save and publish the generated information, which gives you first-pass documentation that is a lot better than nothing. Note that switching on Gemini incurs some cost, so do consult the documentation.
Community posts around BigQuery
The Premium version of GA4Dataform now processes a selection of Google Ads tables when the transfer is switched on. (Note that premium GA4BigQuery subscribers can avail of a discount!) It also adds session totals metrics to the session table, which opens up opportunities to reduce query cost. The authors also published an example use case that plots daily spend vs conversions to get a first-pass view of diminishing returns (as an entrance gate to marketing mix modelling).
A very nice Medium article about migrating from dbt to Dataform (why and how). Note that although the linked article itself is paywalled, it contains a link to the free version of the same article.
A code example for using the rather complicated MATCH_RECOGNIZE clause (feature reported in newsletter #14)
A tip in a MeasureSlack discussion on scrubbing data in selected event parameters (e.g. when GDPR requires it). Note that you have to be a member of the MeasureSlack workspace to see it (which is highly recommended, of course).
Some gotchas on the previously released AI.FORECAST function
A clever piece of analysis for "word proximity" using regex, on an e-commerce product feedback example (to find records with words that occur close to each other)
And last, but not least...
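Returning to the "word proximity" item in the list above, the general idea can be sketched in a few lines. This is a Python illustration under my own assumptions, not the query from the linked analysis: the function name, the `max_gap` parameter and the sample reviews are hypothetical, and the pattern uses Python's `re` module rather than BigQuery's regex functions.

```python
import re


def words_within(text, first, second, max_gap=3):
    """True if `first` and `second` occur with at most `max_gap` words
    between them, in either order (case-insensitive)."""
    a, b = re.escape(first), re.escape(second)
    # Up to `max_gap` intervening words, then the separator before the next word.
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    pattern = rf"\b{a}\b{gap}\b{b}\b|\b{b}\b{gap}\b{a}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical product feedback records.
near = words_within("The battery died after two days", "battery", "died")
far = words_within("Battery life is great, but the screen died", "battery", "died")
```

Here `near` is true because the two words are adjacent, while `far` is false because six words separate them; raising `max_gap` to 6 would make the second review match as well.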
Other relevant blog posts from the community
This section is heavily "filtered", as so many cool things were shared in the past two months. I'm also categorizing them here (which I haven't done before), because two months of updates are a bit hard to navigate.
Marketing Measurement
SegmentStream held a bold but very sharp & deep Marketing Measurement course in Nov & Dec. Now it's available as a self-paced learning package. It is paid, but it's definitely worth listening to, even if you don't agree with all their provocative statements on social media. (Edit: coupon code for $30 discount: MMO25-MJAV)
Meridian (Google's marketing mix measurement tool) now includes a new "scenario planner".
A video walkthrough on Google's geo experimentation package, "Matched Markets"
Strategy
An article about "Brand moments": taking CRO to a more strategic and impactful level.
OWOX has enabled the Snowflake destination for their free connectors (e.g. Google Ads, Facebook, LinkedIn). OWOX is disrupting the ELT space with their free connectors, so it's worth keeping an eye on what they are up to.
Other stuff
My favourite is this CDP simulator, if you are into that kind of thing (personalization, omnichannel marketing and real-time interactions). It is a very good illustration of what such a solution looks like and delivers. I think experts would also enjoy it. LinkedIn post summary here.
— Balazs Vajna
Digital Analytics is as vibrant as ever. Here's to a strong 2026! 🥂
Best regards,
Balazs