A true SQL delicacy – max_by
This section describes handy SQL features and functions in BigQuery. This article covers the MAX_BY() function to aggregate data more easily.
Editor's note: this post is written by Tamas Ujhelyi—follow him for handy SQL tips on BigQuery and Snowflake.
If you love SQL delicacies, you'll be a fan of this.
I've used it a lot recently in my job, and am I grateful it exists!
It's called: 𝑚𝑎𝑥_𝑏𝑦
, a convenient function in BigQuery.
It helps with a common task: getting a column's value based on the maximum value of another column.
Think of getting users' most recent order IDs.
Or the last comments under a webshop's products.
Or the last event_name
of an app user.
Normally, you could do that with a row_number()
window function (or some other way).
But max_by
isn't called a convenient function for nothing.
You can achieve the very same result by just inputting two column names:
𝑚𝑎𝑥_𝑏𝑦(𝑐𝑜𝑙𝑢𝑚𝑛_𝑦𝑜𝑢_𝑛𝑒𝑒𝑑, 𝑏𝑎𝑠𝑒𝑑_𝑜𝑛_𝑜𝑡ℎ𝑒𝑟_𝑐𝑜𝑙𝑢𝑚𝑛'𝑠_𝑚𝑎𝑥_𝑣𝑎𝑙𝑢𝑒)
So getting users' last orders would be:
𝑚𝑎𝑥_𝑏𝑦(𝑜𝑟𝑑𝑒𝑟_𝑖𝑑, 𝑜𝑟𝑑𝑒𝑟𝑒𝑑_𝑎𝑡)
(Of course, combine it with group by user_id
.)
It's cases like this where I find the more I learn about SQL, the more it fascinates me.
Anyways, I hope you'll find it as useful as I have.
And have an awesome day, okay? ✌