#bigquery


We (BBC) have a telemetry system to measure media streaming performance, and a while back we ran an experiment to slurp off a copy of those metrics (rebuffering, media type, bit rates etc.) into my log pipeline, which flows into BigQuery.
Yesterday I cranked up the sample rate to BigQuery from 25% to 100% so we have a resilient system in case of incidents for Glastonbury, Wimbledon & the Euros (plus BAU content).
It's definitely bumped the log rate, maybe an extra billion lines today.
#BBC #BigQuery
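
Not the BBC's actual code, just a minimal sketch of the idea: a configurable sample rate decides whether a given metrics payload gets copied into the log pipeline, so moving from 25% to 100% is a config change rather than a code change (all names here are made up).

```typescript
// Hypothetical sketch of a sample-rate gate (not BBC code; names invented).
// LOG_SAMPLE_RATE=0.25 copies ~25% of payloads; 1.0 copies everything.
const SAMPLE_RATE = Number(process.env.LOG_SAMPLE_RATE ?? "0.25");

interface PlaybackMetrics {
  rebufferMs: number;
  mediaType: string;
  bitrateKbps: number;
}

async function maybeCopyToLogPipeline(
  metrics: PlaybackMetrics,
  forward: (m: PlaybackMetrics) => Promise<void>, // the hop into the BigQuery-bound pipeline
): Promise<void> {
  // Per-event sampling: at a rate of 1.0 every event flows through.
  if (Math.random() < SAMPLE_RATE) {
    await forward(metrics);
  }
}
```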

How messy is #terraform with #gcp?

I'm trying to build a system where a #rust worker ingests HTTP request data via #cloudrun and writes it into #bigtable, which then uses a further ingestion recipe to export the data into #bigquery.

I have tried to write a complete Terraform declaration for this, but ran into permission issues. Then I tried a setup that generates all the artefacts (service accounts, the Docker image) first and refers to those from the Terraform builds, but I hardly see the value of doing it that way.

Does anyone have an example of #cloudrun #cdc? I am new to this and I feel really slow.
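
The worker in the post is Rust, but just to show the shape of the Cloud Run → Bigtable hop, here is a rough sketch using the Node.js client (@google-cloud/bigtable); instance, table and column-family names are placeholders and the error handling is minimal.

```typescript
// Sketch only: a Cloud Run HTTP handler writing each request into Bigtable.
// The post's worker is Rust; this uses the Node.js client purely to illustrate
// the Cloud Run -> Bigtable hop. Instance/table/column-family names are placeholders.
import express from "express";
import {Bigtable} from "@google-cloud/bigtable";

const app = express();
app.use(express.json());

const bigtable = new Bigtable(); // authenticates as the Cloud Run service account
const table = bigtable.instance("ingest-instance").table("http_requests");

app.post("/ingest", async (req, res) => {
  try {
    const rowKey = `${req.path}#${Date.now()}`; // naive key; real keys should spread write load
    await table.insert([
      {
        key: rowKey,
        data: {
          // "req" is a placeholder column family that must already exist on the table
          req: {
            method: req.method,
            path: req.path,
            body: JSON.stringify(req.body ?? {}),
          },
        },
      },
    ]);
    res.status(204).end();
  } catch (err) {
    console.error("bigtable insert failed", err);
    res.status(500).send("write failed");
  }
});

app.listen(Number(process.env.PORT) || 8080);
```

If the permission problems are on the worker side rather than the export side, the Cloud Run runtime service account generally needs roles/bigtable.user on the instance, and that IAM binding can live in the same Terraform configuration as the service itself.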

GA4 intraday exports and cookieless pings

I build a lot of reports for clients that use the GA4 Big Query export as their source.

Now.. that works like a charm. But.. you will need to wait some time to get processed data from the events_ tables.

More recent data will appear in the streaming _intraday_ tables, if you have that enabled. But.. that data is not always complete! Especially when your site has consent mode enabled, and does not set a cookie until after consent.
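
For reference, a common way to query fresh and processed data together is to UNION the daily and intraday tables. A sketch with placeholder project and dataset IDs (note that the events_* wildcard also matches the intraday tables, hence the suffix filter):

```typescript
// Sketch: combine processed daily tables with the streaming intraday tables.
// Replace my-project / analytics_123456789 with your own export dataset.
import {BigQuery} from "@google-cloud/bigquery";

const bq = new BigQuery();

const query = `
  SELECT event_date, event_name, user_pseudo_id
  FROM \`my-project.analytics_123456789.events_*\`
  WHERE _TABLE_SUFFIX NOT LIKE 'intraday_%'  -- processed daily tables only
  UNION ALL
  SELECT event_date, event_name, user_pseudo_id
  FROM \`my-project.analytics_123456789.events_intraday_*\`
`;

async function main() {
  const [rows] = await bq.query({query});
  console.log(`fetched ${rows.length} events (processed + intraday)`);
}

main().catch(console.error);
```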

Here’s how it works:

The scenario

Someone visits the site for the first time (source: some campaign), gets confronted with the cookie banner, and then clicks accept.

We tagged the site correctly, so this is what happens:

  1. a page_view event fires (with URL parameters) – and the tracker notices analytics consent is denied (the default)
  2. the tracker attaches some parameters to this hit, to help processing
    • a session is started
    • this is the first visit
  3. there is an item list on the page: a view_item_list event is triggered
  4. the cookie banner pops up (event: cookiebar_view)
  5. the visitor clicks accept (event: cookiebar_accept) and the tracker gets sent a granted signal
  6. now the cookie can be used, and it is attached to the automatic user_engagement event
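
In gtag terms, the consent side of that flow looks roughly like the sketch below: analytics_storage defaults to denied and is flipped to granted when the visitor accepts. This is a generic Consent Mode sketch, not the site's actual tag configuration.

```typescript
// Generic Google Consent Mode sketch (not the site's actual tag setup).
declare function gtag(...args: unknown[]): void;

// Default: analytics consent denied until the visitor decides.
gtag("consent", "default", {
  analytics_storage: "denied",
});

// Called when the visitor clicks "accept" on the cookie banner.
function onCookiebarAccept(): void {
  // The tracker now receives the "granted" signal; subsequent hits
  // (like the automatic user_engagement event) carry the cookie.
  gtag("consent", "update", {
    analytics_storage: "granted",
  });
}
```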

Sounds simple. Now, let’s see what is streamed into Big Query:

The streaming data gap

Basically, the intraday tables store what happens, as it happens.

  • cookie field (user_pseudo_id) is filled in on hits on/after consent
  • cookie field is NULL for hits before consent

As it should be, right? But there’s a third bullet:

  • first batch of events will not appear in the intraday table!

Here’s what we see (most recent hit first, read from bottom to top):

  1. the page_view is missing in the streaming table
  2. the collected_traffic_source information is missing (it is only ever filled in on the first batch of events)
  3. As a byproduct, we also do not see the session_start and first_visit events
  4. the other events are all sent without a cookie
  5. after consent, we see the user_pseudo_id – finally
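
The gap is easy to confirm with a quick query against today's intraday table (project and dataset IDs below are placeholders): pre-consent events show up with a NULL user_pseudo_id, and the first batch simply isn't there.

```typescript
// Sketch: inspect today's intraday table for pre-consent hits.
// Placeholders: my-project / analytics_123456789.
import {BigQuery} from "@google-cloud/bigquery";

const bq = new BigQuery();

const query = `
  SELECT
    event_name,
    COUNT(*) AS events,
    COUNTIF(user_pseudo_id IS NULL) AS events_without_cookie,
    COUNTIF(collected_traffic_source.manual_source IS NOT NULL) AS events_with_source
  FROM \`my-project.analytics_123456789.events_intraday_*\`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  GROUP BY event_name
  ORDER BY events_without_cookie DESC
`;

async function main() {
  const [rows] = await bq.query({query});
  console.table(rows);
}

main().catch(console.error);
```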

The next day.. Google has glued it all together

Processed data: every event has a row

The following is in the processed data (most recent hit first, read from bottom to top):

  • The page_view event and all other events leading up to the consent have a cookie attached to them! Google rescued that information
  • the parameters attached to the hit (step 2 above) expand into two extra rows:
    • session_start
    • first_visit
  • we have source information: collected_traffic_source is present – on the first batch, as normal

Not visible in the screenshot: session_traffic_source_last_click – the session information is properly filled in.
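
Running the same kind of check against the previous day's processed table shows the repair: the session_start and first_visit rows exist, and the pre-consent events now carry the cookie (placeholder project and dataset IDs again).

```typescript
// Sketch: the same check against yesterday's processed daily table.
import {BigQuery} from "@google-cloud/bigquery";

const bq = new BigQuery();

const query = `
  SELECT
    event_name,
    COUNT(*) AS events,
    COUNTIF(user_pseudo_id IS NULL) AS events_without_cookie
  FROM \`my-project.analytics_123456789.events_*\`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND event_name IN ('page_view', 'session_start', 'first_visit', 'user_engagement')
  GROUP BY event_name
`;

async function main() {
  const [rows] = await bq.query({query});
  console.table(rows); // events_without_cookie should be (close to) zero here
}

main().catch(console.error);
```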

The consequences

If you decide to use intraday tables in your Big Query reports, be aware that although the information is fresh (no pun intended, GA360 users), it's incomplete:

  • intraday misses crucial events, namely the first batch (most often a page_view)
    • bye bye landing_page reports based on page_views
    • bye bye traffic source reports based on session_traffic_source_last_click or collected_traffic_source
  • intraday misses cookies on some events
    • which is not too much of an issue, really

Your experiences?

Do you use intraday tables in your models? Have you found clever workarounds to get the correct data in?

Let me know! Drop a comment here, or send me a bluesky message!

Still here?

Check out GA4Dataform – a product I’ve helped build that turns the GA4 Big Query exports into usable tables!

Related posts:

  • Google Analytics 4 truncates page location
  • Making sense of Event Parameters in GA4
  • Make your GA4 life easier: Some powertips!
  • Smart incremental GA4 tables in Dataform

🚀 DataTalksClub's Data Engineering Zoomcamp Week 3 - BigQuery as a data warehousing solution.

🎯 For this week's module, we used Google's BigQuery to read Parquet files from a GCS bucket and compared querying on regular, external, and partitioned/clustered tables.

🔗 My answers to this module: github.com/goosethedev/de-zoom
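
For anyone following along, the week 3 table types boil down to DDL along these lines; bucket, dataset, table and column names are placeholders rather than the actual Zoomcamp resources.

```typescript
// Sketch: external table over Parquet in GCS, then a partitioned/clustered copy.
// Bucket, dataset, table and column names are placeholders.
import {BigQuery} from "@google-cloud/bigquery";

const bq = new BigQuery();

const ddl = `
  -- External table: queries read the Parquet files straight from the bucket.
  CREATE OR REPLACE EXTERNAL TABLE \`my-project.zoomcamp.trips_external\`
  OPTIONS (
    format = 'PARQUET',
    uris = ['gs://my-zoomcamp-bucket/trips/*.parquet']
  );

  -- Native table, partitioned by date and clustered for cheaper, faster filters.
  CREATE OR REPLACE TABLE \`my-project.zoomcamp.trips_partitioned\`
  PARTITION BY DATE(pickup_datetime)
  CLUSTER BY vendor_id AS
  SELECT * FROM \`my-project.zoomcamp.trips_external\`;
`;

async function main() {
  // Multi-statement script: both DDL statements run in one job.
  await bq.query({query: ddl});
}

main().catch(console.error);
```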

FFS. Turns out (after I built a feature) that you can't supply a schema for BigQuery Materialised Views.

> Error: googleapi: Error 400: Schema field shouldn't be used as input with a materialized view, invalid

So it's impossible to have column descriptions for MVs? That sucks.
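
The only description that seems to be supported is at the view level, via OPTIONS in the DDL – which doesn't give per-column descriptions. A hedged sketch, assuming description is accepted in the materialized-view option list (all names are placeholders):

```typescript
// Hedged sketch: a *view-level* description set via OPTIONS in the DDL.
// Assumption: description is accepted in the materialized-view option list.
// This does not provide per-column descriptions (that's what the API rejects).
// All project/dataset/table/column names are placeholders.
import {BigQuery} from "@google-cloud/bigquery";

const bq = new BigQuery();

const ddl = `
  CREATE MATERIALIZED VIEW IF NOT EXISTS \`my-project.logs.mv_daily_counts\`
  OPTIONS (description = 'Daily row counts per source (view-level description only)')
  AS
  SELECT source, DATE(ts) AS event_day, COUNT(*) AS row_count
  FROM \`my-project.logs.raw_events\`
  GROUP BY source, event_day
`;

async function main() {
  await bq.query({query: ddl});
}

main().catch(console.error);
```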

Whilst migrating our log pipeline to use the BigQuery Storage API & thus end-to-end streaming of data from Storage (GCS) via Eventarc & Cloud Run (read, transform, enrich - NodeJS) to BigQuery, I tested some big files, many times the largest we've ever seen in the wild.

It runs at just over 3 log lines/rows per millisecond end-to-end (i.e. including writing to BigQuery), roughly 3,000 rows per second, measured over 3.2M log lines.

Would be interested to know how that compares with similar systems.
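
For comparison, the read/transform half of such a pipeline tends to look something like the sketch below: the GCS object is streamed line by line, each line is enriched, and batches are handed to whatever BigQuery append path is in use (left abstract here, since the post uses the Storage Write API). All names are placeholders.

```typescript
// Sketch of the read/transform half: stream a GCS object line by line,
// enrich each line, and hand batches to whatever BigQuery append path you use.
// Names are placeholders; the actual write step (Storage Write API) is omitted.
import {Storage} from "@google-cloud/storage";
import * as readline from "node:readline";

const storage = new Storage();

async function processLogObject(
  bucket: string,
  objectName: string,
  writeBatch: (rows: object[]) => Promise<void>, // e.g. a Storage Write API append
): Promise<number> {
  const stream = storage.bucket(bucket).file(objectName).createReadStream();
  const lines = readline.createInterface({input: stream, crlfDelay: Infinity});

  const batch: object[] = [];
  let total = 0;

  for await (const line of lines) {
    if (!line) continue;
    // "Transform, enrich": parse and add whatever derived fields are needed.
    batch.push({raw: line, ingested_at: new Date().toISOString(), source: objectName});
    total += 1;

    if (batch.length >= 5000) {
      await writeBatch(batch.splice(0, batch.length));
    }
  }
  if (batch.length > 0) await writeBatch(batch);
  return total;
}
```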

After several iterations, I think I've finally got my log ingest pipeline working properly, at scale, using the #BigQuery Storage API.
Some complications with migrating from the "legacy" "streaming" API (it's not streaming in the code sense) have been really hard to deal with, e.g.:
* A single failed row in a write means the entire write fails
* SQL column defaults don't apply unless you specifically configure them to
* 10MB/write limit
I rewrote the whole thing today & finally things are looking good! 🤞
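
On the 10MB/write limit specifically, a size-capped batching helper usually does the job. A rough sketch – the byte estimate from serialized JSON is approximate, and append stands in for whatever Storage Write API wrapper is in use:

```typescript
// Sketch: split rows into appends that stay under the ~10MB per-request limit.
// The size estimate (serialized JSON length) is rough, hence the headroom.
// "append" stands in for your Storage Write API call.
const MAX_APPEND_BYTES = 9 * 1024 * 1024; // leave headroom under the 10MB limit

export async function appendInChunks(
  rows: object[],
  append: (chunk: object[]) => Promise<void>,
): Promise<void> {
  let chunk: object[] = [];
  let chunkBytes = 0;

  for (const row of rows) {
    const rowBytes = Buffer.byteLength(JSON.stringify(row), "utf8");
    if (chunk.length > 0 && chunkBytes + rowBytes > MAX_APPEND_BYTES) {
      await append(chunk);
      chunk = [];
      chunkBytes = 0;
    }
    chunk.push(row);
    chunkBytes += rowBytes;
  }
  if (chunk.length > 0) {
    await append(chunk);
  }
}
```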