Duplicate content when synchronizing Aurora MySQL to BigQuery

aurora
bigquery
mysql
#1

Hi there

I use Stitch to send data to BigQuery, and it appears that on some tables I get duplicate rows.

For instance, I have a table user, and in this table the email column is unique.

if I run
SELECT count(*) FROM user GROUP BY email ORDER BY 1 DESC

on my database, I get only 1 user for each email;
in BigQuery, I get 970 users for 1 email.

it seems the column “_sdc_extracted_at” is creating duplicates…

Does anyone know how to fix that?

Thanks,
Victor

#2

Hey Victor,

I know we just discussed this through Stitch Support, though for the reference of other community members:

Stitch’s BigQuery destination functions in an append-only manner, where any records extracted are appended to the destination table and no de-duplication occurs during loading.

With that in mind, users can account for this duplication in their destination queries, and there’s a guide to querying append-only tables available on this page that includes examples of how users can grab the latest version of each record.
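For anyone landing here later, a rough sketch of that "latest version of each record" pattern in BigQuery standard SQL could look like the query below. The dataset name `my_dataset` and the primary key column `id` are placeholders I made up for illustration. I'm also assuming the table has Stitch's `_sdc_sequence` system column to order versions by; if yours doesn't, ordering by `_sdc_extracted_at` (the column mentioned above) may work as a substitute, though ties are possible.

```sql
-- Sketch only: my_dataset and id are hypothetical names, adjust to your schema.
-- Keeps one row per primary key: the most recently replicated version.
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    -- Number the versions of each record, newest first.
    -- Assumes the _sdc_sequence system column is present in the table.
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY _sdc_sequence DESC
    ) AS rn
  FROM `my_dataset.user`
)
WHERE rn = 1;
```

Saving something like this as a view lets downstream queries read from the de-duplicated result instead of the raw append-only table. Running the original `GROUP BY email` count against that view should then match what you see in MySQL.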