Importing GA to postgresql, inconsistencies when comparing results

Hello,

I’m using stitch to import data from google analytics to a postgresql database. The import works, but when I compare the results from querying postgresql and GA directly, the results differ.

I’m testing the metric ‘ga:newUser’ with the following query in postgresql:
SELECT COUNT(report.“newUsers”)
FROM google_user_metrics_2.report as report
WHERE
report.“newUsers” = 1
AND report.“start-date”::date = ‘{{start date}}’

Things I’ve tried and checked

  • deleting the integration and creating a new one
  • same ga keys are being used
  • different conversions for the start-date, tried with ranges as well
  • testing a different metric, ‘ga:users’ also gives inconsistent results
  • checked the sdc_rejected table for entries, it’s empty

So I’m wondering, are there other things I could check and what could cause these inconsistencies?

Hi there,

There are a few common causes for discrepancies of this nature when working with Google Analytics data:

  1. Stitch’s Google Analytics integration requests reports from Google’s Core Reporting API at the daily level, where the start-date and end-date are the same value (eg start-date: 2019-01-01; end-date: 2019-01-01). As a result, if comparisons are made against reports within Google’s UI that use a wider date range (eg start-date: 2019-01-01; end-date: 2019-01-02), then the input for these reports would differ and could wholly account for the variance you’re seeing, even if you’re still segmenting by date within those reports.

  2. Due to Google’s data sampling thresholds, if the volume of sessions for the combined metrics and dimensions exceeds 500,000, then any reports returned for these properties will be subject to data sampling in both the Google Analytics UI and from their Core Reporting API. The impact is similar to that of the issue above, where in this case the input for both reports could come from different samples of the source data, and consequently yield different results.

  3. If you’re working with custom dimensions, it’s also possible that NULL values for these dimensions are resulting in any lines that would include NULL values being excluded from the Core Reporting API’s results (where they are conversly included in the UI results). In order to rule this out, you would want to export any custom flat-table reports you’ve built within the Google Analytics UI to compare those exported results against Stitch-replicated data. This page explains this distinction in a bit more detail, which may be helpful to review if you’re working with custom dimensions.

If none of these necessarily accounts for the discrepancy you’re seeing, Stitch’s support team will be able to assist as long as you’re able to provide row-level examples of the discrepancy you’re seeing, and we outline the information that we’ll need to assist in the Contacting Support section of our Data Discrepancy Troubleshooting Guide. You’ll be able to reach Stitch support via in-application chat or via email at support@stitchdata.com

Thanks for your detailed reply, in my case it was point 1. When comparing the values from a single day (using the same start date and end date in GA) the values correspond.