Redshift integration (Source)

Hi Folks,

Just looking to source some data from a Redshift instance, there was no specific Redshift integration, so I tried selecting a Postgres DB. I configured everything, but during validation the script failed:
2020-10-07 04:38:29,641Z tap - INFO Running DB discovery: SELECT datname
2020-10-07 04:38:29,641Z tap - FROM pg_database
2020-10-07 04:38:29,641Z tap - WHERE datistemplate = false
2020-10-07 04:38:29,641Z tap - AND datname != ‘rdsadmin’ with itersize 20000
2020-10-07 04:38:30,435Z tap - CRITICAL Fetch ALL is not supported on single-node clusters. Please specify the fetch size (maximum 1000 for single-node clusters) or upgrade to a multi node installation.

The SQL is valid and works, so the error appears to be in the way the python driver is configured.

What should I do about this ?

Exie, if you haven’t already, please contact our support team at stitchsupport@talend.com, or by clicking the magenta icon in the lower right of the Stitch screen.

Thanks Lee,

I had a look at the Singer Tap for postgres here:

The offending line:

    found_dbs = (row[0] for row in cur.fetchall())

Could easily be swapped out for

    found_dbs = [row[0] for row in cur]

It looks like there are a few other uses of fetchAll() in there that could be swapped out pretty easily, but the bigger problem is adapting the metadata queries. The Postgres tap has moved with the times and is geared for 9+ which will fail against Redshift.

Based on that, it probably makes more sense to fork the tap than patch it for broader compatibility as you will likely damage some of the Postgres support.