Destination Databricks: Implement refreshes by edgao · Pull Request #40692 · airbytehq/airbyte
closes https://github.com/airbytehq/airbyte-internal-issues/issues/8534. closes https://github.com/airbytehq/airbyte-internal-issues/issues/8835. closes https://github.com/airbytehq/airbyte-internal-issues/issues/8857#issuecomment-2259169235. Structurally identical to all the other refreshes PRs (e.g. #38713).
❗ this PR also unpins the current version, i.e. we will release the connector publicly here.
As a refresher:
- pass the raw table suffix around to the relevant places
- DestinationHandler needs to fetch an InitialStatus for the temp raw table in addition to the real raw table
  - I did some minor code clarification in this file
- StorageOperation needs to implement some new methods
  - See the new integration test class for how we verify those new methods
- some code shuffling in the StreamOperation to enable that integration test
- some new expectedrecords.jsonl files (CDK bump pulled in some new test cases)
- fixes a bug in `check` (wasn't deleting the test table). Added a test case for this.
  - (this only really matters for us - we had an older test table without generation_id, which caused check to fail. Real users wouldn't run into this... but it feels like best practice to clean up ourselves)
- fixes a bug in StreamOperation, where we would write an empty file, which then caused an error in the COPY command
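
For reviewers who want the shape of the empty-file fix without opening the diff, here is a rough sketch of the idea, not the connector's actual code. It is written in Python for brevity; `flush_records` and `copy_into_table` are hypothetical names, and the real StreamOperation / COPY plumbing differs. The point is only that the copy step is skipped when nothing was written to the staging file.

```python
import os
import tempfile
from typing import Callable, Iterable


def flush_records(records: Iterable[str], copy_into_table: Callable[[str], None]) -> bool:
    """Stage records to a temp file; run the copy step only if something was written.

    Hypothetical helper for illustration. Returns True if the copy step ran,
    False if it was skipped because there were no records.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    written = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as staging_file:
            for line in records:
                staging_file.write(line + "\n")
                written += 1
        if written == 0:
            # Nothing to load: skip the copy step rather than hand it an empty file.
            return False
        copy_into_table(path)
        return True
    finally:
        # Always remove the staging file, whether or not the copy step ran.
        os.remove(path)
```

Calling it with an empty batch returns `False` and never invokes the copy callback; a non-empty batch invokes it exactly once with the staging file path.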
I'll do a prerelease image + set up a sync in the perf test workspace, but in the meantime, this should be ready for review.