🎉 Destination Databricks: Support Azure storage (#15140) by abaerptc · Pull Request #15329 · airbytehq/airbyte
What
Adds Azure storage as an option for Databricks destination.
This PR resolves #15140.
How
Adds Azure storage as an alternative to S3.
Recommended reading order
- `*.json` - spec and sample configs
- `Databricks*Config.java` - abstracted cloud storage, with two implementations (S3 and Azure)
- `Databricks*Destination.java` - switches between S3 and Azure implementations depending on config
- `Databricks*StreamCopierFactory.java` - abstracted, with two implementations (S3 and Azure)
- `Databricks*StreamCopier.java` - abstracted, with existing S3 implementation refactored and Azure implementation added
- `DatabricksSqlOperations.java` - minor change to allow execution of multiple statements
- `Dockerfile` - bumped version
- `build.gradle` - new deps
- `AzureBlobStorageConnectionChecker.java` - adding new constructor to allow reuse in Databricks connector
- `*Test.java`
- `*.md` - docs
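The storage abstraction described in the reading order can be pictured roughly as follows. This is a hypothetical sketch, not the connector's actual code: the names `DatabricksStorageConfig`, `S3StorageConfig`, `AzureStorageConfig`, and `StorageConfigs.fromType` are illustrative stand-ins for the `Databricks*Config.java` classes in the PR, and the bucket/account values are placeholders.

```java
// Hypothetical sketch of the cloud-storage abstraction: one interface,
// one implementation per backing store, selected from the connector config.
interface DatabricksStorageConfig {
    String storageType();     // e.g. "S3" or "Azure"
    String stagingLocation(); // where the stream copier stages data
}

final class S3StorageConfig implements DatabricksStorageConfig {
    private final String bucket;
    S3StorageConfig(String bucket) { this.bucket = bucket; }
    public String storageType() { return "S3"; }
    public String stagingLocation() { return "s3://" + bucket; }
}

final class AzureStorageConfig implements DatabricksStorageConfig {
    private final String account;
    private final String container;
    AzureStorageConfig(String account, String container) {
        this.account = account;
        this.container = container;
    }
    public String storageType() { return "Azure"; }
    public String stagingLocation() {
        return "abfss://" + container + "@" + account + ".dfs.core.windows.net";
    }
}

final class StorageConfigs {
    // The destination would pick an implementation based on the storage type
    // found in the connector config, roughly like this:
    static DatabricksStorageConfig fromType(String type) {
        switch (type) {
            case "S3": return new S3StorageConfig("my-bucket");
            case "Azure": return new AzureStorageConfig("myaccount", "mycontainer");
            default: throw new IllegalArgumentException("Unsupported storage type: " + type);
        }
    }
}
```

The `Databricks*Destination.java` and `Databricks*StreamCopierFactory.java` classes then dispatch on this config rather than hard-coding S3.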
🚨 User Impact 🚨
No breaking changes.
Pre-merge Checklist
Updating a connector
Community member or Airbyter
- Grant edit access to maintainers (instructions)
- Secrets in the connector's spec are annotated with `airbyte_secret`
- Unit & integration tests added and passing. Community members, please provide proof of success locally e.g.: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For Java connectors run `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`.
- Code reviews completed
- Documentation updated
  - Connector's `README.md`
  - Connector's `bootstrap.md`. See description and examples
  - Changelog updated in `docs/integrations/<source or destination>/<name>.md` including changelog. See changelog example
- PR name follows PR naming conventions
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- Create a non-forked branch based on this PR and test the below items on it
- Build is successful
- If new credentials are required for use in CI, add them to GSM. Instructions.
- /test connector=connectors/ command is passing
- New Connector version released on Dockerhub and connector version bumped by running the `/publish` command described here
Tests
None.
Cannot run S3 acceptance test locally.
The Azure integration test passes, but I don't have access to Databricks on AWS, and it's proving difficult to configure Azure Databricks to write to S3, so I'm not able to fully run the S3 integration tests. I have confirmed that the data gets written to S3 fine, but still need to confirm that the tables can be created from the S3 location in Databricks. Could someone from Airbyte please trigger the integration tests to run from this PR?
marcosmarxm commented:
I pushed a new commit that should fix the error seen in the previous run of the S3 acceptance test.
The Azure acceptance test will require the addition of a new secrets file by an Airbyte member (see sample_secrets/azure_config.json).
/test connector=connectors/destination-databricks
I added azure_config to Airbyte secret manager.
@abaerptc, thank you for the PR! Could you add me as a collaborator to your fork? In this way I can push changes to this branch if needed as I am working on merging it.
@marcosmarxm, there is another PR pending merging for Databricks. Would you mind me doing the coordination and merging it later?
marcosmarxm commented:
> @marcosmarxm, there is another PR pending merging for Databricks. Would you mind me doing the coordination and merging it later?
Sure @tuliren, I'd just like to confirm that tests are running properly here.
marcosmarxm commented:
@tuliren You're added as a collaborator.
@abaerptc now the integration tests are running with your changes!
@marcosmarxm I see a few failures with messages like "Invalid configuration value detected for fs.azure.account.key". I think you need to give your Databricks instance access to your Azure account via the Spark configuration property `fs.azure.account.key`.
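For reference, granting a Databricks cluster access to an Azure storage account is typically done by adding an account-key entry to the cluster's Spark configuration. A sketch along these lines, where the exact property suffix depends on the endpoint in use (`blob.core.windows.net` for wasb, `dfs.core.windows.net` for abfs) and the placeholder values are illustrative:

```
fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
```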
Awesome PR @abaerptc !
Just a side note, the AWS version supports destination pattern matching (see airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksS3StorageConfig.java), is this something you would consider adding? (of course, I am happy to help!)
@TamGB I don't object to having pattern matching added. However, I don't have the availability to work on it. Please go ahead and implement it if you'd find it valuable.
Is this abstracted enough to add support for Databricks on Google Cloud (i.e. add GCS as a target bucket along with AWS S3, and Azure Blob)?
You should be able to add another implementation of the data storage method pretty easily based on what I've done here, yes.
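To illustrate how such an extension might look: a GCS-backed implementation would plug into the same storage abstraction. This is a self-contained, hypothetical sketch; the interface stands in for the connector's actual abstraction, and all names here are illustrative, not the codebase's.

```java
// Illustrative only: a third storage backend slotted into the same pattern
// the S3 and Azure implementations follow in this PR.
interface CloudStorageConfig {
    String stagingLocation();
}

final class GcsStorageConfig implements CloudStorageConfig {
    private final String bucket;
    GcsStorageConfig(String bucket) { this.bucket = bucket; }
    public String stagingLocation() { return "gs://" + bucket; }
}
```

A matching stream copier and a new branch in the destination's config dispatch would complete the backend.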
@abaerptc looks like there are conflicts, could you resolve them please?
@tuliren there is a lot of community interest for this PR, would you be able to move it up in your queue for review/merging? Thank you in advance :)
So sorry, @abaerptc, for the long delay in replying. @tuliren is actively working on improving the Databricks destination with other features and on incorporating your contribution too. He will get back to you soon!
Sorry about the delay.
To test the Azure integration, we would need a separate Databricks cluster. I was not able to do that so far. Given that it has been delayed for a long time, I will do some local testing, and mute the integration test related to Azure for now.
I aim to merge it today.
tuliren commented:
🎉
tuliren commented:
/publish connector=connectors/destination-databricks run-tests=false
🕑 Publishing the following connectors:
connectors/destination-databricks
https://github.com/airbytehq/airbyte/actions/runs/3255000004
| Connector | Did it publish? | Were definitions generated? |
|---|---|---|
| connectors/destination-databricks | ✅ | ✅ |
If you have connectors that successfully published but failed definition generation, follow step 4 here ▶️
@abaerptc, thank you very much for your contribution. Sorry about the long delay.
Please don't forget to remove me from your repo.
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request
🎉 Destination Databricks: Support Azure storage (airbytehq#15140)
Update docs and version.
Revert unintentional change to S3 config parsing.
Revert unnecessary additional table property.
change spec.json
Ignore azure integration test
Add issue link
auto-bump connector version
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Liren Tu <tuliren@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>