Added AzureBlobFileSystem support for StructuredDatasets #1109
wild-endeavor merged 2 commits into flyteorg:master
Signed-off-by: Nick Müller <nmueller@blackshark.ai>
Codecov Report
@@           Coverage Diff           @@
##           master    #1109   +/-   ##
=======================================
  Coverage   86.95%   86.95%
=======================================
  Files         276      276
  Lines       25492    25493    +1
  Branches     2865     2865
=======================================
+ Hits        22167    22168    +1
  Misses       2847     2847
  Partials      478      478
hey @MorpheusXAUT do you also have the data persistence plugin for abfs? is that what you meant by you previously added support for it under the hood?
@wild-endeavor oh, good point 🤔 I believe that still hasn't been cleaned up & contributed back to this repo, sorry about that. 😕
this pr is fine. thanks!
@wild-endeavor Looking at it again, I was actually mistaken last night. We don't use an extra (custom-written) datapersistence plugin for [...] I've added [...]
EDIT: not quite sure why that one check failed, looks unrelated to me?
rerunning the failed onnx job. not entirely sure what's happening there.
    BIGQUERY = "bq"
    S3 = "s3"
    ABFS = "abfs"
    GCS = "gs"
    LOCAL = "/"
Is it time to turn this into an enum, @wild-endeavor?
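For reference, a minimal sketch of what the enum suggested in this comment could look like. The class name `StorageProtocol`, the `str` mixin, and the helper `protocol_of` are hypothetical and only illustrate the idea; they are not flytekit's actual API. The values are the string constants from the snippet under review.

```python
from enum import Enum


class StorageProtocol(str, Enum):
    """Hypothetical enum wrapping the protocol string constants shown above."""

    BIGQUERY = "bq"
    S3 = "s3"
    ABFS = "abfs"
    GCS = "gs"
    LOCAL = "/"


def protocol_of(uri: str) -> StorageProtocol:
    """Hypothetical helper: map a URI to its protocol, treating plain paths as local."""
    if "://" in uri:
        scheme = uri.split("://", 1)[0]
        # Raises ValueError for a scheme that is not one of the members above.
        return StorageProtocol(scheme)
    return StorageProtocol.LOCAL
```

Because of the `str` mixin, members still compare equal to the raw strings (`StorageProtocol.ABFS == "abfs"`), so code that currently passes plain strings around would likely keep working during a migration.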
TL;DR
This PR adds support for storing StructuredDatasets using AzureBlobFileSystem (abfs).
Type
Are all requirements met?
Complete description
As discussed before, we've added support for abfs (using adlfs/stow under the hood) for StructuredDatasets by adding it to the registered protocols for transformers.
I've also noticed one file (plugins/flytekit-spark/flytekitplugins/spark/sd_transformers.py) which still used string constants and didn't support GCS either. As we're currently not using Spark anywhere, I wasn't able to verify this change though, so I can revert those lines if you'd prefer.
Tracking Issue
flyteorg/flyte#2709
Follow-up issue
NA
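To illustrate what "adding it to the registered protocols for transformers" in the complete description amounts to, here is a rough sketch of a protocol-keyed handler registry. Everything in it (`_HANDLERS`, `register_for_protocols`, `find_handler`, `write_parquet`) is a hypothetical stand-in and not flytekit's actual registration code; it only shows why a protocol missing from the registered list cannot be resolved.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: protocol -> storage format -> handler callable.
_HANDLERS: Dict[str, Dict[str, Callable[[str], str]]] = {}


def register_for_protocols(
    protocols: Iterable[str], fmt: str, handler: Callable[[str], str]
) -> None:
    """Register one handler under every protocol it should serve."""
    for protocol in protocols:
        _HANDLERS.setdefault(protocol, {})[fmt] = handler


def find_handler(uri: str, fmt: str) -> Callable[[str], str]:
    """Look up a handler by the URI's protocol; plain paths count as local ("/")."""
    protocol = uri.split("://", 1)[0] if "://" in uri else "/"
    try:
        return _HANDLERS[protocol][fmt]
    except KeyError:
        raise ValueError(f"no {fmt} handler registered for protocol {protocol!r}")


def write_parquet(uri: str) -> str:
    """Stand-in for a real encoder; it only reports where it would write."""
    return f"would write parquet to {uri}"


# Including "abfs" in this list is the kind of change the description refers to.
register_for_protocols(["s3", "gs", "abfs", "/"], "parquet", write_parquet)
```

With this registration, `find_handler("abfs://container/data.parquet", "parquet")` resolves to `write_parquet`; without `"abfs"` in the list, the same lookup would raise `ValueError`.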