Description
Some libraries, such as polars and pandas, provide an almost seamless way to work with cloud storage paths.
For example:
import polars as pl
pl.scan_csv('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'}).collect()
This is nice because I don't need to import any other libraries or set up credentials, blob clients, etc.
It automatically finds any available credentials in my local environment, presumably with something like DefaultAzureCredential.
This means that when testing locally, I just need to be authenticated with Azure CLI, and everything just works.
I don't even need to manually specify environment variables.
It also means that I can deploy the same code to the server, and it will automatically find the appropriate environment variables to authenticate as a service principal with AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.
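On a server, DefaultAzureCredential typically picks up a service principal through its environment-variable step, which for the client-secret flow reads AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET. The helper below is a hypothetical sketch for checking whether those variables are present; it covers only the client-secret variant (not certificates or other credential types) and is not part of azure-identity.

```python
import os
from typing import Mapping

# Variables used by azure-identity's environment credential for the
# client-secret (service principal) flow. Other flows use other variables.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def has_service_principal_env(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: True if every client-secret variable is set and non-empty."""
    return all(env.get(name) for name in SERVICE_PRINCIPAL_VARS)
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real process environment.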
I may have missed something, but it seems that cloudpathlib has not enabled this kind of automatic credential detection with DefaultAzureCredential. Instead, I need to do the following to get an authenticated working CloudPath:
from azure.identity import DefaultAzureCredential
from cloudpathlib import CloudPath, AzureBlobClient
credential = DefaultAzureCredential()
client = AzureBlobClient(account_url="https://mystorageaccount.blob.core.windows.net", credential=credential)
path = CloudPath('az://container/path/to/file.csv', client=client)
Ideally, this setup would happen automatically.
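Until something like that exists, the boilerplate can at least be done once per process. A minimal sketch, assuming the public Azure cloud endpoint format and cloudpathlib's Client.set_as_default_client(); the helper names are hypothetical. The Azure imports are deferred so the module loads even where the SDK is not installed.

```python
def account_url(account_name: str) -> str:
    """Blob service endpoint for an account (assumes the public Azure cloud)."""
    return f"https://{account_name}.blob.core.windows.net"


def set_default_azure_client(account_name: str):
    """Hypothetical helper: register an AzureBlobClient that uses
    DefaultAzureCredential as cloudpathlib's default client."""
    # Deferred imports: only needed when the helper is actually called.
    from azure.identity import DefaultAzureCredential
    from cloudpathlib import AzureBlobClient

    client = AzureBlobClient(
        account_url=account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    client.set_as_default_client()
    return client


# Usage (requires cloudpathlib[azure] and azure-identity):
# set_default_azure_client("mystorageaccount")
# path = CloudPath("az://container/path/to/file.csv")  # no client= argument
```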
I'm imagining the following future state:
from cloudpathlib import CloudPath
path = CloudPath('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'})
(There may be a nicer way to specify the account name; I'm just copying the API from polars and pandas here. I kind of wish it were standard to include the account name in the path, since passing it in separately feels clunky to me. It would be nice if we could use az://mystorageaccount/container/...)
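To make the account-in-path idea concrete, here is a hypothetical parser for the proposed az://<account>/<container>/<blob> form. This is not how cloudpathlib currently interprets az:// URLs (there, the first segment is the container); the names AzureLocation and parse_account_in_path are illustrative only.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class AzureLocation(NamedTuple):
    account_name: str
    container: str
    blob: str  # empty string when the URL points at the container itself


def parse_account_in_path(url: str) -> AzureLocation:
    """Split a hypothetical az://<account>/<container>/<blob> URL into parts."""
    parsed = urlparse(url)
    if parsed.scheme != "az":
        raise ValueError(f"expected an az:// URL, got {url!r}")
    # Path looks like "/container/path/to/blob"; split off the container.
    container, _, blob = parsed.path.lstrip("/").partition("/")
    if not parsed.netloc or not container:
        raise ValueError(f"expected az://<account>/<container>/..., got {url!r}")
    return AzureLocation(parsed.netloc, container, blob)
```

The account name could then be turned into an endpoint such as https://<account>.blob.core.windows.net without a separate storage_options argument.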
See the documentation for DefaultAzureCredential (there's a reason it's called Default!):
- DefaultAzureCredential documentation
- Azure Storage DefaultAzureCredential examples
- Azure Identity Overview (including DefaultAzureCredential examples)
Note: If you are using fsspec + adlfs, adlfs requires the storage option anon=False to be set to enable DefaultAzureCredential.
For example, when using pandas, you must specify storage_options={'anon': False}.
When using fsspec directly, you need to pass it as follows:
import fsspec
fs = fsspec.filesystem('az', account_name='mystorageaccount', anon=False)
For more details, see:
https://github.com/fsspec/adlfs#setting-credentials
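Because the anon=False requirement is easy to forget, it can help to build the adlfs storage options in one place and reuse them for pandas and fsspec alike. A minimal sketch; the helper name is hypothetical, and it only assembles the dict described above.

```python
def adlfs_storage_options(account_name: str, **extra) -> dict:
    """Hypothetical helper: adlfs options that opt in to DefaultAzureCredential."""
    options = {"account_name": account_name, "anon": False}
    options.update(extra)  # caller-supplied keys take precedence
    return options


# Usage (requires pandas and adlfs):
# import pandas as pd
# df = pd.read_csv("az://container/path/to/file.csv",
#                  storage_options=adlfs_storage_options("mystorageaccount"))
```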