diff --git a/docs/integrations/ai/huggingface.mdx b/docs/integrations/ai/huggingface.mdx index 550609e..dd28583 100644 --- a/docs/integrations/ai/huggingface.mdx +++ b/docs/integrations/ai/huggingface.mdx @@ -1,10 +1,12 @@ --- -title: "Hugging Face-Lance Integration" +title: "Hugging Face Hub" sidebarTitle: "Hugging Face" -description: "Use LanceDB directly on Hugging Face-hosted Lance datasets for multimodal search and retrieval." +description: "Use LanceDB directly on Lance datasets hosted on the Hugging Face Hub for multimodal search and retrieval." --- -LanceDB can open Lance datasets hosted on the [Hugging Face Hub](https://huggingface.co/datasets?format=format:lance) with `hf://` URIs. +[Hugging Face Hub](https://huggingface.co/datasets?format=format:lance&sort=trending) is a popular platform for sharing machine learning datasets, models, and other resources. + +LanceDB can directly scan Lance datasets hosted on the [Hugging Face Hub](https://huggingface.co/datasets?format=format:lance) with `hf://` URIs. This is enabled under the hood by the [lance-huggingface](https://lance.org/integrations/huggingface/) integration that allows users to stream Lance datasets directly from Hugging Face without needing to download them first. diff --git a/docs/tables/index.mdx b/docs/tables/index.mdx index b982c1a..a89769e 100644 --- a/docs/tables/index.mdx +++ b/docs/tables/index.mdx @@ -6,7 +6,7 @@ icon: "table" keywords: ["create table", "polars", "pandas", "pyarrow", "dataframe", "nested data"] --- -import { PyConnect, TsConnect, RsConnect } from '/snippets/connection.mdx'; +import { PyConnect, PyConnectCloud, TsConnect, TsConnectCloud, RsConnect, RsConnectCloud } from '/snippets/connection.mdx'; import { PyBasicImports, PyDataLoad, @@ -139,14 +139,16 @@ with several integer fields, indicating each character's attributes. ] ``` - + The `vector` arrays here are synthetic and for demonstration purposes only. 
In your real-world applications, you'd generate these vectors from the raw text fields using a suitable embedding model. - + ## Connect to a database -We start by connecting to a LanceDB database path. +### Option 1: Local database + +We start by connecting to a LanceDB database path. The example below uses a local path in LanceDB OSS. @@ -162,20 +164,38 @@ We start by connecting to a LanceDB database path. -If you're using LanceDB Enterprise, replace the local connection string -with the appropriate remote URI and authentication details. +### Option 2: Remote database - -**Working with remote tables** +You can also connect LanceDB OSS directly to object storage. For credentials, endpoints, and provider-specific options, see +[Configuring storage](/storage/configuration). -When you connect to a remote URI (Cloud/Enterprise), `open_table(...)` returns a *remote* table. +If you're using a managed LanceDB service on either LanceDB Cloud or Enterprise, you can connect using a `db://` URI, +along with any necessary credentials. Simply replace the local path with a remote `uri` +that points to where your data is stored, and you're ready to go. + + + + {PyConnectCloud} + + + + {TsConnectCloud} + + + + {RsConnectCloud} + + + +To learn more about LanceDB Enterprise, see the [Enterprise documentation](/enterprise). + + +- When you connect to a remote URI (Cloud/Enterprise), `open_table(...)` returns a *remote* table. Remote tables support core operations (ingest, search, update, delete), but some convenience methods for bulk data export are not available. - -In the Python SDK, `table.to_arrow()` and `table.to_pandas()` are not implemented for remote tables. +- In the Python SDK, `table.to_arrow()` and `table.to_pandas()` are not implemented for remote tables. To retrieve data, use search queries instead: `table.search(query).limit(n).to_arrow()`. - - + ## Create a table and ingest data @@ -232,9 +252,9 @@ initial testing). 
- + If you want to avoid overwriting an existing table, omit the overwrite mode. - + ### From Pandas DataFrames Python Only