[feat] add Nessie integration doc #3241
base: master
Pull request overview
This PR adds comprehensive documentation for integrating Apache Doris with Nessie, an open-source transactional catalog for data lakes. The documentation covers AWS environment setup, Nessie deployment using Docker Compose, and connecting Doris to Nessie for Iceberg data management.
Key changes:
- Added complete Nessie integration guide with step-by-step instructions for both credential vending and static credentials modes
- Provided detailed AWS IAM configuration for secure credential management
- Included practical examples for creating and querying Iceberg tables through Nessie
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.
| File | Description |
|---|---|
| sidebars.ts | Added doris-nessie entry to the main sidebar navigation under lakehouse best practices |
| versioned_sidebars/version-4.x-sidebars.json | Added doris-nessie entry to version 4.x sidebar navigation |
| versioned_sidebars/version-3.x-sidebars.json | Added doris-nessie entry to version 3.x sidebar navigation |
| docs/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in the current version |
| versioned_docs/version-4.x/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in version 4.x |
| versioned_docs/version-3.x/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in version 3.x |
| i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in the current version |
| i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in version 4.x |
| i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in version 3.x |
### 1.2 Create IAM Role for Object Storage Access (Optional)

If you plan to use Credential Vending mode, you need to create an IAM role for Nessie to use through the STS AssumeRole mechanism. This design follows the security best practices of least privilege principle and separation of duties.

1. Create trust policy file

Create `nessie-trust-policy.json` file:

```bash
cat > nessie-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_USER"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```
Copilot
AI
Dec 30, 2025
The description states that the IAM role is created "for Nessie to use through the STS AssumeRole mechanism," but the trust policy actually allows an IAM user to assume the role, not Nessie itself. The trust policy should ideally allow the Nessie service (running on EC2 or ECS with an instance profile) to assume this role, or the description should be clarified to explain that Nessie uses the user's credentials to assume the role.
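One way to resolve this mismatch is to trust the role that the Nessie host already runs under, rather than an IAM user. The following is a minimal sketch and not the PR's configuration: the file name `nessie-trust-policy-role.json` and the role name `YOUR_NESSIE_INSTANCE_ROLE` are hypothetical placeholders, and it assumes Nessie runs on EC2 or ECS with that role attached.

```shell
# Hypothetical variant of the trust policy: the trusted principal is the IAM
# role attached to the Nessie host (placeholder name), not an IAM user.
cat > nessie-trust-policy-role.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_NESSIE_INSTANCE_ROLE"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```

If the documentation keeps the IAM user principal instead, the surrounding text should state that Nessie assumes the role with that user's access keys.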
's3.region' = 'us-east-1'
);
```
Copilot
AI
Dec 30, 2025
Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.
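> Tip: `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` in the example above are placeholders that only demonstrate the configuration. Replace them with real credentials in an actual environment, avoid hardcoding access keys in production, and prefer a more secure credential management approach such as the credential vending method shown earlier.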
### 1.2 Create an IAM Role for Object Storage Access (Optional)

If you plan to use Credential Vending mode, you need to create an IAM role for Nessie to use through the STS AssumeRole mechanism. This design follows the security best practices of least privilege and separation of duties.

1. Create the trust policy file

Create the `nessie-trust-policy.json` file:

```bash
cat > nessie-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_USER"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```
Copilot
AI
Dec 30, 2025
The description states that the IAM role is created for Nessie to use through the STS AssumeRole mechanism, but the trust policy actually allows an IAM user to assume the role, not Nessie itself. The trust policy should ideally allow the Nessie service (running on EC2 or ECS with an instance profile) to assume this role, or the description should be clarified to explain that Nessie uses the user's credentials to assume the role.
Create a `.env` file to store AWS credentials:

```bash
Copilot
AI
Dec 30, 2025
The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
```bash
```bash
# SECURITY NOTE:
# This file contains sensitive credentials. Do NOT commit `.env` to version control
# and restrict its permissions so that only your user can read it, for example:
#   chmod 600 .env
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```
Copilot
AI
Dec 30, 2025
The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
> Security note: The `.env` file contains sensitive credentials. Do **not** commit it to version control (e.g., add `.env` to `.gitignore`), and ensure it has appropriate file permissions so that only authorized users can read it.
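The two protections named in this note can be applied with standard shell commands. The following is a minimal sketch that runs in a scratch directory created for illustration; it uses only the placeholder values from the reviewed document and assumes a POSIX shell with `mktemp`, `chmod`, and `grep` available.

```shell
# Work in a scratch directory so the sketch does not touch a real project.
workdir="$(mktemp -d)"
cd "$workdir"

# Placeholder values only; substitute real keys on your own machine.
cat > .env << 'EOF'
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
EOF

# Restrict the file so that only the owning user can read and write it.
chmod 600 .env

# Keep the file out of version control; append the entry only if it is missing.
touch .gitignore
grep -qxF '.env' .gitignore || echo '.env' >> .gitignore
```

The `grep -qxF` guard makes the `.gitignore` step safe to run more than once, because it only appends when no line matches `.env` exactly.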
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```
Copilot
AI
Dec 30, 2025
The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
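> **Security note:** The `.env` file usually contains sensitive credentials such as access keys. Do not commit it to a version control system (such as Git), and in production set strict access permissions on the file so that only the necessary service accounts can access it.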
's3.region' = 'us-east-1'
);
```
Copilot
AI
Dec 30, 2025
Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.
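> Note: `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` in the example above are placeholders that only demonstrate the configuration format. Replace them with real credentials in actual use, and in production prefer a more secure credential management approach, such as the credential vending or temporary credentials described earlier, rather than hardcoding long-lived access keys in the configuration.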
's3.region' = 'us-east-1'
);
```
Copilot
AI
Dec 30, 2025
Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.
> Note: The values `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` are placeholders. Replace them with your actual credentials for testing only. For production environments, avoid hardcoding credentials and prefer a more secure credential management approach (such as the credential vending method described earlier in this document).
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```
Copilot
AI
Dec 30, 2025
The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
> **Security note:** The `.env` file contains sensitive credentials. Do not commit it to version control (for example, add `.env` to `.gitignore`), and ensure its file permissions restrict access to only the necessary users or services.
Create a `.env` file to store AWS credentials:

```bash
Copilot
AI
Dec 30, 2025
The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
Create a `.env` file to store AWS credentials:
```bash
Create a `.env` file to store AWS credentials. **This file contains sensitive information and must not be committed to version control; ensure it is stored securely with restrictive permissions (for example, `chmod 600 .env`).**
```bash
# WARNING: This file contains sensitive AWS credentials.
# Do NOT commit this file to version control and restrict its access, for example:
#   chmod 600 .env
Versions
Languages
Docs Checklist