@xylaaaaa
Contributor

Versions

  • [✅] dev
  • [✅] 4.x
  • [✅] 3.x
  • 2.1

Languages

  • [✅] Chinese
  • [✅] English

Docs Checklist

  • Checked by AI
  • Test Cases Built

Copilot AI review requested due to automatic review settings December 30, 2025 03:52

Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation for integrating Apache Doris with Nessie, an open-source transactional catalog for data lakes. The documentation covers AWS environment setup, Nessie deployment using Docker Compose, and connecting Doris to Nessie for Iceberg data management.

Key changes:

  • Added complete Nessie integration guide with step-by-step instructions for both credential vending and static credentials modes
  • Provided detailed AWS IAM configuration for secure credential management
  • Included practical examples for creating and querying Iceberg tables through Nessie

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.

| File | Description |
| --- | --- |
| sidebars.ts | Added doris-nessie entry to the main sidebar navigation under lakehouse best practices |
| versioned_sidebars/version-4.x-sidebars.json | Added doris-nessie entry to version 4.x sidebar navigation |
| versioned_sidebars/version-3.x-sidebars.json | Added doris-nessie entry to version 3.x sidebar navigation |
| docs/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in the current version |
| versioned_docs/version-4.x/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in version 4.x |
| versioned_docs/version-3.x/lakehouse/best-practices/doris-nessie.md | New English documentation for Nessie integration in version 3.x |
| i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in the current version |
| i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in version 4.x |
| i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/best-practices/doris-nessie.md | New Chinese documentation for Nessie integration in version 3.x |

Comment on lines +35 to +58
### 1.2 Create IAM Role for Object Storage Access (Optional)

If you plan to use Credential Vending mode, you need to create an IAM role for Nessie to use through the STS AssumeRole mechanism. This design follows the security best practices of least privilege and separation of duties.

1. Create the trust policy file

Create the `nessie-trust-policy.json` file:

```bash
cat > nessie-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_USER"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```

Copilot AI Dec 30, 2025

The description states that the IAM role is created "for Nessie to use through the STS AssumeRole mechanism," but the trust policy actually allows an IAM user to assume the role, not Nessie itself. The trust policy should ideally allow the Nessie service (running on EC2 or ECS with an instance profile) to assume this role, or the description should be clarified to explain that Nessie uses the user's credentials to assume the role.
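
One way the first option could look is sketched below. It assumes Nessie runs on EC2 or ECS under its own IAM role; `YOUR_NESSIE_INSTANCE_ROLE` is a hypothetical placeholder that does not appear in the PR. The trust policy names that role as the trusted principal instead of an IAM user:

```bash
# Sketch only: YOUR_NESSIE_INSTANCE_ROLE is a hypothetical placeholder for the
# role attached to the host or task that runs Nessie (an assumption, not from the PR).
cat > nessie-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_NESSIE_INSTANCE_ROLE"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```

Depending on the account setup, the instance role may also need an identity policy that allows `sts:AssumeRole` on the role being created.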

Copilot uses AI. Check for mistakes.
's3.region' = 'us-east-1'
);
```

Copilot AI Dec 30, 2025

Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.

Suggested change
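> Tip: `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` in the example above are placeholders used only to demonstrate the configuration. Replace them with real credentials in an actual environment, avoid hardcoding access keys in production, and prefer a more secure credential management approach such as the credential vending method shown earlier.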

Comment on lines +35 to +58
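### 1.2 Create IAM Role for Object Storage Access (Optional)

If you plan to use Credential Vending mode, you need to create an IAM role for Nessie to use through the STS AssumeRole mechanism. This design follows the security best practices of least privilege and separation of duties.

1. Create the trust policy file

Create the `nessie-trust-policy.json` file: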

```bash
cat > nessie-trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:user/YOUR_USER"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```

Copilot AI Dec 30, 2025

The description states that the IAM role is created for Nessie to use through the STS AssumeRole mechanism, but the trust policy actually allows an IAM user to assume the role, not Nessie itself. The trust policy should ideally allow the Nessie service (running on EC2 or ECS with an instance profile) to assume this role, or the description should be clarified to explain that Nessie uses the user's credentials to assume the role.

Create a `.env` file to store AWS credentials:

```bash

Copilot AI Dec 30, 2025

The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).
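
As a rough illustration of the protections described here, the sketch below creates the file with owner-only permissions and excludes it from version control. It assumes a POSIX shell and a Git working directory; the values are the document's placeholders.

```bash
# Create the file empty and restrict it to the owner before writing any values,
# so the credentials are never on disk with looser permissions.
touch .env
chmod 600 .env

# Write the placeholder values; replace them with real credentials locally.
cat > .env << 'EOF'
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
EOF

# Add .env to .gitignore unless an identical entry is already present.
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```

Restricting permissions before writing means an existing `.env` with broader permissions is tightened as well, because `cat >` truncates the file without changing its mode.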

Suggested change
```bash
```bash
# SECURITY NOTE:
# This file contains sensitive credentials. Do NOT commit `.env` to version control
# and restrict its permissions so that only your user can read it, for example:
# chmod 600 .env

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```

Copilot AI Dec 30, 2025

The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).

Suggested change
> Security note: The `.env` file contains sensitive credentials. Do **not** commit it to version control (e.g., add `.env` to `.gitignore`), and ensure it has appropriate file permissions so that only authorized users can read it.

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```

Copilot AI Dec 30, 2025

The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).

Suggested change
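> **Security note:** The `.env` file typically contains sensitive credentials such as access keys. Do not commit it to a version control system (such as Git), and in production set strict access permissions on the file so that only the necessary service accounts can access it.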

's3.region' = 'us-east-1'
);
```

Copilot AI Dec 30, 2025

Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.

Suggested change
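> Note: `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` in the example above are placeholders used only to demonstrate the configuration format. Replace them with real credentials in actual use, and in production prefer a more secure credential management approach, such as the credential vending or temporary credentials described earlier, rather than hardcoding long-lived access keys in the configuration.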

's3.region' = 'us-east-1'
);
```

Copilot AI Dec 30, 2025

Hardcoded credentials are shown directly in the SQL example. Consider adding a note that these placeholder values should be replaced with actual credentials and that in production, more secure credential management approaches (like the credential vending method shown earlier) should be preferred.

Suggested change
> Note: The values `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` are placeholders. Replace them with your actual credentials for testing only. For production environments, avoid hardcoding credentials and prefer a more secure credential management approach (such as the credential vending method described earlier in this document).

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
```

Copilot AI Dec 30, 2025

The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).

Suggested change
> **Security note:** The `.env` file contains sensitive credentials. Do not commit it to version control (for example, add `.env` to `.gitignore`), and ensure its file permissions restrict access to only the necessary users or services.

Comment on lines +149 to +151
Create a `.env` file to store AWS credentials:

```bash

Copilot AI Dec 30, 2025

The .env file shows credentials in plain text format without any security warning. Consider adding a security note reminding users that the .env file contains sensitive credentials and should be protected (e.g., not committed to version control, proper file permissions set).

Suggested change
Create a `.env` file to store AWS credentials:
```bash
Create a `.env` file to store AWS credentials. **This file contains sensitive information and must not be committed to version control; ensure it is stored securely with restrictive permissions (for example, `chmod 600 .env`).**
```bash
# WARNING: This file contains sensitive AWS credentials.
# Do NOT commit this file to version control and restrict its access, for example:
# chmod 600 .env
