Add example notebook for downloading S3 logs #69
This notebook demonstrates how to download and analyze log files from the staging-eks-otel-logging S3 bucket. It includes:
- Listing all JSON files in the bucket
- Downloading and parsing OpenTelemetry log records
- Extracting individual logRecords from nested structures
- Extracting body and attributes from log records
- Creating a pandas DataFrame for analysis

The notebook handles the partitioned S3 structure (year/month/day/hour/minute) and extracts partition information as separate columns for easier filtering.

Dependencies (boto3, pandas, ipykernel) have been added to dev dependencies.
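The logRecords extraction step listed above could look roughly like the sketch below. It assumes each downloaded file holds a single OTLP/JSON payload with the standard `resourceLogs` → `scopeLogs` → `logRecords` nesting; the notebook's actual file layout and helper implementations are not shown in this PR, so `iter_log_records` and the bodies of `extract_body` and `extract_attributes` here are illustrative stand-ins, and the sample payload is made up.

```python
import json
from typing import Any, Dict, Iterator


def iter_log_records(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every logRecord found in an OTLP/JSON payload (assumed layout)."""
    for resource_logs in payload.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            yield from scope_logs.get("logRecords", [])


def extract_body(record: Dict[str, Any]) -> Any:
    """Return the string body of a log record, or None if it has none."""
    return record.get("body", {}).get("stringValue")


def extract_attributes(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the OTLP key/value attribute list into a plain dict."""
    flat = {}
    for attr in record.get("attributes", []):
        # An OTLP AnyValue carries one typed field (stringValue, intValue, ...);
        # take whichever one is present.
        flat[attr["key"]] = next(iter(attr.get("value", {}).values()), None)
    return flat


# Made-up sample payload, for illustration only.
sample = json.loads("""
{
  "resourceLogs": [
    {
      "scopeLogs": [
        {
          "logRecords": [
            {
              "body": {"stringValue": "user logged in"},
              "attributes": [
                {"key": "http.method", "value": {"stringValue": "GET"}}
              ]
            },
            {"body": {"stringValue": "request finished"}, "attributes": []}
          ]
        }
      ]
    }
  ]
}
""")

records = list(iter_log_records(sample))
bodies = [extract_body(r) for r in records]
first_attributes = extract_attributes(records[0])
```

Using a generator keeps the nesting logic in one place and avoids building intermediate lists when a file contains many records.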
PR Review: Add example notebook for downloading S3 logs

Summary
This PR adds a Jupyter notebook for downloading and analyzing S3 logs. While functional, there are several improvements needed before merging.

Issues Found

🔴 Critical: Repository Scope Mismatch
This notebook analyzes logs from
Action Required: Please clarify why this belongs in the

🔴 Hardcoded Staging Resources

🟡 Code Quality Issues

Inefficient pandas operations (lines 192-193):
df['log_body'] = df.apply(lambda row: extract_body(row.to_dict()), axis=1)
attributes_list = df.apply(lambda row: extract_attributes(row.to_dict()), axis=1)

Unused imports (line 42):
from pathlib import Path  # Not used
import sys  # Not used
from collections.abc import Iterator  # Not used

Line length violations:

🟡 Documentation & Usability
Missing information:
Security consideration:
Suggestions
Verdict
❌ Request Changes - Primary concern is repository scope alignment. Please clarify the purpose and relevance to this project.

Review completed in <2 min
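On the row-wise `df.apply(..., axis=1)` calls flagged in the review: one possible alternative (not necessarily the fix the reviewer had in mind) is to convert the frame to a list of dicts once and reuse it for both extractions. The sketch below uses simplified, hypothetical stand-ins for `extract_body` and `extract_attributes`, since the notebook's real implementations are not shown here, and a made-up two-row frame.

```python
import pandas as pd


def extract_body(row: dict):
    # Hypothetical stand-in: assumes the body is stored as {"stringValue": ...}.
    return row.get("body", {}).get("stringValue")


def extract_attributes(row: dict) -> dict:
    # Hypothetical stand-in: flattens a list of {"key": ..., "value": {...}}.
    return {
        attr["key"]: next(iter(attr["value"].values()), None)
        for attr in row.get("attributes", [])
    }


# Made-up data, for illustration only.
df = pd.DataFrame(
    [
        {
            "body": {"stringValue": "started"},
            "attributes": [{"key": "level", "value": {"stringValue": "info"}}],
        },
        {
            "body": {"stringValue": "failed"},
            "attributes": [{"key": "level", "value": {"stringValue": "error"}}],
        },
    ]
)

# Convert to plain dicts once instead of building a Series and calling
# row.to_dict() for every row in two separate apply() passes.
rows = df.to_dict("records")
df["log_body"] = [extract_body(row) for row in rows]

# Expand the per-row attribute dicts into columns aligned on the same index.
attributes_df = pd.DataFrame([extract_attributes(row) for row in rows], index=df.index)
df = df.join(attributes_df)
```

Whether this matters in practice depends on how many records are loaded; for a small sample the original `apply` calls are likely fine.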
- Remove unused imports
- Fix f-string formatting
- Add noqa comments for complexity
- Fix line length issues
This PR adds an example IPython notebook demonstrating how to download and analyze log files from the staging-eks-otel-logging S3 bucket.

Just an example - I'm not sure we even want to check it into this repo. I wanted to try it. I think the logs path might be too granular: if we have lots of these files, it will take some time to download all of them. We could condense them into parquet files.

What's included
Notebook: scripts/download_s3_logs.ipynb - A complete example showing:
Dependencies: Added boto3, pandas, and ipykernel to dev dependencies

Features
The notebook handles the partitioned S3 structure (logs/year=2025/month=12/day=04/hour=15/minute=21/) and automatically extracts partition information as separate columns for easier filtering and analysis.

Usage
uv sync --group dev

This is intended as an example/reference for anyone who needs to download and analyze logs from the S3 bucket.
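For reference, extracting the partition values from an object key in the layout described under Features could be sketched as follows. This is only an illustration of the idea: `parse_partitions` is a hypothetical helper, the file name `example.json` is made up, and whether the notebook stores the values as integers or strings is not stated in this PR.

```python
import re

# Matches the Hive-style key=value path segments used in the bucket layout.
PARTITION_RE = re.compile(r"(year|month|day|hour|minute)=(\d+)")


def parse_partitions(key: str) -> dict:
    """Return the partition values found in an S3 object key as integers."""
    return {name: int(value) for name, value in PARTITION_RE.findall(key)}


# The prefix comes from the PR description; the file name is hypothetical.
key = "logs/year=2025/month=12/day=04/hour=15/minute=21/example.json"
partitions = parse_partitions(key)
```

Each returned dict can be merged into the corresponding record before building the DataFrame, which gives one column per partition level.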