Last Updated: October 6, 2025
Review Frequency: Quarterly
- Introduction
- Repository Overview
- Development Environment Setup
- Project Architecture
- Content Management
- Deployment Procedures
- Maintenance and Updates
- Monitoring and Analytics
- Security and Best Practices
- Troubleshooting
This Standard Operating Procedure (SOP) establishes the authoritative guidelines for managing and maintaining the OLake documentation repository. The repository serves as the central source for OLake's documentation, blogs, and technical content, built using Docusaurus v3.8.1.
This SOP covers:
- Documentation repository management
- Content creation and maintenance workflows
- Development and deployment procedures
- Security protocols and compliance requirements
- Monitoring and analytics
This document is intended for:
- Internal Team
- External Contributors
Readers should have:
- Basic understanding of Git version control
- Familiarity with Markdown/MDX
- Basic knowledge of React and TypeScript
- code - Commands, file names, directories
- Bold - Important notes, warnings
- Italic - Emphasis, new terms
- Blockquotes - Examples, quotes
- Quality Assurance
  - Maintain consistent documentation standards
  - Ensure technical accuracy
  - Implement comprehensive review processes
- Efficiency
  - Streamline content deployment
  - Optimize development workflows
- Collaboration
  - Facilitate team coordination
  - Enable effective communication
olake-docs/
├── blog/ # Blog posts and articles
│ ├── authors.yml # Author information
│ ├── tags.yml # Blog tag definitions
│ └── YYYY-MM-DD-*.mdx # Blog post files
│
├── docs/ # Main documentation
│ ├── getting-started/ # Onboarding guides
│ ├── core/ # Core concepts
│ ├── features/ # Feature documentation
│ ├── connectors/ # Connector guides
│ ├── tutorials/ # Step-by-step guides
│ ├── shared/ # Shared MDX components
│ └── writers/ # Documentation writer guides
│
├── iceberg/ # Iceberg-specific documentation
│ ├── authors.yml # Author information
│ └── tags.yml # tags
│
├── src/ # Source code
│ ├── components/ # React components
│ │ ├── site/ # Current homepage components (Hero, Features, etc.)
│ │ ├── common/ # Reusable UI components
│ │ └── layouts/ # Layout components
│ │
│ ├── pages/ # Static pages (React)
│ │ ├── index.tsx # Homepage
│ │ ├── about.tsx # About page
│ │ └── contact.tsx # Contact page
│ │
│ ├── theme/ # Docusaurus theme customizations
│ │ ├── MDXComponents.tsx # Custom MDX components
│ │ ├── Layout/ # Layout overrides
│ │ ├── Navbar/ # Navbar customizations
│ │ └── Footer/ # Footer customizations
│ │
│ └── utils/ # Utility functions
│ ├── helpers.ts # Helper functions
│ └── constants.ts # Constants and config
│
├── styles/ # Global styles
│ └── index.css # Global CSS and Tailwind imports
│
├── utils/ # Utility scripts
│ ├── helpers.js # Utility helper functions
│ └── config.tsx # Configuration utilities
│
├── static/ # Static assets
│ ├── img/ # Images
│ │ ├── blog/ # Blog images
│ │ ├── docs/ # Documentation images
│ │ ├── authors/ # Author images
│ │ └── community/ # Community images
│
├── kubernetes/ # Kubernetes configurations
│ ├── configmaps/ # Configuration maps
│ ├── secrets/ # Secret definitions
│ └── deployments/ # Deployment manifests
│
├── airflow/ # Airflow configurations
│ ├── dags/ # DAG definitions
│ └── plugins/ # Custom plugins
│
├── docusaurus.config.js # Main Docusaurus configuration
│ # - Site metadata and settings
│ # - Theme configuration (navbar, footer, etc.)
│ # - Plugin configuration
│ # - Presets configuration
│ # - Algolia search configuration
│ # - Analytics and tracking
│ # - Redirects and URL rewrites
│
├── sidebars.js # Documentation sidebar structure
├── sidebarsIcebergQE.js # Iceberg Query Engine sidebar
├── tailwind.config.js # Tailwind CSS configuration
├── tsconfig.json # TypeScript configuration
├── package.json # Dependencies and scripts
└── postcss.config.js # PostCSS configuration
- Purpose: Core product documentation
- File Format: MDX (Markdown + JSX)
- Organization:
  - Hierarchical structure based on topics
  - Each section has its own sidebar navigation
  - Shared components for reusability
- Naming Convention: lowercase-with-hyphens.mdx
- Special Files:
  - intro.mdx: Landing page
  - sidebars.js: Navigation structure
- Purpose: Technical articles, updates, tutorials
- File Format: MDX
- Naming Convention: YYYY-MM-DD-post-title.mdx
- Required Frontmatter:
  ---
  title: Post Title
  authors: [author_id]
  tags: [tag1, tag2]
  description: Brief description
  image: ./img/blog/YYYY/MM/cover.webp
  ---
- Image Organization:
  - Cover images: static/img/blog/YYYY/MM/post-title-cover.webp
  - Content images: static/img/blog/YYYY/MM/post-title-1.webp
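The blog naming and image conventions above can be checked mechanically. The following is a minimal sketch, not code from the repository: `parseBlogFileName` and `BlogPostInfo` are hypothetical names, and the rule that the slug is lowercase kebab-case is an assumption borrowed from the general naming convention.

```typescript
// Hypothetical helper: not part of the olake-docs repository.
interface BlogPostInfo {
  date: string; // YYYY-MM-DD, taken from the file name
  slug: string; // the post-title part of the file name
  coverImage: string; // expected cover image path under static/
}

// YYYY-MM-DD-post-title.mdx, with a lowercase kebab-case title (assumption).
const BLOG_FILE_PATTERN =
  /^(\d{4})-(\d{2})-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.mdx$/;

function parseBlogFileName(fileName: string): BlogPostInfo | null {
  const match = BLOG_FILE_PATTERN.exec(fileName);
  if (match === null) {
    return null;
  }
  const [, year = "", month = "", day = "", slug = ""] = match;
  // Reject impossible calendar values such as month 13 or day 00.
  const monthNumber = Number(month);
  const dayNumber = Number(day);
  if (monthNumber < 1 || monthNumber > 12 || dayNumber < 1 || dayNumber > 31) {
    return null;
  }
  return {
    date: `${year}-${month}-${day}`,
    slug,
    coverImage: `static/img/blog/${year}/${month}/${slug}-cover.webp`,
  };
}
```

A check like this could run in a pre-commit hook or CI step to catch misnamed posts before review; the day range check is deliberately loose and does not validate month lengths.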
- Purpose: Custom React components and functionality
- Key Directories:
  - components/: Reusable React components
  - theme/: Docusaurus theme customizations
  - pages/: Custom static pages
  - utils/: Helper functions and utilities
- Purpose: Images, fonts, downloads
- Organization:
  - Images organized by content type and date
  - Fonts in dedicated directory
- Naming Convention: lowercase-with-hyphens.extension
- Supported Formats:
  - Images: .png, .jpg, .svg, .webp (prefer WebP for better performance)
  - Fonts: .woff2, .woff
  - Documents: .pdf, .zip
- Root Level:
  - docusaurus.config.js: Main configuration
  - sidebars.js: Documentation navigation
  - package.json: Dependencies and scripts
  - tsconfig.json: TypeScript configuration
  - tailwind.config.js: CSS framework config
- main: Production-ready code
- develop: Development branch
- Feature branches: feature/feature-name
- Doc branches: doc/truncated-doc-name
- Blog branches: blog/truncated-blog-name
- Hotfix branches: hotfix/issue-description
- master: Requires PR and approvals
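To illustrate the branch naming scheme, here is a small sketch of a validator. `isValidBranchName` is a hypothetical helper, and the requirement that the part after the slash is lowercase kebab-case is an assumption; the SOP only lists the prefixes.

```typescript
// Hypothetical helper: the prefixes mirror the branch list in this SOP.
const LONG_LIVED_BRANCHES = ["main", "master", "develop"];
const BRANCH_PREFIXES = ["feature", "doc", "blog", "hotfix"];

function isValidBranchName(name: string): boolean {
  if (LONG_LIVED_BRANCHES.indexOf(name) !== -1) {
    return true;
  }
  const slashIndex = name.indexOf("/");
  if (slashIndex === -1) {
    return false;
  }
  const prefix = name.slice(0, slashIndex);
  const description = name.slice(slashIndex + 1);
  // Assumption: the descriptive part is lowercase kebab-case.
  const isKebabCase = /^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(description);
  return BRANCH_PREFIXES.indexOf(prefix) !== -1 && isKebabCase;
}
```

For example, `isValidBranchName("doc/sop-update")` passes, while `isValidBranchName("bugfix/typo")` fails because `bugfix` is not one of the listed prefixes.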
- Docusaurus v3.8.1
- React v18.3.1
- TailwindCSS v3.4.17
- TypeScript v5.0.0
- ESLint
- Prettier
- Output directory: /build
- Static assets: /build/static
- Generated HTML: /build/docs, /build/blog
- Asset manifests: /build/asset-manifest.json
- Hot reloading enabled
- Source maps included
- Development server on port 3000
- Node.js version: 18.x LTS or later
- Git version: 2.30.0 or later
- Recommended: Visual Studio Code
- Required Extensions:
- ESLint
- Prettier
- MDX
- Tailwind CSS IntelliSense
- GitLens
# Clone the repository
git clone https://github.com/datazip-inc/olake-docs.git
# Navigate to project directory
cd olake-docs
# Install dependencies
npm install
# Start development server
npm start
# Build for production
npm run build
# Serve production build
npm run serve
- Node Version Mismatch
  # Check Node version
  node -v
  # Switch to correct version
  nvm use
- Dependency Issues
  # Clear npm cache
  npm cache clean --force
  # Remove node_modules
  rm -rf node_modules
  # Reinstall dependencies
  npm install
- Port Conflicts
  # Find process using port 3000
  lsof -i :3000
  # Kill process
  kill -9 <PID>
- TypeScript Errors
  - Check tsconfig.json settings
  - Run npm run typecheck for detailed errors
  - Update the type definitions for the affected package: npm install --save-dev @types/<package-name>
- MDX Compilation Errors
  - Verify MDX syntax
  - Check component imports
- Create feature branch
- Make changes
- Commit with conventional commits
- Push and create PR
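The "conventional commits" step can be enforced with a simple header check. The sketch below is an illustration only: `isConventionalCommit` is a hypothetical helper, and the accepted type list is the commonly used set (an assumption, since this SOP does not enumerate allowed types).

```typescript
// Commonly used Conventional Commits types (assumption: the project may
// allow a different set).
const COMMIT_TYPES = [
  "feat", "fix", "docs", "style", "refactor",
  "perf", "test", "build", "ci", "chore", "revert",
];

// <type>[(scope)][!]: <description>
const COMMIT_HEADER_PATTERN = /^([a-z]+)(\([^()\s]+\))?(!)?: (\S.*)$/;

function isConventionalCommit(header: string): boolean {
  const match = COMMIT_HEADER_PATTERN.exec(header);
  if (match === null) {
    return false;
  }
  const type = match[1] ?? "";
  return COMMIT_TYPES.indexOf(type) !== -1;
}
```

A header such as `docs: update deployment section` or `feat(blog)!: add tags page` is accepted, while `Updated stuff` is rejected. Only the first line of the commit message is checked; body and footer rules are out of scope for this sketch.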
graph TD
A[Content Sources] --> B[Build System]
B --> C[Static Site]
B --> D[Search Index]
E[Configuration] --> B
F[Assets] --> B
subgraph "Content Sources"
A1[MDX Files]
A2[React Components]
A3[Documentation]
A4[Blog Posts]
end
subgraph "Build System"
B1[Docusaurus]
B2[Webpack]
B3[TypeScript]
B4[PostCSS]
end
subgraph "Static Site"
C1[HTML]
C2[CSS]
C3[JavaScript]
C4[Assets]
end
graph LR
A[Pages] --> B[Layouts]
B --> C[Components]
C --> D[Base Components]
subgraph "Component Layers"
D[Base Components]
C[Components]
B[Layouts]
A[Pages]
end
- Docusaurus v3.8.1
  - Documentation framework
  - MDX support
  - Plugin system
  - Search integration
- React 18.3.1
  - Component-based UI
  - Server Components
  - Suspense
  - Error Boundaries
- TailwindCSS 3.4.17
  - Utility-first CSS
  - Responsive design
  - Dark mode support
- Webpack 5
  - Module bundling
  - Code splitting
  - Asset optimization
  - Development server
- TypeScript 5.0
  - Static typing
  - Type checking
  - Code intelligence
  - Modern JavaScript features
- ESLint
  - Code linting
  - Custom rules
- Prettier
  - Code formatting
  - Style consistency
  - IDE integration
import React from 'react';

// Base component structure
interface BaseComponentProps {
  className?: string;
  children?: React.ReactNode;
  // Common props
}
// Example component hierarchy
// (Layout and Main stand in for the project's own layout components)
const Page = ({ children, ...props }: BaseComponentProps) => {
  return (
    <Layout {...props}>
      <Main>{children}</Main>
    </Layout>
  );
};
- Layout Components
  - Layout.tsx: Base layout wrapper
  - Header.tsx: Site header
  - Footer.tsx: Site footer
  - Sidebar.tsx: Documentation sidebar
  - Navigation.tsx: Main navigation
- Content Components
  - MDXComponents.tsx: MDX renderers
  - CodeBlock.tsx: Code highlighting
  - TableOfContents.tsx: Content navigation
  - Pagination.tsx: Page navigation
- Interactive Components
  - Search.tsx: Search interface
  - ThemeToggle.tsx: Theme switcher
  - CopyButton.tsx: Copy functionality
  - Tabs.tsx: Content tabs
- Utility Components
  - SEO.tsx: Meta tags management
  - Analytics.tsx: Usage tracking
  - ErrorBoundary.tsx: Error handling
  - Loading.tsx: Loading states
- HomePage
  - olake-docs/src/components/site
graph TD
A[MDX Content] --> B[Docusaurus]
B --> C[Static HTML]
D[Config] --> B
E[Assets] --> B
F[Components] --> B
// docusaurus.config.js
module.exports = {
  title: 'OLake Documentation',
  tagline: ' Data Lake ',
  // Site origin only; sub-paths belong in baseUrl
  url: 'https://olake.io',
  baseUrl: '/',
  onBrokenLinks: 'throw',
  onBrokenMarkdownLinks: 'warn',
  favicon: 'img/favicon.ico',
  organizationName: 'datazip-inc',
  projectName: 'olake-docs',
  // Theme configuration
  themeConfig: {
    // Navigation
    navbar: {},
    // Search
    algolia: {},
    // Footer
    footer: {}
  },
  // Presets
  presets: [
    [
      '@docusaurus/preset-classic',
      {
        docs: {},
        blog: {},
        theme: {},
        pages: {}
      }
    ]
  ],
  // Plugins
  plugins: []
}
// tsconfig.json
{
  "compilerOptions": {
    "target": "es2020",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "strict": true,
    "forceConsistentCasingInFileNames": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "jsx": "preserve",
    "incremental": true,
    "baseUrl": ".",
    "paths": {
      "@/*": ["src/*"]
    }
  },
  "include": ["**/*.ts", "**/*.tsx"],
  "exclude": ["node_modules"]
}
- @docusaurus/plugin-content-docs
- @docusaurus/plugin-content-blog
- @docusaurus/plugin-content-pages
- @docusaurus/plugin-sitemap
- @docusaurus/plugin-google-analytics
// Example custom plugin
module.exports = function (context, options) {
  return {
    name: 'custom-plugin',
    // Lifecycle hooks
    async loadContent() {},
    async contentLoaded({ content, actions }) {},
    // Configuration
    getThemePath() {},
    getTypeScriptThemePath() {},
    // Client modules
    getClientModules() {}
  }
}
- Code splitting
- Tree shaking
- Asset optimization
- Caching strategies
- Compression
- Lazy loading
- Image optimization (prefer WebP, under 100 KB per image)
- Font loading
- CSS optimization
- JavaScript optimization
- Lighthouse scores
- Web Vitals
- Bundle analysis
- Performance metrics
The project integrates with several third-party services and plugins to enhance functionality. These are primarily configured in docusaurus.config.js.
- Purpose: Provides the search functionality for the documentation and blog.
- Configuration: Algolia is configured in the themeConfig.algolia section of docusaurus.config.js. This includes the App ID, API Key, and Index Name.
- Updates: The Algolia DocSearch crawler visits the website once every 24 hours and uses the sitemap (https://olake.io/sitemap.xml) to discover pages. When changes are pushed to the master branch, GitHub Actions deploys them and the sitemap is regenerated. The crawler reindexes the changes within 24 hours, after which search results are updated.
- Purpose: Google Analytics is used for tracking website traffic, user engagement, and content performance.
- Configuration: While not directly configured with a tracking ID in docusaurus.config.js, the necessary DNS prefetch links for Google Analytics services are included in the headTags.
- Purpose: The @docusaurus/theme-mermaid plugin is used to render Mermaid diagrams within Markdown files, allowing for the creation of flowcharts, sequence diagrams, and other visualizations from code-like text.
- Purpose: The plugin-image-zoom and ideal-image plugins are used to provide a better user experience with images. They enable responsive image sizes and click-to-zoom functionality.
- Purpose: Used for lead generation and user registration on the website. The main registration form for the pilot program is powered by HubSpot.
- Configuration:
  - Portal ID: 21798546
  - Form ID: 86391f69-48e0-4b35-8ffd-13ac212d8208
  - Location: Embedded in the RegistrationSection component on the homepage
- How it works:
  - The HubSpot form script loads lazily when the registration section comes into view or when users navigate directly to the #olake-form-product anchor
  - This improves page load performance by not loading the form script until needed
  - Forms automatically submit lead data to HubSpot for tracking and follow-up
- Setup in code: Located in src/components/site/RegistrationSection.jsx
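The lazy-loading behavior described for the HubSpot form can be sketched as a small "load once" controller. This is an illustration under stated assumptions, not the actual RegistrationSection code: `createLazyFormLoader` is a hypothetical name, and the script-loading step is passed in as a callback so the logic stays independent of the browser.

```typescript
// Hypothetical sketch: the real RegistrationSection component may differ.
const FORM_ANCHOR = "#olake-form-product";

function createLazyFormLoader(loadScript: () => void) {
  let loaded = false;

  // Ensures the form script is requested at most once.
  const loadOnce = (): void => {
    if (!loaded) {
      loaded = true;
      loadScript();
    }
  };

  return {
    // Intended to be called from an IntersectionObserver callback.
    onVisibilityChange(isVisible: boolean): void {
      if (isVisible) {
        loadOnce();
      }
    },
    // Intended to be called on initial load and on hashchange events.
    onHashChange(hash: string): void {
      if (hash === FORM_ANCHOR) {
        loadOnce();
      }
    },
    isLoaded: (): boolean => loaded,
  };
}
```

In a browser, `loadScript` would typically append the form provider's script tag to the document, and the two handlers would be wired to an `IntersectionObserver` and the `hashchange` event. Keeping the trigger logic separate makes the "load at most once" guarantee easy to test.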
- Purpose: Powers the "Join the waitlist" form at /ai-lake (e.g., the AILake waitlist)
- Configuration: Form is embedded using Tally's embed script
- Location: Used in the Glace component (src/components/site/Glace.tsx)
- Advantages: Quick to set up
- Purpose: Automatically notifies search engines (Bing, Yandex, etc.) when content is updated, speeding up indexing
- Configuration: Custom plugin located at src/plugins/indexnow/index.js
- How it works:
  - After each build, the plugin reads the generated sitemap
  - Extracts all URLs from the sitemap
  - Submits them to IndexNow API endpoints for Bing and Yandex
  - Helps get new content indexed faster than waiting for search engines to crawl naturally
- API Key: Stored in the INDEXNOW_KEY environment variable (also hardcoded as fallback)
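The sitemap-to-IndexNow flow above can be sketched as two pure functions. This is not the code in src/plugins/indexnow/index.js; `extractSitemapUrls` and `buildIndexNowPayload` are hypothetical names. The payload fields (`host`, `key`, `keyLocation`, `urlList`) follow the public IndexNow protocol, and placing the key file at the site root is an assumption.

```typescript
// Hypothetical sketch: the project's own plugin may be implemented differently.
interface IndexNowPayload {
  host: string;
  key: string;
  keyLocation: string;
  urlList: string[];
}

// Collects the contents of every <loc> element in a sitemap.
// XML entities such as &amp; are not decoded in this sketch.
function extractSitemapUrls(sitemapXml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    urls.push(match[1] ?? "");
  }
  return urls;
}

// Builds the JSON body for a bulk IndexNow submission.
function buildIndexNowPayload(
  host: string,
  key: string,
  urls: string[],
): IndexNowPayload {
  return {
    host,
    key,
    // Assumption: the key file is served from the site root.
    keyLocation: `https://${host}/${key}.txt`,
    urlList: urls,
  };
}
```

The resulting object would be sent as a JSON POST body to each search engine's IndexNow endpoint. The network call is left out here so the sketch stays runnable without credentials.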
- Documentation
  - Product documentation
  - Getting started guides
- Blog Posts
  - Product updates
  - Technical deep dives
  - Case studies
  - Community highlights
- Component Library
  - UI components
  - Documentation components
  - Interactive examples
  - Code snippets
docs/
├── getting-started/ # Onboarding documentation
│ ├── introduction.mdx
│ ├── quick-start.mdx
│ └── installation.mdx
├── core/ # Core concepts
│ ├── architecture.mdx
│ └── concepts.mdx
├── features/ # Feature documentation
│ ├── feature-a.mdx
│ └── feature-b.mdx
├── tutorials/ # Step-by-step guides
│ ├── tutorial-1.mdx
│ └── tutorial-2.mdx
└── api/ # API documentation
├── overview.mdx
└── endpoints.mdx
- File Naming
  - Use kebab-case: feature-name.mdx
  - Group related files in directories
  - Use descriptive names
- Frontmatter
  ---
  id: unique-id
  title: Page Title
  sidebar_label: Sidebar Label
  description: Page description for SEO
  keywords: [keyword1, keyword2]
  image: /img/docs/feature-image.png
  ---
- Content Structure
  # Title
  ## Overview
  Brief introduction
  ## Prerequisites
  Required knowledge/setup
  ## Steps
  1. Step one
  2. Step two
  ## Examples
  Code examples
  ## Troubleshooting
  Common issues and solutions
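A lightweight check can confirm that a page declares the frontmatter fields shown above. The sketch below is illustrative: `missingFrontmatterKeys` is a hypothetical helper, it only recognizes top-level `key: value` lines rather than full YAML, and which keys count as required is left to the caller because this SOP does not say.

```typescript
// Hypothetical helper: reads top-level keys from a frontmatter block.
// This is a simplified reader, not a YAML parser.
function parseFrontmatterKeys(source: string): string[] {
  const lines = source.split(/\r?\n/);
  if ((lines[0] ?? "").trim() !== "---") {
    return [];
  }
  const keys: string[] = [];
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "---") {
      return keys; // closing delimiter reached
    }
    const match = /^([A-Za-z_][\w-]*):/.exec(line);
    if (match !== null) {
      keys.push(match[1] ?? "");
    }
  }
  return []; // no closing delimiter: treat as missing frontmatter
}

// Returns the required keys that the page does not declare.
function missingFrontmatterKeys(source: string, required: string[]): string[] {
  const present = parseFrontmatterKeys(source);
  return required.filter((key) => present.indexOf(key) === -1);
}
```

For instance, calling `missingFrontmatterKeys(pageSource, ["id", "title", "description"])` returns an empty array when all three keys are present, which makes it easy to use in a CI step that fails on any non-empty result.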
blog/
├── 2025/
│ └── 10/
│ ├── post-title.mdx
│ └── images/
│ ├── cover.png
│ └── diagram-1.png
├── authors.yml
└── tags.yml
- File Naming
  YYYY-MM-DD-post-slug.mdx
- Frontmatter
  ---
  title: Post Title
  authors: [author_id]
  tags: [tag1, tag2]
  description: Brief description
  image: ./img/blog/2025/10/cover.png
  date: 2025-10-06
  draft: false
  ---
- Author Configuration
  # authors.yml
  author_id:
    name: Author Name
    title: Job Title
    url: https://github.com/author
    image_url: https://github.com/author.png
- Content Template
  # Introduction
  Brief overview
  ## Problem Statement
  What problem are we solving?
  ## Solution
  How we solved it
  ## Implementation
  Technical details
  ## Results
  Outcomes and benefits
  ## Conclusion
  Summary and next steps
src/
├── components/
│ ├── common/
│ │ ├── Button/
│ │ │ ├── index.tsx
│ │ │ ├── styles.module.css
│ │ │ └── types.ts
│ │ └── Card/
│ ├── docs/
│ │ └── CodeBlock/
│ └── blog/
│ └── PostCard/
- File Structure
  // index.tsx
  import { ComponentProps } from './types'
  import styles from './styles.module.css'
  export const Component = ({ prop1, prop2 }: ComponentProps) => {
    // Implementation
  }
- Type Definitions
  // types.ts
  export interface ComponentProps {
    prop1: string
    prop2?: number
    children?: React.ReactNode
  }
- Styling
  /* styles.module.css */
  .container {
    /* styles */
  }
- Planning
  - Identify documentation needs
  - Create outline
  - Review with stakeholders
- Writing
  - Follow style guide
  - Include examples
  - Add images/diagrams
- Review
  - Technical review
  - Editorial review
  - Stakeholder review
- Publishing
  - Update sidebar
  - Check links
  - Deploy changes
- Technical accuracy
- Grammar and spelling
- Links working
- Images optimized
- Mobile responsive
- SEO optimized
- Self Review
  - Run spellcheck
  - Verify code samples
  - Check formatting
- Peer Review
  - Technical accuracy
  - Code review
  - Content flow
- Editorial Review
  - Grammar and style
  - Clarity
  - Consistency
# Build the site
npm run build
# Serve locally to verify
npm run serve
# Deploy to GitHub Pages
npm run deploy
- Manual Verification
  - Check homepage loads
  - Verify search functionality
  - Test navigation
  - Check external links
  - Verify assets loading
- Asset compression
- Code splitting
- Tree shaking
- Cache optimization
- CDN configuration
- Browser caching
- Service worker
- Image optimization
After each deployment, the following happens automatically:
- Sitemap Generation:
  - New sitemap is generated at https://olake.io/sitemap.xml
  - Contains all documentation, blog posts, and pages
  - Helps search engines discover new and updated content
- IndexNow Notification:
  - IndexNow plugin automatically notifies Bing and Yandex
  - Sends list of all URLs from sitemap
  - Speeds up indexing of new content
- Algolia Search Indexing:
  - Algolia crawler will reindex the site within 24 hours
  - Can manually trigger reindex from Algolia Crawler dashboard if needed
  - Search results will reflect new content after reindex completes
- Manual verification checklist:
  - Check that new pages are accessible
  - Verify images load correctly
  - Test search functionality with new content (after 24 hours)
  - Check mobile responsiveness
  - Verify forms still work (HubSpot, Tally)
- Page Load Time: Should be under 3 seconds
- Time to Interactive: Should be under 5 seconds
- First Contentful Paint: Should be under 1.8 seconds
- Largest Contentful Paint: Should be under 2.5 seconds
- Cumulative Layout Shift: Should be under 0.1
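The performance targets above can be expressed as a budget table and checked against measured values. This is a minimal sketch: `PerformanceMetrics` and `findBudgetViolations` are hypothetical names, and the measurements would come from whatever tool the team uses (for example a Lighthouse report).

```typescript
// Hypothetical types and names; the thresholds are the targets listed above.
interface PerformanceMetrics {
  pageLoadSeconds: number;
  timeToInteractiveSeconds: number;
  firstContentfulPaintSeconds: number;
  largestContentfulPaintSeconds: number;
  cumulativeLayoutShift: number;
}

const PERFORMANCE_BUDGETS: PerformanceMetrics = {
  pageLoadSeconds: 3,
  timeToInteractiveSeconds: 5,
  firstContentfulPaintSeconds: 1.8,
  largestContentfulPaintSeconds: 2.5,
  cumulativeLayoutShift: 0.1,
};

// Returns the names of metrics that are not strictly under their budget.
function findBudgetViolations(metrics: PerformanceMetrics): string[] {
  const names = Object.keys(PERFORMANCE_BUDGETS) as Array<keyof PerformanceMetrics>;
  return names.filter((name) => metrics[name] >= PERFORMANCE_BUDGETS[name]);
}
```

Because each target is phrased as "under" a limit, a value equal to the limit is treated as a violation. An empty result means the page meets every target.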
- Google Lighthouse:
  - Run regular audits on key pages
  - Track performance, accessibility, SEO scores
  - Target: All scores above 90
- PageSpeed Insights:
  - https://pagespeed.web.dev/ - Run analysis on Core Web Vitals
- Preferred format: WebP / SVG
- Maximum size: 100 KB per image
- Compression: Always optimize before uploading
- Alt text: Always include for accessibility
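The image guidelines above translate into a short review function. The sketch is illustrative only: `ImageAsset` and `reviewImage` are hypothetical names, and treating 100 KB as 100 × 1024 bytes is an assumption.

```typescript
// Hypothetical helper for reviewing a single image against the guidelines.
interface ImageAsset {
  path: string;
  sizeBytes: number;
  altText: string;
}

// Assumption: 100 KB is interpreted as 100 * 1024 bytes.
const MAX_IMAGE_BYTES = 100 * 1024;
const PREFERRED_EXTENSIONS = [".webp", ".svg"];

function reviewImage(image: ImageAsset): string[] {
  const problems: string[] = [];
  const dotIndex = image.path.lastIndexOf(".");
  const extension = dotIndex === -1 ? "" : image.path.slice(dotIndex).toLowerCase();
  if (PREFERRED_EXTENSIONS.indexOf(extension) === -1) {
    problems.push("format: prefer WebP or SVG");
  }
  if (image.sizeBytes > MAX_IMAGE_BYTES) {
    problems.push("size: larger than 100 KB");
  }
  if (image.altText.trim() === "") {
    problems.push("alt: missing alt text");
  }
  return problems;
}
```

An empty array means the image satisfies all three checks. PNG and JPG files are flagged as a preference rather than an error, since the repository still lists them as supported formats.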
- Follow ESLint rules
- Use TypeScript for type safety
- Keep components small and focused
- Reuse components when possible
- Include meta descriptions on all pages
- Use proper heading hierarchy (H1 → H2 → H3)
- Add alt text to all images
- Use descriptive URLs
- Add structured data where appropriate
- Keep content fresh and updated
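The heading hierarchy rule (H1 → H2 → H3) can be checked automatically for Markdown and MDX sources. The following is a sketch under simple assumptions: `findHeadingSkips` is a hypothetical helper, it only looks at ATX headings (lines starting with `#`), and it ignores anything inside fenced code blocks.

```typescript
// Hypothetical helper: reports headings that skip one or more levels.
function findHeadingSkips(markdown: string): string[] {
  const problems: string[] = [];
  let previousLevel = 0; // 0 means no heading has been seen yet
  let insideFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // Toggle on fence delimiters so code comments are not read as headings.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      insideFence = !insideFence;
      continue;
    }
    if (insideFence) {
      continue;
    }
    const match = /^(#{1,6})\s+(.+)$/.exec(line);
    if (match === null) {
      continue;
    }
    const level = (match[1] ?? "").length;
    const title = match[2] ?? "";
    // The first heading sets the baseline; later ones may go at most one level deeper.
    if (previousLevel > 0 && level > previousLevel + 1) {
      problems.push(`${title}: H${previousLevel} -> H${level}`);
    }
    previousLevel = level;
  }
  return problems;
}
```

The first heading is accepted at any level because Docusaurus pages often take their H1 from the frontmatter title. Moving back up the hierarchy (for example from H4 to H2) is always allowed.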
- Build failures
  - Check Node.js version
  - Verify dependencies
  - Review build logs
- Content rendering issues
  - Validate MDX syntax
  - Check component imports
  - Verify image paths
- Official Docusaurus documentation
- GitHub repository issues
- Internal documentation
- Community support channels
- Last Updated: October 6, 2025
- Version: 1.0
- Maintainers: OLake Documentation Team