
Stability Fixes, Performance Optimizations, and v1.0.5 Release #8

Open
intelliDean wants to merge 133 commits into zecdev:main from intelliDean:feature/stabilization-and-perf-v2

@intelliDean intelliDean commented Apr 9, 2026

Summary

This PR fulfils the requirements for Milestone 3 by converting ZecKit into a fully reusable GitHub Action. It enables any external repository to effortlessly spin up a Zcash devnet and execute the E2E Golden Flow (Fund → Shield → Send → Verify) against both zaino and lightwalletd backends.

Key Improvements:

1. Stability Fixes (Zebra 4.3.0 Support)

  • Startup Crash Fix: Resolved a critical issue where Zebra would crash on startup due to the unsupported initial_regtest_peers configuration field.
  • Embedded Asset Synchronisation: Updated the CLI to include fixed configuration assets, ensuring a seamless zeckit up experience.
  • Enhanced Health Checks: Refined node health detection to properly wait for RPC readiness.

2. Performance Optimisations (GitHub ARM64 Runners)

  • Native ARM64 Builds: Refactored the build-images workflow to use native GitHub-hosted ARM64 runners (ubuntu-24.04-arm) instead of slow QEMU emulation.
  • Split-Matrix Strategy: Implemented an AMD64/ARM64 parallel build matrix with a manifest-merging job, reducing image publication times by ~85% (from hours to ~25 minutes).
  • Caching: Optimised CI caching to minimise redundant build steps.

3. Versioning and Workspace Cleanup

  • Version v1.0.5: Incremented the workspace version to reflect these major stability and performance milestones.
  • Synchronized Branches: Ensured internal branch parity for main, m3-implementation, and nu6-upgrade.

This PR establishes a robust, high-performance baseline for all future ZecKit development.

Implements Milestone 3 deliverables on top of the existing M1/M2 foundation:

## GitHub Actions CI
- e2e-test.yml: Full devnet startup, smoke tests, artifact upload on failure
- smoke-test.yml: Lightweight health checks on every push
- Job timeout set to 120 minutes to accommodate Docker build time

## Two-Node Zebra Regtest Cluster
- zebra-miner: internal miner, mines blocks continuously
- zebra-sync: second node for cluster readiness verification
- Fixed zebra-sync.toml config fields for Zebra 4.1.0 compatibility
  (initial_testnet_peers, crawl_new_peer_interval)
- All indexer/faucet services point to zebra-miner for reliable data

## CLI Enhancements (zeckit up / test)
- health.rs: detailed RPC error messages surfaced during wait loops
- up.rs: periodic error reporting during Zebra startup
- test.rs: 7-test smoke suite; cluster sync is warn-only (Regtest P2P
  peering is best-effort in isolated CI environments)

## Docker / Entrypoint
- entrypoint.sh: verbose startup logging, config validation, zebrad --version
- Sync node waits for miner to be reachable before starting zebrad
- Removed container_name fields to prevent naming conflicts in CI
- Relaxed port bindings to 0.0.0.0 for CI compatibility

## E2E Golden Flow (verified in CI)
  fund (650 ZEC coinbase) -> shield (transparent->Orchard) -> shielded send (0.05 ZEC)
  TXID confirmed on-chain, faucet live with 650+ ZEC Orchard balance

## README
- CI badges (E2E Tests, Smoke Test, License)
- M3 complete status with deliverable list
- Updated test suite table (7 tests with WARN explanation)
- M4 roadmap entry

Previously, the --timeout flag was applied independently to each
service (Miner, Sync, Backend, Faucet) in sequence. In the worst
case, a 1-minute timeout could compound into 4+ minutes of actual
wait time, making the CI startup_timeout_minutes parameter
unreliable.
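
The compounding can be illustrated with a little arithmetic. The sketch below is not taken from the ZecKit source; the function names are hypothetical, and the four stage names are simply the ones listed above. It compares the worst-case wait when every stage gets a fresh timeout with the worst case under a single shared budget.

```rust
use std::time::Duration;

/// Worst-case total wait when every stage receives its own fresh timeout.
/// (Hypothetical helper, for illustration only.)
fn worst_case_per_stage(stages: &[&str], per_stage: Duration) -> Duration {
    // Each stage may consume its full timeout before the next one starts.
    per_stage * stages.len() as u32
}

/// Worst-case total wait when all stages draw from one shared budget.
/// (Hypothetical helper, for illustration only.)
fn worst_case_shared(_stages: &[&str], budget: Duration) -> Duration {
    // The number of stages no longer matters: the budget is spent once.
    budget
}
```

With the stages Miner, Sync, Backend, and Faucet and a 1-minute timeout, the first function yields 4 minutes and the second yields 1 minute. Any additional time spent between stages would account for the "4+" in the description above.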

This refactor introduces a single shared deadline computed once at
the start of Step 3:

  let deadline = Instant::now() + Duration::from_secs(timeout * 60);

Every wait-loop now checks Instant::now() >= deadline against this
shared clock. When the global budget expires anywhere in the
startup sequence, the process immediately fails with a clear error
message indicating the global timeout was exceeded.
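
A minimal sketch of such a wait loop follows, assuming a readiness probe passed in as a closure. The function name, the error text, and the polling interval are illustrative assumptions rather than the actual up.rs code; only the shared `deadline` idea comes from this change.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `is_ready` until it returns true or the shared deadline passes.
/// (Hypothetical helper; the real wait loops may be structured differently.)
fn wait_until_ready<F>(
    service: &str,
    deadline: Instant,
    poll_interval: Duration,
    mut is_ready: F,
) -> Result<(), String>
where
    F: FnMut() -> bool,
{
    loop {
        if is_ready() {
            return Ok(());
        }
        // Zero remaining time is equivalent to `Instant::now() >= deadline`.
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(format!(
                "global startup timeout exceeded while waiting for {service}"
            ));
        }
        // Never sleep past the deadline, so the failure is reported promptly.
        sleep(poll_interval.min(remaining));
    }
}
```

Because each caller receives the same `deadline` value instead of a duration, time spent waiting for one service is automatically unavailable to the services that follow it.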

The sync-parity soft-fail retains its 30-second local budget but
now also respects the global deadline, so it can no longer silently
consume time that allows startup to escape the requested timeout.
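
One way to express "a local budget that also respects the global deadline" is to take the earlier of the two instants. The sketch below assumes that approach; the names are hypothetical, and treating an expired global deadline as a hard failure (rather than a warning) is an assumption based on the description above, not confirmed behavior.

```rust
use std::time::{Duration, Instant};

/// What to do when the sync-parity wait ends without the nodes agreeing.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum UnsyncedOutcome {
    /// Local budget ran out first: warn and continue startup.
    Warn,
    /// Global deadline ran out: startup as a whole has timed out.
    GlobalTimeout,
}

/// The soft-fail check may wait until its local budget ends or the global
/// deadline arrives, whichever comes first.
fn effective_deadline(now: Instant, local_budget: Duration, global_deadline: Instant) -> Instant {
    (now + local_budget).min(global_deadline)
}

/// Classifies an unsynced result once the effective deadline has passed.
fn classify_unsynced(now: Instant, global_deadline: Instant) -> UnsyncedOutcome {
    if now >= global_deadline {
        UnsyncedOutcome::GlobalTimeout
    } else {
        UnsyncedOutcome::Warn
    }
}
```

With 5 minutes left on the global clock the check still gets its full 30 seconds; with only 10 seconds left it gets 10.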

Progress percentages for Backend and Faucet now reflect the
remaining global budget rather than a fresh per-stage clock.
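
As a rough illustration, a progress figure tied to the global clock could be computed as below. This is a sketch under assumptions: the function name is hypothetical, and whether the CLI reports budget consumed or budget remaining is not stated here, so the example reports the share consumed.

```rust
use std::time::{Duration, Instant};

/// Percentage of the global startup budget consumed at `now`, clamped to 0..=100.
/// (Hypothetical helper; the real progress display may differ.)
fn budget_used_percent(start: Instant, deadline: Instant, now: Instant) -> u8 {
    let total: Duration = deadline.saturating_duration_since(start);
    if total.is_zero() {
        // A zero-length budget is already fully spent.
        return 100;
    }
    // Clamp so that times past the deadline still report 100%.
    let elapsed = now.saturating_duration_since(start).min(total);
    (elapsed.as_secs_f64() / total.as_secs_f64() * 100.0).round() as u8
}
```

Since `start` is the beginning of the whole startup sequence rather than the beginning of the current stage, a stage that begins late starts at a correspondingly high percentage instead of resetting to zero.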

Fixes: the 'startup-timeout' Failure Drill incorrectly reporting
'pass'. With compounded per-service timeouts, the devnet could still
become healthy after the injected 1-minute window had elapsed.