@lewismc lewismc commented Dec 28, 2025

This is a big PR that revisits the long-sought replacement of Ant + Ivy with Gradle. I integrated much of the work we did a number of years ago and also demystified the Gradle implementation for plugins, which had previously stopped this task in its tracks.
I've tried to maintain parity between the Ant target names and the Gradle task names so the build feels the same. I've also updated the README with new guidance and updated the GitHub Action so that this can be tested more thoroughly.
I also want to thank the previous contributors to this task. They did a fantastic job and really did the bulk of the work.
Thanks for any review.

Gradle Build Benchmark Results

Date: December 27, 2025
System: macOS (darwin 25.2.0)
Gradle Version: 8.5
Java Version: 11+

Build Times

| Benchmark | Command | Time | Notes |
| --- | --- | --- | --- |
| Cold build | ./gradlew clean runtime --no-daemon | 13.1s | First build, no caches, no daemon |
| Warm build | ./gradlew clean runtime | 10.6s | Daemon running, caches populated |
| Incremental build | ./gradlew runtime (1 file changed) | 1.1s | Only recompiles affected code |
| No-op build | ./gradlew runtime (nothing changed) | 0.9s | Just checks up-to-date status |

Artifact Sizes

| Artifact | Size | Description |
| --- | --- | --- |
| apache-nutch-1.22-SNAPSHOT.jar | 889 KB | Core Nutch classes |
| apache-nutch-1.22-SNAPSHOT.job | 305 MB | Hadoop job JAR with all dependencies |
| runtime/local/ | 420 MB | Full local runtime directory |

Task Execution Summary

  • 318 total tasks in the build graph
  • 241 tasks executed on clean build
  • 76 tasks from cache (Gradle build cache)
  • Parallel execution enabled (org.gradle.parallel=true)

Gradle Features Utilized

| Feature | Status | Benefit |
| --- | --- | --- |
| Incremental compilation | ✅ Enabled | Only recompiles changed files |
| Build cache | ✅ Enabled | Reuses outputs from previous builds |
| Parallel execution | ✅ Enabled | Builds independent tasks concurrently |
| Gradle Daemon | ✅ Enabled | Keeps JVM warm between builds |
| Up-to-date checking | ✅ Smart | Skips tasks when inputs unchanged |
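
The daemon, parallel execution, and the build cache are usually switched on through `gradle.properties`. As a rough sketch, assuming those settings live in a `gradle.properties` file at the project root (this description does not show the file, and `check_gradle_flags` is a hypothetical helper name), the following shell function reports which of the three standard properties are explicitly set to `true`:

```shell
#!/bin/sh
# Report whether each performance-related Gradle property is explicitly
# set to true in the given properties file (default: ./gradle.properties).
# Assumption: the flags are configured in that file, not on the command line.
check_gradle_flags() {
  props="${1:-gradle.properties}"
  for key in org.gradle.parallel org.gradle.caching org.gradle.daemon; do
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*true[[:space:]]*$" "$props"; then
      echo "$key: enabled"
    else
      echo "$key: not set to true in $props"
    fi
  done
}
```

Calling `check_gradle_flags` with no argument reads `./gradle.properties`. A property reported as not set may still be active: the daemon is on by default in current Gradle versions, and the build cache can also be enabled per invocation with `--build-cache`.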

Comparison with Ant Build

Build Time Comparison

| Benchmark | Ant | Gradle | Improvement |
| --- | --- | --- | --- |
| Cold build (clean runtime) | 20.6s | 13.1s | 36% faster |
| Incremental build (1 file changed) | 10.0s | 1.1s | 89% faster |
| No-op build (nothing changed) | 3.8s | 0.9s | 76% faster |
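
The Improvement column can be re-derived from the two time columns. A minimal sketch, assuming "faster" means the relative reduction in wall-clock time, (Ant - Gradle) / Ant; `pct_faster` and `speedup` are illustrative helper names, not part of the build:

```shell
#!/bin/sh
# Percentage reduction in build time: (ant - gradle) / ant * 100, rounded.
pct_faster() {
  awk -v ant="$1" -v gradle="$2" 'BEGIN { printf "%.0f\n", (ant - gradle) / ant * 100 }'
}

# Speedup factor: ant / gradle, to one decimal place.
speedup() {
  awk -v ant="$1" -v gradle="$2" 'BEGIN { printf "%.1f\n", ant / gradle }'
}

pct_faster 20.6 13.1   # cold build        -> 36
pct_faster 10.0 1.1    # incremental build -> 89
pct_faster 3.8 0.9     # no-op build       -> 76
speedup 10.0 1.1       # incremental build -> 9.1
```

The two measures describe the same data differently: an 89% reduction in time corresponds to a speedup of roughly 9x.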

Artifact Size Comparison

| Artifact | Ant | Gradle | Difference |
| --- | --- | --- | --- |
| Core JAR | 842 KB | 889 KB | +6% |
| Job JAR | 292 MB | 305 MB | +4% |
| runtime/local/ | 355 MB | 420 MB | +18% |

Analysis

Build Performance:

  • Gradle's incremental compilation provides the biggest win: rebuilding after a single file change is about 9x faster than Ant (1.1s vs 10.0s)
  • Cold builds are 36% faster due to parallel task execution and optimized dependency resolution
  • No-op builds benefit from Gradle's smart up-to-date checking (0.9s vs 3.8s)

Artifact Sizes:

  • Gradle produces slightly larger artifacts due to different dependency resolution
  • The Job JAR is 4% larger but uses the more efficient nested JAR format (vs unpacked classes in Ant)
  • Runtime directory is larger due to additional transitive dependencies being included

Developer Experience:

  • Gradle Daemon keeps JVM warm between builds, reducing startup overhead
  • Build cache allows reusing outputs across clean builds
  • Parallel execution utilizes multiple CPU cores effectively

How to Reproduce

# Stop any running daemon
./gradlew --stop

# Cold build (no daemon)
time ./gradlew clean runtime --no-daemon

# Warm build (with daemon)
time ./gradlew clean runtime

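# Incremental build (change a file's content, rebuild)
# Gradle's up-to-date checks compare file content, so touch alone does not trigger a recompile
echo "// benchmark edit" >> src/java/org/apache/nutch/crawl/CrawlDb.java
time ./gradlew runtime

# No-op build
time ./gradlew runtime

# Restore the modified file
git checkout -- src/java/org/apache/nutch/crawl/CrawlDb.java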

# Check artifact sizes
ls -lh build/*.jar build/*.job
du -sh runtime/local/

@lewismc lewismc self-assigned this Dec 28, 2025
@lewismc lewismc marked this pull request as draft December 28, 2025 16:30