
Upgrade Hive client to 2.3.9 and HIVE platform JDK toolchain to 17 #177

Open

YogeshKothari26 wants to merge 1 commit into linkedin:master from YogeshKothari26:yokothar/transport-hive-2.3.9-jdk17

YogeshKothari26 commented May 11, 2026

Summary

Bumps the plugin's pinned Hive client 1.2.2 → 2.3.9 and, in Defaults.java, the HIVE platform's JDK toolchain 8 → 17. Bytecode for the HIVE platform stays at Java 8 (options.release.set(8)), so the consumer UDF jars it produces remain runnable on Java 8 runtimes.

Spark (spark_2.11, spark_2.12) and Trino subprojects are not changed.

Motivation

  • Hive 1.2.2 transitively pulls in org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde, which is not resolvable from Maven Central, so downstream builds that don't add an explicit exclusion fail dependency resolution. With Hive 2.3.9, this dependency can be excluded cleanly.
  • Hive 1.2.2's embedded HiveServer2, DataNucleus, and Derby reflection paths fail under JDK 17. Hive 2.3.9 runs on JDK 17 with the standard --add-opens flags that this PR applies.
  • Net result: downstream UDF projects can move their build JVM to Java 17 without per-project workaround patches for Hive itself.

Changes

  • Defaults.java: HIVE platform JavaLanguageVersion.of(8) → of(17); add org.pentaho exclusion to the consumer's hive-exec compileOnly configuration.
  • TransportPlugin.java: pin bytecode release = 8 for non-Trino Java platforms; add --add-opens JVM args to the test launcher when the platform is non-Trino and its JavaLanguageVersion is 17 or higher.
  • transportable-udfs-plugin/build.gradle: hive-version 1.2.2 → 2.3.9 in the generated version-info.properties.
  • transportable-udfs-hive/build.gradle: hive-exec 1.2.2 → 2.3.9 (compileOnly + testImplementation), add org.pentaho exclusion, add --add-opens to the subproject's own test task when the build JVM is JDK 17.
  • transportable-udfs-test-hive/build.gradle: hive-exec / hive-service 1.2.2 → 2.3.9, add org.pentaho exclusion.
  • HiveTester.java: two Hive 2.x compat fixes — replace the removed FunctionInfo(boolean, String, GenericUDF) ctor with FunctionInfo(FunctionType.PERSISTENT, ...); disable METASTORE_SCHEMA_VERIFICATION and enable datanucleus.schema.autoCreateAll on the embedded HiveConf (Hive 2.3.x's embedded Derby strictly verifies schema version on startup).
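The TransportPlugin.java rule in the list above (pin bytecode release 8 for non-Trino platforms; add --add-opens to the test launcher only when the toolchain is 17 or higher and the platform is not Trino) can be modeled as a pair of pure functions. This is a sketch, not the plugin's code: the `Platform` record, the name-based Trino check, and the method names are hypothetical, and the three `--add-opens` flags are illustrative assumptions because this PR does not list the exact flags it applies.

```java
import java.util.List;
import java.util.OptionalInt;

public final class PlatformJvmSettings {

    /** Hypothetical, minimal view of a platform: its name and toolchain JavaLanguageVersion. */
    public record Platform(String name, int languageVersion) {
        public boolean isTrino() {
            return name.equalsIgnoreCase("trino");
        }
    }

    // Illustrative flags only: the PR says "standard --add-opens flags" without listing them.
    public static final List<String> EXAMPLE_ADD_OPENS = List.of(
            "--add-opens=java.base/java.lang=ALL-UNNAMED",
            "--add-opens=java.base/java.net=ALL-UNNAMED",
            "--add-opens=java.base/java.util=ALL-UNNAMED");

    /** Release 8 for every non-Trino platform; empty means "leave the toolchain default" (Trino). */
    public static OptionalInt bytecodeRelease(Platform p) {
        return p.isTrino() ? OptionalInt.empty() : OptionalInt.of(8);
    }

    /** Test-launcher JVM args: --add-opens only for non-Trino platforms on a 17+ toolchain. */
    public static List<String> testJvmArgs(Platform p) {
        boolean needsOpens = !p.isTrino() && p.languageVersion() >= 17;
        return needsOpens ? EXAMPLE_ADD_OPENS : List.of();
    }

    public static void main(String[] args) {
        for (Platform p : List.of(
                new Platform("hive", 17),
                new Platform("spark_2.12", 8),
                new Platform("trino", 17))) {
            System.out.println(p.name() + " release=" + bytecodeRelease(p)
                    + " testJvmArgs=" + testJvmArgs(p));
        }
    }
}
```

Keeping both conditions in one place makes them easy to unit-test. In a Gradle plugin, values like these would typically be applied through `options.release` on the compile task and through the test task's JVM arguments.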

Testing

1. End-to-end Spark integration

A Spark cluster job registers Hive UDFs built by this plugin snapshot and runs SQL queries that exercise:

  • Scalar in / out — StdString upper-case with null and empty-string edge cases.
  • Multi-arg primitives — 2-arg StdLong addition with null-argument propagation.
  • Complex output — StdMap<StdString,StdString> construction with key-lookup, size, and null-key cases.

All cases are registered via CREATE TEMPORARY FUNCTION and queried with SELECT against a SparkSession. This exercises driver↔executor serialization, Spark's Hive UDF compatibility shim, and metastore-backed function registration end-to-end. All test cases pass.
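The null-handling expectations for the scalar and two-argument cases can be pinned down as a small plain-Java oracle to compare query results against. This sketch assumes SQL-style null propagation (any null argument yields null), as the edge cases above imply; the class and method names are hypothetical, it does not use the Transport `StdString`/`StdLong` types, it upper-cases with `Locale.ROOT` as an assumed choice, and it leaves the `StdMap` case out.

```java
import java.util.Locale;

/** Hypothetical expected-result oracle; not the PR's Spark job or the Transport UDF classes. */
public final class UdfEdgeCaseOracle {

    /** Upper-case UDF: null in gives null out; the empty string stays empty. */
    public static String expectedUpper(String s) {
        return s == null ? null : s.toUpperCase(Locale.ROOT);
    }

    /** Two-argument addition UDF: any null argument propagates to a null result. */
    public static Long expectedAdd(Long a, Long b) {
        return (a == null || b == null) ? null : a + b;
    }

    public static void main(String[] args) {
        System.out.println(expectedUpper("abc"));  // ABC
        System.out.println(expectedUpper(""));     // prints an empty line
        System.out.println(expectedUpper(null));   // null
        System.out.println(expectedAdd(2L, 3L));   // 5
        System.out.println(expectedAdd(2L, null)); // null
    }
}
```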

2. JAR diff vs master

Compared SHA-256, size, class set, and bytecode major version across the 7 shipped plugin artifacts (snapshot vs. baseline 0.2.0):

  • 5 of 7 plugin jars are CONTENT_IDENTICAL (every class file byte-identical; zip metadata drift only). The 2 jars with deliberate content change are transportable-udfs-plugin.jar (the toolchain wiring) and transportable-udfs-test-hive.jar (the Hive 2.x call-site fixes) — i.e. only the files this PR touches.
  • version-info.properties flips hive-version=1.2.2 → 2.3.9 as intended.
  • javap -v confirms the Hive UDF wrapper class file is major version: 52 (Java 8 bytecode target, which preserves grid runtime compatibility). The Trino wrapper is unchanged at major version: 61.
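The jar comparison described above can be approximated in plain Java: hash every `.class` entry with SHA-256 so that zip metadata such as timestamps does not count as a difference, compare the resulting class sets, and read the major version straight from a class-file header instead of running `javap -v`. This is a sketch under stated assumptions, not the script used for this PR: the class name, the `CONTENT_CHANGED` label, the `buildJar` test helper, and the choice to compare only `.class` entries are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public final class JarContentDiff {

    /** SHA-256 of every .class entry keyed by name; timestamps and other zip metadata are ignored. */
    public static Map<String, String> classDigests(byte[] jar) {
        Map<String, String> digests = new TreeMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(jar))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                byte[] hash = MessageDigest.getInstance("SHA-256").digest(zin.readAllBytes());
                digests.put(entry.getName(), HexFormat.of().formatHex(hash));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        return digests;
    }

    /** CONTENT_IDENTICAL when both jars have the same class set and byte-identical class files. */
    public static String classify(byte[] baselineJar, byte[] snapshotJar) {
        return classDigests(baselineJar).equals(classDigests(snapshotJar))
                ? "CONTENT_IDENTICAL" : "CONTENT_CHANGED";
    }

    /** Header layout is u4 magic, u2 minor, u2 major (52 = Java 8, 61 = Java 17). */
    public static int majorVersion(byte[] classFile) {
        ByteBuffer header = ByteBuffer.wrap(classFile); // big-endian, as the class-file format requires
        if (classFile.length < 8 || header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort();                 // skip minor_version
        return header.getShort() & 0xFFFF; // major_version, read as unsigned
    }

    /** Builds an in-memory jar whose entries all share one timestamp; useful for testing. */
    public static byte[] buildJar(Map<String, byte[]> entries, long timeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> item : entries.entrySet()) {
                ZipEntry entry = new ZipEntry(item.getKey());
                entry.setTime(timeMillis);
                zout.putNextEntry(entry);
                zout.write(item.getValue());
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] baseline = Files.readAllBytes(Path.of(args[0]));
        byte[] snapshot = Files.readAllBytes(Path.of(args[1]));
        System.out.println(classify(baseline, snapshot));
    }
}
```

Run it as `java JarContentDiff.java baseline.jar snapshot.jar` on JDK 17 or later (the sketch uses `HexFormat`).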

3. Plugin example UDFs (build + unit)

Running ./gradlew build test on the PR branch: 51 / 51 example tests pass (17 hiveTest on the JDK 17 launcher, 17 trinoTest, 17 generic) across transportable-udfs-examples, covering every StdUDF arity and Std type (primitives, StdArray, StdMap, StdStruct, nested). The build is green on both JDK 8 and JDK 17 build JVMs.

Note for consumers bumping their build JVM to JDK 17

The Spark platforms (spark_2.11, spark_2.12) intentionally stay at JavaLanguageVersion.of(8) because Spark 3.1.1 hits SPARK-33772 on JDK 17. If you bump your build JVM to JDK 17 and use either Spark platform, keep a JDK 8 toolchain reachable from Gradle (e.g., add its path to org.gradle.java.installations.paths in your gradle.properties). The HIVE platform itself needs no consumer-side workaround. This pin can be lifted once the plugin's pinned Spark version is bumped to 3.5.x.
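For a consumer on a JDK 17 build JVM, the practical question is which additional JDKs Gradle must be able to find. The hypothetical helper below encodes only the toolchain versions stated in this PR (HIVE at 17, both Spark platforms at 8) and assumes Gradle can reuse the build JVM when its version matches a platform's toolchain; Trino is omitted because its toolchain version is not stated here.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.TreeSet;

public final class ToolchainCheck {

    /** Toolchain versions as stated in this PR: HIVE moves to 17, both Spark platforms stay at 8. */
    public enum Platform {
        HIVE(17), SPARK_2_11(8), SPARK_2_12(8);

        final int toolchainVersion;

        Platform(int toolchainVersion) {
            this.toolchainVersion = toolchainVersion;
        }
    }

    /**
     * JDK feature versions, other than the build JVM's own, that Gradle must be able to locate
     * (for example via org.gradle.java.installations.paths) for the given platforms.
     */
    public static Set<Integer> extraToolchainsNeeded(Set<Platform> platforms, int buildJvmVersion) {
        Set<Integer> needed = new TreeSet<>();
        for (Platform p : platforms) {
            if (p.toolchainVersion != buildJvmVersion) {
                needed.add(p.toolchainVersion);
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        int buildJvm = Runtime.version().feature();
        Set<Platform> used = EnumSet.of(Platform.HIVE, Platform.SPARK_2_12);
        System.out.println("Build JVM " + buildJvm + " also needs JDK toolchains: "
                + extraToolchainsNeeded(used, buildJvm));
    }
}
```

With HIVE and SPARK_2_12 on a JDK 17 build JVM, the helper reports that a JDK 8 installation must be reachable, which matches the advice above.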

YogeshKothari26 force-pushed the yokothar/transport-hive-2.3.9-jdk17 branch from b904862 to 0c71148 on May 25, 2026 09:12.
YogeshKothari26 marked this pull request as ready for review on May 25, 2026 09:23.