Conversation

@jdarais (Contributor) commented Dec 29, 2025

This PR adds some missing functionality for the Duration type:

  • Schema::Duration now contains a FixedSchema struct, which holds the name, namespace, size, doc, and other fields that can exist on Duration types. (Name and Namespace fields for Schema::Duration #378)
  • apache_avro's Duration type now implements Serialize and Deserialize, so it can be included in structs that are serialized to or deserialized from Avro "the serde way".
  • A small fix to the Deserializer implementation in serde/de.rs allows Value::Duration to be deserialized. (Duration is deserialized as bytes.)

This PR also deprecates SchemaKind::is_named in favor of a new Schema::is_named function, which can correctly determine whether a schema is for a named type, even when it is for a logical type that could have either a named or non-named underlying type (such as "decimal" or "uuid").
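
To illustrate why the check has to live on the schema rather than on its kind, here is a minimal sketch. The types below are simplified, hypothetical stand-ins, not the real apache_avro definitions; the point is only that a logical type such as "uuid" can be named or unnamed depending on its underlying type, which a bare kind tag cannot see.

```rust
// Hypothetical, simplified stand-ins; the real apache_avro types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum UuidSchema {
    /// Backed by a string: carries no name.
    String,
    /// Backed by a fixed: carries the fixed's name.
    Fixed(FixedSchema),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Long,
    Fixed(FixedSchema),
    Uuid(UuidSchema),
    Duration(FixedSchema),
}

impl Schema {
    /// Returns true when this schema is for a named type.
    /// The answer depends on the inner representation, not just the variant.
    pub fn is_named(&self) -> bool {
        match self {
            Schema::Fixed(_) | Schema::Duration(_) => true,
            Schema::Uuid(UuidSchema::Fixed(_)) => true,
            Schema::Uuid(UuidSchema::String) | Schema::Long => false,
        }
    }
}
```

In this sketch both Uuid values share one variant of Schema, yet only the fixed-backed one reports itself as named.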

One note on this change: I noticed that the derive macro for AvroSchema was using the Duration schema for core::time::Duration. It is no longer trivial to create a schema for a Duration type, since the schema would need to be given a name. It looks like other named types are excluded from these derive macros, so I just deleted the impl_schema!(core::time::Duration, Schema::Duration) line, but I'm not sure if there's something better that can be done here.

@Kriskras99 (Contributor) left a comment

At first glance this looks good. I'll have a more in-depth look tomorrow.

Schema::LocalTimestampMicros => zag_i64(reader).map(Value::LocalTimestampMicros),
Schema::LocalTimestampNanos => zag_i64(reader).map(Value::LocalTimestampNanos),
- Schema::Duration => {
+ Schema::Duration(_) => {
Contributor

It would be good to check that the FixedSchema has size 12 (and return an error otherwise).
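
A minimal sketch of such a check, using hypothetical stand-in types (FixedSchema, DecodeError, and check_duration_schema are illustrative names, not the crate's actual API). It only assumes that a duration is backed by a 12-byte fixed, as the Avro specification describes.

```rust
// Hypothetical stand-ins for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The duration's underlying fixed schema is not 12 bytes wide.
    InvalidDurationSize(usize),
}

/// An Avro duration is stored as three 32-bit integers, so 12 bytes in total.
pub const DURATION_SIZE: usize = 12;

/// Rejects a duration whose underlying fixed schema has the wrong size,
/// before any bytes are read from the input.
pub fn check_duration_schema(fixed: &FixedSchema) -> Result<(), DecodeError> {
    if fixed.size == DURATION_SIZE {
        Ok(())
    } else {
        Err(DecodeError::InvalidDurationSize(fixed.size))
    }
}
```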

}
}

struct DurationVisitor;
Contributor

Inline this struct and its implementation into the deserialize function; there is no value in it existing outside of Deserialize.

use apache_avro_test_helper::TestResult;

#[test]
fn test_duration_from_value() -> TestResult {
Contributor

Rename all new tests to start with avro_rs_382, so we can easily trace them back to this PR.

test_primitives!(LocalTimestampNanos);

#[test]
fn test_avro_3939_compare_schemata_duration() {
Contributor

This should be avro_rs_382

@jdarais (Contributor Author) commented Dec 30, 2025

@Kriskras99 thanks for the feedback, I made the suggested changes

Value::BigDecimal(ref big_decimal) => {
visitor.visit_str(big_decimal.to_plain_string().as_str())
}
_ => Err(de::Error::custom(format!(
Member

This is no longer needed because the match is exhaustive, right?

LocalTimestampNanos,
/// An amount of time defined by a number of months, days and milliseconds.
- Duration,
+ Duration(FixedSchema),
Member

I am not sure about this.
To prevent API breaks in the future it would be better to use Duration(DurationSchema) where:

enum DurationSchema {
  Fixed(FixedSchema)
}

This way, if some day Duration is represented in another (e.g. more compact) way, it will be easier to add a second variant to DurationSchema.
This is similar to UuidSchema.

Contributor Author

I suspect that switching from Duration(FixedSchema) to Duration(DurationSchema) down the road may not be much more disruptive than starting with Duration(DurationSchema) and adding a new enum variant later. In the latter case, library consumers would still need to update match statements to accommodate the new variant, and nothing would have prevented consumers from writing code that assumes that duration is a fixed type.

Admittedly, having to migrate consumer code from using a FixedSchema to using an enum would be slightly more disruptive, but I think it would be worth it to not have to deal with the extra enum layer for now. Let me know what you think.
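
To make the trade-off concrete, here is a rough sketch of consumer code under the wrapper-enum design. All types are simplified, hypothetical stand-ins, and duration_name is an invented helper. With a single-variant DurationSchema the nested pattern below is exhaustive today, so it compiles, but it would stop compiling as soon as a second variant is added.

```rust
// Hypothetical, simplified stand-ins; not the real apache_avro types.
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Wrapper enum as proposed in the review: one variant for now.
pub enum DurationSchema {
    Fixed(FixedSchema),
}

pub enum Schema {
    Long,
    Duration(DurationSchema),
}

/// Consumer code that assumes a duration is always backed by a fixed.
/// Nothing in the wrapper design prevents writing this, and adding a
/// second DurationSchema variant later would make this match non-exhaustive.
pub fn duration_name(schema: &Schema) -> Option<&str> {
    match schema {
        Schema::Duration(DurationSchema::Fixed(fixed)) => Some(fixed.name.as_str()),
        Schema::Long => None,
    }
}
```

Either way a consumer has to revisit its match statements when the representation changes; the wrapper mainly changes how large that edit is.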

inner: InnerDecimalSchema::Fixed(FixedSchema { attributes, .. }),
..
})
| Schema::Uuid(UuidSchema::Fixed(FixedSchema { attributes, .. })) => Some(attributes),
Member

Please add an arm for Schema::Duration too.
Maybe also a new test for schema equality. Currently two duration schemas with different attributes would match or not depending on include_attributes
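
As a rough illustration of both points, the sketch below adds a Duration arm to an attributes accessor and compares two duration schemas with and without attributes. The types, duration_with, and schemas_equal are hypothetical stand-ins; in particular, schemas_equal only imitates an include_attributes switch and is not the crate's comparator.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins; the real apache_avro types differ.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
    pub attributes: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Long,
    Duration(FixedSchema),
}

impl Schema {
    /// Custom attributes, including the arm for Duration.
    pub fn custom_attributes(&self) -> Option<&BTreeMap<String, String>> {
        match self {
            Schema::Duration(FixedSchema { attributes, .. }) => Some(attributes),
            Schema::Long => None,
        }
    }
}

/// Builds a 12-byte duration schema named "duration" with the given attributes.
pub fn duration_with(attrs: &[(&str, &str)]) -> Schema {
    let attributes = attrs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    Schema::Duration(FixedSchema { name: "duration".to_string(), size: 12, attributes })
}

/// Compares two schemas; attributes only count when `include_attributes` is set.
pub fn schemas_equal(a: &Schema, b: &Schema, include_attributes: bool) -> bool {
    match (a, b) {
        (Schema::Long, Schema::Long) => true,
        (Schema::Duration(x), Schema::Duration(y)) => {
            x.name == y.name
                && x.size == y.size
                && (!include_attributes || x.attributes == y.attributes)
        }
        _ => false,
    }
}
```

Under these assumptions, two durations that differ only in attributes compare equal when include_attributes is false and unequal when it is true.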

inner: InnerDecimalSchema::Fixed(FixedSchema { aliases, .. }),
..
})
| Schema::Uuid(UuidSchema::Fixed(FixedSchema { aliases, .. })) => aliases.as_ref(),
Member

Please add support for Schema::Duration here too

inner: InnerDecimalSchema::Fixed(FixedSchema { doc, .. }),
..
})
| Schema::Uuid(UuidSchema::Fixed(FixedSchema { doc, .. })) => doc.as_ref(),
Member

And here

duration @ Value::Duration { .. } => duration,
Value::Fixed(size, bytes) => {
if size != 12 {
return Err(Details::GetDecimalFixedBytes(size).into());
Member

Old issue: GetDecimalFixedBytes is not the correct error type here.

Suggested change
return Err(Details::ResolveDuration(Value::Fixed(size, bytes.clone())).into());
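
For context, a small sketch of resolving a 12-byte fixed value into a duration with a duration-specific error. Duration, ResolveError, and resolve_duration here are hypothetical stand-ins rather than the crate's types. The byte layout follows the Avro specification: three little-endian unsigned 32-bit integers for months, days, and milliseconds.

```rust
// Hypothetical stand-ins for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Duration {
    pub months: u32,
    pub days: u32,
    pub millis: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ResolveError {
    /// The fixed value cannot be a duration; carries the declared size.
    ResolveDuration { size: usize },
}

/// Resolves a fixed value of the given declared size into a duration.
pub fn resolve_duration(size: usize, bytes: &[u8]) -> Result<Duration, ResolveError> {
    if size != 12 || bytes.len() != 12 {
        // Report a duration-specific error rather than a decimal one.
        return Err(ResolveError::ResolveDuration { size });
    }
    // Reads the little-endian u32 that starts at byte offset `i`.
    let word = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    Ok(Duration { months: word(0), days: word(4), millis: word(8) })
}
```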

let specification_eq_res = SPECIFICATION_EQ.compare(&schema_one, &schema_two);
let struct_field_eq_res = STRUCT_FIELD_EQ.compare(&schema_one, &schema_two);
assert_eq!(specification_eq_res, struct_field_eq_res)
}
Member

Let's add two more (negative) tests:

  1. schemas with different names
  2. schemas with different sizes
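
The two negative cases could take roughly the following shape. This is only a sketch: FixedSchema and same_duration_schema are hypothetical stand-ins for the real schema types and for the SPECIFICATION_EQ and STRUCT_FIELD_EQ comparators, and the test names simply follow the avro_rs_382 prefix requested earlier in this review.

```rust
// Hypothetical stand-ins; the real tests would build Schema values and
// run them through the crate's own comparators.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Convenience constructor for the tests below.
pub fn fixed(name: &str, size: usize) -> FixedSchema {
    FixedSchema { name: name.to_string(), size }
}

/// Stand-in comparator: two duration schemas match only if both the
/// name and the size of the underlying fixed agree.
pub fn same_duration_schema(a: &FixedSchema, b: &FixedSchema) -> bool {
    a.name == b.name && a.size == b.size
}

#[test]
fn avro_rs_382_duration_schemas_with_different_names_do_not_match() {
    assert!(!same_duration_schema(&fixed("a", 12), &fixed("b", 12)));
}

#[test]
fn avro_rs_382_duration_schemas_with_different_sizes_do_not_match() {
    assert!(!same_duration_schema(&fixed("a", 12), &fixed("a", 11)));
}
```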

@@ -1,3 +1,5 @@
use serde::{Deserialize, Serialize, de};
Member

Please move this import below the ASF licence header

Schema::Enum(EnumSchema { name, .. })
| Schema::Fixed(FixedSchema { name, .. })
| Schema::Uuid(UuidSchema::Fixed(FixedSchema { name, .. }))
| Schema::Decimal(DecimalSchema {
Member

Please add Schema::Duration support here

@jdarais (Contributor Author) commented Jan 4, 2026

@martin-g thanks for all the feedback! I should have taken a closer look at the Schema enum and schema comparison functions. I made fixes and added some tests to address all but one of your comments. On the remaining one, I responded and would be interested in your thoughts.
