Duration inner fixed schema and serialization #382
base: main
Kriskras99
left a comment
At first glance this looks good; I'll take a more in-depth look tomorrow.
avro/src/decode.rs
Outdated
| Schema::LocalTimestampMicros => zag_i64(reader).map(Value::LocalTimestampMicros), | ||
| Schema::LocalTimestampNanos => zag_i64(reader).map(Value::LocalTimestampNanos), | ||
| Schema::Duration => { | ||
| Schema::Duration(_) => { |
It would be good to check if the FixedSchema is of size 12 (and error otherwise)
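A minimal sketch of the kind of check being requested, assuming simplified stand-ins for the crate's types. `FixedSchema` here is reduced to two fields, and `DecodeError`, `DURATION_SIZE`, and `check_duration_fixed` are hypothetical names used only for illustration; they are not the crate's actual API.

```rust
/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Hypothetical error type used only in this sketch.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    InvalidDurationSize { expected: usize, found: usize },
}

/// An Avro duration is a fixed of 12 bytes: three little-endian u32 values.
pub const DURATION_SIZE: usize = 12;

/// Returns an error unless the underlying fixed schema is exactly 12 bytes.
pub fn check_duration_fixed(schema: &FixedSchema) -> Result<(), DecodeError> {
    if schema.size == DURATION_SIZE {
        Ok(())
    } else {
        Err(DecodeError::InvalidDurationSize {
            expected: DURATION_SIZE,
            found: schema.size,
        })
    }
}
```

The decode arm could call such a check before reading any bytes, so a malformed schema fails early instead of producing a misaligned read.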
avro/src/duration.rs
Outdated
| } | ||
| } | ||
|
|
||
| struct DurationVisitor; |
Inline this struct and the implementation into the deserialize function, there is no value in it existing outside of Deserialize
avro/src/duration.rs
Outdated
| use apache_avro_test_helper::TestResult; | ||
|
|
||
| #[test] | ||
| fn test_duration_from_value() -> TestResult { |
Change all new tests to start with avro_rs_382, so we can easily get back to the PR
avro/src/schema_equality.rs
Outdated
| test_primitives!(LocalTimestampNanos); | ||
|
|
||
| #[test] | ||
| fn test_avro_3939_compare_schemata_duration() { |
This should be avro_rs_382
@Kriskras99 thanks for the feedback, I made the suggested changes.
| Value::BigDecimal(ref big_decimal) => { | ||
| visitor.visit_str(big_decimal.to_plain_string().as_str()) | ||
| } | ||
| _ => Err(de::Error::custom(format!( |
This is no longer needed because the match is exhaustive, right?
| LocalTimestampNanos, | ||
| /// An amount of time defined by a number of months, days and milliseconds. | ||
| Duration, | ||
| Duration(FixedSchema), |
I am not sure about this.
To prevent API breaks in the future, it would be better to use Duration(DurationSchema), where:
enum DurationSchema {
    Fixed(FixedSchema),
}
This way, if Duration is some day represented in another (e.g. more compact) way, it will be easier to add a second variant to DurationSchema.
This is similar to UuidSchema.
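To make the trade-off concrete, here is a rough sketch of the wrapper-enum design and of what consumer code would look like under it. All types are simplified stand-ins (the `Schema` enum is cut down to two variants), and `fixed_size` is a hypothetical helper, not something from the crate.

```rust
/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// The proposed wrapper: only one variant today, room for more later.
#[derive(Debug, Clone, PartialEq)]
pub enum DurationSchema {
    Fixed(FixedSchema),
}

/// Cut-down schema enum, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Long,
    Duration(DurationSchema),
}

/// Hypothetical consumer code: it must already match on the inner variant,
/// so adding a second `DurationSchema` variant later would surface here as
/// a non-exhaustive match error at compile time.
pub fn fixed_size(schema: &Schema) -> Option<usize> {
    match schema {
        Schema::Duration(DurationSchema::Fixed(fixed)) => Some(fixed.size),
        Schema::Long => None,
    }
}
```

With `Duration(FixedSchema)` the match arm is one level shallower, at the cost of changing the variant's payload type if another representation is ever added.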
I suspect that switching from Duration(FixedSchema) to Duration(DurationSchema) down the road may not be a whole lot more disruptive than starting with Duration(DurationSchema) and adding a new enum value down the road. In the latter case, library consumers would still need to update match statements to accommodate the new enum variant, and nothing would have prevented consumers from writing code that assumes that duration is a fixed type.
Admittedly, having to migrate consumer code from using a FixedSchema to using an enum would be slightly more disruptive, but I think it would be worth it to not have to deal with the extra enum layer for now. Let me know what you think.
| inner: InnerDecimalSchema::Fixed(FixedSchema { attributes, .. }), | ||
| .. | ||
| }) | ||
| | Schema::Uuid(UuidSchema::Fixed(FixedSchema { attributes, .. })) => Some(attributes), |
Please add an arm for Schema::Duration too.
Maybe also a new test for schema equality. Currently two duration schemas with different attributes would match or not depending on include_attributes
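As a sketch of the behaviour such an equality test would pin down, assuming a simplified `FixedSchema` with string-valued attributes: `duration_schemas_equal` and its `include_attributes` parameter are hypothetical names modelled on the flag mentioned above, not the crate's comparator API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
#[derive(Debug, Clone)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
    pub attributes: BTreeMap<String, String>,
}

/// Compares two duration schemas by name and size, and by custom
/// attributes only when `include_attributes` is set.
pub fn duration_schemas_equal(
    a: &FixedSchema,
    b: &FixedSchema,
    include_attributes: bool,
) -> bool {
    a.name == b.name
        && a.size == b.size
        && (!include_attributes || a.attributes == b.attributes)
}
```

Two schemas that differ only in attributes compare equal with the flag off and unequal with it on, which is the case a new test could assert explicitly.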
| inner: InnerDecimalSchema::Fixed(FixedSchema { aliases, .. }), | ||
| .. | ||
| }) | ||
| | Schema::Uuid(UuidSchema::Fixed(FixedSchema { aliases, .. })) => aliases.as_ref(), |
Please add support for Schema::Duration here too
| inner: InnerDecimalSchema::Fixed(FixedSchema { doc, .. }), | ||
| .. | ||
| }) | ||
| | Schema::Uuid(UuidSchema::Fixed(FixedSchema { doc, .. })) => doc.as_ref(), |
And here
avro/src/types.rs
Outdated
| duration @ Value::Duration { .. } => duration, | ||
| Value::Fixed(size, bytes) => { | ||
| if size != 12 { | ||
| return Err(Details::GetDecimalFixedBytes(size).into()); |
Old issue: GetDecimalFixedBytes is not the correct error type here.
| return Err(Details::ResolveDuration(Value::Fixed(size, bytes.clone())).into()); |
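For context, a sketch of resolving a 12-byte fixed value into a duration with a duration-specific error. It relies on the Avro specification's layout for the duration logical type (three little-endian unsigned 32-bit integers: months, days, milliseconds). The `Duration` struct, `ResolveError`, and `resolve_duration` are hypothetical simplifications and do not mirror the crate's `Details` or `Value` types.

```rust
/// Hypothetical, simplified duration value.
#[derive(Debug, Clone, PartialEq)]
pub struct Duration {
    pub months: u32,
    pub days: u32,
    pub millis: u32,
}

/// Hypothetical error type: a dedicated variant instead of reusing
/// a decimal-related one.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    ResolveDuration { found_size: usize },
}

/// Resolves raw fixed bytes into a duration, rejecting any length but 12.
pub fn resolve_duration(bytes: &[u8]) -> Result<Duration, ResolveError> {
    if bytes.len() != 12 {
        return Err(ResolveError::ResolveDuration { found_size: bytes.len() });
    }
    // Each field is a little-endian u32 at offsets 0, 4 and 8.
    let field = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    Ok(Duration {
        months: field(0),
        days: field(4),
        millis: field(8),
    })
}
```

Naming the error after the type being resolved keeps diagnostics from pointing users at decimals when the actual problem is a malformed duration.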
| let specification_eq_res = SPECIFICATION_EQ.compare(&schema_one, &schema_two); | ||
| let struct_field_eq_res = STRUCT_FIELD_EQ.compare(&schema_one, &schema_two); | ||
| assert_eq!(specification_eq_res, struct_field_eq_res) | ||
| } |
Let's add two more (negative) tests:
- schemas with different names
- schemas with different sizes
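The two negative cases could look roughly like the sketch below. It uses derived `PartialEq` on simplified stand-in types rather than the crate's `SPECIFICATION_EQ` and `STRUCT_FIELD_EQ` comparators, and the `duration` helper is hypothetical; test names follow the `avro_rs_382` prefix requested earlier in this review.

```rust
/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
#[derive(Debug, Clone, PartialEq)]
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Cut-down schema enum, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Duration(FixedSchema),
}

/// Hypothetical helper that builds a duration schema for the tests.
pub fn duration(name: &str, size: usize) -> Schema {
    Schema::Duration(FixedSchema { name: name.to_string(), size })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn avro_rs_382_duration_schemas_with_different_names_differ() {
        assert_ne!(duration("first", 12), duration("second", 12));
    }

    #[test]
    fn avro_rs_382_duration_schemas_with_different_sizes_differ() {
        assert_ne!(duration("same", 12), duration("same", 16));
    }
}
```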
avro/src/duration.rs
Outdated
| @@ -1,3 +1,5 @@ | |||
| use serde::{Deserialize, Serialize, de}; | |||
Please move this import below the ASF licence header
| Schema::Enum(EnumSchema { name, .. }) | ||
| | Schema::Fixed(FixedSchema { name, .. }) | ||
| | Schema::Uuid(UuidSchema::Fixed(FixedSchema { name, .. })) | ||
| | Schema::Decimal(DecimalSchema { |
Please add Schema::Duration support here
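A sketch of what the extra arm might look like, assuming `Schema::Duration` wraps a `FixedSchema` directly as in this PR. The enums are heavily cut down, and the `name` accessor shown here returns `Option<&str>` purely for illustration; the crate's real accessor and variant set differ.

```rust
/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Cut-down version of the uuid schema: named only when fixed-backed.
pub enum UuidSchema {
    String,
    Fixed(FixedSchema),
}

/// Cut-down schema enum, for illustration only.
pub enum Schema {
    Long,
    Fixed(FixedSchema),
    Uuid(UuidSchema),
    Duration(FixedSchema),
}

impl Schema {
    /// Returns the name of the schema if it is backed by a named type.
    pub fn name(&self) -> Option<&str> {
        match self {
            Schema::Fixed(FixedSchema { name, .. })
            | Schema::Uuid(UuidSchema::Fixed(FixedSchema { name, .. }))
            // The arm requested in this review: duration is fixed-backed too.
            | Schema::Duration(FixedSchema { name, .. }) => Some(name.as_str()),
            Schema::Long | Schema::Uuid(UuidSchema::String) => None,
        }
    }
}
```

The same pattern would apply to the attributes, aliases, and doc accessors mentioned in the other comments.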
@martin-g thanks for all the feedback! I should have taken a closer look at the Schema enum and schema comparison functions. I made fixes and added some tests to address all but one of your comments, and on the one remaining one, I responded and would be interested in your thoughts.
This PR adds some missing functionality for the Duration type:
This PR also deprecates SchemaKind::is_named in favor of a new Schema::is_named function, which can correctly determine whether a schema is for a named type, even when it is for a logical type that could have either a named or non-named underlying type (such as "decimal" or "uuid").
One note on this change: I noticed that the derive macro for AvroSchema was using the Duration schema for core::time::Duration. It is no longer trivial to create a schema for a Duration type, since the schema would need to be given a name. It looks like other named types are excluded from these derive macros, so I just deleted the impl_schema!(core::time::Duration, Schema::Duration) line, but I'm not sure if there's something better that can be done here.
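To illustrate why the check has to live on the schema value rather than on its kind, here is a minimal sketch with cut-down, hypothetical enums: a "uuid" kind alone cannot say whether the underlying type is a named fixed or an unnamed string, whereas the schema value can. The variant set and the body of `is_named` are assumptions for illustration, not the crate's implementation.

```rust
/// Hypothetical, simplified stand-in for the crate's `FixedSchema`.
pub struct FixedSchema {
    pub name: String,
    pub size: usize,
}

/// Cut-down uuid schema: the same logical type, named or not
/// depending on the underlying representation.
pub enum UuidSchema {
    String,
    Fixed(FixedSchema),
}

/// Cut-down schema enum, for illustration only.
pub enum Schema {
    Long,
    Record { name: String },
    Uuid(UuidSchema),
    Duration(FixedSchema),
}

impl Schema {
    /// True when the schema is backed by a named type. The answer depends
    /// on the inner representation, not only on the outer variant.
    pub fn is_named(&self) -> bool {
        match self {
            Schema::Record { .. }
            | Schema::Uuid(UuidSchema::Fixed(_))
            | Schema::Duration(_) => true,
            Schema::Long | Schema::Uuid(UuidSchema::String) => false,
        }
    }
}
```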