In sync-service v1.6, removing a shape from the Subquery Index in the Where Clause Filter has been found to be O(n), where n is the number of shapes in that index. Adding and removing shapes needs to be O(1) for the system to scale, a requirement that v1.5 met. Since shape removal currently blocks replication processing, slow removal can cause WAL lag.
Issue in production
This issue caused WAL lag for Autoarc. Below you can see v1.6 on the left and v1.5 on the right. The top graph shows the WAL lag that v1.6 caused. The second graph shows that this was not due to replication processing. The bottom graph shows that shape removal was significantly slower in v1.6, enough to cause the WAL lag.
source
The shapes are being removed because of "Materializer shape invalidated", a trigger that was also present in v1.5 and will be dealt with in a separate issue.
Benchmarks
Microbenchmarks around the SubqueryIndex have confirmed that shape removal is O(n) and accounts for the delays seen by AutoArc.
Cause
The SubqueryIndex introduced in v1.6 is a reverse index managed by the shape consumers themselves and is roughly of the form value -> shape_handle. Evaluating which shape handles are affected by which values is O(1), but shape removal involves a scan through the ETS table to find the shape handles. A naive solution would be to also hold shape_handle -> values; however, this would increase the memory footprint of what is already a highly memory-inefficient structure. Since we want to reduce the memory footprint, we should consider a redesign of how the index works.
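To illustrate the trade-off described above, here is a minimal Python model. It is not the actual Elixir/ETS implementation. The class names `ReverseIndex` and `BidirectionalIndex` and their methods are hypothetical, and plain dictionaries stand in for the ETS table. The sketch only shows why removal has to scan when the index is keyed by value alone, and how the naive shape_handle -> values mapping avoids the scan at the cost of storing every value twice.

```python
from collections import defaultdict


class ReverseIndex:
    """Hypothetical model of a value -> shape_handle index."""

    def __init__(self):
        self.by_value = defaultdict(set)

    def add(self, handle, values):
        for value in values:
            self.by_value[value].add(handle)

    def affected(self, value):
        # Single hash lookup: which shape handles depend on this value.
        return self.by_value.get(value, set())

    def remove(self, handle):
        # There is no handle -> values mapping, so every entry is visited.
        scanned = 0
        for value in list(self.by_value):
            scanned += 1
            handles = self.by_value[value]
            handles.discard(handle)
            if not handles:
                del self.by_value[value]
        return scanned  # number of entries visited


class BidirectionalIndex(ReverseIndex):
    """Naive fix: also hold shape_handle -> values (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.by_handle = {}

    def add(self, handle, values):
        values = set(values)
        super().add(handle, values)
        self.by_handle.setdefault(handle, set()).update(values)

    def remove(self, handle):
        # Only the values recorded for this handle are visited.
        touched = 0
        for value in self.by_handle.pop(handle, ()):
            touched += 1
            handles = self.by_value[value]
            handles.discard(handle)
            if not handles:
                del self.by_value[value]
        return touched  # number of entries visited
```

In this model, removal from `BidirectionalIndex` costs time proportional to the number of values held for that one shape, independent of how many other shapes are indexed. The price is that each value is stored a second time, which is the memory concern raised above.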