Simplified lots of traits#238
Closed
nishaq503 wants to merge 14 commits into
Closed
Conversation
feat: reviving python experiments feat: implemented some combos for evaluating mbed wip: removing metric trait wip: lots of changes wip: most tests pass fix: repeated rnn now passes wip: restored mbed and musals simplified Dataset trait wip: restored mbed wip: restored CHAODA wip: restored the shell updating versions of deps cleaned up deps wip: cleanup up deps
e941fce to
b3235a2
Compare
93564d3 to
79b7dff
Compare
2ef214e to
4a7c664
Compare
4f05013 to
863f7e2
Compare
Collaborator
Author
|
I have something better in the works |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR greatly simplifies a lot of traits in
abd-clamas well as the public API. A summary of the changes is as follows:abd-clamno longer depends ondistancesand does not make use of theNumbertrait. Instead we now use thenumcrate and have aDistanceValuetrait with no required methods and a blanket implementation for any type that implements its super-traits.Metrictrait and its implementations have been removed. Anything that was generic over aMetricis now generic over functions using aFn(&I, &I) -> Tsignature, whereIis an item from aDatasetandTis aDistanceValue. This allows users much more freedom in defining their metrics as functions and they no longer have to first shoehorn them into structs and then implement the oldMetrictrait for their structs.Datasettrait has been greatly simplified. We now provide a blanket implementation ofDatasetfor any type that implementsAsRef<[I]>andAsMut<[I]>, i.e. any type that behaves like a slice of items of typeI. This includes standard collections likeVecs and arrays.Datasethas no super-traits and its only required methods areget,get_mutandcardinality, making it exceedingly easy to implement for other types.distanceshave been upgraded. The only exception isbitcodeand the reason is noted in the rootCargo.toml.disk-iofeature inabd-clamis no more. The relevant functionality is now included by default. I may end up changing this in a future PR after studying how other notable crates provide aserdefeature.Clusterno longer uses a random seed or any internal randomness. If the user wants comparable functionality, they can simply shuffle their dataset before using it to build a tree.Adaptertrait has been removed because it was too complicated to work with. For now, all adaptedClustertypes in the crate have a method calledfrom_cluster_tree. If we come up with a simpler idea, we can start using it.BalancedBallstruct and balanced clustering has been removed. If we want to compare it for benchmarks, we can restore it in a separate binary crate dedicated for such benchmarks.Vertextrait to extendClusterfor use inchaoda. TheVertextrait makes it so that there is no longer a global constantNUM_RATIOSrestricting the number of properties that can be used for anomaly detection. Instead, theVertextrait has an associated constantNUM_FEATURESand any type that implementsVertexcan define its own number of features that will be used in CHAODA. I have not yet fully tested the actual CHAODA implementation for this and will target that in future PRs.EncoderandDecodertraits and things look promising. The heart of the current issue is that compressed clusters store their own data (at the leaf level) instead of the compressed tree having an associatedDatasetthat has been compressed. This makes it impossible to shoehorn into the current design of theSearchAlgorithmtrait in Cakes.