Clustering with K-Means

Implemented K-Means and K-Medoids for vectors and curves with C++

Initializers:

Random Selection: With random selection we select uniformly at random k points of the dataset to start as centroids
K-Means++: With K-Means++ we select k points of the dataset, spread very far away from the data

Assigners:

Loyd's: For every point compute distances to all centroids, and select the one that is closer. It seems as a very good practice, but in reality it converges really slow, because the decisions affect the clusters very little, so there are not big differences in centroid update.
Inverse Assignment: We use LSH nearest neighbor for this assigner. Apply Range Search with LSH for the R-nearest neighbors, to get for every centroid the R-closest points of the dataset and assign to those points as centroid id. For points who came as nearest neighbors of more the one centroid, we calculate the distance, and select the one that is closer. That way, the algorithm converges faster because some changes affect the clustering score in a certain amount, where the centroids are changing rapidly.

Updaters:

PAM - K Medoids: K-Medoids is a very good algorithm, insensitive to outliers providing very good results. But its complexity is very high O(n^2 * K * i) since it has to calculate for every combination of points their distances. In order to make this algorithm faster, we implemented a database, where the distances are pre calculated exhaustively.
K-Means: K-Means is sensitive to outliers, and worse than K-Medoids for small datasets, where the time of completion is not a factor. K-Means works very well for very big datasets, where K-Medoids is extremely slow. It gives very good results in O(n * k * i) where k * i << n

Metric for cluster score:

We used Silhouette for clustering evaluation and validation of consistency.

Vectors ~0.98
Curves ~0.4

Usage:

Compilation
make cluster
Run
./build/cluster.x –i <input_file> -c <configuration_file> -o <output_file>

Collaborators

Konstantinos Athinaios
Theofanis Aslanidis

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
Clustering		Clustering
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Clustering with K-Means

Initializers:

Assigners:

Updaters:

Metric for cluster score:

Usage:

Collaborators

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Clustering with K-Means

Initializers:

Assigners:

Updaters:

Metric for cluster score:

Usage:

Collaborators

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages