Improved Diversity Measures #256

@PaulWAyers

I am not 100% satisfied with the diversity measures we currently have in place.

One other diversity measure we could implement is the data volume: given the pairwise distances, the Cayley-Menger determinant gives the volume of the simplex spanned by the data points. This volume can be nonzero only if the dimension of the feature vectors is at least the number of data points minus one. When this is not true (or even if it is), this reference suggests forming a kernelized Gramian matrix with elements

$$ g_{ij} = e^{-\gamma d_{ij}^2} $$

with the induced distance squared,

$$ D_{ij}^2 = 2 - 2g_{ij} $$
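As a rough illustration of the two formulas above, here is a minimal NumPy sketch. The function names (`pairwise_distances`, `cayley_menger_volume`, `kernel_gramian`, `induced_distances`) are hypothetical and not part of any existing API, and the Euclidean distance and the choice of `gamma` are assumptions made only for this example.

```python
import math

import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix d_ij for the rows of X (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cayley_menger_volume(D):
    """Volume of the simplex spanned by n points, from their n x n distances."""
    n = D.shape[0]
    k = n - 1  # dimension of the simplex
    # Bordered matrix: squared distances framed by a row and column of ones.
    B = np.ones((n + 1, n + 1))
    B[0, 0] = 0.0
    B[1:, 1:] = D ** 2
    vol_sq = (-1) ** (k + 1) * np.linalg.det(B) / (2 ** k * math.factorial(k) ** 2)
    # Round-off can make a degenerate volume slightly negative; clip at zero.
    return math.sqrt(max(vol_sq, 0.0))


def kernel_gramian(D, gamma):
    """Kernelized Gramian g_ij = exp(-gamma * d_ij^2)."""
    return np.exp(-gamma * D ** 2)


def induced_distances(G):
    """Induced distances D_ij = sqrt(2 - 2 g_ij)."""
    return np.sqrt(np.maximum(2.0 - 2.0 * G, 0.0))


# Four coplanar points: the raw simplex volume vanishes, the kernelized one does not.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = pairwise_distances(square)
raw_volume = cayley_menger_volume(D)
kernel_volume = cayley_menger_volume(induced_distances(kernel_gramian(D, gamma=1.0)))
```

In this sketch the four corners of a unit square in 2D have zero raw volume because there are more points than the dimension can support, while the kernelized distances give a strictly positive volume.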

These capture data volume, but do not capture "holes" in the data. A simple way to capture holes in the data is to look at the maximum gap between points that are included,

$$ \mathrm{div} = \max_{i,j} d_{ij} $$

or we can look at all the points, and find the one which is most distant from all others,

$$ \mathrm{div} = \max_i \min_{j} d_{ij} $$

The latter measure is directly optimized by the MaxMin algorithm.
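A minimal sketch of the two gap-based measures, assuming a precomputed distance matrix `D` over all points and an index list `selected` for the points that are included. Both function names are hypothetical, and the reading of the index ranges is an assumption: the first measure takes `i` and `j` over the included points, the second takes `i` over all points and `j` over the included ones.

```python
import numpy as np


def max_pairwise_distance(D, selected):
    """max_{i,j} d_ij, with i and j both restricted to the selected points."""
    sub = D[np.ix_(selected, selected)]
    return float(sub.max())


def max_min_distance(D, selected):
    """max_i min_j d_ij, with i over all points and j over the selected points.

    Selected points have distance zero to themselves, so only the
    unselected points can determine the maximum.
    """
    nearest_selected = D[:, selected].min(axis=1)
    return float(nearest_selected.max())


# Four points on a line; points 0 and 2 are treated as the selected subset.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
selected = [0, 2]

spread = max_pairwise_distance(D, selected)
largest_hole = max_min_distance(D, selected)
```

Under this reading, a large value of the second measure means some point lies far from every selected point, which is the point a MaxMin step would pick next.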
