Means and Standard Deviation should be from random sample in large data sets #14

@GoogleCodeExporter

If a data set is very large (say, more than about 10,000 points), the program
should not go through every point to find the mean and, especially, the standard
deviation. It should instead sample at most 10,000 random points from the data
set and compute both from that sample.

The mean could still be computed over all points, since it can be accumulated in
the same pass that finds the min and max, but the standard deviation greatly
slows down the start of an SOM calculation with large data sets (it is used for
the default variance normalization).
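
A rough sketch of what is being asked for, in Python rather than the project's own code. The function name `estimate_mean_std`, the `max_samples` and `seed` parameters, and the choice of population standard deviation are assumptions made for illustration only; the 10,000 cap is the figure suggested above.

```python
import random
import statistics

MAX_SAMPLES = 10_000  # cap suggested in this issue; not a tuned value


def estimate_mean_std(values, max_samples=MAX_SAMPLES, seed=None):
    """Return (mean, std) computed from at most max_samples random points.

    `values` is assumed to be a sequence (e.g. a list) of numbers.
    Data sets at or below the cap are used in full.
    """
    if len(values) > max_samples:
        rng = random.Random(seed)
        # Sample without replacement so no point is counted twice.
        values = rng.sample(values, max_samples)
    mean = statistics.fmean(values)
    # Population standard deviation of the (possibly sampled) points.
    std = statistics.pstdev(values, mu=mean)
    return mean, std


# Hypothetical usage: 200,000 synthetic points, only 10,000 are inspected.
data_rng = random.Random(0)
data = [data_rng.gauss(5.0, 2.0) for _ in range(200_000)]
mean, std = estimate_mean_std(data, seed=1)
```

The result is an estimate, so the normalization would vary slightly between runs unless the seed is fixed.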

Original issue reported on code.google.com by kyle.tha...@gmail.com on 3 Jun 2011 at 3:19
