ISING DATA
Authors: Evan Thomas, Kyle Mills, Isaac Tamblyn
Date: 16 July 2020


The included datasets consist of two-dimensional Ising spin systems.


				Datasets

Exhaustive datasets containing all possible configurations are produced by "ising_exhaustive_generation.py". Because an NxN system has 2^(N*N) configurations, exhaustive generation quickly becomes computationally expensive as N grows, so datasets for larger systems are generated using generative neural networks.
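
As an illustration of exhaustive generation, the sketch below enumerates every NxN configuration and computes a nearest-neighbour energy. It is not taken from "ising_exhaustive_generation.py": the function names, the spin values of -1/+1, the coupling constant J and the periodic boundary conditions are all assumptions made for this example.

```python
import itertools

import numpy as np


def all_configurations(n):
    """Yield every n x n configuration of spins in {-1, +1} (2**(n*n) in total)."""
    for spins in itertools.product((-1, 1), repeat=n * n):
        yield np.array(spins, dtype=np.int8).reshape(n, n)


def ising_energy(config, J=1.0):
    """Nearest-neighbour energy, assuming periodic boundaries and coupling J."""
    right = np.roll(config, -1, axis=1)  # neighbour to the right of each site
    down = np.roll(config, -1, axis=0)   # neighbour below each site
    return -J * float(np.sum(config * right + config * down))


# Energies of all 512 configurations of a 3x3 system
energies = [ising_energy(c) for c in all_configurations(3)]
```

A 4x4 system already has 65536 configurations and a 6x6 system has about 6.9e10, which is why this approach is only practical for small N.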


		Training configuration sampler by genetic algorithm

The configuration sampler network is trained by a genetic algorithm in "ising_sampler_train.py". Candidate networks take random NxN data as input and output NxN configurations which, once filtered, are valid Ising spin configurations. The genetic algorithm scores each candidate network on the energy distribution of its output configurations. The scoring metric is a geometric average of several desirable distribution properties.
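
The filtering and scoring steps can be sketched as follows. This README does not specify the filter or the individual distribution properties, so both are assumptions here: the filter is taken to be a sign threshold, and each property is taken to be a score in (0, 1]. The function names are hypothetical and do not come from "ising_sampler_train.py".

```python
import numpy as np


def filter_to_spins(raw_output):
    """Map real-valued network output to spins in {-1, +1} (assumed sign filter)."""
    return np.where(raw_output >= 0, 1, -1).astype(np.int8)


def geometric_mean_score(property_scores):
    """Geometric mean of several property scores, each assumed to lie in (0, 1]."""
    values = np.asarray(property_scores, dtype=float)
    if np.any(values <= 0):
        # A single failed property drives the whole score to zero
        return 0.0
    return float(np.exp(np.mean(np.log(values))))
```

A geometric mean rewards candidates that do reasonably well on every property at once: a score of zero on any single property gives an overall score of zero, which an arithmetic mean would not.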


		Producing uniform datasets with the configuration sampler

Once efficient configuration sampler networks have been trained, they are loaded by
"load_make_fungible.py" and used to generate a uniform dataset. To minimize the
peculiarities of any particular configuration-sampling network, the final ten
networks are used together to generate the final dataset. The loaded networks
are given random input, then their outputs are filtered and their energies
calculated. As an additional attempt to minimize any random bias produced
by the configuration samplers, the generated outputs are copied and flipped in all
possible combinations of dimensions.
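
For a two-dimensional configuration, flipping in all possible combinations of dimensions yields four copies: the original, a flip along each axis, and a flip along both. The sketch below shows one way to do this with NumPy. The function name is hypothetical and the code is not taken from "load_make_fungible.py".

```python
import itertools

import numpy as np


def flip_augment(config):
    """Return copies of config flipped along every subset of its axes.

    The empty subset is included, so the first entry is an unflipped copy.
    """
    axes = range(config.ndim)
    copies = []
    for r in range(config.ndim + 1):
        for subset in itertools.combinations(axes, r):
            if subset:
                copies.append(np.flip(config, axis=subset))
            else:
                copies.append(config.copy())
    return copies
```

For an NxN array the result is ordered as: original, flipped along axis 0, flipped along axis 1, flipped along both axes.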

The energies of the generated outputs fall into a discrete number of energy classes.
The code then works to generate D/n examples of each energy class, where D is the
desired size of the dataset and n is the number of energy classes. The total size of
the dataset will therefore be roughly D, but not exactly D if D is not evenly
divisible by n. If an additional energy class is encountered during the generation
process, the code adapts the goal amount for each energy class to be D/(n+1). As a
result, the final total size of the dataset may be larger than D. However, the
dataset is guaranteed to be perfectly uniform: every energy class contains the same
number of examples.
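
The balancing logic can be sketched as below. This is a minimal illustration, not the code in "load_make_fungible.py": the function names are hypothetical, rounding the per-class goal up is an assumption, and energy classes are assumed to be discovered as samples arrive.

```python
import math
from collections import defaultdict


def per_class_target(desired_size, n_classes):
    """Goal number of examples per energy class (rounding up is an assumption)."""
    return math.ceil(desired_size / n_classes)


def collect_uniform(sample_stream, desired_size):
    """Bucket (energy, config) pairs until every known class reaches its goal."""
    buckets = defaultdict(list)
    for energy, config in sample_stream:
        # A previously unseen energy creates a new class, which lowers the goal
        buckets[energy].append(config)
        target = per_class_target(desired_size, len(buckets))
        if all(len(examples) >= target for examples in buckets.values()):
            # Trim so that every class holds exactly the same number of examples
            return {e: examples[:target] for e, examples in buckets.items()}
    raise ValueError("sample stream ended before every energy class was filled")
```

With D = 10 and three energy classes, the goal per class is 4 under this rounding, so the result holds 12 examples: slightly more than D, but with identical counts in every class.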


Licensing information

The HDF5 data files (*.hdf5) included in this work by Evan Thomas et al. are licensed
under CC BY-NC-SA 4.0. This license is contained in the included file
"LICENSES/LICENSE_FOR_HDF5_DATA".

The Python code files (*.py) included in this work by Evan Thomas et al.
are licensed under the GNU General Public License Version 3.0 or later. The license
is contained in the file "LICENSES/LICENSE_FOR_PYTHON_CODE".