Hello, dear Medallia staff.
Thank you for your nice Java code. It is beautiful and neat, but it does not seem precise.
I measured the accuracy, and it is about 20 percentage points lower than the original C version (33.53% vs. 53.32% overall).
I trained on text8 with the same parameters in both cases (apart from the thread count), which are:
Java
File f = new File("text8");
if (!f.exists())
    throw new IllegalStateException("Please download and unzip the text8 example from http://mattmahoney.net/dc/text8.zip");
List<String> read = Common.readToList(f);
List<List<String>> partitioned = Lists.transform(read, new Function<String, List<String>>() {
    @Override
    public List<String> apply(String input) {
        return Arrays.asList(input.split(" "));
    }
});
Word2VecModel model = Word2VecModel.trainer()
        .setMinVocabFrequency(5)
        .useNumThreads(20)
        .setWindowSize(8)
        .type(NeuralNetworkType.CBOW)
        .setLayerSize(200)
        .useNegativeSamples(25)
        .setDownSamplingRate(1e-4)
        .setNumIterations(15)
        .setListener(new TrainingProgressListener() {
            @Override
            public void update(Stage stage, double progress) {
                System.out.println(String.format("%s is %.2f%% complete", Format.formatEnum(stage), progress * 100));
            }
        })
        .train(partitioned);
try (final OutputStream os = Files.newOutputStream(Paths.get("vectors.bin"))) {
    model.toBinFile(os);
}
C
./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 8 -binary 1 -iter 15
I evaluated both models with the same evaluation program and test file:
./compute-accuracy vectors.bin 30000 < questions-words.txt
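For readers unfamiliar with this evaluation, here is a minimal sketch of the analogy test as I understand it: for each question "a b c d", the nearest word to b - a + c (by dot product against unit-length vectors, which ranks the same as cosine similarity) is taken as the prediction, with a, b and c themselves excluded, and it counts as correct if it equals d. The class name `AnalogySketch`, the helper names, and the toy 3-dimensional vectors are all made up for illustration; this is not the code of `compute-accuracy` itself, and it ignores details such as the 30000-word vocabulary limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnalogySketch {
    /** Returns a unit-length copy of v. */
    static double[] normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) unit[i] = v[i] / norm;
        return unit;
    }

    /** Predicts d in "a is to b as c is to d" as the word closest to b - a + c. */
    static String predict(Map<String, double[]> vectors, String a, String b, String c) {
        double[] va = normalize(vectors.get(a));
        double[] vb = normalize(vectors.get(b));
        double[] vc = normalize(vectors.get(c));
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) target[i] = vb[i] - va[i] + vc[i];

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            String word = e.getKey();
            // The three question words are never allowed as the answer.
            if (word.equals(a) || word.equals(b) || word.equals(c)) continue;
            double[] unit = normalize(e.getValue());
            double score = 0.0;
            for (int i = 0; i < target.length; i++) score += target[i] * unit[i];
            if (score > bestScore) {
                bestScore = score;
                best = word;
            }
        }
        return best;
    }

    /** Hypothetical toy embedding, chosen by hand so the analogy works. */
    static Map<String, double[]> toyVectors() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("man", new double[] {1, 0, 0});
        m.put("woman", new double[] {1, 1, 0});
        m.put("king", new double[] {1, 0, 1});
        m.put("queen", new double[] {1, 1, 1});
        m.put("apple", new double[] {1, 0, 0.1});
        return m;
    }

    public static void main(String[] args) {
        String guess = predict(toyVectors(), "man", "king", "woman");
        System.out.println("man : king :: woman : " + guess); // prints "man : king :: woman : queen"
    }
}
```

Because only the top-ranked word counts, even a modest loss in vector quality can turn many near misses into wrong answers, which may be part of why the gap looks so large.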
Your Java implementation:
capital-common-countries:
ACCURACY TOP1: 58.30 % (295 / 506)
Total accuracy: 58.30 % Semantic accuracy: 58.30 % Syntactic accuracy: nan %
capital-world:
ACCURACY TOP1: 36.78 % (534 / 1452)
Total accuracy: 42.34 % Semantic accuracy: 42.34 % Syntactic accuracy: nan %
currency:
ACCURACY TOP1: 12.69 % (34 / 268)
Total accuracy: 38.77 % Semantic accuracy: 38.77 % Syntactic accuracy: nan %
city-in-state:
ACCURACY TOP1: 25.21 % (396 / 1571)
Total accuracy: 33.16 % Semantic accuracy: 33.16 % Syntactic accuracy: nan %
family:
ACCURACY TOP1: 55.23 % (169 / 306)
Total accuracy: 34.80 % Semantic accuracy: 34.80 % Syntactic accuracy: nan %
gram1-adjective-to-adverb:
ACCURACY TOP1: 8.07 % (61 / 756)
Total accuracy: 30.64 % Semantic accuracy: 34.80 % Syntactic accuracy: 8.07 %
gram2-opposite:
ACCURACY TOP1: 9.48 % (29 / 306)
Total accuracy: 29.39 % Semantic accuracy: 34.80 % Syntactic accuracy: 8.47 %
gram3-comparative:
ACCURACY TOP1: 38.25 % (482 / 1260)
Total accuracy: 31.13 % Semantic accuracy: 34.80 % Syntactic accuracy: 24.63 %
gram4-superlative:
ACCURACY TOP1: 23.91 % (121 / 506)
Total accuracy: 30.60 % Semantic accuracy: 34.80 % Syntactic accuracy: 24.50 %
gram5-present-participle:
ACCURACY TOP1: 22.08 % (219 / 992)
Total accuracy: 29.53 % Semantic accuracy: 34.80 % Syntactic accuracy: 23.87 %
gram6-nationality-adjective:
ACCURACY TOP1: 63.17 % (866 / 1371)
Total accuracy: 34.50 % Semantic accuracy: 34.80 % Syntactic accuracy: 34.25 %
gram7-past-tense:
ACCURACY TOP1: 26.35 % (351 / 1332)
Total accuracy: 33.47 % Semantic accuracy: 34.80 % Syntactic accuracy: 32.64 %
gram8-plural:
ACCURACY TOP1: 44.25 % (439 / 992)
Total accuracy: 34.39 % Semantic accuracy: 34.80 % Syntactic accuracy: 34.17 %
gram9-plural-verbs:
ACCURACY TOP1: 18.15 % (118 / 650)
Total accuracy: 33.53 % Semantic accuracy: 34.80 % Syntactic accuracy: 32.90 %
Questions seen / total: 12268 19544 62.77 %
Original C implementation:
capital-common-countries:
ACCURACY TOP1: 82.81 % (419 / 506)
Total accuracy: 82.81 % Semantic accuracy: 82.81 % Syntactic accuracy: nan %
capital-world:
ACCURACY TOP1: 62.26 % (904 / 1452)
Total accuracy: 67.57 % Semantic accuracy: 67.57 % Syntactic accuracy: nan %
currency:
ACCURACY TOP1: 23.13 % (62 / 268)
Total accuracy: 62.22 % Semantic accuracy: 62.22 % Syntactic accuracy: nan %
city-in-state:
ACCURACY TOP1: 44.68 % (702 / 1571)
Total accuracy: 54.96 % Semantic accuracy: 54.96 % Syntactic accuracy: nan %
family:
ACCURACY TOP1: 75.82 % (232 / 306)
Total accuracy: 56.52 % Semantic accuracy: 56.52 % Syntactic accuracy: nan %
gram1-adjective-to-adverb:
ACCURACY TOP1: 17.20 % (130 / 756)
Total accuracy: 50.40 % Semantic accuracy: 56.52 % Syntactic accuracy: 17.20 %
gram2-opposite:
ACCURACY TOP1: 21.90 % (67 / 306)
Total accuracy: 48.71 % Semantic accuracy: 56.52 % Syntactic accuracy: 18.55 %
gram3-comparative:
ACCURACY TOP1: 64.60 % (814 / 1260)
Total accuracy: 51.83 % Semantic accuracy: 56.52 % Syntactic accuracy: 43.54 %
gram4-superlative:
ACCURACY TOP1: 39.72 % (201 / 506)
Total accuracy: 50.95 % Semantic accuracy: 56.52 % Syntactic accuracy: 42.86 %
gram5-present-participle:
ACCURACY TOP1: 39.52 % (392 / 992)
Total accuracy: 49.51 % Semantic accuracy: 56.52 % Syntactic accuracy: 41.99 %
gram6-nationality-adjective:
ACCURACY TOP1: 87.24 % (1196 / 1371)
Total accuracy: 55.08 % Semantic accuracy: 56.52 % Syntactic accuracy: 53.94 %
gram7-past-tense:
ACCURACY TOP1: 38.21 % (509 / 1332)
Total accuracy: 52.96 % Semantic accuracy: 56.52 % Syntactic accuracy: 50.73 %
gram8-plural:
ACCURACY TOP1: 67.54 % (670 / 992)
Total accuracy: 54.21 % Semantic accuracy: 56.52 % Syntactic accuracy: 52.95 %
gram9-plural-verbs:
ACCURACY TOP1: 37.38 % (243 / 650)
Total accuracy: 53.32 % Semantic accuracy: 56.52 % Syntactic accuracy: 51.71 %
Questions seen / total: 12268 19544 62.77 %
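To make the overall gap explicit, the per-category counts above can be summed as in the following small sketch. The class name `AccuracyGap` and its helpers are made up for illustration; the numbers are copied from the ACCURACY TOP1 lines of the two runs, in the order the categories are listed.

```java
public class AccuracyGap {
    // Correct answers per category, in the order listed above.
    static final int[] JAVA_CORRECT = {295, 534, 34, 396, 169, 61, 29, 482, 121, 219, 866, 351, 439, 118};
    static final int[] C_CORRECT = {419, 904, 62, 702, 232, 130, 67, 814, 201, 392, 1196, 509, 670, 243};
    // Questions seen per category (identical for both runs).
    static final int[] QUESTIONS = {506, 1452, 268, 1571, 306, 756, 306, 1260, 506, 992, 1371, 1332, 992, 650};

    static int sum(int[] xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    /** Overall accuracy in percent: all correct answers over all questions seen. */
    static double accuracyPercent(int[] correct, int[] questions) {
        return 100.0 * sum(correct) / sum(questions);
    }

    public static void main(String[] args) {
        double java = accuracyPercent(JAVA_CORRECT, QUESTIONS);
        double c = accuracyPercent(C_CORRECT, QUESTIONS);
        System.out.printf("Java: %.2f %% (%d / %d)%n", java, sum(JAVA_CORRECT), sum(QUESTIONS));
        System.out.printf("C:    %.2f %% (%d / %d)%n", c, sum(C_CORRECT), sum(QUESTIONS));
        System.out.printf("Gap:  %.2f percentage points%n", c - java);
    }
}
```

The totals agree with the final "Total accuracy" lines of each run (33.53 % and 53.32 %), so the difference is just under 20 percentage points.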
Can you give me any suggestions or ideas about this? I am ready to help you if needed.
Thank you.