Description
Thank you for providing the code.
We were trying to mine Wikipedia with this shell script for our entity linker, using the 2018/05/01 dump. We were able to generate the hash file, but surprisingly its size was only 284 MB. In contrast, the pre-trained English hash provided, trained from the November 2015 Wikipedia, is 1.3 GB.
@aasish, could you suggest what might be going wrong? Is it because of compression, or are we missing some entities? Is there a way to combine both hash files so that the more recent entities are also taken into account?
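To rule out compression, this is the kind of sanity check we had in mind (the file names below are just placeholders for our generated hash and the pre-trained one, not the actual paths):

```sh
# Check whether our generated hash file is compressed (gzip, zip, etc.)
file our-english-hash-2018-05-01        # placeholder name

# If it reports gzip data, compare against the uncompressed byte count
gzip -cd our-english-hash-2018-05-01 | wc -c

# On-disk sizes of our hash vs. the pre-trained November 2015 English hash
du -h our-english-hash-2018-05-01 english-hash-nov2015   # both placeholders
```

If the uncompressed size is still far below 1.3 GB, we suspect entities are being dropped somewhere in the pipeline rather than it being a compression difference.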