Hello,
I'm currently using corpkit as a research tool for my master's thesis in library and information science, and also for experiments in my spare time. It works well, but occasionally, when parsing a corpus, I get error messages that I don't understand. Could you perhaps explain them to me?
They're either
"EOFError: EOF when reading a line"
or
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"
Most recently, these messages occurred when I tried to parse a plain text file of James Joyce's Ulysses retrieved from Project Gutenberg (https://www.gutenberg.org/ebooks/4300). As far as I understand, the file is encoded in UTF-8 and should work fine.
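For reference, 0xef is the first byte of the three-byte UTF-8 byte order mark (EF BB BF), which some UTF-8 text files begin with, and the error says something tried to read the file as ASCII. Below is a minimal Python 3 sketch for checking whether a file starts with that mark and whether it decodes cleanly as UTF-8. The helpers `inspect_text_file` and `write_without_bom` are hypothetical and not part of corpkit.

```python
import codecs


def inspect_text_file(path):
    """Return (starts_with_utf8_bom, decodes_as_utf8) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf"
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    return has_bom, is_utf8


def write_without_bom(src, dst):
    """Copy a UTF-8 text file, dropping a leading BOM if there is one."""
    # Decoding with "utf-8-sig" skips a leading BOM; newline="" keeps the
    # original line endings (Gutenberg files often use \r\n) unchanged.
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # Plain "utf-8" encoding does not write a BOM back out.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Calling `inspect_text_file("ulysses.txt")` (placeholder file name) returns a pair of booleans; if the first one is `True`, `write_without_bom` produces a copy without the mark. Whether the BOM is what actually trips up the parser is an assumption, not something the error message alone confirms.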
Thanks in advance.