Parsing errors - "EOFError" and "UnicodeDecodeError"

Hello,

I'm currently using corpkit as a research tool for my master's thesis in library and information science, as well as for experimenting in my spare time. It works well, but occasionally I get error messages when parsing a corpus that I don't understand. Could you perhaps explain them to me?

They're either

"EOFError: EOF when reading a line"

or

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"

Most recently, these messages occurred when trying to parse a plain text file containing James Joyce's Ulysses, retrieved from Project Gutenberg (https://www.gutenberg.org/ebooks/4300). As far as I understand, the file is encoded in UTF-8 and should work fine.
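One detail that may be relevant: 0xef is the first byte of the UTF-8 byte order mark (EF BB BF), so the file might start with a BOM that an ASCII decoder cannot handle. Below is a minimal sketch for checking this, assuming Python 3. The helper names and the sample file are hypothetical stand-ins and are not part of corpkit.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def read_text_without_bom(path):
    """Decode the file as UTF-8; the 'utf-8-sig' codec drops a leading BOM if present."""
    with open(path, "r", encoding="utf-8-sig") as handle:
        return handle.read()


# Hypothetical stand-in for the downloaded text file: a BOM followed by UTF-8 text.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + "Stately, plump Buck Mulligan".encode("utf-8"))

has_bom = starts_with_utf8_bom(sample_path)
text = read_text_without_bom(sample_path)
```

If the check reports a BOM, re-saving the text without it before parsing might avoid the UnicodeDecodeError, though I can't say whether that is what corpkit expects.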

Thanks in advance.
