Parsing errors - "EOFError" and "UnicodeDecodeError" #44

@bjornekstrom

Hello,

I'm currently using corpkit as a research tool for my master's thesis in library and information science, as well as for experimenting in my spare time. It works well, but occasionally I get error messages when parsing a corpus that I don't understand. Could you perhaps explain them to me?

They're either

"EOFError: EOF when reading a line"

or

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)"

Most recently, these messages occurred when trying to parse a plain text file consisting of James Joyce's Ulysses retrieved from Project Gutenberg (https://www.gutenberg.org/ebooks/4300). As far as I understand, the file is encoded in UTF-8 and should work fine.

Thanks in advance.
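
One possible reading of the second error, offered as a sketch and not a confirmed diagnosis: byte 0xef is the first byte of the UTF-8 byte order mark (EF BB BF), so an error at position 0 would be consistent with a UTF-8 file that starts with a BOM being decoded as ASCII. The snippet below reproduces that message on a made-up sample and shows two ways to deal with a leading BOM using Python's standard `codecs` module. The `strip_utf8_bom` helper and the sample text are hypothetical, and nothing here reflects how corpkit itself reads files.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 byte order mark, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical sample standing in for the start of the downloaded file:
# a UTF-8 BOM followed by ordinary text.
sample = codecs.BOM_UTF8 + "Stately, plump Buck Mulligan".encode("utf-8")

# Decoding as ASCII fails on the very first byte (0xef).
ascii_error = None
try:
    sample.decode("ascii")
except UnicodeDecodeError as err:
    ascii_error = str(err)

# Option 1: the "utf-8-sig" codec decodes UTF-8 and drops a leading BOM.
text = sample.decode("utf-8-sig")

# Option 2: remove the BOM bytes first and keep working with raw bytes.
cleaned = strip_utf8_bom(sample)
```

If the file does turn out to start with a BOM, rewriting it without those three bytes before parsing is one workaround to try. This sketch does not address the EOFError.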
