Performance: readData utterly slow for files with many lines of data #57

@FObersteiner

Description

Loading data from small files completes in a reasonable amount of time. With many lines of data (10k+), readData becomes a bottleneck.

What I Did

read 4.3k lines of data, ffi1001:

%timeit myfile.readData()
67.9 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

read 86.6k lines of data, ffi1001:

%timeit myfile.readData()
51.5 s ± 2.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

That's nearly a minute per file! If I wanted to load many such files, I'd have to drink a lot of coffee in the meantime ☕👾
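
As a rough check on how badly this scales, the two timings above can be compared directly. This is only a sketch: it assumes the run time follows a simple power law t = c * n**k, which is an illustrative model and not something two data points can confirm.

```python
import math

# Mean values taken from the two %timeit runs above.
n_small, t_small = 4.3e3, 67.9e-3   # lines, seconds
n_large, t_large = 86.6e3, 51.5     # lines, seconds

# How much more data, and how much more time it costs.
size_ratio = n_large / n_small
time_ratio = t_large / t_small

# Assumed model: t = c * n**k, so k = log(time_ratio) / log(size_ratio).
k = math.log(time_ratio) / math.log(size_ratio)

print(f"{size_ratio:.0f}x more lines, {time_ratio:.0f}x more time, k = {k:.1f}")
```

About 20 times more lines cost about 760 times more time, so the estimated exponent is slightly above 2. That hints at roughly quadratic behaviour somewhere in readData rather than a constant per-line cost.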


Tracing the execution of the call to readData, I find
