
Write benchmarks #630

Merged: adamreeve merged 32 commits into G-Research:master from larrytamnjong:write-benchmarks on Apr 15, 2026

@larrytamnjong (Contributor)

No description provided.

larrytamnjong and others added 20 commits February 25, 2026 15:15
…ling and add new commands for various reading scenarios
Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Adam Reeve <adreeve@gmail.com>
@larrytamnjong (Contributor, Author)

Hi @adamreeve, I've added initial code to read the decompressed file and write it to Parquet. I started with the num_plasma dataset:

./spdp.exe < "num_plasma.sp.spdp" > num_plasma.bin

The binary is then read and saved as Parquet with different configurations. Please let me know if this is on track.

@adamreeve (Contributor) left a comment

Thanks Larry, I've left a couple of suggestions.

Comment thread on csharp.config.benchmarks/Program.cs (outdated)
Comment thread on csharp.config.benchmarks/ParquetSharpConfigBenchmarks.cs (outdated)
@larrytamnjong (Contributor, Author)

Hi @adamreeve, I ran the tests with the three files and drafted some documentation of the findings. Please let me know what you think.

@adamreeve (Contributor) left a comment

Thanks @larrytamnjong, I've left some suggestions.

Comment threads on csharp.config.benchmarks/ParquetSharpConfigBenchmarks.cs (3, outdated)
Comment threads on docs/guides/WriteBenchmarks.md (5, outdated)
@larrytamnjong (Contributor, Author)

@adamreeve I've addressed your suggestions. Please take another look and let me know what you think.

@adamreeve (Contributor) left a comment

These latest changes look good, thanks Larry. I just have one small suggested change.

For the next step, it would be good to measure how the write configuration affects read time while keeping the read options constant. Using the logical-chunked configuration for reads would make sense, since you found it was optimal in your previous tests. To support this, you might want to let the read benchmark commands take the path of the file to read.

Comment thread on docs/guides/WriteBenchmarks.md (outdated)
@larrytamnjong (Contributor, Author)

@adamreeve I’ve added read measurements and moved the PR out of draft. Please let me know if everything looks okay.

larrytamnjong marked this pull request as ready for review on April 14, 2026 at 10:51
@larrytamnjong (Contributor, Author)

@adamreeve, as discussed, I have updated the read results to be an average of 5 runs.

@adamreeve (Contributor)

👍 Thanks for all your work on this @larrytamnjong

adamreeve merged commit 44f29d5 into G-Research:master on Apr 15, 2026
49 checks passed
larrytamnjong deleted the write-benchmarks branch on April 16, 2026 at 13:27

2 participants