…ling and add new commands for various reading scenarios
…djust table headers for clarity
Co-authored-by: Adam Reeve <adreeve@gmail.com>
…into write-benchmarks
…ompression options
Hi @adamreeve, I've added initial code to read the decompressed file and write it to Parquet. I started with the num_plasma dataset:
The binary is then read and saved as Parquet with different configurations. Please let me know if this is on track.
adamreeve
left a comment
Thanks Larry, I've left a couple of suggestions.
…encoding and compression options
Hi @adamreeve, I ran the tests with the three files and drafted some documentation of the findings. Please let me know what you think.
adamreeve
left a comment
Thanks @larrytamnjong, I've left some suggestions.
Co-authored-by: Adam Reeve <adreeve@gmail.com>
@adamreeve I’ve resolved the suggestions you left. Please take another look and let me know what you think.
adamreeve
left a comment
These latest changes look good, thanks Larry. I just have one small suggested change.
For the next step, it would be good to add measurements of how the write configuration affects the read time, when using consistent read options. Using the logical-chunked configuration would make sense as you found this was optimal in your previous tests. You might want to allow specifying the file to read when using the read benchmark commands to support this.
Co-authored-by: Adam Reeve <adreeve@gmail.com>
@adamreeve I’ve added read measurements and moved the PR out of draft. Please let me know if everything looks okay.
…d times for each configuration
@adamreeve, as discussed, I have updated the read results to be an average of 5 runs.
👍 Thanks for all your work on this @larrytamnjong