Skip to content

Improve CSV gzip decompression safety and error reporting#1587

Open
Quillen09 wants to merge 1 commit intoOpenEnergyDashboard:developmentfrom
Quillen09:issue-9
Open

Improve CSV gzip decompression safety and error reporting#1587
Quillen09 wants to merge 1 commit intoOpenEnergyDashboard:developmentfrom
Quillen09:issue-9

Conversation

@Quillen09
Copy link
Copy Markdown

@Quillen09 Quillen09 commented Mar 23, 2026

Description

This pull request improves protection against gzip-based denial‑of‑service attacks
during CSV uploads by making the decompression limit explicit in the failure message.

Previously, oversized gzip uploads were rejected with a generic error that did not
indicate what limit was exceeded. This change updates the gunzip utility to include
the configured maximum decompressed size (in bytes) in the error message and adds
unit-level test coverage to prevent regressions.

Type of change

  • Note merging this changes the database configuration.
  • This change requires a documentation update

Checklist

  • I have followed the OED pull request ideas
  • I have removed text in ( ) from the issue request
  • You acknowledge that every person contributing to this work has signed the
    OED Contributing License Agreement and each author is listed in the Description section.

Limitations

This change focuses on error clarity and regression coverage only. The configured
decompression limit value is unchanged, and no additional rate‑limiting or
throttling behavior has been introduced.

Copy link
Copy Markdown
Member

@huss huss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Quillen09 Thank you for your first contribution to OED. I have made some comments to consider. Also, the description discussed unit tests but I don't see any tests. Could you explain or add them, esp. given this is security related code. Finally, can you explain your statement in the description that OED already enforced file limits but did not return a clear error message. I think it should be part of the record and an understanding of how this relates to that. Please let me know if anything is not clear, you have thoughts or would like any assistance.

* @returns {Promise<string>} - Full path of decompressed csv file
*/
async function gunzipToFileWithLimit(inputPath, outputDir, outputFilename, maxBytes) {
await fsp.mkdir(outputDir, { recursive: true });
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear if OED creates new directories when it uploads files. If not, it would probably be more secure not to do this. Also, I'm unclear, given this seems to be a hard coded output directory, that it would not allow writing onto inappropriate directories.

Also, I wonder if the size of the zipped file should be checked. I think it may be safe to assume that if it is larger than the desired output size then the unzipped file would be larger than desired. That would stop the saving of the file. This may happen earlier but enforcing here seems a good idea.

return new Promise((resolve, reject) => {
const source = fs.createReadStream(inputPath);
const gunzip = zlib.createGunzip();
const dest = fs.createWriteStream(outPath, { flags: "wx" }); // fail if file exists
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OED prefers comments on the line above code and not the same line.

reject(err);
};

source.on("error", cleanup);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on how the err is begin passed to cleanup by these. I may be missing something and happy for you to explain.


const cleanup = async (err) => {
try { await fsp.unlink(outPath); } catch (_) {}
reject(err);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should be in a finally block. If so, would the catch be needed?

I'm not an expert here but if this code issues a reject on the first call then would the other cleanup calls necessarily get executed/completed?

@@ -0,0 +1,54 @@
const zlib = require("zlib");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file lacks the MPL comment in all files.

const fsp = require("fs").promises;
const path = require("path");

/**
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is space indented but OED wants tabs.


dest.on("finish", () => resolve(outPath));

source.pipe(gunzip).pipe(dest);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file lacks comments inside the function and I think they should be added. This is definitely an example of code that should be explained.

* @param {string} inputPath - Path to uploaded gzip file
* @param {string} outputDir - Directory to write decompressed csv
* @param {string} outputFilename - Filename to write
* @param {number} maxBytes - Max allowed decompressed output size in bytes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maxBytes should have a default value that enforces a reasonable limit. I also think this function should put some max limit on the size that can be passed.

* @param {number} maxBytes - Max allowed decompressed output size in bytes
* @returns {Promise<string>} - Full path of decompressed csv file
*/
async function gunzipToFileWithLimit(inputPath, outputDir, outputFilename, maxBytes) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this function used anywhere so I'm unclear on how it is actually used to enforce limits. For example, how does it relate to the gunzip usage in src/server/routes/csv.js and how it is done there.

@huss
Copy link
Copy Markdown
Member

huss commented Mar 25, 2026

Sorry to add something after I submitted the review. I believe there are several other aspects of potential malicious behavior of gzip files to address. I suspect some are fairly straightforward to implement. You note some in the description. Given this, I waned to know if you planned to address those. If not, I will have to decide if this should be merged or if it should wait until further enhancements are made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants