Improve NULL handling for NOT NULL integer columns #65 #66

Merged
monozoide merged 1 commit into dev from bugfix/65-sql-import-asn-null-constraint
Oct 17, 2025

@monozoide
Owner

Fix: Prevent NULL values for NOT NULL integer columns in SQL export

🐞 Bug Summary

The SQL import process was failing with a NOT NULL constraint error on the asn_int column in the maillogsentinel_events table. The root cause was the SQL exporter generating NULL values for empty strings in CSV data, which were then passed to NOT NULL integer columns during import, causing constraint violations.

Severity: high

🔁 Reproduction

  1. Prepare a CSV export with an empty string value in an ASN field
  2. Export the data to SQL format using the SQL exporter
  3. Attempt to import the generated SQL file into the database
  4. The import fails with a NOT NULL constraint violation on the asn_int column

Actual result:
The SQL exporter generates NULL for empty string values, which violates the NOT NULL constraint and creates an invalid SQL file that cannot be imported.

Expected result:
The export process should abort with a clear error when encountering invalid data for a NOT NULL column, preventing the generation of an invalid SQL file.
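
The import-side failure can be reproduced in isolation. The sketch below uses an in-memory SQLite database purely as a stand-in (the PR does not state which database engine is used) and a reduced, hypothetical schema containing only the asn_int column named in the report.

```python
import sqlite3

# Hypothetical, reduced schema: only the column named in the bug report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maillogsentinel_events (asn_int INTEGER NOT NULL)")


def try_insert(statement: str) -> str:
    """Run one INSERT and report whether the NOT NULL constraint accepted it."""
    try:
        conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# The kind of statement the exporter used to emit for an empty CSV field.
null_result = try_insert(
    "INSERT INTO maillogsentinel_events (asn_int) VALUES (NULL)"
)
# A statement built from a valid ASN value.
valid_result = try_insert(
    "INSERT INTO maillogsentinel_events (asn_int) VALUES (64500)"
)

print(null_result)
print(valid_result)
```

The first statement is rejected by the constraint and the second is accepted, which is why an exported file containing even one NULL for asn_int cannot be imported cleanly.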

🌍 Runtime Context

  • Affected version(s): 5.15.x
  • Environment(s): Production
  • OS: Debian 13.1 - Trixie
  • Python version: 3.13.x
  • Relevant file: /usr/local/bin/lib/maillogsentinel/sql_importer.py (line 467)
  • Affected module: sql_exporter.py (SQL export logic)

🕵️ Analysis & Root Cause

The issue originates in the format_sql_value function in sql_exporter.py. The function converted empty strings to NULL without checking whether the target column was declared NOT NULL. The resulting NULL literals were written to the SQL file and reached the database during import, where they violated the constraint.

Confirmed root cause: The format_sql_value function lacked proper validation to distinguish between nullable and non-nullable columns when handling empty string values.

✅ Applied Fix

The fix modifies the format_sql_value function in sql_exporter.py:

  1. Simplified nullability check: The logic for determining if a column is nullable now only checks for the absence of "NOT NULL" in the column definition, making it more robust and maintainable.

  2. Explicit empty string validation: Within the integer conversion block, an explicit check now raises a ValueError when encountering an empty string before attempting integer conversion. This ValueError is caught by the existing error handling logic.

  3. Conditional error propagation:

    • For nullable columns: The ValueError is caught and NULL is returned (legacy behavior preserved for valid cases)
    • For NOT NULL columns: The ValueError is re-raised, causing the export process to fail and preventing the creation of an invalid SQL file
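
The three points above can be sketched as follows. This is a minimal illustration and not the project's code: the signature (a raw value plus the column's SQL type definition string), the integer-type detection, and the text fallback are all assumptions made for the example.

```python
from typing import Optional


def format_sql_value(value: Optional[str], sql_type_def: str) -> str:
    """Render one CSV field as an SQL literal (illustrative sketch only)."""
    type_def = sql_type_def.upper()

    # 1. A column is nullable only if "NOT NULL" is absent from its definition.
    is_nullable = "NOT NULL" not in type_def

    if "INT" in type_def:
        try:
            # 2. Reject empty input explicitly before attempting conversion.
            if value is None or value.strip() == "":
                raise ValueError("empty value for integer column")
            return str(int(value))
        except ValueError:
            # 3. NULL for nullable columns, re-raise for NOT NULL columns.
            if is_nullable:
                return "NULL"
            raise

    # Non-integer columns are outside the scope of this sketch:
    # quote the value as text and escape embedded single quotes.
    text = "" if value is None else value
    return "'" + text.replace("'", "''") + "'"


print(format_sql_value("64500", "INTEGER NOT NULL"))
print(format_sql_value("", "INTEGER"))
```

With this shape, an empty string for an INTEGER NOT NULL column raises ValueError instead of producing a literal, so the caller can abort the export before an invalid file is written.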

Affected modules:

  • sql_exporter.py (format_sql_value function)
  • tests/test_sql_exporter.py (new regression test)

Risks/side effects:

  • Exports will now fail earlier when encountering invalid data for NOT NULL columns (this is the intended behavior to prevent silent data corruption)
  • Users will receive clearer error messages instead of import failures
  • No breaking changes for valid data; only affects error handling for edge cases

🧪 Tests (including regression)

  • Unit test covering the case
  • Integration/E2E test (SQL export → import workflow)
  • Non-regression test (bug does not reappear)

Manual validation procedure:

  1. Create a test CSV file with an empty string value in an integer field that maps to a NOT NULL database column (e.g., ASN)
  2. Run the SQL export process on this CSV
  3. Verify that the export process fails with a clear error message instead of generating an invalid SQL file
  4. Confirm that valid data continues to export successfully without regression
  5. Expected result: Export process aborts with error when encountering invalid data for NOT NULL columns; valid data exports without issues

🔗 Links

🖼️ Evidence

Code changes summary:

  • sql_exporter.py: Enhanced format_sql_value function with improved nullability validation and explicit empty string handling
  • tests/test_sql_exporter.py: Added regression test case verifying export fails for empty strings in NOT NULL integer columns

Test output:

  • All existing tests pass
  • New regression test validates the fix
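
A regression test of the kind described here might look like the sketch below. The test file itself is not shown in this PR, so the test names are hypothetical, and format_sql_value is a small stand-in that mimics the described behavior so that the example runs on its own.

```python
import unittest


def format_sql_value(value, sql_type_def):
    """Stand-in with the behavior described in this PR (not the project's code)."""
    is_nullable = "NOT NULL" not in sql_type_def.upper()
    try:
        if value == "":
            raise ValueError("empty string is not a valid integer")
        return str(int(value))
    except ValueError:
        if is_nullable:
            return "NULL"
        raise


class TestNotNullIntegerExport(unittest.TestCase):
    def test_empty_string_fails_for_not_null_column(self):
        # The regression case: export must fail rather than emit NULL.
        with self.assertRaises(ValueError):
            format_sql_value("", "INTEGER NOT NULL")

    def test_empty_string_becomes_null_for_nullable_column(self):
        # Legacy behavior is preserved for nullable columns.
        self.assertEqual(format_sql_value("", "INTEGER"), "NULL")

    def test_valid_integer_is_unchanged(self):
        self.assertEqual(format_sql_value("64500", "INTEGER NOT NULL"), "64500")


def run_suite() -> unittest.TestResult:
    """Load and run the test case without exiting the interpreter."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestNotNullIntegerExport)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print(result.wasSuccessful())
```

In a real test module the stand-in would be replaced by an import of the project's function, and the suite would be collected by the test runner rather than invoked through run_suite.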

✅ Project checklist

  • All my commits are signed (GPG/SSH)
  • lint passes locally and in CI
  • No sensitive data exposed (credentials, tokens)
  • Manual validation completed
  • The added/covered tests are relevant for the feature
  • I have described a clear manual validation procedure

📝 Summary

This fix resolves a critical bug where empty string values in CSV exports were incorrectly converted to NULL, violating NOT NULL constraints in the database. The solution improves data integrity by failing fast during export rather than generating invalid SQL that fails at import time. The fix is complete, well-tested, and production-ready.

Refines the logic in format_sql_value to treat columns as nullable only if 'NOT NULL' is absent from the SQL type definition. Adds stricter validation to prevent empty strings from being converted to integers for NOT NULL columns, and introduces a new test case to verify data conversion failure for NOT NULL integer fields.
monozoide self-assigned this Oct 17, 2025
monozoide added the bug label Oct 17, 2025
monozoide moved this from Todo to In progress in MailLogSentinel Roadmap Oct 17, 2025
monozoide added this to the 5.15.4 milestone Oct 17, 2025
monozoide merged commit 71ad31d into dev Oct 17, 2025
4 checks passed
github-project-automation bot moved this from In progress to Done in MailLogSentinel Roadmap Oct 17, 2025
monozoide deleted the bugfix/65-sql-import-asn-null-constraint branch October 17, 2025 20:35
monozoide added a commit that referenced this pull request Oct 17, 2025
* workflows/update-gitignore-and-create-mls-ci #34 (#38)

* workflows/update-gitignore #34

Updated .gitignore to allow .log files in the repository for test data.

* workflow/add-gh-actions-workflow #34

Introduces a CI workflow that runs linting and tests on code changes to the main and dev branches, while skipping these steps for documentation-only changes. This setup uses flake8 for linting and pytest for testing, and optimizes CI runs by detecting code vs. docs changes.

* docs: Add sample email report and log files to dataset #31 (#39)

Added sample_email_report_output.txt, sample_mail.log, and sample_sasl.log to docs/dataset for documentation and testing purposes. These files provide example outputs and logs for MailLogSentinel.

* chore: add standardized PR templates for all contribution types (#35) (#42)

* Add PR templates

Introduces standardized pull request templates for bugfixes, code changes, documentation, CI/CD, and features in the .github/PR_TEMPLATES directory. These templates help ensure consistent and thorough PR descriptions, validation steps, and project checklists across different types of contributions.

* Delete PULL_REQUEST_TEMPLATE.md #35

Deleted the .github/PULL_REQUEST_TEMPLATE.md file. This change removes the default template for new pull requests.

* Revamp README with clearer setup and feature guide #32 (#43)

* Revamp README with clearer setup and feature guide #32

The README has been rewritten for clarity and conciseness, featuring a new quick start guide, clearer prerequisites, simplified command references, and improved documentation links. The overview, installation, and usage instructions are now more accessible, and advanced features are summarized with direct links to the Wiki. The new format is more user-friendly for first-time users and contributors.

* Update README links and formatting #32

Corrected documentation and sample output links, updated the contributing guide URL, and improved formatting for the closing quote in the README.

* Fix relative link to sample email report in README #32

Updated the link to the daily email report example to use the correct relative path, ensuring the documentation points to the right file location.

* ci: fix path filter for docs-only changes #44 (#45)

Enhanced the GitHub Actions workflow to better distinguish between code and documentation changes using separate path filters for pull requests and pushes. Updated the lint job to use Python 3.11 and ruff instead of flake8, and improved dependency installation for both lint and test jobs. The workflow now supports a fast path for documentation-only changes, skipping unnecessary jobs.

* Revise and expand contributing guidelines #33 (#46)

* Revise and expand contributing guidelines #33

Updated CONTRIBUTING.md with clearer, more structured quick-start instructions and recommendations. Added a new CONTRIBUTING_DETAILED.md file providing comprehensive workflow, commit signing, quality standards, and contribution requirements to help contributors follow best practices.

* Fix relative links in contributing docs #33

Updated relative paths in CONTRIBUTING.md and CONTRIBUTING_DETAILED.md to ensure links to detailed guidelines and discussions work correctly.

Closes #33

* Revise and condense maillogsentinel man page #47 (#49)

The man page for maillogsentinel was rewritten for clarity, brevity, and improved structure. Redundant and verbose sections were condensed, option descriptions were clarified, and auxiliary tool documentation was streamlined. The new version emphasizes practical usage, configuration, diagnostics, and security best practices, while removing excessive detail and outdated formatting.

* Add manpages for ipinfo and log_anonymizer #48 (#50)

Introduces manual pages for the ipinfo and log_anonymizer command-line tools, providing usage instructions, options, examples, and related information for system administrators.

* Add comprehensive FAQ documentation (#51)

Introduces a detailed FAQ (docs/wiki/FAQ.md) covering installation, configuration, usage, maintenance, integrations, troubleshooting, data analysis, security, and development for MailLogSentinel. This resource aims to assist users and contributors with common questions and operational guidance.

* Update documentation links and add manual pages #52 (#53)

Adjusted wiki links to use correct relative paths, added FAQ link, and included references to manual pages for maillogsentinel, ipinfo, and log_anonymizer in the README.

* Create readable markdown versions of manpages #54 (#55)

Introduces manual pages in Markdown format for the ipinfo, log_anonymizer, and maillogsentinel utilities. These documents provide usage instructions, options, configuration details, examples, and security considerations for each tool as part of the MailLogSentinel project.

* Add Debian install guide for MailLogSentinel #21 (#56)

Introduces a comprehensive installation and configuration guide for MailLogSentinel on Debian 12/13. The guide covers prerequisites, system preparation, installation steps, configuration, verification, service and timer setup, advanced options, troubleshooting, security, and additional resources.

* Fix linting errors in CI workflow #58 (#59)

* Fix linting errors in CI workflow #58

This commit fixes a number of linting errors that were causing the CI workflow to fail. The errors were primarily related to unused imports, f-strings without placeholders, and unused variables.

* Refactor and clean up test code #58

Removed unused imports and variables in test_maillogsentinel_setup.py and test_sql_exporter.py. Updated test_run_sql_export_basic_flow to use context manager for patching datetime and simplified the mocking of the logger. These changes improve test clarity and maintainability.

* Remove duplicate unittest.mock import #58

Consolidated the import of patch and MagicMock from unittest.mock to avoid redundancy in the test file.

* Remove unused MagicMock import #58

Cleaned up the import statements by removing MagicMock, which was not used in the test file.

* Fix Python 3.13 compatibility with pathlib #61 (#62)

Refactored the SQL import/export functionality to use `importlib.resources.as_file` instead of the deprecated `pathlib.Path` context manager. This resolves a crash on Python 3.13, where `pathlib.Path` objects no longer support the context manager protocol.
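
For readers unfamiliar with the replacement API, here is a minimal sketch of `importlib.resources.as_file`. The package and resource names are stand-ins chosen so the example runs anywhere (the standard library's json package); the project's actual package and resource names are not shown in this commit message.

```python
from importlib.resources import as_file, files

# "json" and "__init__.py" are stand-ins for the project's own package
# and bundled resource, which are not named in the commit message.
resource = files("json").joinpath("__init__.py")

# as_file() returns a context manager that yields a real filesystem path,
# extracting the resource to a temporary file first if necessary.
with as_file(resource) as path:
    resource_exists = path.is_file()
    resource_size = path.stat().st_size

print(resource_exists, resource_size > 0)
```

The context manager here belongs to `as_file`, not to the path object, so the code no longer relies on `pathlib.Path` supporting the `with` statement.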

close #61

* Fix: SQL export reports success on failure #63 (#64)

* Fix: SQL export reports success on failure #63

The --sql-export command was displaying a misleading success message even when data conversion errors occurred. This was because the `format_sql_value` function would log a warning and return NULL on conversion failure, but it did not propagate the error.

This commit makes the data conversion stricter by raising an `SQLExportError` when a conversion for a NOT NULL column fails. The `run_sql_export` function now handles this exception, counts the errors, and returns `False` if any errors occurred. It also deletes the incomplete SQL file to avoid leaving invalid artifacts.

This ensures that the SQL export process provides accurate feedback and only reports success when the export is actually successful.
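
The control flow described in this commit can be sketched as below. Only the names `SQLExportError`, `format_sql_value`, and `run_sql_export` come from the commit message; the signatures, the single-column rows, and the table name are simplifications assumed for the example.

```python
import os
import tempfile


class SQLExportError(Exception):
    """Raised when a value cannot be converted for a NOT NULL column."""


def format_sql_value(value: str, not_null: bool) -> str:
    # Hypothetical simplified converter for a single integer column.
    try:
        return str(int(value))
    except ValueError as exc:
        if not_null:
            raise SQLExportError(f"cannot convert {value!r} to integer") from exc
        return "NULL"


def run_sql_export(rows, output_path: str) -> bool:
    """Write INSERT statements; return False and remove the file on any error."""
    error_count = 0
    with open(output_path, "w", encoding="utf-8") as handle:
        for raw in rows:
            try:
                literal = format_sql_value(raw, not_null=True)
            except SQLExportError:
                error_count += 1  # count the failure and keep scanning
                continue
            # Table and column names are placeholders for the example.
            handle.write(f'INSERT INTO "events" ("asn_int") VALUES ({literal});\n')
    if error_count:
        os.remove(output_path)  # do not leave an incomplete SQL file behind
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.sql")
    bad_path = os.path.join(tmp, "bad.sql")
    good_ok = run_sql_export(["64500"], good_path)
    bad_ok = run_sql_export(["64500", ""], bad_path)
    good_exists = os.path.exists(good_path)
    bad_exists = os.path.exists(bad_path)

print(good_ok, good_exists, bad_ok, bad_exists)
```

Counting errors instead of stopping at the first one lets a caller report how many rows were invalid, while the final cleanup ensures that a failed run never leaves a partial file that looks importable.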

* Update SQL export tests #63

Tests now expect SQLExportError when None is provided for NOT NULL columns. Updated assertions to match new SQL statement formatting with quoted column names. This improves test accuracy and enforces stricter validation in SQL export logic.

Closes #63

* Improve NULL handling for NOT NULL integer columns Fix #65 (#66)

Refines the logic in format_sql_value to treat columns as nullable only if 'NOT NULL' is absent from the SQL type definition. Adds stricter validation to prevent empty strings from being converted to integers for NOT NULL columns, and introduces a new test case to verify data conversion failure for NOT NULL integer fields.

Labels

bug Something isn't working

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] SQL import fails with NOT NULL constraint error on asn_int column
