
Fix read_adi_parallel record parsing and chunk_size calculation #4

Draft
Copilot wants to merge 4 commits into main from copilot/fix-adiftools-read-adi-bugs

Conversation

Contributor

Copilot AI commented Feb 14, 2026

read_adi_parallel() and _process_chunk() had three critical bugs: (1) chunk_size could be zero when num_processes exceeded record count, (2) multi-line ADIF records were silently dropped, and (3) field extraction regex failed on type indicators like <FREQ:9:N>.

Changes

adiftools.py

read_adi_parallel(): Changed from line-based to record-based parsing

  • Split file content on <EOR> delimiter instead of chunking by line count
  • Calculate chunk_size as max(1, num_records // actual_processes) to prevent zero division
  • Limit processes to min(num_processes, num_records) when records < processes
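The chunking change can be sketched standalone (function and variable names here are illustrative, not the repository's actual code):

```python
import re

def split_into_record_chunks(text, num_processes):
    """Split ADIF text into <EOR>-terminated records, then into chunks.

    Sketch of the approach described above: case-insensitive <EOR>
    terminators, empty trailing splits discarded.
    """
    records = [r.strip() for r in re.split(r'(?i)<eor>', text) if r.strip()]
    # Never use more processes than there are records
    actual_processes = max(1, min(num_processes, len(records)))
    # chunk_size is at least 1, so range(0, n, chunk_size) never sees a zero step
    chunk_size = max(1, len(records) // actual_processes)
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

# One record, four requested processes: the old line-based code hit
# chunk_size == 0 in this situation
chunks = split_into_record_chunks('<CALL:6>TEST01 <EOR>', num_processes=4)
```

With more records than processes, the floor division leaves a remainder that the list slicing naturally absorbs into a final shorter chunk.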

_process_chunk(): Updated field extraction and record handling

  • Regex updated from r'<(.*?):(\d+)>([^<]*)' to r'<([^:>]+):([^:>]+)(?::[^>]*)?>([^<]*)' to capture optional type indicators
  • Process complete records instead of lines
  • Filter empty records before parsing
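The regex change can be verified on a field carrying a type indicator (a small standalone check, not code from the repository):

```python
import re

# Old and new field-extraction patterns from the change description above
OLD = re.compile(r'<(.*?):(\d+)>([^<]*)')
NEW = re.compile(r'<([^:>]+):([^:>]+)(?::[^>]*)?>([^<]*)')

record = '<FREQ:9:N>14.074000 <CALL:6>TEST01'

# The old pattern cannot stop at the ':N' type indicator, so the lazy
# group spills past the '>' and swallows part of the next tag.
old_fields = OLD.findall(record)

# The new pattern forbids ':' and '>' inside the name and length groups
# and skips an optional ':TYPE' suffix, so both fields parse cleanly.
new_fields = NEW.findall(record)
```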

module_check.py

Wrapped print statement in if __name__ == '__main__': guard to eliminate import-time side effect.

Example

Before, this would fail with range() arg 3 must not be zero:

parser = ADIFParser()
df = parser.read_adi_parallel('single_record.adi', num_processes=4)  # Error

And multi-line records were silently dropped:

# File content:
# <CALL:6>TEST01 <MODE:3>FT8
# <QSO_DATE:8>20210101 <EOR>
# <CALL:6>TEST02 <MODE:3>CW <QSO_DATE:8>20210102 <EOR>

df = parser.read_adi_parallel('multiline.adi')
len(df)  # Returns 1 instead of 2 - TEST01 lost

Now both cases work correctly.

Original prompt

Summary of changes to implement in repository JS2IIU-MH/adiftools-dev:

Objective:

  • Fix bugs and harden adiftools/adiftools.py read_adi_parallel and _process_chunk to avoid chunk_size==0, handle large files and multi-line records, and make field extraction regex more robust.
  • Remove the top-level side-effect print in module_check.py by moving it behind an if __name__ == '__main__': guard.
  • Add unit tests covering the fixed behaviors.

Details and implementation requirements:

Files to change:

  • adiftools/adiftools.py
  • module_check.py
  • tests/ (add new tests file(s))

adiftools/adiftools.py changes:

  1. read_adi_parallel(file_path, enable_timestamp=False, num_processes=None):
  • Do not rely on raw file.readlines() chunking by line count. Instead parse the file into ADIF records by splitting on the <EOR> record terminator (case-insensitive). Example approach: read the file in text mode, call text = file.read(), then records = re.split(r'(?i)<eor>', text) and strip/ignore empty records.
  • Ensure num_processes is adjusted: if num_processes is None set to mp.cpu_count(), then set num_processes = min(num_processes, max(1, len(records))). This prevents num_processes > number of records and avoids zero chunk sizes.
  • Compute chunk_size = max(1, len(records) // num_processes) so chunk_size is at least 1.
  • Build chunks as lists of record strings: for i in range(0, len(records), chunk_size): chunks.append(records[i:i+chunk_size]).
  • Use multiprocessing.Pool with pool.map(self._process_chunk, chunks) where each chunk is a list of record strings.
  • After results are returned, combine records and build DataFrame as before. Keep existing behaviors: self.df_adif, self._fields, self._number_of_records, raise AdifParserError if df empty, add timestamp if enable_timestamp.
  2. _process_chunk(self, chunk_records):
  • Update the signature and implementation to accept a list of record strings (each record may be multi-line).
  • For each record string, use a safe regex to extract fields: pattern = re.compile(r'<([^:>]+):(\d+)>([^<]*)'), where the first capture group takes field names but forbids ':' and '>' to avoid greedy capturing.
  • For each field tuple, construct a dict with keys field[0].upper().strip() and values field[2].strip() (optionally .upper() if upstream code expects uppercase). Preserve the existing behavior (current code uses .upper().strip() on both name and value) and add a comment documenting it.
  • Append each parsed dict to the records list and return it.
  3. Robustness and performance notes:
  • Avoid reading extremely large files entirely into memory if possible; however, the requested approach reads the file as text and splits on <EOR>. If memory is critical, implement a streaming parser that accumulates text until an <EOR> is seen. For this PR, implement the record-splitting approach to fix the chunk_size bug and the multi-line record issue while keeping changes minimal.
  • Handle possible empty trailing splits gracefully.
  • Add or adjust docstrings for read_adi_parallel and _process_chunk describing the new behavior.
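A hedged sketch of what that record-level _process_chunk could look like (written as a free function here for illustration; in the repository it is a method on the parser class):

```python
import re

# Field pattern from the spec above: name group forbids ':' and '>'
FIELD_RE = re.compile(r'<([^:>]+):(\d+)>([^<]*)')

def process_chunk(chunk_records):
    """Parse a list of ADIF record strings into dicts.

    Sketch of the behavior described above: each record may span
    multiple lines; field names and values are upper-cased and
    stripped, matching the existing parser's convention.
    """
    parsed = []
    for record in chunk_records:
        fields = FIELD_RE.findall(record)
        if not fields:
            continue  # skip empty or unparsable records
        parsed.append({name.upper().strip(): value.upper().strip()
                       for name, _length, value in fields})
    return parsed

# A multi-line record and an empty record (the empty one is filtered out)
rows = process_chunk(['<CALL:6>test01\n<MODE:3>ft8', ''])
```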

module_check.py changes:

  • Remove top-level print(find_packages(...)) side-effect.
  • Add a __main__ guard; e.g.:
    if __name__ == '__main__':
        from setuptools import find_packages
        print(find_packages(include=['adiftools', 'adiftools.*'], exclude=['test/']))
  • If the file already imports find_packages at the top level, either move the import into the __main__ guard or keep the import and remove only the top-level print. Ensure that importing the module produces no stdout.

Tests to add (in tests/test_parallel_and_module_check.py or similar):

  1. test_read_adi_parallel_handles_num_processes_larger_than_records
  • Create small ADIF content with two or three records (use the tmp_path fixture to write a temporary .adi file). The content should include multi-line records and <EOR> separators.
  • Call ADIFParser().read_adi_parallel(temp_file_path, num_processes=10) (a number greater than the record count).
  • Assert that no exception is raised and that the returned DataFrame has the expected number of records and expected columns (e.g., contains a CALL field).
  2. test_read_adi_parallel_parses_multiline_records
  • Use a similar temporary file and ensure that records spanning multiple lines are parsed as single records (e.g., fields distributed across lines). Assert that the parsed DataFrame contains the expected fields and values.
  3. test_module_check_import_no_output
  • Ensure that importing module_check does not print to stdout. One option is subprocess.run(['python', '-c', 'import module_check'], capture_output=True) with PYTHONPATH including the repository root, then asserting empty stdout; note that the capsys fixture may not capture prints that happen during import.
  • For simplicity, write a test that imports module_check via importlib and uses contextlib.redirect_stdout to capture output, then asserts the captured output is empty. (Note: redi...
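The redirect_stdout approach can be sketched standalone (the quiet_module file created here is a hypothetical stand-in for the fixed module_check.py, so the snippet is self-contained):

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Write a throwaway module whose print is behind a __main__ guard,
# mimicking the fixed module_check.py (stand-in, not the real file).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'quiet_module.py'), 'w') as f:
    f.write("if __name__ == '__main__':\n"
            "    print('only when run directly')\n")

sys.path.insert(0, tmpdir)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    importlib.import_module('quiet_module')

captured = buf.getvalue()  # importing should produce no stdout
```

The subprocess variant mentioned above is the more robust choice in a real test suite, since it exercises a fresh interpreter rather than relying on the module not yet being in sys.modules.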

This pull request was created from Copilot chat.



Copilot AI and others added 3 commits February 14, 2026 07:49
…tests

Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix bugs and improve file handling in adiftools.py Fix read_adi_parallel record parsing and chunk_size calculation Feb 14, 2026
Copilot AI requested a review from JS2IIU-MH February 14, 2026 07:53