
Fix read_adi_parallel record parsing and chunk_size calculation #4

Draft
Copilot wants to merge 4 commits into main from copilot/fix-adiftools-read-adi-bugs

Conversation

Contributor

Copilot AI commented Feb 14, 2026

read_adi_parallel() and _process_chunk() had three critical bugs: (1) chunk_size could be zero when num_processes exceeded record count, (2) multi-line ADIF records were silently dropped, and (3) field extraction regex failed on type indicators like <FREQ:9:N>.

Changes

adiftools.py

read_adi_parallel(): Changed from line-based to record-based parsing

  • Split file content on <EOR> delimiter instead of chunking by line count
  • Calculate chunk_size as max(1, num_records // actual_processes) to prevent zero division
  • Limit processes to min(num_processes, num_records) when records < processes
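The chunking change can be sketched standalone (function and variable names here are illustrative, not the repository's actual code):

```python
import re

def split_into_record_chunks(text, num_processes):
    """Split ADIF text into <EOR>-terminated records, then into chunks.

    Sketch of the approach described above: case-insensitive <EOR>
    terminators, empty trailing splits discarded.
    """
    records = [r.strip() for r in re.split(r'(?i)<eor>', text) if r.strip()]
    # Never use more processes than there are records
    actual_processes = max(1, min(num_processes, len(records)))
    # chunk_size is at least 1, so range(0, n, chunk_size) never sees a zero step
    chunk_size = max(1, len(records) // actual_processes)
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

# One record, four requested processes: the old line-based code hit
# chunk_size == 0 in this situation
chunks = split_into_record_chunks('<CALL:6>TEST01 <EOR>', num_processes=4)
```

With more records than processes, the floor division leaves a remainder that the list slicing naturally absorbs into a final shorter chunk.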

_process_chunk(): Updated field extraction and record handling

  • Regex updated from r'<(.*?):(\d+)>([^<]*)' to r'<([^:>]+):([^:>]+)(?::[^>]*)?>([^<]*)' to capture optional type indicators
  • Process complete records instead of lines
  • Filter empty records before parsing
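The regex change can be verified on a field carrying a type indicator (a small standalone check, not code from the repository):

```python
import re

# Old and new field-extraction patterns from the change description above
OLD = re.compile(r'<(.*?):(\d+)>([^<]*)')
NEW = re.compile(r'<([^:>]+):([^:>]+)(?::[^>]*)?>([^<]*)')

record = '<FREQ:9:N>14.074000 <CALL:6>TEST01'

# The old pattern cannot stop at the ':N' type indicator, so the lazy
# group spills past the '>' and swallows part of the next tag.
old_fields = OLD.findall(record)

# The new pattern forbids ':' and '>' inside the name and length groups
# and skips an optional ':TYPE' suffix, so both fields parse cleanly.
new_fields = NEW.findall(record)
```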

module_check.py

Wrapped print statement in if __name__ == '__main__': guard to eliminate import-time side effect.

Example

Before, this would fail with range() arg 3 must not be zero:

parser = ADIFParser()
df = parser.read_adi_parallel('single_record.adi', num_processes=4)  # Error

And multi-line records were silently dropped:

# File content:
# <CALL:6>TEST01 <MODE:3>FT8
# <QSO_DATE:8>20210101 <EOR>
# <CALL:6>TEST02 <MODE:3>CW <QSO_DATE:8>20210102 <EOR>

df = parser.read_adi_parallel('multiline.adi')
len(df)  # Returns 1 instead of 2 - TEST01 lost

Now both cases work correctly.

Original prompt

Summary of changes to implement in repository JS2IIU-MH/adiftools-dev:

Objective:

  • Fix bugs and harden adiftools/adiftools.py read_adi_parallel and _process_chunk to avoid chunk_size==0, handle large files and multi-line records, and make field extraction regex more robust.
  • Remove the top-level side-effect print in module_check.py by moving it behind an if __name__ == '__main__': guard.
  • Add unit tests covering the fixed behaviors.

Details and implementation requirements:

Files to change:

  • adiftools/adiftools.py
  • module_check.py
  • tests/ (add new tests file(s))

adiftools/adiftools.py changes:

  1. read_adi_parallel(file_path, enable_timestamp=False, num_processes=None):
  • Do not rely on raw file.readlines() chunking by line count. Instead parse the file into ADIF records by splitting on the <EOR> record terminator (case-insensitive). Example approach: read the file in text mode, call text = file.read(), then records = re.split(r'(?i)<eor>', text) and strip/ignore empty records.
  • Ensure num_processes is adjusted: if num_processes is None set to mp.cpu_count(), then set num_processes = min(num_processes, max(1, len(records))). This prevents num_processes > number of records and avoids zero chunk sizes.
  • Compute chunk_size = max(1, len(records) // num_processes) so chunk_size is at least 1.
  • Build chunks as lists of record strings: for i in range(0, len(records), chunk_size): chunks.append(records[i:i+chunk_size]).
  • Use multiprocessing.Pool with pool.map(self._process_chunk, chunks) where each chunk is a list of record strings.
  • After results are returned, combine records and build DataFrame as before. Keep existing behaviors: self.df_adif, self._fields, self._number_of_records, raise AdifParserError if df empty, add timestamp if enable_timestamp.
  2. _process_chunk(self, chunk_records):
  • Update the signature and implementation to accept a list of record strings (each record may be multi-line).
  • For each record string, use a safe regex to extract fields: pattern = re.compile(r'<([^:>]+):(\d+)>([^<]*)'), where the first capture group takes field names but forbids ':' and '>' to avoid greedy capturing.
  • For each field tuple, construct a dict with keys field[0].upper().strip() and values field[2].strip() (optionally .upper() if upstream code expects uppercase). Preserve the existing behavior (current code uses .upper().strip() on both name and value) and add a comment documenting it.
  • Append each parsed dict to the records list and return it.
  3. Robustness and performance notes:
  • Avoid reading extremely large files entirely into memory if possible; however, the requested approach reads the file as text and splits on <EOR>. If memory is critical, implement a streaming parser that accumulates text until an <EOR> is seen. For this PR, implement the record-splitting approach to fix the chunk_size bug and the multi-line record issue while keeping changes minimal.
  • Handle possible empty trailing splits gracefully.
  • Add or adjust docstrings for read_adi_parallel and _process_chunk describing the new behavior.
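A hedged sketch of what that record-level _process_chunk could look like (written as a free function here for illustration; in the repository it is a method on the parser class):

```python
import re

# Field pattern from the spec above: name group forbids ':' and '>'
FIELD_RE = re.compile(r'<([^:>]+):(\d+)>([^<]*)')

def process_chunk(chunk_records):
    """Parse a list of ADIF record strings into dicts.

    Sketch of the behavior described above: each record may span
    multiple lines; field names and values are upper-cased and
    stripped, matching the existing parser's convention.
    """
    parsed = []
    for record in chunk_records:
        fields = FIELD_RE.findall(record)
        if not fields:
            continue  # skip empty or unparsable records
        parsed.append({name.upper().strip(): value.upper().strip()
                       for name, _length, value in fields})
    return parsed

# A multi-line record and an empty record (the empty one is filtered out)
rows = process_chunk(['<CALL:6>test01\n<MODE:3>ft8', ''])
```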

module_check.py changes:

  • Remove top-level print(find_packages(...)) side-effect.
  • Add a __main__ guard; e.g.:
    if __name__ == '__main__':
        from setuptools import find_packages
        print(find_packages(include=['adiftools', 'adiftools.*'], exclude=['test/']))
  • If the file already imports find_packages at the top level, either move the import into the __main__ guard or keep the import and remove only the top-level print. Ensure that importing the module produces no stdout.

Tests to add (in tests/test_parallel_and_module_check.py or similar):

  1. test_read_adi_parallel_handles_num_processes_larger_than_records
  • Create small ADIF content with two or three records (use the tmp_path fixture to write a temporary .adi file). The content should include multi-line records and <EOR> separators.
  • Call ADIFParser().read_adi_parallel(temp_file_path, num_processes=10) (a number greater than the record count).
  • Assert that no exception is raised and that the returned DataFrame has the expected number of records and expected columns (e.g., contains a CALL field).
  2. test_read_adi_parallel_parses_multiline_records
  • Use a similar temporary file and ensure that records spanning multiple lines are parsed as single records (e.g., fields distributed across lines). Assert that the parsed DataFrame contains the expected fields and values.
  3. test_module_check_import_no_output
  • Ensure that importing module_check does not print to stdout. One option is subprocess.run(['python', '-c', 'import module_check'], capture_output=True) with PYTHONPATH including the repository root, then asserting empty stdout; note that the capsys fixture may not capture prints that happen during import.
  • For simplicity, write a test that imports module_check via importlib and uses contextlib.redirect_stdout to capture output, then asserts the captured output is empty. (Note: redi...
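The redirect_stdout approach can be sketched standalone (the quiet_module file created here is a hypothetical stand-in for the fixed module_check.py, so the snippet is self-contained):

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Write a throwaway module whose print is behind a __main__ guard,
# mimicking the fixed module_check.py (stand-in, not the real file).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'quiet_module.py'), 'w') as f:
    f.write("if __name__ == '__main__':\n"
            "    print('only when run directly')\n")

sys.path.insert(0, tmpdir)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    importlib.import_module('quiet_module')

captured = buf.getvalue()  # importing should produce no stdout
```

The subprocess variant mentioned above is the more robust choice in a real test suite, since it exercises a fresh interpreter rather than relying on the module not yet being in sys.modules.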

This pull request was created from Copilot chat.



Copilot AI and others added 3 commits February 14, 2026 07:49
…tests

Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Co-authored-by: JS2IIU-MH <146515386+JS2IIU-MH@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix bugs and improve file handling in adiftools.py Fix read_adi_parallel record parsing and chunk_size calculation Feb 14, 2026
Copilot AI requested a review from JS2IIU-MH February 14, 2026 07:53