Add Python port of the MassData Parameter Support API #42

Merged
nevstop merged 3 commits into main from copilot/massdata-python-support on Apr 25, 2026

Copilot AI commented Apr 25, 2026

Following the C port in #40, add a Python implementation of the CSM MassData Parameter Support plugin so Python clients can interoperate with the LabVIEW VIs and C library through the same on-the-wire reference string format.

Layout

New python/ directory mirroring c/:

  • python/csm_massdata.py — implementation
  • python/_test/test_csm_massdata.py — unittest suite
  • python/README.md — Chinese docs paralleling c/README.md
  • .gitignore — exclude __pycache__ / *.pyc under python/

Implementation notes

  • Public function names, parameter order, and CsmMassDataStatus integer values match the LabVIEW VIs and C csm_massdata_status_t one-for-one. Reference strings are byte-identical: <MassData>Start:<N>;Size:<N>[;DataType:<T>].
  • Ring buffer is a bytearray initialized at module import (no lazy-init race), with all public APIs serialized through a single threading.Lock. _State.__init__ catches MemoryError so the module can still be imported on memory-constrained hosts; APIs then return ERR_NO_MEMORY consistently when capacity == 0.
  • write_total is masked to 64 bits to mirror the C uint64_t cursor; the decode-side residency check uses modular reverse distance (write_total - start) & (2**64 - 1), so wraparound past 2**64 is handled correctly on both encode and decode and stays compatible with the C/LabVIEW wire format.
  • Parser parses Start / Size incrementally with per-step overflow detection (no big-int allocation on long digit runs — DoS-resistant), enforces len(DataType) + 1 <= CSM_MASSDATA_MAX_DATATYPE_LEN and returns ERR_BUFFER_TOO_SMALL on overflow to match the C cross-language contract, and rejects ; / < / > in the optional DataType value.
  • Pythonic return shapes: tuples in place of out-parameters (e.g. (status, argument), (status, data)). Input data accepts bytes / bytearray / memoryview and is forwarded as a single-byte memoryview straight into the ring buffer, avoiding an intermediate bytes(...) copy on large payloads.
  • Standard library only; supports Python 3.8+.
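
As a rough illustration of the reference string format described in the notes above, the sketch below builds a reference string and parses it back. The helper names format_reference and parse_reference are hypothetical and do not come from csm_massdata.py, and the regular expression is only a compact stand-in for the PR's hand-written incremental parser, so edge cases (leading zeros, empty DataType) may be treated differently by the real module.

```python
import re
from typing import Optional, Tuple

PREFIX = "<MassData>"      # literal prefix from the format quoted in the PR text
UINT64_MAX = 2**64 - 1

# Capping each numeric field at 20 digits keeps the int() conversions small.
_REFERENCE_RE = re.compile(
    r"<MassData>Start:([0-9]{1,20});Size:([0-9]{1,20})(?:;DataType:([^;<>]*))?"
)


def format_reference(start: int, size: int, data_type: Optional[str] = None) -> str:
    """Build a reference string (hypothetical helper, for illustration only)."""
    text = f"{PREFIX}Start:{start};Size:{size}"
    if data_type is not None:
        text += f";DataType:{data_type}"
    return text


def parse_reference(text: str) -> Optional[Tuple[int, int, Optional[str]]]:
    """Return (start, size, data_type), or None if the text is malformed."""
    match = _REFERENCE_RE.fullmatch(text)
    if match is None:
        return None
    start, size = int(match.group(1)), int(match.group(2))
    if start > UINT64_MAX or size > UINT64_MAX:
        return None  # both fields must fit in an unsigned 64-bit integer
    return start, size, match.group(3)
```

The string produced for start 0, size 1024 and data type "1D U8" is the same one shown in the Example section of this description.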

Example

from csm_massdata import (
    CSM_ConfigMassDataParameterCacheSize,
    CSM_ConvertMassDataToArgumentWithDataType,
    CSM_ConvertArgumentToMassData,
    CsmMassDataStatus,
)

CSM_ConfigMassDataParameterCacheSize(64 * 1024 * 1024)

samples = b"\x01\x02\x03\x04" * 256
status, arg = CSM_ConvertMassDataToArgumentWithDataType(samples, "1D U8")
# arg == "<MassData>Start:0;Size:1024;DataType:1D U8"

status, restored = CSM_ConvertArgumentToMassData(arg)
assert restored == samples

Tests

unittest cases mirror c/_test/vs/test_main.c (config/status, plain & typed round-trips, parse errors, overwrite detection, cache-too-small, datatype bounds) plus a concurrent encode/decode test exercising the lock under 8 threads. Additional cases cover the parser's fast rejection of overlong digit runs, the on-parse DataType length limit, and bytearray / memoryview inputs.

Per the convention established in #40, the root README.md / README(zh-cn).md are intentionally left untouched.

Copilot AI changed the title from "[WIP] Add third-party language support for massdata in Python" to "Add Python port of the MassData Parameter Support API" on Apr 25, 2026
Copilot AI requested a review from nevstop April 25, 2026 12:22
nevstop requested a review from Copilot April 25, 2026 15:29

Copilot AI left a comment

Pull request overview

Adds a Python implementation of the CSM MassData Parameter Support API so Python clients can interoperate with the existing LabVIEW VIs and the C port using the same <MassData>Start:<N>;Size:<N>[;DataType:<T>] reference-string format.

Changes:

  • Introduces python/csm_massdata.py implementing the MassData ring-buffer encode/decode APIs with status codes aligned to the C/LabVIEW contract.
  • Adds a unittest suite under python/_test/ mirroring the C test coverage (plus a concurrency test).
  • Adds Python-focused documentation (python/README.md) and updates .gitignore to exclude Python bytecode/caches under python/.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| python/csm_massdata.py | Implements the Python MassData API (ring buffer, argument encode/decode, datatype parsing, status reporting). |
| python/_test/test_csm_massdata.py | Adds unittest coverage for API behavior, error cases, overwrite detection, and multi-threaded usage. |
| python/README.md | Documents usage, API mapping to LabVIEW/C, and how to run tests. |
| .gitignore | Ignores Python cache/bytecode artifacts under the new python/ directory. |

Comment thread python/csm_massdata.py
Comment on lines +167 to +173
while i < n and "0" <= text[i] <= "9":
    i += 1
if i == 0:
    return None, 0
value = int(text[:i])
if value > _UINT64_MAX:
    return None, i

Copilot AI Apr 25, 2026

_parse_uint64 converts the full digit substring with int(text[:i]) before checking against _UINT64_MAX. A crafted argument containing an extremely long digit run can force large big-int allocation/CPU (DoS risk) even though values outside uint64 are rejected. Consider parsing incrementally with an overflow check (or rejecting more than 20 digits) so you can fail fast without constructing an arbitrary-precision integer.

Suggested change
- while i < n and "0" <= text[i] <= "9":
-     i += 1
- if i == 0:
-     return None, 0
- value = int(text[:i])
- if value > _UINT64_MAX:
-     return None, i
+ value = 0
+ max_before_mul = _UINT64_MAX // 10
+ max_last_digit = _UINT64_MAX % 10
+ while i < n and "0" <= text[i] <= "9":
+     digit = ord(text[i]) - ord("0")
+     if value > max_before_mul or (
+         value == max_before_mul and digit > max_last_digit
+     ):
+         while i < n and "0" <= text[i] <= "9":
+             i += 1
+         return None, i
+     value = value * 10 + digit
+     i += 1
+ if i == 0:
+     return None, 0
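
For readers who want to try the incremental approach in isolation, here is a minimal self-contained sketch of the same idea. The function name parse_uint64 and its (value, consumed) return shape are assumptions modelled on the snippet under review; the merged _parse_uint64 may differ in detail.

```python
from typing import Optional, Tuple

UINT64_MAX = 2**64 - 1


def parse_uint64(text: str) -> Tuple[Optional[int], int]:
    """Parse the leading ASCII digits of text as an unsigned 64-bit integer.

    Returns (value, consumed). value is None when there are no leading digits
    or when the number does not fit in 64 bits; consumed is the number of
    digit characters that were scanned.
    """
    max_before_mul = UINT64_MAX // 10   # largest value that can still take a digit
    max_last_digit = UINT64_MAX % 10    # largest final digit allowed at that value
    value = 0
    i = 0
    n = len(text)
    while i < n and "0" <= text[i] <= "9":
        digit = ord(text[i]) - ord("0")
        if value > max_before_mul or (
            value == max_before_mul and digit > max_last_digit
        ):
            # Overflow: skip the remaining digits without accumulating them,
            # so no arbitrary-precision integer is ever built.
            while i < n and "0" <= text[i] <= "9":
                i += 1
            return None, i
        value = value * 10 + digit
        i += 1
    if i == 0:
        return None, 0
    return value, i
```

Because value never exceeds 2**64 - 1 inside the loop, the cost of each step stays constant no matter how long the digit run is.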

Comment thread python/csm_massdata.py
# Verify that the DataType value contains only legal characters.
for ch in data_type:
    if ch in ";<>":
        return CsmMassDataStatus.ERR_PARSE, 0, 0, ""

Copilot AI Apr 25, 2026

When parsing DataType:, the C implementation returns ERR_BUFFER_TOO_SMALL if the datatype string won’t fit in CSM_MASSDATA_MAX_DATATYPE_LEN (including NUL). The Python parser currently accepts any length, so an overlong DataType from another language would be treated as OK, diverging from the cross-language contract. Consider enforcing len(data_type)+1 <= CSM_MASSDATA_MAX_DATATYPE_LEN during parsing and mapping overflow to ERR_BUFFER_TOO_SMALL.

Suggested change
-         return CsmMassDataStatus.ERR_PARSE, 0, 0, ""
+         return CsmMassDataStatus.ERR_PARSE, 0, 0, ""
+ # Match the C side: DataType must fit in the fixed-length buffer, including the trailing NUL.
+ if len(data_type) + 1 > CSM_MASSDATA_MAX_DATATYPE_LEN:
+     return CsmMassDataStatus.ERR_BUFFER_TOO_SMALL, 0, 0, ""
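
The two DataType checks can be exercised on their own with a sketch like the one below. The limit of 64 and the string status names are placeholders chosen for illustration: the real CSM_MASSDATA_MAX_DATATYPE_LEN value and the CsmMassDataStatus integers are defined by the C header and the Python module, and are not reproduced here.

```python
MAX_DATATYPE_LEN = 64  # hypothetical limit; the real constant lives in the C header


def check_data_type(data_type: str) -> str:
    """Classify a DataType value as "OK", "ERR_PARSE" or "ERR_BUFFER_TOO_SMALL"."""
    # Characters that would break the field structure are a parse error.
    if any(ch in ";<>" for ch in data_type):
        return "ERR_PARSE"
    # The value plus a trailing NUL must fit in the fixed-length C buffer.
    if len(data_type) + 1 > MAX_DATATYPE_LEN:
        return "ERR_BUFFER_TOO_SMALL"
    return "OK"
```

Note that len() counts code points, while a C buffer is sized in bytes. If DataType values can contain non-ASCII text, comparing the encoded byte length may be the more faithful check; the thread does not say which encoding the wire format uses.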

Comment thread python/csm_massdata.py Outdated
Comment on lines +221 to +231
def _coerce_data(data: Optional[Union[bytes, bytearray, memoryview]]) -> Optional[bytes]:
    """Normalize an accepted data input to ``bytes``; return ``None`` for invalid input.

    Mirrors the C-side semantics where ``data == NULL`` is allowed when
    ``data_size == 0``: this function accepts ``data is None`` and treats it
    as zero-length data.
    """
    if data is None:
        return b""
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    return None

Copilot AI Apr 25, 2026

_coerce_data always converts bytearray/memoryview inputs to a new bytes object, and then the code copies again into the ring buffer. For large payloads this doubles peak memory and adds an extra full copy. Consider keeping the input as a memoryview/bytes-like object and teaching _ring_write to write from it directly to avoid the intermediate allocation.
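
One possible shape for the copy-avoiding path is sketched below. The names as_byte_view and ring_write, and the explicit buffer and position parameters, are hypothetical; the module's own _coerce_data and _ring_write operate on shared state and may be structured differently.

```python
from typing import Optional


def as_byte_view(data) -> Optional[memoryview]:
    """Return a flat unsigned-byte view over data, or None for unsupported input."""
    if data is None:
        return memoryview(b"")          # None is treated as zero-length data
    if not isinstance(data, (bytes, bytearray, memoryview)):
        return None
    view = memoryview(data)
    if view.nbytes == 0:
        return memoryview(b"")
    if not view.c_contiguous:
        # cast() only works on C-contiguous views, so fall back to one copy here.
        return memoryview(view.tobytes())
    return view.cast("B")               # no copy: reinterpret as single bytes


def ring_write(buffer: bytearray, write_pos: int, src: memoryview) -> int:
    """Copy src into buffer starting at write_pos, wrapping at the end.

    Assumes 0 < len(buffer) and len(src) <= len(buffer).
    Returns the new write position.
    """
    capacity = len(buffer)
    n = len(src)
    if n == 0:
        return write_pos
    first = min(n, capacity - write_pos)
    # Same-length slice assignment copies straight from the view into the ring.
    buffer[write_pos:write_pos + first] = src[:first]
    if first < n:
        buffer[0:n - first] = src[first:]
    return (write_pos + n) % capacity
```

Slicing a memoryview yields another view rather than a copy, so the only full copy of the payload is the one into the ring buffer itself.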

Comment thread python/csm_massdata.py Outdated
Comment on lines +112 to +113
self.buffer: bytearray = bytearray(CSM_MASSDATA_DEFAULT_CACHE_SIZE)
self.capacity: int = CSM_MASSDATA_DEFAULT_CACHE_SIZE

Copilot AI Apr 25, 2026

The module eagerly allocates a 50 MiB bytearray at import time without handling MemoryError. On constrained environments this will raise during import, making it impossible to receive the intended ERR_NO_MEMORY status from APIs. Consider catching MemoryError in _State.__init__ and initializing to an empty/disabled state (capacity 0) so public APIs can return ERR_NO_MEMORY consistently.

Suggested change
- self.buffer: bytearray = bytearray(CSM_MASSDATA_DEFAULT_CACHE_SIZE)
- self.capacity: int = CSM_MASSDATA_DEFAULT_CACHE_SIZE
+ try:
+     self.buffer: bytearray = bytearray(CSM_MASSDATA_DEFAULT_CACHE_SIZE)
+     self.capacity: int = CSM_MASSDATA_DEFAULT_CACHE_SIZE
+ except MemoryError:
+     # Let the module still import on constrained hosts; public APIs can then
+     # return ERR_NO_MEMORY uniformly based on capacity == 0 instead of failing at import time.
+     self.buffer = bytearray()
+     self.capacity = 0

Comment thread python/csm_massdata.py
Comment on lines +287 to +304
with _state.lock:
    if len(src) > _state.capacity:
        return CsmMassDataStatus.ERR_CACHE_TOO_SMALL, ""
    start_cursor = _state.write_total
    if src:
        _ring_write(src)
        _state.write_total += len(src)
    _state.last_write = CsmMassDataOperation(start_cursor, len(src))

if data_type is not None:
    argument = (f"{_PREFIX}Start:{start_cursor};Size:{len(src)};"
                f"DataType:{data_type}")
else:
    argument = f"{_PREFIX}Start:{start_cursor};Size:{len(src)}"
# Match the C side: including the trailing NUL, the argument must not exceed ``CSM_MASSDATA_MAX_ARGUMENT_LEN``.
if len(argument) + 1 > CSM_MASSDATA_MAX_ARGUMENT_LEN:
    return CsmMassDataStatus.ERR_BUFFER_TOO_SMALL, ""
return CsmMassDataStatus.OK, argument

Copilot AI Apr 25, 2026

_state.write_total is unbounded, but the on-the-wire Start:<N> field is specified/parsed as a 64-bit unsigned integer. After enough writes, start_cursor will exceed 2**64-1, producing reference strings that the parser (and C/LabVIEW) would reject. Consider emulating uint64_t wraparound by keeping write_total masked to 64 bits (and updating the decode-side residency checks accordingly to match the C implementation).
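
A minimal sketch of the masked cursor and a modular residency check is given below. The function names and the exact residency condition are assumptions made for illustration; the comment only states that the cursor should wrap like uint64_t and that the decode-side check should match the C implementation, whose details are not shown in this thread.

```python
UINT64_MAX = 2**64 - 1


def advance(write_total: int, n: int) -> int:
    """Advance the write cursor by n bytes with uint64_t-style wraparound."""
    return (write_total + n) & UINT64_MAX


def is_resident(write_total: int, capacity: int, start: int, size: int) -> bool:
    """Return True if bytes [start, start + size) are still in the ring.

    distance is how many bytes have been written since start, computed
    modulo 2**64 so that it stays correct after the cursor wraps.
    """
    distance = (write_total - start) & UINT64_MAX
    # size <= distance: the block has been fully written.
    # distance <= capacity: the block has not been overwritten yet.
    return size <= distance <= capacity
```

Python's & on a negative integer behaves like two's-complement arithmetic with unlimited width, so (write_total - start) & UINT64_MAX yields the same result as unsigned 64-bit subtraction in C. A start value that lies ahead of the cursor produces a very large distance and is rejected by the capacity bound.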

nevstop commented Apr 25, 2026

@copilot apply changes based on the comments in this thread

Copilot AI commented Apr 25, 2026

@copilot apply changes based on the comments in this thread

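The 5 issues raised in the review have been addressed in a73f5f6:

  1. _parse_uint64 DoS hardening: the function now parses character by character and checks for uint64 overflow at every step. Overlong digit runs go straight to the rejection path, and int(text[:i]) is no longer called to build an arbitrary-precision integer.
  2. Parse-side DataType length check: _parse_argument now verifies len(data_type)+1 <= CSM_MASSDATA_MAX_DATATYPE_LEN while parsing and returns ERR_BUFFER_TOO_SMALL when the value is too long, consistent with the C-side contract.
  3. No extra copy: _coerce_data returns a single-byte memoryview, and bytearray slice assignment writes directly from that view into the ring buffer, removing the intermediate bytes(data) copy for large payloads.
  4. Import-time MemoryError tolerance: _State.__init__ catches MemoryError and degrades the state to capacity == 0; public APIs uniformly return ERR_NO_MEMORY in that state.
  5. write_total 64-bit wraparound: the write accumulator wraps with & _UINT64_MAX, and the decode-side residency check now uses the reverse distance modulo 2**64, (write_total - start) & _UINT64_MAX. Wrapped and non-wrapped cases are handled identically, in line with the C-side uint64_t semantics.

Added 3 unit tests covering the scenarios above; all 15 test cases pass.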

nevstop marked this pull request as ready for review April 25, 2026 16:08
nevstop merged commit 2ccc435 into main Apr 25, 2026
nevstop deleted the copilot/massdata-python-support branch April 25, 2026 16:09
Development

Successfully merging this pull request may close these issues.

Third-party language support for massdata: Python

3 participants