feat(harvester): add Memovs audiovisual archives API by rerowep · Pull Request #4024 · rero/rero-ils

rerowep · 2026-02-03T11:34:31Z

Add new API harvester for Médiathèque Valais audiovisual archives (archives.memovs.ch). This harvester enables automatic ingestion of Film, Photo, and Audio metadata from the Memovs digital archives.

Implementation includes:

ApiMemovs class with pagination and date filtering support
JSON transformation layer mapping Memovs metadata to RERO-ILS
Document type detection (Film, Audio, Photo, Other)
Holdings creation with electronic locations
Support for creators, contributors, subjects, and descriptions
Integration with existing CLI and task infrastructure

Configuration added as VS-MEMO in apisources.yml with endpoint https://archives.memovs.ch/docs/api/ and code 'memovs'.

coveralls · 2026-02-22T10:34:27Z

coverage: 91.218% (-0.09%) from 91.312% — rerowep:wep-harvesting-audiovisual-archives into rero:staging

coderabbitai · 2026-05-13T06:52:32Z

Walkthrough

This pull request introduces a complete Memovs audiovisual archive API harvester for RERO-ILS. The implementation spans database schema extension, a sophisticated JSON-to-RERO-ILS data transformation pipeline, HTTP-based record harvesting with in-memory caching and item/local-field synchronization, comprehensive test coverage across transformation and harvesting logic, and operational deployment configuration. The feature extends existing API harvester infrastructure through a new settings column, a specialized ApiMemovs class with pagination and cache optimization, and integration into the CLI, scheduler, and data configuration.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding a new Memovs audiovisual archives API harvester, which matches the comprehensive implementation across all changed files.
Description check	✅ Passed	The description is directly related to the changeset, detailing the implementation of the Memovs API harvester including its key components (ApiMemovs class, JSON transformation, document type detection, holdings creation, and configuration).
Docstring Coverage	✅ Passed	Docstring coverage is 98.84% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

tests/ui/entities/remote_entities/test_remote_entities_api.py (1)
58-60: ⚡ Quick win

Weak logging assertion: consider removing or making it specific.

The assertion only verifies that any elasticsearch log record exists, but doesn't check:

Log level (is it actually a warning?)

Message content (is it about the 404?)

Relationship to the delete_from_index() call

Since this test is primarily about entity creation and deletion, consider either:
Remove the assertion entirely if logging behavior is not central to this test's purpose.
Make it specific if you do want to verify the expected 404 warning:
es_logs = [r for r in caplog.records if r.name == "elasticsearch"]
assert len(es_logs) == 1
assert es_logs[0].levelname == "WARNING"
assert "404" in es_logs[0].message
The current assertion could pass even if the expected warning doesn't occur, reducing test reliability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ui/entities/remote_entities/test_remote_entities_api.py` around lines
58 - 60, The current weak assertion "assert any(r.name == 'elasticsearch' for r
in caplog.records)" should be tightened or removed: either delete the assertion
if logging isn't central, or replace it with a specific check that filters
caplog.records for r.name == "elasticsearch" and asserts the expected count,
r.levelname == "WARNING", and that the log message contains "404" (to tie it to
the delete_from_index() call) so the test verifies the exact warning produced by
the deletion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rero_ils/modules/api_harvester/cli.py`:
- Around line 120-121: The info command currently prints the full api.settings
payload (click.echo(f"\tsettings  : {api.settings}")), which may leak sensitive
data; change this to print a redacted indicator or only safe keys: implement a
small helper (e.g., redact_settings or get_safe_settings_summary) that takes
api.settings and returns either a list of allowed keys/values or a redacted
placeholder like "<redacted>" or a filtered dict with only safe keys, and
replace the direct usage of api.settings in the click.echo call with the
helper's output so sensitive fields are never printed.

In `@rero_ils/modules/api_harvester/memovs/dojson/json/model.py`:
- Around line 557-559: The code handling bf:summary in the model currently
appends its raw value into summaries as label.value, which can be a dict or
list; update the logic in the block around the bf:summary handling (the if
description := self.data.get("bf:summary") / summaries.append call) to normalize
description to a string before storing: if it's a list, join elements (or map to
strings and join); if it's a dict, serialize to a stable string (e.g., extract a
textual field or use JSON serialization); otherwise coerce to str; then append
{"label": [{"value": normalized_description}]}. Ensure the normalization covers
None and empty cases.
- Around line 179-190: The current block that builds identified_by only handles
bf:identifier when it's a list; update the logic around identifiers (the
variable assigned from self.data.get("bf:identifier", [])) to also accept a
single dict by normalizing it into a list (e.g., if isinstance(identifiers,
dict): identifiers = [identifiers]) before the existing comprehension, then keep
the existing filters and fields ("bf:noteType", "rdf:value", note_type logic and
the exclusion of "COTE") so dict-form identifiers are processed the same as list
entries.

In `@rero_ils/modules/api_harvester/utils.py`:
- Around line 78-80: Avoid appending the raw settings value to the update
messages to prevent leaking secrets: keep assigning settings to source.settings
but replace msgs.append(f"settings:{settings}") with a generic message such as
msgs.append("settings:changed") (or similar non-sensitive text). Locate the code
that sets source.settings and updates msgs (the assignment to source.settings
and the msgs.append call) and change only the message text so no secret content
is echoed to logs/CLI.

---

Nitpick comments:
In `@tests/ui/entities/remote_entities/test_remote_entities_api.py`:
- Around line 58-60: The current weak assertion "assert any(r.name ==
'elasticsearch' for r in caplog.records)" should be tightened or removed: either
delete the assertion if logging isn't central, or replace it with a specific
check that filters caplog.records for r.name == "elasticsearch" and asserts the
expected count, r.levelname == "WARNING", and that the log message contains
"404" (to tie it to the delete_from_index() call) so the test verifies the exact
warning produced by the deletion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bcf40606-2037-45f1-894a-9f4a7b571ecd

📥 Commits

Reviewing files that changed from the base of the PR and between 994996e and 00968e3.

📒 Files selected for processing (20)

data/apisources.yml
data/organisations.json
rero_ils/alembic/4b2a1c3d8e9f_apiharvester_location_itemtype_pids.py
rero_ils/config.py
rero_ils/modules/api_harvester/cli.py
rero_ils/modules/api_harvester/memovs/__init__.py
rero_ils/modules/api_harvester/memovs/api.py
rero_ils/modules/api_harvester/memovs/dojson/__init__.py
rero_ils/modules/api_harvester/memovs/dojson/json/__init__.py
rero_ils/modules/api_harvester/memovs/dojson/json/model.py
rero_ils/modules/api_harvester/models.py
rero_ils/modules/api_harvester/utils.py
rero_ils/modules/ext.py
scripts/setup
tests/api_harvester/memovs/__init__.py
tests/api_harvester/memovs/test_memovs_api.py
tests/api_harvester/memovs/test_memovs_dojson.py
tests/api_harvester/test_cli_api_harvester.py
tests/data/apisources.yml
tests/ui/entities/remote_entities/test_remote_entities_api.py

Add new API harvester for Médiathèque Valais audiovisual archives (archives.memovs.ch) enabling automatic ingestion of Film, Photo, and Audio metadata from the Memovs digital archives into RERO-ILS. Harvester (api_harvester/memovs/): * ApiMemovs class with pagination, date filtering, and orphan deletion * Celery tasks for scheduled harvest and cleanup of removed records * Configuration as VS-MEMO in apisources.yml with location/item_type PIDs * Alembic migration for default location and item_type PID settings BIBFRAME-to-RERO-ILS transformation (dojson/json/model.py): * Document type detection from bf:content/bf:media/bf:carrier (Film, Audio, Photo, Other) * Provision activity from bf:provisionActivity with date normalisation * Electronic locators for landing page and thumbnail * MEF entity resolution for contributions, subjects, genre/form and places * Fallback to identifiedBy when MEF ref cannot be resolved * Physical description from bf:extent split into extent, otherPhysicalDetails and colorContent * Warning logged when bf:contribution agent carries a URL as rdfs:label Document and item creation (api.py): * Holdings created per document with electronic locations * Items created per bf:hasPart call number, stale items deleted on update * Local fields synced from bf:note annotations (vsavmat, vsavgeo, vsavfonds...) Other changes: * feat(documents): add rpt (Reporter) to contribution roles schema * refactor(documents): load _CONTRIBUTION_ROLE dynamically from JSON schema Co-authored-by: Peter Weber <peter.weber@rero.ch>

rerowep self-assigned this Feb 3, 2026

github-actions Bot added f: data labels Feb 3, 2026

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 2 times, most recently from a072a85 to 6377ef8 Compare February 4, 2026 13:09

github-actions Bot added the developers label Feb 4, 2026

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 5 times, most recently from e41e4c2 to 6170f6c Compare February 5, 2026 10:49

PascalRepond force-pushed the staging branch from e19d2a0 to ca206fa Compare February 5, 2026 13:12

rerowep force-pushed the wep-harvesting-audiovisual-archives branch from 6170f6c to f2ff57a Compare February 9, 2026 15:06

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 3 times, most recently from 02670f4 to 07a5ba6 Compare February 19, 2026 15:37

PascalRepond removed dev: fixtures labels Feb 20, 2026

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 2 times, most recently from 9de0c23 to dd9acfe Compare February 21, 2026 22:23

rerowep force-pushed the wep-harvesting-audiovisual-archives branch from dd9acfe to 0861a59 Compare March 18, 2026 13:47

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 6 times, most recently from da8348f to 817eb99 Compare May 13, 2026 06:49

rerowep marked this pull request as ready for review May 13, 2026 06:52

rerowep force-pushed the wep-harvesting-audiovisual-archives branch from 817eb99 to 00968e3 Compare May 13, 2026 11:40

coderabbitai Bot requested changes May 13, 2026

View reviewed changes

Comment thread rero_ils/modules/api_harvester/cli.py

Comment thread rero_ils/modules/api_harvester/memovs/dojson/json/model.py Outdated

Comment thread rero_ils/modules/api_harvester/memovs/dojson/json/model.py Outdated

Comment thread rero_ils/modules/api_harvester/utils.py

rerowep force-pushed the wep-harvesting-audiovisual-archives branch from 00968e3 to 51cc5c5 Compare May 13, 2026 12:51

coderabbitai Bot approved these changes May 13, 2026

View reviewed changes

rerowep force-pushed the wep-harvesting-audiovisual-archives branch 9 times, most recently from bf05461 to 0473432 Compare May 18, 2026 09:52

rerowep force-pushed the wep-harvesting-audiovisual-archives branch from 0473432 to 617a1bc Compare May 18, 2026 10:50

rerowep requested review from Garfield-fr, PascalRepond and jma May 18, 2026 16:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(harvester): add Memovs audiovisual archives API#4024

feat(harvester): add Memovs audiovisual archives API#4024
rerowep wants to merge 1 commit into
rero:stagingfrom
rerowep:wep-harvesting-audiovisual-archives

rerowep commented Feb 3, 2026

Uh oh!

coveralls commented Feb 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rerowep commented Feb 3, 2026

Uh oh!

coveralls commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coveralls commented Feb 22, 2026 •

edited

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading