Operationalize Rome Call for AI Ethics and its six principles in vetting criteria #16
Response to Discussion #16: Operationalizing the Rome Call for AI Ethics
1. Preliminary Notes on Sources
Two clarifications before the mapping.
Principle order. The Rome Call source document lists the six principles in this sequence: Transparency, Inclusion, Responsibility, Impartiality, Reliability, Security and Privacy. The CDCF manifesto reorders them: Transparency, Responsibility, Impartiality, Reliability, Security and Privacy, Inclusion. The manifesto's order is the canonical CDCF sequence and is used throughout this response.
Principle definitions. The manifesto supplies its own Magisterial grounding for each principle alongside the Rome Call definition. Those citations are included in the mapping table below because they are the authoritative CDCF framing, not the Rome Call's generic language. Where the manifesto's language differs from the Rome Call source, the manifesto governs.
2. Rome Call to Vetting Criteria: Full Traceability Table
Relationship types:
Traceability:
Summary:
Note on Fr. John's audit finding. The discussion post states that three principles are reflected in
3. Three Gaps Requiring New Criteria
Gap 1: Impartiality
Evaluation types for this criterion:
Gap 2: Reliability
Evaluation types for this criterion:
Gap 3: Security and Privacy
Evaluation types for this criterion:
4. Three Partial Coverages Worth Strengthening
Transparency
Inclusion
Responsibility
5. Recommended Document Changes
6. Sequencing Note
Discussion #17 (builder-commons reframing) should be resolved before this PR is drafted. The new Impartiality and Reliability criteria will read differently depending on whether they are framed as gatekeeping requirements or formation standards. The language choice belongs to that discussion.
Mark Julius Banasihan
This methodology note addresses the Impartiality gap identified in this discussion directly. It proposes a structured assessment process for C5, grounded in cross-provider empirical research and the if-then commitment framework. Dimension 3 (theological accuracy) is explicitly deferred to EAC review before incorporation into v0.3.
C5 Methodology Note: Vulnerable Population Impact Assessment
Document type: Proposed methodology supplement to CDCF Vetting Criteria v0.2, Criterion 5
The Problem This Document Solves
C5 as written in v0.2 asks submitters to produce a vulnerable population impact statement. It gives them no method for producing one. The result is a criterion that passes or fails on presence of a document rather than quality of the assessment. A submitter who writes two vague paragraphs and a submitter who runs structured behavioral testing against vulnerable-user inputs satisfy the same criterion under current language.
This note proposes a named methodology so that submitters know what they are being asked to produce, reviewers know what they are evaluating, and the Foundation applies C5 consistently as project complexity grows.
Research Foundations
The methodology below draws on three bodies of work, applied together rather than cited separately.
→ Body 1: AI Safety Research (Applied from AISST Fellowship Engagement)
The evaluation architecture underlying this methodology derives from active engagement with AI safety research through the Harvard AI Safety Student Team (AISST) Policy and Technical Fellowships: critically reading, discussing, and translating primary research into applied governance mechanisms rather than treating the literature as background reading. Two bodies of research are directly applicable here.
From the Technical Fellowship (Week 7: Red Teaming and Evaluations): Perez et al. (2022) established that language models can be reliably probed for harmful outputs through structured adversarial testing, that is, by feeding the model inputs designed to surface failure modes rather than waiting for failures to appear in deployment. Zou et al. (2023) demonstrated that aligned models have transferable adversarial vulnerabilities that do not appear under normal prompting. Constitutional AI (Bai et al., 2022) introduced the principle that AI behavior in sensitive domains should be evaluated against a named behavioral specification before deployment.
These three findings produce one governance principle: behavioral evaluation of AI in sensitive contexts must be structured and adversarial, not passive. Waiting for harmful outputs to appear in production is not evaluation. The five behavioral dimensions in Step 3 of this methodology are structured adversarial tests, not post-hoc observation requests.
From the Policy Fellowship (Week 6: Liability and Private Governance): Jones (2024) identifies that effective AI governance requires "if-then commitments" (documented conditions that trigger defined responses) rather than principles that exist independently of enforcement mechanisms. The Carnegie Endowment's if-then commitment framework establishes that governance criteria gain force precisely when they specify what happens when a condition is not met, not only when it is. The Regulatory Markets paper (Lev-Aretz and Mattioli, 2023) demonstrates that private governance bodies produce stronger compliance outcomes when they distinguish between aspirational standards and binding conditions.
These findings produce one governance principle: C5 must specify what gap acknowledgment is required when tests have not been run, not merely what tests are ideal. A criterion that accepts silence about gaps is not enforceable.
→ Body 2: Cross-Provider Empirical Research on AI Guidance Behavior
The empirical foundation for C5 rests on three studies from two providers.
Using multiple providers matters. A single company's research about its own model's failure rates is structurally weaker than the same finding corroborated across providers.
The cross-provider finding is this: spirituality guidance is the highest-risk AI guidance domain regardless of which frontier model is deployed. A Catholic app deploying any major AI provider faces the same documented failure rate. This is not a vendor selection problem. It is a domain risk that C5 must address directly.
→ Body 3: Catholic Social Teaching and CDCF Foundational Documents
The theological grounding does not require elaboration here; it is fully documented in the existing vetting criteria. Three specific groundings anchor this note:
The Translation
The table below makes the translation from research finding to CDCF governance mechanism explicit. This is the applied academic work: taking findings from safety research and policy frameworks and converting them into enforceable criteria for a Catholic governance body.
The Proposed C5 Methodology
→ Step 1: Determine Risk Tier
Assign before any assessment begins. The tier determines what is required.
→ Step 2: Identify the Affected Population
For Tier 3 projects, name the population explicitly. Descriptions like "general Catholic users" do not satisfy this step.
Applied to Interior Castle: The app's five interior states (restless, distracted, tempted, numb, peaceful) are the primary navigation axis. The intended user is someone in active spiritual difficulty. People experiencing scrupulosity (obsessive guilt and fear about sin) are a specific subgroup for whom generic AI spiritual guidance carries documented harm risk that differs qualitatively from other distress states.
→ Step 3: Assess AI Behavior Across Five Dimensions
This is the dimension C5 currently lacks a methodology for. Each dimension derives from a named research source.
On incomplete testing: A submitter who has not run all five tests does not fail C5 automatically. They satisfy C5 by naming which tests have not been run, acknowledging this as a documented gap, and committing to a concrete evaluation plan before Gate 2 graduation. Honesty about gaps satisfies the criterion. Silence does not. This distinction derives directly from the if-then commitment framework: the criterion produces enforcement value when it requires documented acknowledgment, not only documented compliance.
→ Step 4: Subgroup Performance Statement Format
For Tier 3 projects, one entry per identified vulnerable subgroup, using this structure:
One paragraph per subgroup satisfies incubation. Gate 2 requires more rigorous documentation, including results from structured adversarial testing.
→ Step 5: Human Accountability Layer
C5 and C2 overlap here. A vulnerable population impact statement is incomplete without naming who is accountable when the AI causes harm to a vulnerable user.
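As a rough illustration only, the checks described in Steps 2, 3, and 5 can be sketched as a small completeness check over a submission record. Everything named here (`C5Submission`, `DimensionEntry`, `c5_findings`, and the field names) is hypothetical and not part of the vetting criteria; the five dimensions are referred to by number only, and the sketch checks only that the required statements are present, not whether their content is adequate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five behavioral dimensions of Step 3, referenced by number only.
DIMENSIONS = (1, 2, 3, 4, 5)


@dataclass
class DimensionEntry:
    """Hypothetical record for one behavioral dimension."""
    tested: bool
    gap_acknowledged: bool = False          # submitter names the test as not run
    evaluation_plan: Optional[str] = None   # plan committed before Gate 2


@dataclass
class C5Submission:
    """Hypothetical shape of a C5 impact statement; not an official schema."""
    tier: int
    population: Optional[str] = None
    dimensions: Dict[int, DimensionEntry] = field(default_factory=dict)
    accountable_party: Optional[str] = None


def c5_findings(sub: C5Submission) -> List[str]:
    """Return the list of open findings; an empty list means nothing is missing."""
    findings: List[str] = []

    # Step 2: Tier 3 projects must name the affected population.
    if sub.tier == 3 and not (sub.population and sub.population.strip()):
        findings.append("Tier 3: affected population not named")

    # Step 3: an untested dimension is acceptable only with a documented gap
    # and an evaluation plan. Silence about a dimension is a finding.
    for d in DIMENSIONS:
        entry = sub.dimensions.get(d)
        if entry is None:
            findings.append(f"dimension {d}: silent (not tested, no gap documented)")
        elif not entry.tested and not (entry.gap_acknowledged and entry.evaluation_plan):
            findings.append(f"dimension {d}: untested without acknowledged gap and plan")

    # Step 5: someone must be named as accountable.
    if not sub.accountable_party:
        findings.append("no accountable party named")

    return findings
```

In this sketch, a submission that tests one dimension, documents a gap and plan for a second, and says nothing about the other three would produce three findings, one per silent dimension. That mirrors the rule that honesty about gaps satisfies the criterion while silence does not.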
Proposed Revised C5 Language for v0.3
Proposed Next Steps
Reference Sources
Invitation for Feedback
This methodology note is a working document. Responses, challenges, and proposed revisions are welcomed from all Board members, Ecclesial Advisory Council members, and Technical Advisory Council members before any language from this note is incorporated into v0.3 of the vetting criteria. Feedback may be submitted via the CatholicOS GitHub repository, the council communication channels, or directly to the TAC AI Governance Specialist.
This note was prepared by the CDCF Technical Advisory Council, AI Governance Specialist function, operating within TAC lane. Dimension 3 (theological accuracy) and the spiritual direction framing are explicitly deferred to EAC review before incorporation into any published version of the criteria.
The governance mechanisms proposed here (the five behavioral dimensions, the if-then enforcement structure, the adversarial evaluation framing) translate primary AI safety and policy research into an operational Catholic governance context. Each mechanism traces to a named primary source engaged directly through ongoing research at Harvard University, including active participation in the Harvard AI Safety Student Team (AISST) Policy and Technical Fellowships and the Berkman Klein Center Ethics and Governance of AI initiative. The contribution of this document is the translation itself: taking research produced at the frontier of AI safety and rendering it actionable within CDCF's institutional and theological framework.
Context
An audit of all repo documents against the CDCF bylaws and manifesto found that the Rome Call for AI Ethics — which the manifesto formally adopts — is either absent or only partially reflected in the vetting criteria.
Problem
The manifesto explicitly adopts the Rome Call for AI Ethics and its six principles:
Current state across documents:
project-governance/project-vetting-criteria.md (general criteria): only 3 of 6 principles reflected (Transparency, Responsibility, Inclusion); Impartiality, Reliability, and Security/Privacy are missing as explicit criteria
project-governance/project-vetting-criteria.md (AI domain extension subsections, formerly ai-governance/ai-vetting-criteria.md): does not reference the Rome Call at all, though some criteria implicitly map to its principles
research/governance-as-code-catholic-technology.md (formerly ai-governance/governance-as-code-catholic-ai.md): cites the Rome Call generally but never maps the six principles to the architecture
research/fragmented-catholic-digital-governance.md (formerly ai-governance/fragmented-catholic-ai-governance.md): does not reference the Rome Call
The Rome Call is arguably the most directly relevant Vatican AI governance instrument and the one the CDCF manifesto formally commits to. Its absence from the AI vetting criteria is a notable gap.
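A minimal sketch of the audit finding for the general criteria, assuming the six principles are listed in the manifesto order given in the first reply. The names `CDCF_PRINCIPLES`, `REFLECTED_IN_GENERAL_CRITERIA`, and `missing_principles` are hypothetical and exist only for illustration; no such check is present in the repository.

```python
# Six Rome Call principles in the CDCF manifesto's order (per the first reply).
CDCF_PRINCIPLES = [
    "Transparency",
    "Responsibility",
    "Impartiality",
    "Reliability",
    "Security and Privacy",
    "Inclusion",
]

# Principles the audit found reflected in the general vetting criteria.
REFLECTED_IN_GENERAL_CRITERIA = {"Transparency", "Responsibility", "Inclusion"}


def missing_principles(reflected):
    """Return the principles not yet reflected, preserving manifesto order."""
    return [p for p in CDCF_PRINCIPLES if p not in reflected]


print(missing_principles(REFLECTED_IN_GENERAL_CRITERIA))
# prints ['Impartiality', 'Reliability', 'Security and Privacy']
```

Keeping the principle list in one ordered structure means the same check could be run against each affected file once its coverage is recorded.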
Suggested approach
Affected files
project-governance/project-vetting-criteria.md (general criteria + AI domain extension subsections)
research/governance-as-code-catholic-technology.md
research/fragmented-catholic-digital-governance.md
Dependencies
Should be implemented after #8 / PR #9 is merged.
✅ PR #9 merged 2026-04-19.
Identified during bylaws/manifesto alignment audit.