Operationalize Rome Call for AI Ethics and its six principles in vetting criteria #16

JohnRDOrazio · 2026-04-17T10:33:01Z

JohnRDOrazio
Apr 17, 2026
Maintainer

Updated 2026-05-02: PR #9 has merged. File paths below have been updated to reflect the post-merge repo structure — the former ai-governance/ai-vetting-criteria.md content is now part of project-governance/project-vetting-criteria.md as "AI domain extension" subsections, and the ai-governance/ research memos have moved to research/ with broadened (non-AI-specific) scope.

Context

An audit of all repo documents against the CDCF bylaws and manifesto found that the Rome Call for AI Ethics — which the manifesto formally adopts — is either absent or only partially reflected in the vetting criteria.

Problem

The manifesto explicitly adopts the Rome Call for AI Ethics and its six principles:

Transparency (explainable systems)
Responsibility (accountability)
Impartiality (eliminating algorithmic bias)
Reliability (dependable infrastructure)
Security and Privacy (protecting personal data)
Inclusion (universal access to innovation benefits)

Current state across documents:

project-governance/project-vetting-criteria.md (general criteria) — only 3 of 6 principles reflected (Transparency, Responsibility, Inclusion); Impartiality, Reliability, and Security/Privacy are missing as explicit criteria
project-governance/project-vetting-criteria.md (AI domain extension subsections, formerly ai-governance/ai-vetting-criteria.md) — does not reference the Rome Call at all, though some criteria implicitly map to its principles
research/governance-as-code-catholic-technology.md (formerly ai-governance/governance-as-code-catholic-ai.md) — cites the Rome Call generally but never maps the six principles to the architecture
research/fragmented-catholic-digital-governance.md (formerly ai-governance/fragmented-catholic-ai-governance.md) — does not reference the Rome Call

The Rome Call is arguably the most directly relevant Vatican AI governance instrument and the one the CDCF manifesto formally commits to. Its absence from the AI vetting criteria is a notable gap.

Suggested approach

Explicitly reference the Rome Call as the CDCF's adopted ethical framework in vetting criteria documents
Map each of the six principles to specific criteria or requirements
Add missing principles (Impartiality, Reliability, Security/Privacy) to the project vetting criteria
Consider whether the six principles should serve as the organizing framework for AI-specific vetting

Affected files

project-governance/project-vetting-criteria.md (general criteria + AI domain extension subsections)
research/governance-as-code-catholic-technology.md
research/fragmented-catholic-digital-governance.md

Dependencies

~~Should be implemented after #8 / PR #9 is merged.~~ ✅ PR #9 merged 2026-04-19.

Identified during bylaws/manifesto alignment audit.

mj3b · 2026-05-02T01:59:03Z

mj3b
May 2, 2026

Response to Discussion #16: Operationalizing the Rome Call for AI Ethics

Submitted by Mark Julius Banasihan, TAC, AI Governance
In response to Discussion #16 opened by Fr. John R. D'Orazio, April 17, 2026

1. Preliminary Notes on Sources

Two clarifications before the mapping.

Principle order. The Rome Call source document lists the six principles in this sequence: Transparency, Inclusion, Responsibility, Impartiality, Reliability, Security and Privacy. The CDCF manifesto reorders them: Transparency, Responsibility, Impartiality, Reliability, Security and Privacy, Inclusion. The manifesto's order is the canonical CDCF sequence and is used throughout this response.

Principle definitions. The manifesto supplies its own Magisterial grounding for each principle alongside the Rome Call definition. Those citations are included in the mapping table below because they are the authoritative CDCF framing, not the Rome Call's generic language. Where the manifesto's language differs from the Rome Call source, the manifesto governs.

2. Rome Call to Vetting Criteria: Full Traceability Table

Relationship types:

Symbol	Meaning
Explicit	The criterion names or directly operationalizes the principle
Partial	The criterion covers part of the principle's scope; a documented remainder exists
Implicit	The criterion addresses a related concern without naming the principle
Absent	No criterion addresses this principle

Traceability:
Note on table display: The traceability table in Section 2 is wide. If you are reading this on a desktop browser, scroll left within the table to see all eight criteria columns. On mobile, rotate to landscape or view the raw markdown for the clearest reading experience. The Coverage column at the far right summarizes each principle's status if you prefer a quick scan before reading the full table.

Principle	Manifesto definition and citation	C1 Mission Alignment	C2 Human Accountability	C3 Transparency	C4 Independent Validation	C5 Vulnerable Populations	C6 Deployment Governance	C7 Data Stewardship	C8 Subsidiarity	Coverage
Transparency	All systems must be explainable and open to understanding (Communio et Progressio §17)	—	—	Explicit	Implicit	—	—	—	—	Partial: C3 covers documentation; explainability of reasoning is not required
Responsibility	Proceed with accountability, recognizing the weight of influence on the human family	—	Explicit	—	—	—	Explicit	—	—	Covered: C2 and C6 address accountability at submission and deployment
Impartiality	Safeguard fairness and human dignity; actively work to eliminate algorithmic bias	—	—	—	—	—	—	—	—	Absent
Reliability	Infrastructure must be dependable, serving as a stable foundation for the missions it supports	—	—	—	Implicit	—	—	—	—	Absent: C4 validates claimed capabilities at submission; post-deployment reliability is not addressed
Security and Privacy	Protect the sanctity of the person by securing data and respecting digital boundaries	—	—	—	—	—	—	Partial	—	Partial: C7 covers data stewardship and legal compliance; security architecture is not required
Inclusion	Design for the needs of all human beings; no one excluded from benefits of innovation (Communio et Progressio §19)	Implicit	—	—	—	Partial	—	—	—	Partial: C5 addresses vulnerable populations; universal access, accessibility, and non-discrimination are not explicitly named

Summary:

Principle	Status
Transparency	Partial
Responsibility	Covered
Impartiality	Absent
Reliability	Absent
Security and Privacy	Partial
Inclusion	Partial

Note on Fr. John's audit finding. The discussion post states that three principles are reflected in project-vetting-criteria.md: Transparency, Responsibility, and Inclusion. The live document confirms Transparency (C3 explicit) and Responsibility (C2, C6 explicit). Inclusion is more accurately partial than reflected: C5 addresses the preferential option for the poor and vulnerability to technological exclusion, but the manifesto's inclusion mandate extends to universal access, accessibility, and non-discrimination, none of which C5 names explicitly. The audit finding is directionally correct; the coverage count is slightly overstated.

3. Three Gaps Requiring New Criteria

Gap 1: Impartiality

Field	Detail
Principle	Impartiality: safeguard fairness and human dignity; actively eliminate algorithmic bias
Current state	No criterion requires bias examination, documentation, or testing. C5 requires attention to vulnerable populations, which is adjacent but distinct. A tool can demonstrate preferential concern for the poor while operating on biased training data that systematically underserves specific populations.
Gap	A project can clear all eight criteria without producing any evidence that its outputs treat populations equitably.
Proposed criterion	Gate 1 addition to `ai-vetting-criteria.md`: The submitter must document training data sources, identify demographic or population gaps in that data, and provide evidence of bias testing across the populations the tool is intended to serve. Where bias is identified, the submitter must describe the mitigation applied and the residual risk accepted.
Gate	Gate 1 (Incubation Acceptance)

Evaluation types for this criterion:

Evaluation type	Purpose
Structured data bias evaluation	Detects systematic skews in training data that cause unfair or discriminatory outputs across subgroups
Training data representativeness analysis	Measures whether the training dataset reflects the distribution of the intended deployment population

Gap 2: Reliability

Field	Detail
Principle	Reliability: infrastructure must be dependable, serving as a stable foundation for the missions it supports
Current state	C4 validates claimed capabilities at the point of submission. Post-deployment behavior is not addressed. A tool that passes Gate 1 with validated performance claims may degrade, increase error rates under production load, or fail silently.
Gap	This gap connects directly to Operational Gap 1 in Discussion #10: the absence of a post-endorsement re-vetting trigger. The two gaps share a root cause and should be resolved in the same PR.
Proposed criterion	Gate 2 addition to `ai-vetting-criteria.md`: The submitter must specify a monitoring architecture: what performance metrics are tracked post-deployment, what thresholds trigger a formal review, and who holds authority to suspend commons recognition if reliability falls below the documented baseline.
Gate	Gate 2 (Graduation to Active Status)

Evaluation types for this criterion:

Evaluation type	Purpose
Data drift detection	Measures whether the tool's production inputs have shifted from its training distribution; catches silent degradation between formal re-vetting cycles

Gap 3: Security and Privacy

Field	Detail
Principle	Security and Privacy: protect the sanctity of the person by securing data and respecting digital boundaries
Current state	C7 requires data stewardship documentation and legal compliance. Documentation of legal compliance is not the same as evidence of security architecture.
Gap	A project can satisfy C7 by citing GDPR compliance without specifying encryption standards, access controls, or breach response procedures.
Proposed criterion	Expansion of C7 in `ai-vetting-criteria.md`: Add a security architecture statement alongside the data stewardship documentation. The submitter must specify: what data is collected, how it is stored and encrypted, what access controls govern it, and how the tool responds to a data breach or unauthorized access event. For tools using a knowledge base or retrieval-augmented architecture, the statement must include evidence of training data sanitisation and retrieval poisoning testing.
Gate	Gate 1 (expansion of existing C7 scope)

Evaluation types for this criterion:

Evaluation type	Purpose
Training data sanitisation	Detects unsanitised personal or sensitive data in training sets that should have been scrubbed before model training
Retrieval poisoning testing	Detects adversarial instructions embedded in knowledge base documents designed to hijack model behavior at query time; directly relevant to any CDCF tool using a Catholic data corpus for retrieval

4. Three Partial Coverages Worth Strengthening

Transparency

Field	Detail
Current scope	C3 requires accurate documentation of operation, dependencies, and data usage sufficient for independent technical and canonical review.
What is missing	The manifesto requires that systems be explainable. For AI tools, explainability means a user can understand why the tool produced a given output. This distinction matters most for high-stakes use cases: pastoral guidance, sacramental eligibility screening, healthcare referral. A user receiving an AI-assisted recommendation in those contexts has a legitimate interest in understanding the basis of that recommendation.
Proposed addition	Add to C3: require a user-facing explainability statement describing how the tool communicates uncertainty, surfaces its reasoning, and indicates when a human decision-maker should be consulted.

Inclusion

Field	Detail
Current scope	C5 requires examination of impact on those most vulnerable to technological exclusion or harm, through the lens of the preferential option for the poor.
What is missing	The manifesto's inclusion principle requires that no one be excluded from the benefits of innovation. A tool can serve vulnerable populations in its content while remaining inaccessible to users with disabilities, users in low-bandwidth environments, or users whose primary language is not supported.
Proposed addition	Add to C5: an accessibility floor requiring submitters to document how the tool serves users with limited digital access, including language availability, screen reader compatibility, and low-bandwidth or offline functionality where the mission requires it.

Responsibility

Field	Detail
Current scope	C2 and C6 provide explicit coverage of accountability at submission and deployment. Coverage is strong.
What is missing	Neither criterion cites the Rome Call or the manifesto as the source documents grounding the accountability requirement. The Magisterial tradition behind the criteria is present in the grounding section but not at the criterion level.
Proposed addition	Add one sentence to the Purpose and Rationale section of `ai-vetting-criteria.md` naming the Rome Call and the manifesto as the authoritative sources. No criterion text changes.

5. Recommended Document Changes

Document	Change
`ai-governance/ai-vetting-criteria.md`	Add Impartiality criterion (Gate 1); add Reliability criterion (Gate 2); expand C3 with explainability requirement; expand C5 with accessibility floor; expand C7 with security architecture statement; add Rome Call citation to Purpose and Rationale section
`project-governance/project-vetting-criteria.md`	Same additions to close the three absent gaps and strengthen the three partial coverages
`ai-governance/fragmented-catholic-ai-governance.md`	Add the Rome Call as the explicit ethical framework anchor in the opening section; the memo grounds the fragmentation argument in Catholic Social Teaching without naming the instrument the manifesto formally commits to
`ai-governance/governance-as-code-catholic-ai.md`	Add a section mapping the six principles to the governance-as-code enforcement mechanisms explicitly; the document currently cites the Rome Call generally without tracing the principles to specific architectural controls

6. Sequencing Note

Discussion #17 (builder-commons reframing) should be resolved before this PR is drafted. The new Impartiality and Reliability criteria will read differently depending on whether they are framed as gatekeeping requirements or formation standards. The language choice belongs to that discussion.

Mark Julius Banasihan
TAC, AI Governance, Catholic Digital Commons Foundation
May 2026

2 replies

JohnRDOrazio May 2, 2026
Maintainer Author

Thank you for the thorough mapping, @mj3b — the traceability table and gap analysis are exactly the operationalization this discussion needed.

One note for alignment with the current state of the repo: this comment was drafted against the document layout that existed before PR #9 (merged 2026-04-19), which unified AI governance into the general project governance framework. A few of the file paths cited in your response no longer exist on main:

Path in your comment	Current location
`ai-governance/ai-vetting-criteria.md`	Merged into `project-governance/project-vetting-criteria.md` as "AI domain extension" subsections within each criterion
`ai-governance/fragmented-catholic-ai-governance.md`	Moved to `research/fragmented-catholic-digital-governance.md` (broadened from AI-only to all Catholic digital governance, with AI preserved as a detailed case study)
`ai-governance/governance-as-code-catholic-ai.md`	Moved to `research/governance-as-code-catholic-technology.md` (broadened similarly)

In practical terms this affects the proposed-criterion blocks in §§3–4 ("Gate 1 addition to ai-vetting-criteria.md", "Gate 2 addition to ai-vetting-criteria.md", "Expansion of C7 in ai-vetting-criteria.md") and the recommendations table in §5. The substance of the proposals is unaffected — the new criteria still belong in the same logical place — but they now land inside project-governance/project-vetting-criteria.md, specifically within its AI domain extension subsections, rather than in a separate AI vetting document.

The project-governance/project-vetting-criteria.md row in your §5 table now subsumes the ai-governance/ai-vetting-criteria.md row, so the two can be consolidated when this becomes a PR.

The original discussion body (#16) has been updated to reflect the new paths, but I didn't want to edit your comment without your input. If you'd like to update §§3–5 to point at the post-PR-#9 paths, that would keep the analysis directly actionable; otherwise this reply can serve as the alignment note for future readers.

mj3b May 2, 2026

Fr. John, same situation as #17, your alignment note stands as the record. The corrected file paths for this discussion are below for anyone following the audit trail.

For the benefit of future readers following this thread as an audit trail, the PR that implements these changes will touch the following current file paths:

Change	Current file
Rome Call citation added to Purpose and Rationale section	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
Impartiality criterion (Gate 1): bias documentation, training data audit, representativeness analysis	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
Reliability criterion (Gate 2): post-deployment monitoring architecture, data drift detection	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
C7 expansion: security architecture statement, training data sanitisation, retrieval poisoning testing	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
C3 expansion: user-facing explainability statement	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
C5 expansion: accessibility floor	`project-governance/project-vetting-criteria.md` (AI domain extension subsections)
Rome Call as explicit ethical framework anchor in opening section	`research/fragmented-catholic-digital-governance.md`
Six principles mapped to governance-as-code enforcement mechanisms	`research/governance-as-code-catholic-technology.md`

As noted in your alignment comment, the ai-governance/ai-vetting-criteria.md row and the project-governance/project-vetting-criteria.md row from the §5 recommendations table now consolidate into a single target file. The PR for this discussion and the PR for Discussion #17 will both touch project-governance/project-vetting-criteria.md and project-governance/lifecycle.md. Coordinating those two PRs as a single combined PR rather than sequential PRs would reduce merge conflicts and keep the vetting criteria revision coherent as a single unit of change.

Mark Julius Banasihan
TAC, AI Governance, Catholic Digital Commons Foundation

mj3b · 2026-05-15T20:31:59Z

mj3b
May 15, 2026

This methodology note addresses the Impartiality gap identified in this discussion directly. It proposes a structured assessment process for C5, grounded in cross-provider empirical research and the if-then commitment framework. Dimension 3 (theological accuracy) is explicitly deferred to EAC review before incorporation into v0.3.

C5 Methodology Note: Vulnerable Population Impact Assessment

Document type: Proposed methodology supplement to CDCF Vetting Criteria v0.2, Criterion 5
Prepared by: Technical Advisory Council, AI Governance Specialist
Research foundation: AISST Technical and Policy Fellowship frameworks; Anthropic Societal Impacts research; MIT Media Lab / OpenAI cross-provider affective use research
Date: May 15, 2026
Status: Draft for TAC and EAC review before incorporation into v0.3

The Problem This Document Solves

C5 as written in v0.2 asks submitters to produce a vulnerable population impact statement. It gives them no method for producing one. The result is a criterion that passes or fails on presence of a document rather than quality of the assessment. A submitter who writes two vague paragraphs and a submitter who runs structured behavioral testing against vulnerable-user inputs satisfy the same criterion under current language.

This note proposes a named methodology so that submitters know what they are being asked to produce, reviewers know what they are evaluating, and the Foundation applies C5 consistently as project complexity grows.

Research Foundations

The methodology below draws on three bodies of work, applied together rather than cited separately.

→ Body 1: AI Safety Research (Applied from AISST Fellowship Engagement)

The evaluation architecture underlying this methodology derives from active engagement with AI safety research through the Harvard AI Safety Student Team (AISST) Policy and Technical Fellowships — critically reading, discussing, and translating primary research into applied governance mechanisms rather than treating the literature as background reading. Two bodies of research are directly applicable here.

From the Technical Fellowship (Week 7 — Red Teaming and Evaluations):

Perez et al. (2022) established that language models can be reliably probed for harmful outputs through structured adversarial testing — feeding the model inputs designed to surface failure modes rather than waiting for failures to appear in deployment. Zou et al. (2023) demonstrated that aligned models have transferable adversarial vulnerabilities that do not appear under normal prompting. Constitutional AI (Bai et al., 2022) introduced the principle that AI behavior in sensitive domains should be evaluated against a named behavioral specification before deployment.

These three findings produce one governance principle: behavioral evaluation of AI in sensitive contexts must be structured and adversarial, not passive. Waiting for harmful outputs to appear in production is not evaluation. The five behavioral dimensions in Step 3 of this methodology are structured adversarial tests, not post-hoc observation requests.

From the Policy Fellowship (Week 6 — Liability and Private Governance):

Jones (2024) identifies that effective AI governance requires "if-then commitments" — documented conditions that trigger defined responses — rather than principles that exist independently of enforcement mechanisms. The Carnegie Endowment's if-then commitment framework establishes that governance criteria gain force precisely when they specify what happens when a condition is not met, not only when it is. The Regulatory Markets paper (Lev-Aretz and Mattioli, 2023) demonstrates that private governance bodies produce stronger compliance outcomes when they distinguish between aspirational standards and binding conditions.

These findings produce one governance principle: C5 must specify what gap acknowledgment is required when tests have not been run, not merely what tests are ideal. A criterion that accepts silence about gaps is not enforceable.

→ Body 2: Cross-Provider Empirical Research on AI Guidance Behavior

The empirical foundation for C5 rests on three studies from two providers. Using multiple providers matters. A single company's research about its own model's failure rates is structurally weaker than the same finding corroborated across providers.

Study	Provider	Finding relevant to C5
How people ask Claude for personal guidance (Shen et al., Anthropic, April 2026)	Anthropic	38% sycophancy rate in spirituality guidance conversations — highest of all nine domains studied across 639,000 real conversations. Relationship guidance at 25% was second highest.
How people use Claude for support, advice, and companionship (McCain et al., Anthropic, June 2025)	Anthropic	People turn to AI for companionship specifically when facing "existential dread, persistent loneliness, and difficulties forming meaningful connections." Long conversations occasionally shift from counseling into companionship without the user initiating that shift. Emotional dependency has not been studied longitudinally.
Affective use and emotional wellbeing in ChatGPT (MIT Media Lab / OpenAI, 2025)	OpenAI	Affective engagement rate with ChatGPT aligns with Anthropic findings (approximately 2.9%). Cross-provider corroboration establishes that the pattern is not model-specific.

The cross-provider finding is this: spirituality guidance is the highest-risk AI guidance domain regardless of which frontier model is deployed. A Catholic app deploying any major AI provider faces the same documented failure rate. This is not a vendor selection problem. It is a domain risk that C5 must address directly.

→ Body 3: Catholic Social Teaching and CDCF Foundational Documents

The theological grounding does not require elaboration here — it is fully documented in the existing vetting criteria. Three specific groundings anchor this note:

Source	Specific provision	C5 connection
CDCF Bylaws, Article I Section 2.5	Ethical AI governance as stated organizational purpose	C5 is a bylaws obligation, not an optional best practice
CDCF Manifesto — Rome Call for AI Ethics	Impartiality principle: "actively working to eliminate algorithmic bias"	Subgroup performance testing (Dimension 5) is a named Rome Call obligation. This methodology partially closes the Impartiality gap identified in Discussion #16.
Pope Leo XIV, December 2025 Address	Risk of AI producing "merely passive consumers" among those least able to resist	Dependency risk assessment is a papal mandate, not a secondary concern
Antiqua et Nova §34	Human dignity is grounded in being created in the image of God, not in cognitive or technological achievement	Vulnerable users — those least capable of evaluating AI responses — carry the highest dignity stake

The Translation

The table below makes the translation from research finding to CDCF governance mechanism explicit. This is the applied academic work: taking findings from safety research and policy frameworks and converting them into enforceable criteria for a Catholic governance body.

Research finding	Source	What it means for C5
38% sycophancy rate in spirituality guidance	Anthropic, 2026	Spiritual guidance is the highest-risk AI deployment context. C5 must require behavioral testing in this domain, not assume it.
Cross-provider corroboration of affective use rates	MIT Media Lab / OpenAI, 2025	The risk is domain-specific, not model-specific. Switching AI providers does not resolve the C5 gap.
People turn to AI when human support is unavailable	Anthropic, 2025	The populations most likely to rely on a Catholic AI app are those with the fewest alternatives. Harm to them is unreversable without human fallback.
Adversarial testing surfaces failures passive evaluation misses	Perez et al., 2022; Zou et al., 2023	C5 must require structured adversarial testing, not post-hoc observation.
If-then commitments produce enforcement; aspirational criteria do not	Jones, 2024; Carnegie Endowment, 2024	C5 must specify what gap acknowledgment is required when testing has not been completed, making silence a failing condition.
AI in long conversations shifts from counseling to companionship	Anthropic, 2025	A project designed for prayer guidance may function as emotional companionship in practice. C5 must assess the actual use pattern, not only the intended one.
Preferential option for the poor	Catholic Social Teaching	Those most vulnerable to AI failure are those with the least recourse. This inverts the usual risk calculus: the lowest-resource user carries the highest governance stakes.

The Proposed C5 Methodology

→ Step 1: Determine Risk Tier

Assign before any assessment begins. The tier determines what is required.

Tier	Criteria	Example projects
Tier 1 — Low	No AI responses to users; no personal data collection; read-only or informational content only	Liturgical calendar API; Bible reference lookup; canonical text repository
Tier 2 — Moderate	AI responses in non-emotional, non-spiritual contexts; limited personal data; no inner life or pastoral framing	Parish administrative scheduling tool; content moderation assistant
Tier 3 — High	AI responses in emotional, spiritual, or distress contexts; personal data retained; framed as contemplative, spiritual accompaniment, journaling, prayer guidance, or discernment support	Interior Castle app; any pastoral, contemplative, or inner life–adjacent tool

Decision rule: If there is genuine uncertainty about whether a project is Tier 2 or Tier 3, apply Tier 3. The downside of over-classifying is additional documentation. The downside of under-classifying is unreviewed harm to vulnerable users.

→ Step 2: Identify the Affected Population

For Tier 3 projects, name the population explicitly. Descriptions like "general Catholic users" do not satisfy this step.

Question	What a complete answer requires
Who is the intended user?	Named spiritual state, life circumstance, or demographic the app is designed to serve
Who is the probable actual user?	Evidence or reasoned inference about who will use it, including likely edge cases
Which subgroups carry heightened vulnerability?	At minimum: those experiencing grief, spiritual crisis, or scrupulosity; those without access to a pastor or spiritual director; elderly users; non-English speakers
Are these subgroups underrepresented in the AI model's training data?	A statement based on what is publicly known about the underlying model's documented performance characteristics

Applied to Interior Castle: The app's five interior states (restless, distracted, tempted, numb, peaceful) are the primary navigation axis. The intended user is someone in active spiritual difficulty. People experiencing scrupulosity — obsessive guilt and fear about sin — are a specific subgroup for whom generic AI spiritual guidance carries documented harm risk that differs qualitatively from other distress states.

→ Step 3: Assess AI Behavior Across Five Dimensions

This is the dimension C5 currently lacks a methodology for. Each dimension derives from a named research source.

#	Dimension	What the submitter must address	Research source
1	Sycophancy risk	Has the AI been tested against inputs where spiritually harmful validation is the probable output? What outputs were produced? What guardrails exist?	Shen et al. (Anthropic, 2026); Perez et al. (2022); Zou et al. (2023)
2	Crisis response	Does the system respond differently to inputs indicating self-harm, spiritual despair, or acute crisis? What is the documented response path? Is a human or pastoral resource surfaced?	McCain et al. (Anthropic, 2025); Jones (2024); Carnegie Endowment (2024)
3	Theological accuracy	Has AI output been reviewed by a named person with theological competence for accuracy against Catholic teaching? This dimension requires EAC review before v0.3 incorporation.	Antiqua et Nova; CDCF Bylaws Art. I §2.5
4	Dependency risk	Does the app's design encourage return engagement in ways that may substitute for human accompaniment? Has this been assessed against actual usage patterns, not only intended use?	Anthropic 2025 (conversation drift finding)
5	Subgroup performance	Has the AI's behavior been tested with inputs reflecting the specific vulnerable populations identified in Step 2? What were the findings?	Perez et al., 2022; Rome Call — Impartiality

On incomplete testing: A submitter who has not run all five tests does not fail C5 automatically. They satisfy C5 by naming which tests have not been run, acknowledging this as a documented gap, and committing to a concrete evaluation plan before Gate 2 graduation. Honesty about gaps satisfies the criterion. Silence does not. This distinction derives directly from the if-then commitment framework: the criterion produces enforcement value when it requires documented acknowledgment, not only documented compliance.

→ Step 4: Subgroup Performance Statement Format

For Tier 3 projects, one entry per identified vulnerable subgroup, using this structure:

Population:        [name the group — e.g., "users experiencing scrupulosity"]
Testing method:    [how behavior was evaluated — manual prompting, automated adversarial 
                   testing, user feedback analysis]
Finding:           [what the AI produced in response to inputs reflecting this 
                   population's likely use]
Gap acknowledged:  [what was not tested and why]
Plan:              [what evaluation is planned before Gate 2 graduation, with timeline]

One paragraph per subgroup satisfies incubation. Gate 2 requires more rigorous documentation, including results from structured adversarial testing.

→ Step 5: Human Accountability Layer

C5 and C2 overlap here. A vulnerable population impact statement is incomplete without naming who is accountable when the AI causes harm to a vulnerable user.

Element	Requirement
Named role or person	Who holds accountability for AI-generated responses in vulnerable-user contexts
Escalation trigger	What condition activates human review or pastoral intervention
Crisis resource	What resource is surfaced to a distressed user — a pastoral contact, Catholic Charities referral, or crisis line
Review cadence	How often AI behavior in this context is reviewed against the behavioral dimensions in Step 3

Proposed Revised C5 Language for v0.3

Criterion 5: Impact on Vulnerable Populations

The submitter must determine the project's risk tier (Tier 1, 2, or 3) based on whether the project generates AI responses in emotional, spiritual, or distress contexts and whether it retains personal data (see methodology note for tier definitions).

Tier 3 projects must produce a structured Vulnerable Population Impact Assessment covering:

→ Identification of the affected population and specific vulnerable subgroups (Step 2)
→ Documented or planned behavioral evaluation across five dimensions: sycophancy risk, crisis response, theological accuracy, dependency risk, and subgroup performance (Step 3)
→ A subgroup performance statement for each identified subgroup (Step 4)
→ A named human accountability layer for vulnerable-user interactions (Step 5)

A submitter who has not completed all five behavioral evaluations satisfies this criterion by naming each incomplete test explicitly and committing to a documented evaluation plan with timeline before Gate 2 graduation. Silence about gaps is a failing condition.

Theological accuracy evaluation (Dimension 3) requires EAC review before incorporation into any published version of the criteria.

Proposed Next Steps

Action	Owner	When
TAC review of tier definitions, five behavioral dimensions, and if-then enforcement language	TAC	Before v0.3 draft
EAC review of Dimension 3 (theological accuracy) and spiritual direction framing	EAC	Before v0.3 draft
Apply this methodology retroactively to Interior Castle C5 gap	TAC AI Governance Specialist + John Christian	After Community Project listing
Incorporate revised C5 language and this methodology note into v0.3 criteria	Fr. D'Orazio / repository maintainer	v0.3 cycle
File this note as a response to Discussion #16 (Rome Call operationalization); note Dimension 2 (crisis response) as the pre-deployment counterpart to the incident response protocol in Discussion #10 Gap 2	Fr. D'Orazio / repository maintainer	On publication

Reference Sources

Source	URL	Relevance to C5
Anthropic: How people ask Claude for personal guidance (Shen et al., 2026)	anthropic.com/research/claude-personal-guidance	38% spirituality sycophancy rate; primary empirical foundation
Anthropic: How people use Claude for support, advice, and companionship (McCain et al., 2025)	anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship	Dependency risk; conversation drift finding; crisis pushback data
MIT Media Lab / OpenAI: Affective use and emotional wellbeing in ChatGPT (2025)	media.mit.edu/posts/openai-mit-research-collaboration-affective-use-and-emotional-wellbeing-in-ChatGPT	Cross-provider corroboration; establishes domain risk as provider-independent
Red Teaming Language Models with Language Models (Perez et al., 2022)	arxiv.org/pdf/2202.03286	Adversarial testing architecture; basis for Step 3 evaluation structure
Universal and Transferable Adversarial Attacks on Aligned Language Models (Zou et al., 2023)	arxiv.org/abs/2307.15043	Aligned models have transferable adversarial vulnerabilities; basis for Step 3
Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)	arxiv.org/abs/2212.08073	Named behavioral specification before deployment; basis for Step 3 format
The AI regulator's toolbox (Jones, 2024)	adamjones.me/blog/ai-regulator-toolbox	Concrete governance practices; basis for if-then enforcement language
If-then commitments for AI risk reduction (Carnegie Endowment, 2024)	carnegieendowment.org/research/2024/09/if-then-commitments-for-ai-risk-reduction	Enforcement mechanism design; basis for gap acknowledgment requirement
CDCF Vetting Criteria v0.2	catholicdigitalcommons.org/governance/project-governance/project-vetting-criteria	Current C5 language this note proposes to strengthen
CDCF Bylaws v1.0-draft-1, Article I Section 2.5	catholicdigitalcommons.org/about/bylaws	Ethical AI governance as stated organizational purpose
CDCF Manifesto — Rome Call for AI Ethics	catholicdigitalcommons.org/about/manifesto	Six principles including impartiality; social value mandate
Antiqua et Nova (DDF, January 2025)	vatican.va	Human dignity; AI under genuine human moral responsibility
Pope Leo XIV, December 2025 Address	vatican.va	Risk of passive consumption among vulnerable populations

Invitation for Feedback

This methodology note is a working document. Responses, challenges, and proposed revisions are welcomed from all Board members, Ecclesial Advisory Council members, and Technical Advisory Council members before any language from this note is incorporated into v0.3 of the vetting criteria. Feedback may be submitted via the CatholicOS GitHub repository, the council communication channels, or directly to the TAC AI Governance Specialist.

This note was prepared by the CDCF Technical Advisory Council, AI Governance Specialist function, operating within TAC lane. Dimension 3 (theological accuracy) and the spiritual direction framing are explicitly deferred to EAC review before incorporation into any published version of the criteria.

The governance mechanisms proposed here — the five behavioral dimensions, the if-then enforcement structure, the adversarial evaluation framing — translate primary AI safety and policy research into an operational Catholic governance context. Each mechanism traces to a named primary source engaged directly through ongoing research at Harvard University, including active participation in the Harvard AI Safety Student Team (AISST) Policy and Technical Fellowships and the Berkman Klein Center Ethics and Governance of AI initiative. The contribution of this document is the translation itself: taking research produced at the frontier of AI safety and rendering it actionable within CDCF's institutional and theological framework.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Operationalize Rome Call for AI Ethics and its six principles in vetting criteria #16

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Operationalize Rome Call for AI Ethics and its six principles in vetting criteria #16

Uh oh!

Uh oh!

JohnRDOrazio Apr 17, 2026 Maintainer

Context

Problem

Suggested approach

Affected files

Dependencies

Replies: 2 comments · 2 replies

Uh oh!

Uh oh!

mj3b May 2, 2026

Response to Discussion #16: Operationalizing the Rome Call for AI Ethics

1. Preliminary Notes on Sources

2. Rome Call to Vetting Criteria: Full Traceability Table

3. Three Gaps Requiring New Criteria

Gap 1: Impartiality

Gap 2: Reliability

Gap 3: Security and Privacy

4. Three Partial Coverages Worth Strengthening

Transparency

Inclusion

Responsibility

5. Recommended Document Changes

6. Sequencing Note

Uh oh!

JohnRDOrazio May 2, 2026 Maintainer Author

Uh oh!

mj3b May 2, 2026

Uh oh!

Uh oh!

mj3b May 15, 2026

C5 Methodology Note: Vulnerable Population Impact Assessment

The Problem This Document Solves

Research Foundations

→ Body 1: AI Safety Research (Applied from AISST Fellowship Engagement)

→ Body 2: Cross-Provider Empirical Research on AI Guidance Behavior

→ Body 3: Catholic Social Teaching and CDCF Foundational Documents

The Translation

The Proposed C5 Methodology

→ Step 1: Determine Risk Tier

→ Step 2: Identify the Affected Population

→ Step 3: Assess AI Behavior Across Five Dimensions

→ Step 4: Subgroup Performance Statement Format

→ Step 5: Human Accountability Layer

Proposed Revised C5 Language for v0.3

Proposed Next Steps

Reference Sources

Invitation for Feedback

JohnRDOrazio
Apr 17, 2026
Maintainer

Replies: 2 comments 2 replies

mj3b
May 2, 2026

JohnRDOrazio May 2, 2026
Maintainer Author

mj3b
May 15, 2026