Better typing of cysteines #33

Open
arohou wants to merge 4 commits into tristanic:master from arohou:alr-cysteines

@arohou arohou commented May 22, 2026

I am aiming to enable more covalent modifications of Cys residues. In the process of working it out, my agent made the following analysis.

I have NOT yet tested this, but I'd appreciate getting your review when you get a chance.

Summary

cys_type() in openmm_interface.py decides which Amber template variant
(CYS / CYX / CYScyc / CYM / ...) applies to every CYS residue, by
inspecting the bonds on SG. Its thioether branch only matched a partner atom
whose name is literally "CH3" -- the head-cap carbon of the synthetic
ACEcyc residue used for cyclic-peptide thioethers. Any other thioether
partner on SG (a covalent-inhibitor warhead carbon, a post-translationally
modified Cys, a designed bioconjugate, ...) fell through to the metal-binding
branch and was assigned CYM (deprotonated cysteine), which gives SG the
wrong partial charge and breaks the cross-residue bond during simulation.

Observed symptom

A protein with a small molecule covalently attached to a cysteine SG (e.g.
the warhead carbon of a bromoacetamide / acrylamide / etc. inhibitor, named
something like C1, C2, Cb) is loaded into ChimeraX with the
protein--ligand bond present in the model. After loading the ligand's
ffXML via "Load residue MD definitions" and clicking "Start
simulation":

  • The cysteine is parameterised as CYM (visible by inspecting
    find_residue_templates()'s assignment for that residue).
  • During simulation, the SG--C bond between the residue and the ligand
    pulls apart -- the deprotonated CYM template assigns the wrong partial
    charge to SG, and with ignoreExternalBonds=True in
    _create_openmm_system the cross-residue bond is held only by the
    ChimeraX-side connectivity, so the mismatched template charges win.

The current practical workaround is a manual rename in the ChimeraX log
before every simulation start, e.g.

setattr #1/A:154 residues name CYScyc

which is fragile and easy to forget.

Root cause

In cys_type():

for a in bonded_atoms:
    if a.residue != residue:
        if a.name == "SG":
            ...
            return 'CYX'
        elif a.name == "CH3":
            if 'OXT' in names:
                return 'CCYScyc'
            else:
                return 'CYScyc'
        # Assume metal binding - will eventually need to do something better here
        return 'CYM'

The elif a.name == "CH3": branch was intended to detect a cyclic-peptide
thioether (ACEcyc head-cap to Cys side-chain), but it only ever fires for
partners whose atom name happens to be CH3. Any other external partner on
SG, whatever its element, falls into the "Assume metal binding" path
and returns CYM.

The thioether templates CYScyc / CCYScyc in amberff/termods.xml do not
depend on the partner atom's name; they just declare that SG carries one
external bond, with thioether-appropriate charges on CB, SG, and the
backbone. They are the correct template for any Cys-SG--carbon
thioether, not just the ACEcyc one.
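
To make the failure mode concrete, here is a minimal sketch of the pre-fix branch. It is a stand-in, not the ISOLDE source: `old_partner_template` is a hypothetical name, it takes the partner's atom name as a plain string instead of a ChimeraX atom, and the elided disulfide logic is collapsed to a bare `CYX`.

```python
# Hypothetical stand-in for the pre-fix external-partner branch of cys_type().
# Atoms are reduced to their names; this is not the ISOLDE implementation.
def old_partner_template(partner_name, names):
    """Template chosen for a Cys whose SG is bonded to an atom in another residue."""
    if partner_name == "SG":
        return "CYX"  # disulfide; terminal variants are elided, as in the excerpt
    elif partner_name == "CH3":
        return "CCYScyc" if "OXT" in names else "CYScyc"
    # Everything else is assumed to be metal binding.
    return "CYM"


# Atom names of a mid-chain Cys (no OXT, no H1); illustrative only.
cys_atoms = {"N", "CA", "C", "O", "CB", "SG"}

for partner in ("CH3", "C1", "C2", "Cb", "ZN"):
    print(partner, old_partner_template(partner, cys_atoms))
```

Only the `CH3` partner reaches the thioether template; the warhead-style names `C1`, `C2` and `Cb` are classified exactly like the zinc atom.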

Fix

Match on the partner atom's element instead of its name:

elif a.element.name == "C":
    # SG bonded to any external carbon -- a thioether.
    # Previously this branch only matched a partner atom
    # literally named "CH3" (the ACEcyc head cap), which
    # missed thioether bonds to any other external carbon
    # (covalent-inhibitor warheads, post-translational
    # modifications, designed bioconjugates, ...).  The
    # CYScyc / CCYScyc templates only depend on SG having
    # one external bond; the partner atom's name does not
    # affect the internal charges, so the broadened match
    # is safe.
    if 'OXT' in names:
        return 'CCYScyc'
    if 'H1' in names:
        # No NCYScyc template ships in termods.xml yet, so
        # return CYM rather than silently mis-parameterise
        # an N-terminal Cys with an S--C external bond.
        return 'CYM'
    return 'CYScyc'

Why this is safe

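  • The original name == "CH3" case is a strict subset of element == "C",
    so previously-supported scenarios keep returning the same template. The
    one behavioural difference in this branch is a Cys that carries H1 and
    is bonded to CH3: it previously returned CYScyc and now returns CYM.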
  • The thioether templates only define charges and bonds within the
    cysteine itself; they declare SG as an <ExternalBond> site. The
    partner atom is supplied by ChimeraX connectivity and is not modelled
    inside the cysteine template, so the partner's name and chemical identity
    never appear in the parameters consumed by OpenMM.
  • The disulfide path (a.name == "SG"), the metal-binding fallback, the
    iron-sulfur cluster branch, and the free-Cys path are all unchanged.

Files changed

  • isolde/src/openmm/openmm_interface.py -- cys_type()

Test scenario

  1. Open in ChimeraX a structure with a thioether covalent bond between a
    non-terminal Cys SG and a small-molecule carbon. (Any covalent-inhibitor
    structure with the protein--ligand bond modelled as a single CONECT/LINK
    between SG and the warhead carbon will do.)
  2. Load the ligand's ffXML via ISOLDE's "Load residue MD definitions"
    button.
  3. Start a simulation by clicking "Start simulation".
  4. Before fix: The cysteine is parameterised as CYM. The S--C bond
    between cysteine and ligand pulls apart over the first few frames of
    simulation.
  5. After fix: The cysteine is parameterised as CYScyc (or CCYScyc if
    it is the C-terminal residue). The simulation runs cleanly; the S--C
    bond is preserved.

Regression checks

  • Cyclic-peptide thioether (ACEcyc.CH3 -- CYS.SG): still picks
    CYScyc / CCYScyc. The new branch is a strict superset of the old
    name == "CH3" branch.
  • Disulfide (Cys-Cys via SG--SG): still picks CYX / CCYX /
    NCYX.
  • Free Cys: still picks CYS / CCYS / NCYS.
  • Metal-coordinating Cys (e.g. Zn-coordinating): still picks CYM
    because the metal atom is not a carbon.
  • Iron-sulfur cluster Cys (residue neighbours include SF4 / FES):
    still picks MC_CYF via the unchanged early return.
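
The external-partner checks above can be written down as a small table-driven test. This is a sketch under stated assumptions, not ISOLDE test code: `partner_template`, `Case` and the atom-name sets are hypothetical, the partner is reduced to a name plus an element symbol, and the free-Cys and iron-sulfur paths are left out because their logic is not shown in this PR.

```python
from typing import NamedTuple


class Case(NamedTuple):
    label: str
    partner_name: str
    partner_element: str
    cys_atom_names: frozenset
    expected: str


def partner_template(partner_name, partner_element, names):
    """Hypothetical condensation of the patched external-partner branch."""
    if partner_name == "SG":
        return "CYX"  # terminal variants (CCYX / NCYX) are elided in the excerpt
    if partner_element == "C":
        if "OXT" in names:
            return "CCYScyc"
        if "H1" in names:
            return "CYM"  # no NCYScyc template ships in termods.xml
        return "CYScyc"
    return "CYM"  # metal-binding fallback


# Illustrative atom-name sets for mid-chain, C-terminal and N-terminal Cys.
MID = frozenset({"N", "H", "CA", "CB", "SG", "C", "O"})
C_TERM = MID | {"OXT"}
N_TERM = (MID - {"H"}) | {"H1", "H2", "H3"}

CASES = [
    Case("ACEcyc head cap, mid-chain", "CH3", "C", MID, "CYScyc"),
    Case("ACEcyc head cap, C-terminal", "CH3", "C", C_TERM, "CCYScyc"),
    Case("inhibitor warhead carbon", "C1", "C", MID, "CYScyc"),
    Case("disulfide", "SG", "S", MID, "CYX"),
    Case("zinc site", "ZN", "Zn", MID, "CYM"),
    Case("N-terminal thioether", "C1", "C", N_TERM, "CYM"),
]


def run_cases():
    """Return the labels of all cases whose template differs from the expectation."""
    return [
        c.label
        for c in CASES
        if partner_template(c.partner_name, c.partner_element, c.cys_atom_names)
        != c.expected
    ]


print(run_cases() or "all cases match")
```

Against the real code, the same table would need to be driven through `cys_type()` with actual ChimeraX residues; the expected template names are the ones listed in this PR.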

Known limitations (out of scope for this fix)

  • N-terminal Cys with an external C--S bond (residue carries H1):
    termods.xml does not yet contain an NCYScyc template, so the new code
    explicitly returns CYM for this case rather than silently picking the
    wrong template. Adding the N-terminal variant is a separate change to
    termods.xml.
  • Non-carbon thioether-like external partners on SG (S--N, S--P):
    still fall through to CYM. No biologically motivated case has come up
    yet; a future PR could broaden this further if needed.

Alexis Rohou added 4 commits May 14, 2026 16:29

…r agentic control, because otherwise popup windows requiring the human to click were getting in the way.

Adds `isolde validate {peptidebonds,rama,rotamers,clashes}` for
agent / scripted access to the same scoring as the GUI Validate
tab, plus a no-op `isolde validate` parent that lists the
subcommands when called bare (same for `isolde preflight`). Each
subcommand returns a structured dict (summary + items) and accepts
shared `log` / `saveFile` / `limit` keywords; summary lines include
a hint pointing the caller at how to see the full list.

Refactors the GUI peptide-bond and clashes panels and the
`rama` / `rota` `report=True` text dumps to share the new compute
helpers (`_compute_rama_report`, `_compute_rotamer_report`,
`classify_peptide_bonds`, `clash_atom_label`) so the new commands
and the legacy GUI/CLI surfaces stay in lock-step. `RamaMgr.cis()`
and `twisted()` now also read their cutoffs from
`defaults.CIS_PEPTIDE_BOND_CUTOFF` / `defaults.TWISTED_PEPTIDE_BOND_DELTA`
instead of hardcoded `radians(30)` / `radians(150)`.

`cys_type()` previously only special-cased Cys SG bonded to an atom
literally named `CH3` (the ACEcyc head cap used for cyclic-peptide
thioethers).  All other external carbon partners -- covalent ligand
warhead carbons, post-translationally modified Cys, designed
bioconjugates, etc. -- fell through to the metal-binding branch and
were mis-parameterised as CYM, with the wrong SG charge and an
unstable S--C bond during simulation.

Match on `a.element.name == "C"` instead of the literal atom name so
any external C--S bond picks the CYScyc / CCYScyc thioether template.
These templates only depend on SG having one external bond; the
partner's atom name does not affect the internal charges, so the
broadened match is safe.

N-terminal Cys with an external C--S bond (residue carries H1) now
explicitly returns CYM, since no NCYScyc template ships in
termods.xml.  Disulfide, metal, and iron-sulfur paths are unchanged.

arohou changed the title from "Alr cysteines" to "Better typing of cysteines" May 22, 2026

Owner

tristanic commented May 22, 2026 via email

Owner

tristanic commented May 22, 2026 via email

Author

arohou commented May 22, 2026

OK thanks. My agent had recommended doing setattr #1/A:154 residues name CYScyc or similar initially. This is good to know.

Also - you'll want to review my isolde validate pull request first, because this current branch (cysteine typing) is based off of that one.
