sarisabban/Pose


██████╗  ██████╗ ███████╗███████╗
██╔══██╗██╔═══██╗██╔════╝██╔════╝
██████╔╝██║   ██║███████╗█████╗  
██╔═══╝ ██║   ██║╚════██║██╔══╝  
██║     ╚██████╔╝███████║███████╗
╚═╝      ╚═════╝ ╚══════╝╚══════╝

Pose

A bare-metal Python library for building and manipulating protein and nucleic acid molecular structures

Python >= 3 NumPy License: GPL v2


Video Tutorial

Watch the full walkthrough: Video Tutorial on YouTube


What is Pose?

Pose constructs a data structure for protein or nucleic acid molecules that contains all relevant information defining a polymer. Primary information includes the XYZ cartesian coordinates of each atom, the identity and charge of each atom, and the bond graph of the entire molecule. Secondary information includes the FASTA sequence, radius of gyration, potential energy, and the secondary structure assignment for each protein residue.

Using this data structure, Pose can build and manipulate polypeptides and nucleic acids: construct any polypeptide or nucleic acid from sequence, move dihedral and rotamer angles, mutate residues and base pairs, and measure bond lengths and angles. It is designed as a substrate for higher-level protocols such as simulated annealing, molecular dynamics, and machine learning-based molecular design.

Key features:

  • Designed to be extremely stable, bare-metal Python: NumPy is the only dependency for the core Pose and Molecule classes
  • 26 amino acids supported by default (20 canonical + 6 non-canonical: LYX, MSE, PYL, SEC, TRF, TSO), can be extended to 100+
  • Support for both L-amino acids and D-amino acids (mixed sequences fully supported)
  • 5 DNA and RNA canonical nucleotides
  • Full bond graph with atom partial charges
  • Measure and rotate protein dihedral and rotamer angles (φ/ψ/ω/χ)
  • Measure and rotate nucleic acid dihedral angles (α/β/γ/δ/ε/ζ/χ)
  • Measure and adjust the distance and angle between any atoms
  • PDB and mmCIF file import and export
  • Pythonic zero-based indexing throughout (unlike PDB's one-based convention)

Installation

Dependencies: Python >= 3, NumPy

For virtualenv:

pip install git+https://github.com/sarisabban/Pose

For anaconda:

conda create -n ENVIRONMENT python=3
conda activate ENVIRONMENT
pip3 install git+https://github.com/sarisabban/Pose

Quick Start

from pose import *

# Build a peptide
p = Pose()
p.Build('MSLESNRGI', chain='A', fmt='protein') # Uppercase = L-amino acids, lowercase = D-amino acids
p.Build('MSLESNRGI', chain='B', fmt='protein') # Add a second chain
p.GetInfo()                                    # Print structured summary

# Inspect properties
print('Sequence:', p.data['FASTA'])
print('Mass:', p.data['Mass'], 'Da')
print('Rg:', p.data['Rg'], 'Å')

# Rotate backbone angles (indices are zero-based)
p.RotateDihedral(1, -60, 'PHI')
p.RotateDihedral(1, -45, 'PSI')

# Mutate and export
p.Mutate(2, 'V')        # Change residue at index 2 (Leu) → Val
p.Export('peptide.pdb')

# Import a protein
p = Pose()
p.Import('1YN3.pdb')
p.GetInfo()

# Import a nucleic acid
p = Pose()
p.Import('1BNA.pdb')
p.GetInfo()

# Build a nucleic acid
p = Pose()
p.Build('ATGCGTACGTTCCGGCAGACGT', chain='A', fmt='DNA')
p.GetInfo()

D-amino acids (use lowercase letters): uppercase sequence letters build L-amino acids (the natural form), and lowercase letters build D-amino acids (their mirror images). Mixed sequences are fully supported.

p.Build('ACEG')   # All L-amino acids
p.Build('aceg')   # All D-amino acids
p.Build('GAg')    # G=L-Gly, A=L-Ala, g=D-Gly
p.Build('AcEg')   # Mixed L/D sequence
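As a rough illustration of the casing convention above, the following stand-alone sketch splits a sequence string into residue letters and their L/D form. The helper name `split_chirality` is hypothetical and is not part of Pose; it only mirrors the uppercase/lowercase rule described in this section.

```python
def split_chirality(sequence):
    """Return (residue_letter, form) pairs: 'L' for uppercase, 'D' for lowercase.

    Hypothetical helper for illustration only; Pose does its own parsing.
    """
    pairs = []
    for letter in sequence:
        if not letter.isalpha():
            raise ValueError(f'Unexpected character in sequence: {letter!r}')
        form = 'D' if letter.islower() else 'L'
        pairs.append((letter.upper(), form))
    return pairs


print(split_chirality('AcEg'))
```

A check like this can be handy for validating user input before passing a mixed L/D sequence to p.Build().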

Importing a PDB file:

p = Pose()
p.Import('1TQG.pdb', chain='A')
p.ReBuild()     # Adds missing hydrogens

You can run p.ReBuild() after Import() to add hydrogens to the structure. Be aware that this builds a new synthetic structure, so the original occupancy and temperature factor of each atom are lost (replaced with 1.0 and 0.0, respectively).

Importing a molecule:

m = Molecule()
m.Import('caffiene.sdf')
m.GetInfo()

RDKit plugin:

from rdkit import Chem

CFF = 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
m = Chem.MolFromSmiles(CFF)
# Manipulate a molecule using RDKit here
molstr = Chem.MolToMolBlock(m)

m = Molecule()
m.Import(molstr)
m.GetInfo()

This RDKit plugin gives you the power and flexibility to manipulate molecules using RDKit and then import them to the Molecule() class when they are ready.


API Reference

Call Class

Class Description
p = Pose() Calls the Pose() class for proteins, DNA, and RNA
m = Molecule() Calls the Molecule() class for small organic molecules

Both classes have similar methods and data structures, with slight differences in how they are used.

Building & I/O

Method Description
p.Import(filename='1YN3.pdb', chain=['A', 'B'], model=1) Imports a structure from a PDB or mmCIF file and constructs the p.data object. Can import a protein, DNA, or RNA structure. chain accepts a single chain ID ('A'), a list of chains (['A', 'B']), or None to import all chains. model selects which model to import from multi-model files (e.g. NMR ensembles); defaults to 1. For atoms with multiple conformers, the highest-occupancy conformer is kept. A structure that mixes proteins and nucleic acids in separate chains cannot be imported as one pose; import each macromolecule type as a separate pose
m.Import(filename='caffiene.sdf') Imports a structure from a PDB, SDF, mmCIF, MOL, or MOL2 file, or from an RDKit block string, and constructs the m.data object
p.Export('out.pdb') Write the full structure, and all chains, to a PDB or mmCIF file
m.Export('out.sdf') Write the full structure to a PDB, SDF, mmCIF, MOL, or MOL2 file
p.Build('MSLESNRGI', chain='A', fmt='protein') Build a macromolecule from a one-letter sequence. For a polypeptide, pass the sequence with fmt='protein'; uppercase = L-amino acids, lowercase = D-amino acids. For a nucleic acid, pass the sequence with fmt='DNA' or fmt='RNA'. Add more chains by repeating the call with a different chain value (e.g. chain='B'). A structure is either a protein or a nucleic acid (DNA/RNA); it cannot be a mixture of the two
p.ReBuild(sequence=None, mirror=False, _mutate=None) Rebuild the polypeptide or nucleic acid. Use sequence='AGLMTSWVLVA' to rebuild the structure with multiple bulk mutations on chain A. Use sequence={'A':'MSLKLSTVVA', 'B':'ASLKSWFWVA'} to perform mutations at multiple chains at the same time. Use mirror=True to rebuild a protein and convert L-amino acids → D-amino acids and D-amino acids → L-amino acids. Will add missing hydrogens. For DNA and RNA, the sequence='' length must match exactly the original sequence length, otherwise an error will be raised
p.Mutate(1, 'V') Mutate a single monomer. For proteins: p.Mutate(1, 'V') = residue 1 → L-Valine, p.Mutate(1, 'v') = residue 1 → D-Valine. For DNA: p.Mutate(0, 'T') = nucleotide 0 → Thymine. For RNA: p.Mutate(0, 'U') = nucleotide 0 → Uracil. For double-stranded nucleic acids, the complementary base is also updated automatically

Measurements

Method Description
p.GetDistance(0, 'N', 5, 'CA') Get the distance in Å between any two atoms. Example: residue 0 nitrogen atom to residue 5 CA atom
m.GetDistance(0, 5) Get the distance in Å between any two atoms. Example: atom 0 to atom 5
p.GetDihedral(2, 'PHI') Calculate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral angles. In this example we are measuring the PHI angle of the 3rd protein residue (index 2). For protein χ dihedral use p.GetDihedral(4, 'chi', 1) 5th residue (index 4), CHI 1 angle
m.GetDihedral(0, 1, 2, 3) Calculate a dihedral angle between 4 atoms. In this example the dihedral angle is made up of the atoms at indices 0, 1, 2, and 3
p.GetAngle(0, 'N', 5, 'CA', 17, 'C') Get the angle between any three atoms in the whole structure. Example: N of residue 0, CA of residue 5, and C of residue 17, with the CA atom in the middle being the pivot
m.GetAngle(0, 5, 17) Get the angle between any three atoms in the whole structure. Example: atom at index 0, atom at index 5, and atom at index 17, with the atom at index 5 being the pivot
p.GetAtomBonds(0, 1) Confirm and get the PDB name and element name [atom 1 element name, atom 1 PDB name, atom 2 PDB name, atom 2 element name] for two atoms (if they are bonded together). Use the atom indices. If the two atoms are not bonded, an error is raised
m.GetAtomBonds(1) Get all atom names bonded to this atom index ['atom name 1', 'atom name 2', 'atom name 3']
p.GetAtomCoord(3, 'N') Get the XYZ coordinates of an atom of a residue or a nucleotide (monomers). Example: N nitrogen of monomer index 3
m.GetAtomCoord(3) Get the XYZ coordinates of an atom given its index. Example: atom at index 3
p.GetAtomList(PDB=False) Get a list of all atom element names for the entire structure. Use PDB=True for PDB-formatted names
m.GetAtomList() Get a list of all atom element names for the entire structure
p.GetAtomIdx(3, 'N') Get the atom index in p.data['Coordinates'] from its name within a monomer. This is the opposite of p.GetAtomCoord(3, 'N')
p.GetIdentity(0, 'Atom') Identify the PDB name of an atom, an amino acid, or a nucleotide by its index. Example: p.GetIdentity(5, 'Atom'), p.GetIdentity(5, 'amino acid'), or p.GetIdentity(5, 'nucleotide'). For atoms only, you can also return the partial charge using p.GetIdentity(3, 'Atom', charge=True)
p.GetInfo() Print a formatted summary of the structure's information
m.GetInfo() Print a formatted summary of the structure's information and a graphical representation of the molecule
m.CalcSMILES() Calculate the SMILES representation of a molecule and add it to m.data['SMILES']
m.CalcSMARTS() Calculate the SMARTS representation of a molecule and add it to m.data['SMARTS']
p.CalcMass() Calculates the entire molecular mass of a molecule (all chains) in Da (Daltons), updates the value of p.data['Mass']
m.CalcMass() Calculates the entire molecular mass of a molecule
p.CalcSize() Calculates the length of each chain in a structure, updates the value of p.data['Size']. You can get the length of each chain using p.data['Size'][CHAIN]
p.CalcFASTA() Compiles the FASTA sequence of each chain, updates the value of p.data['FASTA']. You can get the FASTA sequence of each chain using p.data['FASTA'][CHAIN]
p.CalcRg() Calculates the entire Radius of Gyration of a molecule (all chains) in Å (angstrom), updates the value of p.data['Rg']
m.CalcRg() Calculates the entire Radius of Gyration of a molecule
p.CalcCharge(iterations=6) Calculate the Gasteiger-Marsili partial charges to all atoms using iterative equalization (default 6 iterations), updates the value of p.data['Atoms'][index][2]
m.CalcCharge(iterations=6) Calculate the Gasteiger-Marsili partial charges to all atoms using iterative equalization (default 6 iterations), updates the value of m.data['Atoms'][index][2]
p.CalcDSSP() Calculates each amino acid's secondary structure assignments, only for proteins, and stores them in p.data['Amino Acids'][i][4] and updates p.data['SS'][CHAIN], therefore this is where you can get the SS sequence of each chain. Codes: H=α-helix, G=3₁₀-helix, I=π-helix, E=β-sheet, B=β-bridge, T=turn, S=bend, L=loop, P=PPII-helix
p.CalcSASA(n_points=100, probe_radius=1.4) Calculates the Solvent Accessible Surface Area (SASA) for each amino acid, only for proteins, using golden sphere sampling. n_points controls sampling density, probe_radius is the solvent probe radius in Å (default 1.4 for water). Adds the value to p.data['Amino Acids'][i][6]
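To make the dihedral measurements in the table above more concrete, here is a minimal stand-alone sketch of how a signed dihedral angle can be computed from four XYZ points. It is not Pose's implementation: the function name `dihedral` and the helpers are hypothetical, plain lists stand in for NumPy rows, and the sign convention is a common one that may or may not match the one Pose uses.

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four XYZ points."""
    b0 = _sub(p0, p1)          # points from atom 2 back to atom 1
    b1 = _sub(p2, p1)          # central bond (rotation axis)
    b2 = _sub(p3, p2)          # points from atom 3 to atom 4
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four hypothetical points, not taken from a real structure
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))
```

Using atan2 on the projected vectors keeps the result in the full (-180°, 180°] range, which is what backbone angles such as φ and ψ need.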

Manipulation

Method Description
p.AdjustDistance(0, 'N', 4, 'C', 17) Set the distance between any two atoms in Å. Example: set the distance between N in residue 0 and C in residue 4 to 17 Å. Order matters: the second atom (and all atoms downstream of it on the same chain) moves, while the first atom stays fixed, so (0, 'N', 0, 'CA', d) and (0, 'CA', 0, 'N', d) give different results
m.AdjustDistance(0, 4, 17) Set the distance between any two atoms in Å. Example: set the distance between the atom at index 0 and the atom at index 4 to 17 Å. Order matters: the second atom (and all atoms downstream of it) moves, while the first atom stays fixed, so (0, 1, d) and (1, 0, d) give different results
p.AdjustAngle(1, 'N', 1, 'CA', 1, 'C', -2) Add/subtract degrees from a three-atom angle, with the middle atom being the pivot point. Example: subtract 2° from N–CA–C angle of residue 1, with the CA atom being the pivot
m.AdjustAngle(0, 1, 2, -2) Add/subtract degrees from a three-atom angle, with the middle atom being the pivot point. Example: subtract 2° from the angle represented by atom 0, atom 1, and atom 2, with atom 1 being the pivot
p.RotateDihedral(1, -60, 'PHI') Rotate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral angles. Example: residue 1 PHI dihedral to -60°
m.RotateDihedral(0, 1, 2, 3, -60) Rotate any dihedral angle represented by four atoms. Example: rotate the dihedral angle represented by atom index 0, atom index 1, atom index 2, and atom index 3 to become -60°
p.MovePose(theta=5, u=[18, 10, 5], l=6, ori=[0, 0, 0]) Rotate and/or translate the whole structure. theta = rotation angle in degrees, u = rotation axis vector (will be normalised), l = translation distance in Å, ori = target point to translate towards. All parameters are optional (default None); you can rotate only, translate only, or both
m.MovePose(theta=5, u=[18, 10, 5], l=6, ori=[0, 0, 0]) Rotate and/or translate the whole structure. theta = rotation angle in degrees, u = rotation axis vector (will be normalised), l = translation distance in Å, ori = target point to translate towards. All parameters are optional (default None); you can rotate only, translate only, or both
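The rotation part of MovePose() (an angle theta about an axis u that is normalised first) can be pictured with Rodrigues' rotation formula. The sketch below is an illustration under assumptions, not the library's code: the function name `rotate_about_axis` is hypothetical, the axis is assumed to pass through the origin, and plain lists stand in for the NumPy coordinate array.

```python
import math


def rotate_about_axis(points, theta_deg, axis):
    """Rotate XYZ points by theta_deg about an axis through the origin."""
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0:
        raise ValueError('Rotation axis must be a non-zero vector')
    kx, ky, kz = (c / norm for c in axis)       # normalised axis k
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rotated = []
    for x, y, z in points:
        # k x v
        cx = ky * z - kz * y
        cy = kz * x - kx * z
        cz = kx * y - ky * x
        k_dot_v = kx * x + ky * y + kz * z
        # Rodrigues: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
        rotated.append([
            x * cos_t + cx * sin_t + kx * k_dot_v * (1 - cos_t),
            y * cos_t + cy * sin_t + ky * k_dot_v * (1 - cos_t),
            z * cos_t + cz * sin_t + kz * k_dot_v * (1 - cos_t),
        ])
    return rotated


# A 90 degree turn about the z axis (given unnormalised on purpose)
print(rotate_about_axis([[1.0, 0.0, 0.0]], 90, [0, 0, 2]))
```

Normalising the axis inside the function mirrors the "will be normalised" note in the table, so callers can pass any non-zero direction vector.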

Tools

These are standalone tools (not Pose() class methods) and thus are called on their own:

Function Description
Parameterise('MSE.cif', 'J', 'MSE') To add a new amino acid to the database.json library. Takes filename, single letter unicode, three letter tricode
RMSD(pose1, pose2, alg='align', export='aligned.pdb') Computes the Root Mean Squared Deviation between two protein or nucleic acid Pose structures using Cα (alpha-carbon) atoms for proteins, or C1' atoms for nucleic acids. Returns the RMSD in Å. Supported algorithms: 'align' (sequence alignment + iterative Kabsch), 'kabsch' (SVD-based optimal rotation), 'quaternion' (eigenvalue-based optimal rotation), or 'simple' (translation only, no rotation). Can export the aligned structures to aligned_1.pdb, aligned_2.pdb
BLAST(sequence1, sequence2) Perform pairwise protein or nucleic acid sequence alignment using the Smith-Waterman local alignment algorithm with BLOSUM62 substitution scores, matching the statistical model used by NCBI BLASTP. Returns: (alignment_string, percent_identity, e_value)
MSA([sequence1, sequence2, sequence3....]) Aligns three or more protein or nucleic acid sequences using a ClustalW-like progressive alignment strategy, pairwise distances are computed with BLAST(). Returns: (alignment_string, aligned_list, conservation_list, entropy_list, pssm_array, dca_array) where conservation_list is a per-column score in [0, 1] (1 = fully conserved), entropy_list is per-column Shannon entropy in bits, pssm_array is a (L, 20) log-odds matrix in BLOSUM62 column order (ARNDCQEGHILKMFPSTWYV), and dca_array is an (L, L) APC-corrected mean-field DCA direct-information matrix
Isoelectric(sequence) Calculates the protein's isoelectric point (pI) using the EMBOSS pKa scale and bisection on [0, 14]. Takes a protein sequence and returns a float, the pH at which the protein has zero net charge
Hydrophobicity(sequence, window=9, scale='eisenberg') Calculates the hydrophobicity profile from a protein sequence using a sliding window. Supported scales: 'eisenberg' (default, normalized consensus), 'kyte-doolittle', 'hopp-woods', 'engelman'. Returns a tuple of two lists (positions, scores) where positions are zero-based indices of the window centers — these lists are used to plot the graph
Aliphatic(sequence) Calculates the Aliphatic index of a protein from its sequence (Ikai 1980: AI = X(A) + 2.9·X(V) + 3.9·(X(I) + X(L))), returns a float value
ExtinctCoeff(sequence, reduced=True) Calculates the molar extinction coefficient at 280 nm in water (Pace 1995: ε = nW·5500 + nY·1490 + (nC/2)·125). With reduced=True (default) cysteines are treated as reduced and contribute 0; with reduced=False cysteines are treated as cystines and contribute (nC // 2) · 125. Returns an int value in M⁻¹ cm⁻¹
Instability(sequence) Calculates the Instability index of a protein (Guruprasad et al. 1990) using the DIWV dipeptide weight table. Returns a float; values below 40 generally indicate a stable protein
GRAVY(sequence) Calculates the Grand Average of Hydropathy using the Kyte-Doolittle hydropathy scale, returns a float value
Split(pose, chain=None, start=None, end=None) Slice a Pose into a new Pose object. Takes the original pose, the chain if you want to split out an entire chain, or start, end if you want to split out a range of monomer residues (zero-based, inclusive). Works for proteins, DNA, and RNA. Atom and residue indices, the bond graph, and coordinates are all renumbered densely from zero in the returned pose
Concatenate(pose1, pose2, fuse=False) Combine two poses of the same Type. With fuse=False (default) pose2 is appended to pose1 as additional chains, preserving the original coordinates of both poses; chain IDs in pose2 that collide with pose1 are renamed to the next free letter. With fuse=True the concatenated FASTA is rebuilt as a single continuous polymer with idealised geometry, the original input coordinates are discarded
PCR(sequence) Generates forward and reverse PCR primers for a DNA template (DNA only, accepts only A/C/G/T, template must be ≥ 36 bp). Uses a 5-tier relaxation strategy so that any chemically valid template always returns a primer pair. Ideal tier requires length 18–25, GC 40–60%, nearest-neighbor SantaLucia 1998 Tm in [55, 65] °C, a 3' GC clamp, no run of 4 identical bases, no internal palindrome (hairpin), no 3' self-dimer, and |ΔTm| ≤ 2 °C. If no pair satisfies it the search falls through progressively relaxed Good / Fair / Poor / Last resort tiers, each widening the length / GC / Tm / ΔTm bounds and dropping the GC clamp / hairpin / dimer gates. When the result comes from any tier below Ideal, a warning is printed to stdout naming the tier and which gates were relaxed (e.g. Warning: PCR primers are suboptimal (Poor tier) — GC% outside 40-60; Tm outside 55-65 °C; GC clamp missing). Returns a tuple (forward_string, reverse_string, warning_message_for_suboptimal_primers)
Translate(sequence, fmt='protein', organism='ecoli') Translates between protein, DNA, and RNA. The input alphabet is auto-detected. Takes a sequence and translates it to the requested fmt format. Nucleotide → protein translation uses the standard genetic code and returns * for stop codons. Protein → DNA/RNA back-translation is codon-optimised by selecting the highest-frequency codon (deterministic) for the chosen organism, which takes 'ecoli' (default) or 'human'. Returns the translated sequence as an uppercase string
PROSITE(sequence, pattern) Search a protein sequence for a PROSITE-style pattern. Pattern grammar: [ABC] = any of A/B/C, {ABC} = any except A/B/C, x = any residue, x(n) / x(n,m) = quantifiers, A(n) / A(n,m) = repeat literal residues, < / > = anchor at sequence start/end, - = token separator (stripped). Returns a list of tuples [(start, end, match), ...] with 1-based, inclusive positions
HydrogenBondMap(pose) Generates a backbone hydrogen-bond donor/acceptor map for a protein pose (proteins only). Uses the same DSSP electrostatic criterion as p.CalcDSSP() (Kabsch & Sander 1983: E < -0.5 kcal/mol). Returns an array of shape (N_atoms, N_atoms) where 0 = no bond, 1 = this atom is a donor (backbone N), 2 = this atom is an acceptor (backbone O)
ContactMap(pose) Generates a monomer-monomer distance map in angstroms. The molecule type is auto-detected from pose.data['Type']: distances between protein residues are calculated from the Cα atoms, while distances between DNA and RNA bases are calculated from their C1' atoms. Returns an array of shape (N_residues, N_residues) with zero on the diagonal
Rotamers(10, pose) Update χ dihedrals (rotamers) with the most-probable χ dihedrals for a residue given backbone phi, psi. Derived from the Dunbrack rotamer library
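Two of the sequence-property formulas quoted in the table above (the Ikai aliphatic index and the Pace extinction coefficient) are simple enough to sketch directly. The functions below are stand-alone re-implementations of the stated formulas for illustration; the names `aliphatic_index` and `extinction_coefficient` are hypothetical, and uppercasing the input (so D-residues count like their L-forms) is an assumption rather than documented behaviour of Aliphatic() or ExtinctCoeff().

```python
def aliphatic_index(sequence):
    """Ikai (1980): AI = X(A) + 2.9*X(V) + 3.9*(X(I) + X(L)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError('Sequence must not be empty')

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent('A')
            + 2.9 * mole_percent('V')
            + 3.9 * (mole_percent('I') + mole_percent('L')))


def extinction_coefficient(sequence, reduced=True):
    """Pace (1995) at 280 nm: nW*5500 + nY*1490 + n_cystine*125 (M^-1 cm^-1)."""
    seq = sequence.upper()
    # Reduced cysteines contribute nothing; otherwise pairs of Cys form cystines
    cystines = 0 if reduced else seq.count('C') // 2
    return seq.count('W') * 5500 + seq.count('Y') * 1490 + cystines * 125


print(aliphatic_index('AVIL'))
print(extinction_coefficient('WYCC'), extinction_coefficient('WYCC', reduced=False))
```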

BLAST handles sequences beyond the 20 canonical L-amino acids automatically:

  • D-amino acids: stored as lowercase letters in pose.data['FASTA']. BLAST uppercases both sequences before alignment, treating each D-amino acid as its L-counterpart for scoring purposes. This correctly reflects the chemical reality that D- and L-forms of the same residue have identical side-chain chemistry.
  • Non-canonical amino acids: any letter not in the 20-letter BLOSUM62 alphabet falls back to +4 for a self-match (equal to the minimum BLOSUM62 diagonal) and −1 for a mismatch. This keeps non-canonical residues visible to the aligner without inflating scores.
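The scoring rules described above can be sketched as a small pair-scoring function. This is an illustration only, not the code inside BLAST(): the name `pair_score` is hypothetical, and `BLOSUM62_EXCERPT` holds just a handful of real BLOSUM62 entries so the example stays short (a real aligner needs the full 20×20 matrix).

```python
# A few genuine BLOSUM62 entries; the full matrix is omitted for brevity
BLOSUM62_EXCERPT = {
    ('A', 'A'): 4, ('A', 'R'): -1, ('C', 'C'): 9,
    ('I', 'L'): 2, ('W', 'W'): 11,
}
CANONICAL = set('ARNDCQEGHILKMFPSTWYV')


def pair_score(a, b, matrix=BLOSUM62_EXCERPT):
    """Score one aligned residue pair with the fallback rules described above."""
    # D-amino acids (lowercase) are scored as their L-counterparts
    a, b = a.upper(), b.upper()
    if a in CANONICAL and b in CANONICAL:
        key = (a, b) if (a, b) in matrix else (b, a)
        return matrix[key]  # raises KeyError for pairs missing from the excerpt
    # Non-canonical letters: +4 for a self-match, -1 for any mismatch
    return 4 if a == b else -1


print(pair_score('w', 'W'), pair_score('U', 'U'), pair_score('U', 'C'))
```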

MSA handles sequences beyond the 20 canonical L-amino acids, identical to BLAST()
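The per-column Shannon entropy that MSA() reports in entropy_list can be illustrated with a few lines of plain Python. This sketch is not the library's implementation: the name `column_entropy` is hypothetical, and the choices to ignore gap characters ('-') and to uppercase residues are assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(aligned):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError('Need at least one sequence, all of equal length')
    entropies = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c != '-']  # assumption: skip gaps
        if not residues:
            entropies.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        # H = sum(p * log2(1/p)) over the residues observed in this column
        entropies.append(sum((n / total) * math.log2(total / n)
                             for n in counts.values()))
    return entropies


print(column_entropy(['ACD', 'AGD', 'A-E']))
```

A fully conserved column gives 0 bits, while a column split evenly between two residues gives 1 bit.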

For Parameterise() this is the workflow:

  1. Download the CIF file for the amino acid from RCSB Chemical Sketch
  2. Call Parameterise() with the CIF file path, a single-letter key, and the three-letter residue code.

Key Concepts

Zero-based indexing

All residue and atom indices start at 0, not 1, so residue 0 is the N-terminal amino acid. This differs from the PDB convention, which is one-based.

p.Build('MSLESNRGI', chain='A', fmt='protein') # Construct a polypeptide
p.GetDihedral(0, 'PHI')                        # PHI of first residue (index 0)
p.GetDihedral(2, 'chi', 1)                     # CHI 1 of third residue (index 2)
p.GetDistance(0, 'N', 1, 'CA')                 # N of residue 0 to CA of residue 1
p.Build('MSLESNRGI', chain='B', fmt='protein') # Add a second chain
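Because most of the API is zero-based while PROSITE() is documented as returning 1-based, inclusive positions, a small conversion helper can prevent off-by-one mistakes. The helpers below are hypothetical and not part of Pose; the `hit` tuple is hand-written in the documented (start, end, match) format rather than produced by a real PROSITE() call.

```python
def to_zero_based(one_based):
    """Convert a 1-based position to the zero-based index used by Pose."""
    if one_based < 1:
        raise ValueError('One-based positions start at 1')
    return one_based - 1


def span_to_slice(start, end):
    """Convert a 1-based, inclusive (start, end) span to a Python slice."""
    return slice(to_zero_based(start), end)


sequence = 'MSLESNRGI'
hit = (3, 5, 'LES')  # hypothetical result in the documented PROSITE() format
print(sequence[span_to_slice(hit[0], hit[1])])
```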

Accessing the data structure directly

p.data['FASTA']              # Dict: one-letter sequence for each chain, e.g. p.data['FASTA']['A']
p.data['Size']               # Dict: number of monomers in each chain, e.g. p.data['Size']['A']
p.data['Amino Acids'][0]     # [letter, chain, bb_indices, sc_indices, secondary structure, tricode, SASA]
p.data['Atoms'][0]           # [pdb_name, element, charge, occupancy, temp_factor]
p.data['Coordinates']        # Numpy array, shape (N, 3)
p.data['Bonds']              # Adjacency list: {atom_index: [bonded_atom_indices]}

Iterating over residues and atoms:

for idx, aa in p.data['Amino Acids'].items():
    symbol, chain, bb, sc, ss, tricode, sasa = aa
    print(f'Residue {idx}: {tricode} ({symbol}), SS={ss}')

for idx, atom in p.data['Atoms'].items():
    name, element, charge, occupancy, temp = atom
    xyz = p.data['Coordinates'][idx]
    print(f'Atom {idx}: {name} ({element}) at {xyz}')
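The same layout also makes it easy to walk the bond graph. The sketch below uses a hand-made toy dictionary that only mimics the documented 'Atoms', 'Bonds', and 'Coordinates' keys, so it runs without Pose installed. The coordinates are invented, a list of lists stands in for the NumPy array, and the function name `bond_lengths` is hypothetical.

```python
import math

# Toy stand-in for p.data; values are invented and not real geometry
data = {
    'Atoms': {
        0: ['N', 'N', 0.0, 1.0, 0.0],
        1: ['CA', 'C', 0.0, 1.0, 0.0],
        2: ['C', 'C', 0.0, 1.0, 0.0],
    },
    'Bonds': {0: [1], 1: [0, 2], 2: [1]},
    'Coordinates': [[0.0, 0.0, 0.0], [1.46, 0.0, 0.0], [2.0, 1.4, 0.0]],
}


def bond_lengths(data):
    """Return {(i, j): length} for every bond, visiting each bond once."""
    lengths = {}
    for i, neighbours in data['Bonds'].items():
        for j in neighbours:
            if i < j:  # the adjacency list stores each bond in both directions
                lengths[(i, j)] = math.dist(data['Coordinates'][i],
                                            data['Coordinates'][j])
    return lengths


for (i, j), length in bond_lengths(data).items():
    print(f"{data['Atoms'][i][0]}-{data['Atoms'][j][0]}: {length:.2f} Å")
```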

Supported Amino Acids

Uppercase = L-form, lowercase = D-form. All 26 are supported in mixed L/D sequences. Additional amino acids can be added to the database.json file.

The N-terminus is protonated, as expected at physiological pH (~7.4), and therefore exists as a positively charged ammonium group (–NH3+)

A - ALA B - LYX C - CYS D - ASP E - GLU
F - PHE G - GLY H - HIS I - ILE J - MSE
K - LYS L - LEU M - MET N - ASN O - PYL
P - PRO Q - GLN R - ARG S - SER T - THR
U - SEC V - VAL W - TRP X - TRF Y - TYR
Z - TSO

Supported Nucleotides

DNA

A - DA T - DT C - DC G - DG

RNA

A - A U - U C - C G - G

Data Structure Reference

Get the content of the structure's JSON object using print(p.data[KEY])

This p.data structure from the Pose() class represents proteins, DNA and RNA:

Key Value Type Description
Type String Identifies the structure as a protein, DNA, or RNA
Energy Float Potential energy of the molecule
Rg Float Radius of gyration
Mass Float Mass in Daltons
Size Dict Length of each chain, ie: the number of monomers for each chain
FASTA Dict One-letter sequence for each chain
SS Dict One-letter amino acid secondary structure assignments for each chain
Nucleotides Dict {index: [symbol, chain, bb_atom_indices, sc_atom_indices, tricode]}, zero-based indexing
Amino Acids Dict {index: [symbol, chain, bb_atom_indices, sc_atom_indices, secondary_struct, tricode, SASA]}, zero-based indexing
Atoms Dict {atom_index: [pdb_name, element, partial charge, occupancy, temp_factor]}, zero-based indexing
Bonds Dict Bond graph as adjacency list: {atom_index: [bonded_atom_indices]}
Coordinates NumPy array Shape (N, 3), Cartesian XYZ for each atom

This m.data structure from the Molecule() class represents small organic molecules:

Key Value Type Description
Type String Identifies the structure as a molecule
Energy Float Potential energy of the molecule
Rg Float Radius of gyration
Mass Float Molecule's molecular mass
SMILES Str The SMILES representation of the molecule as a string
SMARTS Str The SMARTS representation of the molecule as a string
Formula Str The molecular formula of the molecule
Atoms Dict {atom_index: [pdb_name, element, partial charge]}, zero-based indexing
Bonds Dict Bond graph as adjacency list: {atom_index: [bonded_atom_indices]}
Coordinates NumPy array Shape (N, 3), Cartesian XYZ for each atom

Description of amino acids in database.json:

This information resides in database['Amino Acids'][AMINO_ACID_UNICODE or BACKBONE]

Dictionary Key Value Type Description of Values
Vectors List of lists The position of each atom relative to the N of the backbone, i.e. the coordinates each atom has when the N is placed at X, Y, Z = 0, 0, 0. To find the correct vectors, position the N at X, Y, Z = 0, 0, 0 and use the corresponding coordinates of each atom
Tricode String The three letter code for each amino acid
Fused Boolean True = the sidechain is fused to the backbone
Backbone Atoms or Sidechain Atoms List of lists The atom identity of each coordinate point, for example: first coordinate point is the nitrogen with symbol N and PDB entry N, next atom is the hydrogen that is bonded to the nitrogen with symbol H and PDB entry 1H etc... Unlike the PDB where all hydrogens are collected after the amino acid, here each atom's hydrogens come right after it. This makes for easier matrix operations. Order is index [0] = PDB atom's name, index [1] = element, index [2] = partial charge, index [3] = occupancy, index [4] = temperature factor
Chi Angle Atoms List of lists The atoms in the sidechain that are contributing to a chi angle
Bonds Dictionary The bond graph as an adjacency list
BondOrderss Dictionary The bond order graph as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds
BBDEP Dictionary The sin/cos×10000 grids calculated from the Dunbrack backbone-dependent rotamer library, at each 10° (φ, ψ) bin the highest-probability rotamer's χ dihedrals were encoded as (sin χ, cos χ) pairs on a 36×36 grid. At runtime, tools.Rotamers() uses residue name, its φ dihedral, and its ψ dihedral and bilinearly interpolates the four neighbouring grid cells and recovers each χ via atan2(sin_interp, cos_interp). The non-canonical BBDEP (LYX, MSE, PYL, SEC, TRF, TSO) that have no Dunbrack entries were borrowed verbatim from the closest canonical analog whose χ definitions match (MSE↔MET, SEC↔CYS, TRF↔TRP, first χ of LYX/PYL from LYS, first chis of TSO from TYR). Any extra chi angles beyond what the analog provides are filled with a "trans pad" (chi = 180° everywhere, encoded as sin=0, cos=−10000), a deliberate and explicit placeholder that downstream MD minimization will relax into the correct local minimum
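The sin/cos encoding described for BBDEP can be illustrated with a short stand-alone sketch. It is not the code in tools.Rotamers(): the names `encode` and `interpolate_chi` are hypothetical, and the four corner values are passed in directly instead of being looked up in the 36×36 grid. It shows why interpolating (sin χ, cos χ) pairs and recovering χ with atan2 behaves sensibly across the ±180° wrap, where averaging raw angles would not.

```python
import math

SCALE = 10000  # the grids store sin and cos multiplied by 10000


def encode(chi_deg):
    """Encode an angle in degrees as an integer (sin*10000, cos*10000) pair."""
    r = math.radians(chi_deg)
    return (round(math.sin(r) * SCALE), round(math.cos(r) * SCALE))


def interpolate_chi(corners, fx, fy):
    """Bilinearly interpolate four encoded corners and recover chi in degrees.

    corners: encoded values at (x0, y0), (x1, y0), (x0, y1), (x1, y1)
    fx, fy:  fractional position inside the cell, each in [0, 1]
    """
    weights = ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)
    s = sum(w * c[0] for w, c in zip(weights, corners))
    c = sum(w * c[1] for w, c in zip(weights, corners))
    return math.degrees(math.atan2(s, c))


# Neighbouring cells on either side of the +/-180 degree wrap
corners = (encode(170), encode(-170), encode(170), encode(-170))
print(interpolate_chi(corners, 0.5, 0.5))   # lands at the wrap, not near 0
print(encode(180))                          # the "trans pad" encoding
```

Averaging 170° and −170° directly would give 0°, the opposite side of the circle, whereas the sin/cos route returns the expected trans value.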

Description of nucleotides in database.json:

This information resides in database['Nucleotides'][NUCLEOTIDE_TRICODE]

Dictionary Key Value Type Description of Values
Vectors List of lists The position of each atom relative to the N of the backbone, i.e. the coordinates each atom has when the N is placed at X, Y, Z = 0, 0, 0. To find the correct vectors, position the N at X, Y, Z = 0, 0, 0 and use the corresponding coordinates of each atom
Tricode String The three letter code for each nucleotide
Type String Identify as DNA or RNA
Backbone Atoms List of lists The atom identity of each backbone coordinate point, first coordinate point is the phosphorus with symbol P and PDB entry P, next atom is the oxygen atom that is bonded to the phosphorus with symbol O and PDB entry OP1 etc...
Base Atoms List of lists The atom identity of each nitrogenous base coordinate point
Chi Angle Atoms List of lists The atoms in the sidechain that are contributing to a chi angle
Bonds Dictionary The bond graph as an adjacency list
BondOrderss Dictionary The bond order graph as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds

Community & Contributions

Contributions are welcome! Open an issue or pull request on GitHub, or just email me.

Chat with users and contributors in real time on IRC: join the #pose channel on the irc.libera.chat network, or use the Libera web chat (no install needed).

Come ask questions, share what you've built with Pose, or discuss contributions.