DL-RP-MDS is a tool that annotates missense variants and predicts their possible impact on protein structure. The current version predicts novel variants in TP53, MLH1, and MSH2. Some highlights of this tool are:
- Deep learning algorithm trained on variant classifications from the ClinVar database
- Probabilistic classifier for predicting variant impact
- High-throughput analysis of missense variants
Introduction
In recent years, the deposition of genome variants in public databases has greatly outpaced the capacity of experimental functional assays. DNA damage repair (DDR) genes are among the genes with the highest number of unclassified variants, and single nucleotide variants (SNVs) make up most human genetic variation. Non-synonymous SNVs, or missense mutations, result in single amino acid substitutions, and their effects on the protein(s) are often enigmatic.
We have developed a deep learning (DL)-based system for decoding and classifying genetic variants. The system combines an unsupervised learning model (an autoencoder) and a neural network classifier with the Ramachandran plot-molecular dynamics simulation (RP-MDS) method to form DL-RP-MDS. Our novel approach focuses on gene-specific protein structures to predict the deleteriousness of the variants.
A brief description of the methods
We train DL-RP-MDS on benign and pathogenic variants in the ClinVar database, together with the crystal structure of the protein. A protein is made of amino acids joined by peptide bonds. The collective steric restrictions and intramolecular interactions of the amino acid side chains give the protein its distinct structure. With the protein backbone as an indicator of the residues' spatial distribution, the backbone rotation angles (phi and psi) describe the possible structural configurations of the protein. A single point mutation in the protein changes the local steric restrictions and the intramolecular interactions of the neighbouring residues. This physical principle is used to assess the changes in the structure and to provide a probability that the variant is deleterious.
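To make the phi and psi angles concrete, the following is a minimal sketch of how a backbone dihedral angle can be computed from four atom positions. It is an illustration only, not the code used by DL-RP-MDS or GROMACS; the function name `dihedral` and the coordinates are hypothetical. For residue i, phi is the dihedral over the atoms C(i-1), N(i), CA(i), C(i), and psi is the dihedral over N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so the two planes are perpendicular.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Each residue in each simulation frame contributes one (phi, psi) pair, and the collection of these pairs forms the Ramachandran plot.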
Using the wildtype crystal structure file as the template, we mutate a single amino acid in the structure and use molecular dynamics simulation to equilibrate the new system. The final 10 ns of the trajectories are extracted, and the Ramachandran plot is analysed for each frame. The combined Ramachandran plot is transferred to DL-RP-MDS for training and testing the model. Here, an unsupervised learning model (an autoencoder) and a neural network classifier (a multi-layer perceptron) are used to encode and decode the combined Ramachandran plots and to classify the variants. Depending on the protein structure and complexity, different numbers of ‘virtual dimensions’ are used, providing a gene-specific analysis of the variants. The model is trained to recognise benign and pathogenic variants from data collected in the ClinVar database, and gives each tested variant two predictive scores, P(U) (Unknown) and P(D) (Deleterious). The final classification is based on the highest probability of the prediction; for example, if P(D) = 0.7 and P(U) = 0.3, the variant is labelled as deleterious, and vice versa.
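The final labelling rule can be sketched in a few lines. This is an illustration of the rule as described here, not the tool's own code; the function name `classify_variant` is hypothetical, and two points are assumptions not stated in the source: that P(U) and P(D) sum to 1, and that a tie is reported as Unknown.

```python
def classify_variant(p_unknown, p_deleterious, tol=1e-6):
    """Return the label with the higher predicted probability."""
    # Assumption: the two scores are complementary probabilities.
    if abs(p_unknown + p_deleterious - 1.0) > tol:
        raise ValueError("P(U) and P(D) are expected to sum to 1")
    # Assumption: a tie is reported as Unknown; the source does not specify.
    return "Deleterious" if p_deleterious > p_unknown else "Unknown"


# The example from the text: P(D) = 0.7 and P(U) = 0.3.
label = classify_variant(p_unknown=0.3, p_deleterious=0.7)
```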
Please cite:
- Tam B, Qin Z, Zhao B, Wang SM, Lei CL. Integration of deep learning with Ramachandran plot molecular dynamics simulation for genetic variant classification. iScience. 2023 Feb 2;26(3):106122. doi: 10.1016/j.isci.2023.106122. PMID: 36879825; PMCID: PMC9984559.
- Tam B, Sinha S, Qin Z, Wang SM. Comprehensive identification of deleterious TP53 missense VUS variants based on their impact on TP53 structural stability. International Journal of Molecular Sciences. 2021;22:11345.
- Tam B, Sinha S, Wang SM. Combining Ramachandran plot and molecular dynamics simulation for structural-based variant classification: using TP53 variants as model. Computational and Structural Biotechnology Journal. 2020;18:4033-4039.
Deep learning Ramachandran plot molecular dynamics simulation (DL-RP-MDS) predicts the possible impact of missense genetic variants on protein structure.
DL-RP-MDS deciphers the simulation output of GROMACS and predicts the pathogenicity of the selected variant.
GROMACS is a molecular dynamics software package; below is the recommended simulation setup. In particular, we used specific genomic positions and numbers of residues to train the model and to predict the deleteriousness of variants. We recommend using the XXX_WT.pdb file for single-point residue substitution.
User Input
- Analysis type. There are two options:
  - MDS + DL-RP-MDS
  - DL-RP-MDS
- Gene. There are three options:
  - TP53, residues 94 – 312
  - MLH1, residues 1 – 347
  - MSH2, residues 1 – 932
- Reference genome for the variants, given as hg19 or hg38.
- The wildtype residue, the position of the variant, and the variant residue.
- The results of the simulation will be sent to the given email address.
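Before submitting, it can help to check that a variant falls inside the residue range covered for the chosen gene. The sketch below is a hypothetical pre-submission check, not part of the DL-RP-MDS server. The residue ranges are taken from the list above; the function name `parse_variant` and the one-letter notation (for example R175H) are assumptions, since the actual input form may take these values as separate fields.

```python
import re

# Residue ranges covered by the current version, as listed above.
RESIDUE_RANGES = {"TP53": (94, 312), "MLH1": (1, 347), "MSH2": (1, 932)}
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_variant(gene, variant):
    """Split a variant such as 'R175H' into (wildtype, position, mutant).

    Assumption: one-letter amino acid codes in the form <wt><position><mutant>.
    """
    if gene not in RESIDUE_RANGES:
        raise ValueError(f"unsupported gene: {gene}")
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse variant: {variant}")
    wildtype, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wildtype not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError("unknown amino acid code")
    if wildtype == mutant:
        raise ValueError("a missense variant must change the residue")
    first, last = RESIDUE_RANGES[gene]
    if not first <= position <= last:
        raise ValueError(f"{gene} position {position} is outside residues {first}-{last}")
    return wildtype, position, mutant


parsed = parse_variant("TP53", "R175H")
```

A position outside the listed range, such as residue 72 in TP53, raises a `ValueError` in this sketch.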
The “MDS + DL-RP-MDS” option allows the user to submit a request for a molecular dynamics simulation (MDS) of the protein (see steps 2-4). DL-RP-MDS is performed after the simulation. MDS takes at least 2 days to complete.
The “DL-RP-MDS” option allows the user to upload an “XXX_rama.xvg” file and perform DL-RP-MDS. To create “XXX_rama.xvg”, we recommend extracting 334 frames with the “gmx rama” command in GROMACS.
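As a rough illustration of what such a file contains, the sketch below reads the text of an “XXX_rama.xvg” file and bins the angles into a combined Ramachandran histogram. It assumes the usual XVG layout, in which lines starting with `#` or `@` are comments or plot metadata and each data line holds phi, psi, and a residue label. The function names, the sample text, and the 10-degree bin width are hypothetical and are not the preprocessing used by DL-RP-MDS.

```python
def parse_rama_xvg(text):
    """Return a list of (phi, psi, residue_label) tuples from XVG text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments (#), and plot metadata (@).
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        rows.append((float(fields[0]), float(fields[1]), fields[2]))
    return rows


def bin_ramachandran(rows, bin_width=10):
    """Count (phi, psi) pairs on a grid covering -180 to 180 degrees."""
    n_bins = 360 // bin_width
    grid = [[0] * n_bins for _ in range(n_bins)]
    for phi, psi, _label in rows:
        col = min(int((phi + 180) // bin_width), n_bins - 1)
        row = min(int((psi + 180) // bin_width), n_bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical two-residue sample in the assumed layout.
SAMPLE = """# example header
@    title "Ramachandran Plot"
-57.0 -47.0 ALA-2
-120.0 130.0 GLY-3
"""

rows = parse_rama_xvg(SAMPLE)
grid = bin_ramachandran(rows)
```

Pooling the rows from all extracted frames before binning gives one combined plot per variant.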
Results
- The variant classification, with the probability of “Unknown, U” and the probability of “Deleterious, D”.
- The latent dimensions of the DL-RP-MDS results. Hovering the mouse over the image shows the variant location, marked by “x”. A green “x” represents Unknown and a red “x” represents Deleterious.
For general queries about DL-RP-MDS, please contact:
Benjamin Tam
University of Macau
Email: benjamintam@um.edu.mo
San Ming Wang
University of Macau
Email: sanmingwang@um.edu.mo
Chon Lok Lei
University of Macau
Email: chonloklei@um.edu.mo