首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Computational methods that predict protein stability changes induced by missense mutations have made a lot of progress over the past decades. Most of the available methods however have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations reducing protein stability. Moreover, few approaches could consistently perform well across different test cases. To address these issues, we developed a new computational method PremPS to more accurately evaluate the effects of missense mutations on protein stability. The PremPS method is composed of only ten evolutionary- and structure-based features and parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrate that our method performs well across different types of structures except for low-resolution structures and models built based on templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/, which allows to do large-scale mutational scanning and takes about four minutes to perform calculations for a single mutation per protein with ~ 300 residues and requires ~ 0.4 seconds for each additional mutation.  相似文献   

2.
3.
Next-generation sequencing(NGS) technologies generate thousands to millions of genetic variants per sample.Identification of potential disease-causal variants is labor intensive as it relies on filtering using various annotation metrics and consideration of multiple pathogenicity prediction scores.We have developed VPOT(variant prioritization ordering tool),a python-based command line tool that allows researchers to create a single fully customizable pathogenicity ranking score from any number of annotation values,each with a user-defined weighting.The use of VPOT can be informative when analyzing entire cohorts,as variants in a cohort can be prioritized.VPOT also provides additional functions to allow variant filtering based on a candidate gene list or by affected status in a family pedigree.VPOT outperforms similar tools in terms of efficacy,flexibility,scalability,and computational performance.VPOT is freely available for public use at Git Hub(https://github.com/VCCRI/VPOT/).Documentation for installation along with a user tutorial,a default parameter file,and test data are provided.  相似文献   

4.
Although complex diseases and traits are thought to have multifactorial genetic basis, the common methods in genome-wide association analyses test each variant for association independent of the others. This computational simplification may lead to reduced power to identify variants with small effect sizes and requires correcting for multiple hypothesis tests with complex relationships. However, advances in computational methods and increase in computational resources are enabling the computation of models that adhere more closely to the theory of multifactorial inheritance. Here, a Bayesian variable selection and model averaging approach is formulated for searching for additive and dominant genetic effects. The approach considers simultaneously all available variants for inclusion as predictors in a linear genotype-phenotype mapping and averages over the uncertainty in the variable selection. This leads to naturally interpretable summary quantities on the significances of the variants and their contribution to the genetic basis of the studied trait. We first characterize the behavior of the approach in simulations. The results indicate a gain in the causal variant identification performance when additive and dominant variation are simulated, with a negligible loss of power in purely additive case. An application to the analysis of high- and low-density lipoprotein cholesterol levels in a dataset of 3895 Finns is then presented, demonstrating the feasibility of the approach at the current scale of single-nucleotide polymorphism data. We describe a Markov chain Monte Carlo algorithm for the computation and give suggestions on the specification of prior parameters using commonly available prior information. An open-source software implementing the method is available at http://www.lce.hut.fi/research/mm/bmagwa/ and https://github.com/to-mi/.  相似文献   

5.

Background

With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertion and deletion, and large structural variations such as duplications and deletions. Insertion and deletion (indel) variants comprise a major proportion of human genetic variation. However, little is known about their effects on humans. The absence of understanding is largely due to the lack of both biological data and computational resources.

Results

This paper presents a new indel functional prediction method HMMvar based on HMM profiles, which capture the conservation information in sequences. The results demonstrate that a scoring strategy based on HMM profiles can achieve good performance in identifying deleterious or neutral variants for different data sets, and can predict the protein functional effects of both single and multiple mutations.

Conclusions

This paper proposed a quantitative prediction method, HMMvar, to predict the effect of genetic variation using hidden Markov models. The HMM based pipeline program implementing the method HMMvar is freely available at https://bioinformatics.cs.vt.edu/zhanglab/hmm.  相似文献   

6.
7.
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein–protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, their application can be challenging to users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs and aggregates raw data for multiple variants, as well as generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔGs values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction .  相似文献   

8.
VarSite is a web server mapping known disease‐associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image‐based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease‐associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite .  相似文献   

9.
10.
With rapid decline of the sequencing cost, researchers today rush to embrace whole genome sequencing (WGS), or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them uses the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
This is a PLOS Computational Biology Software Article.
  相似文献   

11.
12.
A huge number of high-quality predicted protein structures are now publicly available. However, many of these structures contain non-globular regions, which diminish the performance of downstream structural bioinformatic applications. In this study, we develop AlphaCutter for the removal of non-globular regions from predicted protein structures. A large-scale cleaning of 542,380 predicted SwissProt structures highlights that AlphaCutter is able to (1) remove non-globular regions that are undetectable using pLDDT scores and (2) preserve high integrity of the cleaned domain regions. As useful applications, AlphaCutter improved the folding energy scores and sequence recovery rates in the re-design of domain regions. On average, AlphaCutter takes less than 3 s to clean a protein structure, enabling efficient cleaning of the exploding number of predicted protein structures. AlphaCutter is available at https://github.com/johnnytam100/AlphaCutter . AlphaCutter-cleaned SwissProt structures are available for download at https://doi.org/10.5281/zenodo.7944483 .  相似文献   

13.
《Journal of molecular biology》2019,431(11):2197-2212
Knowledge of protein structure can be used to predict the phenotypic consequence of a missense variant. Since structural coverage of the human proteome can be roughly tripled to over 50% of the residues if homology-predicted structures are included in addition to experimentally determined coordinates, it is important to assess the reliability of using predicted models when analyzing missense variants. Accordingly, we assess whether a missense variant is structurally damaging by using experimental and predicted structures. We considered 606 experimental structures and show that 40% of the 1965 disease-associated missense variants analyzed have a structurally damaging change in the mutant structure. Only 11% of the 2134 neutral variants are structurally damaging. Importantly, similar results are obtained when 1052 structures predicted using Phyre2 algorithm were used, even when the model shares low (< 40%) sequence identity to the template. Thus, structure-based analysis of the effects of missense variants can be effectively applied to homology models. Our in-house pipeline, Missense3D, for structurally assessing missense variants was made available at http://www.sbg.bio.ic.ac.uk/~missense3d  相似文献   

14.

Background

Germline pathogenic variants in the breast cancer type 1 susceptibility gene BRCA1 are associated with a 60% lifetime risk for breast and ovarian cancer. This overall risk estimate is for all BRCA1 variants; obviously, not all variants confer the same risk of developing a disease. In cancer patients, loss of BRCA1 function in tumor tissue has been associated with an increased sensitivity to platinum agents and to poly-(ADP-ribose) polymerase (PARP) inhibitors. For clinical management of both at-risk individuals and cancer patients, it would be important that each identified genetic variant be associated with clinical significance. Unfortunately for the vast majority of variants, the clinical impact is unknown. The availability of results from studies assessing the impact of variants on protein function may provide insight of crucial importance.

Results and conclusion

We have collected, curated, and structured the molecular and cellular phenotypic impact of 3654 distinct BRCA1 variants. The data was modeled in triple format, using the variant as a subject, the studied function as the object, and a predicate describing the relation between the two. Each annotation is supported by a fully traceable evidence. The data was captured using standard ontologies to ensure consistency, and enhance searchability and interoperability. We have assessed the extent to which functional defects at the molecular and cellular levels correlate with the clinical interpretation of variants by ClinVar submitters. Approximately 30% of the ClinVar BRCA1 missense variants have some molecular or cellular assay available in the literature. Pathogenic variants (as assigned by ClinVar) have at least some significant functional defect in 94% of testable cases. For benign variants, 77% of ClinVar benign variants, for which neXtProt Cancer variant portal has data, shows either no or mild experimental functional defects. While this does not provide evidence for clinical interpretation of variants, it may provide some guidance for variants of unknown significance, in the absence of more reliable data.The neXtProt Cancer variant portal (https://www.nextprot.org/portals/breast-cancer) contains over 6300 observations at the molecular and/or cellular level for BRCA1 variants.
  相似文献   

15.
16.
17.
Deep mutational scanning provides unprecedented wealth of quantitative data regarding the functional outcome of mutations in proteins. A single experiment may measure properties (eg, structural stability) of numerous protein variants. Leveraging the experimental data to gain insights about unexplored regions of the mutational landscape is a major computational challenge. Such insights may facilitate further experimental work and accelerate the development of novel protein variants with beneficial therapeutic or industrially relevant properties. Here we present a novel, machine learning approach for the prediction of functional mutation outcome in the context of deep mutational screens. Using sequence (one-hot) features of variants with known properties, as well as structural features derived from models thereof, we train predictive statistical models to estimate the unknown properties of other variants. The utility of the new computational scheme is demonstrated using five sets of mutational scanning data, denoted “targets”: (a) protease specificity of APPI (amyloid precursor protein inhibitor) variants; (b-d) three stability related properties of IGBPG (immunoglobulin G-binding β1 domain of streptococcal protein G) variants; and (e) fluorescence of GFP (green fluorescent protein) variants. Performance is measured by the overall correlation of the predicted and observed properties, and enrichment—the ability to predict the most potent variants and presumably guide further experiments. Despite the diversity of the targets the statistical models can generalize variant examples thereof and predict the properties of test variants with both single and multiple mutations.  相似文献   

18.
Venus is a yellow fluorescent protein that has been developed for its fast chromophore maturation rate and bright yellow fluorescence that is relatively insensitive to changes in pH and ion concentrations. Here, we present a detailed study of the stability and folding of Venus in the pH range from 6.0 to 8.0 using chemical denaturants and a variety of spectroscopic probes. By following hydrogen-deuterium exchange of 15N-labeled Venus using NMR spectroscopy over 13 months, residue-specific free energies of unfolding of some highly protected amide groups have been determined. Exchange rates of less than one per year are observed for some amide groups. A super-stable core is identified for Venus and compared with that previously reported for green fluorescent protein. These results are discussed in terms of the stability and folding of fluorescent proteins. Under mildly acidic conditions, we show that Venus undergoes a drastic decrease in yellow fluorescence at relatively low concentrations of guanidinium chloride. A detailed study of this effect establishes that it is due to pH-dependent, nonspecific interactions of ions with the protein. In contrast to previous studies on enhanced green fluorescence protein variant S65T/T203Y, which showed a specific halide ion-binding site, NMR chemical shift mapping shows no evidence for specific ion binding. Instead, chemical shift perturbations are observed for many residues primarily located in both lids of the β-barrel structure, which suggests that small scale structural rearrangements occur on increasing ionic strength under mildly acidic conditions and that these are propagated to the chromophore resulting in fluorescence quenching.  相似文献   

19.
Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNPs genotyped by currently available arrays used for genomic selection. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequence variant genotypes have not been reported. Improvement in accuracy of predicted breeding values may be limited by the accuracy of imputed sequence variants. Using genotypes of SNPs on a high‐density array and non‐synonymous SNPs detected in sequence from influential sires of a multibreed population, results of this examination suggest that linkage disequilibrium between non‐synonymous and array SNPs may be insufficient for accurate imputation from the array to sequence. In contrast to 75% of array SNPs being strongly correlated to another SNP on the array, less than 25% of the non‐synonymous SNPs were strongly correlated to an array SNP. When correlations between non‐synonymous and array SNPs were strong, distances between the SNPs were greater than separation that might be expected based on linkage disequilibrium decay. Consistently near‐perfect whole‐genome linkage disequilibrium between the full array and each non‐synonymous SNP within the sequenced bulls suggests that whole‐genome approaches to infer sequence variants might be more accurate than imputation based on local haplotypes. Opportunity for strong linkage disequilibrium between sequence and array SNPs may be limited by discrepancies in allele frequency distributions, so investigating alternate genotyping approaches and panels providing greater chances of frequency‐matched SNPs strongly correlated to sequence variants is also warranted. Genotypes used for this study are available from https://www.animalgenome.org/repository/pub/ ;USDA2017.0519/.  相似文献   

20.
Predicting the effect of missense variations on protein stability and dynamics is important for understanding their role in diseases, and the link between protein structure and function. Approaches to estimate these changes have been proposed, but most only consider single‐point missense variants and a static state of the protein, with those that incorporate dynamics are computationally expensive. Here we present DynaMut2, a web server that combines Normal Mode Analysis (NMA) methods to capture protein motion and our graph‐based signatures to represent the wildtype environment to investigate the effects of single and multiple point mutations on protein stability and dynamics. DynaMut2 was able to accurately predict the effects of missense mutations on protein stability, achieving Pearson's correlation of up to 0.72 (RMSE: 1.02 kcal/mol) on a single point and 0.64 (RMSE: 1.80 kcal/mol) on multiple‐point missense mutations across 10‐fold cross‐validation and independent blind tests. For single‐point mutations, DynaMut2 achieved comparable performance with other methods when predicting variations in Gibbs Free Energy (ΔΔG) and in melting temperature (ΔTm). We anticipate our tool to be a valuable suite for the study of protein flexibility analysis and the study of the role of variants in disease. DynaMut2 is freely available as a web server and API at http://biosig.unimelb.edu.au/dynamut2 .  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号