Similar articles
 20 similar articles found (search time: 31 ms)
1.
Summary A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.
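
For orientation, the random-substitution correction that the abstract contrasts against can be sketched as follows. This is the textbook Jukes-Cantor form for equally likely states and is offered only as an illustration; the abstract does not give the exact formula the authors used, and the function name and the `n_states` parameter are assumptions introduced here.

```python
import math


def jukes_cantor_distance(p: float, n_states: int = 4) -> float:
    """Estimated substitutions per site from the observed fraction p of differing sites.

    Assumes every state is equally likely to replace every other state
    (n_states=4 for nucleotides). Passing n_states=20 gives the analogous
    equal-rates correction for amino acids, which is an assumption here,
    not something stated in the abstract.
    """
    b = (n_states - 1) / n_states  # saturation level of p under the model
    if not 0.0 <= p < b:
        raise ValueError("p must lie in [0, (n_states - 1) / n_states)")
    return -b * math.log(1.0 - p / b)


if __name__ == "__main__":
    # The corrected distance exceeds the raw difference because
    # multiple hits at the same site are accounted for.
    print(jukes_cantor_distance(0.3))
```

Under this model the corrected distance is always at least as large as the observed difference, which is why an estimate that also accounts for nonrandom substitution patterns can come out larger still.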

2.
Indels in the coding regions of a gene can either cause frameshifts or amino acid insertions/deletions. Frameshifting indels are indels that have a length that is not divisible by 3 and subsequently cause frameshifts. Indels that have a length divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The new amino acid changes resulting from 3n indels could potentially affect protein function. Therefore, we construct a SIFT Indel prediction algorithm for 3n indels which achieves 82% accuracy, 81% sensitivity, 82% specificity, 82% precision, 0.63 MCC, and 0.87 AUC by 10-fold cross-validation. We have previously published a prediction algorithm for frameshifting indels. The rules for the prediction of 3n indels are different from the rules for the prediction of frameshifting indels and reflect the biological differences of these two different types of variations. SIFT Indel was applied to human 3n indels from the 1000 Genomes Project and the Exome Sequencing Project. We found that common variants are less likely to be deleterious than rare variants. The SIFT Indel prediction algorithm for 3n indels is available at http://sift-dna.org/
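
The distinction between frameshifting and 3n indels described above comes down to whether the length change is divisible by 3. A minimal sketch, assuming VCF-style reference and alternate allele strings (the function name and the input convention are hypothetical and are not part of SIFT Indel):

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Label a coding indel as '3n' or 'frameshift' from its length change.

    ref and alt are the reference and alternate allele strings; only the
    difference in their lengths matters for this classification.
    """
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have the same length")
    # A length change divisible by 3 keeps the reading frame intact.
    return "3n" if length_change % 3 == 0 else "frameshift"


if __name__ == "__main__":
    print(classify_coding_indel("A", "ATTT"))  # 3-base insertion
    print(classify_coding_indel("ACG", "A"))   # 2-base deletion
```

This only separates the two classes; predicting whether a given 3n indel is deleterious is the job of the trained model described in the abstract.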

3.
Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

4.

Background

Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and the frequency of missense variants in MSH6 is the highest among these MMR genes. Because of this high frequency, genetic testing has not so far been used effectively for MSH6. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6.

Methods

We integrated the prediction results of three methods, namely MAPP, PolyPhen-2 and SIFT. Two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. MSH6 germline missense variants classified by their associated clinical and molecular data were used to fit the parameters for the logistic regression model and to assess the prediction. The performance of CoDP was compared with those of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR.

Results

A total of 294 germline missense variants were collected from the variant databases and literature. Of them, 34 variants were available for the parameter training and the prediction performance test. We integrated the prediction results of MAPP, PolyPhen-2 and SIFT, and two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. Variant data classified by their associated clinical and molecular data were used to fit the parameters for the logistic regression model and to assess the prediction. The values of the positive predictive value (PPV), the negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. PPV of CoDP was 93.3% (14/15), NPV was 94.7% (18/19), specificity was 94.7% (18/19), sensitivity was 93.3% (14/15) and accuracy was 94.1% (32/34). Area under the curve of CoDP was 0.954, that of MAPP for MSH6 was 0.919, of SIFT was 0.864 and of PolyPhen-2 HumVar was 0.819. The power to distinguish between pathogenic and non-pathogenic variants of these methods was tested by Wilcoxon rank sum test (p < 8.9 × 10⁻⁶ for CoDP, p < 3.3 × 10⁻⁵ for MAPP, p < 3.1 × 10⁻⁴ for SIFT and p < 1.2 × 10⁻³ for PolyPhen-2 HumVar), and CoDP was shown to outperform other conventional methods.
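
The CoDP percentages reported above follow from a single confusion matrix. The counts used below (14 true positives, 1 false positive, 18 true negatives, 1 false negative) are inferred from the quoted fractions 14/15, 18/19 and 32/34 rather than stated directly in the abstract, and the function name is introduced here for illustration only.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics computed from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Counts inferred from the fractions quoted in the Results.
    for name, value in diagnostic_metrics(tp=14, fp=1, tn=18, fn=1).items():
        print(f"{name}: {value:.1%}")
```

With one false positive and one false negative, PPV equals sensitivity and NPV equals specificity, which is why the abstract reports each fraction twice.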

Conclusion

In this paper, we provide a human-curated data set for MSH6 missense variants, and CoDP, the prediction tool, which achieved better accuracy for predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at http://cib.cf.ocha.ac.jp/CoDP/.

5.
Xi T, Jones IM, Mohrenweiser HW. Genomics 2004, 83(6):970-979
Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant from Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant." Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or possibly damaging." Another 9-15% of the variants were classed as "Potentially intolerant or damaging." The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign." Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

6.
Patterns of nucleotide substitution in pseudogenes and functional genes (cited 26 times in total: 0 self-citations, 26 citations by others)
Summary The pattern of point mutations is inferred from nucleotide substitutions in pseudogenes. The pattern obtained suggests that transition mutations occur somewhat more frequently than transversion mutations and that mutations result more often in A or T than in G or C. Our results are discussed with respect to the predictions from Topal and Fresco's model for the molecular basis of point (substitution) mutations (Nature 263:285–289, 1976). The pattern of nucleotide substitution at the first and second positions of codons in functional genes is quite similar to that in pseudogenes, but the relative frequency of the transition C→T in the sense strand is drastically reduced and those of the transversions C→G and G→C are doubled. The differences between the two patterns can be explained by the observation that in protein evolution amino acid substitutions occur mainly between amino acids with similar biochemical properties (Grantham, Science 185:862–864, 1974). Our results for the patterns of nucleotide substitutions in pseudogenes and in functional genes lead to the prediction that both the coding and non-coding regions of protein coding genes should have high frequencies of A and T. Available data show that the non-coding regions are indeed high in A and T but the coding regions are low in T, though high in A.

7.
Vicatos S, Reddy BV, Kaznessis Y. Proteins 2005, 58(4):935-949
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in protein sequence but proximal in its three-dimensional tertiary structure. Multiple sequence alignments (MSA) containing a sequence of known structure for 127 families from the PFAM database have been selected so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs with average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates to hydrophobicity, provides the most reliable results. The other two predictors give good predictions which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent accessible area estimations for filtering false positives out of the predictions is promising.

8.
Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions.

9.
10.
The 44-amino-acid E5 protein of bovine papillomavirus type 1 is the shortest known protein with transforming activity. To identify the specific amino acids required for in vitro focus formation in mouse C127 cells, we used oligonucleotide-directed saturation mutagenesis to construct an extensive collection of mutants with missense mutations in the E5 gene. Characterization of mutants with amino acid substitutions in the hydrophobic middle third of the E5 protein indicated that efficient transformation requires a stretch of hydrophobic amino acids but not a specific amino acid sequence in this portion of the protein. Many amino acids in the carboxyl-terminal third of the protein can also undergo substitution without impairment of focus-forming activity, but the amino acids at seven positions, including two cysteine residues that mediate dimer formation, appear essential for efficient transforming activity. These essential amino acids are the most well conserved among related fibropapillomaviruses. The small size of the E5 protein, its lack of similarity to other transforming proteins, and its ability to tolerate many amino acid substitutions imply that it transforms cells via a novel mechanism.

11.
Bacteria with reduced DNA polymerase I activity have increased sensitivity to killing by chain-terminating nucleotides (S. A. Rashbaum and N. R. Cozzarelli, Nature 264:679-680, 1976). We have used this observation as the basis of a genetic strategy to identify mutations in the dnaE (polC) gene of Escherichia coli that alter sensitivity to 2',3'-dideoxyadenosine (ddA). Two dnaE (polC) mutant strains with increased sensitivity to ddA and one strain with increased resistance were isolated and characterized. The mutant phenotypes are due to single amino acid substitutions in the alpha subunit, the protein product of the dnaE (polC) gene. Increased sensitivity to ddA is produced by the L329F and H417Y substitutions, and increased resistance is produced by the G365S substitution. The L329F and H417Y substitutions also reduce the accuracy of DNA replication (the mutator phenotype), while the G365S substitution increases accuracy (the antimutator phenotype). All of the amino acid substitutions are in conserved regions near essential aspartate residues. These results prove the effectiveness of the genetic strategy in identifying informative dnaE (polC) mutations that can be used to elucidate the molecular basis of nucleotide interactions in the alpha subunit of the DNA polymerase III holoenzyme.

12.
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.

13.
Prediction of protein stability upon amino acid substitutions is an important problem in molecular biology and it will be helpful for designing stable mutants. In this work, we have analyzed the stability of protein mutants using three different data sets of 1791, 1396, and 2204 mutants, respectively, for thermal stability (ΔTm), free energy change due to thermal (ΔΔG), and denaturant denaturations (ΔΔGH2O), obtained from the ProTherm database. We have classified the mutants into 380 possible substitutions and assigned the stability of each mutant using the information obtained with similar type of mutations. We observed that this assignment could distinguish the stabilizing and destabilizing mutants to an accuracy of 70-80% at different measures of stability. Further, we have classified the mutants based on secondary structure and solvent accessibility (ASA) and observed that the classification significantly improved the accuracy of prediction. The classification of mutants based on helix, strand, and coil distinguished the stabilizing/destabilizing mutants at an average accuracy of 82% and the correlation is 0.56; information about the location of residues at the interior, partially buried, and surface regions of a protein correctly identified the stabilizing/destabilizing residues at an average accuracy of 81% and the correlation is 0.59. The nine subclassifications based on three secondary structures and solvent accessibilities improved the accuracy of assigning stabilizing/destabilizing mutants to an accuracy of 84-89% for the three data sets. Further, the present method is able to predict the free energy change (ΔΔG) upon mutations within a deviation of 0.64 kcal/mol. We suggest that this method could be used for predicting the stability of protein mutants.
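
The two counts in this abstract can be checked by enumeration: 380 is the number of ordered pairs of distinct amino acids among the 20 standard ones (20 × 19), and the nine subclasses are the combinations of three secondary-structure states with three burial states. A small sketch, with variable names chosen here for illustration:

```python
from itertools import permutations, product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Ordered (wild-type, mutant) pairs of distinct residues: 20 * 19 of them.
substitutions = list(permutations(AMINO_ACIDS, 2))

# Structural context labels taken from the abstract's wording.
SECONDARY_STRUCTURE = ("helix", "strand", "coil")
BURIAL = ("interior", "partially buried", "surface")
subclasses = list(product(SECONDARY_STRUCTURE, BURIAL))

if __name__ == "__main__":
    print(len(substitutions), "substitution types")
    print(len(subclasses), "structural subclasses")
```

The pairs are ordered because a substitution and its reverse (for example A to G versus G to A) are distinct mutation types that need not have the same effect on stability.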

14.
Amino acid composition and the evolutionary rates of protein-coding genes (cited 14 times in total: 0 self-citations, 14 citations by others)
Summary Based on the rates of amino acid substitution for 60 mammalian genes of 50 codons or more, it is shown that the rate of amino acid substitution of a protein is correlated with its amino acid composition. In particular, the content of glycine residues is negatively correlated with the rate of amino acid substitution, and this content alone explains about 38% of the total variation in amino acid substitution rates among different protein families. The propensity of a polypeptide to evolve fast or slowly may be predicted from an index or indices of protein mutability directly derivable from the amino acid composition. The propensity of an amino acid to remain conserved during evolutionary times depends not so much on its being featured prominently in active sites, but on its stability index, defined as the mean chemical distance [R. Grantham (1974) Science 185:862–864] between the amino acid and its mutational derivatives produced by single-nucleotide substitutions. Functional constraints related to active and binding sites of proteins play only a minor role in determining the overall rate of amino acid substitution. The importance of amino acid composition in determining rates of substitution is illustrated with examples involving cytochrome c, cytochrome b5, ras-related genes, the calmodulin protein family, and fibrinopeptides.

15.
Ataxia telangiectasia (AT) is an autosomal recessive disorder characterized by cerebellar ataxia, telangiectasia, immunodeficiency, elevated α-fetoprotein levels, chromosomal instability, predisposition to cancer, and radiation sensitivity. We report the identification of a new, double missense mutation in the ataxia telangiectasia gene (ATM) of a Dutch family. This homozygous mutation consists of two consecutive base substitutions in exon 55: a T→G transversion at position 7875 of the ATM cDNA and a G→C transversion at position 7876. These transversions were confirmed by polymerase chain reaction/primer-induced restriction analysis with CelII. The double base substitution results in an amino acid change of an aspartic acid to a glutamic acid at codon 2625 and of an alanine to a proline at codon 2626 of the ATM protein. Both amino acids are conserved between the ATM protein and its functional homolog, the Atm gene product in the mouse. Furthermore, the Chou-Fasman and Robson predictions both demonstrate a change in the secondary structure of the ATM protein carrying the D2625E/A2626P mutation. These findings suggest that the double base substitution in the ATM gene is a disease-causing mutation. Received: 6 October 1997 / Accepted: 5 November 1997
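
The link between the two cDNA positions and the two affected codons can be reproduced with integer arithmetic. The sketch below assumes that cDNA numbering starts at 1 on the first base of the start codon, which is consistent with the positions and codons quoted in the abstract but is not stated there; the function name is hypothetical.

```python
from typing import Tuple


def cdna_to_codon(position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon).

    Both returned values are 1-based; assumes position 1 is the first
    base of the start codon.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    codon_index, offset = divmod(position - 1, 3)
    return codon_index + 1, offset + 1


if __name__ == "__main__":
    print(cdna_to_codon(7875))  # third base of codon 2625
    print(cdna_to_codon(7876))  # first base of codon 2626
```

Two adjacent bases therefore fall in different codons here, which explains how consecutive substitutions produce two separate amino acid changes (D2625E and A2626P) rather than one.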

16.
We analyzed the transactivation function of the acidic segment of the Ah receptor (amino acids 515-583) by reconstituting AhR-defective mouse hepatoma cells with mutants. Our data reveal that both hydrophobic and acidic residues are important for transactivation and that these residues are clustered in two regions of the acidic segment of AhR. Both regions are crucial for function, because disruption of either one substantially impairs transactivation of the chromosomal CYP1A1 target gene. Neither region contains an amino acid motif that resembles those reported for other acidic activation domains. Furthermore, proline substitutions in both regions do not impair transactivation in vivo, a finding that implies that alpha-helix formation is not required for function.

17.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.

18.
Mutations in the fibrinogen gamma chain (FGG) gene have been associated with various disorders, such as dysfibrinogenemia, thrombophilia, and hypofibrinogenemia. A literature survey showed that a residue exchange in fibrinogen Milano I from γ Asp to Val at position 330 impairs fibrin polymerization. The D356V (D330V) mutation located in the C-terminus was predicted to be highly deleterious and to affect the function of the protein. The pathogenicity of the altered gene and changes in protein functions were predicted using in silico methods, such as SIFT, PolyPhen 2, I-Mutant 3.0, Align GV–GD, PhD–SNP, and SNPs&GO. The secondary structure of the mutant protein was unwound by the end of the 50-ns simulation period, and a structural change in the helix-turn transition of the alpha-helical (352–356) region residues was observed. Moreover, a change in the length of the helical region was visualized in the mutant trajectory file, indicating the local transient unfolding of the protein. The obtained computational results suggest that the substitution of the neutral amino acid valine for the acidic amino acid aspartic acid at position 356 results in an unwound conformation within 50 ns, which might contribute to defective polymerization. Our analysis also provides insights into the effect of the conformational change in the D356V (D330V) mutant on protein structure and function.

19.
Lee TC, Lee AS, Li KB. Amino Acids 2008, 35(3):615-626
Determining if missense mutations are deleterious is critical for the analysis of genes implicated in disease. However, the mutational effects of many missense mutations in databases like the Breast Cancer Information Core are unclassified. Several approaches have emerged recently to determine such mutational effects but none have utilized amino acid property indices. We modified a previously described phylogenetic approach by first classifying benign substitutions based on the assumption that missense mutations that are maintained in orthologs are unlikely to affect function. A consensus conservation score based on 16 amino acid properties was used to characterize the remaining substitutions. This approach was evaluated with experimentally verified T4 lysozyme missense mutations and is shown to be able to sieve out putative biochemically and structurally important residues. The use of amino acid properties can enhance the prediction of biochemically and structurally important residues and thus also predict the significance of missense mutations.

20.
Germline TP53 mutations result in cancer proneness syndromes known as Li-Fraumeni, Li-Fraumeni-like, and nonsyndromic predisposition with or without family history. To explore genotype/phenotype associations, we previously adopted a functional classification of all germline TP53 mutant alleles based on transactivation. Severe deficiency (SD) alleles were associated with more severe cancer proneness syndromes, and a larger number of tumors, compared with partial deficiency (PD) alleles. Because mutant p53 can exert dominant-negative (DN) effects, we addressed the relationship between DN and clinical manifestations. We reasoned that DN effects might be stronger in familial cancer cases associated with germline TP53 mutations, where mutant alleles coexist with the wild-type allele since conception. We examined 104 p53 mutant alleles with single amino acid substitutions described in the IARC germline database for (i) transactivation capability and (ii) capacity to reduce the activity of the wild-type allele (i.e., DN effect) using a quantitative yeast-based assay. The functional classifications of p53 alleles were then related to clinical variables. We confirmed that a classification based on transactivation alone can identify familial cancer cases with more severe clinical features. Classification based on DN effects allowed us to highlight similar associations but did not reveal distinct clinical subclasses of SD alleles, except for a correlation with tumor tissue prevalence. We conclude that in carriers of germline TP53 mutations transactivation-based classification of TP53 alleles appears more important for genotype/phenotype correlations than DN effects and that haplo-insufficiency of the TP53 gene is an important factor in cancer proneness in humans.
