Similar articles
 Found 20 similar articles; search took 406 ms
1.
Hierarchical classifications of the 20 amino acids according to residue relationships within scoring matrices have not hitherto been tested for reliability. Testing here of the residue groupings obtained in this way from 18 published matrices shows that they vary considerably in reliability. This behaviour gives new insight into the matrices with respect to the relationships between the amino acid scores they contain. For example, other than the trivial grouping of all 20 amino acids, no reliable residue grouping is present in all 18 matrix-derived amino acid hierarchical classifications. Hierarchical classification of the 18 scoring matrices themselves is investigated in terms of matrix representation and the choice of similarity and dissimilarity measures for matrix comparison. There is, of course, no absolute standard against which to compare a matrix clustering, but it is possible to assess the usefulness of a measure for the purpose in terms of the reliability of the calculated tree. Matrix representation is shown to be important. Finally, a novel two-step approach for hierarchical classification of the 18 amino acid scoring matrices is described.
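
As a rough illustration of how residue groupings can be read off a scoring matrix, the following Python sketch builds a hierarchy by agglomerative clustering. Average linkage, the function name `average_linkage`, and the four toy scores are assumptions made for this example; the abstract does not state which linkage the authors used, and the scores are not taken from any of the 18 matrices.

```python
def average_linkage(labels, sim):
    """Agglomerative clustering on similarity scores (average linkage, assumed).

    labels: residue one-letter codes.
    sim: dict keyed by frozenset({a, b}) giving the pairwise score.
    Returns the hierarchy as nested tuples.
    """
    clusters = [(label,) for label in labels]      # current cluster members
    trees = {(label,): label for label in labels}  # nested-tuple tree per cluster

    def avg_sim(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average similarity.
        _, i, j = max(
            (avg_sim(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        c1, c2 = clusters[i], clusters[j]
        merged = c1 + c2
        trees[merged] = (trees[c1], trees[c2])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return trees[clusters[0]]


# Hypothetical toy scores for four residues (illustration only).
toy_scores = {
    frozenset(("I", "V")): 3, frozenset(("D", "E")): 2,
    frozenset(("I", "D")): -3, frozenset(("I", "E")): -3,
    frozenset(("V", "D")): -3, frozenset(("V", "E")): -2,
}
tree = average_linkage(["I", "V", "D", "E"], toy_scores)
```

With these toy scores the hydrophobic pair and the acidic pair are each merged before the two groups are joined. Judging the reliability of such groupings, which is the subject of the abstract, would need an additional resampling step that this sketch does not attempt.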

2.
3.
4.
5.
Changing effective population size and the McDonald-Kreitman test   Cited by: 2 (self-citations: 0, citations by others: 2)
Eyre-Walker A. Genetics, 2002, 162(4): 2017-2024
Artifactual evidence of adaptive amino acid substitution can be generated within a McDonald-Kreitman test if some amino acid mutations are slightly deleterious and there has been an increase in effective population size. Here I investigate the conditions under which this occurs. I show that fairly small increases in effective population size can generate artifactual evidence of positive selection if there is no selection upon synonymous codon use. This problem is exacerbated by the removal of low-frequency polymorphisms. However, selection on synonymous codon use restricts the conditions under which artifactual evidence of adaptive evolution is produced.
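
For orientation, the McDonald-Kreitman test compares the ratio of nonsynonymous to synonymous divergence between species (Dn/Ds) with the ratio of nonsynonymous to synonymous polymorphism within a species (Pn/Ps). The Python sketch below computes two commonly used summaries of that 2x2 table, the neutrality index and the derived estimate α = 1 - (Ds·Pn)/(Dn·Ps). The function name `mk_summary` and the counts are hypothetical and do not come from the paper.

```python
def mk_summary(dn, ds, pn, ps):
    """Return (neutrality_index, alpha) for a 2x2 McDonald-Kreitman table.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # estimated fraction of adaptive substitutions
    return neutrality_index, alpha


# Hypothetical counts, for illustration only.
ni, alpha = mk_summary(dn=20, ds=40, pn=5, ps=20)
```

A value of α above zero is usually read as evidence of adaptive substitution. The abstract's point is that such a value can also arise as an artifact when some amino acid mutations are slightly deleterious and the effective population size has increased.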

6.
Local hydrophobic collapse of the polypeptide chain and transient long-range interactions in unfolded states of apomyoglobin appear to occur in regions of the amino acid sequence which, upon folding, bury an above-average area of hydrophobic surface. To explore the role of these interactions in protein folding, we prepared and characterized apomyoglobins with compensating point mutations designed to change the average buried surface area in local regions of the sequence, while conserving as much as possible the constitution of the hydrophobic core. The behavior of the mutants in quench-flow experiments to determine the folding pathway was exactly as predicted by the changes in the buried surface area parameter calculated from the amino acid sequence. In addition, spin label experiments with acid-unfolded mutant apomyoglobin showed that the transient long-range contacts that occur in the wild-type protein are abolished in the mutant, while new contacts are observed between areas that now have above-average buried surface area. We conclude that specific groupings of amino acid side-chains, which can be predicted from the sequence, are responsible for early hydrophobic interactions in the first phase of folding in apomyoglobin, and that these early interactions determine the subsequent course of the folding process.

7.
Mishra P, Pandey PN. Bioinformation, 2011, 6(10): 372-374
The number of amino acid sequences in protein databases such as Swiss-Prot, UniProt, PIR and others is increasing very rapidly, but structures are available in the Protein Data Bank for only some of these sequences. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on the distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity, meaning that sequences in the same cluster are highly similar to each other; and separation, meaning that sequences in different clusters have low similarity to each other. We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
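
The idea of reading clusters off a similarity graph can be sketched in Python as follows. This is a simplified illustration, not the authors' method: it assumes an edge is drawn whenever a pairwise score reaches a fixed threshold and takes the connected components of the resulting graph as clusters. The function name, threshold, sequence names and scores are all hypothetical.

```python
def cluster_by_similarity(names, scores, threshold):
    """Group sequences into connected components of a thresholded similarity graph.

    names: list of sequence identifiers.
    scores: dict mapping (name_a, name_b) to a pairwise similarity score.
    threshold: an edge is drawn when score >= threshold (assumed rule).
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sequences and scores, for illustration only.
names = ["seqA", "seqB", "seqC", "seqD"]
scores = {("seqA", "seqB"): 0.9, ("seqB", "seqC"): 0.8, ("seqC", "seqD"): 0.1}
clusters = cluster_by_similarity(names, scores, threshold=0.5)
```

Here seqA, seqB and seqC end up in one cluster and seqD is a singleton. Plain connected components favour separation over homogeneity, because seqA and seqC are linked only through seqB, so a method that enforces both criteria would need a stricter rule than this sketch uses.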

8.
We have developed a web server, iPTREE-STAB, for discriminating the stability of proteins (stabilizing or destabilizing) and predicting their stability changes (ΔΔG) upon single amino acid substitutions from the amino acid sequence. The discrimination and prediction are mainly based on a decision tree coupled with an adaptive boosting algorithm, and on a classification and regression tree, respectively, using three neighboring residues of the mutant site towards both the N- and C-termini. Our method showed an accuracy of 82% for discriminating stabilizing from destabilizing mutants, and a correlation of 0.70 for predicting protein stability changes upon mutation. AVAILABILITY: http://bioinformatics.myweb.hinet.net/iptree.htm. SUPPLEMENTARY INFORMATION: Dataset and other details are given.

9.
10.
MOTIVATION: Obtaining soluble proteins in sufficient concentrations is a recurring limiting factor in various experimental studies. Solubility is an individual trait of proteins which, under a given set of experimental conditions, is determined by their amino acid sequence. Accurate theoretical prediction of solubility from sequence is instrumental for setting priorities on targets in large-scale proteomics projects. RESULTS: We present a machine-learning approach called PROSO to assess the chance of a protein to be soluble upon heterologous expression in Escherichia coli based on its amino acid composition. The classification algorithm is organized as a two-layered structure in which the output of primary support vector machine (SVM) classifiers serves as input for a secondary Naive Bayes classifier. Experimental progress information from the TargetDB database as well as previously published datasets were used as the source of training data. In comparison with previously published methods our classification algorithm possesses improved discriminatory capacity characterized by the Matthews Correlation Coefficient (MCC) of 0.434 between predicted and known solubility states and the overall prediction accuracy of 72% (75 and 68% for positive and negative class, respectively). We also provide experimental verification of our predictions using solubility measurements for 31 mutational variants of two different proteins.
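
The Matthews Correlation Coefficient quoted above is computed from the four cells of a binary confusion matrix as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python sketch follows. The function name `mcc_from_counts` and the counts are hypothetical: the counts assume a balanced test set of 100 soluble and 100 insoluble proteins and are chosen only to mirror the reported per-class accuracies, since the paper's actual confusion matrix is not given in the abstract.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 75 of 100 positives and 68 of 100 negatives correct.
mcc = mcc_from_counts(tp=75, tn=68, fp=32, fn=25)
accuracy = (75 + 68) / 200
```

Unlike overall accuracy, the MCC uses all four cells of the table, which is why it is often reported alongside accuracy when the two classes are predicted with different success rates.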

11.
12.
Plains Anthropologist, 2013, 58(76): 123-132
Abstract

This paper presents the results of an analysis of 24 burial mounds and cairns in southwest Missouri. All of the sites belong to the Fristoe Burial Complex, with an estimated age of A.D. 500 to A.D. 1000. Data selected for analysis consisted of 41 traits distributed in varying numbers among the sites. In order to observe relationships between sites, a Q-type factor analysis was used. An orthogonal rotation yielded five factors, of which three are discussed. Factors 2 and 5 are not discussed because of the small number of sites explained by them. Factor 1 loads heavily on eight sites, and Factors 3 and 4 each on seven sites. Seventeen of the 24 sites are explained by the three factors, which are hypothesized to represent three distinct temporal groupings. Factor 1 is thought to depict a late grouping based on trade relationships between the Gulf Coast and the Ozark Highlands. Factor 4 appears to be a grouping of Late Woodland elements, and Factor 3 possibly represents a set of Mississippian and Late Woodland elements. Results of the analysis allow us to hypothesize three temporal groupings within the Fristoe Burial Complex. The results further indicate that factor analysis can be used as a technique to order archeological materials and generate hypotheses.

13.
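How to reduce the consumption of attentional resources and improve human performance in dynamic visual sustained-attention tasks is a central question in research on sustained attention, with both theoretical and practical significance. The multiple object tracking (MOT) task is a common laboratory method for studying sustained attention in individuals. In MOT, observers can exploit grouping effects based on object features to perceive several moving targets as one larger moving unit, which reduces the consumption of attentional resources and improves tracking performance. To save attentional resources further and improve attentional tracking, researchers have raised the question of whether grouping effects in attentional tracking are additive. Additivity means that grouping based on two or more features improves tracking performance more than grouping based on a single feature. Research on additivity matters for understanding the cognitive mechanisms of different grouping effects, the attentional mechanisms in dynamic visual tracking, and the allocation of attentional resources. This article reviews previous behavioral and neuroimaging studies, discusses the perceptual processing mechanisms of different types of grouping effects and their additivity, and systematically sets out the non-additivity of grouping based on different surface features, as well as the cognitive and neural basis of the additivity of grouping based on surface features combined with specific spatiotemporal features. Future work could use behavioral experiments to examine the additivity of grouping effects based on a wider range of features, or start from the neural mechanisms underlying different grouping effects in attentional tracking, to provide more cognitive and neural evidence for classifying grouping effects and for research on their additivity.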

14.
Phylogenetic reconstructions are a major component of many studies in evolutionary biology, but their accuracy can be reduced under certain conditions. Recent studies showed that the convergent evolution of some phenotypes resulted from recurrent amino acid substitutions in genes belonging to distant lineages. It has been suggested that these convergent substitutions could bias phylogenetic reconstruction toward grouping convergent phenotypes together, but such an effect has never been appropriately tested. We used computer simulations to determine the effect of convergent substitutions on the accuracy of phylogenetic inference. We show that, in some realistic conditions, even a relatively small proportion of convergent codons can strongly bias phylogenetic reconstruction, especially when amino acid sequences are used as characters. The strength of this bias does not depend on the reconstruction method but varies as a function of how much divergence had occurred among the lineages prior to any episodes of convergent substitutions. While the occurrence of this bias is difficult to predict, the risk of spurious groupings is strongly decreased by considering only 3rd codon positions, which are less subject to selection, as long as saturation problems are not present. Therefore, we recommend that, whenever possible, topologies obtained with amino acid sequences and 3rd codon positions be compared to identify potential phylogenetic biases and avoid evolutionarily misleading conclusions.

15.
MOTIVATION: Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. RESULTS: We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined, secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy of any method reported so far in the literature. Combining secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as the Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and the Crammer and Singer method, yield similar predictions. AVAILABILITY: Dataset and stand-alone program are available upon request.

16.
Recent methodological advances permit the estimation of species richness and occurrences for rare species by linking species‐level occurrence models at the community level. The value of such methods is underscored by the ability to examine the influence of landscape heterogeneity on species assemblages at large spatial scales. A salient advantage of community‐level approaches is that parameter estimates for data‐poor species are more precise as the estimation process “borrows” from data‐rich species. However, this analytical benefit raises a question about the degree to which inferences are dependent on the implicit assumption of relatedness among species. Here, we assess the sensitivity of community/group‐level metrics, and individual‐level species inferences given various classification schemes for grouping species assemblages using multispecies occurrence models. We explore the implications of these groupings on parameter estimates for avian communities in two ecosystems: tropical forests in Puerto Rico and temperate forests in northeastern United States. We report on the classification performance and extent of variability in occurrence probabilities and species richness estimates that can be observed depending on the classification scheme used. We found estimates of species richness to be most precise and to have the best predictive performance when all of the data were grouped at a single community level. Community/group‐level parameters appear to be heavily influenced by the grouping criteria, but were not driven strictly by total number of detections for species. We found different grouping schemes can provide an opportunity to identify unique assemblage responses that would not have been found if all of the species were analyzed together. 
We suggest three guidelines: (1) classification schemes should be determined based on study objectives; (2) model selection should be used to quantitatively compare different classification approaches; and (3) sensitivity of results to different classification approaches should be assessed. These guidelines should help researchers apply hierarchical community models in the most effective manner.

17.
18.
19.
To examine the questions of whether the additive and dominance effects present for morphological characters in racial crosses are of sufficient consistency and magnitude to allow such genetic effects to be used for racial classification, we used a diallel experiment among the 25 well-defined Mexican races of maize, which include the ancestral stocks of most commercial and genetic maize types. With such an experiment, genetic effects and genotype by environmental interactions for one or more characters can be used to measure genetic and adaptational or environmental similarity. We used average parental effects (general combining abilities), specific effects, and genotype by environmental effects of 21 characters from the diallel (grown at three locations) to group the Mexican races of maize. The groupings based upon average genetic effects and upon genotype by environmental interactions are more satisfactory than groupings based upon specific effects. The standard errors for genetic distances based upon specific (largely dominance) effects seem to be too high for practical use. Principal components analyses of the same data suggest a similar conclusion. The groupings based upon average genetic effects are in general agreement with previous studies, with the exception of Maíz Dulce, which is grouped with the Cónicos, rather than being isolated from the other Mexican races of maize.

20.
MOTIVATION: A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families, and they can be estimated in linear time and space. RESULTS: We introduce a new algorithm, called Sparse Probabilistic Suffix Trees (SPST), that identifies equivalences between the contexts of a VLMC. We show that, in many cases, identifying these equivalences can improve the classification rate of the classical Probabilistic Suffix Trees (PST) algorithm. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains; this variation of the SPST algorithm is called F-SPST.
