Similar Documents
Found 20 similar documents (search time: 15 ms).
1.

Background

While many authors have discussed models and tools for studying protein evolution at the sequence level, molecular function is usually mediated by complex, higher order features such as independently folding domains and linear motifs that are based on or embedded in a particular arrangement of features such as secondary structure elements, transmembrane domains and regions with intrinsic disorder. This ‘protein architecture’ can, in its most simplistic representation, be visualized as domain organization cartoons that can be used to compare proteins in terms of the order of their mostly globular domains.

Methodology

Here, we describe a visual approach and a webserver for protein comparison that extend the domain organization cartoon concept. By developing an information-rich, compact visualization of different protein features above the sequence level, potentially related proteins can be compared at the level of propensities for secondary structure, transmembrane domains and intrinsic disorder, in addition to PFAM domains. A public Web server is available at www.proteinarchitect.net, while the code is provided at protarchitect.sourceforge.net.

Conclusions/Significance

Due to recent advances in sequencing technologies, we are now flooded with millions of predicted proteins that await comparative analysis. In many cases, mature tools focused on revealing hits with considerable global or local similarity to well-characterized proteins will not be able to lead us to testable hypotheses about a protein's function, or the function of a particular region. The visual comparison of different types of protein features with ProteinArchitect will be useful for assessing the relevance of similarity search hits, discovering subgroups in protein families and superfamilies, and understanding protein regions with conserved features outside globular regions. Therefore, this approach is likely to help researchers develop testable hypotheses about a protein's function even if it is somewhat distant from the more characterized proteins, by facilitating the discovery of features that are conserved above the sequence level for comparison and further experimental investigation.
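To make the extended cartoon idea concrete, here is a minimal text-mode sketch that stacks feature tracks (for example a PFAM domain, a transmembrane segment and disordered regions) along one sequence axis so that two proteins can be compared track by track. `Feature`, `render_track` and `render_architecture` are hypothetical names; this is not ProteinArchitect's code, and the feature coordinates here are invented rather than predicted.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    symbol: str  # single character drawn for this feature type


def render_track(length: int, features: list[Feature], background: str = "-") -> str:
    """Draw one track; features are clipped to the sequence and later ones overwrite earlier ones."""
    track = [background] * length
    for f in features:
        for i in range(max(f.start, 1) - 1, min(f.end, length)):
            track[i] = f.symbol
    return "".join(track)


def render_architecture(name: str, length: int, tracks: dict[str, list[Feature]]) -> str:
    """One header line plus one aligned line per feature type (hypothetical layout)."""
    lines = [f"{name} ({length} aa)"]
    for label, feats in tracks.items():
        lines.append(f"{label:>10} |{render_track(length, feats)}|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_architecture("toy", 20, {
        "domain": [Feature(3, 10, "D")],
        "TM": [Feature(14, 18, "M")],
        "disorder": [Feature(1, 2, "~"), Feature(19, 20, "~")],
    }))
```

Keeping every track on the same residue axis is what lets propensity-type features (disorder, transmembrane) be read against domain boundaries at a glance.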

2.
Abstract

This is a sequel to the paper (1), where a model which describes ranged series of codon frequencies was proposed. The model was tested against the empirical distributions obtained for the best studied species and was on the whole found to be in fairly good agreement with the available data. The few deviations from the model's predictions were found to have a monotypic regularity. In the present paper we proceed on the assumption that the deviations are due to inhomogeneous conditions of molecular evolution within a genome. This approach makes it possible to elaborate the theory presented earlier. An improved model is derived for the ranged distribution of codon frequencies, which is then tested against the experimental data.
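The empirical side of such a comparison, a ranked (rank-ordered) series of codon frequencies, can be computed as sketched below. This only builds the observed distribution a model would be tested against; the model itself is not reproduced, and `ranked_codon_frequencies` is a hypothetical helper that assumes an in-frame coding sequence.

```python
from collections import Counter


def ranked_codon_frequencies(cds: str) -> list[tuple[str, float]]:
    """Split an in-frame coding sequence into codons and rank them by relative frequency."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    # Highest frequency first; ties broken alphabetically so the ranking is deterministic.
    return sorted(((c, n / total) for c, n in counts.items()), key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    for rank, (codon, freq) in enumerate(ranked_codon_frequencies("ATGAAAAAAGCC"), start=1):
        print(rank, codon, freq)
```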

3.
J. F. Y. Brookfield 《Genetics》1986,112(2):393-407
A quantitative model is proposed for the expected degree of relationship between copies of a family of transposable elements in a finite population of hosts. Special cases of the model (in which the process of homogenization of element copies either is or is not limited by transposition rate) are presented and illustrated, using data on mobile sequences from different species. It is shown that transposition will be expected, in large populations, to result in only a rather distant relationship between transposable elements at different genomic sites. Possible inadequacies of the model are suggested and quantified.

4.
5.
Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein’s function in the cell. Understanding a protein’s secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By assessing our method on two test sets, we show how incorporating multiple sequence alignment data, similarly to PSIPRED, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.

6.
7.
8.
We present a stochastic sequence evolution model to obtain alignments and estimate mutation rates between two homologous sequences. The model allows two possible evolutionary behaviors along a DNA sequence in order to determine conserved regions and take its heterogeneity into account. In our model, the sequence is divided into slow and fast evolution regions. The boundaries between these sections are not known. It is our aim to detect them. The evolution model is based on a fragment insertion and deletion process working on fast regions only and on a substitution process working on fast and slow regions with different rates. This model induces a pair hidden Markov structure at the level of alignments, thus making efficient statistical alignment algorithms possible. We propose two complementary estimation methods, namely, a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve the sampling of alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole alignment sampler. We show the convergence of the two estimation algorithms when used with this partial sampler. Our algorithms provide consistent estimates for the mutation rates and plausible alignments and sequence segmentations on both simulated and real data.
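The slow/fast segmentation idea can be illustrated with a much simpler hidden Markov model than the one described above. The sketch assumes a fixed, gap-free pairwise alignment, known substitution rates, Jukes-Cantor column probabilities and a fixed switching probability, and uses Viterbi decoding; it has no insertion/deletion process and no Gibbs or stochastic-EM estimation, so it is not the authors' pair-HMM. `log_emission` and `viterbi_segment` are hypothetical names, and the rate values are illustrative.

```python
import math


def log_emission(a: str, b: str, rate: float, t: float = 1.0) -> float:
    """Log joint probability of an aligned base pair under Jukes-Cantor (assumed model)."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * rate * t / 3.0)  # P(bases identical after time t)
    if a == b:
        return math.log(p_same / 4.0)       # 4 identical pairs share p_same
    return math.log((1.0 - p_same) / 12.0)  # 12 non-identical pairs share the rest


def viterbi_segment(s1: str, s2: str, slow_rate: float = 0.05,
                    fast_rate: float = 1.0, stay: float = 0.95) -> list[str]:
    """Label each column of a gap-free pairwise alignment as 'slow' or 'fast'."""
    assert s1 and len(s1) == len(s2)
    states = ("slow", "fast")
    rates = {"slow": slow_rate, "fast": fast_rate}

    def log_trans(prev: str, cur: str) -> float:
        return math.log(stay) if prev == cur else math.log(1.0 - stay)

    # Uniform initial distribution over the two hidden states.
    score = {s: math.log(0.5) + log_emission(s1[0], s2[0], rates[s]) for s in states}
    backpointers = []
    for a, b in zip(s1[1:], s2[1:]):
        new_score, pointer = {}, {}
        for cur in states:
            prev = max(states, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = score[prev] + log_trans(prev, cur) + log_emission(a, b, rates[cur])
            pointer[cur] = prev
        score = new_score
        backpointers.append(pointer)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    top = "ACGT" * 10
    bottom = "ACGT" * 5 + "CATG" * 5  # identical half, then all mismatches
    print(viterbi_segment(top, bottom))
```

In the full model the alignment itself is uncertain, which is why the authors sample alignments; here the alignment is taken as given so that only the segmentation step is visible.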

9.
Biological systems often display modularity, in the sense that they can be decomposed into nearly independent subsystems. Recent studies have suggested that modular structure can spontaneously emerge if goals (environments) change over time, such that each new goal shares the same set of sub-problems with previous goals. Such modularly varying goals can also dramatically speed up evolution, relative to evolution under a constant goal. These studies were based on simulations of model systems, such as logic circuits and RNA structure, which are generally not easy to treat analytically. Here we present a simple model for evolution under modularly varying goals that can be solved analytically. This model helps to understand some of the fundamental mechanisms that lead to rapid emergence of modular structure under modularly varying goals. In particular, the model suggests a mechanism for the dramatic speedup in evolution observed under such temporally varying goals.

10.
Abstract

Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence, by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequences the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. Typical size of the modules (peak-to-peak distance) is 25–30 amino acid residues.
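A conservation profile of this kind can be sketched as follows: slide a window of length k along the query and, for each position, count how many database sequences contain that oligopeptide. The oligopeptide length (k = 3 here), the choice to count sequences containing the word rather than total occurrences, and the toy database are all assumptions; `conservation_profile` is a hypothetical name, not the authors' software.

```python
def conservation_profile(query: str, database: list[str], k: int = 3) -> list[int]:
    """For each start position i, count database sequences containing query[i:i+k]."""
    # Pre-compute the set of k-mers present in each database sequence.
    kmer_sets = [{seq[j:j + k] for j in range(len(seq) - k + 1)} for seq in database]
    profile = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        profile.append(sum(word in kmers for kmers in kmer_sets))
    return profile


if __name__ == "__main__":
    db = ["MKVQQ", "MKVLA", "LAAGG"]
    print(conservation_profile("MKVLAA", db))
```

Plotted against position, peaks in such a profile would mark the candidate conserved modules; with a real proteome database the peak-to-peak spacing is what the abstract reports as 25–30 residues.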

11.
Annotation of the rapidly accumulating body of sequence data relies heavily on the detection of remote homologues and functional motifs in protein families. The most popular methods rely on sequence alignment. These include programs that use a scoring matrix to compare the probability of a potential alignment with random chance and programs that use curated multiple alignments to train profile hidden Markov models (HMMs). Related approaches depend on bootstrapping multiple alignments from a single sequence. However, alignment-based programs have limitations. They make the assumption that contiguity is conserved between homologous segments, which may not be true in genetic recombination or horizontal transfer. Alignments also become ambiguous when sequence similarity drops below 40%. This has kindled interest in classification methods that do not rely on alignment. An approach to classification without alignment based on the distribution of contiguous sequences of four amino acids (4-grams) was developed. Interest in 4-grams stemmed from the observation that almost all theoretically possible 4-grams (20^4 = 160,000) occur in natural sequences and the majority of 4-grams are uniformly distributed. This implies that the probability of finding identical 4-grams by random chance in unrelated sequences is low. A Bayesian probabilistic model was developed to test this hypothesis. For each protein family in Pfam-A and PIR-PSD, a feature vector called a probe was constructed from the set of 4-grams that best characterised the family. In rigorous jackknife tests, unknown sequences from Pfam-A and PIR-PSD were compared with the probes for each family. A classification result was deemed a true positive if the probe match with the highest probability was in first place in a rank-ordered list. This was achieved in 70% of cases. Analysis of false positives suggested that the precision might approach 85% if selected families were clustered into subsets.
Case studies indicated that the 4-grams in common between an unknown and the best matching probe correlated with functional motifs from PRINTS. The results showed that remote homologues and functional motifs could be identified from an analysis of 4-gram patterns.
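The probe idea can be illustrated with a deliberately simplified, alignment-free classifier. Here a family's probe is just the set of 4-grams occurring in at least a given fraction of its members, and an unknown sequence is ranked against each probe by the number of shared 4-grams. Both the frequency threshold and the overlap score are assumptions standing in for the Bayesian model and probe selection described above; `four_grams`, `build_probe` and `classify` are hypothetical names, and the families are toy data.

```python
from collections import Counter

N_POSSIBLE_4GRAMS = 20 ** 4  # 160,000 over the 20 standard amino acids


def four_grams(seq: str) -> set[str]:
    """All contiguous 4-residue words in a sequence."""
    return {seq[i:i + 4] for i in range(len(seq) - 3)}


def build_probe(family: list[str], min_fraction: float = 0.5) -> set[str]:
    """Keep 4-grams present in at least min_fraction of the family's sequences."""
    counts = Counter(g for seq in family for g in four_grams(seq))
    return {g for g, n in counts.items() if n >= min_fraction * len(family)}


def classify(unknown: str, probes: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank families by the number of 4-grams the unknown shares with each probe."""
    grams = four_grams(unknown)
    scores = ((family, len(grams & probe)) for family, probe in probes.items())
    return sorted(scores, key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    probes = {
        "A": build_probe(["MKTAYIAKQR", "MKTAYIAKQL"]),
        "B": build_probe(["GGSWWHHPPL", "GGSWWHHPPK"]),
    }
    print(classify("XXMKTAYIAK", probes))
```

The shared 4-grams returned for the top-ranked family are also the natural candidates to inspect for functional motifs, in the spirit of the PRINTS comparison mentioned above.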

12.
The heat shock protein 70 kDa sequences (HSP70) are of great importance as molecular chaperones in protein folding and transport. They are abundant under conditions of cellular stress. They are highly conserved in all domains of life: Archaea, eubacteria, eukaryotes, and organelles (mitochondria, chloroplasts). A multiple alignment of a large collection of these sequences was obtained employing our symmetric-iterative ITERALIGN program (Brocchieri and Karlin 1998). Assessments of conservation are interpreted in evolutionary terms and with respect to functional implications. Many archaeal sequences (methanogens and halophiles) tend to align best with the Gram-positive sequences. These two groups also miss a signature segment [about 25 amino acids (aa) long] present in all other HSP70 species (Gupta and Golding 1993). We observed a second signature sequence of about 4 aa absent from all eukaryotic homologues, significantly aligned in all prokaryotic sequences. Consensus sequences were developed for eight groups [Archaea, Gram-positive, proteobacterial Gram-negative, singular bacteria, mitochondria, plastids, eukaryotic endoplasmic reticulum (ER) isoforms, eukaryotic cytoplasmic isoforms]. All group consensus comparisons tend to summarize the alignments better than do the individual sequence comparisons. The global individual consensus shows an 87% “match” with the consensus-of-consensuses sequence. A functional analysis of the global consensus identifies a (new) highly significant mixed charge cluster proximal to the carboxyl terminus of the sequence, highlighting the hypercharge run EEDKKRRER (one-letter aa code used). The individual Archaea and Gram-positive sequences contain a corresponding significant mixed charge cluster in the location of the charge cluster of the consensus sequence. In contrast, the four Gram-negative proteobacterial sequences of the alignment do not have a charge cluster (even at the 5% significance level).
All eukaryotic HSP70 sequences have the analogous charge cluster. Strikingly, several of the eukaryotic isoforms show multiple mixed charged clusters. These clusters were interpreted with supporting data related to HSP70 activity in facilitating chaperone, transport, and secretion function. We observed that the consensus contains only a single tryptophan residue and a single conserved cysteine. This is interpreted with respect to the target rule for disaggregating misfolded proteins. The mitochondrial HSP70 connections to bacterial HSP70 are analyzed, suggesting a polyphyletic split of Trypanosoma and Leishmania protist mitochondrial (Mt) homologues separated from Mt-animal/fungal/plant homologues. Moreover, the HSP70 sequences from the amitochondrial Entamoeba histolytica and Trichomonas vaginalis species were analyzed. The E. histolytica HSP70 is most similar to the higher eukaryotic cytoplasmic sequences, with significantly weaker alignments to ER sequences and much diminished matching to all eubacterial, mitochondrial, and chloroplast sequences. This appears to be at variance with the hypothesis that E. histolytica rather recently lost its mitochondrial organelle. T. vaginalis contains two HSP70 sequences, one Mt-like and the second similar to eukaryotic cytoplasmic sequences suggesting two diverse origins. Received: 29 January 1998 / Accepted: 14 May 1998
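Two of the simpler operations mentioned above, deriving a majority consensus from aligned sequences with its percentage match to another sequence, and locating runs of charged residues that mix acidic and basic residues, can be sketched as below. The run finder uses an arbitrary minimum length and performs no significance testing, unlike the statistically assessed charge clusters in the study; the tie-breaking rule and all function names are assumptions, and this is not the ITERALIGN program.

```python
import re
from collections import Counter

CHARGED = "DEKR"  # acidic (D, E) and basic (K, R) residues


def majority_consensus(alignment: list[str]) -> str:
    """Most common symbol per aligned column; ties broken alphabetically (assumed rule)."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        consensus.append(min(counts, key=lambda r: (-counts[r], r)))
    return "".join(consensus)


def percent_match(a: str, b: str) -> float:
    """Percentage of aligned positions with identical symbols."""
    assert len(a) == len(b) and a
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mixed_charge_runs(seq: str, min_len: int = 5) -> list[tuple[int, str]]:
    """Maximal runs of charged residues (length >= min_len) containing both signs."""
    runs = []
    for m in re.finditer(f"[{CHARGED}]{{{min_len},}}", seq):
        run = m.group()
        if any(r in "DE" for r in run) and any(r in "KR" for r in run):
            runs.append((m.start(), run))
    return runs


if __name__ == "__main__":
    print(majority_consensus(["ACD", "ACE", "GCE"]))
    print(mixed_charge_runs("GAVLEEDKKRRERGS"))
```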

13.
Cover illustration: Soon designing proteins on demand? This is the vision expressed by the designer pencils pointing at the logo of the meeting held in September 2006 in Greifswald, Germany. This special issue was edited by Prof. Uwe Bornscheuer from Greifswald, who selected papers from keynote speakers at this meeting. With special thanks to Prof. Romas Kazlauskas, University of Minnesota, for the design of the conference logo. Pencils © FOTOLIA.

14.
Cover illustration: Protein Design and Evolution for Biocatalysis. This special issue of Biotechnology Journal contains selected contributions from scientists participating in the ESF-EMBO Symposium, which took place in October 2008 in San Feliu, Spain. Guest Editor is the chair and organizer of the meeting, Jiri Damborsky from Brno (Czech Republic). He highlights a variety of topics brought up in the meeting, ranging from new methods of rational design, directed evolution, metagenomics and single-molecule techniques, to construction of useful enzymes for industrial applications. Uwe Bornscheuer (Greifswald, Germany) authored a meeting report. Image colored pencils, © PhotoDisc, Inc.; Protein logo © ESF.

15.
Different codons encoding the same amino acid are not used equally in protein-coding sequences. In bacteria, there is a bias towards codons with high translation rates. This bias is most pronounced in highly expressed proteins, but a recent study of synthetic GFP-coding sequences did not find a correlation between codon usage and GFP expression, suggesting that such correlation in natural sequences is not a simple property of translational mechanisms. Here, we investigate the effect of evolutionary forces on codon usage. The relation between codon bias and protein abundance is quantitatively analyzed based on the hypothesis that codon bias evolved to ensure the efficient usage of ribosomes, a precious commodity for fast growing cells. An explicit fitness landscape is formulated based on bacterial growth laws to relate protein abundance and ribosomal load. The model leads to a quantitative relation between codon bias and protein abundance, which accounts for a substantial part of the observed bias for E. coli. Moreover, by providing an evolutionary link, the ribosome load model resolves the apparent conflict between the observed relation of protein abundance and codon bias in natural sequences and the lack of such dependence in a synthetic gfp library. Finally, we show that the relation between codon usage and protein abundance can be used to predict protein abundance from genomic sequence data alone without adjustable parameters.

16.
Laboratory-Directed Protein Evolution (total citations: 19; self-citations: 0; citations by others: 19)
Systematic approaches to directed evolution of proteins have been documented since the 1970s. The ability to recruit new protein functions arises from the considerable substrate ambiguity of many proteins. The substrate ambiguity of a protein can be interpreted as the evolutionary potential that allows a protein to acquire new specificities through mutation or to regain function via mutations that differ from the original protein sequence. All organisms have evolutionarily exploited this substrate ambiguity. When exploited in a laboratory under controlled mutagenesis and selection, it enables a protein to “evolve” in desired directions. One of the most effective strategies in directed protein evolution is to gradually accumulate mutations, either sequentially or by recombination, while applying selective pressure. This is typically achieved by the generation of libraries of mutants followed by efficient screening of these libraries for targeted functions and subsequent repetition of the process using improved mutants from the previous screening. Here we review some of the successful strategies in creating protein diversity and the more recent progress in directed protein evolution in a wide range of scientific disciplines and its impacts in chemical, pharmaceutical, and agricultural sciences.
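The library-screen-repeat cycle described above can be mimicked in silico as a toy loop: mutate the current best variant to build a library, screen the library with a fitness function, keep the best improved mutant, and repeat. Everything here is hypothetical: the uniform random point mutations stand in for mutagenesis (no recombination), and the "fitness" is simply similarity to an invented target sequence, whereas a real screen measures an activity.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQRQISFVKSHFS"  # hypothetical "desired" variant for the toy screen


def toy_fitness(seq: str) -> int:
    """Hypothetical screen: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue by a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def directed_evolution(parent, fitness, rounds=20, library_size=200, rate=0.05, seed=0):
    """Iterate library generation and screening, accumulating beneficial mutations."""
    rng = random.Random(seed)
    best, best_fit = parent, fitness(parent)
    history = [best_fit]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]  # diversify
        top = max(library, key=fitness)                                   # screen
        if fitness(top) > best_fit:                                       # select improved mutant
            best, best_fit = top, fitness(top)
        history.append(best_fit)
    return best, history


if __name__ == "__main__":
    best, history = directed_evolution("A" * len(TARGET), toy_fitness)
    print(best, history)
```

Because only improved mutants replace the parent, the recorded fitness never decreases; this is the "gradual accumulation under selective pressure" strategy in its simplest sequential form.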

17.
Incorrect protein translation, caused by codon mismatch, is an important problem for living cells. In this work, a computational model was introduced to quantify the effects of codon mismatch, and the model was used to study protein translation in Saccharomyces cerevisiae. According to the simulation results, the probability of codon mismatch increases when the supply of amino acids is unbalanced, and the longer the codon sequence, the larger the probability that incorrect translation occurs, which makes the synthesis of long peptide chains difficult. Compared with simulation results that do not take codon mismatch effects into account, the fraction of mRNAs with a bound ribosome decreases faster along the mRNA, making the 5’ ramp phenomenon more pronounced. We also found that the premature mechanism resulting from codon mismatch can reduce the proportion of incorrect translation when the amino acid supply is extremely unbalanced, which is one possible source of high-fidelity protein synthesis after peptidyl transfer.
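Why longer chains are harder to synthesize correctly can be seen from a back-of-envelope calculation rather than the full simulation: if each codon is mistranslated independently with probability p, a chain of n codons is error-free with probability (1 - p)^n, so the chance of at least one error, 1 - (1 - p)^n, grows with n. The independence assumption and the value of p are illustrative only and are not taken from the study, which also models amino acid supply and ribosome dynamics.

```python
def prob_error_free(n_codons: int, p_mismatch: float) -> float:
    """P(no mismatch in n_codons), assuming independent errors with probability p_mismatch per codon."""
    return (1.0 - p_mismatch) ** n_codons


def prob_any_error(n_codons: int, p_mismatch: float) -> float:
    """P(at least one mismatch) under the same independence assumption."""
    return 1.0 - prob_error_free(n_codons, p_mismatch)


if __name__ == "__main__":
    for n in (100, 500, 1000, 2000):
        print(n, round(prob_any_error(n, 1e-3), 3))
```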

18.
19.
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators.
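The source of the bias can be shown with a small sketch. A common empirical approach plugs the observed nucleotide frequencies at codon positions 1 to 3 into a product form and renormalizes over the 61 sense codons of the standard genetic code; discarding the stop codons TAA, TAG and TGA shifts the nucleotide frequencies that the resulting codon distribution implies. With uniform input, for example, only 13 of the 61 sense codons start with T, so the implied first-position T frequency is 13/61 rather than 0.25. The correction below rescales the position-specific parameters by iterative proportional fitting until the implied frequencies match the observed ones. This is one way to realize "account for the nucleotide composition of stop codons" and may differ from the authors' exact procedure; all function names are hypothetical.

```python
from itertools import product

NUCS = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
SENSE_CODONS = ["".join(c) for c in product(NUCS, repeat=3) if "".join(c) not in STOP_CODONS]


def codon_freqs(theta):
    """pi(c) proportional to theta[0][c0] * theta[1][c1] * theta[2][c2], normalized over sense codons."""
    w = {c: theta[0][c[0]] * theta[1][c[1]] * theta[2][c[2]] for c in SENSE_CODONS}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}


def positional_marginals(pi):
    """Nucleotide frequencies at codon positions 1-3 implied by a codon distribution."""
    marg = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for c, p in pi.items():
        for pos in range(3):
            marg[pos][c[pos]] += p
    return marg


def naive_estimator(observed):
    """Plug observed positional frequencies straight into the product form."""
    return codon_freqs(observed)


def corrected_estimator(observed, sweeps=500):
    """Iterative proportional fitting so implied marginals reproduce the observed ones.

    Assumes every observed frequency is > 0 and each position's frequencies sum to 1.
    """
    theta = [dict(m) for m in observed]
    for _ in range(sweeps):
        for pos in range(3):
            implied = positional_marginals(codon_freqs(theta))[pos]
            for n in NUCS:
                theta[pos][n] *= observed[pos][n] / implied[n]
    return codon_freqs(theta)


if __name__ == "__main__":
    uniform = [dict.fromkeys(NUCS, 0.25) for _ in range(3)]
    print(positional_marginals(naive_estimator(uniform))[0])
    print(positional_marginals(corrected_estimator(uniform))[0])
```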

20.
An Analysis of Ball's Empirical Model of Stomatal Conductance (total citations: 12; self-citations: 0; citations by others: 12)


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417