Similar articles
 Found 20 similar articles; search took 31 ms
1.
Sim KL  Creamer TP 《Proteins》2004,54(4):629-638
Protein simple sequences, a subset of low-complexity sequences, are regions of sequence highly enriched in one or a few residue types. Simple sequences are exceedingly common, the average being more than one per protein sequence. Despite being so common, such sequences are not well-studied. The simple sequences that have been subjected to detailed study are often found to possess important functions. Here we present a survey of protein simple sequences, generally enriched in a single residue type, with the aim of studying their conservation. We find that the majority of such simple sequences are not conserved. However, conserved protein simple sequences are relatively common, with approximately 11% of the surveyed protein families possessing a conserved simple sequence. The data obtained in this study support the idea that simple sequences are conserved for functional reasons. Such functions can range from substrate binding, to mediating protein-protein interactions, to structural integrity. A perhaps surprising finding is that the residue enriching a conserved simple sequence is itself not necessarily conserved. Neither is the length of many of the highly conserved simple sequences. In the few cases where structural and functional data are available it is found that the conserved simple sequences are consistent with both local structure and function. The data presented support the idea that protein simple sequences can be conserved and have important roles in protein structure and function.
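
As a rough illustration of what "highly enriched in one residue type" can mean in practice, the sketch below flags fixed-length windows dominated by a single residue. The function name, the window length of 10 and the 70% threshold are assumptions made for this example; the abstract does not state how the authors defined or detected simple sequences.

```python
from collections import Counter


def simple_sequence_windows(seq, window=10, min_fraction=0.7):
    """Return (start, residue, fraction) for every window dominated by one residue.

    `window` and `min_fraction` are illustrative defaults, not values taken
    from the survey described above.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        # most_common(1) gives the single most frequent residue in the window
        residue, count = counts.most_common(1)[0]
        fraction = count / window
        if fraction >= min_fraction:
            hits.append((start, residue, fraction))
    return hits


# A toy sequence with a ten-residue glutamine run starting at index 3.
toy = "MKT" + "Q" * 10 + "LVA"
hits = simple_sequence_windows(toy, window=10, min_fraction=1.0)
```

Overlapping hits would normally be merged into one region before any conservation analysis; that step is left out to keep the sketch short.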

2.
Huntley MA  Golding GB 《Proteins》2002,48(1):134-140
Simple sequence is abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequence, are not present at the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequence are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequence both in the PDB database and in the structural information suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.

3.
4.
It is well established that hydrophobic signal sequences direct proteins into or across the endoplasmic reticulum membrane in eukaryotes and cell membranes in prokaryotes. Although it is recognized that eukaryote proteins are efficiently secreted by bacterial systems, the export of bacterial proteins by eukaryotes has received little attention. To investigate membrane translocation of bacterial proteins by mammalian cells, the secretion of a bacterial endoglucanase (endoglucanase E) from stably transfected Chinese hamster ovary cells has been examined. We report that a functional endoglucanase is secreted when fused to prokaryote or eukaryote signal peptides. Furthermore, the endoglucanase was post-translationally modified before secretion. Data presented in this paper suggest that secretion of bacterial proteins by eukaryote cells may be a general phenomenon and imply that there are no specific requirements with respect to the origin of the signal sequences.

5.
6.
7.
Structures of homologous proteins are usually conserved during evolution, as are critical active site residues. This is the case for actin and tubulin, the two most important cytoskeleton proteins in eukaryotes. Actins and their related proteins (Arps) constitute a large superfamily whereas the tubulin family has fewer members. Unaligned sequences of these two protein families were analysed by searching for short groups of family-specific amino acid residues, that we call motifs, and by counting the number of residues from one motif to the next. For each sequence, the set of motif-to-motif residue counts forms a subfamily-specific pattern (landmark pattern) allowing actin and tubulin superfamily members to be identified and sorted into subfamilies. The differences between patterns of individual subfamilies are due to inserts and deletions (indels). Inserts appear to have arisen at an early stage in eukaryote evolution as suggested by the small but consistent kingdom-dependent differences found within many Arp subfamilies and in γ-tubulins. Inserts tend to be in surface loops where they can influence subfamily-specific function without disturbing the core structure of the protein. The relatively few indels found for tubulins have similar positions to established results, whereas we find many previously unreported indel positions and lengths for the metazoan Arps.
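
The motif-to-motif counting described in this abstract can be sketched in a few lines. The version below assumes exact string motifs found in order along an unaligned sequence, and it counts from the start of one motif to the start of the next; both choices, the function name and the toy motifs are assumptions for illustration and are not taken from the paper.

```python
def landmark_pattern(seq, motifs):
    """Return the residue counts between consecutive motif start positions.

    Motifs are searched left to right, each one after the end of the previous
    match. Returns None when any motif is missing, since the sequence then
    cannot be assigned a complete pattern.
    """
    positions = []
    search_from = 0
    for motif in motifs:
        idx = seq.find(motif, search_from)
        if idx == -1:
            return None
        positions.append(idx)
        search_from = idx + len(motif)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Hypothetical motifs and sequences; the second sequence carries a
# two-residue insert between the first and second motif.
motifs = ["DGS", "KQEY", "GDE"]
reference = landmark_pattern("MMDGSAAAAKQEYLLGDE", motifs)
with_insert = landmark_pattern("MMDGSAAAAPPKQEYLLGDE", motifs)
```

Comparing the two patterns element by element localises the indel to the interval between the first two motifs, which is the kind of subfamily-level difference the abstract attributes to inserts and deletions.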

8.
Summary. A statistical analysis of the data tabulated in the Atlas of Protein Sequence and Structure 1972 indicates that the observed frequency of occurrence of the tripeptides Asn-X-Ser and Asn-X-Thr is approximately one third of that expected in eukaryotic proteins, but in prokaryotic proteins the observation agrees closely with expectation. Thus the lowered frequency of these tripeptides found by Hunt and Dayhoff is restricted to eukaryotic proteins. Of all the Asn-X-Ser/Thr sequences examined, those which contain covalently attached carbohydrates are found only in the extracellular proteins of eukaryotes. These observations are discussed in relation to the evolution of glycoproteins, which seems to have occurred in the ancestor of eukaryotes after the divergence from prokaryotes.
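
A minimal sketch of an observed-versus-expected count for Asn-X-Ser/Thr tripeptides follows. It assumes that the expectation comes from the sequence's own amino acid composition with positions treated as independent, and that X may be any residue; the abstract does not say how the expected frequency was derived, so this null model is an assumption.

```python
import re
from collections import Counter


def sequon_observed_expected(seq):
    """Return (observed, expected) counts of Asn-X-Ser/Thr tripeptides."""
    seq = seq.upper()
    n = len(seq)
    if n < 3:
        return 0, 0.0
    # A lookahead pattern counts overlapping tripeptides as well.
    observed = len(re.findall(r"(?=N[A-Z][ST])", seq))
    freq = Counter(seq)
    p_asn = freq["N"] / n
    p_ser_thr = (freq["S"] + freq["T"]) / n
    # Each of the n - 2 tripeptide windows matches with probability
    # p(Asn) * 1 * p(Ser or Thr) under the independence assumption.
    expected = (n - 2) * p_asn * p_ser_thr
    return observed, expected


observed, expected = sequon_observed_expected("NASNGT")
ratio = observed / expected if expected else float("nan")
```

For a comparison like the one reported, the counts would be summed over all eukaryotic and all prokaryotic sequences separately before taking the ratio.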

9.

Background

In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique then the species name is sufficient to guide this match, however, when there are multiple related sequences (paralogs) in each species then the pairing is more difficult. In bacteria a good guide can be gained from genome co-location as interacting proteins tend to be in a common operon but in eukaryotes this simple principle is not sufficient.

Results

The methods developed in this paper take sets of paralogs for different proteins found in the same species and make a pairing based on their evolutionary distance relative to a set of other proteins that are unique and so have a known relationship (singletons). The former constitute a set of unlabelled nodes in a graph while the latter are labelled. Two variants were tested, one based on a phylogenetic tree of the sequences (the topology-based method) and a simpler, faster variant based only on the inter-sequence distances (the distance-based method). Over a set of test proteins, both gave good results, with the topology method performing slightly better.

Conclusions

The methods developed here still need refinement and augmentation from constraints other than the sequence data alone, such as known interactions from annotation and databases, or non-trivial relationships in genome location. With the ever growing numbers of eukaryotic genomes, it is hoped that the methods described here will open a route to the use of these data equal to the current success attained with bacterial sequences.
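
One way to picture the distance-based variant is sketched below. It is an interpretation, not the authors' algorithm: each paralog is represented by a hypothetical "profile" of evolutionary distances to the singleton sequences of its own family, and paralogs from the two families are paired so that their profiles agree as closely as possible. The exhaustive search over permutations is only practical for the handful of paralogs typically found in one species.

```python
from itertools import permutations


def pair_paralogs(profiles_a, profiles_b):
    """Pair paralogs of family A with paralogs of family B within one species.

    Each profile is a list of distances from a paralog to the singletons of
    its family, given in the same species order for both families. The
    pairing with the smallest summed squared difference is returned.
    """
    names_a = sorted(profiles_a)
    names_b = sorted(profiles_b)
    if len(names_a) > len(names_b):
        raise ValueError("family B needs at least as many paralogs as family A")

    def mismatch(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    best_pairing, best_cost = None, float("inf")
    for candidate in permutations(names_b, len(names_a)):
        cost = sum(mismatch(profiles_a[a], profiles_b[b])
                   for a, b in zip(names_a, candidate))
        if cost < best_cost:
            best_pairing, best_cost = dict(zip(names_a, candidate)), cost
    return best_pairing


# Toy distance profiles against two singleton species (invented numbers).
pairing = pair_paralogs(
    {"a1": [0.1, 0.2], "a2": [0.8, 0.9]},
    {"b1": [0.75, 0.95], "b2": [0.12, 0.18]},
)
```

The paired sequences would then be concatenated row by row to build the joint alignment used for the correlated-position analysis.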

10.
11.
Oliveira L  Paiva PB  Paiva AC  Vriend G 《Proteins》2003,52(4):544-552
We introduce sequence entropy-variability plots as a method of analyzing families of protein sequences, and demonstrate this for three well-known sequence families: globins, ras-like proteins, and serine-proteases. The location of an aligned residue position in the entropy-variability plot correlates with structural characteristics, and with known facts about the roles of individual amino acids in the function of these proteins. The large numbers of known sequences in these families allowed us to introduce new filtering methods for variability patterns. The results are discussed in terms of a simple evolutionary model for functional proteins.
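
The coordinates of such a plot can be computed per alignment column as sketched here. The sketch assumes that entropy means the Shannon entropy of the residue distribution in a column (in bits) and that variability means the number of distinct residue types observed there; the paper's exact definitions and any gap handling may differ.

```python
import math
from collections import Counter


def entropy_variability(alignment, gap="-"):
    """Return one (entropy, variability) point per alignment column.

    Gap characters are ignored; a column made up only of gaps yields (0.0, 0).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    points = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != gap)
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log2(p)
        points.append((entropy, len(counts)))
    return points


# Column 0 is fully conserved; column 1 is split evenly between C and D.
points = entropy_variability(["AC", "AD", "AC", "AD"])
```

Plotting entropy against variability for every column then gives the scatter in which conserved, moderately variable and highly variable positions fall in different regions.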

12.
The 82-90 kD family of molecular chaperone proteins has homologs in eukaryotes (Hsp90) and many eubacteria (HtpG) but not in Archaebacteria. We used representatives of all four different eukaryotic paralogs (cytosolic, endoplasmic reticulum (ER), chloroplast, mitochondrial) together with numerous eubacterial HtpG proteins for phylogenetic analyses to investigate their evolutionary origins. Our trees confirm that none of the organellar Hsp90s derives from the endosymbionts of early eukaryotes. Contrary to previous suggestions of distant origins through lateral gene transfer (LGT) all eukaryote Hsp90s are related to Gram-positive eubacterial HtpG proteins. The nucleocytosolic, ER and chloroplast Hsp90 paralogs are clearly mutually related. The origin of mitochondrial Hsp90 is more obscure, as these sequences are deeply nested within eubacteria. Our trees also reveal a deep split within eubacteria into a group of mainly long-branching sequences (including the eukaryote mitochondrial Hsp90s) and another group comprising exclusively short-branching HtpG proteins, from which the cytosolic/ER versions probably arose. Both versions are present in several eubacterial phyla, suggesting gene duplication very early in eubacterial evolution and multiple independent losses thereafter. We identified one probable case of LGT within eubacteria. However, multiple losses can simply explain the evolutionary pattern of the eubacterial HtpG paralogs and predominate over LGT. We suggest that the actinobacterial ancestor of eukaryotes harbored genes for both eubacterial HtpG paralogs, as the actinobacterium Streptomyces coelicolor still does; one could have given rise to the mitochondrial Hsp90 and the other, following another duplication event in the ancestral eukaryote, to the cytosolic and ER Hsp90 homologs.

13.
14.
Hydrogenases, oxygen-sensitive enzymes that can make hydrogen gas, are key to the function of hydrogen-producing organelles (hydrogenosomes), which occur in anaerobic protozoa scattered throughout the eukaryotic tree. Hydrogenases also play a central role in the hydrogen and syntrophic hypotheses for eukaryogenesis. Here, we show that sequences related to iron-only hydrogenases ([Fe] hydrogenases) are more widely distributed among eukaryotes than reports of hydrogen production have suggested. Genes encoding small proteins which contain conserved structural features unique to [Fe] hydrogenases were identified on all well-surveyed aerobic eukaryote genomes. Longer sequences encoding [Fe] hydrogenases also occur in the anaerobic eukaryotes Entamoeba histolytica and Spironucleus barkhanus, both of which lack hydrogenosomes. We also identified a new [Fe] hydrogenase sequence from Trichomonas vaginalis, bringing the total of [Fe] hydrogenases reported for this organism to three, all of which may function within its hydrogenosomes. Phylogenetic analysis and hypothesis testing using likelihood ratio tests and parametric bootstrapping suggest that the [Fe] hydrogenases in anaerobic eukaryotes are not monophyletic. Iron-only hydrogenases from Entamoeba, Spironucleus, and Trichomonas are plausibly monophyletic, consistent with the hypothesis that a gene for [Fe] hydrogenase was already present on the genome of the common, perhaps also anaerobic, ancestor of these phylogenetically distinct eukaryotes. Trees where the [Fe] hydrogenase from the hydrogenosomal ciliate Nyctotherus was constrained to be monophyletic with the other eukaryote sequences were rejected using a likelihood ratio test of monophyly. In most analyses, the Nyctotherus sequence formed a sister group with a [Fe] hydrogenase on the genome of the eubacterium Desulfovibrio vulgaris. 
Thus, it is possible that Nyctotherus obtained its hydrogenosomal [Fe] hydrogenase from a different source from Trichomonas for its hydrogenosomes. We find no support for the hypothesis that components of the Nyctotherus [Fe] hydrogenase fusion protein derive from the mitochondrial respiratory chain.

15.
Issac B  Raghava GP 《BioTechniques》2002,33(3):548-50, 552, 554-6
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.

16.
17.
The amino acid sequences of soluble, ordered proteins with stable structures have evolved due to biological and physical requirements, thus distinguishing them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity, up to 10 residues apart along the sequence, with their expected occurrences in random sequences. We found that the hydrophobic residue pairs and the polar residue pairs are significantly decreased, whereas the pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.
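
The pair statistic described here can be approximated as follows. The sketch counts ordered residue pairs separated by 1 to `max_sep` positions and divides by the count expected if residues were placed independently according to the overall composition. That composition-based null model, the pooling of all separations into one count, and the function name are assumptions; the study's random-sequence model may be defined differently.

```python
from collections import Counter


def pair_odds(sequences, max_sep=10):
    """Return observed/expected ratios for ordered residue pairs (a, b).

    A pair is counted whenever b occurs 1..max_sep positions after a in the
    same sequence. Pairs that never occur are absent from the result.
    """
    observed = Counter()
    composition = Counter()
    n_pairs = 0
    for seq in sequences:
        composition.update(seq)
        for i, a in enumerate(seq):
            last = min(i + max_sep, len(seq) - 1)
            for j in range(i + 1, last + 1):
                observed[(a, seq[j])] += 1
                n_pairs += 1
    total = sum(composition.values())
    odds = {}
    for (a, b), count in observed.items():
        expected = n_pairs * (composition[a] / total) * (composition[b] / total)
        odds[(a, b)] = count / expected
    return odds


# Toy input: an alternating hydrophobic/polar-like pattern, neighbours only.
odds = pair_odds(["ALAL"], max_sep=1)
```

Ratios above 1 mark pairs that occur more often than the composition alone would predict, and ratios below 1 mark depleted pairs such as the hydrophobic-hydrophobic pairs reported in the abstract.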

18.
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas of 0.69 and 0.81, respectively, under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions we show that for the small and the large ribosomal subunit our approach predicts interacting residues in the system with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data.
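
The "true-positive rate within the first N predictions" quoted above is a simple ranking statistic, sketched below. The sketch does not implement Direct-Coupling Analysis itself; it assumes a list of already scored protein (or residue) pairs and a set of known interactions, and the protein names and scores in the example are invented.

```python
def top_k_true_positive_rate(scored_pairs, known, k):
    """Fraction of the k highest-scoring predicted pairs that are known interactions.

    `scored_pairs` is a list of ((name_a, name_b), score) items and `known`
    is a set of frozensets, so the order within a pair does not matter.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if frozenset(pair) in known)
    return hits / len(ranked)


# Invented scores for three candidate pairs, two of which are "known".
scored = [(("S2", "S3"), 0.9), (("S2", "S5"), 0.5), (("S3", "S5"), 0.7)]
known = {frozenset(("S2", "S3")), frozenset(("S2", "S5"))}
rate = top_k_true_positive_rate(scored, known, k=2)
```

Sweeping k from 1 to the total number of pairs and recording hits and misses at each step gives the points needed for an ROC curve.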

19.
Protein-DNA interactions are crucial for many biological processes. Attempts to model these interactions have generally taken the form of amino acid-base recognition codes or purely sequence-based profile methods, which depend on the availability of extensive sequence and structural information for specific structural families, neglect side-chain conformational variability, and lack generality beyond the structural family used to train the model. Here, we take advantage of recent advances in rotamer-based protein design and the large number of structurally characterized protein-DNA complexes to develop and parameterize a simple physical model for protein-DNA interactions. The model shows considerable promise for redesigning amino acids at protein-DNA interfaces, as design calculations recover the amino acid residue identities and conformations at these interfaces with accuracies comparable to sequence recovery in globular proteins. The model shows promise also for predicting DNA-binding specificity for fixed protein sequences: native DNA sequences are selected correctly from pools of competing DNA substrates; however, incorporation of backbone movement will likely be required to improve performance in homology modeling applications. Interestingly, optimization of zinc finger protein amino acid sequences for high-affinity binding to specific DNA sequences results in proteins with little or no predicted specificity, suggesting that naturally occurring DNA-binding proteins are optimized for specificity rather than affinity. When combined with algorithms that optimize specificity directly, the simple computational model developed here should be useful for the engineering of proteins with novel DNA-binding specificities.

20.
Summary. Analysis of the sequence data available today, comprising more than 500,000 bases, confirms the previously observed phenomenon that there are distinct dinucleotide preferences in DNA sequences. Consistent behaviour is observed in the major sequence groups analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet preferences are common to all groups and are found in most sequences of the Los Alamos Library. The patterns seen in such large data sets are very significant statistically and biologically. Since they are present in numerous and diverse nucleotide sequences, one may conclude that they confer evolutionary advantages on the organism. In eukaryotes RR and YY dinucleotides are preferred over YR and RY (where R is a purine and Y a pyrimidine). Since opposite-chain nearest-neighbour purine clashes are major determinants of DNA structure, it appears that the tight packaging of DNA in nucleosomes disfavors, in general, such (YR and RY) steric repulsion.
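
A dinucleotide preference is commonly expressed as the ratio of a doublet's observed frequency to the product of its two base frequencies. The sketch below uses that ratio and a helper that maps each doublet to its purine/pyrimidine class; it assumes the input contains only A, C, G and T, and the abstract does not confirm that this exact statistic was the one used.

```python
from collections import Counter

PURINES = set("AG")


def dinucleotide_odds(seq):
    """Return observed/expected frequency ratios for each dinucleotide present."""
    seq = seq.upper()
    n = len(seq)
    base = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    odds = {}
    for pair, count in pairs.items():
        observed = count / (n - 1)
        # Expected frequency if neighbouring bases were independent.
        expected = (base[pair[0]] / n) * (base[pair[1]] / n)
        odds[pair] = observed / expected
    return odds


def step_class(pair):
    """Map a dinucleotide to 'RR', 'RY', 'YR' or 'YY' (R purine, Y pyrimidine)."""
    return "".join("R" if b in PURINES else "Y" for b in pair)


odds = dinucleotide_odds("AAGG")
classes = {pair: step_class(pair) for pair in odds}
```

Averaging the ratios within each class over a large sequence set would show whether RR and YY steps are favoured over YR and RY steps, as the abstract reports for eukaryotes.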
