Found 20 similar articles (search time: 15 ms)
1.
An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible and more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins where the repeating fragments have been derived from information additional to the protein sequences. © 1993 Wiley-Liss, Inc.
2.
The increasing availability of prokaryotic genome sequences has shown that simple sequence repeats (SSRs) are widespread in prokaryotes and that there is extensive variation in their length, number and distribution. Considering their potential importance in generating genomic diversity, we determined the distribution of a specific group of SSRs, mononucleotide repeats of size between 5 and 13 nt, in 157 sequenced prokaryotic genomes. The data obtained in the present study show that (i) a large number of mononucleotide SSRs is present in all prokaryotic genomes investigated, (ii) shorter repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions, although we identified several organisms where mononucleotide SSRs are not excluded from the coding regions. We also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly fewer. Bacterial genomes that contain far fewer mononucleotide SSRs than expected are generally larger and more GC-rich, while bacterial genomes that contain many more mononucleotide SSRs than expected are in general smaller and more AT-rich. Finally, we also noted that genomes that contain a high fraction of horizontally transferred genes have a lower mononucleotide SSR density and that A and T are generally overrepresented in mononucleotide SSRs.
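As a rough illustration of the kind of count described in the abstract above, the following minimal sketch tallies mononucleotide runs of 5 to 13 nt in a DNA string. The function name `count_mono_ssrs`, its parameters, and the example sequence are hypothetical and chosen only for illustration; the study's actual pipeline is not described in the abstract, and this sketch assumes that maximal runs are counted and that runs longer than the upper bound are ignored.

```python
import re
from collections import Counter


def count_mono_ssrs(seq, min_len=5, max_len=13):
    """Count maximal mononucleotide runs with min_len <= length <= max_len.

    Hypothetical helper: returns a Counter mapping (base, run_length)
    to the number of such runs found in seq.
    """
    counts = Counter()
    # ([ACGT])\1* matches one base followed by further copies of the same
    # base, so each match is a maximal run of a single nucleotide.
    for match in re.finditer(r"([ACGT])\1*", seq.upper()):
        run_len = len(match.group(0))
        if min_len <= run_len <= max_len:
            counts[(match.group(1), run_len)] += 1
    return counts


# Illustrative sequence: an A run of 6, a T run of 5, and a C run of 15
# (the last one exceeds max_len and is therefore not counted).
example = "GG" + "A" * 6 + "CG" + "T" * 5 + "C" * 15 + "A"
ssr_counts = count_mono_ssrs(example)
```

Comparing such counts with an expectation derived from base composition would be a separate step and is not attempted here.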
3.
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates improved recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we have augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues to domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% in protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method in predicting functional roles through select examples for DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated through new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, search databases augmented with designed sequences direct the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.
4.
Intron/exon structure of the human gene for the muscle isozyme of glycogen phosphorylase (cited by: 10; self-citations: 0; citations by others: 10)
The intron/exon organization of the human gene for glycogen phosphorylase has been determined. The segments of the polypeptide chain that correspond to the 19 exons of the gene are examined for relationships between the three-dimensional structure of the protein and gene structure. Only weak correlations are observed between domains of phosphorylase and exons. The nucleotide binding domains that are found in phosphorylase and other glycolytic enzymes are examined for relationships between exons of the genes and structures of the domains. When mapped to the three-dimensional structures, the intron/exon boundaries are shown to be widely distributed in this family of protein domains.
5.
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate across the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
6.
7.
Highly repetitive sequence within proteins is an abundant feature yet is considered by some to be the protein equivalent of "junk DNA." Homopolymer sequences, the most highly repetitive of this group, are typically encoded by trinucleotide repeats at the DNA level. It is thought that many of these sequences are produced by a replicative slippage mechanism. Recent studies suggest that these highly mutable regions within proteins may allow for rapid morphological evolution emerging from the increased variability afforded by such coding structures. However, in a homopolymer, it is difficult to determine if the repeated amino acid is due to slippage at the DNA level or due to selection at the protein level. Here we develop and test a model to detect cases for which the homopolymer tract has clearly been selected for, with no evidence of slippage at the DNA level. The polyserine tract within the phosphatidylserine receptor protein is used as an excellent example of one such case.
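The distinction drawn in the abstract above, between a tract encoded by a pure trinucleotide repeat and one encoded by mixed synonymous codons, can be illustrated with a small sketch. This is not the authors' model, which the abstract does not specify. It is a simple heuristic under the assumption that slippage tends to produce runs of identical codons, so a tract with no such run shows little sign of it. All function names and example sequences are hypothetical.

```python
from collections import Counter


def tract_codons(cds, start_codon, n_codons):
    """Return the codons encoding a tract (hypothetical helper).

    cds is an in-frame coding sequence; start_codon is the 0-based
    codon index at which the homopolymer tract begins.
    """
    begin = 3 * start_codon
    return [cds[i:i + 3].upper() for i in range(begin, begin + 3 * n_codons, 3)]


def tract_codon_usage(cds, start_codon, n_codons):
    """Count how often each codon is used within the tract."""
    return Counter(tract_codons(cds, start_codon, n_codons))


def longest_pure_codon_run(cds, start_codon, n_codons):
    """Length of the longest stretch of identical consecutive codons."""
    best = run = 0
    prev = None
    for codon in tract_codons(cds, start_codon, n_codons):
        run = run + 1 if codon == prev else 1
        best = max(best, run)
        prev = codon
    return best


# Two made-up coding sequences, each with a six-serine tract after ATG.
mixed = "ATG" + "TCTAGCTCAAGTTCCTCG" + "TAA"  # six different serine codons
pure = "ATG" + "AGC" * 6 + "TAA"               # one codon repeated six times
```

Under this heuristic, `pure` is compatible with slippage (a run of six identical codons), whereas `mixed` has no run longer than one codon. A real analysis would also need a statistical baseline, which this sketch omits.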
8.
9.
G. Novelli, M. C. Carlà Campa, L. Sineo, A. Pizzuti, V. Silani, E. Pontieri, F. Sangiuolo, M. Gennarelli, G. Bernardi, B. Dallapiccola 《Human Evolution》1994,9(4):315-321
Myotonic dystrophy is due to instability of a [CTG] repeat in the myotonin-protein kinase gene. We have sequenced the complete 3′ untranslated region of this gene, which contains the repeat, in seven nonhuman primates. We found that the genomic organisation was conserved, suggesting that this region has important regulatory functions. These data also argue that the human state is derived from a primate ancestor in which the mutational event did not involve the loss of cryptic sequences interrupting or surrounding the repeat, but likely affected only the original length of the repeat.
10.
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer-based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.
11.
Investigating the relative importance of protein stability, function, and folding kinetics in driving protein evolution has long been hindered by the fact that we can only compare modern natural proteins, the products of the very process we seek to understand, to each other, with no external references or baselines. Through a large-scale all-atom simulation of protein evolution, we have created a large diverse alignment of SH3 domain sequences which have been selected only for native state stability, with no other influencing factors. Although the average pairwise identity between computationally evolved and natural sequences is only 17%, the residue frequency distributions of the computationally evolved sequences are similar to natural SH3 sequences at 86% of the positions in the domain, suggesting that optimization for the native state structure has dominated the evolution of natural SH3 domains. Additionally, the positions which play a consistent role in the transition state of three well-characterized SH3 domains (by phi-value analysis) are structurally optimized for the native state, and vice versa. Indeed, we see a specific and significant correlation between sequence optimization for native state stability and conservation of transition state structure.
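The "average pairwise identity" between two groups of aligned sequences mentioned above can be computed in several ways. The sketch below shows one common convention, assumed here rather than taken from the paper: identity is the fraction of matching residues over columns where neither sequence has a gap, averaged over all cross-group pairs. The function names and toy sequences are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns without gaps.

    Assumes a and b are rows of the same alignment (equal length,
    '-' for gaps). The choice of denominator is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def mean_cross_identity(group_a, group_b):
    """Average identity over every pair with one sequence from each group."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    total = sum(pairwise_identity(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


# Toy aligned sequences, made up for illustration only.
designed = ["ACDE"]
natural = ["ACDF", "ACDE"]
mean_id = mean_cross_identity(designed, natural)
```

With the toy data, the two pairs have identities 0.75 and 1.0, so `mean_id` is 0.875.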
12.
Senthilkumar R, Sabarinathan R, Hameed BS, Banerjee N, Chidambarathanu N, Karthik R, Sekar K 《Bioinformation》2010,4(7):271-275
An Internet computing server has been developed to identify all occurrences of internal sequence repeats in protein and DNA sequences. Further, an option is provided for the users to check the occurrence(s) of the resultant sequence repeats in the other sequence and structure (Protein Data Bank) databases. The databases deployed in the proposed computing engine are up-to-date and thus the users will get the latest information available in the respective databases. The server is freely accessible over the World Wide Web (WWW). AVAILABILITY: http://bioserver1.physics.iisc.ernet.in/fair/
13.
Some probabilistic results on simple sequence repeats (SSRs) in DNA sequences are derived and used to define an index of nonrandomness for SSRs. The applicability of the index of nonrandomness is illustrated using several examples from the literature on selected human disease genes.
14.
Rajathei David Mary Mani K Saravanan 《Journal of biomolecular structure & dynamics》2013,31(3):534-551
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains and the average pairwise sequence identity between them is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures available in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program for detecting the repeats in a protein sequence, which were further validated using Pfam domain assignments. The sequence identity between the repeats in domains ranges from 20 to 40% and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity calculations and pairwise interaction energy of the domain repeats clearly reveal the conservation of 3-D structure environment in the repeats of domains. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study has suggested that the computation of these structure-based parameters will give better clues about the tertiary environment of the repeats in domains. The folding rates of individual domains in the repeats predicted using the long-range order parameter indicate that the predicted folding rates correlate well with most of the experimentally observed folding rates for the analyzed independently folded domains.
15.
The beta-propeller fold is a phylogenetically widespread, common protein architecture able to support a range of different functions such as catalysis, ligand binding and transport, regulation and protein binding. Interestingly, it appears that the beta-propeller topology is also compatible with strikingly diverse sequences. Amongst this diversity, there are three large groups of proteins with related sequences and very important cellular and intercellular regulatory functions: WD, kelch, and YWTD proteins. A common characteristic between these protein families is that their sequences, while distinct, all contain internal repeats 40-45 residues long. Through a pangenomic analysis using internal repeat profiles derived from the structurally known propeller modules of the eukaryotic protein RCC1 and the related prokaryotic protein BLIP-II, we have defined a new superfamily of propeller repeats, the RCC1-like repeats (RLRs). These sequences turn out to be more phylogenetically widespread than other large groups of propeller proteins, occurring in both prokaryotic and eukaryotic genomes. Interestingly, our research showed that RLR domains with different numbers of repeats exist, ranging from 3 to 7, and possibly more. A novel, intriguing finding is the discovery of sequences with 3 repeats, as well as proteins with 10 modular units, though in the latter case it is not clear whether these are made of two 5-bladed domains or a single, novel 10-bladed propeller. In addition, the results indicate that circular permutation events may have taken place in the evolution of these proteins. It is now established that the group of RLR proteins is extremely numerous and is characterized by unique, remarkable features which place it in a position of special interest as an important superfamily of proteins in nature.
16.
Nick V. Grishin 《Proteins》2015,83(7):1238-1251
ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up‐to‐date protein structure classification database. The majority of new structures released from the PDB (Protein Data Bank) each week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, those proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi‐domain proteins, discordance between sequence and structural similarities, difficulties with assessing homology with scores, and integral membrane proteins homologous to soluble proteins. By timely assimilation of newly available structures into its hierarchy, ECOD strives to provide a most accurate and updated view of the protein structure world as a result of combined computational and expert‐driven analysis. Proteins 2015; 83:1238–1251. © 2015 Wiley Periodicals, Inc.
17.
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.
18.
The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS-PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate:sugar phosphotransferase system (PEP:PTS) and for bacterial 2-component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of ProDom.
19.
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
20.
MARIE‐HÉLÈNE AVELANGE‐MACHEREL, NICOLE PAYET, DAVID LALANNE, MARTINE NEVEU, DIMITRI TOLLETER, JUDITH BURSTIN, DAVID MACHEREL 《Plant, cell & environment》2015,38(7):1299-1311
LEAM, a late embryogenesis abundant protein, and HSP22, a small heat shock protein, were shown to accumulate in the mitochondria during pea (Pisum sativum L.) seed development, where they are expected to contribute to desiccation tolerance. Here, their expression was examined in seeds of 89 pea genotypes by Western blot analysis. All genotypes expressed LEAM and HSP22 in similar amounts. In contrast with HSP22, LEAM displayed different isoforms according to apparent molecular mass. Each of the 89 genotypes harboured a single LEAM isoform. Genomic and RT‐PCR analysis revealed four LEAM genes differing by a small variable indel in the coding region. These variations were consistent with the apparent molecular mass of each isoform. Indels, which occurred in repeated domains, did not alter the main properties of LEAM. Structural modelling indicated that the class A α‐helix structure, which allows interactions with the mitochondrial inner membrane in the dry state, was preserved in all isoforms, suggesting functionality is maintained. The overall results point out the essential character of LEAM and HSP22 in pea seeds. LEAM variability is discussed in terms of pea breeding history as well as LEA gene evolution mechanisms.