Found 20 similar articles; search took 0 ms
1.
Zakharia M. Frenkel Zeev M. Frenkel Edward N. Trifonov Sagi Snir 《Journal of theoretical biology》2009,260(3):438-444
A novel approach for evaluation of sequence relatedness via a network over the sequence space is presented. This relatedness is quantified by graph theoretical techniques. The graph is perceived as a flow network, and flow algorithms are applied. The number of independent pathways between nodes in the network is shown to reflect structural similarity of corresponding protein fragments. These results provide an appropriate parameter for quantitative estimation of such relatedness, as well as reliability of the prediction. They also demonstrate a new potential for sequence analysis and comparison by means of the flow network in the sequence space.
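As a rough illustration of the idea in this abstract, the sketch below treats sequence fragments as nodes, links fragments that differ in at most one position, and counts edge-disjoint paths between two fragments with a unit-capacity maximum flow. The linking rule (`max_mismatches`), the function names, and the choice of edge-disjoint rather than vertex-disjoint paths are assumptions made for the example; the abstract does not specify how the network is built.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length fragments."""
    return sum(x != y for x, y in zip(a, b))


def build_network(fragments, max_mismatches=1):
    """Link fragments differing in at most max_mismatches positions (assumed rule)."""
    graph = {f: set() for f in fragments}
    for i, a in enumerate(fragments):
        for b in fragments[i + 1:]:
            if hamming(a, b) <= max_mismatches:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def count_independent_paths(graph, source, target):
    """Count edge-disjoint paths using BFS augmenting paths on unit capacities."""
    if source == target:
        raise ValueError("source and target must differ")
    # Each undirected edge becomes two arcs with residual capacity 1.
    capacity = {(u, v): 1 for u in graph for v in graph[u]}
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in graph[u]:
                if v not in parent and capacity[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow  # no augmenting path left
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u
        flow += 1


fragments = ["AAAA", "AAAB", "AABA", "AABB", "CCCC"]
network = build_network(fragments)
print(count_independent_paths(network, "AAAA", "AABB"))
print(count_independent_paths(network, "AAAA", "CCCC"))
```

In this toy network a higher path count means two fragments are connected through more independent chains of near-identical intermediates, which is the kind of quantity the abstract relates to structural similarity.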
2.
Twenty-seven protein sequence elements, six to nine amino acids long, were extracted from 15 phylogenetically diverse complete
prokaryotic proteomes. The elements are present in all of these proteomes, with at least one copy each (omnipresent elements),
and have presumably been conserved since the last universal common ancestor (LUCA). All these omnipresent elements are identified
in crystallized protein structures as parts of highly conserved closed loops, 25–30 residues long, thus representing the closed-loop
modules discovered in 2000 by Berezovsky et al. The omnipresent peptides make up seven distinct groups, of which the largest groups, Aleph and Beth,
contain 18 and four elements, respectively, which are related but different, while five other groups are represented by only
one element each. The LUCA modules appear with one or several copies per protein molecule in a variety of combinations depending
on the functional identity of the corresponding protein. The functional involvement of individual LUCA modules is outlined
on the basis of known protein annotations. Analyses of all the related sequences in a large, formatted protein sequence space
suggest that many, if not all, of the 27 omnipresent elements have a common sequence origin. This sequence space network analysis
may lead to elucidation of the earliest stages of protein evolution.
3.
Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word “BLAST” has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, forming the query on the most frequently used webservers is difficult. Some servers support queries written as regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, they are stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids in the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org.
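The abstract's example query (two hydrophobic and three polar residues at five given positions) maps naturally onto a regular expression with one character class per position. The sketch below shows that mapping with Python's standard `re` module. It is not the server's implementation, and the residue groupings `HYDROPHOBIC` and `POLAR` are illustrative assumptions, since amino acid classifications vary between sources.

```python
import re

# Illustrative residue groupings (assumed for this example only).
HYDROPHOBIC = "AVLIMFW"
POLAR = "STNQ"


def query_to_regex(choices):
    """Build a regex from a per-position list of allowed residues.

    An empty string or None means any residue is accepted at that position.
    """
    parts = []
    for allowed in choices:
        if not allowed:
            parts.append(".")
        else:
            parts.append("[" + "".join(sorted(set(allowed))) + "]")
    return re.compile("".join(parts))


def find_matches(pattern, sequence):
    """Return start indices of all matches, including overlapping ones."""
    lookahead = "(?=" + pattern.pattern + ")"
    return [m.start() for m in re.finditer(lookahead, sequence)]


query = [HYDROPHOBIC, HYDROPHOBIC, POLAR, POLAR, POLAR]
pattern = query_to_regex(query)
print(pattern.pattern)
print(find_matches(pattern, "GALSTQKVLNNS"))
```

Stacking several residues at one position in the graphical interface corresponds to putting them in the same character class here.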
4.
5.
Structural similarity to link sequence space: new potential superfamilies and implications for structural genomics
Aloy P Oliva B Querol E Aviles FX Russell RB 《Protein science : a publication of the Protein Society》2002,11(5):1101-1116
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.
6.
Our efforts to classify the functional units of many proteins, the modules, are reviewed. The data from the sequencing projects for various model organisms are extremely helpful in deducing the evolution of proteins and modules. For example, a dramatic increase of modular proteins can be observed from yeast to C. elegans in accordance with new protein functions that had to be introduced in multicellular organisms. Our sequence characterization of modules relies on sensitive similarity search algorithms and the collection of multiple sequence alignments for each module. To trace the evolution of modules and to further automate the classification, we have developed a sequence and a module alerting system that checks newly arriving sequence data for the presence of already classified modules. Using these systems, we were able to identify an unexpected similarity between extracellular C1Q modules and bacterial proteins.
7.
Martijn A. Huynen 《Journal of molecular evolution》1996,43(3):165-169
RNA secondary-structure folding algorithms predict the existence of connected networks of RNA sequences with identical secondary
structures. Fitness landscapes that are based on the mapping between RNA sequence and RNA secondary structure hence have many
neutral paths. A neutral walk on these fitness landscapes gives access to a virtually unlimited number of secondary structures
that are a single point mutation from the neutral path. This shows that neutral evolution explores phenotype space and can
play a role in adaptation.
Received: 23 December 1995 / Accepted: 17 March 1996
8.
9.
Zakhar M. Frenkel Edward N. Trifonov 《Journal of biomolecular structure & dynamics》2013,31(6):643-655
The closed loops within the proteins of the TIM-barrel fold family are analyzed and compared sequence- and structure-wise. The size distribution of the closed loops of the TIM-barrels confirms the universal preference for the standard size of 25–30 residues. 3D structural RMSD comparisons of the closed loops and presentation of their sequences in binary form suggest that the TIM-barrel proteins are built from descendants of several types of basic closed loop prototypes. Comparison of these prototypes points to a likely common ancestor: the alpha-helix-containing closed loops of 28 amino acids. The presumed ancestor is characterized by a specific binary consensus sequence.
10.
Deterministic models of mutation and selection in the space of (binary) nucleotide-type sequences have been investigated
for haploid populations during the past 25 years, and, recently, for diploid populations as well. These models, in particular
their ‘error thresholds’, have mainly been analyzed by numerical methods and perturbation techniques. We consider them here
by means of bifurcation theory, which improves our understanding of both equilibrium and dynamical properties.
In a caricature obtained from the original model by neglecting back mutation to the favourable allele, the familiar error
threshold of the haploid two-class model turns out to be a simple transcritical bifurcation, whereas its diploid counterpart
exhibits an additional saddle node. This corresponds to a second error threshold. Three-class models with neutral spaces of
unequal size introduce further features. Such are a global bifurcation in haploid populations, and simple examples of Hopf
bifurcations (as predicted by Akin’s theorem) in the diploid case.
Received 13 June 1995; received in revised form 26 July 1996
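To make the haploid error threshold concrete, the sketch below uses a textbook-style two-class model without back mutation. This form is an assumption for illustration, not the equations of the paper itself. The favourable class has relative fitness `sigma` > 1 and is copied without error with probability `q`, so its frequency updates as x' = sigma·q·x / (sigma·x + 1 − x). The fixed points are x = 0 and x* = (sigma·q − 1)/(sigma − 1). They meet and exchange stability at sigma·q = 1, which is the signature of a transcritical bifurcation.

```python
def favourable_equilibrium(sigma, q):
    """Stable equilibrium frequency of the favourable class (assumed model)."""
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for a favourable class")
    # Below the threshold (sigma * q <= 1) only x = 0 is stable.
    return max(0.0, (sigma * q - 1.0) / (sigma - 1.0))


def iterate(x, sigma, q, generations):
    """Iterate the discrete-time map x' = sigma*q*x / (sigma*x + 1 - x)."""
    for _ in range(generations):
        mean_fitness = sigma * x + (1.0 - x)
        x = sigma * q * x / mean_fitness
    return x


# Above the threshold (sigma * q = 1.5) the favourable class persists.
print(favourable_equilibrium(2.0, 0.75), iterate(0.9, 2.0, 0.75, 200))
# Below the threshold (sigma * q = 0.8) it is lost.
print(favourable_equilibrium(2.0, 0.4), iterate(0.9, 2.0, 0.4, 200))
```

The diploid saddle node and the three-class features mentioned in the abstract are not covered by this sketch.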
11.
The amino acid sequence of two small ribosomal proteins from Bacillus stearothermophilus (Total citations: 3; self-citations: 0; citations by others: 3)
The low-Mr proteins (tentatively called protein I and II) were purified from 2 M NaCl extracts of the Bacillus stearothermophilus ribosome. Their amino acid sequences have been determined from the peptides obtained by digestion with trypsin, chymotrypsin, and pepsin, and by cleavage with CNBr, using the micro-DABITC/PITC double-coupling method [FEBS Lett. (1978) 93, 205-214]. Protein I contains 56 residues and has an Mr of 6514. Protein II has 37 residues with an Mr of 4361. The amino acid sequence of protein I shows significant similarity to L32 from E. coli, whereas that of protein II is slightly, if at all, related to ribosomal protein L34 from E. coli.
12.
R. P. Ambler 《Journal of molecular evolution》1996,42(6):617-630
Despite the revolution caused by information from macromolecular sequences, the basis of bacterial classification remains the genus and the species. How do these terms relate to the variety of bacteria that exist on earth? In this paper, the inter- and intraspecies differences in amino acid sequence of several bacterial electron transport proteins, cytochromes c, and blue copper proteins are compared. For the soil and water organisms studied, bacterial species can be classed as “tight” when there is little intraspecies variation, or “loose” when this variation is large. For this set of proteins and organisms, interspecies variation is much larger than that within a species. Examples of “tight” species are Pseudomonas aeruginosa and Rhodobacter sphaeroides, while Pseudomonas stutzeri and Rhodopseudomonas palustris are loose species. The results are discussed in the context of the origin and age of bacterial species, and the distribution of genomes in “sequence space.” The situation is probably different for commensal or pathogenic bacteria, whose population structure and evolution are linked to the properties of another organism.
13.
Functional divergence in protein (family) sequence evolution (Total citations: 6; self-citations: 0; citations by others: 6)
Gu X 《Genetica》2003,118(2-3):133-141
As widely used today to infer function, the homology search is based on the neutral theory that sites of greatest functional significance are under the strongest selective constraints as well as lowest evolutionary rates, and vice versa. Therefore, site-specific rate changes (or altered selective constraints) are related to functional divergence during protein (family) evolution. In this paper, we review our recent work about this issue. We show a great deal of functional information can be obtained from the evolutionary perspective, which can in turn be used to facilitate high throughput functional assays. The emergence of evolutionary functional genomics is also indicated. The related software DIVERGE can be obtained from http://xgu1.zool.iastate.edu.
14.
The lipoyl-binding domain is often present, in one or several copies, in the E2 subunit and, less often, in the E1 and E3 subunits of 2-oxo acid dehydrogenase complexes. Phylogenetic analysis shows evidence of multiple, independent intragenomic recombination events between different versions of the lipoyl-binding domain in various bacteria and eukaryotic mitochondria, leading to homogenization of the sequences of the lipoyl-binding domain within the same enzymatic complex in several bacterial lineages. This appears to be the first case of sequence homogenization at the level of an individual domain in prokaryotes.
15.
The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).
16.
A combinatorial sequence space (CSS) model was introduced to represent sequences as a set of overlapping k-tuples of some fixed length which correspond to points in the CSS. The aim was to analyze clusterization of protein sequences in the CSS and to test various hypotheses about the possible evolutionary basis of this clusterization. The authors developed an easy-to-use technique which can reveal and analyze such a clusterization in a multidimensional CSS. Application of the technique led to an unexpectedly high clusterization of points in the CSS corresponding to k-tuples from known proteins. The clusterization could not be inferred from nonuniform amino acid frequencies or be explained by the influence of homologous data. None of the tested possible evolutionary and structural factors could explain the clusterization observed either. It looked as if certain protein sequence variations occurred and were fixed in the early course of evolution. Subsequent evolution (predominantly neutral) allowed only a limited number of changes and permitted new variants which led to preservation of certain k-tuples during the course of evolution. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of sequence space features found were also discussed. Correspondence to: H.A. Lim
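The representation described in this abstract, a sequence as its set of overlapping k-tuples, can be sketched in a few lines. The function names and the example sequences below are hypothetical, and the clustering analysis itself is not reproduced here.

```python
def k_tuples(sequence, k):
    """Return all overlapping k-tuples of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_k_tuples(seq_a, seq_b, k):
    """k-tuples (points in the combinatorial sequence space) common to both sequences."""
    return set(k_tuples(seq_a, k)) & set(k_tuples(seq_b, k))


print(k_tuples("MKVLA", 3))
print(sorted(shared_k_tuples("MKVLAG", "TKVLAS", 3)))
```

A sequence of length n yields n − k + 1 tuples, so each protein maps to a small cloud of points whose clustering can then be studied.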
17.
Kadlík V Strohalm M Kodícek M 《Biochemical and biophysical research communications》2003,305(4):1091-1093
The lysine ε-amino group reacts with citraconic anhydride, forming a derivative that is stable under the conditions used for trypsin cleavage. This modification changes the spectrum of peptides formed by the action of trypsin; as the number of trypsin-sensitive sites is reduced, peptides with higher molecular mass can survive in the digest. Studies of proteins by MALDI-TOF mass spectrometry are often complicated by the low sequence coverage of the peptide chain. This paper demonstrates that the modification of proteins by citraconylation before trypsin cleavage represents a simple experimental technique, which allows a significant increase of sequence coverage in MALDI-TOF mass spectrometry. This improvement is caused both by the change in the trypsin fragmentation pattern and by disturbance of the protein's native tertiary structure.
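The effect on the fragmentation pattern can be illustrated with a small in silico digest. The sketch assumes the commonly used rule that trypsin cleaves C-terminal to lysine (K) and arginine (R) unless the next residue is proline, and models citraconylation simply as making lysine uncleavable. The function name and the example sequence are hypothetical.

```python
def tryptic_peptides(sequence, lysine_blocked=False):
    """Split a protein sequence at trypsin sites (simplified rule).

    With lysine_blocked=True, modified lysines are no longer cleavage
    sites, so cleavage occurs only after arginine.
    """
    cleave_after = {"R"} if lysine_blocked else {"K", "R"}
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i + 1 == len(sequence)
        next_is_proline = not is_last and sequence[i + 1] == "P"
        if residue in cleave_after and not is_last and not next_is_proline:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


sequence = "AKGRLKPMRTK"
print(tryptic_peptides(sequence))
print(tryptic_peptides(sequence, lysine_blocked=True))
```

Blocking lysine merges neighbouring fragments into fewer, longer peptides, which matches the abstract's point that higher-mass peptides survive in the digest.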
18.
Complete nucleotide sequence of immunogenic protein MPB70 from Mycobacterium bovis BCG (Total citations: 15; self-citations: 0; citations by others: 15)
Kunihiro Terasaka Ryuji Yamaguchi Kazuhiro Matsuo Akihiro Yamazaki Sadamu Nagai Takeshi Yamada 《FEMS microbiology letters》1989,58(2-3):273-276
The extracellular protein MPB70 is a heat-stable immunogenic protein which was found in the culture filtrate of Mycobacterium bovis BCG Japanese. We determined the complete nt and aa sequences of MPB70 and correlated them with the previously reported data. The N-terminal sequence revealed that the signal peptide (SP) consisted of 30 aa and that the mature protein had 163 aa with a molecular weight of 16,305. The SP displayed a characteristic Ala-rich property which would be efficient for SP function.
19.
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.
20.
The rapid increase in protein sequence information from genome sequencing projects demands the intervention of bioinformatics tools to recognize interesting gene products and associated function. Often, multiple algorithms need to be employed to improve accuracy in predictions, and several structure prediction algorithms are in the public domain. Here, we report the availability of an Integrated Web-server as a bioinformatics online package dedicated to in-silico analysis of protein sequence and structure data (IWS). IWS provides a web interface to both in-house and widely accepted programs from major bioinformatics groups, organized as 10 different modules. IWS also provides interactive images for Analysis Work Flow, which give the user transparency to carry out analysis by moving across modules seamlessly and to perform predictions rapidly. Availability: IWS is available at http://caps.ncbs.res.in/iws.