Similar articles
20 similar articles found
1.
Kawabata T  Arisaka F  Nishikawa K 《Gene》2000,259(1-2):223-233
Among the total of 274 orfs within bacteriophage T4, only half have been reasonably well characterized, and the functions of the rest have remained obscure. In order to predict the molecular functions of the orfs, a position-specific iterated (PSI)-BLAST search of bacteriophage T4 against the sequence database of known 3D structures was carried out. PSI-BLAST is one of the most powerful iterative sequence search methods using multiple sequence alignment, with the ability to detect many more proteins with distant homology than standard pairwise methods. The 3D structures of proteins are considered to be better preserved than the sequences, and the detected distantly homologous proteins are likely to possess highly similar 3D structures. Thirteen orfs of phage T4, whose homologues were not detected by standard pairwise methods, were found to have significantly homologous counterparts by this method. The plausibility of the results was confirmed by checking whether important residues at substrate/ligand-binding sites were conserved. Among them, two orfs, vs.1 and e.1, which are similar to Escherichia coli lytic enzyme and MutT protein, respectively, had not been studied previously. Also, gp rIIA, a rapid lysis protein, whose gene structure had been intensively studied during the development of molecular biology in the 1950s and yet whose molecular function remains unknown, has an N-terminal domain that is significantly similar to the N-terminal region of the heat shock protein Hsp90.

2.
In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It not only substantially expands the network flow technique for protein sequence design in Kleinberg's seminal work but also is applicable to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.

3.
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence–structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes, implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.

4.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

5.
Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure.

6.
RecA family proteins are responsible for homology search and strand exchange. In bacteria, homology search begins after RecA binds an initiating single-stranded DNA (ssDNA) in the primary DNA-binding site, forming the presynaptic filament. Once the filament is formed, it interrogates double-stranded DNA (dsDNA). During the interrogation, bases in the dsDNA attempt to form Watson–Crick bonds with the corresponding bases in the initiating strand. Mismatch-dependent instability in the base pairing in the heteroduplex strand exchange product could provide stringent recognition; however, we present experimental and theoretical results suggesting that the heteroduplex stability is insensitive to mismatches. We also present data suggesting that an initial homology test of 8 contiguous bases rejects most interactions containing more than 1/8 mismatches without forming a detectable 20 bp product. We propose that, in vivo, the sparsity of accidental sequence matches allows an initial 8 bp test to rapidly reject almost all non-homologous sequences. We speculate that once the initial test is passed, the mismatch-insensitive binding in the heteroduplex allows short mismatched regions to be incorporated in otherwise homologous strand exchange products even though sequences with less homology are eventually rejected.
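As a toy illustration of the initial 8-base test described in this abstract, the sketch below counts mismatches over the first 8 positions of two sequences and rejects the pairing when more than one position mismatches. The function names, the default window of 8 bases, and the tolerance of one mismatch are assumptions read off the abstract's wording. This is not the authors' experimental or theoretical model.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def passes_initial_test(ssdna: str, dsdna_window: str,
                        test_len: int = 8, max_mismatches: int = 1) -> bool:
    """Hypothetical initial homology test on the first `test_len` bases.

    `dsdna_window` is assumed to be written in the same sense as `ssdna`,
    so a perfect homolog is an identical string.
    """
    if len(ssdna) < test_len or len(dsdna_window) < test_len:
        return False  # too short to run the test at all
    mismatches = count_mismatches(ssdna[:test_len], dsdna_window[:test_len])
    return mismatches <= max_mismatches
```

Only the leading window is inspected here, so bases beyond position 8 do not affect the outcome, loosely mirroring the abstract's suggestion that later mismatches can be tolerated once the initial test has passed.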

7.
Histone Sequence Database: new histone fold family members.
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

8.
Identifying common local segments, also called motifs, in multiple protein sequences plays an important role in establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing an identity > 25%). However, for distant proteins, it is much more difficult to align motifs that are not similar in sequence but still share common structures or functions. This paper is a first attempt to align multiple protein sequences using both primary and secondary structure information. A new sequence model is proposed so that the model assigns high probabilities not only to motifs that contain conserved amino acids but also to motifs that present common secondary structures. The proposed method is tested on the structural alignment database BAliBASE. We show that information brought by the predicted secondary structures greatly improves motif identification. A website of this program is available at www.stat.purdue.edu/~junxie/2ndmodel/sov.html.

9.
Prediction of protein subcellular localization
Yu CS  Chen YC  Lu CH  Hwang JK 《Proteins》2006,64(3):643-651
Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences, with some data sets comprising sequences of up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches, especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.
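The 30% sequence identity figure quoted above presupposes some way of computing identity from an alignment. The minimal sketch below shows one common convention: matches divided by the number of aligned, non-gap column pairs. The function names and the choice of denominator are assumptions, since the abstract does not say which definition the authors used, and other definitions (for example, dividing by the shorter sequence length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def above_identity_cutoff(identity: float, cutoff: float = 30.0) -> bool:
    """Hypothetical helper: is the pair at or above the quoted 30% level?"""
    return identity >= cutoff
```

For example, `percent_identity("ACDE-FG", "ACDQWFG")` compares six ungapped columns, five of which match.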

10.
The quality of three-dimensional homology models derived from protein sequences provides an independent measure of the suitability of a protein sequence for a certain fold. We have used automated homology modeling and model assessment tools to identify putative nuclear hormone receptor ligand-binding domains in the genome of Caenorhabditis elegans. Our results indicate that the availability of multiple crystal structures is crucial to obtaining useful models in this receptor family. The majority of annotated mammalian nuclear hormone receptors could be assigned to a ligand-binding domain fold by using the best model derived from any of four template structures. This strategy also assigned the ligand-binding domain fold to a number of C. elegans sequences without prior annotation. Interestingly, the retinoic acid receptor crystal structure contributed most to the number of sequences that could be assigned to a ligand-binding domain fold. Several causes for this can be suggested, including the high quality of this protein structure in terms of our assessment tools, similarity between the biological function or ligand of this receptor and the modeled genes, and gene duplication in C. elegans.

11.
Thompson JD  Koehl P  Ripp R  Poch O 《Proteins》2005,61(1):127-136
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.

12.
Sequence alignments are fundamental to a wide range of applications, including database searching, functional residue identification and structure prediction techniques. These applications predict or propagate structural/functional/evolutionary information based on a presumed homology between the aligned sequences. If the initial hypothesis of homology is wrong, no subsequent application, however sophisticated, can be expected to yield accurate results. Here we present a novel method, LEON, to predict homology between proteins based on a multiple alignment of complete sequences (MACS). In MACS, weak signals from distantly related proteins can be considered in the overall context of the family. Intermediate sequences and the combination of individual weak matches are used to increase the significance of low-scoring regions. Residue composition is also taken into account by incorporation of several existing methods for the detection of compositionally biased sequence segments. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons with structural and sequence family databases, where the specificity was shown to be >99% and the sensitivity was estimated to be ~76%. LEON can thus be used to reliably identify the complex relationships between large multidomain proteins and should be useful for automatic high-throughput genome annotations, 2D/3D structure predictions, protein–protein interaction predictions, etc.

13.
Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within a timescale that a cell’s biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences on a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes for the two sequences of DNA to be compared, i.e. the synapse lifetime. Here, we visualize for the first time fluorescently tagged individual synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and differences in sequence between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′- or 5′-ends of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo.

14.
The Protein Mutant Database.
Currently the protein mutant database (PMD) contains over 81 000 mutants, including artificial as well as natural mutants of various proteins extracted from about 10 000 articles. We recently developed a powerful viewing and retrieving system (http://pmd.ddbj.nig.ac.jp), which is integrated with the sequence and tertiary structure databases. The system has the following features: (i) mutated sequences are displayed after being automatically generated from the information described in the entry together with the sequence data of wild-type proteins integrated. This is a convenient feature because it allows one to see the position of altered amino acids (shown in a different color) in the entire sequence of a wild-type protein; (ii) for those proteins whose 3D structures have been experimentally determined, a 3D structure is displayed to show mutation sites in a different color; (iii) a sequence homology search against PMD can be carried out with any query sequence; (iv) a summary of mutations of homologous sequences can be displayed, which shows all the mutations at a certain site of a protein, recorded throughout the PMD.

15.
The 5′-cap structure of most spliceosomal small nuclear RNAs (snRNAs) and certain small nucleolar RNAs (snoRNAs) undergoes hypermethylation from a 7-methylguanosine to a 2,2,7-trimethylguanosine structure. 5′-Cap hypermethylation of snRNAs is dependent upon a conserved sequence element known as the Sm site common to most snRNAs. Here we have performed a mutational analysis of U3 and U14 to determine the cis-acting sequences required for 5′-cap hypermethylation of Box C/D snoRNAs. We have found that both the conserved sequence elements Box C (termed C′ in U3) and Box D are necessary for cap hypermethylation. Furthermore, the terminal stem structure that is formed by sequences that flank Box C (C′ in U3) and Box D is also required. However, mutation of other conserved sequences has no effect on hypermethylation of the cap. Finally, the analysis of fragments of U3 and U14 RNAs indicates that the Box C/D motif, including Box C (C′ in U3), Box D and the terminal stem, is capable of directing cap hypermethylation. Thus, the Box C/D motif, which is important for snoRNA processing, stability, nuclear retention, protein binding, nucleolar localization and function, is also necessary and sufficient for cap hypermethylation of these RNAs.

16.
In this paper, we improve homology search performance by combining predicted protein secondary structures with protein sequences. Previous research suggested that the straightforward combination of predicted secondary structures did not improve the homology search performance, mostly because of the errors in the structure prediction. We solved this problem by taking into account the confidence scores output by the prediction programs.

17.
Dolan MA  Keil M  Baker DS 《Proteins》2008,72(4):1243-1258
Although the number of known protein structures is increasing, the number of protein sequences without determined structures is still much larger. Three-dimensional (3D) protein structure information helps in the understanding of functional mechanisms, but solving structures by X-ray crystallography or NMR is often a lengthy and difficult process. A relatively fast way of determining a protein's 3D structure is to construct a computer model using homologous sequence and structure information. Much work has gone into algorithms that comprise the ORCHESTRAR homology modeling program in the SYBYL software package. This novel homology modeling tool combines algorithms for modeling conserved cores, variable regions, and side chains. The paradigm of using existing knowledge from multiple templates and the underlying protein environment knowledgebase is used in all of these algorithms, and will become even more powerful as the number of experimentally derived protein structures increases. To determine how ORCHESTRAR compares to Composer (a widely used but older tool), homology models of 18 proteins were constructed using each program so that a detailed comparison of each step in the modeling process could be carried out. Proteins modeled include kinases, dihydrofolate reductase, HIV protease, and factor Xa. In almost all cases ORCHESTRAR produces models with lower root-mean-squared deviation (RMSD) values when compared with structures determined by X-ray crystallography or NMR. Moreover, ORCHESTRAR produced a homology model for three target sequences where Composer failed to produce any. Data for RMSD comparisons between structurally conserved cores, structurally variable regions, and side-chain conformations are presented, as well as analyses of active site and protein-protein interface configurations.
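The RMSD values used above to compare models with experimental structures can be illustrated with a minimal sketch. It assumes the two coordinate sets are already superposed and list the same atoms in the same order. The function name and the tuple-based coordinate format are illustrative choices, not part of ORCHESTRAR or SYBYL, and no optimal superposition step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared deviation between two matched 3D coordinate lists.

    Each argument is a sequence of (x, y, z) tuples. The structures are
    assumed to be pre-superposed, with atoms paired by index.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice the model is first superposed on the reference structure, because RMSD computed on unaligned coordinates mostly reflects the arbitrary placement of the two structures rather than their structural difference.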

18.
19.
GENIUS II is an automated database system in which open reading frames (ORFs) in complete genomes are assigned to known protein three-dimensional (3D) structures. The system uses the multiple intermediate sequence search method in which query and target sequences are linked by intermediate sequences gathered by PSI-BLAST search. By applying the system to 129 complete genomes, 43.8% on average of the ORFs in the genomes were assigned to known 3D structures and the results are available for free at the GENIUS II web site.
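The idea of linking a query to a target through intermediate sequences can be sketched as a breadth-first search over a graph of search hits. In the sketch below, `hits` is a hypothetical dictionary standing in for precomputed PSI-BLAST results (sequence ID mapped to the IDs it hits), and `max_hops` is an assumed limit on chain length. It illustrates the general intermediate-sequence idea only and is not the GENIUS II implementation.

```python
from collections import deque


def linked_via_intermediates(hits, query, target, max_hops=3):
    """Return the shortest chain of IDs from query to target, or None.

    `hits` maps a sequence ID to an iterable of IDs it detects; a chain
    may use at most `max_hops` search steps.
    """
    queue = deque([[query]])
    seen = {query}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_hops:
            continue  # chain is already as long as allowed
        for nxt in hits.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With `hits = {"q": ["i1"], "i1": ["i2"], "i2": ["t"]}`, the call `linked_via_intermediates(hits, "q", "t")` returns the chain through both intermediates, while `max_hops=2` yields `None`.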

20.
Methods for protein structure (3D)-sequence (1D) compatibility evaluation (threading) have been developed during the past decade. The protocol in which a sequence can recognize its compatible structure in the structural library (i.e., the fold recognition or the forward-folding search) is available for the structure prediction of new proteins. However, the reverse protocol, in which a structure recognizes its homologous sequences among a sequence database, named the inverse-folding search, is a more difficult application. In this study, we have investigated the feasibility of the latter approach. A structural library, composed of about 400 well-resolved structures with mutually dissimilar sequences, was prepared, and 163 of them had remote homologs in the library. We examined whether they could correctly seek their homologs by both forward- and inverse-folding searches. The results showed that the inverse-folding protocol is more effective than the forward-folding protocol, once the reference states of the compatibility functions are appropriately adjusted. This adjustment only slightly affects the ability of the forward-folding search. We noticed that the scoring, in which a given sequence is re-mounted onto a structure according to the 3D-1D alignment determined by the dynamic programming method, is only effective in the forward-folding protocol and not in the inverse-folding protocol. Namely, the inverse-folding search works significantly better with the score given by the 3D-1D alignment per se, rather than that obtained by the re-mounting. The implications of these results are discussed.
