Similar articles
Found 20 similar articles (search time: 31 ms)
1.
In the era of structural genomics, it is necessary to generate accurate structural alignments in order to build good templates for homology modeling. Although a great number of structural alignment algorithms have been developed, most of them ignore intermolecular interactions during the alignment procedure. Therefore, structures in different oligomeric states are barely distinguishable, and it is very challenging to find the correct alignment in coil regions. Here we present a novel approach to structural alignment using a clique finding algorithm and environmental information (SAUCE). In this approach, we build the alignment based not only on structural coordinate information but also on realistic environmental information extracted from biological unit files provided by the Protein Data Bank (PDB). First, we eliminate all environmentally unfavorable pairings of residues. Then we identify alignments in core regions via a maximal clique finding algorithm. Two statistics of extreme value distribution (EVD) form have been developed to evaluate core region alignments. With an optional extension step, a global alignment can be derived based on environment-based dynamic programming linking. We show that our method is able to differentiate three-dimensional structures in different oligomeric states, and is able to find flexible alignments between multidomain structures without predetermined hinge regions. The overall performance is also evaluated on a large scale by comparisons to current structural classification databases as well as to other alignment methods.
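The core-alignment step described in this abstract can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that two candidate residue pairings are compatible when the intra-structure distances they imply agree within a tolerance `tol`; the function names, the tolerance, and the use of plain Bron-Kerbosch enumeration are all illustrative choices, and the `allowed_pairs` argument stands in for the pairings that survive the environmental filter.

```python
from itertools import combinations
from math import dist

def compatibility_graph(coords_a, coords_b, allowed_pairs, tol=1.0):
    """Nodes are candidate residue pairings (i, j); two nodes are linked when
    the distances they imply in structure A and structure B agree within tol."""
    nodes = list(allowed_pairs)
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each residue may take part in only one pairing
        d_a = dist(coords_a[i], coords_a[k])
        d_b = dist(coords_b[j], coords_b[l])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (no pivoting) of all maximal cliques."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(adj), set())

def largest_core(coords_a, coords_b, allowed_pairs, tol=1.0):
    """Return the largest mutually consistent set of pairings, sorted."""
    adj = compatibility_graph(coords_a, coords_b, allowed_pairs, tol)
    return sorted(max(maximal_cliques(adj), key=len, default=frozenset()))
```

Exhaustive clique enumeration is exponential in the worst case, so a sketch like this is only practical once the candidate pairings have been pruned heavily, which is the role the abstract assigns to the environmental filter.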

2.
3.
Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures obtained via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at http://physbio.mssm.edu/~ortizg/.
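The idea of scoring an alignment by how unlikely it is under an extreme value distribution can be sketched as follows. This assumes the standard Gumbel (type I) form with a location `mu` and scale `sigma`; both parameters are hypothetical inputs here, and how MAMMOTH actually fits or normalises them is not stated in the abstract.

```python
from math import exp, expm1, log

def evd_p_value(score, mu, sigma):
    """P(S >= score) for a Gumbel (type I extreme value) distribution."""
    z = (score - mu) / sigma
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for small values
    return -expm1(-exp(-z))

def neg_log_p(score, mu, sigma):
    """Similarity score as the negative natural log of the chance probability."""
    return -log(evd_p_value(score, mu, sigma))
```

A higher raw alignment score gives a smaller chance probability and therefore a larger `neg_log_p`, which is what makes the transformed value usable as a similarity score.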

4.
Nowadays we are experiencing a remarkable growth in the number of databases that have become accessible over the Web. However, in a certain number of cases, for example, in the case of BioImage, this information is not of a textual nature, thus posing new challenges in the design of tools to handle these data. In this work, we concentrate on the development of new mechanisms aimed at "querying" these databases of complex data sets by their intrinsic content, rather than by their textual annotations only. We concentrate our efforts on a subset of BioImage containing 3D images (volumes) of biological macromolecules, implementing a first prototype of a "query-by-content" system. In the context of databases of complex data types the term query-by-content makes reference to those data modeling techniques in which user-defined functions aim at "understanding" (to some extent) the informational content of the data sets. In these systems the matching criteria introduced by the user are related to intrinsic features concerning the 3D images themselves, hence, complementing traditional queries by textual key words only. Efficient computational algorithms are required in order to "extract" structural information of the 3D images prior to storing them in the database. Also, easy-to-use interfaces should be implemented in order to obtain feedback from the expert. Our query-by-content prototype is used to construct a concrete query, making use of basic structural features, which are then evaluated over a set of three-dimensional images of biological macromolecules. This experimental implementation can be accessed via the Web at the BioImage server in Madrid, at http://www.bioimage.org/qbc/index.html.

5.
LTR_STRUC: a novel search and identification program for LTR retrotransposons   Total citations: 10, self-citations: 0, citations by others: 10
MOTIVATION: Long terminal repeat (LTR) retrotransposons constitute a substantial fraction of most eukaryotic genomes and are believed to have a significant impact on genome structure and function. Conventional methods used to search for LTR retrotransposons in genome databases are labor intensive. We present an efficient, reliable and automated method to identify and analyze members of this important class of transposable elements. RESULTS: We have developed a new data-mining program, LTR_STRUC (LTR retrotransposon structure program) which identifies and automatically analyzes LTR retrotransposons in genome databases by searching for structural features characteristic of such elements. LTR_STRUC has significant advantages over conventional search methods in the case of LTR retrotransposon families having low sequence homology to known queries or families with atypical structure (e.g. non-autonomous elements lacking canonical retroviral ORFs) and is thus a discovery tool that complements established methods. LTR_STRUC finds LTR retrotransposons using an algorithm that encompasses a number of tasks that would otherwise have to be initiated individually by the user. For each LTR retrotransposon found, LTR_STRUC automatically generates an analysis of a variety of structural features of biological interest. AVAILABILITY: The LTR_STRUC program is currently available as a console application free of charge to academic users from the authors.

6.
sMOL Explorer is a 2D ligand-based computational tool that provides three major functionalities through a Web interface: data management; information retrieval and extraction; and statistical analysis and data mining. With sMOL Explorer, users can create personal databases by adding each small molecule via a drawing interface or by uploading data files from internal and external projects into the sMOL database. The database can then be browsed and queried with textual and structural similarity searches. A molecule can also be submitted for searching against external public databases including PubChem, KEGG, DrugBank and eMolecules. Moreover, users can easily access a variety of data mining tools from Weka and R packages to perform analyses including (1) finding frequent substructures, (2) clustering molecular fingerprints, (3) identifying and removing irrelevant attributes from the data and (4) building a classification model of biological activity. AVAILABILITY: sMOL Explorer is an Open Source project and is freely available to all interested users at http://www.biotec.or.th/ISL/SMOL/.

7.
In various international policy processes such as the UN Sustainable Development Goals, an urgent demand for robust consumption‐based indicators of material flows, or material footprints (MFs), has emerged over the past years. Yet, MFs for national economies diverge when calculated with different Global Multiregional Input–Output (GMRIO) databases, constituting a significant barrier to a broad policy uptake of these indicators. The objective of this paper is to quantify the impact of data deviations between GMRIO databases on the resulting MF. We use two methods, structural decomposition analysis and structural production layer decomposition, and apply them for a pairwise assessment of three GMRIO databases, EXIOBASE, Eora, and the OECD Inter‐Country Input–Output (ICIO) database, using an identical set of material extensions. Although all three GMRIO databases agree on the directionality of footprint results, that is, whether a country's final demand depends on net imports of raw materials from abroad or the country is a net exporter, they sometimes show significant differences in the level and composition of material flows. Decomposing the effects from the Leontief matrices (economic structures), we observe that a few sectors at the very first stages of the supply chain, that is, raw material extraction and basic processing, explain 60% of the total deviations stemming from the technology matrices. We conclude that further development of methods to align results from GMRIOs, in particular for material‐intensive sectors and supply chains, should be an important research priority. This will be vital to strengthen the uptake of demand‐based material flow indicators in the resource policy context.
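For orientation, the textbook input-output form of a material footprint and a simple two-term decomposition of the gap between two databases can be written as below. The symbols are assumptions introduced here, not the paper's notation: q is the vector of material extraction per unit of sector output, A the technical coefficient matrix, L the Leontief inverse, y the final demand vector, and subscripts 1 and 2 label the two databases being compared. The paper's own structural decomposition may split the terms further.

```latex
\begin{align}
MF &= \mathbf{q}^{\top}(\mathbf{I}-\mathbf{A})^{-1}\mathbf{y}
    = \mathbf{m}^{\top}\mathbf{y},
\qquad \mathbf{m}^{\top} = \mathbf{q}^{\top}\mathbf{L},
\quad \mathbf{L} = (\mathbf{I}-\mathbf{A})^{-1} \\
MF_{1}-MF_{2} &= (\mathbf{m}_{1}-\mathbf{m}_{2})^{\top}\,\tfrac{1}{2}(\mathbf{y}_{1}+\mathbf{y}_{2})
  + \tfrac{1}{2}(\mathbf{m}_{1}+\mathbf{m}_{2})^{\top}(\mathbf{y}_{1}-\mathbf{y}_{2})
\end{align}
```

The second line is an exact identity: the first term attributes part of the gap to differing material multipliers (production structure and intensities), the second to differing final demand.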

8.
Structural genomic projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
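The detection of "rectangular elevations" in a per-residue hit count can be sketched in a few lines. This is a simplified illustration, not the authors' program: it assumes the BLAST output has already been reduced to 1-based inclusive (start, end) ranges on the query, and the thresholds `min_height` (at least 1) and `min_length` are hypothetical parameters.

```python
def coverage_profile(query_len, hits):
    """Count, for each query residue, how many hits cover it.
    `hits` holds 1-based inclusive (start, end) ranges on the query."""
    diff = [0] * (query_len + 2)
    for start, end in hits:
        diff[start] += 1
        diff[end + 1] -= 1
    profile, running = [], 0
    for pos in range(1, query_len + 1):
        running += diff[pos]
        profile.append(running)
    return profile

def elevated_segments(profile, min_height, min_length):
    """Return 1-based inclusive (start, end) runs where coverage stays at or
    above min_height (assumed >= 1) for at least min_length residues."""
    segments, start = [], None
    # A trailing 0 acts as a sentinel that closes a run reaching the last residue.
    for idx, height in enumerate(profile + [0], start=1):
        if height >= min_height and start is None:
            start = idx
        elif height < min_height and start is not None:
            if idx - start >= min_length:
                segments.append((start, idx - 1))
            start = None
    return segments
```

For example, `elevated_segments(coverage_profile(10, [(2, 5), (3, 5), (8, 9)]), 2, 3)` reports the single stretch covered by at least two hits over three or more residues.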

9.
SUMMARY: AiO (All in One) is a program for Windows that combines typical DNA/protein features, such as plasmid map drawing, ORF finding, translation, back-translation and high quality printing, with a number of databases. These databases allow the management of oligonucleotides, oligonucleotide manufacturers, restriction enzymes, structural DNA and program users in a multi-user/multi-group environment. AVAILABILITY: An AiO-specific website, with a download option, is at: http://134.99.88.55/aio/ SUPPLEMENTARY INFORMATION: Examples with screenshots: http://134.99.88.55/aio/; manual (in PDF format): http://134.99.88.55/aio/manual.pdf

10.
Structure comparison is a crucial aspect of structural biology today. The field of structure comparison is developing rapidly, with the development of new algorithms, similarity scores, and statistical scores. The predicted large increase of experimental structures and structural models made possible by high-throughput efforts means that structural comparison and searching of structural databases using automated methods will become increasingly common. This Ways & Means article is meant to guide the structural biologist in the basics of structural alignment, and to provide an overview of the available software tools. The main purpose is to encourage users to gain some understanding of the strengths and limitations of structural alignment, and to take these factors into account when interpreting the results of different programs.

11.
12.
Structural classification of membrane proteins is still in its infancy due to the relative paucity of available three‐dimensional structures compared with soluble proteins. However, recent technological advances in protein structure determination have led to a significant increase in experimentally known membrane protein folds, warranting exploration of the structural universe of membrane proteins. Here, a new and completely membrane protein specific structural classification system is introduced that classifies α‐helical membrane proteins according to common helix architectures. Each membrane protein is represented by a helix interaction graph depicting transmembrane helices with their pairwise interactions resulting from individual residue contacts. Subsequently, proteins are clustered according to similarities among these helix interaction graphs using a newly developed structural similarity score called HISS. As HISS scores explicitly disregard structural properties of loop regions, they are more suitable to capture conserved transmembrane helix bundle architectures than other structural similarity scores. Importantly, we are able to show that a classification approach based on helix interaction similarity closely resembles conventional structural classification databases such as SCOP and CATH implying that helix interactions are one of the major determinants of α‐helical membrane protein folds. Furthermore, the classification of all currently available membrane protein structures into 20 recurrent helix architectures and 15 singleton proteins demonstrates not only an impressive variability of membrane helix bundles but also the conservation of common helix interaction patterns among proteins with distinctly different sequences. Proteins 2010. © 2010 Wiley‐Liss, Inc.

13.
Rapid automatic detection and alignment of repeats in protein sequences   Total citations: 11, self-citations: 0, citations by others: 11
Heger A, Holm L. Proteins 2000, 41(2): 224-237
Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information.

14.
We report a detailed classification of disulfide patterns to further understand the role of disulfides in protein structure and function. The classification is applied to a unique searchable database of disulfide patterns derived from the SwissProt and Pfam databases. The disulfide database contains seven times the number of publicly available disulfide annotations. Each disulfide pattern in the database captures the topology and cysteine spacing of a protein domain. We have clustered the domains by their disulfide patterns and visualized the results using a novel representation termed the "classification wheel." The classification is applied to 40,620 protein domains with 2-10 disulfides. The effectiveness of the classification is evaluated by determining the extent to which proteins of similar structure and function are grouped together through comparison with the SCOP and Pfam databases, respectively. In general, proteins with similar disulfide patterns have similar structure and function, even in cases of low sequence similarity, and we illustrate this with specific examples. Using a measure of disulfide topology complexity, we find that there is a predominance of less complex topologies. We also explored the importance of loss or addition of disulfides to protein structure and function by linking classification wheels through disulfide subpattern comparisons. This classification, when coupled with our disulfide database, will serve as a useful resource for searching and comparing disulfide patterns, and understanding their role in protein structure, folding, and stability. Proteins in the disulfide clusters that do not contain structural information are prime candidates for structural genomics initiatives, because they may correspond to novel structures.
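One plausible way to encode a pattern that "captures the topology and cysteine spacing" is sketched below. The representation is an assumption made for illustration and is not taken from the paper: topology is expressed as the pairing of cysteines by their order along the sequence, and spacing as the residue gaps between consecutive bonded cysteines.

```python
def disulfide_pattern(bonds):
    """`bonds` is a list of (pos_a, pos_b) sequence positions of bonded cysteines.
    Returns (topology, spacing): topology pairs cysteines by their ordinal rank
    along the chain; spacing lists gaps between consecutive bonded cysteines."""
    positions = sorted(p for bond in bonds for p in bond)
    rank = {pos: n for n, pos in enumerate(positions, start=1)}
    topology = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    spacing = [b - a for a, b in zip(positions, positions[1:])]
    return topology, spacing
```

Two domains with the same `topology` but different `spacing` would share a connectivity while differing in loop lengths, which is the kind of distinction a clustering on these patterns could exploit.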

15.
Low in vivo solubility of recombinant proteins expressed in Escherichia coli can seriously hinder the purification of structural samples for large-scale proteomic NMR and X-ray crystallography studies. Previous results from our laboratory have shown that up to one half of all bacterial and archaeal proteins are insoluble when overexpressed in E. coli. Although a number of strategies may be used to increase in vivo protein solubility, there are no generally applicable methods, and the expression of each insoluble recombinant protein must be individually optimized. For this reason, we have tested a generic denaturation/refolding protein purification procedure to assess the number of structural samples that could be generated by using this methodology. Our results show that a denaturation/refolding protocol is appropriate for many small proteins (

16.
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.

17.
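Advances in forest tree genomics research   Total citations: 7, self-citations: 0, citations by others: 7
Research in forest tree genomics is advancing rapidly. In structural genomics, genetic linkage maps have been constructed for nearly 40 major plantation tree species, more than 30 important quantitative trait loci have been mapped in various species, comparative genomics and consensus map construction have been carried out in some species, whole-genome sequencing of poplar has been completed, and whole-genome sequencing of eucalyptus is under way. In functional genomics, transcriptome EST sequences from multiple tissues of the major plantation species have been analyzed, and functional genomics studies have addressed processes such as secondary growth and wood formation, flowering, and the development of cold tolerance. In addition, trends in forest tree genomics research are discussed, with the aim of providing a useful reference for forest tree genomics research in China.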

18.
From protein structure to function.   Total citations: 6, self-citations: 0, citations by others: 6
Several databases of protein structural families now exist, organised according to both evolutionary relationships and common folding arrangements. Although these lag behind sequence databases in size, the prospect of structural genomics initiatives means that they may soon include representatives of many of the sequence families. To some extent, functional information can be derived from structural similarity. For some structural families, their function is highly conserved, whereas, for others, it can only be inherited or derived on the basis of additional information (e.g. sequence patterns, common residue clusters and characteristic surface properties).

19.
Structural biology sheds light on the puzzle of genomic ORFans   Total citations: 5, self-citations: 0, citations by others: 5
Genomic ORFans are orphan open reading frames (ORFs) with no significant sequence similarity to other ORFs. ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because nothing can be learnt about ORFans via sequence homology, the functions and evolutionary origins of ORFans remain a mystery. Furthermore, because relatively few ORFans have been experimentally characterized, it has been suggested that most ORFans are not likely to correspond to functional, expressed proteins, but rather to spurious ORFs, pseudo-genes or to rapidly evolving proteins with non-essential roles. As a snapshot view of current ORFan structural studies, we searched for ORFans among proteins whose three-dimensional structures have been recently determined. We find that functional and structural studies of ORFans are not as underemphasized as previously suggested. These recently determined structures correspond to ORFans from all Kingdoms of life, and include proteins that have previously been functionally characterized, as well as structural genomics targets of unknown function labeled as "hypothetical proteins". This suggests that many of the ORFans in the databases are likely to correspond to expressed, functional (and even essential) proteins. Furthermore, the recently determined structures include examples of the various types of ORFans, suggesting that the functions and evolutionary origins of ORFans are diverse. Although this survey sheds some light on the ORFan mystery, further experimental studies are required to gain a better understanding of the role and origins of the tens of thousands of ORFans awaiting characterization.

20.
MOTIVATION: Protein structure classification has been recognized as one of the most important research issues in protein structure analysis. A substantial number of methods for the classification have been proposed, and several databases have been constructed using these methods. Since some proteins with very similar sequences may exhibit structural diversities, we have proposed PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB), whose selection strategy is based not only on sequence similarity but also on structural similarity. Forty-eight representative sets whose similarity criteria were predetermined were made available over the World Wide Web (WWW). However, the sets were insufficient in number to satisfy users researching protein structures by various methods. RESULT: We have improved the system for PDB-REPRDB so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. This paper describes the method we use to classify chains and select the representatives in the system. We also describe the interface used to set the parameters.
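A selection that uses both sequence and structural cut-offs can be sketched as a greedy pass over the chains. This is an illustrative assumption, not the published PDB-REPRDB procedure: the chains are taken to be pre-sorted best-first, the callables `seq_identity` and `rmsd` are hypothetical stand-ins for real comparison tools, and a chain is treated as redundant only when it matches an already chosen representative on both criteria.

```python
def select_representatives(chains, seq_identity, rmsd,
                           min_identity=0.9, max_rmsd=2.0):
    """Greedy selection over `chains` (assumed sorted best-first).
    A chain is skipped only if some chosen representative is similar to it in
    both sequence (identity >= min_identity) and structure (RMSD <= max_rmsd)."""
    reps = []
    for chain in chains:
        redundant = any(
            seq_identity(chain, rep) >= min_identity
            and rmsd(chain, rep) <= max_rmsd
            for rep in reps
        )
        if not redundant:
            reps.append(chain)
    return reps
```

Requiring both conditions keeps chains that share a sequence but adopt different conformations, which matches the motivation stated in the abstract; changing the two cut-offs corresponds to the user-configurable parameters it mentions.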
