Similar literature
 20 similar articles found
1.
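Cizhu (慈竹) is one of the dominant clumping bamboo species native to Sichuan, China. Its fibers are of superior length and quality, making it a good raw material for papermaking, textiles, and other industries. In this study, the Illumina HiSeq™ 2000 platform was used for transcriptome analysis of Cizhu shoots 10, 50, 100, and 150 cm in height. A total of 69.28 M reads were obtained; after de novo assembly and clustering, these yielded 111,137 non-redundant unigenes, of which 63,094 were annotated in the COG, GO, KEGG, Swiss-Prot, and Nr databases. These unigenes not only have general functions, such as transcription and signal transduction, but are also involved in sucrose transport and metabolism and in the biosynthesis of secondary metabolites and the cell wall. Cellulose synthase genes were differentially expressed among shoots of different heights, and genes that may regulate Cizhu growth and development, as well as cellulose and lignin biosynthesis, were identified, providing a theoretical basis for the improvement of Cizhu varieties.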

2.
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.
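As a rough illustration of the "average positive predictive power" figure quoted in this abstract, the sketch below computes positive predictive value as TP / (TP + FP) for each template and then averages across templates. The function names, the per-template averaging, and the example counts are assumptions made for illustration; the abstract does not say how the authors aggregated their results.

```python
def positive_predictive_value(true_positives, false_positives):
    """Return TP / (TP + FP) for one template's matches."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("template produced no matches")
    return true_positives / total


def average_ppv(per_template_counts):
    """Average the PPV over templates (assumed aggregation, not from the paper).

    per_template_counts is a list of (true_positives, false_positives) pairs.
    """
    values = [positive_predictive_value(tp, fp) for tp, fp in per_template_counts]
    return sum(values) / len(values)


# Hypothetical counts for two templates: PPVs of 0.75 and 0.5.
print(average_ppv([(3, 1), (1, 1)]))
```

Averaging per template weights every template equally; pooling all matches before dividing would instead weight templates by how many matches they return.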

3.
4.
Lee BW, Kim TH, Kim SK, Kim SS, Ryu GC, Bhak J. Molecules and Cells, 2006, 21(2): 269-275
A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patents, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in the extra-cellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net.

5.
6.
A novel method has been developed for acquiring the correct alignment of a query sequence against remotely homologous proteins by extracting structural information from profiles of multiple structure alignment. A systematic search algorithm combined with a group of score functions based on sequence information and structural information has been introduced in this procedure. A limited number of top solutions (15,000) with high scores were selected as candidates for further examination. On a test set comprising 301 proteins from 75 protein families with sequence identity less than 30%, the proportion of proteins with a completely correct alignment as first candidate was improved to 39.8% by our method, whereas the typical performance of existing sequence-based alignment methods was only between 16.1% and 22.7%. Furthermore, multiple candidates for possible alignment were provided in our approach, which dramatically increased the possibility of finding the correct alignment, such that completely correct alignments were found amongst the top-ranked 1000 candidates in 88.3% of the proteins. With the assistance of a sequence database, completely correct alignment solutions were achieved amongst the top 1000 candidates in 94.3% of the proteins. From such a limited number of candidates, it would become possible to identify more correct alignments using a more time-consuming but more powerful method with more detailed structural information, such as side-chain packing and energy minimization. The results indicate that the novel alignment strategy could be helpful for extending the application of highly reliable methods for fold identification and homology modeling to a huge number of homologous proteins of low sequence similarity. Details of the methods, together with the results and implications for future development, are presented.

7.
Mushegian A. Proteins, 2002, 47(1): 69-74
Comparative sequence analysis of presenilins reveals the conserved transmembrane domain shared with leukocyte antigen CD47, possibly involved in signal transduction. Sensitive techniques of multiple sequence alignment extend the earlier observation of the aminopeptidase homology domain in nicastrin to suggest that this protein may be a catalytically active component of secretasome involved in proteolysis or co-proteolysis of presenilin or beta-amyloid.

8.
9.
Human protein kinase C1 (PKC1) and protein kinase D1 (PKD1) are two closely related enzymes that have emerged as key regulators of many important cellular processes. In this study, 3D models of human PKC1 and PKD1 were constructed based on homology modelling and molecular dynamics simulations. A novel 2,6-naphthyridine, a potent and selective inhibitor of human PKD1 but not of PKC1, was docked into both models and adopted different orientations in their active sites. By comparing the active-site architectures of human PKC1 and PKD1, possible reasons for the difference in inhibitor binding are proposed. In addition, several residues are identified as critical for inhibitor binding.

10.
11.
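Over the past decade or more, microbiome research and applications have continued to gain momentum. The microbiome has gradually become a research focus in the life sciences, environmental sciences, medicine, and other fields. Meanwhile, many countries and organizations around the world have actively launched their own microbiome initiatives, planning on multiple fronts to secure a strategic position in this promising field. In addition, both scientific research and industrial application have seen a surge of research activity and of investment and financing, and microbiome-related products and services keep emerging. However, alongside this rapid development, the industry has some shortcomings. Because technologies and methods for microbiome sequencing and analysis are developing rapidly, research and applications in different countries have not yet converged on unified standards for technologies, protocols, and data. Domestic industry participants also have an insufficient understanding of the microbiome and have not yet fully mastered or adopted new microbiome-related methods, technologies, and theories. Moreover, some existing standards and guidelines are overly simple and lack practical operability, which not only causes difficulties and wasted resources in integrating research data, but also gives some companies the opportunity to compete unfairly and pass off inferior products as good ones. More importantly, China still lacks national standards for the microbiome, and the national microbiome initiative is still in preparation. Against this background, the Chinese Society of Biotechnology and the Institute of Microbiology, Chinese Academy of Sciences, jointly established a special research project, the "Expert Consensus on Microbiome Sequencing and Analysis", which ran from June 2019 to March 2020. The Chinese Society of Biotechnology convened 27 experts from microbiome-related fields together with, from within the industry, 3...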

12.
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.

13.
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.

14.
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for in silico prediction of structural classes from low homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier and a custom-designed, feature-based sequence representation to provide predictions. The main advantages of LLSC-PRED are the comprehensive representation, which includes 58 features describing composition and physicochemical properties of the sequences, and the transparency of the prediction model. The representation also includes predicted secondary structure content, thus for the first time exploring synergy between these two related predictions. Based on tests performed with a large set of 1673 twilight zone domains, LLSC-PRED's prediction accuracy, which exceeds 62%, is shown to be better than the accuracy of over a dozen recently published competing in silico methods and similar to the accuracy of other, non-transparent classifiers that use the proposed representation.

15.

Background: In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are associated, with high statistical significance, with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters.

16.
17.
18.
In this paper, we aim to predict protein structural classes (α, β, α+β, or α/β) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of the protein sequence. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. In particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than are used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.

19.
Searches using position specific scoring matrices (PSSMs) have been commonly used in remote homology detection procedures such as PSI-BLAST and RPS-BLAST. A PSSM is typically generated using one of the sequences of a family as the reference sequence. In the case of PSI-BLAST searches, the reference sequence is the same as the query. Recently we have shown that searches against the database of multiple family-profiles, with each one of the members of the family used as a reference sequence, are more effective than searches against the classical database of single family-profiles. Despite a relatively better overall performance when compared with common sequence-profile matching procedures, searches against the multiple family-profiles database result in a few false positives and false negatives. Here we show that profile length and divergence of the sequences used in the construction of a PSSM have a major influence on the performance of the multiple profile based search approach. We also identify that a simple parameter, defined as the number of PSSMs corresponding to a family that are hit for a query divided by the total number of PSSMs in the family, can effectively distinguish the true positives from the false positives in the multiple profiles search approach.
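The ratio described at the end of this abstract can be sketched in a few lines. This is a minimal illustration only: the function name, the use of string identifiers for PSSMs, and the example values are assumptions, and the abstract does not state what threshold on the ratio separates true from false positives.

```python
def family_hit_fraction(hit_pssms, family_pssms):
    """Number of a family's PSSMs hit by a query, divided by the family's total PSSMs.

    hit_pssms: identifiers of PSSMs the query matched (hypothetical IDs).
    family_pssms: identifiers of every PSSM built for the family.
    """
    family = set(family_pssms)
    if not family:
        raise ValueError("family has no PSSMs")
    # Count only hits that belong to this family; duplicates are ignored.
    hits = set(hit_pssms) & family
    return len(hits) / len(family)


# Hypothetical family with four PSSMs, two of which the query hits.
print(family_hit_fraction(["p1", "p3"], ["p1", "p2", "p3", "p4"]))  # 0.5
```

A query that hits most of a family's profiles scores near 1, while one that hits a single profile in a large family scores near 0, which is the contrast the abstract uses to flag likely false positives.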

20.
Canaves JM. Proteins, 2004, 56(1): 19-27
Recently, the structures of two proteins belonging to the archease family, TM1083 from Thermotoga maritima and MTH1598 from Methanobacterium thermoautotrophicum, have been solved independently by two Protein Structure Initiative structural genomics pilot centers using X-ray crystallography and NMR, respectively. The archease protein family is a good example of one of the paradoxes of structural genomics: Approximately one third of protein structures produced by structural genomics centers have no known function and are still annotated as "hypothetical proteins" in the Protein Data Bank. In the case of archeases, despite the existence of two protein structures and abundant sequence information, there is still no function assigned to this protein family. Here, our group predicts, based on structural similarity, sequence conservation, and gene context analyses, that members of this protein family might function as chaperones or modulators of proteins involved in DNA/RNA processing. The conservation of genomic context for this protein family is constant from Archaea and Bacteria to humans, and suggests that unannotated open reading frames contiguous to them could be novel RNA/DNA binding proteins.
