Found 20 similar articles (search time: 15 ms)
1.
2.
Wright FA Lemon WJ Zhao WD Sears R Zhuo D Wang JP Yang HY Baer T Stredney D Spitzner J Stutz A Krahe R Yuan B 《Genome biology》2001,2(7):research0025.1-research002518
3.
4.
《Genomics》2020,112(1):603-614
Russula griseocarnosa is an edible ectomycorrhizal fungus with medicinal properties that grows in southern China. Total DNA was isolated from a fresh fruiting body of R. griseocarnosa and sequenced using Illumina HiSeq together with the PacBio RS sequencing platform. Here, we present the 64.81 Mb draft genome map of R. griseocarnosa based on 471 scaffolds and 16,128 protein-coding genes. Annotations for the protein-coding genes were obtained by blastp. Phylogenetic analysis revealed a close evolutionary relationship of R. griseocarnosa to Heterobasidion irregulare and Stereum hirsutum in the core Russulales clade. The R. griseocarnosa genome encodes a repertoire of enzymes engaged in carbohydrate and polysaccharide metabolism, along with cytochrome P450s and secondary metabolite biosynthesis. The genome content of R. griseocarnosa provides insights into the genetic basis of its reported medicinal properties and serves as a reference for comparative genomics of fungi.
5.
6.
Systematic discovery of functional modules and context-specific functional annotation of human genome Total citations: 1, self-citations: 0, citations by others: 1
Huang Y Li H Hu H Yan X Waterman MS Huang H Zhou XJ 《Bioinformatics (Oxford, England)》2007,23(13):i222-i229
MOTIVATION: The rapid accumulation of microarray datasets provides unique opportunities to perform systematic functional characterization of the human genome. We designed a graph-based approach to integrate cross-platform microarray data, and extract recurrent expression patterns. A series of microarray datasets can be modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. The integrative approach provides three major advantages over the commonly used microarray analysis methods: (1) enhance signal-to-noise separation, (2) identify functionally related genes without co-expression, and (3) provide a way to predict gene functions in a context-specific way. RESULTS: We integrate 65 human microarray datasets, comprising 1105 experiments and over 11 million expression measurements. We develop a data mining procedure based on frequent itemset mining and biclustering to systematically discover network patterns that recur in at least five datasets. This resulted in 143,401 potential functional modules. Subsequently, we design a network topology statistic based on graph random walk that effectively captures characteristics of a gene's local functional environment. Function annotations based on this statistic are then assessed using the random forest method, combining six other attributes of the network modules. We assign 1126 functions to 895 genes, 779 known and 116 unknown, with a validation accuracy of 70%. Among our assignments, 20% of the genes are assigned multiple functions based on different network environments. AVAILABILITY: http://zhoulab.usc.edu/ContextAnnotation.
7.
More than 300 bacterial genome sequences are publicly available, and many more are scheduled to be completed and released in the near future. Converting this raw sequence information into a better understanding of the biology of bacteria involves the identification and annotation of genes, proteins and pathways. This processing is typically done using sequence annotation pipelines comprised of a variety of software modules and, in some cases, human experts. The reference databases, computational methods and knowledge that form the basis of these pipelines are constantly evolving, and thus there is a need to reprocess genome annotations on a regular basis. The combined challenge of revising existing annotations and extracting useful information from the flood of new genome sequences will necessitate more reliance on completely automated systems.
8.
9.
10.
11.
Kelly M. McGarvey Tamara Goldfarb Eric Cox Catherine M. Farrell Tripti Gupta Vinita S. Joardar Vamsi K. Kodali Michael R. Murphy Nuala A. O’Leary Shashikant Pujar Bhanu Rajput Sanjida H. Rangwala Lillian D. Riddick David Webb Mathew W. Wright Terence D. Murphy Kim D. Pruitt 《Mammalian genome》2015,26(9-10):379-390
12.
13.
Braun BR van Het Hoog M d'Enfert C Martchenko M Dungan J Kuo A Inglis DO Uhl MA Hogues H Berriman M Lorenz M Levitin A Oberholzer U Bachewich C Harcus D Marcil A Dignard D Iouk T Zito R Frangeul L Tekaia F Rutherford K Wang E Munro CA Bates S Gow NA Hoyer LL Köhler G Morschhäuser J Newport G Znaidi S Raymond M Turcotte B Sherlock G Costanzo M Ihmels J Berman J Sanglard D Agabian N Mitchell AP Johnson AD Whiteway M Nantel A 《PLoS genetics》2005,1(1):36-57
Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.
14.
Automated genome sequence analysis and annotation. Total citations: 5, self-citations: 0, citations by others: 5
M A Andrade N P Brown C Leroy S Hoersch A de Daruvar C Reich A Franchini J Tamames A Valencia C Ouzounis C Sander 《Bioinformatics (Oxford, England)》1999,15(5):391-412
MOTIVATION: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. RESULTS: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
AVAILABILITY: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit
15.
Rickettsia sibirica sibirica is the causative agent of Siberian or North Asian tick typhus, a tick-borne rickettsiosis known to exist in Siberia and eastern China. Here we present the draft genome of Rickettsia sibirica sibirica strain BJ-90 isolated from Dermacentor sinicus ticks collected in Beijing, China.
16.
Draft genome assembly and annotation of Glycyrrhiza uralensis, a medicinal legume Total citations: 1, self-citations: 0, citations by others: 1
Keiichi Mochida Tetsuya Sakurai Hikaru Seki Takuhiro Yoshida Kotaro Takahagi Satoru Sawai Hiroshi Uchiyama Toshiya Muranaka Kazuki Saito 《The Plant journal : for cell and molecular biology》2017,89(2):181-194
Chinese liquorice/licorice (Glycyrrhiza uralensis) is a leguminous plant species whose roots and rhizomes have been widely used as a herbal medicine and natural sweetener. Whole‐genome sequencing is essential for gene discovery studies and molecular breeding in liquorice. Here, we report a draft assembly of the approximately 379‐Mb whole‐genome sequence of strain 308‐19 of G. uralensis; this assembly contains 34 445 predicted protein‐coding genes. Comparative analyses suggested well‐conserved genomic components and collinearity of gene loci (synteny) between the genome of liquorice and those of other legumes such as Medicago and chickpea. We observed that three genes involved in isoflavonoid biosynthesis, namely, 2‐hydroxyisoflavanone synthase (CYP93C), 2,7,4′‐trihydroxyisoflavanone 4′‐O‐methyltransferase/isoflavone 4′‐O‐methyltransferase (HI4OMT) and isoflavone‐7‐O‐methyltransferase (7‐IOMT) formed a cluster on the scaffold of the liquorice genome and showed conserved microsynteny with Medicago and chickpea. Based on the liquorice genome annotation, we predicted genes in the P450 and UDP‐dependent glycosyltransferase (UGT) superfamilies, some of which are involved in triterpenoid saponin biosynthesis, and characterised their gene expression with the reference genome sequence. The genome sequencing and its annotations provide an essential resource for liquorice improvement through molecular breeding and the discovery of useful genes for engineering bioactive components through synthetic biology approaches.
17.
18.
19.
In view of the recent explosion in genome sequence data, and the 200 or more complete genome sequences currently available, the importance of genome-scale bioinformatics analysis is increasing rapidly. However, computational genome informatics analyses often lack a statistical assessment of their sensitivity to the completeness of the functional annotation. Therefore, a pre-analysis method to automatically validate the sensitivity of computational genome analyses with regard to genome annotation completeness is useful for this purpose. In this report we developed the Gene Prediction Accuracy Classification (GPAC) test, which provides statistical evidence of sensitivity by repeating the same analysis for five different gene groups (classified according to annotation accuracy level), and for randomly sampled gene groups, with the same number of genes as each of the five classified groups. Variability in these results is then assessed, and if the results vary significantly with different data subsets, the analysis is considered "sensitive" to annotation completeness, and careful selection of data is advised prior to the actual in silico analysis. The GPAC test has been applied to the analyses of Sakai et al., 2001, and Ohno et al., 2001, and it revealed that the analysis of Ohno et al. was more sensitive to annotation completeness. It showed that GPAC could be employed to ascertain the sensitivity of an analysis. The GPAC benchmarking software is freely available in the latest G-language Genome Analysis Environment package, at http://www.g-language.org/.
20.
Multidimensional annotation of the Escherichia coli K-12 genome Total citations: 2, self-citations: 0, citations by others: 2
Karp PD Keseler IM Shearer A Latendresse M Krummenacker M Paley SM Paulsen I Collado-Vides J Gama-Castro S Peralta-Gil M Santos-Zavaleta A Peñaloza-Spínola MI Bonavides-Martinez C Ingraham J 《Nucleic acids research》2007,35(22):7577-7590