Similar articles
 Found 20 similar articles; search time 31 ms
1.
2.
3.
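Zhang Jie  Shang Zongmin  Cao Jianhua  Fan Bin  Zhao Shuhong 《Hereditas (Beijing)》2012,(10):121-129
In November 2009, scientists from the United States, the United Kingdom and other countries announced the first draft of the domestic pig genome. Over the past two years, as whole-genome sequence has been progressively released and more and more sequenced fragments have been correctly assembled, genome-wide annotation and analysis of pig functional genes has become increasingly urgent. Using the annotation of the cofilin 1 (CFL1) gene as an example, this article describes how Otterlace, software developed by the Sanger Institute, was used to manually analyse and annotate immune gene sequences across the pig genome. It explains in detail how to use the three annotation tools Zmap, Blixem and Dotter and gives the main steps of the annotation process, in the hope of serving as a starting point for wider use of Otterlace. Otterlace was used to analyse 243 immune-related genes, of which 180 were completely or partially annotated, laying a foundation for further in-depth functional studies of these genes.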

4.
Stano M  Klucar L 《Genomics》2011,98(5):376-380
phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics.

5.
6.
DNA sequence segments conserved since divergence of Escherichia coli and Bacillus subtilis were identified, using the GenBank sequence database. Chromosomal locations of the conserved segments were compared between the two bacteria, and the following three features were observed. (1) Although the two genomes are nearly identical in size, chromosomal arrangements of the conserved segments are considerably different from each other. (2) In many cases, chromosomal locations of a conserved segment in the two species have deviated from each other by a multiple of 60°. (3) There are many instances in which a contiguous segment in one genome is split into two or more segments located at distinct positions in the other genome, and these split segments were found to tend to lie on the E. coli or B. subtilis genome separated by distances of multiples of 60°. On the basis of these observations, genome organizations of the two bacteria were discussed in terms of genome doublings as well as random chromosomal rearrangements.
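
The "multiple of 60°" observation in the abstract above can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' procedure: map positions are assumed to be expressed as angles on a circular chromosome, the 100-minute map length used for conversion is the conventional E. coli genetic map, and the 5° tolerance and all function names are hypothetical choices for illustration.

```python
def minutes_to_degrees(minutes, map_length=100.0):
    """Convert a position on a circular genetic map (in minutes) to degrees.

    Assumption: the map is circular with `map_length` minutes in total.
    """
    return minutes / map_length * 360.0


def angular_offset(pos_a_deg, pos_b_deg):
    """Smallest angular separation between two positions, in the range 0..180."""
    diff = abs(pos_a_deg - pos_b_deg) % 360.0
    return min(diff, 360.0 - diff)


def deviation_from_multiple(offset_deg, step_deg=60.0):
    """Distance from an offset to the nearest multiple of step_deg."""
    remainder = offset_deg % step_deg
    return min(remainder, step_deg - remainder)


def is_near_multiple(pos_a_deg, pos_b_deg, step_deg=60.0, tolerance_deg=5.0):
    """True if the two positions differ by roughly a multiple of step_deg.

    The tolerance is a hypothetical threshold, not a value from the abstract.
    """
    offset = angular_offset(pos_a_deg, pos_b_deg)
    return deviation_from_multiple(offset, step_deg) <= tolerance_deg


if __name__ == "__main__":
    # Toy positions, not real map data.
    print(is_near_multiple(30.0, 152.0))  # offset 122 degrees, 2 degrees from 120
    print(is_near_multiple(30.0, 60.0))   # offset 30 degrees, halfway between multiples
```

With a fixed tolerance, some fraction of random pairs will pass by chance (here 10 of every 60 degrees), so any real analysis would need to compare the observed fraction against that background.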

7.
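Genome annotation is the process of identifying the functional elements in a genome sequence. It assigns biological meaning directly to the sequence, making it easier for researchers to explore and analyse genome function. Genome annotation helps researchers understand a genome at three levels. The first is nucleotide-level annotation, which mainly determines the physical positions of elements such as genes, RNAs and repeat sequences in the DNA sequence, including specific positional information such as transcription start sites, translation start sites and exon boundaries. Annotation can also identify variants in ...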

8.
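MGAP, a microbial genome annotation system   Total citations: 6, self-citations: 0, citations by others: 6
Using bioinformatics methods and tools, we developed the Microbial Genome Annotation Package (MGAP) and used it to annotate the genome of the cyanobacterium PCC7002. The system consists of two parts: a genome annotation system and a Web-based user interface. The genome annotation system integrates multiple programs for gene identification, function prediction and sequence analysis, together with a protein sequence database, a protein information resource system and a database of orthologous protein families. The user interface includes a circular genome map display, maps of the distribution of genes and open reading frames on the chromosome, and tools for searching the annotation information. The system runs on a PC under Linux, uses MySQL as the database management system and Apache as the Web server, and its application programming interface is written in Perl. All of this software is freely available.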

9.
MOTIVATION: The recent efforts of various sequence projects to sequence deeply into various phylogenies provide great resources for comparative sequence analysis. A generic and portable tool is essential for scientists to visualize and analyze sequence comparisons. RESULTS: We have developed SynBrowse, a synteny browser for visualizing and analyzing genome alignments both within and between species. It is intended to help scientists study macrosynteny, microsynteny and homologous genes between sequences. It can also aid with the identification of uncharacterized genes, putative regulatory elements and novel structural features of a species. SynBrowse is a GBrowse (the Generic Genome Browser) family software tool that runs on top of the open source BioPerl modules. It consists of two components: a web-based front end and a set of relational database back ends. Each database stores pre-computed alignments from a focus sequence to reference sequences in addition to the genome annotations of the focus sequence. The user interface lets end users select a key comparative alignment type and search for syntenic blocks between two sequences and zoom in to view the relationships among the corresponding genome annotations in detail. SynBrowse is portable with simple installation, flexible configuration, convenient data input and easy integration with other components of a model organism system. AVAILABILITY: The software is available at http://www.gmod.org CONTACT: vbrendel@iastate.edu

10.
Helicobacter hepaticus is an important pathogen in laboratory mice and induces the development of liver tumors and gastrointestinal disease in susceptible strains of mice. In this study, a miniset of 36 cosmid clones from a genomic library of H. hepaticus was ordered and grouped into four large contigs representing approximately 1 Mb of the H. hepaticus genome using PCR, DNA sequencing, Southern and dot-blot hybridization and pulsed-field gel electrophoresis. From the 200-300 terminal nucleotide sequences of 38 cosmid clones, 56 coding regions were predicted, of which 51 were found to have orthologs in the public databases and five appeared to be unique to H. hepaticus. Of these 51 genes, 36 have orthologs in Helicobacter pylori and 25 display the highest sequence similarity to H. pylori. However, chromosomal positions of these genes are not conserved between these two helicobacters. In addition, 10 H. hepaticus genes had the highest sequence similarity to orthologs in Campylobacter jejuni. The GC content in a randomly selected 21-kb H. hepaticus genomic sequence was 35.8%, which approximates the average between H. pylori (39%) and C. jejuni (30.6%). These results demonstrate that: (1) H. hepaticus is more closely related to H. pylori than C. jejuni; (2) significant genomic alterations exist between H. hepaticus and H. pylori, including gene organization, protein sequences and GC content, probably in part due to specific adaptation to distinct ecological niches.

11.
Trichoderma reesei is an important industrial fungus known for its ability to efficiently secrete large quantities of protein as well as its wide variety of biomass degrading enzymes. Past research on this fungus has primarily focused on extending its protein production capabilities, leaving the structure of its 33 Mb genome essentially a mystery. To begin to address these deficiencies and further our knowledge of T. reesei's secretion and cellulolytic potential, we have created a genomic framework for this fungus. We constructed a BAC library containing 9216 clones with an average insert size of 125 kb which provides a coverage of 28 genome equivalents. BAC ends were sequenced and annotated using publicly available software which identified a number of genes not seen in previously sequenced EST datasets. Little evidence was found for repetitive sequence in T. reesei with the exception of several copies of an element with similarity to the Podospora anserina transposon, PAT. Hybridization of 34 genes involved in biomass degradation revealed five groups of co-located genes in the genome. BAC clones were fingerprinted and analyzed using fingerprinted contigs (FPC) software resulting in 334 contigs covering 28 megabases of the genome. The assembly of these FPC contigs was verified by congruence with hybridization results.

12.
We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
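
The gene counts quoted in the abstract above are internally consistent, which a few lines of arithmetic can confirm. All numbers below are taken from the abstract; the variable names are illustrative, and the reading that the 5,873 splice forms include one primary form for each of the 2,538 genes is an assumption, chosen because it reproduces the reported total of 61,250 gene models.

```python
# Counts as quoted in the abstract (release 3 of the annotation).
total_genes = 57_915
te_related_genes = 14_196
non_te_genes = total_genes - te_related_genes  # 43,719

categories = {
    "putative function": 18_545,
    "expressed protein, no known function": 5_777,
    "hypothetical protein": 19_397,
}

# The three categories should partition the non-TE genes.
assert sum(categories.values()) == non_te_genes

# Percentages relative to the non-TE gene set, rounded to one decimal place.
percentages = {
    name: round(100 * count / non_te_genes, 1)
    for name, count in categories.items()
}

# Assumption: the 5,873 splice forms include one primary form for each of
# the 2,538 alternatively spliced genes, so only the difference adds models.
splice_forms = 5_873
genes_with_multiple_forms = 2_538
extra_models = splice_forms - genes_with_multiple_forms  # 3,335
gene_models = total_genes + extra_models                 # 61,250

if __name__ == "__main__":
    print(non_te_genes, percentages, gene_models)
```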

13.
RNAmmer: consistent and rapid annotation of ribosomal RNA genes   Total citations: 7, self-citations: 0, citations by others: 7
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.

14.
Despite the growing number of genomes published or currently being sequenced, there is a relative paucity of software for functional classification of newly discovered genes and their assignment to metabolic pathways. Available software for such analyses has a very steep learning curve and requires the installation, configuration, and maintenance of large amounts of complex infrastructure, including complementary software and databases. Many such tools are restricted to one or a few data sources and classification schemes. In this work, we report an automated system for gene annotation and metabolic pathway reconstruction (ASGARD), which was designed to be powerful and generalizable, yet simple for the biologist to install and run on centralized, commonly available computers. It avoids the requirement for complex resources such as relational databases and web servers, as well as the need for administrator access to the operating system. Our methodology contributes to a more rapid investigation of the potential biochemical capabilities of genes and genomes by the biological researcher, and is useful in biochemical as well as comparative and evolutionary studies of pathways and networks.

15.
Given the availability of complete genome sequences from related organisms, sequence conservation can provide important clues for predicting gene structure. In particular, one should be able to leverage information about known genes in one species to help determine the structures of related genes in another. Such an approach is appealing in that high-quality gene prediction can be achieved for newly sequenced species, such as mouse and puffer fish, using the extensive knowledge that has been accumulated about human genes. This article reports a novel approach to predicting the exon-intron structures of mouse genes by incorporating constraints from orthologous human genes using techniques that have previously been exploited in speech and natural language processing applications. The approach uses a context-free grammar to parse a training corpus of annotated human genes. A statistical training procedure produces a weighted recursive transition network (RTN) intended to capture the general features of a mammalian gene. This RTN is expanded into a finite state transducer (FST) and composed with an FST capturing the specific features of the human orthologue. This model includes a trigram language model on the amino acid sequence as well as exon length constraints. A final stage uses the free software package ClustalW to align the top n candidates in the search space. For a set of 98 orthologous human-mouse pairs, we achieved 96% sensitivity and 97% specificity at the exon level on the mouse genes, given only knowledge gleaned from the annotated human genome.
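
The exon-level sensitivity and specificity reported above can be made concrete with a short sketch. It assumes the convention common in the gene-prediction literature, where an exon counts as correct only if both boundaries match exactly, sensitivity is the fraction of annotated exons recovered, and specificity is the fraction of predicted exons that are correct. The abstract does not state which definition the authors used, and the coordinates below are toy values.

```python
def exon_level_metrics(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples; a predicted exon is correct only when
    both coordinates match an annotated exon exactly. This definition is
    an assumption, not taken from the abstract.
    """
    true_set = set(true_exons)
    predicted_set = set(predicted_exons)
    correct = len(true_set & predicted_set)
    sensitivity = correct / len(true_set) if true_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy gene with four annotated exons; the third prediction has a wrong end.
    annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
    predicted = [(100, 200), (300, 400), (500, 650)]
    print(exon_level_metrics(annotated, predicted))
```

Under this definition a single shifted splice site costs a whole exon, which is why exon-level figures are usually lower than nucleotide-level ones for the same prediction.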

16.
In the context of the international project aimed at sequencing the whole genome of Bacillus subtilis we have developed a non-redundant, fully annotated database of sequences from this organism. Starting from the B. subtilis sequences available in the EMBL, GenBank and DDBJ collections we have removed all encountered duplications and then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage, etc.). We have also added cross-references to the EMBL, MEDLINE, SWISS-PROT and ENZYME data banks. The present system results from merging of the NRSub and SubtiList databases and the sequence contigs used in the two systems are identical. NRSub is distributed as a flatfile in EMBL format (which is supported by most sequence analysis software packages) and as an ACNUC database, while SubtiList is distributed as a relational database under 4th Dimension. It is possible to access the data through two dedicated World Wide Web servers located in France and Japan.

17.
Rawlings ND  Morton FR 《Biochimie》2008,90(2):243-259
Many of the 181 families of peptidases contain homologues that are known to have functions other than peptide bond hydrolysis. Distinguishing an active peptidase from a homologue that is not a peptidase requires specialist knowledge of the important active site residues, because replacement or lack of one of these catalytic residues is an important clue that the homologue in question is unlikely to hydrolyse peptide bonds. Now that the rate at which proteins are characterized is outstripped by the rate that genome sequences are determined, many genes are being incorrectly annotated because only sequence similarity is taken into consideration. We present a tool called the MEROPS batch BLAST which not only performs a comparison against the MEROPS sequence collection, but also does a pair-wise alignment with the closest homologue detected and calculates the position of the active site residues. A non-peptidase homologue can be distinguished by the absence or unacceptable replacement of any of these residues. An analysis of peptidase homologues in the genome of the bacterium Erythrobacter litoralis is presented as an example.
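
The active-site test described above (absence or unacceptable replacement of a catalytic residue in the pairwise alignment) can be sketched as follows. This is an illustrative sketch, not the MEROPS batch BLAST implementation: the alignment is assumed to be given as two equal-length gapped strings, active-site positions are 1-based indices into the ungapped reference, and the sequences, positions and function names are hypothetical.

```python
def residues_at_reference_positions(aligned_ref, aligned_query, ref_positions):
    """Map 1-based ungapped reference positions to the aligned query residue.

    A '-' in the result means the query has a gap opposite that position.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(ref_positions)
    found = {}
    ref_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion in the query; no reference position consumed
        ref_index += 1
        if ref_index in wanted:
            found[ref_index] = query_char
    return found


def is_probable_non_peptidase(aligned_ref, aligned_query, acceptable):
    """True if any active-site residue is missing or unacceptably replaced.

    `acceptable` maps each reference position to the set of residues
    tolerated there (a hypothetical input format).
    """
    observed = residues_at_reference_positions(
        aligned_ref, aligned_query, acceptable.keys()
    )
    return any(
        observed.get(position, "-") not in allowed
        for position, allowed in acceptable.items()
    )


if __name__ == "__main__":
    # Toy alignment; reference positions are A1 C2 D3 H4 K5 S6.
    ref, query = "ACD-HKS", "ACDGH-A"
    print(is_probable_non_peptidase(ref, query, {2: {"C"}, 4: {"H"}}))
    print(is_probable_non_peptidase(ref, query, {2: {"C"}, 6: {"S"}}))
```

Keeping a set of tolerated residues per position, rather than a single residue, leaves room for replacements that are known to preserve activity.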

18.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.

19.
MOTIVATION: The availability of increasing amounts of sequence data about completely sequenced genomes spurs the development of new methods in the fields of automated annotation, and of comparative genomics. Tools allowing the visualization of results produced by analysis methods, superimposed on possibly annotated sequence data, and enabling synchronized navigation in multiple genomes, provide new means for interactive genome exploration. This kind of visual inspection can be used as a basis to assess the quality of new analysis algorithms, or to discover genome portions to be subjected to in-depth studies. RESULTS: We propose a software package, MuGeN, built for navigating through multiple annotated genomes. It is capable of retrieving annotated sequences in several formats, stored in local files, or available in databases over the network. From these, it then generates an interactive display, or an image file, in most common formats suitable for printing, further editing or integrating in Web pages. Genome maps may be mixed with computer analysis results loaded from XML files, whose format is generic enough to be adapted to a majority of sequence oriented analysis methods. AVAILABILITY: MuGeN is available at http://www-mig.jouy.inra.fr/bdsi/MuGeN.

20.