Similar Literature
20 similar documents found.
1.
The University of California Santa Cruz (UCSC) Genome Bioinformatics website consists of a suite of free, open-source, on-line tools that can be used to browse, analyze, and query genomic data. These tools are available to anyone who has an Internet browser and an interest in genomics. The website provides a quick and easy-to-use visual display of genomic data. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. Many of the annotation tracks are submitted by scientists worldwide; the others are computed by the UCSC Genome Bioinformatics group from publicly available sequence data. It also allows users to upload and display their own experimental results or annotation sets by creating a custom track. The suite of tools, downloadable data files, and links to documentation and other information can be found at http://genome.ucsc.edu/.
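
A custom track is, in its simplest form, a text file with a `track` header line followed by data lines in a format the browser understands, such as BED. The sketch below is a minimal example assuming plain four-column BED rows; the track name, description, `visibility` setting and intervals are hypothetical. The resulting text could then be uploaded through the browser's custom-track page.

```python
# Minimal sketch of a UCSC-style custom track in BED format.
# Track name, description and intervals below are hypothetical examples.


def make_bed_custom_track(name, description, features):
    """Return a custom track as text: one 'track' header line plus BED rows.

    features: iterable of (chrom, start, end, label) tuples. BED uses
    0-based start coordinates and exclusive end coordinates.
    """
    lines = [f'track name="{name}" description="{description}" visibility=pack']
    for chrom, start, end, label in features:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
        # BED columns: chrom, chromStart, chromEnd, name (tab-separated)
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_bed_custom_track(
        "myPeaks",
        "Hypothetical experimental peaks",
        [("chr1", 1000, 1500, "peak1"), ("chr1", 2000, 2600, "peak2")],
    ), end="")
```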

2.
The University of California Santa Cruz (UCSC) Genome Browser Database is an up-to-date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.
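
Gene tables downloaded as tab-delimited flat files can be read with a few lines of code. As an illustration, the sketch below assumes a table whose first ten columns follow the UCSC genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds); which tables use that layout, and what extra columns they carry, should be checked against each table's own schema description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # 0-based, end-exclusive (start, end) pairs


def parse_genepred_line(line: str) -> Transcript:
    """Parse the first ten tab-separated columns of a genePred-style row.

    Extra columns (present in some tables) are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated columns")
    exon_count = int(fields[7])
    # exonStarts / exonEnds are comma-separated lists that end with a comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not len(starts) == len(ends) == exon_count:
        raise ValueError("exonCount does not match the exon coordinate lists")
    return Transcript(
        name=fields[0], chrom=fields[1], strand=fields[2],
        tx_start=int(fields[3]), tx_end=int(fields[4]),
        cds_start=int(fields[5]), cds_end=int(fields[6]),
        exons=list(zip(starts, ends)),
    )


if __name__ == "__main__":
    row = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
    print(parse_genepred_line(row))
```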

3.
Next-generation sequencing has yielded a vast amount of cattle genomic data for global characterization of population genetic diversity and identification of genomic regions under natural and artificial selection. However, efficient storage, querying, and visualization of such large datasets remain challenging. Here, we developed a comprehensive database, the Bovine Genome Variation Database (BGVD). It provides six main functionalities: gene search, variation search, genomic signature search, Genome Browser, alignment search tools, and the genome coordinate conversion tool. BGVD contains information on genomic variations comprising ~60.44 M SNPs, ~6.86 M indels, 76,634 CNV regions, and signatures of selective sweeps in 432 samples from modern cattle worldwide. Users can quickly retrieve distribution patterns of these variations for 54 cattle breeds through an interactive map of breed origins, using a given gene symbol or genomic region for any of the three versions of the bovine reference genome (ARS-UCD1.2, UMD3.1.1, and Btau 5.0.1). Signals of selective sweeps are displayed as Manhattan plots and Genome Browser tracks. To further investigate and visualize the relationships between variants and signatures of selection, the Genome Browser integrates all variations, selection data, and resources from NCBI, the UCSC Genome Browser, and Animal QTLdb. Collectively, these features make BGVD a useful archive for in-depth data mining and analyses of cattle biology and cattle breeding on a global scale. BGVD is publicly available at http://animal.nwsuaf.edu.cn/BosVar.
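
A breed-level distribution pattern of the kind BGVD maps can be summarized by the alternate-allele frequency in each breed. The sketch below is a hypothetical illustration, not BGVD's code: it assumes diploid, biallelic genotypes coded as alternate-allele counts (0, 1, 2, or None when missing) and a sample-to-breed lookup table.

```python
from collections import defaultdict
from typing import Dict, Optional


def allele_freq_by_group(
    genotypes: Dict[str, Optional[int]], groups: Dict[str, str]
) -> Dict[str, float]:
    """Alternate-allele frequency per group (e.g. breed) at one biallelic SNP.

    genotypes: sample -> alt-allele count (0, 1 or 2), or None if missing.
    groups:    sample -> group label.
    Assumes diploid samples; missing genotypes are skipped.
    """
    alt = defaultdict(int)
    called = defaultdict(int)  # number of called alleles per group
    for sample, gt in genotypes.items():
        if gt is None:
            continue
        if gt not in (0, 1, 2):
            raise ValueError(f"unexpected genotype {gt!r} for {sample}")
        group = groups[sample]
        alt[group] += gt
        called[group] += 2
    return {g: alt[g] / called[g] for g in called}


if __name__ == "__main__":
    gts = {"s1": 0, "s2": 1, "s3": 2, "s4": None}
    breeds = {"s1": "BreedA", "s2": "BreedA", "s3": "BreedB", "s4": "BreedB"}
    print(allele_freq_by_group(gts, breeds))
```

Frequencies per breed could then be joined to breed-origin coordinates for plotting on a map.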

4.
The UCSC Known Genes (total citations: 17; self-citations: 0; citations by others: 17)
The University of California Santa Cruz (UCSC) Known Genes dataset is constructed by a fully automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank. The detailed steps of this process are described. Extensive cross-references from this dataset to other genomic and proteomic data were constructed. For each known gene, a details page is provided containing rich information about the gene, together with extensive links to other relevant genomic, proteomic and pathway data. As of July 2005, the UCSC Known Genes are available for the human, mouse and rat genomes. The Known Genes dataset serves as a foundation for several key programs offered at the UCSC website: the Genome Browser, Proteome Browser, Gene Sorter and Table Browser. All the associated data files and program source code are also available. They can be accessed at http://genome.ucsc.edu. The genomic coverage of UCSC Known Genes, RefSeq, Ensembl Genes, H-Invitational and CCDS is analyzed. Although UCSC Known Genes offers the highest genomic and CDS coverage among major human and mouse gene sets, more detailed analysis suggests all of them could be further improved.
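
The abstract does not spell out how genomic coverage was computed. One straightforward reading, assumed here, is the number of bases covered by the union of a gene set's intervals (transcript spans, or CDS blocks for CDS coverage), with overlapping features counted only once. A minimal sketch:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def union_length(intervals: List[Tuple[int, int]]) -> int:
    """Bases covered by the union of half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genomic_coverage(features: Iterable[Tuple[str, int, int]]) -> int:
    """Total bases covered by a gene set, merging overlaps per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))
    return sum(union_length(iv) for iv in by_chrom.values())
```

Comparing this total across gene sets, or dividing it by genome length, gives one simple coverage figure.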

5.
6.
Understanding evolutionary history and adaptive processes depends on the knowledge that we can acquire from both ancient and modern genomic data. With the availability of a deluge of whole-genome sequencing data from ancient and modern goat samples, a user-friendly database that enables efficient reuse of these important resources is needed. Here, we use the genomes of 208 modern domestic goats, 24 bezoars, 46 wild ibexes, and 82 ancient goats to present a comprehensive goat genome variation database (GGVD). GGVD hosts a total of ~41.44 million SNPs, ~5.14 million indels, 6,193 selected loci, and 112 introgression regions. Users can freely visualize the frequency of genomic variations in geographical maps, selective sweeps in interactive tables, Manhattan plots, or line charts, as well as the heatmap patterns of SNP genotypes. Ancient data can be shown as haplotypes to track the state of genetic variants associated with selection and introgression events in the early, middle, and late stages. To facilitate access to sequence features, the UCSC Genome Browser, BLAT, BLAST, LiftOver, and pcadapt are also integrated into GGVD. GGVD will be a convenient tool for population genetic studies and molecular marker design in goat breeding programs, and it is publicly available at http://animal.nwsuaf.edu.cn/GoatVar.

7.
The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches.

8.
There are four sequenced and publicly available plant genomes to date. With many more slated for completion, one challenge will be to use comparative genomic methods to detect novel evolutionary patterns in plant genomes. This research requires sequence alignment algorithms to detect regions of similarity within and among genomes. However, different alignment algorithms are optimized for identifying different types of homologous sequences. This review focuses on plant genome evolution and provides a tutorial for using several sequence alignment algorithms and visualization tools to detect useful patterns of conservation: conserved non-coding sequences, false positive noise, subfunctionalization, synteny, annotation errors, inversions and local duplications. Our tutorial encourages the reader to experiment online with the reviewed tools as a companion to the text.
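
To make the idea of local alignment concrete, here is the classic Smith-Waterman recurrence, which scores the best-matching pair of substrings. It is a generic textbook illustration rather than any specific tool from the review, and the scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions; practical tools use substitution matrices, affine gap penalties and seeding heuristics.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Keeps only the previous row of the dynamic-programming matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # The shared core "ACGT" dominates the score.
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Flooring each cell at zero is what makes the alignment local: a poorly matching prefix never drags down a strong match later in the sequences.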

9.
Recent advances, such as the availability of extensive genome survey sequence (GSS) data and draft physical maps, are radically transforming the means by which we can dissect Brassica genome structure and systematically relate it to the Arabidopsis model. Hitherto, our view of the co-linearities between these closely related genomes had been largely inferred from comparative RFLP data, necessitating substantial interpolation and expert interpretation. Sequencing of the Brassica rapa genome by the Multinational Brassica Genome Project will, however, enable an entirely computational approach to this problem. Meanwhile we have been developing databases and bioinformatics tools to support our work in Brassica comparative genomics, including a recently completed draft physical map of B. rapa integrated with anchor probes derived from the Arabidopsis genome sequence. We are also exploring new ways to display the emerging Brassica-Arabidopsis sequence homology data. We have mapped all publicly available Brassica sequences in silico to the Arabidopsis TIGR v5 genome sequence and published this in the ATIDB database that uses Generic Genome Browser (GBrowse). This in silico approach potentially identifies all paralogous sequences and so we colour-code the significance of the mappings and offer an integrated, real-time multiple alignment tool to partition them into paralogous groups. The MySQL database driving GBrowse can also be directly interrogated, using the powerful API offered by the Perl Bio::DB::GFF methods, facilitating a wide range of data-mining possibilities.

10.

Background  

Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome.
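
One common way to condense thousands of binding sites into a summary is to tabulate how far each site lies from the nearest transcription start site (TSS). The sketch below is an illustrative assumption rather than the method of this paper: it takes peak intervals and TSS positions (for example, exported from a gene table), uses binary search to find the nearest TSS to each peak center, and counts peaks in three arbitrary distance bins.

```python
import bisect
from collections import Counter
from typing import Dict, List, Tuple


def nearest_tss_distance(pos: int, tss_sorted: List[int]) -> int:
    """Signed distance from pos to the closest TSS (list must be sorted)."""
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)
    if i > 0:
        candidates.append(tss_sorted[i - 1] - pos)
    return min(candidates, key=abs)


def summarize_peaks(peaks: List[Tuple[str, int, int]],
                    tss_by_chrom: Dict[str, List[int]]) -> Counter:
    """Count peaks by distance from their center to the nearest TSS."""
    sorted_tss = {c: sorted(v) for c, v in tss_by_chrom.items() if v}
    counts = Counter()
    for chrom, start, end in peaks:
        if chrom not in sorted_tss:
            counts["no TSS on chromosome"] += 1
            continue
        d = abs(nearest_tss_distance((start + end) // 2, sorted_tss[chrom]))
        if d <= 1_000:
            counts["0-1 kb"] += 1
        elif d <= 10_000:
            counts["1-10 kb"] += 1
        else:
            counts[">10 kb"] += 1
    return counts
```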

11.

Background  

The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web-services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups.

12.
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
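
Gene-based annotation of this kind boils down to locating each variant relative to the exons and introns of overlapping transcripts. The sketch below shows the core idea for a single position and a single transcript; it is not ANNOVAR's implementation, and the category names and the 2-bp splice window are assumptions for illustration. Real tools also handle strand, UTRs, multiple transcripts and the effect on the coding sequence.

```python
from typing import List, Tuple


def classify_position(pos: int, tx_start: int, tx_end: int,
                      exons: List[Tuple[int, int]],
                      splice_window: int = 2) -> str:
    """Classify a 0-based position against one transcript.

    exons are sorted, 0-based, end-exclusive (start, end) pairs.
    splice_window: intronic bases this close to an exon edge are labelled
    'splicing' (the default of 2 is an assumption for this sketch).
    """
    if pos < tx_start or pos >= tx_end:
        return "intergenic"
    for start, end in exons:
        if start <= pos < end:
            return "exonic"
    # Distance (in bases) from an intronic position to the nearest exon edge;
    # 1 means the base directly adjacent to an exon.
    nearest = min(start - pos if pos < start else pos - end + 1
                  for start, end in exons)
    return "splicing" if nearest <= splice_window else "intronic"
```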

13.
EasyExonPrimer     
EasyExonPrimer is a web-based tool that automates the design of PCR primers to amplify exon sequences from genomic DNA. EasyExonPrimer is written in Perl and uses Primer3 to design PCR primers based on the genome builds and annotation databases available at the University of California, Santa Cruz (UCSC) Genome Browser database (http://genome.ucsc.edu/). It masks repeats and known single nucleotide polymorphism (SNP) sites in the genome and designs standardised primers using optimised conditions. Users can input genes by RefSeq mRNA ID, gene name or keyword. The primer design is optimised for large-scale resequencing of exons. For exons larger than 1 kb, the user has the option of breaking the exon sequence down into overlapping smaller fragments. All primer pairs are then verified using the In-Silico PCR software to test for uniqueness in the genome. We have designed >1000 pairs of primers for 90 genes; 95% of the primer pairs successfully amplified exon sequences under standard PCR conditions without requiring further optimisation. AVAILABILITY: EasyExonPrimer is available from http://129.43.22.27/~primer/. The source code is also available upon request. CONTACT: Xiaolin Wu (forestwu@mail.nih.gov).
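
The option of breaking exons longer than 1 kb into overlapping fragments can be pictured as a simple tiling step. The sketch below is hypothetical: the 1,000-bp maximum and 100-bp overlap are assumed values, and primer placement, flanking sequence and repeat/SNP masking are not modelled.

```python
from typing import List, Tuple


def tile_region(start: int, end: int, max_len: int = 1000,
                overlap: int = 100) -> List[Tuple[int, int]]:
    """Split [start, end) into pieces of at most max_len bases.

    Consecutive pieces share `overlap` bases, so every base of the region is
    covered and neighbouring amplicons can be stitched together.
    """
    if end <= start:
        raise ValueError("empty region")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    if end - start <= max_len:
        return [(start, end)]
    step = max_len - overlap
    tiles = []
    pos = start
    while True:
        tile_end = min(pos + max_len, end)
        tiles.append((pos, tile_end))
        if tile_end == end:
            break
        pos += step
    return tiles


if __name__ == "__main__":
    print(tile_region(0, 2500))
```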

14.
Various research projects often involve determining the relative position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations with respect to genes and their potential impact on protein translation. Due to the tremendous increase in throughput brought by the use of next-generation sequencing, investigators are routinely faced with the need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Our tool uses segment trees built using the start and end coordinates of the genomic features the user wishes to use instead of storing them in a database management system. The software also produces annotation statistics to allow users to visualize how many coordinates were found within various portions of genes. Our system can currently be made to work with any species available on the UCSC Genome Browser. Segtor is a suitable tool for groups, especially those with limited access to programmers or interested in analyzing large numbers of individual genomes, who wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes.
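
Segtor's abstract says it builds segment trees from the start and end coordinates of genomic features instead of querying a database. The sketch below is a minimal, static segment tree written for illustration (not Segtor's code): features are half-open [start, end) intervals on one chromosome (one tree per chromosome is assumed), and a query walks a single root-to-leaf path, collecting the labels of all features that contain the position.

```python
import bisect
from typing import List, Tuple


class SegmentTree:
    """Static segment tree for stabbing queries on half-open [start, end) intervals."""

    def __init__(self, intervals: List[Tuple[int, int, str]]):
        # Sorted distinct endpoints define the elementary segments
        # [points[k], points[k + 1]).
        self.points = sorted({p for s, e, _ in intervals for p in (s, e)})
        self.n = max(len(self.points) - 1, 1)
        self.nodes: List[List[str]] = [[] for _ in range(4 * self.n)]
        for start, end, label in intervals:
            if end <= start:
                raise ValueError(f"empty interval for {label!r}")
            first = bisect.bisect_left(self.points, start)
            last = bisect.bisect_left(self.points, end)  # exclusive
            self._insert(1, 0, self.n, first, last, label)

    def _insert(self, node, lo, hi, qlo, qhi, label):
        # `node` covers elementary segments lo..hi-1.
        if qhi <= lo or hi <= qlo:
            return
        if qlo <= lo and hi <= qhi:
            # Fully covered: store the label here (canonical node).
            self.nodes[node].append(label)
            return
        mid = (lo + hi) // 2
        self._insert(2 * node, lo, mid, qlo, qhi, label)
        self._insert(2 * node + 1, mid, hi, qlo, qhi, label)

    def query(self, x: int) -> List[str]:
        """Labels of all intervals containing position x."""
        k = bisect.bisect_right(self.points, x) - 1
        if k < 0 or k >= len(self.points) - 1:
            return []
        hits = []
        node, lo, hi = 1, 0, self.n
        while True:
            # Every node on the root-to-leaf path holds intervals covering x.
            hits.extend(self.nodes[node])
            if hi - lo == 1:
                break
            mid = (lo + hi) // 2
            if k < mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid
        return sorted(hits)
```

Each interval is stored in O(log n) nodes, so building the tree takes O(n log n) time and a query costs O(log n) plus the number of hits.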

15.
16.
The rapid development of high-throughput sequencing technologies has led to a dramatic decrease in the money and time required for de novo genome sequencing or genome resequencing projects, with new genome sequences released every week. Among such projects, the plethora of updated genome assemblies creates a need for version-dependent annotation files and other compatible public datasets for downstream analysis. To handle these tasks in an efficient manner, we developed the reference-based genome assembly and annotation tool (RGAAT), a flexible toolkit for resequencing-based consensus building and annotation update. RGAAT can detect sequence variants with comparable precision, specificity, and sensitivity to GATK and with higher precision and specificity than Freebayes and SAMtools on four DNA-seq datasets tested in this study. RGAAT can also identify sequence variants based on cross-cultivar or cross-version genomic alignments. Unlike GATK and SAMtools/BCFtools, RGAAT builds the consensus sequence by taking into account the true allele frequency. Finally, RGAAT generates a coordinate conversion file between the reference and query genomes using sequence variants and supports annotation file transfer. Compared to the rapid annotation transfer tool (RATT), RGAAT displays better performance characteristics for annotation transfer between different genome assemblies, strains, and species. In addition, RGAAT can be used for genome modification, genome comparison, and coordinate conversion. RGAAT is available at https://sourceforge.net/projects/rgaat/ and https://github.com/wushyer/RGAAT_v2 at no cost.
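
A coordinate conversion between a reference and a consensus genome follows from how each variant changes sequence length. The function below is a simplified, hypothetical illustration rather than RGAAT's file format or code: it assumes sorted, non-overlapping, VCF-like variants whose reference and alternate alleles share their leading bases, and it maps a single reference position to the corresponding consensus position.

```python
from typing import List, Optional, Tuple


def ref_to_query(pos: int, variants: List[Tuple[int, str, str]]) -> Optional[int]:
    """Map a 0-based reference position onto the consensus sequence.

    variants: non-overlapping (pos, ref_allele, alt_allele) records with
    0-based positions; indels are assumed to share their leading bases
    (VCF-style anchor base). Returns None for reference bases deleted
    in the consensus.
    """
    shift = 0
    for vpos, ref, alt in sorted(variants):
        if vpos + len(ref) <= pos:
            # Variant lies entirely upstream: apply its length change.
            shift += len(alt) - len(ref)
        elif vpos <= pos:
            # pos falls inside the reference allele.
            offset = pos - vpos
            if offset < min(len(ref), len(alt)):
                return pos + shift  # aligned (shared-prefix) base
            return None  # deleted base
        else:
            break  # variants are sorted, so none of the rest can matter
    return pos + shift
```

A full conversion file would record the breakpoints where the shift changes rather than calling this function per position.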

17.
The past decade has seen the completion of numerous whole-genome sequencing projects, beginning with bacterial genomes and continuing with eukaryotic species from different phyla: fungi, plants and animals. In addition, more biological information is being produced and shared through information exchange systems, and more biological concepts, as well as more bioinformatics tools, are available. In this article, we describe how concepts from evolutionary biology, as well as computer science, are useful for a better understanding of biology in general and genome annotation in particular. The genome annotation process consists of taking the raw DNA produced, for example, by genome sequencing projects, adding the layers of analysis and interpretation necessary to extract its biological significance, and placing it in the context of our understanding of biological processes. Genome annotation is a multistep process falling into two broad categories: structural and functional annotation.

18.
19.
Linkage disequilibrium (LD) is an essential metric for selecting single-nucleotide polymorphisms (SNPs) to use in genetic studies and identifying causal variants from significant tag SNPs. The explosion in the number of polymorphisms that can now be genotyped by commercial arrays makes the interpretation of triangular correlation plots, commonly used for visualizing LD, extremely difficult, in particular when large genomic regions need to be considered or when SNPs in perfect LD are not adjacent but scattered across a genomic region. We developed ArchiLD, a user-friendly graphical application for the hierarchical visualization of LD in human populations. The software provides a powerful framework for analyzing LD patterns with a particular focus on blocks of SNPs in perfect linkage as defined by r². Thanks to its integration with the UCSC Genome Browser, LD plots can be easily overlaid with additional data on regulation, conservation and expression. ArchiLD is an intuitive solution for the visualization of LD across large or highly polymorphic genomic regions. Its ease of use and its integration with the annotation resources of the UCSC Genome Browser facilitate the interpretation of association results and enable a more informed selection of tag SNPs for genetic studies.
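
Blocks of SNPs in perfect linkage "as defined by r²" can be found directly from phased data. The sketch below assumes phased haplotypes coded 0/1 per SNP (an assumption; ArchiLD's input format and algorithm are not described here). It computes r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B, then groups SNPs whose r² is exactly 1.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def r_squared(h1: Sequence[int], h2: Sequence[int]) -> Optional[float]:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    Returns None when either SNP is monomorphic (r^2 is undefined).
    """
    n = len(h1)
    if n == 0 or n != len(h2):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(h1) / n
    p_b = sum(h2) / n
    p_ab = sum(1 for a, b in zip(h1, h2) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def perfect_ld_groups(snps: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Group polymorphic SNPs whose pairwise r^2 equals 1.

    For two polymorphic biallelic SNPs, r^2 == 1 exactly when their haplotype
    columns are identical or complementary, so flipping each column to start
    with 0 gives a key shared by all SNPs in the same perfect-LD group.
    """
    groups = defaultdict(list)
    for name, hap in snps.items():
        if len(set(hap)) < 2:
            continue  # monomorphic: r^2 undefined, skip
        key = tuple(hap) if hap[0] == 0 else tuple(1 - x for x in hap)
        groups[key].append(name)
    return sorted(sorted(g) for g in groups.values())
```

Grouping by a canonical key avoids computing all pairwise r² values, which matters when many SNPs are scattered across a large region.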

20.
Recognition of protein-coding genes, a classical bioinformatics problem, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this problem, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation.
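
The Z-curve itself is a simple cumulative transform: after the first n bases, x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n count each base so far. The sketch below computes those points; the gene finders named above build further statistical features on top of this representation, which are not reproduced here.

```python
from typing import List, Tuple

# Per-base steps of the Z-curve: x = purine (A, G) vs pyrimidine (C, T),
# y = amino (A, C) vs keto (G, T), z = weak H-bond (A, T) vs strong (G, C).
_STEP = {"A": (1, 1, 1), "G": (1, -1, -1), "C": (-1, 1, -1), "T": (-1, -1, 1)}


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Cumulative Z-curve points (x_n, y_n, z_n) for n = 0..len(seq)."""
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq.upper():
        if base not in _STEP:
            raise ValueError(f"unsupported base {base!r}")
        dx, dy, dz = _STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(z_curve("ACGT"))
```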


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417