Similar articles
 20 similar articles found
1.
WindowMasker: window-based masker for sequenced genomes   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query if they can be masked in the database. RepeatMasker/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes. RESULTS: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RM because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results hold for transcribed regions as well. WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis. AVAILABILITY: WM is included in the NCBI C++ toolkit. The source code for the entire toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. Once the toolkit source is unpacked, the instructions for building the WindowMasker application in the UNIX environment can be found in the file src/app/winmasker/README.build. SUPPLEMENTARY INFORMATION: Supplementary data are available at ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf
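
The idea of masking repeats with "a few linear-time scans" of the genome itself can be illustrated with a toy sketch. This is not the WindowMasker algorithm: the abstract does not describe its scoring, so the function name `mask_repeats`, the k-mer length `k`, and the count `threshold` below are all assumptions chosen for illustration. The sketch counts every k-mer in one pass and soft-masks (lowercases) positions covered by over-represented k-mers in a second pass, with no repeat library involved.

```python
from collections import Counter


def mask_repeats(seq, k=4, threshold=3):
    """Soft-mask positions covered by any k-mer that occurs >= threshold times.

    Hypothetical illustration only; k and threshold are arbitrary choices.
    """
    seq = seq.upper()
    n_kmers = len(seq) - k + 1
    # Scan 1: count every k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    # Scan 2: flag every position covered by an over-represented k-mer.
    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                masked[j] = True
    # Masked bases are written in lowercase, unmasked bases in uppercase.
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


print(mask_repeats("ACGTACGTACGTTTGCA"))  # acgtacgtacgtTTGCA
```

Both scans touch each position a bounded number of times for fixed `k`, which is the property that lets a library-free approach avoid aligning every template sequence against the genome.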

2.
DNA/GUI (DNA Graphical User Interface) is an interactive software system for rapid and efficient analysis of images of the types used in genome mapping, such as autoradiograms and electrophoretic gels. Images are digitized using a commercially available charge-coupled-device (CCD) camera system and analyzed on a graphics workstation using a menu-driven user interface. DNA/GUI features automatic lane and band detection, simultaneous display of multiple images and a unique spatial-normalization algorithm. Images and their associated data are archived and easily available for later recall. Preliminary results indicate that DNA/GUI is a useful tool in the analysis and comparison of images used in a variety of applications such as genetic-linkage analysis and DNA restriction mapping. The interactive display software is based on the X Window System and is therefore readily portable to a variety of graphics workstations.

3.
MOTIVATION: Novel classes of functional RNAs, most prominently the miRNAs, have recently been discovered, strongly suggesting that further types of functional RNAs are still hidden in the recently completed genomic DNA sequences. Only a few techniques are known, however, to survey genomes for such RNA genes. When sufficiently similar sequences are not available for comparative approaches, the only known remedy is to search directly for structural features. RESULTS: We present here efficient algorithms for computing locally stable RNA structures at genome-wide scales. Both the minimum energy structure and the complete matrix of base pairing probabilities can be computed in Θ(N × L²) time and Θ(N + L²) memory in terms of the length N of the genome and the size L of the largest secondary structure motifs of interest. In practice, the 100 Mb of the complete genome of Caenorhabditis elegans can be folded within about half a day on a modern PC with a search depth of L = 100. This is sufficient, for example, for a survey for miRNAs. AVAILABILITY: The software described in this contribution will be available for download at http://www.tbi.univie.ac.at/~ivo/RNA/ as part of the Vienna RNA Package.
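
To make the stated bounds concrete, the short sketch below plugs the abstract's own figures (N = 100 Mb, L = 100) into the leading terms N × L² and N + L². The helper name `scan_cost` is hypothetical, and the results are unit-less term counts only: the constant factors hidden by the Θ notation are not given in the abstract, so nothing here predicts actual running time or bytes of memory.

```python
def scan_cost(n, l):
    """Return the leading terms (N * L^2, N + L^2) of the stated bounds.

    Hypothetical helper; constant factors are unknown and ignored.
    """
    return n * l ** 2, n + l ** 2


time_term, memory_term = scan_cost(100_000_000, 100)
print(time_term)    # 1000000000000
print(memory_term)  # 100010000
```

The point of the comparison is that memory grows only additively in L², so the genome length dominates storage, while the time term grows with the square of the search depth.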

4.
5.
EasyExonPrimer     
EasyExonPrimer is web-based software that automates the design of PCR primers to amplify exon sequences from genomic DNA. EasyExonPrimer is written in Perl and uses Primer3 to design PCR primers based on the genome builds and annotation databases available at the University of California, Santa Cruz (UCSC) Genome Browser database (http://genome.ucsc.edu/). It masks repeats and known single nucleotide polymorphism (SNP) sites in the genome and designs standardised primers using optimised conditions. Users can input genes by RefSeq mRNA ID, gene name or keyword. The primer design is optimised for large-scale resequencing of exons. For exons larger than 1 kb, the user has the option of breaking the exon sequence down into smaller overlapping fragments. All primer pairs are then verified using the In-Silico PCR software to test for uniqueness in the genome. We have designed >1000 pairs of primers for 90 genes; 95% of the primer pairs successfully amplified exon sequences under standard PCR conditions without requiring further optimisation. AVAILABILITY: EasyExonPrimer is available from http://129.43.22.27/~primer/. The source code is also available upon request. CONTACT: Xiaolin Wu (forestwu@mail.nih.gov).
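
Breaking a large exon into overlapping fragments can be sketched as a simple tiling of a coordinate interval. The abstract gives only the 1 kb threshold, so the function name `split_exon`, the default fragment length, and the 100 bp overlap below are assumptions, and this is not EasyExonPrimer's own (Perl) code.

```python
def split_exon(start, end, max_len=1000, overlap=100):
    """Tile the half-open interval [start, end) with overlapping fragments.

    Hypothetical sketch: max_len and overlap are illustrative defaults.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    fragments = []
    pos = start
    while True:
        frag_end = min(pos + max_len, end)
        fragments.append((pos, frag_end))
        if frag_end == end:
            break
        # Step back by `overlap` so adjacent amplicons share sequence.
        pos = frag_end - overlap
    return fragments


print(split_exon(0, 2500))  # [(0, 1000), (900, 1900), (1800, 2500)]
```

An exon at or below `max_len` comes back as a single fragment, so the same call can be applied to every exon regardless of size.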

6.
MOTIVATION: Advances in DNA microarray technology and computational methods have unlocked new opportunities to identify 'DNA fingerprints', i.e. oligonucleotide sequences that uniquely identify a specific genome. We present an integrated approach for the computational identification of DNA fingerprints for the design of microarray-based pathogen diagnostic assays. We provide a quantifiable definition of a DNA fingerprint, stated from both a computational and an experimental point of view, and an analytical proof that all in silico fingerprints satisfying the stated definition are found using our approach. RESULTS: The presented computational approach is implemented in an integrated high-performance computing (HPC) software tool for oligonucleotide fingerprint identification termed TOFI. We employed TOFI to identify in silico DNA fingerprints for several bacteria and plasmid sequences, which were then experimentally evaluated as potential probes for microarray-based diagnostic assays. Results and analysis of approximately 150 in silico DNA fingerprints for Yersinia pestis and 250 fingerprints for Francisella tularensis are presented. AVAILABILITY: The implemented algorithm is available upon request.

7.
A genomic island (GI) is a chunk of DNA sequence in a genome whose origin can be traced back to other organisms or viruses. The detection of GIs plays an indispensable role in biomedical research because GIs are closely associated with special functions; pathogenicity islands, for example, are disease-causing GIs. It is also very important to visualize genomic islands, together with the features in the genome that support them. We have developed a program, Genomic Island Visualization (GIV), which displays the locations of genomic islands in a genome, as well as the corresponding supporting feature information for GIs. GIV was implemented in C++, and was compiled and executed on Linux/Unix operating systems.

Availability

GIV is freely available for non-commercial use at http://www5.esu.edu/cpsc/bioinfo/software/GIV

8.
DNA microarray assays represent the first widely used application that attempts to build upon the information provided by genome projects in the study of biological questions. One of the greatest challenges in working with microarrays is collecting, managing, and analyzing data. Although several commercial and noncommercial solutions exist, there is a growing body of freely available, open source software that allows users to analyze data using a host of existing techniques and to develop their own and integrate them within the system. Here we review three of the most widely used and comprehensive systems: the statistical analysis tools written in R through the Bioconductor project (http://www.bioconductor.org), the Java-based TM4 software system available from The Institute for Genomic Research (http://www.tigr.org/software), and BASE, the Web-based system developed at Lund University (http://base.thep.lu.se).

9.
Identifying bacterial genes and endosymbiont DNA with Glimmer   Total citations: 11, self-citations: 0, citations by others: 11
MOTIVATION: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. RESULTS: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. AVAILABILITY: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.

10.

Visualizing regions of conserved synteny between two genomes is supported by numerous software applications. However, none of the current applications allow researchers to select genome features to display or highlight in blocks of synteny based on the annotated biological properties of the features (e.g., type, function, and/or phenotype association). To address this usability gap, we developed an interactive web-based conserved synteny browser, The Jackson Laboratory (JAX) Synteny Browser. The browser allows researchers to highlight or selectively display genome features in the reference and/or the comparison genome according to the biological attributes of the features. Although the current implementation for the browser is limited to the reference genomes for the laboratory mouse and human, the software platform is intentionally genome agnostic. The JAX Synteny Browser software can be deployed for any two genomes where genome coordinates for syntenic blocks are defined and for which biological attributes of the features in one or both genomes are available in widely used standard bioinformatics file formats. The JAX Synteny Browser is available at: http://syntenybrowser.jax.org/. The code base is available from GitHub: https://github.com/TheJacksonLaboratory/syntenybrowser and is distributed under the Creative Commons Attribution license (CC BY).


11.
We have developed a website, www.in-silico.com, which runs a software program that performs three basic tasks in completely sequenced bacterial genomes by in silico analysis: PCR amplification, amplified fragment length polymorphism (AFLP-PCR) and endonuclease restriction. For PCR, after selection of the genome and introduction of primers, the fragment size, DNA sequence and corresponding open reading frame (ORF) identity of the resulting PCR product are computed. Plasmids of sequenced species may be included in the analysis. Theoretical AFLP-PCR analyzes similar parameters, and includes a suggestion tool providing a list of commercial restriction enzyme pairs yielding up to 50 amplicons in the selected genome. Endonuclease restriction analysis of complete genomes and plasmids calculates the number of restriction sites for endonucleases in a given genome. If the number of fragments is 50 or fewer, a pulsed field gel electrophoresis image and restriction maps are displayed. Other tools that have been included in this site are ORF search by name and DNA to protein translation, as well as restriction digestion of user-defined DNA sequences. AVAILABILITY: This is a new molecular biology resource freely available over the Internet at http://www.in-silico.com
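
The restriction analysis described above reduces to locating recognition sites and measuring the distances between cut positions. The sketch below is a minimal illustration under stated assumptions, not the site's implementation: the function name `restriction_fragments` is hypothetical, the sequence is treated as linear (bacterial chromosomes and plasmids are typically circular), and only the given strand is searched, which suffices for palindromic sites such as EcoRI's GAATTC (cut after the first G, hence `cut_offset=1`).

```python
def restriction_fragments(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    Hypothetical sketch: linear molecule, single-strand search only.
    cut_offset is the cut position measured from the start of the site.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        # Advance by one base so overlapping occurrences are also found.
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


fragments = restriction_fragments("AAGAATTCTTTTGAATTCAA", "GAATTC", 1)
print(fragments)       # [3, 10, 7]
print(len(fragments))  # 3
```

With the fragment list in hand, a rule like the abstract's "50 or fewer fragments" check is a single comparison on `len(fragments)`.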

12.
SUMMARY: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. AVAILABILITY: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/.

13.
DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. Even today, how DNA methyltransferases determine which regions should be methylated is not completely understood. Many hypotheses about which genomic features are correlated with the epigenome have not yet been evaluated. Furthermore, many explorative approaches to measuring DNA methylation are limited to a subset of the genome and thus cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features with DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HEPG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation and ten repetitions. With these models, we extended the existing dataset to the whole genome and thus predicted the methylation landscape for the given cell types. The method used for these predictions was also validated on an external genome-wide dataset from the ENCODE consortium. Our results reveal features correlated with DNA methylation and confirm or disprove various hypotheses about DNA methylation related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. The developed software, as well as the predicted datasets and a web-service to compare methylation states of CpG islands, are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/.
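
The two reported metrics, accuracy and the Matthews correlation coefficient (MCC), are both computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the usage example are invented purely to show how an accuracy of 0.91 and an MCC near 0.8 can coincide, and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix (not data from the paper).
print(accuracy(45, 46, 4, 5))       # 0.91
print(round(mcc(45, 46, 4, 5), 2))  # 0.82
```

Unlike accuracy, MCC uses all four cells symmetrically, so it stays informative when methylated and unmethylated islands are unevenly represented.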

14.

Background

Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.

Results

We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.

Conclusions

We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0377-z) contains supplementary material, which is available to authorized users.

15.
16.
MPP is a Java application, encompassing both new and established algorithms, for the analysis of gene and marker content datasets arising from high-throughput microarray techniques. MPP analyses flat file output from microarray experiments to determine the probability of the presence or absence of genes or markers within a genome. MPP can construct gene or marker content datasets for a number of genomes and can use the data to estimate an evolutionary tree or network. Results from gene content analyses may be validated by comparing them to known gene contents. MPP was initially developed to analyse data derived from comparative genome hybridization (CGH) microarray experiments in fungi and bacteria. It has recently been adapted to analyse retrotransposon-based insertion polymorphism (RBIP) marker scores derived from tagged microarray marker (TAM) experiments in pea. New analytical procedures may be added easily to MPP as plugins in order to increase the scope of the software. AVAILABILITY: MPP source code, executables and online help are available at http://cbr.jic.ac.uk/dicks/software/

17.
REGANOR     
With >1,000 prokaryotic genome sequencing projects ongoing or already finished, comprehensive comparative analysis of the gene content of these genomes has become viable. To allow for a meaningful comparative analysis, gene prediction for the various genomes should be as accurate as possible. Improving the state of genome annotation requires automated gene identification methods that cope with the influence of artifacts, such as genomic GC content, and current annotations still leave room for improvement. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryote genome sequences. It implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. Using several examples, we demonstrate how this resource can easily be utilised to identify promising candidates for genes currently missing from genome annotations. All data sets are available online. AVAILABILITY: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU general public license. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR and the system requirements can be found on the GenDB project page http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation

18.
Unicellular eukaryotes were among the first organisms to be selected for complete genome sequencing because of the small size of their genomes and their interactions with humans and a broad range of animals and plants. Currently, ten completely sequenced unicellular genomes have been publicly released, and as the number of available unicellular genomes increases, comparative genomics analysis within this group of organisms becomes more and more instructive. However, such an analysis is difficult to carry out without a suitable platform gathering not only the original annotations but also relevant information available in public databases or obtained by applying common bioinformatics methods. With the aim of solving these difficulties, we have developed a web-accessible database named u-Genome, the unicellular genome design database. The database is unique in featuring three datasets, namely (1) orthologous proteins, (2) paralogous proteins and (3) statistical distributions of exons, introns and intergenic DNA and the correlations between them. A tool, Uniview, designed to visualize the gene structures of individual genes in the genome, is also integrated. This database is of importance for understanding unicellular genome design and architecture and for evolution related studies. The database is available through a web interface at http://sege.ntu.edu.sg/wester/ugenome.

19.
The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes.

20.