Similar articles
 20 similar articles found
1.

Background  

New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available.

2.
High‐throughput sequencing makes it possible to evaluate thousands of genetic markers across genomes and populations. Reduced‐representation sequencing approaches, like double‐digest restriction site‐associated DNA sequencing (ddRADseq), are frequently applied to screen for genetic variation. In particular in nonmodel organisms where whole‐genome sequencing is not yet feasible, ddRADseq has become popular as it allows genomewide assessment of variation patterns even in the absence of other genomic resources. However, while many tools are available for the analysis of ddRADseq data, few options exist to simulate ddRADseq data in order to evaluate the accuracy of downstream tools. The available tools either focus on the optimization of ddRAD experiment design or do not provide the information necessary for a detailed evaluation of different ddRAD analysis tools. For this task, a ground truth, that is, the underlying information of all effects in the data set, is required. Therefore, we here present ddrage, the ddRAD Data Set Generator, that allows both developers and users to evaluate their ddRAD analysis software. ddrage allows the user to adjust many parameters such as coverage and rates of mutations, sequencing errors or allelic dropouts, in order to generate a realistic simulated ddRADseq data set for given experimental scenarios and organisms. The simulated reads can be easily processed with available analysis software such as stacks or pyrad and evaluated against the underlying parameters used to generate the data to gauge the impact of different parameter values used during downstream data processing.

3.
NEBcutter, version 1.0, is a program available via a web server (http://tools.neb.com/NEBcutter) that will accept an input DNA sequence and produce a comprehensive report of the restriction enzymes that will cleave the sequence. It produces a variety of outputs including restriction enzyme maps, theoretical digests and links into the restriction enzyme database, REBASE (http://www.neb.com/rebase). Importantly, its table of recognition sites is updated daily from REBASE and it marks all sites that are potentially affected by DNA methylation (Dam, Dcm, etc.). Many options exist to choose the enzymes used for digestion, including all known specificities, subsets of those that are commercially available or sets of enzymes that produce compatible termini.
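To make the idea of a restriction report and a theoretical digest concrete, the following is a minimal sketch and not NEBcutter's actual implementation. The three-enzyme table is a hand-written assumption (a real tool would load recognition sites from REBASE), the function names `find_cut_positions` and `digest` are hypothetical, and methylation sensitivity is ignored. All three sites are palindromic, so searching one strand is enough here.

```python
# Illustrative enzyme table (assumption): recognition site and cut offset
# within the site on the top strand. A real tool would load this from REBASE.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_cut_positions(sequence, enzymes=ENZYMES):
    """Return {enzyme: [cut positions]} for a linear sequence.

    A cut position is the 0-based index of the first base after the cut.
    """
    seq = sequence.upper()
    cuts = {}
    for name, (site, offset) in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + offset)
            # Step by one base so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        cuts[name] = positions
    return cuts


def digest(sequence, cut_positions):
    """Return fragment lengths after cutting a linear sequence at the given positions."""
    bounds = [0] + sorted(set(cut_positions)) + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]
```

For example, `find_cut_positions("AAGAATTCTTGGATCCAA")` reports one EcoRI cut and one BamHI cut, and passing both positions to `digest` gives the fragment lengths of the double digest.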

4.
A computer program package for the storage, change, and comparison of restriction maps is described. The programs are intended to detect overlaps between relatively short (about 10-40 kb; abbreviations ref.2) maps and to merge the overlapping fragments into large restriction maps. They run on a 16-bit microcomputer with limited memory and addressing capability. Because restriction maps are less reliable than DNA sequence data, a particular storage method was used. The source code of the programs is freely available (+).

5.
We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects.
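The representation described above, a string whose positions each carry an enzyme name and a base address, can be sketched as a list of records. The sketch below is only an illustration of that data structure: the `Site` type, the `count_matches` function and the tolerance-based matching rule are assumptions for this example, not the alignment algorithm or the extreme-value statistic used by the authors.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One position of a restriction map 'string': enzyme name plus base address."""
    enzyme: str
    position: int  # base address in bp


def count_matches(map_a, map_b, offset, tolerance=100):
    """Count sites of map_a that coincide with a same-enzyme site of map_b.

    map_a is shifted by `offset` bp first; two sites coincide when their
    addresses differ by at most `tolerance` bp (an assumed error model).
    """
    matches = 0
    for a in map_a:
        shifted = a.position + offset
        if any(b.enzyme == a.enzyme and abs(b.position - shifted) <= tolerance
               for b in map_b):
            matches += 1
    return matches
```

A short map such as `[Site("EcoRI", 1000), Site("BamHI", 2500)]` can then be slid along a longer map by trying different `offset` values and keeping the offsets with the most matching sites.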

6.
Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.

7.
MAXAMIZE. A DNA sequencing strategy advisor.   Total citations: 2 (self: 1, others: 1)
The MAXAMIZE advisory system determines from user-provided restriction maps an optimal strategy to do nucleotide sequencing by methods involving end-labeled fragments. The maps may be either simple linear restriction maps of fragments or complex circular maps including restriction sites of a vector. The whole system is interactive and is written in the Genetic English language provided by the GENESIS System, a molecular genetics knowledge representation and manipulation package. In addition, MAXAMIZE provides bookkeeping facilities for sequencing and offers advice on how to verify the newly obtained sequence data.

8.
Yeast mitochondrial DNA-pBR322 recombinant DNA molecules known to contain tRNA genes from a tRNA-rich region of the yeast genome were used as a source of DNA for restriction mapping and tRNA gene sequence analysis. We report here restriction maps of two segments of yeast mitochondrial DNA and the sequence of mitochondrial genes coding for tRNAglyGGR and tRNAvalGUR. Both genes are flanked by A + T-rich DNA, and neither has an intervening sequence or codes for a 3' CCA end. The tRNA structures deduced from the genes have the usual cloverleaf structures and invariant nucleotides. This combination of DNA sequencing and restriction mapping has enabled us to determine that the tRNAvalGUR and a previously sequenced tRNA, the tRNApheUUY, are transcribed from the same strand of DNA.

9.
M J Kelly. Génome, 1989, 31(2): 1027-1033
Mapping and sequencing the human genome will generate large amounts of data, which must be sorted, analyzed, and stored for rapid retrieval to complete this enormous task. Computers and their software programs provide the most important tool to the molecular biologist today. A discussion of current capabilities and future needs in computer hardware and software for the human genome project is the topic of this paper. The use of computer programs to generate restriction maps, manage clone libraries, manage sequence projects, and generate consensus sequences is presented. The use of computers to communicate useful information rapidly to scientific colleagues is also mentioned. The role of both GenBank and BIONET is central to the dissemination and analysis of sequence information. Worldwide electronic communication capabilities for assisting this project are available on the BIONET National Computer Resource, using existing networks.

10.
An algorithm has been developed for the determination of nucleotide sequence from data produced in fluorescence-based automated DNA sequencing instruments employing the four-color strategy. This algorithm takes advantage of object oriented programming techniques for modularity and extensibility. The algorithm is adaptive in that data sets from a wide variety of instruments and sequencing conditions can be used with good results. Confidence values are provided on the base calls as an estimate of accuracy. The algorithm iteratively employs confidence determinations from several different modules, each of which examines a different feature of the data for accurate peak identification. Modules within this system can be added or removed for increased performance or for application to a different task. In comparisons with commercial software, the algorithm performed well.

11.
The EMBL Nucleotide Sequence Database.   Total citations: 6 (self: 1, others: 5)
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, Blast etc) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

12.
The paper describes a package of APL-programs suited for the management and the analysis of DNA sequence data. Most of the application programs are related to experimental work in a DNA sequencing laboratory: Search for overlapping DNA fragments to construct complete DNA sequences; search for restriction sites; computing fragment patterns when cutting a DNA with restriction enzymes; plotting physical and genetic maps of a DNA, search for homologies as well as various counting procedures. More sophisticated programs are concerned with the prediction of RNA secondary structure and its graphical representation.

13.
The diversity of resolvase (tnpR) genes carried by a number of mercury-resistant soil bacteria has been investigated by DNA sequencing. The resulting DNA sequence information was compared to previously published tnpR DNA sequences and to previously published restriction fragment length polymorphism (RFLP) data, permitting the relationships between DNA sequencing and RFLP approaches to be studied by the use of phylogenetic trees. DNA maximum likelihood and DNA parsimony were used to construct a variety of phylogenetic trees. DNA sequencing confirmed the validity of RFLP analysis and highlighted the importance of restriction endonuclease choice upon the resulting RFLP patterns and dendrogram topology. The tnpR genes of two previously uncharacterised mercury-resistant bacteria, T2–7 and T2–12, were also studied. DNA sequence data placed T2–7 in a previously described gene class, tnpR-D, and T2–12 in a new gene class, tnpR-F. The significance of these data with respect to the recombination and evolution events occurring within bacterial populations is discussed.

14.
The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.

15.
We offer a guide to de novo genome assembly using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software.

16.
A strategy of DNA sequencing employing computer programs.   Total citations: 65 (self: 31, others: 34)
With modern fast sequencing techniques and suitable computer programs it is now possible to sequence whole genomes without the need of restriction maps. This paper describes computer programs that can be used to order both sequence gel readings and clones. A method of coding for uncertainties in gel readings is described. These programs are available on request.

17.
We have developed a website, www.in-silico.com, which runs a software program that performs three basic tasks in completely sequenced bacterial genomes by in silico analysis: PCR amplification, amplified fragment length polymorphism (AFLP-PCR) and endonuclease restriction. For PCR, after selection of the genome and introduction of primers, fragment size, DNA sequence and corresponding open reading frame (ORF) identity of the resulting PCR product is computed. Plasmids of sequenced species may be included in the analysis. Theoretical AFLP-PCR analyzes similar parameters, and includes a suggestion tool providing a list of commercial restriction enzyme pairs yielding up to 50 amplicons in the selected genome. Endonuclease restriction analysis of complete genomes and plasmids calculates the number of restriction sites for endonucleases in a given genome. If the number of fragments is 50 or fewer, pulsed field gel electrophoresis image and restriction maps are illustrated. Other tools that have been included in this site are ORF search by name and DNA to protein translation as well as restriction digestion of user-defined DNA sequences. AVAILABILITY: This is a new molecular biology resource freely available over the Internet at http://www.in-silico.com
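The PCR step described above (primers in, fragment size and product sequence out) can be illustrated with a small sketch. This is not the code behind the website: the function names are hypothetical, primers must match exactly, the template is treated as linear, only the orientation with the forward primer on the given strand is searched, and the `max_size` limit is an assumed parameter.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_size=3000):
    """Return (start, end, amplicon) tuples for exact primer matches.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer. Products longer than max_size are dropped.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)
    products = []
    f = template.find(forward)
    while f != -1:
        r = template.find(reverse_site, f + len(forward))
        while r != -1:
            end = r + len(reverse_site)
            if end - f <= max_size:
                products.append((f, end, template[f:end]))
            r = template.find(reverse_site, r + 1)
        f = template.find(forward, f + 1)
    return products
```

The fragment size of each product is simply `end - start`; mapping the product onto annotated ORFs, as the website does, would need genome annotation that is outside this sketch.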

18.
19.
Yersinia pestis is the causative agent of the bubonic, septicemic, and pneumonic plagues (also known as black death) and has been responsible for recurrent devastating pandemics throughout history. To further understand this virulent bacterium and to accelerate an ongoing sequencing project, two whole-genome restriction maps (XhoI and PvuII) of Y. pestis strain KIM were constructed using shotgun optical mapping. This approach constructs ordered restriction maps from randomly sheared individual DNA molecules directly extracted from cells. The two maps served different purposes; the XhoI map facilitated sequence assembly by providing a scaffold for high-resolution alignment, while the PvuII map verified genome sequence assembly. Our results show that such maps facilitated the closure of sequence gaps and, most importantly, provided a purely independent means for sequence validation. Given the recent advancements to the optical mapping system, increased resolution and throughput are enabling such maps to guide sequence assembly at a very early stage of a microbial sequencing project.

20.