Similar literature
20 similar articles found.
1.
In pairwise end sequencing, sequences are determined from both ends of random subclones derived from a DNA target. Sufficiently similar overlapping end sequences are identified and grouped into contigs. When a clone's paired end sequences fall in different contigs, the contigs are connected together to form scaffolds. Increasingly, the goals of pairwise strategies are large and highly repetitive genomic targets. Here, we consider large-scale pairwise strategies that employ mixtures of subclone sizes. We explore the properties of scaffold formation within a hybrid theory/simulation mathematical model of a genomic target that contains many repeat families. Using this model, we evaluate problems that may arise, such as falsely linked end sequences (due either to random matches or to homologous repeats) and scaffolds that terminate without extending the full length of the target. We illustrate our model with an exploration of a strategy for sequencing the human genome. Our results show that, for a strategy that generates 10-fold sequence coverage derived from the ends of clones ranging in length from 2 to 150 kb, using an appropriate rule for detecting overlaps, we expect few false links while obtaining a single scaffold extending the length of each chromosome.
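
The scaffold-forming rule described in this abstract (contigs are joined whenever one clone's two end sequences land in different contigs) can be sketched with a union-find structure. This is a minimal illustration, not the authors' model: the function name `build_scaffolds`, the integer contig ids, and the example links are all hypothetical, and false links or repeats are not modelled.

```python
def build_scaffolds(num_contigs, mate_links):
    """Group contigs into scaffolds.

    mate_links is a list of (contig_a, contig_b) pairs, one per clone whose
    two end sequences fall in different contigs (hypothetical input format).
    """
    parent = list(range(num_contigs))

    def find(x):
        # Follow parent pointers to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in mate_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two scaffolds

    scaffolds = {}
    for contig in range(num_contigs):
        scaffolds.setdefault(find(contig), []).append(contig)
    return sorted(scaffolds.values())


# Six contigs; clones link 0-1, 1-2 and 3-4, and contig 5 stays unlinked.
links = [(0, 1), (1, 2), (3, 4)]
scaffolds = build_scaffolds(6, links)
```

A scaffold that "terminates" in the abstract's sense corresponds here to a group that never merges with its neighbours because no clone spans the gap.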

2.
A system for shotgun DNA sequencing.   Cited by: 651 (self-citations: 197; citations by others: 651)
A multipurpose cloning site has been introduced into the gene for beta-galactosidase (beta-D-galactoside galactohydrolase, EC 3.2.1.23) on the single-stranded DNA phage M13mp2 (Gronenborn, B. and Messing, J., (1978) Nature 272, 375-377) with the use of synthetic DNA. The site contributes 14 additional codons and does not affect the ability of the lac gene product to undergo intracistronic complementation. Two restriction endonuclease cleavage sites in the viral gene II were removed by single base-pair mutations. Using the new phage M13mp7, DNA fragments generated by cleavage with a variety of different restriction endonucleases can be cloned directly. The nucleotide sequences of the cloned DNAs can be determined rapidly by DNA synthesis using chain terminators and a synthetic oligonucleotide primer complementary to 15 bases preceding the new array of restriction sites.

3.
Shi C, Hu N, Huang H, Gao J, Zhao YJ, Gao LZ. PLoS ONE 2012, 7(2): e31468

Background

Chloroplast genomes supply valuable genetic information for evolutionary and functional studies in plants. The past five years have witnessed a dramatic increase in the number of completely sequenced chloroplast genomes with the application of second-generation sequencing technology in plastid genome sequencing projects. However, cost-effective, high-throughput chloroplast DNA (cpDNA) extraction has become a major bottleneck restricting this application, because conventional methods struggle to balance cpDNA quality and yield.

Methodology/Principal Findings

We first tested two traditional methods to isolate cpDNA from three species, Oryza brachyantha, Leersia japonica and Prinsepia utilis. Both of them failed to obtain properly defined cpDNA bands. However, we developed a simple but efficient method based on sucrose gradients and found that the modified protocol worked efficiently to isolate the cpDNA from the same three plant species. We sequenced the isolated DNA samples with Illumina (Solexa) sequencing technology and tested cpDNA purity by aligning sequence reads to the reference chloroplast genomes, which showed that the reference genome was properly covered. We show that 40–50% cpDNA purity is achieved with our method.

Conclusion

Here we provide an improved method for isolating cpDNA from angiosperms. The Illumina sequencing results suggest that the isolated cpDNA reaches sufficient yield and purity for subsequent genome assembly. The cpDNA isolation protocol will thus be widely applicable to plant chloroplast genome sequencing projects.

4.
The classical theory of shotgun DNA sequencing accounts for neither the placement dependencies that are a fundamental consequence of the forward-reverse sequencing strategy, nor the edge effect that arises for small to moderate-sized genomic targets. These phenomena are relevant to a number of sequencing scenarios, including large-insert BAC and fosmid clones, filtered genomic libraries, and macro-nuclear chromosomes. Here, we report a model that considers these two effects and provides both the expected value of coverage and its variance. Comparison to methyl-filtered maize data shows significant improvement over classical theory. The model is used to analyze coverage performance over a range of small to moderately sized genomic targets. We find that the read pairing effect and the edge effect interact in a non-trivial fashion. Shorter reads give superior coverage per unit sequence depth relative to longer ones. In principle, end-sequences can be optimized with respect to template insert length; however, optimal performance is unlikely to be realized in most cases because of inherent size variation in any set of targets. Conversely, single-stranded reads exhibit roughly the same coverage attributes as optimized end-reads. Although linking information is lost, single-stranded data should not pose a significant assembly liability if the target represents predominantly low-copy sequence. We also find that random sequencing should be halted at substantially lower redundancies than those now associated with larger projects. Given the enormous amount of data generated per cycle on pyrosequencing instruments, this observation suggests devising schemes to split each run cycle between two or more projects. This would prevent over-sequencing and would further leverage the pyrosequencing method.
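
For orientation, the "classical theory" this abstract improves upon is usually taken to be the Lander-Waterman model, which is an assumption here since the abstract does not name it. That model ignores both read pairing and edge effects, and its expectations can be sketched as follows; the function names and the example numbers are illustrative only, and the paper's corrected model is not reproduced.

```python
import math


def classical_coverage(num_reads, read_len, target_len):
    """Expected fraction of the target covered, 1 - exp(-c), with c = N*L/G."""
    redundancy = num_reads * read_len / target_len
    return 1.0 - math.exp(-redundancy)


def classical_contig_count(num_reads, read_len, target_len):
    """Expected number of contigs, N * exp(-c), ignoring a minimum overlap."""
    redundancy = num_reads * read_len / target_len
    return num_reads * math.exp(-redundancy)


# Hypothetical example: a 100 kb target sequenced with 500 bp reads.
for n in (200, 1000, 2000):
    print(n, round(classical_coverage(n, 500, 100_000), 4))
```

Because these formulas assume independent read placement on an effectively unbounded target, they are the baseline against which the abstract's pairing and edge-effect corrections would be compared.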

5.
6.
We developed a semi-automated genome analysis system called GAMBLER in order to support the current whole-genome sequencing project focusing on alkaliphilic Bacillus halodurans C-125. GAMBLER was designed to reduce the human intervention required and to reduce the complications in annotating thousands of ORFs in the microbial genome. GAMBLER automates three major routines: analyzing assembly results provided by genome assembler software, assigning ORFs, and homology searching. GAMBLER is equipped with an interface for convenience of annotation. All processes and options are manipulatable through a WWW browser that enables scientists to share their genome analysis results without choosing computer platforms.

7.
Belknap WR, Wang Y, Huo N, Wu J, Rockhold DR, Gu YQ, Stover E. Génome 2011, 54(12): 1005-1015
The citrus cultivar Carrizo is the single most important rootstock to the US citrus industry and has resistance or tolerance to a number of major citrus diseases, including citrus tristeza virus, foot rot, and Huanglongbing (HLB, citrus greening). A Carrizo genomic sequence database providing approximately 3.5× genome coverage (haploid genome size approximately 367 Mb) was populated through 454 GS FLX shotgun sequencing. Analysis of the repetitive DNA fraction indicated a total interspersed repeat fraction of 36.5%. Assembly and characterization of abundant citrus Ty3/gypsy elements revealed a novel type of element containing open reading frames encoding a viral RNA-silencing suppressor protein (RNA binding protein, rbp) and a plant cytokinin riboside 5′-monophosphate phosphoribohydrolase-related protein (LONELY GUY, log). Similar gypsy elements were identified in the Populus trichocarpa genome. Gene-coding region analysis indicated that 24.4% of the nonrepetitive reads contained genic regions. The depth of genome coverage was sufficient to allow accurate assembly of constituent genes, including a putative phloem-expressed gene. The development of the Carrizo database (http://citrus.pw.usda.gov/) will contribute to characterization of agronomically significant loci and provide a publicly available genomic resource to the citrus research community.

8.
9.
Apple II software for M13 shotgun DNA sequencing.   Cited by: 18 (self-citations: 17; citations by others: 18)
A set of programs is presented for the reconstruction of a DNA sequence from data generated by the M13 shotgun sequencing technique. Once the sequence has been established and stored, other programs are used for its analysis. The programs have been written for the Apple II microcomputer. A minimum investment is required for the hardware, and the software is easily interchangeable between the growing number of interested researchers. Copies are available in ready-to-use form.

10.
Genomic V exons from whole genome shotgun data in reptiles   Cited by: 1 (self-citations: 0; citations by others: 1)
Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

11.
We have designed and implemented a system to manage whole genome shotgun sequences and whole genome sequence assembly data flow. The Sequence Assembly Manager (SAM) consists primarily of a MySQL relational database and Perl applications designed to easily manipulate and coordinate the analysis of sequence information and to view and report genome assembly progress through its Common Gateway Interface (CGI) web interface. The application includes a tool to compare sequence assemblies to fingerprint maps that has been used successfully to improve and validate both maps and sequence assemblies of the Rhodococcus sp. RHA1 and Cryptococcus neoformans WM276 genomes.

12.
MOTIVATION: Contigs-Assembly and Annotation Tool-Box (CAAT-Box) is a software package developed for the computational part of a genome project where the sequence is obtained by a shotgun strategy. CAAT-Box contains new tools to predict links between contigs by using similarity searches with other whole genome sequences. Most importantly, it allows annotation of a genome to commence during the finishing phase using a gene-oriented strategy. For this purpose, CAAT-Box creates an Individual Protein file (IPF) for each ORF of an assembly. The nucleotide sequence reported in an IPF corresponds to the sequence of the ORF with 500 additional bases before the ORF and 200 bases after. For annotation, additional information like Blast results can be added or linked to the IPFs as well as automatic and/or manual annotations. When a new assembly is performed, CAAT-Box creates new IPFs according to the old IPF panel. CAAT-Box recognizes the modified IPFs, which are the only ones used for a new automatic analysis after each assembly. Using this strategy, the user works with a group of IPFs independently of the closure phase progression. The IPFs are accessible by a web server and can therefore be modified and commented on by different groups. RESULT: CAAT-Box was used to obtain and to annotate several complete genomes like Listeria monocytogenes or Streptococcus agalactiae. AVAILABILITY: The program may be obtained from the authors and is freely available to non-profit organisations.
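
The IPF region described above (the ORF plus 500 bases before and 200 bases after) amounts to a clamped slice of the contig. The sketch below only illustrates that arithmetic: `ipf_region` is a hypothetical name, coordinates are assumed to be 0-based and end-exclusive on the forward strand, and clamping at contig ends is an assumption, since the abstract does not say how CAAT-Box treats ORFs near a contig boundary or on the reverse strand.

```python
def ipf_region(contig, orf_start, orf_end, upstream=500, downstream=200):
    """Return the ORF with flanking bases, clamped to the contig boundaries.

    orf_start and orf_end are 0-based, end-exclusive forward-strand
    coordinates (an assumption made for this sketch).
    """
    start = max(0, orf_start - upstream)
    end = min(len(contig), orf_end + downstream)
    return contig[start:end]


# Toy contig: 1000 filler bases, a 9-base ORF, then 1000 filler bases.
contig = "A" * 1000 + "ATGAAATAG" + "C" * 1000
region = ipf_region(contig, 1000, 1009)
```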

13.
Microsatellites (simple sequence repeats, SSRs) are important genetic markers in tree breeding and conservation. Here we utilized high-throughput 454 sequencing technology to mine microsatellites from masson pine (MP) genomic DNA. First, we analyzed the characteristics of SSRs in all nonredundant MP reads (genome survey sequences, GSSs) and compared them with loblolly pine (LP) GSSs and BACs (bacterial artificial chromosome clone sequences), and three other nonconiferous species GSSs. Second, a set of MP GSS–SSR primer pairs were designed. There were extremely low overall GSS–SSR densities (28 SSR/Mb) in MP when compared with LP (48 SSR/Mb) and the other species. AT, AAT, AAAT, and AAAAAT were the richest motifs in di-, tri-, tetra-, and hexanucleotides, respectively. Two hundred forty GSS–SSR primer pairs were designed in total, and 20 novel polymorphic markers were identified using three populations (two natural and one clonal seed orchard) as evaluating samples. These markers should be useful for future MP population genetics studies.
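
Mining perfect SSRs and reporting a density in SSR/Mb, as in this abstract, can be sketched with a regular expression. The thresholds below (motifs of 2 to 6 bases repeated at least four times) and the names `find_ssrs` and `ssr_density_per_mb` are assumptions chosen for illustration; the abstract does not state the search criteria or software the authors used.

```python
import re

# A motif of 2-6 bases followed by at least three more copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def ssr_density_per_mb(num_ssrs, total_bases):
    """SSR density expressed per megabase of sequence searched."""
    return num_ssrs / (total_bases / 1_000_000)


example = "GGC" + "AT" * 5 + "CCG" + "AAT" * 4 + "G"
ssrs = find_ssrs(example)
```

The lazy quantifier makes the pattern prefer the shortest motif at each position, so an `ATATATAT` run is reported as repeats of `AT` instead of `ATAT`.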

14.
GEL--a computer tool for DNA sequencing projects.   Cited by: 1 (self-citations: 0; citations by others: 1)
The GEL program for entry and analysis of DNA sequencing information is discussed, and examples of interaction with the program are presented. The current version of the program represents the last of several revisions to the first GEL program, reported previously in this journal (1). Improvements and additions have been made, making the current GEL a particularly useful laboratory tool for molecular biologists engaged in DNA sequencing projects.

15.
DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of ‘C’-‘C’ alignments covering reference cytosines. Di-base color reads produced by Lifetech’s SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors affect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporate them into a model which uses Newton–Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of a human fibroblast cell-line bisulfite-converted fragment library with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes, which compares favorably with the results of an investigation on the same cell line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads. Availability: http://code.google.com/p/fade/
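
The counting idea behind bisulfite methylation calls (methylated cytosines still read as C, unmethylated ones read as T) can be sketched as a simple ratio at one reference cytosine. This is deliberately the naive estimator, not FadE: it includes neither the background error model nor the Newton–Raphson fit and credible interval the abstract describes, and the function name is hypothetical.

```python
def naive_methylation_rate(c_count, t_count):
    """Fraction of reads showing C at a reference cytosine after bisulfite.

    Returns None when no informative read covers the position.
    """
    total = c_count + t_count
    if total == 0:
        return None
    return c_count / total


# 30 reads still show C and 10 show T at this (hypothetical) site.
rate = naive_methylation_rate(30, 10)
```

Sequencing errors and incomplete conversion bias this ratio, which is the motivation the abstract gives for modelling background error rates explicitly.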

16.
While the sequencing of bacterial genomes has become a routine procedure at major sequencing centers, there are still a number of genome projects at small- or medium-size facilities. For these facilities a maximum of control over sequencing, assembling and finishing is essential. At the same time, facilities have to be able to co-operate at minimum costs for the overall project. We have established a pipeline for the distributed sequencing of Alcanivorax borkumensis SK2, Azoarcus sp. BH72, Clavibacter michiganensis subsp. michiganensis NCPPB382, Sorangium cellulosum So ce56 and Xanthomonas campestris pv. vesicatoria 85-10. Our pipeline relies on standard tools (e.g. PHRED/PHRAP, CAP3 and Consed/Autofinish) wherever possible, supplementing them with new tools (BioMake and BACCardI) to achieve the aims described above.

17.
We studied the possible impact of genome projects by comparing the number of published articles before and after the completion of each project. We found that for most species there is no significant change in the number of citations. Our study also highlights the growing importance of taxonomy as a main motivation for sequencing genomes.

18.
The rate-limiting step in a large-scale sequencing project is the generation of single-stranded DNA templates. We describe a fast, semiautomated procedure, using 96-well microtitre plates, in which 192 templates can be readily prepared in 1 day. The technique can be carried out manually or can be semiautomated using a robot pipetting device. We also provide evidence for the reliability and applicability of this method to a large-scale sequencing project.

19.
A quality control algorithm for DNA sequencing projects.   Cited by: 2 (self-citations: 0; citations by others: 2)
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
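
The idea of comparing hexamer composition between sequences can be sketched as below. This is an illustration under stated assumptions, not the authors' statistical test: the abstract does not give the test statistic, so the sketch substitutes a plain total-variation distance between hexamer frequency profiles, and both function names are hypothetical.

```python
from collections import Counter


def hexamer_frequencies(seq, k=6):
    """Relative frequency of every overlapping k-mer (hexamer by default)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def composition_distance(freq_a, freq_b):
    """Total-variation distance: 0 for identical profiles, 1 for disjoint."""
    kmers = set(freq_a) | set(freq_b)
    return 0.5 * sum(abs(freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) for k in kmers)


# A candidate clone would be compared against a profile built from the
# expected organism; these toy strings only exercise the functions.
profile = hexamer_frequencies("ACGTACGTAC")
distance_to_self = composition_distance(profile, profile)
```

In practice a reference profile would be built from many sequences of the expected organism, and a clone whose distance exceeds some calibrated threshold would be flagged for follow-up similarity searches, as the abstract describes for its own test.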

20.