Similar literature
 Found 20 similar articles; search took 31 ms
1.
A physical mapping strategy has been developed to verify and accelerate the assembly and gap closure phase of a microbial genome shotgun-sequencing project. The protocol was worked out during the ongoing Pseudomonas putida KT2440 genome project. A macro-restriction map was constructed by linking probe hybridisation of SwaI- or I-CeuI-restricted chromosomes to serve as a backbone for the quick quality control of sequence and contig assemblies. The library of PCR-generated SwaI linking probes was derived from the sequence assembly after 3- and 6-fold genome coverage. In order to support gap closure in regions with ambiguous assemblies such as the repetitive sequence of the seven ribosomal operons, high-resolution Smith/Birnstiel maps were generated by Southern hybridisation of pulsed-field gel electrophoresis-separated rare-cutter complete/frequent-cutter partial digestions with rare-cutter fragment end probes. Overall, 1.5 Mb of the 6.1 Mb P. putida KT2440 genome has been subjected to high-resolution physical mapping in order to align assemblies generated from shotgun sequencing.

2.
While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid, with genome copies ranging from approximately 200 to 900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

3.
Retrotransposons are major components of eukaryotic genomes and are present in high copy numbers. We developed retrotransposon-based insertion polymorphism (RBIP) markers based on long terminal repeat (LTR) sequences and flanking genome regions by using shotgun genome sequence data of mango (Mangifera indica L.). Three novel LTR sequences were identified based on two LTR retrotransposon structural features: a 5′ LTR located upstream of the primer binding site and a 3′ LTR showing high sequence similarity to the 5′ LTR. Starting with 377 unique sequences containing both 3′ LTR and downstream genome region sequences, we developed 82 RBIP markers that were applied to DNA fingerprinting of 16 mango accessions. Five RBIP markers were enough to distinguish all 16 accessions. Our results showed that LTR identification from shotgun genome sequences was effective for development of retrotransposon-based DNA markers without whole-genome sequence information. We discuss application of the developed RBIP markers for identification of genetic diversity and construction of a genetic linkage map.

4.
Whole genome shotgun sequence analysis has become the standard method for beginning to determine a genome sequence. The preparation of the shotgun sequence clones is, in fact, a biological experiment. It determines which segments of the genome can be cloned into Escherichia coli and which cannot. By analyzing the complete set of sequences from such an experiment, it is possible to identify genes lethal to E. coli. Among this set are genes encoding restriction enzymes which, when active in E. coli, lead to cell death by cleaving the E. coli genome at the restriction enzyme recognition sites. By analyzing shotgun sequence data sets we show that this is a reliable method to detect active restriction enzyme genes in newly sequenced genomes, thereby facilitating functional annotation. Active restriction enzyme genes have been identified, and their activity demonstrated biochemically, in the sequenced genomes of Methanocaldococcus jannaschii, Bacillus cereus ATCC 10987 and Methylococcus capsulatus.

5.
The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: reference shotgun sequences isolated from different anatomical sites on the human body and shotgun metagenomic sequences from the microbial communities of each site. The alignment strategy for characterizing these metagenomic communities using available reference sequence is important to the success of HMP data analysis. Six next-generation aligners were used to align a community of known composition against a database comprising reference organisms known to be present in that community. All aligners report nearly complete genome coverage (>97%) for strains with over 6X depth of coverage; however, they differ in speed, memory requirements, and ease-of-use issues such as database size limitations and supported mapping strategies. The selected aligner was tested across a range of parameters to maximize sensitivity while maintaining a low false positive rate. We found that constraining alignment length had more impact on sensitivity than did constraining similarity in all cases tested. However, when reference species were replaced with phylogenetic neighbors, similarity began to play a larger role in detection. We also show that choosing the top hit randomly when multiple, equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. The results of this study identified a strategy that was used to map over 3 terabases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.
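
The tie-breaking rule described in this abstract (choosing the top hit at random when several mappings are equally strong) can be sketched as follows. This is only an illustration under stated assumptions: the `(reference_id, score)` tuple layout and the function name `pick_top_hit` are hypothetical and do not correspond to the output format or API of any aligner used in the study.

```python
import random


def pick_top_hit(hits, rng=None):
    """Pick one best-scoring hit, breaking ties uniformly at random.

    `hits` is a list of (reference_id, score) tuples; this layout is an
    assumption made for the sketch. Returns None when there are no hits.
    """
    if not hits:
        return None
    rng = rng or random
    best_score = max(score for _, score in hits)
    # Keep every hit that shares the best score, then choose one of them.
    tied = [hit for hit in hits if hit[1] == best_score]
    return rng.choice(tied)


if __name__ == "__main__":
    read_hits = [("strain_A", 90), ("strain_B", 95), ("strain_C", 95)]
    print(pick_top_hit(read_hits, random.Random(0)))
```

Because a tied read is assigned to only one of several equally good references, the read is still counted as mapped, but the assignment cannot tell those references apart, which is the loss of taxonomic resolution the abstract mentions.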

6.
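Gaps in microbial genome sequences may contain important biological information. If not all gaps can be closed, a complete genome map cannot be obtained, and the subsequent interpretation of genomic information becomes very difficult. Gap closure is therefore the key step in obtaining a finished microbial genome. Drawing on the authors' own experience and on that of the Shanghai human genome research center (上海人类基因组研究中心) in microbial genome gap closure, this paper reviews six commonly used gap closure strategies: reference sequence alignment, multi-primer PCR, genome walking, end sequencing of genomic library clones, paired-end sequencing, and optical genome mapping.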

7.
Ab initio gene identification in metagenomic sequences   (cited by: 1; self-citations: 0; citations by others: 1)
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.

8.
In search of RNase P RNA from microbial genomes   (cited by: 2; self-citations: 0; citations by others: 2)
Li Y, Altman S. RNA (New York, N.Y.), 2004, 10(10): 1533-1540
A simple procedure has been developed to quickly retrieve and validate the DNA sequence encoding the RNA subunit of ribonuclease P (RNase P RNA) from microbial genomes. RNase P RNA sequences were identified from 94% of bacterial and archaeal complete genomes where previously no RNase P RNA was annotated. A sequence was found in camelpox virus, highly conserved in all orthopoxviruses (including smallpox virus), which could fold into a putative RNase P RNA in terms of conserved primary features and secondary structure. New structural features of RNase P RNA that enable one to distinguish bacteria from archaea and eukarya were found. This RNA is yet another RNA that can be a molecular criterion to divide the living world into three domains (bacteria, archaea, and eukarya). The catalytic center of this RNA, and its detection from some environmental whole genome shotgun sequences, is also discussed.

9.
High-Cot sequence analysis of the maize genome   (cited by: 10; self-citations: 0; citations by others: 10)
Higher eukaryotic genomes, including those from plants, contain large amounts of repetitive DNA that complicate genome analysis. We have developed a technique based on DNA renaturation which normalizes repetitive DNA, and thereby allows a more efficient outcome for full genome shotgun sequencing. The data indicate that sequencing the unrenatured outcome of a Cot experiment, otherwise known as High-Cot DNA, enriches genic sequences by more than fourfold in maize, from 5% for a random library to more than 20% for a High-Cot library. Using this approach, we predict that gene discovery would be greater than 95% and that the number of sequencing runs required to sequence the full gene space in maize would be at least fourfold lower than that required for full-genome shotgun sequencing.

10.
Venkatesh B, Dandona N, Brenner S. Genomics, 2006, 87(2): 307-310
Contrary to previous observations that fish genomes are devoid of nuclear mitochondrial pseudogenes, a genome-wide survey identified a large number of "recent" and "ancient" nuclear mitochondrial DNA fragments (Numts) in the whole-genome sequences of the fugu (Takifugu rubripes), Tetraodon nigroviridis, and zebrafish (Danio rerio). We have analyzed the latest assembly (v4.0) of the fugu genome and show that, like the Anopheles genome, the fugu nuclear genome does not contain mitochondrial pseudogenes. Fugu assembly v4.0 contains a single scaffold representing the near complete sequence of the fugu mitochondria. The "recent" Numts identified by the previous study in fugu assembly v2.0 are in fact shotgun sequences of mitochondrial DNA that were misassembled with the nuclear sequences, whereas the "ancient" Numts appear to be the result of spurious matches. It is likely that the Numts identified in the genomes of Tetraodon and zebrafish are also similar artifacts. Shotgun sequences of whole genomes often include some mitochondrial sequences. Therefore, any Numts identified in shotgun-sequence assemblies should be verified by Southern hybridization or PCR amplification.

11.
This is the first de novo assembly and annotation of a complete mitochondrial genome in the Ericales order from the American cranberry (Vaccinium macrocarpon Ait.). Moreover, only four complete Asterid mitochondrial genomes have been made publicly available. The cranberry mitochondrial genome was assembled and reconstructed from whole genome 454 Roche GS-FLX and Illumina shotgun sequences. Compared with other Asterids, the reconstruction of the genome revealed an average-sized mitochondrial genome (459,678 nt) with relatively few repetitive sequences and little DNA of plastid origin. The complete mitochondrial genome of cranberry was annotated, yielding a total of 34 genes classified based on their putative function, plus three ribosomal RNAs and 17 transfer RNAs. Maternal organellar inheritance in cranberry was inferred by analyzing gene variation in the cranberry mitochondrial and plastid genomes. The annotation of the cranberry mitochondrial genome revealed the presence of two copies of tRNA-Sec and a selenocysteine insertion sequence (SECIS) element which were lost in plants during evolution. This is the first report of a land plant possessing selenocysteine insertion machinery at the sequence level.

12.
The human body consists of innumerable multifaceted environments that predispose colonization by a number of distinct microbial communities, which play fundamental roles in human health and disease. In addition to community surveys and shotgun metagenomes that seek to explore the composition and diversity of these microbiomes, there are significant efforts to sequence reference microbial genomes from many body sites of healthy adults. To illustrate the utility of reference genomes when studying more complex metagenomes, we present a reference-based analysis of sequence reads generated from 55 shotgun metagenomes, selected from 5 major body sites, including 16 sub-sites. Interestingly, between 13% and 92% (62.3% average) of these shotgun reads were aligned to a then-complete list of 2780 reference genomes, including 1583 references for the human microbiome. However, no reference genome was universally found in all body sites. For any given metagenome, the body site-specific reference genomes, derived from the same body site as the sample, accounted for an average of 58.8% of the mapped reads. While different body sites did differ in abundant genera, proximal or symmetrical body sites were found to be most similar to one another. The extent of variation observed, both between individuals sampled within the same microenvironment and at the same site within the same individual over time, calls into question comparative studies across individuals even if sampled at the same body site. This study illustrates the high utility of reference genomes and the need for further site-specific reference microbial genome sequencing, even within the already well-sampled human microbiome.

13.
Rohwer F, Seguritan V, Choi DH, Segall AM, Azam F. BioTechniques, 2001, 31(1): 108-12, 114-6, 118
In the following report, thermal cycling coupled with random 10-mers as primers was used to construct randomly amplified shotgun libraries (RASLs). This approach allowed shotgun libraries to be constructed from nanogram quantities of input DNA. RASLs contained inserts from throughout a target genome in an unbiased fashion and did not appear to contain chimeric sequences. This protocol should be useful for shotgun sequencing the genomes of unculturable organisms and rapidly producing shotgun libraries from cosmids, fosmids, yeast artificial chromosomes (YACs), and bacterial artificial chromosomes (BACs).

14.
The complete plastid genome sequence of the American cranberry (Vaccinium macrocarpon Ait.) was reconstructed using next-generation sequencing data by in silico procedures. We used Roche 454 shotgun sequence data to isolate cranberry plastid-specific sequences of “HyRed” via homology comparisons with complete sequences from several species available at the National Center for Biotechnology Information database. Eleven cranberry plastid contigs were selected for the construction of the plastid genome, based on homologies, on raw reads spanning contigs, and on connection information. We assembled and annotated a cranberry plastid genome (82,284 reads; 185x coverage) with a length of 176 kb and the typical structure found in plants, but with several structural rearrangements in the large single-copy region when compared to other asterid plastid genomes. To evaluate the reliability of the sequence data, phylogenetic analysis of 30 species outside the order Ericales (with 54 genes) showed Vaccinium inside the clade Asteridae, as reported in other studies using single genes. The cranberry plastid genome sequence will allow the accumulation of critical data useful for breeding and a suite of other genetic studies.

15.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

16.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

17.
Expected-value models have long provided a rudimentary theoretical foundation for random DNA sequencing. Here, we are interested in improving characterization of genome coverage in terms of its underlying probability distributions. We find that the mathematical notion of occupancy serves as a good model for evolution of the coverage distribution function and reveals new insights related to sequence redundancy. Established concepts, such as “full shotgun depth,” have been assumed invariant, but actually depend on project size and decrease over time. For most microbial projects, the full shotgun milestone should be revised downward by about 30%. Accordingly, many already-completed genomes appear to have been over-sequenced. Results also suggest that read lengths for emerging high-throughput sequencing methods must be increased substantially before they can be considered as possible successors to the standard Sanger method. In particular, gains in throughput and sequence depth cannot be made to compensate for diminished read length. Limits are well approximated by a simple logarithmic equation, which should be useful in estimating maximum coverage-based redundancy for future projects.
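
As a point of reference for the expected-value models mentioned in this abstract, the sketch below uses the textbook Poisson idealization (often associated with Lander and Waterman), in which a genome sequenced to mean depth c has an expected covered fraction of 1 - e^(-c). This is an assumption made for illustration only: it is not the occupancy model the abstract proposes, the abstract does not state its own equation, and the function names are hypothetical.

```python
import math


def expected_covered_fraction(depth):
    """Expected fraction of the genome covered by at least one read.

    Assumes reads land uniformly at random (Poisson idealization) and
    ignores read length, edge effects and cloning bias.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


def depth_for_fraction(fraction):
    """Mean depth needed to reach a target covered fraction (inverse of the above)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


if __name__ == "__main__":
    for c in (1, 3, 6, 8):
        print(f"depth {c}x: expected covered fraction {expected_covered_fraction(c):.4f}")
```

Under this idealization the uncovered fraction shrinks exponentially with depth, so each extra unit of depth closes less new sequence than the one before, which is why the choice of a stopping depth matters.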

18.
MOTIVATION: Contigs-Assembly and Annotation Tool-Box (CAAT-Box) is a software package developed for the computational part of a genome project where the sequence is obtained by a shotgun strategy. CAAT-Box contains new tools to predict links between contigs by using similarity searches with other whole genome sequences. Most importantly, it allows annotation of a genome to commence during the finishing phase using a gene-oriented strategy. For this purpose, CAAT-Box creates an Individual Protein file (IPF) for each ORF of an assembly. The nucleotide sequence reported in an IPF corresponds to the sequence of the ORF with 500 additional bases before the ORF and 200 bases after. For annotation, additional information like Blast results can be added or linked to the IPFs as well as automatic and/or manual annotations. When a new assembly is performed, CAAT-Box creates new IPFs according to the old IPF panel. CAAT-Box recognizes the modified IPFs which are the only ones used for a new automatic analysis after each assembly. Using this strategy, the user works with a group of IPFs independently of the closure phase progression. The IPFs are accessible by a web server and can therefore be modified and commented on by different groups. RESULT: CAAT-Box was used to obtain and to annotate several complete genomes like Listeria monocytogenes or Streptococcus agalactiae. AVAILABILITY: The program may be obtained from the authors and is freely available to non-profit organisations.
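
The IPF sequence window described in this abstract (the ORF plus 500 bases before it and 200 bases after) can be illustrated with a short sketch. It is not CAAT-Box code: the function name `ipf_window`, the 0-based half-open coordinates, the forward-strand-only handling, and the clamping at contig ends are all assumptions made for the example.

```python
def ipf_window(contig, orf_start, orf_end, upstream=500, downstream=200):
    """Return the ORF sequence with its upstream and downstream flanks.

    Coordinates are 0-based and half-open (an assumption of this sketch),
    and only forward-strand ORFs are handled. Flanks are shortened when
    the ORF lies close to either end of the contig.
    """
    if not 0 <= orf_start < orf_end <= len(contig):
        raise ValueError("ORF coordinates fall outside the contig")
    left = max(0, orf_start - upstream)
    right = min(len(contig), orf_end + downstream)
    return contig[left:right]


if __name__ == "__main__":
    contig = "A" * 1000 + "ATGAAATAA" + "C" * 1000
    window = ipf_window(contig, 1000, 1009)
    print(len(window))  # 500 upstream + 9 ORF + 200 downstream bases
```

Keeping flanking sequence next to each ORF presumably leaves room for features near the gene boundaries to be inspected during annotation; the abstract itself does not state the reason for the 500/200 split.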

19.
Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Often, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

20.
We have designed and implemented a system to manage whole genome shotgun sequences and whole genome sequence assembly data flow. The Sequence Assembly Manager (SAM) consists primarily of a MySQL relational database and Perl applications designed to easily manipulate and coordinate the analysis of sequence information and to view and report genome assembly progress through its Common Gateway Interface (CGI) web interface. The application includes a tool to compare sequence assemblies to fingerprint maps that has been used successfully to improve and validate both maps and sequence assemblies of the Rhodococcus sp. RHA1 and Cryptococcus neoformans WM276 genomes.
