Similar documents
20 similar documents found
1.
2.
Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three-billion-read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets and help generate biological insights into specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.
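Ray Communities profiles a sample using k-mers that are "uniquely colored", i.e. attributable to a single reference genome. The following is a minimal, single-machine Python sketch of that idea, assuming exact k-mer matching on the forward strand only and a toy dictionary of reference genomes; `unique_color_index` and `profile` are hypothetical names, and this is not Ray's distributed implementation.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq (forward strand only; reverse complements ignored)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_color_index(genomes: dict, k: int) -> dict:
    """Map each k-mer that occurs in exactly one reference genome to that genome's name."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            colors[km].add(name)
    # k-mers shared by two or more genomes carry no taxonomic signal here, so drop them.
    return {km: next(iter(names)) for km, names in colors.items() if len(names) == 1}


def profile(reads: list, genomes: dict, k: int) -> dict:
    """Fraction of uniquely-colored k-mer hits in the reads attributed to each genome."""
    index = unique_color_index(genomes, k)
    hits = Counter(index[km] for read in reads for km in kmers(read, k) if km in index)
    total = sum(hits.values())
    return {name: (hits[name] / total if total else 0.0) for name in genomes}
```

With genomes `{"A": "AAAACCCC", "B": "GGGGTTTT"}`, `k=4` and reads `["AAAACC", "GGGGTT", "AAACCC"]`, six of the nine k-mer hits come from A and three from B, so `profile` attributes two thirds of the signal to A.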

3.
Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds – super contigs of sequences joined by N-bases – based on similarity to a closely related reference sequence. This is independent of mate-pair information and can complement other assembly steps, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes, Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291, and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is also provided, allowing users to submit a reference genome and a set of contigs to be scaffolded.
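Reference-guided scaffolding, as described above, orders contigs by where they match a related reference and joins them with runs of N. The sketch below assumes the alignment step has already been done, so each contig arrives with a 0-based, forward-strand start coordinate on the reference; `build_scaffold` and its `min_gap` parameter are hypothetical and do not reflect Scaffold_builder's actual interface or gap rules.

```python
def build_scaffold(placements, min_gap=1):
    """Join contigs in reference order, padding the space between them with 'N'.

    placements: list of (contig_seq, ref_start) pairs, where ref_start is the
    0-based reference position of the contig's first base (forward strand assumed).
    min_gap: hypothetical lower bound on the N-run, used when contigs abut or overlap.
    """
    if not placements:
        return ""
    ordered = sorted(placements, key=lambda p: p[1])  # order contigs along the reference
    first_seq, first_start = ordered[0]
    parts = [first_seq]
    prev_end = first_start + len(first_seq)
    for seq, start in ordered[1:]:
        gap = start - prev_end  # reference bases not covered by either contig
        parts.append("N" * max(gap, min_gap))
        parts.append(seq)
        prev_end = max(prev_end, start + len(seq))
    return "".join(parts)
```

For example, `build_scaffold([("GGG", 10), ("AAAA", 0)])` places `AAAA` first and fills the six uncovered reference bases with N, giving `AAAANNNNNNGGG`. Sizing the N-run from the reference keeps the scaffold's coordinates roughly comparable to the reference's.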

4.
For three-dimensional (3D) structure determination of large macromolecular complexes, single-particle electron cryomicroscopy is considered the method of choice. Within this field, structure determination de novo, as opposed to refinement of known structures, still presents a major challenge, especially for macromolecules without point-group symmetry. This is primarily because of technical issues: one is poor image contrast, and another is the often low particle concentration and sample heterogeneity imposed by the practical limits of biochemical purification. In this work, we tested a state-of-the-art 4k × 4k charge-coupled device (CCD) detector (TVIPS TemCam-F415) to see whether it can improve the image features that are especially important for structure determination de novo. The present study therefore compares film and CCD detector for acquiring images in the low-to-medium (approximately 10–25 Å) resolution range using a 200 kV electron microscope equipped with a field emission gun. For comparison, biological specimens and radiation-insensitive carbon layers were imaged under various conditions to test the image phase transmission, spatial signal-to-noise ratio, visual image quality and power-spectral signal decay for the complete image-processing chain. At all camera settings, the phase transmission and spectral signal-to-noise ratio were significantly better on CCD than on film in the low-to-medium resolution range. Thus, the number of particle images needed for initial structure determination is reduced and the overall quality of the initial computed 3D models is improved. However, at high resolution, film is still significantly better than the CCD camera: without binning of the CCD camera and at a magnification of 70 kx, film is better beyond 21 Å resolution. With 4-fold binning of the CCD camera and at very high magnification (> 300 kx), film is still superior beyond 7 Å resolution.
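The interplay of magnification, binning and attainable resolution in the comparison above starts from the pixel size referred back to the specimen. The sketch assumes a 15 µm physical detector pixel and treats `binning` as a linear factor per axis; both are illustrative assumptions, not values stated in the abstract, and the function names are hypothetical.

```python
ANGSTROM_PER_MICRON = 1e4


def specimen_pixel_angstrom(detector_pixel_um, magnification, binning=1):
    """Pixel size referred back to the specimen plane, in Å (binning enlarges the effective pixel)."""
    return detector_pixel_um * binning * ANGSTROM_PER_MICRON / magnification


def nyquist_angstrom(detector_pixel_um, magnification, binning=1):
    """Finest periodicity the sampling can represent: two specimen pixels per period."""
    return 2 * specimen_pixel_angstrom(detector_pixel_um, magnification, binning)


# Assumed 15 um detector pixel: unbinned at 70 kx, and 4x binned at 300 kx.
for mag, binning in [(70_000, 1), (300_000, 4)]:
    print(f"{mag // 1000} kx, binning {binning}: Nyquist {nyquist_angstrom(15, mag, binning):.1f} Å")
```

Under these assumptions the two settings give Nyquist limits of about 4.3 Å and 4.0 Å. The crossover points reported in the abstract (21 Å and 7 Å) are much coarser than these limits, which is consistent with detector signal transfer, rather than sampling alone, deciding where film wins.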

5.
6.
We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4-megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high-quality extensions in the dataset, avoiding the explicit error-correction step used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ~280 bp or ~3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
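The conservative traversal described above extends a contig only while the k-mer graph offers a single, well-supported way forward. The sketch approximates "high quality" with a minimum (k+1)-mer count (`min_count`, an assumption; the abstract does not say how quality is judged), requires the extension to be unique in both directions, and walks one seed to the right. It ignores reverse complements and the memory-efficient hashing scheme, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def extension_table(reads, k, min_count=2):
    """For each k-mer, collect the bases seen after (fwd) and before (bwd) it
    in (k+1)-mers that occur at least min_count times."""
    kplus1 = Counter()
    for r in reads:
        for i in range(len(r) - k):
            kplus1[r[i:i + k + 1]] += 1
    fwd, bwd = defaultdict(set), defaultdict(set)
    for s, count in kplus1.items():
        if count >= min_count:  # poorly supported extensions are simply ignored
            fwd[s[:k]].add(s[k])
            bwd[s[1:]].add(s[0])
    return fwd, bwd


def walk(seed, fwd, bwd):
    """Extend seed to the right while the next step is unique in both directions."""
    contig, km, seen = seed, seed, {seed}
    while len(fwd[km]) == 1:
        base = next(iter(fwd[km]))
        nxt = km[1:] + base
        # Stop at a merge (several predecessors) or when a cycle would be re-entered.
        if len(bwd[nxt]) != 1 or nxt in seen:
            break
        contig += base
        seen.add(nxt)
        km = nxt
    return contig
```

With two copies of the read `ACGTTGCA` and `k=3`, walking from `ACG` recovers `ACGTTGCA`; if reads instead disagree after `ACGT`, the walk stops at the fork rather than guessing, which is how the traversal avoids needing a separate error-correction pass.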

7.
The Drosophila egg contains all the components required to properly execute the early mitotic divisions but is unable to assemble a functional centrosome without a sperm-provided basal body. We show that 65% of unfertilized eggs obtained from a laboratory strain of Drosophila mercatorum can spontaneously assemble a number of cytoplasmic asters after activation, most of them duplicating in a cell cycle-dependent manner. Such asters are formed by a polarized array of microtubules that have their Asp-associated minus-ends converging at a main focus, where centrioles and typical centrosomal antigens are found. Aster assembly is spatially restricted to the anterior region of the oocyte. When fertilized, the parthenogenetic egg forms the poles of the gonomeric spindle by using the sperm-provided basal body, despite the presence within the same cytoplasm of maternal centrosomes. Thirty-five percent of parthenogenetic eggs and all unfertilized and fertilized eggs from the sibling bisexually reproducing D. mercatorum strain do not contain cytoplasmic asters. Thus, the Drosophila eggs have the potential for de novo formation of functional centrosomes independent of preexisting centrioles, but some control mechanisms preventing their spontaneous assembly must exist. We speculate that the release of the block preventing centrosome self-assembly could be a landmark for ensuring parthenogenetic reproduction.

8.
9.
10.
Narzisi G, Mishra B. PLoS ONE, 2011, 6(4): e19175
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric introduced here, the Feature-Response Curve (FRC), which transparently captures the trade-off between contigs' quality and their sizes. For this purpose, most of the major publicly available sequence assemblers, for both low-coverage long-read (Sanger) and high-coverage short-read (Illumina) technologies, are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read lengths, coverages and accuracies, with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
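The abstract contrasts size-only metrics such as N50 with the Feature-Response Curve. The sketch computes N50 in its usual form and a single point of an FRC-style curve under stated assumptions: each contig comes with a precomputed count of suspicious "features" (how those are detected is out of scope here), and contigs are taken from largest to smallest until the feature budget would be exceeded. The names `n50` and `frc_point` are hypothetical; this is an illustration, not the AMOS tool.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def frc_point(contigs, feature_threshold, genome_size):
    """Approximate genome coverage reachable without exceeding feature_threshold.

    contigs: list of (length, n_features) pairs; features are assumed precomputed.
    """
    covered = features = 0
    for length, n_feat in sorted(contigs, key=lambda c: c[0], reverse=True):
        if features + n_feat > feature_threshold:
            break  # the next-largest contig would exceed the feature budget
        features += n_feat
        covered += length
    return covered / genome_size
```

For contigs `[(100, 5), (50, 0), (30, 1)]` on a 200-base genome, a budget of 4 features yields coverage 0.0, because the largest contig alone is too suspicious, while a budget of 5 yields 0.75. Sweeping the threshold traces the curve, which exposes large but error-laden contigs that N50 alone would reward.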

11.
Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
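The imprinting step converts roughly half of a template's cytosines to uracil, which sequencing reports as T. Below is a toy, single-strand Python sketch of imprinting and of correcting an imprinted assembly against the unmutated library's consensus. It assumes the two sequences are already aligned base-for-base and skips the de Bruijn assembly, haplotype partitioning and PCR-error steps; `imprint` and `correct` are hypothetical names.

```python
import random


def imprint(template, rate=0.5, rng=None):
    """Convert each C to U independently with probability `rate`.

    U is written as T, which is how it appears after PCR and sequencing.
    """
    if rng is None:
        rng = random.Random()
    return "".join("T" if base == "C" and rng.random() < rate else base for base in template)


def correct(mutated, consensus):
    """Undo imprinting: where the mutated assembly shows T but the unmutated
    consensus shows C, restore C. Assumes base-for-base alignment."""
    if len(mutated) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("C" if (m == "T" and c == "C") else m for m, c in zip(mutated, consensus))
```

Because each copy of a template draws its own random C-to-T pattern, two copies of the same region become distinguishable during assembly, and `correct` then recovers the original bases once the unmutated consensus is known.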

12.
Jiang Shuangying, Tang Yuanwei, Xiang Liang, Zhu Xinlu, Cai Zelin, Li Ling, Chen Yingxi, Chen Peishuang, Feng Yuge, Lin Xin, Li Guoqiang, Sharif Jafar, Dai Junbiao. Science China Life Sciences, 2022, 65(7): 1445–1455
Synthetic genomics has provided new bottom-up platforms for the functional study of viral and microbial genomes. The construction of the large, gigabase (Gb)-sized...

13.
We describe a new assembly algorithm, where a genome assembly with low sequence coverage, either throughout the genome or locally, due to cloning bias, is considerably improved through an assisting process via a related genome. We show that the information provided by aligning the whole-genome shotgun reads of the target against a reference genome can be used to substantially improve the quality of the resulting assembly.

14.
Current challenges in de novo plant genome sequencing and assembly (total citations: 1; self-citations: 0; citations by others: 1)
Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community.

15.
16.
17.
It has been reported that nontransformed mammalian cells become arrested during G1 in the absence of centrioles (Hinchcliffe, E., F. Miller, M. Cham, A. Khodjakov, and G. Sluder. 2001. Science. 291:1547-1550). Here, we show that removal of resident centrioles (by laser ablation or needle microsurgery) does not impede cell cycle progression in HeLa cells. HeLa cells born without centrosomes later assemble a variable number of centrioles de novo. Centriole assembly begins with the formation of small centrin aggregates that appear during S phase. These initially amorphous "precentrioles" become morphologically recognizable centrioles before mitosis. De novo-assembled centrioles mature (i.e., gain the ability to organize microtubules and replicate) in the next cell cycle. This maturation is not simply a time-dependent phenomenon, because de novo-formed centrioles do not mature if they are assembled in S phase-arrested cells. By selectively ablating only one centriole at a time, we find that the presence of a single centriole inhibits the assembly of additional centrioles, indicating that centrioles have an activity that suppresses the de novo pathway.

18.
An increase in studies using restriction site‐associated DNA sequencing (RADseq) methods has led to a need for both the development and assessment of novel bioinformatic tools that aid in the generation and analysis of these data. Here, we report the availability of AftrRAD, a bioinformatic pipeline that efficiently assembles and genotypes RADseq data, and outputs these data in various formats for downstream analyses. We use simulated and experimental data sets to evaluate AftrRAD's ability to perform accurate de novo assembly of loci, and we compare its performance with two other commonly used programs, stacks and pyrad. We demonstrate that AftrRAD is able to accurately assemble loci, while accounting for indel variation among alleles, in a more computationally efficient manner than currently available programs. AftrRAD run times are not strongly affected by the number of samples in the data set, making this program a useful tool when multicore systems are not available for parallel processing, or when data sets include large numbers of samples.

19.
20.
Annotated genomes can provide new perspectives on the biology of species. We present the first de novo whole genome sequencing for the pink-footed goose. To obtain a high-quality de novo assembly, the strategy used was to combine one short-insert paired-end library with two mate-pair libraries. The pink-footed goose genome was assembled de novo using three different assemblers, and an assembly evaluation was subsequently performed in order to choose the best assembler. For our data, ALLPATHS-LG performed best, since the assembly it produced covers most of the genome while introducing the fewest errors. A total of 26,134 genes were annotated, with bird species accounting for virtually all BLAST hits. We also estimated the substitution rate in the pink-footed goose, which can be of use in future demographic studies, by using a comparative approach with the genomes of the chicken, the mallard and the swan goose. A substitution rate of 1.38 × 10⁻⁷ per nucleotide per generation was obtained when comparing the genomes of the two closely related goose species (the pink-footed and the swan goose). Altogether, we provide a valuable tool for future genomic studies aiming at particular genes and regions of the pink-footed goose genome as well as other bird species.
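A per-generation substitution rate is commonly derived from the pairwise divergence between two species. Substitutions accumulate along both lineages after their split, so μ = d / (2T), where d is the per-site divergence and T is the split time in generations. The sketch below applies this textbook relation to purely illustrative inputs. The abstract does not give the divergence, split time or generation time behind its 1.38 × 10⁻⁷ estimate, so this is not a reconstruction of the authors' calculation, and `substitution_rate` is a hypothetical name.

```python
def substitution_rate(divergence, split_time_years, generation_time_years):
    """Per-site, per-generation substitution rate from pairwise divergence.

    Both lineages accumulate substitutions after the split, hence the factor of 2:
    mu = d / (2 * T), with T the split time expressed in generations.
    """
    generations = split_time_years / generation_time_years
    return divergence / (2 * generations)


# Illustrative inputs only: 1% divergence, 1 Myr since the split, 2-year generations.
print(f"{substitution_rate(0.01, 1_000_000, 2):.2e}")
```

With these made-up inputs the split spans 500,000 generations, so 1% divergence corresponds to about 1 × 10⁻⁸ substitutions per site per generation.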
