Similar literature
 20 similar documents found
1.
The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates can seriously affect research applications, such as scaffolding in whole-genome sequencing and the discovery of large-scale genome variations, and are usually removed. We present FastUniq, a fast de novo tool for removing duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require a complete genome sequence as a prerequisite. FastUniq can handle reads of different lengths simultaneously and is highly efficient: its running time increases linearly, at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/.
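
The idea of reference-free duplicate removal can be illustrated with a minimal Python sketch. It is not FastUniq's implementation, which the abstract does not describe. It assumes that a duplicate is a pair whose two reads both match an earlier pair exactly, and the function name `remove_duplicate_pairs` is hypothetical.

```python
from typing import Iterable, List, Tuple

ReadPair = Tuple[str, str]  # (sequence of read 1, sequence of read 2)


def remove_duplicate_pairs(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep the first occurrence of each read pair, comparing sequences only.

    Assumption for illustration: two pairs are duplicates only when both
    reads are identical. No reference genome is consulted.
    """
    seen = set()
    unique: List[ReadPair] = []
    for read1, read2 in pairs:
        key = (read1, read2)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    example = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
    print(remove_duplicate_pairs(example))
```

A hash set keeps this sketch linear in the number of pairs, but it holds every distinct pair in memory. A production tool would also need to parse FASTQ records and decide how to treat reads of unequal length.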

2.
Plant Molecular Biology Reporter - Pumpkin (Cucurbita spp.) is one of the major vegetable crops grown worldwide. The number of simple sequence repeat (SSR) markers in pumpkins lags far behind the...

3.
4.
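Using public RNA-Seq short-read data from Drosophila F1 progeny and cultivated rice, generated on the high-throughput Illumina sequencing platform, we compared eight de novo transcriptome assemblers (ABySS, Velvet, SOAPdenovo, Oases, Trinity, Multiple-k, T-IDBA and Trans-ABySS). Of the two classes of software, those based on a single k-mer and those based on multiple k-mers, Trinity and Trans-ABySS respectively showed the best assembly performance, while the remaining programs performed similarly to one another. We also found that multiple k-mer methods assemble more total bases than single k-mer methods, but even with the best multiple k-mer assembler the quality of the resulting data was lower than researchers would expect. We therefore propose the "ETM" optimization method, which incorporates the multiple k-mer approach into Trinity so that it keeps the best assembly performance while gaining the advantages of multiple k-mers; tests showed that the method offers some advantage. Our results give users a basis for choosing suitable software and are important for advancing transcriptome research based on high-throughput Illumina sequencing.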
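
To make the single versus multiple k-mer distinction concrete, the following Python sketch counts the k-mers of a set of reads for several values of k. It is an illustration only: it does not reproduce any of the assemblers compared above or the "ETM" method, and the function name `kmer_counts` and the example reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every substring of length k across all reads."""
    counts: Counter = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and no k-mers.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    # A multiple k-mer strategy repeats the analysis for several k values.
    for k in (2, 3, 4):
        print(k, len(kmer_counts(reads, k)), "distinct k-mers")
```

Smaller k values connect low-coverage regions at the cost of more ambiguity, while larger k values resolve repeats but need deeper coverage, which is the usual motivation for combining several k values.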

6.
7.
8.
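To facilitate research on the tetraploid Arabidopsis (A. suecica), to elucidate how genetic material changes during chromosome doubling in polyploid plants, and thereby to explain the environmental adaptation and evolutionary mechanisms of polyploid plants at the molecular level, we describe a set of methods for transcriptome short-read assembly and bioinformatic analysis based on second-generation sequencing technology. By assembling 23,000,000 reads from the Illumina sequencing platform with SOAPdenovo, ...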

9.
A common request of proteomics core facilities is protein identification. However, in some instances primary sequence information for the protein in question is not present in public databases. In other cases, the amino acid sequence of a protein may differ in some way from the sequence predicted from the gene sequence in a database as a result of gene mutation, gene splicing, and/or multiple posttranslational modifications. Thus, it may be necessary to determine the sequence of one or more peptides de novo in order to identify and/or adequately characterize the protein of interest. The primary goal of this study was to give participating laboratories an opportunity to evaluate their proficiency in sequencing unknown peptides that are not included in any published database. Samples containing 3–6 pmol each of five synthetic peptides with amino acid sequences that were not present in public databases were sent to 106 laboratories. One nonstandard amino acid was present in one of the peptides. From a comparison of the results obtained by different strategies, participating laboratories will be able to gauge their own capabilities and establish realistic expectations for the approaches that can be used for this determination.

10.
11.
12.
Next-generation sequencing (NGS) is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires long reads or long-distance information. To improve de novo genome assembly, we developed a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes Illumina paired-end (PE) reads as input and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of the recovered DNA fragments for genome assembly. For all four bacteria, the recovered DNA fragments increased assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped, however, although they remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but further error correction is needed for ARF-PE to also increase assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
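
The N50 length used above as the contiguity measure is, under its common definition, the largest contig length L such that contigs of length L or longer contain at least half of the total assembled bases. The Python sketch below assumes that definition. The abstract does not say how the authors computed N50, and the function name `n50` and the example lengths are hypothetical.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on contig lengths, a higher value indicates a more contiguous assembly but says nothing about correctness, which is why the abstract reports accuracy separately.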

13.
State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short read assembly programs so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.

14.
Highlights
  • An automated de novo sequencing program is evaluated with respect to different types of data.
  • The number of unique high scoring de novo sequences that can be assigned to a data set provides a metric of overall data quality.
  • A database suitability metric is presented for situations when the database choice is not obvious, or when the database quality is uncertain.

15.
16.
A hybrid de novo assembly pipeline was constructed to use MiSeq and SOLiD short read data in combination. The short read data were converted to the pipeline's standard format and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly improved the alignment length by a factor of 3 to 8 compared with assemblies using either data type alone. The number of reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insertion length, enhance long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have a high GC content. Although the quality of the SOLiD reads was too low to produce any meaningful assembly by themselves, the alignment length to the reference was improved by a factor of 2 compared with the assembly using only the MiSeq data.

17.
A novel transgene silencing phenomenon was found in the ornamental plant gentian (Gentiana triflora × G. scabra), in which the introduced Cauliflower mosaic virus (CaMV) 35S promoter region was strictly methylated, irrespective of the transgene copy number and integrated loci. Transgenic tobacco carrying the same vector did not show the silencing behavior. Not only unmodified but also modified 35S promoters containing a 35S enhancer sequence were found to be highly methylated in the single copy transgenic gentian lines. The 35S core promoter (−90)-introduced transgenic lines showed a small degree of methylation, implying that the 35S enhancer sequence was involved in the methylation machinery. The rigorous silencing phenomenon enabled us to analyze methylation in a number of the transgenic lines in parallel, which led to the discovery of a consensus target region for de novo methylation, which comprised an asymmetric cytosine (CpHpH; H is A, C or T) sequence. Consequently, distinct footprints of de novo methylation were detected in each (modified) 35S promoter sequence, and the enhancer region (−148 to −85) was identified as a crucial target for de novo methylation. Electrophoretic mobility shift assay (EMSA) showed that complexes formed in gentian nuclear extract with the −149 to −124 and −107 to −83 region probes were distinct from those of tobacco nuclear extracts, suggesting that the complexes might contribute to de novo methylation. Our results provide insights into the phenomenon of sequence- and species-specific gene silencing in higher plants.

18.
19.
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

20.
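Objective: To develop a fast and accurate method for detecting indels in next-generation sequencing data, especially single-end sequencing data. Methods: Reads are first rapidly aligned against the whole-genome reference sequence to screen out those containing indels; these reads are then realigned in both directions to determine the indel length; finally, the length information is used to find the exact position of the indel and related information within the narrowed range. Results: We built the FIND (Fast INDel detection system) system to detect indels in single-end sequencing data. With simulated sequencing data at 12X coverage, FIND achieved a sensitivity of 87.71% and a specificity of 99.66%, and its performance improved further as sequencing coverage increased. Conclusion: By making full use of the information obtained during alignment, the method determines the approximate position of an indel at the same time as its length, and ultimately achieves fast and accurate detection of indels in single-end sequencing data within a local range.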
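
Sensitivity and specificity, the two figures reported for FIND, have standard definitions in terms of true and false positives and negatives. The Python sketch below uses those standard definitions. The abstract does not state how the counts were obtained for FIND, so the function names and the example counts are hypothetical and chosen only for illustration.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real indels that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of non-indel cases correctly rejected: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from the study.
    print(f"sensitivity = {sensitivity(90, 10):.2%}")
    print(f"specificity = {specificity(995, 5):.2%}")
```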
