Found 20 similar articles (search time: 15 ms)
1.
2.
3.
Xiaojia Tang Saurabh Baheti Khader Shameer Kevin J. Thompson Quin Wills Nifang Niu Ilona N. Holcomb Stephane C. Boutet Ramesh Ramakrishnan Jennifer M. Kachergus Jean-Pierre A. Kocher Richard M. Weinshilboum Liewei Wang E. Aubrey Thompson Krishna R. Kalari 《Nucleic acids research》2014,42(22):e172
Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. A number of software pipelines are available for calling single nucleotide variants from genomic DNA, but there are no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed eSNV-Detect, a novel computational system that uses data from multiple aligners to call variants from RNA-Seq, even at low read depths, and to rank them. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell-line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
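The precision and sensitivity figures quoted in this abstract are ratios of overlapping variant calls. The sketch below is only an illustration of how such figures are typically computed; it is not the eSNV-Detect code. The function name `precision_sensitivity` and the toy variant keys are assumptions made for the example.

```python
def precision_sensitivity(called, truth):
    """Return (precision, sensitivity) of called variants against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)  # calls confirmed by the truth set
    precision = true_pos / len(called) if called else 0.0
    sensitivity = true_pos / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical variant keys: (chromosome, position, alternate allele).
called = {("chr1", 1000, "A"), ("chr1", 2000, "T"), ("chr2", 500, "G")}
truth = {("chr1", 1000, "A"), ("chr1", 2000, "T"), ("chr3", 42, "C"), ("chr4", 7, "G")}
precision, sensitivity = precision_sensitivity(called, truth)

# Sanger validation from the abstract: 29 of 31 candidate eSNVs confirmed.
sanger_rate = 29 / 31
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} sanger={sanger_rate:.1%}")
```

Applied to the Sanger result reported above, 29 confirmed candidates out of 31 corresponds to a validation rate of about 93.5%.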
4.
Background
DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.
For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads and suggest possible improvements.
Results
In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.
Conclusion
Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-264) contains supplementary material, which is available to authorized users.
5.
Takahashi M Matsuda F Margetic N Lathrop M 《Journal of bioinformatics and computational biology》2003,1(2):253-265
The single nucleotide polymorphism (SNP) is a difference in DNA sequence between individuals and provides abundant information about genetic variation. Large scale discovery of high frequency SNPs is being undertaken using various methods. However, the publicly available SNP data sometimes need to be verified. If only a particular gene locus is concerned, locus-specific polymerase chain reaction amplification may be useful. A problem with this method is that the secondary peak has to be measured. We have analyzed trace data from conventional sequencing equipment and found an applicable rule to discern SNPs from noise. The rule is applied to multiply aligned sequences with a trace, and the peak heights of the traces are compared between samples. We have developed software that integrates this function to automatically identify SNPs. The software works accurately for high quality sequences and can also detect SNPs in low quality sequences. Further, it can determine allele frequency, display this information as a bar graph and assign corresponding nucleotide combinations. It is also designed so that a person can easily verify and edit sequences on the screen. It is very useful for identifying de novo SNPs in a DNA fragment of interest.
6.
7.
Biases during DNA extraction of activated sludge samples revealed by high throughput sequencing (cited by: 3; self-citations: 0; citations by others: 3)
Standardization of DNA extraction is a fundamental issue of fidelity and comparability in investigations of environmental microbial communities. Commercial kits for soil or feces are often adopted for studies of activated sludge because of a lack of specific kits, but they have never been evaluated regarding their effectiveness and potential biases based on high throughput sequencing. In this study, seven common DNA extraction kits were evaluated, based on not only yield/purity but also sequencing results, using two activated sludge samples (two sub-samples each, i.e. ethanol-fixed and fresh, as-is). The results indicate that the bead-beating step is necessary for DNA extraction from activated sludge. The two kits without the bead-beating step yielded very low amounts of DNA and the least abundant operational taxonomic units (OTUs), and significantly underestimated the Gram-positive Actinobacteria, Nitrospirae, Chloroflexi, and Alphaproteobacteria and overestimated Gammaproteobacteria, Deltaproteobacteria, Bacteroidetes, and the rare phyla whose cell walls might have been readily broken. Among the other five kits, the FastDNA® SPIN Kit for Soil extracted the most and the purest DNA. Although the number of total OTUs obtained using this kit was not the highest, the abundant OTUs and abundance of Actinobacteria demonstrated its efficiency. The three MoBio kits and one ZR kit produced fair results, but had a relatively low DNA yield and/or fewer Actinobacteria-related sequences. Moreover, the 50% ethanol fixation increased the DNA yield, but did not change the sequenced microbial community in a significant way. Based on the present study, the FastDNA SPIN Kit for Soil is recommended for DNA extraction of activated sludge samples. More importantly, the DNA extraction kit must be selected carefully if the samples contain dominant lysis-resistant groups, such as Actinobacteria and Nitrospirae.
8.
Suspension arrays for high throughput, multiplexed single nucleotide polymorphism genotyping (cited by: 8; self-citations: 0; citations by others: 8)
BACKGROUND: Genetic diversity can help explain disease susceptibility and differential drug response. The most common type of variant is the single nucleotide polymorphism (SNP). We present a low-cost, high throughput assay for SNP genotyping. METHODS: The assay uses oligonucleotide probes covalently attached to fluorescently encoded microspheres. These probes are hybridized directly to fluorescently labeled polymerase chain reaction (PCR) products and the results are analyzed in a standard flow cytometer. RESULTS: The genotypes determined with our assay are in good agreement with those determined by TaqMan. The range of G/C content for oligonucleotide probes was 23.5-65% in the 17 bases surrounding the SNP. Further optimization of probe length and target concentration is shown to dramatically enhance the assay performance for certain SNPs. Using microspheres which have unique fluorescent signatures, we performed a 32-plex assay where we simultaneously determined the genotypes of eight different polymorphic genes. CONCLUSIONS: We demonstrate, for the first time, the feasibility of multiplexed genotyping with suspension arrays using direct hybridization analyses. Our approach enables probes to be removed from or added to an array, enhancing flexibility over conventional chips. The ability to multiplex both the PCR preparation and the hybridization should enhance the throughput, cost, and speed of the assay.
9.
《Genomics》2020,112(1):346-355
We proposed a data cleaning pipeline for single cell (SC) RNA-seq data, in which we first screen genes (gene-wise screening) and then screen cell libraries (library-wise screening). Gene-wise screening is based on the expectation that, for a gene with low technical noise, the gene's count in a library will tend to increase with library size; this was tested using negative binomial regression of gene count (as dependent variable) against library size (as independent variable). Library-wise screening is based on the expectation that, in libraries with low technical noise, across-library correlations for housekeeping (HK) genes are higher than those for non-housekeeping (NHK) genes. We removed those libraries whose mean pairwise correlation for HK genes is not significantly higher than that for NHK genes. We successfully applied the pipeline to two large SC RNA-seq datasets. The pipeline was also developed into an R package.
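The library-wise screening step can be illustrated with a small sketch. This is not the authors' R package: the abstract does not say which significance test is used, so the example substitutes a plain difference threshold (`margin`, a hypothetical parameter) for it. The gene-wise negative binomial regression is not shown. All names and counts below are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def mean_pairwise_corr(counts, genes, lib):
    """Mean correlation of library `lib` with every other library over `genes`."""
    mine = [counts[lib][g] for g in genes]
    others = [name for name in counts if name != lib]
    total = sum(pearson(mine, [counts[o][g] for g in genes]) for o in others)
    return total / len(others)


def screen_libraries(counts, hk_genes, nhk_genes, margin=0.0):
    """Keep libraries whose HK correlation exceeds their NHK correlation by `margin`.

    Assumption: the threshold stands in for the unspecified significance test.
    """
    kept = []
    for lib in counts:
        hk = mean_pairwise_corr(counts, hk_genes, lib)
        nhk = mean_pairwise_corr(counts, nhk_genes, lib)
        if hk - nhk > margin:
            kept.append(lib)
    return kept


hk_genes = ["h1", "h2", "h3", "h4"]   # hypothetical housekeeping genes
nhk_genes = ["n1", "n2", "n3", "n4"]  # hypothetical non-housekeeping genes
raw = {
    "cell_A": [10, 20, 30, 40, 5, 1, 9, 3],
    "cell_B": [12, 22, 28, 41, 2, 8, 1, 7],
    "cell_C": [9, 19, 33, 38, 6, 3, 2, 9],
    "cell_D": [40, 30, 20, 10, 5, 2, 8, 3],  # HK pattern reversed: a noisy library
}
counts = {lib: dict(zip(hk_genes + nhk_genes, vals)) for lib, vals in raw.items()}
kept = screen_libraries(counts, hk_genes, nhk_genes)
print(kept)
```

In this toy data the HK profile of `cell_D` runs opposite to the other three libraries, so its HK correlation falls below its NHK correlation and the library is dropped. A real analysis would replace the threshold with a proper statistical test and use many more genes.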
10.
11.
12.
13.
14.
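Copy number variation (CNV) refers to an increase or decrease in the copy number of large DNA segments in the genome. Existing studies show that CNVs cause many human diseases and are closely tied to the mechanisms of their onset and progression. The advent of high-throughput sequencing has provided technical support for CNV detection, and it has become the mainstream CNV detection technology in human disease research, clinical diagnosis and treatment, and related fields. Although new algorithms and software based on high-throughput sequencing continue to be developed, their accuracy remains unsatisfactory. This paper comprehensively reviews CNV detection methods based on high-throughput sequencing data, including read-depth methods, paired-end mapping methods, split-read methods, de novo assembly methods, and methods that combine these four approaches. It discusses in depth the principles of each class of methods, representative software tools, the data each class suits, and their strengths and weaknesses, and it looks ahead to future directions.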
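Of the method classes listed in this review, the read-depth idea is the simplest to illustrate. The sketch below is a toy version under stated assumptions: depth has already been counted per fixed-size window for a sample and a matched control, windows are normalized by the median sample/control ratio, and the log2 thresholds (`gain`, `loss`) are hypothetical values chosen for the example. It does not reproduce any specific tool from the review and omits GC correction and segmentation.

```python
from math import log2
from statistics import median


def call_cnv_by_depth(sample_depth, control_depth, gain=0.4, loss=-0.6):
    """Label each window 'gain', 'loss' or 'neutral' from its log2 depth ratio."""
    # A pseudo-count of 0.5 keeps ratios finite when a window has zero depth.
    ratios = [(s + 0.5) / (c + 0.5) for s, c in zip(sample_depth, control_depth)]
    baseline = median(ratios)  # assume most windows are copy-number neutral
    calls = []
    for ratio in ratios:
        log_ratio = log2(ratio / baseline)
        if log_ratio >= gain:
            calls.append("gain")
        elif log_ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy per-window read depths (invented numbers).
control = [100, 100, 100, 100, 100, 100, 100, 100]
sample = [100, 100, 100, 150, 150, 100, 50, 100]
calls = call_cnv_by_depth(sample, control)
print(calls)
```

For a diploid genome, a one-copy gain gives a depth ratio near 3/2 (log2 about 0.58) and a one-copy loss a ratio near 1/2 (log2 of -1), which is why the example thresholds sit between zero and those values.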
15.
Christopher D Katanski Christopher P Watkins Wen Zhang Matthew Reyer Samuel Miller Tao Pan 《Nucleic acids research》2022,50(17):e99
Queuosine (Q) is a conserved tRNA modification at the wobble anticodon position of tRNAs that read the codons of amino acids Tyr, His, Asn, and Asp. Q-modification in tRNA plays important roles in the regulation of translation efficiency and fidelity. Queuosine tRNA modification is synthesized de novo in bacteria, whereas in mammals the substrate for Q-modification in tRNA is queuine, the catabolic product of the Q-base of gut bacteria. This gut microbiome dependent tRNA modification may play pivotal roles in translational regulation in different cellular contexts, but extensive studies of Q-modification biology are hindered by the lack of high throughput sequencing methods for its detection and quantitation. Here, we describe a periodate-treatment method that enables single base resolution profiling of Q-modification in tRNAs by Nextgen sequencing from biological RNA samples. Periodate oxidizes the Q-base, which results in specific deletion signatures in the RNA-seq data. Unexpectedly, we found that periodate-treatment also enables the detection of several 2-thio-modifications including τm5s2U, mcm5s2U, cmnm5s2U, and s2C by sequencing in human and E. coli tRNA. We term this method periodate-dependent analysis of queuosine and sulfur modification sequencing (PAQS-seq). We assess Q- and 2-thio-modifications at the tRNA isodecoder level, and 2-thio modification changes in stress response. PAQS-seq should be widely applicable in the biological studies of Q- and 2-thio-modifications in mammalian and microbial tRNAs.
16.
17.
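Objective: To explore the characteristics of the gut microbiota of people living in the oil tea (youcha) regions of Guangxi. Methods: In a 1:1 case-control design, stool and blood samples were collected from 20 pairs of healthy men, matched by sex and age, in oil tea and non-oil tea regions of Guangxi, together with general personal information and dietary intake data. Blood biochemical indicators were measured, and differences in gut microbiota were analyzed by high-throughput sequencing of the V4-V5 region of 16S rDNA. Results: Gut microbiota richness (ACE index, Chao1 index) was significantly higher in the oil tea group than in the non-oil tea group (t=2.202, 3.210; P=0.034, 0.003). Firmicutes and Tenericutes were significantly more abundant in the oil tea group, whereas Bacteroidetes and Fusobacteria were significantly more abundant in the non-oil tea group. Dialister, Faecalibacterium, Lachnospira, Prevotella, Corynebacterium, Micrococcus, and Bifidobacterium were significantly more abundant in the oil tea group. Body weight, BMI, serum total cholesterol, low-density lipoprotein, and high-sensitivity C-reactive protein levels were significantly lower in the oil tea group (t or z=2.682, 3.843, 2.238, 2.702, 1.581; P=0.007, [...] Conclusion: The diversity of the gut microbiota of people in the oil tea regions of Guangxi has distinctive features, which provides a new theoretical basis for studying the health effects of oil tea through the gut microbiota.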
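The Chao1 index mentioned in this abstract estimates total richness from the number of taxa seen exactly once and exactly twice. The sketch below uses the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), which stays defined when no taxon is seen twice. Whether the study used this form or the classical one is not stated, so treat the choice as an assumption. The OTU counts are invented, and the ACE index is not shown.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = sum(1 for c in otu_counts if c > 0)     # S_obs
    singletons = sum(1 for c in otu_counts if c == 1)  # F1
    doubletons = sum(1 for c in otu_counts if c == 2)  # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU table row for one stool sample.
sample_counts = [10, 5, 1, 1, 1, 2, 2, 0]
estimate = chao1(sample_counts)
print(estimate)  # 7 observed OTUs plus 1 estimated unseen OTU
```

Group comparisons such as the t-tests reported above would then be run on the per-sample estimates from each group.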
18.
Glass bead purification of plasmid template DNA for high throughput sequencing of mammalian genomes (cited by: 1; self-citations: 0; citations by others: 1)
Dederich DA Okwuonu G Garner T Denn A Sutton A Escotto M Martindale A Delgado O Muzny DM Gibbs RA Metzker ML 《Nucleic acids research》2002,30(7):e32
To meet the new challenge of generating the draft sequences of mammalian genomes, we describe the development of a novel high throughput 96-well method for the purification of plasmid DNA template using size-fractionated, acid-washed glass beads. Unlike most previously described approaches, the current method has been designed and optimized to facilitate the direct binding of alcohol-precipitated plasmid DNA to glass beads from alkaline lysed bacterial cells containing the insoluble cellular aggregate material. Eliminating the tedious step of separating the cleared lysate significantly simplifies the method and improves throughput and reliability. During a 4 month period of 96-capillary DNA sequencing of the Rattus norvegicus genome at the Baylor College of Medicine Human Genome Sequencing Center, the average success rate and read length derived from >1 800 000 plasmid DNA templates prepared by the direct lysis/glass bead method were 82.2% and 516 bases, respectively. The cost of this direct lysis/glass bead method in September 2001 was ~10 cents per clone, which is a significant cost saving in high throughput genomic sequencing efforts.
19.