1.

Detecting the impact of sequencing errors on SAGE data

Colinge J Feger G 《Bioinformatics (Oxford, England)》2001,17(9):840-842

SAGE data are obtained by sequencing short DNA tags. Because DNA sequencing is error-prone, SAGE data contain errors. We propose a new approach to identify tags whose abundance is biased by sequencing errors. This approach is based on a concept of neighbourhood: abundant tags can contaminate tags whose sequences are very close. Applying our approach reveals that moderately abundant tags can be generated solely by sequencing errors. It also allows correct rare tags to be detected. AVAILABILITY: The software is available on request, only to non-profit entities and for non-commercial purposes.
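
The neighbourhood idea can be illustrated with a minimal Python sketch. It is not the authors' software (which is available only on request): it assumes equal-length tags, treats two tags as neighbours when they differ at exactly one position (Hamming distance 1), and uses a hypothetical abundance ratio `ratio` to decide when an abundant neighbour could plausibly account for a rare tag.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def flag_possible_error_tags(counts: dict[str, int], ratio: float = 10.0) -> dict[str, str]:
    """Map each suspicious tag to the most abundant neighbour that could have produced it.

    A tag is flagged when another tag of the same length lies at Hamming
    distance 1 and is at least `ratio` times more abundant. `ratio` is a
    hypothetical threshold for illustration, not a value from the paper.
    """
    flagged: dict[str, str] = {}
    for tag, n in counts.items():
        for other, m in counts.items():
            if other == tag or len(other) != len(tag):
                continue
            if hamming(tag, other) == 1 and m >= ratio * n:
                # Keep the most abundant candidate source for this tag.
                if tag not in flagged or counts[flagged[tag]] < m:
                    flagged[tag] = other
    return flagged


counts = {"GATCGATCGA": 500, "GATCGATCGT": 3, "TTTTTTTTTT": 4}
print(flag_possible_error_tags(counts))  # {'GATCGATCGT': 'GATCGATCGA'}
```

In this simplified version, a rare tag with no sufficiently abundant neighbour is simply left unflagged; the paper's own criteria for recognising correct rare tags may be more elaborate.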

2.

Identitag, a relational database for SAGE tag identification and interspecies comparison of SAGE libraries

Céline Keime Francesca Damiola Dominique Mouchiroud Laurent Duret Olivier Gandrillon 《BMC bioinformatics》2004,5(1):143

3.

Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

Russell L Zaretzki Michael A Gilchrist William M Briggs Artin Armagan 《BMC bioinformatics》2010,11(1):72

4.

Identification and prevention of a GC content bias in SAGE libraries (cited 6 times: 1 self-citation, 5 by others)

Elliott H. Margulies Sharon L. R. Kardia Jeffrey W. Innis 《Nucleic acids research》2001,29(12):e60

Serial Analysis of Gene Expression (SAGE) is becoming a widely used gene expression profiling method for the study of development, cancer and other human diseases. Investigators using SAGE rely heavily on the quantitative aspect of this method for cataloging gene expression and comparing multiple SAGE libraries. We have developed additional computational and statistical tools to assess the quality and reproducibility of a SAGE library. Using these methods, a critical variable in the SAGE protocol was identified that has the potential to bias the Tag distribution relative to the GC content of the 10 bp SAGE Tag DNA sequence. We also detected this bias in a number of publicly available SAGE libraries. It is important to note that the GC content bias went undetected by quality control procedures in the current SAGE protocol and was only identified with the use of these statistical analyses on as few as 750 SAGE Tags. In addition to keeping any solution of free DiTags on ice, an analysis of the GC content should be performed before sequencing large numbers of SAGE Tags to be confident that SAGE libraries are free from experimental bias.
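
A check of the kind the abstract recommends, tabulating GC content before sequencing many tags, can be sketched in Python. This is only an illustration, not the authors' statistical tools: it counts G/C bases in each 10 bp tag and compares the observed distribution with a binomial reference under an assumed per-base GC probability `p_gc`.

```python
from collections import Counter
from math import comb


def gc_count(tag: str) -> int:
    """Number of G or C bases in a tag."""
    return sum(base in "GC" for base in tag.upper())


def gc_profile(tags: list[str]) -> dict[int, int]:
    """Number of tags observed at each GC count, sorted by GC count."""
    return dict(sorted(Counter(gc_count(t) for t in tags).items()))


def expected_fraction(k: int, length: int = 10, p_gc: float = 0.5) -> float:
    """Binomial reference: probability that a random tag of `length` bases has k G/C bases.

    `p_gc` is an assumed per-base GC probability, not a value from the paper.
    """
    return comb(length, k) * p_gc**k * (1 - p_gc) ** (length - k)


tags = ["GGGCCCAAAT", "ATATATATAT", "GCGCGCGCGC"]
print(gc_profile(tags))  # {0: 1, 6: 1, 10: 1}
```

With a few hundred tags, plotting the observed fractions against `expected_fraction(k)` for k = 0..10 gives a quick visual check for a skew toward low- or high-GC tags.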

5.

Statistical errors.

《BMJ (Clinical research ed.)》1977,1(6053):66

6.

A comparative analysis of the information content in long and short SAGE libraries

Yi-Ju Li Puting Xu Xuejun Qin Donald E Schmechel Christine M Hulette Jonathan L Haines Margaret A Pericak-Vance John R Gilbert 《BMC bioinformatics》2006,7(1):504

Background

Serial Analysis of Gene Expression (SAGE) is a powerful tool to determine gene expression profiles. Two types of SAGE libraries, ShortSAGE and LongSAGE, are classified based on the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between these two types of libraries, we utilized four libraries (two LongSAGE and two ShortSAGE libraries) generated from the hippocampus of Alzheimer and control samples. In addition, we generated two additional short SAGE libraries, the truncated long SAGE libraries (tSAGE), from LongSAGE libraries by deleting seven 5' basepairs from each LongSAGE tag.
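
The tSAGE construction described above amounts to trimming each tag and pooling the counts of long tags that become identical. A minimal Python sketch, assuming each tag is stored as a 17-base string written 5'→3' (so the seven 5' bases are the leading characters); the function names are hypothetical:

```python
from collections import Counter


def truncate_long_tags(long_counts: dict[str, int], drop_5prime: int = 7) -> dict[str, int]:
    """Collapse LongSAGE (17 bp) tag counts into 10 bp tSAGE tag counts.

    Assumes every tag string is written 5'->3', so the 5' bases are the
    leading characters that get removed.
    """
    short: Counter = Counter()
    for tag, n in long_counts.items():
        short[tag[drop_5prime:]] += n
    return dict(short)


def ambiguous_short_tags(long_tags: list[str], drop_5prime: int = 7) -> set[str]:
    """Short tags into which more than one distinct long tag collapses."""
    parents = Counter(tag[drop_5prime:] for tag in set(long_tags))
    return {short for short, k in parents.items() if k > 1}


long_counts = {"AAAAAAACCCCCCCCCC": 3, "TTTTTTTCCCCCCCCCC": 2, "AAAAAAAGGGGGGGGGG": 1}
print(truncate_long_tags(long_counts))           # {'CCCCCCCCCC': 5, 'GGGGGGGGGG': 1}
print(ambiguous_short_tags(list(long_counts)))   # {'CCCCCCCCCC'}
```

The second helper shows where information is lost: distinct long tags that become indistinguishable once shortened.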

7.

Optimal enzymes for amplifying sequencing libraries

Quail MA Otto TD Gu Y Harris SR Skelly TF McQuillan JA Swerdlow HP Oyola SO 《Nature methods》2012,9(1):10-11

8.

Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach

Jun Lu John K Tomfohr Thomas B Kepler 《BMC bioinformatics》2005,6(1):165

Background

In testing for differential gene expression involving multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between-library and within-library variation. Several methods have been proposed, including the t test, the t_w test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated, and it remains unclear whether further improvements can be made.

9.

aRNA-longSAGE: a new approach to generate SAGE libraries from microdissected cells (cited 3 times: 0 self-citations, 3 by others)

Heidenblut AM Lüttges J Buchholz M Heinitz C Emmersen J Nielsen KL Schreiter P Souquet M Nowacki S Herbrand U Klöppel G Schmiegel W Gress T Hahn SA 《Nucleic acids research》2004,32(16):e131

Large-scale gene expression analyses of microdissected primary tissue are still difficult because generally only a limited amount of mRNA can be obtained from microdissected cells. The introduction of the T7-based RNA amplification technique was an important step to reduce the amount of RNA needed for such analyses. This amplification technique produces amplified antisense RNA (aRNA), which so far has precluded its direct use for serial analysis of gene expression (SAGE) library production. We describe a method, termed ‘aRNA-longSAGE’, which is the first to allow the direct use of aRNA for standard longSAGE library production. The aRNA-longSAGE protocol was validated by comparing two aRNA-longSAGE libraries with two Micro-longSAGE libraries that were generated from the same RNA preparations of two different cell lines. Using a conservative validation approach, we were able to verify 68% of the differentially expressed genes identified by aRNA-longSAGE. Furthermore, the identification rate of differentially expressed genes was roughly twice as high in our aRNA-longSAGE libraries as in the standard Micro-longSAGE libraries. Using our validated aRNA-longSAGE protocol, we were able to successfully generate longSAGE libraries from as little as 40 ng of total RNA isolated from 2000–3000 microdissected pancreatic ductal epithelial cells or cells from pancreatic intraepithelial neoplasias.

10.

Analysis of context-dependent errors for Illumina sequencing

Abnizova I Leonard S Skelly T Brown A Jackson D Gourtovaia M Qi G Te Boekhorst R Faruque N Lewis K Cox T 《Journal of bioinformatics and computational biology》2012,10(2):1241005

The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling: in SNP calling in particular, a large number of false-positive SNPs may be obtained, so putative SNPs must be distinguished from sequencing or other errors. We found that not only the probability of a sequencing error (i.e. the quality value) is important for identifying an FP-SNP, but also the conditional probability of "correcting" this error (the "second best call" probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors, and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequencing errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggest a simple method to correct the majority of mismatches, based on the conditional probability of their "second" best intensity call. We attach to each mismatch a corresponding second-call confidence (quality value) of being corrected.

11.

Rapid quantification of DNA libraries for next-generation sequencing

Bernd Buehler Holly H. Hogrefe Graham Scott Harini Ravi Carlos Pabón-Peña Scott O’Brien Rachel Formosa Scott Happe 《Methods (San Diego, Calif.)》2010,50(4):S15-S18

Next-generation DNA sequencing workflows require accurate quantification of the DNA molecules to be sequenced, which ensures optimal performance of the instrument. Here, we demonstrate the use of qPCR for quantification of DNA libraries used in next-generation sequencing. In addition, we find that qPCR quantification may allow improvements to current NGS workflows, including reducing the amount of library DNA required, increasing the accuracy in quantifying amplifiable DNA, and avoiding amplification bias by reducing or eliminating the need to amplify DNA before sequencing.
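
The abstract does not spell out the calculation, so the Python sketch below shows only the generic standard-curve approach commonly used with qPCR: fit Cq against log10 concentration for a dilution series of standards, read off the implied amplification efficiency, and invert the line for an unknown library. The dilution series and Cq values are hypothetical.

```python
import math


def fit_standard_curve(log10_conc: list[float], cq: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of Cq = slope * log10(concentration) + intercept."""
    n = len(cq)
    mx = sum(log10_conc) / n
    my = sum(cq) / n
    sxx = sum((x - mx) ** 2 for x in log10_conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_conc, cq))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by the slope; about -3.32 corresponds to ~100% (doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def quantify(cq: float, slope: float, intercept: float) -> float:
    """Concentration of an unknown library, in the units of the standards."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical tenfold dilution series of a standard (concentration units arbitrary).
log10_standards = [0.0, -1.0, -2.0, -3.0]
cq_standards = [10.0, 13.32, 16.64, 19.96]
slope, intercept = fit_standard_curve(log10_standards, cq_standards)
print(round(quantify(14.98, slope, intercept), 4))  # 0.0316
```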

12.

Blocking of targeted microRNAs from next-generation sequencing libraries

Brian S. Roberts Andrew A. Hardigan Marie K. Kirby Meredith B. Fitz-Gerald C. Mel Wilcox Robert P. Kimberly Richard M. Myers 《Nucleic acids research》2015,43(21):e145

Highly abundant microRNAs (miRNAs) in small RNA sequencing libraries make it difficult to obtain efficient measurements of more lowly expressed species. We present a new method that allows for the selective blocking of specific, abundant miRNAs during preparation of sequencing libraries. This technique is specific, with few off-target effects, and has no impact on the reproducibility of the measurement of non-targeted species. In human plasma samples, we demonstrate that blocking of highly abundant hsa-miR-16-5p leads to improved detection of lowly expressed miRNAs and more precise measurement of differential expression overall. Furthermore, we establish the ability to target a second abundant miRNA and to multiplex the blocking of two miRNAs simultaneously. For small RNA sequencing, this technique could fill a role similar to that of ribosomal or globin removal technologies in messenger RNA sequencing.

13.

Testing for neutrality in samples with sequencing errors

Achaz G 《Genetics》2008,179(3):1409-1424

Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We show that in the presence of these errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these biases, we propose two new estimators of θ that ignore singletons, as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
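
The idea of discarding singletons can be illustrated with Watterson-type and pairwise-diversity estimators. Under the standard neutral model the expected unfolded site frequency spectrum is E[ξ_i] = θ/i for i = 1, ..., n-1, so the non-singleton segregating sites S - ξ_1 have expectation θ(a_n - 1), with a_n = 1 + 1/2 + ... + 1/(n-1). Likewise, the pairwise sum without singletons, Σ_{i=2}^{n-1} i(n-i)ξ_i, has expectation θ(n-1)(n-2)/2. The Python sketch below is derived from these textbook expectations and may differ in detail from the estimators defined in the paper.

```python
def harmonic(m: int) -> float:
    """a_n = 1 + 1/2 + ... + 1/m, with m = n - 1."""
    return sum(1.0 / i for i in range(1, m + 1))


def theta_w_no_singletons(sfs: list[float]) -> float:
    """Watterson-type estimate of theta that ignores singletons.

    sfs[i - 1] holds xi_i, the number of sites whose derived allele appears in
    exactly i of the n sequences (i = 1..n-1). Requires n >= 3.
    """
    n = len(sfs) + 1
    non_singleton_sites = sum(sfs) - sfs[0]
    return non_singleton_sites / (harmonic(n - 1) - 1)


def theta_pi_no_singletons(sfs: list[float]) -> float:
    """Pairwise-diversity estimate of theta that ignores singletons. Requires n >= 3."""
    n = len(sfs) + 1
    weighted = sum(i * (n - i) * sfs[i - 1] for i in range(2, n))
    return weighted / ((n - 1) * (n - 2) / 2)


# Expected neutral spectrum for theta = 10 and n = 5, then the same data with
# 90 extra singletons standing in for sequencing errors.
clean = [10.0, 5.0, 10.0 / 3, 2.5]
noisy = [100.0] + clean[1:]
print(round(theta_w_no_singletons(clean), 6), round(theta_w_no_singletons(noisy), 6))  # 10.0 10.0
```

Because singletons never enter either estimate, adding spurious singletons leaves them unchanged, at the cost of discarding genuine low-frequency variation.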

14.

Statistical analysis of systematic errors in high-throughput screening

Kevorkov D Makarenkov V 《Journal of biomolecular screening》2005,10(6):557-567

High-throughput screening (HTS) is an efficient technology for drug discovery. It allows more than 100,000 compounds to be screened per day per screen and requires effective quality-control procedures. The authors have developed a method for evaluating the background surface of an HTS assay; it can be used to correct raw HTS data. This correction is necessary to account for systematic errors that may affect hit selection. The described method allows one to analyze experimental HTS data and determine trends and local fluctuations of the corresponding background surfaces. For an assay with a large number of plates, deviations of the background surface from a plane are caused by systematic errors. Their influence can be minimized by subtracting the systematic background from the raw data. Two experimental HTS assays from the ChemBank database are examined in this article. The systematic error present in these data was estimated and removed, which enabled the authors to correct the hit selection procedure for both assays.
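
A much-simplified Python sketch of background-surface correction follows. It is not the authors' algorithm: it assumes each plate is a rectangular grid of raw measurements with non-constant values, z-score normalises each plate, estimates the background at every well position as the mean normalised value across all plates, and subtracts that surface. With many plates, genuine hits at a given well should largely average out, leaving mainly the position-dependent systematic component in the surface.

```python
from statistics import mean, pstdev

Plate = list[list[float]]


def normalize_plate(plate: Plate) -> Plate:
    """Z-score a plate so plates with different overall signal become comparable.

    Assumes the plate is not constant (non-zero standard deviation).
    """
    values = [v for row in plate for v in row]
    mu, sd = mean(values), pstdev(values)
    return [[(v - mu) / sd for v in row] for row in plate]


def background_surface(normalized: list[Plate]) -> Plate:
    """Mean normalized value at each well position across all plates."""
    rows, cols = len(normalized[0]), len(normalized[0][0])
    return [[mean(p[r][c] for p in normalized) for c in range(cols)] for r in range(rows)]


def subtract_background(plates: list[Plate]) -> list[Plate]:
    """Remove the estimated systematic surface from every normalized plate."""
    normalized = [normalize_plate(p) for p in plates]
    bg = background_surface(normalized)
    return [
        [[p[r][c] - bg[r][c] for c in range(len(bg[0]))] for r in range(len(bg))]
        for p in normalized
    ]


# Three plates sharing the same purely systematic gradient: nothing remains after correction.
corrected = subtract_background([[[1.0, 2.0], [3.0, 4.0]]] * 3)
```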

15.

High-throughput SELEX SAGE method for quantitative modeling of transcription-factor binding sites (cited 1 time: 0 self-citations, 1 by others)

Roulet E Busso S Camargo AA Simpson AJ Mermod N Bucher P 《Nature biotechnology》2002,20(8):831-835

16.

Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions

D Doerr I Gronau S Moran I Yavneh 《Algorithms for molecular biology : AMB》2012,7(1):22

ABSTRACT: BACKGROUND: Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method. RESULTS: This paper continues the line of research that attempts to fit, to each given set of input sequences, a distance function that maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but also consider the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case and a detailed simulation study on quartet trees. CONCLUSIONS: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
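
For concreteness, the two distance functions mentioned above can be written out. The Python sketch below uses their standard closed forms: the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, and Kimura's two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It assumes aligned, gap-free, upper-case sequences and returns infinity when a logarithm's argument is not positive. It only illustrates the formulas and does not reproduce the paper's deviation-from-additivity framework.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned, gap-free DNA sequences."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    arg = 1 - 4 * p / 3
    return -0.75 * math.log(arg) if arg > 0 else math.inf


def kimura_2p(a: str, b: str) -> float:
    """Kimura two-parameter distance, separating transitions from transversions."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(a)
    Q = transversions / len(a)
    arg1, arg2 = 1 - 2 * P - Q, 1 - 2 * Q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


a, b = "AAAA", "AAAG"
print(round(jukes_cantor(a, b), 4), round(kimura_2p(a, b), 4))  # 0.3041 0.3466
```

When most differences are transitions, as in data generated with a transition-transversion bias, the two functions give different distances for the same pair of sequences; the paper studies how this misspecification affects tree topology.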

17.

A heuristic managing errors for DNA sequencing

Błazewicz J Formanowicz P Guinand F Kasprzak M 《Bioinformatics (Oxford, England)》2002,18(5):652-660

MOTIVATION: A new heuristic algorithm for solving the DNA sequencing by hybridization problem with positive and negative errors. RESULTS: A heuristic algorithm providing better solutions than algorithms known from the literature based on the tabu search method.

18.

Extension of Lander-Waterman theory for sequencing filtered DNA libraries

Michael C Wendl W Brad Barbazuk 《BMC bioinformatics》2005,6(1):245

Background

The degree to which conventional DNA sequencing techniques will be successful for highly repetitive genomes is unclear. Investigators are therefore considering various filtering methods to select against high-copy sequence in DNA clone libraries. The standard model for random sequencing, Lander-Waterman theory, does not account for two important issues in such libraries, discontinuities and position-based sampling biases (the so-called "edge effect"). We report an extension of the theory for analyzing such configurations.
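
As a baseline for the extension described above, the standard Lander-Waterman expectations can be computed directly. With N reads of length L from a genome of length G, the redundancy is c = NL/G; the expected fraction of the genome covered is 1 - e^(-c), and, if reads must overlap by at least T bases to be joined, the expected number of islands (contigs) is N e^(-c(1 - T/L)). The Python sketch below implements only these classical formulas with hypothetical project numbers; it does not model the discontinuities or edge effects that the paper addresses.

```python
import math


def redundancy(n_reads: int, read_len: int, genome_len: int) -> float:
    """Lander-Waterman redundancy (coverage) c = N * L / G."""
    return n_reads * read_len / genome_len


def expected_fraction_covered(c: float) -> float:
    """Expected fraction of the genome covered by at least one read."""
    return 1 - math.exp(-c)


def expected_islands(n_reads: int, read_len: int, genome_len: int, min_overlap: int = 0) -> float:
    """Expected number of islands when reads must overlap by >= min_overlap bases to join."""
    c = redundancy(n_reads, read_len, genome_len)
    sigma = 1 - min_overlap / read_len
    return n_reads * math.exp(-c * sigma)


# Hypothetical project: 1,000 reads of 500 bp over a 100 kb target.
c = redundancy(1000, 500, 100_000)
print(c, round(expected_fraction_covered(c), 4), round(expected_islands(1000, 500, 100_000), 2))  # 5.0 0.9933 6.74
```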

19.

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries

Aird D Ross MG Chen WS Danielsson M Fennell T Russ C Jaffe DB Nusbaum C Gnirke A 《Genome biology》2011,12(2):R18

Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.

20.

Construction of small RNA cDNA libraries for deep sequencing (cited 6 times: 0 self-citations, 6 by others)

Lu C Meyers BC Green PJ 《Methods (San Diego, Calif.)》2007,43(2):110-117

Small RNAs (21-24 nucleotides) including microRNAs (miRNAs) and small interfering RNAs (siRNAs) are potent regulators of gene expression in both plants and animals. Several hundred genes encoding miRNAs and thousands of siRNAs have been experimentally identified by cloning approaches. New sequencing technologies facilitate the identification of these molecules and provide global quantitative expression data in a given biological sample. Here, we describe the methods used in our laboratory to construct small RNA cDNA libraries for high-throughput sequencing using technologies such as MPSS, 454 or SBS.