Similar documents
20 similar documents found.
4.
MOTIVATION: This article presents a method for identifying the isotopic distributions within a mass spectrum using a probabilistic classifier supplemented with dynamic programming. Such a system is needed for a variety of purposes, including generating robust and meaningful features from mass spectra for use in classification. RESULTS: The primary result of this article is that the dynamic programming approach significantly improves the sensitivity of a probabilistic classifier for identifying isotopic distributions without harming its specificity. When annotating isotopic distributions in spectra where an expert has performed the initial 'peak-picking' (removal of noise peaks), the dynamic programming approach gives a true positive rate of 96% and a false positive rate of 0.0%, whereas the classifier alone reaches a true positive rate of only 47% at a false positive rate of 0.0%. When annotating isotopic distributions in machine peak-picked spectra, which may contain many noise peaks, the dynamic programming approach gives a true positive rate of only 22.0%, but it keeps the false positive rate low at 1.0% and still outperforms the classifier alone. Note that all of these rates require exact matches with the distributions in the annotated spectra: in our evaluation a distribution is considered 'entirely incorrect' if it is missing even one peak or contains even one extraneous peak. We compared against the THRASH and AID-MS systems using a looser criterion: correctly identifying the distribution that contains the mono-isotopic mass. Under this measure, our dynamic programming approach achieves a true positive rate of 82% and a false positive rate of 1%, again outperforming the classifier alone. The dynamic programming approach turns out to be more conservative than THRASH and AID-MS, yielding fewer true peaks and fewer false peaks, but its F-score is significantly better than those of THRASH and AID-MS.
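The abstract does not give the dynamic programming recurrence, so the following is only a minimal sketch of how classifier scores and dynamic programming can be combined, under several assumptions: peaks are sorted by m/z, each isotopic distribution is a contiguous run of peaks, and `toy_group_logprob` is a hypothetical stand-in for the trained probabilistic classifier (it only checks that adjacent peaks are about 1.00335/z Da apart). For every prefix of the peak list, the recurrence keeps the best-scoring way to label the final peaks either as noise or as one distribution.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Approximate mass difference between consecutive isotopic peaks (13C vs 12C), in Da.
ISOTOPE_SPACING = 1.00335


def toy_group_logprob(mz: Sequence[float], charge: int = 1, tol: float = 0.01) -> float:
    """Hypothetical stand-in for a trained classifier: log P(peaks form one distribution)."""
    if len(mz) < 2:
        return math.log(0.05)  # a lone peak is rarely a complete distribution
    expected = ISOTOPE_SPACING / charge
    gaps = [b - a for a, b in zip(mz, mz[1:])]
    if all(abs(g - expected) <= tol for g in gaps):
        return math.log(0.9)
    return math.log(1e-6)


def segment_peaks(
    mz: Sequence[float],
    group_logprob: Callable[[Sequence[float]], float] = toy_group_logprob,
    noise_logprob: float = math.log(0.1),
    max_len: int = 6,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split m/z-sorted peaks into isotopic groups (half-open index ranges) and
    noise peaks so that the summed log-probability is maximal."""
    n = len(mz)
    best = [-math.inf] * (n + 1)
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        # Option 1: peak i-1 is noise.
        best[i] = best[i - 1] + noise_logprob
        back[i] = (i - 1, "noise")
        # Option 2: peaks j..i-1 form one isotopic distribution.
        for j in range(max(0, i - max_len), i):
            cand = best[j] + group_logprob(mz[j:i])
            if cand > best[i]:
                best[i], back[i] = cand, (j, "group")
    # Backtrack from the end to recover the chosen labelling.
    groups: List[Tuple[int, int]] = []
    noise: List[int] = []
    i = n
    while i > 0:
        j, kind = back[i]
        if kind == "group":
            groups.append((j, i))
        else:
            noise.append(i - 1)
        i = j
    return groups[::-1], noise[::-1]


if __name__ == "__main__":
    peaks = [500.0, 501.0034, 502.0067, 503.01, 510.2, 600.0, 601.0034]
    print(segment_peaks(peaks))  # ([(0, 4), (5, 7)], [4])
```

In the toy spectrum the isolated peak at 510.2 stays noise because no grouping with its neighbours beats the noise penalty. Restricting groups to contiguous runs keeps the recurrence at O(n · max_len) classifier calls, but it would miss distributions interleaved with noise peaks; the authors' method may handle that case differently.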
All results were obtained with 10-fold cross-validation on 99 sections of mass spectra containing a total of 214 hand-annotated isotopic distributions. AVAILABILITY: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM.
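The exact-match criterion and the 10-fold split can be made concrete with a small sketch. It is an illustration under assumptions, not the authors' evaluation code: it reports precision, recall (true positive rate) and F-score instead of the paper's false positive rate, whose exact definition the abstract does not give, and it uses contiguous folds where a real experiment would normally shuffle the 99 sections first.

```python
from typing import FrozenSet, List, Set, Tuple

# One isotopic distribution = the set of peak indices assigned to it.
Distribution = FrozenSet[int]


def exact_match_scores(
    predicted: Set[Distribution], annotated: Set[Distribution]
) -> Tuple[float, float, float]:
    """Precision, recall and F-score where a predicted distribution counts only
    if it equals an annotated one peak-for-peak (one missing or extra peak = wrong)."""
    tp = len(predicted & annotated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    f_score = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f_score


def k_fold_indices(n_items: int, k: int = 10) -> List[List[int]]:
    """Split indices 0..n_items-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_items, k)
    folds: List[List[int]] = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

With 99 sections and k = 10 this gives nine folds of 10 sections and one fold of 9. A prediction such as {5, 6, 7} against an annotated {5, 6} scores as a complete miss under the exact-match rule, which is why exact-match rates can look much lower than rates under the looser mono-isotopic criterion.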

5.
Zhang SD. PLoS ONE, 2011, 6(4): e18874
BACKGROUND: Biomedical researchers are now often faced with situations in which a large number of hypotheses must be tested simultaneously, e.g., in comparative gene expression studies using high-throughput microarray technology. To properly control false positive errors, the false discovery rate (FDR) approach has become widely used in multiple testing. Accurate estimation of the FDR requires an accurate estimate of the proportion of true null hypotheses, and many methods for estimating this quantity have been proposed. Typically, when a new method is introduced, simulations are carried out to show its improved accuracy; however, these simulations are often limited, covering only a few points in the parameter space. RESULTS: Here I have carried out extensive in silico experiments to compare several commonly used methods for estimating the proportion of true null hypotheses. The coverage of the parameter space by these simulations is unprecedentedly thorough compared with typical simulation studies in the literature, which allows conclusions to be drawn globally about the performance of the different methods. A very simple method was found to give the most accurate estimate over the large majority of the parameter space. Given its simplicity and overall superior accuracy, I recommend it as the first choice for estimating the proportion of true null hypotheses in multiple testing.
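To make the estimated quantity concrete, here is a minimal sketch of one widely used estimator of the proportion of true nulls, π0: Storey's λ-threshold estimator, plugged into an adaptive Benjamini-Hochberg procedure. The abstract does not name the "very simple method" it recommends, so this example should not be read as that method; λ = 0.5 is a conventional default chosen here only for illustration.

```python
from typing import List, Sequence


def storey_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey's estimator of the proportion of true null hypotheses:
    pi0 = #{p > lam} / (m * (1 - lam)), capped at 1. Null p-values are uniform,
    so the region above lam is populated mostly by true nulls."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def adaptive_bh(pvalues: Sequence[float], alpha: float = 0.05, pi0: float = 1.0) -> List[int]:
    """Benjamini-Hochberg step-up procedure at FDR level alpha. With pi0 < 1 the
    thresholds are divided by pi0 (the adaptive variant), gaining power when many
    hypotheses are non-null. Returns the indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / (m * pi0):
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])
```

For the toy p-values `[0.001, 0.004, 0.018, 0.03, 0.2, 0.45, 0.6, 0.7, 0.8, 0.95]`, four of ten values exceed 0.5, so π̂0 = 4 / (10 × 0.5) = 0.8. Plain BH (π0 = 1) rejects the first two hypotheses; using π̂0 = 0.8 loosens the third threshold from 0.015 to 0.01875 and admits a third rejection. This shows why the accuracy of the π0 estimate feeds directly into FDR control.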

11.
Global comparisons of gene expression profiles between species provide significant insight into gene regulation, evolutionary processes and disease mechanisms. In this work, we describe a flexible and intuitive approach for global expression profiling of closely related species using high-density exon arrays designed for a single reference genome. The high-density probe coverage of exon arrays allows us to select identical sets of perfect-match probes to measure the expression levels of orthologous genes. This eliminates a serious confounding factor, the probe-affinity effects of species-specific microarray probes, and enables direct comparison of estimated expression indexes across species. Using a newly designed Affymetrix exon array with eight probes per exon for approximately 315 000 exons in the human genome, we conducted expression profiling in corresponding tissues from humans, chimpanzees and rhesus macaques. Quantitative real-time PCR analysis of differentially expressed candidate genes is highly concordant with the microarray data, yielding a validation rate of 21/22 for human versus chimpanzee differences and 11/11 for human versus rhesus differences. This method has the potential to greatly facilitate biomedical and evolutionary studies of gene expression in nonhuman primates and can easily be extended to expression array design and comparative analysis of other animals and plants.
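The probe-selection idea can be illustrated with a toy sketch. Assumptions: probe and ortholog sequences are given in the same (target-sense) orientation, the short toy probes stand in for real array probes, and `expression_index` is a hypothetical median-of-log2 summary rather than the probe-level model used by the authors or by Affymetrix software.

```python
import math
import statistics
from typing import Dict, List, Sequence


def shared_perfect_match_probes(probes: Sequence[str], orthologs: Dict[str, str]) -> List[str]:
    """Keep only probes that occur exactly (perfect match) in the orthologous
    sequence of every species, so all species are measured with the same probes."""
    return [p for p in probes if all(p in seq for seq in orthologs.values())]


def expression_index(intensities: Dict[str, float], probes: Sequence[str]) -> float:
    """Hypothetical summary: median log2 intensity over the shared probe set."""
    return statistics.median(math.log2(intensities[p]) for p in probes)


if __name__ == "__main__":
    orthologs = {
        "human": "ATGCGTACGTTAGCCA",
        "chimp": "ATGCGTACGTTAGCTA",   # differs near the 3' end
        "rhesus": "ATGCGAACGTTAGCCA",  # differs in the middle
    }
    probes = ["ATGCG", "GTACG", "CGTTAG", "TAGCC"]
    shared = shared_perfect_match_probes(probes, orthologs)
    print(shared)  # ['ATGCG', 'CGTTAG']
```

Because every species is summarized over the identical probe set, a probe with unusual binding affinity affects all species equally instead of showing up as a spurious between-species difference.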

18.
We developed a broad-ranging method for identifying key hydrogen-producing and hydrogen-consuming microorganisms by analyzing hydrogenase gene content and expression in complex anaerobic microbial communities. The method is based on a tiling hydrogenase gene oligonucleotide DNA microarray (Hydrogenase Chip), which places many probes on each gene by tiling probe sequences across the genes of interest at 1.67×–2× coverage. This design helps avoid false-positive gene identification in DNA or RNA samples extracted from complex microbial communities. We applied this technique to interrogate interspecies hydrogen transfer in complex communities in (i) lab-scale reductive dehalogenating microcosms, which enabled us to delineate key H2-consuming microorganisms, and (ii) hydrogen-generating microbial mats, where we found evidence of significant H2 production by cyanobacteria. Independent quantitative PCR analysis of selected hydrogenase genes showed that the Hydrogenase Chip technique is semiquantitative. We also determined that, as microbial community complexity increases, specificity must be traded for sensitivity when analyzing data from tiling DNA microarrays.
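A rough sketch of how a tiling layout reaches a target coverage, and of how a per-gene call threshold trades specificity for sensitivity. The probe length (50 nt in the examples), the signal threshold and the `min_fraction` call rule are hypothetical parameters chosen for illustration; the abstract specifies only the 1.67×–2× coverage.

```python
from typing import List, Sequence


def tile_starts(gene_len: int, probe_len: int, coverage: float) -> List[int]:
    """Start positions of probes tiled across a gene so that interior bases are
    covered about `coverage` times (step = probe_len / coverage)."""
    if gene_len < probe_len:
        return []  # gene too short to hold a single probe
    step = max(1, round(probe_len / coverage))
    starts = list(range(0, gene_len - probe_len + 1, step))
    if starts[-1] != gene_len - probe_len:
        starts.append(gene_len - probe_len)  # make sure the gene's end is covered
    return starts


def gene_detected(signals: Sequence[float], threshold: float, min_fraction: float) -> bool:
    """Hypothetical call rule: a gene is 'present' if at least `min_fraction` of its
    tiled probes exceed `threshold`. Lowering min_fraction raises sensitivity at the
    cost of specificity, since fewer probes must agree before a call is made."""
    if not signals:
        return False
    above = sum(1 for s in signals if s > threshold)
    return above / len(signals) >= min_fraction
```

For a 200 nt gene and 50 nt probes, 2× coverage gives a step of 25 nt (starts 0, 25, …, 150), while 1.67× gives a step of 30 nt (starts 0, 30, …, 150). Requiring many tiled probes to agree is what suppresses false positives from a single cross-hybridizing probe; relaxing that requirement in more complex communities is one way the specificity-for-sensitivity trade-off can show up.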

19.
Highly multiplexed DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries made by current SNP callers. We demonstrate that the number of false positive SNP calls can be reduced significantly by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.
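One way to picture "using cross-sample information to learn the error properties" is a beta-binomial error model, sketched below. This is an assumed illustration, not the authors' model: at a single site, the non-reference read fractions of many samples (assumed mostly non-variant there) are used to fit a Beta(a, b) prior on the error rate by the method of moments, and a sample is called variant only if its non-reference count would be very unlikely under that learned error distribution. The cutoff of 1e-6 is arbitrary.

```python
import math
from typing import Sequence, Tuple


def fit_beta_moments(rates: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of a Beta(a, b) prior to per-sample error rates
    observed at one site across many samples (the cross-sample information)."""
    if len(rates) < 2:
        raise ValueError("need at least two samples")
    m = sum(rates) / len(rates)
    v = sum((r - m) ** 2 for r in rates) / (len(rates) - 1)
    if (not 0 < m < 1) or v <= 0 or v >= m * (1 - m):
        raise ValueError("rates are not consistent with a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def log_beta(x: float, y: float) -> float:
    """log B(x, y) computed with log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_sf(k: int, n: int, a: float, b: float) -> float:
    """P(K >= k) for K ~ BetaBinomial(n, a, b): the non-reference read count
    expected from sequencing and mapping errors alone."""
    total = 0.0
    for j in range(k, n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        total += math.exp(log_choose + log_beta(j + a, n - j + b) - log_beta(a, b))
    return total


def call_snp(alt: int, depth: int, a: float, b: float, cutoff: float = 1e-6) -> bool:
    """Call a variant when `alt` non-reference reads out of `depth` are too many
    to be explained by the error distribution learned from the other samples."""
    return betabinom_sf(alt, depth, a, b) < cutoff
```

With error fractions of `[0.01, 0.02, 0.015, 0.005, 0.012]` from five samples, the fitted prior has mean 1.24%; 12 non-reference reads out of 30 are then far too many to be errors and get called, while a single non-reference read does not. Because the prior is fitted per site, a position with systematically elevated errors across all samples automatically demands stronger evidence before a call is made.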


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417