Found 20 similar articles; search took 15 ms.
1.
2.
Identifying differentially expressed genes using false discovery rate controlling procedures (total citations: 18; self-citations: 0; citations by others: 18)
MOTIVATION: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the test statistics distribution, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. RESULTS: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power than the family-wise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistic substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control. AVAILABILITY: An R program that adjusts p-values using FDR controlling procedures is freely available over the Internet at www.math.tau.ac.il/~ybenja.
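For orientation, the linear step-up procedure of Benjamini and Hochberg (1995) that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the authors' R program, and it does not implement the resampling-based procedures; the function names `bh_adjust` and `bh_reject` and the default level `q=0.05` are assumptions introduced here.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration only.
    """
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each
    # p-value by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_reject(pvalues, q=0.05):
    """Flag the hypotheses rejected by the step-up procedure at FDR level q."""
    return [p_adj <= q for p_adj in bh_adjust(pvalues)]
```

A hypothesis is rejected when its adjusted p-value is at most the chosen FDR level, which is equivalent to finding the largest rank k with p(k) <= k*q/m and rejecting the k smallest p-values.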
3.
4.
5.
Skibbe DS Wang X Zhao X Borsuk LA Nettleton D Schnable PS 《Bioinformatics (Oxford, England)》2006,22(15):1863-1870
MOTIVATION: Scanning parameters are often overlooked when optimizing microarray experiments. A scanning approach that extends the dynamic data range by acquiring multiple scans of different intensities has been developed. RESULTS: Data from each of three scan intensities (low, medium, high) were analyzed separately using multiple scan and linear regression approaches to identify and compare the sets of genes that exhibit statistically significant differential expression. In the multiple scan approach only one-third of the differentially expressed genes were shared among the three intensities, and each scan intensity identified unique sets of differentially expressed genes. The set of differentially expressed genes from any one scan amounted to < 70% of the total number of genes identified in at least one scan. The average signal intensity of genes that exhibited statistically significant changes in expression was highest for the low-intensity scan and lowest for the high-intensity scan, suggesting that low-intensity scans may be best for detecting expression differences in high-signal genes, while high-intensity scans may be best for detecting expression differences in low-signal genes. Comparison of the differentially expressed genes identified in the multiple scan and linear regression approaches revealed that the multiple scan approach effectively identifies a subset of statistically significant genes that the linear regression approach is unable to identify. Quantitative RT-PCR (qRT-PCR) tests demonstrated that statistically significant differences identified at all three scan intensities can be verified. AVAILABILITY: The data presented can be viewed at http://www.ncbi.nlm.nih.gov/geo/ under GEO accession no. GSE3017.
6.
High-throughput genotyping of swine populations is a potentially efficient method for establishing animal lineage and identification of loci important to animal health and efficient pork production. Markers were developed based upon single nucleotide polymorphisms (SNPs), which are abundant and amenable to automated genotyping platforms. The focus of this research was SNP discovery in expressed porcine genes providing markers to develop the porcine/human comparative map. Locus specific amplification (LSA) and comparative sequencing were used to generate PCR products and allelic information from parents of a swine reference family. Discovery of 1650 SNPs in 403 amplicons and strategies for optimizing LSA-based SNP discovery using alternative methods of PCR primer design, data analysis, and germplasm selection that are applicable to other populations and species are described. These data were the first large-scale assessment of frequency and distribution of porcine SNPs.
7.
PURPOSE OF REVIEW: To highlight the development in microarray data analysis for the identification of differentially expressed genes, particularly via control of false discovery rate. RECENT FINDINGS: The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. SUMMARY: There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.
8.
9.
The ordinary-, penalized-, and bootstrap t-test, least squares and best linear unbiased prediction were compared for their false discovery rates (FDR), i.e. the fraction of falsely discovered genes, which was empirically estimated in a duplicate of the data set. The bootstrap-t-test yielded up to 80% lower FDRs than the alternative statistics, and its FDR was always as good as or better than any of the alternatives. Generally, the predicted FDR from the bootstrapped P-values agreed well with their empirical estimates, except when the number of mRNA samples is smaller than 16. In a cancer data set, the bootstrap-t-test discovered 200 differentially regulated genes at an FDR of 2.6%, and in a knock-out gene expression experiment 10 genes were discovered at an FDR of 3.2%. It is argued that, in the case of microarray data, control of the FDR takes sufficient account of the multiple testing, whilst being less stringent than Bonferroni-type multiple testing corrections. Extensions of the bootstrap simulations to more complicated test-statistics are discussed.
10.
To isolate useful and interesting plant genes in large quantities, random sequencing of cDNA clones from a potato leaf library treated with ethylene was performed. Partial sequences of 210 randomly selected clones with inserts longer than 500 base pairs (bp) and a poly(A) tail were compared with sequences in the GenBank, EMBL and DDBJ nucleic acid databases, yielding 193 expressed sequence tags (ESTs). The 210 cDNA clones identified are related to various aspects of metabolic pathways such as glycolysis, amino acid synthesis, translation mechanism, ribosome synthesis, hormone response, stress response, regulation of gene expression, and signal transduction. Among the 193 ESTs, 12 ESTs (29 cDNA clones) appeared more than once, and 181 ESTs appeared only once and were regarded as a solitary group. Out of 210 clones, 29 clones (13.8%) have no similarity to the known nucleotide sequences and could serve as a potentially useful resource for plant molecular biology with respect to particular genes. Nucleotide sequencing to generate more ESTs from ethylene-induced as well as non-induced potato leaf is also in progress.
11.
Eun Sik Tak 《Bioscience, biotechnology, and biochemistry》2013,77(3):367-373
The coelomic cells of the earthworm consist of leukocytes, chloragocytes, and coelomocytes, which play an important role in innate immunity reactions. To gain insight into the expression profiles of coelomic cells of the earthworm, Eisenia andrei, we analyzed 1151 expressed sequence tags (ESTs) derived from the cDNA library of the coelomic cells. Among the 1151 ESTs analyzed, 493 ESTs (42.8%) showed a significant similarity to known genes and represented 164 unique genes, of which 93 ESTs were singletons and 71 ESTs manifested as two or more ESTs. From the 164 unique genes sequenced, we found 24 immune-related and cell defense genes. Furthermore, real-time PCR analysis showed that levels of lysenin-related protein mRNA in coelomic cells of E. andrei were upregulated after the injection of Bacillus subtilis bacteria. This EST data set would provide a valuable resource for future research on the earthworm immune system.
12.
Robust estimation of the false discovery rate (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Presently available methods that use p-values to estimate or control the false discovery rate (FDR) implicitly assume that p-values are continuously distributed and based on two-sided tests. Therefore, it is difficult to reliably estimate the FDR when p-values are discrete or based on one-sided tests. RESULTS: A simple and robust method to estimate the FDR is proposed. The proposed method does not rely on implicit assumptions that tests are two-sided or yield continuously distributed p-values. The proposed method is proven to be conservative and have desirable large-sample properties. In addition, the proposed method was among the best performers across a series of 'real data simulations' comparing the performance of five currently available methods. AVAILABILITY: Libraries of S-plus and R routines to implement the method are freely available from www.stjuderesearch.org/depts/biostats.
13.
Expressed sequence tags (ESTs) are partial cDNA sequences read from both ends of random expressed gene fragments used for discovering new genes. DNA libraries from four different developmental stages of Schistosoma mansoni used in this study generated 141 ESTs representing about 2.5% of S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank for homology searching in nonredundant databases using Basic Local Alignment Search Tool for DNA (BLASTN) alignment and for protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Among submitted ESTs, 29 were derived from lambdagt11 sporocyst library, 70 from lambdaZap adult worm library, 31 from lambdaZap cercarial library, and 11 from lambdaZap female B worm library. Homology search revealed that eight (5.6%) ESTs shared homology to previously identified S. mansoni genes in dbEST, 15 (10.6%) are homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homologies to rRNA or mitochondrial DNA sequences. Thus, among the 141 ESTs studied, 116 sequences are derived from novel, uncharacterized S. mansoni genes. Those 116 ESTs are important for identification of coding regions in the sequences, helping in mapping of schistosome genome, and identifying genes of immunological and pharmacological significance.
14.
15.
Background
Thousands of genes in a genomewide data set are tested against some null hypothesis, for detecting differentially expressed genes in microarray experiments. The expected proportion of false positive genes in a set of genes, called the False Discovery Rate (FDR), has been proposed to measure the statistical significance of this set. Various procedures exist for controlling the FDR. However, the threshold (generally 5%) is arbitrary and a specific measure associated with each gene would be worthwhile.
16.
Previous studies have been conducted in gene expression profiling to identify groups of genes that characterize the colorectal carcinoma disease. Despite the success of previous attempts to identify groups of genes in the progression of the colorectal carcinoma disease, their methods either require subjective interpretation of the number of clusters, or lack stability during different runs of the algorithms, all of which limits the usefulness of these methods. In this study, we propose an enhanced algorithm that provides stability and robustness in identifying differentially expressed genes in an expression profile analysis. Our proposed algorithm uses multiple clustering algorithms under the consensus clustering framework. The results of the experiment show that the robustness of our method provides a consistent structure of clusters, similar to the structure found in the previous study. Furthermore, our algorithm outperforms any single clustering algorithm in terms of the cluster quality score.
17.
The discovery of new intron-containing human tRNA genes using the polymerase chain reaction (total citations: 3; self-citations: 0; citations by others: 3)
Introns in transfer RNA genes are rare in vertebrates. Until now, the only intron-containing human tRNA genes were believed to be those coding for tRNA(Tyr). All of these introns are inserted 3' to the anticodon position in these genes. We have designed polymerase chain reaction primers that can amplify all of the tRNA(Tyr) genes for cloning and sequencing by using the conserved portions of the gene coding for the structural part of the tRNA. Our preliminary results have revealed five tRNA(Tyr) genes, each of which contains a different intron. We used the same technique to amplify, clone, and sequence the human genes for tRNA(Leu)CAA. This has resulted in the discovery that this human tRNA gene family also has introns inserted 3' to the anticodon. This polymerase chain reaction technique is useful in detecting new families of intron-containing tRNA genes as well as identifying sequence variations in the introns of individual genes.
18.
《Palaeogeography, Palaeoclimatology, Palaeoecology》2007,243(3-4):373-377
A simple method for the spectral analysis of multispecies microfossil data through time or stratigraphic level is presented. The method is based on the Mantel correlogram, allowing any ecological similarity measure to be used. The method can therefore be applied to binary (presence-absence) data as well as raw or normalized species counts. In contrast with spectral analysis of univariate ordination scores, this approach does not explicitly discard information. The method, referred to as the Mantel periodogram, is exemplified with a data set from the literature, demonstrating several astronomically forced periodicities in microfaunal data from the Plio-Pleistocene.
19.
Shaw-Smith CJ Coffey AJ Huckle E Durham J Campbell EA Freeman TC Walters JR Bentley DR 《BioTechniques》2000,28(5):958-964
In cDNA indexing, differentially expressed genes are identified by the display of specific, corresponding subsets of cDNA. Subdivision of the cDNA population is achieved by the sequence-specific ligation of adapters to the overhangs created by class IIS restriction enzymes. However, inadequate specificity of ligation leads to redundancy between different adapter subsets. We evaluate the incidence of mismatches between adapters and class IIS restriction fragments during ligation and describe a modified set of conditions that improves ligation specificity. The improved protocol reduces redundancy between amplified cDNA subsets, which leads to a lower number of bands per lane of the differential display gel, and therefore simplifies analysis. We confirm the validity of this revised protocol by identifying five differentially expressed genes in mouse duodenum and ileum.
20.
A semi-automated, non-rigid breast surface registration method is presented that involves solving the Laplace or diffusion equations over undeformed and deformed breast surfaces. The resulting potential energy fields and isocontours are used to establish surface correspondence. This novel surface-based method, which does not require intensity images, anatomical landmarks, or fiducials, is compared to a gold standard of thin-plate spline (TPS) interpolation. Realistic finite element simulations of breast compression and further testing against a tissue-mimicking phantom demonstrate that this method is capable of registering surfaces experiencing 6 - 36 mm compression to within a mean error of 0.5 - 5.7 mm.