Similar literature
20 similar documents found.
1.
MOTIVATION: A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Recently several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. However, it may not be clear how these methods compare with each other. Our main goal here is to compare three methods, the t-test, a regression modeling approach (Thomas et al., Genome Res., 11, 1227-1236, 2001) and a mixture model approach (Pan et al., http://www.biostat.umn.edu/cgi-bin/rrs?print+2001,2001a,b) with particular attention to their different modeling assumptions. RESULTS: It is pointed out that all three methods are based on the two-sample t-statistic or a minor variation of it, but they differ in how they associate a statistical significance level with the corresponding statistic, leading to possibly large differences in the resulting significance levels and the numbers of genes detected. In particular, we give an explicit formula for the test statistic used in the regression approach. Using the leukemia data of Golub et al. (Science, 285, 531-537, 1999), we illustrate these points. We also briefly compare the results with those of several other methods, including the empirical Bayesian method of Efron et al. (J. Am. Stat. Assoc., to appear, 2001) and the Significance Analysis of Microarray (SAM) method of Tusher et al. (Proc. Natl Acad. Sci. USA, 98, 5116-5121, 2001).
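As a rough illustration of the two-sample t-statistic that the abstract above says all three methods build on, the following sketch computes a per-gene statistic. It assumes the unequal-variance (Welch) form; the gene names and expression values are hypothetical, and the abstract does not say which variant each method uses.

```python
import math
import statistics


def two_sample_t(group_a, group_b):
    """Welch-type two-sample t-statistic for one gene's expression values."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical log-expression values: gene -> (condition 1, condition 2).
expression = {
    "gene_1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "gene_2": ([4.0, 5.0, 6.0], [4.0, 5.0, 6.0]),
}
t_scores = {gene: two_sample_t(a, b) for gene, (a, b) in expression.items()}
```

The statistic alone does not decide which genes are called significant; as the abstract notes, the methods differ in how a significance level is attached to it.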

2.

Background  

DNA microarrays are used to investigate differences in gene expression between two or more classes of samples. Most currently used approaches compare mean expression levels between classes and are not geared to find genes whose expression is significantly different in only a subset of samples in a class. However, biological variability can lead to situations where key genes are differentially expressed in only a subset of samples. To facilitate the identification of such genes, a new method is reported.

3.
We introduce a non-parametric approach using bootstrap-assisted correspondence analysis to identify and validate genes that are differentially expressed in factorial microarray experiments. Model comparison showed that although both parametric and non-parametric methods capture the different profiles in the data, our method is less inclined to false positive results due to dimension reduction in data analysis.

4.
The problem of identifying significantly differentially expressed genes for replicated microarray experiments is accepted as significant and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle this problem. This paper proposes a powerful approach to handle this problem. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm. This way, the problem definition of PaGE stays intact and we just estimate the needed prior probabilities. Our estimation method is similar to Storey's estimator without being its direct extension. Then, we modify the problem formulation to find significantly differentially expressed genes and present an efficient method for finding them. This formulation increases the power by directly incorporating Storey's estimator. We report the preliminary results on the BRCA data set to demonstrate the applicability and effectiveness of our approach.
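For orientation, the standard form of Storey's estimator of the proportion of true null hypotheses can be sketched as follows. This is the textbook estimator, not the modified estimator the abstract above proposes; the tuning value `lam = 0.5` and the p-values are assumptions chosen for illustration.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate the proportion of true nulls as #{p > lam} / (m * (1 - lam))."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    # The raw ratio can exceed 1, so it is capped.
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical p-values for ten genes.
p_values = [0.001, 0.004, 0.01, 0.02, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
pi0 = storey_pi0(p_values)
```

The idea is that p-values above `lam` come mostly from non-differentially expressed genes, whose p-values are uniform, so their count rescaled by `1 - lam` approximates the number of true nulls.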

5.
MOTIVATION: A primary objective of microarray studies is to determine genes which are differentially expressed under various conditions. Parametric tests, such as two-sample t-tests, may be used to identify differentially expressed genes, but they require some assumptions that are not realistic for many practical problems. Non-parametric tests, such as empirical Bayes methods and mixture normal approaches, have been proposed, but the inferences are complicated and the tests may not have as much power as parametric models. RESULTS: We propose a weakly parametric method to model the distributions of summary statistics that are used to detect differentially expressed genes. Standard maximum likelihood methods can be employed to make inferences. For illustration purposes the proposed method is applied to the leukemia data (training part) discussed elsewhere. A simulation study is conducted to evaluate the performance of the proposed method.

6.
An important problem addressed using cDNA microarray data is the detection of genes differentially expressed in two tissues of interest. Currently used approaches ignore the multidimensional structure of the data. However, it is well known that correlation among covariates can enhance the ability to detect less pronounced differences. We use the Mahalanobis distance between vectors of gene expressions as a criterion for simultaneously comparing a set of genes and develop an algorithm for maximizing it. To overcome the problem of instability of covariance matrices we propose a new method of combining data from small-scale random search experiments. We show that by utilizing the correlation structure the multivariate method, in addition to the genes found by the one-dimensional criteria, finds genes whose differential expression is not detectable marginally.
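The claim that correlation can expose differences that are invisible gene by gene can be illustrated with a two-gene sketch of the squared Mahalanobis distance, D² = dᵀ S⁻¹ d. The mean-difference vector and covariance matrix below are hypothetical, and the sketch does not reproduce the abstract's search algorithm or its method for stabilizing covariance estimates.

```python
def mahalanobis_sq_2d(diff, cov):
    """Squared Mahalanobis distance d^T S^-1 d for a pair of genes."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    x, y = diff
    return x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)


# Hypothetical values: difference in mean expression between the two tissues,
# and a covariance matrix in which the two genes are strongly correlated.
mean_diff = (1.0, -1.0)
covariance = ((1.0, 0.9), (0.9, 1.0))

joint = mahalanobis_sq_2d(mean_diff, covariance)
# One-dimensional analogue for each gene: squared difference over variance.
marginal = [mean_diff[i] ** 2 / covariance[i][i] for i in range(2)]
```

With these numbers each gene taken alone scores 1, while the pair scores about 20, because the two means move in opposite directions even though the genes are positively correlated.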

7.
BACKGROUND: Using differential display (DD), we discovered a new member of the serine protease family of protein-cleaving enzymes, named protease M. The gene is most closely related by sequence to the kallikreins, to prostate-specific antigen (PSA), and to trypsin. The diagnostic use of PSA in prostate cancer suggested that a related molecule might be a predictor for breast or ovarian cancer. This, in turn, led to studies designed to characterize the protein and to screen for its expression in cancer. MATERIALS AND METHODS: The isolation of protease M by DD, the cloning and sequencing of the cDNA, and the comparison of the predicted protein structure with related proteins are described, as are methods to produce recombinant proteins and polyclonal antibody preparations. Protease M expression was examined in mammary, prostate, and ovarian cancer, as well as normal, cells and tissues. Stable transfectants expressing the protease M gene were produced in mammary carcinoma cells. RESULTS: Protease M was localized by fluorescent in situ hybridization analysis to chromosome 19q13.3, in a region to which other kallikreins and PSA also map. The gene is expressed in the primary mammary carcinoma lines tested but not in the corresponding cell lines of metastatic origin. It is strongly expressed in ovarian cancer tissues and cell lines. The enzyme activity could not be established, because of difficulties in producing sufficient recombinant protein, a common problem with proteases. Transfectants were selected that overexpress the mRNA, but the protein levels remained very low. CONCLUSIONS: Protease M expression (mRNA) may be a useful marker in the detection of primary mammary carcinomas, as well as primary ovarian cancers. Other medical applications are also likely, based on sequence relatedness to trypsin and PSA.

8.
An exciting biological advancement over the past few years is the use of microarray technologies to measure simultaneously the expression levels of thousands of genes. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.

9.
10.

Introduction

We previously identified common differentially expressed (DE) genes in bladder cancer (BC). In the present study we analyzed in depth the expression of several groups of these DE genes.

Materials and Methods

Samples from 30 human BCs and their adjacent normal tissues were analyzed by whole genome cDNA microarrays, qRT-PCR and Western blotting. Our attention was focused on cell-cycle control and DNA damage repair genes, genes related to apoptosis, signal transduction, angiogenesis, as well as cellular proliferation, invasion and metastasis. Four publicly available GEO Datasets were further analyzed, and the expression data of the genes of interest (GOIs) were compared to those of the present study. The relationship among the GOI was also investigated. GO and KEGG molecular pathway analysis was performed to identify possible enrichment of genes with specific biological themes.

Results

Unsupervised cluster analysis of DNA microarray data revealed a clear distinction in BC vs. control samples and low vs. high grade tumors. Genes with at least 2-fold differential expression in BC vs. controls, as well as in non-muscle invasive vs. muscle invasive tumors and in low vs. high grade tumors, were identified and ranked. Specific attention was paid to the changes in osteopontin (OPN, SPP1) expression, due to its multiple biological functions. Similarly, genes exhibiting equal or low expression in BC vs. the controls were scored. Significant pair-wise correlations in gene expression were scored. GO analysis revealed the multifaceted character of the GOIs, since they participate in a variety of mechanisms, including cell proliferation, cell death, metabolism, cell shape, and cytoskeletal re-organization. KEGG analysis revealed that the most significant pathway was that of Bladder Cancer (p = 1.5×10⁻³¹).

Conclusions

The present work adds to the current knowledge on molecular signature identification of BC. Such works should progress in order to gain more insight into disease molecular mechanisms.

11.

Background  

The biomedical community is rapidly developing new methods of data analysis for microarray experiments, with the goal of establishing new standards to objectively process the massive datasets produced from functional genomic experiments. Each microarray experiment measures thousands of genes simultaneously producing an unprecedented amount of biological information across increasingly numerous experiments; however, in general, only a very small percentage of the genes present on any given array are identified as differentially regulated. The challenge then is to process this information objectively and efficiently in order to obtain knowledge of the biological system under study and by which to compare information gained across multiple experiments. In this context, systematic and objective mathematical approaches, which are simple to apply across a large number of experimental designs, become fundamental to correctly handle the mass of data and to understand the true complexity of the biological systems under study.

12.
Genome analysis of actinomycetes has revealed the presence of numerous cryptic gene clusters encoding putative natural products. These loci remain dormant until appropriate chemical or physical signals induce their expression. Here we demonstrate the use of a high-throughput genome scanning method to detect and analyze gene clusters involved in natural-product biosynthesis. This method was applied to uncover biosynthetic pathways encoding enediyne antitumor antibiotics in a variety of actinomycetes. Comparative analysis of five biosynthetic loci representative of the major structural classes of enediynes reveals the presence of a conserved cassette of five genes that includes a novel family of polyketide synthase (PKS). The enediyne PKS (PKSE) is proposed to be involved in the formation of the highly reactive chromophore ring structure (or "warhead") found in all enediynes. Genome scanning analysis indicates that the enediyne warhead cassette is widely dispersed among actinomycetes. We show that selective growth conditions can induce the expression of these loci, suggesting that the range of enediyne natural products may be much greater than previously thought. This technology can be used to increase the scope and diversity of natural-product discovery.

13.
MOTIVATION: Currently most of the methods for identifying differentially expressed genes fall into the category of so called single-gene-analysis, performing hypothesis testing on a gene-by-gene basis. In a single-gene-analysis approach, estimating the variability of each gene is required to determine whether a gene is differentially expressed or not. Poor accuracy of variability estimation makes it difficult to identify genes with small fold-changes unless a very large number of replicate experiments are performed. RESULTS: We propose a method that can avoid the difficult task of estimating variability for each gene, while reliably identifying a group of differentially expressed genes with low false discovery rates, even when the fold-changes are very small. In this article, a new characterization of differentially expressed genes is established based on a theorem about the distribution of ranks of genes sorted by (log) ratios within each array. This characterization of differentially expressed genes based on rank is an example of all-gene-analysis instead of single gene analysis. We apply the method to a cDNA microarray dataset and many low fold-changed genes (as low as 1.3 fold-changes) are reliably identified without carrying out hypothesis testing on a gene-by-gene basis. The false discovery rate is estimated in two different ways reflecting the variability from all the genes without the complications related to multiple hypothesis testing. We also provide some comparisons between our approach and single-gene-analysis based methods. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.
Overlaying differential changes in gene expression on protein interaction networks has proven to be a useful approach to interpreting the cell's dynamic response to a changing environment. Despite successes in finding active subnetworks in the context of a single species, the idea of overlaying lists of differentially expressed genes on networks has not yet been extended to support the analysis of multiple species' interaction networks. To address this problem, we designed a scalable, cross-species network search algorithm, neXus (Network-cross(X)-species-Search), that discovers conserved, active subnetworks based on parallel differential expression studies in multiple species. Our approach leverages functional linkage networks, which provide more comprehensive coverage of functional relationships than physical interaction networks by combining heterogeneous types of genomic data. We applied our cross-species approach to identify conserved modules that are differentially active in stem cells relative to differentiated cells based on parallel gene expression studies and functional linkage networks from mouse and human. We find hundreds of conserved active subnetworks enriched for stem cell-associated functions such as cell cycle, DNA repair, and chromatin modification processes. Using a variation of this approach, we also find a number of species-specific networks, which likely reflect mechanisms of stem cell function that have diverged between mouse and human. We assess the statistical significance of the subnetworks by comparing them with subnetworks discovered on random permutations of the differential expression data. We also describe several case examples that illustrate the utility of comparative analysis of active subnetworks.
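The permutation-based significance assessment mentioned in the abstract above can be sketched generically as an empirical p-value. The scoring scheme is an assumption: the abstract does not define how subnetworks are scored, so the function below simply takes one observed score and a list of hypothetical scores obtained from permuted data.

```python
def empirical_p_value(observed_score, permuted_scores):
    """Share of permutation scores at least as large as the observed score.

    The +1 terms count the observed data as one of the permutations,
    which keeps the estimate from ever being exactly zero.
    """
    at_least_as_large = sum(1 for s in permuted_scores if s >= observed_score)
    return (1 + at_least_as_large) / (1 + len(permuted_scores))


# Hypothetical scores: one from the real data, nine from permuted data.
observed = 5.0
null_scores = [1.0, 2.0, 3.0, 6.0, 4.0, 2.0, 1.0, 3.0, 2.0]
p_value = empirical_p_value(observed, null_scores)
```

In practice far more than nine permutations would be needed, since the smallest attainable p-value is 1 / (1 + number of permutations).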

15.
Lung cancer continues to represent a major public health concern with high morbidity and mortality worldwide. Early detection of lung cancer is problematic due to a lack of diagnostic markers with high sensitivity and specificity. To determine which proteins are differentially expressed in the serum in lung cancer and to identify the function of such proteins, two-dimensional electrophoresis (2DE) and liquid chromatography mass spectrometry (LC-MS) were used to screen the serum of a rat lung cancer model induced by 4-(methylnitrosoamino)-1-(3-pyridyl)-1-butanone (NNK). A total of 25 protein spots were qualitatively different and 6 were quantitatively different in the serum from rats bearing induced lung cancer when compared with normal controls. Two of the proteins that showed major changes in concentration in sera were identified by LC-MS/MS as Immunoglobulin γ 2A chain C region (heavy chain) and Transferrin.

16.
17.
18.
As a step towards understanding the complex differences between normal cells and cancer cells, we have used suppression subtractive hybridization (SSH) to generate a profile of genes overexpressed in primary colorectal cancer (CRC). From a 35,000-clone SSH-cDNA repertoire, we have screened 400 random clones by reverse Northern blotting, of which 45 clones were scored as overexpressed in tumor compared to matched normal mucosa. Sequencing showed 37 different genes and of these, 16 genes corresponded to known genes in the public databases. Twelve genes, including Smad5 and Fls353, have previously been shown to be overexpressed in CRC. A series of known genes which have not previously been reported to be overexpressed in cancer were also recovered: Hsc70, PBEF, ribophorin II and Ese-3B. The remaining 21 genes have as yet no functional annotation. These results show that SSH in conjunction with high throughput screening provides a very efficient means to produce a broad profile of genes differentially expressed in cancer. Some of the genes identified may provide novel points of therapeutic intervention.

19.
We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression. This is defined as a situation with good phenotype discrimination in the bivariate, but not in the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach is scalable to current microarray dimensions and yields promising results on several cancer-gene-expression datasets.

20.

Background  

The biomedical community is developing new methods of data analysis to more efficiently process the massive data sets produced by microarray experiments. Systematic and global mathematical approaches that can be readily applied to a large number of experimental designs become fundamental to correctly handle the otherwise overwhelming data sets.
