Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Cancer is a complex genetic disease, resulting from defects of multiple genes. Development of microarray techniques makes it possible to survey the whole genome and detect genes that have influential impacts on the progression of cancer. Statistical analysis of cancer microarray data is challenging because of the high dimensionality and clustered nature of gene expressions. Here, clusters are composed of genes with coordinated pathological functions and/or correlated expressions. In this article, we consider cancer studies where a censored survival endpoint is measured along with microarray gene expressions. We propose a hybrid clustering approach, which uses both pathological pathway information retrieved from KEGG and statistical correlations of gene expressions, to construct gene clusters. Cancer survival time is modeled as a linear function of gene expressions. We adopt the clustering threshold gradient directed regularization (CTGDR) method for simultaneous gene cluster selection, within-cluster gene selection, and predictive model building. Analysis of two lymphoma studies shows that the proposed approach, which is composed of the hybrid gene clustering, a linear regression model for survival, and clustering regularized estimation with CTGDR, can effectively identify gene clusters and genes within selected clusters that have satisfactory predictive power for censored cancer survival outcomes.

2.
3.
4.
5.
6.
7.
8.
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
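
The comparison of neighboring genes against randomly selected gene pairs described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pearson` and `neighbor_vs_random`, the empirical p-value, and the number of random pairs are all assumptions made for illustration, and the FDR step mentioned in the abstract is not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation; treat it as 0 here.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def neighbor_vs_random(profiles, n_random=1000, seed=0):
    """Compare correlations of chromosomal neighbors with random gene pairs.

    profiles: list of expression vectors, assumed to be in chromosomal order.
    Returns the correlation of each adjacent pair and an empirical p-value
    for it, estimated from the correlations of randomly drawn pairs.
    """
    rng = random.Random(seed)
    neighbor_r = [
        pearson(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)
    ]
    random_r = []
    for _ in range(n_random):
        # Hypothetical null: any two distinct genes, regardless of position.
        i, j = rng.sample(range(len(profiles)), 2)
        random_r.append(pearson(profiles[i], profiles[j]))
    # Empirical one-sided p-value with add-one smoothing.
    pvals = [
        (1 + sum(r0 >= r for r0 in random_r)) / (1 + n_random)
        for r in neighbor_r
    ]
    return neighbor_r, pvals
```

In a real analysis the resulting p-values would still need a multiple-testing correction before any cluster is called significant, since the abstract reports clusters at an FDR threshold of 0.1.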

9.
10.
Kim HY, Kim MJ, Han JI, Kim BK, Lee YS, Lee YS, Kim JH. Bio Systems, 2009, 95(1): 17-25
A time-series microarray experiment is useful to study the changes in the expression of a large number of genes over time. Many methods for clustering genes using gene expression profiles have been suggested, but it is not easy to interpret the biological significance of the results or utilize these methods for understanding the dynamics of gene regulatory systems. In this study, we introduce an algorithm for readjusting the boundaries of clusters by adopting the advantages of both k-means and singular value decomposition (SVD). In addition, we suggest a methodology for searching the principal genes that can be the most crucial genes in regulation of clusters. We found 34 principal genes from 171 clusters having strong concentratedness in their expression patterns and distinct ranges of oscillatory phases, by using a time-series microarray dataset of mouse embryonic stem (ES) cells after induction of dopaminergic neural differentiation. The biological significance of the principal genes examined in the literature supports the feasibility of our algorithms in that the hierarchy of clusters may lead to the manifestation of the phenotypes, e.g., the development of the nervous system.

11.
12.
SUMMARY: With the availability of whole genome sequences in many species, linkage analysis, positional cloning and microarrays are gradually becoming powerful tools for investigating the links between phenotype and genotype or genes. However, in these methods, causative genes underlying a quantitative trait locus, or a disease, are usually located within a large genomic region or a large set of genes. Examining the function of every gene is very time consuming and requires retrieving and integrating information from multiple databases or genome resources. PGMapper is a software tool for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl database and gene function information from the OMIM and PubMed databases. PGMapper is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species. AVAILABILITY: Available online at http://www.genediscovery.org/pgmapper/index.jsp.

13.
Choi D, Fang Y, Mathers WD. Genomics, 2006, 87(4): 500-508
Deciphering genetic regulatory codes remains a challenge. Here, we present an effective approach to identifying in vivo condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. A resampling-based algorithm was adopted to cluster our microarray data of a stress response, which generated 35 tight clusters with unique expression patterns containing 811 of the 5652 significantly altered genes. Database searches identified many known motifs within the 3-kb regulatory regions of 40 genes from 3 clusters and modules with six to nine motifs that were commonly shared by 60-100% of these genes. The upstream regulatory region contained the highest frequency of these common motifs. CisModule program predictions were comparable with the results from database searches and found four potentially novel motifs. This result indicates that these motifs and modules could be responsible for gene coregulation of the stress response in the lacrimal gland.

14.
MOTIVATION: Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Often interest lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example identification of new tumor taxonomies. Cluster analysis techniques such as hierarchical clustering and self-organizing maps have frequently been used for investigating structure in microarray data. However, clustering algorithms always detect clusters, even on random data, and it is easy to misinterpret the results without some objective measure of the reproducibility of the clusters. RESULTS: We present statistical methods for testing for overall clustering of gene expression profiles, and we define easily interpretable measures of cluster-specific reproducibility that facilitate understanding of the clustering structure. We apply these methods to elucidate structure in cDNA microarray gene expression profiles obtained on melanoma tumors and on prostate specimens.

15.
Testing association of a pathway with survival using gene expression data   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: A recent surge of interest in survival as the primary clinical endpoint of microarray studies has called for an extension of the Global Test methodology to survival. RESULTS: We present a score test for association of the expression profile of one or more groups of genes with a (possibly censored) survival time. Groups of genes may be pathways, areas of the genome, clusters from a cluster analysis or all genes on a chip. The test allows one to test hypotheses about the influence of these groups of genes on survival directly, without the intermediary of single gene testing. The test is based on the Cox proportional hazards model and is calculated using martingale residuals. It is possible to adjust the test for the presence of covariates. We also present a diagnostic graph to assist in the interpretation of the test result, visualizing the influence of genes. The test is applied to a tumor dataset, revealing pathways from the gene ontology database that are associated with survival of patients. AVAILABILITY: The Global Test for survival has been incorporated into the R-package globaltest (version 3.0), available at http://www.bioconductor.org

16.
17.
18.
There are approximately 25 000 species in the division Teleostei and most are believed to have arisen during a relatively short period of time ca. 200 Myr ago. The discovery of 'extra' Hox gene clusters in zebrafish (Danio rerio), medaka (Oryzias latipes), and pufferfish (Fugu rubripes), has led to the hypothesis that genome duplication provided the genetic raw material necessary for the teleost radiation. We identified 27 groups of orthologous genes which included one gene from man, mouse and chicken, one or two genes from tetraploid Xenopus and two genes from zebrafish. A genome duplication in the ancestor of teleost fishes is the most parsimonious explanation for the observations that for 15 of these genes, the two zebrafish orthologues are sister sequences in phylogenies that otherwise match the expected organismal tree, the zebrafish gene pairs appear to have been formed at approximately the same time, and are unlinked. Phylogenies of nine genes differ a little from the tree predicted by the fish-specific genome duplication hypothesis: one tree shows a sister sequence relationship for the zebrafish genes but differs slightly from the expected organismal tree and in eight trees, one zebrafish gene is the sister sequence to a clade which includes the second zebrafish gene and orthologues from Xenopus, chicken, mouse and man. For these nine gene trees, deviations from the predictions of the fish-specific genome duplication hypothesis are poorly supported. The two zebrafish orthologues for each of the three remaining genes are tightly linked and are, therefore, unlikely to have been formed during a genome duplication event. We estimated that the unlinked duplicated zebrafish genes are between 300 and 450 Myr old. Thus, genome duplication could have provided the genetic raw material for teleost radiation. Alternatively, the loss of different duplicates in different populations (i.e. 'divergent resolution') may have promoted speciation in ancient teleost populations.

19.
Physical clusters of co-regulated, but apparently functionally unrelated, genes are present in many genomes. Despite the important implication that the genomic environment contributes appreciably to the regulation of gene expression, no simple statistical method has been described to identify physical clusters of co-regulated genes. Here we report the development of a model that allows the direct calculation of the significance of such clusters. We have implemented the derived statistical relation in a software program, Pyxis, and have analyzed a selection of Saccharomyces cerevisiae gene expression microarray data sets. We have identified many gene clusters where constituent genes exhibited a regulatory dependence on proteins previously implicated in chromatin structure. Specifically, we found that Tup1p-dependent gene domains were enriched close to telomeres, which suggested a new role for Tup1p in telomere silencing. In addition, we identified Sir2p-, Sir3p- and Sir4p-dependent clusters, which suggested the presence of Sir-mediated heterochromatin in previously unidentified regions of the yeast genome. We also showed the presence of Sir4p-dependent gene clusters bordering the HMRa heterothallic locus, which suggested leaky termination of the heterochromatin by the boundary elements. These results demonstrate the utility of Pyxis in identifying possible higher order genomic features that may contribute to gene regulation in extended domains.
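
The abstract does not state the statistical relation implemented in Pyxis, so the following is only an illustrative stand-in for the general idea of scoring a physical cluster. It assumes, hypothetically, that each gene is independently co-regulated with probability `p` (the genome-wide fraction of affected genes) and asks how likely it is that a window of `n` adjacent genes contains at least `k` affected genes. The function name `binomial_tail` and this binomial null model are assumptions, not the published method.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of adjacent genes in the window (hypothetical parameter)
    k: number of those genes that are co-regulated
    p: assumed genome-wide fraction of co-regulated genes
    """
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )
```

A small tail probability for one window is only suggestive: scanning every window along a chromosome tests many overlapping hypotheses, so a genome-wide analysis would need to correct for that before calling a cluster significant.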

20.