Similar articles
Found 20 similar articles; search took 31 ms
1.

Background  

A typical step in the analysis of gene expression data is the determination of clusters of genes that exhibit similar expression patterns. Researchers are confronted with the seemingly arbitrary choice between numerous algorithms to perform cluster analysis.

2.

Background  

Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.
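
As a rough illustration of bootstrap-based stability assessment, the sketch below resamples the conditions (columns) of an expression matrix with replacement, re-clusters the genes each time, and records how often each pair of genes lands in the same cluster. The abstract does not specify a linkage rule or a resampling unit; single linkage, resampling over conditions, and the function names `single_linkage` and `bootstrap_coclustering` are all assumptions made here for illustration.

```python
import math
import random


def single_linkage(points, k):
    """Agglomerative single-linkage clustering cut at k clusters; returns labels."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_coclustering(genes, k, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each gene pair shares a cluster.

    Assumption: each replicate resamples the conditions (columns) with replacement.
    """
    rng = random.Random(seed)
    n_genes, n_cond = len(genes), len(genes[0])
    together = [[0] * n_genes for _ in range(n_genes)]
    for _ in range(n_boot):
        cols = [rng.randrange(n_cond) for _ in range(n_cond)]
        resampled = [[g[c] for c in cols] for g in genes]
        labels = single_linkage(resampled, k)
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_boot for count in row] for row in together]
```

Pairs whose co-clustering frequency stays near 1 across replicates belong to clusters that are robust to the resampling; frequencies near 0.5 suggest a grouping that could be an artifact of the particular conditions measured.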

3.

Background  

Genes responsible for biosynthesis of fungal secondary metabolites are usually tightly clustered in the genome and co-regulated with metabolite production. Epipolythiodioxopiperazines (ETPs) are a class of secondary metabolite toxins produced by disparate ascomycete fungi and implicated in several animal and plant diseases. Gene clusters responsible for their production have previously been defined in only two fungi. Fungal genome sequence data have been surveyed for the presence of putative ETP clusters and cluster data have been generated from several fungal taxa where genome sequences are not available. Phylogenetic analysis of cluster genes has been used to investigate the assembly and heredity of these gene clusters.

4.

Background  

Gene order in eukaryotic genomes is not random, with genes with similar expression profiles tending to cluster. In yeasts, the model taxon for gene order analysis, such syntenic clusters of non-homologous genes tend to be conserved over evolutionary time. Whether similar clusters show gene order conservation in other lineages is, however, undecided. Here, we examine this issue in Drosophila melanogaster using high-resolution chromosome rearrangement data.

5.

Background  

Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.

6.

Background

A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
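
The contrast between the two penalties used in the two steps can be seen in their proximal (shrinkage) operators, sketched below. This is only an illustration of the standard Lasso and group Lasso shrinkage rules, not the authors' implementation; the function names are hypothetical, and the unweighted group penalty is an assumption (group Lasso formulations often scale the penalty by the square root of the group size).

```python
import math


def soft_threshold(coefs, lam):
    """Lasso shrinkage: each coefficient is pulled toward zero independently,
    so individual genes can be dropped while their neighbors are kept."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in coefs]


def group_soft_threshold(coefs, lam):
    """Group Lasso shrinkage: the coefficients of one gene cluster are scaled
    together, so the whole cluster is either kept or set to zero."""
    norm = math.sqrt(sum(b * b for b in coefs))
    if norm <= lam:
        return [0.0] * len(coefs)
    scale = 1.0 - lam / norm
    return [scale * b for b in coefs]
```

Applied to a cluster with coefficients `[3.0, -0.5, 1.0]` and `lam = 1.0`, `soft_threshold` zeroes the two weaker genes individually, whereas `group_soft_threshold` keeps all three (shrunken) because the cluster as a whole has a norm above the threshold. This mirrors the division of labor described above: gene-level selection within clusters first, then cluster-level selection.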

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

7.

Background  

Interpretation of comprehensive DNA microarray data sets is a challenging task for biologists and process engineers, for whom assistance from statistics and bioinformatics is essential. Interdisciplinary cooperation and concerted development of software tools for simplified and accelerated data analysis and interpretation are key to overcoming the bottleneck in data-analysis workflows. This approach is exemplified by gcExplorer, an interactive visualization toolbox based on cluster analysis. Clustering is an important tool in gene expression data analysis for finding groups of co-expressed genes, which can in turn suggest functional pathways and interactions between genes. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results.

8.
9.

Background  

Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene exactly to one cluster, are poorly suited to the analysis of microarray datasets because in such datasets the clusters of genes frequently overlap.
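
To make the contrast with hard clustering concrete, the sketch below computes soft membership degrees for a single expression profile using the fuzzy c-means membership formula. The abstract does not name a particular soft-clustering method, so the choice of fuzzy c-means, the fuzzifier default `m = 2.0`, and the function name `fuzzy_memberships` are assumptions made here for illustration.

```python
import math


def fuzzy_memberships(profile, centroids, m=2.0):
    """Return fuzzy c-means membership degrees of one expression profile.

    The degrees are non-negative and sum to 1; m > 1 controls how soft
    the assignment is (values near 1 approach a hard assignment).
    """
    dists = [math.dist(profile, c) for c in centroids]
    # A profile that coincides with a centroid belongs fully to it.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]
```

A gene whose profile lies midway between two centroids receives a membership of 0.5 in each cluster, which a hard method would have to resolve arbitrarily; a gene close to one centroid receives a membership near 1 for that cluster.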

10.
11.

Background  

Ever since the theory about two rounds of genome duplication (2R) in the vertebrate lineage was proposed, the Hox gene clusters have served as the prime example of quadruplicate paralogy in mammalian genomes. In teleost fishes, the observation of additional Hox clusters absent in other vertebrate lineages suggested a third tetraploidization (3R). Because the Hox clusters occupy a quite limited part of each chromosome, and are special in having position-dependent regulation within the multi-gene cluster, studies of syntenic gene families are needed to determine the extent of the duplicated chromosome segments. We have analyzed in detail 14 gene families that are syntenic with the Hox clusters to see if their phylogenies are compatible with the Hox duplications and the 2R/3R scenario. Our starting point was the gene family for the NPY family of peptides located near the Hox clusters in the pufferfish Takifugu rubripes, the zebrafish Danio rerio, and human.

12.

Background  

Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, the functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation.

13.

Background  

The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters.

14.

Background  

Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated.

15.

Background

Hox and ParaHox gene clusters are thought to have resulted from the duplication of a ProtoHox gene cluster early in metazoan evolution. However, the origin and evolution of the other genes belonging to the extended Hox group of homeobox-containing genes, that is, Mox and Evx, remain obscure. We constructed phylogenetic trees with mouse, amphioxus and Drosophila extended Hox and other related Antennapedia-type homeobox gene sequences and analyzed the linkage data available for such genes.

Results

We claim that neither Mox nor Evx is a Hox or ParaHox gene. We propose a scenario that reconciles phylogeny with linkage data, in which an Evx/Mox ancestor gene linked to a ProtoHox cluster was involved in a segmental tandem duplication event that generated an array of all Hox-like genes, referred to as the 'coupled' cluster. A chromosomal breakage within this cluster explains the current composition of the extended Hox cluster (with Evx, Hox and Mox genes) and the ParaHox cluster.

Conclusions

Most studies dealing with the origin and evolution of Hox and ParaHox clusters have not included the Hox-related genes Mox and Evx. Our phylogenetic analyses and the available linkage data in mammalian genomes support an evolutionary scenario in which an ancestor of Evx and Mox was linked to the ProtoHox cluster, and in which a tandem duplication of a large genomic region early in metazoan evolution generated the Hox and ParaHox clusters, plus the cluster-neighbors Evx and Mox. The large 'coupled' Hox-like cluster EvxHox/MoxParaHox was subsequently broken, thus grouping the Mox and Evx genes with the Hox clusters and isolating the ParaHox cluster.

16.

Background  

Clustering techniques are routinely used in gene expression data analysis to organize massive data sets. They arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learning the functions of uncharacterized genes through their association with known genes, clustering of assays reveals disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. An incorrect specification of the number of clusters generally leads either to failure to detect novel clusters (disease subtypes) or to unnecessary splitting of natural clusters.

17.

Background

The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis).

Methodology/Principal Findings

We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes) and mammalian (54–59 genes) clusters. The anole protocadherin genes are organized into four subclusters: the δ, α, β and γ. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the δ subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the γb subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion.

Conclusions/Significance

Our results suggest that, similar to the protocadherin clusters in other vertebrates, the evolution of the anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin δ subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles. We present a model for the evolutionary history of the protocadherin cluster in tetrapods.

18.
19.

Background  

Functional gene modules and protein complexes are being sought from combinations of gene expression and protein-protein interaction data with various clustering-type methods. Central features missing from most of these methods are the handling of uncertainty in both protein interaction and gene expression measurements and, in particular, the capability to model overlapping clusters. It is reasonable to assume that proteins may play different roles in different functional modules and that these roles are evidenced in their interactions.

20.

Background  

Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the α- and β-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil β-globin gene (ω) in the marsupial α-cluster, however, suggested that duplication of the α-β cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous α- and β-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation.
