Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
3.
4.
Supervised clustering of genes   Cited by: 1 (self-citations: 0, citations by others: 1)
Dettling M, Bühlmann P. Genome Biology, 2002, 3(12): research0069.1-research0069.15

Background  

We focus on microarray data where experiments monitor gene expression in different tissues and where each experiment is equipped with an additional response variable such as a cancer type. Although the number of measured genes is in the thousands, it is assumed that only a few marker components of gene subsets determine the type of a tissue. Here we present a new method for finding such groups of genes by directly incorporating the response variables into the grouping process, yielding a supervised clustering algorithm for genes.

5.

Background  

Inference of protein interaction networks from various sources of data has become an important topic in both systems and computational biology. Here we present a supervised approach to the identification of gene expression regulatory networks.

6.
Recent advances in DNA sequencing technology have allowed the collection of high-dimensional data from human-associated microbial communities on an unprecedented scale. A major goal of these studies is the identification of important groups of microorganisms that vary according to physiological or disease states in the host, but the incidence of rare taxa and the large numbers of taxa observed make that goal difficult to achieve using traditional approaches. Fortunately, similar problems have been addressed by the machine learning community in other fields of study such as microarray analysis and text classification. In this review, we demonstrate that several existing supervised classifiers can be applied effectively to microbiota classification, both for selecting subsets of taxa that are highly discriminative of the type of community, and for building models that can accurately classify unlabeled data. To encourage the development of new approaches to supervised classification of microbiota, we discuss several structures inherent in microbial community data that may be available for exploitation in novel approaches, and we include as supplemental information several benchmark classification tasks for use by the community.

7.
8.
Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high-accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without in-depth knowledge of the underlying parameters. The adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near-optimal performance for most cases, not including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently achieved the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to its default parameter configuration.

9.
10.
11.
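Supervised clustering analysis of gene chip (microarray) data   Cited by: 1 (self-citations: 0, citations by others: 1)
With the arrival of the post-genomic era, gene chip (microarray) technology is increasingly being applied to functional genomics research. How to analyze the large volume of biological data produced by microarray experiments quickly and effectively has become an important research problem. Supervised clustering analysis is a form of cluster analysis that assigns samples to classes according to prior information or assumptions about the samples, builds a discriminant model on that basis, and then uses the model to classify unknown objects. The method has been applied successfully in many areas of biomedical research and has become an important tool for analyzing microarray data.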

12.
13.
Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

14.
Proteins that have homology with dehydrins have been identified immunologically in the desiccation-sensitive (recalcitrant) seeds of English oak (Quercus robur L.), European chestnut (Castanea sativa L.), horse chestnut (Aesculus hippocastanum L.), sycamore (Acer pseudoplatanus L.) and silver maple (Acer saccharinum L.), and in the desiccation-tolerant seeds of Norway maple (Acer platanoides L.). The mRNA for a late embryogenesis abundant (LEA) protein (dehydrin) was also detected by Northern blotting, using a cDNA clone (D11) from cotton embryos, in the recalcitrant and orthodox seeds. Medium-stringency washing was required to detect this hybridization. In Q. robur the amount of dehydrin protein increased during seed development, and LEA mRNA was induced by limited desiccation and by abscisic acid. Confirmation of the presence of dehydrin mRNA in mature Q. robur and C. sativa seeds was obtained by in-vitro translation of the extracted polyadenylated RNA followed by analysis of the immunoprecipitation products. Thus the presence of dehydrin proteins is not sufficient to confer desiccation tolerance on truly recalcitrant seeds, nor can their presence or absence be used as clear criteria for identification of recalcitrant seeds.

15.
Supervised reconstruction of biological networks with local models   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Inference and reconstruction of biological networks from heterogeneous data is currently an active research subject with several important applications in systems biology. The problem has been attacked from many different points of view with varying degrees of success. In particular, predicting new edges with a reasonable false discovery rate is highly demanded for practical applications, but remains extremely challenging due to the sparsity of the networks of interest. RESULTS: While most previous approaches based on the partial knowledge of the network to be inferred build global models to predict new edges over the network, we introduce here a novel method which predicts whether there is an edge from a newly added vertex to each of the vertices of a known network using local models. This involves learning individually a certain subnetwork associated with each vertex of the known network, then using the discovered classification rule associated with only that vertex to predict the edge to the new vertex. Excellent experimental results are shown in the case of metabolic and protein-protein interaction network reconstruction from a variety of genomic data. AVAILABILITY: An implementation of the proposed algorithm is available upon request from the authors.
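
The local-model idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so a nearest-centroid rule stands in for it, and the function name `predict_edges_to_new_vertex`, the feature vectors and the toy network are all hypothetical.

```python
def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_edges_to_new_vertex(features, edges, new_features):
    """Train one local model per known vertex and apply it to a new vertex.

    features: dict mapping each known vertex to its feature vector.
    edges: set of frozenset({u, v}) pairs for the known network.
    new_features: feature vector of the newly added vertex.
    Returns a dict mapping each known vertex to True (edge predicted) or False.
    """
    predictions = {}
    for v in features:
        others = [u for u in features if u != v]
        # Local training set for v: the other known vertices, labelled by
        # whether they are connected to v in the known network.
        pos = [features[u] for u in others if frozenset((u, v)) in edges]
        neg = [features[u] for u in others if frozenset((u, v)) not in edges]
        if not pos or not neg:
            # Assumption: with a single class observed, predict that class.
            predictions[v] = bool(pos)
            continue
        # Stand-in classifier (nearest centroid), used only for illustration.
        predictions[v] = (sq_dist(new_features, centroid(pos))
                          < sq_dist(new_features, centroid(neg)))
    return predictions


# Hypothetical toy network: a-b and c-d are the known edges.
features = {"a": [0, 0], "b": [0, 1], "c": [5, 5], "d": [5, 6]}
edges = {frozenset(("a", "b")), frozenset(("c", "d"))}
result = predict_edges_to_new_vertex(features, edges, [0.2, 0.5])
```

In this toy case the new vertex sits near `a` and `b`, so the local models for those two vertices predict an edge and the models for `c` and `d` do not. Each model only ever sees labels relative to its own vertex, which is the contrast with a single global model that the abstract draws.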

16.
Liu L, Yu L. Systematic Biology, 2011, 60(5): 661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.
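
The distance matrix described in this abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: each gene tree is given as an adjacency dict with one leaf per species, "internodes" is read as the number of internal nodes on the path between two leaves (edge count minus one), and a pair of species is averaged only over the gene trees that contain both. The function names are hypothetical, and the neighbor-joining step itself is not shown.

```python
from collections import deque


def leaf_internode_distances(tree):
    """Internode distance for every leaf pair of one unrooted gene tree.

    tree: dict mapping each node to the list of its neighbours;
    leaves are the nodes with exactly one neighbour.
    """
    leaves = sorted(n for n, nbrs in tree.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        # Breadth-first search gives the edge count from src to every node.
        edges_from_src = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in tree[node]:
                if nxt not in edges_from_src:
                    edges_from_src[nxt] = edges_from_src[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if src < dst:
                # Assumption: internodes = internal nodes on the path.
                dist[(src, dst)] = edges_from_src[dst] - 1
    return dist


def average_internode_distance(gene_trees):
    """Average each pair's internode distance over the trees containing it."""
    totals, counts = {}, {}
    for tree in gene_trees:
        for pair, d in leaf_internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two hypothetical quartet gene trees: ((A,B),(C,D)) and ((A,C),(B,D)).
tree_1 = {"A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
          "u": ["A", "B", "v"], "v": ["C", "D", "u"]}
tree_2 = {"A": ["u"], "C": ["u"], "B": ["v"], "D": ["v"],
          "u": ["A", "C", "v"], "v": ["B", "D", "u"]}
avg = average_internode_distance([tree_1, tree_2])
```

If path length is counted in edges instead of internal nodes, every entry shifts by the same constant. The resulting dictionary would then be passed to any standard neighbor-joining routine to obtain the unrooted species tree.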

17.
Relationships between gene trees and species trees   Cited by: 39 (self-citations: 10, citations by others: 39)
It is well known that a phylogenetic tree (gene tree) constructed from DNA sequences for a genetic locus does not necessarily agree with the tree that represents the actual evolutionary pathway of the species involved (species tree). One of the important factors that cause this difference is genetic polymorphism in the ancestral species. Under the assumption of neutral mutations, this problem can be studied by evaluating the probability (P) that a gene tree has the same topology as that of the species tree. When one gene (allele) is used from each of the species involved, the probability can be expressed as a simple function of T_i = t_i/(2N), where t_i is the evolutionary time measured in generations for the ith internodal branch of the species tree and N is the effective population size. When any of the T_i's is less than 1, the probability P becomes considerably less than 1.0. This probability cannot be substantially increased by increasing the number of alleles sampled from a locus. To increase the probability, one has to use DNA sequences from many different loci that have evolved independently of each other.
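
The abstract does not state the function itself. As an illustration, the sketch below uses the standard coalescent result for the simplest case of three species with one allele sampled from each, P = 1 - (2/3)exp(-T) with T = t/(2N), where t is the length in generations of the single internal branch. Treating this as the "simple function" the abstract refers to is an assumption, and the function name is hypothetical.

```python
import math


def prob_gene_tree_matches_species_tree(t_generations, effective_pop_size):
    """Three-species case: probability that the gene tree topology
    agrees with the species tree, P = 1 - (2/3) * exp(-T), T = t / (2N)."""
    scaled_time = t_generations / (2 * effective_pop_size)
    return 1 - (2 / 3) * math.exp(-scaled_time)


# Scan a few scaled branch lengths T with a hypothetical N = 1000.
for t in (0, 1000, 2000, 10000):
    p = prob_gene_tree_matches_species_tree(t, 1000)
    print(f"T = {t / 2000:.1f}  P = {p:.3f}")
```

At T = 0 the three possible topologies are equally likely, so P is 1/3. At T = 1 the value is still only about 0.75, which matches the abstract's remark that P falls considerably below 1.0 once T_i drops under 1.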

18.
19.
20.