Found 20 similar articles; search took 15 ms
1.
2.
Badri Padhukasahasram Chandan K. Reddy Albert M. Levin Esteban G. Burchard L. Keoki Williams 《PloS one》2015,10(11)
Multi-marker approaches have received a lot of attention recently in genome-wide association studies and can enhance power to detect new associations under certain conditions. Gene-, gene-set- and pathway-based association tests are increasingly being viewed as useful supplements to the more widely used single-marker association analysis, which has successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not look at the joint effects of multiple genetic variants which individually may have weak or moderate signals. Here, we describe novel tests for multi-marker association analyses that are based on phenotype predictions obtained from machine learning algorithms. Instead of assuming a linear or logistic regression model, we propose the use of ensembles of diverse machine learning algorithms for prediction. We show that phenotype predictions obtained from ensemble learning algorithms provide a new framework for multi-marker association analysis. They can be used for constructing tests for the joint association of multiple variants, adjusting for covariates and testing for the presence of interactions. To demonstrate the power and utility of this new approach, we first apply our method to simulated SNP datasets. We show that the proposed method has the correct Type-1 error rates and can be considerably more powerful than alternative approaches in some situations. Then, we apply our method to previously studied asthma-related genes in 2 independent asthma cohorts to conduct association tests.
3.
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high quality knowledge for the ensemble classifier and improve classification performance.
4.
5.
6.
The inference of gene regulatory networks (GRNs) from gene expression data is an unsolved problem of great importance. This inference has been stated, though not proven, to be underdetermined, implying that there could be many equivalent (indistinguishable) solutions. Motivated by this fundamental limitation, we have developed a new framework and algorithm, called TRaCE, for the ensemble inference of GRNs. The ensemble corresponds to the inherent uncertainty associated with discriminating direct and indirect gene regulations from steady-state data of gene knock-out (KO) experiments. We applied TRaCE to analyze the inferability of random GRNs and the GRNs of E. coli and yeast from single- and double-gene KO experiments. The results showed that, with the exception of networks with very few edges, GRNs are typically not inferable even when the data are ideal (unbiased and noise-free). Finally, we compared the performance of TRaCE with the top-performing methods of the DREAM4 in silico network inference challenge.
7.
About 80% of Rhizobium meliloti strains contain 1 to 11 copies of insertion sequence ISRm1 in their genomes (R. Wheatcroft and R. J. Watson, J. Gen. Microbiol. 134:113-121, 1988). Hybridization to separated genomic DNA fragments with an ISRm1-specific probe produces patterns of hybridization bands which are distinctive for each strain. These patterns can be compared between strains to prove or disprove common identity. In most cases relatedness can be inferred despite phenotypic differences or minor genomic alterations.
8.
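Parkinson's disease and Alzheimer's disease are the most prevalent neurodegenerative diseases worldwide. Conventional drug and surgical treatments can only relieve symptoms and cannot delay or halt disease progression. In recent years, advances in molecular biology and medical research have deepened the understanding of the pathogenesis of Parkinson's disease and Alzheimer's disease, providing a theoretical and experimental basis for gene therapy strategies. This article reviews current progress in gene therapy research for Parkinson's disease and Alzheimer's disease. As an entirely new treatment approach for both diseases, gene therapy is undoubtedly of great significance for understanding their etiology and for their comprehensive treatment.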
9.
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
10.
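At present, functional annotation of genes using computational and mathematical methods has become both a hot topic and a challenge, and machine learning methods are the most widely applied. Bioinformaticians continue to propose effective, fast, and accurate machine learning methods for gene function annotation, which has greatly advanced biomedicine. This article reviews the application and progress of machine learning methods in gene function annotation. It mainly introduces several commonly used methods, including support vector machines, the k-nearest neighbor algorithm, decision trees, random forests, neural networks, Markov random fields, logistic regression, clustering algorithms, and Bayesian classifiers, and discusses how to select data sources, how to improve algorithms, and how to improve predictive performance when applying machine learning methods to gene function annotation.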
11.
Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, high noise of experimental measurements, and insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that the low-rank regularization improves the accuracy of gene expression prediction in these three datasets.
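As a hedged illustration of the linear low-rank formulation described in this abstract, one common way to write such a problem is shown below. The abstract does not give its objective, so this is an assumption: the nuclear norm is used here as the standard convex surrogate for matrix rank, and the symbols X (regulator expression, n × p), Y (target expression, n × q), W (connectivity matrix) and λ (regularization weight) are notation chosen for this sketch.

```latex
\min_{W \in \mathbb{R}^{p \times q}} \;
  \tfrac{1}{2}\,\lVert Y - XW \rVert_F^2 \;+\; \lambda\,\lVert W \rVert_* ,
\qquad
\lVert W \rVert_* \;=\; \sum_{i} \sigma_i(W)
```

Here σ_i(W) are the singular values of W. A larger λ pushes more singular values to zero, which corresponds to fewer independent connectivity patterns between regulators and targets.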
12.
One important method to obtain the continuous surfaces of soil properties from point samples is spatial interpolation. In this paper, we propose a method that combines ensemble learning with ancillary environmental information for improved interpolation of soil properties (hereafter, EL-SP). First, we calculated the trend value for soil potassium contents at the Qinghai Lake region in China based on measured values. Then, based on soil types, geology types, land use types, and slope data, the remaining residual was simulated with the ensemble learning model. Next, the EL-SP method was applied to interpolate soil potassium contents at the study site. To evaluate the utility of the EL-SP method, we compared its performance with other interpolation methods including universal kriging, inverse distance weighting, ordinary kriging, and ordinary kriging combined with geographic information. Results show that EL-SP had a lower mean absolute error and root mean square error than the other models tested in this paper. Notably, the EL-SP maps can describe more locally detailed information and more accurate spatial patterns for soil potassium content than the other methods because of the combined use of different types of environmental information; these maps are capable of showing abrupt boundary information for soil potassium content. Furthermore, the EL-SP method not only reduces prediction errors, but it also complements other environmental information, which makes the spatial interpolation of soil potassium content more reasonable and useful.
13.
14.
15.
Suyu Mei 《PloS one》2013,8(11)
Reconstruction of host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, the current experimentally derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From the point of view of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems for host-pathogen protein interaction network reconstruction. In this work, we are motivated to address the three concerns and propose a probability weighted ensemble transfer learning model for HIV-human protein interaction prediction (PWEN-TLM), where support vector machine (SVM) is adopted as the individual classifier of the ensemble model. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we further validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. Thus the model is more robust against data unavailability with less demanding data constraint. As regards negative data construction, experiments show that exclusiveness of subcellular co-localized proteins is unbiased and more reliable than random sampling. Finally, we conduct analysis of overlapped predictions between our model and the existing models, and apply the model to novel host-pathogen PPIs recognition for further biological research.
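The idea of weighting each classifier's probability output by its ROC-AUC can be sketched as follows. This is a minimal illustration only, not the PWEN-TLM implementation: the abstract does not state the exact weighting formula, so a plain AUC-weighted average is assumed, and the function names and toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted_probability(probabilities, aucs):
    """Average per-classifier probabilities for one example, weighted by AUC.

    Assumption: weights are the raw AUC values normalised to sum to 1.
    """
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC weights must sum to a positive value")
    return sum(p * a for p, a in zip(probabilities, aucs)) / total


# Hypothetical validation scores from two individual classifiers.
val_labels = [1, 1, 0, 0]
val_scores_a = [0.9, 0.8, 0.3, 0.1]
val_scores_b = [0.9, 0.2, 0.8, 0.1]
aucs = [roc_auc(val_labels, val_scores_a), roc_auc(val_labels, val_scores_b)]

# Combine the two classifiers' probabilities for one new protein pair.
combined = auc_weighted_probability([0.8, 0.4], aucs)
print(aucs, combined)
```

With this scheme a classifier whose validation AUC is higher pulls the combined probability closer to its own output; other normalisations (for example, using AUC minus 0.5) would be equally plausible readings of the abstract.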
16.
17.
《Endocrine practice》2016,22(5):567-574
Objective: Cushing disease (CD) causes a wide variety of nonspecific symptoms, which may result in delayed diagnosis. It may be possible to uncover unusual combinations of otherwise common symptoms using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. Our aim was to identify and evaluate dyads of clinical symptoms or conditions associated with CD. Methods: We conducted a matched case-control study using a commercial healthcare insurance claims database designed to compare the relative risk (RR) of individual conditions and dyad combinations of conditions among patients with CD versus matched non-CD controls. Results: With expert endocrinologist input, we isolated 10 key conditions (localized adiposity, hirsutism, facial plethora, polycystic ovary syndrome, abnormal weight gain, hypokalemia, deep venous thrombosis, muscle weakness, female balding, osteoporosis) with RRs varying from 5.3 for osteoporosis to 61.0 for hirsutism (and infinite RR for localized adiposity). The RRs of dyads of these conditions ranged from 4.1 for psychiatric disorders/serious infections to 128.0 for hirsutism/fatigue in patients with versus without CD. Construction of uncommon dyads resulted in further increases in RRs beyond single condition analyses; for example, osteoporosis alone had an RR of 5.3, which increased to 8.3 with serious infections and to 52.0 with obesity. Conclusion: This study demonstrated that RR of any one of 10 key conditions selected by expert opinion was ≥5 times greater in CD compared to non-CD, and nearly all dyads had RR ≥5. An uncommon dyad of osteoporosis and obesity had an RR of 52.0. If clinicians consider the diagnosis of CD when the highest-risk conditions are seen, identification of this rare disease may improve.
Abbreviations: CD = Cushing disease; CPT = Current Procedural Terminology; CS = Cushing syndrome; EMR = electronic medical record; ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; ID = identification; RR = relative risk
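The relative-risk comparison for single conditions and dyads can be sketched as below. This is an illustrative calculation under stated assumptions, not the study's analysis code: RR is taken as the proportion of CD patients with a condition divided by the proportion of controls with it, matching is ignored, and all patient records and counts are hypothetical.

```python
import math


def relative_risk(cases_with, cases_total, controls_with, controls_total):
    """Proportion with the condition among cases divided by that among controls.

    Returns math.inf when no control has the condition but some case does,
    which is one way an "infinite RR" can arise.
    """
    if cases_total <= 0 or controls_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_cases = cases_with / cases_total
    risk_controls = controls_with / controls_total
    if risk_controls == 0:
        if risk_cases == 0:
            raise ValueError("RR is undefined when neither group has the condition")
        return math.inf
    return risk_cases / risk_controls


def dyad_count(patients, cond_a, cond_b):
    """Count patients whose condition set contains both conditions of the dyad."""
    return sum(1 for conds in patients if cond_a in conds and cond_b in conds)


# Hypothetical records: one set of coded conditions per patient.
cases = [{"osteoporosis", "obesity"}, {"osteoporosis"}, {"obesity"}, set()]
controls = [{"osteoporosis", "obesity"}, set(), set(), set(),
            {"obesity"}, set(), set(), set()]

dyad_rr = relative_risk(
    dyad_count(cases, "osteoporosis", "obesity"), len(cases),
    dyad_count(controls, "osteoporosis", "obesity"), len(controls),
)
no_control_rr = relative_risk(3, 10, 0, 10)
print(dyad_rr, no_control_rr)
```

A dyad is rarer than either condition alone in the control group, so its denominator shrinks faster than its numerator when the two conditions co-occur preferentially in cases; that is the mechanism by which dyad RRs can exceed single-condition RRs.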
18.
Gene Silencing-Based Disease Resistance (cited 4 times in total: 0 self-citations, 4 citations by others)
Wassenegger M 《Transgenic research》2002,11(6):639-653
19.
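Vegetative insecticidal proteins (Vips) are a novel class of insecticidal proteins produced by Bacillus thuringiensis during the vegetative growth stage. They represent the second generation of transgenic insecticidal proteins and can, to some extent, overcome the problem that many pests have low or no sensitivity to δ-endotoxins. However, compared with the thoroughly studied δ-endotoxins, there are still few reports on the structure-function relationship of Vip proteins. In this study, the molecular evolutionary mechanism of Vip proteins was evaluated using a maximum-likelihood method and a maximum-parsimony-based sliding-window analysis. The results show that Vip proteins underwent positive selection during evolution, and 16 positively selected amino acid residues were identified using a Bayesian method. Interestingly, all of these positively selected residues are located in the C-terminal region of Vip proteins, from position 705 to 809. When these residues were mapped onto the secondary and tertiary structures, the great majority were found to be exposed on the surface of the Vip protein structure and clustered in loop regions. We infer that the molecular evolution of Vip proteins was driven by positive selection pressure rather than by relaxation of functional constraint. The potential positive selection pressure leading to diversity at the Vip C terminus may reflect competition with target insects for an advantage, or expansion of the insecticidal range of Vip proteins. The positively selected residues identified here are likely related to the insect host range and may therefore provide targets for future studies of Vip protein structure and function.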
20.
Kiyokazu Kakugawa Takuwa Yasuda Ikuo Miura Ayako Kobayashi Hitomi Fukiage Rumi Satoh Masashi Matsuda Haruhiko Koseki Shigeharu Wakana Hiroshi Kawamoto Hisahiro Yoshida 《Molecular and cellular biology》2009,29(18):5128-5135
A critical step during intrathymic T-cell development is the transition of CD4+ CD8+ double-positive (DP) cells to the major histocompatibility complex class I (MHC-I)-restricted CD4− CD8+ and MHC-II-restricted CD4+ CD8− single-positive (SP) cell stage. Here, we identify a novel gene that is essential for this process. Through the T-cell phenotype-based screening of N-ethyl-N-nitrosourea (ENU)-induced mutant mice, we established a mouse line in which numbers of CD4 and CD8 SP thymocytes as well as peripheral CD4 and CD8 T cells were dramatically reduced. Using linkage analysis and DNA sequencing, we identified a missense point mutation in a gene, E430004N04Rik (also known as themis), that does not belong to any known gene family. This orphan gene is expressed specifically in DP and SP thymocytes and peripheral T cells, whereas in mutant thymocytes the levels of protein encoded by this gene were drastically reduced. We generated E430004N04Rik-deficient mice, and their phenotype was virtually identical to that of the ENU mutant mice, thereby confirming that this gene is essential for the development of SP thymocytes.
The differentiation step from the double-positive (DP) to single-positive (SP) thymocyte stage is critically regulated by signals originating from the T-cell receptor α/β (TCRα/β) expressed on their surface (3, 5, 16, 17). By using reverse genetic approaches by knocking out or overexpressing various genes that are expected to be involved in TCR signaling, including its ligand major histocompatibility complex molecules and coreceptors CD4 and CD8, the roles of these genes in T-cell development have been investigated intensively (11, 12). However, to identify totally unknown mechanisms in T-cell development, the forward genetic approach is required.
N-ethyl-N-nitrosourea (ENU) is a potent mutagen that randomly induces point mutations throughout the genome in a dose-dependent manner, and ENU mutagenesis has been a representative forward genetic strategy (4, 15). We have been screening phenotypes of ENU-mutagenized mice, focusing on defects in T-cell development.