Similar literature
 Found 20 similar articles (search time: 17 ms)
1.
Multi-marker approaches have received a lot of attention recently in genome-wide association studies and can enhance power to detect new associations under certain conditions. Gene-, gene-set- and pathway-based association tests are increasingly being viewed as useful supplements to the more widely used single-marker association analysis, which has successfully uncovered numerous disease variants. A major drawback of single-marker methods is that they do not look at the joint effects of multiple genetic variants that individually may have weak or moderate signals. Here, we describe novel tests for multi-marker association analyses that are based on phenotype predictions obtained from machine learning algorithms. Instead of assuming a linear or logistic regression model, we propose the use of ensembles of diverse machine learning algorithms for prediction. We show that phenotype predictions obtained from ensemble learning algorithms provide a new framework for multi-marker association analysis. They can be used for constructing tests for the joint association of multiple variants, adjusting for covariates and testing for the presence of interactions. To demonstrate the power and utility of this new approach, we first apply our method to simulated SNP datasets. We show that the proposed method has the correct Type-1 error rates and can be considerably more powerful than alternative approaches in some situations. Then, we apply our method to previously studied asthma-related genes in two independent asthma cohorts to conduct association tests.
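One generic way to turn phenotype predictions into an association test is a permutation test on the agreement between predicted and observed phenotypes. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the predictions were produced out of sample (for example by cross-validation), uses Pearson correlation as the test statistic, and the function names `pearson` and `permutation_pvalue` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(predicted, observed, n_perm=999, seed=0):
    """One-sided permutation p-value for association between predictions and phenotype.

    Assumption: `predicted` holds out-of-sample predictions, so that under the
    null hypothesis the phenotype labels are exchangeable.
    """
    rng = random.Random(seed)
    stat = pearson(predicted, observed)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between predictions and phenotype
        if pearson(predicted, shuffled) >= stat:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the set of variants used to build the predictions carries information about the phenotype; the choice of statistic and the number of permutations are free parameters of this sketch.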

2.
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for the ensemble classifier and improve classification performance.

3.
4.
5.
The inference of gene regulatory networks (GRNs) from gene expression data is an unsolved problem of great importance. This inference has been stated, though not proven, to be underdetermined, implying that there could be many equivalent (indistinguishable) solutions. Motivated by this fundamental limitation, we have developed a new framework and algorithm, called TRaCE, for the ensemble inference of GRNs. The ensemble corresponds to the inherent uncertainty associated with discriminating direct and indirect gene regulations from steady-state data of gene knock-out (KO) experiments. We applied TRaCE to analyze the inferability of random GRNs and the GRNs of E. coli and yeast from single- and double-gene KO experiments. The results showed that, with the exception of networks with very few edges, GRNs are typically not inferable even when the data are ideal (unbiased and noise-free). Finally, we compared the performance of TRaCE with the top-performing methods of the DREAM4 in silico network inference challenge.

6.
Spatial interpolation is an important method for obtaining continuous surfaces of soil properties from point samples. In this paper, we propose a method that combines ensemble learning with ancillary environmental information for improved interpolation of soil properties (hereafter, EL-SP). First, we calculated the trend value for soil potassium contents at the Qinghai Lake region in China based on measured values. Then, based on soil types, geology types, land use types, and slope data, the remaining residual was simulated with the ensemble learning model. Next, the EL-SP method was applied to interpolate soil potassium contents at the study site. To evaluate the utility of the EL-SP method, we compared its performance with other interpolation methods including universal kriging, inverse distance weighting, ordinary kriging, and ordinary kriging combined with geographic information. Results show that EL-SP had a lower mean absolute error and root mean square error than the other models tested in this paper. Notably, the EL-SP maps can describe more locally detailed information and more accurate spatial patterns for soil potassium content than the other methods because of the combined use of different types of environmental information; these maps are capable of showing abrupt boundary information for soil potassium content. Furthermore, the EL-SP method not only reduces prediction errors, but also complements other environmental information, which makes the spatial interpolation of soil potassium content more reasonable and useful.
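For orientation, one of the baseline methods named above (inverse distance weighting) and the two reported error measures (mean absolute error and root mean square error) can be sketched in a few lines. This is a minimal illustration, not the EL-SP method itself; the function names, the distance exponent `power=2.0`, and the sample layout are assumptions made for the example.

```python
import math


def idw_predict(samples, query, power=2.0):
    """Inverse distance weighting at `query` from samples of the form ((x, y), value)."""
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value  # the query coincides with a sampled location
        w = 1.0 / d ** power  # closer samples receive larger weights
        num += w * value
        den += w
    return num / den


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )
```

In a comparison like the one described, each method would be scored with `mae` and `rmse` on held-out sample points, and the method with the lower values would be preferred.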

7.
A Positive Strain Identification Method for Rhizobium meliloti   Cited by: 4 (self-citations: 8; citations by others: 4)
About 80% of Rhizobium meliloti strains contain 1 to 11 copies of insertion sequence ISRm1 in their genomes (R. Wheatcroft and R. J. Watson, J. Gen. Microbiol. 134:113-121, 1988). Hybridization to separated genomic DNA fragments with an ISRm1-specific probe produces patterns of hybridization bands which are distinctive for each strain. These patterns can be compared between strains to prove or disprove common identity. In most cases relatedness can be inferred despite phenotypic differences or minor genomic alterations.

8.
Learning gene expression programs directly from a set of observations is challenging due to the complexity of gene regulation, the high noise of experimental measurements, and the insufficient number of experimental measurements. Imposing additional constraints with strong and biologically motivated regularizations is critical in developing reliable and effective algorithms for inferring gene expression programs. Here we propose a new form of regularization that constrains the number of independent connectivity patterns between regulators and targets, motivated by the modular design of gene regulatory programs and the belief that the total number of independent regulatory modules should be small. We formulate a multi-target linear regression framework to incorporate this type of regularization, in which the number of independent connectivity patterns is expressed as the rank of the connectivity matrix between regulators and targets. We then generalize the linear framework to nonlinear cases, and prove that the generalized low-rank regularization model is still convex. Efficient algorithms are derived to solve both the linear and nonlinear low-rank regularized problems. Finally, we test the algorithms on three gene expression datasets, and show that the low-rank regularization improves the accuracy of gene expression prediction in these three datasets.
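A common way to write a convex low-rank regularized multi-target regression is shown below. This is a sketch under assumptions: the abstract does not state which convex penalty is used, so the nuclear norm (the usual convex surrogate for matrix rank) is assumed here, and the symbols $X$ (regulator expression), $Y$ (target expression), $W$ (connectivity matrix), and $\lambda$ (regularization weight) are chosen for illustration.

```latex
\min_{W} \; \frac{1}{2}\,\lVert Y - XW \rVert_F^2 \;+\; \lambda\,\lVert W \rVert_* ,
\qquad
\lVert W \rVert_* \;=\; \sum_{i} \sigma_i(W)
```

Here $\sigma_i(W)$ are the singular values of $W$. Penalizing their sum pushes many of them to zero, which keeps the number of independent connectivity patterns small while leaving the objective convex.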

9.
10.
Reconstruction of host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, the current experimentally derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From the point of view of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems for host-pathogen protein interaction network reconstruction. In this work, we address these three concerns and propose a probability weighted ensemble transfer learning model for HIV-human protein interaction prediction (PWEN-TLM), where a support vector machine (SVM) is adopted as the individual classifier of the ensemble model. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. Thus the model is more robust against data unavailability, with less demanding data constraints. As regards negative data construction, experiments show that exclusiveness of subcellular co-localized proteins is unbiased and more reliable than random sampling. Last, we analyze the overlap between the predictions of our model and those of existing models, and apply the model to the recognition of novel host-pathogen PPIs for further biological research.
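The idea of weighting each classifier's probability output by its ROC-AUC can be illustrated as follows. This is a minimal sketch, not the PWEN-TLM implementation: the abstract does not give the exact weighting scheme, so weights proportional to AUC are assumed, and the function names `roc_auc` and `weighted_probability` are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_probability(probs, aucs):
    """Combine per-classifier probabilities with weights proportional to each AUC (assumed scheme)."""
    total = sum(aucs)
    return sum(p * a for p, a in zip(probs, aucs)) / total
```

With this scheme a classifier that ranks validation examples better contributes more to the final probability, which can then be thresholded to give the ensemble decision.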

11.
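Lewontin's linkage disequilibrium formula is one of the best methods for mapping disease genes (with Q denoting the recombination rate and a second parameter denoting the number of generations elapsed since the mutation arose). This study shows that when the frequency of the disease gene is high, Lewontin's linkage disequilibrium formula remains applicable whereas other similar methods do not; that, with slight modification, the formula applies to dominant, recessive, codominant, and overdominant inheritance; and that a formula that remains valid in the presence of recurrent mutation can be derived from it. Given these advantages, Lewontin's formula deserves attention and further development so that it can be applied to mapping the genes of common diseases.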
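The formula itself is not reproduced in this abstract. For orientation, the standard textbook relation for the decay of linkage disequilibrium under recombination is given below; whether it matches the exact form used in the paper is an assumption. $Q$ follows the abstract's notation for the recombination rate, while $D_0$, $D_t$, and $t$ (initial disequilibrium, disequilibrium after $t$ generations, and number of generations) are symbols introduced here for illustration.

```latex
D_t = D_0\,(1 - Q)^{t}
\qquad\Longrightarrow\qquad
Q = 1 - \left(\frac{D_t}{D_0}\right)^{1/t}
```

Read this way, a marker that still shows strong disequilibrium with the disease after many generations must have a small $Q$, that is, it lies close to the disease gene.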

12.
Gene Silencing-Based Disease Resistance   Cited by: 4 (self-citations: 0; citations by others: 4)

13.
A critical step during intrathymic T-cell development is the transition of CD4+ CD8+ double-positive (DP) cells to the major histocompatibility complex class I (MHC-I)-restricted CD4− CD8+ and MHC-II-restricted CD4+ CD8− single-positive (SP) cell stage. Here, we identify a novel gene that is essential for this process. Through the T-cell phenotype-based screening of N-ethyl-N-nitrosourea (ENU)-induced mutant mice, we established a mouse line in which numbers of CD4 and CD8 SP thymocytes as well as peripheral CD4 and CD8 T cells were dramatically reduced. Using linkage analysis and DNA sequencing, we identified a missense point mutation in a gene, E430004N04Rik (also known as themis), that does not belong to any known gene family. This orphan gene is expressed specifically in DP and SP thymocytes and peripheral T cells, whereas in mutant thymocytes the levels of protein encoded by this gene were drastically reduced. We generated E430004N04Rik-deficient mice, and their phenotype was virtually identical to that of the ENU mutant mice, thereby confirming that this gene is essential for the development of SP thymocytes.
The differentiation step from the double-positive (DP) to single-positive (SP) thymocyte stage is critically regulated by signals originating from the T-cell receptor α/β (TCRα/β) expressed on their surface (3, 5, 16, 17). Using reverse genetic approaches that knock out or overexpress various genes expected to be involved in TCR signaling, including its ligand major histocompatibility complex molecules and coreceptors CD4 and CD8, the roles of these genes in T-cell development have been investigated intensively (11, 12). However, to identify totally unknown mechanisms in T-cell development, the forward genetic approach is required.
N-ethyl-N-nitrosourea (ENU) is a potent mutagen that randomly induces point mutations throughout the genome in a dose-dependent manner, and ENU mutagenesis has been a representative forward genetic strategy (4, 15). We have been screening phenotypes of ENU-mutagenized mice, focusing on defects in T-cell development.

14.
15.
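Activation tagging is an important method for gene cloning and gene function identification in plants. It has been applied in many plant species, including Arabidopsis, petunia, poplar, tomato, and rice, and a series of functional genes have been successfully isolated and identified. This article describes the use of activation tagging to clone and identify a number of functional genes related to plant growth and development.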

16.
Hydroxycinnamate coenzyme A (CoA) thioesters are substrates for biosynthesis of lignin and hydroxycinnamate esters of polysaccharides and other polymers. Hence, a supply of these substrates is essential for investigation of cell wall biosynthesis. In this study, three recombinant enzymes, caffeic acid 3-O-methyltransferase, 4-coumarate-CoA ligase 1, and 4-coumarate-CoA ligase 5, were cloned from wheat, tobacco, and Arabidopsis, respectively, and were used to synthesize 14C-feruloyl-CoA, caffeoyl-CoA, p-coumaroyl-CoA, feruloyl-CoA, and sinapoyl-CoA. The corresponding hydroxycinnamoyl-CoA thioesters were purified by high-performance liquid chromatography, the only extraction/purification step necessary, with total yields of 88–95%. Radiolabeled 14C-feruloyl-CoA was generated from caffeic acid and S-adenosyl-14C-methionine under the combined action of caffeic acid 3-O-methyltransferase and 4-coumarate-CoA ligase 1. About 70% of 14C-methyl groups from S-adenosyl methionine were incorporated into the final product. The methods presented are simple, fast, and efficient for the preparation of the hydroxycinnamate thioesters.

17.
Most of the disease resistance genes (R-genes) discovered in plants have conserved functional domains, predominant among which are nucleotide binding sites (NBS) and leucine rich repeats (LRR). The sequence information of the conserved domains can be used to mine similar sequences from other plant species, using degenerate and specific primers for their amplification in a polymerase chain reaction. Such derived sequences, known as Resistance Gene Analogues (RGAs), can serve as molecular markers for rapid identification and isolation of R-genes. They can also provide clues about the evolutionary mechanism of resistance genes and the interactions involved in pathogen recognition. In recent years, this sequence-homology based approach has been used extensively for the cloning and mapping of RGAs in cereals, pulses, oilseeds, coffee, spices, forest trees and horticultural crops. In this article, the current status of cloning of RGAs from different crops is reviewed. A general method of RGA cloning and its modifications, such as NBS-profiling and AFLP-NBS, are also discussed along with examples. Further, it is suggested that the RGAs cloned in various crops would be a useful genomic resource for developing cultivars with durable resistance to diseases in different crop breeding programmes.

18.
19.
20.
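Positive Periodic Solutions of an Epidemic Model   Cited by: 2 (self-citations: 0; citations by others: 2)
This paper studies the existence and uniqueness of positive periodic solutions for a class of epidemic models.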
