Similar literature
20 similar documents found.
1.
Fuzzy decision trees are a powerful, top-down, hierarchical search methodology for extracting human-interpretable classification rules. However, they are often criticized for poor learning accuracy. In this paper, we propose Neuro-Fuzzy Decision Trees (N-FDTs), a fuzzy decision tree structure with a neural-like parameter adaptation strategy. In the forward cycle, we construct fuzzy decision trees using any of the standard induction algorithms, such as fuzzy ID3. In the feedback cycle, the parameters of the fuzzy decision trees are adapted using a stochastic gradient descent algorithm by traversing back from the leaf nodes to the root node. With this strategy, the hierarchical structure of the fuzzy decision trees is kept intact during the parameter adaptation stage. Applying the backpropagation algorithm directly to the structure of fuzzy decision trees improves learning accuracy without compromising comprehensibility (interpretability). The proposed methodology has been validated using computational experiments on real-world datasets.

2.
3.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used to model the measurement errors flexibly and to allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

4.
Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.
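The abstract above compares models with a deviance information criterion (DIC) but does not state the formula. A minimal sketch, assuming the standard definition DIC = D̄ + p_D with p_D = D̄ − D(θ̄), where D̄ is the posterior mean deviance and D(θ̄) is the deviance at the posterior mean of the parameters. The function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean
from typing import Sequence, Tuple


def deviance_information_criterion(
    deviance_samples: Sequence[float],
    deviance_at_posterior_mean: float,
) -> Tuple[float, float]:
    """Return (DIC, p_D) from MCMC deviance samples.

    deviance_samples: deviance evaluated at each posterior draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean
    of the parameters.
    """
    mean_deviance = mean(deviance_samples)               # D-bar
    p_d = mean_deviance - deviance_at_posterior_mean     # effective number of parameters
    return mean_deviance + p_d, p_d


# Hypothetical deviance draws for one candidate model (illustration only).
EXAMPLE_DIC, EXAMPLE_PD = deviance_information_criterion([100.0, 102.0, 104.0], 99.0)
```

Under this definition the model with the lower DIC is preferred, so two candidate models would each be passed through the function and their DIC values compared.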

5.
6.

Purpose

In the USA, several studies have been conducted to analyze the energy consumption and atmospheric emissions of Warm-mix Asphalt (WMA) pavements. However, the direct and indirect environmental, economic, and social impacts, termed the Triple-Bottom-Line (TBL), were not addressed sufficiently. Hence, the aim of this study is to develop a TBL-oriented sustainability assessment model to evaluate the environmental and socio-economic impacts of pavements constructed with different types of WMA mixtures and compare them to conventional Hot-mix Asphalt (HMA). The types of WMA technologies investigated in this research include Asphamin® WMA, Evotherm™ WMA, and Sasobit® WMA.

Methods

To achieve this goal, supply and use tables published by the U.S. Bureau of Economic Analysis were merged with 16 macro-level sustainability metrics. A hybrid TBL-LCA model was built to evaluate the life-cycle sustainability performance of using WMA technologies in the construction of asphalt pavements. Sustainability impacts were calculated in terms of socio-economic indicators (import, income, gross operating surplus, government tax, work-related injuries, and employment) and environmental indicators (water withdrawal, energy use, carbon footprint, hazardous waste generation, toxic releases into air, and land use). A stochastic compromise programming model was then developed to find the optimal allocation of different pavement types for U.S. highways.

Results and discussion

WMAs did not perform better than HMA in terms of environmental impacts. Asphamin® WMA was found to have the highest environmental and socio-economic impacts of the pavement types considered. The material extraction and processing phase made the highest contribution to all environmental impact indicators, which shows the importance of cleaner production strategies for pavement materials. Based on the stochastic compromise programming results, Sasobit® WMA had the highest percentage of allocation (61 %) in a balanced weighting situation, while Asphamin® WMA had the largest share (57 %) among the asphalt pavements when only socio-economic aspects matter. The optimization results also supported the significance of increased WMA use on U.S. highways.

Conclusions

This research complemented previous LCA studies by evaluating pavements not only from an environmental emissions and energy consumption standpoint, but also from socio-economic perspectives. Multi-objective optimization results also provided important insights for decision makers seeking the optimum allocation of pavement alternatives based on different environmental and socio-economic priorities. Consequently, this study aimed to increase awareness of the inherent benefits of economic input–output analysis and multi-criteria decision making through application to emerging sustainable pavement practices.

7.

Background

Genomic islands (GIs) are clusters of alien genes found in some bacterial genomes but not in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory.

Results

In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate the classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus and Streptococcus) and on a mixed dataset of all three genera. The experimental results show that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.

Conclusions

We conclude that decision-tree-based ensemble algorithms could accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.

8.
Habitat modeling studies the influence of abiotic factors on the abundance of a given taxonomic group of organisms. In this work, we investigate the effect of environmental conditions on communities of organisms in three different ecosystems. Namely, we consider the diatom community in Lake Prespa, Macedonia, the Collembola community in the soils of Denmark and 14 organisms living in Slovenian rivers. The data for these case studies consist of physical and chemical properties of the environment as well as the relative abundances or presence of the organisms under investigation. The multi-species data are analyzed by constructing habitat models for each species separately (single-target decision trees) or by constructing a single habitat model for all the species (multi-target predictive clustering trees). Typically, habitat models are constructed for each species individually and thus do not exploit the interactions among species. While approaches for building a single habitat model of a group of organisms exist, they typically construct models that are not readily interpretable and, thus, are seldom used by the research community. In this work, we explore in detail the construction of interpretable models of both types. Furthermore, we construct ensembles of decision trees and ensembles of predictive clustering trees to increase the predictive performance of the models. The key outcomes of the interpretation and discussion of the obtained models for each case study are as follows. First, we show that multi-target predictive clustering trees are a very useful method for the analysis of multi-species data and that they are more efficient and produce more concise models than single-target decision trees. The obtained multi-target habitat models are readily interpretable and identify the environmental conditions that influence the composition and structure of a given community of organisms. Second, we conclude that temperature and magnesium are the most important factors influencing the complete diatom community in Lake Prespa, while nitrates and temperature have more influence on the most abundant species. Third, biological oxygen demand is the most influential factor for the abundance of river-dwelling species, while the river community structure is mostly influenced by the NO2 concentration. Finally, the structure of the community of soil microarthropods is mostly influenced by the soil type and the crop history.

9.
Alternating tangential flow (ATF) filtration has been used successfully in the biopharmaceutical industry as a lower-shear technology for cell retention in perfusion cultures. The ATF system differs from tangential flow filtration, however, in that reverse flow is used once per cycle to minimize fouling. Few studies in the literature evaluate ATF and how key system variables affect the rate at which ATF filters foul. In this study, an experimental setup was devised to determine the time it took for fouling to occur at given mammalian (PER.C6) cell culture densities and viabilities as permeate flow rate and antifoam concentration were varied. The experimental results indicate, in accordance with Darcy's law, that the average resistance to permeate flow (across a cycle of operation) increases as biological material deposits on the membrane. Scanning electron microscope images of the post-run filtration surface indicated that both cells and antifoam micelles deposit on the membrane. A unique mathematical model, based on the assumption that fouling was due to pore blockage by the cells and micelles in combination, was devised that allowed estimation of sticking factors for the cells and the micelles on the membrane. This model was then used to accurately predict the increase in transmembrane pressure during constant-flux operation for an ATF cartridge used for perfusion cell culture. © 2014 American Institute of Chemical Engineers Biotechnol. Prog., 30:1291–1300, 2014
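The link between rising resistance and rising transmembrane pressure at constant flux can be illustrated with the usual resistance-in-series form of Darcy's law, TMP = J · μ · (R_m + R_f). This is only a sketch of that textbook relation. It is not the pore-blockage model with sticking factors described in the abstract, and the function name, parameter names, and numerical values below are hypothetical.

```python
def transmembrane_pressure(
    flux_m_per_s: float,
    viscosity_pa_s: float,
    membrane_resistance_per_m: float,
    fouling_resistance_per_m: float = 0.0,
) -> float:
    """Transmembrane pressure in Pa from Darcy's law with resistances in series.

    TMP = J * mu * (R_m + R_f), with J in m/s, mu in Pa*s and R in 1/m.
    """
    total_resistance = membrane_resistance_per_m + fouling_resistance_per_m
    return flux_m_per_s * viscosity_pa_s * total_resistance


# Illustrative values only (not from the study): a clean membrane, then the
# same membrane after a fouling layer of equal resistance has built up.
CLEAN_TMP = transmembrane_pressure(1e-5, 1e-3, 1e11)
FOULED_TMP = transmembrane_pressure(1e-5, 1e-3, 1e11, fouling_resistance_per_m=1e11)
```

With the flux held constant, doubling the total resistance doubles the pressure required, which is why transmembrane pressure serves as the fouling indicator during constant-flux operation.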

10.
Sen S, Satagopan JM, Churchill GA. Genetics. 2005;170(1):447-464
We examine the efficiency of different genotyping and phenotyping strategies in inbred line crosses from an information perspective. This provides a mathematical framework for the statistical aspects of QTL experimental design, while guiding our intuition. Our central result is a simple formula that quantifies the fraction of missing information of any genotyping strategy in a backcross. It includes the special case of selectively genotyping only the phenotypic extreme individuals. The formula is a function of the square of the phenotype and the uncertainty in our knowledge of the genotypes at a locus. This result is used to answer a variety of questions. First, we examine the cost-information trade-off varying the density of markers and the proportion of extreme phenotypic individuals genotyped. Then we evaluate the information content of selective phenotyping designs and the impact of measurement error in phenotyping. A simple formula quantifies the information content of any combined phenotyping and genotyping design. We extend our results to cover multigenotype crosses, such as the F(2) intercross, and multiple QTL models. We find that when the QTL effect is small, any contrast in a multigenotype cross benefits from selective genotyping in the same manner as in a backcross. The benefit remains in the presence of a second unlinked QTL with small effect (explaining <20% of the variance), but diminishes if the second QTL has a large effect. Software for performing power calculations for backcross and F(2) intercross incorporating selective genotyping and marker spacing is available from http://www.biostat.ucsf.edu/sen.

11.
12.
Wessel J, Zapala MA, Schork NJ. Genomics. 2007;90(1):132-142
The availability of high-throughput genotyping technologies and microarray assays has allowed researchers to consider pursuing investigations whose ultimate goal is the identification of genetic variations that influence levels of gene expression, e.g., "expression quantitative trait locus" or "eQTL" mapping studies. However, the large number of genes whose expression levels can be tested for association with genetic variations in such studies can create both statistical and biological interpretive problems. We consider the integrated analysis of eQTL mapping data that incorporates pathway, function, and disease process information. The goal of this analysis is to determine if compelling patterns emerge from the data that are consistent with the notion that perturbations in the molecular physiologic environment induced by genetic variations implicate the expression patterns of multiple genes via genetic network relationships or feedback mechanisms. We apply available genetic network and pathway analysis software, as well as a novel regression analysis technique, to carry out the proposed studies. We also consider extensions of the proposed strategies and areas of future research.

13.
Sharpee T, Bialek W. PLoS ONE. 2007;2(7):e646
We consider here how to separate multidimensional signals into two categories, such that the binary decision transmits the maximum possible information about those signals. Our motivation comes from the nervous system, where neurons process multidimensional signals into a binary sequence of responses (spikes). In a small noise limit, we derive a general equation for the decision boundary that locally relates its curvature to the probability distribution of inputs. We show that for Gaussian inputs the optimal boundaries are planar, but for non-Gaussian inputs the curvature is nonzero. As an example, we consider exponentially distributed inputs, which are known to approximate a variety of signals from the natural environment.

14.
When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.
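The pruning rule described in the abstract (stop evaluating a sequence as soon as a large loss is met) can be sketched as a small recursive tree search. This is an illustrative toy, not the authors' task or model: the tree layout, the reward values, and the threshold are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy tree: each node maps an action name to (reward, child node).
Tree = Dict[str, Tuple[float, "Tree"]]


def best_value(tree: Tree, prune_at_or_below: Optional[float] = None) -> float:
    """Best cumulative reward reachable from this node.

    If prune_at_or_below is given, a branch whose immediate reward is at or
    below that threshold is cut: the loss itself is counted, but nothing
    beyond it is evaluated.
    """
    if not tree:
        return 0.0
    values = []
    for reward, child in tree.values():
        if prune_at_or_below is not None and reward <= prune_at_or_below:
            values.append(reward)  # curtail evaluation after a large loss
        else:
            values.append(reward + best_value(child, prune_at_or_below))
    return max(values)


# Illustrative tree in which pruning is counterproductive: the large loss on
# branch "a" is followed by a gain that more than compensates for it.
EXAMPLE_TREE: Tree = {
    "a": (-70.0, {"c": (140.0, {})}),
    "b": (-20.0, {"d": (-20.0, {})}),
}
FULL_VALUE = best_value(EXAMPLE_TREE)
PRUNED_VALUE = best_value(EXAMPLE_TREE, prune_at_or_below=-70.0)
```

In this toy tree the full search finds the path through the large loss (total 70), while the pruned search never looks past it and settles for the worse alternative (total -40), mirroring the abstract's point that the strategy persists even when it is costly.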

15.
In this paper we offer a quantum-like (QL) representation of the Shafir–Tversky statistical effect, which is well known in cognitive psychology. We apply the so-called contextual approach. We consider the Shafir–Tversky effect to result from mixing statistical data obtained in incompatible contexts, such as those involved in the Prisoner’s Dilemma (PD) or in more general games in which the disjunction effect can be found. As a consequence, the law of total probability is violated for the experimental data obtained in cognitive psychology experiments by Shafir and Tversky [Shafir, E., Tversky, A., 1992. Thinking through uncertainty: nonconsequential reasoning and choice. Cogn. Psychol. 24, 449–474] as well as Tversky and Shafir [Tversky, A., Shafir, E., 1992. The disjunction effect in choice under uncertainty. Psychol. Sci. 3, 305–309]. Moreover, we can find a numerical measure of contextual incompatibility (the so-called coefficient of interference) and represent the contexts involved in the PD by probability amplitudes, that is, normalized vectors (“mental wave functions”). We remark that the statistical data from the Shafir and Tversky (1992) and Tversky and Shafir (1992) experiments differ crucially from the point of view of mental interference. The second exhibits conventional trigonometric (cos-type) interference, while the first exhibits so-called hyperbolic (cosh-type) interference. We discuss QL processing of information by cognitive systems, in particular QL decision making, and both classical and QL rationality and ethics.
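The abstract does not write out the coefficient of interference. A minimal sketch, assuming the form commonly used in the contextual approach: the law of total probability is extended to P(b) = P(a1)P(b|a1) + P(a2)P(b|a2) + 2λ·sqrt(P(a1)P(b|a1)P(a2)P(b|a2)), and λ is solved for from observed frequencies. A value with |λ| ≤ 1 can be written as cos θ (trigonometric interference), while |λ| > 1 requires ±cosh θ (hyperbolic interference). The function names and all probabilities below are hypothetical illustrations, not the experimental data.

```python
import math


def interference_coefficient(
    p_b: float,
    p_a1: float,
    p_b_given_a1: float,
    p_a2: float,
    p_b_given_a2: float,
) -> float:
    """Normalized deviation of P(b) from the classical law of total probability."""
    classical = p_a1 * p_b_given_a1 + p_a2 * p_b_given_a2
    norm = 2.0 * math.sqrt(p_a1 * p_b_given_a1 * p_a2 * p_b_given_a2)
    return (p_b - classical) / norm


def interference_type(lam: float) -> str:
    """Classify the coefficient: |lam| <= 1 fits cos(theta), otherwise cosh(theta)."""
    return "trigonometric" if abs(lam) <= 1.0 else "hyperbolic"


# Hypothetical frequencies (illustration only): a1 and a2 are the two known
# contexts, b is the choice observed when the context is unknown.
LAMBDA_COS = interference_coefficient(0.60, 0.5, 0.9, 0.5, 0.8)
LAMBDA_COSH = interference_coefficient(0.50, 0.5, 0.1, 0.5, 0.1)
```

When λ is zero the classical law of total probability holds exactly, so the magnitude of λ serves as the numerical measure of how incompatible the contexts are.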

16.
The aim of this study was to assess the usefulness of the decision trees method for investigating multidimensional associations between menarche and socioeconomic variables. The article is based on data collected in the rural area of Choszczno in the West Pomerania district of Poland between 1987 and 2001. Girls were asked whether first menstruation had occurred (a yes/no method). The average menarchal age was estimated by probit analysis using second-degree polynomials. The socioeconomic status of the girls' families was determined using five qualitative variables: the father's and mother's educational level, source of income, household appliances and the number of children in the family. For classification based on these five socioeconomic variables, CART (Classification and Regression Trees), one of the most effective algorithms, was used. In 2001 the menarchal age of 66% of the examined girls was correctly classified, while a higher efficiency of 70% was obtained for girls examined in 1987. The decision trees method made it possible to define the hierarchy of socioeconomic variables influencing girls' level of biological development. The strongest discriminatory power was attributed to the number of children in the family, followed by the mother's and then the father's educational level. Using this method, it is possible to detect differences in the strength of socioeconomic variables associated with girls' pubescence before 1987 and after 2001, during the transformation of the economic and political systems in Poland. Although the decision trees method is infrequently applied in the social sciences and constitutes a novelty, this article demonstrates its usefulness in examining relations between biological processes and a population's living conditions.
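The hierarchy of variables that CART produces comes from choosing, at each node, the split that most reduces class impurity. A minimal sketch of that selection step using Gini impurity, assuming binary features for simplicity. The feature names and the four records are invented for illustration and do not come from the Choszczno data.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows: List[Dict[str, bool]], labels: Sequence[Hashable], feature: str) -> float:
    """Size-weighted impurity of the groups obtained by splitting on one feature."""
    groups: Dict[bool, list] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows: List[Dict[str, bool]], labels: Sequence[Hashable]) -> str:
    """Feature whose split leaves the lowest weighted impurity."""
    return min(rows[0].keys(), key=lambda f: weighted_gini(rows, labels, f))


# Invented records (illustration only); the label is whether menarche has occurred.
ROWS = [
    {"large_family": True, "mother_higher_edu": False},
    {"large_family": True, "mother_higher_edu": True},
    {"large_family": False, "mother_higher_edu": False},
    {"large_family": False, "mother_higher_edu": True},
]
LABELS = ["no", "no", "yes", "yes"]
ROOT_FEATURE = best_feature(ROWS, LABELS)
```

Applying the same selection recursively to each resulting group yields the tree, and the order in which features are chosen gives the kind of variable hierarchy the abstract refers to.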

17.
18.
Contact: ihh{at}berkeley.edu Associate Editor: Alex Bateman

19.
20.
The number of trait loci in late-onset Alzheimer disease (total citations: 10; self-citations: 0; citations by others: 10)
Although it is clear that apoE plays an important role in the genetics of late-onset Alzheimer disease (AD), evidence exists that additional genes may play a role in AD, and estimates of the total contribution of apoE to the variance in onset of AD vary widely. Unfortunately, little information is available on the number and contribution of additional genes. We estimated the number of additional quantitative-trait loci and their contribution to the variance in age at onset of AD, as well as the contribution of apoE and sex, in an oligogenic segregation analysis of 75 families (742 individuals) ascertained for members with late-onset AD. We found evidence that four additional loci make a contribution to the variance in age at onset of late-onset AD that is similar to or greater in magnitude than that made by apoE, with one locus making a contribution several times greater than that of apoE. Additionally, we confirmed previous findings of a dose effect for the apoE ε4 allele, a protective effect for the ε2 allele, evidence for allelic interactions at the apoE locus, and a small protective effect for males. Furthermore, although we estimate that the apoE genotype can make a difference of
