Similar literature (20 results)
1.
Stiglic G  Kocbek S  Pernek I  Kokol P 《PloS one》2012,7(3):e33812

Purpose

Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible.

Methods

This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models using a so-called one-button data mining approach, in which no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during tuning; the model is constrained exclusively by the dimensions of the produced decision tree.

Results

The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree.

Conclusions

The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

2.

Purpose

A number of previous studies have shown inconsistencies between sub-scale scores and component summary scores using traditional scoring methods of the SF-36 Version 1. This study addresses the issue in Version 2 and asks whether the previous problems of disagreement between the eight SF-36 Version 1 sub-scale scores and the Physical and Mental Component Summary scores persist in Version 2. A second study objective is to review the recommended scoring methods for the creation of factor scoring weights and their effect on producing summary scale scores.

Methods

The 2004 South Australian Health Omnibus Survey dataset was used for the production of coefficients. There were 3,014 observations with full data for the SF-36. Data were analysed in LISREL V8.71. Confirmatory factor analysis models were fit to the data producing diagonally weighted least squares estimates. Scoring coefficients were validated on an independent dataset, the 2008 South Australian Health Omnibus Survey.

Results

Problems of agreement were observed with the recommended orthogonal scoring methods which were corrected using confirmatory factor analysis.

Conclusions

Confirmatory factor analysis is the preferred method to analyse SF-36 data, allowing for the correlation between physical and mental health.

3.

Background

In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which type of predictive accuracy measures and data visualization techniques are most useful for assessing these models.

Methods

We assessed the performance of the models for predicting risk using various methodologies, some of which include: receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not.

Results

We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
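As a rough illustration of the AUC mentioned above, the following sketch computes it by pairwise comparison of scores (equivalent to the Mann-Whitney U statistic). The function name `auc` and the six example scores are hypothetical and are not taken from the article.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for variants classified as risk variants, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six variants (illustrative only).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

Because the AUC depends only on the ranking of scores, two models with similar AUCs can still have very different score distributions, which is where the distribution plots discussed in the abstract become useful.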

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1616-z) contains supplementary material, which is available to authorized users.

4.
Xu P  Yang P  Lei X  Yao D 《PloS one》2011,6(1):e14634

Background

There is a growing interest in the study of signal processing and machine learning methods, which may make the brain computer interface (BCI) a new communication channel. A variety of classification methods have been utilized to convert the brain information into control commands. However, most of the methods only produce uncalibrated values and uncertain results.

Methodology/Principal Findings

In this study, we presented a probabilistic method, “enhanced BLDA” (EBLDA), for multi-class motor imagery BCI, which utilized Bayesian linear discriminant analysis (BLDA) with probabilistic output to improve the classification performance. EBLDA builds a new classifier that enlarges the training dataset by adding test samples with high probability. EBLDA is based on the hypothesis that unlabeled samples with high probability provide valuable information to enhance the learning process and generate a classifier with refined decision boundaries. To investigate the performance of EBLDA, we first used carefully designed simulated datasets to study how EBLDA works. Then, we adopted a real BCI dataset for further evaluation. The current study shows that: 1) probabilistic information can improve the performance of BCI for subjects with a high kappa coefficient; 2) with supplementary training samples drawn from the high-probability test samples, EBLDA is significantly better than BLDA in classification, especially for small training datasets, in which EBLDA can obtain a refined decision boundary by shifting the BLDA decision boundary with the support of the information from test samples.
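The general idea of enlarging the training set with confidently classified test samples can be sketched as follows. This is not the authors' EBLDA: a one-dimensional nearest-mean classifier with softmax probabilities stands in for BLDA, and the names `fit_means`, `predict_proba`, `self_train` and the 0.9 threshold are assumptions made for illustration.

```python
import math


def fit_means(xs, ys):
    """Per-class mean of 1-D features (a stand-in classifier, not BLDA)."""
    means = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means


def predict_proba(means, x):
    """Softmax over negative squared distances to each class mean."""
    classes = sorted(means)
    logits = [-(x - means[c]) ** 2 for c in classes]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return {c: w / total for c, w in zip(classes, weights)}


def self_train(train_x, train_y, test_x, threshold=0.9):
    """Add confidently classified test samples to the training set, then refit."""
    means = fit_means(train_x, train_y)
    new_x, new_y = list(train_x), list(train_y)
    for x in test_x:
        proba = predict_proba(means, x)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:  # keep only high-probability samples
            new_x.append(x)
            new_y.append(label)
    return fit_means(new_x, new_y)


# Hypothetical data: two labelled samples and four unlabelled test samples.
initial = fit_means([0.0, 2.0], [0, 1])
refined = self_train([0.0, 2.0], [0, 1], [-0.2, 1.1, 3.0, 2.5])
print(initial, refined)
```

In this toy case the ambiguous sample at 1.1 is left out, while the three confident samples move the class means and therefore the midpoint between them, which mirrors the boundary shift described in the abstract.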

Conclusions/Significance

The proposed EBLDA could potentially reduce training effort. Therefore, it is valuable for us to realize an effective online BCI system, especially for multi-class BCI systems.

5.

Background

It has been suggested that withdrawal of inhaled corticosteroids (ICS) in COPD patients on maintenance treatment results in deterioration of symptoms, lung function and exacerbations. The aim of this real-life, prospective, multicentric study was to investigate whether withdrawal of ICS in COPD patients at low risk of exacerbation is linked to a deterioration in lung function and symptoms and to a higher frequency of exacerbations.

Methods

914 COPD patients, on maintenance therapy with bronchodilators and ICS, FEV1>50% predicted, and <2 exacerbations/year were recruited. Upon decision of the primary physicians, 59% of patients continued their ICS treatment whereas in 41% of patients ICS were withdrawn and regular therapy was continued with long-acting bronchodilators mostly (91% of patients). FEV1, CAT (COPD Assessment Test), and occurrence of exacerbations were measured at the beginning (T0) and at the end (T6) of the 6 months observational period.

Results

816 patients (89.3%) concluded the study. FEV1, CAT and exacerbation history were similar in the two groups (ICS and no ICS) at T0 and at T6. We did not observe any deterioration of lung function, symptoms, and exacerbation rate between the two groups at T0 and T6.

Conclusions

We conclude that the withdrawal of ICS, in COPD patients at low risk of exacerbation, can be safe provided that patients are left on maintenance treatment with long-acting bronchodilators.

6.
Ho WH  Lee KT  Chen HY  Ho TW  Chiu HC 《PloS one》2012,7(1):e29179

Background

A database for hepatocellular carcinoma (HCC) patients who had received hepatic resection was used to develop prediction models for 1-, 3- and 5-year disease-free survival based on a set of clinical parameters for this patient group.

Methods

The three prediction models included an artificial neural network (ANN) model, a logistic regression (LR) model, and a decision tree (DT) model. Data for 427, 354 and 297 HCC patients with histories of 1-, 3- and 5-year disease-free survival after hepatic resection, respectively, were extracted from the HCC patient database. From each of the three groups, 80% of the cases (342, 283 and 238 cases of 1-, 3- and 5-year disease-free survival, respectively) were selected to provide training data for the prediction models. The remaining 20% of cases in each group (85, 71 and 59 cases in the three respective groups) were assigned to validation groups for performance comparisons of the three models. Area under receiver operating characteristics curve (AUROC) was used as the performance index for evaluating the three models.
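The 80%/20% partition described above can be sketched as below. The abstract does not say how cases were selected, so the random shuffle, the seed, and the helper name `split_80_20` are assumptions; only the group sizes come from the text.

```python
import random


def split_80_20(cases, seed=0):
    """Shuffle case identifiers and split them into 80% training / 20% validation."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # assumed: random selection of cases
    n_train = (len(shuffled) * 8 + 5) // 10  # 80%, rounded to the nearest case
    return shuffled[:n_train], shuffled[n_train:]


# Group sizes reported for 1-, 3- and 5-year disease-free survival.
for n_cases in (427, 354, 297):
    train, valid = split_80_20(range(n_cases))
    print(n_cases, len(train), len(valid))
```

With rounding to the nearest case, the split sizes agree with the counts given in the abstract (342/85, 283/71 and 238/59).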

Conclusions

The ANN model outperformed the LR and DT models in terms of prediction accuracy. This study demonstrated the feasibility of using ANNs in medical decision support systems for predicting disease-free survival based on clinical databases in HCC patients who have received hepatic resection.

7.

Introduction

Various conditions of liver disease and the downsides of liver biopsy call for a non-invasive option to assess liver fibrosis. A non-invasive score would be especially useful to identify patients with slowly advancing fibrotic processes, as in Non-Alcoholic Fatty Liver Disease (NAFLD), who should undergo histological examination for fibrosis.

Patients/Methods

Classic liver serum parameters, hyaluronic acid (HA) and cell death markers of 126 patients undergoing bariatric surgery for morbid obesity were analyzed by machine learning techniques (logistic regression, k-nearest neighbors, linear support vector machines, rule-based systems, decision trees and random forest (RF)). Specificity, sensitivity and accuracy of the evaluated datasets to predict fibrosis were assessed.

Results

None of the single parameters (ALT, AST, M30, M60, HA) differed significantly between patients with a fibrosis score of 1 or 2. However, combining these parameters using RFs reached 79% accuracy in fibrosis prediction with a sensitivity of more than 60% and specificity of 77%. Moreover, RFs identified the cell death markers M30 and M65 as more important for the decision than the classic liver parameters.
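For readers less familiar with the three measures reported above, the sketch below derives sensitivity, specificity and accuracy from paired true and predicted labels. The function name `binary_metrics` and the eight example labels are hypothetical and do not reproduce the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels coded 1 (fibrosis) / 0 (none)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that were detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly ruled out
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels for eight patients (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```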

Conclusion

On the basis of serum parameters the generation of a fibrosis scoring system seems feasible, even when only marginally fibrotic tissue is available. Prospective evaluation of novel markers, i.e. cell death parameters, should be performed to identify an optimal set of fibrosis predictors.

8.

Background

Serial C-reactive protein (CRP) values may be useful for decision-making regarding duration of antibiotics in neonates. However, established standards of practice for their use in preterm very low birth weight (<1500 g, VLBW) infants are lacking.

Objective

Evaluate compliance with a CRP-guided computerized decision support (CDS) algorithm and compare characteristics and outcomes of compliant versus non-compliant cases. Measure correlation between CRPs and white blood count (WBC) indices.

Methods

We examined 3 populations: 1) all preterm VLBW infants born at Vanderbilt 2006–2011 – we assessed provider compliance with CDS algorithm and measured relevant outcomes; 2) all patients with positive blood culture results admitted to the Vanderbilt NICU 2006–2012 – we tested the correlation between CRP and WBC results within 7 days of blood culture phlebotomy; 3) 1,000 randomly selected patients out of the 7,062 patients admitted to the NICU 2006–2012 – we correlated time-associated CRP values and absolute neutrophil counts.

Results

Of 636 VLBW infants in cohort 1), 569 (89%) received empiric antibiotics for suspected early-onset sepsis. In 409 infants (72%) the CDS algorithm was followed; antibiotics were discontinued ≤48 hours in 311 (55%) with normal serial CRPs and continued in 98 (17%) with positive CRPs, resulting in significant reduction in antibiotic exposure (p<0.001) without increase in complications or subsequent infections. One hundred sixty (28%) were considered non-compliant because antibiotics were continued beyond 48 hours despite negative serial CRPs and blood cultures. Serial CRPs remained negative in 38 (12%) of 308 blood culture-positive infants from cohort 2, but only 4 patients had clinically probable sepsis with single organisms and no immunodeficiency besides extreme prematurity. Leukopenia of any cell type was not linked with CRPs in cohorts 2 and 3.

Conclusions

CDS/CRP-guided antibiotic use is safe and effective in culture-negative VLBW infants. CRP results are not affected by low WBC indices.

9.

Background

Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed.

Results

Here we proposed a novel approach by fusing features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). A recursive feature extraction by linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (by 0.37% to 3.79%) than those of state-of-the-art tools.

Conclusions

Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.

10.
11.

Introduction

Observational studies using case-control designs have shown an increased risk of pneumonia associated with inhaled corticosteroid (ICS)-containing medications in patients with chronic obstructive pulmonary disease (COPD). New-user observational cohort designs may minimize biases associated with previous case-control designs.

Objective

To estimate the association between ICS and pneumonia among new users of ICS relative to inhaled long-acting bronchodilator (LABD) monotherapy.

Methods

Pneumonia events in COPD patients ≥45 years old were compared among new users of ICS medications (n = 11,555; ICS, ICS/long-acting β2-agonist [LABA] combination) and inhaled LABD monotherapies (n = 6,492; LABA, long-acting muscarinic antagonists) using Cox proportional hazards models, with propensity scores to adjust for confounding. Setting: United Kingdom electronic medical records with linked hospitalization and mortality data (2002–2010). New users were censored at earliest of: pneumonia event, death, changing/discontinuing treatment, or end of follow-up. Outcomes: severe pneumonia (primary) and any pneumonia (secondary).

Results

Following adjustment, new use of ICS-containing medications was associated with an increased risk of pneumonia hospitalization (n = 322 events; HR = 1.55, 95% CI: 1.14, 2.10) and any pneumonia (n = 702 events; HR = 1.49, 95% CI: 1.22, 1.83). Crude incidence rates of any pneumonia were 48.7 and 30.9 per 1000 person years among the ICS-containing and LABD cohorts, respectively. Excess risk of pneumonia with ICS was reduced when requiring ≥1 month or ≥ 6 months of new use. There was an apparent dose-related effect, with greater risk at higher daily doses of ICS. There was evidence of channeling bias, with more severe patients prescribed ICS, for which the analysis may not have completely adjusted.

Conclusions

The results of this new-user cohort study are consistent with published findings; ICS were associated with a 20–50% increased risk of pneumonia in COPD, which reduced with exposure time. This risk must be weighed against the benefits when prescribing ICS to patients with COPD.

12.

Background

Mechanistic models that describe the dynamical behaviors of biochemical systems are common in computational systems biology, especially in the realm of cellular signaling. The development of families of such models, either by a single research group or by different groups working within the same area, presents significant challenges that range from identifying structural similarities and differences between models to understanding how these differences affect system dynamics.

Results

We present the development and features of an interactive model exploration system, MOSBIE, which provides utilities for identifying similarities and differences between models within a family. Models are clustered using a custom similarity metric, and a visual interface is provided that allows a researcher to interactively compare the structures of pairs of models as well as view simulation results.

Conclusions

We illustrate the usefulness of MOSBIE via two case studies in the cell signaling domain. We also present feedback provided by domain experts and discuss the benefits, as well as the limitations, of the approach.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-316) contains supplementary material, which is available to authorized users.

13.

Background

State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients.

Results

In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study.

Conclusions

Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-291) contains supplementary material, which is available to authorized users.

14.

Background

Drug allergies represent an important subset of adverse drug reactions that is worthy of attention because many of these reactions are potentially preventable with the use of computerised decision support systems. This is, however, dependent on the accurate and comprehensive recording of these reactions in the electronic health record. The objective of this study was to understand approaches to the recording of drug allergies in electronic health record systems.

Materials and Methods

We undertook a case study comprising 21 in-depth interviews with a purposefully selected group of primary and secondary care clinicians, academics, and members of the informatics and drug regulatory communities; observations in four General Practices; and an expert group discussion with 15 participants from the Allergy and Respiratory Expert Resource Group of the Royal College of General Practitioners.

Results

There was widespread acceptance among healthcare professionals of the need for accurate recording of drug allergies and adverse drug reactions. Most drug reactions were however likely to go unreported to and/or unrecognised by healthcare professionals and, even when recognised and reported, not all reactions were accurately recorded. The process of recording these reactions was not standardised.

Conclusions

There is considerable variation in the way drug allergies are recorded in electronic health records. This limits the potential of computerised decision support systems to help alert clinicians to the risk of further reactions. Inaccurate recording of information may in some instances introduce new problems as patients are denied treatments that they are erroneously believed to be allergic to.

15.

Background

The question of whether a score for a specific antiretroviral (e.g. lopinavir/r in this analysis) that improves prediction of viral load response given by existing expert-based interpretation systems (IS) could be derived from analyzing the correlation between genotypic data and virological response using statistical methods remains largely unanswered.

Methods and Findings

We used the data of the patients from the UK Collaborative HIV Cohort (UK CHIC) Study for whom genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response and mutations I15V and V91S which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on test ranging between 1.1 and 1.3, p = 0.34).
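The evaluation scheme above (average square error under 10-fold cross-validation) can be sketched as follows. To stay self-contained, this sketch scores a mean-only baseline predictor rather than the study's regression-based score, and the helper names and example values are hypothetical.

```python
def average_square_error(y_true, y_pred):
    """Mean of squared differences between observed and predicted responses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx); fold i holds every k-th sample starting at i."""
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        yield train_idx, test_idx


def cross_validated_ase(y, k=10):
    """ASE of a mean-only baseline (not the study's model) under k-fold CV."""
    squared_error = 0.0
    for train_idx, test_idx in k_fold_indices(len(y), k):
        # "Fit" on the training folds: here simply the training mean.
        mean = sum(y[i] for i in train_idx) / len(train_idx)
        squared_error += sum((y[i] - mean) ** 2 for i in test_idx)
    return squared_error / len(y)


# Hypothetical viral load changes (log10 copies/mL) for 20 treatment change episodes.
responses = [-2.1, -1.8, -0.4, -2.5, -1.1, -0.2, -1.9, -2.2, -0.9, -1.5,
             -2.0, -0.7, -1.3, -2.4, -0.5, -1.7, -1.0, -2.3, -0.8, -1.6]
print(cross_validated_ase(responses))
```

Any candidate score would replace the training-mean step with its own fit and prediction; the fold bookkeeping and the error measure stay the same.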

Conclusions

We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach to the validation of expert-based IS that could be used in the future for other antiretrovirals and in other settings outside HIV research.

16.

Objective

To explore and document the experiences of those receiving support from a lay health trainer, in order to inform the optimisation and evaluation of such interventions.

Design

Longitudinal qualitative study with up to four serial interviews conducted over 12 months. Interviews were transcribed and analysed using the constant comparative approach associated with grounded theory.

Participants

13 health trainers, 5 managers and 26 clients.

Setting

Three health trainer services targeting disadvantaged communities in northern England.

Results

The final dataset comprised 116 interviews (88 with clients and 28 with staff). Discussions with health trainers and managers revealed a high degree of heterogeneity between the local services in terms of their primary aims and activities. However, these were found to converge over time. There was agreement that health trainer interventions are generally ‘person-centred’ in terms of being tailored to the needs of individual clients. This led to a range of self-reported outcomes, including behaviour changes, physical health improvements and increased social activity. Factors impacting on the maintenance of lifestyle changes included the cost and timing of health-promoting activities, ill-health or low mood. Participants perceived a need for ongoing access to low cost facilities to ensure that any lifestyle changes can be maintained in the longer term.

Conclusions

Health trainers may be successful in terms of supporting people from socio-economically disadvantaged communities to make positive lifestyle changes, as well as achieving other health-related outcomes. This is not a ‘one-size-fits-all’ approach; commissioners and providers should select the intervention models that best meet the needs of their local populations. By delivering holistic interventions that address multiple lifestyle risks and incorporate relapse prevention strategies, health trainers could potentially have a significant impact on health inequalities. However, rigorous, formal outcome and economic evaluation of the range of health trainer delivery models is needed.

17.
18.

Background and Aims

Functional–structural plant models (FSPMs) are used to integrate knowledge and test hypotheses of plant behaviour, and to aid in the development of decision support systems. A significant amount of effort is being put into providing a sound methodology for building them. Standard techniques, such as procedural or object-oriented programming, are not suited for clearly separating aspects of plant function that criss-cross between different components of plant structure, which makes it difficult to reuse and share their implementations. The aim of this paper is to present an aspect-oriented programming approach that helps to overcome this difficulty.

Methods

The L-system-based plant modelling language L+C was used to develop an aspect-oriented approach to plant modelling based on multi-modules. Each element of the plant structure was represented by a sequence of L-system modules (rather than a single module), with each module representing an aspect of the element's function. Separate sets of productions were used for modelling each aspect, with context-sensitive rules facilitated by local lists of modules to consider/ignore. Aspect weaving or communication between aspects was made possible through the use of pseudo-L-systems, where the strict-predecessor of a production rule was specified as a multi-module.

Key Results

The new approach was used to integrate previously modelled aspects of carbon dynamics, apical dominance and biomechanics with a model of a developing kiwifruit shoot. These aspects were specified independently and their implementation was based on source code provided by the original authors without major changes.

Conclusions

This new aspect-oriented approach to plant modelling is well suited for studying complex phenomena in plant science, because it can be used to integrate separate models of individual aspects of plant development and function, both previously constructed and new, into clearly organized, comprehensive FSPMs. In future work, this approach could be further extended into an aspect-oriented programming language for FSPMs.

19.

Background

The potential role of DSS in CVD prevention remains unclear as only a few studies report on patient outcomes for cardiovascular disease.

Methods and Results

A systematic review and meta-analysis of randomised controlled trials and observational studies was done using the Medline, Embase, Cochrane Library, PubMed, Amed, CINAHL, Web of Science and Scopus databases; reference lists of relevant studies to 30 July 2011; and email contact with experts. The primary outcome was prevention of cardiovascular disorders (myocardial infarction, stroke, coronary heart disease, peripheral vascular disorders and heart failure) and management of hypertension owing to decision support systems, clinical decision support systems, computerized decision support systems, clinical decision making tools and medical decision making (interventions). From 4116 references, ten studies met our inclusion criteria (including 16,312 participants). Five papers reported outcomes on blood pressure management, one paper on heart failure, and two papers each on stroke and coronary heart disease. The pooled estimate for CDSS versus control group differences in SBP (mm Hg) was −0.99 (95% CI −3.02 to 1.04 mm Hg; I² = 0; p = 0.851).
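To see why a pooled difference with this confidence interval is read as non-significant, the sketch below recovers an approximate standard error and z statistic from the reported interval. It assumes a symmetric normal-approximation 95% interval; the function name `z_and_p_from_ci` is hypothetical, and the p-value it returns is for the pooled difference itself, so it need not match the p-value printed alongside I² in the abstract.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.959963984540054):
    """Recover SE, z and a two-sided p-value from a 95% confidence interval.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Pooled SBP difference and 95% CI as reported in the abstract (mm Hg).
se, z, p = z_and_p_from_ci(-0.99, -3.02, 1.04)
print(se, z, p)
print("interval includes zero:", -3.02 <= 0.0 <= 1.04)
```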

Conclusions

DSS show a non-significant benefit in the management and control of hypertension (non-significant reduction of SBP). The paucity of well-designed studies on patient-related outcomes is a major hindrance that restricts interpretation for evaluating the role of DSS in secondary prevention. Future studies on DSS should (1) evaluate both physician performance and patient outcome measures and (2) integrate into the routine clinical workflow with a provision for decision support at the point of care.

20.

Background

Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real- or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users.

Methods

This paper describes the application of two machine learning algorithms (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory.

Results

High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts.
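The gap between F1-micro and F1-macro reported above arises whenever rare classes are never predicted. The sketch below computes both averages on a hypothetical, heavily imbalanced three-class example; the function name `f1_micro_macro` and the class labels "A", "B", "C" are illustrative and not taken from the study.

```python
def f1_micro_macro(y_true, y_pred):
    """Return (micro-averaged F1, macro-averaged F1) for multi-class labels."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    tp_sum = fp_sum = fn_sum = 0
    per_class_f1 = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)  # pools all decisions
    macro = sum(per_class_f1) / len(per_class_f1)        # equal weight per class
    return micro, macro


# Hypothetical case: a classifier that only ever predicts the majority class.
y_true = ["A"] * 96 + ["B"] * 2 + ["C"] * 2
y_pred = ["A"] * 100
micro, macro = f1_micro_macro(y_true, y_pred)
print(micro, macro)
```

Micro-averaging is dominated by the majority class, whereas macro-averaging penalises each unlearned class equally, which is consistent with the drop the abstract attributes to classes the algorithm failed to learn.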

Conclusion

The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.
