Similar literature
 Found 20 similar documents; search took 15 ms
1.
2.
3.
Predicting the final neuropeptide products from neuropeptide genes has been problematic because of the large number of enzymes responsible for their processing. The basic processing of 22 Aplysia californica prohormones, representing 750 cleavage sites, has been analyzed and statistically modeled using binary logistic regression analyses. Two models are presented that predict cleavage probabilities at basic residues based on prohormone sequence. The complex model has a correct classification rate of 97%, a sensitivity of 97%, and a specificity of 96% when tested on the Aplysia dataset.
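
The three figures quoted in this abstract (correct classification rate, sensitivity, specificity) all come from a 2 × 2 table of predicted versus observed cleavage. A minimal Python sketch of that arithmetic follows; the helper name `classification_metrics` and the counts are hypothetical and are not taken from the Aplysia data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (correct classification rate, sensitivity, specificity)."""
    total = tp + fp + tn + fn
    rate = (tp + tn) / total          # share of all sites classified correctly
    sensitivity = tp / (tp + fn)      # cleaved sites predicted as cleaved
    specificity = tn / (tn + fp)      # uncleaved sites predicted as uncleaved
    return rate, sensitivity, specificity

# Illustrative counts only; they are not the paper's results.
rate, sens, spec = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"rate={rate:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting all three matters because a high overall rate can hide poor sensitivity when cleaved sites are rare.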

4.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection was done using the t-statistic. A comparative study of the class prediction accuracy of three classifiers, namely support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranked as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
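
Ranking genes by a t-statistic, as described in this abstract, can be sketched in a few lines of Python. The abstract does not say which two-sample t-statistic was used, so the Welch (unequal variance) form below is an assumption, and the gene names, values and function names are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch form, assumed here)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(expr, labels, top_k):
    """expr maps gene name -> expression values aligned with labels (1 = tumor, 0 = normal)."""
    scores = {}
    for gene, values in expr.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(tumor, normal))
    # Largest absolute t first: the most strongly separated genes.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data, not the colon cancer benchmark.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.1, 5.3, 4.9, 2.0, 2.2, 1.9],   # strongly separated
    "geneB": [3.0, 3.4, 2.9, 3.1, 3.3, 3.0],   # barely separated
    "geneC": [1.0, 1.5, 1.2, 2.0, 2.4, 2.1],   # moderately separated
}
top = rank_genes(expr, labels, top_k=2)
print(top)
```

The selected genes would then be passed to whichever classifier is being compared; that step is omitted here.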

5.
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.

6.
OBJECTIVE--To investigate differences between hospitals in clinical management of patients admitted with fractured hip and to relate these to mortality at 90 days. DESIGN--A prospective audit of process and outcome of care based on interviews with patients, abstraction from records with standard proforma, and follow up at three months. Data were analysed with χ² test and forward stepwise regression modelling of mortality. SETTING--All eight hospitals in East Anglia with trauma orthopaedic departments. PATIENTS--580 consecutive patients admitted for fracture of neck of femur. MAIN OUTCOME MEASURE--Mortality at 90 days. RESULTS--Patients admitted to each hospital were similar with respect to age, sex, pre-existing illnesses, and activities of daily living before fracture. In all, 560 (97%) were treated surgically, by a range of grades of surgeon. Two hundred and sixty one patients (45%; range between hospitals 10-91%) received pharmaceutical thromboembolic prophylaxis, 502 (93%; 81-99%) perioperative antibiotic prophylaxis. The incidence of fatal pulmonary emboli differed between patients who received and those who did not receive prophylaxis against deep vein thrombosis (P = 0.001). Mortality at 90 days was 18%, differing significantly between hospitals (5-24%). One hospital had significantly better survival than the others (odds ratio 0.14; 95% confidence interval 0.04-0.48; P = 0.0016). CONCLUSIONS--No single factor or aspect of practice accounted for this protective effect. Lower mortality may be associated with the cumulative effects of several aspects of the organisation of treatment and the management of fracture of the hip, including thromboembolic pharmaceutical prophylaxis, antibiotic prophylaxis, and early mobilisation.

7.
Glioblastoma multiforme (GBM) is a highly malignant brain tumor. We explored the prognostic gene signature in 443 GBM samples by systematic bioinformatics analysis, using GSE16011 with microarray expression and corresponding clinical data from Gene Expression Omnibus as the training set. Meanwhile, patients from The Chinese Glioma Genome Atlas database (CGGA) were used as the test set and The Cancer Genome Atlas database (TCGA) as the validation set. Through Cox regression analysis, Kaplan-Meier analysis, t-distributed Stochastic Neighbor Embedding algorithm, clustering, and receiver operating characteristic analysis, a two-gene signature (GRIA2 and RYR3) associated with survival was selected in the GSE16011 dataset. The GRIA2-RYR3 signature divided patients into two risk groups with significantly different survival in the GSE16011 dataset (median: 0.72, 95% confidence interval [CI]: 0.64-0.98, vs median: 0.98, 95% CI: 0.65-1.61 years, logrank test P < .001), the CGGA dataset (median: 0.84, 95% CI: 0.70-1.18, vs median: 1.21, 95% CI: 0.95-2.94 years, logrank test P = .0017), and the TCGA dataset (median: 1.03, 95% CI: 0.86-1.24, vs median: 1.23, 95% CI: 1.04-1.85 years, logrank test P = .0064), validating the predictive value of the signature. The survival-predictive potency of the signature was independent of clinicopathological prognostic features in multivariable Cox analysis. We found that after transfection of U87 cells with small interfering RNA, GRIA2 and RYR3 influenced the biological behaviors of proliferation, migration, and invasion of glioblastoma cells. In conclusion, the two-gene signature was a robust prognostic model to predict GBM survival.

8.
A protective effect of breastfeeding on overweight (binary) has been reported by meta-analyses using logistic regression, whereas studies using linear regression and BMI (continuous) detected no significant association. To assess the relationship of these differences with different outcome classification, we compared results for linear, logistic, and quantile regression models in a cross-sectional data set of considerable size. Height, weight, and questionnaire data on 9,368 preschool children were collected during school-entry examinations in 1999 and 2002 in Bavaria, Southern Germany. We calculated multivariable linear, logistic, and quantile regression models with outcomes BMI, overweight, obesity, and BMI quantiles (as appropriate). Models considered the covariates breastfeeding (breastfed vs. never breastfed), gender, age, smoking in pregnancy, TV watching, maternal BMI, parental education, and early infant weight gain. No significant association was found in the linear regression model. In the logistic model, a significant association was observed for obesity (odds ratio: 0.72 (95% confidence interval (CI) 0.55, 0.94)). In quantile regression no significant point estimates were observed for the percentiles of 0.4-0.8. However, breastfeeding reduced the BMI of children having values on the 90th and 97th percentiles by -0.23 (95% CI -0.39, -0.07) and -0.26 (95% CI -0.45, -0.07) kg/m², respectively, on average. In contrast, breastfeeding was significantly associated with a low shift toward higher BMI values for BMI quantiles of 0.03 and from 0.1 to 0.3. The detection of associations between breastfeeding and childhood body composition might be related to the coding of the response variable (continuous or binary) and the statistical method used (linear, logistic, or quantile regression). Quantile regression should additionally be applied in such studies.

9.
BACKGROUND: Cytological smears obtained from the cervix are routinely examined under the microscope as part of screening programs for the early detection of cervical cancer. The aim of the present study was to investigate whether a simple feature extraction approach using only standard image processing techniques combined with a neural classifier would lead to acceptable results that might serve as a starting point for the development of a fully automated screening system. MATERIALS AND METHODS: Gray-value images of 106 cervical smears (512 x 512 pixels) divided into two groups--inconspicuous (57) and atypical (49)--by an experienced pathologist on the basis of the original smears were employed to evaluate the method. From these images, 31 features quantifying properties of either the cell nucleus or the cytoplasm were extracted. These features were categorized with three different architectures of a neural classifier: learning vector quantization (LVQ), multilayer perceptron (MLP) and a single perceptron. CONCLUSIONS: The results show a reclassification accuracy of about 91% for all three algorithms. Sensitivity was uniform at approximately 78%, and specificity varied between 75% and 91% in the leave-one-out evaluation. These very good results provide strong encouragement for further studies involving PAP scores and colour images.

10.
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacteria genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.

11.
Banerjee AK  M S  M N  Murty US 《Bioinformation》2010,4(10):456-462
Biological systems are highly organized, tightly coordinated and highly complex. The growth in secondary data generation and the progress of modern mining techniques provide an opportunity to discover hidden intra- and inter-relations within these non-linear datasets, which will help in understanding complex biological phenomena more efficiently. In this paper we report a comparative classification of pyruvate dehydrogenase protein sequences from bacterial sources based on 28 physicochemical parameters (such as bulkiness, hydrophobicity, total positively and negatively charged residues, α helices, β strands, etc.) and the compositions of the 20 amino acid types. Logistic, MLP (multilayer perceptron), SMO (sequential minimal optimization), RBFN (radial basis function network) and SL (simple logistic) methods were compared in this study. MLP was found to be the best method, with a maximum average accuracy of 88.20%. The same dataset was clustered using a 2 × 2 grid of a two-dimensional SOM (self-organizing map). Clustering analysis revealed the proximity of the unannotated sequences to the Mycobacterium and Synechococcus genera.

12.
Background: Computational tools have been widely used in the drug discovery process because they reduce time and cost. Predicting whether a protein is druggable is fundamental and crucial for the drug research pipeline. Sequence-based protein function prediction plays a vital role in many research areas. Training data, protein feature selection and machine learning algorithms are three indispensable elements that drive the success of the models. Methods: In this study, we tested the performance of different combinations of protein features and machine learning algorithms, based on the targets of FDA-approved small molecules, in druggable protein prediction. We also enlarged the dataset to include the targets of small molecules that were in experimental or clinical investigation. Results: We found that although the 146-d vector used by Li et al. with a neural network achieved the best training accuracy of 91.10%, overlapped 3-gram word2vec with logistic regression achieved the best prediction accuracy on the independent test set (89.55%) and on newly approved targets. Models were also trained on the enlarged dataset with targets of small molecules in experimental and clinical investigation. Unfortunately, the best training accuracy was only 75.48%. In addition, we applied our models to predict potential targets for reference in future studies. Conclusions: Our study indicates the potential of word2vec in the prediction of druggable proteins. The training dataset of druggable proteins should not be extended to targets that lack verification. The target prediction package can be found at https://github.com/pkumdl/target_prediction.

13.

Introduction

The purpose of this study was to explore a data set of patients with fibromyalgia (FM), rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) who completed the Revised Fibromyalgia Impact Questionnaire (FIQR) and its variant, the Symptom Impact Questionnaire (SIQR), for discriminating features that could be used to differentiate FM from RA and SLE in clinical surveys.

Methods

Frequencies and means were calculated to compare FM, RA and SLE patients on all pain sites and SIQR variables. Multiple regression analysis was then conducted to identify the pain sites and SIQR items that significantly predicted group membership. Thereafter, stepwise multiple regression analysis was performed to rank the variables by their statistical contribution to predicting group membership. Partial correlations assessed their unique contribution, and, last, two-group discriminant analysis provided a classification table.

Results

The data set contained information on the SIQR and also pain locations in 202 FM, 31 RA and 20 SLE patients. As the SIQR and pain locations did not differ much between the RA and SLE patients, they were grouped together (RA/SLE) to provide a more robust analysis. The combination of eight SIQR items and seven pain sites correctly classified 99% of FM and 90% of RA/SLE patients in a two-group discriminant analysis. The largest reported SIQR differences (FM minus RA/SLE) were seen for the parameters "tenderness to touch," "difficulty cleaning floors" and "discomfort on sitting for 45 minutes." Combining the SIQR and pain locations in a stepwise multiple regression analysis revealed that the seven most important predictors of group membership were mid-lower back pain (29%; 79% vs. 16%), tenderness to touch (11.5%; 6.86 vs. 3.02), neck pain (6.8%; 91% vs. 39%), hand pain (5%; 64% vs. 77%), arm pain (3%; 69% vs. 18%), outer lower back pain (1.7%; 80% vs. 22%) and sitting for 45 minutes (1.4%; 5.56 vs. 1.49).

Conclusions

A combination of two SIQR questions ("tenderness to touch" and "difficulty sitting for 45 minutes") plus pain in the lower back, neck, hands and arms may be useful in the construction of clinical questionnaires designed for patients with musculoskeletal pain. This combination provided the correct diagnosis in 97% of patients, with only 7 of 253 patients misclassified.

14.
15.
16.
Singh A  Beveridge AJ  Singh N 《PloS one》2011,6(3):e16804
Sporadic Creutzfeldt-Jakob-disease (sCJD) is a fatal neurodegenerative condition that escapes detection until autopsy. Recently, brain iron dyshomeostasis accompanied by increased transferrin (Tf) was reported in sCJD cases. The consequence of this abnormality on cerebrospinal-fluid (CSF) levels of Tf is uncertain. We evaluated the accuracy of CSF Tf, a 'new' biomarker, as a pre-mortem diagnostic test for sCJD when used alone or in combination with the 'current' biomarker total-tau (T-tau). Levels of total-Tf (T-Tf), isoforms of Tf (Tf-1 and Tf-β2), and iron saturation of Tf were quantified in CSF collected 0.3-36 months before death (duration) from 99 autopsy confirmed sCJD (CJD+) and 75 confirmed cases of dementia of non-CJD origin (CJD-). Diagnostic accuracy was estimated by non-parametric tests, logistic regression, and receiver operating characteristic (ROC) analysis. Area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values (PV), and likelihood ratios (LR) of each biomarker and biomarker combination were calculated. We report that relative to CJD-, CJD+ cases had lower median CSF T-Tf (125,7093 vs. 217,7893) and higher T-tau (11530 vs. 1266) values. AUC was 0.90 (95% confidence interval (CI), 0.85-0.94) for T-Tf, and 0.93 (95% CI, 0.89-0.97) for T-Tf combined with T-tau. With cut-offs defined to achieve a sensitivity of ~85%, T-Tf identified CJD+ cases with a specificity of 71.6% (95% CI, 59.1-81.7), positive LR of 3.0 (95% CI, 2.1-4.5), negative LR of 0.2 (95% CI, 0.1-0.3), and accuracy of 80.1%. The effect of patient age and duration was insignificant. T-Tf combined with T-tau identified CJD+ with improved specificity of 87.5% (95%CI, 76.3-94.1), positive LR of 6.8 (95% CI, 3.5-13.1), negative LR of 0.2 (95% CI, 0.1-0.3), positive-PV of 91.0%, negative-PV of 80.0%, and accuracy of 86.2%. Thus, CSF T-Tf, a new biomarker, when combined with the current biomarker T-tau, is a reliable pre-mortem diagnostic test for sCJD.  
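
The likelihood ratios quoted in this abstract follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below reworks the reported figures, assuming the "~85%" sensitivity is exactly 0.85; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# T-Tf alone: specificity 71.6% at an assumed sensitivity of 85%.
lr_pos_tf, lr_neg_tf = likelihood_ratios(0.85, 0.716)
# T-Tf combined with T-tau: specificity 87.5% at the same assumed sensitivity.
lr_pos_combo, lr_neg_combo = likelihood_ratios(0.85, 0.875)

print(round(lr_pos_tf, 1), round(lr_neg_tf, 1))
print(round(lr_pos_combo, 1), round(lr_neg_combo, 1))
```

Rounded to one decimal place, these agree with the positive LRs of 3.0 and 6.8 and the negative LR of 0.2 given in the abstract.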

17.
Significance testing for correlated binary outcome data   (cited by: 1; self-citations: 0; cited by others: 1)
B Rosner  R C Milton 《Biometrics》1988,44(2):505-512
Multiple logistic regression is a commonly used multivariate technique for analyzing data with a binary outcome. One assumption needed for this method of analysis is the independence of outcome for all sample points in a data set. In ophthalmologic data and other types of correlated binary data, this assumption is often grossly violated and the validity of the technique becomes an issue. A technique has been developed (Rosner, 1984) that utilizes a polychotomous logistic regression model to allow one to look at multiple exposure variables in the context of a correlated binary data structure. This model is an extension of the beta-binomial model, which has been widely used to model correlated binary data when no covariates are present. In this paper, a relationship is developed between the two techniques, whereby it is shown that use of ordinary logistic regression in the presence of correlated binary data can result in true significance levels that are considerably larger than nominal levels in frequently encountered situations. This relationship is explored in detail in the case of a single dichotomous exposure variable. In this case, the appropriate test statistic can be expressed as an adjusted chi-square statistic based on the 2 × 2 contingency table relating exposure to outcome. The test statistic is easily computed as a function of the ordinary chi-square statistic and the correlation between eyes (or more generally between cluster members) for outcome and exposure, respectively. This generalizes some previous results obtained by Koval and Donner (1987, in Festschrift for V. M. Joshi, I. B. MacNeill (ed.), Vol. V, 199-224.(ABSTRACT TRUNCATED AT 250 WORDS)

18.
Lameness is one of the costliest health problems, as well as a welfare concern, in dairy cows. However, it is difficult to detect cows that may be lame or that are at risk of becoming lame, e.g. within the next week or so. In this study, we investigated the ability of three machine learning algorithms, Naïve Bayes (NB), Random Forest (RF) and Multilayer Perceptron (MLP), to predict cases of lameness using milk production and conformation traits. The performance of these algorithms was compared with logistic regression (LR) as the gold standard approach for binary classification. We had a total of 2 535 lameness scores (2 248 sound and 287 unsound) and 29 predictor features from nine dairy herds in Australia to predict lameness incidence. Training was done on 80% of the data within each herd, with the remainder used as the validation set. Our results indicated that, in terms of area under the receiver operating characteristic curve, there were negligible differences between LR (0.67) and NB (0.66), while MLP (0.62) and RF (0.61) underperformed compared to the other two methods. However, the F1-score of NB (27%) was higher than that of LR (1%), suggesting that NB could potentially be a more reliable method for the prediction of lameness in practice, provided that enough relevant data are available for proper training, which was a limitation in this study. Considering the small size of our dataset, the lack of information about environmental conditions prior to the incidence of lameness, management practices and farm information, and the short time gap between production records and lameness scoring, this study provides a proof of concept for using machine learning predictive models to predict lameness before it occurs, and such models may thus become a valuable decision support system for better lameness management in precision dairy farming.
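
The gap between similar AUC values and very different F1-scores reported here is typical of imbalanced data, where a model that rarely predicts the minority class can still rank cases reasonably well. The Python sketch below shows only the F1 arithmetic; the counts are invented and do not reproduce the study's validation set, and the function name is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (unsound) class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented counts for 57 truly unsound cows in a validation set.
# A conservative model flags almost no cow as unsound.
f1_conservative = f1_score(tp=1, fp=1, fn=56)
# A more liberal model flags many cows, finding more true cases.
f1_liberal = f1_score(tp=20, fp=60, fn=37)

print(f"{f1_conservative:.3f} {f1_liberal:.3f}")
```

Because F1 ignores true negatives, it penalizes the conservative model heavily even though most of its predictions (sound cows) are correct.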

19.
20.
OBJECTIVE: To determine the observer variability in reporting fibroadenoma of the breast by fine needle aspiration (FNA) and to review the cytomorphological features of the lesion with cytohistological correlation. METHODS: Retrospective analysis of FNA smears from 110 cases diagnosed as fibroadenoma of which surgical pathology follow-up was available in 33. Two pathologists were asked to categorize smears from 67 cases of breast lesions while blinded to the clinical finding as fibroadenoma, epithelial hyperplasia (usual and atypical) and malignant. All fibroadenoma (33) and cancer (15) cases were biopsy-proven. The same set of slides was re-circulated to one of the pathologists, and his first and second round results were compared. RESULTS: Pre-review cytohistological correlation was attained in 32 of 33 cases of fibroadenoma (97%). The overall agreement between the two observers was 87% [Kappa = 0.74, 95% confidence interval (CI) 0.72-0.76]. Cytohistological correlation was achieved in 26 of 33 (79%) cases. Intra-observer agreement was 91% (Kappa = 0.82, 95% CI 0.89-0.93) with cytohistological correlation in 29 of 33 (87%) cases. Causes of diagnostic errors included marked dissociation, pleomorphism, poorly cellular smears from hyalinized fibroadenoma, lactational changes and apocrine metaplasia with cystic changes. Multinucleated giant cells were frequently encountered in FNA smears from fibroadenoma (31.8%), but in none of the lumpectomy specimens. Their histiocytic nature was suggested by immunohistochemistry. CONCLUSION: FNA was a highly sensitive method for the diagnosis of fibroadenoma. Current cytological criteria were reliable and gave high inter- and intra-observer reproducibility.
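
The Kappa values in this abstract measure agreement beyond what chance alone would give: kappa = (observed agreement - expected agreement) / (1 - expected agreement). A minimal Python sketch of Cohen's kappa for two raters follows; the abstract does not state which kappa variant was used, so the unweighted form is an assumption, and the ten slide labels are invented.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Invented labels: FA = fibroadenoma, EH = epithelial hyperplasia, CA = malignant.
rater_1 = ["FA", "FA", "FA", "FA", "FA", "EH", "EH", "EH", "CA", "CA"]
rater_2 = ["FA", "FA", "FA", "FA", "EH", "EH", "EH", "EH", "CA", "CA"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"{kappa:.3f}")
```

In this toy case the raters agree on 9 of 10 slides (90%), but kappa is lower because some of that agreement would be expected by chance.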
