Similar literature
Found 20 similar documents (search time: 182 ms)
1.
2.
S.B. Akben 《IRBM》2018,39(5):353-358

Background

Chronic kidney disease (CKD) is a disorder associated with breakdown of kidney structure and function. CKD can be diagnosed in its early stage only by experienced nephrologists and urologists (medical experts) using the disease history, symptoms and laboratory tests. There are few studies related to the automatic diagnosis of CKD in the literature. However, these methods are not adequate to help the medical experts.

Methods

In this study, a new method is proposed to automatically diagnose chronic kidney disease in its early stage. The method aims to support medical diagnosis using the results of urine tests, blood tests and disease history. Classification algorithms were used as the data mining methods. The CKD data were first pre-processed using the K-Means clustering method. Then, the classification methods (KNN, SVM, and Naïve Bayes) were applied to the pre-processed data to diagnose CKD.

Results

The highest success rate obtained by the classification methods was 97.8% (98.2% for ages 35 and older). This result shows that data mining methods are useful for the automatic diagnosis of CKD in its early stage.

Conclusion

A new automatic early-stage CKD diagnosis method was proposed to help medical doctors. The highest diagnostic success rate was obtained by using specific gravity, albumin, sugar and red blood cells together as attributes. The relation between the success rate of the automatic diagnosis method and age was also identified.

3.

Background

By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes and other machine learning algorithms we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, or those folding to poorly- or non-designable conformations.

Results

First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We have randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.
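The enumerate-and-thread procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two contact maps, the chain length of 4, and the energy function (minus one per H-H contact) are all assumptions chosen only to keep the example small.

```python
from itertools import product


def energy(sequence, contacts):
    """Toy energy (an assumption): -1 for every H-H contact."""
    return -sum(1 for i, j in contacts if sequence[i] == "H" and sequence[j] == "H")


def designability(length, conformations):
    """Count, per conformation, the sequences with a unique lowest energy on it."""
    counts = {name: 0 for name in conformations}
    for seq in product("HP", repeat=length):
        energies = {name: energy(seq, c) for name, c in conformations.items()}
        best = min(energies.values())
        winners = [name for name, e in energies.items() if e == best]
        if len(winners) == 1:  # degenerate ground states are not counted as folding
            counts[winners[0]] += 1
    return counts


# Hypothetical contact maps (pairs of non-bonded neighbouring residues).
toy_conformations = {
    "A": [(0, 3)],
    "B": [(0, 2), (1, 3)],
}
result = designability(4, toy_conformations)
```

In this toy setting conformation "B" attracts more sequences than "A", so it would be the more designable of the two.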

Conclusion

By using these machine learning algorithms with ten-fold cross-validation we are able to classify the two classes of sequences with high accuracy – in some cases exceeding 95%.
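For readers unfamiliar with ten-fold cross-validation, the splitting step can be sketched as follows. The function name `k_fold_indices`, the shuffling seed and the sample count are illustrative assumptions; the abstract does not describe how the folds were actually built.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every sequence is scored once by a model that never saw it during training.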

4.

Objectives

Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could be beneficial for improved risk stratification. In this paper, we investigate important factors impacting the prediction, using several machine learning methods, of rapid progression of carotid intima-media thickness in impaired glucose tolerance (IGT) participants.

Methods

In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent carotid intima-media thickness (CIMT) ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μM change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μM change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher Score-based feature selection method to identify the most effective variables and then proposed a probabilistic Bayes-based learning method for the prediction. Comparison of the methods and factors was conducted using area under the receiver operating characteristic curve (AUC) analyses and Brier score.
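Two of the quantities named above are easy to state in code. The sketch below uses one common definition of the Fisher score (between-class scatter over within-class scatter for a single feature) and the standard Brier score; whether the study used exactly this Fisher score variant is an assumption, and the numbers are made up for illustration.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Between-class over within-class scatter for one feature."""
    overall = fmean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within else float("inf")


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return fmean([(p - o) ** 2 for p, o in zip(probabilities, outcomes)])


# Illustrative values only, not data from the ACT NOW study.
score = fisher_score([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
brier = brier_score([0.9, 0.2], [1, 0])
```

A higher Fisher score marks a feature that separates the classes better, while a lower Brier score marks better-calibrated probability predictions.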

Results

The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, the performance of Naïve Bayes was the best (AUC 0.797, Brier score 0.085) compared to multilayer perceptron (0.729, 0.086) and random forest (0.642, 0.10). The results also show that feature selection has a significant positive impact on the data prediction performance.

Conclusions

By dealing with multi-modal data, the proposed learning methods show effectiveness in predicting prediabetics at risk for rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.

5.

Background

As individual naïve CD4 T lymphocytes circulate in the body after emerging from the thymus, they are likely to have individually varying microenvironmental interactions even in the absence of stimulation via specific target recognition. It is not clear if these interactions result in alterations in their activation, survival and effector programming. Naïve CD4 T cells show unimodal distribution for many phenotypic properties, suggesting that the variation is caused by intrinsic stochasticity, although underlying variation due to subsets created by different histories of microenvironmental interactions remains possible. To explore this possibility, we began examining the phenotype and functionality of naïve CD4 T cells differing in a basic unimodally distributed property, the CD4 levels, as well as the causal origin of these differences.

Results

We examined separated CD4hi and CD4lo subsets of mouse naïve CD4 cells. CD4lo cells were smaller with higher CD5 levels and lower levels of the dual-specific phosphatase (DUSP)6-suppressing micro-RNA miR181a, and responded poorly with more Th2-skewed outcomes. Human naïve CD4lo and CD4hi cells showed similar differences. Naïve CD4lo and CD4hi subsets of thymic single-positive CD4 T cells did not show differences whereas peripheral naïve CD4lo and CD4hi subsets of T cell receptor (TCR)-transgenic T cells did. Adoptive transfer-mediated parking of naïve CD4 cells in vivo lowered CD4 levels, increased CD5 and reactive oxygen species (ROS) levels and induced hyporesponsiveness in them, dependent, at least in part, on availability of major histocompatibility complex class II (MHCII) molecules. ROS scavenging or DUSP inhibition ameliorated hyporesponsiveness. Naïve CD4 cells from aged mice showed lower CD4 levels and cell sizes, higher CD5 levels, and hyporesponsiveness and Th2-skewing reversed by DUSP inhibition.

Conclusions

Our data show that, underlying a unimodally distributed property, the CD4 level, there are subsets of naïve CD4 cells that vary in the time spent in the periphery receiving MHCII-mediated signals and show resultant alteration of phenotype and functionality via ROS and DUSP activity. Our findings also suggest the feasibility of potential pharmacological interventions for improved CD4 T cell responses during vaccination of older people via either anti-oxidant or DUSP inhibitor small molecules.

6.
Pluripotency is a unique property of stem cells that allows them to differentiate into all types of adult cells or maintain the self-renewal property. PluriPred predicts whether a protein is involved in pluripotency from primary protein sequence using manually curated pluripotent proteins as training datasets. Machine learning techniques (MLTs) such as Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and the sequence alignment technique BLAST were used in our study. The combination of SVM and PSI-BLAST was our proposed best model, which obtained a sensitivity of 77.40%, a specificity of 79.72%, an accuracy of 79.2%, and an area under the ROC curve of 0.82 using 5-fold cross-validation. Furthermore, PluriPred gives the confidence of the prediction from the training dataset's SVM score distribution and the p-value from BLAST. We validated our proposed model against other existing high-throughput studies using blind/independent datasets. Using PluriPred, 233 novel core and 323 novel extended core pluripotent proteins from the mouse proteome, and 167 novel core and 385 extended core pluripotent proteins from the human proteome, were predicted with high confidence. The Web application of PluriPred is available from bicresources.jcbose.ac.in/ssaha4/pluripred/. Many pluripotent genes/proteins take part in protein-protein networks associated with stem cell, cancer, and developmental biology, and we believe that PluriPred will help in this research.
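Sensitivity, specificity and accuracy, as reported above, all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the PluriPred evaluation.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=40, fp=20, tn=80, fn=10)
```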

7.

Background

Long noncoding RNAs (lncRNAs) are widely involved in the initiation and development of cancer. Although some computational methods have been proposed to identify cancer-related lncRNAs, there is still a need to improve prediction accuracy and efficiency. In addition, rapidly updated cancer data and the discovery of new mechanisms leave room for improving cancer-related lncRNA prediction algorithms. In this study, we introduce CRlncRC, a novel Cancer-Related lncRNA Classifier that integrates manifold features with five machine-learning techniques.

Results

CRlncRC was built on the integration of four categories of features: genomic, expression, epigenetic and network. Five learning techniques were used to develop an effective classification model: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR) and K-Nearest Neighbors (KNN). Using ten-fold cross-validation, we showed that RF is the best model for classifying cancer-related lncRNAs (AUC = 0.82). The feature importance analysis indicated that epigenetic and network features play key roles in the classification. In addition, compared with other existing classifiers, CRlncRC exhibited better performance in both sensitivity and specificity. We further applied CRlncRC to lncRNAs from the TANRIC (The Atlas of non-coding RNA in Cancer) dataset, and identified 121 cancer-related lncRNA candidates. These potential cancer-related lncRNAs showed cancer-related indications, and many of them are supported by convincing literature.

Conclusions

Our results indicate that CRlncRC is a powerful method for identifying cancer-related lncRNAs. Machine-learning-based integration of multiple features, especially epigenetic and network features, contributed greatly to cancer-related lncRNA prediction. RF outperforms the other learning techniques in model sensitivity and specificity. In addition, using the CRlncRC method, we predicted a set of cancer-related lncRNAs, all of which displayed a strong relevance to cancer and provide a valuable starting point for further functional studies of cancer-related lncRNAs.

8.
9.
Personalized medicine aims to identify those patients who have good or poor prognosis for overall disease outcomes or therapeutic efficacy for a specific treatment. A well-established approach is to identify a set of biomarkers using statistical methods with a classification algorithm to identify patient subgroups for treatment selection. However, there are potential false positives and false negatives in classification, resulting in incorrect patient treatment assignment. In this paper, we propose a hybrid mixture model taking uncertainty in class labels into consideration, where the class labels are modeled by a Bernoulli random variable. An EM algorithm was developed to estimate the model parameters, and a parametric bootstrap method was used to test the significance of the predictive variables that were associated with subgroup memberships. Simulation experiments showed that the proposed method had, on average, higher accuracy in identifying the subpopulations than the Naïve Bayes classifier and logistic regression. A breast cancer dataset was analyzed to illustrate the proposed hybrid mixture model.

10.
Glioblastoma multiforme (GBM) or grade IV astrocytoma is the most common and lethal adult malignant brain tumor. The present study was conducted to investigate the alterations in the serum proteome in GBM patients compared to healthy controls. Comparative proteomic analysis was performed employing classical 2DE and 2D‐DIGE combined with MALDI TOF/TOF MS and results were further validated through Western blotting and immunoturbidimetric assay. Comparison of the serum proteome of GBM and healthy subjects revealed 55 differentially expressed and statistically significant (p <0.05) protein spots. Among the identified proteins, haptoglobin, plasminogen precursor, apolipoprotein A‐1 and M, and transthyretin are very significant due to their functional consequences in glioma tumor growth and migration, and could further be studied as glioma biomarkers and grade‐specific protein signatures. Analysis of the lipoprotein pattern indicated elevated serum levels of cholesterol, triacylglycerol, and low‐density lipoproteins in GBM patients. Functional pathway analysis was performed using multiple software including ingenuity pathway analysis (IPA), protein analysis through evolutionary relationships (PANTHER), database for annotation, visualization and integrated discovery (DAVID), and GeneSpring to investigate the biological context of the identified proteins, which revealed the association of candidate proteins in a few essential physiological pathways such as intrinsic prothrombin activation pathway, plasminogen activating cascade, coagulation system, glioma invasiveness signaling, and PI3K signaling in B lymphocytes. 
A subset of the differentially expressed proteins was applied to build statistical sample class prediction models for discrimination of GBM patients and healthy controls employing partial least squares discriminant analysis (PLS‐DA) and other machine learning methods such as support vector machine (SVM), Decision Tree and Naïve Bayes, and excellent discrimination between GBM and control groups was accomplished.

11.

Background

High resolution mass spectrometry has been employed to rapidly and accurately type and subtype influenza viruses. The detection of signature peptides with unique theoretical masses enables the unequivocal assignment of the type and subtype of a given strain. This analysis has, to date, required the manual inspection of mass spectra of whole virus and antigen digests.

Results

A computer algorithm, FluTyper, has been designed and implemented to achieve the automated analysis of MALDI mass spectra recorded for proteolytic digests of the whole influenza virus and antigens. FluTyper incorporates the use of established signature peptides and newly developed naïve Bayes classifiers for four common influenza antigens, hemagglutinin, neuraminidase, nucleoprotein, and matrix protein 1, to type and subtype the influenza virus based on their detection within proteolytic peptide mass maps. Theoretical and experimental testing of the classifiers demonstrates their applicability at protein coverage rates normally achievable in mass mapping experiments. The application of FluTyper to whole virus and antigen digests of a range of different strains of the influenza virus is demonstrated.

Conclusions

The FluTyper algorithm facilitates the rapid and automated typing and subtyping of the influenza virus from mass spectral data. The newly developed naïve Bayes classifiers increase the confidence of influenza virus subtyping, especially where signature peptides are not detected. FluTyper is expected to popularize the use of mass spectrometry to characterize influenza viruses.

12.
Ahmed A  Gohlke H 《Proteins》2006,63(4):1038-1051
The development of a two-step approach for multiscale modeling of macromolecular conformational changes is based on recent developments in rigidity and elastic network theory. In the first step, static properties of the macromolecule are determined by decomposing the molecule into rigid clusters by using the graph-theoretical approach FIRST and an all-atom representation of the protein. In this way, rigid clusters are not limited to consist of residues adjacent in sequence or secondary structure elements as in previous studies. Furthermore, flexible links between rigid clusters are identified and can be modeled as such subsequently. In the second step, dynamical properties of the molecule are revealed by the rotations-translations of blocks approach (RTB) using an elastic network model representation of the coarse-grained protein. In this step, only rigid body motions are allowed for rigid clusters, whereas links between them are treated as fully flexible. The approach was tested on a data set of 10 proteins that showed conformational changes on ligand binding. For efficiency, coarse-graining the protein results in a remarkable reduction of memory requirements and computational times by factors of 9 and 27 on average and up to 25 and 125, respectively. For accuracy, directions and magnitudes of motions predicted by our approach agree well with experimentally determined ones, despite embracing in extreme cases >50% of the protein into one rigid cluster. In fact, the results of our method are in general comparable with when no or a uniform coarse-graining is applied; and the results are superior if the movement is dominated by loop or fragment motions. This finding indicates that explicitly distinguishing between flexible and rigid regions is advantageous when using a simplified protein representation in the second step. 
Finally, motions of atoms in rigid clusters are also well predicted by our approach, which points to the need to consider mobile protein regions in addition to flexible ones when modeling correlated motions.

13.

Background

Intrinsically Disordered Proteins (IDPs) lack an ordered three-dimensional structure and are enriched in various biological processes. The Molecular Recognition Features (MoRFs) are functional regions within IDPs that undergo a disorder-to-order transition on binding to a partner protein. Identifying MoRFs in IDPs using computational methods is a challenging task.

Methods

In this study, we introduce hidden Markov model (HMM) profiles to accurately identify the location of MoRFs in disordered protein sequences. Using windowing technique, HMM profiles are utilised to extract features from protein sequences and support vector machines (SVM) are used to calculate a propensity score for each residue. Two different SVM kernels with high noise tolerance are evaluated with a varying window size and the scores of the SVM models are combined to generate the final propensity score to predict MoRF residues. The SVM models are designed to extract maximal information between MoRF residues, its neighboring regions (Flanks) and the remainder of the sequence (Others).
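The windowing step mentioned above can be sketched as follows. This is an assumed, generic form of sliding-window feature extraction: the zero padding at the sequence ends, the odd window size and the function name `window_features` are illustrative choices and are not taken from the paper.

```python
def window_features(profile, window_size):
    """Return one flattened feature vector per residue.

    `profile` holds one row of per-residue scores (for example, an HMM
    profile row); `window_size` is assumed to be odd so the window is
    centred on the residue.
    """
    half = window_size // 2
    width = len(profile[0])
    padding = [[0.0] * width for _ in range(half)]  # zero-pad both ends
    padded = padding + profile + padding
    features = []
    for i in range(len(profile)):
        window = padded[i:i + window_size]
        features.append([x for row in window for x in row])
    return features


# Toy one-column profile, for illustration only.
vectors = window_features([[1.0], [2.0], [3.0]], window_size=3)
```

Each resulting vector could then be passed to a per-residue classifier such as an SVM to obtain a propensity score.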

Results

To evaluate the proposed method, its performance was compared to that of other MoRF predictors; MoRFpred and ANCHOR. The results show that the proposed method outperforms these two predictors.

Conclusions

Using HMM profile as a source of feature extraction, the proposed method indicates improvement in predicting MoRFs in disordered protein sequences.

14.

Background

The prediction of calmodulin-binding (CaM-binding) proteins plays a very important role in the fields of biology and biochemistry, because the calmodulin protein binds and regulates a multitude of protein targets affecting different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short-linear motifs (SLiMs), on the other hand, have been effectively used as features for analyzing protein-protein interactions, though their properties have not been utilized in the prediction of CaM-binding proteins.

Results

We propose a new method for the prediction of CaM-binding proteins based on both the total and average scores of known and new SLiMs in protein sequences using a new scoring method called sliding window scoring (SWS) as features for the prediction module. A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins have been obtained and used for testing the proposed model. The motif generation tool, Multiple EM for Motif Elucidation (MEME), has been used to obtain new motifs from each of the positive and negative datasets individually (the SM approach) and from the combined negative and positive datasets (the CM approach). Moreover, the wrapper criterion with random forest for feature selection (FS) has been applied followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF).

Conclusions

Our proposed method shows very good prediction results and demonstrates that the information contained in SLiMs is highly relevant for predicting CaM-binding proteins. Further, three new CaM-binding motifs have been computationally selected and biologically validated in this study, and can be used for predicting CaM-binding proteins.

15.

Background  

Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).

16.
Xu F  Li G  Zhao C  Li Y  Li P  Cui J  Deng Y  Shi T 《BMC genomics》2010,11(Z2):S2

Background

Many essential cellular processes, such as cellular metabolism, transport and most regulatory mechanisms, rely on physical interactions between proteins. Genome-wide protein interactome networks of yeast, human and several other animal organisms have already been established, but this kind of network remains to be established for plants.

Results

We first predicted protein-protein interactions in Arabidopsis thaliana with several methods, including ortholog, SSBP, gene fusion, gene neighbor, phylogenetic profile, coexpression and protein domain, and then used a Naïve Bayesian approach to integrate the results of these methods with text mining data to build a genome-wide protein interactome network. Furthermore, we used GO enrichment analysis, pathway data and published literature to validate our network; this validation shows the feasibility of using the network to predict protein function and for other purposes.
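A common way to integrate independent evidence sources in a naïve Bayesian framework is to multiply their likelihood ratios into the prior odds. The sketch below shows that calculation; the prior and the likelihood ratios are invented for illustration, and the abstract does not state that the authors used exactly this formulation.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence via posterior odds = prior odds * product of LRs."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers: a 1% prior that two proteins interact, and two
# evidence sources with likelihood ratios 10 and 5.
p_interact = posterior_probability(0.01, [10, 5])
```

The naïve independence assumption is what allows the ratios to be multiplied; correlated evidence sources would overstate the posterior.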

Conclusions

Our interactome is a comprehensive genome-wide network for the plant Arabidopsis thaliana, and provides a rich resource for researchers in related fields to study protein function, molecular interactions and potential mechanisms under different conditions.

17.

Background

Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates) small n (number of observations) problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context.

Methods

This study shows two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability.

Results

The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals; its phi correlation was up to 81% greater than that of Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data.
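The phi correlation used above to compare classifiers is the Pearson correlation between two binary variables, computed from a 2×2 table. A minimal sketch with made-up counts:

```python
from math import sqrt


def phi_coefficient(tp, fp, tn, fn):
    """Phi correlation between predicted and observed binary classes."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, not taken from the study.
phi = phi_coefficient(tp=40, fp=10, tn=40, fn=10)
```

Values near 1 indicate that predicted and observed classes agree closely, while values near 0 indicate no better than chance agreement.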

Conclusions

The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine learning showed some advantages over Bayesian regressions. Boosting with a pseudo Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent and an initial evaluation of different methods is recommended to deal with a particular problem.

18.

Real-time accurate traffic congestion prediction can enable Intelligent traffic management systems (ITMSs) that replace traditional systems to improve the efficiency of traffic and reduce traffic congestion. The ITMS consists of three main layers, which are: Internet of Things (IoT), edge, and cloud layers. Edge can collect real-time data from different routes through IoT devices such as wireless sensors, and then it can compute and store this collected data before transmitting them to the cloud for further processing. Thus, an edge is an intermediate layer between IoT and cloud layers that can receive the transmitted data through IoT to overcome cloud challenges such as high latency. In this paper, a novel real-time traffic congestion prediction strategy (TCPS) is proposed based on the collected data in the edge’s cache server at the edge layer. The proposed TCPS contains three stages, which are: (i) real-time congestion prediction (RCP) stage, (ii) congestion direction detection (CD2) stage, and (iii) width change decision (WCD) stage. The RCP aims to predict traffic congestion based on the causes of congestion in the hotspot using a fuzzy inference system. If there is congestion, the CD2 stage is used to detect the congestion direction based on the predictions from the RCP by using the Optimal Weighted Naïve Bayes (OWNB) method. The WCD stage aims to prevent the congestion occurrence in which it is used to change the width of changeable routes (CR) after detecting the direction of congestion in CD2. The experimental results have shown that the proposed TCPS outperforms other recent methodologies. TCPS provides the highest accuracy, precision, and recall. Besides, it provides the lowest error, with values equal to 95%, 74%, 75%, and 5% respectively.


19.

The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers—decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron—were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
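One plausible reading of an n-star consensus is that a residue is labeled as linker when at least n of the classifiers agree. The sketch below implements that reading only; it is an assumption about the scheme, not the PDP-CON source code, and the votes are invented.

```python
def n_star_consensus(predictions, n):
    """Label a residue 1 (linker) when at least n classifiers predict 1.

    `predictions` is a list with one 0/1 label list per classifier,
    all of the same length (one entry per residue).
    """
    return [1 if sum(votes) >= n else 0 for votes in zip(*predictions)]


# Hypothetical per-residue outputs of three classifiers.
classifier_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
two_star = n_star_consensus(classifier_outputs, n=2)
one_star = n_star_consensus(classifier_outputs, n=1)
```

Raising n trades recall for precision, since fewer residues satisfy the stricter agreement requirement.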


20.
《IRBM》2022,43(5):434-446
Objective

The initial principal task of a Brain-Computer Interfacing (BCI) research is to extract the best feature set from a raw EEG (Electroencephalogram) signal so that it can be used for the classification of two or multiple different events. The main goal of the paper is to develop a comparative analysis among different feature extraction techniques and classification algorithms.

Materials and methods

In this present investigation, four different methodologies have been adopted to classify the recorded MI (motor imagery) EEG signal, and their comparative study has been reported. Haar Wavelet Energy (HWE), Band Power, Cross-correlation, and Spectral Entropy (SE) based Cross-correlation feature extraction techniques have been considered to obtain the necessary features set from the raw EEG signals. Four different machine learning algorithms, viz. LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis), Naïve Bayes, and Decision Tree, have been used to classify the features.

Results

The best average classification accuracies are 92.50%, 93.12%, 72.26%, and 98.71% using the four methods. Further, these results have been compared with some recent existing methods.

Conclusion

The comparative results indicate a significant accuracy level performance improvement of the proposed methods with respect to the existing one. Hence, this presented work can guide to select the best feature extraction method and the classifier algorithm for MI-based EEG signals.
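As an illustration of one of the feature types named above, the following sketch computes a Haar wavelet energy feature: the energy of the detail coefficients at each level of an orthonormal Haar decomposition. The number of levels, the toy signal and the exact feature definition are assumptions; the paper's own HWE definition may differ.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]
    detail = [(a - b) / sqrt(2) for a, b in pairs]
    return approx, detail


def haar_wavelet_energies(signal, levels):
    """Energy of the detail coefficients at each decomposition level."""
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    return energies


# Toy signal standing in for a short EEG segment.
toy_signal = [4.0, 2.0, 6.0, 6.0]
energies = haar_wavelet_energies(toy_signal, levels=2)
```

Because the transform is orthonormal, the detail energies plus the final approximation energy add up to the energy of the original signal, which makes the per-level values directly comparable.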
