Similar articles
20 similar articles found (search time: 31 ms)
1.
We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data-driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model only uses a small fraction of the flow cytometry measurements, making our solution highly economical.
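The simplified variant described in this abstract (average marker intensities fed to a regularized logistic regression) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the plain gradient-descent optimizer, the choice of an L2 penalty, and all hyperparameters are assumptions made for the example.

```python
import math


def mean_intensities(events):
    """Average each marker over all cells (rows) of one patient sample."""
    n = len(events)
    return [sum(cell[j] for cell in events) / n for j in range(len(events[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss plus (l2 / 2) * ||w||^2.

    The optimizer and hyperparameters are illustrative assumptions.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0                    # the intercept is not penalised
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_score(w, b, x):
    """Confidence score in (0, 1) that the sample is AML-positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Each patient is reduced to one feature vector by `mean_intensities`, so the classifier never sees individual cells; that is the sense in which only a small summary of the measurements is used.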

2.
Qiu P 《PloS one》2012,7(5):e37038
Flow cytometry provides multi-dimensional data at the single-cell level. Such data contain information about the cellular heterogeneity of bulk samples, making it possible to correlate single-cell features with phenotypic properties of bulk tissues. Predicting phenotypes from single-cell measurements is a difficult challenge that has not been extensively studied. The 6th Dialogue for Reverse Engineering Assessments and Methods (DREAM6) invited the research community to develop solutions to a computational challenge: classifying acute myeloid leukemia (AML) positive patients and healthy donors using flow cytometry data. DREAM6 provided flow cytometry data for 359 normal and AML samples, and the class labels for half of the samples. Researchers were asked to predict the class labels of the remaining half. This paper describes one solution that was constructed by combining three algorithms: spanning-tree progression analysis of density-normalized events (SPADE), earth mover's distance, and a nearest-neighbor classifier called Relief. This solution was among the top-performing methods that achieved 100% prediction accuracy.
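To illustrate how a distribution distance can drive a nearest-neighbor decision, here is a heavily simplified sketch. It is not the pipeline from the paper: SPADE and Relief are omitted, the earth mover's distance is computed only for one-dimensional samples of equal size (where it reduces to the mean absolute difference of the sorted values), and the function names are hypothetical.

```python
def emd_1d(a, b):
    """Earth mover's distance between two equal-sized 1-D samples.

    For equal sizes and unit weights, the optimal transport plan matches
    the i-th smallest value of `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def nearest_label(query, labelled):
    """Return the label of the labelled sample closest to `query` in EMD.

    `labelled` is a list of (sample, label) pairs.
    """
    return min(labelled, key=lambda item: emd_1d(query, item[0]))[1]
```

A call such as `nearest_label(new_sample, [(s1, "healthy"), (s2, "AML")])` assigns the label of whichever training sample has the most similar marker distribution.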

3.
BACKGROUND: Recent advances in flow cytometry have resulted in the development of reliable techniques for performing polychromatic (5-17 color) flow cytometry analysis. However, the data reduction and analysis involved in the resolution of hundreds of possible cellular subphenotypes identified, using a single polychromatic flow cytometry staining panel, presents a major obstacle to the successful application of this technology. METHODS: To generate two distinct collections of T cell populations with differentially expressed surface markers, cryopreserved lymph node cells from 5 melanoma patients vaccinated with the modified gp100(209-2M) melanoma peptide were stimulated with cognate peptide and cultured in either IL-21 + low-dose IL-2 or IL-15 + low-dose IL-2. In vitro stimulated (IVS) cells were interrogated using 8-color flow cytometry. Data were analyzed using Winlist Hyperlog and FCOM software, and 32 T cell subsets were resolved for each culture condition. Hierarchical clustering analysis was applied to the relative percentages of each subphenotype for both IVS conditions to determine if unique cell surface marker expression signatures were produced for each IVS culture. RESULTS: Sequential data analysis using Hyperlog and FCOM demonstrated that lymphocytes cultured in IL-21 + IL-2 had a distinctively different set of subphenotype signatures compared to cells grown in IL-15 + IL-2 for all 5 patients. Importantly, subsequent cluster analysis of all 32 subphenotype frequencies in each IVS test condition for all 5 patients reproducibly demonstrated that cellular subphenotypes produced after IL-21 + IL-2 IVS partitioned separately from subphenotypes produced by IL-15 + IL-2 IVS. 
CONCLUSIONS: The integrated sequential use of Hyperlog and FCOM software with cluster analysis algorithms for the reduction and analysis of polychromatic flow cytometry data produces an effective, rapid technique for the assessment of complex patterns of subphenotype expression between and within multiple test samples. This approach to data analysis may enhance the use of polychromatic flow cytometry for both research and clinical applications.

4.
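A review of machine learning-based modeling and analysis of gut microbiota data   Total citations: 1 (self-citations: 0; citations by others: 1)
The human gut microbiota is closely linked to human health and disease. Modeling and analyzing metagenomic data from the gut microbiota is important both for scientific research and for practical applications in disease prediction and diagnosis. From the perspective of big data analysis and machine learning, this paper reviews the principles and procedures of algorithms for modeling, analyzing, and making predictions from human gut microbiota data, together with representative research applications. The aim is to advance research on gut microbiota analysis, to explore effective ways of combining machine learning algorithms with such analysis, to inform the development of new diagnostic and therapeutic approaches based on gut microbiota data, and to promote the development of precision medicine in China.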

5.
Biomarkers which can identify Diffuse Large B-Cell Lymphoma (DLBCL) likely to be refractory to first-line therapy are essential for selecting this population prior to therapy initiation to offer alternate therapeutic options that can improve prognosis. We tested the ability of a CT-based radiomics approach with machine learning to predict Primary Treatment Failure (PTF)-DLBCL from initial imaging evaluation. Twenty-six refractory patients were matched to 26 non-refractory patients, yielding 180 lymph nodes for analysis. Manual 3D delineation of the total node volume was performed by two independent readers to test the reproducibility. Then, 1218 hand-crafted radiomic features were extracted. The Random Forests machine learning approach was used as a classifier for constructing the prediction models. Seventy percent of the nodes were randomly assigned to a training set and the remaining 30% were assigned to an independent test set. The final model was tested on the dataset from the 2 readers, showing a mean accuracy, sensitivity and specificity of 73%, 62% and 82%, respectively, for distinguishing between refractory and non-refractory patients. The area under the receiver operating characteristic curve (AUC) was 0.83 and 0.79 for the two readers. We conclude that machine learning CT-based radiomics analysis is able to identify a priori PTF-DLBCL with good accuracy.
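The evaluation protocol in this abstract (a random 70/30 split followed by accuracy, sensitivity, and specificity on the held-out set) can be sketched as below. The function names, the fixed seed, and the convention that label 1 means "refractory" are assumptions for illustration; the Random Forests classifier itself is not reproduced here.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Randomly assign items to a training set and an independent test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true positives detected
        "specificity": tn / (tn + fp),  # fraction of true negatives detected
    }
```

With 180 nodes, `split_train_test` yields 126 training and 54 test nodes. Note that this sketch splits at the node level, as the abstract describes; it does not group nodes by patient.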

6.
The evolution of omics and computational competency has accelerated discoveries of the underlying biological processes in an unprecedented way. High throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, thereby allowing opportunities for scientific discoveries related to health and diseases. However, working with cytometry data often imposes complex computational challenges due to high-dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalances which can further complicate the problem. The existing methods of cytometry data analysis either predict cell populations or perform feature selection. Through this study, we propose a “wisdom of the crowd” approach to simultaneously predict rare cell populations and perform feature selection by integrating a pool of modern machine learning (ML) algorithms. Given that our approach integrates superior performing ML models across different normalization techniques based on entropy and rank, our method can detect diverse patterns existing across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies indicating the role of each biomarker in rare cell prediction, which can subsequently aid in studies of disease progression.

7.

Background

The diagnosis of malignant hematologic diseases has become increasingly complex during the last decade. It is based on the interpretation of results from different laboratory analyses, which range from microscopy to gene expression profiling. Recently, a method for the analysis of RNA phenotypes has been developed, the nCounter technology (Nanostring® Technologies), which allows for simultaneous quantification of hundreds of RNA molecules in biological samples. We evaluated this technique in a Swiss multi-center study on eighty-six samples from acute leukemia patients.

Methods

mRNA and protein profiles were established for normal peripheral blood and bone marrow samples. Signal intensities of the various tested antigens with surface expression were similar to those found in previously performed Affymetrix microarray analyses. Acute leukemia samples were analyzed for a set of twenty-two validated antigens and the Pearson Correlation Coefficient for nCounter and flow cytometry results was calculated.

Results

Highly significant values between 0.40 and 0.97 were found for the twenty-two antigens tested. A second correlation analysis performed on a per sample basis resulted in concordant results between flow cytometry and nCounter in 44–100% of the antigens tested (mean = 76%), depending on the number of blasts present in a sample, the homogeneity of the blast population, and the type of leukemia (AML or ALL).

Conclusions

The nCounter technology allows for fast and easy depiction of an mRNA profile from hematologic samples. This technology has the potential to become a valuable tool for the diagnosis of acute leukemias, in addition to multi-color flow cytometry.
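The per-antigen comparison in this abstract rests on the Pearson correlation coefficient between paired nCounter and flow cytometry readings. A minimal sketch of that statistic follows; the function name and the idea of passing two plain lists of paired values are assumptions for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold one antigen's nCounter signal across samples and `y` the matching flow cytometry values, giving one coefficient per antigen.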

8.
The purpose of this narrative review is to provide a critical reflection of how analytical machine learning approaches could provide the platform to harness variability of patient presentation to enhance clinical prediction. The review includes a summary of current knowledge on the physiological adaptations present in people with spinal pain. We discuss how contemporary evidence highlights the importance of not relying on single features when characterizing patients given the variability of physiological adaptations present in people with spinal pain. The advantages and disadvantages of current analytical strategies in contemporary basic science and epidemiological research are reviewed and we consider how analytical machine learning approaches could provide the platform to harness the variability of patient presentations to enhance clinical prediction of pain persistence or recurrence. We propose that machine learning techniques can be leveraged to translate a potentially heterogeneous set of variables into clinically useful information with the potential to enhance patient management.

9.
This study explored the feasibility of predicting and classifying cells in different stages of apoptosis with a stain-free method based on diffraction images and supervised machine learning. Apoptosis was induced in human chronic myelogenous leukemia K562 cells by cis-platinum (DDP). A newly developed technique, polarization diffraction imaging flow cytometry (p-DIFC), was used to acquire diffraction images of cells in three states (viable, early apoptotic, and late apoptotic/necrotic) after the cells were separated by fluorescence-activated cell sorting with Annexin V-PE and SYTOX® Green double staining. Texture features of the diffraction images were extracted with in-house software based on the gray-level co-occurrence matrix algorithm to generate datasets for cell classification with a supervised machine learning method. The method was also verified in a hydrogen peroxide-induced apoptosis model of HL-60 cells. Accuracy above 90% was achieved on independent test datasets from each cell type using logistic regression with ridge estimators, indicating that the p-DIFC system has great potential for predicting and classifying cells in different stages of apoptosis.
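The texture features mentioned here come from a gray-level co-occurrence matrix (GLCM). The sketch below builds a normalized, non-symmetric GLCM for one pixel offset and derives the standard contrast feature from it. It is not the in-house software from the study; the function names, the default horizontal offset, and the choice of contrast as the example feature are assumptions.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalised co-occurrence matrix for pixel pairs offset by (dr, dc).

    `image` is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pairs whose first pixel has level i
    and whose offset neighbour has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum of p[i][j] * (i - j)^2 over all level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

A uniform image gives a contrast of 0, while sharp alternation between gray levels raises it; feature vectors built from several such statistics would then be passed to the classifier.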

10.
Single-cell network profiling (SCNP) data generated from multi-parametric flow cytometry analysis of bone marrow (BM) and peripheral blood (PB) samples collected from patients >55 years old with non-M3 AML were used to train and validate a diagnostic classifier (DXSCNP) for predicting response to standard induction chemotherapy (complete response [CR] or CR with incomplete hematologic recovery [CRi] versus resistant disease [RD]). SCNP-evaluable patients from four SWOG AML trials were randomized between Training (N = 74 patients with CR, CRi or RD; BM set = 43; PB set = 57) and Validation Analysis Sets (N = 71; BM set = 42, PB set = 53). Cell survival, differentiation, and apoptosis pathway signaling were used as potential inputs for DXSCNP. Five DXSCNP classifiers were developed on the SWOG Training set and tested for prediction accuracy in an independent BM verification sample set (N = 24) from ECOG AML trials to select the final classifier, which was a significant predictor of CR/CRi (area under the receiver operating characteristic curve AUROC = 0.76, p = 0.01). The selected classifier was then validated in the SWOG BM Validation Set (AUROC = 0.72, p = 0.02). Importantly, a classifier developed using only clinical and molecular inputs from the same sample set (DXCLINICAL2) lacked prediction accuracy: AUROC = 0.61 (p = 0.18) in the BM Verification Set and 0.53 (p = 0.38) in the BM Validation Set. Notably, the DXSCNP classifier was still significant in predicting response in the BM Validation Analysis Set after controlling for DXCLINICAL2 (p = 0.03), showing that DXSCNP provides information that is independent from that provided by currently used prognostic markers. Taken together, these data show that the proteomic classifier may provide prognostic information relevant to treatment planning beyond genetic mutations and traditional prognostic factors in elderly AML.
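The AUROC values quoted in this abstract have a direct probabilistic reading: the chance that a randomly chosen responder receives a higher classifier score than a randomly chosen non-responder, with ties counted as one half. A minimal sketch under that definition follows; the function name and input layout are assumptions, and the p-values reported in the abstract are not computed here.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalised Mann-Whitney U statistic.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUROC of 0.5 corresponds to a classifier that ranks responders no better than chance, which is the baseline against which the reported 0.53 to 0.76 values should be read.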

11.
The diagnosis of acute myeloblastic leukaemia (AML) is based on cell morphology, cytogenetic and molecular changes, cell markers and clinical data. Our aim was to establish whether morphology and cell markers are comparable in the evaluation of AML. Bone marrow smears were analysed, and flow cytometry and monoclonal antibodies were used to determine cell type and maturity. Morphology and cell markers correlated differently in different AML subtypes.

12.
Quantitative methods for interpretation of flow cytometry DNA histograms are required for the widespread clinical use of this technology. The usefulness of a histogram analysis technique in this setting requires that it be operator independent, easy to implement in a clinical laboratory, and provide high sensitivity to the desired information. Additionally, the technique must be tolerant of the relatively low signal-to-noise ratios often found in DNA distributions obtained from clinical samples. Among the factors that have been used to assess the malignant potential of tumors are the presence of an aneuploid population, the proportion of hyperdiploid cells, the width of the G1 peak, the DNA index, and the fraction of cells in S. A computer-based method has been developed for extraction of the above-mentioned features from DNA histograms. The program detects peaks in the histogram and uses straight-line fits to the cumulative frequency distribution to define cell population bounds. A test set of 44 histograms compiled from bladder irrigation specimens obtained from patients with a present or past history of transitional cell carcinoma (TCC) was analyzed by five collaborating laboratories forming a Network sponsored by the National Cancer Institute (NCI). This test set was used to evaluate the performance of the computer-based method by comparing results with those of four expert observers. In this preliminary analysis, perfect agreement was found in the detection of aneuploid cell populations by all observers and the computer-based method. Correlation of percent hyperdiploid cell fraction was also excellent.(ABSTRACT TRUNCATED AT 250 WORDS)
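Two of the features listed in this abstract can be illustrated with a very small sketch: locating peaks in a DNA histogram and computing a DNA index as the ratio of a peak's channel to the diploid G1 peak's channel. This is not the program described in the abstract (which also fits straight lines to the cumulative frequency distribution); the local-maximum rule, the `min_count` threshold, and the function names are assumptions.

```python
def find_peaks(hist, min_count):
    """Channels that are local maxima with at least `min_count` events."""
    peaks = []
    for i in range(1, len(hist) - 1):
        if hist[i] >= min_count and hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]:
            peaks.append(i)
    return peaks


def dna_index(peak_channel, diploid_channel):
    """Ratio of a G1 peak position to the diploid G1 peak position."""
    if diploid_channel <= 0:
        raise ValueError("diploid G1 channel must be positive")
    return peak_channel / diploid_channel
```

A DNA index of 1.0 indicates a diploid population, and a second G1 peak with an index clearly different from 1.0 suggests an aneuploid population. Real clinical histograms are noisy, so a practical detector would need smoothing that this sketch leaves out.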

13.
Although conventional cytology represents the most widely performed cytometric analysis of bladder cancer cells, DNA flow cytometry has, over the past decade, been increasingly used to evaluate cell proliferation and DNA ploidy in cells from bladder washings. We have investigated whether DNA flow cytometry and conventional cytology of epithelial cells obtained from bladder washings provide reliable surrogate endpoint biomarkers in clinical chemoprevention trials. We used cytometric and clinical data from a chemoprevention trial of the synthetic retinoid Fenretinide on 99 patients with superficial bladder cancer. A total of 642 bladder washing specimens obtained from the patients at 4 month intervals was analyzed. Intra-individual agreement and correlation of flow cytometric DNA ploidy (diploid vs. aneuploid), DNA Index, Hyper-Diploid-Fraction (proportion of cells with DNA content higher than 2C), and conventional cytologic examination, as assessed by kappa statistics and Spearman's correlation test, were poor from baseline through 24 months. Moreover, no correlation was found between DNA ploidy and cytology at each time point. The same results were obtained when the analyses were stratified by treatment group. In addition, the association between the results of bladder washing (by either DNA flow cytometry or cytology) and concomitant tumor recurrence was significant only for abnormal cytology, while neither biomarker was predictive of tumor recurrence at the subsequent visit. During the time of this study only four patients progressed to muscle-invasive bladder cancer, indicating the "low-risk" features of the patient population. We conclude that DNA flow cytometry and conventional cytology on epithelial cells obtained from bladder washings do not appear to provide suitable surrogate endpoint biomarkers during the early stages of bladder carcinogenesis.
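Agreement in this study was assessed with kappa statistics. The sketch below computes Cohen's kappa for two categorical ratings of the same specimens, which corrects the observed agreement for the agreement expected by chance. Whether the authors used exactly this unweighted two-rater form is an assumption, as is the function name.

```python
def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of the two marginal frequencies per label.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both ratings are constant and identical
    return (observed - expected) / (1.0 - expected)
```

A kappa near 0 means agreement no better than chance, which matches the "poor" agreement reported between repeated ploidy and cytology measurements.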

14.
Background

Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover the real pathogenic mutation and even understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data.

Objective

To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis.

Methods

The literature selected in the review is obtained from Google Scholar, PubMed, and Web of Science. The keywords of literature retrieval include Alzheimer's disease, bioinformatics, image genetics, genome-wide association research, molecular interaction network, multi-omics data integration, and so on.

Conclusion

This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then shows the progress of computational analysis methods in omics data, such as the genome, proteome, and so on. Subsequently, machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the current emerging technology of multi-modal neuroimaging, multi-omics data joint analysis, and present some outstanding issues and future research directions.

15.
Flow cytometry is a valuable tool in research and diagnostics including minimal residual disease (MRD) monitoring of hematologic malignancies. However, its gradual advancement toward increasing numbers of fluorescent parameters leads to information rich datasets, which are challenging to analyze by standard gating and do not reflect the multidimensionality of the data. We have developed a novel method to analyze complex flow cytometry data, based on hierarchical clustering analysis (HCA) but with a new underlying algorithm, using Mahalanobis distance measure. HCA is scalable to analyze complex multiparameter datasets (here demonstrated on up to 12 color flow cytometry and on a 20-parameter synthetic dataset). We have validated this method by comparison with standard gating approaches when performed independently by expert cytometrists. Acute lymphoblastic leukemia blast populations were analyzed in diagnostic and follow-up datasets (n = 123) from three centers. HCA results correlated very well (Passing-Bablok correlation coefficient = 0.992, slope = 1, intercept = -0.01) with standard gating data obtained by the I-BFM FLOW-MRD study group. To further improve the performance in follow-up samples with low MRD levels and to automate MRD detection, we combined HCA with support vector machine (SVM) learning. HCA in combination with SVM provides a novel diagnostic tool that not only allows analysis of increasingly complex flow cytometry data but also is less observer-dependent compared with classical gating and has potential for automation.
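The Mahalanobis distance used by this clustering method measures how far a point lies from a cluster in units that account for the cluster's spread and correlation. The sketch below shows the computation for two parameters only, so the covariance matrix can be inverted in closed form. It illustrates the distance measure alone, not the authors' clustering algorithm; the function names and the two-parameter restriction are assumptions.

```python
import math


def mean_and_cov(points):
    """Mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a cluster (mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not invertible")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T for a 2x2 symmetric matrix
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))
```

Unlike Euclidean distance, this measure treats an elongated cell population fairly: a point along the population's long axis counts as closer than an equally distant point across it.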

16.
H M Davey  A Jones  A D Shaw  D B Kell 《Cytometry》1999,35(2):162-168
BACKGROUND: When exploited fully, flow cytometry can be used to provide multiparametric data for each cell in the sample of interest. While this makes flow cytometry a powerful technique for discriminating between different cell types, the data can be difficult to interpret. Traditionally, dual-parameter plots are used to visualize flow cytometric data, and for a data set consisting of seven parameters, one should examine 21 of these plots. A more efficient method is to reduce the dimensionality of the data (e.g., using unsupervised methods such as principal components analysis) so that fewer graphs need to be examined, or to use supervised multivariate data analysis methods to give a prediction of the identity of the analyzed particles. MATERIALS AND METHODS: We collected multiparametric data sets for microbiological samples stained with six cocktails of fluorescent stains. Multivariate data analysis methods were explored as a means of microbial detection and identification. RESULTS: We show that while all cocktails and all methods gave good accuracy of predictions (>94%), careful selection of both the stains and the analysis method could improve this figure (to > 99% accuracy), even in a data set that was not used in the formation of the supervised multivariate calibration model. CONCLUSIONS: Flow cytometry provides a rapid method of obtaining multiparametric data for distinguishing between microorganisms. Multivariate data analysis methods have an important role to play in extracting the information from the data obtained. Artificial neural networks proved to be the most suitable method of data analysis.
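The figure of 21 plots for seven parameters is the number of unordered parameter pairs, 7 × 6 / 2. The short sketch below enumerates the pairs to make the growth explicit; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def pairwise_plots(parameters):
    """All unordered parameter pairs, one per dual-parameter plot."""
    return list(combinations(parameters, 2))


def plot_count(n_parameters):
    """Number of dual-parameter plots needed for n parameters."""
    return comb(n_parameters, 2)
```

The count grows quadratically (12 parameters already need 66 plots), which is the practical argument for dimensionality reduction or supervised multivariate methods.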

17.

Background

Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene set based analyses improve biologists’ ability to discover the functional relevance of their experimental design. While gene sets are elucidated individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets, and to determine the biological relevance and analysis consistency of these combined gene sets by leveraging large genomic data sets.

Results

In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets and retains the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Training with genomic data from TCGA and evaluating against the accompanying clinical parameters, we showed the ability of gene supersets to discriminate tumor subtypes and their prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets.

Conclusions

Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.

18.
MOTIVATION: Most computational methodologies for microRNA gene prediction utilize techniques based on sequence conservation and/or structural similarity. In this study we describe a new technique, which is applicable across several species, for predicting miRNA genes. This technique is based on machine learning, using the Naive Bayes classifier. It automatically generates a model from the training data, which consists of sequence and structure information of known miRNAs from a variety of species. RESULTS: Our study shows that the application of machine learning techniques, along with the integration of data from multiple species is a useful and general approach for miRNA gene prediction. Based on our experiments, we believe that this new technique is applicable to an extensive range of eukaryotes' genomes. Specific structure and sequence features are first used to identify miRNAs followed by a comparative analysis to decrease the number of false positives (FPs). The resulting algorithm exhibits higher specificity and similar sensitivity compared to currently used algorithms that rely on conserved genomic regions to decrease the rate of FPs.

19.
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.

20.
Flow cytometry has evolved over the past 30 y from a niche laboratory technique to a routine tool used by clinical pathologists and immunologists for diagnosis and monitoring of patients with cancer and immune deficiencies. Identification of novel patterns of expressed Ags has led to the recognition of cancers with unique pathophysiologies and treatment strategies. FACS has permitted the isolation of tumor-free populations of hematopoietic stem cells for cancer patients undergoing stem cell transplantation. Adaptation of flow cytometry to the analysis of multiplex arrays of fluorescent beads that selectively capture proteins and specific DNA sequences has produced highly sensitive and rapid methods for high-throughput analysis of cytokines, Abs, and HLA genotypes. Automated data analysis has contributed to the development of a "cytomics" field that integrates cellular physiology, genomics, and proteomics. In this article, we review the impact of the flow cytometer in these areas of medical practice.
