Similar articles
 20 similar articles found
1.
Tissue classification with gene expression profiles.   Cited by: 29 (self-citations: 0; citations by others: 29)
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor and normal clinical samples. The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximately 100,000 clones, measured in 32 ovarian samples (unpublished extension of the data set described in Schummer et al. (1999)). The third set consists of approximately 7,100 genes, measured in 72 bone marrow and peripheral blood samples (Golub et al., 1999). We examine the use of scoring methods, measuring separation of tissue type (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing a nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997) and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with, as well as without, cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
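
The LOOCV procedure mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a plain 1-nearest-neighbor classifier with Euclidean distance; the function names and the toy expression values are hypothetical and are not taken from the paper.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for xi, yi in zip(train_X, train_y):
        d = math.dist(xi, x)
        if d < best_dist:
            best_label, best_dist = yi, d
    return best_label

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Toy expression profiles (hypothetical values, two genes per sample).
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [3.2, 2.9], [2.8, 3.3]]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
print(loocv_accuracy(X, y))
```

In a real study, any gene selection step would normally be repeated inside each held-out fold so that the left-out sample does not influence which genes are chosen.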

2.
The detection of lung cancer has a special value in the diagnosis of cancer diseases. Based on nine elemental concentrations (i.e., chromium, iron, manganese, aluminum, cadmium, copper, zinc, nickel, and selenium) in urine samples and an ensemble linear discriminant analysis (ELDA), a detection method for lung cancer has been developed. A dataset containing 30 healthy samples and 27 lung cancer samples was used for the experiment. The whole dataset was first split into a training set with 29 samples and a test set with 28 samples. The prediction results from the ELDA classifier were compared with those from single Fisher's discriminant analysis (FDA). On the test set, the ELDA classifier achieved better performance, that is, a sensitivity of 100%, a specificity of 86.7%, and an overall accuracy of 92.9%, while the FDA classifier had a sensitivity of 92.3%, a specificity of 93.3%, and an overall accuracy of 92.9%. The superiority of ELDA to FDA is ascribed to the fact that ELDA can model more nonlinear relationships through the cooperation of several single models, suggesting that ensemble modeling is more advisable in such a task.
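
As a worked check of the reported figures, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts are an assumption: they are inferred from the reported percentages and the 28-sample test set (13 cancer and 15 healthy samples) and are not stated in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Counts inferred from the reported percentages (an assumption, not given in the abstract):
# a 28-sample test set with 13 cancer and 15 healthy samples.
elda = diagnostic_metrics(tp=13, fn=0, tn=13, fp=2)
fda = diagnostic_metrics(tp=12, fn=1, tn=14, fp=1)

for name, metrics in (("ELDA", elda), ("FDA", fda)):
    print(name, [f"{100 * v:.1f}%" for v in metrics])
```

Under these inferred counts both classifiers make two errors on the test set; they differ in whether the errors are missed cancers or false alarms.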

3.
The present study was designed to evaluate the levels of eight elements (lithium, zinc, chromium, copper, iron, manganese, nickel and vanadium) in the whole blood of type-2 diabetes patients, to compare them with age-matched healthy controls and to investigate the feasibility of combining them with an ensemble model for diagnostic purposes. A dataset involving 158 samples, among which 105 were taken from healthy adults and the remaining 53 from patients with type-2 diabetes, was collected. All samples were split into a training set and a test set of equal size. Based on a simple variable selection, two elements, i.e., chromium and iron, were picked out as the most important elements. Three kinds of algorithms, i.e., Fisher linear discriminant analysis (FLDA), support vector machine (SVM) and decision tree (DT), were used for constructing member models. The best ensemble classifiers constructed on the training set were validated on the independent test set, and the prediction results were compared with those from clinical diagnostics on the same subjects. The results reveal that almost all ensemble classifiers exhibit similar performance, implying that these elements coupled with an appropriate ensemble classifier can serve as a valuable tool for diagnosing type-2 diabetes.

4.
Dietary restriction (DR)-induced changes in the serum metabolome may be biomarkers for physiological status (e.g., relative risk of developing age-related diseases such as cancer). Megavariate analysis (unsupervised hierarchical cluster analysis [HCA]; principal components analysis [PCA]) of serum metabolites reproducibly distinguishes DR from ad libitum fed rats. Component-based approaches (i.e., PCA) consistently perform as well as or better than distance-based metrics (i.e., HCA). We therefore tested the following: (A) Do identified subsets of serum metabolites contain sufficient information to construct mathematical models of class membership (i.e., expert systems)? (B) Do component-based metrics out-perform distance-based metrics? Testing was conducted using KNN (k-nearest neighbors, supervised HCA) and SIMCA (soft independent modeling of class analogy, supervised PCA). Models were built with single cohorts, combined cohorts or mixed samples from previously studied cohorts as training sets. Both algorithms over-fit models based on single cohort training sets. KNN models had >85% accuracy within training/test sets, but were unstable (i.e., values of k could not be accurately set in advance). SIMCA models had 100% accuracy within all training sets, 89% accuracy in test sets, did not appear to over-fit mixed cohort training sets, and did not require post-hoc modeling adjustments. These data indicate that (i) previously defined metabolites are robust enough to construct classification models (expert systems) with SIMCA that can predict unknowns by dietary category; (ii) component-based analyses outperformed distance-based metrics; (iii) use of over-fitting controls is essential; and (iv) subtle inter-cohort variability may be a critical issue for high data density biomarker studies that lack state markers.

5.

Background

Gold nanoparticles (AuNPs) scatter light intensely at or near their surface plasmon wavelength region. Using AuNPs coupled with dynamic light scattering (DLS) detection, we developed a facile nanoparticle immunoassay for serum protein biomarker detection and analysis. A serum sample was first mixed with a citrate-protected AuNP solution. Proteins from the serum were adsorbed to the AuNPs to form a protein corona on the nanoparticle surface. An antibody solution was then added to the assay solution to analyze the target proteins of interest that are present in the protein corona. The protein corona formation and the subsequent binding of antibody to the target proteins in the protein corona were detected by DLS.

Results

Using this simple assay, we discovered multiple molecular aberrations associated with prostate cancer in both mouse and human blood serum samples. In the mouse serum study, we observed differences in the size of the protein corona and in the mouse IgG level between different groups of mice (i.e., mice with aggressive or less aggressive prostate cancer, and normal healthy controls). Furthermore, both the mouse model and the human serum sample study showed that the level of vascular endothelial growth factor (VEGF, a protein that is associated with tumor angiogenesis) adsorbed to the AuNPs is decreased in cancer samples compared to non-cancerous or less malignant cancer samples.

Conclusion

The molecular aberrations observed from this study may become new biomarkers for prostate cancer detection. The nanoparticle immunoassay reported here can be used as a convenient and general tool to screen and analyze serum proteins and to discover new biomarkers associated with cancer and other human diseases.

6.
Microarray gene expression data usually have a large number of dimensions, e.g., over ten thousand genes, and a small number of samples, e.g., a few tens of patients. In this paper, we use the support vector machine (SVM) for cancer classification with microarray data. Dimensionality reduction methods, such as principal components analysis (PCA), class-separability measure, Fisher ratio, and t-test, are used for gene selection. A voting scheme is then employed to do multi-group classification by k(k - 1) binary SVMs. We are able to obtain the same classification accuracy with far fewer features than other published results.

7.
In order to improve the sensitivity and stability of human blood samples containing WR-1065 (i.e., active metabolite of the cytoprotective agent amifostine), a high-performance liquid chromatographic method was developed and validated using fluorescent derivatization with ThioGlo3. Using a sample volume of only 100 µL, the method was specific, sensitive (limit of quantitation = 10 nM in deproteinized blood or 20 nM in whole blood), accurate (error ≤ 3.2%) and reproducible (CV ≤ 8.7%). In addition, the stability of WR-1065 in deproteinized and derivatized blood samples was assured for at least four weeks at −20 °C. This method should be particularly valuable in translating the kinetic-dynamic relationship of WR-1065 in preclinical models to that in cancer patients.

8.
Development of rapid and sensitive methods to detect pathogens is important to food and water safety. This study aimed to detect and discriminate important food- and waterborne bacteria (i.e., Escherichia coli O157:H7, Staphylococcus epidermidis, Listeria monocytogenes, and Enterococcus faecalis) by surface-enhanced Raman spectroscopy (SERS) coupled with intracellular nanosilver as SERS substrates. An in vivo molecular probing using intracellular nanosilver for the preparation of bacterial samples was established and assessed. Satisfactory SERS performance and characteristic SERS spectra were obtained from different bacterial samples. Distinctive differences were observed in SERS spectral data, specifically in the Raman shift region of 500–1,800 cm⁻¹, and between bacterial samples at the species and strain levels. The detection limit of SERS coupled with in vivo molecular probing using silver nanosubstrates could reach the level of single cells. Experiments with a mixture of E. coli O157:H7 and S. epidermidis for SERS measurement demonstrate that SERS could be used for classification of mixed bacterial samples. Transmission electron microscopy was used to characterize changes of morphology and cellular composition of bacterial cells after treatment with intracellular nanosilver. The results indicate that SERS coupled with intracellular silver nanosubstrates is a promising method for detection and characterization of food- and waterborne pathogenic and non-pathogenic bacterial samples.

9.
A preliminary study was carried out in order to compare the selenium concentration in breast cancer patients and healthy subjects (controls) in Israel. Blood serum samples were obtained from 32 breast cancer patients and 36 controls and were analyzed for selenium by the XRF method. A weighted mean of 0.076±0.014 ppm Se in the blood serum of breast cancer patients, as compared to 0.119±0.023 ppm Se for controls, was obtained. These results indicate that the concentration of selenium in breast cancer patients is significantly lower than in controls. The relationship between selenium concentration and malignancy stage shows an inverse dependence, i.e., the concentration decreases with stage number.

10.
A prototype test-system for simultaneous quantitative assay of nine tumor markers in blood serum was developed. The main constituent of the test-system is an OM-9 biochip containing immobilized antibodies against nine oncomarkers: α-fetoprotein (AFP), carcinoembryonic antigen (CEA), human chorionic gonadotropin (HCG), cancer antigen 15-3 (CA 15-3), cancer antigen 125 (CA 125), cancer antigen 19-9 (CA 19-9), total and free forms of prostate-specific antigen (PSAtot and PSAfree), and neuron-specific enolase (NSE). A biochip-based two-step sandwich immunoassay procedure for simultaneous quantitative determination of nine tumor markers in patients' blood serum was proposed. The main analytical characteristics of the method were obtained. The results suggest that the prototype of the test-system could be a promising instrument for clinical application. The test-system prototype was tested using blood serum samples of oncological patients (252 samples) and healthy donors (185 samples). Increased concentrations of one or more tumor markers above the normal level were found in 76.6% of cases among oncological patients and in only 6% of cases among healthy donors. For colorectal cancer patients, application of modern statistical methods of data processing in medical research, i.e., receiver operating characteristics analysis (ROC curve) and logistic regression, indicated that the simultaneous assay of nine markers on biochips showed much more diagnostic significance (area under the ROC curve, AUC, was 0.84) than a traditional assay of two tumor markers, CEA and CA 19-9 (AUC = 0.59). The developed biochip-based test-system can be recommended both for the estimation of people's health, e.g., for standard medical examination, and for tracking the tumoral process in the postsurgical period or after specific tumor treatment.
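
To make the AUC values quoted in this abstract concrete, here is a minimal Python sketch that computes the area under the ROC curve as the probability that a randomly chosen patient scores higher than a randomly chosen healthy donor. The function name and the scores are hypothetical illustrations; they are not the study's data or its logistic regression model.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores (e.g., predicted probabilities from a combined marker panel).
patients = [0.9, 0.8, 0.6, 0.4]
donors = [0.7, 0.3, 0.2, 0.1]
print(roc_auc(patients, donors))  # 14 of 16 pairs ranked correctly -> 0.875
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which 0.84 and 0.59 are compared above.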

11.
Prostate cancer is the most common non-cutaneous malignancy and the second leading cause of cancer mortality in men. The principal goal of this study was to explore the feasibility of applying boosting coupled with trace element analysis of hair for accurately distinguishing prostate cancer patients from healthy persons. A total of 113 subjects, comprising 55 healthy men and 58 prostate cancer patients, were enrolled. Based on a special index of variable importance and a forward selection scheme, only nine elements (i.e., Zn, Cr, Mg, Ca, Al, P, Cd, Fe, and Mo) were picked out from 20 candidate elements for modeling the relationship. As a result, an ensemble classifier consisting of only eight decision stumps achieved an overall accuracy of 98.2%, a sensitivity of 100%, and a specificity of 96.4% on the independent test set, while all subjects in the training set were classified correctly. It seems that integrating boosting and element analysis of hair can serve as a valuable tool for diagnosing prostate cancer in practice.
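
The idea of an ensemble of decision stumps can be sketched as follows. This is a minimal illustration that assumes discrete AdaBoost as the boosting variant (the abstract does not say which variant was used); the function names and the one-feature toy data are hypothetical.

```python
import math

def stump_output(x, feature, threshold, polarity):
    """A decision stump: predict `polarity` if x[feature] > threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_output(xi, j, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost(X, y, rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, feature, threshold, polarity)]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_output(xi, j, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_output(x, j, t, pol) for a, j, t, pol in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): one feature, and no single threshold separates the classes.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

On this toy set a single stump misclassifies two samples, whereas the weighted vote of three stumps reproduces all six labels, which is the sense in which a handful of weak rules can form an accurate classifier.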

12.
Using inductively coupled plasma mass spectrometry (ICP-MS) based analytical procedures, the concentration of several trace elements (Mn, As, Pb, Co, Ni, Cu, Zn and Se) was determined in human milk samples collected from a group of healthy lactating Portuguese women (n=44), both on the 2nd day postpartum (i.e., colostrum; n=34) and at 1 month postpartum (i.e., mature milk; n=19). Blood samples (n=44), collected on the 2nd day after parturition, were also analyzed for the same trace elements. No major correlations were observed between the levels of the analyzed trace elements in blood and colostrum samples. All the studied elements, except for Co, Pb and Ni, showed a significant trend for a decrease in concentration in milk during the first month of lactation. This trend was more pronounced for Zn and Se, whose levels decreased to approximately 23% and 44% of their initial mean concentration, respectively. With the exception of Co (r=0.607) and Zn (r=0.487), no significant correlations were observed when comparing the levels of each trace element between samples of colostrum and mature milk. Several inter-element correlations were found within each type of milk sample. The most significant were: (i) Se vs Cu (r=0.828) and Se vs Co (r=0.605) in colostrum samples and (ii) Ni vs Pb (r=0.756), Ni vs Mn (r=0.743) and Se vs Co (r=0.714) in mature milk samples. An inverse correlation between Zn and Se was also found in both types of milk sample; however, it only reached statistical significance for mature milk (r=-0.624).

13.
The purpose of the study was to examine the validity of alpha1-microglobulin (alpha1-MG) in comparison with the popularly used beta2-microglobulin (beta2-MG). A database was revisited to select ca. 7,500 spot urine samples (of adequate urine density) from non-pregnant, non-lactating and never-smoking adult women. The validity of the MGs was examined in terms of stability of the MG-uria prevalence in urine samples of various creatinine (CR or cr) concentration or specific gravity (SG or sg). Comparisons were made for MGs as observed (e.g., alpha1-MGob), as corrected for CR (e.g., alpha1-MGcr) and as corrected for SG of 1.016 (e.g., alpha1-MGsg). A cut-off value of 5.7 mg/g cr (or mg/L) for alpha1-MG was deduced from a cut-off value of 400 µg/g cr (or µg/L) for beta2-MG, because the correlation between alpha1-MGcr and beta2-MGcr was statistically significant. The prevalence of alpha1-MGsg-uria was essentially unchanged (i.e., from a low of 13.6% to a high of 17.0%, or 1.2 times) except in very dense or very thin urine samples. In contrast, beta2-MGcr-uria showed a substantial increase (from 0.0% to 2.8% with an infinite rate) as a reverse function of a decrease in CR in urine. The prevalence of uncorrected markers, i.e., alpha1-MGob-uria and beta2-MGob-uria, showed even greater CR- or SG-dependent changes. Thus, it appeared prudent to consider alpha1-MGsg rather than beta2-MGcr as a marker of tubular dysfunction among a general population with varying urine density.
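
The two corrections compared in this abstract can be sketched in Python. The creatinine correction is a simple ratio; the specific-gravity correction shown here uses the commonly applied scaling by (reference SG − 1)/(sample SG − 1), which is an assumption, since the abstract only states that values were corrected to an SG of 1.016. The function names and the sample values are hypothetical.

```python
def creatinine_corrected(conc_mg_per_l, creatinine_g_per_l):
    """Express an analyte per gram of creatinine (mg/g cr)."""
    return conc_mg_per_l / creatinine_g_per_l

def sg_corrected(conc, specific_gravity, reference_sg=1.016):
    """Scale a concentration to a reference specific gravity (assumed formula)."""
    return conc * (reference_sg - 1.0) / (specific_gravity - 1.0)

# Hypothetical spot urine sample: 6.0 mg/L alpha1-MG, 1.5 g/L creatinine, SG 1.024.
alpha1_mg_cr = creatinine_corrected(6.0, 1.5)
alpha1_mg_sg = sg_corrected(6.0, 1.024)
print(round(alpha1_mg_cr, 2), round(alpha1_mg_sg, 2))

# Compare with the 5.7 mg/g cr (or mg/L) cut-off quoted in the abstract.
print(alpha1_mg_cr > 5.7, alpha1_mg_sg > 5.7)
```

Both corrections divide by a measure of urine concentration, so a very dilute sample (low creatinine or SG close to 1) inflates the corrected value, which is why prevalence estimates can depend on urine density.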

14.
The study of the relationship between trace elements and diseases often requires building a classification/regression model. Furthermore, the accuracy of such a model is of particular importance and directly decides its applicability. The goal of this study is to explore the feasibility of applying boosting, i.e., a new strategy from machine learning, to model the relationship between trace elements and diseases. Two examples are employed to illustrate the technique in the applications of classification and regression, respectively. The first example involves the diagnosis of anorexia according to the concentrations of six elements (i.e., a classification task). Decision stump and support vector machine are used as the weak/base algorithm and reference algorithm, respectively. The second example involves the prediction of breast cancer mortality based on the intake of trace elements (i.e., a regression task). In this regard, partial least squares is used not only as the weak/base algorithm, but also as the reference algorithm. The results from both examples confirm the potential of boosting in modeling the relationship between trace elements and diseases.

15.
Despite increased knowledge about environmental toxins and changes in lead use (i.e., the mandated use of nonlead paint, gasoline, and shotgun pellets used for hunting waterfowl on federal lands), lead poisoning continues to occur in terrestrial birds. The degree of exposure and its demographic effect, however, continue to be described, emphasizing the growing concern over lead exposure. We examined 302 blood samples from common ravens (Corvus corax) scavenging on hunter-killed large ungulates and their offal piles to determine if lead rifle-bullet residuum was a point source for lead ingestion in ravens. We took blood samples during a 15-month period spanning 2 hunting seasons. Of the ravens tested during the hunting season, 47% exhibited elevated blood lead levels (≥ 10 μg/dL) whereas 2% tested during the nonhunting season exhibited elevated levels. Females had significantly higher blood lead levels than did males. Our results confirm that ravens are ingesting lead during the hunting season and are likely exposed to lead from rifle-shot big-game offal piles.

16.
17.
18.

Background  

The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate whether high dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance.
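
The bias towards the majority class described here can be made visible with a small Python sketch that contrasts plain accuracy with balanced accuracy (the mean of per-class recall). The class sizes and the always-majority classifier are hypothetical illustrations, not the paper's simulation design.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical imbalanced test set and a classifier that always predicts the majority class.
y_true = ["majority"] * 90 + ["minority"] * 10
y_pred = ["majority"] * 100
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))  # 0.9 0.5
```

A classifier that never predicts the minority class still scores 90% accuracy here, which is why class-specific measures are usually reported alongside overall accuracy for imbalanced data.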

19.
IRBM, 2023, 44(3): 100749

Objective

Breast cancer is the most widespread and invasive cancer type among women. Globally, it is second only to lung cancer as a cause of cancer mortality among women. This has led researchers to focus on developing effective Computer-Aided Detection (CAD) methodologies for classifying such deadly cancer types. To improve survival rates and enable earlier diagnosis, a promising research methodology is required for the classification of breast cancer. Consequently, an improved methodology that integrates the principle of deep learning with metaheuristic and classification algorithms is proposed for the severity classification of breast cancer. Hence, to build on recent findings, an improved CAD methodology is proposed to address this healthcare problem.

Material and Methods

The work aims to classify the severities present in digital mammogram images. To evaluate the work, the publicly available MIAS, INbreast, and WDBC databases are used. The proposed work employs transfer learning for feature extraction. The novelty of the work lies in improving the classification performance of the weighted k-nearest neighbor (wKNN) algorithm using particle swarm optimization (PSO), the dragon-fly optimization algorithm (DFOA), and the crow-search optimization algorithm (CSOA) as a transformation technique, i.e., transforming non-linear input features into minimal linearly separable feature vectors.

Results

The results obtained for the proposed work are then compared with those of the Gaussian Naïve Bayes and linear Support Vector Machine algorithms; the highest classification accuracy is attained by the proposed work (CSOA-wKNN), with 84.35% for the MIAS, 83.19% for the INbreast, and 97.36% for the WDBC datasets, respectively.

Conclusion

The obtained results reveal that the proposed Computer-Aided Diagnosis (CAD) tool is robust for the severity classification of breast cancer.

20.
The frequency of lymphocytes with mutations at T-cell receptor (TCR) loci was compared in peripheral blood samples taken from 186 healthy donors and 46 untreated thyroid cancer patients, including persons exposed to ionizing radiation as a result of living in a radioactively contaminated region of the Russian Federation. The mutant cell frequency in the thyroid cancer group was significantly higher than in healthy persons with a similar age distribution (p < 0.01). This could result from factors such as genotoxic influence, differing sensitivity, or possible genome instability (including radiation-induced instability). In 37% of patients the frequency of somatically mutated cells was increased, i.e., it exceeded the 95% confidence interval for the screening group. The presented results suggest that the TCR test could be used as one of the criteria for forming groups at high risk of developing cancer.
