20 similar documents found; search time 15 ms.
1.
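Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) and protein chips were used to detect serum protein fingerprints in patients with endometriosis (EM) and to evaluate the clinical value of a diagnostic model for EM. Serum protein fingerprints of 16 patients with EM and 16 healthy women were obtained with SELDI-TOF-MS and H4 protein chips, and a diagnostic model was built. The model was then validated in a blinded test on samples from 16 healthy individuals and 16 patients with EM. Four proteins with markedly different expression were identified, with mass-to-charge ratios (m/z) of 8141, 6096, 5894, and 3269. The diagnostic model detected EM with a sensitivity of 87.5% (14/16), a specificity of 93.75% (15/16), and an overall accuracy of 90.625% (29/32). In this small-sample study, SELDI-TOF-MS showed high sensitivity and specificity for diagnosing EM and appears valuable for EM diagnosis and biomarker screening.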
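The reported figures follow from the standard confusion-matrix definitions. A minimal sketch, assuming the counts stated above (14 of 16 patients and 15 of 16 controls classified correctly); the function name `diagnostic_metrics` is illustrative:

```python
from fractions import Fraction


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy as exact fractions.

    tp/fn: patients classified as positive/negative
    tn/fp: controls classified as negative/positive
    """
    return {
        # share of true patients that the model flags as positive
        "sensitivity": Fraction(tp, tp + fn),
        # share of true controls that the model flags as negative
        "specificity": Fraction(tn, tn + fp),
        # share of all samples classified correctly
        "accuracy": Fraction(tp + tn, tp + fn + tn + fp),
    }


if __name__ == "__main__":
    metrics = diagnostic_metrics(tp=14, fn=2, tn=15, fp=1)
    for name, value in metrics.items():
        print(f"{name}: {float(value):.5%}")
```

This reproduces 87.5%, 93.75% and 90.625%. Exact fractions make it obvious how coarse these percentages are with only 16 samples per group: one misclassified sample moves a metric by 6.25 percentage points.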
2.
High dimensionality and small sample sizes, with their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, feature selection should be performed before classification to improve prediction performance. In general, filter methods can serve as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods yield less accurate results because they ignore dependencies among features. Although a few publications have used multivariate methods to reveal relationships among features, these methods describe such relationships only linearly, and restricting the model to simple linear combinations limits the achievable improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined self-adaptively by kernel Fisher's linear discriminant analysis (FLDA) rather than by manual parameter settings. To demonstrate the effectiveness of our method, we performed several experiments and compared it with other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method outperforms the others, especially on three hard-to-classify datasets: Wang's Breast Cancer, Gordon's Lung Adenocarcinoma, and Pomeroy's Medulloblastoma.
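The abstract does not give the kernel criterion itself. To illustrate how a kernel can capture a nonlinear feature-target dependence that a linear correlation misses, the sketch below ranks features by the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels. This is a swapped-in, generic technique, not the authors' kernel FLDA procedure; the bandwidth `sigma=1.0`, the standardisation step and all function names are assumptions.

```python
import math
import random
import statistics


def rbf_gram(values, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a list of scalar values."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma**2)) for b in values] for a in values]


def double_center(gram):
    """Return H G H with H = I - 11^T / n, for a symmetric Gram matrix G."""
    n = len(gram)
    means = [sum(row) / n for row in gram]  # row means equal column means (symmetry)
    grand = sum(means) / n
    return [[gram[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, with RBF kernels."""
    n = len(x)
    kc = double_center(rbf_gram(x, sigma))
    lg = rbf_gram(y, sigma)
    # trace(K H L H) = sum_ij (H K H)_ij * L_ij because L is symmetric
    return sum(kc[i][j] * lg[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def rank_features(columns, y, sigma=1.0):
    """Feature indices sorted by decreasing HSIC with the target."""
    scores = []
    for col in columns:
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        z = [(v - mu) / sd for v in col]  # standardise so one bandwidth suits all features
        scores.append(hsic(z, y, sigma))
    return sorted(range(len(columns)), key=lambda j: -scores[j])


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.uniform(-2, 2) for _ in range(200)]  # informative, but not linearly
    x1 = [rng.gauss(0, 1) for _ in range(200)]  # pure noise
    y = [1.0 if abs(v) > 1 else 0.0 for v in x0]  # label depends on |x0| only
    print(rank_features([x0, x1], y))
```

In the usage example the label depends only on |x0|, so its linear correlation with x0 is close to zero by symmetry, yet HSIC still ranks x0 above the noise feature.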
3.
Background
Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success depends critically on the chosen set of features.
Methodology
We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences that state-of-the-art work in machine learning shows to be challenging and that involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.
Results
To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to inspect the features directly and potentially use the insights obtained to assist wet-laboratory studies on the retention or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
4.
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve diagnosis and treatment selection for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well when the number of features (the gene expression levels measured by the microarrays) far exceeds the number of samples. Selecting only the features most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and identify preferable combinations of classifier and feature selection technique.
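The abstract does not say how the PLS model is turned into a feature ranking. One common PLS-based score, used here as an assumption rather than the authors' method, is the variable importance in projection (VIP): a PLS1 model is fitted by NIPALS, and each feature is scored by its weights across components, weighted by how much response variance each component explains. The autoscaling step, `n_components=2` and the function name are illustrative choices.

```python
import math
import random
import statistics


def pls_vip(columns, y, n_components=2):
    """VIP scores from a PLS1 model fitted by NIPALS.

    columns: list of p feature columns, each of length n; y: numeric response (e.g. 0/1 labels).
    """
    n, p = len(y), len(columns)
    # autoscale each feature (zero mean, unit variance) and centre the response
    X = []
    for col in columns:
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        X.append([(v - mu) / sd for v in col])
    y_mean = statistics.fmean(y)
    r = [v - y_mean for v in y]
    weights, explained = [], []
    for _ in range(n_components):
        # weight vector w proportional to X^T r (covariance of each feature with the residual)
        w = [sum(a * b for a, b in zip(col, r)) for col in X]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(X[j][i] * w[j] for j in range(p)) for i in range(n)]  # scores t = X w
        tt = sum(v * v for v in t)
        loadings = [sum(a * b for a, b in zip(col, t)) / tt for col in X]
        q = sum(a * b for a, b in zip(r, t)) / tt  # regression of the residual on t
        # deflate X and r by the part explained by this component
        X = [[a - b * lj for a, b in zip(col, t)] for col, lj in zip(X, loadings)]
        r = [a - q * b for a, b in zip(r, t)]
        weights.append(w)
        explained.append(q * q * tt)  # response sum of squares explained by the component
    total = sum(explained)
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [0] * 20 + [1] * 20
    informative = [lab + rng.gauss(0, 0.3) for lab in labels]
    noise = [[rng.gauss(0, 1) for _ in labels] for _ in range(2)]
    print([round(v, 2) for v in pls_vip([informative] + noise, labels)])
```

Because each weight vector has unit length, the squared VIP scores sum to the number of features, so a VIP above 1 is often read as above-average importance. Unlike a single univariate score, later components reflect what remains after earlier components are removed.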
5.
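Objective: To analyze changes in the serum protein profiles of patients with colorectal adenoma and to identify specific biomarkers of colorectal adenoma. Methods: The serum protein profiles of 31 patients with colorectal adenoma and 11 healthy individuals were compared using SELDI-TOF-MS (surface-enhanced laser desorption/ionization time-of-flight mass spectrometry), and the resulting profiles were analyzed with Biomarker Wizard software. Results: Twenty-four protein peaks differed between the colorectal adenoma group and the normal control group, and three of them (8565.84 Da, 8694.51 Da, and 5910.50 Da) differed highly significantly. The 8565.84 Da and 8694.51 Da peaks were highly expressed in colorectal adenoma and weakly expressed in healthy individuals, whereas the 5910.50 Da peak showed the opposite pattern. Conclusion: These three protein peaks may be specific protein biomarkers of colorectal adenoma.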
6.
Neural tube defects (NTDs) are common birth defects for which specific biomarkers are needed. The purpose of this pilot study was to determine, using SELDI-TOF-MS, whether the protein profiles of NTD-mothers differ from those of normal controls. The ProteinChip Biomarker System was used to evaluate 82 maternal serum samples, 78 urine samples, and 76 amniotic fluid samples. The validity of the classification trees was then challenged with blind test sets comprising another 20 NTD-mothers and 18 controls for serum, another 19 NTD-mothers and 17 controls for urine, and another 20 NTD-mothers and 17 controls for amniotic fluid. In the NTD group, eight serum proteins were up-regulated and four were down-regulated; four urine proteins were up-regulated and one was down-regulated; and six amniotic fluid proteins were up-regulated and one was down-regulated. The serum classification tree separated NTDs from healthy individuals with a sensitivity of 91% and a specificity of 97% in the training set, and with a sensitivity of 90%, a specificity of 97%, and a positive predictive value (PPV) of 95% in the test set. The urine classification tree achieved a sensitivity of 95% and a specificity of 94% in the training set, and a sensitivity of 89%, a specificity of 82%, and a PPV of 85% in the test set. The amniotic fluid classification tree achieved a sensitivity of 93% and a specificity of 89% in the training set, and a sensitivity of 90%, a specificity of 88%, and a PPV of 90% in the test set. These results suggest that SELDI-TOF-MS is an additional method for detecting NTD pregnancies.
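Unlike sensitivity and specificity, the positive predictive value depends on how common the condition is in the tested group. The sketch below computes PPV from counts and, via Bayes' rule, from sensitivity, specificity and an assumed prevalence; the function names and the prevalence values are hypothetical, and this is not a re-analysis of the study's data.

```python
def ppv_from_counts(tp: int, fp: int) -> float:
    """Positive predictive value: share of positive calls that are true positives."""
    return tp / (tp + fp)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV by Bayes' rule from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Same test characteristics, different (hypothetical) prevalences
    for prev in (0.5, 0.1, 0.01):
        print(f"prevalence {prev:.2f}: PPV = {ppv_from_rates(0.90, 0.97, prev):.3f}")
```

With 90% sensitivity and 97% specificity, PPV falls from about 0.97 at 50% prevalence to about 0.23 at 1% prevalence. PPVs measured on case-enriched test sets therefore do not transfer directly to population screening.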
7.
Qingzhong Liu Andrew H. Sung Zhongxue Chen Jianzhong Liu Xudong Huang Youping Deng 《PloS one》2009,4(12)
Microarray data have high dimensionality, but available datasets usually contain only a small number of samples, which makes such datasets both interesting and challenging to study. When analyzing microarray data, for example to predict gene-disease associations, feature selection is very important because it handles the high dimensionality by exploiting the information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can significantly reduce cost while maintaining or improving the classification or prediction accuracy of the learning machines used to analyze the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA), which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods:
- Support Vector Machine Recursive Feature Elimination (SVMRFE)
- Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS)
- Gradient based Leave-one-out Gene Selection (GLGS)
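For comparison with the wrapper baselines listed above, a generic leave-one-out sequential forward selection can be sketched as follows. This is neither the authors' RFA nor the exact LOOCSFS implementation; the 1-nearest-neighbour classifier, squared Euclidean distance, stopping rule and function names are assumptions.

```python
import math


def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given feature indices."""
    correct = 0
    for i, xi in enumerate(rows):
        best_dist, best_label = math.inf, None
        for j, xj in enumerate(rows):
            if j == i:
                continue  # sample i is held out
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)  # squared Euclidean
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_select(rows, labels, max_features):
    """Greedy forward selection: repeatedly add the feature that most improves LOO accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scores = {f: loo_1nn_accuracy(rows, labels, selected + [f]) for f in remaining}
        candidate = max(remaining, key=scores.get)  # ties go to the lowest index
        if scores[candidate] <= best_acc:
            break  # no remaining feature improves accuracy
        selected.append(candidate)
        remaining.remove(candidate)
        best_acc = scores[candidate]
    return selected, best_acc


if __name__ == "__main__":
    rows = [[0.3, 0.0, 0.9], [0.8, 0.1, 0.2], [0.5, 0.2, 0.6],
            [0.1, 1.0, 0.4], [0.9, 1.1, 0.7], [0.4, 1.2, 0.1]]
    labels = [0, 0, 0, 1, 1, 1]
    print(forward_select(rows, labels, max_features=2))
```

Each greedy step retrains and re-scores the classifier for every remaining candidate, which is why wrapper methods of this kind become expensive when thousands of genes are available.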
8.
Background
Fatty Liver Disease (FLD) is one of the most critical diseases and should be detected and treated at an early stage to reduce the mortality rate. Radiologists widely use ultrasound images to identify FLD. However, because of the poor quality of ultrasound images, they find it difficult to recognize FLD. To resolve this problem, many researchers have developed Computer Aided Diagnosis (CAD) systems for classifying fatty and normal liver ultrasound images. However, existing CAD systems perform poorly in terms of sensitivity when classifying FLD.
Methods
In this paper, we present a CAD system for the classification of liver ultrasound images. For this purpose, texture features are extracted using seven different texture models to represent the texture of the Region of Interest (ROI). Highly discriminating features are selected using the Mutual Information (MI) feature selection method.
Results
Extensive experiments were carried out with four different classifiers on 90 liver ultrasound images. The experimental results show that the proposed CAD system achieves an accuracy of 95.55% and a sensitivity of 97.77% with the 20 best features selected by the MI feature selection technique.
Conclusion
The experimental results show that the proposed system can classify fatty and normal liver ultrasound images with high accuracy.
9.
Jennifer R. Fleming Lalitha Sastry Thomas W. M. Crozier Grant B. Napier Lauren Sullivan Michael A. J. Ferguson 《PLoS neglected tropical diseases》2014,8(6)
Animal African Trypanosomosis (AAT) presents a severe problem for agricultural development in sub-Saharan Africa. It is caused by several trypanosome species, and current means of diagnosis are expensive and impractical for field use. Our aim was to discover antigens for the detection of antibodies to Trypanosoma congolense, one of the main causative agents of AAT. We took a proteomic approach to identify potential immunodiagnostic parasite protein antigens. We identified 113 proteins that were selectively recognized by infected cattle sera. These were assessed for the likelihood of recombinant protein expression in E. coli, and fifteen were successfully expressed and assessed for their immunodiagnostic potential by ELISA using pooled pre- and post-infection cattle sera. Three proteins, members of the invariant surface glycoprotein (ISG) family, performed favorably and were then assessed using individual cattle sera. One antigen, Tc38630, evaluated blind against 77 randomized cattle sera in an ELISA, gave a sensitivity of 87.2% and a specificity of 97.4%. Cattle immunoreactivity to this antigen diminished significantly following drug cure, a feature helpful for monitoring the efficacy of drug treatment.
10.
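Objective: To investigate the characteristic changes in the rat plasma proteome under simulated weightlessness. Methods: Eighty-eight healthy adult male Wistar rats were randomly divided into 11 groups according to the duration of simulated weightlessness: 6 h, 12 h, 1 d, 2 d, 3 d, 5 d, 1 week, 2 weeks, 3 weeks, 4 weeks, and 0 h (control). The simulated weightlessness model was established by tail suspension. At the end of the experiment, venous blood was collected, and venous plasma protein profiles were detected by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) with MB-WCX magnetic beads. Data were analyzed with Ciphergen Protein Chip Software 3.2.0 and Biomarker Wizard 3.1.0. Results: Eighteen gravity-sensitive proteins were found. In the early phase of simulated weightlessness, the expression of 6 proteins of lower relative molecular mass tended to be up-regulated, whereas the expression of 12 proteins of higher relative molecular mass was gradually down-regulated. In the late phase (after 2 to 3 weeks of tail suspension), the expression of all these proteins tended to return toward baseline. Conclusion: Simulated weightlessness markedly affects the venous plasma protein profile of rats. Studying gravity-sensitive proteins is important for further revealing the effects and mechanisms of weightlessness on the body and may be of value for medical monitoring and support.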
11.
High-throughput biological technologies offer the promise of finding feature sets to serve as biomarkers for medical applications; however, the sheer number of potential features (genes, proteins, etc.) means that there needs to be massive feature selection, far greater than that envisioned in the classical literature. This paper considers performance analysis for feature-selection algorithms from two fundamental perspectives: How does the classification accuracy achieved with a selected feature set compare to the accuracy when the best feature set is used and what is the optimal number of features that should be used? The criteria manifest themselves in several issues that need to be considered when examining the efficacy of a feature-selection algorithm: (1) the correlation between the classifier errors for the selected feature set and the theoretically best feature set; (2) the regressions of the aforementioned errors upon one another; (3) the peaking phenomenon, that is, the effect of sample size on feature selection; and (4) the analysis of feature selection in the framework of high-dimensional models corresponding to high-throughput data.
12.
13.
Maneesh Bhargava Trisha L. Becker Kevin J. Viken Pratik D. Jagtap Sanjoy Dey Michael S. Steinbach Baolin Wu Vipin Kumar Peter B. Bitterman David H. Ingbar Christine H. Wendt 《PloS one》2014,9(10)
Acute Respiratory Distress Syndrome (ARDS) continues to carry high mortality. Currently, there are no biomarkers that provide reliable prognostic information to guide clinical management or stratify risk among clinical trial participants. The objective of this study was to probe the bronchoalveolar lavage fluid (BALF) proteome to identify proteins that differentiate survivors from non-survivors of ARDS. Patients were divided into early-phase (1 to 7 days) and late-phase (8 to 35 days) groups based on time after initiation of mechanical ventilation for ARDS (Day 1). Isobaric tags for relative and absolute quantitation (iTRAQ) with LC-MS/MS was performed on pooled BALF enriched for medium- and low-abundance proteins from early-phase survivors (n = 7), early-phase non-survivors (n = 8), and late-phase survivors (n = 7). Of the 724 proteins identified at a global false discovery rate of 1%, quantitative information was available for 499. In early-phase ARDS, proteins more abundant in survivors mapped to ontologies indicating a coordinated compensatory response to injury and stress. These included coagulation and fibrinolysis; immune system activation; and cation and iron homeostasis. Proteins more abundant in early-phase non-survivors participate in carbohydrate catabolism and collagen synthesis, with no activation of compensatory responses. The compensatory immune activation and ion homeostatic response seen in early-phase survivors transitioned to cell migration and actin filament-based processes in late-phase survivors, revealing dynamic changes in the BALF proteome as the lung heals. Early-phase proteins differentiating survivors from non-survivors are candidate biomarkers for predicting survival in ARDS.
14.
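Feature selection methods for high-dimensional, small-sample data are widely used in processing and analyzing protein mass spectrometry data. For the problem of feature selection in protein mass spectra, this paper draws on the new theoretical framework of sparse representation and proposes a sparse representation based feature selection algorithm (SRFS). The method uses the result of sparse representation classification as a measure of the relative importance of the features in a given feature subspace. By aggregating the results computed over a large number of randomly sampled subspaces, it ranks every feature in the feature space, and further analysis then extracts several spectral peaks associated with tumors. Experiments on the public ovarian cancer dataset OC-WCX2a and the breast cancer dataset BC-WCX2a from Zhejiang Cancer Hospital show that SRFS can be effectively applied to the analysis of the SELDI-TOF protein mass spectrometry data used in this paper.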
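The ranking step described above, scoring many randomly sampled feature subspaces and aggregating the scores per feature, can be sketched as follows. As an assumption for brevity, a nearest-centroid classifier with leave-one-out accuracy stands in for sparse representation classification; the subspace size, number of draws and function names are hypothetical.

```python
import random


def nearest_centroid_loo(rows, labels, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on a feature subset.

    Assumes every class has at least two samples.
    """
    correct = 0
    for i, xi in enumerate(rows):
        best_dist, best_class = float("inf"), None
        for c in sorted(set(labels)):
            members = [rows[j] for j in range(len(rows)) if j != i and labels[j] == c]
            centroid = [sum(m[f] for m in members) / len(members) for f in feats]
            dist = sum((xi[f] - mu) ** 2 for f, mu in zip(feats, centroid))
            if dist < best_dist:
                best_dist, best_class = dist, c
        correct += best_class == labels[i]
    return correct / len(rows)


def random_subspace_ranking(rows, labels, subspace_size, n_draws, seed=0):
    """Rank features by the mean LOO accuracy of the random subspaces that contain them."""
    rng = random.Random(seed)
    p = len(rows[0])
    totals, counts = [0.0] * p, [0] * p
    for _ in range(n_draws):
        feats = rng.sample(range(p), subspace_size)
        acc = nearest_centroid_loo(rows, labels, feats)
        for f in feats:
            totals[f] += acc
            counts[f] += 1
    means = [totals[f] / counts[f] if counts[f] else 0.0 for f in range(p)]
    return sorted(range(p), key=lambda f: -means[f])  # stable sort: ties keep index order


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0] * 5 + [1] * 5
    # feature 0 separates the classes; features 1-4 are uniform noise
    rows = [[2.0 * lab + rng.uniform(0, 0.5)] + [rng.uniform(0, 1) for _ in range(4)] for lab in labels]
    print(random_subspace_ranking(rows, labels, subspace_size=2, n_draws=200))
```

Averaging over many small subspaces lets each feature be judged in combination with varied partners rather than in isolation, which is the motivation for the random-subspace statistics described in the abstract.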
15.
16.
Background
Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, advances in available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the most informative markers while also achieving high classification accuracy can be a daunting task, particularly if the data are high dimensional. A commonly adopted approach is to formulate feature selection as a biobjective optimization problem that maximizes the performance of the data analysis model (the quality of the fit to the training data) while minimizing the number of features used.
Results
We propose an optimization approach for the feature selection problem based on a “chaotic” version of the antlion optimizer, a nature-inspired algorithm that mimics the hunting mechanism of antlions. The balance between exploring the search space and exploiting the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I, which limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. This quasi-linear schedule for I may lead to premature convergence in some cases and to trapping in local minima in others. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets as well as other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.
17.
A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: 1) select more informative image locations upon which to fixate their eyes, or 2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: The number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.
18.
Marshall S. Scicchitano Deidre A. Dalmas Rogely W. Boyce Heath C. Thomas Kendall S. Frazier 《The journal of histochemistry and cytochemistry》2009,57(9):849-860
Global mass spectrometry (MS) profiling and spectral count quantitation are used to identify unique or differentially expressed proteins and can help identify potential biomarkers. MS has rarely been conducted in retrospective studies because, historically, the samples available for protein analyses were limited to formalin-fixed, paraffin-embedded (FFPE) archived tissue specimens. Reliable methods for obtaining proteomic profiles from FFPE samples are needed. Proteomic analysis of these samples has been confounded by formalin-induced protein cross-linking. Extracts from whole and laser capture microdissected (LCM) FFPE and frozen/optimal cutting temperature (OCT)–embedded matched control rat liver samples were compared for their performance in a liquid chromatography tandem MS format. Extracts from FFPE and frozen/OCT–embedded livers from atorvastatin-treated rats were further compared to assess the performance of FFPE samples in identifying atorvastatin-regulated proteins. Comparable molecular mass representation was found in extracts from FFPE and OCT-frozen tissue sections, whereas protein yields were slightly lower for the FFPE samples. The numbers of shared proteins identified indicated robust proteomic representation from FFPE tissue, and LCM did not negatively affect the number of proteins identified from either OCT-frozen or FFPE samples. Subcellular representation in FFPE samples was similar to that in OCT-frozen samples, with predominantly cytoplasmic proteins identified. Biologically relevant protein changes were detected in atorvastatin-treated FFPE liver samples, and selected atorvastatin-related proteins identified by MS were confirmed by Western blot analysis. These findings demonstrate that formalin fixation, paraffin processing, and LCM do not negatively impact protein quality and quantity as determined by MS, and that FFPE samples are amenable to global proteomic analysis. (J Histochem Cytochem 57:849–860, 2009)
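The abstract mentions spectral count quantitation without giving a formula. One widely used normalisation, offered here as an illustrative assumption rather than the authors' procedure, is the normalised spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is divided by the sum of these ratios over all proteins in the sample. The protein IDs and numbers in the usage example are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalised spectral abundance factor for each protein in one sample.

    spectral_counts and lengths are dicts keyed by protein ID; lengths are in residues.
    """
    # length-corrected counts: longer proteins yield more peptides, hence more spectra
    saf = {prot: spectral_counts[prot] / lengths[prot] for prot in spectral_counts}
    total = sum(saf.values())
    # scale so that the values in one sample sum to 1
    return {prot: value / total for prot, value in saf.items()}


if __name__ == "__main__":
    counts = {"P1": 12, "P2": 30, "P3": 5}  # hypothetical spectral counts
    lengths = {"P1": 350, "P2": 1200, "P3": 150}  # hypothetical lengths (residues)
    for prot, value in nsaf(counts, lengths).items():
        print(prot, round(value, 3))
```

Normalising within each sample makes FFPE and frozen extracts comparable even when, as reported above, total protein yield differs between them.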
19.
Feature selection from DNA microarray data is a major challenge due to the high dimensionality of expression data. The number of samples in a microarray data set is much smaller than the number of genes, so the data are ill-suited for direct use as the training set of a classifier. It is therefore important to select features before training the classifier. Because only a small subset of genes in the data set exhibits a strong correlation with the class, finding the relevant genes is often non-trivial. Thus there is a need to develop robust yet reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches include a filter phase (selecting the best individual genes from the data set) and a wrapper phase (selecting the best subset of genes). The methods use information gain (IG) and the Pearson Product Moment Correlation (PPMC) as filtering criteria and biogeography-based optimization (BBO) as the wrapper approach. The k-nearest neighbour (KNN) algorithm and a back-propagation neural network are used to evaluate the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs well across different data sets, with high accuracy (>90%) and a low error rate.
20.
A study of the effectiveness of two filter-based feature gene selection algorithms
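Selecting feature genes from gene expression profiles not only improves the performance of disease classification methods but also provides a new way to find disease-related feature genes. We compared support vector machine (SVM) classifiers built after feature selection by two algorithms, a t-test with adjusted p-values and a nonparametric scoring method, with classifiers built without feature selection, in terms of classification performance, agreement of support vectors (SVs), agreement of misclassified sample IDs, and stability after uniformly doubling the samples. The results showed that after feature selection, the classification performance of SVMs with linear, second-order polynomial, and radial basis function kernels improved markedly; the agreement of SVs and of misclassified sample IDs before and after feature selection was high; and the SVMs were stable. We therefore conclude that both feature selection algorithms are effective to a certain extent.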
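The abstract does not say which t-test variant or p-value adjustment was used. As an illustrative assumption, the sketch below computes Welch's t statistic per gene and applies the Benjamini-Hochberg false discovery rate adjustment to a list of p-values. Turning t statistics into p-values requires a t-distribution CDF (for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`), which is omitted here to keep the block standard-library only; the function names are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        # p * m / rank, made monotone by taking the running minimum (capped at 1)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Genes whose adjusted p-value falls below a chosen threshold would then be passed to the SVM; the adjustment matters because thousands of genes are tested at once.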