20 similar articles found.
1.
Max Bylesjö Mattias Rantalainen Jeremy K Nicholson Elaine Holmes Johan Trygg 《BMC bioinformatics》2008,9(1):106
Background
Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties facilitating separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS features enhanced interpretational capabilities, allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation.
2.
Background
State-of-the-art signal processing methods are known to detect information in single-trial event-related EEG data, a crucial aspect in the development of real-time applications such as brain-computer interfaces. This paper investigates one such novel approach, evaluating how individual classifier and feature subset tailoring affects classification of single-trial EEG finger movements. The discrete wavelet transform was used to extract signal features that were classified using linear regression and non-linear neural network models, which were trained and architecturally optimized with evolutionary algorithms. The input feature subsets were also allowed to evolve, thus performing feature selection in a wrapper fashion. Filter approaches were implemented as well by limiting the degree of optimization.
3.
Comparison of multivariate methods for estimating soil total nitrogen with visible/near-infrared spectroscopy (total citations: 3; self-citations: 0; citations by others: 3)
Tiezhu Shi Lijuan Cui Junjie Wang Teng Fei Yiyun Chen Guofeng Wu 《Plant and Soil》2013,366(1-2):363-375
Aims
This study aimed to compare stepwise multiple linear regression (SMLR), partial least squares regression (PLSR) and support vector machine regression (SVMR) for estimating soil total nitrogen (TN) contents with laboratory visible/near-infrared reflectance (Vis/NIR) of selected coarse and heterogeneous soils. Moreover, the effects of the first (1st) vs. second (2nd) derivative of spectral reflectance and the important wavelengths were explored.
Methods
The TN contents and the Vis/NIR were measured in the laboratory. Several methods were employed for Vis/NIR data pre-processing. The SMLR, PLSR and SVMR models were calibrated and validated using independent datasets.
Results
Results showed that the SVMR and the PLSR models had similar performances, and better performances than the SMLR. The spectral bands near 1450, 1850, 2250, 2330 and 2430 nm in the PLSR model were important wavelengths. In addition, the 1st derivative was more appropriate than the 2nd derivative for spectral data pre-processing.
Conclusions
PLSR was the most suitable method for estimating TN contents in this study. SVMR may be a promising technique, and its potential needs to be further explored. Moreover, future studies using outdoor and airborne/satellite hyperspectral data for estimating TN content are necessary for testing the findings.
4.
Application of NIR Spectroscopy Coupled with PLS Regression for Quantification of Total Polyphenol Contents from the Fruit and Aerial Parts of Citrullus colocynthis
Tania S. Rizvi Fazal Mabood Liaqat Ali Mohammed Al‐Broumi Hamida K.M. Al Rabani Javid Hussain Farah Jabeen Suryyia Manzoor Ahmed Al‐Harrasi 《Phytochemical analysis : PCA》2018,29(1):16-22
Introduction
Citrullus colocynthis (L.) Schrad is extensively used to treat diabetes, obesity, fever, cancer, amenorrhea, jaundice, leukemia, rheumatism, and respiratory diseases. Chemical studies have indicated the presence of several cucurbitacins, flavones, and other polyphenols in this plant. These phytochemical constituents are responsible for the interesting antioxidant and other biological activities of C. colocynthis.
Objective
In the present study, for the first time, near infrared (NIR) spectroscopy coupled with partial least squares (PLS) regression analysis was used to quantify the polyphenolic phytochemicals of C. colocynthis.
Methodology
The fruit and aerial parts of C. colocynthis were extracted individually in methanol followed by fractionation in n‐hexane, chloroform, ethyl acetate, n‐butanol, and water. Near infrared (NIR) spectra were obtained in absorption mode in the wavelength range 700–2500 nm. The PLS regression model was then built from the obtained spectral data to quantify the total polyphenol contents in the selected plant samples.
Results
The PLS regression model obtained had an R² value of 99% with a correlation of 0.98, and predicted well, with a root mean square error of prediction (RMSEP) of 1.89% and a correlation of 0.98. These results were further confirmed through UV–vis spectroscopy, and it was found that the ethyl acetate fraction has the maximum value for polyphenol contents (101.7 mg/100 g; NIR, 100.4 mg/100 g; UV–vis).
Conclusions
The polyphenolic phytochemicals of the fruit and aerial parts of C. colocynthis have been quantified successfully by using multivariate analysis in a non‐destructive, economical, precise, and highly sensitive method, which uses very simple sample preparation. Copyright © 2017 John Wiley & Sons, Ltd.
5.
Background
The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed which determine alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but are not well aligned when amino acids are treated as mere characters. This approach finds a similarity score between sequences based on any given attribute, like hydrophobicity of amino acids, on the basis of spectral information after partial conversion to the frequency domain.
6.
Paulo C Carvalho Juliana SG Fischer Emily I Chen John R Yates III Valmir C Barbosa 《BMC bioinformatics》2008,9(1):316
Background
A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from a data analysis perspective, since replicate readings are not acquired.
7.
Emmanuel G Reynaud Miguel A Andrade Fabien Bonneau Thi Bach Nga Ly Michael Knop Klaus Scheffzek Rainer Pepperkok 《BMC biology》2005,3(1):21
Background
Compartmentalization is a key feature of eukaryotic cells, but its evolution remains poorly understood. GTPases are the oldest enzymes that use nucleotides as substrates and they participate in a wide range of cellular processes. Therefore, they are ideal tools for comparative genomic studies aimed at understanding how aspects of biological complexity such as cellular compartmentalization evolved.
8.
Background
Protein-protein interaction (PPI) networks enable us to better understand the functional organization of the proteome. We can learn a lot about a particular protein by querying its neighborhood in a PPI network to find proteins with similar function. A spectral approach that considers random walks between nodes of interest is particularly useful in evaluating closeness in PPI networks. Spectral measures of closeness are more robust to noise in the data and are more precise than simpler methods based on edge density and shortest path length.
9.
Lin Q Peng Q Yao F Pan XF Xiong LW Wang Y Geng JF Feng JX Han BH Bao GL Yang Y Wang X Jin L Guo W Wang JC 《PloS one》2012,7(3):e34457
Purpose
Lung cancer is the leading cause of cancer death worldwide, but techniques for effective early diagnosis are still lacking. Proteomics technology has been applied extensively to the study of the proteins involved in carcinogenesis. In this paper, a classification method was developed based on principal components of surface-enhanced laser desorption/ionization (SELDI) spectral data. This method was applied to SELDI spectral data from 71 lung adenocarcinoma patients and 24 healthy individuals. Unlike other peak-selection-based methods, this method takes each spectrum as a unity. The aim of this paper was to demonstrate that this unity-based classification method is more robust and powerful as a method of diagnosis than peak-selection-based methods.
Results
The results showed that this classification method, which is based on principal components, has outstanding performance with respect to distinguishing lung adenocarcinoma patients from normal individuals. Through leave-one-out, 19-fold, 5-fold and 2-fold cross-validation studies, we found that this classification method based on principal components completely outperforms peak-selection-based methods, such as decision tree, classification and regression tree, support vector machine, and linear discriminant analysis.
Conclusions and Clinical Relevance
The classification method based on principal components of SELDI spectral data is a robust and powerful means of diagnosing lung adenocarcinoma. We assert that the high efficiency of this classification method renders it feasible for large-scale clinical use.
10.
Background
Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes based on recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.
11.
12.
Background
Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models.
13.
Introduction
Glucosamine is an amino-monosaccharide and precursor of glycosaminoglycans, major components of joint cartilage. Glucosamine has been clinically introduced for the treatment of osteoarthritis but the data about its protective role in disease are insufficient. The goal of this study was to investigate the effect of long-term administration of glucosamine on bone resorption and remodeling.
14.
15.
Background
Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.
16.
Henry Han 《BMC medical genomics》2014,7(Z1):S5
Background
As a promising way to transform medicine, mass spectrometry based proteomics technologies have seen great progress in identifying disease biomarkers for clinical diagnosis and prognosis. However, there is a lack of effective feature selection methods that are able to capture essential data behaviors to achieve clinical level disease diagnosis. Moreover, the field faces a data reproducibility challenge: no two independent studies have been found to produce the same proteomic patterns. This reproducibility issue causes the identified biomarker patterns to lose repeatability and prevents them from real clinical usage.
Methods
In this work, we propose a novel machine-learning algorithm: derivative component analysis (DCA) for high-dimensional mass spectral proteomic profiles. As an implicit feature selection algorithm, derivative component analysis examines input proteomics data in a multi-resolution approach by seeking its derivatives to capture latent data characteristics and conduct de-noising. We further demonstrate DCA's advantages in disease diagnosis by treating input proteomics data as a profile biomarker and integrating DCA with support vector machines to tackle the reproducibility issue, and we compare it with state-of-the-art peers.
Results
Our results show that high-dimensional proteomics data are actually linearly separable under the proposed derivative component analysis (DCA). As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of the traditional methods in subtle data behavior discovery, but also suggests an effective way to overcome the reproducibility problem of proteomics data and provides new techniques and insights in translational bioinformatics and machine learning. The DCA-based profile biomarker diagnosis makes clinical level diagnostic performances reproducible across different proteomic data, which is more robust and systematic than the existing biomarker discovery based diagnosis.
Conclusions
Our findings demonstrate the feasibility and power of the proposed DCA-based profile biomarker diagnosis in achieving high sensitivity and conquering the data reproducibility issue in serum proteomics. Furthermore, our proposed derivative component analysis suggests that gleaning subtle data characteristics and de-noising are essential in separating true signals from red herrings for high-dimensional proteomic profiles, which can be more important than the conventional feature selection or dimension reduction. In particular, our profile biomarker diagnosis can be generalized to other omics data because DCA is a generic data analysis method.
17.
Background
In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.
18.
Feature selection for splice site prediction: A new method using EDA-based feature ranking (total citations: 1; self-citations: 0; citations by others: 1)
Background
The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.
19.
Rita L Strack Dibyendu Bhattacharyya Benjamin S Glick Robert J Keenan 《BMC biotechnology》2009,9(1):32-10
Background
Whole-cell labeling is a common application of fluorescent proteins (FPs), but many red and orange FPs exhibit cytotoxicity that limits their use as whole-cell labels. Recently, a tetrameric red FP called DsRed-Express2 was engineered for enhanced solubility and was shown to be noncytotoxic in bacterial and mammalian cells. Our goal was to create derivatives of this protein with different spectral properties.
20.