Similar literature
1.

Background  

Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties facilitating separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS features enhanced interpretational capabilities, allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation.

2.

Background  

State-of-the-art signal processing methods are known to detect information in single-trial event-related EEG data, a crucial aspect in the development of real-time applications such as brain-computer interfaces. This paper investigates one such novel approach, evaluating how individual classifier and feature subset tailoring affects classification of single-trial EEG finger movements. The discrete wavelet transform was used to extract signal features that were classified using linear regression and non-linear neural network models, which were trained and architecturally optimized with evolutionary algorithms. The input feature subsets were also allowed to evolve, thus performing feature selection in a wrapper fashion. Filter approaches were implemented as well by limiting the degree of optimization.

3.

Aims

This study aimed to compare stepwise multiple linear regression (SMLR), partial least squares regression (PLSR) and support vector machine regression (SVMR) for estimating soil total nitrogen (TN) contents with laboratory visible/near-infrared reflectance (Vis/NIR) of selected coarse and heterogeneous soils. Moreover, the effects of the first (1st) vs. second (2nd) derivative of spectral reflectance and the important wavelengths were explored.

Methods

The TN contents and the Vis/NIR were measured in the laboratory. Several methods were employed for Vis/NIR data pre-processing. The SMLR, PLSR and SVMR models were calibrated and validated using independent datasets.

Results

Results showed that the SVMR and the PLSR models had similar performances, and better performances than the SMLR. The spectral bands near 1450, 1850, 2250, 2330 and 2430 nm in the PLSR model were important wavelengths. In addition, the 1st derivative was more appropriate than the 2nd derivative for spectral data pre-processing.
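As a rough illustration of the derivative pre-processing compared above, the sketch below computes first and second derivatives of a reflectance spectrum with respect to wavelength. The abstract does not say how the derivatives were obtained, so the use of plain finite differences (`numpy.gradient`), the function name `spectral_derivatives`, and the synthetic spectrum are all assumptions made for illustration.

```python
import numpy as np


def spectral_derivatives(wavelengths_nm, reflectance):
    """Return (first, second) derivatives of reflectance w.r.t. wavelength.

    Hypothetical helper: uses finite differences, which is only one of
    several ways such derivatives can be computed.
    """
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    # np.gradient accepts the sample coordinates, so uneven spacing is handled.
    first = np.gradient(reflectance, wavelengths_nm)
    second = np.gradient(first, wavelengths_nm)
    return first, second


# Synthetic, perfectly linear spectrum (not data from the study).
wl = np.linspace(400.0, 2500.0, 211)   # 10 nm steps
refl = 0.2 + 1e-4 * wl
d1, d2 = spectral_derivatives(wl, refl)
```

Each additional differentiation tends to amplify high-frequency noise, which is one plausible reason a first derivative can suit noisy spectra better than a second; the abstract itself does not give the reason.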

Conclusions

PLSR was the most suitable method for estimating TN contents in this study. SVMR may be a promising technique, and its potential needs to be further explored. Moreover, future studies using outdoor and airborne/satellite hyperspectral data for estimating TN content are necessary to test these findings.

4.

Introduction

Citrullus colocynthis (L.) Schrad is extensively used to treat diabetes, obesity, fever, cancer, amenorrhea, jaundice, leukemia, rheumatism, and respiratory diseases. Chemical studies have indicated the presence of several cucurbitacins, flavones, and other polyphenols in this plant. These phytochemical constituents are responsible for the interesting antioxidant and other biological activities of C. colocynthis.

Objective

In the present study, for the first time, near infrared (NIR) spectroscopy coupled with partial least squares (PLS) regression analysis was used to quantify the polyphenolic phytochemicals of C. colocynthis.

Methodology

The fruit and aerial parts of C. colocynthis were extracted individually in methanol followed by fractionation in n‐hexane, chloroform, ethyl acetate, n‐butanol, and water. Near infrared (NIR) spectra were obtained in absorption mode in the wavelength range 700–2500 nm. A PLS regression model was then built from the spectral data to quantify the total polyphenol contents in the selected plant samples.

Results

The PLS regression model had an R2 value of 99% with a correlation of 0.98, and it predicted well, with a root mean square error of prediction (RMSEP) of 1.89% and a correlation of 0.98. These results were further confirmed through UV–vis spectroscopy, and the ethyl acetate fraction was found to have the highest polyphenol content (101.7 mg/100 g by NIR; 100.4 mg/100 g by UV–vis).
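For reference, the two figures of merit quoted above are conventionally defined as follows. The symbols are not taken from the abstract: y_i is the reference value for sample i, ŷ_i the model prediction, ȳ the mean reference value, and n the number of prediction samples. The abstract reports RMSEP as a percentage, and it does not state how that percentage was normalized.

```latex
\[
\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\]
```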

Conclusions

The polyphenolic phytochemicals of the fruit and aerial parts of C. colocynthis have been quantified successfully by using multivariate analysis in a non‐destructive, economical, precise, and highly sensitive method, which uses very simple sample preparation. Copyright © 2017 John Wiley & Sons, Ltd.

5.

Background  

The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed which determine alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but are not aligned well when amino acids are treated as mere characters. This approach finds a similarity score between sequences based on any given attribute, such as the hydrophobicity of amino acids, on the basis of spectral information after partial conversion to the frequency domain.

6.

Background  

A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired.

7.

Background  

Compartmentalization is a key feature of eukaryotic cells, but its evolution remains poorly understood. GTPases are the oldest enzymes that use nucleotides as substrates and they participate in a wide range of cellular processes. Therefore, they are ideal tools for comparative genomic studies aimed at understanding how aspects of biological complexity such as cellular compartmentalization evolved.

8.

Background  

Protein-protein interaction (PPI) networks enable us to better understand the functional organization of the proteome. We can learn a lot about a particular protein by querying its neighborhood in a PPI network to find proteins with similar function. A spectral approach that considers random walks between nodes of interest is particularly useful in evaluating closeness in PPI networks. Spectral measures of closeness are more robust to noise in the data and are more precise than simpler methods based on edge density and shortest path length.

9.
Lin Q  Peng Q  Yao F  Pan XF  Xiong LW  Wang Y  Geng JF  Feng JX  Han BH  Bao GL  Yang Y  Wang X  Jin L  Guo W  Wang JC 《PloS one》2012,7(3):e34457

Purpose

Lung cancer is the leading cause of cancer death worldwide, but techniques for effective early diagnosis are still lacking. Proteomics technology has been applied extensively to the study of the proteins involved in carcinogenesis. In this paper, a classification method was developed based on principal components of surface-enhanced laser desorption/ionization (SELDI) spectral data. This method was applied to SELDI spectral data from 71 lung adenocarcinoma patients and 24 healthy individuals. Unlike other peak-selection-based methods, this method takes each spectrum as a unity. The aim of this paper was to demonstrate that this unity-based classification method is more robust and powerful as a method of diagnosis than peak-selection-based methods.

Results

The results showed that this classification method, which is based on principal components, has outstanding performance with respect to distinguishing lung adenocarcinoma patients from normal individuals. Through leave-one-out, 19-fold, 5-fold and 2-fold cross-validation studies, we found that this classification method based on principal components completely outperforms peak-selection-based methods, such as decision tree, classification and regression tree, support vector machine, and linear discriminant analysis.
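The general idea of classifying whole spectra through their principal components, evaluated by leave-one-out cross-validation, can be sketched as below. This is not the authors' method: the abstract does not say which classifier is applied to the component scores, so the nearest-centroid rule, the function names `pca_fit` and `loo_accuracy`, the number of components, and the synthetic data are all assumptions.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the leading principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def loo_accuracy(X, y, n_components=2):
    """Leave-one-out accuracy of a nearest-centroid rule on PCA scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        # PCA is refitted on the training spectra only, so the held-out
        # spectrum never influences the projection.
        mean, axes = pca_fit(X[train], n_components)
        scores = (X[train] - mean) @ axes.T
        held_out = (X[i] - mean) @ axes.T
        labels = np.unique(y[train])
        centroids = np.array(
            [scores[y[train] == c].mean(axis=0) for c in labels]
        )
        nearest = np.argmin(np.linalg.norm(centroids - held_out, axis=1))
        hits += int(labels[nearest] == y[i])
    return hits / len(y)


# Synthetic "spectra" (not SELDI data): class 1 has a raised band.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 0.1, size=(8, 30))
patients = rng.normal(0.0, 0.1, size=(12, 30))
patients[:, 10:15] += 1.0
X = np.vstack([healthy, patients])
y = np.array([0] * 8 + [1] * 12)
acc = loo_accuracy(X, y)
```

Refitting the projection inside each fold is the design choice that matters here: fitting PCA once on all spectra would leak information from the held-out sample into the features.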

Conclusions and Clinical Relevance

The classification method based on principal components of SELDI spectral data is a robust and powerful means of diagnosing lung adenocarcinoma. We assert that the high efficiency of this classification method renders it feasible for large-scale clinical use.

10.

Background  

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high-dimensional data analysis. We describe a new procedure for selecting significant genes, recursive cluster elimination (RCE), as an alternative to recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.

11.
12.

Background  

Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models.

13.

Introduction  

Glucosamine is an amino-monosaccharide and precursor of glycosaminoglycans, major components of joint cartilage. Glucosamine has been clinically introduced for the treatment of osteoarthritis, but the data about its protective role in disease are insufficient. The goal of this study was to investigate the effect of long-term administration of glucosamine on bone resorption and remodeling.

14.
15.

Background  

Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.

16.

Background

As a promising way to transform medicine, mass spectrometry-based proteomics technologies have made great progress in identifying disease biomarkers for clinical diagnosis and prognosis. However, there is a lack of effective feature selection methods that are able to capture essential data behaviors to achieve clinical-level disease diagnosis. Moreover, the field faces a challenge from data reproducibility: no two independent studies have been found to produce the same proteomic patterns. This reproducibility issue means the identified biomarker patterns are not repeatable, which prevents their real clinical use.

Methods

In this work, we propose a novel machine-learning algorithm, derivative component analysis (DCA), for high-dimensional mass spectral proteomic profiles. As an implicit feature selection algorithm, derivative component analysis examines input proteomics data in a multi-resolution approach by seeking its derivatives to capture latent data characteristics and conduct de-noising. We further demonstrate DCA's advantages in disease diagnosis by treating the input proteomics data as a profile biomarker and integrating DCA with support vector machines to tackle the reproducibility issue, and we compare it with state-of-the-art peers.

Results

Our results show that high-dimensional proteomics data are actually linearly separable under the proposed derivative component analysis (DCA). As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of traditional methods in discovering subtle data behavior, but also suggests an effective way to overcome the reproducibility problem of proteomics data and provides new techniques and insights in translational bioinformatics and machine learning. The DCA-based profile biomarker diagnosis makes clinical-level diagnostic performance reproducible across different proteomic data, and is more robust and systematic than existing diagnosis based on biomarker discovery.

Conclusions

Our findings demonstrate the feasibility and power of the proposed DCA-based profile biomarker diagnosis in achieving high sensitivity and conquering the data reproducibility issue in serum proteomics. Furthermore, our proposed derivative component analysis suggests that gleaning subtle data characteristics and de-noising are essential in separating true signals from red herrings for high-dimensional proteomic profiles, and can be more important than conventional feature selection or dimension reduction. In particular, our profile biomarker diagnosis can be generalized to other omics data, because derivative component analysis (DCA) is by nature a generic data analysis method.

17.

Background  

In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.

18.

Background  

The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

19.

Background  

Whole-cell labeling is a common application of fluorescent proteins (FPs), but many red and orange FPs exhibit cytotoxicity that limits their use as whole-cell labels. Recently, a tetrameric red FP called DsRed-Express2 was engineered for enhanced solubility and was shown to be noncytotoxic in bacterial and mammalian cells. Our goal was to create derivatives of this protein with different spectral properties.

20.

Background

The study of the signal-receiver relationship between flowering plants and pollinators requires a capacity to accurately map both the spectral and spatial components of a signal in relation to the perceptual abilities of potential pollinators. Spectrophotometers can typically recover high resolution spectral data, but the spatial component is difficult to record simultaneously. A technique allowing for an accurate measurement of the spatial component in addition to the spectral factor of the signal is highly desirable.

Methodology/Principal findings

Consumer-level digital cameras potentially provide access to both colour and spatial information, but they are constrained by their non-linear response. We present a robust methodology for recovering linear values from two different camera models: one sensitive to ultraviolet (UV) radiation and another to visible wavelengths. We test responses by imaging eight different plant species varying in shape, size and in the amount of energy reflected across the UV and visible regions of the spectrum, and compare the recovery of spectral data to spectrophotometer measurements. There is often a good agreement of spectral data, although when the pattern on a flower surface is complex a spectrophotometer may underestimate the variability of the signal as would be viewed by an animal visual system.

Conclusion

Digital imaging presents a significant new opportunity to reliably map flower colours to understand the complexity of these signals as perceived by potential pollinators. Compared to spectrophotometer measurements, digital images can better represent the spatio-chromatic signal variability that would likely be perceived by the visual system of an animal, and should expand the possibilities for data collection in complex, natural conditions. However, in spite of these advantages, the accuracy of the spectral information recovered from camera responses varies in its uncertainty, with larger uncertainties associated with low radiance levels.
