
1.

K-OPLS package: Kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space

Max Bylesjö Mattias Rantalainen Jeremy K Nicholson Elaine Holmes Johan Trygg 《BMC bioinformatics》2008,9(1):106

Background

Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties facilitating separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to other kernel-based methods, K-OPLS features enhanced interpretational capabilities, allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation.

2.

Evolutionary optimization of classifiers and features for single-trial EEG Discrimination

Malin CB Åberg Johan Wessberg 《Biomedical engineering online》2007,6(1):32

Background

State-of-the-art signal processing methods are known to detect information in single-trial event-related EEG data, a crucial aspect in the development of real-time applications such as brain-computer interfaces. This paper investigates one such novel approach, evaluating how individual classifier and feature subset tailoring affects classification of single-trial EEG finger movements. The discrete wavelet transform was used to extract signal features that were classified using linear regression and non-linear neural network models, which were trained and architecturally optimized with evolutionary algorithms. The input feature subsets were also allowed to evolve, thus performing feature selection in a wrapper fashion. Filter approaches were implemented as well by limiting the degree of optimization.
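The wavelet feature extraction mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name the wavelet family or decomposition depth, so the Haar wavelet, the function names `haar_step` and `haar_features`, and the sample values below are all assumptions chosen for brevity.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_features(signal, levels):
    """Final approximation followed by detail coefficients, coarsest level first."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details = detail + details  # prepend so coarser levels come first
    return approx + details

# Made-up samples standing in for one EEG trial (hypothetical data).
trial = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_features(trial, levels=3)
```

Because this transform is orthonormal, the coefficient vector has the same length and energy as the input. A wrapper-style feature selection would then search over subsets of `coeffs` using classifier performance as the selection criterion.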

3.

Comparison of multivariate methods for estimating soil total nitrogen with visible/near-infrared spectroscopy (cited by: 3; self-citations: 0; citations by others: 3)

Tiezhu Shi Lijuan Cui Junjie Wang Teng Fei Yiyun Chen Guofeng Wu 《Plant and Soil》2013,366(1-2):363-375

Aims

This study aimed to compare stepwise multiple linear regression (SMLR), partial least squares regression (PLSR) and support vector machine regression (SVMR) for estimating soil total nitrogen (TN) contents with laboratory visible/near-infrared reflectance (Vis/NIR) of selected coarse and heterogeneous soils. Moreover, the effects of the first (1st) vs. second (2nd) derivative of spectral reflectance and the important wavelengths were explored.

Methods

The TN contents and the Vis/NIR were measured in the laboratory. Several methods were employed for Vis/NIR data pre-processing. The SMLR, PLSR and SVMR models were calibrated and validated using independent datasets.

Results

Results showed that the SVMR and the PLSR models had similar performances, and better performances than the SMLR. The spectral bands near 1450, 1850, 2250, 2330 and 2430 nm in the PLSR model were important wavelengths. In addition, the 1st derivative was more appropriate than the 2nd derivative for spectral data pre-processing.
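The derivative pre-processing compared in this abstract can be sketched with central finite differences. The abstract does not state how the derivatives were computed (smoothing filters are common in practice), so the `derivative` helper and the reflectance values below are assumptions for illustration only.

```python
def derivative(wavelengths, reflectance):
    """Central-difference derivative of reflectance with respect to wavelength.

    Returns the interior wavelengths and the derivative at each of them.
    """
    values = []
    for i in range(1, len(wavelengths) - 1):
        rise = reflectance[i + 1] - reflectance[i - 1]
        run = wavelengths[i + 1] - wavelengths[i - 1]
        values.append(rise / run)
    return wavelengths[1:-1], values

# Hypothetical reflectance readings around the 1450 nm band.
wl = [1440.0, 1445.0, 1450.0, 1455.0, 1460.0]
refl = [0.30, 0.32, 0.36, 0.42, 0.50]

wl1, d1 = derivative(wl, refl)  # first derivative
wl2, d2 = derivative(wl1, d1)   # second derivative (derivative applied twice)
```

Each differentiation step trims one point from each end of the spectrum and amplifies noise. That noise amplification may be one reason a second derivative can perform worse than a first derivative, although the abstract itself does not give a reason.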

Conclusions

PLSR was the most suitable method for estimating TN contents in this study. SVMR may be a promising technique, and its potential needs to be further explored. Moreover, future studies using outdoor and airborne/satellite hyperspectral data for estimating TN content are necessary for testing the findings.

4.

Application of NIR Spectroscopy Coupled with PLS Regression for Quantification of Total Polyphenol Contents from the Fruit and Aerial Parts of Citrullus colocynthis

Tania S. Rizvi Fazal Mabood Liaqat Ali Mohammed Al‐Broumi Hamida K.M. Al Rabani Javid Hussain Farah Jabeen Suryyia Manzoor Ahmed Al‐Harrasi 《Phytochemical analysis : PCA》2018,29(1):16-22

Introduction

Citrullus colocynthis (L.) Schrad is extensively used to treat diabetes, obesity, fever, cancer, amenorrhea, jaundice, leukemia, rheumatism, and respiratory diseases. Chemical studies have indicated the presence of several cucurbitacins, flavones, and other polyphenols in this plant. These phytochemical constituents are responsible for the interesting antioxidant and other biological activities of C. colocynthis.

Objective

In the present study, for the first time, near infrared (NIR) spectroscopy coupled with partial least square (PLS) regression analysis was used to quantify the polyphenolic phytochemicals of C. colocynthis.

Methodology

The fruit and aerial parts of the C. colocynthis were extracted individually in methanol followed by fractionation in n‐hexane, chloroform, ethyl acetate, n‐butanol, and water. Near infrared (NIR) spectra were obtained in absorption mode in the wavelength range 700–2500 nm. The PLS regression model was then built from the obtained spectral data to quantify the total polyphenol contents in the selected plant samples.

Results

The PLS regression model obtained had an R² value of 99% with a correlation of 0.98, and it predicted well, with a root mean square error of prediction (RMSEP) of 1.89% and a correlation of 0.98. These results were further confirmed through UV–vis spectroscopy, and it was found that the ethyl acetate fraction has the maximum value for polyphenol contents (101.7 mg/100 g; NIR, 100.4 mg/100 g; UV–vis).
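The two figures of merit quoted above, RMSEP and R², follow standard definitions and can be computed as in the sketch below. The reference and predicted values are invented for illustration and are not data from the study. The study reports RMSEP as a percentage, whereas this sketch returns it in the units of the inputs.

```python
import math

def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference values and model predictions (same units).
reference = [20.0, 40.0, 60.0, 80.0, 100.0]
predicted = [22.0, 39.0, 61.0, 78.0, 100.0]

error = rmsep(reference, predicted)
fit = r_squared(reference, predicted)
```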

Conclusions

The polyphenolic phytochemicals of the fruit and aerial parts of C. colocynthis have been quantified successfully by using multivariate analysis in a non‐destructive, economical, precise, and highly sensitive method, which uses very simple sample preparation. Copyright © 2017 John Wiley & Sons, Ltd.

5.

Detailed protein sequence alignment based on Spectral Similarity Score (SSS)

Kshitiz Gupta Dina Thomas SV Vidya KV Venkatesh S Ramakumar 《BMC bioinformatics》2005,6(1):105

Background

The chemical property and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed which determine alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but are not aligned well when amino acids are considered as mere characters. This approach finds a similarity score between sequences based on any given attribute, such as hydrophobicity of amino acids, on the basis of spectral information after partial conversion to the frequency domain.
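The idea of comparing sequences through the spectrum of a physicochemical attribute can be sketched as follows. This is not the authors' Spectral Similarity Score: it is a simplified stand-in that takes a full discrete Fourier transform of a Kyte-Doolittle hydrophobicity profile and compares magnitude spectra by cosine similarity. The function names and the example peptides are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydrophobicity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def dft_magnitudes(values):
    """Magnitude of each discrete Fourier coefficient of a real sequence."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n)
    ]

def spectral_cosine(seq_a, seq_b):
    """Cosine similarity between hydrophobicity magnitude spectra of equal-length peptides."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mag_a = dft_magnitudes([KD[c] for c in seq_a])
    mag_b = dft_magnitudes([KD[c] for c in seq_b])
    dot = sum(x * y for x, y in zip(mag_a, mag_b))
    norm = math.sqrt(sum(x * x for x in mag_a)) * math.sqrt(sum(y * y for y in mag_b))
    return dot / norm

# Two hypothetical peptides that mismatch position by position
# but share the same cyclic hydrophobicity pattern.
score = spectral_cosine("AILVKE", "ILVKEA")
```

Magnitude spectra are unchanged by circular shifts, so the two peptides above score as essentially identical even though they mismatch at every position when compared character by character.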

6.

PatternLab for proteomics: a tool for differential shotgun proteomics

Paulo C Carvalho Juliana SG Fischer Emily I Chen John R Yates III Valmir C Barbosa 《BMC bioinformatics》2008,9(1):316

Background

A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired.

7.

Human Lsg1 defines a family of essential GTPases that correlates with the evolution of compartmentalization

Emmanuel G Reynaud Miguel A Andrade Fabien Bonneau Thi Bach Nga Ly Michael Knop Klaus Scheffzek Rainer Pepperkok 《BMC biology》2005,3(1):21

Background

Compartmentalization is a key feature of eukaryotic cells, but its evolution remains poorly understood. GTPases are the oldest enzymes that use nucleotides as substrates and they participate in a wide range of cellular processes. Therefore, they are ideal tools for comparative genomic studies aimed at understanding how aspects of biological complexity such as cellular compartmentalization evolved.

8.

Spectral affinity in protein networks

Konstantin Voevodski Shang-Hua Teng Yu Xia 《BMC systems biology》2009,3(1):112-13

Background

Protein-protein interaction (PPI) networks enable us to better understand the functional organization of the proteome. We can learn a lot about a particular protein by querying its neighborhood in a PPI network to find proteins with similar function. A spectral approach that considers random walks between nodes of interest is particularly useful in evaluating closeness in PPI networks. Spectral measures of closeness are more robust to noise in the data and are more precise than simpler methods based on edge density and shortest path length.
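A random-walk notion of closeness can be sketched with a random walk with restart, computed by power iteration. The abstract does not define its affinity measure, so this is a generic illustration rather than the authors' method. The toy network, the protein labels and the restart probability are all assumptions.

```python
def random_walk_with_restart(adjacency, source, restart=0.15, iterations=100):
    """Visit probabilities of a walker that jumps back to `source` with probability `restart`.

    `adjacency` maps each node to a non-empty list of its neighbours.
    """
    nodes = sorted(adjacency)
    score = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (restart if n == source else 0.0) for n in nodes}
        for n in nodes:
            # Split the non-restarting probability mass evenly among neighbours.
            share = (1.0 - restart) * score[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        score = nxt
    return score

# Hypothetical undirected toy network: a triangle A-B-C with a tail C-D-E.
ppi = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(ppi, source="A")
```

In this toy network B and D have the same degree, but B scores higher because it sits in the same triangle as the query protein A. A raw degree count would not capture that distinction.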

9.

A classification method based on principal components of SELDI spectra to diagnose of lung adenocarcinoma

Lin Q Peng Q Yao F Pan XF Xiong LW Wang Y Geng JF Feng JX Han BH Bao GL Yang Y Wang X Jin L Guo W Wang JC 《PloS one》2012,7(3):e34457

Purpose

Lung cancer is the leading cause of cancer death worldwide, but techniques for effective early diagnosis are still lacking. Proteomics technology has been applied extensively to the study of the proteins involved in carcinogenesis. In this paper, a classification method was developed based on principal components of surface-enhanced laser desorption/ionization (SELDI) spectral data. This method was applied to SELDI spectral data from 71 lung adenocarcinoma patients and 24 healthy individuals. Unlike other peak-selection-based methods, this method takes each spectrum as a unity. The aim of this paper was to demonstrate that this unity-based classification method is more robust and powerful as a method of diagnosis than peak-selection-based methods.

Results

The results showed that this classification method, which is based on principal components, has outstanding performance with respect to distinguishing lung adenocarcinoma patients from normal individuals. Through leave-one-out, 19-fold, 5-fold and 2-fold cross-validation studies, we found that this classification method based on principal components completely outperforms peak-selection-based methods, such as decision tree, classification and regression tree, support vector machine, and linear discriminant analysis.
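The cross-validation schemes listed above differ only in how the 95 spectra (71 patients plus 24 healthy individuals) are partitioned into test folds. The sketch below generates such partitions. The `k_fold_indices` helper is hypothetical, and it omits the shuffling and class stratification that a real study would normally apply, since the abstract does not describe them.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous test folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

n = 71 + 24  # patients + healthy individuals, as stated in the abstract
schemes = {"leave-one-out": n, "19-fold": 19, "5-fold": 5, "2-fold": 2}
sizes = {
    name: [len(test) for _, test in k_fold_indices(n, k)]
    for name, k in schemes.items()
}
```

With 95 samples, 19-fold cross-validation holds out exactly 5 spectra per fold and 5-fold holds out 19, which may explain the otherwise unusual choice of 19 folds.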

Conclusions and Clinical Relevance

The classification method based on principal components of SELDI spectral data is a robust and powerful means of diagnosing lung adenocarcinoma. We assert that the high efficiency of this classification method renders it feasible for large-scale clinical use.

10.

Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

Malik Yousef Segun Jung Louise C Showe Michael K Showe 《BMC bioinformatics》2007,8(1):144

Background

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.

11.

Combining strong sparsity and competitive predictive power with the L-sOPLS approach for biomarker discovery in metabolomics

Baptiste Féraud Carine Munaut Manon Martin Michel Verleysen Bernadette Govaerts 《Metabolomics : Official journal of the Metabolomic Society》2017,13(11):130

12.

Survival prediction from clinico-genomic models - a comparative study

Hege M Bøvelstad Ståle Nygård Ørnulf Borgan 《BMC bioinformatics》2009,10(1):413

Background

Survival prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical covariates that often are available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions, but there is a lack of systematic studies on the topic. Also, for the widely used Cox regression model, it is not obvious how to handle such combined models.

13.

Bone resorption and remodeling in murine collagenase-induced osteoarthritis after administration of glucosamine

Ivanovska N Dimitrova P 《Arthritis research & therapy》2011,13(2):R44

Introduction

Glucosamine is an amino-monosaccharide and precursor of glycosaminoglycans, major components of joint cartilage. Glucosamine has been clinically introduced for the treatment of osteoarthritis, but the data about its protective role in disease are insufficient. The goal of this study was to investigate the effect of long-term administration of glucosamine on bone resorption and remodeling.

14.

An open-source representation for 2-DE-centric proteomics and support infrastructure for data storage and analysis

Romesh Stanislaus John M Arthur Balaji Rajagopalan Rick Moerschell Brian McGlothlen Jonas S Almeida 《BMC bioinformatics》2008,9(1):4

15.

Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

Niclas C Tan Wayne G Fisher Kevin P Rosenblatt Harold R Garner 《BMC bioinformatics》2009,10(1):144

Background

Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.
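The consensus step described above can be sketched as a simple vote across methods. The abstract does not name the statistical tests or the consensus rule, so the method labels, the m/z values and the `consensus_peaks` helper below are hypothetical.

```python
from collections import Counter

def consensus_peaks(peak_lists, min_votes):
    """Return peaks reported by at least `min_votes` of the methods, sorted by m/z."""
    votes = Counter()
    for peaks in peak_lists.values():
        votes.update(set(peaks))  # each method votes at most once per peak
    return sorted(mz for mz, count in votes.items() if count >= min_votes)

# Hypothetical discriminatory peaks (m/z) selected by three independent methods.
selected = {
    "t_test": [1012.5, 2045.1, 3310.8],
    "rank_test": [1012.5, 3310.8, 4120.0],
    "classifier_weights": [1012.5, 2045.1, 5000.2],
}

unanimous = consensus_peaks(selected, min_votes=3)
majority = consensus_peaks(selected, min_votes=2)
```

This sketch matches peaks by exact m/z value. Real spectra would need peaks binned within a mass tolerance before voting.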

16.

Derivative component analysis for mass spectral serum proteomic profiles

Henry Han 《BMC medical genomics》2014,7(Z1):S5

Background

As a promising way to transform medicine, mass spectrometry based proteomics technologies have seen great progress in identifying disease biomarkers for clinical diagnosis and prognosis. However, there is a lack of effective feature selection methods that are able to capture essential data behaviors to achieve clinical level disease diagnosis. Moreover, the field faces a challenge from data reproducibility, which means that no two independent studies have been found to produce the same proteomic patterns. This reproducibility issue causes the identified biomarker patterns to lose repeatability and prevents them from real clinical usage.

Methods

In this work, we propose a novel machine-learning algorithm: derivative component analysis (DCA) for high-dimensional mass spectral proteomic profiles. As an implicit feature selection algorithm, derivative component analysis examines input proteomics data in a multi-resolution approach by seeking its derivatives to capture latent data characteristics and conduct de-noising. We further demonstrate DCA's advantages in disease diagnosis by viewing input proteomics data as a profile biomarker via integrating it with support vector machines to tackle the reproducibility issue, besides comparing it with state-of-the-art peers.

Results

Our results show that high-dimensional proteomics data are actually linearly separable under the proposed derivative component analysis (DCA). As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of the traditional methods in subtle data behavior discovery, but also suggests an effective way to overcome the reproducibility problem of proteomics data and provides new techniques and insights in translational bioinformatics and machine learning. The DCA-based profile biomarker diagnosis makes clinical level diagnostic performances reproducible across different proteomic data, which is more robust and systematic than the existing biomarker discovery based diagnosis.

Conclusions

Our findings demonstrate the feasibility and power of the proposed DCA-based profile biomarker diagnosis in achieving high sensitivity and conquering the data reproducibility issue in serum proteomics. Furthermore, our proposed derivative component analysis suggests that gleaning subtle data characteristics and de-noising are essential in separating true signals from red herrings for high-dimensional proteomic profiles, which can be more important than conventional feature selection or dimension reduction. In particular, our profile biomarker diagnosis can be generalized to other omics data because derivative component analysis (DCA) is by nature a generic data analysis method.

17.

msmsEval: tandem mass spectral quality assignment for high-throughput proteomics

Jason WH Wong Matthew J Sullivan Hugh M Cartwright Gerard Cagney 《BMC bioinformatics》2007,8(1):51

Background

In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. However, as amino acid sequence databases grow, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, methods to limit the number of spectra to be searched based on spectral quality have been proposed by different research groups, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.

18.

Feature selection for splice site prediction: A new method using EDA-based feature ranking (cited by: 1; self-citations: 0; citations by others: 1)

Yvan Saeys Sven Degroeve Dirk Aeyels Pierre Rouzé Yves Van de Peer 《BMC bioinformatics》2004,5(1):64

Background

The identification of relevant biological features in large and complex datasets is an important step towards gaining insight in the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

19.

Noncytotoxic orange and red/green derivatives of DsRed-Express2 for whole-cell labeling

Rita L Strack Dibyendu Bhattacharyya Benjamin S Glick Robert J Keenan 《BMC biotechnology》2009,9(1):32-10

Background

Whole-cell labeling is a common application of fluorescent proteins (FPs), but many red and orange FPs exhibit cytotoxicity that limits their use as whole-cell labels. Recently, a tetrameric red FP called DsRed-Express2 was engineered for enhanced solubility and was shown to be noncytotoxic in bacterial and mammalian cells. Our goal was to create derivatives of this protein with different spectral properties.

20.

Flower Colours through the Lens: Quantitative Measurement with Visible and Ultraviolet Digital Photography

Jair E. Garcia Andrew D. Greentree Mani Shrestha Alan Dorin Adrian G. Dyer 《PloS one》2014,9(5)

Background

The study of the signal-receiver relationship between flowering plants and pollinators requires a capacity to accurately map both the spectral and spatial components of a signal in relation to the perceptual abilities of potential pollinators. Spectrophotometers can typically recover high resolution spectral data, but the spatial component is difficult to record simultaneously. A technique allowing for an accurate measurement of the spatial component in addition to the spectral factor of the signal is highly desirable.

Methodology/Principal findings

Consumer-level digital cameras potentially provide access to both colour and spatial information, but they are constrained by their non-linear response. We present a robust methodology for recovering linear values from two different camera models: one sensitive to ultraviolet (UV) radiation and another to visible wavelengths. We test responses by imaging eight different plant species varying in shape, size and in the amount of energy reflected across the UV and visible regions of the spectrum, and compare the recovery of spectral data to spectrophotometer measurements. There is often a good agreement of spectral data, although when the pattern on a flower surface is complex a spectrophotometer may underestimate the variability of the signal as would be viewed by an animal visual system.

Conclusion

Digital imaging presents a significant new opportunity to reliably map flower colours to understand the complexity of these signals as perceived by potential pollinators. Compared to spectrophotometer measurements, digital images can better represent the spatio-chromatic signal variability that would likely be perceived by the visual system of an animal, and should expand the possibilities for data collection in complex, natural conditions. However, and in spite of its advantages, the accuracy of the spectral information recovered from camera responses is subject to variations in the uncertainty levels, with larger uncertainties associated with low radiance levels.