Similar Articles
20 matching articles found.
1.
The SELDI-TOF mass spectrometer's compact size and automated, high-throughput design have been attractive to clinical researchers, and the platform has seen steady use in biomarker studies. Despite new algorithms and preprocessing pipelines developed to address reproducibility issues, visual inspection of the results of SELDI spectra preprocessing by the best algorithms still shows miscalled peaks and systematic sources of error, suggesting that problems with SELDI preprocessing persist. In this work, we study SELDI preprocessing in detail and introduce improvements. While many algorithms, including the vendor-supplied software, can identify peak clusters of specific mass (or m/z) in groups of spectra with high specificity and low false discovery rate (FDR), they tend to underperform when estimating the exact prevalence and intensity of peaks in those clusters. Thus, group differences that at first appear very strong are shown, after careful and laborious hand inspection of the spectra, to be less than significant. Here we introduce a wavelet/neural-network-based algorithm that mimics what a team of expert human users would call as peaks in each of several hundred spectra in a typical SELDI clinical study. The wavelet-denoising part of the algorithm optimally smooths the signal in each spectrum according to an improved suite of signal processing algorithms previously reported (the LibSELDI toolbox under development). The neural-network part of the algorithm combines those results with the raw signal and a training dataset of expertly called peaks to call peaks in a test set of spectra with approximately 95% accuracy. The new method was applied to data collected from a study of cervical mucus for the early detection of cervical cancer in HPV-infected women. The method shows promise in addressing the ongoing SELDI reproducibility issues.
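The abstract does not reproduce the LibSELDI algorithm itself; the sketch below only illustrates the general shape of the approach, wavelet denoising of a spectrum followed by peak calling, using PyWavelets and SciPy. The wavelet family, threshold rule, and peak criteria are illustrative assumptions rather than the authors' published parameters, and the simple thresholded peak caller stands in for the neural-network stage.

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def wavelet_denoise(intensities, wavelet="db8", level=6):
    """Soft-threshold wavelet denoising (universal threshold); illustrative only."""
    coeffs = pywt.wavedec(intensities, wavelet, level=level)
    # Estimate the noise scale from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(intensities)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(intensities)]

def call_peaks(mz, intensities, min_snr=3.0):
    """Return m/z positions of peaks in the denoised signal; a simplified stand-in
    for the expert / neural-network peak calling described in the abstract."""
    smooth = wavelet_denoise(intensities)
    noise = np.std(intensities - smooth)
    idx, _ = find_peaks(smooth, height=min_snr * noise, distance=10)
    return mz[idx], smooth[idx]

if __name__ == "__main__":
    mz = np.linspace(2000, 20000, 2 ** 14)
    true_peaks = [5000, 7500, 11000]
    signal = sum(np.exp(-0.5 * ((mz - p) / 15.0) ** 2) for p in true_peaks)
    noisy = signal + 0.1 * np.random.randn(mz.size)
    print(call_peaks(mz, noisy)[0])
```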

2.

Motivation

Mass spectrometry is a high-throughput, fast, and accurate method of protein analysis. Using the peaks detected in spectra, we can compare a normal group with a disease group. However, the spectra are complicated by scale shifting and are also full of noise. Such shifting makes the spectra non-stationary, so they need to be aligned before comparison. Consequently, preprocessing of the mass data plays an important role in the analysis process. Noise in mass spectrometry data arises from many sources and spans many frequencies, so a powerful preprocessing method is needed to remove it.

Results

The Hilbert-Huang Transform (HHT) is a transformation used in signal processing for non-stationary signals. We provide a novel preprocessing algorithm that can deal with both MALDI and SELDI spectra. We use the Hilbert-Huang Transform to decompose the spectrum and filter out the very high-frequency and very low-frequency components. Because the noise in mass spectrometry comes from many sources, some of it can be removed by analysis in the frequency domain. Since a protein in the spectrum is expected to appear as a distinct peak, its contribution lies in the middle of the frequency domain and is not removed. The results show that HHT, when used for preprocessing, is generally better than other preprocessing methods. The approach not only detects peaks successfully, but also denoises spectra efficiently, especially when the data are complex. The drawback of HHT is that it takes much longer to process than wavelet and traditional methods. However, the processing time is still manageable and is worth the wait to obtain high-quality data.
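As a rough illustration of this kind of frequency-domain filtering, the sketch below decomposes a synthetic spectrum into intrinsic mode functions and rebuilds it without the highest- and lowest-frequency components. It assumes the third-party PyEMD package for empirical mode decomposition; the number of IMFs dropped at each end is an arbitrary choice, not the authors' tuning.

```python
import numpy as np
from PyEMD import EMD  # assumed third-party package providing empirical mode decomposition

def hht_style_denoise(intensities, drop_high=1, drop_low=1):
    """Decompose a spectrum into intrinsic mode functions and rebuild it without
    the highest-frequency IMF(s) (noise) and the lowest-frequency IMF(s) (baseline).
    The number of IMFs dropped at each end is an illustrative choice."""
    imfs = EMD().emd(intensities)   # rows: IMFs ordered from high to low frequency
    keep = imfs[drop_high : imfs.shape[0] - drop_low]
    return keep.sum(axis=0)

if __name__ == "__main__":
    x = np.linspace(0, 1, 4096)
    peak = np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)        # protein-like peak (mid frequency)
    baseline = 0.5 * x                                    # slow drift (low frequency)
    noise = 0.05 * np.random.randn(x.size)                # high-frequency noise
    cleaned = hht_style_denoise(peak + baseline + noise)
    print(cleaned.shape)
```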

3.

Background  

Surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI) is a proteomics tool for biomarker discovery and other high throughput applications. Previous studies have identified various areas for improvement in preprocessing algorithms used for protein peak detection. Bottom-up approaches to preprocessing that emphasize modeling SELDI data acquisition are promising avenues of research to find the needed improvements in reproducibility.

4.
5.
Our goal in this paper is to present an analytical workflow for selecting protein biomarker candidates from SELDI-MS data. The clinical question at issue is prediction of the duration of complete remission (CR) for acute myeloid leukemia (AML) patients, which would facilitate disease prognosis and make individualized therapy possible. SELDI mass spectrometry proteomics analyses were performed on blast cell samples collected from AML patients before chemotherapy. Although the available biobank included approximately 200 samples, only 58 were available for analysis. The presented workflow includes sample selection, experimental optimization, repeatability estimation, data preprocessing, data fusion, and feature selection. Specific difficulties were the small number of samples and the skewed distribution of CR duration among the patients. Furthermore, we had to deal with both noisy SELDI-MS data and a diverse patient cohort; this was handled by sample selection and by several methods for data preprocessing and feature detection in the analysis workflow. Four conceptually different methods for peak detection and alignment were considered, as well as two distinct methods for feature selection. The peak detection and alignment methods included the recently developed annotated regions of significance (ARS) method; the SELDI-MS software Ciphergen Express, which was regarded as the standard method; segment-wise spectral alignment by a genetic algorithm (PAGA) followed by binning; and, finally, binning of raw data. In the feature selection, the "standard" Mann-Whitney test was compared with a hierarchical orthogonal partial least-squares (O-PLS) analysis approach. The combined information from all these analyses gave a collection of 21 protein peaks. These were regarded as the most promising and robust biomarker candidates since they were picked out as significant features in several of the models. The chosen peaks will now be our first choice for the continuing work on protein identification and biological validation; identification will be performed by chromatographic purification and MALDI MS/MS. Thus, we have shown that the use of several data handling methods can improve a protein profiling workflow from experimental optimization to a predictive model. The framework of this methodology is general and could be applied to other one-dimensional spectral omics data besides SELDI MS, given an adequate number of samples.
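A minimal sketch of the two simplest steps in such a workflow, binning of raw spectra and Mann-Whitney feature ranking, is given below; the bin width, group sizes, and synthetic data are illustrative assumptions and do not reproduce the ARS, PAGA, or O-PLS methods.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bin_spectrum(mz, intensity, bin_edges):
    """Sum intensities into fixed m/z bins (the simplest of the four alignment
    strategies mentioned: binning of raw data)."""
    idx = np.digitize(mz, bin_edges) - 1
    binned = np.zeros(len(bin_edges) - 1)
    valid = (idx >= 0) & (idx < binned.size)
    np.add.at(binned, idx[valid], intensity[valid])
    return binned

def rank_features(group_a, group_b):
    """Mann-Whitney U test per bin; returns bin indices ordered by p-value."""
    pvals = np.array([mannwhitneyu(a, b, alternative="two-sided").pvalue
                      for a, b in zip(group_a.T, group_b.T)])
    return np.argsort(pvals), pvals

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = np.linspace(2000, 20000, 301)          # 300 bins, illustrative width
    mz = np.linspace(2000, 20000, 5000)
    make = lambda shift: np.vstack([
        bin_spectrum(mz, rng.random(mz.size) + shift * np.exp(-0.5 * ((mz - 9000) / 50) ** 2), edges)
        for _ in range(20)
    ])
    order, pvals = rank_features(make(0.0), make(3.0))
    print("most discriminating bin:", order[0], "p =", pvals[order[0]])
```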

6.
MOTIVATION: There has been much interest in using patterns derived from surface-enhanced laser desorption and ionization (SELDI) protein mass spectra from serum to differentiate samples from patients with and without disease. Such patterns have been used without identification of the underlying proteins responsible. However, there are questions as to the stability of this procedure over multiple experiments. RESULTS: We compared SELDI proteomic spectra from serum from three experiments by the same group on separating ovarian cancer from normal tissue. These spectra are available on the web at http://clinicalproteomics.steem.com. In general, the results were not reproducible across experiments. Baseline correction prevents reproduction of the results for two of the experiments. In one experiment, there is evidence of a major shift in protocol mid-experiment which could bias the results. In another, structure in the noise regions of the spectra allows us to distinguish normal from cancer, suggesting that the normals and cancers were processed differently. Sets of features found to discriminate well in one experiment do not generalize to other experiments. Finally, the mass calibration in all three experiments appears suspect. Taken together, these and other concerns suggest that much of the structure uncovered in these experiments could be due to artifacts of sample processing, not to the underlying biology of cancer. We provide some guidelines for design and analysis in experiments like these to ensure more reproducible, biologically meaningful results. AVAILABILITY: The MATLAB and Perl code used in our analyses is available at http://bioinformatics.mdanderson.org
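One of the diagnostics described, checking whether noise-only regions of the spectra separate the classes, can be expressed as a small cross-validated classification test. The sketch below assumes an arbitrary noise m/z range and a logistic-regression classifier, neither of which is specified by the authors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def noise_region_check(spectra, labels, mz, noise_range=(0, 500)):
    """If features taken from a region that should contain only noise classify the
    samples well above chance, suspect processing artifacts rather than biology.
    The noise m/z range and classifier are illustrative assumptions."""
    mask = (mz >= noise_range[0]) & (mz < noise_range[1])
    scores = cross_val_score(LogisticRegression(max_iter=1000),
                             spectra[:, mask], labels, cv=5)
    return scores.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mz = np.linspace(0, 20000, 2000)
    labels = np.repeat([0, 1], 50)
    spectra = rng.normal(size=(100, mz.size))
    spectra[labels == 1, :50] += 0.5   # simulate a processing artifact in the noise region
    print("noise-region accuracy:", noise_region_check(spectra, labels, mz))
```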

7.
In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists, and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression is the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of the gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper, we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular, we describe a methodology useful for preprocessing Affymetrix single-nucleotide polymorphism chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from 3 relatively large studies including the one in which large numbers of independent calls are available. The proposed methods are implemented in the package oligo, available from Bioconductor.
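The methodology itself is implemented in the R/Bioconductor package oligo; the Python sketch below only illustrates the final genotype-calling idea, clustering the allele A versus allele B log-ratio into three genotype groups. The clustering model and the synthetic intensities are simplifying assumptions, not the oligo algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def call_genotypes(allele_a, allele_b):
    """Cluster the log-ratio contrast M = log2(A/B) into AA / AB / BB groups.
    A toy stand-in for model-based genotype calling, not the oligo procedure."""
    m = np.log2(allele_a) - np.log2(allele_b)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(m.reshape(-1, 1))
    # Order clusters by their centers so labels map to BB < AB < AA consistently.
    order = np.argsort(km.cluster_centers_.ravel())
    names = {order[0]: "BB", order[1]: "AB", order[2]: "AA"}
    return [names[c] for c in km.labels_]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    a = np.concatenate([rng.normal(10, 1, 30), rng.normal(6, 1, 30), rng.normal(2, 0.5, 30)])
    b = np.concatenate([rng.normal(2, 0.5, 30), rng.normal(6, 1, 30), rng.normal(10, 1, 30)])
    print(call_genotypes(2 ** a, 2 ** b)[:5])
```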

8.
The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data, and rapid browsing through these massive datasets poses a challenge to conventional plotting software because plotting time grows in proportion to the volume of data. This paper presents FTSPlot, a visualization concept for large-scale time series datasets that uses techniques from high-performance computer graphics, such as hierarchical level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step scales with the size of the dataset; the visualization itself has a complexity that is independent of the amount of data. A demonstration prototype has been implemented, and benchmarks show that the technology can display large amounts of time series data, event, and interval annotations lag-free, with millisecond-scale update times. The current 64-bit implementation theoretically supports datasets of up to 2^64 bytes; on the x86_64 architecture, up to 2^48 bytes are currently supported, and benchmarks have been conducted with 2^40 bytes (1 TiB) of double-precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
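FTSPlot itself is a Qt/C++ component; the sketch below illustrates only the hierarchical level-of-detail idea in Python: precompute per-block min/max summaries so that any zoom level can be drawn from a bounded number of values. The block size and point budget are illustrative assumptions.

```python
import numpy as np

def build_minmax_pyramid(samples, block=256):
    """Precompute successively coarser (min, max) summaries of a time series.
    Drawing a window then touches only a bounded number of values per level."""
    levels = []
    cur = samples.astype(float)
    while cur.size > block:
        n = (cur.size // block) * block
        chunks = cur[:n].reshape(-1, block)
        cur = np.empty(2 * chunks.shape[0])
        cur[0::2] = chunks.min(axis=1)   # interleave per-block minima ...
        cur[1::2] = chunks.max(axis=1)   # ... and maxima to preserve the envelope
        levels.append(cur)
    return levels

def pick_level(levels, n_samples, window_len, max_points=4000):
    """Choose the finest pyramid level whose number of values covering the
    requested window stays below max_points (the level-of-detail decision)."""
    for depth, lv in enumerate(levels):
        reduction = n_samples / lv.size
        if window_len / reduction <= max_points:
            return depth
    return len(levels) - 1

if __name__ == "__main__":
    data = np.random.randn(10_000_000)   # stands in for a long recording
    pyr = build_minmax_pyramid(data)
    print("levels:", len(pyr), "chosen:", pick_level(pyr, data.size, window_len=5_000_000))
```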

9.
ABSTRACT: BACKGROUND: An increasing number of genomic studies interrogating more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Often such analyses require knowledge of which elements of one platform link to those of another. Although important, many integrative analyses do not, or only insufficiently, detail the matching of the platforms. RESULTS: We describe, illustrate and discuss six matching procedures. They are implemented in the R package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in the use of the available data, and may even lead to different results for individual genes. CONCLUSIONS: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R package sigaR, available from Bioconductor.

10.

Background  

A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. Different kinds of microarray platforms exist, producing different types of binary data (images and raw data); moreover, even for a single vendor, different chips are available. The analysis of microarray data requires an initial preprocessing phase (i.e., normalization and summarization) of raw data that makes them suitable for use with existing analysis platforms, such as the TIGR M4 Suite. Nevertheless, annotation of the data with additional information, such as gene function, is needed to perform more powerful analyses. Raw data preprocessing and annotation are often performed in a manual and error-prone way. Moreover, many available preprocessing tools do not support annotation. Thus novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of microarray data are needed.
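The normalization step of such a preprocessing phase can be illustrated with a standard quantile normalization across arrays; the sketch below is a generic example and is not taken from the tool described in this entry.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize a genes x arrays matrix: force every array to share
    the same intensity distribution (the mean of the sorted columns)."""
    order = np.argsort(expr, axis=0)
    ranks = np.argsort(order, axis=0)
    mean_sorted = np.sort(expr, axis=0).mean(axis=1)
    return mean_sorted[ranks]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    raw = rng.lognormal(mean=5, sigma=1, size=(1000, 4)) * np.array([1.0, 1.4, 0.7, 2.0])
    norm = quantile_normalize(raw)
    print(norm.mean(axis=0))   # array means now agree
```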

11.

Background

Proteomic profiling of complex biological mixtures by the ProteinChip technology of surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) is one of the most promising approaches in toxicological, biological, and clinical research. The reliable identification of protein expression patterns and associated protein biomarkers that differentiate disease from health, or that distinguish different stages of a disease, depends on developing methods for assessing the quality of SELDI-TOF mass spectra. The use of SELDI data for biomarker identification requires rigorous procedures to detect and discard low-quality spectra prior to data analysis.

Results

The systematic variability from plates, chips, and spot positions in SELDI experiments was evaluated using biological and technical replicates. Systematic biases on plates, chips, and spots were not found. The reproducibility of SELDI experiments was demonstrated by the low coefficients of variation of five peaks present in all 144 spectra from quality control samples that were loaded randomly on different spots in the chips of six bioprocessor plates. We developed a method to detect and discard low-quality spectra prior to proteomic profiling data analysis, which uses a correlation matrix to measure the similarities among SELDI mass spectra obtained from similar biological samples. Application of the correlation matrix to our SELDI data from a liver cancer and liver toxicity study and a myeloma-associated lytic bone disease study confirmed this approach as an efficient and reliable method for detecting low-quality spectra.

Conclusion

This report provides evidence that systematic variability between the plates, chips, and spots on which the samples were assayed using SELDI-based proteomic procedures did not exist. The reproducibility of the experiments in our studies was shown to be acceptable, and the profiling data are reliable for subsequent analysis. The correlation matrix was developed as a quality control tool to detect and discard low-quality spectra prior to data analysis. It proved to be a reliable method for measuring the similarities among SELDI mass spectra and can be used for quality control to decrease noise in proteomic profiling data prior to data analysis.
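A minimal sketch of the correlation-matrix check described above: compute pairwise Pearson correlations among comparable spectra and flag any spectrum whose mean correlation with the rest is low. The 0.8 cut-off and the synthetic data are illustrative assumptions, not the authors' values.

```python
import numpy as np

def flag_low_quality(spectra, min_mean_corr=0.8):
    """spectra: samples x m/z-points matrix of comparable (e.g. QC) spectra.
    Returns indices of spectra whose mean correlation with all others is low."""
    corr = np.corrcoef(spectra)
    np.fill_diagonal(corr, np.nan)
    mean_corr = np.nanmean(corr, axis=1)
    return np.where(mean_corr < min_mean_corr)[0], mean_corr

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    base = np.exp(-0.5 * ((np.linspace(0, 1, 2000) - 0.4) / 0.01) ** 2)
    spectra = base + 0.05 * rng.normal(size=(11, 2000))
    spectra[5] = 0.5 * rng.normal(size=2000)   # simulate one failed acquisition
    bad_idx, scores = flag_low_quality(spectra)
    print("flagged spectra:", bad_idx)
```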

12.
Ovarian cancer is characterized by few early symptoms, presentation at an advanced stage, and poor survival. As a result, it is the most frequent cause of death from gynecological cancer. During the last decade, a research effort has been directed toward improving outcomes for ovarian cancer by screening for preclinical, early stage disease using both imaging techniques and serum markers. Numerous biomarkers have shown potential in samples from clinically diagnosed ovarian cancer patients, but few have been thoroughly assessed in preclinical disease and screening. The most thoroughly investigated biomarker in ovarian cancer screening is CA125. Prospective studies have demonstrated that both CA125 and transvaginal ultrasound can detect a significant proportion of preclinical ovarian cancers, and refinements in interpretation of results have improved sensitivity and reduced the false-positive rate of screening. There is preliminary evidence that screening can improve survival, but the impact of screening on mortality from ovarian cancer is still unclear. Prospective studies of screening are in progress in both the general population and high-risk populations, including the United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), a randomized trial involving 200,000 postmenopausal women designed to document the impact of screening on mortality. Recent advances in technology for the study of the serum proteome offer exciting opportunities for the identification of novel biomarkers or patterns of markers that will have greater sensitivity and lead time for preclinical disease than CA125. Considerable interest and controversy have been generated by initial results utilizing surface-enhanced laser desorption/ionization (SELDI) in ovarian cancer. There are challenging issues related to the design of studies to evaluate SELDI and other proteomic technology, as well as the reproducibility, sensitivity, and specificity of this new technology. Large serum banks such as that assembled in UKCTOCS, which contain preclinical samples from patients who later developed ovarian cancer and other disorders, provide a unique resource for carefully designed studies of proteomic technology. There is a sound basis for optimism that further developments in serum proteomic analysis will provide powerful methods for screening in ovarian cancer and many other diseases.

13.
Despite advances in imaging technologies for the heart, screening of patients for cardiac pathology continues to include the use of traditional stethoscope auscultation. Detection of heart murmurs by the primary care physician often results in the ordering of additional expensive testing or referral to cardiology subspecialists, although many of the patients are eventually found to have no pathologic condition. In contrast, auscultation by an experienced cardiologist is highly sensitive and specific for detecting heart disease. Although attempts have been made to automate screening by auscultation, no device is currently available to fulfill this function. Multiple indicators of pathology are nonetheless available from heart sounds and can be elicited using certain signal processing techniques such as wavelet analysis. Results presented here show that a signal of pathology, the systolic murmur, can reliably be detected and classified as pathologic using a portable electrocardiogram and heart sound measurement unit combined with a wavelet-based algorithm. Wavelet decomposition holds promise for extending these results to detection and evaluation of other audible pathologic indicators.
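One way such a wavelet-based indicator can be computed is sketched below: the fraction of signal energy falling into selected detail bands of a systolic segment. The wavelet, band indices, and synthetic signal are illustrative assumptions, not the algorithm used by the device described here.

```python
import numpy as np
import pywt

def murmur_band_energy(systole_segment, wavelet="db6", level=5, bands=(2, 3)):
    """Fraction of signal energy in selected wavelet detail bands for a systolic
    segment; murmurs add broadband energy that raises this fraction."""
    coeffs = pywt.wavedec(systole_segment, wavelet, level=level)
    details = coeffs[1:]                      # coarsest detail band first
    total = sum(float(np.sum(c ** 2)) for c in details) + 1e-12
    selected = sum(float(np.sum(details[b] ** 2)) for b in bands)
    return selected / total

if __name__ == "__main__":
    fs = 4000
    t = np.arange(0, 0.3, 1 / fs)                        # one systolic interval
    s1 = np.sin(2 * np.pi * 60 * t) * np.exp(-20 * t)    # valve sound, low frequency
    murmur = 0.3 * np.random.randn(t.size) * (t > 0.05)  # broadband murmur-like noise
    print("clean:", murmur_band_energy(s1), "with murmur:", murmur_band_energy(s1 + murmur))
```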

14.
Residue coevolution has recently emerged as an important concept, especially in the context of protein structures. While a multitude of different functions for quantifying it have been proposed, not much is known about their relative strengths and weaknesses. Also, subtle algorithmic details have discouraged implementing and comparing them. We addressed this issue by developing an integrated online system that enables comparative analyses with a comprehensive set of commonly used scoring functions, including Statistical Coupling Analysis (SCA), Explicit Likelihood of Subset Variation (ELSC), mutual information and correlation-based methods. A set of data preprocessing options are provided for improving the sensitivity and specificity of coevolution signal detection, including sequence weighting, residue grouping and the filtering of sequences, sites and site pairs. A total of more than 100 scoring variations are available. The system also provides facilities for studying the relationship between coevolution scores and inter-residue distances from a crystal structure if provided, which may help in understanding protein structures. AVAILABILITY: The system is available at http://coevolution.gersteinlab.org. The source code and JavaDoc API can also be downloaded from the web site.
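Of the scores listed, mutual information is the simplest to state; the sketch below computes it between two alignment columns, without the sequence weighting, residue grouping, or filtering options the system provides.

```python
import math
from collections import Counter

def column_mi(col_i, col_j):
    """Mutual information between two alignment columns (sequences of residues,
    one per aligned protein). No sequence weighting or gap-handling refinements."""
    n = len(col_i)
    pi = Counter(col_i)
    pj = Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 co-vary perfectly; column 2 much less so.
    seqs = ["AKL", "AKV", "GRL", "GRV", "AKL", "GRV"]
    cols = list(zip(*seqs))
    print("MI(0,1) =", round(column_mi(cols[0], cols[1]), 3),
          "MI(0,2) =", round(column_mi(cols[0], cols[2]), 3))
```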

15.
M Gulotta, Biophysical Journal, 1995, 69(5):2168-2173
LabVIEW is a graphic object-oriented computer language developed to facilitate hardware/software communication. LabVIEW is a complete computer language that can be used like Basic, FORTRAN, or C. In LabVIEW one creates virtual instruments that aesthetically look like real instruments but are controlled by sophisticated computer programs. There are several levels of data acquisition VIs that make it easy to control data flow, and many signal processing and analysis algorithms come with the software as premade VIs. In the classroom, the similarity between virtual and real instruments helps students understand how information is passed between the computer and attached instruments. The software may be used in the absence of hardware so that students can work at home as well as in the classroom. This article demonstrates how LabVIEW can be used to control data flow between computers and instruments, points out important features for signal processing and analysis, and shows how virtual instruments may be used in place of physical instrumentation. Applications of LabVIEW to the teaching laboratory are also discussed, and a plausible course outline is given.

16.
Signal peptidases, the endoproteases that remove the amino-terminal signal sequence from many secretory proteins, have been isolated from various sources. Seven signal peptidases have been purified: two from E. coli, two from mammalian sources, and three from the mitochondrial matrix. The mitochondrial enzymes are soluble and function as a heterogeneous dimer. The mammalian enzymes are isolated as a complex and share a common glycosylated subunit. The bacterial enzymes are isolated as monomers and show no sequence homology with each other or with the mammalian enzymes. The membrane-bound enzymes seem to require a substrate containing a consensus sequence following the -3, -1 rule of von Heijne at the cleavage site; however, processing of the substrate is strongly influenced by the hydrophobic region of the signal peptide. The enzymes appear to recognize an unknown three-dimensional motif rather than a specific amino acid sequence around the cleavage site. The matrix mitochondrial enzymes are metallo-endopeptidases; however, the other signal peptidases may belong to a unique class of proteases, as they are resistant to chelators and most protease inhibitors. There are no data concerning the substrate binding site of these enzymes. In vivo, the signal peptide is rapidly degraded. Three different enzymes in Escherichia coli that can degrade a signal peptide in vitro have been identified. The intact signal peptide does not accumulate in mutants lacking these enzymes, which suggests that these peptidases individually are not responsible for the degradation of an intact signal peptide in vivo. It is speculated that signal peptidases and signal peptide hydrolases are integral components of the secretory pathway and that inhibition of the terminal steps can block translocation.

17.
The subthalamic nucleus and the directly adjacent substantia nigra are small and important structures in the basal ganglia. Functional magnetic resonance imaging studies have shown that the subthalamic nucleus and substantia nigra are selectively involved in response inhibition, conflict processing, and adjusting global and selective response thresholds. However, imaging these nuclei is complex, because they are in such close proximity, they can vary in location, and are very small relative to the resolution of most fMRI sequences. Here, we investigated the consistency in localization of these nuclei in BOLD fMRI studies, comparing reported coordinates with probabilistic atlas maps of young human participants derived from ultra-high resolution 7T MRI scanning. We show that the fMRI signal reported in previous studies is likely not unequivocally arising from the subthalamic nucleus but represents a mixture of subthalamic nucleus, substantia nigra, and surrounding tissue. Using a simulation study, we also tested to what extent spatial smoothing, often used in fMRI preprocessing pipelines, influences the mixture of BOLD signals. We propose concrete steps how to analyze fMRI BOLD data to allow inferences about the functional role of small subcortical nuclei like the subthalamic nucleus and substantia nigra.
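The effect of spatial smoothing on signal mixing can be illustrated with a one-dimensional toy version of such a simulation: activity confined to the SN is blurred with Gaussian kernels of increasing FWHM, and the fraction leaking into an adjacent STN mask is measured. The geometry, voxel size, and kernel widths below are illustrative assumptions, not the study's parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def mixed_fraction(fwhm_mm, voxel_mm=1.5, stn=(30, 38), sn=(38, 50), length=80):
    """Blur a signal confined to the SN and measure how much of it leaks into
    the STN mask after Gaussian smoothing (1-D stand-in for volumetric smoothing)."""
    sigma_vox = (fwhm_mm / 2.355) / voxel_mm
    signal = np.zeros(length)
    signal[sn[0]:sn[1]] = 1.0                 # activity only in the SN
    blurred = gaussian_filter1d(signal, sigma_vox)
    return blurred[stn[0]:stn[1]].sum() / blurred.sum()

if __name__ == "__main__":
    for fwhm in (2, 4, 6, 8):
        print(f"FWHM {fwhm} mm -> {mixed_fraction(fwhm):.2%} of SN signal inside the STN mask")
```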

18.
Discovery of biomarker patterns using proteomic techniques requires examination of large numbers of patient and control samples, followed by data mining of the molecular read-outs (e.g., mass spectra). Adequate signal processing and statistical analysis are critical for successful extraction of markers from these data sets. The protocol, specifically designed for use in conjunction with MALDI-TOF-MS-based serum peptide profiling, is a data analysis pipeline, starting with transfer of raw spectra that are interpreted using signal processing algorithms to define suitable features (i.e., peptides). We describe an algorithm for minimal entropy-based peak alignment across samples. Peak lists obtained in this way, and containing all samples, all peptide features and their normalized MS-ion intensities, can be evaluated, and results validated, using common statistical methods. We recommend visual inspection of the spectra to confirm all results, and have written freely available software for viewing and color-coding of spectral overlays.
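The sketch below shows a simple greedy stand-in for cross-sample peak alignment: pooled peak m/z values are sorted and grouped whenever neighbouring values fall within a relative tolerance. The tolerance is an illustrative assumption, and the minimal-entropy alignment in the protocol is not reproduced.

```python
import numpy as np

def align_peaks(peak_lists, rel_tol=0.003):
    """peak_lists: one array of peak m/z values per sample. Pool all peaks, sort
    them, and start a new cluster whenever the gap to the previous peak exceeds
    rel_tol * m/z. Returns the consensus m/z of each cluster."""
    pooled = np.sort(np.concatenate(peak_lists))
    clusters, current = [], [pooled[0]]
    for mz in pooled[1:]:
        if mz - current[-1] <= rel_tol * mz:
            current.append(mz)
        else:
            clusters.append(current)
            current = [mz]
    clusters.append(current)
    return np.array([np.mean(c) for c in clusters])

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    truth = np.array([1500.0, 2300.0, 4100.0])
    samples = [truth * (1 + rng.normal(0, 0.0005, truth.size)) for _ in range(30)]
    print(align_peaks(samples))
```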

19.
When complexity analysis is applied to biomedical signal time series, coarse-graining preprocessing may discard information contained in the original signal and, in some cases, can even fundamentally change the dynamical properties of the signal. The quantization that occurs when computing with finite precision is itself a form of coarse graining and raises the same concern. By comparing two complexity measures, approximate entropy and the C0 complexity we have defined, on several typical time series at different quantization precisions, we find that, in general, quantization precision has little effect on complexity analysis; only in extreme cases, such as binarizing the original signal, is the result significantly altered.
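The quantization experiment can be illustrated with a standard approximate-entropy implementation; the C0 complexity measure defined by the authors is not reproduced here, and the embedding dimension, tolerance, and bit depths below are conventional or illustrative choices.

```python
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    """Standard ApEn(m, r) with r = r_factor * std(x); conventional parameter choices."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def phi(mm):
        n = len(x) - mm + 1
        emb = np.array([x[i:i + mm] for i in range(n)])
        # Chebyshev distance between all pairs of embedded vectors.
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        counts = (dist <= r).sum(axis=1) / n
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

def quantize(x, bits):
    """Uniform quantization of x to the given number of bits (a coarse graining)."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    t = np.arange(1000)
    signal = np.sin(0.1 * t) + 0.3 * rng.standard_normal(t.size)
    for bits in (1, 2, 4, 8, 12):
        print(f"{bits:2d}-bit quantization: ApEn = {approximate_entropy(quantize(signal, bits)):.3f}")
```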

20.
lumi: a pipeline for processing Illumina microarray
Illumina microarrays are becoming a popular microarray platform. The BeadArray technology from Illumina makes their preprocessing and quality control different from other microarray technologies. Unfortunately, most analyses have not taken advantage of the unique properties of the BeadArray system and have simply incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package designed specifically to process Illumina microarray data. It includes data input, quality control, variance stabilization, normalization, and gene annotation steps. In particular, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of Illumina microarrays. The nuID annotation packages and the output of lumi-processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data. Availability: The lumi Bioconductor package, www.bioconductor.org
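lumi's VST is fitted from the bead-level technical replicates on each array; the sketch below only illustrates the general idea with a generalized-log (arcsinh) transform whose scale parameter is estimated from a replicate-based variance-mean fit. The error model and synthetic data are simplifying assumptions, not lumi's algorithm.

```python
import numpy as np

def fit_glog_parameter(bead_means, bead_sds):
    """Estimate the additive-noise scale for a generalized-log transform from the
    per-probe replicate (bead-level) mean/sd relationship: sd^2 ~ (c1*mean)^2 + c2^2.
    A rough stand-in for a replicate-based VST fit, not lumi's actual model."""
    coeffs = np.polyfit(bead_means ** 2, bead_sds ** 2, 1)   # slope ~ c1^2, intercept ~ c2^2
    c1 = np.sqrt(max(coeffs[0], 1e-12))
    c2 = np.sqrt(max(coeffs[1], 1e-12))
    return c2 / c1

def glog(x, a):
    """Generalized log: behaves like log(x) for large x, stays finite near zero."""
    return np.arcsinh(x / a)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    true = rng.lognormal(6, 1, 2000)
    beads = true[:, None] * (1 + 0.1 * rng.normal(size=(2000, 30))) + rng.normal(0, 20, (2000, 30))
    a = fit_glog_parameter(beads.mean(axis=1), beads.std(axis=1))
    print("fitted scale:", round(float(a), 1),
          "transformed range:", float(glog(beads, a).min()), float(glog(beads, a).max()))
```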

