Similar literature
20 similar articles found (search time: 0 ms)
1.
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
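
The two steps described in this abstract, reducing intensities to binary peak indicators and then boosting over those indicators, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the window size, the function names `peak_indicators`, `adaboost_binary` and `predict`, and the use of plain discrete AdaBoost with single-feature weak learners. The abstract does not specify the authors' boosting variant, and the x-axis shift correction is not modelled here.

```python
import math


def peak_indicators(intensities, half_window=2):
    """Return 1 where a point is the unique maximum of its local window, else 0.

    half_window is a hypothetical neighbourhood size, not a value from the paper.
    """
    n = len(intensities)
    flags = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = intensities[lo:hi]
        value = intensities[i]
        flags.append(1 if value == max(window) and window.count(value) == 1 else 0)
    return flags


def adaboost_binary(X, y, rounds=5):
    """Discrete AdaBoost over 0/1 features; labels y are -1 or +1.

    Each weak learner votes polarity * (+1 if feature is 1 else -1).
    Returns a list of (alpha, feature_index, polarity) tuples.
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for pol in (1, -1):
                # Weighted error of this single-feature rule.
                err = sum(w for w, x, t in zip(weights, X, y)
                          if pol * (2 * x[j] - 1) != t)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Up-weight misclassified specimens, then renormalise.
        weights = [w * math.exp(-alpha * t * pol * (2 * x[j] - 1))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the weighted vote of all selected single-feature rules."""
    score = sum(alpha * pol * (2 * x[j] - 1) for alpha, j, pol in model)
    return 1 if score >= 0 else -1
```

In this sketch each row of `X` would be the peak-indicator vector of one specimen, and sensitivity and specificity would be computed on a held-out test set, as the abstract describes.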

2.
In proteomics, tandem mass spectrometry is the key technology for peptide sequencing. However, partly because of deficiencies in peptide identification software, a large portion of the tandem mass spectra are discarded in almost all proteomics centers because they are not interpretable. The problem is more acute with the lower-quality data from low-end but more popular devices such as ion trap instruments. To deal with noisy, low-quality data, this paper develops a systematic machine learning approach to construct a robust linear scoring function whose coefficients are determined by linear programming. A prototype, PRIMA, was implemented. When tested with large benchmarks of varying quality, PRIMA consistently had higher accuracy than the commonly used software MASCOT, SEQUEST and X! Tandem.

3.
Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventional database search approach to protein identification has inherent difficulties in comprehensive detection of protein modifications. In recent years, increasing efforts have been devoted to developing unrestrictive approaches to modification identification, but they often suffer from their lack of speed. This paper presents a statistical algorithm named DeltAMT (Delta Accurate Mass and Time) for fast detection of abundant protein modifications from tandem mass spectra with high-accuracy precursor masses. The algorithm is based on the fact that the modified and unmodified versions of a peptide are usually present simultaneously in a sample and their spectra are correlated with each other in precursor masses and retention times. By representing each pair of spectra as a delta mass and time vector, bivariate Gaussian mixture models are used to detect modification-related spectral pairs. Unlike previous approaches to unrestrictive modification identification that mainly rely upon the fragment information and the mass dimension in liquid chromatography-tandem mass spectrometry, the proposed algorithm makes the most of precursor information. Thus, it is highly efficient while being accurate and sensitive. On two published data sets, the algorithm effectively detected various modifications and other interesting events, yielding deep insights into the data. Based on these discoveries, the spectral identification rates were significantly increased and many modified peptides were identified.

4.
Reproducible and comprehensive sample extraction and detection of metabolites with a broad range of physico-chemical properties from biological matrices can be a highly challenging process. A single LC/MS separation method was developed for a 2.1 mm × 100 mm, 1.8 µm ZORBAX SB-Aq column that was used to separate human erythrocyte metabolites extracted under sample extraction solvent conditions where the pH was neutral or had been adjusted to either pH 2, 6 or 9. Internal standards were included and evaluated for tracking sample extraction efficiency. Through the combination of electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) techniques in both positive (+) and negative (-) ion modes, a total of 2370 features (compounds and associated compound-related components: isotopes, adducts and dimers) were detected across all pHs. Broader coverage of the detected metabolome was achieved by observing that (1) performing extractions at pH 2 and 9 leads to a combined 92% increase in detected features over pH 7 alone; and (2) including APCI in the analysis results in a 34% increase in detected features, across all pHs, over the total number detected by ESI. A significant dependency of the recovery of heme and other compounds on extraction solvent pH was observed in erythrocytes and underscores the need for a comprehensive sample extraction strategy and LC/MS analysis in metabolomics profiling experiments.

5.

Background  

There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.

6.
A note on robust variance estimation for cluster-correlated data   Cited by: 43 (self-citations: 0, citations by others: 43)
Williams RL. Biometrics 2000, 56(2): 645-646
There is a simple robust variance estimator for cluster-correlated data. While this estimator is well known, it is poorly documented, and its wide range of applicability is often not understood. The estimator is widely used in sample survey research, but the results in the sample survey literature are not easily applied because of complications due to unequal probability sampling. This brief note presents a general proof that the estimator is unbiased for cluster-correlated data regardless of the setting. The result is not new, but a simple and general reference is not readily available. The use of the method will benefit from a general explanation of its wide applicability.
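
The abstract does not reproduce the estimator itself. As a hedged illustration, the sketch below uses one common between-cluster form, m/(m-1) times the sum of squared cluster totals of linearized residuals, applied to the variance of a sample mean. The function name `cluster_robust_var_of_mean` and the choice of the sample mean as the target statistic are assumptions for this example, not details taken from the note.

```python
def cluster_robust_var_of_mean(clusters):
    """Between-cluster variance estimate for the overall sample mean.

    clusters: list of lists, one inner list of observations per cluster.
    Observations within a cluster may be correlated; clusters are assumed
    independent. Requires at least two clusters.
    """
    m = len(clusters)
    n = sum(len(c) for c in clusters)
    grand_mean = sum(sum(c) for c in clusters) / n
    # Cluster totals of the linearized contributions (y - mean) / n.
    # These totals sum to zero, so no further centring is needed.
    z = [sum(v - grand_mean for v in c) / n for c in clusters]
    return m / (m - 1) * sum(zi ** 2 for zi in z)
```

With singleton clusters this expression reduces to the usual s²/n, while for `[[1, 1], [3, 3]]` it gives 1.0 rather than the 1/3 obtained by treating the four values as independent, which reflects the within-cluster correlation.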

7.

Background

Field effect transistor (FET) based signal transduction (Bio-FET) is an emerging technique for label-free, real-time biosensors for a wide range of targets. Glucose has constantly been of interest due to its clinical relevance. Use of glucose oxidase (GOD) and the lectin protein Concanavalin A are two common strategies to generate glucose-dependent electrochemical events. However, these protein-based materials are intolerant of long-term usage and storage due to their inevitable denaturing.

Methods

A phenylboronic acid (PBA) modified self-assembled monolayer (SAM) on a gold electrode, using a PBA with an optimized dissociation constant (3-fluoro-4-carbamoyl-PBA, pKa 7.1), was prepared and utilized as an extended gate electrode for Bio-FET.

Results

The prepared electrode showed a glucose-dependent change in the surface potential under physiological conditions, thus providing a remarkably simple rationale for the glyco-sensitive Bio-FET. Importantly, the PBA modified electrode showed tolerance to relatively severe heat and drying treatments; conditions under which protein based materials would surely be denatured.

Conclusions

A PBA modified SAM with an optimized dissociation constant (pKa) can exhibit a glucose-dependent change in the surface potential under physiological conditions, providing a remarkably simple but robust method for glyco-sensing.

General significance

This protein-free, totally synthetic glyco-sensing strategy may offer a cheap, robust and easily accessible platform that may be useful in developing countries. This article is part of a Special Issue entitled Organic Bioelectronics—Novel Applications in Biomedicine.

8.
A glassy carbon electrode modified with boron oxide nanoparticles supported on multiwall carbon nanotubes was obtained via a facile approach. The as-prepared modified electrode exhibits excellent electrocatalytic activity toward the redox of glucose in pH 7.0 phosphate buffer solution. The electrochemical response of the modified electrode to glucose shows a linear range of 1.5-260 μM with a correlation coefficient of 0.9986, and the calculated detection limit is 0.8 μM at a signal-to-noise ratio of 3, which makes it useful for developing the electrochemical determination of glucose concentrations without using glucose oxidase at physiological pH.

9.
S N Freeman, B J Morgan. Biometrics 1992, 48(1): 217-235
In this paper we propose a strategy for analysing recovery data from birds ringed as nestlings. The approach advocated starts with a global model, involving calendar year dependence of both reporting and first-year survival rates, and age-dependence of survival rates for older birds. Likelihood ratio tests are then used to choose between a range of submodels. The strategy is illustrated through application to three data sets, on mallards, herring gulls, and blue-winged teal. The effect of age-dependence operating also on reporting rates is examined through matched simulations, since a model with age-dependent reporting rates cannot be fitted directly. This reveals an underestimation of the first-year survival rates, when the probability of recovery for first-year birds is greater than that for older birds. It is argued that this bias may not be serious and indeed may be allowed for in practice. For mallards and teal, comparisons are drawn with the results from other models that additionally analyse recoveries of birds ringed as adults; the same general conclusions are reached.

10.
11.
12.
The preparation of large quantities of purified membrane proteins for structural studies presents significant difficulties. Central among these are the frequent toxicity associated with over-expressing membrane targets and the difficulty associated with identifying the appropriate detergents for their solubilization and purification. To begin addressing these challenges, and lay the groundwork for membrane structural genomics efforts, we have developed a robust strategy for the expression and purification of large numbers of prokaryotic membrane proteins. Our approach rapidly identifies highly expressed targets and greatly simplifies their solubilization and purification. In this review, specific, hands-on protocols are provided for the expression and purification of CorA magnesium transporters. These methods form the basis for the expression and purification of many other membrane proteins, as discussed.

13.
14.

Background  

There is a continuing need to develop molecular diagnostic tools which complement histopathologic examination to increase the accuracy of cancer diagnosis. DNA microarrays provide a means for measuring gene expression signatures which can then be used as components of genomic-based diagnostic tests to determine the presence of cancer.

15.
A high-throughput software pipeline for analyzing high-performance mass spectral data sets has been developed to facilitate rapid and accurate biomarker determination. The software exploits the mass precision and resolution of high-performance instrumentation, bypasses peak-finding steps, and instead uses discrete m/z data points to identify putative biomarkers. The technique is insensitive to peak shape, and works on overlapping and non-Gaussian peaks which can confound peak-finding algorithms. Methods are presented to assess data set quality and the suitability of groups of m/z values that map to peaks as potential biomarkers. The algorithm is demonstrated with serum mass spectra from patients with and without ovarian cancer. Biomarker candidates are identified and ranked by their ability to discriminate between cancer and noncancer conditions. Their discriminating power is tested by classifying unknowns using a simple distance calculation, and a sensitivity of 95.6% and a specificity of 97.1% are obtained. In contrast, the sensitivity of the ovarian cancer blood marker CA125 is approximately 50% for stage I/II and approximately 80% for stage III/IV cancers. While the generalizability of these markers is currently unknown, we have demonstrated the ability of our analytical package to extract biomarker candidates from high-performance mass spectral data.
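
The abstract does not say which distance calculation is used to classify unknowns. The sketch below assumes a nearest-centroid rule with Euclidean distance over the selected m/z intensities, purely as an illustration, and shows how sensitivity and specificity follow from the resulting calls. The names `centroid`, `classify` and `sensitivity_specificity` are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length intensity vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def classify(x, cancer_centroid, control_centroid):
    """True (cancer) if x is at least as close to the cancer centroid."""
    return math.dist(x, cancer_centroid) <= math.dist(x, control_centroid)


def sensitivity_specificity(predicted, actual):
    """predicted and actual are lists of booleans (True means cancer)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of true cancer specimens called cancer, and specificity is the fraction of noncancer specimens called noncancer, which is how figures such as 95.6% and 97.1% are defined.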

16.
Current limitations in proteome analysis by high-throughput mass spectrometry (MS) approaches have sometimes led to incomplete (or inconclusive) data sets being published or left unpublished. In this work, we used an iTRAQ reference data set on hepatocellular carcinoma (HCC) to design a two-stage functional analysis pipeline to widen and improve the proteome coverage and, subsequently, to unveil the molecular changes that occur during HCC progression in human tumorous tissue. The first stage involved functional cluster analysis by incorporating an expansion step on a cleaned integrated network. The second used an in-house developed pathway database where recovery of shared neighbors was followed by pathway enrichment analysis. In the original MS data set, over 500 proteins were detected from the tumors of 12 male patients, but in this paper we report an additional 1000 proteins after application of our bioinformatics pipeline. Through an integrative effort of network cleaning, community finding methods, and network analysis, we also uncovered several biologically interesting clusters implicated in HCC. We established that HCC transition from a moderate to poor stage involved densely connected clusters comprising PCNA, XRCC5, XRCC6, PARP1, PRKDC, and WRN. From our pathway enrichment analyses, it appeared that the HCC moderate stage, unlike the poor stage, is enriched in proteins involved in immune responses, thus suggesting the acquisition of immuno-evasion. Our strategy illustrates how an original oncoproteome could be expanded to one of a larger dynamic range where current technology limitations prevent or limit comprehensive proteome characterization.

17.
Parameter identification of structured models is often a problem in biotechnology, because the poor data situation and the number of unknown parameters only allow for inaccurate estimates. But often only a subset of all kinetic parameters of the model is of interest for production purposes, e.g. for fed-batch cultivation. These parameters should be estimated with a given accuracy. In addition, the experiments for information acquisition with respect to these parameters should be as simple as possible and should consider some practical restrictions. In this contribution a fed-batch feeding strategy is proposed to allow for an accurate estimation of yield and of critical growth rate of baker's yeast. The feeding also allows for economic and stereotyped use of staff and equipment and is therefore suitable for routine use in screening of strains and media. The overall pattern is similar to the one usually used at production scale, which minimizes errors arising from limited model validity. After an initial phase for achieving a reproducible state, three different growth rates are set to cover the range of possible critical growth rates. From biomass and ethanol measurements, yield and critical growth rate can be estimated with an accuracy of about 2.1%. The fermentation pattern ends with a constant feeding rate to simulate a limited oxygen transfer rate and to allow for an uptake of residual sugar and ethanol before a dough test can be carried out.
Besides experimental results, simulations and sensitivity analyses are shown.

List of Symbols
P ethanol concentration
S substrate concentration
S_f substrate concentration in feed
T fermentation time
V fermenter volume
X biomass concentration
C measurement error covariance matrix
F Fisher information matrix
X state variables
Y output variables
X_p state sensitivity functions with respect to parameters
Y_p output sensitivity functions
e eigenvectors
k vector of limitation and inhibition parameters
n number of observations
q_in feeding stream
q_b stream for samples and ammonia feed
r vector of specific turnover rates
y vector of yields
Symbols not preserved in this listing: specific weight; eigenvalues; specific growth rate; set exponent in exponential feeding; standard deviation

Dedicated to the 65th birthday of Professor Fritz Wagner.
A. O. Ejiofor and B. O. Solomon are grateful to the Alexander von Humboldt Stiftung for granting them fellowships and to GBF for providing all the materials necessary for their successful research stay in Germany.

18.
MOTIVATION: Target selection strategies for structural genomic projects must be able to prioritize gene regions on the basis of significant sequence similarity with proteins that have already been structurally determined. With the rapid development of protein comparison software, a robust prioritization scheme should be independent of the choice of algorithm and be able to incorporate different sequence similarity thresholds. RESULTS: A robust target selection strategy has been developed that can assign a priority level to all genes in any genome. Structural assignments to genome sequences are calculated at two thresholds, and six levels (1-6) describe the prioritization of all whole genes and partial gene regions. This simple two-threshold approach can be implemented with any fold recognition or homology detection algorithm. The results for 10 genomes are presented using the SSEARCH and PSI-BLAST programs. AVAILABILITY: Programs are available on request from the authors.

19.

Background

High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.

Results

We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset.

Conclusion

Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.
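
The key concept above, removing potential DEGs before calculating the normalization factor, can be sketched for the simplest case of two samples. This is an illustration under stated assumptions: total-count scaling stands in for whichever normalization method a package uses, ranking by absolute log fold change stands in for a package's gene ranking algorithm, and the names `size_factors`, `deg_elimination_factors` and `n_drop` are hypothetical.

```python
import math


def size_factors(counts_a, counts_b, keep=None):
    """Total-count scaling factors for two samples, optionally over a gene subset."""
    idx = range(len(counts_a)) if keep is None else keep
    total_a = sum(counts_a[i] for i in idx)
    total_b = sum(counts_b[i] for i in idx)
    mean_total = (total_a + total_b) / 2
    return total_a / mean_total, total_b / mean_total


def deg_elimination_factors(counts_a, counts_b, n_drop):
    """Normalize, drop the n_drop genes that look most differential, renormalize."""
    # Step 1: provisional factors from all genes.
    fa, fb = size_factors(counts_a, counts_b)
    # Step 2: rank genes by absolute log2 fold change of the normalized counts
    # (a pseudocount of 1 avoids taking the logarithm of zero).
    lfc = [abs(math.log2((a / fa + 1) / (b / fb + 1)))
           for a, b in zip(counts_a, counts_b)]
    ranked = sorted(range(len(lfc)), key=lambda i: lfc[i], reverse=True)
    keep = sorted(ranked[n_drop:])
    # Step 3: final factors from the genes that remain.
    return size_factors(counts_a, counts_b, keep)
```

For nine genes at 100 counts in both samples plus one gene at 1000 versus 100, the provisional factors are skewed by the single outlier, whereas dropping it with `n_drop=1` returns factors of (1.0, 1.0).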

20.