Similar articles (20 found)
1.
Paternity inference using highly polymorphic codominant markers is becoming common in the study of natural populations. However, multiple males are often found to be genetically compatible with each offspring tested, even when the probability of excluding an unrelated male is high. While various methods exist for evaluating the likelihood of paternity of each nonexcluded male, interpreting these likelihoods has hitherto been difficult, and no method takes account of the incomplete sampling and error-prone genetic data typical of large-scale studies of natural systems. We derive likelihood ratios for paternity inference with codominant markers taking account of typing error, and define a statistic Δ for resolving paternity. Using allele frequencies from the study population in question, a simulation program generates criteria for Δ that permit assignment of paternity to the most likely male with a known level of statistical confidence. The simulation takes account of the number of candidate males, the proportion of males that are sampled, and gaps and errors in genetic data. We explore the potentially confounding effect of relatives and show that the method is robust to their presence under commonly encountered conditions. The method is demonstrated using genetic data from the intensively studied red deer (Cervus elaphus) population on the island of Rum, Scotland. The Windows-based computer program, CERVUS, described in this study is available from the authors. CERVUS can be used to calculate allele frequencies, run simulations and perform parentage analysis using data from all types of codominant markers.

2.
Elias JE, Gygi SP. Nature Methods 2007, 4(3):207-214
Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
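As a rough illustration of the target-decoy idea in the abstract above, the following Python sketch estimates the false positive rate among hits above a score threshold in a concatenated search. It assumes the commonly used convention that each accepted decoy hit implies one equally likely incorrect target hit, hence the factor of 2. The input format of `(score, is_decoy)` pairs, the function name `estimate_fp_rate`, and the example scores are all hypothetical and not taken from the paper.

```python
def estimate_fp_rate(hits, threshold):
    """Estimate the false positive rate among hits scoring >= threshold.

    hits: iterable of (score, is_decoy) pairs from a concatenated
    target-decoy search (hypothetical input format).
    """
    passing = [is_decoy for score, is_decoy in hits if score >= threshold]
    total = len(passing)
    if total == 0:
        return 0.0  # nothing accepted, so nothing to estimate
    decoys = sum(passing)
    # Assumption: incorrect matches hit target and decoy sequences equally
    # often, so the number of false positives is about twice the decoy count.
    return min(1.0, 2 * decoys / total)


# Made-up example hits: (search score, matched a decoy sequence?)
example_hits = [
    (5.1, False), (4.8, False), (4.2, True), (3.9, False),
    (3.5, False), (2.0, True), (1.5, False),
]

print(estimate_fp_rate(example_hits, 3.0))
```

Raising the threshold until the estimate falls below a chosen level is one way such an estimate might be used to filter identifications.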

3.
Tyrosine nitration is the consequence of a complex machinery of formation and merging of oxygen and nitrogen radicals, and has been associated both with physiological pathways and with several human diseases. The latter turned this posttranslational protein modification into an interesting biomarker, being either a consequence of the disease or a factor contributing to the disease onset. However, the interpretation of MS and MS/MS data of peptides containing nitrotyrosine has proven to be very challenging and, consequently, the risk of linking MS/MS spectra to incorrect peptide sequences exists and has been reported. Here, we discuss the causes of data misinterpretation and describe a general method to avoid misinterpreting MS/MS spectra. Central to our approach is the reduction of nitrotyrosine into aminotyrosine and the use of the Peptizer algorithm to inspect MS/MS quality-related assumptions.

4.
This paper presents tested statistical methods for reducing sampling bias in, for assigning approximate confidence bounds to, and for testing hypotheses about information-theoretic measures used in the study of animal communication and in behavioral sequence analysis. These procedures are also applicable in fields other than animal behavior, including psychology and ecology. Results of a Monte Carlo evaluation of the methods, including comparison of these techniques with possible alternative procedures, are presented.

5.
6.
We present a wrapper-based approach to estimate and control the false discovery rate for peptide identifications using the outputs from multiple commercially available MS/MS search engines. Features of the approach include the flexibility to combine output from multiple search engines with sequence and spectral derived features in a flexible classification model to produce a score associated with correct peptide identifications. This classification model score from a reversed database search is taken as the null distribution for estimating p-values and false discovery rates using a simple and established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications while correctly identifying more peptides than rule-based methods using one search engine alone.

7.
8.
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of the most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.

9.
Shotgun proteomics yields tandem mass spectra of peptides that can be identified by database search algorithms. When only a few observed peptides suggest the presence of a protein, establishing the accuracy of the peptide identifications is necessary for accepting or rejecting the protein identification. In this protocol, we describe the properties of peptide identifications that can differentiate legitimately identified peptides from spurious ones. The chemistry of fragmentation, as embodied in the 'mobile proton' and 'pathways in competition' models, informs the process of confirming or rejecting each spectral match. Examples of ion-trap and tandem time-of-flight (TOF/TOF) mass spectra illustrate these principles of fragmentation.

10.
11.
A recent approach for gene mapping based on confidence set inference (CSI) promises several advantages, including avoidance of corrections for multiple tests, availability of confidence intervals with known statistical properties, and sufficient localizations of disease genes. This paper proposes an extended CSI procedure that can handle markers with incomplete polymorphism, thereby increasing the applicability of the set of CSI methods in practical situations. Simulation studies show that the new procedure retains the main advantages of the original CSI. Although it generally requires more data to achieve a similar power, this increase is moderate for markers with 80% heterozygosity or higher. We also investigate the effects of relative risk estimates and disease models. Our analyses show that perturbation from actual relative risks or multilocus disease models generally leads to reduction in power or inflation in type I error, as expected. Nevertheless, for certain classes of two-locus disease models, CSI can still perform well, with reasonably high actual coverage probabilities for at least one of the disease loci. Application of CSI to the data provided by the Genetic Analysis Workshop 13 yields encouraging results, as they compare favorably to those obtained from GENEHUNTER using its NPL sib-pair method.

12.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

13.
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of corresponding MS/MS spectra by PepNovo de novo sequencing software followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

14.
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomic experiments, many peptide sequences are associated with multiple tandem mass spectra of differing quality. The new tandem mass spectra database in SwePep enables validation of low-quality spectra using high-quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
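The spectrum-to-spectrum comparison mentioned in the abstract above can be sketched in Python as a Pearson correlation between two binned spectra. The abstract does not say which correlation algorithm or binning SwePep uses, so the fixed-width m/z binning, the parameter names `bin_width` and `max_mz`, and the peak lists below are assumptions chosen only for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins (assumed scheme)."""
    n_bins = int(max_mz / bin_width) + 1
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            bins[idx] += intensity
    return bins


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def spectrum_correlation(peaks_a, peaks_b, bin_width=1.0, max_mz=2000.0):
    """Correlation between the binned intensity profiles of two spectra."""
    return pearson(
        bin_spectrum(peaks_a, bin_width, max_mz),
        bin_spectrum(peaks_b, bin_width, max_mz),
    )


# Hypothetical peak lists of (m/z, intensity) pairs.
high_quality = [(100.2, 10.0), (250.7, 50.0), (400.1, 20.0)]
low_quality = [(100.3, 4.0), (250.6, 22.0), (400.4, 9.0)]
unrelated = [(150.5, 30.0), (310.9, 5.0), (620.0, 12.0)]

print(spectrum_correlation(high_quality, low_quality))
print(spectrum_correlation(high_quality, unrelated))
```

A high coefficient for the first pair and a near-zero one for the second is the kind of contrast a validation step of this type would rely on.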

15.
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
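The counts of 21 two-method and 35 three-method combinations quoted in the abstract above follow directly from choosing 2 or 3 of the seven listed search methods. A short Python check, using only the method names given in the abstract (the variable names are illustrative):

```python
from itertools import combinations

# The seven database search methods named in the abstract.
methods = [
    "SEQUEST", "ProbID", "InsPecT", "Mascot",
    "X! Tandem", "OMSSA", "RAId_DbS",
]

pairs = list(combinations(methods, 2))    # every unordered pair of methods
triples = list(combinations(methods, 3))  # every unordered triple of methods

print(len(pairs), len(triples))
```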

16.
17.
18.

Background

Early diagnosis and treatment of Mycobacterium tuberculosis infection can prevent most deaths resulting from this pathogen; however, multidrug-resistant strains present serious threats to global tuberculosis control and prevention efforts. In this study, we identified antigens that could be used for the serodiagnosis of drug-resistant M. tuberculosis strains, using a proteomics-based analysis.

Results

Serum from patients infected with drug-resistant or drug-susceptible M. tuberculosis strains and healthy controls was subjected to two-dimensional gel electrophoresis using a western blot approach. This procedure identified nine immunoreactive proteins, which were subjected to MALDI-TOF-MS analysis. Six recombinant proteins, namely rRv2031c, rRv0444c, rRv2145c, rRv3692, rRv0859c, and rRv3040, were expressed and used to determine the immuno-reactivity of 100 serum samples. Antibody reactivity against rRv2031c, rRv3692, and rRv0444c was consistently observed. Among them, the best sensitivity and specificity of rRv3692 were 37% and 95% respectively. Furthermore, when rRv2031c and rRv3692 or rRv2031c, rRv3692, and rRv0444c were combined in 2:1 or equal amounts, the assay sensitivity and specificity were improved to 56.7% and 100% respectively.

Conclusions

These results suggest that Rv2031c, Rv3692, and Rv0444c are possible candidate biomarkers for effective use in the serodiagnosis of drug-resistant tuberculosis infections, and a combined formula of these antigens should be considered when designing a subunit assay kit.

19.
Choice of one method over another for MHC-II binding peptide prediction is typically based on published reports of their estimated performance on standard benchmark datasets. We show that several standard benchmark datasets of unique peptides used in such studies contain a substantial number of peptides that share a high degree of sequence identity with one or more other peptide sequences in the same dataset. Thus, in a standard cross-validation setup, the test set and the training set are likely to contain sequences that share a high degree of sequence identity with each other, leading to overly optimistic estimates of performance. Hence, to more rigorously assess the relative performance of different prediction methods, we explore the use of similarity-reduced datasets. We introduce three similarity-reduced MHC-II benchmark datasets derived from MHCPEP, MHCBN, and IEDB databases. The results of our comparison of the performance of three MHC-II binding peptide prediction methods estimated using datasets of unique peptides with that obtained using their similarity-reduced counterparts shows that the former can be rather optimistic relative to the performance of the same methods on similarity-reduced counterparts of the same datasets. Furthermore, our results demonstrate that conclusions regarding the superiority of one method over another drawn on the basis of performance estimates obtained using commonly used datasets of unique peptides are often contradicted by the observed performance of the methods on the similarity-reduced versions of the same datasets. These results underscore the importance of using similarity-reduced datasets in rigorously comparing the performance of alternative MHC-II peptide prediction methods.

20.
This paper describes a method for selecting an optimal set of attributes for use in identification schemes and diagnoses. The problem is formulated as a single-objective (or multiobjective) least-cost (or efficient, Pareto-optimal) set-covering program, in which the constraints are based on the information contained within a zero-one array, each of whose rows refers to a distinct pair of objects, and whose columns refer to the attributes; an element of this array is unity if the row pair are distinguishable by the attribute, and is zero otherwise. Intrinsic costs are based on the logical interrelationships in this array; extrinsic costs are obtained from external properties of the attributes. Numerical examples illustrate the procedures. A simulated annealing algorithm for use in the solution of nonlinear and multiobjective set-covering problems is proposed, but not explored in depth.
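To make the zero-one array in the abstract above concrete, here is a small Python sketch that selects attributes so that every pair of objects is distinguished by at least one chosen attribute. It uses a plain greedy heuristic, which is a stand-in chosen for brevity and is neither the paper's exact set-covering program nor its simulated annealing algorithm. The function name `greedy_attribute_cover`, the example array, and the cost values are hypothetical.

```python
def greedy_attribute_cover(matrix, costs=None):
    """Pick attributes (columns) until every object pair (row) is covered.

    matrix[r][j] is 1 if attribute j distinguishes the pair in row r, else 0.
    costs[j] is a positive cost for attribute j (defaults to 1 for all).
    Returns the chosen column indices in selection order.
    """
    if not matrix:
        return []
    n_attrs = len(matrix[0])
    if costs is None:
        costs = [1.0] * n_attrs
    for r, row in enumerate(matrix):
        if not any(row):
            raise ValueError(f"pair in row {r} cannot be distinguished")

    uncovered = set(range(len(matrix)))
    chosen = []
    while uncovered:
        # Greedy rule: most newly distinguished pairs per unit cost.
        def gain(j):
            return sum(1 for r in uncovered if matrix[r][j]) / costs[j]

        best = max(range(n_attrs), key=gain)
        chosen.append(best)
        uncovered = {r for r in uncovered if not matrix[r][best]}
    return chosen


# Hypothetical array for objects A, B, C and three attributes.
# Rows are the pairs (A,B), (A,C), (B,C).
pair_matrix = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

print(greedy_attribute_cover(pair_matrix))
print(greedy_attribute_cover(pair_matrix, costs=[1.0, 1.0, 0.25]))
```

Greedy selection gives no optimality guarantee, so its result is best read as a feasible cover rather than the least-cost solution the paper seeks.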
