Similar articles
20 similar articles found.
1.
Stockmarr A. Biometrics 1999, 55(3): 671-677
A crime has been committed, and a DNA profile of the perpetrator is obtained from the crime scene. A suspect with a matching profile is found. The problem of evaluating this DNA evidence in a forensic context, when the suspect is found through a database search, is analysed through a likelihood approach. The recommendations of the National Research Council of the U.S. are derived in this setting as the proper way of evaluating the evidence when finiteness of the population of possible perpetrators is not taken into account. When a finite population of possible perpetrators may be assumed, it is possible to take account of the sampling process that resulted in the actual database, so one can deal with the problem where a large proportion of the possible perpetrators belongs to the database in question. It is shown that the last approach does not in general result in a greater weight being assigned to the evidence, though it does when a sufficiently large proportion of the possible perpetrators is in the database. The value of the likelihood ratio corresponding to the probable cause setting constitutes an upper bound for this weight, and the upper bound is only attained when all but one of the possible perpetrators are in the database.

2.
Meester R, Sjerps M. Biometrics 2003, 59(3): 727-732
Summary. Does the evidential strength of a DNA match depend on whether the suspect was identified through database search or through other evidence (“probable cause”)? In Balding and Donnelly (1995, Journal of the Royal Statistical Society, Series A 158, 21–53) and elsewhere, it has been argued that the evidential strength is slightly larger in a database search case than in a probable cause case, while Stockmarr (1999, Biometrics 55, 671–677) reached the opposite conclusion. Both these approaches use likelihood ratios. By making an excursion to a similar problem, the two‐stain problem, we argue in this article that there are certain fundamental difficulties with the use of a likelihood ratio, which can be avoided by concentrating on the posterior odds. This approach helps resolve the above‐mentioned conflict.

3.
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.

4.
Egeland T, Salas A. PLoS ONE 2011, 6(10): e26723

Background

Mitochondrial DNA (mtDNA) variation is commonly analyzed in a wide range of different biomedical applications. Cases where more than one individual contributes to a stain genotyped from some biological material give rise to a mixture. Most forensic mixture cases are analyzed using autosomal markers. In rape cases, Y-chromosome markers typically add useful information. However, there are important cases where autosomal and Y-chromosome markers fail to provide useful profiles. In some instances, usually involving small amounts of DNA or degraded DNA, mtDNA may be the only useful genetic evidence available. Mitochondrial DNA mixtures also arise in studies dealing with the role of mtDNA variation in tumorigenesis. Such mixtures may be generated by the tumor, but they could also originate in vitro due to inadvertent contamination or a sample mix-up.

Methods/Principal Findings

We present the statistical methods needed for mixture interpretation and emphasize the modifications required for the more well-known methods based on conventional markers to generalize to mtDNA mixtures. Two scenarios are considered. Firstly, only categorical mtDNA data is assumed available, that is, the variants contributing to the mixture. Secondly, quantitative data (peak heights or areas) on the allelic variants are also accessible. In cases where quantitative information is available in addition to allele designation, it is possible to extract more precise information by using regression models. More precisely, using quantitative information may lead to a unique solution in cases where the qualitative approach points to several possibilities. Importantly, these methods also apply to clinical cases where contamination is a potential alternative explanation for the data.

Conclusions/Significance

We argue that clinical and forensic scientists should give greater consideration to mtDNA for mixture interpretation. The results and examples show that the analysis of mtDNA mixtures contributes substantially to forensic casework and may also clarify erroneous claims made in clinical genetics regarding tumorigenesis.

5.
Comprehensive proteome profiling of breast cancer tissue samples is challenging, as the tissue samples contain many proteins with varying concentrations and modifications. We report an effective sample preparation strategy combined with liquid chromatography (LC) electrospray ionization (ESI) quadrupole time-of-flight (QTOF) MS/MS for proteome analysis of human breast cancer tissue. The complexity of the breast cancer tissue proteome was reduced by using protein precipitation from a tissue extract, followed by sequential protein solubilization in solvents of different solubilizing strength. The individual fractions of protein mixtures or subproteomes were subjected to trypsin digestion and the resultant peptides were separated by strong cation exchange (SCX) chromatography, followed by reversed-phase capillary LC combined with high resolution and high accuracy ESI-QTOF MS/MS. This approach identified 14407 unique peptides from 3749 different proteins based on peptide matches with scores above the threshold scores at the 95% confidence level in MASCOT database search of the acquired MS/MS spectra. The false positive rate of peptide matches was determined to be 0.95% by using the target-decoy sequence search strategy. On the basis of gene ontology categorization, the identified proteins represented a wide variety of biological functions, cellular processes, and cellular locations.
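
The target-decoy estimate mentioned in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function name `decoy_fdr`, the `(score, is_decoy)` input layout, and the simple decoys/targets ratio are assumptions made here (other conventions, such as doubling the decoy count for concatenated searches, are also in use).

```python
from typing import Iterable, Tuple


def decoy_fdr(matches: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the error rate among target matches scoring >= threshold.

    Each match is a (score, is_decoy) pair. The sketch assumes that a false
    match is equally likely to hit a target or a decoy sequence, so the number
    of accepted decoys estimates the number of false accepted targets.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no error rate to report
    return decoys / targets


# Hypothetical data: 200 accepted target matches, 2 accepted decoys,
# and 5 low-scoring decoys that fall below the threshold.
example = [(50.0, False)] * 200 + [(50.0, True)] * 2 + [(10.0, True)] * 5
print(decoy_fdr(example, threshold=30.0))
```

Raising the threshold usually lowers the estimate at the cost of fewer accepted matches, which is the trade-off a reported rate such as the one above reflects.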

6.
Balding DJ. Biometrics 2002, 58(1): 241-244
A recent article in Biometrics (Stockmarr, 1999, 55, 671-677) has generated correspondence (56, 1274-1277; 57, 976-980) reigniting a controversy started by a 1996 report on DNA profile evidence issued by the U.S. National Research Council (NRC). The issue concerns the evidential weight of a DNA profile match when the match results from a search through a profile database. The views of both Stockmarr and the NRC report conflict with those of many statisticians working in the area, and the differing viewpoints lead to dramatically different assessments of evidence. I outline reasons why Stockmarr and the NRC report are wrong. I also briefly discuss possible reasons why forensic applications tend to be problematic for statisticians.

7.
8.
Probabilistic expert systems for DNA mixture profiling   Cited by: 1 (self-citations: 0; citations by others: 1)
We show how probabilistic expert systems can be used to structure and solve complex cases of forensic identification involving DNA traces that might be mixtures of several DNA profiles. In particular, this approach can readily handle cases where the number of contributors to the mixture cannot be regarded as known in advance. The flexible modularity of the networks used also allows us to handle still more complex cases, for example where the finding of a mixed DNA trace is compounded by such features as missing individuals or the possibility of unobserved alleles.

9.
A novel database search algorithm is presented for the qualitative identification of proteins over a wide dynamic range, both in simple and complex biological samples. The algorithm has been designed for the analysis of data originating from data independent acquisitions, whereby multiple precursor ions are fragmented simultaneously. Measurements used by the algorithm include retention time, ion intensities, charge state, and accurate masses on both precursor and product ions from LC‐MS data. The search algorithm uses an iterative process whereby each iteration incrementally increases the selectivity, specificity, and sensitivity of the overall strategy. Increased specificity is obtained by utilizing a subset database search approach, whereby for each subsequent stage of the search, only those peptides from securely identified proteins are queried. Tentative peptide and protein identifications are ranked and scored by their relative correlation to a number of models of known and empirically derived physicochemical attributes of proteins and peptides. In addition, the algorithm utilizes decoy database techniques for automatically determining the false positive identification rates. The search algorithm has been tested by comparing the search results from a four‐protein mixture, the same four‐protein mixture spiked into a complex biological background, and a variety of other “system” type protein digest mixtures. The method was validated independently by data dependent methods, while concurrently relying on replication and selectivity. Comparisons were also performed with other commercially and publicly available peptide fragmentation search algorithms. The presented results demonstrate the ability to correctly identify peptides and proteins from data independent acquisition strategies with high sensitivity and specificity. 
They also illustrate a more comprehensive analysis of the samples studied, providing approximately 20% more protein identifications than a more conventional data directed approach using the same identification criteria, with a concurrent increase in both sequence coverage and the number of modified peptides.

10.
Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparisons of high throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across different experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low probability identifications when there are a large number of peptide identifications and exclude high probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. These experiment-specific probabilities are shown to closely approximate "true" probabilities based on known standard protein mixtures across different experiments. Probabilities generated by the earlier Peptide_Prophet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database.
However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.

11.
A new database search algorithm has been developed to identify disulfide-linked peptides in tandem MS data sets. The algorithm is included in the newly developed tandem MS database search program, MassMatrix. The algorithm exploits the probabilistic scoring model in MassMatrix to achieve identification of disulfide bonds in proteins and peptides. Proteins and peptides with disulfide bonds can be identified with high confidence without chemical reduction or other derivatization. The approach was tested on peptide and protein standards with known disulfide bonds. All disulfide bonds in the standard set were identified by MassMatrix. The algorithm was further tested on bovine pancreatic ribonuclease A (RNaseA). The 4 native disulfide bonds in RNaseA were detected by MassMatrix with multiple validated peptide matches for each disulfide bond with high statistical scores. Fifteen nonnative disulfide bonds were also observed in the protein digest under basic conditions (pH = 8.0) due to disulfide bond interchange. After minimizing the disulfide bond interchange (pH = 6.0) during digestion, only one nonnative disulfide bond was observed. The MassMatrix algorithm offers an additional approach for the discovery of disulfide bonds from tandem mass spectrometry data.

12.
To elucidate the role of high mass accuracy in mass spectrometric peptide mapping and database searching, selected proteins were subjected to tryptic digestion and the resulting mixtures were analyzed by electrospray ionization on a 7 Tesla Fourier transform mass spectrometer with a mass accuracy of 1 ppm. Two extreme cases were examined in detail: equine apomyoglobin, which digested easily and gave very few spurious masses, and bovine alpha-lactalbumin, which under the conditions used, gave many spurious masses. The effectiveness of accurate mass measurements in minimizing false protein matches was examined by varying the mass error allowed in the search over a wide range (2-500 ppm). For the "clean" data obtained from apomyoglobin, very few masses were needed to return valid protein matches, and the mass error allowed in the search had little effect up to 500 ppm. However, in the case of alpha-lactalbumin more mass values were needed, and low mass errors increased the search specificity. Mass errors below 30 ppm were particularly useful in eliminating false protein matches when few mass values were used in the search. Collision-induced dissociation of an unassigned peak in the alpha-lactalbumin digest provided sufficient data to unambiguously identify the peak as a fragment from alpha-lactalbumin and eliminate a large number of spurious proteins found in the peptide mass search. The results show that even with a relatively high mass error (0.8 Da for mass differences between singly charged product ions), collision-induced dissociation can help identify proteins in cases where unfavorable digest conditions or modifications render digest peaks unidentifiable by a simple mass mapping search.
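
The mass error tolerances quoted in this abstract are relative errors in parts per million. A minimal sketch of how such a tolerance is typically applied is given below; the function names and the example masses are hypothetical and do not come from the study.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed relative mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tolerance_ppm: float) -> bool:
    """True if the observed mass lies within +/- tolerance_ppm of the theoretical mass."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


# Hypothetical peptide mass of 1000.0 Da measured as 1000.02 Da (about 20 ppm off):
# accepted with a 30 ppm window, rejected with a 2 ppm window.
print(within_tolerance(1000.02, 1000.0, tolerance_ppm=30.0))
print(within_tolerance(1000.02, 1000.0, tolerance_ppm=2.0))
```

A narrower window admits fewer candidate peptides per observed mass, which is why tight tolerances help most when only a few masses are available for the search.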

13.
A variety of macromolecules and small molecules, namely (oligo)nucleotides, proteins, lipids and metabolites, are collectively considered essential to early life. However, previous schemes for the origin of life (e.g. the 'RNA world' hypothesis) have tended to assume the initial emergence of life based on one such molecular class followed by the sequential addition of the others, rather than the emergence of life based on a mixture of all the classes of molecules. This view is in part due to the perceived implausibility of multi-component reaction chemistry producing such a mixture. The concept of systems chemistry challenges such preconceptions by suggesting the possibility of molecular synergism in complex mixtures. If a systems chemistry method to make mixtures of all the classes of molecules considered essential for early life were to be discovered, the significant conceptual difficulties associated with pure RNA, protein, lipid or metabolism 'worlds' would be alleviated. Knowledge of the geochemical conditions conducive to the chemical origins of life is crucial, but cannot be inferred from a planetary sciences approach alone. Instead, insights from the organic reactivity of analytically accessible chemical subsystems can inform the search for the relevant geochemical conditions. If the common set of conditions under which these subsystems work productively, and compatibly, matches plausible geochemistry, an origins of life scenario can be inferred. Using chemical clues from multiple subsystems in this way is akin to triangulation, and constitutes a novel approach to discover the circumstances surrounding the transition from chemistry to biology. Here, we exemplify this strategy by finding common conditions under which chemical subsystems generate nucleotides and lipids in a compatible and potentially synergistic way. The conditions hint at a post-meteoritic impact origin of life scenario.

14.
Storvik G, Egeland T. Biometrics 2007, 63(3): 922-925
Two different quantities have been suggested for quantification of evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule has been motivated by frequentist arguments and has been suggested by the American National Research Council and Stockmarr (1999, Biometrics 55, 671-677). The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear is still lacking. In this article we show that a P-value in a frequentist hypothesis setting is approximately equal to the result of the np rule. We argue, however, that a more reasonable procedure in this case is to use conditional testing, in which case a P-value directly related to posterior probabilities and the likelihood ratio is obtained. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.
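
To make the contrast between the two quantities concrete, the sketch below compares the np rule with a posterior probability in a deliberately simplified toy model. The model is an assumption made here for illustration, not the derivation in any of the cited papers: N possible perpetrators with equal prior probability, n of them in the database, exactly one database profile (the suspect's) matching, a random match probability p for each non-source individual, and no relatives or laboratory errors. Under those assumptions Bayes' theorem gives a posterior of 1 / (1 + (N - n) p) that the suspect is the source.

```python
def np_rule(n: int, p: float) -> float:
    """The np quantity: database size times random match probability."""
    return n * p


def posterior_source_probability(N: int, n: int, p: float) -> float:
    """Posterior probability that the single matching database member is the source.

    Toy model (assumed here): uniform prior over N possible perpetrators,
    n of them searched, only the suspect matches, match probability p for
    anyone who is not the source.
    """
    if not 1 <= n <= N:
        raise ValueError("database size n must satisfy 1 <= n <= N")
    # Every searched non-matching person is excluded; each of the N - n
    # unsearched people remains a candidate who would match with probability p.
    return 1.0 / (1.0 + (N - n) * p)


# Hypothetical numbers: one million possible perpetrators, p = 1e-6.
for n in (10, 100_000, 1_000_000):
    print(n, np_rule(n, 1e-6), posterior_source_probability(1_000_000, n, 1e-6))
```

In this toy model the np quantity grows with the database size, while the posterior probability also grows because a larger search excludes more alternative sources, which is one way to see why the two quantities point in different directions.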

15.
Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring these matches. The most reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the fraction of expected false identifications in the set. This metric has so far only been used for completely sequenced organisms or known protein mixtures. We have investigated whether FDR estimations are also applicable in the case of partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because the latter are not present in the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we could demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly if used in combination with decoy sequences. We therefore recommend using this method when estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.

16.
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences from related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.

17.
In forensic science, trace evidence found at a crime scene and on a suspect has to be evaluated from the measurements performed on it, usually in the form of multivariate data (for example, several chemical compounds or physical characteristics). In order to assess the strength of that evidence, the likelihood ratio framework is being increasingly adopted. Several methods have been derived in order to obtain likelihood ratios directly from univariate or multivariate data by modelling both the variation appearing between observations (or features) coming from the same source (within-source variation) and that appearing between observations coming from different sources (between-source variation). In the widely used multivariate kernel likelihood ratio, the within-source distribution is assumed to be normal and constant among different sources, and the between-source variation is modelled through a kernel density function (KDF). In order to better fit the observed distribution of the between-source variation, this paper presents a different approach in which a Gaussian mixture model (GMM) is used instead of a KDF. As will be shown, this approach provides better-calibrated likelihood ratios as measured by the log-likelihood ratio cost (Cllr) in experiments performed on freely available forensic datasets involving different types of trace evidence: inks, glass fragments and car paints.

18.
Informatics for protein identification by mass spectrometry   Cited by: 3 (self-citations: 0; citations by others: 3)
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. All of the search programs will produce a top ranked answer; however, only the credulous are willing to accept them carte blanche.

19.
Kim MS, Zhong J, Kandasamy K, Delanghe B, Pandey A. Proteomics 2011, 11(12): 2568-2572
CID has become a routine method for fragmentation of peptides in shotgun proteomics, whereas electron transfer dissociation (ETD) has been described as a preferred method for peptides carrying labile PTMs. Though both of these fragmentation techniques have their obvious advantages, they also have their own drawbacks. By combining data from CID and ETD fragmentation, some of these disadvantages can potentially be overcome because of the complementarity of fragment ions produced. To evaluate alternating CID and ETD fragmentation, we analyzed a complex mixture of phosphopeptides on an LTQ-Orbitrap mass spectrometer. When the CID and ETD-derived spectra were searched separately, we observed 2504, 491, 2584, and 3249 phosphopeptide-spectrum matches from CID alone, ETD alone, decision tree-based CID/ETD, and alternating CID and ETD, respectively. Combining CID and ETD spectra prior to database searching should, intuitively, be superior to either method alone. However, when spectra from the alternating CID and ETD method were merged prior to database searching, we observed a reduction in the number of phosphopeptide-spectrum matches. The poorer identification rates observed after merging CID and ETD spectra are a reflection of a lack of optimized search algorithms for carrying out such searches and perhaps inherent weaknesses of this approach. Thus, although alternating CID and ETD experiments for phosphopeptide identification are desirable for increasing the confidence of identifications, merging spectra prior to database search has to be carefully evaluated further in the context of the various algorithms before adopting it as a routine strategy.

20.
MS/MS combined with database searching has emerged as a valuable technology for rapidly analyzing protein expression, localization, and post-translational modifications. The probability-based search engine Mascot has found widespread use as a tool to correlate tandem mass spectra with peptides in a sequence database. Although the Mascot scoring algorithm provides a probability-based model for peptide identification, the independent peptide scores do not correlate with the significance of the proteins to which they match. Herein, we describe a heuristic method for organizing proteins identified at a specified false-discovery rate using Mascot-matched peptides. We call this method PROVALT, and it uses peptide matches from a random database to calculate false-discovery rates for protein identifications and reduces a complex list of peptide matches to a nonredundant list of homologous protein groups. This method was evaluated using Mascot-identified peptides from a Trypanosoma cruzi epimastigote whole-cell lysate, which was separated by multidimensional LC and analyzed by MS/MS. PROVALT was then compared with the two traditional methods of protein identification when using Mascot, the single peptide score and cumulative protein score methods, and was shown to be superior to both with regard to the number of proteins identified and the inclusion of lower scoring nonrandom peptide matches.
