Similar articles (20 found)
1.
We present a new procedure for assessing the statistical significance of the most likely unrooted dichotomous topology inferable from four DNA sequences. The procedure directly calculates a P-value for the support given to this topology by the informative sites congruent with it, assuming the most likely star topology as the null hypothesis. Informative sites are crucial in the determination of the maximum likelihood dichotomous topology and are therefore an obvious target for a statistical test of phylogenies. Our P-value is the probability of producing, through parallel substitutions on the branches of the star topology, at least as much support as that given to the maximum likelihood dichotomous topology by the aforementioned informative sites, for any of the three possible dichotomous topologies. The degree of statistical significance is simply the complement of this P-value. Ours is therefore an a posteriori testing approach, in which no dichotomous topology is specified in advance. We implement the test for the case in which all sites behave identically and the substitution model has a single parameter. Under these conditions, the P-value can be easily calculated from the probabilities of change on the branches of the most likely star topology because, under these assumptions, each site can become informative independently of every other site; accordingly, the total number of informative sites of each kind is binomially distributed. We explore the test's type I error by applying it to data produced in star topologies having all branches equally long, or having two short and two long branches, with various degrees of homoplasy.
The test is conservative, but we demonstrate, by means of a discreteness correction and progressively assumption-free calculations of the P-values, that (1) the conservativeness is mostly due to the discrete nature of informative sites and (2) the empirically calculated P-values are nonetheless mostly quite accurate in absolute terms. Applying the test to data produced in dichotomous topologies with increasing internal branch length shows that, despite the test's "conservativeness," its power is much higher than that of the bootstrap, especially when the relevant informative sites are few.
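Under the abstract's assumptions (i.i.d. sites, single-parameter model), the count of informative sites supporting a given dichotomous topology is binomial, so the null tail probability can be sketched directly. This is a minimal illustration, not the authors' implementation: the per-site probability `p_parallel` and the Bonferroni bound over the three topologies are simplifying assumptions introduced here.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def star_test_pvalue(n_sites, support, p_parallel):
    """Hypothetical sketch: probability that parallel substitutions on the
    null star topology produce, for ANY of the three dichotomous topologies,
    at least `support` congruent informative sites among `n_sites` sites.
    `p_parallel` is the per-site probability of becoming informative for a
    given topology under the null; sites are assumed i.i.d., so counts are
    binomial, and the union over three topologies is bounded with Bonferroni.
    """
    tail = binom_tail(n_sites, support, p_parallel)
    return min(1.0, 3 * tail)
```

The degree of significance the abstract mentions would then be one minus this value.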

2.
Multiple lines of evidence (LOE) are often considered when examining the potential impact of contaminated sediment. Three strategies are explored for combining information within and/or among different LOE. One technique uses a multivariate strategy for clustering sites into groups of similar impact. A second method employs meta-analysis to pool empirically derived P-values. The third method uses a quantitative estimation of probability derived from odds ratios. These three strategies are compared with respect to a set of data describing reference conditions and a contaminated area in the Great Lakes. Common themes in these three strategies include the critical issue of defining an appropriate set of reference/control conditions, the definition of impact as a significant departure from the normal variation observed in the reference conditions, and the use of distance from the reference distribution to define any of the effect measures. Reasons for differences in results between the three approaches are explored and strategies for improving the approaches are suggested.
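Pooling empirically derived P-values by meta-analysis is commonly done with Fisher's method; the abstract does not name the pooling rule, so treating it as Fisher's method is an assumption. A stdlib-only sketch exploits the closed-form chi-square tail for even degrees of freedom:

```python
from math import exp, log, factorial

def chi2_sf_even_df(x, df):
    """Survival function of the chi-square distribution for even df,
    via the closed-form Erlang/gamma tail."""
    k = df // 2
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(k))

def fisher_combined_pvalue(pvalues):
    """Fisher's method for pooling independent P-values:
    -2 * sum(ln p_i) ~ chi-square with 2*len(pvalues) df under the null."""
    stat = -2 * sum(log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))
```

Note that pooling a single P-value returns it unchanged, a useful sanity check on the implementation.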

3.
The goal of this paper is to illustrate the value and importance of the “weight of evidence” approach (use of multiple lines of evidence from field and laboratory data) to assess the occurrence or absence of ecological impairment in the aquatic environment. Single species toxicity tests, microcosms, and community metric approaches such as the Index of Biotic Integrity (IBI) are discussed. Single species toxicity tests or other single lines of evidence are valuable first-tier assessments that should be used as screening tools to identify potentially toxic conditions in an effluent or the ambient environment, but these tests should not be used as the final quantitative indicator of absolute ecological impairment that may result in regulatory action. Both false positive and false negative predictions of ecological effects can occur due to the inherent variability of measurement endpoints such as survival, growth, and reproduction used in single species toxicity tests. A comparison of single species ambient toxicity test results with field data showed that false positives are common and likely related to experimental variability or to toxicity to selected test species without measurable effects on the ecosystem. Results from microcosm studies have consistently demonstrated that chemical exposures exceeding the acute or chronic toxicity concentrations for highly sensitive species may cause little or no ecologically significant damage to an aquatic ecosystem. Sources of uncertainty identified when extrapolating from single species tests to ecological effects were: variability in individual response to pesticide exposure; variation among species in sensitivity to pesticides; effects of time-varying and repeated exposures; and extrapolation from individual- to population-level endpoints.
Data sets from the Chesapeake Bay area (Maryland) were used to show the importance of using “multiple lines of evidence” when assessing biological impact, due to conflicting results reported from ambient water column and sediment toxicity tests and biological indices (benthic and fish IBIs). Results from water column and sediment toxicity tests with multiple species in tidal areas showed that no single species was consistently the most sensitive. There was also a high degree of disagreement between benthic and fish IBI data for the various stations. The lack of agreement for these biological community indices is not surprising due to the differences in exposure among habitats occupied by these different taxonomic assemblages. Data from a fish IBI, benthic IBI, and Maryland Physical Habitat Index (MPHI) were compared for approximately 1100 first- through third-order Maryland non-tidal streams to show the complexity of data interpretation and the incidence of conflicting lines of evidence. A key finding from this non-tidal data set was the need to use more than one biological indicator to increase the discriminatory power of identifying impaired streams and reduce the possibility of “false negative results”. Based on historical data, temporal variability associated with an IBI in undisturbed areas was reported to be lower than the variability associated with single species toxicity tests.

4.
MOTIVATION: Many heuristic algorithms have been designed to approximate the P-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often deviate from the true P-value by orders of magnitude. Exact P-value computation is needed for ranking the motifs. Furthermore, and surprisingly, the complexity of the problem has been unknown. RESULTS: We show the problem to be NP-hard, and present MotifRank, software based on dynamic programming, to calculate exact P-values of motifs. We define the exact P-value on a general and more precise model. Asymptotically, MotifRank is faster than the best exact P-value computing algorithm, and is in fact practical. Our experiments clearly demonstrate that MotifRank significantly improves on the accuracy of existing approximation algorithms. AVAILABILITY: MotifRank is available from http://bio.dlg.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
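The generic exact-P-value dynamic program can be sketched as follows: propagate the full distribution of integer-scaled PWM scores position by position under an i.i.d. background model. This illustrates the general technique only, not MotifRank's algorithm or its model.

```python
from collections import defaultdict

def pwm_exact_pvalue(pwm, threshold, background):
    """Exact P-value of a PWM score: the probability, under an i.i.d.
    background, that a random sequence of the motif's length scores at
    least `threshold`. `pwm` is a list of columns (dict: base -> integer
    score); `background` maps bases to probabilities. The DP carries the
    full score distribution, assuming integer-scaled scores."""
    dist = {0: 1.0}  # score -> probability, after 0 positions
    for col in pwm:
        new = defaultdict(float)
        for score, prob in dist.items():
            for base, p_bg in background.items():
                new[score + col[base]] += prob * p_bg
        dist = new
    return sum(p for s, p in dist.items() if s >= threshold)
```

The state space is bounded by the score range, which is what makes the DP exact yet tractable for integer scores.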

5.
MOTIVATION: A number of available program packages determine the significant enrichments and/or depletions of GO categories among a class of genes of interest. Whereas a correct formulation of the problem leads to a single exact null distribution, these GO tools use a large variety of statistical tests whose names often do not clarify the underlying P-value computations. SUMMARY: We review the different formulations of the problem and the tests they lead to: the binomial, chi-squared, equality of two probabilities, Fisher's exact, and hypergeometric tests. We clarify the relationships between these tests, in particular the equivalence between the hypergeometric test and Fisher's exact test. We recall that the other tests are valid only for large samples, the test of equality of two probabilities and the chi-squared test being equivalent. We discuss the appropriateness of one- and two-sided P-values, as well as some discreteness and conservatism issues. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
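The single exact null distribution is the hypergeometric one, whose upper tail is identical to the one-sided Fisher's exact test P-value. It can be written directly with binomial coefficients:

```python
from math import comb

def go_enrichment_pvalue(N, K, n, k):
    """One-sided enrichment P-value for a GO category: the probability of
    drawing at least k annotated genes when n genes of interest are drawn
    without replacement from N genes of which K carry the annotation.
    This hypergeometric tail equals the one-sided Fisher's exact P-value."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)
```

With k = 0 the tail covers the whole distribution and the P-value is 1, a quick check of the normalization.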

6.
We applied the floristic quality index (FQI) to vegetation data collected across a chronosequence of created wetland (CW) sites in Virginia ranging in age from one to 15 years post-construction. At each site, we also applied FQI to a nearby forested reference wetland (REF). We tested the performance of the index against a selection of community metrics (species richness, diversity, evenness, percent native species) and site attributes (age, soil physiochemical variables). FQI performed better when non-native species (C-value = 0) were removed from the index, and also when calculated within rather than across vegetation layers. A modified, abundance-weighted FQI showed significant correlation with community and environmental variables in the CW herbaceous layer and REF herbaceous and shrub-sapling layers based on canonical correspondence analysis (CCA) ordination output. These results suggest that a “natives only”, layer-based version of the index is most appropriate for our region, and an abundance-weighted FQI may be useful for assessing floristic quality in certain layers. The abundance-weighted format has the advantage of preserving the “heritage” aspect of the species conservatism concept while also entraining the “ecology” aspect of site assessment based on relative abundances of the inhabiting species. FQI did not successfully relate CW sites to REF sites, bringing into question the applicability of the FQI concept in comparing created wetlands to reference wetlands, and by analogy, the use of forested reference wetlands in general to assess vegetation development in created sites.
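For reference, the standard FQI is the mean coefficient of conservatism times the square root of species richness. The sketch below adds the "natives only" filter (dropping C = 0 species, the convention stated in the abstract) and a simple abundance weighting; the authors' exact weighting scheme is an assumption here, not quoted from the paper.

```python
def fqi(c_values, abundances=None, natives_only=True):
    """Floristic quality index: mean C-value times sqrt(richness).
    With `natives_only`, species with C = 0 (non-natives, per the
    abstract's convention) are dropped first; with `abundances`, the
    mean C is abundance-weighted, sketching the modified index."""
    pairs = list(zip(c_values, abundances or [1] * len(c_values)))
    if natives_only:
        pairs = [(c, a) for c, a in pairs if c > 0]
    if not pairs:
        return 0.0
    total = sum(a for _, a in pairs)
    mean_c = sum(c * a for c, a in pairs) / total
    return mean_c * len(pairs) ** 0.5
```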

7.
Genetic association studies routinely involve massive numbers of statistical tests accompanied by P-values. Whole genome sequencing technologies have increased the potential number of tested variants to tens of millions. The more tests are performed, the smaller the P-value required to be deemed significant. However, a small P-value is not equivalent to a small chance of a spurious finding, and significance thresholds may fail to serve as efficient filters against false results. While the Bayesian approach can provide a direct assessment of the probability that a finding is spurious, its adoption in association studies has been slow, due in part to the ubiquity of P-values and the automated way they are, as a rule, produced by software packages. Attempts to design simple ways to convert an association P-value into the probability that a finding is spurious have met with difficulties. The False Positive Report Probability (FPRP) method has gained increasing popularity. However, FPRP is not designed to estimate the probability for a particular finding, because it is defined for an entire region of hypothetical findings with P-values at least as small as the one observed for that finding. Here we propose a method that lets researchers extract the probability that a finding is spurious directly from a P-value. Considering the counterpart of that probability, we term this method POFIG: the Probability that a Finding is Genuine. Our approach shares FPRP's simplicity, but gives a valid probability that a finding is spurious given a P-value. In addition to straightforward interpretation, POFIG has desirable statistical properties. The POFIG average across a set of tentative associations provides an estimated proportion of false discoveries in that set. POFIGs are easily combined across studies and are immune to multiple testing and selection bias. We illustrate an application of the POFIG method via an analysis of GWAS associations with Crohn's disease.
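The region-based FPRP that POFIG is contrasted with has a simple closed form in the FPRP literature; the version below is that standard formula with the observed P-value standing in for the significance level, an interpretation for illustration rather than the authors' code (POFIG itself is not reproduced here).

```python
def fprp(p_value, power, prior):
    """False Positive Report Probability: the probability that a
    significant finding is spurious, defined over the REGION of outcomes
    with P-values at least as small as the one observed -- the region-based
    definition the abstract contrasts with POFIG. `power` is the power to
    detect the assumed true effect at level `p_value`; `prior` is the prior
    probability of a genuine association."""
    alpha = p_value
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)
```

Note how a small prior inflates FPRP even for a "significant" P-value, which is the abstract's warning about thresholds as filters.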

8.
MOTIVATION: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N²), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. RESULTS: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. AVAILABILITY: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher, and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
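The full-permutation approach whose cost motivates the hybrid method can be sketched as follows. The split statistic here is a simplified mean-difference stand-in for the CBS maximal t-statistic (CBS proper scans circular segments and includes a variance term); the O(N) scan inside an O(N)-per-arrangement loop is what makes the full approach expensive on large arrays.

```python
import random

def max_split_statistic(x):
    """Maximal standardized mean-difference over all binary splits of the
    ordered sequence -- a simplified stand-in for the CBS change-point
    statistic (variance term omitted for brevity)."""
    n, best = len(x), 0.0
    total = sum(x)
    left = 0.0
    for i in range(1, n):
        left += x[i - 1]
        diff = abs(left / i - (total - left) / (n - i))
        best = max(best, diff * (i * (n - i) / n) ** 0.5)
    return best

def permutation_pvalue(x, n_perm=200, seed=0):
    """Full-permutation P-value for the maximal split statistic: the
    brute-force reference-distribution approach the hybrid method avoids."""
    rng = random.Random(seed)
    observed = max_split_statistic(x)
    hits = 0
    for _ in range(n_perm):
        y = x[:]
        rng.shuffle(y)
        hits += max_split_statistic(y) >= observed
    return (hits + 1) / (n_perm + 1)  # add-one to avoid zero P-values
```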

9.
Nixon J. Heredity 2006; 96(4):290–297.
It is important that breeders have the means to assess genetic scoring data for segregation distortion because of its probable effect on the design of efficient breeding strategies. Scoring data are usually assessed for segregation distortion by separate, nonindependent chi-squared tests at each locus in a set of marker loci. This analysis identifies the loci most affected by selection, if any exists, but it cannot give a statistically correct test for the presence or absence of selection in a linkage group as a whole. I have used a combined test, called the single locus test, based on the statistic given by the most significant P-value from the separate tests. I have also derived mathematically a new combined statistical test for segregation distortion, the overall test, which requires genetic scoring data for a single linkage group and takes genetic linkage into account. Using a range of marker densities and population sizes, simulations were carried out to compare the power of these two statistical tests to detect the effect of selection at one or two loci. The single locus test was always found to be more powerful than the overall test, but it required a more complicated P-value correction. For the single locus test, approximate correction factors for the P-values are given for a range of marker densities and genetic lengths.
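The per-locus chi-squared screen and the minimum-P "single locus test" can be sketched as below. The Šidák-style correction shown assumes independent loci; the paper's point is precisely that linked loci violate this, so the real correction factors are simulation-derived.

```python
from math import erfc, sqrt

def chi2_1df_pvalue(stat):
    """P(X >= stat) for chi-square with 1 df, via the normal tail."""
    return erfc(sqrt(stat / 2))

def single_locus_test(counts):
    """Sketch of a minimum-P combined test: a 1:1 segregation chi-squared
    test at each marker locus (counts = list of (a, b) genotype-class
    counts), combined by taking the smallest per-locus P-value. The Sidak
    correction below treats loci as independent, which linked loci are not;
    it stands in for the paper's simulation-derived correction factors."""
    pvals = []
    for a, b in counts:
        expected = (a + b) / 2
        stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
        pvals.append(chi2_1df_pvalue(stat))
    p_min = min(pvals)
    return 1 - (1 - p_min) ** len(counts)
```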

10.
11.
Due to the favorable attributes of Chinese hamster ovary (CHO) cells for biomanufacturing therapeutic proteins and antibodies, companies generate proprietary cells with desirable phenotypes. One key attribute is the ability to stably express multi-gram-per-liter titers in chemically defined media. Cell, media, and feed diversity has limited community efforts to translate knowledge. Moreover, academic and nonprofit researchers generally cannot study “industrially relevant” CHO cells due to limited public availability and the time and knowledge required to generate such cells. To address these issues, a university-industrial consortium (Advanced Mammalian Biomanufacturing Innovation Center, AMBIC) has acquired two CHO “reference cell lines” from different lineages that express monoclonal antibodies. These reference cell lines have relevant production titers, key performance outcomes confirmed by multiple laboratories, and a detailed technology transfer protocol. In commercial media, titers over 2 g/L are reached. Fed-batch cultivation data from shake flasks and scaled-down bioreactors are presented. Using productivity as the primary attribute, two academic sites achieved tight reproducibility at each site. Further, a chemically defined media formulation was developed and evaluated in parallel to the commercial media. The goal of this work is to provide a universal, industrially relevant CHO culture platform to accelerate biomanufacturing innovation.

12.
ARE PARTIAL MANTEL TESTS ADEQUATE?
Partial Mantel tests were designed to test for correlation among three matrices of pairwise distances. We show through an example that these tests may be inadequate, because the associated P-value is not indicative of the type I error.

13.
Post-translational modifications (PTMs) represent an important regulatory layer influencing the structure and function of proteins. With broader availability of experimental information on the occurrences of different PTM types, the investigation of potential “crosstalk” between different PTM types and of combinatorial effects has moved into the research focus. Hypothesizing that relevant interferences between different PTM types and sites may become apparent when investigating their mutual physical distances, we performed a systematic survey of pairwise homo- and heterotypic distances of seven frequent PTM types, considering their sequence and spatial distances in resolved protein structures. We found that actual PTM site distance distributions differ from random distributions, with most PTM type pairs exhibiting larger than expected distances; the exceptions were homotypic phosphorylation site distances and distances between phosphorylation and ubiquitination sites, which were found to be closer than expected by chance. Random reference distributions considering canonical acceptor amino acid residues only were found to be shifted to larger distances compared to distances between any amino acid residue type, indicating an underlying tendency of PTM-amenable residue types to be further apart than randomly expected. Distance distributions based on sequence separations were found to be largely consistent with their spatial counterparts, suggesting a primary role of sequence-based pairwise PTM-location encoding rather than folding-mediated effects. Our analysis provides a systematic and comprehensive overview of the characteristics of pairwise PTM site distances on proteins and reveals that PTM sites predominantly tend to avoid close proximity, with the potential implication that independent attachment or removal of PTMs remains possible. Proteins 2016; 85:78–92. © 2016 Wiley Periodicals, Inc.
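The core comparison, observed pairwise site distances against placements randomized over canonical acceptor positions only, can be sketched as a Monte Carlo tail fraction. This illustrates the idea of an acceptor-restricted reference distribution; the authors' actual distance measures and randomization details differ.

```python
import random
from itertools import combinations

def mean_pairwise_distance(positions):
    """Mean absolute sequence separation over all site pairs."""
    pairs = list(combinations(positions, 2))
    return sum(abs(i - j) for i, j in pairs) / len(pairs)

def distance_enrichment(observed_sites, acceptor_sites, n_draws=1000, seed=1):
    """Fraction of random placements (same number of sites, drawn from the
    canonical acceptor positions, mimicking an acceptor-restricted reference
    distribution) whose mean pairwise distance is at least the observed one.
    Values near 1: observed sites are CLOSER than expected; near 0: farther
    apart, the predominant pattern the abstract reports."""
    rng = random.Random(seed)
    obs = mean_pairwise_distance(observed_sites)
    k = len(observed_sites)
    hits = sum(
        mean_pairwise_distance(rng.sample(acceptor_sites, k)) >= obs
        for _ in range(n_draws)
    )
    return hits / n_draws
```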

14.
1. We evaluated whether the surprisingly weak biological recovery associated with declines in acid deposition was related to drought-induced re-acidification of streams. We used test site analysis (TSA) to characterise temporal changes (1995-2003) in the degree and nature of impairments to stream benthic macroinvertebrate (BMI) communities influenced by acid deposition and drought. 2. The BMI communities in four historically impacted test streams were compared with communities in 30 minimally impacted reference streams. Six multivariate (e.g. ordination axis scores) and four traditional (e.g. % Diptera) summary metrics were used to describe the BMI communities. Using all metrics simultaneously (i.e. Mahalanobis or generalised distance), the TSA provided a single probability that a test community was impaired. If a test community was significantly impaired, a further analysis was done to identify the metric(s) important in distinguishing the test community from the reference condition. 3. Results of the TSAs indicated that the generalised distances between test communities and the reference condition were inversely related to stream water pH (n = 36). The TSAs also indicated that ordination metrics based on BMI abundance were important in distinguishing significantly impaired communities from reference conditions. Temporal trends indicated that there has been short-term recovery of these BMI communities, but that overall improvements have been hampered by acid or metal toxicity associated with drought-induced re-acidification of the streams. 4. Our use of a variety of summary metrics to obtain a single statistical test of significance within the context of the reference-condition approach provided a simple and unambiguous framework for evaluating the biological condition of test sites.
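TSA's single impairment probability rests on the generalized (Mahalanobis) distance of a test site's metric vector from the reference distribution. A minimal dependency-free sketch for two summary metrics (the 2×2 matrix inverse is hand-rolled; real TSA uses the full metric set and a reference-based null distribution):

```python
def inv_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_sq(x, mean, cov_inv):
    """Squared generalized (Mahalanobis) distance of a test-site metric
    vector `x` from the reference-site mean, given the inverse of the
    reference covariance matrix."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
```

A large squared distance relative to the reference scatter is what flags a test community as impaired.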

15.
16.
Lloyd CJ. Biometrics 2008; 64(3):716–723.
Summary. We consider the problem of testing for a difference in the probability of success from matched binary pairs. Starting with three standard inexact tests, the nuisance parameter is first estimated and the residual dependence is then eliminated by maximization, producing what I call an E+M P-value. The E+M P-value based on McNemar's statistic is shown numerically to dominate previous suggestions, including the partially maximized P-values described in Berger and Sidik (2003, Statistical Methods in Medical Research 12, 91–108). The latter method, however, may have computational advantages for large samples.
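The estimate-then-maximize idea can be sketched for McNemar's setting: estimate the nuisance probability of a discordant pair by its MLE, then maximize the unconditional P-value over a neighborhood of that estimate to eliminate residual dependence. The interval width and grid density below are illustrative choices, not Lloyd's.

```python
from math import comb

def mcnemar_stat(b, d):
    """McNemar statistic for b type-1 discordances among d discordant pairs."""
    return 0.0 if d == 0 else (2 * b - d) ** 2 / d

def unconditional_pvalue(b_obs, d_obs, n, p_d):
    """P-value of the McNemar statistic under H0 when the nuisance
    probability of a discordant pair is p_d: the number of discordant
    pairs is D ~ Bin(n, p_d) and, given D = d, b ~ Bin(d, 1/2)."""
    t_obs = mcnemar_stat(b_obs, d_obs)
    total = 0.0
    for d in range(n + 1):
        pd_prob = comb(n, d) * p_d**d * (1 - p_d) ** (n - d)
        total += pd_prob * sum(
            comb(d, b) / 2**d for b in range(d + 1) if mcnemar_stat(b, d) >= t_obs
        )
    return total

def e_plus_m_pvalue(b_obs, d_obs, n, halfwidth=0.2, grid=41):
    """Sketch of an E+M P-value: Estimate the nuisance (MLE p_d = d/n),
    then Maximize the unconditional P-value over a grid around the
    estimate. Interval and grid are illustrative, not Lloyd (2008)'s."""
    mle = d_obs / n
    lo, hi = max(0.0, mle - halfwidth), min(1.0, mle + halfwidth)
    return max(
        unconditional_pvalue(b_obs, d_obs, n, lo + (hi - lo) * i / (grid - 1))
        for i in range(grid)
    )
```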

17.
Characterizing marine water bodies and defining their ecological status, both present and past (pre-impacted), has become an important task for the EU's Member States and their associates during the last decade due to the implementation of the Water Framework Directive (WFD). However, none of the methods used to define Ecological Quality Status (EcoQS) is able to accurately define the status for both present-day and reference conditions at a given site (i.e., in situ). Recent studies have revealed a significant correlation between the diversity of living (stained) fossilizable benthic foraminifera (protists) and associated environmental parameters (e.g., dissolved oxygen concentration). The present study takes this relationship a step further by applying methods used to define present-day EcoQS to fossil benthic foraminiferal assemblages and thereby defining past EcoQS (PaleoEcoQS). This is particularly useful for defining reference conditions in areas where biological and instrumental time series are limited or lacking. Our case study from the Oslofjord, Norway, shows that (1) the “Foraminiferal method” can define temporal developments in in situ EcoQS from reference to present-day conditions, (2) results of the “Foraminiferal method” reflect available historical biological records and hydrographic time series, (3) data (1993 and 2009) on macrofauna (a traditional bio-monitoring tool) and benthic foraminifera from the same sites define the same EcoQS, and (4) the changes in foraminiferal diversity through time are due to human activity (pollution) rather than climate change. Using in situ data to define ecological reference conditions is preferable to modeling or to comparisons with supposedly similar present-day reference conditions.

18.
Zooplankton are potentially powerful proxies for assessments of biologic integrity. The paleolimnological perspective and the use of fossil Cladocera also provide the means to reconstruct reference conditions and natural long-term community dynamics. Unfortunately, the use of zooplankton in lake quality assessments is currently underexploited. We studied a surface-sediment dataset of 41 lakes in Finland to examine the relationship between Cladocera remains and environmental variables. Of the examined environmental variables, total phosphorus availability was found to be the most important in explaining Cladocera community composition. Following the tests on species–environment relations, we selected a lake trophic typology as the most suitable environmental variable for developing a new tool for limnoecological quality assessments. A test of the model on a modern and a historic sample from a eutrophied lake showed that the test lake has proceeded from a “mesotrophic/poor” to a “eutrophic/bad” limnoecological state, in agreement with previous independent evidence. The model developed here showed favorable performance and can be used to provide reliable estimates of the ecological and environmental state of lakes.

19.
Summary. Central place foraging models assume that animals return to a single central place such as a nest, burrow, or sleeping site. Many animals, however, choose among a limited number of central places. Such animals can be considered Multiple Central Place Foragers (MCPF), and such a strategy could reduce overall travel costs if the forager selected a sleeping site close to current feeding areas. We examined the selection of sleeping sites (central places) by a community of spider monkeys (Ateles geoffroyi) in Santa Rosa National Park, Costa Rica, in relation to the location of their feeding areas. Spider monkeys repeatedly used 11 sleeping trees, and they tended to choose the sleeping site closest to their current feeding area. A comparison of the observed travel distances with the distances predicted for a MCPF strategy, a single central place strategy, and a strategy of randomly selecting sleeping sites demonstrated (1) that the MCPF strategy entailed the lowest travel costs, and (2) that the observed travel distance was best predicted by the MCPF strategy. Deviations between the observed distance travelled and the values predicted by the MCPF model increased after a feeding site had been used for several days. This appears to result from animals sampling their home range to locate new feeding sites.
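The travel-cost comparison among strategies can be sketched on a one-dimensional home range; straight-line, out-and-back daily travel is a simplification introduced here, not the study's actual distance measure.

```python
def total_travel(feeding_spots, sleeping_sites, strategy):
    """Total out-and-back daily travel (site -> feeding area -> site) on a
    1-D home range. `strategy` maps a feeding location and the available
    sites to the chosen sleeping site for that day."""
    return sum(2 * abs(f - strategy(f, sleeping_sites)) for f in feeding_spots)

def mcpf(feed, sites):
    """Multiple central place forager: sleep at the site nearest the
    current feeding area."""
    return min(sites, key=lambda s: abs(s - feed))

def scpf(feed, sites):
    """Single central place forager: always return to the same site."""
    return sites[0]
```

With feeding areas spread across the range, the MCPF choice of the nearest site never travels farther than the single-site strategy, mirroring finding (1).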

20.
In experiments with many statistical tests there is a need to balance type I and type II error rates while taking multiplicity into account. In the traditional approach, the nominal α-level, such as 0.05, is adjusted by the number of tests, m, i.e., to 0.05/m. Assuming that some proportion of tests represent “true signals”, that is, originate from a scenario where the null hypothesis is false, power depends on the number of true signals and the respective distribution of effect sizes. One way to define power is for it to be the probability of making at least one correct rejection at the assumed α-level. We advocate an alternative way of establishing how “well-powered” a study is. In our approach, useful for studies with multiple tests, the ranking probability is controlled, defined as the probability of making at least k correct rejections while rejecting the k hypotheses with the smallest P-values. The two approaches are statistically related. The probability that the smallest P-value is a true signal (i.e., k = 1) is equal to the power at the level α/m, to an excellent approximation. Ranking probabilities are also related to the false discovery rate and to the Bayesian posterior probability of the null hypothesis. We study properties of our approach when the effect size distribution is replaced for convenience by a single “typical” value taken to be the mean of the underlying distribution. We conclude that its performance is often satisfactory under this simplification; however, substantial imprecision is to be expected when the number of tests is very large and the proportion of true signals is small. Precision is largely restored when three values with the respective abundances are used instead of a single typical effect size value.
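The ranking probability can be estimated by simulation. The sketch below uses Gaussian one-sided test statistics and a single "typical" effect size, mirroring the simplification the abstract studies; all modeling choices here (Gaussian statistics, one effect value) are illustrative, not the authors' setup.

```python
import random

def ranking_probability(m, n_true, effect, k, n_sim=2000, seed=7):
    """Monte Carlo sketch of the 'ranking probability': the chance that, in
    a study with m one-sided z-tests of which n_true are true signals with
    the given standardized effect size, all of the k smallest P-values
    (equivalently, the k largest z-scores) belong to true signals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        stats = [(rng.gauss(effect, 1), True) for _ in range(n_true)]
        stats += [(rng.gauss(0, 1), False) for _ in range(m - n_true)]
        stats.sort(reverse=True)  # largest z-score = smallest P-value first
        hits += all(is_true for _, is_true in stats[:k])
    return hits / n_sim
```

With a null effect the top-ranked test is a true signal only by chance (about n_true/m of the time); with a strong effect the ranking probability approaches one.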


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号