Similar articles
 Found 20 similar articles (search time: 19 ms)
1.
Abstract. Statistical measures of fidelity, i.e. the concentration of species occurrences in vegetation units, are reviewed and compared. The focus is on measures suitable for categorical data which are based on observed species frequencies within a vegetation unit compared with the frequencies expected under random distribution. Particular attention is paid to Bruelheide's u value. It is shown that its original form, based on binomial distribution, is an asymmetric measure of fidelity of a species to a vegetation unit which tends to assign comparatively high fidelity values to rare species. Here, a hypergeometric form of u is introduced which is a symmetric measure of the joint fidelity of species to a vegetation unit and vice versa. It is also shown that another form of the binomial u value may be defined which measures the asymmetric fidelity of a vegetation unit to a species. These u values are compared with phi coefficient, chi‐square, G statistic and Fisher's exact test. Contrary to the other measures, phi coefficient is independent of the number of relevés in the data set, and like the hypergeometric form of u and the chi‐square it is little affected by the relative size of the vegetation unit. It is therefore particularly useful when comparing species fidelity values among differently sized data sets and vegetation units. However, unlike the other measures it does not measure any statistical significance and may produce unreliable results for small vegetation units and small data sets. The above measures, all based on the comparison of observed/expected frequencies, are compared with the categorical form of the Dufrêne‐Legendre Indicator Value Index, an index strongly underweighting the fidelity of rare species. These fidelity measures are applied to a data set of 15 989 relevés of Czech herbaceous vegetation. 
In a small subset of this data set which simulates a phytosociological table, we demonstrate that traditional table analysis fails to determine diagnostic species of general validity in different habitats and large areas. On the other hand, we show that fidelity calculations used in conjunction with large data sets can replace expert knowledge in the determination of generally valid diagnostic species. Averaging positive fidelity values for all species within a vegetation unit is a useful approach to measuring the quality of delimitation of the vegetation unit. We propose a new way of ordering species in synoptic species‐by‐relevé tables, using fidelity calculations.
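
To make the phi coefficient discussed in this abstract concrete, here is a minimal sketch of the standard phi coefficient for a 2 × 2 species-by-unit table. The function name and argument names are illustrative assumptions, not taken from the paper; the formula is the usual phi coefficient of association written in terms of relevé counts.

```python
import math


def phi_coefficient(n_total, n_unit, n_species, n_joint):
    """Phi coefficient between a species and a vegetation unit.

    n_total   -- number of relevés in the data set
    n_unit    -- number of relevés in the vegetation unit
    n_species -- number of relevés containing the species
    n_joint   -- number of relevés in the unit that contain the species
    (argument names are hypothetical, chosen for this sketch)
    """
    denom = math.sqrt(
        n_species * n_unit * (n_total - n_species) * (n_total - n_unit)
    )
    if denom == 0:
        raise ValueError("phi is undefined when a margin of the table is empty")
    return (n_total * n_joint - n_species * n_unit) / denom


# Scaling every count by the same factor leaves phi unchanged, which
# illustrates the independence from data set size noted in the abstract.
small = phi_coefficient(100, 20, 30, 15)
large = phi_coefficient(1000, 200, 300, 150)
```

Because phi is not a test statistic, a value computed from very few relevés carries no indication of its reliability, which matches the caveat stated in the abstract.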

2.
In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study since the verification procedure is invasive, risky, or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can cause bias in estimates of the VUS. This bias is referred to as verification bias. Existing verification bias correction in three‐way ROC analysis focuses on ordinal tests. We propose verification bias‐correction methods to construct the ROC surface and estimate the VUS for a continuous diagnostic test, based on inverse probability weighting. By applying U‐statistics theory, we develop asymptotic properties for the estimator. A jackknife estimator of variance is also derived. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance. The proposed methods are used to assess the ability of a biomarker to accurately identify stages of Alzheimer's disease.
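
As a point of reference for the VUS described above, the following sketch computes the plain nonparametric VUS when every subject's disease status is known. It is not the inverse-probability-weighted estimator proposed in the abstract; it only shows the quantity that the bias correction targets. The function name is an assumption, and ties are counted as incorrectly ordered for simplicity.

```python
from itertools import product


def empirical_vus(stage1, stage2, stage3):
    """Fraction of triples (one test value per disease stage) that are
    strictly ordered stage1 < stage2 < stage3.

    Assumes full verification (no missing disease status) and that larger
    test values indicate more advanced disease.
    """
    n_triples = len(stage1) * len(stage2) * len(stage3)
    if n_triples == 0:
        raise ValueError("each disease stage needs at least one test value")
    n_ordered = sum(
        1 for a, b, c in product(stage1, stage2, stage3) if a < b < c
    )
    return n_ordered / n_triples
```

A test that separates the three stages perfectly gives a VUS of 1, while a test unrelated to disease stage gives about 1/6, the chance that three random values happen to fall in the right order.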

3.
For an r × c table with ordinal responses, odds ratios are commonly used to describe the relationship between the row and column variables. This article presents two types of ordinal odds ratios: local‐global odds ratios, used to compare several groups on a c‐category ordinal response, and a global odds ratio, used to measure the global association between a pair of ordinal responses. When there is a stratification factor, we consider Mantel‐Haenszel (MH) type estimators of these odds ratios to summarize the association from several strata. Like the ordinary MH estimator of the common odds ratio for several 2 × 2 contingency tables, the estimators are used when the association is not expected to vary drastically among the strata. Also, the estimators are consistent under the ordinary asymptotic framework in which the number of strata is fixed and also under sparse asymptotics in which the number of strata grows with the sample size. Simulations find that the MH type estimators perform better than the maximum likelihood estimators, especially when each stratum has few observations. This article provides variance and covariance formulae for the local‐global odds ratio estimators and applies the bootstrap method to obtain a standard error for the global odds ratio estimator. At the end, we discuss possible ways of testing the homogeneity assumption.
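
The abstract builds on the ordinary Mantel‐Haenszel estimator of a common odds ratio across several 2 × 2 tables. A minimal sketch of that ordinary estimator follows; the ordinal local‐global and global extensions from the article are not reproduced here, and the function name and the tuple layout (a, b, c, d) are assumptions made for illustration.

```python
def mantel_haenszel_odds_ratio(tables):
    """Ordinary Mantel-Haenszel estimate of a common odds ratio.

    tables -- iterable of (a, b, c, d) cell counts, one 2 x 2 table per
              stratum, laid out as [[a, b], [c, d]].
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ZeroDivisionError("no stratum contributes to the denominator")
    return numerator / denominator
```

Each stratum contributes only sums of products divided by its size, so sparse strata with zero cells do not break the estimate, which is one reason MH type estimators behave well when each stratum has few observations.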

4.
Evaluating the patterns of linkage disequilibrium (LD) is important for association mapping studies as well as for studying the genomic architecture of the human genome (e.g., haplotype block structures). Commonly used bi-allelic pairwise measures for assessing LD between two loci, such as r² and D′, may not make full and efficient use of modern multilocus data. Though they have been extended to multilocus scenarios, their performance is still questionable. Meanwhile, most existing measures for an entire multilocus region, such as normalized entropy difference, do not consider the existence of LD heterogeneity across the region under investigation. Additionally, these existing multilocus measures cannot handle distant regions where long-range LD patterns may exist. In this study, we proposed a novel multilocus LD measure based on mutual information theory. Our proposed measure describes the LD pattern between two chromosome regions, each of which may consist of multiple loci (including multi-allele loci). As such, the proposed measure can better characterize LD patterns between two arbitrary regions. As potential applications, we developed algorithms based on the proposed measure for partitioning haplotype blocks and for selecting haplotype tagging SNPs (htSNPs), which were helpful for follow-up association tests. The results on both simulated and empirical data showed that our LD measure had distinct advantages over pairwise and other multilocus measures. First, our measure was more robust and could comprehensively capture the LD information between neighboring as well as disjoint regions. Second, haplotype blocks were better described via our proposed measure. Furthermore, association tests with htSNPs from the proposed algorithm had improved power over tests on single markers and on haplotypes.

5.
In 2012, Karplus and Diederichs demonstrated that the Pearson correlation coefficient CC1/2 is a far better indicator of the quality and resolution of crystallographic data sets than more traditional measures like merging R‐factor or signal‐to‐noise ratio. More specifically, they proposed that CC1/2 be computed for data sets in thin shells of increasing resolution so that the resolution dependence of that quantity can be examined. Recently, however, the CC1/2 values of entire data sets, i.e., cumulative correlation coefficients, have been used as a measure of data quality. Here, we show that the difference in cumulative CC1/2 value between a data set that has been accurately measured and a data set that has not is likely to be small. Furthermore, structures obtained by molecular replacement from poorly measured data sets are likely to suffer from extreme model bias.
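
The CC1/2 statistic mentioned above can be sketched as follows: the repeated measurements of each reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of averages is reported. The data layout (a dict from reflection identifier to a list of intensities), the function names, and the fixed random seed are assumptions made for this sketch; real data-processing packages differ in detail.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def cc_half(observations, seed=0):
    """CC1/2 for one set of reflections (for example, one resolution shell).

    observations -- dict mapping a reflection id to its list of measured
                    intensities (hypothetical layout for this sketch).
    """
    rng = random.Random(seed)
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue  # a single measurement cannot be split into halves
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(sum(shuffled[:mid]) / mid)
        half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half_a, half_b)
```

Calling `cc_half` once per thin resolution shell gives the resolution-dependent curve recommended by Karplus and Diederichs, whereas calling it once on all reflections gives the cumulative value that the abstract argues is insensitive to measurement quality.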

6.
We evaluated the feasibility of a set of indexes based on ground reaction forces to discriminate between the degrees of severity of spastic diplegia, identified via the Gross Motor Function Classification System (GMFCS). A stepwise discriminant ordinal regression analysis performed on a sample of 58 children returned a subset of variables related to the ratio between braking and propulsive vertical forces and anteroposterior timings. In contrast, parameters concerning bilateral symmetry were poorly discriminating. The relative simplicity of the selected indexes allows for their easy implementation in existing gait analysis applications for screening purposes.

7.
  1. The receiver operating characteristic (ROC) and precision–recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting the ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually not available in reality. Plotting the ROC/PR curves from presence‐only data while treating background data as pseudo‐absence data (the PO approach) may provide misleading results.
  2. In this study, we propose a new approach, the PB approach, to calibrate the ROC/PR curves from presence and background data with user‐provided information on a constant c. Here, c defines the probability that species occurrence is detected (labeled), and an estimate of c can also be derived from the PB‐based ROC/PR plots given that a model with good ability of discrimination is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB‐based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches.
  3. Experimental results show that the curves and areas under curves produced by the PB approach are more similar to those produced by the PA approach than are those produced by the PO approach. The PB‐based ROC/PR plots also provide highly accurate estimates of c in our experiment.
  4. We conclude that the proposed PB‐based ROC/PR plots can provide valuable complements to the existing model assessment methods, and they also provide an additional way to estimate the constant c (or species prevalence) from presence and background data.

8.
The origin recognition complex (ORC) is a pivotal element in DNA replication, heterochromatin assembly, checkpoint regulation and chromosome assembly. Although the functions of the ORC have been determined in yeast and model animals, they remain largely unknown in the plant kingdom. In this study, Oryza sativa Origin Recognition Complex subunit 3 (OsORC3) was cloned using map‐based cloning procedures, and functionally characterized using a rice (Oryza sativa) orc3 mutant. The mutant showed a temperature‐dependent defect in lateral root (LR) development. Map‐based cloning showed that a G→A mutation in the 9th exon of OsORC3 was responsible for the mutant phenotype. OsORC3 was strongly expressed in regions of active cell proliferation, including the primary root tip, stem base, lateral root primordium, emerged lateral root primordium, lateral root tip, young shoot, anther and ovary. OsORC3 knockdown plants lacked lateral roots and had a dwarf phenotype. The root meristematic zone of ORC3 knockdown plants exhibited increased cell death and reduced vital activity compared to the wild‐type. CYCB1;1::GUS activity and methylene blue staining showed that lateral root primordia initiated normally in the orc3 mutant, but stopped growing before formation of the stele and ground tissue. Our results indicate that OsORC3 plays a crucial role in the emergence of lateral root primordia.

9.
Collision‐activated dissociation and electron‐transfer dissociation (ETD) each produce spectra containing unique features. Though several database search algorithms (e.g. SEQUEST, MASCOT, and Open Mass Spectrometry Search Algorithm) have been modified to search ETD data, this consists chiefly of the ability to search for c‐ and z•‐ions; additional ETD‐specific features are often unaccounted for and may hinder identification. Removal of these features via spectral processing increased total search sensitivity by ~20% for both human and yeast data sets; unique peptide identifications increased by ~17% for the yeast data sets and ~16% for the human data set.

10.
Follmann D, Nason M. Biometrics, 2011, 67(3): 1127-1134
Summary: Quantal bioassay experiments relate the amount or potency of some compound (for example, a poison, antibody, or drug) to a binary outcome such as death or infection in animals. For infectious diseases, probit regression is commonly used for inference and a key measure of potency is given by the ID_P, the amount that results in P% of the animals being infected. In some experiments, a validation set may be used where both direct and proxy measures of the dose are available on a subset of animals, with the proxy being available on all. The proxy variable can be viewed as a messy reflection of the direct variable, leading to an errors‐in‐variables problem. We develop a model for the validation set and use a constrained seemingly unrelated regression (SUR) model to obtain the distribution of the direct measure conditional on the proxy. We use the conditional distribution to derive a pseudo‐likelihood based on probit regression and use the parametric bootstrap for statistical inference. We re‐evaluate an old experiment in 21 monkeys where neutralizing antibodies (nABs) to HIV were measured using an old (proxy) assay in all monkeys and with a new (direct) assay in a validation set of 11 that had sufficient stored plasma. Using our methods, we obtain an estimate of the ID_1 for the new assay, an important target for HIV vaccine candidates. In simulations, we compare the pseudo‐likelihood estimates with regression calibration and a full joint likelihood approach.

11.
The Tetrahymena thermophila origin recognition complex (ORC) contains an integral RNA subunit, 26T RNA, which confers specificity to the amplified ribosomal DNA (rDNA) origin by base pairing with an essential cis‐acting replication determinant, the type I element. Using a plasmid maintenance assay, we identified a 6.7 kb non‐rDNA fragment containing two closely associated replicators, ARS1‐A (0.8 kb) and ARS1‐B (1.2 kb). Both replicators lack type I elements and hence complementarity to 26T RNA, suggesting that ORC is recruited to these sites by an RNA‐independent mechanism. Consistent with this prediction, although ORC associated exclusively with origin sequences in the 21 kb rDNA minichromosome, the interaction between ORC and the non‐rDNA ARS1 chromosome changed across the cell cycle. In G2 phase, ORC bound to all tested sequences in a 60 kb interval spanning ARS1‐A/B. Remarkably, ORC and Mcm6 associated with just the ARS1‐A replicator in G1 phase when pre‐replicative complexes assemble. We propose that ORC is stochastically deposited onto newly replicated non‐rDNA chromosomes and subsequently targeted to preferred initiation sites prior to the next S phase.

12.
Although most of the statistical methods for diagnostic studies focus on disease processes with binary disease status, many diseases can be naturally classified into three ordinal diagnostic categories, that is, normal, early stage, and fully diseased. For such diseases, the volume under the ROC surface (VUS) is the most commonly used index of diagnostic accuracy. Because the early disease stage is most likely the optimal time window for therapeutic intervention, the sensitivity to the early diseased stage has been suggested as another diagnostic measure. For the purpose of comparing the diagnostic abilities of two markers for early disease detection, it is of interest to estimate the confidence interval of the difference between sensitivities to the early diseased stage. In this paper, we present both parametric and non‐parametric methods for this purpose. An extensive simulation study is carried out for a variety of settings for the purpose of evaluating and comparing the performance of the proposed methods. A real example of Alzheimer's disease (AD) is analyzed using the proposed approaches.

13.
Acute myeloid leukaemia (AML) is the most common type of adult acute leukaemia and has a poor prognosis. Thus, optimal risk stratification is of the greatest importance for a reasonable choice of treatment and prognostic evaluation. For our study, a total of 1707 samples of AML patients from three public databases were divided into meta‐training, meta‐testing and validation sets. The meta‐training set was used to build the risk prediction model, and the other four data sets were employed for validation. By log‐rank test and univariate Cox regression analysis as well as LASSO‐Cox, AML patients were divided into high‐risk and low‐risk groups based on the AML risk score (AMLRS), which was constituted by 10 survival‐related genes. In the meta‐training, meta‐testing and validation sets, patients in the low‐risk group all had a significantly longer OS (overall survival) than those in the high‐risk group (P < .001), and the area under the ROC curve (AUC) by time‐dependent ROC was 0.5854‐0.7905 for 1 year, 0.6652‐0.8066 for 3 years and 0.6622‐0.8034 for 5 years. Multivariate Cox regression analysis indicated that AMLRS was an independent prognostic factor in four data sets. A nomogram combining the AMLRS and two clinical parameters performed well in predicting 1‐year, 3‐year and 5‐year OS. Finally, we created a web‐based prognostic model to predict the prognosis of AML patients ( https://tcgi.shinyapps.io/amlrs_nomogram/ ).

14.
Anderson MJ. Biometrics, 2006, 62(1): 245-253
Summary: The traditional likelihood‐based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero‐inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non‐Euclidean dissimilarity would be more appropriate. Distance‐based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here. They rely on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes. The tests are straightforward multivariate extensions of Levene's test, with P‐values obtained either using the traditional F‐distribution or using permutation of either least‐squares or LAD residuals. Examples illustrate the utility of the approach, including the analysis of stabilizing selection in sparrows, biodiversity of New Zealand fish assemblages, and the response of Indonesian reef corals to an El Niño. Monte Carlo simulations from the real data sets show that the distance‐based tests are robust and powerful for relevant alternative hypotheses of real differences in spread.

15.
Genetic marker‐based identification of distinct individuals and recognition of duplicated individuals has important applications in many research areas in ecology, evolutionary biology, conservation biology and forensics. The widely applied genotype mismatch (MM) method, however, is inaccurate because it relies on a fixed and suboptimal threshold number (TM) of mismatches, and often yields self‐inconsistent pairwise inferences. In this study, I improved the MM method by calculating an optimal TM to accommodate the number, mistyping rates, missing data and allele frequencies of the markers. I also developed a pairwise likelihood relationship (LR) method and a likelihood clustering (LC) method for individual identification, using poor‐quality data that may have high and variable rates of allelic dropouts and false alleles at genotyped loci. The three methods, together with the relatedness (RL) method, were then compared in accuracy by analysing an empirical frog data set and many simulated data sets generated under different parameter combinations. The analysis results showed that LC is generally one or two orders of magnitude more accurate for individual identification than the other methods. Its accuracy is especially superior when the sampled multilocus genotypes are of poor quality (i.e. teeming with genotyping errors and missing data) and highly replicated, a situation typical of noninvasive sampling used in estimating population size. Importantly, LC is the only method that is guaranteed to produce self‐consistent results by partitioning the entire set of multilocus genotypes into distinct clusters, each cluster containing one or more genotypes that all represent the same individual. The LC and LR methods were implemented in the computer program COLONY, available for free download from the Internet.

16.
Several novel and established knowledge‐based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near‐native biomolecular interactions from incorrect ones. We developed the r·m·r function, a novel atomic level radial distribution function with mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r·m·r had the best discriminatory accuracy and power for protein‐small molecule and protein‐DNA interactions, regardless of whether the native complex was included in or excluded from the test set. The superior performance of the r·m·r discriminatory function compared with seventeen alternative functions evaluated on publicly available test sets for protein‐small molecule and protein‐DNA interactions indicated that the function was not over‐optimized through back testing on a single class of biomolecular interactions. The initial success of the reduced composition and superior performance with the CSD as the distribution set over the PDB implies that further improvements and generality of the function are possible by deriving probabilities from subsets of the CSD, using structures that consist of only the atom types to be considered for given biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu . Proteins 2009. © 2008 Wiley‐Liss, Inc.

17.
Question: Indices of functional diversity have been seen as the key for integrating information on species richness with measures that focus on those components of community composition related to ecosystem functioning. For comparing species richness among habitats on an equal‐effort basis, so‐called sample‐based rarefaction curves may be used. Given a study area that is sampled for species presence and absence in N plots, sample‐based rarefaction generates the expected number of accumulated species as the number of sampled plots increases from 1 to N. Accordingly, the question for this study is: can we construct a ‘functional rarefaction curve’ that summarizes the expected functional dissimilarity between species when n plots are drawn at random from a larger pool of N plots? Methods: In this paper, we propose a parametric measure of functional diversity that is obtained by combining sample‐based rarefaction techniques that are usually applied to species richness with Rao's quadratic diversity. For a given set of N presence/absence plots, the resulting measure summarizes the expected functional dissimilarity at an increasingly larger cumulative number of plots n (n ≤ N). Results and Conclusions: Due to its parametric nature, the proposed measure is progressively more sensitive to rare species with increasing plot number, thus rendering this measure adequate for comparing the functional diversity of species assemblages that have been sampled with variable effort.
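
Rao's quadratic diversity, one of the two ingredients named in this abstract, is the expected dissimilarity between two individuals drawn at random with replacement: Q = Σ_i Σ_j d_ij p_i p_j. The sketch below implements only this building block; how the relative abundances p are derived from n rarefied presence/absence plots follows the paper and is not reproduced here. The function name and input layout are assumptions.

```python
def rao_quadratic_diversity(p, d):
    """Rao's Q = sum_i sum_j d[i][j] * p[i] * p[j].

    p -- relative abundances (or occurrence frequencies) summing to 1
    d -- square matrix of pairwise functional dissimilarities, d[i][i] == 0
    """
    s = len(p)
    if len(d) != s or any(len(row) != s for row in d):
        raise ValueError("d must be a square matrix matching len(p)")
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must sum to 1")
    return sum(d[i][j] * p[i] * p[j] for i in range(s) for j in range(s))
```

When every pair of distinct species has dissimilarity 1, Q reduces to the Simpson index 1 − Σ p_i², which is a convenient sanity check for any implementation.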

18.

A comparison is made between existing mathematical models and experimental data that relate the reduction of the saturated hydraulic conductivity (K) of a porous medium to the porosity reduction caused by microbial growth. The models yielded a realistic prediction of a data set obtained with a model porous medium consisting of millimeter‐size glass spheres, but failed to predict the clogging behaviour observed in smaller‐than‐1‐mm sand. A new modelling approach, semi‐mechanistic in nature, is proposed that gives good predictions for fine sand media as well. It relaxes the assumption of uniformly thick biofilms by allowing a second arrangement to occur, i.e. discrete plugs filling the pore lumen. The new model requires input data on two intrinsic properties of the system, which renders it sufficiently flexible to fit very different data sets. The two model parameters are Kmin, the minimum K value when all porosity is filled with microorganisms, and Bc, the biovolume fraction at which most cell detachment from biofilm occurs.

19.
Two sets of von Bertalanffy growth parameter (VBGP) estimates are provided for several Mediterranean fish stocks. All estimates are based on non‐linear least squares regression and are accompanied by uncertainty measures (i.e. standard errors). The first set consists of growth parameters estimated from 73 published length‐at‐age data sets with no previous VBGP estimations; in this case, fitting was possible for 30 length‐at‐age sets, corresponding to 22 species, two estimates of which (Mycteroperca rubra and Myctophum punctatum) are the first for the Mediterranean. The second set refers to the re‐estimation of VBGPs from 69 published length‐at‐age data sets with available original VBGP estimates derived from linear methods (i.e. Ford‐Walford, von Bertalanffy and Gulland‐Holt plots); in this case, fitting was possible for 50 sets. Overall, VBGP estimation was not possible for 43 and 19 cases for the first and second sets, respectively. This was because either (a) <4 mean length‐at‐age data were available, or (b) fitting was not possible because of an exponential or a very slow linear increase of length with age, or (c) estimates were unrealistic (i.e. Lmax/L∞ < 0.7), mainly because of unrealistic length‐at‐ages and/or insufficient sampling of older individuals. These estimations and re‐estimations enrich the available data on growth parameters of Mediterranean fishes, both in terms of quantity and quality of information.
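
For readers unfamiliar with the linear methods that this abstract re‐estimates, the sketch below shows the von Bertalanffy growth function L(t) = L∞(1 − exp(−K(t − t0))) and the classical Ford‐Walford plot, which regresses L(t+1) on L(t) for yearly ages: the slope equals exp(−K) and the intercept equals L∞(1 − exp(−K)). This is the older linear approach, not the non‐linear least squares regression used in the paper. Function names, the minimum of four length‐at‐age values, and the default threshold argument are assumptions borrowed loosely from the criteria quoted in the abstract.

```python
import math


def vbgf_length(t, l_inf, k, t0=0.0):
    """Von Bertalanffy length at age t."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


def ford_walford(mean_lengths):
    """Estimate (L_inf, K) from mean lengths at consecutive yearly ages."""
    if len(mean_lengths) < 4:
        raise ValueError("need at least 4 mean length-at-age values")
    x, y = mean_lengths[:-1], mean_lengths[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if not 0.0 < slope < 1.0:
        raise ValueError("slope outside (0, 1): growth is not asymptotic")
    return intercept / (1.0 - slope), -math.log(slope)


def is_realistic(l_max, l_inf, threshold=0.7):
    """Realism check quoted in the abstract: reject when Lmax / L_inf < 0.7."""
    return l_max / l_inf >= threshold
```

On noise‐free lengths generated from the growth function itself, the Ford‐Walford plot recovers L∞ and K exactly; with real data the two approaches can differ, which is the motivation for the re‐estimation described above.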

20.
Background: Receiver Operator Characteristic (ROC) curves are being used to identify Minimally Important Change (MIC) thresholds on scales that measure a change in health status. In quasi‐continuous patient reported outcome measures, such as those that measure changes in chronic diseases with variable clinical trajectories, sensitivity and specificity are often valued equally. Notwithstanding methodologists agreeing that these should be valued equally, different approaches have been taken to estimating MIC thresholds using ROC curves. Methods: Using graphical methods, hypothetical data, and data from a large randomised controlled trial of manual therapy for low back pain, we compared two existing approaches with a new approach that is based on the sum of squares of 1‐sensitivity and 1‐specificity. Results: There can be divergence in the thresholds chosen by different estimators. The cut‐point selected by different estimators is dependent on the relationship between the cut‐points in ROC space and the different contours described by the estimators. In particular, asymmetry and the number of possible cut‐points affect threshold selection. Conclusion: Choice of MIC estimator is important. Different methods for choosing cut‐points can lead to materially different MIC thresholds and thus affect results of responder analyses and trial conclusions. An estimator based on the smallest sum of squares of 1‐sensitivity and 1‐specificity is preferable when sensitivity and specificity are valued equally. Unlike other methods currently in use, the cut‐point chosen by the sum of squares method always and efficiently chooses the cut‐point closest to the top‐left corner of ROC space, regardless of the shape of the ROC curve.
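
The sum of squares estimator described in this abstract can be sketched in a few lines: for every candidate cut‐point, compute sensitivity and specificity and keep the cut‐point that minimises (1 − sensitivity)² + (1 − specificity)², which is the squared distance to the top‐left corner of ROC space. The function name, the convention that a change score at or above the cut‐point counts as "improved", and the tie rule (first minimum wins) are assumptions made for this sketch.

```python
def sum_of_squares_cutpoint(change_scores, improved):
    """Return (cut_point, sensitivity, specificity) minimising
    (1 - sensitivity)**2 + (1 - specificity)**2.

    change_scores -- observed change on the outcome scale, one per patient
    improved      -- anchor classification (True if truly improved)
    Assumes higher change scores indicate more improvement.
    """
    positives = [s for s, y in zip(change_scores, improved) if y]
    negatives = [s for s, y in zip(change_scores, improved) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one improved and one non-improved patient")
    best = None
    for cut in sorted(set(change_scores)):
        sensitivity = sum(s >= cut for s in positives) / len(positives)
        specificity = sum(s < cut for s in negatives) / len(negatives)
        distance_sq = (1 - sensitivity) ** 2 + (1 - specificity) ** 2
        if best is None or distance_sq < best[0]:
            best = (distance_sq, cut, sensitivity, specificity)
    return best[1], best[2], best[3]
```

Because the criterion is a distance in ROC space, it treats a loss of sensitivity and an equal loss of specificity identically, which is the situation the abstract identifies as the appropriate use case.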

Copyright © 北京勤云科技发展有限公司. ICP registration: 京ICP备09084417号