Similar Articles (20 results)
1.
In biomedical studies patients are often evaluated numerous times and a large number of variables are recorded at each time-point. Data entry and manipulation of longitudinal data can be performed using spreadsheet programs, which usually include some data plotting and analysis capabilities and are straightforward to use, but are not designed for the analysis of complex longitudinal data. Specialized statistical software offers more flexibility and capabilities, but first-time users with a biomedical background often find its use difficult. We developed medplot, an interactive web application that simplifies the exploration and analysis of longitudinal data. The application can be used to summarize, visualize and analyze data by researchers who are not familiar with statistical programs and whose knowledge of statistics is limited. The summary tools produce publication-ready tables and graphs. The analysis tools include features that are seldom available in spreadsheet software, such as correction for multiple testing, repeated-measurement analyses and flexible non-linear modeling of the association of numerical variables with the outcome. medplot is freely available and open source; it has an intuitive graphical user interface (GUI), is accessible via the Internet and can be used within a web browser, without the need to install and maintain programs locally on the user's computer. This paper describes the application and gives detailed examples of how to use it on real data from a clinical study of patients with early Lyme borreliosis.
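The multiple-testing correction mentioned in the abstract can be illustrated with the Benjamini–Hochberg step-up procedure. This is a generic sketch; the abstract does not say which correction medplot applies, so the choice of procedure and the function name are assumptions:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans marking which p-values are significant
    after Benjamini-Hochberg false-discovery-rate correction."""
    m = len(pvals)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha ...
    threshold_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            threshold_rank = rank
    # ... and reject every hypothesis up to that rank.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= threshold_rank:
            significant[i] = True
    return significant

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.74]))
# → [True, True, False, False, False, False, False]
```

The step-up rule rejects the hypotheses with the k smallest p-values, where k is the largest rank at which the sorted p-value is still below (k/m)·α.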

2.
Meta-analysis is an important tool for synthesizing research on a variety of topics in ecology and evolution, including molecular ecology, but can be susceptible to nonindependence. Nonindependence can affect two major interrelated components of a meta-analysis: (i) the calculation of effect size statistics and (ii) the estimation of overall meta-analytic estimates and their uncertainty. While some solutions to nonindependence exist at the statistical analysis stages, there is little advice on what to do when complex analyses are not possible, or when studies with nonindependent experimental designs exist in the data. Here we argue that exploring the effects of procedural decisions in a meta-analysis (e.g. inclusion of different quality data, choice of effect size) and statistical assumptions (e.g. assuming no phylogenetic covariance) using sensitivity analyses is extremely important in assessing the impact of nonindependence. Sensitivity analyses can provide greater confidence in results and highlight important limitations of empirical work (e.g. impact of study design on overall effects). Despite their importance, sensitivity analyses are seldom applied to problems of nonindependence. To encourage better practice for dealing with nonindependence in meta-analytic studies, we present accessible examples demonstrating the impact that ignoring nonindependence can have on meta-analytic estimates. We also provide pragmatic solutions for dealing with nonindependent study designs, and for analysing dependent effect sizes. Additionally, we offer reporting guidelines that will facilitate disclosure of the sources of nonindependence in meta-analyses, leading to greater transparency and more robust conclusions.
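A toy sensitivity analysis of the kind advocated here can be sketched by pooling effect sizes twice: once naively treating all effects as independent, and once after averaging dependent effects within each study with an assumed within-study correlation. All numbers, and the correlation value r = 0.5, are illustrative assumptions, not from the paper:

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return est, 1.0 / sum(weights)

def collapse_by_study(effects, variances, study_ids, r=0.5):
    """Average dependent effect sizes within each study before pooling.
    The variance of each average uses an assumed common within-study
    correlation r (r = 0.5 is an arbitrary illustrative choice)."""
    out_e, out_v = [], []
    for sid in sorted(set(study_ids)):
        idx = [i for i, s in enumerate(study_ids) if s == sid]
        k = len(idx)
        out_e.append(sum(effects[i] for i in idx) / k)
        total = sum(variances[i] for i in idx)
        cross = sum(math.sqrt(variances[i] * variances[j])
                    for i in idx for j in idx if i != j)
        out_v.append((total + r * cross) / k**2)
    return out_e, out_v

# Two dependent effects from study 1 plus one from study 2 (toy numbers).
effects, variances, studies = [0.4, 0.5, 0.1], [0.02, 0.02, 0.04], [1, 1, 2]

naive_est, naive_var = pooled_effect(effects, variances)  # ignores dependence
coll_e, coll_v = collapse_by_study(effects, variances, studies)
sens_est, sens_var = pooled_effect(coll_e, coll_v)        # sensitivity analysis
```

Comparing the two runs shows the typical consequence of ignoring dependence: the naive analysis understates the variance of the pooled estimate.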

3.
The stochastic nature of high-throughput screening (HTS) data indicates that information may be gleaned by applying statistical methods to HTS data. A foundation of parametric statistics is the study and elucidation of population distributions, which can be modeled using modern spreadsheet software. The methods and results described here use fundamental concepts of statistical population distributions analyzed using a spreadsheet to provide tools in a developing armamentarium for extracting information from HTS data. Specific examples using two HTS kinase assays are analyzed. The analyses use normal and gamma distributions, which combine to form mixture distributions. HTS data were found to be described well by such mixture distributions, and deconvolution of the mixtures into their constituent gamma and normal parts provided insight into how the assays performed. In particular, the proportion of hits confirmed was predicted from the original HTS data and used to assess screening assay performance. The analyses also provide a method for determining how hit thresholds (values used to separate active from inactive compounds) affect the proportion of compounds verified as active and how the threshold can be chosen to optimize the selection process.
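The threshold calculation described above can be sketched as follows, assuming already-fitted mixture parameters (all values invented) and an integer-shape gamma component so the CDF has a closed form (the Erlang special case); this is not the authors' spreadsheet implementation:

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gamma_cdf_int_shape(x, k, theta):
    """Gamma CDF restricted to integer shape k (Erlang), which has a
    closed form: 1 - exp(-x/theta) * sum_{n<k} (x/theta)^n / n!."""
    if x <= 0:
        return 0.0
    s = sum((x / theta) ** n / math.factorial(n) for n in range(k))
    return 1.0 - math.exp(-x / theta) * s

def confirmed_fraction(threshold, w_active, k, theta, mu, sigma):
    """Among compounds scoring above `threshold`, the expected fraction
    drawn from the active (gamma) component of the mixture."""
    p_active = w_active * (1.0 - gamma_cdf_int_shape(threshold, k, theta))
    p_inactive = (1.0 - w_active) * (1.0 - normal_cdf(threshold, mu, sigma))
    return p_active / (p_active + p_inactive)

share = confirmed_fraction(30.0, 0.05, 3, 20.0, 0.0, 10.0)
# With 5% actives ~ Gamma(k=3, theta=20) and inactives ~ N(0, 10),
# roughly 97% of compounds above a threshold of 30 are expected to confirm.
```

Raising or lowering the threshold trades off this confirmation rate against the number of actives lost below the cut, which is exactly the optimization the abstract describes.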

4.
Bellio R, Jensen JE, Seiden P. Biometrics 2000;56(4):1204-1212.
Dose-response models are intensively used in herbicide bioassays. Despite recent advancements in the development of new herbicides, statistical analyses are commonly based on asymptotic approximations that are sometimes poor. This paper presents the use of recent results in higher order asymptotics for likelihood-based inference in nonlinear regression. The methods presented provide accurate approximation for the distribution of test statistics and for prediction limits. Analyses of the fit and measures of detection limits of the bioassays are considered, and the potential of the methods is illustrated by examples with real data.

5.
Introduction: Statistical information for LCA is increasingly becoming available in databases, and at the same time LCA software makes it increasingly easy to process. A practical problem is that there is no unique, unambiguous representation for statistical distributions. Representations: This paper discusses the most frequently encountered statistical distributions, their representation in mathematical statistics, EcoSpold and CMLCA, and the relationships between these representations. The distributions: Four statistical distributions are discussed: uniform, triangular, normal and lognormal. Software and examples: An easy-to-use software tool is available to support the conversion steps; its use is illustrated with a simple example. Discussion: This paper shows which ambiguities exist when specifying statistical distributions, and which complications can arise when uncertainty information is transferred from a database to an LCA program. This calls for more extensive standardization of the vocabulary and symbols used to express such information. We invite suppliers of software and databases to provide their parameter representations in a clear and unambiguous way, and hope that a future revision of the ISO/TS 14048 document will standardize the representation and terminology for statistical information.
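As an illustration of why representation matters, a lognormal distribution can be specified either by the mean and standard deviation of the underlying normal or by the geometric mean and geometric standard deviation, and conversion between the two is simple but easy to get wrong. A minimal sketch of the standard mapping; any link to specific EcoSpold or CMLCA field names is deliberately omitted, since those conventions are exactly what the paper says is ambiguous:

```python
import math

def underlying_normal_from_geometric(gm, gsd):
    """(mu, sigma) of the underlying normal from geometric mean / GSD."""
    return math.log(gm), math.log(gsd)

def geometric_from_underlying_normal(mu, sigma):
    """Geometric mean and geometric standard deviation from (mu, sigma)."""
    return math.exp(mu), math.exp(sigma)

def lognormal_arithmetic_mean(mu, sigma):
    """Arithmetic mean of the lognormal itself: exp(mu + sigma^2/2).
    A common pitfall is to report exp(mu), which is the median, not the mean."""
    return math.exp(mu + sigma ** 2 / 2.0)

mu, sigma = underlying_normal_from_geometric(10.0, 2.0)
mean = lognormal_arithmetic_mean(mu, sigma)   # larger than the geometric mean 10
```

A database that stores (gm, gsd) and a program that expects (mu, sigma), or vice versa, will silently produce wrong uncertainty estimates unless this conversion is applied, which is the kind of ambiguity the paper asks standards to resolve.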

6.
Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should be ultimately interested in biological importance, which may be assessed using the magnitude of an effect, but not its statistical significance. Therefore, we advocate presentation of measures of the magnitude of effects (i.e. effect size statistics) and their confidence intervals (CIs) in all biological journals. Combined use of an effect size and its CIs enables one to assess the relationships within data more effectively than the use of p values, regardless of statistical significance. In addition, routine presentation of effect sizes will encourage researchers to view their results in the context of previous research and facilitate the incorporation of results into future meta-analysis, which has been increasingly used as the standard method of quantitative review in biology. In this article, we extensively discuss two dimensionless (and thus standardised) classes of effect size statistics: d statistics (standardised mean difference) and r statistics (correlation coefficient), because these can be calculated from almost all study designs and also because their calculations are essential for meta-analysis. However, our focus on these standardised effect size statistics does not mean unstandardised effect size statistics (e.g. mean difference and regression coefficient) are less important. We provide potential solutions for four main technical problems researchers may encounter when calculating effect size and CIs: (1) when covariates exist, (2) when bias in estimating effect size is possible, (3) when data have non-normal error structure and/or variances, and (4) when data are non-independent. Although interpretations of effect sizes are often difficult, we provide some pointers to help researchers. This paper serves both as a beginner's instruction manual and a stimulus for changing statistical practice for the better in the biological sciences.
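A worked example of a d statistic with its CI, using the common large-sample standard-error approximation for the standardised mean difference; the input summaries are invented:

```python
import math

def cohens_d_ci(m1, m2, s1, s2, n1, n2, z=1.96):
    """Cohen's d from two group summaries, with an approximate 95% CI.
    Uses the pooled standard deviation and the large-sample standard
    error of d commonly used in meta-analysis."""
    s_pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                         / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

d, (lo, hi) = cohens_d_ci(10.0, 8.0, 3.0, 3.0, 20, 20)
# d ≈ 0.67 with a 95% CI of roughly (0.03, 1.30): a "medium-to-large"
# effect whose wide interval shows how imprecise n = 20 per group is.
```

This illustrates the article's point: the p-value alone (here just under 0.05) hides both the magnitude and the considerable uncertainty that the CI makes explicit.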

7.
Ploidy pattern analysis: statistical considerations
Availability of large data sets of ploidy measurements makes it possible to study ploidy patterns for the diagnostic and prognostic clues they can provide. Appropriate statistical analyses can improve the accuracy and precision of these studies. Such statistical analyses include considerations of sample size requirements for the detection of different types of deviations from normal, analyses of sources of variability in ploidy patterns and assessment of the probabilities of both types of possible errors in patient classification. The advantages of statistical assessment in the classification of ploidy patterns associated with diagnostic categories are discussed in the context of these considerations.

8.
The majority of the statistical literature for the joint modeling of longitudinal and time-to-event data has focused on the development of models that aim at capturing specific aspects of the motivating case studies. However, little attention has been given to the development of diagnostic and model-assessment tools. The main difficulty in using standard model diagnostics in joint models is the nonrandom dropout in the longitudinal outcome caused by the occurrence of events. In particular, the reference distribution of statistics, such as the residuals, in missing data settings is not directly available and complex calculations are required to derive it. In this article, we propose a multiple-imputation-based approach for creating multiple versions of the completed data set under the assumed joint model. Residuals and diagnostic plots for the complete data model can then be calculated based on these imputed data sets. Our proposals are exemplified using two real data sets.

9.
10.
Test statistics for detecting aneuploidy and hyperdiploidy
Possible approaches to the analytical evaluation of ploidy patterns are discussed and two specific problems are considered: detection of early onset of aneuploidy and detection of moderate hyperdiploidy. A statistical model for a euploid DNA pattern is formulated in terms of a mixture distribution. A test statistic for detecting deviations from this pattern is defined, and its performance is evaluated for simulated data representing differing degrees of severity of aneuploidy. An analysis based on a discriminant function using order statistics of the sample cumulative distribution functions is proposed for detecting hyperdiploidy. This procedure has the advantage of being relatively distribution-free; its performance is evaluated for simulated data and is compared with that of its classical counterparts. Although the results reported are only preliminary, they indicate that tailor-made statistical analyses can provide early detection of aneuploidy and hyperdiploidy with known and acceptable error rates using clinically reasonable sample sizes.

11.
Non-random distributions of missing data are a general problem for likelihood-based statistical analyses, including those in a phylogenetic context. Extensive non-randomly distributed missing data are particularly problematic in supermatrix analyses that include many terminals and/or loci. It has been widely reported that missing data can lead to loss of resolution, but only very rarely create misleading or otherwise unsupported results in a parsimony context. Yet this does not hold for all parametric-based analyses because of their assumption of homogeneity across characters and lineages, which can lead to both long-branch attraction and long-branch repulsion. Contrived examples were used to demonstrate that non-random distributions of missing data, even without rate heterogeneity among characters and a well fitting model, can provide misleading likelihood-based topologies and branch-support values that are radically unstable based on slight modifications to character sampling. The same can occur despite complete absence of parsimony-informative characters. Otherwise unsupported resolution and high branch support for these clades were found to occur frequently in 22 empirical examples derived from a published supermatrix. Partitioning characters based on the distribution of missing data helped to decrease, but did not eliminate, these artifacts. These artifacts were exacerbated by low quality tree searches, particularly when holding only a single optimal tree that must be fully resolved.

12.
Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting an NMA.
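The packages above fit entire treatment networks, but the simplest building block of an NMA, an adjusted indirect comparison via a common comparator (the Bucher method), can be sketched in a few lines. Shown in Python rather than R purely for illustration; the numbers are invented log odds ratios:

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Bucher adjusted indirect comparison of treatments B and C via a
    common comparator A. d_ab and d_ac are effects of B and C relative
    to A on the same scale (e.g. log odds ratios); the indirect B-vs-C
    effect is their difference, with variances adding."""
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    z = d_bc / se_bc
    return d_bc, se_bc, z

# Trial 1: B vs A gives log OR -0.5 (SE 0.2); Trial 2: C vs A gives -0.8 (SE 0.25).
d_bc, se_bc, z = indirect_comparison(-0.5, 0.2, -0.8, 0.25)
# Indirect C-vs-B estimate is -0.3, but with an SE of ~0.32 the evidence
# for a difference is weak, illustrating how indirect evidence loses precision.
```

Full NMA software generalizes this idea to arbitrary networks, pools direct and indirect evidence, and checks their consistency, which is why the dedicated packages are preferred in practice.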

13.
Speakman JR. Aging Cell 2005;4(4):167-175.
Comparative differences between species provide a powerful source of information that may inform our understanding of the aging process. However, two problems regularly attend such analyses. The co-variation of traits with body mass is frequently ignored, along with the lack of independence of the data due to a shared phylogenetic history. These problems undermine the use of simple correlations between various factors and maximum lifespan potential (MLSP) across different species as evidence that the factors in question have causal effects on aging. Both of these problems have been widely addressed by comparative biologists working in fields other than aging research, and statistical solutions to these issues are available. Applying these approaches, analysing residual traits with the effects of body mass removed and deriving phylogenetically independent contrasts, will allow analyses of the relationships between physiology and maximum lifespan potential to proceed unhindered by these difficulties, potentially leading to many useful insights into the aging process.
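The residual-trait approach described here can be sketched as a log-log regression of maximum lifespan on body mass, keeping the residuals as the mass-corrected trait. The data are toy values; a real analysis would also use phylogenetically independent contrasts:

```python
import math

def ols(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def residual_lifespan(mass_g, mlsp_yr):
    """Residual (log10) maximum lifespan after removing the allometric
    scaling of lifespan with body mass: regress log MLSP on log mass and
    keep what the regression cannot explain."""
    lx = [math.log10(m) for m in mass_g]
    ly = [math.log10(t) for t in mlsp_yr]
    a, b = ols(lx, ly)
    return [yi - (a + b * xi) for xi, yi in zip(lx, ly)]

mass = [10.0, 100.0, 1000.0, 10000.0]
lifespan = [3.0, 6.0, 12.0, 24.0]      # a perfect power law: residuals ~ 0
res = residual_lifespan(mass, lifespan)
```

A species that lives far longer than its mass predicts (a bat-like outlier) would show up as a large positive residual, and it is these residuals, not raw MLSP, that should be correlated with physiological variables.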

14.
More than 1700 trajectories of proteins representative of monomeric soluble structures in the protein data bank (PDB) have been obtained by means of state-of-the-art atomistic molecular dynamics simulations in near-physiological conditions. The trajectories and analyses are stored in a large data warehouse, which can be queried for dynamic information on proteins, including interactions. Here, we describe the project and the structure and contents of our database, and provide examples of how it can be used to describe the global flexibility properties of proteins. Basic analyses and trajectories stripped of solvent molecules at a reduced resolution level are available from our web server.

15.
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, notable progress has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure.
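One of the exploratory procedures reviewed in papers like this, principal component analysis, can be sketched via an SVD of the centered community matrix. The data are toy values (rows are samples, columns are taxa); real community analyses often apply a transformation such as Hellinger first, which is omitted here:

```python
import numpy as np

def pca(X, n_components=2):
    """Principal component analysis via SVD of the centered data matrix.
    Rows of X are samples (communities), columns are variables (taxa).
    Returns sample scores and the fraction of variance explained."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

# Four communities, three taxa: two samples dominated by taxon 1,
# two dominated by taxon 2, so the first axis should separate them.
X = np.array([[10., 0., 1.],
              [ 8., 1., 2.],
              [ 1., 9., 0.],
              [ 0., 10., 1.]])
scores, explained = pca(X)
```

Ordination plots of the first two score columns are the standard visual summary; the explained-variance fractions tell the reader how much community structure those two axes actually capture.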

16.
Zoo and aquarium research presents many logistic challenges, including extremely small sample sizes and lack of independent data points, which lend themselves to the misuse of statistics. Pseudoreplication and pooling of data are two statistical problems common in research in the biological sciences. Although the prevalence of these and other statistical miscues has been documented in other fields, little attention has been paid to the practice of statistics in the field of zoo biology. A review of articles published in the journal Zoo Biology between 1999 and 2004 showed that approximately 40% of the 146 articles utilizing inferential statistics during that span contained some evidence of pseudoreplication or pooling of data. Nearly 75% of studies did not provide degrees of freedom for all statistics and approximately 20% did not report test statistic values. Although the level of pseudoreplication in this dataset is not outside the levels found in other branches of biology, it does indicate the challenges of dealing with appropriate data analysis in zoo and aquarium studies. The standardization of statistical techniques to deal with the methodological challenges of zoo and aquarium populations can help advance zoo research by guiding the production and analysis of applied studies. This study recommends techniques for dealing with these issues, including complete disclosure of data manipulation and reporting of statistical values, checking and control for institutional effects in statistical models, and avoidance of pseudoreplicated observations. Additionally, zoo biologists should seek out other models such as hierarchical or factorial models or randomization tests to supplement their repertoire of t-tests and ANOVA. These suggestions are intended to stimulate conversation and examination of the current use of statistics in zoo biology in an effort to develop more consistent requirements for publication.
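The recommendation to avoid pseudoreplication can be illustrated by collapsing repeated observations to one value per animal before any between-group test, so that each animal contributes a single independent data point. The helper name and data are illustrative:

```python
from statistics import mean

def per_subject_means(records):
    """Collapse repeated observations to one value per animal.
    `records` is a list of (animal_id, value) pairs; treating each raw
    observation as independent would be pseudoreplication."""
    by_animal = {}
    for animal, value in records:
        by_animal.setdefault(animal, []).append(value)
    return {a: mean(vs) for a, vs in by_animal.items()}

obs = [("a1", 3.0), ("a1", 5.0), ("a2", 4.0), ("a2", 6.0), ("a3", 10.0)]
print(per_subject_means(obs))
# → {'a1': 4.0, 'a2': 5.0, 'a3': 10.0}
```

Any subsequent t-test or ANOVA should then run on these three per-animal values (with n = 3), not on the five raw observations; hierarchical models are the more powerful alternative when per-animal variation itself is of interest.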

17.
Mathematical statistics deals with abstract notions, whereas medicine addresses complicated, many-sided problems. For this reason, medical statistics faces unresolved questions in the interpretation of several notions and in the classification of statistical indices. In the present article, definitions of variables and statistical indices are formulated and their characteristics are described, and an attempt is made to systematize and naturally classify the latter. Statistical indices are defined as characteristics of statistical totalities. To classify statistical indices, the most essential attributes are used: the character of the variable (external relations), the aim of study (internal content), the form of expression (calculation), and derived indices and characteristics (comparison and the results of analysis).

18.
Kaiser MS, Caragea PC. Biometrics 2009;65(3):857-865.
The application of Markov random field models to problems involving spatial data on lattice systems requires decisions regarding a number of important aspects of model structure. Existing exploratory techniques appropriate for spatial data do not provide direct guidance to an investigator about these decisions. We introduce an exploratory quantity that is directly tied to the structure of Markov random field models based on one-parameter exponential family conditional distributions. This exploratory diagnostic is shown to be a meaningful statistic that can inform decisions involved in modeling spatial structure with statistical dependence terms. In this article, we develop the diagnostic, illustrate its use in guiding modeling decisions with simulated examples, and reexamine a previously published application.

19.
Experimental design and statistical analysis of data for predator preferences towards different types of prey have been problematic for several reasons. In addition to fundamental issues concerning the definition of preference, traditional statistical issues such as the appropriateness of statistical distributions such as the Binomial distribution, pseudo-replication, and the appropriate conditioning of probabilities have hindered progress on this important topic in ecology. This paper discusses these issues in the context of the methodology proposed by Underwood and Clarke [Underwood, A.J., Clarke, K.R., 2005. Solving some statistical problems in analyses of experiments on choices of food and on associations with habitat. J. Exp. Mar. Biol. Ecol. 318, 227-237.] in order to provide further clarity concerning the assumptions of this approach and therefore its applicability. In light of the difficulty justifying the validity of these assumptions in practice, an alternative approach is presented which has simpler statistical assumptions.
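As a baseline for the binomial issues discussed here, the classical exact two-sided binomial test for a two-prey choice experiment can be written with nothing more than the binomial pmf. Note that this baseline ignores the conditioning and non-independence problems the paper is about (e.g. trials within one predator are rarely independent), which is precisely why more careful approaches are needed:

```python
import math

def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than the observed count k under Binomial(n, p)."""
    def pmf(i):
        return math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
    observed = pmf(k)
    # Small tolerance guards against floating-point ties.
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed + 1e-12)

p_value = binomial_test_two_sided(15, 20)
# A predator choosing prey A in 15 of 20 independent trials: p ≈ 0.041.
```

Treating, say, 20 choices by a single predator as 20 independent trials is exactly the pseudo-replication the abstract warns about; the valid unit of replication is the predator, not the trial.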

20.
It has been argued that the missing heritability in common diseases may be in part due to rare variants and gene-gene effects. Haplotype analyses provide more power for rare variants and joint analyses across genes can address multi-gene effects. Currently, methods are lacking to perform joint multi-locus association analyses across more than one gene/region. Here, we present a haplotype-mining gene-gene analysis method, which considers multi-locus data for two genes/regions simultaneously. This approach extends our single region haplotype-mining algorithm, hapConstructor, to two genes/regions. It allows construction of multi-locus SNP sets at both genes and tests joint gene-gene effects and interactions between single variants or haplotype combinations. A Monte Carlo framework is used to provide statistical significance assessment of the joint and interaction statistics, thus the method can also be used with related individuals. This tool provides a flexible data-mining approach to identifying gene-gene effects that otherwise is currently unavailable. AVAILABILITY: http://bioinformatics.med.utah.edu/Genie/hapConstructor.html.
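The Monte Carlo significance assessment mentioned here can be sketched generically as a permutation p-value for any association statistic. This is not hapConstructor's implementation; the statistic and data are illustrative:

```python
import random

def permutation_pvalue(stat_fn, values, labels, n_perm=10000, seed=1):
    """Monte Carlo significance: compare the observed statistic with its
    distribution under random relabelling of case/control status.
    The +1 correction keeps the estimate strictly positive."""
    rng = random.Random(seed)
    observed = stat_fn(values, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if stat_fn(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def mean_diff(values, labels):
    """Absolute difference in group means: a simple association statistic."""
    cases = [v for v, l in zip(values, labels) if l == 1]
    ctrls = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))

values = [5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
p = permutation_pvalue(mean_diff, values, labels, n_perm=2000, seed=7)
```

Because significance comes from the empirical permutation distribution rather than an asymptotic one, the same framework remains valid for complex, data-mined statistics, which is the property the abstract exploits.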
