Similar literature
 20 similar articles found (search time: 15 ms)
1.
Disease mapping of a single disease has been widely studied in the public health setting. Simultaneous modeling of related diseases can also be a valuable tool from both the epidemiological and the statistical point of view. In particular, when several measurements are recorded at each spatial location, we need to consider multivariate models in order to handle the dependence among the multivariate components as well as the spatial dependence between locations. It is then customary to use multivariate spatial models that assume the same distribution across the entire population. However, in many circumstances, assuming the same distribution for all areas is a very strong assumption. To overcome this issue, we propose a hierarchical multivariate mixture generalized linear model to simultaneously analyze spatial Normal and non‐Normal outcomes. As an application of our proposed approach, esophageal and lung cancer deaths in Minnesota are used to show that assuming different distributions for different counties of Minnesota outperforms assuming a single distribution for the whole population. Performance of the proposed approach is also evaluated through a simulation study.

2.
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which do not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.
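
To make the two criticized normalizations concrete, the following is a minimal Python sketch of simple proportions and of rarefying for a single sample. It is an illustration only and is not code from phyloseq, edgeR, or DESeq; the function names `to_proportions` and `rarefy`, the toy count vectors, and the chosen depth are all assumptions introduced here.

```python
import random


def to_proportions(counts):
    """Divide each taxon count by the sample's library size."""
    total = sum(counts)
    return [c / total for c in counts]


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's library size")
    # Expand the count vector into one taxon label per read, then subsample.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    kept = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in kept:
        rarefied[taxon] += 1
    return rarefied


# Hypothetical toy samples with the same composition but different depth.
rng = random.Random(0)
sample_a = [500, 300, 200]  # library size 1000
sample_b = [50, 30, 20]     # library size 100

props_a = to_proportions(sample_a)
props_b = to_proportions(sample_b)
rarefied_a = rarefy(sample_a, 100, rng)  # throws away 900 of the 1000 reads
```

In this toy case the two samples give the same proportions even though one rests on ten times as many reads, which is the information about unequal variance that proportions drop; rarefying equalizes depth only by discarding reads from the deeper sample.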

3.
Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to cluster functionally relevant biomolecules to a cellular pathway. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. It is not completely satisfactory since it introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small sample data, we provide a resampling-based statistical inference procedure, and for moderate to large sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate advantages of the new multivariate correlation estimator over Pearson bivariate correlation estimator using simulations and real-world data analysis examples. AVAILABILITY: The estimator and statistical inference procedures have been implemented in an R package 'CORREP' that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
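
The baseline this abstract argues against can be sketched in a few lines of Python: average the replicates within each condition, then compute the ordinary Pearson coefficient on the averages. This is only the averaging approach, not the likelihood-based estimator implemented in CORREP; the function names and the toy measurements below are assumptions made for illustration.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_on_replicate_means(x_reps, y_reps):
    """x_reps[i] holds the replicate measurements of one condition."""
    return pearson([mean(r) for r in x_reps], [mean(r) for r in y_reps])


# Hypothetical data: 3 conditions, 2 replicates each, for two biomolecules.
gene_x = [[1.0, 1.2], [2.0, 2.2], [3.1, 2.9]]
gene_y = [[2.1, 1.9], [4.0, 4.2], [6.0, 6.2]]
r = pearson_on_replicate_means(gene_x, gene_y)
```

Averaging first collapses each condition to a single number, so the within-condition spread of the replicates never enters the estimate.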

4.
Summary: In 2002, Ker-Chau Li introduced the liquid association measure to characterize three‐way interactions between genes, and developed a computationally efficient estimator that can be used to screen gene expression microarray data for such interactions. That study, and others published since then, have established the biological validity of the method, and clearly demonstrated it to be a useful tool for the analysis of genomic data sets. To build on this work, we have sought a parametric family of multivariate distributions with the flexibility to model the full range of trivariate dependencies encompassed by liquid association. Such a model could situate liquid association within a formal inferential theory. In this article, we describe such a family of distributions, a trivariate, conditional normal model having Gaussian univariate marginal distributions, and in fact including the trivariate Gaussian family as a special case. Perhaps the most interesting feature of the distribution is that the parameterization naturally parses the three‐way dependence structure into a number of distinct, interpretable components. One of these components is very closely aligned to liquid association, and is developed as a measure we call modified liquid association. We develop two methods for estimating this quantity, and propose statistical tests for the existence of this type of dependence. We evaluate these inferential methods in a set of simulations and illustrate their use in the analysis of publicly available experimental data.
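
For orientation, the computationally efficient estimator mentioned at the start of this abstract is commonly described as follows: standardize X and Y, replace Z by its normal scores, and average the triple product. The Python sketch below follows that description as we understand it and should be read as an assumption, not as the authors' code; it does not implement the modified liquid association proposed in the article, and it ignores ties in Z. All function names and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def normal_scores(values):
    """Replace each value by the standard normal quantile of rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Average of the triple product after the transformations above."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return mean([a * b * c for a, b, c in zip(xs, ys, zs)])


# Hypothetical data: x and y move in opposite directions when z is low
# and in the same direction when z is high.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [1, -1, 1, -1, 1, -1, 1, -1]
y = [-1, 1, -1, 1, 1, -1, 1, -1]
la = liquid_association(x, y, z)
```

A positive value indicates that the co-variation of X and Y increases with Z, and a negative value indicates the opposite.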

5.
This paper proposes dynamic treatment regimes (DTRs) as effective individualized treatment strategies for managing chronic periodontitis. The proposed DTRs are studied via SMARTp, a two-stage sequential multiple assignment randomized trial (SMART) design. For this design, we propose a statistical analysis plan and a novel cluster-level sample size calculation method that factors in typical features of periodontal responses such as non-Gaussianity, spatial clustering, and nonrandom missingness. Here, each patient is viewed as a cluster, and a tooth within a patient's mouth is viewed as an individual unit inside the cluster, with the tooth-level covariance structure described by a conditionally autoregressive structure. To accommodate possible skewness and tail behavior, the tooth-level clinical attachment level (CAL) response is assumed to be skew-t, with the nonrandomly missing structure captured via a shared parameter model corresponding to the missingness indicator. The proposed method considers mean comparison for the regimes with or without sharing an initial treatment, where the expected values and corresponding variances or covariance for the sample means of a pair of DTRs are derived by the inverse probability weighting and method of moments. Simulation studies are conducted to investigate the finite-sample performance of the proposed sample size formulas under a variety of outcome-generating scenarios. An R package SMARTp implementing our sample size formula is available at the Comprehensive R Archive Network for free download.

6.
On any spatial scale, the species composition of a taxonomic group often departs from a phylogenetically random subset drawn from the pool of species available on a higher scale. Analysis of the uneven representation of related lineages in different assemblages can reveal the action of various forces shaping their diversification. For any assemblage, unequal diversification among lineages can be estimated using diversity skewness, an index of the balance of a phylogenetic tree whose values increase with increasing differences in diversification rates among tree branches. We tested for geographical patterns in the diversity skewness of flea assemblages parasitic on small mammals in 26 distinct geographic localities from the Palaearctic and 15 from the Nearctic. Overall, diversity skewness of the Nearctic flea assemblage was unexpectedly high compared to that of the global flea fauna, whereas that of the Palaearctic did not depart from the expectations of a null model. On a smaller scale, the diversity skewness of local flea assemblages was sometimes lower, sometimes higher, but, in most of the 41 localities, it did not differ significantly from that of random subsets taken from the species pool available on the larger spatial scale (either the world fauna or that of the biogeographical realm, i.e. Palaearctic or Nearctic). More importantly, among Palaearctic assemblages, diversity skewness increased with increasing latitude and/or decreasing mean air temperatures. The different patterns observed in the Palaearctic and Nearctic may be in part due to the fact that flea diversification appears to have been more intense in the former than the latter, and to differences between them in relief and glacial history. Temperature‐driven speciation rates may well explain the latitudinal gradient in diversity skewness in the Palaearctic. The results illustrate the action of various biogeographical processes in shaping the uneven differentiation of flea lineages on different spatial scales. © 2008 The Linnean Society of London, Biological Journal of the Linnean Society, 2008, 95, 807–814.

7.
To provide inferential support for the MI measure of sexual dimorphism that we proposed for populations distributed as mixture models with two normal components, an interval estimate is constructed. There do not appear to be measures of sexual dimorphism that possess inferential properties, other than some statistics used for this purpose. The use of these sample functions in such a context, as well as the purported inferential support of some other sexual dimorphism indices, is discussed. A biological case study illustrates the distinct inferential conclusions that can be obtained when the indices discussed here and the one we proposed are considered.

8.
Mixture modeling is a popular approach to accommodate overdispersion, skewness, and multimodality features that are very common for health care utilization data. However, mixture modeling tends to rely on subjective judgment regarding the appropriate number of mixture components or some hypothesis about how to cluster the data. In this work, we adopt a nonparametric, variational Bayesian approach to allow the model to select the number of components while estimating their parameters. Our model allows for a probabilistic classification of observations into clusters and simultaneous estimation of a Gaussian regression model within each cluster. When we apply this approach to data on patients with interstitial lung disease, we find distinct subgroups of patients with differences in means and variances of health care costs, health and treatment covariates, and relationships between covariates and costs. The subgroups identified are readily interpretable, suggesting that this nonparametric variational approach to inference can discover valid insights into the factors driving treatment costs. Moreover, the learning algorithm we employed is very fast and scalable, which should make the technique accessible for a broad range of applications.

9.
Path analysis is one of several methods available for quantitative genetic analysis, providing for both tests of hypotheses and estimates of relevant parameters. Central to the theory is the assumption that the observations follow a multivariate normal distribution within families. The purpose of the present investigation is to assess the effects of a certain type of departure from multivariate normality using quantitative family data on lipid and lipoprotein levels. The results show that even large departures produce reasonably unbiased parameter estimates. Whereas moderate departures lead to few inferential errors in hypothesis testing, gross departures from multivariate normality may have considerable effects on likelihood ratio tests.

10.
MALDI mass spectrometry can generate profiles that contain hundreds of biomolecular ions directly from tissue. Spatially-correlated analysis, MALDI imaging MS, can simultaneously reveal how each of these biomolecular ions varies in clinical tissue samples. The use of statistical data analysis tools to identify regions containing correlated mass spectrometry profiles is referred to as imaging MS-based molecular histology because of its ability to annotate tissues solely on the basis of the imaging MS data. Several reports have indicated that imaging MS-based molecular histology may be able to complement established histological and histochemical techniques by distinguishing between pathologies with overlapping/identical morphologies and revealing biomolecular intratumor heterogeneity. A data analysis pipeline that identifies regions of imaging MS datasets with correlated mass spectrometry profiles could lead to the development of novel methods for improved diagnosis (differentiating subgroups within distinct histological groups) and annotating the spatio-chemical makeup of tumors. Here it is demonstrated that highlighting the regions within imaging MS datasets whose mass spectrometry profiles were found to be correlated by five independent multivariate methods provides a consistently accurate summary of the spatio-chemical heterogeneity. The corroboration provided by using multiple multivariate methods, efficiently applied in an automated routine, provides assurance that the identified regions are indeed characterized by distinct mass spectrometry profiles, a crucial requirement for its development as a complementary histological tool. 
When simultaneously applied to imaging MS datasets from multiple patient samples of intermediate-grade myxofibrosarcoma, a heterogeneous soft tissue sarcoma, nodules with mass spectrometry profiles found to be distinct by five different multivariate methods were detected within morphologically identical regions of all patient tissue samples. To aid the further development of imaging MS-based molecular histology as a complementary histological tool, the Matlab code of the agreement analysis, instructions, and a reduced dataset are included as supporting information.

11.
12.
Spatial pattern and ecological analysis   Total citations: 65 (self-citations: 0; citations by others: 65)

13.
phylin is a package for the R programming environment which offers different methods to spatially interpolate genetic information from phylogeographic data. These interpolations can be used to predict the spatial occurrence of different lineages within a phylogeny using a modified method of kriging, which allows the usage of a genetic distance matrix to derive a model of spatial dependence. phylin improves the available methods to generate interpolated surfaces from phylogenetic trees by assessing the autocorrelation structure of the genetic information, interpolating the genetic data based on a statistical model, estimating the uncertainty of the predictions, and identifying lineage occurrence and contact zone probability without projection of pairwise genetic distances into mid‐points between sample locations. The package also includes methods to plot interpolation surfaces and provide summary tables from the generated data and models. We provide an example of the usefulness of this tool by inferring the spatial occurrence of distinct historical evolutionary lineages of the Lataste's viper (Vipera latastei Boscá, 1878) in the Iberian Peninsula and identifying potential contact areas. The maps of phylogenetic patterns obtained with these methods provide a spatial context to test hypotheses related to processes underlying the geographic distribution of genetic diversity and to inform conservation planning.

14.
Characterizing genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata are not always easily integrated into these methods in a user‐friendly fashion. Here, we present a deployable Python‐based web‐tool, mvmapper, for visualizing and exploring results of multivariate analyses in geographic space. This tool can be used to map results of virtually any multivariate analysis of georeferenced data, and routines for exporting results from a number of standard methods have been integrated in the R package adegenet, including principal components analysis (PCA), spatial PCA, discriminant analysis of principal components, principal coordinates analysis, nonmetric dimensional scaling and correspondence analysis. mvmapper's greatest strength is facilitating dynamic and interactive exploration of the statistical and geographic frameworks side by side, a task that is difficult and time‐consuming with currently available tools. Source code and deployment instructions, as well as a link to a hosted instance of mvmapper, can be found at https://popphylotools.github.io/mvMapper/.

15.
For many diseases, it is difficult or impossible to establish a definitive diagnosis because a perfect "gold standard" may not exist or may be too costly to obtain. In this paper, we propose a method to use continuous test results to estimate prevalence of disease in a given population and to estimate the effects of factors that may influence prevalence. Motivated by a study of human herpesvirus 8 among children with sickle-cell anemia in Uganda, where 2 enzyme immunoassays were used to assess infection status, we fit 2-component multivariate mixture models. We model the component densities using parametric densities that include data transformation as well as flexible transformed models. In addition, we model the mixing proportion, the probability of a latent variable corresponding to the true unknown infection status, via a logistic regression to incorporate covariates. This model includes mixtures of multivariate normal densities as a special case and is able to accommodate unusual shapes and skewness in the data. We assess model performance in simulations and present results from applying various parameterizations of the model to the Ugandan study.
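
The role of the covariate-dependent mixing proportion can be illustrated with a deliberately simplified Python sketch: a single continuous test result, two normal component densities, and a logistic model for the probability of infection. The article itself uses multivariate (two-assay) mixtures with transformed densities, so this univariate version, together with every function name, coefficient, and component parameter below, is an assumption made for illustration only.

```python
from math import exp
from statistics import NormalDist


def mixing_proportion(covariates, coefs, intercept):
    """Logistic model for the prior probability of being infected."""
    eta = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + exp(-eta))


def posterior_infected(y, covariates, coefs, intercept, neg, pos):
    """Posterior probability of infection given one test result y."""
    p = mixing_proportion(covariates, coefs, intercept)
    weighted_pos = p * pos.pdf(y)
    weighted_neg = (1.0 - p) * neg.pdf(y)
    return weighted_pos / (weighted_pos + weighted_neg)


# Hypothetical component densities for uninfected and infected subjects.
uninfected = NormalDist(mu=0.0, sigma=1.0)
infected = NormalDist(mu=3.0, sigma=1.0)

# Hypothetical covariate (age in years) with made-up coefficients.
prob_young = posterior_infected(1.5, [2.0], [0.2], -1.0, uninfected, infected)
prob_old = posterior_infected(1.5, [12.0], [0.2], -1.0, uninfected, infected)
```

With these made-up numbers the same test result of 1.5 yields a higher posterior probability of infection for the older child, because the logistic model assigns that child a larger prior mixing proportion.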

16.
Quantitative comparison of the protein content of biological samples is a fundamental tool of research. The TMT and iTRAQ isobaric labeling technologies allow the comparison of 2, 4, 6, or 8 samples in one mass spectrometric analysis. Sound statistical models that scale with the most advanced mass spectrometry (MS) instruments are essential for their efficient use. Through the application of robust statistical methods, we developed models that capture variability from individual spectra to biological samples. Classical experimental designs with a distinct sample in each channel as well as the use of replicates in multiple channels are integrated into a single statistical framework. We have prepared complex test samples including controlled ratios ranging from 100:1 to 1:100 to characterize the performance of our method. We demonstrate its application to actual biological data sets originating from three different laboratories and MS platforms. Finally, test data and an R package, named isobar, which can read Mascot, Phenyx, and mzIdentML files, are made available. The isobar package can also be used as independent software that requires very little or no R programming skills.

17.
Many traits of evolutionary interest, when placed in their developmental, physiological, or environmental contexts, are function-valued. For instance, gene expression during development is typically a function of the age of an organism and physiological processes are often a function of environment. In comparative and experimental studies, a fundamental question is whether the function-valued trait of one group is different from another. To address this question, evolutionary biologists have several statistical methods available. These methods can be classified into one of two types: multivariate and functional. Multivariate methods, including univariate repeated-measures analysis of variance (ANOVA), treat each trait as a finite list of data. Functional methods, such as repeated-measures regression, view the data as a sample of points drawn from an underlying function. A key difference between multivariate and functional methods is that functional methods retain information about the ordering and spacing of a set of data values, information that is discarded by multivariate methods. In this study, we evaluated the importance of that discarded information in statistical analyses of function-valued traits. Our results indicate that functional methods tend to have substantially greater statistical power than multivariate approaches to detect differences in a function-valued trait between groups.

18.
19.
DNA microarrays have been used in applications ranging from the assignment of gene function to analytical uses in prognostics. However, the detection sensitivity, cross hybridization, and reproducibility of these arrays can affect experimental design and data interpretation. Moreover, several technologies are available for fabrication of oligonucleotide microarrays. We review these technologies and performance attributes and, with data sets generated from human brain RNA, present statistical tools and methods to analyze data quality and to mine and visualize the data. Our data show high reproducibility and should allow an investigator to discern biological and regional variability from differential expression. Although we have used brain RNA as a model system to illustrate some of these points, the oligonucleotide arrays and methods employed in this study can be used with cell lines, tissue sections, blood, and other fluids. To further demonstrate this point, we provide data generated from total RNA sample sizes of 200 ng.

20.
Zhu J, Eickhoff JC, Yan P. Biometrics, 2005, 61(3): 674-683
Observations of multiple-response variables across space and over time occur often in environmental and ecological studies. Compared to purely spatial models for a single response variable in the exponential family of distributions, fewer statistical tools are available for multiple-response variables that are not necessarily Gaussian. An exception is a common-factor model developed for multivariate spatial data by Wang and Wall (2003, Biostatistics 4, 569-582). The purpose of this article is to extend this multivariate space-only model and develop a flexible class of generalized linear latent variable models for multivariate spatial-temporal data. For statistical inference, maximum likelihood estimates and their standard deviations are obtained using a Monte Carlo EM algorithm. We also use a novel way to automatically adjust the Monte Carlo sample size, which facilitates the convergence of the Monte Carlo EM algorithm. The methodology is illustrated by an ecological study of red pine trees in response to bark beetle challenges in a forest stand of Wisconsin.
