Similar Literature
20 similar records retrieved (search time: 31 ms)
1.
Finite mixtures of Gaussian distributions provide a flexible semiparametric methodology for density estimation when the continuous variables under investigation have no boundaries. However, in practical applications, variables may be partially bounded (e.g., taking nonnegative values) or completely bounded (e.g., taking values in the unit interval). In this case, the standard Gaussian finite mixture model assigns nonzero densities to any possible value, even to those outside the ranges where the variables are defined, hence resulting in potentially severe bias. In this paper, we propose a transformation-based approach for Gaussian mixture modeling in the case of bounded variables. The basic idea is to carry out density estimation not on the original data but on appropriately transformed data. Then, the density for the original data can be obtained by a change of variables. Both the transformation parameters and the parameters of the Gaussian mixture are jointly estimated by the expectation-maximization (EM) algorithm. The methodology for partially and completely bounded data is illustrated using both simulated data and real data applications.
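A minimal sketch of the transformation idea (not the authors' implementation): for data on the unit interval, fit a Gaussian mixture on the logit scale and recover the original-scale density with the change-of-variables Jacobian. Here the transformation is fixed in advance rather than jointly estimated by EM as in the paper, and the beta-distributed sample is purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bounded_gmm_density(x, data, n_components=2, seed=0):
    """Fit a Gaussian mixture on the logit scale, then map the density back
    to (0, 1) with the change-of-variables Jacobian |d logit(x)/dx|."""
    y = np.log(data / (1 - data))                       # logit transform of the data
    gmm = GaussianMixture(n_components=n_components,
                          random_state=seed).fit(y.reshape(-1, 1))
    log_f_y = gmm.score_samples(np.log(x / (1 - x)).reshape(-1, 1))
    return np.exp(log_f_y) / (x * (1 - x))              # f_X(x) = f_Y(logit x) / (x(1-x))

rng = np.random.default_rng(1)
data = rng.beta(2, 5, size=500)                         # sample bounded to (0, 1)
grid = np.linspace(0.01, 0.99, 99)
dens = bounded_gmm_density(grid, data)
mass = np.sum(dens) * (grid[1] - grid[0])               # should be close to 1
```

By construction the fitted density is zero outside (0, 1), which is exactly the boundary-bias problem the paper addresses.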

2.
In this paper, we apply flexible Bayesian survival analysis methods to investigate the risk of lymphoma associated with kidney transplantation among patients with end-stage renal disease. Of key interest is the potentially time-varying effect of a time-dependent exposure: transplant status. Bayesian modeling of the baseline hazard and the effect of transplant requires consideration of two timescales: time since study start and time since transplantation, respectively. Previous related work has not dealt with the separation of multiple timescales. Using a hierarchical model for the hazard function, both timescales are incorporated via conditionally independent stochastic processes; smoothing of each process is specified via intrinsic conditional Gaussian autoregressions. Features of the corresponding posterior distribution are evaluated from draws obtained via a Metropolis-Hastings-Green algorithm.

3.
Repeatability (more precisely the common measure of repeatability, the intra‐class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between‐subject (or between‐group) variation. As a consequence, the non‐repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non‐Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non‐Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation‐based, analysis of variance (ANOVA)‐based and linear mixed‐effects model (LMM)‐based methods, while for non‐Gaussian data, we focus on generalised linear mixed‐effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM‐ and GLMM‐based approaches mainly because of the ease with which confounding variables can be controlled for. Furthermore, we compare two types of repeatability (ordinary repeatability and extrapolated repeatability) in relation to narrow‐sense heritability. 
This review serves as a collection of guidelines and recommendations for biologists to calculate repeatability and heritability from both Gaussian and non-Gaussian data.
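The ANOVA-based estimator reviewed above can be sketched in a few lines. This is the textbook one-way, balanced-design ICC, not the authors' code, and the simulated individual/residual standard deviations are illustrative.

```python
import numpy as np

def icc_anova(values, groups):
    """ANOVA-based repeatability (ICC) for a balanced one-way design:
    ICC = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within),
    where k is the number of repeated measurements per individual."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    labels = np.unique(groups)
    k = len(values) // len(labels)                     # measurements per individual
    grand = values.mean()
    means = np.array([values[groups == g].mean() for g in labels])
    ss_between = k * np.sum((means - grand) ** 2)
    ss_within = sum(np.sum((values[groups == g] - m) ** 2)
                    for g, m in zip(labels, means))
    ms_between = ss_between / (len(labels) - 1)
    ms_within = ss_within / (len(labels) * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# 50 individuals, 4 repeats; between-SD = within-SD = 1, so true ICC = 0.5
rng = np.random.default_rng(42)
ind_effects = rng.normal(0, 1, 50)
groups = np.repeat(np.arange(50), 4)
values = ind_effects[groups] + rng.normal(0, 1, 200)
r = icc_anova(values, groups)    # should land near the true value of 0.5
```

For confounder control, the review recommends the LMM/GLMM route instead; the ANOVA form is shown here only because it makes the variance-partitioning definition explicit.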

4.
One size (SIZE) and four shape measures (SHAPE 1-SHAPE 4) were derived from a multiple group principal components analysis of 15 osteometric variables in inbred and hybrid house mice. In both sexes, SIZE and two of the four SHAPE variables showed positive heterosis, the other two SHAPE variables exhibiting negative heterosis. SIZE showed a greater magnitude of heterosis (average of about 2.3 standard deviations) than all SHAPE characters except SHAPE 2, a skull length/width contrast. Inbreds were more variable than hybrids (positive homeostasis) for all characters, and there was a significant, positive correlation between heterosis and homeostasis in these characters. The reciprocals category in hybrids was more important for SIZE than for the SHAPE variables, presumably because maternal effects have a greater influence on growth characters. Broad-sense heritabilities for SIZE were 0.8 in inbreds and 0.6 in hybrids whereas they averaged only 0.4 for the SHAPE variables. It was postulated that there is a greater number of loci governing SIZE compared to SHAPE, and that this explains both the heritability and heterosis differences between these characters.

5.
On the use of the variogram in checking for independence in spatial data (Times cited: 1; self-citations: 0; citations by others: 1)
Diblasi A, Bowman AW. Biometrics 2001, 57(1):211-218
The variogram is a standard tool in the analysis of spatial data, and its shape provides useful information on the form of spatial correlation that may be present. However, it is also useful to be able to assess the evidence for the presence of any spatial correlation. A method of doing this, based on an assessment of whether the true function underlying the variogram is constant, is proposed. Nonparametric smoothing of the squared differences of the observed variables, on a suitably transformed scale, is used to estimate variogram shape. A statistic based on a ratio of quadratic forms is proposed and the test is constructed by investigating the distributional properties of this statistic under the assumption of an independent Gaussian process. The power of the test is investigated. Reference bands are proposed as a graphical follow-up. An example is discussed.
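As background to the test, a sketch of the classical empirical semivariogram built from the squared differences the abstract mentions; for independent data it should be roughly flat at the process variance. The formal ratio-of-quadratic-forms test itself is not reproduced here, and the lag bins are an arbitrary choice.

```python
import numpy as np

def empirical_variogram(coords, z, bins):
    """Classical empirical semivariogram: for each lag bin, the mean of
    0.5 * (z_i - z_j)^2 over pairs whose separation falls in that bin."""
    i, j = np.triu_indices(len(z), k=1)                  # all distinct pairs
    d = np.sqrt(np.sum((coords[i] - coords[j]) ** 2, axis=1))
    sq = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(d, bins)
    return np.array([sq[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bins))])

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
z = rng.normal(0, 1, 200)          # independent N(0,1): the variogram should be flat near 1
gamma = empirical_variogram(coords, z, bins=np.linspace(0, 5, 6))
```

A flat estimated variogram is exactly the "constant underlying function" null hypothesis the paper formalises.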

6.
Using probabilistic analysis may be very useful for risk management in developing countries, where information, resources, and technical expertise are often scarce. Currently, most regulatory agencies recommend using deterministic approaches for the analysis of problems relating to decision-making. However, this approach does not incorporate uncertainty in the variables, nor the propagation of uncertainty through the different processes in which they are involved. The complexity of the problem is therefore arbitrarily reduced, and valuable information that could be useful for proposing realistic policies is not considered. This article compares the results of a deterministic analysis with those of a probabilistic one for regulating arsenic in Chile, and differences are established for public policy as a result of building uncertainty into the analysis. It is concluded that the use of a deterministic approach can lead to higher risks than necessary and that probabilistic results can help the regulator negotiate stricter standards. Alternatively, the regulator may end up imposing much higher costs to sources than originally expected as these will be forced to use expensive technology to comply consistently with a given standard.
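An illustrative Monte Carlo sketch of the point being made, with invented input distributions for an arsenic-exposure dose calculation (the concentration, intake, and body-weight values are hypothetical, not the study's data). It shows how a single deterministic point estimate understates the upper tail once input uncertainty is propagated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical input distributions (illustrative values only)
conc = rng.lognormal(mean=np.log(0.05), sigma=0.5, size=n)   # mg/L in drinking water
intake = rng.normal(2.0, 0.4, size=n).clip(0.5)              # L/day
weight = rng.normal(70.0, 12.0, size=n).clip(30)             # kg body weight

dose = conc * intake / weight                                 # mg/kg/day, full distribution

point = 0.05 * 2.0 / 70.0          # deterministic estimate from the central values
p95 = np.quantile(dose, 0.95)      # upper tail of the propagated distribution
print(point < p95)                 # the point estimate understates high-end exposure
```

The gap between `point` and `p95` is the information a deterministic analysis discards, and it is what allows a regulator to reason about how often a standard would actually be exceeded.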

7.
BACKGROUND AND AIMS: Biomass is an important trait in functional ecology and growth analysis. The typical methods for measuring biomass are destructive. Thus, they do not allow the development of individual plants to be followed and they require many individuals to be cultivated for repeated measurements. Non-destructive methods do not have these limitations. Here, a non-destructive method based on digital image analysis is presented, addressing not only above-ground fresh biomass (FBM) and oven-dried biomass (DBM), but also vertical biomass distribution as well as dry matter content (DMC) and growth rates. METHODS: Scaled digital images of the plant silhouettes were taken for 582 individuals of 27 grass species (Poaceae). Above-ground biomass and DMC were measured using destructive methods. With the image analysis software Zeiss KS 300, the projected area and the proportion of greenish pixels were calculated, and generalized linear models (GLMs) were developed with destructively measured parameters as dependent variables and parameters derived from image analysis as independent variables. A bootstrap analysis was performed to assess the number of individuals required for re-calibration of the models. KEY RESULTS: The results of the developed models showed no systematic errors compared with traditionally measured values and explained most of their variance (R² ≥ 0.85 for all models). The presented models can be directly applied to herbaceous grasses without further calibration. Applying the models to other growth forms might require a re-calibration, which can be based on only 10-20 individuals for FBM or DMC and on 40-50 individuals for DBM. CONCLUSIONS: The methods presented are time and cost effective compared with traditional methods, especially if development or growth rates are to be measured repeatedly.
Hence, they offer an alternative way of determining biomass, especially as they are non-destructive and address not only FBM and DBM, but also vertical biomass distribution and DMC.

8.
Fascioliasis is an important human and animal disease caused by Fasciola hepatica and Fasciola gigantica. In Iran, the distribution of these two species overlaps in most areas, including the northern human endemic province of Gilan where both fasciolids are simultaneously found in individual cattle and buffaloes. A phenotypic study of fasciolid adult flukes from naturally infected bovines from Gilan was carried out by means of an exhaustive morphometric analysis using traditional microscopic measurements and an allometric model. The Iranian fasciolids were compared to F. hepatica and F. gigantica standard populations, i.e. from geographical areas where both species do not co-exist (Bolivia and Burkina Faso, respectively). Although morphometric values somewhat overlapped, there were clear differences in allometric growth. The allometric function was adjusted to 25 pairs of variables. Results obtained revealed that Iranian F. hepatica-like specimens are larger than the F. hepatica standard and Iranian F. gigantica-like specimens are longer and narrower than the F. gigantica standard, but with smaller body area. Measurements which permit a specific differentiation in allopatric populations (distance between ventral sucker and posterior end of the body; ratio between body length and body width) overlap in the specimens from Gilan, thus proving the presence of intermediate forms. When compared to the standard populations, the different Iranian fasciolid morphs show greater differences in F. gigantica-like specimens than in F. hepatica-like specimens. This study shows that simple, traditional microscopic measurements may be sufficient for the morphometric characterisation of fasciolids, even in areas where intermediate forms are present.

9.
Multivariate linear models are increasingly important in quantitative genetics. In high dimensional specifications, factor analysis (FA) may provide an avenue for structuring (co)variance matrices, thus reducing the number of parameters needed for describing (co)dispersion. We describe how FA can be used to model genetic effects in the context of a multivariate linear mixed model. An orthogonal common factor structure is used to model genetic effects under a Gaussian assumption, so that the marginal likelihood is multivariate normal with a structured genetic (co)variance matrix. Under standard prior assumptions, all fully conditional distributions have closed form, and samples from the joint posterior distribution can be obtained via Gibbs sampling. The model and the algorithm developed for its Bayesian implementation were used to describe five repeated records of milk yield in dairy cattle, and a one-common-factor FA model was compared with a standard multiple trait model. The Bayesian Information Criterion favored the FA model.

10.
Aquatic Oligochaetes in ditches (Times cited: 4; self-citations: 4; citations by others: 0)

11.
Although many of the statistical techniques used in comparative biology were originally developed in quantitative genetics, subsequent development of comparative techniques has progressed in relative isolation. Consequently, many of the new and planned developments in comparative analysis already have well-tested solutions in quantitative genetics. In this paper, we take three recent publications that develop phylogenetic meta-analysis, either implicitly or explicitly, and show how they can be considered as quantitative genetic models. We highlight some of the difficulties with the proposed solutions, and demonstrate that standard quantitative genetic theory and software offer solutions. We also show how results from Bayesian quantitative genetics can be used to create efficient Markov chain Monte Carlo algorithms for phylogenetic mixed models, thereby extending their generality to non-Gaussian data. Of particular utility is the development of multinomial models for analysing the evolution of discrete traits, and the development of multi-trait models in which traits can follow different distributions. Meta-analyses often include a nonrandom collection of species for which the full phylogenetic tree has only been partly resolved. Using missing data theory, we show how the presented models can be used to correct for nonrandom sampling and show how taxonomies and phylogenies can be combined to give a flexible framework with which to model dependence.

12.
Species distributions can be analysed under two perspectives: the niche‐based approach, which focuses on species–environment relationships; and the dispersal‐based approach, which focuses on metapopulation dynamics. The degree to which each of these two components affect species distributions may depend on habitat fragmentation, species traits and phylogenetic constraints. We analysed the distributions of 36 stream insect species across 60 stream sites in three drainage basins at high latitudes in Finland. We used binomial generalised linear models (GLMs) in which the predictor variables were environmental factors (E models), within‐basin spatial variables as defined by Moran's eigenvector maps (M models), among‐basin variability (B models), or a combination of the three (E + M + B models) sets of variables. Based on a comparative analysis, model performance was evaluated across all the species using Gaussian GLMs whereby the deviance accounted for by binomial GLMs was fitted on selected explanatory variables: niche position, niche breadth, site occupancy, biological traits and taxonomic relatedness. For each type of model, a reduced Gaussian GLM was eventually obtained after variable selection (Bayesian information criterion). We found that niche position was the only variable selected in all reduced models, implying that marginal species were better predicted than non‐marginal species. The influence of niche position was strongest in models based on environmental variables (E models) or a combination of all types of variables (E + M + B models), and weakest in spatial autocorrelation models (M models). This suggests that species–environment relationships prevail over dispersal processes in determining stream insect distributions at a regional scale. 
Our findings have clear implications for biodiversity conservation strategies, and they also emphasise the benefits of considering both the niche-based and dispersal-based approaches in species distribution modelling studies.

13.
Geostatistical simulated realization maps can represent the spatial heterogeneity of the studied spatial variable more realistically than the kriged optimal map because they overcome the smoothing effect of interpolation. The difference among realizations indicates spatial uncertainty. These realizations may serve as input data to transfer functions to further evaluate the resulting uncertainty in impacted dependent variables. In this study, sequential Gaussian simulation was used to simulate the spatial distribution of soil nickel (Ni) in the top soils of a 31 km² area within the urban-rural transition zone of Wuhan, China. Simulated realizations were then imported into transfer functions to calculate the health risk costs caused by Ni-polluted areas ignored in remediation due to underestimation of the Ni contents, and the remediation risk costs caused by unnecessary remediation of unpolluted areas due to overestimation of the Ni contents. The uncertainty about the input Ni content values thus propagated through these transfer functions, leading to uncertain responses in health risk costs and remediation risk costs. The spatial uncertainty of the two forms of risk costs was assessed based on the response realizations. Because the risk of exposure of soil Ni to humans and animals is generally greater in contaminated arable lands than in industrial and residential areas, the effect of land use types was also taken into account in risk cost estimation. Results showed that high health costs mainly appear in the southwest part of the study area, while high remediation costs mainly occur in the east, middle and northwest of the study area, and that most of the south part of the study area was delineated as contaminated according to the minimum expected cost standard. This study shows that sequential Gaussian simulation and transfer functions are valuable tools for assessing risk costs of soil contamination delineation and associated spatial uncertainty.
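A toy sketch of propagating simulated realizations through cost transfer functions. The "realizations" below are independent lognormal stand-ins, not true sequential Gaussian simulation output (which would honor a variogram and conditioning data), and the threshold and cost definitions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
threshold = 50.0                            # hypothetical Ni limit, mg/kg
n_realizations, n_cells = 100, 400

# Stand-in for SGS output: one row per realization, one column per map cell
realizations = np.exp(rng.normal(np.log(40.0), 0.4, size=(n_realizations, n_cells)))

kriged = realizations.mean(axis=0)          # smoothed "optimal" estimate per cell
remediate = kriged > threshold              # decision taken from the smooth map

# Per realization: cells left untreated that are actually polluted (health cost)
# and cells treated that are actually clean (remediation cost), counted per map
health_cost = np.array([np.sum((r > threshold) & ~remediate) for r in realizations])
remed_cost = np.array([np.sum((r <= threshold) & remediate) for r in realizations])
```

The spread of `health_cost` and `remed_cost` across realizations is the propagated uncertainty the abstract describes; a smoothed map alone would yield a single number with no uncertainty statement.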

14.
In many metabolomics studies, NMR spectra are divided into bins of fixed width. This spectral quantification technique, known as uniform binning, is used to reduce the number of variables for pattern recognition techniques and to mitigate effects from variations in peak positions; however, shifts in peaks near the boundaries can cause dramatic quantitative changes in adjacent bins due to non-overlapping boundaries. Here we describe a new Gaussian binning method that incorporates overlapping bins to minimize these effects. A Gaussian kernel weights the signal contribution relative to distance from bin center, and the overlap between bins is controlled by the kernel standard deviation. Sensitivity to peak shift was assessed for a series of test spectra where the offset frequency was incremented in 0.5 Hz steps. For a 4 Hz shift within a bin width of 24 Hz, the error for uniform binning increased by 150%, while the error for Gaussian binning increased by 50%. Further, using a urinary metabolomics data set (from a toxicity study) and principal component analysis (PCA), we showed that the information content in the quantified features was equivalent for Gaussian and uniform binning methods. The separation between groups in the PCA scores plot, measured by the J₂ quality metric, is as good or better for Gaussian binning versus uniform binning. The Gaussian method is shown to be robust with regard to peak shift, while still retaining the information needed by classification and multivariate statistical techniques for NMR-metabolomics data.
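The contrast between uniform and Gaussian binning can be sketched directly. The kernel SD expressed as a fraction of the bin width is an assumed parameterisation, and the synthetic peak width and shift are illustrative, not the paper's 24 Hz / 4 Hz test spectra.

```python
import numpy as np

def uniform_bins(ppm, signal, centers, width):
    """Hard, non-overlapping bins: sum the signal within +/- width/2 of each center."""
    return np.array([signal[np.abs(ppm - c) <= width / 2].sum() for c in centers])

def gaussian_bins(ppm, signal, centers, width, sd_frac=0.25):
    """Overlapping bins: weight the signal by a Gaussian kernel centred on each
    bin; the kernel SD (a fraction of the bin width) controls the overlap."""
    sd = sd_frac * width
    return np.array([np.sum(signal * np.exp(-0.5 * ((ppm - c) / sd) ** 2))
                     for c in centers])

# A narrow synthetic peak sitting exactly on a bin boundary, then shifted slightly
ppm = np.linspace(0, 10, 2001)
centers = np.arange(0.5, 10, 1.0)          # bin width 1.0, boundaries at integers

def peak(pos):
    return np.exp(-0.5 * ((ppm - pos) / 0.05) ** 2)

u0 = uniform_bins(ppm, peak(3.0), centers, 1.0)
u1 = uniform_bins(ppm, peak(3.06), centers, 1.0)
g0 = gaussian_bins(ppm, peak(3.0), centers, 1.0)
g1 = gaussian_bins(ppm, peak(3.06), centers, 1.0)

# Relative change in the quantified features caused by the small peak shift
u_err = np.abs(u1 - u0).sum() / u0.sum()
g_err = np.abs(g1 - g0).sum() / g0.sum()
print(g_err < u_err)    # Gaussian binning is less sensitive to the boundary shift
```

Because the Gaussian kernel tapers smoothly across the boundary, a peak drifting over it reallocates intensity gradually instead of jumping wholesale between adjacent bins.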

15.
Forecasting population decline to a certain critical threshold (the quasi-extinction risk) is one of the central objectives of population viability analysis (PVA), and such predictions figure prominently in the decisions of major conservation organizations. In this paper, we argue that accurate forecasting of a population's quasi-extinction risk does not necessarily require knowledge of the underlying biological mechanisms. Because of the stochastic and multiplicative nature of population growth, the ensemble behaviour of population trajectories converges to common statistical forms across a wide variety of stochastic population processes. This paper provides a theoretical basis for this argument. We show that the quasi-extinction surfaces of a variety of complex stochastic population processes (including age-structured, density-dependent and spatially structured populations) can be modelled by a simple stochastic approximation: the stochastic exponential growth process overlaid with Gaussian errors. Using simulated and real data, we show that this model can be estimated with 20-30 years of data and can provide relatively unbiased quasi-extinction risk estimates with confidence intervals considerably smaller than (0,1). This was found to be true even for simulated data derived from some of the noisiest population processes (density-dependent feedback, species interactions and strong age-structure cycling). A key advantage of statistical models is that their parameters and the uncertainty of those parameters can be estimated from time series data using standard statistical methods. In contrast, for most species of conservation concern, biologically realistic models must often be specified rather than estimated because of the limited data available for all the various parameters.
Biologically realistic models will always have a prominent place in PVA for evaluating specific management options which affect a single segment of a population, a single demographic rate, or different geographic areas. However, for forecasting quasi-extinction risk, statistical models that are based on the convergent statistical properties of population processes offer many advantages over biologically realistic models.
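A sketch of the stochastic exponential growth approximation described above: estimate the drift and variance from log growth rates, then evaluate the standard Brownian-motion first-passage probability of reaching the quasi-extinction threshold within a horizon (the usual diffusion-approximation formula; simulated data, not the paper's, and the threshold and horizon are arbitrary).

```python
import math
import numpy as np

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quasi_extinction_prob(counts, threshold, horizon):
    """Estimate mu and sigma^2 from log population growth rates, then return
    the probability of the log-abundance diffusion hitting log(threshold)
    within `horizon` time steps (first passage of Brownian motion with drift)."""
    logs = np.log(np.asarray(counts, dtype=float))
    diffs = np.diff(logs)
    mu, s2 = diffs.mean(), diffs.var(ddof=1)
    d = logs[-1] - math.log(threshold)       # current log-distance to the threshold
    s = math.sqrt(s2 * horizon)
    return (phi((-d - mu * horizon) / s)
            + math.exp(-2.0 * mu * d / s2) * phi((-d + mu * horizon) / s))

# 30-year simulated time series of a slowly declining population
rng = np.random.default_rng(3)
n = [1000.0]
for _ in range(29):
    n.append(n[-1] * math.exp(rng.normal(-0.02, 0.1)))   # mu = -0.02, sigma = 0.1
p = quasi_extinction_prob(n, threshold=100.0, horizon=50)
```

Only two parameters are estimated from the series, which is why 20-30 years of counts can suffice regardless of the biological mechanism generating them.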

16.
Zhu J, Eickhoff JC, Yan P. Biometrics 2005, 61(3):674-683
Observations of multiple-response variables across space and over time occur often in environmental and ecological studies. Compared to purely spatial models for a single response variable in the exponential family of distributions, fewer statistical tools are available for multiple-response variables that are not necessarily Gaussian. An exception is a common-factor model developed for multivariate spatial data by Wang and Wall (2003, Biostatistics 4, 569-582). The purpose of this article is to extend this multivariate space-only model and develop a flexible class of generalized linear latent variable models for multivariate spatial-temporal data. For statistical inference, maximum likelihood estimates and their standard deviations are obtained using a Monte Carlo EM algorithm. We also use a novel way to automatically adjust the Monte Carlo sample size, which facilitates the convergence of the Monte Carlo EM algorithm. The methodology is illustrated by an ecological study of red pine trees in response to bark beetle challenges in a forest stand of Wisconsin.

17.
Two numerical methods, Decision Analysis (DA) and Potential Problem Analysis (PPA), are presented as alternative selection methods to the logical method presented in Part I. In DA, properties are weighted and outcomes are scored. The weighted scores for each candidate are totaled and final selection is based on the totals. Higher scores indicate better candidates. In PPA, potential problems are assigned a seriousness factor and test outcomes are used to define the probability of occurrence. The seriousness-probability products are totaled and forms with minimal scores are preferred. DA and PPA have never been compared to the logical-elimination method. Additional data were available for two forms of McN-5707 to provide complete preformulation data for five candidate forms. Weight and seriousness factors (independent variables) were obtained from a survey of experienced formulators. Scores and probabilities (dependent variables) were provided independently by Preformulation. The rankings of the five candidate forms, best to worst, were similar for all three methods. These results validate the applicability of DA and PPA for candidate form selection. DA and PPA are particularly applicable in cases where there are many candidate forms and where each form has some degree of unfavorable properties. Presented at the 41st Annual Pharmaceutical Technologies Arden Conference—Oral Controlled Release Development and Technology, January 2006, West Point, NY.

18.
Hokeun Sun, Hongzhe Li. Biometrics 2012, 68(4):1197-1206
Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independency structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far the observation deviates, where the deviation of the observation is measured based on its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso.
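For orientation, a sketch of the standard (non-robust) graphical lasso that the paper robustifies, using scikit-learn on data simulated from a known sparse concentration matrix; the paper's likelihood weighting and iterative proportional fitting steps are not reproduced here.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulate from a known sparse Gaussian graphical model: a chain graph,
# i.e. a tridiagonal concentration (precision) matrix
rng = np.random.default_rng(0)
p = 5
prec = (np.eye(p)
        + np.diag(np.full(p - 1, 0.4), 1)
        + np.diag(np.full(p - 1, 0.4), -1))
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(p), cov, size=2000)

# l1-penalized estimation of the sparse concentration matrix
model = GraphicalLasso(alpha=0.05).fit(X)
est = model.precision_
# Nonzero off-diagonal entries of `est` are the estimated graphical links;
# for the chain graph, the (i, i+1) neighbours should clearly survive
```

On clean Gaussian data like this, the plain graphical lasso recovers the chain; the paper's contribution is keeping that recovery intact when heavy-tailed outliers are mixed in.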

19.
We test the success of Principal Components, Factor and Regression Analysis at recovering environmental signals using numerical experiments in which we control species environmental responses, the environmental conditions and the sampling scheme used for calibration. We use two general conditions, one in which sampling of a continental margin for benthic foraminiferal assemblages is done in a standard grid and the driving environmental variables are correlated to one another, and the other where sampling is done so that the environmental variables are uncorrelated. The first condition mimics many studies in the literature. We find that where the controlling environmental variables are correlated, Principal Components/Factor Analysis yield factors that reflect the common variance (correlation) of those variables. Since this common variance is largely a product of the sampling scheme, the factors extracted do not reliably present true species ecologic behavior. This behavior cannot be accurately diagnosed and faulty interpretations may lead to substantial error when using factor coefficients to reconstruct conditions in the past. When the sampling scheme is constructed so that the controlling environmental variables for the calibration data set are uncorrelated the factor patterns will reflect these variables more accurately. Species responses can be more successfully interpreted from the Principal Components/Factor Analysis structure matrices. Additionally, regression analysis can successfully extract the independent environmental signals from the biotic data set. However, matrix closure is a confounding effect in all our numerical results as it distorts species' abundances and spatial distribution in the calibration data set. 
Our results show clearly that a knowledge of the controlling environmental variables, and the correlations among these variables over a study area, is essential for the successful application of multivariate techniques for paleoenvironmental reconstruction.

20.
Journal of Asia 2020, 23(4):901-908
The sugarcane aphid, Melanaphis sacchari, has become a severe pest of sorghum fields in Texas and can substantially reduce sorghum yields. For early detection, herbivore-induced plant volatiles (HIPVs) can be exploited. In this study, HayeSep Q adsorbent sampling combined with gas chromatography-mass spectrometry (GC/MS) was used to analyze the volatile organic compounds (VOCs) emitted by sorghum both when healthy and when infested by sugarcane aphids, and multivariate techniques were applied for fast screening of infestation. VOCs identified by Student's t-test with p < 0.05 were chosen as variables for multivariate analysis; both unsupervised learning (principal component analysis, PCA, and clustering analysis, CA) and supervised learning (linear discriminant analysis, LDA) were performed, showing good discrimination between healthy and infested sorghum.
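The screening workflow can be sketched with hypothetical VOC data (rows = plants, columns = compound abundances; the real study used GC/MS features pre-selected by t-test, which is omitted here). PCA gives the unsupervised view and LDA the supervised discrimination.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical VOC feature table: two groups whose means differ in a few
# "induced volatile" compounds, mimicking an HIPV response
rng = np.random.default_rng(5)
healthy = rng.normal(0.0, 1.0, size=(30, 8))
infested = rng.normal(0.0, 1.0, size=(30, 8))
infested[:, :3] += 2.0                          # HIPV-like shift in three compounds
X = np.vstack([healthy, infested])
y = np.array([0] * 30 + [1] * 30)               # 0 = healthy, 1 = infested

scores = PCA(n_components=2).fit_transform(X)   # unsupervised projection for plotting
lda = LinearDiscriminantAnalysis().fit(X, y)    # supervised discriminant model
acc = lda.score(X, y)                           # training accuracy (illustrative only)
```

With a clear mean shift in a few compounds, the two groups separate along the first principal component and LDA classifies them almost perfectly; in practice held-out validation, not training accuracy, would be reported.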
