Similar literature
Found 20 similar documents (search time: 31 ms)
1.
A new family of distributions for circular random variables is proposed. It is based on nonnegative trigonometric sums and can be used to model data sets which present skewness and/or multimodality. In this family of distributions, the trigonometric moments are easily expressed in terms of the parameters of the distribution. The proposed family is applied to two data sets, one related to the directions taken by ants and the other to the directions taken by turtles, to compare its goodness of fit against common distributions used in the literature.
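As an illustration of the idea, one standard way to obtain a nonnegative trigonometric sum is to take the squared modulus of a complex trigonometric polynomial. The sketch below assumes that construction; the abstract does not state the paper's exact parameterization, and the function names and coefficients are hypothetical. With the coefficients scaled so that their squared moduli sum to 1/(2π), the density integrates to one and the p-th trigonometric moment has a closed form in the coefficients.

```python
import cmath
import math


def normalize(coeffs):
    """Scale coefficients so that sum |c_k|^2 = 1 / (2*pi)."""
    norm_sq = sum(abs(c) ** 2 for c in coeffs)
    scale = math.sqrt(1.0 / (2.0 * math.pi * norm_sq))
    return [c * scale for c in coeffs]


def nnts_density(theta, coeffs):
    """Density |sum_k c_k exp(i*k*theta)|^2, nonnegative by construction."""
    s = sum(c * cmath.exp(1j * k * theta) for k, c in enumerate(coeffs))
    return abs(s) ** 2


def trig_moment(p, coeffs):
    """Closed form for E[exp(i*p*theta)] under the density above."""
    return 2.0 * math.pi * sum(
        coeffs[k] * coeffs[k + p].conjugate() for k in range(len(coeffs) - p)
    )


def numeric_integral(func, n=2048):
    """Riemann sum over [0, 2*pi); accurate for trigonometric polynomials."""
    h = 2.0 * math.pi / n
    return sum(func(i * h) for i in range(n)) * h


# Hypothetical coefficients chosen only for illustration.
coeffs = normalize([1.0, 0.5 + 0.3j, 0.2j])
total = numeric_integral(lambda t: nnts_density(t, coeffs))
m1 = numeric_integral(lambda t: nnts_density(t, coeffs) * cmath.exp(1j * t))
print(total, m1, trig_moment(1, coeffs))
```

The numerical first moment and the closed-form value should agree, which is the sense in which the moments are "easily expressed in terms of the parameters" under this assumed form.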

2.
Signal, noise, and reliability in molecular phylogenetic analyses.   Total citations: 38, self-citations: 0, citations by others: 38
DNA sequences and other molecular data compared among organisms may contain phylogenetic signal, or they may be randomized with respect to phylogenetic history. Some method is needed to distinguish phylogenetic signal from random noise to avoid analysis of data that have been randomized with respect to the historical relationships of the taxa being compared. We analyzed 8,000 random data matrices consisting of 10-500 binary or four-state characters and 5-25 taxa to study several options for detecting signal in systematic data bases. Analysis of random data often yields a single most-parsimonious tree, especially if the number of characters examined is large and the number of taxa examined is small (both often true in molecular studies). The most-parsimonious tree inferred from random data may also be considerably shorter than the second-best alternative. The distribution of tree lengths of all tree topologies (or a random sample thereof) provides a sensitive measure of phylogenetic signal: data matrices with phylogenetic signal produce tree-length distributions that are strongly skewed to the left, whereas those composed of random noise are closer to symmetrical. In simulations of phylogeny with varying rates of mutation (up to levels that produce random variation among taxa), the skewness of tree-length distributions is closely related to the success of parsimony in finding the true phylogeny. Tables of critical values of a skewness test statistic, g1, are provided for binary and four-state characters for 10-500 characters and 5-25 taxa. These tables can be used in a rapid and efficient test for significant structure in data matrices for phylogenetic analysis.
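A minimal sketch of the skewness statistic, assuming g1 is the usual moment-based sample skewness (third central moment divided by the second central moment raised to the power 3/2). The tree lengths below are made-up numbers, and the critical values from the paper's tables are not reproduced here.

```python
def skewness_g1(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical tree lengths: a few unusually short trees give a left tail.
structured_lengths = [50, 58, 59, 60, 60, 61, 61, 61, 62, 62]
# Hypothetical symmetric lengths, as might arise from random noise.
symmetric_lengths = [58, 59, 60, 61, 62]

print(skewness_g1(structured_lengths))  # negative: skewed to the left
print(skewness_g1(symmetric_lengths))   # zero for a symmetric sample
```

In use, the observed g1 would be compared against a tabulated critical value for the matching number of characters and taxa.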

3.
Chicken immunoglobulin light chain (IgL) gene rearrangement has been characterized. Rearrangement of the single variable (VL) segment with the single joining (JL) segment within the chicken IgL locus results in the deletion of the DNA between VL and JL from the genome. This deletion is accomplished by a molecular mechanism in which a precise joining of the IgL recombination signal sequences leads to the formation of a circular episomal element. The circular episome is an unstable genetic element that fails to be propagated during B cell development. Evidence was obtained that the formation of the circular episome is accompanied by the addition of a single nonrandom base to both the VL and JL coding segments. The subsequent joining of the VL and JL segments appears to occur at random, as we observed at least 25 unique V-J junction sequences, 11 of which are out-of-frame. A novel recombination mechanism that accounts for the observed features of chicken IgL gene rearrangement is discussed.

4.
The basic and simplest system that one can consider in ecology is a group of individuals of equal age and representing one species, that is, a cohort. This paper is an attempt to show that analysis of such a system may be of great importance to understanding basic ecological problems, such as intraspecific competition and the dynamics of a single population. It is easy to observe that in even-aged populations individuals differ in weight. A close look shows that weight distributions in even-aged populations may have different skewness. Most common are distributions with coefficients of skewness greater than zero. Sometimes weight distributions are symmetrical or have skewness coefficients less than zero. In a cohort of growing individuals the coefficient of skewness changes with time: most often starting from zero (symmetrical distribution), it increases in time; sometimes after an initial increase it can decrease in the final stage of growth, which is related to an increased mortality of individuals. The rate of change in skewness, and the skewness itself, depend on the density of individuals in a cohort and on food conditions. They are greater at higher densities and increase with deteriorating food conditions. Weight distributions are symmetrical at low densities and optimal food conditions. The differences in individual weights measured by variance of weight distributions or coefficient of variation follow the same pattern, but observed changes with time, density and food conditions are not so clear. These conclusions rest upon the review of numerous papers concerning both plants and animals, which is presented in this paper. In the past, the properties of weight distributions in even-aged populations were explained not by interactions between individuals, but rather as a natural outcome of the growth process of non-interacting individuals.
The exponential equation of growth, with relative growth rate having a normal distribution in populations, was used to support this hypothesis. Obtained weight distributions were of positive skewness; however, this model, which in fact is able to describe the growth process only in its initial stage, cannot explain the changes of skewness of weight distributions with density and food conditions. A model has been developed which includes competitive interactions among members of even-aged populations to explain observed properties of weight distributions in them. The basic assumption is that intraspecific competition leads to uneven partitioning of resources, which are the object of competition. Functions describing resource partitioning among individuals are included in the model.(ABSTRACT TRUNCATED AT 400 WORDS)
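The non-interacting growth hypothesis described above can be sketched numerically: if each individual grows as w(t) = w0 · exp(r · t) with its relative growth rate r drawn from a normal distribution, the weights are log-normally distributed and positively skewed, and the skewness grows with time. The parameter values (mean rate 0.10, standard deviation 0.02, initial weight 1.0) and function names are illustrative assumptions, not values from the paper, and this sketch does not implement the competition model the paper develops.

```python
import math
import random


def skewness(values):
    """Moment-based coefficient of skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def cohort_weights(rates, w0, t):
    """Weights at time t under exponential growth without interaction."""
    return [w0 * math.exp(r * t) for r in rates]


rng = random.Random(42)
# Assumed normal distribution of relative growth rates in the cohort.
rates = [rng.gauss(0.10, 0.02) for _ in range(20000)]

early = skewness(cohort_weights(rates, 1.0, 5.0))
late = skewness(cohort_weights(rates, 1.0, 30.0))
print(early, late)  # both positive, with the later value larger
```

Nothing in this simulation depends on density or food supply, which is the limitation of the hypothesis that the abstract points out.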

5.
Modeling functional data with spatially heterogeneous shape characteristics   Total citations: 1, self-citations: 0, citations by others: 1
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).

6.
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero‐inflation complicate these tasks, requiring highly flexible, data‐adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero‐inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero‐inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero‐inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER‐Medicare database.

7.
It is common practice to attempt to find the minimum length tree (also known as the Wagner tree) for a given data matrix on a group of OTUs (taxa). However, little study has been made of the pattern of frequency distributions when the lengths of all possible networks (unrooted trees) are taken into consideration. A published real data matrix with eight OTUs was compared with randomly generated data; the former showed a much larger variance and very marked skewness. A number of published data matrices with a larger number of OTUs were studied by random selection of 10240 out of the possible trees: these were compared with 32 randomly generated data sets with 13 OTUs, using the same program. An algorithm has been found for calculation of the expected mean, variance and skewness for random binary data with up to 13 OTUs, based on the number of characters representing each type of partition of the OTUs. The calculation requires listing of the possible topologies and their relative weighting, which are tabulated.

8.
Subterranean termites excavate tunnels in a search pattern to encounter food in soil. To investigate the effect of food size, food distribution and the branch length of tunnels on food encounter rate we used a lattice gas model to simulate tunnels of the Formosan subterranean termite, Coptotermes formosanus Shiraki. The model made use of minimized local rules derived from empirical data to simulate termite tunnel patterns in featureless soil. Food distributions with three types (uniform, random, and clumped) were defined by using an I-index proposed by Zimmer and Johnson (1985). The food encounter rate was higher in a clumped than in non-clumped (uniform and random) distribution of food particles. When food particle size was varied in random distributions of food particles a maximum encounter rate was found, with particles of larger or smaller size being encountered less frequently. We also discussed the relationship between the branch tunnel length and the tunnel search pattern in minimizing the redundancy of overlapping branches.

9.
The distribution of anthropometric measurements related to fatness levels is examined to determine if skewness alone accounts for the nonnormality of such measures. A mixture of two normal distributions or a single skewed distribution fit the data significantly better than a single normal in all cases. For maximum hip width, knee diameter, and weight, two skewed distributions give a better fit than one skewed distribution, rejecting the null hypothesis of a single distribution even when skewness is considered. There is evidence for three skewed component distributions for biceps skinfold. Abdomen circumference, upper arm circumference, triceps skinfold, and calf skinfold are best approximated by a one component log-normal distribution. Children and parents show slightly different patterns in skewness and kurtosis when considered separately, but differences between them do not account for the commingling found in the combined distributions.

10.
A model for children's blood lead concentrations as a function of environmental lead exposures was developed by combining two nationally representative sources of data that characterize the marginal distributions of blood lead and environmental lead with a third regional dataset that contains joint measures of blood lead and environmental lead. The complicating factor addressed in this article was the fact that methods for assessing environmental lead were different in the national and regional datasets. Relying on an assumption of transportability (that although the marginal distributions of blood lead and environmental lead may be different between the regional dataset and the nation as a whole, the joint relationship between blood lead and environmental lead is the same), the model makes use of a latent variable approach to estimate the joint distribution of blood lead and environmental lead nationwide.

11.
The distribution of a random variable is determined by the probability density functions (PDF) of all other random variables with which the variable in question is jointly distributed. If the PDF of the random variable of interest is normal, or skewed normal, then the distributions with which it is jointly distributed determine its mean and standard deviation. In the case described here (where hemolysis time of the red blood cell is a function of the permeability coefficient and geometric variables of the cell) the mean and standard deviation of the permeability coefficient and the known distributions of the geometric variables on which the hemolysis time depends determine a predicted distribution of hemolysis time. An observed distribution of the hemolysis time is obtained spectrophotometrically. By choosing the mean and standard deviation of the permeability coefficient so that the predicted PDF of the hemolysis time matches the observed PDF best by least-squares criterion, the complete distribution of the permeability coefficient is determined.

12.
Generalising the ANOVA method of estimating variance components in mixed linear models, a simple procedure is presented to estimate skewness and kurtosis of the distributions of the random effects of the model. For the model II of a one-way classification this procedure is demonstrated explicitly.

13.
In this paper, we consider the inherent association between mean and covariance in the joint mean–covariance modeling and propose a joint mean–covariance random effect model based on the modified Cholesky decomposition for longitudinal data. Meanwhile, we apply the M-H algorithm to simulate the posterior distributions of model parameters. Besides, a computationally efficient Monte Carlo expectation maximization (MCEM) algorithm is developed for carrying out maximum likelihood estimation. Simulation studies show that the model taking into account the inherent association between mean and covariance has smaller standard deviations of the estimators of parameters, which makes the statistical inferences much more reliable. In the real data analysis, the estimation of parameters in the mean and covariance structure is highly efficient.

14.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes and either the covariates or both the covariates and the outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We showed via simulation that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components. We illustrate our findings with the analysis of LSAC data.

15.
Analysis and optimization of recombinant DNA joining reactions   Total citations: 6, self-citations: 0, citations by others: 6
The statistical segment length of duplex DNA was determined in phage T4 ligase (poly(deoxyribonucleotide):poly(deoxyribonucleotide) ligase (AMP-forming), EC 6.5.1.1) buffer (50 mM Tris·HCl (pH 7.8), 20 mM dithiothreitol, 10 mM MgCl2, 1 mM ATP) at 12 °C to be 1030 (± 116) Å. This result was obtained by electron microscopic examination of the molecular distributions generated by T4 ligase-mediated joining of EcoRI-cleaved pBR322 DNA. This value of the statistical segment length was utilized in an extension of the Jacobson-Stockmayer theory on the probability of intramolecular cyclization in order to optimize DNA joining reactions that are of great utility in recombinant DNA experiments. Five cloning systems were analyzed: circular plasmid vectors that had been linearized with one or two restriction endonucleases, circular plasmids that had been tailed with deoxyhomopolymers before joining, lambda-type cloning vectors and cosmids. The results are tabulated for convenient use in molecular cloning experiments.

16.
We describe a nonparametric Bayesian approach for estimating the three-way ROC surface based on mixtures of finite Polya trees (MFPT) priors. Mixtures of finite Polya trees are robust models that can handle nonstandard features in the data. We address the difficulties in modeling continuous diagnostic data with skewness, multimodality, or other nonstandard features, and how parametric approaches can lead to misleading results in such cases. Robust, data-driven inference for the ROC surface and for the volume under the ROC surface is obtained. A simulation study is performed to assess the performance of the proposed method. Methods are applied to data from a magnetic resonance spectroscopy study on human immunodeficiency virus patients.

17.
Li Y, Lin X. Biometrics 2003, 59(1): 25-35
In the analysis of clustered categorical data, it is of common interest to test for the correlation within clusters, and the heterogeneity across different clusters. We address this problem by proposing a class of score tests for the null hypothesis that the variance components are zero in random effects models, for clustered nominal and ordinal categorical responses. We extend the results to accommodate clustered censored discrete time-to-event data. We next consider such tests in the situation where covariates are measured with errors. We propose using the SIMEX method to construct the score tests for the null hypothesis that the variance components are zero. Key advantages of the proposed score tests are that they can be easily implemented by fitting standard polytomous regression models and discrete failure time models, and that they are robust in the sense that no assumptions need to be made regarding the distributions of the random effects and the unobserved covariates. The asymptotic properties of the proposed tests are studied. We illustrate these tests by analyzing two data sets and evaluate their performance with simulations.

18.
We describe a new pathway for multivariate analysis of data consisting of counts of species abundances that includes two key components: copulas, to provide a flexible joint model of individual species, and dissimilarity‐based methods, to integrate information across species and provide a holistic view of the community. Individual species are characterized using suitable (marginal) statistical distributions, with the mean, the degree of over‐dispersion, and/or zero‐inflation being allowed to vary among a priori groups of sampling units. Associations among species are then modeled using copulas, which allow any pair of disparate types of variables to be coupled through their cumulative distribution function, while maintaining entirely the separate individual marginal distributions appropriate for each species. A Gaussian copula smoothly captures changes in an index of association that excludes joint absences in the space of the original species variables. A permutation‐based filter with exact family‐wise error can optionally be used a priori to reduce the dimensionality of the copula estimation problem. We describe in detail a Monte Carlo expectation maximization algorithm for efficient estimation of the copula correlation matrix with discrete marginal distributions (counts). The resulting fully parameterized copula models can be used to simulate realistic ecological community data under fully specified null or alternative hypotheses. Distributions of community centroids derived from simulated data can then be visualized in ordinations of ecologically meaningful dissimilarity spaces. Multinomial mixtures of data drawn from copula models also yield smooth power curves in dissimilarity‐based settings. Our proposed analysis pathway provides new opportunities to combine model‐based approaches with dissimilarity‐based methods to enhance understanding of ecological systems. 
We demonstrate implementation of the pathway through an ecological example, where associations among fish species were found to increase after the establishment of a marine reserve.
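The coupling of separate count marginals through a Gaussian copula can be sketched as follows. This is a simplified illustration, assuming Poisson marginals for two hypothetical species and a fixed copula correlation. It does not implement the authors' Monte Carlo expectation maximization estimator, their permutation filter, or the index of association that excludes joint absences; all names and parameter values are made up for the example.

```python
import math
import random
from statistics import NormalDist


def poisson_inv_cdf(u, lam, k_max=1000):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam), capped at k_max."""
    k = 0
    p = math.exp(-lam)
    cdf = p
    while cdf < u and k < k_max:
        k += 1
        p *= lam / k
        cdf += p
    return k


def gaussian_copula_counts(n, rho, lam_a, lam_b, seed=0):
    """Simulate n pairs of counts with Poisson marginals and a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals (the latent Gaussian layer).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, then to each species' own marginal distribution.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((poisson_inv_cdf(u1, lam_a), poisson_inv_cdf(u2, lam_b)))
    return pairs


def pearson(pairs):
    """Pearson correlation between the two columns of a list of pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rare and common species with a strong positive association.
sample = gaussian_copula_counts(5000, rho=0.8, lam_a=2.0, lam_b=10.0)
print(pearson(sample))
```

Each simulated species keeps its own marginal mean while the pair remains positively associated, which is the property that lets such models generate community data under a specified hypothesis.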

19.
A model is presented for intramolecular recombination of herpesvirus DNA. It is proposed that the terminal repeat sequences of the viral DNA contain insertion sequences which may integrate with homologous repeat sequences between the long (L) and short (S) components. In class 2 herpesvirus DNA (as defined by Honess & Watson, 1977) in which the repeat sequences flank the S component only, circular-linear DNA molecules can be formed as an intermediate step. Reorientation of the S component leads to the formation of two DNA isomers. In class 3 herpesvirus DNA in which repeat sequences flank both the L and S components, either circular-linear or 8-shaped DNA molecules are proposed as intermediates leading to the formation of four DNA isomers. Fragmentation of the S component could lead to the formation of small circular DNA molecules.

20.
The RecA and SSB proteins will catalyze the joining of two DNA molecules containing homologous sequences but lacking homologous ends in a reaction termed paranemic joining. The absence of homologous ends can be achieved by (1) pairing two circular DNAs or (2) using linear DNA(s) with ends lacking homology to the pairing partner. Here we have used electron microscopy (EM) to examine such pairings. Circular M13 single-stranded (ss) DNA enveloped by RecA protein into a presynaptic filament was paired with linear M13mp7 double-stranded (ds) DNA containing non-M13 sequences at its ends. Joint complexes were frequently seen in which the dsDNA was joined with the presynaptic filament over several kilobase (10^3 bases) lengths of the dsDNA. In this region, the presynaptic filament appeared disorganized as contrasted to the customary helical structure of the filament containing only a single strand of DNA. The same ultrastructure, but with greater detail, was observed when the samples were prepared for EM without fixation using a new method of fast-freezing and freeze-drying. EM immunogold staining demonstrated the presence of SSB protein in the disorganized region containing all three strands, but not in the regular helically arranged region. Psoralen photo-crosslinking of the DNA in the joint complexes revealed that the three DNA strands were in close proximity only over a single short (200 to 300 base-pairs) region. The joining of nicked circular M13 dsDNA and presynaptic filaments containing circular M13 ssDNA resulted in the intertwining of the dsDNA about the circular presynaptic filament. The joints produced in this case were short, as was the single region of psoralen photo-crosslinking of the three DNA strands. A model of how these long three-stranded joints form is presented involving the movement of a short "true" paranemic joint along the presynaptic filament.
