Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
This paper presents a maximum likelihood approach to estimating the variation of substitution rate among nucleotide sites. We assume that the rate varies among sites according to an invariant+gamma distribution, which has two parameters: the gamma parameter $\alpha$ and the proportion of invariable sites $\theta$. Theoretical treatments on three, four, and five sequences have been conducted, and computer programs have been developed. It is shown that $\rho = (1 + \theta\alpha)/(1 + \alpha)$ is a good measure for the rate heterogeneity among sites. Extensive simulations show that (1) if the proportion of invariable sites is negligible, i.e., $\theta = 0$, the gamma parameter $\alpha$ can be satisfactorily estimated, even with three sequences; (2) if the proportion of invariable sites is not negligible, the heterogeneity $\rho$ can still be suitably estimated with four or more sequences; and (3) the distances estimated by the proposed method are almost unbiased and are robust against violation of the assumption of the invariant+gamma distribution.
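
The heterogeneity measure quoted in this abstract, rho = (1 + theta alpha)/(1 + alpha), is simple enough to evaluate directly. The following is a minimal sketch, not the authors' program: the function name `rate_heterogeneity` and the input checks (alpha positive, theta between 0 and 1) are assumptions made here for illustration.

```python
def rate_heterogeneity(alpha: float, theta: float = 0.0) -> float:
    """Return rho = (1 + theta * alpha) / (1 + alpha).

    alpha: gamma parameter (assumed positive here).
    theta: proportion of invariable sites (assumed to lie in [0, 1]).
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return (1.0 + theta * alpha) / (1.0 + alpha)


if __name__ == "__main__":
    # With no invariable sites (theta = 0), rho reduces to 1 / (1 + alpha).
    print(rate_heterogeneity(1.0))
    # Adding invariable sites raises rho for the same alpha.
    print(rate_heterogeneity(1.0, 0.5))
```

For a fixed alpha, rho increases linearly with theta, which is one way to see why a single summary value can stay estimable when alpha and theta are hard to separate.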

2.
H C Thode  S J Finch  N R Mendell 《Biometrics》1988,44(4):1195-1201
We find the percentage points of the likelihood ratio test of the null hypothesis that a sample of n observations is from a normal distribution with unknown mean and variance against the alternative that the sample is from a mixture of two distinct normal distributions, each with unknown mean and unknown (but equal) variance. The mixing proportion $\pi$ is also unknown under the alternative hypothesis. For 2,500 samples of sizes n = 15, 20, 25, 40, 50, 70, 75, 80, 100, 150, 250, 500, and 1,000, we calculated the likelihood ratio statistic, and from these values estimated the percentage points of the null distributions. Our algorithm for the calculation of the maximum likelihood estimates of the unknown parameters included precautions against convergence of the maximization algorithm to a local rather than global maximum. Investigations for convergence to an asymptotic distribution indicated that convergence was very slow and that stability was not apparent for samples as large as 1,000. Comparisons of the percentage points to the commonly assumed chi-squared distribution with 2 degrees of freedom indicated that this assumption is too liberal; i.e., one's P-value is greater than that indicated by $\chi^2_2$. We conclude then that one would need what is usually an unfeasibly large sample size (n > 1,000) for the use of large-sample approximations to be justified.

3.
4.
E Kalb  F Paltauf    A Hermetter 《Biophysical journal》1989,56(6):1245-1253
Fluorescence lifetimes of 1-palmitoyl-2-diphenylhexatrienylpropionyl-phosphatidylcholine in vesicles of palmitoyloleoyl phosphatidylcholine (POPC) (1:300, mol/mol) in the liquid crystalline state were determined by multifrequency phase fluorometry. On the basis of statistical criteria ($\chi^2_{\mathrm{red}}$), the measured phase angles and demodulation factors were equally well fitted to unimodal Lorentzian, Gaussian, or uniform lifetime distributions. No improvement in $\chi^2_{\mathrm{red}}$ could be observed if the experimental data were fitted to bimodal Lorentzian distributions or a double exponential decay. The unimodal Lorentzian lifetime distribution was characterized by a lifetime center of 6.87 ns and a full width at half maximum of 0.57 ns. Increasing amounts of cholesterol in the phospholipid vesicles (0-50 mol% relative to POPC) led to a slight increase of the lifetime center (7.58 ns at 50 mol% sterol) and significantly reduced the distributional width (0.14 ns at 50 mol% sterol). Lifetime distributions of POPC-cholesterol mixtures containing more than 20 mol% sterol were within the resolution limit and could not be distinguished from monoexponential decays on the basis of $\chi^2_{\mathrm{red}}$. Cholesterol stabilizes and rigidifies phospholipid bilayers in the fluid state. Considering its effect on lifetime distributions of fluorescent phospholipids, it may also act as a membrane homogenizer.

5.
The distributions of side-chain conformations in 258 crystal structures of oligopeptides have been analyzed. The sample contains 321 residues having side chains that extend beyond the $C^\beta$ atom. Statistically observed preferences of side-chain dihedral angles are summarized and correlated with stereochemical and energetic constraints. The distributions are compared with observed distributions in proteins of known X-ray structures and with computed minimum-energy conformations of amino acid derivatives. The distributions are similar in all three sets of data, and they appear to be governed primarily by intraresidue interactions. In side chains with no $\beta$-branching, the most important interactions that determine $\chi_1$ are those between the $C^\gamma H_2$ group and atoms of the neighboring peptide groups. As a result, the $g^-$ conformation ($\chi_1 \cong -60^\circ$) occurs most frequently for rotation around the $C^\alpha$-$C^\beta$ bond in oligopeptides, followed by the $t$ conformation ($\chi_1 \cong 180^\circ$), while the $g^+$ conformation ($\chi_1 \cong 60^\circ$) is least favored. In residues with $\beta$-branching, steric repulsions between the $C^\gamma H_2$ or $C^\gamma H_3$ groups and backbone atoms govern the distribution of $\chi_1$. The extended ($t$) conformation is highly favored for rotation around the $C^\beta$-$C^\gamma$ and $C^\gamma$-$C^\delta$ bonds in unbranched side chains, because the $t$ conformer has a lower energy than the $g^+$ and $g^-$ conformers in hydrocarbon chains. This study of the observed side-chain conformations has led to a refinement of one of the energy parameters used in empirical conformational energy computations.

6.
7.
We here consider the null distribution of the maximum lod score (LOD-M) obtained upon maximizing over transmission model parameters (penetrance values, dominance, and allele frequency) as well as the recombination fraction. Also considered is the lod score maximized over a fixed choice of genetic model parameters and recombination-fraction values set prior to the analysis (MMLS) as proposed by Hodge et al. The objective is to fit parametric distributions to MMLS and LOD-M. Our results are based on 3,600 simulations of samples of n = 100 nuclear families ascertained for having one affected member and at least one other sibling available for linkage analysis. Each null distribution is approximately a mixture $p\,\chi^2_0 + (1 - p)\,\chi^2_v$. The values of MMLS appear to fit the mixture $0.20\,\chi^2_0 + 0.80\,\chi^2_{1.6}$. The mixture distribution $0.13\,\chi^2_0 + 0.87\,\chi^2_{2.8}$ appears to describe the null distribution of LOD-M. From these results we derive a simple method for obtaining critical values of LOD-M and MMLS.

8.
Strauch K 《Human heredity》2007,64(3):192-202
A MOD-score analysis, in which the parametric LOD score is maximized with respect to the trait-model parameters, can be a powerful method for the mapping of complex traits. With affected sib pairs, it has been shown before that MOD scores asymptotically follow a mixture of $\chi^2$ distributions with 2, 1 and 0 degrees of freedom under the null hypothesis of no linkage. In that context, a MOD-score analysis yields some (albeit limited) information regarding the trait-model parameters, and there is a chance for an increased power compared to a simple LOD-score analysis. Here, it is shown that with unilineal affected relative pairs, MOD scores asymptotically follow a mixture of $\chi^2$ distributions with 1 and 0 degrees of freedom under the null hypothesis, that is, the same distribution as followed by simple LOD scores. No information regarding the trait model can be obtained in this setting, and no power is gained when compared to a LOD-score analysis. An outlook to larger pedigrees is given. The number of degrees of freedom underlying the null distribution of MOD scores, which depends on the type of pedigrees studied, corresponds to the number of explored dimensions related to power and to the number of parameters that can jointly be estimated.

9.
Aims: Fits of species-abundance distributions to empirical data are increasingly used to evaluate models of diversity maintenance and community structure and to infer properties of communities, such as species richness. Two distributions predicted by several models are the Poisson lognormal (PLN) and the negative binomial (NB) distribution; however, at least three different ways to parameterize the PLN have been proposed, which differ in whether unobserved species contribute to the likelihood and in whether the likelihood is conditional upon the total number of individuals in the sample. Each of these has an analogue for the NB. Here, we propose a new formulation of the PLN and NB that includes the number of unobserved species as one of the estimated parameters. We investigate the performance of parameter estimates obtained from this reformulation, as well as the existing alternatives, for drawing inferences about the shape of species abundance distributions and estimation of species richness. Methods: We simulate the random sampling of a fixed number of individuals from lognormal and gamma community relative abundance distributions, using a previously developed 'individual-based' bootstrap algorithm. We use a range of sample sizes, community species richness levels and shape parameters for the species abundance distributions that span much of the realistic range for empirical data, generating 1,000 simulated data sets for each parameter combination. We then fit each of the alternative likelihoods to each of the simulated data sets, and we assess the bias, sampling variance and estimation error for each method. Important findings: Parameter estimates behave reasonably well for most parameter values, exhibiting modest levels of median error. However, for the NB, median error becomes extremely large as the NB approaches either of two limiting cases. For both the NB and PLN, >90% of the variation in the error in model parameters across parameter sets is explained by three quantities that correspond to the proportion of species not observed in the sample, the expected number of species observed in the sample and the discrepancy between the true NB or PLN distribution and a Poisson distribution with the same mean. There are relatively few systematic differences between the four alternative likelihoods. In particular, failing to condition the likelihood on the total sample sizes does not appear to systematically increase the bias in parameter estimates. Indeed, overall, the classical likelihood performs slightly better than the alternatives. However, our reparameterized likelihood, for which species richness is a fitted parameter, has important advantages over existing approaches for estimating species richness from fitted species-abundance models.

10.
Procedures for discriminating between competing statistical models of synaptic transmission, and for providing confidence limits on the parameters of these models, have been developed. These procedures were tested against simulated data and were used to analyze the fluctuations in synaptic currents evoked in hippocampal neurones. All models were fitted to data using the Expectation-Maximization algorithm and a maximum likelihood criterion. Competing models were evaluated using the log-likelihood ratio (Wilks statistic). When the competing models were not nested, Monte Carlo sampling of the model used as the null hypothesis ($H_0$) provided density functions against which $H_0$ and the alternate model ($H_1$) were tested. The statistic for the log-likelihood ratio was determined from the fit of $H_0$ and $H_1$ to these probability densities. This statistic was used to determine the significance level at which $H_0$ could be rejected for the original data. When the competing models were nested, log-likelihood ratios and the $\chi^2$ statistic were used to determine the confidence level for rejection. Once the model that provided the best statistical fit to the data was identified, many estimates for the model parameters were calculated by resampling the original data. Bootstrap techniques were then used to obtain the confidence limits of these parameters.

11.
Ranajit Chakraborty 《Genetics》1984,108(3):719-731
The distribution of the number of heterozygous loci in two randomly chosen gametes or in a random diploid zygote provides information regarding the nonrandom association of alleles among different genetic loci. Two alternative statistics may be employed for detection of nonrandom association of genes of different loci when observations are made on these distributions: observed variance of the number of heterozygous loci ($s_k^2$) and a goodness-of-fit criterion ($X^2$) to contrast the observed distribution with that expected under the hypothesis of random association of genes. It is shown, by simulation, that $s_k^2$ is statistically more efficient than $X^2$ to detect a given extent of nonrandom association. Asymptotic normality of $s_k^2$ is justified, and $X^2$ is shown to follow a chi-square ($\chi^2$) distribution with partial loss of degrees of freedom arising because of estimation of parameters from the marginal gene frequency data. Whenever direct evaluations of linkage disequilibrium values are possible, tests based on maximum likelihood estimators of linkage disequilibria require a smaller sample size (number of zygotes or gametes) to detect a given level of nonrandom association in comparison with that required if such tests are conducted on the basis of $s_k^2$. Summarization of multilocus genotype (or haplotype) data into the different number of heterozygous loci classes thus amounts to appreciable loss of information.

12.
J J Lee 《Biometrics》1991,47(4):1573-1580
In the calibration problem, the need to construct a confidence interval to estimate the unknown $\chi_0$ arises when the null hypothesis of zero slope is rejected. Otherwise, the resulting confidence interval will be infinite to reflect the fact that the slope of the regression line may be zero. Under the condition of rejecting the hypothesis of zero slope, we study the properties of the conditional coverage rate of the calibration confidence interval. The conditional coverage rate ($P_1$) is a function of the slope, distance between $\chi_0$ and the mean of the trailing sample means, the sum of squares of $\chi$, and n. When the true slope is close to 0 and $\chi_0$ is away from means, $P_1$ can go down to 0. On the other hand, as the power of testing zero slope reaches 1, with or without $\chi_0$ close to means, $P_1$ will tend to the desired nominal coverage rate. In summary, one should choose a reasonably small $\alpha$ in testing zero slope to avoid constructing a confidence interval for $\chi_0$ when the true slope is 0. In addition, it is desirable to have high power in testing zero slope so that the resulting confidence interval will maintain the desired coverage rate when using the conditional approach in the calibration problem.

13.
We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the $\chi^2_0 + \chi^2_1$ distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as $2(l_1 - l_2)$, i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value $l_2$ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
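
The test statistic described in this abstract, double the difference between the two log-likelihoods, can be sketched in a few lines. This is an illustrative sketch only and not the PHYML implementation. The abstract does not state the mixing weights, so the equal weights (one half each) on the 0- and 1-degree-of-freedom components are an assumption made here. The p-value is reported as a union (Bonferroni-style) upper bound over the three variables, which holds whether or not they are independent. All function names are hypothetical.

```python
import math


def alrt_statistic(lnl_best: float, lnl_second: float) -> float:
    """Return 2 * (l1 - l2), clamped at zero."""
    return max(0.0, 2.0 * (lnl_best - lnl_second))


def mixture_tail(x: float) -> float:
    """P(X > x) for X drawn from 0.5 * chi2_0 + 0.5 * chi2_1.

    ASSUMPTION: equal mixing weights; the abstract does not give them.
    chi2_0 is a point mass at zero, so it adds nothing to the tail for x >= 0.
    For one degree of freedom, P(Z**2 > x) = erfc(sqrt(x / 2)).
    """
    if x < 0.0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(x / 2.0))


def alrt_p_value_bound(stat: float, n_variables: int = 3) -> float:
    """Union bound on P(max of n_variables mixture draws > stat)."""
    return min(1.0, n_variables * mixture_tail(stat))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for the best and second-best arrangements.
    stat = alrt_statistic(-1000.0, -1002.0)
    print(stat, alrt_p_value_bound(stat))
```

Because the union bound never understates the tail probability of the maximum, a branch judged significant by this bound would also be significant under the exact distribution of the maximum, given the assumed weights.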

14.
D Zelterman 《Biometrics》1992,48(3):807-818
We introduce a statistical distribution with hazard function $\mu(t)$ proportional to $(\psi - t)^{\beta - 1}$ for $0 \le t < \psi$ and shape parameter $\beta$ satisfying $0 < \beta < 1$. This hazard function is suggested by a theory of aging in demography. We discuss properties of this distribution and maximum likelihood estimates of its parameters. Mixture distributions are considered to account for pooling across dissimilar populations. This model is compared with the Gompertz and generalized Pareto distributions and used to estimate a finite limit on human lifespan based on the survival of a group of female centenarians.

15.
Chen Y  Liang KY 《Biometrika》2010,97(3):603-620
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of the parameter of interest $\theta$, $\theta = (\gamma, \eta)$, $H_0: \gamma = \gamma_0$, based on the pseudolikelihood $L(\theta, \hat{\phi})$, where $\hat{\phi}$ is a consistent estimator of $\phi$, the nuisance parameter. We show that the asymptotic distribution of T under $H_0$ is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, $\theta_0$, or the true value of the nuisance parameter, $\phi_0$, lies on the boundary of parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments.

16.
Hyracoids have been allied with either perissodactyls or tethytheres (i.e., Proboscidea + Sirenia) based on morphological data. The latter hypothesis, termed Paenungulata, is corroborated by numerous molecular studies. However, molecular studies have failed to support Tethytheria, a group that is supported by morphological data. We examined relationships among living paenungulate orders using a multigene data set that included sequences from four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes (aquaporin, A2AB, IRBP, vWF). Nineteen maximum-likelihood models were employed, including models with process partitions for base composition and substitution parameterizations. With the inclusion of partitions with a heterogeneous base composition, 18 of 19 models favored Hyracoidea + Sirenia. All 19 models favored Hyracoidea + Sirenia after excluding heterogeneous base composition partitions. Most of the support for Hyracoidea + Sirenia derived from the mitochondrial genes (bootstrap support ranged from 51 to 99%); Tethytheria, in turn, received 0 to 19% support in different analyses. Bootstrap support deriving from the nuclear genes was more evenly split among the competing hypotheses (3 to 45% for Tethytheria; 17.5 to 62% for Hyracoidea + Sirenia). Lineage-specific rate variation among both mitochondrial and nuclear genes may contribute to the different results that were obtained with mitochondrial versus nuclear data. Whether Tethytheria or a competing hypothesis is correct, short internodes on the molecular phylogenies suggest that paenungulate orders diverged from each other over a 5- to 8-million-year time window extending from the late Paleocene into the early Eocene. We also used likelihood-ratio tests to compare different models of sequence evolution. A gamma distribution of rates results in a greater improvement in likelihood scores than does an allowance for invariant sites. 
Twenty-one rate partitions corresponding to stems, loops, and codon positions of different genes result in higher likelihood scores than a gamma distribution of rates and/or an allowance for invariant sites. Process partitions of the data that incorporate base composition and substitution parameterizations result in significant improvements in likelihood scores in comparison to models that allow only for relative rate differences among partitions.

17.
The influence of peptide structure of endogenous cell-surface glycoproteins on the branching and sialylation of their asparagine-linked oligosaccharides was evaluated in a murine B cell lymphoma, AKTB-1b. This cell line simultaneously synthesizes two classes of major histocompatibility antigens that, within each class, share a high degree of amino acid sequence homology and possess potential N-linked glycosylation sites at invariant positions. [3H]Mannose-labeled oligosaccharides were released from each of 11 purified glycosylation sites by the almond peptide:N-glycosidase and analyzed by a variety of chromatographic procedures and glycosidase treatments. The data indicate: 1) a unique distribution of oligosaccharide structures is present at each glycosylation site; 2) each site-specific oligosaccharide pattern is highly reproducible, independent of the number of in vivo tumor passages. The heavy chain of the class I antigens, H-2Kk and H-2Dk contain two and three sites, respectively, in which biantennary structures predominate. However, each site varies with respect to the extent of sialylation and the proportions of more highly branched structures present. The class II antigens, I-Ak and I-Ek, each contain an alpha-chain site toward the N terminus and a single beta-chain site where the overall extent of sialylation is similar, yet the distributions of antennary structures are dramatically different for each. The alpha-chains of each class II antigen also contain a more C-terminal underglycosylated site where sialylation and branching are reduced to differing degrees depending upon the site. The influence of peptide structure on oligosaccharide microheterogeneity is manifest at two levels. First, the overall distributions of oligosaccharides at corresponding sites on structurally related glycoproteins are similar. Second, the specific "fingerprint" of sialylation and branching patterns at a particular site are reproducibly unique. 
These data suggest that subtle changes in peptide structure are reflected in the extent of sialylation and branching of oligosaccharides found at corresponding glycosylation sites of structurally related glycoproteins.

18.
The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio ($\omega = d_N/d_S$), with $\omega < 1$, $\omega = 1$, and $\omega > 1$ indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The $\omega$ ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the $\omega$ ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with $\omega > 1$), and another that does not, with the $\chi^2$ distribution used for significance testing. We found that use of the $\chi^2$ distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection also were found to affect the power of the LRT. The exact distribution assumed for the $\omega$ ratio over sites was found not to affect the effectiveness of the LRT.
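
The three-way reading of the omega ratio given at the start of this abstract can be written as a small helper. This is a sketch for illustration only: the function name `classify_selection` and the tolerance `tol` used to decide when omega counts as equal to 1 are assumptions introduced here, and a point estimate of omega is not a substitute for the likelihood ratio test the abstract studies.

```python
from typing import Tuple


def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> Tuple[float, str]:
    """Return (omega, label) where omega = dN / dS.

    tol is a hypothetical tolerance for treating omega as equal to 1,
    since exact floating-point equality is fragile.
    """
    if ds <= 0.0:
        raise ValueError("dS must be positive to form the ratio")
    if dn < 0.0:
        raise ValueError("dN must be non-negative")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutral"
    elif omega < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return omega, label


if __name__ == "__main__":
    # Hypothetical rate estimates, chosen only to exercise each branch.
    for dn, ds in [(0.1, 0.5), (0.3, 0.3), (0.9, 0.3)]:
        print(dn, ds, classify_selection(dn, ds))
```

A gene-wide average can fall below 1 even when a few sites have omega above 1, which is the motivation the abstract gives for models that let omega vary among sites.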

19.
Many attempts to explain the species-abundance distribution (SAD) assume that it has a universal functional form which applies to most assemblages. However, if such a form does exist, then it has to be invariant under changes in the area of the study plot (the addition of neighboring areas or subdivision of the original area) and changes in taxonomic composition (the addition of sister taxa or subdivision to subtaxa). We developed a theory for such an area-and-taxon invariant SAD and derived a formula for such a distribution. Both the log-normal and our area-and-taxon invariant distribution fitted data well. However, the log-normal distributions of two adjoined sub-assemblages cannot be composed into a log-normal distribution for the resulting assemblage, and the SAD composed from two log-normal distributions fits the SAD for the assemblage poorly in comparison to the area-and-taxon invariant distribution. Observed abundance patterns therefore reveal area-and-taxon invariant properties absent in log-normal distributions, suggesting that multiplicative models generating log-normal-like SADs (including the power-fraction model) cannot be universally valid, as they necessarily apply only to particular scales and taxa.

20.
Björn Bornkamp 《Biometrics》2012,68(3):893-901
Summary: This article considers the topic of finding prior distributions when a major component of the statistical model depends on a nonlinear function. Using results on how to construct uniform distributions in general metric spaces, we propose a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior distribution for the original model parameters. The primary application considered in this article is nonlinear regression, but the idea might be of interest beyond this case. For nonlinear regression, the priors so constructed have the advantage that they are parametrization invariant and do not violate the likelihood principle, as opposed to uniform distributions on the parameters or the Jeffreys prior, respectively. The utility of the proposed priors is demonstrated in the context of design and analysis of nonlinear regression modeling in clinical dose-finding trials, through a real data example and simulation.
