Similar articles
1.
Until recently, little attention was given to deriving maximum likelihood methods for estimating the intercept and slope parameters of a binormal ROC curve that assesses the accuracy of a continuous diagnostic test. We propose two new methods for estimating these parameters. The first method uses the profile likelihood and a simple algorithm to produce fully efficient estimates. The second method is based on a pseudo-maximum likelihood that can easily accommodate adjusting for covariates that could affect the accuracy of the continuous test.
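As a rough illustration of the binormal parameterization itself (a sketch under the simplifying assumption of fully observed, normally distributed test scores in each group — not the authors' profile- or pseudo-likelihood algorithms):

```python
import numpy as np

def binormal_roc_params(y0, y1):
    """Estimate binormal ROC intercept a and slope b from continuous
    test scores: y0 = non-diseased, y1 = diseased.  Under the binormal
    model ROC(t) = Phi(a + b * Phi^{-1}(t)), the estimates reduce to
    a = (mean1 - mean0) / sd1 and b = sd0 / sd1."""
    m0, s0 = np.mean(y0), np.std(y0, ddof=0)  # ddof=0 gives the ML estimate of sigma
    m1, s1 = np.mean(y1), np.std(y1, ddof=0)
    a = (m1 - m0) / s1
    b = s0 / s1
    return a, b
```

With equal group variances the slope b is 1 and the intercept a equals the standardized mean difference between diseased and non-diseased scores.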

2.
Selective DNA pooling is an efficient method to identify chromosomal regions that harbor quantitative trait loci (QTL) by comparing marker allele frequencies in pooled DNA from phenotypically extreme individuals. Currently used single marker analysis methods can detect linkage of markers to a QTL but do not provide separate estimates of QTL position and effect, nor do they utilize the joint information from multiple markers. In this study, two interval mapping methods for analysis of selective DNA pooling data were developed and evaluated. One was based on least squares regression (LS-pool) and the other on approximate maximum likelihood (ML-pool). Both methods simultaneously utilize information from multiple markers and multiple families and can be applied to different family structures (half-sib, F2 cross and backcross). The results from these two interval mapping methods were compared with results from single marker analysis by simulation. The results indicate that both LS-pool and ML-pool provided greater power to detect the QTL than single marker analysis. They also provide separate estimates of QTL location and effect. With large family sizes, both LS-pool and ML-pool provided similar power and estimates of QTL location and effect as selective genotyping. With small family sizes, however, the LS-pool method resulted in severely biased estimates of QTL location for distal QTL but this bias was reduced with the ML-pool.

3.
Nuclear SSRs are notorious for having relatively high frequencies of null alleles, i.e. alleles that fail to amplify and are thus recessive and undetected in heterozygotes. In this paper, we compare two kinds of approaches for estimating null allele frequencies at seven nuclear microsatellite markers in three French Fagus sylvatica populations: (1) maximum likelihood methods that compare observed and expected homozygote frequencies in the population under the assumption of Hardy-Weinberg equilibrium and (2) direct null allele frequency estimates from progeny where parent genotypes are known. We show that null allele frequencies are high in F. sylvatica (7.0% on average with the population method, 5.1% with the progeny method), and that estimates are consistent between the two approaches, especially when the number of sampled maternal half-sib progeny arrays is large. With null allele frequencies ranging between 5% and 8% on average across loci, population genetic parameters such as genetic differentiation (F_ST) may be mostly unbiased. However, using markers with such average prevalence of null alleles (up to 15% for some loci) can be seriously misleading in fine scale population studies and parentage analysis.
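The population-level idea — inferring null alleles from the deficit of observed heterozygotes under Hardy-Weinberg equilibrium — can be sketched with a common moment-based variant (Brookfield's 1996 estimator 1; the paper's own procedure is a full maximum likelihood method, so this is only an illustrative stand-in):

```python
def null_allele_freq(h_exp, h_obs):
    """Brookfield (1996) estimator 1 of the null allele frequency r,
    from expected (h_exp) and observed (h_obs) heterozygosity under
    Hardy-Weinberg equilibrium: r = (He - Ho) / (1 + He).
    A heterozygote deficit (He > Ho) maps to a positive null frequency."""
    return (h_exp - h_obs) / (1.0 + h_exp)
```

For example, a locus with expected heterozygosity 0.80 but observed heterozygosity 0.71 yields an estimated null allele frequency of 0.05, in the range reported for F. sylvatica above.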

4.
A P Soms 《Biometrics》1985,41(3):663-668
A regression technique, based on the limiting normal distribution of the multinomial, is given for point and interval estimation of the parameters in the removal trapping method of determining animal and insect populations. Comparisons are made with maximum likelihood estimates. Two examples of estimating spider populations are given.
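For orientation, the simplest closed-form removal estimator (the two-occasion Seber-Le Cren estimator, not the regression or ML procedures of the paper) can be written as:

```python
def removal_estimate(c1, c2):
    """Two-occasion removal estimator (Seber-Le Cren): with catches c1
    then c2 removed from a closed population under a constant capture
    probability p, the estimates are N = c1^2 / (c1 - c2) and
    p = (c1 - c2) / c1.  Requires c1 > c2 (a declining catch)."""
    if c1 <= c2:
        raise ValueError("removal estimator needs c1 > c2")
    p = (c1 - c2) / c1
    n = c1 * c1 / (c1 - c2)
    return n, p
```

Catching 80 animals on the first pass and 20 on the second gives an estimated capture probability of 0.75 and a population of about 107.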

5.
Kenneth Lange 《Genetica》1995,96(1-2):107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, then the parameters characterizing the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each of the separate populations. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be employed to model the contributions from different ancestral populations in computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, then the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact facilitates computation of match probabilities and tight upper bounds to match probabilities.

Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.
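The conjugacy argument is short: a Dirichlet(alpha) prior combined with multinomial allele counts gives a Dirichlet posterior, whose mean shrinks raw sample proportions toward the prior mean. A minimal sketch (the prior parameters would in practice come from the ML fit across populations described above):

```python
import numpy as np

def eb_allele_freqs(counts, alpha):
    """Posterior-mean allele frequencies under a Dirichlet(alpha) prior.
    The Dirichlet is conjugate to the multinomial, so the posterior is
    Dirichlet(counts + alpha) with mean (n_i + a_i) / (N + A), which
    shrinks the raw proportions n_i / N toward the prior mean a_i / A."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())
```

Note how an allele unobserved in the sample still receives a positive frequency estimate — exactly the moderation of extreme multinomial estimates the abstract describes.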

6.
By introducing an initial proportion factor for each region, we consider a mark-recapture model for a closed population distributed over two regions A and B. Using the full maximum likelihood function and properties of the multinomial distribution, we give a method for obtaining the transfer probabilities between the two regions and the initial proportion in each region when individual capture rates are equal across regions, and we derive parameter expressions for the two-region closed-population mark-recapture model when capture rates differ between regions but individual transfer rates are low. A worked example illustrates the method.

7.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.

8.
To test for association between a disease and a set of linked markers, or to estimate relative risks of disease, several different methods have been developed. Many methods for family data require that individuals be genotyped at the full set of markers and that phase can be reconstructed. Individuals with missing data are excluded from the analysis. This can result in an important decrease in sample size and a loss of information. A possible solution to this problem is to use missing-data likelihood methods. We propose an alternative approach, namely the use of multiple imputation. Briefly, this method consists in estimating from the available data all possible phased genotypes and their respective posterior probabilities. These posterior probabilities are then used to generate replicate imputed data sets via a data augmentation algorithm. We performed simulations to test the efficiency of this approach for case/parent trio data and we found that the multiple imputation procedure generally gave unbiased parameter estimates with correct type 1 error and confidence interval coverage. Multiple imputation had some advantages over missing data likelihood methods with regards to ease of use and model flexibility. Multiple imputation methods represent promising tools in the search for disease susceptibility variants.

9.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was paid to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is a prerequisite.

10.
Wang J 《Genetical research》2001,78(3):243-257
A pseudo maximum likelihood method is proposed to estimate effective population size (Ne) using temporal changes in allele frequencies at multi-allelic loci. The computation is simplified dramatically by (1) approximating the multi-dimensional joint probabilities of all the data by the product of marginal probabilities (hence the name pseudo-likelihood), (2) exploiting the special properties of the transition matrix and (3) using a hidden Markov chain algorithm. Simulations show that the pseudo-likelihood method has a similar performance but needs much less computing time and storage compared with the full likelihood method in the case of 3 alleles per locus. Due to computational developments, I was able to assess the performance of the pseudo-likelihood method against the F-statistic method over a wide range of parameters by extensive simulations. It is shown that the pseudo-likelihood method gives more accurate and precise estimates of Ne than the F-statistic method, and the performance difference is mainly due to the presence of rare alleles in the samples. The pseudo-likelihood method is also flexible and can use three or more temporal samples simultaneously to satisfactorily estimate the Ne of each period, or the growth parameters of the population. The accuracy and precision of both methods depend on the ratio of the product of sample size and the number of generations involved to Ne, and on the number of independent alleles used. In an application of the pseudo-likelihood method to a large data set of an olive fly population, more precise estimates of Ne are obtained than those from the F-statistic method.
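The F-statistic benchmark mentioned above has a simple closed form. A sketch of the standard temporal-method estimator (in the spirit of Waples 1989, sampling-plan-II correction; details such as the Fc variant are assumptions of this illustration, not taken from the abstract):

```python
import numpy as np

def temporal_ne(x, y, t, s0, st):
    """Temporal (F-statistic) estimate of effective size Ne.
    x, y: allele frequencies at one locus in two samples t generations
    apart; s0, st: the two sample sizes (individuals).  Fc is a
    standardized variance of allele-frequency change; subtracting
    1/(2*s0) + 1/(2*st) removes the expected sampling contribution,
    leaving Ne = t / (2 * F')."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    fc = np.mean((x - y) ** 2 / ((x + y) / 2.0 - x * y))
    fprime = fc - 1.0 / (2.0 * s0) - 1.0 / (2.0 * st)
    return t / (2.0 * fprime)
```

Rare alleles inflate the per-allele terms of Fc, which is one way to see why the abstract finds the pseudo-likelihood method's advantage concentrated in samples containing rare alleles.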

11.
M Hühn 《Génome》2000,43(5):853-856
Some relationships between the estimates of recombination fraction in two-point linkage analysis obtained by maximum likelihood, minimum chi-square, and general least squares are derived. These theoretical results are based on an approximation for the multinomial distribution. Applications (theoretical and experimental) with RFLP (restriction fragment length polymorphism) markers for a segregating F2 population are given. The minimum chi-square estimate is slightly larger than the maximum likelihood estimate. For applications, however, both estimates must be considered to be approximately equal. The least squares estimates are slightly different (larger or smaller) from these estimates.

12.
Multivariate phenotypes may be characterized collectively by a variety of low level traits, such as in the diagnosis of a disease that relies on multiple disease indicators. Such multivariate phenotypes are often used in genetic association studies. If highly heritable components of a multivariate phenotype can be identified, it can maximize the likelihood of finding genetic associations. Existing methods for phenotype refinement perform unsupervised cluster analysis on low-level traits and hence do not assess heritability. Existing heritable component analytics either cannot utilize general pedigrees or have to estimate the entire covariance matrix of low-level traits from limited samples, which leads to inaccurate estimates and is often computationally prohibitive. It is also difficult for these methods to exclude fixed effects from other covariates such as age, sex and race, in order to identify truly heritable components. We propose to search for a combination of low-level traits and directly maximize the heritability of this combined trait. A quadratic optimization problem is thus derived where the objective function is formulated by decomposing the traditional maximum likelihood method for estimating the heritability of a quantitative trait. The proposed approach can generate linearly-combined traits of high heritability that have been corrected for the fixed effects of covariates. The effectiveness of the proposed approach is demonstrated in simulations and by a case study of cocaine dependence. Our approach was computationally efficient and derived traits of higher heritability than those by other methods. Additional association analysis with the derived cocaine-use trait identified genetic markers that were replicated in an independent sample, further confirming the utility and advantage of the proposed approach.
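The core idea — choosing trait weights that maximize a heritability-like variance ratio — is closely related to a generalized eigenvalue problem. A sketch under that simplification (given genetic and phenotypic covariance matrices G and P; the paper's actual quadratic program from the ML decomposition is more involved):

```python
import numpy as np
from scipy.linalg import eigh

def max_heritable_combination(G, P):
    """Weights w maximizing the ratio w'Gw / w'Pw for a linear
    combination of low-level traits, where G and P are the genetic and
    phenotypic covariance matrices.  The optimum is the top generalized
    eigenvector of the symmetric-definite pair (G, P)."""
    vals, vecs = eigh(G, P)        # generalized eigenvalues, ascending
    w = vecs[:, -1]                # eigenvector achieving the largest ratio
    return w / np.linalg.norm(w), vals[-1]
```

With independent traits of heritabilities 0.8 and 0.1, the method puts all weight on the first trait, as expected.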

13.
An estimate of the risk or prevalence ratio, adjusted for confounders, can be obtained from a log binomial model (binomial errors, log link) fitted to binary outcome data. We propose a modification of the log binomial model to obtain relative risk estimates for nominal outcomes with more than two attributes (the "log multinomial model"). Extensive data simulations were undertaken to compare the performance of the log multinomial model with that of an expanded data multinomial logistic regression method based on the approach proposed by Schouten et al. (1993) for binary data, and with that of separate fits of a Poisson regression model based on the approach proposed by Zou (2004) and Carter, Lipsitz and Tilley (2005) for binary data. Log multinomial regression resulted in "inadmissible" solutions (out-of-bounds probabilities) exceeding 50% in some data settings. Coefficient estimates by the alternative methods produced out-of-bounds probabilities for the log multinomial model in up to 27% of samples to which a log multinomial model had been successfully fitted. The log multinomial coefficient estimates generally had lesser relative bias and mean squared error than the alternative methods. The practical utility of the log multinomial regression model was demonstrated with a real data example. The log multinomial model offers a practical solution to the problem of obtaining adjusted estimates of the risk ratio in the multinomial setting, but must be used with some care and attention to detail.

14.
Statistical aspects of genetic mapping in autopolyploids.
Many plant species of agricultural importance are polyploid, having more than two copies of each chromosome per cell. In this paper, we describe statistical methods for genetic map construction in autopolyploid species with particular reference to the use of molecular markers. The first step is to determine the dosage of each DNA fragment (electrophoretic band) from its segregation ratio. Fragments present in a single dose can be used to construct framework maps for individual chromosomes. Fragments present in multiple doses can often be used to link the single chromosome maps into homologous groups and provide additional ordering information. Marker phenotype probabilities were calculated for pairs of markers arranged in different configurations among the homologous chromosomes. These probabilities were used to compute a maximum likelihood estimator of the recombination fraction between pairs of markers. A likelihood ratio test for linkage of multidose markers was derived. The information provided by each configuration and power and sample size considerations are also discussed. A set of 294 RFLP markers scored on 90 plants of the species Saccharum spontaneum L. was used to illustrate the construction of an autopolyploid map. Previous studies conducted on the same data revealed that this species of sugar cane is an autooctaploid with 64 chromosomes arranged into eight homologous groups. The methodology described permitted consolidation of 54 linkage groups into ten homologous groups.

15.
Wall JD 《Genetics》2004,167(3):1461-1473
We introduce a new method for jointly estimating crossing-over and gene conversion rates using sequence polymorphism data. The method calculates probabilities for subsets of the data consisting of three segregating sites and then forms a composite likelihood by multiplying together the probabilities of many subsets. Simulations show that this new method performs better than previously proposed methods for estimating gene conversion rates, but that all methods require large amounts of data to provide reliable estimates. While existing methods can easily estimate an "average" gene conversion rate over many loci, they cannot reliably estimate gene conversion rates for a single region of the genome.

16.
The main causes of numerical chromosomal anomalies, including trisomies, arise from an error in chromosomal segregation during the meiotic process, named a non-disjunction. One of the most used techniques to analyze chromosomal anomalies nowadays is the polymerase chain reaction (PCR), which counts the number of peaks or alleles in a polymorphic microsatellite locus. It was shown in previous works that the number of peaks has a multinomial distribution whose probabilities depend on the non-disjunction fraction F. In this work, we propose a Bayesian approach for estimating the meiosis I non-disjunction fraction F in the absence of parental information. Since samples of trisomic patients are, in general, small, the Bayesian approach can be a good alternative for solving this problem. We consider the sampling/importance resampling technique and the Simpson rule to extract information from the posterior distribution of F. Bayes and maximum likelihood estimators are compared through a Monte Carlo simulation, focusing on the influence of different sample sizes and prior specifications on the estimates. We apply the proposed method to estimate F for patients with trisomy of chromosome 21, providing a sensitivity analysis for the method. The results obtained show that Bayes estimators are better in almost all situations.
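The sampling/importance resampling (SIR) step is generic and can be sketched independently of the trisomy model: draw from the prior, weight by the likelihood, resample in proportion to the weights. The `log_lik` function here is a placeholder for the multinomial peak-count likelihood in F, which is model-specific and not reproduced from the paper:

```python
import numpy as np

def sir_posterior(log_lik, n_draws=20000, n_keep=2000, rng=None):
    """Sampling/importance-resampling sketch for a parameter F in (0,1)
    with a uniform prior.  `log_lik` maps an array of F values to their
    log-likelihoods; the returned array approximates a posterior sample."""
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.uniform(0.0, 1.0, n_draws)      # draws from the prior
    logw = log_lik(draws)
    w = np.exp(logw - logw.max())               # stabilized importance weights
    w /= w.sum()
    return rng.choice(draws, size=n_keep, replace=True, p=w)
```

Posterior summaries (mean, credible intervals) are then just statistics of the resampled values, which is why SIR suits the small samples the abstract mentions.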

17.
In this paper we consider the detection of individual loci controlling quantitative traits of interest (quantitative trait loci or QTLs) in the large half-sib family structure found in some species. Two simple approaches using multiple markers are proposed, one using least squares and the other maximum likelihood. These methods are intended to provide a relatively fast screening of the entire genome to pinpoint regions of interest for further investigation. They are compared with a more traditional single-marker least-squares approach. The use of multiple markers is shown to increase power and has the advantage of providing an estimate for the location of the QTL. The maximum-likelihood and the least-squares approaches using multiple markers give similar power and estimates for the QTL location, although the likelihood approach also provides estimates of the QTL effect and sire heterozygote frequency. A number of assumptions have been made in order to make the likelihood calculations feasible, however, and computationally it is still more demanding than the least-squares approach. The least-squares approach using multiple markers provides a fast method that can easily be extended to include additional effects.

18.
The risk difference is an intelligible measure for comparing disease incidence in two exposure or treatment groups. Despite its convenience in interpretation, it is less prevalent in epidemiological and clinical areas where regression models are required in order to adjust for confounding. One major barrier to its popularity is that standard linear binomial or Poisson regression models can provide estimated probabilities out of the range of (0,1), resulting in possible convergence issues. For estimating adjusted risk differences, we propose a general framework covering various constraint approaches based on binomial and Poisson regression models. The proposed methods span the areas of ordinary least squares, maximum likelihood estimation, and Bayesian inference. Compared to existing approaches, our methods prevent estimates and confidence intervals of predicted probabilities from falling out of the valid range. Through extensive simulation studies, we demonstrate that the proposed methods solve the issue of having estimates or confidence limits of predicted probabilities out of (0,1), while offering performance comparable to its alternative in terms of the bias, variability, and coverage rates in point and interval estimation of the risk difference. An application study is performed using data from the Prospective Registry Evaluating Myocardial Infarction: Event and Recovery (PREMIER) study.
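To make the modeling setup concrete, here is a minimal linear-binomial ML fit in which the exposure coefficient is the risk difference; crude probability clipping stands in for the formal constraint approaches the abstract proposes (this is an illustration, not the authors' method):

```python
import numpy as np
from scipy.optimize import minimize

def risk_difference_ml(x, y):
    """ML fit of a linear-binomial (identity-link) model p = b0 + b1*x
    for a binary exposure x and binary outcome y; b1 is the risk
    difference.  Probabilities are clipped into (0,1) inside the
    likelihood as a crude stand-in for a formal range constraint."""
    eps = 1e-6
    def nll(b):
        p = np.clip(b[0] + b[1] * x, eps, 1.0 - eps)
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    fit = minimize(nll, x0=np.array([y.mean(), 0.0]), method="Nelder-Mead")
    return fit.x  # [baseline risk b0, risk difference b1]
```

With a single binary exposure and no confounders, the fit recovers the two group risks, so b1 equals the simple difference in proportions; adding covariate columns is where the out-of-range problem described above begins to bite.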

19.
BACKGROUND: Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher's intuition or as estimated by the maximum likelihood principle. Several methods exist by which to evaluate the maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele. FINDINGS: Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be precisely computed. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into maximum likelihood estimates, thus enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model's convergence. The functions are collectively available as the MLEP R package. CONCLUSIONS: Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to analysis using intuitive penetrance values for diseases with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at http://cran.r-project.org/web/packages/MLEP/index.html.

20.
Pedigree data can be evaluated, and subsequently corrected, by analysis of the distribution of genetic markers, taking account of the possibility of mistyping. Using a model of pedigree error developed previously, we obtained the maximum likelihood estimates of error parameters in pedigree data from Tokelau. Posterior probabilities for the possible true relationships in each family are conditional on the putative relationships, and the marker data are calculated using the parameter estimates. These probabilities are used as a basis for discriminating between pedigree error and genetic marker errors in families where inconsistencies have been observed. When applied to the Tokelau data and compared with the results of retyping inconsistent families, these statistical procedures are able to discriminate between pedigree and marker error, with approximately 90% accuracy, for families with two or more offspring. The large proportion of inconsistencies inferred to be due to marker error (61%) indicates the importance of discriminating between error sources when judging the reliability of putative relationship data. Application of our model of pedigree error has proved to be an efficient way of determining and subsequently correcting sources of error in extensive pedigree data collected in large surveys.

