Zhao JH, Sham PC. Human Heredity, 2002, 53(1):36-41
Linkage disequilibrium (LD) between tightly linked loci provides fine-mapping information on disease-predisposing allelic variants. The most common method of LD analysis involves unrelated cases and controls. We have previously proposed model-free and permutation tests for diseases with an unknown mode of inheritance that can be applied to several highly polymorphic loci. However, performing such analyses remained computationally intensive. In this report we propose a speed-up of both the gene-counting procedure and the permutation procedure. We demonstrate the improved method with an analysis of schizophrenia and human leucocyte antigen markers, and an analysis of alcoholism and mitochondrial aldehyde dehydrogenase markers. Our implementation also allows the rapid calculation of permutation-based LD measures and related statistics.
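
The permutation idea can be illustrated with a generic case-control test at one multi-allelic locus. This is a minimal sketch, not the authors' gene-counting implementation: it assumes genotypes are given as pairs of allele labels, uses a Pearson chi-square on the 2 x K table of allele counts as the statistic, and obtains a Monte Carlo p-value by shuffling disease labels over individuals. The function names are hypothetical.

```python
import random
from collections import Counter


def allele_chi2(genotypes, labels):
    """Pearson chi-square on the 2 x K table of allele counts (controls = 0, cases = 1)."""
    counts = {0: Counter(), 1: Counter()}
    for (a1, a2), status in zip(genotypes, labels):
        counts[status][a1] += 1
        counts[status][a2] += 1
    row = {s: sum(counts[s].values()) for s in (0, 1)}
    total = row[0] + row[1]
    stat = 0.0
    for allele in set(counts[0]) | set(counts[1]):
        col = counts[0][allele] + counts[1][allele]
        for s in (0, 1):
            expected = row[s] * col / total
            stat += (counts[s][allele] - expected) ** 2 / expected
    return stat


def permutation_pvalue(genotypes, labels, n_perm=999, seed=0):
    """Shuffle case/control labels over individuals and compare with the observed statistic."""
    rng = random.Random(seed)
    observed = allele_chi2(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if allele_chi2(genotypes, shuffled) >= observed:
            exceed += 1
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting whole individuals rather than single alleles keeps each person's two alleles together, so the test does not rely on Hardy-Weinberg equilibrium; the cost is recomputing the statistic once per permutation, which is the step the abstract's speed-up targets.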

The problem of estimating haplotype frequencies from unphased single nucleotide polymorphism (SNP) genotype data in sibships with and without parents is considered. We focus on the Fisher information of the parental haplotype frequencies in order to deal correctly with the dependence of haplotypes within sibships. We compare these Fisher information matrices with those obtained for unrelated individuals and study the relative efficiency of sibships with and without parents, compared with unrelated individuals, in estimating haplotype frequencies. Crudely summarized, the second sib contributes half the information of the first, except for rare haplotypes, where the second sib counts almost as a full individual. We argue that the relative efficiencies can also be used to correct for dependence when calculating standard errors after initially ignoring the dependence in the estimation phase.
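
As a reference point for such comparisons, consider the idealized case where phase is known. Assuming N unrelated individuals (2N independent haplotypes) and K haplotypes with frequencies \pi_1, ..., \pi_K, where \pi_K = 1 - \sum_{k<K} \pi_k is eliminated, the haplotype counts are multinomial and the Fisher information for the K - 1 free frequencies is:

```latex
I_{jk}(\pi) = 2N\left(\frac{\delta_{jk}}{\pi_j} + \frac{1}{\pi_K}\right),
\qquad j, k = 1, \dots, K-1,
```

where \delta_{jk} is the Kronecker delta. Missing phase can only lower this information, so the matrix serves as an upper bound for the unrelated-individual designs discussed above.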

Small area estimation methods typically combine direct estimates from a survey with predictions from a model in order to obtain estimates of population quantities with reduced mean squared error. When the auxiliary information used in the model is measured with error, using a small area estimator such as the Fay–Herriot estimator while ignoring measurement error may be worse than simply using the direct estimator. We propose a new small area estimator that accounts for sampling variability in the auxiliary information, and derive its properties, in particular showing that it is approximately unbiased. The estimator is applied to predict quantities measured in the U.S. National Health and Nutrition Examination Survey, with auxiliary information from the U.S. National Health Interview Survey.
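
For orientation, the standard Fay–Herriot estimator mentioned above shrinks each direct estimate toward a model prediction. The sketch below shows only that baseline form, not the proposed measurement-error estimator, and assumes the model variance A and the regression predictions are already known (in practice both are estimated). The function and argument names are illustrative.

```python
def fay_herriot(direct, sampling_var, synthetic, model_var):
    """Fay-Herriot composite estimates for each small area.

    direct       -- direct survey estimates y_i
    sampling_var -- their sampling variances D_i (assumed known)
    synthetic    -- regression predictions x_i' beta from the area-level model
    model_var    -- between-area model variance A (assumed known here)
    """
    estimates = []
    for y, d, s in zip(direct, sampling_var, synthetic):
        gamma = model_var / (model_var + d)  # weight placed on the direct estimate
        estimates.append(gamma * y + (1.0 - gamma) * s)
    return estimates
```

Areas with noisy direct estimates (large D_i) lean more heavily on the model. If x_i is itself a noisy survey estimate, this plain form ignores that extra variability, which is the problem the abstract addresses.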

MOTIVATION: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's disease or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one way of finding the linked variants is to look for case-control associations between haplotypes and disease. Finding these associations requires high-quality estimates of haplotype frequencies in the population. To this end, we present HaploPool, a method for estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and the accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm with three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster) and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (between 5 and 25 SNPs).

Recent literature has suggested that haplotype inference from close relatives, especially nuclear families, can be an alternative strategy for determining linkage phase. In this paper, we propose haplotype reconstruction and haplotype frequency estimation via the expectation-maximization (EM) algorithm for nuclear families in which only one parent is available. The available parent and the child are treated as a parent-child pair sharing one haplotype. This reduces the number of potential haplotype pairs for both parent and child, resulting in higher estimation accuracy. In a series of simulations, we compare PHASE, GENEHUNTER, an EM-based approach for complete nuclear families, and our approach. In all situations, the EM-based approach for trio data achieves error rates comparable to, though slightly worse than, those of PHASE; for incomplete trios, our approach is slightly more accurate and much faster than PHASE; and GENEHUNTER performs very poorly in simple nuclear-family settings, with its performance decreasing dramatically as the number of markers increases. Finally, a comparison of sampling designs shows that sampling trios is the most efficient design for estimating population haplotype frequencies at the same genotyping cost.
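
The effect of the shared-haplotype constraint can be seen by enumerating phase resolutions directly. This sketch is illustrative only and is not the paper's EM implementation: it assumes 0/1/2 genotype coding (copies of allele 1 at each SNP), lists every unordered haplotype pair consistent with a genotype, and keeps only the parent/child combinations that share at least one haplotype. The helper names are hypothetical.

```python
from itertools import product


def haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    Genotypes use 0/1/2 coding (copies of allele 1 per SNP); haplotypes are 0/1 tuples.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def parent_child_pairs(parent, child):
    """Joint phase resolutions in which parent and child share at least one haplotype."""
    return [
        (p, c)
        for p in sorted(haplotype_pairs(parent))
        for c in sorted(haplotype_pairs(child))
        if set(p) & set(c)
    ]
```

For example, a parent heterozygous at three SNPs has four possible phasings on their own, but paired with a child homozygous for haplotype (1, 0, 0) only the phasing ((0, 1, 1), (1, 0, 0)) remains.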

Group testing is frequently used to reduce the cost of screening a large number of individuals for infectious diseases or other binary characteristics when prevalence is low. In many applications, the goals include both identifying individuals as positive or negative and estimating the probability of positivity. The identification aspect leads to additional tests, known as "retests", beyond those performed on the initial groups of individuals. In this paper, we investigate how regression models can be fitted to estimate the probability of positivity while also incorporating the extra information from these retests. We present simulation evidence showing that significant gains in efficiency occur by incorporating retesting information, and we further examine which testing protocols are the most efficient to use. Our investigations also demonstrate that some group testing protocols can actually lead to more efficient estimates than individual testing when diagnostic tests are imperfect. The proposed methods are applied retrospectively to chlamydia screening data from the Infertility Prevention Project. We demonstrate that significant cost savings could occur through the use of particular group testing protocols.
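
Where the retests come from can be seen in the classical Dorfman two-stage protocol: pools of size k are tested, and every member of a positive pool is retested individually. This sketch computes only the expected number of tests per individual under that protocol, assuming a perfect assay and independent positives with prevalence p; it does not implement the regression estimators of the paper, and the function names are illustrative.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per individual for Dorfman two-stage testing with pool size k.

    Assumes a perfect assay and independent positives with prevalence p.
    One pool test is shared by k people; a positive pool (probability
    1 - (1 - p)**k) triggers k individual retests.
    """
    if k == 1:
        return 1.0  # plain individual testing
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_group_size(p, max_k=50):
    """Pool size in 1..max_k minimizing the expected number of tests per person."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))
```

At 1% prevalence the optimum under these assumptions is a pool size of 11, at roughly 0.196 tests per person, about a fivefold saving over individual testing.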

A H Racine-Poon, D G Hoel. Biometrics, 1984, 40(4):1151-1158
A nonparametric estimator for the survival function, accommodating censored survival times and uncertainty in the assignment of cause of death, is proposed. For example, in a carcinogenicity experiment the data on each animal may consist of an observed age-at-death and some indication of the probability that the tumor type under study caused death. An estimator of the net survival function, for time-to-death due to the cause of interest, is developed. Under certain assumptions, the proposed estimator is consistent and asymptotically normally distributed. Monte Carlo simulations were used to compare this estimator with the Kaplan-Meier estimator. Forcing the cause of death to be specified with certainty, as required by the Kaplan-Meier estimator, may result in substantial biases.
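
One simple way to let cause-of-death probabilities enter a product-limit estimator is to count each death as a fractional event, weighted by the probability that it was due to the cause of interest. The sketch below does exactly that; it is an illustrative assumption and is not claimed to be the Racine-Poon and Hoel estimator. With weights restricted to 0 (censored or other cause) and 1 (cause of interest) it reduces to the ordinary Kaplan-Meier estimator. The function name is hypothetical.

```python
def weighted_product_limit(times, cause_prob):
    """Product-limit curve where each death counts as a fractional event.

    times      -- observed times (death or censoring) for each animal
    cause_prob -- probability that the death was due to the cause of interest
                  (use 0 for censored observations)
    Returns a list of (time, survival) pairs at each distinct observed time.
    """
    data = sorted(zip(times, cause_prob))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0.0
        j = i
        while j < len(data) and data[j][0] == t:  # pool ties at time t
            events += data[j][1]
            j += 1
        surv *= 1.0 - events / n_at_risk
        curve.append((t, surv))
        n_at_risk -= j - i  # everyone observed at t leaves the risk set
        i = j
    return curve
```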

Haplotype analyses have become increasingly common in genetic studies of human disease because of their ability to identify unique chromosomal segments likely to harbor disease-predisposing genes. The study of haplotypes is also used to investigate many population processes, such as migration and immigration rates, linkage-disequilibrium strength, and the relatedness of populations. Unfortunately, many haplotype-analysis methods require phase information that can be difficult to obtain from samples of nonhaploid species. There are, however, strategies for estimating haplotype frequencies from unphased diploid genotype data collected on a sample of individuals that make use of the expectation-maximization (EM) algorithm to overcome the missing phase information. The accuracy of such strategies, compared with other phase-determination methods, must be assessed before their use can be advocated. In this study, we consider and explore sources of error between EM-derived haplotype frequency estimates and their population parameters, noting that much of this error is due to sampling error, which is inherent in all studies, even when phase can be determined. In light of this, we focus on the additional error between haplotype frequencies within a sample data set and EM-derived haplotype frequency estimates incurred by the estimation procedure. We assess the accuracy of haplotype frequency estimation as a function of a number of factors, including sample size, number of loci studied, allele frequencies, and locus-specific allelic departures from Hardy-Weinberg and linkage equilibrium. We point out the relative impacts of sampling error and estimation error, calling attention to the pronounced accuracy of EM estimates once sampling error has been accounted for. 
We also suggest that many factors that may influence accuracy can be assessed empirically within a data set, a fact that can be used to create "diagnostics" that a user can turn to for assessing potential inaccuracies in estimation.
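
The EM strategy described above can be sketched compactly for biallelic SNPs. This is a generic textbook version, not the code evaluated in the study: it assumes 0/1/2 genotype coding, Hardy-Weinberg equilibrium, and full enumeration of phase resolutions (feasible only for a handful of loci). The E-step splits each individual's two haplotype copies across compatible phasings in proportion to current frequencies; the M-step renormalizes the expected counts. Function names are illustrative.

```python
from collections import defaultdict
from itertools import product


def resolutions(genotype):
    """Unordered haplotype pairs compatible with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]
    out = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        out.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return list(out)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """EM estimates of haplotype frequencies from unphased genotypes (assumes HWE)."""
    res = [resolutions(g) for g in genotypes]
    haps = sorted({h for pairs in res for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in res:
            # E-step: posterior weight of each phasing under HWE.
            w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(w)
            for (a, b), wi in zip(pairs, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        if max(abs(new[h] - freq[h]) for h in haps) < tol:
            return new
        freq = new
    return freq
```

Only double heterozygotes are ambiguous at two loci; when the unambiguous individuals carry mostly (1, 1) and (0, 0) haplotypes, EM assigns the ambiguous individuals to that phasing as well.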

Procedures for ranking candidates for selection and for estimating genetic and environmental parameters when variances are heterogeneous are discussed. The best linear unbiased predictor (BLUP) accounts automatically for heterogeneous variance provided that the covariance structure is known and that the assumptions of the model hold. Under multivariate normality BLUP allowing for heterogeneous variance maximizes expected genetic progress. Examples of application of BLUP to selection when residual or genetic variances are heterogeneous are given. Restricted maximum likelihood estimation of heterogeneous variances and covariances via the expectation-maximization algorithm is presented.
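
How BLUP absorbs heterogeneous variance can be seen in the simplest one-way model y_ij = mu + u_i + e_ij, with var(u_i) = var_u and a group-specific residual variance var_e_i. Assuming mu and all variances are known (a simplification; in general mu is estimated jointly via the mixed model equations), the BLUP of u_i is the group mean deviation shrunk by n_i var_u / (n_i var_u + var_e_i). The function name and data layout are illustrative.

```python
def blup_group_effects(groups, mu, var_u):
    """BLUP of random group effects in y_ij = mu + u_i + e_ij with known mu.

    groups -- list of (records, residual_variance) tuples, one per group;
              residual variances may differ between groups
    var_u  -- variance of the random group effects
    """
    effects = []
    for records, var_e in groups:
        n = len(records)
        ybar = sum(records) / n
        shrink = n * var_u / (n * var_u + var_e)  # noisier groups shrink more
        effects.append(shrink * (ybar - mu))
    return effects
```

Two groups with identical records (mean 12, mu = 10, var_u = 1, four records each) receive predicted effects of 1.0 and 0.5 when their residual variances are 4 and 12, so ranking on raw means would ignore the extra noise in the second group.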

The spread of infectious disease is determined by biological factors, e.g. the duration of the infectious period, and social factors, e.g. the arrangement of potentially contagious contacts. Repetitiveness and clustering of contacts are known to influence the transmission of droplet- or contact-transmitted diseases. However, it is not yet fully understood under what conditions repetitiveness and clustering must be included to model disease spread realistically.

The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.
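
The block-free tagging-SNP approach is usually built on the pairwise LD measure r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), with D = p_AB - p_A p_B. The sketch below computes r^2 and then picks tags greedily, each time choosing the SNP that covers the most still-untagged SNPs at r^2 >= threshold. The greedy rule and the 0.8 default are common choices assumed here for illustration, not the specific procedure used in the study; function names are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs from the A-B haplotype and allele frequencies."""
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def greedy_tag_snps(r2, threshold=0.8):
    """Greedy tag selection from a symmetric r^2 matrix with ones on the diagonal."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(untagged),
                   key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags
```

Because the diagonal of the r^2 matrix is 1, every selected SNP tags at least itself, so the loop always terminates.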

Estimating the mutation rate, or equivalently the effective population size, is a common task in population genetics. If recombination is low or high, optimal linear estimation methods are known and well understood. For intermediate recombination rates, the calculation of optimal estimators is more challenging. As an alternative to model-based estimation, neural networks and other machine learning tools could help to develop good estimators in these involved scenarios. However, if no benchmark is available, it is difficult to assess how well suited these tools are for different applications in population genetics.
Here we investigate feedforward neural networks for estimating the mutation rate from the site frequency spectrum and compare their performance with that of model-based estimators. For this we use the model-based estimators introduced by Fu, Futschik et al., and Watterson, which minimize the variance or mean squared error for no recombination and for free recombination. We find that neural networks reproduce these estimators if provided with the appropriate features and training sets. Remarkably, when the model-based estimators are used to adjust the weights of the training data, a network with only one hidden layer suffices to obtain a single estimator that performs almost as well as the model-based estimators for low and high recombination rates, and at the same time provides a superior estimation method for intermediate recombination rates. We apply the method to simulated data based on the human chromosome 2 recombination map, highlighting its robustness in a realistic setting where local recombination rates vary and/or are unknown.
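
Classical estimators of this kind are linear combinations of the site frequency spectrum (SFS). As a minimal illustration, assuming an unfolded SFS given as counts xi_1, ..., xi_{n-1} of sites where the derived allele appears i times in a sample of n sequences, Watterson's estimator is S / a_n with a_n = sum_{i=1}^{n-1} 1/i, and Tajima's nucleotide-diversity estimator is sum_i i(n - i) xi_i / C(n, 2). The function names are illustrative.

```python
from math import comb


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by a_n."""
    n = len(sfs) + 1  # sample size implied by an unfolded SFS of length n - 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def tajima_pi(sfs):
    """Mean pairwise difference computed from the unfolded SFS."""
    n = len(sfs) + 1
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / comb(n, 2)
```

For n = 4 and the SFS (6, 3, 2), which matches the neutral expectation xi_i = theta / i at theta = 6, both estimators return 6.0. They differ only in how they weight the SFS entries, which is the degree of freedom that optimal linear estimators and the trained networks exploit.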
