Similar Articles
20 similar articles found.
1.
Zhao JH, Sham PC. Human Heredity 2002;53(1):36-41
Linkage disequilibrium (LD) between tightly linked loci provides fine-mapping information on disease-predisposing allelic variants. The most common method of LD analysis uses unrelated cases and controls. We have previously proposed model-free and permutation tests for diseases with an unknown mode of inheritance that can be applied to several highly polymorphic loci. However, such analyses remained computationally intensive. In this report we propose a speed-up of both the gene-counting procedure and the permutation procedure. We demonstrate the improved method with an analysis of schizophrenia and human leucocyte antigen markers, and an analysis of alcoholism and mitochondrial aldehyde dehydrogenase markers. Our implementation also allows the rapid calculation of permutation-based LD measures and related statistics.
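A naive version of the case-control permutation idea can be sketched as follows, assuming a single multi-allelic marker, genotypes given as allele pairs, and a Pearson chi-square statistic on the 2 × K allele table. The function names are hypothetical, and this does not reproduce the authors' accelerated gene-counting or permutation procedures; it only shows why speed matters, since every permutation recomputes the statistic.

```python
import random
from collections import Counter


def chi_square(cases, controls):
    """Pearson chi-square statistic for a 2 x K table of allele counts."""
    n_case, n_ctrl = sum(cases), sum(controls)
    total = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue  # allele absent from both groups: contributes nothing
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def allele_counts(genotypes, alleles):
    """Count alleles over a list of (allele1, allele2) genotypes."""
    counts = Counter(a for g in genotypes for a in g)
    return [counts[x] for x in alleles]


def permutation_test(case_genotypes, control_genotypes, n_perm=1000, seed=0):
    """Shuffle case/control labels over individuals; return (statistic, p-value)."""
    rng = random.Random(seed)
    alleles = sorted({a for g in case_genotypes + control_genotypes for a in g})

    def statistic(cases, controls):
        return chi_square(allele_counts(cases, alleles), allele_counts(controls, alleles))

    observed = statistic(case_genotypes, control_genotypes)
    pooled = case_genotypes + control_genotypes  # new list; inputs are not shuffled
    n_case = len(case_genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling labels over individuals rather than over single alleles keeps each genotype intact, so the permutation null does not rely on Hardy-Weinberg equilibrium.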

2.
In genetic studies, the haplotype structure of the population under study is expected to carry important information. Experimental methods for deriving haplotypes, however, are expensive, and none has yet become standard methodology. On the other hand, maximum likelihood haplotype estimation from unphased individual genotypes may be inaccurate. We therefore investigated the relative efficiency of haplotype frequency estimation when nuclear family information is included, compared with estimation from experimentally derived haplotypes. Efficiency was measured in terms of variance ratios of the estimates. The variances were derived from the binomial distribution for experimentally derived haplotypes, and from the Fisher information matrix corresponding to the general likelihood function of the haplotype frequency parameters, including family information. We then compared these variance ratios with those for estimation from individual genotypes. We found that the information gained from a single child compensates for missing phase information to a high degree, yielding estimates almost as reliable as those derived from observed haplotypes. Thus, if children have already been genotyped for other reasons, it is highly advisable to include them in the estimation. If child information is not already available, whether it is worth genotyping a single child just to reduce phase ambiguity depends on the number of loci and the haplotype diversity. In general, if the number of loci is three or fewer, or if the number of haplotypes with a frequency >5% is four or fewer, haplotype estimation from individuals is already quite good, and the improvement gained from a single child does not justify the cost of genotyping that child.
On the other hand, in scenarios with many loci and high haplotype diversity, haplotype frequency estimation from trios can be more efficient than estimation from individuals, even on a per-genotype basis.
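For directly observed haplotypes, the binomial variance referred to above has a closed form. As a sketch, with n unrelated individuals contributing 2n haplotypes, x_h observed copies of haplotype h, and I(p) the Fisher information matrix of the design being compared (this notation, and the orientation of the ratio, are assumptions made here rather than taken from the paper), the variance ratio used as an efficiency measure can be written as:

```latex
\hat{p}_h = \frac{x_h}{2n}, \qquad
\operatorname{Var}\!\left(\hat{p}_h\right) = \frac{p_h\,(1 - p_h)}{2n}, \qquad
\text{ratio}_h = \frac{\left[\mathbf{I}(\mathbf{p})^{-1}\right]_{hh}}{p_h\,(1 - p_h)/(2n)}
```

With this orientation, a ratio close to 1 means the design (for example, unphased genotypes plus one genotyped child) is almost as informative about p_h as observed haplotypes, and larger values indicate lost phase information.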

3.
Recent literature has suggested that haplotype inference using close relatives, especially nuclear families, can be an alternative strategy for determining linkage phase. In this paper, we propose haplotype reconstruction and estimation of haplotype frequencies via the expectation-maximization (EM) algorithm, including nuclear families in which only one parent is available. The available parent and the child are treated as a parent-child pair sharing one haplotype. This reduces the number of potential haplotype pairs for both parent and child, resulting in more accurate estimation. In a series of simulations, we compared PHASE, GENEHUNTER, an EM-based approach for complete nuclear families, and our approach. In all situations, the EM-based approach for trio data had error rates comparable to, but slightly worse than, those of PHASE; for incomplete trios, our approach was slightly more accurate and much faster than PHASE; and GENEHUNTER performed very poorly in simple nuclear-family settings, with performance decreasing dramatically as the number of markers increased. In addition, a comparison of different sampling designs showed that sampling trios is the most efficient design for estimating population haplotype frequencies at the same genotyping cost.
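The constraint that a parent and child share one haplotype can be illustrated by brute-force enumeration. The sketch below assumes biallelic SNPs with genotypes coded 0, 1, 2 (copies of allele 1); the function names are hypothetical, and it shows only how the shared-haplotype rule prunes candidate pairs, not the authors' EM estimation.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2 (copies of allele 1 at each SNP).
    A genotype with k heterozygous sites yields 2 ** (k - 1) pairs (1 if k == 0).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def prune_by_shared_haplotype(parent_genotype, child_genotype):
    """Keep only parent and child pairs that have at least one haplotype in common."""
    parent = compatible_pairs(parent_genotype)
    child = compatible_pairs(child_genotype)
    kept_parent = {p for p in parent if any(set(p) & set(c) for c in child)}
    kept_child = {c for c in child if any(set(p) & set(c) for p in parent)}
    return kept_parent, kept_child
```

For a parent heterozygous at two SNPs, genotype (1, 1), there are two phase configurations; if the child is homozygous (2, 2), only the configuration containing haplotype (1, 1) survives.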

4.
Xu H, Wu X, Spitz MR, Shete S. Human Heredity 2004;58(2):63-68
OBJECTIVE: Haplotypes are gaining popularity in studies of human genetics because they contain more information than does a single gene locus. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes based on unphased genotypes at several loci. The accuracy, efficiency, and computational time of these methods have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotypic data from unrelated individuals. METHODS: We compared the performance of three haplotype inference methods that are currently in use--HAPLOTYPER, hap, and PHASE--by applying them to a large data set from unrelated individuals with known haplotypes. We also applied these methods to coalescent-based simulation studies using both constant size and exponential growth models. The performance of these methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study. RESULTS: While the algorithm implemented in the software PHASE was found to be the most accurate in both real and simulated data comparisons, all four methods produced good results in the association study.

5.
6.
Despite the potential pitfalls of population stratification, population-based association studies are nowadays conducted more often than family-based association studies. However, genomic imprinting has lately been implicated in the etiology of complex genetic diseases, and it can be detected statistically only in family-based designs. Powerful tests for association and imprinting have been proposed previously for case-parent trios and single markers. Since the power of association studies can be improved by considering multiple affected children and haplotypes, we extended the parental asymmetry test (PAT) for imprinting to a test suited to both general nuclear families and haplotypes, called HAP-PAT. Significance of the HAP-PAT is determined via a Monte Carlo simulation procedure. In addition to the HAP-PAT, we modified a haplotype-based association test that we proposed previously so that only paternal or only maternal transmissions contribute to the test statistic. The approaches were implemented in FAMHAP, and we evaluated their performance under a variety of disease models. We demonstrated the usefulness of our haplotype-based approaches for detecting parent-of-origin effects. Furthermore, we showed that, even in the presence of imprinting, it is preferable to consider all affected children of a nuclear family rather than to randomly select one affected child from each family and conduct a trio study using the selected individuals.
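The abstract does not give the HAP-PAT statistic, so the following is only a generic illustration of Monte Carlo significance for a parent-of-origin comparison. It assumes hypothetical counts of informative transmissions whose parental origin is known, and a null hypothesis under which each transmission is equally likely to be paternal or maternal; it is not the PAT or HAP-PAT.

```python
import random


def monte_carlo_parent_of_origin(paternal, maternal, n_sim=10000, seed=1):
    """Two-sided Monte Carlo p-value for |paternal - maternal| under a 50:50 null.

    paternal, maternal: counts of informative transmissions by parental origin
    (hypothetical inputs; the abstract does not define them this way).
    """
    rng = random.Random(seed)
    n = paternal + maternal
    observed = abs(paternal - maternal)
    extreme = 0
    for _ in range(n_sim):
        # simulate n transmissions, each paternal with probability 1/2
        sim_paternal = sum(rng.random() < 0.5 for _ in range(n))
        if abs(2 * sim_paternal - n) >= observed:
            extreme += 1
    return (extreme + 1) / (n_sim + 1)
```

Simulating the null distribution in this way generalizes to statistics without a convenient closed-form distribution, which is the usual motivation for Monte Carlo significance in family designs.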

7.
8.
9.
Approximately 90 different mutations associated with ornithine transcarbamylase (OTC) deficiency are currently known, the majority of them private mutations. However, some mutations appear to be recurrent. Our laboratories identified apparently deleterious mutations in 78 consecutive families with OTC deficiency by screening all exons and exon/intron borders using single-strand conformation polymorphism analysis (75 families) or by sequencing the entire coding sequence (3 families). Large deletions of one or more exons were found in 8% of families, and approximately 10% had small deletions or insertions of 1–5 bases. Splice-site mutations were found in 18% of families. Contrary to previous reports, recurrent point mutations appeared to be distributed evenly among most CpG dinucleotides rather than concentrated in a few prevalent mutations. No single point mutation had a relative frequency of more than 6.4%. Of the 64 families with nucleotide substitutions, 24 (38%) carried G to A substitutions, the next most common being C to T (16%) and A to T (11%).
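CpG dinucleotides are known mutational hotspots: deamination of methylated cytosine produces C to T transitions, which appear as G to A on the opposite strand. A minimal sketch of classifying substitutions in that way, assuming a plain reference sequence string with 0-based positions (the function names are hypothetical):

```python
def cpg_sites(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def is_cpg_transition(seq, pos, ref, alt):
    """True for C>T at the C of a CpG, or G>A at the G of a CpG (other strand)."""
    seq, ref, alt = seq.upper(), ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError(f"reference base {ref!r} does not match {seq[pos]!r} at {pos}")
    if ref == "C" and alt == "T":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if ref == "G" and alt == "A":
        return pos > 0 and seq[pos - 1] == "C"
    return False
```

Tallying is_cpg_transition over a list of substitutions gives the fraction falling at CpG sites. For reference, the abstract's 24 G to A substitutions out of 64 is 37.5%, reported as 38%.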

10.
MOTIVATION: The search for genetic variants linked to complex diseases such as cancer, Parkinson's disease, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one way to find the linked variants is to look for case-control associations between haplotypes and disease. Finding these associations requires high-quality estimates of the haplotype frequencies in the population. To this end, we present HaploPool, a method for estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and the accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can also be used to phase non-pooled genotype data with accuracy approaching that of PHASE. We compared our algorithm with three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster) and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors, and long haplotype blocks (between 5 and 25 SNPs).
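Pooling discards phase: only per-SNP allele counts across all pooled chromosomes are observed. The brute-force sketch below, assuming biallelic SNPs coded 0/1 and hypothetical function names, lists every multiset of haplotypes consistent with a pool's counts. It is exponential in block length and is not HaploPool's algorithm.

```python
from itertools import combinations_with_replacement, product


def pooled_counts(haplotypes):
    """Per-SNP count of allele 1 across all haplotypes in a pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def consistent_multisets(counts, n_haplotypes):
    """All multisets of n_haplotypes binary haplotypes reproducing the pooled counts.

    A pool of two diploid individuals contributes n_haplotypes = 4.
    """
    all_haplotypes = list(product((0, 1), repeat=len(counts)))
    return [
        combo
        for combo in combinations_with_replacement(all_haplotypes, n_haplotypes)
        if pooled_counts(combo) == tuple(counts)
    ]
```

Even a pool of two individuals typed at two SNPs with counts (2, 2) admits three haplotype compositions, which is why frequency estimation from pools needs a likelihood over such compositions.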

11.
Missing data occur in genetic association studies for several reasons, including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies because of a possible loss of robustness to confounding by population stratification. Here, a novel likelihood for nuclear families is proposed in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and loses only a little robustness when data are missing. It also allows a novel conditioning step that gives valid analysis of multiple offspring in the presence of linkage. Unrelated subjects are included by treating them as children of two missing parents. Simulations and theory indicate operating characteristics similar to those of TRANSMIT, but without bias from missing data in the presence of linkage. Compared with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. Compared with APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in the software UNPHASED, available from the author.

12.
The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series in which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method for estimating haplotype-specific risks from such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of recently described methods based on haplotype imputation (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002), which provide score tests of the null hypothesis of no effect of SNP haplotypes on risk, so that they may be used for more complex tasks such as providing confidence intervals and testing the equivalence of haplotype-specific risks in two or more separate populations. To do so, we (1) develop a cohort approach to odds ratio analysis by extending the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection; and (3) in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study), compare likelihood-based confidence interval estimates from the two methods with each other and with the single-imputation approach of Zaykin et al. applied under both the null and alternative hypotheses. We conclude that as long as haplotypes are well predicted by SNP genotypes (we use the Rh2 criterion of Stram et al. [1]), the differences among the three methods are very small, and in particular that the single-imputation method may be expected to work extremely well.
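The R_h² criterion mentioned above compares the variance of a haplotype's expected (imputed) count with the variance of its true count. A sketch under stated assumptions: biallelic SNPs coded 0/1/2, haplotype frequencies already estimated, Hardy-Weinberg equilibrium (so the true count has variance 2p_h(1 - p_h)), and hypothetical function names. The expected_dosage values are the kind of imputed haplotype counts that can be entered into a logistic regression.

```python
from itertools import product
from statistics import pvariance


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def expected_dosage(genotype, freqs, target):
    """Posterior expected number of copies (0-2) of haplotype `target`, assuming HWE."""
    weights = {
        (a, b): freqs.get(a, 0.0) * freqs.get(b, 0.0) * (1 if a == b else 2)
        for a, b in compatible_pairs(genotype)
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotype is incompatible with the given haplotype frequencies")
    return sum(w * ((a == target) + (b == target)) for (a, b), w in weights.items()) / total


def r_h_squared(genotypes, freqs, target):
    """Variance of expected dosages divided by the HWE variance 2p(1 - p) of the true count."""
    dosages = [expected_dosage(g, freqs, target) for g in genotypes]
    p = freqs[target]
    return pvariance(dosages) / (2 * p * (1 - p))
```

Values near 1 indicate that the haplotype is well predicted by the SNP genotypes. If only haplotypes 00 and 11 exist, phase is never ambiguous and each dosage is exact; with all four two-SNP haplotypes equally frequent, a double heterozygote gets an expected dosage of 0.5 for haplotype 11.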

13.
Variance between and within sibships in anthropometric traits was assessed, by sex, in a sample of Mexican families in the U.S.A. (migrants) and in Mexico (sedentes). The effect of age was removed by standardization. Sibling intraclass correlation coefficients, obtained by one-way analysis of variance, differed between the sexes for various anthropometric traits. Variance between sibships was significantly higher than variance within sibships for all traits in each sex, in both migrant and sedente sibships. This result, also noted in other groups, appears to reflect a general population phenomenon.
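The sibling intraclass correlation from a one-way analysis of variance can be sketched as below, assuming age-standardized trait values grouped by sibship; k0 is the usual effective group size for unbalanced designs, and the function name is hypothetical.

```python
def sibship_icc(groups):
    """One-way ANOVA intraclass correlation for (possibly unbalanced) sibships.

    groups: list of lists of age-standardized trait values, one list per sibship.
    Returns (icc, F) with F = MS_between / MS_within.
    """
    groups = [g for g in groups if g]
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # k0: effective sibship size for unbalanced designs (equals k when all sizes are k)
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    icc = (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)
    return icc, ms_between / ms_within
```

Comparing F with the F distribution on (g - 1, N - g) degrees of freedom tests whether between-sibship variance exceeds within-sibship variance, which is the comparison the abstract reports.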

14.
The hypothesis of linkage between HLA and a disease susceptibility (DS) locus (or loci) for type 1 diabetes was tested. HLA segregation was random among 57 non-diabetic sibs but not among 39 diabetic sibs, suggesting that susceptibility to type 1 diabetes may be due to one or more HLA-linked genes. The data did not fit a genetic model involving either a single recessive or a single dominant gene. The excess of HLA-identical diabetic sibs and the deficit of HLA-discordant ones, relative to expected numbers, indicated that factors on both the paternal and maternal haplotypes were necessary for DS. In 1 of the 3 families with a diabetic parent and more than one diabetic sib, the diabetic sibs inherited different haplotypes from the affected parent, suggesting that either of these haplotypes conferred DS. HLA-B8, B18, and B40 were more frequent among 97 unrelated type 1 diabetics than among 238 controls, especially among those with onset before age 10 years. This early-onset group may represent a subtype of type 1 diabetes.
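Under random segregation, two sibs share 2, 1, or 0 parental HLA haplotypes with probabilities 1/4, 1/2, and 1/4. The goodness-of-fit sketch below uses hypothetical counts and assumes independent sib pairs (pairs drawn from the same sibship are not independent). With 2 degrees of freedom, the chi-square tail probability is exactly exp(-x/2). The function name is hypothetical.

```python
import math


def sib_sharing_test(share2, share1, share0):
    """Chi-square goodness of fit of sib-pair HLA haplotype sharing to 1/4 : 1/2 : 1/4.

    Returns (statistic, p_value); with 2 degrees of freedom p = exp(-statistic / 2).
    """
    n = share2 + share1 + share0
    observed = (share2, share1, share0)
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)
```

An excess in the first cell (HLA-identical pairs) together with a deficit in the last (HLA-discordant pairs) is the pattern the abstract describes.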

15.
Summary: In family-based genetic studies, it is often useful to identify a subset of unrelated individuals. When such studies are conducted in population isolates, however, most if not all individuals are often detectably related to each other. To identify a set of maximally unrelated (or, equivalently, minimally related) individuals, we have implemented simulated annealing, a general-purpose algorithm for solving difficult combinatorial optimization problems. We illustrate our method on data from a genetic study in the Old Order Amish of Lancaster County, Pennsylvania, a population isolate derived from a modest number of founders. Given one or more pedigrees, our program automatically and rapidly extracts a fixed number of maximally unrelated individuals. Availability: http://www.hg.med.umich.edu/labs/douglaslab/software.html (version 1.0.0) Contact: jddoug{at}umich.edu Associate Editor: Martin Bishop
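The simulated annealing approach can be sketched on a kinship matrix. Assumptions: pairwise kinship coefficients are already computed for every pair (for example, from the pedigrees), the objective is the sum of within-subset kinship, moves swap one selected individual for an unselected one, and cooling is geometric. The function names and default parameters are hypothetical and are not taken from the authors' program.

```python
import math
import random


def total_kinship(subset, kin):
    """Sum of pairwise kinship coefficients within a subset of individuals."""
    s = sorted(subset)
    return sum(kin[a][b] for i, a in enumerate(s) for b in s[i + 1:])


def anneal_unrelated(kin, k, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing for a size-k subset with minimal total kinship.

    kin: symmetric n x n matrix of kinship coefficients (requires 0 < k < n).
    Move: swap one selected individual for one unselected individual.
    """
    rng = random.Random(seed)
    n = len(kin)
    current = set(rng.sample(range(n), k))
    cost = total_kinship(current, kin)
    best, best_cost = set(current), cost
    t = t0
    for _ in range(n_iter):
        leaving = rng.choice(sorted(current))
        joining = rng.choice([i for i in range(n) if i not in current])
        candidate = (current - {leaving}) | {joining}
        new_cost = total_kinship(candidate, kin)
        delta = new_cost - cost
        # always accept improvements; accept worse moves with probability exp(-delta / t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = set(current), cost
        t *= cooling  # geometric cooling schedule
    return best, best_cost
```

Recomputing the full sum at every step costs O(k²); updating the cost incrementally after each swap would be the obvious optimization for large pedigrees.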

16.
The main mutation causing Friedreich ataxia (FRDA) is the expansion of a GAA repeat located in the intron between exon 1 and exon 2 of the X25 gene. This expansion has been observed on 98% of FRDA chromosomes. To analyze the frequencies of markers tightly linked to the FRDA gene and to investigate whether German FRDA families share a limited number of ancestral chromosomes, we performed a detailed analysis using nine polymorphic markers. We found strong linkage disequilibrium and an association of FRDA expansions with a few haplotypes. FRDA haplotypes differed significantly from control haplotypes. Our results confirm that GAA repeat expansions in intron 1 of the frataxin gene are confined to a few chromosomes and indicate a clear founder effect in German patients. Based on these analyses, we estimate the minimum age of the mutation at 107 generations.
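The abstract does not state how the 107-generation estimate was obtained. One widely used approximation, shown here only as an assumption, measures the excess of a marker allele on disease chromosomes as δ = (p_D - p_N)/(1 - p_N) and lets it decay by a factor (1 - θ) per generation from complete initial disequilibrium, so that g = ln δ / ln(1 - θ). It ignores population growth and marker mutation, and the function names are hypothetical.

```python
import math


def excess_delta(p_disease, p_normal):
    """Excess frequency of a marker allele on disease chromosomes (Pexcess-style delta)."""
    return (p_disease - p_normal) / (1 - p_normal)


def generations_since_mutation(delta, theta):
    """Solve delta = (1 - theta) ** g for g, assuming complete initial disequilibrium."""
    if not (0 < delta <= 1 and 0 < theta < 1):
        raise ValueError("need 0 < delta <= 1 and 0 < theta < 1")
    return math.log(delta) / math.log(1 - theta)
```

Because any recombination that is missed or any initial disequilibrium below 1 biases g downward, estimates of this kind are usually described as minimum ages, which matches the wording of the abstract.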

17.
The angiotensin I-converting enzyme (ACE) gene (17q23) is a candidate gene for essential hypertension and related diseases, but investigation of its role in human pathology is hampered by a lack of identified polymorphisms. Currently, a 287-bp insertion/deletion (I/D) RFLP in intron 16 is the only one known. Additional polymorphisms in the ACE gene would make most families informative for linkage studies and would allow haplotypes to be assigned in association studies. To increase the information provided by the ACE gene, we used a sensitive screening technique, denaturing gradient gel electrophoresis (DGGE) blots, to identify polymorphisms and combined this with gene counting to identify haplotypes. Five independent polymorphisms, restriction fragment melting polymorphisms (RFMPs), were identified by four probes (encompassing half of the ACE cDNA) in digests produced by three restriction enzymes (DdeI, RsaI, and AluI). One RFMP has three alleles, while the others have two. In a sample of 67 unrelated control subjects, minor allele frequencies ranged from 0.12 to 0.49. Significant linkage disequilibrium was found for all pairs of markers. The four most informative RFMPs, taken in combination, define 24 potential haplotypes. Based on gene counting, 11 of the 24 are rare or absent in this population, and the estimated heterozygosity of the remaining 13 haplotypes approaches 80%. Under these conditions, phase-unknown genotypes at the ACE locus could be assigned to haplotype pairs in unrelated subjects with reasonable certainty. Thus, by using the DGGE blot technique to identify numerous DNA polymorphisms at a candidate locus, in combination with gene counting, one can often identify DNA haplotypes for both related and unrelated study subjects at that locus. These markers in the ACE gene should be useful for clinical and epidemiologic studies of the role of ACE in human disease.
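Two numbers in this abstract follow from simple formulas. If the three-allele RFMP is among the four most informative markers (consistent with the stated total), the number of potential haplotypes is 3 × 2 × 2 × 2 = 24. Haplotype heterozygosity is commonly computed as 1 - Σ p_i², optionally with Nei's small-sample factor n/(n - 1) for n sampled chromosomes. A sketch with hypothetical function names:

```python
from math import prod


def n_possible_haplotypes(alleles_per_marker):
    """Number of distinct haplotypes from markers with the given allele counts."""
    return prod(alleles_per_marker)


def haplotype_heterozygosity(freqs, sample_size=None):
    """1 - sum(p_i^2); optionally scaled by n/(n - 1) for n sampled chromosomes."""
    h = 1.0 - sum(p * p for p in freqs)
    if sample_size is not None:
        h *= sample_size / (sample_size - 1)
    return h
```

For example, four equally frequent haplotypes give a heterozygosity of 0.75; reaching about 80% requires at least five reasonably common haplotypes, which is consistent with 13 haplotypes remaining after the rare ones are excluded.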

18.
Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference to handle father-mother-child trios. We performed a comprehensive assessment of the methods applied both to trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the expected levels of genotyping error and missing data. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
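For reference, r² between two biallelic SNPs is D²/[p_A(1 - p_A)p_B(1 - p_B)], with D = p_AB - p_A p_B. A sketch computing it from phased haplotypes, such as those produced by a phasing method, assuming 0/1 allele coding and a hypothetical function name:

```python
def r_squared(haplotypes):
    """LD r^2 between two biallelic SNPs from phased haplotypes coded (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom
```

Phasing errors move haplotypes between the four cells of the 2 × 2 table, which is why the accuracy of r² estimated from inferred haplotypes depends on the phasing method.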

19.

Background

While the possible sources underlying the so-called ‘missing heritability’ evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying the heritability of genome-wide gene expression traits can shed light on the relationship between phenotype and genotype. Here we used microarray gene expression measurements from lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits.

Results

Heritability levels for the expression of 10,720 genes were estimated by variance component model analyses, and 1,043 expression quantitative trait loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak close to 0% and the other approaching 100%. This pattern of within-population variability in gene expression heritability is common among different HapMap populations of unrelated individuals but differs from that obtained in the CEU and YRI trio samples. Housekeeping genes and genes associated with cis eQTLs show higher heritability levels. Cis and trans eQTLs make comparable cumulative contributions to heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and found that epistasis was not pervasive across all genes but made a substantial contribution to total heritability for some of the genes analysed.

Conclusions

We used a mixed-effects model analysis to estimate genetic components from population-based samples. On the basis of analyses of genome-wide gene expression in four HapMap populations, we explored in detail the distribution of heritabilities for expression traits across populations, and highlighted gene-gene interaction at the expression level as an important source of the variation underlying missing heritability.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-13) contains supplementary material, which is available to authorized users.
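A standard form of the variance component (mixed-effects) model for a single expression trait is sketched below, assuming a genetic relationship matrix K estimated from the SNP genotypes. The abstract does not give the exact model, covariates, or estimation method, so this is illustrative only.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}, \qquad
\mathbf{g} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^2 K\right), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^2 I\right), \qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
```

Here y holds one gene's expression values across individuals and X the fixed effects. The variance components σ_g² and σ_e² are typically estimated by restricted maximum likelihood, and estimates of h² near 0 and near 1 correspond to the two peaks described in the Results.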

20.
Haplotype analyses have become increasingly common in genetic studies of human disease because of their ability to identify unique chromosomal segments likely to harbor disease-predisposing genes. The study of haplotypes is also used to investigate many population processes, such as migration and immigration rates, linkage-disequilibrium strength, and the relatedness of populations. Unfortunately, many haplotype-analysis methods require phase information that can be difficult to obtain from samples of nonhaploid species. There are, however, strategies for estimating haplotype frequencies from unphased diploid genotype data collected on a sample of individuals that make use of the expectation-maximization (EM) algorithm to overcome the missing phase information. The accuracy of such strategies, compared with other phase-determination methods, must be assessed before their use can be advocated. In this study, we consider and explore sources of error between EM-derived haplotype frequency estimates and their population parameters, noting that much of this error is due to sampling error, which is inherent in all studies, even when phase can be determined. In light of this, we focus on the additional error between haplotype frequencies within a sample data set and EM-derived haplotype frequency estimates incurred by the estimation procedure. We assess the accuracy of haplotype frequency estimation as a function of a number of factors, including sample size, number of loci studied, allele frequencies, and locus-specific allelic departures from Hardy-Weinberg and linkage equilibrium. We point out the relative impacts of sampling error and estimation error, calling attention to the pronounced accuracy of EM estimates once sampling error has been accounted for. 
We also suggest that many factors that may influence accuracy can be assessed empirically within a data set, a fact that can be used to create "diagnostics" that a user can turn to for assessing potential inaccuracies in estimation.
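The EM approach discussed above can be sketched for a handful of biallelic SNPs. This is the textbook gene-counting EM under Hardy-Weinberg equilibrium, not any specific published program; genotypes are assumed to be coded 0/1/2, the function names are hypothetical, and enumerating phase resolutions is exponential in the number of heterozygous sites.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """EM (gene-counting) estimate of haplotype frequencies from unphased genotypes.

    Assumes Hardy-Weinberg equilibrium; starts from uniform frequencies.
    """
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior weight of each phase resolution for this person
            weights = {
                (a, b): freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs
            }
            total = sum(weights.values())
            for (a, b), w in weights.items():
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new = {h: counts[h] / n_chrom for h in haplotypes}
        shift = max(abs(new[h] - freqs[h]) for h in haplotypes)
        freqs = new
        if shift < tol:
            break
    return freqs
```

In a sample of 00/00 and 11/11 homozygotes plus a few double heterozygotes, the EM assigns the double heterozygotes to the 00/11 resolution and drives 01 and 10 toward zero. Comparing such estimates with the haplotype counts actually present in a simulated sample isolates the estimation error that the abstract distinguishes from sampling error.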


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417