Similar literature
 20 similar articles found
1.
Variation in gene expression may give rise to a significant fraction of inter-individual phenotypic variation. Studies searching for the underlying genetic controls for such variation have been conducted in model organisms and humans in recent years. In our previous effort of assessing conserved underlying haplotype patterns across ethnic populations, we constructed common haplotypes using SNPs having conserved linkage disequilibrium (LD) across ethnic populations. These common haplotypes cluster into a simple evolutionary structure based on their frequencies, defining only up to three conserved clusters termed 'haplotype frameworks'. One intriguing preliminary finding was that a significant portion of reported variants strongly associated with cis-regulation tags these globally conserved haplotype frameworks. Here we expand the investigation by collecting genes showing stringently determined cis-association between genotypes and expression phenotypes from major studies. We conducted phylogenetic analysis of current major haplotypes along with the corresponding haplotypes derived from chimpanzee reference sequences. Our analysis reveals that, for the vast majority of such cis-regulatory genes, the tagging SNPs showing the strongest association also tag the haplotype lineages directly separated from ancestry, inferred from either chimpanzee reference sequences or the allele frequency-derived haplotype frameworks, suggesting that the differentially expressed phenotypes were evolved relatively early in human history. Such evolutionary signatures provide keys for a more effective identification of globally-conserved candidate regulatory haplotypes across human genes in future epidemiologic and pharmacogenetic studies.  相似文献   

2.
The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). 
These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.

3.
Analysis of haplotypes based on multiple single-nucleotide polymorphisms (SNP) is becoming common for both candidate gene and fine-mapping studies. Before embarking on studies of haplotypes from genetically distinct populations, however, it is important to consider variation both in linkage disequilibrium (LD) and in haplotype frequencies within and across populations, as both vary. Such diversity will influence the choice of "tagging" SNPs for candidate gene or whole-genome association studies because some markers will not be polymorphic in all samples and some haplotypes will be poorly represented or completely absent. Here we analyze 11 genes, originally chosen as candidate genes for oral clefts, where multiple markers were genotyped on individuals from four populations. Estimated haplotype frequencies, measures of pairwise LD, and genetic diversity were computed for 135 European-Americans, 57 Chinese-Singaporeans, 45 Malay-Singaporeans, and 46 Indian-Singaporeans. Patterns of pairwise LD were compared across these four populations and haplotype frequencies were used to assess genetic variation. Although these populations are fairly similar in allele frequencies and overall patterns of LD, both haplotype frequencies and genetic diversity varied significantly across populations. Such haplotype diversity has implications for designing studies of association involving samples from genetically distinct populations.  相似文献   
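
As an illustration of the pairwise LD measures mentioned in this abstract, the following minimal Python sketch computes D, |D'|, and r² for two biallelic loci from one haplotype frequency and two allele frequencies. The function name `pairwise_ld` and its parameters are hypothetical; the abstract does not say which software or which LD statistics the authors actually used.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    (All three inputs are assumed to be estimated beforehand.)
    """
    # Raw disequilibrium coefficient
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Two loci in complete LD: allele A always travels with allele B
    print(pairwise_ld(0.5, 0.5, 0.5))
```

Comparing such values locus pair by locus pair across samples is one simple way to see how LD patterns differ between populations.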

4.
Multilocus analysis of single-nucleotide-polymorphism (SNP) haplotypes may provide evidence of association with disease, even when the individual loci themselves do not. Haplotype-based methods are expected to outperform single-SNP analyses because (i) common genetic variation can be structured into haplotypes within blocks of strong linkage disequilibrium and (ii) the functional properties of a protein are determined by the linear sequence of amino acids corresponding to DNA variation on a haplotype. Here, I propose a flexible Bayesian framework for modeling haplotype association with disease in population-based studies of candidate genes or small candidate regions. I employ a Bayesian partition model to describe the correlation between marker-SNP haplotypes and causal variants at the underlying functional polymorphism(s). Under this model, haplotypes are clustered according to their similarity, in terms of marker-SNP allele matches, which is used as a proxy for recent shared ancestry. Haplotypes within a cluster are then assigned the same probability of carrying a causal variant at the functional polymorphism(s). In this way, I can account for the dominance effect of causal variants, here corresponding to any deviation from a multiplicative contribution to disease risk. The results of a detailed simulation study demonstrate that there is minimal cost associated with modeling these dominance effects, with substantial gains in power over haplotype-based methods that do not incorporate clustering and that assume a multiplicative model of disease risks.  相似文献   

5.

Background

Inference of haplotypes, the sequences of alleles along the same chromosome, is a fundamental problem in genetics and a key component of many analyses, including admixture mapping, identification of regions of identity by descent, and imputation. Haplotype phasing based on sequencing reads has attracted considerable attention. Diploid haplotype phasing, in which the two haplotypes are complementary, has been studied extensively. In this work, we focus on polyploid haplotype phasing, where the aim is to phase more than two haplotypes simultaneously from sequencing data. This problem is much harder because the search space is much larger and the haplotypes no longer need to be complementary.

Results

We propose two algorithms: (1) Poly-Harsh, a Gibbs sampling-based algorithm that alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; and (2) an efficient algorithm for concatenating haplotype blocks into contiguous haplotypes.

Conclusions

Our experiments showed that our method improves the quality of the phased haplotypes over state-of-the-art methods. To our knowledge, our algorithm for concatenating haplotype blocks is the first to leverage information shared across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.

6.
Guo YL, Zhao X, Lanz C, Weigel D. Plant Physiology, 2011, 157(2): 937-946
The S locus, a single polymorphic locus, is responsible for self-incompatibility (SI) in the Brassicaceae family and many related plant families. Despite its importance, our knowledge of S-locus evolution is largely restricted to the causal genes encoding the S-locus receptor kinase (SRK) receptor and S-locus cysteine-rich protein (SCR) ligand of the SI system. Here, we present high-quality sequences of the genomic region of six S-locus haplotypes: Arabidopsis (Arabidopsis thaliana; one haplotype), Arabidopsis lyrata (four haplotypes), and Capsella rubella (one haplotype). We compared these with reference S-locus haplotypes of the self-compatible Arabidopsis and its SI congener A. lyrata. We subsequently reconstructed the likely genomic organization of the S locus in the most recent common ancestor of Arabidopsis and Capsella. As previously reported, the two SI-determining genes, SCR and SRK, showed a pattern of coevolution. In addition, consistent with previous studies, we found that duplication, gene conversion, and positive selection have been important factors in the evolution of these two genes and appear to contribute to the generation of new recognition specificities. Intriguingly, the inactive pseudo-S-locus haplotype in the self-compatible species C. rubella is likely to be an old S-locus haplotype that only very recently became fixed when C. rubella split off from its SI ancestor, Capsella grandiflora.  相似文献   

7.
High-throughput single nucleotide polymorphism (SNP) detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, we first apply four haplotype block identification methods (Confidence Intervals, Four Gamete Test, Solid Spine of LD, and a fusion method of haplotype blocks) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we annotate gene function using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks on chromosomes 1 to 22 that are most likely related to alcoholism; these include 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by biological functional annotation. In short, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can both handle the SNP data easily and locate the disease-related genes precisely. Supported by the National Natural Science Foundation of China (Grant Nos. 30570424, 60601010 and 30600367), the National High-Tech Research and Development Program of China (Grant No. 2007AA02Z329), the Key Science and Technology Program of Heilongjiang Province (Grant No. GB03C602-4), the Natural Science Foundation of Heilongjiang Province (Grant No. F2008-02), the Youth Science Foundation of Harbin Medical University (Grant No. 060045) and the Science Foundation of Heilongjiang Province Education Department (Grant Nos. 11531113 and 1152hq28).
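
One of the block identification methods named above, the four-gamete test, can be sketched in a few lines of Python. This is a simplified, greedy illustration on phased 0/1 haplotype strings, not the authors' pipeline: the function names `four_gamete_violation` and `four_gamete_blocks` are hypothetical, and the sketch omits the frequency threshold that production tools usually apply before counting a gamete as observed.

```python
def four_gamete_violation(haplotypes, i, j):
    """True if all four gametes (00, 01, 10, 11) are seen at sites i and j.

    Seeing all four gametes implies a historical recombination
    (or a recurrent mutation) between the two sites.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def four_gamete_blocks(haplotypes):
    """Greedily split sites into blocks with no four-gamete violation.

    haplotypes: list of equal-length strings over {'0', '1'} (assumed phased).
    Returns a list of (first_site, last_site) index pairs.
    """
    n_sites = len(haplotypes[0])
    blocks = []
    start = 0
    for k in range(1, n_sites):
        # Close the current block if site k conflicts with any site in it
        if any(four_gamete_violation(haplotypes, i, k) for i in range(start, k)):
            blocks.append((start, k - 1))
            start = k
    blocks.append((start, n_sites - 1))
    return blocks


if __name__ == "__main__":
    haps = ["000", "011", "100", "111"]
    print(four_gamete_blocks(haps))
```

In the toy input, sites 0 and 1 show all four gametes, so a block boundary falls between them, while sites 1 and 2 stay in one block.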

8.
The immune responses of natural killer cells are regulated, in part, by killer cell immunoglobulin-like receptors (KIR). The 16 closely-related genes in the KIR gene system have been diversified by gene duplication and unequal crossing over, thereby generating haplotypes with variation in gene copy number. Allelic variation also contributes to diversity within the complex. In this study, we estimated allele-level haplotype frequencies and pairwise linkage disequilibrium statistics for 14 KIR loci. The typing utilized multiple methodologies by four laboratories to provide at least 2x coverage for each allele. The computational methods generated maximum-likelihood estimates of allele-level haplotypes. Our results indicate the most extensive allele diversity was observed for the KIR framework genes and for the genes localized to the telomeric region of the KIR A haplotype. Particular alleles of the stimulatory loci appear to be nearly fixed on specific, common haplotypes while many of the less frequent alleles of the inhibitory loci appeared on multiple haplotypes, some with common haplotype structures. Haplotype structures cA01 and/or tA01 predominate in this cohort, as has been observed in most populations worldwide. Linkage disequilibrium is high within the centromeric and telomeric haplotype regions but not between them and is particularly strong between centromeric gene pairs KIR2DL5KIR2DS3S5 and KIR2DS3S5KIR2DL1, and telomeric KIR3DL1KIR2DS4. Although 93% of the individuals have unique pairs of full-length allelic haplotypes, large genomic blocks sharing specific sets of alleles are seen in the most frequent haplotypes. These high-resolution, high-quality haplotypes extend our basic knowledge of the KIR gene system and may be used to support clinical studies beyond single gene analysis.  相似文献   

9.
Genetic variation in the human population may lead to functional variants of genes that contribute to risk for common chronic diseases such as cancer. In an effort to detect such possible predisposing variants, we constructed haplotypes for a candidate gene and tested their efficacy in association studies. We developed haplotypes consisting of 14 biallelic neutral-sequence variants that span 142 kb of the ATM locus. ATM is the gene responsible for the autosomal recessive disease ataxia-telangiectasia (AT). These ATM noncoding single-nucleotide polymorphisms (SNPs) were genotyped in nine CEPH families (89 individuals) and in 260 DNA samples from four different ethnic origins. Analysis of these data with an expectation-maximization algorithm revealed 22 haplotypes at this locus, with three major haplotypes having frequencies ≥ 0.10. Tests for recombination and linkage disequilibrium (LD) show reduced recombination and extensive LD at the ATM locus, in all four ethnic groups studied. The most striking example was found in the study population of European ancestry, in which no evidence for recombination could be discerned. The potential of ATM haplotypes for detection of genetic variants through association studies was tested by analysis of 84 individuals carrying one of three ATM coding SNPs. Each coding SNP was detected by association with an ATM haplotype. We demonstrate that association studies with haplotypes for candidate genes have significant potential for the detection of genetic backgrounds that contribute to disease.

10.
Haplotype reconstruction from genotype data using Imperfect Phylogeny   (total citations: 13; self-citations: 0; citations by others: 13)
Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype or which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes that shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks, such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block, we predict the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/

11.
RFLP haplotypes at the alpha-globin gene complex have been examined in 190 individuals from the Niokolo Mandenka population of Senegal: haplotypes were assigned unambiguously for 210 chromosomes. The Mandenka share with other African populations a sample size-independent haplotype diversity that is much greater than that in any non-African population: the number of haplotypes observed in the Mandenka is typically twice that seen in the non-African populations sampled to date. Of these haplotypes, 17.3% had not been observed in any previous surveys, and a further 19.1% have previously been reported only in African populations. The haplotype distribution shows clear differences between African and non-African peoples, but this is on the basis of population-specific haplotypes combined with haplotypes common to all. The relationship of the newly reported haplotypes to those previously recorded suggests that several mutation processes, particularly recombination as homologous exchange or gene conversion, have been involved in their production. A computer program based on the expectation-maximization (EM) algorithm was used to obtain maximum-likelihood estimates of haplotype frequencies for the entire data set: good concordance between the unambiguous and EM-derived sets was seen for the overall haplotype frequencies. Some of the low-frequency haplotypes reported by the estimation algorithm differ greatly, in structure, from those haplotypes known to be present in human populations, and they may not represent haplotypes actually present in the sample.  相似文献   
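
The EM-based frequency estimation described in this abstract can be illustrated with a minimal Python sketch for the simplest case of two biallelic loci, where only double heterozygotes have an ambiguous phase. The function `em_two_locus`, its genotype encoding (count of allele 1 at each locus), and the fixed iteration count are assumptions made for illustration; they are not the program used in the study, which handled multi-site RFLP haplotypes.

```python
def em_two_locus(genotypes, n_iter=100):
    """Estimate two-locus haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2) pairs, each value the count (0, 1, or 2)
               of allele 1 at that locus in one diploid individual.
    Returns a dict mapping haplotype (a1, a2) to its estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}          # uniform starting point
    n_chrom = 2 * len(genotypes)

    def alleles(g):
        # Alleles carried on the two chromosomes for genotype count g
        return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between its two phasings
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one locus is heterozygous
                a, b = alleles(g1), alleles(g2)
                counts[(a[0], b[0])] += 1
                counts[(a[1], b[1])] += 1
        # M-step: expected counts become the new frequency estimates
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs


if __name__ == "__main__":
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_two_locus(sample))
```

The sketch also hints at the caveat raised in the abstract: EM distributes ambiguous genotypes according to current estimates, so low-frequency haplotypes in its output may be estimation artifacts, not haplotypes actually present in the sample.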

12.
The zebrafish is an important animal model for stem cell biology, cancer, and immunology research. Histocompatibility represents a key intersection of these disciplines; however, histocompatibility in zebrafish remains poorly understood. We examined a set of diverse zebrafish class I major histocompatibility complex (MHC) genes that segregate with specific haplotypes at chromosome 19, and for which donor-recipient matching has been shown to improve engraftment after hematopoietic transplantation. Using flanking gene polymorphisms, we identified six distinct chromosome 19 haplotypes. We describe several novel class I U lineage genes and characterize their sequence properties, expression, and haplotype distribution. Altogether, ten full-length zebrafish class I genes were analyzed, mhc1uba through mhc1uka. Expression data and sequence properties indicate that most are candidate classical genes. Several substitutions in putative peptide anchor residues, often shared with deduced MHC molecules from additional teleost species, suggest flexibility in antigen binding. All ten zebrafish class I genes were uniquely assigned among the six haplotypes, with dominant or codominant expression of one to three genes per haplotype. Interestingly, while the divergent MHC haplotypes display variable gene copy number and content, the different genes appear to have ancient origin, with extremely high levels of sequence diversity. Furthermore, haplotype variability extends beyond the MHC genes to include divergent forms of psmb8. The many disparate haplotypes at this locus therefore represent a remarkable form of genomic region configuration polymorphism. Defining the functional MHC genes within these divergent class I haplotypes in zebrafish will provide an important foundation for future studies in immunology and transplantation.  相似文献   

13.
To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P<.000001; n=233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

14.
DNA polymorphisms at the phenylalanine hydroxylase (PAH) locus have proved highly effective in linkage diagnosis of phenylketonuria (PKU) in Caucasian families. More than 10 RFLP sites have been reported within the PAH structural locus in Caucasians. With information from affected and unaffected offspring in PKU families it is often possible to reconstruct complete RFLP haplotypes in parents and to use these haplotypes to follow the segregation of PKU within families and to determine the distribution of PKU chromosomes within populations. To establish the utility of these RFLPs in characterizing Asian families with PKU, we typed eight DNA sites in 21 Chinese families and 12 Japanese families with classical PKU. The eight RFLPs were chosen for their informativeness in Caucasians. From these families we reconstructed a total of 91 complete PAH haplotypes, 44 from non-PKU chromosomes and 47 from PKU-bearing chromosomes. Although all eight marker sites are polymorphic in both Chinese and Japanese, there is much less haplotypic variation in Asians than in Caucasians. In particular, one haplotype alone, haplotype 4, accounts for more than 77% of non-PKU chromosomes and for more than 80% of PKU-bearing chromosomes. Haplotype 4 is also relatively common in Caucasians. The next most common Asian haplotype is 10 times less frequent than haplotype 4. By contrast, in many Caucasian populations the sum of the frequencies of the five most common haplotypes is still less than 80%, and several of the most common haplotypes are equally frequent. Even though the extent of haplotypic variation in Asians is severely limited, the few haplotypes that are found often differ at a number of RFLP sites.(ABSTRACT TRUNCATED AT 250 WORDS)  相似文献   

15.
Mutations at the cystic fibrosis transmembrane conductance regulator gene (CFTR) cause cystic fibrosis, the most prevalent severe genetic disorder in individuals of European descent. We have analyzed normal allele and haplotype variation at four short tandem repeat polymorphisms (STRPs) and two single-nucleotide polymorphisms (SNPs) in CFTR in 18 worldwide population samples, comprising a total of 1,944 chromosomes. The rooted phylogeny of the SNP haplotypes was established by typing ape samples. STRP variation within SNP haplotype backgrounds was highest in most ancestral haplotypes, although, when STRP allele sizes were taken into account, differences among haplotypes became smaller. Haplotype background determines STRP diversity to a greater extent than populations do, which indicates that haplotype backgrounds are older than populations. Heterogeneity among STRPs can be understood as the outcome of differences in mutation rate and pattern. STRP sites had higher heterozygosities in Africans, although, when whole haplotypes were considered, no significant differences remained. Linkage disequilibrium (LD) shows a complex pattern not easily related to physical distance. The analysis of the fraction of possible different haplotypes not found may circumvent some of the methodological difficulties of LD measure. LD analysis showed a positive correlation with locus polymorphism, which could partly explain the unusual pattern of similar LD between Africans and non-Africans. The low values found in non-Africans may imply that the size of the modern human population that emerged "Out of Africa" may be larger than what previous LD studies suggested.

16.
17.
Pe'er I, Beckmann JS. Human Genetics, 2004, 114(2): 214-217
The rationale for mapping all common haplotypes in our species relies on reports of the conservation of haplotype blocks across human populations. Recent findings indicate that these blocks may, at least in part, be a random artifact of genetic drift. This raises the concern that the latter process may challenge the general applicability of a human haplotype map to case-by-case population-specific association studies. We develop arguments indicating that even stochastic drift-originated blocks will, under many conditions, be shared across populations, supporting the utilization of a panhuman haplotype map.

18.
A. Lagziel, E. Lipkin, M. Soller. Genetics, 1996, 142(3): 945-951
The bovine Growth Hormone gene (bGH) is an attractive candidate gene for milk production in cattle. Single-strand conformation polymorphisms at bGH were identified and used to define haplotype configurations at this gene in the Israeli Holstein dairy cattle population (Bos taurus) and in the parent animals of the International Bovine Reference Family Panel (a collection of B. taurus and B. indicus crosses). B. taurus and B. indicus haplotypes at the bGH gene differed qualitatively, confirming the previously proposed long evolutionary separation of these cattle subraces. Only a small number of bGH haplotypes were present in the Israel Holstein population. One of the haplotypes, apparently of B. indicus origin, was found to have a highly significant positive effect on milk protein percentage. This illustrates the utility of the haplotype approach for uncovering candidate gene involvement in quantitative genetic variation in agricultural populations. The strong effect of an indicine haplotype in a taurine background raises the possibility that indicine alleles at other candidate genes may comprise a genetic resource for improvement of taurine populations. It is proposed that haplotype analysis may be a useful adjunct to measures of genetic distance for evaluating rare breeds with respect to gene conservation.  相似文献   

19.
20.