Similar articles
 20 similar articles found
1.
SNP and haplotype variation in the human genome   Cited by: 19 (self-citations: 0, citations by others: 19)
We have surveyed and summarized several aspects of DNA variability among humans. The variation described is the result of mutation followed by a combination of drift, migration and selection bringing the frequencies high enough to be observed. This paper describes what we have learned about how DNA variability differs among genes and populations. We sequenced functional regions of a set of 3950 genes. DNA was sampled from 82 unrelated humans: 20 African-Americans, 20 East Asians, 21 Caucasians, 18 Hispanic-Latinos and 3 Native Americans. Different aspects of variability showed a great deal of concordance. In particular, we studied patterns of single nucleotide polymorphism (SNP) allele and haplotype sharing among the four large-sample populations. We also examined how linkage disequilibrium (LD) between SNPs relates to physical distance in the different populations. It is clear from our findings that while many variants are common to all populations, many others have a more restricted distribution. Research that attempts to find genetic variants that explain phenotypic variants must be careful in its choice of study population.
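The abstract above refers to LD between SNP pairs without giving formulas. As a minimal sketch, the two standard pairwise measures (D' and r²) can be computed from phased two-SNP haplotypes as follows. The function name `pairwise_ld` and the two-character haplotype encoding are illustrative assumptions, not taken from the paper.

```python
def pairwise_ld(haplotypes):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    haplotypes: list of 2-character strings such as "AB"; the first
    character is the allele at SNP 1, the second the allele at SNP 2.
    (Hypothetical encoding chosen for this sketch.)
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0][0], haplotypes[0][1]  # arbitrary reference alleles
    p_a = sum(h[0] == ref1 for h in haplotypes) / n
    p_b = sum(h[1] == ref2 for h in haplotypes) / n
    p_ab = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # at least one SNP is monomorphic: LD is undefined, report 0
        return 0.0, 0.0, 0.0

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom
```

Plotting r² from such a function against the base-pair distance between SNP pairs, separately per population, would give the kind of LD-versus-distance comparison the abstract mentions.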

2.
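The human genome haplotype map (HapMap) project is another major multinational collaboration, comparable to the recently completed genome sequencing project, and the second major strategic goal in human genome research. Building on the basic genetic information obtained through sequencing, it aims to establish a map of genetic variation in the genomes of the world's major populations. The project's main ...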

3.
Comparisons between haplotypes from affected patients and the human reference genome are frequently used to identify candidates for disease-causing mutations, even though these alignments are expected to reveal a high level of background neutral polymorphism. This limits the scope of genetic studies to relatively small genomic intervals, because current methods for distinguishing potential causal mutations from neutral variation are inefficient. Here we describe a new strategy for detecting mutations that is based on comparing affected haplotypes with closely matched control sequences from healthy individuals, rather than with the human reference genome. We use theory, simulation, and a real data set to show that this approach is expected to reduce the number of sequence variants that must be subjected to follow-up analysis by at least a factor of 20 when closely matched control sequences are selected from a reference panel with as few as 100 control genomes. We also define a reference data resource that would allow efficient application of this strategy to large critical intervals across the genome.

4.
Using haplotype blocks to map human complex trait loci   Cited by: 28 (self-citations: 0, citations by others: 28)
Understanding of linkage disequilibrium (LD) in human populations could facilitate the discovery of genes that influence complex human diseases. The "HapMap" project is now underway to characterize patterns of LD in the human genome. A pilot study showed "haplotype blocks" in 51 regions scattered throughout the genome. These intriguing results raise important questions about the nature of recombination, and highlight practical issues of marker collection, the influence of statistical modelling on apparent block structure, and the levels of genotyping necessary for studies of common diseases. Knowledge of local disequilibrium patterns may help identify common polymorphisms involved in complex disease, but completely new analytical methods and experimental designs will be required to identify important rare variants.

5.
6.
Genome-wide association studies using high-throughput technology are already being conducted despite the significant hurdles that need to be overcome (Nat Rev Genet 6:95–108, 2005; Nat Rev Genet 6:109–118, 2005). Methods for detecting haplotype association signals in genome-wide haplotype datasets are as yet very limited. Much methodological research has already been devoted to linkage disequilibrium (LD) fine mapping, where the focus is the identification of the disease locus rather than the detection of a disease signal. Applications of these approaches to genome-wide scanning are limited by the strong model assumptions of the sharing process, which lead to computational complexity. We describe a new algorithm for the initial identification of disease susceptibility loci in genome-wide haplotype association studies. Excess sharing of ancestral haplotypes, which indicates the presence of a disease locus, is detected with a simple, easy-to-interpret, χ²-based statistic. The method allows genome-wide scanning for qualitative traits within reasonable computational timeframes and can serve as a first-pass analysis prior to the use of likelihood-based methods, providing candidate regions and inferred susceptibility haplotypes. Our method makes no assumptions regarding the population history or the pattern of background LD. Statistical significance is evaluated with permutation tests. The method is illustrated on simulated and real data, where it is applied to simple (cystic fibrosis) and complex (multiple sclerosis) disease examples. The statistic has low type I error and greater power to map disease loci than conventional single-marker tests for low to moderate levels of LD.
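The abstract above combines a χ²-based statistic with permutation testing but does not spell out the haplotype-sharing statistic itself. The sketch below therefore substitutes an ordinary Pearson χ² on a status-by-haplotype table, purely to illustrate how a permutation p-value is obtained by shuffling case/control labels. The function names and the input layout are assumptions made for this example, not the authors' algorithm.

```python
import random
from collections import Counter


def chi_square(haplotypes, labels):
    """Pearson chi-square for a table of status (rows) by haplotype (columns)."""
    n = len(labels)
    hap_totals = Counter(haplotypes)
    label_totals = Counter(labels)
    observed = Counter(zip(labels, haplotypes))
    stat = 0.0
    for lab, lab_n in label_totals.items():
        for hap, hap_n in hap_totals.items():
            expected = lab_n * hap_n / n
            stat += (observed[(lab, hap)] - expected) ** 2 / expected
    return stat


def permutation_p_value(haplotypes, labels, n_perm=1000, seed=0):
    """Empirical p-value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = chi_square(haplotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any haplotype/status association
        if chi_square(haplotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Because the null distribution is built from the data themselves, this style of test needs no assumption about population history or background LD, which matches the property the abstract claims for its own statistic.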

7.
Population-genetic basis of haplotype blocks in the 5q31 region   Cited by: 3 (self-citations: 0, citations by others: 3)
We investigated patterns of nucleotide variation in the 5q31 region identified by Daly et al. as containing haplotype blocks, to determine whether the blocklike pattern requires the assumption of hotspots in recombination. Using extensive simulations that generate data matched to the Daly et al. data set in (a) the method of ascertainment of single-nucleotide polymorphisms, (b) the heterozygosity of ascertained markers, (c) the number of block boundaries, and (d) the diversity of haplotypes within blocks, we show that the patterns found in the Daly et al. data are not consistent with the assumption of uniform recombination in a population of constant size but are consistent either with the presence of hotspots in a population of constant size or with the absence of hotspots if there was a period of rapid population growth. We further show that estimates of local recombination rate can distinguish between population growth and hotspots as the primary cause of a blocklike pattern. Estimates of local recombination rates for the Daly et al. data do not indicate the presence of recombination hotspots.

8.
The human genome has linkage disequilibrium (LD) blocks, within which single-nucleotide polymorphisms show strong association with each other. We examined data from the International HapMap Project to define LD blocks and to detect DNA sequence features inside of them. We used permutation tests to determine the empirical significance of the association of LD blocks with genes and Alu repeats. Very large LD blocks (>200 kb) have significantly higher gene coverage and Alu frequency than the outcome obtained from permutation-based simulation, whereas there was no significant positive correlation between gene density and block size. We also observed a reduced frequency of Alu repeats at the gaps between large LD blocks, indicating that their enrichment in large LD blocks does not introduce recombination hotspots that would cause these gaps.

9.
The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

10.
Tang S, Hyman BC. Genetics, 2007, 176(2): 1139-1150
Characterization of mitochondrial genomes from individual Thaumamermis cosgrovei nematodes, obligate parasites of the isopod Armadillidium vulgare, revealed that numerous mtDNA haplotypes, ranging in size from 19 to 34 kb, are maintained in several spatially separated isopod populations. The magnitude and frequency of conspecific mtDNA size variation is unprecedented among all studied size-polymorphic metazoan mitochondrial genomes. To understand the molecular basis of this hypervariation, complete nucleotide sequences of two T. cosgrovei mtDNA haplotypes were determined. A hypervariable segment, residing between the atp6 and rrnL genes, contributes exclusively to T. cosgrovei mtDNA size variation. Within this region, mtDNA coding genes and putative nonfunctional sequences have accumulated substitutions and are duplicated and rearranged to varying extents. Hypervariation at this level has enabled a first insight into the life history of T. cosgrovei. In five A. vulgare hosts infected with multiple nematodes, four carried nematodes with identical mtDNA haplotypes, suggesting that hosts may become infected by ingesting a recently hatched egg clutch or become parasitized by individuals from the same brood prior to dispersal of siblings within the soil.

11.
MOTIVATION: The identification of signatures of positive selection can provide important insights into recent evolutionary history in human populations. Current methods mostly rely on allele frequency determination or focus on one or a small number of candidate chromosomal regions per study. With the availability of large-scale genotype data, efficient approaches for an unbiased whole genome scan are becoming necessary. METHODS: We have developed a new method, the whole genome long-range haplotype test (WGLRH), which uses genome-wide distributions to test for recent positive selection. Adapted from the long-range haplotype (LRH) test, the WGLRH test uses patterns of linkage disequilibrium (LD) to identify regions with extremely low historic recombination. Common haplotypes with significantly longer than expected ranges of LD given their frequencies are identified as putative signatures of recent positive selection. In addition, we have also determined the ancestral alleles of SNPs by genotyping chimpanzee and gorilla DNA, and have identified SNPs where the non-ancestral alleles have risen to extremely high frequencies in human populations, termed 'flipped SNPs'. Combining the haplotype test and the flipped SNPs determination, the WGLRH test serves as an unbiased genome-wide screen for regions under putative selection, and is potentially applicable to the study of other human populations. RESULTS: Using WGLRH and high-density oligonucleotide arrays interrogating 116 204 SNPs, we rapidly identified putative regions of positive selection in three populations (Asian, Caucasian, African-American), and extended these observations to a fourth population, Yoruba, with data obtained from the International HapMap consortium. We mapped significant regions to annotated genes. While some regions overlap with genes previously suggested to be under positive selection, many of the genes have not been previously implicated in natural selection and offer intriguing possibilities for further study. AVAILABILITY: The programs for the WGLRH algorithm are freely available and can be downloaded at http://www.affymetrix.com/support/supplement/WGLRH_program.zip.
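The long-range haplotype idea that this abstract builds on is usually quantified with extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core to a more distant marker. The following is a minimal sketch of that quantity only, not of the WGLRH program; the function name `ehh` and the string encoding of haplotypes are assumptions for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, upto_index):
    """Extended haplotype homozygosity from the core SNP out to upto_index.

    haplotypes: list of equal-length strings, one character per SNP
    (hypothetical encoding for this sketch).
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined with fewer than two carriers
    lo, hi = sorted((core_index, upto_index))
    # Group carriers by their full haplotype over the interval [lo, hi].
    groups = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(n, 2)
```

A core haplotype under recent positive selection is expected to show EHH that decays unusually slowly with distance for its frequency, which is the kind of signal a genome-wide comparison of this sort looks for.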

12.
Martin OC, Hospital F. Genetics, 2011, 189(2): 645-654
We consider recombinant inbred lines obtained by crossing two given homozygous parents and then applying multiple generations of self-crossings or full-sib matings. The chromosomal content of any such line forms a mosaic of blocks, each alternatively inherited identically by descent from one of the parents. Quantifying the statistical properties of such mosaic genomes has remained an open challenge for many years. Here, we solve this problem by taking a continuous chromosome picture and assuming crossovers to be noninterfering. Using a continuous-time random walk framework and Markov chain theory, we determine the statistical properties of these identical-by-descent blocks. We find that successive block lengths are only very slightly correlated. Furthermore, the blocks on the ends of chromosomes are larger on average than the others, a feature understandable from the nonexponential distribution of block lengths.

13.
OBJECTIVE: The presence of linkage disequilibrium (LD) forms the basis for a range of uses, including the fine-mapping of diseases and studies on human genealogy. Recent findings indicate that single nucleotide polymorphisms (SNPs) can occur in blocks of limited haplotypic diversity with high degrees of LD. Commonly used measures for LD, such as r² and D', consider only two loci and might miss information to appropriately describe LD in larger haplotypic structures. METHODS: We introduce the Normalized Entropy Difference, ε, as a new multilocus measure for LD. A related quantity, ΔS, provides an approximate χ² test for the significance of LD. The ability of the measure to detect haplotype blocks is investigated using simulated data sets as well as a real data set previously analyzed by Daly et al. (2001). RESULTS: ε allows for arbitrary numbers of loci, describes LD with regard to the loci sequence, and can be interpreted as a multilocus extension of r². The application of ε to the data sets demonstrated the measure's ability to appropriately describe simultaneous multilocus LD and to detect haplotype blocks. CONCLUSIONS: ε is a reasonable multilocus LD measure and might be of potential use in the construction of the human haplotype map.
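The abstract names the Normalized Entropy Difference but does not give its formula. One plausible reading, used here only as an assumption, is to compare the Shannon entropy of the observed multilocus haplotypes (S_B) with the entropy expected if the loci were independent (S_E, the sum of single-locus entropies), taking ΔS = S_E − S_B and ε = ΔS / S_E. The sketch below implements that reading; the exact normalization in the published method may differ.

```python
from collections import Counter
from math import log


def entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of items."""
    n = len(items)
    return -sum(c / n * log(c / n) for c in Counter(items).values())


def normalized_entropy_difference(haplotypes):
    """Return (epsilon, delta_s) under the assumed definitions above.

    haplotypes: list of equal-length strings, one character per locus.
    """
    s_block = entropy(haplotypes)  # S_B: entropy of observed haplotypes
    n_loci = len(haplotypes[0])
    # S_E: entropy expected under linkage equilibrium (independent loci).
    s_equil = sum(entropy([h[i] for h in haplotypes]) for i in range(n_loci))
    delta_s = s_equil - s_block
    epsilon = delta_s / s_equil if s_equil > 0 else 0.0
    return epsilon, delta_s
```

Under these assumed definitions ε is 0 when the loci are independent and grows as haplotype diversity falls below what the allele frequencies alone would allow, for any number of loci.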

14.
MOTIVATION: Missing data in genotyping single nucleotide polymorphism (SNP) spots are common. High-throughput genotyping methods usually have a high rate of missing data. For example, the published human chromosome 21 data by Patil et al. contains about 20% missing SNPs. Inferring missing SNPs using the haplotype block structure is promising but difficult because the haplotype block boundaries are not well defined. Here we propose a global algorithm to overcome this difficulty. RESULTS: First, we propose to use entropy as a measure of haplotype diversity. We show that the entropy measure combined with a dynamic programming algorithm produces better haplotype block partitions than other measures. Second, based on the entropy measure, we propose a two-step iterative partition-inference algorithm for the inference of missing SNPs. At the first step, we apply the dynamic programming algorithm to partition haplotypes into blocks. At the second step, we use an iterative process similar to the expectation-maximization algorithm to infer missing SNPs in each haplotype block so as to minimize the block entropy. The algorithm iterates these two steps until the total block entropy is minimized. We test our algorithm in several experimental data sets. The results show that the global approach significantly improves the accuracy of the inference. AVAILABILITY: Upon request.
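The first step described above, partitioning haplotypes into blocks with an entropy measure and dynamic programming, can be sketched as follows. The abstract does not state the objective function, so this example assumes a simple one: cover all SNPs with the fewest contiguous blocks such that each multi-SNP block has haplotype entropy at or below a threshold. The names `partition_blocks` and `max_entropy` are hypothetical, and the missing-data inference step is not covered.

```python
from collections import Counter
from math import log


def entropy(items):
    """Shannon entropy (natural log) of the empirical distribution of items."""
    n = len(items)
    return -sum(c / n * log(c / n) for c in Counter(items).values())


def partition_blocks(haplotypes, max_entropy):
    """Fewest contiguous blocks whose haplotype entropy is <= max_entropy.

    Returns half-open (start, end) SNP index ranges. Single-SNP blocks are
    always allowed, so a partition always exists. (Assumed objective.)
    """
    n_snps = len(haplotypes[0])
    inf = float("inf")
    best = [0] + [inf] * n_snps  # best[j]: min blocks covering SNPs [0, j)
    back = [0] * (n_snps + 1)    # back[j]: start of the last block ending at j
    for j in range(1, n_snps + 1):
        for i in range(j):
            ok = j - i == 1 or entropy([h[i:j] for h in haplotypes]) <= max_entropy
            if ok and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                back[j] = i
    blocks, j = [], n_snps
    while j > 0:  # walk the back-pointers to recover the partition
        blocks.append((back[j], j))
        j = back[j]
    return blocks[::-1]
```

The threshold is what keeps the problem non-trivial here: joint entropy never exceeds the sum of the parts, so minimizing total entropy alone would always merge everything into a single block.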

15.
16.
This report describes single-nucleotide polymorphisms (SNPs) in the sheep major histocompatibility complex (MHC) class II and class III regions and provides insights into the internal structure of this important genomic complex. MHC haplotypes were deduced from sheep family trios based on genotypes from 20 novel SNPs representative of the class II region and 10 previously described SNPs spanning the class III region. All 30 SNPs exhibited Hardy-Weinberg proportions in the sheep population studied. Recombination within an extended sire haplotype was observed within the class II region for 4 of 20 sheep chromosomes, thereby supporting the presence of separated IIa and IIb subregions similar to those present in cattle. SNP heterozygosity varied across the class II and III regions. One segment of the class IIa subregion manifested very low heterozygosity for several SNPs spanning approximately 120 Kbp. This feature corresponds to a subregion within the human MHC class II region previously described as a 'SNP desert' because of its paucity of SNPs. Linkage disequilibrium (LD) was reduced at the junction separating the putative class IIb and IIa subregions and also between the class IIa and the class III subregions. The latter observation is consistent with either an unmapped physical separation at this location or more likely a boundary characterized by more frequent recombination between two conserved subregions, each manifesting high within-block LD. These results identify internal blocks of loci in the sheep MHC, within which recombination is relatively rare.
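The statement that all 30 SNPs exhibited Hardy-Weinberg proportions implies a per-SNP test of genotype counts against p², 2pq and q². The abstract does not say which test was used, so the sketch below assumes the common one-degree-of-freedom χ² goodness-of-fit test; the function name and argument order are illustrative.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the
    heterozygote for a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNP) to avoid dividing by 0.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A statistic below about 3.84 is conventionally read as no significant departure at the 5% level; with small samples, such as family trios, an exact test is often preferred over this approximation.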

17.
Recently, genomic data have revealed a "block-like" structure of haplotype diversity on human chromosomes. This structure is anticipated to facilitate gene mapping studies, because strong associations among loci within a block may allow haplotype variation to be tagged with a limited number of markers. But its usefulness to mapping efforts depends on the consistency of the block structure within and among populations, which in turn depends on how the block structure arises. Recombination hot spots are generally thought to underlie the block structure, but haplotype blocks can also develop stochastically under random recombination, in which case the block structure will show limited consistency among populations. Using coalescent models, which we upscaled to simulate the evolution of haplotypes with many markers at fixed distances, we show that the relationship between block boundaries and historic recombination intensity may be surprisingly weak. The majority of historic recombinations do not leave a footprint in present-day linkage disequilibrium patterns, and the block structure is sensitive to factors that affect the timing of recombination relative to marker mutation events in the genealogy, such as marker frequency bias and historic population size changes. Our results give insight into the potential of stochastic events to affect haplotype block structure, which can limit the usefulness of the block structure to mapping studies.

18.
19.
SNPing in the human genome   Cited by: 4 (self-citations: 0, citations by others: 4)
More than a million genetic markers in the form of single nucleotide polymorphisms are now available for use in genotype-phenotype studies in humans. The application of new strategies for representational cloning and sequencing from genomes combined with the mining of high-quality sequence variations in clone overlaps of genomic and/or cDNA sequences has played an important role in generating this new resource. The focus of variation analysis is now shifting from the identification of new markers to their typing in populations, and novel typing strategies are rapidly emerging. Assay readouts on oligonucleotide arrays, in microtiter plates, gels, flow cytometers and mass spectrometers have all been developed, but decreasing cost and increasing throughput of DNA typing remain key if high-density genetic maps are to be applied on a large scale.

20.

Background  

It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs and are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total score is maximized while respecting the duplication history of the genomes being compared. We call this "quota-based" screening of synteny blocks, in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events.
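The quota idea in this Background section can be illustrated with a simple greedy heuristic: take synteny blocks in decreasing score order and keep a block only if, after adding it, no position in either genome is covered by more blocks than its quota allows. This is a sketch under stated assumptions, not the authors' method, and a greedy pass does not guarantee the maximum total score that the abstract sets as the goal. The block tuple layout and function names are hypothetical.

```python
def max_overlap(intervals, query):
    """Largest number of intervals covering any single point inside query.

    All intervals are half-open (start, end) pairs.
    """
    q_start, q_end = query
    events = []
    for start, end in intervals:
        lo, hi = max(start, q_start), min(end, q_end)
        if lo < hi:  # interval intersects the query region
            events.append((lo, 1))
            events.append((hi, -1))
    depth = best = 0
    # Sorting places (pos, -1) before (pos, +1), so touching intervals do not stack.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best


def quota_screen(blocks, quota_a=1, quota_b=1):
    """Greedy quota-based selection of synteny blocks.

    blocks: list of (score, (a_start, a_end), (b_start, b_end)).
    quota_a / quota_b: maximum coverage depth allowed in genome A / genome B.
    """
    kept = []
    for block in sorted(blocks, key=lambda b: -b[0]):
        _, a_iv, b_iv = block
        depth_a = max_overlap([k[1] for k in kept], a_iv)
        depth_b = max_overlap([k[2] for k in kept], b_iv)
        if depth_a < quota_a and depth_b < quota_b:
            kept.append(block)
    return kept
```

For example, quotas of 2 and 1 would let each region of genome A match up to two regions of genome B but not the reverse, which is the kind of "one-to-many" relationship expected when only one of the two lineages carries an extra WGD.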
