Similar articles
20 similar articles found (search time: 15 ms)
1.
The inference of population divergence times and branching patterns is of fundamental importance in many population genetic analyses. Many methods have been developed for estimating population divergence times, and recently, there has been particular attention towards genome-wide single-nucleotide polymorphisms (SNP) data. However, most SNP data have been affected by an ascertainment bias caused by the SNP selection and discovery protocols. Here, we present a modification of an existing maximum likelihood method that will allow approximately unbiased inferences when ascertainment is based on a set of outgroup populations. We also present a method for estimating trees from the asymmetric dissimilarity measures arising from pairwise divergence time estimation in population genetics. We evaluate the methods by simulations and by applying them to a large SNP data set of seven East Asian populations.

2.
An international effort is underway to generate a comprehensive haplotype map (HapMap) of the human genome represented by an estimated 300 000 to 1 million ‘tag’ single nucleotide polymorphisms (SNPs). Our analysis indicates that the current human SNP map is not sufficiently dense to support the HapMap project. For example, 24.6% of the genome currently lacks SNPs at the minimal density and spacing that would be required to construct even a conservative tag SNP map containing 300 000 SNPs. In an effort to improve the human SNP map, we identified 140 696 additional SNP candidates using a new bioinformatics pipeline. Over 51 000 of these SNPs mapped to the largest gaps in the human SNP map, leading to significant improvements in these regions. Our SNPs will be immediately useful for the HapMap project, and will allow for the inclusion of many additional genomic intervals in the final HapMap. Nevertheless, our results also indicate that additional SNP discovery projects will be required both to define the haplotype architecture of the human genome and to construct comprehensive tag SNP maps that will be useful for genetic linkage studies in humans.

3.
Martin SG  Dobi KC  St Johnston D 《Genome biology》2001,2(9):research0036.1-research003612

Background

Genetic screens in Drosophila have provided a wealth of information about a variety of cellular and developmental processes. It is now possible to screen for mutant phenotypes in virtually any cell at any stage of development by performing clonal screens using the flp/FRT system. The rate-limiting step in the analysis of these mutants is often the identification of the mutated gene, however, because traditional mapping strategies rely mainly on genetic and cytological markers that are not easily linked to the molecular map.

Results

Here we describe the development of a single-nucleotide polymorphism (SNP) map for chromosome arm 3R. The map contains 73 polymorphisms between the standard FRT chromosome, and a mapping chromosome that carries several visible markers (rucuca), at an average density of one SNP per 370 kilobases (kb). Using this collection, we show that mutants can be mapped to a 400 kb interval in a single meiotic mapping cross, with only a few hundred SNP detection reactions. Discovery of further SNPs in the region of interest allows the mutation to be mapped with the same recombinants to a region of about 50 kb.

Conclusion

The combined use of standard visible markers and molecular polymorphisms in a single mapping strategy greatly reduces both the time and cost of mapping mutations, because it requires at least four times fewer SNP detection reactions than a standard approach. The use of this map, or others developed along the same lines, will greatly facilitate the identification of the molecular lesions in mutants from clonal screens.

4.
5.
A haplotype is a single nucleotide polymorphism (SNP) sequence and a representative genetic marker describing the diversity of biological organisms. Haplotypes have a wide range of applications, for example in pharmacology and medicine. In particular, haplotypes of the honeybee (Apis mellifera), a highly social species, benefit human health and medicine in diverse areas, including venom toxicology, infectious disease, and allergic disease. For this reason, assembling a pair of haplotypes from individual SNP fragments drives research and has generated various computational models for this problem. The minimum error correction (MEC) model is an important computational model for the individual haplotype assembly problem. However, the MEC model has been proved to be NP-hard; therefore, no efficient exact algorithm is known for this problem. In this study, we propose an improved version of a branch and bound algorithm that can assemble a pair of haplotypes with an optimal solution from SNP fragments of a honeybee specimen within a practical time bound. First, we designed a local search algorithm to calculate a good initial upper bound on feasible solutions, which enhances the efficiency of the branch and bound algorithm. Furthermore, to speed up the algorithm, we made use of the recursive property of the bounding function together with a lookup table. After conducting extensive experiments over honeybee SNP data released by the Human Genome Sequencing Center, we showed that our method is highly accurate and efficient for assembling haplotypes.
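As a rough illustration of the MEC objective mentioned in this abstract, the sketch below scores a candidate haplotype pair against a set of SNP fragments and finds the minimum by exhaustive search. It is not the authors' branch and bound algorithm. The fragment encoding (strings over "0", "1" and "-" for an uncovered site), the assumption that the two haplotypes are complementary at every site, and the function names are all hypothetical choices made for this example.

```python
import itertools


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments, h1, h2):
    """MEC cost: each fragment is assigned to the closer of the two haplotypes."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Exhaustive search over complementary haplotype pairs (assumption:
    every site is heterozygous). Exponential in the number of sites, so it
    only serves as a reference for tiny inputs."""
    n_sites = len(fragments[0])
    best = None
    for bits in itertools.product("01", repeat=n_sites):
        h1 = "".join(bits)
        h2 = "".join("1" if b == "0" else "0" for b in h1)
        score = mec_score(fragments, h1, h2)
        if best is None or score < best[0]:
            best = (score, h1, h2)
    return best
```

The exhaustive loop makes the exponential search space explicit; a branch and bound method of the kind described above would prune that space using an upper bound such as one obtained from a local search.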

6.
Inference of intraspecific population divergence patterns typically requires genetic data for molecular markers with relatively high mutation rates. Microsatellites, or short tandem repeat (STR) polymorphisms, have proven informative in many such investigations. These markers are characterized, however, by high levels of homoplasy and varying mutational properties, often leading to inaccurate inference of population divergence. A SNPSTR is a genetic system that consists of an STR polymorphism closely linked (typically < 500 bp) to one or more single-nucleotide polymorphisms (SNPs). SNPSTR systems are characterized by lower levels of homoplasy than are STR loci. Divergence time estimates based on STR variation (on the derived SNP allele background) should, therefore, be more accurate and precise. We use coalescent-based simulations in the context of several models of demographic history to compare divergence time estimates based on SNPSTR haplotype frequencies and STR allele frequencies. We demonstrate that estimates of divergence time based on STR variation on the background of a derived SNP allele are more accurate (3% to 7% bias for SNPSTR versus 11% to 20% bias for STR) and more precise than STR-based estimates, conditional on a recent SNP mutation. These results hold even for models involving complex demographic scenarios with gene flow, population expansion, and population bottlenecks. Varying the timing of the mutation event generating the SNP revealed that estimates of divergence time are sensitive to SNP age, with more recent SNPs giving more accurate and precise estimates of divergence time. However, varying both mutational properties of STR loci and SNP age demonstrated that multiple independent SNPSTR systems provide less biased estimates of divergence time. Furthermore, the combination of estimates based separately on STR and SNPSTR variation provides insight into the age of the derived SNP alleles. 
In light of our simulations, we interpret estimates from data for human populations.

7.
Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.

8.
Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not needed to find the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can now be efficiently solved.

9.
The efficacy of linkage studies using microsatellites and single-nucleotide polymorphisms (SNPs) was evaluated. Analyzed data were supplied by the Collaborative Study on the Genetics of Alcoholism (COGA). Alcoholism was analyzed together with a simulated trait caused by a gene of known position, through a nonparametric linkage test (NPL). For the alcoholism trait, four densities of SNPs (1 SNP per 0.2 cM, 0.5 cM, 1 cM and 2 cM) showed higher peaks of NPL z scores and smaller significant p-values than the usual 10-cM density of microsatellites. However, the two highest densities of SNPs had unstable z score signals, and therefore were difficult to interpret. Analyzing a simulated trait with the same markers in the same pedigrees, we confirmed the higher power of all four densities of SNPs compared to the 10-cM microsatellites panel, although the existence of other confounding peaks was confirmed for maps that are denser than 1 SNP/cM. We further showed that estimating the gene position using SNPs is far less biased than using the usual panel of microsatellites (biases of 0-2 cM for SNPs vs. 8.9 cM for microsatellites). We conclude that using dense maps of SNPs in linkage analysis is more powerful and less biased than using the 10-cM maps of microsatellites. However, linkage signals can be unstable and difficult to interpret when several SNPs are genotyped per centimorgan. The power and accuracy of 1 SNP/cM or 1 SNP/2 cM may be sufficient in a genome-wide linkage scan while denser maps may be most useful in fine-gene mapping studies exploiting linkage disequilibrium.

10.
Because deleterious alleles arising from mutation are filtered by natural selection, mutations that create such alleles will be underrepresented in the set of common genetic variation existing in a population at any given time. Here, we describe an approach based on this idea called VERIFY (variant elimination reinforces functionality), which can be used to assess the extent of natural selection acting on an oligonucleotide motif or set of motifs predicted to have biological activity. As an application of this approach, we analyzed a set of 238 hexanucleotides previously predicted to have exonic splicing enhancer (ESE) activity in human exons using the relative enhancer and silencer classification by unanimous enrichment (RESCUE)-ESE method. Aligning the single nucleotide polymorphisms (SNPs) from the public human SNP database to the chimpanzee genome allowed inference of the direction of the mutations that created present-day SNPs. Analyzing the set of SNPs that overlap RESCUE-ESE hexamers, we conclude that nearly one-fifth of the mutations that disrupt predicted ESEs have been eliminated by natural selection (odds ratio = 0.82 +/- 0.05). This selection is strongest for the predicted ESEs that are located near splice sites. Our results demonstrate a novel approach for quantifying the extent of natural selection acting on candidate functional motifs and also suggest certain features of mutations/SNPs, such as proximity to the splice site and disruption or alteration of predicted ESEs, that should be useful in identifying variants that might cause a biological phenotype.

11.
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as the problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm runs in time linear in the number of loci, whereas the problems addressed by many comparable algorithms are NP-hard. We illustrate the applications of our method to both simulated and real data sets.

12.
Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies (specifically for candidate-gene, candidate-region, and whole-genome association studies) will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry (a new generation of the TaqMan probe-based, 5' nuclease assay) and a high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.

13.
Jiang R  Marjoram P  Borevitz JO  Tavaré S 《Genetics》2006,173(4):2257-2267
This article is concerned with a statistical modeling procedure to call single-feature polymorphisms from microarray experiments. We use this new type of polymorphism data to estimate the mutation and recombination parameters in a population. The mutation parameter can be estimated via the number of single-feature polymorphisms called in the sample. For the recombination parameter, a two-feature sampling distribution is derived in a way analogous to that for the two-locus sampling distribution with SNP data. The approximate-likelihood approach using the two-feature sampling distribution is examined and found to work well. A coalescent simulation study is used to investigate the accuracy and robustness of our method. Our approach allows the utilization of single-feature polymorphism data for inference in population genetics.

14.
Errors while genotyping are inevitable and can reduce the power to detect linkage. However, does genotyping error have the same impact on linkage results for single-nucleotide polymorphism (SNP) and microsatellite (MS) marker maps? To evaluate this question we detected genotyping errors that are consistent with Mendelian inheritance using large changes in multipoint identity-by-descent sharing in neighboring markers. Only a small fraction of Mendelian consistent errors were detectable (e.g., 18% of MS and 2.4% of SNP genotyping errors). More SNP genotyping errors are Mendelian consistent compared to MS genotyping errors, so genotyping error may have a greater impact on linkage results using SNP marker maps. We also evaluated the effect of genotyping error on the power and type I error rate using simulated nuclear families with missing parents under 0, 0.14, and 2.8% genotyping error rates. In the presence of genotyping error, we found that the power to detect a true linkage signal was greater for SNP (75%) than MS (67%) marker maps, although there were also slightly more false-positive signals using SNP marker maps (5 compared with 3 for MS). Finally, we evaluated the usefulness of accounting for genotyping error in the SNP data using a likelihood-based approach, which restores some of the power that is lost when genotyping error is introduced.

15.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.  相似文献   

16.
J Ma  CI Amos 《PloS one》2012,7(7):e40224
Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct "populations" of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases.

17.
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
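To make the estimator and discretization combination named in this abstract concrete, here is a minimal sketch of a Miller-Madow mutual information estimate after equal width discretization. It is an illustration under stated assumptions, not the C3NET implementation: the function names, the default of three bins, and the use of natural logarithms are choices made for this example. The Miller-Madow correction adds (m - 1) / (2N) to the empirical entropy, where m is the number of occupied bins and N the sample size.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to a bin index using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant input: a single bin
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy_miller_madow(symbols):
    """Empirical (plug-in) entropy in nats plus the Miller-Madow bias term."""
    n = len(symbols)
    counts = Counter(symbols)
    h_empirical = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h_empirical + (len(counts) - 1) / (2 * n)


def mutual_information_mm(x, y, n_bins=3):
    """MI estimate as H(X) + H(Y) - H(X, Y), each with the correction."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    joint = list(zip(bx, by))
    return (entropy_miller_madow(bx)
            + entropy_miller_madow(by)
            - entropy_miller_madow(joint))
```

Swapping `equal_width_bins` for a quantile-based binning function would give the equal frequency variant mentioned in the abstract.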

18.
We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimum-recombination principle, which is important for the construction of haplotype maps and genetic linkage/association analyses. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This paper presents an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that utilizes a partial order relationship and some other special relationships among variables to decide the branching order. Nontrivial lower and upper bounds on the optimal number of recombinants are introduced at each branching node to effectively prune the search tree. When multiple solutions exist, a best haplotype configuration is selected based on a maximum likelihood approach. The paper also shows for the first time how to incorporate marker interval distance into a rule-based haplotyping algorithm. Our results on simulated data show that the algorithm could recover haplotypes with 50 loci from a pedigree of size 29 in seconds on a Pentium IV computer. Its accuracy is more than 99.8% for data with no missing alleles and 98.3% for data with 20% missing alleles in terms of correctly recovered phase information at each marker locus. A comparison with a statistical approach SimWalk2 on simulated data shows that the ILP algorithm runs much faster than SimWalk2 and reports better or comparable haplotypes on average than the first and second runs of SimWalk2. As an application of the algorithm to real data, we present some test results on reconstructing haplotypes from a genome-scale SNP dataset consisting of 12 pedigrees that have 0.8% to 14.5% missing alleles.

19.
Nicastrin regulates gamma-secretase cleavage of the amyloid precursor protein by forming complexes with presenilins, in which most mutations causing familial early-onset Alzheimer disease (EOAD) have been found. The gene encoding nicastrin (NCSTN) maps to 1q23, a region that has been linked and associated with late-onset Alzheimer disease (LOAD) in various genome screens. In 78 familial EOAD cases, we found 14 NCSTN single-nucleotide polymorphisms (SNPs): 10 intronic SNPs, 3 silent mutations, and 1 missense mutation (N417Y). N417Y is unlikely to be pathogenic, since it did not alter amyloid beta secretion in an in vitro assay and its frequency was similar in case and control subjects. However, SNP haplotype estimation in two population-based series of Dutch patients with EOAD (n=116) and LOAD (n=240) indicated that the frequency of one SNP haplotype (HapB) was higher in the group with familial EOAD (7%), compared with the LOAD group (3%) and control group (3%). In patients with familial EOAD without the APOE epsilon4 allele, the HapB frequency further increased, to 14%, resulting in a fourfold increased risk (odds ratio = 4.1; 95% confidence interval 1.2-13.3; P=.01). These results are compatible with an important role of gamma-secretase dysfunction in the etiology of familial EOAD.

20.
We propose a simple model of evolution at a pair of SNP loci, under mutation, genetic drift and recombination. The model allows the evolution of SNPs to be considered under different demographic scenarios. We applied it to SNP data containing polymorphisms spanning 19 gene regions. We initially matched the linkage disequilibrium (LD) data only, and then we reconciled both LD and heterozygosity data. The imbalance between LD and heterozygosity data, observed for some of the analyzed genomic regions, may be a signature of selection acting in these regions. However, assuming neutrality, we obtain estimates of the age of population expansion of modern humans, which are consistent with the consensus estimates. In addition, we are able to estimate the ages of the polymorphisms observed in different genomic regions, and we find that they vary widely. Polymorphisms at loci implicated in human disease seem to be younger than average. Our results supplement the conclusions originally obtained by Reich and co-workers for the same set of data.
