Similar literature
1.
Malaria parasites (Plasmodium falciparum) provide an excellent system in which to study the genomic effects of strong selection in a recombining eukaryote because the rapid spread of resistance to multiple drugs during the past 50 years has been well documented, the full genome sequence and a microsatellite map are now available, and haplotype data can be easily generated. We examined microsatellite variation around the dihydrofolate reductase (dhfr) gene on chromosome 4 of P. falciparum. Point mutations in dhfr are known to be responsible for resistance to the antimalarial drug pyrimethamine, and resistance to this drug spread rapidly in Southeast (SE) Asia after its introduction in the 1970s. We genotyped 33 microsatellite markers distributed across chromosome 4 in 61 parasites from a location on the Thailand/Myanmar border. We observed minimal microsatellite length variation in a 12-kb (0.7-cM) region flanking the dhfr gene and diminished variation for approximately 100 kb (6 cM), indicative of a single origin of resistant alleles. Furthermore, we found that the same or similar microsatellite haplotypes flanked resistant dhfr alleles sampled from 11 parasite populations in five SE Asian countries, indicating recent invasion of a single lineage of resistant dhfr alleles in locations 2000 km apart. Three features of these data are of special interest. (1) Pyrimethamine resistance is generally assumed to have evolved multiple times because the genetic basis is simple and resistance can be selected easily in the laboratory. Yet our data clearly indicate a single origin of resistant dhfr alleles sampled over a large region of SE Asia. (2) The wide valley (approximately 6 cM) of reduced variation around dhfr provides "proof-of-principle" that genome-wide association may be an effective way to locate genes under strong recent selection. (3) The width of the selective valley is consistent with predictions based on independent measures of recombination, mutation, and selection intensity, suggesting that we have reasonable estimates of these parameters. We conclude that scanning the malaria parasite genome for evidence of recent selection may prove an extremely effective way to locate genes underlying recently evolved traits such as drug resistance, as well as providing an opportunity to study the dynamics of selective events that have occurred recently or are currently in progress.
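As a rough illustration of how a "valley" of reduced microsatellite variation could be quantified, the sketch below computes expected heterozygosity per marker from sampled allele lengths. The abstract does not say which diversity statistic was used, so the unbiased estimator, the function name, and the allele lengths are all assumptions made for illustration.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical allele lengths (bp): one marker close to the selected gene, one far away.
near_locus = [204, 204, 204, 204, 204, 204]
far_locus = [198, 204, 210, 198, 216, 204]

he_near = expected_heterozygosity(near_locus)  # no length variation
he_far = expected_heterozygosity(far_locus)    # several alleles segregating
```

Plotting such values against map position along the chromosome would show whether variation dips around the locus of interest.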

2.
Amyotrophic lateral sclerosis (ALS) is the most common form of motor neuron disease (MND). It is currently incurable and treatment is largely limited to supportive care. Family history is associated with an increased risk of ALS, and many Mendelian causes have been discovered. However, most forms of the disease are not obviously familial. Recent advances in human genetics have enabled genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. Genome-wide SNP analyses require a large sample size and thus depend upon collaborative efforts to collect and manage the biological samples and corresponding data. Public availability of biological samples (such as DNA), phenotypic and genotypic data further enhances research endeavors. Here we discuss a large collaboration among academic investigators, government, and non-government organizations which has created a public repository of human DNA, immortalized cell lines, and clinical data to further gene discovery in ALS. This resource currently maintains samples and associated phenotypic data from 2332 MND subjects and 4692 controls. This resource should facilitate genetic discoveries which we anticipate will ultimately provide a better understanding of the biological mechanisms of neurodegeneration in ALS.

3.
South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

4.
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).

5.
Recent advances in sequencing and genotyping technologies are contributing to a data revolution in genome-wide association studies that is characterized by the challenging large p small n problem in statistics. That is, given these advances, many such studies now consider evaluating an extremely large number of genetic markers (p) genotyped on a small number of subjects (n). Given the dimension of the data, a joint analysis of the markers is often fraught with many challenges, while a marginal analysis is not sufficient. To overcome these obstacles, herein, we propose a Bayesian two-phase methodology that can be used to jointly relate genetic markers to binary traits while controlling for confounding. The first phase of our approach makes use of a marginal scan to identify a reduced set of candidate markers that are then evaluated jointly via a hierarchical model in the second phase. Final marker selection is accomplished through identifying a sparse estimator via a novel and computationally efficient maximum a posteriori estimation technique. We evaluate the performance of the proposed approach through extensive numerical studies, and consider a genome-wide application involving colorectal cancer.
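The two-phase idea (marginal scan, then a joint model on the surviving markers) can be sketched in a few lines. This is not the authors' method: phase 1 here ranks markers by absolute correlation with the trait, and phase 2 is a plain logistic regression with a Gaussian prior fitted by gradient ascent, which yields a MAP estimate but not a sparse one. All function names, parameters, and the toy data are hypothetical.

```python
import math


def marginal_scan(genotypes, trait, keep):
    """Phase 1 stand-in: keep the markers with the largest |Pearson r| with the binary trait."""
    n = len(trait)
    mean_y = sum(trait) / n
    syy = sum((y - mean_y) ** 2 for y in trait)
    scores = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        mean_x = sum(column) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, trait))
        sxx = sum((x - mean_x) ** 2 for x in column)
        r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
        scores.append((abs(r), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


def map_logistic(genotypes, trait, markers, prior_var=1.0, steps=2000, rate=0.05):
    """Phase 2 stand-in: joint MAP fit of a logistic model with a Gaussian prior on the slopes."""
    n = len(trait)
    intercept = 0.0
    beta = [0.0] * len(markers)
    for _ in range(steps):
        grad0 = 0.0
        grad = [-b / prior_var for b in beta]  # gradient of the log prior
        for row, y in zip(genotypes, trait):
            eta = intercept + sum(b * row[j] for b, j in zip(beta, markers))
            resid = y - 1.0 / (1.0 + math.exp(-eta))
            grad0 += resid
            for k, j in enumerate(markers):
                grad[k] += resid * row[j]
        intercept += rate * grad0 / n
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return intercept, beta


# Toy data: 12 subjects (6 controls, 6 cases), 3 markers; only marker 1 tracks the trait.
trait = [0] * 6 + [1] * 6
genotypes = [
    [0, 0, 1], [1, 0, 1], [2, 1, 0], [0, 0, 2], [1, 0, 0], [2, 1, 2],
    [0, 1, 1], [1, 2, 1], [2, 2, 0], [0, 1, 2], [1, 2, 0], [2, 2, 2],
]
selected = marginal_scan(genotypes, trait, keep=1)
intercept, beta = map_logistic(genotypes, trait, selected)
```

Swapping the Gaussian prior for a sparsity-inducing one would be closer in spirit to the sparse estimator the abstract describes, at the cost of a more involved optimizer.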

6.
Knowledge about genetic diversity and population structure is useful for designing effective strategies to improve the production, management and conservation of farm animal genetic resources. Here, we present a comprehensive genome-wide analysis of genetic diversity, population structure and admixture based on 244 animals sampled from 10 cattle populations in Asia and Africa and genotyped for 69 903 autosomal single-nucleotide polymorphisms (SNPs) mainly derived from the indicine breed. Principal component analysis, STRUCTURE and distance analysis from high-density SNP data clearly revealed that the largest genetic difference occurred between the two domestic lineages (taurine and indicine), whereas Ethiopian cattle populations represent a mosaic of the humped zebu and taurine. Estimation of the genetic influence of zebu and taurine revealed that Ethiopian cattle were characterized by considerable levels of introgression from South Asian zebu, whereas Bangladeshi populations shared very low taurine ancestry. The relationships among Ethiopian cattle populations reflect their history of origin and admixture rather than phenotype-based distinctions. The high within-individual genetic variability observed in Ethiopian cattle represents an untapped opportunity for adaptation to changing environments and for implementation of within-breed genetic improvement schemes. Our results provide a basis for future applications of genome-wide SNP data to exploit the unique genetic makeup of indigenous cattle breeds and to facilitate their improvement and conservation.

7.
Evidence from human genetics supporting the therapeutic hypothesis increases the likelihood that a drug will succeed in clinical trials. Rare and common disease genetics yield a wide array of alleles with a range of effect sizes that can proxy for the effect of a drug in disease. Recent advances in large scale population collections and whole genome sequencing approaches have provided a rich resource of human genetic evidence to support drug target selection. As the range of phenotypes profiled increases and ever more alleles are discovered across world-wide populations, these approaches will increasingly influence multiple stages across the lifespan of a drug discovery programme.

8.
Making the most of 'omics' for crop breeding   Total citations: 1, self-citations: 0, citations by others: 1
Adoption of new breeding technologies is likely to underpin future gains in crop productivity. The rapid advances in 'omics' technologies provide an opportunity to generate new datasets for crop species. Integration of genome and functional omics data with genetic and phenotypic information is leading to the identification of genes and pathways responsible for important agronomic phenotypes. In addition, high-throughput genotyping technologies enable the screening of large germplasm collections to identify novel alleles from diverse sources, thus offering a major expansion in the variation available for breeding. In this review, we discuss these advances, which have opened the door to new techniques for construction and screening of breeding populations, ultimately to increase the efficiency of selection and accelerate the rates of genetic gain.

9.
Summary: As the variation of species is known to be influenced both by ecological and geographical factors, data on the origin of a sample from a given species could be used to infer some of its genetic characteristics. This concept was examined in the context of gene banks, where the assembled diversity usually represents a large range of environments and geographic locations. Results suggest that, although ecological variables in the site of origin can be useful in predicting genetic characteristics in the samples, the use of such data is neither simple nor precise. On the other hand, simple geographic data, irrespective of their ecological content, were found to offer an effective method of stratifying and sampling variation in germ plasm collections.

10.
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. These include a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation and corrections utilizing un-thinned genome-wide SNP data; case-control samples were drawn from four geographically distinct North-American sites. The AIMs-only and genome-wide first principal components (PC1) both corresponded to the previously described North or Northwest-Southeast axis of European variation. We found that the genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations evident between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local LD, corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen system (HLA) on chromosome 6 and to a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. This study concludes that although AIMs panels are a cost-effective way of capturing population structure, genome-wide data should preferably be used when available.
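To make the PC1-based ancestry axis concrete, the following minimal sketch extracts first-principal-component scores for individuals from a small genotype matrix using power iteration on the individual-by-individual Gram matrix. It assumes genotypes coded as 0/1/2 allele counts and only centers each SNP; the per-SNP variance scaling that many pipelines apply is omitted, and the function name and toy data are hypothetical.

```python
import math


def first_pc_scores(genotypes, iterations=200):
    """Return PC1 scores for individuals (rows) from allele counts (columns = SNPs)."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix of individuals: its top eigenvector gives the PC1 direction.
    gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in centered] for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("genotype matrix has no variation")
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * sum(g * x for g, x in zip(row, vec)) for v, row in zip(vec, gram))
    return [v * math.sqrt(eigenvalue) for v in vec]


# Toy data: three individuals from each of two hypothetical ancestry groups, four SNPs.
genotypes = [
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
]
scores = first_pc_scores(genotypes)
```

In an association analysis, scores like these would typically enter the regression as covariates; the sign of a principal component is arbitrary, so only the separation between groups is meaningful.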

11.
Markers with large differences in allele frequencies between ethnicities provide ancestry information that can be applied to genetic studies. We identified over 100 biallelic ancestry informative markers (AIMs) with large allele frequency differences between European Americans (EA) and Pima Amerindians from laboratory and database screens. For 35 of these markers, Mayan, Yavapai and Quechuan Amerindians were genotyped and compared with EA and Pima allele frequencies. Markers with large allele frequency differences between EA and one Amerindian tribe showed only small differences between the Amerindian tribes. Examination of structure in individuals demonstrated a clear separation of subjects of European from those of Amerindian ancestry, and similarity between individuals from disparate Amerindian populations. The AIMs demonstrated the variation in ancestral composition of individual Mexican Americans, providing evidence of applicability in admixture mapping and in controlling for structure in association tests. In addition, a high percentage of single-nucleotide polymorphisms (SNPs) selected on the basis of large frequency differences between EA and Asian populations had large allele frequency differences between EA and Amerindians, suggesting an efficient method for greatly expanding AIMs for use in admixture mapping/structure analysis in Mexican Americans. Together, these data provide additional support for the practical application of admixture mapping in the Mexican American population. Electronic Supplementary Material: Supplementary material is available in the online version of this article.

12.
The successful exploitation of natural genetic diversity requires a basic knowledge of the extent of the variation present in a species. To study natural variation in Arabidopsis thaliana, we defined nested core collections maximizing the diversity present among a worldwide set of 265 accessions. The core collections were generated based on DNA sequence data from a limited number of fragments evenly distributed in the genome and were shown to successfully capture the molecular diversity in other loci as well as the morphological diversity. The core collections are available to the scientific community and thus provide an important resource for the study of genetic variation and its functional consequences in Arabidopsis. Moreover, this strategy can be used in other species to provide a rational framework for undertaking diversity surveys, including single nucleotide polymorphism (SNP) discovery and phenotyping, allowing the utilization of genetic variation for the study of complex traits.

13.
Although inherited mitochondrial genetic variation can cause human disease, no validated methods exist for control of confounding due to mitochondrial population stratification (PS). We sought to identify a reliable method for PS assessment in mitochondrial medical genetics. We analyzed mitochondrial SNP data from 1513 European American individuals concomitantly genotyped with the use of a previously validated panel of 144 mitochondrial markers as well as the Affymetrix 6.0 (n = 432), Illumina 610-Quad (n = 458), or Illumina 660 (n = 623) platforms. Additional analyses were performed in 938 participants in the Human Genome Diversity Panel (HGDP) (Illumina 650). We compared the following methods for controlling for PS: haplogroup-stratified analyses, mitochondrial principal-component analysis (PCA), and combined autosomal-mitochondrial PCA. We computed mitochondrial genomic inflation factors (mtGIFs) and test statistics for simulated case-control and continuous phenotypes (10,000 simulations each) with varying degrees of correlation with mitochondrial ancestry. Results were then compared across adjustment methods. We also calculated power for discovery of true associations under each method, using a simulation approach. Mitochondrial PCA recapitulated haplogroup information, but haplogroup-stratified analyses were inferior to mitochondrial PCA in controlling for PS. Correlation between nuclear and mitochondrial principal components (PCs) was very limited. Adjustment for nuclear PCs had no effect on mitochondrial analysis of simulated phenotypes. Mitochondrial PCA performed with the use of data from commercially available genome-wide arrays correlated strongly with PCA performed with the use of an exhaustive mitochondrial marker panel. Finally, we demonstrate, through simulation, no loss in power for detection of true associations with the use of mitochondrial PCA.
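The abstract reports mitochondrial genomic inflation factors without giving a formula. The sketch below uses the conventional genomic-control definition, the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null, and assumes the mtGIF is computed in the same way over mitochondrial markers; the function name and example statistics are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its expected median under the null."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical association statistics, before and after uniform inflation.
null_like = [0.1, 0.4549364, 2.0]
inflated = [2.0 * s for s in null_like]

lambda_null = genomic_inflation_factor(null_like)
lambda_inflated = genomic_inflation_factor(inflated)
```

Values near 1 suggest little residual stratification, whereas values well above 1 suggest that test statistics are systematically inflated.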

14.
Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for a few platforms based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.

15.
We conducted a two-stage genome-wide association study to identify common genetic variation altering risk of the metabolic syndrome and related phenotypes in Indian Asian men, who have a high prevalence of these conditions. In Stage 1, approximately 317,000 single nucleotide polymorphisms were genotyped in 2700 individuals, from which 1500 SNPs were selected to be genotyped in a further 2300 individuals. Selection for inclusion in Stage 1 was based on four metabolic syndrome component traits: HDL-cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity measured by waist to hip ratio, and diastolic blood pressure. Association was tested with these four traits and a composite metabolic syndrome phenotype. Four SNPs reaching significance level p < 5×10^-7 and with posterior probability of association > 0.8 were found in genes CETP and LPL, associated with HDL-cholesterol. These associations have already been reported in Indian Asians and in Europeans. Five additional loci harboured SNPs significant at p < 10^-6 and posterior probability > 0.5 for HDL-cholesterol, type 2 diabetes or diastolic blood pressure. Our results suggest that the primary genetic determinants of metabolic syndrome are the same in Indian Asians as in other populations, despite the higher prevalence. Further, we found little evidence of a common genetic basis for metabolic syndrome traits in our sample of Indian Asian men.

16.
Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.
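For two-dimensional maps, a Procrustes-type similarity can be computed compactly by treating each point as a complex number. The sketch below is one possible formulation, not necessarily the statistic used in the study: it allows translation, uniform scaling, rotation, and reflection, and returns a value between 0 and 1. The function name and the coordinates are hypothetical.

```python
import math


def procrustes_similarity(genetic_xy, geographic_xy):
    """Similarity in [0, 1] after the best translation, scaling, rotation, or reflection."""
    if len(genetic_xy) != len(geographic_xy) or len(genetic_xy) < 2:
        raise ValueError("need two equally sized sets of at least two points")

    def centered(points):
        zs = [complex(x, y) for x, y in points]
        mean = sum(zs) / len(zs)
        return [z - mean for z in zs]

    z = centered(genetic_xy)
    w = centered(geographic_xy)
    norm = math.sqrt(sum(abs(a) ** 2 for a in z) * sum(abs(b) ** 2 for b in w))
    if norm == 0.0:
        raise ValueError("a map with no spread cannot be compared")
    rotation_only = abs(sum(a.conjugate() * b for a, b in zip(z, w)))
    with_reflection = abs(sum(a * b for a, b in zip(z, w)))
    return max(rotation_only, with_reflection) / norm


# Hypothetical population locations and a "genetic map" that is the same shape
# rotated by 90 degrees, doubled in size, and shifted.
geography = [(0, 0), (1, 0), (1, 2), (3, 1)]
genetic_rotated = [(-2 * y + 5, 2 * x - 1) for x, y in geography]
genetic_mirrored = [(-x, y) for x, y in geography]
genetic_scrambled = [(0, 0), (3, 1), (1, 0), (1, 2)]

sim_rotated = procrustes_similarity(genetic_rotated, geography)
sim_mirrored = procrustes_similarity(genetic_mirrored, geography)
sim_scrambled = procrustes_similarity(genetic_scrambled, geography)
```

A permutation test, which recomputes the similarity after shuffling population labels, is a common way to attach significance to such a value.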

17.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically in the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.

18.
Roeder K, Luca D. Genomics, 2009, 93(1): 1-4
Data for genome-wide association studies are being collected for a myriad of phenotypes. Many of these studies do not include control samples selected to reflect ancestry similar to the case samples. At the same time "control databases" are becoming available to be utilized as a common resource. These data are often genotyped using a large-scale SNP array. Human populations exhibit complex structure that can lead to spurious associations if not properly handled. How to couple case and control databases effectively is a pressing question. We review available methods for modeling genetic ancestry based on the information gleaned from the SNP array. Methods for selecting control samples with genetic ancestry similar to the case samples are described.

19.
Our understanding of the genetic architecture of iris color is still limited. This is partly related to difficulties associated with obtaining quantitative measurements of eye color. Here we introduce a new automated method for measuring iris color using high resolution photographs. This method extracts color measurements in the CIE 1976 L*a*b* (CIELAB) color space from a 256 by 256 pixel square sampled from the 9:00 meridian of the iris. Color is defined across three dimensions: L* (the lightness coordinate), a* (the red-green coordinate), and b* (the blue-yellow coordinate). We applied this method to a sample of individuals of diverse ancestry (East Asian, European and South Asian) that was genotyped for the HERC2 rs12913832 polymorphism, which is strongly associated with blue eye color. We identified substantial variation in the CIELAB color space, not only in the European sample, but also in the East Asian and South Asian samples. As expected, rs12913832 was significantly associated with quantitative iris color measurements in subjects of European ancestry. However, this SNP was also strongly associated with iris color in the South Asian sample, although there were no participants with blue irides in this sample. The usefulness of this method is not restricted only to the study of iris pigmentation. High-resolution pictures of the iris will also make it possible to study the genetic variation involved in iris textural patterns, which show substantial heritability in human populations.
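The conversion from pixel values to CIELAB coordinates can be sketched as follows. The abstract does not describe the photographs' color encoding or how the pixel square is summarized, so this sketch assumes 8-bit sRGB input, a D65 reference white, and a simple per-pixel average in L*a*b* space; the function names and pixel values are hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE 1976 L*a*b* (D65 white point assumed)."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def mean_lab(pixels):
    """Average L*, a*, b* over an iterable of (r, g, b) pixels, e.g. a sampled square."""
    labs = [srgb_to_lab(*pixel) for pixel in pixels]
    if not labs:
        raise ValueError("need at least one pixel")
    return tuple(sum(lab[i] for lab in labs) / len(labs) for i in range(3))


white = srgb_to_lab(255, 255, 255)
black = srgb_to_lab(0, 0, 0)
blue = srgb_to_lab(0, 0, 255)
patch = mean_lab([(0, 0, 0), (255, 255, 255)])
```

In this color space, bluer pixels have more negative b* values, which is why b* is a natural axis for contrasting blue and brown irides.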

20.
Abo R, Jenkins GD, Wang L, Fridley BL. PLoS ONE, 2012, 7(8): e43301
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set - expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.
