Similar literature
1.
Whole-genome resequencing technology has improved rapidly in recent years and is expected to improve further, such that sequencing an entire human genome for $1000 is within reach. Our main aim here is to use whole-genome sequence data to predict the genetic values of individuals for complex traits and to explore the accuracy of such predictions. This is relevant for plant and animal breeding and, in human genetics, for the prediction of an individual's risk for complex diseases. Here, population history and genomic architectures were simulated under the Wright–Fisher population and infinite-sites mutation model, and genetic value was predicted by the genomic selection approach, in which a Bayesian nonlinear model was used to predict the effects of individual SNPs. The Bayesian model assumed a priori that only a few SNPs are causative, i.e., have an effect different from zero. When whole-genome sequence data were used, accuracies of prediction of genetic value increased by >40% relative to the use of dense ∼30K SNP chips. At equally high density, the inclusion of the causative mutations yielded an extra increase in accuracy of 2.5–3.7%. Predictions of genetic value remained accurate even when the training and evaluation data were 10 generations apart. Best linear unbiased prediction (BLUP) of SNP effects does not take full advantage of the genome sequence data, and nonlinear predictions, such as the Bayesian method used here, are needed to achieve maximum accuracy. On the basis of theoretical work, the results could be extended to more realistic genome and population sizes.

Genome resequencing technologies, which for simplicity we call genome sequencing even though they are applied to species with a reference sequence, are currently developing at a very rapid rate.
Current-generation sequencing technology is two orders of magnitude faster and more cost effective than the technologies used for the sequencing of the human genome (Shendure and Ji 2008; TenBosch and Grody 2008). Future technologies are expected to reduce cost by another 100-fold, so that sequencing an entire human genome for $1000 is considered achievable in the near future (Mardis 2008). The question arises: How can we make best use of entire genome sequence data on many individuals? One use will be the ability to predict the genetic value of an individual for complex traits. In the fields of animal and plant breeding, this would be of great practical benefit because most important traits are complex, quantitative traits, i.e., traits that are affected by many genes and by the environment. In humans, the promise of personalized medicine relies on the ability to predict an individual's genetic risk for complex, multifactorial diseases, such as Crohn's disease (Barrett et al. 2008), and the ability to predict response to alternative treatments. The first aim of this article is to explore the accuracy of this prediction using the full genome sequence of the individual.

The use of high-density SNP genotype data to predict genetic value, called genomic selection, was first proposed by Meuwissen et al. (2001). In its most sophisticated form, a Bayesian model was used to predict the effects of thousands of SNPs on the total genetic value simultaneously, where a priori it was assumed that only a few SNPs were useful for predicting the trait [because they were in linkage disequilibrium (LD) with mutations causing variation in the trait], while many SNPs were not useful. Even among the SNPs that were useful for prediction, it was assumed that the distribution of effects was not normal, because some SNPs were in LD with quantitative trait loci (QTL) that may occasionally have a very large effect.
To model this, the SNP effects were assumed to follow a distribution with thicker tails than the normal distribution (e.g., the t-distribution is often used). In the case of whole-genome sequence data, the polymorphisms that cause the genetic differences between individuals are among those being analyzed. For the sake of simplicity we call all polymorphisms in the sequence data SNPs, while recognizing that other types of polymorphisms such as indels will be included. Assuming that the causal SNPs are included in the analysis simplifies the prior distribution of the SNP effects, because the effects of all the other SNPs, even if they are in LD with the causal SNPs, are expected to disappear. Thus, the prior reduces to the assumption that some SNPs are causative and have an effect drawn from the distribution of gene effects. The distribution of gene effects has been investigated extensively in the evolutionary and other literature and is reported to be gamma (Hayes and Goddard 2001) or exponentially distributed (Erickson et al. 2004; Rocha et al. 2004), where the latter is a special form of the gamma distribution. On the downside, whole-genome sequence data will contain millions of SNPs, and it may be difficult for genomic selection to separate the relatively few causative SNPs from all the others.

Meuwissen et al. (2001) also investigated a model in which all SNPs were assumed to have an effect drawn from the same normal distribution [the so-called genome-wide best linear unbiased prediction (GWBLUP) model]. Although this model seems biologically implausible, it has been found to perform well in data from dairy cattle (VanRaden et al. 2009).
However, we hypothesize that with sequence-level data the BLUP model will not perform as well as models that assume that only some causal SNPs need to be included in the model.

The aims here are to investigate the following: how accurately genetic values for complex traits can be predicted by genomic selection when whole-genome sequence data are available on a large number of individuals; whether it makes a difference to have the whole-genome sequence available, including the causative mutations, vs. very dense SNP marker genotypes; whether the estimates of the SNP effects can be used on individuals that are many generations separated from the data set in which they were estimated; the effect of the statistical model used on accuracy of prediction; and how accurately causative mutations can be detected and mapped. Because whole-genome sequence data on many individuals are not yet available, and because we needed to know the true genetic values of the individuals, the aforementioned questions were investigated by computer simulations of whole-genome sequence data.
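To make the BLUP baseline concrete, the following is a minimal sketch of SNP-BLUP written as ridge regression, in which every SNP effect is shrunk by the same amount. It is not the authors' code and does not implement their Bayesian model. The function names, the toy genotypes and phenotypes, and the shrinkage parameter `lam` (standing in for the ratio of residual variance to per-SNP effect variance) are assumptions made for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * solution[c] for c in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_snp_blup(genotypes, phenotypes, lam):
    """Estimate SNP effects by solving (X'X + lam * I) b = X'y on centered data.

    genotypes: one list of allele counts (0, 1 or 2) per individual.
    lam: hypothetical shrinkage parameter, identical for every SNP.
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(phenotypes) / n
    y = [value - y_mean for value in phenotypes]
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear_system(xtx, xty), col_means, y_mean


def predict_genetic_value(genotype, effects, col_means, y_mean):
    """Predicted value = trait mean + sum of effect * centered allele count."""
    return y_mean + sum(b * (g - m) for b, g, m in zip(effects, genotype, col_means))


# Toy data (invented): the phenotype is exactly 2 * (allele count at SNP 0) + 1.
GENOTYPES = [[0, 1], [1, 0], [2, 1], [1, 2]]
PHENOTYPES = [1.0, 3.0, 5.0, 3.0]

effects, col_means, y_mean = fit_snp_blup(GENOTYPES, PHENOTYPES, lam=2.0)
print(effects)
print(predict_genetic_value([2, 0], effects, col_means, y_mean))
```

In this toy example the causal SNP has a true effect of 2, and the uniform shrinkage pulls its estimate toward zero. This is one way to see why a prior that lets a few SNPs keep large effects might do better when the causal variants are among the genotyped SNPs.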

2.

Key message

This study reports transmission genetics of chromosomal segments into Gossypium hirsutum from its most distant euploid relative, Gossypium mustelinum. Multilocus interactions and structural rearrangements affect introgression and segregation of donor chromatin.

Abstract

Wild allotetraploid relatives of cotton are a rich source of genetic diversity that can be used in genetic improvement, but linkage drag and non-Mendelian transmission genetics are prevalent in interspecific crosses. These problems necessitate knowledge of transmission patterns of chromatin from wild donor species in cultivated recipient species. From an interspecific cross, Gossypium hirsutum × Gossypium mustelinum, we studied G. mustelinum (the most distant tetraploid relative of Upland cotton) allele retention in 35 BC3F1 plants and segregation patterns in BC3F2 populations totaling 3202 individuals, using 216 DNA marker loci. The average retention of donor alleles across BC3F1 plants was higher than expected and the average frequency of G. mustelinum alleles in BC3F2 segregating families was less than expected. Despite surprisingly high retention of G. mustelinum alleles in BC3F1, 46 genomic regions showed no introgression. Regions on chromosomes 3 and 15 lacking introgression were closely associated with possible small inversions previously reported. Nonlinear two-locus interactions are abundant among loci with single-locus segregation distortion, and among loci originating from one of the two subgenomes. Comparison of the present results with those of prior studies indicates different permeability of Upland cotton for donor chromatin from different allotetraploid relatives. Different contributions of subgenomes to two-locus interactions suggest different fates of subgenomes in the evolution of allotetraploid cottons. Transmission genetics of G. hirsutum × G. mustelinum crosses reveals allelic interactions, constraints on fixation and selection of donor alleles, and challenges with retention of introgressed chromatin for crop improvement.

3.
The degree of dominance and the heritability coefficient have been investigated as the markers of allelic interactions of the loci Rht8 and Rht-B1. The interaction is characterized by partial or total dominance or even by overdominance of greater plant height and by low or medium heritability. The alleles that exert a weaker direct negative effect or no such effect predominate over the alleles that predispose to a more pronounced reduction of plant height. The impact of weather conditions on the switch of dominance is discussed. The Kooperatorka line has an additional allele (or alleles) that predisposes to greater plant height; the mode of inheritance of these alleles is partially recessive. An unidentified semidominant gene (or genes) that determines small plant height is present in the genotype of the Odes’ka 3 line. The presence of heterosis related to heterozygosity for certain genes not critically relevant for the present study makes the assessment of the allelic interaction characteristics under investigation more complicated.

4.
Although substantial heritability has been reported and candidate genes have been identified extensively, all known marker associations explain only a small proportion of the phenotypic variance of developmental dyslexia (DD) and related quantitative phenotypes. Gene-by-gene interaction (G × G, also known as “epistasis”) triggers a non-additive effect of genes at different loci and should be taken into account in explaining part of the missing heritability of this complex trait. We assessed potential G × G interactions among five DD candidate genes, i.e., DYX1C1, DCDC2, KIAA0319, ROBO1, and GRIN2B, upon DD-related neuropsychological phenotypes in 493 nuclear families with DD, by implementing two complementary regression-based approaches: (1) a general linear model equation whereby the trait is predicted by the main effect of the number of rare alleles of the two genes and by the effect of the interaction between them, and (2) a family-based association test to detect G × G interactions between two unlinked markers by splitting the association effect into orthogonal between-family and within-family genetic components. After applying 500,000 permutations and correcting for multiple testing, both methods show that G × G effects between markers within the DYX1C1, KIAA0319/TTRAP, and GRIN2B genes lower the memory letters composite z-score by an average of 0.55 standard deviations. We provide initial evidence that the effects of familial transmission of synergistic interactions between genetic risk variants can be exploited in the study of the etiology of DD, explain part of its missing heritability, and assist in designing customized charts of individualized neurocognitive impairments in complex disorders, such as DD.
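The first regression approach described in this abstract can be written generically as a linear model with an interaction term. The notation below is an assumption for illustration, not taken from the paper: $y_i$ is the trait of individual $i$, and $g_{1i}$ and $g_{2i}$ are the numbers of rare alleles (0, 1 or 2) carried at the two markers.

```latex
y_i = \beta_0 + \beta_1 g_{1i} + \beta_2 g_{2i} + \beta_3 \, g_{1i} g_{2i} + \varepsilon_i ,
\qquad H_0 : \beta_3 = 0
```

Under this sketch, $\beta_1$ and $\beta_2$ are the main effects of the two genes and $\beta_3$ captures the G × G interaction, so a test of $\beta_3 = 0$ (here presumably assessed by permutation, as the abstract reports 500,000 permutations) asks whether the joint effect departs from additivity.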

5.
Genome-wide association studies (GWAS) identify susceptibility loci for complex traits, but do not identify particular genes of interest. Integration of functional and network information may help in overcoming this limitation and identifying new susceptibility loci. Using GWAS and comorbidity data, we present a network-based approach to predict candidate genes for lipid and lipoprotein traits. We apply a prediction pipeline incorporating interactome, co-expression, and comorbidity data to Global Lipids Genetics Consortium (GLGC) GWAS for four traits of interest, identifying phenotypically coherent modules. These modules provide insights regarding gene involvement in complex phenotypes with multiple susceptibility alleles and low effect sizes. To experimentally test our predictions, we selected four candidate genes and genotyped representative SNPs in the Malmö Diet and Cancer Cardiovascular Cohort. We found significant associations with LDL-C and total-cholesterol levels for a synonymous SNP (rs234706) in the cystathionine beta-synthase (CBS) gene (p = 1 × 10⁻⁵ and adjusted-p = 0.013, respectively). Further, liver samples taken from 206 patients revealed that patients with the minor allele of rs234706 had significant dysregulation of CBS (p = 0.04). Despite the known biological role of CBS in lipid metabolism, SNPs within the locus have not yet been identified in GWAS of lipoprotein traits. Thus, the GWAS-based Comorbidity Module (GCM) approach identifies candidate genes missed by GWAS, serving as a broadly applicable tool for the investigation of other complex disease phenotypes.

Meta-analyses of genome-wide association studies (GWAS) have pinpointed a number of new gene regions contributing to multifactorial diseases. GWAS typically find limited numbers of loci that contribute modestly to complex phenotypes (1), and GLGC meta-analysis of GWAS data has reached the limit of what can be expected (2) without the use of alternative strategies.
Given that susceptibility loci for complex traits are unlikely to be randomly distributed in the genome (3), we might expect that the genes associated with a disease will be more likely to be present within the same pathways or functional groupings. In published cases, pathway-based GWAS analysis provides an alternative approach to the dissection of complex disease traits (4, 5). In addition, nominal GWAS p values superimposed upon the human molecular network have been used to identify genes associated with multiple sclerosis (6), and the disease association protein–protein link evaluator (DAPPLE) has been used to find significant interactions among proteins encoded by genes in loci associated with other particular diseases (7). Other approaches incorporate heterogeneous molecular data such as linkage studies, cross-species conservation measures, gene expression data and protein–protein interactions to better understand GWAS results (8, 9). Integrating molecular network information, pathway analyses, and GWAS data thus holds promise for identifying new susceptibility loci and improving the identification of relevant candidate genes.

If a gene is involved in a specific functional process or disease, its molecular network neighbors might also be suspected to have some role (3). In line with this “local” hypothesis, proteins involved in the same disease show a high propensity to interact (10) or cluster together (11) with each other. Interactions between variations in multiple genes, each with strong or modest effects, perturbing the same pathways or modules, may govern complex traits (3, 6). The molecular triangulation (MT) algorithm can be applied to rank seed genes according to their common disease-associated neighbors, assigning closer and more connected neighbors higher values (12). Interactions between modestly associated MT genes may be indicative of coherent disease pathways or of genes conferring susceptibility to disease in a coordinated manner.
The jActiveModule method (13) combines seed gene scores with biologically relevant interactions to identify network modules where perturbations causative of disease are more likely to reside. Lastly, although not yet implemented at the module level, phenotypic coherence between interacting pairs of genes has been quantified using the combination of molecular-level gene-to-disease relationships and Medicare comorbidity data (14, 15).

We believe that GWAS-significant SNPs and variants representing potential candidate genes can be combined with the above strategies to reveal more about the missing heritability of complex phenotypes. The most important risk factors for coronary artery disease (CAD) include serum concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG). We present a GWAS-based meta-analysis Comorbid Module (GCM) approach that uses significant (p < 5 × 10⁻⁸) GWAS signals for these four traits in the context of molecular networks to prioritize modules of disease-associated candidate genes. We evaluate our approach experimentally through allelic association and genotyping within the Malmö Diet and Cancer Cardiovascular Cohort (MDC-CC) for SNPs representing top candidate genes.

6.
African ancestry individuals have a more favorable lipoprotein profile than Caucasians, although the mechanisms for these differences remain unclear. We measured fasting serum lipoproteins and genotyped 768 tagging or potentially functional single nucleotide polymorphisms (SNPs) across 33 candidate gene regions in 401 Afro-Caribbeans older than 18 years belonging to 7 multi-generational pedigrees (mean family size 51, range 21–113, 3,426 relative pairs). All lipoproteins were significantly heritable (P < 0.05). Gender-specific analysis showed that heritability for triglycerides was much higher (P < 0.01) in women than in men (women, 0.62 ± 0.18, P < 0.01; men, 0.13 ± 0.17, P > 0.10), but the heritability for LDL cholesterol (LDL-C) was higher (P < 0.05) in men than in women (men, 0.79 ± 0.21, P < 0.01; women, 0.39 ± 0.12, P < 0.01). The top 14 SNPs that passed the false discovery rate threshold in the families were then tested for replication in an independent population-based sample of 1,750 Afro-Caribbean men aged 40+ years. Our results revealed significant associations for three SNPs in two genes (rs5929 and rs6511720 in LDLR and rs7517090 in PCSK9) and LDL-C in both the family study and in the replication study. Our findings suggest that LDLR and PCSK9 variants may contribute to a variation in LDL-C among African ancestry individuals. Future sequencing and functional studies of these loci may advance our understanding of genetic factors contributing to LDL-C in African ancestry populations.

Lipoprotein abnormalities, characterized by elevated levels of LDL cholesterol (LDL-C) and triglycerides (TRIG) and low levels of HDL cholesterol (HDL-C), have a central role in the development of atherosclerotic coronary heart disease (CHD). A recent meta-analysis, including 3,000 individuals with CHD-related deaths, showed that HDL-C and LDL-C are independently associated with CHD risk (1).
There is also considerable evidence that high levels of TRIG are an additional, independent risk factor for CHD (2, 3), although this is still controversial (4).

Individuals of African ancestry have a more favorable lipoprotein profile than Caucasians, characterized by lower levels of TRIG and higher levels of HDL-C (58). The mechanisms responsible for these ethnic differences remain to be defined. In particular, the differences in TRIG levels are independent of the greater degree of obesity among individuals of African ancestry and of several other risk factors, and appear to be consistent across African populations in different environments (9), indicating a possible role of genetic factors. Although genetic factors are important in determining lipoprotein levels, few data exist regarding the importance of heredity and specific genetic factors in determining lipoprotein levels in populations of African ancestry, especially outside the US, and the findings from previous studies in African-Americans may not necessarily apply to other African ancestry populations. Recently, several genome-wide association studies identified a number of loci contributing to inter-individual variation in lipoprotein levels (10, 11). However, the majority of these studies were restricted to Caucasian populations. Given the ethnic differences in lifestyle and environmental factors, as well as in genetic background, it is important to examine genes related to lipoprotein metabolism in different ethnic groups. Therefore, we examined the heritability of fasting serum levels of HDL-C, LDL-C, and TRIG and systematically screened for association with 33 positional and biological candidate genes in large, multigenerational families of African ancestry.

7.
Although our previous GWAS failed to identify SNPs associated with pulmonary function at the level of genome-wide significance, it did show that the heritability for FEV1/FVC was 41.6% in a Japanese population, suggesting that the heritability of pulmonary function traits can be explained by the additive effects of multiple common SNPs. In addition, our previous study indicated that pulmonary function genes identified in previous GWASs in non-Japanese populations accounted for 4.3% to 12.0% of the entire estimated heritability of FEV1/FVC in a Japanese population. Therefore, given that many loci with individually weak effects may contribute to asthma risk, in this study we created a quantitative score of genetic load based on 16 SNPs implicated in lower lung function in both Japanese and non-Japanese populations. This genetic risk score (GRS) for lower FEV1/FVC was consistently associated with the onset of asthma (P = 9.6 × 10⁻⁴) in 2 independent Japanese populations as well as with the onset of COPD (P = 0.042). Clustering of asthma patients based on GRS levels indicated that an increased GRS may be responsible for the development of a particular phenotype of asthma characterized by early onset, atopy, and more severe airflow obstruction.
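A genetic risk score of the kind described here is commonly computed as a sum of risk-allele counts across the selected SNPs, optionally weighted by per-SNP effect sizes. The sketch below illustrates that general idea only. The abstract does not say whether the 16-SNP score was weighted, so both the unweighted default and the optional `weights` argument are assumptions, as are the function name and the example genotypes.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Return the (optionally weighted) sum of risk-allele counts.

    risk_allele_counts: one value per SNP, each 0, 1 or 2 copies of the risk allele.
    weights: hypothetical per-SNP weights; defaults to 1.0 for every SNP.
    """
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("weights and risk_allele_counts must have the same length")
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each risk-allele count must be 0, 1 or 2")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Invented example: one individual genotyped at four SNPs.
print(genetic_risk_score([0, 1, 2, 1]))
print(genetic_risk_score([2, 0], weights=[0.5, 1.0]))
```

With 16 biallelic SNPs an unweighted score of this form would range from 0 to 32, which is what allows patients to be grouped by GRS level as the abstract describes.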

8.
Heading date is one of the most important traits in wheat breeding as it affects adaptation and yield potential. A genome-wide association study (GWAS) using the 90 K iSelect SNP genotyping assay indicated that a total of 306 loci were significantly associated with heading and flowering dates in 13 environments in Chinese common wheat from the Yellow and Huai wheat region. Of these, 105 loci were significantly correlated with both heading and flowering dates and were found in clusters on chromosomes 2, 5, 6, and 7. Based on differences in distribution of the vernalization and photoperiod genes among chromosomes, arms, or block regions, 13 novel, environmentally stable genetic loci were associated with heading and flowering dates, including RAC875_c41145_189 on 1DS, RAC875_c50422_299 on 2BL, and RAC875_c48703_148 on 2DS, that accounted for more than 20% phenotypic variance explained (PVE) of the heading/flowering date in at least four environments. GWAS and t test of a combination of SNPs and vernalization and photoperiod alleles indicated that the Vrn-B1, Vrn-D1, and Ppd-D1 genes significantly affect heading and flowering dates in Chinese common wheat. Based on the association of heading and flowering dates with the vernalization and photoperiod alleles at seven loci and three significant SNPs, optimal linear regression equations were established, which show that of the seven loci, the Ppd-D1 gene plays the most important role in modulating heading and flowering dates in Chinese wheat, followed by Vrn-B1 and Vrn-D1. Additionally, three novel genetic loci (RAC875_c41145_189, Excalibur_c60164_137, and RAC875_c50422_299) also show important effect on heading and flowering dates. Therefore, Ppd-D1, Vrn-B1, Vrn-D1, and the novel genetic loci should be further investigated in terms of improving heading and flowering dates in Chinese wheat. 
Further quantitative analysis of an F10 recombinant inbred line population identified a major QTL that controls heading and flowering dates within the Ppd-D1 locus with PVEs of 28.4% and 34.0%, respectively; this QTL was also significantly associated with spike length, peduncle length, fertile spikelet number, cold resistance, and tiller number.

9.
Alzheimer’s disease (AD) is the most common form of dementia and exhibits a considerable level of heritability. Previous association studies gave evidence for associations of HLA-DRB1/DQB1 alleles with AD. However, how and when the gene variants in HLA-DRB1/DQB1 function in AD pathogenesis has yet to be determined. Here, we first investigated the association between gene variants in HLA-DRB1/DQB1 and AD-related brain structure on magnetic resonance imaging (MRI) in a large sample from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We selected hippocampus, subregion, parahippocampus, posterior cingulate, precuneus, middle temporal, entorhinal cortex, and amygdala as regions of interest (ROIs). Twelve SNPs in HLA-DRB1/DQB1 were identified in the dataset following quality control measures. In the total group hybrid population analysis, three SNPs (rs35445101, rs1130399, and rs28746809) were associated with a smaller baseline volume of the left posterior cingulate, and rs2854275 was associated with a larger baseline volume of the left posterior cingulate. Furthermore, we detected the above four associations in the mild cognitive impairment (MCI) sub-group analysis, and two risk loci (rs35445101 and rs1130399) were also associated with a smaller baseline volume of the left posterior cingulate in the NC sub-group analysis. Our study suggests that HLA-DRB1/DQB1 gene variants appear to modulate the alteration of the left posterior cingulate volume, hence modulating susceptibility to AD.

10.
Understanding patterns of genetic diversity of plants is important in guiding conservation programs. The aim of our study was to characterize genetic diversity in Afzelia quanzensis, an economically important African tree species. We genotyped 192 individuals at 10 nuclear microsatellite loci. Samples were collected from nine sites in Zimbabwe, five in the north and four in the south, separated by a mountain range, the Kalahari-Zimbabwe axis. Overall, genetic diversity was relatively low across all sites (expected heterozygosity (H_E) = 0.452, mean number of alleles (A) = 4.367, allelic richness (A_R) = 2.917, effective number of alleles (A_E) = 2.208, and private allelic richness (PAR) = 0.197). Genetic diversity estimates, H_E, A, A_R, and PAR, were not significantly different between northern and southern sites. Allelic richness was significantly higher in southern sites. Significant population differentiation was observed among all sites (F_ST = 0.0936, G′_ST = 0.1982, G_ST = 0.1001, D_JOST = 0.0598). STRUCTURE analysis and principal components analysis identified two gene pools, one predominantly made up of southern individuals, and the other of northern individuals. A Monmonier’s function detected a genetic barrier that coincided with the Kalahari-Zimbabwe axis. The relatively low level of genetic diversity in A. quanzensis may reduce adaptability and limit future evolutionary responses. All sites should be monitored for deleterious effects of low genetic diversity, and genetic resource management should take into consideration the existence of the distinct gene pools to capture the entire extant genetic variation.
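Several of the diversity statistics quoted above have simple textbook definitions for a single locus: expected heterozygosity is one minus the sum of squared allele frequencies, and the effective number of alleles is the reciprocal of that sum. The sketch below shows those definitions only. The abstract does not state which estimators or software the authors used (for example, whether a small-sample correction was applied), so the function names and the toy microsatellite genotypes are assumptions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """H_E = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_number_of_alleles(freqs):
    """A_E = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


# Invented genotypes at one microsatellite locus (alleles named by fragment size).
locus = [(152, 152), (152, 156), (156, 156), (152, 156)]
freqs = allele_frequencies(locus)
print(expected_heterozygosity(freqs), effective_number_of_alleles(freqs))
print(observed_heterozygosity(locus))
```

Multi-locus values such as those reported in the abstract are typically averages of per-locus statistics, although the exact procedure used in the study is not given here.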

11.

Introduction

Approximately 100 loci have been definitively associated with rheumatoid arthritis (RA) susceptibility. However, they explain only a fraction of RA heritability. Interactions between polymorphisms could explain part of the remaining heritability. Multiple interactions have been reported, but only the shared epitope (SE) × protein tyrosine phosphatase nonreceptor type 22 (PTPN22) interaction has been replicated convincingly. Two recent studies deserve attention because of their quality, including their replication in a second sample collection. In one of them, researchers identified interactions between PTPN22 and seven single-nucleotide polymorphisms (SNPs). The other showed interactions between the SE and the null genotype of glutathione S-transferase Mu 1 (GSTM1) in the anti–cyclic citrullinated peptide–positive (anti-CCP+) patients. In the present study, we aimed to replicate association with RA susceptibility of interactions described in these two high-quality studies.

Methods

A total of 1,744 patients with RA and 1,650 healthy controls of Spanish ancestry were studied. Polymorphisms were genotyped by single-base extension. SE genotypes of 736 patients were available from previous studies. Interaction analysis was done using multiple methods, including those originally reported and the most powerful methods described.

Results

Genotypes of one of the SNPs (rs4695888) failed quality control tests. The call rate for the other eight polymorphisms was 99.9%. The frequencies of the polymorphisms were similar in RA patients and controls, except for PTPN22 SNP. None of the interactions between PTPN22 SNPs and the six SNPs that met quality control tests was replicated as a significant interaction term—the originally reported finding—or with any of the other methods. Nor was the interaction between GSTM1 and the SE replicated as a departure from additivity in anti-CCP+ patients or with any of the other methods.

Conclusions

None of the interactions tested were replicated in spite of sufficient power and assessment with different assays. These negative results indicate that whether interactions are significant contributors to RA susceptibility remains unknown and that strict standards need to be applied to claim that an interaction exists.

12.
A total of 18 polymorphic microsatellite loci were isolated and characterized from RAPD products in the Xinjiang Arctic Grayling (Thymallus arcticus grubei). The number of alleles (Na) per locus varied from 2 to 10. Observed (Ho) and expected (He) heterozygosities ranged from 0.64 to 0.92, and from 0.63 to 0.88, respectively. Considerable differences were found among the HBH, FH and FY populations in the number of alleles, effective number of alleles, and number of genotypes at all of these loci. These new RAPD-SSR markers provide a helpful tool for genetic analyses and resource conservation of T. arcticus grubei. Five additional fish species, Amur grayling (Thymallus grubii), Taimen (Hucho taimen), Sea perch (Lateolabrax japonicus), Lenok (Brachymystax lenok) and Red sea bream (Pagrosomus major), were assessed for cross-species amplification. Three of the five species showed at least one polymorphic locus. In addition, seven loci were found to be polymorphic in at least one species.

13.
Michael N. Gould 《Genetics》2009,183(2):409-412
My research seeks to aid in developing approaches to prevent breast cancer. This research evolved from our early empirical studies for discovering natural compounds with anticancer activities, coupled with clinical evaluation, to a genetics-driven approach to prevention. This centers on the use of comparative genomics to discover risk-modifying alleles that could help define population and individual risk and also serve as potential druggable prevention targets to mitigate risk. Here, we initially fine map mammary cancer loci in a rat carcinogenesis model and then evaluate their human homologs in breast cancer case-control association studies. This approach has yielded promising results, including the finding that the human homologous region of the compound rat QTL Mcs5a was associated with breast cancer risk. These and related findings have the potential to yield advancements both in translation-prevention research and in basic molecular genetics.

Writing this Perspectives for Genetics allows me to examine how a cancer biologist focused on cancer prevention morphed into a practicing geneticist. In addition, it allows me to review a decade of our investigations into the complexity of the genetic risk to breast cancer development using comparative genomics.

Our comparative genomic strategy consists of genetically identifying mammary cancer risk loci using fine mapping studies in a rat mammary carcinogenesis model. Human homologs of these loci are then evaluated in human breast cancer association studies for their potential to modify risk. This genetics approach provides an integrated discovery platform to identify and mechanistically characterize novel breast cancer risk alleles. We predict that this platform will serve as a foundation for a cancer prevention drug development pipeline.

My early work focused on the etiology and prevention of breast cancer. It is work on these interrelated areas that led me to investigate breast cancer genetics.
While studying the etiology of breast cancer after joining the faculty of the University of Wisconsin, my interest targeted early events in the etiology of cancer. These range from altering the metabolic activation of environmental xenobiotics to metabolites capable of adducting DNA to destroying clones of premalignant cells. At the time we began work in this area, cancer chemoprevention was an emerging field that was assumed to be less complex than cancer therapy. This was, in part, based on the fact that normal and premalignant tissues were genetically more stable than cancer cells and thus less likely to develop resistance to anticancer drugs.

Our chemoprevention studies focused on a novel class of nontoxic monoterpenes widely found in the essential oils of fruits. These compounds were found to have both preventive and therapeutic anticancer activities in being able to inhibit both premalignant and malignant cells. Our lead compound was limonene, found in orange peel oil, and the first monoterpene we entered into FDA-approved clinical trials was perillyl alcohol (POH), found originally in lavender oil. For expediency, our first trial was a therapeutic one. This therapeutic phase I trial showed limited promising results (Ripple et al. 2000). We later discovered that POH inhibited the antiapoptotic ability of cancers via a calcium channel interaction that led to the downregulation of NFκB (Berchtold et al. 2005). This mechanism of action could underlie the cytostatic and cytotoxic actions of POH toward both premalignant and malignant cells.

The monoterpenes and POH were found through empirical screening. Like the monoterpenes, many chemopreventive and therapeutic agents are found to be of low overall efficacy. Many also have undesirable toxicity, in part due to the lack of target specificity. 
As such, we felt the need to develop nonempirical methods to develop prevention strategies and drugs.

To develop chemopreventive agents for common diseases, we sought an approach that would identify both appropriate drug targets and high-risk populations. For example, we aimed to develop prevention strategies for the large number of individuals at risk for breast cancer but not those who specifically carried the rare but highly penetrant susceptibility alleles of the breast cancer genes such as Brca1 and -2; these and other highly penetrant breast cancer risk alleles collectively account for <25% of inherited breast cancer risk in humans (Pharoah et al. 2008).

We thus sought to identify moderately penetrant breast cancer susceptibility alleles that were common (high population frequency). Ten years ago it was difficult to identify such loci directly in human populations. In fact, most association studies at that time were based on a “candidate gene” approach; these studies were rarely successful (Pharoah et al. 2007). We thus adapted a comparative genomics strategy in which such loci are identified in a model organism using a nonbiased linkage approach and then evaluated in humans. We chose what we believe is the in vivo breast cancer model most closely related to the human: the rat.

The rat, in contrast to the mouse and like the human, develops a spectrum of hormonally responsive and nonresponsive breast cancers. Importantly, almost all rat and human cancers have a ductal cell origin (Gould 1995). At the time we began this research, however, the rat had far fewer genetic resources and tools than the mouse (Gould 1995). This can be illustrated by our need to use an M13 minisatellite marker to identify our first rat mammary susceptibility QTL (Hsu et al. 1994). Over the course of this research and subsequent studies, rat geneticists have substantially narrowed this technology gap (see Aitman et al. 2008). 
For example, in pursuing this project we developed a technology that produced the first gene inactivation (“knockout”) rat models (Zan et al. 2003).

Comparative genetics studies:

The first major results of these genomewide comparative studies were published by Shepel et al. (1998) in Genetics. In this study we crossed two rat strains with large differences in their susceptibility to the induction of mammary carcinomas by the chemical carcinogen dimethylbenzanthracene (DMBA). The susceptible strain was the Wistar-Furth (WF) rat, while the resistant strain was the Copenhagen (COP) rat. F1 hybrid rats were backcrossed (WF × COP) F1 × WF or intercrossed (F1 × F1). Large groups of these rats were orally gavaged with DMBA, and the average number of mammary carcinomas per rat was quantified at necropsy. Rats were also genotyped using microsatellite markers, which had become available for the rat in the 1990s.

The QTL genetically identified in this study accounted for most of the genetic variance controlling susceptibility to mammary cancer by identifying the Mammary carcinoma susceptibility (Mcs) loci: Mcs1, -2, -3, and -4. The COP allele of Mcs1, -2, and -3 conferred resistance while Mcs4 conferred an increased susceptibility to mammary cancer development. This study demonstrated the ability to use the rat model to identify the major COP vs. WF polymorphic loci controlling susceptibility. These loci interacted in an additive manner. Interestingly, the almost completely mammary cancer-resistant COP rat strain was shown to carry a polymorphic allele at the Mcs4 locus predicted to increase mammary cancer risk.

In extending this study, we asked whether other mammary cancer-resistant strains varied at polymorphic mammary cancer susceptibility loci shared with those genetically identified in the WF × COP cross. A similar analysis was performed by conducting a QTL analysis of a cross between WF and a second resistant strain, Wistar-Kyoto (WKy). In this backcross analysis we genetically identified four loci that accounted for most of the genetic variance associated with the susceptibility phenotype. 
As with the COP cross, the WKy cross identified three loci in which the WKy allele contributed to resistance and one locus at which the WKy allele contributed to increased susceptibility (Lan et al. 2001). Of these four WKy loci, only one broadly overlapped with those identified in the COP × WF cross, i.e., Mcs2 (COP) with Mcs6 (WKy). This study also used a novel statistical approach developed by our statistical collaborator, Christina Kendziorski, to identify alleles with no main effect that modify QTL with main effects. Mcs-modifier 1 (Mcsm1) was the most strongly supported locus of this class. The WKy allele of this locus fully negated the effects of the resistance conferred by the WKy allele of the Mcs8 QTL. Thus it appeared that there could be a large number of polymorphic loci in rats that could contribute to mammary cancer risk.

It is important to keep in mind that genetically identified QTL are the product of statistical modeling and analysis of segregating populations from crosses. It is thus critical that their existence be confirmed in more homogeneous genetic material. An established method for QTL validation is to breed and phenotype congenic animals carrying only the region surrounding the QTL allele of interest on an alternative genetic background. So far we have generated and characterized six of the eight candidate WKy and COP QTL by genetically introgressing them onto the WF background. All six have the phenotype predicted by our quantitative models.

Most congenic substitutions include tens of megabases encompassing the introgressed allele. The next step is to fine map this congenic interval to first determine whether this interval harbors more than one independent susceptibility locus. In addition, the fine mapping process allows for an increased genomic resolution of the locus and thus a more limited set of candidates. We have fine mapped two Mcs loci: Mcs1 (COP) and Mcs5 (WKy). 
Each was found to be complex, containing at least three separable subloci termed Mcs1a, -b, -c and Mcs5a, -b, -c. In the case of Mcs1, all three identified loci within it contributed to the cancer resistance phenotype of Mcs1. This led us to speculate that this apparent clustering might be biologically “random”; their strong combined phenotype allowed us to readily identify Mcs1 over the experimental background. In contrast, Mcs5 also had at least three subloci, Mcs5a, -b, and -c, but two of these, a and c, contribute to resistance while b confers an increased sensitivity. Each of the three had similar absolute relative risk (RR) contributions. If they interacted in a purely additive manner, it might have been difficult to identify Mcs5. However, Mcs5 had the strongest LOD score of any identified locus in the WKy cross (Lan et al. 2001). When we explored the interaction of the alleles at the Mcs5 loci, we found complex epistatic interactions. The strongest was the complete neutralization of the effect of the sensitive WKy allele of Mcs5b by the resistant WKy allele of Mcs5a (Samuelson et al. 2005).

It is interesting to explore an alternative hypothesis that suggests that the clustering of mammary cancer susceptibility alleles arises from evolutionary selection. Data supporting such a possibility in rodents have been published by Petkov et al. (2005). Their findings suggest that alleles controlling certain phenotypes cluster to assure joint inheritance, in that in concert with one another, they provide for an enhanced survival advantage. This could account for the clustering of risk-related genes at the Mcs1 and Mcs5 QTL.

Many of the most comprehensive published mammalian fine-mapping studies achieve mapping resolutions in the order of several megabases. Such intervals, while carrying a limited number of genes, often require choosing one or more candidate genes for intensive study. 
These are usually chosen on the basis of how they might functionally relate to the specific disease risk under investigation. This negates the potential of positional cloning to identify an unbiased candidate. As mentioned above, experience suggests that functional candidate selection rarely identifies disease-specific modifier genes. For example, in breast cancer, when 120 such published candidates (710 SNPs) were rigorously evaluated, none met minimal statistical significance in a study of a large population of women in a breast cancer case-control study (Pharoah et al. 2007).

We explored the ability of ultrafine mapping to annotate the Mcs5a locus. We mapped this locus to >100-kb resolution by phenotyping congenic rats recombinant within this locus. We found it to contain two elements. The WKy allele of each element by itself failed to elicit a mammary cancer phenotype; however, when combined, the resistance phenotype was obvious. These elements, termed Mcs5a1 and Mcs5a2, synthetically interact, making Mcs5a one of the first-identified compound QTL in mammals. Because Mcs5a acts in a semidominant manner, we could use heterozygous congenic recombinants to ask whether both elements of Mcs5a needed to lie in cis on the same chromosome, or could they interact in trans from separate homologs. They interact only in cis (Samuelson et al. 2007). Another interesting observation arising from the fine mapping of Mcs5a is that it localizes to noncoding DNA. All four Mcs loci that we have fine mapped to high resolution are localized to noncoding DNA (in progress).

The observations that the rat compound locus Mcs5a consists of two synthetically interacting elements that are separated by ∼50-60 kb (based on the human sequence), interact only when on the same chromosome, and are noncoding suggest the hypothesis that they may be localized in closer proximity than suggested by the linear genomic distance that separates them. 
Recent observations in our laboratory using chromosome conformation capture suggest that most of the sequences between these elements form a CTCF-mediated loop bringing both elements in close physical proximity to each other. The ability of this compound locus to control local and interchromosomal gene expression is being studied (in progress).

To determine whether our findings in our rat model could be extended to women, we next asked whether the human ortholog of Mcs5a (-a1 and -a2) could influence breast cancer risk. In contrast to the method of searching for modifier genes using genomewide association studies (GWAS), we restricted our search to an ∼100-kb region of the human genome. Focusing on this orthologous locus defined by comparative genomics vastly reduced the number of SNP-tagged alleles needed for testing for association, greatly reducing the statistical penalty for multiple testing. We tested several SNPs in the orthologous MCS5A1 and -5A2 regions of the human genome in a total of ∼12,000 women in a breast cancer case-control study. We found that a tagged SNP in both MCS5A1 and -5A2 was significantly associated with risk of breast cancer in this population of women. The minor allele of SNP rs56476643 (MCS5A1) acts in a recessive manner to increase risk. Its allele frequency is 25% and it increases risk in homozygous women by 19%. In contrast, the minor allele of MCS5A2 (rs2182317) has an allele frequency of 13% and acts in a dominant manner to reduce by 14% the risk of breast cancer in the 24% of women carrying one or two copies of this allele (Samuelson et al. 2007).

Not only does this human study support the use of comparative genomics to identify human cancer risk modifier alleles, it also extends the resolution obtained in the rat in localizing the two genetic elements of the Mcs5a allele. The rat localizes Mcs5a1 and -a2 to 32 and 84 kb, respectively, while the human studies resolved these determinants to 5.7 kb and 16.8 kb (Samuelson et al. 2007). 
Thus, we have demonstrated a clear advantage in using comparative genomics to localize target regions within QTL.

Both MCS5A1 and -5A2 have similar allele frequencies and genetic penetrance (relative risk) as do most breast cancer alleles identified by GWAS. However, unlike alleles identified by GWAS, those identified by comparative genomics also provide in vivo models to functionally characterize risk alleles. For example, it is often assumed that breast cancer modifiers are likely to act within breast tissue to modulate risk. Using the rat as a model we have been able to show that Mcs5a, a noncoding allele, acts to differentially regulate its neighboring FBXO10 gene in immune but not mammary tissues (Samuelson et al. 2007).

It is also intriguing to consider the observation that breast cancer risk-associated alleles such as Mcs5a1 and -5a2 are either conserved over millions of evolutionary years or are highly mutable and functionally neutral, suggesting that these alleles do not significantly reduce fitness. If so, one then speculates that they would make good targets for chemoprevention drugs by possessing low toxicity and as such a good therapeutic index. In particular, converting sensitive to resistant allelic function with drug therapy would mimic the conserved resistance allele that persists in the human population and should therefore show a low side-effects profile.

Our current research on these genetically identified Mcs loci focuses on molecular, cellular, and organismal mechanisms by which they modify risk. Not only will these investigations provide insight into the function of each noncoding Mcs locus, but collectively they will provide a mechanistic framework to facilitate integrative genetic studies of the plethora of polymorphic risk loci identified by GWAS in multiple diseases.
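
The carrier fraction quoted above for MCS5A2 (24% of women for a minor allele frequency of 13%) matches what Hardy-Weinberg proportions would give. The text does not state that Hardy-Weinberg equilibrium was assumed, so the following Python sketch is only an illustrative check under that assumption; the function names are hypothetical and not taken from the study.

```python
def genotype_fractions(minor_allele_freq):
    """Expected genotype fractions under Hardy-Weinberg proportions.

    Assumption (not stated in the source): random mating, so genotype
    frequencies are p^2, 2pq and q^2 for minor allele frequency q.
    """
    if not 0.0 <= minor_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = minor_allele_freq
    p = 1.0 - q
    return {
        "homozygous_major": p * p,
        "heterozygous": 2.0 * p * q,
        "homozygous_minor": q * q,
    }


def carrier_fraction(minor_allele_freq):
    """Fraction of individuals carrying one or two copies of the minor allele."""
    fractions = genotype_fractions(minor_allele_freq)
    return fractions["heterozygous"] + fractions["homozygous_minor"]


if __name__ == "__main__":
    # Dominant MCS5A2 allele: carriers of one or two copies at q = 0.13.
    print(f"carriers at q=0.13: {carrier_fraction(0.13):.1%}")
    # Recessive MCS5A1 allele: minor-allele homozygotes at q = 0.25.
    homozygotes = genotype_fractions(0.25)["homozygous_minor"]
    print(f"minor homozygotes at q=0.25: {homozygotes:.2%}")
```

Under the same assumption, the recessive MCS5A1 risk allele (frequency 25%) would be homozygous in roughly 6% of women, which gives a sense of how many individuals each reported effect applies to.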

14.
The sheep (Ovis aries L.) has been an important farm animal species since its domestication. A wide array of indigenous sheep breeds with abundant phenotypic diversity exists for domestication and selection. Therefore, assessing the genetic diversity of a local sheep resource using a multi-molecular system is helpful for maintaining and conserving those breeds. This study aimed to investigate the genetic diversity of three native Chinese sheep breeds (Tibetan sheep, Sishui Fur sheep, and Small-tailed Han sheep) using 15 microsatellite markers and the second exon of the DRA gene. With regard to the microsatellites, on average, 19 alleles per locus were observed among all individuals. Across loci, the observed heterozygosity (HO) within the population was 0.652 ± 0.022 in Tibetan sheep, 0.603 ± 0.023 in Small-tailed Han sheep and 0.635 ± 0.022 in Sishui Fur sheep, and for most populations, the expected heterozygosity (HE) and HO were inconsistent. In addition, abundant private alleles within each breed indicated that the breeds have different domestication histories or sites. With regard to the second exon of the DRA gene, three haplotypes were constructed from seven single-nucleotide polymorphisms (SNPs) identified in that exon, suggesting potential for phenotypic variety in these Chinese native sheep. In summary, the current study reveals the importance of implementing effective conservation strategies for these three native Chinese sheep breeds.

15.
Most common diseases are attributed to multiple genetic variants, and the feasibility of identifying inherited risk factors is often restricted to the identification of alleles with high or intermediate effect sizes. In our previous studies, we identified single loci associated with hepatic fibrosis (Hfib1–Hfib4). Recent advances in analysis tools allowed us to model loci interactions for liver fibrosis. We analysed 322 F2 progeny from an intercross of the fibrosis-susceptible strain BALB/cJ and the resistant strain FVB/NJ. The mice were challenged with carbon tetrachloride (CCl4) for 6 weeks to induce chronic hepatic injury and fibrosis. Fibrosis progression was quantified by determining histological fibrosis stages and hepatic collagen contents. Phenotypic data were correlated to genome-wide markers to identify quantitative trait loci (QTL). Thirteen susceptibility loci were identified by single and composite interval mapping, and were included in the subsequent multiple QTL model (MQM) testing. Models provided evidence for susceptibility loci with strongest association to collagen contents (chromosomes 1, 2, 8 and 13) or fibrosis stages (chromosomes 1, 2, 12 and 14). These loci contained the known fibrosis risk genes Hc, Fasl and Foxa2 and were incorporated in a fibrosis network. Interestingly, the hepatic fibrosis locus on chromosome 1 (Hfib5) connects both phenotype networks, strengthening its role as a potential modifier locus. Adding multiple QTL mapping to association studies provides valuable information on gene–gene interactions in experimental crosses and human cohorts. This study presents an initial step towards a refined understanding of profibrogenic gene networks.

16.

Background

Recent advances in genetic studies have brought the number of confirmed susceptibility loci for type 2 diabetes to eighteen. In this study, we attempt to analyze the independent and joint effects of variants from these loci on type 2 diabetes and clinical phenotypes related to glucose metabolism.

Methods/Principal Findings

Twenty-one single nucleotide polymorphisms (SNPs) from fourteen loci were successfully genotyped in 1,849 subjects with type 2 diabetes and 1,785 subjects with normal glucose regulation. We analyzed the allele and genotype distributions of these SNPs between cases and controls, as well as the joint effects of the susceptibility loci on type 2 diabetes risk. The associations between SNPs and type 2 diabetes were examined by logistic regression. The associations between SNPs and quantitative traits were examined by linear regression. The discriminative accuracy of the prediction models was assessed by the area under the receiver operating characteristic curve. We confirmed the effects of SNPs from PPARG, KCNJ11, CDKAL1, CDKN2A-CDKN2B, IDE-KIF11-HHEX, IGF2BP2 and SLC30A8 on risk for type 2 diabetes, with odds ratios ranging from 1.114 to 1.406 (P values ranging from 0.0335 to 1.37E-12). However, no significant association was detected between type 2 diabetes and SNPs from WFS1, FTO, JAZF1, TSPAN8-LGR5, THADA, ADAMTS9 and NOTCH2-ADAM30. Analyses of the quantitative traits in the control subjects showed that THADA SNP rs7578597 was associated with 2-h insulin during oral glucose tolerance tests (P = 0.0005, empirical P = 0.0090). The joint effect analysis of SNPs from eleven loci showed that individuals carrying more risk alleles had a significantly higher risk for type 2 diabetes. In addition, type 2 diabetes patients with more risk alleles tended to be diagnosed at an earlier age (P = 0.0006).

Conclusions/Significance

The current study confirmed the association between PPARG, KCNJ11, CDKAL1, CDKN2A-CDKN2B, IDE-KIF11-HHEX, IGF2BP2 and SLC30A8 and type 2 diabetes. These type 2 diabetes risk loci contributed to the disease additively.

17.

Background

Recent studies have identified several single nucleotide polymorphisms (SNPs) in the population that are associated with variations in the risks of many different diseases including cancers such as breast, prostate and colorectal. For ovarian cancer, the known highly penetrant susceptibility genes (BRCA1 and BRCA2) are probably responsible for only 40% of the excess familial ovarian cancer risks, suggesting that other susceptibility genes of lower penetrance exist.

Methods

We have taken a candidate approach to identifying moderate risk susceptibility alleles for ovarian cancer. To date, we have genotyped 340 SNPs from 94 candidate genes or regions, in up to 1,491 invasive epithelial ovarian cancer cases and 3,145 unaffected controls from three different population based studies from the UK, Denmark and USA.

Results

After adjusting for population stratification by genomic control, 18 SNPs (5.3%) were significant at the 5% level, and 5 SNPs (1.5%) were significant at the 1% level. The most significant association was for the SNP rs2107425, located on chromosome 11p15.5, which has previously been identified as a susceptibility allele for breast cancer from a genome wide association study (P-trend = 0.0012). When SNPs/genes were stratified into 7 different pathways or groups of validation SNPs, the breast cancer associated SNPs were the only group of SNPs that were significantly associated with ovarian cancer risk (P-heterogeneity = 0.0003; P-trend = 0.0028; adjusted (for population stratification) P-trend = 0.006). We did not find statistically significant associations when the combined data for all SNPs were analysed using an admixture maximum likelihood (AML) experiment-wise test for association (P-heterogeneity = 0.051; P-trend = 0.068).
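
To put the counts above in context, one can compare them with the number of SNPs expected to pass each threshold by chance alone. The Python sketch below does this arithmetic under the simplifying assumption that all 340 tests are independent and follow the null hypothesis; the abstract does not state this, SNPs within the same gene or region are typically correlated, and the function names are hypothetical.

```python
def expected_by_chance(n_tests, alpha):
    """Expected number of tests with P < alpha if every null hypothesis is true.

    Assumption (not from the source): tests are treated as independent.
    """
    return n_tests * alpha


def observed_fraction(n_significant, n_tests):
    """Observed fraction of tests that were nominally significant."""
    return n_significant / n_tests


if __name__ == "__main__":
    n_snps = 340  # SNPs genotyped, as reported in the Methods
    for alpha, observed in ((0.05, 18), (0.01, 5)):
        expected = expected_by_chance(n_snps, alpha)
        fraction = observed_fraction(observed, n_snps)
        print(
            f"alpha={alpha}: observed {observed} ({fraction:.1%}), "
            f"expected by chance about {expected:.1f}"
        )
```

The observed counts (18 and 5) sit only slightly above the chance expectations (about 17 and 3.4), which is consistent with the non-significant experiment-wise test reported at the end of the Results.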

Conclusion

These data suggest that a proportion of the SNPs we evaluated were associated with ovarian cancer risk, but that the effect sizes were too small to detect associations with individual SNPs.

18.
The prevalence of type 2 diabetes (T2D) is greater in populations of African descent compared to European-descent populations. Genetic risk factors may underlie the disparity in disease prevalence. Genome-wide association studies (GWAS) have identified >60 common genetic variants that contribute to T2D risk in populations of European, Asian, African and Hispanic descent. These studies have not comprehensively examined population differences in cumulative risk allele load. To investigate the relationship between risk allele load and T2D risk, 46 T2D single nucleotide polymorphisms (SNPs) in 43 loci from GWAS in European, Asian, and African-derived populations were genotyped in 1,990 African Americans (n = 963 T2D cases, n = 1,027 controls) and 1,644 European Americans (n = 719 T2D cases, n = 925 controls) ascertained and recruited using a common protocol in the southeast United States. A genetic risk score (GRS) was constructed from the cumulative risk alleles for each individual. In African American subjects, risk allele frequencies ranged from 0.024 to 0.964. Risk alleles from 26 SNPs demonstrated directional consistency with previous studies, and 3 SNPs from ADAMTS9, TCF7L2, and ZFAND6 showed nominal evidence of association (p < 0.05). African American individuals carried 38–67 (53.7 ± 4.0, mean ± SD) risk alleles. In European American subjects, risk allele frequencies ranged from 0.084 to 0.996. Risk alleles from 36 SNPs demonstrated directional consistency, and 10 SNPs from BCL11A, PSMD6, ADAMTS9, ZFAND3, ANK1, CDKN2A/B, TCF7L2, PRC1, FTO, and BCAR1 showed evidence of association (p < 0.05). European American individuals carried 38–65 (50.9 ± 4.4) risk alleles. African Americans have a significantly greater burden of 2.8 risk alleles (p = 3.97 × 10^−89) compared to European Americans. However, GRS modeling showed that cumulative risk allele load was associated with risk of T2D in European Americans, but only marginally in African Americans. 
This result suggests that there are ethnic-specific differences in genetic architecture underlying T2D, and that these differences complicate our understanding of how risk allele load impacts disease susceptibility.
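
The abstract describes the GRS only as the cumulative number of risk alleles per individual. A minimal Python sketch of such an unweighted score follows, assuming each SNP genotype is coded as 0, 1 or 2 copies of the risk allele; the coding, the function name and the handling of invalid values are illustrative assumptions, and the study may have treated weighting or missing genotypes differently.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted genetic risk score: total risk alleles across SNPs.

    Each entry is the number of risk-allele copies (0, 1 or 2) that one
    individual carries at one SNP. This coding is an assumption made for
    illustration; the abstract does not describe it.
    """
    total = 0
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk allele count: {count!r}")
        total += count
    return total


if __name__ == "__main__":
    # Hypothetical individual genotyped at 46 SNPs, so the score lies in 0..92.
    genotypes = [1] * 40 + [2] * 6
    print(genetic_risk_score(genotypes))
```

With 46 SNPs the score can range from 0 to 92, so the reported ranges of 38–67 and 38–65 risk alleles fall in the middle of the possible scale.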

19.
Single-nucleotide polymorphisms (SNPs), microsatellites and copy number variation (CNV) were studied on the Y chromosome to understand the paternal origin and phylogenetic relationships for resource protection, rational development and utilization of the domestic Bactrian camel in China. Our sample set consisted of 94 Chinese domestic Bactrian camels from four regions (Inner Mongolia, Gansu, Qinghai and Xinjiang). We screened 29 Y-chromosome-specific loci for SNPs, analysed 40 bovine-derived microsatellite loci and measured CNVs of HSFY and SRY through Sanger sequencing, automated fluorescence-based microsatellite analysis and quantitative real-time PCR, respectively. A multicopy gene, SRY, was first found, and sequence variation was only detected in SRY in a screen of 29 loci in 13 DNA pools of individual camels. In addition, a TG repeat in the USP9Y gene was identified as the first polymorphic microsatellite in the camel Y chromosome, whereas microsatellites based on bovine sequences were not detected. The frequency of each allele varied among different populations. For the Nanjiang, Hexi and Alashan populations, a 243-bp allele was found. For the Sunite population, 241-bp, 243-bp and 247-bp alleles were detected, and the frequencies of these alleles were 22.2%, 44.5% and 33.3%, respectively; 241-bp and 243-bp alleles were found in other populations. Finally, CNVs in two Y-chromosomal genes were detected; CNV for HSFY and SRY ranged from 1 to 3 and from 1 to 9, respectively.

20.
One of the applications of genomics is to identify genetic markers linked to loci responsible for variation in phenotypic traits, which could be used in breeding programs to select individuals with favorable alleles, particularly at the seedling stage. With this aim, in the framework of the European project FruitBreedomics, we selected five main peach fruit characters and a resistance trait, controlled by major genes with Mendelian inheritance: fruit flesh color Y, fruit skin pubescence G, fruit shape S, sub-acid fruit D, stone adhesion-flesh texture F-M, and resistance to green peach aphid Rm2. They were all previously mapped in Prunus. We then selected three F1 and three F2 progenies segregating for these characters and developed genetic maps of the linkage groups including the major genes, using the single nucleotide polymorphism (SNP) genome-wide scans obtained with the International Peach SNP Consortium (IPSC) 9K SNP array v1. We identified SNPs co-segregating with the characters in all cases. Their positions were in agreement with the known positions of the major genes. The number of SNPs linked to each of these, as well as the size of the physical regions encompassing them, varied depending on the maps. As a result, the number of useful SNPs for marker-assisted selection (MAS) varied accordingly. As a whole, this study establishes a sound basis for further development of MAS on these characters. We also discuss some limitations that were observed regarding the SNP array efficiency.
