Similar literature
Found 20 similar articles (search time: 719 ms)
1.
Family-based association tests for genomewide association scans   Total citations: 7 (self-citations: 1; citations by others: 6)
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals. 
In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.

2.
Adjuvant therapy of stage IIB/III melanoma with interferon reduces relapse and mortality by up to 33% but is accompanied by toxicity-related complications. Polymorphisms of the CTLA-4 gene associated with autoimmune diseases could help in identifying interferon treatment benefits. We previously genotyped 286 melanoma patients and 288 healthy (unrelated) individuals for six CTLA-4 polymorphisms (SNPs). Previous analyses found no significant differences between the distributions of CTLA-4 polymorphisms in the melanoma population vs. controls, no significant difference in relapse-free and overall survival among patients, and no correlation between autoimmunity and specific alleles. We report a new analysis of these CTLA-4 genetic profiles using the Network Phenotyping Strategy (NPS), a graph-theory-based method for analyzing SNP patterns. Applying NPS to the CTLA-4 polymorphisms captures the allele relationship pattern of every patient in a 6-partite mathematical graph P. The graphs P are combined into a weighted 6-partite graph S, which is subsequently decomposed into reference relationship profiles (RRPs). Finally, every individual CTLA-4 genotype pattern is characterized by the graph distances of P from eight identified RRPs. RRPs are subgraphs of S, collecting equally frequent binary allele co-occurrences in all studied loci. If the topology of S represents the genetic "dominant model", the RRPs and their characteristic frequencies are identical to expectation-maximization derived haplotypes and maximum likelihood estimates of their frequencies. The graph representation makes it possible to show that patient CTLA-4 haplotypes differ from those of the controls by the absence of specific SNP combinations. New function-related insight is derived when the 6-partite graph reflects the allelic state of CTLA-4.
We found that we can use differences between individual P and specific RRPs to identify patient subpopulations with clearly different polymorphic patterns relative to controls, as well as to identify patients with significantly different survival.

3.
Selection on phenotypes may cause genetic change. To understand the relationship between phenotype and gene expression from an evolutionary viewpoint, it is important to study the concordance between gene expression and profiles of phenotypes. In this study, we use a novel method of clustering to identify genes whose expression profiles are related to a quantitative phenotype. Cluster analysis of gene expression data aims at classifying genes into several different groups based on the similarity of their expression profiles across multiple conditions. The hope is that genes that are classified into the same clusters may share underlying regulatory elements or may be a part of the same metabolic pathways. Current methods for examining the association between phenotype and gene expression are limited to linear association measured by the correlation between individual gene expression values and phenotype. Genes may be associated with the phenotype in a nonlinear fashion. In addition, groups of genes that share a particular pattern in their relationship to phenotype may be of evolutionary interest. In this study, we develop a method to group genes based on orthogonal polynomials under a multivariate Gaussian mixture model. The effect of each expressed gene on the phenotype is partitioned into a cluster mean and a random deviation from the mean. Genes can also be clustered based on a time series. Parameters are estimated using the expectation-maximization algorithm and implemented in SAS. The method is verified with simulated data and demonstrated with experimental data from 2 studies, one clusters with respect to severity of disease in Alzheimer's patients and another clusters data for a rat fracture healing study over time. We find significant evidence of nonlinear associations in both studies and successfully describe these patterns with our method. 
We give detailed instructions and provide a working program that allows others to directly implement this method in their own analyses.

4.
Predicting phenotypes using genome-wide genetic variation and gene expression data is useful in several fields, such as human biology and medicine, as well as in crop and livestock breeding. However, for phenotype prediction using gene expression data for mammals, studies remain scarce, as the available data on gene expression profiling are currently limited. By integrating a few sources of relevant data that are available in mice, this study investigated the accuracy of phenotype prediction for several physiological traits. Gene expression data from two tissues as well as single nucleotide polymorphisms (SNPs) were used. For the studied traits, the variance of the effects of the expression levels was more likely to differ among the genes than were the effects of SNPs. For the glucose concentration, the total cholesterol amount, and the total tidal volume, the accuracy by cross validation tended to be higher when the gene expression data rather than the SNP genotype data were used, and a statistically significant increase in the accuracy was obtained when the gene expression data from the liver were used alone or jointly with the SNP genotype data. For these traits, there were no additional gains in accuracy from using the gene expression data of both the liver and lung compared to that of individual use. The accuracy of prediction using genes that were selected differently was examined; the use of genes with a higher tissue specificity tended to result in an accuracy that was similar to or greater than that associated with the use of all of the available genes for traits such as the glucose concentration and total cholesterol amount. Although relatively few animals were evaluated, the current results suggest that gene expression levels could be used as explanatory variables. However, further studies are essential to confirm our findings using additional animal samples.

5.
Insights into the relative contributions of locus specific and genome-wide effects on population genetic diversity can be gained through separation of their resulting genetic signals. Here we explore patterns of adaptive and neutral genetic diversity in the disjunct natural populations of Pinus radiata (D. Don) from mainland California. A first-generation common garden of 447 individuals revealed significant differentiation of wood phenotypes among populations (P_ST), possibly reflecting local adaptation in response to environment. We subsequently screened all trees for genetic diversity at 149 candidate gene single nucleotide polymorphism (SNP) loci for signatures of adaptation. Ten loci were identified as being possible targets of diversifying selection following F_ST outlier tests. Multivariate canonical correlation performed on a data set of 444 individuals identified significant covariance between environment, adaptive phenotypes and outlier SNP diversity, lending support to the case for local adaptation suggested from F_ST and P_ST tests. Covariation among discrete sets of outlier SNPs and adaptive phenotypes (inferred from multivariate loadings) with environment are supported by existing studies of candidate gene function and genotype–phenotype association. Canonical analyses failed to detect significant correlations between environment and 139 non-outlier SNP loci, which were applied to estimate neutral patterns of genetic differentiation among populations (F_ST = 4.3%). Using this data set, significant hierarchical structure was detected, indicating three populations on the mainland. The hierarchical relationships based on neutral SNP markers (and SSR) were in contrast with those inferred from putatively adaptive loci, potentially highlighting the independent action of selection and demography in shaping genetic structure in this species.
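
As a reading aid for the F_ST values quoted in this abstract, the following is a minimal sketch of a heterozygosity-based F_ST for a single biallelic SNP, F_ST = (H_T - H_S) / H_T. The abstract does not say which estimator the authors used, so the function name `fst_biallelic`, the equal default population weights, and the example allele frequencies are all assumptions made for illustration only.

```python
def fst_biallelic(allele_freqs, weights=None):
    """Heterozygosity-based F_ST for one biallelic locus.

    allele_freqs: frequency of one allele in each population (illustrative input).
    weights: optional population weights summing to 1; equal weights by default.
    This is a textbook-style sketch, not the estimator used in the study.
    """
    n_pops = len(allele_freqs)
    if weights is None:
        weights = [1.0 / n_pops] * n_pops

    # Weighted mean allele frequency across populations.
    p_bar = sum(w * p for w, p in zip(weights, allele_freqs))

    # H_T: expected heterozygosity of the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)

    # H_S: weighted mean expected heterozygosity within populations.
    h_within = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, allele_freqs))

    if h_total == 0.0:
        # Locus is fixed everywhere, so there is no differentiation to measure.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies in three populations.
    print(fst_biallelic([0.40, 0.50, 0.60]))
```

Under these assumptions, identical allele frequencies in every population give 0 and populations fixed for alternative alleles give 1; outlier tests of the kind mentioned above look for loci whose value is unusually high relative to the genome-wide background.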

6.
Kostem E, Lozano JA, Eskin E. Genetics, 2011, 188(2): 449-460
Genome-wide association studies (GWASs) have been effectively identifying the genomic regions associated with a disease trait. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case/control individuals. Once the tag SNP statistics are computed, the genomic regions that are in linkage disequilibrium (LD) with the most significantly associated tag SNPs are believed to contain the causal polymorphisms. However, such LD regions are often large and contain many additional polymorphisms. Following up all the SNPs included in these regions is costly and infeasible for biological validation. In this article we address how to characterize these regions cost effectively with the goal of providing investigators a clear direction for biological validation. We introduce a follow-up study approach for identifying all untyped associated SNPs by selecting additional SNPs, called follow-up SNPs, from the associated regions and genotyping them in the original case/control individuals. We introduce a novel SNP selection method with the goal of maximizing the number of associated SNPs among the chosen follow-up SNPs. We show how the observed statistics of the original tag SNPs and human genetic variation reference data such as the HapMap Project can be utilized to identify the follow-up SNPs. We use simulated and real association studies based on the HapMap data and the Wellcome Trust Case Control Consortium to demonstrate that our method shows superior performance to the correlation- and distance-based traditional follow-up SNP selection approaches. Our method is publicly available at http://genetics.cs.ucla.edu/followupSNPs.
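
The linkage disequilibrium between a tag SNP and an untyped SNP that this abstract relies on is commonly summarized by the squared correlation r^2. The sketch below computes r^2 from phased two-locus haplotype counts; it illustrates the standard definition only and is not the authors' selection method. The function name `ld_r2`, the dictionary keys, and the example counts are hypothetical.

```python
def ld_r2(hap_counts):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_counts: dict of phased haplotype counts with keys
    'AB', 'Ab', 'aB', 'ab' (alleles A/a at SNP 1, B/b at SNP 2).
    Key names are an assumption made for this illustration.
    """
    total = sum(hap_counts[k] for k in ("AB", "Ab", "aB", "ab"))
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_ab = hap_counts["AB"] / total                      # frequency of the A-B haplotype
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total  # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total  # frequency of allele B

    # D measures the departure from independence between the two loci.
    d = p_ab - p_a * p_b

    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical counts from 200 phased haplotypes.
    print(ld_r2({"AB": 90, "Ab": 10, "aB": 10, "ab": 90}))
```

A value near 1 means the tag SNP is an almost perfect proxy for the other SNP, while a value near 0 means the tag SNP statistic carries little information about it.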

7.
Marko NF, Toms SA, Barnett GH, Weil R. Genomics, 2008, 91(5): 395-406
We used microarray analysis to investigate associations between genotypic expression profiles and survival phenotypes in patients with primary glioblastoma (GBM). Tumor samples from 7 long-term glioblastoma survivors (>24 months) and 13 short-term survivors (<9 months) were analyzed to detect differential patterns of gene expression between these groups and to identify genotypic subclasses of glioblastomas that correlate with survival phenotypes. Five unsupervised and three supervised clustering algorithms consistently and accurately grouped the tumors into genotypic subgroups corresponding to the two clinical survival phenotypes. Three unique prospective mathematical classification algorithms were subsequently trained to use expression data to stratify unknown glioblastomas between survival groups and performed this task with 100% accuracy in validation studies. A set of 1478 genes with significant differential expression (p<0.01) between long-term and short-term survivors was identified, and additional mathematical filtering was used to isolate a 43-gene "fingerprint" that distinguished survival phenotypes. Differential regulation of a subset of these genes was confirmed using RT-PCR. Gene ontology analysis of the fingerprint demonstrated pathophysiologic functions for the gene products that are consistent with current models of tumor biology, suggesting that differential expression of these genes may contribute etiologically to the observed differences in survival. These results demonstrate that unique expression profiles characterize genotypic subsets of primary GBMs associated with differential survival phenotypes, and these profiles can be used in a prospective fashion to assign unknown tumors to survival groups. Future efforts will focus on building more robust classifiers and identifying additional subclasses of gliomas with phenotypic significance.

8.
Translocations of threatened species can reduce the risk of extinction from a catastrophic event. For plants, translocation consists of moving individuals, seeds, or cuttings from a native (source) population to a new site. Ideally a translocation population would be genetically diverse and consist of fit founding individuals. In practice, there are challenges to designing such a population, including constraints on the availability of material, and tradeoffs between different goals. Here, we present an approach for designing a translocation population that identifies sets of founders that are optimized according to multiple criteria (e.g., genetic diversity), while also conforming to constraints on the representation of different founders (e.g., propagation success). It uses flexible inputs, including SNP genotypes, matrices of similarity between individuals, and vectors of phenotype data. We apply the approach to a critically endangered plant, Hibbertia puberula subsp. glabrescens (Dilleniaceae), which was genotyped at thousands of SNP loci. The goals of minimizing genetic similarity among the founding individuals and maximizing genetic diversity were largely complementary: populations optimized for one of these criteria were near‐optimal for the other. We also performed analyses in which we minimized genetic similarity among founding individuals while imposing selection (against hypothetical deleterious alleles, and against undesirable phenotypes, respectively), and here characterized sharp tradeoffs. This was useful in allowing the benefits of selection to be weighed against costs in terms of genetic similarity. In summary, we present an approach for designing a translocation population that allows flexible inputs, the imposition of realistic constraints, and examination of conflicting goals.

9.
Pathway analysis of microarray data evaluates gene expression profiles of a priori defined biological pathways in association with a phenotype of interest. We propose a unified pathway-analysis method that can be used for diverse phenotypes including binary, multiclass, continuous, count, rate, and censored survival phenotypes. The proposed method also allows covariate adjustments and correlation in the phenotype variable that is encountered in longitudinal, cluster-sampled, and paired designs. These are accomplished by combining the regression-based test statistic for each individual gene in a pathway of interest into a pathway-level test statistic. Applications of the proposed method are illustrated with two real pathway-analysis examples: one evaluating relapse-associated gene expression involving a matched-pair binary phenotype in children with acute lymphoblastic leukemia; and the other investigating gene expression in breast cancer tissues in relation to patients' survival (a censored survival phenotype). Implementations for various phenotypes are available in R. Additionally, an Excel Add-in for a user-friendly interface is currently being developed.

10.
11.
There is a critical need for data-mining methods that can identify SNPs that predict among-individual variation in a phenotype of interest and reverse-engineer the biological network of relationships between SNPs, phenotypes, and other factors. This problem is both challenging and important in light of the large number of SNPs in many genes of interest and across the human genome. A potentially fruitful form of exploratory data analysis is the Bayesian or Belief network. A Bayesian or Belief network provides an analytic approach for identifying robust predictors of among-individual variation in disease endpoints or risk factor levels. We have applied Belief networks to SNP variation in the human APOE gene and plasma apolipoprotein E levels from two samples: 702 African-Americans from Jackson, MS, and 854 non-Hispanic whites from Rochester, MN. Twenty variable sites in the APOE gene were genotyped in both samples. In Jackson, MS, SNPs 4036 and 4075 were identified as influencing plasma apoE levels. In Rochester, MN, SNPs 3937 and 4075 were identified as influencing plasma apoE levels. All three SNPs had been previously implicated in affecting measures of lipid and lipoprotein metabolism. Like all data-mining methods, Belief networks are meant to complement traditional hypothesis-driven methods of data analysis. These results document the utility of a Belief network approach for mining large-scale genotype-phenotype association data.

12.
MOTIVATION: Recently, a new type of expression data is being collected which aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyses of these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes which appear to be significantly affected by genetic variation. RESULTS: When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY: The programs are available by request from the authors.

13.
Roeder K, Luca D. Genomics, 2009, 93(1): 1-4
Data for genome-wide association studies are being collected for a myriad of phenotypes. Many of these studies do not include control samples selected to reflect ancestry similar to the case samples. At the same time "control databases" are becoming available to be utilized as a common resource. These data are often genotyped using a large-scale SNP array. Human populations exhibit complex structure that can lead to spurious associations if not properly handled. How to couple case and control databases effectively is a pressing question. We review available methods for modeling genetic ancestry based on the information gleaned from the SNP array. Methods for selecting control samples with genetic ancestry similar to the case samples are described.

14.
With the recent completion of the International HapMap Project, many tools are in hand for genetic association studies seeking to test the common variant/common disease hypothesis. In contrast, very few tools and resources are in place for genotype–phenotype studies hypothesizing that rare variation has a large impact on the phenotype of interest. To create these tools for rare variant/common disease studies, much interest is being generated towards investing in re-sequencing either large sample sizes of random chromosomes or smaller sample sizes of patients with extreme phenotypes. As a case study for rare variant discovery in random chromosomes, we have re-sequenced ~1,000 chromosomes representing diverse populations for the gene C-reactive protein (CRP). CRP is an important gene in the fields of cardiovascular and inflammation genetics, and its size (~2 kb) makes it particularly amenable to medical or deep re-sequencing. With these data, we explore several issues related to the present-day candidate gene association study including the benefits of complete SNP discovery, the effects of tagSNP selection across diverse populations, and completeness of dbSNP for CRP. Also, we show that while deep re-sequencing uncovers potentially medically relevant coding SNPs, these SNPs are fleetingly rare when genotyped in a population-based survey of 7,000 Americans (NHANES III). Collectively, these data suggest that several different types of re-sequencing and genotyping approaches may be required to fully understand the complete spectrum of alleles that impact human phenotypes. Electronic Supplementary Material: Supplementary material is available for this article at and is accessible for authorized users.

15.
16.
Epstein-Barr virus (EBV) transformed lymphoblastoid cell lines (LCLs) are a widely used renewable resource for functional genomic studies in humans. The ability to accumulate multidimensional data pertaining to the same individual cell lines, from complete genomic sequences to detailed gene regulatory profiles, further enhances the utility of LCLs as a model system. However, the extent to which LCLs are a faithful model system is relatively unknown. We have previously shown that gene expression profiles of newly established LCLs maintain a strong individual component. Here, we extend our study to investigate the effect of freeze-thaw cycles on gene expression patterns in mature LCLs, especially in the context of inter-individual variation in gene expression. We report a profound difference in the gene expression profiles of newly established and mature LCLs. Once newly established LCLs undergo a freeze-thaw cycle, the individual specific gene expression signatures become much less pronounced as the gene expression levels in LCLs from different individuals converge to a more uniform profile, which reflects a mature transformed B cell phenotype. We found that previously identified eQTLs are enriched among the relatively few genes whose regulations in mature LCLs maintain marked individual signatures. We thus conclude that while insight drawn from gene regulatory studies in mature LCLs may generally not be affected by the artificial nature of the LCL model system, many aspects of primary B cell biology cannot be observed and studied in mature LCL cultures.

17.
The objective of this study was to examine the relation between the 5,10-methylenetetrahydrofolate reductase (MTHFR) gene and behaviors related to attention-deficit/hyperactivity disorder (ADHD) in individuals with myelomeningocele. The rationale for the study was twofold: folate-metabolizing genes (e.g., MTHFR) are important not only in the etiology of neural tube defects but are also critical to cognitive function; and individuals with myelomeningocele have an elevated incidence of ADHD. Here, we tested 478 individuals with myelomeningocele for attention-deficit hyperactivity disorder behavior using the Swanson Nolan Achenbach Pelham-IV ADHD rating scale. Myelomeningocele participants in this group for whom DNAs were available were genotyped for seven single nucleotide polymorphisms (SNPs) in the MTHFR gene. The SNPs were evaluated for an association with manifestation of the ADHD phenotype in children with myelomeningocele. The data show that 28.7% of myelomeningocele participants exhibit rating scale elevations consistent with ADHD; of these, 70.1% had scores consistent with the predominantly inattentive subtype. In addition, we show a positive association between the SNP rs4846049 in the 3′-untranslated region of the MTHFR gene and the attention-deficit hyperactivity disorder phenotype in myelomeningocele participants. These results lend further support to the finding that behavior related to ADHD is more prevalent in patients with myelomeningocele than in the general population. These data also indicate the potential importance of the MTHFR gene in the etiology of the ADHD phenotype.

18.
We report results from the analysis of complete mitochondrial DNA (mtDNA) sequences from 112 Japanese semi-supercentenarians (aged above 105 years) combined with previously published data from 96 patients in each of three non-disease phenotypes: centenarians (99-105 years of age), healthy non-obese males, and obese young males; and four disease phenotypes: diabetics with and without angiopathy, and Alzheimer's and Parkinson's disease patients. We analyze the correlation between mitochondrial polymorphisms and the longevity phenotype using two different methods. We first use an exhaustive algorithm to identify all maximal patterns of polymorphisms shared by at least five individuals and define a significance score for enrichment of the patterns in each phenotype relative to healthy normals. Our study confirms the correlations observed in a previous study showing enrichment of a hierarchy of haplogroups in the D clade for longevity. For the extreme longevity phenotype we see a single statistically significant signal: a progressive enrichment of certain "beneficial" patterns in centenarians and semi-supercentenarians in the D4a haplogroup. We then use Principal Component Spectral Analysis of the SNP-SNP Covariance Matrix to compare the measured eigenvalues to a Null distribution of eigenvalues on Gaussian datasets to determine whether the correlations in the data (due to longevity) arise from some property of the mutations themselves or whether they are due to population structure. The conclusion is that the correlations are entirely due to population structure (phylogenetic tree). We find no signal for a functional mtDNA SNP correlated with longevity. The fact that the correlations are from the population structure suggests that hitch-hiking on autosomal events is a possible explanation for the observed correlations.

19.

Background

The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates.

Methodology/Principal Findings

We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross validation procedure and receiver operator characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area-under-the-ROC-curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale) and five-fold cross validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data.

Conclusions/Significance

These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking bTB phenotypes. However, a larger training population will be required to improve prediction accuracies.
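
To make the AUC figure in this abstract concrete, here is a minimal sketch of how an area under the ROC curve can be computed from predicted scores as the fraction of case-control pairs ranked correctly (the Mann-Whitney interpretation). It is not the authors' pipeline: the function name `auc_from_scores`, the assumption that a higher score indicates a more likely case, and the example values are all hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve from two lists of predicted scores.

    Counts, over all case-control pairs, how often the case scores higher
    than the control; ties count as half. Assumes (for illustration) that a
    higher score means the animal is more likely to be a case.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # pair ranked correctly
            elif case == control:
                wins += 0.5      # tie: counted as a coin flip
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical predicted scores for four cases and four controls.
    cases = [0.62, 0.55, 0.48, 0.71]
    controls = [0.50, 0.45, 0.58, 0.40]
    print(auc_from_scores(cases, controls))
```

On this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.58 indicates a modest but better-than-chance ability to rank cases above controls.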

20.
Li X, Rao S, Wang Y, Gong B. Nucleic Acids Research, 2004, 32(9): 2685-2694
Current applications of microarrays focus on precise classification or discovery of biological types, for example tumor versus normal phenotypes in cancer research. Several challenging scientific tasks in the post-genomic epoch, like hunting for the genes underlying complex diseases from genome-wide gene expression profiles and thereby building the corresponding gene networks, are largely overlooked because of the lack of an efficient analysis approach. We have thus developed an innovative ensemble decision approach, which can efficiently perform multiple gene mining tasks. An application of this approach to analyze two publicly available data sets (colon data and leukemia data) identified 20 highly significant colon cancer genes and 23 highly significant molecular signatures for refining the acute leukemia phenotype, most of which have been verified either by biological experiments or by alternative analysis approaches. Furthermore, the globally optimal gene subsets identified by the novel approach have so far achieved the highest accuracy for classification of colon cancer tissue types. Establishment of this analysis strategy has offered the promise of advancing microarray technology as a means of deciphering the involved genetic complexities of complex diseases.
