首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
2.
While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperform two popularly used methods – SKAT and a previously proposed method based on functional linear models (FLM), – especially if a sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL 4 and ANGPTL 3 associated with obesity, FANOVA was able to identify both genes associated with obesity.  相似文献   

3.
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.  相似文献   

4.
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.  相似文献   

5.
The rapid decrease in sequencing cost has enabled genetic studies to discover rare variants associated with complex diseases and traits. Once this association is identified, the next step is to understand the genetic mechanism of rare variants on how the variants influence diseases. Similar to the hypothesis of common variants, rare variants may affect diseases by regulating gene expression, and recently, several studies have identified the effects of rare variants on gene expression using heritability and expression outlier analyses. However, identifying individual genes whose expression is regulated by rare variants has been challenging due to the relatively small sample size of expression quantitative trait loci studies and statistical approaches not optimized to detect the effects of rare variants. In this study, we analyze whole-genome sequencing and RNA-seq data of 681 European individuals collected for the Genotype-Tissue Expression (GTEx) project (v8) to identify individual genes in 49 human tissues whose expression is regulated by rare variants. To improve statistical power, we develop an approach based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. Using GTEx data, we identify many genes regulated by rare variants, and some of them are only regulated by rare variants and not by common variants. We also find that genes regulated by rare variants are enriched for expression outliers and disease-causing genes. These results suggest the regulatory effects of rare variants, which would be important in interpreting associations of rare variants with complex traits.  相似文献   

6.
Interaction (nonadditive effects) between genetic variants has been highlighted as an important mechanism underlying phenotypic variation, but the discovery of genetic interactions in humans has proved difficult. In this study, we show that the spectrum of variation in the human genome has been shaped by modifier effects of cis-regulatory variation on the functional impact of putatively deleterious protein-coding variants. We analyzed 1000 Genomes population-scale resequencing data from Europe (CEU [Utah residents with Northern and Western European ancestry from the CEPH collection]) and Africa (YRI [Yoruba in Ibadan, Nigeria]) together with gene expression data from arrays and RNA sequencing for the same samples. We observed an underrepresentation of derived putatively functional coding variation on the more highly expressed regulatory haplotype, which suggests stronger purifying selection against deleterious coding variants that have increased penetrance because of their regulatory background. Furthermore, the frequency spectrum and impact size distribution of common regulatory polymorphisms (eQTLs) appear to be shaped in order to minimize the selective disadvantage of having deleterious coding mutations on the more highly expressed haplotype. Interestingly, eQTLs explaining common disease GWAS signals showed an enrichment of putative epistatic effects, suggesting that some disease associations might arise from interactions increasing the penetrance of rare coding variants. In conclusion, our results indicate that regulatory and coding variants often modify the functional impact of each other. This specific type of genetic interaction is detectable from sequencing data in a genome-wide manner, and characterizing these joint effects might help us understand functional mechanisms behind genetic associations to human phenotypes-including both Mendelian and common disease.  相似文献   

7.
Next-generation sequencing technologies have revolutionized our ability to identify genetic variants, either germline or somatic point mutations, that occur in cancer. Parallelization and miniaturization of DNA sequencing enables massive data throughput and for the first time, large-scale, nucleotide resolution views of cancer genomes can be achieved. Systematic, large-scale sequencing surveys have revealed that the genetic spectrum of mutations in cancers appears to be highly complex with numerous low frequency bystander somatic variations, and a limited number of common, frequently mutated genes. Large sample sizes and deeper resequencing are much needed in resolving clinical and biological relevance of the mutations as well as in detecting somatic variants in heterogeneous samples and cancer cell sub-populations. However, even with the next-generation sequencing technologies, the overwhelming size of the human genome and need for very high fold coverage represents a major challenge for up-scaling cancer genome sequencing projects. Assays to target, capture, enrich or partition disease-specific regions of the genome offer immediate solutions for reducing the complexity of the sequencing libraries. Integration of targeted DNA capture assays and next-generation deep resequencing improves the ability to identify clinically and biologically relevant mutations.  相似文献   

8.
Infectious pathogens have long been recognized as potentially powerful agents impacting on the evolution of human genetic diversity. Analysis of large-scale case-control studies provides one of the most direct means of identifying human genetic variants that currently impact on susceptibility to particular infectious diseases. For over 50 years candidate gene studies have been used to identify loci for many major causes of human infectious mortality, including malaria, tuberculosis, human immunodeficiency virus/acquired immunodeficiency syndrome, bacterial pneumonia and hepatitis. But with the advent of genome-wide approaches, many new loci have been identified in diverse populations. Genome-wide linkage studies identified a few loci, but genome-wide association studies are proving more successful, and both exome and whole-genome sequencing now offer a revolutionary increase in power. Opinions differ on the extent to which the genetic component to common disease susceptibility is encoded by multiple high frequency or rare variants, and the heretical view that most infectious diseases might even be monogenic has been advocated recently. Review of findings to date suggests that the genetic architecture of infectious disease susceptibility may be importantly different from that of non-infectious diseases, and it is suggested that natural selection may be the driving force underlying this difference.  相似文献   

9.
10.
Accurate identification of sparse heterozygous single-nucleotide variants (SNVs) is a critical challenge for identifying the causative mutations in mouse genetic screens, human genetic diseases and cancer. When seeking to identify causal DNA variants that occur at such low rates, they are overwhelmed by false-positive calls that arise from a range of technical and biological sources. We describe a strategy using whole-exome capture, massively parallel DNA sequencing and computational analysis, which identifies with a low false-positive rate the majority of heterozygous and homozygous SNVs arising de novo with a frequency of one nucleotide substitution per megabase in progeny of N-ethyl-N-nitrosourea (ENU)-mutated C57BL/6j mice. We found that by applying a strategy of filtering raw SNV calls against known and platform-specific variants we could call true SNVs with a false-positive rate of 19.4 per cent and an estimated false-negative rate of 21.3 per cent. These error rates are small enough to enable calling a causative mutation from both homozygous and heterozygous candidate mutation lists with little or no further experimental validation. The efficacy of this approach is demonstrated by identifying the causative mutation in the Ptprc gene in a lymphocyte-deficient strain and in 11 other strains with immune disorders or obesity, without the need for meiotic mapping. Exome sequencing of first-generation mutant mice revealed hundreds of unphenotyped protein-changing mutations, 52 per cent of which are predicted to be deleterious, which now become available for breeding and experimental analysis. We show that exome sequencing data alone are sufficient to identify induced mutations. This approach transforms genetic screens in mice, establishes a general strategy for analysing rare DNA variants and opens up a large new source for experimental models of human disease.  相似文献   

11.
二代测序技术的涌现推动了基因组学研究,特别是在疾病相关的遗传变异研究中发挥了重要作用.虽然大多数遗传变异类型都可以借助于各种二代测序分析工具进行检测,但是仍然存在局限性,比如短串联重复序列的长度变异.许多遗传疾病是由短串联重复序列的长度扩张导致的,尤其是亨廷顿病等多种神经系统疾病.然而,现在几乎没有工具能够利用二代测序检测长度大于测序读长的短串联重复序列变异.为了突破这一限制,我们开发了一个全新的方法,该方法基于双末端二代测序辨识短串联重复序列长度变异,并可估计其扩张长度,将其应用于一项基于全外显子组测序的运动神经元疾病临床研究中,成功地鉴定出致病的短串联重复序列长度扩张.该方法首次原创性地利用测序读长覆盖深度特征来解决短串联重复序列变异检测问题,在人类遗传疾病研究中具有广泛的应用价值,并且对于其他二代测序分析方法的开发具有启发性意义.  相似文献   

12.
It has been anticipated that new, much more sensitive, next generation sequencing (NGS) techniques, using massively parallel sequencing, will likely provide radical insights into the genetics of multifactorial diseases. While NGS has been used initially to analyze individual human genomes, and has revealed considerable differences between healthy individuals, we have used NGS to examine genetic variation within individuals, by sequencing tissues “in depth”, i.e., oversequencing many thousands of times. Initial studies have revealed intra-tissue genetic heterogeneity, in the form of multiple variants of a single gene that exist as distinct “majority and “minority” variants. This highly specialized form of somatic mosaicism has been found within both cancer and normal tissues. If such genetic variation within individual tissues is widespread, it will need to be considered as a significant factor in the ontogeny of many multifactorial diseases, including cancer. The discovery of majority and minority gene variants and the resulting somatic cell heterogeneity in both normal and diseased tissues suggests that selection, as opposed to mutation, might be the critical event in disease ontogeny. We, therefore, are proposing a hypothesis to explain multifactorial disease ontogeny in which pre-existing multiple somatic gene variants, which may arise at a very early stage of tissue development, are eventually selected due to changes in tissue microenvironments.  相似文献   

13.
Acquisition of genetic material from viruses by their hosts can generate inter-host structural genome variation. We developed computational tools enabling us to study virus-derived structural variants (SVs) in population-scale whole genome sequencing (WGS) datasets and applied them to 3,332 humans. Although SVs had already been cataloged in these subjects, we found previously-overlooked virus-derived SVs. We detected non-germline SVs derived from squirrel monkey retrovirus (SMRV), human immunodeficiency virus 1 (HIV-1), and human T lymphotropic virus (HTLV-1); these variants are attributable to infection of the sequenced lymphoblastoid cell lines (LCLs) or their progenitor cells and may impact gene expression results and the biosafety of experiments using these cells. In addition, we detected new heritable SVs derived from human herpesvirus 6 (HHV-6) and human endogenous retrovirus-K (HERV-K). We report the first solo-direct repeat (DR) HHV-6 likely to reflect DR rearrangement of a known full-length endogenous HHV-6. We used linkage disequilibrium between single nucleotide variants (SNVs) and variants in reads that align to HERV-K, which often cannot be mapped uniquely using conventional short-read sequencing analysis methods, to locate previously-unknown polymorphic HERV-K loci. Some of these loci are tightly linked to trait-associated SNVs, some are in complex genome regions inaccessible by prior methods, and some contain novel HERV-K haplotypes likely derived from gene conversion from an unknown source or introgression. These tools and results broaden our perspective on the coevolution between viruses and humans, including ongoing virus-to-human gene transfer contributing to genetic variation between humans.  相似文献   

14.
15.
A dozen genes/regions have been confirmed as genetic risk factors for oral clefts in human association and linkage studies, and animal models argue even more genes may be involved. Genomic sequencing studies should identify specific causal variants and may reveal additional genes as influencing risk to oral clefts, which have a complex and heterogeneous etiology. We conducted a whole exome sequencing (WES) study to search for potentially causal variants using affected relatives drawn from multiplex cleft families. Two or three affected second, third, and higher degree relatives from 55 multiplex families were sequenced. We examined rare single nucleotide variants (SNVs) shared by affected relatives in 348 recognized candidate genes. Exact probabilities that affected relatives would share these rare variants were calculated, given pedigree structures, and corrected for the number of variants tested. Five novel and potentially damaging SNVs shared by affected distant relatives were found and confirmed by Sanger sequencing. One damaging SNV in CDH1, shared by three affected second cousins from a single family, attained statistical significance (P = 0.02 after correcting for multiple tests). Family-based designs such as the one used in this WES study offer important advantages for identifying genes likely to be causing complex and heterogeneous disorders.  相似文献   

16.
Li H 《Human genetics》2012,131(9):1395-1401
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and multiple rare and common variants at some of the causative loci contributing to the risk of these diseases. Data from the genome-wide association studies (GWAS) and metadata such as known gene functions and pathways provide the possibility of identifying genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants only explain very small percentages of the heritabilities. To account for the locus- and allelic-heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pair of individuals and link the genomic similarity to phenotype similarity, have proved to be very useful for testing the associations between a set of single nucleotide polymorphisms and the phenotypes. Compared to single marker analysis, the advantages afforded by the U-statistics-based methods is large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links of these statistics with other similarity-based tests of genetic association. Finally, potential application of U-statistics in analysis of the next-generation sequencing data and rare variants association studies are discussed.  相似文献   

17.
Genomics is now a core element in the effort to develop a vaccine against HIV-1. Thanks to unprecedented progress in high-throughput genotyping and sequencing, in knowledge about genetic variation in humans, and in evolutionary genomics, it is finally possible to systematically search the genome for common genetic variants that influence the human response to HIV-1. The identification of such variants would help to determine which aspects of the response to the virus are the most promising targets for intervention. However, a key obstacle to progress remains the scarcity of appropriate human cohorts available for genomic research.  相似文献   

18.
19.
20.
SNP analysis to dissect human traits   总被引:5,自引:0,他引:5  
The analysis of complex human diseases has been spurred by the number of published genomic sequence variants - many identified in the course of sequencing the human genome. But, to be useful for genetic analysis, variants have to be mapped accurately, their frequencies in various populations determined, and automated high-throughput assay techniques developed. Recently proposed methods address these issues: the use of 'reduced representation shotgun' methods for more efficient detection of single nucleotide polymorphisms (SNPs), the employment of high-throughput genotyping techniques, the development of SNP maps that incorporate information about linkage disequilibrium, and the use of SNPs in identifying susceptibility genes for common illnesses.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号