Similar articles
 20 similar articles found.
1.
Ott J 《Human heredity》2004,58(3-4):171-174
Several sources of errors are discussed. While genotyping errors have little effect on power in case-control association studies, they tend to strongly increase false positive results in TDT type tests unless occurrence of errors is allowed for in the analysis (e.g., TDTae test). Disregarding non-genetic risk factors is shown to lead to a form of hidden heterogeneity, which can strongly reduce power. Stratification of data into more homogeneous subgroups is advocated as a simple solution to allowing for non-genetic risk factors such as socio-economic status and food preferences.

2.
The pressure to publish novel genetic associations has meant that meta-analysis has been applied to genome-wide association studies without the time for a careful consideration of the methods that are used. This review distinguishes between the use of meta-analysis to validate previously reported genetic associations and its use for gene discovery, and advocates viewing gene discovery as an exploratory screen that requires independent replication instead of treating it as the application of hundreds of thousands of statistical tests. The review considers the use of fixed and random effects meta-analyses, the investigation of between-study heterogeneity, adjustment for confounding, assessing the combined evidence and genomic control, and comments on alternative approaches that have been used in the literature.
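As an illustration of the fixed- and random-effects meta-analyses and the heterogeneity assessment that the abstract above refers to, here is a minimal Python sketch. It assumes inverse-variance weighting, Cochran's Q and the DerSimonian-Laird estimate of the between-study variance, which are common textbook choices and not necessarily the ones the review recommends. All function names and input values are hypothetical.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


def cochran_q(betas, ses):
    """Cochran's Q statistic for between-study heterogeneity."""
    pooled, _ = fixed_effect(betas, ses)
    return sum((b - pooled) ** 2 / se ** 2 for b, se in zip(betas, ses))


def dersimonian_laird_tau2(betas, ses):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    c = total - sum(w ** 2 for w in weights) / total
    if k < 2 or c <= 0:
        return 0.0  # heterogeneity cannot be estimated from one study
    return max(0.0, (cochran_q(betas, ses) - (k - 1)) / c)


def random_effects(betas, ses):
    """Random-effects pooled estimate: weights are 1 / (se^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(betas, ses)
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical per-study log odds ratios and standard errors.
    betas = [0.0, 0.4]
    ses = [0.1, 0.1]
    print("fixed :", fixed_effect(betas, ses))
    print("Q     :", cochran_q(betas, ses))
    print("random:", random_effects(betas, ses))
```

With these two hypothetical studies both models pool to an effect of about 0.2, but the standard error grows from roughly 0.071 under the fixed-effects model to 0.2 under the random-effects model, because the disagreement between studies is counted as extra variance.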

3.
Copy-number variation (CNV), and deletions in particular, can play a crucial, causative role in rare disorders. The extent to which CNV contributes to common, complex disease etiology, however, is largely unknown. Current techniques to detect CNV are relatively expensive and time consuming, making it difficult to conduct the necessary large-scale genetic studies. SNP genotyping technologies, on the other hand, are relatively cheap, thereby facilitating large study designs. We have developed a computational tool capable of harnessing the information in SNP genotype data to detect deletions. Our approach not only detects deletions with high power but also returns accurate estimates of both the population frequency and the transmission frequency. This tool, therefore, lends itself to the discovery of deletions in large familial SNP genotype data sets and to simultaneous testing of the discovered deletion for association, with the use of both frequency-based and transmission/disequilibrium test-based designs. We demonstrate the effectiveness of our computer program (microdel), available for download at no cost, with both simulated and real data. Here, we report 693 deletions in the HapMap 16c collection, with each deletion assigned a population frequency.

4.
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.

5.
Individual genome-wide association (GWA) studies and their meta-analyses represent two approaches for identifying genetic loci associated with complex diseases/traits. Inconsistent findings and non-replicability between individual GWA studies and meta-analyses are commonly observed, hence posing the critical question as to how to interpret their respective results properly. In this study, we performed a series of simulation studies to investigate and compare the statistical properties of the two approaches. Our results show that (1) as expected, a meta-analysis with a larger sample size is more powerful than individual GWA studies under the ideal setting of population homogeneity among individual studies; (2) under the realistic setting of heterogeneity among individual studies, detection of heterogeneity is usually difficult and meta-analysis (even with the random-effects model) may introduce elevated false positive and/or negative rates; (3) despite its relatively small sample size, a well-designed individual GWA study has the capacity to identify novel loci for complex traits; (4) replicability between meta-analysis and independent individual studies or between independent meta-analyses is limited, and thus inconsistent findings are not unexpected.

6.
Genome-wide case-control association studies aim at identifying significant differential markers between sick and healthy populations. With the development of large-scale technologies allowing the genotyping of thousands of single nucleotide polymorphisms (SNPs) comes the multiple testing problem and the practical issue of selecting the most probable set of associated markers. Several False Discovery Rate (FDR) estimation methods have been developed and tuned mainly for differential gene expression studies. However, they are based on hypotheses and designs that are not necessarily relevant in genetic association studies. In this article we present a universal methodology to estimate the FDR of genome-wide association results. It uses a single global probability value per SNP and is applicable in practice for any study design, using any statistic. We have benchmarked this algorithm on simulated data and shown that it outperforms previous methods in cases requiring non-parametric estimation. We exemplified the usefulness of the method by applying it to the analysis of experimental genotyping data from three Multiple Sclerosis case-control association studies.
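To make the idea of estimating an FDR from one P-value per SNP concrete, the sketch below uses the generic plug-in estimator, FDR(t) ≈ pi0 · m · t / R(t), where m is the number of SNPs tested, R(t) is the number of SNPs with P ≤ t, and pi0 is the assumed proportion of true nulls. This is a standard textbook estimator swapped in for illustration; it is not the algorithm proposed in the article above, and the function name and the P-values are hypothetical.

```python
def estimate_fdr(p_values, threshold, pi0=1.0):
    """Plug-in FDR estimate for the SNPs selected at P <= threshold.

    Under the null hypothesis P-values are uniform, so about
    pi0 * m * threshold false positives are expected among the
    selected SNPs. pi0 = 1.0 is the conservative default (assumption).
    """
    m = len(p_values)
    selected = sum(1 for p in p_values if p <= threshold)
    if selected == 0:
        return 0.0  # nothing selected, so no false discoveries to count
    return min(1.0, pi0 * m * threshold / selected)


if __name__ == "__main__":
    # Hypothetical P-values: three small ones among ten tests.
    p_values = [0.001, 0.002, 0.003, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    print(estimate_fdr(p_values, threshold=0.01))
```

In this toy input, 10 tests at a threshold of 0.01 give an expected 0.1 false positives among 3 selected SNPs, so the estimate is about 0.033.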

7.
Li J  Guo YF  Pei Y  Deng HW 《PloS one》2012,7(4):e34486
Genotype imputation is often used in the meta-analysis of genome-wide association studies (GWAS), for combining data from different studies and/or genotyping platforms, in order to improve the ability to detect disease variants with small to moderate effects. However, how genotype imputation affects the performance of the meta-analysis of GWAS is largely unknown. In this study, we investigated the effects of genotype imputation on the performance of meta-analysis through simulations based on empirical data from the Framingham Heart Study. We found that when fixed-effects models were used, considerable between-study heterogeneity was detected when causal variants were typed in only some but not all individual studies, resulting in up to ~25% reduction of detection power. In certain situations, the power of the meta-analysis can be even less than that of individual studies. Additional analyses showed that the detection power was slightly improved when between-study heterogeneity was partially controlled through the random-effects model, relative to that of the fixed-effects model. Our study may aid in the planning, data analysis, and interpretation of GWAS meta-analysis results when genotype imputation is necessary.

8.
Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
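The 45% figure in the guideline above follows from simple counting, which the short Python sketch below reproduces. It assumes that one "marker evaluation" means genotyping one marker in one individual; the function names and the cohort size in the usage example are hypothetical.

```python
def two_stage_evaluations(n_subjects, n_markers, stage1_fraction, carried_fraction):
    """Total marker evaluations (marker x individual) in a two-stage design.

    Stage 1: all markers on stage1_fraction of the subjects.
    Stage 2: the top carried_fraction of markers on the remaining subjects.
    """
    stage1 = stage1_fraction * n_subjects * n_markers
    stage2 = (1.0 - stage1_fraction) * n_subjects * carried_fraction * n_markers
    return stage1 + stage2


def relative_saving(stage1_fraction, carried_fraction):
    """Fractional reduction versus genotyping every marker on every subject."""
    cost = stage1_fraction + (1.0 - stage1_fraction) * carried_fraction
    return 1.0 - cost


if __name__ == "__main__":
    # Guideline from the abstract: 50% of individuals in Stage 1,
    # the most promising 10% of markers carried into Stage 2.
    print(round(relative_saving(0.5, 0.1), 2))
    # Hypothetical study: 2,000 subjects and 100,000 markers.
    print(two_stage_evaluations(2000, 100000, 0.5, 0.1))
```

The relative cost is 0.5 + 0.5 × 0.1 = 0.55 of the one-stage design, which is the 45% decrease quoted in the abstract.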

9.
The development of genome-wide association scanning (GWAS) has revolutionized the search for genetic loci associated with complex diseases. Crohn's disease (CD), together with ulcerative colitis, has been a principal beneficiary of this technology with a recent meta-analysis from the International IBD Genetics Consortium increasing the number of confirmed CD susceptibility loci to 71. When one considers that prior to the development of GWAS only three susceptibility loci had been identified, the degree of progress becomes obvious. In this article we will summarize the principal discoveries that have been made in CD genetics and explain how these have contributed to our improved understanding of disease pathogenesis.

10.
11.
Kuo CL  Zaykin DV 《Genetics》2011,189(1):329-340
In recent years, genome-wide association studies (GWAS) have uncovered a large number of susceptibility variants. Nevertheless, GWAS findings provide only tentative evidence of association, and replication studies are required to establish their validity. Due to this uncertainty, researchers often focus on top-ranking SNPs, instead of considering strict significance thresholds to guide replication efforts. The number of SNPs for replication is often determined ad hoc. We show how the rank-based approach can be used for sample size allocation in GWAS as well as for deciding on the number of SNPs for replication. The basis of this approach is the "ranking probability": the chance that at least j true associations will rank among the top u SNPs, when SNPs are sorted by P-value. By employing simple but accurate approximations for ranking probabilities, we accommodate linkage disequilibrium (LD) and evaluate the consequences of ignoring LD. Further, we relate ranking probabilities to the proportion of false discoveries among the top u SNPs. A study-specific proportion can be estimated from P-values, and its expected value can be predicted for study design applications.
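The "ranking probability" defined above can be approximated by brute-force simulation. The Python sketch below is a Monte Carlo illustration, not the analytical approximation developed in the article. It assumes independent SNPs (no LD), normally distributed test statistics with unit variance, and a common mean shift `mu` for all true associations; the function name and every parameter value are hypothetical. Sorting by the absolute statistic is equivalent to sorting by two-sided P-value.

```python
import random


def ranking_probability(m, k, mu, u, j, n_sim=2000, seed=0):
    """Estimate P(at least j of the k true associations rank in the top u).

    m  : total number of SNPs tested
    k  : number of truly associated SNPs
    mu : mean of the test statistic for true associations (assumption)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # (absolute statistic, is_true_association) for every SNP
        stats = [(abs(rng.gauss(mu, 1.0)), True) for _ in range(k)]
        stats += [(abs(rng.gauss(0.0, 1.0)), False) for _ in range(m - k)]
        # Larger |z| means smaller two-sided P-value.
        stats.sort(key=lambda pair: pair[0], reverse=True)
        true_in_top = sum(1 for _, is_true in stats[:u] if is_true)
        if true_in_top >= j:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical design: 200 SNPs, 5 true effects, top 10 carried forward.
    for mu in (1.0, 2.0, 3.0, 4.0):
        print(mu, ranking_probability(m=200, k=5, mu=mu, u=10, j=3))
```

Scanning `mu` (which grows with sample size) or `u` in this way mimics how a ranking probability could guide sample size allocation or the number of SNPs taken to replication, under the independence assumption stated above.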

12.
Robust assessment of genetic effects on quantitative traits or complex-disease risk requires synthesis of evidence from multiple studies. Frequently, studies have genotyped partially overlapping sets of SNPs within a gene or region of interest, hampering attempts to combine all the available data. By using the example of C-reactive protein (CRP) as a quantitative trait, we show how linkage disequilibrium in and around its gene facilitates use of Bayesian hierarchical models to integrate informative data from all available genetic association studies of this trait, irrespective of the SNP typed. A variable selection scheme, followed by contextualization of SNPs exhibiting independent associations within the haplotype structure of the gene, enhanced our ability to infer likely causal variants in this region with population-scale data. This strategy, based on data from a literature-based systematic review and substantial new genotyping, facilitated the most comprehensive evaluation to date of the role of variants governing CRP levels, providing important information on the minimal subset of SNPs necessary for comprehensive evaluation of the likely causal relevance of elevated CRP levels for coronary-heart-disease risk by Mendelian randomization. The same method could be applied to evidence synthesis of other quantitative traits, whenever the typed SNPs vary among studies, and to assist fine mapping of causal variants.

13.
Methods for multivariate meta-analysis of genetic association studies are reviewed, summarized and presented in a unified framework. Modifications of standard models are described in detail so that they can be applied in genetic association studies. The model based on summary data is uniformly defined for both discrete and continuous outcomes, and analytical expressions for the covariance of the two jointly modeled outcomes are derived for both cases. The models based on the binary nature of the data are fitted using both prospective and retrospective likelihood. Furthermore, formal tests for assessing the genetic model of inheritance are developed based on standard normal theory. The general model is compared to the recently proposed genetic model-free bivariate approach (using either summary or binary data), and it is clearly shown that the estimates provided by this approach are nearly identical to the estimates derived by the general bivariate model using the aforementioned tests for the genetic model. The methods developed here, as well as the tests, are easily implemented in all major statistical packages, avoiding the need for self-written software. The methods are applied in several already published meta-analyses of genetic association studies (with both discrete and continuous outcomes) and the results are compared against the widely used univariate approach as well as against the genetic model-free approaches. Illustrative examples of code in Stata are given in the appendix. It is anticipated that the methods developed in this work will be widely applied in the meta-analysis of genetic association studies.

14.
Comparative studies of quantitative genetic and neutral marker differentiation have provided means for assessing the relative roles of natural selection and random genetic drift in explaining among-population divergence. This information can be useful for our fundamental understanding of population differentiation, as well as for identifying management units in conservation biology. Here, we provide a comprehensive review and meta-analysis of the empirical studies that have compared quantitative genetic (Q_ST) and neutral marker (F_ST) differentiation among natural populations. Our analyses confirm the conclusion from previous reviews - based on ca. 100% more data - that the Q_ST values are on average higher than F_ST values [mean difference 0.12 (SD 0.27)], suggesting a predominant role for natural selection as a cause of differentiation in quantitative traits. However, although the influence of trait (life history, morphological and behavioural) and marker type (e.g. microsatellites and allozymes) on the variance of the difference between Q_ST and F_ST is small, there is much heterogeneity in the data attributable to variation between specific studies and traits. The latter is understandable as there is no reason to expect that natural selection would be acting in similar fashion on all populations and traits (except for fitness itself). We also found evidence to suggest that Q_ST and F_ST values across studies are positively correlated, but the significance of this finding remains unclear. We discuss these results in the context of the utility of Q_ST-F_ST comparisons as a tool for inferring natural selection, as well as the associated methodological and interpretational problems involved with individual and meta-analytic studies.

15.
Lin J  Chen ZZ  Tian B  Hua YJ 《Gene》2007,387(1-2):15-20
RecX regulates RecA activity by interacting with RecA protein or RecA filaments. Genes encoding RecX were found in the genomes of a wide diversity of bacteria and some plants (e.g., Arabidopsis thaliana and Oryza sativa). Our comparative genome analysis showed that although members of the RecX family are found in many bacterial species, they are not found in archaea, and the only gene found in eukaryotes is likely derived from bacterial genomes. It is therefore proposed that RecX is of bacterial origin and that the gene was present in the common ancestor of bacteria. Moreover, bacterial RecX and the plant RecX domain are homologues, and the RecX domain in plants may have been derived from bacteria via unknown pathways. The plant RecX-like protein was formed by a gene fusion event between a unique N-terminal domain of unknown origin and the RecX domain within plant cells. Finally, three possible evolutionary pathways from bacteria to plants are discussed.

16.
Ana Lukic  Simon Mead 《Prion》2011,5(3):154-160
Over the last decade remarkable advances in genotyping and sequencing technology have resulted in hundreds of novel gene associations with disease. These have typically involved high frequency alleles in common diseases and, with the advent of next generation sequencing, disease-causing recessive mutations in rare inherited syndromes. Here we discuss the impact of these advances and other gene discovery methods in the prion diseases. Several quantitative trait loci in mouse have been mapped and their human counterparts analyzed (HECTD2, CPNE8); other candidate gene regions have been chosen for functional reasons (SPRN, CTSD). Human genome-wide association studies have been done in variant Creutzfeldt-Jakob disease (CJD) and are ongoing in larger collections of sporadic CJD, with findings around, but not clearly beyond, the levels of statistical significance required in these studies (THRB-RARB, STMN2). Future work will include closer integration of animal and human genetic studies, larger and combined genome-wide association studies, analysis of structural genetic variation and next generation sequencing studies involving the entire exome or genome. Key words: prion, genetic, CJD, GWAS

17.
《Prion》2013,7(3):154-160
Over the last decade remarkable advances in genotyping and sequencing technology have resulted in hundreds of novel gene associations with disease. These have typically involved high frequency alleles in common diseases and, with the advent of next generation sequencing, disease-causing recessive mutations in rare inherited syndromes. Here we discuss the impact of these advances and other gene discovery methods in the prion diseases. Several quantitative trait loci in mouse have been mapped and their human counterparts analysed (HECTD2, CPNE8); other candidate gene regions have been chosen for functional reasons (SPRN, CTSD). Human genome-wide association studies have been done in variant Creutzfeldt-Jakob disease (CJD) and are ongoing in larger collections of sporadic CJD, with findings around, but not clearly beyond, the levels of statistical significance required in these studies (THRB-RARB, STMN2). Future work will include closer integration of animal and human genetic studies, larger and combined genome-wide association studies, analysis of structural genetic variation and next generation sequencing studies involving the entire coding exome or genome.

18.
Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease‐associated non‐synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site‐specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non‐interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease‐associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome‐wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease. Proteins 2015; 83:428–435. © 2014 Wiley Periodicals, Inc.

19.
Genome-wide association studies require accurate and fast statistical methods to identify relevant signals from the background noise generated by a huge number of simultaneously tested hypotheses. It is now commonly accepted that exact computations of the association probability value (P-value) are preferred to chi-square and permutation-based approximations. Following the same principle, the ExactFDR software package improves the speed and accuracy of the permutation-based false discovery rate (FDR) estimation method by replacing the permutation-based estimation of the null distribution with a generalization of the algorithm used for computing individual exact P-values. It provides a quick and accurate non-conservative estimator of the proportion of false positives in a given selection of markers, and is therefore an efficient and pragmatic tool for the analysis of genome-wide association studies.

20.
Sen K  Ghosh TC 《Gene》2012,501(2):164-170
Pseudogenes, the 'genomic fossils', portray the evolutionary history of the human genome. The human genes that give rise to pseudogenes are also emerging as important resources in the study of human protein evolution. In this communication, we explored the evolutionary conservation of genes that form pseudogenes relative to genes lacking any pseudogene and, delving deeper, probed the difference in evolutionary rate between the disease genes in the two groups. We illustrated this differential evolutionary pattern by gene expressivity, the number of regulatory miRNAs targeting each gene, the abundance of protein-complex-forming genes and a lower percentage of intrinsic protein disorder. Furthermore, pseudogenes are observed to harbor sequence variations throughout their length that become degenerative disease-causing mutations, although the disease involvement of their progenitor genes is still unexplored. Here, we revealed a strong association of disease genes with the genes that give rise to pseudogenes in humans. We interpreted this finding in terms of disease-associated miRNA targeting, genes containing polymorphisms in miRNA target sites, the abundance of genes with disease-causing non-synonymous mutations, disease-gene-specific network properties, the presence of genes with repeat regions, the abundance of dosage-sensitive genes and the presence of intrinsically unstructured protein regions.
