Similar articles
1.
Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and in particular finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.

2.
3.
4.
Permutation of class labels is a common approach in microarray analysis. It is assumed to produce random score distributions, which are not affected by biological differences between samples. However, hidden confounding variables, like the genetic background of patients or undetected experimental artifacts, leave traces in the expression data, contaminating the score distributions obtained from random permutations. While the effects of known confounders can be compensated for using established methodology, little is known about how to deal with unknown confounders. We discuss a computational method called permutation filtering, which aims to borrow information across genes to detect and compensate for the effects of unknown confounders.
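As a point of reference for the abstract above, the following is a minimal sketch of the plain class-label permutation it starts from, for a single gene. It is not the permutation filtering method itself, whose details the abstract does not give. The function names, the mean-difference score, and the add-one p-value correction are assumptions chosen for illustration.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Score for one gene: mean expression in class 1 minus class 0."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Two-sided p-value from a null distribution built by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random reassignment of class labels
        if abs(mean_difference(values, shuffled)) >= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return (hits + 1) / (n_permutations + 1)
```

The shuffled labels yield a valid null distribution only if samples are exchangeable. A hidden confounder that groups the samples breaks that assumption, which is the problem the abstract raises.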

5.
6.
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental bias in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.

7.
Over a decade of genome-wide association studies (GWAS) has led to the finding of extreme polygenicity of complex traits. The phenomenon that “all genes affect every complex trait” complicates Mendelian Randomization (MR) studies, where natural genetic variations are used as instruments to infer the causal effect of heritable risk factors. We reexamine the assumptions of existing MR methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. We propose a comprehensive framework, GRAPPLE, to analyze the causal effect of target risk factors with heterogeneous genetic instruments and identify possible pleiotropic patterns from data. By using GWAS summary statistics, GRAPPLE can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, determine the causal direction and perform multivariable MR to adjust for confounding risk factors. With GRAPPLE, we analyze the effect of blood lipids, body mass index, and systolic blood pressure on 25 disease outcomes, gaining new information on their causal relationships and potential pleiotropic pathways involved.
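To make concrete what "using GWAS summary statistics as instruments" means, here is a minimal sketch of the standard inverse-variance-weighted (IVW) MR estimator. It is a common baseline that assumes no horizontal pleiotropy, and it is not the GRAPPLE framework described above. The function and argument names are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted causal effect estimate from summary statistics.

    Each list holds one entry per genetic instrument (SNP):
    SNP-exposure effects, SNP-outcome effects, and the standard errors
    of the SNP-outcome effects.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2           # more precise SNPs count more
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    return numerator / denominator
```

This is a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin. Pervasive pleiotropy and weak instruments, the issues the abstract highlights, are what bias this simple estimator.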

8.
9.
The study of codon usage bias is an important research area that contributes to our understanding of molecular evolution, phylogenetic relationships, respiratory lifestyle, and other characteristics. Translational efficiency bias is perhaps the most well-studied codon usage bias, as it is frequently utilized to predict relative protein expression levels. We present a novel approach to isolating translational efficiency bias in microbial genomes. There are several existing methods for isolating translational efficiency bias, but they are susceptible to the confounding influences of other potentially dominant biases. Additionally, existing approaches to identifying translational efficiency bias generally require both genomic sequence information and prior knowledge of a set of highly expressed genes. This novel approach provides more accurate results from sequence information alone by resisting the confounding effects of other biases. We validate this increase in accuracy in isolating translational efficiency bias on 10 microbial genomes, five of which have proven particularly difficult for existing approaches due to the presence of strong confounding biases.

10.
Wessel J, Zapala MA, Schork NJ. Genomics 2007, 90(1): 132-142
The availability of high-throughput genotyping technologies and microarray assays has allowed researchers to consider pursuing investigations whose ultimate goal is the identification of genetic variations that influence levels of gene expression, e.g., "expression quantitative trait locus" or "eQTL" mapping studies. However, the large number of genes whose expression levels can be tested for association with genetic variations in such studies can create both statistical and biological interpretive problems. We consider the integrated analysis of eQTL mapping data that incorporates pathway, function, and disease process information. The goal of this analysis is to determine if compelling patterns emerge from the data that are consistent with the notion that perturbations in the molecular physiologic environment induced by genetic variations implicate the expression patterns of multiple genes via genetic network relationships or feedback mechanisms. We apply available genetic network and pathway analysis software, as well as a novel regression analysis technique, to carry out the proposed studies. We also consider extensions of the proposed strategies and areas of future research.

11.
Conventional statistical methods for interpreting microarray data require large numbers of replicates in order to provide sufficient levels of sensitivity. We recently described a method for identifying differentially expressed genes in one-channel microarray data [1]. Based on the idea that the variance structure of microarray data can itself be a reliable measure of noise, this method allows statistically sound interpretation of as few as two replicates per treatment condition. Unlike the one-channel array, the two-channel platform simultaneously compares gene expression in two RNA samples. This leads to covariation of the measured signals. Hence, by accounting for covariation in the variance model, we can significantly increase the power of the statistical test. We believe that this approach has the potential to overcome limitations of existing methods. We present here a novel approach for the analysis of microarray data that involves modeling the variance structure of paired expression data in the context of a Bayesian framework. We also describe a novel statistical test that can be used to identify differentially expressed genes. This method, bivariate microarray analysis (BMA), demonstrates dramatically improved sensitivity over existing approaches. We show that with only two array replicates, it is possible to detect gene expression changes that are at best detected with six array replicates by other methods. Further, we show that combining results from BMA with Gene Ontology annotation yields biologically significant results in a ligand-treated macrophage cell system.

12.
Cassava (Manihot esculenta Crantz, 2n = 36) is a global food security crop. It has a highly heterozygous genome, high genetic load, and genotype-dependent asynchronous flowering. It is typically propagated by stem cuttings and any genetic variation between haplotypes, including large structural variations, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this results in artifacts and an oversimplification of heterozygous regions. We used a combination of Pacific Biosciences (PacBio), Illumina, and Hi-C to resolve each haplotype of the genome of a farmer-preferred cassava line, TME7 (Oko-iyawo). PacBio reads were assembled using the FALCON suite. Phase switch errors were corrected using FALCON-Phase and Hi-C read data. The ultralong-range information from Hi-C sequencing was also used for scaffolding. Comparison of the two phases revealed >5000 large haplotype-specific structural variants affecting over 8 Mb, including insertions and deletions spanning thousands of base pairs. The potential of these variants to affect allele-specific expression was further explored. RNA-sequencing data from 11 different tissue types were mapped against the scaffolded haploid assembly and gene expression data are incorporated into our existing easy-to-use web-based interface to facilitate use by the broader plant science community. These two assemblies provide an excellent means to study the effects of heterozygosity, haplotype-specific structural variation, gene hemizygosity, and allele-specific gene expression contributing to important agricultural traits and further our understanding of the genetics and domestication of cassava.

13.
Iron disorders of genetic origin are mainly composed of iron overload diseases, the most frequent being HFE-related hemochromatosis. Hepcidin deficiency underlies iron overload in HFE-hemochromatosis as well as in several other genetic iron excess disorders, such as hemojuvelin or hepcidin-related hemochromatosis and transferrin receptor 2-related hemochromatosis. Deficiency of ferroportin, the only known cellular protein iron exporter, produces iron overload in the typical form of ferroportin disease. By contrast, genetically enhanced hepcidin production, as observed in matriptase-2 deficiency, generates iron-refractory iron deficiency anemia. Diagnosis of these iron storage disorders is usually established noninvasively through combined biochemical, imaging and genetic approaches. Moreover, improved knowledge of the molecular mechanisms accounting for the variations of iron stores opens the way to novel therapeutic approaches aiming to restore normal iron homeostasis. In this review, we will summarize recent findings about these various genetic entities that have been identified owing to an exemplary interplay between clinicians and basic scientists.

14.
15.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method scales in runtime quadratic in the number of individuals being studied with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.

16.
17.
18.
The importance of single-cell level data is increasingly appreciated, and significant advances in this direction have been made in recent years. Common to these technologies is the need to physically segregate individual cells into containers, such as wells or chambers of a microfluidics chip. High-throughput Single-Cell Labeling (Hi-SCL) in drops is a novel method that uses drop-based libraries of oligonucleotide barcodes to index individual cells in a population. The use of drops as containers, and a microfluidics platform to manipulate them en masse, yields a highly scalable methodological framework. Once tagged, labeled molecules from different cells may be mixed without losing the cell-of-origin information. Here we demonstrate an application of the method for generating RNA-sequencing data for multiple individual cells within a population. Barcoded oligonucleotides are used to prime cDNA synthesis within drops. Barcoded cDNAs are then combined and subjected to second generation sequencing. The data are deconvoluted based on the barcodes, yielding single-cell mRNA expression data. In a proof-of-concept set of experiments we show that this method yields data comparable to other existing methods, but with unique potential for assaying very large numbers of cells.
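The barcode-based deconvolution step described above can be sketched as follows. This is an illustrative simplification, not the Hi-SCL pipeline. It assumes each read starts with a fixed-length cell barcode and that barcodes are read without sequencing errors. The function name and read layout are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_length):
    """Group pooled reads by cell barcode.

    Assumes (for illustration only) that the first `barcode_length` bases
    of every read are the cell barcode and the remainder is the cDNA insert.
    Returns a dict mapping barcode -> list of insert sequences.
    """
    by_cell = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        by_cell[barcode].append(insert)
    return dict(by_cell)
```

For example, `demultiplex(["AAACGT", "CCCTTA", "AAAGGG"], 3)` groups the first and third reads under barcode `AAA`. A practical pipeline would also need to tolerate barcode sequencing errors, which this sketch ignores.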

19.
20.
Hereditary coproporphyria (HCP) is the least common of the autosomal dominant acute hepatic porphyrias. It results from mutations in the CPO gene that encodes the mitochondrial enzyme, coproporphyrinogen oxidase. A few patients have also been reported who are homoallelic or heteroallelic for CPO mutations and are clinically distinct from those with HCP. In such patients the presence of a specific mutation (K404E) on one or both alleles produces a neonatal hemolytic anemia that is known as "harderoporphyria"; mutations on both alleles elsewhere in the gene give rise to the "homozygous" variant of HCP. The molecular relationship between these disorders and HCP has not been defined. We describe the molecular investigation and clinical features of 17 unrelated British patients with HCP. Ten novel and four previously reported CPO mutations, together with three previously unrecognized single-nucleotide polymorphisms, were identified in 15 of the 17 patients. HCP is more heterogeneous than other acute porphyrias, with all but one mutation being restricted to a single family, and with a predominance of missense mutations (10 missense, 2 nonsense, 1 frameshift, and 1 splice site). Of the four known mutations, one (R331W) has previously been reported to cause disease only in homozygotes. Heterologous expression of another mutation (R401W) demonstrated functional properties similar to those of the K404E harderoporphyria mutation. In all patients, clinical presentation was uniform, in spite of the wide range (1%-64%) of residual coproporphyrinogen oxidase activity, as determined by heterologous expression. Our findings add substantially to knowledge of the molecular epidemiology of HCP, show that single copies of CPO mutations that are known or predicted to cause "homozygous" HCP or harderoporphyria can produce typical HCP in adults, and demonstrate that the severity of the phenotype does not correlate with the degree of inactivation by mutation of coproporphyrinogen oxidase.
