Similar literature
20 similar documents found (search time: 62 ms)
1.
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400,000 specific annotations with the estimated Precision of 90%, ∼19,000 of which are highly specific (e.g. “penicillin binding,” “tRNA aminoacylation for protein translation,” or “pathogenesis”), and are freely available at http://gorbi.irb.hr/.
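The validation figures quoted in the abstract above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' code: the function name `observed_precision` is an illustrative assumption, and the only inputs are the numbers stated in the abstract (25 confirmed out of 38 predictions, reported Precision of 60%).

```python
def observed_precision(confirmed: int, total: int) -> float:
    """Return the fraction of predictions that were experimentally confirmed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= confirmed <= total:
        raise ValueError("confirmed must lie between 0 and total")
    return confirmed / total


# Figures taken from the abstract; everything else is illustrative.
REPORTED_PRECISION = 0.60
observed = observed_precision(confirmed=25, total=38)

print(f"observed precision: {observed:.3f}")
print(f"reported precision: {REPORTED_PRECISION:.3f}")
print(f"difference:         {observed - REPORTED_PRECISION:+.3f}")
```

The observed fraction is slightly above the reported estimate, which is consistent with the abstract's statement that the validation results agreed with the model's confidence estimates.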

2.
In genome-wide association studies (GWAS), the association between each single nucleotide polymorphism (SNP) and a phenotype is assessed statistically. To further explore genetic associations in GWAS, we considered two specific forms of biologically plausible SNP-SNP interactions, ‘SNP intersection’ and ‘SNP union,’ and analyzed the Crohn's Disease (CD) GWAS data of the Wellcome Trust Case Control Consortium (WTCCC) for these interactions using a limited form of logic regression. We found strong evidence of CD-association for 195 genes, identifying novel susceptibility genes (e.g., ISX, SLCO6A1, TMEM183A) as well as confirming many previously identified susceptibility genes in CD GWAS (e.g., IL23R, NOD2, CYLD, NKX2-3, IL12RB2, ATG16L1). Notably, 37 of the 59 chromosomal locations indicated for CD-association by a meta-analysis of CD GWAS, involving over 22,000 cases and 29,000 controls, were represented in the 195 genes, as well as some chromosomal locations previously indicated only in linkage studies, but not in GWAS. We repeated the analysis with two smaller GWASs from the Database of Genotype and Phenotype (dbGaP): in spite of differences in populations and study power, we observed some consistencies across the three datasets. Notable examples included TMEM183A and SLCO6A1, which exhibited strong evidence consistently in our WTCCC and both of the dbGaP SNP-SNP interaction analyses. Examining these specific forms of SNP interactions could identify additional genetic associations from GWAS. R code, data examples, and a ReadMe file are available for download from our website: http://www.ualberta.ca/~yyasui/homepage.html.
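One way to picture the two interaction forms named in the abstract above is as Boolean combinations of per-individual SNP indicators. This is only a sketch under stated assumptions: the abstract does not define the encoding, so treating each SNP as a True/False "carries the variant" flag, reading "intersection" as logical AND and "union" as logical OR, and the function names below are all hypothetical choices, not the authors' R implementation.

```python
from typing import List


def snp_intersection(snp_a: List[bool], snp_b: List[bool]) -> List[bool]:
    """Combined indicator that is True only when both SNP indicators are True."""
    if len(snp_a) != len(snp_b):
        raise ValueError("indicator vectors must have the same length")
    return [a and b for a, b in zip(snp_a, snp_b)]


def snp_union(snp_a: List[bool], snp_b: List[bool]) -> List[bool]:
    """Combined indicator that is True when at least one SNP indicator is True."""
    if len(snp_a) != len(snp_b):
        raise ValueError("indicator vectors must have the same length")
    return [a or b for a, b in zip(snp_a, snp_b)]


# Toy indicators for four hypothetical individuals (not real genotype data).
carrier_a = [True, True, False, False]
carrier_b = [True, False, True, False]

print(snp_intersection(carrier_a, carrier_b))
print(snp_union(carrier_a, carrier_b))
```

A combined indicator of this kind could then be tested against case/control status in the same way a single SNP would be.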

3.
Genome wide association studies (GWAS) identify susceptibility loci for complex traits, but do not identify particular genes of interest. Integration of functional and network information may help in overcoming this limitation and identifying new susceptibility loci. Using GWAS and comorbidity data, we present a network-based approach to predict candidate genes for lipid and lipoprotein traits. We apply a prediction pipeline incorporating interactome, co-expression, and comorbidity data to Global Lipids Genetics Consortium (GLGC) GWAS for four traits of interest, identifying phenotypically coherent modules. These modules provide insights regarding gene involvement in complex phenotypes with multiple susceptibility alleles and low effect sizes. To experimentally test our predictions, we selected four candidate genes and genotyped representative SNPs in the Malmö Diet and Cancer Cardiovascular Cohort. We found significant associations with LDL-C and total-cholesterol levels for a synonymous SNP (rs234706) in the cystathionine beta-synthase (CBS) gene (p = 1 × 10⁻⁵ and adjusted-p = 0.013, respectively). Further, liver samples taken from 206 patients revealed that patients with the minor allele of rs234706 had significant dysregulation of CBS (p = 0.04). Despite the known biological role of CBS in lipid metabolism, SNPs within the locus have not yet been identified in GWAS of lipoprotein traits. Thus, the GWAS-based Comorbidity Module (GCM) approach identifies candidate genes missed by GWAS studies, serving as a broadly applicable tool for the investigation of other complex disease phenotypes.
Genome wide association studies (GWAS) meta-analyses have pinpointed a number of new gene regions contributing to multifactorial diseases. GWAS typically find limited numbers of loci that contribute modestly to complex phenotypes (1), and GLGC meta-analysis of GWAS data has reached the limit of what can be expected (2) without the use of alternative strategies.
Given that susceptibility loci for complex traits are unlikely to be randomly distributed in the genome (3), we might expect that the genes associated with a disease will be more likely to be present within the same pathways or functional groupings. In published cases, pathway based GWAS analysis provides an alternative approach to the dissection of complex disease traits (4, 5). In addition, nominal GWAS p values superimposed upon the human molecular network have been used to identify genes associated with multiple sclerosis (6), and the disease association protein–protein link evaluator (DAPPLE) has been used to find significant interactions among proteins encoded by genes in loci associated with other particular diseases (7). Other approaches incorporate heterogeneous molecular data such as linkage studies, cross species conservation measures, gene expression data and protein–protein interactions to better understand GWAS results (8, 9). Integrating molecular network information, pathway analyses, and GWAS data thus holds promise for identifying new susceptibility loci and improving the identification of relevant candidate genes.
If a gene is involved in a specific functional process or disease, its molecular network neighbors might also be suspected to have some role (3). In line with this “local” hypothesis, proteins involved in the same disease show a high propensity to interact (10) or cluster together (11) with each other. Interactions between variations in multiple genes, each with strong or modest effects, perturbing the same pathways or modules, may govern complex traits (3, 6). The molecular triangulation (MT) algorithm can be applied to rank seed genes according to their common disease associated neighbors, assigning closer and more connected neighbors higher values (12). Interactions between modestly associated MT genes may be indicative of coherent disease pathways or of genes conferring susceptibility to disease in a coordinated manner.
The jActiveModule method (13) combines seed gene scores with biologically relevant interactions to identify network modules where perturbations causative of disease are more likely to reside. Lastly, although not yet implemented at the module level, phenotypic coherence between interacting pairs of genes has been quantified using the combination of molecular level gene to disease relationships and Medicare comorbidity data (14, 15).
We believe that GWAS significant SNPs and variants representing potential candidate genes can use the above strategies to reveal more about the missing heritability of complex phenotypes. The most important risk factors for coronary artery disease (CAD) include serum concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG). We present a GWAS-based meta-analysis Comorbid Module (GCM) approach that uses significant (p < 5 × 10⁻⁸) GWAS signals for these four traits in the context of molecular networks to prioritize modules of disease-associated candidate genes. We evaluate our approach experimentally through allelic association and genotyping within the Malmö Diet and Cancer Cardiovascular Cohort (MDC-CC) for SNPs representing top candidate genes.

4.
RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes of gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from Human, Mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html.

5.
Welcome Bender, Maura Lucas. Genetics, 2013, 193(4): 1135-1147
The bithorax complex in Drosophila melanogaster includes three homeobox-containing genes, Ultrabithorax (Ubx), abdominal-A (abd-A), and Abdominal-B (Abd-B), which are required for the proper differentiation of the posterior 10 segments of the body. Each of these genes has multiple distinct regulatory regions; there is one for each segmental unit of the body plan where the genes are expressed. One additional protein-coding gene in the bithorax complex, Glut3, a sugar-transporter homolog, can be deleted without phenotype. We focus here on the upstream regulatory region for Ubx, the bithoraxoid (bxd) domain, and its border with the adjacent infraabdominal-2 (iab-2) domain, which controls abd-A. These two domains can be defined by the phenotypes of rearrangement breakpoints, and by the expression patterns of enhancer traps. In D. virilis, the homeotic cluster is split between Ubx and abd-A, and so the border can also be located by a sequence comparison between species. When the border region is deleted in D. melanogaster, the flies show a dominant phenotype called Front-ultraabdominal (Fub); the first abdominal segment is transformed into a copy of the second abdominal segment. Thus, the border blocks the spread of activation from the bxd domain into the iab-2 domain.

6.
Genome-wide association studies (GWAS) explore the genetic causes of complex diseases. However, classical approaches ignore the biological context of the genetic variants and genes under study. To address this shortcoming, one can use biological networks, which model functional relationships, to search for functionally related susceptibility loci. Many such network methods exist, each arising from different mathematical frameworks, pre-processing steps, and assumptions about the network properties of the susceptibility mechanism. Unsurprisingly, this results in disparate solutions. To explore how to exploit these heterogeneous approaches, we selected six network methods and applied them to GENESIS, a nationwide French study on familial breast cancer. First, we verified that network methods recovered more interpretable results than a standard GWAS. We addressed the heterogeneity of their solutions by studying their overlap, computing what we called the consensus. The key gene in this consensus solution was COPS5, a gene related to multiple cancer hallmarks. Another issue we observed was that network methods were unstable, selecting very different genes on different subsamples of GENESIS. Therefore, we proposed a stable consensus solution formed by the 68 genes most consistently selected across multiple subsamples. This solution was also enriched in genes known to be associated with breast cancer susceptibility (BLM, CASP8, CASP10, DNAJC1, FGFR2, MRPS30, and SLC4A7, P-value = 3 × 10⁻⁴). The most connected gene was CUL3, a regulator of several genes linked to cancer progression. Lastly, we evaluated the biases of each method and the impact of their parameters on the outcome. In general, network methods preferred highly connected genes, even after random rewirings that stripped the connections of any biological meaning. In conclusion, we present the advantages of network-guided GWAS, characterize their shortcomings, and provide strategies to address them.
Implementations of all six methods, which can be used to compute the consensus networks, are available at https://github.com/hclimente/gwas-tools.
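The idea of a stable consensus, keeping the genes that are selected most consistently across subsamples, can be sketched in a few lines. This is not the code from the gwas-tools repository: the function names, the frequency threshold, and the placeholder gene labels are all assumptions made for illustration, and the abstract does not state how "most consistently selected" was quantified.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(selections: Iterable[Iterable[str]]) -> Dict[str, float]:
    """Fraction of runs (method x subsample) in which each gene was selected."""
    runs = [set(run) for run in selections]  # ignore duplicates within a run
    counts = Counter(gene for run in runs for gene in run)
    return {gene: count / len(runs) for gene, count in counts.items()}


def stable_consensus(selections: Iterable[Iterable[str]],
                     min_frequency: float) -> List[str]:
    """Genes selected in at least `min_frequency` of the runs, sorted by name."""
    frequency = selection_frequency(selections)
    return sorted(g for g, f in frequency.items() if f >= min_frequency)


# Placeholder gene labels; each inner list is one hypothetical run's selection.
runs = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_D"],
]

print(selection_frequency(runs))
print(stable_consensus(runs, min_frequency=0.6))
```

A threshold on selection frequency is one simple choice; taking the top N genes by frequency would be an equally plausible reading of the abstract.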

7.
8.
Autism spectrum disorder (ASD) is one of the most prevalent and highly heritable neurodevelopmental disorders in humans. There is significant evidence that the onset and severity of ASD is governed in part by complex genetic mechanisms affecting the normal development of the brain. To date, a number of genes have been associated with ASD. However, the temporal and spatial co-expression of these genes in the brain remains unclear. To address this issue, we examined the co-expression network of 26 autism genes from AutDB (http://mindspec.org/autdb.html), in the framework of 3,041 genes whose expression energies have the highest correlation between the coronal and sagittal images from the Allen Mouse Brain Atlas database (http://mouse.brain-map.org). These data were derived from in situ hybridization experiments conducted on male, 56-day-old C57BL/6J mice co-registered to the Allen Reference Atlas, and were used to generate a normalized co-expression matrix indicating the cosine similarity between expression vectors of genes in this database. The network formed by the autism-associated genes showed a higher degree of co-expression connectivity than seen for the other genes in this dataset (Kolmogorov–Smirnov P = 5 × 10⁻²⁸). Using Monte Carlo simulations, we identified two cliques of co-expressed genes that were significantly enriched with autism genes (Bonferroni-corrected P < 0.05). Genes in both these cliques were significantly over-expressed in the cerebellar cortex (P = 1 × 10⁻⁵), suggesting possible implication of this brain region in autism. In conclusion, our study provides a detailed profiling of co-expression patterns of autism genes in the mouse brain, and suggests specific brain regions and new candidate genes that could be involved in autism etiology.
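The co-expression matrix described above is built from the cosine similarity between gene expression vectors. A minimal sketch of that computation follows; the function names and the toy expression values are illustrative assumptions, not data or code from the study, and returning 0.0 for an all-zero vector is a convention chosen here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length expression vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for all-zero vectors (an assumption)
    return dot / (norm_u * norm_v)


def coexpression_matrix(expression: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise cosine similarity between every pair of genes."""
    genes = sorted(expression)
    return {
        g: {h: cosine_similarity(expression[g], expression[h]) for h in genes}
        for g in genes
    }


# Toy expression values over three hypothetical brain regions.
toy_expression = {
    "gene_1": [1.0, 2.0, 0.0],
    "gene_2": [2.0, 4.0, 0.0],
    "gene_3": [0.0, 0.0, 5.0],
}

matrix = coexpression_matrix(toy_expression)
for gene, row in matrix.items():
    print(gene, {other: round(value, 3) for other, value in row.items()})
```

In the toy data, gene_1 and gene_2 are proportional and therefore have similarity close to 1, while gene_3 shares no expressed region with them and has similarity 0.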

9.
Genome-wide association studies (GWAS) and candidate gene studies have identified the REL and PRKCQ genes as risk loci for various autoimmune diseases. The purpose of the present study was to investigate the association of the REL and PRKCQ genes with Behcet’s disease (BD) in a Chinese Han population. A case-control study was conducted on three single nucleotide polymorphisms (SNPs) of the REL gene (rs13031237, rs702873, and rs842647) and three SNPs of the PRKCQ gene (rs4750316, rs11258747, and rs947474) using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) in a total of 623 BD patients and 1,074 healthy controls. Multiple variables were assessed, including age, sex distribution, and extra-ocular findings. In the present study, the frequencies of rs842647 GG genotypes and rs842647 G alleles were significantly higher in patients than in controls and those of the rs842647 AG genotypes were lower in patients than in controls [GG genotype: Bonferroni corrected P-value for gender adjustment (Pca) = 0.0074, odds ratio (OR) = 1.63; G allele: Pca = 0.0072, OR = 1.57; AG genotype: Pca = 0.024, OR = 0.63, respectively]. No statistically significant differences in the frequencies of rs702873, rs13031237, rs4750316, rs11258747, and rs947474 between BD patients and controls were observed. Stratification analysis indicated that the REL rs842647 polymorphism was associated with BD patients with skin lesions. No significant association was found between the other five SNPs and BD patients with other extra-ocular findings, including genital ulcer, arthritis, and positive pathergy test results. The REL rs842647 polymorphism may be a susceptibility factor for BD pathogenesis and skin lesions, which indicates that c-Rel may be involved in the pathogenesis and skin lesions of BD through the NF-κB pathway.

10.
11.
12.
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.

13.
Nearly four decades ago, Roose & Gottlieb (Roose & Gottlieb 1976 Evolution 30, 818–830. (doi:10.2307/2407821)) showed that the recently derived allotetraploids Tragopogon mirus and T. miscellus combined the allozyme profiles of their diploid parents (T. dubius and T. porrifolius, and T. dubius and T. pratensis, respectively). This classic paper addressed the link between genotype and biochemical phenotype and documented enzyme additivity in allopolyploids. Perhaps more important than their model of additivity, however, was their demonstration of novelty at the biochemical level. Enzyme multiplicity (the production of novel enzyme forms in the allopolyploids) can provide an extensive array of polymorphism for a polyploid individual and may explain, for example, the expanded ranges of polyploids relative to their diploid progenitors. In this paper, we extend the concept of evolutionary novelty in allopolyploids to a range of genetic and ecological features. We observe that the dynamic nature of polyploid genomes, with alterations in gene content, gene number, gene arrangement, gene expression and transposon activity, may generate sufficient novelty that every individual in a polyploid population or species may be unique. Whereas certain combinations of these features will undoubtedly be maladaptive, some unique combinations of newly generated variation may provide tremendous evolutionary potential and adaptive capabilities.

14.
Rheumatoid arthritis (RA), osteoarthritis (OA), and periodontal disease (PD) are chronic inflammatory diseases that are globally prevalent and pose a public health concern. The search for a potential mechanism linking PD to RA and OA continues, as it could play a significant role in disease prevention and treatment. Recent studies have linked RA, OA, and PD to Porphyromonas gingivalis (PG), a periodontal bacterium, through a similar dysregulation in an inflammatory mechanism. This study aimed to identify potential gene signatures that could assist in early diagnosis as well as gain insight into the molecular mechanisms of these diseases. The expression data sets with the series IDs GSE97779, GSE123492, and GSE24897 for macrophages of RA, OA synovium, and PG-stimulated macrophages (PG-SM), respectively, were retrieved and screened for differentially expressed genes (DEGs). The 72 common DEGs among RA, OA, and PG-SM were further subjected to gene–gene correlation analysis. A GeneMANIA interaction network of the 47 highly correlated DEGs comprises 53 nodes and 271 edges. Network centrality analysis identified 15 hub genes, 6 of which are DEGs (API5, ATE1, CCNG1, EHD1, RIN2, and STK39). Additionally, two significantly up-regulated non-hub genes (IER3 and RGS16) showed interactions with hub genes. Functional enrichment analysis of the genes showed that “apoptotic regulation” and “inflammasomes” were among the major pathways. These eight genes can serve as important signatures/targets, and provide new insights into the molecular mechanism of PG-induced RA, OA, and PD.
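Hub identification by network centrality, as mentioned in the abstract above, can be illustrated with the simplest centrality measure. The abstract does not say which measure was used, so degree centrality is an assumption here, as are the function names and the toy edge list (the node labels are placeholders, not genes from the study).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Degree of each node divided by (number of nodes - 1), undirected graph."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


def top_hubs(edges: Iterable[Tuple[str, str]], k: int) -> List[str]:
    """The k nodes with highest degree centrality; ties broken by name."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))[:k]


# Toy interaction network with placeholder node labels.
toy_edges = [("n1", "n2"), ("n1", "n3"), ("n1", "n4"), ("n2", "n3")]

print(degree_centrality(toy_edges))
print(top_hubs(toy_edges, k=2))
```

Other measures such as betweenness or closeness centrality would rank nodes differently, so the choice of measure affects which genes are called hubs.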

15.
Optimal health is maintained by interaction of multiple intrinsic and environmental factors at different levels of complexity—from molecular, to physiological, to social. Understanding and quantification of these interactions will aid design of successful health interventions. We introduce the reference network concept as a platform for multi-level exploration of biological relations relevant for metabolic health, by integration and mining of biological interactions derived from public resources and context-specific experimental data. A White Adipose Tissue Health Reference Network (WATRefNet) was constructed as a resource for discovery and prioritization of mechanism-based biomarkers for white adipose tissue (WAT) health status and the effect of food and drug compounds on WAT health status. The WATRefNet (6,797 nodes and 32,171 edges) is based on (1) experimental data obtained from 10 studies addressing different adiposity states, (2) seven public knowledge bases of molecular interactions, (3) expert’s definitions of five physiologically relevant processes key to WAT health, namely WAT expandability, Oxidative capacity, Metabolic state, Oxidative stress and Tissue inflammation, and (4) a collection of relevant biomarkers of these processes identified by BIOCLAIMS (http://bioclaims.uib.es). The WATRefNet comprehends multiple layers of biological complexity as it contains various types of nodes and edges that represent different biological levels and interactions. We have validated the reference network by showing overrepresentation with anti-obesity drug targets, pathology-associated genes and differentially expressed genes from an external disease model dataset. The resulting network has been used to extract subnetworks specific to the above-mentioned expert-defined physiological processes. 
Each of these process-specific signatures represents a mechanistically supported composite biomarker for assessing and quantifying the effect of interventions on a physiological aspect that determines WAT health status. Following this principle, five anti-diabetic drug interventions and one diet intervention were scored for the match of their expression signature to the five biomarker signatures derived from the WATRefNet. This confirmed previous observations of successful intervention by dietary lifestyle and revealed WAT-specific effects of drug interventions. The WATRefNet represents a sustainable knowledge resource for extraction of relevant relationships such as mechanisms of action, nutrient intervention targets and biomarkers and for assessment of health effects for support of health claims made on food products.

Electronic supplementary material

The online version of this article (doi:10.1007/s12263-014-0439-x) contains supplementary material, which is available to authorized users.

16.
17.

Background

Gene expression genetic studies in human tissues and cells identify cis- and trans-acting expression quantitative trait loci (eQTLs). These eQTLs provide insights into regulatory mechanisms underlying disease risk. However, few studies have systematically characterized eQTL results across cell and tissue types. We synthesized eQTL results from >50 datasets, including new primary data from human brain, peripheral plaque and kidney samples, in order to discover features of human eQTLs.

Results

We find a substantial number of robust cis-eQTLs and far fewer trans-eQTLs consistent across tissues. Analysis of 45 full human GWAS scans indicates eQTLs are enriched overall, and above nSNPs, among positive statistical signals in genetic mapping studies, and account for a significant fraction of the strongest human trait effects. Expression QTLs are enriched for gene centricity, higher population allele frequencies, in housekeeping genes, and for coincidence with regulatory features, though there is little evidence of 5′ or 3′ positional bias. Several regulatory categories are not enriched, including microRNAs and their predicted binding sites and long, intergenic non-coding RNAs. Among the most tissue-ubiquitous cis-eQTLs, there is enrichment for genes involved in xenobiotic metabolism and mitochondrial function, suggesting these eQTLs may have adaptive origins. Several strong eQTLs (CDK5RAP2, NBPFs) coincide with regions of reported human lineage selection. The intersection of new kidney and plaque eQTLs with related GWAS suggests possible gene prioritization. For example, butyrophilins are now linked to arterial pathogenesis via multiple genetic and expression studies. Expression QTL and GWAS results are made available as a community resource through the NHLBI GRASP database [http://apps.nhlbi.nih.gov/grasp/].

Conclusions

Expression QTLs inform the interpretation of human trait variability, and may account for a greater fraction of phenotypic variability than protein-coding variants. The synthesis of available tissue eQTL data highlights many strong cis-eQTLs that may have important biologic roles and could serve as positive controls in future studies. Our results indicate some strong tissue-ubiquitous eQTLs may have adaptive origins in humans. Efforts to expand the genetic, splicing and tissue coverage of known eQTLs will provide further insights into human gene regulation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-532) contains supplementary material, which is available to authorized users.

18.
Since its identification in 1983, HIV-1 has been the focus of a research effort unprecedented in scope and difficulty, whose ultimate goals (a cure and a vaccine) remain elusive. One of the fundamental challenges in accomplishing these goals is the tremendous genetic variability of the virus, with some genes differing at as many as 40% of nucleotide positions among circulating strains. Because of this, the genetic bases of many viral phenotypes, most notably the susceptibility to neutralization by a particular antibody, are difficult to identify computationally. Drawing upon open-source general-purpose machine learning algorithms and libraries, we have developed a software package IDEPI (IDentify EPItopes) for learning genotype-to-phenotype predictive models from sequences with known phenotypes. IDEPI can apply learned models to classify sequences of unknown phenotypes, and also identify specific sequence features which contribute to a particular phenotype. We demonstrate that IDEPI achieves performance similar to or better than that of previously published approaches on four well-studied problems: finding the epitopes of broadly neutralizing antibodies (bNab), determining coreceptor tropism of the virus, identifying compartment-specific genetic signatures of the virus, and deducing drug-resistance associated mutations. The cross-platform Python source code (released under the GPL 3.0 license), documentation, issue tracking, and a pre-configured virtual machine for IDEPI can be found at https://github.com/veg/idepi.
This is a PLOS Computational Biology Software Article

19.
20.