Similar literature
 Found 20 similar articles (search time: 265 ms)
1.
W. Xu  S. Li  Z. Zhang  J. Hu  Y. Zhao 《Animal genetics》2019,50(6):726-732
Differentially expressed gene (DEG) analysis is a major approach for interpreting phenotype differences and produces a large number of candidate genes. Given that it is burdensome to validate too many genes through benchwork, an urgent need exists for DEG prioritization. Here, a novel method is proposed for prioritizing bona fide DEGs by constructing the normal range of gene expression through integrating public expression data. Prioritization was performed by ranking the differences in cumulative probability for genes in case and control groups. DEGs from a study on pig muscle tissue were used to evaluate the prioritization accuracy. The results showed that the method reached an area under the receiver operating characteristic curve of 96.42% and can effectively shorten the list of candidate genes from a differential expression experiment to find novel causal genes. Our method can be easily extended to other tissues or species to promote functional research in broad applications.
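As a rough illustration of the ranking step described in this abstract, the sketch below builds an empirical cumulative distribution from reference expression values for each gene and ranks genes by how far apart the case and control groups sit on it. The function names, the use of an empirical CDF, the group mean as a summary, and the toy values are all assumptions made for illustration; the abstract does not specify how the authors construct the normal range.

```python
from bisect import bisect_right


def empirical_cdf(reference):
    """Return a function giving the fraction of reference values <= x."""
    ordered = sorted(reference)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def mean(values):
    return sum(values) / len(values)


def prioritize(reference, case, control):
    """Rank genes by |mean CDF(case) - mean CDF(control)|, largest first.

    Each argument maps a gene name to a list of expression values.
    The reference values stand in for the "normal range" (an assumption).
    """
    scores = {}
    for gene, ref_values in reference.items():
        cdf = empirical_cdf(ref_values)
        case_p = mean([cdf(x) for x in case[gene]])
        control_p = mean([cdf(x) for x in control[gene]])
        scores[gene] = abs(case_p - control_p)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: geneA shifts strongly between groups, geneB barely.
reference = {
    "geneA": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "geneB": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
case = {"geneA": [9, 10], "geneB": [5, 6]}
control = {"geneA": [1, 2], "geneB": [5, 5]}
ranking = prioritize(reference, case, control)
```

In this toy setup geneA ranks ahead of geneB, because its case and control values fall at opposite ends of the reference distribution.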

2.
We apply a novel gene expression network analysis to a cohort of 182 recently reported candidate Epileptic Encephalopathy genes to identify those most likely to be true Epileptic Encephalopathy genes. These candidate genes were identified as having single variants of likely pathogenic significance discovered in a large-scale massively parallel sequencing study. Candidate Epileptic Encephalopathy genes were prioritized according to their co-expression with 29 known Epileptic Encephalopathy genes. We utilized developing brain and adult brain gene expression data from the Allen Human Brain Atlas (AHBA) and compared this to data from Celsius: a large, heterogeneous gene expression data warehouse. We show replicable prioritization results using these three independent gene expression resources, two of which are brain-specific, with small sample size, and the third derived from a heterogeneous collection of tissues with large sample size. Of the nineteen genes that we predicted with the highest likelihood to be true Epileptic Encephalopathy genes, two (GNAO1 and GRIN2B) have recently been independently reported and confirmed. We compare our results to those produced by an established in silico prioritization approach called Endeavour, and finally present gene expression networks for the known and candidate Epileptic Encephalopathy genes. This highlights sub-networks of gene expression, particularly in the network derived from the adult AHBA gene expression dataset. These networks give clues to the likely biological interactions between Epileptic Encephalopathy genes, potentially highlighting underlying mechanisms and avenues for therapeutic targets.
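The idea of prioritizing candidates by their co-expression with known disease genes can be sketched as follows. This is a minimal illustration, assuming that co-expression is measured by Pearson correlation and summarized as the mean correlation with the seed genes; the abstract does not state the exact scoring used, and the gene labels and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_coexpression(expression, seeds, candidates):
    """Order candidates by mean correlation with the seed genes, highest first."""
    scores = {
        c: sum(pearson(expression[c], expression[s]) for s in seeds) / len(seeds)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Hypothetical profiles across five samples (not real data).
expression = {
    "SEED1": [1, 2, 3, 4, 5],
    "SEED2": [2, 4, 6, 8, 11],
    "CAND_A": [2, 3, 4, 5, 7],   # tracks the seeds closely
    "CAND_B": [5, 1, 4, 2, 3],   # largely unrelated to the seeds
}
ranked = rank_by_coexpression(expression, ["SEED1", "SEED2"], ["CAND_B", "CAND_A"])
```

Repeating the same ranking on independent expression resources, as the abstract describes, would then amount to calling `rank_by_coexpression` once per dataset and comparing the resulting orders.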

3.
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cures for many diseases.
This article is part of the "Translational Bioinformatics" collection for PLOS Computational Biology.

What to Learn in This Chapter

  • Identification of specific disease genes is complicated by gene pleiotropy, polygenic nature of many diseases, varied influence of environmental factors, and overlying genome variation.
  • Gene prioritization is the process of assigning likelihood of gene involvement in generating a disease phenotype. This approach narrows down, and arranges in the order of likelihood in disease involvement, the set of genes to be tested experimentally.
  • The gene "priority" in disease is assigned by considering a set of relevant features such as gene expression and function, pathway involvement, and mutation effects.
  • In general, disease genes tend to 1) interact with other disease genes, 2) harbor functionally deleterious mutations, 3) code for proteins localizing to the affected biological compartment (pathway, cellular space, or tissue), 4) have distinct sequence properties such as longer length and a higher number of exons, 5) have more orthologues and fewer paralogues.
  • Data sources (directly experimental, extracted from knowledge-bases, or text-mining based) and mathematical/computational models used for gene prioritization vary widely.

4.
5.
6.
Complex diseases such as obesity, type II diabetes and chronic obstructive pulmonary disease (COPD), as metabolic disorder-related diseases, are a major concern for worldwide public health in the 21st century. The identification of risk genes for these diseases has attracted increasing interest in computational systems biology. In this paper, a novel method was proposed to prioritize disease risk genes (PDRG) by integrating functional annotations, protein interactions and gene expression information to assess similarity between genes in a disease-related metabolic network. The method was successfully applied to obesity and COPD, where it outperformed ToppGene and ToppNet in both literature validation and recall rate under leave-one-out cross-validation (LOOCV). Our method could be applied broadly to other metabolism-related diseases, helping to prioritize novel disease risk genes, and could shed light on diagnosis and effective therapies.
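The LOOCV recall mentioned in this abstract can be illustrated with a small sketch: each known disease gene is hidden in turn, all genes in the pool are scored against the remaining known genes, and the hidden gene counts as recovered if it lands in the top k. The similarity used here (Jaccard overlap of annotation sets) is only a stand-in chosen for brevity; the method in the abstract also integrates protein interactions and expression data, and every name and annotation below is hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(gene, seeds, annotations):
    """Mean annotation similarity between a gene and the seed genes."""
    return sum(jaccard(annotations[gene], annotations[s]) for s in seeds) / len(seeds)


def loocv_recall(known, background, annotations, top_k):
    """Fraction of known genes recovered in the top_k when each is held out."""
    hits = 0
    for held_out in known:
        seeds = [g for g in known if g != held_out]
        pool = [held_out] + list(background)
        # Note: sorted() is stable, so an exact tie favors the held-out gene.
        ranked = sorted(pool, key=lambda g: score(g, seeds, annotations), reverse=True)
        if held_out in ranked[:top_k]:
            hits += 1
    return hits / len(known)


# Hypothetical annotation sets for three known genes and two background genes.
annotations = {
    "D1": {"lipid", "insulin"},
    "D2": {"lipid", "insulin", "inflammation"},
    "D3": {"insulin", "inflammation"},
    "B1": {"ribosome"},
    "B2": {"lipid", "ribosome"},
}
recall = loocv_recall(["D1", "D2", "D3"], ["B1", "B2"], annotations, top_k=1)
```

With these toy annotations every held-out gene outranks the background genes, so the recall at k = 1 is 1.0; real data would rarely be this clean.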

7.
MOTIVATION: Recently, a new type of expression data is being collected which aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyses of these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes which appear to be significantly affected by genetic variation. RESULTS: When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY: The programs are available by request from the authors.

8.

Background

In the post genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET).

Results

Celsius, the largest co-normalized microarray dataset of Affymetrix based gene expression, was used to calculate the correlation between all possible gene pairs on all platforms, and generate stored indexes in a web searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Using a small seed list of known cartilage-selective genes, UGET extended the list of known genes by identifying 32 new highly cartilage-selective genes. Of these, 7 of 10 tested were validated by qPCR including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested UGET and other gene expression based prioritization tools to identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2) and then compared UGET to other gene expression based prioritization programs which use small but discrete and well annotated datasets. Finally, we observed a significantly higher gene correlation shared between genes in disease networks associated with similar complex or Mendelian disorders.

Discussion

UGET is an invaluable resource for a geneticist that permits the rapid inclusion of expression criteria from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays UGET annotates and prioritizes genes better than other tools especially with rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritization of candidate genes for sequence analysis.

9.
Many cell activities are organized as a network, and genes are clustered into co-expressed groups if they have the same or closely related biological function or they are co-regulated. In this study, based on the assumption that a strong candidate disease gene is more likely to be close to gene groups whose members are all coordinately differentially expressed than to individual differentially expressed genes, we developed a novel disease gene prioritization method, GroupRank, by integrating gene co-expression and differential expression information generated from microarray data with a PPI network. A candidate gene is ranked high by GroupRank if it is differentially expressed between disease and control or is close to differentially co-expressed groups in the PPI network. We tested our method on data sets of lung, kidney, leukemia and breast cancer. The results revealed that GroupRank could efficiently prioritize disease genes, with a significantly improved AUC value in comparison to the previous method, which does not consider co-expressed gene groups in the PPI network. Moreover, the functional analyses of the major contributing gene group in gene prioritization of kidney cancer verified that our algorithm GroupRank not only ranks disease genes efficiently but also could help us identify and understand possible mechanisms in important physiological and pathological processes of disease.

10.
What are the commonalities between genes whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations, and for inferring knowledge about as yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this led to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
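To show why a network built on partial correlation differs from one built on raw correlation, the sketch below computes the first-order partial correlation of two genes given a third. This is a deliberately small case: a full network would typically condition each pair on all other genes, and the abstract does not say which estimator the authors used. The profiles `x`, `y` and `z` are constructed toy values in which `x` and `y` are related only through `z`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy profiles: x and y are each z plus an independent perturbation.
z = [-5, -3, -1, 1, 3, 5]
e1 = [1, -2, 1, 1, -2, 1]    # uncorrelated with z and with e2 by construction
e2 = [1, 0, -1, -1, 0, 1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

raw = pearson(x, y)                      # high: both follow z
partial = partial_correlation(x, y, z)   # near zero: no direct link
```

In a raw-correlation network `x` and `y` would be joined by an edge, while the partial correlation is essentially zero, so a partial-correlation network would keep only the edges to `z`.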

11.
Two of the major challenges in functional genomics are to identify genes that play a key role in biological processes, and to elucidate the biological role of the large numbers of genes whose function is poorly characterized or still completely unknown. In this study, a combination of large-scale expressed sequence tag sequencing, high-throughput gene silencing and visual phenotyping was used to identify genes in which partial inhibition of expression leads to marked phenotypic changes, mostly on leaves. Three normalized tobacco (Nicotiana tabacum) cDNA libraries were prepared directly in a binary vector using different tissues of tobacco as an RNA source, randomly sequenced and clustered. The Agrobacterium-tobacco leaf disc transformation system was used to generate sets of antisense or co-suppression transgenic tobacco plants for over 20 000 randomly chosen clones, each representing an independent cluster. After transfer to the glasshouse, transgenic plants were scored visually after 10-14 days for changes in growth, leaf form and chlorosis or necrosis. Putative hits were validated by repeating the transformation. This procedure is more stringent than the analysis of knockout mutants, because it requires that even a partial decrease in expression generates a phenotype. This procedure identified 88 validated gene/phenotype relations. These included several previously characterized gene/phenotype relationships, demonstrating the validity of the approach. For about one-third, a function could be inferred, but a loss-of-function phenotype had not been described previously. Strikingly, almost one-half of the validated genes were poorly annotated, or had no known function. For 77 of these tobacco sequences, a single or small number of potential orthologues were identified in Arabidopsis. The genes for which orthologues were identified in Arabidopsis included about one-half of the genes whose function was completely unknown. 
Comparison with published gene/phenotype relations for Arabidopsis knockout mutants revealed surprisingly little overlap with the present study. Our results indicate that partial gene silencing identifies novel gene/phenotype relationships, which are distinct from those uncovered by knockout screens. They also show that it is possible to perform these analyses in a crop species in which full genome sequence information is lacking, and subsequently to transfer the information to a reference species in which functional studies can be performed more effectively.

12.
13.
Understanding the organization and evolution of social complexity is a major task because it requires building an understanding of mechanisms operating at different levels of biological organization from genes to social interactions. I discuss here a unique forward genetic approach spanning more than 30 years beginning with human-assisted colony-level selection for a single social trait, the amount of pollen honey bees (Apis mellifera L.) store. The goal was to understand a complex social trait from the social phenotype to genes responsible for observed trait variation. The approach combined the results of colony-level selection with detailed studies of individual behavior and physiology resulting in a mapped, integrated phenotypic architecture composed of correlative relationships between traits spanning anatomy, physiology, sensory response systems, and individual behavior that affect individual foraging decisions. Colony-level selection reverse engineered the architecture of an integrated phenotype of individuals resulting in changes in the social trait. Quantitative trait locus (QTL) studies combined with an exceptionally high recombination rate (60 kb/cM), and a phenotypic map, provided a genotype–phenotype map of high complexity demonstrating broad QTL pleiotropy, epistasis, and epistatic pleiotropy suggesting that gene pleiotropy or tight linkage of genes within QTL integrated the phenotype. Gene expression and knockdown of identified positional candidates revealed genes affecting foraging behavior and confirmed one pleiotropic gene, a tyramine receptor, as a target for colony-level selection that was under selection in two different tissues in two different life stages. The approach presented here has resulted in a comprehensive understanding of the structure and evolution of honey bee social organization.

14.
15.
Similarities between speech and birdsong make songbirds advantageous for investigating the neurogenetics of learned vocal communication, a complex phenotype probably supported by ensembles of interacting genes in cortico-basal ganglia pathways of both species. To date, only FoxP2 has been identified as critical to both speech and birdsong. We performed weighted gene coexpression network analysis on microarray data from singing zebra finches to discover gene ensembles regulated during vocal behavior. We found ~2,000 singing-regulated genes comprising three coexpression groups unique to area X, the basal ganglia subregion dedicated to learned vocalizations. These contained known targets of human FOXP2 and potential avian targets. We validated biological pathways not previously implicated in vocalization. Higher-order gene coexpression patterns, rather than expression levels, molecularly distinguish area X from the ventral striato-pallidum during singing. The previously unknown structure of singing-driven networks enables prioritization of molecular interactors that probably bear on human motor disorders, especially those affecting speech.

16.
Increasing evidence indicates that Parkinson's disease (PD) and type 2 diabetes (T2DM) share dysregulated molecular networks. We identified 84 genes shared between PD and T2DM from curated disease-gene databases. Nitric oxide biosynthesis, lipid and carbohydrate metabolism, insulin secretion and inflammation were identified as common dysregulated pathways. A network prioritization approach was implemented to rank genes according to their distance to seed genes and their involvement in common biological pathways. Quantitative polymerase chain reaction assays revealed that a highly ranked gene, superoxide dismutase 2 (SOD2), is upregulated in PD patients compared to healthy controls in 192 whole blood samples from two independent clinical trials, the Harvard Biomarker Study (HBS) and the Diagnostic and Prognostic Biomarkers in Parkinson's disease (PROBE). The results from this study reinforce the idea that shared molecular networks between PD and T2DM provide an additional source of biologically meaningful biomarkers. Evaluation of this biomarker in de novo PD patients and in a larger prospective longitudinal study is warranted.
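Ranking genes by their network distance to seed genes can be sketched with a breadth-first search over an interaction graph. The code below covers only the distance component and uses the mean shortest-path length to the seeds as the score, which is an assumption; the pathway-involvement component mentioned in the abstract is not modeled, and the graph and node names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_by_seed_distance(graph, seeds):
    """Order non-seed genes by mean distance to the seeds, closest first."""
    per_seed = [bfs_distances(graph, s) for s in seeds]
    scores = {}
    for gene in graph:
        if gene in seeds:
            continue
        # Genes unreachable from a seed get an infinite distance to it.
        d = [dist.get(gene, float("inf")) for dist in per_seed]
        scores[gene] = sum(d) / len(d)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores, key=lambda g: (scores[g], g))


# Hypothetical undirected interaction graph: S1 - A - S2 - B - C
graph = {
    "S1": ["A"],
    "A": ["S1", "S2"],
    "S2": ["A", "B"],
    "B": ["S2", "C"],
    "C": ["B"],
}
ranking = rank_by_seed_distance(graph, ["S1", "S2"])
```

Here gene A, which sits between the two seeds, ranks first, followed by B and then C as the mean distance to the seeds grows.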

17.
Complex biological systems usually pose a trade-off between robustness and fragility where a small number of perturbations can substantially disrupt the system. Although biological systems are robust against changes in many external and internal conditions, even a single mutation can perturb the system substantially, giving rise to a pathophenotype. Recent advances in identifying and analyzing the sequential variations beneath human disorders help to comprehend a systemic view of the mechanisms underlying various disease phenotypes. Network-based disease-gene prioritization methods rank the relevance of genes in a disease under the hypothesis that genes whose proteins interact with each other tend to exhibit similar phenotypes. In this study, we have tested the robustness of several network-based disease-gene prioritization methods with respect to the perturbations of the system using various disease phenotypes from the Online Mendelian Inheritance in Man database. These perturbations have been introduced either in the protein-protein interaction network or in the set of known disease-gene associations. As the network-based disease-gene prioritization methods are based on the connectivity between known disease-gene associations, we have further used these methods to categorize the pathophenotypes with respect to the recoverability of hidden disease-genes. Our results have suggested that, in general, disease-genes are connected through multiple paths in the human interactome. Moreover, even when these paths are disturbed, network-based prioritization can reveal hidden disease-gene associations in some pathophenotypes such as breast cancer, cardiomyopathy, diabetes, leukemia, Parkinson disease and obesity to a greater extent compared to the rest of the pathophenotypes tested in this study. Gene Ontology (GO) analysis highlighted the role of functional diversity for such diseases.

18.
19.
Gene co-expression, in many cases, implies the presence of a functional linkage between genes. Co-expression analysis has uncovered gene regulatory mechanisms in model organisms such as Escherichia coli and yeast. Recently, accumulation of Arabidopsis microarray data has facilitated a genome-wide inspection of gene co-expression profiles in this model plant. An approach using network analysis has provided an intuitive way to represent complex co-expression patterns between many genes. Co-expression network analysis has enabled us to extract modules, or groups of tightly co-expressed genes, associated with biological processes. Furthermore, integrated analysis of gene expression and metabolite accumulation has allowed us to hypothesize the functions of genes associated with specific metabolic processes. Co-expression network analysis is a powerful approach for data-driven hypothesis construction and gene prioritization, and provides novel insights into the system-level understanding of plant cellular processes.

20.
Candidate gene identification is typically labour intensive, involving laboratory experiments required to corroborate or disprove any hypothesis for a nominated candidate gene being considered the causative gene. The traditional approach to reduce the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization establishes the ranking of candidate genes based on their relevance to the biological process of interest, from which the most promising genes can be selected for further analysis. To date, many computational methods have focused on the prediction of candidate genes by analysis of their inherent sequence characteristics and similarity with respect to known disease genes, as well as their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed. A large number of them are web-based tools, while others are standalone applications that install and run locally. This review attempts to take a close look at gene prioritization criteria, as well as candidate gene prioritization algorithms, and thus provide a comprehensive synopsis of the subject matter.
