Similar articles
 20 similar articles found; search took 62 ms
1.

Background

In the post-genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET).

Results

Celsius, the largest co-normalized microarray dataset of Affymetrix based gene expression, was used to calculate the correlation between all possible gene pairs on all platforms, and generate stored indexes in a web searchable format. The size of Celsius makes UGET a powerful gene characterization tool. Using a small seed list of known cartilage-selective genes, UGET extended the list of known genes by identifying 32 new highly cartilage-selective genes. Of these, 7 of 10 tested were validated by qPCR including the novel cartilage-specific genes SDK2 and FLJ41170. In addition, we retrospectively tested UGET and other gene expression based prioritization tools to identify disease-causing genes within known linkage intervals. We first demonstrated this utility with UGET using genetically heterogeneous disorders such as Joubert syndrome, microcephaly, neuropsychiatric disorders and type 2 limb girdle muscular dystrophy (LGMD2) and then compared UGET to other gene expression based prioritization programs which use small but discrete and well annotated datasets. Finally, we observed a significantly higher gene correlation shared between genes in disease networks associated with similar complex or Mendelian disorders.
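The co-expression idea described above can be illustrated with a minimal sketch. It is not the UGET implementation: the gene names, expression values, and the choice to score candidates by their mean Pearson correlation with the seed genes are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)

def rank_by_seed_correlation(expression, seeds):
    """Rank non-seed genes by mean correlation with the seed genes (hypothetical scoring)."""
    scores = {}
    for gene, profile in expression.items():
        if gene in seeds:
            continue
        scores[gene] = sum(pearson(profile, expression[s]) for s in seeds) / len(seeds)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up expression values across four arrays; names are placeholders.
expression = {
    "SEED_A": [1.0, 2.0, 3.0, 4.0],
    "SEED_B": [2.0, 4.1, 5.9, 8.0],
    "GENE_X": [1.1, 1.9, 3.2, 3.9],
    "GENE_Y": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_by_seed_correlation(expression, ["SEED_A", "SEED_B"])
```

In this toy input, GENE_X tracks both seeds and ranks above GENE_Y, which is anti-correlated with them.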

Discussion

UGET is an invaluable resource for geneticists, permitting the rapid inclusion of expression criteria from one to hundreds of genes in genomic intervals linked to disease. By using thousands of arrays, UGET annotates and prioritizes genes better than other tools, especially for rare tissue disorders or complex multi-tissue biological processes. This information can be critical in prioritizing candidate genes for sequence analysis.

2.
Genome-wide experimental methods to identify disease genes, such as linkage analysis and association studies, generate increasingly large candidate gene sets for which comprehensive empirical analysis is impractical. Computational methods employ data from a variety of sources to identify the most likely candidate disease genes from these gene sets. Here, we review seven independent computational disease gene prioritization methods, and then apply them in concert to the analysis of 9556 positional candidate genes for type 2 diabetes (T2D) and the related trait obesity. We generate and analyse a list of nine primary candidate genes for T2D and five for obesity. Two genes, LPL and BCKDHA, are common to these two sets. We also present a set of secondary candidates for T2D (94 genes) and for obesity (116 genes) with 58 genes in common to both diseases.

3.
Kao CF  Fang YS  Zhao Z  Kuo PH 《PloS one》2011,6(4):e18696

Background

Large-scale and individual genetic studies have suggested numerous susceptibility genes for depression in the past decade without conclusive results. There is a strong need to review and integrate multi-dimensional data for follow-up validation. The present study aimed to apply prioritization procedures to build an evidence-based candidate gene dataset for depression.

Methods

Depression candidate genes were collected from human and animal studies across various data resources. Each gene was scored according to its magnitude of evidence related to depression, and the score was multiplied by a source-specific weight to form a combined score measure. All genes were evaluated through a prioritization system to obtain an optimal weight matrix to rank their relative importance to depression using the combined scores. The resulting candidate gene list for depression (DEPgenes) was further evaluated using a genome-wide association (GWA) dataset and microarray gene expression in human tissues.
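The weighted combined score described above can be sketched as follows. This is an illustration only: the source names, evidence scores, and weights are invented, and the weights are fixed by hand rather than obtained from the optimization the authors describe.

```python
def combined_scores(evidence, weights):
    """Sum each gene's evidence score times its source-specific weight."""
    totals = {}
    for source, gene_scores in evidence.items():
        weight = weights[source]
        for gene, score in gene_scores.items():
            totals[gene] = totals.get(gene, 0.0) + weight * score
    return totals

# Hypothetical evidence scores per data source (not taken from the study).
evidence = {
    "association": {"G1": 3, "G2": 1},
    "linkage": {"G1": 1, "G3": 2},
    "animal": {"G2": 2},
}
# Hypothetical source weights; the study derives an optimal weight matrix instead.
weights = {"association": 2.0, "linkage": 1.0, "animal": 0.5}

scores = combined_scores(evidence, weights)
prioritized = sorted(scores, key=scores.get, reverse=True)
```

A gene supported by several heavily weighted sources (G1 here) rises to the top of the list.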

Results

A total of 5,055 candidate genes (4,850 genes from human and 387 genes from animal studies, with 182 overlapping) were included from seven data sources. Through the prioritization procedures, we identified 169 DEPgenes, which were highly likely to be associated with depression in the GWA dataset (Wilcoxon rank-sum test, p = 0.00005). Additionally, a higher percentage of DEPgenes than non-DEPgenes were expressed in human brain or nerve-related tissues, supporting the neurotransmitter and neuroplasticity theories of depression.

Conclusions

With comprehensive data collection and curation and the application of an integrative approach, we successfully generated DEPgenes through an effective gene prioritization system. The prioritized DEPgenes are promising for future biological experiments or replication efforts to discover the underlying molecular mechanisms of depression.

4.
Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.

5.
Epileptic Encephalopathy (EE) is a heterogeneous condition in which cognitive, sensory and/or motor functions deteriorate as a consequence of epileptic activity, which consists of frequent seizures and/or major interictal paroxysmal activity. There are various causes of EE and they may occur at any age in early childhood. Genetic mutations have been found to contribute to an increasing number of cases of early-onset EE that were previously considered cryptogenic. We identified 26 patients with Infantile Epileptic Encephalopathy (IEE) of unknown etiology despite extensive workup and without any specific epilepsy syndromic phenotypes. We performed genetic analysis on a panel of 7 genes (ARX, CDKL5, KCNQ2, PCDH19, SCN1A, SCN2A, STXBP1) and identified 10 point mutations [ARX (1), CDKL5 (3), KCNQ2 (2), PCDH19 (1), SCN1A (1), STXBP1 (2)] as well as one microdeletion involving both SCN1A and SCN2A. The high mutation rate (42%) suggests that genetic testing of this panel of genes is advisable for cryptogenic IEE with no identified etiology. These 7 genes are associated with channelopathies or synaptic transmission, and we recommend early genetic testing, where possible, to guide the treatment strategy.

6.

Background

Alzheimer’s disease (AD) is one of the leading genetically complex and heterogeneous disorders and is influenced by both genetic and environmental factors. The underlying risk factors remain largely unclear. In recent years, high-throughput methodologies, such as genome-wide linkage analysis (GWL), genome-wide association (GWA) studies, and genome-wide expression profiling (GWE), have led to the identification of several candidate genes associated with AD. However, owing to a lack of consistency among their findings, an integrative approach is warranted. Here, we have designed a rank-based gene prioritization approach involving convergent analysis of multi-dimensional data and protein-protein interaction (PPI) network modelling.

Results

Our approach integrates three different AD datasets (GWL, GWA and GWE) to identify overlapping candidate genes, which are ranked using a novel cumulative rank score (SR) based method and then prioritized using clusters derived from a PPI network. The SR for each gene is calculated by adding the ranks assigned to that gene, based on either p value or score, in the three datasets. This analysis yielded 108 plausible AD genes. Network modelling, in which a PPI network was built from the proteins encoded by these genes and their direct interactors, resulted in a layered network of 640 proteins. Clustering of these proteins further helped us identify 6 significant clusters, with 7 proteins (EGFR, ACTB, CDC2, IRAK1, APOE, ABCA1 and AMPH) forming the central hub nodes. Functional annotation of the 108 genes revealed their role in several biological activities, such as neurogenesis, regulation of MAP kinase activity, response to calcium ion and endocytosis, paralleling AD-specific attributes. Finally, 3 potential biochemical biomarkers were found from the overlap of the 108 AD proteins with proteins from the CSF and plasma proteomes. EGFR and ACTB were found to be the two most significant AD risk genes.
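The cumulative rank score can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: genes are ranked within each full dataset by ascending p value, ties are ignored, only genes present in all three datasets are scored, and all gene names and p values are invented.

```python
def ranks(values, higher_is_better=False):
    """Map each gene to its 1-based rank; by default a smaller value (p value) ranks first."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {gene: position + 1 for position, gene in enumerate(ordered)}

def cumulative_rank_score(datasets):
    """Sum per-dataset ranks for the genes shared by every dataset."""
    per_dataset = [ranks(d) for d in datasets]
    shared = set.intersection(*(set(r) for r in per_dataset))
    return {gene: sum(r[gene] for r in per_dataset) for gene in shared}

# Invented p values for three evidence types (linkage, association, expression).
gwl = {"A": 0.01, "B": 0.03, "C": 0.2}
gwa = {"A": 0.02, "B": 0.001, "C": 0.5, "D": 0.04}
gwe = {"A": 0.005, "B": 0.04, "C": 0.03}

sr = cumulative_rank_score([gwl, gwa, gwe])
prioritized = sorted(sr, key=sr.get)  # a lower SR means stronger convergent evidence
```

Gene D is dropped because it appears in only one dataset, which mirrors the restriction to overlapping candidates.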

Conclusions

On the assumption that common genetic signals obtained from different methodological platforms might serve as more robust AD risk markers than candidates identified using a single-dimension approach, we demonstrated an integrated genomic convergence approach for disease candidate gene prioritization from heterogeneous data sources linked to AD.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-199) contains supplementary material, which is available to authorized users.

7.
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
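One simple way to picture a tissue-specific PPI network is to keep only interactions whose two partners are both expressed in the tissue. The abstract does not say how expression and PPI data were combined, so the hard threshold filter below is an assumption, as are the gene names, tissues, and expression levels.

```python
def tissue_specific_network(edges, expression, tissue, threshold):
    """Keep interactions whose partners both reach the expression threshold in the tissue."""
    expressed = {
        gene
        for gene, levels in expression.items()
        if levels.get(tissue, 0.0) >= threshold
    }
    return [(a, b) for a, b in edges if a in expressed and b in expressed]

# Hypothetical generic PPI network and per-tissue expression levels.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
expression = {
    "P1": {"brain": 8.0, "liver": 1.0},
    "P2": {"brain": 6.5, "liver": 7.0},
    "P3": {"brain": 0.5, "liver": 9.0},
}

brain_net = tissue_specific_network(edges, expression, "brain", threshold=5.0)
liver_net = tissue_specific_network(edges, expression, "liver", threshold=5.0)
```

The same generic network yields different edge sets per tissue, which is the property a tissue-aware prioritization method would exploit.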

8.
9.
Scott K  Brady R  Cravchik A  Morozov P  Rzhetsky A  Zuker C  Axel R 《Cell》2001,104(5):661-673
A novel family of candidate gustatory receptors (GRs) was recently identified in searches of the Drosophila genome. We have performed in situ hybridization and transgene experiments that reveal expression of these genes in both gustatory and olfactory neurons in adult flies and larvae. This gene family is likely to encode both odorant and taste receptors. We have visualized the projections of chemosensory neurons in the larval brain and observe that neurons expressing different GRs project to discrete loci in the antennal lobe and subesophageal ganglion. These data provide insight into the diversity of chemosensory recognition and an initial view of the representation of gustatory information in the fly brain.

10.
Familial adult myoclonus epilepsy (FAME) is a rare autosomal dominant disorder characterized by adult onset, involuntary muscle jerks, cortical myoclonus and occasional seizures. FAME is genetically heterogeneous with more than 70 families reported worldwide and five potential disease loci. The efforts to identify potential causal variants have been unsuccessful in all but three families. To date, linkage analysis has been the main approach to find and narrow FAME critical regions. We propose an alternative method, pedigree-free identity-by-descent (IBD) mapping, that infers regions of the genome that individuals have inherited from a common ancestor. IBD mapping provides an alternative to linkage analysis in the presence of allelic and locus heterogeneity by detecting clusters of individuals who share a common allele. Following IBD mapping, gene prioritization based on gene co-expression analysis can be used to identify the most promising candidate genes. We performed an IBD analysis using high-density single nucleotide polymorphism (SNP) array data followed by gene prioritization on a FAME cohort of ten European families and one Australian/New Zealander family, eight of which had known disease loci. By identifying IBD regions common to multiple families, we were able to narrow the FAME2 locus to a 9.78 megabase interval within 2p11.2–q11.2. We provide additional evidence of a founder effect in four Italian families and allelic heterogeneity with at least four distinct founders responsible for FAME at the FAME2 locus. In addition, we suggest candidate disease genes using gene prioritization based on gene co-expression analysis.

11.
Determining the genetic factors in a disease is crucial to elucidating its molecular basis. This task is challenging due to a lack of information on gene function. The integration of large-scale functional genomics data has proven to be an effective strategy to prioritize candidate disease genes. Mitochondrial disorders are a prevalent and heterogeneous class of diseases that are particularly amenable to this approach. Here we explain the application of integrative approaches to the identification of mitochondrial disease genes. We first examine various datasets that can be used to evaluate the involvement of each gene in mitochondrial function. The data integration methodology is then described, accompanied by examples of common implementations. Finally, we discuss how gene networks are constructed using integrative techniques and applied to candidate gene prioritization. Relevant public data resources are indicated. This report highlights the success and potential of data integration as well as its applicability to the search for mitochondrial disease genes.

12.
Many cell activities are organized as a network, and genes are clustered into co-expressed groups if they have the same or closely related biological functions or are co-regulated. In this study, based on the assumption that a strong candidate disease gene is more likely to lie close to gene groups whose members are all coordinately differentially expressed than to individual differentially expressed genes, we developed a novel disease gene prioritization method, GroupRank, which integrates gene co-expression and differential expression information generated from microarray data with a PPI network. GroupRank ranks a candidate gene highly if it is differentially expressed between disease and control or is close to differentially co-expressed groups in the PPI network. We tested our method on data sets of lung, kidney, leukemia and breast cancer. The results revealed that GroupRank could efficiently prioritize disease genes, with a significantly improved AUC value in comparison to the previous method, which does not consider co-expressed gene groups in the PPI network. Moreover, functional analyses of the major contributing gene group in the prioritization of kidney cancer genes verified that GroupRank not only ranks disease genes efficiently but could also help us identify and understand possible mechanisms in important physiological and pathological processes of disease.

13.
W. Xu  S. Li  Z. Zhang  J. Hu  Y. Zhao 《Animal genetics》2019,50(6):726-732
Differentially expressed gene (DEG) analysis is a major approach for interpreting phenotype differences and produces a large number of candidate genes. Given that it is burdensome to validate too many genes through benchwork, an urgent need exists for DEG prioritization. Here, a novel method is proposed for prioritizing bona fide DEGs by constructing the normal range of gene expression through integrating public expression data. Prioritization was performed by ranking the differences in cumulative probability for genes in case and control groups. DEGs from a study on pig muscle tissue were used to evaluate the prioritization accuracy. The results showed that the method reached an area under the receiver operating characteristic curve of 96.42% and can effectively shorten the list of candidate genes from a differential expression experiment to find novel causal genes. Our method can be easily extended to other tissues or species to promote functional research in broad applications.
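The idea of ranking genes by a difference in cumulative probability can be sketched with an empirical distribution. The abstract does not give the exact formula, so this sketch assumes that the "normal range" is an empirical CDF built from reference samples and that the score is the absolute difference between the mean cumulative probabilities of the case and control groups; all names and values are invented.

```python
from bisect import bisect_right

def empirical_cdf(reference, value):
    """Fraction of reference expression values that are <= value."""
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)

def cdf_difference(reference, case, control):
    """Absolute gap between mean cumulative probabilities of case and control samples."""
    case_p = sum(empirical_cdf(reference, v) for v in case) / len(case)
    control_p = sum(empirical_cdf(reference, v) for v in control) / len(control)
    return abs(case_p - control_p)

# Hypothetical reference ("normal range") expression values pooled from public data.
reference = list(range(1, 11))
genes = {
    "G1": {"case": [9, 10], "control": [1, 2]},  # shifts across the normal range
    "G2": {"case": [5, 6], "control": [5, 5]},   # barely moves within it
}

scores = {
    gene: cdf_difference(reference, groups["case"], groups["control"])
    for gene, groups in genes.items()
}
prioritized = sorted(scores, key=scores.get, reverse=True)
```

A gene whose case and control values sit at opposite ends of the reference distribution (G1) scores far higher than one whose values stay near the middle (G2).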

14.
Congenital heart defects (CHDs) are the most common major developmental anomalies and the most frequent cause of perinatal mortality, but their etiology often remains obscure. We identified a locus for CHDs on 6q24-q25. Genotype-phenotype correlations in 12 patients carrying a chromosomal deletion on 6q delineated a critical 850 kb region on 6q25.1 harboring five genes. Bioinformatics prioritization of candidate genes in this locus for a role in CHDs identified the TGF-β-activated kinase 1/MAP3K7 binding protein 2 gene (TAB2) as the top-ranking candidate gene. A role for this candidate gene in cardiac development was further supported by its conserved expression in the developing human and zebrafish heart. Moreover, a critical, dosage-sensitive role during development was demonstrated by the cardiac defects observed upon titrated knockdown of tab2 expression in zebrafish embryos. To definitively confirm the role of this candidate gene in CHDs, we performed mutation analysis of TAB2 in 402 patients with a CHD, which revealed two evolutionarily conserved missense mutations. Finally, a balanced translocation was identified, cosegregating with familial CHD. Mapping of the breakpoints demonstrated that this translocation disrupts TAB2. Taken together, these data clearly demonstrate a role for TAB2 in human cardiac development.

15.
16.
The main aims of this study were to determine the effects of GH gene abuse/misuse in normal animals and to discover genes that could be used as candidate biomarkers for the detection of GH gene therapy abuse/misuse in humans. We determined the global gene expression profile of peripheral whole blood from normal adult male rats after long-term GH gene therapy using CapitalBio 27 K Rat Genome Oligo Arrays. Sixty-one genes were found to be differentially expressed in GH gene-treated rats 24 weeks after receiving GH gene therapy, at a two-fold higher or lower level compared to the empty vector group (p < 0.05). These genes were mainly associated with angiogenesis, oncogenesis, apoptosis, immune networks, signaling pathways, general metabolism, type I diabetes mellitus, carbon fixation, cell adhesion molecules, and cytokine-cytokine receptor interaction. The results imply that exogenous GH gene expression in normal subjects is likely to induce cellular changes in metabolism, signaling pathways and immunity. A real-time qRT-PCR analysis of a selection of the genes confirmed the microarray data. Eight differentially expressed genes were selected as candidate biomarkers from among these 61 genes. These 8 showed five-fold higher or lower expression levels after the GH gene transduction (p < 0.05). They were then validated in real-time PCR experiments using 15 single-treated blood samples and 10 control blood samples. In summary, we determined the gene expression profiles of rat peripheral whole blood after long-term GH gene therapy and screened eight genes as candidate biomarkers based on the microarray data. This will contribute to an increased mechanistic understanding of the effects of chronic GH gene therapy abuse/misuse in normal subjects.

17.
18.
The gene prioritization problem consists of ranking candidate genes from most to least promising before further experimental validation. Integrating the results of various data sources and methods tends to improve performance on this problem. Therefore, a wide range of datasets and algorithms was investigated; these included topological features of protein networks, physicochemical characteristics and BLAST similarity scores of protein sequences, gene ontology, biological pathways, and tissue-based data sources. The novelty of this study lies in how the best-performing methods and reliable multi-genomic data sources were applied in an efficient two-step approach. In the first step, various multi-genomic data sources and algorithms were evaluated, and the seven best-performing rankers were then applied to prioritize candidate genes in different ways. In the second step, global prioritization was obtained by aggregating several scoring schemes. The results showed that protein networks, functional linkage networks, gene ontology, and biological pathway data sources have a significant impact on the quality of the gene prioritization approach. The findings also demonstrated a direct relationship between the degree of genes and the ranking quality of the evaluated tools. This approach outperformed previously published algorithms (e.g., DIR, GPEC, GeneDistiller, and Endeavour) in all evaluation metrics and led to the development of the GPS software. Its user-friendly interface and accuracy make GPS a powerful tool for the identification of human disease genes. GPS is available at http://gpsranker.com and http://LBB.ut.ac.ir.

19.
20.
《Gene》1997,194(1):57-62
A novel family of genes expressed in human brain has recently been identified. Gene 239FB, transcribed extensively in fetal brain, was isolated from the chromosome 11p13 region associated with the mental retardation component of the WAGR (Wilms tumor, aniridia, genitourinary anomalies, mental retardation) syndrome. This report presents the cDNA sequence and expression profile of a related gene, 239AB, isolated from an adult brain library, that was mapped to chromosome 22. While similar in structure, the two genes differ in their expression pattern and may have different roles in central nervous system development and function. In contrast to 239FB, which is expressed predominantly in fetal brain, the 239AB gene is transcribed in adult tissues. Both human genes encode novel proteins of unknown function that are highly conserved from Caenorhabditis elegans to birds and mammals. Phylogenetic analysis suggested that the two lineages of the ancient gene family represented by 239FB and 239AB existed prior to the emergence of modern animals.
