Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.

2.
MOTIVATION: A method for prediction of disease-relevant human genes from the phenotypic appearance of a query disease is presented. Diseases of known genetic origin are clustered according to their phenotypic similarity. Each cluster entry consists of a disease and its underlying disease gene. Potential disease genes from the human genome are scored by their functional similarity to known disease genes in these clusters, which are phenotypically similar to the query disease. RESULTS: For assessment of the approach, a leave-one-out cross-validation of 878 diseases from the OMIM database, using 10672 candidate genes from the human genome, is performed. Depending on the applied parameters, in roughly one-third of cases the true solution is contained within the top scoring 3% of predictions and in two-thirds of cases the true solution is contained within the top scoring 15% of predictions. The prediction results can be used either to identify target genes when searching for a mutation in monogenic diseases, or to select loci in genotyping experiments in genetically complex diseases.
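The leave-one-out summary reported in the abstract above (the share of diseases whose true gene falls within the top 3% or 15% of ranked predictions) could be computed roughly as follows. This is only a minimal sketch: the function names, the toy scores and the toy ranks are hypothetical illustrations, and the abstract does not describe how the authors implemented their evaluation.

```python
def rank_of_true_gene(scores, true_gene):
    """Return the 1-based rank of the held-out gene when candidates
    are sorted by descending score (hypothetical helper)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_gene) + 1


def fraction_within_top(ranks, n_candidates, top_fraction):
    """Fraction of held-out cases whose true gene ranks within the
    top `top_fraction` of all candidates."""
    cutoff = top_fraction * n_candidates
    return sum(1 for r in ranks if r <= cutoff) / len(ranks)


# Toy data (invented): one score table and four leave-one-out ranks
# obtained against a pool of 100 candidate genes.
toy_scores = {"g1": 0.2, "g2": 0.9, "g3": 0.5}
toy_rank = rank_of_true_gene(toy_scores, "g3")

toy_ranks = [1, 10, 2, 90]
top3 = fraction_within_top(toy_ranks, n_candidates=100, top_fraction=0.03)
top15 = fraction_within_top(toy_ranks, n_candidates=100, top_fraction=0.15)
print(toy_rank, top3, top15)
```

In a full leave-one-out run, each disease would in turn be removed from its cluster, the candidates rescored, and the rank of its known gene appended to the list of ranks.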

3.
Chen L  Tai J  Zhang L  Shang Y  Li X  Qu X  Li W  Miao Z  Jia X  Wang H  Li W  He W 《Molecular bioSystems》2011,7(9):2547-2553
Understanding the pathogenesis of complex diseases is aided by precise identification of the genes responsible. Many computational methods have been developed to prioritize candidate disease genes, but coverage of functional annotations may be a limiting factor for most of these methods. Here, we introduce a global candidate gene prioritization approach that considers information about network properties in the human protein interaction network and risk transformative contents from known disease genes. Global risk transformative scores were then used to prioritize candidate genes. This method was introduced to prioritize candidate genes for prostate cancer. The effectiveness of our global risk transformative algorithm for prioritizing candidate genes was evaluated according to validation studies. Compared with ToppGene and random walk-based methods, our method outperformed the two other candidate gene prioritization methods. The generality of our method was assessed by testing it on prostate cancer and other types of cancer. The performance was evaluated using standard leave-one-out cross-validation.

4.
5.
The gymnosperms are a group of plants characterized by a haploid female gametophyte (megagametophyte). With the function of bearing the female gametes and nourishing the developing embryo, the megagametophyte has provided a simple way to understand the genetics of gymnosperm species using biochemical or genetic markers. In this paper, a quantitative genetic approach is proposed to study the genetic architecture of a quantitative trait in gymnosperms by taking advantage of the megagametophyte and the concept of average effect of a gene. Average effect describes the value associated with an allele carried by an individual and transmitted to its offspring. Through the genetic dissection of the average effect and genetic variance associated with a gamete carrying candidate genes, this approach can provide estimates of basic population genetic parameters, such as additive, dominant and epistatic effects, allelic frequencies and linkage disequilibrium. The candidate genes, known through their major mutant phenotype, have been reported in gymnosperms. An example for a candidate gene affecting lignin biosynthesis was applied to demonstrate the statistical procedures of the approach and its advantage. The conditions upon which the approach can be effectively used are discussed. Received: 15 January 1999 / Accepted: 12 March 1999

6.
Zhao J  Yang TH  Huang Y  Holme P 《PloS one》2011,6(9):e24306
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as to identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
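The two assumptions stated in the abstract above (a good candidate is differentially expressed, or lies close to disease genes in the protein interaction network) can be illustrated with a toy scoring scheme. The sketch below is not the authors' algorithm: the additive score, the inverse-distance closeness term, the function names and all data are assumptions made purely for illustration.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search distance between two proteins in an
    undirected interaction graph; None if no path exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def prioritize(candidates, diff_expr, graph, known_genes):
    """Rank candidates by |differential expression| plus the sum of
    inverse network distances to known disease genes (assumed score)."""
    scores = {}
    for gene in candidates:
        closeness = 0.0
        for known in known_genes:
            d = shortest_path_length(graph, gene, known)
            if d is not None and d > 0:
                closeness += 1.0 / d
        scores[gene] = abs(diff_expr.get(gene, 0.0)) + closeness
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Toy, invented data: K1 is a known disease gene; D is absent from the network.
toy_graph = {
    "A": ["K1"],
    "K1": ["A", "B"],
    "B": ["K1", "C"],
    "C": ["B"],
}
toy_diff_expr = {"A": 0.2, "C": 1.5, "D": 0.1}
ranking = prioritize(["A", "C", "D"], toy_diff_expr, toy_graph, ["K1"])
print(ranking)
```

Here gene C ranks first because its strong differential expression outweighs its greater network distance, while D ranks last because it has neither signal.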

7.
Recent studies have reported hundreds of genes linked to Alzheimer’s Disease (AD). However, many of these candidate genes may not be identified again when analyses are replicated in different studies. Moreover, results could be controversial. Here, we proposed a computational workflow to curate and evaluate AD related genes. The method integrates large scale literature knowledge data and gene expression data that were acquired from postmortem human brain regions (AD case/control: 31/32 and 22/8). Pathway Enrichment, Sub-Network Enrichment, and Gene-Gene Interaction analysis were conducted to study the pathogenic profile of the candidate genes, with 4 metrics proposed and validated for each gene. By using our approach, a scalable AD genetic database was developed, including AD related genes, pathways, diseases and information on supporting references. The AD case/control classification supported the effectiveness of the 4 proposed metrics, which successfully identified 21 well-studied AD genes (e.g. TGFB1, CTNNB1, APP, IL1B, PSEN1, PTGS2, IL6, VEGFA, SOD1, AKT1, CDK5, TNF, GSK3B, TP53, CCL2, BDNF, NGF, IGF1, SIRT1, AGER and TLR) and highlighted one recently reported AD gene (e.g. ITGB1). The computational biology approach and the AD database developed in this study provide a valuable resource which may facilitate the understanding of the AD genetic profile.

8.
The gravitropic response in trees is a widely studied phenomenon, but the molecular mechanism involved remains unclear. The purpose of this work was to identify differentially expressed genes in response to inclination using a comparative approach for two conifer species. Young seedlings were subjected to inclination and samples were collected at four different time points. First, suppression subtractive hybridisation (SSH) was used to identify differentially regulated genes in radiata pine (Pinus radiata D. Don). cDNA libraries were constructed from the upper and lower part of inclined stems in a time course experiment, ranging from 2.5 h to 1 month. From a total of 3092 sequences obtained, 2203 elements were assembled, displaying homology to a public database. A total of 942 unigene elements were identified using bioinformatic tools after redundancy analysis. Of these, 614 corresponded to known function genes and 328 to unknown function genes, including hypothetical proteins. Comparative analysis between radiata pine and maritime pine (Pinus pinaster Ait.) was performed to validate the differential expression of relevant candidate genes using qPCR. Selected genes were involved in several functional categories: hormone regulation, phenylpropanoid pathway and signal transduction. This comparative approach for the two conifer species helped determine the molecular gene pattern generated by inclination, providing a set of Pinus gene signatures that may be involved in the gravitropic stress response. These genes may also represent relevant candidate genes involved in the gravitropic response and potentially in wood formation.

9.
Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes the experimental methods employed to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein–protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.

10.
Type I autosomal dominant cerebellar ataxia (ADCA) is a type of spinocerebellar ataxia (SCA) characterized by ataxia with other neurological signs, including oculomotor disturbances, cognitive deficits, pyramidal and extrapyramidal dysfunction, bulbar, spinal and peripheral nervous system involvement. The global prevalence of this disease is not known. The most common type I ADCA is SCA3 followed by SCA2, SCA1, and SCA8, in descending order. Founder effects no doubt contribute to the variable prevalence between populations. Onset is usually in adulthood but cases of presentation in childhood have been reported. Clinical features vary depending on the SCA subtype but by definition include ataxia associated with other neurological manifestations. The clinical spectrum ranges from pure cerebellar signs to constellations including spinal cord and peripheral nerve disease, cognitive impairment, cerebellar or supranuclear ophthalmologic signs, psychiatric problems, and seizures. Cerebellar ataxia can affect virtually any body part causing movement abnormalities. Gait, truncal, and limb ataxia are often the most obvious cerebellar findings though nystagmus, saccadic abnormalities, and dysarthria are usually associated. To date, 21 subtypes have been identified: SCA1-SCA4, SCA8, SCA10, SCA12-SCA14, SCA15/16, SCA17-SCA23, SCA25, SCA27, SCA28 and dentatorubral pallidoluysian atrophy (DRPLA). Type I ADCA can be further divided based on the proposed pathogenetic mechanism into 3 subclasses: subclass 1 includes type I ADCA caused by CAG repeat expansions such as SCA1-SCA3, SCA17, and DRPLA, subclass 2 includes trinucleotide repeat expansions that fall outside of the protein-coding regions of the disease gene including SCA8, SCA10 and SCA12. Subclass 3 contains disorders caused by specific gene deletions, missense mutation, and nonsense mutation and includes SCA13, SCA14, SCA15/16, SCA27 and SCA28. 
Diagnosis is based on clinical history, physical examination, genetic molecular testing, and exclusion of other diseases. Differential diagnosis is broad and includes secondary ataxias caused by drug or toxic effects, nutritional deficiencies, endocrinopathies, infections and post-infection states, structural abnormalities, paraneoplastic conditions and certain neurodegenerative disorders. Given the autosomal dominant pattern of inheritance, genetic counseling is essential and best performed in specialized genetic clinics. There are currently no known effective treatments to modify disease progression. Care is therefore supportive. Occupational and physical therapy for gait dysfunction and speech therapy for dysarthria are essential. Prognosis is variable depending on the type of ADCA and even among kindreds.

11.
12.
The cardiomyopathies are a group of heart muscle diseases which can be inherited (familial). Identifying potential disease-related proteins is important to understand mechanisms of cardiomyopathies. Experimental identification of proteins related to cardiomyopathies is costly and labour-intensive. In contrast, a bioinformatics approach has a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method to rank candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top ranked candidate proteins were related to the corresponding disease subtypes, and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.

13.
Background: Common variable immunodeficiency (CVID), the most prevalent form of primary immunodeficiency (PID), is characterized by hypogammaglobulinemia and recurrent infections. Understanding protein-protein interaction (PPI) networks of CVID genes and identifying candidate CVID genes are critical steps in facilitating the early diagnosis of CVID. Here, the aim was to investigate PPI networks of CVID genes and identify candidate CVID genes using computational techniques. Methods: Network density and biological distance were used to study PPI data for CVID and PID genes obtained from the STRING database. Gene expression data of patients with CVID were obtained from the Gene Expression Omnibus, and then Pearson’s correlation coefficient, a PPI database, and Kyoto Encyclopedia of Genes and Genomes were used to identify candidate CVID genes. We then evaluated our predictions and identified differentially expressed CVID genes. Results: The majority of CVID genes are characterized by a high network density and small biological distance, whereas most PID genes are characterized by a low network density and large biological distance, indicating that CVID genes are more functionally similar to each other and interact more closely with one another than PID genes do. Subsequently, we identified 172 CVID candidate genes that have similar biological functions to known CVID genes, and eight genes were recently reported as CVID-related genes. MYC, a candidate gene, was down-regulated in CVID duodenal biopsies, but up-regulated in blood samples compared with levels in healthy controls. Conclusion: Our findings will aid in a better understanding of the complex of CVID genes, possibly further facilitating the early diagnosis of CVID.

14.
15.

Background

DNA methylation is associated with aberrant gene expression in cancer, and has been shown to correlate with therapeutic response and disease prognosis in some types of cancer. We sought to investigate the biological significance of DNA methylation in lung cancer.

Results

We integrated the gene expression profiles and data of gene promoter methylation for a large panel of non-small cell lung cancer cell lines, and identified 578 candidate genes with expression levels that were inversely correlated to the degree of DNA methylation. We found these candidate genes to be differentially methylated in normal lung tissue versus non-small cell lung cancer tumors, and segregated by histologic and tumor subtypes. We used gene set enrichment analysis of the genes ranked by the degree of correlation between gene expression and DNA methylation to identify gene sets involved in cellular migration and metastasis. Our unsupervised hierarchical clustering of the candidate genes segregated cell lines according to the epithelial-to-mesenchymal transition phenotype. Genes related to the epithelial-to-mesenchymal transition, such as AXL, ESRP1, HoxB4, and SPINT1/2, were among the nearly 20% of the candidate genes that were differentially methylated between epithelial and mesenchymal cells. Greater numbers of genes were methylated in the mesenchymal cells and their expressions were upregulated by 5-azacytidine treatment. Methylation of the candidate genes was associated with erlotinib resistance in wild-type EGFR cell lines. The expression profiles of the candidate genes were associated with 8-week disease control in patients with wild-type EGFR who had unresectable non-small cell lung cancer treated with erlotinib, but not in patients treated with sorafenib.

Conclusions

Our results demonstrate that the underlying biology of genes regulated by DNA methylation may have predictive value in lung cancer that can be exploited therapeutically.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1079) contains supplementary material, which is available to authorized users.
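The selection step described in the Results above (genes whose expression is inversely correlated with promoter methylation across cell lines) can be sketched with a plain Pearson correlation. This is a hedged illustration only: the abstract does not state which correlation statistic or cutoff the authors used, so the `threshold` value, the function names and the toy measurements are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def inversely_correlated(methylation, expression, threshold=-0.5):
    """Return genes whose methylation/expression correlation across
    cell lines is at or below `threshold` (an assumed cutoff)."""
    return [
        gene
        for gene in methylation
        if pearson(methylation[gene], expression[gene]) <= threshold
    ]


# Toy, invented measurements across three hypothetical cell lines.
toy_methylation = {"GENE_A": [0.1, 0.5, 0.9], "GENE_B": [0.1, 0.5, 0.9]}
toy_expression = {"GENE_A": [9.0, 5.0, 1.0], "GENE_B": [1.0, 5.0, 9.0]}
selected = inversely_correlated(toy_methylation, toy_expression)
print(selected)
```

Only GENE_A is selected in this toy case, because its expression falls as methylation rises, whereas GENE_B shows the opposite trend.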

16.
CAG and CTG repeat expansions are the cause of at least a dozen inherited neurological disorders. In these so-called "dynamic mutation" diseases, the expanded repeats display dramatic genetic instability, changing in size when transmitted through the germline and within somatic tissues. As the molecular basis of the repeat instability process remains poorly understood, modeling of repeat instability in model organisms has provided some insights into potentially involved factors, implicating especially replication and repair pathways. Studies in mice have also shown that the genomic context of the repeat sequence is required for CAG/CTG repeat instability in the case of spinocerebellar ataxia type 7 (SCA7), one of the most unstable of all CAG/CTG repeat disease loci. While most studies of repeat instability have taken a candidate gene approach, unbiased screens for factors involved in trinucleotide repeat instability have been lacking. We therefore attempted to use Drosophila melanogaster to model expanded CAG repeat instability by creating transgenic flies carrying trinucleotide repeat expansions, deriving flies with SCA7 CAG90 repeats in cDNA and genomic context. We found that SCA7 CAG90 repeats are stable in Drosophila, regardless of context. To screen for genes whose reduced function might destabilize expanded CAG repeat tracts in Drosophila, we crossed the SCA7 CAG90 repeat flies with various deficiency stocks, including lines lacking genes encoding the orthologues of flap endonuclease-1, PCNA, and MutS. In all cases, perfect repeat stability was preserved, suggesting that Drosophila may not be a suitable system for determining the molecular basis of SCA7 CAG repeat instability.

17.
Candidate gene identification is typically labour intensive, involving laboratory experiments required to corroborate or disprove any hypothesis for a nominated candidate gene being considered the causative gene. The traditional approach to reduce the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization establishes the ranking of candidate genes based on their relevance to the biological process of interest, from which the most promising genes can be selected for further analysis. To date, many computational methods have focused on the prediction of candidate genes by analysis of their inherent sequence characteristics and similarity with respect to known disease genes, as well as their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed. A large number of them are web-based tools, while others are standalone applications that install and run locally. This review attempts to take a close look at gene prioritization criteria, as well as candidate gene prioritization algorithms, and thus provide a comprehensive synopsis of the subject matter.

18.
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.

19.
The conventional approach of candidate gene studies in complex diseases is to look at the effect of one gene at a time. However, as the outcome of chronic diseases is influenced by a large number of alleles, simultaneous analysis is needed. We demonstrate the application of multivariate regression and cluster analysis to a multiple sclerosis (MS) dataset with genotypes for 489 patients at 11 candidate genes selected on their involvement in the immune response. Using multivariate regression, we observed that different sets of genes were associated with different disease characteristics that reflect different aspects of disease. Out of 15 polymorphisms, we identified one that contributed to the severity of disease. In addition, the set of 15 polymorphisms was predictive for yearly increase in lesion volume as seen on T1-weighted MRI (p=0.044). From this set, no individual polymorphisms could be identified after adjustment for multiple hypotheses testing. By means of a cluster analysis, we aimed to identify subgroups of patients with different pathogenic subtypes of MS on the basis of their genetic profile. We constructed genetic profiles from the genotypes at the 11 candidate genes. The approach proved to be feasible. We observed three clusters in the sample of patients. In this study, we observed no significant differences in the usual clinical and MRI outcome measures between the different clusters. However, a number of consistent trends indicated that this clustering might be related to the course of disease. With a larger number of genes regulating the course of disease, we may be able to identify clinically relevant clusters. The analyses are easily implemented and will be applicable to candidate gene studies of complex traits in general.

20.
Spinocerebellar Ataxia 8 (SCA8) appears unique among triplet repeat expansion-induced neurodegenerative diseases because the predicted gene product is a noncoding RNA. Little is currently known about the normal function of SCA8 in neuronal survival or how repeat expansion contributes to neurodegeneration. To investigate the molecular context in which SCA8 operates, we have expressed the human SCA8 noncoding RNA in Drosophila. SCA8 induces late-onset, progressive neurodegeneration in the Drosophila retina. Using this neurodegenerative phenotype as a sensitized background for a genetic modifier screen, we have identified mutations in four genes: staufen, muscle-blind, split ends, and CG3249. All four encode neuronally expressed RNA binding proteins conserved in Drosophila and humans. Although expression of both wild-type and repeat-expanded SCA8 induces neurodegeneration, the strength of interaction with certain modifiers differs between the two SCA8 backgrounds, suggesting that CUG expansions alter associations with specific RNA binding proteins. Our demonstration that SCA8 can recruit Staufen and that the interaction domain maps to the portion of the SCA8 RNA that undergoes repeat expansion in the human disease suggests a specific mechanism for SCA8 function and disease. Genetic modifiers identified in our SCA8-based screens may provide candidates for designing therapeutic interventions to treat this disease.
