Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes the experimental methods employed to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein–protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.
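
To make the reported figures concrete, the following is a minimal sketch of how sensitivity, specificity, and fold reduction could be computed for one prioritized locus. The gene identifiers and the function names `sensitivity_specificity` and `fold_reduction` are hypothetical, and defining fold reduction as the ratio of locus size to shortlist size is an assumption, not something the abstract states.

```python
def sensitivity_specificity(predicted, true_genes, all_candidates):
    """Return (sensitivity, specificity) for one prioritization run."""
    predicted = set(predicted)
    true_genes = set(true_genes)
    negatives = set(all_candidates) - true_genes

    tp = len(predicted & true_genes)   # disease genes that were kept
    fn = len(true_genes - predicted)   # disease genes that were missed
    fp = len(predicted & negatives)    # non-disease genes that were kept
    tn = len(negatives - predicted)    # non-disease genes that were discarded

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def fold_reduction(all_candidates, predicted):
    """Assumed definition: size of the locus divided by size of the shortlist."""
    return len(set(all_candidates)) / len(set(predicted))


# Toy locus with ten hypothetical genes, two of which are true disease genes.
locus = [f"G{i}" for i in range(1, 11)]
shortlist = ["G1", "G3"]
print(sensitivity_specificity(shortlist, ["G1", "G2"], locus))
print(fold_reduction(locus, shortlist))
```

A high specificity with moderate sensitivity, as in the known-disease-gene setting above, corresponds to a short list that discards most non-disease genes at the cost of missing some true ones.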

3.
MOTIVATION: Identifying candidate genes associated with a given phenotype or trait is an important problem in biological and biomedical studies. Prioritizing genes based on the accumulated information from several data sources is of fundamental importance. Several integrative methods have been developed for the case in which a set of candidate genes for the phenotype is available. However, how to prioritize genes for phenotypes when no candidates are available is still a challenging problem. RESULTS: We develop a new method for prioritizing genes associated with a phenotype by Combining Gene expression and protein Interaction data (CGI). The method is applied to yeast gene expression data sets in combination with protein interaction data sets of varying reliability. We found that our method outperforms both the intuitive approach of using gene expression data or protein interaction data alone and a recent gene ranking algorithm, GeneRank. We then apply our method to prioritize genes for Alzheimer's disease. AVAILABILITY: The code in this paper is available upon request.

4.

Background  

Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.

5.
S Blackshaw, R E Fraioli, T Furukawa, C L Cepko. Cell, 2001, 107(5): 579-589
To identify the full set of genes expressed by mammalian rods, we conducted serial analysis of gene expression (SAGE) by using libraries generated from mature and developing mouse retina. We identified 264 uncharacterized genes that were specific to or highly enriched in rods. Nearly half of all cloned human retinal disease genes are selectively expressed in rod photoreceptors. In silico mapping of the human orthologs of genes identified in our screen revealed that 86 map within intervals containing uncloned retinal disease genes, representing 37 different loci. We expect these data will allow identification of many disease genes, and that this approach may be useful for cloning genes involved in classes of disease where cell type-specific expression of disease genes is observed.

6.
At different stages of any research project, molecular biologists need to choose - often somewhat arbitrarily, even after careful statistical data analysis - which genes or proteins to investigate further experimentally and which to leave out because of limited resources. Computational methods that integrate complex, heterogeneous data sets - such as expression data, sequence information, functional annotation and the biomedical literature - allow prioritizing genes for future study in a more informed way. Such methods can substantially increase the yield of downstream studies and are becoming invaluable to researchers.

7.
In this paper, we describe an approach for identifying 'pathways' from gene expression and protein interaction data. Our approach is based on the assumption that many pathways exhibit two properties: their genes exhibit a similar gene expression profile, and the protein products of the genes often interact. Our approach is based on a unified probabilistic model, which is learned from the data using the EM algorithm. We present results on two Saccharomyces cerevisiae gene expression data sets, combined with a binary protein interaction data set. Our results show that our approach is much more successful than other approaches at discovering both coherent functional groups and entire protein complexes.

8.

Background  

To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.

9.
Genome-wide linkage and association studies have demonstrated promise in identifying genetic factors that influence health and disease. An important challenge is to narrow down the set of candidate genes that are implicated by these analyses. Protein-protein interaction (PPI) networks are useful in extracting the functional relationships between known disease and candidate genes, based on the principle that products of genes implicated in similar diseases are likely to exhibit significant connectivity/proximity. Information flow-based methods are shown to be very effective in prioritizing candidate disease genes. In this article, we utilize the topology of PPI networks to infer functional information in the context of disease association. Our approach is based on the assumption that PPI networks are organized into recurrent schemes that underlie the mechanisms of cooperation among different proteins. We hypothesize that proteins associated with similar diseases would exhibit similar topological characteristics in PPI networks. Utilizing the location of a protein in the network with respect to other proteins (i.e., the "topological profile" of the proteins), we develop a novel measure to assess the topological similarity of proteins in a PPI network. We then use this measure to prioritize candidate disease genes based on the topological similarity of their products and the products of known disease genes. We test the resulting algorithm, Vavien, via systematic experimental studies using an integrated human PPI network and the Online Mendelian Inheritance in Man (OMIM) database. Vavien outperforms other network-based prioritization algorithms as shown in the results and is available at www.diseasegenes.org.
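
The idea of ranking candidates by the similarity of their topological profiles can be illustrated with a small sketch. This is not the Vavien implementation: the profile used here (inverse shortest-path distance to every node) and the use of Pearson correlation as the similarity measure are assumptions chosen for brevity, and the graph, protein names, and function names are hypothetical.

```python
from collections import deque
from math import sqrt


def bfs_distances(graph, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def topological_profile(graph, protein, nodes):
    """Assumed profile: closeness 1 / (1 + distance) to each node, 0 if unreachable."""
    dist = bfs_distances(graph, protein)
    return [1.0 / (1 + dist[n]) if n in dist else 0.0 for n in nodes]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def rank_candidates(graph, seeds, candidates):
    """Order candidates by mean profile correlation with the known disease proteins."""
    nodes = sorted(graph)
    profile = {p: topological_profile(graph, p, nodes)
               for p in set(seeds) | set(candidates)}
    score = {c: sum(pearson(profile[c], profile[s]) for s in seeds) / len(seeds)
             for c in candidates}
    return sorted(candidates, key=lambda c: score[c], reverse=True)


# Hypothetical network: a triangle A-B-C joined through C-D to a triangle D-E-F.
ppi = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(rank_candidates(ppi, seeds=["A"], candidates=["E", "B"]))
```

In this toy network, B sees the rest of the graph almost exactly as the seed A does, so it ranks ahead of E, which sits in the other triangle.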

10.
11.
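Somatic mutation screening data from cancer genomes provide researchers with a wealth of information for discovering new cancer genes. Existing methods that search for candidate cancer genes by mutation frequency tend to find cancer genes with high mutation frequencies, but some genes mutated at low frequency may also play important roles in carcinogenesis. Genes that have similar phylogenetic profiles and whose proteins interact may have similar functions, and damage to them may lead to the same or similar disease phenotypes. Based on this hypothesis, the article proposes a new method for discovering candidate cancer genes. First, protein interaction subnetworks with similar phylogenetic profiles are identified and defined as co-evolving gene modules. Then, among the genes that carry at least one non-synonymous somatic mutation in cancer genomes, those that lie in the same co-evolving module as a known cancer gene and interact with it directly are selected and predicted to be candidate cancer genes. In this way, the article predicts 15 candidate cancer genes, only 2 of which had been identified as cancer genes in previous work by methods based on high mutation frequency. The method can therefore effectively discover candidate cancer genes with low mutation frequencies.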
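
The second step described above, filtering mutated genes by module membership and direct interaction with a known cancer gene, can be sketched as follows. The sketch assumes the co-evolving modules have already been computed; the gene identifiers and the function name `predict_candidates` are hypothetical and do not come from the article.

```python
def predict_candidates(modules, interactions, mutated, known_cancer_genes):
    """Return mutated genes that share a module with, and directly interact
    with, at least one known cancer gene."""
    edges = {frozenset(pair) for pair in interactions}  # undirected PPI edges
    mutated = set(mutated)
    known = set(known_cancer_genes)

    candidates = set()
    for module in modules:
        module = set(module)
        known_in_module = module & known
        # Only genes with at least one non-synonymous somatic mutation qualify,
        # and genes already known as cancer genes are not "new" candidates.
        for gene in (module & mutated) - known:
            if any(frozenset((gene, k)) in edges for k in known_in_module):
                candidates.add(gene)
    return candidates


# Hypothetical input: K1 and K2 are known cancer genes, G1..G3 are other genes.
modules = [{"K1", "G1", "G2"}, {"K2", "G3"}]
interactions = [("K1", "G1"), ("G1", "G2"), ("K2", "G3")]
mutated = {"G1", "G2"}
print(predict_candidates(modules, interactions, mutated, {"K1", "K2"}))
```

Here G2 is excluded because it only reaches K1 indirectly through G1, and G3 is excluded because it carries no mutation, which mirrors the two filters in the description.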

12.
MOTIVATION: The development of experimental methods for genome scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network. The method exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors. A binomial model of local neighbor function labeling probability is combined with a Markov random field propagation algorithm to assign function probabilities for proteins in the network. RESULTS: We applied the method to a protein-protein interaction dataset for the yeast Saccharomyces cerevisiae using the Gene Ontology (GO) terms as function labels. The method reconstructed known GO term assignments with high precision, and produced putative GO assignments to 320 proteins that currently lack GO annotation, which represents about 10% of the unlabeled proteins in S. cerevisiae.
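
The premise that graph neighbors tend to share functions can be shown with a much simpler stand-in than the binomial model and Markov random field propagation described above. The sketch below only scores each function label by the fraction of a protein's annotated neighbors that carry it; the protein names, the labels, and the function name `neighbor_function_scores` are hypothetical.

```python
from collections import Counter


def neighbor_function_scores(graph, annotations, protein):
    """Score each label by the fraction of annotated neighbors that carry it.

    This is a one-step neighbor vote, not the probabilistic model in the
    abstract; unannotated neighbors are ignored.
    """
    labeled = [n for n in graph.get(protein, ()) if n in annotations]
    if not labeled:
        return {}
    counts = Counter(label for n in labeled for label in annotations[n])
    return {label: count / len(labeled) for label, count in counts.items()}


# Hypothetical network: P1 is unannotated, P4 is an unannotated neighbor.
graph = {"P1": {"P2", "P3", "P4"}, "P2": {"P1"}, "P3": {"P1"}, "P4": {"P1"}}
annotations = {"P2": {"term_a"}, "P3": {"term_a", "term_b"}}
print(neighbor_function_scores(graph, annotations, "P1"))
```

A propagation scheme such as the one in the abstract goes further by letting newly inferred probabilities influence neighbors in later iterations, which this one-step vote does not attempt.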

13.

Background  

Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s), as part of the etiology of Parkinson's disease, within the large number of genes residing in these loci can be achieved through several approaches, including screening methods, and considering appropriate criteria. Since several of the identified Parkinson's disease genes are expressed in the substantia nigra pars compacta of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes.

14.

Background  

Modern biology aims at unravelling the strands of complex biological structures such as protein-protein interaction (PPI) networks. A key concept in the organization of PPI networks is the existence of dense subnetworks (functional modules) within them. In recent approaches, clustering algorithms were applied to these networks and the resulting subnetworks were evaluated by estimating the coverage of the well-established protein complexes they contained. However, most of these algorithms operate on an unweighted graph structure, which in turn fails to emphasize those interactions that would contribute to the construction of biologically more valid and coherent functional modules.

15.

Background

Host-pathogen interactions that may lead to a competitive co-evolution of virulence and resistance mechanisms present an attractive system to study molecular evolution because strong, recent (or even current) selective pressure is expected at many genomic loci. However, it is unclear whether these selective forces would act to preserve existing diversity, promote novel diversity, or reduce linked neutral diversity during rapid fixation of advantageous alleles. In plants, the lack of adaptive immunity places a larger burden on genetic diversity to ensure survival of plant populations. This burden is even greater if the generation time of the plant is much longer than the generation time of the pathogen.

Methodology/Principal Findings

Here, we present nucleotide polymorphism and substitution data for 41 candidate genes from the long-lived forest tree loblolly pine, selected primarily for their prospective influences on host-pathogen interactions. This dataset is analyzed together with 15 drought-tolerance and 13 wood-quality genes from previous studies. A wide range of neutrality tests were performed and tested against expectations from realistic demographic models.

Conclusions/Significance

Collectively, our analyses found that axr (auxin response factor), caf1 (chromatin assembly factor) and gatabp1 (gata binding protein 1) candidate genes carry patterns consistent with directional selection and erd3 (early response to drought 3) displays patterns suggestive of a selective sweep, both of which are consistent with the arms-race model of disease response evolution. Furthermore, we have identified patterns consistent with diversifying selection at erf1-like (ethylene responsive factor 1), ccoaoemt (caffeoyl-CoA-O-methyltransferase), cyp450-like (cytochrome p450-like) and pr4.3 (pathogen response 4.3), expected under the trench-warfare evolution model. Finally, a drought-tolerance candidate related to the plant cell wall, lp5, displayed patterns consistent with balancing selection. In conclusion, both arms-race and trench-warfare models seem compatible with patterns of polymorphism found in different disease-response candidate genes, indicating a mixed strategy of disease tolerance evolution for loblolly pine, a major tree crop in the southeastern United States.

16.
17.
18.
Genome-wide experimental methods to identify disease genes, such as linkage analysis and association studies, generate increasingly large candidate gene sets for which comprehensive empirical analysis is impractical. Computational methods employ data from a variety of sources to identify the most likely candidate disease genes from these gene sets. Here, we review seven independent computational disease gene prioritization methods, and then apply them in concert to the analysis of 9556 positional candidate genes for type 2 diabetes (T2D) and the related trait obesity. We generate and analyse a list of nine primary candidate genes for T2D and five for obesity. Two genes, LPL and BCKDHA, are common to these two sets. We also present a set of secondary candidates for T2D (94 genes) and for obesity (116 genes) with 58 genes in common to both diseases.

19.

Background

The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, and thus greatly restrict the scope of application of such methods. Meanwhile, independently constructed and maintained PPI networks are usually quite diverse in coverage and quality, making the selection of a suitable PPI network inevitable but difficult.

Methods

We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach, and we derive an analytic form for Bayes factor that naturally measures the strength of association between a query disease and a candidate gene and thus can be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.
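
As a rough illustration of how a diffusion kernel turns a PPI network into pairwise gene proximities, the sketch below computes the standard form K = exp(-beta * L), where L is the graph Laplacian, using a truncated Taylor series in plain Python. Whether the authors use exactly this form and which beta they choose are not stated above, so both are assumptions; the three-node network and the function names are hypothetical.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph on n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(n, edges, beta=0.5, terms=30):
    """Approximate exp(-beta * L) by summing the first `terms` Taylor terms.

    Adequate for small graphs and modest beta; a production implementation
    would use a dedicated matrix exponential routine instead.
    """
    m = [[-beta * v for v in row] for row in laplacian(n, edges)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]  # holds m^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, m)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Hypothetical path network 0 - 1 - 2: node 0 should be closer to 1 than to 2.
K = diffusion_kernel(3, [(0, 1), (1, 2)])
for row in K:
    print([round(v, 4) for v in row])
```

Unlike shortest-path distance, the kernel accounts for all paths between two genes, and kernels from several networks can each contribute their own proximity term to a linear model.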

Results

We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.
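
The two summary measures named above can be sketched for a single leave-one-out trial as follows. The definitions used here are the common ones (rank of the held-out disease gene divided by the number of candidates, and AUC as the probability that a positive outscores a negative); the abstract does not spell them out, so treat them as assumptions. The scores and gene identifiers are hypothetical.

```python
def rank_ratio(scores, true_gene):
    """Rank of the held-out gene (1 = best) divided by the number of candidates."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return (ordered.index(true_gene) + 1) / len(ordered)


def auc(scores, positives):
    """Probability that a positive gene outscores a negative one (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_rank_ratio(trials):
    """Average rank ratio over (scores, held_out_gene) pairs from all trials."""
    return sum(rank_ratio(scores, gene) for scores, gene in trials) / len(trials)


# One hypothetical trial: G3 is the held-out disease gene among four candidates.
trial_scores = {"G1": 0.9, "G2": 0.4, "G3": 0.7, "G4": 0.1}
print(rank_ratio(trial_scores, "G3"))
print(auc(trial_scores, {"G3"}))
```

A lower mean rank ratio and a higher AUC both indicate that known disease genes tend to appear near the top of the ranked candidate list.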

Conclusions

The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

20.