首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

Background  

Genes that are co-expressed tend to be involved in the same biological process. However, co-expression is not a very reliable predictor of functional links between genes. The evolutionary conservation of co-expression between species can be used to predict protein function more reliably than co-expression in a single species. Here we examine whether co-expression across multiple species is also a better prioritizer of disease genes than is co-expression between human genes alone.  相似文献   

2.

Background  

Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s), as part of the etiology of Parkinson's disease, within the large number of genes residing in these loci can be achieved through several approaches, including screening methods, and considering appropriate criteria. Since several of the indentified Parkinson's disease genes are expressed in substantia nigra pars compact of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes.  相似文献   

3.

Background  

The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling, tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.  相似文献   

4.
At different stages of any research project, molecular biologists need to choose - often somewhat arbitrarily, even after careful statistical data analysis - which genes or proteins to investigate further experimentally and which to leave out because of limited resources. Computational methods that integrate complex, heterogeneous data sets - such as expression data, sequence information, functional annotation and the biomedical literature - allow prioritizing genes for future study in a more informed way. Such methods can substantially increase the yield of downstream studies and are becoming invaluable to researchers.  相似文献   

5.
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.  相似文献   

6.
The identification of genes associated with hereditary disorders has contributed to improving medical care and to a better understanding of gene functions, interactions, and pathways. However, there are well over 1500 Mendelian disorders whose molecular basis remains unknown. At present, methods such as linkage analysis can identify the chromosomal region in which unknown disease genes are located, but the regions could contain up to hundreds of candidate genes. In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for definition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming previous methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend beyond the disease proteins themselves.  相似文献   

7.

Background  

Although most of the current disease candidate gene identification and prioritization methods depend on functional annotations, the coverage of the gene functional annotations is a limiting factor. In the current study, we describe a candidate gene prioritization method that is entirely based on protein-protein interaction network (PPIN) analyses.  相似文献   

8.
Linkage analysis is a successful procedure to associate diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which make experimental methods employed to identify the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein–protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms use two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.  相似文献   

9.
10.
Candidate gene identification is typically labour intensive, involving laboratory experiments required to corroborate or disprove any hypothesis for a nominated candidate gene being considered the causative gene. The traditional approach to reduce the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization establishes the ranking of candidate genes based on their relevance to the biological process of interest, from which the most promising genes can be selected for further analysis. To date, many computational methods have focused on the prediction of candidate genes by analysis of their inherent sequence characteristics and similarity with respect to known disease genes, as well as their functional annotation. In the last decade, several computational tools for prioritizing candidate genes have been proposed. A large number of them are web-based tools, while others are standalone applications that install and run locally. This review attempts to take a close look at gene prioritization criteria, as well as candidate gene prioritization algorithms, and thus provide a comprehensive synopsis of the subject matter.  相似文献   

11.
Exon discovery by genomic sequence alignment   总被引:5,自引:0,他引:5  
MOTIVATION: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA' so local sequence similarity often indicates biological functionality. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that work by comparing anonymous genomic sequences, for example from human and mouse. The main advantage of these methods is that they are based on simple and generally applicable measures of (local) sequence similarity; unlike standard gene-finding approaches they do not depend on species-specific training data or on the presence of cognate genes in data bases. As all comparative sequence-analysis methods, the new comparative gene-finding approaches critically rely on the quality of the underlying sequence alignments. RESULTS: Herein, we describe a new implementation of the sequence-alignment program DIALIGN that has been developed for alignment of large genomic sequences. We compare our method to the alignment programs PipMaker, WABA and BLAST and we show that local similarities identified by these programs are highly correlated to protein-coding regions. In our test runs, PipMaker was the most sensitive method while DIALIGN was most specific. AVAILABILITY: The program is downloadable from the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/.  相似文献   

12.
致病基因的定位候选克隆   总被引:2,自引:0,他引:2  
基因组研究的迅猛发展,使我们有必要重新审视致病基因克隆的各种策略与技术,以及人类基因组研究在致病基因克隆中的作用。定位候选克隆基因策略强调充分利用已知的细胞遗传学、医学遗传学、分子遗传学、分子生物学和生物化学知识,特别是人类基因组研究的最新成果,综合功能克隆、定位克隆与传统候选基因研究的策略,分离鉴定致病基因。今天的定位克隆已几乎不再需要染色体步移,甚至有可能避开cDNA筛选。  相似文献   

13.
14.
Chen Y  Wang W  Zhou Y  Shields R  Chanda SK  Elston RC  Li J 《PloS one》2011,6(6):e21137
Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship of candidate genes and existing known disease genes, reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is utilized to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed large scale cross-validation analysis on 110 disease families using three data sources. Results have shown that our approach consistently outperforms other two state of the art programs. A case study using Parkinson disease (PD) has identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study has observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.  相似文献   

15.
Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/.  相似文献   

16.
17.
Zhao J  Yang TH  Huang Y  Holme P 《PloS one》2011,6(9):e24306
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based only on two, simple, biologically motivated assumptions--that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identifying pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.  相似文献   

18.
Asthma is a complex disorder in which major genetic and environmental factors interact to initiate the disease and propagate it as a chronic relapsing disorder. Until recently, genetic factors implicated in the disease pathogenesis have been restricted to variants in known molecules involved in the inflammatory or remodelling pathways. This review discusses evidence for a new susceptibility gene for asthma, ADAM33, which was identified by positional cloning and shown to be selectively expressed in mesenchymal but not immune or inflammatory cells. ADAM33 belongs to a family of membrane-anchored metalloproteinases that also have fusagenic, adhesion and intracellular signalling properties. ADAM33 might play a key role in predisposing to the reduced lung function characteristic of asthma, possibly by influencing airway wall remodelling.  相似文献   

19.
20.
Several studies have reported markers linked to a putative resistance gene from Poncirus trifoliata (Ctv-R) located at linkage group 4 that confers resistance against one of the most important citrus pathogens, citrus tristeza virus (CTV). To be successful in both marker-assisted selection and transformation experiments, its accurate mapping is needed. Several factors may affect its localization, among them two are considered here: the definition of resistance and the genetic background of progeny.Two progenies derived from P. trifoliata, by self-pollination and by crossing with sour orange (Citrus aurantium), a citrus rootstock well-adapted to arid and semi-arid areas, were used for linkage group-4 marker enrichment. Two new methodologies were used to enrich this region with expressed sequences. The enrichment of group 4 resulted in the fusion of several C. aurantium linkage groups. The new one A(7+3+4) is now saturated with 48 markers including expressed sequences. Surprisingly, sour orange was as resistant to the CTV isolate tested as was P. trifoliata, and three hybrids that carry Ctv-R, as deduced from its flanking markers, are susceptible to CTV. The new linkage maps were used to map Ctv-R under the hypothesis of monogenic inheritance. Its position on linkage group 4 of P. trifoliata differs from the location previously reported in other progenies. The genetic analysis of virus-plant interaction in the family derived from C. aurantium after a CTV chronic infection showed the segregation of five types of interaction, which is not compatible with the hypothesis of a single gene controlling resistance. Two major issues are discussed: another type of genetic analysis of CTV resistance is needed to avoid the assumption of monogenic inheritance, and transferring Ctv-R from P. trifoliata to sour orange might not avoid the CTV decline of sweet orange trees.Communicated by C. Möllers  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号