Similar literature
 Found 20 similar documents (search time: 15 ms)
1.

Background

While the gargantuan multi-nation effort to sequence T. aestivum nears completion, the annotation of the vast number of wheat genes and proteins is in its infancy. Previous experimental studies on model plant organisms such as A. thaliana and O. sativa provide a plethora of gene annotations that can serve as starting points for wheat gene annotation, provided that solid cross-species gene-to-gene and protein-to-protein correspondences are available.

Results

DNA and protein sequences and corresponding annotations for T. aestivum and 9 other plant species were collected from Ensembl Plants release 22 and curated. Cliques of predicted 1-to-1 orthologs were identified, and an annotation enrichment model was defined based on existing gene-GO term associations and phylogenetic relationships among wheat and the 9 other plant species. A total of 13 cliques of size 10 were identified, which represent putative functionally equivalent genes and proteins in the 10 plant species. Eighty-five new and more specific GO terms were associated with wheat genes in the 13 cliques of size 10, a 65% increase over the 130 previously known GO terms. Similar expression patterns for 4 genes from Arabidopsis, barley, maize and rice in cliques of size 10 provide experimental evidence to support our model. Overall, based on cliques of size 3 or larger, our model enriched the existing gene-GO term associations for 7,838 (8%) wheat genes, of which 2,139 had no previous annotation.
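The clique step described above can be illustrated with a small sketch. It assumes ortholog predictions arrive as pairs of (species, gene) nodes; the gene identifiers, the input format and the function name `ortholog_cliques` are hypothetical, and the plain Bron-Kerbosch search shown here is a generic technique, not necessarily the authors' pipeline.

```python
from collections import defaultdict


def ortholog_cliques(pairs, min_size=3):
    """Return maximal cliques (as frozensets) with at least min_size nodes.

    pairs: iterable of ((species, gene), (species, gene)) tuples, each a
    predicted 1-to-1 ortholog relationship (hypothetical input format).
    """
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)

    cliques = []

    def expand(r, p, x):
        # Bron-Kerbosch without pivoting: r is the growing clique,
        # p the remaining candidates, x the nodes already processed.
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Made-up identifiers, for illustration only.
wheat = ("T. aestivum", "w1")
rice = ("O. sativa", "r1")
arab = ("A. thaliana", "a1")
maize = ("Z. mays", "m1")
pairs = [(wheat, rice), (wheat, arab), (rice, arab), (wheat, maize)]
found = ortholog_cliques(pairs)

# The reported enrichment: 85 new GO terms relative to 130 known ones.
increase_percent = round(85 / 130 * 100)
```

In this toy input, wheat, rice and Arabidopsis are mutually linked and form one clique of size 3, while the wheat-maize pair is too small to be kept.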

Conclusions

Our novel comparative genomics approach enriches existing T. aestivum gene annotations based on cliques of predicted 1-to-1 orthologs, phylogenetic relationships and existing gene ontologies from 9 other plant species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1496-2) contains supplementary material, which is available to authorized users.

2.
PCA (principal components analysis) and ANN (artificial neural network) are two widely used pattern recognition methods in metabolomics data mining, yet their limitations can be serious obstacles for researchers. In this paper, the wavelet transform (WT) was integrated with PCA and ANN to improve their performance in handling metabolomics data. A dataset was decomposed by wavelets and then reconstructed. The "hard thresholding" algorithm was used, through which the detail information was discarded and the entire "metabolomics image" was reconstructed from the significant information. The most relevant information was assumed to be captured by this process. Thanks to its ability to denoise data, the WT method significantly improved the performance of ANN, a non-linear essence-extracting method, in classifying samples; further integration of WT with PCA showed that WT could greatly enhance the ability of PCA to distinguish one group of samples from another and to identify potential biomarkers. The results highlight WT as a promising solution for bridging the gap between huge volumes of data and instructive biological information.
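The decompose, threshold and reconstruct sequence can be sketched as below. The abstract does not name the wavelet or the threshold, so the one-level Haar transform, the threshold value and all function names here are assumptions chosen for illustration only.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform; len(signal) must be even."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


def denoise(signal, threshold):
    """Decompose, discard small detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, hard_threshold(detail, threshold))


# Hypothetical intensities: a small fluctuation (4.0 vs 4.2) and a real step.
cleaned = denoise([4.0, 4.2, 1.0, 5.0], threshold=1.0)
```

With this input, the small difference in the first pair is smoothed to its mean while the large step in the second pair survives, which is the behaviour hard thresholding is meant to provide.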

3.

Background

Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of ‘omics’ data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis.

Results

We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions.

Conclusions

Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0386-y) contains supplementary material, which is available to authorized users.

4.
The tomato Pto gene encodes a serine/threonine kinase (STK) whose molecular characterization has provided valuable insights into the disease resistance mechanism of tomato, and it is considered a promising candidate for engineering broad-spectrum pathogen resistance in this crop. In this study, a pair of degenerate primers based on conserved subdomains of plant STKs similar to the tomato Pto protein was used to amplify similar sequences in banana. A fragment of approximately 550 bp was amplified, cloned and sequenced. Sequence analysis of several clones revealed 13 distinct sequences highly similar to STKs. Based on their significant similarity to the tomato Pto protein (BLASTX E value <3e-53), seven of them were classified as Pto resistance gene candidates (Pto-RGCs). Multiple sequence alignment of the banana Pto-RGC products revealed that these sequences contain several conserved subdomains present in most STKs and also several conserved residues that are crucial for Pto function. Moreover, the phylogenetic analysis showed that the banana Pto-RGCs clustered with Pto, suggesting a common evolutionary origin with this R gene. The Pto-RGCs isolated in this study represent a valuable sequence resource that could assist in the development of disease resistance in banana.

5.
6.
The rice gene Xa21 represents a unique class of plant disease resistance (R) genes with distinct protein structure and broad-spectrum specificity; few sequences or genes of this class have been cloned and characterized in other plant species. Degenerate primers were designed from the conserved motifs in the kinase domains of Xa21 and tomato Pto, and used in PCR amplification to identify this class of resistance gene candidate (RGC) sequences from citrus for future evaluation of possible association with citrus canker resistance. Twenty-nine RGC sequences highly similar to the kinase domain of Xa21 (55%–60% amino-acid identity) were cloned and characterized. To facilitate recovery of full-length gene structures and to overcome RGC mapping limitations, large-insert genomic clones (BACs) were identified, fingerprinted and assembled into contigs. Southern hybridization revealed the presence of 1–3 copies of receptor-like kinase sequences (i.e., clustering) in each BAC. Some of these sequences were sampled by PCR amplification and direct sequencing. Twenty-three sequences were thus obtained and classified into five groups and eight subgroups, which indicates the possibility of enhancing RGC sequence diversity from BACs. A primer-walking strategy was employed to derive full-length gene structures from two BAC clones; both sequences 17o6RLK and 26m19RLK contained all the features of the rice Xa21 protein, including a signal peptide, the same number of leucine-rich-repeats, and transmembrane and kinase domains. These results demonstrate that PCR amplification with appropriately designed degenerate primers is an efficient approach for cloning receptor-like kinase class RGCs. Utilization of BAC clones can facilitate this approach in multiple ways by improving sequence diversity, providing full-length genes, and assisting in understanding gene structures and distribution. Communicated by P. Langridge

7.
Model organisms represent an important resource for understanding the fundamental aspects of mammalian biology. Mapping biological phenomena between model organisms is complex, and if it is to be meaningful, a simplified representation can be a powerful means of comparison. The Developmental eVOC ontologies presented here are simplified orthogonal ontologies describing the temporal and spatial distribution of developmental human and mouse anatomy. We demonstrate the ontologies by identifying genes showing a bias for developmental brain expression in human and mouse.

8.
Integration of biological networks and gene expression data using Cytoscape   Total citations: 1, self-citations: 0, citations by others: 1
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
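Step (v), identifying enriched Gene Ontology annotations, is commonly framed as a hypergeometric tail test. The sketch below shows that generic calculation only; it is not Cytoscape's API, the abstract does not say which test the protocol uses, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes by chance.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the module or network under study
    hits:       genes in the sample carrying the GO term (0 <= hits)
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: all 3 genes of a 3-gene module carry a term that
# annotates only 3 of 10 background genes.
p_value = hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=3)
```

In practice the p-values for many GO terms would also need a multiple-testing correction, which this sketch leaves out.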

9.
Gramene: development and integration of trait and gene ontologies for rice   Total citations: 1, self-citations: 0, citations by others: 1
Gramene (http://www.gramene.org/) is a comparative genome database for cereal crops and a community resource for rice. We are populating and curating Gramene with annotated rice (Oryza sativa) genomic sequence data and associated biological information including molecular markers, mutants, phenotypes, polymorphisms and Quantitative Trait Loci (QTL). In order to support queries across various data sets as well as across external databases, Gramene will employ three related controlled vocabularies. The first specific goal of Gramene is to provide a Trait Ontology (TO) that can be used across the cereal crops to facilitate phenotypic comparisons both within and between the genera. Second, a vocabulary for plant anatomy terms, the Plant Ontology (PO), will facilitate the curation of morphological and anatomical feature information with respect to expression, localization of genes and gene products and the affected plant parts in a phenotype. The TO and PO are both in the early stages of development in collaboration with the International Rice Research Institute, TAIR and MaizeDB as part of the Plant Ontology Consortium. Finally, as part of another consortium comprising macromolecular databases from other model organisms, the Gene Ontology Consortium, we are annotating the confirmed and predicted protein entries from rice using both electronic and manual curation.

10.
Chan WC  Ho MR  Li SC  Tsai KW  Lai CH  Hsu CN  Lin WC 《Genomics》2012,100(3):141-148
Recent genome-wide surveys on ncRNA have revealed that a substantial fraction of miRNA genes is likely to form clusters. However, the evolutionary and functional implications of clustered miRNAs are still elusive. After identifying clustered miRNA genes under different maximum inter-miRNA distances (MIDs), this study sought to reveal evolutionary conservation patterns among these clustered miRNA genes in metazoan species using a computational algorithm. As examples, a total of 15-35% of known and predicted miRNA genes in nine selected species constitute clusters under MIDs ranging from 1 kb to 50 kb. Intriguingly, 33 out of 37 metazoan miRNA clusters in 56 metazoan genomes are co-conserved with their up/down-stream adjacent protein-coding genes. Meanwhile, a co-expression pattern of miR-1 and miR-133a in the mir-133-1 cluster has been experimentally demonstrated. Therefore, the MetaMirClust database provides a useful bioinformatic resource for biologists to facilitate advanced interrogation of the composition of miRNA clusters and their evolutionary patterns.
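Grouping miRNA genes under a maximum inter-miRNA distance can be sketched as a single pass over position-sorted genes. The abstract does not define how the distance is measured, so this sketch assumes it is the gap between the end of one gene and the start of the next on the same chromosome and strand; the tuple format, gene names and function name are hypothetical.

```python
def cluster_by_mid(mirnas, max_distance):
    """Group miRNA genes from one chromosome/strand into clusters.

    mirnas: list of (name, start, end) tuples (hypothetical format).
    A gene joins the current cluster when the gap between the furthest
    end seen so far and its start is at most max_distance bases.
    Only groups with two or more members are returned.
    """
    clusters, current = [], []
    last_end = None
    for name, start, end in sorted(mirnas, key=lambda m: m[1]):
        if current and start - last_end > max_distance:
            # Gap too large: close the current group and start a new one.
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(name)
        last_end = end if len(current) == 1 else max(last_end, end)
    if len(current) > 1:
        clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only.
genes = [
    ("mir-a", 1000, 1080),
    ("mir-b", 3000, 3090),
    ("mir-c", 60000, 60070),
    ("mir-d", 61500, 61580),
]
tight = cluster_by_mid(genes, max_distance=1000)
medium = cluster_by_mid(genes, max_distance=5000)
loose = cluster_by_mid(genes, max_distance=60000)
```

Running the same input under several thresholds shows why the reported fraction of clustered genes depends on the MID: no clusters form at 1 kb, two form at 5 kb, and a single cluster forms at 60 kb.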

11.
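The tomato Pto gene is a candidate gene for broad-spectrum resistance that encodes a serine/threonine kinase (STK), and its cloning and characterization have laid a foundation for a deeper understanding of the disease resistance mechanism of tomato. In this study, a pair of degenerate primers designed from conserved sequences of the Pto gene was used to amplify Pto-like resistance gene homologs in the Brazilian rubber tree. A fragment of approximately 550 bp was amplified, then cloned and sequenced. Sequence analysis showed that seven of the resistance gene homologs were highly homologous to the Pto gene (BLASTX E value <3e-53), and they were therefore regarded as Pto resistance gene candidates (Pto-RGCs). Multiple sequence alignment of the rubber tree Pto-RGCs showed that these sequences contain several conserved subdomains of STKs. In addition, phylogenetic analysis indicated that the rubber tree Pto-RGCs are R genes homologous to Pto. The Pto-RGCs obtained in this study provide a useful gene resource for developing disease resistance in the Brazilian rubber tree.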

12.

Background

Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization.

Results

We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance.
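The abstract does not give the scoring formula, so the following is only a sketch of one common random-set formulation: the mean score of a gene set is standardized against the mean expected for a random set of the same size drawn without replacement. The per-gene relevance scores, the example values and the function name are hypothetical.

```python
import math


def random_set_z(all_scores, set_scores):
    """Standardized score of a gene set against random sets of equal size.

    all_scores: phenotype-relevance scores for every gene (hypothetical).
    set_scores: scores of the genes in the set of interest, for example
                the network neighbours of a candidate gene.
    """
    total, size = len(all_scores), len(set_scores)
    mean_all = sum(all_scores) / total
    var_all = sum(s * s for s in all_scores) / total - mean_all ** 2
    # Variance of the mean of `size` scores drawn without replacement
    # from `total` scores (finite-population correction).
    var_mean = (var_all / size) * (total - size) / (total - 1)
    observed = sum(set_scores) / size
    return (observed - mean_all) / math.sqrt(var_mean)


# Toy scores: half of all genes are relevant (1), half are not (0).
background = [0, 0, 0, 0, 1, 1, 1, 1]
z_high = random_set_z(background, [1, 1])
z_low = random_set_z(background, [0, 0])
```

A set made only of relevant genes scores well above zero and a set made only of irrelevant genes scores symmetrically below, so candidates can be ranked by this value.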

Conclusion

We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-315) contains supplementary material, which is available to authorized users.

13.
Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidate disease-susceptibility genes based on their biological similarity to known disease-related genes. The disease susceptibility of a gene is scored by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis.

14.
15.
16.
17.
18.

Background  

Current efforts within the biomedical ontology community focus on achieving interoperability between various biomedical ontologies that cover a range of diverse domains. Achieving this interoperability will contribute to the creation of a rich knowledge base that can be used for querying, as well as generating and testing novel hypotheses. The OBO Foundry principles, as applied to a number of biomedical ontologies, are designed to facilitate this interoperability. However, semantic extensions are required to meet the OBO Foundry interoperability goals. Inconsistencies may arise when ontologies of properties (mostly phenotype ontologies) are combined with ontologies taking a canonical view of a domain (such as many anatomical ontologies). Currently, there is no support for a correct and consistent integration of such ontologies.

19.
Chronic obstructive pulmonary disease (COPD) is a major public health problem with increasing prevalence worldwide. The primary aim of this study was to identify genes and gene ontologies associated with COPD severity. Gene expression profiling was performed on total RNA extracted from lung tissue of 18 former smokers with COPD. Class comparison analysis on mild (n = 9, FEV1 80–110% predicted) and moderate (n = 9, FEV1 50–60% predicted) COPD patients identified 46 differentially expressed genes (p<0.01), of which 14 genes were technically confirmed by quantitative real-time PCR. Biological replication in an independent test set of 58 lung samples confirmed the altered expression of ten genes with increasing COPD severity, with eight of these genes (NNMT, THBS1, HLA-DPB1, IGHD, ETS2, ELF1, PTGDS and CYRBD1) being differentially expressed by more than 1.8-fold between mild and moderate COPD, identifying these as candidate determinants of COPD severity. These genes belonged to ontologies potentially implicated in COPD including angiogenesis, cell migration, proliferation and apoptosis. Our secondary aim was to identify gene ontologies common to airway obstruction, indicated by impaired FEV1 and KCO. Using gene ontology enrichment analysis, we identified relevant biological and molecular processes, including regulation of cell-matrix adhesion, leukocyte activation, cell and substrate adhesion, cell adhesion, angiogenesis and cell activation, that are enriched among genes involved in airflow obstruction. Exploring the functional significance of these genes and their gene ontologies will provide clues to molecular changes involved in severity of COPD, which could be developed as targets for therapy or biomarkers for early diagnosis.

20.
Downy mildew (Plasmopara halstedii (Farl.) Berlese et de Toni) is a serious foliar pathogen of cultivated sunflower (Helianthus annuus L.). Genetic resistance is conditioned by several linked downy mildew resistance gene specificities in the HaRGC1 cluster of TIR-NBS-LRR resistance gene candidates (RGCs) on linkage group 8. The complexity and diversity of the HaRGC1 cluster was assessed by multilocus intron fragment length polymorphism (IFLP) genotyping using a single pair of primers flanking a hypervariable intron located between the TIR and NBS domains. Two to 23 bands were amplified per germplasm accession. The size of the included intron ranged from 89 to 858 nucleotides. Forty-eight unique markers were distinguished among 24 elite inbred lines, six partially isogenic inbred lines, nine open-pollinated populations, four Native American land races, and 20 wild H. annuus populations. Nine haplotypes (based on 24 RGCs) were identified among elite inbred lines and were correlated with known downy mildew resistance specificities. Sixteen out of 39 RGCs identified in wild H. annuus populations were not observed in elite germplasm. Five partially isogenic downy mildew resistant lines developed from wild H. annuus and H. praecox donors carried eight RGCs not found in other elite inbred lines. Twenty-four HaRGC1 loci were mapped to a 2-4 cM segment of linkage group 8. The multilocus IFLP marker and duplicated, hypervariable microsatellite markers tightly linked to the HaRGC1 cluster are powerful tools for distinguishing downy mildew resistance gene specificities and identifying and introgressing new downy mildew resistance gene specificities from wild sunflowers.
