Similar literature
 20 similar articles found; search time: 31 ms
1.
Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow for mining mutation annotations from full-text biomedical literature using natural language processing (NLP) techniques, and for their subsequent reuse in protein structure annotation and visualization. The system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed for both information aggregation and subsequent brokerage of the mutation annotations. It coordinates semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations, which are populated as instances of object and datatype properties. mSTRAPviz is a subsystem that facilitates the brokerage of structure information and the associated mutations for visualization. For mutated sequences without any corresponding structure available in the Protein Data Bank (PDB), an automated homology modeling pipeline generates a theoretical model. With mSTRAP, we demonstrate a workable system that can automate the workflow for the retrieval, extraction, processing, and visualization of mutation annotations, tasks which are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at (http://datam.i2r.a-star.edu.sg/mstrap).
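
The abstract above does not say how mSTRAP recognizes mutation mentions in text. As a minimal sketch of the general idea, assuming point mutations are written either as `A123T` or as `Ala123Thr`, a regular expression can pull out (wild type, position, mutant) triples. The function name `extract_point_mutations` and the pattern are illustrative assumptions, not the mSTRAP implementation.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Hypothetical pattern: matches "A123T" or "Ala123Thr" as whole tokens.
PATTERN = re.compile(
    rf"\b(?:([{ONE}])(\d+)([{ONE}])|({THREE})(\d+)({THREE}))\b"
)


def extract_point_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    found = []
    for m in PATTERN.finditer(text):
        if m.group(1):  # one-letter form, e.g. A123T
            wt, pos, mut = m.group(1), m.group(2), m.group(3)
        else:  # three-letter form, e.g. Ala123Thr
            wt = THREE_TO_ONE[m.group(4)]
            pos = m.group(5)
            mut = THREE_TO_ONE[m.group(6)]
        found.append((wt, int(pos), mut))
    return found
```

A pattern this simple also matches tokens that merely look like mutations (the cell line name `T47D`, for example), so a real pipeline would need further filtering, such as checking the wild-type residue against the protein sequence.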

2.
3.
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
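
The core step described above, annotating a variant with the regulatory regions that contain it, can be sketched as an interval lookup. This is only an illustration under stated assumptions: regions are given as `(chrom, start, end, region_id, genes)` tuples with 1-based closed coordinates, and the names `build_index` and `annotate_variant` are hypothetical. It does not reproduce the GREEN-DB schema or the GREEN-VARAN interface.

```python
from collections import defaultdict


def build_index(regions):
    """Group regions by chromosome and sort them by start coordinate.

    Each region is a tuple (chrom, start, end, region_id, genes).
    """
    index = defaultdict(list)
    for chrom, start, end, region_id, genes in regions:
        index[chrom].append((start, end, region_id, genes))
    for chrom in index:
        index[chrom].sort(key=lambda r: r[0])
    return index


def annotate_variant(index, chrom, pos):
    """Return the regions whose closed interval [start, end] contains pos."""
    hits = []
    for start, end, region_id, genes in index.get(chrom, []):
        if start > pos:
            break  # sorted by start, so no later region can contain pos
        if pos <= end:
            hits.append({"region": region_id, "genes": genes})
    return hits
```

For a database with millions of elements, a linear scan per variant would be slow; an interval tree or a tabix-style index would be the usual replacement for the loop in `annotate_variant`.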

4.
Linkage studies of complex traits frequently yield multiple linkage regions covering hundreds of genes. Testing each candidate gene from every region is prohibitively expensive, and computational methods that simplify this process would benefit genetic research. We present a new method based on commonality of functional annotation (CFA) that aids dissection of complex traits for which multiple causal genes act in a single pathway or process. CFA tests individual Gene Ontology (GO) terms for enrichment among candidate gene pools, adjusts for multiple hypothesis testing using an estimate of the number of independent tests based on the correlation of GO terms, and then scores and ranks genes annotated with significantly enriched terms based on the number of quantitative trait loci regions in which genes bearing those annotations appear. We evaluate CFA using simulated linkage data and show that CFA has good power despite being conservative. We apply CFA to published linkage studies investigating age-of-onset of Alzheimer's disease and body mass index and obtain previously known and new candidate genes. CFA provides a new tool for studies in which causal genes are expected to participate in a common pathway or process and can easily be extended to utilize annotation schemes in addition to the GO.
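
The abstract does not name the enrichment statistic CFA uses. As a sketch under that caveat, the one-sided hypergeometric test below is a common choice for GO term enrichment, and the adjustment multiplies the p-value by an estimated number of independent tests, in the spirit of the correction described above. The function names and the Bonferroni-style form are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the candidate pool
    hits:       candidate genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def adjust_p(p_value, effective_tests):
    """Bonferroni-style adjustment with an estimated number of independent tests."""
    return min(1.0, p_value * effective_tests)
```

With a background of 10 genes, 3 of them annotated, and a candidate pool of 3 genes that all carry the term, the tail probability is 1/120.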

5.
A uniform system for microRNA annotation   Cited by: 57 in total (self-citations: 1; citations by others: 57)
MicroRNAs (miRNAs) are small noncoding RNA gene products about 22 nt long that are processed by Dicer from precursors with a characteristic hairpin secondary structure. Guidelines are presented for the identification and annotation of new miRNAs from diverse organisms, particularly so that miRNAs can be reliably distinguished from other RNAs such as small interfering RNAs. We describe specific criteria for the experimental verification of miRNAs, and conventions for naming miRNAs and miRNA genes. Finally, an online clearinghouse for miRNA gene name assignments is provided by the Rfam database of RNA families.

6.
SUMMARY: We describe a database and information discovery system named DIG (Duke Integrated Genomics) designed to facilitate the process of gene annotation and the discovery of functional context. The DIG system collects and organizes gene annotation and functional information, and includes tools that support an understanding of genes in a functional context by providing a framework for integrating and visualizing gene expression, protein interaction and literature-based interaction networks.

7.
8.
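Zhu Yu (朱宇), Feng Chi (冯迟), Tan Huarong (谭华荣), Tian Yuqing (田宇清). Acta Microbiologica Sinica (《微生物学报》), 2013, 53(10): 1031-1042
Abstract: [Objective] To construct a reporter system for screening negative regulators that repress the expression of cryptic secondary metabolite gene clusters in Streptomyces. [Methods] Multiple genes in Streptomyces were deleted scarlessly using the "REDIRECT (Rapid Efficient Directed Recombination Time Saving)" technique combined with in vivo site-specific recombination mediated by the integrase of the Streptomyces temperate phage BT1. A reporter plasmid was constructed in which a repressed promoter from a cryptic secondary metabolite gene cluster drives inoA, a gene conserved in Streptomyces. Mutations in negative regulatory genes that repress the expression of the gene cluster were then tested to verify the feasibility of the reporter system. [Results] In Streptomyces coelicolor, we first sequentially performed scarless deletions of inoA, the key enzyme gene of the de novo inositol biosynthesis pathway, and scbR2, the pathway-specific negative regulatory gene for biosynthesis of the yellow cryptic polyketide (yCPK) antibiotic, to obtain the recipient strain for subsequent screening. We then constructed the reporter plasmid pIJ8660::PcpkO::inoA, in which inoA expression is controlled by the scbR2-repressed cpkO promoter. The silent cpkO promoter was activated in the mutant recipient strain and inoA was expressed, which restored the bald phenotype of the inoA mutant to the sporulating wild-type phenotype on medium without added inositol. [Conclusion] inoA can serve as a new reporter gene generally applicable to Streptomyces. It allows convenient screening by observing phenotypic changes and targeted detection of mutations in negative regulatory genes, and can be applied to studies on the activation of cryptic antibiotics in Streptomyces.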

9.

Background  

Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature.

10.
11.
Abstract

High-throughput methods are now routinely used to rapidly screen chemicals for potential hazard. However, hazard-based decision-making excludes important exposure considerations, resulting in an incomplete estimation of chemical safety. Models to estimate exposure exist, but are generally ill-suited to keeping up with high-throughput demands. The High-Throughput Exposure Assessment Tool (HEAT) is designed to efficiently predict near-field exposure to consumers and workers via inhalation, oral and dermal routes. HEAT is based on well-known modeling algorithms and provides default model parameters to support reasonably conservative exposure estimates. Underlying chemical-specific data are uploaded or entered by the end user. HEAT’s main strength is the flexible tiered screening functionality, which enables exposure estimates for single or multiple chemicals simultaneously. Hypothetical case examples highlighting the application of HEAT to more complex exposure estimates for alternative and aggregate assessments are provided.

12.
13.
14.
15.
Albinism is a group of inherited conditions in which affected individuals have less pigment in the eyes, skin, and hair than others of the same race and ethnic background. The prevalence of all types of albinism in the United States is estimated at 1 in 20,000, based on poor epidemiological data. X-linked Nettleship-Falls ocular albinism (XLOA, OA1) affects approximately 1/150,000 males in the population. XLOA causes reduced visual acuity and nystagmus, produces a mild skin and hair phenotype, and occurs mostly in XY males. Female carriers of XLOA have normal visual acuity, but often show iris punctate transillumination and a classic pattern of mosaic retinal pigmentation, coarse and grainy in the macula and becoming increasingly reticular into the periphery of the retinal pigment epithelium. Studies of OA1 have shown linkage of a single gene to markers at Xp22.3-p22.2. About 48% of the reported mutations in the OA1 gene are intragenic deletions and about 43% are point mutations. We present a hierarchical strategy for mutation screening in diagnostic testing for OA1 that comprises two tiers: first, multiplex PCR to detect intragenic deletions in the OA1 gene with denaturing high-performance liquid chromatography (dHPLC), and, second, heteroduplex analysis with dHPLC to scan for mutations, with subsequent sequencing of variants to confirm putative mutations in the OA1 gene. Prenatal diagnosis can be provided for families when the mutation has been firmly identified. We have validated this procedure with positive controls that were identified in patients by Southern blot, single-stranded conformation polymorphism (SSCP), and sequencing. In this hierarchical strategy, these procedures have an analytical sensitivity of > 99%.

16.
17.
The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well-defined application programming interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.

18.
19.
MOTIVATION: Phylogenomic approaches towards functional and evolutionary annotation of unknown sequences have been suggested to be superior to those based only on pairwise local alignments. User-friendly software tools making the advantages of phylogenetic annotation available for the ever widening range of bioinformatically uninitiated biologists involved in genome/EST annotation projects are, however, not available. We were particularly confronted with this issue in the annotation of sequences from different groups of complex algae originating from secondary endosymbioses, where the identification of the phylogenetic origin of genes is often more problematic than in taxa well represented in the databases (e.g. animals, plants or fungi). RESULTS: We present a flexible pipeline with a user-friendly, interactive graphical user interface running on desktop computers that automatically performs a basic local alignment search tool (BLAST) search of query sequences, selects a representative subset of the hits, then creates a multiple alignment from the selected sequences, and finally computes a phylogenetic tree. The pipeline, named PhyloGena, uses public domain software for all standard bioinformatics tasks (similarity search, multiple alignment, and phylogenetic reconstruction). As the major technological innovation, selection of a meaningful subset of BLAST hits was implemented using logic programming, mimicking the selection procedure. Results (BLAST tables, multiple alignments and phylogenetic trees) are displayed graphically, allowing the user to interact with the pipeline and deduce the function and phylogenetic origin of the query. PhyloGena thus makes phylogenomic annotation available also for those biologists without access to large computing facilities and with little informatics background. Although phylogenetic annotation is particularly useful when working with composite genomes (e.g. from complex algae), PhyloGena can be helpful in expressed sequence tag and genome annotation also in other organisms. AVAILABILITY: PhyloGena (executables for LINUX and Windows 2000/XP as well as source code) is available by anonymous ftp from http://www.awi.de/en/phylogena.
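
The abstract states that PhyloGena selects BLAST hits with logic programming rules, but it does not list those rules. The sketch below therefore swaps in a plain Python heuristic to illustrate the selection step only: keep the best-scoring hit per taxon among hits that pass an E-value cutoff, then cap the number of sequences passed on to the alignment. The rule itself, the field names, and the default thresholds are all hypothetical.

```python
def select_representative_hits(hits, max_hits=10, max_evalue=1e-5):
    """Pick at most one hit per taxon, preferring higher bit scores.

    hits: list of dicts with keys 'id', 'taxon', 'evalue', 'bitscore'.
    Returns the selected hits sorted by descending bit score.
    """
    best_per_taxon = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue  # too weak to be informative
        current = best_per_taxon.get(hit["taxon"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best_per_taxon[hit["taxon"]] = hit
    ranked = sorted(
        best_per_taxon.values(), key=lambda h: h["bitscore"], reverse=True
    )
    return ranked[:max_hits]
```

Limiting each taxon to one sequence keeps the downstream tree small and taxonomically diverse, which is the stated purpose of choosing a representative subset rather than taking the top hits outright.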

20.
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. METHODOLOGY: The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.
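
The abstract reports overlap percentages between annotations without defining how genes were matched. One simple way such a figure could be computed, assuming genes count as shared only when contig, start, end, and strand are identical, is sketched below. The function name and the matching criterion are assumptions, not the evaluation protocol used for AGeS.

```python
def gene_overlap_percent(predicted, reference):
    """Percentage of reference genes that also appear in predicted.

    Genes are (contig, start, end, strand) tuples and must match exactly.
    """
    reference = set(reference)
    if not reference:
        return 0.0
    shared = reference & set(predicted)
    return 100.0 * len(shared) / len(reference)
```

Exact coordinate matching is strict; comparisons that tolerate differing start codons often match on the stop coordinate and strand only, which would give a higher figure for the same pair of annotations.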
