1.

RuleMiner: a knowledge system for supporting high-throughput protein function annotations

Yu GX 《Journal of bioinformatics and computational biology》2004,2(4):615-637

In this paper, we present RuleMiner, a knowledge system that facilitates seamless integration of multiple sequence-analysis tools and defines profile-based rules to support high-throughput protein function annotation. The system consists of three essential components: Protein Function Groups (PFGs), PFG profiles and rules. The PFGs, established from an integrated analysis of current knowledge of protein functions from the Swiss-Prot database and protein family-based sequence classifications, cover all cellular functions available in the database. The PFG profiles describe detailed protein features in the PFGs, such as sequence conservation, the occurrence of sequence-based motifs and domains, and species distributions. The rules, extracted from the PFG profiles, describe the relationships between these PFGs and all possible features. As a result, RuleMiner provides an enhanced capability for protein function analysis: because the feature-PFG relationships are explicit, results from the integrated sequence analysis tools for given proteins can be compared directly, and much-needed guidance is readily available for such analysis. If the rules describe one-to-one (unique) relationships between protein features and PFGs, these features can be used as unique functional identifiers and the cellular functions of unknown proteins can be reliably determined. Otherwise, additional information has to be provided.
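The one-to-one rule idea described in this abstract can be illustrated with a minimal sketch. This is not RuleMiner's implementation: the function names, the toy profiles and the feature labels below are all hypothetical, and the sketch only shows how a feature that maps to exactly one PFG could act as a unique functional identifier.

```python
from collections import defaultdict


def build_rules(profiles):
    """Invert PFG -> features profiles into feature -> set-of-PFGs rules."""
    rules = defaultdict(set)
    for pfg, features in profiles.items():
        for feature in features:
            rules[feature].add(pfg)
    return dict(rules)


def annotate(protein_features, rules):
    """Return the PFGs supported by one-to-one (unique) features.

    An empty result means no unique identifier was found and
    additional information would be needed.
    """
    hits = set()
    for feature in protein_features:
        pfgs = rules.get(feature, set())
        if len(pfgs) == 1:  # feature points to exactly one PFG
            hits |= pfgs
    return hits


# Hypothetical toy profiles: domain_X is shared, the motifs are unique.
profiles = {
    "kinase": {"motif_A", "domain_X"},
    "phosphatase": {"motif_B", "domain_X"},
}
rules = build_rules(profiles)
print(annotate({"motif_A", "domain_X"}, rules))
print(annotate({"domain_X"}, rules))
```

In this toy case the shared feature `domain_X` alone yields no annotation, which mirrors the abstract's point that non-unique relationships require additional information.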

2.

Computational identification of residues that modulate voltage sensitivity of voltage-gated potassium channels

Bin Li Warren J Gallin 《BMC structural biology》2005,5(1):16

Background

Studies of the structure-function relationship in proteins for which no 3D structure is available are often based on inspection of multiple sequence alignments. Many functionally important residues of proteins can be identified because they are conserved during evolution. However, residues that vary can also be critically important if their variation is responsible for diversity of protein function and improved phenotypes. If too few sequences are studied, the support for hypotheses on the role of a given residue will be weak, but analysis of large multiple alignments is too complex for simple inspection. When a large body of sequence and functional data are available for a protein family, mature data mining tools, such as machine learning, can be applied to extract information more easily, sensitively and reliably. We have undertaken such an analysis of voltage-gated potassium channels, a transmembrane protein family whose members play indispensable roles in electrically excitable cells.

3.

In Silico Characterization of Proteins: UniProt, InterPro and Integr8

Mulder NJ Kersey P Pruess M Apweiler R 《Molecular biotechnology》2008,38(2):165-177

Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.

4.

ProFASTA: a pipeline web server for fungal protein scanning with integration of cell surface prediction software

de Groot PW Brandt BW 《Fungal genetics and biology : FG & B》2012,49(2):173-179

Surface proteins, such as those located in the cell wall of fungi, play an important role in the interaction with the surrounding environment. For instance, they mediate primary host-pathogen interactions and are crucial to the establishment of biofilms and fungal infections. Surface localization of proteins is determined by specific sequence features and can be predicted by combining different freely available web servers. However, user-friendly tools that allow rapid analysis of large datasets (whole proteomes or larger) in subsequent analyses were not yet available. Here, we present the web tool ProFASTA, which integrates multiple tools for rapid scanning of protein sequence properties in large datasets and returns sequences in FASTA format. ProFASTA also allows for pipeline filtering of proteins with cell surface characteristics by analysis of the output created with SignalP, TMHMM and big-PI. In addition, it provides keyword, iso-electric point, composition and pattern scanning. Furthermore, ProFASTA contains all fungal protein sequences present in the NCBI Protein database. As the full fungal NCBI Taxonomy is included, sequence subsets can be selected by supplying a taxon name. The usefulness of ProFASTA is demonstrated here with a few examples; in the recent past, ProFASTA has already been applied successfully to the annotation of covalently-bound fungal wall proteins as part of community-wide genome annotation programs. ProFASTA is available at: http://www.bioinformatics.nl/tools/profasta/.

5.

Molecular Biocomputing Suite: a word processor add-in for the analysis and manipulation of nucleic acid and protein sequence data.

P Y Muller E Studer A R Miserez 《BioTechniques》2001,31(6):1306, 1308, 1310-1313

In all fields of molecular biology, researchers are increasingly challenged by experiments planned and evaluated on the basis of nucleic acid and protein sequence data generally retrieved from public databases. Despite the wide spectrum of available Web-based software tools for sequence analysis, the routine use of these tools has disadvantages, particularly because of the elaborate and heterogeneous ways of data input, output, and storage. Here we present a Visual Basic-encoded Microsoft Word Add-In, the Molecular BioComputing Suite (MBCS), available at the BioTechniques Software Library (www.BioTechniques.com). The MBCS software aims to manage and expedite a wide range of sequence analyses and manipulations using an integrated text editor environment including menu-guided commands. Its independence of sequence formats enables MBCS to be used as a pivotal application between other software tools for sequence analysis, manipulation, annotation, and editing.

6.

Tools and resources for identifying protein families, domains and motifs

Nicola J Mulder Rolf Apweiler 《Genome biology》2001,3(1):1-8

With the large influx of raw sequence data from genome sequencing projects, there is a need for reliable automatic methods for protein sequence analysis and classification. The most useful tools use various methods for identifying motifs or domains found in previously characterized protein families. This article reviews the tools and resources available on the web for identifying signatures within proteins and discusses how they may be used in the analysis of new or unknown protein sequences.

7.

MMDB: Entrez's 3D structure database. Total citations: 5, self-citations: 1, cited by others: 4

A Marchler-Bauer K J Addess C Chappey L Geer T Madej Y Matsuo Y Wang S H Bryant 《Nucleic acids research》1999,27(1):240-243

The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The goal of Entrez's 3D-structure database is to make this information accessible and usable by molecular biologists (http://www.ncbi.nlm.nih.gov/Entrez). To this end Entrez provides two major analysis tools, a search engine based on sequence and structure 'neighboring' and an integrated visualization system for sequence and structure alignments. From a protein's sequence 'neighbors' one may rapidly identify other members of a protein family, including those where 3D structure is known. By comparing aligned sequences and/or structures in detail, using the visualization system, one may identify conserved features and perhaps infer functional properties. Here we describe how these analysis tools may be used to investigate the structure and function of newly discovered proteins, using the PTEN gene product as an example.

8.

INCLUSive: A web portal and service registry for microarray and regulatory sequence analysis

Coessens B Thijs G Aerts S Marchal K De Smet F Engelen K Glenisson P Moreau Y Mathys J De Moor B 《Nucleic acids research》2003,31(13):3468-3470

INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services. The web pages are connected and integrated to reflect a methodology and facilitate complex analysis using different tools. The web services can be invoked using standard SOAP messaging. Example clients are available for download to invoke the services from a remote computer or to be integrated with other applications. All services are catalogued and described in a web service registry. The INCLUSive web portal is available for academic purposes at http://www.esat.kuleuven.ac.be/inclusive.

9.

Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models

Yu-Yen Ou Quang-Thai Ho Heng-Ta Chang 《Proteomics》2023,23(23-24):2200494

Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.
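As an illustration of the "traditional" features this abstract mentions, the following minimal sketch computes amino acid composition (20 values) and pair, or dipeptide, composition (400 values) for one sequence. The function names are hypothetical and the review itself does not prescribe this code; it assumes the 20 standard residues and simply ignores pairs containing any other character.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pair_composition(seq):
    """Fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must have at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping adjacent pairs
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard characters
            counts[pair] += 1
    return [counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


features = aa_composition("ACAC") + pair_composition("ACAC")
print(len(features))
```

Both vectors have a fixed length regardless of sequence length, which is what makes them convenient classifier inputs, and also why they cannot capture the higher-order patterns the abstract refers to.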

10.

MGAP: a microbial genome annotation system. Total citations: 6, self-citations: 0, cited by others: 6

Yu Zhou, Li Tao, Cai Tao, Zhao Jindong, Luo Jingchu 《Acta Microbiologica Sinica》2003,43(6):805-808

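Using bioinformatics methods and tools, we developed the Microbial Genome Annotation Package (MGAP) and applied it to the genome annotation of the cyanobacterium PCC7002. The system consists of two parts: a genome annotation system and a Web-based user interface. The genome annotation system integrates multiple programs for gene identification, function prediction and sequence analysis, together with a protein sequence database, a protein information resource system and a database of orthologous protein families. The user interface includes a circular genome map display, maps of the distribution of genes and open reading frames along the chromosome, and a tool for searching annotation information. The system runs on a PC under the Linux operating system, with MySQL as the database management system, Apache as the Web server and application programming interfaces written in Perl; all of this software is freely available.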

11.

BMT: Bioinformatics mini toolbox for comprehensive DNA and protein analysis

《Genomics》2020,112(6):4561-4566

Background: Bioinformatics tools are of great significance and are used in different spheres of the life sciences. A wide variety of tools is available for primary analysis of DNA and protein, but most of them are hosted on different platforms and many remain undiscovered. Accessing these tools separately to perform individual tasks is uneconomical and inefficient. Objective: Our aim is to bring different bioinformatics models together on a single platform to support scientific research. Hence, our objective is to build a tool for comprehensive DNA and protein analysis. Methods: To develop a reliable, straightforward, standalone desktop application, we used state-of-the-art Python packages and libraries. The Bioinformatics Mini Toolbox (BMT) is a combination of seven tools: FastqTrimmer, Gene Prediction, DNA Analysis, Translation, Protein Analysis, Pairwise Alignment and Multiple Alignment. Results: FastqTrimmer assists in quality assurance of NGS data. Gene Prediction predicts genes in a novel genome by homology to a reference sequence. DNA Analysis and Protein Analysis calculate physicochemical properties of nucleotide and protein sequences, respectively. Translation translates the DNA sequence in all six reading frames. Pairwise Alignment performs global and local pairwise alignment of DNA and protein sequences on the basis of multiple matrices. Multiple Alignment aligns multiple sequences and generates a phylogenetic tree. Conclusion: We developed a tool for comprehensive DNA and protein analysis. The link to download BMT is https://github.com/nasiriqbal012/BMT_SETUP.git
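Six-frame translation, as named in this abstract, can be sketched in a few lines of plain Python. This is not BMT's code: the function names are hypothetical, the sketch assumes the standard genetic code and an uppercase or lowercase A/C/G/T input, and any codon containing another character is written as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(name, protein)
```

Trailing bases that do not fill a complete codon are dropped in each frame, a common convention that a real tool might handle differently.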

12.

Prediction of Certain Well-Characterized Domains of Known Functions within the PE and PPE Proteins of Mycobacteria

Rafiya Sultana Karunakar Tanneeru Ashwin B. R. Kumar Lalitha Guruprasad 《PloS one》2016,11(2)

The PE and PPE protein family are unique to mycobacteria. Though the complete genome sequences for over 500 M. tuberculosis strains and mycobacterial species are available, few PE and PPE proteins have been structurally and functionally characterized. We have therefore used bioinformatics tools to characterize the structure and function of these proteins. We selected representative members of the PE and PPE protein family by phylogeny analysis and using structure-based sequence annotation identified ten well-characterized protein domains of known function. Some of these domains were observed to be common to all mycobacterial species and some were species specific.

13.

A face in the crowd: recognizing peptides through database search

Eng JK Searle BC Clauser KR Tabb DL 《Molecular & cellular proteomics : MCP》2011,10(11):R111.009522

Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.

14.

Functional annotation of putative hypothetical proteins from Candida dubliniensis

Kundan Kumar Amresh Prakash Munazzah Tasleem Asimul Islam Faizan Ahmad Md. Imtaiyaz Hassan 《Gene》2014

An extensive analysis of C. dubliniensis proteomics data showed that ~22% of proteins are conserved hypothetical proteins (HPs) whose function has not yet been precisely determined. Analysis of the gene sequences of HPs provides a platform for establishing sequence–function relationships and thus a more profound understanding of the molecular machinery of organisms at the systems level. Here we have combined the latest versions of bioinformatics tools covering protein families, motifs, intrinsic features of the amino acid sequence, sequence–function relationships, pathway analysis, etc. to assign a precise function to HPs for which no experimental information is available. Our results show that 27 HPs have well-defined functions, and we categorized them as enzymes, nucleic acid binding proteins, transport proteins, etc. Five HPs showed adhesin character that is likely to be essential for the survival of the yeast and for pathogenesis. We also addressed sub-cellular localization and signal peptide identification, which give an idea of each protein's colocalization and function. The outcome of the present study may facilitate better understanding of the mechanism of virulence, drug resistance, pathogenesis, adaptability to the host, tolerance of the host immune response, and drug discovery for the treatment of C. dubliniensis infections.

15.

Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure. Total citations: 2, self-citations: 0, cited by others: 2

S A Chervitz E T Hester C A Ball K Dolinski S S Dwight M A Harris G Juvik A Malekian S Roberts T Roe C Scafe M Schroeder G Sherlock S Weng Y Zhu J M Cherry D Botstein 《Nucleic acids research》1999,27(1):74-78

The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. The latest protein structure and comparison tools available at SGD are presented here. With the completion of the yeast sequence and the Caenorhabditis elegans sequence soon to follow, comparison of proteins from complete eukaryotic proteomes will be an extremely powerful way to learn more about a particular protein's structure, its function, and its relationships with other proteins. SGD can be accessed through the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/

16.

The Protein Structure Initiative Structural Biology Knowledgebase Technology Portal: a structural biology web resource

Gifford LK Carter LG Gabanyi MJ Berman HM Adams PD 《Journal of structural and functional genomics》2012,13(2):57-62

The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/ ) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB ( http://sbkb.org ), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.

17.

GeneReporter--sequence-based document retrieval and annotation

Bartsch A Bunk B Haddad I Klein J Münch R Johl T Kärst U Jänsch L Jahn D Retter I 《Bioinformatics (Oxford, England)》2011,27(7):1034-1035

GeneReporter is a web tool that reports functional information and relevant literature on a protein-coding sequence of interest. Its purpose is to support both manual genome annotation and document retrieval. PubMed references corresponding to a sequence are detected by the extraction of query words from UniProt entries of homologous sequences. Data on protein families, domains, potential cofactors, structure, function, cellular localization, metabolic contribution and corresponding DNA binding sites complement the information on a given gene product of interest. Availability and implementation: GeneReporter is available at http://www.genereporter.tu-bs.de. The web site integrates databases and analysis tools as SOAP-based web services from the EBI (European Bioinformatics Institute) and NCBI (National Center for Biotechnology Information).

18.

A practical guide for the computational selection of residues to be experimentally characterized in protein families

Benítez-Páez A Cárdenas-Brito S Gutiérrez AJ 《Briefings in bioinformatics》2012,13(3):329-336

In recent years, numerous biocomputational tools have been designed to extract functional and evolutionary information from multiple sequence alignments (MSAs) of proteins and genes. Most biologists working actively on the characterization of proteins from a single or family perspective use the MSA analysis to retrieve valuable information about amino acid conservation and the functional role of residues in query protein(s). In MSAs, adjustment of alignment parameters is a key point to improve the quality of MSA output. However, this issue is frequently underestimated and/or misunderstood by scientists and there is no in-depth knowledge available in this field. This brief review focuses on biocomputational approaches complementary to MSA to help distinguish functional residues in protein families. These additional analyses involve issues ranging from phylogenetic to statistical, which address the detection of amino acids pivotal for protein function at any level. In recent years, a large number of tools has been designed for this very purpose. Using some of these relevant, useful tools, we have designed a practical pipeline to perform in silico studies with a view to improving the characterization of family proteins and their functional residues. This review-guide aims to present biologists a set of specially designed tools to study proteins. These tools are user-friendly as they use web servers or easy-to-handle applications. Such criteria are essential for this review as most of the biologists (experimentalists) working in this field are unfamiliar with these biocomputational analysis approaches.

19.

Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words

Lin HN Notredame C Chang JM Sung TY Hsu WL 《PloS one》2011,6(12):e27872

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in how they measure similarity between aligned residues: substitution matrices, Fourier transform, sophisticated profile-profile functions or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.

20.

Comparing thousands of circular genomes using the CGView Comparison Tool. Total citations: 1, self-citations: 0, cited by others: 1

Grant JR Arantes AS Stothard P 《BMC genomics》2012,13(1):202
