首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, we present RuleMiner, a knowledge system to facilitate a seamless integration of multi-sequence analysis tools and define profile-based rules for supporting high-throughput protein function annotations. This system consists of three essential components, Protein Function Groups (PFGs), PFG profiles and rules. The PFGs, established from an integrated analysis of current knowledge of protein functions from Swiss-Prot database and protein family-based sequence classifications, cover all possible cellular functions available in the database. The PFG profiles illustrate detailed protein features in the PFGs as in sequence conservations, the occurrences of sequence-based motifs, domains and species distributions. The rules, extracted from the PFG profiles, describe the clear relationships between these PFGs and all possible features. As a result, the RuleMiner is able to provide an enhanced capability for protein function analysis, such as results from the integrated sequence analysis tools for given proteins can be comparatively analyzed due to the clear feature-PFG relationships. Also, much needed guidance is readily available for such analysis. If the rules describe one-to-one (unique) relationships between the protein features and the PFGs, then these features can be utilized as unique functional identifiers and cellular functions of unknown proteins can be reliably determined. Otherwise, additional information has to be provided.  相似文献   

2.

Background  

Studies of the structure-function relationship in proteins for which no 3D structure is available are often based on inspection of multiple sequence alignments. Many functionally important residues of proteins can be identified because they are conserved during evolution. However, residues that vary can also be critically important if their variation is responsible for diversity of protein function and improved phenotypes. If too few sequences are studied, the support for hypotheses on the role of a given residue will be weak, but analysis of large multiple alignments is too complex for simple inspection. When a large body of sequence and functional data are available for a protein family, mature data mining tools, such as machine learning, can be applied to extract information more easily, sensitively and reliably. We have undertaken such an analysis of voltage-gated potassium channels, a transmembrane protein family whose members play indispensable roles in electrically excitable cells.  相似文献   

3.
Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.  相似文献   

4.
Surface proteins, such as those located in the cell wall of fungi, play an important role in the interaction with the surrounding environment. For instance, they mediate primary host-pathogen interactions and are crucial to the establishment of biofilms and fungal infections. Surface localization of proteins is determined by specific sequence features and can be predicted by combining different freely available web servers. However, user-friendly tools that allow rapid analysis of large datasets (whole proteomes or larger) in subsequent analyses were not yet available. Here, we present the web tool ProFASTA, which integrates multiple tools for rapid scanning of protein sequence properties in large datasets and returns sequences in FASTA format. ProFASTA also allows for pipeline filtering of proteins with cell surface characteristics by analysis of the output created with SignalP, TMHMM and big-PI. In addition, it provides keyword, iso-electric point, composition and pattern scanning. Furthermore, ProFASTA contains all fungal protein sequences present in the NCBI Protein database. As the full fungal NCBI Taxonomy is included, sequence subsets can be selected by supplying a taxon name. The usefulness of ProFASTA is demonstrated here with a few examples; in the recent past, ProFASTA has already been applied successfully to the annotation of covalently-bound fungal wall proteins as part of community-wide genome annotation programs. ProFASTA is available at: http://www.bioinformatics.nl/tools/profasta/.  相似文献   

5.
P Y Muller  E Studer  A R Miserez 《BioTechniques》2001,31(6):1306, 1308, 1310-1306, 1308, 1313
In all fields of molecular biology, researchers are increasingly challenged by experiments planned and evaluated on the basis of nucleic acid and protein sequence data generally retrieved from public databases. Despite the wide spectrum of available Web-based software tools for sequence analysis, the routine use of these tools has disadvantages, particularly because of the elaborate and heterogeneous ways of data input, output, and storage. Here we present a Visual Basic-encoded Microsoft Word Add-In, the Molecular BioComputing Suite (MBCS), available at the BioTechniques Software Library (www.BioTechniques.com). The MBCS software aims to manage and expedite a wide range of sequence analyses and manipulations using an integrated text editor environment including menu-guided commands. Its independence of sequence formats enables MBCS to be used as a pivotal application between other software tools for sequence analysis, manipulation, annotation, and editing.  相似文献   

6.
With the large influx of raw sequence data from genome sequencing projects, there is a need for reliable automatic methods for protein sequence analysis and classification. The most useful tools use various methods for identifying motifs or domains found in previously characterized protein families. This article reviews the tools and resources available on the web for identifying signatures within proteins and discusses how they may be used in the analysis of new or unknown protein sequences.  相似文献   

7.
MMDB: Entrez's 3D structure database.   总被引:5,自引:1,他引:4       下载免费PDF全文
The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The goal of Entrez's 3D-structure database is to make this information accessible and usable by molecular biologists (http://www.ncbi.nlm.nih.gov/Entrez). To this end Entrez provides two major analysis tools, a search engine based on sequence and structure 'neighboring' and an integrated visualization system for sequence and structure alignments. From a protein's sequence 'neighbors' one may rapidly identify other members of a protein family, including those where 3D structure is known. By comparing aligned sequences and/or structures in detail, using the visualization system, one may identify conserved features and perhaps infer functional properties. Here we describe how these analysis tools may be used to investigate the structure and function of newly discovered proteins, using the PTEN gene product as an example.  相似文献   

8.
INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements using probabilistic sequence models and Gibbs sampling. All tools are available via different web pages and as web services. The web pages are connected and integrated to reflect a methodology and facilitate complex analysis using different tools. The web services can be invoked using standard SOAP messaging. Example clients are available for download to invoke the services from a remote computer or to be integrated with other applications. All services are catalogued and described in a web service registry. The INCLUSive web portal is available for academic purposes at http://www.esat.kuleuven.ac.be/inclusive.  相似文献   

9.
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.  相似文献   

10.
微生物基因组注释系统MGAP   总被引:6,自引:0,他引:6  
利用生物信息学方法和工具开发了微生物基因组注释系统(Microbial genome annotation package, MGAP),并用于蓝细菌PCC7002的基因组注释。该系统由基因组注释系统和基于Web的用户接口程序两部分组成。基因组注释系统整合多个基因识别、功能预测和序列分析软件;以及蛋白质序列数据库、蛋白质资源信息系统和直系同源蛋白质家族数据库等。用户接口程序包括基因组环状图展示、基因和开放读码框在染色体上的分布图,以及注释信息检索工具。该系统基于PC微机和Linux操作系统,用MySQL作数据库管理系统、用Apache作Web服务器程序,用Perl脚本语言编写应用程序接口,上述软件均可免费获得。  相似文献   

11.
《Genomics》2020,112(6):4561-4566
BackgroundBioinformatics tools are of great significance and are used in different spheres of life sciences. There are wide variety of tools available to perform primary analysis of DNA and protein but most of them are available on different platforms and many remain undetected. Accessing these tools separately to perform individual task is uneconomical and inefficient.ObjectiveOur aim is to bring different bioinformatics models on a single platform to ameliorate scientific research. Hence, our objective is to make a tool for comprehensive DNA and protein analysis.MethodsTo develop a reliable, straight-forward and standalone desktop application we used state of the art python packages and libraries. Bioinformatics Mini Toolbox (BMT) is combination of seven tools including FastqTrimmer, Gene Prediction, DNA Analysis, Translation, Protein analysis and Pairwise and Multiple alignment.ResultsFastqTrimmer assists in quality assurance of NGS data. Gene prediction predicts the genes by homology from novel genome on the basis of reference sequence. Protein analysis and DNA analysis calculates physiochemical properties of nucleotide and protein sequences, respectively. Translation translates the DNA sequence into six open reading frames. Pairwise alignment performs pairwise global and local alignment of DNA and protein sequences on the basis or multiple matrices. Multiple alignment aligns multiple sequences and generates a phylogenetic tree.ConclusionWe developed a tool for comprehensive DNA and protein analysis. The link to download BMT is https://github.com/nasiriqbal012/BMT_SETUP.git  相似文献   

12.
The PE and PPE protein family are unique to mycobacteria. Though the complete genome sequences for over 500 M. tuberculosis strains and mycobacterial species are available, few PE and PPE proteins have been structurally and functionally characterized. We have therefore used bioinformatics tools to characterize the structure and function of these proteins. We selected representative members of the PE and PPE protein family by phylogeny analysis and using structure-based sequence annotation identified ten well-characterized protein domains of known function. Some of these domains were observed to be common to all mycobacterial species and some were species specific.  相似文献   

13.
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.  相似文献   

14.
An extensive analysis of C. dubliniensis proteomics data showed that ~ 22% protein are conserved hypothetical proteins (HPs) whose function is still not determined precisely. Analysis of gene sequence of HPs provides a platform to establish sequence–function relationships to a more profound understanding of the molecular machinery of organisms at systems level. Here we have combined the latest versions of bioinformatics tools including, protein family, motifs, intrinsic features from the amino acid sequence, sequence–function relationship, pathway analysis, etc. to assign a precise function to HPs for which no any experimental information is available. Our results show that 27 HPs have well defined functions and we categorized them as enzyme, nucleic acid binding, transport protein, etc. Five HPs showed adhesin character that is likely to be essential for the survival of yeast and pathogenesis. We also addressed issues related to the sub-cellular localization and signal peptide identification which provides an idea about its colocalization and function. The outcome of the present study may facilitate better understanding of mechanism of virulence, drug resistance, pathogenesis, adaptability to host, tolerance for host immune response, and drug discovery for treatment of C. dubliniensis infections.  相似文献   

15.
The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. The latest protein structure and comparison tools available at SGD are presented here. With the completion of the yeast sequence and the Caenorhabditis elegans sequence soon to follow, comparison of proteins from complete eukaryotic proteomes will be an extremely powerful way to learn more about a particular protein's structure, its function, and its relationships with other proteins. SGD can be accessed through the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/  相似文献   

16.
The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/ ) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB ( http://sbkb.org ), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a functional sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.  相似文献   

17.
GeneReporter is a web tool that reports functional information and relevant literature on a protein-coding sequence of interest. Its purpose is to support both manual genome annotation and document retrieval. PubMed references corresponding to a sequence are detected by the extraction of query words from UniProt entries of homologous sequences. Data on protein families, domains, potential cofactors, structure, function, cellular localization, metabolic contribution and corresponding DNA binding sites complement the information on a given gene product of interest. Availability and implementation: GeneReporter is available at http://www.genereporter.tu-bs.de. The web site integrates databases and analysis tools as SOAP-based web services from the EBI (European Bioinformatics Institute) and NCBI (National Center for Biotechnology Information).  相似文献   

18.
In recent years, numerous biocomputational tools have been designed to extract functional and evolutionary information from multiple sequence alignments (MSAs) of proteins and genes. Most biologists working actively on the characterization of proteins from a single or family perspective use the MSA analysis to retrieve valuable information about amino acid conservation and the functional role of residues in query protein(s). In MSAs, adjustment of alignment parameters is a key point to improve the quality of MSA output. However, this issue is frequently underestimated and/or misunderstood by scientists and there is no in-depth knowledge available in this field. This brief review focuses on biocomputational approaches complementary to MSA to help distinguish functional residues in protein families. These additional analyses involve issues ranging from phylogenetic to statistical, which address the detection of amino acids pivotal for protein function at any level. In recent years, a large number of tools has been designed for this very purpose. Using some of these relevant, useful tools, we have designed a practical pipeline to perform in silico studies with a view to improving the characterization of family proteins and their functional residues. This review-guide aims to present biologists a set of specially designed tools to study proteins. These tools are user-friendly as they use web servers or easy-to-handle applications. Such criteria are essential for this review as most of the biologists (experimentalists) working in this field are unfamiliar with these biocomputational analysis approaches.  相似文献   

19.
Lin HN  Notredame C  Chang JM  Sung TY  Hsu WL 《PloS one》2011,6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently.In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experiment results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non- similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号