Similar articles
Found 20 similar articles; search took 0 ms.
1.
ProteoMix is a suite of Java programs for identifying, annotating and predicting regions of interest in large sets of amino acid sequences according to systematic and consistent criteria. It is based on two concepts: (1) integrating results from different sequence analysis tools increases prediction reliability; and (2) the integration protocol is critical and needs to be easily adaptable case by case. ProteoMix was designed to analyze multiple protein sequences simultaneously using several bioinformatics tools, merge the results of the analyses using logical functions, and display them in an integrated viewer. In addition, new sequences can be added seamlessly to an analysis performed on an initial set of sequences. ProteoMix has a modular design, and bioinformatics tools are run on remote servers accessed over the Internet using the Simple Object Access Protocol (SOAP), ensuring the swift implementation of additional tools. ProteoMix has a user-friendly interactive graphical user interface and runs on PCs with a Microsoft OS. AVAILABILITY: ProteoMix is freely available for academic users at http://bio.gsc.riken.jp/ProteoMix/
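
The idea of merging tool outputs "using logical functions" can be illustrated with a minimal sketch. It assumes each tool reports a per-residue boolean mask over the same sequence; the function names `merge_predictions` and `mask_to_regions` are hypothetical and are not part of ProteoMix.

```python
# Hypothetical sketch: combine per-residue predictions from two tools
# with a logical function, then report the merged regions.

def merge_predictions(a, b, op="and"):
    """Combine two equal-length boolean per-residue masks."""
    if len(a) != len(b):
        raise ValueError("masks must cover the same sequence length")
    if op == "and":
        return [x and y for x, y in zip(a, b)]
    if op == "or":
        return [x or y for x, y in zip(a, b)]
    raise ValueError(f"unsupported operator: {op}")


def mask_to_regions(mask):
    """Convert a boolean mask into 1-based inclusive (start, end) regions."""
    regions, start = [], None
    for i, flag in enumerate(mask, start=1):
        if flag and start is None:
            start = i                       # a region opens here
        elif not flag and start is not None:
            regions.append((start, i - 1))  # the region closed at the previous residue
            start = None
    if start is not None:                   # region runs to the end of the sequence
        regions.append((start, len(mask)))
    return regions
```

With "and", only residues flagged by both tools survive, which favours reliability; with "or", any flagged residue is kept, which favours coverage. Choosing between them per analysis is one way to read the abstract's "adaptable integration protocol".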

2.

Background  

The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data, and thus enables a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families.

3.
To classify proteins into functional families based on their primary sequences, popular algorithms such as the k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be performed very accurately. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovers during the training process can be biologically meaningful.

4.

5.
We propose a specification language, ProML, for protein sequences, structures, and families based on the open XML standard. The language allows for portable, system-independent, machine-parsable and human-readable representation of essential features of proteins. The language is of immediate use for several bioinformatics applications: we discuss clustering of proteins into families and the representation of the specific shared features of the respective clusters. Moreover, we use ProML for specification of data used in fold recognition benchmarks exploiting experimentally derived distance constraints.

6.
ANTHEPROT is a fully interactive program devoted to the analysis of protein structures using a graphics workstation. It presents four options. The first option can predict secondary structures using five methods, and hydrophobicity, solvent accessibility, flexibility and antigenicity profiles using eighteen scales. Users may introduce their own scales. The results displayed on the screen can be easily analyzed. The second option represents results for up to eight proteins obtained by one method. To compare these proteins, it is possible to align the profiles or the predicted secondary structure according to various motifs. The secondary structure deduced from crystallographic data may also be introduced. The third option is designed to compare the primary structure of two proteins and to visualize on the screen regions that exhibit similarity. Six different comparison matrices may be used, but users can also introduce their own matrices. The last option is for studying the proteolytic peptides resulting from a chemical or enzymatic digestion of a given protein. It is possible to analyze the protein cleavage using eleven chemical reagents or enzymes. The results are displayed on the screen as an RP-HPLC chromatogram.

7.
Software tools for analyzing pairwise alignments of long sequences. Total citations: 3 (self-citations: 1; citations by others: 3)
Pairwise comparison of long stretches of genomic DNA sequence can identify regions conserved across species, which often indicate functional significance. However, the novel insights frequently must be winnowed from a flood of information; for instance, running an alignment program on two 50-kilobase sequences might yield over a hundred pages of alignments. Direct inspection of such a volume of printed output is infeasible, or at best highly undesirable, and computer tools are needed to summarize the information, to assist in its analysis, and to report the findings. This paper describes two such software tools. One tool prepares publication-quality pictorial representations of alignments, while another facilitates interactive browsing of pairwise alignment data. Their effectiveness is illustrated by comparing the beta-like globin gene clusters between humans and rabbits. A second example compares the chloroplast genomes of tobacco and liverwort.

8.
Tan YH, Huang H, Kihara D. Proteins, 2006, 64(3): 587-600
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key to successful protein structure prediction. Its importance has recently increased in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself, thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform them in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

9.
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il
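
Grouping by transitive relatedness at several confidence levels can be sketched with a union-find structure. This is plain single-linkage clustering under an assumed similarity score in [0, 1], not the restricted transitivity that ProtoMap applies; the function name and the score scale are hypothetical.

```python
# Hypothetical sketch: every two members of a group are directly or
# transitively related at the chosen confidence threshold.

def cluster_by_transitivity(n_items, similarities, threshold):
    """similarities: iterable of (i, j, score); returns sorted groups of item indices."""
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in similarities:
        if score >= threshold:          # only confident pairs create a link
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for item in range(n_items):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())
```

Running the same function with progressively lower thresholds merges groups into larger ones, which gives a simple hierarchy of the kind the abstract describes. Unrestricted single linkage is prone to chaining unrelated items together, which is the failure mode the abstract says ProtoMap guards against.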

10.
A test of the statistical significance of dot constellations as detected by window search in large dot matrices is described. The procedure takes the correlation between overlapping windows on the diagonals of dot matrices into account. It is based on a confidence limit of the exact distribution of dot scores. Received on September 2, 1986; accepted on December 12, 1986
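
A window search over a dot matrix can be sketched as follows. The sketch only computes the dot score (number of matches) of every window along the diagonals; the significance test from the abstract is not reproduced, and the function name `window_scores` is hypothetical.

```python
# Hypothetical sketch: dot scores for all windows along the diagonals
# of the dot matrix of two sequences.

def window_scores(seq_a, seq_b, window):
    """Return {(i, j): matches} for the window starting at seq_a[i] and seq_b[j]."""
    if window <= 0:
        raise ValueError("window must be positive")
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # A dot is a position where the two sequences carry the same symbol.
            scores[(i, j)] = sum(
                seq_a[i + k] == seq_b[j + k] for k in range(window)
            )
    return scores
```

Windows at (i, j) and (i + 1, j + 1) share all but one of their positions, so their scores are not independent. That overlap is the correlation the abstract says the test accounts for.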

11.
12.
13.
Determination of window size for analyzing DNA sequences. Total citations: 4 (self-citations: 0; citations by others: 4)
Summary: DNA sequences are generally not random sequences. To show such nonrandomness visually, DNA sequence data are often plotted as moving averages for a window of a certain length slid along a sequence. Here a simple algorithm is presented for determining the window size and for finding a nonrandom region of sequence.
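
The moving-average plot described above can be sketched for one common statistic, GC content. The abstract does not state how its algorithm chooses the window size, so the sketch takes `window` as an input; the function names are hypothetical.

```python
# Hypothetical sketch: moving average of a per-base indicator along a DNA sequence.

def moving_average(values, window):
    """Average of each run of `window` consecutive values, computed with a running sum."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one position
        averages.append(total / window)
    return averages


def gc_profile(seq, window):
    """Fraction of G or C bases in each window slid along the sequence."""
    indicator = [1 if base in "GC" else 0 for base in seq.upper()]
    return moving_average(indicator, window)
```

A short window follows local fluctuations but is noisy, while a long window is smooth but blurs region boundaries, which is why the choice of window size matters for seeing nonrandom regions.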

14.
Membrane proteins play important roles in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, to classify membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profiles, the amino acid composition of N-terminal and C-terminal profiles, the amino acid composition of profile segments and the dipeptide composition of the whole profiles. In the computational experiment, the overall accuracy of the proposed approach is comparable with that of the functional-domain-based method. In addition, the performance of the proposed approach is complementary to that of the functional-domain-based method for different membrane protein types.
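
Amino acid and dipeptide composition features can be sketched as below. The abstract computes them on protein profiles; this sketch works on a raw sequence for simplicity, and the function names and the 20-letter alphabet ordering are assumptions rather than details of IAMPC.

```python
from collections import Counter

# Hypothetical sketch: composition feature vectors for one protein sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by code


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies (non-standard symbols ignored)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector of frequencies of adjacent residue pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    total = sum(pairs[k] for k in keys)
    return [pairs[k] / total if total else 0.0 for k in keys]
```

Applying `amino_acid_composition` to the first or last stretch of a sequence, or to fixed segments, would give rough analogues of the terminal and segment features listed in the abstract.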

15.
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence.

16.
To detect the latent periodicity of protein families responsible for various biological functions, methods of information decomposition, cyclic profile alignment, and noise decomposition have been used. Latent periodicity specific to a particular family is recognized in 94 of 110 analyzed protein families. Family-specific periodicity was found for more than 70% of amino acid sequences in each of these families. Based on such sequences, the characteristic profile of the latent periodicity has been deduced for each family. The possible relationship between the recognized latent periodicity, the evolution of proteins, and their structural organization is discussed.

17.
18.
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

19.
SUMMARY: Burst After Duplication with Ancestral Sequence Predictions (BADASP) is a software package for identifying sites that may confer subfamily-specific biological functions in protein families following functional divergence of duplicated proteins. A given protein phylogeny is grouped into subfamilies based on orthology/paralogy relationships and/or user definitions. Ancestral sequences are then predicted from the sequence alignment and the functional specificity is calculated using variants of the Burst After Duplication method, which tests for radical amino acid substitutions following gene duplications that are subsequently conserved. Statistics are output along with subfamily groupings and ancestral sequences for an easy analysis with other packages. AVAILABILITY: BADASP is freely available from http://www.bioinformatics.rcsi.ie/~redwards/badasp/

20.
MOTIVATION: Clustering of protein sequences is widely used for the functional characterization of proteins. However, it is still not easy to cluster distantly related proteins, which have only regional similarity among their sequences. It is therefore necessary to develop an algorithm for clustering such distantly related proteins. RESULTS: We have developed a time- and space-efficient clustering algorithm. It uses a graph representation in which vertices denote proteins and edges denote sequence similarities above a certain cutoff score. It repeatedly partitions the graph by removing edges that have small weights, which correspond to low sequence similarities. To find the appropriate partitions, we introduce a score that combines the normalized cut and locally minimal cut capacities. Our method is applied to all 40,703 human proteins in SWISS-PROT and TrEMBL. The resulting clusters show a 76% recall (20,529 proteins) of the 26,917 classified by InterPro. It also finds relationships not found by other clustering methods. AVAILABILITY: The complete result of our algorithm for all the human proteins in SWISS-PROT and TrEMBL, and other supplementary information are available at http://motif.ics.es.osaka-u.ac.jp/Ncut-KL/
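
The normalized cut mentioned above has a standard definition for a two-way partition of a weighted undirected graph: the cut weight divided by each side's total edge association, summed over both sides. A minimal sketch of that quantity follows; it does not reproduce the paper's combined score or its partitioning procedure, and the function name and edge representation are assumptions.

```python
# Hypothetical sketch: normalized cut of a two-way partition,
# Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).

def normalized_cut(edges, part_a, part_b):
    """edges: {(u, v): weight} for an undirected graph; part_a and part_b are disjoint node sets."""
    part_a, part_b = set(part_a), set(part_b)
    cut = assoc_a = assoc_b = 0.0
    for (u, v), weight in edges.items():
        if (u in part_a) != (v in part_a):
            cut += weight                   # edge crosses the partition
        for node in (u, v):
            # Each endpoint adds the edge weight to its own side's association,
            # so edges inside one side count twice, matching the sum of degrees.
            if node in part_a:
                assoc_a += weight
            elif node in part_b:
                assoc_b += weight
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each side must touch at least one edge")
    return cut / assoc_a + cut / assoc_b
```

A low value means the partition removes little similarity relative to what each side keeps, so splitting two dense groups joined by a weak edge scores much better than peeling a single vertex off a dense group.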
