Similar literature
1.
MOTIVATION: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. RESULTS: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5'-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects. AVAILABILITY: MASIA WEB site: http://www.scsb.utmb.edu/masia/masia.html SUPPLEMENTARY INFORMATION: The dendrogram of 42 APE sequences used to derive motifs is available on http://www.scsb.utmb.edu/comp_biol.html/DNA_repair/publication.html

2.

Background  

Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily whose members catalyze metal-ion-based phosphorolysis but recognize different substrates.

3.
The accuracy of comparative models of proteins is addressed here. A set of 12732 single-template models of sequences of known high-resolution structures was built by an automated procedure. Accuracy of several structure-derived properties, such as surface area, residue accessibility, presence of pockets, electrostatic potential and others, was determined as a function of template:target sequence identity by comparing models with their corresponding experimental structures. As expected, the average accuracy of structure-derived properties always increases with higher template:target sequence identity, but the exact shape of this relationship can differ from one property to another. A comparison of structure-derived properties measured from NMR and X-ray structures of the same protein shows that for most properties, the NMR/X-ray difference is of the same order as the error in models based on ~40% template:target sequence identity. The exact sequence identity at which properties reach that accuracy varies between 25 and 50%, depending on the property being analyzed. A general characteristic of simple comparative models is that their surface has increased area as a consequence of being more rugged than that of experimental structures. This suggests that including solvent effects during model building or refinement could significantly improve the accuracy of surface properties in comparative models.

4.
5.
VISTAS is a suite of programs for protein sequence and structure analysis. The system allows the simultaneous display, in separate windows, of multiple sequence alignments, of known or model 3D structures, and of 2D graphic representations of sequence and/or alignment properties. The displays are fully integrated, and therefore manipulations in one window can be reflected in each of the others. Beyond its display facilities, VISTAS brings together a number of existing tools under a single, user-friendly umbrella: these include a fully functional interactive color alignment procedure, conserved motif selection, a range of database-scanning routines, and interactive access to the OWL composite sequence database and to the PRINTS protein fingerprint database. Exploration of the sequence database is thus straightforward, and predefined structural motifs from the fingerprint database may be readily visualized. Of particular note is the ability to calculate conservation criteria from sequence alignments and to display the information in a 3D context: this renders VISTAS a powerful tool for aiding mutagenesis studies and for facilitating refinement of molecular models.

6.
We report the properties of the new BseMII restriction and modification enzymes from Bacillus stearothermophilus Isl 15-111, which recognize the 5'-CTCAG sequence, and the nucleotide sequence of the genes encoding them. The restriction endonuclease R.BseMII makes a staggered cut at the tenth base pair downstream of the recognition sequence on the upper strand, producing a two base 3'-protruding end. Magnesium ions and S-adenosyl-L-methionine (AdoMet) are required for cleavage. S-adenosylhomocysteine and sinefungin can replace AdoMet in the cleavage reaction. The BseMII methyltransferase modifies unique adenine residues in both strands of the target sequence 5'-CTCAG-3'/5'-CTGAG-3'. Monomeric R.BseMII in addition to endonucleolytic activity also possesses methyltransferase activity that modifies the A base only within the 5'-CTCAG strand of the target duplex. The deduced amino acid sequence of the restriction endonuclease contains conserved motifs of DNA N6-adenine methylases involved in S-adenosyl-L-methionine binding and catalysis. According to its structure and enzymatic properties, R.BseMII may be regarded as a representative of the type IV restriction endonucleases.

7.
SUMMARY: The main source of hypotheses on the structure and function of new proteins is their homology to proteins with known properties. Homologous relationships are typically established through sequence similarity searches, multiple alignments and phylogenetic reconstruction. In cases where the number of potential relationships is large, for example in P-loop NTPases with many thousands of members, alignments and phylogenies become computationally demanding, accumulate errors and lose resolution. In search of a better way to analyze relationships in large sequence datasets we have developed a Java application, CLANS (CLuster ANalysis of Sequences), which uses a version of the Fruchterman-Reingold graph layout algorithm to visualize pairwise sequence similarities in either two-dimensional or three-dimensional space. AVAILABILITY: CLANS can be downloaded at http://protevo.eb.tuebingen.mpg.de/download.

8.
The pseudo oligonucleotide composition, or pseudo K-tuple nucleotide composition (PseKNC), can be used to represent a DNA or RNA sequence with a discrete model or vector yet still keep considerable sequence order information, particularly the global or long-range sequence order information, via the physicochemical properties of its constituent oligonucleotides. Therefore, the PseKNC approach may hold very high potential for enhancing the power in dealing with many problems in computational genomics and genome sequence analysis. However, dealing with different DNA or RNA problems may need different kinds of PseKNC. Here, we present a flexible and user-friendly web server for PseKNC (at http://lin.uestc.edu.cn/pseknc/default.aspx) by which users can easily generate many different modes of PseKNC according to their need by selecting various parameters and physicochemical properties. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the current web server to generate their desired PseKNC without the need to follow the complicated mathematical equations, which are presented in this article just for the integrity of PseKNC formulation and its development. It is anticipated that the PseKNC web server will become a very useful tool in computational genomics and genome sequence analysis.
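As a rough illustration of the "discrete model or vector" idea, the sketch below computes only the conventional k-tuple nucleotide composition of a DNA sequence, that is, the normalized frequency of every possible k-mer. It deliberately omits the physicochemical sequence-order terms that distinguish PseKNC from plain composition, since the abstract does not give those equations. The function name `ktuple_composition` and its parameters are hypothetical and do not come from the PseKNC web server.

```python
from itertools import product


def ktuple_composition(sequence, k=2, alphabet="ACGT"):
    """Return normalized k-tuple frequencies in lexicographic k-mer order.

    Hypothetical helper: this is plain k-tuple composition only, not the
    full PseKNC vector (no physicochemical correlation terms are added).
    """
    seq = sequence.upper()
    # All 4**k possible k-mers for a 4-letter alphabet, in a fixed order.
    tuples = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(tuples, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous symbols
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(tuples)
    return [counts[t] / total for t in tuples]


if __name__ == "__main__":
    vector = ktuple_composition("ACGTACGTAC", k=2)
    print(len(vector), vector)
```

The vector has a fixed length of 4^k regardless of sequence length, which is what makes such representations convenient as input to standard classifiers; the price is that plain composition discards long-range order, the gap that the pseudo components are meant to address.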

9.
Vlahovicek K, Munteanu MG, Pongor S. Genetica 1999, 106(1-2):63-73
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna). This revised version was published online in October 2005 with corrections to the Cover Date.

10.

Background

For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.

Results

We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/.

Conclusions

USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
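To make the idea of "map distance estimates sequence similarity" concrete, here is a minimal sketch of the classical Chaos Game Representation on which USM is said to build. It is not the USM procedure itself, which the abstract describes as generalizing to arbitrary alphabets. The corner assignment in `CORNERS`, the starting point, and the helper names are assumptions chosen for illustration.

```python
import math

# Assumed corner assignment for the four DNA units on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Map each position of a DNA sequence to a point in the unit square.

    Each step moves halfway from the current point toward the corner
    assigned to the next base (the classical CGR iteration).
    """
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points


def shared_suffix_estimate(p, q):
    """Estimate how many trailing units two positions share.

    Every shared step halves the coordinate difference, so a maximum
    coordinate distance d implies roughly floor(-log2(d)) shared units.
    The value can overestimate the true count, so treat it as an estimate.
    """
    d = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    if d == 0.0:
        return math.inf
    return math.floor(-math.log2(d))


if __name__ == "__main__":
    a = cgr_coordinates("GGGGACGT")[-1]
    b = cgr_coordinates("TTTTACGT")[-1]
    print(shared_suffix_estimate(a, b))
```

Because each coordinate encodes the whole preceding sequence at geometrically decreasing weight, no memory length has to be fixed in advance; the distance between two points reflects however much history they happen to share.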

11.
The intense interest in the intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, have given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence‐derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non‐commercial users through the webserver at http://biomine.cs.vcu.edu/servers/DISOselect/ .

12.
GARD: a genetic algorithm for recombination detection   Total citations: 6, self-citations: 0, citations by others: 6
MOTIVATION: Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for, hence screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences cannot be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments. RESULTS: We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools, both in terms of power and accuracy, and that the use of GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection. AVAILABILITY: Freely available at http://www.datamonkey.org/GARD/

13.
14.
The software AGE (Analysis of Gene Evolution) has been developed both to study a genetic reality, i.e. the identification of statistical properties in genes (e.g. periodicities), and to simulate this observed genetic reality, by models of molecular evolution. AGE has two types of models: (i) models of sequence creation from oligonucleotides: concatenation model in series of an oligonucleotide, independent (or Markov) mixing model of oligonucleotides according to given probabilities (or a Markov matrix); (ii) models of sequence evolution from created sequences: insertion/deletion process of (mono, di, tri)nucleotides, base mutation process. The study of a reality and the development of simulation models are based on several new algorithms: approximated simulation and exact calculus to compute various autocorrelation functions, Fourier transformation of autocorrelation curves, recognition of a curve form, etc. AGE is implemented on IBM or compatible microcomputers and can be used by biologists without any computer knowledge to identify statistical properties in their newly determined DNA sequence and to explain them by models of molecular evolution.
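The abstract does not define the autocorrelation functions AGE computes, so the following is only one plausible reading, offered as a sketch: for a chosen pair of bases, count how often the first base at position i is followed by the second base at position i + lag, normalized by the number of position pairs. The function name `base_autocorrelation` and its signature are hypothetical and are not taken from AGE.

```python
def base_autocorrelation(sequence, first, second, max_lag):
    """Frequency of `first` at i and `second` at i + lag, for lag = 1..max_lag.

    Hypothetical definition for illustration; a peak recurring every third
    lag would indicate a period-3 signal of the kind seen in coding regions.
    """
    seq = sequence.upper()
    curve = []
    for lag in range(1, max_lag + 1):
        pairs = len(seq) - lag  # number of (i, i + lag) position pairs
        if pairs <= 0:
            curve.append(0.0)
            continue
        hits = sum(
            1 for i in range(pairs)
            if seq[i] == first and seq[i + lag] == second
        )
        curve.append(hits / pairs)
    return curve


if __name__ == "__main__":
    # Concatenation model in series of one trinucleotide, as in model type (i).
    simulated = "ACG" * 10
    print(base_autocorrelation(simulated, "A", "A", max_lag=6))
```

Running the function on a sequence built by repeating a single trinucleotide gives non-zero values only at lags that are multiples of three, which is the kind of periodicity the simulation models are meant to reproduce and explain.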

15.
MOTIVATION: For large-scale structural assignment to sequences, as in computational structural genomics, a fast yet sensitive sequence search procedure is essential. A new approach using intermediate sequences was tested as a shortcut to iterative multiple sequence search methods such as PSI-BLAST. RESULTS: A library containing potential intermediate sequences for proteins of known structure (PDB-ISL) was constructed. The sequences in the library were collected from a large sequence database using the sequences of the domains of proteins of known structure as the query sequences and the program PSI-BLAST. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. Searches of PDB-ISL were calibrated, and the number of correct matches found at a given error rate was the same as that found by PSI-BLAST. The advantage of this library is that it uses pairwise sequence comparison methods, such as FASTA or BLAST2, and can, therefore, be searched easily and, in many cases, much more quickly than an iterative multiple sequence comparison method. The procedure is roughly 20 times faster than PSI-BLAST for small genomes and several hundred times for large genomes. AVAILABILITY: Sequences can be submitted to the PDB-ISL servers at http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/ or http://cyrah.ebi.ac.uk:1111/Serv/PDB_ISL/ and can be downloaded from ftp://ftp.ebi.ac.uk/pub/contrib/jong/PDB_+ ++ISL/ CONTACT: sat@mrc-lmb.cam.ac.uk and jong@ebi.ac.uk

16.
Most proteins contain compositionally biased segments (CBS) in which one or more amino acid types are significantly overrepresented. CBS that contain amino acids with similar chemical properties can have functional and structural importance. This article describes ProBias, a web-server that searches a protein sequence for CBS composed of user-specified amino acid types. ProBias uses discrete scan statistics to estimate the statistical significance of CBS and is able to detect even subtle local deviations from the random independence model. The web-server also analyzes the global compositional bias of the input sequence. In the case of novel proteins that lack functional annotation, statistically significant CBS reported by ProBias can be used to guide the search for potential functionally important sites or domains. AVAILABILITY: Freely available at http://lcg.rit.albany.edu/ProBias. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

17.
Towards a computational model for -1 eukaryotic frameshifting sites   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Unconventional decoding events are now well acknowledged, but not yet well formalized. In this study, we present a bioinformatics analysis of eukaryotic -1 frameshifting, in order to model this event. RESULTS: A consensus model has already been established for -1 frameshifting sites. Our purpose here is to provide new constraints which make the model more precise. We show how a machine learning approach can be used to refine the current model. We identify new properties that may be involved in frameshifting. Each of the properties found was experimentally validated. Initially, we identify features of the overall model that are to be simultaneously satisfied. We then focus on the following two components: the spacer and the slippery sequence. As a main result, we point out that the identity of the primary structure of the so-called spacer is of great importance. AVAILABILITY: Sequences of the oligonucleotides in the functional tests are available at http://www.igmors.u-psud.fr/rousset/bioinformatics/.

18.
19.
MOTIVATION: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. RESULTS: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally. AVAILABILITY: The predictor, sequence data and supplementary studies are available at http://pprowler.itee.uq.edu.au/sspred/ and are free for academic use.
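The entropy step described above can be sketched as follows, assuming that the predictor emits one probability vector over the 8 secondary structure classes for each residue. The predictor itself is not reproduced here; the function names and the threshold value of 1.5 bits are hypothetical choices for illustration, not parameters reported in the abstract.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of one per-residue class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def flag_flexible_residues(per_residue_probs, threshold=1.5):
    """Return indices of residues whose class entropy meets the threshold.

    `threshold` is an assumed cut-off; a high entropy means the predictor
    is ambivalent between several structural classes at that position.
    """
    return [
        index
        for index, probs in enumerate(per_residue_probs)
        if shannon_entropy(probs) >= threshold
    ]


if __name__ == "__main__":
    confident = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    ambivalent = [0.125] * 8
    print(flag_flexible_residues([confident, ambivalent]))
```

With 8 classes the entropy ranges from 0 bits for a fully confident prediction to 3 bits for a uniform one, so any cut-off trades sensitivity against the false positives that the abstract reports as a remaining problem.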

20.