Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Four hairpin polyamides bearing subtle N- and C-terminal substitutions were examined in a fluorescent intercalator displacement (FID) assay enlisting a library of 512 DNA hairpins that contain all possible five base pair sequences, in a challenging probe of the assay's capabilities for establishing DNA binding sequence selectivity. Not only did the assay define the global sequence selectivity expected based on known structural interactions and Dervan's pairing rules, establishing the utility of the method for characterizing such polyamides, but previously unappreciated subtle substituent effects on global sequence selectivity were also revealed. Thus, we report the discovery of a novel five base pair high affinity binding site of the form 5'-WWCWW (vs 5'-WGWWW) for the polyamide ImPyPy-gamma-PyPyPy-beta-Dp and its structural basis.
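The library size of 512 is consistent with a simple count: there are 4^5 = 1024 five-base sequences, each sequence and its reverse complement describe the same duplex site, and no odd-length sequence can equal its own reverse complement, so the 1024 sequences collapse into 512 pairs. The following is a minimal Python sketch of that count, not code from the study; the function names are illustrative assumptions.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def unique_duplex_sites(length=5):
    """Enumerate sequences of the given length, keeping one representative
    (the alphabetically smaller) of each sequence / reverse-complement pair."""
    sites = set()
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        sites.add(min(seq, reverse_complement(seq)))
    return sorted(sites)

print(len(unique_duplex_sites(5)))
```

In the site notation used above, W is the IUPAC code for A or T, so 5'-WWCWW stands for 16 of the 1024 sequences.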

2.
N-Linked glycosylation is a post-translational event whereby carbohydrates are added to secreted proteins at the consensus sequence Asn-Xaa-Ser/Thr, where Xaa is any amino acid except proline. Some consensus sequences in secreted proteins are not glycosylated, indicating that consensus sequences are necessary but not sufficient for glycosylation. In order to understand the structural rules for N-linked glycosylation, we introduced N-linked consensus sequences by site-directed mutagenesis into the polypeptide chain of the recombinant human erythropoietin molecule. Some regions of the polypeptide chain supported N-linked glycosylation more effectively than others. N-Linked glycosylation was inhibited by an adjacent proline, suggesting that the sequence context of a consensus sequence could affect glycosylation. One N-linked consensus sequence (Asn123-Thr125) introduced into a position close to the existing O-glycosylation site (Ser126) had an additional O-linked carbohydrate chain and not an additional N-linked carbohydrate chain, suggesting that structural requirements in this region favored O-glycosylation over N-glycosylation. The presence of a consensus sequence on the protein surface of the folded molecule did not appear to be a prerequisite for oligosaccharide addition. However, it was noted that recombinant human erythropoietin analogs that were hyperglycosylated at sites that were normally buried had altered protein structures. This suggests that carbohydrate addition precedes polypeptide folding.
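The consensus sequence Asn-Xaa-Ser/Thr (Xaa not proline) maps directly onto a pattern over one-letter amino acid codes. The sketch below is an illustration only, with a made-up toy peptide; it lists candidate sites, and, as the abstract stresses, a match is necessary but not sufficient for glycosylation.

```python
import re

# Asn (N), then any residue except Pro (P), then Ser (S) or Thr (T).
# The lookahead lets overlapping candidate sites be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein):
    """Return (1-based position of Asn, tripeptide) for each candidate site."""
    return [
        (m.start() + 1, protein[m.start():m.start() + 3])
        for m in SEQUON.finditer(protein)
    ]

# Hypothetical toy peptide, not a real protein sequence.
print(find_sequons("MNGSANPTNNTS"))
```

For the toy peptide this reports NGS, NNT and NTS while skipping NPT, where proline occupies the Xaa position.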

3.
Bioinformatic software has used various numerical encoding schemes to describe amino acid sequences. Orthogonal encoding, employing 20 numbers to describe the amino acid type of one protein residue, is often used with artificial neural network (ANN) models. However, this can increase the model complexity, thus leading to difficulty in implementation and poor performance. Here, we use ANNs to derive encoding schemes for the amino acid types from protein three-dimensional structure alignments. Each of the 20 amino acid types is characterized with a few real numbers. Our schemes are tested on the simulation of amino acid substitution matrices. These simplified schemes outperform the orthogonal encoding on small data sets. Using one of these encoding schemes, we generate a colouring scheme for the amino acids in which comparable amino acids are in similar colours. We expect it to be useful for visual inspection and manual editing of protein multiple sequence alignments.
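Orthogonal (one-hot) encoding, the baseline named above, can be sketched in a few lines of Python. The residue ordering and function name are arbitrary assumptions made for illustration; the learned low-dimensional schemes from the study are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; order is arbitrary
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def orthogonal_encode(sequence):
    """Encode each residue as a 20-element vector with a single 1."""
    vectors = []
    for aa in sequence:
        vec = [0] * len(AMINO_ACIDS)
        vec[INDEX[aa]] = 1
        vectors.append(vec)
    return vectors

encoded = orthogonal_encode("ACY")
print(len(encoded), len(encoded[0]))
```

The cost shows up in the input width: a window of n residues needs 20n network inputs under this scheme, whereas an encoding with k real numbers per residue needs only kn.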

4.
An algorithm to simulate DNA sequence evolution under a general stochastic model, including as particular cases all the previously used schemes of nucleotide substitution, is described. The simulation is carried out on finite, variable length, DNA sequences through a strict stochastic process, according to the particular substitution rates imposed by each scheme. Five FORTRAN programs, running on an IBM PC and compatibles, carry out all the tasks needed for the simulation. They are menu driven and interfaced to the system through a principal menu. All sequence data files used and generated by the SDSE package conform to the standard GenBank database format, thus allowing the use of any sequence retrieved from this databank, as well as the application of other packages to analyse, manipulate or retrieve simulated sequences. Received on August 23, 1988; accepted on November 15, 1988

5.
Sakib MN, Tang J, Zheng WJ, Huang CT. PLoS ONE 2011, 6(12): e28251
Research in bioinformatics primarily involves collection and analysis of a large volume of genomic data. Naturally, it demands efficient storage and transfer of this huge amount of data. In recent years, some research has been done to find efficient compression algorithms to reduce the size of various sequencing data. One way to improve the transmission time of large files is to apply maximal lossless compression to them. In this paper, we present SAMZIP, a specialized encoding scheme for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves the compression ratio of existing compression tools. In order to achieve this, we exploit the prior knowledge of the file format and specifications. Our experimental results show that our encoding scheme improves compression ratio, thereby reducing overall transmission time significantly.

6.
RNA sequence analysis using covariance models.   Cited by: 43 (self-citations: 8; citations by others: 35)
We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.

7.
Raval S, Gowda SB, Singh DD, Chandra NR. Glycobiology 2004, 14(12): 1247-1263
Lectins are known to be important for many biological processes, due to their ability to recognize cell surface carbohydrates with high specificity. Plant lectins have been model systems to study protein-carbohydrate recognition, because individually they exhibit high sensitivity and as a group large diversity in recognizing carbohydrate structures. Although extensive studies have been carried out for legume lectins that have led to interesting insights into the sequence determinants of sugar recognition in them, frameworks with such specific correlations are not available for other plant lectin families. This study reports a large-scale data acquisition and extensive analysis of sequences and structures of beta-prism-I or jacalin-related lectins (JRLs) and shows that hypervariability in the binding site loops generates carbohydrate recognition diversity, a strategy analogous to that in legume lectins. Analyses of the size, conformation, and sequence variability in key regions reveal the existence of a common theme, encoded as a set of structural features over a common scaffold, in defining specificity. This study also points to the remarkable range of domain architectures, often arising out of gene duplication events in lectins of this family. The data analyzed here also indicate a spectacular variety of quaternary associations possible in this family of lectins that have implications for glycan recognition. These results thus provide sequence-structure-function correlations, an understanding of the molecular basis of carbohydrate recognition by beta-prism-I lectins, and also a rationale for engineering specific recognition capabilities in relevant molecules.

8.
Competition dialysis is a powerful new tool for the discovery of ligands that bind to nucleic acids with structural- or sequence-selectivity. The method is based on firm thermodynamic principles and is simple to implement. In the competition dialysis experiment, an array of nucleic acid structures and sequences is dialyzed against a common test ligand solution. After equilibration, the amount of ligand bound to each structure or sequence is determined by absorbance or fluorescence measurements. Since all structures and sequences are in equilibrium with the same free ligand concentration, the amount bound is directly proportional to the ligand binding affinity. Competition dialysis thus provides a direct and quantitative measure of selectivity, and unambiguously identifies which of the samples within the array are preferred by a particular ligand. We describe here the third generation implementation of the method, in which competition dialysis was adapted for use in a 96-well plate format. In this format, we have been able to greatly expand the array of nucleic acid structures studied, and now can routinely study the interactions of a ligand of interest with 46 different structures and sequences.

9.
SugarBindDB lists pathogen and biotoxin lectins and their carbohydrate ligands in a searchable format. Information is collected from articles published in peer‐reviewed scientific journals. Help files guide the user through the search process and provide a review of structures and names of sugars that appear in human oligosaccharides. Glycans are written in the condensed form of the carbohydrate nomenclature system developed by the International Union of Pure and Applied Chemistry (IUPAC). Since its online publication by The MITRE Corporation in 2005, the database has served as a resource for research on the glycobiology of infectious disease. SugarBindDB is currently hosted by the Swiss Institute of Bioinformatics on the ExPASy server and will be enhanced and linked to related resources as part of the wider UniCarbKB initiative. Enhancements will include the option to display glycans in a variety of formats, including modified 2D condensed IUPAC and symbolic nomenclature. Copyright © 2013 John Wiley & Sons, Ltd.

10.
MOTIVATION: The genome of Arabidopsis thaliana, which has the best understood plant genome, still has approximately one-third of its genes with no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. This method is based on using a hybrid machine-learning/data-mining method to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile and expression data. RESULTS: We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.

11.
Functional classification of the microbial feruloyl esterases   Cited by: 9 (self-citations: 3; citations by others: 6)
Feruloyl esterases have potential uses over a broad range of applications in the agri-food industries. In recent years, the number of microbial feruloyl esterase activities reported has increased and, in parallel, even more related protein sequences may be discerned in the growing genome databases. Based on substrate utilisation data and supported by primary sequence identity, four sub-classes have been characterised and termed type-A, B, C and D. The proposed sub-classification scheme is discussed in terms of the evolutionary relationships existing between carbohydrate esterases.

12.
TargetDB: a target registration database for structural genomics projects   Cited by: 2 (self-citations: 0; citations by others: 2)
TargetDB is a centralized target registration database that includes protein target data from the NIH structural genomics centers and a number of international sites. TargetDB, which is hosted by the Protein Data Bank (RCSB PDB), provides status information on target sequences and tracks their progress through the various stages of protein production and structure determination. A simple search form permits queries based on contributing site, target ID, protein name, sequence, status and other data. The progress of individual targets or entire structural genomics projects may be tracked over time, and target data from all contributing centers may also be downloaded in the XML format. AVAILABILITY: TargetDB is available at http://targetdb.pdb.org/

13.
Protein sequences of the Dayhoff databank of 1984 have been analyzed to evaluate the occurrences of the 400 dipeptides and 8000 tripeptides. Expected values and standard deviations for the di- and tripeptides were determined by Monte Carlo and binomial approximation. A condensed format containing this information, labeled a uniqueness diagram, is presented and made available in the form of a microfiche.
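The figures 400 and 8000 are simply 20^2 and 20^3. One plausible reading of the binomial approximation, assumed here because the abstract does not give the exact procedure, treats neighbouring residues as independent: a dipeptide ab then has probability p = f_a * f_b per position, expected count N*p and standard deviation sqrt(N*p*(1-p)) over N dipeptide positions. A minimal Python sketch under that assumption:

```python
from collections import Counter
from math import sqrt

def dipeptide_stats(sequences):
    """Map each observed dipeptide to (observed, expected, std_dev).

    Assumes independent neighbouring residues (binomial approximation);
    this is an illustrative model, not the published procedure.
    """
    residue_counts = Counter()
    dipeptide_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        dipeptide_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))

    total_residues = sum(residue_counts.values())
    n = sum(dipeptide_counts.values())  # number of dipeptide positions
    stats = {}
    for pair, observed in dipeptide_counts.items():
        p = (residue_counts[pair[0]] / total_residues) * (
            residue_counts[pair[1]] / total_residues
        )
        stats[pair] = (observed, n * p, sqrt(n * p * (1 - p)))
    return stats

# Hypothetical toy input, not data from the Dayhoff databank.
for pair, values in sorted(dipeptide_stats(["ACAC"]).items()):
    print(pair, values)
```

Dipeptides whose observed count lies several standard deviations from the expected value are the ones such an analysis would flag as over- or under-represented.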

14.
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.
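The three residue-based figures quoted above (precision, sensitivity/recall and MCC) all derive from the same confusion matrix of true/false positives and negatives. The sketch below shows the standard definitions in Python; the function name and the zero-denominator convention are assumptions for illustration, and no counts from the study are reproduced.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from confusion-matrix counts.

    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc

# Hypothetical counts, chosen only to exercise the function.
print(binary_metrics(tp=5, fp=0, tn=5, fn=0))
```

Unlike precision and recall, MCC also rewards correctly rejected non-binding residues, which matters when binding residues are a small minority of the protein surface.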

15.
Waner S, Wu YH. Bio Systems 1988, 21(2): 115-124
We propose an automata-theoretical framework for structured hierarchical control, in terms of rules and meta-rules, for sequences of moves on a graph. This leads to a notion of a "universal" hierarchically structured automaton mu which can move on a given graph in such a way as to emulate any automaton which moves on that graph in response to inputs. This emulation is achieved via a mapping of the inputs in the given automaton to those of mu, and we think of such a mapping as an encoding of the given automaton. We see in several examples that efficient encodings of graph-search algorithms correspond to their natural hierarchical structure (in terms of rules and meta-rules), and this leads one to a precise notion of the "depth" of an automaton which moves on a given graph. By way of application, we discuss a proposed structure of a series of stochastic neural networks which can learn, by example, to encode a given sequence of moves on a graph, so that the encoding obtained is structurally the "natural" one for the given sequence of moves. Thus, such a learning system would perform both structural pattern recognition (in terms of "patterns" of moves), and encoding based on a desired outcome.

16.
The classification of amino acid conservation   Cited by: 30 (self-citations: 0; citations by others: 30)
A classification of amino acid type is described which is based on a synthesis of physico-chemical and mutation data. This is organised in the form of a Venn diagram from which sub-sets are derived that include groups of amino acids likely to be conserved for similar structural reasons. These sets are used to describe conservation in aligned sequences by allocating to each position the smallest set that contains all the residue types brought together by alignment. This minimal set assignment provides a simple way of reducing the information contained in a sequence alignment to a form which can be analysed by computer yet remains readable.
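The minimal set assignment described above can be sketched as a lookup over candidate sets: for each alignment column, keep the sets that contain every residue in the column and pick the smallest. The property sets below are simplified, illustrative groupings assumed for this example; they are not the sub-sets derived from the Venn diagram in the paper.

```python
# Illustrative property sets only (an assumption, not the published sets).
PROPERTY_SETS = {
    "negative": set("DE"),
    "positive": set("KRH"),
    "aliphatic": set("ILV"),
    "tiny": set("AGS"),
    "aromatic": set("FWYH"),
    "charged": set("DEKRH"),
    "small": set("AGSCTDNPV"),
    "polar": set("DEKRHNQSTYWC"),
    "hydrophobic": set("AILMFWYVCGHKT"),
    "any": set("ACDEFGHIKLMNPQRSTVWY"),
}

def minimal_set(column):
    """Name the smallest property set containing every residue in a column.

    Expects standard one-letter residue codes without gap characters.
    Ties in size are broken alphabetically by set name.
    """
    residues = set(column)
    candidates = [
        (len(members), name)
        for name, members in PROPERTY_SETS.items()
        if residues <= members
    ]
    return min(candidates)[1]

# Hypothetical alignment columns.
for col in ("ILV", "DE", "KD", "AW"):
    print(col, minimal_set(col))
```

Replacing each column with its set name yields a compact, human-readable summary of conservation along the alignment.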

17.
Directed graphs of DNA sequences and their numerical characterization   Cited by: 1 (self-citations: 0; citations by others: 1)
In this paper we (1) introduce a directed graphical representation of DNA primary sequences; (2) describe a scheme that transforms the directed graph of a DNA sequence into an upper triangular matrix; (3) investigate whether or not the existing matrix-based invariants of DNA sequences are compatible with the upper triangular matrix representation. The utility of our method is illustrated by an examination of the similarity between human and seven other species.

18.
MOTIVATION: Target selection strategies for structural genomic projects must be able to prioritize gene regions on the basis of significant sequence similarity with proteins that have already been structurally determined. With the rapid development of protein comparison software a robust prioritization scheme should be independent of the choice of algorithm and be able to incorporate different sequence similarity thresholds. RESULTS: A robust target selection strategy has been developed that can assign a priority level to all genes in any genome. Structural assignments to genome sequences are calculated at two thresholds and six levels (1-6) describe the prioritization of all whole genes and partial gene regions. This simple two-threshold approach can be implemented with any fold recognition or homology detection algorithms. The results for 10 genomes are presented using the SSEARCH and PSI-BLAST programs. AVAILABILITY: Programs are available on request from the authors.

19.
R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).

20.
SPLICE, a software tool for the extraction of sequences from files in GenBank tape format, has been developed. The program can analyze the features table in this format and use any of the information provided to write the corresponding sequences into a standard sequence file format suitable for use with sequence analysis programs. Sequences that are present as several subsequent fragments in a single GenBank file, such as those encoding a peptide, can be spliced together by the program. Further, sequences that are present in more than one GenBank file, such as an exon which spans several different files, can also be spliced into one sequence. SPLICE runs under the MS/DOS and Unix operating systems, can be called as a sub-process by other programs and can process batches of files. Received on December 26, 1989; accepted on May 30, 1990
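The core splicing step, concatenating several fragments of one entry into a single sequence, can be illustrated with a short Python sketch. It assumes spans written as 1-based inclusive `start..end` pairs, in the style of present-day GenBank `join(...)` locations; the function name and the toy sequence are hypothetical, and this is not the SPLICE program's own code or its exact input format.

```python
import re

def splice(sequence, location):
    """Concatenate the 1-based, inclusive spans named in a location string.

    Only plain start..end spans are handled; complement strands and
    references to other entries are outside this sketch.
    """
    spans = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    return "".join(sequence[start - 1:end] for start, end in spans)

# Hypothetical toy entry: two coding fragments separated by an intervening region.
print(splice("AAACCCGGGTTT", "join(1..3,7..9)"))
```

Splicing across several files, as described above, would amount to applying the same step to each file's sequence and concatenating the results in order.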
