Found 20 similar articles.
1.
Background
Concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows their identification in complete genomes. Functional specificity of proteins is assumed to be conserved among orthologs and to differ among paralogs. We used this assumption to identify residues that determine the specificity of protein-DNA and protein-ligand recognition. Finding such residues is crucial for understanding mechanisms of molecular recognition and for rational protein and drug design.
2.
3.
Ye K Feenstra KA Heringa J Ijzerman AP Marchiori E 《Bioinformatics (Oxford, England)》2008,24(1):18-25
MOTIVATION: Identification of residues that account for protein function specificity is crucial, not only for understanding the nature of functional specificity, but also for protein engineering experiments aimed at switching the specificity of an enzyme, regulator or transporter. Available algorithms generally use multiple sequence alignments to identify residue positions that are conserved within subfamilies but divergent between them. However, many biological examples show a much subtler picture than simple intra-group conservation versus inter-group divergence. RESULTS: We present multi-RELIEF, a novel approach for identifying specificity residues that is based on RELIEF, a state-of-the-art machine-learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided into multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we show that (a) multi-RELIEF identifies specificity residues for the seven test sets used, (b) incorporating structural information improves prediction for specificity of interaction with small molecules and (c) comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. AVAILABILITY: A web-server implementation of multi-RELIEF is available at www.ibi.vu.nl/programs/multirelief. Matlab source code of the algorithm and data sets are available on request for academic use.
4.
Background
Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities.
5.
The efficient recycling of the chromophore of visual pigments, 11-cis-retinal, through the retinoid visual cycle is an essential process for maintaining normal vision. RPE65 is the isomerohydrolase in retinal pigment epithelium and generates predominantly 11-cis-retinol (11cROL) and a minor amount of 13-cis-retinol (13cROL), from all-trans-retinyl ester (atRE). We recently identified and characterized novel homologues of RPE65, RPE65c, and 13-cis-isomerohydrolase (13cIMH), which are expressed in the zebrafish inner retina and brain, respectively. Although these two homologues have 97% identical amino acid sequences, they exhibit distinct product specificities. Under the same assay conditions, RPE65c generated predominantly 11cROL, similar to RPE65, while 13cIMH generated exclusively 13cROL from atRE substrate. To study the impacts of the key residues determining the isomerization product specificity of RPE65, we replaced candidate residues by site-directed mutagenesis in RPE65c and 13cIMH. Point mutations at residues Tyr58, Phe103, and Leu133 in RPE65c resulted in significantly altered isomerization product specificities. In particular, our results showed that residue 58 is a primary determinant of isomerization specificity, because the Y58N mutation in RPE65c and its reciprocal N58Y mutation in 13cIMH completely reversed the respective enzyme isomerization product specificities. These findings will contribute to the elucidation of molecular mechanisms underlying the isomerization reaction catalyzed by RPE65.
6.
Supervised cluster analysis for microarray data based on multivariate Gaussian mixture (total citations: 7; self-citations: 0; citations by others: 7)
MOTIVATION: Grouping genes having similar expression patterns is called gene clustering, which has proved to be a useful tool for extracting underlying biological information from gene expression data. Many clustering procedures have shown success in microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are alternative clustering algorithms, which are based on the assumption that the whole set of microarray data is a finite mixture of a certain type of distribution with different parameters. Application of the model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrated the use of the model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The SVM software is available at the website http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi
7.
In gram-negative organisms, high-affinity transport of iron substrates requires energy transduction to specific outer membrane receptors by the TonB-ExbB-ExbD complex. Vibrio cholerae encodes two TonB proteins, one of which, TonB1, recognizes only a subset of V. cholerae TonB-dependent receptors and does not facilitate transport through Escherichia coli receptors. To investigate the receptor specificity exhibited by V. cholerae TonB1, chimeras were created between V. cholerae TonB1 and E. coli TonB. The activities of the chimeric TonB proteins in iron utilization assays demonstrated that the C-terminal one-third of either TonB confers the receptor specificities associated with the full-length TonB. Single-amino-acid substitutions near the C terminus of V. cholerae TonB1 were identified that allowed TonB1 to recognize E. coli receptors and at least one V. cholerae TonB2-dependent receptor. This indicates that the very C-terminal end of V. cholerae TonB1 determines receptor specificity. The regions of the TonB-dependent receptors involved in specificity for a particular TonB protein were investigated in experiments involving domain switching between V. cholerae and E. coli receptors exhibiting different TonB specificities. Switching the conserved TonB box heptapeptides at the N termini of these receptors did not alter their TonB specificities. However, replacing the amino acid immediately preceding the TonB box in E. coli receptors with an aromatic residue allowed these receptors to use V. cholerae TonB1. Further, site-directed mutagenesis of the TonB box -1 residue in a V. cholerae TonB2-dependent receptor demonstrated that a large hydrophobic amino acid in this position promotes recognition of V. cholerae TonB1. These data suggest that the TonB box -1 position controls productive interactions with V. cholerae TonB1.
8.
Le Gouis J Bordes J Ravel C Heumez E Faure S Praud S Galic N Remoué C Balfourier F Allard V Rousset M 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,124(3):597-611
The modification of flowering date is considered an important way to escape the current or future climatic constraints that affect wheat crops. A better understanding of its genetic bases would enable a more efficient and rapid modification through breeding. The objective of this study was to identify chromosomal regions associated with earliness in wheat. A 227-wheat core collection chosen to be highly contrasted for earliness was characterized for heading date. Experiments were conducted in controlled conditions and in the field for 3 years to break down earliness into its component traits: photoperiod sensitivity, vernalization requirement and narrow-sense earliness. Whole-genome association mapping was carried out using 760 molecular markers and taking into account the structure of five ancestral groups. We identified 62 markers individually associated with earliness components, corresponding to 33 chromosomal regions. In addition, we identified 15 other significant markers and seven more regions by testing marker pair interactions. Co-localizations were observed with the Ppd-1, Vrn-1 and Rht-1 candidate genes. Using an independent set of lines to validate the model built for heading date, we were able to explain 34% of the variation using the structure and the significant markers. Results were compared with already published data from bi-parental populations, giving an insight into the genetic architecture of flowering time in wheat.
9.
10.
11.
Mehmood T Bohlin J Bråthen Kristoffersen A Sæbø S Warringer J Snipen L 《BMC bioinformatics》2012,13(1):97
ABSTRACT: BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequence substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than both IMM with protein (p < 0.001) and with DNA (p < 0.001). Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.
12.
Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets (total citations: 2; self-citations: 0; citations by others: 2)
Ordination is a powerful method for analysing complex data sets but has been largely ignored in sequence analysis. This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences. The method takes a matrix of Euclidean distances between all pairs of sequences and finds a coordinate space where the distances are exactly preserved. The main problem is to find a measure of distance between aligned sequences that is Euclidean. The simplest distance function is the square root of the percentage difference (as measured by identities) between two sequences, where one ignores any positions in the alignment where there is a gap in any sequence. If one does not ignore positions with a gap, the distances cannot be guaranteed to be Euclidean but the deleterious effects are trivial. Two examples of using the method are shown. A set of 226 aligned globins were analysed and the resulting ordination very successfully represents the known patterns of relationship between the sequences. In the other example, a set of 610 aligned 5S rRNA sequences were analysed. Sequence ordinations complement phylogenetic analyses. They should not be viewed as a complete alternative.
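The distance function described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `-` gap symbol, and the toy alignment are assumptions made for illustration. It drops every alignment column that contains a gap in any sequence, then takes the square root of the percentage of differing positions for each pair. The resulting matrix is the kind of input a principal coordinates analysis would then ordinate.

```python
import math

GAP = "-"  # assumed gap symbol; the abstract does not name one


def gap_free_columns(alignment):
    """Return indices of columns in which no sequence has a gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if all(seq[i] != GAP for seq in alignment)]


def distance_matrix(alignment):
    """Square root of the percentage difference for every pair of sequences.

    Columns with a gap in any sequence are ignored, as the abstract describes.
    """
    cols = gap_free_columns(alignment)
    if not cols:
        raise ValueError("alignment has no gap-free columns")
    n = len(alignment)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(
                1 for c in cols if alignment[i][c] != alignment[j][c]
            )
            percent_difference = 100.0 * mismatches / len(cols)
            dist[i][j] = dist[j][i] = math.sqrt(percent_difference)
    return dist


# Hypothetical toy alignment: the last column is dropped because of the gap.
toy_alignment = ["ACDE-", "ACDFG", "AKDFG"]
toy_distances = distance_matrix(toy_alignment)
```

In the toy alignment, four gap-free columns remain, so a single mismatch corresponds to a 25% difference and a distance of 5.0.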
13.
M van Heel 《Journal of molecular biology》1991,220(4):877-887
A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. The information extraction from the sequences avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. A typical invariant function of this kind is a 20 x 20 histogram of occurrences of amino acid pairs in a given sequence or fragment thereof. To illustrate the potential of the approach, an analysis of 10,000 protein sequences from the National Biomedical Research Foundation Protein Identification Resource is presented, which already reveals great biological detail. For example, zeta-hemoglobin is found to lie close to amphibian and fish chi-hemoglobin which, in turn, is an important clue to the physiological function of this mammalian early embryonic hemoglobin. The multivariate statistical framework presented unifies such apparently unrelated issues as phylogenetic comparisons between a set of sequences and distance matrices between the constituents of the biological sequences. The Multivariate Statistical Sequence Analysis (MSSA) principles can be used for a wide spectrum of sequence analysis problems such as: assignment of family memberships to new sequences, validation of new incoming sequences to be entered into the database, prediction of structure from sequence, discrimination of coding from non-coding DNA regions, and automatic generation of an atlas of protein or DNA sequences. The MSSA techniques represent a self-contained approach to learning continuously and automatically from the growing stream of new sequences. The MSSA approach is particularly likely to play a significant role in major sequencing efforts such as the human genome project.
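The 20 x 20 pair histogram mentioned in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MSSA implementation: the abstract does not say how pairs are formed, so the sketch assumes adjacent residue pairs. The function name, the alphabetical ordering of the 20 standard amino acids, and the choice to skip pairs containing non-standard symbols are likewise hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_histogram(sequence):
    """Count adjacent amino acid pairs in a 20 x 20 table.

    hist[i][j] is the number of times residue AMINO_ACIDS[i] is immediately
    followed by residue AMINO_ACIDS[j]. Pairs that contain a symbol outside
    the 20 standard residues are skipped (an assumption of this sketch).
    """
    hist = [[0] * len(AMINO_ACIDS) for _ in AMINO_ACIDS]
    for first, second in zip(sequence, sequence[1:]):
        if first in INDEX and second in INDEX:
            hist[INDEX[first]][INDEX[second]] += 1
    return hist


# Hypothetical fragment: contains the pairs AC, CA and AC.
example = pair_histogram("ACAC")
```

Because the histogram has the same shape for every sequence regardless of its length, sequences can be compared without aligning them, which is the property the abstract emphasizes.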
14.
15.
16.
S. L. Grokhovsky I. A. Il’icheva D. Yu. Nechipurenko L. A. Panchenko R. V. Polozov Yu. D. Nechipurenko 《Biophysics》2008,53(3):250-251
Looking for new means of assessing local conformational and dynamic heterogeneities in DNA structure, we have estimated the rates of phosphodiester bond cleavage in DNA fragments of known sequence caused by ultrasonic treatment. Among the 16 dinucleotide steps possible, those with 5′-ward cytosine [5′-d(CpN)-3′] are distinguished by significantly higher cleavage rates: CG > CA = CT > CC. The possible causes of this intriguing phenomenon are considered.
17.
For the first time, each specificity determining residue (SDR) in the binding site of an antibody has been replaced with every other possible single amino acid substitution, and the resulting mutants analyzed for binding affinity and specificity. The studies were conducted on a variant of the 26-10 antidigoxin single chain Fv (scFv) using in vitro scanning saturation mutagenesis, a new process that allows the high throughput production and characterization of antibody mutants [Burks,E.A., Chen,G., Georgiou,G. and Iverson,B.L. (1997) Proc. Natl Acad. Sci. USA, 94, 412-417]. Single amino acid mutants of 26-10 scFv were identified that modulated specificity in dramatic fashion. The overall plasticity of the antibody binding site with respect to amino acid replacement was also evaluated, revealing that 86% of all mutants retained measurable binding activity. Finally, by analyzing the physical properties of amino acid substitutions with respect to their effect on hapten binding, conclusions were drawn regarding the functional role played by the wild-type residue at each SDR position. The reported results highlight the value of in vitro scanning saturation mutagenesis for engineering antibody binding specificity, for evaluating the plasticity of proteins, and for comprehensive structure-function studies and analysis.
18.
Rosén ML Edman M Sjöström M Wieslander A 《The Journal of biological chemistry》2004,279(37):38683-38692
Glycosyltransferases (GTs) are among the largest groups of enzymes found and are usually classified on the basis of sequence comparisons into many families of varying similarity (CAZy systematics). Only two different Rossmann-like folds have been detected (GT-A and GT-B) within the small number of established crystal structures. A third uncharacterized fold has been indicated with transmembrane organization (GT-C). We here use a method based on multivariate data analyses (MVDAs) of property patterns in amino acid sequences and can with high accuracy recognize the correct fold in a large data set of GTs. Likewise, a retaining or inverting enzymatic mechanism for attachment of the donor sugar could be properly revealed in the GT-A and GT-B fold group sequences by such analyses. Sequence alignments could be correlated to important variables in MVDA, and the separating amino acid positions could be mapped over the active sites. These seem to be localized to similar positions in space for the alpha/beta/alpha binding motifs in the GT-B fold group structures. Analogous active-site sequence positions were found for the GT-A fold group. Multivariate property patterns could also easily group most GTs annotated in the genomes of Escherichia coli and Synechocystis to proper fold or organization group, according to benchmarking comparisons at the MetaServer. We conclude that the sequence property patterns revealed by the multivariate analyses seem more conserved than amino acid types for these GT groups, and these patterns are also conserved in the structures. Such patterns may also potentially define substrate preferences.
19.
Mutation frequencies vary significantly along nucleotide sequences such that mutations often concentrate at certain positions called hotspots. Mutation hotspots in DNA reflect intrinsic properties of the mutation process, such as sequence specificity, that manifests itself at the level of interaction between mutagens, DNA, and the action of the repair and replication machineries. The hotspots might also reflect structural and functional features of the respective DNA sequences. When mutations in a gene are identified using a particular experimental system, resulting hotspots could reflect the properties of the gene product and the mutant selection scheme. Analysis of the nucleotide sequence context of hotspots can provide information on the molecular mechanisms of mutagenesis. However, the determinants of mutation frequency and specificity are complex, and there are many analytical methods for their study. Here we review computational approaches for analyzing mutation spectra (distribution of mutations along the target genes) that include many mutable (detectable) positions. The following methods are reviewed: derivation of a consensus sequence, application of regression approaches to correlate nucleotide sequence features with mutation frequency, mutation hotspot prediction, analysis of oligonucleotide composition of regions containing mutations, pairwise comparison of mutation spectra, analysis of multiple spectra, and analysis of "context-free" characteristics. The advantages and pitfalls of these methods are discussed and illustrated by examples from the literature. The most reliable analyses were obtained when several methods were combined and information from theoretical analysis and experimental observations was considered simultaneously. Simple, robust approaches should be used with small samples of mutations, whereas combinations of simple and complex approaches may be required for large samples. 
We discuss several well-documented studies where analysis of mutation spectra has substantially contributed to the current understanding of molecular mechanisms of mutagenesis. The nucleotide sequence context of mutational hotspots is a fingerprint of interactions between DNA and DNA repair, replication, and modification enzymes, and the analysis of hotspot context provides evidence of such interactions.
20.
Aspartate-semialdehyde dehydrogenase (ASADH; EC 1.2.1.11) is a key enzyme in the biosynthesis of essential amino acids in prokaryotes and fungi, and inhibition of ASADH could lead to the development of novel antitubercular agents. In the present work, combined structure- and ligand-based pharmacophore modeling, molecular docking, and molecular dynamics (MD) approaches were employed to identify potent inhibitors of Mycobacterium tuberculosis (Mtb)-ASADH. The structure-based pharmacophore hypothesis consists of three hydrogen bond acceptors (HBA), two negatively ionizable centers, and one positively ionizable center, while the ligand-based pharmacophore has one additional HBA feature and one hydrogen bond donor feature. The validated pharmacophore models were used to screen the chemical databases (ZINC and NCI). The screened hits were subjected to ADME and toxicity filters, and subsequently to molecular docking analysis. The 25 best-docked compounds carry highly electronegative functional groups (–COOH and –NO2) on both sides and exhibited H-bonding interactions with the highly conserved residues Arg99, Arg249, and His256. For further validation of the docking results, MD simulation studies were carried out on two representative compounds, NSC51108 and ZINC04203124. Both compounds remained bound to the key active residues of Mtb-ASADH during the MD simulations. These identified hits can be further used for lead optimization and in the design of more potent inhibitors against Mtb-ASADH.