Similar literature
1.
The large number of protein consensus sequences that may be recognized without computer analysis are reviewed. These include the extensive range of known phosphorylation site motifs for protein kinases; metal binding sites for calcium, zinc, copper, and iron; enzyme active site motifs; nucleotide binding and covalent attachment sites for prosthetic groups, carbohydrate, and lipids. Of particular note is the increasing realization of the importance for cellular regulation of protein-protein interaction motifs and sequences that target proteins to particular subcellular locations. This article includes an introduction to accessing the many suites of programs for analysis of protein structure, signatures of protein families, and consensus sequences that may be carried out on the internet.

2.
Genomic DNA sequences contain a wealth of information about the bendability and curvature of the DNA molecule. For example, the well-known 10-11 bp periodicities within genomes can be attributed to supercoiled structures or wrapping around nucleosomes. Such periodic signals have previously been examined mainly based on mono- or dinucleotide correlations. In this study, we generalize this approach and analyze correlation functions of longer motifs such as tetramers or poly(A) sequences. Periodically placed motifs may indicate regular protein binding or curvature signals. We detected various periodic signals, e.g. strong 10-11 bp oscillations of periodically placed poly(A), poly(T) or poly(W) stretches. These observations lead to a new view on the intensively studied 10-11 bp periodicities.
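The idea of a motif correlation function can be illustrated with a minimal sketch. It is not the authors' method: the function names, the toy sequence, and the choice of a plain pair-distance histogram (with no normalization against a background model) are all assumptions made for illustration.

```python
from collections import Counter

def motif_positions(seq, motif):
    """Return start indices of all (possibly overlapping) occurrences of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def distance_histogram(positions, max_dist):
    """Count pairs of occurrences separated by each distance 1..max_dist.

    positions must be sorted ascending (as returned by motif_positions).
    """
    counts = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            d = q - p
            if d > max_dist:
                break  # later occurrences are even farther away
            counts[d] += 1
    return [counts[d] for d in range(1, max_dist + 1)]

# Hypothetical toy sequence: an "AAAA" stretch planted every 10 bp.
seq = ("AAAA" + "CGCGCG") * 6
positions = motif_positions(seq, "AAAA")
hist = distance_histogram(positions, 30)  # hist[d - 1] = pairs at distance d
```

In this toy case the histogram is non-zero only at distances 10, 20 and 30, which is the kind of periodic signal the abstract describes; real genomic data would need a comparison against an expected background.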

3.
Protein kinase recognition sequence motifs   (cited 140 times in total: 0 self-citations, 140 by others)
Protein kinases play a crucial role in the regulation of many cellular processes. They alter the functions of their target proteins by phosphorylating specific serine, threonine and tyrosine residues. Identification of phosphorylation site sequences and studies with corresponding model peptides have provided clues to how these important enzymes recognize their substrate proteins. This knowledge has made it possible to identify potential sites of phosphorylation in newly sequenced proteins as well as to construct specific model substrates and inhibitors.

4.
A new algorithm has been constructed for finding under- and overrepresented oligonucleotide motifs in the protein coding regions of genomes that have been normalized for G/C content, codon usage, and amino acid order. This Robins-Krasnitz algorithm has been employed to compare the oligonucleotide frequencies between many different prokaryotic genomes. Evidence is presented demonstrating that at least some of these sequence motifs are functionally important and selected for or against during the evolution of these prokaryotes. The applications of this method include the optimization of protein expression for synthetic genes in foreign organisms, identification of novel oligonucleotide signals used by the organism and the examination of evolutionary relationships not dependent upon different gene sequence trees.

5.
Motifs in a given network are small connected subnetworks that occur in significantly higher frequencies than would be expected in random networks. They have recently gathered much attention as a concept to uncover structural design principles of complex networks. Kashtan et al. [Bioinformatics, 2004] proposed a sampling algorithm for performing the computationally challenging task of detecting network motifs. However, among other drawbacks, this algorithm suffers from a sampling bias and scales poorly with increasing subgraph size. Based on a detailed analysis of the previous algorithm, we present a new algorithm for network motif detection which overcomes these drawbacks. Furthermore, we present an efficient new approach for estimating the frequency of subgraphs in random networks that, in contrast to previous approaches, does not require the explicit generation of random networks. Experiments on a testbed of biological networks show our new algorithms to be orders of magnitude faster than previous approaches, allowing for the detection of larger motifs in bigger networks than previously possible and thus facilitating deeper insight into the field

6.
Patel RY, Balaji PV. Glycobiology, 2006, 16(2): 108-116
Eukaryotic sialyltransferases (SiaTs) comprise a superfamily of enzymes catalyzing the transfer of sialic acid (Sia) from a common donor substrate to various acceptor substrates in different linkages. These enzymes have been classified as ST3Gal, ST6Gal, ST6GalNAc, and ST8Sia families based on linkage- and acceptor monosaccharide-specificities and sequence similarities. It was recognized early on that SiaTs contain certain well-conserved motifs, and these were denoted as L (large)-, S (small)-, and VS (very small)-motifs; recently, a fourth motif, denoted as motif III, was identified. These four motifs are common to all the SiaTs, irrespective of the linkage- and acceptor saccharide-specificities. In this study, the sequences of the various families have been analyzed, and sequence motifs that are unique to the various families have been identified. These unique motifs are expected to contribute to the characteristic linkage- and acceptor saccharide-specificities of the family members. One of the linkage specific motifs is contiguous to L-motif. Members of ST3Gal and ST8Sia families share significant sequence similarities; in contrast, the ST6Gal family is distinct from the ST6GalNAc family. The latter consists of two subfamilies, one comprising ST6GalNAc I and ST6GalNAc II, and the other comprising ST6GalNAc III, ST6GalNAc IV, ST6GalNAc V, and ST6GalNAc VI. Each of these subfamilies has characteristic sequence motifs not present in the other subfamily.

7.
SUMMARY: CREDO is a user-friendly, web-based tool that integrates the analysis and results of different algorithms widely used for the computational detection of conserved sequence motifs in noncoding sequences. It enables easy comparison of the individual results. CREDO offers intuitive interfaces for easy and rapid configuration of the applied algorithms and convenient views on the results in graphical and tabular formats. AVAILABILITY: http://mips.gsf.de/proj/regulomips/credo.htm.

8.
9.
We present an update of our method for systematic detection and evaluation of potential helix-turn-helix DNA-binding motifs in protein sequences [Dodd, I. and Egan, J. B. (1987) J. Mol. Biol. 194, 557-564]. The new method is considerably more powerful, detecting approximately 50% more likely helix-turn-helix sequences without an increase in false predictions. This improvement is due almost entirely to the use of a much larger reference set of 91 presumed helix-turn-helix sequences. The scoring matrix derived from this reference set has been calibrated against a large protein sequence database so that the score obtained by a sequence can be used to give a practical estimation of the probability that the sequence is a helix-turn-helix motif.

10.
Motivation: Genomes contain biologically significant information that extends beyond that encoded in genes. Some of this information relates to various short dispersed repeats distributed throughout the genome. The goal of this work was to combine tools for detection of statistically significant dispersed repeats in DNA sequences with tools to aid development of hypotheses regarding their possible physiological functions in an easy-to-use web-based environment. Results: Ab Initio Motif Identification Environment (AIMIE) was designed to facilitate investigations of dispersed sequence motifs in prokaryotic genomes. We used AIMIE to analyze the Escherichia coli and Haemophilus influenzae genomes in order to demonstrate the utility of the new environment. AIMIE detected repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences, intergenic dyad sequences and several other over-represented sequence motifs. Distributional patterns of these motifs were analyzed using the tools included in AIMIE. Availability: AIMIE and the related software can be accessed at our web site http://www.cmbl.uga.edu/software.html. Contact: mrazek{at}uga.edu Associate Editor: Alex Bateman

11.
A fixed-point alignment analysis technique is presented which is designed to locate common sequence motifs in collections of proteins or nucleic acids. Initially a program aligns a collection of sequences by a common sequence pattern or known biological feature. The common pattern or feature (fixed-point) may be a user-specified sequence string or a known sequence position like mRNA start site, which may be taken directly from the annotated feature table of GenBank. Once all alignment markers are located, the sequences are scanned for occurrences of given oligomers within a specified span both upstream and downstream of the fixed-point. The occurrences may then be plotted as a function of the position relative to the fixed-point, displayed as an actual sequence alignment or selectively summarized via various program options. Applications of the technique are discussed. Received on August 17, 1987; accepted on November 17, 1987
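The scanning step described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the original program: the function name `oligomer_profile`, the use of the first occurrence of a user-specified string as the fixed-point, and the toy sequences are all assumptions.

```python
from collections import Counter

def oligomer_profile(sequences, anchor, oligomer, span):
    """Count oligomer start positions relative to each sequence's fixed-point.

    The fixed-point is taken (by assumption) as the first occurrence of
    `anchor`; offsets are oligomer_start - anchor_start, limited to +/- span.
    """
    profile = Counter()
    for seq in sequences:
        fixed = seq.find(anchor)
        if fixed == -1:
            continue  # no fixed-point in this sequence, so it cannot be aligned
        lo = max(0, fixed - span)
        hi = min(len(seq) - len(oligomer), fixed + span)
        for pos in range(lo, hi + 1):
            if seq.startswith(oligomer, pos):
                profile[pos - fixed] += 1
    return profile

# Hypothetical toy sequences with "CCGG" standing in for the fixed-point.
seqs = [
    "GGTATAATGAGACCGGTT",
    "TATAATAGAGCCGGA",
    "CTATAATGCCGGTATAAT",
]
profile = oligomer_profile(seqs, anchor="CCGG", oligomer="TATAAT", span=12)
```

The resulting counter maps each relative offset to the number of sequences in which the oligomer starts there, which is the quantity the abstract says may be plotted against position relative to the fixed-point.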

12.
13.
Plant S-adenosyl-L-methionine-dependent methyltransferases (SAM-Mtases) are the key enzymes in phenylpropanoid, flavonoid and many other metabolic pathways of biotechnological importance. Here we compiled the amino acid sequences of 56 SAM-Mtases from different plants and performed a computer analysis for the conserved sequence motifs that could possibly act as SAM-binding domains. To date, genes or cDNAs encoding at least ten distinct groups of SAM-Mtases that utilize SAM and a variety of substrates have been reported from higher plants. Three amino acid sequence motifs are conserved in most of these SAM-Mtases. In addition, many conserved domains have been discovered in each group of O-methyltransferases (OMTs) that methylate specific substrates and may act as sites for substrate specificity in each enzyme. Finally, a diagrammatic representation of the relationship between different OMTs is presented. These SAM-Mtase sequence signatures will be useful in the identification of SAM-Mtase motifs in the hitherto unidentified proteins as well as for designing primers in the isolation of new SAM-Mtases from plants.

14.
15.
Protein domains and sequence motifs have been very influential in the field of molecular biology. These units are the common currency of protein structure and function. The

16.
We have searched for the exclusivity of common sequence motifs of the mitochondrial uncoupling proteins (UCP1, UCP2, UCP3, UCP4, BMCP1, and plant UCP [PUMP]) within the gene family of mitochondrial anion carrier proteins. The UCP-specific sequences, "UCP signatures", were found in the first, second, and fourth alpha-helices. First: Ala/Ser-Cys/Thr/n-n/Phe-Ala/Gly-[negatively charged residue]-n/Phe-n/Cys-Thr-Phe/n; second: Gly/Ala-Ile/Leu-Gln/X-[positively charged residue]-NH-n/Cys-Ser/nphi/X-n/Ser-OH/Gly-n-[positively charged residue]-Ile/Met-Gly/Val-n/Thr; fourth: Pro-Asn/Thr-n-X-[positively charged residue]-Asn/Ser/Ala-n-n-Ile/Leu-n-Asn/Val-Cys/n-n/Thr-[negatively charged residue]-n-n/Thr/Pro-OH/Val (n, nonpolar; phi, aromatic; (positively charged residue/negatively charged residue, charged residue). The second and part of the third signature are also present in the yeast dicarboxylate transporter. The UCP signature excluding BMCP1 was also found in the second matrix segment: [positively charged residue]-(Pro/del-Leu/del)-[positively charged residue]-phi-X-Gly/Ser-Thr/n-X-NH/[negatively charged residue]-Ala-phi. These UCP signatures are thought to be involved in fatty acid anion binding and translocation.

17.
In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that, using random trees, the MY sampler still has a performance similar to the original version.

18.
19.
We aim at finding the smallest set of genes that can ensure highly accurate classification of cancers from microarray data by using supervised machine learning algorithms. The significance of finding the minimum gene subsets is three-fold: 1) it greatly reduces the computational burden and "noise" arising from irrelevant genes. In the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules which lead to accurate diagnosis without the need for any classifiers, 2) it simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can bring down the cost for cancer testing significantly, 3) it calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes by using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the 2-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis
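The two-step procedure in this abstract (rank genes, then test small combinations of the top-ranked ones) might look like the sketch below. The abstract names neither the ranking scheme nor the classifier, so a mean-separation score and a leave-one-out nearest-centroid classifier are used here purely as stand-ins; the function names and the toy expression matrix are likewise hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def rank_genes(X, y):
    """Step 1 (assumed scheme): rank genes by class-mean gap over class spread."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # avoid division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]

        def dist(center):
            return sum((X[i][g] - v) ** 2 for g, v in zip(genes, center))

        pred = min(centroids, key=lambda lab: dist(centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)

def best_subset(X, y, top_k=3, max_size=2):
    """Step 2: try every combination of the top-ranked genes, prefer small sets."""
    top = rank_genes(X, y)[:top_k]
    candidates = [c for size in range(1, max_size + 1)
                  for c in combinations(top, size)]
    return max(candidates, key=lambda c: (loo_accuracy(X, y, c), -len(c)))

# Hypothetical toy data: 6 samples x 4 genes; only gene 0 separates the classes.
X = [
    [1.0, 5.0, 2.0, 3.0],
    [1.2, 3.0, 2.5, 3.0],
    [0.8, 4.0, 1.5, 3.0],
    [3.0, 4.5, 2.2, 3.0],
    [3.2, 3.5, 2.8, 3.0],
    [2.8, 5.0, 3.0, 3.0],
]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(X, y)
chosen = best_subset(X, y)
```

Breaking ties in favour of the smaller subset mirrors the stated goal of finding the minimum gene set; the multi-class "divide-and-conquer" extension mentioned in the abstract is not covered by this sketch.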

20.