Similar articles
 20 similar articles found; search took 31 ms
1.
2.
We describe a set of IBM-compatible computer programs designed to selectively identify potential sites for silent mutagenesis within a target DNA sequence. The programs are based on a novel strategy of identifying amino acid motifs compatible with each restriction site (BioTechniques 12:382-384, 1991). They can be used to assess the suitability of a target for the introduction of any 6-base nucleic acid sequence, such as a restriction enzyme site in cassette mutagenesis strategies. The Table program generates a table of multiple amino acid motifs for each restriction enzyme, obtained by translating each unique recognition sequence in all three reading frames. The Silmut program, which uses the features of Table, then identifies any match between an amino acid motif of each restriction enzyme and the input target sequence. Minor manipulations of the database files enable the individual researcher to identify the potential for introduction of any 6-base sequence by silent mutagenesis.
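The strategy described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Table or Silmut code: the function names `motifs_for_site` and `find_silent_sites` are hypothetical, the standard genetic code is assumed, and the input is assumed to be an upper-case, in-frame coding sequence.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def motifs_for_site(site):
    """Map each reading frame (0, 1, 2) to the peptides that can encode `site`."""
    motifs = {}
    for frame in range(3):
        # Number of bases needed after the site to complete the last codon.
        right = (-(frame + len(site))) % 3
        peptides = set()
        # Try every possible flanking base on either side of the site.
        for fill in product(BASES, repeat=frame + right):
            padded = "".join(fill[:frame]) + site + "".join(fill[frame:])
            peptide = translate(padded)
            if "*" not in peptide:  # skip completions that create a stop codon
                peptides.add(peptide)
        motifs[frame] = peptides
    return motifs


def find_silent_sites(coding_dna, site):
    """Return (nucleotide_start, frame) pairs where `site` fits without changing the protein."""
    protein = translate(coding_dna)
    hits = []
    for frame, peptides in motifs_for_site(site).items():
        width = (frame + len(site) + 2) // 3  # number of codons the site spans
        for i in range(len(protein) - width + 1):
            if protein[i:i + width] in peptides:
                hits.append((3 * i + frame, frame))
    return sorted(hits)
```

For example, the EcoRI site GAATTC read in frame 0 encodes Glu-Phe, so a target that encodes Glu-Phe with other codons (such as GAG TTT) is reported as a candidate position. The sketch does not check whether the site is already present at that position.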

3.
Palindromic units (PU or REP) were defined as 40-nucleotide DNA sequences which are highly repeated in the genome of several members of the Enterobacteriaceae. They were shown to be a constituent of the bacterial interspersed mosaic element (BIME), in which they are associated with other repetitive sequences. We report here that Escherichia coli PU sequences contain three motifs (Y, Z1 and Z2), leading to the definition of two BIME families. The BIME-1 family, highly conserved over 145 nucleotides, contains two PUs (motifs Y and Z1). The BIME-2 family contains a variable number of PUs (motifs Y and Z2). We present evidence, using band shift experiments, that each PU motif binds DNA gyrase with a different affinity. This suggests that the two families are functionally distinct.

4.
5.
DNA Strider is a new integrated DNA and protein sequence analysis program written in C for the Macintosh Plus, SE and II computers. It has been designed to be easy to learn and use, as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and various DNA and protein analysis functions. The editor handles 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle 6 sequences of any type simultaneously, each up to 32.5 kB. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as a graphic restriction map, a hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction site search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds, and its multiline restriction map obtained in a scrolling window within 5 seconds.

6.
Base triples are recurrent clusters of three RNA nucleobases interacting edge-to-edge by hydrogen bonding. We find that the central base in almost all triples forms base pairs with the other two bases of the triple, providing a natural way to geometrically classify base triples. Given 12 geometric base pair families defined by the Leontis-Westhof nomenclature, combinatoric enumeration predicts 108 potential geometric base triple families. We searched representative atomic-resolution RNA 3D structures and found instances of 68 of the 108 predicted base triple families. Model building suggests that some of the remaining 40 families may be unlikely to form for steric reasons. We developed an on-line resource that provides exemplars of all base triples observed in the structure database and models for unobserved, predicted triples, grouped by triple family, as well as by three-base combination (http://rna.bgsu.edu/Triples). The classification helps to identify recurrent triple motifs that can substitute for each other while conserving RNA 3D structure, with applications in RNA 3D structure prediction and analysis of RNA sequence evolution.

7.
Background: AID/APOBEC3 (A3) enzymes instigate genomic mutations that are involved in immunity and cancer. Although they can deaminate any deoxycytidine (dC) to deoxyuridine (dU), each family member has a signature preference determined by nucleotides surrounding the target dC. This WRC (W = A/T, R = A/G) and YC (Y = T/C) hotspot preference is established for AID and A3A/A3B, respectively. Base alkylation and oxidation are two of the most common types of DNA damage induced environmentally or by chemotherapy. Here we examined the activity of AID, A3A and A3B on dCs neighboring such damaged bases. Methods: Substrates were designed to contain target dCs either in normal WRC/YC hotspots, or in oxidized/alkylated DNA motifs. AID, A3A and A3B were purified and deamination kinetics of each were compared between substrates containing damaged vs. normal motifs. Results: All three enzymes efficiently deaminated dC when common damaged bases were present in the -2 or -1 positions. Strikingly, some damaged motifs supported comparable or higher catalytic efficiencies by AID, A3A and A3B than the WRC/YC motifs which are their most favored normal sequences. Based on the resolved interactions of AID, A3A and A3B with DNA, we modeled interactions with alkylated or oxidized bases. Corroborating the enzyme assay data, the surface regions that recognize normal bases are predicted to also interact robustly with oxidized and alkylated bases. Conclusions: AID, A3A and A3B can efficiently recognize and deaminate dC whose neighboring nucleotides are damaged. General significance: Beyond AID/A3s initiating DNA damage, some forms of pre-existing damaged DNA can constitute favored targets of AID/A3s if encountered.
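The WRC and YC hotspot definitions given in this abstract translate directly into a pattern search. The following is a small sketch for locating candidate target dCs in an undamaged sequence. The function name `hotspot_targets` is hypothetical, and the code says nothing about damaged bases or enzyme kinetics.

```python
import re

# IUPAC codes used in the abstract, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]"}


def hotspot_targets(seq, motif):
    """Return 0-based indices of the target dC (last motif base) for every match."""
    pattern = "".join(IUPAC[ch] for ch in motif)
    # A zero-width lookahead reports overlapping matches as well.
    regex = re.compile("(?=" + pattern + ")")
    return [m.start() + len(motif) - 1 for m in regex.finditer(seq.upper())]
```

Calling `hotspot_targets(seq, "WRC")` lists candidate AID targets and `hotspot_targets(seq, "YC")` lists candidate A3A/A3B targets on the given strand. The reverse strand would need to be scanned separately.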

8.
We performed whole-genome analyses of DNA methylation in Shewanella oneidensis MR-1 to examine its possible role in regulating gene expression and other cellular processes. Single-molecule real-time (SMRT) sequencing revealed extensive methylation of adenine (N6mA) throughout the genome. These methylated bases were located in five sequence motifs, including three novel targets for type I restriction/modification enzymes. The sequence motifs targeted by putative methyltransferases were determined via SMRT sequencing of gene knockout mutants. In addition, we found that S. oneidensis MR-1 cultures grown under various culture conditions displayed different DNA methylation patterns. However, the small number of differentially methylated sites could not be directly linked to the much larger number of differentially expressed genes under these conditions, suggesting that DNA methylation is not a major regulator of gene expression in S. oneidensis MR-1. The enrichment of methylated GATC motifs in the origin of replication indicates that DNA methylation may regulate genome replication in a manner similar to that seen in Escherichia coli. Furthermore, comparative analyses suggest that many Gammaproteobacteria, including all members of the Shewanellaceae family, may also utilize DNA methylation to regulate genome replication.

9.
10.
11.
12.
Motif inference represents one of the most important areas of research in computational biology, and one of the oldest. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms, or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential for a better understanding of motifs in general. In particular, we consider a promising idea that was recently proposed, which attempted to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on such a cost, introducing a notion of basis that is provably contained in (and thus smaller than) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope of efficiently computing such bases unless the quorum is fixed.
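To make the terms "motif with wild cards" and "quorum" concrete, here is a brute-force sketch. It is not the basis construction studied in the paper. It enumerates every fixed-length pattern, which is exactly the exhaustive listing a basis is meant to avoid. The function names and the use of `.` as the wild card are assumptions made for illustration.

```python
from itertools import product


def occurrences(pattern, seq, wildcard="."):
    """Start positions where `pattern` matches `seq`; the wild card matches any letter."""
    m = len(pattern)
    return [
        i
        for i in range(len(seq) - m + 1)
        if all(p == wildcard or p == s for p, s in zip(pattern, seq[i:i + m]))
    ]


def meets_quorum(pattern, seq, quorum):
    """True if `pattern` occurs at least `quorum` times in `seq`."""
    return len(occurrences(pattern, seq)) >= quorum


def repeated_motifs(seq, length, quorum):
    """All patterns of `length` (>= 2, solid at both ends) that meet the quorum."""
    found = set()
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        # Each inner position is either kept or replaced by a wild card.
        for mask in product([False, True], repeat=length - 2):
            inner = "".join("." if hide else ch for hide, ch in zip(mask, window[1:-1]))
            pattern = window[0] + inner + window[-1]
            if pattern not in found and meets_quorum(pattern, seq, quorum):
                found.add(pattern)
    return found
```

Each window yields 2^(length - 2) candidate patterns, so the enumeration grows exponentially with the pattern length. This is the kind of blow-up that motivates reporting a generating basis instead of the full list.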

13.
Short DNA sequence motifs have been demonstrated to interact with DNA binding proteins and regulate flanking genes. The short nature and the lack of continuity of many of these DNA binding sites make it difficult to develop an approach to characterize genes that have the same flanking sequences. We tested various oligonucleotide combinations using an immunoglobulin variable region gene family as a model amplification system. One successful amplification strategy used an oligonucleotide containing two known noncontiguous short sequences connected by random insertion of all four bases to maintain the appropriate spacing. A second approach used an oligonucleotide having a single short homologous sequence with the addition of all four bases randomly placed at the 5' end to increase the extent of homology. Both strategies will permit the priming of members of a specific gene family; the oligonucleotide in which two short sequences are bridged by randomly added bases is the more efficient in the amplification process.

14.
Two gene families clustered in a small region of the Drosophila genome   Total citations: 13, self-citations: 0, citations by others: 13
Three Drosophila genes that are clustered within 8 × 10^3 bases of DNA at the chromosomal region 44D have been identified and mapped, and the gene cluster entirely sequenced. The three genes are 55 to 60% homologous in DNA sequence. One gene contains an intron in its 5'-proximal protein coding sequence while the other two have none at this position; similarly, another gene has an intron in its 3'-proximal protein coding sequence which is not found in the other genes. All three genes are abundantly expressed together in Drosophila first, second, and early third instar larval stages and in adults, but they are not abundantly expressed in either embryonic, late third instar larval, or pupal stages. This gene family lies 11 × 10^3 bases away from another cluster containing four Drosophila larval cuticle protein genes plus a pseudogene. The cuticle genes are all abundantly expressed throughout third instar larval development. Thus, at least seven protein-coding genes and one pseudogene lie within 27 × 10^3 bases of DNA. Moreover, two small gene families can lie adjacent on a chromosome and exhibit different patterns of developmental regulation, even though individual genes within each clustered family are co-ordinately expressed.

15.
16.
We have analyzed structure-sequence relationships in 32 families of flavin adenine dinucleotide (FAD)-binding proteins, to prepare for genomic-scale analyses of this family. Four different FAD-family folds were identified, each containing at least two protein families. Three of these families, exemplified by glutathione reductase (GR), ferredoxin reductase (FR), and p-cresol methylhydroxylase (PCMH), were previously defined, and a family represented by pyruvate oxidase (PO) is newly defined. For each of the families, several conserved sequence motifs have been characterized. Several newly recognized sequence motifs are reported here for the PO, GR, and PCMH families. Each FAD fold can be uniquely identified by the presence of distinctive conserved sequence motifs. We also analyzed cofactor properties, some of which are conserved within a family fold while others display variability. Among the conserved properties is cofactor directionality: in some FAD-structural families, the adenine ring of the FAD points toward the FAD-binding domain, whereas in others the isoalloxazine ring points toward this domain. In contrast, the FAD conformation and orientation are conserved in some families while in others they display some variability. Nevertheless, there are clear correlations among the FAD-family fold, the shape of the pocket, and the FAD conformation. Our general findings are as follows: (a) no single protein 'pharmacophore' exists for binding FAD; (b) in every FAD-binding family, the pyrophosphate moiety binds to the most strongly conserved sequence motif, suggesting that pyrophosphate binding is a significant component of molecular recognition; and (c) sequence motifs can identify proteins that bind phosphate-containing ligands.

17.
The methylation of DNA bases plays an important role in numerous biological processes including development, gene expression, and DNA replication. Salmonella is an important foodborne pathogen, and methylation in Salmonella is implicated in virulence. Using single molecule real-time (SMRT) DNA-sequencing, we sequenced and assembled the complete genomes of eleven Salmonella enterica isolates from nine different serovars, and analysed the whole-genome methylation patterns of each genome. We describe 16 distinct N6-methyladenine (m6A) methylated motifs, one N4-methylcytosine (m4C) motif, and one combined m6A-m4C motif. Eight of these motifs are novel, i.e., they have not been previously described. We also identified the methyltransferases (MTases) associated with 13 of the motifs. Some motifs are conserved across all Salmonella serovars tested, while others were found only in a subset of serovars. Eight of the nine serovars contained a unique methylated motif that was not found in any other serovar (most of these motifs were part of Type I restriction modification systems), indicating the high diversity of methylation patterns present in Salmonella.

18.
The recent deluge of new RNA structures, including complete atomic-resolution views of both subunits of the ribosome, has on the one hand literally overwhelmed our individual abilities to comprehend the diversity of RNA structure, and on the other hand presented us with new opportunities for comprehensive use of RNA sequences for comparative genetic, evolutionary and phylogenetic studies. Two concepts are key to understanding RNA structure: hierarchical organization of global structure and isostericity of local interactions. Global structure changes extremely slowly, as it relies on conserved long-range tertiary interactions. Tertiary RNA-RNA and quaternary RNA-protein interactions are mediated by RNA motifs, defined as recurrent and ordered arrays of non-Watson-Crick base-pairs. A single RNA motif comprises a family of sequences, all of which can fold into the same three-dimensional structure and can mediate the same interaction(s). The chemistry and geometry of base pairing constrain the evolution of motifs in such a way that random mutations that occur within motifs are accepted or rejected insofar as they can mediate a similar ordered array of interactions. The steps involved in the analysis and annotation of RNA motifs in 3D structures are: (a) decomposition of each motif into non-Watson-Crick base-pairs; (b) geometric classification of each base-pair; (c) identification of isosteric substitutions for each base-pair by comparison to isostericity matrices; (d) alignment of homologous sequences using the isostericity matrices to identify corresponding positions in the crystal structure; (e) acceptance or rejection of the null hypothesis that the motif is conserved.

19.
Proteins that bind to DNA are found in all areas of genetic activity within the cell. To help understand how these proteins perform their various functions, it is useful to analyse which residues are involved in binding to the DNA and how they interact with the bases and sugar-phosphate backbone of nucleic acids. Here we describe a program called NUCPLOT which can automatically identify these interactions from the 3D atomic coordinates of the complex from a PDB file and generate a plot that shows all the interactions in a schematic manner. The program produces a PostScript output file representing hydrogen, van der Waals and covalent bonds between the protein and the DNA. The resulting diagram is both clear and simple and allows immediate identification of important interactions within the structure. It also facilitates comparison of binding found in different structures. NUCPLOT is a completely automatic program, which can be used for any protein-DNA complex and will also work for certain protein-RNA structures.

20.
Discovery of local packing motifs in protein structures   Total citations: 1, self-citations: 0, citations by others: 1
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on pairwise structure comparisons, which are often time consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns, which are potential structure motifs. The method is based on describing the spatial neighborhood of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.
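The idea of describing a residue's spatial neighborhood as a string can be illustrated with a simplified sketch. The abstract does not give the authors' encoding, so everything here is an assumption: one coordinate per residue (for example the C-alpha atom), a fixed distance cutoff `radius`, and a string made of the one-letter codes of the neighbors in chain order.

```python
import math


def neighborhood_strings(residues, coords, radius=8.0):
    """For each residue, join the codes of all residues within `radius`, in chain order.

    `residues` is a string of one-letter codes; `coords` is a list of (x, y, z)
    tuples of the same length. Each residue is included in its own neighborhood.
    """
    strings = []
    for center in coords:
        members = [
            residues[j]
            for j, point in enumerate(coords)
            if math.dist(center, point) <= radius
        ]
        strings.append("".join(members))
    return strings
```

Strings built this way could then be passed to any sequence pattern discovery tool. The final check that matching strings correspond to spatially similar substructures, which the abstract describes, is not covered by this sketch.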
