Similar literature
20 similar articles found.
1.
The consensus sequence of E. coli promoter elements was determined by the method of random selection. A large collection of hybrid molecules was produced in which random-sequence oligonucleotides were cloned in place of a wild-type promoter element, and functional -10 and -35 E. coli promoter elements were obtained by a genetic selection involving the expression of a structural gene. The DNA sequences and relative levels of function for -10 and -35 elements were determined. The consensus sequences determined by this approach are very similar to those determined by comparing DNA sequences of naturally occurring E. coli promoters. However, no strong correlation is observed between similarity to the consensus and relative level of function. The results are considered in terms of E. coli promoter function and of the general applicability of the random selection method.
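As a rough illustration of how a consensus can be read off a set of selected elements, the sketch below takes the most frequent base at each position and counts how many positions of a given element agree with it. The hexamers in `selected` are hypothetical placeholders, not data from the study, and the function names are assumptions made for this example.

```python
from collections import Counter


def consensus(sequences):
    """Return the most frequent base at each position of equal-length sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    result = []
    for column in zip(*sequences):
        counts = Counter(column)
        # Ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda base: (-counts[base], base))
        result.append(best)
    return "".join(result)


def similarity(seq, cons):
    """Count the positions at which seq agrees with the consensus."""
    return sum(a == b for a, b in zip(seq, cons))


# Hypothetical -10 hexamers, used only to exercise the functions.
selected = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAAT"]
cons = consensus(selected)
scores = {s: similarity(s, cons) for s in selected}
```

A similarity score of this kind could then be compared with a measured level of function for each element; the abstract reports that the two are not strongly correlated.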

2.
3.
Antibody development is still associated with substantial risks and difficulties, as single mutations can radically change molecule properties such as thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization are necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grasp the physical principles underlying undesired molecule properties from the ground up are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach, which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position-dependent probabilities of individual amino acids but also conditional probabilities, which are indispensable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity.

4.
Base sequence influences the structure, mechanics, dynamics, and interactions of nucleic acids. However, studying all possible sequences for a given fragment leads to a number of base combinations that increases exponentially with length. We present here a novel methodology based on a multi-copy approach enabling us to determine which base sequence favors a given structural change or interaction via a single energy minimization. This methodology, termed ADAPT, has been implemented starting from the JUMNA molecular mechanics program by adding special nucleotides, "lexides," containing all four bases, whose contribution to the energy of the system is weighted by continuously variable coefficients. We illustrate the application of this approach in the case of double-stranded DNA by determining the optimal sequences satisfying structural (B-Z transition), mechanical (intrinsic curvature), and interaction (ligand-binding) properties.

5.
Abstract

This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures comprises all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism can also be applied to DNA.

7.
8.
Sequences from ribosomal RNA (rRNA) genes have made a huge contribution to our current understanding of metazoan phylogeny and indeed the phylogeny of all of life. That said, some parts of this rRNA-based phylogeny remain unresolved. One approach to increase the resolution of these trees would be to use more appropriate models of sequence evolution in phylogenetic analysis. RNAs transcribed from rRNA genes have a complex secondary structure mediated by base pairing between sometimes distant regions of the rRNA molecule. The pairing between the stem nucleotides has important consequences for their evolution, which differs from that of unpaired loop nucleotides. These differences in evolution should ideally be accounted for when using rRNA sequences for phylogeny estimation. We use a novel permutation approach to demonstrate the significant superiority of models of sequence evolution that allow stem and loop regions to evolve according to separate models and, in common with previous studies, we show that 16-state models that take base pairing of stems into account are significantly better than simpler, 4-state, single-nucleotide models. One of these 16-state models has been applied to the phylogeny of the Bilateria using small subunit rRNA (SSU) sequences. Our optimal tree largely echoes previous results based on SSU, in particular supporting the tripartite Bilaterian tree of deuterostomes, lophotrochozoans, and ecdysozoans. There are also a number of differences, however, perhaps the most important of which is the observation of a clade consisting of the gastrotrichs plus platyhelminthes that is basal to all other lophotrochozoan taxa. Use of 16-state models also appears to reduce the Bayesian support given to certain biologically improbable groups found using standard 4-state models.

9.
1. The nucleotide sequence of 5.8-S rRNA from Xenopus laevis is given; it differs by a C ↔ U transition at position 140 from the 5.8-S rRNA of Xenopus borealis. 2. The sequence contains two completely modified and two partially modified residues. 3. Three different 5' nucleotides are found: pU-C-G (0.4), pC-G (0.2) and pG (0.4). 4. The 3' terminus is C, not U as in all other 5.8-S sequences so far determined. 5. The X. laevis sequence differs from the mammalian and turtle sequences by five and six residue changes respectively. 6. A ribonuclease-resistant hairpin loop is a principal feature of secondary structure models proposed for this molecule. 7. Sequence heterogeneity may occur at one position at a very low level (approximately 0.01) in X. laevis 5.8-S rRNA, while none was detected in X. borealis or HeLa cell 5.8-S rRNA.

10.
The formation of melted regions from A + T-rich sequences and left-handed Z-DNA by alternating purine-pyrimidine sequences will both be facilitated by negative supercoiling, and thus if the sequences are present within the same plasmid molecule they will compete for the free energy of supercoiling. We have studied a series of plasmids that contain either (CG)8 or (TG)12 sequences in either G + C or A + T-rich contexts, by means of two-dimensional gel electrophoresis and chemical modification. We observe both B-Z and helix-coil transitions in all plasmids at elevated temperatures and low ionic strength. The plasmids fall into a number of different classes, in terms of the conformational behavior. As the superhelix density is increased, pCG8/vec ((CG)8 in G + C-rich context) undergoes an initial B-Z transition, followed by melting transitions in sequences remote from the (CG)8 sequence. The two transitions are coupled through the topology of the molecule but are otherwise independent. When the (CG)8 sequence was placed in an A + T-rich context (pCG8/col), the helix-coil transition was perturbed by the presence of the Z-DNA segment. Replacement of the (CG)8 tracts by (TG)12 sequences resulted in a further level of interaction between the transitions. Statistical mechanical modeling of the transitions suggested that at intermediate levels of negative supercoiling the Z-DNA formed by the (TG)12 sequence has a lowered probability due to the helix-coil transition in the A + T-rich sequences. These studies illustrate the complexities of competing conformational equilibria in supercoiled DNA molecules.

11.
12.
Qin F. Biophysical Journal, 2004, 86(3): 1488-1501
Patch-clamp recording provides an unprecedented means for study of detailed kinetics of ion channels at the single-molecule level. Analysis of the recordings often begins with idealization of noisy recordings into continuous dwell-time sequences. Success of an analysis is contingent on accuracy of the idealization. I present here a statistical procedure based on hidden Markov modeling and k-means segmentation. The approach assumes a Markov scheme involving discrete conformational transitions for the kinetics of the channel and a white background noise for contamination of the observations. The idealization is sought to maximize the a posteriori probability of the state sequence corresponding to the samples. The approach comprises two fundamental steps. First, given a model, the Viterbi algorithm is applied to determine the most likely state sequence. With the resultant idealization, the model parameters are then empirically refined. The transition probabilities are calculated from the state sequences, and the current amplitudes and noise variances are determined from the ensemble means and variances of those samples belonging to the same conductance classes. The two steps are iterated until the likelihood is maximized. In practice, the algorithm converges rapidly, taking only a few iterations. Because the noise is taken into explicit account, it allows for a low signal/noise ratio, and consequently a relatively high bandwidth. The approach is applicable to data containing subconductance levels or multiple channels and permits state-dependent noises. Examples are given to elucidate its performance and practical applicability.
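The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: emissions are taken to be Gaussian with a per-state mean and standard deviation, all transition and start probabilities are assumed to be strictly positive, only the amplitudes are re-estimated, and the function names and the toy trace are invented for this example.

```python
import math


def viterbi_idealize(samples, means, sigmas, trans, start):
    """Most likely state sequence for Gaussian emissions (log-space Viterbi)."""
    n = len(means)

    def log_gauss(x, k):
        z = (x - means[k]) / sigmas[k]
        return -0.5 * z * z - math.log(sigmas[k]) - 0.5 * math.log(2 * math.pi)

    # Assumption: every probability in start and trans is > 0, so log() is safe.
    score = [math.log(start[k]) + log_gauss(samples[0], k) for k in range(n)]
    back = []
    for x in samples[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + math.log(trans[i][j]))
            new_score.append(score[best_i] + math.log(trans[best_i][j]) + log_gauss(x, j))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def reestimate_means(samples, path, means):
    """Refine each amplitude as the mean of the samples assigned to that state."""
    new_means = []
    for k, old in enumerate(means):
        assigned = [x for x, s in zip(samples, path) if s == k]
        new_means.append(sum(assigned) / len(assigned) if assigned else old)
    return new_means


# Toy two-level trace (hypothetical values, arbitrary units).
trace = [0.1, -0.2, 0.05, 1.1, 0.9, 1.2, 0.0, 0.1]
path = viterbi_idealize(trace, [0.0, 1.0], [0.2, 0.2],
                        [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
refined = reestimate_means(trace, path, [0.0, 1.0])
```

In a fuller version the transition probabilities and noise variances would be re-estimated from the idealized path in the same way, and the two steps repeated until the path stops changing.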

13.
14.
Abstract

An analysis of the B-to-Z transition as a function of supercoiling for a natural Z-DNA-forming sequence found in plasmid pBR322 is presented at nucleotide resolution. The analysis is based on reactivity to four chemical probes which exhibit hyperreactivity in the presence of Z-DNA: hydroxylamine, osmium tetroxide, diethyl pyrocarbonate and dimethyl sulfate. We find that the initial transition occurs largely within a 14 base pair region which is mostly alternating purines and pyrimidines. With increasing negative supercoiling, Z-DNA extends into flanking regions having less and less alternating character, first in one direction and then in the other. Evidence of B-Z junctions is seen at four sites bracketing these three adjacent regions. One of these Z-forming regions contains the non-alternating sequence CTCCT, suggesting that such sequences can form Z-DNA without great difficulty if they are adjacent to alternating sequences. A plasmid containing three copies of a 61 base pair fragment bearing the entire Z-forming region shows equal reactivity of all three copies at any given superhelical density, implying that they compete equally and independently for the torsional strain energy which promotes the B-Z transition, and are unaffected by adjacent sequences more than 20–30 base pairs away.

15.
An evolutionary model for maximum likelihood alignment of DNA sequences (total citations: 16; self-citations: 0; citations by others: 16)
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

16.
Dot-matrix sequence similarity searches can be greatly sped up by using a table that lists all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. The algorithm described finds similarities between two sequences of lengths M and N, comparing L residues at a time, with an efficiency of L × M × N / S^k, where S is the alphabet size and k is the length of the oligomer. For nucleic acids, in which S = 4, use of a tetranucleotide table results in an efficiency of L × M × N / 256. The simplicity of the approach allows for a straightforward calculation of the level of similarities expected to be found for given search parameters. Furthermore, the storage required is minimal, allowing even large sequences to be compared on small microcomputers. Theoretical considerations regarding the use of this search are discussed.
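The lookup-table idea can be sketched as below. This is an illustrative reading of the abstract, not the published program: one sequence is indexed by its k-mers, and the second sequence is then scanned so that only positions sharing an exact k-mer are reported as candidate dots. The function names and the short test sequences are assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_table(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def kmer_matches(seq_a, seq_b, k):
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    table = build_kmer_table(seq_a, k)
    hits = []
    for j in range(len(seq_b) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for i in table.get(seq_b[j:j + k], ()):
            hits.append((i, j))
    return hits


# Tetranucleotide table (k = 4) on two short hypothetical sequences.
hits = kmer_matches("ACGTACGT", "TACG", 4)
repeats = kmer_matches("ACGTACGT", "ACGT", 4)
```

Each reported pair would then be extended to the full window of L residues; for random nucleic acid sequences only about one position pair in 4^4 = 256 is expected to share a tetranucleotide, which is where the saving comes from.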

17.
Gordon M. Crippen. Proteins, 1996, 26(2): 167-171
To calculate the tertiary structure of a protein from its amino acid sequence, the thermodynamic approach requires a potential function of sequence and conformation that has its global minimum at the native conformation for many different proteins. Here we study the behavior of such functions for the simplest model system that still has some of the features of the protein folding problem, namely two-dimensional square lattice chain configurations involving two residue types. First we show that even the given contact potential, which by definition is used to identify the folding sequences and their unique native conformations, cannot always correctly select which sequences will fold to a given structure. Second, we demonstrate that the given contact potential is not always able to favor the native alignment of a native sequence on its own native conformation over other gapped alignments of different folding sequences onto that same conformation. Because of these shortcomings, even in this simple model system in which all conformations and all native sequences are known and determined directly by the given potential, we must reexamine our expectations for empirical potentials used for inverse folding and gapped alignment on more realistic representations of proteins. © 1996 Wiley-Liss, Inc.

18.
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.

19.
R Alex, O Szeri, S Meyer, R Dildrop. Nucleic Acids Research, 1992, 20(9): 2257-2263
The DNA-binding domain of the murine N-Myc protein, comprising the basic helix-loop-helix-zipper (bHLH-zip) region was expressed as a fusion protein in E. coli. The affinity purified glutathione-S-transferase-N-Myc fusion protein (GST-N-MYC) was used to select the N-Myc specific DNA-recognition motif from a pool of random-sequence oligonucleotides. After seven rounds of binding-site selection, specifically enriched oligonucleotides were cloned and sequenced. Of 31 individual oligonucleotides whose sequences were determined, 30 contained a common DNA-motif, defining the hexameric consensus sequence CACGTG. We confirm by mutational analysis that binding of the N-Myc derived bHLH-zip domain to this motif is sequence-specific.

20.
We describe codon cassette mutagenesis, a simple method of mutagenesis that uses universal mutagenic cassettes to deposit single codons at specific sites in double-stranded DNA. A target molecule is first constructed that contains a blunt, double-strand break at the site targeted for mutagenesis. A double-stranded mutagenic codon cassette is then inserted at the target site. Each mutagenic codon cassette contains a three base pair direct terminal repeat and two head-to-head recognition sequences for the restriction endonuclease SapI, an enzyme that cleaves outside of its recognition sequence. The intermediate molecule containing the mutagenic cassette is then digested with SapI, thereby removing most of the mutagenic cassette, leaving only a three base cohesive overhang that is ligated to generate the final insertion or substitution mutation. A general method for constructing blunt-end target molecules suitable for this approach is also described. Because the mutagenic cassette is excised during this procedure and alters the target only by introducing the desired mutation, the same cassette can be used to introduce a particular codon at all target sites. Each cassette can deposit two different codons, depending on the orientation in which it is inserted into the target molecule. Therefore, a series of eleven cassettes is sufficient to insert all possible amino acids at any constructed target site. Thus codon cassettes are 'off-the-shelf' reagents, and this methodology should be a particularly useful and inexpensive approach for subjecting multiple different positions in a protein sequence to saturation mutagenesis.
