Similar Articles
20 similar articles found (search time: 390 ms)
1.
RNA sequence analysis using covariance models. (Total citations: 43; self-citations: 8; citations by others: 35)
We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.
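Covariance models take their name from the way paired columns of an RNA alignment covary (a G-C in one sequence is matched by a C-G or an A-U in another). One common way to quantify such covariation, used here purely as an illustration and not as the authors' model-building procedure, is the mutual information between two alignment columns. A minimal Python sketch, assuming a gapped alignment given as equal-length strings and ignoring any row with a gap in either column:

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap ('-') in either column are ignored.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Toy alignment: columns 0 and 5 covary as base pairs (G-C, C-G, A-U, U-A),
# while column 2 is invariant and therefore carries no covariation signal.
toy = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
print(round(column_mutual_information(toy, 0, 5), 3))  # 2.0
print(round(column_mutual_information(toy, 0, 2), 3))  # 0.0
```

High mutual information between two individually variable columns is the kind of signal that suggests a conserved base pair; a full covariance model additionally scores the primary-sequence consensus, insertions and deletions.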

2.
Schultz GE, Drake JW. Genetics, 2008, 178(2): 661-673
Some mutations arise in association with a potential sequence donor that consists of an imperfect direct or reverse repeat. Many such mutations are complex; that is, they consist of multiple close sequence changes. Current models posit that the primer terminus of a replicating DNA molecule dissociates, reanneals with an ectopic template, extends briefly, and then returns to the cognate template, bringing with it a locally different sequence; alternatively, a hairpin structure may form the mutational intermediate when processed by mismatch repair. This process resembles replication repair, in which primer extension is blocked by a lesion in the template; in this case, the ectopic template is the other daughter strand, and the result is error-free bypass of the lesion. We previously showed that mutations that impair replication repair can enhance templated mutagenesis. We show here that the intensity of templated mutation can be exquisitely sensitive to its local sequence, that the donor and recipient arms of an imperfect inverse repeat can exchange roles, and that double mutants carrying two alleles, each affecting both templated mutagenesis and replication repair, can have unexpected phenotypes. We also record an instance in which the mutation rates at two particular sites change concordantly with a distant sequence change, but in a manner that appears unrelated to templated mutagenesis.

3.
Each V, D, and J gene segment is flanked by a recombination signal sequence (RSS), composed of a conserved heptamer and nonamer separated by a 12- or 23-bp spacer. Variations from consensus in the heptamer or nonamer at specific positions can dramatically affect recombination frequency, but until recently, it had been generally held that only the length of the spacer, but not its sequence, affects the efficacy of V(D)J recombination. In this study, we show several examples in which the spacer sequence can significantly affect recombination frequencies. We show that the difference in spacer sequence alone of two V(H)S107 genes affects recombination frequency in recombination substrates to a similar extent as the bias observed in vivo. We show that individual positions in the spacer can affect recombination frequency, and those positions can often be predicted by their frequency in a database of RSS. Importantly, we further show that a spacer sequence that has an infrequently observed nucleotide at each position is essentially unable to support recombination in an extrachromosomal substrate assay, despite being flanked by a consensus heptamer and nonamer. This infrequent spacer sequence RSS shows only a 2-fold reduction of binding of RAG proteins, but the in vitro cleavage of this RSS is approximately 9-fold reduced compared with a good RSS. These data demonstrate that the spacer sequence should be considered to play an important role in the recombination efficacy of an RSS, and that the effect of the spacer occurs primarily subsequent to RAG binding.
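The abstract notes that influential spacer positions can often be predicted from their frequency in a database of RSS. A minimal Python sketch of that idea, assuming a simple position-frequency score (sum of log frequencies with pseudocounts) and a hypothetical toy set of 12-bp spacers in place of a real RSS database; the statistic actually used in the study may differ:

```python
import math
from collections import Counter


def position_frequencies(spacers, pseudocount=1.0):
    """Per-position nucleotide frequencies (with pseudocounts) for equal-length spacers."""
    freqs = []
    for pos in range(len(spacers[0])):
        counts = Counter(s[pos] for s in spacers)
        total = len(spacers) + 4 * pseudocount
        freqs.append({nt: (counts[nt] + pseudocount) / total for nt in "ACGT"})
    return freqs


def spacer_score(spacer, freqs):
    """Sum of log frequencies; higher means a more typical spacer."""
    return sum(math.log(freqs[pos][nt]) for pos, nt in enumerate(spacer))


def least_frequent_spacer(freqs):
    """Spacer built from the least frequent nucleotide at every position."""
    return "".join(min(f, key=f.get) for f in freqs)


# Hypothetical 12-bp spacers standing in for an RSS database.
database = ["TACAAGAACCCA", "TACAAAAACCCA", "GACAAGAATCCA", "TACAGGAACCCA"]
freqs = position_frequencies(database)
rare = least_frequent_spacer(freqs)
print(rare, round(spacer_score(rare, freqs), 2))
print(database[0], round(spacer_score(database[0], freqs), 2))
```

A spacer assembled from the rarest nucleotide at every position necessarily receives the lowest possible score under this model, mirroring the "infrequent spacer" construct described above.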

4.
Strand displacement amplification (SDA) is an isothermal in vitro method of amplifying a DNA sequence prior to its detection. We have combined SDA with fluorescence polarization detection. A 5'-fluorescein-labelled oligodeoxynucleotide detector probe hybridizes to the amplification product, whose concentration rises during SDA, and the single- to double-strand conversion is monitored through an increase in fluorescence polarization. Detection sensitivity can be enhanced by using a detector probe containing an EcoRI recognition sequence at its 5'-end that is not homologous to the target sequence. During SDA the probe is converted to a fully double-stranded form that specifically binds a genetically modified form of the endonuclease EcoRI that lacks cleavage activity but retains binding specificity. We have applied this SDA detection system to a target sequence specific for Mycobacterium tuberculosis.
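Fluorescence polarization is computed from the emission intensities measured parallel and perpendicular to the polarized excitation light. A minimal Python sketch of the standard formula; the instrument G-factor, the 50 mP detection threshold and the intensity readings are hypothetical values chosen only for illustration:

```python
def polarization(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence polarization P = (I_par - G*I_perp) / (I_par + G*I_perp)."""
    corrected = g_factor * i_perpendicular
    return (i_parallel - corrected) / (i_parallel + corrected)


def probe_hybridized(p_before, p_after, min_increase_mp=50.0):
    """Flag hybridization when P rises by at least min_increase_mp millipolarization units.

    The 50 mP default is a hypothetical threshold, not a value from the study.
    """
    return 1000.0 * (p_after - p_before) >= min_increase_mp


# Hypothetical readings (arbitrary units): the free single-stranded probe tumbles
# quickly (low P); once it is part of a larger double-stranded product, P rises.
p_free = polarization(1100.0, 900.0)    # about 0.1
p_bound = polarization(1500.0, 500.0)   # about 0.5
print(probe_hybridized(p_free, p_bound))  # True
```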

5.
Stabilization of translationally active mRNA by prokaryotic REP sequences. (Total citations: 79; self-citations: 0; citations by others: 79)

6.
Distance-based algorithms are a common technique for constructing phylogenetic trees from taxonomic sequence data. The first step in the implementation of these algorithms is the calculation of a pairwise distance matrix to give a measure of the evolutionary change between any pair of the extant taxa. A standard technique is to use the log det formula to construct pairwise distances from aligned sequence data. We review a distance measure valid for the most general models, and show how the log det formula can be used as an estimator thereof. We then show that the foundation upon which the log det formula is constructed can be generalized to produce a previously unknown estimator which improves the consistency of the distance matrices constructed from the log det formula. This distance estimator provides a consistent technique for constructing quartets from phylogenetic sequence data under the assumption of the most general Markov model of sequence evolution.
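For reference, one widely used form of the log det distance is Lake's paralinear distance, computed from the 4x4 divergence matrix F of joint nucleotide frequencies at aligned sites and the base compositions of the two sequences: d = -1/4 [ln det F - 1/2 (ln det Pi_x + ln det Pi_y)], where Pi_x and Pi_y are diagonal matrices of base frequencies. The Python/NumPy sketch below implements only this standard estimator, not the generalized estimator proposed in the paper, and assumes aligned DNA in which gaps and ambiguity codes are simply skipped:

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def divergence_matrix(seq_x, seq_y):
    """4x4 joint frequency matrix: F[a, b] = fraction of aligned sites with a in x and b in y."""
    index = {nt: k for k, nt in enumerate(NUCLEOTIDES)}
    F = np.zeros((4, 4))
    for a, b in zip(seq_x, seq_y):
        if a in index and b in index:  # skip gaps and ambiguity codes
            F[index[a], index[b]] += 1
    return F / F.sum()


def paralinear_distance(seq_x, seq_y):
    """Paralinear (log det) distance: -1/4 [ln det F - 1/2 (ln det Pi_x + ln det Pi_y)]."""
    F = divergence_matrix(seq_x, seq_y)
    det_F = np.linalg.det(F)
    if det_F <= 0:
        raise ValueError("det F must be positive; the sequences are too short or too divergent")
    pi_x = F.sum(axis=1)  # base composition of x (row sums)
    pi_y = F.sum(axis=0)  # base composition of y (column sums)
    # Pi_x and Pi_y are diagonal, so ln det is the sum of the logs of their diagonals.
    return -0.25 * (np.log(det_F) - 0.5 * (np.log(pi_x).sum() + np.log(pi_y).sum()))


x = "ACGT" * 25
y = "ACGA" + "ACGT" * 24  # a single T -> A substitution
print(paralinear_distance(x, x))  # approximately 0 for identical sequences
print(paralinear_distance(x, y))  # small positive distance
```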

7.
Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases, which produces non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST
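Diversifying a ranked hit list can be sketched with maximal marginal relevance (MMR), a standard greedy re-ranking heuristic from information retrieval; it is named here as a stand-in and is not necessarily one of the methods implemented in Div-BLAST. The sketch assumes each hit already has a relevance score in [0, 1] (for example a normalized bit score) and that a pairwise similarity function in [0, 1] (for example percent identity divided by 100) is available; the hit names and scores are hypothetical:

```python
def _mmr_score(hit_id, relevance, selected, similarity, trade_off):
    """Relevance of a hit minus its redundancy with the hits already selected."""
    redundancy = max((similarity(hit_id, s) for s in selected), default=0.0)
    return trade_off * relevance[hit_id] - (1.0 - trade_off) * redundancy


def diversify(hits, similarity, k, trade_off=0.5):
    """Greedy MMR selection of k hits.

    hits: list of (hit_id, relevance) pairs.
    trade_off: 1.0 reproduces plain ranking by relevance; 0.0 optimizes diversity only.
    """
    remaining = dict(hits)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda h: _mmr_score(h, remaining, selected, similarity, trade_off))
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical hits: "a" and "b" are near-duplicates, "c" is less similar to the query
# but different from both.
hits = [("a", 1.0), ("b", 0.95), ("c", 0.7)]


def similarity(x, y):
    return 0.99 if {x, y} == {"a", "b"} else 0.1


print(diversify(hits, similarity, k=2))                 # ['a', 'c']
print(diversify(hits, similarity, k=2, trade_off=1.0))  # ['a', 'b']
```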

8.
We have developed a novel class of antisense agents, RNA Lassos, which are capable of binding to and circularizing around complementary target RNAs. The RNA Lasso consists of a fixed sequence derived from the hairpin ribozyme and an antisense segment whose size and sequence can be varied to base pair with accessible sites in the target RNA. The ribozyme catalyzes self-processing of the 5'- and 3'-ends of a transcribed Lasso precursor and ligates the processed ends to produce a circular RNA. The circular and linear forms of the self-processed Lasso coexist in an equilibrium that is dependent on both the Lasso sequence and the solution conditions. Lassos form strong, noncovalent complexes with linear target RNAs and form true topological linkages with circular targets. Lasso complexes with linear RNA targets were detected by denaturing gel electrophoresis and were found to be more stable than ordinary RNA duplexes. We show that expression of a fusion mRNA consisting of a sequence from the murine tumor necrosis factor-alpha (TNF-alpha) gene linked to luciferase reporter can be specifically and efficiently blocked by an anti-TNF Lasso. We also show in cell culture experiments that Lassos directed against Fas pre-mRNA were able to induce a change in alternative splicing patterns.

9.
We describe a peptide sequencing procedure which can be used to verify an amino acid sequence which is derived from a nucleotide sequence. One first labels the protein with a 3H- and a 14C-labelled amino acid and then cleaves the protein into a set of peptides using a cleavage reaction specific for a particular amino acid residue. Finally, one performs Edman degradations on the whole mixture of peptides. The released amino acids reflect the combined amino-terminal amino acid sequences of all the peptides that have been formed by the cleavage reaction. The data can therefore be used to check a deduced sequence simultaneously at several regions of the polypeptide chain. We have applied this sequencing procedure to verify the amino acid sequence deduced from the 26S RNA of Semliki Forest virus.
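Why one Edman run on the whole peptide mixture checks several regions at once can be seen by predicting the expected label profile from the deduced sequence. A minimal Python sketch, assuming complete cleavage after a single residue type (cyanogen bromide, for example, cleaves after methionine), no yield losses during degradation, and a hypothetical toy sequence; the labelled amino acids stand in for the 3H- and 14C-labelled residues:

```python
def cleave_after(sequence, residue):
    """Split a protein sequence C-terminal to every occurrence of `residue`."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == residue:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def expected_label_profile(sequence, cleavage_residue, labelled, cycles):
    """For each Edman cycle, count the labelled residues released by all peptides together."""
    peptides = cleave_after(sequence, cleavage_residue)
    profile = []
    for cycle in range(cycles):
        released = [p[cycle] for p in peptides if cycle < len(p)]
        profile.append({aa: released.count(aa) for aa in labelled})
    return profile


# Hypothetical deduced sequence, cleaved after Met, with Lys and Leu as the labels.
print(expected_label_profile("AKMLKMKA", "M", ("K", "L"), 3))
# [{'K': 1, 'L': 1}, {'K': 2, 'L': 0}, {'K': 0, 'L': 0}]
```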

10.
Life is based on replication and evolution. But replication cannot be taken for granted. We must ask what there was prior to replication and evolution. How does evolution begin? We have proposed prelife as a generative system that produces information and diversity in the absence of replication. We model prelife as a binary soup of active monomers that form random polymers. ‘Prevolutionary’ dynamics can have mutation and selection prior to replication. Some sequences might have catalytic activity, thereby enhancing the rates of certain prelife reactions. We study the selection criteria for these prelife catalysts. Their catalytic efficiency must be above certain critical values. We find a maintenance threshold and an initiation threshold. The former is a linear function of sequence length, and the latter is an exponential function of sequence length. Therefore, it is extremely hard to select for prelife catalysts that have long sequences. We compare prelife catalysis with a simple model for replication. Assuming fast template-based elongation reactions, we can show that replicators have selection thresholds that are independent of their sequence length. Our calculation demonstrates the efficiency of replication and provides an explanation of why replication was selected over other forms of prelife catalysis.

11.
Profile-based sequence search procedures are commonly employed to detect remote relationships between proteins. We provide an assessment of a Cascade PSI-BLAST protocol that rigorously employs intermediate sequences in detecting remote relationships between proteins. In this approach, a 'first generation' search uses PSI-BLAST, which involves multiple rounds of iteration, to query a database and detect an initial set of homologues of a protein. We then propagate a 'second generation' search in the database, running PSI-BLAST with each of the homologues identified in the previous generation as a query to recognize homologues not detected earlier. This non-directed search process can be viewed as an iteration of iterations that continues until no new homologues are detected. We present an assessment of the coverage of this 'cascaded' intermediate sequence search on diverse folds and find that searches of up to three generations detect most known homologues of a query. Our assessments show that this approach appears to perform better than the traditional use of PSI-BLAST, detecting 15% more relationships within a family and 35% more relationships within a superfamily. We show that such searches can be performed on generalized sequence databases and that non-trivial relationships between proteins can be detected effectively. Such a propagation of searches maximizes the chances of detecting distant homologies by effectively scanning protein "fold space".
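The 'iteration of iterations' amounts to a breadth-first expansion in which every newly found homologue becomes a query for the next generation. A minimal Python sketch; the `search` argument is a hypothetical placeholder for one complete PSI-BLAST run (no real BLAST API is called here), and the default of three generations follows the abstract's observation that up to three generations detect most known homologues:

```python
from typing import Callable, Iterable, Set


def cascade_search(query: str,
                   search: Callable[[str], Iterable[str]],
                   max_generations: int = 3) -> Set[str]:
    """Cascaded search: hits of one generation are the queries of the next.

    Stops when a generation finds no new hits or max_generations is reached.
    """
    found = {query}
    frontier = {query}
    for _ in range(max_generations):
        new_hits = set()
        for q in frontier:
            new_hits.update(hit for hit in search(q) if hit not in found)
        if not new_hits:
            break
        found |= new_hits
        frontier = new_hits
    return found - {query}


# Hypothetical search results: each key lists the hits one PSI-BLAST run would return.
links = {"query": ["hom1"], "hom1": ["query", "hom2"], "hom2": ["hom3"], "hom3": ["hom4"]}
print(sorted(cascade_search("query", lambda s: links.get(s, []))))  # ['hom1', 'hom2', 'hom3']
```

Raising `max_generations` in this toy example also reaches `hom4`, after which the next generation finds nothing new and the loop stops.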

12.
A cyclic shop is a production system that repeatedly produces identical sets of parts of multiple types, called minimal part sets (MPSs), in the same loading and processing sequence. Different part types may have different machine visit sequences. We consider a version of cyclic job shop where some operations of an MPS instance are processed prior to some operations of the previous MPS instances. We call such a shop an overtaking cyclic job shop (OCJS). The overtaking degree can be specified as the number of MPS instances that the operations of an MPS instance can overtake. In general, more overtaking results in more work-in-progress but reduces the cycle time. We prove that, for a given processing sequence of the operations at each machine and under some conditions, an OCJS has a stable earliest starting schedule in which each operation starts as soon as its preceding operations are completed, the schedule repeats an identical timing pattern for each MPS instance, and the cycle time is minimal. To prove this, we propose a specialized approach to analyzing steady states for an event graph model of an OCJS that has a cyclic structure, which preserves the MPS-based scheduling concept. Based on the steady-state results, we develop a mixed integer programming model for finding a processing sequence of the operations at each machine, and the overtaking degrees if necessary, that minimizes the cycle time.
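The steady-state cycle time of a timed event graph is commonly characterized as its maximum cycle ratio: over all circuits, the total duration divided by the number of tokens on the circuit. In the sketch below, an arc (u, v, w, t) is assumed to mean that operation v of an MPS instance may start w time units after operation u of the instance t positions earlier has started, so t plays the role of an overtaking or recurrence offset. The Python code computes the ratio by bisection with a Bellman-Ford positive-cycle test; this is a generic textbook technique, not the specialized steady-state analysis or the mixed integer program of the paper, and the two-operation example is hypothetical:

```python
def has_positive_cycle(n, arcs, lam):
    """Bellman-Ford check for a circuit with sum(w - lam * t) > 0 (longest-path form)."""
    dist = [0.0] * n  # as if a virtual source reached every node with weight 0
    for _ in range(n):
        updated = False
        for u, v, w, t in arcs:
            candidate = dist[u] + w - lam * t
            if candidate > dist[v] + 1e-12:
                dist[v] = candidate
                updated = True
        if not updated:
            return False
    return True  # still improving after n rounds: a positive circuit exists


def cycle_time(n, arcs, tol=1e-9):
    """Maximum cycle ratio sum(w) / sum(t) by bisection.

    Assumes non-negative durations and at least one token on every circuit.
    """
    lo, hi = 0.0, float(sum(w for _, _, w, _ in arcs))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_positive_cycle(n, arcs, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Two operations (durations 3 and 2) of one part type; the self-loops say that each
# machine handles one MPS instance at a time.
arcs = [(0, 1, 3, 0), (1, 0, 2, 1), (0, 0, 3, 1), (1, 1, 2, 1)]
print(round(cycle_time(2, arcs), 6))  # 5.0
# Letting operation 0 run one more instance ahead (2 tokens on the arc 1 -> 0):
arcs_overtaking = [(0, 1, 3, 0), (1, 0, 2, 2), (0, 0, 3, 1), (1, 1, 2, 1)]
print(round(cycle_time(2, arcs_overtaking), 6))  # 3.0
```

In this toy case the extra token halves the ratio of the 0 -> 1 -> 0 circuit, so the machine self-loop becomes the bottleneck, which echoes the abstract's point that more overtaking tends to reduce the cycle time.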

13.
Sequencing by hybridization (SBH) is a method for sequencing DNA. The Watson-Crick complementarity of DNA can be used to determine whether the DNA contains an oligonucleotide substring. A large number of oligonucleotides can be arranged on an array (SBH chip). A combinatorial method is used to construct the sequence from the collection of probes that occur in it. We develop an idea of Margaritis and Skiena and propose an algorithm that uses a series of small SBH chips to sequence long strings. The total number of probes used by our method matches the information theoretical lower bound up to a constant factor.
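Reconstructing a sequence from the set of probes it contains is classically done by treating every probe (k-mer) as an edge of a de Bruijn graph and finding an Eulerian path (Pevzner's approach). The Python sketch below shows only that single-chip reconstruction step, not the multi-chip method proposed in the paper, and assumes an error-free spectrum that lists every k-mer occurrence of the target:

```python
from collections import Counter, defaultdict


def reconstruct_from_spectrum(probes):
    """Return one string whose k-mer spectrum equals `probes`, via an Eulerian path.

    Each probe contributes an edge from its (k-1)-prefix to its (k-1)-suffix.
    """
    graph = defaultdict(list)
    balance = Counter()  # out-degree minus in-degree of each (k-1)-mer
    for probe in probes:
        prefix, suffix = probe[:-1], probe[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # An Eulerian path starts at the node with one more outgoing than incoming edge
    # (or anywhere, if the path closes into a cycle).
    start = next((node for node in graph if balance[node] == 1), next(iter(graph)))
    # Hierholzer's algorithm, iterative version.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


target = "ATGGCGTGCA"
spectrum = [target[i:i + 3] for i in range(len(target) - 2)]
print(reconstruct_from_spectrum(spectrum))
```

Because the toy target repeats the 2-mers TG and GC, more than one string shares its 3-mer spectrum and the function may return a different one of them; such ambiguity is one reason longer probes or additional information are needed for long targets.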

14.
We study a population genetics model of an organism with a genome of L(tot) loci that determine the values of T quantitative traits. Each trait is controlled by a subset of L loci assigned randomly from the genome. There is an optimum value for each trait, and stabilizing selection acts on the phenotype as a whole to maintain actual trait values close to their optima. The model contains pleiotropic effects (loci can affect more than one trait) and epistasis in fitness. We use adaptive walk simulations to find high-fitness genotypes and to study the way these genotypes are distributed in sequence space. We then simulate the evolution of haploid and diploid populations on these fitness landscapes and show that the genotypes of populations are able to drift through sequence space despite stabilizing selection on the phenotype. We study how the rate of drift and the extent of the accessible region of sequence space are affected by mutation rate, selection strength, population size, recombination rate, and the parameters L and T that control the landscape shape. There are three regimes of the model. If LT < L(tot), … If LT > L(tot), there are many small peaks that can be spread over a wide region of sequence space. Compensatory neutral mutations are important in the population dynamics in this case.

15.
16.
R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether, our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).
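The consistency-based component that R-Coffee inherits from T-Coffee works by library extension: the weight for aligning residue x with residue y is increased by the support of every third-sequence residue z aligned to both, by min(W(x, z), W(z, y)). A minimal Python sketch of that extension step only, assuming residues are represented as hypothetical (sequence name, position) tuples and each aligned pair appears once in the primary library; R-Coffee's secondary-structure-aware scoring is not modelled:

```python
from collections import defaultdict


def pair_key(x, y):
    """Order-independent key for a residue pair."""
    return (x, y) if x <= y else (y, x)


def extend_library(primary):
    """T-Coffee-style library extension.

    primary: {(x, y): weight}, where x and y are (sequence_name, position) residues
    from different sequences and each aligned pair is listed once.
    Returns {pair_key(x, y): W(x, y) + sum over z of min(W(x, z), W(z, y))}.
    """
    neighbours = defaultdict(dict)
    extended = defaultdict(float)
    for (x, y), weight in primary.items():
        neighbours[x][y] = weight
        neighbours[y][x] = weight
        extended[pair_key(x, y)] += weight
    # Every residue z aligned to both x and y supports the pair (x, y).
    for z, links in neighbours.items():
        partners = list(links.items())
        for i, (x, w_xz) in enumerate(partners):
            for y, w_zy in partners[i + 1:]:
                if x[0] != y[0]:  # only pair residues from different sequences
                    extended[pair_key(x, y)] += min(w_xz, w_zy)
    return dict(extended)


# Hypothetical primary library from three pairwise alignments of sequences A, B and C.
primary = {(("A", 0), ("B", 0)): 0.8, (("A", 0), ("C", 0)): 0.9, (("B", 0), ("C", 0)): 0.7}
extended = extend_library(primary)
print(extended[pair_key(("A", 0), ("B", 0))])  # 0.8 + min(0.9, 0.7)
```

Pairs that no pairwise alignment proposed directly can still receive weight through a shared partner, which is how consistency can rescue residue pairs a single aligner missed.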

17.
The design of oligonucleotides for gene silencing requires a rational method for identifying hybridization-accessible sequences within the target RNA. To this end, we have developed stem-loop self-quenching reporter molecules (SQRMs) as probes for such sequences. SQRMs have a 5' fluorophore, a quenching moiety on the 3' end, an intervening sequence that forms a stem of approximately 5 base pairs, and a loop sequence of approximately 20-30 bases. We have previously described a mapping strategy employing SQRMs to locate stem-loop structures in the target mRNA molecule. We now show that the original design constraint of a base-paired stem is not needed, either in vitro or in vivo. We propose that stemless probes possess sufficient signal-to-noise for use in vivo and that this ratio is an indication of hybridization of the probe to its target. We present data showing that these SQRMs can specifically target and reduce c-Myb protein synthesis and can be used for real-time in vivo assays.

18.
The nonconjugative streptococcal plasmid pMV158 can be mobilized by the conjugative streptococcal plasmid pIP501. We determined the sequence of the 1.1-kilobase EcoRI fragment of pMV158 to complete the DNA sequence of the plasmid. We showed that an open reading frame, mob (able to encode a polypeptide of 58,020 daltons), is required for mobilization of pMV158. An intergenic region present in the EcoRI fragment contains four lengthy palindromes that are found also in one or more of the staphylococcal plasmids pT181, pE194, and pUB110. One palindromic sequence, palD, which is common to all four plasmids, also appeared to be necessary for mobilization. Circumstantial evidence indicates that this sequence contains both an oriT site and the mob promoter. The Mob protein is homologous in its amino-terminal half to Pre proteins encoded by pT181 and pE194 that were shown by others to be essential for site-specific cointegrative plasmid recombination; their main biological function may be plasmid mobilization.
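In DNA, a palindrome is a stretch that reads the same as its reverse complement, as in the EcoRI site GAATTC. A minimal Python sketch that scans for perfect even-length palindromes by expanding around each possible centre; the palindromes reported in pMV158 may be imperfect or interrupted by a loop, which this sketch does not handle, and `min_len` is an arbitrary illustrative parameter:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=10):
    """Maximal perfect even-length palindromes (sequence == its reverse complement).

    Returns (start, palindrome) pairs, one per centre whose palindrome is long enough.
    """
    hits = []
    for centre in range(1, len(seq)):
        arm = 0
        # Extend while the base on the left pairs with the base on the right.
        while (centre - arm - 1 >= 0 and centre + arm < len(seq)
               and seq[centre - arm - 1] == reverse_complement(seq[centre + arm])):
            arm += 1
        if 2 * arm >= min_len:
            hits.append((centre - arm, seq[centre - arm:centre + arm]))
    return hits


print(find_palindromes("TTGAATTCAA", min_len=6))  # [(0, 'TTGAATTCAA')]
```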

19.
We report a new strategy for the synthesis of genes encoding repetitive, protein-based polymers of specified sequence, chain length, and architecture. In this stepwise approach, which we term "recursive directional ligation" (RDL), short gene segments are seamlessly combined in tandem using recombinant DNA techniques. The resulting larger genes can then be recursively combined until a gene of a desired length is obtained. This approach is modular and can be used to combine genes encoding different polypeptide sequences. We used this method to synthesize three different libraries of elastin-like polypeptides (ELPs); each library encodes a unique ELP sequence with systematically varied molecular weights. We also combined two of these sequences to produce a block copolymer. Because the thermal properties of ELPs depend on their sequence and chain length, the synthesis of these polypeptides provides an example of the importance of precise control over these parameters that is afforded by RDL.

20.
We propose a new method for homology search of nucleic acids or proteins in databanks. All possible subsequences of a specific length in a sequence are converted into a code and stored in an indexed file (hash-coding). This preliminary encoding of an entire bank is rather time-consuming, but it enables immediate access to all the sequence fragments of a given type. With our method, a strict homology pattern of twenty nucleotides can be found, for example, in the Los Alamos bank (GENBANK) in less than 2 seconds. We can also use this data storage to considerably speed up non-strict homology search programs and to write a program that helps select nucleic acid hybridization probes.
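The hash-coding idea can be sketched by packing every k-mer into an integer (2 bits per nucleotide) and indexing its occurrences; an exact query then needs one lookup plus a check of the candidate positions. This Python sketch assumes an in-memory dictionary in place of the indexed file and a 2-bit code chosen for illustration; the original coding scheme and file layout are not described in the abstract, and the bank contents are hypothetical:

```python
from collections import defaultdict

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a k-mer into an integer, 2 bits per nucleotide."""
    code = 0
    for nt in kmer:
        code = code * 4 + BASE_CODE[nt]
    return code


def build_index(bank, k):
    """Map the code of every k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in bank.items():
        for offset in range(len(seq) - k + 1):
            kmer = seq[offset:offset + k]
            if all(nt in BASE_CODE for nt in kmer):  # skip ambiguity codes such as N
                index[encode(kmer)].append((seq_id, offset))
    return index


def find_exact(bank, index, pattern, k):
    """Exact occurrences of a pattern of length >= k (ACGT only): look up its first
    k-mer, then verify the rest of the pattern at each candidate position."""
    hits = []
    for seq_id, offset in index.get(encode(pattern[:k]), []):
        if bank[seq_id][offset:offset + len(pattern)] == pattern:
            hits.append((seq_id, offset))
    return hits


bank = {"s1": "ACGTACGTTT", "s2": "TTACGTAC"}
index = build_index(bank, k=4)
print(find_exact(bank, index, "ACGTA", 4))  # [('s1', 0), ('s2', 2)]
```

Building the index touches every position of the bank once, which matches the abstract's remark that the preliminary encoding is slow while later lookups are fast.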


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417