Similar literature
 20 similar documents found; search time 31 ms
1.
MOTIVATION: Dynamic programming is the core algorithm of sequence comparison, alignment and linear hidden Markov model (HMM) training. For a pair of sequence lengths m and n, the problem can be solved readily in O(mn) time and O(mn) space. The checkpoint algorithm introduced by Grice et al. (CABIOS, 13, 45-53, 1997) runs in O(Lmn) time and O(Lm·n^(1/L)) space, where L is a positive integer determined by m, n, and the amount of available workspace. The algorithm is appropriate for many string comparison problems, including all-paths and single-best-path hidden Markov model training, and is readily parallelizable. The checkpoint algorithm has a diagonal version that can solve the single-best-path alignment problem in O(mn) time and O(m + n) space. RESULTS: In this work, we improve performance by analyzing optimal checkpoint placement. The improved row checkpoint algorithm performs up to one half the computation of the original algorithm. The improved diagonal checkpoint algorithm performs up to 35% fewer computational steps than the original. We modified the SAM hidden Markov modeling package to use the improved row checkpoint algorithm. For a fixed sequence length, the new version is up to 33% faster for all-paths and 56% faster for single-best-path HMM training, depending on sequence length and allocated memory. Over a typical set of protein sequence lengths, the improvement is approximately 10%.
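The row-checkpoint idea in the abstract above can be illustrated with a small sketch. This is not the SAM implementation or the authors' optimized checkpoint placement. It is a minimal two-level version (roughly the L = 2 case) for plain global alignment, assuming a linear gap penalty and evenly spaced checkpoints; the function names `next_row` and `checkpoint_align` and the scoring defaults are hypothetical. Only about every sqrt(m)-th row is stored on the forward pass, and the rows between two checkpoints are recomputed when the traceback reaches them.

```python
import math


def next_row(prev, a_char, b, match, mismatch, gap):
    """Compute one row of the global-alignment score matrix from the row above."""
    row = [prev[0] + gap]
    for j in range(1, len(prev)):
        sub = match if a_char == b[j - 1] else mismatch
        row.append(max(prev[j - 1] + sub, prev[j] + gap, row[j - 1] + gap))
    return row


def checkpoint_align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment that stores only every `step`-th row (illustrative sketch)."""
    m, n = len(a), len(b)
    step = max(1, math.isqrt(m))  # assumed even spacing, not the paper's placement

    # Forward pass: keep row 0 and every step-th row as checkpoints.
    row = [j * gap for j in range(n + 1)]
    checkpoints = {0: row}
    for i in range(1, m + 1):
        row = next_row(row, a[i - 1], b, match, mismatch, gap)
        if i % step == 0:
            checkpoints[i] = row
    score = row[n]

    # Backward pass: rebuild one block of rows at a time and trace through it.
    out_a, out_b = [], []
    i, j = m, n
    while i > 0:
        top = ((i - 1) // step) * step  # nearest checkpoint strictly above row i
        block = [checkpoints[top]]
        for r in range(top + 1, i + 1):
            block.append(next_row(block[-1], a[r - 1], b, match, mismatch, gap))
        while i > top:
            cur, up = block[i - top], block[i - top - 1]
            sub = match if j > 0 and a[i - 1] == b[j - 1] else mismatch
            if j > 0 and cur[j] == up[j - 1] + sub:      # diagonal move
                out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
            elif cur[j] == up[j] + gap:                   # gap in b
                out_a.append(a[i - 1]); out_b.append("-"); i -= 1
            else:                                         # gap in a
                out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    while j > 0:                                          # leftover prefix of b
        out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))
```

In this sketch each row is computed at most twice (once forward, once during block recomputation), which mirrors the time-for-space trade the abstract describes; the checkpoint spacing is the quantity the paper tunes.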

2.
3.
Summary The stochastic model of molecular evolution was used to make a priori predictions for the total number of one-step nucleotide changes required to account for a given observed number of nucleotide substitutions between two homologous nucleic acids. The experimental deviations from randomness found for eukaryotic transfer RNAs and summarized by Dayhoff and McLaughlin (1972) are shown to affect only slightly the quantitative predictions of the model. This is true for both short and long periods of evolutionary divergence. The model can thus be used with some confidence for quantitatively correcting the branch lengths of phylogenetic trees derived from either nucleic acid sequence or hybridization data.

4.
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. The goal of the present study was to investigate the possibility that the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 super-family proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two exponential random independent variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had exponentially distributed random independent lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345) which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
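The abstract above calls the two candidate densities "closely related", and a short derivation shows why. As a sketch, assume the two independent exponential variables have rates \lambda_1 and \lambda_2 (symbols introduced here for illustration, not taken from the paper), and read "parameter 2" as a gamma shape parameter of 2, which is an assumption. Convolving the two exponential densities gives the density of the sum, and letting the rates coincide recovers the gamma form:

```latex
\begin{align*}
f_{X_1+X_2}(x)
  &= \int_0^x \lambda_1 e^{-\lambda_1 t}\,\lambda_2 e^{-\lambda_2 (x-t)}\,dt \\
  &= \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}
     \left( e^{-\lambda_1 x} - e^{-\lambda_2 x} \right),
     \qquad \lambda_1 \neq \lambda_2,\; x \ge 0, \\[4pt]
\lim_{\lambda_1,\lambda_2 \to \lambda} f_{X_1+X_2}(x)
  &= \lambda^2 x\, e^{-\lambda x}
  \quad \text{(gamma density with shape 2 and rate } \lambda\text{)}.
\end{align*}
```

Under these assumptions the gamma case is simply the equal-rate limit of the two-exponential sum, so both forms rise from zero at short lengths and decay exponentially at long lengths.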

5.
6.
Beta-lactamase expression in Streptomyces cacaoi.   (Cited 2 times: 1 self-citation, 1 by others)

7.
PCR-ribotyping, a typing method based on size variation in the 16S-23S rRNA intergenic spacer region (ISR), has been used widely for molecular epidemiological investigations of C. difficile infections. In the present study, we describe the sequence diversity of ISRs from 43 C. difficile strains, representing different PCR-ribotypes, and suggest homologous recombination as a possible mechanism driving the evolution of 16S-23S rRNA ISRs. ISRs of 45 different lengths (ranging from 185 bp to 564 bp) were found among 458 ISRs. All ISRs could be described with one of the 22 different structural groups defined by the presence or absence of different sequence modules: tRNA-Ala genes and different combinations of spacers of different lengths (33 bp, 53 bp or 20 bp) and 9 bp direct repeats separating the spacers. The ISR structural group, in most cases, coincided with the sequence length. ISRs of the same length also had very similar nucleotide sequences, suggesting that ISRs were not suitable for discriminating between different strains based only on the ISR sequence. Despite large variations in length, the alignment of ISR sequences, based on the primary sequence and secondary structure information, revealed many conserved regions which were mainly involved in maturation of pre-rRNA. Phylogenetic analysis of the ISR alignment yielded strong evidence for intra- and inter-homologous recombination which could be one of the mechanisms driving the evolution of C. difficile 16S-23S ISRs. The modular structure of the ISR, the high sequence similarities of ISRs of the same sizes and the presence of homologous recombination also suggest that different copies of C. difficile 16S-23S rRNA ISR are evolving in concert.

8.
The complete nucleotide sequence of lysU, the gene for the heat-inducible lysyl-tRNA synthetase of Escherichia coli, was determined and compared with the published sequence of lysS (herC), the gene for the constitutive lysyl-tRNA synthetase. These unlinked genes were found to be identical over 72% of their lengths. The deduced amino acid sequences of the respective gene products, LysU and LysS, were identical over 85% and similar over 92% of their lengths. Accumulation of high levels of LysU during growth of strains carrying the wild-type allele of lysU on multicopy plasmids had no observable effect on growth or on the synthesis of LysS. A lysU deletion strain was constructed and was shown to grow normally at low temperature (28 °C) but poorly at 44 °C; the slow growth (45% of normal) at elevated temperature was fully reversed by plasmids bearing wild-type lysU. The implications of these findings for the existence of two aminoacyl-tRNA synthetases for lysine are discussed.

9.
This paper presents a dynamic programming algorithm for aligning two sequences when the alignment is constrained to lie between two arbitrary boundary lines in the dynamic programming matrix. For affine gap penalties, the algorithm requires only O(F) computation time and O(M+N) space, where F is the area of the feasible region and M and N are the sequence lengths. The result extends to concave gap penalties, with somewhat increased time and space bounds. K.-M. C. and W. M. were supported in part by grant R01 LM05110 from the National Library of Medicine. R. C. H. was supported by PHS grant R01 DK27635.
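To make the "feasible region" in the abstract above concrete, here is a minimal sketch that scores a global alignment while visiting only cells between two boundaries. It is not the paper's algorithm: it assumes a linear (not affine) gap penalty, returns only the score rather than the alignment, and describes the region by two hypothetical per-row arrays `lo` and `hi` with 0 <= lo[i] <= hi[i] <= len(b). The function name `region_align_score` and the scoring defaults are likewise illustrative. The work done is proportional to the number of cells in the region, which corresponds to F in the abstract.

```python
NEG_INF = float("-inf")


def region_align_score(a, b, lo, hi, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells (i, j) with lo[i] <= j <= hi[i]."""
    m, n = len(a), len(b)
    if len(lo) != m + 1 or len(hi) != m + 1:
        raise ValueError("lo and hi need one entry per row 0..len(a)")
    if lo[0] != 0 or hi[m] != n:
        raise ValueError("region must contain cells (0, 0) and (len(a), len(b))")

    prev, prev_lo, prev_hi = None, 0, -1  # previous row and its column range
    for i in range(m + 1):
        cur = []  # scores for columns lo[i]..hi[i] of row i
        for j in range(lo[i], hi[i] + 1):
            best = 0 if (i == 0 and j == 0) else NEG_INF
            if j > lo[i]:                                 # move from the left
                best = max(best, cur[-1] + gap)
            if prev is not None:
                if prev_lo <= j <= prev_hi:               # move from above
                    best = max(best, prev[j - prev_lo] + gap)
                if prev_lo <= j - 1 <= prev_hi:           # diagonal move
                    sub = match if a[i - 1] == b[j - 1] else mismatch
                    best = max(best, prev[j - 1 - prev_lo] + sub)
            cur.append(best)
        prev, prev_lo, prev_hi = cur, lo[i], hi[i]
    # -inf here would mean the end corner is unreachable inside the region.
    return prev[n - prev_lo]
```

For example, passing `lo = [0] * (len(a) + 1)` and `hi = [len(b)] * (len(a) + 1)` covers the whole matrix and reduces the sketch to ordinary global alignment scoring, while a narrow band around the main diagonal visits far fewer cells. Only two rows are held at any time, so memory stays linear in the row width.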

10.
The ribonuclease Dicer excises mature miRNAs from a diverse group of precursors (pre-miRNAs), most of which contain various secondary structure motifs in their hairpin stem. In this study, we analyzed Dicer cleavage in hairpin substrates deprived of such motifs. We searched for factors other than the secondary structure that may influence the length diversity and heterogeneity of miRNAs. We found that the nucleotide sequence at the Dicer cleavage site influences both of these miRNA characteristics. With regard to cleavage mechanism, we demonstrate that the Dicer RNase IIIA domain that cleaves within the 3′ arm of the pre-miRNA is more sensitive to the nucleotide sequence of its substrate than is the RNase IIIB domain. The RNase IIIA domain avoids releasing miRNAs with a G nucleotide and prefers to generate miRNAs with a U nucleotide at the 5′ end. We also propose that the sequence restrictions at the Dicer cleavage site might be the factor that contributes to the generation of miRNA duplexes with 3′ overhangs of atypical lengths. This finding implies that the two RNase III domains forming the single processing center of Dicer may exhibit some degree of flexibility, which allows for the formation of these non-standard 3′ overhangs.

11.
The process of evolution is considered as the change of nucleotide sequence in an (N+1)-dimensional or 3-dimensional space. Restricting conditions may be represented by the shape of a tunnel, through which points or a rope representing nucleic acids move along with time, and may be similar for ontogenetic and phylogenetic development.

12.
H. Noda. Origins of Life, 1984, 14(1-4): 681-684
The process of evolution is considered as the change of nucleotide sequence in an (N+1)-dimensional or 3-dimensional space. Restricting conditions may be represented by the shape of a tunnel, through which points or a rope representing nucleic acids move along with time, and may be similar for ontogenetic and phylogenetic development.

13.
Bisnaphthalimide intercalators are anti-tumour agents composed of two planar rings linked by a flexible diazanonylene chain. The intercalated rings of three bisnaphthalimide analogues complexed to DNA are found here to undergo 180° rotating motions that do not affect the diazanonylene linker atoms bound to the major groove. These ring rotations are detected by NMR spectroscopy in a broad range of sequence contexts and duplex lengths. A comparative analysis of the frequency and activation energies of such excited states in different complexes and conditions indicates that these motions (i) are unrelated to drug dissociation; (ii) are a consequence of concerted, sequence-dependent nucleotide movements taking place on the millisecond time scale; and (iii) may occur inside the DNA duplexes. The rotation frequencies range from 2 to 25 s⁻¹ at 25 °C, depending on DNA composition and the size of the rotating rings. The detected nucleotide dynamics are likely to play an important role in the binding kinetics of the numerous proteins and drugs that require base unstacking when interacting with DNA.

14.
RNA interference (RNAi) has been exploited as a reverse genetic tool for functional genomics in the nonmodel species strawberry (Fragaria × ananassa) since 2006. Here, we analysed for the first time different but overlapping nucleotide sections (>200 nt) of two endogenous genes, FaCHS (chalcone synthase) and FaOMT (O‐methyltransferase), as inducer sequences and a transitive vector system to compare their gene silencing efficiencies. In total, ten vectors were assembled each containing the nucleotide sequence of one fragment in sense and corresponding antisense orientation separated by an intron (inverted hairpin construct, ihp). All sequence fragments along the full lengths of both target genes resulted in a significant down‐regulation of the respective gene expression and related metabolite levels. Quantitative PCR data and successful application of a transitive vector system coinciding with a phenotypic change suggested propagation of the silencing signal. The spreading of the signal in strawberry fruit in the 3′ direction was shown for the first time by the detection of secondary small interfering RNAs (siRNAs) outside of the primary targets by deep sequencing. Down‐regulation of endogenes by the transitive method was less effective than silencing by ihp constructs probably because the numbers of primary siRNAs exceeded the quantity of secondary siRNAs by three orders of magnitude. Besides, we observed consistent hotspots of primary and secondary siRNA formation along the target sequence which fall within a distance of less than 200 nt. Thus, ihp vectors seem to be superior over the transitive vector system for functional genomics in strawberry fruit.

15.
The nucleotide sequence of a rat myosin light chain 2 gene   (Cited 24 times: 4 self-citations, 20 by others)
A rat myosin light chain 2 gene was characterized by nucleotide sequence and S1 mapping analyses. It contains seven exons separated by six introns. The corresponding mRNA is predicted to be 654 nucleotides long (excluding polyA sequences), with 5'-nontranslated, coding, and 3'-nontranslated lengths of 56, 510, and 88 nucleotides, respectively. The predicted amino acid sequence is identical to that from rabbit except that the rat sequence lacks one of two Gly residues located at positions 12 and 13 in the rabbit sequence. From the nucleotide sequence, nascent rat myosin light chain 2 is predicted to have Met Ala preceding Pro at the N-terminal end.

16.
A special-purpose processor for gene sequence analysis   (Cited 1 time: 1 self-citation, 0 by others)
Advances in computational biology have occurred primarily in the areas of software and algorithm development; new designs of hardware to support biological computing are extremely scarce. This is due, we believe, to the presence of a non-trivial knowledge gap between molecular biologists and computer designers. The existence of this gap is unfortunate, as it has long been known that for certain problems, special-purpose computers can achieve significant cost/performance gains over general-purpose machines. We describe one such computer here: a custom accelerator for gene sequence analysis. The accelerator implements a version of the Needleman-Wunsch algorithm for nucleotide sequence alignment. Sequence lengths are constrained only by available memory; the product of sequence lengths in the current implementation can be up to 2^22. The machine is implemented as two NuBus boards connected to a Mac IIf/x, using a mixture of TTL and FPGA technology clocked at 10 MHz. The boards are completely functional, and yield a 15-fold performance improvement over an unassisted host.

17.
MOTIVATION: Recently, the concept of constrained sequence alignment was proposed to incorporate biologists' knowledge about the structures, functionalities and consensuses of their datasets into sequence alignment, such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in O(γn²) memory, where γ is the number of constraints and n is the maximum of the lengths of the sequences. As a result, such a high memory requirement limits the overall programs to aligning short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(αn) space, where α is the sum of the lengths of the constraints and usually α < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.

18.
In this work we report a simple way to assign a single numeric value in a three-dimensional space to a given nucleotide sequence. The method reported allows for theoretical comparisons of naturally occurring nucleotide sequences.

19.
DNA sequence organization in the genomes of five marine invertebrates   (Cited 10 times: 1 self-citation, 9 by others)
The arrangement of repetitive and non-repetitive sequence was studied in the genomic DNA of the oyster (Crassostrea virginica), the surf clam (Spisula solidissima), the horseshoe crab (Limulus polyphemus), a nemertean worm (Cerebratulus lacteus) and a jellyfish (Aurelia aurita). Except for the jellyfish these animals belong to the protostomial branch of animal evolution, for which little information regarding DNA sequence organization has previously been available. The reassociation kinetics of short (250-300 nucleotide) and long (2,000-3,000 nucleotide) DNA fragments was studied by the hydroxyapatite method. It was shown that in each case a major fraction of the DNA consists of single copy sequences less than about 3,000 nucleotides in length, interspersed with short repetitive sequences. The lengths of the repetitive sequences were estimated by optical hyperchromicity and S1 nuclease measurements made on renaturation products. All the genomes studied include a prominent fraction of interspersed repetitive sequences about 300 nucleotides in length, as well as longer repetitive sequence regions.

20.
The mouse VHIII subgroup is composed of four families which share sequence homology. We isolated a VH germ-line genomic clone, which cross hybridizes with a cDNA probe from one of these families, derived from a myeloma secreting an antigalactan antibody. We report here the nucleotide sequence of the cross hybridizing gene and show that very likely it has an anti-sheep red blood cell specificity. Comparison of its nucleotide sequence with those of the three other VHIII families shows that these genes share segmental homologies of variable lengths. This suggests that interchanges of sequence blocks between VH genes could be an important evolutionary mechanism for diversifying the germ-line repertoire. The strong homology (82%) with human VHIII genes suggests that efficient antibody sequences are strongly conserved. This conservation of homology is particularly striking when compared to the more limited homology (63%) between mouse and human C kappa genes.
