Similar articles
1.
Nucleotide sequence analysis revealed that a DNA length polymorphism 5' to the human antithrombin III gene is due to the presence of 32 bp or 108 bp nonhomologous nucleotide sequences (variable segments) 345 bp upstream from the translation initiation codon. Sequences at the 3' borders of both variable segments can form intrastrand inverted repeat structures with sequences further downstream. An inverted repeat is also found immediately 5' to the site where the variable segments are located. Thus, cruciform structures may form flanking the variable segments of both alleles of this DNA length polymorphism. DNA secondary structure may be detected with single-strand-specific nucleases. S1 nuclease-sensitive sites were mapped in recombinant plasmids containing the cloned alleles of the ATIII length polymorphism. The site most sensitive to S1 is located upstream from the variable segments in an AT-rich segment flanked by 6 bp direct repeats. A region of lesser nuclease sensitivity was also observed in the AT-rich loops formed between the inverted repeats 5' to the variable segments.

2.
Pattern discovery in unaligned DNA sequences is a challenging problem in both computer science and molecular biology. Several different methods and techniques have been proposed so far, but in most cases signals in DNA sequences are very complicated and elude detection. Exact exhaustive methods can solve the problem only for short signals with a limited number of mutations. In this work, we extend exhaustive enumeration to longer patterns as well. In more detail, the basic version of the algorithm presented in this paper, given as input a set of sequences and an error ratio epsilon < 1, finds all patterns that occur in at least q sequences of the set with at most epsilon·m mutations, where m is the length of the pattern. The only restriction is imposed on the location of mutations along the signal. That is, a valid occurrence of a pattern can present at most [epsilon·i] mismatches in the first i nucleotides, and so on. However, we show how the algorithm can also be used when no assumption can be made on the position of mutations. In this case, it is also possible to estimate the probability of finding a signal according to the signal length, the error ratio, and the input parameters. Finally, we discuss some significance measures that can be used to sort the patterns output by the algorithm.
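The position restriction described above can be illustrated with a short check. This is a minimal sketch, not the authors' algorithm: it only tests whether one candidate window is a valid occurrence of a pattern, and it assumes that the bracket [epsilon·i] denotes rounding down (floor). The function names are hypothetical.

```python
import math


def is_valid_occurrence(pattern: str, window: str, eps: float) -> bool:
    """Check the prefix-restricted mismatch rule.

    Assumption: [eps * i] is read as floor(eps * i), so every prefix of
    length i may contain at most floor(eps * i) mismatches.
    """
    if len(pattern) != len(window):
        return False
    mismatches = 0
    for i, (p, w) in enumerate(zip(pattern, window), start=1):
        if p != w:
            mismatches += 1
        if mismatches > math.floor(eps * i):
            return False
    return True


def occurs_in(pattern: str, sequence: str, eps: float) -> bool:
    """Return True if any window of the sequence is a valid occurrence."""
    m = len(pattern)
    return any(
        is_valid_occurrence(pattern, sequence[s:s + m], eps)
        for s in range(len(sequence) - m + 1)
    )
```

With eps = 0.25, a mismatch in the first position is rejected because floor(0.25 · 1) = 0, whereas the same mismatch at position 4 or later is tolerated. A pattern would then be reported if `occurs_in` holds for at least q of the input sequences.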

3.
The ability to select short DNA oligonucleotide sequences capable of binding solely to their intended target is of great importance in developing nucleic acid based detection technologies. Applications such as multiplex PCR rely on primers binding to unique regions in a genome. Competing side reactions with other primer pairs or template DNA decrease PCR efficiency. Freely available primer design software such as Primer3 screens for potential hairpin and primer-dimer interactions while selecting a single primer pair. The development of multiplex PCR assays (in the range of 5 to 20 loci) requires the screening of all primer pairs for potential cross-reactivity. However, a logistical problem arises from the total number of comparisons required. Comparing the primer set for a 10-plex assay (20 total primer sequences) results in 210 primer-primer combinations that must be screened. The ability to screen sets of candidate oligomers rapidly for potential cross-reactivity reduces overall assay development time. Here we report the application of a familiar sliding algorithm for comparing two strands of DNA in an overlapping fashion. The algorithm has been employed in a software package wherein the user can compare multiple sequences in a single computational run. After the screening is completed, a score is assigned to potential duplex interactions exceeding a user-defined threshold. Additional criteria of predicted melting temperature (Tm) and free energy of melting (deltaG) are included for further ranking. Sodium counterion and total strand concentrations can be adjusted for the Tm and deltaG calculations. The predicted interactions are saved in a text file for further evaluation.
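The count of 210 combinations and the idea of a sliding comparison can be sketched as follows. This is an illustrative sketch only: the abstract does not give the scoring scheme, so the score here is simply the number of Watson-Crick pairs at the best offset, and all function names are hypothetical. The count of 210 corresponds to all unordered pairs of 20 primers including each primer against itself, 20 · 21 / 2.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def count_comparisons(n_primers: int) -> int:
    """Unordered primer pairs, including each primer against itself."""
    return n_primers * (n_primers + 1) // 2


def max_complementary_pairs(primer_a: str, primer_b: str) -> int:
    """Slide primer_b antiparallel along primer_a (both given 5'->3').

    Returns the largest number of Watson-Crick pairs found at any offset.
    Assumption: a plain pair count stands in for the package's real score.
    """
    b_rev = primer_b[::-1]  # read primer_b 3'->5' so it faces primer_a
    best = 0
    for offset in range(-(len(b_rev) - 1), len(primer_a)):
        pairs = 0
        for j, base_b in enumerate(b_rev):
            i = offset + j
            if 0 <= i < len(primer_a) and COMPLEMENT[primer_a[i]] == base_b:
                pairs += 1
        best = max(best, pairs)
    return best
```

A screening run would call `max_complementary_pairs` for each of the `count_comparisons(20)` pairs and keep those above a user-defined threshold; ranking by Tm and deltaG would require a thermodynamic model that is not sketched here.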

4.
Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences, letting the user set both their length and their percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it easy to set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
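Generating a random sequence with a chosen length and nucleotide composition can be sketched in a few lines. This is not RANDNA's implementation: it assumes Python's built-in Mersenne Twister generator instead of the Borland Delphi 6 generator, and the function name and parameters are hypothetical. Each base is drawn independently, so the requested composition is matched in expectation rather than exactly.

```python
import random


def random_sequence(length: int, composition: dict, seed=None) -> str:
    """Draw `length` bases independently with the given relative weights.

    `composition` maps each base to a non-negative weight, for example
    {"A": 30, "C": 20, "G": 20, "T": 30} for percentages.
    """
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))
```

For a Monte Carlo test, many such sequences would be generated with the composition of the real data, and the statistic of interest recomputed on each to obtain a null distribution.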

5.
Kinetic and chemical analyses show that the haploid genome of Leishmania donovani has between 4.6 and 6.5 × 10^7 Kb pairs of DNA. Cot analysis shows that the genome contains 12% rapidly reassociating DNA, 13% middle repetitive DNA with an average reiteration frequency of 77, and 62% single copy DNA. Saturation hybridization experiments show that 0.82% of the nuclear DNA is occupied by rRNA coding sequences. The average repetition frequency of these sequences is determined to be 166. Sedimentation velocity studies indicate that the two major rRNA species have sedimentation values of 26S and 16S, respectively. The arrangement of the rRNA genes and their spacer sequences on long strands of purified rDNA has been determined by examining the structure of rRNA:DNA hybrids prepared for electron microscopy by the gene 32-ethidium bromide technique. Long DNA strands are observed to contain several gene sets (16S + 26S). One repeat unit contains the following sequences in the order given: (a) a 16S gene of length 2.12 Kb, (b) an internal transcribed spacer (Spl) of length 1.23 Kb, which contains a short sequence that may code for a 5.8S rRNA, (c) a 26S gene with a length of 4.31 Kb, which contains an internal gap region of length 0.581 Kb, (d) an external spacer of average length 5.85 Kb.

6.
7.
Complementary strand-specific adenovirus DNA, either full length or from restriction enzyme cleavage fragments, was used to estimate the fractional representation and abundance of viral sequences in two adenovirus type 2 (Ad2)-transformed rat cell lines, A2F19 and A2T2C4. The reassociation method introduced is based on the linear relationship, after exhaustive hybridization, between the inverted fraction of hybrid DNA and the molar ratio of probe to cellular DNA in the reaction mixture. The amount of viral DNA in A2F19 cells represents 12 to 14% of the viral genome at a level of around seven copies per diploid cell equivalent. For the cell line A2T2C4, the pattern of integrated viral DNA sequences is more complex. With full-length Ad2 DNA strands as a probe, about 56% of the probe was represented in cellular DNA. When each of the four BamHI fragment strands of Ad2 DNA was used as a probe, the fraction of the viral DNA present also amounted to around 56% with one to five copies from different regions of the viral genome. The results demonstrate the advantage of using strand-specific viral DNA as a probe in reassociation analysis with denatured cell DNA. The method should be useful in any system in which complementary strand separation of viral DNA sequences can be achieved.

8.
An algorithm is presented for the generation of sets of non-interacting DNA sequences, employing existing thermodynamic models for the prediction of duplex stabilities and secondary structures. A DNA ‘word’ structure is employed in which individual DNA ‘words’ of a given length (e.g. 12mer and 16mer) may be concatenated into longer sequences (e.g. four tandem words and six tandem words). This approach, where multiple word variants are used at each tandem word position, allows very large sets of non-interacting DNA strands to be assembled from combinations of the individual words. Word sets were generated and their figures of merit are compared to sets as described previously in the literature (e.g. 4, 8, 12, 15 and 16mer). The predicted hybridization behavior was experimentally verified on selected members of the sets using standard UV hyperchromism measurements of duplex melting temperatures (Tms). Additional experimental validation was obtained by using the sequences in formulating and solving a small example of a DNA computing problem.

9.
Velocity sedimentation studies of RNA of Sarcophaga bullata show that the major rRNA species have sedimentation values of 26S and 18S. Analysis of the rRNA under denaturing conditions indicates that there is a hidden break centrally located in the 26S rRNA species. Saturation hybridization studies using total genomic DNA and rRNA show that 0.08% of the nuclear DNA is occupied by rRNA coding sequences and that the average repetition frequency of these coding sequences is approximately 144. The arrangement of the rRNA genes and their spacer sequences on long strands of purified rDNA was determined by examining the structure of rRNA:DNA hybrids in the electron microscope. Long DNA strands contain several gene sets (18S + 26S), with one repeat unit containing the following sequences in the order given: (a) an 18S gene of length 2.12 kb, (b) an internal transcribed spacer of length 2.01 kb, which contains a short sequence that may code for a 5.8S rRNA, (c) a 26S gene of length 4.06 kb which, in 20% of the cases, contains an intron with an average length of 5.62 kb, and (d) an external spacer of average length 9.23 kb.

10.
Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired) hybridizations, in which two single DNA strands combine into a single new DNA duplex. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets that satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE) to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.
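For readers unfamiliar with the Hamming-distance constraints that the abstract contrasts with its thermodynamic criterion, a minimal sketch follows. The abstract does not list the exact constraints used, so the two checks below (distance between sequences, and distance to the reverse complement) are assumptions based on common practice in DNA code design, and the function names are hypothetical.

```python
from itertools import combinations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def satisfies_hamming_constraints(seqs, d: int) -> bool:
    """True if every pair differs in at least d positions, both directly
    and against the reverse complement (assumed constraint pair)."""
    for x, y in combinations(seqs, 2):
        if hamming(x, y) < d:
            return False
        if hamming(x, reverse_complement(y)) < d:
            return False
    return True
```

Such checks count mismatched positions only; they say nothing about duplex stability, which is the gap the free-energy-based constraint in the abstract is meant to address.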

11.
DNA and RNA strands are employed in novel ways in the construction of nanostructures, as molecular tags in libraries of polymers and in therapeutics. New software tools for prediction and design of molecular structure will be needed in these applications. The RNAsoft suite of programs provides tools for predicting the secondary structure of a pair of DNA or RNA molecules, testing that combinatorial tag sets of DNA and RNA molecules have no unwanted secondary structure and designing RNA strands that fold to a given input secondary structure. The tools are based on standard thermodynamic models of RNA secondary structure formation. RNAsoft can be found online at http://www.RNAsoft.ca.

12.
MOTIVATION: In a wide range of experimental techniques in biology, there is a need for an efficient method to calculate the melting temperature of pairings of two single DNA strands. Avoiding cross-hybridization when choosing primers for the polymerase chain reaction or selecting probes for large-scale DNA assays are examples where the exact determination of melting temperatures is important. Beyond being exact, the method has to be efficient, as these techniques often require the simultaneous calculation of melting temperatures of up to millions of possible pairings. The problem is to simultaneously determine the most stable alignment of two sequences, including potential loops and bulges, and calculate the corresponding melting temperature. RESULTS: As the melting temperature can be expressed as a fraction in terms of enthalpy and entropy differences of the corresponding annealing reaction, we propose to use a fractional programming algorithm, the Dinkelbach algorithm, to solve the problem. To calculate the required differences of enthalpy and entropy, the Nearest Neighbor model is applied. Using this model, the substeps of the Dinkelbach algorithm in our problem setting turn out to be calculations of alignments which optimize an additive score function. Thus, the usual dynamic programming techniques can be applied. The result is an efficient algorithm to determine melting temperatures of two DNA strands, suitable for large-scale applications such as primer or probe design. AVAILABILITY: The software is available for academic purposes from the authors. A web interface is provided at http://www.zaik.uni-koeln.de/bioinformatik/fptm.html
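The Dinkelbach iteration named above can be sketched on a toy problem. This is a generic illustration of fractional programming, not the authors' software: a finite list of (numerator, denominator) pairs stands in for the space of alignments, the inner `max` stands in for the dynamic-programming step that optimizes an additive score, and all denominators are assumed positive. The function name is hypothetical.

```python
def dinkelbach(candidates, tol=1e-12, max_iter=100):
    """Maximize n / d over (n, d) pairs with d > 0.

    Each step solves the additive subproblem max(n - lam * d) and then
    updates lam to the ratio of the maximizer, until the subproblem's
    optimum is (numerically) zero.
    """
    n0, d0 = candidates[0]
    lam = n0 / d0  # start from any feasible ratio
    best = candidates[0]
    for _ in range(max_iter):
        best = max(candidates, key=lambda c: c[0] - lam * c[1])
        if best[0] - lam * best[1] <= tol:
            break  # no candidate beats the current ratio
        lam = best[0] / best[1]
    return lam, best
```

For example, `dinkelbach([(1.0, 1.0), (5.0, 4.0), (3.0, 2.0)])` moves through the ratios 1.0, 1.25 and 1.5 before stopping at (3.0, 2.0). The appeal of the method in the abstract's setting is that each subproblem is additive, so standard alignment dynamic programming can solve it.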

13.
Garel T, Orland H. Biopolymers, 2004, 75(6): 453-467
The Poland-Scheraga (PS) model for the helix-coil transition of DNA considers the statistical mechanics of the binding (or hybridization) of two complementary strands of DNA of equal length, with the restriction that only bases with the same index along the strands are allowed to bind. In this article, we extend this model by relaxing these constraints: We propose a generalization of the PS model that allows for the binding of two strands of unequal lengths N1 and N2 with unrelated sequences. We study in particular (i) the effect of mismatches on the hybridization of complementary strands, (ii) the hybridization of noncomplementary strands (as resulting from point mutations) of unequal lengths N1 and N2. The use of a Fixman-Freire scheme scales down the computational complexity of our algorithm from O(N1^2 N2^2) to O(N1 N2). The simulation of complementary strands of a few kilo base pairs yields results almost identical to the PS model. For short strands of equal or unequal lengths, the binding displays a strong sensitivity to mutations. This model may be relevant to the experimental protocol in DNA microarrays, and more generally to the molecular recognition of DNA fragments. It also provides a physical implementation of sequence alignments.

14.
Although synthesizing and utilizing individual peptides and DNA primers has become relatively inexpensive, massively parallel probing and next-generation sequencing approaches have dramatically increased the number of molecules that can be subjected to screening; this, in turn, requires vast numbers of peptides and therefore results in significant expenses. To alleviate this issue, pools of related molecules are often used to downselect prior to testing individual sequences. A computational selection process to create pools of related sequences at large scale has not been reported for peptides. In the case of PCR primers, there have been successful attempts to address this problem by designing degenerate primers that can be produced at the same cost as conventional, unique primers and then be used to amplify several different genomic regions. We present an algorithm, "FlexGrePPS" (Flexible Greedy Peptide Pool Search), that can create a near-optimal set of peptide pools. This approach is also applicable to nucleotide sequences and outperforms most DNA primer selection programs. For the proteomic compression with FlexGrePPS, the main body of our work presented here, we demonstrate the feasibility of the computation of an exhaustive cover of pathogenic proteomes with degenerate peptides that lend themselves to antigenic screening. Furthermore, we present preliminary data that demonstrate the experimental utility of highly degenerate peptides for antigenic screening. FlexGrePPS provides a near-optimal solution for proteomic compression and there are no programs available for comparison. We also demonstrate computational performance of our GreedyPrime implementation, which is a modified version of FlexGrePPS applicable to the design of degenerate primers and is comparable to existing programs for the design of degenerate primers. 
Specifically, we focus on the comparisons with PAMPS and DPS-DIP, software tools that have recently been shown to be superior to other methods. FlexGrePPS forms the foundation of a novel antigenic screening methodology that is based on the representation of an entire proteome by near-optimal degenerate peptide pools. Our preliminary wet lab data indicate that the approach will likely prove successful in comprehensive wet lab studies, and hence will dramatically reduce the expenses for antigenic screening and make whole proteome screening feasible. Although FlexGrePPS was designed for computational performance in order to handle vast data sets, we found, very surprisingly, that even for small data sets the primer design version of FlexGrePPS, GreedyPrime, offers similar or even superior results for MP-DPD and most MDPD instances when compared to existing methods; despite their much longer run times, other approaches did not fare significantly better in reducing the original data sets to degenerate primers. The FlexGrePPS and GreedyPrime programs are available at no charge under the GNU LGPL license at http://sourceforge.net/projects/flexgrepps/.
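To make the notion of a degenerate primer concrete, the sketch below collapses a set of equal-length sequences into one primer written in standard IUPAC nucleotide codes and reports its degeneracy (the number of distinct sequences it represents). It does not reproduce FlexGrePPS or GreedyPrime, whose greedy pooling strategy is not described in enough detail here; the function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_primer(seqs):
    """Return (primer, degeneracy) covering all equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    codes = []
    degeneracy = 1
    for column in zip(*seqs):
        bases = frozenset(column)
        codes.append(IUPAC[bases])
        degeneracy *= len(bases)  # choices multiply across positions
    return "".join(codes), degeneracy
```

Merging every sequence into one primer is the crudest possible pooling: degeneracy grows multiplicatively with each variable position, which is why pool-selection methods instead search for groups of related sequences that keep it low.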

15.
The canonical double-helix form of DNA is thought to predominate both in dilute solution and in living cells. Sequence-dependent fluctuations in local DNA shape occur within the double helix. Besides these relatively modest variations in shape, more extreme and remarkable structures have been detected in which some bases become unpaired. Examples include unusual three-stranded structures such as H-DNA. Certain RNA and DNA strands can also fold onto themselves to form intrastrand triplexes. Although they have been extensively studied in vitro, it remains unknown whether nucleic acid triplexes play natural roles in cells. If natural nucleic acid triplexes were identified in cells, much could be learned by examining the formation, stabilization, and function of such structures. With these goals in mind, we adapted a pattern-recognition program to search genetic databases for a type of potential triplex structure whose presence in genomes has not been previously investigated. We term these sequences Potential Intrastrand Triplex (PIT) elements. The formation of an intrastrand triplex requires three consecutive sequence domains with appropriate symmetry along a single nucleic acid strand. It is remarkable that we discovered multiple copies of sequence elements with the potential to form one particular class of intrastrand triplexes in the fully sequenced genomes of several bacteria. We then focused on the characterization of the 25 copies of a particular approximately 37 nt PIT sequence detected in Escherichia coli. Through biochemical studies, we demonstrate that an isolated DNA strand from this family of E. coli PIT elements forms a stable intrastrand triplex at physiological temperature and pH in the presence of physiological concentrations of Mg2+.

16.
spads 1.0 (for ‘Spatial and Population Analysis of DNA Sequences’) is a population genetic toolbox for characterizing genetic variability within and among populations from DNA sequences. In view of the drastic increase in genetic information available through sequencing methods, spads was specifically designed to deal with multilocus data sets of DNA sequences. It computes several summary statistics from populations or groups of populations, performs input file conversions for other population genetic programs and implements locus‐by‐locus and multilocus versions of two clustering algorithms to study the genetic structure of populations. The toolbox also includes two MATLAB and R functions, Gdispal and Gdivpal, to display differentiation and diversity patterns across landscapes. These functions aim to generate interpolating surfaces based on multilocus distance and diversity indices. In the case of multiple loci, such surfaces can represent a useful alternative to the multiple pie-chart maps traditionally used in phylogeography to represent the spatial distribution of genetic diversity. These coloured surfaces can also be used to compare different data sets or different diversity and/or distance measures estimated on the same data set.

17.
A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a ‘basic’ one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, given the common segments are present in the basic sequence). The working time of the algorithm presented is proportional to n·L^2 when n sequences of length L are analyzed. The algorithm proposed is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures as well as the dependence of the results on the choice of basic sequence are discussed.

18.
Phylogenetic test of the molecular clock and linearized trees
To estimate approximate divergence times of species or species groups with molecular data, we have developed a method of constructing a linearized tree under the assumption of a molecular clock. We present two tests of the molecular clock for a given topology: two-cluster test and branch-length test. The two-cluster test examines the hypothesis of the molecular clock for the two lineages created by an interior node of the tree, whereas the branch-length test examines the deviation of the branch length between the tree root and a tip from the average length. Sequences evolving excessively fast or slow at a high significance level may be eliminated. A linearized tree will then be constructed for a given topology for the remaining sequences under the assumption of rate constancy. We have used these methods to analyze hominoid mitochondrial DNA and drosophilid Adh gene sequences.

19.
Nuclear DNA from the slime mould Physarum polycephalum is digested by the restriction endonuclease HpaII to generate a high molecular weight and a low molecular weight component. These are referred to as the M+ and the M- compartment, respectively. Sequences that are present in the M+ compartment are cleaved by MspI, the restriction enzyme isoschizomer of HpaII, thus showing that the recognition sequences for these enzymes in M+ DNA contain methylated CpG doublets. The distribution of repetitive sequences in the M+ and M- DNA compartments was investigated by comparison of the 'fingerprint' patterns of total Physarum DNA and isolated M+ DNA after digestion using different restriction endonucleases, and by probing for the presence of specific repetitive sequences in Southern blots of M+ and M- DNA by the use of cloned DNA segments. Both types of experiment indicate that many repetitive sequences are shared by both compartments, though some repetitive sequences appear to be considerably enriched, or are present exclusively, either in M+ DNA or in M- DNA.

20.
Intrastrand self-complementary sequences have been isolated from the DNA of Bacillus subtilis by hydroxyapatite (HA) chromatography following thermal renaturation of strands separated by chromatography on methylated albumin kieselguhr (MAK). The intrastrand structures derived from the MAK H strand (HA HII) were biologically active, showing transforming activity for a wide variety of markers, as well as hybridization to both pulse-labelled and ribosomal RNA. Removal of regions of single-strand DNA with S1 nuclease did not significantly alter the biological activity of the self-annealed molecules. The overall efficiency of transformation and hybridization of the intrastrand self-annealing DNA was low, suggesting that many sequences in the population are neither active in transformation to prototrophy nor transcribed into RNA.
