Similar Articles
1.
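This paper reports two computer programs for PCR primer design, PCRDESN and PCRDESNA. PCRDESN evaluates the quality of a user-designed primer pair in four respects: (1) inverted repeats or hairpin structures within a primer, (2) complementary base pairing between the two primers, (3) homology between the two primers, and (4) the base composition and characteristics of each primer, together with calculation of its Tm. The program's operating parameters were set by validating it against primer-pair sequences published in the literature and supplied by laboratories at our institute; this validation showed that the program checks primer-pair quality well and can explain why some PCR experiments fail. PCRDESNA uses stepwise optimization, with stricter primer-selection parameters than those of PCRDESN, to rapidly search a user-supplied nucleic acid sequence and identify all possible suitable primer pairs.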
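The four PCRDESN criteria can be approximated with a few string routines. The sketch below is not PCRDESN's algorithm or parameter set: it assumes the Wallace rule (Tm = 2(A+T) + 4(G+C) °C, valid only for short oligos) for melting temperature, and it uses the longest stretch shared with a reverse complement as a crude proxy for hairpin and primer-dimer potential. All function names are hypothetical.

```python
# Hypothetical helpers approximating the four primer-pair checks (not PCRDESN's code).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees C (short oligos only)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def check_pair(fwd: str, rev: str) -> dict:
    fwd, rev = fwd.upper(), rev.upper()
    return {
        # (1) self-complementary stretch within a primer (hairpin / palindrome proxy)
        "fwd_hairpin_run": longest_common_substring(fwd, revcomp(fwd)),
        "rev_hairpin_run": longest_common_substring(rev, revcomp(rev)),
        # (2) stretch of one primer that can pair with the other (primer-dimer proxy)
        "dimer_run": longest_common_substring(fwd, revcomp(rev)),
        # (3) identical stretch shared by both primers (homology)
        "homology_run": longest_common_substring(fwd, rev),
        # (4) composition and melting temperature
        "fwd_gc": gc_fraction(fwd), "rev_gc": gc_fraction(rev),
        "fwd_tm": wallace_tm(fwd), "rev_tm": wallace_tm(rev),
    }
```

For example, `check_pair("AAAAGGGG", "CCCCTTTT")` reports a `dimer_run` of 8, because the reverse primer is the exact reverse complement of the forward primer. Any thresholds for flagging a pair would have to be calibrated, as the authors did against published primer pairs.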

2.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
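The selection rule (score the pair under each candidate matrix and keep the matrix giving the highest optimal local score) can be sketched as follows. This is a simplification under stated assumptions: it uses a linear gap penalty rather than the affine penalties typical of SIM-style aligners, and the two toy nucleotide matrices named "close" and "distant" are hypothetical stand-ins for amino-acid matrices at different evolutionary distances.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def make_matrix(alphabet: str, match: int, mismatch: int) -> Matrix:
    """Toy substitution matrix: one score for identities, one for mismatches."""
    return {(x, y): (match if x == y else mismatch) for x in alphabet for y in alphabet}


def sw_score(a: str, b: str, sub: Matrix, gap: int) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty `gap` > 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(0,
                         prev[j - 1] + sub[(a[i - 1], b[j - 1])],  # (mis)match
                         prev[j] - gap,                            # gap in b
                         cur[j - 1] - gap)                         # gap in a
            best = max(best, cur[j])
        prev = cur
    return best


def select_matrix(a: str, b: str, matrices: Dict[str, Matrix], gap: int):
    """Pick the matrix whose optimal local alignment score is highest."""
    scores = {name: sw_score(a, b, m, gap) for name, m in matrices.items()}
    return max(scores, key=scores.get), scores
```

With `close = make_matrix("ACGT", 3, -3)` and `distant = make_matrix("ACGT", 2, -1)`, identical sequences select "close" while the divergent pair `"AAAAAAAA"`/`"ATATATAT"` selects "distant". Comparing raw scores is only meaningful when the candidate matrices share a common scale, as log-odds matrices in the same units do.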

3.
LINKER: a program to generate linker sequences for fusion proteins   Cited: 9 (self-citations: 0, by others: 9)
The construction of functional fusion proteins often requires a linker sequence that adopts an extended conformation to allow for maximal flexibility. Linker sequences are generally selected based on intuition. Without a reliable selection criterion, the design of such linkers is often difficult, particularly in situations where longer linker sequences are required. Here we describe a program called LINKER which can automatically generate a set of linker sequences that are known to adopt extended conformations as determined by X-ray crystallography and NMR. The only required input to the program is the desired linker sequence length. The program is specifically designed to assist in fusion protein construction. A number of optional input parameters have been incorporated so that users are able to enhance sequence selection based on specific applications. The program output simply contains a set of sequences with a specified length. This program should be a useful tool in both the biotechnology industry and biomedical research. It can be accessed through the Web page http://www.fccc.edu/research/labs/feng/linker.html.

4.
We describe a new computer program that identifies conserved secondary structures in aligned nucleotide sequences of related single-stranded RNAs. The program employs a series of hash tables to identify and sort common base-paired helices that are located in identical positions in more than one sequence. It reports the total number of base-paired helices conserved between related sequences and provides detailed information about common helices that contain at least one compensating base change. The program is useful in the analysis of large biological sequences. We have used it to examine the number and type of complementary segments (potential base-paired helices) found in common among related random sequences similar in base composition to 16S rRNA from Escherichia coli. Two types of random sequences were analyzed. One set consisted of sequences that were independent but had the same mononucleotide composition as the 16S rRNA. The second set contained sequences that were 80% similar to one another. The two types of random sequences gave different results. When 5 sequences that were 80% similar to one another were analyzed, significant numbers of potential helices with two or more independent base changes were observed. When 5 independent sequences were analyzed, no potential helices were found in common. The results of the analyses with random sequences were compared with the number and type of helices found in the phylogenetic model of the secondary structure of 16S ribosomal RNA. Many more helices are conserved among the ribosomal sequences than are found in common among similar random sequences. In addition, conserved helices in the 16S rRNAs are, on average, longer than the complementary segments found in comparable random sequences. The significance of these results and their application in the analysis of long non-ribosomal nucleotide sequences is discussed.
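The hash-table idea can be illustrated with a single dictionary keyed on alignment columns: every helix found in any sequence is filed under the columns of its outermost pair, so helices occupying identical positions in several sequences collide on the same key. This is a simplified sketch, not the published program: it assumes helices of one fixed length, Watson-Crick plus G-U pairing, a minimum hairpin loop of 3, and it defines a compensating change as two sequences using pairs that differ in both bases (e.g. G-C versus A-U).

```python
from collections import defaultdict
from itertools import combinations

# Canonical RNA pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def helices(seq, length, min_loop=3):
    """Yield (i, j) for every helix of exactly `length` stacked pairs in which
    column i+k pairs with column j-k; gap characters never pair."""
    n = len(seq)
    for i in range(n):
        for j in range(i + 2 * length + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(length)):
                yield (i, j)


def conserved_helices(aligned, length, min_loop=3):
    """Hash every helix by its alignment columns; keep those seen in >1 sequence."""
    table = defaultdict(list)  # (i, j) -> indices of sequences that contain it
    for s, seq in enumerate(aligned):
        for key in helices(seq, length, min_loop):
            table[key].append(s)
    return {key: members for key, members in table.items() if len(members) > 1}


def compensating_changes(aligned, key, members, length):
    """Count pair positions where two sequences use pairs differing in both bases."""
    i, j = key
    count = 0
    for k in range(length):
        pairs = {(aligned[s][i + k], aligned[s][j - k]) for s in members}
        if any(p[0] != q[0] and p[1] != q[1] for p, q in combinations(pairs, 2)):
            count += 1
    return count
```

For the toy alignment `["GGGAAAACCC", "GAGAAAACUC", "AAAAAAAAAA"]` with helix length 3, only the helix at columns (0, 9) is shared (by the first two sequences), and its middle pair (G-C versus A-U) counts as one compensating change.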

6.
gm: a practical tool for automating DNA sequence analysis   Cited: 1 (self-citations: 0, by others: 1)
The gm (gene modeler) program automates the identification of candidate genes in anonymous genomic DNA sequence data. gm accepts sequence data, organism-specific consensus matrices and codon asymmetry tables, and a set of parameters as input; it returns a set of models describing the structures of candidate genes in the sequence and a corresponding set of predicted amino acid sequences as output. gm is implemented in C and has been tested on Sun, VAX, Sequent, MIPS and Cray computers. It is capable of analyzing sequences of several kilobases containing multi-exon genes in >1 min execution time on a Sun 4/60. Received on December 4, 1989; accepted on February 28, 1990

8.
We have designed a computer program which rapidly scans nucleic acid sequences to select all possible pairs of oligonucleotides suitable for use as primers to direct efficient DNA amplification by the polymerase chain reaction. This program is based on a set of rules which define in generic terms both the sequence composition of the primers and the amplified region of DNA. These rules (1) enhance primer-to-target sequence hybridization avidity at critical 3'-end extension initiation sites, (2) facilitate attainment of full length extension during the 72 °C phase, by minimizing generation of incomplete or nonspecific product and (3) limit primer losses occurring from primer-self or primer-primer homologies. Three examples of primer sets chosen by the program that correctly amplified the target regions starting from RNA are shown. This program should facilitate the rapid selection of effective and specific primers from long gene sequences while providing a flexible choice of various primers to focus study on particular regions of interest.
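A rule-based scan of this kind can be sketched as a double loop over forward-primer positions and product ends. The specific rules below are assumptions chosen for illustration, not the published rule set: 40 to 60% GC content, a G or C at each primer's 3' end (a common "GC clamp" heuristic), and rejection of pairs whose last four 3' bases are mutually complementary.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMP)[::-1]


def ok_primer(p: str, gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Assumed rules: moderate GC content and a G or C at the 3' end."""
    gc = (p.count("G") + p.count("C")) / len(p)
    return gc_min <= gc <= gc_max and p[-1] in "GC"


def three_prime_dimer(a: str, b: str, n: int = 4) -> bool:
    """True if the last n bases of a can pair antiparallel with the last n of b."""
    return a[-n:] == revcomp(b[-n:])


def primer_pairs(template: str, k: int = 20, min_prod: int = 100, max_prod: int = 500):
    """Return (start, end, forward, reverse) for every acceptable pair; the
    product is template[start:end]."""
    if min_prod < 2 * k:
        raise ValueError("min_prod must be at least 2*k so primers do not overlap")
    t = template.upper()
    results = []
    for i in range(len(t) - k + 1):
        fwd = t[i:i + k]
        if not ok_primer(fwd):
            continue
        for end in range(i + min_prod, min(i + max_prod, len(t)) + 1):
            rev = revcomp(t[end - k:end])  # reverse primer anneals to the top strand here
            if ok_primer(rev) and not three_prime_dimer(fwd, rev):
                results.append((i, end, fwd, rev))
    return results
```

On a real gene the candidate list is large, which is why tools of this kind rank pairs or apply progressively stricter filters rather than returning every survivor.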

9.
SUMMARY: We have developed U-PRIMER, a primer design program, to compute a minimal primer set (MPS) for any given set of DNA sequences. The U-PRIMER algorithm uses automatic variable fixing and automatic redundant constraint elimination to tackle the binary integer programming problem associated with MPS selection. The program has been tested successfully on 32 adipocyte development-related genes and 9 TB-specific genes to obtain their respective MPSs. AVAILABILITY: A free copy of U-PRIMER implemented in C++ is available from http://www.u-vision-biotech.com
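Minimal primer set selection is a set-cover problem: with a binary variable x_p per candidate primer, minimize the sum of x_p subject to every target sequence being covered by at least one chosen primer. The sketch below swaps U-PRIMER's binary integer programming with variable fixing for plain exhaustive search by increasing set size, which is exact but exponential and only suitable for small illustrative inputs. It also assumes the hypothetical simplification that each primer maps to the set of sequences it can amplify.

```python
from itertools import combinations


def minimal_primer_set(coverage):
    """coverage: primer -> set of sequence IDs it can amplify.
    Return a smallest list of primers covering every sequence (exhaustive search)."""
    targets = set().union(*coverage.values())
    primers = sorted(coverage)
    for size in range(1, len(primers) + 1):
        # The first covering combination at the smallest size is a minimum cover.
        for combo in combinations(primers, size):
            if set().union(*(coverage[p] for p in combo)) == targets:
                return list(combo)
    return []
```

For instance, with primers p1 covering {g1, g2}, p2 covering {g2, g3}, and single-gene primers p3 and p4, no single primer suffices and the search returns `["p1", "p2"]`.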

10.
Diagnostic re-sequencing plays a central role in medical and evolutionary genetics. In this report we describe a process that applies fluorescence-based re-sequencing and an integrated set of analysis tools to automate and simplify the identification of DNA variations using the human mitochondrial genome as a model system. Two programs used in genome sequence analysis (Phred, a base-caller, and Phrap, a sequence assembler) are applied to assess the quality of each base call across the sequence. Potential DNA variants are automatically identified and 'tagged' by comparing the assembled sequence with a reference sequence. We also show that employing the Consed program to display a set of highly annotated reference sequences greatly simplifies data analysis by providing a visual database containing information on the location of the PCR primers, coding and regulatory sequences and previously known DNA variants. Among the 12 genomes sequenced, 378 variants, including 29 new variants, were identified, along with two heteroplasmic sites automatically detected by the PolyPhred program. Overall we document the ease and speed of performing high quality and accurate fluorescence-based re-sequencing on long tracts of DNA as well as the application of new approaches to automatically find and view DNA variants among these sequences.
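Phred quality values are defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong, so Q20 corresponds to a 1% error rate. The tagging step can then be sketched as a position-by-position comparison of an assembled consensus with a reference. This is not the Phred/Phrap/Consed/PolyPhred pipeline: it assumes the two sequences are already aligned without indels, and the Q20 cutoff separating "high" from "low" confidence tags is an illustrative assumption.

```python
def phred_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def tag_variants(reference, consensus, quals, min_q=20):
    """Compare an aligned consensus (no indels) with a reference and tag each
    mismatch as (1-based position, ref base, called base, confidence)."""
    if not (len(reference) == len(consensus) == len(quals)):
        raise ValueError("inputs must be the same length")
    tags = []
    for pos, (r, c, q) in enumerate(zip(reference, consensus, quals), start=1):
        if r != c:
            tags.append((pos, r, c, "high" if q >= min_q else "low"))
    return tags
```

Low-confidence tags are the ones a reviewer would inspect visually, which is the role the abstract gives to Consed's annotated display.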

11.
The success of comparative analysis in resolving RNA secondary structure and numerous tertiary interactions relies on the presence of base covariations. Although the majority of base covariations in aligned sequences are associated with Watson-Crick base pairs, many involve non-canonical or restricted base pair exchanges (e.g. only G:C/A:U), reflecting more specific structural constraints. We have developed a computer program that determines potential base pairing conformations for a given set of paired nucleotides in a sequence alignment. This program (ISOPAIR) assumes that the base pair conformation is maintained through sequence variation without significantly affecting the path of the sugar-phosphate backbone. ISOPAIR identifies such 'isomorphic' structures for any set of input base pair or base triple sequences. The program was applied to base pairs and triples with known structures and sequence exchanges. In several instances, isomorphic structures were correctly identified with ISOPAIR. Thus, ISOPAIR is useful when assessing non-canonical base pair conformations in comparative analysis. ISOPAIR applications are limited to those cases where unusual base pair exchanges indeed reflect a non-canonical conformation.

12.
Motivation: A large number of new DNA sequences with virtually unknown functions are generated as the Human Genome Project progresses. Therefore, it is essential to develop computer algorithms that can predict the functionality of DNA segments according to their primary sequences, including algorithms that can predict promoters. Although several promoter-predicting algorithms are available, they have high false-positive detection rates, and their rate of promoter detection needs to be improved further. Results: In this research, PromFD, a computer program to recognize vertebrate RNA polymerase II promoters, has been developed. Both vertebrate promoters and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database. Promoters are divided into a training set and a test set. Non-promoter sequences are obtained from the GenBank sequence databank, and are also divided into a training set and a test set. The first step is to search out, among all possible permutations, patterns of strings 5–10 bp long that are significantly over-represented in the promoter set. The program also searches for IMD (Information Matrix Database) matrices that have a significantly higher presence in the promoter set. The results of the searches are stored in the PromFD database, and the program PromFD scores input DNA sequences according to their content of the database entries. PromFD predicts promoters: their locations and the locations of potential TATA boxes, if found. The program can detect 71% of promoters in the training set with a false-positive rate of under 1 in every 13 000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach, and its false-positive identification rate is better than that of other available promoter recognition algorithms. The source code for PromFD is written in C++. Availability: PromFD is available for Unix platforms by anonymous ftp to beagle.colorado.edu (cd pub, get promFD.tar). A Java version of the program is also available for Netscape 2.0 at http://beagle.colorado.edu/chenq. Contact: E-mail: chenq{at}beagle.colorado.edu
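The first PromFD step, finding short strings over-represented in promoters relative to non-promoters, can be sketched as a k-mer frequency comparison. The enrichment statistic here (a pseudocount-smoothed frequency ratio with a fixed cutoff of 2) is an assumption for illustration; the abstract does not state PromFD's significance test, and the IMD matrix search is not modeled.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    c = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            c[s[i:i + k]] += 1
    return c


def overrepresented(pos, neg, k, min_ratio=2.0, pseudo=1.0):
    """Return {k-mer: log2 enrichment} for k-mers whose smoothed frequency in
    `pos` is at least min_ratio times their smoothed frequency in `neg`."""
    pc, nc = kmer_counts(pos, k), kmer_counts(neg, k)
    pt, nt = sum(pc.values()), sum(nc.values())
    out = {}
    for w, n in pc.items():
        ratio = ((n + pseudo) / (pt + pseudo)) / ((nc[w] + pseudo) / (nt + pseudo))
        if ratio >= min_ratio:
            out[w] = log2(ratio)
    return out
```

Running this for each k from 5 to 10 would mirror the string-length range in the abstract; only k-mers observed in the promoter set need to be scored, since unobserved ones cannot be over-represented.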

13.
TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
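One core quantity in such an analysis, the fraction of a sequence covered by tandem repeats, requires merging overlapping repeat intervals so that overlapping Tandem Repeats Finder hits are not double-counted. The sketch below assumes the repeats have already been reduced to 1-based inclusive (start, end) coordinates; it does not parse Tandem Repeats Finder's own output format, and the CSV column names are hypothetical.

```python
import csv
import io


def merge_intervals(intervals):
    """Merge overlapping or abutting 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def satellite_content(seq_len, repeats):
    """Fraction of a sequence of length seq_len covered by the given repeats."""
    covered = sum(e - s + 1 for s, e in merge_intervals(repeats))
    return covered / seq_len


def to_csv(rows):
    """rows: (sequence name, length, repeat fraction) -> CSV text."""
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow(["sequence", "length", "repeat_bp_fraction"])
    for name, length, frac in rows:
        w.writerow([name, length, f"{frac:.4f}"])
    return buf.getvalue()
```

Summing covered bases and lengths over all sequences gives the whole-genome figure in the same way.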

14.
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. The statistical significance of the detection results is ensured by determining the various parameters of the algorithm statistically. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with those of existing programs. In addition, the GYM program provides a range of useful information about a given protein sequence.

16.
MOTIVATION: RNA secondary structure analysis often requires searching for potential helices in large sequence data. RESULTS: We present a utility program GUUGle that efficiently locates potential helical regions under RNA base pairing rules, which include Watson-Crick as well as G-U pairs. It accepts a positive and a negative set of sequences, and determines all exact matches under RNA rules between positive and negative sequences that exceed a specified length. The GUUGle algorithm can also be adapted to use a precomputed suffix array of the positive sequence set. We show how this program can be effectively used as a filter preceding a more computationally expensive task such as miRNA target prediction. AVAILABILITY: GUUGle is available via the Bielefeld Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de/guugle
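What an "exact match under RNA rules" means can be shown with a brute-force diagonal scan: reversing the negative sequence turns antiparallel pairing into a same-direction comparison, and each maximal run of pairable positions of at least `min_len` is reported. This O(mn) sketch is for illustration only; GUUGle itself relies on suffix-based indexing for efficiency, and the function name and output format here are hypothetical. Note that G-U wobble makes pairing non-transitive, so ordinary string-matching shortcuts do not apply directly.

```python
# Canonical RNA pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def rna_matches(positive, negative, min_len):
    """Return (pos_start, neg_start, length) for every maximal stretch in which
    positive[pos_start:pos_start+length] base-pairs antiparallel with
    negative[neg_start:neg_start+length]."""
    r = negative[::-1]  # reverse so antiparallel pairing becomes a diagonal scan
    m, n = len(positive), len(r)
    hits = []

    def record(i_end, j_end, run):
        # i_end / j_end are one past the last paired position in positive / r;
        # r[j_end-run:j_end] is negative[n-j_end:n-j_end+run] read backwards.
        hits.append((i_end - run, n - j_end, run))

    for d in range(-(n - 1), m):  # diagonals where i - j == d
        i, j = max(d, 0), max(-d, 0)
        run = 0
        while i < m and j < n:
            if (positive[i], r[j]) in PAIRS:
                run += 1
            else:
                if run >= min_len:
                    record(i, j, run)
                run = 0
            i += 1
            j += 1
        if run >= min_len:
            record(i, j, run)
    return hits
```

For example, `rna_matches("GGGG", "UUUU", 4)` finds a single four-pair helix made entirely of G-U wobble pairs, which a Watson-Crick-only search would miss.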

17.
Herold KE, Rasooly A. BioTechniques, 2003, 35(6):1216-1221
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
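Temperature-matched tiling can be sketched as: from each tile start, grow the oligo one base at a time until its predicted Tm reaches the target. GC-rich stretches then yield shorter oligos and AT-rich stretches longer ones, which is the variable-length behavior described above. The Wallace rule (2 °C per A/T, 4 °C per G/C) and the default step and length limits are assumptions; the abstract does not say which Tm model Oligo Design uses.

```python
def wallace_tm(s: str) -> int:
    """Wallace rule Tm estimate in degrees C (short oligos only)."""
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def tile_oligos(seq, target_tm, step=1, min_len=8, max_len=30):
    """Return (start, oligo, Tm) tiles; each oligo is the shortest one starting
    at `start` whose Tm reaches target_tm. Starts that never reach it are skipped."""
    seq = seq.upper()
    tiles = []
    for start in range(0, len(seq) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            if start + length > len(seq):
                break
            oligo = seq[start:start + length]
            if wallace_tm(oligo) >= target_tm:
                tiles.append((start, oligo, wallace_tm(oligo)))
                break
    return tiles
```

On `"GCGCGCATATATATAT"` with a 24 °C target and a step of 4, the tile at position 0 needs only 6 nt (`GCGCGC`) while the tile at position 4 needs 10 nt (`GCATATATAT`).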

18.
A computer program has been developed which aids in the determination of restriction enzyme recognition sequences. This is achieved by cleaving DNAs of known sequence with a restriction endonuclease and comparing the fragmentation pattern with a computer-generated set of patterns. The feasibility of this approach has been tested using fragmentation patterns of ΦX174 DNA produced by enzymes of both known and unknown specificity. Recognition sequences are predicted for two restriction endonucleases (BbvI and SfaNI) using this method. In addition, recognition sequences are predicted for two other new enzymes (PvuI and MstI) using another computer-assisted method.
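The pattern-comparison approach can be sketched by generating a predicted fragment-size pattern for every candidate recognition sequence and keeping the candidates whose pattern matches the observed one. The assumptions here are loud ones: a linear molecule (ΦX174 DNA is circular in vivo), cutting immediately 5' of each site, candidates limited to k-mers present in the sequence, and a simple relative size tolerance. The toy sequence in the usage note is hypothetical.

```python
def fragments(seq, site):
    """Sorted fragment lengths when a linear seq is cut immediately 5' of every
    occurrence of site (assumed cut position; real enzymes differ)."""
    cuts = [i for i in range(1, len(seq) - len(site) + 1) if seq.startswith(site, i)]
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def matches_pattern(predicted, observed, tol=0.05):
    """Same number of fragments, each within a relative tolerance of the observed size."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tol * o for p, o in zip(sorted(predicted), sorted(observed)))


def candidate_sites(seq, observed, site_len=4, tol=0.05):
    """All site_len-mers of seq whose predicted digest matches the observed pattern."""
    kmers = {seq[i:i + site_len] for i in range(len(seq) - site_len + 1)}
    return sorted(s for s in kmers if matches_pattern(fragments(seq, s), observed, tol))
```

For the toy sequence `"TTTTGATCTTTTTTTTGATCTTTT"`, an observed pattern of 4, 8 and 12 bp is consistent only with the candidate site `GATC`.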

19.
In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.
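The search space that branch and bound prunes is the choice of one window start per sequence. The sketch below enumerates that space exhaustively rather than using the paper's score-based and consistency bounds, so it is exact but exponential in the number of sequences. The sum-of-pairs identity score is an assumed scoring function chosen for simplicity; the abstract does not specify the score.

```python
from itertools import combinations, product


def sp_score(windows):
    """Sum-of-pairs score: number of identical residue pairs in each column."""
    return sum(sum(a == b for a, b in combinations(col, 2)) for col in zip(*windows))


def best_motif_alignment(seqs, width):
    """Exhaustively choose one gapless window per sequence maximizing sp_score.
    Returns (score, tuple of start positions)."""
    ranges = [range(len(s) - width + 1) for s in seqs]
    best = (-1, None)
    for starts in product(*ranges):
        windows = [s[p:p + width] for s, p in zip(seqs, starts)]
        score = sp_score(windows)
        if score > best[0]:
            best = (score, starts)
    return best
```

With three sequences sharing `TTGAC` (`"AATTGACA"`, `"CTTGACAA"`, `"GGTTGACG"`) and width 5, the best alignment scores 15 (5 columns times 3 identical pairs) at starts (2, 1, 2). A branch-and-bound version would skip any partial assignment whose optimistic score bound cannot beat the best alignment found so far.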

20.
We describe TRiFLe, a freely accessible computer program that generates theoretical terminal restriction fragments (T-RFs) from any user-supplied sequence set tailored to a particular group of organisms, sequences from clone libraries, or sequences from specific genes. The program allows a rapid identification of the most polymorphic enzymes, creates a collection of T-RFs for the data set, and can potentially identify specific T-RFs in T-RF length polymorphism (T-RFLP) patterns by comparing theoretical and experimental results. TRiFLe was used to analyze T-RFLP data generated for the amoA and pmoA genes. The peaks identified in the T-RFLP patterns show an overlap of ammonia- and methane-oxidizing bacteria in the metalimnion of a subtropical lake.
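A theoretical T-RF is the distance from the labeled 5' end of a sequence to the first cut site, and the most polymorphic enzyme is the one yielding the most distinct T-RF lengths across the sequence set. The sketch assumes each sequence is already oriented from the labeled primer end and ignores the primer's own length; the three enzymes use their standard sites and cut positions (HhaI GCG^C, MspI C^CGG, RsaI GT^AC), while the toy sequences in the test are hypothetical.

```python
# enzyme -> (recognition site, cut offset within the site on the given strand)
ENZYMES = {"HhaI": ("GCGC", 3), "MspI": ("CCGG", 1), "RsaI": ("GTAC", 2)}


def terminal_fragment(seq, site, cut_offset):
    """Length of the 5'-terminal fragment, or the full length if the site is absent."""
    i = seq.upper().find(site)
    return len(seq) if i < 0 else i + cut_offset


def trf_table(seqs, enzymes=ENZYMES):
    """enzyme -> {sequence ID: theoretical T-RF length}."""
    return {name: {sid: terminal_fragment(s, site, off) for sid, s in seqs.items()}
            for name, (site, off) in enzymes.items()}


def most_polymorphic(seqs, enzymes=ENZYMES):
    """Enzyme giving the largest number of distinct T-RF lengths (first one on ties)."""
    table = trf_table(seqs, enzymes)
    return max(table, key=lambda e: len(set(table[e].values())))
```

Comparing these theoretical lengths with observed T-RFLP peak sizes, within the sizing error of the electrophoresis system, is what allows peaks to be tentatively assigned to particular sequences.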


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417