Similar literature
20 similar articles found (search time: 62 ms)
1.

Background  

Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences, as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.

2.
3.
A computer program (RSITE) was developed which predicts the recognition sequence of a restriction endonuclease. The sizes of fragments experimentally determined on cleavage of a DNA of known sequence were input. Possible recognition sequences producing fragments of sizes matching those determined empirically were printed out. The program faithfully predicted the specificity of restriction enzymes of known recognition sequence and also determined the recognition sequence of a new restriction enzyme from Haemophilus influenzae GU (HinGU II).
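The search that this abstract describes can be illustrated with a minimal Python sketch. It is not the RSITE code: it assumes a linear DNA molecule, a cut at the first base of each site occurrence, exact (non-degenerate) candidate sites of a fixed length, and a hypothetical `tolerance` parameter for measurement error in fragment sizes.

```python
from itertools import product


def fragment_sizes(dna, site):
    """Sorted fragment sizes after cutting a linear DNA at every occurrence of site.

    Assumption: the enzyme cuts immediately before the first base of the site.
    """
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start)
        start = dna.find(site, start + 1)  # allow overlapping occurrences
    bounds = [0] + cuts + [len(dna)]
    # Drop zero-length fragments (site at position 0).
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def candidate_sites(dna, observed, length=4, tolerance=0):
    """Enumerate all sites of the given length whose predicted fragments match."""
    target = sorted(observed)
    matches = []
    for letters in product("ACGT", repeat=length):
        site = "".join(letters)
        predicted = fragment_sizes(dna, site)
        if len(predicted) == len(target) and all(
            abs(p - t) <= tolerance for p, t in zip(predicted, target)
        ):
            matches.append(site)
    return matches


dna = "AAGGCCTTTTGGCCAAAAAA"
print(candidate_sites(dna, observed=[10, 8, 2], length=4))
```

Exhaustive enumeration is only practical for short sites (4^6 = 4096 hexamers is still cheap); degenerate sites would need a wider alphabet.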

4.
SEQCMP, a program that analyzes and searches for homology among multiple nucleic acid sequences, is described. The sequences are compared by the dot matrix method and the consensus sequence is derived by superimposing all the dot matrices on one another. The program is written in MBASIC and runs on an IBM-PC microcomputer. It is interactive and can be used by investigators with no computer background or experience.

5.
Exon discovery by genomic sequence alignment   Cited by: 5 (self-citations: 0, citations by others: 5)
MOTIVATION: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA', so local sequence similarity often indicates biological functionality. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that work by comparing anonymous genomic sequences, for example from human and mouse. The main advantage of these methods is that they are based on simple and generally applicable measures of (local) sequence similarity; unlike standard gene-finding approaches they do not depend on species-specific training data or on the presence of cognate genes in databases. Like all comparative sequence-analysis methods, the new comparative gene-finding approaches critically rely on the quality of the underlying sequence alignments. RESULTS: Herein, we describe a new implementation of the sequence-alignment program DIALIGN that has been developed for alignment of large genomic sequences. We compare our method to the alignment programs PipMaker, WABA and BLAST and we show that local similarities identified by these programs are highly correlated to protein-coding regions. In our test runs, PipMaker was the most sensitive method while DIALIGN was most specific. AVAILABILITY: The program is downloadable from the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/.

6.
A platform program that performs biological sequence comparison provides a case study to compare the relative advantages of a machine-independent approach to parallel computation versus a machine-specific approach. The program consists of two routines: (i) PSCANLIB, which compares a single biological sequence against a database of sequences, and (ii) PCOMPLIB, which compares a database of sequences against another database of sequences, or against itself. The program was first parallelized to run on the Intel Hypercube parallel computer using native Hypercube commands to coordinate the parallel computation. The parallelization logic of the program was then translated into a machine-independent parallel programming language, Linda. These two approaches to parallelization are contrasted in terms of: (i) the expressive power of the logic that coordinates the parallel computation, (ii) the portability of the machine-independent version to other parallel machines and (iii) the relative efficiency of the two versions of the program. In the benchmark tests reported, the benefits of the machine-independent approach were achieved with only a modest sacrifice in efficiency.

7.
An interface program has been developed for users of MS-DOS computers and the GenBank(R) gene sequence files in their diskette format. With the program a user is able to produce keyword, author and entry name listings of GenBank items or to select GenBank sequences for viewing, printing or decoding. The decode option uncompresses sequence data and yields a character file which has the format used on GenBank magnetic tapes. Program options are chosen by selecting items from command menus. While the program is designed primarily for hard disk operation, it also allows users of diskette-based computers to work with GenBank files. Received on July 15, 1987; accepted on July 15, 1987

8.
An implementation of Profilesearch (a technique to search for relationships between a protein sequence and multiply aligned sequences) for a parallel computer is described. The number-crunching machine, consisting of 21 T800 transputers, is connected to a Macintosh IIcx host computer. The program utilizes a standard Macintosh application as its user interface, resulting in a transparent and user-friendly environment for addressing the parallel computer. The program is independent of the number of available processors and exceeds the speed of a VAXstation 3200 with only one transputer in operation, thus allowing cheap and fast database searches with a PC frontend. For a larger number of processors, the speed increase is approximately linear with no obvious symptoms of saturation with the available maximum of 21 transputers. The program and environment are useful to search quickly and easily for similarities between a single sequence or sequence set and individual sequences contained in a large database. The alignment is determined by typical dynamic programming techniques.

9.
The current status and portability of our sequence handling software.   Cited by: 94 (self-citations: 15, citations by others: 79)
I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described, including screen editing of aligned gel readings for shotgun sequencing projects, a method to highlight errors in aligned gel readings, and new methods for searching for putative signals in sequences. We use the programs on a VAX computer but the whole package has been rewritten to make it easy to transport it to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX.

10.
SEQ: a nucleotide sequence analysis and recombination system   Cited by: 67 (self-citations: 26, citations by others: 41)
SEQ is an interactive, self-documenting computer program that contains procedures for the analysis of nucleotide sequences and the manipulation of such sequences to allow the simulation and prediction of the results of recombinant DNA experiments.

11.
Comparative analysis of related DNA sequences has been simplified by the transformation of data in the standard A, G, C, T format into a set of geometric symbols that promote pattern recognition. Previously, comparing more than 2 or 3 sequences simultaneously has been difficult because of the monotonous patterns established by letters. Here 33 sequences are simultaneously compared to demonstrate the ease with which nucleotide substitutions are accurately identified. This has been accomplished by writing a Word-Perfect macro program to facilitate this transformation. Since this word processing program is widely used, performing this kind of analysis is readily achievable in most laboratories involved in DNA sequence analysis.

12.
SUMMARY: Multiple sequence alignment is the NP-hard problem of aligning three or more DNA or amino acid sequences in an optimal way so as to match as many characters as possible from the set of sequences. The popular sequence alignment program ClustalW uses the classical method of approximating a sequence alignment, by first computing a distance matrix and then constructing a guide tree to show the evolutionary relationship of the sequences. We show that parallelizing the ClustalW algorithm can result in significant speedup. Our implementation was written in C with the Message Passing Interface and run on a cluster of workstations. Experimental results show that speedup of over 5.5 on six processors is obtainable for most inputs. AVAILABILITY: The software is available upon request from the second author.
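To put the reported figure in context, the short Python sketch below converts a speedup on p processors into parallel efficiency and into the experimentally determined serial fraction (the Karp-Flatt metric). The abstract reports only "over 5.5 on six processors"; treating 5.5 as the measured speedup is an assumption made here for illustration.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt(speedup, processors):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Assumed inputs taken from the abstract's lower bound.
S, p = 5.5, 6
print(f"efficiency      = {parallel_efficiency(S, p):.3f}")
print(f"serial fraction = {karp_flatt(S, p):.4f}")
```

Under that assumption the efficiency is about 0.92 and the implied serial fraction is just under 2%, which is consistent with a workload dominated by independent pairwise comparisons.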

13.

Background  

Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.

14.
Multiple sequence alignment by a pairwise algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987

15.
Phylogenetic reconstruction based upon multiple alignments of molecular sequences is important to most branches of modern biology and is central to molecular evolution. Understanding the historical relationships among macromolecules depends upon computer programs that implement a variety of analytical methods. Because it is impossible to know those historical relationships with certainty, assessment of the accuracy of methods and the programs that implement them requires the use of programs that realistically simulate the evolution of DNA sequences. EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions. To the best of my knowledge, it is the only program available that simulates the evolution of intact coding sequences. Output includes the true tree and true alignments of the resulting coding sequence and corresponding protein sequences. A log file reports the frequencies of each kind of base substitution, the ratio of transition to transversion substitutions, the ratio of indel to base substitution mutations, and the numbers of silent and amino acid replacement mutations. The realism of the data sets has been assessed by comparing the dN/dS ratio, the ratio of transition to transversion substitutions, and the ratio of indel to base substitution mutations of the simulated data sets with those parameters of real data sets from the "gold standard" BaliBase collection of structural alignments. Results show that the data sets produced by EvolveAGene 3 are very similar to real data sets, and EvolveAGene 3 is therefore a realistic simulation program that can be used to evaluate a variety of programs and methods in molecular evolution.

16.
All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
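As a rough illustration of what a di- to penta-nucleotide repeat scan involves, the Python sketch below finds perfect tandem repeats with a regular expression. It is not SPUTNIK's method (which is not described in the abstract); the `min_repeats` threshold and the exact-match requirement are assumptions chosen for simplicity.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit (e.g. 'ACAC', 'AAA')."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=3):
    """Return sorted (start, unit, repeat_count) for perfect tandem repeats."""
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


seq = "GGACACACACCCGATGATGATAA"
print(find_ssrs(seq))  # [(2, 'AC', 4), (12, 'GAT', 3)]
```

Dividing the number of hits (or the bases they cover) by the length of the coding or non-coding sequence scanned gives a density that can be compared across genomes. A production scan would also need to tolerate imperfect repeats, which this sketch does not.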

17.
We present a fast algorithm to produce a graphic matrix representation of sequence homology. The algorithm is based on lexicographical ordering of fragments. It preserves most of the options of a simple naive algorithm with a significant increase in speed. This algorithm was the basis for a program, called DNAMAT, that has been extensively tested during the last three years at the Weizmann Institute of Science and has proven to be very useful. In addition we suggest a way to extend our approach to analyse a series of related DNA or RNA sequences, in order to determine certain common structural features. The analysis is done by 'summing' a set of dot-matrices to produce an overall matrix that displays structural elements common to most of the sequences. We give an example of this procedure by analysing tRNA sequences. Received on June 26, 1986; accepted on September 28, 1986
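The speed-up over a naive all-against-all comparison can be sketched in Python as follows. This is not the DNAMAT algorithm: a hash index of fixed-length fragments stands in for the lexicographical ordering the abstract mentions, and the fragment length `k` and the use of self-comparison matrices for the summing step are assumptions.

```python
from collections import Counter, defaultdict


def dot_matrix(a, b, k=3):
    """Set of (i, j) where the k-mer at a[i:] equals the k-mer at b[j:].

    Indexing b's fragments first avoids comparing every position pair.
    """
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return {
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], ())
    }


def summed_self_matrix(seqs, k=3):
    """Sum the self-comparison dot matrices of several related sequences.

    Cells with high counts mark features shared by most sequences.
    """
    total = Counter()
    for s in seqs:
        total.update(dot_matrix(s, s, k))
    return total


print(sorted(dot_matrix("ACGTAC", "TACG")))  # [(0, 1), (3, 0)]
total = summed_self_matrix(["ACGACG", "ACGTTT"])
print(total[(0, 0)], total[(0, 3)])  # 2 1
```

Summing by coordinate only makes sense when the sequences are of similar length or already aligned, which is a reasonable assumption for a family such as tRNAs.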

18.
Ordination is a powerful method for analysing complex data sets but has been largely ignored in sequence analysis. This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences. The method takes a matrix of Euclidean distances between all pairs of sequences and finds a coordinate space where the distances are exactly preserved. The main problem is to find a measure of distance between aligned sequences that is Euclidean. The simplest distance function is the square root of the percentage difference (as measured by identities) between two sequences, where one ignores any positions in the alignment where there is a gap in any sequence. If one does not ignore positions with a gap, the distances cannot be guaranteed to be Euclidean but the deleterious effects are trivial. Two examples of using the method are shown. A set of 226 aligned globins were analysed and the resulting ordination very successfully represents the known patterns of relationship between the sequences. In the other example, a set of 610 aligned 5S rRNA sequences were analysed. Sequence ordinations complement phylogenetic analyses. They should not be viewed as a complete alternative.
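The distance function described here is simple enough to write down directly. The Python sketch below builds the pairwise distance matrix that would be passed to principal coordinates analysis; the gap character `-` and the function names are assumptions, and the eigen-decomposition step of the ordination itself is not shown.

```python
from math import sqrt


def gap_free_columns(alignment):
    """Indices of columns in which no sequence has a gap."""
    return [
        c for c in range(len(alignment[0]))
        if all(seq[c] != "-" for seq in alignment)
    ]


def distance_matrix(alignment):
    """Square root of the percentage difference over gap-free columns."""
    cols = gap_free_columns(alignment)
    if not cols:
        raise ValueError("alignment has no gap-free columns")
    n = len(alignment)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            differing = sum(alignment[i][c] != alignment[j][c] for c in cols)
            d[i][j] = d[j][i] = sqrt(100.0 * differing / len(cols))
    return d


alignment = ["ACGT-A", "ACGTTA", "AGGTTC"]
for row in distance_matrix(alignment):
    print(["%.3f" % x for x in row])
```

In this toy alignment column 4 is dropped because the first sequence has a gap there, so sequences 1 and 3 differ at 2 of 5 columns, giving a distance of sqrt(40).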

19.
gff2aplot: Plotting sequence comparisons   Cited by: 1 (self-citations: 0, citations by others: 1)
SUMMARY: gff2aplot is a program to visualize the alignment of two sequences together with their annotations. Input for the program consists of single or multiple files in GFF-format which specify the alignment coordinates and annotation features of both sequences. Output is in PostScript format of any size. The features to be displayed are highly customizable to meet user specific needs. The program serves to generate print-quality images for comparative genome sequence analysis. AVAILABILITY: gff2aplot is freely available under the GNU software licence and can be downloaded from the address specified below. Supplementary information: http://genome.imim.es/software/gfftools/GFF2APLOT.html

20.
The context-dependent expression of genes is the core of biological activities, and significant attention has been given to identifying the various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs, or on the correlation between mRNA abundance and non-random features in coding sequences (e.g., codon usage and amino acid usage). In this study, multiple regression analyses of mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together. Using the AlignACE program, 442 over-represented motifs were identified from the upstream 100 bp region of 293 genes located in the known regulons. Regression of mRNA expression data against the measures of coding and non-coding sequence features indicated that 54.1% of the variation in mRNA abundance can be explained by the presence of upstream motifs, while coding sequences alone account for 29.7% of the variation. Interestingly, most of the contribution from coding sequences overlaps with that from upstream motifs; a total of 60.3% of the variation in mRNA abundance can be explained when coding and non-coding information are both included. This result demonstrates that upstream regulatory motifs and coding sequence information contribute to the overall mRNA expression in a combinatorial rather than an additive manner.
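The overlap the abstract refers to follows from simple arithmetic on the three reported percentages. The Python sketch below assumes they are the explained-variance figures of the motif-only, coding-only and joint regressions, and splits the joint figure into unique and shared parts; the function name is hypothetical.

```python
def partition_explained_variance(r2_a, r2_b, r2_joint):
    """Split jointly explained variance into unique-to-A, unique-to-B and shared parts."""
    unique_a = r2_joint - r2_b
    unique_b = r2_joint - r2_a
    shared = r2_a + r2_b - r2_joint
    return unique_a, unique_b, shared


# Percentages reported in the abstract: motifs, coding sequence, both together.
motif_only, coding_only, shared = partition_explained_variance(54.1, 29.7, 60.3)
print(f"unique to upstream motifs:   {motif_only:.1f}%")
print(f"unique to coding sequences:  {coding_only:.1f}%")
print(f"shared between the two:      {shared:.1f}%")
```

On these numbers, roughly 23.5 of the 29.7 percentage points attributed to coding sequences are shared with upstream motifs, which is why the two sources add up to 60.3% rather than 83.8%.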
