Similar literature
20 similar articles found
1.
This paper describes a computer program designed to look for similarities between pairs of nucleic or amino acid sequences. The program looks either for segments of perfect identity or for regions where, using a scoring matrix, a minimum value is exceeded. The results of comparisons are presented as a matrix which is displayed on a simple graphics terminal. Use of a graphics terminal allows the user to display the whole of the two sequences in one screenful or to home in on regions of interest to examine them in more detail. The program is interactive, so the user can easily see the effect of changes to variables and can use inbuilt editing functions to make insertions to produce alignments of the two sequences. These aligned sequences can then be saved to disk files for further processing.
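
The comparison described in this abstract can be illustrated with a minimal sketch, assuming a fixed-length window scored along each diagonal. The names `dot_matrix` and `render`, the `window` and `min_score` parameters, and the default identity scoring are all hypothetical; the abstract does not give the program's actual parameters or scoring matrix.

```python
def dot_matrix(seq_a, seq_b, window=3, min_score=3, score=None):
    """Return the set of (i, j) cells whose window score reaches min_score.

    The window starts at seq_a[i] and seq_b[j]. With the default identity
    scoring and min_score == window, only perfect matches are reported.
    """
    if score is None:
        # Assumed default: 1 for identical symbols, 0 otherwise.
        score = lambda x, y: 1 if x == y else 0
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total >= min_score:
                hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the matrix as text: one row per position of seq_a."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in hits else "." for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "GATTACAGATTACA", "CATTACAGGTTACA"
    print(render(a, b, dot_matrix(a, b, window=4, min_score=4)))
```

Passing a different `score` function (for example, a lookup into an amino acid substitution table) and lowering `min_score` gives the thresholded variant the abstract mentions. The quadratic loop is only suitable for short sequences.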

2.
Representation of sequence similarity by dot matrix plots is a method widely used for comparing biological sequences. The user is presented with an overall view of similarity between two sequences. Computation of this plot has been reconsidered here. An improvement is proposed through the preprocessing of the data into an automaton recognizing the word structure of a sequence. The main advantage of this approach is to systematically eliminate the repetitions during word comparison. Simple heuristics are also considered to greatly speed up pattern matching. As a result, large sequences are handled very efficiently. This is illustrated by a comparison of large genomic DNA. The algorithm has been implemented in an interactive application on a microcomputer.

3.
We describe software for aligning protein or nucleic acid sequences based on the concept of match density. This method is especially useful for locating regions of short similarity between two longer sequences which may be largely dissimilar (e.g. locating active site regions in distantly related proteins). Our software is able to identify biologically interesting similarities between two sub-regions because it allows the user to control the matching parameters and the manner in which local alignments are selected for display. Furthermore, the collection and ranking of alignments for display uses a novel, highly efficient algorithm. We illustrate these features with several examples. In addition, we show that this tool can be used to find a new conserved sequence in several viral DNA polymerases, which, we suggest, occurs at a functionally important enzymatic site. Received on August 17, 1987; accepted on November 17, 1987

4.
The RDP (Ribosomal Database Project) continues   Cited by: 56 (self-citations: 0; citations by others: 56)
The Ribosomal Database Project (RDP-II), previously described by Maidak et al., continued during the past year to add new rRNA sequences to the aligned data and to improve the analysis commands. Release 7.1 (September 17, 1999) included more than 10 700 small subunit rRNA sequences. More than 850 type strain sequences were identified and added to the prokaryotic alignment, bringing the total number of type sequences to 3324 representing 2460 different species. Availability of an RDP-II mirror site in Japan is also near completion. RDP-II provides aligned and annotated rRNA sequences, derived phylogenetic trees and taxonomic hierarchies, and analysis services through its WWW server (http://rdp.cme.msu.edu/). Analysis services include rRNA probe checking, approximate phylogenetic placement of user sequences, screening user sequences for possible chimeric rRNA sequences, automated alignment, production of similarity matrices and services to plan and analyze terminal restriction fragment length polymorphism (T-RFLP) experiments.

5.
We have developed a web-based tool for design of specific PCR primers and probes. The program allows you to enter primer sequence information as well as an optional probe, and sequence similarity searches (MegaBLAST) will be performed to see if the sequences match the same sequence entry in the specified database. If primers (and probe) match, this will be reported. The program can handle overlapping amplicons, amplification from a single primer, ambiguous bases and other problematic cases.

6.
MOTIVATION: Dot-matrix plots are widely used for similarity analysis of biological sequences. Many algorithms and computer software tools have been developed for this purpose. Though some of these tools have been reported to handle sequences of a few hundred kb, analysis of genome sequences with a length of >10 Mb on a microcomputer is still impractical due to long execution times and computer memory requirements. RESULTS: Two dot-matrix comparison methods have been developed for analysis of large sequences. The methods initially locate similarity regions between two sequences using a fast word search algorithm, followed by an explicit comparison on these regions. Since the initial screening removes most random matches, the computing time is substantially reduced. The methods produce high quality dot-matrix plots with low background noise. Space requirements are linear, so the algorithms can be used for comparison of genome size sequences. Computing speed may be affected by highly repetitive sequence structures of eukaryote genomes. A dot-matrix plot of the Yeast genome (12 Mb) with both strands was generated in 80 s with a 1 GHz personal computer.
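
The two-stage idea in this abstract (a fast word search, then an explicit comparison restricted to the candidate regions) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the function names, the word length `k`, and the choice of exact-match extension along the diagonal as the "explicit comparison" are all hypothetical.

```python
from collections import defaultdict


def word_seeds(seq_a, seq_b, k=4):
    """Stage 1: find all (i, j) where the k-letter words at a[i] and b[j] are identical."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    seeds = []
    for i in range(len(seq_a) - k + 1):
        # .get avoids creating empty entries in the index for absent words.
        for j in index.get(seq_a[i:i + k], ()):
            seeds.append((i, j))
    return seeds


def extend_seed(seq_a, seq_b, i, j, k):
    """Stage 2 (assumed form): extend an exact match both ways along its diagonal.

    Returns (start_in_a, start_in_b, length).
    """
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and seq_a[start_i - 1] == seq_b[start_j - 1]:
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + k, j + k
    while end_i < len(seq_a) and end_j < len(seq_b) and seq_a[end_i] == seq_b[end_j]:
        end_i += 1
        end_j += 1
    return (start_i, start_j, end_i - start_i)


def similar_regions(seq_a, seq_b, k=4):
    """Combine both stages; overlapping seeds on one diagonal collapse to one region."""
    seeds = word_seeds(seq_a, seq_b, k)
    return sorted({extend_seed(seq_a, seq_b, i, j, k) for i, j in seeds})


if __name__ == "__main__":
    print(similar_regions("TTACGTACGG", "CCACGTACAA", k=4))
```

Only cells reached from a seed are ever compared, which is why the screening step cuts both running time and background noise. Highly repetitive input produces many seeds per word, consistent with the slowdown the abstract notes for eukaryote genomes.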

7.
MOTIVATION: Protein structure classification has been recognized as one of the most important research issues in protein structure analysis. A substantial number of methods for the classification have been proposed, and several databases have been constructed using these methods. Since some proteins with very similar sequences may exhibit structural diversities, we have proposed PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB), whose selection strategy is based not only on sequence similarity but also on structural similarity. Forty-eight representative sets whose similarity criteria were predetermined were made available over the World Wide Web (WWW). However, the sets were insufficient in number to satisfy users researching protein structures by various methods. RESULT: We have improved the system for PDB-REPRDB so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. This paper describes the method we use to classify chains and select the representatives in the system. We also describe the interface used to set the parameters.

8.
SUMMARY: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. AVAILABILITY: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. CONTACT: gorodkin@cbs.dtu.dk

9.
Calculation of dot-matrices is a widespread tool in the search for sequence similarities. When sequences are distant, even this approach may fail to point out common regions. If several plots calculated for all members of a sequence set consistently displayed a similarity between them, this would increase its credibility. We present an algorithm to delineate dot-plot agreement. A novel procedure based on matrix multiplication is developed to identify common patterns and reliably aligned regions in a set of distantly related sequences. The algorithm finds motifs independent of input sequence lengths and reduces the dependence on gap penalties. When sequences share greater similarity, the same approach converts to a multiple sequence alignment procedure.

10.
As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.

11.
Although bird song has been an important model for investigating questions of behavior development, cultural evolution and population differentiation, the quantitative methods of analysis have been problematic. Here we develop and apply quantitative randomization methods to test hypotheses about these processes in a natural population of birds. Songs of the African brood-parasitic straw-tailed whydahs (Vidua fischeri) and songs of their host species, the purple grenadier (Granatina ianthinogaster), were compared in audiospectrograms for similarity to test the following hypotheses: Whydahs mimic the songs of their host species, they have local song dialects, neighboring males match their song themes, local males match the songs of local hosts, remote populations have different songs according to their geographic distance, and songs undergo cultural evolution over time across generations. Randomization analyses were completed using (1) Mantel matrix statistics and (2) tree-based measures employing Sankoff optimization of Manhattan matrices and approximate randomizations. Our results provide evidence for song mimicry, local song dialects, matching song themes between neighboring males, song matching of local whydah mimics and grenadier song models, correspondence of song differences and geographic distance, and cultural continuity with change in song traditions within a local population. These randomization methods may be useful in other studies of animal communication, and they are sufficiently general for use with distance matrices derived either from naturalistic impressions of song similarity as in our example or from acoustic measurements.

12.
The D(2) statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D(2) may be dominated by the terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D(2)* and D(2c). We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of identically and independently distributed letters. We show that D(2) and D(2)c, and to a somewhat lesser extent D(2)*, perform well in tests to classify moderate length query sequences as putative cis-regulatory modules.
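
The plain D(2) statistic as defined in this abstract (the number of matching k-letter words between two sequences) can be computed from word counts. The sketch below covers only the uncentred statistic; the mean-centred variants are not defined in the abstract and are not attempted here. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of (position in a, position in b) pairs whose k-words are identical.

    Equivalent to summing count_a(w) * count_b(w) over all words w.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    print(d2("ACGTACGT", "TACGTTAC", k=3))
```

Because the sum runs over word counts rather than over all position pairs, the cost is linear in the sequence lengths, which is the speed advantage the abstract refers to.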

13.
We present a method based on hierarchical self-organizing maps (SOMs) for recognizing patterns in protein sequences. The method is fully automatic, does not require prealigned sequences, is insensitive to redundancy in the training set, and works surprisingly well even with small learning sets. Because it uses unsupervised neural networks, it is able to extract patterns that are not present in all of the unaligned sequences of the learning set. The identification of these patterns in sequence databases is sensitive and efficient. The procedure comprises three main training stages. In the first stage, one SOM is trained to extract common features from the set of unaligned learning sequences. A feature is a number of ungapped sequence segments (usually 4-16 residues long) that are similar to segments in most of the sequences of the learning set according to an initial similarity matrix. In the second training stage, the recognition of each individual feature is refined by selecting an optimal weighting matrix out of a variety of existing amino acid similarity matrices. In a third stage of the SOM procedure, the position of the features in the individual sequences is learned. This allows for variants with feature repeats and feature shuffling. The procedure has been successfully applied to a number of notoriously difficult cases with distinct recognition problems: helix-turn-helix motifs in DNA-binding proteins, the CUB domain of developmentally regulated proteins, and the superfamily of ribokinases. A comparison with the established database search procedure PROFILE (and with several others) led to the conclusion that the new automatic method performs satisfactorily.

14.
The RDP-II (Ribosomal Database Project)   Cited by: 23 (self-citations: 0; citations by others: 23)
The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (2000), 28, 173-174], continued during the past year to add new rRNA sequences to the aligned data and to improve the analysis commands. Release 8.0 (June 1, 2000) consisted of 16 277 aligned prokaryotic small subunit (SSU) rRNA sequences while the number of eukaryotic and mitochondrial SSU rRNA sequences in aligned form remained at 2055 and 1503, respectively. The number of prokaryotic SSU rRNA sequences more than doubled from the previous release 14 months earlier, and approximately 75% are longer than 899 bp. An RDP-II mirror site in Japan is now available (http://wdcm.nig.ac.jp/RDP/html/index.html). RDP-II provides aligned and annotated rRNA sequences, derived phylogenetic trees and taxonomic hierarchies, and analysis services through its WWW server (http://rdp.cme.msu.edu/). Analysis services include rRNA probe checking, approximate phylogenetic placement of user sequences, screening user sequences for possible chimeric rRNA sequences, automated alignment, production of similarity matrices and services to plan and analyze terminal restriction fragment polymorphism experiments. The RDP-II email address for questions and comments has been changed from curator@cme.msu.edu to rdpstaff@msu.edu.

15.
16.
17.
The programs compare pairs of homologous sequences to determine the percentage of homology, the number of identical and deviating nucleotides, and the number of transitions and transversions, and derive from these the KNUC-values according to Kimura (1) and the corresponding standard error sigmaK. The sequences can be printed in pairs underneath each other; homologies are indicated by asterisks between the identical nucleotides. Out of a set of homologous sequences stored on a disk, any number of sequences can be compared in pairs in this way, and a matrix containing either the percentage of homology values, the number of deviating nucleotides or the KNUC-values together with the corresponding standard errors can be sent to screen, printer or disk. A program will be available soon which creates a dendrogram representing the similarity between the sequences by use of an average linkage clustering method deduced from this matrix. The programs are written for Apple II computers using UCSD-PASCAL and for Sirius I/Victor 9000 computers using TURBO-PASCAL.
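
The pairwise quantities listed in this abstract can be illustrated with a short sketch. It assumes that the KNUC-value is Kimura's two-parameter distance, K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], with P and Q the fractions of transitions and transversions, and that sigmaK is the standard error from the same model; the abstract cites Kimura but does not reproduce the formula, so this is an assumption. The function name is hypothetical, and the input is assumed to be two gap-free aligned sequences over A, C, G, T.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def compare_pair(seq_a, seq_b):
    """Compare two aligned, gap-free nucleotide sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue
        pair = {x, y}
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / n
    q = transversions / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this distance")
    # Assumed model: Kimura two-parameter distance and its standard error.
    k = -0.5 * math.log(w1 * math.sqrt(w2))
    a = 1 / w1
    b = 0.5 * (1 / w1 + 1 / w2)
    variance = (a * a * p + b * b * q - (a * p + b * q) ** 2) / n
    return {
        "identity_percent": 100.0 * (n - transitions - transversions) / n,
        "differences": transitions + transversions,
        "transitions": transitions,
        "transversions": transversions,
        "k": k,
        "sigma_k": math.sqrt(max(variance, 0.0)),
    }


if __name__ == "__main__":
    print(compare_pair("AAAAAAAAAA", "GAAAAAAAAC"))
```

Calling `compare_pair` for every pair in a set yields the kind of matrix the abstract describes, which could then feed an average linkage clustering step.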

18.
The recent application of genome-wide, single nucleotide polymorphism (SNP) microarrays to investigate DNA copy number aberrations in cancer has provided unparalleled sensitivity for identifying genomic changes. In some instances the complexity of these changes makes them difficult to interpret, particularly when tumour samples are contaminated with normal (stromal) tissue. Current automated scoring algorithms require considerable manual data checking and correction, especially when assessing uncultured tumour specimens. To address these limitations we have developed a visual tool to aid in the analysis of DNA copy number data. Simulated DNA Copy Number (SiDCoN) is a spreadsheet-based application designed to simulate the appearance of B-allele and logR plots for all known types of tumour DNA copy number changes, in the presence or absence of stromal contamination. The system allows the user to determine the level of stromal contamination, as well as specify up to 3 different DNA copy number aberrations for up to 5000 data points (representing individual SNPs). This allows users great flexibility to assess simple or complex DNA copy number combinations. We demonstrate how this utility can be used to estimate the level of stromal contamination within tumour samples and its application in deciphering the complex heterogeneous copy number changes we have observed in a series of tumours. We believe this tool will prove useful to others working in the area, both as a training tool, and to aid in the interpretation of complex copy number changes.

19.
Vole disturbances and plant diversity in a grassland metacommunity   Cited by: 1 (self-citations: 0; citations by others: 1)
Questad EJ, Foster BL. Oecologia, 2007, 153(2): 341-351
We studied the disturbance associated with prairie vole burrows and its effects on grassland plant diversity at the patch (1 m²) and metacommunity (>5 ha) scales. We expected vole burrows to increase patch-scale plant species diversity by locally reducing competition for resources or creating niche opportunities that increase the presence of fugitive species. At the metacommunity scale, we expected burrows to increase resource heterogeneity and have a community composition distinct from the matrix. We measured resource variables and plant community composition in 30 paired plots representing disturbed burrows and undisturbed matrix patches in a cool-season grassland. Vole disturbance affected the mean values of nine resource variables measured and contributed more to resource heterogeneity in the metacommunity than matrix plots. Disturbance increased local plant species richness, metacommunity evenness, and the presence and abundance of fugitive species. To learn more about the contribution of burrow and matrix habitats to metacommunity diversity, we compared community similarity among burrow and matrix plots. Using Sorenson’s similarity index, which considers only presence–absence data, we found no difference in community similarity among burrows and matrix plots. Using a proportional similarity index, which considers both presence–absence and relative abundance data, we found low community similarity among burrows. Burrows appeared to shift the identity of dominant species away from the species dominant in the matrix. They also allowed subordinate species to persist in higher abundances. The patterns we observed are consistent with several diversity-maintaining mechanisms, including a successional mosaic and alternative successional trajectories. We also found evidence that prairie voles may be ecosystem engineers.
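
The contrast this abstract draws between a presence-absence index and an abundance-weighted index can be shown with a small sketch. It assumes the commonly used definitions: the Sorensen index as 2c / (a + b), where c is the number of shared species and a and b are the species counts of the two plots, and proportional similarity as the sum over species of the smaller of the two relative abundances. The abstract does not state its formulas, and the function names and plot data below are hypothetical.

```python
def sorensen(abund_a, abund_b):
    """Presence-absence similarity: 2 * shared / (richness_a + richness_b)."""
    species_a = {s for s, n in abund_a.items() if n > 0}
    species_b = {s for s, n in abund_b.items() if n > 0}
    if not species_a and not species_b:
        raise ValueError("both plots are empty")
    return 2 * len(species_a & species_b) / (len(species_a) + len(species_b))


def proportional_similarity(abund_a, abund_b):
    """Abundance-weighted similarity: sum of min relative abundance per species."""
    total_a = sum(abund_a.values())
    total_b = sum(abund_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each plot needs a positive total abundance")
    species = set(abund_a) | set(abund_b)
    return sum(
        min(abund_a.get(s, 0) / total_a, abund_b.get(s, 0) / total_b)
        for s in species
    )


if __name__ == "__main__":
    # Hypothetical plots: same species present, opposite dominance.
    burrow = {"species_x": 8, "species_y": 2}
    matrix = {"species_x": 2, "species_y": 8}
    print(sorensen(burrow, matrix), proportional_similarity(burrow, matrix))
```

In the made-up example the two plots share every species, so the presence-absence index is at its maximum, while the abundance-weighted index is much lower because the dominant species differs. This mirrors the pattern reported for burrows versus matrix plots.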

20.
Software has been developed to allow the use of a number of parameters in the comparative representation of proteins in color and monochrome dot matrices. They include the parameters of partial specific volume, residue bulkiness, the mean area buried of side chains, seven additional hydropathy scales, mutability, polarity, secondary structure propensities, energy/residue, energy/atom, Rf values, the pKs at the N and C terminals, user-defined parameters and, if desired, randomly generated values. Many of these parameters can be combined in n space using an algorithm based on the Euclidean distance relationship in order to derive consensus values. The problem of scoring matched identities is addressed and the user may stipulate that they score 100 on a 0–100 scale or be determined from the Dayhoff MDM78 values with the rest of the matrix scaled appropriately. The PAMs matrix has been incorporated in such a way as to allow the user to stipulate various PAM values or an estimated percentage difference between two peptide sequences, and to convert these to log odds values. In addition, the similarity ring developed by Swanson and the matrix proposed by Bacon and Anderson have been adapted for use in the program. Color indices have been utilized to give a ‘third dimension’ to the projections, allowing the user to judge the degree of similarity of different regions which are represented. The software also provides for the plotting of nucleotides, in which case color is used to code individual nucleotides, purines versus pyrimidines, or similar colors are used to differentiate between A and T bases on the one hand, and G and C on the other. Received on December 31, 1987; accepted on May 18, 1988
