Similar articles
 Found 20 similar articles; search time 0 ms
1.
In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structures, or proteins. The defined function performs a comprehensive comparison between sequences remarkably well, both in terms of the Hamming distance of the two compared sequences and the corresponding location difference. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities illustrates that the proposed method, with a computational complexity of O(N), is effective for these three kinds of biological sequences and is generally applicable to them.
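The abstract above does not define the bilateral similarity function itself, so the following is only a minimal sketch of the two ingredients it names: the Hamming distance between two equal-length sequences and the locations at which they differ. The function name hamming_profile is hypothetical and is not taken from the paper.

```python
def hamming_profile(a: str, b: str):
    """Return (Hamming distance, list of mismatch positions) for two
    equal-length sequences. Runs in O(N) time for sequences of length N.

    Illustrative only: this is NOT the paper's bilateral similarity function.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Collect every 0-based index at which the two sequences disagree.
    positions = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(positions), positions


if __name__ == "__main__":
    distance, where = hamming_profile("GATTACA", "GACTATA")
    print(distance, where)
```

A single linear pass is enough for both quantities, which is consistent with the O(N) cost quoted in the abstract, although how the paper combines them is not stated here.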

2.
W Saurin  P Marlière 《Biochimie》1985,67(5):517-521
A set of sequences can be defined by their common subsequences, and the length of these is a measure of the overall resemblance of the set. Each subsequence corresponds to a succession of symbols embedded in every sequence, following the same order but not necessarily contiguous. Determining the longest common subsequence (LCS) requires the exhaustive testing of all possible common subsequences, which sum up to about 2^L, if L is the length of the shortest sequence. We present a polynomial algorithm (O(n × L^4), where n is the number of sequences) for generating strings related to the LCS and constructed with the sequence alphabet and an indetermination symbol. Such strings are iteratively improved by deleting indetermination symbols and concomitantly introducing the greatest number of alphabet symbols. Processed accordingly, nucleic acid and protein sequences lead to key-words encompassing the salient positions of homologous chains, which can be used for aligning or classifying them, as well as for finding related sequences in data banks.
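To make the notion of a common subsequence concrete, here is a minimal sketch of the textbook dynamic-programming LCS for just two sequences. It is not the authors' polynomial heuristic for n sequences, which works with an indetermination symbol and is not specified in the abstract; the function name lcs is an assumption made for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Classic two-sequence dynamic programme, O(len(a) * len(b)) time and space.
    Illustrative only: not the multi-sequence algorithm of the abstract.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))
```

The symbols of the result appear in both inputs in the same order but not necessarily contiguously, which is the property the abstract relies on.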

3.
An algorithm for selection of functional siRNA sequences   Cited by: 33 (self-citations: 0, citations by others: 33)
Randomly designed siRNA targeting different positions within the same mRNA display widely differing activities. We have performed a statistical analysis of 46 siRNA, identifying various features of the 19 bp duplex that correlate significantly with functionality at the 70% knockdown level and verified these results against an independent data set of 34 siRNA recently reported by others. Features that consistently correlated positively with functionality across the two data sets included an asymmetry in the stability of the duplex ends (measured as the A/U differential of the three terminal basepairs at either end of the duplex) and the motifs S1, A6, and W19. The presence of the motifs U1 or G19 was associated with lack of functionality. A selection algorithm based on these findings strongly differentiated between the two functional groups of siRNA in both data sets and proved highly effective when used to design siRNA targeting new endogenous human genes.

4.
A flexible method to align large numbers of biological sequences   Cited by: 5 (self-citations: 0, citations by others: 5)
Summary: A method for the alignment of two or more biological sequences is described. The method is a direct extension of the method of Taylor (1987) incorporating a consensus sequence approach and allows considerable freedom in the control of the clustering of the sequences. At one extreme this is equivalent to the earlier method (Taylor 1987), whereas at the other, the clustering approaches the binary method of Feng and Doolittle (1987). Such freedom allows the program to be adapted to particular problems, which has the important advantage of resulting in considerable savings in computer time, allowing very large problems to be tackled. Besides a detailed analysis of the alignment of the cytochrome c superfamily, the clustering and alignment of the PIR sequence data bank (3500 sequences approx.) is described.

5.
An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X ("motifs") interspersed with one another. NTRs fall into a class of repetitive structures broadly known as subrepeats. NTRs have been found in real DNA sequences and are expected to be important in evolutionary biology, both in understanding evolution of the ribosomal DNA (where NTRs can occur), and as a potential marker in population genetic and phylogenetic studies. This article describes an alignment algorithm for the verification phase of the software tool NTRFinder developed for database searches for NTRs. When the search algorithm has located a subsequence containing a possible NTR, with motifs X and x, a verification step aligns this subsequence against an exact NTR built from the templates X and x, to determine whether the subsequence contains an approximate NTR and its extent. This article describes an algorithm to solve this alignment problem in O(|T|(|X| + |x|)) space and time. The algorithm is based on Fischetti et al.'s wrap-around dynamic programming.

6.
In this paper, we propose a simple method to analyze the similarity of biological sequences. Taking the average contents of biological sequences and their information entropies as the variables, a fuzzy method is used to cluster them. The application results show that the method is relatively easy and rapid. Unlike other methods such as graphical representation methods, for which computing invariants of the matrices derived from the graphical representation is usually very complex, our method pays more attention to the information of the biological sequences themselves. With the help of software (SPSS) in particular, it seems to be very convenient. Therefore, it may be used to study new biological sequences, for example their evolutionary relationships and structures.

7.
We present an efficient algorithm for individual-based, stochastic simulation of biological populations in continuous time. A simple method for its implementation is given and it is compared to Gillespie's commonly used Direct Method. These two methods are proven to be exactly equivalent and, using a basic evolutionary model, it is demonstrated that the new algorithm can run thousands of times faster. Furthermore, while computational cost per event increases linearly with population size under the Direct Method, this cost is independent of population size under the new algorithm. We argue that this gain in efficiency opens up the possibility to explore a new class of models in population biology.
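For orientation, the following is a minimal sketch of the baseline the abstract compares against, Gillespie's Direct Method, applied to a hypothetical individual-based birth-death model in which each individual carries its own birth rate and all share one death rate. The model, the function name direct_method, and its parameters are assumptions made for illustration; the authors' faster algorithm is not described in the abstract and is not reproduced here.

```python
import random


def direct_method(birth_rates, death_rate, t_max, rng):
    """Simulate an individual-based birth-death process with Gillespie's
    Direct Method. Returns (final time, final population, number of events).

    Each list entry is one individual's birth rate; offspring inherit it.
    """
    pop = list(birth_rates)
    t = 0.0
    events = 0
    while pop:
        # O(N) work per event: every individual's total rate is recomputed.
        rates = [b + death_rate for b in pop]
        total = sum(rates)
        if total <= 0.0:
            break  # no event can ever fire
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            t = t_max
            break
        # Pick the acting individual with probability proportional to its rate.
        r = rng.random() * total
        idx, acc = 0, rates[0]
        while acc < r and idx < len(pop) - 1:
            idx += 1
            acc += rates[idx]
        # Decide between birth and death for that individual.
        if rng.random() * rates[idx] < pop[idx]:
            pop.append(pop[idx])
        else:
            pop.pop(idx)
        events += 1
    return t, pop, events


if __name__ == "__main__":
    print(direct_method([1.0] * 10, 0.5, 1.0, random.Random(42)))
```

The rate recomputation and linear scan inside the loop are what make the per-event cost grow linearly with population size in this sketch, the behaviour the abstract attributes to the Direct Method.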

8.
We describe an algorithm (IRSA) for identification of common regulatory signals in samples of unaligned DNA sequences. The algorithm was tested on randomly generated sequences of fixed length with an implanted signal of length 15 with 4 mutations, and on natural upstream regions of bacterial genes regulated by PurR, ArgR and CRP. Then it was applied to upstream regions of orthologous genes from Escherichia coli and related genomes. Some new palindromic binding signals and direct-repeat signals were identified. Finally, we present a parallel version suitable for computers supporting the MPI protocol. This implementation is not strictly bounded by the number of available processors. The computation speed depends linearly on the number of processors.

9.
Linking similar proteins structurally is a challenging task that may help in finding novel members of a protein family. In this respect, identification of conserved sequences can facilitate understanding and classifying the exact role of proteins. However, the exact role of these conserved elements cannot be elucidated without structural and physicochemical information. In this work, we present a novel desktop application, MotViz, designed for searching and analyzing the conserved sequence segments within protein structures. With MotViz, the user can extract a complete list of sequence motifs from loaded 3D structures, annotate the motifs structurally and analyze their physicochemical properties. The conservation value calculated for an individual motif can be visualized graphically. To check the efficiency, predicted motifs from the data sets of 9 protein families were analyzed, and the MotViz algorithm was more efficient in comparison to other online motif prediction tools. Furthermore, a database was also integrated for storing, retrieving and performing detailed functional annotation studies. In summary, MotViz effectively predicts motifs with high sensitivity and simultaneously visualizes them in 3D structures. Moreover, MotViz is user-friendly, with optimized graphical parameters and better processing speed due to the inclusion of a database at the back end. MotViz is available at http://www.fi-pk.com/motviz.html.

10.
Koike R  Kinoshita K  Kidera A 《Proteins》2007,66(3):655-663
Dynamic programming (DP) and its heuristic algorithms are the most fundamental methods for similarity searches of amino acid sequences. Their detection power has been improved by including supplemental information, such as homologous sequences in the profile method. Here, we describe a method, probabilistic alignment (PA), that gives improved detection power but, similarly to the original DP, uses only a pair of amino acid sequences. Receiver operating characteristic (ROC) analysis demonstrated that the PA method is far superior to BLAST, and that its sensitivity and selectivity approach those of PSI-BLAST. Particularly for orphan proteins having few homologues in the database, PA exhibits much better performance than PSI-BLAST. On the basis of this observation, we applied the PA method to a homology search of two orphan proteins, Latexin and Resuscitation-promoting factor domain. Their molecular functions have been described based on structural similarities, but sequence homologues have not been identified by PSI-BLAST. PA successfully detected sequence homologues for the two proteins and confirmed that the observed structural similarities are the result of an evolutionary relationship.

11.
12.
Native protein structures achieve stability in part by burying hydrophobic side-chains. About 75% of all amino acid residues buried in protein interiors are non-polar. Buried residues are not uniformly distributed in protein sequences, but sometimes cluster as contiguous polypeptide stretches that run through the interior of protein domain structures. Such regions have an intrinsically high local sequence density of non-polar residues, creating a potential problem: local non-polar sequences also promote protein misfolding and aggregation into non-native structures such as the amyloid fibrils in Alzheimer's disease. Here we show that long buried blocks of sequence in protein domains of known structure have, on average, a lower content of non-polar amino acids (about 70%) than do isolated buried residues (about 80%). This trend is observed both in small and in large protein domains and is independent of secondary structure. Long, completely non-polar buried stretches containing many large side-chains are particularly avoided. Aspartate residues that are incorporated in long buried stretches were found to make fewer polar interactions than those in short stretches, hinting that they may be destabilizing to the native state. We suggest that evolutionary pressure is acting on non-native properties, causing buried polar residues to be placed at positions where they would break up aggregation-prone non-polar sequences, perhaps even at some cost to native state stability.

13.
Clusters of charged residues are one of the key features of protein primary structure since they have been associated with important functions of proteins. Here, we present a proteome-wide scan for the occurrence of charge clusters in protein sequences using a new search tool (FCCP) based on a score‐based methodology. The FCCP was run to search for charge clusters in seven eukaryotic proteomes: Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, and Saccharomyces cerevisiae. We found that negative charge clusters (NCCs) are three to four times more frequent than positive charge clusters (PCCs). The Drosophila proteome is on average the most charged, whereas the human proteome is the least charged. Only 3 to 8% of the studied protein sequences have NCCs, while 1.6 to 3% have PCCs and only 0.07 to 0.6% have both types of clusters. NCCs are localized predominantly in the N‐terminal and C‐terminal domains, while PCCs tend to be localized within the functional domains of the protein sequences. Furthermore, the gene ontology classification revealed that the protein sequences with NCCs and PCCs are mainly binding proteins. Proteins 2015; 83:1252–1261. © 2015 Wiley Periodicals, Inc.

14.
Proteomic analyses of the nucleolus have revealed almost 700 functionally diverse proteins implicated in ribosome biogenesis, nucleolar assembly, and regulation of vital cellular processes. However, this nucleolar inventory has not unveiled a specific consensus motif necessary for nucleolar binding. The ribosomal protein family, characterized by its basic nature, should exhibit distinct binding sequences that enable interactions with the rRNA precursor molecules, facilitating subunit assembly. We succeeded in delineating 2 minimal nucleolar binding sequences of human ribosomal protein S6 by fusing S6 cDNA fragments to the 5' end of the LacZ gene and subsequently detecting the intracellular localization of the beta-galactosidase fusion proteins. Nobis1 (nucleolar binding sequence 1), comprising 4 highly conserved amino acid clusters separated by glycine or proline, functions independently of the 3 authentic nuclear localization signals (NLSs). Nobis2 consists of 2 conserved peptide clusters and requires the authentic NLS2 in its native context. Similarly, we deduced from previous publications that the single Nobis of ribosomal protein S25 is also highly conserved. The functional protein domain organization of the ribosomal protein S6e family consists of 3 modules: NLS, Nobis, and the C-terminal serine cluster of the phosphorylation sites. This modular structure is evolutionarily conserved in vertebrates, invertebrates, and fungi. Remarkably, nucleolar binding sequences of small and large ribosomal proteins reside in peptide clusters conserved over millions of years.

15.
Abstract: The DNA sequence of five contiguous open reading frames encoding enzymes for phenazine biosynthesis in the biological control bacterium Pseudomonas aureofaciens 30–84 was determined. These open reading frames were named phzF, phzA, phzB, phzC and phzD. Protein PhzF is similar to 3-deoxy-D-arabino-heptulosonate-7-phosphate synthases of solanaceous plants. PhzA is similar to 2,3-dihydro-2,3-dihydroxybenzoate synthase (EntB) of Escherichia coli. PhzB shares similarity with both subunits of anthranilate synthase and the phzB open reading frame complemented an E. coli trpE mutant deficient in anthranilate synthase activity. Although phzC shares little similarity to known genes, its product is responsible for the conversion of phenazine-1-carboxylic acid to 2-hydroxy-phenazine-1-carboxylic acid. PhzD is similar to pyridoxamine phosphate oxidases. These results indicate that phenazine biosynthesis in P. aureofaciens shares similarities with the shikimic acid, enterochelin, and tryptophan biosynthetic pathways.

16.
In this study, an in silico approach was developed to identify homologies existing between livestock microsatellite flanking sequences and GenBank nucleotide sequences. Initially, 1955 bovine, 1570 porcine and 1121 chicken microsatellites were downloaded and the flanking sequences were compared with the nr and dbEST databases of GenBank. A total of 74 bovine, 44 porcine and 37 chicken microsatellite flanking sequences passed our criteria and had at least one significant match to human genomic sequence, genes/expressed sequence tags (ESTs) or both. GenBank annotation and BLAT searches of the UCSC human genome assembly revealed that 38 bovine, 13 porcine and 17 chicken microsatellite flanking sequences were highly similar to known human genes. Map locations were available for 67 bovine, 44 porcine and 21 chicken microsatellite flanking sequences, providing useful links in the comparative maps of humans and livestock. In support of our approach, 112 alignments with both microsatellite and match mapping information were located in the expected chromosomal regions based on previously reported syntenic relationships. The development of this in silico mapping approach has significantly increased the number of genes and EST sequences anchored to the bovine, porcine and chicken genome maps and the number of links in various human-livestock comparative maps.

17.
18.
This paper describes a non-iterative, recursive method to compute the likelihood for a pedigree without loops, and hence an efficient way to compute genotype probabilities for every member of the pedigree. The method can be used with multiple mates and large sibships. Scaling is used in calculations to avoid numerical problems in working with large pedigrees.

19.
Summary: The data from a genomic library can be sorted into the frequencies of every possible tetranucleotide in the sequence. This tabulation, a short sequence distribution, contains the frequency of occurrence of the 256 tetranucleotides and thus seems to serve as a vehicle for averaging sequence information. Two such distributions can be readily compared by correlation. Reported here are correlations (Spearman r_s) of the distributions from all of the genomic libraries in GenBank 44.0 with sizes equal to or larger than that of Salmonella typhimurium, except for the data for mouse and humans. All of the organisms examined showed highly significant correlations between the two DNA strands (not the complementarity expected from base pairing). Of 155 comparisons between libraries, 132 showed significant correlations at the 99% confidence level. Application of the correlation coefficients as a similarity matrix clustered most organisms in a phenogram in a pattern consistent with other hypotheses. This suggests a highly conserved pattern underlying all other genetic information in cellular DNA and affecting both DNA strands, perhaps caused by interaction with conserved factors necessary for DNA packaging.
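The comparison described in the abstract above can be sketched as follows: count the 256 overlapping tetranucleotides in each sequence, then correlate the two count vectors with Spearman's rank correlation. This is a minimal pure-Python sketch; the names tetra_counts, average_ranks and spearman are hypothetical, and choices such as overlapping windows, skipping windows with ambiguous bases, and average ranks for ties are assumptions, since the abstract does not state how the original study handled them.

```python
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_counts(seq: str):
    """Count overlapping tetranucleotides; windows with non-ACGT symbols are skipped."""
    counts = dict.fromkeys(KMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[k] for k in KMERS]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman r_s: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    a = tetra_counts("ACGTACGTTTGACCA" * 20)
    b = tetra_counts("TTGACCAACGTACGT" * 20)
    print(spearman(a, b))
```

Short sequences leave most of the 256 counts tied at zero, so the correlation is only informative for sequences long enough to populate the table, which fits the abstract's restriction to large genomic libraries.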

20.
In the analysis of organism life cycles in ecology, comparisons of life cycles between species or between different types of life cycles within species are frequently conducted. In matrix population models, partitioning of the elasticity matrix is used to quantify the separate contributions of different life cycles to the population growth rate. Such a partition is equivalent to a decomposition of the life cycle graph of the population. A graph theoretic spanning tree method to carry out the decomposition was formalized by Wardle [Ecology 79(7), 2539–2549 (1998)]. However, there are difficulties in realizing a suitable decomposition for complex life histories using the spanning-tree method. One of the problems is the occurrence of life cycles that contain contradictory directions that defy biological interpretation. We propose an algorithmic approach for decomposing a directed, weighted graph. The graph is to be decomposed into two parts. The first part is a set of simple cycles that contain no contradictory directions and that consist of edges of equal weight. The second part of the decomposition is a subgraph in which no such simple cycles are obtainable. When applied to life cycle analysis in ecology, the proposed method will guarantee a complete decomposition of the life cycle graph into individual life cycles containing no contradictory directions. Although the research described in this article has been funded in part by the United States Environmental Protection Agency through STAR cooperative agreement R-82940201-0 to the University of Chicago, it has not been subjected to the Agency’s required peer and policy review and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred.
