Similar literature
 20 similar documents found (search time: 0 ms)
1.
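A fast sequence alignment algorithm based on dynamic programming   Total citations: 3, self-citations: 0, citations by others: 3
Sequence alignment is one of the important research directions in bioinformatics, and dynamic programming is the most effective and most fundamental approach to sequence alignment. Because the basic dynamic programming method has high time and space complexity, it is unsuitable for aligning real biological sequences. Building on an analysis of several related dynamic programming algorithms, this paper proposes a fast dynamic-programming-based sequence alignment algorithm, UKK_FA. Experimental results show that the algorithm effectively reduces time complexity and is of practical use.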
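
For orientation, the basic dynamic programming recurrence that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of textbook Needleman-Wunsch global alignment scoring, not the UKK_FA algorithm itself; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best global alignment of a and b (linear gap penalty).

    Only two rows of the DP table are kept, so memory is O(len(b)),
    while time remains O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


print(global_alignment_score("ACGT", "AGT"))
```

The quadratic running time of this recurrence is the cost that faster variants, such as the one the abstract proposes, aim to reduce.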

2.
When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the “seed” regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.

3.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.

4.
5.
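Multiple sequence alignment plays a very important role in elucidating the key biological patterns shared by a group of related sequences. Since the advent of computers, many researchers have worked on multiple sequence alignment algorithms. The Human Genome Project and the haplotype map project have once again made multiple sequence alignment a research hotspot. This paper summarizes in detail the main algorithms for multiple sequence alignment, reviews recent progress in China and abroad, and analyzes and predicts future research directions for the problem.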

6.
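Life Science Research (《生命科学研究》), 2014, (5): 458-464
The rapid development of high-throughput sequencing technology has brought new opportunities and challenges to bioinformatics. Second-generation sequencing produces large numbers of short sequences, so earlier sequence analysis methods are no longer applicable. In recent years, algorithms and software for analyzing high-throughput sequencing data have multiplied, and there are now more than a hundred, which makes choosing suitable software difficult. This paper summarizes the sequencing types, sequence types, and analysis algorithms of second-generation sequencing. It analyzes and compares in detail the commonly used analysis software in terms of sequence type and length, the algorithms applied, input/output formats, characteristics, and functions, and it offers recommendations. It also analyzes current problems in sequencing technology and sequence analysis and predicts future directions.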

7.
Carrying out simultaneous tree-building and alignment of sequence data is a difficult computational task, and the methods currently available are either limited to a few sequences or restricted to highly simplified models of alignment and phylogeny. A method is given here for overcoming these limitations by Bayesian sampling of trees and alignments simultaneously. The method uses a standard substitution matrix model for residues together with a hidden Markov model structure that allows affine gap penalties. It escapes the heavy computational burdens of other models by using an approximation called the “*” rule, which replaces missing data by a sum over all possible values of variables. The behavior of the model is demonstrated on test sets of globins. Received: 25 May 1998 / Accepted: 8 December 1998

8.
Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes both in protein and non-protein coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. The two PHMMs had their own set of parameters to model rates in their respective regions. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates to conserved and non-conserved secondary structure components, respectively. With polypeptides we found that both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to co-locate with hydrophilic residues. With rRNA molecules, our model did not detect co-location between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.

9.
A systematic way of inferring evolutionary relatedness of microbial organisms from the oligopeptide content, i.e., frequency of amino acid K-strings in their complete proteomes, is proposed. The new method circumvents the ambiguity of choosing the genes for phylogenetic reconstruction and avoids the necessity of aligning sequences of essentially different length and gene content. The only parameter in the method is the length K of the oligopeptides, which serves to tune the resolution power of the method. The topology of the trees converges as K increases. Applied to a total of 109 organisms, including 16 Archaea, 87 Bacteria, and 6 Eukarya, it yields an unrooted tree that agrees with the biologists' tree of life based on SSU rRNA comparison in a majority of basic branchings, and especially, in all lower taxa.
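
The counting step behind "frequency of amino acid K-strings" can be illustrated with a short sketch. It covers only the tallying of overlapping K-strings in a single sequence; how the abstract's method turns such frequencies into distances and trees is not shown here, and the function name `kstring_frequencies` is a hypothetical choice for this example.

```python
from collections import Counter


def kstring_frequencies(protein: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping K-string in one sequence."""
    total = len(protein) - k + 1  # number of windows of length k
    if k <= 0 or total <= 0:
        return {}
    counts = Counter(protein[i:i + k] for i in range(total))
    return {kstring: n / total for kstring, n in counts.items()}


# Toy sequence: the windows are MK, KM, MK, KM.
print(kstring_frequencies("MKMKM", 2))
```

For a whole proteome, the counts from all proteins would be pooled before normalizing. Larger K gives a sparser but more specific frequency vector, which is the resolution trade-off the abstract describes.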

10.
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take into account the separation between two amino acids in the query alignment that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request.
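
The standard SP score mentioned in this abstract can be sketched as the fraction of residue pairs aligned in the reference that are also aligned in the query. The code below is an illustrative sketch under that common definition, not the VerAlign implementation, and it does not compute SPdist; alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character, and both function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """Set of (seq_i, residue_index_i, seq_j, residue_index_j) sharing a column."""
    next_index = [0] * len(alignment)  # ungapped position reached in each row
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq, next_index[seq]))
                next_index[seq] += 1
        # Every two residues in the same column form one aligned pair.
        for (s1, p1), (s2, p2) in combinations(residues, 2):
            pairs.add((s1, p1, s2, p2))
    return pairs


def sp_score(reference: list[str], query: list[str]) -> float:
    """Fraction of reference residue pairs reproduced by the query alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(query)) / len(ref_pairs)


reference = ["AC-G", "ACTG"]
query = ["ACG-", "ACTG"]  # the final G of sequence 0 is shifted by one column
print(sp_score(reference, query))
```

In this toy case the misplaced residue counts as a plain miss whether it is shifted by one column or by fifty, which is the insensitivity to shift size that the abstract sets out to address.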

11.
The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
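
The sensitivity of identity scores to gap handling, as described in this abstract, can be seen in a small sketch. The two conventions below (ignoring gapped columns, or counting them as mismatches) are assumptions chosen for illustration and are not claimed to be what SDT does; the function name and the `count_gaps` flag are hypothetical.

```python
def pairwise_identity(a: str, b: str, count_gaps: bool = False) -> float:
    """Identity of two aligned, equal-length sequences ('-' marks a gap).

    count_gaps=False: columns with a gap in either sequence are ignored.
    count_gaps=True:  such columns count as mismatches in the denominator.
    Columns where both sequences have a gap are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            if count_gaps:
                compared += 1
            continue
        compared += 1
        matches += (x == y)
    return matches / compared if compared else 0.0


print(pairwise_identity("ACG-T", "ACGTT"))
print(pairwise_identity("ACG-T", "ACGTT", count_gaps=True))
```

The same aligned pair scores 100% under one convention and 80% under the other, which is enough to move a sequence across a fixed demarcation threshold.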

12.
Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results appear to suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment.

13.
The success of hierarchical production planning approaches for flexible manufacturing systems lies in the consistency of decision outcomes at various decision levels. For instance, the loading problem, which is solved at a lower level, may not yield a feasible loading solution to a set of part types selected at a higher level. This paper attempts to address the issue of recognizing the infeasibility of a loading solution. We present a modified loading model that includes a penalty for each operation not assigned to any machine. We develop a Lagrangian-based heuristic procedure and provide a sufficient condition on the quality of heuristic solutions that, if satisfied, will enable us to use the heuristic solutions to recognize the infeasibility of a loading problem. The proposed model and the dual-based heuristic can be effectively incorporated in an FMS hierarchical production planning approach that finds a good loading solution by iteratively comparing different part grouping scenarios.

14.
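Signal peptide prediction is one of the most important problems in protein function prediction. To avoid problems such as the sample imbalance caused by sliding windows, sequence alignment methods have been applied effectively to signal peptide prediction. Because a signal peptide is a biological property exhibited by a local segment of a protein sequence, this paper proposes a method that predicts signal peptides from local sequence matching similarity. Using an encoding scheme based on the relative hydrophobicity of amino acids, the method searches for locally matching subsequences of proteins, measures the similarity of two proteins with the BLOSUM62 substitution matrix, and finally classifies them using the k-nearest-neighbor approach. Experiments on the widely used SwissProt dataset show that the method achieves a fairly high prediction rate.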

15.
Segmentation of the left ventricle is very important for quantitatively analyzing global and regional cardiac function from magnetic resonance images. The aim of this study is to develop a novel algorithm for segmenting the left ventricle on short-axis cardiac magnetic resonance images (MRI) to improve the performance of computer-aided diagnosis (CAD) systems. In this research, an automatic segmentation method for the left ventricle is proposed on the basis of the local binary fitting (LBF) model and dynamic programming techniques. The validation experiments are performed on a pool of data sets of 45 cases. For both the endo- and epi-cardial contours of our results, the percentage of good contours is about 93.5% and the average perpendicular distance is about 2 mm. The overlapping Dice metric is about 0.91. The regression and determination coefficients between the experts and our proposed method on the LV mass are 1.038 and 0.9033, respectively; they are 1.076 and 0.9386 for ejection fraction (EF). The proposed segmentation method shows better performance and has great potential for improving the accuracy of computer-aided diagnosis systems in cardiovascular diseases.
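
The overlap measure reported in this abstract, the Dice metric, is commonly defined as 2|A ∩ B| / (|A| + |B|) for two segmented regions A and B. The sketch below assumes that standard definition and represents each region as a set of pixel coordinates; the function name and the toy regions are illustrative only.

```python
def dice_coefficient(region_a: set, region_b: set) -> float:
    """Dice overlap of two regions given as sets of pixel coordinates."""
    if not region_a and not region_b:
        return 1.0  # two empty regions are treated as identical
    overlap = len(region_a & region_b)
    return 2 * overlap / (len(region_a) + len(region_b))


automatic = {(0, 0), (0, 1), (1, 0)}  # pixels labeled by an algorithm
manual = {(0, 1), (1, 0), (1, 1)}     # pixels labeled by an expert
print(dice_coefficient(automatic, manual))
```

A value of 1 means the two contours enclose exactly the same pixels and 0 means they do not overlap at all.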

16.
Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct “BILD” (“Bayesian Integral Log-odds”) substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum.
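
For reference, the pairwise log-odds form that this abstract generalizes is usually written as below. The symbols are the conventional ones and are assumptions of this note rather than notation taken from the abstract: q_{ab} is the target frequency of letters a and b being aligned in related sequences, p_a and p_b are background frequencies, and λ is a positive scale factor. The multiple-alignment BILD construction itself is not reproduced here.

```latex
s(a,b) \;=\; \frac{1}{\lambda}\,\ln\frac{q_{ab}}{p_a\,p_b}
```

A positive score means the pair is seen more often in related sequences than chance alignment of the two letters would predict, and a negative score means it is seen less often.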

17.
Abstract

Modelling by homology is an approach to the rational design of new drugs based on the construction of ligand protein interaction complexes. Because in most cases the 3D-structure of the target protein is not known from biophysical data, this approach yields a theoretical procedure which establishes at least parts of the protein by comparison with isofunctional proteins, assuming that much of the structural information is embedded in the amino acid sequence. This approach should be of considerable importance for proteins with divergent primary structures but with a high degree of isofunctionality, the latter demanding a similar active site folding pattern.

This study is a pattern recognition approach based on additive secondary structure prediction and surface probabilities from residue variabilities. The comparison of the additive properties yields a sequence alignment of the viral thymidine kinases with the adenylate kinases having a closely related functionality. X-ray structures of adenylate kinases can then be used as templates to derive a 3D-structure prediction of the thymidine kinase active site.

18.
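Cloning and sequence alignment of exopolysaccharide biosynthesis genes from Lactobacillus plantarum C88   Total citations: 1, self-citations: 0, citations by others: 1
Exopolysaccharides from lactic acid bacteria can markedly improve the rheological and textural properties of fermented dairy products and foods. To better understand the biosynthetic pathway and regulatory mechanism of exopolysaccharides in lactic acid bacteria, this study cloned and characterized part of the gene cluster involved in exopolysaccharide biosynthesis in Lactobacillus plantarum C88. Specific primers were designed from conserved regions of L. plantarum gene sequences reported in GenBank to amplify the biosynthesis protein gene (cps4A) of L. plantarum C88, and a partial sequence (4.9 kb) of the gene cluster involved in exopolysaccharide synthesis was cloned by chromosome walking. Bioinformatic methods were used to predict the structure and function of the 6 reading frames in the cluster. The results show that the sequence is highly homologous (>96%) to reported exopolysaccharide biosynthesis genes of lactobacilli. Functional prediction for each reading frame indicates that these 6 genes mainly encode a polysaccharide synthesis protein, a chain length determination protein, UDP-glucose 4-epimerase, and glycosyltransferases, all of which take part in exopolysaccharide synthesis. This study provides a theoretical basis for regulating polysaccharide synthesis and yield by genetic engineering.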

19.
An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. However, the algorithm and probability model could possibly be extended to accommodate other diverse types of protein or nucleic acid data, including positional thermodynamic stability and mRNA translation efficiency. The algorithm requires only numerical values as input and will readily compare data other than protein hydropathy. The tool is therefore expected to complement, rather than replace, existing sequence and structure based tools and may inform medical discovery, as exemplified by proposed similarity between a chlamydial ORFan protein and bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available.

20.
Dialysis is a well-known technique for laboratory separation. However, its efficiency is commonly restricted by the dialyzer volume and its passive diffusion manner. In addition, the sample is likely to precipitate and become inactive during a long dialysis process. To overcome these drawbacks, a dynamic dialysis method was described and evaluated. The dynamic dialysis was performed by two peristaltic pumps working in reverse directions, in order to drive countercurrent parallel flow of sample and buffer, respectively. The efficiency and capacity of this dynamic dialysis method were evaluated by recording and statistically comparing the variation of conductance from retentate under different conditions. The dynamic method was proven to be effective in dialyzing a large-volume sample, and its efficiency changes proportionally to the flow rate of sample. To sum up, circulating the sample and the buffer creates the highest possible concentration gradient to significantly improve dialysis capacity and shorten dialysis time.
