Similar Documents
20 similar documents found.
1.
In this paper, we address the multiple peak alignment problem in sequential data analysis with an approach based on Gaussian scale-space theory. We assume that multiple sets of detected peaks are observed samples of a set of common peaks. We also assume that the locations of the observed peaks follow unimodal distributions (e.g., a normal distribution) whose means equal the locations of the corresponding common peaks and whose variances reflect the extent of their variation. Under these assumptions, we convert the problem of estimating the locations of an unknown number of common peaks from multiple sets of detected peaks into the much simpler problem of searching for local maxima in the scale-space representation. The scale parameter is optimized using an energy minimization approach. We compare our approach with a hierarchical clustering method using both simulated data and real mass spectrometry data. We also demonstrate the merit of extending the binary peak detection method (i.e., a candidate is considered either a peak or a nonpeak) to an approach based on a quantitative scoring measure (i.e., each candidate is assigned a score for its likelihood of being a peak).
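
A minimal sketch of the central step, assuming a single fixed scale sigma chosen by hand (the abstract instead optimizes the scale by energy minimization, which is not shown here). All detected peak locations are pooled, each contributes a Gaussian of width sigma, and the local maxima of the summed curve are taken as the common peaks. The function name and the grid resolution are hypothetical.

```python
import numpy as np

def scale_space_common_peaks(peak_sets, sigma, grid_step=0.01):
    """Estimate common peak locations from several sets of detected peaks.

    Illustrative assumption: one fixed scale `sigma`; common peaks are the
    interior local maxima of the Gaussian-smoothed pooled peak positions.
    """
    locations = np.concatenate([np.asarray(s, dtype=float) for s in peak_sets])
    lo, hi = locations.min() - 4 * sigma, locations.max() + 4 * sigma
    grid = np.arange(lo, hi + grid_step, grid_step)
    # Scale-space representation at scale sigma: sum of Gaussians at each observed peak.
    density = np.exp(-0.5 * ((grid[:, None] - locations[None, :]) / sigma) ** 2).sum(axis=1)
    # Interior grid points strictly higher than both neighbours are local maxima.
    is_max = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
    return grid[1:-1][is_max]
```

With three runs that each detect two peaks near 10 and 20, a sigma of 0.3 merges each group into one common peak; a much smaller sigma would split them, which is why the choice of scale matters.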

2.
The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the sequence lengths, and the difficulty of obtaining suitable parameters that produce meaningful alignments. The running-time problem is often addressed by reducing the search space, using techniques such as banding or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model to which Needleman-Wunsch is equivalent does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models (PHMMs) that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, based on the construction of Manhattan networks (close relatives of Steiner trees), for which we obtain a 2-approximation algorithm. We describe the underlying theory and show how our methods can be applied to the alignment of DNA sequences in practice, reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.
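
Needleman-Wunsch fills an (n+1) x (m+1) dynamic-programming matrix, which is why its running time scales with the product of the sequence lengths; banding restricts the fill to cells near the diagonal. A minimal score-only sketch with linear gap penalties follows. The scoring values are illustrative assumptions, and it shows plain banding only, not the Manhattan-network search spaces described in the abstract.

```python
import math

def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2, band=None):
    """Optimal global alignment score with a linear gap penalty.

    If `band` is given, only cells with |i - j| <= band are filled; the result
    is then optimal only among alignments that stay inside that diagonal band.
    """
    n, m = len(a), len(b)
    if band is not None and abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    NEG = -math.inf
    # Row 0 of the DP matrix: b[:j] aligned entirely against gaps.
    prev = [j * gap if band is None or j <= band else NEG for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [NEG] * (m + 1)  # cells outside the band stay at -inf
        if band is None or i <= band:
            cur[0] = i * gap
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,  # align a[i-1] with b[j-1]
                         prev[j] + gap,    # a[i-1] against a gap
                         cur[j - 1] + gap) # b[j-1] against a gap
        prev = cur
    return prev[m]
```

A narrower band does less work per row but can only lower (never raise) the score, since it searches a subset of alignments.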

3.
The purpose of this study was to investigate whether the neuromuscular locomotor system is optimized at a unique speed by examining the variability of the ground reaction force (GRF) pattern during walking at different constant speeds. Ten healthy male subjects walked on a treadmill at 3.0, 4.0, 5.0, 6.0, 7.0, and 8.0 km/h. Three components of the GRF [vertical (F(z)), anteroposterior (F(y)), and mediolateral (F(x))] were measured independently for approximately 35 consecutive steps of each leg. To quantify the GRF pattern, five indexes were defined: the first and second peaks of F(z), the first and second peaks of F(y), and the F(x) peak. Coefficients of variation of these five indexes were calculated to evaluate GRF variability at each walking speed. For the first and second peaks of F(z) and the F(x) peak, variability increased with walking speed, whereas for the first and second peaks of F(y), which are related to forward propulsion of the body, variability was minimal at 5.5-5.8 km/h. These results suggest that there is "an optimum speed" for the neuromuscular locomotor system, but only for the propulsion control mechanism.
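
The coefficient of variation used to compare speeds is the standard deviation of an index divided by its mean. A short sketch follows; the abstract does not say whether the sample or population standard deviation was used, so the sample SD here is an assumption, and both helper names are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample SD / mean * 100 (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0

def speed_with_min_variability(samples_by_speed):
    """Return the speed whose per-step values of one GRF index have the lowest CV.

    `samples_by_speed` maps a walking speed (km/h) to the values of a single
    index (e.g. the first F(y) peak) recorded over consecutive steps.
    """
    return min(samples_by_speed,
               key=lambda speed: coefficient_of_variation(samples_by_speed[speed]))
```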

4.
Human amniotic fluid has been shown to contain a protein that binds insulin-like growth factors I and II (IGF-I and IGF-II). Partially purified preparations of this protein have been reported to inhibit the biologic actions of the IGFs. In these studies, our laboratory used a modified purification procedure to obtain a preparation of this protein that was homogeneous as determined by polyacrylamide gel electrophoresis and amino acid sequence analysis. During purification, the ion exchange chromatography step yielded two peaks of material with IGF binding activity, termed peaks B and C. Each peak was purified separately to homogeneity. Both were estimated to be 31,000 daltons by polyacrylamide gel electrophoresis, and their amino acid compositions were nearly identical. Amino acid sequence analysis showed that both had identical N-terminal sequences through the first 28 residues. Neither protein had detectable carbohydrate side chains, and each had a similar affinity for radiolabeled IGF-I (1.7-2.2 × 10¹⁰ L/mol). In contrast, the two forms differed markedly in bioactivity. Peak C material at concentrations between 2 and 20 ng/ml inhibited IGF-I stimulation of [³H]thymidine incorporation into smooth muscle cell DNA, whereas peak B (100 ng/ml) incubated with IGF-I produced a 4.4-fold enhancement of the stimulation of DNA synthesis. Additionally, pure peak B was shown to adhere to cell surfaces, whereas peak C did not. The non-adherent peak C inhibited IGF-I binding both to its receptor and to adherent peak B. We conclude that human amniotic fluid contains two forms of IGF binding protein that have very similar physicochemical characteristics but markedly different biologic actions. Since both have similar if not identical amino acid compositions and N-terminal sequences, and neither contains carbohydrate, we conclude that they differ in some other, as yet undefined, post-translational modification.

5.
Multiple sequence alignment by a pairwise algorithm (times cited: 1; self-citations: 0; citations by others: 1)
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987
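
One common way to turn pairwise alignments into a multiple alignment is to align every sequence to a single center sequence and merge the results, keeping every gap any pairwise alignment introduced ("once a gap, always a gap"). The sketch below shows that generic technique; it is not necessarily the procedure described in the abstract, and the function name and input format are hypothetical.

```python
def merge_pairwise_to_center(center, pairwise):
    """Merge pairwise alignments against one center sequence into a multiple alignment.

    `pairwise` is a list of (center_aligned, other_aligned) string pairs of equal
    length using '-' for gaps; removing gaps from center_aligned must give `center`.
    Returns the center row followed by one row per pairwise alignment.
    """
    L = len(center)
    parsed = []
    for ca, oa in pairwise:
        if len(ca) != len(oa) or ca.replace("-", "") != center:
            raise ValueError("each pair must be a gapped alignment of the center")
        inserts = [[] for _ in range(L + 1)]  # chars placed before center residue p (p == L: at the end)
        aligned = []                          # char aligned to each center residue
        p = 0
        for c, o in zip(ca, oa):
            if c == "-":
                inserts[p].append(o)
            else:
                aligned.append(o)
                p += 1
        parsed.append((inserts, aligned))
    # Each insertion slot is as wide as the largest insertion any pairwise alignment made there.
    width = [max((len(ins[p]) for ins, _ in parsed), default=0) for p in range(L + 1)]
    rows = ["".join("-" * width[p] + center[p] for p in range(L)) + "-" * width[L]]
    for inserts, aligned in parsed:
        parts = []
        for p in range(L + 1):
            parts.append("".join(inserts[p]).ljust(width[p], "-"))  # pad short insertions with gaps
            if p < L:
                parts.append(aligned[p])
        rows.append("".join(parts))
    return rows
```

For example, merging ("AC-GT", "ACTGT") and ("ACGT", "A-GT") against the center "ACGT" yields the rows "AC-GT", "ACTGT" and "A--GT".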

6.
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences, with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm that allows dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257,716 pairs of homologous sequences from 100 protein families. On 168,475 of the 257,716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.

7.
Finding structural similarities between proteins often helps reveal shared functionality that might not otherwise be detected from native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure by a sequence of motifs drawn from a structural alphabet. Structure comparison is then performed by comparing the corresponding motif sequences, or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the fragment that best matches the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain matching the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs, or protein fragments, of four residues each. We show that the global structural sequences approximate the native structures of proteins well, with an average coordinate root-mean-square deviation of 0.69 Å over 2225 test proteins. The approximation is best for all-α proteins and relatively poorer for all-β proteins. We then test the ability of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library), combined with different classifiers, to classify proteins belonging to five distinct CATH folds. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier.
We show that adding either secondary-structure information or local information from the structural sequence considerably improves classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence, but not yet well enough to be a viable alternative to more computationally intensive methods based on protein structure alignment.
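
The abstract reports approximation quality as a coordinate root-mean-square deviation. A common way to compute such a value is the RMSD after optimal rigid superposition, via the Kabsch algorithm; the abstract does not state its exact procedure, so the following is a sketch under that assumption, and the function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Both sets are centred on their centroids, the optimal rotation is taken from
    the SVD of their covariance matrix (Kabsch algorithm), and the RMSD of the
    rotated P against Q is returned.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # flip one axis if needed to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal proper rotation
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure has an RMSD of essentially zero with respect to the original, so the value measures only shape differences.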

8.
9.
When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.

10.
MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps. RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.

11.
MOTIVATION: Comparative metabolic profiling by nuclear magnetic resonance (NMR) is showing increasing promise for identifying inter-individual differences in drug response. Two-dimensional (2D) ¹H-¹³C NMR can reduce spectral overlap, a common problem of 1D ¹H NMR. However, peak alignment tools for 1D NMR spectra are not well suited to 2D NMR. An automated and statistically robust method for aligning 2D NMR peaks is required to enable comparative metabonomic analysis using 2D NMR. RESULTS: A novel statistical method was developed to align NMR peaks that represent the same chemical groups across multiple 2D NMR spectra. The degree of local pattern match among peaks in different spectra is assessed using a similarity measure, and a heuristic algorithm maximizes the similarity measure for peaks across the whole spectrum. This peak alignment method was used to align peaks in 2D NMR spectra of endogenous metabolites in liver extracts obtained from four inbred mouse strains in a study of acetaminophen-induced liver toxicity. The automated alignment method was validated by manual examination of the top 50 peaks ranked by signal intensity. Manual inspection of 1872 peaks in 39 different spectra showed that the automated algorithm correctly aligned 1810 (96.7%) of them. AVAILABILITY: The algorithm is available upon request.

12.
MOTIVATION: Multiple alignment of highly divergent sequences is a challenging problem for which available programs tend to show poor performance. Generally, this is due to a scoring function that does not describe biological reality accurately enough or a heuristic that cannot explore solution space efficiently enough. In this respect, we present a new program, Align-m, that uses a non-progressive local approach to guide a global alignment. RESULTS: Two large test sets were used that represent the entire SCOP classification and cover sequence similarities between 0 and 50% identity. Performance was compared with the publicly available algorithms ClustalW, T-Coffee and DiAlign. In general, Align-m has comparable or slightly higher accuracy in terms of correctly aligned residues, especially for distantly related sequences. Importantly, it aligns far fewer residues incorrectly, with average differences of over 15% compared with some of the other algorithms. AVAILABILITY: Align-m and the test sets are available at http://bioinformatics.vub.ac.be

13.
MOTIVATION: Ion-type identification is a fundamental problem in computational proteomics. Methods for accurate identification of ion types provide the basis for many mass spectrometry data interpretation problems, including (a) de novo sequencing, (b) identification of post-translational modifications and mutations and (c) validation of database search results. RESULTS: Here, we present a novel graph-theoretic approach for separating b ions from y ions in a set of tandem mass spectra. We represent each spectral peak as a node and consider two types of edges: type-1 edges connecting two peaks that are probably of the same ion type, and type-2 edges connecting two peaks that are probably of different ion types. Ion separation is formulated and solved as a graph partition problem: partition the graph into three subgraphs, representing b, y and other ions, respectively, by maximizing the total weight of type-1 edges while minimizing the total weight of type-2 edges within each subgraph. We have developed a dynamic programming algorithm that solves this graph partition problem rigorously and implemented it as a computer program, PRIME (PaRtition of Ion types in tandem Mass spEctra). Tests on a large number of simulated mass spectra and 19 sets of high-quality experimental Fourier transform ion cyclotron resonance tandem mass spectra indicate that an accuracy of approximately 90% was achieved for the separation of b and y ions. AVAILABILITY: The executable code of PRIME is available upon request. CONTACT: xyn@bmb.uga.edu.
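
The partition objective described above can be made concrete in a few lines. PRIME solves it with dynamic programming; the sketch below instead evaluates the objective for a given labeling and finds the best labeling by exhaustive search, which is only feasible for a handful of peaks. Scoring type-1 minus type-2 weight inside every part (including "other"), the edge format and the function names are all assumptions for illustration. Note that the objective alone is symmetric in "b" and "y", so deciding which part holds the b ions needs extra information.

```python
from itertools import product

def partition_score(labels, edges):
    """Type-1 edge weight inside parts minus type-2 edge weight inside parts.

    `labels` maps each peak to a part ('b', 'y' or 'other'); `edges` is a list of
    (u, v, weight, edge_type) with edge_type 1 (probably same ion type) or
    2 (probably different ion types).
    """
    score = 0.0
    for u, v, w, edge_type in edges:
        if labels[u] == labels[v]:
            score += w if edge_type == 1 else -w
    return score

def best_partition_bruteforce(peaks, edges, parts=("b", "y", "other")):
    """Try every assignment of peaks to parts (3**n labelings) and keep the best."""
    best_labels, best_score = None, float("-inf")
    for assignment in product(parts, repeat=len(peaks)):
        labels = dict(zip(peaks, assignment))
        s = partition_score(labels, edges)
        if s > best_score:
            best_labels, best_score = labels, s
    return best_labels, best_score
```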

14.
Current methods for aligning biological sequences are based on dynamic programming algorithms. If large numbers of sequences or a number of long sequences are to be aligned, the required computations are expensive in memory and central processing unit (CPU) time. In an attempt to bring the tools of large-scale linear programming (LP) methods to bear on this problem, we formulate the alignment process as a controlled Markov chain and construct a suggested alignment based on policies that minimise the expected total cost of the alignment. We discuss the LP associated with the total expected discounted cost and show the results of a solution of the problem based on a primal-dual interior point method. Model parameters, estimated from aligned sequences, along with cost function parameters are used to construct the objective and constraint conditions of the LP problem. This article concludes with a discussion of some alignments obtained from the LP solutions of problems with various cost function parameter values.

15.
MOTIVATION: Computationally identifying non-coding RNA regions in the genome offers much scope for investigation and is inherently harder than gene finding for protein-coding regions. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods for structural alignment of RNA sequences are needed. Hidden Markov Models (HMMs), meanwhile, have played important roles in modeling and analyzing biological sequences. In particular, Pair HMMs (PHMMs) have been examined extensively as mathematical models for alignment and gene finding. RESULTS: We propose pair HMMs on tree structures (PHMMTSs), an extension of PHMMs defined on alignments of trees, which provide a unifying framework and an automata-theoretic model for alignments of trees, structural alignments and pair stochastic context-free grammars. By structural alignment, we mean a pairwise alignment of an unfolded RNA sequence to an RNA sequence of known secondary structure. First, we extend the notion of PHMMs defined on alignments of 'linear' sequences to pair stochastic tree automata, called PHMMTSs, defined on alignments of 'trees'. PHMMTSs provide various types of tree alignment, such as affine-gap alignments of trees, and an automata-theoretic model for alignment of trees. Second, based on the observation that an RNA secondary structure can be represented by a tree, we apply PHMMTSs to the structural alignment of RNAs. We modify PHMMTSs so that they take as input a pair consisting of a 'linear' sequence and a 'tree' representing an RNA secondary structure, and produce a structural alignment. Further, PHMMTSs whose input is a pair of linear sequences are mathematically equivalent to pair stochastic context-free grammars. We present computational experiments showing the effectiveness of our method for structural alignment, and discuss a complexity issue of PHMMTSs.

16.
The problem of aligning two symbol sequences is considered. The validity of the available algorithms for constructing an optimal alignment depends on the weighting coefficients, which are frequently difficult to choose. A new approach to the problem is proposed, based on the use of vector weighting functions (instead of the traditionally used scalar ones) and Pareto-optimal alignment (an alignment that is optimal for a given choice of weighting coefficients is always Pareto-optimal). An efficient algorithm for constructing all Pareto-optimal alignments of two sequences is proposed. An approach to choosing a "biologically correct" alignment among all Pareto-optimal alignments is suggested.
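
A minimal sketch of the idea, assuming a two-component cost vector (number of mismatched columns, number of gap columns) to be minimized. Because costs are additive and dominance is preserved when the same cost is added to both vectors, each dynamic-programming cell only needs to keep its own Pareto set. This is not necessarily the authors' algorithm; the cost components and function names are illustrative.

```python
def pareto_front(vectors):
    """Keep the vectors not dominated by another (<= in every component, < in one)."""
    unique = set(vectors)
    return sorted(v for v in unique
                  if not any(w != v and all(wi <= vi for wi, vi in zip(w, v))
                             for w in unique))

def pareto_alignment_costs(a, b):
    """Pareto-optimal (mismatches, gaps) cost vectors over all global alignments of a and b."""
    n, m = len(a), len(b)
    # F[i][j]: Pareto set of cost vectors for aligning a[:i] with b[:j].
    F = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    F[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # match or mismatch column
                mm = 0 if a[i - 1] == b[j - 1] else 1
                candidates += [(x + mm, g) for x, g in F[i - 1][j - 1]]
            if i > 0:            # a[i-1] against a gap
                candidates += [(x, g + 1) for x, g in F[i - 1][j]]
            if j > 0:            # b[j-1] against a gap
                candidates += [(x, g + 1) for x, g in F[i][j - 1]]
            F[i][j] = pareto_front(candidates)
    return F[n][m]
```

For "AC" against "CA" the front is [(0, 2), (2, 0)]: either two substitutions or two gap columns, and which one a scalar scoring prefers depends only on the relative weights.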

17.
18.
A simple capillary zone electrophoresis (CZE) method was used to determine native and in vitro Cu²⁺- and glucose-modified low-density lipoprotein (LDL) particles for four healthy subjects. The LDL electropherograms are highly reproducible, with good precision of effective mobility and peak area. The native LDL capillary electrophoresis (CE) profile shows a major peak with lower mobility and two minor peaks with higher mobilities. For three-hour Cu²⁺ oxidation, one major peak with mobility close to that of the native major peak and one minor peak with mobility extending to -47 × 10⁻⁵ cm² V⁻¹ s⁻¹ appear. For eighteen-hour Cu²⁺ oxidation, one major peak with mobility much higher than that of the native major peak appears. As the reaction time for LDL and Cu²⁺ increases from 0 to 24 h, the effective mobility of the LDL major peak increases, suggesting that LDL particles become more negatively charged and oxidized as time increases. The in vitro glycated LDL particles are characterized by a major peak and two minor peaks. The mobility of the major peak is close to that of the native major peak, but the second minor peak is much more negatively charged, with mobility extending to -53 × 10⁻⁵ cm² V⁻¹ s⁻¹. Native, oxidized and glycated LDL particles show distinctive differences in their CZE profiles. Agarose electrophoresis shows that the charge-to-mass ratios of native, three-hour Cu²⁺-oxidized and glucose-modified LDL particles are similar, but that of eighteen-hour Cu²⁺-oxidized LDL particles is higher.

19.
The problem of alignment of cells (or other objects) that interact in an angle-dependent way was described in Mogilner and Edelstein-Keshet (1995). In this sequel we consider in detail a special limiting case of nearly complete alignment. This occurs when the rotational diffusion of individual objects becomes very slow. In this case, the motion of the objects is essentially deterministic, and the individuals or objects tend to gather in clusters at various orientations. (Numerical solutions show that the angular distribution develops sharp peaks at various discrete orientations.) To understand the behaviour of the deterministic models with analytic tools, we represent the distribution as a number of δ-like peaks. This approximation of a true solution by a set of (infinitely sharp) peaks will be referred to as the peak ansatz. For weak but nonzero angular diffusion, the peaks are smoothed out. The analysis of this case leads to a singular perturbation problem, which we investigate. We briefly discuss other applications of similar techniques.

20.
MOTIVATION: We consider the problem of multiple alignment of protein sequences with the goal of achieving a large SP (Sum-of-Pairs) score. RESULTS: We introduce a new graph-based method, which we name QOMA (Quasi-Optimal Multiple Alignment). QOMA starts with an initial alignment and represents it using a K-partite graph. It then improves the SP score of the initial alignment through local optimizations within a window that moves greedily along the alignment. QOMA uses two parameters to permit flexibility in the time/accuracy trade-off: (1) the size of the window for local optimization and (2) the sparsity of the K-partite graph. Unlike traditional progressive methods, QOMA is independent of the order of sequences. Experimental results on BAliBASE benchmarks show that QOMA produces higher SP scores than existing tools, including ClustalW, Probcons, Muscle, T-Coffee and DCA. The difference is more significant for distant proteins. AVAILABILITY: The software is available from the authors upon request.
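
The SP score that QOMA aims to maximize sums a pairwise score over every pair of rows in every column of the multiple alignment. A minimal sketch, assuming a simple match/mismatch/linear-gap scheme with gap-gap pairs scored as 0; practical tools usually use a substitution matrix such as BLOSUM62 and affine gap penalties, and the function name is hypothetical.

```python
from itertools import combinations

def sum_of_pairs_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs (SP) score of a multiple alignment given as equal-length rows.

    Residue-residue pairs score `match` or `mismatch`, residue-gap pairs score
    `gap`, and gap-gap pairs score 0 (illustrative scoring scheme).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for x, y in combinations(column, 2):  # every pair of rows in this column
            if x == "-" and y == "-":
                continue
            elif x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total
```

For K sequences each column contributes K(K-1)/2 pair scores, so evaluating the SP score costs O(K²L) for an alignment of length L.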


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417