首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We present algorithms for time-series gene expression analysis that permit the principled estimation of unobserved time points, clustering, and dataset alignment. Each expression profile is modeled as a cubic spline (piecewise polynomial) that is estimated from the observed data and every time point influences the overall smooth expression curve. We constrain the spline coefficients of genes in the same class to have similar expression patterns, while also allowing for gene specific parameters. We show that unobserved time points can be reconstructed using our method with 10-15% less error when compared to previous best methods. Our clustering algorithm operates directly on the continuous representations of gene expression profiles, and we demonstrate that this is particularly effective when applied to nonuniformly sampled data. Our continuous alignment algorithm also avoids difficulties encountered by discrete approaches. In particular, our method allows for control of the number of degrees of freedom of the warp through the specification of parameterized functions, which helps to avoid overfitting. We demonstrate that our algorithm produces stable low-error alignments on real expression data and further show a specific application to yeast knock-out data that produces biologically meaningful results.  相似文献   

2.
Aligning gene expression time series with time warping algorithms   总被引:1,自引:0,他引:1  
motivation: Increasingly, biological processes are being studied through time series of RNA expression data collected for large numbers of genes. Because common processes may unfold at varying rates in different experiments or individuals, methods are needed that will allow corresponding expression states in different time series to be mapped to one another. Results: We present implementations of time warping algorithms applicable to RNA and protein expression data and demonstrate their application to published yeast RNA expression time series. Programs executing two warping algorithms are described, a simple warping algorithm and an interpolative algorithm, along with programs that generate graphics that visually present alignment information. We show time warping to be superior to simple clustering at mapping corresponding time states. We document the impact of statistical measurement noise and sample size on the quality of time alignments, and present issues related to statistical assessment of alignment quality through alignment scores. We also discuss directions for algorithm improvement including development of multiple time series alignments and possible applications to causality searches and non-temporal processes ('concentration warping').  相似文献   

3.
Given two time series, possibly of different lengths, time warping is a method to construct an optimal alignment obtained by stretching or contracting time intervals. Unlike pairwise alignment of amino acid sequences, classical time warping, originally introduced for speech recognition, is not symmetric in the sense that the time warping distance between two time series is not necessarily equal to the time warping distance of the reversal of the time series. Here we design a new symmetric version of time warping, and present a formal proof of symmetry for our algorithm as well as for one of the variants of Aach and Church [1]. We additionally design quadratic time dynamic programming algorithms to compute both the forward and backward Boltzmann partition functions for symmetric time warping, and hence compute the Boltzmann probability that any two time series points are aligned. In the future, with the availability of increasingly long and accurate time series gene expression data, our algorithm can provide a sense of biological significance for aligned time points – e.g. our algorithm could be used to provide evidence that expression values of two genes have higher Boltzmann probability (say) in the G1 and S phase than in G2 and M phases. Algorithms, source code and web interface, developed by the first author, are made publicly available via the Boltzmann Time Warping web server at bioinformatics.bc.edu/clotelab/. Research partially supported by National Science Foundation grant DBI-0543506.  相似文献   

4.
SUMMARY: A novel integration approach targeting the combination of multi-experiment time series expression data is proposed. A recursive hybrid aggregation algorithm is initially employed to extract a set of genes, which are eventually of interest for the biological phenomenon under study. Next, a hierarchical merge procedure is specifically developed for the purpose of fusing together the multiple-experiment expression pro.les of the selected genes. This employs dynamic time warping alignment techniques in order to account adequately for the potential phase shift between the different experiments. We subsequently demonstrate that the resulting gene expression pro.les consistently re.ect the behavior of the original expression pro.les in the different experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available athttp://www.tu-plovdiv.bg/Container/bi/DataIntegration/  相似文献   

5.
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.  相似文献   

6.
MOTIVATION: Time series expression experiments have emerged as a popular method for studying a wide range of biological systems under a variety of conditions. One advantage of such data is the ability to infer regulatory relationships using time lag analysis. However, such analysis in a single experiment may result in many false positives due to the small number of time points and the large number of genes. Extending these methods to simultaneously analyze several time series datasets is challenging since under different experimental conditions biological systems may behave faster or slower making it hard to rely on the actual duration of the experiment. RESULTS: We present a new computational model and an associated algorithm to address the problem of inferring time-lagged regulatory relationships from multiple time series expression experiments with varying (unknown) time-scales. Our proposed algorithm uses a set of known interacting pairs to compute a temporal transformation between every two datasets. Using this temporal transformation we search for new interacting pairs. As we show, our method achieves a much lower false-positive rate compared to previous methods that use time series expression data for pairwise regulatory relationship discovery. Some of the new predictions made by our method can be verified using other high throughput data sources and functional annotation databases. AVAILABILITY: Matlab implementation is available from the supporting website: http://www.cs.cmu.edu/~yanxins/regulation_inference/index.html.  相似文献   

7.
MOTIVATION: Liquid chromatography coupled to mass spectrometry (LC-MS) and combined with tandem mass spectrometry (LC-MS/MS) have become a prominent tool for the analysis of complex proteomic samples. An important step in a typical workflow is the combination of results from multiple LC-MS experiments to improve confidence in the obtained measurements or to compare results from different samples. To do so, a suitable mapping or alignment between the data sets needs to be estimated. The alignment has to correct for variations in mass and elution time which are present in all mass spectrometry experiments. RESULTS: We propose a novel algorithm to align LC-MS samples and to match corresponding ion species across samples. Our algorithm matches landmark signals between two data sets using a geometric technique based on pose clustering. Variations in mass and retention time are corrected by an affine dewarping function estimated from matched landmarks. We use the pairwise dewarping in an algorithm for aligning multiple samples. We show that our pose clustering approach is fast and reliable as compared to previous approaches. It is robust in the presence of noise and able to accurately align samples with only few common ion species. In addition, we can easily handle different kinds of LC-MS data and adopt our algorithm to new mass spectrometry technologies. AVAILABILITY: This algorithm is implemented as part of the OpenMS software library for shotgun proteomics and available under the Lesser GNU Public License (LGPL) at www.openms.de.  相似文献   

8.
An application tool for alignment, template matching and visualization of gene expression time series is presented. The core algorithm is based on dynamic time warping techniques used in the speech recognition field. These techniques allow for non-linear (elastic) alignment of temporal sequences of feature vectors and consequently enable detection of similar shapes with different phases. AVAILABILITY: The Java program, examples and a tutorial are available at http://www.psb.ugent.be/cbd/papers/gentxwarper/  相似文献   

9.
Basic Local Alignment Search Tool (BLAST) is a popular tool used for determining the patterns in genomic sequences. The algorithm of BLAST has gone for various changes from time to time. One third of the time is taken by BLAST to perform the gapped analysis on the sequences. An efficient algorithm has been presented that employs a new approach for curtailing the amount of sequences that proceed for gapped alignment. So this method will work after the ungapped alignment process is over. This works because of the fact that it is not necessary to perform gapped alignment for all the sequences that are coming from ungapped analysis. There is a significant increase in speed of the alignment process without compromising on the sensitivity of the result.  相似文献   

10.
A "Long Indel" model for evolutionary sequence alignment   总被引:7,自引:0,他引:7  
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.  相似文献   

11.
A popular commercially available oligonucleotide microarray technology employs sets of 25 base pair oligonucleotide probes for measurement of gene expression levels. A mathematical algorithm is required to compute an estimate of gene expression from the multiple probes. Previously proposed methods for summarizing gene expression data have either been substantially ad hoc or have relied on model assumptions that may be easily violated. Here we present a new algorithm for calculating gene expression from probe sets. Our approach is functionally related to leave-one-out cross-validation, a non-parametric statistical technique that is often applied in limited data situations. We illustrate this approach using data from our study seeking a molecular fingerprint of STAT3 regulated genes for early detection of human cancer.  相似文献   

12.
13.
MOTIVATION: Multiple sequence alignment is an important tool in computational biology. In order to solve the task of computing multiple alignments in affordable time, the most commonly used multiple alignment methods have to use heuristics. Nevertheless, the computation of optimal multiple alignments is important in its own right, and it provides a means of evaluating heuristic approaches or serves as a subprocedure of heuristic alignment methods. RESULTS: We present an algorithm that uses the divide-and-conquer alignment approach together with recent results on search space reduction to speed up the computation of multiple sequence alignments. The method is adaptive in that depending on the time one wants to spend on the alignment, a better, up to optimal alignment can be obtained. To speed up the computation in the optimal alignment step, we apply the alpha(*) algorithm which leads to a procedure provably more efficient than previous exact algorithms. We also describe our implementation of the algorithm and present results showing the effectiveness and limitations of the procedure.  相似文献   

14.
15.
Metabolic network alignment is a system scale comparative analysis that discovers important similarities and differences across different metabolisms and organisms. Although the problem of aligning metabolic networks has been considered in the past, the computational complexity of the existing solutions has so far limited their use to moderately sized networks. In this paper, we address the problem of aligning two metabolic networks, particularly when both of them are too large to be dealt with using existing methods. We develop a generic framework that can significantly improve the scale of the networks that can be aligned in practical time. Our framework has three major phases, namely the compression phase, the alignment phase and the refinement phase. For the first phase, we develop an algorithm which transforms the given networks to a compressed domain where they are summarized using fewer nodes, termed supernodes, and interactions. In the second phase, we carry out the alignment in the compressed domain using an existing network alignment method as our base algorithm. This alignment results in supernode mappings in the compressed domain, each of which are smaller instances of network alignment problem. In the third phase, we solve each of the instances using the base alignment algorithm to refine the alignment results. We provide a user defined parameter to control the number of compression levels which generally determines the tradeoff between the quality of the alignment versus how fast the algorithm runs. Our experiments on the networks from KEGG pathway database demonstrate that the compression method we propose reduces the sizes of metabolic networks by almost half at each compression level which provides an expected speedup of more than an order of magnitude. We also observe that the alignments obtained by only one level of compression capture the original alignment results with high accuracy. Together, these suggest that our framework results in alignments that are comparable to existing algorithms and can do this with practical resource utilization for large scale networks that existing algorithms could not handle. As an example of our method's performance in practice, the alignment of organism-wide metabolic networks of human (1615 reactions) and mouse (1600 reactions) was performed under three minutes by only using a single level of compression.  相似文献   

16.
MOTIVATION: Time series experiments of cDNA microarrays have been commonly used in various biological studies and conducted under a lot of experimental factors. A popular approach of time series microarray analysis is to compare one gene with another in their expression profiles, and clustering expression sequences is a typical example. On the other hand, a practically important issue in gene expression is to identify the general timing difference that is caused by experimental factors. This type of difference can be extracted by comparing a set of time series expression profiles under a factor with those under another factor, and so it would be difficult to tackle this issue by using only a current approach for time series microarray analysis. RESULTS: We have developed a systematic method to capture the timing difference in gene expression under different experimental factors, based on hidden Markov models. Our model outputs a real-valued vector at each state and has a unique state transition diagram. The parameters of our model are trained from a given set of pairwise (generally multiplewise) expression sequences. We evaluated our model using synthetic as well as real microarray datasets. The results of our experiment indicate that our method worked favourably to identify the timing ordering under different experimental factors, such as that gene expression under heat shock tended to start earlier than that under oxidative stress. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

17.
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of–the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction.  相似文献   

18.
The need for the temporal alignment of gait cycle data is well known; however, there is little consensus concerning which alignment method to use. In this paper, we discuss the pros and cons of some methods commonly applied to temporally align gait cycle data (normalization to percent gait cycle, dynamic time warping, derivative dynamic time warping, and piecewise alignment methods). In addition, we empirically evaluate these different methods' abilities to produce successful temporal alignment when mapping a test gait cycle trajectory to a target trajectory. We demonstrate that piecewise temporal alignment techniques outperform other commonly used alignment methods (normalization to percent gait cycle, dynamic time warping, and derivative dynamic time warping) in typical biomechanical and clinical alignment tasks. Lastly, we present an example of how these piecewise alignment techniques make it possible to separately examine intensity and temporal differences between gait cycle data throughout the entire gait cycle, which can provide greater insight into the complexities of movement patterns.  相似文献   

19.
Alignment of the individual images of a tilt series is a critical step in obtaining high-quality electron microscope reconstructions. We report on general methods for producing good alignments, and utilizing the alignment data in subsequent reconstruction steps. Our alignment techniques utilize bundle adjustment. Bundle adjustment is the simultaneous calculation of the position of distinguished markers in the object space and the transforms of these markers to their positions in the observed images, along the bundle of particle trajectories along which the object is projected to each EM image. Bundle adjustment techniques are general enough to encompass the computation of linear, projective or nonlinear transforms for backprojection, and can compensate for curvilinear trajectories through the object, sample warping, and optical aberration. We will also report on new reconstruction codes and describe our results using these codes.  相似文献   

20.
Since traditional multiple alignment formulations are NP-hard, heuristics are commonly employed to find acceptable alignments with no guaranteed performance bound. This causes a substantial difficulty in understanding what the resulting alignment means and in assessing the quality of these alignments. We propose an alternative formulation of multiple alignment based on the idea of finding a multiple alignment of k sequences which preserves k - 1 pairwise alignments as specified by edges of a given tree. Although it is well known that such a preserving alignment always exists, it did not become a mainstream method for multiple alignment since it seems that a lot of information is lost from ignoring pairwise similarities outside the tree. In contrast, by using pairwise alignments that incorporate consistency information from other sequences, we show that it is possible to obtain very good accuracy with the preserving alignment formulation. We show that a reasonable objective function to use is to find the shortest preserving alignment, and, by a reduction to a graph-theoretic problem, that the problem of finding the shortest preserving multiple alignment can be solved in polynomial time. We demonstrate the success of this approach on three sets of benchmark multiple alignments by using consistency-based pairwise alignments from the first stage of two of the best performing progressive alignment algorithms TCoffee and ProbCons and replace the second heuristic progressive step of these algorithms by the exact preserving alignment step. We apply this strategy to TCoffee and show that our approach outperforms TCoffee on two of the three test sets. We apply the strategy to a variant of ProbCons with no iterative refinements and show that our approach achieves similar or better accuracy except on one test set. We also compare our performance to ProbCons with iterative refinements and show that our approach achieves similar or better accuracy on many subcategories even without further refinements. The most important advantage of the preserving alignment formulation is that we are certain that we can solve the problem in polynomial time without using a heuristic. A software program implementing this approach (PSAlign) is available at http://faculty.cs.tamu.edu/shsze/psalign.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号