Similar literature
20 similar records found.
1.
This paper investigates the utility of the Lomb–Scargle periodogram for the analysis of biological rhythms. This method is particularly suited to detecting periodic components in unequally sampled time series and data sets with missing values, and it restricts all calculations to actually measured values. The Lomb–Scargle method was tested on both real and simulated time series with even and uneven sampling, and compared with a standard method in biomedical rhythm research, the chi-square periodogram. Results indicate that the Lomb–Scargle algorithm shows clearly better detection efficiency and accuracy in the presence of noise, and avoids the bias or erroneous results that may arise from replacing missing data by interpolation. Hence, the Lomb–Scargle periodogram may serve as a useful method for the study of biological rhythms, especially when applied to telemetric or observational time series obtained from free-living animals, i.e., data sets that notoriously lack points.
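To make the "only measured values" point concrete, here is a minimal Python sketch of the classical Lomb-Scargle periodogram (the Lomb 1976 / Scargle 1982 form with the time offset tau), applied to a hypothetical 24-h rhythm sampled at irregular times. The function name, sampling scheme, noise level and period grid are illustrative assumptions; this is not the implementation used in the paper.

```python
import math
import random


def lomb_scargle(times, values, periods):
    """Classical normalized Lomb-Scargle periodogram.

    Only the measured (time, value) pairs are used; nothing is interpolated.
    Returns one power value per trial period.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    centered = [x - mean for x in values]
    powers = []
    for period in periods:
        w = 2.0 * math.pi / period  # angular frequency
        # The offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(y * c for y, c in zip(centered, cos_terms))
        ys = sum(y * s for y, s in zip(centered, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        powers.append((yc * yc / cc + ys * ys / ss) / (2 * var))
    return powers


# Hypothetical example: a 24-h rhythm observed at 200 irregular times over 10 days.
rng = random.Random(1)
times = sorted(rng.uniform(0, 240) for _ in range(200))
values = [math.cos(2 * math.pi * t / 24) + rng.gauss(0, 0.2) for t in times]
periods = [10 + 0.1 * k for k in range(301)]  # 10 h to 40 h
powers = lomb_scargle(times, values, periods)
best_period = periods[powers.index(max(powers))]
print(f"strongest period: {best_period:.1f} h")
```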

2.
The classical power spectrum, computed in the frequency domain, outperforms traditionally used periodograms derived in the time domain (such as the χ² periodogram) in the search for biological rhythms. Unfortunately, classical power spectral analysis is not possible with unequally spaced data (e.g., time series with missing data). The Lomb-Scargle periodogram fixes this shortcoming. However, peak detection in the Lomb-Scargle periodogram of unequally spaced data requires careful consideration. To guide researchers in the proper evaluation of detected peaks, a novel procedure and a computer program have recently become available. It is recommended that the Lomb-Scargle periodogram be the default method of periodogram analysis in future biomedical applications of rhythm investigation.
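Peak evaluation is where care is needed. One widely quoted rule for the variance-normalized Lomb-Scargle power, valid under Gaussian noise, gives the false-alarm probability of the highest peak z among M independent frequencies as 1 - (1 - e^(-z))^M. The sketch below assumes M is already known; estimating M for unequally spaced data is the delicate part, and this is not the specific procedure or program the abstract refers to.

```python
import math


def false_alarm_probability(z, m):
    """Probability that pure Gaussian noise yields a peak >= z somewhere
    among m independent frequencies (variance-normalized power)."""
    return 1.0 - (1.0 - math.exp(-z)) ** m


def power_threshold(p, m):
    """Power a peak must exceed to reach false-alarm probability p."""
    return -math.log(1.0 - (1.0 - p) ** (1.0 / m))


m = 100  # assumed number of independent frequencies
z_crit = power_threshold(0.01, m)
print(f"threshold for p = 0.01 with M = {m}: {z_crit:.2f}")
print(false_alarm_probability(z_crit, m))
```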

3.
MOTIVATION: Periodic patterns in time series resulting from biological experiments are of great interest. The commonly used Fast Fourier Transform (FFT) algorithm is applicable only when data are evenly spaced and when no values are missing, which is not always the case in high-throughput measurements. The choice of statistic to evaluate the significance of the periodic patterns for unevenly spaced gene expression time series has not been well substantiated. METHODS: The Lomb-Scargle periodogram approach is used to search time series of gene expression to quantify the periodic behavior of every gene represented on the DNA array. The Lomb-Scargle periodogram analysis provides a direct method to treat missing values and unevenly spaced time points. We propose the combination of a Lomb-Scargle test statistic for periodicity and a multiple hypothesis testing procedure with controlled false discovery rate to detect significant periodic gene expression patterns. RESULTS: We analyzed the Plasmodium falciparum gene expression dataset. In the Quality Control Dataset of 5080 expression patterns, we found 4112 periodic probes. In addition, we identified 243 probes with periodic expression in the Complete Dataset, which could not be examined in the original study by the FFT analysis due to an excessive number of missing values. While most periodic genes had a period of 48 h, some had a period close to 24 h. Our approach should be applicable for detection and quantification of periodic patterns in any unevenly spaced gene expression time-series data.
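The multiple-testing step can be illustrated with the Benjamini-Hochberg step-up procedure, a standard way to control the false discovery rate. This is a minimal sketch; the abstract does not say which FDR procedure the authors used, and the p-values below are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    at false discovery rate q (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis up to and including that rank.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical periodicity p-values for ten probes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```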

4.
Periodogram analysis of unequally spaced time series, as part of many biological rhythm investigations, is complicated. The mathematical framework is scattered over the literature, and the interpretation of results is often debatable. In this paper, we show that the Lomb-Scargle method is the appropriate tool for periodogram analysis of unequally spaced data. A unique procedure of multiple period searching is derived, facilitating the assessment of the various rhythms that may be present in a time series. All relevant mathematical and statistical aspects are considered in detail, and much attention is given to the correct interpretation of results. The use of the procedure is illustrated by examples, and problems that may be encountered are discussed. It is argued that, when following the procedure of multiple period searching, we can even benefit from the unequal spacing of a time series in biological rhythm research.
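A common generic way to search for several periods is prewhitening: find the strongest period, subtract the fitted sinusoid, and search the residual again. The sketch below uses a floating-mean least-squares sinusoid fit at each trial period (closely related to, but not identical with, the classical Lomb-Scargle periodogram) and omits any significance assessment; it is an assumed illustration, not the authors' derived procedure. The two-component test signal is hypothetical.

```python
import math
import random


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_sinusoid(times, values, period):
    """Least-squares fit of c0 + c1*cos + c2*sin at one trial period.
    Returns (fitted values, explained sum of squares)."""
    w = 2 * math.pi / period
    basis = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, values)) for i in range(3)]
    coef = solve3(ata, aty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in basis]
    mean = sum(values) / len(values)
    explained = sum((f - mean) ** 2 for f in fitted)
    return fitted, explained


def multiple_period_search(times, values, periods, n_components):
    residual = list(values)
    found = []
    for _ in range(n_components):
        scores = [fit_sinusoid(times, residual, p)[1] for p in periods]
        best = periods[scores.index(max(scores))]
        fitted, _ = fit_sinusoid(times, residual, best)
        residual = [r - f for r, f in zip(residual, fitted)]  # prewhiten
        found.append(best)
    return found


# Hypothetical signal: 24-h and 12-h components at 300 irregular times over 20 days.
rng = random.Random(7)
times = sorted(rng.uniform(0, 480) for _ in range(300))
values = [2 * math.cos(2 * math.pi * t / 24) + math.cos(2 * math.pi * t / 12 + 1.0)
          for t in times]
periods = [8 + 0.25 * k for k in range(97)]  # 8 h to 32 h
found = multiple_period_search(times, values, periods, n_components=2)
print(found)
```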

5.
The Lomb-Scargle periodogram was introduced in astrophysics to detect sinusoidal signals in noisy, unevenly sampled time series. It proved to be a powerful tool in time series analysis and has recently been adopted in the biomedical sciences. Its use is motivated by the need to handle non-uniformly sampled data, which are common because of the restricted and irregular observation of, for instance, free-living animals. However, observational data often contain fractions of non-Gaussian noise or may consist of periodic signals with non-sinusoidal shapes. These properties can make the interpretation of Lomb-Scargle periodograms more difficult and can lead to misleading estimates. In this letter we illustrate these difficulties for noise-free bimodal rhythms and sinusoidal signals with outliers. The examples are intended to highlight limitations and to complement the recent discussion on Lomb-Scargle periodograms.
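The bimodal-rhythm limitation is easy to reproduce. In this hypothetical, noise-free 24-h signal with two unequally spaced activity bouts per day, most of the variance sits in the 12-h harmonic, so the highest Lomb-Scargle peak appears near 12 h rather than at the true 24-h period. The signal shape, sampling and period grid are assumptions for illustration.

```python
import math
import random


def lomb_scargle_power(times, values, period):
    """Normalized classical Lomb-Scargle power at a single trial period."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    w = 2 * math.pi / period
    tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                     sum(math.cos(2 * w * t) for t in times)) / (2 * w)
    yc = sum((x - mean) * math.cos(w * (t - tau)) for t, x in zip(times, values))
    ys = sum((x - mean) * math.sin(w * (t - tau)) for t, x in zip(times, values))
    cc = sum(math.cos(w * (t - tau)) ** 2 for t in times)
    ss = sum(math.sin(w * (t - tau)) ** 2 for t in times)
    return (yc ** 2 / cc + ys ** 2 / ss) / (2 * var)


def bimodal_activity(t):
    """Hypothetical noise-free 24-h rhythm: equal activity bouts at 06:00 and 16:00."""
    h = t % 24
    return math.exp(-((h - 6) ** 2) / 2) + math.exp(-((h - 16) ** 2) / 2)


rng = random.Random(3)
times = sorted(rng.uniform(0, 240) for _ in range(200))
values = [bimodal_activity(t) for t in times]
periods = [10 + 0.1 * k for k in range(201)]  # 10 h to 30 h
powers = [lomb_scargle_power(times, values, p) for p in periods]
best = periods[powers.index(max(powers))]
print(f"highest peak at {best:.1f} h; true period is 24 h")
```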

8.
The chi-square periodogram (CSP), developed over 40 years ago, continues to be one of the most popular methods to estimate the period of circadian (circa 24-h) rhythms. Previous work has indicated the CSP is sometimes less accurate than other methods, but understanding of why and under what conditions remains incomplete. Using simulated rhythmic time-courses, we found that the CSP is prone to underestimating the period in a manner that depends on the true period and the length of the time-course. This underestimation bias is most severe in short time-courses (e.g., 3 days), but is also visible in longer simulated time-courses (e.g., 12 days) and in experimental time-courses of mouse wheel-running and ex vivo bioluminescence. We traced the source of the bias to discontinuities in the periodogram that are related to the number of time-points the CSP uses to calculate the observed variance for a given test period. By revising the calculation to avoid discontinuities, we developed a new version, the greedy CSP, that shows reduced bias and improved accuracy. Nonetheless, even the greedy CSP tended to be less accurate on our simulated time-courses than an alternative method, namely the Lomb-Scargle periodogram. Thus, although our study describes a major improvement to a classic method, it also suggests that users should generally avoid the CSP when estimating the period of biological rhythms.
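For reference, the chi-square periodogram folds the series at each integer trial period P (in samples), forming K complete cycles of N = K·P points, and computes Q_P = K·N·Σ_h (M_h - M)² / Σ_i (X_i - M)², where M_h are the P column means and M is the overall mean; Q_P is compared with a χ² distribution with P - 1 degrees of freedom. The sketch below is a minimal Python version of that common (Sokolove-Bushell) form, not the greedy CSP from the abstract. Because only complete cycles are used, N jumps as P changes, which is the kind of point-count dependence the abstract links to discontinuities. The square-wave example is hypothetical.

```python
def chi_square_periodogram(x, period):
    """Sokolove-Bushell Q_P statistic for an integer trial period (in samples).

    Only the first K*P points (K complete cycles) are used, so the number of
    points N entering the statistic changes with the trial period.
    """
    k = len(x) // period  # number of complete cycles
    n = k * period
    data = x[:n]
    mean = sum(data) / n
    # Column means of the K x P "folded" matrix.
    profile = [sum(data[h::period]) / k for h in range(period)]
    between = sum((m_h - mean) ** 2 for m_h in profile)
    total = sum((v - mean) ** 2 for v in data)
    return k * n * between / total


# Hypothetical square wave with a true period of 8 samples, 100 samples long.
x = [1.0 if i % 8 < 4 else 0.0 for i in range(100)]
for p in range(5, 13):
    print(p, round(chi_square_periodogram(x, p), 1))
```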

10.
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.

11.
Gaussian mixture clustering and imputation of microarray data
MOTIVATION: In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. RESULTS: Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
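The two evaluation metrics can be sketched as follows. RMSE is computed over the imputed entries only; for the mis-clustering count, cluster labels are arbitrary, so the sketch takes the minimum number of disagreements over all one-to-one relabelings (brute force, feasible only for a handful of clusters). This matching rule is an assumption about how such a count is usually made, not necessarily the paper's exact definition.

```python
import itertools
import math


def rmse(true_values, imputed_values):
    """Root mean squared error over the entries that were imputed."""
    n = len(true_values)
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n)


def misclustered(labels_true, labels_imputed):
    """Number of genes whose cluster changes after imputation, minimized over
    all one-to-one relabelings (cluster numbers themselves are arbitrary)."""
    clusters = sorted(set(labels_true) | set(labels_imputed))
    best = len(labels_true)
    for perm in itertools.permutations(clusters):
        mapping = dict(zip(clusters, perm))
        errors = sum(mapping[a] != b for a, b in zip(labels_true, labels_imputed))
        best = min(best, errors)
    return best


# Hypothetical example: five genes clustered before and after imputation.
print(rmse([1.0, 2.0], [1.0, 4.0]))
print(misclustered([0, 0, 1, 1, 2], [1, 1, 0, 0, 0]))
```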

12.
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of poorly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling them as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimate errors using the EMimpute methods are compared with those our novel methods produce. The results indicate that, on average, the estimates from our best-performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
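The least-squares idea behind gene-based imputation can be sketched as a set of simple linear regressions: predict the missing entry from each of the k complete genes most correlated with the target gene, then average the predictions. The |r| weighting and the function names are hypothetical simplifications; the published LSimpute variants use their own weighting and also exploit correlations between arrays, which this sketch omits. It also assumes no complete gene is constant over the observed columns.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def regression_impute(matrix, row, col, k=2):
    """Estimate matrix[row][col] (None = missing) from the k complete genes most
    correlated with the target gene: one simple linear regression per gene,
    averaged with |r| weights (a simplified, hypothetical variant)."""
    target = matrix[row]
    obs_cols = [j for j, v in enumerate(target) if v is not None]
    candidates = []
    for i, gene in enumerate(matrix):
        if i == row or any(v is None for v in gene):
            continue
        x = [gene[j] for j in obs_cols]
        y = [target[j] for j in obs_cols]
        r = pearson(x, y)
        mx, my = sum(x) / len(x), sum(y) / len(y)
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
        prediction = my + slope * (gene[col] - mx)
        candidates.append((abs(r), prediction))
    candidates.sort(reverse=True)  # most correlated genes first
    top = candidates[:k]
    return sum(w * p for w, p in top) / sum(w for w, _ in top)


genes = [
    [1.0, 2.0, None, 4.0, 5.0],   # target gene; entry 2 is missing
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [10.0, 9.0, 8.0, 7.0, 6.0],
    [1.0, 5.0, 2.0, 4.0, 3.0],
]
estimate = regression_impute(genes, row=0, col=2, k=2)
print(estimate)
```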

13.
The moving window principle applied to the chi-square periodogram allows a comprehensive study of biological time series through successive local examinations. This method reveals several cases of transition linked to environmental or physiological changes. Furthermore, we applied the Grassberger and Procaccia method (1983) to analyze more complex transition problems. The method helps to detect chaotic properties in behavioral activity rhythms.
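The Grassberger-Procaccia method rests on the correlation sum C(r), the fraction of pairs of delay-embedded points closer than r; the correlation dimension is estimated from the slope of log C(r) against log r. The minimal sketch below computes C(r) by brute force (O(n²) pairs); the embedding dimension, lag and logistic-map test series are illustrative assumptions.

```python
import math


def delay_embed(series, dim, lag):
    """Time-delay embedding: vectors (x_t, x_{t+lag}, ..., x_{t+(dim-1)lag})."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[t + k * lag] for k in range(dim)) for t in range(n)]


def correlation_sum(points, r):
    """Grassberger-Procaccia correlation sum C(r): fraction of distinct pairs
    of embedded points closer than r (Euclidean distance)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


# Hypothetical test series: the chaotic logistic map x -> 4x(1 - x).
series = [0.3]
for _ in range(299):
    series.append(4.0 * series[-1] * (1.0 - series[-1]))
points = delay_embed(series, dim=2, lag=1)
radii = [0.01, 0.02, 0.05, 0.1]
sums = [correlation_sum(points, r) for r in radii]
print(sums)
```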

15.
Missing value estimation methods for DNA microarrays
MOTIVATION: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. RESULTS: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
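A minimal sketch of the weighted k-nearest-neighbour idea: for each missing entry, find the k genes closest to the target gene (Euclidean distance over the columns both have observed) that do have a value in that column, and average those values with weights 1/distance. Scaling the distance by the number of shared columns, copying the value of an identical neighbour, and the toy matrix are assumptions of this sketch, not details taken from the paper; it also assumes at least one usable neighbour exists.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries from the k nearest genes that have the value present,
    weighting each neighbour by 1/distance (KNNimpute-style sketch)."""
    filled = [row[:] for row in matrix]
    for i, gene in enumerate(matrix):
        for col, value in enumerate(gene):
            if value is not None:
                continue
            neighbours = []
            for j, other in enumerate(matrix):
                if j == i or other[col] is None:
                    continue
                shared = [c for c in range(len(gene))
                          if c != col and gene[c] is not None and other[c] is not None]
                if not shared:
                    continue
                # Euclidean distance over shared columns, scaled by their count.
                d = math.sqrt(sum((gene[c] - other[c]) ** 2 for c in shared) / len(shared))
                neighbours.append((d, other[col]))
            neighbours.sort()
            top = neighbours[:k]
            if top[0][0] == 0.0:  # an identical neighbour: copy its value
                filled[i][col] = top[0][1]
            else:
                weights = [1.0 / d for d, _ in top]
                filled[i][col] = sum(w * v for w, (_, v) in zip(weights, top)) / sum(weights)
    return filled


# Hypothetical 4-gene x 3-array matrix with one missing entry.
data = [[1.0, 2.0, None], [1.0, 2.1, 3.0], [1.1, 2.0, 5.0], [9.0, 9.0, 9.0]]
filled = knn_impute(data, k=2)
print(filled[0])
```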

16.
Treatment-related changes in neurobiological rhythms are of increasing interest to psychologists, psychiatrists, and biological rhythms researchers. New methods for analyzing change in rhythms are needed, as most common methods disregard the rich complexity of biological processes. Large time series data sets reflect the intricacies of underlying neurobiological processes, but can be difficult to analyze. We propose the use of Fourier methods with multivariate permutation test (MPT) methods for analyzing change in rhythms from time series data. To validate the use of MPT for Fourier-transformed data, we performed Monte Carlo simulations and compared statistical power and family-wise error for MPT with Bonferroni-corrected and uncorrected methods. Results show that MPT provides greater statistical power than Bonferroni-corrected tests, while appropriately controlling family-wise error. We applied this method to serially sampled human neurotransmitter data collected before and after treatment to confirm its utility with real data. Together, Fourier methods combined with MPT provide a statistically powerful approach for detecting change in biological rhythms from time series data.
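One common way to build a multivariate permutation test with family-wise error control is the single-step max-T approach. For paired pre/post data, the sketch below flips the sign of each subject's differences in all 2^n ways and uses the maximum |t| across variables (e.g. Fourier amplitudes at several frequencies) as the null distribution. Exhaustive enumeration is only feasible for small n, and both the statistic and the data are illustrative assumptions rather than the authors' exact method.

```python
import itertools
import math
import statistics


def paired_t(diffs):
    """One-sample t statistic of paired differences."""
    sd = statistics.stdev(diffs)
    mean = statistics.fmean(diffs)
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def max_t_permutation_test(diffs_by_variable):
    """Single-step max-T sign-flip test; diffs_by_variable[v][s] = post - pre
    for variable v and subject s. Returns FWER-adjusted p-values."""
    n_subjects = len(diffs_by_variable[0])
    observed = [abs(paired_t(d)) for d in diffs_by_variable]
    null_max = []
    for signs in itertools.product((1, -1), repeat=n_subjects):
        null_max.append(max(abs(paired_t([s * x for s, x in zip(signs, d)]))
                            for d in diffs_by_variable))
    total = len(null_max)
    # Small relative tolerance so ties with the observed value are counted.
    return [sum(m >= obs * (1 - 1e-9) for m in null_max) / total for obs in observed]


# Hypothetical data: 8 subjects, one variable with a clear shift, one without.
diffs = [
    [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.05, 4.95],
    [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.05, -0.3],
]
p_adjusted = max_t_permutation_test(diffs)
print(p_adjusted)
```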

17.
We present a novel approach for analyzing biological time-series data using a context-free language (CFL) representation that allows the extraction and quantification of important features from the time series. This representation results in Hierarchically AdaPtive (HAP) analysis, a suite of multiple complementary techniques that enables rapid analysis of data and does not require the user to set parameters. HAP analysis generates hierarchically organized parameter distributions that allow multi-scale components of the time series to be quantified, and includes a data analysis pipeline that applies recursive analyses to generate hierarchically organized results that extend traditional outcome measures such as pharmacokinetics and inter-pulse interval. Pulsicons, a novel text-based time-series representation also derived from the CFL approach, are introduced as an objective qualitative comparison nomenclature. We apply HAP to the analysis of 24 hours of frequently sampled pulsatile cortisol hormone data from 14 healthy women, data that pose known analysis challenges. HAP analysis generated results in seconds and produced dozens of figures for each participant. The results quantify the observed qualitative features of cortisol data as a series of pulse clusters, each consisting of one or more embedded pulses, and identify two ultradian phenotypes in this dataset. HAP analysis is designed to be robust to individual differences and to missing data and may be applied to other pulsatile hormones. Future work can extend HAP analysis to other time-series data types, including oscillatory and other periodic physiological signals.

18.
Two-dimensional SDS-PAGE gel electrophoresis using post-run staining is widely used to measure the abundances of thousands of protein spots simultaneously. Usually, the protein abundances of two or more biological groups are compared using biological and technical replicates. After gel separation and staining, the spots are detected, spot volumes are quantified, and spots are matched across gels. There are almost always many missing values in the resulting data set. The missing values arise either because the corresponding proteins have very low abundances (or are absent) or because of experimental errors such as incomplete or over-focusing in the first dimension or varying run times in the second dimension, as well as faulty spot detection and matching. In this study, we show that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume. Furthermore, we present an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and measured volumes of the present spots using a maximum likelihood approach. Confidence intervals for abundances and p-values for differential expression between two groups are calculated using bootstrap sampling. The algorithm is compared to two standard approaches, one that discards missing values and one that sets all missing values to zero. We have evaluated this approach on two gel data sets of different biological origin. An R program implementing the algorithm is freely available at http://bioinfo.thep.lu.se/MissingValues2Dgels.html.
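The logistic missingness model, and why the two standard approaches are biased under it, can be sketched with simulated data: when faint spots are more likely to be missing, discarding missing values inflates the mean log volume, while setting them to zero deflates it. The parameters a and b and the normal distribution of log volumes are hypothetical; the maximum likelihood estimator from the abstract is not reproduced here.

```python
import math
import random
import statistics


def p_missing(log_volume, a, b):
    """Logistic model: probability that a spot is missing given its log volume.
    With b > 0, faint spots (small log volume) are more likely to be missing."""
    return 1.0 / (1.0 + math.exp(b * (log_volume - a)))


rng = random.Random(0)
a, b = 5.0, 2.0  # hypothetical parameters: 50% missing at log volume 5
true_log_volumes = [rng.gauss(5.0, 1.0) for _ in range(10000)]
observed = [v if rng.random() >= p_missing(v, a, b) else None for v in true_log_volumes]

true_mean = statistics.fmean(true_log_volumes)
present = [v for v in observed if v is not None]
discard_mean = statistics.fmean(present)  # approach 1: drop missing spots
zero_mean = statistics.fmean([0.0 if v is None else v for v in observed])  # approach 2
print(f"true mean {true_mean:.2f}, discarding missing {discard_mean:.2f}, "
      f"zero-filling {zero_mean:.2f}")
```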

19.
Gan X, Liew AW, Yan H. Nucleic Acids Research, 2006, 34(5): 1608-1619.
Gene expression data measured using microarrays usually suffer from the missing value problem. However, many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms have shown good performance, they also have limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take into account any biological constraints in their imputation. In this paper, we propose a set-theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon, and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
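The POCS mechanics can be sketched with three simple convex sets: vectors that agree with the measured entries, vectors whose entries sum to a known total, and vectors inside a box [lo, hi]. Cycling through the projections converges to a point in their intersection when that intersection is non-empty (not necessarily the point closest to the start). These three sets are hypothetical stand-ins chosen because their projections are one-liners; the paper's sets encode local gene correlation, global array correlation and synchronization loss.

```python
def project_observed(x, observed):
    """Set 1: vectors that agree with the measured entries (an affine set)."""
    return [observed.get(i, v) for i, v in enumerate(x)]


def project_sum(x, total):
    """Set 2: vectors whose entries sum to a known total (a hyperplane)."""
    shift = (total - sum(x)) / len(x)
    return [v + shift for v in x]


def project_box(x, lo, hi):
    """Set 3: vectors whose entries lie in [lo, hi] (a box)."""
    return [min(max(v, lo), hi) for v in x]


def pocs_impute(n, observed, total, lo, hi, iterations=200):
    """Cyclic projections onto the three sets, starting from all zeros."""
    x = [0.0] * n
    for _ in range(iterations):
        x = project_observed(x, observed)
        x = project_sum(x, total)
        x = project_box(x, lo, hi)
    return x


# Hypothetical profile of 5 values: entries 1 and 3 are missing.
observed = {0: 2.0, 2: 3.0, 4: 1.0}
x = pocs_impute(5, observed, total=10.0, lo=0.0, hi=5.0)
print([round(v, 4) for v in x])
```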

20.
We describe WinCD, a program for extracting quantitative information about periodicity in time-series data using the method of complex demodulation (CD). The method is particularly well suited for the analysis of the effects of variables that may produce changes in biological rhythms, such as sleep deprivation, adaptation to changes in work schedules, time zone displacements, and various sorts of pathology. WinCD enables exploratory analysis of time series data by providing graphical displays of raw and processed time series, as well as numerous options for viewing and saving quantitative data. We describe WinCD operations and examples of the use of the program.
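Complex demodulation can be sketched in a few lines: multiply the series by e^(-2πit/P) to shift the component at period P to zero frequency, then low-pass filter; twice the magnitude of the result tracks the local amplitude and its argument tracks the local phase. Here the low-pass filter is a moving average spanning exactly one period (P > 2 samples), which cancels both the mean and the component at twice the frequency; WinCD's actual filter options are not described in the abstract, so this choice is an assumption.

```python
import cmath
import math


def complex_demodulate(x, period):
    """Complex demodulation at a known integer period (in samples, > 2).

    Returns (amplitude, phase) for every full one-period window."""
    # Shift the component at frequency 1/period down to zero frequency.
    shifted = [v * cmath.exp(-2j * math.pi * t / period) for t, v in enumerate(x)]
    out = []
    for end in range(period - 1, len(x)):
        # Boxcar low-pass over exactly one period (trailing window).
        z = sum(shifted[end - period + 1:end + 1]) / period
        out.append((2 * abs(z), cmath.phase(z)))
    return out


# Hypothetical hourly series: amplitude 3, phase 0.5 rad, 24-h period, 10 days.
signal = [3.0 * math.cos(2 * math.pi * t / 24 + 0.5) for t in range(240)]
result = complex_demodulate(signal, 24)
amp, phase = result[0]
print(f"amplitude {amp:.3f}, phase {phase:.3f} rad")
```

Because the window is a full period long, the estimate lags changes in amplitude or phase by up to one period; a shorter or smoother filter trades that lag against leakage from the double-frequency term.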
