Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
The analysis of signals consisting of discrete and irregular data causes methodological problems for Fourier spectral analysis: since it is based on sinusoidal functions, rectangular signals with unequal periodicities cannot easily be replicated. Walsh spectral analysis is based on the so-called "Walsh functions", a complete set of orthonormal rectangular waves, and thus seems to be the method of choice for analysing signals consisting of binary or ordinal data. The paper compares Walsh spectral analysis and Fourier spectral analysis on the basis of simulated and real binary data sets of various lengths. Simulated data were derived from signals with defined cyclic patterns that were corrupted by randomly generated noise signals of the same length. The Walsh and Fourier spectra of each set were determined, and up to 25% of the periodogram coefficients were used as input for an inverse transform. The mean square approximation error (MSE) was calculated for each series in order to compare the goodness of fit between the original and the reconstructed signal. The same procedure was performed with real data derived from a behavioral observation in pigs. The comparison of the two methods revealed that, in the analysis of discrete and binary time series, Walsh spectral analysis is the more appropriate method if the time series is rather short. As the length of the signal increases, the difference between the two methods becomes less substantial.
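The comparison described above can be sketched numerically. The following is a minimal illustration (not the authors' code), assuming a Sylvester-ordered Walsh-Hadamard basis, magnitude-based truncation keeping 25% of the coefficients, and a toy binary square wave whose length is a power of two.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of the Walsh-Hadamard basis; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def truncated_mse(x, keep_frac=0.25):
    n = len(x)
    H = hadamard(n)
    # Walsh(-Hadamard) transform; H @ H = n * I, so H is its own inverse up to 1/n
    w = H @ x / n
    idx = np.argsort(np.abs(w))[::-1][: int(keep_frac * n)]
    w_trunc = np.zeros(n)
    w_trunc[idx] = w[idx]
    x_walsh = H @ w_trunc
    # Fourier counterpart: keep the largest-magnitude DFT coefficients
    # (truncation may break conjugate symmetry, hence the np.real)
    f = np.fft.fft(x) / n
    idxf = np.argsort(np.abs(f))[::-1][: int(keep_frac * n)]
    f_trunc = np.zeros(n, dtype=complex)
    f_trunc[idxf] = f[idxf]
    x_fourier = np.real(np.fft.ifft(f_trunc) * n)
    return np.mean((x - x_walsh) ** 2), np.mean((x - x_fourier) ** 2)

# Short binary square wave: the Walsh basis represents it with 2 coefficients
x = np.tile([1.0, 1.0, 0.0, 0.0], 8)  # length 32
mse_walsh, mse_fourier = truncated_mse(x)
```

For this strictly periodic toy signal the truncated Walsh reconstruction is exact; it is irregular binary signals of short length that separate the two methods more clearly, as the abstract reports.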

3.
Dynamic aspects of R-R intervals have often been analyzed by means of linear and nonlinear measures. The goal of this study was to analyze binary sequences, in which only the dynamic information is retained, by means of two different aspects of regularity. R-R interval sequences derived from 24-h electrocardiogram (ECG) recordings of 118 healthy subjects were converted to symbolic binary sequences that coded the beat-to-beat increase or decrease in the R-R interval. Shannon entropy was used to quantify the occurrence of short binary patterns (length N = 5) in binary sequences derived from 10-min intervals. The regularity of the short binary patterns was analyzed on the basis of approximate entropy (ApEn). ApEn had a linear dependence on mean R-R interval length, with increasing irregularity occurring at longer R-R interval length. Shannon entropy of the same sequences showed that the increase in irregularity is accompanied by a decrease in occurrence of some patterns. Taken together, these data indicate that irregular binary patterns are more probable when the mean R-R interval increases. The use of surrogate data confirmed a nonlinear component in the binary sequence. Analysis of two consecutive 24-h ECG recordings for each subject demonstrated good intraindividual reproducibility of the results. In conclusion, quantification of binary sequences derived from ECG recordings reveals properties that cannot be found using the full information of R-R interval sequences.
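The symbolization and pattern-entropy steps can be sketched as follows; this is a minimal illustration (ApEn is omitted for brevity), and the toy random-walk series merely stands in for real R-R interval data.

```python
import numpy as np
from collections import Counter

def binary_symbolize(rr):
    # 1 = beat-to-beat increase in the R-R interval, 0 = decrease (or no change)
    return (np.diff(rr) > 0).astype(int)

def pattern_entropy(bits, n=5):
    # Shannon entropy (in bits) of overlapping length-n binary patterns;
    # the maximum is n bits when all 2**n patterns are equally frequent
    words = [tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)]
    counts = Counter(words)
    p = np.array(list(counts.values()), dtype=float) / len(words)
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
rr = np.cumsum(rng.normal(0, 1, 2000)) + 800  # toy R-R series (ms), not real ECG
H = pattern_entropy(binary_symbolize(rr))
```

For this memoryless toy series the entropy is close to the 5-bit maximum; structured heart-rate dynamics would push it lower for some patterns.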

4.
There is currently great interest in chromosome- and pathway-based techniques for genomics data analysis, with the aim of understanding the mechanisms of disease. However, few studies have addressed the ability of machine learning methods to incorporate pathway information when analyzing microarray data. In this paper, we identified characteristic pathways by combining the out-of-bag (OOB) classification error rates of random forests with pathway information. Within each characteristic pathway, the correlation of gene expression was studied, and the co-regulated gene patterns in different biological conditions were mined by the Mining Attribute Profile (MAP) algorithm. The discovered co-regulated gene patterns were clustered by the average-linkage hierarchical clustering technique. The results showed that the expression levels of genes in the same characteristic pathway were similar. Furthermore, two characteristic pathways were discovered to present co-regulated gene patterns, one containing 108 patterns and the other containing a single pattern. The results of the cluster analysis showed that the smallest similarity coefficient among clusters was more than 0.623, which indicated that the co-regulated patterns in different biological conditions were more similar within the same characteristic pathway. The methods discussed in this paper can provide additional insight into the study of microarray data.

5.
Adjust quality scores from alignment and improve sequencing accuracy (cited by 2: 0 self-citations, 2 by others)
Li M, Nordborg M, Li LM. Nucleic Acids Research 2004, 32(17):5183-5191
In shotgun sequencing, statistical reconstruction of a consensus from alignment requires a model of measurement error. Churchill and Waterman proposed one such model and an expectation–maximization (EM) algorithm to estimate sequencing error rates for each assembly matrix. Ewing and Green defined Phred quality scores for base-calling from sequencing traces by training a model on a large amount of data. However, sample preparations and sequencing machines may work under different conditions in practice and therefore quality scores need to be adjusted. Moreover, the information given by quality scores is incomplete in the sense that they do not describe error patterns. We observe that each nucleotide base has its specific error pattern that varies across the range of quality values. We develop models of measurement error for shotgun sequencing by combining the two perspectives above. We propose a logistic model taking quality scores as covariates. The model is trained by a procedure combining an EM algorithm and model selection techniques. The training results in calibration of quality values and leads to a more accurate construction of consensus. Besides Phred scores obtained from ABI sequencers, we apply the same technique to calibrate quality values that come along with Beckman sequencers.
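The Phred definition (Q = -10 log10 of the error probability) and the idea of a logistic model with quality as covariate can be sketched as below. This is a simplified illustration on simulated calls, fitted by plain gradient descent; the paper's actual procedure combines an EM algorithm with model selection.

```python
import numpy as np

def phred_to_perr(q):
    # Phred definition: Q = -10 * log10(p_error)  =>  p_error = 10 ** (-Q / 10)
    return 10.0 ** (-np.asarray(q, dtype=float) / 10.0)

def fit_logistic(q, err, iters=5000, lr=0.1):
    # Recalibration model P(error | Q) = sigmoid(a + b * Q_standardized),
    # fitted by plain gradient descent on the log-likelihood
    q = np.asarray(q, float)
    err = np.asarray(err, float)
    qs = (q - q.mean()) / q.std()  # standardize for stable step sizes
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a + b * qs)))
        a -= lr * np.mean(p - err)
        b -= lr * np.mean((p - err) * qs)
    return a, b

rng = np.random.default_rng(1)
q = rng.integers(10, 41, size=4000)        # simulated quality scores
err = rng.random(4000) < phred_to_perr(q)  # simulated base-calling errors
a, b = fit_logistic(q, err)
```

A well-calibrated fit recovers a decreasing error probability in Q (negative slope); systematic departures between the fitted curve and the nominal Phred curve are what recalibration corrects.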

6.
Although skin contamination by radionuclides is the most common cause of accidents among nuclear workers, few studies dealing with the penetration of radioactive contamination through the skin are available. This work is a review of experimental methods that allow assessment of the transfer of radionuclides through the skin under occupational conditions, with or without skin trauma. The first section describes the different methods applied for skin transfer assessment of chemicals used in pharmacology. Major radionuclide contamination accidents can be associated with skin traumas. Thus, the second section describes the adaptation of these methods to radiotoxicology. Finally, the third section is an in vivo investigation of cobalt transfer (57CoCl2) through undamaged and damaged skin, simulating different industrial accident conditions (excoriation, acid or alkaline burn, scalding, branding).

7.
Misclassification in binary outcomes can severely bias effect estimates of regression models when the models are naively applied to error‐prone data. Here, we discuss response misclassification in studies on the special class of bilateral diseases. Such diseases can affect neither, one, or both entities of a paired organ, for example, the eyes or ears. If measurements are available on both organ entities, disease occurrence in a person is often defined as disease occurrence in at least one entity. In this setting, there are two reasons for response misclassification: (a) ignorance of missing disease assessment in one of the two entities and (b) error‐prone disease assessment in the single entities. We investigate the consequences of ignoring both types of response misclassification and present an approach to adjust the bias from misclassification by optimizing an adequate likelihood function. The inherent modelling assumptions and problems in case of entity‐specific misclassification are discussed. This work was motivated by studies on age‐related macular degeneration (AMD), a disease that can occur separately in each eye of a person. We illustrate and discuss the proposed analysis approach based on real‐world data of a study on AMD and simulated data.
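The bias mechanism can be made concrete with the classical single-response misclassification setup and the textbook Rogan-Gladen correction; this is a generic sketch, not the paper's likelihood approach for bilateral (paired-organ) data.

```python
def observed_prevalence(p_true, se, sp):
    # sensitivity se = P(classified diseased | diseased),
    # specificity sp = P(classified healthy | healthy):
    # the naive observed prevalence mixes true positives and false positives
    return se * p_true + (1 - sp) * (1 - p_true)

def rogan_gladen(p_obs, se, sp):
    # Classical correction: p_true = (p_obs + sp - 1) / (se + sp - 1)
    return (p_obs + sp - 1) / (se + sp - 1)

p_obs = observed_prevalence(0.20, se=0.85, sp=0.95)  # naive estimate is biased
p_adj = rogan_gladen(p_obs, se=0.85, sp=0.95)        # recovers the true 0.20
```

With se = 0.85 and sp = 0.95, a true prevalence of 20% is observed as 21%; the correction inverts the mixing, which is the same logic the paper's likelihood embeds for entity-specific error.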

8.
ABSTRACT: BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious associations driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive with dimensionality. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage, where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several competing methodologies.
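The regularization step can be illustrated with a plain LASSO fitted by proximal gradient descent (ISTA) on dichotomized expression calls. This is a generic sketch on simulated data, not the Partitioned LASSO-Patternsearch algorithm itself.

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, iters=2000):
    # LASSO via proximal gradient (ISTA):
    #   minimize ||y - X b||^2 / (2n) + lam * ||b||_1
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(2)
X = (rng.random((200, 50)) < 0.5).astype(float)  # dichotomized "barcode" calls
beta_true = np.zeros(50)
beta_true[[0, 3]] = [2.0, -1.5]                  # two truly relevant genes
y = X @ beta_true + rng.normal(0, 0.1, 200)
b = lasso_ista(X, y)
```

The L1 penalty drives irrelevant coefficients to exactly zero, which is why this family of methods yields the small, interpretable models the abstract emphasizes.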

9.
Species distributional or trait data based on range map (extent-of-occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species' distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models; and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modelling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls on species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix.
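Of the six approaches, autocovariate regression is the simplest to sketch: a distance-weighted mean of neighbouring responses is computed and added as an extra predictor. A minimal illustration on a toy presence/absence grid (the neighbourhood radius of 1.5 cells is an arbitrary choice for this example):

```python
import numpy as np

def autocovariate(coords, y, radius=1.5):
    # Inverse-distance-weighted mean of neighbours' responses within `radius`;
    # this vector is then used as an additional covariate in, e.g., a GLM
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.where((d > 0) & (d <= radius), 1.0 / d, 0.0)
    with np.errstate(invalid="ignore"):
        ac = (w @ y) / w.sum(axis=1)
    return np.nan_to_num(ac)  # cells with no neighbours get 0

# toy presence/absence grid with spatial clustering: species in the west half
xx, yy = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
presence = (xx.ravel() < 5).astype(float)
ac = autocovariate(coords, presence)
```

Because the autocovariate absorbs spatially structured variation, it competes with environmental predictors for explanatory power, which is one plausible mechanism behind the underestimation the abstract reports.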

10.
The aim of this work is to present a new training algorithm for SVMs based on the pattern selection strategy called Error Dependent Repetition (EDR). With EDR, the presentation frequency of a pattern depends on its error: patterns with larger errors are selected more frequently, and patterns with smaller errors (i.e., already learned) are presented less frequently. Using a simple iterative process based on gradient ascent, SVM-EDR can solve the dual problem without any assumption about support vectors or the Karush-Kuhn-Tucker (KKT) conditions.
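A hedged sketch of the idea (not the paper's implementation): projected gradient ascent on the SVM dual, with the pattern to update drawn with probability proportional to its current error. The bias term and the dual equality constraint are omitted for brevity.

```python
import numpy as np

def svm_edr(K, y, C=1.0, lr=0.1, steps=5000, seed=0):
    # Projected gradient ascent on the (bias-free) SVM dual:
    #   W(a) = sum(a) - 0.5 * sum_ij a_i a_j y_i y_j K_ij,  0 <= a_i <= C
    # EDR-style selection: larger-error patterns are updated more frequently.
    rng = np.random.default_rng(seed)
    n = len(y)
    a = np.zeros(n)
    Q = (y[:, None] * y[None, :]) * K
    for _ in range(steps):
        grad = 1.0 - Q @ a                    # dW/da_i = 1 - y_i * f(x_i)
        err = np.maximum(grad, 0.0) + 1e-6    # "error" of each pattern
        i = rng.choice(n, p=err / err.sum())  # error-dependent repetition
        a[i] = np.clip(a[i] + lr * grad[i], 0.0, C)  # ascend, then project
    return a

# linearly separable toy problem with a linear kernel
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
a = svm_edr(X @ X.T, y)
pred = np.sign((a * y) @ (X @ X.T))  # decision values on the training points
```

The clipping step enforces the box constraints directly, which is why no explicit bookkeeping of support vectors or KKT conditions is needed.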

12.
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated from successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time that is computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
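A minimal numerical sketch of the leave-one-trial-out idea using only the linear prediction model and toy effect estimates (the paper also covers mixture and principal-stratification models and a properly validated 95% interval):

```python
import numpy as np

def predict_true_effect(surr, true, target_surr):
    # Linear prediction of the true-endpoint effect from the surrogate effect,
    # with extrapolation error estimated by leaving out each historical trial
    surr, true = np.asarray(surr, float), np.asarray(true, float)
    errs = []
    for i in range(len(surr)):
        mask = np.arange(len(surr)) != i
        b1, b0 = np.polyfit(surr[mask], true[mask], 1)
        errs.append(true[i] - (b0 + b1 * surr[i]))  # held-out prediction error
    b1, b0 = np.polyfit(surr, true, 1)              # final model on all trials
    pred = b0 + b1 * target_surr
    se = np.std(errs, ddof=1)
    return pred, (pred - 1.96 * se, pred + 1.96 * se)

# historical trials: surrogate and true treatment effects (toy numbers)
surr = [0.10, 0.22, 0.35, 0.41, 0.55]
true = [0.08, 0.20, 0.30, 0.42, 0.50]
pred, ci = predict_true_effect(surr, true, target_surr=0.30)
```

The spread of the held-out errors is what quantifies the extra uncertainty of substituting a predicted result for an observed one.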

13.
An intertidal San Francisco Bay salt marsh was used to study the spatial relationships between vegetation patterns and hydrologic and edaphic variables. Multiple abiotic variables were represented by six metrics: elevation, distance to major tidal channels and to the nearest channel of any size, edaphic conditions during dry and wet circumstances, and the magnitude of tidally induced changes in soil saturation and salinity. A new approach, quantitative differential electromagnetic induction (Q-DEMI), was developed to obtain the last metric. The approach converts the difference in soil electrical conductivity (ECa) between dry and wet conditions to quantitative maps of tidally induced changes in root zone soil water content and salinity. The result is a spatially exhaustive map of edaphic changes throughout the mapped area of the ecosystem. Spatially distributed data on the six metrics were used to explore two hypotheses: (1) multiple abiotic variables relevant to vegetation zonation each exhibit different, uncorrelated, spatial patterns throughout an intertidal salt marsh; (2) vegetation zones and habitats of individual plant species are uniquely characterized by different combinations of key metrics. The first hypothesis was supported by observed, uncorrelated spatial variability in the metrics. The second hypothesis was supported by binary logistic regression models that identified key vegetation zone and species habitat characteristics from among the six metrics. Based on results from 108 models, the Q-DEMI map of saturation and salinity change was the most useful metric of those tested for distinguishing different vegetation zones and plant species habitats in the salt marsh.

15.
Use of runs statistics for pattern recognition in genomic DNA sequences. (cited by 2: 0 self-citations, 2 by others)
In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With a vision of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work establishes an investigation of a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented for the determination of the exact distribution of this bivariate runs statistic under an independent identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles in conjunction with binary HMMs for pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
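The FMCI idea in its simplest (IID) form: imbed the pattern-tracking state into a Markov chain and propagate the state distribution. The sketch below computes the exact probability that no success run of length k occurs in n IID Bernoulli trials; the paper extends this style of recursion to a bivariate runs statistic under MC and HMM frameworks.

```python
import numpy as np

def p_longest_run_less_than(k, n, p):
    # Finite Markov chain imbedding (IID case): the state is the current run
    # length of successes (0..k-1), plus an absorbing state entered as soon
    # as a run of length k has occurred.
    T = np.zeros((k + 1, k + 1))
    for s in range(k):
        T[s, 0] = 1 - p    # a failure resets the run
        T[s, s + 1] = p    # a success extends the run
    T[k, k] = 1.0          # absorbing: a run of length k has occurred
    dist = np.zeros(k + 1)
    dist[0] = 1.0
    for _ in range(n):
        dist = dist @ T
    return dist[:k].sum()  # probability the run never reached length k

prob = p_longest_run_less_than(k=3, n=5, p=0.5)
```

For n = 5 and p = 0.5 there are 24 of the 32 binary strings with no run of three successes, so the exact value is 24/32 = 0.75; brute-force enumeration confirms the chain.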

16.
The cockroach is known to possess several morphologically distinct types of sensilla on its antenna, each of which contain a couple or a few receptor cells that respond to an array of compounds. We recorded the response of cells exclusively from one type of sensillum to evaluate the variation in the response of the cells in these sensilla to three closely related alcohols and their binary mixtures. Our results indicate that cells within the class of those responsive to aliphatic alcohols are otherwise variable in their response to particular aliphatic alcohols and not easily classifiable into subclasses. They also indicate that patterns of responses among cells are not robust with respect to concentration. Finally, a considerable level of inhibition is indicated in the response of the receptor cells to binary mixtures compared with the response to pure odorants. The data suggest that discrimination of alcohols (and other odorants of general but not special significance) by the cockroach cannot be understood simply in terms of labeled lines or linear filters. Accepted: 20 December 1996

17.
This work provides a formal evaluation of 25 ecological indicators highlighted by the Southeast Fisheries Science Center's IEA program as useful for tracking ecosystem components in the Gulf of Mexico. Using an Atlantis ecosystem model as an operating model, we select indicators that are quantifiable using simulation outputs and evaluate their sensitivity to changes in fishing mortality. Indicator behavior was examined using a multivariate ordination. The ordination indicates how well each indicator describes variation in ecosystem structure (termed 'importance') under different levels of fishing mortality and reveals redundancies in the information conveyed by indicators. We determine importance using sample data from the operating model, with and without observation error added. Indicators whose importance is diminished least by error are considered robust to observational error. We then quantify the interannual noise of each indicator, where annual variability relates to the required sampling frequency in a management application. Red snapper biomass, King mackerel biomass, and Reef fish catch ranked among the top five most important indicators in the scenarios without observation error, and King mackerel biomass and Species richness remained in the top five even after error was added. Red snapper biomass was consistently found to be the most important and most robust among the fishing mortality scenarios tested, and all four of these indicators were found to have low levels of interannual noise, suggesting that they need to be sampled only infrequently. Our results provide insight into the usefulness of these indicators for fisheries managers interested in the impacts of fishing on the ecosystem.

18.
Inherent uncertainties in empirical data limit our understanding of interrelationships among variables and constrain our ability to identify critical thresholds, as well as our ability to develop practically useful predictive models for water management. This work concerns key water variables for water management, and the first aim is to utilize a very comprehensive data set for Ringkobing Fjord, Denmark. The paper first presents the methods and data used, then a reference regression for chlorophyll, coefficients of variation (CV = SD/MV; MV = mean value; SD = standard deviation) for a variety of water variables, and how these CV-values influence n, the number of data needed to determine coastal-area characteristic mean or median values (note that the interest here is not in the conditions in the sampling bottle but in the conditions in entire coastal areas, the ecosystem perspective). The main part of the work presents a data reduction exercise, including a definition of an error function, where the focus is on "large N", i.e., the number of data in a regression. The results are summarized in a diagram relating the error in the regression to different water variables with different inherent CVs in rivers, lakes and coastal areas. Given the inherently high CV-values of many of these water variables, more samples than are generally taken in most regular monitoring programs are needed if scientifically unassailable conclusions are to be made concerning interrelationships among the variables, and to produce scientifically meaningful information to detect critical ecosystem changes and threshold values. (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
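The link between CV and the number of samples n can be made concrete with the standard large-sample formula n ≈ (z · CV / L)² for estimating a mean to within a relative error L at ~95% confidence. This is the textbook approximation, not the paper's specific error function, and the CV value below is illustrative rather than taken from the Ringkobing Fjord data.

```python
import math

def required_n(cv, rel_error, z=1.96):
    # Samples needed for the sample mean to lie within +/- rel_error (relative)
    # of the true mean with ~95% confidence: n = (z * CV / L) ** 2
    return math.ceil((z * cv / rel_error) ** 2)

# an illustrative CV of 0.35: hitting +/-10% needs far more samples than +/-25%
n_strict = required_n(0.35, 0.10)
n_loose = required_n(0.35, 0.25)
```

The quadratic dependence on CV/L is the reason the high-CV variables discussed in the abstract quickly outgrow the sampling effort of routine monitoring programs.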

19.
IRBM, 2022, 43(1):13-21
Early discernment of a driver's drowsy state may prevent numerous road accidents worldwide. Electroencephalogram (EEG) signals provide valuable information about the neurological changes that discriminate between alert and drowsy states. A signal is decomposed into multiple components for the analysis of the physiological state. The tunable Q wavelet transform (TQWT) decomposes the signal into low-pass and high-pass sub-bands without requiring a choice of wavelet. The information content captured by these sub-bands depends on the choice of decomposition parameters. Due to the non-stationary nature of EEG signals, predefined TQWT decomposition parameters lead to information loss and degrade system performance. Hence, the decomposition parameters should be selected automatically in accordance with the nature of the signals. In this paper, an optimized tunable Q wavelet transform (O-TQWT) is proposed for the adaptive selection of decomposition parameters using different optimization algorithms. An objective function, the mean square error (MSE) of the decomposition, is minimized by the optimization algorithms. The optimum decomposition parameters are used to decompose the signals into sub-bands. Time-domain features are extracted from the sub-bands of the O-TQWT. Highly discriminant features, selected using the Kruskal-Wallis test, are used as input to different classification techniques. A classification accuracy of 96.14% is achieved by a least-squares support vector machine with a radial basis function kernel, which is better than other existing methodologies on the same database.

20.
"Smart" scales are a new tool for frequent monitoring of weight change as well as weigh-in behavior. These scales give researchers the opportunity to discover patterns in the frequency with which individuals weigh themselves over time, and how these patterns are associated with overall weight loss. Our motivating data come from an 18-month behavioral weight loss study of 55 adults classified as overweight or obese who were instructed to weigh themselves daily. Adherence to daily weigh-in routines produces a binary time series for each subject, indicating whether a participant weighed in on a given day. To characterize weigh-in by time-invariant patterns rather than overall adherence, we propose using hierarchical clustering with dynamic time warping (DTW). We perform an extensive simulation study to evaluate the performance of DTW compared to Euclidean and Jaccard distances in recovering underlying patterns in adherence time series. In addition, we compare cluster performance using cluster validation indices (CVIs) under the single, average, complete, and Ward linkages and evaluate how internal and external CVIs compare for clustering binary time series. We apply conclusions from the simulation to cluster our real data and summarize observed weigh-in patterns. Our analysis finds that the adherence trajectory pattern is significantly associated with weight loss.
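The distance at the heart of the approach can be sketched with a classic DTW implementation on two toy weigh-in series; production clustering would compute all pairwise distances (ideally with an optimized DTW library) and feed them to a hierarchical clustering routine.

```python
import numpy as np

def dtw(a, b):
    # Classic O(len(a) * len(b)) dynamic time warping with
    # absolute-difference local cost and unconstrained warping
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# two 4-week weigh-in series with the same weekly pattern, shifted by one day
s1 = [1, 1, 1, 1, 1, 0, 0] * 4  # weighs in Mon-Fri
s2 = [0, 1, 1, 1, 1, 1, 0] * 4  # weighs in Tue-Sat
d_same, d_shift = dtw(s1, s1), dtw(s1, s2)
```

A lock-step distance (Euclidean or Hamming) scores these shifted series as quite different (8 mismatched days), while DTW warps one series onto the other and sees nearly identical patterns, which is why it suits pattern-based rather than adherence-based clustering.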


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)