Similar Literature
20 similar records found.
1.
MOTIVATION: A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and the aggressiveness of the baseline correction, which makes peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS: Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and a shape-matching function that yields a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks across different scales and amplitudes. Transforming the spectrum into wavelet space simplifies the pattern-matching problem and also provides a powerful technique for identifying and separating the signal from spike noise and colored noise. This transformation, together with the additional information in the 2D CWT coefficients, can greatly enhance the effective signal-to-noise ratio. Furthermore, this technique requires no baseline removal or peak smoothing preprocessing steps before peak detection, which improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated on SELDI-TOF spectra with known polypeptide positions, and compared with two other popular algorithms. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping the false positive rate low.
AVAILABILITY: The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
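As a rough illustration of the idea (not the authors' R/Bioconductor implementation), the sketch below scores a spectrum against Ricker (Mexican-hat) wavelets at several widths and flags local maxima of the summed coefficients. Because the wavelet has zero mean and is symmetric, a slowly varying baseline contributes almost nothing, so no explicit baseline removal is needed. The widths and SNR threshold here are illustrative assumptions.

```python
import numpy as np

def ricker(length, a):
    """Mexican-hat (Ricker) wavelet sampled at `length` points, width `a`."""
    t = np.arange(length) - (length - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def cwt_peaks(y, widths=(4, 8, 16), min_snr=3.0):
    """Flag indices where the summed CWT coefficients form a local maximum
    well above a robust noise floor."""
    score = np.zeros(len(y), dtype=float)
    for w in widths:
        kernel = ricker(2 * (5 * w) + 1, w)   # odd length keeps alignment
        score += np.convolve(y, kernel, mode="same")
    noise = np.median(np.abs(score)) + 1e-12  # robust noise estimate
    return [i for i in range(1, len(y) - 1)
            if score[i] > score[i - 1] and score[i] >= score[i + 1]
            and score[i] > min_snr * noise]
```

On a synthetic spectrum with a sloping baseline and two Gaussian peaks of different amplitudes, both peaks are recovered without any baseline correction step.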

2.
Robust smooth segmentation approach for array CGH data analysis
MOTIVATION: Array comparative genomic hybridization (aCGH) provides a genome-wide technique to screen for copy number alteration. The existing segmentation approaches for analyzing aCGH data are based on modeling data as a series of discrete segments with unknown boundaries and unknown heights. Although the biological process of copy number alteration is discrete, in reality a variety of biological and experimental factors can cause the signal to deviate from a stepwise function. To take this into account, we propose a smooth segmentation (smoothseg) approach. METHODS: To achieve a robust segmentation, we use a doubly heavy-tailed random-effect model. The first heavy-tailed structure on the errors deals with outliers in the observations, and the second deals with possible jumps in the underlying pattern associated with different segments. We develop a fast and reliable computational procedure based on an iterative weighted least-squares algorithm with band-limited matrix inversion. RESULTS: Using simulated and real data sets, we demonstrate how smoothseg can aid in identification of regions with genomic alteration and in classification of samples. For the real data sets, smoothseg leads to a smaller false discovery rate and classification error rate than the circular binary segmentation (CBS) algorithm. In a realistic simulation setting, smoothseg is better than wavelet smoothing and CBS in identification of regions with genomic alterations, and better than CBS in classification of samples. For comparative analyses, we demonstrate that segmenting the t-statistics performs better than segmenting the data. AVAILABILITY: The R package smoothseg to perform smooth segmentation is available from http://www.meb.ki.se/~yudpaw.

3.
The properties of the rectangular hyperbola and monomolecular functions, with respect to the photosynthesis/photon flux density (PFD) relationship, are discussed, and the shortcomings of the former are highlighted. Both models were fitted to data acquired from three closely related Veronica species of contrasting ecology. The non-linear regression algorithms give estimates, with standard errors, of light saturated photosynthetic rate, light compensation point, dark respiration rate, and photochemical efficiency at low PFD. While the rectangular hyperbola gave almost as good a fit to the data as the monomolecular for each species, the light saturated photosynthetic rate estimate given by the former was always unacceptably high in comparison with that indicated by the obvious trend of the data. Moreover, this tendency was accentuated if near-saturating PFDs were removed from data sets, and there was a tendency for the fitting algorithm to become unstable. No such problems were encountered with the monomolecular function, and it is suggested that this be used whenever a simple empirical model is required to analyze photosynthesis/PFD data. Key words: Veronica montana L., Veronica chamaedrys L., Veronica officinalis L., wood speedwell, germander speedwell, common speedwell, empirical mathematical model, monomolecular function, rectangular hyperbola function, nonlinear regression, photosynthesis, photon flux density, light saturated photosynthetic rate, light compensation point, photochemical efficiency, dark respiration rate

4.
In this paper, our aim is to analyze geographical and temporal variability of disease incidence when spatio-temporal count data have excess zeros. To that end, we consider random effects in zero-inflated Poisson models to investigate geographical and temporal patterns of disease incidence. Spatio-temporal models that employ conditionally autoregressive smoothing across the spatial dimension and B-spline smoothing over the temporal dimension are proposed. The analysis of these complex models is computationally difficult from the frequentist perspective. On the other hand, the advent of the Markov chain Monte Carlo algorithm has made the Bayesian analysis of complex models computationally convenient. The recently developed data cloning method provides a frequentist approach to mixed models that is also computationally convenient. We propose to use data cloning, which yields maximum likelihood estimates, to conduct a frequentist analysis of zero-inflated spatio-temporal models of disease incidence. One advantage of the data cloning approach is that predictions and corresponding standard errors (or prediction intervals) of smoothed disease incidence over space and time are easily obtained. We illustrate our approach using a real dataset of monthly children's asthma visits to hospital in the province of Manitoba, Canada, during the period April 2006 to March 2010. Performance of our approach is also evaluated through a simulation study.
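Setting aside the CAR/B-spline random effects and data cloning, the zero-inflated Poisson core of such models can be illustrated with a plain EM fit on an i.i.d. ZIP sample (a toy sketch, not the paper's estimator): with probability pi an observation is a structural zero, otherwise it is Poisson(lam).

```python
import numpy as np

def fit_zip(y, iters=200):
    """EM for a zero-inflated Poisson: estimate the zero-inflation
    probability pi and the Poisson mean lam from an i.i.d. sample."""
    y = np.asarray(y, float)
    pi, lam = 0.3, max(y.mean(), 0.1)          # crude starting values
    for _ in range(iters):
        # E-step: posterior probability that each observed zero is structural
        p0 = pi / (pi + (1 - pi) * np.exp(-lam))
        z = np.where(y == 0, p0, 0.0)
        # M-step: closed-form updates
        pi = z.mean()
        lam = ((1 - z) * y).sum() / (1 - z).sum()
    return pi, lam
```

On simulated data the two parameters separate cleanly as long as exp(-lam) is small relative to pi.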

5.
Over the past decade, there has been growing enthusiasm for using electronic medical records (EMRs) for biomedical research. Quantile regression estimates distributional associations, providing unique insights into the intricacies and heterogeneity of EMR data. However, the widespread nonignorable missing observations in EMRs often obscure the true associations and challenge its potential for robust biomedical discoveries. We propose a novel method to estimate the covariate effects in the presence of nonignorable missing responses under quantile regression. This method imposes no parametric specifications on response distributions; instead, it subtly uses the implicit distributions induced by the corresponding quantile regression models. We show that the proposed estimator is consistent and asymptotically normal. We also provide an efficient algorithm to obtain the proposed estimate and a randomly weighted bootstrap approach for statistical inference. Numerical studies, including an empirical analysis of real-world EMR data, are used to assess the proposed method's finite-sample performance in comparison with existing methods.
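The paper's estimator for nonignorable missingness is more involved, but what quantile regression itself fits, the minimizer of the check (pinball) loss, can be sketched with a simple iteratively reweighted least squares approximation on fully observed data. The `eps` floor guarding the reweighting and the intercept handling are illustrative choices:

```python
import numpy as np

def quantile_reg(x, y, tau=0.5, iters=50, eps=1e-4):
    """IRLS approximation to the tau-th quantile regression fit:
    weights tau/|r| (positive residuals) vs (1-tau)/|r| (negative)
    reproduce the pinball-loss first-order conditions."""
    X = np.column_stack([np.ones(len(y)), x])       # add intercept
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares start
    for _ in range(iters):
        r = y - X @ beta
        w = np.where(r > 0, tau, 1 - tau) / np.maximum(np.abs(r), eps)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted normal equations
    return beta
```

With symmetric noise and tau = 0.5 this recovers the conditional median line; other tau values trace out the conditional quantiles.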

6.
Finding optimal three-dimensional molecular configurations based on a limited amount of experimental and/or theoretical data requires efficient nonlinear optimization algorithms. Optimization methods must be able to find atomic configurations that are close to the absolute, or global, minimum error and also satisfy known physical constraints such as minimum separation distances between atoms (based on van der Waals interactions). The most difficult obstacles in these types of problems are that 1) using a limited amount of input data leads to many possible local optima and 2) introducing physical constraints, such as minimum separation distances, helps to limit the search space but often makes convergence to a global minimum more difficult. We introduce a constrained global optimization algorithm that is robust and efficient in yielding near-optimal three-dimensional configurations that are guaranteed to satisfy known separation constraints. The algorithm uses an atom-based approach that reduces the dimensionality and allows for tractable enforcement of constraints while maintaining good global convergence properties. We evaluate the new optimization algorithm using synthetic data from the yeast phenylalanine tRNA and several proteins, all with known crystal structures taken from the Protein Data Bank. We compare the results to commonly applied optimization methods, such as distance geometry, simulated annealing, continuation, and smoothing. We show that, compared to other optimization approaches, our algorithm is able to combine sparse input data with physical constraints in an efficient manner to yield structures with lower root mean squared deviation.

7.
Yue YR, Loh JM. Biometrics, 2011, 67(3): 937-946
In this work we propose a fully Bayesian semiparametric method to estimate the intensity of an inhomogeneous spatial point process. The basic idea is to first convert intensity estimation into a Poisson regression setting by binning the data points on a regular grid, and then model the log intensity semiparametrically using an adaptive version of Gaussian Markov random fields to smooth the corresponding counts. The inference is carried out by an efficient Markov chain Monte Carlo simulation algorithm. Compared to existing methods for intensity estimation, for example, parametric modeling and kernel smoothing, the proposed estimator not only provides inference regarding the dependence of the intensity function on possible covariates, but also uses information from the data to adaptively determine the amount of smoothing at the local level. The effectiveness of our method is demonstrated through simulation studies and an application to a rainforest dataset.

8.
Quantile smoothing of array CGH data
MOTIVATION: Plots of array Comparative Genomic Hybridization (CGH) data often show special patterns: stretches of constant level (copy number) with sharp jumps between them. There can also be much noise. Classic smoothing algorithms do not work well, because they introduce too much rounding. To remedy this, we introduce a fast and effective smoothing algorithm based on penalized quantile regression. It can compute arbitrary quantile curves, but we concentrate on the median to show the trend and the lower and upper quartile curves showing the spread of the data. Two-fold cross-validation is used for optimizing the weight of the penalties. RESULTS: Simulated data and a published dataset are used to show the capabilities of the method to detect the segments of changed copy numbers in array CGH data.

9.
Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of its stringent assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve minimizing an L1-type convex function or solving nonsmooth estimating equations. This approach can lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution, and most methods rely on computationally intensive resampling techniques such as bootstrapping. We extend the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits fast and accurate computation of quantile regression parameter estimates and standard errors by conventional numerical methods such as the Newton-Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. The method is finally applied to data from a soft tissue sarcoma study.

10.
11.
The identification of feasible operating conditions during the early stages of bioprocess development is implemented frequently through High Throughput (HT) studies. These typically employ techniques based on regression analysis, such as Design of Experiments. In this work, an alternative approach, based on a previously developed variant of the Simplex algorithm, is compared to the conventional regression‐based method for three experimental systems involving polishing chromatography and protein refolding. This Simplex algorithm variant was found to be more effective in identifying superior operating conditions, and in fact it reached the global optimum in most cases involving multiple optima. By contrast, the regression‐based method often failed to reach the global optimum, and in many cases reached poor operating conditions. The Simplex‐based method is further shown to be robust in dealing with noisy experimental data, and requires fewer experiments than regression‐based methods to reach favorable operating conditions. The Simplex‐variant also lends itself to the use of HT analytical methods, when they are available, which can assist in avoiding analytical bottlenecks. It is suggested that this Simplex‐variant is ideally suited to rapid optimization in early‐phase process development. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:404–419, 2016
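The Simplex variant evaluated in this work is a modified algorithm from earlier work; for intuition about what a simplex search does, here is a minimal classic Nelder-Mead loop (reflection, expansion, contraction, shrink) applied to a hypothetical two-variable response surface. The objective and starting point are purely illustrative:

```python
import numpy as np

def nelder_mead(f, x0, step=0.5, iters=200):
    """Minimal Nelder-Mead simplex minimizer (no convergence test;
    a fixed iteration budget keeps the sketch short)."""
    n = len(x0)
    simplex = np.array([np.asarray(x0, float)] +
                       [np.asarray(x0, float) + step * np.eye(n)[i]
                        for i in range(n)])
    for _ in range(iters):
        fv = np.array([f(x) for x in simplex])
        order = np.argsort(fv)
        simplex, fv = simplex[order], fv[order]
        centroid = simplex[:-1].mean(axis=0)
        xr = centroid + (centroid - simplex[-1])            # reflect worst point
        fr = f(xr)
        if fr < fv[0]:
            xe = centroid + 2.0 * (centroid - simplex[-1])  # expand
            simplex[-1] = xe if f(xe) < fr else xr
        elif fr < fv[-2]:
            simplex[-1] = xr
        else:
            xc = centroid + 0.5 * (simplex[-1] - centroid)  # contract
            if f(xc) < fv[-1]:
                simplex[-1] = xc
            else:                                           # shrink toward best
                simplex[1:] = simplex[0] + 0.5 * (simplex[1:] - simplex[0])
    fv = np.array([f(x) for x in simplex])
    return simplex[np.argmin(fv)]
```

Note that plain Nelder-Mead is a local method; the abstract's variant adds mechanisms aimed at escaping local optima in noisy experimental landscapes.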

12.
We have studied the relationship between community respiration (R) and enzymatic activity of the electron transport system (ETS) in upper ocean microbial communities (<225 µm) from different oceanic regions. In all except one of the regions, R and ETS were significantly positively correlated. This supports the hypothesis that ETS can be widely used to estimate plankton respiration in natural marine communities (Packard, T.T., Adv. Aquat. Microbiol., 3, 207-261, 1985). A regression equation was obtained between all the R and ETS data studied, to derive respiration from ETS activity. This equation yields a mean prediction error of ±34%, similar to the errors obtained applying the equations at each area, but lower than the error obtained when using the mean R:ETS ratio to determine respiration (±45%). Our results suggest that the use of the ETS-R algorithm, along with measurements of ETS activity in seawater, facilitates the estimation of seawater respiratory oxygen consumption on the mesoscale. This means that by using this approach one could extend our knowledge of oceanic respiration over large temporal and spatial scales, and begin to use respiration, not only productivity, in addressing carbon balance problems in the upper ocean.

13.

Background

Array-based comparative genomic hybridization (array CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterization of these DNA copy number changes is important for both the basic understanding of cancer and its diagnosis. In order to develop effective methods to identify aberration regions from array CGH data, much recent research has focused on both smoothing-based and segmentation-based data processing. In this paper, we propose a stationary wavelet packet transform based approach to smooth array CGH data. Our purpose is to remove CGH noise across the whole frequency range while preserving the true signal by using a bivariate model.

Results

On both synthetic and real CGH data, the Stationary Wavelet Packet Transform (SWPT) is the best wavelet transform for analyzing the CGH signal across the whole frequency range. We also introduce a new bivariate shrinkage model which captures the relationship between noisy CGH coefficients at two scales of the SWPT. Before smoothing, symmetric extension is applied as a preprocessing step to preserve information at the borders.

Conclusion

We have designed the SWPT and the SWPT-Bi, which use the stationary wavelet packet transform with hard thresholding and with the new bivariate shrinkage estimator, respectively, to smooth array CGH data. We demonstrate the effectiveness of our approach through theoretical and experimental exploration of a set of array CGH data, including both synthetic and real data. The comparison results show that our method outperforms previous approaches.
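A full SWPT is more machinery than fits here, but the smoothing principle, transform, threshold the detail band, invert, can be shown with one level of an undecimated Haar transform and hard thresholding. This is a toy stand-in with circular boundary handling, not the paper's SWPT-Bi:

```python
import numpy as np

def haar_swt_denoise(y, thresh):
    """One level of an undecimated (stationary) Haar transform with hard
    thresholding of the detail band; reconstruction averages the two
    valid inverses of each sample (circular boundaries via np.roll)."""
    s = (y + np.roll(y, -1)) / 2.0             # approximation band
    d = (y - np.roll(y, -1)) / 2.0             # detail band
    d = np.where(np.abs(d) > thresh, d, 0.0)   # hard threshold small details
    return (s + np.roll(s, 1) + d - np.roll(d, 1)) / 2.0
```

With the threshold at zero the transform reconstructs the signal exactly; with a threshold above the noise level of the detail coefficients, small-amplitude noise is removed while sharp copy-number-style jumps (large detail coefficients) are preserved.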

14.
Relationship between flea abundance, host abundance and meteorological factors
Li Zhonglai, Zhang Wanrong. Acta Entomologica Sinica, 1995, 38(4): 442-447
Based on 1975-1989 monitoring data on the density of Mongolian gerbils (Meriones unguiculatus) and flea indices in Otog Banner and Otog Front Banner, Inner Mongolia, together with meteorological records from local weather stations, linear and curvilinear regression models of the flea index on rodent density were derived, along with optimal regression-factor-subset and standardized regression models on the meteorological factors, and canonical correlation analysis between the rodent-flea factors and the meteorological factors was performed. Conclusions: changes in host abundance drive changes in the flea index; meteorological factors jointly influence the flea index; relative humidity and ground surface temperature are important factors in flea population fluctuations; and the influence of meteorological factors on the flea index is greater than their influence on rodent density.

15.

Background

Numerous studies on heartbeat classification algorithms have been conducted over the past several decades. Because biosignals vary widely among individuals, many algorithms have also been studied with the aim of robust performance. Various methods have been proposed to reduce the differences arising from personal characteristics, but these tend to amplify the differences caused by arrhythmia.

Methods

In this paper, an arrhythmia classification algorithm using a dedicated wavelet adapted to individual subjects is proposed. We reduce the performance variation by using dedicated wavelets matched to the ECG morphologies of the subjects. The proposed algorithm uses morphological filtering and a continuous wavelet transform with a dedicated wavelet. Principal component analysis and linear discriminant analysis are used to compress the morphological data transformed by the dedicated wavelets, and an extreme learning machine serves as the classifier.

Results

A performance evaluation was conducted with the MIT-BIH arrhythmia database. The results showed a high sensitivity of 97.51%, specificity of 85.07%, accuracy of 97.94%, and a positive predictive value of 97.26%.

Conclusions

The proposed algorithm achieves better accuracy than other state-of-the-art algorithms with no intra-subject overlap between the training and evaluation datasets, and it significantly reduces the amount of intervention needed from physicians.
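Of the pipeline's stages, the PCA compression step is the easiest to sketch in numpy (the dedicated-wavelet, LDA and ELM stages are omitted; the dimensions below are illustrative, not those of the paper):

```python
import numpy as np

def pca_compress(X, k):
    """Project the rows of X (e.g. wavelet-transformed beat morphologies)
    onto the top-k principal components via SVD.  Returns the scores and
    the components so the projection can be inverted."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]
```

For data that truly lie in a k-dimensional subspace, the k-component projection is lossless, which is the rationale for compressing beat morphologies before classification.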

16.
A fuzzy guided genetic algorithm for operon prediction
Motivation: The operon structure of the prokaryotic genome is a critical input for the reconstruction of regulatory networks at the whole genome level. As experimental methods for the detection of operons are difficult and time-consuming, efforts are being put into developing computational methods that can use available biological information to predict operons. Method: A genetic algorithm is developed to evolve a starting population of putative operon maps of the genome into progressively better predictions. Fuzzy scoring functions based on multiple criteria are used for assessing the 'fitness' of the newly evolved operon maps and guiding their evolution. Results: The algorithm organizes the whole genome into operons. The fuzzy guided genetic algorithm-based approach makes it possible to use diverse biological information like genome sequence data, functional annotations and conservation across multiple genomes to guide the organization process. This approach does not require any prior training with experimental operons. The predictions from this algorithm for Escherichia coli K12 and Bacillus subtilis are evaluated against experimentally discovered operons for these organisms. The accuracy of the method is evaluated using an ROC (receiver operating characteristic) analysis. The area under the ROC curve is around 0.9, which indicates excellent accuracy. Contact: roschen_csir{at}rediffmail.com
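The fuzzy operon-map fitness functions aside, the evolutionary loop itself (tournament selection, one-point crossover, mutation, elitism) can be sketched over generic bit-strings, which is also how an operon map can be encoded, e.g. as boundary indicators between adjacent genes. This is a toy GA under that encoding assumption, not the paper's fuzzy-guided version:

```python
import numpy as np

def genetic_algorithm(fitness, n_bits, pop=40, gens=120, pmut=0.02, seed=0):
    """Maximize `fitness` over {0,1}^n_bits with a basic elitist GA."""
    rng = np.random.default_rng(seed)
    P = rng.integers(0, 2, size=(pop, n_bits))
    for _ in range(gens):
        f = np.array([fitness(ind) for ind in P])
        # tournament selection: pick the better of two random individuals
        idx = rng.integers(0, pop, size=(pop, 2))
        parents = P[np.where(f[idx[:, 0]] >= f[idx[:, 1]], idx[:, 0], idx[:, 1])]
        # one-point crossover on consecutive parent pairs
        children = parents.copy()
        cuts = rng.integers(1, n_bits, size=pop // 2)
        for i, c in enumerate(cuts):
            a, b = 2 * i, 2 * i + 1
            children[a, c:], children[b, c:] = (parents[b, c:].copy(),
                                                parents[a, c:].copy())
        # bit-flip mutation, then carry the elite over unchanged
        flip = rng.random(children.shape) < pmut
        children ^= flip.astype(children.dtype)
        children[0] = P[np.argmax(f)]
        P = children
    f = np.array([fitness(ind) for ind in P])
    return P[np.argmax(f)], f.max()
```

Elitism makes the best fitness monotone nondecreasing, so even this bare-bones loop reliably solves easy separable objectives.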

17.
Recent development of high-throughput analytical techniques has made it possible to qualitatively identify a number of metabolites simultaneously. Correlation and multivariate analyses such as principal component analysis have been widely used to analyse those data and evaluate correlations among the metabolic profiles. However, these analyses cannot simultaneously carry out identification of metabolic reaction networks and prediction of the dynamic behaviour of metabolites in the networks. The present study therefore proposes a new approach combining a statistical technique with a mathematical modelling approach to identify and predict a probable metabolic reaction network from time-series data of metabolite concentrations and simultaneously construct its mathematical model. Firstly, regression functions are fitted to the experimental data by the locally estimated scatterplot smoothing (LOESS) method. Secondly, the fitted result is analysed by the bivariate Granger causality test to determine which metabolites cause changes in other metabolite concentrations, and less related metabolites are removed. Thirdly, S-system equations are formed using the remaining metabolites within the framework of biochemical systems theory. Finally, parameters including rate constants and kinetic orders are estimated by the Levenberg-Marquardt algorithm. The estimation is iterated, setting insignificant kinetic orders to zero, i.e., removing insignificant metabolites. Consequently, a reaction network structure is identified and its mathematical model is obtained. Our approach is validated using a generic inhibition and activation model, and its practical application is tested using a simplified model of the glycolysis of Lactococcus lactis MG1363, for which actual time-series data of metabolite concentrations are available. The results indicate the usefulness of our approach and suggest a probable pathway for the production of lactate and acetate; they also point to a probable strong inhibition of the glycolysis pathway by lactate.

18.
As plankton biologists ask more detailed questions of necessarily sparse and noisy spatial data, the need for well founded methods for statistical analysis of such data grows. This note examines the utility of constrained thin-plate smoothing splines as a tool for inferring underlying spatial distribution functions from sparse noisy data. Constrained thin-plate splines are described in a straightforward manner. An economical method of calculation is suggested, which sacrifices mathematical optimality for ease of computation. Using simulated data, several methods for choosing the complexity of the inferred distribution function are compared and robustness to large amplitude noise is examined. Confidence intervals are calculated and tested. The method is applied to egg data from Dover sole (Solea solea) in the Bristol Channel.
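scipy's `RBFInterpolator` provides (unconstrained) thin-plate splines for scattered spatial data of exactly this kind; the note's constraints are not included, and scipy ≥ 1.7 is assumed. The sketch below checks the thin-plate spline's degree-1 polynomial reproduction property on a field that is exactly linear in space:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Scattered "station" positions and a field that is linear in space;
# a thin-plate spline reproduces any degree-1 polynomial exactly.
rng = np.random.default_rng(4)
pts = rng.random((120, 2))
vals = 2.0 * pts[:, 0] + 3.0 * pts[:, 1] + 1.0

tps = RBFInterpolator(pts, vals, kernel="thin_plate_spline")
est = tps(np.array([[0.3, 0.4]]))[0]   # true field value: 2*0.3 + 3*0.4 + 1 = 2.8
```

For noisy survey data, passing `smoothing=...` trades fidelity for smoothness, the same bias-variance dial the note tunes when choosing the complexity of the inferred distribution.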

19.
Summary: Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume the data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation-maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy precision comparable to the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and thus should usefully augment the tools available for LCA of multilevel data.
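Without the multilevel Dirichlet random effects, a basic latent class model for binary items can be fitted by EM in a few lines. This is a single-level toy baseline (the "simple LCA" the abstract mentions), not the MPL procedure:

```python
import numpy as np

def lca_em(Y, K=2, iters=200, seed=0):
    """EM for a latent class model with binary items: class weights pi
    and per-class item response probabilities theta (K x d)."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.3, 0.7, size=(K, d))   # random break-symmetry start
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp for stability
        logp = (Y[:, None, :] * np.log(theta) +
                (1 - Y[:, None, :]) * np.log(1 - theta)).sum(-1) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: closed-form updates
        pi = R.mean(axis=0)
        theta = np.clip((R.T @ Y) / R.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return pi, theta
```

Class labels are identified only up to permutation, so results are typically sorted (e.g. by mean item probability) before comparison.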

20.
