Similar Literature
20 similar records found.
1.
Part 1 of this study summarizes data for a field investigation of contaminant concentration variability within individual, discrete soil samples (intra-sample variability) and between closely spaced, “co-located” samples (inter-sample variability). Hundreds of discrete samples were collected from three sites known respectively to be contaminated with arsenic, lead, and polychlorinated biphenyls. Intra-sample variability was assessed by testing soil from ten points within a minimally disturbed sample collected at each of 24 grid points. Inter-sample variability was assessed by testing five co-located samples collected within a 0.5-m diameter of each grid point. Multi Increment soil samples (triplicates) were collected at each study site for comparison. The study data demonstrate that the concentration of a contaminant reported for a given discrete soil sample is largely random within a relatively narrow (max:min <2X) to a very wide (max:min >100X) range of possibilities at any given sample collection point. The magnitude of variability depends in part on the contaminant type and the nature of the release. The study highlights the unavoidable randomness of contaminant concentrations reported in discrete soil samples and the unavoidable error and inefficiency associated with the use of discrete soil sample data for decision making in environmental investigations.

2.
Species-occurrence data sets tend to contain a large proportion of zero values, i.e., absence values (zero-inflated). Statistical inference using such data sets is likely to be inefficient or lead to incorrect conclusions unless the data are treated carefully. In this study, we propose a new modeling method to overcome the problems caused by zero-inflated data sets that involves a regression model and a machine-learning technique. We combined a generalized linear model (GLM), which is widely used in ecology, and bootstrap aggregation (bagging), a machine-learning technique. We established distribution models of Vincetoxicum pycnostelma (a vascular plant) and Ninox scutulata (an owl), both of which are endangered and have zero-inflated distribution patterns, using our new method and a traditional GLM, and compared model performance. At the same time, we modeled four theoretical data sets that contained different ratios of presence/absence values using the new and traditional methods and again compared model performance. For the distribution models, our new method performed well compared to traditional GLMs. After bagging, area under the curve (AUC) values were almost the same as with traditional methods, but sensitivity values were higher. Additionally, our new method showed high sensitivity compared to the traditional GLM when modeling a theoretical data set containing a large proportion of zero values. These results indicate that our new method has high predictive ability for presence data when analyzing zero-inflated data sets. Generally, predicting presence data is more difficult than predicting absence data. Our new modeling method has potential for advancing species distribution modeling.
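
A minimal sketch of the combination this abstract describes, a logistic-regression GLM wrapped in bootstrap aggregation, using scikit-learn. The covariates, prevalence, and sample sizes are simulated stand-ins, not the study's data.

```python
# Sketch: bagged logistic GLM for zero-inflated presence/absence data.
# All data here are simulated (~5% presences, one informative covariate).
import numpy as np
from scipy.special import expit
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                                # environmental covariates
y = (rng.random(2000) < expit(2 * X[:, 0] - 4)).astype(int)   # zero-inflated response

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

glm = LogisticRegression(max_iter=1000)                       # plain GLM baseline
bagged = BaggingClassifier(LogisticRegression(max_iter=1000),
                           n_estimators=100, random_state=0)  # bootstrap aggregation

for name, model in [("GLM", glm), ("bagged GLM", bagged)]:
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUC={roc_auc_score(y_te, p):.3f}, "
          f"sensitivity={recall_score(y_te, model.predict(X_te)):.3f}")
```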

3.
This paper introduces a modified technique based on the Hilbert-Huang transform (HHT) to improve spectrum estimates of heart rate variability (HRV). In order to make the beat-to-beat (RR) interval a function of time and produce an evenly sampled time series, we first adopt a preprocessing method to interpolate and resample the original RR intervals. Then the HHT, which uses empirical mode decomposition (EMD) to decompose the HRV signal into several monocomponent signals that become analytic signals by means of the Hilbert transform, is applied to extract the features of the preprocessed time series and to characterize the dynamic behaviors of the parasympathetic and sympathetic nervous systems of the heart. Finally, the frequency behaviors of the Hilbert spectrum and Hilbert marginal spectrum (HMS) are studied to estimate the spectral traits of HRV signals. In this paper, two kinds of experimental data are used to compare our method with conventional power spectral density (PSD) estimation. The analysis results of the simulated HRV series show that interpolation and resampling are basic requirements for HRV data processing, and that HMS is superior to PSD estimation. On the other hand, in order to further prove the superiority of our approach, real HRV signals were collected from seven young healthy subjects under the condition that the autonomic nervous system (ANS) is blocked by certain acute selective blocking drugs: atropine and metoprolol. The high-frequency power/total power ratio and low-frequency power/high-frequency power ratio indicate that, compared with the Fourier spectrum based on the principal dynamic mode, our method is more sensitive and effective in identifying the low-frequency and high-frequency bands of HRV.
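
The preprocessing stage (interpolating and resampling the RR series) and the Hilbert-transform stage can be sketched as follows; the EMD step itself is assumed to come from an external package such as PyEMD, and the RR values below are made up.

```python
# Sketch: even resampling of an RR tachogram plus the Hilbert analytic signal.
# EMD is assumed to come from an external package (e.g. PyEMD); here the
# Hilbert transform is applied to a single (mean-removed) component.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import hilbert

rr = np.array([0.81, 0.83, 0.80, 0.85, 0.88, 0.84, 0.82, 0.86])  # RR intervals (s)
t_beats = np.cumsum(rr)                      # beat times: RR as a function of time
fs = 4.0                                     # common HRV resampling rate (Hz)
t_even = np.arange(t_beats[0], t_beats[-1], 1.0 / fs)
rr_even = CubicSpline(t_beats, rr)(t_even)   # evenly sampled series

analytic = hilbert(rr_even - rr_even.mean())      # analytic signal
amplitude = np.abs(analytic)                      # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)     # instantaneous frequency (Hz)
```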

4.
Methods for modeling sets of complex curves where the curves must be aligned in time (or in another continuous predictor) fall into the general class of functional data analysis and include self-modeling regression and time-warping procedures. Self-modeling regression (SEMOR), also known as a shape invariant model (SIM), assumes the curves have a common shape, modeled nonparametrically, and curve-specific differences in amplitude and timing, traditionally modeled by linear transformations. When curves contain multiple features that need to be aligned in time, SEMOR may be inadequate since a linear time transformation generally cannot align more than one feature. Time warping procedures focus on timing variability and on finding flexible time warps to align multiple data features. We draw on these methods to develop a SIM that models the time transformations as random, flexible, monotone functions. The model is motivated by speech movement data from the University of Wisconsin X-ray microbeam speech production project and is applied to these data to test the effect of different speaking conditions on the shape and relative timing of movement profiles.
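
As a rough illustration of the SIM idea only (not the article's random monotone warps), the sketch below fits per-curve amplitude, time shift, and offset against a fixed spline template; all curves and parameters are synthetic.

```python
# Sketch: classical shape-invariant model y_i(t) = a_i * g(t - s_i) + c_i with a
# spline template g. The article replaces the linear shift s_i with random,
# flexible, monotone time warps; that extension is not shown here.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import least_squares

t = np.linspace(0.0, 1.0, 60)
g = CubicSpline(t, np.exp(-((t - 0.5) / 0.1) ** 2))   # common shape (template)

def residuals(params, t_obs, y_obs):
    a, s, c = params                  # amplitude, time shift, vertical offset
    return a * g(np.clip(t_obs - s, 0.0, 1.0)) + c - y_obs

rng = np.random.default_rng(1)
y_obs = 2.0 * g(np.clip(t - 0.07, 0.0, 1.0)) + 0.1 + rng.normal(0, 0.02, t.size)
fit = least_squares(residuals, x0=[1.0, 0.0, 0.0], args=(t, y_obs))
print("amplitude, shift, offset:", np.round(fit.x, 3))   # ~ [2.0, 0.07, 0.1]
```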

5.
This study evaluated three models of microbial temperature kinetics using CO2 respiration data from aerobic solid-state biodegradation experiments. The models included those of Andrews and Kambhu/Haug, Ratkowsky et al., and the Cardinal Temperature Model with Inflection (CTMI) of Rosso et al. A parameter estimation routine implemented the Complex-Box search method for each model on 48 data sets collected during the composting of synthetic food waste or sewage sludge (biosolids) mixed with maple wood chips at different oxygen concentrations and extents of decomposition. Each of the three nonlinear temperature kinetic functions proved capable of modeling a wide range of experimental data sets. However, the models differed widely in the consistency of their parameters. Parameters in the CTMI model were more stable over the course of the degradation process, and the variability that did arise was directly related to changes in the microbial process. Additional benefits of the CTMI model include the ease of parameter determination, which can be approximated directly from laboratory experiments or full-scale system analysis, and the direct value of its parameters in engineering design and process control under a wide range of biodegradation conditions.
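
For reference, the CTMI of Rosso et al. is commonly written in closed form in the four parameters named above; a small sketch follows, with illustrative (not fitted) composting-range values.

```python
# Sketch: Cardinal Temperature Model with Inflection (Rosso et al.).
# mu_opt is the rate at the optimum; Tmin, Topt, Tmax are the cardinal
# temperatures. Example values below are illustrative, not fitted.
import numpy as np

def ctmi(T, mu_opt, Tmin, Topt, Tmax):
    T = np.asarray(T, dtype=float)
    num = (T - Tmax) * (T - Tmin) ** 2
    den = (Topt - Tmin) * ((Topt - Tmin) * (T - Topt)
                           - (Topt - Tmax) * (Topt + Tmin - 2.0 * T))
    return np.where((T <= Tmin) | (T >= Tmax), 0.0, mu_opt * num / den)

print(ctmi([20, 40, 55, 65], mu_opt=0.5, Tmin=5.0, Topt=55.0, Tmax=70.0))
```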

6.
We have investigated the potential of sedimentation velocity analytical ultracentrifugation for the measurement of the second virial coefficients of proteins, with the goal of developing a method that allows efficient screening of different solvent conditions. This may be useful for the study of protein crystallization. Macromolecular concentration distributions were modeled using the Lamm equation with the approximation of linear concentration dependencies of the diffusion constant, D = D(o)(1 + k(D)c), and the reciprocal sedimentation coefficient, s = s(o)/(1 + k(s)c). We have studied model distributions for their information content with respect to the particle and its non-ideal behavior, developed a strategy for their analysis by direct boundary modeling, and applied it to data from sedimentation velocity experiments on halophilic malate dehydrogenase in complex aqueous solvents containing sodium chloride and 2-methyl-2,4-pentanediol, including conditions near phase separation. Using global modeling for three sets of data obtained at three different protein concentrations, very good estimates for k(s) and s(o), and also for D(o) and the buoyant molar mass, were obtained. It was also possible to obtain good estimates for k(D) and the second virial coefficients. Modeling of sedimentation velocity profiles with the non-ideal Lamm equation appears to be a good technique to investigate weak inter-particle interactions in complex solvents and also to extrapolate the ideal behavior of the particle.
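
A minimal sketch of the two linear non-ideality approximations quoted above; the parameter values are invented for illustration.

```python
# Sketch: linear concentration dependences used in the non-ideal Lamm equation,
# D(c) = D0 * (1 + kD*c) and s(c) = s0 / (1 + ks*c). Values are hypothetical.
def D_of_c(c, D0, kD):
    return D0 * (1.0 + kD * c)

def s_of_c(c, s0, ks):
    return s0 / (1.0 + ks * c)

# Example at c = 5 mg/ml with mildly repulsive non-ideality:
print(s_of_c(5.0, s0=4.0e-13, ks=0.01),   # sedimentation coefficient (s)
      D_of_c(5.0, D0=6.0e-7, kD=0.005))   # diffusion coefficient (cm^2/s)
```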

7.
A spatial statistical model for landscape genetics (cited 17 times: 2 self-citations, 15 by others)
Guillot G, Estoup A, Mortier F, Cosson JF. Genetics 2005, 170(3): 1261-1280.
Landscape genetics is a new discipline that aims to provide information on how landscape and environmental features influence population genetic structure. The first key step of landscape genetics is the spatial detection and location of genetic discontinuities between populations. However, efficient methods for achieving this task are lacking. In this article, we first clarify what is conceptually involved in the spatial modeling of genetic data. Then we describe a Bayesian model implemented in a Markov chain Monte Carlo scheme that allows inference of the location of such genetic discontinuities from individual geo-referenced multilocus genotypes, without a priori knowledge of population units and limits. In this method, the global set of sampled individuals is modeled as a spatial mixture of panmictic populations, and the spatial organization of populations is modeled through the colored Voronoi tessellation. In addition to spatially locating genetic discontinuities, the method quantifies the amount of spatial dependence in the data set, estimates the number of populations in the studied area, assigns individuals to their population of origin, and detects individual migrants between populations, while taking into account uncertainty in the location of sampled individuals. The performance of the method is evaluated through the analysis of simulated data sets. Results show good performance for standard data sets (e.g., 100 individuals genotyped at 10 loci with 10 alleles per locus), at both high and low levels of population differentiation (e.g., FST < 0.05). The method is then applied to a set of 88 individuals of wolverines (Gulo gulo) sampled in the northwestern United States and genotyped at 10 microsatellites.
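
The spatial building block of the model, the colored Voronoi tessellation, can be sketched without the MCMC machinery: individuals inherit the population label of the nearest tessellation nucleus. Everything below is simulated.

```python
# Sketch: a "colored" Voronoi tessellation assignment. In the full method the
# nuclei positions and their colors (population labels) are sampled by MCMC;
# here they are fixed at random for illustration.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
nuclei = rng.uniform(0, 100, size=(30, 2))        # tessellation nuclei
colors = rng.integers(0, 3, size=30)              # population label per nucleus
individuals = rng.uniform(0, 100, size=(88, 2))   # geo-referenced samples

_, nearest = cKDTree(nuclei).query(individuals)   # Voronoi cell = nearest nucleus
assignment = colors[nearest]                      # population of each individual
```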

8.
Bennett J, Wakefield J. Biometrics 2001, 57(3): 803-812.
Pharmacokinetic (PK) models describe the relationship between the administered dose and the concentration of drug (and/or metabolite) in the blood as a function of time. Pharmacodynamic (PD) models describe the relationship between the concentration in the blood (or the dose) and the biologic response. Population PK/PD studies aim to determine the sources of variability in the observed concentrations/responses across groups of individuals. In this article, we consider the joint modeling of PK/PD data. The natural approach is to specify a joint model in which the concentration and response data are simultaneously modeled. Unfortunately, this approach may not be optimal if, due to sparsity of concentration data, an overly simple PK model is specified. As an alternative, we propose an errors-in-variables approach in which the observed-concentration data are assumed to be measured with error without reference to a specific PK model. We give an example of an analysis of PK/PD data obtained following administration of an anticoagulant drug. The study was originally carried out in order to make dosage recommendations. The prior for the distribution of the true concentrations, which may incorporate an individual's covariate information, is derived as a predictive distribution from an earlier study. The errors-in-variables approach is compared with the joint modeling approach and more naive methods in which the observed concentrations, or the separately modeled concentrations, are substituted into the response model. Throughout, a Bayesian approach is taken with implementation via Markov chain Monte Carlo methods.

9.
It is well known that ecological communities are spatially and temporally dynamic. Quantifying temporal variability in ecological communities is challenging, however, especially for time-series data sets of less than 40 measurement intervals. In this paper, we describe a method to quantify temporal variability in multispecies communities over time frames of 10–40 measurement intervals. Our approach is a community-level extension of autocorrelation analysis, but we use Euclidean distance to measure similarity of community samples at increasing time lags rather than the correlation coefficient. Regressing Euclidean distances versus increasing time lags yields a measure of the rate and nature of community change over time. We demonstrate the method with empirical data sets from shortgrass steppe, old-field succession and zooplankton dynamics in lakes, and we investigate properties of the analysis using simulation models. Results indicate that time-lag analysis provides a useful quantitative measurement of the rate and pattern of temporal dynamics in communities over time frames that are too short for more traditional autocorrelation approaches.
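
The core computation is compact enough to sketch directly: Euclidean distances between all pairs of community samples, regressed against the time lag between them. The community matrix below is simulated.

```python
# Sketch: time-lag analysis. Euclidean distance between community samples at
# every pair of time points, regressed on the lag between them; the slope
# measures the rate of directional community change. Data are simulated.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
community = np.cumsum(rng.normal(size=(25, 10)), axis=0)   # years x species

lags, dists = [], []
n = community.shape[0]
for lag in range(1, n):
    for t in range(n - lag):
        lags.append(lag)
        dists.append(np.linalg.norm(community[t + lag] - community[t]))

fit = linregress(lags, dists)
print(f"rate of change (slope): {fit.slope:.3f}, r^2: {fit.rvalue**2:.3f}")
```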

10.
11.
Normalization removes or minimizes the biases of systematic variation that exist in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two-channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction (determined from the distribution of pixel intensity values in each data acquisition channel), log conversion, linear or non-linear regression, restoration or transformation, and multi-array normalization. When non-linear regression is required, an empirical polynomial approximation approach is used. Either the high terminal points or their averaged values in the distributions of the pixel intensity values observed in control channels may be used for rescaling multi-array data sets. These pre-processing steps remove systematic variation in the data attributable to variability in microarray slides, assay batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels, or among multiple arrays, are therefore unbiased when normalized (but not unnormalized) data sets are used.
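
A stripped-down sketch of the first steps of such a chain: background subtraction, log2 conversion, and a linear-regression rescaling of the test channel against the control channel. Intensities and backgrounds are simulated.

```python
# Sketch: two-channel pre-processing -- background subtraction, log2
# conversion, and linear regression of the test channel on the control
# channel to remove channel bias. All intensities here are simulated.
import numpy as np

rng = np.random.default_rng(4)
true = rng.lognormal(8, 1, 5000)
ctrl = true + 100 + rng.normal(0, 20, 5000)        # control-channel pixels
test = 1.3 * true + 90 + rng.normal(0, 20, 5000)   # biased test-channel pixels

bg_ctrl, bg_test = 100.0, 90.0    # backgrounds, e.g. from pixel distributions
ctrl_l = np.log2(np.clip(ctrl - bg_ctrl, 1, None))
test_l = np.log2(np.clip(test - bg_test, 1, None))

slope, intercept = np.polyfit(ctrl_l, test_l, 1)   # linear regression step
test_norm = (test_l - intercept) / slope           # test rescaled to control
```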

12.
MOTIVATION: The major difficulties relating to mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black-box nature of the modelling techniques. For the analysis of biological samples, the first problem is due to biological, experimental and machine variability, which can lead to sample-size differences and unavoidable baseline shifts. Consequently, there is often a requirement for mathematical correction(s) to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results, since the variables that most contribute to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost. METHODS: We used genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA). RESULTS: The GA selects sensible pre-processing steps from a total of approximately 10^10 possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw-data model. GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus inferences can be made as to the biochemical differences that are reflected by these spectral bands.
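
A toy version of GA-based variable selection for discriminant analysis is sketched below (selection of pre-processing steps is omitted). The spectra, class structure, and GA settings are all invented for illustration.

```python
# Sketch: a toy GA that searches for k=6 discriminatory variables maximizing
# cross-validated discriminant-analysis accuracy, in the spirit of GA-DFA.
# Spectra, classes, and GA settings are simulated/invented for illustration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, p, k = 120, 882, 6                      # samples, spectral variables, subset
X = rng.normal(size=(n, p))
y = rng.integers(0, 3, size=n)             # three classes
X[y == 1, 100] += 2.0                      # plant two discriminatory "bands"
X[y == 2, 400] += 2.0

def fitness(subset):
    return cross_val_score(LinearDiscriminantAnalysis(),
                           X[:, subset], y, cv=5).mean()

pop = [rng.choice(p, size=k, replace=False) for _ in range(30)]
for gen in range(20):
    parents = sorted(pop, key=fitness, reverse=True)[:10]   # truncation selection
    children = []
    for _ in range(20):                                     # point mutation
        child = parents[rng.integers(10)].copy()
        new = rng.integers(p)
        while new in child:
            new = rng.integers(p)
        child[rng.integers(k)] = new
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print("selected variables:", sorted(best), " accuracy:", round(fitness(best), 3))
```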

13.
The importance of in silico modeling in the pharmaceutical industry is continuously increasing. The aim of the present study was the development of a neural network model for prediction of the postcompressional properties of scored tablets based on the application of existing data sets from our previous studies. Some important process parameters and physicochemical characteristics of the powder mixtures were used as training factors to achieve the best applicability in a wide range of possible compositions. The results demonstrated that, after some pre-processing of the factors, an appropriate prediction performance could be achieved. However, because of the poor extrapolation capacity, broadening of the training data range appears necessary.
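
A minimal sketch of such a network with scikit-learn; the factors, response, and ranges are hypothetical placeholders for the study's process parameters and powder characteristics.

```python
# Sketch: a small feed-forward network predicting a post-compressional tablet
# property from process/powder descriptors. Inputs, response, and ranges are
# hypothetical; scaling stands in for the paper's factor pre-processing.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.uniform(size=(200, 4))    # e.g. compression force, speed, moisture, ...
y = 50 + 30 * X[:, 0] - 10 * X[:, 1] ** 2 + rng.normal(0, 1, 200)  # e.g. hardness

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8),
                                   max_iter=5000, random_state=0))
model.fit(X, y)

# Mirrors the abstract's caution: reliable inside the training ranges only;
# extrapolation beyond them is poor.
print(model.predict([[0.5, 0.5, 0.5, 0.5]]))
```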

14.
15.
Dietary restriction (DR)-induced changes in the serum metabolome may be biomarkers for physiological status (e.g., relative risk of developing age-related diseases such as cancer). Megavariate analysis (unsupervised hierarchical cluster analysis [HCA]; principal components analysis [PCA]) of serum metabolites reproducibly distinguishes DR from ad libitum-fed rats. Component-based approaches (i.e., PCA) consistently perform as well as or better than distance-based metrics (i.e., HCA). We therefore tested the following: (A) Do identified subsets of serum metabolites contain sufficient information to construct mathematical models of class membership (i.e., expert systems)? (B) Do component-based metrics outperform distance-based metrics? Testing was conducted using KNN (k-nearest neighbors, supervised HCA) and SIMCA (soft independent modeling of class analogy, supervised PCA). Models were built with single cohorts, combined cohorts, or mixed samples from previously studied cohorts as training sets. Both algorithms over-fit models based on single-cohort training sets. KNN models had >85% accuracy within training/test sets but were unstable (i.e., values of k could not be accurately set in advance). SIMCA models had 100% accuracy within all training sets and 89% accuracy in test sets, did not appear to over-fit mixed-cohort training sets, and did not require post-hoc modeling adjustments. These data indicate that (i) previously defined metabolites are robust enough to construct classification models (expert systems) with SIMCA that can predict unknowns by dietary category; (ii) component-based analyses outperformed distance-based metrics; (iii) use of over-fitting controls is essential; and (iv) subtle inter-cohort variability may be a critical issue for high-data-density biomarker studies that lack state markers.
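
The core of SIMCA (one PCA per class, assignment by smallest reconstruction residual) can be sketched in a few lines. Full SIMCA adds per-class critical distances, omitted here, and the data below are simulated.

```python
# Sketch: SIMCA core -- fit a PCA per class, classify an unknown by the class
# whose PCA reconstructs it with the smallest residual. Per-class critical
# distances (used in full SIMCA) are omitted; data are simulated.
import numpy as np
from sklearn.decomposition import PCA

def fit_simca(X, y, n_components=2):
    return {c: PCA(n_components=n_components).fit(X[y == c])
            for c in np.unique(y)}

def predict_simca(models, X):
    classes = list(models)
    resid = np.stack([np.linalg.norm(
        X - m.inverse_transform(m.transform(X)), axis=1)
        for m in models.values()])
    return np.array(classes)[np.argmin(resid, axis=0)]

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])
y = np.repeat(["AL", "DR"], 40)     # ad libitum vs dietary restriction labels
models = fit_simca(X, y)
print(predict_simca(models, X[:3]), predict_simca(models, X[-3:]))
```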

16.
Harmonic analysis on manifolds and graphs has recently led to mathematical developments in the field of data analysis. The resulting new tools can be used to compress and analyze large and complex data sets, such as those derived from sensor networks or neuronal activity datasets, obtained in the laboratory or through computer modeling. The nature of the algorithms (based on diffusion maps and connectivity strengths on graphs) possesses a certain analogy with neural information processing, and has the potential to provide inspiration for modeling and understanding biological organization in perception and memory formation.
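
A bare-bones diffusion map (Gaussian affinities, Markov normalization, spectral embedding) fits in a short function; the kernel scale and the toy data here are arbitrary.

```python
# Sketch: a bare-bones diffusion map. Gaussian affinity kernel, row-stochastic
# (Markov) normalization, embedding from the leading non-trivial eigenvectors.
import numpy as np

def diffusion_map(X, eps=1.0, dim=2, t=1):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-d2 / eps)                                # affinity kernel
    P = K / K.sum(axis=1, keepdims=True)                 # diffusion operator
    vals, vecs = np.linalg.eig(P)                        # real spectrum for this P
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    return vecs[:, 1:dim + 1] * vals[1:dim + 1] ** t     # skip trivial phi_0

rng = np.random.default_rng(8)
theta = rng.uniform(0, 2 * np.pi, 200)                   # noisy circle data
X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (200, 2))
embedding = diffusion_map(X, eps=0.5)                    # 2-D diffusion coords
```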

17.
Characterization of life processes at the molecular level requires structural details of protein–protein interactions (PPIs). The number of experimentally determined protein structures accounts for only a fraction of known proteins. This gap has to be bridged by modeling, typically using experimentally determined structures as templates to model related proteins. The fraction of experimentally determined PPI structures is even smaller than that for individual proteins, due to the larger number of interactions than of individual proteins and the greater difficulty of crystallizing protein–protein complexes. Approaches to structural modeling of PPIs (docking) often have to rely on modeled structures of the interactors, especially in the case of large PPI networks. Structures of modeled proteins are typically less accurate than those determined by X-ray crystallography or nuclear magnetic resonance. Thus the utility of approaches to dock these structures should be assessed by thorough benchmarking, specifically designed for protein models. To be credible, such benchmarking has to be based on carefully curated sets of structures with levels of distortion typical for modeled proteins. This article presents such a suite of models built for the benchmark set of X-ray structures from the Dockground resource (http://dockground.bioinformatics.ku.edu) by a combination of homology modeling and the Nudged Elastic Band method. For each monomer, six models were generated with predefined Cα root mean square deviations from the native structure (1, 2, …, 6 Å). The sets and the accompanying data provide a comprehensive resource for the development of docking methodology for modeled proteins. Proteins 2014; 82:278–287. © 2013 Wiley Periodicals, Inc.
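
The grading quantity (Cα RMSD after optimal superposition) is standard; a sketch using the Kabsch algorithm follows, with random coordinates standing in for real Cα traces.

```python
# Sketch: Calpha RMSD after optimal superposition (Kabsch algorithm), the
# deviation measure used to grade the models (1, 2, ..., 6 Angstrom).
import numpy as np

def rmsd_kabsch(P, Q):
    """RMSD between two Nx3 coordinate sets after optimal rotation."""
    P = P - P.mean(axis=0)                     # center both structures
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))         # guard against reflection
    R = U @ np.diag([1.0, 1.0, d]) @ Vt        # optimal rotation for P
    return np.sqrt(((P @ R - Q) ** 2).sum() / len(P))

rng = np.random.default_rng(9)
native = rng.normal(size=(100, 3)) * 10.0      # stand-in Calpha trace
model = native + rng.normal(0, 1.2, native.shape)
print("Calpha RMSD (Angstrom):", round(rmsd_kabsch(model, native), 2))
```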

18.
Singh VR, Kopka M, Chen Y, Wedemeyer WJ, Lapidus LJ. Biochemistry 2007, 46(35): 10046-10054.
The formation of specific intramolecular contacts has been studied under a range of denaturing conditions in single domains of the immunoglobulin-binding proteins L and G. Although they share no significant sequence similarity and have dissimilar folding pathways, the two domains have a similar native fold. Our measurements show that the rates of forming corresponding contacts in the unfolded states of both proteins are remarkably similar and even exhibit similar dependence on denaturant concentration. The unfolded proteins were modeled using Szabo, Schulten, and Schulten (SSS) theory as wormlike chains with excluded volume; when combined with our experimental data, the SSS analysis suggests that the unfolded state becomes uniformly more compact and less diffusive (i.e., rearranges more slowly) with decreasing denaturant concentrations.

19.
Significant uncertainty exists in the magnitude and variability of ammonia (NH3) emissions, which are needed for air quality modeling of aerosols and deposition of nitrogen compounds. Approximately 85% of NH3 emissions are estimated to come from agricultural nonpoint sources. We suspect a strong seasonal pattern in NH3 emissions; however, current NH3 emission inventories lack intra-annual variability. Annually averaged NH3 emissions could significantly affect model-predicted concentrations and wet and dry deposition of nitrogen-containing compounds. We apply a Kalman filter inverse modeling technique to deduce monthly NH3 emissions for the eastern U.S. Final products of this research will include monthly emission estimates for each season. Results for January and June 1990 are currently available and are presented here. The U.S. Environmental Protection Agency (USEPA) Community Multiscale Air Quality (CMAQ) model and ammonium (NH4+) wet concentration data from the National Atmospheric Deposition Program (NADP) network are used. The inverse modeling technique estimates the emission adjustments that provide optimal modeled results with respect to wet NH4+ concentrations, observational data error, and emission uncertainty. Our results suggest that annual average NH3 emission estimates should be decreased by 64% for January 1990 and increased by 25% for June 1990. These results illustrate the strong differences that are anticipated for NH3 emissions.
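
The flavor of the Kalman-filter inverse step can be shown with a scalar example: one emission scale factor updated against one observed-versus-modeled mismatch. All numbers are invented.

```python
# Sketch: a scalar Kalman-filter update of an NH3 emission scale factor.
# H is the modeled sensitivity of wet NH4+ to the scale factor (e.g. from
# CMAQ runs); every number here is invented for illustration.
x = 1.0          # prior scale factor (1.0 = inventory as published)
P = 0.5 ** 2     # prior emission-uncertainty variance
H = 2.0          # d(modeled wet NH4+) / d(scale factor)
R = 0.3 ** 2     # observation-error variance (NADP wet NH4+)

obs, modeled = 1.4, 2.0          # observed vs modeled wet NH4+ (mg/l)
K = P * H / (H * P * H + R)      # Kalman gain (scalar case)
x = x + K * (obs - modeled)      # model overpredicts -> scale emissions down
P = (1.0 - K * H) * P            # posterior variance
print(f"posterior scale factor: {x:.2f}")   # < 1 means decrease emissions
```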

20.
F. S. Nathoo. Biometrics 2010, 66(2): 336-346.
In this article, we present a new statistical methodology for longitudinal studies in forestry, where trees are subject to recurrent infection and the hazard of infection depends on tree growth over time. Understanding the nature of this dependence has important implications for reforestation and breeding programs. Challenges arise for statistical analysis in this setting, with sampling schemes leading to panel data, dynamic spatial variability, and incomplete covariate histories for hazard regression. In addition, data are collected at a large number of locations, which poses computational difficulties for spatiotemporal modeling. A joint model for infection and growth is developed wherein a mixed nonhomogeneous Poisson process, governing recurring infection, is linked with a spatially dynamic nonlinear model representing the underlying height growth trajectories. These trajectories are based on the von Bertalanffy growth model, and a spatially varying parameterization is employed. Spatial variability in growth parameters is modeled through a multivariate spatial process derived through kernel convolution. Inference is conducted in a Bayesian framework with implementation based on hybrid Monte Carlo. Our methodology is applied in an 11-year study of recurrent weevil infestation of white spruce in British Columbia.
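
The growth component is the classical von Bertalanffy curve; the sketch below reduces the spatially varying parameterization to a simple per-site lookup, with invented values.

```python
# Sketch: von Bertalanffy height trajectory h(t) = h_inf * (1 - exp(-k*(t - t0))).
# The article lets the parameters vary spatially via kernel convolution; a
# per-site parameter table stands in for that here. Values are invented.
import numpy as np

def von_bertalanffy(t, h_inf, k, t0=0.0):
    return h_inf * (1.0 - np.exp(-k * (t - t0)))

t = np.arange(0, 12)                                          # 11-year horizon (years)
site_params = {"site_A": (12.0, 0.15), "site_B": (9.0, 0.22)}  # (h_inf, k) per site
curves = {s: von_bertalanffy(t, h, k) for s, (h, k) in site_params.items()}
```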
