Similar articles
20 similar articles found
1.
Abstract

The free energetics of water density fluctuations in bulk water, at interfaces, and in hydrophobic confinement inform the hydration of hydrophobic solutes as well as their interactions and assembly. The characterisation of such free energetics is typically performed using enhanced sampling techniques such as umbrella sampling. In umbrella sampling, order parameter distributions obtained from adjacent biased simulations must overlap in order to estimate free energy differences between biased ensembles. Many biased simulations are typically required to ensure such overlap, which exacts a steep computational cost. We recently introduced a sparse sampling method, which circumvents the overlap requirement by using thermodynamic integration to estimate free energy differences between biased ensembles. Here we build upon and generalise sparse sampling for characterising the free energetics of water density fluctuations in systems near liquid-vapor coexistence. We also introduce sensible heuristics for choosing the biasing potential parameters and strategies for adaptively refining them, which allow such free energetics to be estimated accurately and efficiently. We illustrate the method by characterising the free energetics of cavitation in a large volume in bulk water. We also use sparse sampling to characterise the free energetics of capillary evaporation for water confined between two hydrophobic plates. In both cases, sparse sampling is nearly two orders of magnitude faster than umbrella sampling. Given its efficiency, the sparse sampling method is particularly well suited for characterising free energy landscapes for systems wherein umbrella sampling is prohibitively expensive.
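The thermodynamic-integration step can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: it assumes a harmonic bias on the water count N with strength kappa (in kT units) and a Gaussian unbiased distribution, chosen only because its biased averages are known in closed form. Recovering the unbiased free energy profile would additionally require reweighting the data within each window, which is not shown.

```python
import numpy as np


def ti_biased_free_energies(n_star, mean_n, kappa):
    """beta*F of each biased ensemble relative to the first, by thermodynamic integration.

    Assumed harmonic bias: beta*U(N; N*) = (kappa / 2) * (N - N*)**2, kappa in kT units.
    Since d(beta*F)/dN* = <d(beta*U)/dN*>_{N*} = -kappa * (<N>_{N*} - N*), integrating
    this mean force over N* links windows whose N distributions need not overlap.
    """
    n_star = np.asarray(n_star, dtype=float)
    mean_n = np.asarray(mean_n, dtype=float)
    force = -kappa * (mean_n - n_star)                          # mean generalized force per window
    steps = 0.5 * (force[1:] + force[:-1]) * np.diff(n_star)   # trapezoid rule between windows
    return np.concatenate(([0.0], np.cumsum(steps)))


# Toy check: Gaussian unbiased P(N) with mean mu and variance var, for which <N>_{N*}
# is known exactly; in practice these averages come from biased simulations.
mu, var, kappa = 100.0, 25.0, 0.1
n_star = np.linspace(20.0, 100.0, 5)                            # five widely spaced windows
mean_n = (mu + kappa * var * n_star) / (1.0 + kappa * var)
beta_f = ti_biased_free_energies(n_star, mean_n, kappa)
exact = kappa * ((n_star - mu) ** 2 - (n_star[0] - mu) ** 2) / (2.0 * (1.0 + kappa * var))
```

Because the mean force is linear in N* for this Gaussian toy, the trapezoid rule reproduces the exact result even with five windows; for real systems, window placement matters.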

2.
Electronic health record (EHR) data are increasingly used for biomedical research, but these data have recognized data quality challenges. Data validation is necessary to use EHR data with confidence, but limited resources typically make complete data validation impossible. Using EHR data, we illustrate prospective, multiwave, two-phase validation sampling to estimate the association between maternal weight gain during pregnancy and the risks of her child developing obesity or asthma. The optimal validation sampling design depends on the unknown efficient influence functions of regression coefficients of interest. In the first wave of our multiwave validation design, we estimate the influence function using the unvalidated (phase 1) data to determine our validation sample; then in subsequent waves, we re-estimate the influence function using validated (phase 2) data and update our sampling. For efficiency, estimation combines obesity and asthma sampling frames while calibrating sampling weights using generalized raking. We validated 996 of 10,335 mother-child EHR dyads in six sampling waves. Estimated associations between childhood obesity/asthma and maternal weight gain, as well as other covariates, are compared to naïve estimates that only use unvalidated data. In some cases, estimates markedly differ, underscoring the importance of efficient validation sampling to obtain accurate estimates incorporating validated data.
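Generalized raking, mentioned above, adjusts the design weights of the validated (phase 2) records so that weighted totals of auxiliary variables reproduce their totals over all phase 1 records. The sketch below implements exponential (raking-ratio) calibration solved by Newton's method. The records, weights, and auxiliary variable are hypothetical; in two-phase designs the auxiliary variables are often influence functions estimated from the unvalidated data, but that is an assumption here, not a detail given in the abstract.

```python
import numpy as np


def rake_weights(d, X, totals, tol=1e-10, max_iter=100):
    """Raking calibration: w_i = d_i * exp(X[i] @ lam), with lam chosen so that X.T @ w == totals."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(X @ lam)
        resid = totals - X.T @ w
        if np.max(np.abs(resid)) < tol:
            break
        jac = X.T @ (X * w[:, None])          # Jacobian of X.T @ w with respect to lam
        lam += np.linalg.solve(jac, resid)    # Newton step
    return d * np.exp(X @ lam)


# Hypothetical: 4 validated records, each with design weight 2.5 (10 phase-1 records),
# calibrated to the phase-1 record count and the phase-1 total of one auxiliary variable.
X = np.array([[1.0, 0.1], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])   # [intercept, auxiliary]
w = rake_weights(np.full(4, 2.5), X, totals=[10.0, 10.0])
```

Exponential calibration keeps every weight positive, which is one reason raking is often preferred to linear calibration.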

3.
Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM-aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease, in which controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information in addition to the relative risk estimates of covariates.
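For readers unfamiliar with the design, a minimal sketch of the sampling step (not the EM-based estimator) follows: each case is compared with m controls drawn at random from the subjects still at risk at the case's event time. Matching on age and gender, as in the study, would further restrict each risk set; it is omitted here, and the cohort data are hypothetical.

```python
import random


def nested_case_control(times, is_case, m, seed=0):
    """Draw m controls for each case from the risk set at the case's event time.

    Risk set at time t: every subject still under observation (times[j] >= t),
    excluding the case itself. Returns a list of (case_index, control_indices).
    """
    rng = random.Random(seed)
    matched_sets = []
    for i, (t, case) in enumerate(zip(times, is_case)):
        if not case:
            continue
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t and j != i]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched_sets.append((i, controls))
    return matched_sets


# Hypothetical cohort of 8 subjects (follow-up in years); subjects 0, 3 and 5 become cases.
times = [5.0, 3.0, 8.0, 2.0, 9.0, 7.0, 4.0, 6.0]
is_case = [1, 0, 0, 1, 0, 1, 0, 0]
sets = nested_case_control(times, is_case, m=2)
```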

4.
Field samples are commonly used to estimate disease prevalence in wild populations. Our confidence in these estimates requires understanding the sensitivity and specificity of the diagnostic tests. We assessed the sensitivity of the most commonly used diagnostic tests for amphibian Ranavirus by infecting salamanders (Ambystoma tigrinum; Amphibia, Caudata) with Ambystoma tigrinum virus (ATV) and then sampling euthanized animals (whole animal) and noneuthanized animals (tail clip) at five time intervals after exposure. We used a standard polymerase chain reaction (PCR) protocol to screen for ATV. Agreement between test results from whole-animal and tail-clip samples increased with time postexposure. This indicates that the ability to identify infected animals increases following exposure, leading to a more accurate estimate of prevalence in a population. Our results indicate that tail-clip sampling can underestimate the true prevalence of ATV in wild amphibian populations.
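The point that imperfect sensitivity biases prevalence downward can be made concrete. The sketch below uses the standard relationship between true and apparent prevalence and the Rogan-Gladen correction, plus a relative-sensitivity calculation from paired results; all numbers are hypothetical and are not the study's estimates.

```python
def apparent_prevalence(true_prev, sens, spec):
    """Expected fraction of positive tests given true prevalence, sensitivity and specificity."""
    return true_prev * sens + (1.0 - true_prev) * (1.0 - spec)


def rogan_gladen(apparent, sens, spec):
    """Rogan-Gladen estimator: correct an apparent prevalence for imperfect testing."""
    return (apparent + spec - 1.0) / (sens + spec - 1.0)


def relative_sensitivity(pairs):
    """Sensitivity of test B against reference test A, from paired results [(a, b), ...]."""
    b_given_a = [b for a, b in pairs if a]
    return sum(b_given_a) / len(b_given_a)


# Hypothetical: a tail-clip PCR that detects 70% of infections found in whole-animal
# samples, with no false positives, in a population whose true prevalence is 40%.
observed = apparent_prevalence(0.40, sens=0.70, spec=1.0)   # about 0.28, an underestimate
corrected = rogan_gladen(observed, sens=0.70, spec=1.0)     # about 0.40
```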

5.
Ranked set sampling with unequal samples (total citations: 3; self-citations: 0; citations by others: 3)
Bhoj DS. Biometrics, 2001, 57(3): 957-962
A ranked set sampling procedure with unequal samples (RSSU) is proposed and used to estimate the population mean. This estimator is then compared with the estimators based on the ranked set sampling (RSS) and median ranked set sampling (MRSS) procedures. It is shown that the relative precisions of the estimator based on RSSU are higher than those of the estimators based on RSS and MRSS. An example of estimating the mean diameter at breast height of longleaf-pine trees on the Wade Tract in Thomas County, Georgia, is presented.
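For context, the baseline that RSSU is compared against is ordinary ranked set sampling (RSS). The sketch below implements balanced RSS with perfect ranking and compares the variance of its mean estimator with that of a simple random sample measuring the same number of units. It does not implement the RSSU procedure itself, and the normal trait distribution is an assumption.

```python
import numpy as np


def ranked_set_sample(draw, k, cycles, rng):
    """Balanced ranked set sample with set size k, assuming perfect ranking.

    In each cycle, k independent sets of k units are drawn; from the i-th set only the
    unit ranked i-th is measured, so k * cycles units are measured in total.
    """
    measured = []
    for _ in range(cycles):
        for i in range(k):
            ranked = np.sort(draw(k, rng))   # rank the set (here by the true values)
            measured.append(ranked[i])       # measure only the i-th ranked unit
    return np.array(measured)


def draw(n, rng):
    """Hypothetical trait distribution, e.g. tree diameters."""
    return rng.normal(50.0, 10.0, n)


rng = np.random.default_rng(1)
k, cycles, reps = 3, 2, 2000
rss_means = [ranked_set_sample(draw, k, cycles, rng).mean() for _ in range(reps)]
srs_means = [draw(k * cycles, rng).mean() for _ in range(reps)]
```

Only k * cycles units are measured in either scheme; RSS gains precision by spending cheap ranking effort on the extra units it draws.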

6.
Occupancy-abundance relationships and sampling scales (total citations: 4; self-citations: 0; citations by others: 4)
The area of occupancy of a species and its abundance are dependent on the spatial scale at which they are measured. However, it is less obvious how the scale of sampling affects their correlation. This study investigated and modeled the effects of sampling unit size and areal extent on the interspecific occupancy-abundance relationships for a tropical tree species assemblage at a local scale and a temperate bird species assemblage at a regional scale. The results showed that both sampling unit size and study extent had profound quantitative effects on the occupancy-abundance relationship, although it remained positive. Several properties of the occupancy-abundance relationship can result from the effects of scale: 1) the linearity of the relationship decreases as sampling unit size increases; 2) for a given abundance, the area of occupancy increases with sampling unit size; and 3) variation in the area of occupancy increases with both sampling unit size and extent and, if the extent is large enough, may become so great that no occupancy-abundance relationship is observed. Although the occupancy-abundance relationship can be satisfactorily modeled, the parameters depend on the scale used. This suggests that a model derived from one scale cannot be applied to another. In other words, to estimate the rarity or commonness of species using such a model, the estimates must be made at the same sampling scale for all species.
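The second scale effect listed above (occupancy increasing with sampling unit size at a fixed abundance) can be reproduced in a few lines. This is a minimal sketch assuming a single species with individuals scattered at random in a square study area and nested square grids; it illustrates the scale dependence only, not the paper's models.

```python
import numpy as np


def occupancy(points, extent, cell):
    """Fraction of grid cells of side `cell`, tiling a square of side `extent`, holding >= 1 individual."""
    n = int(round(extent / cell))
    idx = np.clip(np.floor(points / cell).astype(int), 0, n - 1)
    occupied = {(i, j) for i, j in idx}
    return len(occupied) / (n * n)


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 64.0, size=(200, 2))   # one hypothetical species, 200 individuals
cells = [1, 2, 4, 8, 16]                         # nested grids: each cell splits into 4 smaller ones
occ = [occupancy(points, 64.0, c) for c in cells]
```

Because the grids are nested, an occupied small cell always lies in an occupied larger cell, so the occupied fraction can only grow with cell size.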

7.
A statistical theory for sampling species abundances (total citations: 2; self-citations: 1; citations by others: 1)
Green JL, Plotkin JB. Ecology Letters, 2007, 10(11): 1037-1045
The pattern of species abundances is central to ecology, but direct measurements of species abundances at ecologically relevant scales are typically infeasible. This limitation has motivated a long-standing interest in the relationship between the abundance distribution in a large, regional community and the distribution observed in a small sample from the community. Here, we develop a statistical sampling theory to describe how observed patterns of species abundances are influenced by the spatial distributions of populations. For a wide range of regional-scale abundance distributions, we derive exact expressions for the sampled abundance distributions as a function of sample size and the degree of conspecific spatial aggregation. We show that if populations are randomly distributed in space, then the sampled and regional-scale species-abundance distributions typically have the same functional form: sampling can be expressed by a simple scaling relationship. In the case of aggregated spatial distributions, however, the shape of a sampled species-abundance distribution diverges from the regional-scale distribution. Conspecific aggregation results in sampled distributions that are skewed towards both rare and common species. We discuss our findings in light of recent results from neutral community theory, and in the context of estimating biodiversity.
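In the random-placement case, sampling a fraction of the area thins each species' count binomially, and a negative binomial regional distribution stays negative binomial with the same shape parameter and a mean scaled by the sampled fraction. The simulation below checks this under assumed parameter values; it does not cover the aggregated case, in which the abstract reports that the shape changes.

```python
import numpy as np

rng = np.random.default_rng(0)
r, mean_regional, frac = 0.5, 20.0, 0.1   # hypothetical NB shape, regional mean, sampled fraction

# Regional abundances of 200,000 species drawn from a negative binomial (mean 20, shape r).
regional = rng.negative_binomial(r, r / (r + mean_regional), size=200_000)

# Random placement: each individual falls in the sample independently with probability frac.
sampled = rng.binomial(regional, frac)

# Method-of-moments estimate of the negative binomial shape from the sampled counts.
m, v = sampled.mean(), sampled.var()
r_hat = m**2 / (v - m)
```

Analytically, the thinned counts have mean frac * 20 = 2 and variance 2 + 2**2 / r, which gives back the shape r.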

8.
Adaptive sampling designs are becoming increasingly popular in environmental science, particularly for surveying rare and aggregated populations. An adaptive sample is one in which the survey design is modified, or adapted, in some way on the basis of information gained during the survey. Many different adaptive survey designs can be used to estimate animal and plant abundance. In adaptive cluster sampling, additional sample effort is allocated during the survey to the immediate neighborhood in which the species is found. In adaptive stratified sampling, additional sample effort is allocated during the survey to strata of high abundance. The appealing feature of these adaptive designs is that the field biologist gets to do what innately seems sensible when working with rare and aggregated populations: field effort is targeted around where the species is observed in the first wave of the survey. However, applying this principle of targeted field effort while remaining within the framework of probability-based sampling poses logistical challenges. We propose a simplified adaptive survey design that both targets field effort and is logistically feasible. Using a case-study population of rockfish, we show that complete allocation stratified sampling is a very efficient design.
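Adaptive cluster sampling, as described above, can be sketched as follows: a simple random sample of grid cells is drawn, and whenever a sampled cell meets a condition (here, a count of at least one), its four immediate neighbors are added, repeating until no newly added cell meets the condition. The grid, the neighborhood definition, and the threshold are assumptions for illustration. An unbiased abundance estimate would also need a design-appropriate estimator, which is not shown, and this is not the complete allocation stratified design proposed in the paper.

```python
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def adaptive_cluster_sample(grid, n_initial, threshold=1, seed=0):
    """Adaptive cluster sampling on a grid of counts (rows of equal length).

    Draws n_initial cells by simple random sampling without replacement; whenever a
    sampled cell has count >= threshold, its 4 neighbors are added, recursively.
    Returns (initial_cells, all_sampled_cells).
    """
    rows, cols = len(grid), len(grid[0])
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    initial = rng.sample(cells, n_initial)
    sampled = set(initial)
    to_expand = [cell for cell in initial if grid[cell[0]][cell[1]] >= threshold]
    while to_expand:
        r, c = to_expand.pop()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                if grid[nr][nc] >= threshold:
                    to_expand.append((nr, nc))
    return initial, sampled


# Hypothetical 6 x 6 survey grid with two small aggregations.
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 3, 5, 0, 0, 0],
    [0, 4, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 0, 0],
]
initial, sampled = adaptive_cluster_sample(grid, n_initial=8)
```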

9.
Monitoring procedures for Alpine ibex Capra ibex are limited in habitats with reduced visibility and when physical capture and marking of the animals is not intended. Photographic sampling, in which ibex were identified from their natural markings in camera-trap images, was combined with capture-recapture models to estimate the abundance of ibex in Austria. The model fitted in the software CAPTURE produced an average capture probability of 0.44, an estimate of 34–51 ibex, and a mean population size of 38 ibex. This first study showed that photographic capture-recapture techniques based on natural markings can be applied to estimate the abundance of ibex.
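CAPTURE fits a family of closed-population models over many sampling occasions; the simplest two-occasion case conveys the underlying idea. The sketch below uses Chapman's version of the Lincoln-Petersen estimator, a swapped-in textbook estimator rather than the model used in the study, applied to hypothetical counts of individually recognised ibex.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimate of closed-population size.

    n1: animals identified on occasion 1; n2: animals identified on occasion 2;
    m2: animals identified on both occasions.
    """
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


# Hypothetical photo-identification counts, not data from the study.
n_hat = chapman_estimate(20, 25, 12)   # -> 41.0
```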

10.
The quantification of ant nest densities is a useful but challenging task, given the group's high abundance and diversity of nesting sites. We present a new application of a distance-sampling method that follows standard distance analytical procedures but introduces a sampling innovation that is particularly useful for ants: instead of having an observer look for ants, we let ants find a bait station and measure the distances covered between nest and station. We test this method by estimating the density of epigaeic ant nests at an Amazonian tropical forest site near Manaus, Brazil. We distributed 220 baits of canned sardine mixed with cassava flour among ten 210-m-long transects in old-growth upland forest. Forty-five minutes after baiting, we followed the ants' trails and measured the linear distance between the bait and each nest's entrance. We then used the freely available program DISTANCE to estimate the number of nests per unit area while accounting for the effect of distance on the probability that a colony will find a bait. We found 38 species nesting in 287 different colonies, with an estimated 2.66 nests/m². This estimate fell within the 95% confidence bounds of nest density predicted for a similar number of species based on a literature survey of ant species richness and nest density. Our sampling solution, however, takes less than 30% of the time used by conventional sampling approaches for a similar area, with the added advantage that it produces not only a point estimate but also a quantification of uncertainty about density.
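Treating each bait as a survey point and the nest-to-bait distances as detection distances (a point-transect framing assumed here), the density calculation can be sketched for the simplest case: a half-normal detection function with no truncation, whose maximum-likelihood scale has a closed form. DISTANCE fits and compares several detection-function models and reports variance estimates, which this sketch does not.

```python
import math


def half_normal_point_density(distances, n_points):
    """Density from point-transect distances under an untruncated half-normal detection function.

    g(r) = exp(-r**2 / (2 * sigma**2)); the MLE is sigma**2 = sum(r**2) / (2 * n), and the
    effective area surveyed per point is 2 * pi * sigma**2.
    """
    n = len(distances)
    sigma2 = sum(r * r for r in distances) / (2 * n)
    effective_area = 2 * math.pi * sigma2
    return n / (n_points * effective_area)


# Hypothetical: three nests found from two baits, distances in metres.
D = half_normal_point_density([1.0, 2.0, 3.0], n_points=2)   # 9 / (28 * pi), about 0.102 nests/m²
```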

11.
An efficient method for estimating bryophyte diversity in forest stands must consider more than just the dominant forest mesohabitat. We compared two methodologies commonly used for estimating diversity in forest ecosystems. Floristic habitat sampling (FHS) stratifies the whole mosaic of forest mesohabitats (e.g. forest, streams, seeps, and cliffs) together with the natural diversity of microhabitats found within them (e.g. rocks and logs), which are often not considered in forest research projects that use plot sampling to estimate species diversity. In Canadian cedar-hemlock forest, the FHS methodology recorded more than twice as many bryophyte species as plot sampling (PS). A comparison restricted to the dominant forest mesohabitat concluded that plot sampling was not as efficient as FHS in estimating bryophyte diversity and that plot sampling can result in different interpretations of species diversity. Rare-species ordination of stands sampled using FHS showed strong clustering of sites with respect to biogeoclimatic zones and age since the last major disturbance (fire or logging), whereas rare-species ordinations from PS data showed no delineation of stands along temporal gradients. Plot sampling has many useful applications in ecology, but floristic habitat sampling is more efficient for quantifying overall bryophyte diversity. FHS provides an excellent way to record a comprehensive list of species.

12.
The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene-finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.
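The work above builds on the classic Gibbs site sampler for motif discovery. A compact sketch of that original motif sampler (not the comparative gene-finding method introduced in the paper) is given below. It assumes exactly one motif occurrence of known width per sequence, a uniform background, and DNA sequences containing only A, C, G and T.

```python
import math
import random

ALPHABET = "ACGT"


def site_profile(seqs, starts, w, skip, pseudo=1.0):
    """Position weight matrix (w columns) from the current sites of all sequences except `skip`."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for k, (seq, p) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j in range(w):
            counts[j][seq[p + j]] += 1
    total = (len(seqs) - 1) + pseudo * len(ALPHABET)
    return [{b: col[b] / total for b in ALPHABET} for col in counts]


def gibbs_motif_sampler(seqs, w, iters=1000, seed=0):
    """Site sampler: return one motif start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        k = rng.randrange(len(seqs))                  # sequence whose site is resampled
        pwm = site_profile(seqs, starts, w, skip=k)
        seq = seqs[k]
        # Weight each candidate start by its motif-vs-uniform-background likelihood ratio.
        weights = [
            math.prod(pwm[j][seq[p + j]] / 0.25 for j in range(w))
            for p in range(len(seq) - w + 1)
        ]
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Each update resamples one sequence's site given the profile built from all the others, which is the defining Gibbs step; real implementations add phase shifts and multiple restarts to escape local optima.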

13.
Green's sequential sampling plan is widely used in applied entomology. Green's equation can be used to construct sampling stop charts, and a crop can then be surveyed using a simple random sampling (SRS) approach. In practice, however, crops are rarely surveyed according to SRS. Rather, some type of hierarchical design is usually used, such as cluster sampling, where sampling units form distinct groups. This article explains how to adjust sampling plans that are intended for use with cluster sampling, a commonly used hierarchical design, rather than SRS. The methodologies are illustrated using diamondback moth, Plutella xylostella (L.), a pest of Brassica crops, as an example.
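Green's stop line follows from Taylor's power law. The sketch below computes it for simple random sampling, with hypothetical Taylor coefficients a and b and a precision target D; it does not include the cluster-sampling adjustment that the article develops.

```python
def green_stop_line(n, a, b, D):
    """Cumulative count T_n at which sampling may stop after n units (Green's plan under SRS).

    Assumes Taylor's power law s**2 = a * m**b with b != 2 and a target precision
    D = SE(mean) / mean. Solving D**2 = a * m**(b - 2) / n with m = T_n / n gives
    T_n = (D**2 / a) ** (1 / (b - 2)) * n ** ((b - 1) / (b - 2)).
    """
    return (D**2 / a) ** (1.0 / (b - 2.0)) * n ** ((b - 1.0) / (b - 2.0))


def units_until_stop(counts, a, b, D):
    """Number of units examined when the running total first reaches the stop line (None if never)."""
    total = 0
    for n, count in enumerate(counts, start=1):
        total += count
        if total >= green_stop_line(n, a, b, D):
            return n
    return None


# Hypothetical Taylor coefficients and a 25% precision target.
stop_after_8 = green_stop_line(8, a=2.0, b=1.5, D=0.25)   # 128.0
```

For aggregated populations (1 < b < 2) the stop line falls as n grows while the running total rises, so sampling stops where the two curves cross.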

14.
Barabesi L, Pisani C. Biometrics, 2002, 58(3): 586-592
In practical ecological sampling studies, a certain design (such as plot sampling or line-intercept sampling) is usually replicated more than once. For each replication, the Horvitz-Thompson estimate of the target parameter is computed. An overall estimator is then obtained by averaging the individual Horvitz-Thompson estimators. Because the design replications are drawn independently and under the same conditions, the overall estimator is simply the sample mean of the Horvitz-Thompson estimators under simple random sampling. This procedure can be improved by using ranked set sampling. Hence, we propose the replicated protocol under ranked set sampling, which gives more accurate estimation than the replicated protocol under simple random sampling.
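The baseline protocol described above, averaging Horvitz-Thompson estimates over independent replications, is short to write down. The sketch uses hypothetical sampled values and inclusion probabilities; the ranked set sampling version proposed in the paper is not shown.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total from sampled values y and inclusion probabilities pi."""
    return sum(yi / p for yi, p in zip(y, pi))


def replicated_estimate(replicates):
    """Average the Horvitz-Thompson estimates of independent design replications."""
    estimates = [horvitz_thompson_total(y, pi) for y, pi in replicates]
    return sum(estimates) / len(estimates)


# Two hypothetical plot-sampling replications: (values, inclusion probabilities).
reps = [([4.0, 6.0], [0.5, 0.5]), ([3.0, 5.0], [0.25, 0.5])]
overall = replicated_estimate(reps)   # (20 + 22) / 2 = 21.0
```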

15.
16.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded and is the minimum variance achievable by an unbiased estimator, and calibrated estimates of this variance can be computed. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
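The estimator described above fits in a few lines. For each observation, the simulator is called until its output matches; if the first match occurs on draw K, the estimate of that observation's log-probability is minus the sum of 1/k for k = 1, ..., K-1 (zero when K = 1). The sketch assumes discrete outputs that can be compared with `==` and a hypothetical `simulate(rng)` interface; it is a bare-bones version without the refinements a practical implementation would add.

```python
import math
import random


def ibs_loglik(simulate, observations, rng):
    """Inverse binomial sampling estimate of sum_i log p(observations[i]).

    For each observation, call simulate(rng) until the output equals it; if the first
    match occurs on draw K, add -(1 + 1/2 + ... + 1/(K-1)), which is 0 when K == 1.
    """
    total = 0.0
    for obs in observations:
        k = 1
        while simulate(rng) != obs:
            k += 1
        total -= sum(1.0 / j for j in range(1, k))
    return total


def coin(rng):
    """Hypothetical simulator: returns True with probability 0.3."""
    return rng.random() < 0.3


rng = random.Random(0)
estimates = [ibs_loglik(coin, [True, False], rng) for _ in range(20_000)]
mean_estimate = sum(estimates) / len(estimates)   # close to log(0.3) + log(0.7)
```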

17.
St-Pierre JF, Mousseau N. Proteins, 2012, 80(7): 1883-1894
We present an adaptation of the ART-nouveau energy surface sampling method to the problem of loop structure prediction. This method, previously used to study protein folding pathways and peptide aggregation, is well suited to sampling the conformation space of large loops because it targets probable folding pathways instead of exhaustively sampling that space. The number of sampled conformations needed by ART nouveau to find the global energy minimum for a loop was found to scale linearly with the sequence length of the loop for loops between 8 and about 20 amino acids. Since the computational cost of sampling each new conformation also scales linearly with loop sequence length, we estimate that the total computational cost of sampling larger loops scales quadratically, compared with the exponential scaling of exhaustive search methods.
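The quadratic estimate follows from multiplying the two linear scalings. Writing L for the loop length (notation introduced here), and noting that the linear scaling of the number of conformations was observed for loops of about 8 to 20 residues and is extrapolated beyond that range:

```latex
N_{\mathrm{conf}}(L) \propto L, \qquad c_{\mathrm{conf}}(L) \propto L
\quad\Longrightarrow\quad
C_{\mathrm{total}}(L) = N_{\mathrm{conf}}(L)\, c_{\mathrm{conf}}(L) \propto L^{2}
```

Exhaustive enumeration, by contrast, grows exponentially in L.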

18.
Heterogeneity, both inter- and intrafamilial, represents a serious problem in linkage studies of common complex diseases. In this study we simulated different scenarios with families who phenotypically have identical diseases but who genotypically have two different forms of the disease (both forms genetic). We examined the proportion of families displaying intrafamilial heterogeneity, as a function of mode of inheritance, gene frequency, penetrance, and sampling strategies. Furthermore, we compared two different ways of analyzing linkage in these data sets: a two-locus (2L) analysis versus a single-locus (SL) analysis combined with an admixture test. Data were simulated with tight linkage between one disease locus and a marker locus; the other disease locus was not linked to a marker. Our findings are as follows: (1) In contrast to what has been proposed elsewhere to minimize heterogeneity, sampling only "high-density" pedigrees will increase the proportion of families with intrafamilial heterogeneity, especially when the two forms are relatively close in frequency. (2) When one form is dominant and one is recessive, this sampling strategy will greatly decrease the proportion of families with the recessive form and may therefore make it more difficult to detect linkage to the recessive form. (3) An SL analysis combined with an admixture test achieves about the same lod scores and estimate of the recombination fraction as does a 2L analysis. Also, a 2L analysis of a sample of families with intrafamilial heterogeneity does not perform significantly better than an SL analysis. (4) Bilineal pedigrees have little effect on the mean maximum lod score and mean maximum recombination fraction, and therefore there is little danger that including these families will lead to a false exclusion of linkage.
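The single-locus analysis with an admixture test in point (3) is commonly summarized by the heterogeneity LOD (HLOD), which treats a fraction alpha of families as linked. A minimal grid-search sketch at a fixed recombination fraction, with hypothetical per-family LOD scores, is shown below; the paper's exact test procedure may differ.

```python
import math


def hlod(family_lods, grid_size=1001):
    """Admixture (heterogeneity) LOD maximized over the linked fraction alpha.

    HLOD(alpha) = sum_i log10(alpha * 10**LOD_i + (1 - alpha)), evaluated at a fixed
    recombination fraction. Returns (max HLOD, alpha at the maximum) from a grid search.
    """
    best_score, best_alpha = -math.inf, 0.0
    for g in range(grid_size):
        alpha = g / (grid_size - 1)
        if alpha == 1.0:
            score = sum(family_lods)   # all families linked: HLOD equals the ordinary LOD
        else:
            score = sum(math.log10(alpha * 10.0**lod + (1.0 - alpha)) for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# Hypothetical LOD scores: two families support linkage, two argue against it.
score, alpha = hlod([2.0, 1.5, -3.0, -2.5])
```

The ordinary summed LOD here is -2.0, whereas allowing a linked fraction alpha recovers positive evidence for linkage.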

19.
Models and data used to describe species–area relationships confound sampling with ecological process as they fail to acknowledge that estimates of species richness arise due to sampling. This compromises our ability to make ecological inferences from and about species–area relationships. We develop and illustrate hierarchical community models of abundance and frequency to estimate species richness. The models we propose separate sampling from ecological processes by explicitly accounting for the fact that sampled patches are seldom completely covered by sampling plots and that individuals present in the sampling plots are imperfectly detected. We propose a multispecies abundance model in which community assembly is treated as the summation of an ensemble of species‐level Poisson processes and estimate patch‐level species richness as a derived parameter. We use sampling process models appropriate for specific survey methods. We propose a multispecies frequency model that treats the number of plots in which a species occurs as a binomial process. We illustrate these models using data collected in surveys of early‐successional bird species and plants in young forest plantation patches. Results indicate that only mature forest plant species deviated from the constant density hypothesis, but the null model suggested that the deviations were too small to alter the form of species–area relationships. Nevertheless, results from simulations clearly show that the aggregate pattern of individual species density–area relationships and occurrence probability–area relationships can alter the form of species–area relationships. The plant community model estimated that only half of the species present in the regional species pool were encountered during the survey. The modeling framework we propose explicitly accounts for sampling processes so that ecological processes can be examined free of sampling artefacts. 
Our modeling approach is extensible: it could be applied to a variety of study designs and allows the inclusion of additional environmental covariates.
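Under the Poisson assumption, patch-level species richness is a derived quantity, and partial plot coverage and imperfect detection both thin the counts. The sketch below is a simplified, non-hierarchical illustration of that logic. It assumes known species densities and individuals placed at random within the patch, whereas the paper estimates these quantities in a hierarchical model.

```python
import math


def expected_richness(densities, area, coverage=1.0, detection=1.0):
    """Expected number of species recorded in a patch under independent Poisson abundances.

    Species k has N_k ~ Poisson(density_k * area). If a fraction `coverage` of the patch is
    surveyed and each individual there is detected with probability `detection`, the recorded
    count is Poisson(density_k * area * coverage * detection), so species k is recorded with
    probability 1 - exp(-density_k * area * coverage * detection).
    """
    return sum(1.0 - math.exp(-d * area * coverage * detection) for d in densities)


densities = [0.2, 1.0, 5.0]   # hypothetical individuals per unit area
true_richness = expected_richness(densities, area=10.0)
recorded = expected_richness(densities, area=10.0, coverage=0.3, detection=0.6)
```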

20.
In crop protection and ecology, accurate and precise estimates of insect populations are required for many purposes. The spatial pattern of the organism sampled, in relation to the sampling scheme adopted, affects both the bias (the difference between the actual and estimated population density) and the precision (the variability of that estimate). Field monitoring schemes usually adopt time-efficient sampling regimes involving contiguous units rather than the most efficient design for estimation, the completely random sample. This paper uses spatially explicit ecological field data on aphids and beetles to compare common sampling regimes. The random sample was the most accurate method and often the most precise; of the contiguous schemes, the line transect was superior to more compact arrangements such as a square block. Bias depended on the relationship between the size and shape of the group of units comprising the sample and the dominant cluster size underlying the spatial pattern. Using existing knowledge of spatial pattern to inform the choice of sampling scheme may considerably improve accuracy. It is recommended to use line transects that are longer than the grain of the spatial pattern, where grain is defined as the average dimension of clusters over both patches and gaps, and at least twice the dominant cluster size in length.
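The comparison can be reproduced in outline by applying each scheme repeatedly to a spatially explicit grid of counts and recording the bias and spread of the estimated mean. The sketch below uses a hypothetical clustered field rather than the paper's aphid and beetle data, and places line transects along rows only.

```python
import numpy as np


def scheme_means(field, n, rng):
    """Estimated mean density from three n-unit sampling schemes on a 2-D grid of counts.

    random: n distinct cells chosen at random; line: n contiguous cells in one row;
    block: a contiguous s-by-s square with s * s == n. Start positions are random.
    """
    rows, cols = field.shape
    flat = rng.choice(field.size, size=n, replace=False)
    random_mean = field.ravel()[flat].mean()

    r, c = rng.integers(rows), rng.integers(cols - n + 1)
    line_mean = field[r, c:c + n].mean()

    s = int(round(n ** 0.5))
    assert s * s == n, "block scheme needs a square number of units"
    r, c = rng.integers(rows - s + 1), rng.integers(cols - s + 1)
    block_mean = field[r:r + s, c:c + s].mean()
    return random_mean, line_mean, block_mean


rng = np.random.default_rng(0)
field = np.zeros((40, 40))
for _ in range(15):                          # 15 hypothetical 3 x 3 aggregations
    r, c = rng.integers(38, size=2)
    field[r:r + 3, c:c + 3] += rng.poisson(5.0, size=(3, 3))

draws = np.array([scheme_means(field, 16, rng) for _ in range(2000)])
bias = draws.mean(axis=0) - field.mean()     # order: random, line, block
spread = draws.std(axis=0)
```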


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417