Similar literature
1.
We present the one‐inflated zero‐truncated negative binomial (OIZTNB) model, and propose its use as the truncated count distribution in Horvitz–Thompson estimation of an unknown population size. In the presence of unobserved heterogeneity, the zero‐truncated negative binomial (ZTNB) model is a natural choice over the positive Poisson (PP) model; however, when one‐inflation is present the ZTNB model either suffers from a boundary problem, or provides extremely biased population size estimates. Monte Carlo evidence suggests that in the presence of one‐inflation, the Horvitz–Thompson estimator under the ZTNB model can converge in probability to infinity. The OIZTNB model gives markedly different population size estimates compared to some existing truncated count distributions, when applied to several capture–recapture data sets that exhibit both one‐inflation and unobserved heterogeneity.
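As a rough illustration of the Horvitz–Thompson idea in the abstract above, the following Python sketch uses the simpler positive Poisson (PP) model rather than the OIZTNB model itself. It assumes each observed individual has a capture count of at least 1, fits the zero‐truncated Poisson rate by maximum likelihood, and divides the number of observed individuals by the estimated probability of being observed at least once. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def fit_positive_poisson(counts, tol=1e-10, max_iter=10000):
    """Return the MLE of lambda for a zero-truncated (positive) Poisson.

    The MLE solves lambda / (1 - exp(-lambda)) = sample mean, which is
    solved here by fixed-point iteration (an illustrative choice).
    """
    if any(c < 1 for c in counts):
        raise ValueError("all counts must be >= 1 (zero-truncated data)")
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a finite positive MLE")
    lam = mean  # start above the root; the iteration decreases towards it
    for _ in range(max_iter):
        new_lam = mean * (1.0 - math.exp(-lam))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def horvitz_thompson_size(counts):
    """Estimate population size as n / P(count > 0) under the fitted model."""
    lam = fit_positive_poisson(counts)
    p_observed = 1.0 - math.exp(-lam)  # probability an individual is seen at all
    return len(counts) / p_observed


# Hypothetical capture counts for six observed individuals.
example_counts = [1, 1, 1, 2, 2, 3]
estimated_size = horvitz_thompson_size(example_counts)
```

The estimate always exceeds the number of observed individuals, because the fitted model assigns positive probability to individuals that were never captured. Replacing the Poisson zero probability with that of a ZTNB or OIZTNB fit would follow the same pattern, but those likelihoods are not sketched here.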

2.
Jian Zhang, Faming Liang. Biometrics, 2010, 66(4): 1078–1086
Summary: Clustering is a widely used method in extracting useful information from gene expression data, where unknown correlation structures in genes are believed to persist even after normalization. Such correlation structures pose a great challenge to conventional clustering methods, such as the Gaussian mixture (GM) model, k‐means (KM), and partitioning around medoids (PAM), which are not robust against general dependence within data. Here we use the exponential power mixture model to increase the robustness of clustering against general dependence and nonnormality of the data. An expectation–conditional maximization algorithm is developed to calculate the maximum likelihood estimators (MLEs) of the unknown parameters in these mixtures. The Bayesian information criterion is then employed to determine the number of components of the mixture. The MLEs are shown to be consistent under sparse dependence. Our numerical results indicate that the proposed procedure outperforms GM, KM, and PAM when there are strong correlations or non‐Gaussian components in the data.

3.
Obtaining inferences on disease dynamics (e.g., host population size, pathogen prevalence, transmission rate, host survival probability) typically requires marking and tracking individuals over time. While multistate mark–recapture models can produce high‐quality inference, these techniques are difficult to employ at large spatial and long temporal scales or in small remnant host populations decimated by virulent pathogens, where low recapture rates may preclude the use of mark–recapture techniques. Recently developed N‐mixture models offer a statistical framework for estimating wildlife disease dynamics from count data. N‐mixture models are a type of state‐space model in which observation error is attributed to failing to detect some individuals when they are present (i.e., false negatives). The analysis approach uses repeated surveys of sites over a period of population closure to estimate detection probability. We review the challenges of modeling disease dynamics and describe how N‐mixture models can be used to estimate common metrics, including pathogen prevalence, transmission, and recovery rates while accounting for imperfect host and pathogen detection. We also offer a perspective on future research directions at the intersection of quantitative and disease ecology, including the estimation of false positives in pathogen presence, spatially explicit disease‐structured N‐mixture models, and the integration of other data types with count data to inform disease dynamics. Managers rely on accurate and precise estimates of disease dynamics to develop strategies to mitigate pathogen impacts on host populations. At a time when pathogens pose one of the greatest threats to biodiversity, statistical methods that lead to robust inferences on host populations are critically needed for rapid, rather than incremental, assessments of the impacts of emerging infectious diseases.

4.
Summary: We consider a problem of testing mixture proportions using two‐sample data, one from group one and the other from a mixture of groups one and two with unknown proportion, λ, for being in group two. Various statistical applications, including microarray study, infectious epidemiological studies, case–control studies with contaminated controls, clinical trials allowing “nonresponders,” genetic studies for gene mutation, and fishery applications can be formulated in this setup. Under the assumption that the log ratio of probability (density) functions from the two groups is linear in the observations, we propose a generalized score test statistic to test the mixture proportion. Under some regularity conditions, it is shown that this statistic converges to a weighted chi‐squared random variable under the null hypothesis of λ = 0, where the weight depends only on the sampling fraction of both groups. The permutation method is used to provide more reliable finite sample approximation. Simulation results and two real data applications are presented.

5.
Little attention has been paid to the use of multi‐sample batch‐marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, Huggins et al. (2010) recently presented a pseudo‐likelihood for a multi‐sample batch‐marking study where they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson‐type estimator. We have developed and maximized the likelihood for batch‐marking studies. We use data simulated from a Jolly–Seber‐type study and convert this to what would have been obtained from an extended batch‐marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch‐marking model to determine the efficiency of collecting and analyzing batch‐marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains are made when using unique identifiers and employing the CMAS model in terms of precision; however, the likelihood typically had lower mean square error than the pseudo‐likelihood method of Huggins et al. (2010). When faced with designing a batch‐marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies in order to reduce mean square error by manipulating capture probabilities and sample size.

6.
Multistate models can be successfully used for describing complex event history data, for example, describing stages in the disease progression of a patient. The so‐called “illness‐death” model plays a central role in the theory and practice of these models. Many time‐to‐event datasets from medical studies with multiple end points can be reduced to this generic structure. In these models one important goal is the modeling of transition rates but biomedical researchers are also interested in reporting interpretable results in a simple and summarized manner. These include estimates of predictive probabilities, such as the transition probabilities, occupation probabilities, cumulative incidence functions, and the sojourn time distributions. We will give a review of some of the available methods for estimating such quantities in the progressive illness‐death model conditionally (or not) on covariate measures. For some of these quantities estimators based on subsampling are employed. Subsampling, also referred to as landmarking, leads to small sample sizes and usually to heavily censored data leading to estimators with higher variability. To overcome this issue estimators based on a preliminary estimation (presmoothing) of the probability of censoring may be used. Among these, the presmoothed estimators for the cumulative incidences are new. We also introduce feasible estimation methods for the cumulative incidence function conditionally on covariate measures. The proposed methods are illustrated using real data. A comparative simulation study of several estimation approaches is performed and existing software in the form of R packages is discussed.

7.
The species–area relationship (SAR) constitutes one of the most general ecological patterns globally. A number of different SAR models have been proposed. Recent work has shown that no single model universally provides the best fit to empirical SAR datasets: multiple models may be of practical and theoretical interest. However, there are no software packages available that a) allow users to fit the full range of published SAR models, or b) provide functions to undertake a range of additional SAR‐related analyses. To address these needs, we have developed the R package ‘sars’ that provides a wide variety of SAR‐related functionality. The package provides functions to: a) fit 20 SAR models using non‐linear and linear regression, b) calculate multi‐model averaged curves using various information criteria, and c) generate confidence intervals using bootstrapping. Plotting functions allow users to depict and scrutinize the fits of individual models and multi‐model averaged curves. The package also provides additional SAR functionality, including functions to fit, plot and evaluate the random placement model using a species–sites abundance matrix, and to fit the general dynamic model of oceanic island biogeography. The ‘sars’ R package will aid future SAR research by providing a comprehensive set of simple to use tools that enable in‐depth exploration of SARs and SAR‐related patterns. The package has been designed to allow other researchers to add new functions and models in the future and thus the package represents a resource for future SAR work that can be built on and expanded by workers in the field.

8.
I describe an open‐source R package, multimark, for estimation of survival and abundance from capture–mark–recapture data consisting of multiple “noninvasive” marks. Noninvasive marks include natural pelt or skin patterns, scars, and genetic markers that enable individual identification in lieu of physical capture. multimark provides a means for combining and jointly analyzing encounter histories from multiple noninvasive sources that otherwise cannot be reliably matched (e.g., left‐ and right‐sided photographs of bilaterally asymmetrical individuals). The package is currently capable of fitting open population Cormack–Jolly–Seber (CJS) and closed population abundance models with up to two mark types using Bayesian Markov chain Monte Carlo (MCMC) methods. multimark can also be used for Bayesian analyses of conventional capture–recapture data consisting of a single‐mark type. Some package features include (1) general model specification using formulas already familiar to most R users, (2) ability to include temporal, behavioral, age, cohort, and individual heterogeneity effects in detection and survival probabilities, (3) improved MCMC algorithm that is computationally faster and more efficient than previously proposed methods, (4) Bayesian multimodel inference using reversible jump MCMC, and (5) data simulation capabilities for power analyses and assessing model performance. I demonstrate use of multimark using left‐ and right‐sided encounter histories for bobcats (Lynx rufus) collected from remote single‐camera stations in southern California. In this example, there is evidence of a behavioral effect (i.e., trap “happy” response) that is otherwise indiscernible using conventional single‐sided analyses. The package will be most useful to ecologists seeking stronger inferences by combining different sources of mark–recapture data that are difficult (or impossible) to reliably reconcile, particularly with the sparse datasets typical of rare or elusive species for which noninvasive sampling techniques are most commonly employed. Addressing deficiencies in currently available software, multimark also provides a user‐friendly interface for performing Bayesian multimodel inference using capture–recapture data consisting of a single conventional mark or multiple noninvasive marks.

9.
In capture–recapture models, survival and capture probabilities can be modelled as functions of time‐varying covariates, such as temperature or rainfall. The Cormack–Jolly–Seber (CJS) model allows for flexible modelling of these covariates; however, the functional relationship may not be linear. We extend the CJS model by semi‐parametrically modelling capture and survival probabilities using a frequentist approach via P‐splines techniques. We investigate the performance of the estimators by conducting simulation studies. We also apply and compare these models with known semi‐parametric Bayesian approaches on simulated and real data sets.

10.
Summary: In individually matched case–control studies, when some covariates are incomplete, an analysis based on the complete data may result in a large loss of information both in the missing and completely observed variables. This usually results in a bias and loss of efficiency. In this article, we propose a new method for handling the problem of missing covariate data based on a missing‐data‐induced intensity approach when the missingness mechanism does not depend on case–control status and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete‐case analysis (CCA) and some missing data methods that have been proposed previously. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case–control study of multiple myeloma risk and a polymorphism in the receptor Inter‐Leukin‐6 (IL‐6‐α) is used to illustrate our findings.

11.
When two proteins diffuse together to form a bound complex, an intermediate is formed at the end‐point of diffusional association which is called the encounter complex. Its characteristics are important in determining association rates, yet its structure cannot be directly observed experimentally. Here, we address the problem of how to construct the ensemble of three‐dimensional structures which constitute the protein–protein diffusional encounter complex using available experimental data describing the dependence of protein association rates on mutation and on solvent ionic strength and viscosity. The magnitude of the association rates is fitted well using a variety of definitions of encounter complexes in which the two proteins are located at up to about 17 Å root‐mean‐squared distance from their relative arrangement in the bound complex. Analysis of the ionic strength dependence of bimolecular association rates shows that this is determined to a greater extent by the (protein charge) – (salt ion) separation distance than by the protein–protein charge separation distance. Consequently, ionic strength dependence of association rates provides little information about the geometry of the encounter complex. On the other hand, experimental data on electrostatic rate enhancement, mutation and viscosity dependence suggest a model of the encounter complex in which the two proteins form a subset of the contacts present in the bound complex and are significantly desolvated. Copyright © 1999 John Wiley & Sons, Ltd.

12.
Summary: Case–cohort sampling is a commonly used and efficient method for studying large cohorts. Most existing methods of analysis for case–cohort data have concerned the analysis of univariate failure time data. However, clustered failure time data are commonly encountered in public health studies. For example, patients treated at the same center are unlikely to be independent. In this article, we consider methods based on estimating equations for case–cohort designs for clustered failure time data. We assume a marginal hazards model, with a common baseline hazard and common regression coefficient across clusters. The proposed estimators of the regression parameter and cumulative baseline hazard are shown to be consistent and asymptotically normal, and consistent estimators of the asymptotic covariance matrices are derived. The regression parameter estimator is easily computed using any standard Cox regression software that allows for offset terms. The proposed estimators are investigated in simulation studies, and demonstrated empirically to have increased efficiency relative to some existing methods. The proposed methods are applied to a study of mortality among Canadian dialysis patients.

13.
14.
Zhiguo Li, Peter Gilbert, Bin Nan. Biometrics, 2008, 64(4): 1247–1255
Summary: Grouped failure time data arise often in HIV studies. In a recent preventive HIV vaccine efficacy trial, immune responses generated by the vaccine were measured from a case–cohort sample of vaccine recipients, who were subsequently evaluated for the study endpoint of HIV infection at prespecified follow‐up visits. Gilbert et al. (2005, Journal of Infectious Diseases 191, 666–677) and Forthal et al. (2007, Journal of Immunology 178, 6596–6603) analyzed the association between the immune responses and HIV incidence with a Cox proportional hazards model, treating the HIV infection diagnosis time as a right‐censored random variable. The data, however, are of the form of grouped failure time data with case–cohort covariate sampling, and we propose an inverse selection probability‐weighted likelihood method for fitting the Cox model to these data. The method allows covariates to be time dependent, and uses multiple imputation to accommodate covariate data that are missing at random. We establish asymptotic properties of the proposed estimators, and present simulation results showing their good finite sample performance. We apply the method to the HIV vaccine trial data, showing that higher antibody levels are associated with a lower hazard of HIV infection.

15.
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross‐tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log–linear models. The structure of a log–linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower‐order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high‐dimensional regression or classification procedures because, in addition to a high‐dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high‐dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower‐dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log–linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio‐medical problem in cancer research.

16.
Aerial survey is an important, widely employed approach for estimating free‐ranging wildlife over large or inaccessible study areas. We studied how a distance covariate influenced probability of double‐observer detections for birds counted during a helicopter survey in Canada’s central Arctic. Two observers, one behind the other but visually obscured from each other, counted birds in an incompletely shared field of view to a distance of 200 m. Each observer assigned detections to one of five 40‐m distance bins, guided by semi‐transparent marks on aircraft windows. Detections were recorded with distance bin, taxonomic group, wing‐flapping behavior, and group size. We compared two general model‐based estimation approaches pertinent to sampling wildlife under such situations. One was based on double‐observer methods without distance information, which provide sampling analogous to that required for mark–recapture (MR) estimation of detection probability and group abundance along a fixed‐width strip transect. The other method incorporated double‐observer MR with a categorical distance covariate (MRD). A priori, we were concerned that estimators from MR models were compromised by heterogeneity in detection probability due to un‐modeled distance information; that is, more distant birds are less likely to be detected by both observers, with the predicted effect that estimated detection probability would be biased high and estimated group abundance biased low. We found that, despite increased complexity, MRD models (ΔAICc range: 0–16) fit data far better than MR models (ΔAICc range: 204–258). However, contrary to expectation, the more naïve MR estimators were biased low in all cases, but only by 2%–5% in most cases. We suspect that this apparently anomalous finding was the result of specific limitations to, and trade‐offs in, visibility by observers on the survey platform used. While MR models provided acceptable point estimates of group abundance, their far higher standard errors (0%–40%) compared to MRD estimates would compromise ability to detect temporal or spatial differences in abundance. Given improved precision of MRD models relative to MR models, and the possibility of bias when using MR methods from other survey platforms, we recommend avian ecologists use MRD protocols and estimation procedures when surveying Arctic bird populations.
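To make the double‐observer idea in the abstract above concrete, here is a minimal Python sketch of the simplest case: two independent observers and no distance covariate, which reduces to a Lincoln–Petersen‐type calculation. It is not the MR or MRD model fitted in the study. The function name, argument names, and the example tallies are hypothetical, and the sketch assumes equal detectability of all groups, which is exactly the assumption the abstract questions.

```python
def double_observer_estimates(only_front, only_rear, both):
    """Estimate per-observer detection probabilities and group abundance.

    only_front: groups detected by the front observer only
    only_rear:  groups detected by the rear observer only
    both:       groups detected by both observers
    Assumes independent observers and homogeneous detection (illustrative).
    """
    if both <= 0:
        raise ValueError("need at least one group detected by both observers")
    seen_front = only_front + both
    seen_rear = only_rear + both
    # Each observer's detection probability is estimated from the groups
    # that the other observer is known to have detected.
    p_front = both / seen_rear
    p_rear = both / seen_front
    # Probability that a group is detected by at least one observer.
    p_any = 1.0 - (1.0 - p_front) * (1.0 - p_rear)
    total_seen = only_front + only_rear + both
    abundance = total_seen / p_any
    return p_front, p_rear, abundance


# Hypothetical tallies: 20 front-only, 10 rear-only, 30 seen by both.
p_front, p_rear, abundance = double_observer_estimates(20, 10, 30)
```

With these tallies the abundance estimate equals 50 × 40 / 30, the familiar Lincoln–Petersen form. If detection declines with distance, pooling all distance bins in this way mixes groups with different detection probabilities, which is the heterogeneity that motivates adding a distance covariate.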

17.
This article proposes improved numerical procedures for estimating parameters in a spatiotemporal lattice model introduced for the analysis of cortical activities monitored from arrays of diodes. The numerical algorithms are based on approximations inspired by statistical physics. Both Gibbsian and mean-field approximations are used; they allow for computing local conditional probabilities inside the lattice. The statistical procedures rely on the computation of pseudomaximum-likelihood estimators. The estimators are evaluated on the basis of Monte Carlo simulations. These simulations show that mean-field approximations are useful for reducing the variance of estimators when the data are recorded from arrays of 144 diodes (which are in accordance with standard practice). In light of these improved methods, we give new interpretations for a data set obtained from optical recording of a guinea pig's auditory cortex in response to pure tone stimulations.

18.
Spider monkeys (Ateles sp.) live in a flexible fission–fusion social system in which members of a social group are not in constant association, but instead form smaller subgroups of varying size and composition. Patterns of range use in spider monkeys have been described as sex‐segregated, with males and females often ranging separately, females utilizing core areas that encompass only a fraction of the entire community range, and males using much larger portions of the community range that overlap considerably with the core areas of females and other males. Males are also reported to use the boundary areas of community home ranges more often than females. Spider monkeys thus seem to parallel the “male‐bonded” patterns of ranging and association found among some groups of chimpanzees. Over several years of research on one group of spider monkeys (Ateles belzebuth) in Yasuní National Park, Ecuador, we characterized the ranging patterns of adult males and females and evaluated the extent to which they conform to previously reported patterns. In contrast to ranging patterns seen at several other spider monkey sites, the ranges of our study females overlapped considerably, with little evidence of exclusive use of particular areas by individual monkeys. Average male and female home range size was comparable, and males and females were similar in their use of boundary areas. These ranging patterns are similar to those of “bisexually bonded” groups of chimpanzees in West Africa. We suggest that the less sex‐segregated ranging patterns seen in this particular group of spider monkeys may be owing to a history of human disturbance in the area and to lower genetic relatedness between males, highlighting the potential for flexibility in some aspects of the spider monkeys' fission–fusion social system. Am. J. Primatol. 72:129–141, 2010. © 2009 Wiley‐Liss, Inc.

19.
In this issue of Molecular Ecology, Neuwald & Templeton (2013) report on a 22‐year study of natural populations of Collared Lizards (Crotaphytus collaris) that evolved on isolated rock outcrops (‘glades’) in the Ozark Mountains in eastern Missouri. This ecosystem was originally maintained by frequent fires that kept the forest understory open, but fire‐suppression was adopted as official policy in about 1945, which led to a loss of native biodiversity, including local extinctions of some lizard populations. Policies aimed at restoring biodiversity included controlled burns and re‐introductions of lizards to some glades, which began in 1984. Populations were monitored from 1984 to 2006, and demographic and genetic data collected from 1,679 lizards were used to document shifts in meta‐population dynamics over four distinct phases of lizard recovery: (1) an initial translocation of lizards drawn from the same source populations onto three glades that were likely part of one meta‐population; (2) a period of isolation and genetic drift associated with the absence of fires; (3) a period of rapid colonization and population increase following restoration of fire; and (4) stabilization of the meta‐population under regular prescribed burning. This study system thus provides a rare opportunity to characterize the dynamics of a landscape‐scale management strategy on the restoration of the meta‐population of a reintroduced species; long‐term case studies of the extinction, founding, increase, and stabilization of a well‐defined meta‐population, based on both demographic and population genetic data, are rare in the conservation, ecological, and evolutionary literature.

20.
Estimating population density as precisely as possible is a key premise for managing wild animal species. This can be a challenging task if the species in question is elusive or, due to high quantities, hard to count. We present a new, mathematically derived estimator for population size, where the estimation is based solely on the frequency of genetically assigned parent–offspring pairs within a subsample of an ungulate population. By use of molecular markers like microsatellites, the number of these parent–offspring pairs can be determined. The study's aim was to clarify whether a classical capture–mark–recapture (CMR) method can be adapted or extended by this genetic element to a genetic‐based capture–mark–recapture (g‐CMR). We numerically validate the presented estimator (and corresponding variance estimates) and provide the R‐code for the computation of estimates of population size including confidence intervals. The presented method provides a new framework to precisely estimate population size based on the genetic analysis of a one‐time subsample. This is especially of value where traditional CMR methods or other DNA‐based (fecal or hair) capture–recapture methods fail or are too difficult to apply. The DNA source used is basically irrelevant, but in the present case the sampling of an annual hunting bag serves as the data basis. In addition to the high quality of muscle tissue samples, hunting bags provide additional and essential information for wildlife management practices, such as age, weight, or sex. In cases where a g‐CMR method is appropriate from an ecological and hunting perspective, it is widely applicable, in part because it can be used independently of species.
