Similar Documents
20 similar documents found.
1.
Data from large-scale biological inventories are essential for understanding and managing Earth's ecosystems. The Forest Inventory and Analysis Program (FIA) of the U.S. Forest Service is the largest biological inventory in North America; however, in 2000 the FIA inventory changed from an amalgam of different regional approaches to a nationally standardized approach. Full use of both data sets is clearly warranted to target many pressing research questions, including those related to climate change and forest resources. However, full use requires lumping FIA data from different regionally based designs (pre-2000) and/or lumping the data across the temporal changeover. Combining data from different inventory types must be approached with caution, as inventory types represent different probabilities of detecting trees per sample unit, which can ultimately confound temporal and spatial patterns found in the data. Consequently, the main goal of this study is to evaluate the effect of inventory design on a common analysis in ecology: modeling of climatic niches (or species-climate relations). We use non-parametric multiplicative regression (NPMR) to build and compare niche models for 41 tree species from the old and new FIA designs in the Pacific coastal United States. We discover two likely effects of inventory design on niche models and their predictions. First, random error for modeled predictions increases from 4 to 6% between the different inventories, compared to modeled predictions from two samples of the same inventory. Second, systematic error (or directional disagreement among modeled predictions) is detectable for 4 of 41 species between the different inventories: Calocedrus decurrens, Pseudotsuga menziesii, Pinus ponderosa, and Abies concolor. Hence, at least 90% of niche models and predictions of probability of occurrence demonstrate no obvious effect from the change in inventory design. Further, niche models built from sub-samples of the same data set can yield systematic error that rivals the systematic error in predictions from models built on two separate data sets. This work corroborates the pervasive and pressing need to quantify different types of error in niche modeling to address issues associated with data quality and large-scale data integration.
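As an illustration of the error decomposition used above, the following minimal Python sketch fits two niche models on independent samples of a simulated landscape and summarizes their disagreement. Logistic regression stands in for NPMR (which has no standard Python implementation), and the mean signed difference and the spread of prediction differences serve only as rough analogues of the paper's systematic and random error; all data and names are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
climate = rng.normal(size=(2000, 2))             # e.g., temperature, precipitation
p_true = 1 / (1 + np.exp(-(1.5 * climate[:, 0] - climate[:, 1])))
presence = rng.binomial(1, p_true)

# Two "inventories": independent samples of the same landscape
idx_a = rng.permutation(2000)[:1000]
idx_b = rng.permutation(2000)[:1000]
model_a = LogisticRegression().fit(climate[idx_a], presence[idx_a])
model_b = LogisticRegression().fit(climate[idx_b], presence[idx_b])

pred_a = model_a.predict_proba(climate)[:, 1]
pred_b = model_b.predict_proba(climate)[:, 1]
diff = pred_a - pred_b
print(f"systematic error (mean signed difference): {diff.mean():+.4f}")
print(f"random error (SD of differences):          {diff.std():.4f}")
```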

2.
Ecological Indicators, 2002, 1(3): 139–153
Information on the amount, distribution, and characteristics of coarse woody debris (CWD) in forest ecosystems is in high demand by wildlife biologists, fire specialists, and ecologists. Through its important role in wildlife habitat, fuel loading, forest productivity, and carbon sequestration, CWD is an indicator of forest health. Because of this, the USDA Forest Service Pacific Northwest Research Station's Forest Inventory and Analysis (FIA) program recognized the need to collect data on CWD in their extensive resource inventories. This paper describes a sampling method, measurement protocols, and estimation procedures to collect and compile data on CWD attributes within FIA's forest inventory. The line-intersect method was used to sample CWD inside the boundaries of the standard inventory field plot. Previously published equations were customized to allow easy calculation of per-unit-area values for each plot, such as biomass and carbon per hectare, log density per hectare, or volume per hectare. These estimates are associated with all other information recorded or calculated for an inventory plot, allowing in-depth analysis of CWD data in relation to stand-level characteristics. The data on CWD can be used to address current, relevant issues such as criterion 5 outlined in the 1994 Montreal Process and the 1995 Santiago Declaration. This criterion assesses the contribution of forests to the global carbon cycle by measuring such indicators as CWD, live plant biomass, and soil carbon.
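The per-hectare estimates described above rest on line-intersect sampling theory. The sketch below implements the classic Van Wagner (1968) estimator rather than the paper's customized FIA equations; with intersected-piece diameters in centimetres and transect length in metres, the formula V = pi^2 * sum(d^2) / (8L) yields volume in cubic metres per hectare directly.

```python
import math

def cwd_volume_m3_per_ha(diameters_cm, transect_length_m):
    """Van Wagner's line-intersect volume estimator, in m^3/ha."""
    return math.pi ** 2 * sum(d ** 2 for d in diameters_cm) / (8 * transect_length_m)

# Example: five intersected logs along a 100 m transect
print(f"{cwd_volume_m3_per_ha([12.0, 25.5, 8.0, 40.2, 15.0], 100.0):.1f} m^3/ha")
```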

3.
Pan W. Biometrics, 2000, 56(1): 199–203
We propose a general semiparametric method based on multiple imputation for Cox regression with interval-censored data. The method consists of iterating the following two steps. First, from finite-interval-censored (but not right-censored) data, exact failure times are imputed using Tanner and Wei's poor man's or asymptotic normal data augmentation scheme, based on the current estimates of the regression coefficient and the baseline survival curve. Second, a standard statistical procedure for right-censored data, such as the Cox partial likelihood method, is applied to the imputed data to update the estimates. Through simulation, we demonstrate that the resulting estimate of the regression coefficient and its associated standard error provide a promising alternative to the nonparametric maximum likelihood estimate. Our proposal is easily implemented by taking advantage of existing computer programs for right-censored data.
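A minimal sketch of the impute-then-analyze loop, assuming the lifelines package for the Cox fit. For brevity, exact failure times are drawn uniformly within each censoring interval (a crude stand-in for the poor man's and asymptotic normal data augmentation schemes named above), and the m imputed analyses are pooled with Rubin's rules.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n, beta, m = 400, 0.7, 10
x = rng.normal(size=n)
t_true = rng.exponential(1 / np.exp(beta * x))
left = np.floor(t_true * 4) / 4                   # interval-censored observations
right = left + 0.25

betas, variances = [], []
for _ in range(m):
    t_imp = rng.uniform(left, right)              # impute exact failure times
    df = pd.DataFrame({"T": t_imp, "E": 1, "x": x})
    cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
    betas.append(cph.params_["x"])
    variances.append(cph.standard_errors_["x"] ** 2)

q_bar = np.mean(betas)                            # Rubin's rules pooling
total_var = np.mean(variances) + (1 + 1 / m) * np.var(betas, ddof=1)
print(f"beta_hat = {q_bar:.3f} (SE = {np.sqrt(total_var):.3f}, true beta = {beta})")
```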

4.
In this study, the availability of the Ovine HD SNP BeadChip (HD-chip) and the development of an imputation strategy provided an opportunity to further investigate the extent of linkage disequilibrium (LD) at short distances in the genome of the Spanish Churra dairy sheep breed. A population of 1686 animals, including 16 rams and their half-sib daughters, previously genotyped with the 50K-chip, was imputed to HD-chip density based on a reference population of 335 individuals. After assessing the imputation accuracy of Beagle v4.0 (0.922) and FImpute v2.2 (0.921) using a cross-validation approach, the imputed HD-chip genotypes obtained with Beagle were used to update the estimates of LD and effective population size for the studied population. The imputed genotypes were also used to assess the degree of homozygosity by calculating runs of homozygosity and to obtain genomic-based inbreeding coefficients. The updated LD estimations provided evidence that the extent of LD in Churra sheep is even shorter than that reported based on the 50K-chip and is one of the shortest extents compared with other sheep breeds. Through different comparisons we have also assessed the impact of imputation on LD and effective population size estimates. The inbreeding coefficient, considering the total length of the runs of homozygosity, showed an average estimate (0.0404) lower than the critical level. Overall, the improved accuracy of the updated LD estimates suggests that the HD-chip, combined with an imputation strategy, offers a powerful tool that will increase the opportunities to identify genuine marker-phenotype associations and to successfully implement genomic selection in Churra sheep.
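The genomic inbreeding coefficient mentioned above is typically computed as F_ROH, the summed length of runs of homozygosity divided by the autosomal length surveyed. A minimal single-chromosome sketch follows; the 1 Mb minimum-run threshold is illustrative, and real ROH callers add SNP-density and gap rules omitted here.

```python
import numpy as np

def f_roh(genotypes, positions_bp, min_len_bp=1_000_000):
    """genotypes: 0/1/2 allele dosages along one chromosome; 1 = heterozygous."""
    total = positions_bp[-1] - positions_bp[0]
    roh, start = 0, None
    for g, pos in zip(genotypes, positions_bp):
        if g != 1 and start is None:
            start = pos                            # a homozygous run begins
        elif g == 1 and start is not None:
            if pos - start >= min_len_bp:          # count runs above the threshold
                roh += pos - start
            start = None
    if start is not None and positions_bp[-1] - start >= min_len_bp:
        roh += positions_bp[-1] - start            # run extending to chromosome end
    return roh / total

pos = np.arange(0, 50_000_000, 25_000)             # one 50 Mb chromosome, 25 kb spacing
geno = np.where(np.arange(pos.size) % 200 < 160, 0, 1)  # long homozygous stretches
print(f"F_ROH = {f_roh(geno, pos):.3f}")
```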

5.
For confidentiality reasons, US federal death certificate data are incomplete with regard to the dates of birth and death of the decedents, making calculation of the total lifetime of a decedent impossible and thus estimation of mortality incidence difficult. This paper proposes the use of natality data and an imputation-based method to estimate age-specific mortality incidence rates in the face of this missing information. By utilizing previously determined probabilities of birth, a birth date and death date are imputed for every decedent in the dataset. Thus, the birth cohort of each individual is imputed, and the total on-study time can be calculated. This idea is implemented in two approaches for estimation of mortality incidence rates. The first is an extension of a person-time approach, while the second is an extension of a life table approach. Monte Carlo simulations showed that both approaches perform well in comparison to the ideal complete-data methods, but that the person-time method is preferred. An application to Tay–Sachs disease is demonstrated. It is concluded that the imputation methods proposed provide valid estimates of the incidence of death from death certificate data without the need for the additional assumptions under which usual mortality rates provide valid estimates.
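A minimal sketch of the person-time idea, assuming uniform month-of-birth probabilities and invented counts: imputing a birth date pins down each decedent's exact lifetime, from which person-years in an age band can be accumulated. This illustrates only the mechanics, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
month_probs = np.full(12, 1 / 12)                 # assumed uniform natality
age_at_death = rng.integers(60, 90, size=n)       # whole years, as on the certificate

# Impute a birth date (month from natality probabilities) and a death date
birth_month = rng.choice(12, size=n, p=month_probs)
birth_frac = (birth_month + rng.random(n)) / 12   # birth date as fraction of year
death_frac = rng.random(n)
lifetime = age_at_death + ((death_frac - birth_frac) % 1.0)  # imputed exact lifetime

# Person-years each decedent contributed to the 70-75 age band
py_70_75 = np.clip(np.minimum(lifetime, 75) - 70, 0, None).sum()
deaths_70_75 = np.sum((lifetime >= 70) & (lifetime < 75))
print(f"rate(70-75) = {deaths_70_75 / py_70_75 * 1e5:.0f} per 100,000 person-years")
```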

6.
Estimation methods for the carbon balance of forest ecosystems and their research progress
This paper reviews the two main methods used worldwide for estimating the carbon balance of forest ecosystems: the micrometeorological approach of measuring surface fluxes (the eddy covariance method) and the biomass inventory approach. The advantages and disadvantages of each method are noted, along with the importance of applying the methods in combination. Progress in applying these methods to the study of forest ecosystem carbon balance is briefly introduced, and future trends in forest ecosystem carbon balance research are discussed.
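The eddy covariance method mentioned in the review estimates the vertical turbulent CO2 flux as the covariance between fluctuations of vertical wind speed and CO2 density over an averaging period. A minimal sketch on simulated 10 Hz data; real processing adds coordinate rotation, detrending, and density (WPL) corrections.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 18_000                                    # 30 min at 10 Hz
w = rng.normal(0.0, 0.3, n)                   # vertical wind speed, m s^-1
c = 0.60 - 0.15 * w + rng.normal(0, 0.05, n)  # CO2 density, mg m^-3 (uptake case)

w_prime, c_prime = w - w.mean(), c - c.mean() # fluctuations about the means
flux = np.mean(w_prime * c_prime)             # covariance = turbulent flux
print(f"NEE ~ {flux:.4f} mg CO2 m^-2 s^-1 (negative = uptake)")
```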

7.
Multiple imputation has become a widely accepted technique to deal with the problem of incomplete data. Typically, imputation of missing values and the statistical analysis are performed separately. Therefore, the imputation model has to be consistent with the analysis model. If the data are analyzed with a mixture model, the parameter estimates are usually obtained iteratively. Thus, if the data are missing not at random, parameter estimation and treatment of missingness should be combined. We solve both problems by simultaneously imputing values using the data augmentation method and estimating parameters using the EM algorithm. This iterative procedure ensures that the missing values are properly imputed given the current parameter estimates. Properties of the parameter estimates were investigated in a simulation study. The results are illustrated using data from the National Health and Nutrition Examination Survey.
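A schematic of the combined procedure for a two-component univariate Gaussian mixture: alternate (i) data-augmentation draws of the missing values from the current mixture fit and (ii) an EM update on the completed data. The missingness model itself is omitted here (values are drawn from the marginal mixture), so this is only the skeleton of the MNAR procedure described above.

```python
import numpy as np

rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])
miss = rng.random(y.size) < 0.2
y_obs = y.copy()
y_obs[miss] = np.nan

pi, mu, sd = 0.5, np.array([-1.0, 5.0]), np.array([1.0, 1.0])
for _ in range(200):
    # (i) data augmentation: draw each missing value from the current mixture
    comp = rng.random(miss.sum()) < pi
    y_obs[miss] = np.where(comp, rng.normal(mu[0], sd[0], miss.sum()),
                                 rng.normal(mu[1], sd[1], miss.sum()))
    # (ii) one EM step on the completed data
    d0 = np.exp(-0.5 * ((y_obs - mu[0]) / sd[0]) ** 2) / sd[0] * pi
    d1 = np.exp(-0.5 * ((y_obs - mu[1]) / sd[1]) ** 2) / sd[1] * (1 - pi)
    r = d0 / (d0 + d1)                        # responsibilities for component 0
    pi = r.mean()
    mu = np.array([np.average(y_obs, weights=r), np.average(y_obs, weights=1 - r)])
    sd = np.sqrt([np.average((y_obs - mu[0]) ** 2, weights=r),
                  np.average((y_obs - mu[1]) ** 2, weights=1 - r)])
print(f"pi={pi:.2f}, mu={np.round(mu, 2)}, sd={np.round(sd, 2)}")
```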

8.
Gaussian mixture clustering and imputation of microarray data
MOTIVATION: In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. RESULTS: Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
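A sketch of the two evaluation metrics, with k-means standing in for the paper's Gaussian mixture clustering and naive mean imputation as the method under test: RMSE over the masked entries, and the count of genes whose cluster changes, after matching cluster labels with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
true = np.vstack([rng.normal(m, 1, (100, 20)) for m in (0, 3, 6)])  # 300 genes
mask = rng.random(true.shape) < 0.05
imputed = true.copy()
imputed[mask] = true[~mask].mean()          # naive mean imputation as a baseline

rmse = np.sqrt(np.mean((true[mask] - imputed[mask]) ** 2))

la = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(true)
lb = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(imputed)
conf = np.array([[np.sum((la == i) & (lb == j)) for j in range(3)] for i in range(3)])
rows, cols = linear_sum_assignment(-conf)   # best label matching
mis_clustered = true.shape[0] - conf[rows, cols].sum()
print(f"RMSE = {rmse:.3f}, mis-clustered genes = {mis_clustered}")
```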

9.
An ecological classification system for forest land in the mountainous region of eastern Liaoning
Using a digital elevation model and SPOT-5 remote sensing data as data sources, and drawing on GIS-based spatial analysis, classification and mapping at two hierarchical levels, the ecological land type (ELT) and the ecological land type phase (ELTP), were completed for a representative experimental area in the mountainous region of eastern Liaoning, forming the two lowest levels of an ecological classification system (ECS) hierarchy. Five ELTs and 34 ELTPs were identified within the experimental area. ELTs are delineated on the basis of environmental characteristics and represent the potential distribution of vegetation and the potential productivity of forest ecosystems. The ELTP, a subdivision of the ELT, is the smallest unit of the ecological classification system, corresponding to the subcompartment used in Chinese forestry zoning. ELTPs combine the environmental information of ELTs with information on the composition of existing vegetation, offering spatial precision and clear ecological meaning. In forest management, using ELTPs in place of subcompartments allows forest management plans to be developed scientifically at the landscape scale, management practices to be adjusted, and forest ecosystems to be managed effectively.

10.
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and the imputation error rate, which counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than the imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas the imputation error rate does. Therefore, imputation accuracy can be better compared across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals genotyped at the high-density panel, the SNP density on the low- and high-density panels, the MAF of the imputed SNP and whether the imputed SNP is located at the end of a chromosome. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used, computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
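The contrast between the two measures can be shown in a few lines: perturbing a fixed fraction of genotypes at two different MAFs leaves the correlation-based accuracy essentially unchanged while the allele error rate shifts with MAF. A minimal simulation, with an invented 2% per-genotype error process:

```python
import numpy as np

rng = np.random.default_rng(6)
for maf in (0.30, 0.02):
    true = rng.binomial(2, maf, 10_000)               # dosage coded 0/1/2
    imputed = true.copy()
    flip = rng.random(true.size) < 0.02               # 2% of genotypes perturbed
    imputed[flip] = rng.binomial(2, maf, flip.sum())
    accuracy = np.corrcoef(true, imputed)[0, 1]       # correlation-based accuracy
    error_rate = np.mean(np.abs(true - imputed)) / 2  # fraction of alleles wrong
    print(f"MAF={maf:.2f}: accuracy={accuracy:.3f}, allele error rate={error_rate:.4f}")
```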

11.
Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider the limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide theoretical consideration of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and the reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles depends mostly on the allele count in the reference sample. We summarize the impact of this error on association tests by calculating the r2 between imputed and true genotypes and show that, even when modeling other sources of error, the model misspecification significantly reduces the r2 of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, model misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.

12.
Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation were conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants.
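A minimal sketch of the kappa-style calculation behind IQS, under a simplified reading of the abstract: observed agreement is taken as the average posterior probability assigned to the true genotype, chance agreement comes from the marginal genotype distributions, and IQS = (Po - Pe) / (1 - Pe). The posterior probabilities here are simulated, and this is not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
true = rng.binomial(2, 0.1, n)                  # true genotypes 0/1/2
post = rng.dirichlet(np.ones(3), n)             # imputed P(g=0), P(g=1), P(g=2)
post[np.arange(n), true] += 3.0                 # make posteriors informative
post /= post.sum(axis=1, keepdims=True)

p_obs = post[np.arange(n), true].mean()         # observed agreement
marg_true = np.bincount(true, minlength=3) / n
marg_imp = post.mean(axis=0)
p_chance = (marg_true * marg_imp).sum()         # chance agreement
iqs = (p_obs - p_chance) / (1 - p_chance)
print(f"Po={p_obs:.3f}, Pe={p_chance:.3f}, IQS={iqs:.3f}")
```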

13.
Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, increasingly more markers are being well or better imputed, but at the same time even more markers with relatively low minor allele frequency are being imputed with low imputation quality. Here, we propose new methods that incorporate imputation uncertainty in downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: (I) when posterior probabilities of all potential genotypes are estimated; and (II) when only the one-dimensional summary statistic, the imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages and then apply the EM-LRT to the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is protected under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a similar level of statistical power as EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women's Health Initiative Study, provide further support for the validity and efficiency of our proposed methods.
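For orientation, a sketch of the standard Dosage method that EM-LRT-Dose is compared against: regress the phenotype on the imputed dosage and test the slope with a likelihood-ratio test (using statsmodels and scipy). The EM-LRT variants themselves, which re-sample genotype probabilities, are beyond this minimal baseline; all data are simulated.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(8)
n = 2_000
dosage = rng.binomial(2, 0.2, n) + rng.normal(0, 0.3, n)  # noisy imputed dosage
y = 0.15 * dosage + rng.normal(size=n)                    # quantitative phenotype

full = sm.OLS(y, sm.add_constant(dosage)).fit()
null = sm.OLS(y, np.ones((n, 1))).fit()                   # intercept-only model
lrt = 2 * (full.llf - null.llf)
print(f"LRT = {lrt:.2f}, p = {chi2.sf(lrt, df=1):.2e}")
```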

14.
Wetland classification and inventory: A summary
Regional, national and local wetland classifications have been developed and successfully applied. These have invariably been orientated towards conservation and management goals, with the information used to assess wetland loss or to assign management priorities. Existing national and regional classification systems have not only been useful, but they provide an essential base for developing an international system. At the international level, differences among existing systems in the definition of a wetland and in how wetland types are defined assume great importance and need to be resolved. Classification is an essential prerequisite for wetland inventory. A number of international inventories have been undertaken, although these have generally not utilized the high-technology data capture and storage systems available through remote sensing and geographic information systems. More extensive international inventories will require standardization of techniques for data collection, storage and dissemination. A minimum data set needs to be defined, with standards for data accuracy. An international committee under the auspices of an international agency (e.g. IWRB, Ramsar Bureau, IUCN) needs to be established to develop an international classification system and guidelines for carrying out a complete inventory of the world's wetlands.

15.
Often a binary variable is generated by dichotomizing an underlying continuous variable, measured at a specific time point, according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use the repeated-measures model to impute missing data on responder status resulting from subject dropout and apply the logistic regression model to the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys), which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances, are found to be conservative. The frequentist multiple imputation approach, which fixes the parameters for the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113–124), is shown to be more efficient. We propose to apply Kenward and Roger (1997, Biometrics 53, 983–997) degrees of freedom to account for the uncertainty associated with variance–covariance parameter estimates for the repeated-measures model.
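A minimal sketch of the Bayesian MI pooling (Rubin's rules) that the abstract finds conservative in this setting: the pooled estimate is the mean across imputations, the variance combines within- and between-imputation parts, and Rubin's degrees of freedom follow. The per-imputation logistic-regression results below are placeholders.

```python
import numpy as np

est = np.array([0.42, 0.51, 0.47, 0.39, 0.55])       # log-odds estimates, m = 5 imputations
var = np.array([0.040, 0.038, 0.041, 0.043, 0.039])  # their squared standard errors

m = est.size
q_bar = est.mean()                                   # pooled estimate
w = var.mean()                                       # within-imputation variance
b = est.var(ddof=1)                                  # between-imputation variance
t = w + (1 + 1 / m) * b                              # total variance (Rubin, 1987)
df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2      # Rubin's degrees of freedom
print(f"pooled log-odds = {q_bar:.3f}, SE = {np.sqrt(t):.3f}, df = {df:.1f}")
```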

16.
Forest inventories are largely neglected in the debate over national park selection in Guyana (and probably elsewhere). Because taxonomic data are often scant and biased towards areas of high collecting effort, large-scale forest inventory data can be a useful addition to a knowledge base for forests. In this paper the use of forest inventories to select national parks in Guyana is assessed. With the data of a large-scale inventory, five forest regions could be distinguished, and two more were added on the basis of other existing information. Forest composition in Guyana is largely determined by geology at the national level and by soil type at the regional level. Species diversity is higher in the south of Guyana, possibly due to higher disturbance, and is also higher on the better soils. It is concluded that a selection of national parks in Guyana should include a sample of all seven regions, covering as much soil variation as possible. Because of land use conflicts in central Guyana, this area needs the prompt attention of Guyana's policy makers.

17.
Landscape-index-assisted evaluation of the planning scheme for the Wusuli River National Forest Park
By comparing changes in landscape indices before and after planning, the soundness of the overall planning scheme for the Wusuli River National Forest Park was evaluated. The calculations show that the number of patches decreased from 413 to 401, the Shannon evenness index rose from 0.755 to 0.787, and the dominance index fell from 25.547 to 24.500, indicating reduced landscape fragmentation. The r index and a index, which characterize corridor features, and the landscape gravity values between major scenic spots all increased. These results indicate that the plan improves the park's landscape spatial structure, enhances the stability and scenic quality of the landscape system, and benefits the protection of the ecological environment and the development of ecotourism.
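The Shannon evenness figures above come from standard landscape metrics: diversity H = -sum(p_i ln p_i) over patch-type area proportions and evenness E = H / ln(S), where S is the number of patch types. A minimal sketch with invented proportions:

```python
import numpy as np

def shannon_evenness(area_props):
    p = np.asarray(area_props, dtype=float)
    p = p / p.sum()                       # normalize to proportions
    h = -np.sum(p * np.log(p))            # Shannon diversity
    return h / np.log(p.size)             # evenness E = H / ln(S)

before = [0.45, 0.25, 0.15, 0.10, 0.05]   # hypothetical patch-type proportions
after = [0.35, 0.25, 0.18, 0.12, 0.10]
print(f"E before = {shannon_evenness(before):.3f}, after = {shannon_evenness(after):.3f}")
```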

18.
Assessing the effects of human land use and management decisions requires an understanding of how temporal changes in biodiversity influence the rate of ecosystem functions and the subsequent delivery of ecosystem services. In highly modified anthromes, the spatial distribution of natural vegetation types is often unknown or coarsely represented, challenging comparative analyses seeking to assess changes in biodiversity and potential downstream effects on ecosystem processes and functions. In this context, the objective of this study was to construct a multi-resolution representation of potential natural vegetation at four hierarchical classification levels of increasing floristic and physiognomic detail for the state of Minnesota, USA. Using a collection of natural/near-natural vegetation relevés, a series of Random Forest classification models were used to project the potential distribution of natural vegetation types based on their association with a variety of environmental variables. Model performance varied within and between classification levels, with overall accuracy ranging between 64–99% (kappa 0.44–0.99). Model performance tended to decrease and become more variable with increasing floristic complexity at finer classification levels. Classwise performance metrics, including precision and sensitivity, are also reported. A method for exploring potential class confusion resulting from niche overlap, using Random Forest proximities and Nonmetric Multidimensional Scaling, is demonstrated. Collectively, the results presented here provide an analytically supported baseline representation of potential natural vegetation for the state of Minnesota, USA. These data can provide a backdrop to further analyses surrounding the influence of human activity on ecosystem processes and services, as well as inform future conservation and restoration efforts.
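A minimal sketch of the modeling step, assuming synthetic stand-ins for the relevé-derived training data: a Random Forest classifier predicting vegetation class from environmental covariates, scored with overall accuracy and Cohen's kappa as in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
env = rng.normal(size=(1_500, 6))                 # e.g., climate, terrain, soils
veg = (env[:, 0] + 0.5 * env[:, 1] > 0).astype(int) + (env[:, 2] > 1)  # 3 classes

x_tr, x_te, y_tr, y_te = train_test_split(env, veg, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(x_tr, y_tr)
pred = rf.predict(x_te)
print(f"accuracy = {accuracy_score(y_te, pred):.2f}, kappa = {cohen_kappa_score(y_te, pred):.2f}")
```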

19.
Multiple imputation (MI) has emerged in the last two decades as a frequently used approach for dealing with incomplete data. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing-data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple-imputation-based estimates for various population quantities under different rounding rules, from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics, and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and the specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.
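To make the rounding question concrete, the sketch below contrasts fixed 0.5 rounding with a threshold chosen to match the observed marginal prevalence. Both the data and the marginal-matching rule are illustrative only; note the abstract concludes that good rules should borrow information from other variables rather than rely on marginal characteristics alone.

```python
import numpy as np

rng = np.random.default_rng(10)
observed = rng.binomial(1, 0.15, 800)                  # observed binary values
imputed_cont = rng.normal(0.15, 0.5, 200)              # continuous imputations

naive = (imputed_cont >= 0.5).astype(int)              # fixed 0.5 threshold
thr = np.quantile(imputed_cont, 1 - observed.mean())   # match observed prevalence
matched = (imputed_cont >= thr).astype(int)

print(f"observed prevalence:          {observed.mean():.3f}")
print(f"0.5-rounding prevalence:      {naive.mean():.3f}")
print(f"marginal-matching prevalence: {matched.mean():.3f}")
```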

20.
To test for association between a disease and a set of linked markers, or to estimate relative risks of disease, several different methods have been developed. Many methods for family data require that individuals be genotyped at the full set of markers and that phase can be reconstructed; individuals with missing data are excluded from the analysis. This can result in an important decrease in sample size and a loss of information. A possible solution to this problem is to use missing-data likelihood methods. We propose an alternative approach, namely the use of multiple imputation. Briefly, this method consists of estimating from the available data all possible phased genotypes and their respective posterior probabilities. These posterior probabilities are then used to generate replicate imputed data sets via a data augmentation algorithm. We performed simulations to test the efficiency of this approach for case/parent trio data and found that the multiple imputation procedure generally gave unbiased parameter estimates with correct type I error and confidence interval coverage. Multiple imputation had some advantages over missing-data likelihood methods with regard to ease of use and model flexibility. Multiple imputation methods represent promising tools in the search for disease susceptibility variants.
