Similar literature
Found 20 similar documents (search time: 31 ms)
1.
phylin is a package for the R programming environment that offers different methods to spatially interpolate genetic information from phylogeographic data. These interpolations can be used to predict the spatial occurrence of different lineages within a phylogeny using a modified method of kriging, which allows the use of a genetic distance matrix to derive a model of spatial dependence. phylin improves on the available methods for generating interpolated surfaces from phylogenetic trees by assessing the autocorrelation structure of the genetic information, interpolating the genetic data based on a statistical model, estimating the uncertainty of the predictions, and identifying the probability of lineage occurrence and contact zones without projecting pairwise genetic distances onto mid-points between sample locations. The package also includes methods to plot interpolation surfaces and provide summary tables from the generated data and models. We provide an example of the usefulness of this tool by inferring the spatial occurrence of distinct historical evolutionary lineages of Lataste's viper (Vipera latastei Boscá, 1878) in the Iberian Peninsula and identifying potential contact areas. The maps of phylogenetic patterns obtained with these methods provide a spatial context to test hypotheses related to processes underlying the geographic distribution of genetic diversity and to inform conservation planning.

2.
Variable selection is critical in competing risks regression with high-dimensional data. Although penalized variable selection methods and other machine learning-based approaches have been developed, many of these methods suffer from instability in practice. This paper proposes a novel method named Random Approximate Elastic Net (RAEN). Under the proportional subdistribution hazards model, RAEN provides a stable and generalizable solution to the large-p-small-n variable selection problem for competing risks data. Our general framework allows the proposed algorithm to be applied to other time-to-event regression models, including competing risks quantile regression and accelerated failure time models. Through extensive simulations, we show that variable selection and parameter estimation improve markedly with the new, computationally intensive algorithm. A user-friendly R package, RAEN, has been developed for public use. We also apply our method to a cancer study to identify influential genes associated with death or progression from bladder cancer.

3.
Aim: This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location: State of Vaud, western Switzerland. Methods: Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results: Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that performed better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on the main environmental gradients changed the set of selected predictors, and potentially the shape of their response curves. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to including interactions proved to be an efficient way to account for interactions between all predictors at once.
Main conclusions: Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions, in the final assessment.
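
The absence-weighting step mentioned in this abstract can be illustrated with a short sketch. It assumes that presences keep a weight of 1 and that all absences share a single weight chosen so that the weighted prevalence equals 0.5. The function names are hypothetical and are not part of the grasp package.

```python
def absence_weights(observations):
    """Return one weight per observation so that weighted prevalence is 0.5.

    observations: list of 0/1 values (1 = presence, 0 = absence).
    Assumption: presences keep weight 1.0; all absences share one weight.
    """
    n_presence = sum(1 for y in observations if y == 1)
    n_absence = sum(1 for y in observations if y == 0)
    if n_presence == 0 or n_absence == 0:
        raise ValueError("need at least one presence and one absence")
    # Absences are down- or up-weighted so both classes carry equal total weight.
    w_absence = n_presence / n_absence
    return [1.0 if y == 1 else w_absence for y in observations]


def weighted_prevalence(observations, weights):
    """Weighted share of presences among all observations."""
    total = sum(weights)
    return sum(w for y, w in zip(observations, weights) if y == 1) / total
```

Such weights would typically be passed to the model-fitting routine as case weights; how a given GAM implementation consumes them is outside the scope of this sketch.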

4.
SUMMARY: The development of statistical models linking the molecular state of a cell to its physiology is one of the most important tasks in the analysis of functional genomics data. Because of the large number of variables measured, a comprehensive evaluation of variable subsets cannot be performed with available computational resources. It follows that an efficient variable selection strategy is required. However, although software packages for performing univariate variable selection are available, a comprehensive software environment to develop and evaluate multivariate statistical models using a multivariate variable selection strategy is still needed. To address this issue, we developed GALGO, an R package based on a genetic algorithm variable selection strategy, primarily designed to develop statistical models from large-scale datasets.

5.
Spatial partitioning methods correct for nonstationarity in spatially related data by partitioning the space into regions of local stationarity. Existing spatial partitioning methods can only estimate linear partitioning boundaries. This is inadequate for detecting an arbitrarily shaped anomalous spatial region within a larger area. We propose a novel Bayesian functional spatial partitioning (BFSP) algorithm, which estimates closed curves that act as partitioning boundaries around anomalous regions of data with a distinct distribution or spatial process. Our method uses transitions between a fixed Cartesian and a moving polar coordinate system to model the smooth boundary curves using functional estimation tools. Using adaptive Metropolis-Hastings, the BFSP algorithm simultaneously estimates the partitioning boundary and the parameters of the spatial distributions within each region. Through simulation we show that our method is robust to the shape of the target zone and to region-specific spatial processes. We illustrate our method through the detection of prostate cancer lesions using magnetic resonance imaging.

6.
The ring width of a given year can be highly variable throughout the cross section of a stem. This is especially true for roots. Therefore, the entire circumference of tree rings is often needed for studies focusing on specific reactions of individual trees to certain environmental conditions. Ring reconstructions are also of interest for biomass calculations estimated from the cross-sectional area. The aim of the study is thus to reconstruct tree rings of cross sections within a 3D root-surface model, which will be the basis for an upcoming 3D root-development model. A FARO ScanArm was used for the acquisition of the 3D root structure (Technologies Inc., 2010). Afterwards, ring-width data were measured along 4 radii per cross section and the resulting ring boundaries were integrated into the 3D root model. A weighted interpolation algorithm was used to reconstruct entire ring-width profiles of the cross sections. The algorithm considered the ring-width variations of the adjacent radii as well as the outer shape of the cross section. Hence, the intention was to estimate ring width around the root circumference using ring widths measured along 4 radii and the surface dimensions of roots. Interpolated ring-width data were compared to the measured tree-ring data as a control for the developed interpolation algorithm. Comparisons between modelled and empirical values showed a mean absolute error of about 0.06 mm, and with a few exceptions the growth patterns could be accurately simulated. This has permitted additional radii measurements to be replaced by model interpolations.

7.
In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty, including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only gives more accurate, or at least comparable, estimates, but is also more robust than traditional stepwise variable selection. The proposed methods are applied to analyze health care demand in Germany using the open-source R package mpath.
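
For readers unfamiliar with the three penalties named in this abstract, the sketch below evaluates each one for a single coefficient, using their standard textbook definitions. It is not taken from the mpath package; the function names are hypothetical, and the default shape parameters (`a=3.7` for SCAD, `gamma=3.0` for MCP) are commonly used values assumed here for illustration.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (requires a > 2); linear near zero, constant for large |beta|."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        # Quadratic transition between the linear and the flat region.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (requires gamma > 1)."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return gamma * lam * lam / 2
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, which is why they shrink strong effects less.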

8.
Large sample theory of semiparametric models based on maximum likelihood estimation (MLE) with a shape constraint on the nonparametric component is well studied. Relatively less attention has been paid to the computational aspect of semiparametric MLE. Computing the semiparametric MLE with existing approaches such as the expectation-maximization (EM) algorithm can be prohibitive when the missing rate is high. In this paper, we propose a computational framework for semiparametric MLE based on an inexact block coordinate ascent (BCA) algorithm. We show theoretically that the proposed algorithm converges. This computational framework can be applied to a wide range of data with different structures, such as panel count data, interval-censored data, and degradation data, among others. Simulation studies demonstrate favorable performance compared with existing algorithms in terms of accuracy and speed. Two data sets are used to illustrate the proposed computational method. We further implement the proposed computational method in the R package BCA1SG, available at CRAN.

9.
Mathematical Biosciences, 1986, 79(2): 155-170
A new least squares estimation method for Bezier polynomial curves and surfaces is described and illustrated. The Cartesian coordinates of the vertices of the polygons defining the curves separated natural populations of Anodonta cygnea L., and human sagittal profiles of different sexes and age classes, well. Multivariate comparisons of the coordinates of Bezier polygon vertices and Euclidean distance measures showed that the polygon coordinates revealed shape differences better than distances between homologous points on the curves. Further, polygon coordinates separated groups for more than two variables as well as or better than the equivalent number of distances. In addition, polygon coordinates permit construction of mean shapes and their variances. Possible applications in trend surface analyses and for illustration in computer-aided identification programs are suggested.
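
As background for the polygon vertices discussed in this abstract, the sketch below evaluates a point on a Bezier curve from its control polygon using the Bernstein polynomial form. It does not reproduce the paper's least squares estimation method; the function name is hypothetical.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    control_points: list of equal-length coordinate tuples (the polygon vertices).
    Uses the Bernstein basis: B_i,n(t) = C(n, i) * (1 - t)^(n - i) * t^i.
    """
    n = len(control_points) - 1  # degree of the curve
    dims = len(control_points[0])
    return tuple(
        sum(
            comb(n, i) * (1 - t) ** (n - i) * t ** i * p[d]
            for i, p in enumerate(control_points)
        )
        for d in range(dims)
    )
```

Because the curve is linear in the control points for fixed parameter values, fitting the vertices to observed outline points reduces to a linear least squares problem once each observation has been assigned a value of t.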

10.
SUMMARY: SScore is an R package that facilitates the comparison of gene expression between Affymetrix GeneChips using the S-score algorithm. The S-score algorithm uses probe level data directly to assess differences in gene expression, without requiring a preliminary separate step of probe set expression summary estimation. Therefore, the algorithm avoids introduction of error associated with the expression summary estimation process and has been demonstrated to improve the accuracy of identifying differentially expressed genes. The S-score produces accurate results even when few or no replicates are available. AVAILABILITY: The R package SScore is available from Bioconductor at http://www.bioconductor.org

11.
Because most macroecological and biodiversity data are spatially autocorrelated, special tools for describing spatial structures and dealing with hypothesis testing are usually required. Unfortunately, most of these methods have not been available in a single statistical package. Consequently, using these tools is still a challenge for most ecologists and biogeographers. In this paper, we present sam (Spatial Analysis in Macroecology), a new, easy-to-use, freeware package for spatial analysis in macroecology and biogeography. Through an intuitive, fully graphical interface, this package allows the user to describe spatial patterns in variables and provides an explicit spatial framework for standard techniques of regression and correlation. Moran's I autocorrelation coefficient can be calculated based on a range of matrices describing spatial relationships, for original variables as well as for residuals of regression models, which can also include filtering components (obtained by standard trend surface analysis or by principal coordinates of neighbour matrices). sam also offers tools for correcting the number of degrees of freedom when calculating the significance of correlation coefficients. Explicit spatial modelling using several forms of autoregression and generalized least-squares models is also available. We believe this new tool will provide researchers with the basic statistical tools to resolve autocorrelation problems and, simultaneously, to explore spatial components in macroecological and biogeographical data. Although the program was designed primarily for applications in macroecology and biogeography, most of sam's statistical tools will be useful for all kinds of surface pattern spatial analysis. The program is freely available at http://www.ecoevol.ufg.br/sam (permanent URL at http://purl.oclc.org/sam/).
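
Moran's I, mentioned in this abstract, can be computed directly from a vector of values and a spatial weights matrix. The sketch below uses the standard global formula and is independent of sam, which is a graphical program; the function name and the plain nested-list representation of the weights are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights. weights is an n x n list of lists with
    zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)
```

Positive values indicate that neighbouring sites tend to have similar values; negative values indicate that neighbours tend to differ.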

12.
Geometric morphometrics is the statistical analysis of form based on Cartesian landmark coordinates. After separating shape from overall size, position, and orientation of the landmark configurations, the resulting Procrustes shape coordinates can be used for statistical analysis. Kendall shape space, the mathematical space induced by the shape coordinates, is a metric space that can be approximated locally by a Euclidean tangent space. Thus, notions of distance (similarity) between shapes or of the length and direction of developmental and evolutionary trajectories can be meaningfully assessed in this space. Results of statistical techniques that preserve these convenient properties (such as principal component analysis, multivariate regression, or partial least squares analysis) can be visualized as actual shapes or shape deformations. The Procrustes distance between a shape and its relabeled reflection is a measure of bilateral asymmetry. Shape space can be extended to form space by augmenting the shape coordinates with the natural logarithm of Centroid Size, a measure of size in geometric morphometrics that is uncorrelated with shape for small isotropic landmark variation. The thin-plate spline interpolation function is the standard tool to compute deformation grids and 3D visualizations. It is also central to the estimation of missing landmarks and to the semilandmark algorithm, which permits the inclusion of outlines and surfaces in geometric morphometric analysis. The powerful visualization tools of geometric morphometrics and the typically large number of shape variables give rise to a specific exploratory style of analysis, allowing the identification and quantification of previously unknown shape features.
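
Centroid Size and the first two steps of Procrustes superimposition (removing position and size) can be sketched in a few lines. This is a partial illustration only: it deliberately omits the rotation step, so its output is not yet a set of Procrustes shape coordinates. All function names are hypothetical.

```python
import math


def centroid(landmarks):
    """Mean position of a landmark configuration (list of coordinate lists)."""
    k = len(landmarks)
    dims = len(landmarks[0])
    return [sum(p[d] for p in landmarks) / k for d in range(dims)]


def centroid_size(landmarks):
    """Square root of the summed squared distances of landmarks to their centroid."""
    c = centroid(landmarks)
    return math.sqrt(
        sum((p[d] - c[d]) ** 2 for p in landmarks for d in range(len(c)))
    )


def center_and_scale(landmarks):
    """Remove position and size (not orientation) from a configuration."""
    c = centroid(landmarks)
    cs = centroid_size(landmarks)
    return [[(p[d] - c[d]) / cs for d in range(len(c))] for p in landmarks]
```

Form space, as described in the abstract, would then be obtained by appending `math.log(centroid_size(landmarks))` to the fully superimposed shape coordinates.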

13.
Creating visually pleasing graphs in data visualization programs such as Matlab is surprisingly challenging. One common problem is that the positions and sizes of non-data elements such as textual annotations must typically be specified in either data coordinates or in absolute paper coordinates, whereas it would be more natural to specify them using a combination of these coordinate systems. I propose a framework in which it is easy to express, e.g., “this label should appear 2 mm to the right of the data point at (3, 2)” or “this arrow should point to the datum at (2, 1) and be 5 mm long.” I describe an algorithm for the correct layout of graphs of arbitrary complexity with automatic axis scaling within this framework. An implementation is provided in the form of a complete 2D plotting package that can be used to produce publication-quality graphs from within Matlab or Octave.

14.
Exact area calculation from irregularly distributed data is central to all territorial geochemical balancing methods and to the definition of protection zones. Especially in the deep-sea environment, interpolating measurements into surfaces yields an important gain of information, because data acquisition is cost- and time-intensive. The geostatistical interpolation method indicator kriging is therefore applied to accurately map the spatial distribution of benthic communities, following a categorical classification scheme, at the deep-sea submarine Håkon Mosby Mud Volcano. Georeferenced video mosaics were obtained during several dives by the Remotely Operated Vehicle Victor6000 at a water depth of 1260 m. Mud volcanoes are considered significant source locations for methane, as indicated by unique chemoautotrophic communities such as Beggiatoa mats and pogonophoran tube worms. To detect and quantify their spatial distribution, 2840 georeferenced video mosaics were analysed by visual inspection. Polygons digitised on the georeferenced images within a GIS form the data basis for geostatistically interpolated mono-parametric surface maps. Indicator kriging is applied to the centroids of the polygons to calculate the surface maps. The quality of the surface maps is assessed by leave-one-out cross-validation, which evaluates the fit of the indicator kriging variograms using statistical mean values. Furthermore, the estimate was evaluated against a validation dataset from the visual inspection of 530 video mosaics not included in the interpolation, thus validating the interpolated surfaces independently. With both validation mechanisms we attained satisfactory results, and we assigned a percentage probability of occurrence to each category used to identify biogeochemical habitats.

15.
Species richness patterns of tiger beetles (Coleoptera: Cicindelidae) were analyzed using a grid of 407 squares (137.5 km per side) across northwestern South America (Guyana, Venezuela, Colombia, Ecuador, Peru, Bolivia and western Brazil). Reliable data on species numbers were available for only 149 of the squares. Using a trend surface model (a model used to represent the mean of a spatial process by a polynomial function of spatial coordinates) as well as altitudinal relief and biogeographical influence for each square, we predicted the number of tiger beetle species likely to occur in intermediate squares for which no or unreliable data were available. The resultant spatial patterns of species richness were compared to similar analyses for temperate areas of North America. Intercontinental comparisons and a more complete pattern of species numbers in South America are useful in developing an understanding of general spatial patterns and in the environmental management of species richness.

16.
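Xiong Wei (熊伟), Yang Honglong (杨红龙), Feng Yingzhu (冯颖竹). Acta Ecologica Sinica (生态学报), 2010, 30(18): 5050-5058
Regional simulation has become a new direction in the application of crop models. One problem encountered when applying crop models at the regional scale is the quality of the spatial input data. Studying how meteorological data obtained by different spatial interpolation methods affect the results of regional crop model simulations can provide a reference for research on the sensitivity of regional simulations to input data. Using the regionally calibrated CERES-Maize model, gridded weather data generated by three classes of interpolation methods (geometric interpolation, statistical interpolation, and dynamical model interpolation) were each fed into CERES-Maize to simulate maize production in China for 1961-1990 at a 50 km × 50 km grid resolution. The simulated mean yields for 1980-1990 were compared with the yields surveyed by the agricultural survey teams over the same period, in order to understand how daily meteorological data obtained from different spatial interpolation methods affect regional simulation results. The results showed that: (1) in regional applications of the crop model, all three interpolation methods met the requirements of regional crop simulation for gridded weather data; regional simulations using each of the three weather datasets reflected the spatial variation of mean maize yield and were highly significantly correlated with the surveyed mean grid yields, but the different interpolated weather data caused deviations of up to 8% in the simulation results. (2) In regional crop simulation with different interpolated weather data, the simulation results of the methods were highly significantly correlated with one another, but differed significantly across most of the country.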

17.
Zimmerman DL. Biometrics, 2008, 64(1): 262-270
Summary: The estimation of spatial intensity is an important inference problem in spatial epidemiologic studies. A standard data assimilation component of these studies is the assignment of a geocode, that is, point-level spatial coordinates, to the address of each subject in the study population. Unfortunately, when geocoding is performed by the standard automated method of street-segment matching to a georeferenced road file and subsequent interpolation, it is rarely completely successful. Typically, 10–30% of the addresses in the study population, and even higher percentages in particular subgroups, fail to geocode, potentially leading to a selection bias, called geographic bias, and an inefficient analysis. Missing-data methods could be considered for analyzing such data; however, because there is almost always some geographic information coarser than a point (e.g., a Zip code) observed for the addresses that fail to geocode, a coarsened-data analysis is more appropriate. This article develops methodology for estimating spatial intensity from coarsened geocoded data. Both nonparametric (kernel smoothing) and likelihood-based estimation procedures are considered. Substantial improvements in the estimation quality of coarsened-data analyses relative to analyses of only the observations that geocode are demonstrated via simulation and an example from a rural health study in Iowa.
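
The nonparametric (kernel smoothing) estimator mentioned in this abstract can be illustrated in its simplest form: a Gaussian kernel intensity estimate from fully geocoded points. The sketch below does not implement the article's coarsened-data methodology and applies no edge correction; the function name and the choice of an isotropic Gaussian kernel are assumptions.

```python
import math


def gaussian_kernel_intensity(point, events, bandwidth):
    """Kernel estimate of spatial intensity at `point`.

    events: list of (x, y) event locations; bandwidth: kernel standard deviation.
    Each event contributes a 2D Gaussian density centred on its location, so
    the estimate integrates to the number of events over the whole plane.
    """
    x0, y0 = point
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in events:
        squared_distance = (x - x0) ** 2 + (y - y0) ** 2
        total += norm * math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return total
```

Dropping the addresses that fail to geocode would remove events from `events` non-randomly, which is the geographic bias the abstract describes.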

18.
19.
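Kriging estimation of zinc distribution in contaminated soil using auxiliary variables   Total citations: 10, self-citations: 2, citations by others: 10
Using cokriging and kriging combined with regression, with 36 subsoil (10-20 cm) zinc values as the target variable, another 36 subsoil zinc values as validation data, and 72 topsoil (0-10 cm) zinc values as the auxiliary variable, we interpolated 0.1 mol·L⁻¹ HCl-extractable soil zinc in farmland near a non-ferrous metal processing plant in the southern suburbs of Shenyang, and evaluated the suitability of these two auxiliary-variable kriging methods for studying the spatial distribution of soil zinc. The results showed that kriging combined with regression gave clearly better estimates than cokriging and ordinary kriging. The theoretical variogram model combined with the regression model had a larger coefficient of determination and smaller residuals, its estimation accuracy was 4% higher than that of ordinary kriging, and the soil zinc distribution map based on regression kriging was highly similar to the ordinary kriging map interpolated from 72 sampling points. Cokriging showed no clear advantage over ordinary kriging. Kriging based on a regression model with the aid of auxiliary variables is an effective method for estimating the spatial distribution of soil heavy metals.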

20.
Different spatial interpolation techniques have been applied to construct objective bioclimatic maps of La Palma, Canary Islands. Interpolation of climatic data on this topographically complex island with strong elevation and climatic gradients represents a challenge. Furthermore, meteorological stations are not evenly distributed over the island, with few stations at high elevations. We carried out spatial interpolations of the compensated thermicity index (Itc) and the annual ombrothermic index (Io), in order to obtain appropriate bioclimatic maps by using automatic interpolation procedures, and to establish their relation to potential vegetation units for constructing a climatophilous potential natural vegetation map (CPNV). For this purpose, we used five interpolation techniques implemented in a GIS: inverse distance weighting (IDW), ordinary kriging (OK), ordinary cokriging (OCK), multiple linear regression (MLR), and MLR followed by ordinary kriging of the regression residuals. Two topographic variables (elevation and aspect), derived from a high-resolution digital elevation model (DEM), were included in OCK and MLR. The accuracy of the interpolation techniques was examined using error statistics for test data, derived from comparison of the predicted and measured values. The best results for both bioclimatic indices were obtained with the MLR method with interpolation of the residuals, which showed the highest R² of the regression between observed and predicted values and the lowest root mean square errors. MLR with correction of interpolated residuals is an attractive interpolation method for bioclimatic mapping on this oceanic island, since it permits one to fully account for easily available geographic information while also taking into account local variation of climatic data.
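
Of the five techniques compared in this abstract, inverse distance weighting (IDW) is the simplest to state. The sketch below is a minimal, hypothetical implementation for 2D station data and is not the GIS routine used in the study; the default exponent `power=2.0` is a common choice assumed here.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse distance weighted estimate at `target`.

    target: (x, y) location to predict.
    stations: list of (x, y, value) observations.
    Each station is weighted by 1 / distance**power; a station located
    exactly at the target returns its own value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

IDW uses only horizontal distance, so on an island with strong elevation gradients it cannot exploit covariates such as elevation and aspect, which the regression-based methods in the abstract do.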
