Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
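Ecologists are generally interested in explaining ecological relationships, describing patterns and processes, and making spatial or temporal predictions. These tasks can be accomplished by modeling the relationship between an output value (the response) and a set of features (the explanatory variables). Modeling ecological data is challenging, however, because the response and predictor variables may be continuous or discrete. The ecological relationships to be explained are usually nonlinear, and the explanatory variables interact in complex ways. Missing values in the response and explanatory variables are not uncommon, and outliers also occur frequently in ecological data. In addition, ecologists usually want ecological models to be both easy to build and easy to interpret. A variety of statistical methods are typically used to analyze the distinct ecological problems that arise in different situations; these models include (multinomial) logistic regression, linear models, survival models, analysis of variance, and so on. Random forest is an effective method that can handle all of these problems. Random forests can be used for classification, clustering, regression and survival analysis, assessing variable importance, detecting outliers in data, imputing missing data, and more. Given the algorithmic advantages of random forests, this paper summarizes their applications in ecology, outlines the modeling process, and demonstrates their main features using a case study of modeling the distribution of Yunnan pine. Introducing the general terminology, concepts, and modeling ideas of random forests helps readers grasp the essence of applying the method, and random forests can be expected to see wider application and further development in ecological research.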

2.
Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge on a studied phenomenon, scientists explore predictor variables in order to find the most meaningful (i.e. discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some methods. We expected PCA to perform best. The selected methods were evaluated through fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. In contrast to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take into account the response variable, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.
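
The univariate chi-square screening step described in this abstract can be illustrated with a minimal sketch. It assumes a binary predictor and a binary response summarized as a 2x2 contingency table, uses the uncorrected Pearson statistic, and compares it with the standard 5% critical value for one degree of freedom (3.841). The variable names and counts are hypothetical and are not taken from the study.

```python
# Minimal sketch of univariate Pearson chi-square screening (assumed setup:
# binary predictor x binary response, no continuity correction).

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]].

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Critical value of the chi-square distribution, df = 1, P = 0.05.
CRITICAL_5PCT_DF1 = 3.841

def screen(tables):
    """Return the names of predictors whose statistic exceeds the critical value."""
    return [name for name, t in tables.items()
            if chi_square_2x2(t) > CRITICAL_5PCT_DF1]

# Hypothetical counts: rows = predictor low/high, columns = species absent/present.
candidates = {
    "low_alkalinity": [[30, 10], [10, 30]],  # strongly associated with presence
    "noise_variable": [[20, 20], [20, 20]],  # no association at all
}
selected = screen(candidates)
print(selected)
```

Only predictors that pass this screen would be passed on to a later step-wise sub-selection, which is not shown here.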

3.
Responses to hallucinogenic drugs, such as psilocybin, are believed to be critically dependent on the user's personality, current mood state, drug pre-experiences, expectancies, and social and environmental variables. However, little is known about the order of importance of these variables and their effect sizes in comparison to drug dose. Hence, this study investigated the effects of 24 predictor variables, including age, sex, education, personality traits, drug pre-experience, mental state before drug intake, experimental setting, and drug dose on the acute response to psilocybin. The analysis was based on the pooled data of 23 controlled experimental studies involving 409 psilocybin administrations to 261 healthy volunteers. Multiple linear mixed effects models were fitted for each of 15 response variables. Although drug dose was clearly the most important predictor for all measured response variables, several non-pharmacological variables significantly contributed to the effects of psilocybin. Specifically, having a high score in the personality trait of Absorption, being in an emotionally excitable and active state immediately before drug intake, and having experienced few psychological problems in past weeks were most strongly associated with pleasant and mystical-type experiences, whereas high Emotional Excitability, low age, and an experimental setting involving positron emission tomography most strongly predicted unpleasant and/or anxious reactions to psilocybin. The results confirm that non-pharmacological variables play an important role in the effects of psilocybin.

4.
Gustafson P, Le Nhu D. Biometrics, 2002, 58(4): 878-887
It is well known that imprecision in the measurement of predictor variables typically leads to bias in estimated regression coefficients. We compare the bias induced by measurement error in a continuous predictor with that induced by misclassification of a binary predictor in the contexts of linear and logistic regression. To make the comparison fair, we consider misclassification probabilities for a binary predictor that correspond to dichotomizing an imprecise continuous predictor in lieu of its precise counterpart. On this basis, nondifferential binary misclassification is seen to yield more bias than nondifferential continuous measurement error. However, it is known that differential misclassification results if a binary predictor is actually formed by dichotomizing a continuous predictor subject to nondifferential measurement error. When the postulated model linking the response and precise continuous predictor is correct, this differential misclassification is found to yield less bias than continuous measurement error, in contrast with nondifferential misclassification, i.e., dichotomization reduces the bias due to mismeasurement. This finding, however, is sensitive to the form of the underlying relationship between the response and the continuous predictor. In particular, we give a scenario where dichotomization involves a trade-off between model fit and misclassification bias. We also examine how the bias depends on the choice of threshold in the dichotomization process and on the correlation between the imprecise predictor and a second precise predictor.
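
The opening claim of this abstract, that an imprecisely measured predictor biases the estimated regression coefficient, can be illustrated with a small simulation. The sketch below covers only the simplest case (linear regression with classical, nondifferential additive error in one continuous predictor); it does not reproduce the paper's comparison with binary misclassification. All parameter values are assumptions chosen for illustration. With unit variance for both the true predictor and the error, the fitted slope is expected to shrink toward roughly half of the true slope.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, beta=1.0, error_sd=1.0, seed=0):
    """Return (slope using the precise predictor, slope using the noisy one).

    Assumed model: y = beta * x + noise, and the observed predictor is
    w = x + u, where u is independent of x and y (nondifferential error).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
    w = [xi + rng.gauss(0.0, error_sd) for xi in x]
    return ols_slope(x, y), ols_slope(w, y)

slope_precise, slope_noisy = simulate()
print(round(slope_precise, 2), round(slope_noisy, 2))
```

The attenuation factor in this setting is var(x) / (var(x) + var(u)), so a larger `error_sd` pulls the estimated slope further toward zero.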

5.
We consider the problem of finding a subnetwork in a given biological network (i.e. target network) that is most similar to a given small query network. We aim to find the optimal solution (i.e. the subnetwork with the largest alignment score) with a provable confidence bound. There is no known polynomial time solution to this problem in the literature. Alon et al. have developed a state-of-the-art coloring method that reduces the cost of this problem. This method randomly colors the target network prior to alignment for many iterations until a user-supplied confidence is reached. Here we develop a novel coloring method, named k-hop coloring (k is a positive integer), that achieves a provable confidence value in a small number of iterations without sacrificing the optimality. Our method considers the color assignments already made in the neighborhood of each target network node while assigning a color to a node. This way, it preemptively avoids many color assignments that are guaranteed to fail to produce the optimal alignment. We also develop a filtering method that eliminates the nodes that cannot be aligned without reducing the alignment score after each coloring instance. We demonstrate both theoretically and experimentally that our coloring method outperforms that of Alon et al., which is also used by a number of network alignment methods, including QPath and QNet, by a factor of three without reducing the confidence in the optimality of the result. Our experiments also suggest that the resulting alignment method is capable of identifying functionally enriched regions in the target network successfully.
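
To make the "iterate until a user-supplied confidence is reached" idea concrete, the sketch below computes how many independent random colorings a plain color-coding scheme would need. It assumes the textbook setting: a query of k nodes, k colors assigned uniformly and independently to target nodes, and success in an iteration whenever the optimal k-node subnetwork receives k distinct colors. It does not implement the k-hop coloring proposed in the abstract, and the function names are illustrative.

```python
import math

def colorful_probability(k):
    """Probability that k fixed nodes get k distinct colors under a uniform
    random coloring with k colors: k! / k^k."""
    return math.factorial(k) / k ** k

def iterations_needed(k, confidence):
    """Smallest number of independent colorings such that the optimal
    subnetwork is colorful at least once with the requested probability."""
    p = colorful_probability(k)
    if p >= 1.0:  # k == 1: every coloring succeeds
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

for k in (3, 5, 8):
    print(k, iterations_needed(k, 0.95))
```

Because k! / k^k falls off quickly as k grows, the iteration count rises steeply with query size, which is the cost that a smarter coloring scheme aims to cut.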

6.

Land-use practices in Mongolia can lead to environmental degradation and consequently affect the structure and function of biological communities. The main aim of this study was to determine land-use effects on freshwater macroinvertebrate communities based on their response to grazing and mining, using a trait-based approach (TBA). The functional structure of macroinvertebrate communities was examined using 86 categories of 16 traits. A total of 13 physical and chemical variables were significantly different among the levels of land-use intensity. Significant declines in functional diversity were observed with increased land-use intensity. The community weighted mean of 19 trait categories for 11 traits varied significantly among different levels of land-use intensity. Traits were significantly explained by environmental variables across a land-use intensity gradient. Water temperature, gravel, nitrate, silt, and cobble were the main predictor variables and explained 28% of the total variance of the trait variation. The functional structure of the macroinvertebrate community was strongly related to environmental conditions. The TBA is an important method in assessing disturbance responses in freshwater communities of steppe and taiga regions, such as in Mongolia and other countries in Central Asia and will be useful in finding best management practices for conserving aquatic ecosystems.

7.
Around 1970, the author proposed a general theoretical approach to multiple decision problems (MDPs), of which multiple comparison problems (MCPs) are special cases. Suppose that a sample space 𝒳 is given together with a set of probability distributions P = {P(θ), θ ∈ Ω} defined over 𝒳. Let a finite partition of the parameter space, Ω = ∪ ω(a), a ∈ A, be given. Based on the observation X ∈ 𝒳, an MDP is to decide which ω(a) the true parameter θ belongs to. An MD confidence procedure is a mapping ψ from 𝒳 to the class of subsets of A, such that the probability that ∪ ω(a), a ∈ ψ(X), includes the true parameter θ is not smaller than 1 − α(θ). Here, 1 − α(θ) is called the level of the confidence procedure and may vary depending on θ ∈ ω(a). The MD confidence procedures are derived from the following proposition. When the ω(a)'s are mutually disjoint, there is a one-to-one correspondence between an MD confidence procedure ψ and a collection of (non-randomized) tests for the hypotheses H(a): θ ∈ ω(a) with level α(a), obtained by rejecting the hypothesis H(a) if a ∉ ψ(X). In this paper we discuss in detail the problems of determining the signs or the orderings of normal means. The resulting confidence procedures from the LR tests are seen to be too complicated and difficult to understand. We therefore propose simplified, less powerful methods. We define an overlapping partition of Ω into simple sets, such that the original ω(a)'s can be expressed as an intersection of such simple sets. For each such set we define rejection regions corresponding to the levels α, α/2, ..., α/k. Then we obtain the acceptance regions for H(a): θ ∈ ω(a), given as the intersection of all acceptance regions for the simple sets containing ω(a) at the level α/k, if there are k such simple sets. This method can be extended to obtain sequential confidence procedures.

8.
Variables for predicting assemblage differences change as the geographic extent of studies change, hindering development of useful predictive models where study data are limited, or where the chief predictive variables available are fish zones, river size, physiographic regions, ecoregions, hydrologic units, and river basins. In addition, some studies have shown that site-scale predictor metrics have accounted for more of the variation in fish assemblage response metrics than catchment-scale metrics and other studies have shown the reverse. We used cluster analysis on a 780-site database to determine 12–15 aquatic vertebrate clusters at three geographic extents (all 12 conterminous western U.S. states, all western mountain ecoregions, Pacific Northwest mountain ecoregions). Next, we determined predictor variables for those assemblage clusters through use of stepwise discriminant function analysis. Site longitude, site latitude, and catchment dam count were the most significant predictors at the three geographic extents. Site-scale variables represented most of the significant predictors for all three geographic extents, but explained only slightly more aquatic vertebrate assemblage variance than catchment or pure spatial variables. Catchment- and site-scale classification variables accounted for less than half the mean within-cluster similarity demonstrated by the aquatic vertebrate assemblage clusters. We conclude that (a) the large geographic extent of the analysis did not result in catchment-scale predictor variables being more important than site-scale predictors, (b) both catchment- and site-scale variables are important predictors, and (c) existing river basin and ecoregion classifications are useful but insufficient predictors of aquatic vertebrate assemblages.

9.
We propose a multiple comparison procedure to identify the minimum effective dose level by sequentially comparing each dose level with the zero dose level in the dose finding test. If we can find the minimum effective dose level at an early stage in the sequential test, it is possible to terminate the procedure in the dose finding test after a few group observations up to the dose level. Thus, the procedure is viable from an economical point of view when high costs are involved in obtaining the observations. In the procedure, we present an integral formula to determine the critical values for satisfying a predefined type I familywise error rate. Furthermore, we show how to determine the required sample size in order to guarantee the power of the test in the procedure. In practice, we compare the power of the test and the required sample size for various configurations of the population means in simulation studies and apply our sequential procedure to the dose response test in a case study.

10.
MOTIVATION: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no 'average biologist' client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. RESULTS: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.

11.
Catchment land uses, particularly agriculture and urban uses, have long been recognized as major drivers of nutrient concentrations in surface waters. However, few simple models have been developed that relate the amount of catchment land use to downstream freshwater nutrients. Nor are existing models applicable to large numbers of freshwaters across broad spatial extents such as regions or continents. This research aims to increase model performance by exploring three factors that affect the relationship between land use and downstream nutrients in freshwater: the spatial extent for measuring land use, hydrologic connectivity, and the regional differences in both the amount of nutrients and effects of land use on them. We quantified the effects of these three factors that relate land use to lake total phosphorus (TP) and total nitrogen (TN) in 346 north temperate lakes in 7 regions in Michigan, USA. We used a linear mixed modeling framework to examine the importance of spatial extent, lake hydrologic class, and region on models with individual lake nutrients as the response variable, and individual land use types as the predictor variables. Our modeling approach was chosen to avoid problems of multi-collinearity among predictor variables and a lack of independence of lakes within regions, both of which are common problems in broad-scale analyses of freshwaters. We found that all three factors influence land use-lake nutrient relationships. The strongest evidence was for the effect of lake hydrologic connectivity, followed by region, and finally, the spatial extent of land use measurements. Incorporating these three factors into relatively simple models of land use effects on lake nutrients should help to improve predictions and understanding of land use-lake nutrient interactions at broad scales.

12.
The aim of dose finding studies is sometimes to estimate parameters in a fitted model. The precision of the parameter estimates should be as high as possible. This can be obtained by increasing the number of subjects in the study, N, choosing a good and efficient estimation approach, and by designing the dose finding study in an optimal way. Increasing the number of subjects is not always feasible because of increasing cost, time limitations, etc. In this paper, we assume fixed N and consider estimation approaches and study designs for multiresponse dose finding studies. We work with diabetes dose–response data and compare a system estimation approach that fits a multiresponse Emax model to the data to equation‐by‐equation estimation that fits uniresponse Emax models to the data. We then derive some optimal designs for estimating the parameters in the multi‐ and uniresponse Emax model and study the efficiency of these designs.
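
For readers unfamiliar with the Emax model named in this abstract, the sketch below evaluates the commonly used hyperbolic form for a single response, E(d) = E0 + Emax * d / (ED50 + d). This is an assumption about the model's form (Hill coefficient fixed at 1); the abstract does not state the exact parameterization, and the parameter values and dose grid are hypothetical. The multiresponse estimation and the optimal designs discussed in the paper are not shown.

```python
def emax_response(dose, e0, emax, ed50):
    """Mean response under a hyperbolic Emax model (assumed form).

    e0   : response at dose 0 (placebo effect)
    emax : maximum attainable effect above e0
    ed50 : dose that produces half of emax
    """
    return e0 + emax * dose / (ed50 + dose)

# Hypothetical parameters and dose grid, for illustration only.
E0, EMAX, ED50 = 1.0, 4.0, 10.0
doses = [0.0, 5.0, 10.0, 20.0, 80.0]
curve = [emax_response(d, E0, EMAX, ED50) for d in doses]
print([round(value, 2) for value in curve])
```

The curve rises steeply at low doses and flattens as it approaches E0 + Emax, which is why the placement of doses around ED50 matters for how precisely the parameters can be estimated.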

13.
Understanding the factors that drive the global distribution of alien species is a pivotal issue in invasion biology. Here, we used data on naturalized conifers (Pinaceae, Cupressaceae) from sixty temperate and subtropical regions and five continents to test how environmental and socio‐economic conditions of recipient areas as well as introduction efforts affect naturalization probabilities. We collated 18 predictor variables for each region describing environmental, biogeographic and socio‐economic conditions as well as a measure of the macro‐climatic match with the species' native ranges, and the extent to which alien conifers are used in commercial forestry. Naturalization probabilities across all species and regions were then related to these predictor variables by means of generalized linear mixed models. For both Pinaceae and Cupressaceae, naturalization probabilities were generally higher in the Southern Hemisphere, and increased with indicators of habitat diversity of the recipient region. The match in macro‐climatic conditions between the native and introduced regions was a significant predictor of conifer naturalization, but socio‐economic variables were less powerful predictors. Only for Cupressaceae did a socio‐economic variable (human population density) affect naturalization probabilities. Key attributes facilitating naturalization were related to introduction effort. Moreover, usage in commercial forestry generally fostered naturalization, although the actual size of alien conifer plantations in a region was only correlated with the naturalization of Pinaceae. Our results suggest that climate matching, habitat diversity and introduction effort co‐determine the probability of naturalization, which additionally, is modulated by biogeographic features of the recipient area, such as incidence of natural enemies or competitors. 
To date, the most widely used tools for invasive plant risk assessment only account for climate match and rarely factor in other attributes of the recipient environment. Future tools should additionally consider biotic environment and introduction effort if risk assessment is to be effective.

14.
A method is given for analyzing a slope ratio assay in which a test drug is compared with a standard drug, two or more response variates being measured on each subject at each of several successively increased drug doses. The method requires all subjects to receive the same number of doses, all subjects on the same drug to receive the same doses, the ratio of corresponding doses of the two drugs to be constant over the successive increases, and response variables to be measured only once on each subject at each dose with no missing data allowed. The technique is also applicable when doses are randomly assigned, provided there is no carry-over effect between doses. For each of the J response variates, the relative potency of the test drug with respect to the standard is defined and estimated in the usual way; a 100(1-alpha)% confidence region is then obtained for the vector of the J relative potencies. A procedure is given for testing the equality of some or all of the J relative potencies; an estimator of a common relative potency is obtained by a standard multivariate least squares method. A common relative potency is of interest because the multiple outcome variables are often different indicators of a general physiologic response. The procedures in the paper are illustrated by a simple example concerning the effects of two anesthetics on children.

15.
Considerable heterogeneity exists in the anabolic response to androgen administration; however, the factors that contribute to variation in an individual's anabolic response to androgens remain unknown. We investigated whether testosterone dose and/or any combination of baseline variables, including concentrations of hormones, age, body composition, muscle function, and morphometry or polymorphisms in androgen receptor could explain the variability in anabolic response to testosterone. Fifty-four young men were treated with a long-acting gonadotropin-releasing hormone (GnRH) agonist and one of five doses (25, 50, 125, 300, or 600 mg/wk) of testosterone enanthate (TE) for 20 wk. Anabolic response was defined as a change in whole body fat-free mass (FFM) by dual-energy X-ray absorptiometry (DEXA), appendicular FFM (by DEXA), and thigh muscle volume (by magnetic resonance imaging) during TE treatment. We used univariate and multivariate analysis to identify the subset of baseline measures that best explained the variability in anabolic response to testosterone supplementation. The three-variable model of TE dose, age, and baseline prostate-specific antigen (PSA) level explained 67% of the variance in change in whole body FFM. Change in appendicular FFM was best explained (64% of the variance) by the linear combination of TE dose, baseline PSA, and leg press strength, whereas TE dose, log of the ratio of luteinizing hormone to testosterone concentration, and age explained 66% of the variation in change in thigh muscle volume. The models were further validated by using Ridge analysis and cross-validation in data subsets. Only the model using testosterone dose, age, and PSA was a consistent predictor of change in FFM in subset analyses. The length of CAG tract was only a weak predictor of change in thigh muscle volume and lean body mass. 
Hence, the anabolic response of healthy, young men to exogenous testosterone administration can largely be predicted by the testosterone dose.

16.
Bushmint (Hyptis suaveolens (L.) Poit.) is one of the world's most noxious weeds. Bushmint is rapidly invading tropical ecosystems across the world, including India, and is a major threat to native biodiversity, ecosystems and livelihoods. Knowledge about the areas likely to be invaded by bushmint is of great importance for rapid response and mitigation measures. In the present study, we model the potential invasion range of bushmint in India and investigate the prediction capabilities of two popular species distribution models (SDMs), viz., MaxEnt (Maximum Entropy) and GARP (Genetic Algorithm for Rule-Set Production). We compiled spatial layers on 22 climatic and non-climatic (soil type and land use land cover) environmental variables at India level and selected the 14 least correlated predictor variables. A total of 530 bushmint locations, along with the 14 predictor variables, were used to predict bushmint distribution using MaxEnt and GARP. We demonstrate the relative contribution of predictor variables and species-environmental linkages in modeling bushmint distribution. A receiver operating characteristic (ROC) curve was used to assess each model's performance and robustness. GARP had a relatively lower area under curve (AUC) score (AUC: 0.75), suggesting a lower ability to discriminate between suitable and unsuitable sites. Relative to GARP, MaxEnt performed better, with an AUC value of 0.86. Overall, the outputs of MaxEnt and GARP matched in terms of the geographic regions predicted as suitable or unsuitable for bushmint in India. However, the predictions were closer in spatial extent in Central India and the Western Himalayan foothills than in North-East India, Chotanagpur, the Vindhyans and the Deccan Plateau.
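
The AUC values quoted in this abstract can be read as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen non-presence site. The sketch below computes AUC directly from that pairwise definition, counting ties as one half. The scores are hypothetical and do not come from the MaxEnt or GARP outputs of the study.

```python
def auc(scores_presence, scores_absence):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    Each presence/absence pair scores 1 if the presence site ranks higher,
    0.5 for a tie, and 0 otherwise; AUC is the mean over all pairs.
    """
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))

# Hypothetical suitability scores in [0, 1].
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2]
print(round(auc(presence, absence), 3))
```

An AUC of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to perfect separation of the two groups.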

17.
18.
Consider a general linear model with p-dimensional parameter vector β and i.i.d. normal errors. Let K_1, ..., K_k, and L be linearly independent vectors of constants such that L^T β ≠ 0. We describe exact simultaneous tests for hypotheses that K_i^T β / L^T β equal specified constants using one-sided and two-sided alternatives, and describe exact simultaneous confidence intervals for these ratios. In the case where the confidence set is a single bounded contiguous set, we describe what we claim are the best possible conservative simultaneous confidence intervals for these ratios: best in that they form the minimum k-dimensional hypercube enclosing the exact simultaneous confidence set. We show that in the case of k = 2, this "box" is defined by the minimum and maximum values for the two ratios in the simultaneous confidence set and that these values are obtained via one of two sources: either from the solutions to each of four systems of equations or at points along the boundary of the simultaneous confidence set where the correlation between two t variables is zero. We then verify that these intervals are narrower than those previously presented in the literature.

19.
Abstract. Empirical ecological response surfaces were derived for eight dominant tree species in the boreal forest region of Canada. Stepwise logistic regression was used to model species dominance as a response to five climatic predictor variables. The predictor variables (annual snowfall, degree-days, absolute minimum temperature, annual soil moisture deficit, and actual evapotranspiration summed over the summer months) influence the response of plants more directly than the annual or monthly measures of temperature and precipitation commonly used in response surface modeling. The response surfaces provided estimates of the probability of species dominance across the spatial extent of North America with a high degree of success. Much of the variation in the probability of dominance is apparently related to the species' individualistic response to climatic constraints within different airmass regions. A forest type classification for the Canadian boreal forest region was derived by a cluster analysis based on the probability estimates. Five major forest types were distinguished by the application of a stopping rule. The predicted forest types showed a high degree of geographic correspondence with the distribution of forest types in the actual vegetation mosaic. The distribution of the predicted types also bears a direct relationship to seasonal airmass dynamics in the boreal forest region.

20.
An important issue in dose finding is whether a further dose increment leads to a relevant increase in efficacy. Clinical efficacy should not be assessed with point-zero null hypotheses. Instead, shifted hypotheses for the difference or the ratio can be used. Because the a priori definition of a relevance threshold is frequently difficult, confidence intervals should be used for a posteriori interpretation. Sample size estimation, whether a priori or by adaptive interim analysis, is inherent, because the effective dose steps are arbitrary in un‐designed studies. For simultaneous confidence intervals without order restriction, the exact distributions under the null and the alternative hypotheses are proposed for the general unbalanced one‐way design.

