Similar articles
 20 similar articles found; search time 31 ms
1.
Classification and regression tree (CART) modelling was used to determine infectious hypodermal and haematopoietic necrosis virus (IHHNV) resistance and susceptibility in Penaeus stylirostris. In a previous study, eight random amplified polymorphic DNA (RAPD) markers and viral load values measured by real-time quantitative PCR were obtained; these were used as the training data set to create numerous regression tree models. Specifically, the genetic markers were used as categorical predictor variables and viral load values as the dependent response variable. To determine which model had the highest predictive accuracy for future samples, RAPD fingerprint data were generated from new Penaeus stylirostris IHHNV-resistant and -susceptible individuals and used to test the regression models. The best-performing tree was a four-terminal-node tree with three genetic markers as significant variables. Marker-assisted breeding practices may benefit from the creation of regression tree models that apply genetic markers as predictive factors. To our knowledge, this is the first study to use RAPD markers as predictors within a CART prediction model to determine viral susceptibility.
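
The core step of a regression tree with binary markers as predictors can be sketched as follows. This is only a minimal illustration of choosing one split by reduction in the sum of squared errors, not the models fitted in the study; the marker names `M1` and `M2` and the toy viral-load values are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_marker_split(markers, response):
    """Pick the binary marker whose present/absent split most reduces SSE.

    markers:  list of dicts mapping marker name -> 0/1 (band absent/present)
    response: list of numeric responses (e.g. log viral load), same order
    """
    parent = sse(response)
    best_name, best_gain = None, 0.0
    for name in markers[0]:
        present = [y for row, y in zip(markers, response) if row[name] == 1]
        absent = [y for row, y in zip(markers, response) if row[name] == 0]
        if not present or not absent:
            continue  # this marker does not separate the samples
        gain = parent - (sse(present) + sse(absent))
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Hypothetical toy data, not taken from the study.
toy_markers = [
    {"M1": 1, "M2": 0},
    {"M1": 1, "M2": 1},
    {"M1": 0, "M2": 0},
    {"M1": 0, "M2": 1},
]
toy_load = [2.0, 2.2, 6.0, 6.4]

split_name, split_gain = best_marker_split(toy_markers, toy_load)
```

A full CART procedure would apply this split search recursively to each resulting subset and then prune the tree, for instance by evaluating candidate trees on held-out samples as the abstract describes.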

2.
Latent class regression on latent factors   Total citations: 1, self-citations: 0, citations by others: 1
In public health, psychology, and social science research, many research questions investigate the relationship between a categorical outcome variable and continuous predictor variables. The focus of this paper is to develop a model of this relationship when both the categorical outcome and the predictor variables are latent (i.e. not directly observable). This model extends the latent class regression model so that it can include regression on latent predictors. Maximum likelihood estimation is used, and two numerical methods for performing it are described: the Monte Carlo expectation-maximization algorithm, and Gaussian quadrature followed by a quasi-Newton algorithm. A simulation study is carried out to examine the behavior of the model under different scenarios. A data example involving adolescent health is used for demonstration, in which the latent classes of eating disorder risk are predicted by the latent factor body satisfaction.

3.
Ecologists often develop complex regression models that include multiple categorical and continuous variables, interactions among predictors, and nonlinear relationships between the response and predictor variables. Nomograms, which are graphical devices for presenting mathematical functions and calculating output values, can aid biologists in interpreting and presenting these complex models. To illustrate benefits of nomograms, we developed a logistic regression model of elk (Cervus elaphus) resource selection. With this model, we demonstrated how a nomogram helps scientists and managers interpret interactions among variables, compare the relative biological importance of variables, and examine predicted shapes of relationships (e.g., linear vs. nonlinear) between response and predictor variables. Although our example focused on logistic regression, nomograms are equally useful for other linear and nonlinear models. Regardless of the approach used for model development, nomograms and other graphical summaries can help scientists and managers develop, interpret, and apply statistical models.

4.
Larsen K 《Biometrics》2004,60(1):85-92
Multiple categorical variables are commonly used in medical and epidemiological research to measure specific aspects of human health and functioning. To analyze such data, models have been developed considering these categorical variables as imperfect indicators of an individual's "true" status of health or functioning. In this article, the latent class regression model is used to model the relationship between covariates, a latent class variable (the unobserved status of health or functioning), and the observed indicators (e.g., variables from a questionnaire). The Cox model is extended to encompass a latent class variable as predictor of time-to-event, while using information about latent class membership available from multiple categorical indicators. The expectation-maximization (EM) algorithm is employed to obtain maximum likelihood estimates, and standard errors are calculated based on the profile likelihood, treating the nonparametric baseline hazard as a nuisance parameter. A sampling-based method for model checking is proposed. It allows for graphical investigation of the assumption of proportional hazards across latent classes. It may also be used for checking other model assumptions, such as no additional effect of the observed indicators given latent class. The usefulness of the model framework and the proposed techniques is illustrated in an analysis of data from the Women's Health and Aging Study concerning the effect of severe mobility disability on time-to-death for elderly women.

5.
Using four physical measurements of 394 adult males from Sichuan Province, China (stature, thigh length, leg length and foot length), a series of linear regression equations and ternary regression equations were established. Three stepwise regression equations were also established, including a quaternary regression equation. In general, where two or three independent variables are available, the multiple regression equations established in this study should be applied whenever possible, because they predict more effectively in individual identification in forensic practice.

6.
The effect of four operating variables (enzyme concentration, substrate concentration, flow rate, and reaction volume) on the performance of a CSTR-hollow fiber membrane reactor was studied for the continuous hydrolysis of a soy protein isolate using Pronase. Based on a residence time distribution study, the reactor system was modeled as an ideal CSTR in combination with the Michaelis-Menten equation of enzyme kinetics. This kinetic model correlated conversion with a space-time parameter modified to include all four independent variables. An empirical model based on curvilinear regression analysis was also developed. Both models predicted conversion fairly well, although the kinetic model slightly underpredicted at high conversion.
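
The combination of an ideal CSTR balance with Michaelis-Menten kinetics can be illustrated with a short sketch. It assumes the textbook single-substrate form, with space-time `tau` taken as reaction volume divided by flow rate and `vmax` proportional to enzyme concentration; the modified space-time parameter used in the study is not given in the abstract, and all numeric values below are hypothetical.

```python
import math


def cstr_mm_conversion(s0, km, vmax, tau):
    """Steady-state conversion X in an ideal CSTR with Michaelis-Menten kinetics.

    Mass balance: s0 * X = tau * vmax * S / (km + S), with S = s0 * (1 - X).
    Rearranged:   s0*X**2 - (km + s0 + tau*vmax)*X + tau*vmax = 0.
    """
    b = km + s0 + tau * vmax
    disc = b * b - 4.0 * s0 * tau * vmax  # non-negative for positive inputs
    # The smaller root is the physically meaningful one (0 <= X < 1).
    return (b - math.sqrt(disc)) / (2.0 * s0)


# Hypothetical values in consistent units (e.g. g/L, g/L, g/(L*min), min).
x = cstr_mm_conversion(s0=10.0, km=2.0, vmax=1.0, tau=5.0)
s_out = 10.0 * (1.0 - x)  # outlet substrate concentration
```

In this form, raising the enzyme concentration (through `vmax`) or the reaction volume, or lowering the flow rate, increases `tau * vmax` and therefore the predicted conversion, which is one way a single lumped parameter can carry several operating variables.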

7.
An algorithm for model selection in discrimination with categorical variables is presented. It is based on four models applied hierarchically and linked with a build-up procedure for feature selection. The choice of models and features is guided by consistent cross-validation. Results of an application in medical diagnostics are described.

8.
Data from the first wave of the Irish Longitudinal Study on Ageing are used to examine the relationship between fatness and obesity and employment status among older Irish adults. Employment status is regressed on one of the following measures of fatness: BMI and waist circumference entered linearly as continuous variables and obesity as a categorical variable defined using both BMI and waist circumference. Controls for demographic and socioeconomic characteristics, socioeconomic characteristics in childhood and physical, mental and behavioural health are also included. The regression results for women indicate that all measures of fatness are negatively associated with the probability of being employed and that the employment elasticity associated with waist circumference is larger than the elasticity associated with BMI. The results for men indicate that employment is not significantly associated with BMI and waist circumference when these are entered linearly in the regression, but it is significantly and negatively associated with obesity defined either using BMI or waist circumference as categorical variables. The results also indicate that the negative association between obesity and employment status is larger among women. For example, the probability of being employed for the obese category defined using BMI is around 8 percentage points lower for women and 5 percentage points lower for men.

9.
The importance of micromammals from many points of view, mainly ecological, is stressed. The spatio-temporal distribution of parasites in their hosts can be studied in several ways. Tests carried out in collaboration with the Parasitology Laboratory in the Faculty of Chemistry at the University of Barcelona, the NRCS and the Department of Ecological Studies in Cosenza have contributed to an understanding of helminth communities in relation to several intrinsic variables of the microtheriofauna as well as extrinsic ones, particularly those concerning environment, climate and season. The comparisons were made using statistical methods for categorical and dichotomous variables, which highlight differences in risk and their effects on the system. Quantitative dependent variables were also considered in relation to these qualitative variables. One of the models studied is logistic regression, which estimates a regression function linking the probability of helminth presence (the dependent variable) to biological and ecological parameters (the independent variables) such as gender, age, season of capture, bioclimate, biotope and trapping section.
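
How a logistic regression links the probability of helminth presence to categorical predictors can be sketched as below. The intercept, the coefficients and the dummy-coded predictor names (`male`, `adult`, `autumn`) are hypothetical and serve only to show the form of the model, not results from the study.

```python
import math


def presence_probability(intercept, coefs, x):
    """Logistic regression prediction: P(y = 1) = 1 / (1 + exp(-eta)),
    where eta = intercept + sum of coefficient * predictor value."""
    eta = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for dummy-coded (0/1) categorical predictors.
coefs = {"male": 0.4, "adult": 0.9, "autumn": -0.3}
host = {"male": 1, "adult": 1, "autumn": 0}

p = presence_probability(-1.0, coefs, host)

# exp(coefficient) is the odds ratio for a predictor, holding the others fixed.
odds_ratio_adult = math.exp(coefs["adult"])
```

Each coefficient acts additively on the log-odds scale, so the effect of one predictor on the probability itself depends on the values of the others.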

10.
The Government of Madagascar plans to increase marine protected area coverage by over one million hectares. To assist this process, we compare four methods for marine spatial planning of Madagascar's west coast. Input data for each method were drawn from the same variables: fishing pressure, exposure to climate change, and biodiversity (habitats, species distributions, biological richness, and biodiversity value). The first method compares visual color classifications of primary variables, the second uses binary combinations of these variables to produce a categorical classification of management actions, the third is a target-based optimization using Marxan, and the fourth is conservation ranking with Zonation. We present results from each method, and compare the latter three approaches for spatial coverage, biodiversity representation, fishing cost and persistence probability. All results included large areas in the north, central, and southern parts of western Madagascar. Achieving 30% representation targets with Marxan required twice the fish catch loss of the categorical method. The categorical classification and Zonation do not consider targets for conservation features. However, when we reduced Marxan targets to 16.3%, matching the representation level of the "strict protection" class of the categorical result, the methods show similar catch losses. The management category portfolio has complete coverage, and presents several management recommendations including strict protection. Zonation produces rapid conservation rankings across large, diverse datasets. Marxan is useful for identifying strict protected areas that meet representation targets, and minimize exposure probabilities for conservation features at low economic cost. We show that methods based on Zonation and a simple combination of variables can produce results comparable to Marxan for species representation and catch losses, demonstrating the value of comparing alternative approaches during initial stages of the planning process. Choosing an appropriate approach ultimately depends on scientific and political factors including representation targets, likelihood of adoption, and persistence goals.

11.
We present regression models of species richness for total tree species, two growth forms, rainforest trees (broadleaf evergreens) and eucalypts (sclerophylls), and two large subgenera of Eucalyptus. The correlative models are based on a data set of 166 tree species from 7208 plots in an area of southeastern New South Wales, Australia. Eight environmental variables are used to model the patterns of species richness: four continuous variables (mean annual temperature, rainfall, radiation and plot size) plus four categorical factors (topographic position, lithology, soil nutrient level and rainfall seasonality). Generalized linear modelling with curvilinear and interaction terms is used to derive the models. Each model shows a significant and differing response to the environmental predictors. Maximum species richness of eucalypts occurs at high temperatures, and intermediate rainfall and radiation conditions on ridges with aseasonal rainfall and intermediate nutrient levels. Maximum richness of rainforest species occurs at high temperatures, intermediate rainfall and low radiation in gullies with summer rainfall and high nutrient levels. The eucalypt subgenera models differ in ways consistent with experimental studies of habitat preferences of the subgenera. Curvilinear and interaction terms are necessary for adequate modelling. Patterns of richness vary widely with taxonomic rank and growth form. Any theories of species diversity should be consistent with these correlative models. The models are consistent with an available energy hypothesis based on actual evapotranspiration. We conclude that studies of species richness patterns should include local (e.g. soil nutrients, topographic position) and regional (e.g. mean annual temperature, annual rainfall) environmental variables before invoking concepts such as niche saturation.

12.
Bilder CR  Loughin TM 《Biometrics》2004,60(1):241-248
Questions that ask respondents to "choose all that apply" from a set of items occur frequently in surveys. Categorical variables that summarize this type of survey data are called both pick any/c variables and multiple-response categorical variables. It is often of interest to test for independence between two categorical variables. When both categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses. An intuitively constructed version of the Pearson statistic is proposed to perform the test using bootstrap procedures to approximate its sampling distribution. First- and second-order adjustments to the proposed statistic are given in order to use a chi-square distribution approximation. A Bonferroni adjustment is proposed to perform the test when the joint set of responses for individual subjects is unavailable. Simulations show that the bootstrap procedures hold the correct size more consistently than the other procedures.

13.
Structural equation models (SEMs) of a recursive type with heterogeneous structural coefficients were used to explore biological relationships between gestation length (GL), calving difficulty (CD), and perinatal mortality, also known as stillbirth (SB), in cattle, with the last two traits having categorical expression. An acyclic model was assumed, where recursive effects existed from the GL phenotype to the liabilities (latent variables) to CD and SB and from the liability to CD to that of SB, considering four periods regarding GL. The data contained GL, CD, and SB records from 90,393 primiparous cows, sired by 1122 bulls, distributed over 935 herd-calving year classes. Low genetic correlations between GL and the other calving traits were found, whereas the liabilities to CD and SB were highly and positively correlated genetically. The model indicated that gestations of approximately 274 days of length (3 days shorter than the average) would lead to the lowest CD and SB and confirmed the existence of an intermediate optimum of GL with respect to these traits.

14.
Differential microhabitat use may be beneficial to achieving fitness in seasonally variable environmental conditions. To explore whether the microhabitat use of the nocturnal Schlegel’s Japanese gecko, Gekko japonicus, varies seasonally and depends on juvenile, male, and female reproductive groups, we investigated five categorical and five quantitative measure variables of microhabitat use in a wild population both in spring and summer. Most geckos were found on white, vertical planes of concrete and plastered brick walls. None of the categorical variables (type of location, substrate, substrate color, light source, and refuge) significantly differed according to season or group, while substrate temperature and irradiance at the location where geckos were observed and the distance from the nearest potential refuge were significantly greater in summer than in spring. The quantitative measure variables did not differ among the reproductive groups. These results suggest that G. japonicus seasonally adjusts its microhabitat use mainly in terms of quantitative measure variables rather than categorical variables.

15.
Phosgene has been a long-term subject of toxicological research due to its widespread use, high toxicity, and status as a model of chemically induced lung injury. To take advantage of the abundant data set for the acute inhalation toxicity of phosgene, methods for exposure-response analysis that use more data than the traditional no-observed-adverse-effect level approach were used to perform an exposure-response assessment for phosgene. Categorical regression is particularly useful for acute exposures due to the ability to combine studies of various exposure durations, and thus provide estimates of effect severity for a range of both exposure concentrations and durations. Results from the categorical regression approach were compared to those from parametric curve fitting models (i.e., benchmark concentration models) that make use of information from an entire dose-response, but only for one exposure duration. While categorical regression analysis provided results that were comparable to benchmark concentration results, categorical regression provides an improvement over that technique by accounting for the effects of both exposure concentration and duration on response. The other major advantage afforded by categorical regression is the ability to combine studies, allowing the quantitative use of a larger data set, which increases confidence in the final result.

16.
Questions: How well do GIS‐derived categorical variables (e.g., vegetation, soils, geology, elevation, geography, and physiography) separate plots based on community composition? How does the ability to distinguish plots by community composition vary with spatial scale, specifically number of patch types, patch size and spatial correlation? Both these questions bear on the effective use of stratifying variables in landscape ecology. Location: Arctic tundra; Bering Land Bridge National Preserve, northwestern Alaska, USA. Methods: We evaluated the strength of numerous alternative stratifying variables using the multi‐response permutation procedure (MRPP). We also created groups based on lichen community composition, using cluster analyses, and evaluated the relationship between these groups and groupings within categorical variables using Mantel tests. Each test represents different measures of community separation, which were then evaluated with respect to each variable's spatial characteristics. Results: We found each categorical variable derived from GIS separated lichen communities to some degree. Separation success ranged from strong (Alaska Subsections) to weak (Watersheds and Reindeer Ownership). Lichen community groups derived from cluster analysis demonstrated statistically significant relationships with 13 of the 17 categorical variables. Partialling out effects of spatial distance had little effect on these relationships. Conclusions: Greater number of patch types and larger average patch sizes contribute to optimal success in separating lichen communities; geographic distance did not appear to significantly alter separation success. Group distinctiveness or strength increased with more patch types or groups. Alternatively, congruence between lichen community types derived from cluster analysis and the 17 categorical variables was inversely related to patch size and spatial correlation.

17.
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross‐tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log–linear models. The structure of a log–linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower‐order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high‐dimensional regression or classification procedures because, in addition to a high‐dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high‐dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower‐dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log–linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio‐medical problem in cancer research.

18.
S. Bharati  M. Pal  S. Shome  P. Roy  P. Dhara  P. Bharati 《HOMO》2017,68(6):487-494
Obesity is fast becoming an epidemic among urban children, and it has adverse effects on health status that persist into adulthood. In this paper an attempt is made to assess the percentage of obesity among 6–10-year-old children and to assess the effect of different socio-economic variables and TV watching on childhood obesity. We restricted our study to primary school-going children who attended classes I–IV. The sample consisted of 5216 children from 20 different Bengali-medium and English-medium schools in Kolkata. Categorical logistic regression of obesity on socio-economic factors, namely type of school medium, religion, parents' education, duration of television watching etc., was carried out. The categorical logistic regression shows a significant effect of some of the socio-economic or demographic variables, including the duration of television watching, on obesity. We have seen a positive association between obesity and TV watching and also between obesity and consumption of fast food. This calls for making parents aware and taking action as early as possible.

19.
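The optimization of the ultrasound-assisted extraction of total saponins from Xuan Mugua (宣木瓜) was studied. On the basis of single-factor experiments, extraction time, temperature, ethanol concentration and solid-to-liquid ratio were selected as independent variables and the yield of total saponins as the response value. A Central Composite Design was used to study the effects of the independent variables and their interactions on the extraction yield of total saponins. A predictive regression model was obtained with Design Expert software and response surface analysis was performed. The optimal conditions for ultrasound-assisted extraction of total saponins were determined to be a time of 61.69 min, a temperature of 62.34 °C, an ethanol concentration of 70.49% and a solid-to-liquid ratio of 1:30.57 g/mL; under these conditions, the extraction yield of total saponins reached 1.55%. Validation experiments showed that the model equation predicts the experimental results well.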

20.
The purpose of this study was to determine whether a logistic regression model for the diagnosis of carpal tunnel syndrome (CTS) could be developed. Forty-eight variables were initially identified, for the 28 CTS and 34 non-CTS subjects, including 28 measures of nerve function, 6 anatomical measurements, 8 variables relating to disease symptoms, and 6 variables relating to physical attributes. An a priori clustering procedure was used to establish groups for the principal components analyses. The first principal component of each cluster was then used in a backward, stepwise logistic regression analysis. The best combination of candidate variables, as identified by the regression equation, was Raynaud's symptoms and median nerve motor function. The results of this study indicate that a model for CTS can be generated from a set of variables and that a linear combination of variables representing nerve function is closely associated with conduction decrements resulting from CTS.

