Similar literature
 Found 20 similar articles; search took 343 ms
1.
Predictions of lung cancer incidence and mortality are necessary for planning public health programs and clinical services. It is proposed that generalized additive models (GAMs) are practical for cancer rate prediction. Smooth equivalents for classical age-period, age-cohort, and age-period-cohort models are available using one-dimensional smoothing splines. We also propose using two-dimensional smoothing splines for age and period. Variance estimation can be based on the bootstrap. To assess predictive performance, we compared the models with a Bayesian age-period-cohort model. Model comparison used cross-validation and measures of predictive performance for recent predictions. The models were applied to data from the World Health Organization Mortality Database for females in five countries. Model choice between the age-period-cohort models and the two-dimensional models was equivocal with respect to cross-validation, while the two-dimensional GAMs had very good predictive performance. The Bayesian model performed poorly due to imprecise predictions and the assumption of linearity outside of observed data. In summary, the two-dimensional GAM performed well. The GAMs make the important prediction that female lung cancer rates in these countries will be stable or begin to decline in the future.  相似文献   

2.
The projection of age‐stratified cancer incidence and mortality rates is of great interest due to demographic changes, but also therapeutical and diagnostic developments. Bayesian age–period–cohort (APC) models are well suited for the analysis of such data, but are not yet used in routine practice of epidemiologists. Reasons may include that Bayesian APC models have been criticized to produce too wide prediction intervals. Furthermore, the fitting of Bayesian APC models is usually done using Markov chain Monte Carlo (MCMC), which introduces complex convergence concerns and may be subject to additional technical problems. In this paper we address both concerns, developing efficient MCMC‐free software for routine use in epidemiological applications. We apply Bayesian APC models to annual lung cancer data for females in five different countries, previously analyzed in the literature. To assess the predictive quality, we omit the observations from the last 10 years and compare the projections with the actual observed data based on the absolute error and the continuous ranked probability score. Further, we assess calibration of the one‐step‐ahead predictive distributions. In our application, the probabilistic forecasts obtained by the Bayesian APC model are well calibrated and not too wide. A comparison to projections obtained by a generalized Lee–Carter model is also given. The methodology is implemented in the user‐friendly R‐package BAPC using integrated nested Laplace approximations.  相似文献   

3.
Incidence and mortality figures are needed to get a comprehensive overview of cancer burden. In many countries, cancer mortality figures are routinely recorded by statistical offices, whereas incidence depends on regional cancer registries. However, due to the complexity of updating cancer registries, incidence numbers become available 3 or 4 years later than mortality figures. It is, therefore, necessary to develop reliable procedures to predict cancer incidence at least until the period when mortality data are available. Most of the methods proposed in the literature are designed to predict total cancer (except nonmelanoma skin cancer) or major cancer sites. However, less frequent lethal cancers, such as brain cancer, are generally excluded from predictions because the scarce number of cases makes it difficult to use univariate models. Our proposal comes to fill this gap and consists of modeling jointly incidence and mortality data using spatio-temporal models with spatial and age shared components. This approach allows for predicting lethal cancers improving the performance of individual models when data are scarce by taking advantage of the high correlation between incidence and mortality. A fully Bayesian approach based on integrated nested Laplace approximations is considered for model fitting and inference. A validation process is also conducted to assess the performance of alternative models. We use the new proposals to predict brain cancer incidence rates by gender and age groups in the health units of Navarre and Basque Country (Spain) during the period 2005–2008.  相似文献   

4.
Background: Forecast of disease burden in lung cancer is an important health agenda. One of the main challenges is to predict the evolution of trends in disability-adjusted life year (DALY) of lung cancer so as to anticipate the future burden and to coordinate the supply of sufficient health services and care. Methods: Using 2004–2013 cancer registry data in Guangzhou, we fitted Bayesian age-period-cohort models with age, period, and cohort effects to analyze trends of lung cancer among women, and then made forecast for DALY of lung cancer until 2030. Results: During 2004–2013, there was an annual average of 10,582 DALYs for lung cancer (15.84% of total DALY). In 2014–2030, DALY is expected to reach 234,752 person-years for lung cancer (12.25% of total DALY), with an annual mean of 13,809 DALYs. Lung cancer crude DALY rate is projected to rise steadily from 257.56 (95% uncertainty interval: 165.97–361.22) in 2014 to 316.99 (219.96–419.41) per 100,000 women in 2030, and the rise is mainly seen in 45–64 years age group. Lung cancer DALY rate remains the highest in the 65–89 years age group. Conclusions: Women at 65–89 years carry the highest lung cancer burden among other age groups in Guangzhou. The DALY rate of lung cancer is projected to increase most precipitously for the 45–64 years age group. This indicates that concerted efforts are needed to develop adequate cancer services, and to reassess health resources for control and care of lung cancer in these populations.

5.
Background: To examine changes in prostate cancer incidence and mortality rates, and 5-year relative survival, in relation to changes in the rate of prostate specific antigen (PSA) screening tests and the use of radical prostatectomy (RP) in the Australian population. Methods: Prostate cancer stage-specific incidence rates, 5-year relative survival and mortality rates were estimated using New South Wales Cancer Registry data. PSA screening test rates and RP/Incidence ratios were estimated from Medicare Benefits Schedule claims data. We used multiple imputation to impute stage for cases with “unknown” stage at diagnosis. Annual percentage changes (APC) in rates were estimated using Joinpoint regression. Results: Trends in the age-standardized incidence rates for localized disease largely mirrored the trends in PSA screening test rates, with a substantial ‘spike’ in the rates occurring in 1994, followed by a second ‘spike’ in 2008, and then a significant decrease from 2008 to 2015 (APC −6.7, 95% CI −8.2, −5.1). Increasing trends in incidence rates were observed for regional stage from the early 2000s, while decreasing or stable trends were observed for distant stage since 1993. The overall RP/Incidence ratio increased from 1998 to 2003 (APC 9.6, 95% CI 3.8, 15.6), then remained relatively stable to 2015. The overall 5-year relative survival for prostate cancer increased from 58.4% (95% CI: 55.0–61.7%) in 1981–1985 to 91.3% (95% CI: 90.5–92.1%) in 2011–2015. Prostate cancer mortality rates decreased from 1990 onwards (1990–2006: APC −1.7, 95% CI −2.1, −1.2; 2006–2017: APC −3.8, 95% CI −4.4, −3.1). Conclusions: Overall, there was a decrease in the incidence rate of localized prostate cancer after 2008, an increase in survival over time and a decrease in the mortality rate since the 1990s. This seems to indicate that the more conservative use of PSA screening tests in clinical practice since 2008 has not had a negative impact on population-wide prostate cancer outcomes.

6.
The annual percent change (APC) has been used as a measure to describe the trend in the age-adjusted cancer incidence or mortality rate over relatively short time intervals. The yearly data on these age-adjusted rates are available from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The traditional method to estimate the APC is to fit a linear regression of the logarithm of age-adjusted rates on time using the least squares method or the weighted least squares method, and to use the estimate of the slope parameter to define the APC as the percent change in the rates between two consecutive years. For comparing the APC for two regions, one uses a t-test which assumes that the two datasets on the logarithm of the age-adjusted rates are independent and normally distributed with a common variance. Two modifications of this test, for cases in which there is an overlap between the two regions or between the time intervals for the two datasets, have been recently developed. The first modification relaxes the assumption of the independence of the two datasets but still assumes the common variance. The second modification also relaxes the assumption of the common variance, but assumes that the variances of the age-adjusted rates are obtained using Poisson distributions for the mortality or incidence counts. In this paper, a unified approach to the problem of estimating the APC is undertaken by modeling the counts to follow an age-stratified Poisson regression model, and by deriving a corrected Z-test for testing the equality of two APCs. A simulation study is carried out to assess the performance of the test, and an application of the test to compare the trends, for a selected number of cancer sites, for two overlapping regions and with varied degrees of overlap between time intervals is presented.
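The traditional estimate described in this abstract can be sketched in a few lines. This is a minimal illustration of the unweighted least squares variant only, not the corrected Z-test the paper derives; the function name `annual_percent_change` and the sample rates are hypothetical, and the conversion APC = 100 × (exp(slope) − 1) is the standard one for a log-linear trend.

```python
import math

def annual_percent_change(years, rates):
    """Fit log(rate) = a + b * year by ordinary least squares
    and return the APC, 100 * (exp(b) - 1)."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)

# Hypothetical age-adjusted rates that fall by exactly 2% each year.
years = list(range(2000, 2010))
rates = [50.0 * 0.98 ** (y - 2000) for y in years]
apc = annual_percent_change(years, rates)
```

With rates constructed to fall by 2% per year, the fitted APC should come out at about −2, which gives a quick sanity check before applying the function to real data.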

7.

Background

Time series models can play an important role in disease prediction. Incidence data can be used to predict the future occurrence of disease events. Developments in modeling approaches provide an opportunity to compare different time series models for predictive power.

Results

We applied ARIMA and Random Forest time series models to incidence data of outbreaks of highly pathogenic avian influenza (H5N1) in Egypt, available through the online EMPRES-I system. We found that the Random Forest model outperformed the ARIMA model in predictive ability. Furthermore, we found that the Random Forest model is effective for predicting outbreaks of H5N1 in Egypt.

Conclusions

Random Forest time series modeling provides enhanced predictive ability over existing time series models for the prediction of infectious disease outbreaks. This result, along with those showing the concordance between bird and human outbreaks (Rabinowitz et al. 2012), provides a new approach to predicting these dangerous outbreaks in bird populations based on existing, freely available data. Our analysis uncovers the time-series structure of outbreak severity for highly pathogenic avian influenza (H5N1) in Egypt.

8.

Background

The use of internet search data has been demonstrated to be effective at predicting influenza incidence. This approach may be more successful for dengue which has large variation in annual incidence and a more distinctive clinical presentation and mode of transmission.

Methods

We gathered freely-available dengue incidence data from Singapore (weekly incidence, 2004–2011) and Bangkok (monthly incidence, 2004–2011). Internet search data for the same period were downloaded from Google Insights for Search. Search terms were chosen to reflect three categories of dengue-related search: nomenclature, signs/symptoms, and treatment. We compared three models to predict incidence: a step-down linear regression, generalized boosted regression, and negative binomial regression. Logistic regression and Support Vector Machine (SVM) models were used to predict a binary outcome defined by whether dengue incidence exceeded a chosen threshold. Incidence prediction models were assessed using and Pearson correlation between predicted and observed dengue incidence. Logistic and SVM model performance were assessed by the area under the receiver operating characteristic curve. Models were validated using multiple cross-validation techniques.

Results

The linear model selected by AIC step-down was found to be superior to other models considered. In Bangkok, the model has an , and a correlation of 0.869 between fitted and observed. In Singapore, the model has an , and a correlation of 0.931. In both Singapore and Bangkok, SVM models outperformed logistic regression in predicting periods of high incidence. The AUC for the SVM models using the 75th percentile cutoff is 0.906 in Singapore and 0.960 in Bangkok.

Conclusions

Internet search terms predict incidence and periods of large incidence of dengue with high accuracy and may prove useful in areas with underdeveloped surveillance systems. The methods presented here use freely available data and analysis tools and can be readily adapted to other settings.  相似文献   
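The area under the ROC curve used above to assess the logistic and SVM models can be computed directly from scores and binary labels. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name `roc_auc` and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = incidence above the chosen threshold, 0 = below.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The double loop is quadratic in the number of observations, which is acceptable for weekly or monthly surveillance series; a rank-based version would be preferable for large datasets.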

9.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   Total citations: 1; self-citations: 0; citations by others: 1
Kumar M  Gromiha MM  Raghava GP 《Proteins》2008,71(1):189-194

10.
The higher heating value (HHV) is an important property defining the energy content of biomass fuels. A number of proximate and/or ultimate analysis based predominantly linear correlations have been proposed for predicting the HHV of biomass fuels. A scrutiny of the relationships between the constituents of the proximate and ultimate analyses and the corresponding HHVs suggests that all relationships are not linear and thus nonlinear models may be more appropriate. Accordingly, a novel artificial intelligence (AI) formalism, namely genetic programming (GP) has been employed for the first time for developing two biomass HHV prediction models, respectively using the constituents of the proximate and ultimate analyses as the model inputs. The prediction and generalization performance of these models was compared rigorously with the corresponding multilayer perceptron (MLP) neural network based as also currently available high-performing linear and nonlinear HHV models. This comparison reveals that the HHV prediction performance of the GP and MLP models is consistently better than that of their existing linear and/or nonlinear counterparts. Specifically, the GP- and MLP-based models exhibit an excellent overall prediction accuracy and generalization performance with high (>0.95) magnitudes of the coefficient of correlation and low (<4.5 %) magnitudes of mean absolute percentage error in respect of the experimental and model-predicted HHVs. It is also found that the proximate analysis-based GP model has outperformed all the existing high-performing linear biomass HHV prediction models. In the case of ultimate analysis-based HHV models, the MLP model has exhibited best prediction accuracy and generalization performance when compared with the existing linear and nonlinear models. The AI-based models introduced in this paper due to their excellent performance have the potential to replace the existing biomass HHV prediction models.  相似文献   

11.
Longitudinal analysis investigates period (P), often measured in years. Additional time scales are age (A) and birth cohort (C). The aim of our study was to apply ecological APC analysis to female breast cancer incidence and mortality in Germany. Nationwide new cases and deaths were obtained from the Robert Koch Institute and the female population from federal statistics, 1999–2008. Data were stratified into ten 5-year age groups starting at 20–24 years, ten birth cohorts starting at 1939–43, and two calendar periods, 1999–2003 and 2004–2008. Annual incidence and mortality were calculated as cases per 100,000 women per year. Data were analyzed using the glm and apc packages of R. Breast cancer incidence and mortality increased with age. A secular rise in breast cancer incidence and a decline in mortality were observed for the period 1999–2008. Breast cancer incidence and mortality declined with cohorts; the 1950s cohorts showed the highest incidence and mortality. The age-cohort model best explained incidence and mortality, followed by the age-period-cohort model, with overall declining trends. Declining age-cohort mortality could be probable. Declining age-cohort incidence would require future biological explanations or may be a statistical artefact. The 1949–1958 cohorts could be unique in having the highest incidence and mortality in recent time, or future period associations could emerge as relatively stronger than cohort associations and provide additional explanation of temporal change over cohorts.

12.
13.
Objectives: To compare the trends in prostate cancer incidence, treatment with curative intent and mortality across regions and counties in Norway, and to consider changes in incidence (an indicator for early diagnosis) and treatment with curative intent as explanatory factors for the decreasing prostate cancer mortality rates. Patients and methods: Prostate cancer incidence and mortality data (1980–2007) alongside treatment data (1987–2005) were obtained from the national, population-based Cancer Registry of Norway. Joinpoint regression models were fitted to age-adjusted incidence, treatment and mortality rates to identify linear changes in the trends. Results: Both age-adjusted incidence rates and rates of curative treatment of prostate cancer increased significantly in all five regions of Norway since the early 1990s. There was a strong positive correlation between increasing incidence and increasing use of curative treatment. The frequency of curative treatment in Western Norway was almost threefold that in the Northern and Central regions around year 2000. Subsequently, the regional trends converged and only minor differences in prostate cancer incidence and use of curative treatment were observed by 2005. The declines in mortality were observed earliest in the regions with the highest incidence and the most frequent use of curative treatment, while the largest decreases in mortality were found in counties where the largest increases in curative treatment were observed. Conclusions: The elucidation of the prostate cancer mortality trends is hindered by an inability to tease out the potential effects of early treatment from the more general impact of improved and more active treatment. However, it is likely that both sets of intervention have contributed to the decline in prostate cancer mortality in Norway since 1996.  相似文献   

14.
15.
MIXED MODEL APPROACHES FOR ESTIMATING GENETIC VARIANCES AND COVARIANCES   Total citations: 62; self-citations: 4; citations by others: 58
The limitations of analysis of variance (ANOVA) methods in estimating genetic variances are discussed. Among the three methods for mixed linear models (maximum likelihood, ML; restricted maximum likelihood, REML; and minimum norm quadratic unbiased estimation, MINQUE), the MINQUE method is presented with formulae for estimating variance components and covariance components and for predicting genetic effects. Several genetic models that cannot be appropriately analyzed by ANOVA methods are introduced in the form of mixed linear models. Genetic models with independent random effects can be analyzed by the MINQUE(1) method, which is a MINQUE method with all prior values set to 1. The MINQUE(1) method can give unbiased estimation of variance components and covariance components, and linear unbiased prediction (LUP) of genetic effects. There are more complicated genetic models for plant seeds which involve correlated random effects. The MINQUE(0/1) method, which is a MINQUE method with all prior covariances set to 0 and all prior variances set to 1, is suitable for estimating variance and covariance components in these models. Mixed model approaches have an advantage over ANOVA methods in their capacity to analyze unbalanced data and complicated models. Some problems of estimation and hypothesis testing with the MINQUE method are discussed.

16.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.  相似文献   

17.
Accurate prediction of the phenotypic performance of a hybrid plant based on the molecular fingerprints of its parents should lead to a more cost-effective breeding programme as it allows to reduce the number of expensive field evaluations. The construction of a reliable prediction model requires a representative sample of hybrids for which both molecular and phenotypic information are accessible. This phenotypic information is usually readily available as typical breeding programmes test numerous new hybrids in multi-location field trials on a yearly basis. Earlier studies indicated that a linear mixed model analysis of this typically unbalanced phenotypic data allows to construct ɛ-insensitive support vector machine regression and best linear prediction models for predicting the performance of single-cross maize hybrids. We compare these prediction methods using different subsets of the phenotypic and marker data of a commercial maize breeding programme and evaluate the resulting prediction accuracies by means of a specifically designed field experiment. This balanced field trial allows to assess the reliability of the cross-validation prediction accuracies reported here and in earlier studies. The limits of the predictive capabilities of both prediction methods are further examined by reducing the number of training hybrids and the size of the molecular fingerprints. The results indicate a considerable discrepancy between prediction accuracies obtained by cross-validation procedures and those obtained by correlating the predictions with the results of a validation field trial. The prediction accuracy of best linear prediction was less sensitive to a reduction of the number of training examples compared with that of support vector machine regression. The latter was, however, better at predicting hybrid performance when the size of the molecular fingerprints was reduced, especially if the initial set of markers had a low information content.  相似文献   

18.
Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites), that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR-Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well-defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR-Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source .  相似文献   
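The Matthews correlation coefficient reported above summarizes a binary confusion matrix in a single value between −1 and 1. A minimal sketch of the standard formula follows; the function name `matthews_corrcoef` and the counts are hypothetical and are not AR-Pred results.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts for a balanced validation set.
mcc_perfect = matthews_corrcoef(tp=10, tn=10, fp=0, fn=0)
mcc_chance = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)
mcc_inverted = matthews_corrcoef(tp=0, tn=0, fp=10, fn=10)
```

Unlike accuracy, the MCC stays near zero for a classifier that guesses at random, which makes it a useful companion to the AUC on balanced and unbalanced sets alike.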

19.
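Application of a combined ARIMA and SVM model to pest forecasting   Total citations: 2; self-citations: 0; citations by others: 2
向昌盛  周子英 《昆虫学报》2010,53(9):1055-1060
Pest occurrence is a complex, dynamic time series. Single forecasting models assume either linear or nonlinear data and cannot capture the linear and nonlinear patterns of pest occurrence at the same time, so they rarely reach satisfactory prediction accuracy. In this study, an autoregressive integrated moving average (ARIMA) model was first used to model the linear component of the insect occurrence time series; a support vector machine (SVM) was then used to model the nonlinear component; finally, the predictions of the two models were combined. The combined model was applied to forecasting the occurrence area of the pine caterpillar Dendrolimus punctatus. The experimental results show that the combined model predicts markedly more accurately than either single model and exploits the respective strengths of both. The combined model is a practical and feasible method for pest forecasting.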

20.
Teng S  Luo H  Wang L 《Amino acids》2012,43(1):447-455
Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than the other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites.  相似文献   
