Similar Articles (20 results)
1.

Background

In modern biomedical research of complex diseases, a large number of demographic and clinical variables, herein called phenomic data, are often collected, and missing values (MVs) are inevitable in the data collection process. Since many downstream statistical and bioinformatics methods require a complete data matrix, imputation is a common and practical solution. In high-throughput experiments such as microarray experiments, continuous intensities are measured, and many mature missing-value imputation methods have been developed and widely applied. Large phenomic data, however, contain continuous, nominal, binary and ordinal data types, which precludes the application of most of these methods. Although several methods have been developed in the past few years, no comprehensive guideline has been proposed for phenomic missing-data imputation.

Results

In this paper, we investigated existing imputation methods for phenomic data, proposed a self-training selection (STS) scheme to select the best imputation method, and provided a practical guideline for general applications. We introduced a novel concept of “imputability measure” (IM) to identify missing values that are fundamentally inadequate to impute. In addition, we developed four variations of K-nearest-neighbor (KNN) methods and compared them with two existing methods, multivariate imputation by chained equations (MICE) and missForest. The four variations impute by variables (KNN-V), by subjects (KNN-S), their weighted hybrid (KNN-H) and an adaptively weighted hybrid (KNN-A). We performed simulations and applied the different imputation methods and the STS scheme to three lung disease phenomic datasets to evaluate the methods. An R package, “phenomeImpute”, is made publicly available.
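The methods above are implemented in the authors' R package "phenomeImpute"; as a rough illustration of the two basic KNN directions and a fixed-weight hybrid, the Python sketch below imputes a purely numeric matrix. The distance metric, the handling of mixed data types and the hybrid weight are simplifying assumptions, not the package's actual implementation.

```python
import numpy as np

def knn_impute(X, k=5, by="subjects"):
    """Impute NaNs in a numeric matrix by averaging the k nearest
    rows (by="subjects") or columns (by="variables").
    A simplified sketch: real phenomic data mix continuous, binary,
    nominal and ordinal variables, which need tailored distances."""
    M = X if by == "subjects" else X.T
    out = M.copy()
    n = M.shape[0]
    for i in range(n):
        miss = np.isnan(M[i])
        if not miss.any():
            continue
        # distance to every other row over jointly observed entries
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            both = ~np.isnan(M[i]) & ~np.isnan(M[j])
            if both.any():
                dists[j] = np.sqrt(np.mean((M[i, both] - M[j, both]) ** 2))
        nbrs = np.argsort(dists)[:k]
        for c in np.where(miss)[0]:
            vals = M[nbrs, c]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                out[i, c] = vals.mean()
    return out if by == "subjects" else out.T

def knn_hybrid(X, k=5, w=0.5):
    # KNN-H-style fixed-weight blend of the two directions (w is a guess)
    return w * knn_impute(X, k, "variables") + (1 - w) * knn_impute(X, k, "subjects")
```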

Conclusions

Simulations and applications to real datasets showed that MICE often did not perform well; KNN-A, KNN-H and random forest were among the top performers although no method universally performed the best. Imputation of missing values with low imputability measures increased imputation errors greatly and could potentially deteriorate downstream analyses. The STS scheme was accurate in selecting the optimal method by evaluating methods in a second layer of missingness simulation. All source files for the simulation and the real data analyses are available on the author’s publication website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0346-6) contains supplementary material, which is available to authorized users.

2.

Background

Using current climate models, regional-scale changes for Florida over the next 100 years are predicted to include warming over terrestrial areas and very likely increases in the number of high temperature extremes. No uniform definition of a heat wave exists. Most past research on heat waves has focused on evaluating the aftermath of known heat waves, with minimal consideration of missing exposure information.

Objectives

To identify and discuss methods of handling and imputing missing weather data and how those methods can affect identified periods of extreme heat in Florida.

Methods

In addition to the baseline approach of ignoring missing data, temporal, spatial, and spatio-temporal models are described and used to impute missing historical weather data recorded by 43 Florida weather monitors between 1973 and 2012. Calculated thresholds are used to define periods of extreme heat across Florida.

Results

Modeling of missing data and imputing missing values can affect the identified periods of extreme heat, through the missing data itself or through the computed thresholds. The differences observed are related to the amount of missingness during June, July, and August, the warmest months of the warm season (April through September).

Conclusions

Missing data considerations are important when defining periods of extreme heat. Spatio-temporal methods are recommended for data imputation. A heat wave definition that incorporates information from all monitors is advised.

3.

Objective

The study aim was to evaluate the performance of a novel simultaneous testing model, based on the Finnish Diabetes Risk Score (FINDRISC) and HbA1c, in detecting undiagnosed diabetes and pre-diabetes in Americans.

Research Design and Methods

This cross-sectional analysis included 3,886 men and women (≥ 20 years) without known diabetes from the U.S. National Health and Nutrition Examination Survey (NHANES) 2005-2010. The FINDRISC was developed from eight variables (age, BMI, waist circumference, use of antihypertensive drugs, history of high blood glucose, family history of diabetes, daily physical activity, and fruit and vegetable intake). The sensitivity, specificity, and receiver operating characteristic (ROC) curve of the testing model were calculated for undiagnosed diabetes and pre-diabetes, as determined by an oral glucose tolerance test (OGTT).
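As a worked illustration of the simultaneous (parallel) testing rule evaluated here, the sketch below flags a participant as screen-positive when either the FINDRISC or HbA1c exceeds its cut-off, and computes sensitivity and specificity against an OGTT-defined reference. The data are synthetic and the helper names are illustrative.

```python
import numpy as np

def screen_positive(findrisc, hba1c, hba1c_cut=6.5, findrisc_cut=9):
    """Simultaneous (parallel) testing: positive if EITHER test is positive.
    This raises sensitivity at the cost of specificity, as in the paper."""
    return (findrisc >= findrisc_cut) | (hba1c >= hba1c_cut)

def sens_spec(flagged, disease):
    flagged, disease = np.asarray(flagged), np.asarray(disease, bool)
    sens = flagged[disease].mean()        # true positives / all diseased
    spec = (~flagged[~disease]).mean()    # true negatives / all healthy
    return sens, spec

# toy usage with made-up data
rng = np.random.default_rng(0)
findrisc = rng.integers(0, 20, 1000)
hba1c = rng.normal(5.6, 0.6, 1000)
disease = rng.random(1000) < 0.07        # ~7% undiagnosed diabetes
print(sens_spec(screen_positive(findrisc, hba1c), disease))
```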

Results

The prevalence of undiagnosed diabetes was 7.0%, and that of pre-diabetes was 43.1% (27.7% isolated impaired fasting glucose (IFG), 5.1% impaired glucose tolerance (IGT), and 10.3% both IFG and IGT). The sensitivity and specificity of HbA1c alone were 24.2% and 99.6% for diabetes (cutoff ≥6.5%), and 35.2% and 86.4% for pre-diabetes (cutoff ≥5.7%). The sensitivity and specificity of the FINDRISC alone (cutoff ≥9) were 79.1% and 48.6% for diabetes, and 60.2% and 61.4% for pre-diabetes. The simultaneous testing model combining FINDRISC and HbA1c improved the sensitivity to 84.2% for diabetes and 74.2% for pre-diabetes; its specificity was 48.4% for diabetes and 53.0% for pre-diabetes.

Conclusions

This simultaneous testing model is a practical and valid tool for diabetes screening in the general U.S. population.

4.

Background

Missing data within the comprehensive geriatric assessment of the interRAI suite of assessment instruments potentially imply the under-detection of conditions that require care, as well as the risk of biased statistical results. Impaired oral health in older individuals must be recorded accurately, as it causes pain and discomfort and is related to general health status.

Objective

This study was based on interRAI-Home Care (HC) baseline data from 7590 subjects (mean age 81.2 years, SD 6.9) in Belgium. We investigated whether missingness of the oral health-related items was associated with selected variables of general health, and whether multiple imputation of missing data affected the associations between oral and general health.

Materials and Methods

Multivariable logistic regression was used to determine whether the prevalence of missingness in the oral health-related variables was associated with activities of daily life (ADLH), cognitive performance (CPS2) and depression (DRS). Associations between oral health and ADLH, CPS2 and DRS were then estimated with missing data treated by (1) the complete-case technique and (2) multiple imputation, and the results were compared.
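The abstract does not state which multiple-imputation engine was used; the sketch below only illustrates the generic workflow it describes, i.e. fitting a logistic model on each of m imputed datasets and pooling the log-odds ratios with Rubin's rules, using scikit-learn's IterativeImputer and statsmodels as stand-ins.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pooled_logistic(X, y, m=20):
    """Fit y ~ X on m imputed copies of X and pool the coefficients
    with Rubin's rules. X may contain NaNs; y must be complete."""
    coefs, variances = [], []
    for i in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=i)
        Xi = sm.add_constant(imp.fit_transform(X))
        fit = sm.Logit(y, Xi).fit(disp=0)
        coefs.append(np.asarray(fit.params))
        variances.append(np.diag(fit.cov_params()))
    coefs, variances = np.array(coefs), np.array(variances)
    qbar = coefs.mean(axis=0)                   # pooled estimate
    ubar = variances.mean(axis=0)               # within-imputation variance
    b = coefs.var(axis=0, ddof=1)               # between-imputation variance
    total_var = ubar + (1 + 1 / m) * b          # Rubin's total variance
    return np.exp(qbar), np.sqrt(total_var)     # odds ratios, SE of log-OR
```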

Results

The individual oral health-related variables had a similar proportion of missing values, ranging from 16.3% to 17.2%. The prevalence of missing data in all oral health-related variables was significantly associated with symptoms of depression (dental prosthesis use OR 1.66, CI 1.41–1.95; damaged teeth OR 1.74, CI 1.48–2.04; chewing problems OR 1.74, CI 1.47–2.05; dry mouth OR 1.65, CI 1.40–1.94). Missingness in damaged teeth (OR 1.27, CI 1.08–1.48), chewing problems (OR 1.22, CI 1.04–1.44) and dry mouth (OR 1.23, CI 1.05–1.44) occurred more frequently in cognitively impaired subjects. ADLH was not associated with the prevalence of missing data. When comparing the complete-case technique with the multiple imputation approach, nearly identical odds ratios characterized the associations between oral and general health.

Conclusion

Cognitively impaired and depressive individuals had a higher risk of missing oral health-related information. Associations between oral health and ADLH, CPS2 and DRS were not influenced by multiple imputation of missing data. Further research should concentrate on the mechanisms that mediate the occurrence of missingness, in order to develop preventive strategies.

5.

Background

Randomised controlled trials (RCTs) are perceived as the gold-standard method for evaluating healthcare interventions, and increasingly include quality of life (QoL) measures. The observed results are susceptible to bias if a substantial proportion of outcome data are missing. The review aimed to determine whether imputation was used to deal with missing QoL outcomes.

Methods

A random selection of 285 RCTs published during 2005–2006 in the British Medical Journal, The Lancet, the New England Journal of Medicine and the Journal of the American Medical Association was identified.

Results

QoL outcomes were reported in 61 (21%) trials. Six (10%) reported having no missing data, 20 (33%) reported ≤10% missing, 11 (18%) reported 11%–20% missing, and 11 (18%) reported >20% missing; missingness was unclear in 13 (21%). Missing data were imputed in 19 (31%) of the 61 trials. Imputation was part of the primary analysis in 13 trials, but only a sensitivity analysis in six. Last value carried forward was used in 12 trials and multiple imputation in two. Following imputation, the most common analysis method was analysis of covariance (10 trials).

Conclusion

The majority of studies did not impute missing data and carried out a complete-case analysis. For those studies that did impute missing data, researchers tended to prefer simpler methods of imputation, despite more sophisticated methods being available.

6.

Background

Imputation of genotypes from low-density to higher density chips is a cost-effective method to obtain high-density genotypes for many animals, based on genotypes of only a relatively small subset of animals (reference population) on the high-density chip. Several factors influence the accuracy of imputation and our objective was to investigate the effects of the size of the reference population used for imputation and of the imputation method used and its parameters. Imputation of genotypes was carried out from 50 000 (moderate-density) to 777 000 (high-density) SNPs (single nucleotide polymorphisms).

Methods

The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip and 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals, for which all high-density genotypes were masked, except for the Illumina BovineSNP50 markers. Imputation was studied in a subset of six chromosomes, using the imputation software programs Beagle and DAGPHASE.
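The mask-and-validate design used here (and in several of the other genotype-imputation studies in this list) is easy to express in code; this minimal sketch hides all SNPs of the validation animals except those on the low-density panel, then scores the allelic error rate of the genotypes an external tool (e.g. Beagle or DAGPHASE) imputed back. Genotypes are assumed to be coded as allele counts (0/1/2).

```python
import numpy as np

def mask_to_low_density(geno, ld_cols):
    """geno: animals x SNPs matrix of allele counts (0/1/2).
    Return a copy with every SNP not on the low-density panel set to NaN."""
    masked = np.full(geno.shape, np.nan)
    masked[:, ld_cols] = geno[:, ld_cols]
    return masked

def allelic_error_rate(true_geno, imputed_geno, masked_cols):
    """Mean per-allele discordance on the masked SNPs: genotypes are
    allele counts, so |difference| / 2 is the fraction of wrong alleles."""
    diff = np.abs(true_geno[:, masked_cols] - imputed_geno[:, masked_cols])
    return diff.mean() / 2.0
```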

Results

Imputation with DAGPHASE and Beagle resulted in allelic imputation error rates of 1.91% and 0.87%, respectively, in the dataset with 548 high-density genotypes, with scale and shift parameters of 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. If the information obtained by Beagle was subsequently used in DAGPHASE, imputation error rates were slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, imputation error rates were slightly lower (0.64%). The lowest imputation error rate (0.41%) was obtained with Beagle in the reference set with 1289 high-density genotypes.

Conclusions

For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased with increasing size of the reference population. For applications for which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly.

7.

Background

Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

Methods

The study used a three-step methodology amalgamating multiple imputation, a machine-learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Survey (2009–2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed, weighted multiple logistic regression model included possible confounders and moderators.
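A minimal sketch of the three-step pipeline described above, with scikit-learn and statsmodels standing in for the original software; the survey weights, confounders and moderators of the actual analysis are omitted here, and the choice of imputation engine is an assumption.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def hybrid_select(X, y, n_imputations=20, top_k=21):
    """Step 1: chained-equation imputations; step 2: boosted trees to
    rank biomarkers; step 3: logistic regression on the short list.
    Survey weights, confounders and moderators are omitted in this sketch."""
    importance = np.zeros(X.shape[1])
    for i in range(n_imputations):
        Xi = IterativeImputer(sample_posterior=True,
                              random_state=i).fit_transform(X)
        gb = GradientBoostingClassifier(random_state=i).fit(Xi, y)
        importance += gb.feature_importances_
    shortlist = np.argsort(importance)[::-1][:top_k]
    # step 3: a conventional logistic model on one imputed copy
    Xi = IterativeImputer(random_state=0).fit_transform(X)
    fit = sm.Logit(y, sm.add_constant(Xi[:, shortlist])).fit(disp=0)
    return shortlist, np.exp(np.asarray(fit.params)[1:])  # columns, ORs
```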

Results

After the creation of 20 imputed data sets from multiple chained regression sequences, machine-learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, controlling for possible confounders and moderators, a final set of three biomarkers was selected: red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smoking (p < 0.001).

Conclusion

The systematic use of a hybrid variable selection methodology, fusing machine-learning data mining techniques with traditional statistical modelling, accounted for missing data and the complex survey sampling methodology and proved a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

8.

Background

Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation.

Methods

An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNPs, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation, and segregation analysis.

Results

Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored.

Conclusions

The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations.

9.

Background

In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, an ITT analysis requires that missing outcome data be imputed. Different imputation techniques may give different results, and some may lead to bias. In anti-obesity drug trials, a large proportion of data is usually missing, and the most commonly used imputation method is last observation carried forward (LOCF). LOCF is generally considered conservative, but more reliable methods such as multiple imputation (MI) exist.

Objectives

To compare four different methods of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate.

Methods

We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI.
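For concreteness, here is a minimal sketch of the two single-value imputation techniques compared here, assuming a visits-by-participants matrix of body weights with NaN after dropout; MI requires a proper imputation engine and is not shown.

```python
import numpy as np

def locf(weights):
    """weights: visits x participants array, NaN after dropout.
    Last observation carried forward along the visit axis."""
    out = weights.copy()
    for t in range(1, out.shape[0]):
        gap = np.isnan(out[t])
        out[t][gap] = out[t - 1][gap]
    return out

def bocf(weights):
    """Baseline observation carried forward: every missing visit is
    replaced by the baseline (visit 0) value."""
    out = weights.copy()
    miss = np.isnan(out)
    out[miss] = np.broadcast_to(out[0], out.shape)[miss]
    return out
```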

Results

In total, 561 participants were randomised. Compared to placebo, there was a significantly greater weight loss with topiramate in all analyses: 9.5 kg (SE 1.17) in the complete-case analysis (N = 86), 6.8 kg (SE 0.66) using LOCF (N = 561), 6.4 kg (SE 0.90) using MI (N = 561) and 1.5 kg (SE 0.28) using BOCF (N = 561).

Conclusions

The different imputation methods gave very different results. Contrary to widely stated claims, LOCF did not produce a conservative (i.e., lower) efficacy estimate compared to MI. Also, LOCF had a lower SE than MI.

10.

Background

Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds.

Methods

Accuracy of genotype imputation in Nelore cattle was evaluated using different LD chips, imputation software and sets of animals. Twelve commercial and customized LD chips with densities ranging from 7 K to 75 K were tested. Customized LD chips were virtually designed taking into account minor allele frequency, linkage disequilibrium and distance between markers. Software programs FImpute and BEAGLE were applied to impute genotypes. From 995 bulls and 1247 cows that were genotyped with the Illumina® BovineHD chip (HD), 793 sires composed the reference set, and the remaining 202 younger sires and all the cows composed two separate validation sets for which genotypes were masked except for the SNPs of the LD chip that were to be tested.

Results

Imputation accuracy increased with the SNP density of the LD chip. However, the gain in accuracy with LD chips with more than 15 K SNPs was relatively small because accuracy was already high at this density. Commercial and customized LD chips with equivalent densities presented similar results. FImpute outperformed BEAGLE for all LD chips and validation sets. Regardless of the imputation software used, accuracy tended to increase as the relatedness between imputed and reference animals increased, especially for the 7 K chip.

Conclusions

If the Illumina® BovineHD is considered as the target chip for genomic applications in the Nelore breed, cost-effectiveness can be improved by genotyping part of the animals with a chip containing around 15 K useful SNPs and imputing their high-density missing genotypes with FImpute.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0069-1) contains supplementary material, which is available to authorized users.

11.
PLoS ONE, 2015, 10(11)

Objective

Risk models and scores have been developed to predict incidence of type 2 diabetes in Western populations, but their performance may differ when applied to non-Western populations. We developed and validated a risk score for predicting 3-year incidence of type 2 diabetes in a Japanese population.

Methods

Participants were 37,416 men and women, aged 30 or older, who received periodic health checkup in 2008–2009 in eight companies. Diabetes was defined as fasting plasma glucose (FPG) ≥126 mg/dl, random plasma glucose ≥200 mg/dl, glycated hemoglobin (HbA1c) ≥6.5%, or receiving medical treatment for diabetes. Risk scores on non-invasive and invasive models including FPG and HbA1c were developed using logistic regression in a derivation cohort and validated in the remaining cohort.
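A minimal sketch of the derivation/validation workflow described above, with a logistic model fitted on one cohort and its AUC evaluated on both; the 50/50 split and the omission of the conversion to integer risk-score points are simplifications.

```python
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def develop_and_validate(X, y):
    """Split into derivation and validation cohorts, fit a logistic
    risk model on the former, and report the AUC on both."""
    X_dev, X_val, y_dev, y_val = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)
    fit = sm.Logit(y_dev, sm.add_constant(X_dev)).fit(disp=0)
    auc_dev = roc_auc_score(y_dev, fit.predict(sm.add_constant(X_dev)))
    auc_val = roc_auc_score(y_val, fit.predict(sm.add_constant(X_val)))
    return auc_dev, auc_val
```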

Results

The area under the curve (AUC) for the non-invasive model, which included age, sex, body mass index, waist circumference, hypertension, and smoking status, was 0.717 (95% CI, 0.703–0.731). In the invasive model, in which both FPG and HbA1c were added to the non-invasive model, the AUC increased to 0.893 (95% CI, 0.883–0.902). When the risk scores were applied to the validation cohort, the AUCs (95% CI) for the non-invasive and invasive models were 0.734 (0.715–0.753) and 0.882 (0.868–0.895), respectively. Participants with a non-invasive score of ≥15 and an invasive score of ≥19 were projected to have >20% and >50% risk, respectively, of developing type 2 diabetes within 3 years.

Conclusions

The simple risk score of the non-invasive model might be useful for predicting incident type 2 diabetes, and its predictive performance may be markedly improved by incorporating FPG and HbA1c.

12.

Objective

Type 2 diabetes has a long preclinical asymptomatic phase, and early detection may delay or arrest disease progression. The Diabetes Mellitus and Vascular health initiative (DMVhi) was established as a prospective longitudinal cohort study of the prevalence of undiagnosed type 2 diabetes and prediabetes, diabetes risk and cardiovascular risk in a cohort of Irish adults aged 45-75 years.

Research Design and Methods

Members of the largest Irish private health insurance provider aged 45 to 75 years were invited to participate. Exclusion criteria were a prior diagnosis of diabetes or current use of oral hypoglycaemic agents. Participants completed a detailed medical questionnaire and had weight, height, waist and hip circumference, and blood pressure measured. Fasting blood samples were taken for fasting plasma glucose (FPG), and those with FPG in the impaired fasting glucose (IFG) range underwent a 75 g oral glucose tolerance test.

Results

In total, 122,531 subjects were invited to participate, and 29,144 (24%) completed the study. The prevalence of undiagnosed diabetes was 1.8%, of impaired fasting glucose (IFG) 7.1%, and of impaired glucose tolerance (IGT) 2.9%. The prevalence of dysglycaemia increased across the 45-54, 55-64 and 65-75 year age groups in both males (10.6%, 18.5% and 21.7%, respectively) and females (4.3%, 8.6% and 10.9%, respectively). Undiagnosed T2D, IFG and IGT were all associated with gender, age, blood pressure, BMI, abdominal obesity, family history of diabetes and triglyceride levels. Using FPG as the initial screening test may underestimate the prevalence of T2D in the study population.

Conclusions

This study is the largest screening study for diabetes and prediabetes in the Irish population. Follow-up of this cohort will provide data on progression to diabetes and on cardiovascular outcomes.

13.

Background

The main goal of our study was to investigate the implementation, prospects, and limits of marker imputation for quantitative genetic studies contrasting map-independent and map-dependent algorithms. We used a diversity panel consisting of 372 European elite wheat (Triticum aestivum L.) varieties, which had been genotyped with SNP arrays, and performed intensive simulation studies.

Results

Our results clearly showed that imputation accuracy was substantially higher for map-dependent than for map-independent methods. The accuracy of marker imputation depended strongly on the linkage disequilibrium between the markers in the reference panel and the markers to be imputed. Given the decay of linkage disequilibrium observed in European wheat, we concluded that around 45,000 markers are needed for low-cost, low-density marker profiling; this will facilitate high imputation accuracy, including for rare alleles. Genomic selection and diversity studies profited only marginally from imputing missing values. In contrast, the power of association mapping increased substantially when missing values were imputed.

Conclusions

Imputing missing values is especially of interest for an economic implementation of association mapping in breeding populations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1366-y) contains supplementary material, which is available to authorized users.

14.

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation of completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods, based on either allele or haplotype frequencies, were applied to infer genotypes at ambiguous loci, and their results were compared with those of two available software packages. The quality of imputation under different population structures was assessed, and the impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.

15.

Background

Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives.

Methods

Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows, in which part of the pedigree was genotyped. The software AlphaImpute was used for imputation, both in its standard settings and without phasing, i.e. using basic inheritance rules and segregation analysis only. Three scenarios were evaluated: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population.

Results

The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively.
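The accuracy measure used here, a correlation corrected for mean gene content, can be sketched as below; centring each SNP by its mean gene content before correlating is one plausible reading of that correction, so treat this as an assumption rather than the authors' exact formula.

```python
import numpy as np

def corrected_accuracy(true_geno, dosages):
    """Per-animal correlation between true allele counts and imputed
    dosages after centring each SNP by its mean gene content, so that
    allele-frequency differences alone cannot inflate the accuracy.
    (One plausible reading of the correction described in the paper.)"""
    centred_true = true_geno - true_geno.mean(axis=0)
    centred_dose = dosages - dosages.mean(axis=0)
    accs = [np.corrcoef(t, d)[0, 1]
            for t, d in zip(centred_true, centred_dose)]
    return np.array(accs)
```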

Conclusions

At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring.

16.

Background

Differentiated thyroid carcinoma (DTC) is associated with increased mortality. Few studies have constructed predictive models of all-cause mortality with high discriminating power for this disease that would make it possible to determine which patients are most likely to die.

Objective

To construct a predictive model of all-cause mortality at 5, 10, 15 and 20 years for patients diagnosed with and treated surgically for DTC for use as a mobile application.

Design

We undertook a retrospective cohort study using data from 1984 to 2013.

Setting

All patients diagnosed with and treated surgically for DTC at a general university hospital covering a population of around 200,000 inhabitants in Spain.

Participants

The study involved 201 patients diagnosed with and treated surgically for DTC (174, papillary; 27, follicular).

Exposures

Age, gender, town, family history, type of surgery, type of cancer, histological subtype, microcarcinoma, multicentricity, TNM staging system, diagnostic stage, permanent post-operative complications, local and regional tumor persistence, distant metastasis, and radioiodine therapy.

Main outcome measure

All-cause mortality.

Methods

A multivariate Cox regression model was constructed to determine which variables at diagnosis were associated with mortality. Using the model, a points-based risk table was constructed, with the likelihood of death estimated from the sum of all points. This was then incorporated into a mobile application.

Results

The mean follow-up was 8.8 ± 6.7 years. All-cause mortality was 12.9% (95% confidence interval [CI]: 8.3–17.6%). The predictive variables were older age, local tumor persistence and distant metastasis. The area under the ROC curve was 0.81 (95% CI: 0.72–0.91, p < 0.001).

Conclusion

This study provides a practical clinical tool giving a simple and rapid indication (via a mobile application) of which patients with DTC are at risk of dying within 5, 10, 15 or 20 years. Nonetheless, caution should be exercised until validation studies have corroborated our results.

17.

Background

Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.

Methods

In total, 5153 animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed using a validation set comprising the 20% youngest animals, whose marker genotypes were masked in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.

Results

Mean allele imputation error rates ranged from 0.31% to 2.41%, depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, probably because of genome assembly errors, and we recommend discarding these SNPs in future studies. Differences in imputation accuracy between breeds were related to the size of the high-density-genotyped sample and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium had limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of alleles were correctly imputed if more than 300 animals were genotyped at high density. No improvement was observed when multi-breed imputation was performed.

Conclusion

In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.

18.

Objective

To investigate associations between retinal microvascular changes and cognitive impairment in newly diagnosed type 2 diabetes mellitus.

Design

Case-control study.

Setting

A primary care cohort with newly diagnosed type 2 diabetes mellitus.

Methods

For this analysis, we compared 69 cases with lowest decile scores (for the cohort) on the Modified Telephone Interview for Cognitive Status and 68 controls randomly selected from the remainder of the cohort. Retinal images were rated and the following measures compared between cases and controls: retinal vessel calibre, arterio-venous ratio, retinal fractal dimension, and simple and curvature retinal vessel tortuosity.

Results

Total and venular (but not arteriolar) simple retinal vessel tortuosity levels were significantly higher in cases than in controls (t = 2.45, p = 0.015 and t = 2.53, p = 0.013, respectively). These associations persisted after adjustment for demographic factors, retinopathy, neuropathy, obesity and blood pressure. There were no other significant differences between cases and controls in retinal measures.

Conclusions

A novel association was found between higher venular tortuosity and cognitive impairment in newly diagnosed type 2 diabetes mellitus. This might be accounted for by factors such as hypoxia, thrombus formation, increased vasoendothelial growth factor release and inflammation affecting both the visible retinal and the unobserved cerebral microvasculature.

19.

Background

Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.

Methods

Genotypes at 639 214 SNPs were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNPs except those included in the BovineSNP50 BeadChip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four scenarios, with reference populations of 50, 100, 200 and 400 animals selected to account for 78.03%, 89.21%, 97.47% and >99% of the gene pool of the genotyped population, respectively.

Results

Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even in the scenarios with only 50 or 100 reference animals. Using MaCH and Minimac, the correlation between true and imputed genotypes was >0.975 with only 100 reference animals. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNPs.

Conclusions

Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.

20.

Objectives

To predict, in an Australian Aboriginal community, the 10-year absolute risk of type 2 diabetes associated with waist circumference and age at baseline examination.

Method

A sample of 803 diabetes-free adults (82.3% of the age-eligible population), drawn from baseline data collected from 1992 to 1998, was followed up for up to 20 years, until 2012. A Cox proportional hazards model was used to estimate the effects of waist circumference and other risk factors, including age, smoking and alcohol consumption status, on the prediction of type 2 diabetes in males and females, with cases identified through hospitalisation data during the follow-up period. A Weibull regression model was used to calculate absolute risk estimates of type 2 diabetes with waist circumference and age as predictors.
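As a worked illustration of how a fitted Weibull model yields the absolute risks quoted below: under a Weibull proportional-hazards parameterisation, the risk by time t is 1 - exp(-exp(lp) * t^shape), where lp is the linear predictor. The intercept, coefficients and shape in this sketch are made-up placeholders, not the fitted values.

```python
import numpy as np

def weibull_absolute_risk(t_years, lp, shape):
    """Absolute risk of disease by time t under a Weibull PH model:
    risk(t | x) = 1 - exp(-exp(lp) * t**shape),
    with lp the linear predictor (intercept + betas * covariates)."""
    return 1.0 - np.exp(-np.exp(lp) * t_years ** shape)

# e.g. a 45-year-old with waist 100 cm, using hypothetical parameters
lp = -12.0 + 0.04 * 45 + 0.05 * 100   # made-up intercept and betas
print(weibull_absolute_risk(10, lp, shape=1.3))  # ~0.10, i.e. ~10%
```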

Results

Of the 803 participants, 110 were recorded in subsequent hospitalizations as having developed type 2 diabetes over a follow-up of 12,633.4 person-years. Waist circumference was strongly associated with subsequent diagnosis of type 2 diabetes (p < 0.0001 for both genders) and remained statistically significant after adjusting for confounding factors. Hazard ratios of type 2 diabetes associated with a 1 standard deviation increase in waist circumference were 1.7 (95% CI 1.3 to 2.2) for males and 2.1 (95% CI 1.7 to 2.6) for females. At 45 years of age with a baseline waist circumference of 100 cm, a male had an absolute diabetes risk of 10.9%, while a female had a 14.3% risk of the disease.

Conclusions

The constructed model predicts the 10-year absolute diabetes risk in an Aboriginal Australian community. It is simple and easily understood and will help identify individuals at risk of diabetes in relation to waist circumference values. Our findings on the relationship between waist circumference and diabetes by gender will be useful for clinical consultation, public health education and establishing waist circumference cut-off points for Aboriginal Australians.
