Similar literature
 20 similar articles found (search time: 31 ms)
1.
《Genomics》2019,111(5):1115-1123
Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared to continuous markers and categorical disease status, prognosis has been less investigated, with the additional challenges brought by the unique characteristics of survival outcomes. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In this study, for prognosis data, we develop a robust G-E interaction identification approach using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on the quantile regression technique (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, it yields biologically sensible findings that differ from those obtained with the alternatives.
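The robustness to long-tailed outcomes mentioned in the abstract above comes from the quantile (check) loss that underlies quantile regression. The following is a minimal sketch of that loss only, not the CQPCorr procedure itself; the function names, the example data, and the optional `weights` argument (a stand-in for censoring weights) are assumptions made for illustration.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def weighted_quantile(values, tau, weights=None):
    """Return the observed value that minimizes the weighted check loss.

    `weights` is a hypothetical stand-in for censoring weights and must be
    non-negative; equal weights are used when none are given.
    """
    if weights is None:
        weights = [1.0] * len(values)

    def total_loss(candidate):
        return sum(w * check_loss(v - candidate, tau)
                   for v, w in zip(values, weights))

    # The loss is piecewise linear and convex with kinks at the data points,
    # so a minimizer can always be found among the observed values.
    return min(sorted(values), key=total_loss)


# Illustrative survival times with one extreme, long-tailed observation.
times = [1.0, 2.0, 3.0, 4.0, 100.0]
median = weighted_quantile(times, tau=0.5)
mean = sum(times) / len(times)
print(median, mean)
```

With these made-up values the check-loss minimizer at tau = 0.5 stays at 3.0, while the mean is pulled to 22.0 by the single extreme observation.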

2.

Background

Chronic kidney disease (CKD) is common, and associated with increased risk of cardiovascular disease and end-stage renal disease, which are potentially preventable through early identification and treatment of individuals at risk. Although risk factors for occurrence and progression of CKD have been identified, their utility for CKD risk stratification through prediction models remains unclear. We critically assessed risk models to predict CKD and its progression, and evaluated their suitability for clinical use.

Methods and Findings

We systematically searched MEDLINE and Embase (1 January 1980 to 20 June 2012). Dual review was conducted to identify studies that reported on the development, validation, or impact assessment of a model constructed to predict the occurrence/presence of CKD or progression to advanced stages. Data were extracted on study characteristics, risk predictors, discrimination, calibration, and reclassification performance of models, as well as validation and impact analyses. We included 26 publications reporting on 30 CKD occurrence prediction risk scores and 17 CKD progression prediction risk scores. The vast majority of CKD risk models had acceptable-to-good discriminatory performance (area under the receiver operating characteristic curve>0.70) in the derivation sample. Calibration was less commonly assessed, but overall was found to be acceptable. Only eight CKD occurrence and five CKD progression risk models have been externally validated, displaying modest-to-acceptable discrimination. Whether novel biomarkers of CKD (circulatory or genetic) can improve prediction largely remains unclear, and impact studies of CKD prediction models have not yet been conducted. Limitations of risk models include the lack of ethnic diversity in derivation samples, and the scarcity of validation studies. The review is limited by the lack of an agreed-on system for rating prediction models, and the difficulty of assessing publication bias.

Conclusions

The development and clinical application of renal risk scores is in its infancy; however, the discriminatory performance of existing tools is acceptable. The effect of using these models in practice is still to be explored. Please see later in the article for the Editors' Summary

3.
Many existing cohort studies initially designed to investigate disease risk as a function of environmental exposures have collected genomic data in recent years with the objective of testing for gene-environment interaction (G × E) effects. In environmental epidemiology, interest in G × E arises primarily after a significant effect of the environmental exposure has been documented. Cohort studies often collect rich exposure data; as a result, assessing G × E effects in the presence of multiple exposure markers further increases the burden of multiple testing, an issue already present in both genetic and environmental health studies. Latent variable (LV) models have been used in environmental epidemiology to reduce dimensionality of the exposure data, gain power by reducing multiplicity issues via condensing exposure data, and avoid collinearity problems due to the presence of multiple correlated exposures. We extend the LV framework to characterize gene-environment interaction in the presence of multiple correlated exposures and genotype categories. Further, similar to what has been done in case-control G × E studies, we use the assumption of gene-environment (G-E) independence to boost the power of tests for interaction. The consequences of making this assumption, and the issue of how to explicitly model G-E association, have not been previously investigated in LV models. We postulate a hierarchy of assumptions about the LV model regarding the different forms of G-E dependence and show that making such assumptions may influence inferential results on the G, E, and G × E parameters. We implement a class of shrinkage estimators that data-adaptively trade off between the most restrictive and the most flexible forms of the G-E dependence assumption, and note that such a class of compromise estimators can serve as a benchmark of model adequacy in LV models.
We demonstrate the methods with an example from the Early Life Exposures in Mexico City to Neuro-Toxicants Study of lead exposure, iron metabolism genes, and birth weight.

4.
For the etiology, progression, and treatment of complex diseases, gene-environment (G-E) interactions have important implications beyond the main G and E effects. G-E interaction analysis can be more challenging with higher dimensionality and need for accommodating the “main effects, interactions” hierarchy. In recent literature, an array of novel methods, many of which are based on the penalization technique, have been developed. In most of these studies, however, the structures of G measurements, for example, the adjacency structure of single nucleotide polymorphisms (SNPs; attributable to their physical adjacency on the chromosomes) and the network structure of gene expressions (attributable to their coordinated biological functions and correlated measurements) have not been well accommodated. In this study, we develop structured G-E interaction analysis, where such structures are accommodated using penalization for both the main G effects and interactions. Penalization is also applied for regularized estimation and selection. The proposed structured interaction analysis can be effectively realized. It is shown to have consistency properties under high-dimensional settings. Simulations and analysis of GENEVA diabetes data with SNP measurements and TCGA melanoma data with gene expression measurements demonstrate its competitive practical performance.

5.
Tang Baozhen, Dong Wei, Liang Pei, Zhou Xuguo, Gao Xiwu 《BMC molecular biology》2012,13(1):1-12

Background

RNA interference (RNAi) and antisense strategies provide experimental therapeutic agents for numerous diseases, including polyglutamine (polyQ) disorders caused by CAG repeat expansion. We compared the potential of different oligonucleotide-based strategies for silencing the genes responsible for several polyQ diseases, including Huntington's disease and two spinocerebellar ataxias, type 1 and type 3. The strategies included nonallele-selective gene silencing, gene replacement, allele-selective SNP targeting and CAG repeat targeting.

Results

Using patient-derived cell culture models of polyQ diseases, we tested various siRNAs and antisense reagents and assessed their silencing efficiency and allele selectivity. We showed considerable allele discrimination by several SNP-targeting siRNAs based on a weak G-G or G-U pairing with the normal allele and a strong G-C pairing with the mutant allele at the site of RISC-induced cleavage. Among the CAG repeat-targeting reagents, the strongest allele discrimination is achieved by miRNA-like functioning reagents that bind to their targets and inhibit their translation without substantial target cleavage. A morpholino analog also performs well in discriminating mutant and normal alleles, but its efficient delivery to cells at a low effective concentration remains a challenge.

Conclusions

Using three cellular models of polyQ diseases and the same experimental setup we directly compared the performance of different oligonucleotide-based treatment strategies that are currently under development. Based on the results obtained by us and others we discussed the advantages and drawbacks of these strategies considering them from several different perspectives. The strategy aimed at nonallele-selective inhibiting of causative gene expression by targeting specific sequence of the implicated gene is the easiest to implement but relevant benefits are still uncertain. The gene replacement strategy that combines the nonallele-selective gene silencing with the expression of the exogenous normal allele is a logical extension of the former and it deserves to be explored further. Both allele-selective RNAi approaches challenge cellular RNA interference machinery to show its ability to discriminate between similar sequences differing in either single base substitutions or repeated sequence length. Although both approaches perform well in allele discrimination most of our efforts are focused on repeat targeting due to its potentially higher universality.

6.
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
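The AUC values quoted in the abstract above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes that quantity by direct pairwise comparison; the function name and the scores are hypothetical illustrations, not the study's code or data.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for held-out subjects in one validation fold.
cases = [0.9, 0.8, 0.3]
controls = [0.5, 0.2]
print(auc(cases, controls))
```

In this toy fold, five of the six case-control pairs are ordered correctly. In a cross-validation, the same calculation would be applied to predictions for subjects held out from model fitting; an AUC of 0.5 corresponds to chance-level ranking.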

7.
Predicting population extinction risk is a fundamental application of ecological theory to the practice of conservation biology. Here, we compared the prediction performance of a wide array of stochastic, population dynamics models against direct observations of the extinction process from an extensive experimental data set. By varying a series of biological and statistical assumptions in the proposed models, we were able to identify the assumptions that affected predictions about population extinction. We also show how certain autocorrelation structures can emerge due to interspecific interactions, and that accounting for the stochastic effect of these interactions can improve predictions of the extinction process. We conclude that it is possible to account for the stochastic effects of community interactions on extinction when using single‐species time series.

8.
Networks offer a powerful tool for understanding and visualizing inter-species ecological and evolutionary interactions. Previously considered examples, such as trophic networks, are just representations of experimentally observed direct interactions. However, species interactions are so rich and complex that it is not feasible to directly observe more than a small fraction. In this paper, using data mining techniques, we show how potential interactions can be inferred from geographic data, rather than by direct observation. An important application area for this methodology is that of emerging diseases, where, often, little is known about inter-species interactions, such as between vectors and reservoirs. Here, we show how, using geographic data, biotic interaction networks that model statistical dependencies between species distributions can be used to infer and understand inter-species interactions. Furthermore, we show how such networks can be used to build prediction models, for example, for predicting the most important reservoirs of a disease or the degree of disease risk associated with a geographical area. We illustrate the general methodology by considering an important emerging disease, Leishmaniasis. This data mining methodology allows for the use of geographic data to construct inferential biotic interaction networks which can then be used to build prediction models with a wide range of applications in ecology, biodiversity and emerging diseases.

9.
Virtually all pre-mRNA introns begin with the sequence /GU and end with AG/ (where / indicates a border between an exon and an intron). We have previously shown that the G residues at the first and last positions of the yeast actin intron interact during the second step of splicing. In this work, we ask if other highly conserved intron nucleotides also take part in this /G-G/ interaction. Of special interest is the penultimate intron nucleotide (AG/), which is important for the second step of splicing and is in proximity to other conserved intron nucleotides. Therefore, we tested interactions of the penultimate intron nucleotide with the second intron nucleotide (/GU) and with the branch site nucleotide. We also tested two models that predict interactions between sets of three conserved intron nucleotides. In addition, we used random mutagenesis and genetic selection to search for interactions between nucleotides in the pre-mRNA. We find no evidence for other interactions between intron nucleotides besides the interaction between the first and last intron nucleotides.

10.
Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.

11.
12.
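The aim of this study was to design and construct a novel recombinant human G-CSF/G-CSF dimer (G-G for short) with a longer half-life and higher biological activity than the G-CSF monomer, and to express it at high levels in a prokaryotic system. The prokaryotic expression vectors pET32/G-G and pET32/G were constructed to achieve high-level expression of the G-G dimer fusion protein in Escherichia coli, and the protein was purified by affinity chromatography and other methods. Structural characteristics of the fusion protein, such as isoelectric point, flexibility, antigenicity, and hydrophilicity, were analyzed by simulation, and the biological activity of the G-G dimer fusion protein was measured by the MTT assay. A high-level expression vector for the G-G dimer fusion protein was successfully constructed for the first time; the expression level exceeded 40%, and the fusion protein obtained by one-step affinity chromatography was about 80% pure. The simulation analysis of structural characteristics showed no marked change in the isoelectric point, flexibility, antigenicity, or hydrophilicity of the G-CSF dimer. Activity assays showed that the recombinant human G-G dimer effectively stimulated proliferation of the G-CSF-dependent cell line NFS-60, but its stimulatory effect was lower than that of the G-CSF monomer and the standard. These results indicate that both the G-CSF monomer and the G-G dimer can be expressed at high levels in E. coli, but the specific activity of the G-G dimer is lower than that of the G-CSF monomer, contrary to expectation; the reason is under investigation.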

13.
MOTIVATION: The identification of risk-associated genetic variants in common diseases remains a challenge to the biomedical research community. It has been suggested that common statistical approaches that exclusively measure main effects are often unable to detect interactions between some of these variants. Detecting and interpreting interactions is a challenging open problem from the statistical and computational perspectives. Methods in computing science may improve our understanding of the mechanisms of genetic disease by detecting interactions even in the presence of very low heritabilities. RESULTS: We have implemented a method using Genetic Programming that is able to induce a Decision Tree to detect interactions in genetic variants. This method has a cross-validation strategy for estimating classification and prediction errors and tests for consistencies in the results. To have better estimates, a new consistency measure that takes into account interactions and can be used in a genetic programming environment is proposed. This method detected five different interaction models with heritabilities as low as 0.008 and with prediction errors similar to the generated errors. AVAILABILITY: Information on the generated data sets and executable code is available upon request.

14.
Naphthyridine dimer, composed of two 2-amino-1,8-naphthyridines and a connecting linker, strongly binds to the guanine-guanine (G-G) mismatch in duplex DNA. In order to improve the G-G selectivity of the binding, we have examined structural modification of the linker. A new naphthyridine dimer possessing a 3,6-diazaoctanedioic acid linker binds to the G-G mismatch with an association constant of 1.18 × 10^7 M^-1, which is somewhat weaker than that of the original naphthyridine dimer having a shorter connecting linker. However, the binding of the modified naphthyridine dimer to the G-A mismatch was almost negligible as compared to that of the original. This results in a net fourfold increase in the selectivity of binding to the G-G mismatch.

15.
The Saccharomyces cerevisiae Sgs1p helicase localizes to the nucleolus and is required to maintain the integrity of the rDNA repeats. Sgs1p is a member of the RecQ DNA helicase family, which also includes Schizosaccharomyces pombe Rqh1, and the human BLM and WRN genes. These genes encode proteins which are essential to maintenance of genomic integrity and which share a highly conserved helicase domain. Here we show that recombinant Sgs1p helicase efficiently unwinds guanine-guanine (G-G) paired DNA. Unwinding of G-G paired DNA is ATP- and Mg2+-dependent and requires a short 3' single-stranded tail. Strikingly, Sgs1p unwinds G-G paired substrates more efficiently than duplex DNAs, as measured either in direct assays or by competition experiments. Sgs1p efficiently unwinds G-G paired telomeric sequences, suggesting that one function of Sgs1p may be to prevent telomere-telomere interactions which can lead to chromosome non-disjunction. The rDNA is G-rich and has considerable potential for G-G pairing. Diminished ability to unwind G-G paired regions may also explain the deleterious effect of mutation of Sgs1 on rDNA stability, and the accelerated aging characteristic of yeast strains that lack Sgs1 as well as humans deficient in the related WRN helicase.

16.
Advances in human genetics have led to epidemiological investigations not only of the effects of genes alone but also of gene-environment (G-E) interaction. A widely accepted design strategy in the study of how G-E interactions relate to disease risk is the population-based case-control study (PBCCS). For simple random samples, semiparametric methods for testing G-E interaction were developed by Chatterjee and Carroll in 2005. The use of complex sampling in PBCCS, involving differential probabilities of sample selection of cases and controls and possibly cluster sampling, is becoming more common. Two complexities, weighting for selection probabilities and intracluster correlation of observations, are induced by the complex sampling. We develop pseudo-semiparametric maximum likelihood estimators (pseudo-SPMLE) that apply to PBCCS with complex sampling. We study the finite sample performance of the pseudo-SPMLE using simulations and illustrate the pseudo-SPMLE with a US case-control study of kidney cancer.

17.
Naphthyridine dimer (ND) specifically binds to the guanine-guanine (G-G) mismatch in duplex DNA. In order to improve the thermal and alkaline stability and the binding ability of the ligand, we have examined structural modification of the linker. A new ligand (NNC) possessing 2-amino-1,8-naphthyridines and a carbamate linker is much more thermally stable than ND. The half-life of NNC is 2.5 times longer than that of ND at 80 °C. NNC is also much more stable than ND under alkaline conditions. In addition, NNC binds to the G-G mismatch more strongly than ND. The improved stability and the binding of NNC to the G-G mismatch would be suitable for the practical use of an NNC-immobilized sensor.

18.
So HC, Sham PC 《PLoS genetics》2010,6(12):e1001230
An increasing number of genetic variants have been identified for many complex diseases. However, it is controversial whether risk prediction based on genomic profiles will be useful clinically. Appropriate statistical measures to evaluate the performance of genetic risk prediction models are required. Previous studies have mainly focused on the use of the area under the receiver operating characteristic (ROC) curve, or AUC, to judge the predictive value of genetic tests. However, AUC has its limitations and should be complemented by other measures. In this study, we develop a novel unifying statistical framework that connects a large variety of predictive indices together. We showed that, given the overall disease probability and the level of variance in total liability (or heritability) explained by the genetic variants, we can estimate analytically a large variety of prediction metrics, for example the AUC, the mean risk difference between cases and non-cases, the net reclassification improvement (ability to reclassify people into high- and low-risk categories), the proportion of cases explained by a specific percentile of population at the highest risk, the variance of predicted risks, and the risk at any percentile. We also demonstrate how to construct graphs to visualize the performance of risk models, such as the ROC curve, the density of risks, and the predictiveness curve (disease risk plotted against risk percentile). The results from simulations match very well with our theoretical estimates. Finally we apply the methodology to nine complex diseases, evaluating the predictive power of genetic tests based on known susceptibility variants for each trait.
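To illustrate how a metric such as "the risk at any percentile" can follow from only the disease probability and the variance in liability explained, here is a minimal sketch under a standard liability threshold model. It is an assumption-laden illustration rather than the authors' framework: liability is taken as g + e with g ~ N(0, h2) and e ~ N(0, 1 - h2), and the function name and parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def risk_at_percentile(percentile, prevalence, h2):
    """Disease risk for a person at a given percentile of genetic liability.

    Assumes liability = g + e with g ~ N(0, h2) and e ~ N(0, 1 - h2), and
    disease whenever liability exceeds the threshold implied by `prevalence`.
    Requires 0 < percentile < 1, 0 < prevalence < 1, and 0 <= h2 < 1.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    genetic_value = (h2 ** 0.5) * STD_NORMAL.inv_cdf(percentile)
    residual_sd = (1.0 - h2) ** 0.5
    # Risk is the chance that the residual lifts liability over the threshold.
    return 1.0 - STD_NORMAL.cdf((threshold - genetic_value) / residual_sd)


# Points on a predictiveness curve: risk plotted against risk percentile.
for p in (0.05, 0.50, 0.95):
    print(p, round(risk_at_percentile(p, prevalence=0.10, h2=0.20), 4))
```

When h2 is 0 the risk equals the prevalence at every percentile, and as h2 grows the curve steepens; for a disease with prevalence below 0.5, the risk at the median percentile falls below the prevalence.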

19.
Many environmental risk factors for common, complex human diseases have been revealed by epidemiologic studies, but how genotypes at specific loci modulate individual responses to environmental risk factors is largely unknown. Gene-environment interactions will be missed in genome-wide association studies and could account for some of the 'missing heritability' for these diseases. In this review, we focus on asthma as a model disease for studying gene-environment interactions because of relatively large numbers of candidate gene-environment interactions with asthma risk in the literature. Identifying these interactions using genome-wide approaches poses formidable methodological problems, and elucidating molecular mechanisms for these interactions has been challenging. We suggest that studying gene-environment interactions in animal models, although more tractable, might not be sufficient to shed light on the genetic architecture of human diseases. Lastly, we propose avenues for future studies to find gene-environment interactions.

20.
The human microbiome consists of trillions of microorganisms. Microbiota can modulate the host physiology through molecule and metabolite interactions. Integrating microbiome and metabolomics data has the potential to predict different diseases more accurately. Yet, most datasets measure only microbiome data, without paired metabolome data. Here, we propose a novel integrative modeling framework, Microbiome-based Supervised Contrastive Learning Framework (MB-SupCon). MB-SupCon integrates microbiome and metabolome data to generate microbiome embeddings, which can be used to improve the prediction accuracy in datasets that only measure microbiome data. As a proof of concept, we applied MB-SupCon on 720 samples with paired 16S microbiome data and metabolomics data from patients with type 2 diabetes. MB-SupCon outperformed existing prediction methods and achieved high average prediction accuracies for insulin resistance status (84.62%), sex (78.98%), and race (80.04%). Moreover, the microbiome embeddings form separable clusters for different covariate groups in the lower-dimensional space, which enhances data visualization. We also applied MB-SupCon on a large inflammatory bowel disease study and observed similar advantages. Thus, MB-SupCon could be broadly applicable to improve microbiome prediction models in multi-omics disease studies.
