Similar articles
1.

Background  

Life processes are determined by the organism's genetic profile and multiple environmental variables. However, the interaction between these factors is inherently nonlinear [1]. Microarray data is one representation of the nonlinear interactions among genes and between genes and environmental factors. Still, most microarray studies use linear methods to interpret nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets.

2.

Background  

The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize the likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives.

3.

Background  

Although molecular pathway information and the International HapMap Project data can help biomedical researchers to investigate the aetiology of complex diseases more effectively, such information is missing or insufficient in current genetic association databases. In addition, only a few of the environmental risk factors are included as gene-environment interactions, and the risk measures of associations are not indexed in any association databases.

4.

Background  

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.

5.

Background  

The potential public health benefits of targeting environmental interventions by genotype depend on the environmental and genetic contributions to the variance of common diseases, and the magnitude of any gene-environment interaction. In the absence of prior knowledge of all risk factors, twin, family and environmental data may help to define the potential limits of these benefits in a given population. However, a general methodology to analyze twin data is required because of the potential importance of gene-gene interactions (epistasis), gene-environment interactions, and conditions that break the 'equal environments' assumption for monozygotic and dizygotic twins.

6.

Background  

It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates, prevent model over-fitting, and reduce potential false-positive findings. Currently, MDR utilizes cross-validation for internal validation. In this study, we use a three-way split (3WS) of the data in combination with a post-hoc pruning procedure as an alternative to cross-validation for internal model validation, to reduce computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
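The three-way split mentioned above can be pictured with a minimal Python sketch that partitions sample indices into training, testing, and validation sets. The function name `three_way_split`, the 2:1:1 proportions, and the seed are assumptions made for illustration; the abstract does not state the proportions used, and the post-hoc pruning step is not shown.

```python
import random


def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle sample indices and cut them into three disjoint sets.

    The default fractions are hypothetical; the abstract does not give them.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    # A local Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * fractions[0])
    n_test = int(n_samples * fractions[1])
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    # Whatever remains after the first two cuts becomes the validation set.
    validation = indices[n_train + n_test:]
    return train, test, validation


train, test, validation = three_way_split(100)
print(len(train), len(test), len(validation))
```

Unlike k-fold cross-validation, each model is fitted once rather than k times, which is where the reduction in computation time would come from.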

7.

Background  

Probing the complex fusion of genetic and environmental interactions, metabolic profiling (or metabolomics/metabonomics), the study of small molecules involved in metabolic reactions, is a rapidly expanding 'omics' field. A major technique for capturing metabolite data is 1H-NMR spectroscopy, which yields highly complex profiles that require sophisticated statistical analysis methods. However, experimental data is difficult to control and expensive to obtain. Thus, data simulation is a productive route to aid algorithm development.

8.

Background  

Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks.

9.

Background  

Few genetic factors predisposing to the sporadic form of amyotrophic lateral sclerosis (ALS) have been identified, but the pathology itself seems to be a true multifactorial disease in which complex interactions between environmental and genetic susceptibility factors take place. The purpose of this study was to approach genetic data with an innovative statistical method, artificial neural networks, to identify a possible genetic background predisposing to the disease. A DNA multiarray panel was applied to genotype more than 60 polymorphisms within 35 genes selected from pathways of lipid and homocysteine metabolism, regulation of blood pressure, coagulation, inflammation, cellular adhesion and matrix integrity, in 54 sporadic ALS patients and 208 controls. Advanced intelligent systems based on a novel coupling of artificial neural networks and evolutionary algorithms have been applied. The results obtained have been compared with those derived from the use of standard neural networks and classical statistical analysis.

10.
11.

Background  

It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, accurately identifying those genetic interactions has proven very challenging.

12.

Background

Many common diseases arise from an interaction between environmental and genetic factors. Our knowledge regarding environment and gene interactions is growing, but frameworks to build an association between gene-environment interactions and disease using preexisting, publicly available data have been lacking. Integrating freely available environment-gene interaction and disease phenotype data would allow hypothesis generation for potential environmental associations to disease.

Methods

We integrated publicly available disease-specific gene expression microarray data and curated chemical-gene interaction data to systematically predict environmental chemicals associated with disease. We derived chemical-gene signatures for 1,338 chemical/environmental chemicals from the Comparative Toxicogenomics Database (CTD). We associated these chemical-gene signatures with differentially expressed genes from datasets found in the Gene Expression Omnibus (GEO) through an enrichment test.
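To make the enrichment step concrete, here is a small sketch of one common form of enrichment test, the one-sided hypergeometric test. The abstract does not say which test was used, so the choice of test, the function name `hypergeom_enrichment_p`, and the example counts are all assumptions for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, signature, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes measured in total
    signature:  number of genes in a chemical-gene signature
    selected:   number of differentially expressed genes
    overlap:    genes that are both in the signature and differentially expressed
    """
    max_overlap = min(signature, selected)
    total = comb(population, selected)
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    tail = sum(
        comb(signature, k) * comb(population - signature, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 in the signature,
# 5 differentially expressed, all 5 shared.
print(hypergeom_enrichment_p(20, 5, 5, 5))
```

A small p-value suggests that the overlap between a chemical's signature and the disease's differentially expressed genes is larger than chance alone would explain. With 1,338 signatures tested, a multiple-testing correction would also be needed.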

Results

We were able to verify our analytic method by accurately identifying chemicals applied to samples and cell lines. Furthermore, we were able to predict known and novel environmental associations with prostate, lung, and breast cancers, such as estradiol and bisphenol A.

Conclusions

We have developed a scalable and statistical method to identify possible environmental associations with disease using publicly available data and have validated some of the associations in the literature.

13.

Background  

Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL.
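The size of the search space described above is easy to check. The sketch below enumerates every 2-SNP model for a data set with 2 associated and 18 unrelated SNPs, and counts the 3-SNP models that would be added if larger models were also scored. The SNP labels are placeholders, and no scoring criterion (BNMBL, MDR, or otherwise) is implemented here.

```python
from itertools import combinations
from math import comb

# 2 disease-associated SNPs plus 18 unrelated SNPs, as in the simulated data sets.
n_snps = 2 + 18
snp_ids = [f"SNP{i}" for i in range(1, n_snps + 1)]  # placeholder labels

# Every unordered pair of SNPs is one candidate 2-SNP model.
two_snp_models = list(combinations(snp_ids, 2))
print(len(two_snp_models))  # 190

# Allowing 3-SNP models adds many more candidates to score.
three_snp_model_count = comb(n_snps, 3)
print(three_snp_model_count)  # 1140
```

Only one of the 190 pairs is the true model, and the number of candidates grows quickly with model size, which is why performance on 2-SNP models alone may not carry over to searches that include larger models.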

14.
15.

Background  

Pathogenesis of complex diseases involves the integration of genetic and environmental factors over time, making it particularly difficult to tease apart relationships between phenotype, genotype, and environmental factors using traditional experimental approaches.

16.

Background

Having the ability to scan the entire country for potential “hotspots” with increased risk of developing chronic diseases due to various environmental, demographic, and genetic susceptibility factors may inform risk management decisions and enable better environmental public health policies.

Objectives

Develop an approach for community-level risk screening focused on identifying potential genetic susceptibility hotspots.

Methods

Our approach combines analyses of phenotype-genotype data, genetic prevalence of single nucleotide polymorphisms, and census/geographic information to estimate census tract-level population attributable risks among various ethnicities and total population for the state of California.

Results

We estimate that the rs13266634 single nucleotide polymorphism, a type 2 diabetes susceptibility genotype, has a genetic prevalence of 56.3%, 47.4% and 37.0% in Mexican Mestizo, Caucasian, and Asian populations. Looking at the top quintile for total population attributable risk, 16 California counties have greater than 25% of their population living in hotspots of genetic susceptibility for developing type 2 diabetes due to this single genotypic susceptibility factor.
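For readers unfamiliar with population attributable risk, the sketch below applies Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), to the three prevalence figures reported above. The abstract does not say which formula or relative risk the authors used, so both the formula choice and the value `ASSUMED_RELATIVE_RISK = 1.2` are hypothetical and serve only to show the calculation.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1))."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical relative risk; NOT a value taken from the study.
ASSUMED_RELATIVE_RISK = 1.2

# Genetic prevalence figures as reported in the abstract.
prevalences = {"Mexican Mestizo": 0.563, "Caucasian": 0.474, "Asian": 0.370}

for population, p in prevalences.items():
    par = population_attributable_risk(p, ASSUMED_RELATIVE_RISK)
    print(f"{population}: PAR = {par:.3f}")
```

For a fixed relative risk above 1, the attributable risk rises with prevalence, so census tracts with a larger share of high-prevalence populations would score higher under this kind of screening.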

Conclusions

This study identified counties in California where large portions of the population may bear additional type 2 diabetes risk due to increased genetic prevalence of a susceptibility genotype. This type of screening can easily be extended to include information on environmental contaminants of interest and other related diseases, and potentially enables the rapid identification of potential environmental justice communities. Other potential uses of this approach include problem formulation in support of risk assessments, land use planning, and prioritization of site cleanup and remediation actions.

17.

Background  

Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue. However, the dimensionality of the genetic data (up to 1 million genetic markers) and its complexity make the statistical analysis a challenging task.

18.

Background

Atherosclerotic peripheral arterial disease (PAD) affects 8–10 million people in the United States and is associated with a marked impairment in quality of life and an increased risk of cardiovascular events. Noninvasive assessment of PAD is performed by measuring the ankle-brachial index (ABI). Complex traits, such as ABI, are influenced by a large array of genetic and environmental factors and their interactions. We attempted to characterize the genetic architecture of ABI by examining the main and interactive effects of individual single nucleotide polymorphisms (SNPs) and conventional risk factors.

Methods

We applied linear regression analysis to investigate the association of 435 SNPs in 112 positional and biological candidate genes with ABI and related physiological and biochemical traits in 1046 non-Hispanic white, hypertensive participants from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. The main effects of each SNP, as well as SNP-covariate and SNP-SNP interactions, were assessed to investigate how they contribute to the inter-individual variation in ABI. Multivariable linear regression models were then used to assess the joint contributions of the top SNP associations and interactions to ABI after adjustment for covariates. We reduced the chance of false positives by 1) correcting for multiple testing using the false discovery rate, 2) internal replication, and 3) four-fold cross-validation.
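The false discovery rate correction named in the first step can be sketched with the Benjamini-Hochberg step-up procedure. The abstract does not specify which FDR procedure was applied, so Benjamini-Hochberg, the function name, the 0.05 level, and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four SNP tests.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

With hundreds of SNPs and many pairwise interactions tested, a procedure of this kind controls the expected share of false positives among the reported associations rather than the chance of any single false positive.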

Results

When the results from these three procedures were combined, only two SNP main effects in NOS3, three SNP-covariate interactions (ADRB2 Gly 16 – lipoprotein(a) and SLC4A5 – diabetes interactions), and 25 SNP-SNP interactions (involving SNPs from 29 different genes) were significant, replicated, and cross-validated. Combining the top SNPs, risk factors, and their interactions into a model explained nearly 18% of variation in ABI in the sample. SNPs in six genes (ADD2, ATP6V1B1, PRKAR2B, SLC17A2, SLC22A3, and TGFB3) also influenced triglycerides, C-reactive protein, homocysteine, and lipoprotein(a) levels.

Conclusion

We found that candidate gene SNP main effects, SNP-covariate and SNP-SNP interactions contribute to the inter-individual variation in ABI, a marker of PAD. Our findings underscore the importance of conducting systematic investigations that consider context-dependent frameworks for developing a deeper understanding of the multidimensional genetic and environmental factors that contribute to complex diseases.

19.

Background  

The causes of complex diseases are difficult to grasp since many different factors play a role in their onset. To find a common genetic background, many of the existing studies divide their population into controls and cases, a classification that is likely to cause heterogeneity within the two groups. Rather than dividing the study population into cases and controls, it is better to identify the phenotype of a complex disease by a set of intermediate risk factors. However, these risk factors often vary over time and are therefore repeatedly measured.

20.

Background and Aims

This study aimed to identify and characterize the ontogenetic, environmental and individual components of forest tree growth. In the proposed approach, the tree growth data typically correspond to the retrospective measurement of annual shoot characteristics (e.g. length) along the trunk.

Methods

Dedicated statistical models (semi-Markov switching linear mixed models) were applied to data sets of Corsican pine and sessile oak. In the semi-Markov switching linear mixed models estimated from these data sets, the underlying semi-Markov chain represents both the succession of growth phases and their lengths, while the linear mixed models represent both the influence of climatic factors and the inter-individual heterogeneity within each growth phase.

Key Results

On the basis of these integrative statistical models, it is shown that growth phases are not only defined by average growth level but also by growth fluctuation amplitudes in response to climatic factors and inter-individual heterogeneity and that the individual tree status within the population may change between phases. Species plasticity affected the response to climatic factors while tree origin, sampling strategy and silvicultural interventions impacted inter-individual heterogeneity.

Conclusions

The transposition of the proposed integrative statistical modelling approach to cambial growth in relation to climatic factors and the study of the relationship between apical growth and cambial growth constitute the next steps in this research.
