Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
A standard multivariate principal components (PCs) method was utilized to identify clusters of variables that may be controlled by a common gene or genes (pleiotropy). Heritability estimates were obtained and linkage analyses performed on six individual traits (total cholesterol (Chol), high and low density lipoproteins, triglycerides (TG), body mass index (BMI), and systolic blood pressure (SBP)) and on each PC to compare our ability to identify major gene effects. Using the simulated data from Genetic Analysis Workshop 13 (Cohort 1 and 2 data for year 11), the quantitative traits were first adjusted for age, sex, and smoking (cigarettes per day). Adjusted variables were standardized and PCs calculated followed by orthogonal transformation (varimax rotation). Rotated PCs were then subjected to heritability and quantitative multipoint linkage analysis. The first three PCs explained 73% of the total phenotypic variance. Heritability estimates were above 0.60 for all three PCs. We performed linkage analyses on the PCs as well as the individual traits. The majority of pleiotropic and trait-specific genes were not identified. Standard PCs analysis methods did not facilitate the identification of pleiotropic genes affecting the six traits examined in the simulated data set. In addition, genes contributing 20% of the variance in traits with over 0.60 heritability estimates could not be identified in this simulated data set using traditional quantitative trait linkage analyses. Lack of identification of pleiotropic and trait-specific genes in some cases may reflect their low contribution to the traits/PCs examined or more importantly, characteristics of the sample group analyzed, and not simply a failure of the PC approach itself.

2.
In this study, we used the phenotype simulation package naturalgwas to test the performance of Zhao's Random Forest method in comparison to an uncorrected Random Forest test, latent factor mixed models (LFMM), genome-wide efficient mixed models (GEMMA), and confounder adjusted linear regression (CATE). We created 400 sets of phenotypes, corresponding to five effect sizes and two, five, 15, or 30 causal loci, simulated from two empirical data sets containing SNPs from Striped Bass representing three and 13 populations. All association methods were evaluated for their ability to detect genotype–phenotype associations based on power, false discovery rates, and number of false positives. Genomic inflation was highest for uncorrected Random Forest and LFMM tests and lowest for GEMMA and Zhao's Random Forest. All association tests had similar power to detect causal loci, and Zhao's Random Forest had the lowest false discovery rate in all scenarios. To measure the performance of association tests in small data sets with few loci surrounding a causal gene, we also reran the analyses after removing causal loci from each data set. All association tests were only able to find true positives, defined as loci located within 30 kbp of a causal locus, in 3%–18% of simulations. In contrast, at least one false positive was found in 17%–44% of simulations. Zhao's Random Forest again identified the fewest false positives of all association tests studied. The ability to test the power of association tests for individual empirical data sets can be an extremely useful first step when designing a GWAS.
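The evaluation criteria named in this abstract (genomic inflation, false discovery rate, and true positives defined by a 30 kbp window) can be illustrated with a minimal Python sketch. It assumes each test yields a chi-square statistic with 1 degree of freedom and that all positions lie on one chromosome; the function names and example positions are hypothetical and are not taken from the study or from naturalgwas.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Observed median test statistic divided by the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def classify_hits(hit_positions, causal_positions, window_bp=30_000):
    """Split significant loci into true and false positives.

    A hit is a true positive when it lies within `window_bp` of any
    causal locus (a single chromosome is assumed for simplicity).
    """
    true_pos, false_pos = [], []
    for pos in hit_positions:
        near_causal = any(abs(pos - c) <= window_bp for c in causal_positions)
        (true_pos if near_causal else false_pos).append(pos)
    return true_pos, false_pos


def false_discovery_rate(true_pos, false_pos):
    """Fraction of reported hits that are false (0.0 if nothing is reported)."""
    total = len(true_pos) + len(false_pos)
    return len(false_pos) / total if total else 0.0


# Hypothetical positions in base pairs, not data from the study.
hits = [105_000, 180_000, 900_000]
causal = [100_000, 500_000]
tp, fp = classify_hits(hits, causal)
print(tp, fp, false_discovery_rate(tp, fp))
```

Only the hit at 105,000 bp falls inside a 30 kbp window around a causal locus, so two of the three reported hits count as false positives in this toy case.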

3.
The APOA1-C3-A4-A5 gene complex encodes genes whose products are implicated in the metabolism of HDL and/or triglycerides. Although the relationship between polymorphisms in this gene cluster and dyslipidemias was first reported more than 15 years ago, association and linkage results have remained inconclusive. This is due, in part, to the oligogenic and multivariate nature of dyslipidemic phenotypes. Therefore, we investigate evidence of linkage of APOC3 and HDL using two samples of dyslipidemic pedigrees: familial combined hyperlipidemia (FCHL) and isolated low-HDL (ILHDL). We used a strategy that deals with several difficulties inherent in the study of complex traits: by using a Bayesian Markov Chain Monte Carlo (MCMC) approach we allow for oligogenic trait models, as well as simultaneous incorporation of covariates, in the context of multipoint analysis. By using this approach on extended pedigrees we provide evidence of linkage of APOC3 and HDL level variation in two samples with different ascertainment. In addition to APOC3, we estimate that two to three genes, each with a substantial effect on total variance, are responsible for HDL variation in both data sets. We also provide evidence, using the FCHL data set, for a pleiotropic effect between HDL, HDL3 and triglycerides at the APOC3 locus.

4.
Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine‐learning algorithm, in genomic studies to discern loci underlying both discrete and quantitative traits, particularly when studying wild or nonmodel organisms. RF is becoming increasingly used in ecological and population genetics because, unlike traditional methods, it can efficiently analyse thousands of loci simultaneously and account for nonadditive interactions. However, understanding both the power and limitations of Random Forest is important for its proper implementation and the interpretation of results. We therefore provide a practical introduction to the algorithm and its use for identifying associations between molecular markers and phenotypes, discussing such topics as data limitations, algorithm initiation and optimization, as well as interpretation. We also provide short R tutorials as examples, with the aim of providing a guide to the implementation of the algorithm. Topics discussed here are intended to serve as an entry point for molecular ecologists interested in employing Random Forest to identify trait associations in genomic data sets.

5.
Hepatic lipase encoded by the hepatic lipase gene (LIPC) is involved in the metabolism of several lipoproteins. Four promoter polymorphisms in LIPC have been found to be in complete disequilibrium and associated with high density lipoprotein cholesterol (HDL-C) and apolipoprotein (apo)A-I levels in both white and black populations. We investigated the association between the promoter polymorphism and lipid profiles as well as anthropometric phenotypes in African American men in the Coronary Artery Risk Development in Young Adults study. We performed serial cross-sectional analyses and longitudinal analyses of lipids from 578 subjects in five examinations over 10 years of follow-up. Results showed that the allele frequency (0.52) in our black population was consistent with that reported in black subjects but much higher than that reported (approximately 0.2) in white populations. Analysis of covariance tests of the three genotypic means in each examination showed that the P values ranged from 0.01 to 0.08 for HDL-C (except P = 0.54 in the fourth examination), from 0.006 to 0.01 for HDL(2)-C, and from 0.06 to 0.07 for apoA-I. Mean HDL(3)-C levels were essentially identical among the three genotypes. Total cholesterol, low density lipoprotein cholesterol (LDL-C), triglycerides, and apoB, which are mainly involved in the very low density lipoprotein-LDL pathway, were not significantly different according to the promoter polymorphism, except for triglycerides in the third examination (P = 0.01). No significant association was found between anthropometric phenotypes and the LIPC polymorphism in any of five examinations. The change of the anthropometric variables was not significantly associated with genotypes. In conclusion, our results indicated that the LIPC promoter polymorphism has exclusive effects on HDL(2)-C but not HDL(3)-C levels.

6.
With the advance of genome-wide association studies and newly identified SNP (single-nucleotide polymorphism) associations with complex disease, important discoveries have emerged focusing not only on individual genes but on disease-associated pathways and gene sets. The authors used prospective myocardial infarction case-control studies nested in the Nurses’ Health and Health Professionals Follow-Up Studies to investigate genetic variants associated with myocardial infarction or LDL, HDL, triglycerides, adiponectin and apolipoprotein B (apoB). Using these case-control studies to illustrate an integrative systems biology approach, the authors applied SNP set enrichment analysis to identify gene sets where expression SNPs representing genes from these sets show enrichment in their association with endpoints of interest. The authors also explored an aggregate score approach. While limited power restricted the ability to detect significant associations of individual loci with myocardial infarction, the authors found significance for loci associated with LDL, HDL, apoB and triglycerides, replicating previous observations. Applying SNP set enrichment analysis and risk score methods, the authors also found significance for three gene sets and for aggregate scores associated with myocardial infarction as well as for loci related to cardiovascular risk factors, supporting the use of these methods in practice.

7.
Recent advances in mouse genomics have revealed considerable variation in the form of single-nucleotide polymorphisms (SNPs) among common inbred strains. This has made it possible to characterize closely related strains and to identify genes that differ; such genes may be causal for quantitative phenotypes. The mouse strains DBA/1J and DBA/2J differ by just 5.6% at the SNP level. These strains exhibit differences in a number of metabolic and lipid phenotypes, such as plasma levels of triglycerides (TGs) and HDL. A cross between these strains revealed multiple quantitative trait loci (QTLs) in 294 progeny. We identified significant TG QTLs on chromosomes (Chrs) 1, 2, 3, 4, 8, 9, 10, 11, 12, 13, 14, 16, and 19, and significant HDL QTLs on Chrs 3, 9, and 16. Some QTLs mapped to chromosomes with limited variability between the two strains, thus facilitating the identification of candidate genes. We suggest that Tshr is the QTL gene for Chr 12 TG and HDL levels and that Ihh may account for the TG QTL on Chr 1. This cross highlights the advantage of crossing closely related strains for subsequent identification of QTL genes.

8.
Selecting relevant features is a common task in most omics data analyses, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) or the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared to other Random Forest-based feature selection methods.
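One common way to quantify the stability of selected feature lists is the mean pairwise Jaccard index across independent selection runs. The abstract does not say which stability measure the authors used, so the sketch below is an assumption for illustration only; the function names and gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(feature_lists):
    """Mean pairwise Jaccard index over all pairs of selection runs."""
    pairs = list(combinations(feature_lists, 2))
    if not pairs:
        raise ValueError("need at least two feature lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature lists from three independent selection runs.
runs = [
    ["g1", "g2", "g3"],
    ["g1", "g2", "g4"],
    ["g1", "g2", "g3"],
]
print(selection_stability(runs))
```

A value of 1.0 means every run selected the same features; values near 0 mean the runs barely overlap.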

9.
Observed patterns of species richness at landscape scale (gamma diversity) cannot always be attributed to a specific set of explanatory variables, but rather different alternative explanatory statistical models of similar quality may exist. Therefore predictions of the effects of environmental change (such as in climate or land cover) on biodiversity may differ considerably, depending on the chosen set of explanatory variables. Here we use multimodel prediction to evaluate effects of climate, land-use intensity and landscape structure on species richness in each of seven groups of organisms (plants, birds, spiders, wild bees, ground beetles, true bugs and hoverflies) in temperate Europe. We contrast this approach with traditional best-model predictions, which we show, using cross-validation, to have inferior prediction accuracy. Multimodel inference changed the importance of some environmental variables in comparison with the best model, and accordingly gave deviating predictions for environmental change effects. Overall, prediction uncertainty for the multimodel approach was only slightly higher than that of the best model, and absolute changes in predicted species richness were also comparable. Richness predictions varied generally more for the impact of climate change than for land-use change at the coarse scale of our study. Overall, our study indicates that the uncertainty introduced to environmental change predictions through uncertainty in model selection both qualitatively and quantitatively affects species richness projections.
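Multimodel prediction is often implemented by weighting each candidate model's prediction by its Akaike weight. The abstract does not state the weighting scheme, so AIC-based weights are an assumption here; the function names and the numbers in the example are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into normalized model weights.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's AIC and the lowest AIC in the set.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_averaged_prediction(predictions, aic_values):
    """Weighted mean of the per-model predictions."""
    weights = akaike_weights(aic_values)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical richness predictions from three candidate models.
predicted_richness = [42.0, 45.0, 39.0]
aic = [100.0, 101.5, 104.0]
print(akaike_weights(aic))
print(model_averaged_prediction(predicted_richness, aic))
```

A best-model prediction corresponds to giving the lowest-AIC model a weight of 1; the averaged prediction instead lets near-equivalent models contribute in proportion to their support.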

10.
Haptoglobin (Hp) subtypes were analysed by two-dimensional high-resolution gel electrophoresis in 81 Norwegian individuals with moderate hypercholesterolemia and in 316 Norwegian control subjects. The frequencies of the genes Hp2SS and Hp2SF were higher in individuals with hypercholesterolemia than in controls but the differences did not reach statistical significance (p = 0.087). Within the control population, no effect of the different Hp subtypes was found on total serum cholesterol, triglycerides or high-density lipoprotein (HDL) cholesterol. However, in the controls a significantly higher frequency of Hp2-2 types was found among those with HDL cholesterol values in the upper quartile as compared to those with HDL cholesterol in the lower quartile. A similar phenomenon was not uncovered in analyses of total serum cholesterol or triglycerides. Our results are in agreement with others which indicate that genes belonging to the Hp polymorphism play a role in predicting an individual's total serum cholesterol level. However, our data indicate that the cholesterol effect is on the HDL rather than on the total cholesterol level.

11.
Alien plant invasion has negative impacts on the structure and functionality of ecosystems. Understanding the determinants of this process is fundamental for addressing environmental issues, such as the water availability in South Africa’s catchments. Both environmental and anthropogenic factors determine the invasion of alien species; however, their relative importance has to be quantified. The aim of this paper was to estimate the importance of 32 explanatory variables in predicting the distribution of the major invasive alien plant species (IAPS) of South Africa, through the use of Species Distribution Models. We used data from the National Invasive Alien Plants Survey, delineated at a quaternary catchment level, coupled with climatic, land cover, edaphic, and anthropogenic variables. Using two-part generalized linear models, we compared the accuracy of two different sets of variables in predicting the spatial distribution of IAPS; the first included environmental correlates alone, and the second included both environmental and anthropogenic variables. Using Random Forest, we explored the relative importance of the variables in producing a map of potential distribution of IAPS. Results showed that the inclusion of anthropogenic variables did not significantly improve model predictions. The most important variables influencing the distribution of IAPS appeared to be the climatic ones. The modeled potential distribution was analyzed in relation to provinces, biomes, and species’ minimum residence time.

12.
13.
The Praomyini tribe is one of the most diverse and abundant groups of Old World rodents. Several species are known to be involved in crop damage and in the epidemiology of several human and cattle diseases. Due to the existence of sibling species their identification is often problematic. Thus an easy, fast and accurate species identification tool is needed for non-systematicians to correctly identify Praomyini species. In this study we compare the usefulness of three genes (16S, Cytb, CO1) for identifying species of this tribe. A total of 426 specimens representing 40 species (sampled across their geographical range) were sequenced for the three genes. Nearly all of the species included in our study are monophyletic in the neighbour joining trees. The degree of intra-specific variability tends to be lower than the divergence between species, but no barcoding gap is detected. The success rate of the statistical methods of species identification is excellent (up to 99% or 100% for statistical supervised classification methods such as k-Nearest Neighbour or Random Forest). The 16S gene is 2.5 times less variable than the Cytb and CO1 genes. As a result its discriminatory power is smaller. To sum up, our results suggest that using DNA markers for identifying species in the Praomyini tribe is a largely valid approach, and that the CO1 and Cytb genes are better DNA markers than the 16S gene. Our results confirm the usefulness of statistical methods such as the Random Forest and the 1-NN methods to assign a sequence to a species, even when the number of species is relatively large. Based on our NJ trees and the distribution of all intraspecific and interspecific pairwise nucleotide distances, we highlight the presence of several potentially new species within the Praomyini tribe that should be subject to corroboration assessments.
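The 1-NN assignment mentioned in this abstract can be sketched as follows: a query sequence receives the species label of the reference sequence at the smallest pairwise nucleotide distance. The sketch assumes pre-aligned sequences of equal length and uses the uncorrected p-distance, which is a simplification; the species labels and sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def nearest_neighbour_species(query, reference):
    """Return the species label of the closest reference sequence (1-NN).

    `reference` is a list of (species_label, aligned_sequence) pairs.
    """
    species, _ = min(reference, key=lambda item: p_distance(query, item[1]))
    return species


# Hypothetical reference library, not real Praomyini sequences.
reference_library = [
    ("species_A", "ACGTACGT"),
    ("species_B", "TTGTACCA"),
]
assigned = nearest_neighbour_species("ACGTACGA", reference_library)
print(assigned)
```

The query differs from the `species_A` reference at one of eight sites and from the `species_B` reference at three, so it is assigned to `species_A`.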

14.
The problem of variable selection in generalized linear mixed models (GLMMs) is pervasive in statistical practice. For the purpose of variable selection, many methodologies for determining the best subset of explanatory variables currently exist according to the model complexity and differences between applications. In this paper, we develop a “higher posterior probability model with bootstrap” (HPMB) approach to select explanatory variables without fitting all possible GLMMs involving a small or moderate number of explanatory variables. Furthermore, to save computational load, we propose an efficient approximation approach with Laplace's method and Taylor's expansion to approximate intractable integrals in GLMMs. Simulation studies and an application to HapMap data provide evidence that this selection approach is computationally feasible and reliable for exploring true candidate genes and gene–gene associations, after adjusting for complex structures among clusters.

15.
Wei LY, Huang CL, Chen CH. BMC Genetics, 2005, 6(Z1): S133
Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information of one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case) was not easily detected directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify genes susceptible to the disease trait. During the first stage, based on phenotypes of subjects and their parents, decision trees were built to predict trait values. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptible genes during the second stage. Our results showed that the decision trees of the first stage had accuracy rates of about 99% in predicting the disease trait. The decision trees and rough set theory failed to identify the true disease-related loci.
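The core rough set operation behind this kind of analysis is the pair of lower and upper approximations of a target set (here, affected subjects) with respect to a chosen attribute subset. The sketch below shows only that textbook operation, not the authors' two-stage pipeline; the subject IDs, marker names `m1` and `m2`, and genotype codes are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object IDs that share identical values on `attributes`."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough set approximations of `target`.

    Lower: objects whose whole equivalence class lies inside the target.
    Upper: objects whose equivalence class overlaps the target at all.
    """
    target = set(target)
    lower, upper = set(), set()
    for cls in equivalence_classes(objects, attributes):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


# Hypothetical subjects with genotypes coded 0/1 at two markers.
subjects = {
    "s1": {"m1": 0, "m2": 1},
    "s2": {"m1": 0, "m2": 1},
    "s3": {"m1": 1, "m2": 0},
    "s4": {"m1": 1, "m2": 1},
}
affected = {"s1", "s3"}
lower_both, upper_both = approximations(subjects, ["m1", "m2"], affected)
lower_m1, upper_m1 = approximations(subjects, ["m1"], affected)
print(lower_both, upper_both)
print(lower_m1, upper_m1)
```

Dropping `m2` empties the lower approximation in this toy case, which is the kind of signal used to judge whether an attribute can be removed when searching for minimal attribute subsets.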

16.
Gene set analysis allows the inclusion of knowledge from established gene sets, such as gene pathways, and potentially improves the power of detecting differentially expressed genes. However, conventional methods of gene set analysis focus on gene marginal effects in a gene set, and ignore gene interactions which may contribute to complex human diseases. In this study, we propose a method of gene interaction enrichment analysis, which incorporates knowledge of predefined gene sets (e.g. gene pathways) to identify enriched gene interaction effects on a phenotype of interest. In our proposed method, we also discuss the reduction of irrelevant genes and the extraction of a core set of gene interactions for an identified gene set, which contribute to the statistical variation of a phenotype of interest. The utility of our method is demonstrated through analyses on two publicly available microarray datasets. The results show that our method can identify gene sets that show strong gene interaction enrichments. The enriched gene interactions identified by our method may provide clues to new gene regulation mechanisms related to the studied phenotypes. In summary, our method offers a powerful tool for researchers to exhaustively examine the large numbers of gene interactions associated with complex human diseases, and can be a useful complement to classical gene set analyses which only consider single genes in a gene set.

17.
Chen M, Cho J, Zhao H. PLoS Genetics, 2011, 7(4): e1001353
Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulated in the literature on biological pathways and interactions. It is conceivable that appropriate incorporation of such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have been developed recently to prioritize genes using prior biological knowledge, such as pathways, most methods treat genes in a specific pathway as an exchangeable set without considering the topological structure of a pathway. However, how genes are related to each other in a pathway may be very informative for identifying association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF) model to incorporate pathway topology for association analysis. We show that the conditional distribution of our MRF model takes on a simple logistic regression form, and we propose an iterated conditional modes algorithm as well as a decision theoretic approach for statistical inference of each gene's association with disease. Simulation studies show that our proposed framework is more effective at identifying genes associated with disease than a single gene-based method. We also illustrate the usefulness of our approach through its application to a real data example.
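The general idea of iterated conditional modes (ICM) on a pathway graph can be sketched as follows: each gene's binary label is repeatedly set to the value favoured by a logistic-form conditional that combines the gene's own evidence with the labels of its neighbours. This is a generic illustration and not the parametrization from the paper; the parameters `h` and `tau`, the evidence scores, and the toy pathway are all hypothetical.

```python
def icm(evidence, neighbours, h=-1.0, tau=0.5, max_sweeps=50):
    """Label genes 1 (associated) or 0 by iterated conditional modes.

    Assumed conditional log-odds for gene g (illustrative form only):
        h + tau * (neighbours labelled 1 - neighbours labelled 0) + evidence[g]
    Each sweep sets every label to its conditional mode given the others.
    """
    # Start from the labels suggested by each gene's own evidence.
    labels = {gene: 1 if score > 0 else 0 for gene, score in evidence.items()}
    for _ in range(max_sweeps):
        changed = False
        for gene in sorted(labels):
            support = sum(
                1 if labels[n] == 1 else -1 for n in neighbours.get(gene, ())
            )
            log_odds = h + tau * support + evidence[gene]
            new_label = 1 if log_odds > 0 else 0
            if new_label != labels[gene]:
                labels[gene] = new_label
                changed = True
        if not changed:  # converged: no label moved in a full sweep
            break
    return labels


# Hypothetical pathway: chain A - B - C, with D unconnected.
pathway = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
gene_evidence = {"A": 3.0, "B": 0.8, "C": 2.5, "D": 0.8}
result = icm(gene_evidence, pathway)
print(result)
```

Genes B and D carry the same weak evidence, but B is retained because both of its neighbours are labelled as associated, whereas the isolated gene D is dropped. That contrast is the benefit of using pathway topology rather than treating genes as an exchangeable set.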

18.
MOTIVATION: We have established a novel data mining procedure for the identification of genes associated with pre-defined phenotypes and/or molecular pathways. Based on the observation that these genes are frequently expressed in the same place or in close proximity at about the same time, we have devised an approach termed Common Denominator Procedure. One unusual feature of this approach is that the specificity and probability to identify genes linked to the desired phenotype/pathway increase with greater diversity of the input data. RESULT: To show the feasibility of our approach, the Cancer Genome Anatomy Project expression data combined with a defined set of angiogenic factors was used to identify additional and novel angiogenesis-associated genes. A multitude of these additional genes were known to be associated with angiogenesis according to published data, verifying our approach. For some of the remaining candidate genes, application of a high-throughput functional genomics platform (XantoScreen) provided further experimental evidence for association with angiogenesis.

19.
The identification of quantitative trait loci (QTLs) in arthritis animal models is a straightforward process. However, identifying the underlying genes is a great challenge. One frequently used strategy is to combine QTL analysis with genomic/proteomic screens. This has resulted in a number of publications where carefully performed genomic analyses present likely candidate genes for their respective QTLs. However, the findings are seldom reconnected to the QTL-controlled phenotypes. In this review, we use our own data as an illustrative example that “very likely candidate genes” identified by genomic/proteomic screens are not necessarily the same as the true genes underlying the QTLs.

20.
Disease is often implicated as a factor in population declines of wildlife and plants. Understanding the characteristics that may predispose a species to infection by a particular pathogen can help direct conservation efforts. Recent declines in amphibian populations world-wide are a major conservation issue and may be caused in part by a fungal pathogen, Batrachochytrium dendrobatidis (Bd). We used Random Forest, a machine learning approach, to identify species-level characteristics that may be related to susceptibility to Bd. Our results suggest that body size at maturity, aspects of egg laying behavior, taxonomic order and family, and reliance on water are good predictors of documented infection for species in the continental United States. These results suggest that, whereas local-scale environmental variables are important to the spread of Bd, species-level characteristics may also influence susceptibility to Bd. The relationships identified in this study suggest future experimental tests, and may target species for conservation efforts.
