Similar literature
1.
MOTIVATION: The wealth of single nucleotide polymorphism (SNP) data within candidate genes and anticipated across the genome poses enormous analytical problems for studies of genotype-to-phenotype relationships, and modern data mining methods may be particularly well suited to meet the swelling challenges. In this paper, we introduce the method of Belief (Bayesian) networks to the domain of genotype-to-phenotype analyses and provide an example application. RESULTS: A Belief network is a graphical model of a probabilistic nature that represents a joint multivariate probability distribution and reflects conditional independences between variables. Given the data, optimal network topology can be estimated with the assistance of heuristic search algorithms and scoring criteria. Statistical significance of edge strengths can be evaluated using Bayesian methods and bootstrapping. As an example application, the method of Belief networks was applied to 20 SNPs in the apolipoprotein (apo) E gene and plasma apoE levels in a sample of 702 individuals from Jackson, MS. Plasma apoE level was the primary target variable. These analyses indicate that the edge between SNP 4075, coding for the well-known epsilon2 allele, and plasma apoE level was strong. Belief networks can effectively describe complex uncertain processes and can both learn from data and incorporate prior knowledge. AVAILABILITY: Various alternative and supplemental networks (not given in the text) as well as source code extensions, are available from the authors. SUPPLEMENTARY INFORMATION: http://bioinformatics.oxfordjournals.org.
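
For orientation, the "joint multivariate probability distribution" that a Belief network represents is conventionally written as a product of local conditional distributions, one per node given its parents in the graph. This is the standard textbook factorization rather than anything specific to the paper; the symbols $X_i$ and $\mathrm{Pa}(X_i)$ (the parent set of node $X_i$) are notation assumed for this sketch.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```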

2.
3.
4.
Using Bayesian networks to analyze expression data. Total citations: 44 (self-citations: 0; citations by others: 44)

5.
6.
7.
8.
MOTIVATION: There is currently much interest in reverse-engineering regulatory relationships between genes from microarray expression data. We propose a new algorithmic method for inferring such interactions between genes using data from gene knockout experiments. The algorithm we use is the Sparse Bayesian regression algorithm of Tipping and Faul. This method is highly suited to this problem as it does not require the data to be discretized, overcomes the need for an explicit topology search and, most importantly, requires no heuristic thresholding of the discovered connections. RESULTS: Using simulated expression data, we are able to show that this algorithm outperforms a recently published correlation-based approach. Crucially, it does this without the need to set any ad hoc threshold on possible connections.

9.
The associations of apolipoprotein B (apoB) gene polymorphisms with blood lipid levels, also accounting for apo E polymorphisms, were assessed in 82 phenylketonuric (PKU) children on diet (34 girls, 48 boys, age 4-12 years, median 8 years). Dietary and plasma biochemical assessments were performed at six-month intervals from the age of 24 months onwards. Apo B (XbaI, MspI, EcoRI restriction sites) and apo E (E2, E3, E4) gene polymorphisms were determined by restriction-enzyme analysis after DNA extraction from blood. Subgroups of apoB polymorphisms were similar for energy intake, dietary lipids and distribution of apo E polymorphisms. Children carrying XbaI X+ / X+ showed higher plasma levels of LDL cholesterol than children carrying X- / X-/+. This gene-related response to dietary habits might play a role also in non-PKU individuals fed low-fat, low-cholesterol diets.

10.
Genetic variation at microsatellite loci is supposed to be constrained within some range in allele size. In this case, the average-square distance (δμ)² between two diverged populations moves asymptotically around and underestimates the time since the populations had split. A distance based on the between-locus correlation in the mean repeat scores, DR, is introduced. Numerical simulations show that DR is a linear function of time if the constraints are approximated by a linear centripetal force, which might be due to mutation bias toward a definite range or be caused both by directional mutation bias toward larger allele size and by selection against the greater number of repeats.
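
As a point of reference, the average-square distance mentioned above is conventionally computed as the squared difference between the two populations' mean repeat counts, averaged over loci. The following is a minimal sketch of that calculation only; the function name `delta_mu_squared` and the allele sizes are illustrative assumptions, and the DR distance introduced in the paper is not implemented here.

```python
from statistics import mean


def delta_mu_squared(pop_a, pop_b):
    """Average over loci of the squared difference in mean repeat count.

    pop_a, pop_b: lists with one entry per locus; each entry is a list of
    allele sizes (repeat counts) sampled at that locus in the population.
    """
    if len(pop_a) != len(pop_b):
        raise ValueError("both populations must cover the same loci")
    return mean((mean(a) - mean(b)) ** 2 for a, b in zip(pop_a, pop_b))


# Hypothetical repeat counts at two loci (not data from the paper).
pop_a = [[10, 12, 11, 11], [20, 22]]
pop_b = [[13, 13, 14, 12], [21, 21]]
print(delta_mu_squared(pop_a, pop_b))
```

In this toy input the first locus differs by two repeats in its mean and the second locus does not differ at all, so only the first locus contributes to the distance.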

11.
12.
Using genetic marker data, we have developed a general methodology for estimating genetic relationships between a set of individuals. The purpose of this paper is to illustrate the practical utility of these methods as applied to the problem of paternity testing. Bayesian methods are used to compute the posterior probability distribution of the genetic relationship parameters. Use of an interval-estimation approach rather than a hypothesis-testing one avoids the problem of the specification of an appropriate null hypothesis in calculating the probability of paternity. Monte Carlo methods are used to evaluate the utility of two sets of genetic markers in obtaining suitably precise estimates of genetic relationship as well as the effect of the prior distribution chosen. Results indicate that with currently available markers a "true" father may be reliably distinguished from any other genetic relationship to the child and that with a reasonable number of markers one can often discriminate between an unrelated individual and one with a second-degree relationship to the child.

13.
14.
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.

15.
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.

16.
Nearly 2200 genomes that encode around 6 million proteins have now been sequenced. Around 40% of these proteins are of unknown function, even when function is loosely and minimally defined as 'belonging to a superfamily'. In addition to in silico methods, the swelling stream of high-throughput experimental data can give valuable clues for linking these unknowns with precise biological roles. The goal is to develop integrative data-mining platforms that allow the scientific community at large to access and utilize this rich source of experimental knowledge. To this end, we review recent advances in generating whole-genome experimental datasets, where this data can be accessed, and how it can be used to drive prediction of gene function.

17.
18.
Summary. In this article, we present new methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell-cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arise from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. Our approach is fully Bayesian and uses Markov chain Monte Carlo methods for inference and estimation. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling. Our methodology uses regression splines, and because of the hierarchical nature of the data, dimension reduction of the covariance matrix of the spline coefficients is important: we suggest simple methods for overcoming this problem.

19.
Kim S, Imoto S, Miyano S. Bio Systems, 2004, 75(1-3): 57-65
We propose a dynamic Bayesian network and nonparametric regression model for constructing a gene network from time series microarray gene expression data. The proposed method can overcome a shortcoming of the Bayesian network model in the sense of the construction of cyclic regulations. The proposed method can analyze the microarray data as continuous data and can capture even nonlinear relations among genes. It can be expected that this model will give a deeper insight into complicated biological systems. We also derive a new criterion for evaluating an estimated network from a Bayes approach. We conduct Monte Carlo experiments to examine the effectiveness of the proposed method. We also demonstrate the proposed method through the analysis of the Saccharomyces cerevisiae gene expression data.

20.
Apolipoprotein E (apoE, protein; APOE, gene) is important in lipoprotein metabolism. Three isoforms, apoE2 (Cys112 Cys158), apoE3 (Cys112 Arg158), and apoE4 (Arg112 Arg158), are present in the general population. This report investigates the frequency distribution of apoE isoforms and the association of APOE genotypes with plasma lipid profile and coronary heart disease (CHD) in a population of Taiwan. ApoE isoforms were determined genetically by polymerase chain reaction and HhaI restriction enzyme digestion in control and coronary heart disease (CHD) patients. Plasma lipid and lipoprotein concentrations were also determined. The control group exhibited frequencies of 84.6% APOE3, 7.9% APOE4, 7.5% APOE2, 70.6% APOE3E3, 14.4% APOE3E4, 13.6% APOE2E3, and 1.4% APOE2E4. Comparable frequencies were observed in the CHD group. In both APOE2 carrier and APOE3E3 groups, the CHD patients expressed abnormal lipid profiles while the control group expressed normal lipid profiles. The APOE4 carriers, however, expressed abnormal lipid profiles in both normal control and CHD groups. Extremely high apoE levels in the hypertriglyceridemic group (TG > 400 mg/dL) seemed to be undesirable and were often observed in CHD patients.
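
The allele frequencies quoted for the control group can be cross-checked against the genotype frequencies by gene counting: each genotype contributes half of its frequency to each of its two alleles. The sketch below uses only the four control-group genotype figures given in the abstract; the function name `allele_frequencies` is an illustrative assumption and not something taken from the paper.

```python
from collections import defaultdict


def allele_frequencies(genotype_freqs):
    """Gene counting: each genotype gives half its frequency to each allele."""
    freqs = defaultdict(float)
    for (allele_1, allele_2), freq in genotype_freqs.items():
        freqs[allele_1] += freq / 2
        freqs[allele_2] += freq / 2
    return dict(freqs)


# Control-group genotype frequencies quoted in the abstract, as proportions.
control = {
    ("E3", "E3"): 0.706,
    ("E3", "E4"): 0.144,
    ("E2", "E3"): 0.136,
    ("E2", "E4"): 0.014,
}

for allele, freq in sorted(allele_frequencies(control).items()):
    print(f"APOE{allele[1]}: {freq:.1%}")
```

The computed values (7.5% for APOE2, 84.6% for APOE3, 7.9% for APOE4) agree with the allele frequencies reported in the abstract, so the genotype and allele figures are internally consistent.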
