Similar literature
 20 similar articles found; search took 15 ms
1.
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.

2.
3.
Analysis of variance (ANOVA) was employed to investigate 9,000 gene expression patterns from brains of both normal mice and mice with a pharmacological model of Parkinson's disease (PD). The data set was obtained using voxelation, a method that allows high-throughput acquisition of 3D gene expression patterns through analysis of spatially registered voxels (cubes). This method produces multiple volumetric maps of gene expression analogous to the images reconstructed in biomedical imaging systems. The ANOVA model was compared to the results from singular value decomposition (SVD) by using the first 42 singular vectors of the data matrix, a number equal to the rank of the ANOVA model. The ANOVA was also compared to the results from non-parametric statistics. Lastly, images were obtained for a subset of genes that emerged from the ANOVA as significant. The results suggest that ANOVA will be a valuable framework for insights into the large number of gene expression patterns obtained from voxelation.

4.
We present a computational approach based on a local search strategy that discovers sets of proteins that preferentially interact with each other. Such sets are referred to as protein communities and are likely to represent functional modules. Preferential interaction between module members is quantified via an analytical framework based on a network null model known as the random graph with given expected degrees. Based on this framework, the concept of local protein community is generalized to that of community of communities. Protein communities and higher-level structures are extracted from two yeast protein interaction data sets and a network of published interactions between human proteins. The high-level structures obtained with the human network correspond to broad biological concepts such as signal transduction, regulation of gene expression, and intercellular communication. Many of the obtained human communities are enriched, in a statistically significant way, for proteins having no clear orthologs in lower organisms. This indicates that the extracted modules are quite coherent in terms of function.
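
The null model named in this abstract can be illustrated with a minimal sketch. It assumes the usual form of the expected-degree random graph, in which an edge between proteins u and v appears with probability d_u * d_v / (sum of all degrees), and it compares the observed number of edges inside a candidate community with the number expected under that model. The function names and the toy network are hypothetical and do not come from the paper, whose actual scoring function is not given in the abstract.

```python
from itertools import combinations


def expected_internal_edges(degrees, members):
    """Expected edge count inside `members` under the expected-degree null model.

    Assumes edge probability d_u * d_v / total_degree for each pair (u, v).
    """
    total_degree = sum(degrees.values())  # equals 2 * (number of edges)
    return sum(
        degrees[u] * degrees[v] / total_degree
        for u, v in combinations(sorted(members), 2)
    )


def observed_internal_edges(edges, members):
    """Count edges whose two endpoints both lie in `members`."""
    members = set(members)
    return sum(1 for u, v in edges if u in members and v in members)


# Toy interaction network (hypothetical, for illustration only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]

# Derive each node's degree from the edge list.
degrees = {}
for u, v in edges:
    degrees[u] = degrees.get(u, 0) + 1
    degrees[v] = degrees.get(v, 0) + 1

community = {"A", "B", "C"}
observed = observed_internal_edges(edges, community)
expected = expected_internal_edges(degrees, community)
print(f"observed={observed}, expected={expected:.3f}")
```

A set whose observed internal edge count is well above the expected count is a candidate community in this simplified sense; a real analysis would also need a significance measure.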

5.
6.
7.
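The biopharmaceutical industry is the third-largest sector of China's pharmaceutical manufacturing industry and one of the emerging high-technology industries given priority support in the Guangdong-Hong Kong-Macao Greater Bay Area. Examining how the industry's main business revenue relates to various indicators is a necessary step toward guiding further planning of the biopharmaceutical industry in the Greater Bay Area, and grey synthetic relational analysis is an important tool for exploring the degree of association between indicators. Data on six indicators were obtained from multiple authoritative sources: the number of enterprises above designated size, the number of business incubators, the number of new patents, the total funding of approved natural science foundation projects, the number of approved non-natural-science foundation projects, and the number of industry-university-research cooperation projects. Using grey relational analysis, the data were modelled and solved to obtain the relational degree between each of the six indicators and the main business revenue of the biopharmaceutical industry in the Greater Bay Area. The results show that the number of enterprises above designated size has the highest relational degree with main business revenue, followed by the total funding of natural science foundation projects and then the number of industry-university-research cooperation projects; the number of non-natural-science foundation projects ranks fourth, and the number of business incubators and the number of new patents rank fifth and sixth, respectively. Further development of the biopharmaceutical industry in the Greater Bay Area should therefore focus on three areas: increasing the number of enterprises above designated size, prioritising natural science foundation investment, and strengthening government-industry-university-research cooperation.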
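
The relational degree mentioned in this abstract can be sketched as follows. This is the basic Deng grey relational grade with initial-value normalisation and a distinguishing coefficient of 0.5, which is an assumption: the abstract names a synthetic (comprehensive) variant and does not state its normalisation or parameters. The function name, indicator names, and numbers are invented for illustration and are not data from the study.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate series against `reference`.

    `rho` is the distinguishing coefficient (0.5 is a common default).
    """
    def normalise(series):
        # Initial-value normalisation: divide every point by the first one.
        return [x / series[0] for x in series]

    ref = normalise(reference)
    diffs = {
        name: [abs(r - c) for r, c in zip(ref, normalise(series))]
        for name, series in candidates.items()
    }
    flat = [d for row in diffs.values() for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference after normalisation.
        return {name: 1.0 for name in diffs}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for name, row in diffs.items()
    }


# Invented yearly values, NOT taken from the study.
revenue = [100, 120, 150, 180]
indicators = {
    "large_firms": [10, 12, 15, 18],   # grows in step with revenue
    "new_patents": [50, 52, 51, 60],   # grows more slowly
}
grades = grey_relational_grades(revenue, indicators)
ranking = sorted(grades, key=grades.get, reverse=True)
print(grades, ranking)
```

Sorting the indicators by grade yields a ranking of the same kind as the one reported in the abstract.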

8.
A catalog of all human protein-protein interactions would provide scientists with a framework to study protein deregulation in complex diseases such as cancer. Here we demonstrate that a probabilistic analysis integrating model organism interactome data, protein domain data, genome-wide gene expression data and functional annotation data predicts nearly 40,000 protein-protein interactions in humans, a result comparable to those obtained with experimental and computational approaches in model organisms. We validated the accuracy of the predictive model on an independent test set of known interactions and also experimentally confirmed two predicted interactions relevant to human cancer, implicating uncharacterized proteins in definitive pathways. We also applied the human interactome network to cancer genomics data and identified several interaction subnetworks activated in cancer. This integrative analysis provides a comprehensive framework for exploring the human protein interaction network.

9.
10.
11.
This study aimed to assess the importance of variation in observer effort between and within bird atlas projects and to demonstrate the use of relatively simple conditional autoregressive (CAR) models for analyzing grid-based atlas data with varying effort. The study area was Pennsylvania and West Virginia, United States of America. We used varying proportions of randomly selected training data to assess whether variations in observer effort can be accounted for using CAR models and whether such models would still be useful for atlases with incomplete data. We then evaluated whether the application of these models influenced our assessment of distribution change between two atlas projects separated by twenty years (Pennsylvania), and tested our modeling methodology on a state bird atlas with incomplete coverage (West Virginia). CAR models that included observer effort and landscape covariates were able to make robust predictions of species distributions in cases of sparse data coverage. Further, we found that CAR models without landscape covariates performed favorably. These models also account for variation in observer effort between atlas projects and can have a profound effect on the overall assessment of distribution change. Accounting for variation in observer effort in atlas projects is critically important. CAR models provide a useful modeling framework for accounting for variation in observer effort in bird atlas data because they are relatively simple to apply and quick to run.

12.
《IRBM》2022,43(1):62-74
Background: The prediction of breast cancer subtypes plays a key role in the diagnosis and prognosis of breast cancer. In recent years, deep learning (DL) has shown good performance in the intelligent prediction of breast cancer subtypes. However, most traditional DL models use single-modality data, from which only a few features can be extracted, so they cannot establish a stable relationship between patient characteristics and breast cancer subtypes. Dataset: We used the TCGA-BRCA dataset as a sample set for molecular subtype prediction of breast cancer. It is a public dataset that can be obtained through the following link: https://portal.gdc.cancer.gov/projects/TCGA-BRCA Methods: In this paper, a Hybrid DL model based on multimodal data is proposed. We combine the patient's gene modality data with image modality data to construct a multimodal fusion framework. We set up a separate feature extraction network for each modality according to its form and state, and then fuse the outputs of the two feature networks using weighted linear aggregation. Finally, the fused features are used to predict breast cancer subtypes. In particular, we use principal component analysis to reduce the dimensionality of the high-dimensional gene modality data and filter the image modality data. We also improve the traditional feature extraction network to make it perform better. Results: The results show that, compared with the traditional DL model, the Hybrid DL model proposed in this paper is more accurate and efficient in predicting breast cancer subtypes. Our model achieved a prediction accuracy of 88.07% in 10 repetitions of 10-fold cross-validation. We did a separate AUC test for each subtype, and the average AUC value obtained was 0.9427. In terms of subtype prediction accuracy, our model is about 7.45% higher than the previous average.

13.
14.
In this paper we construct a model of the glycolytic-glycogenolytic converging pathway in rat liver by integrating experimental data obtained in an in vitro system and information available from the literature. The model takes the mathematical form of an S-system representation within the power law formalism (Savageau, 1976. Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. Addison-Wesley, Reading, Mass.). Using this theoretical framework, we carried out a model analysis that allowed (a) assessment of the quality of the model in terms of its consistency and robustness, (b) steady state analysis and control characterization of the system, and (c) study of the dynamics of the system after changes in the level of two magnitudes of biological significance: the glucose concentration and the phosphofructokinase enzyme activity. Model predictions, expressed as logarithmic gains of fluxes and substrate concentrations, are compared with experimental measurements, showing a good correlation between the model predictions and the experimentally determined values.
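
For readers unfamiliar with the formalism named in this abstract, the standard S-system form is sketched below. The symbols are the conventional ones and are assumptions here, since the abstract does not define any: n dependent variables, m independent variables, rate constants alpha_i and beta_i, and kinetic orders g_ij and h_ij.

```latex
% S-system: one aggregate production term and one aggregate degradation term
% per dependent variable X_i.
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}}
                - \beta_i  \prod_{j=1}^{n+m} X_j^{h_{ij}},
\qquad i = 1, \dots, n

% At steady state (dX_i/dt = 0), taking logarithms with y_j = \ln X_j
% gives a linear system:
\sum_{j=1}^{n+m} \left( g_{ij} - h_{ij} \right) y_j = \ln\frac{\beta_i}{\alpha_i},
\qquad i = 1, \dots, n

% Logarithmic gain of a dependent variable X_i with respect to an
% independent variable X_k, evaluated at the steady state:
L(X_i, X_k) = \frac{\partial \ln X_i}{\partial \ln X_k}
```

Because the steady-state equations are linear in the logarithms, logarithmic gains follow directly from solving that linear system.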

15.
16.
Liu D  Lin X  Ghosh D 《Biometrics》2007,63(4):1079-1088
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.

17.
This paper introduces a mathematical framework for modelling genome expression and regulation. Starting with a philosophical foundation, causation is identified as the principle of explanation of change in the realm of matter. Causation is, therefore, a relationship, not between components, but between changes of states of a system. We subsequently view genome expression (formerly known as 'gene expression') as a dynamic process and model aspects of it as dynamic systems using methodologies developed within the areas of systems and control theory. We begin with possibly the most abstract but general formulation, in the setting of category theory. The classes of models realised are state-space models, input-output models, autoregressive models and automata. We find that a number of proposed 'gene network' models are, therefore, included in the framework presented here. The conceptual framework that integrates all of these models defines a dynamic system as a family of expression profiles. It becomes apparent that the concept of a 'gene' is less appropriate when considering mathematical models of genome expression and regulation. The main claim of this paper is that we should treat (model) the organisation and regulation of genetic pathways as what they are: dynamic systems. Microarray technology allows us to generate large sets of time series data and is, therefore, discussed with regard to its use in mathematical modelling of gene expression and regulation.

18.
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering genes into homogeneous groups using gene expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge no work has addressed or evaluated the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, gene ordering algorithms are applied to the whole data set, and the domain of partitive clustering remains unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with a self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the sum of gene expression distances, and maximizing biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds a comparable or sometimes superior biological gene order in less computation time than optimal leaf ordering in a hierarchical clustering solution.
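
The objective of minimizing the sum of expression distances between adjacent genes can be illustrated with a small sketch. It uses a plain nearest-neighbour heuristic on Euclidean distances, which is a stand-in chosen for brevity and is neither FRAG_GALK nor Concorde; the function names and the toy expression profiles are hypothetical.

```python
import math


def path_cost(order, profiles):
    """Sum of Euclidean distances between consecutive genes in `order`."""
    return sum(
        math.dist(profiles[a], profiles[b]) for a, b in zip(order, order[1:])
    )


def nearest_neighbour_order(profiles, start):
    """Greedy ordering: repeatedly append the closest gene not yet placed."""
    remaining = set(profiles) - {start}
    order = [start]
    while remaining:
        last = profiles[order[-1]]
        # sorted() makes tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda g: math.dist(last, profiles[g]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy expression profiles for the genes of one cluster (hypothetical values).
profiles = {
    "g1": (0.0, 0.0),
    "g2": (5.0, 5.0),
    "g3": (1.0, 0.0),
    "g4": (5.0, 6.0),
}

arbitrary = ["g1", "g2", "g3", "g4"]
ordered = nearest_neighbour_order(profiles, start="g1")
print(ordered, path_cost(ordered, profiles), path_cost(arbitrary, profiles))
```

In the setting the abstract describes, such an ordering would be computed separately within each cluster produced by the partitive clustering step, with a proper TSP solver in place of the greedy heuristic.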

19.
MOTIVATION: Genome sequencing projects and high-throughput technologies like DNA and protein arrays have resulted in a very large amount of information-rich data. Microarray experimental data are a valuable but limited source for inferring gene regulation mechanisms on a genomic scale. Additional information such as promoter sequences of genes/DNA binding motifs, gene ontologies, and location data, when combined with gene expression analysis, can increase the statistical significance of the findings. This paper introduces a machine learning approach to information fusion for combining heterogeneous genomic data. The algorithm uses an unsupervised joint learning mechanism that identifies clusters of genes using the combined data. RESULTS: The correlation between gene expression time-series patterns obtained from different experimental conditions and the presence of several distinct and repeated motifs in their upstream sequences is examined here using publicly available yeast cell-cycle data. The results show that the combined learning approach taken here identifies correlated genes effectively. The algorithm provides an automated clustering method, but allows the user to specify a priori the influence of each data type on the final clustering using probabilities. AVAILABILITY: Software code is available by request from the first author. CONTACT: jkasturi@cse.psu.edu.

20.