Similar literature
 20 similar articles found; search took 15 ms
1.

Background  

In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, an SCA may be hampered by the data blocks being subject to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. That study, however, did not mimic typical functional genomics data sets in many respects, such as data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real-life Escherichia coli metabolomics data set.
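As a rough illustration of the ordinary SCA-P baseline mentioned in this abstract, the sketch below treats SCA-P for blocks that share the experimental (row) mode as a principal component analysis of the column-wise concatenated, column-centered blocks. This is an assumption about the general technique, not the authors' implementation, and it does not cover the noise-weighted MxLSCA-P variant. The function name `sca_p` and the toy block sizes are hypothetical.

```python
import numpy as np

def sca_p(blocks, n_components):
    """Sketch of SCA-P for blocks sharing the row (experimental) mode.

    Assumption: SCA-P is computed as a truncated SVD of the
    column-centered blocks placed side by side.
    """
    # Center each variable, then concatenate blocks along the variable mode.
    X = np.hstack([b - b.mean(axis=0) for b in blocks])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Common scores: one row per experimental unit, shared by all blocks.
    scores = U[:, :n_components] * s[:n_components]
    # Loadings: one row per variable, split back into the original blocks.
    loadings = Vt[:n_components].T
    split_points = np.cumsum([b.shape[1] for b in blocks])[:-1]
    return scores, np.split(loadings, split_points, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 6 experimental units, more variables than units overall.
    block_a = rng.normal(size=(6, 4))
    block_b = rng.normal(size=(6, 10))
    scores, block_loadings = sca_p([block_a, block_b], n_components=2)
    print(scores.shape, [l.shape for l in block_loadings])
```

Because every block contributes to the same SVD, a block with more variables or more noise can dominate the common scores, which is the situation the abstract says MxLSCA-P was designed to address.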

2.
3.
4.
The cancer classification problem is one of the most challenging problems in bioinformatics. The data provided by the Netherlands Cancer Institute consist of 295 breast cancer patients: 101 patients with distant metastases and 194 without. We investigate combining feature sets with kernel methods to classify patients as with or without distant metastases. Single data sets are compared with three data integration strategies and with weighted data integration strategies based on kernel methods. The Least Squares Support Vector Machine (LS-SVM) is chosen as the classifier because it can handle very high-dimensional features such as microarray data. The experimental results show that weighted late integration and the use of microarray data alone perform almost equally well. In this case, data integration is not always better than using a single data set. Classification performance depends strongly on the features used to represent the object.

5.
6.
An attempt to apply structured exploratory data analysis (SEDA) encounters problems of specificity and power, which limit its utility as a supplement to likelihood analysis.

7.
Lam Tran  Kevin He  Di Wang  Hui Jiang 《Biometrics》2023,79(2):1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
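The idea of choosing an external-data weight by its effect on the local LOOCV error can be sketched for a linear model as follows. This is a deliberately simplified stand-in, not the method of the paper: it uses a single external data set, one scalar weight, and a brute-force grid search, whereas the abstract describes rewriting the LOOCV error as a function of the integration weights. All function names and the weight grid are hypothetical.

```python
import numpy as np

def fit_weighted_ols(X_local, y_local, X_ext, y_ext, w):
    """Least squares in which every external row carries weight w (local rows weight 1)."""
    A = X_local.T @ X_local + w * (X_ext.T @ X_ext)
    b = X_local.T @ y_local + w * (X_ext.T @ y_ext)
    return np.linalg.solve(A, b)

def loocv_error(X_local, y_local, X_ext, y_ext, w):
    """Mean squared leave-one-out error on the LOCAL data only."""
    n = X_local.shape[0]
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # hold out local observation i
        beta = fit_weighted_ols(X_local[keep], y_local[keep], X_ext, y_ext, w)
        errors[i] = (y_local[i] - X_local[i] @ beta) ** 2
    return errors.mean()

def choose_weight(X_local, y_local, X_ext, y_ext, grid):
    """Return the grid weight with the smallest local LOOCV error, plus all errors."""
    errors = [loocv_error(X_local, y_local, X_ext, y_ext, w) for w in grid]
    return grid[int(np.argmin(errors))], errors

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    beta_true = np.array([1.0, -2.0, 0.5])
    X_local = rng.normal(size=(12, 3))
    y_local = X_local @ beta_true + 0.1 * rng.normal(size=12)
    # External data generated from shifted parameters (a confounded source).
    X_ext = rng.normal(size=(200, 3))
    y_ext = X_ext @ (beta_true + 1.0) + 0.1 * rng.normal(size=200)
    best_w, errors = choose_weight(X_local, y_local, X_ext, y_ext, [0.0, 0.01, 0.1, 1.0])
    print(best_w, errors)
```

A weight of zero reduces to ordinary least squares on the local data alone, so the grid always contains the "ignore the external data" option as a fallback.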

8.

Background  

We present an effective, rapid, systematic data mining approach for identifying genes or proteins related to a particular interest. A selected combination of programs exploring PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez), and state-of-the-art pathway knowledge bases (LSGraph and Ingenuity Pathway Analysis) was assembled to distinguish enzymes with hydrolytic activities that are expressed in the extracellular space of cancer cells. Proteins were identified with respect to six types of cancer occurring in the prostate, breast, lung, colon, ovary, and pancreas.

9.
10.
Microarrays have become a standard tool for investigating gene function, and more complex microarray experiments are increasingly being conducted. For example, an experiment may involve samples from several groups or may investigate changes in gene expression over time for several subjects, leading to large three-way data sets. In response to this increase in data complexity, we propose some extensions to the plaid model, a biclustering method developed for the analysis of gene expression data. This model-based method lends itself to the incorporation of any additional structure such as external grouping or repeated measures. We describe how the extended models may be fitted and illustrate their use on real data.

11.
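Urban development intensity from the perspective of multi-source spatial data integration   Total citations: 3, self-citations: 0, citations by others: 3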
岳文泽  章佳民  刘勇  张玮 《生态学报》2019,39(21):7914-7926
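Urban development intensity directly reflects the intensity of human activity and is of great value for guiding urban planning and management and for promoting sustainable urban development. Taking a coupled social-economic-ecological system perspective, this study builds a multidimensional framework for measuring urban development intensity, integrates multi-source spatial data, measures the development intensity of Hangzhou's main city and its three sub-cities, and reveals its spatial distribution. The results show that development intensity in Hangzhou declines, with fluctuations, from the main city toward the sub-cities; high-intensity development is overly concentrated in the main city, whose functions need further dispersal. Development intensity differs among the sub-cities: Jiangnan sub-city shows cross-river integration with the main city, whereas Linping and Xiasha sub-cities are spatially more independent. Among the development dimensions, the hotspots of building intensity, functional intensity, and benefit intensity largely coincide, while areas of high environmental-response intensity cluster at the boundaries between the main city and the sub-cities, showing spatial heterogeneity. The multidimensional framework characterizes urban development intensity well and has practical value for urban planning and fine-grained management.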

12.
There are currently 100–200 microbiology-related databases in existence, although it is impossible to find answers to queries that span even a few of these. The Center for Microbial Ecology (CME) at Michigan State University seeks to change this situation by coordinating the creation of an Integrated Microbial Database (IMD), accessible through the World Wide Web (WWW). Such a system will contain up-to-date phylogeny and taxonomy, gene sequences (including genomes), biochemical data, metabolic models, and ecological and phenotypic data. The main current obstacles to the creation of an IMD are the lack of a single freely available organismal nomenclature with synonyms and the unavailability of much critical data. An IMD will have major impacts on microbial biology: currently intractable fundamental questions might be answered, experiments could be refocused, and new commercial possibilities created. An IMD should remain freely available and be created under an open development model. Received 20 June 1996 / Accepted in revised form 02 November 1996

13.
14.
Platyrrhine phylogeny has been investigated repeatedly with morphological characters and DNA nuclear gene sequences, with partially inconsistent results. Given the finding in the past decade that the mitochondrial genome is a potentially valuable source of phylogenetic information, we gathered DNA sequence data of a fragment of the 16S and the entire 12S mitochondrial genes. The objectives were to generate a cladistic phylogeny based on these data and to combine them in a simultaneous analysis with morphological characters and preexisting nuclear DNA sequences. Mitochondrial data analyzed on its own yielded a cladogram that was different from those generated with other data sets. The simultaneous analysis of mitochondrial, nuclear, and morphological data yielded a tree most congruent with that generated with nuclear data and to a lesser degree with the morphological one. It depicted a basal dichotomy that led to two major clades: one of them comprised [Atelinae (Callicebus + Pitheciini)] and the other major clade comprised [Aotus ((Cebus, Saimiri) (Callitrichinae))]. The weakest point of the phylogeny was the position of Aotus as basal within their clade as opposed to more closely linked with either the callitrichines or Cebus-Saimiri. Relationships within callitrichines and atelines were unstable as well. The simultaneous phylogenetic analysis of all data sets revealed congruent signal in all of them that was partially obscured in the separate analyses. Am J Phys Anthropol 106:261–281, 1998. © 1998 Wiley-Liss, Inc.

15.
A cluster is a group of interconnected computers that acts as a single system to provide users with computing resources; each computer is a node of the cluster. With the rapid development of computer technology, cluster computing, with its high performance–cost ratio, has been widely applied in distributed parallel computing. For the large-scale close data in group enterprises, a heterogeneous data integration model was built in a cluster environment based on cluster computing, XML technology, and ontology theory. The model provides users with unified and transparent access interfaces, and it addresses heterogeneous data integration by means of ontology and XML technology. Compared with a traditional data integration model, it achieved good application results and improved the computing capacity of the system while retaining a high performance–cost ratio. It is hoped that the model can support decision-making by enterprise managers.

16.
Protein data, from sequence and structure to interaction, are being generated through many diverse methodologies; they are stored and reported in numerous forms and multiple places. The magnitude of the data limits researchers' ability to use all of the information generated. Effective integration of protein data can be accomplished through better data modeling. We demonstrate this through the MIPD project.

17.
MOTIVATION: Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY: www.uniprot.org/mapping and proteininformation-resource.org/pirwww/search/idmapping.shtml CONTACT: huang@dbi.udel.edu.

18.
19.
Jensen LJ  Steinmetz LM 《FEBS letters》2005,579(8):1802-1807
To understand a biological process, it is clear that a single approach will not be sufficient, just as a single measurement on a protein, such as its expression level, does not describe protein function. Using reference sets of proteins as benchmarks, different approaches can be scaled and integrated. Here, we demonstrate the power of data re-analysis and integration by applying it in a case study to data from deletion phenotype screens and mRNA expression profiling.

20.
The low reproducibility of differential expression of individual genes in microarray experiments has led to the suggestion that experiments be analyzed in terms of gene characteristics, such as GO categories or pathways, in order to enhance the robustness of the results. An implicit assumption of this approach is that the different experiments in effect randomly sample the genes participating in an active process. We argue that by the same rationale it is possible to perform this higher-level analysis on the aggregation of genes that are differentially-expressed in different expression-based studies, even if the experiments used different platforms. The aggregation increases the reliability of the results, it has the potential for uncovering signals that are liable to escape detection in the individual experiments, and it enables a more thorough mining of the ever more plentiful microarray data. We present here a proof-of-concept study of these ideas, using ten studies describing the changes in expression profiles of human host genes in response to infection by Retroviridae or Herpesviridae viral families. We supply a tool (accessible at www.cs.bgu.ac.il/~waytogo) which enables the user to learn about genes and processes of interest in this study.
