Similar Literature (20 results)
1.
In the last decade, advances in high-throughput technologies such as DNA microarrays have made it possible to simultaneously measure the expression levels of tens of thousands of genes and proteins. This has resulted in large amounts of biological data requiring analysis and interpretation. Nonnegative matrix factorization (NMF) was introduced as an unsupervised, parts-based learning paradigm involving the decomposition of a nonnegative matrix V into two nonnegative matrices, W and H, via a multiplicative updates algorithm. In the context of a p × n gene expression matrix V consisting of observations on p genes from n samples, each column of W defines a metagene, and each column of H represents the metagene expression pattern of the corresponding sample. NMF has been applied primarily in an unsupervised setting in image and natural language processing. More recently, it has been successfully utilized in a variety of applications in computational biology. Examples include molecular pattern discovery, class comparison and prediction, cross-platform and cross-species analysis, functional characterization of genes, and biomedical informatics. In this paper, we review this method as a data-analytical and interpretive tool in computational biology, with an emphasis on these applications.
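To make the factorization concrete, the following is a minimal sketch of the multiplicative-update NMF described above, written in Python/NumPy; the matrix sizes, iteration count, and stopping rule are illustrative assumptions rather than details taken from the reviewed papers.

```python
import numpy as np

def nmf_multiplicative(V, n_metagenes, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative p x n matrix V into W (p x k) and H (k x n)
    using the classical multiplicative update rules. Columns of W are
    'metagenes'; columns of H are per-sample metagene expression patterns."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    W = rng.random((p, n_metagenes))
    H = rng.random((n_metagenes, n))
    for _ in range(n_iter):
        # Update H, then W; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage on a random "expression matrix" of 10 genes x 6 samples.
V = np.abs(np.random.default_rng(1).normal(size=(10, 6)))
W, H = nmf_multiplicative(V, n_metagenes=2)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```

In molecular pattern discovery, each sample is then typically assigned to the metagene with the largest entry in its column of H.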

2.
Protein-protein interaction networks: from interactions to networks
Interaction proteomics, which studies the protein-protein interactions of all expressed proteins, aims to understand the biological processes that are strictly regulated by these interactions. The availability of entire genome sequences of many organisms and of high-throughput analysis tools has led scientists to study the entire proteome (Pandey and Mann, 2000). Various high-throughput methods for detecting protein interactions, such as the yeast two-hybrid approach and mass spectrometry, produce vast amounts of data that can be utilized to decipher protein functions in complicated biological networks. In this review, we discuss recent developments in analytical methods for large-scale protein interactions and the future direction of interaction proteomics.

3.
Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. Their construction is based on the similarities or differences of physical or genetic features. Traditional approaches to constructing phylogenetic trees focus mainly on physical features. The recent advancement of high-throughput technologies has led to the accumulation of huge amounts of biological data, which in turn has changed the way biological studies are conducted in various respects. In this paper, we report our approach to building phylogenetic trees using information about interacting pathways. We have applied hierarchical clustering to two domains of organisms, eukaryotes and prokaryotes. Our preliminary results show the effectiveness of using interacting pathways to reveal evolutionary relationships.
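As a rough, generic illustration of this kind of pipeline (not the authors' actual implementation), each organism can be encoded as a binary pathway-presence profile and clustered hierarchically; the organism names and the pathway matrix below are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical presence/absence matrix: rows = organisms, columns = pathways.
organisms = ["org_A", "org_B", "org_C", "org_D"]
pathway_profiles = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
], dtype=bool)

# Jaccard distance between pathway profiles, average-linkage clustering.
Z = linkage(pathway_profiles, method="average", metric="jaccard")
tree = dendrogram(Z, labels=organisms, no_plot=True)
print(Z)  # the linkage matrix encodes the inferred tree
```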

4.

Background

Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages already exist, they either provide only a few optimization algorithms or focus on a specific application field. A complete NMF package that allows the bioinformatics community to perform a wide range of data mining tasks on biological data is still lacking.

Results

We provide a convenient MATLAB toolbox containing both implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data. The data mining approaches implemented in the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing value imputation, data visualization, and statistical comparison.

Conclusions

A series of analyses, such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison, can be performed using this toolbox.
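The toolbox itself is written in MATLAB, but the central clustering task it supports can be sketched in Python with scikit-learn as a generic illustration of NMF-based sample clustering; the random data and the choice of two metagenes are assumptions, not part of the toolbox.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical nonnegative expression matrix: 100 genes x 12 samples.
rng = np.random.default_rng(0)
V = np.abs(rng.normal(size=(100, 12)))

# Factor the samples into k metagene patterns, then label each sample by
# the metagene that dominates its column of H.
k = 2
model = NMF(n_components=k, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(V)   # genes x metagenes
H = model.components_        # metagenes x samples
sample_clusters = H.argmax(axis=0)
print(sample_clusters)       # one cluster label per sample
```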

5.
In recent years, the biomolecular sciences have been driven forward by overwhelming advances in new biotechnological high-throughput experimental methods and bioinformatic genome-wide computational methods. Such breakthroughs are producing huge amounts of new data that need to be carefully analysed to obtain correct and useful scientific knowledge. One of the fields where this advance has been most intense is the study of the network of 'protein-protein interactions', i.e. the 'interactome'. In this short review we comment on the main data and databases produced in this field in the last five years. We also present a rationalized scheme of biological definitions that will be useful for a better understanding and interpretation of 'what a protein-protein interaction is' and 'which types of protein-protein interactions are found in a living cell'. Finally, we comment on some assignments of interactome data to defined types of protein interaction, and we present a new bioinformatic tool called APIN (Agile Protein Interaction Network browser), which is under development and will be applied to browsing protein interaction databases.

6.
MOTIVATION: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower-dimensional space. RESULTS: In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. AVAILABILITY: The software is available as supplementary material.
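The alternating non-negativity-constrained least squares idea can be sketched in a few lines of Python using SciPy's NNLS solver. The sketch below is only the plain alternating skeleton and deliberately omits the sparsity penalty of the formulation above (which augments each least-squares problem with extra rows), so it should be read as an assumption-laden illustration rather than the published algorithm.

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(V, k, n_iter=50, seed=0):
    """Plain alternating non-negativity-constrained least squares for V ~ W @ H:
    each column of H (and each row of W) is solved with NNLS while the other
    factor is held fixed."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    W = rng.random((p, k))
    H = np.zeros((k, n))
    for _ in range(n_iter):
        H = np.column_stack([nnls(W, V[:, j])[0] for j in range(n)])
        W = np.vstack([nnls(H.T, V[i, :])[0] for i in range(p)])
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(20, 8)))
W, H = anls_nmf(V, k=3)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```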

7.
Protein–protein interactions (PPIs) play very important roles in many cellular processes and provide rich information for discovering biological facts and knowledge. Although various experimental approaches have been developed to generate large amounts of PPI data for different organisms, high-throughput experimental data usually suffer from high error rates, and as a consequence, the biological knowledge discovered from such data may be distorted or incorrect. It is therefore vital to assess the quality of protein interaction data and to extract reliable protein interactions from high-throughput experimental data. In this paper, we propose a new Semantic Reliability (SR) method to assess the reliability of each protein interaction and identify potential false-positive protein interactions in a dataset. For each pair of target interacting proteins, the SR method takes into account the semantic influence between proteins that interact with the target proteins, as well as the semantic influence between the target proteins themselves, when assessing the interaction reliability. Evaluations on real protein interaction datasets demonstrated that our method outperformed other existing methods in terms of extracting more reliable interactions from the original protein interaction datasets.

8.
Long non-coding RNAs (lncRNAs) participate at multiple levels in the regulation of fundamental biological processes, and their dysfunction frequently accompanies the onset of disease. Identifying the biological functions of lncRNAs has become a research hotspot in recent years. However, of the hundreds of thousands of lncRNAs identified so far from high-throughput sequencing of various eukaryotes, only a very small number have experimentally validated functions, which poses a major challenge for in-depth research in this field. For this reason, many research institutions have built lncRNA databases that are periodically updated, providing researchers with highly effective tools for sharing, annotating, and analyzing lncRNA functions. This article introduces the latest features and scope of application of lncRNA database resources from four aspects: integration of primary lncRNA resources; screening; identification and functional analysis; and the relationship between lncRNAs and human disease. It thereby provides a reference for researchers choosing among database resources for lncRNA identification and analysis.

9.
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for studies of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Astonishing discoveries have been made in the past years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation of this powerful analytic strategy, as well as an update on the state of the art in recent method developments. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and the shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for further method development.
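For readers new to the approach, the most basic building block of over-representation-style pathway analysis is a hypergeometric test of whether a gene set is enriched among the significant genes. The counts below are invented for illustration, and the test shown is the generic textbook version rather than any specific tool from this review.

```python
from scipy.stats import hypergeom

# Hypothetical counts for one pathway (gene set).
M = 20000   # genes in the background (all tested genes)
n = 150     # genes annotated to this pathway
N = 500     # genes declared significant in the study
k = 12      # significant genes that fall in the pathway

# P(X >= k) under the hypergeometric null: the enrichment p-value.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice this test is repeated over hundreds of pathways and the resulting p-values are corrected for multiple testing.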

10.
RNA interference (RNAi) has become a powerful tool for dissecting cellular pathways and characterizing gene functions. The availability of genome-wide RNAi libraries for various model organisms and mammalian cells has enabled high-throughput RNAi screens. These screens have successfully identified key components that had previously been missed in classical forward genetic screening approaches and allowed the assessment of combined loss-of-function phenotypes. Crucially, the quality of RNAi screening results depends on quantitative assays and the choice of the right biological context. In this review, we provide an overview of the design and application of high-throughput RNAi screens as well as data analysis and candidate validation strategies.

11.
12.
Motivation: DNA microarrays are a well-known and established technology in biological and pharmaceutical research, providing a wealth of information essential for understanding biological processes and aiding drug development. Protein microarrays are quickly emerging as a follow-up technology, which will also begin to experience rapid growth as the challenges in protein-spotting methodologies are overcome. Like DNA microarrays, their protein counterparts produce large amounts of data that must be suitably analyzed in order to yield meaningful information that should eventually lead to novel drug targets and biomarkers. Although the statistical management of DNA microarray data has been well described, there is no available report that offers a successful consolidated approach to the analysis of high-throughput protein microarray data. We describe the novel application of a statistical methodology to analyze the data from an immune response profiling assay using human protein microarrays with over 5000 proteins on each chip.

13.
MOTIVATION: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing. RESULTS: We have developed a method for automatically extracting the biological process functions of genes/proteins/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the name dictionaries developed by our group. To achieve wide recognition of gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for functions actually described in abstracts. When applied to PubMed, it extracted over 190,000 gene-GO relationships and 150,000 family-GO relationships for major eukaryotes.

14.
Chen Y, Xu D. Nucleic Acids Research 2004, 32(21):6414-6424
As we move into the post-genome-sequencing era, various high-throughput experimental techniques have been developed to characterize biological systems on the genomic scale. Discovering new biological knowledge from high-throughput biological data is a major challenge for bioinformatics today. To address this challenge, we developed a Bayesian statistical method, together with a Boltzmann machine and simulated annealing, for protein functional annotation in the yeast Saccharomyces cerevisiae by integrating various high-throughput biological data, including yeast two-hybrid data, protein complexes and microarray gene expression profiles. In our approach, we quantified the relationship between functional similarity and high-throughput data and encoded the relationship into a 'functional linkage graph', where each node represents one protein and the weight of each edge is characterized by the Bayesian probability of functional similarity between two proteins. We also integrated evolutionary information and protein subcellular localization information into the prediction. Based on our method, 1802 out of 2280 unannotated proteins in yeast were assigned functions systematically.
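As a loose illustration of the 'functional linkage graph' idea only (not the paper's Bayesian model, Boltzmann machine, or simulated annealing), per-data-source likelihood ratios can be combined naively into a posterior probability of shared function and used as an edge weight; all numbers, evidence types, and protein names below are hypothetical.

```python
# Hypothetical likelihood ratios: how much more likely each type of evidence is
# between functionally similar protein pairs than between dissimilar ones.
LIKELIHOOD_RATIO = {"two_hybrid": 3.0, "same_complex": 12.0, "coexpressed": 2.5}
PRIOR_ODDS = 0.05  # assumed prior odds that two random proteins share a function

def linkage_weight(evidence_sources):
    """Posterior probability of shared function under a naive-Bayes combination."""
    odds = PRIOR_ODDS
    for source in evidence_sources:
        odds *= LIKELIHOOD_RATIO[source]
    return odds / (1.0 + odds)

# Edges of a toy functional linkage graph: (protein, protein, evidence sources).
observations = [
    ("YFG1", "YFG2", ["two_hybrid", "coexpressed"]),
    ("YFG1", "YFG3", ["same_complex"]),
]
graph = {(a, b): linkage_weight(ev) for a, b, ev in observations}
print(graph)  # edge weights = probability of functional similarity
```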

15.
Genomics 2020, 112(1):174-183
Protein complexes are among the most important functional units for carrying out biological processes within the cell. Experimental methods have provided valuable data for inferring protein complexes, but these methods have inherent limitations. Considering these limitations, many computational methods have been proposed over the last decade to predict protein complexes. Almost all of these in-silico methods predict protein complexes from the ever-increasing protein–protein interaction (PPI) data. These computational approaches usually take PPI data in the form of a large protein–protein interaction network (PPIN) as input and output various sub-networks of the given PPIN as the predicted protein complexes. Some of these methods have already reached a promising efficiency in protein complex detection. Nonetheless, challenges remain in the prediction of other types of protein complexes, especially sparse and small ones. New methods should further incorporate knowledge of the biological properties of proteins to improve performance, and several challenges should be considered more effectively when designing new complex prediction algorithms in the future. This article not only reviews the history of computational protein complex prediction but also provides new insights for improving future methodologies. The most important computational methods for protein complex prediction are evaluated and compared, some of the challenges in the reconstruction of protein complexes are discussed, and finally, various tools for protein complex prediction and PPIN analysis, as well as the current high-throughput databases, are reviewed.
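Many of the classical prediction methods surveyed here search the PPIN for densely connected subgraphs. As a generic example of that family (not any particular algorithm from this review), the clique-percolation community finder in networkx can be run on a toy interaction network; the proteins and interactions below are made up.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Toy PPIN: edges are hypothetical protein-protein interactions.
G = nx.Graph([
    ("A", "B"), ("A", "C"), ("B", "C"),   # dense triangle -> candidate complex core
    ("B", "D"), ("C", "D"),               # D closes further triangles
    ("D", "E"), ("E", "F"),               # sparser tail, not complex-like
])

# Communities built from adjacent 3-cliques; overlapping triangles are merged.
predicted_complexes = [set(c) for c in k_clique_communities(G, 3)]
print(predicted_complexes)  # e.g. [{'A', 'B', 'C', 'D'}]
```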

16.
17.
The knowledge gleaned from genome sequencing and post-genome analyses is having a very significant impact on a whole range of life sciences and their applications. 'Genome-wide analysis' is a good keyword to represent this tendency. Thanks to innovations in high-throughput measurement technologies and information technologies, genome-wide analysis is becoming available in a broad range of research fields, from DNA sequences, gene and protein expression, and protein structures and interactions, to pathway or network analysis. In fact, the number of research targets has increased by more than two orders of magnitude in recent years, and we must drastically change our attitude toward research activities. The scope and speed of research activities are expanding, and the field of bioinformatics is playing an important role. In parallel with the data-driven research approach, which focuses on the speedy handling and analysis of huge amounts of data, a new approach is gradually gaining power: a 'model-driven research' approach that incorporates biological modeling into its research framework. Computational simulations of biological processes play a pivotal role here. By modeling and simulating, this approach aims at predicting and even designing the dynamic behaviors of complex biological systems, which is expected to accelerate progress in the life sciences and lead to meaningful applications in various fields such as health care, food supply and improvement of the environment. Genomic sciences are now advancing as great frontiers of research and applications in the 21st century. This article starts by surveying the general progress of bioinformatics (Section 1) and then describes Japanese activities in bioinformatics (Section 2). In Section 3, I introduce recent developments in Systems Biology, which I think will become more important in the future.

18.
The production of high-throughput gene expression data has generated a crucial need for bioinformatics tools that generate biologically interesting hypotheses. Whereas many tools are available for extracting global patterns, less attention has been focused on local pattern discovery. We propose here an original way to discover knowledge from gene expression data by means of the so-called formal concepts which hold in derived Boolean gene expression datasets. We first encoded the over-expression properties of genes in human cells using human SAGE data. This gave rise to a Boolean matrix from which we extracted the complete collection of formal concepts, i.e., all the largest sets of over-expressed genes associated with a largest set of biological situations in which their over-expression is observed. Complete collections of such patterns tend to be huge, and since their interpretation is a time-consuming task, we propose a new method to rapidly visualize clusters of formal concepts. This designates a reasonable number of Quasi-Synexpression-Groups (QSGs) for further analysis. The interest of our approach is illustrated using human SAGE data and by interpreting one of the extracted QSGs. The assessment of its biological relevance leads to the formulation of both previously proposed and new biological hypotheses.
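To make the notion of a formal concept concrete, here is a brute-force sketch that enumerates every formal concept (a maximal set of over-expressed genes paired with the maximal set of situations in which they are all over-expressed) of a tiny Boolean matrix; it is exponential in the number of situations and only illustrates the definition, not the scalable mining applied to SAGE data.

```python
from itertools import combinations

# Toy Boolean matrix: situations (rows) vs. genes over-expressed in them.
genes = ["g1", "g2", "g3"]
situations = ["s1", "s2", "s3", "s4"]
M = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g2", "g3"},
    "s4": {"g2"},
}

def intent(sits):
    """Genes over-expressed in every situation of the set."""
    sets = [M[s] for s in sits] or [set(genes)]
    return set.intersection(*sets)

def extent(gene_set):
    """Situations in which every gene of the set is over-expressed."""
    return {s for s in situations if gene_set <= M[s]}

# Closure of every subset of situations yields a formal concept; deduplicate.
concepts = set()
for r in range(len(situations) + 1):
    for sits in combinations(situations, r):
        g = intent(sits)
        concepts.add((frozenset(extent(g)), frozenset(g)))

for ext, intn in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(intn))
```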

19.
While the rapid development of personal computers and of high-throughput recording systems for circadian rhythms allows chronobiologists to produce huge amounts of data, the software to analyze them often lags behind. Here, we announce newly developed chronobiology software that is easy to use, compatible with many different systems, and freely available. Our system can perform the most frequently used analyses: actogram drawing, periodogram analysis, and waveform analysis. The software is distributed as a pure Java plug-in for ImageJ and so works on the three main operating systems: Linux, Macintosh, and Windows. We believe that this free software speeds up data analysis and makes the study of chronobiology accessible to newcomers.
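As a small, generic example of the kind of periodogram analysis such software performs (not the plug-in's own code), a Lomb-Scargle periodogram from SciPy recovers the dominant period of a simulated activity record; the 24 h rhythm, sampling scheme, and period range scanned are assumptions.

```python
import numpy as np
from scipy.signal import lombscargle

# Simulated activity record: 10 days sampled every 6 minutes, ~24 h rhythm plus noise.
t_hours = np.arange(0, 240, 0.1)
rng = np.random.default_rng(0)
activity = 1.0 + np.sin(2 * np.pi * t_hours / 24.0) + 0.3 * rng.normal(size=t_hours.size)

# Scan candidate periods from 16 h to 32 h.
periods = np.linspace(16, 32, 400)
angular_freqs = 2 * np.pi / periods
power = lombscargle(t_hours, activity - activity.mean(), angular_freqs)
print(f"estimated period: {periods[power.argmax()]:.1f} h")  # close to 24 h
```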

20.
Jung S, Lee KH, Lee D. Bio Systems 2007, 90(1):197-210
The Bayesian network is a popular tool for describing relationships between data entities by representing probabilistic (in)dependencies with a directed acyclic graph (DAG) structure. Relationships have been inferred between biological entities using the Bayesian network model with high-throughput data from biological systems in diverse fields. However, the scalability of those approaches is seriously restricted because of the huge search space for finding an optimal DAG structure during Bayesian network learning. For this reason, most previous approaches limit the number of target entities or use additional knowledge to restrict the search space. In this paper, we use the hierarchical clustering and order restriction (H-CORE) method for learning large Bayesian networks by clustering entities and restricting edge directions between those clusters, with the aim of overcoming the scalability problem and thus making it possible to perform genome-scale Bayesian network analysis without additional biological knowledge. We use simulations to show that H-CORE is much faster than the widely used sparse candidate method, while being of comparable quality. We have also applied H-CORE to retrieving gene-to-gene relationships in a biological system (the 'Rosetta compendium'). By evaluating the learned information through literature mining, we demonstrate that H-CORE enables genome-scale Bayesian analysis of biological systems without any prior knowledge.
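The core trick of restricting the DAG search space can be illustrated independently of any Bayesian-network library: cluster the variables, fix an order over the clusters, and only allow edges whose direction respects that order. The sketch below (gene names, cluster count, and ordering are all hypothetical) merely builds the restricted candidate edge set; actual structure learning would then search within it.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical expression matrix: rows = genes, columns = conditions.
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(8)]
expr = rng.normal(size=(8, 20))

# Step 1: hierarchical clustering of the genes into a few clusters.
labels = fcluster(linkage(expr, method="average"), t=3, criterion="maxclust")
cluster_of = dict(zip(genes, labels))

# Step 2: impose an order over clusters (here simply by cluster id) and keep only
# edges running from an earlier cluster to a later or equal one; within-cluster
# edges stay unrestricted, between-cluster edge directions are fixed.
candidate_edges = [
    (a, b)
    for a in genes for b in genes
    if a != b and cluster_of[a] <= cluster_of[b]
]
print(f"{len(candidate_edges)} of {len(genes) * (len(genes) - 1)} directed edges kept")
```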
