Similar literature
1.
Due to advances in high-throughput biotechnologies, biological information is being collected in databases at an amazing rate, requiring novel computational approaches that process collected data into new knowledge in a timely manner. In this study, we propose a computational framework for discovering modular structure, relationships and regularities in complex data. The framework utilizes a semantic-preserving vocabulary to convert records of biological annotations of an object, such as an organism, gene, chemical or sequence, into networks (Anets) of the associated annotations. An association between a pair of annotations in an Anet is determined by the similarity of their co-occurrence patterns with all other annotations in the data. This feature captures associations between annotations that do not necessarily co-occur with each other and facilitates discovery of the most significant relationships in the collected data through clustering and visualization of the Anet. To demonstrate this approach, we applied the framework to the analysis of metadata from the Genomes OnLine Database and produced a biological map of sequenced prokaryotic organisms with three major clusters of metadata that represent pathogens, environmental isolates and plant symbionts.
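The association rule described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names and toy records are hypothetical. The point is only to show how two annotations that never appear in the same record can still be linked through similar co-occurrence profiles.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_profiles(records):
    """Count, for each annotation, how often it co-occurs with every other one."""
    terms = sorted({t for rec in records for t in rec})
    counts = {t: {u: 0 for u in terms} for t in terms}
    for rec in records:
        for a, b in combinations(sorted(set(rec)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return terms, counts


def profile_similarity(counts, terms, a, b):
    """Cosine similarity (an assumed measure) of the co-occurrence profiles
    of a and b, computed over all other annotations."""
    others = [t for t in terms if t not in (a, b)]
    va = [counts[a][t] for t in others]
    vb = [counts[b][t] for t in others]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Toy records (made up for illustration): each record is one object's annotations.
records = [
    {"pathogen", "human host", "aerobic"},
    {"infectious", "human host", "aerobic"},
    {"soil", "anaerobic"},
]
terms, counts = cooccurrence_profiles(records)
sim_related = profile_similarity(counts, terms, "pathogen", "infectious")
sim_unrelated = profile_similarity(counts, terms, "pathogen", "soil")
```

In this toy data, "pathogen" and "infectious" never share a record, yet their profiles match because both co-occur with "human host" and "aerobic"; "pathogen" and "soil" share no neighbours and score zero.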

2.
YV Sun 《Human genetics》2012,131(10):1677-1686
Millions of genetic variants have been assessed for their effects on the trait of interest in genome-wide association studies (GWAS). Complex traits are affected by a set of inter-related genes. However, the typical GWAS examines the association of only a single genetic variant at a time. The individual effects of variants on a complex trait are usually small, and the simple sum of these individual effects may not reflect the holistic effect of the genetic system. High-throughput methods enable genomic studies to produce a large amount of data to expand the knowledge base of biological systems. Biological networks and pathways are built to represent the functional or physical connectivity among genes. Integrated with GWAS data, the network- and pathway-based methods complement the approach of single genetic variant analysis, and may improve the power to identify trait-associated genes. Taking advantage of biological knowledge, these approaches are valuable for interpreting the functional role of the genetic variants, and for further understanding the molecular mechanisms influencing the traits. The network- and pathway-based methods have demonstrated their utility, and will be increasingly important in addressing a number of challenges facing mainstream GWAS.

3.
Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type 1 error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between multiple disease-causing variants by means of their LD patterns with other markers. We measured the effects that marginal disease effect size, haplotype LD, disease prevalence and minor allele frequency have on cross-locus interaction (epistasis). In the setting of strong allele effects and strong interaction, the correlation between the two disease genes was weak (r=0.2). In a complex system with multiple correlations (both marginal and interaction), it was difficult to determine the source of a significant result. Despite these complications, the partial least squares and modified LD contrast methods maintained adequate power to detect the epistatic effects; however, for many of the analyses we could not separate interaction from a strong marginal effect. While we did not exhaust the entire parameter space of possible models, we do provide guidance on the effects that population parameters have on cross-locus interaction.

4.
TH Chueh  HH Lu 《PloS one》2012,7(8):e42095
One great challenge of genomic research is to efficiently and accurately identify complex gene regulatory networks. The development of high-throughput technologies provides abundant experimental data, such as DNA sequences, protein sequences, and RNA expression profiles, which make it possible to study interactions and regulation among genes or other substances in an organism. However, it is crucial for systems biology to infer genetic regulatory networks from gene expression profiles and protein interaction data. This study will develop a new approach to reconstruct time delay Boolean networks as a tool for exploring biological pathways. In the inference strategy, we will compare all pairs of input genes in those basic relationships by their corresponding [Formula: see text]-scores for every output gene. Then, we will combine those consistent relationships to reveal the most probable relationship and reconstruct the genetic network. Specifically, we will prove that [Formula: see text] state transition pairs are sufficient and necessary to reconstruct the time delay Boolean network of [Formula: see text] nodes with high accuracy if the number of input genes to each gene is bounded. We have also implemented this method on simulated and empirical yeast gene expression data sets. The test results show that the proposed method is extensible to realistic networks.

5.
Proteins are essential macromolecules of life that carry out most cellular processes. Since proteins aggregate to perform function, and since protein-protein interaction (PPI) networks model these aggregations, one would expect to uncover new biology from PPI network topology. Hence, using PPI networks to predict protein function and the role of protein pathways in disease has received attention. A debate remains open about whether network properties of "biologically central (BC)" genes (i.e., their protein products), such as those involved in aging, cancer, infectious diseases, or signaling and drug-targeted pathways, exhibit some topological centrality compared to the rest of the proteins in the human PPI network. To help resolve this debate, we design new network-based approaches and apply them to get new insight into biological function and disease. We hypothesize that BC genes have a topologically central (TC) role in the human PPI network. We propose two different concepts of topological centrality. We design a new centrality measure to capture complex wirings of proteins in the network that identifies as TC those proteins that reside in dense extended network neighborhoods. Also, we use the notion of domination and find dominating sets (DSs) in the PPI network, i.e., sets of proteins such that every protein is either in the DS or is a neighbor of the DS. Clearly, a DS has a TC role, as it enables efficient communication between different network parts. We find statistically significant enrichment in BC genes of TC nodes and outperform the existing methods, indicating that genes involved in key biological processes occupy topologically complex and dense regions of the network and correspond to its "spine" that connects all other network parts and can thus pass cellular signals efficiently throughout the network. To our knowledge, this is the first study that explores domination in the context of PPI networks.
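The definition of a dominating set given in this abstract can be made concrete with a short sketch. The abstract does not state how the authors compute their dominating sets, so the greedy heuristic below is a stand-in chosen for brevity, and the function names and toy graph are hypothetical. It assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbours.

```python
def greedy_dominating_set(adj):
    """Greedy heuristic: repeatedly pick the node that dominates the most
    still-undominated nodes (itself plus its neighbours)."""
    undominated = set(adj)
    ds = set()
    while undominated:
        # sorted() makes tie-breaking deterministic (first maximum in name order).
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        ds.add(best)
        undominated -= {best} | adj[best]
    return ds


def is_dominating(adj, ds):
    """Check the definition: every node is in ds or has a neighbour in ds."""
    return all(v in ds or bool(adj[v] & ds) for v in adj)


# Toy undirected network (made up): a hub A with a short tail D-E.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
ds = greedy_dominating_set(adj)
```

The greedy choice does not guarantee a minimum dominating set in general; it only guarantees that the returned set satisfies the definition, which `is_dominating` checks directly.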

6.

Background

The milk fat profiles of the Danish Holstein (DH) and Danish Jersey (DJ) show clear differences. Identification of the genomic regions, genes and biological pathways underlying milk fat biosynthesis will improve the understanding of the biology underlying bovine milk fat production and may provide new possibilities to change the milk fat composition by selective breeding. In this study, a genome-wide association scan (GWAS) in the DH and DJ was performed for a detailed milk fatty acid (FA) profile using the HD bovine SNP array, and subsequently a biological pathway analysis based on the SNP data was performed.

Results

The GWAS identified a total of 1,233 SNPs (FDR < 0.10) spread over 18 chromosomes for nine different FA traits in the DH breed and 1,122 SNPs (FDR < 0.10) spread over 26 chromosomes for 13 different FA traits in the DJ breed. Of these significant SNPs, 108 SNP markers were significant in both DH and DJ (C14-index, BTA26; C16, BTA14; fat percentage (FP), BTA14). This was supported by an enrichment test. The QTL on BTA14 and BTA26 represented the known candidate genes DGAT and SCD. In addition, we suggest ACSS3 to be a good candidate gene for the QTL on BTA5 for C10:0 and C15:0. Genetic correlations between the FA traits within breed also showed large similarity across breeds. Furthermore, the biological pathway analysis revealed that fat digestion and absorption (KEGG04975) plays a role in the traits FP, C14:1, C16 index and C16:1.

Conclusion

There was a clear similarity between the underlying genetics of FA in the milk between DH and DJ. This was supported by the fact that there was substantial overlap between SNPs for FP, C14 index, C14:1, C16 index and C16:1. In addition, genetic correlations between FA showed a similar pattern across DH and DJ. Furthermore, the biological pathway analysis suggested that fat digestion and absorption (KEGG04975) is important for the traits FP, C14:1, C16 index and C16:1.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1112) contains supplementary material, which is available to authorized users.

7.
Bioinformatics involves the collection, organization and analysis of large amounts of biological data, using networks of computers and databases. Developing countries in the Asia-Pacific region are just moving into this new field of information-based biotechnology. However, the computational infrastructure and network bandwidths available in these countries are still at a basic level compared to those in developed countries. In this study, we assessed the utility of a BitTorrent-based Peer-to-Peer (btP2P) file distribution model for automatic synchronization and distribution of large amounts of biological data among developing countries. The initial country-level nodes in the Asia-Pacific region comprised Thailand, Korea and Singapore. The results showed a significant improvement in download performance using btP2P: overall download performance was three times faster than with conventional File Transfer Protocol (FTP). This study demonstrated the reliability of btP2P in the dissemination of continuously growing multi-gigabyte biological databases across the three Asia-Pacific countries. The download performance for btP2P can be further improved by including more nodes from other countries in the network. This suggests that the btP2P technology is appropriate for automatic synchronization and distribution of biological databases and software over low-bandwidth networks among developing countries in the Asia-Pacific region. AVAILABILITY: http://everest.bic.nus.edu.sg/p2p/

8.
The notion of scale-freeness and its prevalence in both natural and artificial networks have recently attracted much attention. The concept of scale-freeness is enthusiastically applied to almost any conceivable network, usually with affirmative conclusions. Well-known scale-free examples include the internet, electric lines among power plants, the co-starring of movie actors, the co-authorship of researchers, food webs, and neural, protein-protein interaction, genetic, and metabolic networks. The purpose of this review is to clarify the relationship between scale-freeness and power-law distribution, and to critically assess previous related work, especially on biological networks. In addition, I will focus on the close relationship between power-law distribution and lognormal distribution to show that power-law distribution is not a special characteristic of natural selection.

9.
Large-scale molecular interaction networks are being increasingly used to provide a system-level view of cellular processes. Modeling communications between nodes in such huge networks as information flows is useful for dissecting dynamical dependences between individual network components. In the information flow model, individual nodes are assumed to communicate with each other by propagating signals through intermediate nodes in the network. In this paper, we first provide an overview of the state of the art of research in network analysis based on information flow models. In the second part, we describe the computational method underlying our recent work on discovering dysregulated pathways in glioma. Motivated by applications to inferring information flow from genotype to phenotype in a very large human interaction network, we generalized previous approaches to compute information flows for a large number of instances and also provided a formal proof for the method.

10.
Based on the experimental finding that signal transmission and neuronal energetic demands are tightly coupled to information coding in the cerebral cortex, we present a new scientific theory that offers a unique mechanism for brain information processing. We demonstrate that the neural coding produced by the activity of the brain is well described by our theory of energy coding. Because the energy coding model can reveal mechanisms of brain information processing based upon known biophysical properties, we can not only reproduce various experimental results of neuro-electrophysiology, but also quantitatively explain the recent experimental results from neuroscientists at Yale University by means of the principle of energy coding. Because the theory of energy coding bridges the gap between functional connections within a biological neural network and energetic consumption, we expect that the theory has very important consequences for quantitative research on cognitive function.

14.
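A biological network is a system in which the various molecules within an organism interact to carry out complex biological functions. Research at the network level helps us understand, as a whole, the underlying mechanisms by which complex events occur in organisms. microRNAs (miRNAs) are a class of small RNA molecules that regulate gene expression at the post-transcriptional level. Studies show that the target genes regulated by miRNAs are widely distributed, so miRNAs are necessarily connected in many ways to the biological networks currently under study. Revealing these relationships will play an important role in elucidating the regulatory principles of miRNAs. This article focuses on the characteristics of miRNA-regulated gene regulatory networks, protein interaction networks, and cellular signal transduction networks. In addition, it summarizes the characteristics of miRNA-regulated network motifs and of miRNA cooperative networks.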

15.
Greedily building protein networks with confidence
MOTIVATION: With genome sequences complete for human and model organisms, it is essential to understand how individual genes and proteins are organized into biological networks. Much of the organization is revealed by proteomics experiments that now generate torrents of data. Extracting relevant complexes and pathways from high-throughput proteomics data sets has posed a challenge, however, and new methods to identify and extract networks are essential. We focus on the problem of building pathways starting from known proteins of interest. RESULTS: We have developed an efficient, greedy algorithm, SEEDY, that extracts biologically relevant networks from protein-protein interaction data, building out from selected seed proteins. The algorithm relies on our previous study establishing statistical confidence levels for interactions generated by two-hybrid screens and inferred from mass spectrometric identification of protein complexes. We demonstrate the ability to extract known yeast complexes from high-throughput protein interaction data with a tunable parameter that governs the trade-off between sensitivity and selectivity. DNA damage repair pathways are presented as a detailed example. We highlight the ability to join heterogeneous data sets, in this case protein-protein interactions and genetic interactions, and the appearance of cross-talk between pathways caused by re-use of shared components. SIGNIFICANCE AND COMPARISON: The significance of the SEEDY algorithm is threefold: it is fast, with running time O[(E + V) log V] for V proteins and E interactions; a single adjustable parameter controls the size of the pathways that are generated; and an associated P-value indicates the statistical confidence that the pathways are enriched for proteins with a coherent function. Previous approaches have focused on extracting sub-networks by identifying motifs enriched in known biological networks. SEEDY provides the complementary ability to perform a directed search based on proteins of interest. AVAILABILITY: SEEDY software (Perl source), data tables and confidence score models (R source) are freely available from the author.
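The abstract gives only the outline of SEEDY (greedy growth from seed proteins, per-interaction confidence levels, one tunable parameter, O[(E + V) log V] running time), not its actual procedure. The sketch below is therefore not SEEDY itself but a hypothetical best-first expansion in the same spirit: it assumes each interaction carries a confidence in (0, 1], scores a protein by the product of confidences along its best path from a seed, and uses a single threshold, here called `min_confidence`, to control how far the network grows. All names and values are made up for illustration.

```python
import heapq


def grow_from_seeds(edges, seeds, min_confidence):
    """Best-first expansion from seed nodes (a Dijkstra-style search on
    path confidence). Returns {node: best path confidence from any seed}."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))

    best = {}
    # heapq is a min-heap, so confidences are stored negated.
    heap = [(-1.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        neg_conf, node = heapq.heappop(heap)
        if node in best:
            continue  # already reached with an equal or higher confidence
        conf = -neg_conf
        best[node] = conf
        for nbr, c in adj.get(node, []):
            new_conf = conf * c
            if nbr not in best and new_conf >= min_confidence:
                heapq.heappush(heap, (-new_conf, nbr))
    return best


# Toy interaction confidences (made up for illustration).
edges = {
    ("seed", "a"): 0.9,
    ("a", "b"): 0.8,
    ("b", "c"): 0.5,
    ("seed", "c"): 0.3,
}
strict = grow_from_seeds(edges, ["seed"], min_confidence=0.5)
loose = grow_from_seeds(edges, ["seed"], min_confidence=0.3)
```

With the stricter threshold the network stops at "b"; lowering the threshold admits "c" through the longer but more confident path, which mirrors the sensitivity versus selectivity trade-off the abstract attributes to its tunable parameter. With a binary heap, each interaction is pushed at most a constant number of times, so the running time stays near-linear in the network size up to a logarithmic factor.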

16.
Synthetic biology aims to build new functions in living organisms. Recent work has addressed the creation of synthetic epigenetic switches in mammalian cells and synthetic intracellular communication. Fundamentally new, and potentially scalable, modes of gene regulation have been created that enable expansion of the scope of synthetic circuits. Increasingly sophisticated models of gene regulation that include stochastic effects are beginning to predict the behaviour of small synthetic networks. Overall, these advances suggest that a combination of molecular engineering and systems engineering should allow the creation of living matter capable of performing many useful and novel functions.

17.
Analysing biological pathways in genome-wide association studies
Genome-wide association (GWA) studies have typically focused on the analysis of single markers, which often lacks the power to uncover the relatively small effect sizes conferred by most genetic variants. Recently, pathway-based approaches have been developed, which use prior biological knowledge on gene function to facilitate more powerful analysis of GWA study data sets. These approaches typically examine whether a group of related genes in the same functional pathway are jointly associated with a trait of interest. Here we review the development of pathway-based approaches for GWA studies, discuss their practical use and caveats, and suggest that pathway-based approaches may also be useful for future GWA studies with sequencing data.
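One simple way to ask whether a pathway's genes are jointly associated with a trait is an over-representation test. The review does not single out any particular method, so the hypergeometric test below is just one common choice used here for illustration, and the function and parameter names are hypothetical. It assumes a list of trait-associated genes has already been derived from the single-marker results.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_pathway, n_hits, n_overlap):
    """Upper-tail hypergeometric P-value: the probability of seeing at least
    n_overlap pathway genes among n_hits associated genes, when n_pathway of
    the n_genes tested genes belong to the pathway."""
    total = comb(n_genes, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (made up): 10 genes tested, 3 in the pathway, 3 associated,
# and all 3 associated genes fall in the pathway.
p_all_in_pathway = hypergeom_enrichment_p(10, 3, 3, 3)
p_no_requirement = hypergeom_enrichment_p(10, 3, 3, 0)
```

A test of this kind ignores gene size, linkage disequilibrium between markers, and overlap between pathways, which are among the practical caveats that pathway-based analyses of GWA data have to handle.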

20.

Background  

Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.
