首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 640 毫秒
1.

Background  

Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results.  相似文献   

2.

Background  

Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses have been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation.  相似文献   

3.

Background  

Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2.  相似文献   

4.

Background  

The German cDNA Consortium has been cloning full length cDNAs and continued with their exploitation in protein localization experiments and cellular assays. However, the efficient use of large cDNA resources requires the development of strategies that are capable of a speedy selection of truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics.  相似文献   

5.
6.
The budding yeast Saccharomyces cerevisiae has been used extensively for the study of cell polarity, owing to both its experimental tractability and the high conservation of cell polarity and other basic biological processes among eukaryotes. The budding yeast has also served as a pioneer model organism for virtually all genome-scale approaches, including functional genomics, which aims to define gene function and biological pathways systematically through the analysis of high-throughput experimental data. Here, we outline the contributions of functional genomics and high-throughput methodologies to the study of cell polarity in the budding yeast. We integrate data from published genetic screens that use a variety of functional genomics approaches to query different aspects of polarity. Our integrated dataset is enriched for polarity processes, as well as some processes that are not intrinsically linked to cell polarity, and may provide new areas for future study.  相似文献   

7.

Background  

In the fields of life sciences, so-called designed studies are used for studying complex biological systems. The data derived from these studies comply with a study design aimed at generating relevant information while diminishing unwanted variation (noise). Knowledge about the study design can be used to decompose the total data into data blocks that are associated with specific effects. Subsequent statistical analysis can be improved by this decomposition if these are applied on selected combinations of effects.  相似文献   

8.

Background  

The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group.  相似文献   

9.

Background  

In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicates the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selection of a Bayesian classifier with an appropriate level of complexity by evaluation of predictive performance on independent data sets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes.  相似文献   

10.
Recent technology has made it possible to simultaneously perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples, resulting in so-called ‘multi-dimensional genomic data’. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, integrative analysis of multi-dimensional genomics data for the discovery of combinatorial patterns is currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE, and microRNA expression data of 385 ovarian cancer samples from the The Cancer Genome Atlas project. These md-modules revealed perturbed pathways that would have been overlooked with only a single type of data, uncovered associations between different layers of cellular activities and allowed the identification of clinically distinct patient subgroups. Our study provides an useful protocol for uncovering hidden patterns and their biological implications in multi-dimensional ‘omic’ data.  相似文献   

11.

Background  

Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionary conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes.  相似文献   

12.

Background  

With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data.  相似文献   

13.
Trophic dynamic studies of aquatic ecosystem are usually based on either dynamic or state-state models of energy and material transfer, comprising of a few functional tropic levels such as phytoplankton, primary consumers and secondary consumers. However, though benthic components (and production) are important parts of an aquatic ecosystem, they are often omitted from such approaches.Considering this inadequacy of previous simulation approaches, the aim of the present study, is to investigate the effect of benthic components on overall system dynamics. The objectives of this study are (a) to develop and study a dynamic model for Kakinada bay to understand how benthic components interact with others under a fixed set of conditions, and to find the sensitive parameters for such a benthic-pelagic coupled system (b) to study the effects of different biomasses of benthic components via scenario analysis and (c) to identify the different equilibrium states that a benthic-pelagic coupled system such as Kakinada Bay may reach.On running the simulation for a period of 5000 days (∼10+ years), the system was not seen to reach any equilibrium state; however, all state variables considered in the model were seen to coexist even at the end of this period. While zooplankton was found to be the most sensitive state variable, parameters directly associated with benthic components were indicated to be the most sensitive. Microphytobenthos was negatively correlated with phytoplankton which was corroborated by results of the perturbation scenario analyses as well. Only one equilibrium state – microphytobenthos dominated steady state was found for Kakinada Bay system when certain parameters were slightly changed.  相似文献   

14.

Background  

Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and oftentimes support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data, such as gene names and functional annotations, as well as events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacities to incorporate such annotations of different data types.  相似文献   

15.

Background  

Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.  相似文献   

16.

Background  

Large-scale sequencing of entire genomes has ushered in a new age in biology. One of the next grand challenges is to dissect the cellular networks consisting of many individual functional modules. Defining co-expression networks without ambiguity based on genome-wide microarray data is difficult and current methods are not robust and consistent with different data sets. This is particularly problematic for little understood organisms since not much existing biological knowledge can be exploited for determining the threshold to differentiate true correlation from random noise. Random matrix theory (RMT), which has been widely and successfully used in physics, is a powerful approach to distinguish system-specific, non-random properties embedded in complex systems from random noise. Here, we have hypothesized that the universal predictions of RMT are also applicable to biological systems and the correlation threshold can be determined by characterizing the correlation matrix of microarray profiles using random matrix theory.  相似文献   

17.

Background  

The numerous natural phenomena that exhibit saturation behavior, e.g., ligand binding and enzyme kinetics, have been approached, to date, via empirical and particular analyses. This paper presents a mechanism-free, and assumption-free, second-order differential equation, designed only to describe a typical relationship between the variables governing these phenomena. It develops a mathematical model for this relation, based solely on the analysis of the typical experimental data plot and its saturation characteristics. Its utility complements the traditional empirical approaches.  相似文献   

18.
Natural history museums harbour a plethora of biological specimens which are of potential use in population and conservation genetic studies. Although technical advancements in museum genomics have enabled genome‐wide markers to be generated from aged museum specimens, the suitability of these data for robust biological inference is not well characterized. The aim of this study was to test the utility of museum specimens in population and conservation genomics by assessing the biological and technical validity of single nucleotide polymorphism (SNP) data derived from such samples. To achieve this, we generated thousands of SNPs from 47 red‐tailed black cockatoo (Calyptorhychus banksii) traditional museum samples (i.e. samples that were not collected with the primary intent of DNA analysis) and 113 fresh tissue samples (cryopreserved liver/muscle) using a restriction site‐associated DNA marker approach (DArTseq?). Thousands of SNPs were successfully generated from most of the traditional museum samples (with a mean age of 44 years, ranging from 5 to 123 years), although 38% did not provide useful data. These SNPs exhibited higher error rates and contained significantly more missing data compared with SNPs from fresh tissue samples, likely due to considerable DNA fragmentation. However, based on simulation results, the level of genotyping error had a negligible effect on inference of population structure in this species. We did identify a bias towards low diversity SNPs in older samples that appears to compromise temporal inferences of genetic diversity. This study demonstrates the utility of a RADseq‐based method to produce reliable genome‐wide SNP data from traditional museum specimens.  相似文献   

19.
The chicken genome is sequenced and this, together with microarray and other functional genomics technologies, makes post-genomic research possible in the chicken. At this time, however, such research is hindered by a lack of genomic structural and functional annotations. Bio-ontologies have been developed for different annotation requirements, as well as to facilitate data sharing and computational analysis, but these are not yet optimally utilized in the chicken. Here we discuss genomic annotation and bio-ontologies. We focus specifically on the Gene Ontology (GO), chicken GO annotations and how these can facilitate functional genomics in the chicken. The GO is the most developed and widely used bio-ontology. It is the de facto standard for functional annotation. Despite its critical importance in analyzing microarray and other functional genomics data, relatively few chicken gene products have any GO annotation. When these are available, the average quality of chicken gene products annotations (defined using evidence code weight and annotation depth) is much less than in mouse. Moreover, tools allowing chicken researchers to easily and rapidly use the GO are either lacking or hard to use. To address all of these problems we developed ChickGO and AgBase. Chicken GO annotations are provided by complementary work at MSU-AgBase and EBI-GOA. The GO tools pipeline at AgBase uses GO to derive functional and biological significance from microarray and other functional genomics data. Not only will improved genomic annotation and tools to use these annotations benefit the chicken research community but they will also facilitate research in other avian species and comparative genomics.  相似文献   

20.

Background  

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号