1.
Holger Froehlich, Mark Fellmann, Holger Sueltmann, Annemarie Poustka, Tim Beissbarth. BMC Bioinformatics, 2007, 8(1):386
Background
The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology, this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway.
2.
Haplotype inference for present-absent genotype data using previously identified haplotypes and haplotype patterns (Cited by: 1 total; 0 self-citations; 1 by others)
MOTIVATION: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised by the large proportion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach combining a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures in order to evaluate the reliability of haplotype assignments and the accuracy in estimating haplotype frequency. Our method outperformed the two existing techniques by all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY: HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
3.
4.
PATRI is a new application for paternity analysis using genetic data that accounts for the sampling fraction of potential fathers.
5.
6.
In this paper, we discuss the properties of biological data and the challenges they pose for data management, and argue that, in order to meet the data management requirements for 'digital biology', careful integration of the existing technologies and the development of new data management techniques for biological data are needed. Based on this premise, we present PathCase: Case Pathways Database System. PathCase is an integrated set of software tools for modelling, storing, analysing, visualizing and querying biological pathways data at different levels of genetic, molecular, biochemical and organismal detail. The novel features of the system include: (i) genomic information integrated with other biological data and presented starting from pathways; (ii) design for biologists who are possibly unfamiliar with genomics, but whose research is essential for annotating gene and genome sequences with biological functions; (iii) database design, implementation and graphical tools which enable users to visualize pathways data in multiple abstraction levels and to pose exploratory queries; (iv) a wide range of query types, including 'path' and 'neighbourhood' queries, and graphical visualization of query outputs; and (v) an implementation that allows for web (XML)-based dissemination of query outputs (i.e. pathways data in BIOPAX format) to researchers in the community, giving them control over the use of pathways data.
7.
Discrimination models using variance-stabilizing transformation of metabolomic NMR data (Cited by: 1 total; 0 self-citations; 1 by others)
Purohit PV, Rocke DM, Viant MR, Woodruff DL. Omics: a journal of integrative biology, 2004, 8(2):118-130
Following extensive work in the areas of genomics, proteomics, and metabolomics, the study of metabolites has become of interest in its own right. Metabolites in biological systems give an understanding of the state of the system and provide a powerful tool for the study of disease and other maladies. Several analytical techniques such as mass spectrometry and high-resolution NMR spectroscopy have been used to study metabolites. The data from these techniques, however, remain quite complex. Traditionally, multivariate analyses have been used for such data. These methods, however, assume that the data are multivariate normal with constant variance. This is not necessarily the case. It has been shown that a generalized log transformation renders the variance of the data constant, effectively making the data more suitable for multivariate analysis. We demonstrate the effectiveness of these transformations on NMR data taken on a set of 18 abalone that were categorized as either being healthy, stunted, or diseased. We show how the transformation makes multivariate classification of the abalone into the healthy, stunted and diseased categories much more effective and gives a tool for identifying potential metabolic biomarkers for disease.
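The generalized log transformation mentioned in this abstract can be sketched as follows. The abstract does not give a formula, so the form used here, ln(y + sqrt(y^2 + lambda)), is an assumption (it is one commonly cited form), and the parameter name `lam` and the sample intensities are hypothetical.

```python
import math

def glog(y, lam):
    """Generalized log: ln(y + sqrt(y^2 + lam)), with lam > 0 (assumed form)."""
    return math.log(y + math.sqrt(y * y + lam))

# Hypothetical peak intensities, including a baseline-corrected negative value.
intensities = [-0.5, 0.0, 2.0, 50.0, 1000.0]
transformed = [glog(y, lam=1.0) for y in intensities]
```

Unlike a plain logarithm, this function is defined at zero and for negative values, and it approaches ln(2y) for large y. In practice `lam` would be estimated from the data rather than fixed by hand.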
8.
SUMMARY: We present a Cytoscape plugin for the inference and visualization of networks from high-resolution mass spectrometry metabolomic data. The software also provides access to basic topological analysis. This open source, multi-platform software has been successfully used to interpret metabolomic experiments and will enable others using filtered, high mass accuracy mass spectrometric data sets to build and analyse networks. AVAILABILITY: http://compbio.dcs.gla.ac.uk/fabien/abinitio/abinitio.html
9.
OBJECTIVE: Haplotypes are gaining popularity in studies of human genetics because they contain more information than does a single gene locus. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes based on unphased genotypes at several loci. The accuracy, efficiency, and computational time of these methods have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotypic data from unrelated individuals. METHODS: We compared the performance of three haplotype inference methods that are currently in use (HAPLOTYPER, hap, and PHASE) by applying them to a large data set from unrelated individuals with known haplotypes. We also applied these methods to coalescent-based simulation studies using both constant size and exponential growth models. The performance of these methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study. RESULTS: While the algorithm implemented in the software PHASE was found to be the most accurate in both real and simulated data comparisons, all four methods produced good results in the association study.
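For readers unfamiliar with the expectation-maximization baseline named in this abstract, here is a minimal sketch of EM haplotype-frequency estimation from unphased genotypes. It assumes biallelic loci, Hardy-Weinberg proportions, and genotypes coded as allele counts (0, 1, or 2) per locus. The function names are hypothetical, and this is not the implementation used by HAPLOTYPER, hap, or PHASE.

```python
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with per-locus allele counts."""
    per_locus = []
    for g in genotype:
        if g == 0:
            per_locus.append([(0, 0)])
        elif g == 2:
            per_locus.append([(1, 1)])
        else:  # heterozygous: phase is unknown, so try both orientations
            per_locus.append([(0, 1), (1, 0)])
    pairs = set()
    for choice in product(*per_locus):
        h1 = tuple(a for a, _ in choice)
        h2 = tuple(b for _, b in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies by EM, starting from uniform frequencies."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        # E-step: weight each compatible pair by its current probability.
        for pairs in pair_lists:
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq

# Hypothetical sample: only the double heterozygotes (1, 1) are phase-ambiguous.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_haplotype_frequencies(sample)
```

In this toy sample the unambiguous homozygotes pull the ambiguous individuals toward the (0, 0)/(1, 1) phasing, so the estimated frequencies concentrate on those two haplotypes.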
10.
Background
Amongst the most commonly used molecular markers for plant phylogenetic studies are the nuclear ribosomal internal transcribed spacers (ITS). Intra-individual variability of these multicopy regions is a very common phenomenon in plants, the causes of which are debated in the literature. Phylogenetic reconstruction under these conditions is inherently difficult. Our approach is to consider this problem as a special case of the general biological question of how to infer the characteristics of hosts (represented here by plant individuals) from features of their associates (represented by cloned sequences here).
11.
Advances to Bayesian network inference for generating causal networks from observational biological data (Cited by: 6 total; 0 self-citations; 6 by others)
Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED. Bioinformatics (Oxford, England), 2004, 20(18):3594-3603
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/
12.
Background
The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy or combinatorial interactions in the profile makes the selection of relevant variables harder.
Methodology/Principal Findings
Here we propose a method to select variables based on estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified the ability of our method to select predefined relevant variables in synthetic proteome-like data and then assessed its performance on biological high-dimensional problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles with fewer than 20% of variables selected. Our method scaled up to dimensionalities of much higher orders of magnitude, as shown with gene expression microarray datasets in which we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables.
Conclusions
Our method consistently found relevant variables attaining high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets compared to the original number of variables, which should simplify downstream biological experimentation.
13.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach was tested favorably on two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.
14.
metaXCMS is a software program for the analysis of liquid chromatography/mass spectrometry-based untargeted metabolomic data. It is designed to identify the differences between metabolic profiles across multiple sample groups (e.g., 'healthy' versus 'active disease' versus 'inactive disease'). Although performing pairwise comparisons alone can provide physiologically relevant data, these experiments often result in hundreds of differences, and comparison with additional biologically meaningful sample groups can allow for substantial data reduction. By performing second-order (meta-) analysis, metaXCMS facilitates the prioritization of interesting metabolite features from large untargeted metabolomic data sets before the rate-limiting step of structural identification. Here we provide a detailed step-by-step protocol for going from raw mass spectrometry data to metaXCMS results, visualized as Venn diagrams and exported Microsoft Excel spreadsheets. There is no upper limit to the number of sample groups or individual samples that can be compared with the software, and data from most commercial mass spectrometers are supported. The speed of the analysis depends on computational resources and data volume, but will generally be less than 1 d for most users. metaXCMS is freely available at http://metlin.scripps.edu/metaxcms/.
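The idea of second-order analysis, keeping only the differences shared across several pairwise comparisons, can be illustrated with a small sketch. This is not metaXCMS code. The feature representation (m/z, retention time in seconds), the tolerance values, and the function name are all assumptions made for illustration.

```python
def shared_features(reference, others, mz_tol=0.01, rt_tol=30.0):
    """Keep reference features that have a match in every other comparison.

    Each feature is a (mz, retention_time_seconds) tuple. Two features match
    when both values agree within the given tolerances.
    """
    def matches(f, g):
        return abs(f[0] - g[0]) <= mz_tol and abs(f[1] - g[1]) <= rt_tol

    return [f for f in reference
            if all(any(matches(f, g) for g in other) for other in others)]

# Hypothetical significant features from three pairwise comparisons.
healthy_vs_active = [(180.063, 300.0), (255.232, 610.0), (132.102, 95.0)]
healthy_vs_inactive = [(180.065, 310.0), (400.100, 50.0)]
active_vs_inactive = [(180.060, 295.0), (132.101, 90.0)]

common = shared_features(healthy_vs_active,
                         [healthy_vs_inactive, active_vs_inactive])
```

Here only the feature near m/z 180.06 appears in all three comparisons, which shows how adding sample groups shrinks a long list of pairwise differences to a short list of candidates for structural identification.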
15.
This article is concerned with statistical modeling of shotgun resequencing data and the use of such data for population genetic inference. We model data produced by sequencing-by-synthesis technologies such as the Solexa, 454, and polymerase colony (polony) systems, whose use is becoming increasingly widespread. We show how such data can be used to estimate evolutionary parameters (mutation and recombination rates), despite the fact that the data do not necessarily provide complete or aligned sequence information. We also present two refinements of our methods: one that is more robust to sequencing errors and another that can be used when no reference genome is available.
16.
The biological interpretation of metabolomic data can be misled by the extraction method used (Cited by: 1 total; 0 self-citations; 1 by others)
Xavier Duportet, Raphael Bastos Mereschi Aggio, Sónia Carneiro, Silas Granato Villas-Bôas. Metabolomics: Official journal of the Metabolomic Society, 2012, 8(3):410-421
The field of metabolomics is becoming increasingly popular, and a wide range of different sample preparation procedures are in use by different laboratories. Chemical extraction methods using one or more organic solvents as the extraction agent are the most commonly used approach to extract intracellular metabolites and generate metabolite profiles. Metabolite profiles are the scaffold supporting the biological interpretation in metabolomics. Therefore, we aimed to address the following fundamental question: can we obtain similar metabolomic results and, consequently, reach the same biological interpretation by using different protocols for extraction of intracellular metabolites? We have used four different methods for extraction of intracellular metabolites using four different microbial cell types (Gram negative bacterium, Gram positive bacterium, yeast, and a filamentous fungus). All the quenched samples were pooled together before extraction, and, therefore, they were identical. After extraction and GC-MS analysis of metabolites, we not only detected different numbers of compounds, depending on the extraction method used and regardless of the cell type tested, but also obtained distinct metabolite levels for the compounds commonly detected by all methods (P-value < 0.001). These differences between methods resulted in contradictory biological interpretation regarding the activity of different metabolic pathways. Therefore, our results show that different solvent-based extraction methods can yield significantly different metabolite profiles, which substantially affect the biological interpretation of metabolomics data. Thus, development of alternative extraction protocols and, most importantly, standardization of sample preparation methods for metabolomics should be seriously pursued by the scientific community.
17.
Rachel S. Kelly, Damien C. Croteau-Chonka, Amber Dahlin, Hooman Mirzakhani, Ann C. Wu, Emily S. Wan, Michael J. McGeachie, Weiliang Qiu, Joanne E. Sordillo, Amal Al-Garawi, Kathryn J. Gray, Thomas F. McElrath, Vincent J. Carey, Clary B. Clish, Augusto A. Litonjua, Scott T. Weiss, Jessica A. Lasky-Su. Metabolomics: Official journal of the Metabolomic Society, 2017, 13(1):7
18.
19.
MOTIVATION: Although many network inference algorithms have been presented in the bioinformatics literature, no suitable approach has been formulated for evaluating their effectiveness at recovering models of complex biological systems from limited data. To overcome this limitation, we propose an approach to evaluate network inference algorithms according to their ability to recover a complex functional network from biologically reasonable simulated data. RESULTS: We designed a simulator to generate data representing a complex biological system at multiple levels of organization: behaviour, neural anatomy, brain electrophysiology, and gene expression of songbirds. About 90% of the simulated variables are unregulated by other variables in the system and are included simply as distracters. We sampled the simulated data at intervals as one would sample from a biological system in practice, and then used the sampled data to evaluate the effectiveness of an algorithm we developed for functional network inference. We found that our algorithm is highly effective at recovering the functional network structure of the simulated system (including the irrelevance of unregulated variables) from sampled data alone. To assess the reproducibility of these results, we tested our inference algorithm on 50 separately simulated sets of data, and it consistently recovered the complex functional network structure underlying the simulated data almost perfectly. To our knowledge, this is the first approach for evaluating the effectiveness of functional network inference algorithms at recovering models from limited data. Our simulation approach also enables researchers to design, a priori, experiments and data-collection protocols that are amenable to functional network inference.
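One simple way to score how well an inferred network matches a simulated one is edge-level precision and recall. The abstract does not say which measures the authors used, so this choice of metric, the function name, and the toy edge lists below are assumptions for illustration only.

```python
def edge_recovery(true_edges, inferred_edges):
    """Return (precision, recall) of inferred directed edges against the truth."""
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    true_positives = len(true_set & inferred_set)
    precision = true_positives / len(inferred_set) if inferred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall

# Hypothetical simulated network and the edges recovered by some algorithm.
simulated = [("A", "B"), ("B", "C"), ("C", "D")]
recovered = [("A", "B"), ("B", "C"), ("D", "A"), ("X", "Y")]

precision, recall = edge_recovery(simulated, recovered)
```

Edges that involve unregulated distracter variables, such as the hypothetical ("X", "Y") above, count as false positives and lower precision, which is how a measure of this kind would penalize an algorithm that fails to recognize irrelevant variables.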