首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.

Background  

The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway.  相似文献   

2.
MOTIVATION: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised due to the large portion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach combining a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures in order to evaluate the reliability of haplotype assignments and the accuracy in estimating haplotype frequency. Our method outperformed the two existing techniques by all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY: The HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

3.
4.
PATRI is a new application for paternity analysis using genetic data that accounts for the sampling fraction of potential fathers.  相似文献   

5.
6.
In this paper, we discuss the properties of biological data and challenges it poses for data management, and argue that, in order to meet the data management requirements for 'digital biology', careful integration of the existing technologies and the development of new data management techniques for biological data are needed. Based on this premise, we present PathCase: Case Pathways Database System. PathCase is an integrated set of software tools for modelling, storing, analysing, visualizing and querying biological pathways data at different levels of genetic, molecular, biochemical and organismal detail. The novel features of the system include: (i) genomic information integrated with other biological data and presented starting from pathways; (ii) design for biologists who are possibly unfamiliar with genomics, but whose research is essential for annotating gene and genome sequences with biological functions; (iii) database design, implementation and graphical tools which enable users to visualize pathways data in multiple abstraction levels and to pose exploratory queries; (iv) a wide range of different types of queries including, 'path' and 'neighbourhood queries' and graphical visualization of query outputs; and (v) an implementation that allows for web (XML)-based dissemination of query outputs (i.e. pathways data in BIOPAX format) to researchers in the community, giving them control on the use of pathways data.  相似文献   

7.
SUMMARY: We present a Cytoscape plugin for the inference and visualization of networks from high-resolution mass spectrometry metabolomic data. The software also provides access to basic topological analysis. This open source, multi-platform software has been successfully used to interpret metabolomic experiments and will enable others using filtered, high mass accuracy mass spectrometric data sets to build and analyse networks. AVAILABILITY: http://compbio.dcs.gla.ac.uk/fabien/abinitio/abinitio.html  相似文献   

8.
After the extensive work that is being done in the areas of genomics, proteomics, and metabolomics, the study of metabolites has come of interest in its own right. Metabolites in biological systems give an understanding of the state of the system and provide a powerful tool for the study of disease and other maladies. Several analytical techniques such as mass spectrometry and high-resolution NMR spectroscopy have been used to study metabolites. The data, however, from these techniques remains quite complex. Traditionally, multivariate analyses have been used for such data. These methods however have an underlying assumption that the data is multivariate normal with a constant variance. This is not necessarily the case. It has been shown that a generalized log transformation renders the variance of the data constant effectively making the data more suitable for multivariate analysis. We demonstrate the effectiveness of these transformations on NMR data taken on a set of 18 abalone that were categorized as either being healthy, stunted, or diseased. We show how the transformation makes multivariate classification of the abalone into the healthy, stunted and diseased categories much more effective and gives a tool for identifying potential metabolic biomarkers for disease.  相似文献   

9.

Background

The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy or combinatorial interactions in the profile makes the selection of relevant variables harder.

Methodology/Principal Findings

Here we propose a method to select variables based on estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified the ability of our method to select predefined relevant variables in synthetic proteome-like data and then assessed its performance on biological high-dimensional problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles with fewer than 20% of variables selected. Our method scaled-up to dimensionalities of much higher orders of magnitude as shown with gene expression microarray datasets in which we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables.

Conclusions

Our method consistently found relevant variables attaining high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets compared to the original number of variables, which should simplify downstream biological experimentation.  相似文献   

10.

Background  

Amongst the most commonly used molecular markers for plant phylogenetic studies are the nuclear ribosomal internal transcribed spacers (ITS). Intra-individual variability of these multicopy regions is a very common phenomenon in plants, the causes of which are debated in literature. Phylogenetic reconstruction under these conditions is inherently difficult. Our approach is to consider this problem as a special case of the general biological question of how to infer the characteristics of hosts (represented here by plant individuals) from features of their associates (represented by cloned sequences here).  相似文献   

11.
Xu H  Wu X  Spitz MR  Shete S 《Human heredity》2004,58(2):63-68
OBJECTIVE: Haplotypes are gaining popularity in studies of human genetics because they contain more information than does a single gene locus. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes based on unphased genotypes at several loci. The accuracy, efficiency, and computational time of these methods have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotypic data from unrelated individuals. METHODS: We compared the performance of three haplotype inference methods that are currently in use--HAPLOTYPER, hap, and PHASE--by applying them to a large data set from unrelated individuals with known haplotypes. We also applied these methods to coalescent-based simulation studies using both constant size and exponential growth models. The performance of these methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study. RESULTS: While the algorithm implemented in the software PHASE was found to be the most accurate in both real and simulated data comparisons, all four methods produced good results in the association study.  相似文献   

12.
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/  相似文献   

13.
Qu A  Li R 《Biometrics》2006,62(2):379-391
Nonparametric smoothing methods are used to model longitudinal data, but the challenge remains to incorporate correlation into nonparametric estimation procedures. In this article, we propose an efficient estimation procedure for varying-coefficient models for longitudinal data. The proposed procedure can easily take into account correlation within subjects and deal directly with both continuous and discrete response longitudinal data under the framework of generalized linear models. The proposed approach yields a more efficient estimator than the generalized estimation equation approach when the working correlation is misspecified. For varying-coefficient models, it is often of interest to test whether coefficient functions are time varying or time invariant. We propose a unified and efficient nonparametric hypothesis testing procedure, and further demonstrate that the resulting test statistics have an asymptotic chi-squared distribution. In addition, the goodness-of-fit test is applied to test whether the model assumption is satisfied. The corresponding test is also useful for choosing basis functions and the number of knots for regression spline models in conjunction with the model selection criterion. We evaluate the finite sample performance of the proposed procedures with Monte Carlo simulation studies. The proposed methodology is illustrated by the analysis of an acquired immune deficiency syndrome (AIDS) data set.  相似文献   

14.
The rapid accumulation of gene sequence data is allowing evolutionary inferences of unprecedented resolution. In the area of population genetics, gene trees and polymorphism data are being used to study demographic parameters. In the area of comparative biology, the shapes of phylogenetic trees provide information about patterns of speciation, coevolution, and macroevolution. A variety of statistical methods have been developed for exploiting the information contained within organismal genomes. In this paper, we emphasise the conceptual bases of the tests, rather than details of their implementation. BioEssays 1999;21:148–156. © 1999 John Wiley & Sons, Inc.  相似文献   

15.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.  相似文献   

16.
metaXCMS is a software program for the analysis of liquid chromatography/mass spectrometry-based untargeted metabolomic data. It is designed to identify the differences between metabolic profiles across multiple sample groups (e.g., 'healthy' versus 'active disease' versus 'inactive disease'). Although performing pairwise comparisons alone can provide physiologically relevant data, these experiments often result in hundreds of differences, and comparison with additional biologically meaningful sample groups can allow for substantial data reduction. By performing second-order (meta-) analysis, metaXCMS facilitates the prioritization of interesting metabolite features from large untargeted metabolomic data sets before the rate-limiting step of structural identification. Here we provide a detailed step-by-step protocol for going from raw mass spectrometry data to metaXCMS results, visualized as Venn diagrams and exported Microsoft Excel spreadsheets. There is no upper limit to the number of sample groups or individual samples that can be compared with the software, and data from most commercial mass spectrometers are supported. The speed of the analysis depends on computational resources and data volume, but will generally be less than 1 d for most users. metaXCMS is freely available at http://metlin.scripps.edu/metaxcms/.  相似文献   

17.
Population genetic inference from resequencing data   总被引:1,自引:1,他引:0       下载免费PDF全文
Jiang R  Tavaré S  Marjoram P 《Genetics》2009,181(1):187-197
This article is concerned with statistical modeling of shotgun resequencing data and the use of such data for population genetic inference. We model data produced by sequencing-by-synthesis technologies such as the Solexa, 454, and polymerase colony (polony) systems, whose use is becoming increasingly widespread. We show how such data can be used to estimate evolutionary parameters (mutation and recombination rates), despite the fact that the data do not necessarily provide complete or aligned sequence information. We also present two refinements of our methods: one that is more robust to sequencing errors and another that can be used when no reference genome is available.  相似文献   

18.

Background

Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is important also to note that most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles.

Results

We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method, and how it can be used to speedup querying local phylogenetic patterns over large typing databases.
  相似文献   

19.
20.
The field of metabolomics is getting more and more popular and a wide range of different sample preparation procedures are in use by different laboratories. Chemical extraction methods using one or more organic solvents as the extraction agent are the most commonly used approach to extract intracellular metabolites and generate metabolite profiles. Metabolite profiles are the scaffold supporting the biological interpretation in metabolomics. Therefore, we aimed to address the following fundamental question: can we obtain similar metabolomic results and, consequently, reach the same biological interpretation by using different protocols for extraction of intracellular metabolites? We have used four different methods for extraction of intracellular metabolites using four different microbial cell types (Gram negative bacterium, Gram positive bacterium, yeast, and a filamentous fungus). All the quenched samples were pooled together before extraction, and, therefore, they were identical. After extraction and GC?CMS analysis of metabolites, we did not only detect different numbers of compounds depending on the extraction method used and regardless of the cell type tested, but we also obtained distinct metabolite levels for the compounds commonly detected by all methods (P-value?<?0.001). These differences between methods resulted in contradictory biological interpretation regarding the activity of different metabolic pathways. Therefore, our results show that different solvent-based extraction methods can yield significantly different metabolite profiles, which impact substantially in the biological interpretation of metabolomics data. Thus, development of alternative extraction protocols and, most importantly, standardization of sample preparation methods for metabolomics should be seriously pursued by the scientific community.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号