Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
2.
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and must capture essential characteristics of the data. In this paper, we present a way of constructing multivariate features and then classifying the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting the biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.

3.
4.

Introduction

Metabolomics is a well-established tool in systems biology, especially in the top–down approach. Metabolomics experiments often result in discovery studies that provide intriguing biological hypotheses but rarely offer mechanistic explanations of such findings. In this light, the interpretation of metabolomics data can be boosted by deploying systems biology approaches.

Objectives

This review aims to provide an overview of systems biology approaches that are relevant to metabolomics and to discuss some successful applications of these methods.

Methods

We review the most recent applications of systems biology tools in the field of metabolomics, such as network inference and analysis, metabolic modelling and pathways analysis.

Results

We offer an ample overview of systems biology tools that can be applied to address metabolomics problems. The characteristics and application results of these tools are also discussed comparatively.

Conclusions

Systems biology-enhanced analysis of metabolomics data can provide insights into the molecular mechanisms originating the observed metabolic profiles and enhance the scientific impact of metabolomics studies.

5.

Background  

In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, SOM, etc., genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective methods for clustering must be developed to process this growing amount of biological data.
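To illustrate the kind of partitioning this abstract describes, here is a minimal sketch of K-means on expression profiles in plain Python. The function name `kmeans`, the toy profiles, and the choice to seed the centroids with the first k profiles (for reproducibility) are assumptions made for illustration; they do not come from the abstract.

```python
import math

def kmeans(profiles, k, max_iters=100):
    """Lloyd's K-means on a list of equal-length expression profiles.

    Assumption: the first k profiles seed the centroids, which keeps the
    sketch deterministic; real analyses usually use random restarts.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each gene goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two rising and two falling expression profiles.
toy_profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [3.0, 2.0, 1.0],
    [2.9, 2.2, 1.1],
]
toy_labels, toy_centroids = kmeans(toy_profiles, k=2)
```

With this toy input, the two rising profiles end up in one group and the two falling profiles in the other.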

6.

Background

The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column.

Results

Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses.

Conclusion

Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research.

7.
8.

Background  

To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms.

9.
10.
The Cheng and Church algorithm is an important biclustering approach. In this paper, the extension procedure in the second stage of the Cheng and Church algorithm is improved, and the selection of two important parameters is discussed. Applied to gene expression profile analysis, the improved algorithm yields clearly better clustering quality than the original Cheng and Church algorithm: the mined expression patterns are better and the data fluctuate more consistently across conditions, while the computation time does not increase significantly.
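For context, the Cheng and Church algorithm scores a candidate bicluster by its mean squared residue, which is zero when every entry equals row effect plus column effect. The sketch below computes that score in plain Python. The function name and the toy matrices are assumptions for illustration, and the sketch does not implement the improved second stage that the abstract describes.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the submatrix given by rows I and cols J.

    residue(i, j) = a_ij - rowmean_i - colmean_j + overall_mean
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)

# Hypothetical toy matrices: the first follows a purely additive pattern,
# the second has one row that breaks it.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
perturbed = [[1, 2, 3], [2, 3, 4], [9, 0, 5]]
score_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2])
score_perturbed = mean_squared_residue(perturbed, [0, 1, 2], [0, 1, 2])
```

The additive matrix scores essentially zero, and restricting the perturbed matrix to its first two rows restores a near-zero score, which is the intuition behind deleting rows or columns to lower the residue.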

11.
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance.

12.
13.
In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.

14.
The planar spring-mass model is frequently used to describe bouncing gaits (running, hopping, trotting, galloping) in animal and human locomotion and robotics. Although this model represents a rather simple mechanical system, an analytical solution predicting the center of mass trajectory during stance remains an open problem. We derive an approximate solution in elementary functions assuming a small angular sweep and a small spring compression during stance. The predictive power and quality of this solution are investigated for model parameters relevant to human locomotion. The analysis shows that (i) for spring compressions of up to 20% (angle of attack ≥ 60°, angular sweep ≤ 60°) the approximate solution describes the stance dynamics of the center of mass within a 1% tolerance of spring compression and a 0.6° tolerance of angular motion compared to numerical calculations, and (ii) despite its relative simplicity, the approximate solution accurately predicts stable locomotion well extending into the physiologically reasonable parameter domain. (iii) Furthermore, in a particular case, an explicit parametric dependency required for gait stability can be revealed, extending an earlier, empirically found relationship. It is suggested that this approximation of the planar spring-mass dynamics may serve as an analytical tool for application in robotics and further research on legged locomotion.
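The abstract benchmarks its approximate solution against numerical calculations. As a rough sketch of what such a numerical reference could look like, the code below integrates the stance phase of a planar spring-mass model with the foot fixed at the origin. It is not the authors' approximate solution. The parameter values (80 kg, 20 kN/m, 1 m leg, 68° angle of attack, touchdown velocity), the semi-implicit Euler scheme, and all names are assumptions chosen for illustration.

```python
import math

def simulate_stance(m=80.0, k=20000.0, l0=1.0, alpha0_deg=68.0,
                    vx=5.0, vy=-1.2, g=9.81, dt=1e-4, t_max=1.0):
    """Integrate one stance phase; all default parameters are assumed values."""
    alpha0 = math.radians(alpha0_deg)
    # Touchdown: leg at rest length, centre of mass behind and above the foot.
    x, y = -l0 * math.cos(alpha0), l0 * math.sin(alpha0)

    def energy(px, py, pvx, pvy):
        leg = math.hypot(px, py)
        return (0.5 * m * (pvx * pvx + pvy * pvy) + m * g * py
                + 0.5 * k * (l0 - leg) ** 2)

    e_start = energy(x, y, vx, vy)
    max_compression = 0.0
    t = 0.0
    while t < t_max:
        leg = math.hypot(x, y)
        if t > 0.0 and leg > l0:
            break  # take-off: the leg has returned to its rest length
        max_compression = max(max_compression, (l0 - leg) / l0)
        force = k * (l0 - leg)           # spring force along the leg
        ax = force * x / (leg * m)
        ay = force * y / (leg * m) - g
        # Semi-implicit Euler: update velocity first, then position.
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        t += dt
    e_end = energy(x, y, vx, vy)
    return {
        "took_off": math.hypot(x, y) > l0,
        "stance_time": t,
        "max_compression": max_compression,
        "energy_drift": abs(e_end - e_start) / e_start,
    }

result = simulate_stance()
```

The returned relative spring compression is the quantity against which a closed-form approximation could be compared, and the energy drift gives a crude check on the integration step size.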

15.
Functional genomics: learning to think about gene expression data. (Total citations: 2; self-citations: 0; citations by others: 2)
R Brent. Current Biology: CB, 1999, 9(9): R338-R341
Three recent studies of gene expression patterns in whole cells provide examples of the inferences one can make from this type of information. They also provide examples of the non-traditional types of reasoning we will need to use to make such inferences.

16.
17.
In a previous study it was shown that a simple random Boolean network model, with two input connections per node, can describe with a good approximation (with the exception of the smallest avalanches) the distribution of perturbations in gene expression levels induced by the knock-out of single genes in Saccharomyces cerevisiae. Here we address the reason why such a simple model actually works: we present a theoretical study of the distribution of avalanches and show that, in the case of a Poissonian distribution of outgoing links, their distribution is determined by the value of the Derrida exponent. This explains why the simulations based on the simple model have been effective, in spite of the unrealistic hypothesis about the number of input connections per node. Moreover, we consider here the problem of the choice of an optimal threshold for binarizing continuous data, and we show that tuning its value provides an even better agreement between model and data, valuable also in the important case of the smallest avalanches. Finally, we also discuss the choice of an optimal value of the Derrida parameter in order to match the experimental distributions: our results indicate a value slightly below the critical value 1.
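A minimal sketch of the kind of simulation referred to here: a random Boolean network with two inputs per node, in which one gene is clamped to 0 (a knock-out) and the avalanche is counted as the number of other nodes whose state ever differs from the unperturbed run. The synchronous update rule, this avalanche definition, the number of steps, and all names are assumptions for illustration and may differ from the study's protocol.

```python
import random

def random_network(n, rng):
    """Each node gets two random inputs and a random 4-entry truth table."""
    inputs = [(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(4)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables, knocked_out=None):
    """Synchronous update; a knocked-out node is clamped to 0."""
    nxt = [tables[i][2 * state[a] + state[b]]
           for i, (a, b) in enumerate(inputs)]
    if knocked_out is not None:
        nxt[knocked_out] = 0
    return nxt

def avalanche_size(initial, inputs, tables, gene, steps=50):
    """Number of nodes (other than `gene`) ever affected by the knock-out."""
    wild = list(initial)
    mutant = list(initial)
    mutant[gene] = 0
    affected = set()
    for _ in range(steps):
        wild = step(wild, inputs, tables)
        mutant = step(mutant, inputs, tables, knocked_out=gene)
        affected.update(i for i in range(len(wild))
                        if i != gene and wild[i] != mutant[i])
    return len(affected)

# Hypothetical usage: one knock-out in a 20-node random network.
rng = random.Random(0)
net_inputs, net_tables = random_network(20, rng)
start = [rng.randrange(2) for _ in range(20)]
size = avalanche_size(start, net_inputs, net_tables, gene=3)
```

Repeating this over many networks and knocked-out genes would give an empirical avalanche distribution of the sort the abstract compares with experimental data.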

18.
A simple distributed processing system named "Peach" was developed to meet the rising computational demands of modern structural biology (and other) laboratories without additional expense by using existing hardware resources more efficiently. A central server distributes jobs to idle workstations in such a way that each computer is used maximally, but without disturbing intermittent interactive users. As compared to other distributed systems, Peach is simple, easy to install, easy to administer, easy to use, scalable, and robust. While it was designed to queue and distribute large numbers of small tasks to participating computers, it can also be used to send single jobs automatically to the fastest currently available computer and/or survey the activity of an entire laboratory's computers. Tests of robustness and scalability are reported, as are three specific electron cryomicroscopy applications where Peach enabled projects that would not otherwise have been feasible without an expensive, dedicated cluster.

19.
MOTIVATION: Large scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably. RESULTS: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
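The idea of combining a correlation-based distance with a graph distance can be sketched as follows. The abstract does not state how the two distances are combined, so the weighted sum, the cap on path length (`max_hops`), the scaling of both terms to [0, 1], and every name below are assumptions for illustration rather than the authors' definition.

```python
import math
from collections import deque

def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def graph_distance(adj, source, target):
    """Shortest path length by breadth-first search; inf if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return math.inf

def combined_distance(expr, adj, g, h, weight=0.5, max_hops=4):
    """Assumed combination: weighted sum of both distances scaled to [0, 1]."""
    d_expr = pearson_distance(expr[g], expr[h]) / 2.0
    d_net = min(graph_distance(adj, g, h), max_hops) / max_hops
    return weight * d_expr + (1.0 - weight) * d_net

# Hypothetical toy data: three genes on a path a - b - c.
toy_expr = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
d_ab = combined_distance(toy_expr, toy_adj, "a", "b")
d_ac = combined_distance(toy_expr, toy_adj, "a", "c")
```

A matrix of such pairwise distances could then be passed to any standard hierarchical clustering routine.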

20.
Biclustering extends the traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form, compared to existing algorithms, through employing a combination of qualitative (or semi-quantitative) measures of gene expression data and a combinatorial optimization technique. One key unique feature of the QUBIC algorithm is that it can identify all statistically significant biclusters including biclusters with the so-called ‘scaling patterns’, a problem considered to be rather challenging; another key unique feature is that the algorithm solves such general biclustering problems very efficiently, capable of solving biclustering problems with tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated a considerably improved biclustering performance by our algorithm compared to the existing algorithms on various benchmark sets and data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/~maqin/bicluster. A server version of QUBIC is also available upon request.
