Found 20 similar articles (search time: 0 ms)
1.
2.
Curk T Demsar J Xu Q Leban G Petrovic U Bratko I Shaulsky G Zupan B 《Bioinformatics (Oxford, England)》2005,21(3):396-398
SUMMARY: Visual programming offers an intuitive means of combining known analysis and visualization methods into powerful applications. The system presented here enables users who are not programmers to manage microarray and genomic data flow and to customize their analyses by combining common data analysis tools to fit their needs. AVAILABILITY: http://www.ailab.si/supp/bi-visprog SUPPLEMENTARY INFORMATION: http://www.ailab.si/supp/bi-visprog.
3.
Microarray data analysis and mining approaches. Cited by: 1 (self-citations: 0; citations by others: 1)
Francesca Cordero Marco Botta Raffaele A Calogero 《Briefings in Functional Genomics and Prot》2007,6(4):265-281
4.
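Microarray data mining with SAS-based multivariate statistical methods. Cited by: 4 (self-citations: 0; citations by others: 4)
SAS software was used to mine a lung cancer microarray experiment from GEO. Nonparametric tests, discriminant analysis, and regression analysis were applied to the expression data of 14 nuclear receptors in this experiment. At the 0.05 significance level, the expression of four genes (ER1, VDR, RARα, and RORα) differed significantly between adenocarcinoma and squamous cell carcinoma, and RARβ expression differed between the recurrence and non-recurrence groups. Discriminant analysis showed that VDR and RORα expression levels can predict pathological type, but the overall misclassification rate was high (0.2389); the overall misclassification rate of RARβ and PPARα for discriminating recurrence was higher still (0.3457). In the regression equation built to predict pathological type, the variables entering the model were again VDR and RORα, with odds ratios of 0.126 and 4.452, respectively. SAS-based multivariate statistics is therefore a potential approach to microarray data mining; once microarray experiments are standardized, conclusions from the integrated SAS analysis of data from different microarray experiments will help drive hypothesis formation.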
5.
The aim of this paper is to present a new clustering algorithm for short time-series gene expression data that is able to characterise temporal relations in the clustering environment (i.e., the data-space), which is not achieved by conventional clustering algorithms such as k-means or hierarchical clustering. The algorithm, called fuzzy c-varieties clustering with transitional state discrimination preclustering (FCV-TSD), is a two-step approach that identifies groups of points ordered in a line configuration in particular locations and orientations of the data-space that correspond to similar expressions in the time domain. We present the validation of the algorithm with both artificial and real experimental datasets, where k-means and random clustering are used for comparison. The performance was evaluated with a measure for internal cluster correlation and the geometrical properties of the clusters, showing that the FCV-TSD algorithm had better performance than the k-means algorithm on both datasets.
6.
Meunier B Dumas E Piec I Béchet D Hébraud M Hocquette JF 《Journal of proteome research》2007,6(1):358-366
7.
Microarray data classification is one of the most important emerging clinical applications in the medical community. Machine learning algorithms are most frequently used to complete this task. We selected one of the state-of-the-art kernel-based algorithms, the support vector machine (SVM), to classify microarray data. As a large number of kernels are available, a significant research question is what is the best kernel for patient diagnosis based on microarray data classification using SVM? We first suggest three solutions based on data visualization and quantitative measures. Different types of microarray problems then test the proposed solutions. Finally, we found that the rule-based approach is most useful for automatic kernel selection for SVM to classify microarray data.
8.
9.
Finite-element scaling analysis (FESA), generalized Procrustes analysis (GPA), and Euclidean distance matrix analysis (EDMA) are applied in a two-dimensional study of craniofacial growth in normal children and those affected with Crouzon syndrome. Longitudinal data are used and growth is measured as change local to 10 craniofacial landmarks. Although details of the results vary among the methods, all three methods determine Crouzon growth to be different from normal. Nuances of the methods, especially the use of superimposition in GPA and the lack of superimposition in the other two, are partly responsible for the varying results. Although Crouzon craniofacial morphology is often obvious at birth, this study demonstrates that there are general differences between normal postnatal growth patterns and those of the Crouzon individual. These patterns of malgrowth are in part responsible for the adult morphology of the Crouzon craniofacial complex.
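The superimposition step that distinguishes Procrustes methods from FESA and EDMA can be illustrated with a minimal sketch. It is not the procedure used in the study above: it shows only an ordinary (pairwise) least-squares Procrustes fit of one 2D landmark configuration onto another, whereas GPA repeats such fits against an iteratively updated mean shape. The function name `procrustes_align` and the landmark coordinates are hypothetical, and reflection is not handled.

```python
import cmath


def procrustes_align(reference, target):
    """Superimpose `target` onto `reference` by translation, rotation and scaling.

    Both arguments are equal-length lists of (x, y) landmarks.
    Hypothetical helper for illustration; ordinary, not generalized, Procrustes.
    """
    if len(reference) != len(target) or not reference:
        raise ValueError("configurations must be non-empty and of equal length")

    # Represent each 2D landmark as a complex number x + iy.
    ref = [complex(x, y) for x, y in reference]
    tgt = [complex(x, y) for x, y in target]

    # Remove translation by centring each configuration on its centroid.
    ref_mean = sum(ref) / len(ref)
    tgt_mean = sum(tgt) / len(tgt)
    ref_c = [z - ref_mean for z in ref]
    tgt_c = [z - tgt_mean for z in tgt]

    # Least-squares complex coefficient beta minimising sum |r - beta * t|^2.
    # Its modulus is the scale factor and its argument the rotation angle.
    numerator = sum(r * t.conjugate() for r, t in zip(ref_c, tgt_c))
    denominator = sum(abs(t) ** 2 for t in tgt_c)
    if denominator == 0:
        raise ValueError("target landmarks must not all coincide")
    beta = numerator / denominator

    fitted = [beta * t for t in tgt_c]
    # Residual sum of squares: shape difference left after superimposition.
    residual = sum(abs(r - f) ** 2 for r, f in zip(ref_c, fitted))
    return fitted, abs(beta), cmath.phase(beta), residual


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    # The same square rotated by 90 degrees, doubled in size and shifted.
    moved = [(3, 4), (3, 6), (1, 6), (1, 4)]
    _, scale, angle, residual = procrustes_align(square, moved)
    print(scale, angle, residual)
```

Because translation, rotation and scale are removed before shapes are compared, any residual reflects shape difference alone; methods that skip superimposition, such as EDMA, compare inter-landmark distances directly instead.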
10.
Background
DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm EP_GOS_Clust.
11.
Anbazhagan R 《Bioinformatics (Oxford, England)》2003,19(1):157-158
SUMMARY: Large volumes of microarray data are generated and deposited in public databases. Most of this data is in the form of tab-delimited text files or Excel spreadsheets. Combining data from several of these files to reanalyze these data sets is time consuming. Microarray Data Assembler is specifically designed to simplify this task. The program can list files and data sources, convert selected text files into Excel files and assemble data across multiple Excel worksheets and workbooks. This program thus makes data assembling easy, saves time and helps avoid manual error. AVAILABILITY: The program is freely available for non-profit use, via email request from the author, after signing a Material Transfer Agreement with Johns Hopkins University.
12.
Comparative genomics using data mining tools. Cited by: 3 (self-citations: 0; citations by others: 3)
We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota, using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified the common and different features between the three genomes in the protein evolution patterns. M. jannaschii has been seen to have a greater number of proteins with more charged amino acids, whereas S. cerevisiae has been observed to have a greater number of hydrophilic proteins. Despite the differences in intrinsic compositional characteristics between the proteins from the different genomes, we have also identified certain common characteristics. We have carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, we found that most of the proteins in each organism cluster closely together, but there are a few 'outliers'. We focus on the outliers for the functional investigations, which may aid in revealing any unique features of the biology of the respective organisms.
13.
Tseng VS Kao CP 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(4):355-365
Clustering analysis has been an important research topic in the machine learning field due to its wide range of applications. In recent years, it has also become a valuable tool for in-silico analysis of microarray or gene expression data. Although a number of clustering methods have been proposed, they have difficulty meeting the requirements of automation, high quality, and high efficiency at the same time. In this paper, we propose a novel, parameterless and efficient clustering algorithm, namely, the correlation search technique (CST), which is suited to the analysis of gene expression data. The unique feature of CST is that it incorporates validation techniques into the clustering process so that high-quality clustering results can be produced on the fly. Through experimental evaluation, CST is shown to greatly outperform other clustering methods in terms of clustering quality, efficiency, and automation on both synthetic and real data sets.
14.
Joshi RR 《Protein and peptide letters》2007,14(6):536-542
Design and synthesis of peptide vaccines is of significant pharmaceutical importance. A knowledge-based statistical model is fitted here for prediction of binding of an antigenic site of a protein or a B-cell epitope on a CDR (complementarity determining region) of an immunoglobulin. Linear analogues of the 3D structure of the epitopes are computed using this model. Extension for prediction of peptide epitopes from the protein sequence alone is also presented. Validation results show promising potential of this approach in computer-aided peptide vaccine production. The computed probabilities of binding also provide a pioneering approach for ab-initio prediction of 'potency' of protein or peptide vaccines modeled by this method.
15.
16.
Feature selection from DNA microarray data is a major challenge because expression data are high-dimensional. The number of samples in a microarray data set is much smaller than the number of genes, so the raw data are poorly suited for use as the training set of a classifier, and features must be selected before the classifier is trained. Only a small subset of genes in the data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. Thus there is a need to develop robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches include filtering (filter out the best genes from the data set) and wrapper (best subset of genes from the data set) phases. The methods use information gain (IG) and Pearson Product Moment Correlation (PPMC) as the filtering parameters and biogeography based optimization (BBO) as the wrapper approach. The K nearest neighbour algorithm (KNN) and a back propagation neural network are used for evaluating the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs impressively on different data sets, with high accuracy (>90%) and a low error rate.
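The filtering phase described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each gene is binarised at its median expression value (an assumed discretisation; the abstract does not specify one), scored by information gain against the class labels, and the top-ranked genes are kept. The function names, gene names and expression values are hypothetical, and the BBO wrapper and KNN fitness evaluation are not shown.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of splitting the samples at `threshold`."""
    low = [lab for v, lab in zip(values, labels) if v <= threshold]
    high = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute 0.
    conditional = sum(len(part) / n * entropy(part)
                      for part in (low, high) if part)
    return entropy(labels) - conditional


def rank_genes(expression, labels, top_k):
    """Return the `top_k` gene names with the highest information gain.

    `expression` maps a gene name to its per-sample expression values.
    Assumption: each gene is split at its median expression value.
    """
    scores = {
        gene: information_gain(values, labels, statistics.median(values))
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    labels = ["tumour", "tumour", "normal", "normal"]
    expression = {
        "gene_x": [1.0, 2.0, 8.0, 9.0],  # separates the two classes
        "gene_y": [1.0, 9.0, 2.0, 8.0],  # carries no class information
    }
    print(rank_genes(expression, labels, top_k=1))
```

A filter of this kind scores each gene independently, which is cheap but ignores interactions between genes; that is the gap the wrapper phase is meant to cover by evaluating whole gene subsets with a classifier.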
17.
DNA microarrays are valuable tools for analyzing global gene expression. Because of the increasing popularity and the large volume of data produced, tools for facile microarray data analysis are essential. FiRe, a recently introduced computer program, has now solved the seemingly insuperable discrepancy between simplicity and evaluation of DNA microarray data. The program is available as a macro for the popular Microsoft Office Excel software and is user-friendly, interactive, versatile and platform-independent, paving the way for a further push in the evaluation of DNA microarrays.
18.
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analyses of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures that were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same searching scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will be increasingly valuable with the future increase in 3D morphology and phenomics databases.
19.
Size and shape analysis of landmark data. Cited by: 1 (self-citations: 0; citations by others: 1)
20.
A method is developed for fitting smooth curves through a series of shapes of landmarks in two dimensions using unrolling and unwrapping procedures in Riemannian manifolds. An explicit method of calculation is given which is analogous to that of Jupp & Kent (1987) for spherical data. The resulting splines are called shape-space smoothing splines. The method resembles that of fitting smoothing splines in real spaces in that, if the smoothing parameter is zero, the resulting curve interpolates the data points, and if it is infinitely large the curve is a geodesic line. The fitted path to the data is defined such that its unrolled version at the tangent space of the starting point is a cubic spline fitted to the unwrapped data with respect to that path. Computation of the fitted path consists of an iterative procedure which converges quickly, and the resulting path is given in a discretised form in terms of a piecewise geodesic path. The procedure is applied to the analysis of some human movement data, and a test for the appropriateness of a mean geodesic curve is given.