Similar Documents
20 similar documents found.
1.
Summary: Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering, or identifying interpretable row–column associations within high‐dimensional data matrices. SSVD seeks a low‐rank, checkerboard‐structured approximation to a data matrix. The desired checkerboard structure is achieved by forcing both the left‐ and right‐singular vectors to be sparse, that is, to have many zero entries. By interpreting the singular vectors as regression coefficient vectors for certain linear regressions, sparsity‐inducing regularization penalties are imposed on the least squares regressions to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets.
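The alternating regressions described above can be sketched in Python with NumPy. This is a minimal, hypothetical rank-1 version that assumes fixed, hand-picked penalties `lam_u` and `lam_v` and a plain (non-adaptive) lasso soft-threshold; the penalty-selection procedure mentioned in the abstract is not reproduced, and the function names are illustrative.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_rank1(X, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Approximate X by s * outer(u, v) with sparse unit vectors u and v.

    Each half-step is a lasso-penalized least-squares fit with the other
    vector held fixed; for a unit-norm fixed vector the solution is a
    soft-thresholded matrix-vector product. lam_u and lam_v are fixed,
    user-chosen penalties (an assumption; SSVD selects its own).
    """
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]          # warm start: ordinary leading singular vectors
    for _ in range(n_iter):
        v_new = soft_threshold(X.T @ u, lam_v)
        if not v_new.any():
            raise ValueError("lam_v removes every column")
        v_new /= np.linalg.norm(v_new)
        u_new = soft_threshold(X @ v_new, lam_u)
        if not u_new.any():
            raise ValueError("lam_u removes every row")
        u_new /= np.linalg.norm(u_new)
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, float(u @ X @ v), v


# Toy checkerboard: a 3 x 4 block of signal in an 8 x 10 matrix, plus noise.
rng = np.random.default_rng(0)
a = np.r_[np.ones(3), np.zeros(5)] / np.sqrt(3)
b = np.r_[np.ones(4), np.zeros(6)] / 2
X = 10 * np.outer(a, b) + 0.1 * rng.standard_normal((8, 10))
u, s, v = sparse_rank1(X, lam_u=1.0, lam_v=1.0)
print(np.flatnonzero(u), np.flatnonzero(v), round(s, 1))
```

The nonzero rows of `u` and columns of `v` together pick out the bicluster; further layers could be found by repeating the procedure on the residual `X - s * outer(u, v)`.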

2.
SUMMARY: We introduce a novel unsupervised approach for the organization and visualization of multidimensional data. At the heart of the method is a presentation of the full pairwise distance matrix of the data points, viewed in pseudocolor. The ordering of the points is iteratively permuted in search of a linear ordering, which can be used to study embedded shapes. Several examples show how the shapes of certain structures in the data (elongated, circular and compact) manifest themselves visually in the permuted distance matrix. Identifying elongated objects is important because they are often associated with a set of hidden variables underlying continuous variation in the data. The problem of determining an optimal linear ordering is shown to be NP-complete, and therefore an iterative search algorithm with O(n³) step complexity is proposed. By using Sorting Points Into Neighborhoods (SPIN) to analyze colon cancer expression data, we were able to address the serious problem of sample heterogeneity, which hinders the identification of metastasis-related genes in our data. Our methodology brings to light the continuous variation of heterogeneity, starting with homogeneous tumor samples and gradually increasing the amount of another tissue. Ordering the samples according to their degree of contamination by unrelated tissue allows genes associated with irrelevant contamination to be separated from those related to cancer progression. AVAILABILITY: The software package will be available to academic users upon request.
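A minimal sketch of one way to search for a linear ordering of a distance matrix, in the spirit of a "side-to-side" update: each point gets a weight that increases linearly with its current position, points are re-sorted by their distance-weighted scores, and the loop stops when the order no longer changes. The function name and weighting scheme are assumptions for illustration, not necessarily SPIN's exact algorithm.

```python
def side_to_side_order(D, max_iter=100):
    """Return a list of point indices giving a linear ordering for distance matrix D."""
    n = len(D)
    order = list(range(n))                          # position -> point index
    for _ in range(max_iter):
        pos = {p: k for k, p in enumerate(order)}
        # Centred, linearly increasing weight for each point's current position.
        w = [pos[j] - (n - 1) / 2 for j in range(n)]
        # Points near the "left" end are far from heavily weighted right-end points,
        # so they get large scores; sorting by descending score keeps them first.
        score = [sum(D[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_order = sorted(range(n), key=lambda i: -score[i])
        if new_order == order:
            break
        order = new_order
    return order


# Five points on a line, given in scrambled order.
xs = [2.0, 0.0, 4.0, 1.0, 3.0]
D = [[abs(a - b) for b in xs] for a in xs]
order = side_to_side_order(D)
permuted = [[D[i][j] for j in order] for i in order]   # matrix to display in pseudocolor
print([xs[i] for i in order])   # → [0.0, 1.0, 2.0, 3.0, 4.0]
```

Each pass costs O(n²) here; like any local search it can stop at a local optimum, which is consistent with the abstract's point that the exact problem is NP-complete.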

3.
The multilayer perceptron, when working in auto-association mode, is sometimes considered an interesting candidate for data compression or for reducing the dimensionality of the feature space in information processing applications. The present paper shows that, for auto-association, the nonlinearities of the hidden units are useless and that the optimal parameter values can be derived directly by purely linear techniques relying on singular value decomposition and low-rank matrix approximation, similar in spirit to the well-known Karhunen-Loève transform. This approach thus appears to be an efficient alternative to the general error back-propagation algorithm commonly used for training multilayer perceptrons. Moreover, it gives a clear interpretation of the role of the different parameters.
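The claim about purely linear techniques can be checked numerically. The sketch below assumes centred data in place of explicit bias terms and uses an illustrative function name. It builds the encoder and decoder weights of a linear auto-associator directly from the SVD and confirms that the squared reconstruction error equals the sum of the discarded squared singular values, which by the Eckart-Young theorem is the best any rank-k linear map can do.

```python
import numpy as np


def linear_autoassociator(X, k):
    """Weights of a k-hidden-unit linear auto-associator for centred data X.

    X has one column per training pattern. The encoder maps x to h = W_enc @ x
    and the decoder maps h back to x_hat = W_dec @ h. Taking both from the
    leading k left singular vectors gives the least-squares optimum.
    """
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    W_enc = U[:, :k].T
    W_dec = U[:, :k]
    return W_enc, W_dec, S


rng = np.random.default_rng(1)
X = rng.standard_normal((6, 200))          # 6 features, 200 patterns
X -= X.mean(axis=1, keepdims=True)         # centring plays the role of bias terms
k = 2
W_enc, W_dec, S = linear_autoassociator(X, k)
X_hat = W_dec @ (W_enc @ X)
err = np.sum((X - X_hat) ** 2)
# The training error equals the energy in the discarded singular values.
print(np.isclose(err, np.sum(S[k:] ** 2)))   # → True
```

The optimal weights are unique only up to an invertible k x k transform of the hidden layer, which is one way to read the paper's remark on interpreting the parameters.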

4.
Singular value decomposition (SVD) is a technique commonly used in the analysis of spectroscopic data that both acts as a noise filter and reduces the dimensionality of subsequent least-squares fits. To establish the applicability of SVD to crystallographic data, we applied SVD to calculated difference Fourier maps simulating those to be obtained in a time-resolved crystallographic study of photoactive yellow protein. The atomic structures of one dark state and three intermediates were used in qualitatively different kinetic mechanisms to generate time-dependent difference maps at specific time points. Varying levels of random noise in the difference structure factor amplitudes, different extents of reaction initiation, and different numbers of time points were all used to simulate a range of realistic experimental conditions. Our results show that SVD allows an unbiased differentiation between signal and noise: a small subset of singular values and vectors represents the signal well, reducing the random noise in the data. As a result, phase information for the difference structure factors can be obtained. After a kinetic mechanism was identified and fitted, the time-independent structures of the intermediates could be recovered. This demonstrates that SVD will be a powerful tool in the analysis of experimental time-resolved crystallographic data.
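A small synthetic illustration of SVD as a noise filter, assuming two hypothetical difference "maps" whose occupancies follow a sequential A → B → C kinetic scheme with arbitrarily chosen rate constants. The signal occupies the first two singular values, and reconstructing from those alone removes most of the added noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels = 500
times = np.logspace(-2, 2, 15)                # 15 log-spaced time points
maps = rng.standard_normal((n_voxels, 2))     # hypothetical maps of two intermediates
k1, k2 = 1.0, 0.05                            # hypothetical rates for A -> B -> C
occ_a = np.exp(-k1 * times)                   # occupancy of A, starting at 1
occ_b = k1 / (k2 - k1) * (np.exp(-k1 * times) - np.exp(-k2 * times))
clean = maps @ np.vstack([occ_a, occ_b])      # voxels x time points
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

U, S, Vt = np.linalg.svd(noisy, full_matrices=False)
print(S[:4].round(1))                         # two large values, then a noise floor
k = 2
denoised = (U[:, :k] * S[:k]) @ Vt[:k]        # rank-k reconstruction
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(round(err_noisy, 1), round(err_denoised, 1))
```

The columns of `Vt[:k]` are the time-dependent vectors that a kinetic model would then be fitted to; here the number of components is read off the gap in `S` rather than chosen by a formal criterion.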

5.
For an adequate analysis of pathological speech signals, a sizeable number of parameters is required, such as those related to jitter, shimmer and noise content. This kind of high-dimensional signal representation is often difficult to understand, even for expert voice therapists and physicians. Visualizing a high-dimensional dataset can be a useful first step in its exploratory data analysis, facilitating an understanding of its underlying structure. In the present paper, eight dimensionality reduction techniques, both classical and recent, are compared on speech data containing normal and pathological speech. A qualitative analysis of their dimensionality reduction capabilities is presented. The transformed data are also evaluated quantitatively using classifiers, and it is found that it may be advantageous to perform classification on the transformed data rather than on the original data. These qualitative and quantitative analyses allow us to conclude that a nonlinear, supervised method called kernel local Fisher discriminant analysis is superior for dimensionality reduction in the present context.

6.
We describe functions recently added to the R package popgenreport that can be used to perform a landscape genetic analysis (LGA) based on landscape resistance surfaces, which aims to detect the effect of landscape features on gene flow. For the first time, these functions implement an LGA in a single framework. Although the approach has been shown to be a valuable tool for studying gene flow in landscapes, it has not been widely used to date, even though the required type of data is widely available. In part, this is likely due to the need to use several software packages to perform landscape genetic analyses. To apply the LGA functions, two types of data sets are required: a data set of spatially referenced, genotyped individuals, and a resistance layer representing the effect of the landscape. From these data, the function outputs three pairwise distance matrices: a genetic distance matrix, a cost distance matrix and a Euclidean distance matrix. Statistical tests are performed to assess whether the cost matrix contributes to the understanding of the observed population structure. A full report of the analysis is produced, with plots and tables of all intermediate steps of the LGA. The LGA can be customized to allow for different cost path approaches and measures of genetic distance. The package is written in the R language and is available through the Comprehensive R Archive Network (CRAN). Comprehensive tutorials and information on how to install and use the package are provided at the authors’ website (www.popgenreport.org).
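The abstract does not name the statistical tests. As a hedged illustration only, a Mantel permutation test is one common way to ask whether two pairwise distance matrices (for example, genetic and cost distances) are correlated. The sketch below is plain Python; the function names and toy data are hypothetical and are not popgenreport's API.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(A, B, n_perm=999, seed=0):
    """One-sided Mantel test of a positive association between distance matrices A and B.

    The statistic is Pearson's r over the upper triangles; the null distribution
    comes from applying the same random permutation to the rows and columns of B.
    """
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [A[i][j] for i, j in pairs]
    r_obs = pearson(a, [B[i][j] for i, j in pairs])
    rng = random.Random(seed)
    perm = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pearson(a, [B[perm[i]][perm[j]] for i, j in pairs]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical individuals on a grid; genetic distance grows with geographic distance.
coords = [(i, (3 * i) % 5) for i in range(10)]
euclid = [[math.dist(p, q) for q in coords] for p in coords]
genetic = [[math.log1p(d) for d in row] for row in euclid]
r, p = mantel(genetic, euclid, n_perm=199)
print(round(r, 3), p)
```

Asking whether cost distance explains genetic distance beyond plain Euclidean distance would need a partial test that controls for the third matrix; the simple version above compares only two.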

7.
MOTIVATION: Functional analyses based on the association of Gene Ontology (GO) terms with genes in a selected gene list are useful bioinformatic tools, and the GOstats package has been widely used to perform such computations. In this paper we report significant improvements and extensions, such as support for conditional testing. RESULTS: We discuss the capabilities of GOstats, a Bioconductor package written in R that allows users to test GO terms for over- or under-representation using either a classical hypergeometric test or a conditional hypergeometric test that uses the relationships among GO terms to decorrelate the results. AVAILABILITY: GOstats is available as an R package from the Bioconductor project: http://bioconductor.org
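The classical (unconditional) test can be written in a few lines. This sketch, with hypothetical function names, computes one-sided hypergeometric p-values for a single GO term given N genes in the universe, K of them annotated to the term, n genes in the selected list and k selected genes annotated. The conditional variant in GOstats, which also uses the GO graph, is not shown.

```python
import math


def hypergeom_upper_p(k, n, K, N):
    """P(X >= k): over-representation of a term among the n selected genes."""
    total = math.comb(N, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


def hypergeom_lower_p(k, n, K, N):
    """P(X <= k): under-representation of a term among the n selected genes."""
    total = math.comb(N, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(max(0, n - (N - K)), k + 1)) / total


# 20 genes in the universe, 5 annotated; 4 genes selected, 3 of them annotated.
p = hypergeom_upper_p(3, 4, 5, 20)
print(round(p, 4))   # → 0.032
```

With thousands of GO terms, these per-term p-values would still need a multiple-testing adjustment.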

8.
9.
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data affected only by artifacts and (ii) estimating the artifacts with principal components or singular vectors of that subset of the data matrix. The resulting artifact estimates can be used in subsequent analyses as adjustment factors. Here I describe a version of the sva approach created specifically for count data or FPKMs from sequencing experiments, based on an appropriate data transformation. I also describe the addition of supervised sva (ssva), which uses control probes to identify the part of the genomic data affected only by artifacts. I present a comparison of these versions of sva with other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package, and I have made fully reproducible analyses using these methods available from: https://github.com/jtleek/svaseq.
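For intuition only, here is a much-simplified stand-in for the idea of estimating artifacts from singular vectors. It log2(x + 1)-transforms the counts (an assumed transformation), regresses out the known group variable, and takes the leading right singular vectors of the residuals as surrogate variables. Unlike sva, this naive version does not first identify the genes affected only by artifacts; the names and simulated data are hypothetical.

```python
import numpy as np


def naive_surrogate_variables(counts, group, n_sv=1):
    """Leading right singular vectors of log-count residuals after removing group effects."""
    Y = np.log2(counts + 1.0)                        # genes x samples
    X = np.column_stack([np.ones(len(group)), group])
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # fit every gene at once
    resid = Y - (X @ beta).T                         # remove intercept + group effect
    _, _, Vt = np.linalg.svd(resid, full_matrices=False)
    return Vt[:n_sv].T                               # samples x n_sv


rng = np.random.default_rng(3)
n_genes = 200
group = np.array([0] * 6 + [1] * 6, dtype=float)     # biological variable of interest
batch = np.array([0, 1] * 6, dtype=float)            # hidden artifact
log_mean = np.full((n_genes, 12), 5.0)
log_mean[:50] += 2.0 * group                         # genes 0-49 respond to group
log_mean[100:] += 1.5 * batch                        # genes 100-199 hit by the batch
counts = rng.poisson(2.0 ** log_mean)
sv = naive_surrogate_variables(counts, group)
print(abs(np.corrcoef(sv[:, 0], batch)[0, 1]))
```

The estimated surrogate variable tracks the hidden batch and would then be added as a covariate in the downstream model.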

10.
11.
MOTIVATION: This paper introduces the application of a novel clustering method to microarray expression data. Its first stage involves dimensional compression, which can be achieved by applying SVD to the gene-sample matrix in microarray problems. The data (samples or genes) can thus be represented by vectors in a truncated space of low dimensionality, 4 and 5 in the examples studied here. We find it preferable to project all vectors onto the unit sphere before applying a clustering algorithm. The clustering algorithm used here is the quantum clustering method, which has one free scale parameter. Although the method is not hierarchical, it can be modified to allow hierarchy in terms of this scale parameter. RESULTS: We apply our method to three data sets. The results are very promising. On cancer cell data we obtain a dendrogram that reflects correct groupings of cells. In an AML/ALL data set we obtain very good clustering of samples into four classes. Finally, in clustering genes in yeast cell cycle data, we obtain four groups in a problem that is estimated to contain five families. AVAILABILITY: Software is available as Matlab programs at http://neuron.tau.ac.il/~horn/QC.htm.
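The unit-sphere projection and the quantum clustering potential can be sketched as below. The sketch assumes the SVD truncation has already been done and uses the usual form of the potential, V(x) = E - d/2 + Σ_i ||x - x_i||² exp(-||x - x_i||²/2σ²) / (2σ²ψ(x)), with ψ(x) = Σ_i exp(-||x - x_i||²/2σ²) and the constant E omitted. Cluster centres correspond to minima of V; the gradient-descent step that assigns points to minima is not shown, and the function names are illustrative.

```python
import math


def to_unit_sphere(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def qc_potential(x, data, sigma):
    """Quantum clustering potential at point x, up to the additive constant E."""
    d = len(x)
    psi = weighted = 0.0
    for xi in data:
        r2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        g = math.exp(-r2 / (2 * sigma ** 2))   # Gaussian (Parzen) contribution to psi
        psi += g
        weighted += r2 * g
    return -d / 2 + weighted / (2 * sigma ** 2 * psi)


# Two toy groups of 2-D vectors, projected onto the unit circle.
raw = [(1, 0.1), (1, -0.1), (1, 0), (0.1, 1), (-0.1, 1), (0, 1)]
data = [to_unit_sphere(p) for p in raw]
v_center = qc_potential([1.0, 0.0], data, sigma=0.2)
v_mid = qc_potential(to_unit_sphere([1, 1]), data, sigma=0.2)
print(v_center < v_mid)   # → True
```

σ is the single free scale parameter; in high dimensions or with very small σ, `psi` can underflow to zero far from all data points, so a practical version would work with log-sums.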

12.
Huang YH, Lee MH, Chen WJ, Hsiao CK. PLoS ONE, 2011, 6(7): e21890
Haplotype association studies based on family genotype data can provide more biological information than single-marker association studies. Difficulties arise, however, in inferring haplotype phase and haplotype transmission/non-transmission status. Incorporating the uncertainty associated with haplotype inference into regression models requires special care. This task becomes even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationships among haplotypes and retain for regression analysis only the ancestral core haplotypes it identifies. To integrate the three sources of variation (phase ambiguity, transmission status and ancestral uncertainty), we propose an uncertainty-coding matrix that combines these three types of variability simultaneously. We then evaluate haplotype risk using this matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented, and the results are compared with those from other family-based analysis tools such as FBAT. Our proposed method (Bayesian regression using an uncertainty-coding matrix, BRUCM) is shown to perform better, and its implementation in R is freely available.

13.
Satu Ramula, Kari Lehtilä. Oikos, 2005, 111(3): 563–573
Large data requirements may restrict the use of matrix population models for the analysis of population dynamics. Fewer data are required for a small population matrix than for a large one because the smaller matrix contains fewer vital rates that need to be estimated. Smaller matrices, however, tend to have lower precision. Based on 37 plant species, we studied the effects of matrix dimensionality on the long-term population growth rate (λ) and the elasticity of λ in herbaceous and woody species. We found that when matrix dimensionality was reduced, changes in λ were significantly larger for herbaceous than for woody species. In many cases, λ of woody species remained virtually the same after a substantial decrease in matrix dimensionality, suggesting that woody species are less sensitive to changes in matrix dimensionality. We demonstrated that when adjacent stages of a transition matrix are combined, the magnitude of the change in λ depends on the distance of the population structure from the stable stage distribution and on the difference in the combined vital rates weighted by their reproductive values. With reduced matrix dimensionality, the elasticity of λ to survival and fecundity usually increased, whereas the elasticity to growth decreased, in both herbaceous and woody species. Changes in elasticity values tended to be larger for herbaceous than for woody species. Our results show that by reducing matrix dimensionality, the amount of demographic data required can be decreased to save time, money, and field effort. We recommend the use of a small matrix dimensionality, especially when only a limited amount of data is available, and for slow-growing species with a simple matrix structure that consists mainly of stasis and growth to the next stage.
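The two quantities studied here can be computed from any stage-structured matrix A. λ is its dominant eigenvalue, and the elasticity of λ to entry a_ij is e_ij = a_ij v_i w_j / (λ ⟨v, w⟩), where w and v are the right (stable stage distribution) and left (reproductive values) dominant eigenvectors. The sketch below uses power iteration on a hypothetical three-stage plant matrix; the function names and matrix values are illustrative.

```python
def power_iteration(A, n_iter=1000, tol=1e-12):
    """Dominant eigenvalue and eigenvector (scaled to sum to 1) of a nonnegative, primitive matrix."""
    n = len(A)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(n_iter):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam_new = sum(Aw)               # equals lambda once w sums to 1 and A w = lambda w
        w = [x / lam_new for x in Aw]
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, w


def elasticities(A):
    """Return lambda and the matrix of elasticities e_ij of lambda to each a_ij."""
    n = len(A)
    lam, w = power_iteration(A)                                   # right eigenvector
    At = [[A[j][i] for j in range(n)] for i in range(n)]
    _, v = power_iteration(At)                                    # left eigenvector
    vw = sum(v[i] * w[i] for i in range(n))
    E = [[A[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]
    return lam, E


# Hypothetical 3-stage plant matrix: seedling, juvenile, adult.
A = [[0.0, 0.0, 5.0],
     [0.3, 0.5, 0.0],
     [0.0, 0.2, 0.9]]
lam, E = elasticities(A)
print(round(lam, 3), round(sum(map(sum, E)), 6))
```

Because the elasticities always sum to 1, they can be grouped into survival (stasis), growth and fecundity totals and compared across matrices of different dimensionality.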

14.
beadarray: R classes and methods for Illumina bead-based data
The R/Bioconductor package beadarray allows raw data from Illumina experiments to be read and stored in convenient R classes. Users are free to choose between various methods of image processing, background correction and normalization in their analysis, rather than using the defaults in Illumina's proprietary software. The package also allows quality assessment to be carried out on the raw data. The data can then be summarized and stored in a format that can be used by other R/Bioconductor packages to perform downstream analyses. Summarized data processed by Illumina's BeadStudio software can also be read and analysed in the same manner. Availability: The beadarray package is available from the Bioconductor web page at www.bioconductor.org. A user's guide and example data sets are provided with the package.

15.
Heat stress (HS) causes serious physiological dysfunction associated with cardiovascular diseases. Curcumin (CUR) may increase animal survival and lifespan under HS. However, its effects and mechanism of action in mammals are underexplored. The goal of this study was to examine the protective effect of CUR on the cardiac health of mice exposed to HS. Mice were divided into six groups (n=8 per group): no heat treatment (NHT), heat treatment (HT), aspirin, CUR 50 mg/kg/day, CUR 100 mg/kg/day and CUR 200 mg/kg/day. After 4 weeks of administration, all groups except NHT were exposed once to HS at 41°C for 20 min. After HS treatment, physiological indexes (blood pressure, rectal temperature and heart rate) were measured. Serum biochemical indexes, the level of cardiac troponin I (cTn-I) in serum and the level of angiotensin II (Ang II) in cardiomyocytes were analyzed. Furthermore, the mRNA and protein levels of angiotensin receptor 1 (AT1), 78-kDa glucose-regulated protein (GRP78), C/EBP homologous protein (CHOP) and B-cell lymphoma 2 (Bcl-2) were measured. Our results indicated that CUR supplementation could alleviate HS-induced physiological disorders and the HS-induced increases in cTn-I and Ang II. Expression of the AT1 gene was significantly higher in the HT group than in the CUR groups, indicating a cardioprotective effect of CUR. Moreover, GRP78 and CHOP protein levels were significantly higher in the HT group than in the CUR groups, indicating that CUR supplementation reversed HS-induced apoptosis mediated by endoplasmic reticulum stress. In summary, CUR supplementation alleviates the physiological stress and cardiac damage caused by HS.

16.
We present EMAN (Electron Micrograph ANalysis), a software package for performing semiautomated single-particle reconstructions from transmission electron micrographs. The goal of this project is to provide software capable of performing single-particle reconstructions beyond 10 Å resolution as such high-resolution data become available. A complete single-particle reconstruction algorithm is implemented. Options are available to generate an initial model for particles with no symmetry, a single axis of rotational symmetry, or icosahedral symmetry. Model refinement is an iterative process that uses classification by model-based projection matching. CTF (contrast transfer function) parameters are determined using a new paradigm in which data from multiple micrographs are fit simultaneously. Amplitude and phase CTF correction is then performed automatically as part of the refinement loop. A graphical user interface is provided, so even those with little image processing experience will be able to begin performing reconstructions. Advanced users can directly use the lower-level shell commands and even extend the package using EMAN's extensive image-processing library. The package was written from scratch in C++ and is provided free of charge on our Web site. We present an overview of the package as well as several conformance tests with simulated data.

17.
In this study we present msap, an R package that analyses methylation-sensitive amplified polymorphism (MSAP or MS-AFLP) data. The program provides an in-depth analysis of epigenetic variation, starting from a binary data matrix that records the banding patterns obtained with the isoschizomeric endonucleases HpaII and MspI, which differ in their sensitivity to cytosine methylation. After comparing the restriction fragments, the program determines whether each fragment is susceptible to methylation (representative of epigenetic variation) or shows no evidence of methylation (representative of genetic variation). Through a user-friendly command-line interface, the package provides a pipeline of different analyses of the (genetic and epigenetic) variation among user-defined groups of samples, as well as a classification of the methylation occurrences in those groups. Statistical testing supports the analyses. A comprehensive report of the analyses and several useful plots can help researchers assess the epigenetic and genetic variation in their MSAP experiments. msap is downloadable from CRAN (http://cran.r-project.org/) and from its own webpage (http://msap.r-forge.R-project.org/). The package is intended to be easy to use even for people unfamiliar with the R command-line environment. Advanced users may take advantage of the available source code to adapt msap to more complex analyses.
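A sketch of the fragment-level logic, assuming the conventional reading of HpaII/MspI band pairs at a CCGG site and a hypothetical rule for calling a fragment methylation-susceptible: a HpaII/MspI mismatch in more than a chosen fraction of samples. The labels ("MSL", "NML"), the threshold and the function names are illustrative and may differ from msap's own criteria.

```python
# Conventional reading of a (HpaII, MspI) band pair at a CCGG site (1 = band present).
STATES = {
    (1, 1): "unmethylated",
    (1, 0): "hemimethylated external C",
    (0, 1): "internal C methylated",
    (0, 0): "uninformative (full methylation or target absent)",
}


def score_samples(hpa, msp):
    """hpa, msp: samples x fragments matrices of 0/1 band presence."""
    return [[STATES[(h, m)] for h, m in zip(h_row, m_row)]
            for h_row, m_row in zip(hpa, msp)]


def classify_fragments(hpa, msp, min_mismatch_fraction=0.05):
    """Hypothetical rule: 'MSL' (methylation-susceptible) if HpaII and MspI disagree in
    more than min_mismatch_fraction of samples, otherwise 'NML' (no evidence of methylation)."""
    n_samples = len(hpa)
    labels = []
    for f in range(len(hpa[0])):
        mismatches = sum(hpa[s][f] != msp[s][f] for s in range(n_samples))
        labels.append("MSL" if mismatches / n_samples > min_mismatch_fraction else "NML")
    return labels


hpa = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]]   # 4 samples x 3 fragments
msp = [[1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 1, 0]]
print(classify_fragments(hpa, msp))   # → ['NML', 'MSL', 'NML']
```

Fragment 2 varies between samples but always with matching HpaII/MspI bands, so under this rule it is treated as genetic (presence/absence) rather than epigenetic variation.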

18.
SUMMARY: We present Serial SimCoal, a program that models population genetic data from multiple time points, as with ancient DNA data. An extension of SIMCOAL, it also allows simultaneous modeling of complex demographic histories and migration between multiple populations. Further, we incorporate a statistical package to calculate relevant summary statistics, which, for the first time, allows users to investigate the statistical power provided by ancient DNA data, conduct hypothesis testing with them, and explore their sample size limitations. AVAILABILITY: Source code and Windows/Mac executables at http://www.stanford.edu/group/hadlylab/ssc.html CONTACT: senka@stanford.edu.

19.
Uncovering community structure is important for understanding networks. Several nonnegative matrix factorization (NMF) algorithms have been proposed for discovering community structure in complex networks. However, these algorithms have drawbacks, such as unstable results and inefficient running times. To address these problems, a novel approach that uses an initialized Bayesian NMF model to determine community membership is proposed. First, based on singular value decomposition, simple initial matrix factors are obtained from approximate decompositions of the complex network's adjacency matrix. Then, within a few iterations, the final matrix factors are obtained by the Bayesian NMF method starting from these initial factors. The network's community structure can then be determined by assigning each node to a community according to the final matrix factor. Experimental results show that the proposed method is highly accurate and offers performance competitive with that of state-of-the-art methods, even though it is not designed for modularity maximization.
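A rough sketch of the SVD-initialize-then-factorize pipeline, with two stated substitutions. First, the initial factors are simply the absolute values of the leading singular vectors, scaled by the square roots of the singular values (cruder than schemes such as NNDSVD). Second, plain Lee-Seung multiplicative updates for the Frobenius loss stand in for the paper's Bayesian NMF. Each node is assigned to the column of W with the largest entry; the function names are hypothetical.

```python
import numpy as np


def svd_nonneg_init(A, k):
    """Crude nonnegative starting factors from the leading k singular triplets."""
    U, S, Vt = np.linalg.svd(A)
    root = np.sqrt(S[:k])
    W = np.abs(U[:, :k]) * root             # nodes x k
    H = np.abs(Vt[:k]) * root[:, None]      # k x nodes
    return W, H


def nmf_communities(A, k, n_iter=200, eps=1e-12):
    """Assign each node to a community via NMF started from the SVD-based factors."""
    W, H = svd_nonneg_init(A, k)
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for ||A - W H||_F^2; factors stay nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W.argmax(axis=1)


# Toy network: a 4-clique (nodes 0-3) and a 3-clique (nodes 4-6), no self-loops.
A = np.zeros((7, 7))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
labels = nmf_communities(A, 2)
print(labels)
```

On this disconnected toy graph the two cliques come out as two communities. Taking absolute values is only safe when the leading singular vectors do not change sign, so on connected graphs a sign-aware initialization is the safer choice.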

20.
Curcumin (CUR) has various pharmacological effects, but its extensive first-pass metabolism and short elimination half-life limit its bioavailability. Transdermal application has therefore become a potential alternative route for delivering CUR. To increase CUR solubility, enabling the development of a transparent, homogeneous gel, and to enhance the permeation rate of CUR into the skin, a β-cyclodextrin–curcumin nanoparticle complex (BCD–CUR-N) was developed. CUR encapsulation efficiency increased as the percentage of CUR relative to BCD was raised, up to 20%. The mean particle size of the formula with the best CUR loading was 156 nm. Evaluation data from infrared spectroscopy, Raman spectroscopy, powder X-ray diffractometry, differential thermal analysis and scanning electron microscopy all confirmed the successful formation of the inclusion complex. BCD–CUR-N increased the CUR dissolution rate 10-fold (p < 0.01). In addition, the gel containing BCD–CUR-N improved CUR permeability across skin model tissue by about 1.8-fold compared with the free CUR gel (p < 0.01). Overall, formulating CUR as BCD–CUR-N improved its solubility and, in turn, its penetration into the skin.
KEY WORDS: β-cyclodextrin, curcumin, diffusion kinetic, hydrophilic gel, nanoparticle, skin permeation


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417