期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Mining gene expression data by interpreting principal components

Joseph C Roden Brandon W King Diane Trout Ali Mortazavi Barbara J Wold Christopher E Hart 《BMC bioinformatics》2006,7(1):194-22

Background

There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis. 相似文献

2.

Clustering gene expression data with kernel principal components

Liu Z Chen D Bensmail H Xu Y 《Journal of bioinformatics and computational biology》2005,3(2):303-316

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithms is in general superior to traditional algorithms. 相似文献

3.

基于逐步提取偏最小二乘主成分的特征选择方法

李建更耿涛阮晓钢《生物学杂志》2010,27(4):85-87

特征选择技术被广泛应用于生物信息学中。通过重复利用偏最小二乘（partial least square,PLS）方法提取主成分,通过逐次选择在主成分中权重较大的基因,将PLS应用于特征选择中。将这种方法用于对肿瘤基因表达谱数据的特征基因选择中,并用提取的特征基因分类,用8个特征基因进行分类时,能达到92.5%的正确率。相似文献

4.

Improve survival prediction using principal components of gene expression data

Shen YJ Huang SG 《基因组蛋白质组与生物信息学报(英文版)》2006,4(2):110-119

The purpose of many microarray studies is to find the association between gene expression and sample characteristics such as treatment type or sample phenotype. There has been a surge of efforts developing different methods for delineating the association. Aside from the high dimensionality of microarray data, one well recognized challenge is the fact that genes could be complicatedly inter-related, thus making many statistical methods inappropriate to use directly on the expression data. Multivariate methods such as principal component analysis （PCA） and clustering are often used as a part of the effort to capture the gene correlation, and the derived components or clusters are used to describe the association between gene expression and sample phenotype. We propose a method for patient population dichotomization using maximally selected test statistics in combination with the PCA method, which shows favorable results. The proposed method is compared with a currently well-recognized method. 相似文献

5.

Stochastic search variable selection based on two mixture components and continuous‐scale weighting

Marko J. Rinta‐aho Mikko J. Sillanp 《Biometrical journal. Biometrische Zeitschrift》2019,61(3):729-746

Stochastic search variable selection (SSVS) is a Bayesian variable selection method that employs covariate‐specific discrete indicator variables to select which covariates (e.g., molecular markers) are included in or excluded from the model. We present a new variant of SSVS where, instead of discrete indicator variables, we use continuous‐scale weighting variables (which take also values between zero and one) to select covariates into the model. The improved model performance is shown and compared to standard SSVS using simulated and real quantitative trait locus mapping datasets. The decision making to decide phenotype‐genotype associations in our SSVS variant is based on median of posterior distribution or using Bayes factors. We also show here that by using continuous‐scale weighting variables it is possible to improve mixing properties of Markov chain Monte Carlo sampling substantially compared to standard SSVS. Also, the separation of association signals and nonsignals (control of noise level) seems to be more efficient compared to the standard SSVS. Thus, the novel method provides efficient new framework for SSVS analysis that additionally provides whole posterior distribution for pseudo‐indicators which means more information and may help in decision making. 相似文献

6.

Control of population stratification by correlation-selected principal components

Lee S Wright FA Zou F 《Biometrics》2011,67(3):967-974

In genome-wide association studies, population stratification is recognized as producing inflated type I error due to the inflation of test statistics. Principal component-based methods applied to genotypes provide information about population structure, and have been widely used to control for stratification. Here we explore the precise relationship between genotype principal components and inflation of association test statistics, thereby drawing a connection between principal component-based stratification control and the alternative approach of genomic control. Our results provide an inherent justification for the use of principal components, but call into question the popular practice of selecting principal components based on significance of eigenvalues alone. We propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the (disease) phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Analyses of simulated and real data demonstrate the usefulness of the proposed approach. 相似文献

7.

Finding haplotype tagging SNPs by use of principal components analysis

下载免费PDF全文

Lin Z Altman RB 《American journal of human genetics》2004,75(5):850-861

The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision. 相似文献

8.

Robust biclustering by sparse singular value decomposition incorporating stability selection

Sill M Kaiser S Benner A Kopp-Schneider A 《Bioinformatics (Oxford, England)》2011,27(15):2089-2097

相似文献

9.

The influence of iliotibial band syndrome history on running biomechanics examined via principal components analysis

Eric Foch Clare E. Milner 《Journal of biomechanics》2014

Iliotibial band syndrome (ITBS) is a common knee overuse injury among female runners. Atypical discrete trunk and lower extremity biomechanics during running may be associated with the etiology of ITBS. Examining discrete data points limits the interpretation of a waveform to a single value. Characterizing entire kinematic and kinetic waveforms may provide additional insight into biomechanical factors associated with ITBS. Therefore, the purpose of this cross-sectional investigation was to determine whether female runners with previous ITBS exhibited differences in kinematics and kinetics compared to controls using a principal components analysis (PCA) approach. Forty participants comprised two groups: previous ITBS and controls. Principal component scores were retained for the first three principal components and were analyzed using independent t-tests. The retained principal components accounted for 93–99% of the total variance within each waveform. Runners with previous ITBS exhibited low principal component one scores for frontal plane hip angle. Principal component one accounted for the overall magnitude in hip adduction which indicated that runners with previous ITBS assumed less hip adduction throughout stance. No differences in the remaining retained principal component scores for the waveforms were detected among groups. A smaller hip adduction angle throughout the stance phase of running may be a compensatory strategy to limit iliotibial band strain. This running strategy may have persisted after ITBS symptoms subsided. 相似文献

10.

Mining gene expression profiles: an integrated implementation of kernel principal component analysis and singular value decomposition

Reverter F Vegas E Sánchez P 《基因组蛋白质组与生物信息学报(英文版)》2010,8(3):200-210

The detection of genes that show similar profiles under different experimental conditions is often an initial step in inferring the biological significance of such genes. Visualization tools are used to identify genes with similar profiles in microarray studies. Given the large number of genes recorded in microarray experiments, gene expression data are generally displayed on a low dimensional plot, based on linear methods. However, microarray data show nonlinearity, due to high-order terms of interaction between genes, so alternative approaches, such as kernel methods, may be more appropriate. We introduce a technique that combines kernel principal component analysis (KPCA) and Biplot to visualize gene expression profiles. Our approach relies on the singular value decomposition of the input matrix and incorporates an additional step that involves KPCA. The main properties of our method are the extraction of nonlinear features and the preservation of the input variables (genes) in the output display. We apply this algorithm to colon tumor, leukemia and lymphoma datasets. Our approach reveals the underlying structure of the gene expression profiles and provides a more intuitive understanding of the gene and sample association. 相似文献

11.

A singular bacteriophytochrome acquired by lateral gene transfer

Jaubert M Lavergne J Fardoux J Hannibal L Vuillet L Adriano JM Bouyer P Pignol D Giraud E Verméglio A 《The Journal of biological chemistry》2007,282(10):7320-7328

Bacteriophytochromes are phytochrome-like proteins that mediate photosensory responses in various bacteria according to their light environment. The genome of the photosynthetic and plant-symbiotic Bradyrhizobium sp. strain ORS278 revealed the presence of a genomic island acquired by lateral transfer harboring a bacteriophytochrome gene, BrBphP3.ORS278, and genes involved in the synthesis of phycocyanobilin and gas vesicles. The corresponding protein BrBphP3.ORS278 is phylogenetically distant from the other (bacterio)phytochromes described thus far and displays a series of unusual properties. It binds phycocyanobilin as a chromophore, a unique feature for a bacteriophytochrome. Moreover, its C-terminal region is short and displays no homology with any known functional domain. Its dark-adapted state absorbs maximally around 610 nm, an unusually short wavelength for (bacterio)phytochromes. This form is designated as Po for orange-absorbing form. Upon illumination, a photo-reversible switch occurs between the Po form and a red (670 nm)-absorbing form (Pr), which rapidly backreacts in the dark. Because of this instability, illumination results in a mixture of the Po and Pr states in proportions that depend on the intensity. These uncommon features suggest that BrBphP3.ORS278 could be fitted to measure light intensity rather than color. 相似文献

12.

Unsupervised feature selection via two-way ordering in gene expression analysis

Ding CH 《Bioinformatics (Oxford, England)》2003,19(10):1259-1266

MOTIVATION: Selection of genes most relevant and informative for certain phenotypes is an important aspect in gene expression analysis. Most current methods select genes based on known phenotype information. However, certain set of genes may correspond to new phenotypes which are yet unknown, and it is important to develop novel effective selection methods for their discovery without using any prior phenotype information. RESULTS: We propose and study a new method to select relevant genes based on their similarity information only. The method relies on a mechanism for discarding irrelevant genes. A two-way ordering of gene expression data can force irrelevant genes towards the middle in the ordering and thus can be discarded. Mechanisms based on variance and principal component analysis are also studied. When applied to expression profiles of colon cancer and leukemia, the unsupervised method outperforms the baseline algorithm that simply uses all genes, and it also selects relevant genes close to those selected using supervised methods. SUPPLEMENT: More results and software are online: http://www.nersc.gov/~cding/2way. 相似文献

13.

pcaGoPromoter--an R package for biological and regulatory interpretation of principal components in genome-wide gene expression data

Hansen M Gerds TA Nielsen OH Seidelin JB Troelsen JT Olsen J 《PloS one》2012,7(2):e32394

相似文献

14.

Simultaneous gene clustering and subset selection for sample classification via MDL

Jörnsten R Yu B 《Bioinformatics (Oxford, England)》2003,19(9):1100-1109

相似文献

15.

V E Deriabin 《Nauchnye doklady vysshe? shkoly. Biologicheskie nauki》1987,(1):50-55

The method for analysis of the ontogenetic changes of size and shape based on the decomposition of Kullback divergence into parts corresponding to the principal components, found for intragroup variability of signs is suggested. The application of this method is illustrated by the analysis of age changes of body proportions for boys 3-17 years old. Before the pubertal period the stable tendency to a relative lengthening of extremities which changes to a contrast one at age of 12-14 has been pointed out. The definitive differences of proportions to a considerable extent is determined by variability of biological age at the group of children. 相似文献

16.

Yeast metabolic innovations emerged via expanded metabolic network and gene positive selection

Hongzhong Lu Feiran Li Le Yuan Ivn Domenzain Rosemary Yu Hao Wang Gang Li Yu Chen Boyang Ji Eduard J Kerkhoven Jens Nielsen 《Molecular systems biology》2021,17(10)

Yeasts are known to have versatile metabolic traits, while how these metabolic traits have evolved has not been elucidated systematically. We performed integrative evolution analysis to investigate how genomic evolution determines trait generation by reconstructing genome‐scale metabolic models (GEMs) for 332 yeasts. These GEMs could comprehensively characterize trait diversity and predict enzyme functionality, thereby signifying that sequence‐level evolution has shaped reaction networks towards new metabolic functions. Strikingly, using GEMs, we can mechanistically map different evolutionary events, e.g. horizontal gene transfer and gene duplication, onto relevant subpathways to explain metabolic plasticity. This demonstrates that gene family expansion and enzyme promiscuity are prominent mechanisms for metabolic trait gains, while GEM simulations reveal that additional factors, such as gene loss from distant pathways, contribute to trait losses. Furthermore, our analysis could pinpoint to specific genes and pathways that have been under positive selection and relevant for the formulation of complex metabolic traits, i.e. thermotolerance and the Crabtree effect. Our findings illustrate how multidimensional evolution in both metabolic network structure and individual enzymes drives phenotypic variations. 相似文献

17.

Metabolomic profiling of Cheonggukjang during fermentation by 1H NMR spectrometry and principal components analysis

《Process Biochemistry》2007,42(2):263-266

Metabolomic analysis of extracts of Cheonggukjang was carried out using ¹H nuclear magnetic resonance (NMR) spectrometry and principal components analysis (PCA). The major peaks in the ¹H NMR spectra of the 50% methanol fraction were assigned to isoleucine/leucine, lactate, alanine, acetic acid, citric acid, choline, fructose, sucrose, tyrosine, phenylalanine and formic acid. The first two principle components (PC1 and PC2) of the ¹H NMR spectra of the aqueous fraction allowed discrimination of Cheonggukjang extracts of samples obtained after different periods of fermentation. These two principal components cumulatively accounted for 98.5% of the total variation of all variables. The major peaks within the ¹H NMR spectra that contributed to discrimination of different samples were assigned to isoleucine/leucine, lactate, acetic acid, citric acid, choline, fructose, glucose and sucrose. This metabolomic analysis of samples of Cheonggukjang extract demonstrates that NMR and PCA can be used to obtain standard trajectory plots and related information for Cheonggukjang and other fermented foods. 相似文献

18.

Determination of compound aminopyrine phenacetin tablets by using artificial neural networks combined with principal components analysis

Dou Y Mi H Zhao L Ren Y Ren Y 《Analytical biochemistry》2006,351(2):174-180

A method for simultaneous, nondestructive analysis of aminopyrine and phenacetin in compound aminopyrine phenacetin tablets with different concentrations has been developed by principal component artificial neural networks (PC-ANNs) on near-infrared (NIR) spectroscopy. In PC-ANN models, the spectral data were initially analyzed by principal component analysis. Then the scores of the principal components were chosen as input nodes for the input layer instead of the spectral data. The artificial neural network models using the spectral data as input nodes were also established and compared with the PC-ANN models. Four different preprocessing methods (first-derivative, second-derivative, standard normal variate (SNV), and multiplicative scatter correction) were applied to three sets of NIR spectra of compound aminopyrine phenacetin tablets. The PC-ANNs approach with SNV preprocessing spectra was found to provide the best results. The degree of approximation was performed as the selective criterion of the optimum network parameters. 相似文献

19.

Adipose tissue distribution: the stability of principal components by sex, ethnicity and maturation stage 总被引：1，自引：0，他引：1

R N Baumgartner A F Roche S Guo T Lohman R A Boileau M H Slaughter 《Human biology; an international record of research》1986,58(5):719-735

相似文献

20.

Regulation of rat mammary gene expression by extracellular matrix components 总被引：20，自引：0，他引：20

Joanne L. Blum Mary E. Zeigler Max S. Wicha 《Experimental cell research》1987,173(2):322-340

In the mammary gland the induction and maintenance of differentiation are dependent on both lactogenic hormones and the extracellular matrix (ECM). Since mammary epithelial cells differentiate on a basement membrane in vivo we have examined the effects of basement membrane components on the expression of milk protein genes in primary rat mammary cultures. We examined the effects of a basement membrane gel derived from the Englebreth-Holm-Swarm tumor as well as its major component, laminin, on the expression of a group of milk protein genes. We demonstrate that the basement membrane gel induces alpha-casein and alpha-lactalbumin (alpha-LA) accumulation up to 160- and 70-fold, respectively, of that on tissue culture plastic. Laminin, a major component of the basement membrane, also caused significant induction of these same proteins. In order to determine whether these ECM effects occurred at a translational or post-translational level, pulse-chase experiments were performed. These experiments demonstrated that a laminin substratum selectively effects milk protein turnover and secretion. In order to demonstrate whether ECM effects occurred at the level of steady state accumulation of mRNA we performed dot blot and Northern analyses using cloned cDNA probes for alpha-, beta-, and gamma-caseins and alpha-LA. These studies demonstrated that ECM components induced alpha- and beta-caseins up to 10-fold, and alpha-LA up to 3-fold, with no significant effect on gamma-casein. These results demonstrate that milk protein genes are not coordinately regulated by ECM components. Furthermore, since the amount of induction of milk proteins exceeds the amount of induction of mRNAs for these proteins, we conclude that in our system a major effect of ECM components is at the translational and/or post-translational levels. Based on these findings we propose a model in which basement membrane components effect mammary gene expression at multiple levels. 相似文献