Similar literature
 20 similar articles found (search time: 515 ms)
1.
This paper describes the use of Principal Component Analysis (PCA) as a tool for modeling chromatographic separations. PCA is an analytical technique developed to extract key information from large data sets and to identify relationships and correlations. The proposed model uses PCA to correlate experimental chromatographic data across different process variables or scales. The resulting correlations are then used to simulate additional chromatographic runs not included in the initial dataset. The approach is demonstrated on the cation exchange separation of a four-component protein feed comprising ovalbumin, ovotransferrin, lysozyme, and myoglobin. A good fit between modeled and experimental data was found, and the ability of the method to model additional chromatographic separations not within the original dataset is demonstrated. The technique has the potential to accommodate changing system variables such as column dimensions as well as process variables including sample volume and salt gradient. It provides a potentially powerful tool for the rapid investigation of scale-up effects and for minimizing the material inventories needed for such studies.

2.
MOTIVATION: One important application of gene expression microarray data is classification of samples into categories, such as the type of tumor. The use of microarrays allows simultaneous monitoring of thousands of gene expressions per sample. This ability to measure gene expression en masse has resulted in data with the number of variables p (genes) far exceeding the number of samples N. Standard statistical methodologies in classification and prediction work poorly, or not at all, when N < p. Modification of existing statistical methodologies or development of new methodologies is needed for the analysis of microarray data. RESULTS: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions. This procedure involves dimension reduction using Partial Least Squares (PLS) and classification using Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS to the well-known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition under which PCA predicts particularly poorly relative to PLS. The proposed methods were applied to five different microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCLL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell-Lung-Carcinoma (NSCLC) versus renal samples. The stability of the classification results and methods was further assessed by re-randomization studies.
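
The contrast between PLS and PCA as dimension-reduction steps can be sketched on toy data. The abstract does not say which condition makes PCA fail, so the scenario below is an assumption: one high-variance variable unrelated to the class label and one low-variance variable that separates the classes. The function names and the data are hypothetical, and only the first component of each method is computed (the first PLS weight vector is proportional to X^T y after centering).

```python
import math

def column_means(X):
    """Mean of each column of an n-by-p list of rows."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def first_pls_direction(X, y):
    """Unit weight vector of the first PLS component: X^T y on centered data."""
    xm, ym = column_means(X), sum(y) / len(y)
    w = [sum((row[j] - xm[j]) * (yi - ym) for row, yi in zip(X, y))
         for j in range(len(X[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def first_pca_direction(X, iters=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, p = len(X), len(X[0])
    xm = column_means(X)
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(t * t for t in u))
        v = [t / norm for t in u]
    return v

# Toy data (made up): column 0 is large-variance noise unrelated to the class,
# column 1 is a small-variance variable that separates the two classes.
X = [[-10.0, -1.0], [10.0, -1.0], [-10.0, -1.0], [10.0, -1.0],
     [-10.0, 1.0], [10.0, 1.0], [-10.0, 1.0], [10.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

pls_w = first_pls_direction(X, y)   # weights the class-separating variable
pca_w = first_pca_direction(X)      # weights the high-variance variable
```

Because PCA ignores the response, its first component here follows the noisy variable, while the PLS direction follows the variable that covaries with the class label. A classifier such as LD or QDA would then be fitted on the resulting scores; that step is omitted from this sketch.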

3.
The Mr 15000 protein associated with water-washed wheat starch granules from soft wheats was shown to be heterogeneous: it could be divided into a fraction containing one or more α-amylase inhibitor subunits and a fraction largely composed of a previously uncharacterised polypeptide(s) referred to as the grain softness protein (GSP). The major N-terminal sequence and sequences of peptides derived from protease digests of GSP are reported. An antiserum specific for GSP was used to show that GSP accumulated in both hard and soft wheat grains, but the GSP in soft grains associated more strongly with starch granules than the GSP in hard grains. A positive correlation between grain softness and accumulation of GSP in the seed was demonstrated for a range of cultivars. This differs from the qualitative relationship, based on the isolated starch fraction, between GSP and grain softness that has already been reported. Analysis of wholemeal extracts with the antiserum demonstrated that the accumulation of GSP in the seed was dependent on the short arm of chromosome 5D, which also encodes the Ha locus. In addition, examination of near-isogenic lines differing in hardness indicated that the gene(s) controlling GSP was (were) linked with the Ha locus. The findings indicate that GSP may be the product of the Ha locus and thus be the major factor that determines the milling characteristics of bread wheats.

4.
In the first part of this work, we introduce new developments in Principal Component Analysis (PCA), and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). To illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.

5.
Background: Hepatic ischemia and reperfusion injury (IRI) is a major complication in liver surgery, and hepatic steatosis is a primary factor aggravating cellular injury during IRI. Both pro-inflammatory cytokines and reactive oxygen species (ROS) are key mediators of hepatic IRI. Ischemic preconditioning (IpreC), remote ischemic preconditioning (RIPC) and ischemic postconditioning (IpostC) offer protection against hepatic IRI, but each of these methods has its own shortcomings. Grape seed proanthocyanidins (GSP) have a broad spectrum of pharmacological properties against oxidative stress. Thus, GSP may have protective effects against hepatic IRI. Methods: A murine model of warm hepatic IRI was established in C57BL/6 mice subjected to 30 min of hepatic ischemia and sacrificed after 1 h of reperfusion. The mice were injected intraperitoneally with GSP at 10, 20, or 40 mg/kg/day for 3 weeks as pharmacological preconditioning. Obese mice were fed a high-fat diet for 24 weeks before use. Three pathways related to IRI were examined: ROS elimination, pro-inflammatory cytokine release, and hypoxia responses. Results: Our data show that GSP significantly reduced hepatic IRI by protecting hepatocyte function and increasing the activity of ROS scavengers, as well as by decreasing cytokine levels. GSP also enhanced the hypoxia tolerance response. Combining GSP with postconditioning provided synergistic protection. In obese mice subjected to hepatic IRI, GSP was more effective than postconditioning in protecting the liver against IRI, and the combined strategy was clearly superior to either treatment alone. Conclusion: GSP can protect the liver against IRI, particularly in mice with obesity induced by a high-fat diet. GSP used as pharmacological preconditioning, alone or combined with other protocols, has considerable potential for clinical use.

6.
Abstract. Numerous ecological studies use Principal Components Analysis (PCA) for exploratory analysis and data reduction. Determining the number of components to retain is the most crucial problem confronting the researcher when using PCA. An incorrect choice may lead to the underextraction of components, but commonly results in overextraction. Of several methods proposed to determine the significance of principal components, Parallel Analysis (PA) has proven consistently accurate in determining the threshold for significant components, variable loadings, and analytical statistics when decomposing a correlation matrix. In this procedure, eigenvalues from a data set prior to rotation are compared with those from a matrix of random values of the same dimensionality (p variables and n samples). PCA eigenvalues from the data greater than PA eigenvalues from the corresponding random data can be retained. All components with eigenvalues below this threshold value should be considered spurious. We illustrate Parallel Analysis on an environmental data set. We reviewed all articles utilizing PCA or Factor Analysis (FA) from 1987 to 1993 from Ecology, Ecological Monographs, Journal of Vegetation Science and Journal of Ecology. Analyses were first separated into those PCA which decomposed a correlation matrix and those PCA which decomposed a covariance matrix. Parallel Analysis (PA) was applied to each PCA/FA found in the literature. Of 39 analyses (in 22 articles), 29 (74.4 %) considered no threshold rule, presumably retaining interpretable components. According to the PA results, 26 (66.7 %) overextracted components. This overextraction may have resulted in potentially misleading interpretation of spurious components. It is suggested that the routine use of PA in multivariate ordination will increase confidence in the results and reduce the subjective interpretation of supposedly objective methods.
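
The Parallel Analysis procedure described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the reference eigenvalues are the mean over simulated Gaussian data sets (a percentile is another common choice), eigenvalues are computed with a plain cyclic Jacobi routine to keep the example free of third-party libraries, and the data set is synthetic with two built-in latent factors. All function names are hypothetical.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlation matrix of an n-by-p list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    norms = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows)) for j in range(p)]
    z = [[(r[j] - means[j]) / norms[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) for j in range(p)] for i in range(p)]

def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues (descending) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def parallel_analysis(rows, n_sims=20, seed=0):
    """Return observed eigenvalues, mean random eigenvalues, and components retained."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    reference = [0.0] * p
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, ev in enumerate(symmetric_eigenvalues(correlation_matrix(noise))):
            reference[k] += ev / n_sims
    retained = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:               # stop at the first component not above the random level
            break
        retained += 1
    return observed, reference, retained

# Synthetic data (made up): 200 samples, 6 variables driven by two latent factors.
rng = random.Random(1)
data = []
for _ in range(200):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append([f1 + 0.5 * rng.gauss(0, 1) for _ in range(3)] +
                [f2 + 0.5 * rng.gauss(0, 1) for _ in range(3)])

observed, reference, retained = parallel_analysis(data)
```

On this synthetic data set the two factor-driven components stand well above the random reference and the remaining four fall below it, so `retained` matches the number of latent factors built into the data.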

7.
The purpose of many microarray studies is to find the association between gene expression and sample characteristics such as treatment type or sample phenotype. There has been a surge of effort to develop methods for delineating this association. Aside from the high dimensionality of microarray data, one well-recognized challenge is that genes can be interrelated in complicated ways, which makes many statistical methods inappropriate to use directly on the expression data. Multivariate methods such as principal component analysis (PCA) and clustering are often used as part of the effort to capture the gene correlation, and the derived components or clusters are used to describe the association between gene expression and sample phenotype. We propose a method for patient population dichotomization using maximally selected test statistics in combination with the PCA method, which shows favorable results. The proposed method is compared with a currently well-recognized method.

8.
Principal Component Analysis (PCA) is a classical technique in statistical data analysis, feature extraction and data reduction, aiming at explaining observed signals as a linear combination of orthogonal principal components. Independent Component Analysis (ICA) is a technique of array processing and data analysis, aiming at recovering unobserved signals or 'sources' from observed mixtures, exploiting only the assumption of mutual independence between the signals. The separation of the sources by ICA has great potential in applications such as the separation of sound signals (for example, voices mixed in simultaneous multiple recordings), in telecommunication, or in the treatment of medical signals. However, ICA is not yet often used by statisticians. In this paper, we present ICA in a statistical framework and compare this method with PCA for electroencephalogram (EEG) analysis. We shall see that ICA provides a more useful data representation than PCA, for instance, for the representation of a particular characteristic of the EEG named event-related potential (ERP).

9.
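Genomic DNA of the diamondback moth (Plutella xylostella) was digested with four restriction endonucleases that generate blunt ends. An aminated genome-walking adaptor sequence was then ligated, using DNA ligase, to each of the four blunt-ended genomic DNA digests. Two pairs of upstream and downstream PCR primers were designed against the adaptor and the previously cloned CYP9G2 gene sequence. After PCR amplification, T-A cloning, and nested-PCR verification of positive clones, sequencing yielded about 1.8 kb of previously unknown sequence upstream of the P. xylostella CYP9G2 gene. Bioinformatic analysis of this upstream sequence identified five putative cis-regulatory elements: one possible arthropod transcription initiator (Inr), three CAAT-like boxes, and one antioxidant-like response element. The study also shows that genome walking can rapidly clone unknown sequence upstream of a known sequence, and that the procedure is economical and simple. For genes with a known cDNA sequence or partial genomic sequence, genome walking is of considerable practical value for cloning upstream regulatory sequences.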

10.
Our lack of knowledge about the biological mechanisms of 50 Hz magnetic fields makes it hard to improve exposure assessment. To provide better information about these exposure measures, we use multidimensional analysis techniques to examine the relations between different exposure metrics for a group of subjects. We used a two-stage Principal Component Analysis (PCA) followed by an ascending hierarchical classification (AHC) to identify a set of measures that would capture the characteristics of the total exposure. This analysis indicates which aspects of the exposure must be captured to get a complete picture of the magnetic field environment. We calculated 44 metrics of exposure measures from 16 exposed EDF employees and 15 control subjects, containing approximately 20,000 recordings of magnetic field measurements, taken every 30 s for 7 days with an EMDEX II dosimeter. These metrics included parameters used routinely or occasionally and some that were new. To eliminate those that expressed the least variability and that were most highly correlated with one another, we began with an initial PCA. In a second PCA of the remaining 12 metrics, the first two components accounted for 82.7% of the variance: the first component (62.0%) was characterized by central tendency metrics, and the second (20.7%) by dispersion characteristics. We were able to use AHC to divide the entire sample (of individuals) into four groups according to the axes that emerged from the PCA. Finally, discriminant analysis tested the discriminant power of the variables in the exposed/control classification as well as those from the AHC classification. The first showed that two subjects had been incorrectly classified, while no classification error was observed in the second. This exploratory study underscores the need to improve exposure measures by using at least two dimensions: intensity and dispersion. It also indicates the usefulness of constructing a typology of magnetic field exposures.

11.
Knowledge about grassland biomass and its dynamics is critical for studying regional carbon cycles and for the sustainable use of grassland resources. In this study, we investigated the spatio-temporal variation of biomass in the Xilingol grasslands of northern China. Field-based biomass samples and MODIS time series data sets were used to establish two empirical models based on the relationship of the normalized difference vegetation index (NDVI) with above-ground biomass (AGB) as well as that of AGB with below-ground biomass (BGB). We further explored the climatic controls of these variations. Our results showed that the biomass averaged 99.01 Tg (1 Tg = 10¹² g) over a total area of 19.6×10⁴ km² and fluctuated with no significant trend from 2001 to 2012. The mean biomass density was 505.4 g/m², with 62.6 g/m² in AGB and 442.8 g/m² in BGB, which generally decreased from northeast to southwest and exhibited a large spatial heterogeneity. The year-to-year AGB pattern was generally consistent with the inter-annual variation in the growing season precipitation (GSP), showing a robust positive correlation (R²=0.82, P<0.001), but an opposite coupled pattern was observed with the growing season temperature (GST) (R²=0.61, P=0.003). Climatic factors also affected the spatial distribution of AGB, which increased progressively with the GSP gradient (R²=0.76, P<0.0001) but decreased with an increasing GST (R²=0.70, P<0.0001). An improved moisture index that combined the effects of GST and GSP explained more variation in AGB than did precipitation alone (R²=0.81, P<0.0001). The relationship between AGB and GSP could be fit by a power function. This increasing slope of the GSP–AGB relationships along the GSP gradient may be partly explained by the GST–GSP spatial pattern in Xilingol. Our findings suggest that the relationships between climatic factors and AGB may be scale-dependent and that multi-scale studies and sufficient long-term field data are needed to examine the relationships between AGB and climatic factors.
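
The power-function relationship between AGB and GSP mentioned in this abstract can be illustrated with a small fitting sketch. The abstract does not give the fitted coefficients or the fitting method, so everything below is an assumption: the model AGB = a · GSP^b is fitted by ordinary least squares on log-transformed values, the function name is hypothetical, and the numbers are synthetic values generated from a known power law rather than data from the study.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y); return a, b, R^2 in log space."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                      # exponent of the power function
    log_a = my - b * mx                # intercept in log space
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot

# Synthetic illustration only: precipitation values and AGB generated from a known law.
gsp = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]
agb = [0.5 * g ** 1.2 for g in gsp]

a, b, r2 = fit_power_law(gsp, agb)
```

An exponent b greater than 1 means the slope of the AGB response grows along the precipitation gradient, which is the kind of pattern the abstract describes. Fitting in log space weights relative rather than absolute errors, so a nonlinear least-squares fit on the original scale could give somewhat different coefficients on real data.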

12.
Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful for tracking slow changes of correlations in the input data or for updating eigenvectors with new samples. In this paper we propose a PCA learning algorithm that is fully homogeneous with respect to neurons. The algorithm is obtained by modifying one of the best-known PSA learning algorithms, the Subspace Learning Algorithm (SLA). The modification is based on the Time-Oriented Hierarchical Method (TOHM). The method uses two distinct time scales. On the faster time scale, the PSA algorithm is responsible for the "behavior" of all output neurons. On the slower scale, output neurons compete for fulfillment of their "own interests". On this scale, basis vectors in the principal subspace are rotated toward the principal eigenvectors. The paper closes with a brief analysis of how (or why) the time-oriented hierarchical method can be used to transform any existing neural network PSA method into a PCA method.
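
The idea that a simple gradient-style learning rule lets a neuron extract a principal component can be illustrated with the classic single-unit Oja rule. This is a swapped-in textbook rule chosen for brevity: it is neither the Subspace Learning Algorithm nor the TOHM-modified algorithm proposed in the paper, and the learning rate, initial weights, and synthetic input distribution are assumptions made for the sketch.

```python
import math
import random

def oja_first_component(samples, eta=0.01, w0=(0.6, 0.8)):
    """Single-neuron Oja rule: w <- w + eta * y * (x - y * w), with output y = w . x."""
    w = list(w0)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))            # neuron output
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Synthetic zero-mean inputs (made up): variance 4 along axis 0, 0.25 along axis 1,
# so the principal eigenvector of the input covariance is (1, 0) up to sign.
rng = random.Random(0)
samples = [(2.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)) for _ in range(5000)]

w = oja_first_component(samples)
weight_norm = math.hypot(w[0], w[1])
```

Each sample is processed once, so the update costs O(p) per input, which is why rules of this kind suit slowly drifting input statistics. The weight vector drifts toward the dominant eigenvector and its norm stays close to 1 without explicit normalization, because the `- y * w` term acts as a decay.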

13.
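Ordination analysis of interspecific relationships among 19 bryophyte species based on SOD and EST isozymes   (Total citations: 7; self-citations: 0; citations by others: 7)
Isozyme zymograms of esterase (EST) and superoxide dismutase (SOD) were obtained by polyacrylamide gel electrophoresis for 19 bryophyte species from Beishan Mountain, Jinhua. After the zymograms of the 19 species were quantified, principal component analysis (PCA) was used to compare the differences in interspecific relationships among them. The results show that bryophytes have very high genetic diversity, and that applying PCA to isozyme zymogram data reflects the systematic relationships among bryophyte taxa fairly intuitively and effectively.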

14.
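Remote sensing has become an important tool for large-scale vegetation classification, and the relationship between the characteristics of ground plant communities and their spectral characteristics is key to interpreting remote sensing imagery. This study examined salt marsh plant communities in the Chongming Dongtan Nature Reserve, Shanghai. The spectral reflectance of the communities was measured with an ASD field spectrometer and, using the 10 default vegetation band sets of the Compact Airborne Spectrographic Imager (CASI), principal component analysis and correlation analysis were applied to examine the relationships between the spectral characteristics of different communities and ecological and environmental factors. The results showed that PCA, an indirect ordination method, could distinguish the spectral characteristics of mudflat, Scirpus mariqueter communities, Phragmites australis communities, and Spartina alterniflora communities. For the great majority of salt marsh wetland plant communities, composition was significantly correlated with spectral characteristics; the band sets with the best discrimination were 736-744 nm, 746-753 nm, 775-784 nm, 815-824 nm and 860-870 nm. The ecological factors with the greatest influence on spectral reflectance were the height and cover of the plant communities, followed by elevation and other environmental factors. The results can provide technical support for remote sensing monitoring of the spatial distribution and spread of the invasive species Spartina alterniflora in the Chongming Dongtan Nature Reserve, and a scientific basis for the interpretation and classification of hyperspectral remote sensing imagery and for salt marsh wetland vegetation mapping.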

15.
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA perform similarly well in controlling the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).

16.
A spore cortex-lytic enzyme of Clostridium perfringens S40 is synthesized during sporulation as a precursor consisting of four domains. After cleavage of an N-terminal preregion and a C-terminal proregion, inactive proenzyme (termed C35) is converted to active enzyme by processing of an N-terminal prosequence with germination-specific protease (GSP) during germination. The present results demonstrated that the cleaved N-terminal prepeptide remained associated with C35. After the isolated complex was denatured and dissociated in 6 M urea solution, removal of urea regenerated a prepeptide-C35 complex which produces active enzyme when incubated with GSP. However, isolated C35 alone could not be activated by GSP. The prepeptide-C35 complex was more heat stable than active enzyme. Thus, non-covalent attachment of the prepeptide to C35 is required to assist correct folding of C35 and to stabilize its conformation, suggesting that the prepeptide functions as an intramolecular chaperone. Recombinant proteins, which have prepeptide covalently bonded to C35, were processed by GSP as well as the in vivo prepeptide-C35 complex, and the full length of the N-terminal presequence was needed to fulfil its role. Although the C-terminal prosequence is present as an independent domain which is not involved in the activation process of the enzyme, it appears that the N-terminal prosequence contributes to the regulation of enzyme activity as an inhibitor of the enzyme.

17.
18.
The immunostimulatory effects of orally administered Panax ginseng root or its polysaccharides (GSP) in white shrimp, Litopenaeus vannamei, were investigated in this study. Shrimp were fed a diet containing 0.4 g kg⁻¹ GSP over a period of 84 days, during which the activities of total superoxide dismutase (T-SOD), catalase (CAT), glutathione peroxidase (GSH-Px), acid phosphatase (ACP), and alkaline phosphatase (AKP), as well as malondialdehyde (MDA) content, and expressions of cytosolic superoxide dismutase (cyt-SOD), CAT, GSH-Px, and peroxiredoxin (Prx) genes were determined in various tissues of the shrimp. Results showed that the shrimp fed the GSP diet had significantly increased ACP and AKP activities in the gills. The GSP-fed shrimp also displayed significantly increased T-SOD and GSH-Px activities in the gills and hepatopancreas; meanwhile, CAT activity was enhanced in the gills, and MDA content was decreased in the gills, hepatopancreas and muscle. The mRNA expressions of cyt-SOD, CAT, GSH-Px and Prx were significantly elevated in the gills and hepatopancreas of the shrimp fed the GSP diet for 84 days, compared with those of the control. Therefore, GSP can be used as an immunostimulant for shrimp through dietary administration to increase immune enzyme activity and modify expression of immune genes in shrimp.

19.
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.

20.