1.
This paper examines the selection of the appropriate representation of chromatogram data prior to using principal component analysis (PCA), a multivariate statistical technique, for the diagnosis of chromatogram data sets. The effects of four process variables (flow rate, temperature, loading concentration and loading volume) were investigated for a size exclusion chromatography system used to separate three components (monomer, dimer, trimer). The study showed that the major positional shifts in the elution peaks that result from running the separation at different flow rates mask the effects of the other variables if the PCA is performed using elapsed time as the comparative basis. Two alternative methods of representing the data in chromatograms are proposed. In the first, data were converted to a volumetric basis prior to performing the PCA; in the second, the data were additionally adjusted after this transformation to account for the total material loaded during each separation. Two datasets were analysed to demonstrate the approaches. The results show that, by appropriate selection of the basis prior to the analysis, significantly greater process insight can be gained from the PCA, and they demonstrate the importance of pre-processing prior to such analysis.
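The two representations described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes elution volume is simply flow rate multiplied by elapsed time, that runs are resampled onto a shared volume grid by linear interpolation, and that the loaded material is load concentration multiplied by load volume. All function and parameter names are hypothetical.

```python
from bisect import bisect_left


def to_volume_basis(times_min, signal, flow_ml_per_min, volume_grid_ml):
    """Re-express a chromatogram on a common elution-volume grid.

    Assumes volume = flow rate * elapsed time and uses linear
    interpolation between the recorded points (illustrative choices).
    """
    volumes = [t * flow_ml_per_min for t in times_min]
    resampled = []
    for v in volume_grid_ml:
        if v <= volumes[0]:
            resampled.append(signal[0])       # clamp before the first point
        elif v >= volumes[-1]:
            resampled.append(signal[-1])      # clamp after the last point
        else:
            i = bisect_left(volumes, v)       # first recorded volume >= v
            v0, v1 = volumes[i - 1], volumes[i]
            w = (v - v0) / (v1 - v0)
            resampled.append(signal[i - 1] * (1 - w) + signal[i] * w)
    return resampled


def normalise_by_load(signal, load_conc_mg_per_ml, load_volume_ml):
    """Scale a chromatogram by the total material loaded (assumed mg)."""
    load_mg = load_conc_mg_per_ml * load_volume_ml
    return [s / load_mg for s in signal]
```

Two runs whose peaks elute at the same volume but at different flow rates then produce aligned rows, which could be stacked into the data matrix passed to a PCA routine.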
2.
Principal component analysis (PCA) was used to analyse the behaviour of a chromatographic separation as its scale increased. Three 4.6 mm diameter columns, identical in every respect except for column length (25, 15 and 5 cm), were used to generate the data from a test system based on the reversed-phase HPLC separation of crude erythromycin on a polystyrene matrix (PLRP 1000) having a particle diameter of 8 μm and a pore diameter of 100 nm. The species were separated with an isocratic solvent composed of 45/55 acetonitrile/water at about pH 7. An experimental design technique was used to investigate the effects of four process variables (load volume, load concentration, temperature and pH of buffer) on the chromatogram shapes. Following appropriate pre-processing of the chromatographic data, subsets of critical chromatograms were selected which sufficiently characterised the entire data set. From this subset, the corresponding runs were performed on the different sized columns and principal component models were generated for each. At 5 and 15 cm a single principal component was sufficient to characterise all the variance in the chromatograms which the range of process variables introduced, but at 25 cm two principal components were required, particularly to characterise the chromatograms with small loads. Excellent correlations were observed between the first principal components at the three scales. The possibility of predicting the separations on the 25 cm column from an analysis of the separations observed at 5 cm was investigated. The study revealed that good predictions could be made at high loads (>92%), but the model was not effective at low loads because of the need to incorporate a second principal component which was not defined by the range of variables applied to the 5 cm column.
3.
We have developed a program for microarray data analysis, which features the false discovery rate for testing statistical significance and the principal component analysis using the singular value decomposition method for detecting the global trends of gene-expression patterns. Additional features include analysis of variance with multiple methods for error variance adjustment, correction of cross-channel correlation for two-color microarrays, identification of genes specific to each cluster of tissue samples, biplot of tissues and corresponding tissue-specific genes, clustering of genes that are correlated with each principal component (PC), three-dimensional graphics based on virtual reality modeling language and sharing of PC between different experiments. The software also supports parameter adjustment, gene search and graphical output of results. The software is implemented as a web tool and thus the speed of analysis does not depend on the power of a client computer. AVAILABILITY: The tool can be used on-line or downloaded at http://lgsun.grc.nia.nih.gov/ANOVA/
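To make the false discovery rate step concrete, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment. The abstract does not say which FDR procedure the tool uses, so Benjamini-Hochberg is an assumption here, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: this is one common FDR procedure, not necessarily the
    one implemented by the tool described above.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called significant at a chosen FDR level, for example 0.05, when its adjusted value falls at or below that level.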
4.
Protein folds are built primarily from the packing together of two types of structures: alpha-helices and beta-sheets. Neither structure is rigid, and the flexibility of helices and sheets is often important in determining the final fold (e.g., coiled coils and beta-barrels). Recent work has quantified the flexibility of alpha-helices using a principal component analysis (PCA) of database helical structures (J. Mol. Bio. 2003, 327, pp. 229-237). Here, we extend the analysis to beta-sheet flexibility using PCA on a database of beta-sheet structures. For sheets of varying dimension and geometry, we find two dominant modes of flexibility: twist and bend. The distributions of amplitudes for these modes are found to be Gaussian and independent, suggesting that the PCA twist and bend modes can be identified as the soft elastic normal modes of sheets. We consider the scaling of mode eigenvalues with sheet size and find that parallel beta-sheets are more rigid than antiparallel sheets over the entire range studied. Finally, we discuss the application of our PCA results to modeling and design of beta-sheet proteins.
7.
The GeoPCA package is the first tool developed for multivariate analysis of dihedral angles based on principal component geodesics. Principal component geodesic analysis provides a natural generalization of principal component analysis for data distributed in non-Euclidean space, as in the case of angular data. GeoPCA presents a projection of angular data on a sphere composed of the first two principal component geodesics, allowing clustering based on dihedral angles as opposed to Cartesian coordinates. It also provides a measure of the similarity between input structures based only on dihedral angles, in analogy to the root-mean-square deviation of atoms based on Cartesian coordinates. The principal component geodesic approach is shown herein to reproduce clusters of nucleotides observed in an η-θ plot. GeoPCA can be accessed via http://pca.limlab.ibms.sinica.edu.tw.
8.
Background: Survivors of transient ischemic attack (TIA) or stroke are at high risk for recurrent vascular events and aggressive treatment of vascular risk factors can reduce this risk. However, vascular risk factors, especially hypertension and high cholesterol, are not managed optimally even in those patients seen in specialized clinics. This gap between the evidence for secondary prevention of stroke and the clinical reality leads to suboptimal patient outcomes. In this study, we will be testing a pharmacist case manager for delivery of stroke prevention services. We hypothesize this new structure will improve processes of care which in turn should lead to improved outcomes.
Methods: We will conduct a prospective, randomized, controlled open-label with blinded ascertainment of outcomes (PROBE) trial. Treatment allocation will be concealed from the study personnel, and all outcomes will be collected in an independent and blinded manner by observers who have not been involved in the patient's clinical care or trial participation and who are masked to baseline measurements. Patients will be randomized to control or a pharmacist case manager treating vascular risk factors to guideline-recommended target levels. Eligible patients will include all adult patients seen at stroke prevention clinics in Edmonton, Alberta after an ischemic stroke or TIA who have uncontrolled hypertension (defined as systolic blood pressure (BP) > 140 mm Hg) or dyslipidemia (fasting LDL-cholesterol > 2.00 mmol/L) and who are not cognitively impaired or institutionalized. The primary outcome will be the proportion of subjects who attain 'optimal BP and lipid control' (defined as systolic BP < 140 mm Hg and fasting LDL cholesterol < 2.0 mmol/L) at six months compared to baseline; 12-month data will also be collected for analyses of sustainability of any effects. A variety of secondary outcomes related to vascular risk and health-related quality of life will also be collected.
Conclusions: Nearly one-quarter of those who survive a TIA or minor stroke suffer another vascular event within a year. If our intervention improves the provision of secondary prevention therapies in these patients, the clinical (and financial) implications will be enormous.
9.
We report on the analysis of three human cranial fragments from a Mousterian context at the site of La Quina (France), which show anthropogenic surface modifications. Macroscopic and microscopic analyses, including SEM observation, demonstrate that the modifications visible on one of these fragments are similar to those produced on bone fragments used experimentally to retouch flakes. The microscopic analysis also identified ancient scraping marks, possibly resulting from the cleaning of the skull prior to its breakage and utilisation of a resulting fragment as a tool. The traces of utilisation and the dimensions of this object are compared to those on a sample of 67 bone retouchers found in the same excavation area and layer. Results show that the tool size, as well as the dimensions and location of the utilised area, fall well within the range of variation observed on faunal shaft fragments from La Quina that were used as retouchers. This skull fragment represents the earliest known use of human bone as a raw material and the first reported use of human bone for this purpose by hominins other than modern humans. The two other skull fragments, which probably come from the same individual, also bear anthropogenic surface modifications in the form of percussion, cut, and scraping marks. The deliberate versus unintentional hypotheses for the unusual choice of the bone are presented in light of contextual information, modifications identified on the two skull fragments not used as tools, and data on bone retouchers from the same layer, the same site, and other Mousterian sites.
10.
The Proteome Analysis database (http://www.ebi.ac.uk/proteome/) has been developed by the Sequence Database Group at EBI utilizing existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. Three main projects, InterPro, CluSTr and GO Slim, are used to give an overview of the families, domains, sites, and functions of the proteins from each of the complete genomes. Complete proteome analysis is available for a total of 89 proteome sets. A specifically designed application enables InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
11.
Eucalyptus grandis wood was biodegraded by eight basidiomycetes and two ascomycetes. Four groups of decayed wood samples were recognized based on the principal component analysis (PCA) of weight and component loss data. Among the 10 fungal species studied, no selective lignin biodegradation was achieved. PCA was very efficient in recognizing wood decay patterns and seems to be a useful tool for analysing large groups of weight- and component-loss data.
12.
Forty different antibiotics with diverse kingdom and functional specificities were used to measure the functional characteristics of the archaebacterial translation apparatus. The resulting inhibitory curves, which are characteristic of the cell-free system analyzed, were transformed into quantitative values that were used to cluster the different archaebacteria analyzed. This cluster resembles the phylogenetic tree generated by 16S rRNA sequence comparisons. These results strongly suggest that functional analysis of an appropriate evolutionary clock, such as the ribosome, is of intrinsic phylogenetic value. More importantly, they indicate that the study of the nexus between genotypic and phenotypic (functional) information may shed considerable light on the evolution of the protein synthetic machinery.
13.
Principal component analysis (PCA) is probably one of the most used methods for exploratory data analysis. However, it may not always be effective when there are multiple influential factors. In this paper, the use of multiblock PCA for analysing such types of data is demonstrated through a real metabolomics study combined with a series of data simulating two underlying influential factors with different types of interactions based on 2 × 2 experimental designs. The performance of multiblock PCA is compared with that of PCA and also ANOVA-PCA, which is another PCA extension developed to solve similar problems. The results demonstrate that multiblock PCA is highly efficient at analysing such types of data which contain multiple influential factors. These models give the most comprehensive view of data compared to the other two methods. The combination of super scores and block scores shows not only the general trends of change caused by each of the influential factors but also the subtle changes within each combination of the factors and their levels. It is also highly resistant to the addition of ‘irrelevant’ competing information, and the first PC remains the most discriminant one, which neither of the other two methods was able to achieve. The reason for this property was demonstrated by employing a 2 × 3 experimental design. Finally, the validity of the results shown by the multiblock PCA was tested using permutation tests, and the results suggested that the inherent risk of over-fitting of this type of approach is low.
14.
Summary: Field data of a detailed vegetation survey, undertaken in a wet grassland in the nature reserve “De Kampina”, near Boxtel, Netherlands, were used to prepare a Braun-Blanquet phytosociological table. From this table two associations could be derived: Cirsio-Molinietum and Senecioni-Brometum racemosi. Association-analysis and principal component analysis of a species correlation coefficient matrix were carried out on the same data in order to test their usefulness in a further ecological interpretation of the Braun-Blanquet phytosociological table. It was found that association-analysis, although it was able to detect a similar pattern in vegetation, did not allow a further interpretation of the phytosociological table. Principal component analysis proved to be very useful in the ecological interpretation of details of the table and in the correct placing of less representative and intermediate stands.
Zusammenfassung (translated from German): Relevés from a detailed survey of a marshy meadow in the nature reserve “De Kampina”, near Boxtel, Netherlands, were presented in a phytosociological table. Two associations could be distinguished: Cirsio-Molinietum and Senecioni-Brometum racemosi. The same data were used for an “association analysis” and a principal component analysis in order to test their usefulness for a further ecological interpretation of the phytosociological table. Although the “association analysis” detected a pattern in the vegetation similar to that of the phytosociological table, it did not allow any further ecological interpretation. Principal component analysis proved very useful both in the ecological interpretation of details of the table and in the placement of relevés that were not fully typical or were intermediate.
16.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
17.
The authors tested a new procedure for the discrimination of EPs obtained in different stimulus situations. In contrast with principal component analysis (PCA), used so far for the purpose of data compression, the method referred to as canonical component analysis (CCA) is optimal for the purpose of discrimination. To illustrate this, the authors performed both PCA and CCA on the same material and then, after carrying out discriminant analysis (SDWA) on the data transformed in this way, compared the performance of the two procedures in discrimination. In view of both theoretical and practical considerations, the authors recommend that in future researchers use CCA instead of PCA in EP studies when data reduction is carried out for discrimination.
19.
We present a new WWW-based tool for plant gene analysis, the Arabidopsis Co-Expression Tool (ACT), based on a large Arabidopsis thaliana microarray data set obtained from the Nottingham Arabidopsis Stock Centre. The co-expression analysis tool allows users to identify genes whose expression patterns are correlated across selected experiments or the complete data set. Results are accompanied by estimates of the statistical significance of the correlation relationships, expressed as probability (P) and expectation (E) values. Additionally, highly ranked genes on a correlation list can be examined using the novel clique finder tool to determine the sets of genes most likely to be regulated in a similar manner. In combination, these tools offer three levels of analysis: creation of correlation lists of co-expressed genes, refinement of these lists using two-dimensional scatter plots, and dissection into cliques of co-regulated genes. We illustrate the applications of the software by analysing genes encoding functionally related proteins, as well as pathways involved in plant responses to environmental stimuli. These analyses demonstrate novel biological relationships underlying the observed gene co-expression patterns. To demonstrate the ability of the software to develop testable hypotheses on gene function within a defined biological process we have used the example of cell wall biosynthesis genes. The resource is freely available at http://www.arabidopsis.leeds.ac.uk/ACT/
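The first level of analysis mentioned above, building a correlation list for a query gene, can be illustrated with a small sketch. It assumes Pearson correlation across experiments as the similarity measure, which the abstract does not specify; the function names and the gene identifiers in the usage note are hypothetical and unrelated to ACT's actual implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_list(query, expression):
    """Rank all other genes by correlation with the query gene.

    `expression` maps a gene identifier to its expression values
    across the selected experiments (illustrative data layout).
    """
    scores = [
        (gene, pearson_r(expression[query], profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    # Most strongly positively correlated genes come first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `correlation_list("geneA", {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]})` ranks `geneB` first and `geneC` last.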