Similar Articles
20 similar articles found.
1.
MOTIVATION: One important application of gene expression microarray data is the classification of samples into categories, such as tumor type. Microarrays allow the expression of thousands of genes to be monitored simultaneously in each sample. This ability to measure gene expression en masse has produced data in which the number of variables p (genes) far exceeds the number of samples N. Standard statistical methods for classification and prediction work poorly, or not at all, when N < p, so existing methods must be modified or new ones developed for the analysis of microarray data. RESULTS: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expression. The procedure combines dimension reduction by Partial Least Squares (PLS) with classification by Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS with the well-known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition under which PCA predicts particularly poorly relative to PLS. The proposed methods were applied to five microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCLL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell Lung Carcinoma (NSCLC) versus renal samples. The stability of the classification results and methods was further assessed by re-randomization studies.
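The PLS-then-discriminant idea can be sketched in a few lines of pure Python. This is a minimal sketch, assuming a two-class 0/1 response and keeping only the first PLS component; the toy matrix, the function names (`pls1_weights`, `fit_qda_1d`, etc.) and the one-component choice are illustrative assumptions, not the authors' implementation. With a single centered 0/1 response, the first PLS weight vector is proportional to Xᵀy, i.e. to the difference between the class mean profiles, and QDA on the one-dimensional score reduces to comparing two Gaussians with separate variances.

```python
import math

def column_means(X):
    """Column means of a list-of-rows matrix."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def pls1_weights(X, y):
    """First PLS weight vector for a single 0/1 response (data centered internally).

    For one response variable, w is proportional to Xc^T yc, so it can be computed
    directly; this works even when genes (p) outnumber samples (N).
    """
    means = column_means(X)
    ybar = sum(y) / len(y)
    w = [0.0] * len(means)
    for row, yi in zip(X, y):
        for j, (xj, mj) in enumerate(zip(row, means)):
            w[j] += (xj - mj) * (yi - ybar)
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], means

def pls_score(x, w, means):
    """Project one sample onto the PLS weight vector, using the training means."""
    return sum((xj - mj) * wj for xj, mj, wj in zip(x, means, w))

def fit_qda_1d(scores, y):
    """Per-class mean, variance and prior of the scores (QDA in one dimension)."""
    params = {}
    for c in sorted(set(y)):
        sc = [s for s, yi in zip(scores, y) if yi == c]
        mu = sum(sc) / len(sc)
        var = sum((s - mu) ** 2 for s in sc) / (len(sc) - 1)
        params[c] = (mu, var, len(sc) / len(scores))
    return params

def predict_qda_1d(params, s):
    """Pick the class with the largest Gaussian log-density plus log prior."""
    def log_post(c):
        mu, var, prior = params[c]
        return (-0.5 * math.log(2 * math.pi * var)
                - (s - mu) ** 2 / (2 * var) + math.log(prior))
    return max(params, key=log_post)

# Hypothetical toy data: 4 samples (N) x 6 genes (p), so N < p.
X = [
    [1.0, 2.0, 0.5, 3.0, 1.0, 0.0],   # class 0
    [1.2, 1.8, 0.4, 2.9, 1.1, 0.2],   # class 0
    [3.0, 0.5, 0.6, 3.1, 0.9, 0.1],   # class 1
    [2.8, 0.7, 0.5, 3.0, 1.0, 0.3],   # class 1
]
y = [0, 0, 1, 1]

w, means = pls1_weights(X, y)
train_scores = [pls_score(x, w, means) for x in X]
params = fit_qda_1d(train_scores, y)
new_sample = [2.9, 0.6, 0.5, 3.0, 1.0, 0.2]
print(predict_qda_1d(params, pls_score(new_sample, w, means)))
```

In a real evaluation the PLS weights, centering means and QDA parameters would all be refit inside each cross-validation or re-randomization fold, so that held-out samples never influence the reduced space.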

2.
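This article studies the subtype classification of gastric cancer based on microarray gene expression data. Microarray gene expression data are characterized by small sample sizes, high dimensionality and high noise, which makes dimension reduction the key to successful classification. The authors applied two dimension reduction methods, principal component analysis (PCA) and partial least squares (PLS), to gastric cancer subtype classification, using support vector machines (SVM) and the K-nearest neighbor method (KNN) as classifiers on two gastric cancer data sets. Classification performance was slightly better than that of traditional medical diagnosis, with the highest accuracy reaching 100%. The results show that PCA and PLS can effectively extract discriminative feature information and greatly reduce the dimensionality of gene expression data while maintaining high classification accuracy.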
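The PCA-plus-KNN pipeline can be sketched in pure Python. It is a minimal illustration under assumptions: only the first principal component is kept, it is found by power iteration, a k = 3 nearest-neighbour vote is taken on that single score, and the gene matrix and subtype labels are hypothetical. PCA chooses the axis without looking at the subtype labels; the classifier sees only the reduced scores.

```python
import math

def pca_first_axis(X, iters=500):
    """Leading principal axis of X by power iteration on Xc^T Xc.

    The p x p covariance matrix is never formed: each step computes t = Xc v and
    then Xc^T t, which is cheap when samples are few and genes are many.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading axis
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        v = [sum(Xc[i][j] * t[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v, means

def project(x, v, means):
    """Principal-component score of one sample, centered with the training means."""
    return sum((xj - mj) * vj for xj, mj, vj in zip(x, means, v))

def knn_predict(train_scores, train_labels, query, k=3):
    """Majority vote among the k training scores closest to the query score."""
    ranked = sorted(zip(train_scores, train_labels), key=lambda sl: abs(sl[0] - query))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical expression matrix: 6 samples x 4 genes, two subtypes.
X = [
    [5.0, 1.0, 2.0, 0.0], [5.2, 1.1, 2.1, 0.1], [4.8, 0.9, 1.9, 0.0],  # subtype A
    [1.0, 4.0, 2.0, 0.1], [1.2, 4.2, 2.1, 0.0], [0.8, 3.8, 1.9, 0.1],  # subtype B
]
labels = ["A", "A", "A", "B", "B", "B"]

v, means = pca_first_axis(X)
scores = [project(x, v, means) for x in X]
print(knn_predict(scores, labels, project([4.9, 1.0, 2.0, 0.05], v, means)))
```

Keeping more components would only change `project` into a vector of scores and the distance in `knn_predict` into a Euclidean norm; the sign of a principal axis is arbitrary, which does not affect nearest-neighbour distances.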

3.
4.
Robust PCA and classification in biosciences   Total citations: 7, self-citations: 0, citations by others: 7
MOTIVATION: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach, which is based on the mean and the sample covariance matrix of the data, is very sensitive to outliers. Classification methods based on this covariance matrix also perform poorly in the presence of outlying measurements. RESULTS: First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. The ROBPCA method is applied to several biochemical datasets. In one example, we also apply a robust discriminant method to the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY: All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.
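The projection-pursuit ingredient can be illustrated with a Stahel-Donoho-type outlyingness measure. This is a simplified sketch, not the Matlab toolbox code: ROBPCA, as we understand it, ranks points by an outlyingness of this kind before robust covariance estimation and PCA, but this version scales projections with median/MAD (an assumption; the full method uses other robust univariate estimators), tries only directions through pairs of points, and stops at the mean of the h least outlying points. The points and the value of h are hypothetical.

```python
import itertools
import math
import statistics

def sd_outlyingness(X):
    """Stahel-Donoho-type outlyingness using directions through pairs of points.

    For each direction d = (x_i - x_j)/|x_i - x_j|, every point is projected onto d
    and its distance from the median projection is scaled by the MAD. A point's
    outlyingness is the maximum of this ratio over all directions tried.
    """
    n = len(X)
    out = [0.0] * n
    for i, j in itertools.combinations(range(n), 2):
        diff = [a - b for a, b in zip(X[i], X[j])]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm == 0.0:
            continue
        d = [c / norm for c in diff]
        proj = [sum(a * b for a, b in zip(row, d)) for row in X]
        med = statistics.median(proj)
        mad = statistics.median([abs(z - med) for z in proj])
        if mad == 0.0:
            continue  # degenerate direction: skip rather than divide by zero
        for k, z in enumerate(proj):
            out[k] = max(out[k], abs(z - med) / mad)
    return out

def robust_center(X, h):
    """Mean of the h least outlying points (a stand-in for a robust location)."""
    out = sd_outlyingness(X)
    keep = sorted(range(len(X)), key=lambda k: out[k])[:h]
    return [sum(X[k][j] for k in keep) / h for j in range(len(X[0]))]

X = [(1.0, 1.0), (1.3, 0.9), (0.8, 1.2), (1.1, 1.3), (0.9, 0.7), (1.2, 1.1),
     (10.0, 10.0)]  # last point is a gross outlier
out = sd_outlyingness(X)
print(max(range(len(X)), key=lambda k: out[k]))   # index of the most outlying point
print(robust_center(X, h=5))
print([sum(c) / len(X) for c in zip(*X)])         # classical mean, pulled toward the outlier
```

The contrast between the last two lines is the point made in the abstract: a single gross outlier drags the classical mean (and hence the classical covariance and principal axes) far from the bulk of the data, while the outlyingness-based subset does not move.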

5.
Integrative taxonomy is considered a reliable taxonomic approach for closely related and cryptic species because it integrates different sources of taxonomic data (genetic, ecological, and morphological characters). To infer the boundaries of seven species of the evacanthine leafhopper genus Bundera Distant, 1908 (Hemiptera: Cicadellidae), an integrated analysis based on morphology, mitochondrial DNA, and hyperspectral reflectance profiling (37 spectral bands from 411–870 nm) was conducted. Despite their morphological similarity, the genetic distances in the cytochrome c oxidase subunit I (COI) gene among the tested species are relatively large (5.8–17.3%). Species-specific divergence among five morphologically similar species (Bundera pellucida and Bundera spp. 1–4) was revealed by the mitochondrial DNA data and by reflectance profiling. A key for identifying males is provided, and their morphological characters are described. Average reflectance profiles from the dorsal side of specimens were classified using linear discriminant analysis. Cross-validation of the reflectance-based classification showed that the seven species could be distinguished with 91.3% accuracy. This study verified the feasibility of using hyperspectral imaging data in insect classification and provides a good example of integrative taxonomy applied to closely related and cryptic species. © 2015 The Linnean Society of London
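Cross-validated linear discriminant analysis of reflectance profiles can be sketched without libraries. This sketch assumes two hypothetical reflectance bands rather than the 37 used in the study, Gaussian classes sharing one pooled covariance matrix (the usual LDA assumption), and leave-one-out cross-validation; the abstract does not state which cross-validation scheme produced the 91.3% figure, and the species centers below are invented for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lda_fit(X, y):
    """Class means, priors and the pooled within-class covariance (LDA assumptions)."""
    classes = sorted(set(y))
    p, n = len(X[0]), len(X)
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, yi in zip(X, y) if yi == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / n
    S = [[0.0] * p for _ in range(p)]
    for x, yi in zip(X, y):
        d = [a - m for a, m in zip(x, means[yi])]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    S = [[v / (n - len(classes)) for v in row] for row in S]
    # a_c = S^{-1} mu_c gives the linear discriminant coefficients for class c
    coef = {c: solve(S, means[c]) for c in classes}
    return classes, means, priors, coef

def lda_predict(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    classes, means, priors, coef = model
    def delta(c):
        a = coef[c]
        return (sum(xi * ai for xi, ai in zip(x, a))
                - 0.5 * sum(mi * ai for mi, ai in zip(means[c], a))
                + math.log(priors[c]))
    return max(classes, key=delta)

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: refit without sample i, then predict it."""
    hits = 0
    for i in range(len(X)):
        model = lda_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += lda_predict(model, X[i]) == y[i]
    return hits / len(X)

# Hypothetical two-band mean reflectances for three species, four specimens each.
offsets = [(0.0, 0.0), (0.01, 0.01), (-0.01, 0.0), (0.0, -0.01)]
centers = {"sp1": (0.20, 0.30), "sp2": (0.50, 0.35), "sp3": (0.35, 0.70)}
X, y = [], []
for name, (cx, cy) in centers.items():
    for dx, dy in offsets:
        X.append([cx + dx, cy + dy])
        y.append(name)
print(loocv_accuracy(X, y))
```

With 37 bands and few specimens per species, the pooled covariance can become ill-conditioned, which is one reason real analyses often reduce or regularize the spectra first; this toy example sidesteps that issue.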

6.
MOTIVATION: One important aspect of data mining of microarray data is discovering the molecular variation among cancers. In microarray studies, the number n of samples is relatively small compared with the number p of genes per sample (usually in the thousands). Standard statistical classification methods are known to be efficient (i.e., in the present case, to yield successful classifiers) particularly when n is (far) larger than p. This naturally calls for combining a dimension reduction procedure with the classification procedure. RESULTS: In this paper, we address classification in such a high-dimensional setting. We view the classification problem as a regression problem with few observations and many predictor variables. We propose a new method combining partial least squares (PLS) and ridge-penalized logistic regression. We review existing methods based on PLS and/or penalized likelihood techniques, outline their merits in some cases, and explain theoretically their sometimes poor behavior. Our procedure is compared with these other classifiers. The predictive performance of the resulting classification rule is illustrated on three data sets: Leukemia, Colon and Prostate.
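Why a ridge penalty helps in the few-observations regime can be shown with a small example. This sketch fits ridge-penalized logistic regression by plain gradient descent on a matrix of component scores (for instance, the first few PLS components); it does not reproduce the paper's algorithm or the way it couples PLS with the penalty. The scores, the step size `lr`, the iteration count and the penalty values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, lr=0.1, iters=5000):
    """Minimise -loglik + (lam/2) * ||beta||^2 by gradient descent; intercept unpenalised."""
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [lam * bj for bj in beta]  # ridge term contributes lam * beta
        for x, yi in zip(X, y):
            r = sigmoid(b0 + sum(xj * bj for xj, bj in zip(x, beta))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * x[j]
        b0 -= lr * g0
        beta = [bj - lr * gj for bj, gj in zip(beta, g)]
    return b0, beta

def predict(model, x):
    """Class 1 if the fitted probability is at least 0.5, else class 0."""
    b0, beta = model
    return int(sigmoid(b0 + sum(xj * bj for xj, bj in zip(x, beta))) >= 0.5)

# Hypothetical 1-D component scores; the classes are perfectly separable, which
# would send unpenalised maximum-likelihood coefficients to infinity.
scores = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 1, 1, 1]

weak = fit_ridge_logistic(scores, labels, lam=0.1)
strong = fit_ridge_logistic(scores, labels, lam=10.0)
print(weak[1], strong[1])  # stronger penalty -> smaller coefficient
```

Perfect separation is common when samples are few and predictors many, so without the penalty the logistic likelihood has no finite maximizer; the ridge term restores a unique, finite solution and shrinks it further as the penalty grows.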

7.
The discovery of protein variation is an important strategy for disease diagnosis in the biological sciences. The current benchmark for extracting information from multiple biological variables is the so-called "omics" disciplines. Such variability is uncovered by multivariable data mining techniques, which fall into two primary categories: machine learning strategies and statistically based approaches. Depending on the analytical platform or method used to generate the data, proteomic studies typically produce hundreds or thousands of variables, p, per observation, n. Many classification methods require more observations than variables (n > p) and therefore need pre-treatment to reduce the dimensionality before classification. Machine learning techniques have recently gained popularity in the field for their ability to classify unknown samples successfully. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This problem might be solved by a statistical model-based approach, in which not only is the importance of each protein explicit, but the proteins are also combined into a readily interpretable classification rule without relying on a black-box approach. Here we combine the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA) with both statistical and machine learning classification methods, and compare them with a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.

8.
Summary: The microorganisms from two industrial (I1, I2) activated sludges that treat glyphosate (N-phosphonomethyl glycine) wastes and one domestic (D1) sludge were enumerated by microscopic examination and with eight selective media. I1 and I2 had higher total counts but fewer pseudomonads and no yeasts. The enumerations correlated directly with traditional biological performance measurements. A total of 393 microbial strains were isolated from the sludges to relate the occurrence of glyphosate-degrading activity (GDA) to 155 biochemical and morphological characteristics. Each activated sludge contained unique bacterial populations, and the microbes treating industrial wastes were capable of utilizing a wide range of carbohydrates. Numerical taxonomy (arithmetic average linkage) using simple matching and Jaccard coefficients confirmed five (D1), three (I1), and 12 (I2) clusters. GDA was found in only a small portion of the industrial clusters and did not correlate with any other characteristic tested, even though the GDA strains were phenotypically diverse. This suggests that GDA is not a universal trait and that its expression requires enrichment through specific selective pressures.
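The simple matching and Jaccard coefficients and arithmetic average linkage (UPGMA) can be combined in a short sketch. It assumes each strain is scored 1/0 on every test and clusters on the distance 1 - similarity; the strain names and test results are hypothetical. The two coefficients differ only in how shared negatives are treated: simple matching counts a (0, 0) pair as agreement, while Jaccard ignores it.

```python
import itertools

def simple_matching(a, b):
    """Fraction of characters with the same state, counting shared absences (0, 0)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Shared positives over characters positive in either strain; ignores (0, 0)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns merges in order with their heights."""
    clusters = [frozenset([i]) for i in range(len(names))]

    def avg(c1, c2):
        # Mean of all original pairwise distances between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        a, b = min(itertools.combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((sorted(names[i] for i in a | b), avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical strains scored + (1) / - (0) on six tests.
strains = {
    "s1": [1, 1, 0, 0, 1, 0],
    "s2": [1, 1, 0, 0, 1, 1],
    "s3": [0, 0, 1, 1, 0, 0],
    "s4": [0, 0, 1, 1, 0, 1],
}
names = list(strains)
dist = [[1.0 - jaccard(strains[a], strains[b]) for b in names] for a in names]
for members, height in upgma(names, dist):
    print(members, round(height, 3))
```

Swapping `jaccard` for `simple_matching` in the `dist` line gives the other coefficient; with many tests that most strains score negative, the two can produce noticeably different trees.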

9.
Numerical taxonomy of medical gamasid mites in Yunnan   Total citations: 4, self-citations: 0, citations by others: 4
Luo Lipu, Guo Xianguo. Acta Entomologica Sinica, 2007, 50(2): 172–177
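Using 57 species of medical gamasid mites from Yunnan Province as operational taxonomic units, 60 taxonomic characters, mainly morphological, were listed to explore the relationships among genera and species of medical gamasid mites in Yunnan. Numerical taxonomic analysis of the 57 species was performed with hierarchical cluster analysis and principal component analysis in the SPSS 11.5 statistical software. The results showed that the 57 species fall into five groups: Laelapidae, Parasitidae, Dermanyssidae, Hirstionyssidae and Aceosejidae. The genera Hirstionyssus and Echinonyssus were separated from Laelapidae and placed in a separate family, Hirstionyssidae, and Ornithonyssus bacoti was assigned to Dermanyssidae rather than Macronyssidae. The classification results were basically consistent with traditional morphological classification; numerical taxonomy is therefore considered to reflect relatively objectively the taxonomic status and relationships of the taxa of medical gamasid mites.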

10.
Numerical taxonomy of Calligonum L. in China   Total citations: 7, self-citations: 0, citations by others: 7
Tao Ling. Acta Botanica Boreali-Occidentalia Sinica, 2002, 22(5): 1073–1085
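Nineteen species of Calligonum L. from China were selected, and 35 morphological taxonomic characters were measured or taken from the literature. One-way analysis of variance (ANOVA) and principal component analysis (PCA) were used for univariate and multivariate analyses of the morphological characters, respectively. The results showed significant interspecific differences in all morphological characters. Crown breadth (BC), internode length of woody branches (LKWB), fruit diameter (DF), stamen length (SL), internode length of assimilating branches (LKAS) and branching angle of assimilating branches (ARAS) were highly discriminating for the numerical classification of Calligonum. Based on squared Euclidean distance, the 19 species were grouped into five clusters by the unweighted pair-group method with arithmetic mean (UPGMA). The cluster analysis agreed well with the three-dimensional PCA ordination but differed somewhat from the traditional morphological classification.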
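The one-way ANOVA screen and the squared Euclidean distance that feeds UPGMA can be reproduced in outline. A minimal sketch under stated assumptions: all measurements are hypothetical, and z-scoring the traits before computing distances is an assumption (a common choice when traits are in different units; the abstract does not say whether it was done). The F statistic would then be compared with an F distribution on (k - 1, N - k) degrees of freedom.

```python
import statistics

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def standardize_columns(rows):
    """z-score each trait (column) across taxa so traits in different units are comparable."""
    cols = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def squared_euclidean(a, b):
    """Sum of squared differences across traits."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical fruit diameters for three species (three specimens each).
print(one_way_anova_f([[10, 11, 12], [20, 21, 22], [15, 16, 17]]))

# Hypothetical species-by-trait matrix (rows: species; columns: two traits).
z = standardize_columns([[1, 10], [2, 20], [3, 30]])
print(squared_euclidean(z[0], z[2]))
```

The resulting species-by-species distance matrix is what an average-linkage (UPGMA) routine would take as input.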

11.
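Advances in the systematic classification of arbuscular mycorrhizal fungi and in community research techniques   Total citations: 1, self-citations: 2, citations by others: 1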
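Arbuscular mycorrhizal fungi (AMF) are an important component of natural ecosystems and form mutualistic symbioses with plant roots. Traditional AMF classification relies mainly on the morphological identification of asexual spores from soil, which has certain limitations. In recent years, molecular identification techniques based on nucleic acid analysis have made AMF classification more scientific and accurate, supplementing and refining the classification system built on spore morphology. AMF community research depends on AMF identification and mainly uses two types of methods: morphological identification of spores and molecular biological analysis. This paper reviews AMF classification systems and community research methods, focusing on the molecular biological techniques most widely applied to AMF community research in recent years. The authors argue that combining morphological and molecular approaches will help advance AMF community research and the establishment and refinement of a natural classification system for AMF.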

12.
The purpose of this study was to use near-infrared (NIR) spectroscopy and chemical imaging to characterize extrusion-spheronized drug beads, lipid-based placebo beads, and modified-release tablets prepared from blends of these beads. Tablet drug loads (10.5–19.5 mg) of theophylline (2.25 mg increments) and cimetidine (3 mg increments) could easily be differentiated using univariate analyses. Multivariate analyses were used to evaluate other tablet attributes (i.e., compression force, crushing force, content uniformity): partial least squares (PLS) models for prediction and principal component analysis (PCA) for classification. The PLS prediction models (R² > 0.98) for content uniformity of uncoated compacted theophylline and cimetidine beads were the most robust. For tablets with drug content between 10.5 and 19.5 mg, the standard error of calibration (SEC), standard error of cross-validation, and standard error of prediction (SEP) for content uniformity were 0.31, 0.43, and 0.37 mg for theophylline and 0.47, 0.59, and 0.49 mg for cimetidine, with SEP/SEC ratios below 1.3. PCA could detect blend segregation during tableting for preparations with different ratios of uncoated cimetidine beads to placebo beads (20:80, 50:50, and 80:20). NIR chemical imaging showed that the 80:20 formulations had the most pronounced blend segregation during tableting. Furthermore, imaging could quantify the cimetidine bead content across the different blend ratios. Segregation testing (ASTM D6940-04 method) indicated that blends of coated cimetidine beads and placebo beads (50:50 ratio) also tended to segregate.
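The SEC, SEP and SEP/SEC figures quoted in this abstract can be checked with a few lines of arithmetic. Definitions vary between chemometrics texts (some report an uncorrected root-mean-square error of prediction, some use n - k rather than n - k - 1 degrees of freedom for SEC), and the abstract does not say which were used, so the functions below follow one common convention and are not the authors' calculation. The calibration data passed to `sec` are hypothetical; the ratio loop uses only the reported values.

```python
import math

def sec(y_ref, y_fit, n_factors):
    """Standard error of calibration: residual SD with n - k - 1 degrees of freedom."""
    ss = sum((f - r) ** 2 for r, f in zip(y_ref, y_fit))
    return math.sqrt(ss / (len(y_ref) - n_factors - 1))

def sep(y_ref, y_pred):
    """Bias-corrected standard error of prediction on an independent test set."""
    res = [p - r for r, p in zip(y_ref, y_pred)]
    bias = sum(res) / len(res)
    return math.sqrt(sum((e - bias) ** 2 for e in res) / (len(res) - 1))

# Hypothetical calibration set (k = 1 PLS factor) to exercise the functions.
print(round(sec([10.5, 12.75, 15.0, 17.25], [10.8, 12.45, 15.3, 16.95], n_factors=1), 3))

# SEP/SEC ratios implied by the values reported in the abstract (mg).
for drug, sec_mg, sep_mg in [("theophylline", 0.31, 0.37), ("cimetidine", 0.47, 0.49)]:
    print(drug, round(sep_mg / sec_mg, 2))
```

Both reported ratios (about 1.19 and 1.04) fall under the 1.3 threshold stated in the abstract; a ratio well above that would suggest the calibration fits its own data much better than it predicts new samples.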

13.
Morphological variation among populations of wild early cherry   Total citations: 7, self-citations: 1, citations by others: 6
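Numerical taxonomic methods were used to compare the morphological characteristics of wild early cherry populations from Mount Wuyi, Fujian, and Mount Baohua, Jiangsu. In the analysis of 24 morphological traits from the two main populations, both cluster analysis and principal component analysis grouped the individuals into two classes, clearly reflecting large phenotypic differences between the populations. Principal component analysis showed that leaf traits (length, width, leaf-base angle, etc.) and floral traits (calyx tube, corolla diameter, inflorescence peduncle, etc.) are the main sources of phenotypic difference between the populations, and one-way analysis of variance confirmed this conclusion. Finally, we suggest treating the wild early cherry from Mount Wuyi, Fujian, as a new variety of 大叶早樱 (large-leaved early cherry).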

14.
Resolving the infrageneric classification of species-rich genera has been challenging in plant taxonomy. Ilex L. is a subcosmopolitan genus with over 600 species of dioecious trees and shrubs. Many classification systems based on morphological data have been proposed during the past 250 years. However, these systems (such as Loesener's and Galle's systems) may not truly reflect the evolutionary trajectories of Ilex, because most of those systems' infrageneric hierarchies are not monophyletic. In this study, we reconstructed a phylogeny of Ilex L. comprising 15 moderately to highly supported clades, using rigorously identified samples (202 species) and carefully authenticated sequences of three nuclear genes [internal transcribed spacer (ITS), external transcribed spacer (ETS), and nepGS]. The new phylogenetic tree essentially resembles the nuclear tree of Manen et al. but shows conspicuous topological differences from the phylogeny of Yao et al. By closely scrutinizing the morphological variation and distribution patterns of the 202 species, this study found that most lineages of Ilex identified here are well defined by a particular trait or a combination of morphological and distributional traits, displaying a phylogeny–morphology–distribution conformity that has seldom been uncovered in previous studies. Given this general conformity, we put forward an updated sectional classification system for Ilex that provisionally contains 14 sections. The new classification will provide a robust framework for studying the evolution and diversification of this ecologically and economically important genus.

15.
Faunistic survey using a DNA taxonomy approach may provide different results from morphological methods, especially for small and understudied animals. In this study, we report the results from morphometric analyses (linear measurements of the lorica) and DNA taxonomy (generalized mixed Yule coalescent model on the barcoding mtDNA locus cytochrome c oxidase subunit I) performed on 15 clonal lineages of the rotifer Brachionus plicatilis species complex from six Iranian inland saltwaters. The DNA taxonomy approach found more units of diversity (four) than the morphometric approach (two) in the studied rotifers. Three of the taxa identified in this study are already known as described valid species or as-yet unnamed lineages, but a new, additional lineage is also identified from Iran. © 2014 The Linnean Society of London

16.
The taxonomy of the genets (genus Genetta) has long been debated, hampering efforts at evolutionary reconstruction. First, sequence data from the complete cytochrome b gene (cyt b) were generated for 50 specimens representing 15 morphological species in order to produce the first exhaustive molecular phylogeny of the genets. Second, a revised morphological matrix comprising 50 characters was combined with the cyt b data to estimate the level of morphological homoplasy. Phylogenetic analyses were conducted using parsimony, maximum likelihood and Bayesian procedures. Our results based on cyt b contradict part of the traditional taxonomy of the genus Genetta, the servaline and small-spotted genets being paraphyletic, but confirm the species status recently re-investigated for three genets belonging to the large-spotted complex, including the newly described G. bourloni. The combined analysis yielded similar results, although the morphological characters were clearly homoplasic. Partitioned Bremer supports indicated conflicting signals between the two data sets throughout the tree, and species-diagnostic characters, useful for delimiting species boundaries, were significantly correlated with habitat. However, the morphological data supported the monophyly of the clades (G. victoriae, other genets), (G. servalina, G. cristata), the large-spotted genet complex and the forest forms. Our results suggest a complex evolutionary history of the genets in Africa, with a Poiana-like ancestor inhabiting rain forest, followed by a diversification involving two independent invasions of open habitats and one reversion to rain forest. Divergence estimates based on cyt b revealed that splitting events within the genets partly follow a climatic speciation model during the cyclical periods of the Quaternary, although the 'primitive' rain forest lineages diverged earlier, during the Late Miocene and Early Pliocene.
© 2004 The Linnean Society of London, Biological Journal of the Linnean Society, 2004, 81, 589–610.

17.
Chromodorid nudibranchs (Chromodorididae) are brightly coloured sea slugs that live on some of the most biodiverse and threatened coral reefs on the planet. However, the evolutionary relationships within this family have not been well understood, especially in the genus Glossodoris. Glossodoris has experienced large-scale taxonomic instability over the last century and has been the subject of repeated taxonomic changes, partly because morphological characters were the sole traditional source of taxonomic data. Changing concepts of traditional generic boundaries based on morphology have also contributed to this instability. Despite recent advances in molecular systematics, many aspects of chromodorid taxonomy remain poorly understood, particularly at the traditional species and generic levels. In this study, 77 individuals representing 32 previously defined species were used to build the most robust phylogenetic tree of Glossodoris and related genera, using the mitochondrial genes cytochrome c oxidase subunit I and 16S and the nuclear gene 28S. Bayesian inference, maximum likelihood, and maximum parsimony analyses verify the most recently hypothesized evolutionary relationships within Glossodoris. Additionally, a pseudocryptic and cryptic species complex within Glossodoris cincta and a pseudocryptic complex within Glossodoris pallida emerged, and three new species of Doriprismatica are identified.

18.
Organismal taxonomy is often based on one or a few morphological characters. When these characters are morphologically simple or known to be plastic, we may not have great confidence in the taxonomic conclusions based on them. For example, calyptraeid gastropod shells are well known for their simplicity and plasticity and appear to be subject to frequent evolutionary convergence, yet they are the basis of calyptraeid taxonomy. In such a case, it would be useful to know how the relationships inferred from the morphological features used in traditional taxonomy compare with those inferred from other morphological characters or from DNA sequence data. In this paper, I examine the relative utility of traditional taxonomic characters (shell characters), anatomical characters and molecular characters for reconstructing the phylogeny of calyptraeid gastropods. The results of an incongruence length difference (ILD) test and comparisons of the recovered tree topologies suggest conflict between the DNA sequence data and the morphological data. Very few of the nodes recovered by the morphological data were recovered by any other dataset. Despite this conflict, including the morphological data increased the resolution and support of nodes in the topology recovered from the combined dataset. The retention indices (RIs) and consistency indices (CIs) of the morphological data on the best-estimate topology were no worse than those of the other datasets. This analysis demonstrates that although analyses using morphological characters alone can be misled by these convergences, these characters contribute significantly to the combined dataset. © 2003 The Linnean Society of London. Biological Journal of the Linnean Society, 2003, 78, 541–593.
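The consistency and retention indices are computed per character from parsimony step counts. A minimal sketch, assuming unordered characters and a fixed rooted binary tree, with steps counted by Fitch's algorithm: CI = m/s and RI = (g - s)/(g - m), where m is the minimum possible number of steps, s the observed number on the tree and g the maximum (the steps needed on a star tree). Ensemble indices for a whole dataset sum m, s and g over characters before dividing. The tree and character states are hypothetical.

```python
from collections import Counter

def fitch_steps(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples of leaf names.

    Returns (state set at this node, number of state changes in this subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch_steps(left, states)
    s2, c2 = fitch_steps(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1

def ci_ri(tree, states):
    """Observed steps, consistency index and retention index for one unordered character."""
    s = fitch_steps(tree, states)[1]                  # observed steps on this tree
    counts = Counter(states.values())
    m = len(counts) - 1                               # minimum possible steps
    g = len(states) - max(counts.values())            # maximum steps (on a star tree)
    ci = m / s if s else None                         # undefined for a constant character
    ri = (g - s) / (g - m) if g != m else None        # undefined for an uninformative one
    return s, ci, ri

tree = (("A", "B"), ("C", "D"))
print(ci_ri(tree, {"A": 0, "B": 0, "C": 1, "D": 1}))  # fits the tree
print(ci_ri(tree, {"A": 0, "B": 1, "C": 0, "D": 1}))  # homoplastic on this tree
```

A character that changes once on the tree scores CI = RI = 1, while one that must change twice on the same four taxa drops to CI = 0.5 and RI = 0, which is the kind of signal the abstract compares across the shell, anatomical and molecular partitions.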

19.
Authenticity and consistent quality are the key elements in ensuring that traditional or herbal remedies reliably deliver their expected benefits. Metabolomics offers a way to address these issues. Principal Component Analysis (PCA) was applied to select the best solvent system for sample extraction. Partial Least Squares (PLS) regression analysis proved useful for evaluating the relationship between Nigella sativa seeds from four different origins on the basis of their metabolite profiles. In this study, the samples displayed different bioactivities: the Qasemi and Syrian samples exhibited high α-glucosidase inhibitory activity, which the PLS model correlated with their high fatty acid content. The Ethiopian sample exhibited high DPPH radical scavenging and nitric oxide (NO) inhibition activities, which may be related to its high levels of thymoquinone and thymol. The method was also successfully used to classify "new test" samples into their proper groups.

20.
Chionactis occipitalis (Western Shovel-nosed Snake) is a small colubrid snake inhabiting the arid regions of the Mojave, Sonoran, and Colorado deserts. Morphology-based taxonomy currently recognizes four subspecies. However, these taxonomic proposals were largely based on weak morphological differentiation and inadequate geographic sampling. Our goal was to explore evolutionary relationships and boundaries among subspecies of C. occipitalis, with particular focus on individuals within the known range of C. o. klauberi (Tucson Shovel-nosed Snake). The population sizes and range of C. o. klauberi have declined over the last 25 years owing to habitat alteration and loss, prompting a petition to list this subspecies as endangered. We examined the phylogeography, population structure, and subspecific taxonomy of C. occipitalis across its geographic range through genetic analysis of 1100 bases of mitochondrial DNA sequence and reanalysis of 14 morphological characters from 1543 museum specimens. We estimated the gene phylogeny of the species from 81 snakes using Bayesian inference and explored possible factors influencing genetic variation using landscape genetic analyses. Phylogenetic and population genetic analyses reveal genetic isolation and independent evolutionary trajectories for two primary clades. Our data indicate that diversification between these clades has resulted from both historical vicariance and environmental isolating mechanisms. These two clades therefore likely constitute evolutionarily significant units (ESUs). Neither the molecular nor the morphological data are concordant with the traditional subspecies taxonomy of C. occipitalis. Mitochondrial sequences suggest that specimens recognized as C. o. klauberi are embedded in a larger geographic clade whose range has expanded from western Arizona populations, and these data are concordant with clinal longitudinal variation in morphology.
