Similar Literature
1.
The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', which occurs when the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
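
To make the linear baseline concrete, here is a minimal sketch of what PCA computes, written with the Python standard library only and not taken from the study: the first principal axis is the leading eigenvector of the sample covariance matrix, found here by power iteration. The function names are hypothetical, and the all-ones starting vector is an assumption that holds for typical data. Because the projection is a straight line (or plane) through Euclidean space, it cannot unroll a curved structure, which is the limitation the abstract raises.

```python
import math

def center(X):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance(Xc):
    """Sample covariance matrix (divisor n - 1) of already-centred data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]

def first_principal_axis(X, iters=500):
    """Leading eigenvector of the covariance matrix, found by power iteration.

    Starts from the all-ones vector, so it assumes that vector is not
    orthogonal to the leading eigenvector (true for typical data)."""
    C = covariance(center(X))
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance is zero along the current direction")
        v = [x / norm for x in w]
    return v

def pc1_scores(X):
    """Projection of each centred sample onto the first principal axis."""
    axis = first_principal_axis(X)
    return [sum(a * b for a, b in zip(row, axis)) for row in center(X)]

# Four samples lying exactly on the line y = 2x in a 2-D "expression" space.
samples = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(first_principal_axis(samples))  # about [0.447, 0.894], i.e. (1, 2) / sqrt(5)
print(pc1_scores(samples))
```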

2.
Question: How useful are Ellenberg N‐values for predicting the herbage yield of Central European grasslands in comparison to approaches based on ordination scores of plant species composition or on soil parameters? Location: Central Germany (11°00′–11°37′E, 50°21′–50°34′N, 500–840 m a.s.l.). Methods: Based on data from a field survey in 2001, the following models were constructed for predicting herbage yield in montane Central European grasslands: (1) Linear regression of mean Ellenberg N‐, R‐ and F‐values; (2) Linear regression of ordination scores derived from Non‐metric Multidimensional Scaling (NMDS) of vegetation data; and (3) Multiple linear regression (MLR) of soil variables. Models were evaluated by cross‐validation and by validation with additional data collected in 2002. Results: The best predictions were obtained with models based on species composition. Ellenberg N‐values and NMDS scores performed equally well and better than models based on Ellenberg R‐ or F‐values. Predictions based on soil variables were least accurate. When tested with data from 2002, models based on Ellenberg N‐values or on NMDS scores accurately predicted the productivity rank order of sites, but not the actual herbage yield of particular sites. Conclusions: Mean Ellenberg N‐values, which are easy to calculate, are as accurate as ordination scores in predicting herbage yield from plant species composition. In contrast, models based on soil variables may be useful for generating hypotheses about the factors limiting herbage yield, but not for prediction. We support the view that Ellenberg N‐values should be called productivity values rather than nitrogen values.
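
The "easy to calculate" mean N-value and the simple linear regression on it can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say whether means were unweighted or cover-weighted, so both variants are shown; species without a value are skipped; and the species names and values below are hypothetical placeholders, not real Ellenberg values.

```python
def mean_indicator_value(plot, indicator_values, weighted=False):
    """Mean Ellenberg-type indicator value of one plot (releve).

    plot: dict species -> cover or abundance.
    indicator_values: dict species -> indicator value (e.g. an N-value).
    Species without a value (e.g. indifferent species) are skipped.
    """
    scored = [(sp, cover) for sp, cover in plot.items()
              if sp in indicator_values and cover > 0]
    if not scored:
        raise ValueError("no species with an indicator value in this plot")
    if weighted:
        total = sum(cover for _, cover in scored)
        return sum(indicator_values[sp] * cover for sp, cover in scored) / total
    return sum(indicator_values[sp] for sp, _ in scored) / len(scored)

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical species and values, for illustration only.
n_values = {"species_a": 3, "species_b": 7, "species_c": 5}
plot = {"species_a": 10, "species_b": 30, "species_d": 5}  # species_d has no N-value
print(mean_indicator_value(plot, n_values))                 # 5.0
print(mean_indicator_value(plot, n_values, weighted=True))  # 6.0
```

With one mean N-value per plot and a measured yield per plot, `fit_line(mean_n_values, yields)` gives the intercept and slope of model (1); the names `mean_n_values` and `yields` are placeholders.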

3.
A fine-scaled approach for predicting soil acidity from plant species in a spatially limited area (?epú?ky Nature Reserve, Slovakia) is presented here. This approach copes with two specific limitations: (i) a limited pool of vegetation data may make the predictions too sensitive to a lack of species information, and (ii) the predictions may be sensitive to the narrow pH gradient. Vegetation relevés and soil reaction (pH-H2O and pH-CaCl2) were systematically recorded. A set of species indicator values and amplitudes was calibrated against measured pH using Weighted Averaging (WA), HOF modelling and Non-Metric Multidimensional Scaling (NMDS), alongside Ellenberg indicator values. Two prediction methods were tested: (i) WA and (ii) Amplitude Overlap (AO). WA prediction with Ellenberg's and WA-calibrated species indicator values was the most powerful technique (R² = 68.4–68.7% and 53.4–59.1% for pH-CaCl2 and pH-H2O, respectively). WA prediction with HOF-based indicator values was less effective (R² = 61.7% and 50.7%) because of the loss of species information: with HOF modelling, many species are treated as indifferent or too rare. The NMDS method does not bring any significant gain to the calibration, though it avoids the lack of species information. The AO method proved less powerful under the studied circumstances because it is sensitive both to the lack of species information and to the truncation of species responses. The results show that a spatially explicit approach can provide significant indices for estimating changes in soil acidity, with pH-CaCl2 predicted better than pH-H2O.
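
Weighted Averaging works in two steps, and a minimal sketch helps show why it depends on species information. Calibration sets each species' optimum to the abundance-weighted mean pH of the sites where it occurs. Prediction then sets a site's pH to the abundance-weighted mean of the optima of the species present. This sketch assumes plain (unshrunk) WA: the deshrinking regression that WA calibration usually adds is omitted, and all names and values are hypothetical.

```python
def wa_optima(sites, env):
    """WA calibration: species optimum = abundance-weighted mean of env.

    sites: list of dicts species -> abundance (one dict per releve).
    env: list of measured values (e.g. pH), one per site.
    """
    num, den = {}, {}
    for site, x in zip(sites, env):
        for sp, a in site.items():
            num[sp] = num.get(sp, 0.0) + a * x
            den[sp] = den.get(sp, 0.0) + a
    return {sp: num[sp] / den[sp] for sp in num if den[sp] > 0}

def wa_predict(site, optima):
    """WA prediction: abundance-weighted mean of the optima of species present."""
    pairs = [(optima[sp], a) for sp, a in site.items() if sp in optima and a > 0]
    total = sum(a for _, a in pairs)
    if total == 0:
        raise ValueError("no calibrated species present in this site")
    return sum(u * a for u, a in pairs) / total

# Hypothetical training releves and their measured pH values.
training = [{"a": 1}, {"a": 1, "b": 1}, {"b": 3}]
ph = [4.0, 5.0, 6.0]
optima = wa_optima(training, ph)             # {'a': 4.5, 'b': 5.75}
print(wa_predict({"a": 1, "b": 1}, optima))  # 5.125
```

A site whose species all lack calibrated optima cannot be predicted at all, which is one concrete form of the "lack of species information" problem described above.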

4.
Moist lower montane vegetation has rarely been classified beyond broad zonational belts over large altitudinal ranges because of its highly diverse species composition and structure. This study shows that it is possible to further classify such forest types within Bwindi‐Impenetrable National Park (BINP), and that these assemblages can be explained by a combination of environmental conditions and past management. Botanical and environmental data were collected along some 4000 m of linear transects in the area surrounding Mubwindi Swamp, BINP. Ordination using Nonmetric Multidimensional Scaling (NMDS) and classification using Two‐way Indicator Species Analysis (TWINSPAN) successfully identified four different species assemblages. These forest types were then named on the basis of the ecological characteristics of the species within each group, and of the environmental conditions influencing the distribution and past disturbance of the forest. The two techniques agreed for three of the four forest types identified. Analysis using an environmental overlay showed a significant association between forest type and altitude. The results of this study indicate that a regional classification of forest types within the moist lower montane forest belt using only tree species is possible, and that the forest types identified can be explained by environmental conditions and past management.

5.
Background: Most studies on tropical bryophytes deal with epiphytic species. This is the first ecological study of tropical forests that focuses specifically on terrestrial bryophytes.

Aim: To investigate the differences between slope and ridge environments in upper montane forests of southern Ecuador in terms of species diversity (richness, abundance), species composition and life forms of terrestrial bryophytes.

Methods: We used Non-metric Multidimensional Scaling (NMDS) to group bryophyte relevés by study location, habitat type and exposure class. Species indicator values were calculated and compared for different habitats.

Results: In total, 140 species were recorded, the majority being liverworts. NMDS analyses and Mantel correlations clearly separated slope relevés from ridge relevés, and sunny from shaded microhabitats on ridges. Bryophyte life forms also showed different distribution patterns in slope and ridge habitats. Mosses were more prominent in sunny than in shaded microhabitats.

Conclusions: Environmental differentiation between ridges and slopes, and small-scale variation in microclimatic conditions caused by differences in exposure, were stronger predictors of species richness and composition than geographical distance between study sites.
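
The comparison between environmental and geographical distance rests on Mantel correlations. A minimal sketch of a simple Mantel test follows, assuming Pearson correlation on the upper triangles of two distance matrices and a one-sided permutation p-value; the study's exact settings (correlation type, number of permutations, partial tests) are not given in the abstract, and all names and data here are hypothetical.

```python
import random

def upper_triangle(m):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(d1, d2, permutations=999, seed=0):
    """Simple Mantel test: Pearson r between two distance matrices and a
    one-sided permutation p-value (rows and columns of d2 permuted together)."""
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Toy example: site positions along a transect and a dissimilarity that tracks them.
pos = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in pos] for a in pos]
comp = [[2.0 * v for v in row] for row in geo]
print(mantel(geo, comp))
```

Permuting whole rows and columns together (rather than individual cells) keeps each permuted matrix a valid distance matrix, which is what makes the permutation distribution appropriate for non-independent pairwise distances.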

6.

Background  

Life processes are determined by the organism's genetic profile and multiple environmental variables. However, the interaction between these factors is inherently nonlinear [1]. Microarray data are one representation of the nonlinear interactions among genes, and between genes and environmental factors. Still, most microarray studies use linear methods to interpret nonlinear data. In this study, we apply Isomap, a nonlinear dimensionality reduction method, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets.

7.
MOTIVATION: Genome-wide gene expression measurements, as currently determined by the microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance. RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant, visualizations when applying multidimensional scaling. This suggests the Isomap algorithm as a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
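
The "approximate geodesic distance" that Isomap uses can be sketched in a few lines: connect each point to its k nearest neighbours by Euclidean distance, then take shortest paths through that graph (Floyd-Warshall here). This is a minimal stdlib sketch, not the authors' implementation; the choice of k and the toy half-circle data are assumptions, and the final step of Isomap (classical multidimensional scaling on these distances) is not shown.

```python
import math

def geodesic_distances(points, k):
    """Approximate geodesic distances (the first two steps of Isomap).

    1. Connect each point to its k nearest neighbours (Euclidean), symmetrically.
    2. Run Floyd-Warshall all-pairs shortest paths on that weighted graph.
    Pairs left unconnected keep distance infinity.
    """
    n = len(points)
    inf = float("inf")
    D = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j)
                         for j in range(n) if j != i)[:k]
        for d, j in nearest:
            D[i][j] = D[j][i] = min(D[i][j], d)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D

# Five points spread over a half circle: a simple curved "manifold".
pts = [(math.cos(t * math.pi / 4), math.sin(t * math.pi / 4)) for t in range(5)]
D = geodesic_distances(pts, k=2)
print(math.dist(pts[0], pts[4]))  # straight-line (Euclidean) distance: 2.0
print(D[0][4])                    # graph distance along the arc, about 2.83
```

The endpoints of the arc are 2.0 apart in a straight line but about 2.83 apart along the graph, so points that are close only "through the gap" of a curved manifold are no longer treated as similar.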

8.
Abstract. This article investigates whether the Braun‐Blanquet abundance/dominance (AD) scores that commonly appear in phytosociological tables can properly be analysed by conventional multivariate analysis methods such as Principal Components Analysis and Correspondence Analysis. The answer is a definite NO. The source of problems is that the AD values express species performance on a scale, namely the ordinal scale, on which differences are not interpretable. There are several arguments suggesting that no matter which methods have been preferred in contemporary numerical syntaxonomy and why, ordinal data should be treated in an ordinal way. In addition to the inadmissibility of arithmetic operations with the AD scores, these arguments include interpretability of dissimilarities derived from ordinal data, consistency of all steps throughout the analysis and universality of the method which enables simultaneous treatment of various measurement scales. All the ordination methods that are commonly used, for example, Principal Components Analysis and all variants of Correspondence Analysis as well as standard cluster analyses such as Ward's method and group average clustering, are inappropriate when using AD data. Therefore, the application of ordinal clustering and scaling methods to traditional phytosociological data is advocated. Dissimilarities between relevés should be calculated using ordinal measures of resemblance, and ordination and clustering algorithms should also be ordinal in nature. A good ordination example is Non‐metric Multidimensional Scaling (NMDS) as long as it is calculated from an ordinal dissimilarity measure such as the Goodman & Kruskal γ coefficient, and for clustering the new OrdClAn‐H and OrdClAn‐N methods.
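
The Goodman & Kruskal γ coefficient respects the ordinal scale because it only asks, for each pair of species, whether two relevés rank them the same way. A minimal sketch follows: γ = (C − D)/(C + D), where C and D count concordant and discordant species pairs and tied pairs are ignored. The code-to-rank mapping, the use of "0" for absence, and the (1 − γ)/2 rescaling to a dissimilarity are assumptions of this sketch, not necessarily the article's exact definitions; extended scales with 2m, 2a and 2b are not covered.

```python
from itertools import combinations

# Braun-Blanquet abundance/dominance codes mapped to ranks. Only the ORDER is used;
# "0" for an absent species is an assumption of this sketch.
AD_RANK = {"0": 0, "r": 1, "+": 2, "1": 3, "2": 4, "3": 5, "4": 6, "5": 7}

def gk_gamma(x, y):
    """Goodman & Kruskal's gamma: (concordant - discordant) / (concordant + discordant).
    Pairs tied in either variable are ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])  # only the sign of this product is used
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

def gamma_dissimilarity(releve_a, releve_b):
    """Dissimilarity in [0, 1] between two releves scored on the same species list.
    The (1 - gamma) / 2 rescaling is an assumption of this sketch."""
    x = [AD_RANK[c] for c in releve_a]
    y = [AD_RANK[c] for c in releve_b]
    return (1 - gk_gamma(x, y)) / 2

print(gamma_dissimilarity(["r", "1", "3", "5"], ["+", "2", "4", "5"]))  # 0.0
print(gamma_dissimilarity(["r", "1", "3", "5"], ["5", "3", "1", "r"]))  # 1.0
```

The first pair of relevés gets dissimilarity 0 even though no two codes match, because both rank the four species in the same order; an arithmetic measure such as Euclidean distance on the codes would not give this result.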

9.
B. Peco. Plant Ecology, 1989, 83(1–2): 269–276
Pasture vegetation in an open woodland of Quercus rotundifolia subjected to periodic ploughing was sampled in spring during 8 consecutive years. The frequency of herbaceous species was recorded in a total of 69 permanent plots located on 5 adjacent sites with similar lithology, slope and orientation but differing in time since the previous ploughing. Vegetation dynamics, expressed as trajectories of permanent plots in a non-metric multidimensional scaling space, have been modelled in terms of environmental variables. By fitting a generalized linear model, the dynamics are shown to be related to years since last ploughing, geographical location of plots, total annual rainfall and November rainfall. Meteorological patterns of the sampling period are also described.
Abbreviations: GLM = Generalized Linear Model; NMDS = Non-metric Multi-Dimensional Scaling; UPGMA = Unweighted Pair-Group Method using Arithmetic Averages

10.
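To study the characteristics of the macrobenthic communities in the waters adjacent to the Yangtze River Estuary during the wet season, we analyzed macrobenthos survey data collected in the waters adjacent to the Yangtze River Estuary in June, August and October 2012 using two-way analysis of variance (Two-Way ANOVA), cluster analysis (Cluster), Non-metric Multidimensional Scaling (NMDS) and Abundance-Biomass Comparison Curves (ABC Curves). A total of 181 macrobenthic species were recorded, including 82 polychaetes, 46 crustaceans, 31 molluscs, 11 echinoderms and 11 species from other groups. Macrobenthic abundance, biomass, species richness and diversity index showed no significant differences among months or across space. The evenness index did not differ significantly among months, but was significantly higher offshore than nearshore. In June, August and October, the macrobenthos were divided into 3–4 assemblages at the 20% similarity level, and similarity among stations was low. The ABC curves indicated that offshore macrobenthic communities were less disturbed than nearshore ones. Under the continuing influence of human activities, macrobenthic species composition in the waters adjacent to the Yangtze River Estuary has changed dramatically, and species are unevenly distributed in space.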
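
ABC curves plot cumulative percentage abundance and cumulative percentage biomass against species rank, and Clarke's W statistic is a common one-number summary of how the two curves lie relative to each other: W = Σ(Bᵢ − Aᵢ) / [50(S − 1)], where S is the number of species and Bᵢ and Aᵢ are cumulative percentages at rank i, each curve ranked separately. The abstract does not say whether W was computed, so this is an illustrative stdlib sketch with hypothetical names and data. W runs from −1 to +1, and positive values (biomass curve above abundance curve) are conventionally read as less disturbed.

```python
def k_dominance(values):
    """Cumulative % curve with species ranked in decreasing order of value."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    curve, running = [], 0.0
    for v in ranked:
        running += v
        curve.append(100.0 * running / total)
    return curve

def w_statistic(abundance, biomass):
    """Clarke's W for ABC curves: sum_i (B_i - A_i) / (50 * (S - 1)).
    B and A are ranked separately; W > 0 means the biomass curve lies above
    the abundance curve (usually read as less disturbed), W < 0 the reverse."""
    if len(abundance) != len(biomass) or len(abundance) < 2:
        raise ValueError("need abundance and biomass for the same >= 2 species")
    a = k_dominance(abundance)
    b = k_dominance(biomass)
    return sum(bi - ai for ai, bi in zip(a, b)) / (50.0 * (len(a) - 1))

# Hypothetical station: 4 species with equal counts, but one species holds all biomass.
print(w_statistic([10, 10, 10, 10], [5.0, 0.0, 0.0, 0.0]))  # 1.0
```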

11.
Benthic communities of macroinvertebrates, algae, and microorganisms were collected concurrently using a Surber sampler (30 × 30 cm²; 300 μm mesh), a brush (5 × 5 cm²), and a syringe (100 mL; for Denaturing Gradient Gel Electrophoresis), respectively, to determine the ecological integrity of streams with different levels of pollution. Macroinvertebrates provided a clearer representation of the pollution gradient, while algae and microorganisms showed a broader species distribution, including at sites severely polluted with heavy metals. Species associations among different taxa were presented on the Self-Organizing Map (SOM) and by Nonmetric Multidimensional Scaling (NMDS) based on environmental factors. After screening, the indicator species visualized on the SOM represented a wider range of environmental impacts and were more illustrative for benthic macroinvertebrates at the least polluted sites. In contrast, NMDS presented species more closely associated with the overall variance of communities under severe pollution, mainly microorganisms and algae. Using SOM and NMDS in combination for multi-taxa community analysis would provide a comprehensive assessment of ecological integrity in streams.

12.
VEGAN, a package of R functions for community ecology
Abstract. VEGAN adds vegetation analysis functions to the general‐purpose statistical program R. Both R and VEGAN can be downloaded for free. VEGAN implements several ordination methods, including Canonical Correspondence Analysis and Non‐metric Multidimensional Scaling, vector fitting of environmental variables, randomization tests, and various other analyses of vegetation data. It can be used for large data sets. Graphical output can be customized using the R language's extensive graphics capabilities. VEGAN is appropriate for routine and research use, if you are willing to learn some R.

13.
Spatial heterogeneity in the plant species composition of tropical forests is expected to influence animal species abundance and composition because vegetation constitutes the primary habitat feature for forest animals. Floristic variation is tied to variation in soils, so edaphic properties should ultimately influence animal species composition as well. The study of covariation in floristic and faunistic turnover has been hindered by the difficulty of completing coordinated surveys in hyperdiverse tropical communities, but this can be overcome with the use of a few plant taxa that function as surrogates for general floristic turnover. We used avian and plant transect surveys and soil sampling in a western Amazonian upland (terra firme) forest landscape to test whether spatial variation in bird community composition is associated with floristic turnover and corresponding edaphic gradients. Partial Mantel tests and Non‐metric Multidimensional Scaling showed floristic distinctiveness between two forest types closely associated with differences in soil cation concentrations, and differences in both floristic composition and cation concentrations were further linked to compositional differences in avian species, independent of geographic distances among sites. Ten percent of bird species included in Indicator Species Analyses showed significant associations with one of the two forest types. The upland forest types that we sampled, each corresponding to a different geological formation, are intermediate relative to edaphically extreme environments in the region. Models of avian diversification should take into account this environmental heterogeneity, as should conservation planning approaches that aim to represent faunal diversity. Abstract in Spanish is available in the online version of this article.
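
Indicator Species Analysis is often run with the Dufrêne & Legendre IndVal statistic; the abstract does not name the variant, so treat that as an assumption. IndVal multiplies specificity (the species' mean abundance in a group divided by the sum of its group means) by fidelity (the fraction of the group's sites where it occurs), scaled to 0–100. This stdlib sketch computes only the statistic; the permutation test that decides significance is omitted, and the names and counts are hypothetical.

```python
def indval(sites, groups, species):
    """Dufrene & Legendre (1997) indicator value of one species for each group.

    IndVal = 100 * A * B, where A = the species' mean abundance in the group
    divided by the sum of its group means (specificity), and B = the fraction
    of the group's sites in which it occurs (fidelity).
    sites: list of dicts species -> abundance; groups: one label per site.
    The permutation test for significance is not shown.
    """
    labels = sorted(set(groups))
    mean_abund, fidelity = {}, {}
    for g in labels:
        values = [site.get(species, 0.0) for site, lab in zip(sites, groups) if lab == g]
        mean_abund[g] = sum(values) / len(values)
        fidelity[g] = sum(1 for v in values if v > 0) / len(values)
    total = sum(mean_abund.values())
    if total == 0:
        return {g: 0.0 for g in labels}
    return {g: 100.0 * mean_abund[g] / total * fidelity[g] for g in labels}

# Hypothetical bird counts at four sites in two forest types.
counts = [{"sp1": 2}, {}, {"sp1": 2}, {"sp1": 2}]
forest = ["type_A", "type_A", "type_B", "type_B"]
print(indval(counts, forest, "sp1"))  # type_A about 16.7, type_B about 66.7
```

A species reaches IndVal = 100 for a group only when it occurs at every site of that group and nowhere else, which is why the statistic suits questions like "which birds are characteristic of one forest type".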

14.

The effects of sea level rise and coastal saltwater intrusion on wetland plants can extend well above the high-tide line due to drought, hurricanes, and groundwater intrusion. Research has examined how coastal salt marsh plant communities respond to increased flooding and salinity, but coastal systems farther inland have received less attention. The aim of this study was to identify whether ground layer plants exhibit threshold responses to salinity exposure. We used two vegetation surveys throughout the Albemarle-Pamlico Peninsula (APP) of North Carolina, USA to assess vegetation in a low elevation landscape (≤ 3.8 m) experiencing high rates of sea level rise (3–4 mm/year). We examined the primary drivers of community composition change using Non-metric Multidimensional Scaling (NMDS) and used Threshold Indicator Taxa Analysis (TITAN) to detect thresholds of compositional change based on indicator taxa, in response to potential indicators of exposure to saltwater (Na⁺ and Σ Ca²⁺ + Mg²⁺) and to elevation. Salinity and elevation explained 64% of the variation in community composition, and we found two salinity thresholds for both soil Na⁺ (265 and 3843 g Na⁺/g) and Ca²⁺ + Mg²⁺ (42 and 126 µeq/g) at which major changes in community composition occur on the APP. Similar sets of species showed sensitivity to these different metrics of salt exposure. Overall, our results showed that ground layer plants can be used as reliable indicators of salinity thresholds in coastal wetlands. These results can be used for monitoring salt exposure of ecosystems and for identifying areas at risk of undergoing future community shifts.

15.
Five statistically appropriate multivariate analyses were applied to the same data on burrowing in the sea hare Aplysia brasiliana to (1) identify homogeneous subject-related subgroups within a heterogeneous sample, and (2) compare the extent of congruency among the analyses in terms of the number of extracted subgroups and each subject's placement within the subgroups. Raw scores from 32 subjects on ten burrowing parameters were origin-corrected, standardized to z-scores, and normalized to facilitate comparisons among the analyses. The methods extracted between one and five subgroups, indicating differences in their sensitivity to sampling variability. These results suggest that selecting a biologically interpretable analysis is the subjective aspect of quantitative data treatment. Q-factor analysis (three subgroups) and linear typal analysis (four subgroups) yielded the most biologically interpretable subgroups for these data. Multidimensional scaling (one group) and principal-components analysis (two subgroups) tended to “lump” subjects, while simple distance-function cluster analysis (five subgroups) tended to “split” subjects into additional groups. As a diagnostic tool, multivariate analyses provide insight into underlying dimensions of individual variation and help generate testable hypotheses for guiding future research.

16.
Nonlinear ordination of a permanent plot of evergreen broad-leaved forest on Simianshan, Jiangjin, Sichuan
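This study examines the evergreen broad-leaved forest in a 1-ha permanent plot on Simianshan, Jiangjin, Sichuan Province. Using importance-value data for the dominant tree species in 20 quadrats within the plot, we performed classification and ordination with principal component analysis (PCA), detrended principal component analysis (DPC) and non-metric multidimensional scaling (NMDS). PCA revealed that the quadrat coordinates were nonlinear, and the DPC test confirmed that this nonlinearity was pronounced. After 39 iterations, the NMDS analysis produced results that clearly reflect the relationships between the dominant species of the community and environmental factors, providing baseline data for future studies of secondary succession.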
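
NMDS iterates to reduce a badness-of-fit criterion, usually Kruskal's stress-1: the distances d in the ordination are regressed monotonically on the rank order of the original dissimilarities, giving disparities d̂, and stress-1 = √(Σ(d − d̂)² / Σd²). The sketch below, stdlib only and with hypothetical names and data, evaluates stress-1 for a given configuration using pool-adjacent-violators for the monotone regression and Kruskal's primary treatment of ties; the iterative optimisation itself (the 39 iterations reported above) is not shown, and the study's software may use a different stress variant.

```python
import math
from itertools import combinations

def isotonic(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress1(dissim, coords):
    """Kruskal's stress-1 of an ordination for a square dissimilarity matrix."""
    pairs = list(combinations(range(len(coords)), 2))
    dist = {p: math.dist(coords[p[0]], coords[p[1]]) for p in pairs}
    # Order pairs by dissimilarity; ties broken by configuration distance
    # (Kruskal's "primary" treatment of ties).
    pairs.sort(key=lambda p: (dissim[p[0]][p[1]], dist[p]))
    d = [dist[p] for p in pairs]
    dhat = isotonic(d)  # disparities: monotone regression of d on the rank order
    num = sum((a - b) ** 2 for a, b in zip(d, dhat))
    den = sum(a * a for a in d)
    return math.sqrt(num / den)

diss = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(stress1(diss, [[0.0], [1.0], [3.0]]))  # 0.0: distances preserve the rank order
print(stress1(diss, [[0.0], [3.0], [1.0]]))  # about 0.378: rank order violated
```

Only the rank order of the dissimilarities enters the calculation, which is what makes NMDS suitable when the relationship between dissimilarity and ordination distance is nonlinear, as the PCA and DPC results above suggest.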

17.
Insect pollinators are important for ecosystem stability. Habitat type plays a crucial role in the community composition, abundance, diversity, and species richness of pollinators. The present study in Shivapuri‐Nagarjun National Park explored the species richness and abundance of insect pollinators in four different habitats, and the role of environmental variables in determining the community composition of the pollinators. Data were collected from 1,500 m to 2,700 m using color pan traps and hand sweeping. Non‐Metric Multidimensional Scaling (NMDS) and Redundancy Analysis (RDA) were conducted to show the association between insect pollinators and environmental variables. The results demonstrated that species richness and abundance were higher (158) in the Open trail than in the other habitats. The distribution of pollinator species was most uniform in the Open trail, followed by the Grassland. Similarly, a strong positive correlation between flower resources and pollinator abundance (R² = 0.63, P < 0.001) was found. In conclusion, the Open trail harbors a rich insect pollinator fauna at lower elevations. The community structure of the pollinators was strongly influenced by the presence of flowers along the trails.

18.
Based on species endemism, three biodiversity centers, called “Ecological Corridors”, have been proposed as one of the main conservation strategies for the Atlantic Rain Forest. This study tested whether the organization of the social paper wasp assemblage fits those centers. A standardized protocol was used to sample the social paper wasp fauna. The structural organization was estimated by Nonmetric Multidimensional Scaling (NMDS) based on the Sørensen (qualitative data) and Morisita-Horn (quantitative data) similarity indices. Regression models were applied to the site scores on the first NMDS axes, to the latitudinal and altitudinal variations, and to the speciation and immigration probabilities predicted by the neutral theory for a metacommunity. Our results indicated that the social paper wasp assemblage is organized as a continuum, with two distinct biodiversity centers. The organization of the assemblage along the gradient depended on latitudinal and altitudinal variations and their interactions, and also on the speciation and immigration probabilities. Several studies have demonstrated that the current biodiversity patterns of the Atlantic Forest might be explained by the past climate and, consequently, by the connection between the Amazon and the Atlantic Forest. In addition, speciation and immigration probabilities strongly influence the compositional and structural variations of the social paper wasp assemblage along the latitudinal gradient.
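
The two similarity indices differ in what they see. Sørensen uses presence/absence only: 2 × (shared species) / (species in A + species in B). Morisita-Horn uses abundances, C = 2Σxᵢyᵢ / [(dₓ + dᵧ)XY] with X = Σxᵢ and dₓ = Σxᵢ²/X² (likewise for y), and is dominated by the commonest species. A minimal stdlib sketch follows; the function names and site data are hypothetical.

```python
def sorensen(a, b):
    """Sorensen similarity of two species lists: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / (len(a) + len(b))

def morisita_horn(x, y):
    """Morisita-Horn similarity of two abundance dicts (species -> count).

    C = 2 * sum(x_i * y_i) / ((dx + dy) * X * Y), with X = sum(x_i),
    dx = sum(x_i^2) / X^2, and likewise for y.
    """
    X, Y = sum(x.values()), sum(y.values())
    if X == 0 or Y == 0:
        raise ValueError("both samples need at least one individual")
    dx = sum(v * v for v in x.values()) / X ** 2
    dy = sum(v * v for v in y.values()) / Y ** 2
    shared = sum(x[s] * y[s] for s in x.keys() & y.keys())
    return 2 * shared / ((dx + dy) * X * Y)

# Hypothetical wasp counts at two sites.
site1 = {"sp1": 10, "sp2": 5, "sp3": 1}
site2 = {"sp2": 4, "sp3": 2, "sp4": 8}
print(sorensen(site1, site2))       # 2 shared species out of 3 + 3: about 0.667
print(morisita_horn(site1, site2))  # about 0.213: the dominant species differ
```

Here the two sites share two of their three species, so Sørensen is fairly high, but the species that dominate each site (sp1 and sp4) are not shared, so Morisita-Horn is low. Running NMDS on both indices, as the study did, captures both views of the assemblage.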

19.
We construct embedded functional connectivity networks (FCN) from benchmark resting-state functional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia and healthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimensional Scaling, Isometric Feature Mapping, Diffusion Maps, Locally Linear Embedding and kernel PCA. Furthermore, based on key global graph-theoretic properties of the embedded FCN, we compare their classification potential using machine learning. We also assess the performance of two metrics that are widely used for the construction of FCN from fMRI, namely the Euclidean distance and the cross correlation metric. We show that diffusion maps with the cross correlation metric outperform the other combinations.
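
The two metrics compared in the abstract behave quite differently on raw signals, as a small sketch shows. It assumes that "cross correlation" means zero-lag Pearson correlation between regional time series (the abstract does not specify lags). The toy signals and function names are hypothetical. Two regions whose signals differ only in amplitude are perfectly correlated yet far apart in Euclidean terms. Once the series are z-scored (population standard deviation), the two metrics are tied by the identity ‖z_a − z_b‖² = 2n(1 − r), so they disagree only when they are applied to unstandardized data.

```python
import math

def zscore(ts):
    """Standardize a time series to mean 0 and (population) standard deviation 1."""
    n = len(ts)
    m = sum(ts) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in ts) / n)
    if sd == 0:
        raise ValueError("constant time series")
    return [(v - m) / sd for v in ts]

def correlation(a, b):
    """Zero-lag Pearson correlation between two equal-length time series."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)

def connectivity(series, metric):
    """Square matrix of pairwise values between regional time series."""
    return [[metric(s, t) for t in series] for s in series]

rois = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]  # toy signals, one per region
corr = connectivity(rois, correlation)   # regions 0 and 1: r = 1 (perfectly coupled)
dist = connectivity(rois, math.dist)     # regions 0 and 1: sqrt(30), about 5.48 apart
# For z-scored series of length n: squared Euclidean distance = 2 * n * (1 - r)
n = len(rois[0])
print(math.dist(zscore(rois[0]), zscore(rois[2])) ** 2, 2 * n * (1 - corr[0][2]))
```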

20.
Question: What are the trends and patterns in the application of ordination techniques in vegetation science since 1990? Location: Worldwide literature analysis. Methods: Evaluation of five major journals of vegetation science; search of all ISI‐listed ecological journals. Data were analysed with ANCOVAs, Spearman rank correlations, GLMs, biodiversity indices and simple graphs. Results: The ISI search retrieved fewer papers that used ordinations than the manual evaluation of five selected journals. Both retrieval methods revealed a clear trend in increasing frequency of ordination applications from 1990 to the present. Canonical Correspondence Analysis was far more frequently detected by the ISI search than any other method. Applications such as Correspondence Analysis/Reciprocal Averaging and Detrended Correspondence Analysis have increasingly been used in studies published in “applied” journals, while Canonical Correspondence Analysis, Redundancy Analysis and Non‐Metric Multidimensional Scaling were more frequently used in journals focusing on more “basic” research. Overall, Detrended Correspondence Analysis was the most commonly applied method within the five major journals, although the number of publications slightly decreased over time. Use of Non‐Metric Multidimensional Scaling has increased over the last 10 years. Conclusion: The availability of suitable software packages has facilitated the application of certain techniques such as Non‐Metric Multidimensional Scaling. However, choices of ordination techniques are currently less driven by the constraints imposed by the software; there is also limited evidence that the choice of methods follows social considerations such as the need to use fashionable methods. Methodological diversity has been maintained or has even increased over time and reflects the researcher's need for diverse analytical tools suitable to address a wide range of questions.
