Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Although climate is known to be one of the key factors determining animal species distributions amongst others, projections of global change impacts on their distributions often rely on bioclimatic envelope models. Vegetation structure and landscape configuration are also key determinants of distributions, but they are rarely considered in such assessments. We explore the consequences of using simulated vegetation structure and composition as well as its associated landscape configuration in models projecting global change effects on Iberian bird species distributions. Both present-day and future distributions were modelled for 168 bird species using two ensemble forecasting methods: Random Forests (RF) and Boosted Regression Trees (BRT). For each species, several models were created, differing in the predictor variables used (climate, vegetation, and landscape configuration). Discrimination ability of each model in the present-day was then tested with four commonly used evaluation methods (AUC, TSS, specificity and sensitivity). The different sets of predictor variables yielded similar spatial patterns for well-modelled species, but the future projections diverged for poorly-modelled species. Models using all predictor variables were not significantly better than models fitted with climate variables alone for ca. 50% of the cases. Moreover, models fitted with climate data were always better than models fitted with landscape configuration variables, and vegetation variables were found to correlate with bird species distributions in 26-40% of the cases with BRT, and in 1-18% of the cases with RF. We conclude that improvements from including vegetation and its landscape configuration variables in comparison with climate only variables might not always be as great as expected for future projections of Iberian bird species.  相似文献   
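Three of the four evaluation measures named in this abstract (sensitivity, specificity and TSS) follow directly from a confusion matrix, so a short illustration may help. The sketch below is not the authors' code; the function name and the counts are hypothetical, and AUC is left out because it needs ranked predictions rather than counts.

```python
def discrimination_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, TSS) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true presences predicted as present
    specificity = tn / (tn + fp)  # share of true absences predicted as absent
    tss = sensitivity + specificity - 1.0  # True Skill Statistic, ranges -1..1
    return sensitivity, specificity, tss


# Hypothetical counts for a single species (not taken from the study).
sens, spec, tss = discrimination_metrics(tp=40, fp=5, tn=45, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} TSS={tss:.2f}")
```

Unlike raw accuracy, TSS weights errors on presences and absences equally, which is one reason it is often reported alongside AUC for presence/absence models.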

2.
In Australia and increasingly worldwide, methamphetamine is one of the most commonly seized drugs analysed by forensic chemists. The well-established GC/MS methods currently used to identify and quantify methamphetamine are lengthy and expensive, yet undercover police often request rapid analysis, which has prompted interest in developing a new analytical technique. Ninety-six illicit drug seizures containing methamphetamine (0.1%–78.6%) were analysed using Fourier Transform Infrared Spectroscopy with an Attenuated Total Reflectance attachment and chemometrics. Two Partial Least Squares models were developed, one using the principal infrared peaks of methamphetamine and the other a Hierarchical Partial Least Squares model. Both models were refined to select the variables most closely associated with the methamphetamine % vector. Both models performed well: the principal-peaks Partial Least Squares model had a Root Mean Square Error of Prediction of 3.8, an R² of 0.9779 and a lower limit of quantification of 7% methamphetamine, while the Hierarchical Partial Least Squares model had a lower limit of quantification of 0.3% methamphetamine, a Root Mean Square Error of Prediction of 5.2 and an R² of 0.9637. Such models offer rapid and effective methods for screening illicit drug samples to determine the percentage of methamphetamine they contain.
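The two headline statistics in this abstract, Root Mean Square Error of Prediction (RMSEP) and R², can be computed from paired reference and predicted values. The sketch below is illustrative only: the data are made up, and R² is taken here as the coefficient of determination, which is an assumption because the abstract does not say which definition the authors used.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over a validation set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical methamphetamine percentages (reference vs. model prediction).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmsep(reference, predicted), r_squared(reference, predicted))
```

RMSEP is expressed in the units of the response (here, percentage points of methamphetamine), so it should be read against the concentration range of the samples.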

3.
Aims: Preserving and restoring Tamarix ramosissima is urgently required in the Tarim Basin, Northwest China. Species distribution models are regularly used to predict the biogeographical distribution of species in conservation and other management activities. However, uncertainty in the data and models inevitably reduces their predictive power. The major purposes of this study are to assess the impacts of predictor variables and species distribution models on simulating T. ramosissima distribution, to explore the relationships between predictor variables and species distribution models, and to model the potential distribution of T. ramosissima in this basin. Methods: Three models, the generalized linear model (GLM), classification and regression tree (CART) and Random Forests, were selected and run on the BIOMOD platform. Presence/absence data for T. ramosissima in the Tarim Basin, derived from vegetation maps, were used as response variables. Climate, soil and digital elevation model (DEM) variables were divided into four datasets and used as predictors: (i) climate variables, (ii) soil, climate and DEM variables, (iii) principal component analysis (PCA)-based climate variables and (iv) PCA-based soil, climate and DEM variables. Important findings: The results indicate that predictor variables for species distribution models should be chosen carefully, because too many predictors can reduce predictive power. The effectiveness of using PCA to reduce the correlation among predictors and enhance modelling power depends on the chosen predictor variables and models. Our results imply that correlated predictors are better reduced before model fitting. The Random Forests model was more precise than the GLM and CART models. The best model for T. ramosissima was the Random Forests model with climate predictors alone.
Soil variables considered in this study could not significantly improve the model's prediction accuracy for T. ramosissima. The potential distribution area of T. ramosissima in the Tarim Basin is ~3.57 × 10⁴ km²; restoring T. ramosissima across this area has the potential to mitigate global warming and produce bioenergy.

4.
Aim Eleutherodactylus coqui (commonly known as the coqui) is a frog species native to Puerto Rico and non‐native in Hawaii. Despite its ecological and economic impacts, its potential range in Hawaii is unknown, making control and management efforts difficult. Here, we predicted the distribution potential of the coqui on the island of Hawaii. Location Puerto Rico and Hawaii. Methods We predicted its potential distribution in Hawaii using five biophysical variables derived from Moderate Resolution Imaging Spectroradiometer (MODIS) as predictors, presence/absence data collected from Puerto Rico and Hawaii and three classification methods – Classification Trees (CT), Random Forests (RF) and Support Vector Machines (SVM). Results Models developed separately using data from the native range and the invaded range predicted potential coqui habitats in Hawaii with high performance. Across the three classification methods, mean area under the ROC curve (AUC) was 0.75 for models trained using the native range data and 0.88 for models trained using the invaded range data. We achieved the highest AUC value of 0.90 using RF for models trained with invaded range data. Main conclusions Our results showed that the potential distribution of coquis on the island of Hawaii is much larger than its current distribution, with RF predicting up to 49% of the island as suitable coqui habitat. Predictions also show that most areas with an elevation between 0 and 2000 m are suitable coqui habitats, whereas the cool and dry high elevation areas beyond 2000 m elevation are unsuitable. Results show that MODIS‐derived biophysical variables are capable of characterizing coqui habitats in Hawaii.  相似文献   

5.
ABSTRACT: BACKGROUND: Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional. RESULTS: RF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions. CONCLUSIONS: While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.  相似文献   

6.
Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team-science-based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationships between different drug sensitivities during model generation. This application motivates the development of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference between the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independently of the marginal distributions, and that the copula-based multivariate random forest framework can provide more accurate prediction and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases.

7.
Since 1996 when Highly Pathogenic Avian Influenza type H5N1 first emerged in southern China, numerous studies sought risk factors and produced risk maps based on environmental and anthropogenic predictors. However little attention has been paid to the link between the level of intensification of poultry production and the risk of outbreak. This study revised H5N1 risk mapping in Central and Western Thailand during the second wave of the 2004 epidemic. Production structure was quantified using a disaggregation methodology based on the number of poultry per holding. Population densities of extensively- and intensively-raised ducks and chickens were derived both at the sub-district and at the village levels. LandSat images were used to derive another previously neglected potential predictor of HPAI H5N1 risk: the proportion of water in the landscape resulting from floods. We used Monte Carlo simulation of Boosted Regression Trees models of predictor variables to characterize the risk of HPAI H5N1. Maps of mean risk and uncertainty were derived both at the sub-district and the village levels. The overall accuracy of Boosted Regression Trees models was comparable to that of logistic regression approaches. The proportion of area flooded made the highest contribution to predicting the risk of outbreak, followed by the densities of intensively-raised ducks, extensively-raised ducks and human population. Our results showed that as little as 15% of flooded land in villages is sufficient to reach the maximum level of risk associated with this variable. The spatial pattern of predicted risk is similar to previous work: areas at risk are mainly located along the flood plain of the Chao Phraya river and to the south-east of Bangkok. Using high-resolution village-level poultry census data, rather than sub-district data, the spatial accuracy of predictions was enhanced to highlight local variations in risk. Such maps provide useful information to guide intervention.  相似文献   

8.
Species distribution models and consensus models allow knowing the distribution of species in large areas where there is no field data and identifying the most important drivers for those distributions. In this study, seven individual models were used to obtain a consensus model to determine the potential distribution for six freshwater fish species in several watersheds of Northern Spain. Moreover, three different methods of model evaluation were used for performance comparison. Fish data were obtained from databases provided by different organisms related to aquatic systems containing information on 759 field sites sampled between October 2002 and June 2011 using electrofishing techniques. Dependent variables were obtained after filtering field sites according to a human pressure gradient analysis, while independent variables were derived from a Synthetic River Network for the study area. The ‘best’ individual models were obtained using Random Forest, Generalized Boosted Models and Generalized Additive Models, but with differing results among species and evaluation methods. The different consensus models revealed a high degree of adjustment between modelled and observed data. The most important factors related to fish distributions were the width of the valley floor, mean annual flow, average catchment elevation, distance to the sea, and total catchment area. The importance and critical limits of presence‐absence for these key variables differed among species. Use of these models could assist in the prioritization and selection of specific catchment and river reach actions for fish population management, restoration and/or conservation.  相似文献   

9.
The Food and Drug Administration (FDA) initiative of Process Analytical Technology (PAT) encourages the monitoring of biopharmaceutical manufacturing processes by innovative solutions. Raman spectroscopy and the chemometric modeling tool partial least squares (PLS) have been applied to this aim for monitoring cell culture process variables. This study compares the chemometric modeling methods of Support Vector Machine radial (SVMr), Random Forests (RF), and Cubist to the commonly used linear PLS model for predicting cell culture components—glucose, lactate, and ammonia. This research is performed to assess whether the use of PLS as standard practice is justified for chemometric modeling of Raman spectroscopy and cell culture data. Model development data from five small-scale bioreactors (2 × 1 L and 3 × 5 L) using two Chinese hamster ovary (CHO) cell lines were used to predict against a manufacturing scale bioreactor (2,000 L). Analysis demonstrated that Cubist predictive models were better for average performance over PLS, SVMr, and RF for glucose, lactate, and ammonia. The root mean square error of prediction (RMSEP) of Cubist modeling was acceptable for the process concentration ranges of glucose (1.437 mM), lactate (2.0 mM), and ammonia (0.819 mM). Interpretation of variable importance (VI) results theorizes the potential advantages of Cubist modeling in avoiding interference of Raman spectral peaks. Predictors/Raman wavenumbers (cm−1) of interest for individual variables are X1139–X1141 for glucose, X846–X849 for lactate, and X2941–X2943 for ammonia. These results demonstrate that other beneficial chemometric models are available for use in monitoring cell culture with Raman spectroscopy.  相似文献   

10.
Fan Chao, Liu Diwei, Huang Rui, Chen Zhigang, Deng Lei 《BMC Bioinformatics》2016,17(1):85-95
Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although some methods have been presented to the protein solvent accessibility prediction in recent years, the performance is far from satisfactory. In this work, we propose PredRSA, a computational method that can accurately predict relative solvent accessible surface area (RSA) of residues by exploring various local and global sequence features which have been observed to be associated with solvent accessibility. Based on these features, a novel and efficient approach, Gradient Boosted Regression Trees (GBRT), is first adopted to predict RSA. Experimental results obtained from 5-fold cross-validation based on the Manesh-215 dataset show that the mean absolute error (MAE) and the Pearson correlation coefficient (PCC) of PredRSA are 9.0 % and 0.75, respectively, which are better than that of the existing methods. Moreover, we evaluate the performance of PredRSA using an independent test set of 68 proteins. Compared with the state-of-the-art approaches (SPINE-X and ASAquick), PredRSA achieves a significant improvement on the prediction quality. Our experimental results show that the Gradient Boosted Regression Trees algorithm and the novel feature combination are quite effective in relative solvent accessibility prediction. The proposed PredRSA method could be useful in assisting the prediction of protein structures by applying the predicted RSA as useful restraints.  相似文献   

11.
Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.  相似文献   

12.
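Zhai Tianqing, Li Xinhai 《Acta Ecologica Sinica》2012,32(8):2361-2370
Uncertainty about climate change itself and about species-environment relationships makes research in climate change biology highly variable. To reduce this uncertainty, researchers have begun to study species' responses to climate change by comparing ensembles of models. Taking the crested ibis (Nipponia nippon) as the study species, we introduce the characteristics of this ensemble comparison approach. The crested ibis was once critically endangered, and its population is now recovering rapidly; however, its range is still small, and climate change may pose a new threat to the species. We applied nine models in BIOMOD with five climatic factors: annual minimum temperature, annual maximum temperature, temperature seasonality, total annual precipitation and precipitation seasonality. Based on the A2a emission scenario of the CGCM2 climate model in the WorldClim climate data, we calculated the current (1950–2000) suitable habitat of the crested ibis and its potential habitat range for three periods: 2020, 2050 and 2080. The results show that the potential habitat of the crested ibis will gradually shift northward, with the habitat centre moving away from the current nature reserve. A long-term conservation strategy for the crested ibis is therefore necessary. The nine models differed in their predictions, variable weights and goodness-of-fit indices, reflecting the uncertainty of the models themselves. The biological effects of climate change are complex, and comparing multiple models can reduce model-induced error as far as possible.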

13.
Concentration addition (CA) and independent action (IA) models are often applied to estimate the mixture toxicity of similarly and dissimilarly acting chemicals, respectively. An integrated addition model (IAM), called the “integrated CA with IA based on a multiple linear regression (ICIM) model”, was recently proposed for predicting the additive toxicity of non-interactive mixtures regardless of whether the mixture components have similar, dissimilar, or both similar and dissimilar modes of action. In the ICIM, the experimentally measured effective concentrations of mixtures are the response variable, and the values estimated by CA and IA are the predictor variables. However, the existing ICIM model, which employs ordinary least squares regression, may suffer from multicollinearity (i.e., a linear relationship between predictor variables). The objectives of this study were therefore to develop and evaluate a Partial Least Squares-based IAM (PLS-IAM), both to overcome the multicollinearity problem and to combine CA and IA into an IAM using the latent variable that accounts for most of the variation in the response. Using four test datasets, this study showed that the PLS-IAM overall outperformed the reference models, including the CA, IA, and ICIM models.
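For readers unfamiliar with the two reference models, the textbook forms of CA and IA can be sketched as follows. This is a generic illustration and not the ICIM or PLS-IAM procedure from the study; the function names and all numbers are hypothetical.

```python
def concentration_addition(fractions, ecx_values):
    """CA: mixture ECx = 1 / sum(p_i / ECx_i), with p_i the mixture fractions."""
    return 1.0 / sum(p / ec for p, ec in zip(fractions, ecx_values))


def independent_action(effects):
    """IA: mixture effect = 1 - prod(1 - E_i), with E_i fractional effects in [0, 1]."""
    unaffected = 1.0
    for effect in effects:
        unaffected *= (1.0 - effect)
    return 1.0 - unaffected


# Hypothetical binary mixture: equal fractions, EC50 values of 10 and 40 mg/L.
print(concentration_addition([0.5, 0.5], [10.0, 40.0]))  # mixture EC50 under CA
# Hypothetical single-substance effects of 20% and 50% at their mixture concentrations.
print(independent_action([0.2, 0.5]))  # combined fractional effect under IA
```

Because both predictions are derived from the same single-substance data, they tend to be strongly correlated, which is the source of the multicollinearity concern the abstract raises for an ordinary least squares combination of the two.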

14.
Species distribution modeling was used to determine which factors in a large set of candidate predictors affect the distribution of Muscari latifolium, an endemic bulbous plant species of Turkey, to quantify the relative importance of each factor, and to produce a potential spatial distribution map of M. latifolium. Models were built using the Boosted Regression Trees (BRT) method based on 35 presence and 70 absence records obtained through field sampling in the Gönen Dam watershed area of the Kazdağı Mountains in West Anatolia. The large candidate set of monthly and seasonal climate, fine-scale land surface, geologic and biotic variables was reduced using a BRT simplification procedure. Analyses of these resource, direct and indirect variables showed that 14 main factors influence the species' distribution. The five most important of these 14 variables are bedrock type, Quercus cerris density, precipitation during the wettest month, Pinus nigra density, and northness. These five variables account for approximately 60% of the relative importance in determining the distribution of the species. Prediction performance was assessed with 10 random subsample data sets, giving a maximum area under the receiver operating characteristic curve (AUC) of 0.93 and an average AUC of 0.8. This study contributes significantly to knowledge of the habitat requirements and ecological characteristics of this species. The distribution of this species is explained by a combination of biotic and abiotic factors. Hence, using biotic interaction and fine-scale land surface variables in species distribution models improved the accuracy and precision of the model. Knowledge of the relationships between the distribution patterns of M. latifolium, environmental factors and biotic interactions can help develop a management and conservation strategy for this species.

15.
1. We tested how strongly aquatic macroinvertebrate taxa richness and composition were associated with natural variation in both flow regime and stream temperatures across streams of the western United States. 2. We used long‐term flow records from 543 minimally impacted gauged streams to quantify 12 streamflow variables thought to be ecologically important. A principal component analysis reduced the dimensionality of the data from 12 variables to seven principal component (PC) factors that characterised statistically independent aspects of streamflow: (1) zero flow days, (2) flow magnitude, (3) predictability, (4) flood duration, (5) seasonality, (6) flashiness and (7) base flow. K‐means clustering was used to group streams into 4–8 hydrologically different classes based on these seven factors. 3. We also used empirical models to estimate mean annual, mean summer and mean winter stream temperatures at each stream site. We then used invertebrate data from 63 sites to develop Random Forest models to predict taxa richness and taxon‐specific probabilities of capture at a site from flow and temperature. We used the predicted taxon‐specific probabilities of capture to estimate how well predicted assemblages matched observed assemblages as measured by RIVPACS‐type observed/expected (O/E) indices and Bray–Curtis dissimilarities. 4. Macroinvertebrate taxon richness was only weakly associated with streamflow and temperature variables, implying that other factors more strongly influenced taxa richness. 5. In contrast to taxa richness, taxa composition was strongly associated with streamflow and temperature. Predictions of taxa composition (O/E and Bray–Curtis) were most precise when both temperature and streamflow PC factors were used, although predictions based on either streamflow PC factors or temperature alone were also better than null model predictions. 
Of the seven aspects of the streamflow regime we examined, variation in baseflow conditions appeared to be most directly associated with invertebrate biotic composition. We were also able to predict assemblage composition from the conditional probabilities of hydrological class membership nearly as well as Random Forests models that were based directly on continuous PC factors. 6. Our results have direct implication for understanding the relative importance of streamflow and temperature in regulating the structure and composition of stream assemblages and for improving the accuracy and precision of biological assessments.  相似文献   
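The two assemblage-level measures mentioned in this abstract, the RIVPACS-type O/E index and Bray–Curtis dissimilarity, can be illustrated with a small sketch. It is not the study's implementation: the taxa, probabilities and counts are hypothetical, and the 0.5 probability-of-capture threshold is a common convention that is assumed here rather than taken from the abstract.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def observed_over_expected(capture_probs, observed_taxa, threshold=0.5):
    """O/E: observed count of expected taxa divided by the sum of their capture probabilities."""
    expected_taxa = {t: p for t, p in capture_probs.items() if p >= threshold}
    expected = sum(expected_taxa.values())  # E: expected taxa richness
    observed = sum(1 for t in expected_taxa if t in observed_taxa)  # O
    return observed / expected


# Hypothetical predicted probabilities of capture and an observed sample.
probs = {"Baetis": 0.9, "Hydropsyche": 0.6, "Chironomus": 0.3}
print(observed_over_expected(probs, {"Baetis", "Chironomus"}))
print(bray_curtis([10, 0, 5], [5, 5, 5]))
```

An O/E value near 1 indicates that the taxa predicted to occur were in fact collected, while a Bray–Curtis value near 0 indicates that two assemblages are nearly identical in composition.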

16.
The accurate representation of species distribution derived from sampled data is essential for management purposes and to underpin population modelling. Additionally, predicting species distribution for an expanded area beyond the sampling area can reduce sampling costs. Here, several well-established and recently developed habitat modelling techniques are investigated in order to identify the most suitable approach to use with presence–absence acoustic data. The fitting efficiency of the modelling techniques is first tested on the training dataset, while their predictive capacity is evaluated using a verification set. Receiver Operating Characteristics (ROC), Kappa statistics, correlation and confusion matrices are used to compare the models. Boosted Regression Trees (BRT) and Associative Neural Networks (ASNN), both of which are machine learning methods, outperformed the other modelling approaches tested.

17.
Although European river management and restoration are largely supported by reliable tools, these tools are most often “generalist” and provide only initial indications of the sources of alteration. Because young-of-the-year (YOY) fish assemblages are highly dependent on riverine habitat conditions, a YOY-based tool might be very useful, or even essential, for designing and implementing conservation or restoration plans for large rivers, as it would measure the losses and gains of hydro-ecological functionalities more directly. In the past 20 years, new modeling techniques have emerged from the growing sophistication of statistical models applied to ecology. Machine learning (ML) methods are now recognized as holding great promise for advancing the understanding and prediction of ecological phenomena. The aim of this work was to select the appropriate statistical technique to model YOY assemblages according to different meso-scale habitat variables that are meaningful to planners. To do this, two machine learning methods, Classification and Regression Trees (CART) and Boosted Regression Trees (BRT), were compared to Generalized Linear Models (GLM). We modeled the occurrence of 9 species from the Seine River basin (France) in order to compare the models' abilities to accurately predict the presence and absence of each species. BRT appeared to be the best technique for modeling 0+ fish occurrences in our dataset.

18.

Aims

The aim of this study is on the one hand to identify the most determining variables predicting the site productivity of pedunculate oak, common beech and Scots pine in temperate lowland forests of Flanders; and on the other hand to test whether the accuracy of site productivity models based exclusively on soil or forest floor predictor variables is similar to the accuracy achieved by full ecosystem models, combining all soil, vegetation, humus and litterfall composition related variables.

Methods

Boosted Regression Trees (BRT) were used to model in a climatically homogeneous region the relationship between environmental variables and site productivity. A distinction was made between soil (soil physical and chemical), forest floor (vegetation and humus) and ecosystem (soil, forest floor and litterfall composition jointly) predictors.

Results

Our results have illustrated the strength of BRT to model the non-linear behaviour of ecological processes. The ecosystem models, based on all collected variables, explained most of the variability and were more accurate than those limited to either soil or forest floor variables. Nevertheless, both the soil and forest floor models can serve as good predictive models for many forest management practices.

Conclusions

Soil granulometric fractions and litterfall nitrogen concentrations were the most effective predictors of forest site productivity in Flanders. Although many studies revealed a fertilising effect of increased nitrogen deposition, nitrogen saturation seemed to reduce species’ productivity in this region.  相似文献   

19.
Pathway analysis using random forests classification and regression   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.  相似文献   

20.
Random Amplified Polymorphic DNA analysis (RAPD) is a methodology that has been used as a tool for monitoring microbial communities. To be useful in this application, RAPD, like any other methodology, must show properties that allow the detection of quantitative changes in the composition of the microbiota. The objective of this study was therefore to establish whether RAPD possesses such properties. The strategy was to use genomic DNA extracted from a set of tertiary bacterial mixtures defined according to an experimental mixture design and containing varying proportions of Escherichia coli, Bacillus subtilis, and Pseudomonas CF600. RAPD-PCR was performed on the mixed DNA extracts, and the amplified DNA fragments were separated on sequencing gels to produce genomic fingerprints that were digitized and modeled by Partial Least Squares regression (PLS). Significant predictions were obtained using an external test set for validation, with Root Mean Square Errors of Prediction (RMSEP) of 0.21, 0.19 and 0.20 for the proportions of E. coli, B. subtilis and Pseudomonas CF600, respectively. Taken together, the results showed that RAPD patterns quantitatively represented the initial mixture proportions, strengthening the view that RAPD could be useful for whole microbial community monitoring.
