Similar articles
 20 similar articles found (search time: 15 ms)
1.
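Data reliability is graded as an ordinal variable and, in theory, linked to data sources such as primary ecological processes, secondary ecological processes, and external processes. On this basis, a quality assessment method for ecological monitoring data is constructed and a new data quality index is provided. The index estimates the quality of a dataset from the pass rate of its observation records; the check results include the reliability level of each record, the reason a record is flagged as an outlier or as erroneous, and the quality index value of the complete dataset. Applied to two tree growth datasets from CERN, the data quality index was found to assess the quality of tree growth datasets quantitatively. The method provides a basis for developing related software.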

2.
Protein–protein interactions (PPIs) play very important roles in many cellular processes, and provide rich information for discovering biological facts and knowledge. Although various experimental approaches have been developed to generate large amounts of PPI data for different organisms, high-throughput experimental data usually suffers from high error rates, and as a consequence, the biological knowledge discovered from this data is distorted or incorrect. Therefore, it is vital to assess the quality of protein interaction data and extract reliable protein interactions from the high-throughput experimental data. In this paper, we propose a new Semantic Reliability (SR) method to assess the reliability of each protein interaction and identify potential false-positive protein interactions in a dataset. For each pair of target interacting proteins, the SR method takes into account the semantic influence between proteins that interact with the target proteins, and the semantic influence between the target proteins themselves when assessing the interaction reliability. Evaluations on real protein interaction datasets demonstrated that our method outperformed other existing methods in terms of extracting more reliable interactions from original protein interaction datasets.

3.
Phylogenetic relationships between species of Allium section Cepa and A. roylei (section Rhizirideum) have been inferred from nuclear DNA variation (RAPDs; nDNA dataset) and from morphological, pollen epidermis texture, chromosomal and chemical variation (supranuclear dataset). These sets were complemented with data, taken from the literature, on cpDNA variation and crossability. The trees produced with the supranuclear, nDNA and cpDNA datasets were compared by using the topology of the most parsimonious tree of one dataset as the constraint for the construction of a most parsimonious tree of another dataset. The accuracy of the trees was evaluated by calculating several Consistency and Incongruence Indices. The constrained tree of supranuclear-nDNA datasets showed the highest index values. The tree topologies of the supranuclear and cpDNA datasets were the least similar. The cpDNA tree and crossability dendrograms were identical. The most important difference between the nDNA-supranuclear trees and the cpDNA-crossability trees pertains to the position of Allium roylei, which is much closer to the clade A. cepa/A. vavilovii in the cpDNA tree than in the nDNA tree. This difference is considered to be the result of chloroplast capture from one species to another after an introgression event. A shorter distance between species inferred from a cpDNA tree than from an nDNA or comparable tree might be indicative of the level of crossability.

4.
5.
Ecological communities consist of a large number of species. Most species are rare or have low abundance, and only a few are abundant and/or frequent. In quantitative community analysis, abundant species are commonly used to interpret patterns of habitat disturbance or ecosystem degradation. Rare species cause many difficulties in quantitative analysis by introducing noise and bulking up datasets, which is worsened by the fact that large datasets are difficult to handle. In this study we propose a method to reduce the size of large datasets by selecting the most ecologically representative species using a self-organizing map (SOM) and structuring index (SI). As an example, we used diatom community data sampled at 836 sites with 941 species throughout the French hydrosystem. Out of the 941 species, 353 were selected. The selected dataset was effectively classified according to the similarities of community assemblages in the SOM map. Compared to the SOM map generated with the original dataset, the community pattern gave a very similar representation of ecological conditions of the sampling sites, displaying clear gradients of environmental factors between different clusters. Our results showed that this computational technique can be applied to preprocessing data in multivariate analysis. It could be useful for ecosystem assessment and management, helping to reduce both the list of species for identification and the size of datasets to be processed for diagnosing the ecological status of water courses.

6.
Class boundaries of three European assessment systems based on macroinvertebrates were compared and harmonized. Three different approaches to comparison, one based on regression analysis and the other two on statistical testing, were described and used, however only one was considered useful for the harmonization of boundaries. In all cases, the calculations were based on a set of six Intercalibration Common Metrics, combined into a simple multimetric index (ICMi). The ICMi was calculated for three test datasets from Italy, Poland and the UK, all belonging to the same stream type (small lowland siliceous sand rivers). For comparison, a regression model was employed to convert national assessment boundary values into ICMi values. The ICMi was also calculated on samples included in a strictly WFD-compliant benchmark dataset. The values of the ICMi obtained for the quality classes Good and High for the test and benchmark datasets were statistically compared. When significant differences were observed in the harmonization phase, the boundaries of the national method were refined until no further differences were observed. For the test datasets and assessment systems of Italy (IBE index) and Poland (Polish BMWP index) small refinements of the boundaries between High/Good and Good/Moderate classes were sufficient to remove the differences from the benchmark dataset. After harmonization, in the studied stream type, the percentage of samples requiring restoration to Good quality increased by 22 and 6% for Italy and Poland, respectively. For the UK dataset (EQI ASPT) the comparison to benchmark dataset showed no significant differences, thus no harmonization was proposed. A general discussion of the options used to compare boundaries based on the ICMi and their potential for harmonization is provided. 
Lastly, the option of harmonizing class boundaries through comparison to an external, benchmarking dataset and then re-setting them until no differences are found is supported.

7.
When performing bioinformatics analysis on tandem mass spectrometry data, there is a computational need to efficiently store and sort these semi-ordered datasets. To solve this problem, a new data structure based on dynamic arrays was designed and implemented in an algorithm that parses semi-ordered data made by Mascot, a separate software program that matches peptide tandem mass spectra to protein sequences in a database. By accommodating the special features of these large datasets, the combined dynamic array (CDA) provides efficient searching and insertion operations. The operations on real datasets using this new data structure are hundreds of times faster than operations using binary tree and red-black tree structures. The difference becomes more significant as the dataset size grows. This data structure may be useful for improving the speed of other related types of protein assembling software or other types of software that operate on datasets with similar semi-ordered features.

8.
Kinetic experiments provide much information about protein folding mechanisms. Time-resolved signals are often best described by expressions with many exponential terms, but this hinders the extraction of rate constants by nonlinear least squares (NLS) fitting. Numerical inverse Laplace transformation, which converts a time-resolved dataset into a spectrum of amplitudes as a function of rate constant, allows easy estimation of the rate constants, amplitudes, and number of processes underlying the data. Here, we present a Tikhonov regularization-based method that converts a dataset into a rate spectrum, subject to regularization constraints, without requiring an iterative search of parameter space. This allows more rapid generation of rate spectra as well as analysis of datasets too noisy to process by existing iterative search algorithms. This method's simplicity also permits highly objective, largely automatic analysis with minimal human guidance. We show that this regularization method reproduces results previously obtained by NLS fitting and that it is effective for analyzing datasets too complex for traditional fitting methods. This method's reliability and speed, as well as its potential for objective, model-free analysis, make it extremely useful as a first step in analysis of complicated noisy datasets and an excellent guide for subsequent NLS analysis.
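The idea of turning a time-resolved signal into a rate spectrum without an iterative parameter search can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the simplest (zeroth-order) Tikhonov penalty on the amplitudes, a fixed logarithmic grid of rate constants, and a synthetic single-exponential signal. The names `kernel`, `rate_spectrum`, and `solve_linear` are hypothetical, and the abstract does not state which regularization constraints the paper actually uses.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def kernel(times, rates):
    """Design matrix A[i][j] = exp(-k_j * t_i) for a sum of exponential decays."""
    return [[math.exp(-k * t) for k in rates] for t in times]

def rate_spectrum(times, signal, rates, lam):
    """Amplitudes minimizing ||A x - y||^2 + lam^2 ||x||^2 (closed form)."""
    A = kernel(times, rates)
    n = len(rates)
    # Normal equations: (A^T A + lam^2 I) x = A^T y
    lhs = [[sum(row[p] * row[q] for row in A) + (lam ** 2 if p == q else 0.0)
            for q in range(n)] for p in range(n)]
    rhs = [sum(row[p] * y for row, y in zip(A, signal)) for p in range(n)]
    return solve_linear(lhs, rhs)

# Synthetic example (assumed values): one decay whose rate lies on the grid.
times = [0.05 * i for i in range(60)]
rates = [10 ** (-1 + 0.2 * j) for j in range(12)]
signal = [math.exp(-rates[6] * t) for t in times]
lam = 0.1

amplitudes = rate_spectrum(times, signal, rates, lam)
fit = [sum(a * x for a, x in zip(row, amplitudes)) for row in kernel(times, rates)]
residual_sq = sum((f - y) ** 2 for f, y in zip(fit, signal))
```

Because the solution comes from a single linear solve, there is no starting guess and no iteration; the regularization weight `lam` trades smoothness of the spectrum against fidelity to the data and would need to be chosen for real measurements.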

9.
Carrer M. PLoS ONE, 2011, 6(7): e22813
The development of dendrochronological time series in order to analyze climate-growth relationships usually involves first a rigorous selection of trees and then the computation of the mean tree-growth measurement series. This study suggests a change in the perspective, passing from an analysis of climate-growth relationships that typically focuses on the mean response of a species to investigating the whole range of individual responses among sample trees. Results highlight that this new approach, tested on a larch and stone pine tree-ring dataset, outperforms, in terms of information obtained, the classical one, with significant improvements regarding the strength, distribution and time-variability of the individual tree-ring growth response to climate. Moreover, a significant change over time of the tree sensitivity to climatic variability has been detected. Accordingly, the best-responder trees at any one time may not always have been the best-responders and may not continue to be so. With minor adjustments to current dendroecological protocol and adopting an individualistic approach, we can improve the quality and reliability of the ecological inferences derived from the climate-growth relationships.

10.
Biodiversity databases are increasingly available and have fostered accelerated advances in many disciplines within ecology and evolution. However, the quality of the evidence generated depends critically on the quality of the input data, and species misidentifications are present in virtually any occurrence dataset. Yet, the lack of automatized tools makes the assessment of the quality of species identification in big datasets time-consuming, which often induces researchers to assume that all species are reliably identified. In this study, we address this issue by evaluating how species misidentification can impact our ability to capture ecological patterns, and by presenting an R package, called naturaList, designed to classify species occurrence data according to identification reliability. naturaList allows the classification of species occurrences up to six confidence levels, in which the highest level is assigned to records identified by specialists. We obtained a list of specialists by using the species occurrence dataset itself, based on the identifier names within it, and by entering an independent list, obtained by contacting experts. Further, we evaluate the effects of filtering out occurrence records not identified by specialists on the estimations of species niche and diversity patterns. We used the tribe Myrteae (Myrtaceae) as a study model, which is a species-rich group in Central and South America and with challenging taxonomy. We found a significant change in species niche in 13% of species when using only occurrences identified by specialists. We found changes in patterns of alpha diversity in four genera and changes in beta diversity in all genera analyzed. We show how the uncertainty in species identification in occurrence datasets affects conclusions on macroecological patterns by generating bias or noise in different aspects of macroecological patterns (niche, alpha, and beta diversity). 
Therefore, to guarantee reliability in species identification in big data sets we recommend the use of automated tools such as the naturaList package, especially when analyzing variation in species composition. This study also represents a step forward to increasing the quality of large-scale studies that rely on species occurrence data.

11.
Advances in GPS tracking technologies have allowed for rapid assessment of important oceanographic regions for seabirds. This allows us to understand seabird distributions, and the characteristics which determine the success of populations. In many cases, quality GPS tracking data may not be available; however, long term population monitoring data may exist. In this study, a method to infer important oceanographic regions for seabirds is presented using breeding sooty shearwaters as a case study. This method combines a popular machine learning algorithm (generalized boosted regression modeling), geographic information systems, long-term ecological data and open access oceanographic datasets. Time series of chick size and harvest index data derived from a long term dataset of Maori ‘muttonbirder’ diaries were obtained and used as response variables in a gridded spatial model. It was found that areas of the sub-Antarctic water region best capture the variation in the chick size data. Oceanographic features including wind speed and charnock (a derived variable representing ocean surface roughness) came out as top predictor variables in these models. Previously collected GPS data demonstrates that these regions are used as “flyways” by sooty shearwaters during the breeding season. It is therefore likely that wind speeds in these flyways affect the ability of sooty shearwaters to provision for their chicks due to changes in flight dynamics. This approach was designed to utilize machine learning methodology but can also be implemented with other statistical algorithms. Furthermore, these methods can be applied to any long term time series of population data to identify important regions for a species of interest.

12.
Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. 
Species distribution dataset mergers, such as the one exemplified here, can serve as a baseline towards comprehensive species distribution datasets.
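The Jaccard similarity index used to map agreement between the two atlases is the number of shared taxa divided by the number of taxa recorded in either source. A minimal per-grid-cell sketch follows; the cell identifiers, species names, and the choice to score two empty cells as 0 are illustrative assumptions, not details taken from the study.

```python
def jaccard(species_a, species_b):
    """Jaccard similarity: shared taxa divided by taxa present in either set."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty cells are scored as no agreement
    return len(a & b) / len(union)

# Hypothetical species lists per grid cell for two atlases.
atlas_one = {"cell_1": {"sp_a", "sp_b", "sp_c"}, "cell_2": {"sp_a"}}
atlas_two = {"cell_1": {"sp_b", "sp_c", "sp_d"}, "cell_2": {"sp_a"}}

# Agreement per cell; a cell missing from the second atlas counts as empty.
agreement = {
    cell: jaccard(atlas_one[cell], atlas_two.get(cell, set()))
    for cell in atlas_one
}
```

Mapping `agreement` over all cells gives the kind of similarity surface the abstract describes, with low values highlighting cells where the two sources disagree.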

13.
Sustaining or restoring riparian quality is essential to achieve and maintain good stream health, as well as to guarantee the ecological functions that natural riparian areas provide. Therefore, quantifying riparian quality is a fundamental step to identify river reaches for conservation and/or restoration purposes. Most of the existing methods assessing riparian quality concentrate on field surveys of a few hundred metres, which become very laborious when trying to evaluate whole catchments or long river corridors. Riparian quality assessment obtains higher scores when riparian vegetation consists of forested areas, while land-uses lacking woody vegetation typically represent physical and functional discontinuities along river corridors that undermine riparian quality. Thus, this study aimed to analyse the ability of riparian land-cover data for modelling riparian quality over large areas. Multiple linear regression and Random Forest techniques were performed using land-use datasets at three different spatial scales: 1:5000 (Cantabrian Riparian Cover map), 1:25,000 (Spanish Land Cover Information System) and 1:100,000 (Corine Land Cover). Riparian quality field data was obtained using the Riparian Quality Index. Hydromorphological pressures affecting riparian vegetation were also included in the analysis to determine their relative weight in controlling riparian quality. Linear regression showed better predictive ability than Random Forest, although this may be due to our relatively small dataset (approx. 150 cases). Forest coverage highly determined riparian quality, while hydromorphological pressures and land-use coverage related to human activities played a smaller role in the models. While acceptable results were obtained when using high-resolution datasets, the use of Corine Land Cover led to a poor predictive ability.

14.
Leaf area index (LAI) is one of the key biophysical parameters for understanding land surface photosynthesis, transpiration, and energy balance processes. Estimation of LAI from remote sensing data has been a premier method at large scales in recent years. Recent studies have revealed that the within-canopy vertical variations in LAI and biochemical properties greatly affect canopy reflectance and significantly complicate the retrieval of LAI inversely from reflectance-based vegetation indices, which has yet to be explicitly addressed. In this study, we have used both simulated datasets (dataset I with constant vertical profiles of LAI and biochemical properties, dataset II with varied vertical profile of LAI but constant vertical biochemical properties, and dataset III with both varied vertical profiles) generated from the multiple-layer canopy radiative transfer model (MRTM) and a ground-measured dataset to identify robust spectral indices that are insensitive to such within-canopy vertical variations for LAI prediction. The results clearly indicated that published indices such as normalized difference vegetation index (NDVI) had obvious discrepancies when applied to canopies with different vertical variations, while the new indices identified in this study performed much better. The best index for estimating canopy LAI under various conditions was D(920,1080), with overall RMSEs of 0.62–0.96 m²/m² and biases of 0.42–0.55 m²/m² for all three simulated datasets and an RMSE of 1.22 m²/m² with the field-measured dataset, although it was not the most conservative one among all new indices identified. This index responded mostly to the quantity of LAI but was insensitive to within-canopy variations, allowing it to aid the retrieval of LAI from remote sensing data without prior information of within-canopy vertical variations of LAI and biochemical properties.
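For reference, NDVI, the published index the abstract uses as a point of comparison, has a standard definition: the difference between near-infrared and red reflectance divided by their sum. The short sketch below computes it; the reflectance values are made up for illustration, and the definition of the D(920,1080) index is not given in the abstract, so it is not reproduced here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR and red reflectance sum to zero")
    return (nir - red) / total

# Illustrative reflectance values (assumed, not taken from the study).
dense_canopy = ndvi(nir=0.5, red=0.1)
sparse_canopy = ndvi(nir=0.3, red=0.2)
```

Because the index is a ratio of two bands only, canopies with the same total LAI but different vertical profiles can return different values, which is the sensitivity the study sets out to avoid.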

15.
Computational analysis of human protein interaction networks   Cited by: 4 (self-citations: 0, citations by others: 4)
Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low in contrast to predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is not only important for further methodological improvements, but also for increasing the confidence in functional hypotheses derived from predictions. Therefore, we assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between predicted datasets, but some of them also scored at least as high as the experimental ones regarding multiple quality measures. Therefore, since only small pairwise overlap between most datasets is observed, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.

16.
Intra-annual repeated micro-sampling of the developing tree ring is increasingly applied in xylogenesis studies. Variability in growth magnitude, notably due to different sampling positions on the stem, has encouraged the application of standardization and modelling techniques. Among these, methods using the Gompertz equation have become widespread, but tests made with black spruce revealed frequent crossovers between the cumulative number of cells in enlargement and the cumulative number of cells in wall thickening. This was due to a localized fitting problem for values close to the asymptote, and it was a major obstacle to estimating the timing of each individual cell's development phases, which is an interesting application of these data. In this paper, a new method, based on a different approach, has been developed to avoid that problem and applied to intra-annual growth curves from four sites in Quebec (Canada). Since tracheid development analysis allows discrimination between the active and inactive periods of a phase, modelling can be restricted to the active period alone. The new method did not cause crossovers between the fitted curves. Therefore, it has been considered appropriate for estimating the timing for each individual cell over the whole range of data. Since the resulting functions are polynomials of degree 1 to 3, possible studies concerning general tendency should be easy to conduct. The method has also been tested with different sampling frequencies. To do this, the number of observations from weekly samplings was halved to simulate a semi-monthly sampling frequency, and the results of the new method applied to each version of the datasets were compared. Generally, the simulated semi-monthly sampled dataset did not give significantly different results from the original weekly sampled dataset, in terms of general tendency and predicted intercept time at the extremities of the data range. This is very encouraging for situations when only semi-monthly sampling is available.

17.
Spatial and/or temporal biases in biodiversity data can directly influence the utility, comparability, and reliability of ecological and evolutionary studies. While the effects of biased spatial coverage of biodiversity data are relatively well known, temporal variation in data quality (i.e., the congruence between recorded and actual information) has received much less attention. Here, we develop a conceptual framework for understanding the influence of time on biodiversity data quality based on three main processes: (1) the natural dynamics of ecological systems, such as species turnover or local extinction; (2) periodic taxonomic revisions; and (3) the loss of physical data and metadata due to inefficient curation, accidents, or funding shortfalls. Temporal decay in data quality driven by these three processes has fundamental consequences for the usage and comparability of data collected in different time periods. Data decay can be partly ameliorated by adopting standard protocols for generating, storing, and sharing data and metadata. However, some data degradation is unavoidable due to natural variations in ecological systems. Consequently, changes in biodiversity data quality over time need to be carefully assessed and, if possible, taken into account when analyzing aging datasets.

18.
Dynamic model-based clustering for time-course gene expression data   Cited by: 1 (self-citations: 0, citations by others: 1)
Microarray technology has produced a huge body of time-course gene expression data. Such gene expression data has proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in such data. Cluster analysis has played an important role in analyzing gene expression data. Many distance/correlation- and static model-based clustering techniques have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterize the data and that should be considered in cluster analysis so as to obtain high quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The proposed method regards a time-course gene expression dataset as a set of time series, generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. Computational experiments are performed on one synthetic and three real time-course gene expression datasets to investigate the proposed method. The results show that our method achieves better-quality clustering than other clustering methods (e.g. k-means) for time-course gene expression data, and thus it is a useful and powerful tool for analyzing time-course gene expression data.
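The adjusted Rand index that underlies the AARI quality measure compares two partitions of the same items while correcting for chance agreement. The sketch below implements the standard pair-counting formula and a plain mean over several runs. How the paper combines bootstrapping with the averaging is not described in the abstract, so `average_ari` is only an assumed, simplified stand-in, and the label vectors are invented.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions are trivially identical
    # Pairs placed together in both clusterings, and in each one separately.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs
    max_index = 0.5 * (pairs_a + pairs_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (pairs_joint - expected) / (max_index - expected)

def average_ari(reference, resampled_runs):
    """Mean ARI of several clusterings against one reference (assumed scheme)."""
    scores = [adjusted_rand_index(reference, run) for run in resampled_runs]
    return sum(scores) / len(scores)

# Invented labels: the first run matches up to renaming, the second does not.
reference = [0, 0, 1, 1]
same_up_to_renaming = adjusted_rand_index(reference, [1, 1, 0, 0])
crossed = adjusted_rand_index(reference, [0, 1, 0, 1])
mean_score = average_ari(reference, [[1, 1, 0, 0], [0, 1, 0, 1]])
```

The index is invariant to how clusters are numbered, which is why it suits comparisons between clusterings produced on resampled data.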

19.
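Good ecosystem quality is an important guarantee for meeting the supply needs of human society and achieving sustainable development goals. Given Nepal's complex and diverse physical geography and its marked climatic differences between regions, an assessment method based on reference conditions can yield relative values of ecosystem quality, and its results reflect different information on change. Vegetation is an important indicator of change in regional ecosystem quality. Using Nepal's five major geographic regions and four main vegetation ecosystem types, 20 ecological assessment zones were delineated. Relative density indicators of ecological parameters (RVI) were calculated for three aspects that characterize vegetation ecosystems (horizontal structure, production function, and vertical structure) and combined by principal component analysis into a vegetation ecosystem quality index (VEQI). With Nepal's national nature reserves as the reference, a reference-condition-based ecosystem quality assessment model was built, the reference-condition-based vegetation ecosystem quality index (VEQI'') was calculated for 2016 and 2020, and changes in the spatiotemporal pattern of ecosystem quality were analyzed. The results show that: (1) from 2016 to 2020, the mean of the actual ecosystem quality value VEQI in Nepal increased by 3.49%; overall, against a background of rising reference ecosystem quality (VEQIref, about 1.41%), the relative ecosystem quality value VEQI'' increased by 1.42%; (2) for Nepal, the 89th percentile of VEQI in an assessment zone was strongly correlated with the reference value of the corresponding national nature reserves, with small overall differences, and can be used as a substitute reference value; (3) in terms of spatial pattern trends, the areas where ecosystem quality improved, remained basically stable, and deteriorated accounted for 74.16%, 14.25%, and 11.59% of the total vegetation ecosystem area, respectively. Compared with data from field observation stations, which are too few and difficult to collect and use, national nature reserves come closer to the assumption of an ideal reference ecosystem. Determining the reference values of the assessment zones from a limited number of nature reserves enables rapid assessment of ecosystem quality, gives results with better spatiotemporal comparability, and can inform the assessment and quantitative analysis of regional ecological quality change.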

20.

Purpose

Habitat loss is a significant cause of biodiversity loss, but while its importance is widely recognized, there is no generally accepted method for including impacts on biodiversity from land use and land use changes in life cycle assessment (LCA), and existing methods suffer from data gaps. This paper proposes a methodology for assessing the impact of land use on biodiversity using ecological structures as opposed to information on number of species.

Methods

Two forms of the model (global and local scales) were used to assess environmental quality, combining ecosystem scarcity, vulnerability, and conditions for maintaining biodiversity. A case study for New Zealand kiwifruit production is presented. As part of the sensitivity analysis, model parameters (area and vulnerability) were altered and New Zealand datasets were also used.

Results and discussion

When the biodiversity assessment was implemented using a global dataset, the importance of productivity values was shown to depend on the area the results were normalized against. While the area parameter played an important role in the results, the proposed alternative vulnerability scale had little influence on the final outcome.

Conclusions

Overall, the paper successfully implements a model to assess biodiversity impacts in LCA using easily accessible, free-of-charge data and software. Comparing the model using global vs. national datasets showed that there is a potential loss of regional significance when using the generalized model with the global dataset. However, as a guide to assessing biodiversity impact, the model allows for consistent comparison of product systems on an international basis.
