Similar Articles
1.
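Ecosystem assessment based on ecosystem service functions is an important basis for identifying eco-environmental problems, carrying out ecosystem restoration and biodiversity conservation, and establishing ecological compensation mechanisms. It is also a key step in safeguarding national ecological security and advancing the construction of an ecological civilization. Ecosystem assessment covers many aspects of ecosystems and must be supported by multi-element, multi-type, multi-scale ecosystem observation data. Ground observations and remote sensing are the two major data sources for ecosystem assessment, but in practice they often suffer from inconsistent observation standards, incomplete observed variables, insufficient temporal continuity, and scale mismatch, all of which add considerable uncertainty to ecosystem assessment. How to fuse observation data from different scales to quantify ecosystem service functions is therefore the key to accurate ecosystem assessment. Starting from observation scale, this paper describes the characteristics and existing problems of ground observation data, near-surface remote sensing data, airborne remote sensing data, and satellite remote sensing data, and reviews common methods for fusing these data sources. Taking productivity, carbon sequestration capacity, and biodiversity as examples of key ecological parameters, it then introduces the multi-source data fusion system of the project "Ecosystem Assessment Technology Based on Multi-source Data Fusion and Its Application". Finally, it summarizes the multi-source data fusion system for ecosystem assessment and points out future directions for this research.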

2.
RT Schuh 《ZooKeys》2012,(209):255-267
Arguments are presented for the merit of integrating specimen databases into the practice of revisionary systematics. Work flows, data connections, data outputs, and data standardization are enumerated as critical aspects of such integration. Background information is provided on the use of "barcodes" as unique specimen identifiers and on methods for efficient data capture. Examples are provided on how to achieve efficient workflows and data standardization, as well as data outputs and data integration.

3.
Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date, much effort has been devoted to achieving the highest possible coverage of proteomes, with the aim of informing future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, need ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi-)automatic curation of repository data depends on analyses that can highlight misannotation and edge conditions in data sets. Such curation is an important prerequisite for efficient reuse of proteomics data in the life sciences in general. We therefore present here a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics using publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.

4.
We present a fast, versatile and adaptive-multiscale algorithm for analyzing a wide variety of DNA microarray data. Its primary application is the normalization of array data and the subsequent identification of 'enriched targets', e.g. differentially expressed genes in expression profiling arrays and enriched sites in ChIP-on-chip experimental data. We show how to accommodate the unique characteristics of ChIP-on-chip data, where the set of 'enriched targets' is large and asymmetric, and its proportion of the whole data varies locally. SUPPLEMENTARY INFORMATION: Supplementary figures, a related preprint, free software, and our raw DNA microarray data with PCR validations are available at http://www.math.umn.edu/~lerman/supp/bioinfo06 as well as at Bioinformatics online.

5.
The intraclass correlation is commonly used with clustered data. It is often estimated by fitting a model to hierarchical data, and it leads, in turn, to several concepts such as reliability, heritability, and inter-rater agreement. For data where linear models can be used, such measures can be defined as ratios of variance components. Matters are more difficult for non-Gaussian outcomes. The focus here is on count and time-to-event outcomes, for which so-called combined models, extending generalized linear mixed models, are used to describe the data. These models combine normal and gamma random effects to allow both for correlation due to data hierarchies and for overdispersion. Furthermore, because the models admit closed-form expressions for the means, variances, higher moments, and even the joint marginal distribution, it is demonstrated that closed forms of intraclass correlations exist. The proposed methodology is illustrated using data from agricultural and livestock studies.
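For the linear (Gaussian) case mentioned above, the intraclass correlation is the between-cluster variance component divided by the total variance, ρ = σ_b² / (σ_b² + σ_w²). The sketch below illustrates that ratio under stated assumptions: a balanced one-way random-effects layout and the classical ANOVA (method-of-moments) estimators, not the combined normal/gamma models of the article. The function name `icc_oneway` is illustrative.

```python
from statistics import mean


def icc_oneway(clusters):
    """ANOVA estimate of the intraclass correlation for balanced clusters.

    clusters: list of lists, each inner list holding the k outcomes of one cluster.
    Returns sigma_b^2 / (sigma_b^2 + sigma_w^2), with both variance components
    estimated from the between- and within-cluster mean squares.
    """
    n = len(clusters)
    k = len(clusters[0])
    if any(len(c) != k for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square, n - 1 degrees of freedom.
    msb = k * sum((m - grand) ** 2 for m in cluster_means) / (n - 1)
    # Within-cluster mean square, n * (k - 1) degrees of freedom.
    msw = sum((x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c) / (n * (k - 1))
    sigma_w2 = msw
    # E[MSB] = sigma_w^2 + k * sigma_b^2; negative estimates are truncated at zero.
    sigma_b2 = max((msb - msw) / k, 0.0)
    return sigma_b2 / (sigma_b2 + sigma_w2)
```

For non-Gaussian outcomes no such simple ratio is available on the outcome scale, which is the gap the closed-form results of the article address.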

6.
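A multivariate nonlinear method for analyzing the population genetic structure of allelic polymorphism   Total citations: 4, self-citations: 0, citations by others: 4
For a long time, multivariate statistical analysis of multidimensional gene polymorphism data, such as the cluster analysis used to compute genetic distances and the principal component analysis, factor analysis, and canonical correlation analysis used to analyze population genetic structure, has relied on classical multivariate linear methods designed for unconstrained data, without attention to the problems caused by the "closure effect" of gene polymorphism data. Starting from the distributional and structural characteristics of gene polymorphism data, this paper points out that gene polymorphism distributions are "closed data", and analyzes the difficulties that the closure effect creates when classical multivariate linear methods are applied to population genetic structure. Based on the theory and methods of compositional data analysis, a basic multivariate nonlinear approach to the population genetic structure of gene polymorphisms is proposed. Taking principal component analysis as an example, the results of classical linear PCA and "log-ratio" nonlinear PCA are compared on real data. The comparison shows that log-ratio nonlinear PCA is a good method for studying the population genetic structure of gene polymorphisms: it is specific and sensitive, and its results agree with the principles of population genetics.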
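The abstract does not spell out the transform, so the following sketch assumes Aitchison's centered log-ratio (clr) transform, one standard choice in compositional data analysis: each row of allele frequencies is divided by its geometric mean and logged, which removes the unit-sum constraint, and ordinary PCA is then run on the transformed rows. The function names and the use of NumPy are choices made for this illustration, not the paper's code, and zero frequencies would have to be replaced before the transform.

```python
import numpy as np


def clr(freqs):
    """Centered log-ratio transform of compositional rows (e.g. allele frequencies)."""
    x = np.asarray(freqs, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr needs strictly positive parts; replace zeros first")
    logx = np.log(x)
    # Subtracting the mean log equals dividing by the row's geometric mean.
    return logx - logx.mean(axis=1, keepdims=True)


def logratio_pca(freqs, n_components=2):
    """PCA on clr-transformed compositions.

    Returns (scores, explained-variance ratios) for the first n_components axes.
    """
    z = clr(freqs)
    z = z - z.mean(axis=0)  # column-center before the SVD
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var = s ** 2
    return scores, var[:n_components] / var.sum()
```

Because the clr transform depends only on ratios between parts, rescaling a row (counts versus frequencies) leaves the result unchanged, and each transformed row sums to zero, so a composition with D parts yields at most D - 1 informative components.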

7.
Lam KF  Lee YW  Leung TL 《Biometrics》2002,58(2):316-323
In this article, the focus is on the analysis of multivariate survival time data with various types of dependence structures. Examples of multivariate survival data include clustered data and repeated measurements from the same subject, such as the interrecurrence times of cancer tumors. A random effect semiparametric proportional odds model is proposed as an alternative to the proportional hazards model. The distribution of the random effects is assumed to be multivariate normal and the random effect is assumed to act additively to the baseline log-odds function. This class of models, which includes the usual shared random effects model, the additive variance components model, and the dynamic random effects model as special cases, is highly flexible and is capable of modeling a wide range of multivariate survival data. A unified estimation procedure is proposed to estimate the regression and dependence parameters simultaneously by means of a marginal-likelihood approach. Unlike the fully parametric case, the regression parameter estimate is not sensitive to the choice of correlation structure of the random effects. The marginal likelihood is approximated by the Monte Carlo method. Simulation studies are carried out to investigate the performance of the proposed method. The proposed method is applied to two well-known data sets, including clustered data and recurrent event times data.

8.
Changing frequency of interim analysis in sequential monitoring   Total citations: 1, self-citations: 0, citations by others: 1
K K Lan  D L DeMets 《Biometrics》1989,45(3):1017-1020
In clinical trial data monitoring, one can either introduce a discrete sequential boundary for a set of specified decision times or adopt a use function and then derive the boundary when data are monitored. If the use function approach is employed, one can adjust the frequency of data monitoring as long as the decision is not data-dependent. However, if the frequency of future data monitoring is affected by the observed data, then the probability of Type I error will no longer be preserved exactly. But the effect on the significance level and power is very small, perhaps negligible, as indicated by simulation results.
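The use-function idea can be illustrated by computing how much Type I error is spent at each look. The sketch assumes two commonly cited Lan-DeMets use functions (O'Brien-Fleming-type and Pocock-type) with a two-sided α of 0.05; the function names are hypothetical, and deriving the actual boundaries, which requires recursive numerical integration, is not shown.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD = NormalDist()


def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type use function: alpha(t) = 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = _STD.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD.cdf(z / sqrt(min(t, 1.0))))


def pocock_spending(t, alpha=0.05):
    """Pocock-type use function: alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * min(t, 1.0)) if t > 0 else 0.0


def alpha_increments(times, spend=obf_spending, alpha=0.05):
    """Type I error available at each interim look, given increasing information fractions."""
    spent, out = 0.0, []
    for t in times:
        cum = spend(t, alpha)
        out.append(cum - spent)  # only the new portion is spent at this look
        spent = cum
    return out
```

Because the cumulative amount spent depends only on the information fraction t, the increments for looks at t = 0.25, 0.5, 0.75, 1 and for looks at t = 0.5, 1 both add up to α(1) = α. This is what permits changing the monitoring frequency, provided, as the abstract notes, that the choice of look times does not depend on the observed data.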

9.
Liu Guobo  Rong Kai  Tang Li  Wang Weimin  Zhou Weiqi  Han Baolong  Liu Kai  Huang Hong 《Acta Ecologica Sinica》2022,42(24):10051-10059
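The ecological environment is the foundation of human survival and development, and building a smart management and service platform for urban ecological big data is necessary for constructing ecological and beautiful cities. Taking Shenzhen as an example, this work draws on the Internet of Things, the mobile internet, computing, databases, and web geographic information systems, together with key technologies such as integration and sharing of spatiotemporal geographic big data, big data mining and analysis, and integrated cloud-client business collaboration, and combines them with a library of urban ecosystem assessment, analysis, and decision models, methods, and countermeasures. On the basis of effectively integrating Shenzhen's multi-source heterogeneous ecological big data from "air, ground, network, statistics, and crowdsourcing" sources, an end-to-end, integrated smart management and service platform for Shenzhen's ecological big data was built, covering data collection, information extraction, knowledge discovery, decision generation, and rapid service. The platform includes an ecological field survey and data collection system and an urban ecological monitoring, assessment, and decision-analysis system for professional users such as government departments and researchers, as well as the Shenzhen Ecological Survey app for the general public. The platform reveals, for the first time, changes since 1979 in the pattern, composition, processes, services, and health of Shenzhen's different ecosystems, and has made Shenzhen's comprehensive eco-environmental decision-making more scientific, its eco-environmental supervision more precise, and its eco-environmental public services more convenient. The platform effectively reduces the difficulty users face in collecting and processing data and in operating professional models, overcomes bottlenecks in applying raw data, and increases the use of professional models in government departments. In the future, mining knowledge from data through an "ecological big data + professional ecological models" approach will be the key to smart, professional management of urban ecological big data, and an important way to raise the overall level of information services for urban eco-environmental protection.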

10.
Palm-Pitviper (Bothriechis) Phylogeny, mtDNA, and Consilience   Total citations: 1, self-citations: 0, citations by others: 1
The phylogeny of the neotropical palm-pitviper genus Bothriechis has been previously inferred from morphology and allozymes. These nuclear-based data sets were found to be congruent and also consilient with the geologic history of the region. We present mtDNA sequence data as an additional data set for inferring Bothriechis phylogeny and analyze it both separately and in combination with the previous data. The mtDNA phylogeny is incongruent with the nuclear data sets. Based on a number of factors, we hypothesize that the incongruence is due to both mtDNA introgression and lineage sorting. We argue that mtDNA represents extrinsic data and as such should be used as a consilient data set.

11.
12.
Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In such instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantages that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success does not depend on the data meeting normality or covariance-homogeneity conditions, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with that of LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform nearly equally well. With complete data sets, LDA may be the better choice, as this example shows, but with missing observations classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and are generally not designed to address the problem of missing data. Testing the data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever the data do not meet the relevant assumptions. The classification tree approach is especially recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.

13.
This article applies a simple method for settings where one has clustered data but statistical methods are available only for independent data. We assume the statistical method provides a normally distributed estimate, θ, and an estimate of its variance, σ². We randomly select one data point from each cluster and apply the statistical method to these independent data. We repeat this many times and use the average of the resulting θ values as our estimate. An estimate of its variance is given by the average of the σ² values minus the sample variance of the θ values. We call this procedure multiple outputation, because all "excess" data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when the cluster size is related to the outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but it can also be applied generally as a quick and simple tool.
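The procedure described above translates almost line for line into code. In this sketch, which assumes a user-supplied estimator for independent data returning (θ, σ²), the names `multiple_outputation` and `mean_estimator` are hypothetical, and the sample mean merely stands in for whatever independent-data method is actually of interest; the number of outputations and the seed are arbitrary.

```python
import random
from statistics import mean, variance


def multiple_outputation(clusters, estimator, n_outputations=500, seed=0):
    """Multiple outputation for clustered data.

    clusters: list of lists of observations (one inner list per cluster).
    estimator: function(sample) -> (theta, sigma2), valid for independent data.
    Returns (average theta, average sigma2 minus sample variance of theta).
    """
    rng = random.Random(seed)
    thetas, sigma2s = [], []
    for _ in range(n_outputations):
        # Keep one randomly chosen observation per cluster; "output" the rest.
        sample = [rng.choice(c) for c in clusters]
        theta, sigma2 = estimator(sample)
        thetas.append(theta)
        sigma2s.append(sigma2)
    theta_mo = mean(thetas)
    # Within-outputation variance minus between-outputation variance.
    var_mo = mean(sigma2s) - variance(thetas)
    return theta_mo, var_mo


def mean_estimator(sample):
    """Sample mean and its estimated variance, assuming independent observations."""
    return mean(sample), variance(sample) / len(sample)
```

With a finite number of outputations the variance estimate can, in principle, come out negative in small samples; more outputations reduce the Monte Carlo noise in both quantities.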

14.
The decisiveness of a data set has been defined as the degree to which all possible dichotomous trees for that data set differ in length, and the DD statistic (the data decisiveness index) has been proposed to measure this degree. In this paper, we first discuss an exact nonrecursive formula for the length of indecisive data sets (DD = 0) that consist of informative binary characters with no missing entries. Next, the concept of indecisive data sets is extended to data sets in which missing entries may be present. Last, indecisive data sets with missing entries are used as an aid to construct hypothetical data sets that single out some of the factors that influence the DD statistic. On the basis of these examples, we conclude that the concept of data decisiveness is too elusive to be captured by a single, simple index such as DD.

15.
An issue in class-imbalanced learning is which assessment metric should be employed. So far, the precision-recall curve (PRC) has rarely been used as a metric in practice compared with its alternative, the receiver operating characteristic (ROC) curve. This study investigates the performance of PRC as the evaluation criterion for class-imbalanced data and focuses on comparing PRC with ROC. The advantages of PRC over ROC in assessing class-imbalanced data are also investigated and tested on our proposed algorithm by tuning all model parameters in simulation studies and real data examples. The results show that PRC is competitive with ROC as a performance measure for tuning model parameters on class-imbalanced data. PRC can be considered an alternative, yet effective, assessment for preprocessing skewed data (for example, variable selection) and for building a classifier in class-imbalanced learning.
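To see why the choice of metric matters under imbalance, the sketch below computes the ROC AUC (via the Mann-Whitney statistic) and average precision, one common step-wise summary of the PR curve, from scratch on a toy ranking with 2 positives among 10 cases. These are standard textbook definitions, not the algorithm proposed in the study, and `average_precision` as written assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at the recall level reached here
    return ap / n_pos


if __name__ == "__main__":
    # Toy imbalanced ranking: positives at ranks 3 and 5 out of 10.
    labels = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    print(roc_auc(labels, scores), average_precision(labels, scores))
```

On this ranking the ROC AUC is 11/16 = 0.6875, while average precision is 11/30, about 0.37, against a chance level of roughly the positive prevalence, 0.2. The PR summary is driven by the false positives at the top of the list, which ROC AUC largely averages away across the many negatives.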

16.
In the last 10-15 years, many new technologies and approaches have been implemented in research in the pharmaceutical industry, such as high-throughput screening and combinatorial chemistry, which have produced a rapidly growing amount of biological assay and structural data in corporate databases. Using the data from this growing data mountain efficiently is a key success factor: 'provide as much knowledge as possible as early as possible and therefore enable research teams to make the best possible decision whenever this decision can be supported by stored data'. Here we describe an approach, begun several years ago, for extracting as much information as possible from historical assay data stored in the corporate database. We show how important careful preprocessing of the stored data is for enhancing its information content. Several ways of accessing and analyzing the preconditioned data are in place; some of them are described in the examples.

17.
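Design and implementation of a sample-set database for protein secondary structure prediction   Total citations: 1, self-citations: 0, citations by others: 1
Zhang Ning  Zhang Tao 《Chinese Journal of Bioinformatics》2006,4(4):163-166
Database technology was applied to the processing and analysis of sample sets for protein secondary structure prediction, and a sample-set database for secondary structure prediction was established. Using the CB513 sample set as an example, the construction scheme of the database is described. Building a sample database not only makes data easier to store, manage, and retrieve, but also allows some simple sequence analyses to be performed directly, replacing much of the programming previously required. This greatly improves working efficiency and reduces errors.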
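The abstract does not give the database schema. As a sketch of the idea only, the following assumes a single hypothetical table holding each protein's identifier, amino-acid sequence, and a per-residue secondary-structure string in a three-state H/E/C alphabet, and shows the kind of simple analysis (state composition) that can then be done with a query instead of a custom parsing script. The table and function names are illustrative, and the CB513 data themselves are not included.

```python
import sqlite3
from collections import Counter


def build_db(records):
    """records: iterable of (protein_id, sequence, ss) with ss using H/E/C states."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample ("
        " id TEXT PRIMARY KEY,"
        " sequence TEXT NOT NULL,"
        " ss TEXT NOT NULL,"
        # Every residue must carry exactly one secondary-structure label.
        " CHECK (length(sequence) = length(ss)))"
    )
    con.executemany("INSERT INTO sample VALUES (?, ?, ?)", records)
    con.commit()
    return con


def ss_composition(con, min_length=0):
    """Fraction of residues in each state, over proteins of at least min_length residues."""
    counts = Counter()
    query = "SELECT ss FROM sample WHERE length(sequence) >= ?"
    for (ss,) in con.execute(query, (min_length,)):
        counts.update(ss)  # count each state character
    total = sum(counts.values())
    return {state: n / total for state, n in sorted(counts.items())}
```

Letting the database enforce the length check means malformed entries are rejected at load time (SQLite raises `sqlite3.IntegrityError`) rather than surfacing later as misaligned training samples.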

18.
Scientific publications should provide sufficient detail in terms of methodology and presented data to enable the community to reproduce the methodology, generate similar data, and arrive at the same conclusion if an identical sample is provided for analysis. The advent of high-throughput methods in biological experimentation imposes some unique challenges, both in presenting data in classical print format and in describing methodology and data analysis in sufficient detail to conform to good publication practice. To facilitate this process, Proteome Science is adopting a set of methodology and data presentation guidelines to enable both peer reviewers and the scientific community to better evaluate high-throughput proteomic studies.

19.
Geoscience observations and model simulations are generating vast amounts of multidimensional data. Analyzing these data effectively is essential for geoscience studies. However, the task is challenging for geoscientists because processing such massive data is both compute- and data-intensive, and the analytics require complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. The framework leverages cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers, a MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand complex data analytics in the cloud environment. A proof-of-concept prototype is used to test the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time and simplifying data analysis procedures for geoscientists.
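The MapReduce pattern the framework relies on can be imitated in a single process to show the data flow. The sketch assumes hypothetical records of the form (time, lat, lon, value) and computes a per-grid-cell temporal mean; it does not use HBase or Hadoop, and the function names are illustrative.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """record: (time, lat, lon, value) -> emit ((lat, lon), (value, 1))."""
    _, lat, lon, value = record
    yield (lat, lon), (value, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def reduce_phase(values):
    """Combine (sum, count) pairs for one grid cell and return its mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count


def temporal_mean(records):
    pairs = (kv for r in records for kv in map_phase(r))
    return {key: reduce_phase(vals) for key, vals in shuffle(pairs).items()}
```

Combining (sum, count) pairs, rather than averaging partial means, keeps the reduction associative, which is what lets a real MapReduce job apply the same step in a combiner before the shuffle and still obtain the exact mean.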

20.
Halligan BD  Greene AS 《Proteomics》2011,11(6):1058-1063
A major challenge in the field of high-throughput proteomics is the conversion of the large volume of experimental data that is generated into biological knowledge. Typically, proteomics experiments involve the combination and comparison of multiple data sets and the analysis and annotation of these combined results. Although some commercial applications provide some of these functions, there is a need for a free, open source, multifunction tool for advanced proteomics data analysis. We have developed the Visualize program, which provides users with the ability to visualize, analyze, and annotate proteomics data, combine data from multiple runs, and quantitate differences between individual runs and combined data sets. Visualize is licensed under the GNU GPL and can be downloaded from http://proteomics.mcw.edu/visualize. It is available as compiled client-based executable files for both Windows and Mac OS X platforms, as well as Perl source code.
