Similar documents
20 similar documents found (search time: 0 ms)
1.
Functional data are smooth, often continuous, random curves, which can be seen as an extreme case of multivariate data with infinite dimensionality. Just as componentwise inference for multivariate data naturally performs feature selection, subsetwise inference for functional data performs domain selection. In this paper, we present a unified testing framework for domain selection on populations of functional data. In detail, p-values of hypothesis tests performed on pointwise evaluations of functional data are suitably adjusted for providing control of the familywise error rate (FWER) over a family of subsets of the domain. We show that several state-of-the-art domain selection methods fit within this framework and differ from each other by the choice of the family over which the control of the FWER is provided. In the existing literature, these families are always defined a priori. In this work, we also propose a novel approach, coined thresholdwise testing, in which the family of subsets is instead built in a data-driven fashion. The method seamlessly generalizes to multidimensional domains in contrast to methods based on a priori defined families. We provide theoretical results with respect to consistency and control of the FWER for the methods within the unified framework. We illustrate the performance of the methods within the unified framework on simulated and real data examples and compare their performance with other existing methods.
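The core ingredient of entry 1, adjusting pointwise p-values so that the FWER is controlled when selecting a subdomain, can be illustrated with a deliberately simple stand-in: pointwise two-sample tests on a discretized domain followed by a Holm step-down adjustment over all grid points. This is a generic sketch, not the paper's thresholdwise method; the normal approximation to the test statistic and all simulation settings are choices made here for brevity.

```python
import numpy as np
from math import erf, sqrt

def pointwise_pvalues(group_a, group_b):
    """Two-sample z-type test at each grid point (rows = curves, columns = points)."""
    na, nb = len(group_a), len(group_b)
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    se = np.sqrt(group_a.var(axis=0, ddof=1) / na + group_b.var(axis=0, ddof=1) / nb)
    z = diff / se
    # Two-sided p-value from the normal approximation (illustrative only).
    return np.array([2.0 * (1.0 - 0.5 * (1.0 + erf(abs(zi) / sqrt(2.0)))) for zi in z])

def holm_adjust(p):
    """Holm step-down adjustment: controls the FWER over the family of all points."""
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
curves_a = rng.normal(size=(20, 50))
curves_b = rng.normal(size=(20, 50)) + np.where(grid > 0.5, 1.5, 0.0)  # shift on (0.5, 1]
selected = holm_adjust(pointwise_pvalues(curves_a, curves_b)) < 0.05   # selected domain
```

Points where the adjusted p-value falls below the threshold form the selected domain; by construction of the adjustment, the probability of wrongly selecting any null point is bounded by the threshold.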

2.
An overview of methods for simulating and displaying spatial data in landscape ecology   Cited by: 3 (self-citations: 2, other citations: 1)
To accurately describe how continuous data vary across a landscape, this paper introduces the basic principles of seven commonly used methods for simulating ecological spatial data and four methods for displaying them, together with the characteristics and applicable scope of different spatial sampling schemes, and offers a preliminary discussion of the main factors that influence how ecological spatial data are represented.
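A classic example of the kind of spatial-data simulation method surveyed above is inverse distance weighting (IDW), which predicts a continuous surface from scattered samples. The sketch below is a generic illustration (the function and test points are mine, not taken from the paper).

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.0):
    """Inverse distance weighting: each query point receives a weighted mean
    of the sampled values, with weights proportional to 1 / distance**power."""
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # avoid division by zero at sample sites
    w = 1.0 / d ** power
    return (w * sample_z).sum(axis=1) / w.sum(axis=1)

# Four sampled locations at the corners of a unit square and one query point.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])
centre = idw(pts, vals, np.array([[0.5, 0.5]]))  # equidistant, so a plain mean
```

At a sample site the clamped distance makes that site's weight dominate, so the interpolator honours the observed values exactly (within floating-point precision).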

3.
Study designs where data have been aggregated by geographical areas are popular in environmental epidemiology. These studies are commonly based on administrative databases and, providing a complete spatial coverage, are particularly appealing to make inference on the entire population. However, the resulting estimates are often biased and difficult to interpret due to unmeasured confounders, which typically are not available from routinely collected data. We propose a framework to improve inference drawn from such studies exploiting information derived from individual-level survey data. The latter are summarized in an area-level scalar score by mimicking at ecological level the well-known propensity score methodology. The literature on propensity score for confounding adjustment is mainly based on individual-level studies and assumes a binary exposure variable. Here, we generalize its use to cope with area-referenced studies characterized by a continuous exposure. Our approach is based upon Bayesian hierarchical structures specified into a two-stage design: (i) geolocated individual-level data from survey samples are up-scaled at ecological level, then the latter are used to estimate a generalized ecological propensity score (EPS) in the in-sample areas; (ii) the generalized EPS is imputed in the out-of-sample areas under different assumptions about the missingness mechanisms, then it is included into the ecological regression, linking the exposure of interest to the health outcome. This delivers area-level risk estimates, which allow a fuller adjustment for confounding than traditional areal studies. The methodology is illustrated by using simulations and a case study investigating the risk of lung cancer mortality associated with nitrogen dioxide in England (UK).
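The two-stage idea in entry 3 (model the exposure first, then adjust the outcome regression with the resulting score) can be caricatured with ordinary least squares. This toy linear version is only a sketch of the logic, not the paper's Bayesian hierarchical model; all variable names and simulation settings are invented here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_areas = 200
confounder = rng.normal(size=n_areas)                               # e.g. area-level deprivation
exposure = 0.8 * confounder + rng.normal(scale=0.5, size=n_areas)   # continuous exposure
outcome = 1.0 * exposure + 2.0 * confounder + rng.normal(scale=0.3, size=n_areas)

def ols(X, y):
    """Least-squares coefficients with an intercept prepended."""
    return np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]

# Naive ecological regression: the exposure effect absorbs the confounding.
naive_slope = ols(exposure.reshape(-1, 1), outcome)[1]

# Stage 1: model the exposure from area-level covariates; the fitted value
# plays the role of a (generalized) propensity score.
stage1 = ols(confounder.reshape(-1, 1), exposure)
eps = np.column_stack([np.ones(n_areas), confounder]) @ stage1

# Stage 2: outcome regression including the score as an adjustment term.
adjusted_slope = ols(np.column_stack([exposure, eps]), outcome)[1]
```

The naive slope is badly inflated by the confounder, while the score-adjusted slope recovers a value close to the true exposure effect of 1.0.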

4.
Problem: A series of long-term field experiments is described, with particular reference to monitoring and quality control. This paper addresses problems in data management of particular importance for long-term studies, including data manipulation, archiving, quality assessment, and flexible retrieval for analysis. Method: The problems were addressed with a purpose-built database system, built on commercial software and running under Microsoft Windows. Conclusion: The database system brings many advantages compared to available software, including significantly improved quality checking and access. The query system allows easy access to data sets, thus improving the efficiency of analysis. Quality assessments of the initial dataset demonstrated that the database system can also provide general insight into the types and magnitudes of error in data sets. Finally, the system can be generalised to include data from a number of different projects, thus simplifying data manipulation for meta-analysis.

5.
6.
Lam Tran, Kevin He, Di Wang, Hui Jiang. Biometrics, 2023, 79(2): 1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
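The selection principle in entry 6 (choose the external-data weight that minimizes the LOOCV error on the local rows) can be sketched for a linear model with a brute-force grid search. This is not the paper's closed-form reformulation; the weight grid, the biased external set, and all settings are assumptions made here for illustration.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def local_loocv(X_loc, y_loc, X_ext, y_ext, w_ext):
    """Leave-one-out CV error over the *local* rows when every external row
    enters the fit with a common weight w_ext (brute-force refits)."""
    err = 0.0
    for i in range(len(y_loc)):
        keep = np.arange(len(y_loc)) != i
        X = np.vstack([X_loc[keep], X_ext])
        y = np.concatenate([y_loc[keep], y_ext])
        w = np.concatenate([np.ones(keep.sum()), np.full(len(y_ext), w_ext)])
        beta = wls(X, y, w)
        err += (y_loc[i] - X_loc[i] @ beta) ** 2
    return err / len(y_loc)

rng = np.random.default_rng(2)
beta_true = np.array([1.0, -2.0])
X_loc = rng.normal(size=(30, 2))
y_loc = X_loc @ beta_true + rng.normal(scale=0.5, size=30)
X_ext = rng.normal(size=(300, 2))
y_ext = X_ext @ (beta_true + 1.5) + rng.normal(scale=0.5, size=300)  # deliberately biased external set

weights = [0.0, 0.1, 0.5, 1.0]
errors = [local_loocv(X_loc, y_loc, X_ext, y_ext, w) for w in weights]
best_w = weights[int(np.argmin(errors))]
```

Because the external set is generated from shifted parameters, the LOOCV criterion should drive its weight toward zero; with an unbiased external set the same criterion would instead favour integrating it.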

7.
During the 20th century ecologists largely relied on the frequentist system of inference for the analysis of their data. However, in the past few decades ecologists have become increasingly interested in the use of Bayesian methods of data analysis. In this article I provide guidance to ecologists who would like to decide whether Bayesian methods can be used to improve their conclusions and predictions. I begin by providing a concise summary of Bayesian methods of analysis, including a comparison of differences between Bayesian and frequentist approaches to inference when using hierarchical models. Next I provide a list of problems where Bayesian methods of analysis may arguably be preferred over frequentist methods. These problems are usually encountered in analyses based on hierarchical models of data. I describe the essentials required for applying modern methods of Bayesian computation, and I use real-world examples to illustrate these methods. I conclude by summarizing what I perceive to be the main strengths and weaknesses of using Bayesian methods to solve ecological inference problems.
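The Bayesian-versus-frequentist contrast discussed in entry 7 is easiest to see in a conjugate example. The sketch below, with invented survey numbers, compares the maximum-likelihood proportion with a Beta-Binomial posterior mean for an occupancy-style question; it is a textbook illustration, not an example from the article.

```python
# Detection of a species in 3 of 10 surveyed plots (made-up numbers).
successes, trials = 3, 10

# Frequentist point estimate: the maximum-likelihood proportion.
mle = successes / trials                        # 0.3

# Bayesian conjugate update: a Beta(a, b) prior combined with binomial data
# yields a Beta(a + successes, b + failures) posterior.
a_prior, b_prior = 2.0, 2.0                     # weakly informative prior
a_post = a_prior + successes
b_post = b_prior + (trials - successes)
posterior_mean = a_post / (a_post + b_post)     # 5/14, shrunk toward the prior mean 0.5
```

With small samples the posterior mean is pulled toward the prior; as the sample grows, the two estimates converge, which mirrors the practical advice in the article's comparison.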

8.
The value of scientific studies increases and is extended when their data are stored in a manageable and accessible format. This is demonstrated through development of a raccoon ecology database (REDB) to store, manage and disseminate available peer-reviewed and unpublished data on raccoon (Procyon lotor) biology, ecology and raccoon rabies, including citations for data sources. Over 800 documents were identified and citations for them entered into the database as literature references. Approximately 1000 trait values were entered from almost 200 of these sources. These data included estimates of population density, survival rates, rabies incubation period, litter size, body weight, dispersal distance and home-range size, often by age or sex class. Each datum is linked to a citation for its source, and to information about location and land use in the study area, time of year the study was undertaken, sample size, and variance. The relational database design enables querying and easy updating and manipulation of data.

The relational data model is presented, as is its application in further developing an individual-based, spatially-explicit population model of raccoon rabies. Using information queried from the REDB benefits model development by: i) assessing the appropriateness of input parameter values, ii) providing sources for citing input values, iii) parameterising the model to different geographic regions, iv) enabling meta-analyses for evaluating model structure, as well as further contributing to parameterisation at specific locations, and v) providing biologically appropriate parameter input values for model sensitivity testing. The REDB is a useful research resource that will increase in value with ongoing inclusion of data from future raccoon and raccoon rabies studies and serves as a model for database design and research applications to other species. The database and an empty database for use with other species are available online (http://redb.nrdpfc.ca).
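The design principle behind the REDB (every trait value linked through a foreign key to its literature source, so any datum is traceable) can be sketched with an in-memory SQLite database. The schema and the placeholder citation below are illustrative inventions, not the actual REDB schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE reference (
    ref_id   INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE trait_value (
    value_id INTEGER PRIMARY KEY,
    ref_id   INTEGER NOT NULL REFERENCES reference(ref_id),
    trait    TEXT NOT NULL,        -- e.g. 'litter_size', 'home_range_km2'
    value    REAL NOT NULL,
    n        INTEGER               -- sample size behind the estimate
);
""")
con.execute("INSERT INTO reference VALUES (1, 'Doe (1990), placeholder citation')")
con.executemany("INSERT INTO trait_value VALUES (?, 1, ?, ?, ?)",
                [(1, "litter_size", 3.5, 40), (2, "litter_size", 4.1, 12)])

# Query a trait together with its source citation: the core use case.
rows = con.execute("""
    SELECT t.trait, t.value, r.citation
    FROM trait_value t JOIN reference r USING (ref_id)
    WHERE t.trait = 'litter_size'
    ORDER BY t.value_id
""").fetchall()
```

The join means a meta-analysis query can always report which study each estimate came from, which is exactly the traceability property the abstract emphasises.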


9.
The effects of intensive human intervention, poor socio-economic conditions and limited knowledge of mangrove ecology pose enormous challenges for mangrove restoration in Southeast Asia. We present a framework for tropical mangrove restoration. Our proposed restoration framework addresses ecological, economic and social issues simultaneously by considering the causes of mangrove degradation, and we provide a step-by-step guideline for restoration. We argue that although ecological issues are of prime importance, economic and social issues must be considered in the restoration plan in order for it to be successful. Since mangrove ecology is not adequately studied in this region, local ecological knowledge can be used to fill gaps in baseline information. Unwanted human disturbance can be minimized by encouraging community participation, which can be ensured and sustained by supporting the livelihoods of the coastal community. We translated the restoration paradigm into a readily available practical guideline for the executors of the plans, and we provide an example of a mangrove restoration project that closely follows our proposed framework. We are optimistic that this framework has the potential for universal application with the necessary adjustments.

10.
In this paper, we present a multi-agent framework for data mining in electromyography. This application, based on a web interface, provides a set of functionalities for manipulating 1000 medical cases and more than 25,000 neurological tests stored in a medical database. The aim is to extract medical information using data mining algorithms and to supply a knowledge base with pertinent information. The multi-agent platform makes it possible to distribute the data management process among several autonomous entities. This framework provides parallel and flexible data manipulation.

11.
Pei-Sheng Lin. Biometrika, 2008, 95(4): 847-858
We use the quasilikelihood concept to propose an estimating equation for spatial data with correlation across the study region in a multi-dimensional space. With appropriate mixing conditions, we develop a central limit theorem for a random field under various Lp metrics. The consistency and asymptotic normality of quasilikelihood estimators can then be derived. We also conduct simulations to evaluate the performance of the proposed estimating equation, and a dataset from East Lansing Woods is used to illustrate the method.
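A quasilikelihood estimating equation of the kind entry 11 studies can be shown in its simplest form: a log-link count model with variance proportional to the mean and a working-independence covariance, solved by Newton-Raphson. The paper's spatial correlation would enter through a non-diagonal working covariance; the independent version below is a simplification, and all simulation settings are mine.

```python
import numpy as np

def quasi_poisson_fit(X, y, steps=25):
    """Solve the quasi-score equation U(beta) = X'(y - mu) = 0 with
    mu = exp(X beta), by Newton-Raphson using the information X' diag(mu) X."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = np.exp(X @ beta)
        beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))     # true coefficients (0.5, 0.8)
beta_hat = quasi_poisson_fit(X, y)
```

Because the quasi-score only requires a mean and a variance assumption, the same iteration is valid for over-dispersed counts, which is the appeal of the quasilikelihood approach.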

12.
13.
In LCA, normalisation is applied to quantify the relative size of impact scores. Several sets of normalisation data exist in the Netherlands, all of which have a certain degree of unreliability. The purpose of this study is to update Dutch normalisation data and to create a framework for deriving such data. Normalisation data are calculated for three different levels in order to give the LCA practitioner a more extensive basis for the interpretation process. The first level contains all impacts relating to activities that take place within Dutch territory. The second level is based on Dutch final consumption, meaning that import and export are taken into account. The third level is an attempt to estimate impacts in Europe, based on European data where possible and otherwise on extrapolation from the Dutch situation.
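Mechanically, LCA normalisation is a division of each characterised impact score by the corresponding reference total (for example, the yearly impact of one of the three reference levels above), yielding dimensionless scores that can be compared across impact categories. The numbers below are invented purely to show the arithmetic; they are not the Dutch normalisation data of the study.

```python
# Characterised impact scores of some product system (made-up values).
impact_scores = {"climate_change_kgCO2eq": 450.0, "acidification_kgSO2eq": 1.2}

# Reference totals for the chosen normalisation level (made-up values).
reference_totals = {"climate_change_kgCO2eq": 2.4e11, "acidification_kgSO2eq": 7.3e8}

# Normalised, dimensionless scores: the product's share of the reference total.
normalised = {cat: impact_scores[cat] / reference_totals[cat] for cat in impact_scores}
```

The choice of reference level (territory, final consumption, or Europe-wide) changes the denominators and therefore the relative ranking of the categories, which is why the study derives all three.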

14.
15.

Background

Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities.

Results

Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities over all 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction the AUCs were: COPD—baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes—baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders—baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia—baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72).
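The model family used in the Results above, logistic regression with an elastic-net penalty, can be sketched in a few lines with plain (sub)gradient descent on synthetic data. In practice one would use an off-the-shelf solver; the penalty weights, data, and optimizer settings below are choices made here for illustration only.

```python
import numpy as np

def elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.1, steps=2000):
    """Logistic regression minimizing the average log-loss plus the
    elastic-net penalty lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2),
    fitted by (sub)gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (prob - y) / n
        grad += lam * (alpha * np.sign(w) + (1.0 - alpha) * w)
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
logits = 2.0 * X[:, 0] - 1.0 * X[:, 1]          # only two informative features
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
w = elastic_net_logistic(X, y)
```

The L1 part of the penalty pushes the coefficients of uninformative features toward zero (useful with thousands of auto-extracted features), while the L2 part stabilizes correlated ones.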

Conclusions

We demonstrated the advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner. Auto-extracted features have good predictive power over multiple time horizons, and such feature sets have the potential to form the foundation of complex automated analytic tasks.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0425-8) contains supplementary material, which is available to authorized users.

16.
A wide variety of information or ‘metadata’ is required when undertaking dendrochronological sampling. Traditionally, researchers record observations and measurements in field notebooks and/or on paper recording forms, and use digital cameras and hand-held GPS devices to capture images and record locations. In the lab, field notes are often manually entered into spreadsheets or personal databases, which are then sometimes linked to images and GPS waypoints. This process is both time-consuming and prone to human and instrument error. Specialised hardware exists to marry these data sources, but costs can be prohibitive for small-scale operations (>$2000 USD). Such systems often include proprietary software that is tailored to very specific needs and might require a high level of expertise to use. We report on the successful testing and deployment of a dendrochronological field data collection system utilising affordable off-the-shelf devices ($100–300 USD). The method builds upon established open source software that has been widely used in developing countries for public health projects as well as to assist in disaster recovery operations. It includes customisable forms for digital data entry in the field, and marries accurate GPS locations with geotagged photographs (with possible extensions to other measuring devices via Bluetooth) into structured data fields that are easy to learn and operate. Digital data collection is less prone to human error and efficiently captures a range of important metadata. In our experience, the hardware proved field worthy in terms of size, ruggedness, and dependability (e.g., battery life). The system integrates directly with the Tellervo software to both create forms and populate the database, providing end users with the ability to tailor the solution to their particular field data collection needs.

17.
18.
Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves the model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at the authors' website and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge, and had better spatial resolution compared to the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness and spatial resolution.
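The count-model component of entry 18, a zero-inflated negative binomial (ZINB), is a two-part mixture: with probability pi the count is a structural zero, and otherwise it follows a negative binomial. The sketch below shows only this pmf (one common NB parameterisation, chosen here); the paper's HMM and MCMC machinery are not reproduced.

```python
from math import lgamma, log, exp

def zinb_pmf(k, pi, r, p):
    """Zero-inflated negative binomial pmf. The NB component uses
    P(K = k) = C(k + r - 1, k) * (1 - p)**r * p**k, computed on the
    log scale via lgamma for numerical stability."""
    nb = exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
             + r * log(1.0 - p) + k * log(p) if k > 0 else r * log(1.0 - p))
    return pi + (1.0 - pi) * nb if k == 0 else (1.0 - pi) * nb

# With pi = 0.3, r = 2, p = 0.4 the zero probability mixes a structural
# zero (0.3) with the NB zero probability 0.7 * 0.6**2.
p_zero = zinb_pmf(0, 0.3, 2.0, 0.4)
```

Fitting pi separately from the NB dispersion is what lets the model accommodate both an excess of exact zeros and over-dispersion in the non-zero counts.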

19.
20.
K. H. Cho, S. M. Choo, P. Wellstead, O. Wolkenhauer. FEBS Letters, 2005, 579(20): 4520-4528
We propose a unified framework for the identification of functional interaction structures of biomolecular networks in a way that leads to a new experimental design procedure. In developing our approach, we have built upon previous work. Thus we begin by pointing out some of the restrictions associated with existing structure identification methods and point out how these restrictions may be eased. In particular, existing methods use specific forms of experimental algebraic equations with which to identify the functional interaction structure of a biomolecular network. In our work, we employ an extended form of these experimental algebraic equations which, while retaining their merits, also overcome some of their disadvantages. Experimental data are required in order to estimate the coefficients of the experimental algebraic equation set associated with the structure identification task. However, experimentalists are rarely provided with guidance on which parameters to perturb, and to what extent, to perturb them. When a model of network dynamics is required then there is also the vexed question of sample rate and sample time selection to be resolved. Supplying some answers to these questions is the main motivation of this paper. The approach is based on stationary and/or temporal data obtained from parameter perturbations, and unifies the previous approaches of Kholodenko et al. (PNAS 99 (2002) 12841-12846) and Sontag et al. (Bioinformatics 20 (2004) 1877-1886). By way of demonstration, we apply our unified approach to a network model which cannot be properly identified by existing methods. Finally, we propose an experiment design methodology, which is not limited by the amount of parameter perturbations, and illustrate its use with an in numero example.
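The steady-state idea that entry 20 builds on (in the spirit of Kholodenko et al.) can be shown on a linear toy network: perturb each node's production rate, measure the global responses of all nodes, then invert and row-normalise to recover the local interaction coefficients, with the unknown perturbation strengths cancelling out. All numbers below are invented for illustration.

```python
import numpy as np

# Ground-truth Jacobian of a 3-node toy network near steady state:
# node 1 activates 2, node 2 activates 3, node 3 inhibits 1;
# the diagonal represents self-degradation.
A = np.array([[-1.0,  0.0, -0.8],
              [ 0.7, -1.0,  0.0],
              [ 0.0,  0.5, -1.0]])

# Each experiment perturbs one node's production rate. For a linearised
# system the steady-state global response matrix is R = -A^{-1} P, where
# the perturbation strengths P are unknown to the experimenter.
P = np.diag([0.10, 0.20, 0.15])
R = -np.linalg.solve(A, P)

# Identification step: invert the *measured* global responses and rescale
# each row so its diagonal equals -1. The unknown perturbation strengths
# cancel, recovering the local (direct) interaction coefficients.
R_inv = np.linalg.inv(R)
A_hat = -R_inv / np.diag(R_inv)[:, None]
```

In this noise-free linear setting the recovery is exact; the paper's contribution concerns, among other things, how to choose such perturbations well when the idealisation does not hold.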
