Similar Articles
20 similar articles found
1.
Many biomedical studies have identified important imaging biomarkers that are associated with both repeated clinical measures and a survival outcome. The functional joint model (FJM) framework, proposed by Li and Luo in 2017, investigates the association between repeated clinical measures and survival data, while adjusting for both high-dimensional images and low-dimensional covariates based on the functional principal component analysis (FPCA). In this paper, we propose a novel algorithm for the estimation of FJM based on the functional partial least squares (FPLS). Our numerical studies demonstrate that, compared to FPCA, the proposed FPLS algorithm can yield more accurate and robust estimation and prediction performance in many important scenarios. We apply the proposed FPLS algorithm to a neuroimaging study. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
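The FPCA step that both the FJM framework and the proposed FPLS alternative build on can be sketched for curves observed on a common grid. This is a generic discretized FPCA, not the authors' FJM or FPLS algorithm; the function name and toy data are illustrative.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Functional PCA via eigendecomposition of the sample covariance
    of curves observed on a common grid (basic discretized sketch)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Sample covariance across grid points
    cov = centered.T @ centered / (curves.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                  # discretized eigenfunctions
    scores = centered @ components                  # FPC scores per subject
    return mean_curve, components, scores

# Toy example: 50 noisy curves on a 30-point grid, one dominant mode
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 30)
curves = (rng.normal(size=(50, 1)) * np.sin(2 * np.pi * t)
          + 0.1 * rng.normal(size=(50, 30)))
_, comps, scores = fpca(curves, n_components=2)
```

With this toy data the leading component recovers the sinusoidal mode of variation (up to sign), and the scores recover each subject's amplitude.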

2.
Ecologists are increasingly asking large-scale and/or broad-scope questions that require vast datasets. In response, various top-down efforts and incentives have been implemented to encourage data sharing and integration. However, despite general consensus on the critical need for more open ecological data, several roadblocks still discourage compliance and participation in these projects; as a result, ecological data remain largely unavailable. Grassroots initiatives (i.e. efforts initiated and led by cohesive groups of scientists focused on specific goals) have thus far been overlooked as a powerful means to meet these challenges. These bottom-up collaborative data integration projects can play a crucial role in making high-quality datasets available because they tackle the heterogeneity of ecological data at a scale where it is still manageable, all the while offering the support and structure to do so. These initiatives foster best practices in data management and provide tangible rewards to researchers who choose to invest time in sound data stewardship. By maintaining proximity between data generators and data users, grassroots initiatives improve data interpretation and ensure high-quality data integration while providing fair acknowledgement to data generators. We encourage researchers to formalize existing collaborations and to engage in local activities that improve the availability and distribution of ecological data. By fostering communication and interaction among scientists, we are convinced that grassroots initiatives can significantly support the development of global-scale data repositories. In doing so, these projects help address important ecological questions and support policy decisions.

3.
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with data dependent acquisition (DDA) data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. To improve the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.

4.
This paper explores data compatibility issues arising from the assessment of remnant native vegetation condition using satellite remote sensing and field-based data. Space-borne passive remote sensing is increasingly used to provide a total sample and synoptic overview of the spectral and spatial characteristics of native vegetation canopies at a regional scale. However, integrating field-collected data that were not designed for use with remotely sensed data can lead to compatibility issues. Problems arising from the integration of unsuited datasets can contribute to data uncertainty and result in inconclusive findings. These problems (and potential solutions) form the basis of this paper: how can field surveys be designed to support and improve compatibility with remotely sensed total surveys? Key criteria were identified for designing field-based surveys of native vegetation condition (and similar applications) intended to incorporate remotely sensed data. The criteria include recommendations for the siting of plots, the need for reference location plots, the number of sample sites, and plot size and distribution within a study area. The difficulties of integrating these data are illustrated with real examples from a study of the vegetation in the Little River Catchment, New South Wales, Australia.

5.
Rosner B, Glynn RJ, Lee ML. Biometrics 2006, 62(1):185-192.
The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit-within-cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets: (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients, where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
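The cluster-level randomization idea can be sketched as follows. This is a simplified Monte Carlo version (sign flips at the cluster level, no ties) rather than the authors' adjusted-variance statistic; the function name and toy data are illustrative.

```python
import numpy as np

def clustered_signed_rank_pvalue(diffs, clusters, n_perm=2000, seed=0):
    """Randomization version of the signed rank test in which signs are
    flipped per cluster (e.g. person) rather than per subunit (e.g. eye).
    Simplified sketch of the cluster-level randomization idea."""
    diffs = np.asarray(diffs, dtype=float)
    clusters = np.asarray(clusters)
    ranks = np.argsort(np.argsort(np.abs(diffs))) + 1.0  # ranks of |d|
    signed = np.sign(diffs) * ranks
    labels = np.unique(clusters)
    observed = signed.sum()
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        # One random sign per cluster, applied to the whole cluster sum
        flips = rng.choice([-1.0, 1.0], size=labels.size)
        perm = sum(f * signed[clusters == c].sum()
                   for f, c in zip(flips, labels))
        if abs(perm) >= abs(observed):
            count += 1
    return count / n_perm

# Two eyes per person, all differences positive (strong effect)
diffs = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4, 0.7, 1.6]
clusters = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
p = clustered_signed_rank_pvalue(diffs, clusters)
```

With five clusters, only the two all-same-sign flip patterns reach the observed statistic, so the exact two-sided p-value is 2/32 = 0.0625; the Monte Carlo estimate fluctuates around that value. Real data with tied absolute differences would need midranks, which this sketch omits.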

6.
7.
There is an increasing need for life cycle data for bio-based products, which becomes particularly evident with the recent drive for greenhouse gas reporting and carbon footprinting studies. Meeting this need is challenging given that many bio-products have not yet been studied by life cycle assessment (LCA), and those that have are specific and limited to certain geographic regions. In an attempt to bridge data gaps for bio-based products, LCA practitioners can use either proxy data sets (e.g., use existing environmental data for apples to represent pears) or extrapolated data (e.g., derive new data for pears by modifying data for apples considering pear-specific production characteristics). This article explores the challenges and consequences of using these two approaches. Several case studies are used to illustrate the trade-offs between uncertainty and the ease of application, with carbon footprinting as an example. As shown, the use of proxy data sets is the quickest and easiest solution for bridging data gaps but also has the highest uncertainty. In contrast, data extrapolation methods may require extensive expert knowledge and are thus harder to use but give more robust results in bridging data gaps. They can also provide a sound basis for understanding variability in bio-based product data. If resources (time, budget, and expertise) are limited, the use of averaged proxy data may be an acceptable compromise for initial or screening assessments. Overall, the article highlights the need for further research on the development and validation of different approaches to bridging data gaps for bio-based products.
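The difference between the two approaches can be made concrete with a toy carbon-footprint calculation. All numbers below are invented, and yield scaling is only one possible extrapolation rule (it assumes the per-hectare burden is the same for both crops).

```python
# Proxy vs. extrapolated data for a hypothetical apple -> pear data gap.
apple_footprint = 0.30   # kg CO2e per kg apples (invented)
apple_yield = 50.0       # t/ha (invented)
pear_yield = 40.0        # t/ha (invented)

# Proxy approach: reuse the apple value unchanged for pears
pear_proxy = apple_footprint

# Extrapolation: keep the per-hectare burden, spread it over pear yield
pear_extrapolated = apple_footprint * apple_yield / pear_yield
```

Here the proxy stays at 0.30 kg CO2e/kg while the extrapolated value rises to 0.375 kg CO2e/kg, illustrating how a cheap proxy can understate the footprint of a lower-yielding crop.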

8.
9.
Improved accessibility of data that can be used in human health risk assessment (HHRA) necessitates advanced methods to incorporate such data optimally into HHRA analyses. This article investigates the application of data fusion methods to handling multiple sources of data in HHRA and its components. This application can be performed at two levels: first, as an integrative framework that combines various pieces of information with knowledge bases to build improved knowledge about an entity and its behavior; and second, more specifically, to combine multiple values for the state of a certain feature or variable (e.g., toxicity) into a single estimate. This work first reviews data fusion formalisms in terms of the architectures and techniques that correspond to each of these two levels. It then illustrates the benefits and challenges of applying them by working through several data fusion problems related to HHRA components.

10.
Data integration is key to functional and comparative genomics because integration allows diverse data types to be evaluated in new contexts. To achieve data integration in a scalable and sensible way, semantic standards are needed, both for naming things (standardized nomenclatures, use of key words) and also for knowledge representation. The Mouse Genome Informatics database and other model organism databases help to close the gap between information and understanding of biological processes because these resources enforce well-defined nomenclature and knowledge representation standards. Model organism databases have a critical role to play in ensuring that diverse kinds of data, especially genome-scale data sets and information, remain useful to the biological community in the long-term. The efforts of model organism database groups ensure not only that organism-specific data are integrated, curated and accessible but also that the information is structured in such a way that comparison of biological knowledge across model organisms is facilitated.

11.
Treating data reliability as an ordered variable with graded levels, and theoretically linking reliability to data sources such as primary ecological processes, secondary ecological processes, and external processes, this study constructs a quality-assessment method for ecological monitoring data and provides a new data quality index. The index estimates the quality of a dataset from the pass rate of observation records; the output includes the reliability grade of each record, the reasons records were flagged as outliers or errors, and the quality index value of the complete dataset. Applying the method to two tree-growth datasets from CERN showed that the index can quantitatively assess the quality of tree-growth data. The method also provides a foundation for developing related software.
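The pass-rate idea behind such a quality index can be sketched as follows. The grading rules, thresholds, and index formula here are illustrative, not those of the paper.

```python
import numpy as np

def quality_index(values, lo, hi):
    """Grade each record and score a dataset by its pass rate.
    Grades: 2 = plausible, 1 = outlier (beyond ~3 robust SDs from the
    median), 0 = error (outside the physically possible range [lo, hi]).
    Illustrative rules only."""
    values = np.asarray(values, dtype=float)
    grades = np.full(values.size, 2)
    med = np.median(values)
    mad = np.median(np.abs(values - med)) or 1.0
    grades[np.abs(values - med) > 3 * 1.4826 * mad] = 1  # flag outliers
    grades[(values < lo) | (values > hi)] = 0            # flag errors
    index = np.mean(grades == 2)  # quality index = pass rate of records
    return grades, index

# Tree diameter records (cm): one outlier, one impossible value
dbh = [12.1, 12.5, 11.8, 12.3, 30.0, -4.0, 12.0]
grades, idx = quality_index(dbh, lo=0.0, hi=200.0)
```

For this toy dataset the 30.0 cm record is graded an outlier, the negative record an error, and the index is 5/7, i.e. the fraction of records that pass all checks.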

12.
张源笙  夏琳  桑健  李漫  刘琳  李萌伟  牛广艺  曹佳宝  滕徐菲  周晴  章张 《遗传》2018,40(11):1039-1043
Multi-omics data on life and health are an important foundation for life-science research and for the development of biomedical technology. However, China has lacked platforms for managing and sharing biological data, which not only fails to meet the growing research needs of biomedicine and related disciplines but also severely constrains the integration, sharing, and translational use of the country's biological big data. In view of this, the Beijing Institute of Genomics, Chinese Academy of Sciences, established the BIG Data Center (BIGD) in early 2016 to build a biological big-data management platform and a multi-omics data resource system centered on national population health and important strategic biological resources. This article introduces BIGD's life and health big-data resource system, which mainly comprises an archive for raw omics data, a genome database, a genome variation database, a gene expression database, a DNA methylation database, a bioinformatics tools library, and a life-science wiki knowledge base. BIGD provides services for the submission, integration, and sharing of biological big data, laying an important foundation for improving life-science data management in China and for building a national bioinformatics center.

13.
Insight into current scientific applications of Big Data in the precision dairy farming area may help us to understand the inflated expectations around Big Data. The objective of this invited review paper is to give that scientific background and determine whether Big Data has overcome the peak of inflated expectations. A conceptual model was created, and a literature search in Scopus resulted in 1442 scientific peer reviewed papers. After thorough screening on relevance and classification by the authors, 142 papers remained for further analysis. The area of precision dairy farming (with classes in the primary chain (dairy farm, feed, breed, health, food, retail, consumer) and levels for object of interest (animal, farm, network)), the Big Data-V area (with categories on Volume, Velocity, Variety and other V’s) and the data analytics area (with categories in analysis methods (supervised learning, unsupervised learning, semi-supervised classification, reinforcement learning) and data characteristics (time-series, streaming, sequence, graph, spatial, multimedia)) were analysed. The animal sublevel, with 83% of the papers, exceeds the farm sublevel and network sublevel. Within the animal sublevel, topics within the dairy farm level prevailed with 58% over the health level (33%). Within the Big Data category, the Volume category was most favoured with 59% of the papers, followed by 37% of papers that included the Variety category. None of the papers included the Velocity category. Supervised learning, representing 87% of the papers, exceeds unsupervised learning (12%). Within supervised learning, 64% of the papers dealt with classification issues and exceeds the regression methods (36%). Time-series were used in 61% of the papers and were mostly dealing with animal-based farm data. Multimedia data appeared in a greater number of recent papers. 
Based on these results, it can be concluded that Big Data is a relevant research topic within the precision dairy farming area, but that its full potential in this area has not yet been utilised. However, the present authors expect that the full potential of Big Data in precision dairy farming will be reached when multiple Big Data characteristics (Volume, Variety, and other V's) and sources (animal, groups, farms, and chain parts) are used simultaneously, adding value to operational and strategic decisions.

14.
Albert PS, Follmann DA, Wang SA, Suh EB. Biometrics 2002, 58(3):631-642.
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
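The shared latent-process mechanism can be illustrated by simulation: one Gaussian AR(1) process per subject drives both the binary response and the missingness indicator, so observations are more likely missing exactly when a positive test is more likely. The link functions and coefficients below are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def simulate_shared_ar(n_subjects=500, n_times=10, rho=0.7,
                       beta_y=1.0, beta_m=1.5, seed=0):
    """Simulate binary outcomes with informative missingness: a shared
    Gaussian AR(1) process enters both the response model and the
    missingness model (toy version; all coefficients illustrative)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho ** 2)  # keeps the marginal variance at 1
    b = np.empty((n_subjects, n_times))
    b[:, 0] = rng.normal(size=n_subjects)
    for t in range(1, n_times):
        b[:, t] = rho * b[:, t - 1] + sd * rng.normal(size=n_subjects)
    p_y = 1 / (1 + np.exp(-beta_y * b))           # P(positive urine test)
    y = rng.random(b.shape) < p_y
    p_m = 1 / (1 + np.exp(-(-2 + beta_m * b)))    # P(observation missing)
    missing = rng.random(b.shape) < p_m
    return y, missing

y, missing = simulate_shared_ar()
rate_pos = missing[y].mean()    # missingness rate among positive tests
rate_neg = missing[~y].mean()   # missingness rate among negative tests
```

Because the same latent process raises both probabilities, positive tests go missing noticeably more often than negative ones, which is exactly the bias a naive complete-case analysis of the marginal positive-test probability would suffer from.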

15.
Wildlife biologists are using land-characteristics data sets for a variety of applications. Many kinds of landscape variables have been characterized, and the resultant data sets or maps are readily accessible. Often, too little consideration is given to the accuracy or traits of these data sets, most likely because biologists do not know how such data are compiled and rendered, or the potential pitfalls that can be encountered when applying these data. To increase understanding of the nature of land-characteristics data sets, I introduce aspects of source information and data-handling methodology that include the following: ambiguity of land characteristics; temporal considerations and the dynamic nature of the landscape; type of source data versus landscape features of interest; data resolution, scale, and geographic extent; data entry and positional problems; rare landscape features; and interpreter variation. I also include guidance for determining the quality of land-characteristics data sets through metadata or published documentation, visual clues, and independent information. The quality or suitability of the data sets for wildlife applications may be improved with thematic or spatial generalization, avoidance of transitional areas on maps, and merging of multiple data sources. Knowledge of the underlying challenges in compiling such data sets will help wildlife biologists to better assess the strengths and limitations and determine how best to use these data.

16.
The application of medical big data is of great significance for clinical research, scientific management, and the transformation of healthcare service models. This article reviews the current state of medical big-data applications in China and abroad, describes the authors' institution's practical experience in using medical data, and analyzes the application needs of medical big data from the perspectives of medical staff, patients, administrators, and researchers. Finally, drawing on existing practice, it proposes a concept, steps, and methods for building a medical big-data application platform.

17.
Traditional methods for organizing crop germplasm data create a separate data table for each crop species; this approach can no longer effectively meet the needs of integrated germplasm data analysis. This paper proposes a germplasm data organization method based on attribute-separated storage: a data table is built for each attribute of the germplasm, with no subordination relationships among attributes. The method unifies data-query operations, optimizes the query process, and improves analysis efficiency. It is flexible and extensible, conveniently integrates data related to germplasm analysis, and is suitable for building distributed germplasm-resource databases and related information systems.
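One way to realize attribute-separated storage is one table per attribute, all keyed by a shared accession identifier, so any combination of attributes can be queried with joins instead of per-crop schemas. A sketch with SQLite; table names, column names, and data are invented, not those of the paper.

```python
import sqlite3

# One table per attribute, each keyed by a shared accession id
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE attr_species (acc_id TEXT PRIMARY KEY, value TEXT);
    CREATE TABLE attr_height  (acc_id TEXT PRIMARY KEY, value REAL);
""")
con.executemany("INSERT INTO attr_species VALUES (?, ?)",
                [("A1", "wheat"), ("A2", "rice"), ("A3", "wheat")])
con.executemany("INSERT INTO attr_height VALUES (?, ?)",
                [("A1", 92.0), ("A2", 110.5), ("A3", 88.0)])

# Cross-attribute query: mean plant height of wheat accessions
row = con.execute("""
    SELECT AVG(h.value) FROM attr_species s
    JOIN attr_height h ON h.acc_id = s.acc_id
    WHERE s.value = 'wheat'
""").fetchone()
mean_wheat_height = row[0]
```

Adding a new attribute (say, disease resistance) is just one more table keyed by `acc_id`, with no change to existing tables, which is the flexibility and extensibility the abstract claims for this design.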

18.
By collecting, storing, maintaining, and analyzing medical data, a healthcare service data center can play a very important role in evaluating and improving patient safety, supporting medical quality management, guiding patients in seeking care, and promoting the development of biobanks. Although the construction of China's national healthcare service data center started later than in developed countries, it has already achieved some results in guiding healthcare services and serving healthcare management.

19.
20.
Madsen L, Fang Y. Biometrics 2011, 67(3):1171-1175; discussion 1175-1176.
We introduce an approximation to the Gaussian copula likelihood of Song, Li, and Yuan (2009, Biometrics 65, 60–68) used to estimate regression parameters from correlated discrete or mixed bivariate or trivariate outcomes. Our approximation allows estimation of parameters from response vectors of length much larger than three, and is asymptotically equivalent to the Gaussian copula likelihood. We estimate regression parameters from the toenail infection data of De Backer et al. (1996, British Journal of Dermatology 134, 16–17), which consist of binary response vectors of length seven or less from 294 subjects. Although maximizing the Gaussian copula likelihood yields estimators that are asymptotically more efficient than generalized estimating equation (GEE) estimators, our simulation study illustrates that for finite samples, GEE estimators can actually be as much as 20% more efficient.
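The Gaussian copula construction for correlated binary outcomes can be illustrated in the simplest bivariate case: each marginal success probability is mapped to a normal threshold, and the joint cell probabilities come from a bivariate normal with latent correlation r. This is not the authors' approximation for long response vectors; the helper functions are basic numerical sketches.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Standard normal quantile by bisection (adequate for a sketch)."""
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def bvn_cdf(z1, z2, r, n=4001):
    """P(Z1 <= z1, Z2 <= z2) for a standard bivariate normal with
    correlation r, by integrating P(Z2 <= z2 | Z1 = x) over Z1."""
    xs = np.linspace(-8.0, z1, n)
    s = math.sqrt(1.0 - r * r)
    erf_vec = np.vectorize(math.erf)
    cond = 0.5 * (1.0 + erf_vec((z2 - r * xs) / (s * math.sqrt(2.0))))
    dens = np.exp(-xs ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    return float(np.trapz(cond * dens, xs))

def binary_pair_loglik(y1, y2, p1, p2, r):
    """Log-likelihood of one correlated binary pair under a Gaussian
    copula with latent correlation r (two-outcome illustration only)."""
    z1, z2 = norm_ppf(p1), norm_ppf(p2)
    p11 = bvn_cdf(z1, z2, r)
    cell = {(1, 1): p11,
            (1, 0): p1 - p11,
            (0, 1): p2 - p11,
            (0, 0): 1.0 - p1 - p2 + p11}
    return math.log(cell[(y1, y2)])
```

At r = 0 the joint probability factorizes (e.g. P(1,1) = p1·p2), and increasing r raises the likelihood of concordant (1,1) pairs; the paper's contribution is making this kind of likelihood tractable for response vectors far longer than the bivariate and trivariate cases shown here.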


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号