Similar literature
 Found 20 similar documents; search took 31 ms
1.
Rosner B  Glynn RJ  Lee ML 《Biometrics》2006,62(1):185-192
The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
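
The cluster-level randomization idea in this abstract can be illustrated with a small sketch. It is not the authors' adjusted variance estimator or their asymptotic test: it pools the absolute differences over all subunits, sums the signed ranks within each cluster, and then enumerates every way of flipping whole clusters to obtain an exact two-sided p-value. The function names (`midranks`, `clustered_signed_rank`) and the toy data are hypothetical, and full enumeration is only feasible for a small number of clusters.

```python
from itertools import product


def midranks(values):
    """Return 1-based ranks of values, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def clustered_signed_rank(clusters):
    """clusters: one list of paired differences per cluster (e.g., per person).

    Returns (observed statistic, per-cluster signed-rank sums, exact p-value).
    """
    # Pool the nonzero differences of all clusters and rank their magnitudes.
    flat = [(c, d) for c, diffs in enumerate(clusters) for d in diffs if d != 0]
    ranks = midranks([abs(d) for _, d in flat])

    # Sum of signed ranks within each cluster.
    per_cluster = [0.0] * len(clusters)
    for (c, d), r in zip(flat, ranks):
        per_cluster[c] += r if d > 0 else -r
    observed = sum(per_cluster)

    # Randomization at the cluster level: flip the sign of a whole cluster.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(per_cluster)):
        t = sum(s * x for s, x in zip(signs, per_cluster))
        total += 1
        if abs(t) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, per_cluster, extreme / total


# Toy data: four persons, with one or two eyes each (made-up differences).
toy = [[1.0, 2.0], [-0.5], [3.0, 4.0], [1.5]]
statistic, sums, p_value = clustered_signed_rank(toy)
print(statistic, sums, p_value)
```

Because the sign flips apply to whole clusters, subunits from the same person always move together, which is what distinguishes this from the ordinary signed rank test applied to every eye as if it were independent.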

2.
Summary: This paper explores data compatibility issues arising from the assessment of remnant native vegetation condition using satellite remote sensing and field-based data. Space-borne passive remote sensing is increasingly used as a way of providing a total sample and synoptic overview of the spectral and spatial characteristics of native vegetation canopies at a regional scale. However, integrating field-collected data often not designed for integration with remotely sensed data can lead to data compatibility issues. Subsequent problems associated with the integration of unsuited datasets can contribute to data uncertainty and result in inconclusive findings. It is these types of problems (and potential solutions) that form the basis of this paper. In other words, how can field surveys be designed to support and improve compatibility with remotely sensed total surveys? Key criteria were identified for consideration when designing field-based surveys of native vegetation condition (and other similar applications) with the intent to incorporate remotely sensed data. The criteria include recommendations for the siting of plots, the need for reference location plots, the number of sample sites and plot size and distribution, within a study area. The difficulties associated with successfully integrating these data are illustrated using real examples taken from a study of the vegetation in the Little River Catchment, New South Wales, Australia.

3.
There is an increasing need for life cycle data for bio‐based products, which becomes particularly evident with the recent drive for greenhouse gas reporting and carbon footprinting studies. Meeting this need is challenging given that many bio‐products have not yet been studied by life cycle assessment (LCA), and those that have are specific and limited to certain geographic regions. In an attempt to bridge data gaps for bio‐based products, LCA practitioners can use either proxy data sets (e.g., use existing environmental data for apples to represent pears) or extrapolated data (e.g., derive new data for pears by modifying data for apples considering pear‐specific production characteristics). This article explores the challenges and consequences of using these two approaches. Several case studies are used to illustrate the trade‐offs between uncertainty and the ease of application, with carbon footprinting as an example. As shown, the use of proxy data sets is the quickest and easiest solution for bridging data gaps but also has the highest uncertainty. In contrast, data extrapolation methods may require extensive expert knowledge and are thus harder to use but give more robust results in bridging data gaps. They can also provide a sound basis for understanding variability in bio‐based product data. If resources (time, budget, and expertise) are limited, the use of averaged proxy data may be an acceptable compromise for initial or screening assessments. Overall, the article highlights the need for further research on the development and validation of different approaches to bridging data gaps for bio‐based products.

4.
Ecologists are increasingly asking large‐scale and/or broad‐scope questions that require vast datasets. In response, various top‐down efforts and incentives have been implemented to encourage data sharing and integration. However, despite general consensus on the critical need for more open ecological data, several roadblocks still discourage compliance and participation in these projects; as a result, ecological data remain largely unavailable. Grassroots initiatives (i.e. efforts initiated and led by cohesive groups of scientists focused on specific goals) have thus far been overlooked as a powerful means to meet these challenges. These bottom‐up collaborative data integration projects can play a crucial role in making high quality datasets available because they tackle the heterogeneity of ecological data at a scale where it is still manageable, all the while offering the support and structure to do so. These initiatives foster best practices in data management and provide tangible rewards to researchers who choose to invest time in sound data stewardship. By maintaining proximity between data generators and data users, grassroots initiatives improve data interpretation and ensure high‐quality data integration while providing fair acknowledgement to data generators. We encourage researchers to formalize existing collaborations and to engage in local activities that improve the availability and distribution of ecological data. By fostering communication and interaction among scientists, we are convinced that grassroots initiatives can significantly support the development of global‐scale data repositories. In doing so, these projects help address important ecological questions and support policy decisions.

5.
Data integration is key to functional and comparative genomics because integration allows diverse data types to be evaluated in new contexts. To achieve data integration in a scalable and sensible way, semantic standards are needed, both for naming things (standardized nomenclatures, use of key words) and also for knowledge representation. The Mouse Genome Informatics database and other model organism databases help to close the gap between information and understanding of biological processes because these resources enforce well-defined nomenclature and knowledge representation standards. Model organism databases have a critical role to play in ensuring that diverse kinds of data, especially genome-scale data sets and information, remain useful to the biological community in the long term. The efforts of model organism database groups ensure not only that organism-specific data are integrated, curated and accessible but also that the information is structured in such a way that comparison of biological knowledge across model organisms is facilitated.

6.
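To obtain field survey data on invasive organisms accurately and quickly, we propose a big-data collection method for alien species invasions based on modern information technologies such as global navigation satellite systems, geographic information systems, and the mobile internet, and we designed and developed 云采集 (Cloud Collection), a field survey software tool with customizable data forms. The system uses Android phones as data collection terminals and was designed and developed in C# and Java. It uses satellite navigation and positioning to rapidly record the locations of occurrences surveyed in the field. By defining the data types of nine kinds of survey indicators together with four auxiliary attributes (default value of the indicator (column value), image capture, voice input, and sorting), it links the survey indicators to the data-entry interface of the phone client and thus provides a data-entry mode with a customizable user interface. The system has been applied in survey tasks for a National Key Research and Development project, a Fujian Province Major Science and Technology Special Project, and the Fujian Province census of red imported fire ant (Solenopsis invicta) outbreaks, among other projects. Practical testing shows that the system supports offline collection, synchronization, querying, and output management of field survey data; replaces traditional pen-and-paper recording with collection on mobile smart terminals; simplifies the field survey workflow; improves the data quality of field surveys of invasive organisms; and provides informatics support for big-data collection in field surveys of alien biological invasions.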

7.
张源笙  夏琳  桑健  李漫  刘琳  李萌伟  牛广艺  曹佳宝  滕徐菲  周晴  章张 《遗传》2018,40(11):1039-1043
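Multi-omics data on life and health are an important foundation for life science research and the development of biomedical technology. However, China lacks platforms for managing and sharing biological data, which not only fails to meet the growing research and development needs of biomedicine and related disciplines in China but also severely restricts the integration, sharing, translation, and use of the country's biological big data. In view of this, the Beijing Institute of Genomics, Chinese Academy of Sciences, established the BIG Data Center (BIGD) in early 2016 to build a biological big data management platform and a multi-omics data resource system centered on national population health and important strategic biological resources. This paper focuses on BIGD's life and health big data resource system, which mainly comprises a raw omics data archive, a genome database, a genome variation database, a gene expression database, a methylation database, a bioinformatics tool library, and a life science wiki knowledge base. These resources provide submission, integration, and sharing services for biological big data and lay an important foundation for advancing life science data management in China and for building a national bioinformation center.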

8.
Albert PS  Follmann DA  Wang SA  Suh EB 《Biometrics》2002,58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
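
To make the idea of a shared latent process concrete, the following sketch simulates one subject's visits. It is only an illustration under assumptions that are not taken from the abstract: logistic links for both the response and the missingness indicator, a stationary AR(1) latent process, intermittent missingness only (no dropout), and hypothetical parameter names (`rho`, `sigma`, `beta0`, `gamma0`, `gamma1`). It does not implement the authors' Monte Carlo EM estimation.

```python
import math
import random


def expit(x):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(n_times, rho, sigma, beta0, gamma0, gamma1, rng):
    """Simulate binary responses for one subject; None marks a missing visit.

    The same latent value b_t enters both the response probability and the
    missingness probability, so missingness is informative when gamma1 != 0.
    """
    # Start the latent Gaussian AR(1) process from its stationary distribution.
    b = rng.gauss(0.0, sigma)
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
    visits = []
    for _ in range(n_times):
        y = 1 if rng.random() < expit(beta0 + b) else 0
        missing = rng.random() < expit(gamma0 + gamma1 * b)
        visits.append(None if missing else y)
        # Advance the latent process to the next visit.
        b = rho * b + rng.gauss(0.0, innovation_sd)
    return visits


rng = random.Random(0)
print(simulate_subject(10, rho=0.8, sigma=1.5, beta0=-0.5,
                       gamma0=-1.0, gamma1=1.0, rng=rng))
```

With `gamma1 > 0`, visits at which the latent value is high are both more likely to be positive and more likely to be missing, so the observed positive rate understates the true one; setting `gamma1 = 0` removes that dependence.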

9.
10.
Abstract: Wildlife biologists are using land-characteristics data sets for a variety of applications. Many kinds of landscape variables have been characterized and the resultant data sets or maps are readily accessible. Often, too little consideration is given to the accuracy or traits of these data sets, most likely because biologists do not know how such data are compiled and rendered, or the potential pitfalls that can be encountered when applying these data. To increase understanding of the nature of land-characteristics data sets, I introduce aspects of source information and data-handling methodology that include the following: ambiguity of land characteristics; temporal considerations and the dynamic nature of the landscape; type of source data versus landscape features of interest; data resolution, scale, and geographic extent; data entry and positional problems; rare landscape features; and interpreter variation. I also include guidance for determining the quality of land-characteristics data sets through metadata or published documentation, visual clues, and independent information. The quality or suitability of the data sets for wildlife applications may be improved with thematic or spatial generalization, avoidance of transitional areas on maps, and merging of multiple data sources. Knowledge of the underlying challenges in compiling such data sets will help wildlife biologists to better assess the strengths and limitations and determine how best to use these data.

11.
Summary: The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of the proposed method is further illustrated by two real applications involving missing longitudinal outcome data.

12.
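The traditional approach to organizing crop germplasm data builds different data tables for different crop species, and it can no longer effectively meet the needs of integrated analysis of germplasm data. This paper proposes a method for organizing germplasm data based on attribute-separated storage: a separate data table is built for each germplasm attribute, and the attributes have no subordinate relationships to one another. The method unifies data query operations, optimizes the query process, and improves analysis efficiency. It is flexible and extensible, makes it easy to integrate data related to germplasm analysis, and is suitable for building distributed germplasm resource databases and related information systems.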
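
A minimal sketch of attribute-separated storage, assuming a layout the abstract does not specify: one SQLite table per attribute, each holding `(accession_id, value)` rows. The table prefix `attr_`, the helper names, and the sample accessions are all hypothetical; the point is only that every attribute is read with the same query shape, whatever the crop.

```python
import sqlite3


def attribute_table(attribute):
    """Map an attribute name to its table name, rejecting unsafe names."""
    if not attribute.isidentifier():
        raise ValueError(f"invalid attribute name: {attribute!r}")
    return f"attr_{attribute}"


def set_value(conn, attribute, accession_id, value):
    """Store one attribute value, creating the attribute's table on first use."""
    table = attribute_table(attribute)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(accession_id TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute(
        f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?)',
        (accession_id, value),
    )


def get_record(conn, accession_id, attributes):
    """Assemble a record by issuing the same query against each attribute table."""
    record = {}
    for attribute in attributes:
        table = attribute_table(attribute)
        row = conn.execute(
            f'SELECT value FROM "{table}" WHERE accession_id = ?',
            (accession_id,),
        ).fetchone()
        record[attribute] = row[0] if row else None
    return record


conn = sqlite3.connect(":memory:")
set_value(conn, "crop", "ACC001", "rice")
set_value(conn, "plant_height_cm", "ACC001", "98")
set_value(conn, "crop", "ACC002", "wheat")
set_value(conn, "awn_type", "ACC002", "long")

print(get_record(conn, "ACC002", ["crop", "plant_height_cm", "awn_type"]))
```

Adding a new attribute means adding a new table rather than altering a crop-specific schema, which is the extensibility the abstract points to; the cost is one query (or join) per attribute when a full record is assembled.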

13.
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.

14.
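By collecting, storing, maintaining, and analyzing medical data, medical service data centers play a very important role in evaluating and improving patient safety, supporting the management of medical quality, guiding patients in seeking care, and promoting the construction and development of biobanks. Although China started building its national medical service data center later than developed countries, it has already achieved some results in guiding medical services and serving medical management.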

15.
Liu M  Taylor JM  Belin TR 《Biometrics》2000,56(4):1157-1163
This paper outlines a multiple imputation method for handling missing data in designed longitudinal studies. A random coefficients model is developed to accommodate incomplete multivariate continuous longitudinal data. Multivariate repeated measures are jointly modeled; specifically, an i.i.d. normal model is assumed for time-independent variables and a hierarchical random coefficients model is assumed for time-dependent variables in a regression model conditional on the time-independent variables and time, with heterogeneous error variances across variables and time points. Gibbs sampling is used to draw model parameters and for imputations of missing observations. An application to data from a study of startle reactions illustrates the model. A simulation study compares the multiple imputation procedure to the weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) that can be used to address similar data structures.

16.
Research data management (RDM) requires standards, policies, and guidelines. Findable, accessible, interoperable, and reusable (FAIR) data management is critical for sustainable research. Therefore, collaborative approaches for managing FAIR-structured data are becoming increasingly important for long-term, sustainable RDM. However, they are rather hesitantly applied in bioengineering. One of the reasons may be found in the interdisciplinary character of the research field. In addition, bioengineering, as the application of principles of biology and tools of process engineering, often has to meet different criteria. In consequence, RDM is complicated by the fact that researchers from different scientific institutions must meet the criteria of their home institution, which can lead to additional conflicts. Therefore, centrally provided general repositories implementing a collaborative approach that enables data storage from the outset are needed. In a biotechnology research network with over 20 tandem projects, it was demonstrated how FAIR-RDM can be implemented through a collaborative approach and the use of a data structure. In addition, the importance of a structure within a repository was demonstrated to keep biotechnology research data available throughout the entire data lifecycle.

17.
We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. ggtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available for visualization and interpretation in the evolutionary context.

18.
The method known as Analysis of Concentration (AOC) is proposed as a tool to measure the predictivity of binary data for cover data. The application of AOC to structured tables of oak forests of Central Italy has proved that binary data are more predictive for cover than cover for binary data. The ordinations produced by AOC with binary and cover data are very similar and interpretable with similar results.

19.
It ought to be easy to exchange digital micrographs and other computer data files with a colleague even on another continent. In practice, this often is not the case. The advantages and disadvantages of various methods that are available for exchanging data files between computers are discussed. When possible, data should be transferred through computer networking. When data are to be exchanged locally between computers with similar operating systems, the use of a local area network is recommended. For computers in commercial or academic environments that have dissimilar operating systems or are more widely spaced, the use of FTP is recommended. Failing this, posting the data on a website and transferring by hypertext transfer protocol is suggested. If peer-to-peer exchange between computers in domestic environments is needed, the use of Messenger services such as Microsoft Messenger or Yahoo Messenger is the method of choice. When it is not possible to transfer the data files over the internet, single-use, writable CD-ROMs are the best media for transferring data. If for some reason this is not possible, DVD-R/RW, DVD+R/RW, 100 MB ZIP disks and USB flash media are potentially useful media for exchanging data files.

