Similar articles
1.
Lam Tran, Kevin He, Di Wang, Hui Jiang. Biometrics, 2023, 79(2): 1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
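The idea of choosing an external-data weight by minimizing the local LOOCV error can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' method: a single external data set, a through-origin linear model y = beta * x, a coarse grid search over the weight, and the hypothetical names `loocv_error` and `choose_weight`. The data are made up.

```python
def loocv_error(local, external, w):
    """Mean squared leave-one-out error on the local data.

    Toy model (an assumption): y = beta * x through the origin, with
    external points entering the least-squares fit with weight w.
    Needs at least two local points with nonzero x.
    """
    ext_xy = sum(x * y for x, y in external)
    ext_xx = sum(x * x for x, _ in external)
    loc_xy = sum(x * y for x, y in local)
    loc_xx = sum(x * x for x, _ in local)
    total = 0.0
    for x, y in local:
        # Refit without the held-out local point; external data stay in, weighted by w.
        beta = (loc_xy - x * y + w * ext_xy) / (loc_xx - x * x + w * ext_xx)
        total += (y - beta * x) ** 2
    return total / len(local)


def choose_weight(local, external, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the grid weight with the smallest local LOOCV error."""
    return min(grid, key=lambda w: loocv_error(local, external, w))


# Made-up data: the local slope is close to 2.
local = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
consistent = [(x, 2.0 * x) for x in range(1, 11)]  # external source that agrees with local data
confounded = [(x, 5.0 * x) for x in range(1, 11)]  # external source with a biased slope

w_consistent = choose_weight(local, consistent)
w_confounded = choose_weight(local, confounded)
```

In this toy setting the consistent external source receives a positive weight, while the confounded one is excluded because any positive weight raises the local LOOCV error.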

2.
It is well known that statistical classification procedures should be assessed using data that are separate from those used to train the classifier. This principle is commonly overlooked when the classification procedure in question is population assignment using a set of genetic markers that were chosen specifically on the basis of their allele frequencies from amongst a larger number of candidate markers. This oversight leads to a systematic upward bias in the predicted accuracy of the chosen set of markers for population assignment. Three widely used software programs for selecting markers informative for population assignment suffer from this bias. The extent of this bias is documented through a small set of simulations. The relative effect of the bias is largest when screening many candidate loci from poorly differentiated populations. Simple unbiased methods are presented and their use encouraged.

3.
Multivariate latent variable methods have become a popular and versatile toolset to analyze bioprocess data in industry and academia. This work spans such applications from the evaluation of the role of the standard process variables and metabolites to the metabolomics level, that is, to the extensive number of metabolic compounds detectable in the extracellular and intracellular domains. Given the substantial effort currently required for the measurement of the latter groups, a tailored methodology is presented that is capable of providing valuable process insights as well as predicting the glycosylation profile based on only four experiments measured over 12 cell culture days. An important result of the work is the possibility to accurately predict many of the glycan variables based on the information of three experiments. An additional finding is that such predictive models can be generated from the more accessible process and extracellular information only, that is, without including the more experimentally cumbersome intracellular data. With regard to the future incorporation of omics data in the standard process analytics framework, this work provides a comprehensive data analysis pathway which can efficiently support numerous bioprocessing tasks.

4.
The origin of correlations in metabolomics data (cited 7 times: 0 self-citations, 7 by others)
A phenomenon observed early in the development of metabolomics as a systems biology methodology is that a small but significant number of metabolites have levels that are highly correlated between biological replicates. Contrary to initial interpretations, these correlations are not necessarily only between neighboring metabolites in the metabolic network. Most metabolites that participate in common reactions are not correlated in this way, while some non-neighboring metabolites are highly correlated. Here we investigate the origin of such correlations using metabolic control analysis and computer simulation of biochemical networks. A series of cases is identified that lead to high correlation between metabolite pairs in replicate measurements. These are (1) chemical equilibrium, (2) mass conservation, (3) asymmetric control distribution, and (4) unusually high variance in the expression of a single gene. The importance of identifying metabolite correlations within a physiological state and changes of correlation between different states is discussed in the context of systems biology.

5.
6.
Functional data are smooth, often continuous, random curves, which can be seen as an extreme case of multivariate data with infinite dimensionality. Just as componentwise inference for multivariate data naturally performs feature selection, subsetwise inference for functional data performs domain selection. In this paper, we present a unified testing framework for domain selection on populations of functional data. In detail, p-values of hypothesis tests performed on pointwise evaluations of functional data are suitably adjusted to provide control of the familywise error rate (FWER) over a family of subsets of the domain. We show that several state-of-the-art domain selection methods fit within this framework and differ from each other by the choice of the family over which the control of the FWER is provided. In the existing literature, these families are always defined a priori. In this work, we also propose a novel approach, coined thresholdwise testing, in which the family of subsets is instead built in a data-driven fashion. The method seamlessly generalizes to multidimensional domains, in contrast to methods based on a priori defined families. We provide theoretical results with respect to consistency and control of the FWER for the methods within the unified framework. We illustrate the performance of the methods within the unified framework on simulated and real data examples and compare their performance with other existing methods.

7.
8.
Zhou Jihua (周继华), Lai Liming (来利明), Zheng Yuanrun (郑元润). Acta Ecologica Sinica (《生态学报》), 2015, 35(19): 6435-6438
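The accuracy of simulation results is key to judging whether an ecological model is successful, yet few reports use statistical methods to assess how well simulated values agree with observations. Building on the statistical test of whether two linear regression equations can be merged into a single equation, we propose a test that checks whether the intercept and slope of the regression line relating observed and simulated values are the same as those of the 1:1 line, and thereby judges, at a given significance level, the agreement between the simulated and observed values of an ecological model. Tests on data show that the method handles the problem of judging the accuracy of ecological model simulations well.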
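One common way to test agreement with the 1:1 line is a simultaneous F-test of intercept = 0 and slope = 1; the sketch below assumes that form, and the procedure in the paper may differ in detail. The function name `one_to_one_test` and the data are hypothetical. Because the test has 2 numerator degrees of freedom, the p-value has a closed form and no statistics library is needed.

```python
def one_to_one_test(simulated, observed):
    """Simultaneous F-test of H0: intercept = 0 and slope = 1.

    Regresses observed on simulated values and returns
    (intercept, slope, F, p). Needs n > 2 and a fit that is not perfect.
    """
    n = len(simulated)
    x_bar = sum(simulated) / n
    y_bar = sum(observed) / n
    sxx = sum((x - x_bar) ** 2 for x in simulated)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(simulated, observed))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line and around the 1:1 line.
    sse_fit = sum((y - intercept - slope * x) ** 2 for x, y in zip(simulated, observed))
    sse_line = sum((y - x) ** 2 for x, y in zip(simulated, observed))
    df2 = n - 2
    f_stat = (max(0.0, sse_line - sse_fit) / 2) / (sse_fit / df2)
    # With 2 numerator degrees of freedom, P(F > f) = (1 + 2f/df2) ** (-df2/2).
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    return intercept, slope, f_stat, p_value


# Made-up example: one model that tracks the observations, one that is biased.
simulated = [1, 2, 3, 4, 5, 6]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
observed_good = [s + e for s, e in zip(simulated, noise)]
observed_biased = [2 * s + 5 + e for s, e in zip(simulated, noise)]

_, _, f_good, p_good = one_to_one_test(simulated, observed_good)
_, _, f_biased, p_biased = one_to_one_test(simulated, observed_biased)
```

A large p-value means the data give no evidence against the 1:1 line; a small one indicates that the model is systematically biased in intercept, slope, or both.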

9.
For Swedish Warmblood sport horses, breeding values (BVs) are predicted using a multiple-trait animal model with results from competitions and young horse performance tests. Data go back to the beginning of the 1970s, and earlier studies have indicated that some of the recorded traits have changed through the years. The objective of this study was to investigate the effects of including all performance data or excluding the older data, compared to a bivariate model (BM) considering performance traits in early and late periods as separate traits. The bivariate approach was assumed to give the most correct BVs for the actual breeding population. Competition results in dressage and show jumping for almost 40 000 horses until 2006 were available. For the riding horse quality test (RHQT), data of 14 000 horses judged between 1973 and 2007 were used. Genetic correlations of 0.69 to 1.00 were estimated between traits recorded at different time periods (RHQT data) or different birth year groups (competition data). A cross-validation study and comparison of BVs using different sets of data showed that the most accurate and similar results were obtained when BVs were predicted from either the BM or the univariate model including all data from the beginning of the recording. We recommend using all data and applying the univariate model to minimise the computational effort for genetic evaluations and to provide reliable BVs for as many horses as possible.

10.
This report presents the conclusions of the X-ray Validation Task Force of the worldwide Protein Data Bank (PDB). The PDB has expanded massively since current criteria for validation of deposited structures were adopted, allowing a much more sophisticated understanding of all the components of macromolecular crystals. The size of the PDB creates new opportunities to validate structures by comparison with the existing database, and the now-mandatory deposition of structure factors creates new opportunities to validate the underlying diffraction data. These developments highlighted the need for a new assessment of validation criteria. The Task Force recommends that a small set of validation data be presented in an easily understood format, relative to both the full PDB and the applicable resolution class, with greater detail available to interested users. Most importantly, we recommend that referees and editors judging the quality of structural experiments have access to a concise summary of well-established quality indicators.

11.
Bladder cancer (BC) is latent in its early stage and lethal in its late stage. Therefore, early diagnosis and intervention are essential for successful BC treatment. Considering the limitations of current diagnostic tools, noninvasive biomarkers that are both highly sensitive and specific are needed to improve the overall survival and quality of life of patients. With the advent of systems biology, “-omics” technologies have been developed over the past few decades. As a promising member, global metabolomics has increasingly been found to have clear potential for biomarker discovery. However, urinary metabolomics studies related to BC have lagged behind those of other urinary cancers, and major findings have not been systematically reported. The objective of this review is to comprehensively list the currently identified potential urinary metabolite biomarkers for BC.

12.
qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and nonhuman) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.

13.
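High-pressure steam sterilizers must undergo performance validation before first use and again after a certain period of operation. Heat distribution, heat penetration, and microbial challenge tests were used to validate the performance of GE and GEV pulse-type steam sterilizers. In the sterilization cycles for porous and fluid loads, no cold spots were found in the sterilization chamber, heat penetration was effective, and Bacillus stearothermophilus (ATCC7953) was completely killed within the specified sterilization time. This confirmed that all performance characteristics of the sterilizers met production requirements. The resulting validation protocol, test methods, and results were accepted by the national GMP certification center.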

14.
New methods for better identification of timber geographical origin would constitute an important technical element in the forest industry, for phytosanitary certification procedures or in the chain of custody developed for the certification of timber from sustainably managed forests. In the case of the European white oaks, a detailed reference map of chloroplast (cp) DNA variation across the range exists, and we propose here to use the strong geographical structure, characterized by a differentiation of western vs. eastern populations, for the purpose of oak wood traceability. We first developed cpDNA markers permitting the characterization of haplotype on degraded DNA obtained from wood samples. The techniques were subsequently validated by confirming the full correspondence between genotypes obtained from living tissues (buds) and from wood collected from the same individual oak. Finally, a statistical procedure was used to test if the haplotype composition of a lot of wood samples is consistent with its presumed geographical origin. Clearly, the technique cannot permit the unambiguous identification of wood products of unknown origin but can be used to check the conformity of genetic composition of wood samples with the region of alleged origin. This could lead to major applications not only in the forest industry but also in archaeology or in palaeobotany.

15.
16.
17.
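Metabolomics is an important branch of systems biology and, owing to its efficiency and high throughput, is widely applied in food science, pharmacology, and other research fields. This article outlines the separation and detection techniques used in metabolomics, reviews its applications in lactic acid bacteria identification, fermentation regulation, and gut microbiota research, and discusses the potential problems and future trends of metabolomics in lactic acid bacteria research, with the aim of providing a reference for applying metabolomics to microorganisms in the food industry.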

18.
Informatics standards and controlled vocabularies are essential for allowing information technology to help exchange, manage, interpret and compare large data collections. In a rapidly evolving field, the challenge is to work out how best to describe, but not prescribe, the use of these technologies and methods. A Metabolomics Standards Workshop was held by the US National Institutes of Health (NIH) to bring together multiple ongoing standards efforts in metabolomics with the NIH research community. The goals were to discuss metabolomics workflows (methods, technologies and data treatments) and the needs, challenges and potential approaches to developing a Metabolomics Standards Initiative that will help facilitate this rapidly growing field which has been a focus of the NIH roadmap effort. This report highlights specific aspects of what was presented and discussed at the 1st and 2nd August 2005 Metabolomics Standards Workshop.

19.
Advances in metabolomics research on plant responses to abiotic stress (cited 4 times: 0 self-citations, 4 by others)
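Metabolomics is an ideal platform for studying plant metabolism: modern detection and analysis techniques allow qualitative and quantitative analysis of the metabolites in plants under stress, so that their changes over time can be monitored. The integration of omics platforms, including genomics, transcriptomics, and metabolomics, forms a powerful toolbox. Linking the information obtained from the different omics helps in studying, at the whole-system level, how biological systems respond to genetic or environmental changes, for example by determining at which level a change in metabolites originates, and helps unravel the complex mechanisms of plant stress responses. This article reviews recent studies that use metabolomics, alone and in combination with proteomics and genomics, to explore plant responses to abiotic stress. The application of metabolomics has broadened understanding of the molecular mechanisms by which plants tolerate abiotic stress. More studies of this kind, together with the integration of plant metabolomics, transcriptomics, proteomics, and genomics, will help in understanding plant stress response mechanisms at the whole-system level.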

20.
Liu Yujie (刘玉杰), Liu Yihui (刘毅慧). 《生物信息学》 (Chinese Journal of Bioinformatics), 2011, 9(3): 255-258, 262
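Feature extraction and classification are key problems in pattern recognition. Combining wavelet analysis theory with support vector machine theory, a classifier model is constructed to divide prostate cancer gene chip data into two classes, cancer and normal. Low-frequency wavelet coefficients are extracted to represent the original data and are fed into a support vector machine classifier. Experiments show that with the low-frequency coefficients from a 4-level db1 wavelet decomposition, the correct classification rate reaches 93.53%; with the Haar wavelet it is 92.94%. Extracting the low-frequency coefficients of different wavelets therefore yields little difference in classification performance.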
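The feature-extraction step can be sketched as follows. This is a minimal illustration of a multi-level Haar decomposition that keeps only the low-frequency (approximation) coefficients. The function name `haar_lowpass`, the padding rule for odd lengths, and the made-up expression profile are all assumptions, and the support vector machine stage is omitted.

```python
import math


def haar_lowpass(signal, levels=4):
    """Approximation coefficients after `levels` Haar decomposition steps."""
    coeffs = [float(v) for v in signal]
    for _ in range(levels):
        if len(coeffs) < 2:
            break  # nothing left to decompose
        if len(coeffs) % 2:
            # Assumption: pad odd lengths by repeating the last value.
            coeffs.append(coeffs[-1])
        # Orthonormal Haar low-pass filter: scaled sum of each adjacent pair.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


# Made-up 32-value expression profile with two flat regions.
expression_profile = [1.0] * 16 + [3.0] * 16
features = haar_lowpass(expression_profile, levels=4)
```

Each level halves the number of coefficients, so a 4-level decomposition reduces the feature vector to roughly one sixteenth of the original length before classification.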
