Similar Documents
20 similar documents found.
1.
2.
Multivariate data analysis (MVDA) is a highly valuable and significantly underutilized resource in biomanufacturing. It offers the opportunity to enhance understanding and leverage useful information from the complex, high-dimensional data sets recorded throughout all stages of therapeutic drug manufacture. To help standardize the application of this resource and promote it within the biopharmaceutical industry, this paper outlines a novel MVDA methodology describing the necessary steps for efficient and effective data analysis. The MVDA methodology is followed to solve two case studies: a "small data" and a "big data" challenge. In the "small data" example, a large-scale data set is compared to data from a scale-down model. The methodology enables a new quantitative metric for equivalence to be established by combining the two one-sided tests (TOST) procedure with principal component analysis. In the "big data" example, the methodology enables accurate prediction of critical missing data essential to a cloning study performed in the ambr15 system. These predictions are generated by exploiting the underlying relationship between the off-line missing values and the on-line measurements through a partial least squares model. In summary, the proposed MVDA methodology highlights the importance of data pre-processing, restructuring, and visualization in solving complex biopharmaceutical challenges through data analytics.
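As an illustration of the "small data" approach, the following is a minimal sketch (not the authors' implementation) of combining principal component analysis with the TOST procedure: both scales are projected into a common PCA space and equivalence is tested on the scores. The equivalence bound `delta`, the simulated data, and the component count are all assumptions for illustration.

```python
# Minimal sketch: PCA + two one-sided tests (TOST) for scale equivalence.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

def tost_pvalue(a, b, delta):
    """TOST: H0 is |mean(a) - mean(b)| >= delta; a small p supports equivalence."""
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    df = len(a) + len(b) - 2
    p_lower = stats.t.sf((diff + delta) / se, df)   # test diff > -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)  # test diff < +delta
    return max(p_lower, p_upper)

rng = np.random.default_rng(0)
large_scale = rng.normal(size=(50, 10))             # placeholder process data
scale_down = rng.normal(size=(50, 10))

pca = PCA(n_components=2).fit(large_scale)          # model the large-scale runs
scores_large = pca.transform(large_scale)
scores_small = pca.transform(scale_down)            # project scale-down runs

for k in range(2):
    p = tost_pvalue(scores_large[:, k], scores_small[:, k], delta=1.0)
    verdict = "equivalent" if p < 0.05 else "not shown equivalent"
    print(f"PC{k + 1}: TOST p = {p:.3f} -> {verdict}")
```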

3.
Gene chip data analysis (cited by 9: 0 self-citations, 9 by others)
杨畅, 方福德. 《生命科学》, 2004, 16(1): 41-48
The gene chip (microarray) is a new, automated, high-throughput technology for studying biological questions that has emerged in recent years. Integrating achievements from many disciplines, it has already been applied with notable success to the large-scale study of gene function. How to analyze the large volumes of data it generates has become a hot topic in microarray research. Broadly, data analysis methods can be divided into unsupervised and supervised approaches. Before analysis, the data need to be normalized and reduced; after analysis, the results need to be validated and interpreted biologically. The authors introduce some of the statistical methods in common use and discuss their scope of application, strengths, and weaknesses.
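To make the pre-processing and unsupervised steps the review describes concrete, here is a minimal sketch: log-transform, quantile normalization, variance filtering, then hierarchical clustering. The data and parameter choices are illustrative, not from the paper.

```python
# Minimal sketch of a microarray pre-processing + unsupervised analysis pipeline.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def quantile_normalize(x):
    """Force all arrays (columns) to share the same intensity distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    means = np.sort(x, axis=0).mean(axis=1)
    return means[ranks]

rng = np.random.default_rng(1)
raw = rng.lognormal(mean=6, sigma=1, size=(500, 8))  # 500 genes x 8 arrays
norm = quantile_normalize(np.log2(raw))              # standardize across arrays

# Reduce to the most variable genes before clustering.
top = norm[np.argsort(norm.var(axis=1))[-100:]]
labels = fcluster(linkage(top, method="average"), t=4, criterion="maxclust")
print("genes per cluster:", np.bincount(labels)[1:])
```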

4.
5.
Data-independent acquisition (DIA) is an emerging technology for quantitative proteomics. Current DIA focuses on the identification and quantitation of fragment ions that are generated from multiple peptides contained in the same selection window of several to tens of m/z. An alternative approach is WiSIM-DIA, which combines conventional DIA with wide-SIM (wide selected-ion monitoring) windows to partition the precursor m/z space, producing high-quality precursor ion chromatograms. However, WiSIM-DIA has been underexplored; it remains unclear whether it is a viable alternative to DIA. We demonstrate that WiSIM-DIA quantified more than 24,000 unique peptides over five orders of magnitude in a single 2 h analysis of a neuronal synapse-enriched fraction, compared to 31,000 in DIA. There is a strong correlation between the abundance values of peptides quantified in both the DIA and WiSIM-DIA datasets. Interestingly, the S/N ratio of these peptides is not correlated. We further show that peptide identification directly from DIA spectra identified >2000 proteins, which included unique peptides not found in spectral libraries generated by data-dependent acquisition (DDA).
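The contrast between conventional DIA windows and wide-SIM partitions comes down to how the precursor m/z range is divided. Below is a minimal sketch of that windowing idea; the m/z range and window widths are illustrative values, not the instrument settings used in the study.

```python
# Minimal sketch: partition a precursor m/z range into fixed-width windows.
def make_windows(mz_min, mz_max, width):
    windows = []
    lo = mz_min
    while lo < mz_max:
        hi = min(lo + width, mz_max)
        windows.append((lo, hi))
        lo = hi
    return windows

narrow_dia = make_windows(400, 1000, 25)   # conventional DIA-style windows
wide_sim = make_windows(400, 1000, 200)    # wide-SIM partitions

print(len(narrow_dia), "DIA windows; first:", narrow_dia[0])
print(len(wide_sim), "wide-SIM partitions:", wide_sim)
```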

6.
7.
Gene chip data analysis and processing (cited by 7: 1 self-citation, 6 by others)
Gene chip technology generates large volumes of data in applications such as gene expression analysis; how to process and analyze these data and extract valuable biological information from them is a critically important problem. The workflow includes acquisition and processing of raw data, statistical analysis of the normalized data, and data storage and exchange.

8.
The application of big data in health care is of major significance for clinical research, scientific management, and the transformation of health-care service models. This article reviews the current state of medical big data applications in China and abroad, describes the authors' institutional experience in making use of medical data, and analyzes application requirements for medical big data from the perspectives of clinicians, patients, administrators, and researchers. Finally, drawing on existing practice, it proposes a design concept and a step-by-step approach for building a medical big data application platform.

9.
10.
Rough set-based data mining and its application in clinical diagnosis (cited by 14: 0 self-citations, 14 by others)
Data mining is the process of using a variety of analytical tools to discover models and relationships in massive data sets. The large volumes of clinical data contain rich information; using data mining techniques, particularly those based on rough set theory, algorithmic models trained on a training data set can be applied effectively to disease diagnosis and achieve high accuracy. This paper briefly introduces the basic principles and main methods of data mining and the fundamentals of rough set theory, and presents a worked application in which data mining is used for the diagnostic evaluation of lung tumors.
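The core rough-set construct the paper builds on is the pair of lower and upper approximations of a decision class under an indiscernibility relation. Below is a minimal sketch of that idea; the toy "symptom" table is invented for illustration.

```python
# Minimal sketch: rough-set lower/upper approximations of a decision class.
from collections import defaultdict

# Each patient: (condition-attribute tuple, diagnosis).
records = [
    (("cough", "high"), "tumor"),
    (("cough", "high"), "benign"),   # indiscernible from the case above -> boundary
    (("cough", "low"), "benign"),
    (("clear", "high"), "tumor"),
    (("clear", "low"), "benign"),
]

# Group indiscernible objects (equal condition attributes) into blocks.
blocks = defaultdict(set)
for i, (attrs, _) in enumerate(records):
    blocks[attrs].add(i)

target = {i for i, (_, dx) in enumerate(records) if dx == "tumor"}

lower = set().union(*(b for b in blocks.values() if b <= target))  # certainly tumor
upper = set().union(*(b for b in blocks.values() if b & target))   # possibly tumor

print("lower approximation:", sorted(lower))    # [3]
print("upper approximation:", sorted(upper))    # [0, 1, 3]
print("boundary region:", sorted(upper - lower))
```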

11.
As hospital informatization has steadily deepened, hospital information systems have accumulated large amounts of valuable clinical data. To improve the use of hospital data in clinical research, we built a single-disease research data center on top of an intelligent data platform. Through in-depth investigation of clinical research needs and deep parsing of the data, we extracted into the data center the patient diagnosis and treatment data of interest to clinical researchers. With tools such as multi-dimensional filtering, cohort studies, and Venn-diagram comparison, clinical researchers can conveniently obtain research sample sets. The establishment of the single-disease data center has greatly increased the value of the clinical data resources in hospital systems.
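As a rough illustration of the multi-dimensional filtering and Venn-style cohort comparison described, here is a minimal pandas sketch; the patient table, column names, and filter thresholds are all invented, not the hospital's actual schema.

```python
# Minimal sketch: filter a patient table on several dimensions, then compare cohorts as sets.
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "diagnosis":  ["diabetes"] * 6,
    "age":        [45, 62, 58, 71, 39, 66],
    "hba1c":      [7.2, 8.9, 6.8, 9.4, 7.9, 8.1],
    "on_insulin": [False, True, False, True, True, False],
})

# Multi-dimensional filtering: two candidate research cohorts.
cohort_a = set(patients.query("age >= 55 and hba1c > 8")["patient_id"])
cohort_b = set(patients.query("on_insulin")["patient_id"])

# Venn-style comparison of the two sample sets.
print("A only:", cohort_a - cohort_b)
print("B only:", cohort_b - cohort_a)
print("overlap:", cohort_a & cohort_b)
```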

12.
Summary: This note is in response to Wouters et al. (2003, Biometrics 59, 1131-1139), who compared three methods for exploring gene expression data. Contrary to their conclusion that principal component analysis is not very informative, we show that it is possible to construct principal component analyses that are useful for exploratory analysis of microarray data. We also present another biplot representation, the GE-biplot (Gene Expression biplot), a useful method for exploring gene expression data with the major advantage of aiding interpretation of both the samples and the genes relative to each other.
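For readers unfamiliar with biplots, the following is a minimal sketch of a generic PCA biplot for expression data (samples as points, genes as arrows). It illustrates the general idea only; it is not the authors' GE-biplot algorithm, and the data are simulated.

```python
# Minimal sketch: PCA biplot showing samples (points) and genes (arrows) together.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))                 # 30 samples x 6 genes (toy data)
Xc = X - X.mean(axis=0)                      # center before PCA

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]                    # sample coordinates (PC1, PC2)
loadings = Vt[:2].T                          # gene directions

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=12, label="samples")
for j, (x, y) in enumerate(loadings * s[:2]):  # scale arrows for visibility
    ax.arrow(0, 0, x, y, alpha=0.4)
    ax.annotate(f"gene{j}", (x, y))
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
plt.show()
```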

13.
Summary: Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance.
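Per-base uncertainty from a base-caller is conventionally summarized as a Phred-style quality score. The sketch below shows that standard conversion and a simple low-confidence flag; the read and its error probabilities are hypothetical, and this is not the authors' model.

```python
# Minimal sketch: Phred-style quality scores from base-calling error probabilities.
import math

def phred(p_error):
    """Phred quality: Q = -10 * log10(P(error))."""
    return -10 * math.log10(p_error)

# Hypothetical per-cycle error probabilities for one read (quality typically
# decays toward the 3' end of Illumina/Solexa reads).
read = "ACGTACGT"
p_err = [0.001, 0.001, 0.002, 0.005, 0.01, 0.03, 0.08, 0.2]

for base, p in zip(read, p_err):
    q = phred(p)
    flag = "  <-- low confidence" if q < 20 else ""
    print(f"{base}  P(err)={p:<6} Q={q:5.1f}{flag}")
```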

14.
15.
Using worked examples, this paper describes in detail the calculation procedure for two-factor analysis of variance on data from biological experiments, which is of considerable practical value, and explains how to interpret the results of the calculations.
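A minimal sketch of the kind of calculation the paper walks through, using statsmodels for a two-factor ANOVA with interaction; the factors (fertilizer, light) and response values are invented for illustration.

```python
# Minimal sketch: two-factor ANOVA with interaction on biological measurements.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "fertilizer": ["A", "A", "A", "B", "B", "B"] * 2,
    "light":      ["low"] * 6 + ["high"] * 6,
    "growth":     [4.1, 3.9, 4.3, 5.2, 5.0, 5.4,
                   6.0, 5.8, 6.3, 7.9, 8.1, 7.7],
})

# Two factors plus their interaction term.
model = ols("growth ~ C(fertilizer) * C(light)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # F statistics and p-values per factor
```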

16.
As an emerging technology, the gene chip has come to play an important role in many research fields, including botany, zoology, medicine, and agronomy. This paper reviews each stage of gene chip data analysis, including pre-processing, normalization, identification of differentially expressed genes, and cluster analysis, as well as applications of gene chips in plant functional genomics research.
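As a sketch of the differential-expression step in the reviewed workflow, the following simulates two conditions and applies per-gene t-tests with a Benjamini-Hochberg false discovery rate correction; the data and cutoff are illustrative.

```python
# Minimal sketch: per-gene t-tests with a Benjamini-Hochberg FDR correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(8, 1, size=(200, 5))      # 200 genes x 5 control arrays
treated = rng.normal(8, 1, size=(200, 5))
treated[:10] += 2                              # spike in 10 truly changed genes

_, pvals = stats.ttest_ind(control, treated, axis=1)

# Benjamini-Hochberg procedure at FDR 5%: reject the k smallest p-values,
# where k is the largest rank with p_(k) <= 0.05 * k / m.
order = np.argsort(pvals)
thresh = 0.05 * np.arange(1, len(pvals) + 1) / len(pvals)
passed = pvals[order] <= thresh
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
print("differentially expressed genes:", sorted(order[:k]))
```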

17.
Aim: Various methods are employed to recover patterns of area relationships in extinct and extant clades. The fidelity of these patterns can be adversely affected by sampling error in the form of missing data. Here we use simulation studies to evaluate the sensitivity of an analytical biogeographical method, namely tree reconciliation analysis (TRA), to this form of sampling failure. Location: Simulation study. Methods: To approximate varying degrees of taxonomic sampling failure within phylogenies varying in size and in redundancy of biogeographical signal, we applied sequential pruning protocols to artificial taxon–area cladograms displaying congruent patterns of area relationships. Initial trials assumed equal probability of sampling failure among all areas. Additional trials assigned weighted probabilities to each of the areas in order to explore the effects of uneven geographical sampling. Pruned taxon–area cladograms were then analysed with TRA to determine whether the optimal area cladograms recovered match the original biogeographical signal, or whether they represent false, ambiguous or uninformative signals. Results: The results indicate a period of consistently accurate recovery of the true biogeographical signal, followed by a nonlinear decrease in signal recovery as more taxa are pruned. At high levels of sampling failure, false biogeographical signals are more likely to be recovered than the true signal. However, randomization testing for statistical significance greatly decreases the chance of accepting false signals. The primary inflection of the signal recovery curve, and its steepness and slope, depend upon taxon–area cladogram size and area redundancy, as well as on the evenness of sampling. Uneven sampling across geographical areas is found to have serious deleterious effects on TRA, with the accuracy of recovery of biogeographical signal varying by an order of magnitude or more across different sampling regimes. Main conclusions: These simulations reiterate the importance of taxon sampling in biogeographical analysis, and attest to the importance of considering geographical, as well as overall, sampling failure when interpreting the robustness of biogeographical signals. In addition to randomization testing for significance, we suggest the use of randomized sequential taxon deletions and the construction of signal decay curves as a means to assess the robustness of biogeographical signals for empirical data sets.
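The suggested protocol of randomized sequential taxon deletions with signal decay curves can be sketched generically as below; `recovers_signal` is a hypothetical stand-in for a full tree reconciliation analysis, and the taxon-area assignments are invented.

```python
# Minimal sketch: randomized taxon pruning and a signal decay curve.
import random

def recovers_signal(taxa):
    # Placeholder criterion: pretend the true signal survives only if every
    # area keeps at least 2 sampled taxa (a real study would re-run TRA here).
    counts = {}
    for _, area in taxa:
        counts[area] = counts.get(area, 0) + 1
    return all(counts.get(area, 0) >= 2 for area in "ABCD")

taxa = [(f"taxon{i}", area) for i, area in enumerate("AABBBCCDDD")]

trials = 1000
for n_pruned in range(0, 8):
    hits = sum(
        recovers_signal(random.sample(taxa, len(taxa) - n_pruned))
        for _ in range(trials)
    )
    print(f"pruned {n_pruned}: signal recovered in {hits / trials:.0%} of trials")
```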

18.
A procedure for comparing survival times between several groups of patients through rank analysis of covariance was introduced by Woolson and Lachenbruch (1983). It is a modification of Quade's (1967) rank analysis of covariance procedure and can be used for the analysis of right-censored data. In this paper, two additional modifications of Quade's original test statistic are proposed and compared to the original modification introduced by Woolson and Lachenbruch. These statistics are compared to one another and to the score test from Cox's proportional hazards model by way of a limited Monte Carlo study. One of the statistics, QR2, is recommended for general use for the rank analysis of covariance of right-censored survivorship data.
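For context, the comparator in the Monte Carlo study is the test from Cox's proportional hazards model. Below is a minimal sketch of fitting that model to simulated right-censored data with the lifelines package; the data-generating settings are arbitrary, this reproduces only the comparator (not the proposed QR2 statistic), and the default summary reports a Wald-type test, which is asymptotically equivalent to the score test.

```python
# Minimal sketch: Cox proportional hazards fit on simulated right-censored data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 120
group = rng.integers(0, 2, n)                        # two patient groups
times = rng.exponential(scale=np.where(group == 1, 8, 5))
censor = rng.exponential(scale=10, size=n)           # independent censoring times

df = pd.DataFrame({
    "duration": np.minimum(times, censor),
    "event": (times <= censor).astype(int),          # 1 = observed, 0 = censored
    "group": group,
})

cph = CoxPHFitter().fit(df, duration_col="duration", event_col="event")
cph.print_summary()                                  # hazard ratio and p-value for group
```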

19.
Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow-window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.

20.
Na Cai, Wenbin Lu, Hao Helen Zhang. Biometrics, 2012, 68(4): 1093-1102
Summary: In the analysis of longitudinal data, it is not uncommon for the observation times of repeated measurements to be subject-specific and correlated with the underlying longitudinal outcomes. In such situations, accounting for the dependence between observation times and longitudinal outcomes is critical to ensure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random-effect model and assumes a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting for their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. An additional advantage of the procedure is that it provides a unified framework for testing whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to bladder cancer data is also given to illustrate the methodology.
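To make the model structure concrete, here is a schematic in the spirit of the shared random-effect joint model described, written with assumed notation (Y_i(t): longitudinal outcome; b_i: shared latent variable; lambda_i(t): intensity of the observation-time process). It is one plausible reading of the abstract, not the authors' exact specification.

```latex
% Schematic only; notation assumed for illustration, not the paper's exact model.
\begin{aligned}
  Y_i(t)       &= X_i^{\top}(t)\,\beta + \alpha(t)\,b_i + \varepsilon_i(t), \\
  \lambda_i(t) &= b_i\,\lambda_0(t)\,\exp\{Z_i^{\top}(t)\,\gamma\}.
\end{aligned}
```

In this schematic the time-varying coefficient alpha(t) carries the association between the latent variable and the longitudinal outcome, so the unified test mentioned in the abstract corresponds to testing whether alpha(t) is identically zero, constant, or genuinely time-varying.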
