A total of 20 similar articles were found; search took 15 ms.
1.
2.
Stephen Goldrick Viktor Sandner Matthew Cheeks Richard Turner Suzanne S. Farid Graham McCreath Jarka Glassey 《Biotechnology journal》2020,15(3)
Multivariate data analysis (MVDA) is a highly valuable and significantly underutilized resource in biomanufacturing. It offers the opportunity to enhance understanding and leverage useful information from the complex, high‐dimensional data sets recorded throughout all stages of therapeutic drug manufacture. To help standardize the application of this resource and promote it within the biopharmaceutical industry, this paper outlines a novel MVDA methodology describing the necessary steps for efficient and effective data analysis. The methodology is applied to two case studies: a “small data” and a “big data” challenge. In the “small data” example, a large‐scale data set is compared with data from a scale‐down model; this enables a new quantitative metric for equivalence to be established by combining a two one‐sided test with principal component analysis. In the “big data” example, the methodology enables accurate prediction of critical missing data essential to a cloning study performed in the ambr15 system. These predictions are generated by exploiting the underlying relationship between the off‐line missing values and the on‐line measurements through a partial least squares model. In summary, the proposed MVDA methodology highlights the importance of data pre‐processing, restructuring, and visualization in solving complex biopharmaceutical challenges.
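The equivalence test at the heart of the “small data” case study can be sketched in a few lines. This is an illustrative, large-sample two one-sided test (TOST) on summary scores, not the paper's implementation; the function name, data, and normal approximation are assumptions for the sketch:

```python
import math
import statistics as st

def tost_equivalence(a, b, margin, alpha=0.05):
    # Two one-sided tests (TOST): equivalence is declared when the
    # (1 - 2*alpha) confidence interval for the mean difference lies
    # entirely within +/- margin.  Large-sample normal approximation,
    # for illustration only.
    diff = st.mean(a) - st.mean(b)
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    z = 1.6449  # one-sided 95% standard-normal quantile (alpha = 0.05)
    lo, hi = diff - z * se, diff + z * se
    return lo > -margin and hi < margin
```

In the paper's setting the inputs would be principal component scores from the large-scale and scale-down data sets rather than raw measurements.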
3.
4.
5.
Comparative Analyses of Data Independent Acquisition Mass Spectrometric Approaches: DIA, WiSIM‐DIA, and Untargeted DIA
Data‐independent acquisition (DIA) is an emerging technology for quantitative proteomics. Current DIA focuses on the identification and quantitation of fragment ions generated from multiple peptides contained in the same selection window of several to tens of m/z. An alternative approach is WiSIM‐DIA, which combines conventional DIA with wide selected‐ion monitoring (wide‐SIM) windows to partition the precursor m/z space and produce high‐quality precursor ion chromatograms. However, WiSIM‐DIA has been underexplored, and it remains unclear whether it is a viable alternative to DIA. We demonstrate that WiSIM‐DIA quantified more than 24 000 unique peptides over five orders of magnitude in a single 2 h analysis of a neuronal synapse‐enriched fraction, compared to 31 000 in DIA. There is a strong correlation between the abundance values of peptides quantified in both the DIA and WiSIM‐DIA datasets; interestingly, the S/N ratios of these peptides are not correlated. We further show that peptide identification directly from DIA spectra identified >2000 proteins, including unique peptides not found in spectral libraries generated by DDA.
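The window-partitioning idea shared by DIA and WiSIM‐DIA can be illustrated with a minimal sketch. The function below simply tiles a precursor m/z range into fixed-width isolation windows; the name and parameters are hypothetical, and real acquisition schemes often use overlapping or variable-width windows:

```python
def dia_windows(mz_min, mz_max, width):
    # Partition the precursor m/z range into contiguous, fixed-width
    # isolation windows, as in a conventional DIA scheme.
    windows = []
    lo = mz_min
    while lo < mz_max:
        windows.append((lo, min(lo + width, mz_max)))
        lo += width
    return windows
```

For example, a 400–1000 m/z range with 25 m/z windows yields 24 selection windows.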
6.
7.
Analysis and processing of gene chip data (Cited 7 times in total: 1 self‐citation, 6 by others)
Gene chip (microarray) technology generates large volumes of data in applications such as gene expression analysis. How to process and analyze these data and extract valuable biological information from them is a critically important problem. The workflow includes the acquisition and processing of raw data, statistical analysis of normalized data, and data storage and exchange.
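One widely used normalization step in microarray workflows of the kind this abstract describes is quantile normalization. The sketch below is a minimal pure-Python version (the function name is hypothetical; production pipelines use packages such as Bioconductor, and ties are not handled here):

```python
import statistics as st

def quantile_normalize(chips):
    # Quantile normalization: force every chip to share the same
    # empirical distribution by replacing each value with the
    # across-chip mean of the values holding the same rank.
    n = len(chips[0])
    ref = [st.mean(vals) for vals in zip(*(sorted(c) for c in chips))]
    out = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])
        norm = [0.0] * n
        for rank, i in enumerate(order):
            norm[i] = ref[rank]
        out.append(norm)
    return out
```

After normalization, every chip contains exactly the same set of values, differing only in which probe holds which rank.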
8.
9.
10.
Rough‐set‐based data mining techniques and their application in clinical medical diagnosis (Cited 14 times in total: 0 self‐citations, 14 by others)
Data mining is the process of using various analytical tools to discover models and relationships in massive data sets. The large volumes of clinical data contain rich information; using data mining techniques, in particular those based on rough set theory, algorithmic models trained on a training data set can be applied effectively to disease diagnosis and achieve high accuracy. This paper briefly introduces the basic principles and main methods of data mining and the fundamentals of rough set theory, and presents an application example in which data mining is used for the diagnostic evaluation of lung tumors.
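The core rough-set machinery the paper builds on can be illustrated in a few lines. This is a minimal sketch with hypothetical toy symptom data, not the paper's lung-tumor model: cases that are indiscernible on the chosen attributes form equivalence classes, from which the lower (certain) and upper (possible) approximations of a diagnosis set are computed:

```python
from collections import defaultdict

def rough_approximations(cases, attrs, target):
    # Lower/upper approximation of a target set of cases (e.g.
    # confirmed diagnoses) under the indiscernibility relation
    # induced by the chosen condition attributes.
    classes = defaultdict(set)
    for case, values in cases.items():
        key = tuple(values[a] for a in attrs)
        classes[key].add(case)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:
            lower |= cls   # certainly in the target
        if cls & target:
            upper |= cls   # possibly in the target
    return lower, upper
```

Cases in the upper but not the lower approximation form the boundary region, where the attributes cannot resolve the diagnosis.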
11.
12.
Luc Wouters Hinrich W. Göhlmann Luc Bijnens Stefan U. Kass Geert Molenberghs Paul J. Lewi 《Biometrics》2005,61(2):630-632
Summary: This note is in response to Wouters et al. (2003, Biometrics 59, 1131–1139), who compared three methods for exploring gene expression data. Contrary to their conclusion that principal component analysis is not very informative, we show that it is possible to construct principal component analyses that are useful for exploratory analysis of microarray data. We also present another biplot representation, the GE‐biplot (Gene Expression biplot), a useful method for exploring gene expression data with the major advantage of aiding interpretation of both the samples and the genes relative to each other.
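The principal axes underlying both biplot variants come from an eigendecomposition of the covariance matrix. As a minimal sketch (not the GE‐biplot itself), the two-variable case has a closed form; the function name is hypothetical and ties everything to the 2x2 covariance matrix:

```python
import math
import statistics as st

def first_pc_2d(xs, ys):
    # First principal-component direction for two variables, using
    # the closed-form largest eigenvalue of the 2x2 covariance matrix.
    mx, my = st.mean(xs), st.mean(ys)
    cxx = st.mean([(x - mx) ** 2 for x in xs])
    cyy = st.mean([(y - my) ** 2 for y in ys])
    cxy = st.mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    lam = (cxx + cyy) / 2 + math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    vx, vy = lam - cyy, cxy   # eigenvector for lam (requires cxy != 0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm
```

For perfectly correlated data the first axis is the 45-degree diagonal, as expected; real microarray analyses decompose a genes-by-samples matrix the same way, just in higher dimension.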
13.
Summary: Second‐generation sequencing (sec‐gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads—strings of A, C, G, or T, between 30 and 100 characters long—which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base‐calling. The complexity of the base‐calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across‐sample variation at the single‐nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec‐gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood; modeling and quantifying the uncertainty inherent in the generation of sequence reads is therefore of utmost importance. In this article, we present a simple model to capture the uncertainty arising in the base‐calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base‐calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base‐calling performance.
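The standard currency for the per-base uncertainty this abstract models is the Phred quality score. The sketch below shows the conversion in both directions (function names are hypothetical; the offset-33 ASCII encoding is the Sanger/Illumina 1.8+ convention):

```python
import math

def phred(p_error):
    # Phred-scaled quality: Q = -10 * log10(P(base call is wrong)),
    # so Q30 means a 1-in-1000 chance the call is an error.
    return -10 * math.log10(p_error)

def decode_quals(qual_string, offset=33):
    # Decode an ASCII-encoded quality string (Sanger / Illumina 1.8+,
    # offset 33) back into per-base error probabilities.
    return [10 ** (-(ord(c) - offset) / 10) for c in qual_string]
```

Small differences in these probabilities matter at the single-nucleotide resolution the abstract describes, since rare-variant calls ride on exactly this error model.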
14.
15.
Using worked examples, this paper describes in detail the two‐factor analysis of variance (ANOVA) procedure for biological experimental data, and explains how to interpret the computed results; the method is of considerable practical value.
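The arithmetic behind a two-factor ANOVA without replication is a partition of the total sum of squares. A minimal sketch (the function name and toy table are hypothetical; real analyses would go on to form mean squares and F ratios):

```python
def two_way_ss(table):
    # Two-factor ANOVA without replication: partition the total sum
    # of squares into row-factor, column-factor, and residual parts.
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return ss_rows, ss_cols, ss_total - ss_rows - ss_cols
```

For a purely additive table the residual sum of squares is zero, which is a quick sanity check on the decomposition.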
16.
Yu Chen 《Amino Acids and Biotic Resources (氨基酸和生物资源)》2008,30(1):33-36,46
As an emerging technology, gene chips (microarrays) have played an important role in many research fields, including botany, zoology, medicine, and agronomy. This paper reviews the main steps of microarray data analysis, including data pre‐processing, normalization, identification of differentially expressed genes, and cluster analysis, as well as applications of microarrays in plant functional genomics research.
17.
Aim Various methods are employed to recover patterns of area relationships in extinct and extant clades. The fidelity of these patterns can be adversely affected by sampling error in the form of missing data. Here we use simulation studies to evaluate the sensitivity of an analytical biogeographical method, namely tree reconciliation analysis (TRA), to this form of sampling failure. Location Simulation study. Methods To approximate varying degrees of taxonomic sampling failure within phylogenies varying in size and in redundancy of biogeographical signal, we applied sequential pruning protocols to artificial taxon–area cladograms displaying congruent patterns of area relationships. Initial trials assumed equal probability of sampling failure among all areas. Additional trials assigned weighted probabilities to each of the areas in order to explore the effects of uneven geographical sampling. Pruned taxon–area cladograms were then analysed with TRA to determine whether the optimal area cladograms recovered match the original biogeographical signal, or whether they represent false, ambiguous, or uninformative signals. Results The results indicate a period of consistently accurate recovery of the true biogeographical signal, followed by a nonlinear decrease in signal recovery as more taxa are pruned. At high levels of sampling failure, false biogeographical signals are more likely to be recovered than the true signal. However, randomization testing for statistical significance greatly decreases the chance of accepting false signals. The primary inflection point, steepness, and slope of the signal recovery curve depend upon taxon–area cladogram size and area redundancy, as well as on the evenness of sampling. Uneven sampling across geographical areas is found to have serious deleterious effects on TRA, with the accuracy of recovery of biogeographical signal varying by an order of magnitude or more across different sampling regimes. 
Main conclusions These simulations reiterate the importance of taxon sampling in biogeographical analysis, and attest to the need to consider geographical, as well as overall, sampling failure when interpreting the robustness of biogeographical signals. In addition to randomization testing for significance, we suggest the use of randomized sequential taxon deletions and the construction of signal decay curves as a means to assess the robustness of biogeographical signals for empirical data sets.
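The randomized sequential deletion protocol the authors recommend can be sketched generically. This is an illustrative helper only (name and parameters are hypothetical; the TRA step that would score each pruned cladogram is beyond this sketch):

```python
import random

def pruned_taxon_sets(taxa, n_remove, trials, seed=0):
    # Random pruning protocol: repeatedly delete n_remove taxa to
    # mimic sampling failure in a taxon-area cladogram.  Each trial
    # yields one reduced taxon set to be re-analysed (e.g. with TRA).
    rng = random.Random(seed)
    return [sorted(set(taxa) - set(rng.sample(taxa, n_remove)))
            for _ in range(trials)]
```

Sweeping n_remove from 0 upward and recording how often the true signal survives produces the signal decay curve described above.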
18.
A procedure for comparing survival times between several groups of patients through rank analysis of covariance was introduced by Woolson and Lachenbruch (1983). It is a modification of Quade's (1967) rank analysis of covariance procedure and can be used for the analysis of right-censored data. In this paper, two additional modifications of Quade's original test statistic are proposed and compared to the original modification introduced by Woolson and Lachenbruch. These statistics are compared to one another and to the score test from Cox's proportional hazards model by way of a limited Monte Carlo study. One of the statistics, QR2, is recommended for general use for the rank analysis of covariance of right-censored survivorship data.
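The idea behind Quade's rank analysis of covariance can be sketched without the censoring machinery: rank the response and the covariate, regress the response ranks on the covariate ranks, and analyse the residuals across groups. The sketch below is a simplified illustration (function names are hypothetical; ties and censoring, which the paper's statistics handle, are ignored):

```python
def simple_ranks(values):
    # 1-based ranks; ties not handled, for illustration only.
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def quade_residuals(y, x):
    # Core of the rank-ANCOVA idea: least-squares fit of response
    # ranks on covariate ranks, returning the residuals that would
    # then be compared across treatment groups.
    ry, rx = simple_ranks(y), simple_ranks(x)
    n = len(y)
    my, mx = sum(ry) / n, sum(rx) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(rx, ry))
         / sum((a - mx) ** 2 for a in rx))
    return [c - my - b * (a - mx) for a, c in zip(rx, ry)]
```

When the response is a monotone function of the covariate, the ranks coincide and every residual is zero, so any group differences in the residuals reflect effects beyond the covariate.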
19.
Bart Van Puyvelde Sander Willems Ralf Gabriels Simon Daled Laura De Clerck Sofie Vande Casteele An Staes Francis Impens Dieter Deforce Lennart Martens Sven Degroeve Maarten Dhaenens 《Proteomics》2020,20(3-4)
Data‐independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data‐dependent acquisition (DDA) libraries for deep peptide‐centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter‐laboratory comparison.
20.
Summary: In the analysis of longitudinal data, it is not uncommon that the observation times of repeated measurements are subject‐specific and correlated with the underlying longitudinal outcomes. In these situations, taking account of the dependence between observation times and longitudinal outcomes is critical to assure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random‐effect model and assumes a time‐varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting for their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance–covariance matrix that has a closed form and can be consistently estimated by the usual plug‐in method. An additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time‐varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to bladder cancer data is also given to illustrate the methodology.
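The dependence structure this joint model adjusts for can be made concrete with a toy data generator: a subject-level latent effect raises both the visit rate and the outcome level. This sketch illustrates the problem, not the paper's estimating-equation procedure, and all names and coefficients are hypothetical:

```python
import random

def simulate_informative_times(n_subjects, seed=0):
    # Toy "informative observation times" generator: the shared
    # random effect b drives both how often a subject is observed
    # and how high the longitudinal outcome sits.
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, 1.0)            # shared random effect
        n_visits = max(1, round(3 + b))    # visit count depends on b
        times = sorted(rng.uniform(0.0, 1.0) for _ in range(n_visits))
        ys = [1.0 + b + 0.5 * t + rng.gauss(0.0, 0.1) for t in times]
        subjects.append((times, ys))
    return subjects
```

Naively averaging outcomes over such data overweights high-b subjects (who contribute more visits), which is exactly the bias a valid joint analysis must correct.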