20 similar articles found.
1.
Most statistical solutions to the problem of statistical inference with missing data involve integration or expectation. This can be done in many ways: directly or indirectly, analytically or numerically, deterministically or stochastically. Missing-data problems can be formulated in terms of latent random variables, so that the hierarchical likelihood methods of Lee & Nelder (1996) can be applied to missing-value problems to provide one solution to the problem of integration of the likelihood. The resulting methods effectively use a Laplace approximation to the marginal likelihood, with an additional adjustment to the measures of precision to accommodate the estimation of the fixed-effects parameters. We first consider missing-at-random cases, where problems are simpler to handle because the integration does not need to involve the missing-value mechanism, and then consider missing-not-at-random cases. We also study tobit regression and refit the missing-not-at-random selection model to the antidepressant trial data analyzed in Diggle & Kenward (1994).
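For orientation, the Laplace approximation to the marginal likelihood that such hierarchical-likelihood methods rely on has the standard form below, written in generic notation; the additional precision adjustment for the fixed effects mentioned in the abstract is not shown, and the exact form used by Lee & Nelder may differ:

$$
\ell_m(\theta) \;\approx\; h\!\left(\theta,\hat v_\theta\right) \;-\; \frac{1}{2}\log\det\!\left\{\frac{1}{2\pi}\left(-\frac{\partial^2 h(\theta,v)}{\partial v\,\partial v^{\top}}\Big|_{v=\hat v_\theta}\right)\right\},
$$

where $h(\theta,v)$ is the joint log-likelihood of the observed data and the latent variables $v$ (here, the missing values), and $\hat v_\theta$ maximizes $h$ over $v$ for fixed $\theta$.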
2.
Dianne M. Finkelstein Rui Wang Linda H. Ficociello David A. Schoenfeld 《Biometrics》2010,66(3):726-732
Summary: Clinical studies often periodically record information on disease progression as well as results from laboratory studies that are believed to reflect the progressing stages of the disease. A primary aim of such a study is to determine the relationship between the lab measurements and disease progression. If there were no missing or censored data, these analyses would be straightforward. However, patients often miss visits and return after their disease has progressed. In this case, not only is their progression time interval-censored, but their lab test series is also incomplete. In this article, we propose a simple test for the association between a longitudinal marker and an event time from incomplete data. We derive the test using a very intuitive technique of calculating the expected complete-data score conditional on the observed incomplete data (conditional expected score test, CEST). The problem was motivated by data from an observational study of patients with diabetes.
3.
《IRBM》2022,43(1):62-74
Background
The prediction of breast cancer subtypes plays a key role in the diagnosis and prognosis of breast cancer. In recent years, deep learning (DL) has shown good performance in the intelligent prediction of breast cancer subtypes. However, most traditional DL models use single-modality data, from which only a few features can be extracted, so they cannot establish a stable relationship between patient characteristics and breast cancer subtypes.
Dataset
We used the TCGA-BRCA dataset as the sample set for molecular subtype prediction of breast cancer. It is a public dataset that can be obtained through the following link: https://portal.gdc.cancer.gov/projects/TCGA-BRCA
Methods
In this paper, a hybrid DL model based on multimodal data is proposed. We combine each patient's gene-modality data with image-modality data to construct a multimodal fusion framework. We set up a separate feature extraction network for each modality and then fuse the outputs of the two feature networks by weighted linear aggregation. Finally, the fused features are used to predict breast cancer subtypes. In particular, we use principal component analysis to reduce the dimensionality of the high-dimensional gene-modality data and to filter the image-modality data. We also improve the traditional feature extraction network so that it performs better.
Results
The results show that, compared with traditional DL models, the hybrid DL model proposed in this paper is more accurate and efficient in predicting breast cancer subtypes. Our model achieved a prediction accuracy of 88.07% over 10 runs of 10-fold cross-validation. A separate AUC test for each subtype gave an average AUC of 0.9427. In terms of subtype prediction accuracy, our model is about 7.45% higher than the previous average.
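A minimal sketch of the kind of fusion described (PCA on the gene modality, then weighted linear aggregation of the two modality feature vectors, then a classifier). The dimensions, the fusion weight, and the logistic-regression classifier are illustrative assumptions, not the paper's actual architecture:

```python
# Minimal sketch: PCA-reduced gene features fused with image features by
# weighted linear aggregation, then fed to a simple classifier.
# All dimensions, the fusion weight, and the toy data are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
gene_x = rng.normal(size=(n, 5000))      # high-dimensional gene-expression matrix
image_x = rng.normal(size=(n, 128))      # image-derived feature vectors
subtype = rng.integers(0, 4, size=n)     # 4 molecular subtypes (toy labels)

# Reduce the gene modality with PCA so both modalities have the same width.
gene_feat = PCA(n_components=128).fit_transform(gene_x)

# Weighted linear aggregation of the two modality-specific feature vectors.
alpha = 0.6                              # assumed fusion weight
fused = alpha * gene_feat + (1 - alpha) * image_x

clf = LogisticRegression(max_iter=1000).fit(fused, subtype)
print("training accuracy:", clf.score(fused, subtype))
```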
4.
Done Bogdan Khatri Purvesh Done Arina Draghici Sorin 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2010,7(1):91-99
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here is able to take into consideration the hierarchical structure of the Gene Ontology (GO) and can weight GO terms situated at different depths differently. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine such schemes were previously used in other problem domains, while six of them are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Out of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent of them, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions.
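A minimal sketch of the vector-space idea: weight a gene-by-GO-term annotation matrix, then score candidate annotations for a gene from the terms of its most similar genes. Tf-idf stands in for the weighting scheme (the paper's n2tn scheme is not reproduced), and the toy matrix and neighbourhood size are assumptions:

```python
# Minimal sketch of a vector space model over a gene-by-GO-term annotation
# matrix: weight the matrix (tf-idf as a stand-in for the paper's schemes),
# then score unobserved gene-term pairs from the most similar genes.
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(1)
annot = (rng.random((50, 200)) < 0.05).astype(float)   # genes x GO terms (toy)

weighted = TfidfTransformer().fit_transform(annot).toarray()

# Score each candidate annotation by averaging the term's weight over the
# genes most similar to the query gene.
sim = cosine_similarity(weighted)
gene = 0
neighbors = np.argsort(-sim[gene])[1:11]               # 10 nearest genes
scores = weighted[neighbors].mean(axis=0)
novel = np.argsort(-scores * (annot[gene] == 0))[:5]   # top unobserved terms
print("candidate GO-term indices for gene 0:", novel)
```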
5.
A. L. Bello 《Biometrical journal. Biometrische Zeitschrift》1994,36(4):453-464
Bootstrap is a time-honoured distribution-free approach for attaching a standard error to any statistic of interest, but it has not received much attention for data with missing values, especially when imputation techniques are used to replace missing values. We propose a proportional bootstrap method that allows effective use of imputation techniques for all bootstrap samples. Five deterministic imputation techniques are examined, and particular emphasis is placed on the estimation of the standard error of the correlation coefficient. Some real data examples are presented. Other possible applications of the proposed bootstrap method are discussed.
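A minimal sketch of the general workflow the abstract describes: resample, impute inside each bootstrap sample, recompute the correlation, and take the spread as its standard error. Mean imputation stands in for the deterministic techniques examined; the proportional bootstrap itself is not reproduced:

```python
# Minimal sketch: bootstrap standard error of a correlation coefficient when
# missing values are filled in by an imputation step inside each resample.
# Mean imputation is a stand-in; the "proportional" bootstrap is not shown.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)
y[rng.random(n) < 0.2] = np.nan                     # 20% of y missing

def impute_mean(v):
    v = v.copy()
    v[np.isnan(v)] = np.nanmean(v)
    return v

boot_r = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)                # resample with replacement
    xb, yb = x[idx], impute_mean(y[idx])            # impute within each resample
    boot_r.append(np.corrcoef(xb, yb)[0, 1])

print("bootstrap SE of r:", np.std(boot_r, ddof=1))
```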
6.
To address the problem of missing values in microarray data, we exploit the intrinsic link between protein-protein interactions and gene expression and propose a method that uses protein-protein interaction information to improve the accuracy of estimating missing microarray values. Protein-protein interaction relationships are combined with distances between gene expression profiles to compute an expression similarity between genes, and this new similarity measure is used to select a more suitable set of genes for estimating the missing values of a gene with incomplete data. Combining the new similarity measure with the traditional KNNimpute and LLSimpute methods, the corresponding improved algorithms PPI-KNNimpute and PPI-LLSimpute are described. Tests on real data sets show that protein-protein interaction information can effectively improve the accuracy of missing-value estimation for gene expression data.
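A minimal sketch, in the spirit of PPI-KNNimpute, of KNN imputation where the gene-gene similarity blends expression distance with a PPI indicator. The toy data, the blending weight beta, and the neighbourhood size are assumptions, not the paper's settings:

```python
# Minimal sketch: KNN imputation with a similarity that mixes expression
# distance and protein-protein interaction (PPI) membership.
import numpy as np

rng = np.random.default_rng(3)
genes, samples = 60, 12
expr = rng.normal(size=(genes, samples))
expr[0, 3] = np.nan                                    # one missing entry to fill
ppi = rng.random((genes, genes)) < 0.05                # toy PPI adjacency
ppi = ppi | ppi.T                                      # make it symmetric

def impute_knn_ppi(expr, ppi, gene, sample, k=10, beta=0.3):
    """Fill expr[gene, sample] from the k genes most similar to `gene`,
    where similarity blends expression distance with PPI membership."""
    cand = np.where(~np.isnan(expr[:, sample]))[0]
    cand = cand[cand != gene]
    obs = ~np.isnan(expr[gene])                        # columns observed for the target gene
    dist = np.sqrt(((expr[cand][:, obs] - expr[gene, obs]) ** 2).sum(axis=1))
    sim = 1.0 / (1.0 + dist)                           # expression similarity
    sim = (1.0 - beta) * sim + beta * ppi[gene, cand]  # blend in the PPI indicator
    order = np.argsort(-sim)[:k]
    return float(np.average(expr[cand[order], sample], weights=sim[order]))

print("imputed value for expr[0, 3]:", impute_knn_ppi(expr, ppi, gene=0, sample=3))
```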
7.
《Biophysical journal》2020,118(9):2086-2102
Reprogramming of human somatic cells to induced pluripotent stem cells (iPSCs) generates valuable resources for disease modeling, toxicology, cell therapy, and regenerative medicine. However, the reprogramming process can be stochastic and inefficient, creating many partially reprogrammed intermediates and non-reprogrammed cells in addition to fully reprogrammed iPSCs. Much of the work to identify, evaluate, and enrich for iPSCs during reprogramming relies on methods that fix, destroy, or singularize cell cultures, thereby disrupting each cell’s microenvironment. Here, we develop a micropatterned substrate that allows for dynamic live-cell microscopy of hundreds of cell subpopulations undergoing reprogramming while preserving many of the biophysical and biochemical cues within the cells’ microenvironment. On this substrate, we were able to both watch and physically confine cells into discrete islands during the reprogramming of human somatic cells from skin biopsies and blood draws obtained from healthy donors. Using high-content analysis, we identified a combination of eight nuclear characteristics that can be used to generate a computational model to predict the progression of reprogramming and distinguish partially reprogrammed cells from those that are fully reprogrammed. This approach to track reprogramming in situ using micropatterned substrates could aid in biomanufacturing of therapeutically relevant iPSCs and be used to elucidate multiscale cellular changes (cell-cell interactions as well as subcellular changes) that accompany human cell fate transitions.
8.
One of the criticisms of industry-sponsored human subject testing of toxicants is based on the perception that it is often motivated by an attempt to raise the acceptable exposure limit for the chemical. When Reference Doses (RfDs) or Reference Concentrations (RfCs) are based upon no-effect levels from human rather than animal data, an animal-to-human uncertainty factor (usually 10) is not required, which could conceivably result in a higher safe exposure limit. There has been little in the way of study of the effect of using human vs. animal data on the development of RfDs and RfCs to lend empirical support to this argument. We have recently completed an analysis comparing RfDs and RfCs derived from human data with toxicity values for the same chemicals based on animal data. The results, published in detail elsewhere, are summarized here. We found that the use of human data did not always result in higher RfDs or RfCs. In 36% of the comparisons, human-based RfDs or RfCs were lower than the corresponding animal-based toxicity values, and were more than 3-fold lower in 23% of the comparisons. In 10 out of 43 possible comparisons (23%), insufficient experimental animal data are readily available or data are inappropriate to estimate either RfDs or RfCs. Although there are practical limitations in conducting this type of analysis, it nonetheless suggests that the use of human data does not routinely lead to higher toxicity values. Given the inherent ability of human data to reduce uncertainty regarding risks from human exposures, its use in conjunction with data gathered from experimental animals is a public health protective policy that should be encouraged.
9.
Jose Eduardo de la Torre-Bárcena Sergios-Orestis Kolokotronis Ernest K. Lee Dennis Wm. Stevenson Eric D. Brenner Manpreet S. Katari Gloria M. Coruzzi Rob DeSalle 《PloS one》2009,4(6)
Background
Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions concerning the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which continues to be a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.
Methodology
We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.
Conclusions
We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and the order in which genes are added to an analysis do not influence branch support, inadequate taxon sampling and a limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.
10.
Objectives
Rotator cuff tear is a common cause of shoulder disease. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.
Methods
In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation, followed by confirmatory MRI between 2007 and 2011, were identified. MRI was used as the reference standard to classify rotator cuff tears. The predictor variables were the clinical assessment results, which consisted of 16 attributes. This study employed two data mining methods (an ANN and a decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratios and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.
Results
Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear, using a pretest probability and a prediction result (tear or no tear).
Conclusions
Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools for classifying rotator cuff tears and for determining the probability of the presence of the disease, enhancing diagnostic decision making for rotator cuff tears.
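The post-test probability that a Fagan nomogram reads off graphically follows directly from Bayes' theorem applied to pretest odds and a likelihood ratio. A minimal sketch; the numbers are illustrative, not values reported in the study:

```python
# Minimal sketch: post-test probability from a pretest probability and a
# likelihood ratio (the calculation a Fagan nomogram performs graphically).
def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# e.g. a 40% pretest probability and a positive likelihood ratio of 5
print(post_test_probability(0.40, 5.0))   # ~0.769
```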
11.
The high-dimensional search space involved in markerless full-body articulated human motion tracking from multi-view video sequences has led to a number of solutions based on metaheuristics, the most recent form of which is Particle Swarm Optimization (PSO). However, classical PSO suffers from premature convergence and is easily trapped in local optima, significantly affecting tracking accuracy. To overcome these drawbacks, we have developed a method for the problem based on Hierarchical Multi-Swarm Cooperative Particle Swarm Optimization (H-MCPSO). The tracking problem is formulated as a non-linear 34-dimensional function optimization problem where the fitness function quantifies the difference between the observed image and a projection of the model configuration. Both the silhouette and edge likelihoods are used in the fitness function. Experiments using the Brown and HumanEva-II datasets demonstrated that H-MCPSO performs better than two leading alternative approaches, the Annealed Particle Filter (APF) and Hierarchical Particle Swarm Optimization (HPSO). Further, the proposed tracking method is capable of automatic initialization and self-recovery from temporary tracking failures. Comprehensive experimental results are presented to support the claims.
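A bare-bones classical PSO on a toy objective, just to illustrate the velocity/position update such tracking methods build on; this is not the hierarchical multi-swarm variant (H-MCPSO), and the sphere function stands in for the image-likelihood fitness:

```python
# Bare-bones classical PSO on a toy 34-D objective (sphere function).
# The dimensionality matches the pose parameterization; everything else
# (swarm size, coefficients, objective) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):
    return np.sum(x ** 2, axis=-1)           # stand-in for the fitness function

dim, n_particles, iters = 34, 40, 200
w, c1, c2 = 0.7, 1.5, 1.5                     # inertia and acceleration coefficients

pos = rng.uniform(-5, 5, size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), sphere(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = sphere(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best fitness found:", pbest_val.min())
```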
12.
In this study, a new concept for particle size prediction during fluid bed granulation is presented. Using the process measurement data obtained from a design-of-experiments study, predictive partial least squares models were developed for the spraying and drying phases. Measured and calculated process parameters from an instrumented fluid bed granulation environment were used as explanatory factors, whereas in-line particle size data determined by a spatial filtering technique were used as the response. Modeling was carried out by testing all possible combinations of two to six process parameters (factors) out of the total of 41 parameters. Eleven batches were used for model development and four batches for model testing. The selected models predicted particle size (d50) well, especially during the spraying phase (Q2 = 0.86). While the measured in-line d50 data were markedly influenced by different process failures, e.g., impaired fluidization activity, the predicted data remained more consistent. This concept can be applied in fluid bed granulation processes if the granulation environment is soundly instrumented and if reliable real-time particle size data from the design-of-experiments batches are retrieved for model development.
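A minimal sketch of fitting a partial least squares model to predict a particle-size response from process parameters. The toy data, the train/test split, and the number of latent components are assumptions; the paper's exhaustive search over combinations of 41 parameters is not reproduced:

```python
# Minimal sketch: partial least squares regression predicting a particle-size
# response (e.g. d50) from a handful of process parameters (toy data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)
n_obs, n_params = 150, 6
X = rng.normal(size=(n_obs, n_params))                # process parameters
d50 = X @ rng.normal(size=n_params) + rng.normal(scale=0.5, size=n_obs)

train, test = slice(0, 110), slice(110, None)
pls = PLSRegression(n_components=3).fit(X[train], d50[train])
print("held-out R^2 (rough analogue of Q2):", pls.score(X[test], d50[test]))
```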
13.
With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a “new” computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.
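A drastically simplified illustration of the underlying idea of predicting a rating from group memberships: with user and item groups fixed and known, a missing rating is predicted as the mean rating observed between the corresponding groups. The paper instead samples over the ensemble of stochastic block models in a Bayesian way; none of that machinery is shown here, and the toy groups and ratings are assumptions:

```python
# Drastically simplified group-based preference prediction (fixed, known
# groups), illustrating only the idea behind block-model recommenders.
import numpy as np

rng = np.random.default_rng(6)
n_users, n_items = 30, 40
user_grp = rng.integers(0, 3, n_users)
item_grp = rng.integers(0, 4, n_items)
ratings = rng.integers(1, 6, (n_users, n_items)).astype(float)
ratings[rng.random((n_users, n_items)) < 0.5] = np.nan      # unobserved entries

def predict(u, i):
    """Mean observed rating between user u's group and item i's group."""
    mask = np.outer(user_grp == user_grp[u], item_grp == item_grp[i])
    vals = ratings[mask]
    return np.nanmean(vals) if np.any(~np.isnan(vals)) else np.nanmean(ratings)

print("predicted rating for user 0, item 0:", predict(0, 0))
```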
14.
A thermal convection loop is an annular chamber filled with water, heated on the bottom half and cooled on the top half. With sufficiently large forcing of heat, the direction of fluid flow in the loop oscillates chaotically, dynamics analogous to the Earth’s weather. As is the case for state-of-the-art weather models, we only observe the statistics over a small region of state space, making prediction difficult. To overcome this challenge, data assimilation (DA) methods, and specifically ensemble methods, use the computational model itself to estimate the uncertainty of the model and to optimally combine these observations into an initial condition for predicting the future state. Here, we build and verify four distinct DA methods, and then we perform a twin model experiment with the computational fluid dynamics simulation of the loop using the Ensemble Transform Kalman Filter (ETKF) to assimilate observations and predict flow reversals. We show that using adaptively shaped localized covariance outperforms static localized covariance with the ETKF, and allows for the use of fewer observations in predicting flow reversals. We also show that a Dynamic Mode Decomposition (DMD) of the temperature and velocity fields recovers the low-dimensional system underlying reversals, finding specific modes which together are predictive of reversal direction.
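For orientation, the exact-DMD computation on a matrix of snapshots is only a few lines. A minimal sketch on synthetic snapshots (the loop's actual temperature/velocity fields and the ETKF assimilation step are not shown; the truncation rank is an assumption):

```python
# Minimal sketch of exact dynamic mode decomposition (DMD) on a snapshot matrix.
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 10, 200)
space = np.linspace(0, 1, 64)[:, None]
# Two oscillatory spatial modes plus noise, stacked as columns of snapshots.
data = (np.sin(2 * np.pi * space) * np.cos(3 * t)
        + 0.5 * np.cos(4 * np.pi * space) * np.sin(7 * t)
        + 0.01 * rng.normal(size=(64, 200)))

X, Y = data[:, :-1], data[:, 1:]                    # consecutive snapshot pairs
U, s, Vh = np.linalg.svd(X, full_matrices=False)
r = 4                                               # truncation rank (assumed)
U, s, Vh = U[:, :r], s[:r], Vh[:r]
A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
eigvals, W = np.linalg.eig(A_tilde)
modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W      # exact DMD modes

print("DMD eigenvalue magnitudes:", np.abs(eigvals))
```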
15.
Summary: In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799–821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method, or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113–124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.
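A minimal sketch of the workflow described: impute several times, fit a robust (Huber) regression to each completed dataset, and pool with Rubin's rules. The simple random-draw imputation and the bootstrap within-imputation variance are stand-ins for a proper imputation model and analytic standard errors, not the paper's procedure:

```python
# Minimal sketch: multiple imputation + robust regression, pooled by Rubin's rules.
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(8)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)    # heavy-tailed errors
y[rng.random(n) < 0.25] = np.nan                    # 25% of responses missing

m = 10
estimates, variances = [], []
for _ in range(m):
    y_imp = y.copy()
    miss = np.isnan(y_imp)
    # Stand-in imputation: draw from a normal fitted to the observed responses.
    y_imp[miss] = rng.normal(np.nanmean(y), np.nanstd(y), size=miss.sum())
    fit = HuberRegressor(max_iter=500).fit(x.reshape(-1, 1), y_imp)
    estimates.append(fit.coef_[0])
    # Crude within-imputation variance via a small nonparametric bootstrap.
    boot = []
    for _ in range(40):
        idx = rng.integers(0, n, n)
        boot.append(HuberRegressor(max_iter=500)
                    .fit(x[idx].reshape(-1, 1), y_imp[idx]).coef_[0])
    variances.append(np.var(boot, ddof=1))

q_bar = np.mean(estimates)                          # pooled point estimate
u_bar = np.mean(variances)                          # within-imputation variance
b = np.var(estimates, ddof=1)                       # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b                 # Rubin's rules total variance
print(f"pooled slope {q_bar:.3f}, SE {np.sqrt(total_var):.3f}")
```

The between-imputation component is what carries the extra uncertainty introduced by the missing data; pooling only the within-imputation variances would understate the standard error.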
16.
Noncancer risk assessments are generally forced to rely on animal bioassay data to estimate a Tolerable Daily Intake or Reference Dose, as a proxy for the threshold of human response. In cases where animal bioassays are missing from a complete data base, the critical NOAEL (no-observed-adverse-effect level) needs to be adjusted to account for the impact of the missing bioassay(s). This paper presents two approaches for making such adjustments. One is based on regression analysis and seeks to provide a point estimate of the adjustment needed. The other relies on non-parametric analysis and is intended to provide a distributional estimate of the needed adjustment. The adjustment needed is dependent on the definition of a complete data base, the number of bioassays missing, the specific bioassays which are missing, and the method used for interspecies scaling. The results from either approach can be used in conjunction with current practices for computing the TDI or RfD, or as an element of distributional approaches for estimating the human population threshold.
17.
18.
Aidan G. O’Keeffe Daniel M. Farewell Brian D. M. Tom Vernon T. Farewell 《Statistics in biosciences》2016,8(2):310-332
In longitudinal randomised trials and observational studies within a medical context, a composite outcome—which is a function of several individual patient-specific outcomes—may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has been seldom discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative as assumptions concerning both the missingness structure within the data and the imputation models are different from the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that for each, separate imputation offers some improvement on the direct imputation of a composite outcome.
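A minimal sketch contrasting direct imputation of a composite outcome with separate imputation of its components. Here the composite is simply the mean of two component scores and the imputer is scikit-learn's IterativeImputer; these are illustrative assumptions, and the paper's linear-increments approach is not reproduced:

```python
# Minimal sketch: direct imputation of a composite vs. imputing its components
# separately and recomputing the composite.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(9)
n = 300
baseline = rng.normal(size=n)
comp1 = baseline + rng.normal(scale=0.5, size=n)
comp2 = 0.8 * baseline + rng.normal(scale=0.5, size=n)
composite = (comp1 + comp2) / 2

# Impose missingness on the components (and hence on the composite).
miss1, miss2 = rng.random(n) < 0.2, rng.random(n) < 0.2
comp1_obs, comp2_obs = comp1.copy(), comp2.copy()
comp1_obs[miss1], comp2_obs[miss2] = np.nan, np.nan
composite_obs = np.where(miss1 | miss2, np.nan, composite)

# (a) direct imputation of the composite outcome
direct = IterativeImputer(random_state=0).fit_transform(
    np.column_stack([baseline, composite_obs]))[:, 1]
# (b) separate imputation of each component, then recompute the composite
comps = IterativeImputer(random_state=0).fit_transform(
    np.column_stack([baseline, comp1_obs, comp2_obs]))
separate = (comps[:, 1] + comps[:, 2]) / 2

for name, est in [("direct", direct), ("separate", separate)]:
    print(name, "RMSE vs. true composite:", np.sqrt(np.mean((est - composite) ** 2)))
```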
19.
Sandip K. Dash Minakshi Sharma Shashi Khare Ashok Kumar 《Indian journal of microbiology》2013,53(2):238-240
The usual methods for diagnosing life-threatening bacterial meningitis of the human brain are expensive, time-consuming or non-confirmatory. A quick PCR-based diagnosis of meningitis in cerebrospinal fluid (CSF), using specific primers for the virulent Omp85 gene of Neisseria meningitidis, can detect as little as 1.0 ng of genomic DNA (G-DNA) in 80 min to confirm bacterial meningitis caused by N. meningitidis infection. The 257 bp amplicon of the Omp85 gene does not show homology with other suspected pathogens in CSF and can be used as a specific genetic marker for diagnosis of the disease.
20.
Nick W. Ruktanonchai Patrick DeLeenheer Andrew J. Tatem Victor A. Alegana T. Trevor Caughlin Elisabeth zu Erbach-Schoenberg Christopher Lourenço Corrine W. Ruktanonchai David L. Smith 《PLoS computational biology》2016,12(4)
Humans move frequently and tend to carry parasites among areas with endemic malaria and into areas where local transmission is unsustainable. Human-mediated parasite mobility can thus sustain parasite populations in areas where they would otherwise be absent. Data describing human mobility and malaria epidemiology can help classify landscapes into parasite demographic sources and sinks, ecological concepts that have parallels in malaria control discussions of transmission foci. By linking transmission to parasite flow, it is possible to stratify landscapes for malaria control and elimination, as sources are disproportionately important to the regional persistence of malaria parasites. Here, we identify putative malaria sources and sinks for pre-elimination Namibia using malaria parasite rate (PR) maps and call data records from mobile phones, using a steady-state analysis of a malaria transmission model to infer where infections most likely occurred. We also examined how the landscape of transmission and burden changed from the pre-elimination setting by comparing the location and extent of predicted pre-elimination transmission foci with modeled incidence for 2009. This comparison suggests that while transmission was spatially focal pre-elimination, the spatial distribution of cases changed as burden declined. The changing spatial distribution of burden could be due to importation, with cases focused around importation hotspots, or due to heterogeneous application of elimination effort. While this framework is an important step towards understanding progressive changes in malaria distribution and the role of subnational transmission dynamics in a policy-relevant way, future work should account for international parasite movement, utilize real time surveillance data, and relax the steady state assumption required by the presented model.