Similar Literature
20 similar documents found.
1.
Modern high-throughput biotechnologies such as microarrays and next-generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment only a limited amount of data is observed for each individual feature, hence the classical “large p, small n” problem. The Bayesian hierarchical model, which borrows strength across features within the same dataset, has been recognized as an effective tool for analyzing such data. However, shrinkage, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that, in the Big Data era, large amounts of historical data are available and should be taken advantage of. The strategy presents a new framework for enhancing the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrate the superior performance of the proposed strategy. The new strategy also enables borrowing information across different platforms, which could be extremely useful given the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package “adaptiveHM,” which is freely available from https://github.com/benliemory/adaptiveHM.
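The shrinkage effect described above can be illustrated with a generic empirical Bayes normal-normal model. This is a minimal sketch, not the adaptiveHM method: it assumes each feature's estimate `x_i` has a known, common sampling variance `s2` and estimates the prior mean and variance by moments. The function name `eb_shrink` and the data are hypothetical.

```python
import numpy as np

def eb_shrink(x, s2):
    """Shrink per-feature estimates toward their grand mean (normal-normal model).

    Assumes x_i ~ N(theta_i, s2) with known common variance s2 and
    theta_i ~ N(mu, tau2); mu and tau2 are estimated by the method of moments.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    # Between-feature variance: total variance minus sampling variance, floored at 0.
    tau2 = max(x.var(ddof=1) - s2, 0.0)
    # Shrinkage factor: 1 means no pooling, 0 means every feature collapses to mu.
    b = tau2 / (tau2 + s2)
    return mu + b * (x - mu), b

# Hypothetical data: five null-like features and one feature with a large true effect.
shrunk, b = eb_shrink([0.1, -0.2, 0.0, 0.3, -0.1, 5.0], s2=1.0)
print(round(b, 3), shrunk.round(2))
```

The last feature is pulled from 5.0 toward the grand mean; if its large value reflects a real effect, this is the kind of over-correction the abstract refers to.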

2.
Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine ligand-containing proteins built during CASP8. We supplement the study with 21 additional protein/ligand complexes to cover a wider range of chemotypes. In 21 of the 30 cases, a full RosettaLigand docking run found a native-like binding mode among the ten top-scoring binding modes. The benchmark cases show that careful template selection based on ligand occupancy provides the best chance of success, whereas overall sequence identity between template and target does not appear to improve results. We also find that binding energy normalized by atom number is often less than −0.4 in native-like binding modes.
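The two evaluation criteria in this benchmark (a native-like pose among the ten top-scoring modes, and binding energy per atom below −0.4) can be expressed as a small check. This sketch is hypothetical: the `Pose` fields, the 2.0 Å RMSD cutoff for "native-like", and the energies are assumptions for illustration, not RosettaLigand output formats.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    energy: float   # binding energy score of the pose (lower is better)
    n_atoms: int    # ligand atom count used for normalization
    rmsd: float     # RMSD to the crystal ligand in Å (known only in a benchmark)

def energy_per_atom(pose):
    return pose.energy / pose.n_atoms

def native_like_in_top(poses, k=10, rmsd_cutoff=2.0):
    """True if any of the k best-scoring poses lies within rmsd_cutoff of the native pose."""
    ranked = sorted(poses, key=lambda p: p.energy)
    return any(p.rmsd <= rmsd_cutoff for p in ranked[:k])

# Hypothetical docking output for one target.
poses = [Pose(-12.0, 25, 4.1), Pose(-11.5, 25, 1.3), Pose(-6.0, 25, 7.0)]
print(native_like_in_top(poses), native_like_in_top(poses, k=1))
print([round(energy_per_atom(p), 2) for p in poses])
```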

3.
Neural crest cells exhibit dramatic migration behaviors as they populate their distant targets. Using a zebrafish line that expresses green fluorescent protein in neural crest cells (sox10:EGFP), we developed an assay to analyze and quantify cell migration at the population level, and we use it here to characterize in detail the subtle defects in cell migration caused by ethanol exposure during early development. The challenge was to quantify changes in the in vivo migration of all Sox10:EGFP-expressing cells in the visual field of time-lapse movies. To do so, we used an optical flow algorithm for motion detection and combined it with a fit to an affine transformation. This analysis detected and quantified significant differences in the migration of Sox10:EGFP-positive cranial neural crest populations between ethanol-treated and untreated embryos. Specifically, treatment affected migration by increasing the left-right asymmetry of the migrating cells and by altering the direction of cell movements. Thus, this novel computational analysis allowed us to quantify the movements of cell populations and to detect subtle changes in cell behavior. Because cranial neural crest cells contribute to the formation of the frontal mass, these subtle differences may underlie the facial asymmetries commonly observed in normal human populations.
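The affine-fit step can be sketched as an ordinary least-squares problem: given the positions `p` of flow vectors and their displacements `d`, find a 2×2 matrix `A` and a translation `t` with `d ≈ A p + t`. This is a minimal sketch assuming the flow field has already been computed (for example with an optical flow routine such as OpenCV's Farneback implementation); `fit_affine` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def fit_affine(points, displacements):
    """Least-squares fit of d ≈ A @ p + t over all 2-D flow vectors."""
    pts = np.asarray(points, dtype=float)
    d = np.asarray(displacements, dtype=float)
    # Design matrix with rows [x, y, 1] so the translation is fitted jointly.
    X = np.hstack([pts, np.ones((len(pts), 1))])
    coef, *_ = np.linalg.lstsq(X, d, rcond=None)  # shape (3, 2) = [A.T; t]
    return coef[:2].T, coef[2]

# Synthetic flow field generated from a known affine motion.
ys, xs = np.mgrid[0:5, 0:5]
pts = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
A_true = np.array([[0.10, -0.02], [0.02, -0.05]])
t_true = np.array([1.0, 0.5])
disp = pts @ A_true.T + t_true

A, t = fit_affine(pts, disp)
divergence = np.trace(A)       # net expansion (>0) or contraction (<0)
rotation = A[1, 0] - A[0, 1]   # curl of the linear part of the field
print(A.round(3), t.round(3), round(divergence, 3), round(rotation, 3))
```

Fitting the left and right halves of the field separately and comparing their `A` and `t` is one plausible way to quantify left-right asymmetry, though the study's exact descriptors are not given here.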

4.
We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consist of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.
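Betti-0, the number of connected components, is the simplest of the Betti numbers mentioned above and can be computed at a fixed scale `eps` with a union-find structure: two points are joined when their distance is at most `eps`, which gives Betti-0 of the Vietoris-Rips complex at that scale. The sketch below also computes polarization, one of the order parameters the abstract compares against. Higher Betti numbers (circles, trapped volumes) need a persistent homology library and are not shown; the function names and the point cloud are illustrative.

```python
import math

def betti0(points, eps):
    """Connected components of the graph linking points within distance eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})

def polarization(velocities):
    """Norm of the mean unit velocity: 1 = fully aligned, near 0 = no net heading."""
    units = [tuple(c / math.hypot(*v) for c in v) for v in velocities]
    n = len(units)
    mean = [sum(u[k] for u in units) / n for k in range(len(units[0]))]
    return math.hypot(*mean)

# Hypothetical snapshot: two tight groups in (x, y, vx, vy) space.
cloud = [(0.0, 0.0, 1.0, 0.0), (0.2, 0.1, 1.0, 0.1),
         (5.0, 5.0, -1.0, 0.0), (5.1, 4.9, -1.0, -0.1)]
for eps in (0.05, 0.5, 10.0):
    print(eps, betti0(cloud, eps))
print(polarization([p[2:] for p in cloud]))
```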

5.
Migration is a fundamental stage in the life history of several taxa, including birds, and is under strong selective pressure. At present, the only data that may allow both an assessment of bird migration patterns and retrospective analyses of changes in migration timing are ring-recovery databases. We used ring recoveries of the Barn Swallow Hirundo rustica collected in Europe from 1908 to 2008 to model the calendar date by which a given proportion of birds is expected to have reached a given geographical area (‘progression of migration’) and to investigate the change in timing of migration over the same areas between three time periods (1908–1969, 1970–1990, 1991–2008). The analyses were conducted using binomial conditional autoregressive (CAR) mixed models. We first concentrated on data from the British Isles and then expanded the models to western Europe and north Africa. We produced maps of the progression of migration that revealed local patterns of migration consistent with those obtained from analyses of the movements of ringed individuals. Timing of migration estimated from our model is consistent with data on Barn Swallow migration phenology available in the literature, but in some cases it is later than that estimated from data collected at ringing stations, which, however, may not be representative of migration phenology over large geographical areas. Comparing median migration dates estimated over the same geographical area across time periods showed no significant advancement of spring migration over Europe as a whole, but a significant advancement of autumn migration in southern Europe. Our modelling approach can be generalized to any records of ringing date and locality, including those of individuals that were never subsequently recovered, as well as to geo-referenced databases of sightings of migratory individuals.
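A proper CAR prior for the spatial effects in such a model can be written through its precision matrix `Q = tau * (D - rho * W)`, where `W` is the adjacency matrix of the areas and `D` holds each area's number of neighbours. The sketch below builds `Q` for a hypothetical 2×2 grid, draws one spatial effect vector, and plugs it into a logistic link for the probability that a bird has reached each area by a given day. The grid, `rho`, `tau`, and the coefficients `b0`, `b1` are assumptions; the paper's covariates and fitting procedure are not reproduced.

```python
import numpy as np

def car_precision(W, rho, tau=1.0):
    """Precision matrix tau * (D - rho * W) of a conditional autoregressive prior."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))   # number of neighbours of each area
    return tau * (D - rho * W)

def prob_arrived(day, b0, b1, phi):
    """Logistic probability that a bird has reached each area by `day`."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * day + phi)))

# Rook adjacency for a hypothetical 2x2 grid of areas laid out as 0 1 / 2 3.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
Q = car_precision(W, rho=0.9, tau=2.0)

# Draw phi ~ N(0, Q^-1): with Q = L L^T, solving L^T phi = z gives Cov(phi) = Q^-1.
L = np.linalg.cholesky(Q)
rng = np.random.default_rng(1)
phi = np.linalg.solve(L.T, rng.standard_normal(len(W)))

p = prob_arrived(day=100, b0=-20.0, b1=0.2, phi=phi)  # illustrative coefficients
print(p.round(3))
```

With `rho = 1` the matrix becomes the singular intrinsic CAR (ICAR) precision, which is why proper CAR models keep `|rho| < 1`.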

6.
Comparative effectiveness research aims, in part, to provide evidence most relevant to clinical decision making. One decision relevant to hypertensive patients is which therapeutic drug class is the safest and most effective. In addition, once a drug class has been chosen, it would be useful to know whether effectiveness differs between drugs within the class. Randomized trials are unlikely to provide sufficient evidence to answer these questions. We therefore propose a modeling approach that addresses them using administrative databases: a Bayesian hierarchical model in which drugs are nested within their corresponding class. We account for the type of missing data common in these databases using a pattern mixture model. The methodology is illustrated using data from a comparative effectiveness study of antihypertensive medications.
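The core idea of a pattern mixture model is to write the overall mean as a mixture over missingness patterns, with the mean of the unobserved pattern tied to the observed one through a sensitivity parameter. The sketch below shows that step alone with two patterns (observed, missing) and a shift `delta`; it is not the authors' full Bayesian hierarchical model, and the outcome values and `delta` settings are hypothetical.

```python
def pattern_mixture_mean(observed, n_missing, delta):
    """Overall mean under a two-pattern mixture (observed vs. missing outcomes).

    The missing-pattern mean is not identified by the data; here it is assumed
    to equal the observed mean shifted by the sensitivity parameter `delta`.
    """
    n_obs = len(observed)
    mu_obs = sum(observed) / n_obs
    mu_mis = mu_obs + delta
    pi_obs = n_obs / (n_obs + n_missing)  # estimated probability of the observed pattern
    return pi_obs * mu_obs + (1.0 - pi_obs) * mu_mis

# Hypothetical blood-pressure reductions (mmHg) for patients with follow-up data.
observed = [10.0, 12.0, 8.0, 14.0]
for delta in (0.0, -2.0, -4.0):
    print(delta, pattern_mixture_mean(observed, n_missing=4, delta=delta))
```

Reporting the estimate over a range of `delta` values is a common way to show how sensitive a conclusion is to the untestable missing-data assumption.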

7.
Maximum entropy-based inference methods have been used successfully to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models for two categories of data: continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar strategies for inferring the model parameters for both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods apply to the important problems of protein 3-D structure prediction and inference of gene–gene association networks, and they open potential applications in the analysis of gene alteration patterns and in protein design.
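For continuous variables, the pairwise maximum-entropy model constrained by means and covariances is a multivariate Gaussian, and its couplings are the off-diagonal entries of the inverse covariance (precision) matrix; for categorical data, mean-field approaches in contact prediction use an analogous inversion of the connected correlation matrix. The sketch below simulates a hypothetical chain `x0 -> x1 -> x2` to show why couplings separate direct from indirect effects: `x0` and `x2` are strongly correlated, yet their partial correlation is near zero.

```python
import numpy as np

# Hypothetical chain: x0 influences x2 only through x1.
rng = np.random.default_rng(0)
n = 20_000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
X = np.column_stack([x0, x1, x2])

corr = np.corrcoef(X, rowvar=False)
# Gaussian max-ent couplings: off-diagonal entries of the precision matrix.
J = np.linalg.inv(np.cov(X, rowvar=False))
scale = np.sqrt(np.diag(J))
partial = -J / np.outer(scale, scale)  # partial correlations from the couplings
np.fill_diagonal(partial, 1.0)

print(corr.round(2))
print(partial.round(2))
```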

8.
9.
10.

The performance of surface plasmon resonance (SPR) sensors depends strongly on the frequency response of the plasmonic material, which is described by its complex dielectric function. Over time, researchers have developed and refined mathematical models to describe material dielectric functions accurately. Although many papers have compared the accuracy of different dielectric function models and stated their limitations, none has addressed the effect of the dielectric function model on SPR sensor characteristics. In this paper, we investigated the three most widely used dielectric function models (Drude, Lorentz-Drude, and Brendel-Bormann) and their effect on the theoretically obtained sensor parameters when used in a model of a gold SPR sensor, and we validated the results against the experimentally measured dielectric function. The results showed that using a less accurate dielectric function model has a drastic effect on the theoretically obtained sensor parameters. Among the three models, the widely used Drude model was not the most accurate; the Brendel-Bormann model was the most accurate.
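As an illustration of how the choice of dielectric model enters an SPR calculation, the sketch below evaluates one common form of the Drude model, `eps(w) = eps_inf - wp**2 / (w**2 + 1j*gamma*w)`, and feeds it into the surface-plasmon matching condition for a Kretschmann prism coupler. The parameters `eps_inf`, `wp`, `gamma`, the 633 nm wavelength, and the prism and sample indices are illustrative placeholders, not the fitted values used in the paper, and the matching condition assumes a semi-infinite metal (no finite film thickness).

```python
import cmath
import math

HC_EV_NM = 1239.84193  # Planck constant times speed of light, in eV·nm

def drude(energy_ev, eps_inf, wp_ev, gamma_ev):
    """Drude dielectric function; photon energy, plasma frequency and damping in eV."""
    w = energy_ev
    return eps_inf - wp_ev**2 / (w**2 + 1j * gamma_ev * w)

def spr_angle_deg(eps_metal, n_sample, n_prism):
    """Resonance angle from the matching condition for a semi-infinite metal."""
    eps_d = n_sample**2
    n_sp = cmath.sqrt(eps_metal * eps_d / (eps_metal + eps_d)).real
    return math.degrees(math.asin(n_sp / n_prism))

energy = HC_EV_NM / 633.0                                      # photon energy at 633 nm
eps_au = drude(energy, eps_inf=9.0, wp_ev=9.0, gamma_ev=0.07)  # placeholder parameters
theta_water = spr_angle_deg(eps_au, n_sample=1.33, n_prism=1.515)
theta_shift = spr_angle_deg(eps_au, n_sample=1.34, n_prism=1.515)
print(eps_au, round(theta_water, 2), round(theta_shift - theta_water, 3))
```

Swapping `drude` for a Lorentz-Drude or Brendel-Bormann function changes `eps_metal` and therefore the predicted resonance angle and angular sensitivity, which is the effect the paper quantifies.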


11.
12.
Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful for predicting disease risk. However, modeling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to handle the complex, nonlinear effects of categorical/discrete SNP covariates for schizophrenia class prediction. A maximum likelihood methodology is proposed to estimate the unknown parameters of the models. Simulation studies demonstrate that the proposed methodology performs well for moderate-size samples. The approach is then applied to schizophrenia classification using a real SNP dataset from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in predicting schizophrenia risk class from SNP data, in terms of both Type I/II error rates and ROC curves.
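The abstract does not spell out the model form, so the sketch below is only a hypothetical illustration of a threshold index entering a logistic model: SNP genotypes coded 0/1/2 are combined into an index `z`, and the log-odds shift by `b1` only once `z` exceeds a threshold `c`. It also shows how the Type I and Type II error rates used for comparison are computed from predicted classes. All weights, thresholds, coefficients and labels are made up.

```python
import math

def threshold_index_prob(genotypes, weights, c, b0, b1):
    """P(case) under a toy threshold-index logistic model (hypothetical form).

    Genotypes are minor-allele counts 0/1/2; the index z = sum(w_j * g_j) raises
    the log-odds by b1 only when it exceeds the threshold c.
    """
    z = sum(w * g for w, g in zip(weights, genotypes))
    eta = b0 + (b1 if z > c else 0.0)
    return 1.0 / (1.0 + math.exp(-eta))

def error_rates(y_true, y_pred):
    """Type I = false-positive rate among controls; Type II = false-negative rate among cases."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n_controls = sum(1 for t in y_true if t == 0)
    n_cases = len(y_true) - n_controls
    return fp / n_controls, fn / n_cases

params = dict(weights=[0.5, 1.0, 0.8], c=1.5, b0=-1.0, b1=2.0)  # made-up values
p_high = threshold_index_prob([2, 1, 0], **params)  # z = 2.0, above threshold
p_low = threshold_index_prob([0, 1, 0], **params)   # z = 1.0, below threshold
print(round(p_high, 3), round(p_low, 3))
print(error_rates([0, 0, 0, 1, 1], [1, 0, 0, 1, 1]))
```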

13.
Understanding the spatial pattern of species distributions is fundamental in biogeography and in conservation and resource management applications. Most species distribution models (SDMs) require or prefer species presence and absence data for adequate estimation of model parameters. However, observations with unreliable or unreported species absences dominate the available data and limit the implementation of SDMs. Presence-only models generally yield less accurate predictions of species distribution and make it difficult to incorporate spatial autocorrelation. The availability of large amounts of historical presence records for freshwater fishes of the United States provides an opportunity to derive reliable absences from data reported as presence-only, when sampling was predominantly community-based. In this study, we used boosted regression trees (BRT), logistic regression, and MaxEnt models to assess the performance of a historical metacommunity database with inferred absences for modeling fish distributions, and thereby investigated the effects of model choice and data properties. Modeling the distributions of 76 native, non-game fish species with varied traits and rarity attributes in four river basins across the United States, we show that model accuracy depends on data quality (e.g., sample size, location precision), species’ rarity, statistical modeling technique, and consideration of spatial autocorrelation. The cross-validated area under the receiver operating characteristic curve (AUC) tended to be high in the spatial presence-absence models at the highest level of resolution for species with large geographic ranges and small local populations. Prevalence affected training AUC but not validation AUC. The key habitat predictors identified and the fish-habitat relationships evaluated through partial dependence plots corroborated most previous studies.
The community-based SDM framework broadens our ability to model species distributions by removing the constraint imposed by a lack of species absence data, thus providing robust distribution predictions for stream fishes in other regions where historical data exist, and for other taxa (e.g., benthic macroinvertebrates, birds) usually observed through community-based sampling designs.
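The two data-handling steps described above, inferring absences from community-based samples and scoring models by AUC, can be sketched in plain Python. The absence rule assumes every survey recorded the whole fish community, so a species missing from a survey counts as absent; the site names, species codes and model scores are hypothetical. AUC is computed through its rank interpretation: the probability that a randomly chosen presence site scores higher than a randomly chosen absence site.

```python
def infer_absences(surveys, species):
    """1 if `species` was recorded in a survey, else 0 (an inferred absence)."""
    return {site: int(species in recorded) for site, recorded in surveys.items()}

def auc(labels, scores):
    """Probability that a random presence outscores a random absence (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical community surveys: site -> set of species recorded there.
surveys = {"s1": {"A", "B"}, "s2": {"B"}, "s3": {"A"}, "s4": {"C"}}
labels = infer_absences(surveys, "A")
scores = {"s1": 0.9, "s2": 0.4, "s3": 0.35, "s4": 0.1}  # hypothetical model output
sites = sorted(labels)
print(labels, auc([labels[s] for s in sites], [scores[s] for s in sites]))
```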

14.
15.
16.
This paper deals with the use of queuing network (QN) models to evaluate quantitatively the impact of the material handling system (MHS) on the steady-state performance of a flexible manufacturing system (FMS) at the strategic and tactical decision levels. Directly exploiting the workload data provided by industrial experts often results in QN models that cannot be analyzed efficiently because of the prohibitive number of customer classes. In this paper, we propose a systematic data aggregation approach for deriving the aggregated characteristics of the service offered by an MHS device at steady state. This generic aggregation scheme explicitly accounts for empty trips on the device and for MHS devices whose motion differs depending on whether they travel empty or loaded, and it enables further, consistent use of the various central server QN models in the literature. The quality of the estimates provided by this automated data aggregation approach is tested on several examples, and their integration into a QN model for performance evaluation is illustrated on an FMS from the literature.
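A simplified version of the two steps, aggregating the trips served by an MHS device into one mean service time and then solving a closed single-class queuing network, is sketched below. The aggregation here is a plain rate-weighted average of loaded plus empty travel times, not the paper's scheme, and the solver is standard exact mean value analysis (MVA) for single-server stations; all rates, times and station names are hypothetical.

```python
def aggregate_service_time(flows):
    """Rate-weighted mean service time of one MHS device.

    Each flow is (trips_per_hour, loaded_time, empty_time); the empty time is the
    repositioning trip assumed to precede each loaded move.
    """
    total_rate = sum(rate for rate, _, _ in flows)
    return sum(rate * (loaded + empty) for rate, loaded, empty in flows) / total_rate

def mva(demands, n_customers):
    """Exact MVA for a closed single-class network of single-server stations.

    demands[k] = visits to station k per cycle * mean service time at k.
    Returns system throughput and mean queue length per station.
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]  # residence times
        throughput = n / sum(resid)
        queue = [throughput * r for r in resid]                  # Little's law
    return throughput, queue

# Hypothetical central-server FMS: two machines plus one aggregated AGV, 3 pallets.
agv_time = aggregate_service_time([(2.0, 3.0, 1.0), (1.0, 4.0, 2.0)])  # minutes
x, q = mva([5.0, 6.0, agv_time], n_customers=3)
print(round(agv_time, 3), round(x, 4), [round(v, 3) for v in q])
```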

17.
Several natural language processing (NLP) tools, both commercial and freely available, are used to extract protein interactions from publications. The methods used by these tools range from pattern matching to dynamic programming, each with its own recall and precision rates. A methodical survey of these tools against manual analysis, keeping in mind the minimum interaction information a researcher would need, has not been carried out. We compared data generated by selected NLP tools with manually curated protein interaction data (PathArt and IMaps) to determine their recall and precision rates. The rates were lower than the published scores when a normalized definition of interaction was used. Each data point captured wrongly or missed by the tool was analyzed. Our evaluation brings forth critical failures of NLP tools and provides pointers for the development of an ideal NLP tool.
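Precision and recall against a curated set depend on how an "interaction" is normalized. The sketch below assumes the simplest normalization, an unordered pair of case-insensitive protein names, so that `A-B` and `b-a` count as the same interaction; the protein pairs are hypothetical examples, not PathArt or IMaps records.

```python
def normalize(pair):
    """An interaction as an unordered pair of case-insensitive protein names."""
    return frozenset(name.strip().upper() for name in pair)

def precision_recall(extracted, curated):
    ext = {normalize(p) for p in extracted}
    ref = {normalize(p) for p in curated}
    tp = len(ext & ref)  # interactions present in both sets
    precision = tp / len(ext) if ext else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall

# Hypothetical tool output and curated reference.
extracted = [("TP53", "MDM2"), ("mdm2", "tp53"), ("BRCA1", "RAD51"), ("EGFR", "ACTB")]
curated = [("TP53", "MDM2"), ("BRCA1", "RAD51"), ("BRCA1", "BARD1"), ("AKT1", "PDPK1")]
print(precision_recall(extracted, curated))
```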

18.
19.
Comparative metabolic modelling is emerging as a novel field, supported by the development of reliable and standardized approaches for constructing genome-scale metabolic models in high throughput. New software solutions are needed to allow efficient comparative analysis of multiple models in the context of multiple cellular objectives. Here, we present the user-friendly software framework Multi-Metabolic Evaluator (MultiMetEval), built upon SurreyFBA, which allows the user to compose collections of metabolic models that can be subjected together to flux balance analysis. Additionally, MultiMetEval implements multi-objective analysis by calculating the Pareto front between two cellular objectives. Using a previously generated dataset of 38 actinobacterial genome-scale metabolic models, we show how these approaches can lead to novel insights. First, after several pathways for the biosynthesis of natural products were incorporated into each of these models, comparative flux balance analysis predicted that species such as Streptomyces, which harbour the highest diversity of secondary metabolite biosynthetic gene clusters in their genomes, do not necessarily have the metabolic network topology most suitable for compound overproduction. Second, multi-objective analysis of biomass production and natural product biosynthesis in these actinobacteria shows that the well-studied occurrence of discrete metabolic switches during changes of cellular objectives is inherent to their metabolic network architecture. Comparative and multi-objective modelling can yield insights that could not be obtained by standard flux balance analysis. MultiMetEval provides a powerful platform that makes these analyses straightforward for biologists. Sources and binaries of MultiMetEval are freely available from https://github.com/PiotrZakrzewski/MetEval/downloads.
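Flux balance analysis is a linear program: maximize an objective flux subject to steady-state mass balance `S v = 0` and flux bounds. A Pareto front between two objectives can be traced with the epsilon-constraint method, maximizing biomass while forcing product flux to at least a series of levels. The sketch below does this for a hypothetical three-reaction toy network with SciPy's `linprog`; it does not use MultiMetEval or SurreyFBA, whose interfaces are not described here.

```python
from scipy.optimize import linprog

# Hypothetical toy network with one internal metabolite A and three fluxes:
#   v0: uptake -> A (capacity 10), v1: A -> biomass, v2: 2 A -> product.
# Steady-state mass balance for A (the single row of S v = 0): v0 - v1 - 2*v2 = 0.
S = [[1.0, -1.0, -2.0]]

def max_biomass(min_product):
    """FBA with an epsilon constraint: maximize v1 subject to v2 >= min_product."""
    bounds = [(0.0, 10.0), (0.0, None), (min_product, None)]
    res = linprog(c=[0.0, -1.0, 0.0], A_eq=S, b_eq=[0.0], bounds=bounds, method="highs")
    return -res.fun if res.status == 0 else None  # linprog minimizes, so negate

# Pareto front between biomass and product formation.
front = [(p, max_biomass(p)) for p in (0.0, 1.0, 2.5, 5.0)]
print(front)
```

In this toy network the front is the straight line `biomass = 10 - 2 * product`; in genome-scale models the same procedure can reveal kinks where the optimal flux distribution switches.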

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417