Similar literature
 Found 20 similar articles (search time: 32 ms)
1.
Summary: The aim of this article is to develop a spatial model for multi‐subject fMRI data. There has been extensive work on univariate modeling of each voxel for single and multi‐subject data, some work on spatial modeling of single‐subject data, and some recent work on spatial modeling of multi‐subject data. However, there has been no work on spatial models that explicitly account for inter‐subject variability in activation locations. In this article, we use the idea of activation centers and model the inter‐subject variability in activation locations directly. Our model is specified in a Bayesian hierarchical framework which allows us to draw inferences at all levels: the population level, the individual level, and the voxel level. We use Gaussian mixtures for the probability that an individual has a particular activation. This helps answer an important question that is not addressed by any of the previous methods: What proportion of subjects had significant activity in a given region? Our approach incorporates the unknown number of mixture components into the model as a parameter whose posterior distribution is estimated by reversible jump Markov chain Monte Carlo. We demonstrate our method with an fMRI study of resolving proactive interference and show dramatically better precision of localization with our method relative to the standard mass‐univariate method. Although we are motivated by fMRI data, this model could easily be modified to handle other types of imaging data.

2.
Multi‐component, multi‐scale Raman spectroscopy modeling results from a monoclonal antibody producing CHO cell culture process including data from two development scales (3 L, 200 L) and a clinical manufacturing scale environment (2,000 L) are presented. Multivariate analysis principles are a critical component to partial least squares (PLS) modeling but can quickly turn into an overly iterative process; thus, a simplified protocol is proposed for addressing necessary steps including spectral preprocessing, spectral region selection, and outlier removal to create models exclusively from cell culture process data without the inclusion of spectral data from chemically defined nutrient solutions or targeted component spiking studies. An array of single‐scale and combination‐scale modeling iterations were generated to evaluate technology capabilities and model scalability. Analysis of prediction errors across models suggests that glucose, lactate, and osmolality are well modeled. Model strength was confirmed via predictive validation and by examining performance similarity across single‐scale and combination‐scale models. Additionally, accurate predictive models were attained in most cases for viable cell density and total cell density; however, these components exhibited some scale‐dependencies that hindered model quality in cross‐scale predictions where only development data was used in calibration. Glutamate and ammonium models were also able to achieve accurate predictions in most cases. However, there are differences in the absolute concentration ranges of these components across the datasets of individual bioreactor scales. Thus, glutamate and ammonium PLS models were forced to extrapolate in cases where models were derived from small scale data only but used in cross‐scale applications predicting against manufacturing scale batches. © 2014 American Institute of Chemical Engineers Biotechnol. Prog., 31:566–577, 2015

3.
Aim Species distribution models are increasingly used to predict the impacts of global change on whole ecological communities by modelling the individualistic niche responses of large numbers of species. However, it is not clear whether this single‐species ensemble approach is preferable to community‐wide strategies that represent interspecific associations or shared responses to environmental gradients. Here, we test the performance of two multi‐species modelling approaches against equivalent single‐species models. Location Great Britain. Methods Single‐ and multi‐species distribution models were fitted for 701 native British plant species at a 10‐km grid scale. Two machine learning methods were used – classification and regression trees (CARTs) and artificial neural networks (ANNs). The single‐species versions are widely used in ecology but their multivariate extensions are less well known and have not previously been evaluated against one another. We compared their abilities to predict species distributions, community compositions and species richness in an independent geographical region reserved from model‐fitting. Results The single‐ and multi‐species models performed similarly, although the community models gave slightly poorer predictive accuracy by all measures. However, from the point of view of the whole community they were much simpler than the array of single‐species models, involving orders of magnitude fewer parameters. Multi‐species approaches also left greater residual spatial autocorrelation than the individualistic models and, contrary to expectation, were relatively less accurate for rarer species. However, the fitted multi‐species response curves had lower tendency for pronounced discontinuities that are unlikely to be a feature of realized niche responses. 
Main conclusions Although community distribution models were slightly less accurate than single‐species models, they offered a highly simplified way of modelling spatial patterns in British plant diversity. Moreover, an advantage of the multi‐species approach was that the modelling of shared environmental responses resolved more realistic response curves. However, there was a slight tendency for community models to predict rare species less accurately, which is potentially disadvantageous for conservation applications. We conclude that multi‐species distribution models may have potential for understanding and predicting the structure of ecological communities, but were slightly inferior to single‐species ensembles for our data.

4.
Many quantitative traits are composites of other traits that contribute differentially to genetic variation. Quantitative trait locus (QTL) mapping of these composite traits can benefit by incorporating the mechanistic process of how their formation is mediated by the underlying components. We propose a dissection model by which to map these interconnected component traits under a joint likelihood setting. The model can test how a composite trait is determined by pleiotropic QTLs for its component traits or jointly by different sets of QTLs each responsible for a different component. The model can visualize the pattern of time‐varying genetic effects for individual components and their impacts on composite traits. The dissection model was used to map two composite traits, stemwood volume growth decomposed into its stem height, stem diameter and stem form components for Euramerican poplar adult trees, and total lateral root length constituted by its average lateral root length and lateral root number components for Euphrates poplar seedlings. We found the pattern of how QTLs for different components contribute to phenotypic variation in composite traits. The detailed understanding of the genetic machineries of composite traits will not only help in the design of molecular breeding in plants and animals, but also shed light on the evolutionary processes of quantitative traits under natural selection.

5.
When analyzing the geographical variations of disease risk, one common problem is data sparseness. In such a setting, we investigate the possibility of using Bayesian shared spatial component models to strengthen inference and correct for any spatially structured sources of bias, when distinct data sources on one or more related diseases are available. Specifically, we apply our models to analyze the spatial variation of risk of two forms of scrapie infection affecting sheep in Wales (UK) using three surveillance sources on each disease. We first model each disease separately from the combined data sources and then extend our approach to jointly analyze diseases and data sources. We assess the predictive performances of several nested joint models through pseudo cross‐validatory predictive model checks.

6.
Short‐finned pilot whales (Globicephala macrorhynchus) have complex vocal repertoires that include calls with two time‐frequency contours known as two‐component calls. We attached digital acoustic recording tags (DTAGs) to 23 short‐finned pilot whales off Cape Hatteras, North Carolina, and assessed the similarity of two‐component calls within and among tags. Two‐component calls made up <3% of the total number of calls on 19 of the 23 tag records. For the remaining four tags, two‐component calls comprised 9%, 23%, 24%, and 57% of the total calls recorded. Measurements of six acoustic parameters for both the low and high frequency components of all two‐component calls from the five tags were compared using a generalized linear model. There were significant differences in the acoustic parameters of two‐component calls between tags, verifying that acoustic parameters were more similar for two‐component calls recorded on the same tag than for calls between tags. Spectrograms of all two‐component calls from the five tags were visually graded and independently categorized by five observers. A test of inter‐rater reliability showed substantial agreement, suggesting that each tag contained a predominant two‐component call type that was not shared across tags.

7.
1. We discuss aspects of resource selection based on observing a given vector of resource variables for different individuals at discrete time steps. A new technique for estimating preference of habitat characteristics, applicable when there are multiple individual observations, is proposed. 2. We first show how to estimate preference on the population and individual level when only a single site or resource component is observed. A variance component model based on normal scores is used to estimate mean preference for the population as well as the heterogeneity among individuals defined by the intra-class correlation. 3. Next, a general technique is proposed for time series of observations of a vector with several components, correcting for the effect of correlations between these. The preference of each single component is analyzed under the assumption of arbitrarily complex selection of the other components. This approach is based on the theory for conditional distributions in the multi-normal model. 4. The method is demonstrated using a data set of radio-tagged dispersing juvenile goshawks and their site characteristics, and can be used as a general tool in resource or habitat selection analysis.

8.
Antagonistic pleiotropy (AP)—where alleles of a gene increase some components of fitness at a cost to others—can generate balancing selection, and contribute to the maintenance of genetic variation in fitness traits, such as survival, fecundity, fertility, and mate competition. Previous theory suggests that AP is unlikely to maintain variation unless antagonistic selection is strong, or AP alleles exhibit pronounced differences in genetic dominance between the affected traits. We show that conditions for balancing selection under AP expand under the likely scenario that the strength of selection on each fitness component differs between the sexes. Our model also predicts that the vast majority of balanced polymorphisms have sexually antagonistic effects on total fitness, despite the absence of sexual antagonism for individual fitness components. We conclude that AP polymorphisms are less difficult to maintain than predicted by prior theory, even under our conservative assumption that selection on components of fitness is universally sexually concordant. We discuss implications for the maintenance of genetic variation, and for inferences of sexual antagonism that are based on sex‐specific phenotypic selection estimates—many of which are based on single fitness components.

9.
Large‐scale proteomic approaches have been used to study signaling pathways. However, identification of biologically relevant hits from a single screen remains challenging due to limitations inherent in each individual approach. To overcome these limitations, we implemented an integrated, multi‐dimensional approach and used it to identify Wnt pathway modulators. The LUMIER protein–protein interaction mapping method was used in conjunction with two functional screens that examined the effect of overexpression and siRNA‐mediated gene knockdown on Wnt signaling. Meta‐analysis of the three data sets yielded a combined pathway score (CPS) for each tested component, a value reflecting the likelihood that an individual protein is a Wnt pathway regulator. We characterized the role of two proteins with high CPSs, Ube2m and Nkd1. We show that Ube2m interacts with and modulates β‐catenin stability, and that the antagonistic effect of Nkd1 on Wnt signaling requires interaction with Axin, itself a negative pathway regulator. Thus, integrated physical and functional mapping in mammalian cells can identify signaling components with high confidence and provides unanticipated insights into pathway regulators.

10.
One of the main goals in spatial epidemiology is to study the geographical pattern of disease risks. For this purpose, the convolution model composed of correlated and uncorrelated components is often used. However, one of the two components could be predominant in some regions. To investigate the predominance of the correlated or uncorrelated component for multiple scale data, we propose four different spatial mixture multiscale models by mixing spatially varying probability weights of correlated (CH) and uncorrelated heterogeneities (UH). The first model assumes that there is no linkage between the different scales and, hence, we consider independent mixture convolution models at each scale. The second model introduces linkage between finer and coarser scales via a shared uncorrelated component of the mixture convolution model. The third model is similar to the second model but the linkage between the scales is introduced through the correlated component. Finally, the fourth model accommodates a scale effect by sharing both CH and UH simultaneously. We applied these models to real and simulated data, and found that the fourth model is the best model followed by the second model.

11.
We explore the problem of variable selection in a case‐control setting with mass spectrometry proteomic data consisting of paired measurements. Each pair corresponds to a distinct isotope cluster and each component within a pair represents a summary of isotopic expression based on either the intensity or the shape of the cluster. Our objective is to identify a collection of isotope clusters associated with the disease outcome and at the same time assess the predictive added‐value of shape beyond intensity while maintaining predictive performance. We propose a Bayesian model that exploits the paired structure of our data and utilizes prior information on the relative predictive power of each source by introducing multiple layers of selection. This allows us to make simultaneous inference on which are the most informative pairs and for which—and to what extent—shape has a complementary value in separating the two groups. We evaluate the Bayesian model on pancreatic cancer data. Results from the fitted model show that most predictive potential is achieved with a subset of just six (out of 1289) pairs while the contribution of the intensity components is much higher than the shape components. To demonstrate how the method behaves under a controlled setting we consider a simulation study. Results from this study indicate that the proposed approach can successfully select the truly predictive pairs and accurately estimate the effects of both components although, in some cases, the model tends to overestimate the inclusion probability of the second component.

12.
Perhaps the most important recent advance in species delimitation has been the development of model‐based approaches to objectively diagnose species diversity from genetic data. Additionally, the growing accessibility of next‐generation sequence data sets provides powerful insights into genome‐wide patterns of divergence during speciation. However, applying complex models to large data sets is time‐consuming and computationally costly, requiring careful consideration of the influence of both individual and population sampling, as well as the number and informativeness of loci on species delimitation conclusions. Here, we investigated how locus number and information content affect species delimitation results for an endangered Mexican salamander species, Ambystoma ordinarium. We compared results for an eight‐locus, 137‐individual data set and an 89‐locus, seven‐individual data set. For both data sets, we used species discovery methods to define delimitation models and species validation methods to rigorously test these hypotheses. We also used integrated demographic model selection tools to choose among delimitation models, while accounting for gene flow. Our results indicate that while cryptic lineages may be delimited with relatively few loci, sampling larger numbers of loci may be required to ensure that enough informative loci are available to accurately identify and validate shallow‐scale divergences. These analyses highlight the importance of striking a balance between dense sampling of loci and individuals, particularly in shallowly diverged lineages. They also suggest the presence of a currently unrecognized, endangered species in the western part of A. ordinarium's range.

13.
The three‐state progressive model is a special multi‐state model with important applications in Survival Analysis. It provides a suitable representation of the individual’s history when an intermediate event (with a possible influence on the survival prognosis) is experienced before the main event of interest. Estimation of transition probabilities in this and other multi‐state models is usually performed through the Aalen–Johansen estimator. However, Aalen–Johansen may be biased when the underlying process is not Markov. In this paper, we provide a new approach for testing Markovianity in the three‐state progressive model. The new method is based on measuring the future‐past association along time. This results in a deep inspection of the process that often reveals a non‐Markovian behaviour with different trends in the association measure. A test of significance for zero future‐past association at each time point is introduced, and a significance trace is proposed accordingly. The finite sample performance of the test is investigated through simulations. We illustrate the new method through real data analysis.

14.
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. 
This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

15.
Habitat modelling is increasingly relevant in biodiversity and conservation studies. A typical application is to predict potential zones of specific conservation interest. With many environmental covariates, a large number of models can be investigated but multi‐model inference may become impractical. Shrinkage regression overcomes this issue by dealing with the identification and accurate estimation of effect size for prediction. In a Bayesian framework we investigated the use of a shrinkage prior, the Horseshoe, for variable selection in spatial generalized linear models (GLM). As study cases, we considered 5 datasets on small pelagic fish abundance in the Gulf of Lion (Mediterranean Sea, France) and 9 environmental inputs. We compared the predictive performances of a simple kriging model, a full spatial GLM model with independent normal priors for regression coefficients, a full spatial GLM model with a Horseshoe prior for regression coefficients and 2 zero‐inflated models (spatial and non‐spatial) with a Horseshoe prior. Predictive performances were evaluated by cross‐validation on a hold‐out subset of the data: models with a Horseshoe prior performed best, and the full model with independent normal priors worst. With an increasing number of inputs, extrapolation quickly became pervasive as we tried to predict from novel combinations of covariate values. By shrinking regression coefficients with a Horseshoe prior, only one model needed to be fitted to the data in order to obtain reasonable and accurate predictions, including extrapolations.
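
As a rough illustration of the shrinkage prior named in this abstract, the sketch below draws regression coefficients from a standard Horseshoe prior, in which each coefficient is normal with standard deviation equal to a local half-Cauchy(0, 1) scale times a global scale. The function name, the fixed global scale `tau`, and the seed are assumptions made for illustration; the abstract does not describe the authors' implementation, which sits inside a spatial GLM.

```python
import math
import random


def sample_horseshoe_coefficient(tau: float, rng: random.Random) -> float:
    """Draw one coefficient beta ~ N(0, (lam * tau)^2), lam ~ half-Cauchy(0, 1).

    `tau` is the global shrinkage scale (treated as fixed here, an
    assumption); `lam` is the coefficient-specific local scale.
    """
    # Inverse-CDF draw from a standard Cauchy, folded to the positive half-line.
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
    return rng.gauss(0.0, lam * tau)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
draws = [sample_horseshoe_coefficient(tau=0.1, rng=rng) for _ in range(1000)]
```

Most draws fall close to zero while the heavy Cauchy tail still allows occasional large values, which is the behaviour that lets a single fitted model shrink irrelevant covariates without flattening the relevant ones.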

16.
Summary: Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time‐varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi‐Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi‐Markovian manner. The underlying semi‐Markov chain represents the succession of growth phases and their lengths (endogenous component) whereas the linear mixed models attached to each state of the underlying semi‐Markov chain represent—in the corresponding growth phase—both the influence of time‐varying climatic covariates (environmental component) as fixed effects, and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi‐Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation–maximization‐like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates.

17.
Unhealthy alcohol use is one of the leading causes of morbidity and mortality in the United States. Brief interventions with high‐risk drinkers during an emergency department (ED) visit are of great interest due to their possible efficacy and low cost. In a collaborative study with patients recruited at 14 academic EDs across the United States, we examined the self‐reported number of drinks per week by each patient following the exposure to a brief intervention. Count data with overdispersion have been mostly analyzed with generalized linear mixed models (GLMMs), for which only a limited number of link functions are available. Different choices of link function provide different fit and predictive power for a particular dataset. We propose a class of link functions from an alternative way to incorporate random effects in a GLMM, which encompasses many existing link functions as special cases. The methodology is naturally implemented in a Bayesian framework, with competing links selected with Bayesian model selection criteria such as the conditional predictive ordinate (CPO). In application to the ED intervention study, all models suggest that the intervention was effective in reducing the number of drinks, but some new models are found to significantly outperform the traditional model as measured by CPO. The validity of CPO in link selection is confirmed in a simulation study that shared the same characteristics as the count data from high‐risk drinkers. The dataset and the source code for the best fitting model are available in Supporting Information.

18.
Ensemble habitat selection modeling is becoming a popular approach among ecologists to answer different questions. Since we are still in the early stages of development and application of ensemble modeling, there remain many questions regarding performance and parameterization. One important gap, which this paper addresses, is how the number of background points used to train models influences the performance of the ensemble model. We used an empirical presence-only dataset and three different selections of background points to train scale-optimized habitat selection models using six modeling algorithms (GLM, GAM, MARS, ANN, Random Forest, and MaxEnt). We tested four ensemble models using different combinations of the component models: (a) equal numbers of background points and presences, (b) background points equaled ten times the number of presences, (c) 10,000 background points, and (d) optimized background points for each component model. Among regression-based approaches, MARS performed best when built with 10,000 background points. Among machine learning models, RF performed the best when built with equal presences and background points. Among the four ensemble models, AUC indicated that the best performing model was the ensemble with each component model including the optimized number of background points, while TSS increased as the number of background points increased. We found that an ensemble of models, each trained with an optimal number of background points, outperformed ensembles of models trained with the same number of background points, although differences in performance were slight. When using a single modeling method, RF with equal number of presences and background points can perform better than an ensemble model, but the performance fluctuates when the number of background points is not properly selected. On the other hand, ensemble modeling provides consistently high accuracy regardless of background point sampling approach. 
Further, optimizing the number of background points for each component model within an ensemble model can provide the best model improvement. We suggest evaluating more models across multiple species to investigate how background point selection might affect ensemble models in different scenarios.
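
The two evaluation measures named in this abstract can be sketched as follows. TSS (true skill statistic) is sensitivity plus specificity minus one, and AUC is the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen background point, with ties counted as one half. The function names and the example numbers are illustrative assumptions and are not taken from the study.

```python
def true_skill_statistic(tp: int, fn: int, tn: int, fp: int) -> float:
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one background point")
    sensitivity = tp / (tp + fn)  # share of presences predicted as presences
    specificity = tn / (tn + fp)  # share of background points predicted as absences
    return sensitivity + specificity - 1.0


def auc(presence_scores, background_scores) -> float:
    """Rank-based AUC: fraction of (presence, background) pairs ranked correctly."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(background_scores))


# Made-up counts and scores, for illustration only.
example_tss = true_skill_statistic(tp=80, fn=20, tn=90, fp=10)
example_auc = auc([0.9, 0.8], [0.1, 0.85])
```

Because TSS depends on a threshold and on the confusion counts while AUC depends only on ranking, the two can respond differently when the number of background points changes, which is consistent with the abstract reporting them separately.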

19.
Mixed models are now well‐established methods in ecology and evolution because they allow accounting for and quantifying within‐ and between‐individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi‐modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life‐history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life‐history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long‐term studies of large mammals to illustrate the potential of using mixture models for assessing within‐population variation in life‐history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. 
We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life‐history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users.
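
The abstract mentions AIC and BIC as criteria for choosing the number of clusters in a mixture model. A minimal sketch of that selection step is given below, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The helper names, the log-likelihoods, and the parameter counts are hypothetical values chosen for illustration; they do not come from the study, and the bootstrap criteria the authors also evaluated are not shown.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def pick_n_clusters(fits: dict, n_obs: int, criterion: str = "bic") -> int:
    """Return the cluster count with the lowest criterion value.

    `fits` maps a candidate number of clusters to a tuple
    (maximized log-likelihood, number of free parameters).
    """
    def score(item):
        _, (loglik, k) = item
        return bic(loglik, k, n_obs) if criterion == "bic" else aic(loglik, k)

    return min(fits.items(), key=score)[0]


# Hypothetical fits of 1-, 2- and 3-cluster mixture models to 200 individuals.
fits = {1: (-520.0, 3), 2: (-480.0, 7), 3: (-478.0, 11)}
best_by_bic = pick_n_clusters(fits, n_obs=200, criterion="bic")
best_by_aic = pick_n_clusters(fits, n_obs=200, criterion="aic")
```

In this made-up example the third cluster improves the log-likelihood only slightly, so both criteria prefer the two-cluster model once the extra parameters are penalized.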

20.
The trade‐off between offspring size and number is a central component of life‐history theory, postulating that larger investment into offspring size inevitably decreases offspring number. This trade‐off is generally discussed in terms of genetic, physiological or morphological constraints; however, as among‐individual differences can mask individual trade‐offs, the underlying mechanisms may be difficult to reveal. In this study, we use multivariate analyses to investigate whether there is a trade‐off between offspring size and number in a population of sand lizards by separating among‐ and within‐individual patterns using a 15‐year data set collected in the wild. We also explore the ecological and evolutionary causes and consequences of this trade‐off by investigating how a female's resource (condition)‐ vs. age‐related size (snout‐vent length) influences her investment into offspring size vs. number (OSN), whether these traits are heritable and under selection and whether the OSN trade‐off has a genetic component. We found a negative correlation between offspring size and number within individual females and physical constraints (size of body cavity) appear to limit the number of eggs that a female can produce. This suggests that the OSN trade‐off occurs due to resource constraints as a female continues to grow throughout life and, thus, produces larger clutches. In contrast to the assumptions of classic OSN theory, we did not detect selection on offspring size; however, there was directional selection for larger clutch sizes. The repeatabilities of both offspring size and number were low and we did not detect any additive genetic variance in either trait. This could be due to strong selection (past or current) on these life‐history traits, or to insufficient statistical power to detect significant additive genetic effects. 
Overall, the findings of this study are an important illustration of how analyses of within‐individual patterns can reveal trade‐offs and their underlying causes, with potential evolutionary and ecological consequences that are otherwise hidden by among‐individual variation.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号