1.
Mann RP 《PloS one》2011,6(8):e22827
The emergence of similar collective patterns from different self-propelled particle models of animal groups points to a restricted set of "universal" classes for these patterns. While universality is interesting, it is often the fine details of animal interactions that are of biological importance. Universality thus presents a challenge to inferring such interactions from macroscopic group dynamics since these can be consistent with many underlying interaction models. We present a Bayesian framework for learning animal interaction rules from fine scale recordings of animal movements in swarms. We apply these techniques to the inverse problem of inferring interaction rules from simulation models, showing that parameters can often be inferred from a small number of observations. Our methodology allows us to quantify our confidence in parameter fitting. For example, we show that attraction and alignment terms can be reliably estimated when animals are milling in a torus shape, while interaction radius cannot be reliably measured in such a situation. We assess the importance of rate of data collection and show how to test different models, such as topological and metric neighbourhood models. Taken together our results both inform the design of experiments on animal interactions and suggest how these data should be best analysed.
2.
Multimodality or multiconstruct data arise increasingly in functional neuroimaging studies to characterize brain activity under different cognitive states. Relying on those high-resolution imaging collections, it is of great interest to identify predictive imaging markers and intermodality interactions with respect to behavior outcomes. Currently, most of the existing variable selection models do not consider predictive effects from interactions, and the desired higher-order terms can only be included in the predictive mechanism following a two-step procedure, suffering from potential misspecification. In this paper, we propose a unified Bayesian prior model to simultaneously identify main effect features and intermodality interactions within the same inference platform in the presence of high-dimensional data. To accommodate the brain topological information and correlation between modalities, our prior is designed by compiling the intermediate selection status of sequential partitions in light of the data structure and brain anatomical architecture, so that we can improve posterior inference and enhance biological plausibility. Through extensive simulations, we show the superiority of our approach in main and interaction effects selection, and prediction under multimodality data. Applying the method to the Adolescent Brain Cognitive Development (ABCD) study, we characterize the brain functional underpinnings with respect to general cognitive ability under different memory load conditions.
3.
Background
Information for mapping of quantitative trait loci (QTL) comes from two sources: linkage disequilibrium (non-random association of allele states) and cosegregation (non-random association of allele origin). Information from LD can be captured by modeling conditional means and variances at the QTL given marker information. Similarly, information from cosegregation can be captured by modeling conditional covariances. Here, we consider a Bayesian model based on gene frequency (BGF) where both conditional means and variances are modeled as a function of the conditional gene frequencies at the QTL. The parameters in this model include these gene frequencies, additive effect of the QTL, its location, and the residual variance. Bayesian methodology was used to estimate these parameters. The priors used were: logit-normal for gene frequencies, normal for the additive effect, uniform for location, and inverse chi-square for the residual variance. Computer simulation was used to compare the power to detect and accuracy to map QTL by this method with those from least squares analysis using a regression model (LSR).
Results
To simplify the analysis, data from unrelated individuals in a purebred population were simulated, where only LD information contributes to map the QTL. LD was simulated in a chromosomal segment of 1 cM with one QTL by random mating in a population of size 500 for 1000 generations and in a population of size 100 for 50 generations. The comparison was studied under a range of conditions, which included SNP density of 0.1, 0.05 or 0.02 cM, sample size of 500 or 1000, and phenotypic variance explained by QTL of 2 or 5%. Both 1 and 2-SNP models were considered. Power to detect the QTL for BGF ranged from 0.4 to 0.99 and was close or equal to the power of the regression using least squares (LSR). Precision in mapping QTL position, quantified by the mean absolute error, ranged from 0.11 to 0.21 cM for BGF, and was better than the precision of LSR, which ranged from 0.12 to 0.25 cM.
Conclusions
In conclusion, given a high SNP density, the gene frequency model can be used to map QTL with considerable accuracy even within a 1 cM region.
4.
Blangero J Goring HH Kent JW Williams JT Peterson CP Almasy L Dyer TD 《Human biology; an international record of research》2005,77(5):541-559
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
5.
SUMMARY: The fundamental problem of gene selection via cDNA data is to identify which genes are differentially expressed across different kinds of tissue samples (e.g. normal and cancer). cDNA data contain a large number of variables (genes), and the sample size is usually relatively small, so the selection process can be unstable. Therefore, models which incorporate sparsity in terms of variables (genes) are desirable for this kind of problem. This paper proposes a two-level hierarchical Bayesian model for variable selection which assumes a prior that favors sparseness. We adopt a Markov chain Monte Carlo (MCMC) based computation technique to simulate the parameters from the posteriors. The method is applied to leukemia data from a previous study and a published dataset on breast cancer. SUPPLEMENTARY INFORMATION: http://stat.tamu.edu/people/faculty/bmallick.html.
6.
7.
In this article, we propose a model selection method, the Bayesian composite model space approach, to map quantitative trait loci (QTL) in a half-sib population for continuous and binary traits. In our method, the identity-by-descent-based variance component model is used. To demonstrate the performance of this model, the method was applied to map QTL underlying production traits on BTA6 in a Chinese half-sib dairy cattle population. A total of four QTLs were detected, whereas only one QTL was identified using the traditional least squares (LS) method. We also conducted two simulation experiments to validate the efficiency of our method. The results suggest that the proposed method based on a multiple-QTL model is efficient in mapping multiple QTL for an outbred half-sib population and is more powerful than the LS method based on a single-QTL model.
8.
9.
CLINT D. KELLY 《Mammal Review》2005,35(2):188-198
1. Phylogenetic trees are critical in addressing evolutionary hypotheses; however, the reconstruction of a phylogeny is no easy task. This process has recently been made less arduous by using a Bayesian statistical approach. This method offers the advantage that one can determine the probability of some hypothesis (i.e. a phylogeny), conditional on the observed data (i.e. nucleotide sequences). 2. By reconstructing phylogenies using Bayes’ theorem in combination with Markov chain Monte Carlo, phylogeneticists are able to test hypotheses more quickly compared with using standard methods such as neighbour-joining, maximum likelihood or parsimony. Critics of the Bayesian approach suggest that it is not a panacea, and argue that the prior probability is too subjective and the resulting posterior probability is too liberal compared with maximum likelihood. 3. These issues are currently debated in the arena of mammalian evolution. Recently, proponents and opponents of the Bayesian approach have constructed the mammalian phylogeny using different methods under different conditions and with a variety of parameters. These analyses showed the robustness (or lack thereof) of the Bayesian approach. In the end, consensus suggests that Bayesian methods are robust and give essentially the same answer as maximum likelihood methods but in less time. 4. Approaches based on fossils and molecules typically agree on ordinal-level relationships among mammals but not on higher-level relationships, as Bayesian analyses recognize the African radiation, Afrotheria, and the two Laurasian radiations, Laurasiatheria and Euarchontoglires, whereas fossils did not predict Afrotheria.
10.
Perrin BE Ralaivola L Mazurie A Bottani S Mallet J d'Alché-Buc F 《Bioinformatics (Oxford, England)》2003,19(Z2):ii138-ii148
This article deals with the identification of gene regulatory networks from experimental data using a statistical machine learning approach. A stochastic model of gene interactions capable of handling missing variables is proposed. It can be described as a dynamic Bayesian network particularly well suited to tackle the stochastic nature of gene regulation and gene expression measurement. Parameters of the model are learned through a penalized likelihood maximization implemented through an extended version of the EM algorithm. Our approach is tested against experimental data relative to the S.O.S. DNA Repair network of the Escherichia coli bacterium. It appears to be able to extract the main regulations between the genes involved in this network. An added missing variable is found to model the main protein of the network. Good prediction abilities on unlearned data are observed. These first results are very promising: they show the power of the learning algorithm and the ability of the model to capture gene interactions.
11.
Ralph Mac Nally 《Diversity & distributions》2005,11(6):499-508
Sharp ecological transitions in space (ecotones, edges, boundaries) often are where ecologically important events occur, such as elevated or reduced biodiversity or altered ecological functions (e.g. changes in productivity, pollination rates or parasitism loads, nesting success). While human observers often identify these transitions by using intuitive or gestalt assignments (e.g. the boundary between a remnant woodland patch and the surrounding farm paddock seems obvious), it is clearly desirable to make statistical assessments based on measurements. These assessments often are straightforward to make if the data are univariate, but identifying boundaries or transitions using compositional or multivariate data sets is more difficult. There is a need for an intermediate step in which pairwise similarities between points or temporal samples are computed. Here, I describe an approach that treats points along a transect as alternative hypotheses (models) about the location of the boundary. Carlin and Chib (1995) introduced a Bayesian technique for comparing non‐hierarchical models, which I adapted to compute the probabilities of each boundary location (i.e. a model) relative to the ensemble of models constituting the set of possible points of the boundary along the transect. Several artificial data sets and two field data sets (on vegetation and soils and on cave‐dwelling invertebrates and microclimates) are used to illustrate the approach. The method can be extended to cases with several boundaries along a gradient, such as where there is an ecotone of non‐zero thickness.
12.
Phylogeographic inference using Bayesian model comparison across a fragmented chorus frog species complex
Lisa N. Barrow Alyssa T. Bigelow Christopher A. Phillips Emily Moriarty Lemmon 《Molecular ecology》2015,24(18):4739-4758
Fragmented species complexes provide an interesting system for investigating biogeographic history and the present distribution of genetic variation. Recent advances in sequencing technology and statistical phylogeography enable the collection and rigorous analysis of large multilocus data sets, but designing studies that produce meaningful phylogeographic inferences remains challenging. We implemented a Bayesian model comparison approach to investigate previous biogeographic hypotheses while simultaneously inferring the presence of genetic structure in a chorus frog species complex. The Illinois chorus frog (Pseudacris illinoensis), originally described as a subspecies of the broadly distributed Strecker's chorus frog (Pseudacris streckeri), occurs in small, disjunct regions associated with scarce sand prairie habitats that have been impacted by human development. We used high‐throughput sequencing to develop and collect a multitiered genetic data set comprised of three different marker types (23 anonymous nuclear sequence loci, four mitochondrial genes and 14 microsatellite loci) designed to address questions across different evolutionary timescales. Phylogenetic analyses uncovered a deep divergence between populations in the Edwards Plateau of central Texas and all other P. streckeri/P. illinoensis populations, but suggest the disjunct distribution of P. illinoensis occurred more recently. Our best‐supported migration model is consistent with the hypothesis that central Texas represented a refugium from which populations expanded via multiple routes. This model also indicates that disjunct northern and southern regions of P. illinoensis should be considered genetically distinct management units. Our study provides an evolutionary context for future studies and conservation efforts in P. illinoensis and demonstrates the utility of model‐based approaches for phylogeographic inference.
13.
Pinar Demetci Wei Cheng Gregory Darnell Xiang Zhou Sohini Ramachandran Lorin Crawford 《PLoS genetics》2021,17(8)
In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogenous stock of mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high and low-density lipoprotein cholesterol content.
14.
A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of any of these criteria suffer from the limitation that only a small set of models are examined, or that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that it is the transition/transversion rate bias that plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.
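For readers unfamiliar with the two information criteria named in this abstract, the sketch below shows how each trades goodness of fit against parameter count, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The model names, log-likelihoods, parameter counts and alignment length are invented for illustration only and are not taken from the article; the article's own reversible jump MCMC procedure is not reproduced here.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical candidates: name -> (maximized log-likelihood, free parameters).
# These numbers are illustrative assumptions, not results from any study.
candidates = {
    "simple": (-1050.0, 1),
    "intermediate": (-1020.0, 5),
    "complex": (-1018.5, 9),
}
n_sites = 1000  # assumed alignment length, used as the sample size for BIC

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print(best_aic, best_bic)
```

With these made-up numbers, the small gain in likelihood from the most parameter-rich model does not offset its penalty under either criterion, which mirrors the abstract's remark that the best model is not necessarily the most complicated one.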
15.
Bayesian inference in ecology (cited by 14: 1 self-citation, 13 citations by others)
Aaron M. Ellison 《Ecology letters》2004,7(6):509-520
Bayesian inference is an important statistical tool that is increasingly being used by ecologists. In a Bayesian analysis, information available before a study is conducted is summarized in a quantitative model or hypothesis: the prior probability distribution. Bayes’ Theorem uses the prior probability distribution and the likelihood of the data to generate a posterior probability distribution. Posterior probability distributions are an epistemological alternative to P‐values and provide a direct measure of the degree of belief that can be placed on models, hypotheses, or parameter estimates. Moreover, Bayesian information‐theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection. These methods are demonstrated through a simple worked example. Ecologists are using Bayesian inference in studies that range from predicting single‐species population dynamics to understanding ecosystem processes. Not all ecologists, however, appreciate the philosophical underpinnings of Bayesian inference. In particular, Bayesians and frequentists differ in their definition of probability and in their treatment of model parameters as random variables or estimates of true values. These assumptions must be addressed explicitly before deciding whether or not to use Bayesian methods to analyse ecological data.
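The prior-to-posterior step described in this abstract can be illustrated with a minimal conjugate example. The sketch below assumes a Beta prior on a proportion and binomial data, for which Bayes' theorem gives a Beta posterior in closed form. It is not the worked example from the article; the prior parameters and counts are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Return posterior Beta parameters for a Beta(prior_a, prior_b) prior
    and `successes` out of `trials` binomial observations."""
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a, post_b

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical data: a uniform Beta(1, 1) prior and 7 occupied sites out of 10.
post_a, post_b = beta_binomial_update(1.0, 1.0, successes=7, trials=10)
posterior_mean = beta_mean(post_a, post_b)
print(post_a, post_b, round(posterior_mean, 3))
```

The posterior here is Beta(8, 4), so the posterior mean (about 0.667) sits between the prior mean of 0.5 and the observed proportion of 0.7, showing how the prior and the likelihood are combined.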
16.
Bayesian inference of recent migration rates using multilocus genotypes (cited by 25: 0 self-citations, 25 citations by others)
A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants, using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.
17.
Ferrè F Via A Ausiello G Brannetti B Zanzoni A Helmer-Citterich M 《Comparative and Functional Genomics》2003,4(4):416-419
Relatively few protein structures are known, compared to the enormous amount of sequence data produced in the sequencing of different genomes, and relatively few protein complexes are deposited in the PDB with respect to the great amount of interaction data coming from high-throughput experiments (two-hybrid or affinity purification of protein complexes and mass spectrometry). Nevertheless, we can rely on computational techniques for the extraction of high-quality and information-rich data from the known structures and for their spreading in the protein sequence space. We describe here the ongoing research projects in our group: we analyse the protein complexes stored in the PDB and, for each complex involving one domain belonging to a family of interaction domains for which some interaction data are available, we can calculate its probability of interaction with any protein sequence. We analyse the structures of proteins encoding a function specified in a PROSITE pattern, which exhibits relatively low selectivity and specificity, and build extended patterns. To this aim, we consider residues that are well-conserved in the structure, even if their conservation cannot easily be recognized in the sequence alignment of the proteins holding the function. We also analyse protein surface regions and, through the annotation of the solvent-exposed residues, we annotate protein surface patches via a structural comparison performed with stringent parameters and independently of the residue order in the sequence. Local surface comparison may also help in identifying new sequence patterns, which could not be highlighted with other sequence-based methods.
18.
Karla Moreno‐Torres Barbara Wolfe William Saville Rebecca Garabed 《Ecology and evolution》2016,6(7):2216-2225
Prevalence of disease in wildlife populations, which is necessary for developing disease models and conducting epidemiologic analyses, is often understudied. Laboratory tests used to screen for diseases in wildlife populations often are validated only for domestic animals. Consequently, the use of these tests for wildlife populations may lead to inaccurate estimates of disease prevalence. We demonstrate the use of Bayesian latent class analysis (LCA) in determining the specificity and sensitivity of a competitive enzyme‐linked immunosorbent assay (cELISA; VMRD®, Inc.) serologic test used to identify exposure to Neospora caninum (hereafter N. caninum) in three wildlife populations in southeastern Ohio, USA. True prevalence of N. caninum exposure in these populations was estimated to range from 0.1% to 3.1% in American bison (Bison bison), 51.0% to 53.8% in Père David's deer (Elaphurus davidianus), and 40.0% to 45.9% in white‐tailed deer (Odocoileus virginianus). The accuracy of the cELISA in American bison and Père David's deer was estimated to be close to the 96% sensitivity and 99% specificity reported by the manufacturer. Sensitivity in white‐tailed deer, however, ranged from 78.9% to 99.9%. Apparent prevalence of N. caninum from the test results is not equal to the true prevalence in white‐tailed deer and Père David's deer populations. Even when these species inhabit the same community, the true prevalence in the two deer populations differed from the true prevalence in the American bison population. Variances in prevalence for some species suggest differences in the epidemiology of N. caninum for these colocated populations. Bayesian LCA methods could be used as in this example to overcome some of the constraints on validating tests in wildlife species. The ability to accurately evaluate disease status and prevalence in a population improves our understanding of the epidemiology of multihost pathogen systems at the community level.
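To make the distinction between apparent and true prevalence concrete, the sketch below applies the Rogan-Gladen correction, a simple frequentist adjustment that takes sensitivity and specificity as known. It is a swapped-in illustration and not the Bayesian latent class analysis used in the article, which estimates test accuracy and prevalence jointly. The sensitivity (0.96) and specificity (0.99) are the manufacturer values quoted in the abstract; the apparent prevalence of 0.50 is a hypothetical input.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Estimate true prevalence from apparent (test-positive) prevalence.

    true = (apparent + Sp - 1) / (Se + Sp - 1), clipped to [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance (Se + Sp > 1)")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))

# Hypothetical apparent prevalence of 50% with the manufacturer-reported accuracy.
true_prev = rogan_gladen(0.50, sensitivity=0.96, specificity=0.99)
print(round(true_prev, 3))
```

The correction is only as good as the assumed sensitivity and specificity, which is the limitation the abstract raises for tests validated in domestic animals and then applied to wildlife.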
19.
In the study of immune responses to infectious pathogens, the minimum protective antibody concentration (MPAC) is a quantity of great interest. We use case-control data to estimate the posterior distribution of the conditional risk of disease given a lower bound on antibody concentration in an at-risk subject. The concentration bound beyond which there is high credibility that infection risk is zero or nearly so is a candidate for the MPAC. A very simple Gibbs sampling procedure that permits inference on the risk of disease given antibody level is presented. In problems involving small numbers of patients, the procedure is shown to have favorable accuracy and robustness to choice/misspecification of priors. Frequentist evaluation indicates good coverage probabilities of credibility intervals for antibody-dependent risk, and rules for estimation of the MPAC are illustrated with epidemiological data.
20.