Similar Literature
20 similar references found (search time: 15 ms).
1.
MOTIVATION: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data. RESULTS: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data. AVAILABILITY: miniTUBA is available at http://www.minituba.org.
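
The core computation behind a dynamic Bayesian network analysis of this kind is scoring candidate parent sets for each variable at time t given the values at time t-1 (a Markov lag of 1). The sketch below is a minimal, hypothetical illustration of that idea using a BDeu-style marginal likelihood on binary data; the variable layout, scoring function, and data are invented for illustration and are not miniTUBA's implementation.

```python
import numpy as np
from scipy.special import gammaln

def bdeu_score(child, parents, data, ess=1.0):
    """BDeu log marginal likelihood of `child` at time t given `parents` at t-1.

    data: (T, n) array of binary observations over T time points.
    Counts transitions from slice t-1 to slice t (Markov lag 1).
    """
    child_t = data[1:, child]                       # child values at time t
    parent_tm1 = data[:-1, parents]                 # parent values at time t-1
    if parent_tm1.shape[1] == 0:
        configs, n_configs = np.zeros(len(child_t), dtype=int), 1
    else:
        configs = parent_tm1 @ (2 ** np.arange(parent_tm1.shape[1]))
        n_configs = 2 ** parent_tm1.shape[1]
    a_j = ess / n_configs                           # Dirichlet pseudo-count per parent config
    a_jk = a_j / 2                                  # per (config, child state), binary child
    score = 0.0
    for j in range(n_configs):
        counts = np.bincount(child_t[configs == j], minlength=2)
        score += gammaln(a_j) - gammaln(a_j + counts.sum())
        score += np.sum(gammaln(a_jk + counts) - gammaln(a_jk))
    return score

rng = np.random.default_rng(0)
T, n = 200, 4
data = rng.integers(0, 2, size=(T, n))
data[1:, 1] = data[:-1, 0] ^ (rng.random(T - 1) < 0.1)   # node 1 is driven by node 0

# Which single parent at t-1 best explains node 1 at t?
scores = {p: bdeu_score(1, [p], data) for p in range(n) if p != 1}
print(max(scores, key=scores.get), scores)
```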

2.
In genome-wide association studies (GWAS) it is now common to search for, and find, multiple causal variants located in close proximity. It has also become standard to ask whether different traits share the same causal variants, but one of the popular methods to answer this question, coloc, makes the simplifying assumption that only a single causal variant exists for any given trait in any genomic region. Here, we examine the potential of the recently proposed Sum of Single Effects (SuSiE) regression framework, which can be used for fine-mapping genetic signals, for use with coloc. SuSiE is a novel approach that allows evidence for association at multiple causal variants to be evaluated simultaneously, whilst separating the statistical support for each variant conditional on the causal signal being considered. We show this results in more accurate coloc inference than other proposals to adapt coloc for multiple causal variants based on conditioning. We therefore recommend that coloc be used in combination with SuSiE to optimise accuracy of colocalisation analyses when multiple causal variants exist.
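
For context, the sketch below reimplements the single-causal-variant coloc posterior calculation that SuSiE-based colocalisation generalises: per-SNP log approximate Bayes factors for two traits are combined with prior probabilities p1, p2, p12 into posteriors for hypotheses H0-H4. This is an illustrative Python restatement of the published formula with simulated Bayes factors, not the coloc package's code; real analyses should use the R packages coloc and susieR.

```python
import numpy as np
from scipy.special import logsumexp

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posteriors for H0..H4 under one causal variant per trait.

    H0: neither trait associated; H1/H2: only trait 1/2; H3: both traits,
    different variants; H4: both traits share a single causal variant.
    """
    labf1, labf2 = np.asarray(labf1), np.asarray(labf2)
    lsum1 = logsumexp(labf1)                  # trait-1 association evidence
    lsum2 = logsumexp(labf2)                  # trait-2 association evidence
    lsum12 = logsumexp(labf1 + labf2)         # shared-variant evidence (H4)
    # H3: all variant pairs minus the matched (shared) pairs.
    lsum3 = logsumexp([lsum1 + lsum2, lsum12], b=[1.0, -1.0])
    lh = np.array([
        0.0,                                  # H0 baseline
        np.log(p1) + lsum1,                   # H1
        np.log(p2) + lsum2,                   # H2
        np.log(p1) + np.log(p2) + lsum3,      # H3
        np.log(p12) + lsum12,                 # H4
    ])
    return np.exp(lh - logsumexp(lh))

rng = np.random.default_rng(1)
labf1, labf2 = rng.normal(0, 1, 500), rng.normal(0, 1, 500)
labf1[100] += 12; labf2[100] += 10            # the same SNP drives both traits
print(coloc_posteriors(labf1, labf2).round(3))   # last entry (H4) should dominate
```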

3.
Understanding the evolutionary history of species is at the core of molecular evolution and is done using several inference methods. The critical issue is to quantify the uncertainty of the inference. The posterior probabilities in Bayesian phylogenetic inference and the bootstrap values in frequentist approaches measure the variability of the estimates due to the sampling of sites from genes and the sampling of genes from genomes. However, they do not measure the uncertainty due to taxon sampling. Taxa that experienced molecular homoplasy, recent selection, a spur of evolution, and so forth may disrupt the inference and cause incongruences in the estimated phylogeny. We define a taxon influence index to assess the influence of each taxon on the phylogeny. We found that although most taxa have a weak influence on the phylogeny, a small fraction of influential taxa strongly alter it even in clades only loosely related to them. We conclude that highly influential taxa should be given special attention and sampling them more thoroughly can lead to more dependable phylogenies.
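
A hedged sketch of the leave-one-taxon-out logic behind a taxon influence index: drop each taxon in turn, re-estimate the tree, and measure how much the relationships among the remaining taxa change. Here hierarchical clustering on a distance matrix stands in for a real phylogenetic inference pipeline, and the influence measure (mean change in cophenetic distances) is an illustrative choice, not the authors' index.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def tree_cophenetic(D):
    """Build a tree (average-linkage clustering) and return its cophenetic distances."""
    Z = linkage(squareform(D, checks=False), method="average")
    return squareform(cophenet(Z))

def taxon_influence(D):
    """Influence of each taxon = mean change in tree distances among the
    remaining taxa when that taxon is removed, relative to the full tree."""
    n = D.shape[0]
    full = tree_cophenetic(D)
    influence = np.zeros(n)
    for t in range(n):
        keep = np.delete(np.arange(n), t)
        reduced = tree_cophenetic(D[np.ix_(keep, keep)])
        influence[t] = np.abs(reduced - full[np.ix_(keep, keep)]).mean()
    return influence

rng = np.random.default_rng(2)
coords = rng.normal(size=(12, 5))
coords[0] += 8                                  # one aberrant, long-branch-like taxon
D = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
print(taxon_influence(D).round(2))              # taxon 0 should stand out
```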

4.
In most circumstances, probability sampling is the only way to ensure unbiased inference about population quantities where a complete census is not possible. As we enter the era of ‘big data’, however, nonprobability samples, whose sampling mechanisms are unknown, are undergoing a renaissance. We explain why the use of nonprobability samples can lead to spurious conclusions, and why seemingly large nonprobability samples can be (effectively) very small. We also review some recent controversies surrounding the use of nonprobability samples in biodiversity monitoring. These points notwithstanding, we argue that nonprobability samples can be useful, provided that their limitations are assessed, mitigated where possible and clearly communicated. Ecologists can learn much from other disciplines on each of these fronts.
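
The "effectively very small" point is easy to demonstrate with a self-contained simulation (mine, not the paper's): a huge self-selected sample whose inclusion probability correlates with the outcome gives a badly biased mean, while a modest simple random sample does not.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000
y = rng.normal(50, 10, N)                          # population values; true mean = 50

# Nonprobability sample: inclusion probability rises with y (self-selection).
p_include = 1 / (1 + np.exp(-(y - 50) / 5))
big_nonprob = y[rng.random(N) < p_include * 0.2]   # roughly 100,000 records

# Probability sample: simple random sample of 1,000.
small_srs = rng.choice(y, size=1_000, replace=False)

print(f"true mean            : {y.mean():6.2f}")
print(f"nonprob (n={len(big_nonprob):>7}) : {big_nonprob.mean():6.2f}  <- biased despite its size")
print(f"SRS     (n={len(small_srs):>7}) : {small_srs.mean():6.2f}  <- unbiased, tiny by comparison")
```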

5.
This paper describes the spatial and temporal distribution of Anopheles gambiae s.l. Giles in two Tanzanian villages based on data collected from a five-month intensive mosquito sampling programme and analysed using Taylor's power law. The degree of spatial aggregation of female A. gambiae in each village was similar to its corresponding temporal aggregation, indicating that in designing sampling routines for estimating the abundance of mosquitoes, sampling effort should be allocated equally to houses (spatial) and nights (temporal). The analysis also showed that for a given amount of sampling effort, estimates of village-level mosquito abundance are more precise when sampling is carried out in randomly selected houses, than when the same houses are used on each sampling occasion. Also, the precision of estimating parous rates does not depend on whether mosquito sampling is carried out in the same or a random selection of houses. The implications of these findings for designing sampling routines for entomological evaluation of vector control trials are discussed.
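
Taylor's power law relates the variance of counts to their mean, var = a · mean^b, so aggregation is summarised by the slope b of a log-log regression across sampling units. A minimal sketch with invented catch data (not the study's):

```python
import numpy as np

def taylor_power_law(counts):
    """Fit log10(variance) = log10(a) + b*log10(mean) across sampling units.

    counts: (units, occasions) array, e.g. houses x nights of mosquito catches.
    Returns (a, b); b > 1 indicates aggregation, b = 1 a random (Poisson) pattern.
    """
    means = counts.mean(axis=1)
    variances = counts.var(axis=1, ddof=1)
    b, log_a = np.polyfit(np.log10(means), np.log10(variances), 1)
    return 10 ** log_a, b

rng = np.random.default_rng(4)
houses, nights = 30, 20
mu = rng.uniform(2, 40, size=houses)                       # house-specific mean catch
# Negative-binomial counts give an aggregated (variance > mean) pattern.
counts = rng.negative_binomial(n=2, p=2 / (2 + mu[:, None]), size=(houses, nights))
a, b = taylor_power_law(counts)
print(f"a = {a:.2f}, b = {b:.2f}  (b > 1 -> aggregated)")
```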

6.
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability.
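
The tension the abstract describes — a covariate can improve predictive ability while biasing the inferred genetic effect — can be reproduced in a toy simulation (mine, not the authors'): adjusting for a downstream variable that is affected by both the genotype and the trait raises R² but pulls the estimated coefficient away from the true causal effect.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
g = rng.normal(size=n)                     # standardised genetic predictor
beta = 0.5                                 # true causal effect of g on the trait
y = beta * g + rng.normal(size=n)
c = 0.8 * g + 0.8 * y + rng.normal(scale=0.5, size=n)   # downstream "covariate"

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def r2(X, b):
    return 1 - (y - X @ b).var() / y.var()

X_causal = np.column_stack([np.ones(n), g])        # correct adjustment set
X_pred = np.column_stack([np.ones(n), g, c])       # adds the downstream variable

b_causal, b_pred = ols(X_causal, y), ols(X_pred, y)
print(f"g coefficient, causal model    : {b_causal[1]:.2f} (truth {beta})")
print(f"g coefficient, + downstream c  : {b_pred[1]:.2f}  <- biased")
print(f"R^2: causal {r2(X_causal, b_causal):.2f} vs predictive {r2(X_pred, b_pred):.2f}")
```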

7.
MOTIVATION: Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure the gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known in other applications that valid statistical analyses have to appropriately take account of the possible within-subject correlation in longitudinal data. RESULTS: We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the significance analysis of microarrays method or using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.
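
A minimal sketch of the estimating-equation idea for a single gene, assuming statsmodels' GEE interface: regress expression on time with an exchangeable working correlation so repeated measures on the same subject are not treated as independent, then form a robust Wald statistic from the sandwich standard errors. The data layout and variable names are invented for illustration and this is not the authors' statistic.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.genmod.cov_struct import Exchangeable

rng = np.random.default_rng(6)
subjects, times = 10, 4
subject = np.repeat(np.arange(subjects), times)
time = np.tile(np.arange(times, dtype=float), subjects)

# One gene: subject-specific baselines (within-subject correlation) plus a time trend.
baseline = rng.normal(0, 1.0, subjects)[subject]
expr = baseline + 0.4 * time + rng.normal(0, 0.5, subjects * times)

X = sm.add_constant(time)
fit = sm.GEE(expr, X, groups=subject,
             family=sm.families.Gaussian(),
             cov_struct=Exchangeable()).fit()

# Robust Wald statistic for the time effect (GEE reports sandwich SEs by default).
wald = (fit.params[1] / fit.bse[1]) ** 2
print(fit.summary().tables[1])
print(f"robust Wald statistic for the time trend: {wald:.1f}")
```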

8.
In recent years, biostatistical research has begun to apply linear models and design theory to develop efficient experimental designs and analysis tools for gene expression microarray data. With two-colour microarrays, direct comparisons of RNA targets are possible and lead to incomplete block designs. In this setting, efficient designs for simple and factorial microarray experiments have mainly been proposed for technical replicates. For biological replicates, however, which are crucial for obtaining inference that can be generalised to a biological population, this question has only recently been addressed and is not yet fully solved. In this paper, we propose efficient designs for independent two-sample experiments using two-colour microarrays, enabling biologists to measure their biological random samples in an efficient manner and to draw generalisable conclusions. We give advice for experimental situations with differing group sizes and show the impact of different designs on the variance and degrees of freedom of the test statistics. The designs proposed in this paper can be evaluated using SAS PROC MIXED or S+/R lme.

9.
This article develops semiparametric approaches for estimation of propensity scores and causal survival functions from prevalent survival data. The analytical problem arises when prevalent sampling is adopted for collecting failure times and, as a result, the covariates are incompletely observed due to their association with failure time. The proposed procedure for estimating propensity scores shares interesting features with the likelihood formulation in case-control studies, but in our case it requires additional consideration of the intercept term. The result shows that the corrected propensity scores in the logistic regression setting can be obtained through a standard estimation procedure with specific adjustments to the intercept term. For causal estimation, two different types of missing sources are encountered in our model: one can be explained by the potential outcome framework; the other is caused by the prevalent sampling scheme. Statistical analysis without adjusting for bias from both sources of missingness will lead to biased results in causal inference. The proposed methods were partly motivated by and applied to the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data for women diagnosed with breast cancer.

10.
Leeyoung Park & Ju H. Kim, Genetics, 2015, 199(4): 1007-1016
Causal models including genetic factors are important for understanding the presentation mechanisms of complex diseases. Familial aggregation and segregation analyses based on polygenic threshold models have been the primary approach to fitting genetic models to the family data of complex diseases. In the current study, an advanced approach to obtaining appropriate causal models for complex diseases, based on the sufficient component cause (SCC) model and involving combinations of traditional genetics principles, was proposed. The probabilities for the entire population, i.e., normal–normal, normal–disease, and disease–disease, were considered for each model for the appropriate handling of common complex diseases. The causal model in the current study included the genetic effects from single genes involving epistasis, complementary gene interactions, gene–environment interactions, and environmental effects. Bayesian inference using a Markov chain Monte Carlo (MCMC) algorithm was used to assess the proportions of each component for a given population lifetime incidence. This approach is flexible, allowing both common and rare variants within a gene and across multiple genes. An application to schizophrenia data confirmed the complexity of the causal factors. An analysis of diabetes data demonstrated that environmental factors and gene–environment interactions are the main causal factors for type II diabetes. The proposed method is effective and useful for identifying causal models, which can accelerate the development of efficient strategies for identifying causal factors of complex diseases.

11.
Knowledge of temporal change in ecological condition is important for the understanding and management of ecosystems. However, analyses of trends in biological condition have been rare, as there are usually too few data points at any single site to use many trend analysis techniques. We used a Bayesian hierarchical model to analyse temporal trends in stream ecological condition (as measured by the invertebrate-based index SIGNAL) across Melbourne, Australia. The Bayesian hierarchical approach assumes dependency amongst the sampling sites. Results for each site "borrow strength" from the other data because model parameter values are assumed to be drawn from a larger common distribution. This leads to robust inference despite the limited data available at each site. Utilising the flexibility of the Bayesian approach, we also modelled change over time as a function of catchment urbanisation, allowed for potential temporal and spatial autocorrelation of the data and trend estimates, and used prior information to improve the estimate of data uncertainty. We found strong evidence of a widespread decline in SIGNAL scores for edge habitats (areas of little or no flow). The rate of decline was positively associated with catchment urbanisation. There was no evidence of such declines for riffle habitats (areas with rapid and turbulent flow). Melbourne has experienced a decline in rainfall, indicative of either drought and/or longer-term climate change. The results are consistent with the expected coupled effects of these rainfall changes and increasing urbanisation, but more research is needed to isolate a causal mechanism. More immediately, however, the Bayesian hierarchical approach has allowed us to identify a pattern in a biological monitoring data set that might otherwise have gone unnoticed, and to demonstrate a large-scale temporal decline in biological condition.
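
The "borrowing strength" idea can be shown with a tiny empirical-Bayes caricature of the hierarchical model (invented data, a deliberate simplification of the full Bayesian model in the paper): noisy site-level trend estimates are shrunk toward the across-site mean in proportion to their uncertainty, stabilising sites with few observations.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sites = 25
true_trend = rng.normal(-0.05, 0.03, n_sites)      # SIGNAL-score change per year
n_obs = rng.integers(3, 12, n_sites)                # few samples at each site
obs_se = 0.08 / np.sqrt(n_obs)                      # per-site standard error
est = true_trend + rng.normal(0, obs_se)            # raw site-level estimates

# Empirical-Bayes shrinkage: method-of-moments estimate of between-site variance.
grand_mean = np.average(est, weights=1 / obs_se**2)
tau2 = max(np.var(est, ddof=1) - np.mean(obs_se**2), 1e-6)
shrink = tau2 / (tau2 + obs_se**2)                  # weight given to each site's own data
pooled = shrink * est + (1 - shrink) * grand_mean

print(f"RMSE raw estimates    : {np.sqrt(np.mean((est - true_trend) ** 2)):.4f}")
print(f"RMSE pooled estimates : {np.sqrt(np.mean((pooled - true_trend) ** 2)):.4f}  <- borrows strength")
```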

12.
When correlation implies causation in multisensory integration
Inferring which signals have a common underlying cause, and hence should be integrated, represents a primary challenge for a perceptual system dealing with multiple sensory inputs [1-3]. This challenge is often referred to as the correspondence problem or causal inference. Previous research has demonstrated that spatiotemporal cues, along with prior knowledge, are exploited by the human brain to solve this problem [4-9]. Here we explore the role of correlation between the fine temporal structure of auditory and visual signals in causal inference. Specifically, we investigated whether correlated signals are inferred to originate from the same distal event and hence are integrated optimally [10]. In a localization task with visual, auditory, and combined audiovisual targets, the improvement in precision for combined relative to unimodal targets was statistically optimal only when audiovisual signals were correlated. This result demonstrates that humans use the similarity in the temporal structure of multisensory signals to solve the correspondence problem, hence inferring causation from correlation.
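
The "statistically optimal" benchmark used in this line of work is the reliability-weighted average of the two unimodal estimates; under independent Gaussian noise the combined variance is the harmonic combination of the unimodal variances. A short sketch of that standard formula with made-up noise levels (not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(8)
true_loc = 10.0
sigma_a, sigma_v = 2.0, 1.0                    # assumed auditory and visual noise SDs
n = 100_000

aud = rng.normal(true_loc, sigma_a, n)         # unimodal auditory location estimates
vis = rng.normal(true_loc, sigma_v, n)         # unimodal visual location estimates

# Maximum-likelihood (reliability-weighted) integration of the two cues.
w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
combined = w_a * aud + (1 - w_a) * vis

pred_sd = np.sqrt(1 / (1 / sigma_a**2 + 1 / sigma_v**2))   # optimal combined SD
print(f"auditory SD {aud.std():.3f}, visual SD {vis.std():.3f}")
print(f"combined SD {combined.std():.3f} vs optimal prediction {pred_sd:.3f}")
```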

13.
Performing causal inference in observational studies requires the assumption that confounding variables are correctly adjusted for. In settings with few discrete-valued confounders, standard models can be employed. However, as the number of confounders increases, these models become less feasible because there are fewer observations available for each unique combination of confounding variables. In this paper, we propose a new model for estimating treatment effects in observational studies that incorporates both parametric and nonparametric outcome models. By conceptually splitting the data, we can combine these models while maintaining a conjugate framework, allowing us to avoid the use of Markov chain Monte Carlo (MCMC) methods. Approximations using the central limit theorem and random sampling allow our method to be scaled to high-dimensional confounders. Through simulation studies we show our method can be competitive with benchmark models while maintaining efficient computation, and illustrate the method on a large epidemiological health survey.

14.
Patch or island area is one of the most frequently used variables for inference in conservation biology and biogeography, and is often used in ecological applications. Given that all of these disciplines deal with large spatial scales, exhaustive censusing is not often possible, especially when there are large numbers of patches (e.g. for replication and control purposes). Therefore, data for patches or islands are usually collected by sampling. We argue that if area is to be used as an inferential factor, then the objects under study (i.e. the patches) must be characterized on an areal basis. This necessarily means that fixed-area sampling is inadequate (e.g. a single standard quadrat or transect set within patches irrespective of the patch area) and that some form of area-proportionate sampling is needed (e.g. a fixed areal proportion of each patch is surveyed by random allocation of standard quadrats across each patch). However, use of area-proportionate sampling is not usually dissociated from the increased temporal intensity of sampling that arises from using this approach. The dilemma we see is deciding how much of the area-specificity of variables such as species richness, rare-species indices or probabilities of occurrence of individual species is related to the area-proportionate survey protocol and how much is due to the temporal intensity of surveys. We undertook a study in which we balanced temporal and spatial effects by increasing the time spent surveying smaller patches of vegetation to account for the area-ratio difference. The estimated species richness of birds of the box–ironbark system of central Victoria, Australia, was found to depend strongly upon area when area-proportionate sampling alone was performed. When time-balancing was imposed upon area-proportionate sampling, the differences between smaller (10-ha) and larger (40-ha) areas were much reduced or effectively disappeared. We show that species found in the additional surveys used to conduct the time-balancing were significantly less abundant than species recorded in area-proportionate sampling. This effect is probably most severe for mobile animals, but may emerge in other forms of sampling.

15.
Within-site variability in species detectability is a problem common to many biodiversity assessments and can strongly bias the results. Such variability can be caused by many factors, including simple counting inaccuracies, which can be solved by increasing sample size, or by temporal changes in species behavior, meaning that the way the temporal sampling protocol is designed is also very important. Here we use the example of mist-netted tropical birds to determine how design decisions in the temporal sampling protocol can alter the data collected and how these changes might affect the detection of ecological patterns, such as the species-area relationship (SAR). Using data from almost 3400 birds captured from 21,000 net-hours at 31 sites in the Brazilian Atlantic Forest, we found that the magnitude of ecological trends remained fairly stable, but the probability of detecting statistically significant ecological patterns varied depending on sampling effort, time of day and season in which sampling was conducted. For example, more species were detected in the wet season, but the SAR was strongest in the dry season. We found that the temporal distribution of sampling effort was more important than its total amount, discovering that similar ecological results could have been obtained with one-third of the total effort, as long as each site had been equally sampled over 2 yr. We discuss that projects with the same sampling effort and spatial design, but with different temporal sampling protocol are likely to report different ecological patterns, which may ultimately lead to inappropriate conservation strategies.
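
The species-area relationship referred to here is conventionally summarised by the power model S = cA^z, fitted as a log-log regression of richness on area. A minimal sketch with invented values (not the study's capture data):

```python
import numpy as np

def fit_sar(areas, richness):
    """Fit the power-law species-area relationship S = c * A**z on log-log axes."""
    z, log_c = np.polyfit(np.log10(areas), np.log10(richness), 1)
    return 10 ** log_c, z

rng = np.random.default_rng(9)
areas = np.array([5, 10, 25, 50, 100, 250, 500, 1000], dtype=float)   # ha, illustrative
richness = np.round(12 * areas ** 0.22 * rng.lognormal(0, 0.08, areas.size))

c, z = fit_sar(areas, richness)
print(f"S = {c:.1f} * A^{z:.2f}")
```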

16.
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors due to the use of inexpensive, fallible genotyping machines and shortcomings in genotype-scoring software, which can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact of genotyping errors on haplotype inference. The first method makes use of double sampling. For each individual, the “GenoSpectrum”, which consists of all possible genotypes and their corresponding likelihoods, is computed. The second method is a genotype clustering algorithm based on multi-genotyping data, which also assigns a “GenoSpectrum” to each individual. We then describe two hybrid EM algorithms (called DS-EM and MG-EM) that perform haplotype inference based on the “GenoSpectrum” of each individual obtained by double sampling and multi-genotyping data. Both simulated data sets and a quasi-real data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31, 937–948) when the genotype data sets have errors.
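
For context, the conventional EM algorithm that these GenoSpectrum-based methods extend estimates haplotype frequencies from unphased genotypes by iterating between the expected phase of each ambiguous genotype and the implied frequencies. Below is a compact two-SNP sketch assuming error-free genotypes; it is illustrative only and is not the DS-EM or MG-EM method.

```python
import numpy as np
from itertools import product

HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]          # the four two-SNP haplotypes

def compatible_pairs(genotype):
    """All ordered haplotype pairs whose allele counts sum to the observed genotype."""
    return [(i, j) for i, j in product(range(4), repeat=2)
            if tuple(a + b for a, b in zip(HAPS[i], HAPS[j])) == tuple(genotype)]

def em_haplotype_freqs(genotypes, n_iter=50):
    freqs = np.full(4, 0.25)                      # uniform starting frequencies
    for _ in range(n_iter):
        expected = np.zeros(4)
        for g in genotypes:
            pairs = compatible_pairs(g)
            w = np.array([freqs[i] * freqs[j] for i, j in pairs])
            w /= w.sum()                          # E-step: posterior phase probabilities
            for (i, j), p in zip(pairs, w):
                expected[i] += p
                expected[j] += p
        freqs = expected / expected.sum()         # M-step: update haplotype frequencies
    return freqs

rng = np.random.default_rng(10)
true_freqs = np.array([0.5, 0.1, 0.1, 0.3])
draws = rng.choice(4, size=(500, 2), p=true_freqs)             # two haplotypes per person
genotypes = [tuple(np.add(HAPS[a], HAPS[b])) for a, b in draws]
print(em_haplotype_freqs(genotypes).round(3), "vs truth", true_freqs)
```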

17.
The 24 extant crocodylian species are the remnants of a once much more diverse and widespread clade. Crocodylomorpha has an approximately 230 million year evolutionary history, punctuated by a series of radiations and extinctions. However, the group's fossil record is biased. Previous studies have reconstructed temporal patterns in subsampled crocodylomorph palaeobiodiversity, but have not explicitly examined variation in spatial sampling, nor the quality of this record. We compiled a dataset of all taxonomically diagnosable non-marine crocodylomorph species (393). Based on the number of phylogenetic characters that can be scored for all published fossils of each species, we calculated a completeness value for each taxon. Mean species completeness (56%) is largely consistent within subgroups and for different body size classes, suggesting no significant biases across the crocodylomorph tree. In general, average completeness values are highest in the Mesozoic, with an overall trend of decreasing completeness through time. Many extant taxa are identified in the fossil record from very incomplete remains, but this might be because their provenance closely matches the species’ present-day distribution, rather than through autapomorphies. Our understanding of nearly all crocodylomorph macroevolutionary ‘events’ is essentially driven by regional patterns, with no global sampling signal. Palaeotropical sampling is especially poor for most of the group's history. Spatiotemporal sampling bias impedes our understanding of several Mesozoic radiations, whereas molecular divergence times for Crocodylia are generally in close agreement with the fossil record. However, the latter might merely be fortuitous, i.e. divergences happened to occur during our ephemeral spatiotemporal sampling windows.

18.
In Dictyostelium, development begins with the aggregation of free-living amoebae, which soon become organized into a relatively simple organism with a few different cell types. Coordinated cell type differentiation and morphogenesis lead to a final fruiting body that allows the dispersal of spores. The study of these processes is having increasing impact on our understanding of general developmental mechanisms. The availability of biochemical and molecular genetics techniques has allowed the discovery of complex signaling networks which are essential for Dictyostelium development and are also conserved in other organisms. The levels of cAMP (both intracellular and extracellular) play essential roles in every stage of Dictyostelium development, regulating many different signal transduction pathways. Two-component systems, involving histidine kinases and response regulators, have been found to regulate intracellular cAMP levels and PKA during terminal differentiation. The sequence of the Dictyostelium genome is expected to be completed in less than two years. Nevertheless, the available sequences that are already being released, together with the results of expressed sequence tags (ESTs), are providing invaluable tools to identify new and interesting genes for further functional analysis. Global expression studies, using DNA microarrays in synchronous development to study temporal changes in gene expression, are presently being developed. In the near future, the application of this type of technology to the complete set of Dictyostelium genes (approximately 10,000) will facilitate the discovery of the effects of mutation of components of the signaling networks that regulate Dictyostelium development on changes in gene expression.

19.
Compressive sensing microarrays (CSMs) are DNA-based sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a CSM, each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints from CS theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Lab experiments suggest that in order to achieve accurate hybridization profiling, consensus probe sequences are required to have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained from equilibrium conditions. Consequently, one can use CSMs in applications in which only short hybridization times are allowed.
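
The group-testing readout can be illustrated with a toy recovery problem: each "sensor" responds to a random subset of targets, and a sparse target abundance vector is recovered from far fewer sensors than targets via iterative soft thresholding (ISTA) for the l1-penalised least-squares problem. This is a self-contained sketch with invented dimensions and a generic CS solver, not the authors' probe-design or recovery pipeline.

```python
import numpy as np

rng = np.random.default_rng(11)
n_targets, n_sensors, sparsity = 200, 40, 5

# Sensing matrix: each sensor hybridises with a random ~10% subset of targets.
A = (rng.random((n_sensors, n_targets)) < 0.1).astype(float)
x_true = np.zeros(n_targets)
x_true[rng.choice(n_targets, sparsity, replace=False)] = rng.uniform(1, 3, sparsity)
y = A @ x_true + rng.normal(0, 0.01, n_sensors)        # sensor readout plus noise

def ista(A, y, lam=0.05, n_iter=2000):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

x_hat = ista(A, y)
print("true targets     :", np.sort(np.flatnonzero(x_true)))
print("recovered targets:", np.flatnonzero(x_hat > 0.1))
```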

20.
In the decade since their invention, spotted microarrays have been undergoing technical advances that have increased the utility, scope and precision of their ability to measure gene expression. At the same time, more researchers are taking advantage of the fundamentally quantitative nature of these tools with refined experimental designs and sophisticated statistical analyses. These new approaches utilise the power of microarrays to estimate differences in gene expression levels, rather than just categorising genes as up- or down-regulated, and allow the comparison of expression data across multiple samples. In this review, some of the technical aspects of spotted microarrays that can affect statistical inference are highlighted, and a discussion is provided of how several methods for estimating gene expression level across multiple samples deal with these challenges. The focus is on a Bayesian analysis method, BAGEL, which is easy to implement and produces easily interpreted results.
