Similar Documents
20 similar documents retrieved.
1.
Bayesian LASSO for quantitative trait loci mapping
Yi N, Xu S. Genetics 2008, 179(2):1045-1055
The goal of mapping quantitative trait loci (QTL) is to identify molecular markers or genomic loci that influence the variation of complex traits. The problem is complicated by the fact that QTL data usually contain a large number of markers across the entire genome, most of which have little or no effect on the phenotype. In this article, we propose several Bayesian hierarchical models for mapping multiple QTL that simultaneously fit and estimate all possible genetic effects associated with all markers. The proposed models use prior distributions for the genetic effects that are scale mixtures of normal distributions with mean zero and variances distributed to give each effect a high probability of being near zero. We consider two types of priors for the variances, exponential and scaled inverse-chi-square distributions, which result in a Bayesian version of the popular least absolute shrinkage and selection operator (LASSO) model and the well-known Student's t model, respectively. Unlike most applications, where fixed values are preset for hyperparameters in the priors, we treat all hyperparameters as unknowns and estimate them along with the other parameters. Markov chain Monte Carlo (MCMC) algorithms are developed to simulate the parameters from the posteriors. The methods are illustrated using well-known barley data.
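To make the prior structure concrete, here is a minimal sketch of a Gibbs sampler for the Bayesian LASSO under the Park-Casella parameterization (exponential priors on the per-effect variances, giving Laplace marginals). This is a standard implementation, not necessarily the authors' exact algorithm; the toy data, lambda value, and function names are illustrative:

```python
# Minimal Bayesian LASSO Gibbs sampler (Park & Casella parameterization) -- a sketch,
# not the paper's code. Toy data and lambda are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, burn=500):
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    tau2 = np.ones(p)          # per-effect variances; exponential prior -> Laplace marginal
    XtX, Xty = X.T @ X, X.T @ y
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}) with A = X'X + diag(1/tau2)
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # 1/tau2_j | rest ~ inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        tau2 = 1.0 / rng.wald(mu, lam**2)
        # sigma2 | rest ~ inverse-gamma
        resid = y - X @ beta
        shape = 0.5 * (n - 1 + p)
        scale = 0.5 * (resid @ resid + beta @ (beta / tau2))
        sigma2 = scale / rng.gamma(shape)
        if it >= burn:
            draws.append(beta.copy())
    return np.asarray(draws).mean(axis=0)

# toy check: 10 markers, only two with real effects
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(np.round(bayesian_lasso_gibbs(X, y), 2))
```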

2.
The molecular clock, i.e., constancy of the rate of evolution over time, is commonly assumed in estimating divergence dates. However, this assumption is often violated and has drastic effects on date estimation. Recently, a number of attempts have been made to relax the clock assumption. One approach is to use maximum likelihood, which assigns rates to branches and allows the estimation of both rates and times. An alternative is the Bayes approach, which models the change of the rate over time. A number of models of rate change have been proposed. We have extended and evaluated models of rate evolution, i.e., the lognormal and its recent variant, along with the gamma, the exponential, and the Ornstein-Uhlenbeck processes. These models were first applied to a small hominoid data set, where an empirical Bayes approach was used to estimate the hyperparameters that measure the amount of rate variation. Estimation of divergence times was sensitive to these hyperparameters, especially when the assumed model is close to the clock assumption. The rate and date estimates varied little from model to model, although the posterior Bayes factor indicated the Ornstein-Uhlenbeck process outperformed the other models. To demonstrate the importance of allowing for rate change across lineages, this general approach was used to analyze a larger data set consisting of the 18S ribosomal RNA gene of 39 metazoan species. We obtained date estimates consistent with paleontological records, the deepest split within the group being about 560 million years ago. Estimates of the rates were in accordance with the Cambrian explosion hypothesis and suggested some more recent lineage-specific bursts of evolution.
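To illustrate how these rate-change models differ, the sketch below simulates the log rate along a single lineage under geometric Brownian motion (the lognormal model) and under an Ornstein-Uhlenbeck process. All parameter values and variable names are illustrative, not estimates from the paper:

```python
# Sketch: two rate-change models along one branch. The BM log rate drifts without
# bound; the OU log rate is pulled back toward its mean mu. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(42)
T, n = 100.0, 1000                 # branch duration (Myr) and Euler steps
dt = T / n
sigma = 0.05                       # diffusion of the log rate
theta, mu = 0.02, np.log(0.01)     # OU mean reversion and stationary log-rate mean

log_r_bm = np.full(n, mu)          # lognormal model: Brownian motion on the log rate
log_r_ou = np.full(n, mu)          # OU model
for i in range(1, n):
    eps = rng.normal(scale=np.sqrt(dt))
    log_r_bm[i] = log_r_bm[i - 1] + sigma * eps
    log_r_ou[i] = log_r_ou[i - 1] + theta * (mu - log_r_ou[i - 1]) * dt + sigma * eps

# expected substitutions on the branch = time integral of the rate
print("BM branch length:", (np.exp(log_r_bm) * dt).sum())
print("OU branch length:", (np.exp(log_r_ou) * dt).sum())
```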

3.
Yi Jia, Jean-Luc Jannink. Genetics 2012, 192(4):1513-1522
Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated the hyperparameters, and extended BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait, so that estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes were not available for all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored.
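For context, here is a minimal single-trait GBLUP sketch, the univariate baseline that the multivariate models extend with a trait covariance structure. The simulated genotypes, heritability, and VanRaden-style relationship matrix are illustrative assumptions, not the paper's data:

```python
# Sketch: single-trait GBLUP with a genomic relationship matrix G.
# u_hat = G (G + lambda I)^{-1} (y - mean), lambda = (1 - h2)/h2.
import numpy as np

rng = np.random.default_rng(7)
n_ind, n_mrk = 300, 1000
M = rng.binomial(2, 0.5, size=(n_ind, n_mrk)).astype(float)   # marker counts {0,1,2}
p = M.mean(axis=0) / 2
Z = M - 2 * p                                                 # centered genotypes
G = Z @ Z.T / (2 * (p * (1 - p)).sum())                       # genomic relationship matrix

u_true = Z @ rng.normal(scale=0.05, size=n_mrk)               # true breeding values
y = u_true + rng.normal(scale=u_true.std(), size=n_ind)       # heritability ~ 0.5

h2 = 0.5
lam = (1 - h2) / h2
u_hat = G @ np.linalg.solve(G + lam * np.eye(n_ind), y - y.mean())
print("prediction accuracy:", np.corrcoef(u_hat, u_true)[0, 1])
```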

4.
This article is an attempt to survey the vast literature on flexibility in manufacturing that has accumulated over the last 10 to 20 years. The survey begins with a brief review of the classical literature on flexibility in economics and organization theory, which provides a background for manufacturing flexibility. Several kinds of flexibilities in manufacturing are then defined carefully along with their purposes, the means to obtain them, and some suggested measurements and valuations. Then we examine the interrelationships among the several flexibilities. Various empirical studies and analytical/optimization models dealing with these flexibilities are reported and discussed. The article concludes with suggestions for some possible future research directions.

5.
In this paper, I deal with some epistemological aspects of what Christopher Langton (1989) and other scientists have recently been calling «artificial life», whose history is in fact far older. I take a view on the origin, further developments, and latest issues of these models, and try to point out the major philosophical and epistemological problems they raise.

6.
The behavior of guaiacol resembles that of certain protoplasmic surfaces to such an extent that it can be advantageously used in models designed to imitate certain aspects of protoplasmic behavior. In these models the electrical potentials appear to consist of diffusion potentials, and this may be true of certain living cells. In dealing with models we determine ionic mobilities and use these to predict potentials. In studying living cells we measure potentials and from these calculate ionic mobilities. The question arises, how far is this method justified. To test this we have treated guaiacol like a living cell, measuring potentials and from these estimating ionic mobilities. The results justify the use of this method. This is of interest because the method is most useful in studying protoplasmic activity. In its extended form it enables us to follow changes in mobilities and in partition coefficients due to applied reagents and to metabolism.

7.
Optical Mapping is an emerging technology for constructing ordered restriction maps of DNA molecules. The underlying computational problems for this technology have been studied, and several models have been proposed in the recent literature. Most of these propose combinatorial models; some also present statistical approaches. However, it is not a priori clear how these models relate to one another and to the underlying problem. We present a uniform framework for the restriction map problems in which each of these various models is a specific instance of the basic framework. We achieve this by identifying two "signature" functions f() and g() that characterize the models. We identify the constraints these two functions must satisfy, thus opening up the possibility of exploring other plausible models. We show that for all of the combinatorial models proposed in the literature, the signature functions are semi-algebraic. We also analyze a proposed statistical method in this framework and show that its signature functions are transcendental. We believe that this framework would also provide useful guidelines for dealing with other inference problems arising in practice. Finally, we indicate the open problems by including a survey of the best known results for these problems.

8.

Background  

Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically.
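A hedged sketch of the kind of pooled-variance ("moderated") t-statistic described above: each gene's variance is shrunk toward the global mean variance, with the heuristic weight w standing in for exactly the kind of ad hoc hyperparameter choice the paper criticizes. The simulated data are illustrative:

```python
# Sketch: moderated t-test with heuristically pooled variance across genes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes, n_rep = 2000, 4
a = rng.normal(size=(n_genes, n_rep))            # condition A
b = rng.normal(size=(n_genes, n_rep))            # condition B
b[:50] += 1.5                                    # 50 truly differential genes

s2 = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2
w = 0.5                                          # heuristic pooling weight (assumed)
s2_mod = w * s2.mean() + (1 - w) * s2            # shrink toward the global variance
t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(s2_mod * (2 / n_rep))
p = 2 * stats.t.sf(np.abs(t), df=2 * (n_rep - 1))
print("genes with p < 0.001:", int((p < 0.001).sum()))
```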

9.
An overview of the available literature on the use of protective facemasks by children for protection from respiratory infectious agents reveals relatively few articles dealing specifically with the topic, despite their use during recent outbreaks (e.g., severe acute respiratory syndrome, pandemic influenza). Little is known about the physiological and psychological burdens imposed by these devices and a child's ability to use and tolerate them correctly. This article focuses on the myriad issues associated with protective facemask use by children, in the hope of educating public health personnel, healthcare professionals, and families on their limitations and associated risks, and of fostering much-needed research.

10.
Many biological quantities cannot be measured directly but rather need to be estimated from models. Estimates from models are statistical objects with variance and, when derived simultaneously, covariance. It is well known that their variance-covariance (VC) matrix must be considered in subsequent analyses. Although it is always preferable to carry out the proposed analyses on the raw data themselves, a two-step approach cannot always be avoided. This situation arises when the parameters of a multinomial must be regressed against a covariate. The Delta method is an appropriate and frequently recommended way of deriving variance approximations of transformed and correlated variables. Implementing the Delta method is not trivial, and the literature lacks detailed information on the procedure for complex situations such as those involved in constraining the parameters of a multinomial distribution. This paper proposes a how-to guide for calculating the correct VC matrices of dependent estimates involved in multinomial distributions, and for using them to test the effects of covariates in post hoc analyses when these analyses cannot be integrated directly into a model. For illustrative purposes, we focus on variables calculated in capture-recapture models, but the same procedure can be applied to all analyses dealing with correlated estimates with multinomial distributions and their variances and covariances.
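In outline, the core computation is propagating the multinomial covariance of estimated proportions through a transform via its Jacobian. A minimal sketch, with an illustrative log-ratio transform g and sample size:

```python
# Sketch of the Delta method for a multinomial: VC of g(p_hat) is J V J',
# with V = Cov(p_hat) and J the Jacobian of g. Transform and n are illustrative.
import numpy as np

n = 500
p = np.array([0.5, 0.3, 0.2])                 # multinomial cell probabilities
V = (np.diag(p) - np.outer(p, p)) / n         # Cov(p_hat) for sample size n

def g(p):
    # transform of interest: log odds-style contrasts against the last cell
    return np.array([np.log(p[0] / p[2]), np.log(p[1] / p[2])])

# numerical Jacobian (analytic derivatives work equally well)
eps = 1e-6
J = np.column_stack([(g(p + eps * e) - g(p - eps * e)) / (2 * eps)
                     for e in np.eye(3)])

VC = J @ V @ J.T                              # Delta-method VC matrix of g(p_hat)
print(np.round(VC, 5))
```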

11.
Deep learning models are often preferred for complex, image-based solutions to applied problems, but their results depend heavily on the architecture, which involves several hyperparameters that must be tuned. This study develops an optimized 1D-CNN model for mapping the medicinal Psyllium Husk crop using open-source temporal optical Sentinel-2A/2B satellite data. A sequential 1D-CNN architecture was developed by optimizing hyperparameters including the number of convolution layers, number of neurons, activation function, and batch size. Psyllium Husk fields were mapped in the Jalore district of Rajasthan using Sentinel-2A/2B (10 m) optical data. Because temporal data were used, the Modified Soil Adjusted Vegetation Index (MSAVI2) was computed to reduce the spectral dimension to a single index per date while preserving the temporal dimension. The dataset was then restricted to the phenological stages that distinguish the target crop from closely resembling species, and the data for these stages were fed to the 1D-CNN for classification. A range of training-sample sizes was explored to determine the optimal number of training points. The model outputs fractional images whose values are proportional to the probability of a pixel belonging to the target class. Accuracy was assessed with a fuzzy error matrix (FERM) against fractional reference images generated from temporal optical PlanetScope data (3 m). The best overall accuracy among the test cases was 89.85%, obtained using conventional MSAVI2 with 1000 training samples.
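A hedged sketch of the two technical ingredients named above: the standard MSAVI2 formulation and a minimal temporal 1D-CNN in tf.keras. The layer counts and sizes are illustrative placeholders, not the paper's tuned architecture:

```python
# Sketch: MSAVI2 from red/NIR reflectance, plus a minimal temporal 1D-CNN
# taking one MSAVI2 value per acquisition date. Architecture is illustrative.
import numpy as np
import tensorflow as tf

def msavi2(nir, red):
    """MSAVI2 = (2*NIR + 1 - sqrt((2*NIR + 1)^2 - 8*(NIR - red))) / 2"""
    return (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2

n_dates = 12                       # one MSAVI2 value per Sentinel-2 date (assumed)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_dates, 1)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(pixel is the target crop)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```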

12.
A number of scholars have recently defended the claim that there is a close connection between the evolutionary biological notion of fitness and the economic notion of utility: both are said to refer to an organism’s success in dealing with its environment, and both are said to play the same theoretical roles in their respective sciences. However, an analysis of two seemingly disparate but in fact structurally related phenomena—‘niche construction’ (the case where organisms change their environment to make it fit to their needs) and ‘adaptive preferences’ (the case where agents change their wants to make them fit to what the world has given them)—shows that one needs to be very careful about the postulation of this sort of fitness–utility connection. Specifically, I here use the analysis of these two phenomena to establish when connecting fitness and utility is and is not possible.

13.
More than 350 inherited diseases have been reported in dogs, and at least 50% of them have human counterparts. To eliminate these diseases from dog breeds and to identify canine models for human diseases, it is necessary to find the mutations underlying them. To this end, two methods have been used: the functional candidate gene approach and linkage analysis. Here we present an evaluation of these approaches in canine retinal diseases, which have been the subject of a large number of molecular genetic studies, and we show their contrasting outcomes when dealing with genetically heterogeneous diseases. The candidate gene approach has led to 377 published results involving 23 genes. Most of the results (66.6%) excluded the presence of a mutation in a gene or its coding region, while only 3.4% identified the mutation causing the disease. On the other hand, five linkage analysis studies have been done on retinal diseases, resulting in three identified mutations and two mapped disease loci. Mapping studies have relied on dog research colonies. If this favorable application of linkage analysis can be extended to dogs in the pet population, success in identifying canine mutations could increase, with advantages to veterinary and human medicine.

14.
Observational cohort studies of individuals with chronic disease provide information on rates of disease progression, the effect of fixed and time-varying risk factors, and the extent of heterogeneity in the course of disease. Analysis of this information is often facilitated by the use of multistate models with intensity functions governing transition between disease states. We discuss modeling and analysis issues for such models when individuals are observed intermittently. Frameworks for dealing with heterogeneity and measurement error are discussed, including random effect models, finite mixture models, and hidden Markov models. Cohorts are often defined by convenience, and ways of addressing outcome-dependent sampling or observation of individuals are also discussed. Data on progression of joint damage in psoriatic arthritis and retinopathy in diabetes are analysed to illustrate these issues and related methodology.
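A minimal sketch of the multistate machinery involved: given a transition-intensity matrix Q, the probabilities of state changes between two intermittent observations a gap t apart are P(t) = exp(Qt). The two-state progressive model and intensity values below are illustrative:

```python
# Sketch: progressive two-state model (healthy -> damaged) observed intermittently.
# Transition probabilities over a gap t come from the matrix exponential of Q*t.
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.20, 0.20],     # state 1 -> state 2 (progression), per year
              [ 0.00, 0.00]])    # state 2 absorbing (e.g., damaged joint)

for t in (0.5, 1.0, 5.0):        # years between clinic visits
    print(f"P({t}) =\n", np.round(expm(Q * t), 3))
```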

15.

Background

The reliability of whole-genome prediction (WGP) models based on high-density single nucleotide polymorphism (SNP) panels critically depends on proper specification of key hyperparameters. A currently popular WGP model, labeled BayesB, specifies a hyperparameter π that is loosely used to describe the proportion of SNPs that are in linkage disequilibrium (LD) with causal variants. The effects of the remaining markers are specified to be random draws from a Student t distribution, with key hyperparameters being the degrees of freedom v and scale s2.

Methods

We consider three alternative Markov chain Monte Carlo (MCMC) approaches based on the use of Metropolis-Hastings (MH) updates to estimate these key hyperparameters. The first approach, termed DFMH, is based on a previously published strategy in which s2 is drawn by a Gibbs step and v is drawn by a MH step. The second strategy, termed UNIMH, substitutes MH for Gibbs when drawing s2 and further collapses (marginalizes) the full conditional density of v. The third strategy, termed BIVMH, is based on jointly drawing the two hyperparameters in a bivariate MH step. We also tested the effect of misspecifying s2 on the accuracy of genomic estimated breeding values (GEBV), while still allowing inference on the other hyperparameters.

Results

The UNIMH and BIVMH strategies had significantly greater (P < 0.05) computational efficiencies for estimating v and s2 than DFMH in BayesA (π = 1) and BayesB implementations. We drew similar conclusions from an analysis of the public-domain heterogeneous stock mice data. We also observed significant drops (P < 0.01) in the accuracy of GEBV under BayesA when s2 was overspecified, whereas BayesB was more robust to such misspecification. However, understating s2 was compensated for by counterbalancing inferences on v in BayesA and BayesB, and on π in BayesB.

Conclusions

Sampling strategies based solely on MH updates of v and s2, together with collapsed representations of full conditional densities, can improve the computational efficiency of MCMC relative to the use of Gibbs updates. We believe that proper inference on s2, v and π is vital to ensure that the accuracy of GEBV is maximized when using parametric WGP models.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0092-x) contains supplementary material, which is available to authorized users.
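A hedged sketch of the BIVMH idea: a joint random-walk Metropolis-Hastings update of (v, s2) on the log scale, given current SNP-effect variances that are scaled inverse-chi-square(v, s2) a priori. The flat priors, step size, and simulated variances are illustrative assumptions, not the paper's specification:

```python
# Sketch: bivariate MH update of (v, s2) given scaled-inv-chi^2(v, s2) variances.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(5)

def log_target(v, s2, sig2):
    if v <= 0 or s2 <= 0:
        return -np.inf
    # sum of scaled-inverse-chi-square log densities (flat priors on v, s2 assumed)
    half_v = v / 2.0
    return np.sum(half_v * np.log(half_v * s2) - gammaln(half_v)
                  - (half_v + 1) * np.log(sig2) - half_v * s2 / sig2)

# fake "current" effect variances, as if drawn mid-MCMC with v=4, s2=0.1
sig2 = 0.1 * 4 / rng.chisquare(4, size=2000)

v, s2, step = 2.0, 1.0, 0.1
for it in range(5000):
    v_p = v * np.exp(step * rng.normal())        # multiplicative (log-scale) walk
    s2_p = s2 * np.exp(step * rng.normal())
    # Jacobian of the log-scale proposal contributes log(v_p*s2_p) - log(v*s2)
    log_acc = (log_target(v_p, s2_p, sig2) - log_target(v, s2, sig2)
               + np.log(v_p * s2_p) - np.log(v * s2))
    if np.log(rng.random()) < log_acc:
        v, s2 = v_p, s2_p
print("estimated v, s2:", round(v, 2), round(s2, 3))   # should land near 4, 0.1
```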

16.
A major reason for declines in agricultural productivity is farmers' failure to choose a crop suited to their soil. It is important for farmers to understand which crops suit different soil types based on their characteristics; given the vast variety of soil types worldwide, farmers often struggle to choose the most profitable crop for their land. To improve crop yields, a crop selection system was developed in which a deep neural network is tuned by GBRT-based surrogate optimization: a Gradient Boosted Regression Tree (GBRT) was combined with a Bayesian optimization (BO) algorithm to determine optimal hyperparameters for the deep neural network, which were then applied during the testing phase. Further, the impact of each input parameter on each output is evaluated using explainable artificial intelligence (XAI). The crop recommendation system comprises data preparation, classification, and performance evaluation modules, followed by classification assessment based on confusion and performance matrices and feature analysis using density and correlation plots. The crop selection system categorizes the experimental dataset into 12 classes, three for each of the four crops. The dataset includes soil-specific physical and chemical features: sand, silt, clay, pH, electrical conductivity (EC), soil organic carbon (SOC), nitrogen (N), phosphorus (P), and potassium (K). The developed surrogate model is reported to be highly accurate, precise, and reliable, with an F1-score of 1.0 for all classes in the dataset, indicating perfect precision and recall; the DNN-based classification model achieves an average classification accuracy of 1.00.
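A hedged sketch of GBRT-surrogate Bayesian optimization for network hyperparameters, using scikit-optimize's gbrt_minimize with a small scikit-learn MLP standing in for the paper's DNN. The search space, objective, and toy data are illustrative:

```python
# Sketch: tune DNN hyperparameters with a GBRT surrogate (scikit-optimize).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from skopt import gbrt_minimize
from skopt.space import Categorical, Integer, Real

X, y = make_classification(n_samples=500, n_features=9, n_informative=6,
                           n_classes=3, random_state=0)

space = [Integer(16, 128, name="width"),                      # neurons per layer
         Integer(1, 3, name="depth"),                         # hidden layers
         Real(1e-4, 1e-1, prior="log-uniform", name="alpha"), # L2 penalty
         Categorical(["relu", "tanh"], name="activation")]

def objective(params):
    width, depth, alpha, activation = params
    clf = MLPClassifier(hidden_layer_sizes=(width,) * depth, alpha=alpha,
                        activation=activation, max_iter=500, random_state=0)
    return -cross_val_score(clf, X, y, cv=3).mean()  # minimize negative CV accuracy

res = gbrt_minimize(objective, space, n_calls=25, random_state=0)
print("best params:", res.x, "best CV accuracy:", -res.fun)
```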

17.
We examine the degree to which fitting simple dynamic models to time series of population counts can predict extinction probabilities. This is both an active branch of ecological theory and an important practical topic for resource managers. We introduce an approach that is complementary to recently developed techniques for estimating extinction risks (e.g., diffusion approximations) and, like them, requires only count data rather than the detailed ecological information available for traditional population viability analyses. Assuming process error, we use four different models of population growth to generate snapshots of population dynamics via time series of the lengths commonly available to ecologists. We then ask to what extent we can identify which of several broad classes of population dynamics is evident in the time series snapshot. Along the way, we introduce the idea of "variation thresholds," which are the maximum amount of process error that a population may withstand and still have a specified probability of surviving for a given length of time. We then show how these thresholds may be useful to both ecologists and resource managers, particularly when dealing with large numbers of poorly understood species, a common problem faced by those designing biodiversity reserves.
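A minimal Monte Carlo sketch of the idea: simulate a population model with lognormal process error and record how often trajectories fall below a quasi-extinction threshold; scanning the process-error level then approximates a "variation threshold". The stochastic Ricker model and all parameter values are illustrative:

```python
# Sketch: extinction probability under process error in a stochastic Ricker model.
import numpy as np

rng = np.random.default_rng(1)

def extinction_prob(n0=50, r=0.1, K=100, sigma=0.3, horizon=50,
                    threshold=1.0, n_sims=5000):
    """Fraction of trajectories falling below `threshold` within `horizon` years."""
    extinct = 0
    for _ in range(n_sims):
        n = float(n0)
        for _ in range(horizon):
            n *= np.exp(r * (1 - n / K) + sigma * rng.normal())  # Ricker + process error
            if n < threshold:
                extinct += 1
                break
    return extinct / n_sims

# scanning sigma approximates a "variation threshold" for a chosen risk level
for sigma in (0.2, 0.4, 0.6, 0.8):
    print(sigma, extinction_prob(sigma=sigma))
```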

18.
Although it has been known for nearly a century that strains of Trypanosoma cruzi, the etiological agent of Chagas' disease, are enzootic in the southern U.S., much remains unknown about the dynamics of its transmission in the sylvatic cycles that maintain it, including the relative importance of different transmission routes. Mathematical models can fill in gaps where field and lab data are difficult to collect, but they require as inputs the values of certain key demographic and epidemiological quantities that parametrize the models. In particular, these quantities determine whether saturation occurs in the contact processes that communicate the infection between the two populations. Concentrating on raccoons, opossums, and woodrats as hosts in Texas and the southeastern U.S., and on the vectors Triatoma sanguisuga and Triatoma gerstaeckeri, we use an exhaustive literature review to derive estimates for fundamental parameters, and use simple mathematical models to illustrate a method for estimating infection rates indirectly from prevalence data. Results are used to draw conclusions about saturation and about which population density drives each of the two contact-based infection processes (stercorarian/bloodborne and oral). Analysis suggests that the vector feeding process associated with stercorarian transmission to hosts and bloodborne transmission to vectors is limited by the population density of vectors when dealing with woodrats, but by that of hosts when dealing with raccoons and opossums, while the predation of hosts on vectors that drives oral transmission to hosts is limited by the population density of hosts. Confidence in these conclusions is limited by a severe paucity of data underlying the associated parameter estimates, but the approaches developed here can also be applied to the study of other vector-borne infections.
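A minimal sketch of estimating an infection rate indirectly from prevalence: in a simple SI host-vector model, setting the host equation to equilibrium lets the transmission coefficient be solved from observed prevalences. All rates, densities, and prevalences below are illustrative assumptions, not the paper's estimates:

```python
# Sketch: back out a transmission coefficient from equilibrium prevalence.
# Host equation: dIh/dt = beta_hv * Iv * (Nh - Ih)/Nh - mu_h * Ih
# At equilibrium (dIh/dt = 0): beta_hv = mu_h * Ih* * Nh / (Iv* * (Nh - Ih*))
mu_h = 1.0 / (2 * 365)       # host infectious-state turnover, per day (assumed)
Nh, Nv = 50.0, 500.0         # host and vector densities (assumed)
prev_h, prev_v = 0.30, 0.55  # observed prevalences (illustrative)
Ih_eq, Iv_eq = prev_h * Nh, prev_v * Nv

beta_hv = mu_h * Ih_eq * Nh / (Iv_eq * (Nh - Ih_eq))
print(f"implied host infection coefficient: {beta_hv:.2e} per vector per day")
```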

19.
Abstract

The growth of population and income throughout the world has increased the demand for ocean-based resources. At the same time, the rapidly evolving technology of ocean resource exploitation has permitted ever greater use of these resources. As the rate of use of these resources has grown, so has the desirability of better management of them. Today, national income accounts provide the basic economic data required to implement policies designed to stabilize and assist in planning for economic growth. It is possible to reconstruct these accounts and create an ocean sector account from them. This process will provide a set of measurements that will permit evaluation of the trade-offs between alternative uses, and the construction of economic models that will link the ocean sector to national economies.

20.
The use of mathematical models to study cardiac electrophysiology has a long history, and numerous cellular-scale models are now available, covering a range of species and cell types. Their use to study emergent properties in tissue is also widespread, typically using the monodomain or bidomain equations coupled to one or more cell models. Despite the relative maturity of this field, little has been written looking in detail at the interface between the cellular and tissue-level models. Mathematically this is relatively straightforward and well-defined. There are, however, many details and potential inconsistencies that need to be addressed in order to ensure correct operation of a cellular model within a tissue simulation. This paper will describe these issues and how to address them. Simply having models available in a common format such as CellML is still of limited utility, with significant manual effort being required to integrate these models within a tissue simulation. We will thus also discuss the facilities available for automating this in a consistent fashion within Chaste, our robust and high-performance cardiac electrophysiology simulator. It will be seen that a common theme arising is the need to go beyond a representation of the model mathematics in a standard language, to include additional semantic information required in determining the model's interface, and hence to enhance interoperability. Such information can be added as metadata, but agreement is needed on the terms to use, including development of appropriate ontologies, if reliable automated use of CellML models is to become common.
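To make the cell/tissue interface concrete, here is a minimal sketch of an explicit finite-difference monodomain simulation coupled to a FitzHugh-Nagumo cell model standing in for a detailed ionic model. This is not Chaste code; the grid, time step, and parameters are illustrative:

```python
# Sketch: 1D monodomain cable with a FitzHugh-Nagumo "cell model" reaction term.
# The cell model supplies the ionic term; the tissue model supplies the diffusion.
import numpy as np

nx, dx, dt = 200, 0.1, 0.01        # cable discretization (illustrative)
D = 0.1                            # effective diffusion = conductivity / (chi * Cm)
v = np.full(nx, -1.2)              # transmembrane potential (dimensionless FHN units)
w = np.full(nx, -0.6)              # recovery variable
v[:10] = 1.0                       # stimulus at the left end

for _ in range(2000):
    lap = np.empty(nx)
    lap[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
    lap[0] = 2 * (v[1] - v[0]) / dx**2        # no-flux (Neumann) boundaries
    lap[-1] = 2 * (v[-2] - v[-1]) / dx**2
    dv = v - v**3 / 3 - w                     # FHN ionic (reaction) term
    dw = 0.08 * (v + 0.7 - 0.8 * w)
    v += dt * (D * lap + dv)
    w += dt * dw

print("max potential and its position:", v.max(), int(np.argmax(v)))
```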
