Similar Articles
1.
Species distributions are known to be limited by biotic and abiotic factors at multiple temporal and spatial scales. Species distribution models (SDMs), however, frequently assume a population at equilibrium in both time and space. Studies of habitat selection have repeatedly shown the difficulty of estimating resource selection if the scale or extent of analysis is incorrect. Here, we present a multi-step approach to estimate the realized and potential distribution of the endangered giant kangaroo rat. First, we estimate the potential distribution by modeling suitability at a range-wide scale using static bioclimatic variables. We then examine annual changes in extent at the population level. We define “available” habitat based on the total suitable potential distribution at the range-wide scale. Then, within the available habitat, we model changes in population extent driven by multiple measures of resource availability. By modeling distributions for a population with robust estimates of population extent through time, and with ecologically relevant predictor variables, we improved the predictive ability of SDMs and revealed an unanticipated relationship between population extent and precipitation at multiple scales. At the range-wide scale, the best model indicated that the giant kangaroo rat was limited to areas that received little to no precipitation in the summer months. In contrast, the best model for shorter time scales showed a positive relationship with resource abundance, driven by precipitation, in the current and previous year. These results suggest that the distribution of the giant kangaroo rat was limited to the wettest parts of the drier areas within the study region. This multi-step approach reinforces the differing relationships species may have with environmental variables at different scales, provides a novel method for defining “available” habitat in habitat selection studies, and suggests a way to create distribution models at spatial and temporal scales relevant to theoretical and applied ecologists.

2.
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability.  相似文献   

3.

Background

Since both the number of SNPs (single nucleotide polymorphisms) used in genomic prediction and the number of individuals used in training datasets are rapidly increasing, there is an increasing need to improve the efficiency of genomic prediction models in terms of computing time and memory (RAM) required.

Methods

In this paper, two alternative algorithms for genomic prediction are presented that replace the originally suggested residual updating algorithm, without affecting the estimates. The first alternative algorithm continues to use residual updating, but takes advantage of the characteristic that the predictor variables in the model (i.e. the SNP genotypes) take only three different values, and is therefore termed “improved residual updating”. The second alternative algorithm, here termed “right-hand-side updating” (RHS-updating), extends the idea of improved residual updating across multiple SNPs. The alternative algorithms can be implemented for a range of different genomic prediction models, including random regression BLUP (best linear unbiased prediction) and most Bayesian genomic prediction models. To test the required computing time and RAM, both alternative algorithms were implemented in a Bayesian stochastic search variable selection model.

Results

Compared to the original algorithm, the improved residual updating algorithm reduced CPU time by 35.3 to 43.3%, without changing memory requirements. The RHS-updating algorithm reduced CPU time by 74.5 to 93.0% and memory requirements by 13.1 to 66.4% compared to the original algorithm.

Conclusions

The presented RHS-updating algorithm provides an interesting alternative to reduce both computing time and memory requirements for a range of genomic prediction models.  相似文献   
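The gain from the three-valued genotype coding is easy to illustrate. Below is a minimal sketch (not the implementation evaluated above; the Gibbs-style right-hand-side computation and all names are illustrative assumptions): the dot product x_j'e collapses to sums of residuals within genotype classes, and residual updates only touch individuals carrying genotype 1 or 2.

```python
import numpy as np

def residual_update_naive(x_j, e, old_effect, new_effect):
    """Original residual updating: full dot products over all n individuals."""
    rhs = x_j @ (e + x_j * old_effect)          # x_j' (e + x_j * a_old)
    e -= x_j * (new_effect - old_effect)        # adjust residuals for the new effect
    return rhs, e

def residual_update_improved(idx1, idx2, e, old_effect, new_effect):
    """Exploit that SNP genotypes take only the values {0, 1, 2}:
    x_j' e becomes a sum of residuals per genotype class, and only the
    individuals carrying genotype 1 or 2 need their residuals adjusted."""
    s1, s2 = e[idx1].sum(), e[idx2].sum()
    xtx = len(idx1) + 4 * len(idx2)             # x_j' x_j
    rhs = (s1 + 2.0 * s2) + xtx * old_effect
    delta = new_effect - old_effect
    e[idx1] -= delta
    e[idx2] -= 2.0 * delta
    return rhs, e

# Toy check that both routes give the same right-hand side
rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=10_000).astype(float)
e = rng.normal(size=10_000)
idx1, idx2 = np.where(x == 1)[0], np.where(x == 2)[0]
rhs_a, _ = residual_update_naive(x, e.copy(), 0.1, 0.2)
rhs_b, _ = residual_update_improved(idx1, idx2, e.copy(), 0.1, 0.2)
print(np.isclose(rhs_a, rhs_b))
```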

4.
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Finally, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We applied our method to five different datasets and compared it to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.
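As a rough illustration of this style of information-theoretic regulator selection, the sketch below greedily picks regulators for one target gene using a generic mutual-information "relevance minus redundancy" score. This mRMR-like criterion is an assumption standing in for the relevance and significance measures actually defined by MRMSn.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(x, bins=8):
    """Bin a continuous expression profile so mutual information can be estimated."""
    return np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))

def select_regulators(expr, target_idx, k=3):
    """Greedy first-order incremental selection of candidate regulators for one target
    gene, scoring by mutual information with the target (relevance) minus average
    mutual information with already chosen regulators (redundancy).
    NOT the exact relevance/significance criteria of MRMSn; a generic illustration."""
    n_genes = expr.shape[0]
    disc = [discretize(expr[g]) for g in range(n_genes)]
    target = disc[target_idx]
    chosen, candidates = [], [g for g in range(n_genes) if g != target_idx]
    for _ in range(k):
        def score(g):
            relevance = mutual_info_score(disc[g], target)
            redundancy = (np.mean([mutual_info_score(disc[g], disc[c]) for c in chosen])
                          if chosen else 0.0)
            return relevance - redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy expression matrix: 20 genes x 50 samples; gene 0 is driven by genes 1 and 2
rng = np.random.default_rng(5)
expr = rng.normal(size=(20, 50))
expr[0] = 0.7 * expr[1] + 0.7 * expr[2] + 0.3 * rng.normal(size=50)
print(select_regulators(expr, target_idx=0))
```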

5.
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or “breeding” values of individuals are generated by substitution effects, which involve both “biological” additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the “genotypic” value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.  相似文献   
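To make the two relationship matrices concrete, here is a minimal sketch that builds an additive matrix G and a dominance matrix D from a 0/1/2 genotype matrix. The centering and scaling follow one common parameterization (a VanRaden-type G and a dominance-deviation coding of genotypes); treat the exact formulas as illustrative assumptions rather than the study's code.

```python
import numpy as np

def genomic_relationship_matrices(M):
    """M: (n_individuals x n_markers) genotypes coded 0/1/2 (copies of one allele).
    Returns additive (G) and dominance (D) genomic relationship matrices under a
    common parameterization; a sketch, not the paper's exact implementation."""
    p = M.mean(axis=0) / 2.0                      # allele frequency of the counted allele
    q = 1.0 - p
    Z = M - 2.0 * p                               # centered additive coding
    G = Z @ Z.T / np.sum(2.0 * p * q)
    # Dominance-deviation coding: 2 copies -> -2q^2, heterozygote -> 2pq, 0 copies -> -2p^2
    W = np.where(M == 2, -2.0 * q**2,
         np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    D = W @ W.T / np.sum((2.0 * p * q) ** 2)
    return G, D

# Toy usage: 5 individuals x 8 markers
rng = np.random.default_rng(6)
M = rng.integers(0, 3, size=(5, 8))
G, D = genomic_relationship_matrices(M)
print(np.diag(G).round(2), np.diag(D).round(2))
```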

6.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with the current reference method in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models, including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between the Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
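The composite-likelihood idea itself is compact: treat the SFS entries as multinomial counts and score a candidate demographic model by the log-probability of the observed counts under the entry probabilities expected for that model (obtained, e.g., from coalescent simulations). The sketch below uses invented numbers and is not the authors' software.

```python
import numpy as np

def sfs_composite_log_likelihood(observed_sfs, expected_sfs):
    """Multinomial composite log-likelihood of an observed site frequency spectrum (SFS)
    given expected entry probabilities for a candidate demographic model."""
    obs = np.asarray(observed_sfs, dtype=float)
    exp_p = np.asarray(expected_sfs, dtype=float)
    exp_p = np.clip(exp_p / exp_p.sum(), 1e-12, None)   # normalize, avoid log(0)
    mask = obs > 0                                       # entries with observed SNPs
    return np.sum(obs[mask] * np.log(exp_p[mask]))

# Example: compare two candidate models by their composite likelihoods (toy numbers)
obs = np.array([620, 310, 180, 120, 90, 70])             # hypothetical folded SFS counts
model_a = np.array([0.45, 0.22, 0.13, 0.09, 0.06, 0.05]) # expected proportions, model A
model_b = np.array([0.30, 0.25, 0.18, 0.12, 0.09, 0.06]) # expected proportions, model B
print(sfs_composite_log_likelihood(obs, model_a),
      sfs_composite_log_likelihood(obs, model_b))
```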

7.
We have investigated simulation-based techniques for parameter estimation in chaotic intercellular networks. The proposed methodology combines a synchronization-based framework for parameter estimation in coupled chaotic systems with state-of-the-art computational inference methods borrowed from the field of computational statistics. The first method is a stochastic optimization algorithm, known as the accelerated random search method, and the other two techniques are based on approximate Bayesian computation (ABC). The latter is a general methodology for non-parametric inference that can be applied to practically any system of interest. The first ABC-based method is a Markov chain Monte Carlo scheme that generates a series of random parameter realizations for which a low synchronization error is guaranteed. We show that accurate parameter estimates can be obtained by averaging over these realizations. The second ABC-based technique is a sequential Monte Carlo scheme. The algorithm generates a sequence of “populations”, i.e., sets of randomly generated parameter values, where the members of each population attain a synchronization error lower than the error attained by members of the previous population. Again, we show that accurate estimates can be obtained by averaging over the parameter values in the last population of the sequence. We have analysed how effective these methods are from a computational perspective. For the numerical simulations we have considered a network that consists of two modified repressilators with identical parameters, coupled by the fast diffusion of the autoinducer across the cell membranes.
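The sequential ABC scheme can be sketched as a loop over "populations" with a shrinking tolerance. To keep the example runnable, a trivial Gaussian toy model and a summary-statistic distance stand in for the coupled repressilator network and its synchronization error, and the importance weights of a full ABC-SMC algorithm are omitted; all of these simplifications are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(mu, n=100):
    """Toy model standing in for the coupled-repressilator network: i.i.d. Gaussian draws."""
    return rng.normal(loc=mu, scale=1.0, size=n)

def distance(sim, obs):
    """Summary-statistic distance; in the paper this role is played by the
    synchronization error between model and data trajectories."""
    return abs(sim.mean() - obs.mean())

observed = simulate(2.0)                                 # "data" generated with true mu = 2.0

# Simplified sequential ABC: each new population must attain a smaller tolerance
# than the previous one; proposals perturb members of the previous population.
population = rng.uniform(-5.0, 5.0, size=200)            # draws from the prior
for eps in [1.0, 0.5, 0.2, 0.1]:
    accepted = []
    while len(accepted) < 200:
        theta = rng.choice(population) + rng.normal(0.0, 0.2)
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    population = np.array(accepted)

print("posterior mean estimate:", population.mean())
```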

8.
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework known as approximate Bayesian computation (ABC) is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduce population annealing, one of the population Monte Carlo algorithms. Although population annealing is usually used in statistical mechanics, we show that it can be used to compute Bayesian posterior distributions in the ABC framework. To deal with the non-identifiability of representative parameter values, we propose running simulations with a parameter ensemble sampled from the posterior distribution, which we call the “posterior parameter ensemble”. We show that population annealing is an efficient and convenient algorithm for generating the posterior parameter ensemble. We also show that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict data that were not used for parameter inference. Lastly, we introduce the marginal likelihood in the ABC framework for Bayesian model selection. We show that population annealing enables us to compute the marginal likelihood in the ABC framework and to conduct model selection based on the Bayes factor.
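The "posterior parameter ensemble" idea amounts to propagating every sampled parameter vector through the model and summarizing the resulting trajectories, instead of committing to a single representative value. In the sketch below, a toy decay model and a synthetic ensemble stand in for the systems-biology model and the population-annealing output; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_model(theta, t):
    """Toy dynamical model standing in for the systems-biology model:
    exponential decay with rate theta[0] and initial value theta[1]."""
    return theta[1] * np.exp(-theta[0] * t)

# Hypothetical posterior parameter ensemble (e.g., produced by ABC population annealing)
ensemble = np.column_stack([rng.normal(0.8, 0.05, 500),    # decay-rate draws
                            rng.normal(10.0, 0.5, 500)])   # initial-value draws

t = np.linspace(0, 5, 50)
trajectories = np.array([simulate_model(th, t) for th in ensemble])

# Ensemble predictions: median and 95% band at each time point, covering both the
# fitted regime and extrapolation beyond the data used for inference.
median = np.median(trajectories, axis=0)
lower, upper = np.percentile(trajectories, [2.5, 97.5], axis=0)
print(median[:5].round(2), lower[:5].round(2), upper[:5].round(2))
```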

9.

Background

Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data.

Methodology/Principal Findings

PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population Fst values as low as 0.03 (G'st > 0.2), whereas the limit of resolution of the Bayesian approach was Fst = 0.05 (G'st > 0.35).

Conclusions/Significance

We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies.  相似文献   
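The first stage of such an approach, principal coordinate analysis followed by a density estimate over the retained axes, can be sketched in a few lines. The valley-seeking step that actually splits the density landscape into subpopulations is omitted here, and the toy distance matrix is invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def principal_coordinates(dist, n_axes=None):
    """Classical principal coordinate analysis (metric MDS) from a pairwise distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J                 # double-centered matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-9                             # retain axes with positive eigenvalues
    coords = vecs[:, keep] * np.sqrt(vals[keep])
    return coords if n_axes is None else coords[:, :n_axes]

# Hypothetical genetic distance matrix for 6 individuals forming two obvious groups
d = np.array([[0, 1, 1, 5, 5, 5],
              [1, 0, 1, 5, 5, 5],
              [1, 1, 0, 5, 5, 5],
              [5, 5, 5, 0, 1, 1],
              [5, 5, 5, 1, 0, 1],
              [5, 5, 5, 1, 1, 0]], dtype=float)
coords = principal_coordinates(d, n_axes=2)
# Density landscape over the retained axes; the valley-seeking clustering is omitted.
density = gaussian_kde(coords.T)(coords.T)
print(coords.round(2), density.round(3))
```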

10.
Delineating microbial populations, discovering ecologically relevant phenotypes and identifying migrants, hybrids or admixed individuals have long proved notoriously difficult, thereby limiting our understanding of the evolutionary forces at play during the diversification of microbial species. However, recent advances in sequencing and computational methods have enabled an unbiased approach whereby incipient species and the genetic correlates of speciation can be identified by examining patterns of genomic variation within and between lineages. We present here a population genomic study of a phylogenetic species in the Neurospora discreta species complex, based on the resequencing of full genomes (~37 Mb) for 52 fungal isolates from nine sites in three continents. Population structure analyses revealed two distinct lineages in South–East Asia, and three lineages in North America/Europe with a broad longitudinal and latitudinal range and limited admixture between lineages. Genome scans for selective sweeps and comparisons of the genomic landscapes of diversity and recombination provided no support for a role of selection at linked sites on genomic heterogeneity in levels of divergence between lineages. However, demographic inference indicated that the observed genomic heterogeneity in divergence was generated by varying rates of gene flow between lineages following a period of isolation. Many putative cases of exchange of genetic material between phylogenetically divergent fungal lineages have been discovered, and our work highlights the quantitative importance of genetic exchanges between more closely related taxa to the evolution of fungal genomes. Our study also supports the role of allopatric isolation as a driver of diversification in saprobic microbes.  相似文献   

11.
Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman’s coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright–Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1–0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.  相似文献   

12.
A growing variety of “genotype-by-sequencing” (GBS) methods use restriction enzymes and high throughput DNA sequencing to generate data for a subset of genomic loci, allowing the simultaneous discovery and genotyping of thousands of polymorphisms in a set of multiplexed samples. We evaluated a “double-digest” restriction-site associated DNA sequencing (ddRAD-seq) protocol by 1) comparing results for a zebra finch (Taeniopygia guttata) sample with in silico predictions from the zebra finch reference genome; 2) assessing data quality for a population sample of indigobirds (Vidua spp.); and 3) testing for consistent recovery of loci across multiple samples and sequencing runs. Comparison with in silico predictions revealed that 1) over 90% of predicted, single-copy loci in our targeted size range (178–328 bp) were recovered; 2) short restriction fragments (38–178 bp) were carried through the size selection step and sequenced at appreciable depth, generating unexpected but nonetheless useful data; 3) amplification bias favored shorter, GC-rich fragments, contributing to among locus variation in sequencing depth that was strongly correlated across samples; 4) our use of restriction enzymes with a GC-rich recognition sequence resulted in an up to four-fold overrepresentation of GC-rich portions of the genome; and 5) star activity (i.e., non-specific cutting) resulted in thousands of “extra” loci sequenced at low depth. Results for three species of indigobirds show that a common set of thousands of loci can be consistently recovered across both individual samples and sequencing runs. In a run with 46 samples, we genotyped 5,996 loci in all individuals and 9,833 loci in 42 or more individuals, resulting in <1% missing data for the larger data set. We compare our approach to similar methods and discuss the range of factors (fragment library preparation, natural genetic variation, bioinformatics) influencing the recovery of a consistent set of loci among samples.  相似文献   
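An in silico double digest of a reference sequence, of the kind used for the comparison above, reduces to locating the two recognition sequences and counting fragments in the target size window that are flanked by one cut site of each enzyme. In the sketch below only the 178–328 bp window comes from the study; the enzymes, recognition sequences, and toy genome are assumptions.

```python
import re
import random

def double_digest_fragments(seq, site_a="CCTGCAGG", site_b="GCGC",
                            min_len=178, max_len=328):
    """Count in silico double-digest fragments within a target size range that are
    flanked by one recognition site of each enzyme (the fragments retained by ddRAD-seq).
    Recognition-site positions approximate cut positions; SbfI (CCTGCAGG) and the second
    4-cutter are illustrative choices, not necessarily the enzymes used in the study."""
    cuts = []
    for site, label in [(site_a, "A"), (site_b, "B")]:
        cuts += [(m.start(), label) for m in re.finditer(site, seq)]
    cuts.sort()
    kept = 0
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        length = pos2 - pos1
        if enz1 != enz2 and min_len <= length <= max_len:
            kept += 1
    return kept

# Usage on a toy sequence (in practice: each chromosome of the reference genome)
random.seed(0)
toy_genome = "".join(random.choice("ACGT") for _ in range(1_000_000))
print(double_digest_fragments(toy_genome))
```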

13.
Rapid and inexpensive sequencing technologies are making it possible to collect whole genome sequence data on multiple individuals from a population. This type of data can be used to quickly identify genes that control important ecological and evolutionary phenotypes by finding the targets of adaptive natural selection, and we therefore refer to such approaches as "reverse ecology." To quantify the power gained in detecting positive selection using population genomic data, we compare three statistical methods for identifying targets of selection: the McDonald-Kreitman test, the mkprf method, and a likelihood implementation for detecting d(N)/d(S) > 1. Because the first two methods use polymorphism data we expect them to have more power to detect selection. However, when applied to population genomic datasets from human, fly, and yeast, the tests using polymorphism data were actually weaker in two of the three datasets. We explore reasons why the simpler comparative method has identified more genes under selection, and suggest that the different methods may really be detecting different signals from the same sequence data. Finally, we find several statistical anomalies associated with the mkprf method, including an almost linear dependence between the number of positively selected genes identified and the prior distributions used. We conclude that interpreting the results produced by this method should be done with some caution.  相似文献   
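For reference, the McDonald-Kreitman test itself is a 2×2 contingency test of fixed differences versus polymorphisms, split into nonsynonymous and synonymous classes; a minimal version, with hypothetical counts, is sketched below.

```python
from scipy.stats import fisher_exact

def mcdonald_kreitman(dn, ds, pn, ps):
    """McDonald-Kreitman 2x2 test: fixed differences vs. polymorphisms,
    nonsynonymous vs. synonymous. Returns the odds ratio, p-value, and
    alpha = 1 - NI, where NI = (Pn/Ps) / (Dn/Ds) is the neutrality index and
    alpha estimates the fraction of substitutions driven by positive selection."""
    table = [[dn, ds],   # fixed differences between species
             [pn, ps]]   # polymorphisms within species
    odds, p = fisher_exact(table)
    ni = (pn / ps) / (dn / ds)
    return odds, p, 1.0 - ni

# Hypothetical counts for one gene
print(mcdonald_kreitman(dn=20, ds=10, pn=5, ps=15))
```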

14.
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or “haplotypes.” However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.  相似文献   

15.
Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring large computational time for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers comparable local ancestries, as the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs.  相似文献   
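The intuition behind CSV-based local ancestry can be conveyed with a deliberately simplified sketch: within each genomic window, tally the continent-specific alleles carried by the admixed genome for each reference ancestry and assign the window to the majority ancestry. This is a schematic illustration only, not the authors' algorithm, and all inputs are toy data.

```python
import numpy as np

def window_ancestry(genotypes, csv_ancestry, positions, window=1_000_000):
    """Schematic local-ancestry call from continent-specific variants (CSVs):
    per window, count CSV alleles carried by the admixed genome for each reference
    ancestry and assign the window to the majority ancestry."""
    ancestries = sorted(set(csv_ancestry))
    calls = {}
    for start in range(0, int(positions.max()) + 1, window):
        in_win = (positions >= start) & (positions < start + window)
        counts = {a: genotypes[in_win & (csv_ancestry == a)].sum() for a in ancestries}
        calls[start] = max(counts, key=counts.get) if any(counts.values()) else None
    return calls

# Toy example: 1 = the CSV allele is carried at that site; each CSV tags one ancestry
pos = np.array([10_000, 500_000, 900_000, 1_200_000, 1_800_000])
geno = np.array([1, 1, 0, 1, 1])                       # CSV alleles in the admixed sample
anc = np.array(["AFR", "AFR", "EUR", "EUR", "EUR"])    # ancestry tagged by each CSV
print(window_ancestry(geno, anc, pos))
```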

16.
The idea that populations are spatially structured has become a very powerful concept in ecology, raising interest in many research areas. However, despite dispersal being a core component of the concept, it typically does not consider the movement behavior underlying any dispersal. Using individual-based simulations in continuous space, we explored the emergence of a spatially structured population in landscapes with spatially heterogeneous resource distribution and with organisms following simple area-concentrated search (ACS); individuals do not, however, perceive or respond to any habitat attributes per se but only to their foraging success. We investigated the effects of different resource clustering patterns in landscapes (single large cluster vs. many small clusters) and of different resource densities on the spatial structure of populations and on the movement of individuals between resource clusters. We found that foraging success increased with increasing resource density and decreasing number of resource clusters. In a wide parameter space, the system exhibited attributes of a spatially structured population, with individuals concentrated in areas of high resource density, searching within areas of resources, and “dispersing” in straight lines between resource patches. “Emigration” was more likely from patches that were small or of low quality (low resource density), but we observed an interaction effect between these two parameters. With the ACS implemented, individuals tended to move deeper into a resource cluster in scenarios with moderate resource density than in scenarios with high resource density. “Looping” from patches was more likely if patches were large and of high quality. Our simulations demonstrate that spatial structure in populations may emerge if critical resources are heterogeneously distributed and if individuals follow simple movement rules (such as ACS). Neither the perception of habitat nor an explicit decision to emigrate from a patch on the part of the acting individuals is necessary for the emergence of such spatial structure.
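A minimal version of such an individual-based ACS rule is easy to write down: the walker never perceives habitat directly; it only turns more sharply after recent foraging success and moves straighter after failure. The landscape, memory rule, and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def area_concentrated_search(resource_map, steps=500, step_len=1.0):
    """Minimal area-concentrated search (ACS) walker: turning-angle variability
    increases with recent foraging success, so successful foragers stay local,
    while unsuccessful ones move in near-straight lines between patches."""
    size = resource_map.shape[0]
    x, y = size / 2.0, size / 2.0
    heading = rng.uniform(0, 2 * np.pi)
    recent_success = 0.0
    path = []
    for _ in range(steps):
        cell = resource_map[int(y) % size, int(x) % size]
        found = rng.random() < cell                            # food found in this cell?
        recent_success = 0.9 * recent_success + 0.1 * found    # memory of foraging success
        turn_sd = 0.2 + 2.5 * recent_success                   # success -> sharper turning
        heading += rng.normal(0, turn_sd)
        x = (x + step_len * np.cos(heading)) % size
        y = (y + step_len * np.sin(heading)) % size
        path.append((x, y))
    return np.array(path)

# Landscape with a single central resource cluster (cell values = food probability)
land = np.zeros((100, 100))
land[40:60, 40:60] = 0.6
print(area_concentrated_search(land)[-5:].round(1))
```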

17.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is favorably tested in two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.  相似文献   

18.
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.  相似文献   
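In the same likelihood-free spirit, a small regression network can be trained to map summary statistics of simulated data onto the generating parameter. The simulator, the four summary statistics, and the network size below are toy placeholders, not the architecture or statistics used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)

def summaries(sample):
    """Toy summary statistics (stand-ins for the hundreds of correlated statistics
    the paper feeds to the network)."""
    return [sample.mean(), sample.var(), np.median(sample), (sample > 0).mean()]

def simulate(theta, n=200):
    """Placeholder simulator whose output depends on the parameter of interest
    (in the paper: coalescent simulations under selection and demography)."""
    return rng.normal(loc=theta, scale=1.0 + 0.5 * abs(theta), size=n)

# Train a multilayer network to map summary statistics -> parameter (likelihood-free inference)
thetas = rng.uniform(-2, 2, size=3000)
X = np.array([summaries(simulate(t)) for t in thetas])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X, thetas)

# "Observed" data with unknown parameter; the network returns a point estimate
observed = simulate(0.7)
print(net.predict([summaries(observed)]))
```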

19.
Evolutionary conflicts of interest arise whenever genetically different individuals interact and their routes to fitness maximization differ. Sexual selection favors traits that increase an individual’s competitiveness to acquire mates and fertilizations. Sexual conflict occurs if an individual of sex A’s relative fitness would increase if it had a “tool” that could alter what an individual of sex B does (including the parental genes transferred), at a cost to B’s fitness. This definition clarifies several issues: Conflict is very common and, although it extends outside traits under sexual selection, sexual selection is a ready source of sexual conflict. Sexual conflict and sexual selection should not be presented as alternative explanations for trait evolution. Conflict is closely linked to the concept of a lag load, which is context-dependent and sex-specific. This makes it possible to ask if one sex can “win.” We expect higher population fitness if females win.Many published studies ask if sexual selection or sexual conflict drives the evolution of key reproductive traits (e.g., mate choice). Here we argue that this is an inappropriate question. By analogy, G. Evelyn Hutchinson (1965) coined the phrase “the ecological theatre and the evolutionary play” to capture how factors that influence the birth, death, and reproduction of individuals (studied by ecologists) determine which individuals reproduce, and “sets the stage” for the selective forces that drive evolutionary trajectories (studied by evolutionary biologists). The more modern concept of “eco-evolutionary feedback” (Schoener 2011) emphasizes that selection changes the character of the actors over time, altering their ecological interactions. No one would sensibly ask whether one or the other shapes the natural world, when obviously both interact to determine the outcome.So why have sexual conflict and sexual selection sometimes been elevated to alternate explanations? This approach is often associated with an assumption that sexual conflict affects traits under direct selection, favoring traits that alter the likelihood of a potential mate agreeing or refusing to mate because it affects the bearer’s immediate reproductive output, whereas “traditional” sexual selection is assumed to favor traits that are under indirect selection because they increase offspring fitness. These “traditional” models are sometimes described as “mutualistic” (e.g., Pizzari and Snook 2003; Rice et al. 2006), although this term appears to be used only when contrasting them with sexual conflict models. The investigators of the original models never describe them as “mutualistic,” which is hardly surprising given that some males are rejected by females.In this review, we first define sexual conflict and sexual selection. We then describe how the notion of a “lag load” can reveal which sex currently has greater “power” in a sexual conflict over a specific resource. Next, we discuss why sexual conflict and sexual selection are sometimes implicitly (or explicitly) presented as alternative explanations for sexual traits (usually female mate choice/resistance). To illustrate the problems with the assumptions made to take this stance, we present a “toy model” of snake mating behavior based on a study by Shine et al. (2005). We show that empirical predictions about the mating behavior that will be observed if females seek to minimize direct cost of mating or to obtain indirect genetic benefits were overly simplistic. 
This allows us to make the wider point that whom a female is willing to mate with and how often she mates are often related questions. Finally, we discuss the effect of sexual conflict on population fitness.  相似文献   

20.
Modern experimental strategies often generate genome-scale measurements of human tissues or cell lines in various physiological states. Investigators often use these datasets individually to help elucidate molecular mechanisms of human diseases. Here we discuss approaches that effectively weight and integrate hundreds of heterogeneous datasets into gene-gene networks that focus on a specific process or disease. Diverse and systematic genome-scale measurements provide such approaches with both a great deal of power and a number of challenges. We discuss some such challenges as well as methods to address them. We also raise important considerations for the assessment and evaluation of such approaches. When carefully applied, these integrative data-driven methods can make novel high-quality predictions that can transform our understanding of the molecular basis of human disease.

What to Learn in This Chapter

  • What a functional relationship network represents.
  • The fundamentals of Bayesian inference for genomic data integration (a minimal sketch follows below).
  • How to build a network of functional relationships between genes using examples of functionally related genes and diverse experimental data.
  • How computational scientists study disease using data driven approaches, such as integrated networks of protein-protein functional relationships.
  • Strategies to assess predictions from a functional relationship network.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
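The Bayesian integration mentioned in the list above is often implemented naive-Bayes style: start from the prior odds that a gene pair is functionally related and multiply in one likelihood ratio per dataset and evidence level. The datasets, evidence levels, and likelihood ratios in this sketch are invented; in practice they would be estimated from a gold standard of functionally related pairs.

```python
import numpy as np

def functional_relationship_posterior(evidence, likelihood_ratios, prior_odds=0.01):
    """Naive-Bayes style integration of heterogeneous evidence for one gene pair.
    For each dataset d with observed evidence level e, multiply the prior odds of a
    functional relationship by P(e | related) / P(e | unrelated)."""
    log_odds = np.log(prior_odds)
    for dataset, level in evidence.items():
        log_odds += np.log(likelihood_ratios[dataset][level])
    odds = np.exp(log_odds)
    return odds / (1.0 + odds)          # posterior probability of a functional link

# Illustrative likelihood ratios per dataset and discretized evidence level
lrs = {
    "coexpression":   {"low": 0.5, "medium": 2.0, "high": 8.0},
    "ppi_experiment": {"absent": 0.8, "present": 20.0},
    "shared_domain":  {"no": 0.9, "yes": 5.0},
}
pair_evidence = {"coexpression": "high", "ppi_experiment": "present", "shared_domain": "no"}
print(functional_relationship_posterior(pair_evidence, lrs))
```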

