20 similar articles found (search time: 0 ms)
1.
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of “soft shoulders” underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.
2.
3.
4.
Elizabeth Redman Fiona Whitelaw Andrew Tait Charlotte Burgess Yvonne Bartley Philip John Skuce Frank Jackson John Stuart Gilleard 《PLoS neglected tropical diseases》2015,9(2)
Anthelmintic resistance is a major problem for the control of parasitic nematodes of livestock and of growing concern for human parasite control. However, there is little understanding of how resistance arises and spreads or of the “genetic signature” of selection for this group of important pathogens. We have investigated these questions in the system for which anthelmintic resistance is most advanced: benzimidazole resistance in the sheep parasites Haemonchus contortus and Teladorsagia circumcincta. Population genetic analysis with neutral microsatellite markers reveals that T. circumcincta has higher genetic diversity but lower genetic differentiation between farms than H. contortus in the UK. We propose that this is due to epidemiological differences between the two parasites resulting in greater seasonal bottlenecking of H. contortus. There is a remarkably high level of resistance haplotype diversity in both parasites compared with drug resistance studies in other eukaryotic systems. Our analysis suggests a minimum of four independent origins of resistance mutations on just seven farms for H. contortus, and even more for T. circumcincta. Both hard and soft selective sweeps have occurred with striking differences between individual farms. The sweeps are generally softer for T. circumcincta than H. contortus, consistent with its higher level of genetic diversity and consequent greater availability of new mutations. We propose a model in which multiple independent resistance mutations recurrently arise and spread by migration to explain the widespread occurrence of resistance in these parasites. Finally, in spite of the complex haplotypic diversity, we show that selection can be detected at the target locus using simple measures of genetic diversity and departures from neutrality.
This work has important implications for the application of genome-wide approaches to identify new anthelmintic resistance loci and the likelihood of anthelmintic resistance emerging as selection pressure is increased in human soil-transmitted nematodes by community wide treatment programs.
5.
6.
Gad Abraham Jason A. Tye-Din Oneil G. Bhalala Adam Kowalczyk Justin Zobel Michael Inouye 《PLoS genetics》2014,10(2)
Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
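The abstract above notes that the GRS can be tuned by moving a threshold cut-off and that a low cut-off yields a high negative predictive value. The following is a minimal sketch of that relationship, not the authors' pipeline: the function names and the toy scores are hypothetical, and any real GRS would come from a fitted model rather than a hand-written list.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def negative_predictive_value(scores, labels, threshold):
    """NPV = TN / (TN + FN): the fraction of negative calls that are truly negative."""
    _, _, tn, fn = confusion_at_threshold(scores, labels, threshold)
    called_negative = tn + fn
    return tn / called_negative if called_negative else float("nan")


if __name__ == "__main__":
    # Hypothetical risk scores and case (1) / control (0) labels.
    scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.75, 0.90]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    for cutoff in (0.3, 0.5):
        print(cutoff, negative_predictive_value(scores, labels, cutoff))
```

Lowering the cut-off calls fewer individuals negative, which tends to raise the NPV at the cost of sending more people on to further testing.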
7.
Expression quantitative trait loci (eQTL) mapping is a widely used technique to uncover regulatory relationships between genes. A range of methodologies have been developed to map links between expression traits and genotypes. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) initiative is a community project to objectively assess the relative performance of different computational approaches for solving specific systems biology problems. The goal of one of the DREAM5 challenges was to reverse-engineer genetic interaction networks from synthetic genetic variation and gene expression data, which simulates the problem of eQTL mapping. In this framework, we proposed an approach whose originality resides in the use of a combination of existing machine learning algorithms (committee). Although it was not the best performer, this method was by far the most precise on average. After the competition, we continued in this direction by evaluating other committees using the DREAM5 data and developed a method that relies on Random Forests and LASSO. It achieved a much higher average precision than the DREAM best performer at the cost of slightly lower average sensitivity.
8.
In a networked environment, the trading volume and asset price of a financial commodity or instrument are affected by various complicated factors. Machine learning and sentiment analysis provide powerful tools to collect a great deal of data from websites and retrieve useful information for effectively forecasting the financial risk of associated companies. This article studies trading volume and asset price risk when sentimental financial information data are available, using both sentiment analysis and popular machine learning approaches: artificial neural network (ANN) and support vector machine (SVM). Nonlinear GARCH-based mining models are developed by integrating GARCH (generalized autoregressive conditional heteroskedasticity) theory with ANN and SVM. Empirical studies in the U.S. stock market show that the proposed approach achieves favorable forecast performance. GARCH-based SVM outperforms GARCH-based ANN for volatility forecasting, whereas GARCH-based ANN achieves a better forecast result for the volatility trend. Results also indicate a strong correlation between information sentiment and both trading volume and asset price volatility.
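For readers unfamiliar with GARCH, the standard GARCH(1,1) conditional-variance recursion is sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The sketch below implements only that textbook recursion; it is not the nonlinear GARCH-based ANN/SVM hybrid the abstract describes, and the parameter values and function name are illustrative assumptions.

```python
def garch11_variance(returns, omega, alpha, beta, initial_variance):
    """Conditional variances of a standard GARCH(1,1) model.

    variances[t] is the variance forecast for returns[t], computed from
    returns[t-1] and variances[t-1]; variances[0] is the supplied seed.
    """
    if not returns:
        return []
    variances = [initial_variance]
    for previous_return in returns[:-1]:
        next_variance = omega + alpha * previous_return ** 2 + beta * variances[-1]
        variances.append(next_variance)
    return variances


if __name__ == "__main__":
    # Hypothetical daily returns and parameters (alpha + beta < 1 for stationarity).
    daily_returns = [0.01, -0.02, 0.015]
    print(garch11_variance(daily_returns, omega=1e-5, alpha=0.1, beta=0.8,
                           initial_variance=1e-4))
```

In a hybrid model of the kind described, quantities such as these variances could serve as inputs to a learned regressor, but how the paper combines them is not specified in the abstract.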
9.
Paul Fergus Pauline Cheung Abir Hussain Dhiya Al-Jumeily Chelsea Dobbins Shamaila Iram 《PloS one》2013,8(10)
There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still increasing globally. As a result, this group of infants is most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions, such as cerebral palsy, mental retardation, and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births in 2005 was estimated to be $26.2 billion per annum. In the UK, this value was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (Electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most Electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise Electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.
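The synthetic minority oversampling technique (SMOTE) mentioned above creates new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea under assumed toy feature vectors; the function name, the default `k`, and the data are hypothetical and do not reproduce the paper's features or settings.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature minority (preterm) records.
    preterm = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(preterm, n_synthetic=5, k=2, seed=1):
        print(point)
```

When oversampling is combined with cross-validation, it is common practice to oversample only the training portion of each fold so that synthetic points do not leak into the held-out data; whether the paper did so is not stated in the abstract.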
10.
Helitrons, eukaryotic transposable elements (TEs) that transpose by a rolling-circle mechanism, have been found in various species with highly variable copy numbers, sometimes making up a large portion of the genome. Helitron sequences frequently capture host genes during their transposition. Since their discovery 18 years ago by computational analysis of the whole-genome sequences of the plant Arabidopsis thaliana and the nematode Caenorhabditis elegans (C. elegans), the identification and classification of these mobile genetic elements has remained a challenge because the vast majority of their families are non-autonomous. In the C. elegans genome, helitron DNA sequences vary greatly in length, from 11 to 8965 base pairs (bp). In this work, we develop a new method to predict helitron DNA sequences, based on Frequency Chaos Game Representation (FCGR) DNA images. We introduce an automatic system to classify helitron families in the C. elegans genome that combines machine learning approaches with features extracted from DNA sequences. Two sets of helitron features (FCGR images and K-mers) are extracted from the DNA sequences. These features consist of the occurrence counts of K-nucleotide words (Tandem Repeat) in the DNA sequences. Three different classifiers are used to classify all existing helitron families. The results show a global score of 72.7% when FCGR images are used as helitron features with a pre-trained neural network as the classifier. The two other classifiers reach 68.7% for Support Vector Machine (SVM) and 91.45% for Random Forest (RF) using the K-mer features of the genomic sequences.
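To make the two feature types concrete, the sketch below counts K-mers in a DNA string and arranges the counts into a 2^K x 2^K FCGR grid. It is an illustration under stated assumptions, not the authors' implementation: the corner layout (A, C, G, T) and the choice to let the last base of each K-mer select the coarsest quadrant are conventions picked here, and other FCGR implementations differ. The grid holds raw counts; normalising to frequencies is left out.

```python
from collections import Counter

# Assumed corner layout as (x_bit, y_bit); other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping any that contain a non-ACGT symbol."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if all(base in CORNERS for base in kmer):
            counts[kmer] += 1
    return counts


def fcgr(sequence, k):
    """Return a 2^k x 2^k grid (list of rows) of k-mer counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for kmer, count in kmer_counts(sequence, k).items():
        x = y = 0
        # The last base supplies the most significant bit of each coordinate.
        for base in reversed(kmer):
            x_bit, y_bit = CORNERS[base]
            x = 2 * x + x_bit
            y = 2 * y + y_bit
        grid[y][x] = count
    return grid


if __name__ == "__main__":
    for row in fcgr("ACGTAC", 2):
        print(row)
```

The flattened grid, or the K-mer counts themselves, could then be passed to a classifier such as an SVM or a Random Forest.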
11.
A mathematical procedure is proposed for the analysis of multivariate data recorded during spectroscopically monitored melting experiments of biomolecules such as nucleic acids and proteins. The method is based on hard/soft hybrid modeling in which one part of the observed variance is explained in terms of a physicochemical model (hard modeling), whereas the other part of the observed variance is explained in terms of soft modeling. The physicochemical model is applied to all of the components related to the unfolding of the biomolecules studied and provides thermodynamic values associated with the unfolding process such as the change in enthalpy, entropy, and melting temperature. The soft modeling term explains the contribution of artifacts not related to the unfolding process such as baseline drifts and nonlinearities. Here the method is applied to the analysis of simulated and experimental data corresponding to the unfolding equilibria of intramolecular structures such as i-motif and G-quadruplex. Overall, the method provides better results than the commonly used univariate approach and also better results than pure hard modeling.
12.
It has increasingly been recognized that adapting populations of microbes contain not one, but many lineages continually arising and competing at once. This process, termed “clonal interference,” alters the rate and dynamics of adaptation and biases winning mutations toward those with the largest selective effect. Here we uncovered a dramatic example of clonal interference between multiple similar mutations occurring at the same locus within replicate populations of Methylobacterium extorquens AM1. Because these mutational events involved the transposition of an insertion sequence into a narrow window of a single gene, they were both readily detectable at low frequencies and could be distinguished due to differences in insertion sites. This allowed us to detect up to 17 beneficial alleles of this type coexisting in a single population. Despite conferring a large selective benefit, the majority of these alleles rose and then fell in frequency due to other lineages emerging that were more fit. By comparing allele-frequency dynamics to the trajectories of fitness gains by these populations, we estimated the fitness values of the genotypes that contained these mutations. Collectively across all populations, these alleles arose upon backgrounds with a wide range of fitness values. Within any single population, however, multiple alleles tended to rise and fall synchronously during a single wave of multiple genotypes with nearly identical fitness values. These results suggest that alleles of large benefit arose repeatedly in failed “soft sweeps” during narrow windows of adaptation due to the combined effects of epistasis and clonal interference.
13.
14.
15.
Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach
Shivakumar Keerthikumar Sahely Bhadra Kumaran Kandasamy Rajesh Raju Y.L. Ramachandra Chiranjib Bhattacharyya Kohsuke Imai Osamu Ohara Sujatha Mohan Akhilesh Pandey 《DNA research》2009,16(6):345-351
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PIDs, which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
16.
《Bioscience, biotechnology, and biochemistry》2013,77(10):2739-2749
The identification of specific interactions between small molecules and human proteins of interest is a fundamental step in chemical biology and drug development. Here we describe an efficient method to obtain novel binding ligands of human proteins by a chemical array approach. Our method includes large-scale ligand screening with two libraries, proteins and chemicals, the use of cell lysates that express proteins of interest fused with red fluorescent protein, and high-throughput screening by merged display analysis, which removes false positive signals from array experiments. Using our systematic platform, we detected novel inhibitors of carbonic anhydrase II. It is suggested that our systematic platform is a rapid and robust approach to screen novel ligands for human proteins of interest.
17.
Trace element content in hair is affected by the age of the donor. Hair samples of subjects from four counties in China where people are known to have long lifespans (“longevity counties”) were collected and the trace element content determined. Samples were subdivided into three age groups based on the age of the donors: children (0–15 years); elderly (80–99 years); and centenarians (≥100 years). We compared the trace element content in hair of the different age groups. Support vector machine classification results showed that a non-linear polynomial kernel function could be used to classify the three age groups. Age did not have a significant effect on the content of Ca and Cd in human hair. The content of Li, Mg, Mn, Zn, Cr, Cu, and Ni in human hair changed significantly with age. The magnitude of the age effect on trace element content in hair was in the order Cu > Zn > Ni > Mg > Mn > Cr > Li. Cu content in hair decreased significantly with increasing age. The hair of centenarians had higher levels of Li and Mn, and lower levels of Cr, Cu, and Ni, compared with that of the children and elderly subjects. This could be a factor contributing to their long lifespan.
18.
M. K. Rausch G. E. Karniadakis J. D. Humphrey 《Biomechanics and modeling in mechanobiology》2017,16(1):249-261
Biological soft tissues experience damage and failure as a result of injury, disease, or simply age; examples include torn ligaments and arterial dissections. Given the complexity of tissue geometry and material behavior, computational models are often essential for studying both damage and failure. Yet, because of the need to account for discontinuous phenomena such as crazing, tearing, and rupturing, continuum methods are limited. Therefore, we model soft tissue damage and failure using a particle/continuum approach. Specifically, we combine continuum damage theory with Smoothed Particle Hydrodynamics (SPH). Because SPH is a meshless particle method, and particle connectivity is determined solely through a neighbor list, discontinuities can be readily modeled by modifying this list. We show, for the first time, that an anisotropic hyperelastic constitutive model commonly employed for modeling soft tissue can be conveniently implemented within a SPH framework and that SPH results show excellent agreement with analytical solutions for uniaxial and biaxial extension as well as finite element solutions for clamped uniaxial extension in 2D and 3D. We further develop a simple algorithm that automatically detects damaged particles and disconnects the spatial domain along rupture lines in 2D and rupture surfaces in 3D. We demonstrate the utility of this approach by simulating damage and failure under clamped uniaxial extension and in a peeling experiment of virtual soft tissue samples. In conclusion, SPH in combination with continuum damage theory may provide an accurate and efficient framework for modeling damage and failure in soft tissues.
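The idea that a discontinuity can be introduced simply by editing the neighbor list can be illustrated with a small sketch. The rule used here, in which a particle whose damage value reaches a threshold is dropped from every neighbor list, is a hypothetical stand-in: the abstract does not describe the authors' detection or disconnection algorithm, and the function names, radius, and threshold are all assumptions.

```python
import math


def build_neighbor_list(positions, radius):
    """Map each particle index to the set of indices within the given radius."""
    neighbors = {i: set() for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= radius:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def disconnect_damaged(neighbors, damage, threshold=1.0):
    """Return a new neighbor list with fully damaged particles disconnected.

    Hypothetical rule: any bond touching a particle whose damage is at or
    above the threshold is removed.
    """
    damaged = {i for i, value in enumerate(damage) if value >= threshold}
    return {
        i: set() if i in damaged else {j for j in linked if j not in damaged}
        for i, linked in neighbors.items()
    }


if __name__ == "__main__":
    # Four particles on a line; particle 2 is fully damaged.
    positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    links = build_neighbor_list(positions, radius=1.1)
    print(disconnect_damaged(links, damage=[0.0, 0.0, 1.0, 0.0]))
```

After the update, particles 0 and 1 remain connected while particle 3 is isolated, so the chain has been split without any remeshing step.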
19.
Elizabeth M. Sweeney Joshua T. Vogelstein Jennifer L. Cuzzocreo Peter A. Calabresi Daniel S. Reich Ciprian M. Crainiceanu Russell T. Shinohara 《PloS one》2014,9(4)
Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.
20.
N. Lance Hepler Konrad Scheffler Steven Weaver Ben Murrell Douglas D. Richman Dennis R. Burton Pascal Poignard Davey M. Smith Sergei L. Kosakovsky Pond 《PLoS computational biology》2014,10(9)
Since its identification in 1983, HIV-1 has been the focus of a research effort unprecedented in scope and difficulty, whose ultimate goals, a cure and a vaccine, remain elusive. One of the fundamental challenges in accomplishing these goals is the tremendous genetic variability of the virus, with some genes differing at as many as 40% of nucleotide positions among circulating strains. Because of this, the genetic bases of many viral phenotypes, most notably the susceptibility to neutralization by a particular antibody, are difficult to identify computationally. Drawing upon open-source general-purpose machine learning algorithms and libraries, we have developed a software package IDEPI (IDentify EPItopes) for learning genotype-to-phenotype predictive models from sequences with known phenotypes. IDEPI can apply learned models to classify sequences of unknown phenotypes, and also identify specific sequence features which contribute to a particular phenotype. We demonstrate that IDEPI achieves performance similar to or better than that of previously published approaches on four well-studied problems: finding the epitopes of broadly neutralizing antibodies (bNab), determining coreceptor tropism of the virus, identifying compartment-specific genetic signatures of the virus, and deducing drug-resistance associated mutations. The cross-platform Python source code (released under the GPL 3.0 license), documentation, issue tracking, and a pre-configured virtual machine for IDEPI can be found at https://github.com/veg/idepi.
This is a PLOS Computational Biology Software Article.