Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
While hundreds of loci have been identified as reflecting strong positive selection in human populations, connections between candidate loci and specific selective pressures often remain obscure. This study investigates broader patterns of selection in African populations, which are underrepresented despite their potential to offer key insights into human adaptation. We scan for hard selective sweeps using several haplotype and allele-frequency statistics with a data set of nearly 500,000 genome-wide single-nucleotide polymorphisms in 12 highly diverged African populations that span a range of environments and subsistence strategies. We find that positive selection does not appear to be a strong determinant of allele-frequency differentiation among these African populations. Haplotype statistics do identify putatively selected regions that are shared across African populations. However, as assessed by extensive simulations, patterns of haplotype sharing between African populations follow neutral expectations and suggest that tails of the empirical distributions contain false-positive signals. After highlighting several genomic regions where positive selection can be inferred with higher confidence, we use a novel method to identify biological functions enriched among populations’ empirical tail genomic windows, such as immune response in agricultural groups. In general, however, it seems that current methods for selection scans are poorly suited to populations that, like the African populations in this study, are affected by ascertainment bias and have low levels of linkage disequilibrium, possibly old selective sweeps, and potentially reduced phasing accuracy. Additionally, population history can confound the interpretation of selection statistics, suggesting that greater care is needed in attributing broad genetic patterns to human adaptation.

2.
Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such “hardening” of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. 
Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.

3.
Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.
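
The abstract names the H12 and H2/H1 statistics without giving formulas. The sketch below assumes the commonly cited definitions: with haplotype frequencies p1 >= p2 >= ... in a window, H1 is the sum of squared frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the contribution of the most common haplotype. The function name and the toy samples are hypothetical.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return H1, H12 and H2/H1 for the haplotypes observed in one window."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n = len(haplotypes)
    # Haplotype frequencies, most common first.
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    # H12: treat the two most common haplotypes as a single class.
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    # H2: homozygosity excluding the most common haplotype.
    h2 = h1 - p1 * p1
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


# Toy samples: one dominant haplotype vs. two common haplotypes.
hard_like = haplotype_homozygosity(["A"] * 9 + ["B"])
soft_like = haplotype_homozygosity(["A"] * 5 + ["B"] * 4 + ["C"])
```

In these toy samples both windows have high H12, while H2/H1 is small when one haplotype dominates and larger when two haplotypes are common, which is the contrast the abstract uses to separate hard from soft sweeps.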

4.
Detecting Selective Sweeps in Naturally Occurring Escherichia coli   (cited 7 times in total: 2 self-citations, 5 by others)
The nucleotide sequences of the gapA and pabB genes (separated by approximately 32.5 kb) were determined in 12 natural isolates of Escherichia coli. Three analyses were performed on the data. First, the levels of polymorphism at the loci were compared within and between E. coli and Salmonella strains relative to their degrees of constraint. Second, the gapA and pabB loci were analyzed by the Hudson-Kreitman-Aguade (HKA) test for selective neutrality. Four additional dispersed genes (crr, putP, trp and gnd) were added to the analysis to provide the necessary frame of reference. Finally, the gene genealogies of gapA and pabB were examined for topological consistency within and between the loci. These lines of evidence indicate that some evolutionary event has recently purged the variability in the region surrounding the gapA and pabB loci in E. coli. This can best be explained by the spread of a selected allele through the global E. coli population by directional selection and the resulting loss in variability in the surrounding regions due to genetic hitchhiking.

5.
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of “soft shoulders” underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.

6.
High-density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some discarded data consists of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues but can also be explained by the presence of genomic variations that are in the vicinity of the assayed SNP and that prevent genotyping probes from annealing. The latter case may contain relevant information because these missing genotypes might be used to identify population-specific genomic variants. In order to assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires were used to verify the presence of genomic variations within their flanking regions in 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production.

7.
Bordetella pertussis is the causative agent of pertussis, a highly contagious disease of the human respiratory tract. Despite high vaccination coverage, pertussis has resurged and has become one of the most prevalent vaccine-preventable diseases in developed countries. We have proposed that both waning immunity and pathogen adaptation have contributed to the persistence and resurgence of pertussis. Allelic variation has been found in virulence-associated genes coding for the pertussis toxin A subunit (ptxA), pertactin (prn), serotype 2 fimbriae (fim2), serotype 3 fimbriae (fim3) and the promoter for pertussis toxin (ptxP). In this study, we investigated how more than 60 years of vaccination has affected the Dutch B. pertussis population by combining data from phylogeny, genomics and temporal trends in strain frequencies. Our main focus was on the ptxA, prn, fim3 and ptxP genes. However, we also compared the genomes of 11 Dutch strains belonging to successful lineages. Our results showed that, between 1949 and 2010, the Dutch B. pertussis population has undergone at least four selective sweeps that were associated with small mutations in ptxA, prn, fim3 and ptxP. Phylogenetic analysis revealed a stepwise adaptation in which mutations accumulated clonally. Genomic analysis revealed a number of additional mutations which may have contributed to the selective sweeps. Five large deletions were identified which were fixed in the pathogen population. However, only one was linked to a selective sweep. No evidence was found for a role of gene acquisition in pathogen adaptation. Our results suggest that the B. pertussis gene repertoire is already well adapted to its current niche and required only fine tuning to persist in the face of vaccination. Further, this work shows that small mutations, even single SNPs, can drive large changes in the populations of bacterial pathogens within a time span of six to 19 years.

8.
Dermatofibrosarcoma protuberans (DFSP) is a very rare soft tissue sarcoma. DFSP often reveals a specific chromosome translocation, t(17;22)(q22;q13), which results in the fusion of the collagen 1 alpha 1 (COL1A1) gene and the platelet-derived growth factor-B (PDGFB) gene. The COL1A1-PDGFB fusion protein activates the PDGFB receptor, and the resulting constitutive activation of PDGFR is essential in the pathogenesis of DFSP. Thus, blocking PDGFR activation with imatinib has shown promising activity in the treatment of advanced and metastatic DFSP. Despite the success with targeted agents in cancers, acquired drug resistance eventually occurs. Here, we sought to identify potential drug resistance mechanisms against imatinib in a 46-year-old female with DFSP who initially responded well to imatinib but suffered rapid disease progression. We performed whole-genome sequencing of both pre-treatment and post-treatment tumor tissue to identify the mutational events associated with imatinib resistance. No significant copy number alterations, insertions, or deletions were identified during imatinib treatment. Of note, we identified eight newly emerged non-synonymous somatic mutations in genes (ACAP2, CARD10, KIAA0556, PAAQR7, PPP1R39, SAFB2, STARD9, and ZFYVE9) in the imatinib-resistant tumor tissue. This study revealed diverse candidate mechanisms by which resistance to PDGFRB inhibition by imatinib may arise in DFSP, and highlights the usefulness of whole-genome sequencing in identifying drug resistance mechanisms and in pursuing genome-directed, personalized anti-cancer therapy.

9.
The allelic variants of immunity genes in historical breeds likely reflect local infection pressure and therefore represent a reservoir for breeding. Screening to determine the diversity of the Toll-like receptor gene TLR4 was conducted in two conserved cattle breeds: Czech Red and Czech Red Pied. High-throughput sequencing of pooled PCR amplicons using the PacBio platform revealed polymorphisms, which were subsequently confirmed via genotyping techniques. Eight SNPs found in coding and adjacent regions were grouped into 18 haplotypes, representing a significant portion of the known diversity in the global breed panel and presumably exceeding diversity in production populations. Notably, the ancient Czech Red breed appeared to possess greater haplotype diversity than the Czech Red Pied breed, a Simmental variant, although the haplotype frequencies might have been distorted by significant crossbreeding and bottlenecks in the history of Czech Red cattle. The differences in haplotype frequencies validated the phenotypic distinctness of the local breeds. Due to the availability of Czech Red Pied production herds, the effect of intensive breeding on TLR diversity can be evaluated in this model. The advantages of the Pacific Biosciences technology for the resequencing of long PCR fragments with subsequent direct phasing were independently validated.

10.
11.
To better understand how selection processes balance the benefits of Ig repertoire diversity with the risks of autoreactivity and nonfunctionality of highly variable IgH CDR3s, we collected millions of rearranged germline IgH CDR3 sequences by deep sequencing of DNA from mature human naive B cells purified from four individuals and analyzed the data with computational methods. Long HCDR3 regions, often components of HIV-neutralizing Abs, appear to derive not only from incorporation of long D genes and insertion of large N regions but also from usage of multiple D gene segments in tandem. However, comparison of productive and out-of-frame IgH rearrangements revealed a selection bias against long HCDR3 loops, suggesting these may be disproportionately either poorly functional or autoreactive. Our data suggest that developmental selection removes HCDR3 loops containing patches of hydrophobicity, which are commonly found in some auto-antibodies, and at least 69% of the initial productive IgH rearrangements are removed from the repertoire during B cell development. Additionally, we have demonstrated the potential utility of this new technology for vaccine development with the identification in all four individuals of related candidate germline IgH precursors of the HIV-neutralizing Ab 4E10.

12.
The tendency for chlorinated aliphatics and aromatic hydrocarbons to accumulate in environments such as groundwater and sediments poses a serious environmental threat. In this study, the metabolic capacity of hydrocarbon (aromatics and chlorinated aliphatics)-contaminated groundwater in the KwaZulu-Natal province of South Africa has been elucidated for the first time by analysis of pyrosequencing data. The taxonomic data revealed that the metagenomes were dominated by the phylum Proteobacteria (mainly Betaproteobacteria). In addition, Flavobacteriales, Sphingobacteria, Burkholderiales, and Rhodocyclales were the predominant orders present in the individual metagenomes. These orders included microorganisms (Flavobacteria, Dechloromonas aromatica RCB, and Azoarcus) involved in the degradation of aromatic compounds and various other hydrocarbons that were present in the groundwater. Although the metabolic reconstruction of the metagenome represented composite cell networks, the information obtained was sufficient to address questions regarding the metabolic potential of the microbial communities and to correlate the data to the contamination profile of the groundwater. Genes involved in the degradation of benzene and benzoate, together with heavy metal-resistance mechanisms, appeared to provide a survival strategy used by the microbial communities. Analysis of the pyrosequencing-derived data revealed that the metagenomes represent complex microbial communities that have adapted to the geochemical conditions of the groundwater as evidenced by the presence of key enzymes/genes conferring resistance to specific contaminants. Thus, pyrosequencing analysis of the metagenomes provided insights into the microbial activities in hydrocarbon-contaminated habitats.

13.
Scrub typhus (‘Tsutsugamushi’ disease in Japanese) is a mite-borne infectious disease. The causative agent is Orientia tsutsugamushi, an obligate intracellular bacterium belonging to the family Rickettsiaceae of the subdivision alpha-Proteobacteria. In this study, we determined the complete genome sequence of O. tsutsugamushi strain Ikeda, which comprises a single chromosome of 2 008 987 bp and contains 1967 protein coding sequences (CDSs). The chromosome is much larger than those of other members of Rickettsiaceae, and 46.7% of the sequence was occupied by repetitive sequences derived from an integrative and conjugative element, 10 types of transposable elements, and seven types of short repeats of unknown origins. The massive amplification and degradation of these elements have generated a huge number of repeated genes (1196 CDSs, categorized into 85 families), many of which are pseudogenes (766 CDSs), and also induced intensive genome shuffling. By comparing the gene content with those of other family members of Rickettsiaceae, we identified the core gene set of the family Rickettsiaceae and found that, while much more extensive gene loss has taken place among the housekeeping genes of Orientia than those of Rickettsia, O. tsutsugamushi has acquired a large number of foreign genes. The O. tsutsugamushi genome sequence is thus a prominent example of the high plasticity of bacterial genomes, and provides the genetic basis for a better understanding of the biology of O. tsutsugamushi and the pathogenesis of ‘Tsutsugamushi’ disease.
Key words: Orientia tsutsugamushi, genome sequencing, obligate intracellular bacterium, repetitive sequence, IS element, integrative and conjugative element, gene amplification, genome reduction

14.
As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species.

15.

Background

Massively parallel sequencing systems continue to improve on data output, while leaving labor-intensive library preparations a potential bottleneck. Efforts are currently under way to relieve the crucial and time-consuming work to prepare DNA for high-throughput sequencing.

Methodology/Principal Findings

In this study, we demonstrate an automated parallel library preparation protocol using generic carboxylic acid-coated superparamagnetic beads and polyethylene glycol precipitation as a reproducible and flexible method for DNA fragment length separation. With this approach the library preparation for DNA sequencing can easily be adjusted to a desired fragment length. The automated protocol, here demonstrated using the GS FLX Titanium instrument, was compared to the standard manual library preparation, showing higher yield, throughput and great reproducibility. In addition, 12 libraries were prepared and uniquely tagged in parallel, and the distribution of sequence reads between these indexed samples could be improved using quantitative PCR-assisted pooling.

Conclusions/Significance

We present a novel automated procedure that makes it possible to prepare 36 indexed libraries per person and day, which can be increased to as many as 96 libraries processed simultaneously. The yield, speed and robust performance of the protocol constitute a substantial improvement over present manual methods, without the need for extensive equipment investments. The described procedure enables a considerable efficiency increase for small to midsize sequencing centers.

16.
17.
Mycobacterium ulcerans is the causative agent of Buruli ulcer, the third most common mycobacterial disease after tuberculosis and leprosy. It is an emerging infectious disease that afflicts mainly children and youths in West Africa. Little is known about the evolution and transmission mode of M. ulcerans, partially due to the lack of known genetic polymorphisms among isolates, limiting the application of genetic epidemiology. To systematically profile single nucleotide polymorphisms (SNPs), we sequenced the genomes of three M. ulcerans strains using 454 and Solexa technologies. Comparison with the reference genome of the Ghanaian classical lineage isolate Agy99 revealed 26,564 SNPs in a Japanese strain representing the ancestral lineage. Only 173 SNPs were found when comparing Agy99 with two other Ghanaian isolates, which belong to the two other types previously distinguished in Ghana by variable number tandem repeat typing. We further analyzed a collection of Ghanaian strains using the SNPs discovered. With 68 SNP loci, we were able to differentiate 54 strains into 13 distinct SNP haplotypes. The average SNP nucleotide diversity was low (average 0.06–0.09 across 68 SNP loci), and 96% of the SNP locus pairs were in complete linkage disequilibrium. We estimated that the divergence of the M. ulcerans Ghanaian clade from the Japanese strain occurred 394 to 529 thousand years ago. The Ghanaian subtypes diverged about 1000 to 3000 years ago, or even much more recently, because we found evidence that they evolved significantly faster than average. Our results offer significant insight into the evolution of M. ulcerans and provide a comprehensive report on genetic diversity within a highly clonal M. ulcerans population from a Buruli ulcer endemic region, which can facilitate further epidemiological studies of this pathogen through the development of high-resolution tools.

18.
19.
Nie L, Yu Y, Zhang XQ, Yang GF, Wen JK, Zhang YP. Biochemical Genetics, 1999, 37(7-8): 257-265
Genetic variation of 31 blood protein loci in 236 cattle from eight South China populations (including mithan, Bos frontalis) and a Holstein population was investigated by means of horizontal starch gel electrophoresis. Thirteen loci (ALB, CAR, Hb-b, Np, PGM, Amy-I, PEP-B, AKP, 6PGD, Cp, Pa, EsD, and TF) were found to be polymorphic. The comparison of average heterozygosities (H) shows that all the native cattle embrace a rich genetic diversity. Our results on protein polymorphism suggest that cattle in China originated mainly from Bos indicus and Bos taurus; Xuwen, Hainan, Wenshan, and Dehong cattle and the Dehong zebu are close to zebu-type cattle, and Diqing and Zhaotong cattle are close to the taurine. The mithan was very different from other native cattle, and we suggest that its origin was complicated and may be influenced by other cattle species.
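
The abstract compares average heterozygosities (H) without stating how they were computed. As a minimal sketch, assuming the standard expected heterozygosity per locus (one minus the sum of squared allele frequencies) averaged over all loci, including monomorphic ones, the calculation might look like this; the function names and allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)


def average_heterozygosity(loci):
    """Mean expected heterozygosity across loci (monomorphic loci count as 0)."""
    if not loci:
        raise ValueError("need at least one locus")
    return sum(expected_heterozygosity(f) for f in loci) / len(loci)


# Hypothetical allele frequencies for three loci in one population:
# a two-allele locus, a monomorphic locus, and an uneven two-allele locus.
example_loci = [[0.5, 0.5], [1.0], [0.7, 0.3]]
h_bar = average_heterozygosity(example_loci)
```

Because monomorphic loci contribute zero, surveying many loci of which only some are polymorphic, as in this study, tends to give modest average values even when the polymorphic loci are quite variable.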

20.
Detecting and localizing selective sweeps on the basis of SNP data has recently received considerable attention. Here we introduce the use of hidden Markov models (HMMs) for the detection of selective sweeps in DNA sequences. Like previously published methods, our HMMs use the site frequency spectrum, and the spatial pattern of diversity along the sequence, to identify selection. In contrast to earlier approaches, our HMMs explicitly model the correlation structure between linked sites. The detection power of our methods, and their accuracy for estimating the selected site location, is similar to that of competing methods for constant size populations. In the case of population bottlenecks, however, our methods frequently showed fewer false positives.
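
To illustrate the general idea of labeling positions along a sequence with an HMM, here is a generic two-state Viterbi decoder. It is a toy sketch, not the authors' model: the states, the "high"/"low" diversity observations, and all probabilities are hypothetical, whereas the published method works from the site frequency spectrum.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log space)."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position.
            best = max(states, key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            row[s] = scores[-1][best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical parameters: sweep regions mostly emit low-diversity windows,
# and transitions favor staying in the same state (linked sites are correlated).
states = ["neutral", "sweep"]
start_p = {"neutral": 0.9, "sweep": 0.1}
trans_p = {"neutral": {"neutral": 0.9, "sweep": 0.1},
           "sweep": {"neutral": 0.2, "sweep": 0.8}}
emit_p = {"neutral": {"high": 0.8, "low": 0.2},
          "sweep": {"high": 0.1, "low": 0.9}}

windows = ["high", "high", "low", "low", "low", "low", "high", "high"]
labels = viterbi(windows, states, start_p, trans_p, emit_p)
```

The transition matrix is what encodes correlation between neighboring positions: a single low-diversity window is rarely enough to switch states, but a run of them is.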
