Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Salmonid genomes are considered to be in a pseudo‐tetraploid state as a result of a genome duplication event that occurred between 25 and 100 Ma. This situation complicates single‐nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and not simple allelic variants. To differentiate PSVs from simple allelic variants, we used 19 homozygous doubled haploid (DH) lines that represent a wide geographical range of rainbow trout populations. In the first phase of the study, we analysed SbfI restriction‐site associated DNA (RAD) sequence data from all 19 lines and selected 11 lines for an extended SNP discovery. In the second phase, we conducted the extended SNP discovery using PstI RAD sequence data from the selected 11 lines. The complete data set is composed of 145 168 high‐quality putative SNPs that were genotyped in at least nine of the 11 lines, of which 71 446 (49%) had minor allele frequencies (MAF) of at least 18% (i.e. at least two of the 11 lines). Approximately 14% of the RAD SNPs in this data set are from expressed or coding rainbow trout sequences. Our comparison of the current data set with previous SNP discovery data sets revealed that 99% of our SNPs are novel. In the support files for this resource, we provide annotation to the positions of the SNPs in the working draft of the rainbow trout reference genome, provide the genotypes of each sample in the discovery panel and identify SNPs that are likely to be in coding sequences.
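The 18% MAF threshold in the abstract above corresponds to the minor allele being present in two of the 11 lines (2/11 ≈ 0.18). The following is a minimal sketch of that arithmetic, assuming one allele call per homozygous DH line and biallelic SNPs; the function name and the example data are hypothetical and are not taken from the study.

```python
from collections import Counter

def minor_allele_frequency(line_alleles):
    """Minor allele frequency across homozygous lines.

    Each line contributes a single allele call (lines are homozygous);
    None marks a line that was not genotyped. Assumes a biallelic SNP.
    """
    called = [a for a in line_alleles if a is not None]
    counts = Counter(called)
    if len(counts) < 2:
        return 0.0  # monomorphic (or nothing called)
    # second most common allele is the minor allele at a biallelic site
    minor_count = counts.most_common()[1][1]
    return minor_count / len(called)

# Hypothetical SNP: 9 lines carry C, 2 lines carry T
example = ["C"] * 9 + ["T"] * 2
maf = minor_allele_frequency(example)
print(f"MAF = {maf:.3f}")
```

With missing calls the denominator shrinks, so two minor-allele lines out of nine genotyped lines would give 2/9 rather than 2/11.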

2.
3.
Restriction‐site‐associated DNA sequencing (RAD‐seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD‐seq data analyses rely on assumptions about the nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under‐ or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as a case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD‐seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD‐seq data analysis strategies on population structure inferences that are directly applicable to other species.

4.
Inferring phylogenetic relationships between closely related taxa can be hindered by three factors: (1) the lack of informative molecular variation at short evolutionary timescales; (2) the lack of established markers in poorly studied taxa; and (3) the potential phylogenetic conflicts among different genomic regions due to incomplete lineage sorting or introgression. In this context, Restriction site Associated DNA sequencing (RAD‐seq) seems promising as this technique can generate sequence data from numerous DNA fragments scattered throughout the genome, from a large number of samples, and without preliminary knowledge on the taxa under study. However, divergence beyond the within‐species level will necessarily reduce the number of conserved and non‐duplicated restriction sites, and therefore the number of loci usable for phylogenetic inference. Here, we assess the suitability of RAD‐seq for phylogeny using a simulated experiment on the 12 Drosophila genomes, with divergence times ranging from 5 to 63 million years. These simulations show that RAD‐seq allows the recovery of the known Drosophila phylogeny with strong statistical support, even for relatively ancient nodes. Notably, this conclusion is robust to the potentially confounding effects of sequencing errors, heterozygosity, and low coverage. We further show that clustering RAD‐seq data using the BLASTN and SiLiX programs significantly improves the recovery of orthologous RAD loci compared with previously proposed approaches, especially for distantly related species. This study therefore validates the view that RAD sequencing is a powerful tool for phylogenetic inference.

5.
A considerable number of single nucleotide polymorphisms (SNPs) are required to elucidate genotype–phenotype associations and determine the molecular basis of important traits. In this work, we carried out de novo SNP discovery accounting for both genome duplication and genetic variation from American and European salmon populations. A total of 9 736 473 nonredundant SNPs were identified across a set of 20 fish by whole‐genome sequencing. After applying six bioinformatic filtering steps, 200 K SNPs were selected to develop an Affymetrix Axiom® myDesign Custom Array. This array was used to genotype 480 fish representing wild and farmed salmon from Europe, North America and Chile. A total of 159 099 (79.6%) SNPs were validated as high quality based on clustering properties. A total of 151 509 validated SNPs showed a unique position in the genome. When comparing these SNPs against 238 572 markers currently available in two other Atlantic salmon arrays, only 4.6% of the SNPs overlapped with the panel developed in this study. This novel high‐density SNP panel will be very useful for the dissection of economically and ecologically relevant traits, enhancing breeding programmes through genomic selection as well as supporting genetic studies in both wild and farmed populations of Atlantic salmon using high‐resolution genomewide information.

6.
Single nucleotide polymorphisms (SNPs) are essential to the understanding of population genetic variation and diversity. Here, we performed restriction‐site‐associated DNA sequencing (RAD‐seq) on 72 individuals from 13 Chinese indigenous and three introduced chicken breeds. A total of 620 million reads were obtained using an Illumina HiSeq2000 sequencer. An average of 75 587 SNPs were identified from each individual. Further strict filtering validated 28 895 candidate SNPs for all populations. When compared with the NCBI dbSNP (chicken_9031), 15 404 SNPs were new discoveries. In this study, RAD‐seq was performed for the first time on chickens, indicating its remarkable effectiveness and potential applications in genetic analysis and breeding techniques for whole‐genome selection in chicken and other agricultural animals.

7.
The trade‐offs of using single‐digest vs. double‐digest restriction site‐associated DNA sequencing (RAD‐seq) protocols have been widely discussed. However, no direct empirical comparisons of the two methods have been conducted. Here, we sampled a single population of Gulf pipefish (Syngnathus scovelli) and genotyped 444 individuals using RAD‐seq. Sixty individuals were subjected to single‐digest RAD‐seq (sdRAD‐seq), and the remaining 384 individuals were genotyped using a double‐digest RAD‐seq (ddRAD‐seq) protocol. We analysed the resulting Illumina sequencing data and compared the two genotyping methods when reads were analysed either together or separately. Coverage statistics, observed heterozygosity, and allele frequencies differed significantly between the two protocols, as did the results of selection components analysis. We also performed an in silico digestion of the Gulf pipefish genome and modelled five major sources of bias: PCR duplicates, polymorphic restriction sites, shearing bias, asymmetric sampling (i.e., genotyping fewer individuals with sdRAD‐seq than with ddRAD‐seq) and higher major allele frequencies. This combination of approaches allowed us to determine that polymorphic restriction sites, an asymmetric sampling scheme, mean allele frequencies and to some extent PCR duplicates all contribute to different estimates of allele frequencies between samples genotyped using sdRAD‐seq versus ddRAD‐seq. Our finding that sdRAD‐seq and ddRAD‐seq can result in different allele frequencies has implications for comparisons across studies and techniques that endeavour to identify genomewide signatures of evolutionary processes in natural populations.
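The in silico digestion mentioned in the abstract above can be illustrated with a toy example. This is only a sketch under stated assumptions: the abstract does not name the enzymes or the software used, so the recognition sequence below (CTGCAG, the PstI site) and the toy sequence are chosen purely for illustration, and the function names are hypothetical. For simplicity the cut is placed at the start of each recognition site, whereas a real enzyme cuts at a fixed offset inside it.

```python
def cut_positions(sequence, site):
    """Return 0-based start indices of every occurrence of `site`."""
    seq = sequence.upper()
    site = site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # continue searching one base further so overlapping sites are found
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(sequence, site):
    """Lengths of the pieces obtained by cutting at each site start."""
    bounds = [0] + cut_positions(sequence, site) + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Toy sequence containing two copies of the example recognition site
toy = "AAAACTGCAGTTTTTTCTGCAGGG"
print(cut_positions(toy, "CTGCAG"))
print(fragment_lengths(toy, "CTGCAG"))
```

A double‐digest simulation would repeat the search with a second recognition sequence and keep only fragments flanked by one site of each kind within a chosen size window.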

8.
Whole‐genome duplications are major evolutionary events with a lasting impact on genome structure. Duplication events complicate genetic analyses as paralogous sequences are difficult to distinguish; consequently, paralogs are often excluded from studies. The effects of an ancient whole‐genome duplication (approximately 88 MYA) are still evident in salmonids through the persistence of numerous paralogous gene sequences and partial tetrasomic inheritance. We use restriction site‐associated DNA sequencing on 10 collections of chum salmon from the Salish Sea in the USA and Canada to investigate genetic diversity and population structure in both tetrasomic and rediploidized regions of the genome. We use a pedigree and high‐density linkage map to identify paralogous loci and to investigate genetic variation across the genome. By applying multivariate statistical methods, we show that it is possible to characterize paralogous loci and that they display similar patterns of population structure as the diploidized portion of the genome. We find genetic associations with the adaptively important trait of run‐timing in both sets of loci. By including paralogous loci in genome scans, we can observe evolutionary signals in genomic regions that have routinely been excluded from population genetic studies in other polyploid‐derived species.

9.
Here, we present an adaptation of restriction‐site‐associated DNA sequencing (RAD‐seq) to the Illumina HiSeq2000 technology that we used to produce SNP markers in very large quantities at low cost per unit in the Réunion grey white‐eye (Zosterops borbonicus), a nonmodel passerine bird species with no reference genome. We sequenced a set of six pools of 18–25 individuals using a single sequencing lane. This allowed us to build around 600 000 contigs, among which at least 386 000 could be mapped to the zebra finch (Taeniopygia guttata) genome. This yielded more than 80 000 SNPs that could be mapped unambiguously and are evenly distributed across the genome. Thus, our approach provides a good illustration of the high potential of paired‐end RAD sequencing of pooled DNA samples combined with comparative assembly to the zebra finch genome to build large contigs and characterize vast numbers of informative SNPs in nonmodel passerine bird species in a very efficient and cost‐effective way.

10.
Anadromous Atlantic salmon (Salmo salar) is a species of major conservation and management concern in North America, where population abundance has been declining over the past 30 years. Effective conservation actions require the delineation of conservation units to appropriately reflect the spatial scale of intraspecific variation and local adaptation. Towards this goal, we used the most comprehensive genetic and genomic database for Atlantic salmon to date, covering the entire North American range of the species. The database included microsatellite data from 9142 individuals from 149 sampling locations and data from a medium‐density SNP array providing genotypes for >3000 SNPs for 50 sampling locations. We used neutral and putatively selected loci to integrate adaptive information in the definition of conservation units. Bayesian clustering with the microsatellite data set and with neutral SNPs identified regional groupings largely consistent with previously published regional assessments. The use of outlier SNPs did not result in major differences in the regional groupings, suggesting that neutral markers can reflect the geographic scale of local adaptation despite not being under selection. We also performed assignment tests to compare power obtained from microsatellites, neutral SNPs and outlier SNPs. Using SNP data substantially improved power compared to microsatellites, and an assignment success of 97% to the population of origin and of 100% to the region of origin was achieved when all SNP loci were used. Using outlier SNPs only resulted in minor improvements to assignment success to the population of origin but improved regional assignment. We discuss the implications of these new genetic resources for the conservation and management of Atlantic salmon in North America.

11.
Bottom‐up evolutionary approaches, including geographically explicit population genomic analyses, have the power to reveal the mechanistic basis of adaptation. Here, we conduct a population genomic analysis in the model legume, Medicago truncatula, to characterize population genetic structure and identify symbiosis‐related genes showing evidence of spatially variable selection. Using RAD‐seq, we generated over 26,000 SNPs from 191 accessions from within three regions of the native range in Europe. Results from STRUCTURE analysis identify five distinct genetic clusters with divisions that separate east and west regions in the Mediterranean basin. Much of the genetic variation is maintained within sampling sites, and there is evidence for isolation by distance. Extensive linkage disequilibrium was identified, particularly within populations. We conducted genetic outlier analysis with FST‐based genome scans and a Bayesian modeling approach (PCAdapt). There were 70 core outlier loci shared between these distinct methods with one clear candidate symbiosis‐related gene, DMI1. This work sets the stage for functional experiments to determine the important phenotypes that selection has acted upon and complementary efforts in rhizobium populations.

12.
Gene sequence similarity due to shared ancestry after a duplication event, that is paralogy, complicates the assessment of genetic variation, as sequences originating from paralogs can be difficult to distinguish. These confounded sequences are often removed prior to further analyses, leaving the underlying loci uncharacterized. Salmonids have only partially rediploidized subsequent to a whole‐genome duplication; residual tetrasomic inheritance has been observed in males. We present a maximum‐likelihood‐based method to resolve confounded paralogous loci by observing the segregation of alleles in gynogenetic haploid offspring and demonstrate its effectiveness by constructing two linkage maps for chum salmon (Oncorhynchus keta), with and without these newly resolved loci. We find that the resolved paralogous loci are not randomly distributed across the genome. A majority are clustered in expanded subtelomeric regions of 14 linkage groups, suggesting a significant fraction of the chum salmon genome may be missed by the exclusion of paralogous loci. Transposable elements have been proposed as drivers of genome evolution and, in salmonids, may have an important role in the rediploidization process by driving differentiation between homeologous chromosomes. Consistent with that hypothesis, we find a reduced fraction of transposable element annotations among paralogous loci, and these loci predominately occur in the genomic regions that lag in the rediploidization process.

13.
Adaptive radiation unfolds as selection acts on the genetic variation underlying functional traits. The nature of this variation can be revealed by studying the tips of an ongoing adaptive radiation. We studied genomic variation at the tips of the Darwin's finch radiation, specifically focusing on polymorphism within, and variation among, three sympatric species of the genus Geospiza. Using restriction site‐associated DNA sequencing (RAD‐seq), we characterized 32 569 single‐nucleotide polymorphisms (SNPs), from which 11 outlier SNPs for beak and body size were uncovered by a genomewide association study (GWAS). Principal component analysis revealed that these 11 SNPs formed four statistically linked groups. Stepwise regression then revealed that the first PC score, which included 6 of the 11 top SNPs, explained over 80% of the variation in beak size, suggesting that selection on these traits influences multiple correlated loci. The two SNPs most strongly associated with beak size were near genes associated with beak morphology across deeper branches of the radiation: delta‐like 1 homologue (DLK1) and high‐mobility group AT‐hook 2 (HMGA2). Our results suggest that (i) key adaptive traits are associated with a small fraction of the genome (11 of 32 569 SNPs), (ii) SNPs linked to the candidate genes are dispersed throughout the genome (on several chromosomes), and (iii) micro‐ and macro‐evolutionary variation (roots and tips of the radiation) involve some shared and some unique genomic regions.

14.
Whole‐genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in a pervasiveness of paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome. Paralogs can be difficult to reliably genotype and are often excluded from genotyping‐by‐sequencing (GBS) analyses; however, removal requires paralogs to be identified, which is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: (i) the expected frequency of heterozygotes exceeds that for singleton loci, and (ii) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low; however, we postulated that summing allele reads for each locus over all heterozygous individuals in a population would provide sufficient power to detect deviations at those loci. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha) which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole‐genome duplication, mountain barberry (Berberis alpina) which has a large proportion of paralogs that arose through an unknown mechanism, and dusky parrotfish (Scarus niger) which has largely rediploidized following an ancient whole‐genome duplication. Importantly, this approach only requires the genotype and allele‐specific read counts for each individual, information which is readily obtained from most GBS analysis pipelines.
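The two properties described in the abstract above (heterozygote frequency, and pooled allele read ratio within heterozygotes) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name, the genotype labels and the use of a normal‐approximation z‐score for the deviation from 1:1 are assumptions made here for the example.

```python
import math

def locus_summary(genotypes, ref_reads, alt_reads):
    """Summarise one locus across individuals.

    genotypes: per-individual calls, one of 'hom_ref', 'het', 'hom_alt',
               or None for a missing call (labels are hypothetical).
    ref_reads, alt_reads: per-individual read counts for each allele.

    Returns (proportion of heterozygotes among called individuals,
             z-score of pooled reference-allele reads in heterozygotes
             against the 1:1 expectation).
    """
    called = [g for g in genotypes if g is not None]
    het_idx = [i for i, g in enumerate(genotypes) if g == "het"]
    het_prop = len(het_idx) / len(called) if called else float("nan")

    # Sum reads over all heterozygous individuals to gain power
    ref_total = sum(ref_reads[i] for i in het_idx)
    alt_total = sum(alt_reads[i] for i in het_idx)
    n = ref_total + alt_total
    if n == 0:
        return het_prop, float("nan")
    # Under a 1:1 ratio, ref_total ~ Binomial(n, 0.5): mean n/2, variance n/4
    z = (ref_total - 0.5 * n) / math.sqrt(0.25 * n)
    return het_prop, z

# Hypothetical locus: every individual is called heterozygous and reads
# are skewed roughly 3:1, the pattern one would expect from a paralog
genos = ["het", "het", "het", "het"]
het_prop, z = locus_summary(genos, [30, 28, 33, 29], [10, 12, 9, 9])
print(f"heterozygote proportion = {het_prop:.2f}, z = {z:.2f}")
```

Loci with both a high heterozygote proportion and a large absolute z‐score would be the candidates for paralogy; the cut‐offs are a judgment call and are not specified in the abstract.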

15.
Developing genomic insights is challenging in nonmodel species for which resources are often scarce and prohibitively costly. Here, we explore the potential of a recently established approach using Pool‐seq data to generate a de novo genome assembly for mining exons, upon which Pool‐seq data are used to estimate population divergence and diversity. We do this for two pairs of sympatric populations of brown trout (Salmo trutta): one naturally sympatric set of populations and another pair of populations introduced to a common environment. We validate our approach by comparing the results to those from markers previously used to describe the populations (allozymes and individual‐based single nucleotide polymorphisms [SNPs]) and from mapping the Pool‐seq data to a reference genome of the closely related Atlantic salmon (Salmo salar). We find that genomic differentiation (FST) between the two introduced populations exceeds that of the naturally sympatric populations (FST = 0.13 and 0.03 between the introduced and the naturally sympatric populations, respectively), in concordance with estimates from the previously used SNPs. The same level of population divergence is found for the two genome assemblies, but estimates of average nucleotide diversity differ (π ≈ 0.002 and π ≈ 0.001 when mapping to S. trutta and S. salar, respectively), although the relationships between population values are largely consistent. This discrepancy might be attributed to biases when mapping to a haploid condensed assembly made of highly fragmented read data compared to using a high‐quality reference assembly from a divergent species. We conclude that the Pool‐seq‐only approach can be suitable for detecting and quantifying genome‐wide population differentiation, and for comparing genomic diversity in populations of nonmodel species where reference genomes are lacking.

16.
A major barrier to evolutionary studies of sex determination and sex chromosomes has been a lack of information on the types of sex‐determining mechanisms that occur among different species. This is particularly problematic in groups where most species lack visually heteromorphic sex chromosomes, such as fish, amphibians and reptiles, because cytogenetic analyses will fail to identify the sex chromosomes in these species. We describe the use of restriction site‐associated DNA (RAD) sequencing, or RAD‐seq, to identify sex‐specific molecular markers and subsequently determine whether a species has male or female heterogamety. To test the accuracy of this technique, we examined the lizard Anolis carolinensis. We performed RAD‐seq on seven male and ten female A. carolinensis and found one male‐specific molecular marker. Anolis carolinensis has previously been shown to possess male heterogamety and the recently published A. carolinensis genome facilitated the characterization of the sex‐specific RAD‐seq marker. We validated the male specificity of the new marker using PCR on additional individuals and also found that it is conserved in some other Anolis species. We discuss the utility of using RAD‐seq to identify sex‐determining mechanisms in other species with cryptic or homomorphic sex chromosomes and the implications for the evolution of male heterogamety in Anolis.

17.
As populations diverge many processes can shape genomic patterns of differentiation. Regions of high differentiation can arise due to divergent selection acting on selected loci, genetic hitchhiking of nearby loci, or through repeated selection against deleterious alleles (linked background selection); this divergence may then be further elevated in regions of reduced recombination. Atlantic salmon (Salmo salar) from Europe and North America diverged >600,000 years ago and despite some evidence of secondary contact, the majority of genetic data indicate substantial divergence between lineages. This deep divergence with potential gene flow provides an opportunity to investigate the role of different mechanisms that shape the genomic landscape during early speciation. Here, using 184,295 single nucleotide polymorphisms (SNPs) and 80 populations, we investigate the genomic landscape of differentiation across the Atlantic Ocean with a focus on highly differentiated regions and the processes shaping them. We found evidence of high (mean FST = 0.26) and heterogeneous genomic differentiation between continents. Genomic regions associated with high trans‐Atlantic differentiation ranged in size from single loci (SNPs) within important genes to large regions (1–3 Mbp) on four chromosomes (Ssa06, Ssa13, Ssa16 and Ssa19). These regions showed signatures consistent with selection, including high linkage disequilibrium, despite no significant reduction in recombination. Genes and functional enrichment of processes associated with differentiated regions may highlight continental differences in ocean navigation and parasite resistance. Our results provide insight into potential mechanisms underlying differences between continents, and evidence of near‐fixed and potentially adaptive trans‐Atlantic differences concurrent with a background of high genome‐wide differentiation supports subspecies designation in Atlantic salmon.

18.
Wild populations of Atlantic salmon have declined worldwide. While the causes for this decline may be complex and numerous, increased mortality at sea is predicted to be one of the major contributing factors. Examining the potential changes occurring in the genome‐wide composition of populations during this migration has the potential to tease apart some of the factors influencing marine mortality. Here, we genotyped 5568 SNPs in Atlantic salmon populations representing two distinct regional genetic groups and across two cohorts to test for differential allelic and genotypic frequencies between juveniles (smolts) migrating to sea and adults (grilses) returning to freshwater after 1 year at sea. Given the complexity of the traits potentially associated with sea mortality, we contrasted the outcomes of a single‐locus FST based genome scan method with a new multilocus framework to test for genetically based differential mortality at sea. While numerous outliers were identified by the single‐locus analysis, no evidence for parallel, temporally repeated selection was found. In contrast, the multilocus approach detected repeated patterns of selection for a multilocus group of 34 covarying SNPs in one of the two populations. No significant pattern of selective mortality was detected in the other population, suggesting different causes of mortality among populations. These results first support the hypothesis that selection mainly causes small changes in allele frequencies among many covarying loci rather than a small number of changes in loci with large effects. They also point out that moving away from a strict ‘selective sweep paradigm’ towards a multilocus genetics framework may be a more useful approach for studying the genomic signatures of natural selection on complex traits in wild populations.

19.
Reduced representation genome sequencing such as restriction‐site‐associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach that was simultaneously identified and scored in a genome‐wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
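The step from nucleotide diversity to long‐term effective population size in the abstract above can be followed with a short calculation. The abstract does not say which relation or mutation rates were used; the sketch below assumes the standard neutral expectation for a diploid population, π = 4·Ne·μ, and assumes mutation rates of 1e-8 and 1e-9 per site per generation because those values reproduce the reported range. The function name is hypothetical.

```python
def effective_population_size(pi, mu):
    """Long-term Ne from nucleotide diversity, assuming pi = 4 * Ne * mu.

    pi: average nucleotide diversity per site
    mu: mutation rate per site per generation (an assumed value)
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)

pi_eel = 0.00529  # average nucleotide diversity reported in the abstract
for mu in (1e-8, 1e-9):  # assumed mutation rates, not stated in the abstract
    ne = effective_population_size(pi_eel, mu)
    print(f"mu = {mu:g}: Ne = {ne:,.0f}")
```

Under these assumptions the two results round to about 132 000 and 1 320 000, matching the bounds quoted in the abstract; a tenfold uncertainty in μ maps directly onto a tenfold range in Ne.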

20.
Chilean mussel populations have been thought to be panmictic with limited genetic structure. Genotyping‐by‐sequencing approaches have enabled investigation of genomewide variation that may better distinguish populations that have evolved in different environments. We investigated neutral and adaptive genetic variation in Mytilus from six locations in southern Chile with 1240 SNPs obtained with RAD‐seq. Differentiation among locations with 891 neutral SNPs was low (FST = 0.005). Higher differentiation was obtained with a panel of 58 putative outlier SNPs (FST = 0.114) indicating the potential for local adaptation. This panel identified clusters of genetically related individuals and demonstrated that much of the differentiation (~92%) could be attributed to the three major regions and environments: extreme conditions in Patagonia, inner bay influenced by aquaculture (Reloncaví), and outer bay (Chiloé Island). Patagonia samples were most distinct, but additional analysis carried out excluding this collection also revealed adaptive divergence between inner and outer bay samples. The four locations within Reloncaví area were most similar with all panels of markers, likely due to similar environments, high gene flow by aquaculture practices, and low geographical distance. Our results and the SNP markers developed will be a powerful tool supporting management and programs of this harvested species.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号