Similar literature
 Found 20 similar articles; search took 31 ms
1.
Whole‐genome duplications have occurred in the recent ancestors of many plants, fish and amphibians. Signals of these whole‐genome duplications still exist in the form of paralogous loci. Recent advances have allowed reliable identification of paralogs in genotyping‐by‐sequencing (GBS) data such as that generated from restriction‐site‐associated DNA sequencing (RADSeq); however, excluding paralogs from analyses is still routine due to difficulties in genotyping. This exclusion of paralogs may filter a large fraction of loci, including loci that may be adaptively important or informative for population genetic analyses. We present a maximum‐likelihood method for inferring allele dosage in paralogs and assess its accuracy using simulated GBS, empirical RADSeq and amplicon sequencing data from Chinook salmon. We accurately infer allele dosage for some paralogs from a RADSeq data set and show how accuracy is dependent upon both read depth and allele frequency. The amplicon sequencing data set, using RADSeq‐derived markers, achieved sufficient depth to infer allele dosage for all paralogs. This study demonstrates that RADSeq locus discovery combined with amplicon sequencing of targeted loci is an effective method for incorporating paralogs into population genetic analyses.
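
The abstract above does not spell out its likelihood model, so the following is only a minimal sketch of how maximum‐likelihood dosage inference from read counts might look. It assumes a plain binomial model over four gene copies with a symmetric sequencing error rate; the function names and the `error` parameter are hypothetical and are not taken from the paper.

```python
from math import log

def dosage_log_likelihoods(alt_reads, total_reads, copies=4, error=0.01):
    """Log-likelihood of each alt-allele dosage (0..copies) under an assumed binomial model."""
    lls = {}
    for d in range(copies + 1):
        # Expected fraction of alt reads for dosage d, allowing symmetric read error.
        p = (d / copies) * (1 - error) + (1 - d / copies) * error
        # The binomial coefficient is identical for every dosage, so it is omitted.
        lls[d] = alt_reads * log(p) + (total_reads - alt_reads) * log(1 - p)
    return lls

def ml_dosage(alt_reads, total_reads, copies=4, error=0.01):
    """Return the dosage with the highest log-likelihood."""
    lls = dosage_log_likelihoods(alt_reads, total_reads, copies, error)
    return max(lls, key=lls.get)

if __name__ == "__main__":
    print(ml_dosage(25, 100))                 # deep coverage: dosages are well separated
    print(dosage_log_likelihoods(3, 8))       # shallow coverage: neighbouring dosages score similarly
```

Under this simplified model, neighbouring dosages receive nearly equal likelihoods at low read depth, which is in line with the abstract's remark that accuracy depends on depth.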

2.
Gene sequence similarity due to shared ancestry after a duplication event, that is, paralogy, complicates the assessment of genetic variation, as sequences originating from paralogs can be difficult to distinguish. These confounded sequences are often removed prior to further analyses, leaving the underlying loci uncharacterized. Salmonids have only partially rediploidized subsequent to a whole‐genome duplication; residual tetrasomic inheritance has been observed in males. We present a maximum‐likelihood‐based method to resolve confounded paralogous loci by observing the segregation of alleles in gynogenetic haploid offspring and demonstrate its effectiveness by constructing two linkage maps for chum salmon (Oncorhynchus keta), with and without these newly resolved loci. We find that the resolved paralogous loci are not randomly distributed across the genome. A majority are clustered in expanded subtelomeric regions of 14 linkage groups, suggesting a significant fraction of the chum salmon genome may be missed by the exclusion of paralogous loci. Transposable elements have been proposed as drivers of genome evolution and, in salmonids, may have an important role in the rediploidization process by driving differentiation between homeologous chromosomes. Consistent with that hypothesis, we find a reduced fraction of transposable element annotations among paralogous loci, and these loci predominantly occur in the genomic regions that lag in the rediploidization process.

3.
4.
5.
Whole‐genome duplications are major evolutionary events with a lasting impact on genome structure. Duplication events complicate genetic analyses as paralogous sequences are difficult to distinguish; consequently, paralogs are often excluded from studies. The effects of an ancient whole‐genome duplication (approximately 88 MYA) are still evident in salmonids through the persistence of numerous paralogous gene sequences and partial tetrasomic inheritance. We use restriction site‐associated DNA sequencing on 10 collections of chum salmon from the Salish Sea in the USA and Canada to investigate genetic diversity and population structure in both tetrasomic and rediploidized regions of the genome. We use a pedigree and high‐density linkage map to identify paralogous loci and to investigate genetic variation across the genome. By applying multivariate statistical methods, we show that it is possible to characterize paralogous loci and that they display similar patterns of population structure as the diploidized portion of the genome. We find genetic associations with the adaptively important trait of run‐timing in both sets of loci. By including paralogous loci in genome scans, we can observe evolutionary signals in genomic regions that have routinely been excluded from population genetic studies in other polyploid‐derived species.

6.
Establishing the sex of individuals in wild systems can be challenging and often requires genetic testing. Genotyping‐by‐sequencing (GBS) and other reduced‐representation DNA sequencing (RRS) protocols (e.g., RADseq, ddRAD) have enabled the analysis of genetic data on an unprecedented scale. Here, we present a novel approach for the discovery and statistical validation of sex‐specific loci in GBS data sets. We used GBS to genotype 166 New Zealand fur seals (NZFS, Arctocephalus forsteri) of known sex. We retained monomorphic loci as potential sex‐specific markers in the locus discovery phase. We then used (i) a sex‐specific locus threshold (SSLT) to identify significantly male‐specific loci within our data set; and (ii) a significant sex‐assignment threshold (SSAT) to confidently assign sex in silico, based on the presence or absence of significantly male‐specific loci, to individuals in our data set treated as unknowns (98.9% accuracy for females; 95.8% for males, estimated via cross‐validation). Furthermore, we assigned sex to 86 individuals of true unknown sex using our SSAT and assessed the effect of SSLT adjustments on these assignments. From 90 verified sex‐specific loci, we developed a panel of three sex‐specific PCR primers that we used to ascertain sex independently of our GBS data, which we show amplify reliably in at least two other pinniped species. Using monomorphic loci normally discarded from large SNP data sets is an effective way to identify robust sex‐linked markers for nonmodel species. Our novel pipeline can be used to identify and statistically validate monomorphic and polymorphic sex‐specific markers across a range of species and RRS data sets.

7.
Restriction‐site associated DNA sequencing (RAD‐seq) can identify and score thousands of genetic markers from a group of samples for population‐genetics studies. One challenge of de novo RAD‐seq analysis is to distinguish paralogous sequence variants (PSVs) from true single‐nucleotide polymorphisms (SNPs) associated with orthologous loci. In the absence of a reference genome, it is difficult to differentiate true SNPs from PSVs, and their impact on downstream analysis remains unclear. Here, we introduce a network‐based approach, PMERGE, which connects fragments based on their DNA sequence similarity to identify probable PSVs. Applying our method to de novo RAD‐seq data from 150 Atlantic salmon (Salmo salar) samples collected from 15 locations across the Southern Newfoundland coast allowed the identification of 87% of total PSVs identified through alignment to the Atlantic salmon genome. Removal of these paralogs altered the inferred population structure, highlighting the potential impact of filtering in RAD‐seq analysis. PMERGE was also applied to a green crab (Carcinus maenas) data set consisting of 242 samples from 11 different locations and successfully identified and removed the majority of paralogous loci (62%). The PMERGE software can be run as part of the widely used Stacks analysis package.
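
The idea of connecting fragments by sequence similarity can be illustrated with a generic graph sketch. This is not the PMERGE implementation: the Hamming‐distance criterion, the `max_mismatches` threshold and the function names are assumptions chosen for illustration. Fragments are nodes, an edge joins two equal‐length fragments that differ at few positions, and connected components containing more than one fragment are flagged as candidate paralog groups.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def similarity_clusters(fragments, max_mismatches=2):
    """Group fragment indices into connected components of the similarity graph."""
    parent = list(range(len(fragments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(fragments)), 2):
        same_length = len(fragments[i]) == len(fragments[j])
        if same_length and hamming(fragments[i], fragments[j]) <= max_mismatches:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(len(fragments)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

if __name__ == "__main__":
    frags = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG"]
    print(similarity_clusters(frags, max_mismatches=1))
```

Because membership is transitive, two fragments can land in the same component even when they differ by more than the threshold, as long as a chain of similar fragments links them.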

8.
In a de novo genotyping‐by‐sequencing (GBS) analysis of short, 64‐base tag‐level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag‐level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635‐line diversity panel were used to infer chromosome‐level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high‐resolution genome analysis and genomic selection in oats. A combined genome‐wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component‐based genome‐wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to Northern American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS‐derived markers facilitate genome analysis and genomic selection in oat.

9.
Genotyping‐by‐sequencing (GBS) and related methods are increasingly used for studies of non‐model organisms from population genetic to phylogenetic scales. We present GIbPSs, a new genotyping toolkit for the analysis of data from various protocols such as RAD, double‐digest RAD, GBS, and two‐enzyme GBS without a reference genome. GIbPSs can handle paired‐end GBS data and is able to assign reads from both strands of a restriction fragment to the same locus. GIbPSs is most suitable for population genetic and phylogeographic analyses. It avoids genotyping errors due to indel variation by identifying and discarding affected loci. GIbPSs creates a genotype database that offers rich functionality for data filtering and export in numerous formats. We performed comparative analyses of simulated and real GBS data with GIbPSs and another program, pyRAD. This program accounts for indel variation by aligning homologous sequences. GIbPSs performed better than pyRAD in several aspects. It required much less computation time and displayed higher genotyping accuracy. GIbPSs retained smaller numbers of loci overall in analyses of real GBS data. It nevertheless delivered more complete genotype matrices with greater locus overlap between individuals and greater numbers of loci sampled in all individuals.

10.
Approximate Bayesian computation (ABC) is a powerful tool for model‐based inference of demographic histories from large genetic data sets. For most organisms, its implementation has been hampered by the lack of sufficient genetic data. Genotyping‐by‐sequencing (GBS) provides cheap genome‐scale data to fill this gap, but its potential has not fully been exploited. Here, we explored power, precision and biases of a coalescent‐based ABC approach where GBS data were modelled with either a population mutation parameter (θ) or a fixed site (FS) approach, allowing single or several segregating sites per locus. With simulated data ranging from 500 to 50 000 loci, a variety of demographic models could be reliably inferred across a range of timescales and migration scenarios. Posterior estimates were informative with 1000 loci for migration and split time in simple population divergence models. In more complex models, posterior distributions were wide and almost reverted to the uninformative prior even with 50 000 loci. ABC parameter estimates, however, were generally more accurate than an alternative composite‐likelihood method. Bottleneck scenarios proved particularly difficult, and only recent bottlenecks without recovery could be reliably detected and dated. Notably, minor‐allele‐frequency filters – usual practice for GBS data – negatively affected nearly all estimates. With this in mind, we used a combination of FS and θ approaches on empirical GBS data generated from the Atlantic walrus (Odobenus rosmarus rosmarus), collectively providing support for a population split before the last glacial maximum followed by asymmetrical migration and a high Arctic bottleneck. Overall, this study evaluates the potential and limitations of GBS data in an ABC‐coalescence framework and proposes a best‐practice approach.
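
For readers unfamiliar with ABC, a toy rejection sampler for θ may help make the approach concrete. This is a sketch under strong assumptions and is far simpler than the demographic models in the study: a single constant‐size population, a flat prior on θ between 0 and 10, and the mean number of segregating sites per locus as the only summary statistic. All function names, the prior and the tolerance are hypothetical.

```python
import random

def simulate_segregating_sites(theta, n_samples, rng):
    """Segregating sites at one locus under the standard neutral coalescent."""
    tree_length = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages, the waiting time is Exp(k(k-1)/2) and spans k branches.
        tree_length += k * rng.expovariate(k * (k - 1) / 2)
    # Mutations fall on the tree as a Poisson process with rate theta/2 per unit length;
    # the Poisson draw is made by counting unit-rate arrivals before time lam.
    lam = theta / 2 * tree_length
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def abc_rejection(observed_mean, n_samples, n_loci, n_draws, tolerance, seed=1):
    """Keep prior draws of theta whose simulated summary lies near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)  # hypothetical flat prior
        sims = [simulate_segregating_sites(theta, n_samples, rng) for _ in range(n_loci)]
        if abs(sum(sims) / n_loci - observed_mean) <= tolerance:
            accepted.append(theta)
    return accepted

if __name__ == "__main__":
    posterior = abc_rejection(observed_mean=5.66, n_samples=10, n_loci=50,
                              n_draws=1000, tolerance=0.5)
    print(len(posterior), sum(posterior) / len(posterior))
```

The observed mean of 5.66 is an illustrative value: it is roughly what θ = 2 would produce for 10 sampled chromosomes. The accepted draws approximate the posterior, and tightening `tolerance` trades acceptance rate for accuracy.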

11.
Blue catfish, Ictalurus furcatus, are valued in the United States as a trophy fishery for their capacity to reach large sizes, sometimes exceeding 45 kg. Additionally, blue catfish × channel catfish (I. punctatus) hybrid food fish production has recently increased the demand for blue catfish broodstock. However, there has been little study of the genetic impacts and interaction of farmed, introduced and stocked populations of blue catfish. We utilized genotyping‐by‐sequencing (GBS) to capture and genotype SNP markers on 190 individuals from five wild and domesticated populations (Mississippi River, Missouri, D&B, Rio Grande and Texas). Stringent filtering of SNP‐calling parameters resulted in 4275 SNP loci represented across all five populations. Population genetics and structure analyses revealed potential shared ancestry and admixture between populations. We utilized the Sequenom MassARRAY to validate two multiplex panels of SNPs selected from the GBS data. Selection criteria included SNPs shared between populations, SNPs specific to populations, number of reads per individual and number of individuals genotyped by GBS. Putative SNPs were validated in the discovery population and in two additional populations not used in the GBS analysis. A total of 64 SNPs were genotyped successfully in 191 individuals from nine populations. Our results should guide the development of highly informative, flexible genotyping multiplexes for blue catfish from the larger GBS SNP set as well as provide an example of a rapid, low‐cost approach to generate and genotype informative marker loci in aquatic species with minimal previous genetic information.

12.
The Salmoniform whole‐genome duplication is hypothesized to have facilitated the evolution of anadromy, but little is known about the contribution of paralogs from this event to the physiological performance traits required for anadromy, such as salinity tolerance. Here, we determined when two candidate, salinity‐responsive paralogs of the Na+, K+ ATPase α subunit (α1a and α1b) evolved and studied their evolutionary trajectories and tissue‐specific expression patterns. We found that these paralogs arose during a small‐scale duplication event prior to the Salmoniform, but after the teleost, whole‐genome duplication. The ‘freshwater paralog’ (α1a) is primarily expressed in the gills of Salmoniformes and an unduplicated freshwater sister species (Esox lucius) and experienced positive selection in the freshwater ancestor of Salmoniformes and Esociformes. Contrary to our predictions, the ‘saltwater paralog’ (α1b), which is more widely expressed than α1a, did not experience positive selection during the evolution of anadromy in the Coregoninae and Salmoninae. To determine whether parallel mutations in Na+, K+ ATPase α1 may contribute to salinity tolerance in other fishes, we studied independently evolved salinity‐responsive Na+, K+ ATPase α1 paralogs in Anabas testudineus and Oreochromis mossambicus. We found that a quarter of the mutations occurring between salmonid α1a and α1b in functionally important sites also evolved in parallel in at least one of these species. Together, these data argue that paralogs contributing to salinity tolerance evolved prior to the Salmoniform whole‐genome duplication and that strong selection and/or functional constraints have led to parallel evolution in salinity‐responsive Na+, K+ ATPase α1 paralogs in fishes.

13.
Salmonid genomes are considered to be in a pseudo‐tetraploid state as a result of a genome duplication event that occurred between 25 and 100 Ma. This situation complicates single‐nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and not simple allelic variants. To differentiate PSVs from simple allelic variants, we used 19 homozygous doubled haploid (DH) lines that represent a wide geographical range of rainbow trout populations. In the first phase of the study, we analysed SbfI restriction‐site associated DNA (RAD) sequence data from all the 19 lines and selected 11 lines for an extended SNP discovery. In the second phase, we conducted the extended SNP discovery using PstI RAD sequence data from the selected 11 lines. The complete data set is composed of 145 168 high‐quality putative SNPs that were genotyped in at least nine of the 11 lines, of which 71 446 (49%) had minor allele frequencies (MAF) of at least 18% (i.e. at least two of the 11 lines). Approximately 14% of the RAD SNPs in this data set are from expressed or coding rainbow trout sequences. Our comparison of the current data set with previous SNP discovery data sets revealed that 99% of our SNPs are novel. In the support files for this resource, we provide annotation to the positions of the SNPs in the working draft of the rainbow trout reference genome, provide the genotypes of each sample in the discovery panel and identify SNPs that are likely to be in coding sequences.

14.
Verticillium wilt (VW) is a fungal disease that causes severe yield losses in alfalfa. The most effective method to control the disease is through the development and use of resistant varieties. The identification of marker loci linked to VW resistance can facilitate breeding for disease‐resistant alfalfa. In the present investigation, we applied an integrated framework of genome‐wide association with genotyping‐by‐sequencing (GBS) to identify VW resistance loci in a panel of elite alfalfa breeding lines. Phenotyping was performed by manual inoculation of the pathogen to healthy seedlings, and scoring for disease resistance was carried out according to the standard test of the North American Alfalfa Improvement Conference (NAAIC). Marker–trait association by linkage disequilibrium identified 10 single nucleotide polymorphism (SNP) markers significantly associated with VW resistance. Alignment of the SNP marker sequences to the M. truncatula genome revealed multiple quantitative trait loci (QTLs). Three, two, one and five markers were located on chromosomes 5, 6, 7 and 8, respectively. Resistance loci found on chromosomes 7 and 8 in the present study co‐localized with the QTLs reported previously. A pairwise alignment (blastn) using the flanking sequences of the resistance loci against the M. truncatula genome identified potential candidate genes with putative disease resistance function. With further investigation, these markers may be implemented into breeding programmes using marker‐assisted selection, ultimately leading to improved VW resistance in alfalfa.

15.
Targeted GBS is a recent approach for obtaining an effective characterization for hundreds to thousands of markers. The high throughput of next‐generation sequencing technologies, moreover, allows sample multiplexing. The aims of this study were to (i) define a panel of single nucleotide polymorphisms (SNPs) in the cat, (ii) use GBS for profiling 16 cats, and (iii) evaluate the performance with respect to the inference using standard approaches at different coverage thresholds, thereby providing useful information for designing similar experiments. Probes for sequencing 230 variants were designed based on the Felis_catus_8.0 genome. The regions comprised anonymous and non‐anonymous SNPs. Sixteen cat samples were analysed, some of which had already been genotyped in a large group of loci and one having been whole‐genome sequenced in the 99_Lives Cat Genome Sequencing Project. The accuracy of the method was assessed by comparing the GBS results with the genotypes already available. Overall, GBS achieved good performance, with 92–96% correct assignments, depending on the coverage threshold used to define the set of trustable genotypes. Analyses confirmed that (i) the reliability of the inference of each genotype depends on the coverage at that locus and (ii) the fraction of target loci whose genotype can be inferred correctly is a function of the total coverage. GBS proves to be a valid alternative to other methods. Data suggested a depth of less than 11× is required for greater than 95% accuracy. However, sequencing depth must be adapted to the total size of the targets to ensure proper genotype inference.

16.
A whole‐genome duplication (WGD) doubles the entire genomic content of a species and is thought to have catalysed adaptive radiation in some polyploid‐origin lineages. However, little is known about general consequences of a WGD because gene duplicates (i.e., paralogs) are commonly filtered in genomic studies; such filtering may remove substantial portions of the genome in data sets from polyploid‐origin species. We demonstrate a new method that enables genome‐wide scans for signatures of selection at both nonduplicated and duplicated loci by taking locus‐specific copy number into account. We apply this method to RAD sequence data from different ecotypes of a polyploid‐origin salmonid (Oncorhynchus nerka) and reveal signatures of divergent selection that would have been missed if duplicated loci were filtered. We also find conserved signatures of elevated divergence at pairs of homeologous chromosomes with residual tetrasomic inheritance, suggesting that joint evolution of some nondiverged gene duplicates may affect the adaptive potential of these genes. These findings illustrate that including duplicated loci in genomic analyses enables novel insights into the evolutionary consequences of WGDs and local segmental gene duplications.

17.
18.
Restriction‐site associated DNA sequencing (RADSeq) facilitates rapid generation of thousands of genetic markers at relatively low cost; however, several sources of error specific to RADSeq methods often lead to biased estimates of allele frequencies and thereby to erroneous population genetic inference. Estimating the distribution of sample allele frequencies without calling genotypes was shown to improve population inference from whole genome sequencing data, but the ability of this approach to account for RADSeq‐specific biases remains unexplored. Here we assess to what extent genotype‐free methods of allele frequency estimation affect demographic inference from empirical RADSeq data. Using the well‐studied pied flycatcher (Ficedula hypoleuca) as a study system, we compare allele frequency estimation and demographic inference from whole genome sequencing data with that from RADSeq data matched for samples using both genotype‐based and genotype‐free methods. The demographic history of pied flycatchers as inferred from RADSeq data was highly congruent with that inferred from whole genome resequencing (WGS) data when allele frequencies were estimated directly from the read data. In contrast, when allele frequencies were derived from called genotypes, RADSeq‐based estimates of most model parameters fell outside the 95% confidence interval of estimates derived from WGS data. Notably, more stringent filtering of the genotype calls tended to increase the discrepancy between parameter estimates from WGS and RADSeq data. The results from this study demonstrate the ability of genotype‐free methods to improve allele frequency spectrum‐ (AFS‐) based demographic inference from empirical RADSeq data and highlight the need to account for uncertainty in NGS data regardless of sequencing method.
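
One common way to estimate an allele frequency without calling genotypes is to weight every possible genotype by its likelihood given the reads. The sketch below is a generic expectation‐maximization (EM) illustration of that idea, not the software used in the study. It assumes diploid individuals, Hardy–Weinberg genotype priors and a binomial read model with a symmetric error rate; the function names and the `error` parameter are hypothetical.

```python
def genotype_likelihoods(alt, ref, error=0.01):
    """Likelihood of the reads for genotypes carrying 0, 1 or 2 alt alleles (binomial kernel)."""
    likelihoods = []
    for g in (0, 1, 2):
        # Expected alt-read fraction for genotype g, allowing symmetric read error.
        p = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p ** alt * (1 - p) ** ref)
    return likelihoods

def em_allele_frequency(read_counts, error=0.01, n_iter=50):
    """Estimate the alt-allele frequency from (alt, ref) read counts per individual."""
    f = 0.5  # starting value
    for _ in range(n_iter):
        expected_alt = 0.0
        for alt, ref in read_counts:
            gl = genotype_likelihoods(alt, ref, error)
            prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]  # Hardy-Weinberg proportions
            weights = [p * l for p, l in zip(prior, gl)]
            total = sum(weights)
            # E-step: expected number of alt alleles carried by this individual.
            expected_alt += (weights[1] + 2 * weights[2]) / total
        # M-step: update the frequency from the expected allele counts.
        f = expected_alt / (2 * len(read_counts))
    return f

if __name__ == "__main__":
    deep = [(0, 20), (10, 10), (20, 0), (0, 20)]
    shallow = [(0, 2), (1, 1), (2, 0), (0, 2)]
    print(em_allele_frequency(deep), em_allele_frequency(shallow))
```

At high depth the estimate matches what confident genotype calls would give. At low depth uncertain individuals contribute fractional allele counts instead of being forced into a single, possibly wrong, genotype. Very deep data would need log‐space arithmetic to avoid underflow, which this sketch omits.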

19.
The floral polymorphism tristyly involves three style morphs with a reciprocal arrangement of stigma and anther heights governed by two diallelic loci (S and M). Tristyly functions to promote cross‐pollination, but modifications to stamen position commonly cause transitions to selfing. Here, we integrate whole‐genome sequencing and genetic mapping to investigate the genetic architecture of the M locus and the genetic basis of independent transitions to selfing in tristylous Eichhornia paniculata. We crossed independently derived semi‐homostylous selfing variants of the long‐ and mid‐styled morph fixed for alternate alleles at the M locus (ssmm and ssMM, respectively), and backcrossed the F1 to the parental ssmm genotype. We phenotyped and genotyped 462 backcross progeny using 1450 genotyping‐by‐sequencing (GBS) markers and performed composite interval mapping to identify quantitative trait loci (QTL) governing style‐length and anther‐height variation. A QTL associated with the primary style‐morph differences (style length and anther height) mapped to linkage group 5 and spanned ~13–27.5 Mbp of assembled sequence. Bulk segregant analysis identified 334 genes containing SNPs potentially linked to the M locus. The stamen modifications characterizing each selfing variant were governed by loci on different linkage groups. Our results provide an important step towards identifying the M locus and demonstrate that transitions to selfing have originated by independent sets of mating‐system modifier genes unlinked to the M locus, a pattern inconsistent with a recombinational origin of selfing variants at a putative supergene.

20.
Previous studies with rainbow trout (Oncorhynchus mykiss) have shown that allozymic heterozygotes have increased developmental stability, as measured by reduced fluctuating bilateral asymmetry. In this paper, we examine the phenotypic effects of null alleles at two lactate dehydrogenase (LDH) loci. If the association between allozymic heterozygosity and developmental stability is due largely to linked chromosomal segments, then we would expect null allele heterozygotes to have increased developmental stability. In contrast, heterozygotes for LDH null alleles in three populations have reduced developmental stability. This suggests that the reduction in enzyme activity at these loci is having a deleterious effect on development that is strong enough to mask any beneficial effects that may be associated with heterozygosity for these chromosomal segments. The LDH loci examined in this study are members of two different paralogous pairs of duplicate genes produced by the polyploidization of the ancestral salmonid genome. The apparent deleterious effects of these null alleles in heterozygotes could retard the possible loss of duplicate gene expression.
