Similar literature
Found 20 similar articles (search time: 562 ms)
1.
We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high‐throughput sequencing workflows. We fit observed ultra‐deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 bp of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using low‐depth allele validation thresholds used by previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1–6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations, was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, observations provide evidence of MHC CNV and its complex basis in natural populations.

2.
With their direct link to individual fitness, genes of the major histocompatibility complex (MHC) are a popular system to study the evolution of adaptive genetic diversity. However, owing to the highly dynamic evolution of the MHC region, the isolation, characterization and genotyping of MHC genes remain a major challenge. While high‐throughput sequencing technologies now provide unprecedented resolution of the high allelic diversity observed at the MHC, in many species, it remains unclear (i) how alleles are distributed among MHC loci, (ii) whether MHC loci are linked or segregate independently and (iii) how much copy number variation (CNV) can be observed for MHC genes in natural populations. Here, we show that the study of allele segregation patterns within families can provide significant insights in this context. We sequenced two MHC class I (MHC‐I) loci in 1267 European barn owls (Tyto alba), including 590 offspring from 130 families using Illumina MiSeq technology. Coupled with a high per‐individual sequencing coverage (~3000×), the study of allele segregation patterns within families provided information on three aspects of the architecture of MHC‐I variation in barn owls: (i) extensive sharing of alleles among loci, (ii) strong linkage of MHC‐I loci indicating tandem architecture and (iii) the presence of CNV in the barn owl MHC‐I. We conclude that the additional information that can be gained from high‐coverage amplicon sequencing by investigating allele segregation patterns in families not only helps improve the accuracy of MHC genotyping, but also contributes towards enhanced analyses in the context of MHC evolutionary ecology.

3.
4.
High‐throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b‐RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different IIB enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b‐RAD protocols on non‐model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b‐RAD in studies with poor DNA quality. However, library quality was found to be a critical parameter on the number of reads and loci obtained for genotyping. Resampling analyses with different number of reads per individual showed a trade‐off between number of loci and number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth ≥20 reads to acquire good genotyping results. Finally, we demonstrated that selective‐base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, to reduce sequencing costs and to maintain suitable sequencing depth for a reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b‐RAD protocols on non‐model organisms with different genome sizes, helping decision‐making for a reliable and cost‐effective genotyping.
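The use of an accumulation curve to plan sequencing effort can be illustrated with a minimal Python sketch. It assumes the curve is available as a few (reads per individual, mean depth) points from a resampling analysis and interpolates linearly between them; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def reads_for_target_depth(curve, target_depth=20.0):
    """Estimate reads per individual needed to reach a target mean depth.

    curve: iterable of (reads_per_individual, mean_depth) points, e.g. from
    resampling a pilot library at several read counts (hypothetical input).
    Returns None when the target lies beyond the sampled range.
    """
    points = sorted(curve)
    # The smallest sampled read count may already meet the target.
    if points and points[0][1] >= target_depth:
        return float(points[0][0])
    # Linear interpolation between the two points that bracket the target.
    for (r0, d0), (r1, d1) in zip(points, points[1:]):
        if d0 <= target_depth <= d1 and d1 > d0:
            return r0 + (target_depth - d0) * (r1 - r0) / (d1 - d0)
    return None


# Hypothetical pilot curve: depth rises with the number of reads.
pilot_curve = [(100_000, 5.0), (500_000, 15.0), (1_000_000, 25.0)]
print(reads_for_target_depth(pilot_curve))
```

Linear interpolation is only a convenience here; a saturating curve fitted to more resampling points would likely give a better estimate near the plateau.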

5.
The genotyping of highly polymorphic multigene families across many individuals used to be a particularly challenging task because of methodological limitations associated with traditional approaches. Next‐generation sequencing (NGS) can overcome most of these limitations, and it is increasingly being applied in population genetic studies of multigene families. Here, we critically review NGS bioinformatic approaches that have been used to genotype the major histocompatibility complex (MHC) immune genes, and we discuss how the significant advances made in this field are applicable to population genetic studies of gene families. Increasingly, approaches are introduced that apply thresholds of sequencing depth and sequence similarity to separate alleles from methodological artefacts. We explain why these approaches are particularly sensitive to methodological biases by violating fundamental genotyping assumptions. An alternative strategy that utilizes ultra‐deep sequencing (hundreds to thousands of sequences per amplicon) to reconstruct genotypes and applies statistical methods on the sequencing depth to separate alleles from artefacts appears to be more robust. Importantly, the ‘degree of change’ (DOC) method avoids using arbitrary cut‐off thresholds by looking for statistical boundaries between the sequencing depth for alleles and artefacts, and hence, it is entirely repeatable across studies. Although the advances made in generating NGS data are still far ahead of our ability to perform reliable processing, analysis and interpretation, the community is developing statistically rigorous protocols that will allow us to address novel questions in evolution, ecology and genetics of multigene families. Future developments in third‐generation single molecule sequencing may potentially help overcome problems that still persist in de novo multigene amplicon genotyping when using current second‐generation sequencing approaches.
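The idea of looking for a boundary in sequencing depth, rather than applying a fixed cut‐off, can be sketched roughly in Python. This is a simplified illustration and not the published DOC procedure: it assumes variants are ranked by depth and places the allele/artefact boundary at the largest drop in depth between consecutive variants. The function name, the `max_alleles` cap and the example depths are hypothetical.

```python
def estimate_allele_count(depths, max_alleles=10):
    """Guess how many of the top-ranked variants are true alleles.

    depths: per-variant read depths within one amplicon.
    The rate of change of cumulative depth at rank i equals the depth of
    variant i; the boundary is placed where that rate drops the most.
    """
    ranked = sorted(depths, reverse=True)[: max_alleles + 1]
    if len(ranked) < 2:
        return len(ranked)
    # Drop in depth between each variant and the next-ranked one.
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Index of the largest drop; variants up to and including it are kept.
    boundary = max(range(len(drops)), key=lambda i: drops[i])
    return boundary + 1


# Hypothetical amplicon: three deep variants followed by shallow artefacts.
print(estimate_allele_count([400, 380, 350, 20, 12, 5, 3]))
```

Because the boundary is derived from the data of each amplicon, the same input always yields the same call, which is the repeatability property the abstract emphasizes.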

6.
Efficient methods for constructing 16S tag amplicon libraries for pyrosequencing are needed for the rapid and thorough screening of infectious bacterial diversity from host tissue samples. Here we have developed a double‐nested PCR methodology that generates 16S tag amplicon libraries from very small amounts of bacteria/host samples. This methodology was tested for 133 kidney samples from the lake whitefish Coregonus clupeaformis (Salmonidae) sampled in five different lake populations. The double‐nested PCR efficiency was compared with two other PCR strategies: single primer pair amplification and simple nested PCR. The double‐nested PCR was the only amplification strategy to provide highly specific amplification of bacterial DNA. The resulting 16S amplicon libraries were synthesized and pyrosequenced using 454 FLX technology to analyse the variation of pathogenic bacteria abundance. The proportion of the community sequenced was very high (Good’s coverage estimator; mean = 95.4%). Furthermore, there were no significant differences of sequence coverage among samples. Finally, the occurrence of chimeric amplicons was very low. Therefore, the double‐nested PCR approach provides a rapid, informative and cost‐effective method for screening fish immunobiomes and is most likely applicable to other low‐density microbiomes as well.
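Good's coverage estimator mentioned above is commonly computed as C = 1 − F1/N, where F1 is the number of taxa observed exactly once and N is the total number of reads. The short Python sketch below shows that calculation; the function name and the example counts are hypothetical and are not data from the study.

```python
def goods_coverage(taxon_counts):
    """Good's coverage: 1 - (number of singleton taxa / total reads)."""
    total_reads = sum(taxon_counts)
    if total_reads == 0:
        raise ValueError("coverage is undefined for an empty sample")
    singletons = sum(1 for count in taxon_counts if count == 1)
    return 1 - singletons / total_reads


# Hypothetical sample: 100 reads across 8 taxa, 5 of them seen only once.
print(goods_coverage([50, 30, 15, 1, 1, 1, 1, 1]))
```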

7.
While various technologies for high‐throughput genotyping have been developed for ecological studies, simple methods tolerant to low‐quality DNA samples are still limited. In this study, we tested the applicability of a random PCR‐based genotyping‐by‐sequencing technology, genotyping by random amplicon sequencing, direct (GRAS‐Di). We focused on population genetic analysis of estuarine mangrove fishes, including two resident species, the Amboina cardinalfish (Fibramia amboinensis, Bleeker, 1853) and the Duncker's river garfish (Zenarchopterus dunckeri, Mohr, 1926), and a marine migrant, the blacktail snapper (Lutjanus fulvus, Forster, 1801). Collections were from the Ryukyu Islands, southern Japan. PCR amplicons derived from ~130 individuals were pooled and sequenced in a single lane on a HiSeq2500 platform, and an average of three million reads was obtained per individual. Consensus contigs were assembled for each species and used for genotyping of single nucleotide polymorphisms by mapping trimmed reads onto the contigs. After quality filtering steps, 4,000–9,000 putative single nucleotide polymorphisms were detected for each species. Although DNA fragmentation can diminish genotyping performance when analysed on next‐generation sequencing technology, the effect was small. Genetic differentiation and a clear pattern of isolation‐by‐distance were observed in F. amboinensis and Z. dunckeri by means of principal component analysis, FST and admixture analysis. By contrast, L. fulvus comprised a genetically homogeneous population with directional recent gene flow. These genetic differentiation patterns reflect patterns of estuary use through life history. These results showed the power of GRAS‐Di for fine‐grained genetic analysis using field samples, including mangrove fishes.

8.
Next‐generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus‐specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post‐processing of NGS data. Amplicon Sequence Assignment (AmpliSAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AmpliSAS is designed as a three‐step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. AmpliSAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.
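The three steps named above (demultiplexing, clustering of unique sequences, filtering of erroneous sequences) can be mimicked with a toy Python sketch. It is not AmpliSAS code and does not reproduce its algorithms: here "clustering" only collapses identical sequences, filtering is a plain per‐amplicon frequency threshold, and all function names, barcodes, reads and the `min_freq` value are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex(reads, barcodes):
    """Step (i): assign reads to samples by a leading barcode and trim it."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break  # reads without a known barcode are simply skipped
    return by_sample


def filter_low_frequency(counts, min_freq):
    """Step (iii): drop variants below a per-amplicon frequency threshold."""
    total = sum(counts.values())
    return {seq: n for seq, n in counts.items() if n / total >= min_freq}


def genotype(reads, barcodes, min_freq=0.1):
    """Run the toy pipeline; step (ii) is the Counter over identical reads."""
    return {
        sample: filter_low_frequency(Counter(seqs), min_freq)
        for sample, seqs in demultiplex(reads, barcodes).items()
    }


# Hypothetical input: two barcoded individuals, one low-frequency variant.
barcodes = {"AA": "ind1", "CC": "ind2"}
reads = ["AAGATTACA"] * 9 + ["AAGATTACC"] + ["CCGATTTCA"] * 5
print(genotype(reads, barcodes, min_freq=0.2))
```

A real pipeline would also merge error‐containing variants into their parent sequences before filtering, which is where most of the complexity lies.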

9.
Although approaches for performing genome‐wide association studies (GWAS) are well developed, conventional GWAS requires high‐density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP‐GWAS (extreme‐phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well‐characterized kernel row number trait, which was selected to enable comparisons between the results of XP‐GWAS and conventional GWAS. An exome‐sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait‐associated variants were significantly enriched in regions identified by conventional GWAS. XP‐GWAS was able to resolve several linked QTL and detect trait‐associated variants within a single gene under a QTL peak. XP‐GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest.
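Comparing allele frequencies between two extreme pools can be illustrated with a small Python sketch. The abstract does not state which test the authors used, so the Pearson chi‐square statistic on a 2×2 table of allele read counts shown here is an assumed stand‐in; the function name and the counts are hypothetical.

```python
def pool_allele_chi2(ref_low, alt_low, ref_high, alt_high):
    """Pearson chi-square statistic for allele counts in two pools.

    Rows are the low- and high-phenotype pools, columns are reference and
    alternative allele read counts at one variant.
    """
    table = [[ref_low, alt_low], [ref_high, alt_high]]
    row_totals = [sum(row) for row in table]
    col_totals = [ref_low + ref_high, alt_low + alt_high]
    grand_total = sum(row_totals)
    # A monomorphic variant or an empty pool carries no signal.
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical variant: alternative allele at 10% in one pool, 50% in the other.
print(round(pool_allele_chi2(90, 10, 50, 50), 3))
```

With roughly a million variants tested, any real analysis would also need a multiple‐testing correction before calling a variant trait‐associated.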

10.
Genotyping of classical major histocompatibility complex (MHC) genes is challenging when they are hypervariable and occur in multiple copies. In this study, we used several different approaches to genotype the moderately variable MHC class I exon 3 (MHCIe3) and the highly polymorphic MHC class II exon 2 (MHCIIβe2) in the bluethroat (Luscinia svecica). Two family groups (eight individuals) were sequenced in replicates at both markers using Ion Torrent technology with both a single‐ and a dual‐indexed primer structure. Additionally, MHCIIβe2 was sequenced on Illumina MiSeq. Allele calling was conducted by modifications of the pipeline developed by Sommer et al. (BMC Genomics, 14, 2013, 542) and the software AmpliSAS. While the different genotyping strategies gave largely consistent results for MHCIe3, with a maximum of eight alleles per individual, MHCIIβe2 was remarkably complex with a maximum of 56 MHCIIβe2 alleles called for one individual. Each genotyping strategy detected on average 50%–82% of all MHCIIβe2 alleles per individual, but dropouts were largely allele‐specific and consistent within families for each strategy. The discrepancies among approaches indicate PCR biases caused by the platform‐specific primer tails. Further, AmpliSAS called fewer alleles than the modified Sommer pipeline. Our results demonstrate that allelic dropout is a significant problem when genotyping the hypervariable MHCIIβe2. As these genotyping errors are largely nonrandom and method‐specific, we caution against comparing genotypes across different genotyping strategies. Nevertheless, we conclude that high‐throughput approaches provide a major advance in the challenging task of genotyping hypervariable MHC loci, even though they may not reveal the complete allelic repertoire.

11.
Research in evolutionary biology involving nonmodel organisms is rapidly shifting from using traditional molecular markers such as mtDNA and microsatellites to higher throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double‐digest RAD sequencing for Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single‐end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool to process RAD sequencing data. On average, we obtained ~11 000 polymorphic loci per library of 6–30 individuals. We validate our new method by technical and biological replication, by reconstructing phylogenetic relationships, and using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between using the different sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used for Ion semiconductor sequencing platforms for the rapid and cost‐effective generation of variable and reproducible genetic markers.

12.
In the last decade, the revolution in sequencing technologies has deeply impacted crop genotyping practice. New methods allowing rapid, high‐throughput genotyping of entire crop populations have proliferated and opened the door to wider use of molecular tools in plant breeding. These new genotyping‐by‐sequencing (GBS) methods include over a dozen reduced‐representation sequencing (RRS) approaches and at least four whole‐genome resequencing (WGR) approaches. The diversity of methods available, each often producing different types of data at different cost, can make selection of the best‐suited method seem a daunting task. We review the most common genotyping methods used today and compare their suitability for linkage mapping, genomewide association studies (GWAS), marker‐assisted and genomic selection and genome assembly and improvement in crops with various genome sizes and complexity. Furthermore, we give an outline of bioinformatics tools for analysis of genotyping data. WGR is well suited to genotyping biparental cross populations with complex, small‐ to moderate‐sized genomes and provides the lowest cost per marker data point. RRS approaches differ in their suitability for various tasks, but demonstrate similar costs per marker data point. These approaches are generally better suited for de novo applications and more cost‐effective when genotyping populations with large genomes or high heterozygosity. We expect that although RRS approaches will remain the most cost‐effective for some time, WGR will become more widespread for crop genotyping as sequencing costs continue to decrease.

13.
Whole‐genome duplications have occurred in the recent ancestors of many plants, fish and amphibians. Signals of these whole‐genome duplications still exist in the form of paralogous loci. Recent advances have allowed reliable identification of paralogs in genotyping‐by‐sequencing (GBS) data such as that generated from restriction‐site‐associated DNA sequencing (RADSeq); however, excluding paralogs from analyses is still routine due to difficulties in genotyping. This exclusion of paralogs may filter a large fraction of loci, including loci that may be adaptively important or informative for population genetic analyses. We present a maximum‐likelihood method for inferring allele dosage in paralogs and assess its accuracy using simulated GBS, empirical RADSeq and amplicon sequencing data from Chinook salmon. We accurately infer allele dosage for some paralogs from a RADSeq data set and show how accuracy is dependent upon both read depth and allele frequency. The amplicon sequencing data set, using RADSeq‐derived markers, achieved sufficient depth to infer allele dosage for all paralogs. This study demonstrates that RADSeq locus discovery combined with amplicon sequencing of targeted loci is an effective method for incorporating paralogs into population genetic analyses.
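A maximum‐likelihood dosage call of the kind described above can be sketched in a few lines of Python. This is an assumed, simplified model and not the authors' method: a duplicated locus is taken to carry four allele copies, reads of one allele are treated as binomial draws with success probability set by the dosage and a symmetric sequencing error rate, and the dosage with the highest likelihood is returned. The function name, the `copies` and `error` defaults and the read counts are hypothetical.

```python
import math


def ml_allele_dosage(allele_reads, total_reads, copies=4, error=0.01):
    """Return the dosage (0..copies) maximizing a binomial read likelihood.

    error must lie strictly between 0 and 0.5 so every probability stays
    inside (0, 1) and the logarithms are defined.
    """
    other_reads = total_reads - allele_reads
    best_dosage, best_loglik = 0, -math.inf
    for dosage in range(copies + 1):
        fraction = dosage / copies
        # Expected read fraction for this dosage, allowing for miscalls.
        p = fraction * (1 - error) + (1 - fraction) * error
        # The binomial coefficient is constant across dosages, so it is omitted.
        loglik = allele_reads * math.log(p) + other_reads * math.log(1 - p)
        if loglik > best_loglik:
            best_dosage, best_loglik = dosage, loglik
    return best_dosage


# Hypothetical paralog: 26 of 100 reads carry the allele.
print(ml_allele_dosage(26, 100))
```

At low depth the likelihoods of neighbouring dosages become close, which is consistent with the abstract's point that accuracy depends on read depth.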

14.
Microsatellite marker development has been greatly simplified by the use of high‐throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by the existing pipelines depends either on arbitrary criteria, or older studies on PCR success. Based on wet laboratory experiments, we have identified the following factors that are most likely to influence genotyping success rate: alignment score between the primers and the amplicon; the distance between primers and microsatellites; the length of the PCR product; target region complexity and the number of reads underlying the sequence. The QDD pipeline has been modified to include these most pertinent factors in the output to help the selection of markers. Furthermore, new features are also included in the present version: (i) not only raw sequencing reads are accepted as input, but also contigs, allowing the analysis of assembled high‐coverage data; (ii) input data can be in both fasta and fastq format to facilitate the use of Illumina and IonTorrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the nucleotide (nt) database of NCBI; (v) QDD3 is now also available embedded in a virtual machine, making installation easier and operating system independent. It can be used both as a command‐line tool and integrated into a Galaxy server, which provides a user‐friendly interface and the possibility to utilize a large variety of NGS tools.

15.
In a de novo genotyping‐by‐sequencing (GBS) analysis of short, 64‐base tag‐level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag‐level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635‐line diversity panel were used to infer chromosome‐level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high‐resolution genome analysis and genomic selection in oats. A combined genome‐wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component‐based genome‐wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to Northern American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS‐derived markers facilitate genome analysis and genomic selection in oat.

16.
Characterization and population genetic analysis of multilocus genes, such as those found in the major histocompatibility complex (MHC) is challenging in nonmodel vertebrates. The traditional method of extensive cloning and Sanger sequencing is costly and time‐intensive and indirect methods of assessment often underestimate total variation. Here, we explored the suitability of 454 pyrosequencing for characterizing multilocus genes for use in population genetic studies. We compared two sample tagging protocols and two bioinformatic procedures for 454 sequencing through characterization of a 185‐bp fragment of MHC DRB exon 2 in wolverines (Gulo gulo) and further compared the results with those from cloning and Sanger sequencing. We found 10 putative DRB alleles in the 88 individuals screened with between two and four alleles per individual, suggesting amplification of a duplicated DRB gene. In addition to the putative alleles, all individuals possessed an easily identifiable pseudogene. In our system, sequence variants with a frequency below 6% in an individual sample were usually artefacts. However, we found that sample preparation and data processing procedures can greatly affect variant frequencies in addition to the complexity of the multilocus system. Therefore, we recommend determining a per‐amplicon‐variant frequency threshold for each unique system. The extremely deep coverage obtained in our study (approximately 5000×) coupled with the semi‐quantitative nature of pyrosequencing enabled us to assign all putative alleles to the two DRB loci, which is generally not possible using traditional methods. Our method of obtaining locus‐specific MHC genotypes will enhance population genetic analyses and studies on disease susceptibility in nonmodel wildlife species.

17.
Single‐nucleotide polymorphisms (SNPs) are rapidly becoming the standard markers in population genomics studies; however, their use in nonmodel organisms is limited due to the lack of cost‐effective approaches to uncover genome‐wide variation, and the large number of individuals needed in the screening process to reduce ascertainment bias. To discover SNPs for population genomics studies in the fungal symbionts of the mountain pine beetle (MPB), we developed a roadmap to discover SNPs and to produce a genotyping platform. We undertook a whole‐genome sequencing approach of Leptographium longiclavatum in combination with available genomics resources of another MPB symbiont, Grosmannia clavigera. We sequenced 71 individuals pooled into four groups using the Illumina sequencing technology. We generated between 27 and 30 million reads of 75 bp that resulted in a total of 1,181 contigs longer than 2 kb and an assembled genome size of 28.9 Mb (N50 = 48 kb, average depth = 125×). A total of 9052 proteins were annotated, and between 9531 and 17 266 SNPs were identified in the four pools. A subset of 206 genes (containing 574 SNPs, 11% false positives) was used to develop a genotyping platform for this species. Using this roadmap, we developed a genotyping assay with a total of 147 SNPs located in 121 genes using the Illumina® Sequenom iPLEX Gold. Our preliminary genotyping (success rate = 85%) of 304 individuals from 36 populations supports the utility of this approach for population genomics studies in other MPB fungal symbionts and other fungal nonmodel species.

18.
Genes of the highly dynamic major histocompatibility complex (MHC) are directly linked to individual fitness and are of high interest in evolutionary ecology and conservation genetics. Gene duplication and positive selection usually lead to high levels of polymorphism in the MHC region, making genotyping of MHC a challenging task. Here, we compare the performance of two methods for MHC class I genotyping in a passerine with highly duplicated MHC class I genes: capillary electrophoresis-single-strand conformation polymorphism (CE-SSCP) analysis and 454 GS FLX Titanium pyrosequencing. According to our findings, the number of MHC variants (called alleles for simplicity) detected by CE-SSCP is significantly lower than that detected by 454. To resolve discrepancies between the two methods, we cloned and Sanger sequenced an MHC class I amplicon for an individual with a high number of alleles. We found a perfect congruence between cloning/Sanger sequencing results and 454. Thus, in case of multi-locus amplification, CE-SSCP considerably underestimates individual MHC diversity. However, numbers of alleles detected by both methods are significantly correlated, although the correlation is weak (r = 0.32). Thus, in systems with highly duplicated MHC, 454 provides more reliable information on individual diversity than CE-SSCP.

19.
Genotyping‐by‐sequencing (GBS) and related methods are increasingly used for studies of non‐model organisms from population genetic to phylogenetic scales. We present GIbPSs, a new genotyping toolkit for the analysis of data from various protocols such as RAD, double‐digest RAD, GBS, and two‐enzyme GBS without a reference genome. GIbPSs can handle paired‐end GBS data and is able to assign reads from both strands of a restriction fragment to the same locus. GIbPSs is most suitable for population genetic and phylogeographic analyses. It avoids genotyping errors due to indel variation by identifying and discarding affected loci. GIbPSs creates a genotype database that offers rich functionality for data filtering and export in numerous formats. We performed comparative analyses of simulated and real GBS data with GIbPSs and another program, pyRAD. This program accounts for indel variation by aligning homologous sequences. GIbPSs performed better than pyRAD in several aspects. It required much less computation time and displayed higher genotyping accuracy. GIbPSs retained smaller numbers of loci overall in analyses of real GBS data. It nevertheless delivered more complete genotype matrices with greater locus overlap between individuals and greater numbers of loci sampled in all individuals.

20.
Genetic relatedness of 24 animals belonging to seven Indian cattle breeds was studied using high throughput genotyping‐by‐sequencing (GBS) markers. GBS produced 93.6 million reads with an average of about 3.9 million reads per animal. A total of 107 488 SNPs were identified in these individuals. When only one SNP per read was considered, a total of 60 261 SNPs representing independent reads were identified with an average SNP‐to‐SNP distance of 45 kb across the bovine reference genome. About 24% of the GBS‐SNP markers were more than 100 kb apart. Of these, 58 322 SNPs mapped to autosomes, 1645 to the X chromosome and 28 to the Y chromosome. The average SNP‐to‐SNP distance on the X chromosome was 91.3 kb, whereas on the Y chromosome it was 1546.4 kb. The minor allele frequency within the Indian cattle varied from 0.103 (Ongole) to 0.177 (Siri), whereas Holstein cattle had the lowest value of 0.089. This is the first application of GBS in cattle of South Asia. The baseline information generated in this study might prompt implementation of GBS in breeding of cattle belonging to this region.
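For readers unfamiliar with the statistic, minor allele frequency at a biallelic SNP can be computed as in the Python sketch below. It assumes diploid genotypes coded as the number of alternative alleles (0, 1 or 2), which is a common convention rather than something stated in the abstract; the function name and the genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF for a biallelic SNP from diploid genotypes coded 0, 1 or 2.

    Missing genotypes can be passed as None and are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    # Each diploid individual contributes two allele copies.
    alt_frequency = sum(called) / (2 * len(called))
    return min(alt_frequency, 1 - alt_frequency)


# Hypothetical SNP typed in four animals, one with a missing call ignored.
print(minor_allele_frequency([0, 0, 1, 2, None]))
```

A breed‐level value such as those quoted above would then be an average of this quantity over all SNPs typed in that breed.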
