Found 20 similar articles (search time: 0 ms)
1.
Jackie Lighten Cock van Oosterhout Ian G. Paterson Mark McMullan Paul Bentzen 《Molecular ecology resources》2014,14(4):753-767
We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high‐throughput sequencing workflows. We fit observed ultra‐deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 bp of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using the low‐depth allele validation thresholds of previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1–6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations, was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, observations provide evidence of MHC CNV and its complex basis in natural populations.
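The idea of fitting variant read depths to copy-number expectations can be illustrated with a small sketch. This is not the authors' published procedure: the function name `fit_cnv`, the limits `max_alleles` and `max_copies`, and the sum-of-squares score are all assumptions chosen for illustration. The sketch ranks variants by depth, treats the top k as candidate alleles with integer copy numbers, and picks the model whose expected read proportions lie closest to the observed ones.

```python
from itertools import combinations_with_replacement


def fit_cnv(depths, max_alleles=4, max_copies=2):
    """Return (sse, copies) for the best-fitting copy-number model.

    Hypothetical sketch: the top k variants (by depth) are treated as true
    alleles with the given copy numbers; all remaining variants are treated
    as artefacts with an expected proportion of zero.
    """
    ranked = sorted(depths, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("need at least one read")
    observed = [d / total for d in ranked]

    best = None
    for k in range(1, min(max_alleles, len(ranked)) + 1):
        for combo in combinations_with_replacement(range(1, max_copies + 1), k):
            # Reverse so copy numbers are non-increasing, matching the ranking.
            copies = tuple(reversed(combo))
            dose = sum(copies)
            expected = [c / dose for c in copies] + [0.0] * (len(ranked) - k)
            sse = sum((o - e) ** 2 for o, e in zip(observed, expected))
            # Strict '<' keeps the lowest-dosage model on exact ties:
            # proportions alone cannot tell (1, 1) from (2, 2).
            if best is None or sse < best[0]:
                best = (sse, copies)
    return best
```

For example, `fit_cnv([400, 210, 190, 12, 8])` selects three alleles with copy numbers `(2, 1, 1)` and treats the two low-depth variants as artefacts.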
2.
Genotyping of multilocus gene families, such as the major histocompatibility complex (MHC), may be challenging because of problems with assigning alleles to loci and copy number variation among individuals. Simultaneous amplification and genotyping of multiple loci may be necessary, and in such cases, next-generation deep amplicon sequencing offers great promise as a genotyping method of choice. Here, we describe jMHC, a computer program developed for analysing and assisting in the visualization of deep amplicon sequencing data. The software operates on FASTA files; therefore, output from any sequencing technology may be used. jMHC was designed specifically for MHC studies but it may be useful for analysing amplicons derived from other multigene families or for genotyping other polymorphic systems. The program is written in Java with a user-friendly graphical user interface (GUI) and can be run on Microsoft Windows, Linux OS and Mac OS.
3.
Artemis Efstratiou;Arnaud Gaigher;Sven Künzel;Ana Teles;Tobias L. Lenz; 《Molecular ecology resources》2024,24(4):e13935
Using high-throughput sequencing for precise genotyping of multi-locus gene families, such as the major histocompatibility complex (MHC), remains challenging, due to the complexity of the data and difficulties in distinguishing genuine from erroneous variants. Several dedicated genotyping pipelines for data from high-throughput sequencing, such as next-generation sequencing (NGS), have been developed to tackle the ensuing risk of artificially inflated diversity. Here, we thoroughly assess three such multi-locus genotyping pipelines for NGS data, the DOC method, AmpliSAS and ACACIA, using MHC class IIβ data sets of three-spined stickleback gDNA, cDNA and “artificial” plasmid samples with known allelic diversity. We show that genotyping of gDNA and plasmid samples at optimal pipeline parameters was highly accurate and reproducible across methods. However, for cDNA data, the gDNA-optimal parameter configuration yielded decreased overall genotyping precision and consistency between pipelines. Further adjustments of key clustering parameters were required to account for higher error rates and larger variation in sequencing depth per allele, highlighting the importance of template-specific pipeline optimization for reliable genotyping of multi-locus gene families. Through accurate paired gDNA-cDNA typing and MHC-II haplotype inference, we show that MHC-II allele-specific expression levels correlate negatively with allele number across haplotypes. Lastly, sibship-assisted cDNA-typing of MHC-I revealed novel variants linked in haplotype blocks, and a higher-than-previously-reported individual MHC-I allelic diversity. In conclusion, we provide novel genotyping protocols for the three-spined stickleback MHC-I and -II genes, and evaluate the performance of popular NGS-genotyping pipelines. We also show that fine-tuned genotyping of paired gDNA-cDNA samples facilitates amplification bias-corrected MHC allele expression analysis.
4.
Nathan R. Campbell Stephanie A. Harmon Shawn R. Narum 《Molecular ecology resources》2015,15(4):855-867
5.
Promerová M Babik W Bryja J Albrecht T Stuglik M Radwan J 《Molecular ecology resources》2012,12(2):285-292
Genes of the highly dynamic major histocompatibility complex (MHC) are directly linked to individual fitness and are of high interest in evolutionary ecology and conservation genetics. Gene duplication and positive selection usually lead to high levels of polymorphism in the MHC region, making genotyping of MHC a challenging task. Here, we compare the performance of two methods for MHC class I genotyping in a passerine with highly duplicated MHC class I genes: capillary electrophoresis-single-strand conformation polymorphism (CE-SSCP) analysis and 454 GS FLX Titanium pyrosequencing. According to our findings, the number of MHC variants (called alleles for simplicity) detected by CE-SSCP is significantly lower than that detected by 454. To resolve discrepancies between the two methods, we cloned and Sanger sequenced an MHC class I amplicon for an individual with a high number of alleles. We found a perfect congruence between cloning/Sanger sequencing results and 454. Thus, in the case of multi-locus amplification, CE-SSCP considerably underestimates individual MHC diversity. However, numbers of alleles detected by both methods are significantly correlated, although the correlation is weak (r = 0.32). Thus, in systems with highly duplicated MHC, 454 provides more reliable information on individual diversity than CE-SSCP.
6.
7.
Babik W 《Molecular ecology resources》2010,10(2):237-251
Genes of the major histocompatibility complex (MHC) are considered a paradigm of adaptive evolution at the molecular level and as such are frequently investigated by evolutionary biologists and ecologists. Accurate genotyping is essential for understanding of the role that MHC variation plays in natural populations, but may be extremely challenging. Here, I discuss the DNA-based methods currently used for genotyping MHC in non-model vertebrates, as well as techniques likely to find widespread use in the future. I also highlight the aspects of MHC structure that are relevant for genotyping, and detail the challenges posed by the complex genomic organization and high sequence variation of MHC loci. Special emphasis is placed on designing appropriate PCR primers, accounting for artefacts and the problem of genotyping alleles from multiple, co-amplifying loci, a strategy which is frequently necessary due to the structure of the MHC. The suitability of typing techniques is compared in various research situations, strategies for efficient genotyping are discussed and areas of likely progress in future are identified. This review addresses the well established typing methods such as the Single Strand Conformation Polymorphism (SSCP), Denaturing Gradient Gel Electrophoresis (DGGE), Reference Strand Conformational Analysis (RSCA) and cloning of PCR products. In addition, it includes the intriguing possibility of direct amplicon sequencing followed by the computational inference of alleles and also next generation sequencing (NGS) technologies; the latter technique may, in the future, find widespread use in typing complex multilocus MHC systems.
8.
Wendy Wang Wei X. Tan Denis Bertrand Amanda H. Q. Ng Esther J. H. Boey Jayce J. Y. Koh Niranjan Nagarajan Rudolf Meier 《Molecular ecology resources》2018,18(5):1035-1049
DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well‐equipped molecular laboratory and is time‐consuming, and/or expensive. We here address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are overall mismatch‐free but retain indel errors that are concentrated in homopolymeric regions. They are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%–100% with the number of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that it requires ~2 hr of sequencing to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be …
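The step of summarizing all reads of one tagged amplicon as a consensus barcode can be sketched as a column-wise majority vote. This is only an illustration under a strong assumption, namely that the reads have already been aligned to a common length (with `-` for gaps); the function name `consensus` and the choice to report ties as `N` are hypothetical and are not taken from the published pipeline.

```python
from collections import Counter


def consensus(aligned_reads):
    """Column-wise majority vote over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    called = []
    for column in zip(*aligned_reads):
        counts = Counter(column)
        base, top = counts.most_common(1)[0]
        # A column without a unique most frequent symbol is reported as 'N'.
        tied = [b for b, n in counts.items() if n == top]
        called.append(base if len(tied) == 1 else "N")
    return "".join(called)
```

With three reads `["ACGT", "ACGA", "ACGT"]` the single miscalled base is outvoted and the consensus is `ACGT`. Because every read votes independently per column, random substitution errors tend to cancel out, whereas systematic errors such as homopolymer indels would survive this kind of vote.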
9.
Erika Crispo Haley R. Tunna Noreen Hussain Silvia S. Rodriguez Scott A. Pavey Leland J. Jackson Sean M. Rogers 《Ecology and evolution》2017,7(10):3297-3311
Populations in upstream versus downstream river locations can be exposed to vastly different environmental and ecological conditions and can thus harbor different genetic resources due to selection and neutral processes. An interesting question is how upstream–downstream directionality in rivers affects the evolution of immune response genes. We used next‐generation amplicon sequencing to identify eight alleles of the major histocompatibility complex (MHC) class II β exon 2 in the cyprinid longnose dace (Rhinichthys cataractae) from three rivers in Alberta, upstream and downstream of municipal and agricultural areas along contaminant gradients. We used these data to test for directional and balancing selection on the MHC. We also genotyped microsatellite loci to examine neutral population processes in this system. We found evidence for balancing selection on the MHC in the form of increased nonsynonymous variation relative to neutral expectations, and selection occurred at more amino acid residues upstream than downstream in two rivers. We found this pattern despite no population structure or isolation by distance, based on microsatellite data, at these sites. Overall, our results suggest that MHC evolution is driven by upstream–downstream directionality in fish inhabiting this system.
10.
Next‐generation sequencing technologies are extensively used in the field of molecular microbial ecology to describe taxonomic composition and to infer functionality of microbial communities. In particular, the so‐called barcode or metagenetic applications that are based on PCR amplicon library sequencing are very popular at present. One problem in using data from these libraries is the analysis of read quality and the removal (trimming) of low‐quality segments, while retaining sufficient information for subsequent analyses (e.g. taxonomic assignment). Here, we present StreamingTrim, a DNA read‐trimming program, written in Java, with which researchers are able to analyse the quality of DNA sequences in fastq files and to search for low‐quality zones in a very conservative way. This software has been developed with the aim of providing a tool capable of trimming amplicon library data while retaining as much taxonomic information as possible. The software is equipped with a graphical user interface for ease of use. Moreover, from a computational point of view, StreamingTrim reads and analyses sequences one by one from an input fastq file, without keeping anything in memory, allowing the computation to run on a normal desktop PC or even a laptop. Trimmed sequences are saved in an output file, and a statistics summary is displayed that contains the mean and standard deviation of the length and quality of the whole sequence file. Compiled software, a manual and example data sets are available under the BSD‐2‐Clause License at the GitHub repository at https://github.com/GiBacci/StreamingTrim/ .
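The one-read-at-a-time design described in this abstract can be illustrated with a short sketch. It does not reproduce StreamingTrim's own trimming algorithm (which is written in Java and is not detailed here). The sketch assumes four-line FASTQ records with Phred+33 qualities, and it simply cuts trailing bases below a threshold; the function names and the `min_q` default are hypothetical.

```python
import io


def fastq_records(handle):
    """Yield (header, sequence, quality) one record at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Cut trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_stream(in_handle, out_handle, min_q=20):
    """Trim record by record; only the current read is held in memory."""
    kept = 0
    for header, seq, qual in fastq_records(in_handle):
        seq, qual = trim_3prime(seq, qual, min_q)
        if seq:  # drop reads that were trimmed away entirely
            out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept


def trim_text(fastq_text, min_q=20):
    """Convenience wrapper for trying the sketch on an in-memory string."""
    src, dst = io.StringIO(fastq_text), io.StringIO()
    trim_stream(src, dst, min_q)
    return dst.getvalue()
```

Because `fastq_records` is a generator, memory use stays constant regardless of file size, which is the property the abstract highlights for running on a desktop PC or laptop.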
11.
Kristin Hardge Stefan Neuhaus Estelle S. Kilias Christian Wolf Katja Metfies Stephan Frickenhaus 《Molecular ecology resources》2018,18(2):204-216
Next‐generation sequencing is a common method for analysing microbial community diversity and composition. Configuring an appropriate sequence processing strategy within the variety of tools and methods is a nontrivial task and can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences of marine samples by 454‐pyrosequencing. Along this process, we generated several data sets with QIIME, mothur, and a custom‐made pipeline based on DNAStar and the phylogenetic tree‐based PhyloAssigner. For all processing strategies, default parameter settings and punctual variations were used. Our results revealed strong differences in total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible for all conducted processing pipeline versions. A qualitative comparison of diatom genera emphasized strong differences between the pipelines in which phylogenetic placement of sequences came closest to light microscopy‐based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels, and if abundance thresholds are used to filter out OTUs of the rare biosphere.
12.
Aleksandra Biedrzycka Alvaro Sebastian Magdalena Migalska Helena Westerdahl Jacek Radwan 《Molecular ecology resources》2017,17(4):642-655
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co‐amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra‐deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500–20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within‐method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co‐amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co‐amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage.
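Repeatability of allele calling between two replicates can be made concrete with a small sketch. The abstract does not give the exact formula used, so the definition below is an assumption: for each individual, the number of alleles called in both replicates divided by the number called in either. The function names and the dictionary input format are likewise hypothetical.

```python
def repeatability(rep_a, rep_b):
    """Per-individual share of alleles called in both replicates.

    rep_a and rep_b map an individual ID to an iterable of allele names.
    Only individuals present in both replicates are scored.
    """
    scores = {}
    for ind in rep_a.keys() & rep_b.keys():
        a, b = set(rep_a[ind]), set(rep_b[ind])
        union = a | b
        # Two empty calls agree trivially.
        scores[ind] = len(a & b) / len(union) if union else 1.0
    return scores


def mean_repeatability(rep_a, rep_b):
    """Average the per-individual scores across shared individuals."""
    scores = repeatability(rep_a, rep_b)
    if not scores:
        raise ValueError("no individuals shared between replicates")
    return sum(scores.values()) / len(scores)
```

For instance, an individual typed as {A, B, C} in one replicate and {A, B, D} in the other scores 2/4 = 0.5 under this definition, since two of the four distinct alleles were called twice.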
13.
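Large-scale sequencing and analysis of expressed sequence tags (ESTs) from porcine adipose tissue. Total citations: 1 (self-citations: 0, citations by others: 1)
Using large-scale DNA sequencing, we sequenced expressed sequence tags (ESTs) from porcine adipose tissue, obtained 7790 high-quality ESTs and carried out a preliminary analysis. Clustering with the STACK-PACK software yielded 4354 gene clusters, comprising 3609 single-copy genes and 745 multi-copy genes. Comparing the candidate gene sequences against the nr database with BlastN (e = 1e-10) identified 2712 known genes among the 4354 candidates, of which 1987 were single-copy genes and 725 were multi-copy genes (3694 clones); genes of unknown function and novel ESTs accounted for 2109 clones. Based on the BlastN results, and using GenBank accession numbers as an index, we constructed an expression profile of genes of known function in porcine adipose tissue. The profile shows that genes involved in metabolism account for the largest proportion in porcine adipose tissue, which in some respects also reflects the high metabolic activity of this tissue. We also found that major histocompatibility complex (MHC) genes and MHC-related genes are expressed at high abundance in porcine adipose tissue: 181 single-copy genes and 44 multi-copy genes, 257 clones in total, accounting for 44.9% of the cell and organism defense category and 5.4% of all known genes. After extracting all MHC-related EST sequences (257 clones), we found that a partial sequence of all of these ESTs (about 200 bases long) is distributed across nearly all coding sequences of every known porcine BAC. From this we infer that, among the EST sequences that make up the MHC, a stretch of about 200 bases (200 bp) is highly conserved and that every coding sequence in the MHC genes contains it. Although the protein domains of these MHC sequences differ between BACs, all are highly conserved regions and all are closely related to immune function. Because the partially conserved MHC sequences expressed so abundantly in porcine fat are highly relevant to immune function, they can be replicated repeatedly during transmission of the MHC genes and can be stably inherited.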
14.
15.
Marie Suez Abdelkader Behdenna Sophie Brouillet Paula Graça Dominique Higuet Guillaume Achaz 《Molecular ecology resources》2016,16(2):524-533
Microsatellites are widely used in population genetics to uncover recent evolutionary events. They are typically genotyped using a capillary sequencer, whose capacity is usually limited to 9, at most 12, loci per run, and whose output is tedious to analyse by hand. With the rise of next‐generation sequencing (NGS), a much larger number of loci and individuals are available from sequencing: for example, on a single run of a GS Junior, 28 loci from 96 individuals are sequenced with 30× coverage. We have developed an algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual (e.g. specific PCR amplifications of a locus or a collection of reads that encompass a locus of interest). As the sequencing and the PCR amplification introduce artefactual insertions or deletions, the set of reads from a single microsatellite allele shows several length variants. The algorithm infers, without alignment, the true unknown allele(s) of each individual from the observed distributions of microsatellite length of all individuals. MicNeSs, a Python implementation of the algorithm, can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies (a model species) and red deer (a nonmodel species). Without any parallelization, it automatically genotypes 22 loci from 441 individuals in 11 hours on a standard computer. The comparison of MicNeSs inferences to the standard method shows an excellent agreement, with some differences illustrating the pros and cons of both methods.
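Alignment-free genotyping from read length distributions can be sketched in a much simplified form. This is not the MicNeSs algorithm, which infers alleles jointly from the length distributions of all individuals. The sketch below looks at one individual only, counts the longest uninterrupted run of a known motif in each read, and calls the one or two most frequent repeat counts. The function names and the `min_ratio` heterozygote threshold are assumptions for illustration.

```python
import re
from collections import Counter


def repeat_count(read, motif):
    """Number of motif copies in the longest uninterrupted run in the read."""
    runs = re.findall(f"(?:{re.escape(motif)})+", read)
    return max((len(run) // len(motif) for run in runs), default=0)


def call_genotype(reads, motif, min_ratio=0.5):
    """Call a diploid genotype as a sorted pair of repeat counts.

    The most frequent repeat count is always called. The second most
    frequent is called as a second allele only if it reaches min_ratio
    times the support of the first; otherwise the call is homozygous.
    Returns an empty tuple when no read contains the motif.
    """
    counts = Counter(repeat_count(read, motif) for read in reads)
    counts.pop(0, None)  # reads without the motif carry no information
    if not counts:
        return ()
    ranked = counts.most_common(2)
    top_len, top_n = ranked[0]
    if len(ranked) == 2 and ranked[1][1] >= min_ratio * top_n:
        return tuple(sorted((top_len, ranked[1][0])))
    return (top_len, top_len)
```

A fixed ratio threshold like this cannot separate a true allele from stutter products one repeat away from a real allele, which is one reason a model that uses the distributions of all individuals, as the abstract describes, is preferable in practice.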
16.
Caroline K. Glidden Anson V. Koehler Ross S. Hall Muhammad A. Saeed Mauricio Coppo Brianna R. Beechler Bryan Charleston Robin B. Gasser Anna E. Jolles Abdul Jabbar 《Ecology and evolution》2020,10(1):70-80
- Increasing access to next‐generation sequencing (NGS) technologies is revolutionizing the life sciences. In disease ecology, NGS‐based methods have the potential to provide higher‐resolution data on communities of parasites found in individual hosts as well as host populations.
- Here, we demonstrate how a novel analytical method, utilizing high‐throughput sequencing of PCR amplicons, can be used to explore variation in blood‐borne parasite (Theileria—Apicomplexa: Piroplasmida) communities of African buffalo at higher resolutions than has been obtained with conventional molecular tools.
- Results reveal temporal patterns of synchronized and opposite fluctuations of prevalence and relative abundance of Theileria spp. within the host population, suggesting heterogeneous transmission across taxa. Furthermore, we show that the community composition of Theileria spp. and their subtypes varies considerably between buffalo, with differences in composition reflected in mean and variance of overall parasitemia, thereby showing potential to elucidate previously unexplained contrasts in infection outcomes for host individuals.
- Importantly, our methods are generalizable as they can be utilized to describe blood‐borne parasite communities in any host species. Furthermore, our methodological framework can be adapted to any parasite system given the appropriate genetic marker.
- The findings of this study demonstrate how a novel NGS‐based analytical approach can provide fine‐scale, quantitative data, unlocking opportunities for discovery in disease ecology.
17.
Alexandra M. Allen Gary L. A. Barker Paul Wilkinson Amanda Burridge Mark Winfield Jane Coghill Cristobal Uauy Simon Griffiths Peter Jack Simon Berry Peter Werner James P. E. Melichar Jane McDougall Rhian Gwilliam Phil Robinson Keith J. Edwards 《Plant biotechnology journal》2013,11(3):279-295
Globally, wheat is the most widely grown crop and one of the three most important crops for human and livestock feed. However, the complex nature of the wheat genome has, until recently, resulted in a lack of single nucleotide polymorphism (SNP)‐based molecular markers of practical use to wheat breeders. Recently, large numbers of SNP‐based wheat markers have been made available via the use of next‐generation sequencing combined with a variety of genotyping platforms. However, many of these markers and platforms have difficulty distinguishing between heterozygote and homozygote individuals and are therefore of limited use to wheat breeders carrying out commercial‐scale breeding programmes. To identify exome‐based co‐dominant SNP‐based assays, which are capable of distinguishing between heterozygotes and homozygotes, we have used targeted re‐sequencing of the wheat exome to generate large amounts of genomic sequences from eight varieties. Using a bioinformatics approach, these sequences have been used to identify 95 266 putative single nucleotide polymorphisms, of which 10 251 were classified as being putatively co‐dominant. Validation of a subset of these putative co‐dominant markers confirmed that 96% were true polymorphisms and 65% were co‐dominant SNP assays. The new co‐dominant markers described here are capable of genotypic classification of a segregating locus in polyploid wheat and can be used on a variety of genotyping platforms; as such, they represent a powerful tool for wheat breeders. These markers and related information have been made publicly available on an interactive web‐based database to facilitate their use in genotyping programmes worldwide.
18.
Sho Hosoya Shotaro Hirase Kiyoshi Kikuchi Kusuto Nanjo Yohei Nakamura Hiroyoshi Kohno Mitsuhiko Sano 《Molecular ecology resources》2019,19(5):1153-1163
While various technologies for high‐throughput genotyping have been developed for ecological studies, simple methods tolerant to low‐quality DNA samples are still limited. In this study, we tested the availability of a random PCR‐based genotyping‐by‐sequencing technology, genotyping by random amplicon sequencing, direct (GRAS‐Di). We focused on population genetic analysis of estuarine mangrove fishes, including two resident species, the Amboina cardinalfish (Fibramia amboinensis, Bleeker, 1853) and the Duncker's river garfish (Zenarchopterus dunckeri, Mohr, 1926), and a marine migrant, the blacktail snapper (Lutjanus fulvus, Forster, 1801). Collections were from the Ryukyu Islands, southern Japan. PCR amplicons derived from ~130 individuals were pooled and sequenced in a single lane on a HiSeq2500 platform, and an average of three million reads was obtained per individual. Consensus contigs were assembled for each species and used for genotyping of single nucleotide polymorphisms by mapping trimmed reads onto the contigs. After quality filtering steps, 4,000–9,000 putative single nucleotide polymorphisms were detected for each species. Although DNA fragmentation can diminish genotyping performance when analysed on next‐generation sequencing technology, the effect was small. Genetic differentiation and a clear pattern of isolation‐by‐distance were observed in F. amboinensis and Z. dunckeri by means of principal component analysis, FST and admixture analysis. By contrast, L. fulvus comprised a genetically homogeneous population with directional recent gene flow. These genetic differentiation patterns reflect patterns of estuary use through life history. These results showed the power of GRAS‐Di for fine‐grained genetic analysis using field samples, including mangrove fishes.
19.
Allen AM Barker GL Berry ST Coghill JA Gwilliam R Kirby S Robinson P Brenchley RC D'Amore R McKenzie N Waite D Hall A Bevan M Hall N Edwards KJ 《Plant biotechnology journal》2011,9(9):1086-1099
Food security is a global concern and substantial yield increases in cereal crops are required to feed the growing world population. Wheat is one of the three most important crops for human and livestock feed. However, the complexity of the genome coupled with a decline in genetic diversity within modern elite cultivars has hindered the application of marker‐assisted selection (MAS) in breeding programmes. A crucial step in the successful application of MAS in breeding programmes is the development of cheap and easy to use molecular markers, such as single‐nucleotide polymorphisms. To mine selected elite wheat germplasm for intervarietal single‐nucleotide polymorphisms, we have used expressed sequence tags derived from public sequencing programmes and next‐generation sequencing of normalized wheat complementary DNA libraries, in combination with a novel sequence alignment and assembly approach. Here, we describe the development and validation of a panel of 1114 single‐nucleotide polymorphisms in hexaploid bread wheat using competitive allele‐specific polymerase chain reaction genotyping technology. We report the genotyping results of these markers on 23 wheat varieties, selected to represent a broad cross‐section of wheat germplasm including a number of elite UK varieties. Finally, we show that, using relatively simple technology, it is possible to rapidly generate a linkage map containing several hundred single‐nucleotide polymorphism markers in the doubled haploid mapping population of Avalon × Cadenza.
20.
Libor Mořkovský Jan Pačes Jakub Rídl Radka Reifová 《Molecular ecology resources》2015,15(6):1415-1420