期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Understanding and utilizing crop genome diversity via high‐resolution genotyping

Kai Voss‐Fels Rod J. Snowdon 《Plant biotechnology journal》2016,14(4):1086-1094

相似文献

2.

Small‐ and large‐scale heterogeneity in genetic variation across the collard flycatcher genome: implications for estimating genetic diversity in nonmodel organisms

下载免费PDF全文

Pär K. Ingvarsson Jing Wang 《Molecular ecology resources》2017,17(4):583-585

Population genetic studies in nonmodel organisms are often hampered by a lack of reference genomes that are essential for whole‐genome resequencing. In the light of this, genotyping methods have been developed to effectively eliminate the need for a reference genome, such as genotyping by sequencing or restriction site‐associated DNA sequencing (RAD‐seq). However, what remains relatively poorly studied is how accurately these methods capture both average and variation in genetic diversity across an organism's genome. In this issue of Molecular Ecology Resources, Dutoit et al. (2016) use whole‐genome resequencing data from the collard flycatcher to assess what factors drive heterogeneity in nucleotide diversity across the genome. Using these data, they then simulate how well different sequencing designs, including RAD sequencing, could capture most of the variation in genetic diversity. They conclude that for evolutionary and conservation‐related studies focused on the estimating genomic diversity, researchers should emphasize the number of loci analysed over the number of individuals sequenced. 相似文献

3.

Optimization of the genotyping‐by‐sequencing strategy for population genomic analysis in conifers

下载免费PDF全文

Jin Pan Baosheng Wang Zhi‐Yong Pei Wei Zhao Jie Gao Jian‐Feng Mao Xiao‐Ru Wang 《Molecular ecology resources》2015,15(4):711-722

Flexibility and low cost make genotyping‐by‐sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI‐MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference‐free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000–11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking. 相似文献

4.

AlleleSeq: analysis of allele‐specific expression and binding in a network framework

Jing Wang Pedro Alves Debasish Raha Arif Harmanci Jing Leng Robert Bjornson Yong Kong Naoki Kitabayashi Nitin Bhardwaj Mark Rubin Michael Snyder Mark Gerstein 《Molecular systems biology》2011,7(1)

相似文献

5.

Targeted multiplex next‐generation sequencing: advances in techniques of mitochondrial and nuclear DNA sequencing for population genomics

Brittany L. Hancock‐Hanser Amy Frey Matthew S. Leslie Peter H. Dutton Frederick I. Archer Phillip A. Morin 《Molecular ecology resources》2013,13(2):254-268

Next‐generation sequencing (NGS) is emerging as an efficient and cost‐effective tool in population genomic analyses of nonmodel organisms, allowing simultaneous resequencing of many regions of multi‐genomic DNA from multiplexed samples. Here, we detail our synthesis of protocols for targeted resequencing of mitochondrial and nuclear loci by generating indexed genomic libraries for multiplexing up to 100 individuals in a single sequencing pool, and then enriching the pooled library using custom DNA capture arrays. Our use of DNA sequence from one species to capture and enrich the sequencing libraries of another species (i.e. cross‐species DNA capture) indicates that efficient enrichment occurs when sequences are up to about 12% divergent, allowing us to take advantage of genomic information in one species to sequence orthologous regions in related species. In addition to a complete mitochondrial genome on each array, we have included between 43 and 118 nuclear loci for low‐coverage sequencing of between 18 kb and 87 kb of DNA sequence per individual for single nucleotide polymorphisms discovery from 50 to 100 individuals in a single sequencing lane. Using this method, we have generated a total of over 500 whole mitochondrial genomes from seven cetacean species and green sea turtles. The greater variation detected in mitogenomes relative to short mtDNA sequences is helping to resolve genetic structure ranging from geographic to species‐level differences. These NGS and analysis techniques have allowed for simultaneous population genomic studies of mtDNA and nDNA with greater genomic coverage and phylogeographic resolution than has previously been possible in marine mammals and turtles. 相似文献

6.

Phylogenetic analysis of RAD‐seq data: examining the influence of gene genealogy conflict on analysis of concatenated data

下载免费PDF全文

David M. Rivers Clive T. Darwell David M. Althoff 《Cladistics : the international journal of the Willi Hennig Society》2016,32(6):672-681

One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. The application of Next‐generation sequencing techniques such as RAD‐seq generates thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci in determining the phylogenetic relationships among taxa is unclear. We simulated RAD‐seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of Next‐generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa. 相似文献

7.

SNP discovery in wild and domesticated populations of blue catfish,Ictalurus furcatus,using genotyping‐by‐sequencing and subsequent SNP validation

Chao Li Geoff Waldbieser Brian Bosworth Benjamin H. Beck Wilawan Thongda Eric Peatman 《Molecular ecology resources》2014,14(6):1261-1270

Blue catfish, Ictalurus furcatus, are valued in the United States as a trophy fishery for their capacity to reach large sizes, sometimes exceeding 45 kg. Additionally, blue catfish × channel catfish (I. punctatus) hybrid food fish production has recently increased the demand for blue catfish broodstock. However, there has been little study of the genetic impacts and interaction of farmed, introduced and stocked populations of blue catfish. We utilized genotyping‐by‐sequencing (GBS) to capture and genotype SNP markers on 190 individuals from five wild and domesticated populations (Mississippi River, Missouri, D&B, Rio Grande and Texas). Stringent filtering of SNP‐calling parameters resulted in 4275 SNP loci represented across all five populations. Population genetics and structure analyses revealed potential shared ancestry and admixture between populations. We utilized the Sequenom MassARRAY to validate two multiplex panels of SNPs selected from the GBS data. Selection criteria included SNPs shared between populations, SNPs specific to populations, number of reads per individual and number of individuals genotyped by GBS. Putative SNPs were validated in the discovery population and in two additional populations not used in the GBS analysis. A total of 64 SNPs were genotyped successfully in 191 individuals from nine populations. Our results should guide the development of highly informative, flexible genotyping multiplexes for blue catfish from the larger GBS SNP set as well as provide an example of a rapid, low‐cost approach to generate and genotype informative marker loci in aquatic species with minimal previous genetic information. 相似文献

8.

Helping decision making for reliable and cost‐effective 2b‐RAD sequencing and genotyping analyses in non‐model species

Anna Barbanti Hector Torrado Enrique Macpherson Luca Bargelloni Rafaella Franch Carlos Carreras Marta Pascual 《Molecular ecology resources》2020,20(3):795-806

High‐throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b‐RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different IIB enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b‐RAD protocols on non‐model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b‐RAD in studies with poor DNA quality. However, library quality was found to be a critical parameter on the number of reads and loci obtained for genotyping. Resampling analyses with different number of reads per individual showed a trade‐off between number of loci and number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth ≥20 reads to acquire good genotyping results. Finally, we demonstrated that selective‐base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, to reduce sequencing costs and to maintain suitable sequencing depth for a reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b‐RAD protocols on non‐model organisms with different genome sizes, helping decision‐making for a reliable and cost‐effective genotyping. 相似文献

9.

Whole mitochondrial genome capture from faecal samples and museum‐preserved specimens

下载免费PDF全文

Tom van der Valk Frida Lona Durazo Love Dalén Katerina Guschanski 《Molecular ecology resources》2017,17(6):e111-e121

Population‐scale molecular studies of endangered and cryptic species are often limited by access to high‐quality samples. The use of noninvasively collected samples or museum‐preserved specimens reduces the pressure on modern populations by removing the need to capture and handle live animals. However, endogenous DNA content in such samples is low, making shotgun sequencing a financially prohibitive approach. Here, we apply a target enrichment method to retrieve mitochondrial genomes from 65 museum specimens and 56 noninvasively collected faecal samples of two endangered great ape species, Grauer's gorilla and the eastern chimpanzee. We show that the applied method is suitable for a wide range of sample types that differ in endogenous DNA content, increasing the proportion of target reads to over 300‐fold. By systematically evaluating biases introduced during target enrichment of pooled museum samples, we show that capture is less efficient for fragments shorter or longer than the baits, that the proportion of human contaminating reads increases postcapture although capture efficiency is lower for human compared to gorilla fragments with a gorilla‐generated bait, and that the rate of jumping PCR is considerable, but can be controlled for with a double‐barcoding approach. We succeed in capturing complete mitochondrial genomes from faecal samples, but observe reduced capture efficiency as sequence divergence increases between the bait and target species. As previously shown for museum specimens, we demonstrate here that mitochondrial genome capture from field‐collected faecal samples is a robust and reliable approach for population‐wide studies of nonmodel organisms. 相似文献

10.

Estimation of population allele frequencies from next‐generation sequencing data: pool‐versus individual‐based genotyping

Karim Gharbi Timothée Cézard Maxime Galan Anne Loiseau Marian Thomson Pierre Pudlo Carole Kerdelhué Arnaud Estoup 《Molecular ecology》2013,22(14):3766-3779

Molecular markers produced by next‐generation sequencing (NGS) technologies are revolutionizing genetic research. However, the costs of analysing large numbers of individual genomes remain prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows to estimate the allele frequencies at single nucleotide polymorphisms (SNPs) with at least the same accuracy as individual‐based analyses, for considerably lower library construction and sequencing efforts. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user‐friendly application assessing the accuracy of allele frequency estimation from both pool‐ and individual‐based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site–associated DNA (RAD) sequencing in pool‐ and individual‐based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies but provides a cost‐effective approach for estimating allele frequencies for very large numbers of SNPs. It thus allows comparison of genome‐wide patterns of genetic variation for large numbers of individuals in multiple populations. 相似文献

11.

Haplotype Estimation Using Sequencing Reads

Olivier Delaneau Bryan Howie Anthony?J. Cox Jean-Fran?ois Zagury Jonathan Marchini 《American journal of human genetics》2013,93(4):687-696

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5–20 kb read with 4%–15% error per base), phasing performance was substantially improved. 相似文献

12.

A beginners guide to SNP calling from high-throughput DNA-sequencing data

A Altmann P Weber D Bader M Preuß EB Binder B Müller-Myhsok 《Human genetics》2012,131(10):1541-1554

High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results. 相似文献

13.

Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes

Suyash S. Shringarpure Andrew Carroll Francisco M. De La Vega Carlos D. Bustamante 《PloS one》2015,10(6)

Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. 相似文献

14.

A targeted next‐generation sequencing toolkit for exon‐based cichlid phylogenomics

Katriina L. Ilves Hernán López‐Fernández 《Molecular ecology resources》2014,14(4):802-811

Cichlid fishes (family Cichlidae) are models for evolutionary and ecological research. Massively parallel sequencing approaches have been successfully applied to study relatively recent diversification in groups of African and Neotropical cichlids, but such technologies have yet to be used for addressing larger‐scale phylogenetic questions of cichlid evolution. Here, we describe a process for identifying putative single‐copy exons from five African cichlid genomes and sequence the targeted exons for a range of divergent (>tens of millions of years) taxa with probes designed from a single reference species (Oreochromis niloticus, Nile tilapia). Targeted sequencing of 923 exons across 10 cichlid species that represent the family's major lineages and geographic distribution resulted in a complete taxon matrix of 564 exons (649 549 bp), representing 559 genes. Maximum likelihood and Bayesian analyses in both species tree and concatenation frameworks yielded the same fully resolved and highly supported topology, which matched the expected backbone phylogeny of the major cichlid lineages. This work adds to the body of evidence that it is possible to use a relatively divergent reference genome for exon target design and successful capture across a broad phylogenetic range of species. Furthermore, our results show that the use of a third‐party laboratory coupled with accessible bioinformatics tools makes such phylogenomics projects feasible for research groups that lack direct access to genomic facilities. We expect that these resources will be used in further cichlid evolution studies and hope the protocols and identified targets will also be useful for phylogenetic studies of a wider range of organisms. 相似文献

15.

Prevention,diagnosis and treatment of high‐throughput sequencing data pathologies

Xiaofan Zhou Antonis Rokas 《Molecular ecology》2014,23(7):1679-1700

High‐throughput sequencing (HTS) technologies generate millions of sequence reads from DNA/RNA molecules rapidly and cost‐effectively, enabling single investigator laboratories to address a variety of ‘omics’ questions in nonmodel organisms, fundamentally changing the way genomic approaches are used to advance biological research. One major challenge posed by HTS is the complexity and difficulty of data quality control (QC). While QC issues associated with sample isolation, library preparation and sequencing are well known and protocols for their handling are widely available, the QC of the actual sequence reads generated by HTS is often overlooked. HTS‐generated sequence reads can contain various errors, biases and artefacts whose identification and amelioration can greatly impact subsequent data analysis. However, a systematic survey on QC procedures for HTS data is still lacking. In this review, we begin by presenting standard ‘health check‐up’ QC procedures recommended for HTS data sets and establishing what ‘healthy’ HTS data look like. We next proceed by classifying errors, biases and artefacts present in HTS data into three major types of ‘pathologies’, discussing their causes and symptoms and illustrating with examples their diagnosis and impact on downstream analyses. We conclude this review by offering examples of successful ‘treatment’ protocols and recommendations on standard practices and treatment options. Notwithstanding the speed with which HTS technologies – and consequently their pathologies – change, we argue that careful QC of HTS data is an important – yet often neglected – aspect of their application in molecular ecology, and lay the groundwork for developing a HTS data QC ‘best practices’ guide. 相似文献

16.

Optimization of next‐generation sequencing transcriptome annotation for species lacking sequenced genomes

Nina F. Ockendon Lauren A. O'Connell Stephen J. Bush Jimena Monzón‐Sandoval Holly Barnes Tamás Székely Hans A. Hofmann Steve Dorus Araxi O. Urrutia 《Molecular ecology resources》2016,16(2):446-458

相似文献

17.

Application of next‐generation sequencing to the study of non‐model insects

下载免费PDF全文

Nakatada Wachi Kei W. Matsubayashi Kaoru Maeto 《Entomological Science》2018,21(1):3-11

相似文献

18.

GENOMIC BASIS OF AGING AND LIFE‐HISTORY EVOLUTION IN DROSOPHILA MELANOGASTER

Silvia C. Remolina Peter L. Chang Jeff Leips Sergey V. Nuzhdin Kimberly A. Hughes 《Evolution; international journal of organic evolution》2012,66(11):3390-3403

Natural diversity in aging and other life‐history patterns is a hallmark of organismal variation. Related species, populations, and individuals within populations show genetically based variation in life span and other aspects of age‐related performance. Population differences are especially informative because these differences can be large relative to within‐population variation and because they occur in organisms with otherwise similar genomes. We used experimental evolution to produce populations divergent for life span and late‐age fertility and then used deep genome sequencing to detect sequence variants with nucleotide‐level resolution. Several genes and genome regions showed strong signatures of selection, and the same regions were implicated in independent comparisons, suggesting that the same alleles were selected in replicate lines. Genes related to oogenesis, immunity, and protein degradation were implicated as important modifiers of late‐life performance. Expression profiling and functional annotation narrowed the list of strong candidate genes to 38, most of which are novel candidates for regulating aging. Life span and early age fecundity were negatively correlated among populations; therefore, the alleles we identified also are candidate regulators of a major life‐history trade‐off. More generally, we argue that hitchhiking mapping can be a powerful tool for uncovering the molecular bases of quantitative genetic variation. 相似文献

19.

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird‐of‐paradise

Valentina Peona Mozes P. K. Blom Luohao Xu Reto Burri Shawn Sullivan Ignas Bunikis Ivan Liachko Tri Haryoko Knud A. Jnsson Qi Zhou Martin Irestedt Alexander Suh 《Molecular ecology resources》2021,21(1):263-286

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes. 相似文献

20.

Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy 总被引：1，自引：0，他引：1

下载免费PDF全文

M. M. Y. Tin F. E. Rheindt E. Cros A. S. Mikheyev 《Molecular ecology resources》2015,15(2):329-336

RAD‐tag is a powerful tool for high‐throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four‐base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double‐digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample‐to‐sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low‐input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low‐input material. 相似文献