Similar literature
20 similar articles found.
1.
Molecular markers produced by next‐generation sequencing (NGS) technologies are revolutionizing genetic research. However, the costs of analysing large numbers of individual genomes remain prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows allele frequencies at single nucleotide polymorphisms (SNPs) to be estimated with at least the same accuracy as individual‐based analyses, for considerably lower library construction and sequencing efforts. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user‐friendly application assessing the accuracy of allele frequency estimation from both pool‐ and individual‐based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site–associated DNA (RAD) sequencing in pool‐ and individual‐based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies but provides a cost‐effective approach for estimating allele frequencies for very large numbers of SNPs. It thus allows comparison of genome‐wide patterns of genetic variation for large numbers of individuals in multiple populations.
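The accuracy comparison described in this abstract can be illustrated with a deliberately simplified two‐stage sampling model. The sketch below is not the authors' derivation: it assumes equal contributions from every pooled individual, no sequencing error, and error‐free genotypes in the individual‐based design. The function names and the example numbers are hypothetical.

```python
def individual_variance(p, n_individuals):
    """Sampling variance of the allele frequency estimate when
    n_individuals diploids are genotyped without error (assumption)."""
    return p * (1 - p) / (2 * n_individuals)


def pool_variance(p, n_pooled, depth):
    """Variance of the estimate from a pool of n_pooled diploids read at
    `depth` reads, assuming equal contributions and no sequencing error.

    Stage 1: 2n gene copies are drawn binomially from the population.
    Stage 2: `depth` reads are drawn binomially from the pool.
    Law of total variance: Var(f) + E[f(1 - f)] / depth.
    """
    gene_copies = 2 * n_pooled
    sampling = p * (1 - p) / gene_copies
    reads = p * (1 - p) * (1 - 1 / gene_copies) / depth
    return sampling + reads


if __name__ == "__main__":
    p = 0.5  # hypothetical true allele frequency
    print("25 individuals genotyped:", individual_variance(p, 25))
    print("pool of 50 at 100 reads: ", pool_variance(p, 50, 100))
```

Under these assumptions a single pool of 50 individuals sequenced at 100 reads is slightly more precise than 25 separately genotyped individuals, which is the kind of trade‐off the abstract refers to. Unequal contributions and sequencing error, which the paper does model, would increase the pool variance.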

2.
3.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next‐generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century‐old type specimens of Lepidoptera by attempting to recover 164‐bp and 94‐bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories – high (164‐bp sequence), medium (94‐bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR‐based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458 bp to 610 bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.

4.
Early analytical clone screening is important during Chinese hamster ovary (CHO) cell line development of biotherapeutic proteins to select a clonally derived cell line with the most favorable stability and product quality. Sensitive sequence confirmation methods using mass spectrometry have limitations in throughput and turnaround time. Next‐generation sequencing (NGS) technologies emerged as alternatives for CHO clone analytics. We report an efficient NGS workflow applying the targeted locus amplification (TLA) strategy for genomic screening of antibody expressing CHO clones. In contrast to previously reported RNA sequencing approaches, TLA allows for targeted sequencing of genomic integrated transgenic DNA without prior locus information, robust detection of single‐nucleotide variants (SNVs) and transgenic rearrangements. During clone selection, TLA/NGS revealed CHO clones with high‐level SNVs within the antibody gene and we report in another case the utility of TLA/NGS to identify rearrangements at transgenic DNA level. We also determined detection limits for SNV calling and the potential to identify clone contaminations by TLA/NGS. TLA/NGS also allows genetically identical clones to be identified. In summary, we demonstrate that TLA/NGS is a robust screening method useful for routine clone analytics during cell line development with the potential to process up to 24 CHO clones in less than 7 workdays.

5.
High‐throughput sequencing methods for genotyping genome‐wide markers are being rapidly adopted for phylogenetics of nonmodel organisms in conservation and biodiversity studies. However, the reproducibility of SNP genotyping and degree of marker overlap or compatibility between datasets from different methodologies have not been tested in nonmodel systems. Using double‐digest restriction site‐associated DNA sequencing, we sequenced a common set of 22 specimens from the butterfly genus Speyeria on two different Illumina platforms, using two variations of library preparation. We then used a de novo approach to bioinformatic locus assembly and SNP discovery for subsequent phylogenetic analyses. We found a high rate of locus recovery despite differences in library preparation and sequencing platforms, as well as overall high levels of data compatibility after data processing and filtering. These results provide the first application of NGS methods for phylogenetic reconstruction in Speyeria and support the use and long‐term viability of SNP genotyping applications in nonmodel systems.

6.
7.
We present SymPortal (SymPortal.org), a novel analytical framework and platform for genetically resolving the algal symbionts of reef corals using next‐generation sequencing (NGS) data of the ITS2 rDNA. Although the ITS2 marker is widely used to genetically characterize taxa within the family Symbiodiniaceae (formerly the genus Symbiodinium), the multicopy nature of the marker complicates its use. Commonly, the intragenomic diversity resultant from this multicopy nature is collapsed by analytical approaches, thereby focusing on only the most abundant sequences. In contrast, SymPortal employs logic to identify within‐sample informative intragenomic sequences, which we have termed ‘defining intragenomic variants’ (DIVs), to identify ITS2‐type profiles representative of putative Symbiodiniaceae taxa. By making use of this intragenomic ITS2 diversity, SymPortal is able to resolve genetic delineations using the ITS2 marker at a level that was previously only possible by using additional genetic markers. We demonstrate this by comparing this novel approach to the most commonly used alternative approach for NGS ITS2 data, the 97% similarity clustering to operational taxonomic units (OTUs). The SymPortal platform accepts NGS raw sequencing data as input to provide an easy‐to‐use, standardization‐enforced, and community‐driven framework that integrates with a database to gain resolving power with increased use. We consider that SymPortal, in conjunction with ongoing large‐scale sampling and sequencing efforts, should play an instrumental role in making future sampling efforts more comparable and in maximizing their efficacy in working towards the classification of the global Symbiodiniaceae diversity.

8.
Ciliates are unicellular eukaryotes with separate germline and somatic genomes and diverse life cycles, which make them a unique model to improve our understanding of population genetics through the detection of genetic variations. However, traditional sequencing methods cannot be directly applied to ciliates because the majority are uncultivated. Single‐cell whole‐genome sequencing (WGS) is a powerful tool for studying genetic variation in microbes, but no studies have been performed in ciliates. We compared the use of single‐cell WGS and bulk DNA WGS to detect genetic variation, specifically single nucleotide polymorphisms (SNPs), in the model ciliate Tetrahymena thermophila. Our analyses showed that (i) single‐cell WGS has excellent performance regarding mapping rate and genome coverage but lower sequencing uniformity compared with bulk DNA WGS due to amplification bias (which was reproducible); (ii) false‐positive SNP sites detected by single‐cell WGS tend to occur in genomic regions with particularly high sequencing depth and high rate of C:G to T:A base changes; (iii) SNPs detected in three or more cells should be reliable (a detection efficiency of 83.4–97.4% was obtained for combined data from three cells). This analytical method could be adapted to measure genetic variation in other ciliates and broaden research into ciliate population genetics.

9.
Hermansky–Pudlak syndrome (HPS) is a rare recessive disorder characterized by hypopigmentation, bleeding diathesis, and other symptoms due to multiple defects in lysosome‐related organelles. Ten HPS subtypes have been identified with mutations in HPS1 to HPS10. Only four patients with HPS‐1 have been reported in the Chinese population. Using next‐generation sequencing (NGS), we have screened 100 hypopigmentation genes and identified four HPS‐1, two HPS‐3, one HPS‐5, and three HPS‐6 in Chinese HPS patients with typical ocular or oculocutaneous albinism and the absence of platelet dense granules together with other variable phenotypes. All these patients except one homozygote were compound heterozygotes. Among these mutations, 14 were previously unreported alleles (four in HPS1, three in HPS3, two in HPS5, five in HPS6). Our results demonstrate the feasibility and utility of NGS‐based panel diagnostics for HPS. Genotyping of HPS subtypes is a prerequisite for intervention of subtype‐specific symptoms.

10.
The minimal antibiotic options for carbapenemase‐producing Gram‐negative bacteria necessitate their rapid detection. A literature review of a variety of phenotypic and genotypic methods is presented. Advances in culture methods and screening media are still subject to long incubation hours. Biochemical methods have shorter turnaround times and higher sensitivities and specificities, but cannot differentiate between various types and variants. Spectrophotometric methods are cheap and efficient, but are uncommon in many clinical settings, while the MALDI‐TOF MS is promising for species identification, typing and resistance gene determination. Although next generation sequencing (NGS) technologies provide a better platform to detect, type and characterize carbapenem‐resistant bacteria, the different NGS platforms, the large computer memories and space needed to process and store genomic data and the nonuniformity in data analysis platforms are still a challenge. The sensitivities, specificities and turnaround times recorded in the various studies reviewed favour the use of the biochemical tests (Carba NP or Rapid Carb screen tests) for the detection of putative carbapenemase‐producing isolates. MALDI‐TOF MS and/or molecular methods like microarray, loop‐mediated isothermal amplification and real‐time multiplex PCR assays could be used for further characterization in a reference laboratory. NGS may be used for advanced epidemiological and molecular studies.

11.
The identification of mutations in targeted genes has been significantly simplified by the advent of TILLING (Targeting Induced Local Lesions In Genomes), speeding up the functional genomic analysis of animals and plants. Next‐generation sequencing (NGS) is gradually replacing classical TILLING for mutation detection, as it allows the analysis of a large number of amplicons in short durations. The NGS approach was used to identify mutations in a population of Solanum lycopersicum (tomato) that was doubly mutagenized by ethylmethane sulphonate (EMS). Twenty‐five genes belonging to carotenoids and folate metabolism were PCR‐amplified and screened to identify potentially beneficial alleles. To augment efficiency, the 600‐bp amplicons were directly sequenced in a non‐overlapping manner on Illumina MiSeq, obviating the need for a fragmentation step before library preparation. A comparison of the different pooling depths revealed that heterozygous mutations could be identified up to 128‐fold pooling. An evaluation of six different software programs (CAMBA, CRISP, GATK UnifiedGenotyper, LoFreq, SNVer and ViPR) revealed that no software program was robust enough to predict mutations with high fidelity. Among these, CRISP and CAMBA predicted mutations with lower false discovery rates. The false positives were largely eliminated by considering only mutations commonly predicted by two different software programs. The screening of 23.47 Mb of tomato genome yielded 75 predicted mutations, 64 of which were confirmed by Sanger sequencing with an average mutation density of 1/367 Kb. Our results indicate that NGS combined with multiple variant detection tools can reduce false positives and significantly speed up the mutation discovery rate.
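The two‐caller consensus rule and the reported mutation density can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the variant tuples are invented toy data, the function names are hypothetical, and real caller output would first have to be parsed into a common representation.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_tools=2):
    """Keep variants reported by at least `min_tools` different callers."""
    counts = Counter()
    for variants in calls_by_tool.values():
        counts.update(set(variants))  # count each tool at most once per variant
    return {variant for variant, n in counts.items() if n >= min_tools}


def mutation_density_kb(screened_mb, confirmed_mutations):
    """Average number of kilobases screened per confirmed mutation."""
    return screened_mb * 1000 / confirmed_mutations


if __name__ == "__main__":
    # Toy calls as (gene, position, ref, alt); all values are made up.
    calls = {
        "crisp": [("geneA", 120, "G", "A"), ("geneB", 45, "C", "T")],
        "camba": [("geneA", 120, "G", "A"), ("geneC", 300, "G", "A")],
        "lofreq": [("geneB", 45, "C", "T"), ("geneD", 10, "C", "T")],
    }
    print(sorted(consensus_calls(calls)))
    # Figures from the abstract: 23.47 Mb screened, 64 confirmed mutations.
    print(round(mutation_density_kb(23.47, 64)), "kb per mutation")
```

With the abstract's figures, 23.47 Mb divided by 64 confirmed mutations gives about 367 kb per mutation, matching the stated density of 1/367 Kb.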

12.
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next‐generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The OBITools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor‐made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The OBITools package is distributed as open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org.

13.
The recent emergence of barcoding approaches coupled to those of next‐generation sequencing (NGS) has raised new perspectives for studying environmental communities. In this framework, we tested the possibility of deriving accurate inventories of diatom communities from pyrosequencing outputs with an available DNA reference library. We used three molecular markers targeting the nuclear, chloroplast and mitochondrial genomes (SSU rDNA, rbcL and cox1) and three samples of a mock community composed of 30 known diatom strains belonging to 21 species. To detect methodological biases, one sample was constituted directly from pooled cultures, whereas the others consisted of pooled PCR products. The NGS reads obtained by pyrosequencing (Roche 454) were compared first to a DNA reference library including the sequences of all the species used to constitute the mock community, and second to a complete DNA reference library with a larger taxonomic coverage. A stringent taxonomic assignation gave inventories that were compared to the real one. We detected biases due to DNA extraction and PCR amplification that resulted in false‐negative detection. Conversely, pyrosequencing errors appeared to generate false positives, especially in the case of closely allied species. The taxonomic coverage of DNA reference libraries appears to be the most crucial factor, together with marker polymorphism which is essential to identify taxa at the species level. RbcL offers a high resolving power together with a large DNA reference library. Although needing further optimization, pyrosequencing is suitable for identifying diatom assemblages and may find applications in the field of freshwater biomonitoring.

14.
Hereditary spherocytosis (HS) is the most common inherited haemolytic anaemia disorder. ANK1 mutations account for most HS cases, but pathogenicity analysis and functional research have not been widely performed for these mutations. In this study, in order to confirm diagnosis, gene mutations were screened in two unrelated Chinese families with HS by a next‐generation sequencing (NGS) panel and then confirmed by Sanger sequencing. Two novel heterozygous mutations (c.C841T, p.R281X and c.T290G, p.L97R) of the ANK1 gene were identified in the two families respectively. Then, the pathogenicity of the two new mutations and two previously reported ANK1 mutations (c.C648G, p.Y216X and c.G424T, p.E142X) was studied by in vitro experiments. The four mutations increased the osmotic fragility of cells, reduced the stabilities of ANK1 proteins and prevented the protein from localizing to the plasma membrane and interacting with SPTB and SLC4A1. We classified these four mutations as disease‐causing mutations for HS. Thus, conducting the same mutation test and providing genetic counselling for the two families were meaningful and significant. Moreover, the identification of two novel mutations enriches the ANK1 mutation database, especially in China.

15.
Summary: Second‐generation sequencing (sec‐gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base‐calling. The complexity of the base‐calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across‐sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec‐gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base‐calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base‐calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base‐calling performance.

16.
Chemical mutagenesis is routinely used to create large numbers of rare mutations in plant and animal populations, which can be subsequently subjected to selection for beneficial traits and phenotypes that enable the characterization of gene functions. Several next‐generation sequencing (NGS)‐based target enrichment methods have been developed for the detection of mutations in target DNA regions. However, most of these methods aim to sequence a large number of target regions from a small number of individuals. Here, we demonstrate an effective and affordable strategy for the discovery of rare mutations in a large sodium azide‐induced mutant rice population (F2). The integration of multiplex, semi‐nested PCR combined with NGS library construction allowed for the amplification of multiple target DNA fragments for sequencing. The 8 × 8 × 8 tridimensional DNA sample pooling strategy enabled us to obtain DNA sequences of 512 individuals while only sequencing 24 samples. A stepwise filtering procedure was then elaborated to eliminate most of the false positives expected to arise through sequencing error, and the application of a simple Student's t‐test against position‐prone error allowed for the discovery of 16 mutations from 36 enriched targeted DNA fragments of 1024 mutagenized rice plants, all without any false calls.
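The arithmetic behind the 8 × 8 × 8 pooling design (512 individuals, 24 sequenced pools) can be sketched as below. This is an illustrative reconstruction of generic tridimensional pooling, not the authors' code: the indexing scheme, the axis labels "X", "Y", "Z" and the function names are assumptions.

```python
from itertools import product

SIDE = 8  # individuals are arranged in an 8 x 8 x 8 cube (512 in total)


def pools_for(individual):
    """Return the three pools (one per axis) that contain an individual.

    `individual` is a hypothetical index from 0 to 511.
    """
    x, rest = divmod(individual, SIDE * SIDE)
    y, z = divmod(rest, SIDE)
    return ("X", x), ("Y", y), ("Z", z)


def decode(positive_pools):
    """List the individuals consistent with a set of mutation-positive pools.

    A mutation carried by one individual shows up in exactly one pool per
    axis, so the intersection pinpoints a single cube position.
    """
    by_axis = {"X": [], "Y": [], "Z": []}
    for axis, index in positive_pools:
        by_axis[axis].append(index)
    return [
        x * SIDE * SIDE + y * SIDE + z
        for x, y, z in product(by_axis["X"], by_axis["Y"], by_axis["Z"])
    ]


if __name__ == "__main__":
    all_pools = {pool for i in range(SIDE ** 3) for pool in pools_for(i)}
    print(len(all_pools), "pools cover", SIDE ** 3, "individuals")
    print(decode(pools_for(300)))
```

Each axis contributes 8 pools, giving 3 × 8 = 24 pools for 512 individuals. If two carriers of the same mutation fall in one cube, `decode` returns several candidate positions, which would then need individual confirmation.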

17.
BACKGROUND: Neural tube defects are severe, common birth defects that result from failure of neural tube closure. They are considered to be a multifactorial disorder, and our knowledge of causal mechanisms remains limited. We hypothesized that abnormal DNA methylation occurs in NTD‐affected fetuses. The correlations of global DNA methylation levels with complexity of NTDs and known risk factors of NTDs, MTHFR genotype and fever, were analyzed. METHODS: A hospital‐based case‐control study was performed. Epidemiologic data, pathologic diagnosis, and methylenetetrahydrofolate reductase (MTHFR) genotype analysis were completed. Array comparative genomic hybridization was used to exclude cytogenetic abnormalities. Global DNA methylation statuses were determined for both brain and skin tissue. RESULTS: Sixty‐five NTD‐affected fetuses and 65 normal controls matched for gestational and maternal ages were collected. In brain tissue, global DNA methylation levels were significantly decreased in cases compared with controls (4.12 vs. 4.99%; p < 0.001). DNA hypomethylation (<4.35%) resulted in a significant 5.736‐fold increased risk for NTDs (95% confidence interval, 1.731–19.009; p = 0.004). Nonisolated NTDs had lower levels of global DNA methylation than did isolated NTDs (3.77 vs. 4.70%; p = 0.022). After stratifying subjects by MTHFR genotype, we observed a skewed distribution of global DNA methylation levels. For genotype C/C, global DNA methylation status was the same in the two groups (4.51 vs. 4.72%; p = 0.687). For T/T, cases had significantly lower global methylation levels than did controls (5.23 vs. 3.79%; p < 0.001). CONCLUSIONS: Global DNA hypomethylation in fetal brain tissue was associated with NTD‐affected pregnancy. DNA methylation levels were correlated with NTD complexity. The MTHFR genotype contributed to global DNA hypomethylation. Birth Defects Research (Part A), 2010. © 2010 Wiley‐Liss, Inc.

18.
Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within‐population variation. Additionally, a public Illumina data set was used to validate the pipeline on community‐level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within‐population structure but also the successful application of the QRS pipeline on Illumina‐generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.

19.
It has been a tumultuous 5 years in phylogeography and phylogenetics during which both fields have struggled to harness the power of next‐generation sequencing (NGS) (Ekblom & Galindo 2010; McCormack et al. 2012a). Fortunately, several methodological approaches appear to be taking root. In this issue of Molecular Ecology, O'Neill et al. (2013) employ one such method – parallel tagged sequencing (PTS) – to elucidate the phylogeography of a tiger salamander (Ambystoma tigrinum) species complex. This study demonstrates a practical application of NGS on a scale appropriate (and not overkill) for most biologists interested in phylogeography (~100 loci for ~100 individuals), and their results highlight several analytical challenges that lie ahead for researchers employing NGS techniques.

20.
Microsatellite marker development has been greatly simplified by the use of high‐throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by the existing pipelines depends either on arbitrary criteria, or older studies on PCR success. Based on wet laboratory experiments, we have identified the following factors that are most likely to influence genotyping success rate: alignment score between the primers and the amplicon; the distance between primers and microsatellites; the length of the PCR product; target region complexity and the number of reads underlying the sequence. The QDD pipeline has been modified to include these most pertinent factors in the output to help the selection of markers. Furthermore, new features are also included in the present version: (i) not only raw sequencing reads are accepted as input, but also contigs, allowing the analysis of assembled high‐coverage data; (ii) input data can be both in fasta and fastq format to facilitate the use of Illumina and IonTorrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the nucleotide (nt) database of NCBI; (v) QDD3 is now also available embedded into a virtual machine, making installation easier and operating system independent. It can be used both as a command‐line version and integrated into a Galaxy server, providing a user‐friendly interface, as well as the possibility to utilize a large variety of NGS tools.
