Found 20 similar articles (search time: 546 ms)
1.
Background
Recent advances in deep digital sequencing have unveiled an unprecedented degree of clonal heterogeneity within a single tumor DNA sample. Resolving such heterogeneity depends on accurate estimation of fractions of alleles that harbor somatic mutations. Unlike substitutions or small indels, structural variants such as deletions, duplications, inversions and translocations involve segments of DNA and are potentially more accurate for allele fraction estimation. However, no systematic method exists that can support such analysis.
Results
In this paper, we present a novel maximum-likelihood method that estimates allele fractions of structural variants integratively from various forms of alignment signals. We develop a tool, BreakDown, to estimate the allele fractions of most structural variants including medium-sized (from 1 kilobase to 1 megabase) deletions and duplications, and balanced inversions and translocations.
Conclusions
Evaluation based on both simulated and real data indicates that our method systematically enables structural variants for clonal heterogeneity analysis and can greatly enhance the characterization of genomically unstable tumors.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-299) contains supplementary material, which is available to authorized users.
2.
Background
Transposable elements are mobile DNA repeat sequences, known to have a high impact on genes, genome structure and evolution. This has stimulated broad interest in detailed biological studies of transposable elements. Hence, we have developed an easy-to-use tool for the comparative analysis of the structural organization and functional relationships of transposable elements, to help understand their functional role in genomes.
Results
We named our new software VisualTE and describe it here. VisualTE is a Java stand-alone graphical interface that allows users to visualize and analyze all occurrences of transposable element families in annotated genomes. VisualTE reads and extracts transposable elements and genomic information from annotation and repeat data. Result analyses are displayed in several graphical panels that include location and distribution on the chromosome, the occurrence of transposable elements in the genome, their size distribution, and neighboring genes’ features and ontologies. With these hallmarks, VisualTE provides a convenient tool for studying transposable element copies and their functional relationships with genes, at the whole-genome scale, and in diverse organisms.
Conclusions
The VisualTE graphical interface enables comparative analyses of transposable elements in any annotated sequence, as well as of the structural organization and functional relationships between transposable elements and other genetic objects. This tool is freely available at: http://lcb.cnrs-mrs.fr/spip.php?article867.
The online version of this article (doi:10.1186/s12864-015-1351-5) contains supplementary material, which is available to authorized users.
3.
Background
Transposable elements constitute an important part of the genome and are essential in adaptive mechanisms. Transposition events associated with phenotypic changes occur naturally or are induced in insertional mutant populations. Transposon mutagenesis results in multiple random insertions, and recovery of most or all of the insertions is critical for forward genetics studies. Using genome next-generation sequencing data and an appropriate bioinformatics tool, it is feasible to accurately identify transposon insertion sites, which could provide candidate causal mutations for desired phenotypes for further functional validation.
Results
We developed a novel bioinformatics tool, ITIS (Identification of Transposon Insertion Sites), for localizing transposon insertion sites within a genome. It takes next-generation genome re-sequencing data (NGS data), transposon sequence, and reference genome sequence as input, and generates a list of highly reliable candidate insertion sites as well as zygosity information of each insertion. Using a simulated dataset and a case study based on an insertional mutant line from Medicago truncatula, we showed that ITIS performed better in terms of sensitivity and specificity than other similar algorithms such as RelocaTE, RetroSeq, TEMP and TIF. With the case study data, we demonstrated the efficiency of ITIS by validating the presence and zygosity of predicted insertion sites of the Tnt1 transposon within a complex plant system, M. truncatula.
Conclusion
This study showed that ITIS is a robust and powerful tool for forward genetic studies in identifying transposable element insertions causing phenotypes. ITIS is suitable for various systems such as cell culture, bacteria, yeast, insect, mammal and plant.
The online version of this article (doi:10.1186/s12859-015-0507-2) contains supplementary material, which is available to authorized users.
4.
Yuki Ozaki Shingo Suzuki Koichi Kashiwase Atsuko Shigenari Yuko Okudaira Sayaka Ito Anri Masuya Fumihiro Azuma Toshio Yabe Satoko Morishima Shigeki Mitsunaga Masahiro Satake Masao Ota Yasuo Morishima Jerzy K Kulski Katsuyuki Saito Hidetoshi Inoko Takashi Shiina 《BMC genomics》2015,16(1)
Background
HLA genotyping by next generation sequencing (NGS) requires three basic steps, PCR, NGS, and allele assignment. Compared to the conventional methods, such as PCR-sequence specific oligonucleotide primers (SSOP) and -sequence based typing (SBT), PCR-NGS is extremely labor intensive and time consuming. In order to simplify and accelerate the NGS-based HLA genotyping method for multiple DNA samples, we developed and evaluated four multiplex PCR methods for genotyping up to nine classical HLA loci including HLA-A, HLA-B, HLA-C, HLA-DRB1/3/4/5, HLA-DQB1, and HLA-DPB1.
Results
We developed multiplex PCR methods using newly and previously designed middle ranged PCR primer sets for genotyping different combinations of HLA loci, (1) HLA-DRB1/3/4/5, (2) HLA-DQB1 (3.8 kb to 5.3 kb), (3) HLA-A, HLA-B, HLA-C, and (4) HLA-DPB1 (4.6 kb to 7.2 kb). The primer sets were designed to genotype polymorphic exons to the field 3 level or 6-digit typing. When we evaluated the PCR method for genotyping all nine HLA loci (9LOCI) using 46 Japanese reference subjects who represented a distribution of more than 99.5% of the HLA alleles at each of the nine HLA loci, all of the 276 alleles genotyped, except for HLA-DRB3/4/5 alleles, were consistent with known alleles assigned by the conventional methods together with relevant locus balance and no excessive allelic imbalance. One multiplex PCR method (9LOCI) was able to provide precise genotyping data even when only 1 ng of genomic DNA was used for the PCR as a sample template.
Conclusions
In this study, we have demonstrated that the multiplex PCR approach for NGS-based HLA genotyping could serve as an alternative routine HLA genotyping method, possibly replacing the conventional methods by providing an accelerated yet robust amplification step. The method could also provide significant merits for clinical applications through its ability to amplify lower quantities of sample and its cost savings.
The online version of this article (doi:10.1186/s12864-015-1514-4) contains supplementary material, which is available to authorized users.
5.
Background
Human leukocyte antigen (HLA) is a group of genes that are extremely polymorphic among individuals and populations and have been associated with more than 100 different diseases and adverse drug effects. HLA typing is accordingly an important tool in clinical application, medical research, and population genetics. We have previously developed a phase-defined HLA gene sequencing method using MiSeq sequencing.
Results
Here we report a simple, high-throughput, and cost-effective sequencing method that includes normalized library preparation and adjustment of DNA molar concentration. We applied long-range PCR to amplify HLA-B for 96 samples followed by transposase-based library construction and multiplex sequencing with the MiSeq sequencer. After sequencing, we observed low variation in read percentages (0.2% to 1.55%) among the 96 demultiplexed samples. On this basis, all the samples were amenable to haplotype phasing using our phase-defined sequencing method. In our study, a sequencing depth of 800x was necessary and sufficient to achieve full phasing of HLA-B alleles with reliable assignment of the allelic sequence to the 8-digit level.
Conclusions
Our HLA sequencing method, optimized for multiplexing 96 samples, is highly time effective and cost effective and is especially suitable for automated multi-sample library preparation and sequencing.
The online version of this article (doi:10.1186/1471-2164-15-645) contains supplementary material, which is available to authorized users.
6.
Background
Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.
Results
We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced.
Conclusions
Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.
The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.
7.
Jose Alfredo Samaniego Castruita Marie Lisandra Zepeda Mendoza Ross Barnett Nathan Wales M Thomas P. Gilbert 《BMC bioinformatics》2015,16(1)
Background
Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets.
Results
We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions of mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly.
Conclusion
Using datasets from a wide range of species and different levels of complexity, we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the following two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.
The online version of this article (doi:10.1186/s12859-015-0682-1) contains supplementary material, which is available to authorized users.
8.
Sandra Unterseer Eva Bauer Georg Haberer Michael Seidel Carsten Knaak Milena Ouzunova Thomas Meitinger Tim M Strom Ruedi Fries Hubert Pausch Christofer Bertani Alessandro Davassi Klaus FX Mayer Chris-Carolin Schön 《BMC genomics》2014,15(1)
Background
High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far.
Results
We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel.
Conclusions
The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species.
The online version of this article (doi:10.1186/1471-2164-15-823) contains supplementary material, which is available to authorized users.
9.
Philip K Ehrenberg Aviva Geretz Karen M Baldwin Richard Apps Victoria R Polonis Merlin L Robb Jerome H Kim Nelson L Michael Rasmi Thomas 《BMC genomics》2014,15(1)
Background
Unambiguous human leukocyte antigen (HLA) typing is important in transplant matching and disease association studies. High-resolution HLA typing that is not restricted to the peptide-binding region can decrease HLA allele ambiguities. Cost and technology constraints have hampered high-throughput and efficient high resolution unambiguous HLA typing. We have developed a method for HLA genotyping that preserves the very high-resolution that can be obtained by next-generation sequencing (NGS) but also achieves substantially increased efficiency. Unambiguous HLA-A, B, C and DRB1 genotypes can be determined for 96 individuals in a single run of the Illumina MiSeq.
Results
Long-range amplification of full-length HLA genes from four loci was performed in separate polymerase chain reactions (PCR) using primers and PCR conditions that were optimized to reduce co-amplification of other HLA loci. Amplicons from the four HLA loci of each individual were then pooled and subjected to enzymatic library generation. All four loci of an individual were then tagged with one unique index combination. This multi-locus individual tagging (MIT) method combined with NGS enabled the four loci of 96 individuals to be analyzed in a single 500 cycle paired-end sequencing run of the Illumina MiSeq. The sequence reads generated by the MIT-NGS method from the four loci were then discriminated using commercially available NGS HLA typing software. Comparison of the MIT-NGS with Sanger sequence-based HLA typing methods showed that all the ambiguities and discordances between the two methods were due to the accuracy of the MIT-NGS method.
Conclusions
The MIT-NGS method enabled accurate, robust and cost effective simultaneous analyses of four HLA loci per sample and produced 6 or 8-digit high-resolution unambiguous phased HLA typing data from 96 individuals in a single NGS run.
The online version of this article (doi:10.1186/1471-2164-15-864) contains supplementary material, which is available to authorized users.
10.
Brandi L. Cantarel Yunping Lei Daniel Weaver Huiping Zhu Andrew Farrell Graeme Benstead-Hume Justin Reese Richard H. Finnell 《BMC genomics》2015,16(1)
Background
Deidentified newborn screening bloodspot samples (NBS) represent a valuable potential resource for genomic research if impediments to whole exome sequencing of NBS deoxyribonucleic acid (DNA), including the small amount of genomic DNA in NBS material, can be overcome. For instance, genomic analysis of NBS could be used to define allele frequencies of disease-associated variants in local populations, or to conduct prospective or retrospective studies relating genomic variation to disease emergence in pediatric populations over time. In this study, we compared the recovery of variant calls from exome sequences of amplified NBS genomic DNA to variant calls from exome sequencing of non-amplified NBS DNA from the same individuals.
Results
Using a standard alignment-based Genome Analysis Toolkit (GATK), we find 62,000–76,000 additional variants in amplified samples. After application of a unique kmer enumeration and variant detection method (RUFUS), only 38,000–47,000 additional variants are observed in amplified gDNA. This result suggests that roughly half of the amplification-introduced variants identified using GATK may be the result of mapping errors and read misalignment.
Conclusions
Our results show that it is possible to obtain informative, high-quality data from exome analysis of whole genome amplified NBS with the important caveat that different data generation and analysis methods can affect variant detection accuracy, and the concordance of variant calls in whole-genome amplified and non-amplified exomes.
The online version of this article (doi:10.1186/s12864-015-1747-2) contains supplementary material, which is available to authorized users.
11.
12.
Anna L. Paterson Jamie M.J. Weaver Matthew D. Eldridge Simon Tavaré Rebecca C. Fitzgerald Paul A.W. Edwards the OCCAMs Consortium 《BMC genomics》2015,16(1)
Background
Mobile elements are active in the human genome, both in the germline and cancers, where they can mutate driver genes.
Results
While analysing whole genome paired-end sequencing of oesophageal adenocarcinomas to find genomic rearrangements, we identified three ways in which new mobile element insertions appear in the data, resembling translocation or insertion junctions: inserts where unique sequence has been transduced by an L1 (Long interspersed element 1) mobile element; novel inserts that are confidently, but often incorrectly, mapped by alignment software to L1s or polyA tracts in the reference sequence; and a combination of these two ways, where different sequences within one insert are mapped to different loci. We identified nine unique sequences that were transduced by neighbouring L1s, both L1s in the reference genome and L1s not present in the reference. Many of the resulting inserts were small fragments that include little or no recognisable mobile element sequence. We found 6 loci in the reference genome to which sequence reads from inserts were frequently mapped, probably erroneously, by alignment software: these were either L1 sequence or particularly long polyA runs. Inserts identified from such apparent rearrangement junctions averaged 16 inserts/tumour, range 0–153 insertions in 43 tumours. However, many inserts would not be detected by mapping the sequences to the reference genome, because they do not include sufficient mappable sequence. To estimate total somatic inserts we searched for polyA sequences that were not present in the matched normal or other normals from the same tumour batch, and were not associated with known polymorphisms. Samples of these candidate inserts were verified by sequencing across them or manual inspection of surrounding reads: at least 85 % were somatic and resembled L1-mediated events, most including L1Hs sequence. Approximately 100 such inserts were detected per tumour on average (range zero to approximately 700).
Conclusions
Somatic mobile element insertions are abundant in these tumours, with over 75 % of cases having a number of novel inserts detected. The inserts create a variety of problems for the interpretation of paired-end sequencing data.
The online version of this article (doi:10.1186/s12864-015-1685-z) contains supplementary material, which is available to authorized users.
13.
Yan Guo Shilin Zhao Brian D Lehmann Quanhu Sheng Timothy M Shaver Thomas P Stricker Jennifer A Pietenpol Yu Shyr 《BMC bioinformatics》2014,15(1)
Background
Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked types of mutation is internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, a need exists for efficient detection of such deletions using exome sequencing data.
Results
We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
Conclusions
ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.
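The definition used in this abstract (an internal exon deletion is the absence of consecutive exons within a gene) can be illustrated with a small sketch. This is not ExonDel's algorithm, which the abstract does not describe. The function name, the per-exon mean depth input, and both thresholds are hypothetical choices made only for illustration.

```python
from typing import List, Tuple


def find_internal_exon_deletions(
    exon_depths: List[float],
    min_flank_depth: float = 20.0,   # assumed: depth at which a flanking exon counts as present
    max_deleted_depth: float = 1.0,  # assumed: depth at or below which an exon counts as absent
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of consecutive
    near-zero-depth exons that are flanked on both sides by covered exons."""
    runs = []
    n = len(exon_depths)
    i = 1  # the first exon cannot be internal
    while i < n - 1:
        if exon_depths[i] <= max_deleted_depth:
            # Extend the run over all consecutive low-depth exons.
            j = i
            while j + 1 < n and exon_depths[j + 1] <= max_deleted_depth:
                j += 1
            # Keep the run only if it stops before the last exon and both
            # neighbouring exons are well covered.
            if (
                j < n - 1
                and exon_depths[i - 1] >= min_flank_depth
                and exon_depths[j + 1] >= min_flank_depth
            ):
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs


if __name__ == "__main__":
    # Hypothetical mean read depth for the six exons of one gene.
    depths = [50.0, 48.0, 0.0, 0.5, 52.0, 47.0]
    print(find_internal_exon_deletions(depths))
```

Requiring covered flanking exons is one simple way to separate an internal deletion from a gene that is poorly captured as a whole. A real tool would also need to account for capture efficiency and GC bias.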
The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.
14.
So-Young Bang Young-Ji Na Kwangwoo Kim Young Bin Joo Youngho Park Jaemoon Lee Sun-Young Lee Adnan A Ansari Junghee Jung Hwanseok Rhee Jong-Young Lee Bok-Ghee Han Sung-Min Ahn Sungho Won Hye-Soon Lee Sang-Cheol Bae 《Arthritis research & therapy》2014,16(5)
Introduction
Although it has been suggested that rare coding variants could explain the substantial missing heritability, very few sequencing studies have been performed in rheumatoid arthritis (RA). We aimed to identify novel functional variants with rare to low frequency using targeted exon sequencing of RA in Korea.
Methods
We analyzed targeted exon sequencing data of 398 genes selected from a multifaceted approach in Korean RA patients (n = 1,217) and controls (n = 717). We conducted a single-marker association test and a gene-based analysis of rare variants. For meta-analysis or enrichment tests, we also used ethnically matched independent samples of Korean genome-wide association studies (GWAS) (n = 4,799) or immunochip data (n = 4,722).
Results
After stringent quality control, we analyzed 10,588 variants of 398 genes from 1,934 Korean RA case controls. We identified 13 nonsynonymous variants with nominal association in single-variant association tests. In a meta-analysis, we did not find any novel variant with genome-wide significance for RA risk. Using a gene-based approach, we identified 17 genes with nominal burden signals. Among them, VSTM1 showed the greatest association with RA (P = 7.80 × 10⁻⁴). In the enrichment test using Korean GWAS, although the significant signal appeared to be driven by total genic variants, we found no evidence for enriched association of coding variants only with RA.
Conclusions
We were unable to identify rare coding variants with large effect to explain the missing heritability for RA in the current targeted resequencing study. Our study raises skepticism about exon sequencing of targeted genes for complex diseases like RA.
The online version of this article (doi:10.1186/s13075-014-0447-7) contains supplementary material, which is available to authorized users.
15.
Background
The power of genome-wide association studies starts to decline when the minor allele frequency (MAF) is below 0.05. Here, we proposed the use of Cohen’s h in detecting disease-associated rare variants. The variance-stabilizing effect of the arcsine square root transformation of MAFs used to generate Cohen’s h contributed to the statistical power for rare variant analysis. We re-analyzed published datasets, one microarray based and one sequencing based, and used simulation to compare the performance of Cohen’s h with the risk difference (RD) and odds ratio (OR).
Results
The analysis showed that the type 1 error rate of Cohen’s h was as expected and Cohen’s h and RD were both less biased and had higher power than OR. The advantage of Cohen’s h was more obvious when MAF was less than 0.01.
Conclusions
Cohen’s h can increase the power to find genetic association of rare variants and diseases, especially when MAF is less than 0.01.
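The arcsine square root transformation behind Cohen’s h, as mentioned in this abstract, can be sketched as follows. Cohen’s h for two proportions p1 and p2 is 2·arcsin(√p1) − 2·arcsin(√p2). The function names and the example allele frequencies below are hypothetical and serve only to show how h, the risk difference and the odds ratio are computed from the same pair of MAFs. This is a minimal illustration and not the authors' analysis code.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-square-root transformed proportions."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


def risk_difference(p1: float, p2: float) -> float:
    """Plain difference between the two proportions."""
    return p1 - p2


def odds_ratio(p1: float, p2: float) -> float:
    """Ratio of the odds p/(1-p); undefined when either proportion is 0 or 1."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))


if __name__ == "__main__":
    # Hypothetical (case MAF, control MAF) pairs: one common and one rare variant.
    for case_maf, control_maf in [(0.30, 0.25), (0.010, 0.005)]:
        print(
            f"case={case_maf:.3f} control={control_maf:.3f} "
            f"h={cohens_h(case_maf, control_maf):.4f} "
            f"RD={risk_difference(case_maf, control_maf):.4f} "
            f"OR={odds_ratio(case_maf, control_maf):.3f}"
        )
```

Because the transformation stretches the scale near 0, the same absolute difference in MAF gives a larger h for rare variants than for common ones, which is consistent with the abstract's claim that the advantage is greatest at low MAF.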
The online version of this article (doi:10.1186/1471-2164-15-875) contains supplementary material, which is available to authorized users.
16.
Min Wang Christine R Beck Adam C English Qingchang Meng Christian Buhay Yi Han Harsha V Doddapaneni Fuli Yu Eric Boerwinkle James R Lupski Donna M Muzny Richard A Gibbs 《BMC genomics》2015,16(1)
Background
Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high.
Results
We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki–Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants.
Conclusions
The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation, could also benefit from this technology.
The online version of this article (doi:10.1186/s12864-015-1370-2) contains supplementary material, which is available to authorized users.
17.
Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data
Richard J Orton Caroline F Wright Marco J Morelli David J King David J Paton Donald P King Daniel T Haydon 《BMC genomics》2015,16(1)
Background
RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing is revolutionising the study of viral populations by enabling the ultra deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants within the population. Identification of low frequency variants is important for our understanding of mutational dynamics, disease progression, immune pressure, and for the detection of drug resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and distinguish real viral variants, particularly those that exist at low frequency, from errors introduced during sequencing and sample processing, which can both be substantial.
Results
We have created a novel set of laboratory control samples that are derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step, which predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of observed mutations at a given site of interest is greater than would be expected from processing errors alone in any NGS data set. After providing basic sample processing information and the site’s coverage and quality scores, the model utilises the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone.
Conclusions
These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
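The idea in this abstract of comparing an observed mutation count with counts simulated from processing errors alone can be sketched as follows. This is a deliberately simplified illustration and not the authors' model: it assumes a single hypothetical per-read error rate (`error_rate`) at the site, whereas the abstract describes fitted RT-PCR error distributions combined with coverage and quality scores. All function names and numbers here are assumptions.

```python
import random
from typing import List


def simulate_error_counts(
    coverage: int, error_rate: float, n_sim: int = 500, seed: int = 0
) -> List[int]:
    """Simulate, n_sim times, how many of `coverage` reads at one site show
    a mismatch if each read independently errs with probability error_rate."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(coverage) if rng.random() < error_rate)
        for _ in range(n_sim)
    ]


def empirical_p_value(
    observed: int, coverage: int, error_rate: float, n_sim: int = 500, seed: int = 0
) -> float:
    """Fraction of simulations with at least `observed` error-derived mutations.
    The +1 terms keep the estimate away from exactly zero."""
    counts = simulate_error_counts(coverage, error_rate, n_sim, seed)
    at_least = sum(1 for c in counts if c >= observed)
    return (1 + at_least) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical site: 2,000 reads, assumed combined error rate of 0.1 % per read.
    for observed in (3, 40):
        p = empirical_p_value(observed, coverage=2000, error_rate=0.001)
        print(f"observed={observed} empirical p={p:.4f}")
```

A small empirical p-value suggests that the observed count is larger than processing errors alone would usually produce under the assumed error rate. The conclusion is only as good as that assumed rate, which is why the study fits error distributions from control samples.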
The online version of this article (doi:10.1186/s12864-015-1456-x) contains supplementary material, which is available to authorized users.
Similar articles
18.
Adam C English William J Salerno Oliver A Hampton Claudia Gonzaga-Jauregui Shruthi Ambreth Deborah I Ritter Christine R Beck Caleb F Davis Mahmoud Dahdouli Singer Ma Andrew Carroll Narayanan Veeraraghavan Jeremy Bruestle Becky Drees Alex Hastie Ernest T Lam Simon White Pamela Mishra Min Wang Yi Han Feng Zhang Pawel Stankiewicz David A Wheeler Jeffrey G Reid Donna M Muzny Jeffrey Rogers Aniko Sabo Kim C Worley James R Lupski Eric Boerwinkle Richard A Gibbs 《BMC genomics》2015,16(1)
Background
Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
Results
We demonstrate Parliament’s efficacy via integrated analyses of whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.
Conclusions
HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.
Similar articles
19.
Chen Zhang Ying Liu Brian Z. Ring Kai Nie Mengjie Yang Miao Wang Hongwei Shen Xiyang Wu Xuejun Ma 《PloS one》2013,8(4)
Background
The tetra-primer amplification refractory mutation system PCR (T-ARMS-PCR) is a fast and economical means of assaying SNPs, requiring only PCR amplification and subsequent electrophoresis for the determination of genotypes. To improve the throughput and efficiency of T-ARMS-PCR, we combined T-ARMS-PCR with a chimeric primer-based temperature switch PCR (TSP) strategy, and used capillary electrophoresis (CE) for amplicon separation and identification. We assessed this process in the simultaneous genotyping of four breast cancer risk-related and two cervical cancer risk-related SNPs.
Methods
A total of 24 T-ARMS-PCR primers, each 5′-tagged with a universal sequence and a pair of universal primers, were pooled together to amplify the 12 target alleles of 6 SNPs in 186 control female blood samples. Direct sequencing of all samples was also performed to assess the accuracy of this method.
Results
In the 186 samples, as many as 11 amplicons could be produced in a single PCR and separated by CE. Genotyping results of the multiplex T-ARMS-PCR were in complete agreement with direct sequencing of all samples.
Conclusions
This novel multiplex T-ARMS-PCR method is the first reported method allowing one to genotype six SNPs in a single reaction with no post-PCR treatment other than electrophoresis. This method is reliable, fast, and easy to perform.
Similar articles
20.
Eva C Berglund Carl Mårten Lindqvist Shahina Hayat Elin Övernäs Niklas Henriksson Jessica Nordlund Per Wahlberg Erik Forestier Gudmar Lönnerholm Ann-Christine Syvänen 《BMC genomics》2013,14(1)