Similar literature
1.

Background

The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping to improve our understanding of human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, “tent-dwelling” Bedouin, and Persian, which attribute their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage.

Results

We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Out of the reported SNPs and indels, 85,939 are novel. We identify 295 ‘loss-of-function’ and 2,314 ‘deleterious’ coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment and the requirement of a high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant’s family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. In comparison, bead arrays perform better than sequencing platforms at correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotype-calling position can lead to false calls with bead arrays.

Conclusions

We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge of human genome diversity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1233-x) contains supplementary material, which is available to authorized users.

2.
Genome Biology, 2013, 14(7): R82

Background

The mouse inbred line C57BL/6J is widely used in mouse genetics and its genome has been incorporated into many genetic reference populations. More recently large initiatives such as the International Knockout Mouse Consortium (IKMC) are using the C57BL/6N mouse strain to generate null alleles for all mouse genes. Hence both strains are now widely used in mouse genetics studies. Here we perform a comprehensive genomic and phenotypic analysis of the two strains to identify differences that may influence their underlying genetic mechanisms.

Results

We undertake genome sequence comparisons of C57BL/6J and C57BL/6N to identify SNPs, indels and structural variants, with a focus on identifying all coding variants. We annotate 34 SNPs and 2 indels that distinguish C57BL/6J and C57BL/6N coding sequences, as well as 15 structural variants that overlap a gene. In parallel we assess the comparative phenotypes of the two inbred lines utilizing the EMPReSSslim phenotyping pipeline, a broad based assessment encompassing diverse biological systems. We perform additional secondary phenotyping assessments to explore other phenotype domains and to elaborate phenotype differences identified in the primary assessment. We uncover significant phenotypic differences between the two lines, replicated across multiple centers, in a number of physiological, biochemical and behavioral systems.

Conclusions

Comparison of C57BL/6J and C57BL/6N demonstrates a range of phenotypic differences that have the potential to impact upon penetrance and expressivity of mutational effects in these strains. Moreover, the sequence variants we identify provide a set of candidate genes for the phenotypic differences observed between the two strains.

3.

Background

Insertions and deletions (indels) are the most abundant form of structural variation in all genomes. Indels have been increasingly recognized as an important source of molecular markers due to high-density occurrence, cost-effectiveness, and ease of genotyping. Coupled with developments in bioinformatics, next-generation sequencing (NGS) platforms enable the discovery of millions of indel polymorphisms by comparing the whole genome sequences of individuals within a species.

Results

A total of 1,973,746 unique indels were identified in 345 maize genomes, with an overall density of 958.79 indels/Mbp, and an average allele number of 2.76, ranging from 2 to 107. There were 264,214 indels with polymorphism information content (PIC) values greater than or equal to 0.5, accounting for 13.39 % of overall indels. Of these highly polymorphic indels, we designed primer pairs for 83,481 and 29,403 indels with major allele differences (i.e. the size difference between the most and second most frequent alleles) greater than or equal to 3 and 8 bp, respectively, based on the differing resolution capabilities of gel electrophoresis. The accuracy of our indel markers was experimentally validated, and among 100 indel markers, average accuracy was approximately 90 %. In addition, we also validated the polymorphism of the indel markers. Of 100 highly polymorphic indel markers, all had polymorphisms with average PIC values of 0.54.
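The PIC threshold used above can be made concrete with a short sketch. The abstract does not say which PIC formula was applied; the code below assumes the commonly used Botstein-style definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², and the function name `pic` is purely illustrative.

```python
from typing import Sequence


def pic(allele_freqs: Sequence[float]) -> float:
    """Polymorphism information content for one marker.

    Assumes the Botstein-style formula (an assumption, not stated in the
    abstract): PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in allele_freqs]
    homozygosity = sum(squares)
    # Pairwise correction term over all unordered allele pairs.
    cross = 0.0
    for i in range(len(squares) - 1):
        for j in range(i + 1, len(squares)):
            cross += 2.0 * squares[i] * squares[j]
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    print(pic([0.5, 0.5]))          # two equally frequent alleles
    print(pic([1 / 3, 1 / 3, 1 / 3]))  # three equally frequent alleles
```

Under this formula a biallelic marker cannot exceed a PIC of 0.375, so a cut-off of 0.5 implicitly selects indels with at least three alleles, which fits the reported average allele number of 2.76.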

Conclusions

The maize genome is rich in indel polymorphisms. Intriguingly, the level of polymorphism in genic regions of the maize genome was higher than that in intergenic regions. The polymorphic indel markers developed from this study may enhance the efficiency of genetic research and marker-assisted breeding in maize.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1797-5) contains supplementary material, which is available to authorized users.

4.

Background

Recent advances in deep digital sequencing have unveiled an unprecedented degree of clonal heterogeneity within a single tumor DNA sample. Resolving such heterogeneity depends on accurate estimation of the fractions of alleles that harbor somatic mutations. Unlike substitutions or small indels, structural variants such as deletions, duplications, inversions and translocations involve segments of DNA and are potentially more accurate for allele fraction estimation. However, no systematic method exists that can support such analysis.

Results

In this paper, we present a novel maximum-likelihood method that estimates allele fractions of structural variants integratively from various forms of alignment signals. We develop a tool, BreakDown, to estimate the allele fractions of most structural variants including medium size (from 1 kilobase to 1 megabase) deletions and duplications, and balanced inversions and translocations.
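To illustrate what a maximum-likelihood estimate of an allele fraction looks like in its simplest form, the sketch below pools several alignment signals under one shared fraction. This is not the BreakDown model, which the abstract does not describe: it assumes each signal is an independent binomial count, ignores signal-specific detection rates, and uses hypothetical names throughout.

```python
import math
from typing import Iterable, List, Tuple

# (variant-supporting reads, total informative reads) for one signal type,
# e.g. split reads or discordant read pairs. Hypothetical representation.
Observation = Tuple[int, int]


def log_likelihood(f: float, observations: Iterable[Observation]) -> float:
    """Binomial log-likelihood of allele fraction f, dropping constant terms."""
    total = 0.0
    for k, n in observations:
        total += k * math.log(f) + (n - k) * math.log(1.0 - f)
    return total


def mle_allele_fraction(observations: Iterable[Observation], steps: int = 1000) -> float:
    """Grid search for the allele fraction that maximises the likelihood."""
    obs: List[Observation] = list(observations)
    grid = [i / steps for i in range(1, steps)]  # open interval (0, 1)
    return max(grid, key=lambda f: log_likelihood(f, obs))


if __name__ == "__main__":
    # Two signal types: 12 of 40 reads and 8 of 60 reads support the variant.
    print(mle_allele_fraction([(12, 40), (8, 60)]))
```

With this simplified model the estimate reduces to the pooled proportion of supporting reads; a realistic method would weight each signal by how reliably it detects a given variant type.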

Conclusions

Evaluation based on both simulated and real data indicates that our method systematically enables structural variants for clonal heterogeneity analysis and can greatly enhance the characterization of genomically unstable tumors.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-299) contains supplementary material, which is available to authorized users.

5.

Background

Trypanosoma cruzi is the causal agent of Chagas Disease. Recently, the genomes of representative strains from two major evolutionary lineages were sequenced, allowing the construction of a detailed genetic diversity map for this important parasite. However, this map is focused on coding regions of the genome, leaving a vast space of regulatory regions uncharacterized in terms of their evolutionary conservation and/or divergence.

Methodology

Using data from the hybrid CL Brener and Sylvio X10 genomes (from the TcVI and TcI Discrete Typing Units, respectively), we identified intergenic regions that share a common evolutionary ancestry, and are present in both CL Brener haplotypes (TcII-like and TcIII-like) and in the TcI genome; as well as intergenic regions that were conserved in only two of the three genomes/haplotypes analyzed. The genetic diversity in these regions was characterized in terms of the accumulation of indels and nucleotide changes.

Principal Findings

Based on this analysis we have identified i) a core of highly conserved intergenic regions, which remained essentially unchanged in independently evolving lineages; ii) intergenic regions that show high diversity in spite of still retaining their corresponding upstream and downstream coding sequences; iii) a number of defined sequence motifs that are shared by a number of unrelated intergenic regions. A fraction of indels explains the diversification of some intergenic regions by the expansion/contraction of microsatellite-like repeats.

6.

Background

Human papillomavirus 16 (HPV16) species group (alpha-9) of the Alphapapillomavirus genus contains HPV16, HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. These HPVs account for 75% of invasive cervical cancers worldwide. Viral variants of these HPVs differ in evolutionary history and pathogenicity. Moreover, a comprehensive nomenclature system for HPV variants is lacking, limiting comparisons between studies.

Methods

DNA from cervical samples previously characterized for HPV type was obtained from multiple geographic regions to screen for novel variants. The complete 8 kb genomes of 120 variants representing the major and minor lineages of the HPV16-related alpha-9 HPV types were sequenced to capture maximum viral heterogeneity. Viral evolution was characterized by constructing phylogenetic trees based on complete genomes using multiple algorithms. Maximal and viral region-specific divergence was calculated by global and pairwise alignments. Variant lineages were classified and named using an alphanumeric system; the prototype genome was assigned to the A lineage for all types.

Results

The range of genome-genome sequence heterogeneity varied from 0.6% for HPV35 to 2.2% for HPV52 and included 1.4% for HPV31, 1.1% for HPV33, 1.7% for HPV58 and 1.1% for HPV67. Nucleotide differences of approximately 1.0%–10.0% and 0.5%–1.0% of the complete genomes were used to define variant lineages and sublineages, respectively. Each gene/region differs in sequence diversity, from most variable to least variable: noncoding region 1 (NCR1)/noncoding region 2 (NCR2) > upstream regulatory region (URR) > E6/E7 > E2/L2 > E1/L1.
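The lineage and sublineage thresholds quoted above can be expressed as a small classification rule. The sketch below is only an illustration: the function names are hypothetical, the handling of the shared 1.0% boundary and of alignment gaps is an assumption, and the abstract itself describes the cut-offs as approximate.

```python
def percent_difference(aligned_a: str, aligned_b: str) -> float:
    """Percent of differing columns between two pre-aligned genomes.

    Assumption: a gap ('-') against a base counts as a difference, and
    columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared


def classify_variant(pct_diff: float) -> str:
    """Apply the approximate cut-offs from the abstract.

    Assumption: exactly 1.0% is treated as a lineage-level difference.
    """
    if pct_diff > 10.0:
        return "outside stated variant range"
    if pct_diff >= 1.0:
        return "lineage"
    if pct_diff >= 0.5:
        return "sublineage"
    return "same sublineage"


if __name__ == "__main__":
    reference = "ACGT" * 50
    variant = "ACGT" * 49 + "ACGA"  # one mismatch in 200 columns
    d = percent_difference(reference, variant)
    print(d, classify_variant(d))
```

On real data the input would be a whole-genome pairwise alignment of roughly 8 kb, so a 0.5% threshold corresponds to about 40 differing positions.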

Conclusions

These data define maximum viral genomic heterogeneity of HPV16-related alpha-9 HPV variants. The proposed nomenclature system facilitates the comparison of variants across epidemiological studies. Sequence diversity and phylogenies of this clinically important group of HPVs provide the basis for further studies of discrete viral evolution, epidemiology, pathogenesis and preventative/therapeutic interventions.

7.

Background

Domestication has shaped the horse and led to a group of many different types. Some have been under strong human selection while others developed in close relationship with nature. The aim of our study was to perform next generation sequencing of breed and non-breed horses to provide an insight into genetic influences on selective forces.

Results

Whole genome sequencing of five horses of four different populations revealed 10,193,421 single nucleotide polymorphisms (SNPs) and 1,361,948 insertion/deletion polymorphisms (indels). In comparison to horse variant databases and previous reports, we were able to identify 3,394,883 novel SNPs and 868,525 novel indels. We analyzed the distribution of individual variants and found significant enrichment of private mutations in coding regions of genes involved in primary metabolic processes, anatomical structures, morphogenesis and cellular components in non-breed horses. In contrast, breed horses showed enrichment of private mutations in genes affecting cell communication, lipid metabolic process, neurological system process, muscle contraction, ion transport, and developmental processes of the nervous system and ectoderm.

Conclusions

Our next generation sequencing data constitute an important first step for the characterization of non-breed in comparison to breed horses and provide a large number of novel variants for future analyses. Functional annotations suggest specific variants that could play a role for the characterization of breed or non-breed horses.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-562) contains supplementary material, which is available to authorized users.

8.

Background and Aims

Although monocotyledonous plants comprise one of the two major groups of angiosperms and include >65 000 species, comprehensive genome analysis has been focused mainly on the Poaceae (grass) family. Due to this bias, most of the conclusions that have been drawn for monocot genome evolution are based on grasses. It is not known whether these conclusions apply to many other monocots.

Methods

To extend our understanding of genome evolution in the monocots, Asparagales genomic sequence data were acquired and the structural properties of asparagus and onion genomes were analysed. Specifically, several available onion and asparagus bacterial artificial chromosomes (BACs) with contig sizes >35 kb were annotated and analysed, with a particular focus on the characterization of long terminal repeat (LTR) retrotransposons.

Key Results

The results reveal that LTR retrotransposons are the major components of the onion and garden asparagus genomes. These elements are mostly intact (i.e. with two LTRs), have mainly inserted within the past 6 million years and are piled up into nested structures. Analysis of shotgun genomic sequence data and the observation of two copies for some transposable elements (TEs) in annotated BACs indicate that some families have become particularly abundant, as high as 4–5 % (asparagus) or 3–4 % (onion) of the genome for the most abundant families, as also seen in large grass genomes such as wheat and maize.

Conclusions

Although previous annotations of contiguous genomic sequences have suggested that LTR retrotransposons were highly fragmented in these two Asparagales genomes, the results presented here show that this was largely due to the methodology used. In contrast, this current work indicates an ensemble of genomic features similar to those observed in the Poaceae.

9.

Background

Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved.

Results

We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans.

Conclusions

The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution.

10.

Background

RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have lower probability to map correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias in eQTL discovery.

Results

We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.

Conclusion

Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better controlling for mapping bias to obtain more accurate results in future RNA-seq studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.

11.

Background

The correct taxonomic assignment of bacterial genomes is a primary and challenging task. With the availability of whole genome sequences, the gene content based approaches appear promising in inferring the bacterial taxonomy. The complete genome sequencing of a bacterial genome often reveals a substantial number of unique genes present only in that genome which can be used for its taxonomic classification.

Results

In this study, we have proposed a comprehensive method which uses the taxon-specific genes for the correct taxonomic assignment of existing and new bacterial genomes. The taxon-specific genes identified at each taxonomic rank have been successfully used for the taxonomic classification of 2,342 genomes present in the NCBI genomes, 36 newly sequenced genomes, and 17 genomes for which the complete taxonomy is not yet known. This approach has been implemented for the development of a tool ‘Microtaxi’ which can be used for the taxonomic assignment of complete bacterial genomes.

Conclusion

The taxon-specific gene based approach provides an alternate valuable methodology to carry out the taxonomic classification of newly sequenced or existing bacterial genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1542-0) contains supplementary material, which is available to authorized users.

12.

Background

The mechanism of high-altitude adaptation has been studied in certain mammals. However, in avian species like the ground tit Pseudopodoces humilis, the adaptation mechanism remains unclear. The phylogeny of the ground tit is also controversial.

Results

Using next generation sequencing technology, we generated and assembled a draft genome sequence of the ground tit. The assembly contained 1.04 Gb of sequence that covered 95.4% of the whole genome and had higher N50 values, at the level of both scaffolds and contigs, than other sequenced avian genomes. About 1.7 million SNPs were detected, 16,998 protein-coding genes were predicted and 7% of the genome was identified as repeat sequences. Comparisons between the ground tit genome and other avian genomes revealed a conserved genome structure and confirmed the phylogeny of ground tit as not belonging to the Corvidae family. Gene family expansion and positively selected gene analysis revealed genes that were related to cardiac function. Our findings contribute to our understanding of the adaptation of this species to extreme environmental living conditions.
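Since the assembly is compared with other avian genomes by N50, a brief sketch of how that statistic is usually computed may help. The function below follows the standard definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size); the function name and the example lengths are illustrative and not taken from the ground tit assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Multiply by 2 to avoid fractional halves for odd totals.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for positive lengths


if __name__ == "__main__":
    # Hypothetical contig lengths in kb.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

The same function applies to scaffolds and contigs alike; only the input list changes, which is why the abstract can report both values for one assembly.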

Conclusions

Our data and analysis contribute to the study of avian evolutionary history and provide new insights into the adaptation mechanisms to extreme conditions in animals.

13.

Background

The discovery and mapping of genomic variants is an essential step in most analyses done using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that are used to identify the other types of variants. Indels represent the second most frequent class of variants in the human genome, after single nucleotide polymorphisms. The reliable detection of indels is still a challenging problem, especially for variants that are longer than a few bases.

Results

We have developed a set of algorithms and heuristics collectively called indelMINER to identify indels from whole genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools.

Conclusions

indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0483-6) contains supplementary material, which is available to authorized users.

14.

Background

Since 2010, four Charolais calves with a congenital mechanobullous skin disorder that were born in the same herd from consanguineous matings were reported to us. Clinical and histopathological examination revealed lesions that are compatible with junctional epidermolysis bullosa (JEB).

Results

Fifty-four extended regions of homozygosity (>1 Mb) were identified after analysing the whole-genome sequencing (WGS) data from the only case available for DNA sampling at the beginning of the study. Filtering of variants located in these regions for (i) homozygous polymorphisms observed in the WGS data from eight healthy Charolais animals and (ii) homozygous or heterozygous polymorphisms found in the genomes of 234 animals from different breeds did not reveal any deleterious candidate SNPs (single nucleotide polymorphisms) or small indels. Subsequent screening for structural variants in candidate genes located in the same regions identified a homozygous deletion that includes exons 17 to 23 of the integrin beta 4 (ITGB4), a gene that was previously associated with the same defect in humans. Genotyping of a second case and of six parents of affected calves (two sires and four dams) revealed a perfect association between this mutation and the assumed genotypes of the individuals. Mining of Illumina BovineSNP50 Beadchip genotyping data from 6870 Charolais cattle detected only 44 heterozygous animals for a 5.6-Mb haplotype around ITGB4 that was shared with the carriers of the mutation. Interestingly, none of the 16 animals genotyped for the deletion carried the mutation, which suggests a rather recent origin for the mutation.

Conclusions

In conclusion, we successfully identified the causative mutation for a very rare autosomal recessive disorder with only one case by exploiting the most recent DNA sequencing technologies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0110-z) contains supplementary material, which is available to authorized users.

15.

Background

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation. Identification of large numbers of SNPs is helpful for genetic diversity analysis, map-based cloning, genome-wide association analyses and marker-assisted breeding. Recently, identifying genome-wide SNPs in allopolyploid Brassica napus (rapeseed, canola) by resequencing many accessions has become feasible, due to the availability of reference genomes of Brassica rapa (2n = AA) and Brassica oleracea (2n = CC), which are the progenitor species of B. napus (2n = AACC). Although many SNPs in B. napus have been released, the objective in the present study was to produce a larger, more informative set of SNPs for large-scale and efficient genotypic screening. Hence, short-read genome sequencing was conducted on ten elite B. napus accessions for SNP discovery. A subset of these SNPs was randomly selected for sequence validation and for genotyping efficiency testing using the Illumina GoldenGate assay.

Results

A total of 892,536 bi-allelic SNPs were discovered throughout the B. napus genome. A total of 36,458 putative amino acid variants were located in 13,552 protein-coding genes, which were predicted to have enriched binding and catalytic activity as a result. Using the GoldenGate genotyping platform, 94 of 96 SNPs sampled could effectively distinguish genotypes of 130 lines from two mapping populations, with an average call rate of 92%.

Conclusions

Despite the polyploid nature of B. napus, nearly 900,000 simple SNPs were identified by whole genome resequencing. These SNPs were predicted to be effective in high-throughput genotyping assays (51% polymorphic SNPs, 92% average call rate using the GoldenGate assay, leading to an estimated >450,000 useful SNPs). Hence, the development of a much larger genotyping array of informative SNPs is feasible. SNPs identified in this study that cause non-synonymous amino acid substitutions can also be utilized to directly identify causal genes in association studies.

16.

Background

One of the goals of genomics is to identify the genetic loci responsible for variation in phenotypic traits. The completion of the tomato genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of genetic variation present in the tomato genome. Like many self-pollinated crops, cultivated tomato accessions show a low molecular but high phenotypic diversity. Here we describe the whole-genome resequencing of eight accessions (four cherry-type and four large fruited lines) chosen to represent a large range of intra-specific variability and the identification and annotation of novel polymorphisms.

Results

The eight genomes were sequenced using the GAII Illumina platform. Comparison of the sequences with the reference genome yielded more than 4 million single nucleotide polymorphisms (SNPs). This number varied from 80,000 to 1.5 million according to the accessions. Almost 128,000 InDels were detected. The distribution of SNPs and InDels across and within chromosomes was highly heterogeneous revealing introgressions from wild species and the mosaic structure of the genomes of the cherry tomato accessions. In-depth annotation of the polymorphisms identified more than 16,000 unique non-synonymous SNPs. In addition 1,686 putative copy-number variations (CNVs) were identified.

Conclusions

This study represents the first whole genome resequencing experiment in cultivated tomato. Substantial genetic differences exist between the sequenced tomato accessions and the reference sequence. The heterogeneous distribution of the polymorphisms may be related to introgressions that occurred during domestication or breeding. The annotated SNPs, InDels and CNVs identified in this resequencing study will serve as useful genetic tools, and as candidate polymorphisms in the search for phenotype-altering DNA variations.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-791) contains supplementary material, which is available to authorized users.

17.

Background

By reshuffling genomes, structural genomic reorganizations provide genetic variation on which natural selection can work. Understanding the mechanisms underlying this process has been a long-standing question in evolutionary biology. In this context, our purpose in this study is to characterize the genomic regions involved in structural rearrangements between human and macaque genomes and determine their influence on meiotic recombination as a way to explore the adaptive role of genome shuffling in mammalian evolution.

Results

We first constructed a highly refined map of the structural rearrangements and evolutionary breakpoint regions in the human and rhesus macaque genomes based on orthologous genes and whole-genome sequence alignments. Using two different algorithms, we refined the genomic position of known rearrangements previously reported by cytogenetic approaches and described new putative micro-rearrangements (inversions and indels) in both genomes. A detailed analysis of the rhesus macaque genome showed that evolutionary breakpoints are in gene-rich regions, being enriched in GO terms related to the immune system. We also identified defense-response genes within a chromosome inversion fixed in the macaque lineage, underlining the relevance of structural genomic changes in evolutionary and/or adaptation processes. Moreover, by combining in silico and experimental approaches, we studied the recombination pattern of specific chromosomes that have undergone rearrangements between human and macaque lineages.

Conclusions

Our data suggest that adaptive alleles – in this case, genes involved in the immune response – might have been favored by genome rearrangements in the macaque lineage.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-530) contains supplementary material, which is available to authorized users.

18.

Background

Target enrichment and resequencing is a widely used approach for identification of cancer genes and genetic variants associated with diseases. Although cost effective compared to whole genome sequencing, analysis of many samples constitutes a significant cost, which could be reduced by pooling samples before capture. Another limitation to the number of cancer samples that can be analyzed is often the amount of available tumor DNA. We evaluated the performance of whole genome amplified DNA and the power to detect subclonal somatic single nucleotide variants in non-indexed pools of cancer samples using the HaloPlex technology for target enrichment and next generation sequencing.

Results

We captured a set of 1528 putative somatic single nucleotide variants and germline SNPs, which were identified by whole genome sequencing, with the HaloPlex technology and sequenced to a depth of 792–1752. We found that the allele fractions of the analyzed variants are well preserved during whole genome amplification and that capture specificity or variant calling is not affected. We detected a large majority of the known single nucleotide variants present uniquely in one sample with allele fractions as low as 0.1 in non-indexed pools of up to ten samples. We also identified and experimentally validated six novel variants in the samples included in the pools.

Conclusion

Our work demonstrates that whole genome amplified DNA can be used for target enrichment as well as genomic DNA can, and that accurate variant detection is possible in non-indexed pools of cancer samples. These findings show that analysis of a large number of samples is feasible at low cost, even when only small amounts of DNA are available, and thereby significantly increases the chances of identifying recurrent mutations in cancer samples.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-856) contains supplementary material, which is available to authorized users.

19.

Background

The genome of the melon (Cucumis melo L.) double-haploid line DHL92 was recently sequenced, with 87.5 and 80.8% of the scaffold assembly anchored and oriented to the 12 linkage groups, respectively. However, insufficient marker coverage and a lack of recombination left several large, gene-rich scaffolds unanchored, and some anchored scaffolds unoriented. To improve the anchoring and orientation of the melon genome assembly, we used resequencing data from the parental lines of DHL92 to develop a new set of SNP markers from unanchored scaffolds.

Results

A high-resolution genetic map composed of 580 SNPs was used to anchor 354.8 Mb of sequence, contained in 141 scaffolds (average size 2.5 Mb) and corresponding to 98.2% of the scaffold assembly, to the 12 melon chromosomes. Over 325.4 Mb (90%) of the assembly was oriented. The genetic map revealed regions of segregation distortion favoring SC alleles as well as recombination suppression regions coinciding with putative centromere, 45S, and 5S rDNA sites. New chromosome-scale pseudomolecules were created by incorporating into the previous v3.5 version an additional 38.3 Mb of anchored sequence representing 1,837 predicted genes contained in 55 scaffolds. Using fluorescent in situ hybridization (FISH) with BACs that produced chromosome-specific signals, melon chromosomes that correspond to the twelve linkage groups were identified, and a standardized karyotype of melon inbred line T111 was developed.
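The anchoring figures above can be cross-checked against one another with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the abstract; the total assembly size is derived here, not reported, and the function name is hypothetical.

```python
def assembly_summary(anchored_mb: float, anchored_fraction: float,
                     oriented_mb: float, n_scaffolds: int) -> dict:
    """Derive implied assembly statistics from the reported anchoring figures."""
    # Total scaffold assembly implied by "354.8 Mb corresponds to 98.2%".
    total_mb = anchored_mb / anchored_fraction
    return {
        "total_mb": total_mb,
        # Share of the whole assembly that was oriented.
        "oriented_fraction": oriented_mb / total_mb,
        # Mean size of the anchored scaffolds.
        "mean_scaffold_mb": anchored_mb / n_scaffolds,
    }


summary = assembly_summary(anchored_mb=354.8, anchored_fraction=0.982,
                           oriented_mb=325.4, n_scaffolds=141)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The derived values (an implied assembly of about 361 Mb, about 90% oriented, and a mean anchored scaffold of about 2.5 Mb) agree with the rounded percentages and average size given in the text.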

Conclusions

By utilizing resequencing data and targeted SNP selection combined with a large F2 mapping population, we significantly increased the proportion of the melon scaffold genome assembly that is anchored and oriented. Using genome information combined with FISH mapping provided the first cytogenetic map of an inodorus melon type. With these results it was possible to make inferences on melon chromosome structure by relating zones of recombination suppression to centromeres and 45S and 5S heterochromatic regions. This study represents the first steps towards the integration of the high-resolution genetic and cytogenetic maps with the genomic sequence in melon, which will provide more information on genome organization and allow for the improvement of the melon genome draft sequence.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-014-1196-3) contains supplementary material, which is available to authorized users.

20.

Background

Familial aggregation of Chagas cardiac disease in T. cruzi–infected persons suggests that human genetic variation may be an important determinant of disease progression.

Objective

To perform a GWAS using a well-characterized cohort to detect single nucleotide polymorphisms (SNPs) and genes associated with cardiac outcomes.

Methods

A retrospective cohort study was developed by the NHLBI REDS-II program in Brazil. Samples were collected from 499 T. cruzi seropositive blood donors who had donated between 1996 and 2002, and 101 patients with clinically diagnosed Chagas cardiomyopathy. In 2008–2010, all subjects underwent a complete medical examination. After genotype calling, quality control filtering with exclusion of 20 cases, and imputation of 1000 Genomes variants, association analysis was performed for 7 cardiac and parasite-related traits, adjusting for population stratification.

Results

The cohort showed a wide range of African, European, and modest Native American admixture proportions, consistent with the recent history of Brazil. No SNPs were found to be highly associated (P<10⁻⁸) with cardiomyopathy. The two most highly associated SNPs for cardiomyopathy (rs4149018 and rs12582717; P-values <10⁻⁶) are located on Chromosome 12p12.2 in the SLCO1B1 gene, a solute carrier family member. We identified 44 additional genic SNPs associated with six traits at P-value <10⁻⁶: Ejection Fraction, PR, QRS, QT intervals, antibody levels by EIA, and parasitemia by PCR.

Conclusion

This GWAS identified suggestive SNPs that may impact the risk of progression to cardiomyopathy. Although this Chagas cohort is the largest examined by GWAS to date (580 subjects), its moderate sample size may explain in part the limited number of significant SNP variants. Enlarging the current sample through expanded cohorts and meta-analyses, and targeted studies of candidate genes, will be required to confirm and extend the results reported here. Future studies should also include exposed seronegative controls to investigate genetic associations with susceptibility or resistance to T. cruzi infection and non-Chagas cardiomyopathy.

