Similar documents
1.
Flexibility and low cost make genotyping-by-sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI-MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference-free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000–11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking.

2.

Background

Many areas critical to agricultural production and research, such as breeding and trait mapping in plants and livestock, require robust and scalable genotyping platforms. Genotyping-by-sequencing (GBS) is one such method, highly suited to non-human organisms. In the GBS protocol, genomic DNA is fractionated by restriction digestion, and reduced representation is then achieved through size selection. Since many restriction sites are conserved across a species, the sequenced portion of the genome is highly consistent within a population. This makes the GBS protocol highly suited to experiments that require surveying large numbers of markers within a population, such as those involving genetic mapping, breeding, and population genomics. We have modified the GBS technology in a number of ways. Custom, enzyme-specific adaptors have been replaced with standard Illumina adaptors compatible with blunt-end restriction enzymes. Multiplexing is achieved through a dual barcoding system, and bead-based library preparation protocols allow in-solution size selection and eliminate the need for columns and gels.

Results

A panel of eight restriction enzymes was selected for testing on B73 maize and Nipponbare rice genomic DNA. Data quality was demonstrated by showing that the vast majority of reads from each enzyme aligned to restriction sites predicted in silico. The link between enzyme parameters and experimental outcome was demonstrated by showing that the sequenced portion of the genome could be tuned by choosing enzymes on the basis of motif length, complexity, and methylation sensitivity. The utility of the new GBS protocol was demonstrated by correctly mapping several in a maize F2 population resulting from a B73 × Country Gentleman test cross.

Conclusions

This technology is readily adaptable to different genomes, highly amenable to multiplexing and compatible with over forty commercially available restriction enzymes. These advancements represent a major improvement in genotyping technology by providing a highly flexible and scalable GBS method that is readily implemented for studies on genome-wide variation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-979) contains supplementary material, which is available to authorized users.
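With dual barcoding, m read-1 barcodes and n read-2 barcodes can label m × n samples, because each sample is identified by the barcode combination rather than by a single barcode. Below is a minimal demultiplexing sketch. It assumes, hypothetically, fixed-length barcodes at the 5′ end of each read, exact matching only, and a sample sheet keyed by barcode pairs. It illustrates the idea and is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

ReadPair = Tuple[str, str]

def demultiplex(
    read_pairs: List[ReadPair],
    sample_sheet: Dict[Tuple[str, str], str],
    bc1_len: int,
    bc2_len: int,
) -> Dict[str, List[ReadPair]]:
    """Assign read pairs to samples by their (read-1 barcode, read-2 barcode) pair.

    Barcodes are trimmed from assigned reads; pairs whose barcode combination
    is not in the sample sheet are collected under "unassigned".
    """
    bins: Dict[str, List[ReadPair]] = {name: [] for name in sample_sheet.values()}
    bins["unassigned"] = []
    for r1, r2 in read_pairs:
        key = (r1[:bc1_len], r2[:bc2_len])  # exact-match lookup only
        sample = sample_sheet.get(key)
        if sample is None:
            bins["unassigned"].append((r1, r2))
        else:
            bins[sample].append((r1[bc1_len:], r2[bc2_len:]))
    return bins

# Two read-1 barcodes x two read-2 barcodes = four samples (hypothetical sequences).
sheet = {
    ("ACGT", "TTAA"): "s1",
    ("ACGT", "GGCC"): "s2",
    ("TGCA", "TTAA"): "s3",
    ("TGCA", "GGCC"): "s4",
}
reads = [("ACGTCCGGA", "GGCCTAGC"), ("TGCAAAAA", "TTAACCCC"), ("NNNNAAAA", "TTAACCCC")]
bins = demultiplex(reads, sheet, bc1_len=4, bc2_len=4)
```

Production demultiplexers usually tolerate a mismatch per barcode. The exact dictionary lookup here is a deliberate simplification.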

3.
Next-generation sequencing technologies have markedly increased the throughput of genetic studies, allowing the identification of several thousand SNPs in a single experiment. Even though sequencing costs are falling rapidly, whole-genome re-sequencing of a large number of individuals is still expensive, especially in plants with large and highly redundant genomes. In recent years, several reduced representation library approaches have been developed to reduce the sequencing cost per individual. Among them, genotyping-by-sequencing (GBS) is a simple, cost-effective, and highly multiplexed alternative for species with or without an available reference genome. However, this technology requires species-specific optimization, especially of the restriction enzyme (RE) used. Here we report the application of GBS in a test experiment with 18 genotypes of wild and domesticated Phaseolus vulgaris. After in silico digestion of the P. vulgaris reference genome sequence with different REs, we selected CviAII as the most suitable RE for GBS in common bean, based on the high frequency and even distribution of its restriction sites. A total of 44,875 SNPs, 1940 deletions, and 1693 insertions were identified, with 50% of the variants located in genic sequences and tagging 11,027 genes. SNP and InDel distributions were positively correlated with gene density across the genome. In addition, we were also able to identify putative copy number variations of genomic segments between different genotypes. In conclusion, GBS with the CviAII enzyme yields thousands of evenly spaced markers and provides a reliable, high-throughput, and cost-effective approach for genotyping both wild and domesticated common beans.
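The two selection criteria used for CviAII, site frequency and evenness of distribution, can be approximated in a few lines. This is a minimal sketch and not the authors' method. It assumes a palindromic recognition motif (CviAII is commonly listed as recognizing CATG; check REBASE) and counts forward-strand matches only. As a hypothetical evenness score, it uses the coefficient of variation of inter-site distances.

```python
import statistics
from typing import Dict, List

def find_sites(seq: str, motif: str) -> List[int]:
    """0-based start of every match of a palindromic motif (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def site_summary(seq: str, motif: str) -> Dict[str, float]:
    """Site density plus a simple evenness score for one enzyme on one sequence."""
    sites = find_sites(seq, motif)
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    mean_gap = statistics.mean(gaps) if gaps else float("nan")
    # Coefficient of variation of inter-site spacing: lower means more even.
    gap_cv = statistics.pstdev(gaps) / mean_gap if len(gaps) > 1 else float("nan")
    return {
        "n_sites": len(sites),
        "sites_per_kb": 1000 * len(sites) / len(seq),
        "mean_gap": mean_gap,
        "gap_cv": gap_cv,
    }

toy = ("CATG" + "A" * 96) * 10  # ten evenly spaced CATG sites in 1 kb
summary = site_summary(toy, "CATG")
```

On a real chromosome, you would run `site_summary` once per candidate enzyme and compare `sites_per_kb` and `gap_cv` side by side.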

4.
Application of high-throughput sequencing platforms in the field of ecology and evolutionary biology is developing quickly with the introduction of efficient methods to reduce genome complexity. Numerous approaches for genome complexity reduction have been developed using different combinations of restriction enzymes, library construction strategies and fragment size selection. As a result, the choice of which techniques to use may become cumbersome, because it is difficult to anticipate the number of loci resulting from each method. We developed SimRAD, an R package that performs in silico restriction enzyme digests and fragment size selection as implemented in most restriction site associated DNA polymorphism and genotyping by sequencing methods. In silico digestion is performed on a reference genome or on a randomly generated DNA sequence when no reference genome sequence is available. SimRAD accurately predicts the number of loci under alternative protocols when a reference genome sequence is available for the targeted species (or a close relative) but may be unreliable when no reference genome is available. SimRAD is also useful for fine-tuning a given protocol to adjust the number of targeted loci. Here, we outline the functionality of SimRAD and provide an illustrative example of the use of the package (available on the CRAN at http://cran.r-project.org/web/packages/SimRAD ).
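SimRAD itself is an R package. The Python analogue below only illustrates the same three steps: generate a random sequence when no reference exists, digest it in silico, and apply size selection. It does not reproduce SimRAD's functions or their arguments. The helper names, the single palindromic site, and the independent-base model with a chosen GC fraction are all assumptions.

```python
import random
import re
from typing import List

def random_genome(length: int, gc: float, seed: int = 0) -> str:
    """Random DNA with the given GC fraction: a stand-in when no reference exists."""
    rng = random.Random(seed)
    at, cg = (1 - gc) / 2, gc / 2
    return "".join(rng.choices("ATGC", weights=[at, at, cg, cg], k=length))

def digest(seq: str, site: str, cut_offset: int) -> List[str]:
    """Cut at every occurrence of a palindromic site, cut_offset bases into the site."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep fragments whose length falls inside the size-selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

genome = random_genome(1_000_000, gc=0.40, seed=1)
# EcoRI (G^AATTC) is used purely as an example enzyme.
selected = size_select(digest(genome, "GAATTC", 1), 200, 300)
print(len(selected), "fragments in the 200-300 bp window")
```

A random sequence ignores repeats and local base composition. That is consistent with the abstract's caution that predictions may be unreliable without a reference genome.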

5.
Approximate Bayesian computation (ABC) is widely used to infer demographic history of populations and species using DNA markers. Genomic markers can now be developed for nonmodel species using reduced representation library (RRL) sequencing methods that select a fraction of the genome using targeted sequence capture or restriction enzymes (genotyping-by-sequencing, GBS). We explored the influence of marker number and length, knowledge of gametic phase, and tradeoffs between sample size and sequencing depth on the quality of demographic inferences performed with ABC. We focused on two-population models of recent spatial expansion with varying numbers of unknown parameters. Performing ABC on simulated data sets with known parameter values, we found that the timing of a recent spatial expansion event could be precisely estimated in a three-parameter model. Taking into account uncertainty in parameters such as initial population size and migration rate collectively decreased the precision of inferences dramatically. Phasing haplotypes did not improve results, regardless of sequence length. Numerous short sequences were as valuable as fewer, longer sequences, and performed best when a large sample size was sequenced at low individual depth, even when sequencing errors were added. ABC results were similar to results obtained with an alternative method based on the site frequency spectrum (SFS) when performed with unphased GBS-type markers. We conclude that unphased GBS-type data sets can be sufficient to precisely infer simple demographic models, and discuss possible improvements for the use of ABC with genomic data.
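The core of rejection ABC is short:

1. Draw a parameter from the prior.
2. Simulate data under that parameter.
3. Keep the draw if its summary statistic falls within a tolerance of the observed one.

The sketch below replaces the two-population expansion models studied here with a toy one-parameter model: the mean of a normal distribution, with the sample mean as the summary statistic. Only the rejection logic carries over. The model, prior, tolerance and sample sizes are all hypothetical.

```python
import random
from typing import List, Tuple

def simulate(theta: float, n: int, rng: random.Random) -> List[float]:
    """Toy model: n draws from Normal(theta, 1), standing in for a demographic simulator."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summary(data: List[float]) -> float:
    """Summary statistic: the sample mean."""
    return sum(data) / len(data)

def rejection_abc(
    observed: List[float],
    n_sims: int,
    eps: float,
    prior: Tuple[float, float] = (0.0, 10.0),
    seed: int = 0,
) -> List[float]:
    """Return prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    obs_stat = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)  # 1. draw a parameter from the prior
        sim_stat = summary(simulate(theta, len(observed), rng))  # 2. simulate data
        if abs(sim_stat - obs_stat) < eps:  # 3. accept if close enough
            accepted.append(theta)
    return accepted

data_rng = random.Random(42)
observed = [data_rng.gauss(3.0, 1.0) for _ in range(50)]  # "true" theta = 3.0
posterior = rejection_abc(observed, n_sims=5000, eps=0.1)
```

The accepted values approximate the posterior. Shrinking `eps` tightens the approximation but accepts fewer draws.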

6.
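Type IIB restriction endonucleases recognize specific sites and cleave the DNA on both sides of the site at fixed distances, producing equal-length fragments of about 30 bp with cohesive ends. Combining this property with restriction-site associated DNA sequencing (RAD) led to the 2b-RAD reduced-representation genome sequencing technique. 2b-RAD has been applied to genetic map construction, population genetic structure analysis, trait mapping, bacterial typing, and other research areas. Before a 2b-RAD library is constructed, the type IIB restriction sites in the genome need to be predicted and analyzed statistically so that an effective library construction scheme can be devised. In this paper, we used Python to build a pipeline for analyzing type IIB restriction sites in a genome. With it, we predicted and counted the cleavage sites of 8 commercially available type IIB restriction enzymes in the genomes of 6 representative lepidopteran species. For each genome and enzyme, we compared the total number of restriction sites, the number of repeated sequences, and the lengths of the intervals between sites. The results provide a reference for further trials of 2b-RAD in insect genomes.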
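The first step of such a pipeline is counting type IIB sites on both strands. It might look like the Python sketch below, which is not the authors' code. The two recognition sequences are assumptions to be checked against REBASE: AlfI as GCA(N6)TGC and BcgI as CGA(N6)TGC. Only A, C, G, T and N are handled. Palindromic motifs are counted once, and non-palindromic ones are also searched as their reverse complement.

```python
import re

# Assumed recognition sequences (N = any base); verify against REBASE before use.
ENZYMES = {"AlfI": "GCANNNNNNTGC", "BcgI": "CGANNNNNNTGC"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(motif: str) -> str:
    """Reverse complement of a motif written in A/C/G/T/N."""
    return motif.translate(_COMPLEMENT)[::-1]

def to_regex(motif: str) -> str:
    """Translate a motif using A/C/G/T/N into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)

def count_sites(seq: str, motif: str) -> int:
    """Count motif occurrences on both strands, without double-counting palindromes."""
    seq = seq.upper()

    def forward_hits(m: str) -> int:
        # The lookahead lets overlapping matches be found.
        return sum(1 for _ in re.finditer(f"(?={to_regex(m)})", seq))

    rc = revcomp(motif)
    return forward_hits(motif) if rc == motif else forward_hits(motif) + forward_hits(rc)

def site_table(seq: str) -> dict:
    """Number of predicted sites per enzyme in one sequence."""
    return {name: count_sites(seq, motif) for name, motif in ENZYMES.items()}

s1 = "TTGCAAAAAAATGCTT"  # one AlfI-type site
s2 = "AAGCATTTTTTTCGAA"  # one BcgI-type site on the minus strand
```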

7.
Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.
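When no reference exists, the "reference" for each sequence-tagged site is a consensus of the reads in its cluster. Below is a minimal column-wise majority consensus. It assumes the tags in a cluster are already trimmed to equal length and aligned without gaps, and the 0.5 support threshold is an arbitrary choice.

```python
from collections import Counter
from typing import List

def consensus(tags: List[str], min_support: float = 0.5) -> str:
    """Column-wise majority consensus of equal-length tags.

    A column gets 'N' unless its most common base exceeds min_support.
    """
    if not tags or len({len(t) for t in tags}) != 1:
        raise ValueError("expected a non-empty list of equal-length tags")
    bases = []
    for column in zip(*tags):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base if count / len(column) > min_support else "N")
    return "".join(bases)

cluster = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]  # hypothetical reads from one tag site
reference_tag = consensus(cluster)
```

Columns with a tied or weak majority are masked as N rather than guessed. This keeps an ambiguous position from being silently written into the reference.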

8.
Understanding the role of ‘epigenetic’ changes such as DNA methylation and chromatin remodeling has now become critical to understanding many biological processes. To delineate the global methylation pattern of a given genomic DNA, computer software has been developed to create a virtual image of restriction landmark genomic scanning (Vi-RLGS). When a methylation-sensitive enzyme such as NotI is used as the restriction landmark, comparing the real and in silico RLGS profiles of the genome yields a methylation map of genomic NotI sites. A methylation map of the Arabidopsis genome was created and confirmed by a methylation-sensitive PCR assay. The method has also been applied to the mouse genome. Although a complete methylation map has not yet been produced, a region that differs in methylation between two tissues was tested and confirmed by bisulfite sequencing. Vi-RLGS in conjunction with real RLGS will make it possible to develop a more complete map of genomic sites that are methylated or demethylated as a consequence of normal or abnormal development.

9.
A new method to improve the efficiency of flanking sequence identification by genome walking was developed. It is based on an expanded, sequential list of criteria for selecting candidate enzymes, plus several other optimization steps:

- Step (1): initially choose the most appropriate restriction enzyme according to the average fragment size produced by each enzyme, determined by in silico digestion of genomic DNA.
- Step (2): evaluate the in silico fragment size distribution across individual chromosomes.
- Step (3): select enzymes that generate fragments mostly between 100 bp and 3,000 bp.
- Step (4): weigh the advantages and disadvantages of blunt-end versus cohesive-end sites.
- Step (5): eliminate methylation-sensitive enzymes that have methylation-insensitive isoschizomers.
- Step (6): eliminate enzymes with recognition sites within the binary vector sequence (T-DNA and plasmid backbone).
- Step (7): select a second restriction enzyme with the highest number of recognition sites within regions not covered by the first enzyme.
- Step (8): optimize primer and adapter sequences, selecting the best adapter-primer pairs according to their hairpin/dimer formation and secondary structure.
- Step (9): improve the efficiency of genomic library construction by column filtration of digested DNA. This removes the restriction enzyme and phosphatase and, most importantly, small genomic fragments (<100 bp) lacking the T-DNA insertion, which improves the chance of ligation between adapters and fragments harbouring a T-DNA.

Two enzymes, NsiI and NdeI, fit these criteria for the Arabidopsis thaliana genome. Their efficiency was assessed using 54 T3 lines from an Arabidopsis SK enhancer population, and a success rate of over 70% was achieved in amplifying the flanking sequences of these lines.
This strategy was also tested with Brachypodium distachyon to demonstrate its applicability to other, larger genomes.
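Steps (1) and (3) reduce to computing fragment lengths from an in silico digest. The Python sketch below illustrates this under simplifying assumptions and is not the authors' tool. It cuts at the start of each site rather than at the enzyme's true cut position, and it treats the sequence as linear. It also takes NsiI = ATGCAT and NdeI = CATATG as the recognition sequences.

```python
import re
import statistics
from typing import Dict, List

def fragment_lengths(seq: str, site: str) -> List[int]:
    """Fragment lengths when cutting at the start of each site (exact cut offset ignored)."""
    cuts = [m.start() for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def enzyme_report(seq: str, site: str, lo: int = 100, hi: int = 3000) -> Dict[str, float]:
    """Step (1) mean fragment size and step (3) share of fragments in [lo, hi]."""
    lengths = fragment_lengths(seq, site)
    in_range = sum(lo <= n <= hi for n in lengths)
    return {
        "n_fragments": len(lengths),
        "mean_length": statistics.mean(lengths),
        "fraction_in_range": in_range / len(lengths),
    }

toy = "A" * 500 + "ATGCAT" + "A" * 50 + "ATGCAT" + "A" * 500
sites = {"NsiI": "ATGCAT", "NdeI": "CATATG"}
reports = {name: enzyme_report(toy, site) for name, site in sites.items()}
```

In the toy sequence, NdeI never cuts, so the whole 1,062 bp sequence counts as one in-range fragment. On real chromosomes, read the fragment count and mean length from step (1) alongside the in-range fraction.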

10.
Population genetic studies of nonmodel organisms frequently employ reduced representation library (RRL) methodologies, many of which rely on protocols in which genomic DNA is digested by one or more restriction enzymes. However, because high molecular weight DNA is recommended for these protocols, samples with degraded DNA are generally unsuitable for RRL methods. Given that ancient and historic specimens can provide key temporal perspectives to evolutionary questions, we explored how custom-designed RNA probes could enrich for RRL loci (Restriction Enzyme-Associated Loci baits, or REALbaits). Starting with genotyping-by-sequencing (GBS) data generated on modern common ragweed (Ambrosia artemisiifolia L.) specimens, we designed 20 000 RNA probes to target well-characterized genomic loci in herbarium voucher specimens dating from 1835 to 1913. Compared to shotgun sequencing, we observed enrichment of the targeted loci at 19- to 151-fold. Using our GBS capture pipeline on a data set of 38 herbarium samples, we discovered 22 813 SNPs, providing sufficient genomic resolution to distinguish geographic populations. For these samples, we found that dilution of REALbaits to 10% of their original concentration still yielded sufficient data for downstream analyses and that a sequencing depth of ~7 million reads was sufficient to characterize most loci without wasting sequencing capacity. In addition, we observed that targeted loci had highly variable rates of success, which we primarily attribute to similarity between loci, a trait that ultimately interferes with unambiguous read mapping. Our findings can help researchers design capture experiments for RRL loci, thereby providing an efficient means to integrate samples with degraded DNA into existing RRL data sets.

11.
A restriction map of the 2.8-Mb genome of the unicellular eukaryote Encephalitozoon cuniculi (phylum Microspora), a mammal-infecting intracellular parasite, has been constructed using two restriction enzymes with 6 bp recognition sites (BssHII and MluI). The fragments resulting from either single digestions of the whole molecular karyotype or double digestions of 11 individual chromosomes have been separated by two-dimensional pulsed field gel electrophoresis (2D-PFGE) procedures. The average distance between successive restriction sites is ~19 kb. The terminal regions of the chromosomes show a common pattern covering ~15 kb and including one 16S–23S rDNA unit. Results of hybridisation and molecular combing experiments indicate a palindromic-like orientation of the two subtelomeric rDNA copies on each chromosome. We have also located 67 DNA markers (clones from a partial E. cuniculi genomic library) by hybridisation to restriction fragments. Partial or complete sequencing has revealed homologies with known protein-coding genes for 32 of these clones. Evidence for two homologous chromosomes III, with a size difference (3 kb) related to a subtelomeric deletion/insertion event, argues for diploidy of E. cuniculi. The physical map should be useful for both the whole genome sequencing project and studies on genome plasticity of this widespread parasite.

12.
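To sequence the genome of HZ-9, a deletion mutant of a Chinese Helicoverpa armigera (cotton bollworm) nucleopolyhedrovirus, a new approach was used. Genomic DNA of the HaBacHZ9 bacterial artificial chromosome plasmid (bacmid) was sheared by sonication, and adenines (A) were added to both ends of the DNA fragments with Taq polymerase. The expected 1–2 kb fragments were gel-purified and ligated into the pGEM-T Easy vector to construct a subclone library of the HaBacHZ9 deletion virus. Results: restriction analysis of 10 randomly picked clones showed that 9 carried inserts of about 1,500 bp, and the whole HaBacHZ9 genome was sequenced. Conclusion: a DNA sequencing library of HaBacHZ9 was successfully constructed, laying the foundation for functional genomics research on HZ-9. This is a simple and rapid method for constructing sequencing libraries of DNA viruses.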

13.
A growing variety of “genotyping-by-sequencing” (GBS) methods use restriction enzymes and high throughput DNA sequencing to generate data for a subset of genomic loci, allowing the simultaneous discovery and genotyping of thousands of polymorphisms in a set of multiplexed samples. We evaluated a “double-digest” restriction-site associated DNA sequencing (ddRAD-seq) protocol in three ways:

1) comparing results for a zebra finch (Taeniopygia guttata) sample with in silico predictions from the zebra finch reference genome;
2) assessing data quality for a population sample of indigobirds (Vidua spp.);
3) testing for consistent recovery of loci across multiple samples and sequencing runs.

Comparison with in silico predictions revealed five points:

1) over 90% of predicted, single-copy loci in our targeted size range (178–328 bp) were recovered;
2) short restriction fragments (38–178 bp) were carried through the size selection step and sequenced at appreciable depth, generating unexpected but nonetheless useful data;
3) amplification bias favored shorter, GC-rich fragments, contributing to among-locus variation in sequencing depth that was strongly correlated across samples;
4) our use of restriction enzymes with a GC-rich recognition sequence resulted in an up to four-fold overrepresentation of GC-rich portions of the genome;
5) star activity (i.e., non-specific cutting) resulted in thousands of “extra” loci sequenced at low depth.

Results for three species of indigobirds show that a common set of thousands of loci can be consistently recovered across both individual samples and sequencing runs. In a run with 46 samples, we genotyped 5,996 loci in all individuals and 9,833 loci in 42 or more individuals, resulting in <1% missing data for the larger data set. We compare our approach to similar methods and discuss the range of factors (fragment library preparation, natural genetic variation, bioinformatics) influencing the recovery of a consistent set of loci among samples.
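A double digest keeps only fragments with one end cut by each enzyme and a length inside the size-selection window. The sketch below predicts such fragments from a reference sequence. The two recognition sites are placeholders, because the abstract does not name the enzymes. Cut offsets within sites are ignored, and the 178-328 bp default window is taken from the text.

```python
import re
from typing import List, Tuple

def ddrad_fragments(
    seq: str, site_a: str, site_b: str, min_len: int = 178, max_len: int = 328
) -> List[Tuple[int, int]]:
    """(start, end) of fragments flanked by one site of each enzyme, inside the size window."""
    seq = seq.upper()
    cuts = sorted(
        [(m.start(), "a") for m in re.finditer(f"(?={site_a})", seq)]
        + [(m.start(), "b") for m in re.finditer(f"(?={site_b})", seq)]
    )
    fragments = []
    for (start, enz_start), (end, enz_end) in zip(cuts, cuts[1:]):
        # Adjacent cuts from different enzymes bound an "a-b" fragment.
        if enz_start != enz_end and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments

toy = "GAATTC" + "T" * 200 + "CCGG" + "T" * 50 + "GAATTC"
hits = ddrad_fragments(toy, "GAATTC", "CCGG")  # placeholder sites
```

This simple model does not capture the short fragments and star-activity loci reported above. That is one reason observed loci can exceed in silico predictions.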

14.
A key component of a sound functional genomics infrastructure is the availability of a knockout mutant for every gene in the genome. A fruitful approach to systematically knocking out genes in the plant Arabidopsis thaliana has been the use of transferred DNA (T-DNA) from Agrobacterium tumefaciens as an insertional mutagen. One of the assumptions underlying the use of T-DNA as a mutagen is that these DNA elements insert into the Arabidopsis genome at randomly selected locations. We directly investigated the distribution of T-DNA insertion sites in populations of transformed Arabidopsis using two different approaches. First, we used a polymerase chain reaction (PCR) procedure to systematically catalog the precise locations of all T-DNA elements inserted within a 65 kb segment of chromosome IV. Of the 47 T-DNA insertions identified, 30% were within the coding regions of genes. We also documented the insertion of T-DNA elements within the centromeric region of chromosome IV. In addition to these targeted screens, we mapped the genomic locations of 583 randomly chosen T-DNA elements by sequencing the genomic DNA flanking the insertion sites in individual T-DNA-transformed lines. Of these randomly chosen insertions, 35% were located within the coding regions of genes; for comparison, coding sequences account for 44% of the Arabidopsis genome. Our results demonstrate a small bias towards recovering T-DNA insertions within intergenic regions. However, this bias does not limit the utility of T-DNA as an effective insertional mutagen for reverse-genetic strategies.
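An exact one-sided binomial test can check whether 35% of 583 insertions is significantly below the 44% coding share of the genome. The count of 204 coding insertions is reconstructed from the rounded percentage. This is therefore a sketch of the reasoning rather than a re-analysis of the authors' data, and the function name is illustrative.

```python
from math import comb

def binom_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_insertions = 583
observed_in_coding = round(0.35 * n_insertions)  # ~204, reconstructed from "35%"
expected_fraction = 0.44                          # coding share of the genome
p_value = binom_lower_tail(observed_in_coding, n_insertions, expected_fraction)
print(f"{observed_in_coding}/{n_insertions} in coding; one-sided p = {p_value:.2e}")
```

A shortfall that is statistically clear can still be small in absolute terms, which matches the abstract's reading that the bias is small and does not undermine T-DNA mutagenesis.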

15.
Cloning using bacterial artificial chromosomes (BACs) can yield high-quality genomic libraries, which are used for physical mapping, identification and isolation of genes, and gene sequencing. A BAC genomic library was constructed from high molecular weight DNA (HMW DNA) obtained from nuclei of cucumber (Cucumis sativus L. cv. Borszczagowski; B10 line). The DNA was digested with the HindIII restriction enzyme and ligated into the pCC1BAC vector. The library consists of 34,560 BAC clones with an average insert size of 135 kb and 12.7× genome coverage. Screening the library for chloroplast and mitochondrial DNA content indicated exceptionally low contamination: 0.26% with chloroplast DNA and 0.3% with mitochondrial DNA.
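Genome coverage of a clone library equals the number of clones times the mean insert size, divided by the haploid genome size. The abstract does not state the cucumber genome size it assumed. The sketch below therefore inverts the formula to show the size implied by its figures (roughly 367 Mb). The helper names are hypothetical.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Genome equivalents = clones x mean insert size / haploid genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000)

def implied_genome_mb(n_clones: int, mean_insert_kb: float, coverage: float) -> float:
    """Invert the same formula to recover the genome size a coverage figure assumes."""
    return n_clones * mean_insert_kb / coverage / 1000

cucumber_mb = implied_genome_mb(34_560, 135, 12.7)  # about 367 Mb
check = library_coverage(34_560, 135, cucumber_mb)  # round-trips to 12.7
cotton = library_coverage(97_825, 130, 2_250)       # cotton library in entry 19: ~5.65
```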

16.
REMI-RFLP Mapping in the Dictyostelium Genome
A. Kuspa, W. F. Loomis. Genetics, 1994, 138(3): 665–674
A set of 147 Dictyostelium discoideum strains was constructed by random integration of a vector containing rare restriction sites. The strains were generated by transformation using restriction enzyme-mediated integration (REMI), which results in the integration of linear DNA fragments into randomly distributed genomic restriction sites. A restriction fragment length polymorphism (RFLP) was generated at a single genomic site in each strain. These REMI-RFLP strains were used to confirm gene linkages previously supported by two other physical mapping techniques: yeast artificial chromosome (YAC) contig construction and megabase-scale restriction mapping. New linkages were uncovered when two or more hybridization probes identified the same RFLP fragments. Probes for 100 genes have marked 53% of the RFLPs, representing more than 22 Mb of the 40 Mb Dictyostelium genome. Alignment of these and other large fragments along each chromosome should lead to a complete physical map of the Dictyostelium genome.

17.
The Cre/loxP system is a powerful tool that has allowed the study of the effects of specific genes of interest in various biological settings. The Tyr::CreERT2 system allows targeted expression and activity of the Cre enzyme in the melanocyte lineage following treatment with tamoxifen, thus providing spatial and temporal control of the expression of specific target genes. Two independent transgenic mouse models, each containing a Tyr::CreERT2 transgene, have been generated and are widely used to study melanocyte transformation. In this study, we performed whole genome sequencing (WGS) on genomic DNA from the two Tyr::CreERT2 mouse models and identified their sites of integration in the C57BL/6 genome. Based on these results, we designed PCR primers to genotype transgenic mice accurately and efficiently. Finally, we discuss some of the advantages of each transgenic mouse model.

18.
Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluated the performance of genotyping-by-sequencing (GBS), one of the emerging NGS-based platforms, for genotyping two economically important conifer species, lodgepole pine (Pinus contorta) and white spruce (Picea glauca). Both species have very large genomes (>20,000 Mbp), are highly heterozygous, and lack reference sequences. From a small set (six accessions each) of independent replicated DNA samples and a 48-plex read depth, we obtained ~60,000 SNPs per species. After stringent filtering, we obtained 17,765 and 17,845 high-coverage SNPs without missing data for lodgepole pine and white spruce, respectively. Our results demonstrated that GBS is a robust and suitable method for genotyping conifers. The application of GBS to forest tree breeding and genomic selection is discussed.
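A "no missing data" filter amounts to dropping any SNP that lacks a call in some sample, optionally combined with a per-sample depth threshold. Below is a minimal sketch with made-up genotypes. The data layout and the `min_depth` value are hypothetical, since the abstract does not state the filtering thresholds.

```python
from typing import Dict, List, Optional

def filter_snps(
    genotypes: Dict[str, List[Optional[str]]],
    depths: Dict[str, List[int]],
    min_depth: int,
) -> List[str]:
    """Keep SNP ids with a call in every sample and at least min_depth reads per sample.

    None marks a missing genotype call.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        if any(call is None for call in calls):
            continue  # missing data in at least one sample
        if min(depths[snp_id]) < min_depth:
            continue  # at least one sample is under-covered
        kept.append(snp_id)
    return kept

genotypes = {
    "snp1": ["A/A", "A/G", "G/G"],
    "snp2": ["A/A", None, "G/G"],
    "snp3": ["C/C", "C/T", "T/T"],
}
depths = {"snp1": [12, 9, 15], "snp2": [20, 0, 18], "snp3": [3, 25, 30]}
kept = filter_snps(genotypes, depths, min_depth=5)  # min_depth is a hypothetical threshold
```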

19.
A bacterial artificial chromosome (BAC) library containing large genomic DNA inserts is an important tool for genome physical mapping, map-based cloning, and genome sequencing. To isolate genes via a map-based cloning strategy and to perform physical mapping of the cotton genome, a high-quality BAC library containing large cotton DNA inserts is needed. We have developed a BAC library of the restorer line 0-613-2R for isolating the fertility restorer (Rf1) gene and for genomic research in cotton (Gossypium hirsutum L.). The library contains 97,825 clones stored in 255 384-well microtiter plates. NotI digestion of randomly sampled BACs indicated that the average insert size is approximately 130 kb, with a range of 80–275 kb, and that 95.7% of the clones have inserts larger than 100 kb. Based on a cotton genome size of 2,250 Mb, library coverage is 5.7 haploid genome equivalents. Four clones were selected at random to test the stability of BAC clones. For each clone, NotI and HindIII fingerprints were identical at generation 0 and after 100 generations, so a single BAC clone remains stable for at least 100 generations. Eight simple sequence repeat (SSR) markers flanking the Rf1 gene were used to screen pooled BAC clones by PCR. This identified 25 positive clones, an average of 3.1 positive clones per SSR marker.
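Beyond genome equivalents, the probability that a given locus is present in the library follows the Clarke-Carbon formula P = 1 - (1 - I/G)^N, where I is the insert size, G the genome size and N the clone number. Applied to the figures above (130 kb inserts, a 2,250 Mb genome, 97,825 clones), it gives a probability of roughly 0.996. The function names are illustrative.

```python
import math

def prob_locus_present(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon: P = 1 - (1 - I/G)^N that a given locus is in at least one clone."""
    return 1 - (1 - insert_kb / genome_kb) ** n_clones

def clones_needed(prob: float, insert_kb: float, genome_kb: float) -> int:
    """Smallest N with P >= prob, from N = ln(1 - P) / ln(1 - I/G)."""
    return math.ceil(math.log(1 - prob) / math.log1p(-insert_kb / genome_kb))

cotton_g_kb = 2_250 * 1000
p = prob_locus_present(97_825, 130, cotton_g_kb)  # roughly 0.996
n99 = clones_needed(0.99, 130, cotton_g_kb)       # clones for a 99% probability
```

The formula assumes clones are randomly distributed across the genome. Cloning bias makes the real probability somewhat lower.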

20.
DNA methylation is an epigenetic mark crucial in the regulation of gene expression. Aberrant DNA methylation causes silencing of tumor suppressor genes and promotes chromosomal instability in human cancers. Most previous studies of DNA methylation have focused on limited genomic regions, such as selected genes or promoter CpG islands (CGIs) containing recognition sites of methylation-sensitive restriction enzymes. Here, we describe a method for high-resolution analysis of DNA methylation using oligonucleotide tiling arrays. The input material is methylated DNA immunoprecipitated with anti-methylcytosine antibodies. We examined the ENCODE regions (∼1% of the human genome) in three human colorectal cancer cell lines and identified over 700 candidate methylated sites (CMS); 24 of 25 randomly selected CMS were subsequently verified by bisulfite sequencing. CMS were enriched in the 5′ regulatory regions and the 3′ regions of genes. We also compared DNA methylation patterns with histone H3 and H4 acetylation patterns in the HOXA cluster region. Our analysis revealed no acetylated histones in the hypermethylated region, demonstrating a reciprocal relationship between DNA methylation and histone H3 and H4 acetylation. Our method detects DNA methylation with little bias with respect to genomic location and is therefore useful for comprehensive, high-resolution analysis of DNA methylation, providing new findings in epigenomics. Electronic supplementary material: Supplementary material is available in the online version of this article at and is accessible for authorized users.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417