Similar literature
1.
Despite recent advances in high-throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club-legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired-end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild-caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to successfully amplify. It also reveals how advances in high-throughput sequencing and graph-based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.

2.
De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168 and is used in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer and obtained a complete genome sequence as a single scaffold without any gaps; we also applied Illumina MiSeq short reads to enhance quality. Comparing the result with the previous BEST195 draft genome and the Marburg 168 genome, we found that the incomplete regions in the previous genome sequence were attributable to GC bias and repetitive sequences, and we identified some novel genes that are found only in the new genome.

3.
Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to “phase 3 finished” status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides “lift-over” co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long reads, we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long reads, we addressed 97% of gaps, closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.

4.
We developed 13 polymorphic microsatellite loci of the Japanese land leech (Haemadipsa japonica; Haemadipsidae) using an Illumina MiSeq sequencing approach. A total of 42,064 nuclear DNA contigs were filtered for microsatellite motifs, among which 30,873 simple sequence repeat loci were identified. From these sequences, we selected 30 primer sets, and 13 of these loci were successfully amplified. Polymorphism of the 13 loci was tested using 16 individuals sampled from sixteen populations across Japan. The number of alleles and polymorphism information content varied from 5 to 17 and 0.335 to 0.883, respectively, and observed and expected heterozygosity values ranged from 0.143 to 0.875 and 0.349 to 0.893, respectively, indicating that these loci are polymorphic. Furthermore, we established a useful multiplex PCR using these loci. The 13 microsatellite loci described in this paper are the first nuclear microsatellite markers for a land leech species.
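
The allele-based statistics reported in the abstract above can be illustrated with a short Python sketch. It assumes the commonly used definitions, expected heterozygosity He = 1 - sum(p_i^2) and the Botstein-style PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not say which estimators the authors used, and the function names and allele frequencies below are hypothetical.

```python
import math


def _check_frequencies(freqs):
    """Raise ValueError unless freqs are non-negative and sum to 1."""
    if not freqs or any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative and non-empty")
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), the uncorrected (biased) estimator (an assumption)."""
    _check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    _check_frequencies(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs) - 1):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical allele frequencies at one locus, for illustration only.
    example = [0.5, 0.3, 0.2]
    print(expected_heterozygosity(example))
    print(polymorphism_information_content(example))
```

PIC is never larger than He for the same frequencies, which is consistent with the PIC range in the abstract sitting slightly below the expected-heterozygosity range.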

5.
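This study used restriction-site associated DNA sequencing (RAD-seq), a reduced-representation sequencing technique based on genome-wide restriction sites, to develop genome-wide SSR markers for the endangered plant Rhododendron molle G. Don, and validated them on 63 accessions from three populations, providing technical support for further research on the genetic diversity and population genetic structure of R. molle and for its conservation and use. The results showed that: (1) sequencing of the R. molle genome yielded 7.653 Gbp of raw data and 7.513 Gbp after filtering; after assembly, 171.534 Mbp of R. molle genome sequence was distributed across 498,252 contigs. (2) SSR detection yielded 11,687 SSR marker primer pairs from 11,961 SSR loci, and dinucleotide motifs were the most abundant repeat type, accounting for 51.76%. (3) A total of 128 randomly selected SSR markers were PCR-amplified in six R. molle lines, yielding 20 highly polymorphic SSR markers. (4) Validation of the 20 selected polymorphic SSR markers on the 63 accessions from three populations showed 4 to 16 alleles per locus and an expected heterozygosity (He) of 0.489 to 0.908. The study indicates that SSR abundance in R. molle is moderate and that dinucleotide repeats are its most abundant repeat class. The experiment further supports RAD-seq as an economical and effective sequencing method, and the SSR primers developed here will help further research on the population structure and diversity of R. molle and closely related species.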

6.
7.
Oil camellia trees are important woody plants for the production of high-quality cooking oil. In contrast to their economic importance, their genetic and genomic resources are very limited, which greatly hampers genetic studies of oil camellia trees. Microsatellites or simple sequence repeats (SSRs) have great value in many aspects of genetic analysis due to their high polymorphism and codominant inheritance. In this study, we report the large-scale development and characterization of SSR markers derived from genomic sequences of Camellia chekiangoleosa by high-throughput pyrosequencing technology. A total of 1,091,393 genomic shotgun reads were generated using a Roche 454 FLX sequencer; the average read length was 319 bp, and the total sequence throughput was 347.9 Mb. These sequences were assembled into 35,315 contigs with a total length of 14.8 Mb and an N50 contig size of 770 bp. Using the microsatellite identification tool MISA, a total of 5,844 perfect microsatellites were detected in the assembled sequences. Among them, tetranucleotide repeats were the most frequent microsatellites in the genome of C. chekiangoleosa, and all the dominant repeat motifs for the different types of SSRs were rich in A/T. Experimental analysis with 900 SSR primer pairs revealed that 66% of them succeeded in PCR amplification. Further investigation with 345 SSR primer pairs showed that a relatively high percentage of primers amplified polymorphic loci (31.9%). Experimental data also revealed that, overall, long microsatellite repeats (>20 bp) were more variable than short ones (<20 bp) in the genome of the oil camellia tree.
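
The N50 contig size quoted in the abstract above is a standard assembly summary: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly. The following is a minimal Python sketch of that definition, not the authors' pipeline; the function name and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    print(n50([200, 300, 400, 500, 600]))
```

An N50 of 770 bp with 35,315 contigs, as reported above, indicates a highly fragmented assembly, which is still usable for SSR mining as long as each contig retains enough flanking sequence for primer design.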

8.
9.
We present the development of a genomic library using the RADseq (restriction site associated DNA sequencing) protocol for marker discovery that can be applied to evolutionary studies of the sugarcane borer Diatraea saccharalis, an important South American insect pest. A RADtag protocol combined with Illumina paired-end sequencing allowed de novo discovery of 12 811 SNPs and a high-quality assembly of 122.8M paired-end reads from six individuals, representing 40 Gb of sequencing data. Approximately 1.7 Mb of the sugarcane borer genome, distributed over 5289 minicontigs, was obtained upon assembly of second reads from first-read RADtag loci where at least one SNP was discovered and genotyped. Minicontig lengths ranged from 200 to 611 bp, and the minicontigs were used for functional annotation and microsatellite discovery. These markers will be used in future studies to understand gene flow and adaptation to host plants and control tactics.

10.
Ligumia nasuta (Say, 1817; Eastern Pondmussel) is an imperiled freshwater mussel (Unionidae) in eastern North America. Population declines in the Laurentian Great Lakes resulting from the introduction of dreissenid mussels and habitat destruction in the 20th Century have greatly reduced and limited its distribution. To properly inform restoration and management efforts for L. nasuta, fine-scale genetic analyses must be performed on the remnant populations. This study used Illumina paired-end shotgun sequencing to identify potential microsatellite loci for L. nasuta, utilizing two samples to develop the Illumina paired-end shotgun library. Forty-eight primer pairs were tested on the remaining 24 samples. Twenty-nine of the 48 microsatellite primer sets screened were successfully amplified using 24 L. nasuta samples collected from the Great Lakes watershed. The estimated fragment size ranged from 167 to 445 base-pairs (bp) and the number of alleles per locus varied between 5 and 16 (mean = 9.7). Only five of the loci deviated significantly from Hardy–Weinberg expectations after Bonferroni corrections. The development of these new microsatellite loci will greatly facilitate future genetic studies on L. nasuta.

11.
Rice bean (Vigna umbellata (Thunb.) Ohwi & Ohashi) is a warm season annual legume mainly grown in East Asia. Only scarce genomic resources are currently available for this legume crop species and no simple sequence repeat (SSR) markers have been specifically developed for rice bean yet. In this study, approximately 26 million high quality cDNA sequence reads were obtained from rice bean using Illumina paired-end sequencing technology and assembled into 71,929 unigenes with an average length of 986 bp. Of these unigenes, 38,840 (33.2%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases. Furthermore, 30,170 (76.3%) could be classified into gene ontology categories, 25,451 (64.4%) into Swiss-Prot categories and 21,982 (55.6%) into KOG database categories (E-value < 1.0E-5). A total of 9,301 (23.5%) were mapped onto 118 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. A total of 3,011 genic SSRs were identified as potential molecular markers. AG/CT (30.3%), AAG/CTT (8.1%) and AGAA/TTCT (20.0%) are the three main repeat motifs. A total of 300 SSR loci were randomly selected for validation by PCR amplification. Of these loci, 23 primer pairs were polymorphic among 32 rice bean accessions. A UPGMA dendrogram revealed three major clusters among the 32 rice bean accessions. The large number of SSR-containing sequences and genic SSRs in this study will be valuable for the construction of high-resolution genetic linkage maps, association or comparative mapping and genetic analyses of various Vigna species.

12.
Siberian stone pine, Pinus sibirica Du Tour, is one of the most economically and environmentally important forest-forming conifer species in Russia. To study these forests, a large number of highly polymorphic molecular genetic markers, such as microsatellite loci, are required. Before the advent of high-throughput next generation sequencing (NGS) methods, discovery of microsatellite loci and development of microsatellite markers were very time consuming and laborious. The recently developed draft assembly of the Siberian stone pine genome, sequenced using NGS methods, allowed us to identify a large number of microsatellite loci in the Siberian stone pine genome and to develop species-specific PCR primers for amplification and genotyping of 70 microsatellite loci. The primers were designed using contigs containing short simple sequence tandem repeats from the Siberian stone pine whole genome draft assembly. Based on the testing of primers for 70 microsatellite loci with tri-, tetra- or pentanucleotide repeats, the 18 most promising, reliable and polymorphic loci were selected; these can be used as molecular genetic markers in population genetic studies of Siberian stone pine.

13.
Molecular stock improvement techniques such as marker assisted selection have great potential in accelerating selective breeding programmes for animal production industries. However, the discovery and application of trait/marker associations usually requires a large number of genome-wide polymorphic loci. Here, we present 2322 unique microsatellites for the silver-lipped pearl oyster, Pinctada maxima, a species of aquaculture importance throughout the Indo-Australian Archipelago for production of the highly valued South Sea pearl. More than 1.2 million Roche 454 expressed sequence tag (EST) reads were screened for microsatellite repeat motifs. A total of 12,604 sequences contained a di-, tri-, tetra-, penta- or hexanucleotide microsatellite repeat motif (n ≥ 6), with 6435 of these sequences having sufficient flanking regions for primer development. All identified microsatellites with designed primers were condensed into 2322 unique clusters (i.e., unique loci), of which 360 were shown to be polymorphic based on multiple sequence reads with different repeat motifs. Genotyping of five microsatellite loci demonstrated that in silico evaluation of polymorphism levels was a very useful method for identification of polymorphic loci, with the variation uncovered being a lower bound. Gene Ontology annotations of sequences containing microsatellites suggest that most are derived from a diverse array of unique genes. This EST-derived microsatellite database will be a valuable resource for future studies in genetic map construction, diversity analysis, quantitative trait loci analysis, association mapping and marker assisted selection, not only for P. maxima, but also for closely related species within the genus Pinctada.
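
The screening criterion in the abstract above (a di- to hexanucleotide motif repeated at least six times) can be sketched with a regular expression. This is only an illustration of the criterion, not the tool the authors used, which the abstract does not name; `SSR_PATTERN`, `find_ssrs`, and the test sequence are hypothetical. Homopolymer runs, which the pattern would otherwise report as a two-base motif, are filtered out.

```python
import re

# A motif of 2-6 bases followed by at least five further copies of itself,
# i.e. six or more tandem copies in total. The lazy quantifier makes the
# shortest motif win, so ACAC repeats are reported with motif AC.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{5,}")


def find_ssrs(sequence):
    """Return (start, motif, copies) for each perfect SSR in the sequence."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAAAAAA.
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical read containing one (AC)6 repeat, for illustration only.
    print(find_ssrs("GGG" + "AC" * 6 + "TTT"))
```

A real screen would also check that enough flanking sequence remains on both sides of each hit for primer design, which is the step that reduced 12,604 sequences to 6435 in the abstract.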

14.

Background

Problems associated with using draft genome assemblies are well documented and have become more pronounced with the use of short read data for de novo genome assembly. We set out to improve the draft genome assembly of the African cichlid fish, Metriaclima zebra, using a set of Pacific Biosciences SMRT sequencing reads corresponding to 16.5× coverage of the genome. Here we characterize the improvements that these long reads allowed us to make to the state-of-the-art draft genome previously assembled from short read data.

Results

Our new assembly closed 68% of the existing gaps and added 90.6 Mbp of new non-gap sequence to the existing draft assembly of M. zebra. Comparison of the new assembly to the sequence of several bacterial artificial chromosome clones confirmed the accuracy of the new assembly. The closure of sequence gaps revealed thousands of new exons, allowing significant improvement in gene models. We corrected one known misassembly, and identified and fixed other likely misassemblies. 63.5 Mbp (70%) of the new sequence was classified as repetitive and the new sequence allowed for the assembly of many more transposable elements.

Conclusions

Our improvements to the M. zebra draft genome suggest that a reasonable investment in long reads could greatly improve many comparable vertebrate draft genome assemblies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1930-5) contains supplementary material, which is available to authorized users.

15.

Background

The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, and usually causes considerable loss in the quantity and quality of seeds during transportation and storage. However, because genetic information on this pest is lacking, many genetic questions remain largely unanswered, including population genetic structure, kinship, and biotype abundance. Co-dominant microsatellite markers offer great resolving power for addressing these questions. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing.

Principal Findings

In this study, 94,560,852 quality-filtered and trimmed reads were obtained for genome assembly using Illumina paired-end sequencing technology. De novo assembly generated a genome with a total length of 497,124,785 bp, comprising 403,113 high-quality contigs. More than 6800 SSR loci were detected, a suite of 6303 primer pair sequences was designed, and 500 of them were randomly selected for validation. Of these, 196 primer pairs (39.2%) produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty of the 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed that the twenty SSR loci were highly polymorphic among these populations.

Conclusions

This study presents the first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating large-scale sequence information and isolating SSR loci by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future studies of genetic mapping, trait association, genetic structure and kinship in C. chinensis.

16.
High-throughput RNA-Seq affords a cost- and time-effective means of obtaining large numbers of genetic markers for aquatic genomics. Here, we present thousands of novel microsatellite loci developed for the pearl oyster, Pinctada martensii, from an Illumina HiSeq™ 2000 library of the pearl sac. Free, user-friendly bioinformatics tools were employed to screen for microsatellite loci and design appropriate primers in 102,762 unigenes; 7216 microsatellite loci were identified in total, 4862 of which had flanking sequences suitable for polymerase chain reaction primer design. Fifty randomly chosen primer pairs were tested in two populations of pearl oyster, a base population (POP1) and a selected population (POP2), with 30 individuals from each population. All the primer pairs amplified successfully in both populations. All loci were polymorphic in POP1, while 3 loci were monomorphic in POP2. In POP1 and POP2, observed heterozygosity ranged from 0.033 to 1.000 and from 0.000 to 1.000, respectively, and 19 and 16 microsatellite loci, respectively, deviated significantly from Hardy–Weinberg expectations after Bonferroni correction (P < 0.001). Thirteen loci were highly informative (PIC ≥ 0.5) in both populations. These loci will be useful for evolutionary, population genetic and chromosome linkage mapping research on pearl oyster.
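
The Bonferroni-corrected threshold mentioned in the abstract above can be sketched in a few lines of Python. The sketch assumes a family-wise significance level of 0.05, which the abstract does not state; with 50 loci tested, that assumption gives 0.05 / 50 = 0.001, which matches the reported P < 0.001 but is not confirmed by the source. The function names, locus names, and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("at least one test is required")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the names of loci whose p-value is below alpha / number of loci.

    p_values maps a locus name to its Hardy-Weinberg test p-value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [locus for locus, p in p_values.items() if p < threshold]


if __name__ == "__main__":
    # Hypothetical p-values for three loci, for illustration only.
    example = {"locus_a": 0.0004, "locus_b": 0.02, "locus_c": 0.0100}
    print(bonferroni_threshold(0.05, 50))
    print(significant_after_bonferroni(example))
```

The correction is conservative: with 50 loci, a locus is flagged only when its uncorrected p-value is roughly fifty times smaller than the nominal level.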

17.
18.
19.
20.
Proteaceae, a largely southern hemisphere family of 80 genera with its centres of greatest diversity in Australia and southern Africa, also extends well into northern and southern America. Within this family, Grevillea robusta is a fast-growing species that has gained popularity in farm and avenue plantations. Despite its ecological and economic importance, the species has not yet been investigated for genetic improvement or in genome-based studies. Only a few molecular markers are available for the species or its close relatives, which hinders genomic and population genetic studies. Genetic markers have been intensively applied in breeding programs, especially for economically important traits. Hence, it is a high priority to develop genomic database resources and species-specific markers for studying quantitative genetics in G. robusta. The present study therefore aimed to carry out de novo genome sequencing, develop robust microsatellite markers, annotate the sequences and validate the markers in different stands of G. robusta in northern India. Library preparation and sequencing were carried out using Illumina paired-end sequencing technology. Approximately ten gigabases (Gb) of sequence data, comprising 70.87 million raw reads, were generated through a genome skimming approach and assembled into 425,923 contigs (76.48% of reads mapped), representing a genome size of 455 Mb (23× coverage). In total, 9421 simple sequence repeat (SSR) primer pairs were successfully designed from 13,335 microsatellite repeats. A subset of 161 primer pairs was then randomly selected, synthesized and validated. All the tested primers amplified successfully, but only 13 showed polymorphism. The polymorphic SSRs were further used to estimate genetic diversity in 12 genotypes each from the states of Punjab, Haryana, Himachal Pradesh and Uttarakhand. The average number of alleles (Na), observed heterozygosity (Ho), expected heterozygosity (He) and polymorphism information content (PIC) were 2.69, 0.356, 0.557 and 0.388, respectively. The sequence information and newly developed SSR markers could be used in various genetic analyses and in the improvement of G. robusta through molecular breeding strategies.

Supplementary Information

The online version contains supplementary material available at 10.1007/s12298-021-01035-w.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号