Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Asparagus kiusianus is a disease-resistant dioecious plant species and a wild relative of garden asparagus (Asparagus officinalis). To enhance A. kiusianus genomic resources, advance plant science, and facilitate asparagus breeding, we determined the genome sequences of the male and female lines of A. kiusianus. Genome sequence reads obtained with a linked-read technology were assembled into four haplotype-phased contig sequences (∼1.6 Gb each) for the male and female lines. The contig sequences were aligned onto the chromosome sequences of garden asparagus to construct pseudomolecule sequences. Approximately 55,000 potential protein-encoding genes were predicted in each genome assembly, and ∼70% of the genome sequence was annotated as repetitive. Comparative analysis of the genomes of the two species revealed structural and sequence variants between the two species as well as between the male and female lines of each species. Genes with high sequence similarity with the male-specific sex determinant gene in A. officinalis, MSE1/AoMYB35/AspTDF1, were present in the genomes of the male line but absent from the female genome assemblies. Overall, the genome sequence assemblies, gene sequences, and structural and sequence variants determined in this study will reveal the genetic mechanisms underlying sexual differentiation in plants, and will accelerate disease-resistance breeding in garden asparagus.

2.
A new approach to genome mapping and sequencing: slalom libraries   Cited by: 2 (self-citations: 2; other citations: 0)
We describe here an efficient strategy for simultaneous genome mapping and sequencing. The approach is based on physically oriented, overlapping restriction fragment libraries called slalom libraries. Slalom libraries combine features of general genomic, jumping and linking libraries. Slalom libraries can be adapted to different applications and two main types of slalom libraries are described in detail. This approach was used to map and sequence (with ~46% coverage) two human P1-derived artificial chromosome (PAC) clones, each of ~100 kb. This model experiment demonstrates the feasibility of the approach and shows that the efficiency (cost-effectiveness and speed) of existing mapping/sequencing methods could be improved at least 5–10-fold. Furthermore, since the efficiency of contig assembly in the slalom approach is virtually independent of length of sequence reads, even short sequences produced by rapid, high throughput sequencing techniques would suffice to complete a physical map and a sequence scan of a small genome.

3.
4.
5.
Japanese chestnut (Castanea crenata Sieb. et Zucc.), unlike other Castanea species, is resistant to most diseases and wasps. However, genomic data of Japanese chestnut that could be used to determine its biotic stress resistance mechanisms have not been reported to date. In this study, we employed long-read sequencing and genetic mapping to generate genome sequences of Japanese chestnut at the chromosome level. Long reads (47.7 Gb; 71.6× genome coverage) were assembled into 781 contigs, with a total length of 721.2 Mb and a contig N50 length of 1.6 Mb. Genome sequences were anchored to the chestnut genetic map, comprising 14,973 single nucleotide polymorphisms (SNPs) and covering 1,807.8 cM map distance, to establish a chromosome-level genome assembly (683.8 Mb), with 69,980 potential protein-encoding genes and 425.5 Mb repetitive sequences. Furthermore, comparative genome structure analysis revealed that Japanese chestnut shares conserved chromosomal segments with woody plants, but not with herbaceous plants, of rosids. Overall, the genome sequence data of Japanese chestnut generated in this study is expected to enhance not only its genetics and genomics but also the evolutionary genomics of woody rosids.
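The contig N50 quoted in the abstract above is a standard assembly summary statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not taken from the chestnut assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Applied to the real assembly, the input would be the 781 contig lengths, which the abstract does not list.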

6.
7.

Background

The presence of closely related genomes in polyploid species makes the assembly of total genomic sequence from shotgun sequence reads produced by the current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could be sequenced following the ordered-clone sequencing approach employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is currently unknown if this is also true for polyploid species. It is possible that BAC clones from orthologous regions of homoeologous chromosomes would share numerous restriction fragments and be therefore included into common contigs. Because of this and other concerns, physical mapping utilizing the SNaPshot HICF of BAC libraries of polyploid species has not been pursued and the possibility of doing so has not been assessed. The sole exception has been in common wheat, an allohexaploid in which it is possible to construct single-chromosome or single-chromosome-arm BAC libraries from DNA of flow-sorted chromosomes and bypass the obstacles created by polyploidy.

Results

The potential of the SNaPshot HICF technology for physical mapping of polyploid plants utilizing global BAC libraries was evaluated by assembling contigs of fingerprinted clones in an in silico merged BAC library composed of single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and complete chromosome 3B. Because the chromosome arm origin of each clone was known, it was possible to estimate the fidelity of contig assembly. On average 97.78% or more clones, depending on the library, were from a single chromosome arm. A large portion of the remaining clones was shown to be library contamination from other chromosomes, a feature that is unavoidable during the construction of single-chromosome BAC libraries.

Conclusions

The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can be also constructed for polyploid species containing smaller, more gene-rich genomes.

8.
Liriodendron tulipifera L., a member of Magnoliaceae in the order Magnoliales, has been used extensively as a reference species in studies on plant evolution. However, genomic resources for this tree species are limited. We constructed cDNA libraries from ten different types of tissues: premeiotic flower buds, postmeiotic flower buds, open flowers, developing fruit, terminal buds, leaves, cambium, xylem, roots, and seedlings. EST sequences were generated either by 454 GS FLX or Sanger methods. Assembly of almost 2.4 million sequencing reads from all libraries resulted in 137,923 unigenes (132,905 contigs and 4,599 singletons). About 50% of the unigenes had significant matches to publicly available plant protein sequences, representing a wide variety of putative functions. Approximately 30,000 simple sequence repeats were identified. More than 97% of the cell wall formation genes in the Cell Wall Navigator and the MAIZEWALL databases are represented. The cinnamyl alcohol dehydrogenase (CAD) homologs identified in the L. tulipifera EST dataset showed different expression levels in the ten tissue types included in this study. In particular, LtuCAD1 was found to partially recover the stiffness of the floral stems in the Arabidopsis thaliana CAD4 and CAD5 double mutant plants, consistent with a role of LtuCAD1 in lignin biosynthesis. L. tulipifera genes have greater sequence similarity to homologs from other woody angiosperm species than to non-woody model plants. This large-scale genomic resource will be instrumental for gene discovery, cDNA microarray production, and marker-assisted breeding in L. tulipifera, and strengthen this species' role in comparative studies.

9.
《Journal of Asia》2020,23(3):816-824
Leptotrombidium pallidum is the major vector mite for Orientia tsutsugamushi, the causative agent of scrub typhus, in Asian countries, including Korea. Despite its medical importance, L. pallidum has little genetic information available to date. To analyze the L. pallidum genome, we extracted genomic DNA (gDNA) from a single female of a 7-generation inbred L. pallidum colony and amplified the gDNA by whole genome amplification (WGA). The resulting amplified gDNA was used to construct paired-end and mate-pair libraries that were sequenced using Illumina platforms (HiSeq2000 and MiSeq). More than 45 Gb of sequence reads from both paired-end and mate-pair libraries of the WGA gDNA were trimmed and then de novo assembled using CLC Assembly Cell v.4.0 for contig assembly and SSPACE for scaffolding. The assembly generated approximately 6,545 scaffolds with an N50 value of 92,945 and total size of ~ 193 Mb. For gene predictions, the PASA and GeneWise models were used, and ab initio gene predictions were performed independently, resulting in the prediction of 15,842 genes. RNA-Seq expression profiles revealed constitutive expression of 11,572 unique protein-coding genes in larva, 12,364 in protonymph, 12,872 in male adult, and 12,617 in female adult stages. Of the 15,842 predicted genes, 10,885 were commonly expressed through all L. pallidum stages. Genes selectively over-transcribed in the larval stage, which is when host parasitization and disease transmission occur, were further annotated, and their putative roles were discussed.

10.
Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified.

11.
Oil camellia trees are important woody plants for the production of high-quality cooking oil. In contrast to their economic importance, their genetic and genomic resources are very limited, which greatly hampers genetic studies on oil camellia trees. Microsatellites, or simple sequence repeats (SSRs), have great value in many aspects of genetic analysis due to their high polymorphism and codominant inheritance. In this study, we report the large-scale development and characterization of SSR markers derived from genomic sequences of Camellia chekiangoleosa by high-throughput pyrosequencing technology. A total of 1,091,393 genomic shotgun reads were generated using a Roche 454 FLX sequencer; the average read length was 319 bp, and the total sequence throughput was 347.9 Mb. These sequences were assembled into 35,315 contigs with a total length of 14.8 Mb and an N50 contig size of 770 bp. Analysis with the MIcroSAtellite identification tool (MISA) detected a total of 5,844 perfect microsatellites in the assembled sequences. Among them, tetranucleotide repeats were found to be the most frequent microsatellites in the genome of C. chekiangoleosa, and all the dominant repeat motifs for different types of SSRs were found to be rich in A/T. Experimental analysis with 900 SSR primer pairs revealed that 66% of them succeeded in PCR amplification. Further investigation with 345 SSR primer pairs showed that a relatively high percentage of primers amplified polymorphic loci (31.9%). Experimental data also revealed that, overall, long microsatellite repeats (>20 bp) were more variable than short ones (<20 bp) in the genome of the oil camellia tree.
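The perfect-microsatellite search described in the abstract above can be illustrated with a short regular-expression scan. This is a sketch of the general idea only and does not reproduce MISA itself; the function names and the repeat thresholds (at least 6 dinucleotide, 5 trinucleotide and 4 tetranucleotide repeats) are assumptions chosen for illustration, since the abstract does not state the settings used.

```python
import re


def is_repeat_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_perfect_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    min_repeats maps unit length to the minimum repeat count; the default
    values are hypothetical thresholds, not settings taken from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # belongs to a shorter repeat class
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy sequence with one (AT)6 and one (AAG)5 repeat.
toy = "GGC" + "AT" * 6 + "CCT" + "AAG" * 5 + "T"
print(find_perfect_ssrs(toy))
```

Positions are 0-based. A production tool would also handle compound and interrupted repeats, which this sketch ignores.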

12.
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).

13.
14.
Chokecherry (Prunus virginiana L.) (2n = 4x = 32) is a unique Prunus species for both genetics and disease-resistance research due to its tetraploid nature and X-disease resistance. However, no genetic and genomic information on chokecherry is available. A partial chokecherry genome was sequenced using Roche 454 sequencing technology. A total of 145,094 reads covering 4.8 Mbp of the chokecherry genome were generated and 15,113 contigs were assembled, of which 11,675 contigs were larger than 100 bp in size. A total of 481 SSR loci were identified from 234 (out of 11,675) contigs and 246 polymerase chain reaction (PCR) primer pairs were designed. Of 246 primers, 212 (86.2%) effectively produced amplification from the genomic DNA of chokecherry. All 212 amplifiable chokecherry primers were used to amplify genomic DNA from 11 other rosaceous species (sour cherry, sweet cherry, black cherry, peach, apricot, plum, apple, crabapple, pear, juneberry, and raspberry). Thus, chokecherry SSR primers can be transferable across Prunus species and other rosaceous species. An average of 63.2 and 58.7% of amplifiable chokecherry primers amplified DNA from cherry and other Prunus species, respectively, while 47.2% of amplifiable chokecherry primers amplified DNA from other rosaceous species. Using random genome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no sequence information available. Sequence information and confirmed transferability of the identified chokecherry SSRs among species will be valuable for genetic research in Prunus and other rosaceous species. Key message: A total of 246 SSR primers were identified from chokecherry genome sequences. Of these, 212 were confirmed amplifiable in both chokecherry and 11 other rosaceous species.

15.
Characterization of the segmental duplication LCR7-20 in the human genome   Cited by: 1 (self-citations: 0; other citations: 1)
Liu X  Li X  Li M  Acimovic YJ  Li Z  Scherer SW  Estivill X  Tsui LC 《Genomics》2004,83(2):262-269
Our previous study described the amplification of a genomic sequence containing exon 9 of CFTR in the human genome. Here we report that this CFTR sequence is part of a large duplicated sequence unit, provisionally named LCR7-20. Through successive screening of two human chromosome 7-specific cosmid libraries to construct a cosmid contig, we assembled two sequenced BAC clones into a single contig containing a prototypic LCR7-20 unit. Subsequent searches of existing human genome sequences identified six additional copies of LCR7-20-like sequences with more than 90% sequence homology. Additional genomic clones containing LCR7-20-like sequences were then isolated from total genomic BAC and PAC libraries. Restriction fragment analysis and limited sequencing data indicated that there could be around 30 copies of LCR7-20-like sequences in the human genome and that the average region of homology could extend over 120 kb. As indicated by fluorescence in situ hybridization analysis, LCR7-20-like sequences are dispersed on different chromosomes, mainly in the centromeric and pericentromeric regions, and some may exist in tandem copies. Our study also indicates that many genomic regions containing LCR7-20's either have been misassembled or are missing in current versions of the human genome sequence.

16.
Next‐generation sequencing continues to revolutionize biodiversity studies by generating unprecedented amounts of DNA sequence data for comparative genomic analysis. However, these data are produced as millions or billions of short reads of variable quality that cannot be directly applied in comparative analyses, creating a demand for methods to facilitate assembly. We optimized an in silico strategy to efficiently reconstruct high‐quality mitochondrial genomes directly from genomic reads. We tested this strategy using sequences from five species of frogs: Hylodes meridionalis (Hylodidae), Hyloxalus yasuni (Dendrobatidae), Pristimantis fenestratus (Craugastoridae), and Melanophryniscus simplex and Rhinella sp. (Bufonidae). These are the first mitogenomes published for these species, the genera Hylodes, Hyloxalus, Pristimantis, Melanophryniscus and Rhinella, and the families Craugastoridae and Hylodidae. Sequences were generated using only half of one lane of a standard Illumina HiSeq 2000 flow cell, resulting in fewer than eight million reads. We analysed the reads of Hylodes meridionalis using three different assembly strategies: (1) reference‐based (using bowtie2); (2) de novo (using abyss, soapdenovo2 and velvet); and (3) baiting and iterative mapping (using mira and mitobim). Mitogenomes were assembled exclusively with strategy 3, which we employed to assemble the remaining mitogenomes. Annotations were performed with mitos and confirmed by comparison with published amphibian mitochondria. In most cases, we recovered all 13 coding genes, 22 tRNAs, and two ribosomal subunit genes, with minor gene rearrangements. Our results show that few raw reads can be sufficient to generate high‐quality scaffolds, making any Illumina machine run using genomic multiplex libraries a potential source of data for organelle assemblies as by‐catch.

17.
18.

Background

Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood.

Results

We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results.

Conclusions

In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users.
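The "overlapping paired-end reads" step evaluated in the entry above can be sketched as follows: the reverse read is reverse-complemented, and the longest suffix of the forward read that matches a prefix of it within a mismatch tolerance is used to join the two reads. This is an illustrative sketch only, not the merging tool used in the study; the function names, the minimum overlap of 10 bases and the 10% mismatch tolerance are assumptions. It ignores base qualities and indels, which is one reason real mergers are more involved.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a forward and a reverse read if their ends overlap, else None.

    Overlaps are tried from longest to shortest; the first one whose
    mismatch count is within the tolerance is accepted. Thresholds are
    hypothetical defaults chosen for illustration.
    """
    r2 = reverse_complement(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olap:], r2[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olap:
            # Keep read1's bases in the overlap, then append the rest of r2.
            return read1 + r2[olap:]
    return None


# Toy 40-base fragment sequenced as two 25-base reads overlapping by 10.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTCAATCGGATTCACGA"
forward = fragment[:25]
reverse = reverse_complement(fragment[15:])
print(merge_pair(forward, reverse) == fragment)
```

Because any sufficiently similar suffix and prefix is accepted, a repeat near the read ends can produce a wrong merge, which fits the caution the abstract raises for repeat-rich genomes.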

19.
20.