Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Chinese liquorice/licorice (Glycyrrhiza uralensis) is a leguminous plant species whose roots and rhizomes have been widely used as a herbal medicine and natural sweetener. Whole‐genome sequencing is essential for gene discovery studies and molecular breeding in liquorice. Here, we report a draft assembly of the approximately 379‐Mb whole‐genome sequence of strain 308‐19 of G. uralensis; this assembly contains 34 445 predicted protein‐coding genes. Comparative analyses suggested well‐conserved genomic components and collinearity of gene loci (synteny) between the genome of liquorice and those of other legumes such as Medicago and chickpea. We observed that three genes involved in isoflavonoid biosynthesis, namely, 2‐hydroxyisoflavanone synthase (CYP93C), 2,7,4′‐trihydroxyisoflavanone 4′‐O‐methyltransferase/isoflavone 4′‐O‐methyltransferase (HI4OMT) and isoflavone‐7‐O‐methyltransferase (7‐IOMT) formed a cluster on the scaffold of the liquorice genome and showed conserved microsynteny with Medicago and chickpea. Based on the liquorice genome annotation, we predicted genes in the P450 and UDP‐dependent glycosyltransferase (UGT) superfamilies, some of which are involved in triterpenoid saponin biosynthesis, and characterised their gene expression with the reference genome sequence. The genome sequencing and its annotations provide an essential resource for liquorice improvement through molecular breeding and the discovery of useful genes for engineering bioactive components through synthetic biology approaches.

3.
Inserts of DNA from extranuclear sources, such as organelles and microbes, are common in eukaryote nuclear genomes. However, sequence similarity between the nuclear and extranuclear DNA, and a history of multiple insertions, make the assembly of these regions challenging. Consequently, the number, sequence and location of these vagrant DNAs cannot be reliably inferred from the genome assemblies of most organisms. We introduce two statistical methods to estimate the abundance of nuclear inserts even in the absence of a nuclear genome assembly. The first (intercept method) only requires low-coverage (<1×) sequencing data, as commonly generated for population studies of organellar and ribosomal DNAs. The second method additionally requires that a subset of the individuals carry extranuclear DNA with diverged genotypes. We validated our intercept method using simulations and by re-estimating the frequency of human NUMTs (nuclear mitochondrial inserts). We then applied it to the grasshopper Podisma pedestris, exceptional for both its large genome size and reports of numerous NUMT inserts, estimating that NUMTs make up 0.056% of the nuclear genome, equivalent to >500 times the mitochondrial genome size. We also re-analysed a museomics data set of the parrot Psephotellus varius, obtaining an estimate of only 0.0043%, in line with reports from other species of bird. Our study demonstrates the utility of low-coverage high-throughput sequencing data for the quantification of nuclear vagrant DNAs. Beyond quantifying organellar inserts, these methods could also be used on endosymbiont-derived sequences. We provide an R implementation of our methods called “vagrantDNA” and code to simulate test data sets.

4.
5.
RenSeq is an NB‐LRR (nucleotide binding‐site leucine‐rich repeat) gene‐targeted, Resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM, and increased the number of identified NB‐LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB‐LRRs and can be accessed through a genome browser that we provide. We compared these NB‐LRR genes and the corresponding oligonucleotide baits with the highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum ‘Heinz 1706’ extended the NB‐LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co‐segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi‐ber2) and S. ruiz‐ceballosii (Rpi‐rzc1), we were able to apply RenSeq successfully to identify markers that co‐segregate with resistance towards the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy‐to‐adapt Galaxy pipelines.

6.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

7.
An essential component of microtubules, alpha-tubulin is also a multigene family in many species. An orthology-based nomenclature for this gene family has previously been difficult to assign due to incomplete genome builds and the high degree of sequence similarity between members of this family. Using the current genome builds, sequence analysis of human, mouse, and rat alpha-tubulin genes has enabled an updated nomenclature to be generated. This revised nomenclature provides a unified language for the discussion of these genes in mammalian species; it has been approved by the gene nomenclature committees of the three species and is supported by researchers in the field.

8.
We report on a whole‐genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole‐genome shotgun sequencing of the 7.9‐Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low‐copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high‐density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high‐density genetic map and establishing a synteny‐based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome‐wide map of selection signals differentiating the divergent gene pools. This rye whole‐genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome‐based breeding in rye.

9.
10.
Casuarina equisetifolia (C. equisetifolia), a conifer‐like angiosperm with resistance to typhoons and tolerance to stress, is mainly cultivated in the coastal areas of Australasia, making it a valuable model to study secondary growth‐associated genes and stress‐tolerance traits. However, the genome sequence is unavailable, and therefore wood‐associated growth rate and stress resistance remain largely unexplored at the molecular level. We therefore constructed a high‐quality draft genome sequence of C. equisetifolia by combining Illumina second‐generation sequencing reads and Pacific Biosciences single‐molecule real‐time (SMRT) long reads to advance the investigation of this species. Here, we report the genome assembly, which contains approximately 300 megabases (Mb) and has a scaffold N50 of 1.06 Mb. Additionally, gene annotation, assisted by a combination of prediction and RNA‐seq data, generated 29 827 annotated protein‐coding genes and 1983 non‐coding genes. Furthermore, we found that repetitive sequences account for one‐third of the genome assembly. We also constructed a genome‐wide map of DNA modifications, including two novel forms, N6‐methyladenine (6mA) and N4‐methylcytosine (4mC), at single‐nucleotide resolution using SMRT sequencing. Interestingly, we found that 17% of 6mA‐modified genes and 15% of 4mC‐modified genes also included alternative splicing events. Finally, we investigated cellulose‐, hemicellulose‐ and lignin‐related genes, which were associated with secondary growth and contained different DNA modifications. The high‐quality genome sequence and annotation of C. equisetifolia in this study provide a valuable resource to strengthen our understanding of the diverse traits of trees.

11.
12.
Next‐generation whole‐genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence‐based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost‐efficient establishment of powerful genomic information for many species.

13.
Genome Biology 2014, 15(3): R59

Background

The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

14.
Abstract

The in vitro replication of DNA, principally using the polymerase chain reaction (PCR), permits the amplification of defined sequences of DNA. By exponentially amplifying a target sequence, PCR significantly enhances the probability of detecting target gene sequences in complex mixtures of DNA. It also facilitates the cloning and sequencing of genes. Amplification of DNA by PCR and other newly developed methods has been applied in many areas of biological research, including molecular biology, biotechnology, and medicine, permitting studies that were not possible before. Nucleic acid amplification has added a new and revolutionary dimension to molecular biology. This review examines PCR and other in vitro nucleic acid amplification methodologies, covering their critical parameters, variations and widespread applications, and gives the strengths and limitations of these methodologies.
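The exponential amplification mentioned in the abstract above can be illustrated with a small calculation. This is a minimal sketch, not taken from the review itself: it assumes the common textbook model in which each cycle multiplies the copy number by (1 + E), where E is a per-cycle efficiency between 0 and 1. The function name `pcr_copies` and its parameters are hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of amplification.

    Assumed model (not from the source): N = N0 * (1 + E) ** n,
    where E = 1.0 means perfect doubling in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One template molecule, 30 cycles, ideal doubling.
    print(pcr_copies(1, 30))
    # Same run with an assumed 90% per-cycle efficiency.
    print(pcr_copies(1, 30, efficiency=0.9))
```

Under ideal doubling, a single template yields 2^30 (about 10^9) copies after 30 cycles, which is why a rare target becomes detectable in a complex DNA mixture. Real reactions plateau in later cycles, so the model only describes the early exponential phase.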

15.
16.
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.

17.
Genetic and genomics tools to characterize host–pathogen interactions are disproportionately directed to the host because of the focus on resistance. However, understanding the genetics of pathogen virulence is equally important and has been limited by the high cost of de novo genotyping of species with limited marker data. Non‐resource‐prohibitive methods that overcome the limitation of genotyping are now available through genotype‐by‐sequencing (GBS). The use of a two‐enzyme restriction‐associated DNA (RAD)‐GBS method adapted for Ion Torrent sequencing technology provided robust and reproducible high‐density genotyping of several fungal species. A total of 5783 and 2373 unique loci, ‘sequence tags’, containing 16 441 and 9992 single nucleotide polymorphisms (SNPs) were identified and characterized from natural populations of Pyrenophora teres f. maculata and Sphaerulina musiva, respectively. The data generated from the P. teres f. maculata natural population were used in association mapping analysis to map the mating‐type gene to high resolution. To further validate the methodology, a biparental population of P. teres f. teres, previously used to develop a genetic map utilizing simple sequence repeat (SSR) and amplified fragment length polymorphism (AFLP) markers, was re‐analysed using the SNP markers generated from this protocol. A robust genetic map containing 1393 SNPs on 997 sequence tags spread across 15 linkage groups with anchored reference markers was generated from the P. teres f. teres biparental population. The robust high‐density markers generated using this protocol will allow positional cloning in biparental fungal populations, association mapping of natural fungal populations and population genetics studies.

18.
Members of the family Tetraodontidae are known to have relatively small and compact genomes compared to other vertebrates. The obscure puffer fish Takifugu obscurus is an anadromous species that migrates to freshwater from the sea for spawning. Thus the euryhaline characteristics of T. obscurus have been investigated to gain understanding of their survival ability, osmoregulation, and other homeostatic mechanisms in both freshwater and seawater. In this study, a high quality chromosome‐level reference genome for T. obscurus was constructed using long‐read Pacific Biosciences (PacBio) Sequel sequencing and a Hi‐C‐based chromatin contact map platform. The final genome assembly of T. obscurus is 381 Mb, with a contig N50 length of 3,296 kb and longest length of 10.7 Mb, from a total of 62 Gb of raw reads generated using single‐molecule real‐time sequencing technology from a PacBio Sequel platform. The PacBio data were further clustered into chromosome‐scale scaffolds using a Hi‐C approach, resulting in a 373 Mb genome assembly with a contig N50 length of 15.2 Mb and longest length of 28 Mb. When we directly compared the 22 longest scaffolds of T. obscurus to the 22 chromosomes of the tiger puffer Takifugu rubripes, a clear one‐to‐one orthologous relationship was observed between the two species, supporting the chromosome‐level assembly of T. obscurus. This genome assembly can serve as a valuable genetic resource for exploring fugu‐specific compact genome characteristics, and will provide essential genomic information for understanding molecular adaptations to salinity fluctuations and the evolution of osmoregulatory mechanisms.
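Several abstracts in this list report an N50 value as a measure of assembly contiguity. As a reminder of what that statistic means, the following minimal sketch computes it from a list of contig or scaffold lengths. It uses the usual definition (the length L such that sequences of length L or longer together cover at least half of the total assembly); the function name `n50` and the example lengths are illustrative assumptions, not data from any of the cited studies.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the midpoint.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths, in arbitrary units.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness, which is why the abstract above also checks the scaffolds against the chromosomes of a related species.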

19.
Predicting protein domains is essential for understanding a protein’s function at the molecular level. However, until now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

20.
The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references.

