Similar literature (20 articles found)
1.
The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole‐genome shotgun (WGS) approach, without the use of costly and time‐consuming methods, such as fosmid or BAC clone‐based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS‐FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired‐end and mate‐pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired‐end reads and contaminants detected, resulting in a total of 17 910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
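
The N50 statistic quoted in this and several later abstracts can be illustrated with a minimal Python sketch. It assumes the usual definition: N50 is the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly, and L50 is the number of scaffolds needed to get there. The function name `n50_l50` and the toy lengths are illustrative and do not come from any of the papers listed.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2                  # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' scaffolds cover half the assembly
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy assembly: total 300, so half is 150; 100 + 80 reaches it.
toy_lengths = [100, 80, 60, 40, 20]
print(n50_l50(toy_lengths))
```

Read this way, the oak figures say that the 1468 longest scaffolds, each at least about 260 kb long, together make up half of the 1.34 Gb assembly.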

2.
Estimating genetic diversity and inferring the evolutionary history of Plasmodium falciparum could help in understanding the origin and spread of virulent and drug‐resistant forms of the malaria pathogen and therefore contribute to malaria control programmes. Genetic diversity of the whole mitochondrial (mt) genome of P. falciparum sampled across the major distribution ranges had been reported, but no Indian P. falciparum isolate had been analysed so far, even though P. falciparum malaria is highly endemic in India. We have sequenced the whole mt genome of 44 Indian field isolates and used a published data set of 96 genome sequences to present global genetic diversity and to revisit the evolutionary history of P. falciparum. Indian P. falciparum presents high genetic diversity with several characteristics of ancestral populations and shares many genetic features with African and, to some extent, Papua New Guinean (PNG) isolates. Similar to African isolates, Indian P. falciparum populations have maintained a high effective population size and undergone rapid expansion in the past, with the oldest time to the most recent common ancestor (TMRCA). Interestingly, one of the four single nucleotide polymorphisms (SNPs) that differentiate P. falciparum from P. falciparum‐like isolates (infecting non‐human primates in Africa) was found to be segregating in five Indian P. falciparum isolates. This SNP was in tight linkage with two other novel SNPs that were found exclusively in these five Indian isolates. Taken together, the results of the mt genome sequence analyses of Indian isolates add to the current understanding of the evolutionary history of P. falciparum.

3.

Background  

Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user.
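
The recursive idea behind in silico chromosome walking can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Tracembler's implementation: reads live in an in-memory list rather than the NCBI Trace Archive, "significantly match" is replaced by sharing at least one exact k-mer, and no assembly is performed. The names `kmers` and `recruit_reads` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(seed, reads, k=8, max_rounds=10):
    """Collect reads reachable from the seed through shared k-mers.

    Each round recruits reads that share a k-mer with the seed or with
    any read recruited earlier, which is what lets the walk extend
    beyond the seed itself.
    """
    known = kmers(seed, k)
    recruited = []
    remaining = list(reads)
    for _ in range(max_rounds):
        hits = [r for r in remaining if kmers(r, k) & known]
        if not hits:
            break  # the walk has stopped extending
        for r in hits:
            known |= kmers(r, k)
        recruited.extend(hits)
        remaining = [r for r in remaining if r not in hits]
    return recruited


# Toy locus: read_a overlaps the seed, read_b overlaps only read_a.
seed = "AAACCCGGGT"
read_a = "CGGGTTTACGTA"
read_b = "ACGTAGCTAGGA"
unrelated = "TTTTTGGGGGCC"
print(recruit_reads(seed, [read_b, unrelated, read_a], k=5))
```

In the toy run, `read_b` is picked up only in the second round, because it overlaps `read_a` but not the seed; that second step is the "walk".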

4.
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole‐genome shotgun sequencing of the nuclear genome of flax. Seven paired‐end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep‐coverage (approximately 94× raw, approximately 69× filtered) short‐sequence reads (44–100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non‐redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole‐genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis‐assembly of regions at the genome scale. A total of 43 384 protein‐coding genes were predicted in the whole‐genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5–9 MYA) whole‐genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam‐A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole‐genome shotgun short‐sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.

5.
Using next‐generation sequencing, we developed the first whole‐genome resources for two hybridizing Nothofagus species of the Patagonian forests that crucially lack genomic data, despite their ecological and industrial value. A de novo assembly strategy combining base quality control and optimization of the putative chloroplast gene map yielded ~32 000 contigs from 43% of the reads produced. With 12.5% of assembled reads, we covered ~96% of the chloroplast genome and ~70% of the mitochondrial gene content, providing functional and structural annotations for 112 and 52 genes, respectively. Functional annotation was possible on 15% of the contigs, with ~1750 potentially novel nuclear genes identified for Nothofagus species. We estimated that the new resources (13.41 Mb in total) included ~4000 gene regions representing ~6.5% of the expected genic partition of the genome, the remaining contigs potentially being nongenic DNA. A high‐quality single nucleotide polymorphism resource was developed by comparing various filtering methods, and preliminary results indicate a strong conservation of cpDNA genomes in contrast to numerous exclusive nuclear polymorphisms in both species. Finally, we characterized 2274 potential simple sequence repeat (SSR) loci, designed primers for 769 of them and validated nine of 29 loci in 42 individuals per species. Nothofagus obliqua had more alleles (4.89) on average than N. nervosa (2.89), eight SSRs discriminated the species efficiently, and three were successfully transferred to three other Nothofagus species. These resources will greatly aid future inferences of demographic, adaptive and hybridizing events in Nothofagus species, and the conservation and management of natural populations.

6.
Glycine latifolia (Benth.) Newell & Hymowitz (2n = 40), one of the 27 wild perennial relatives of soybean, possesses genetic diversity and agronomically favorable traits that are lacking in soybean. Here, we report the 939‐Mb draft genome assembly of G. latifolia (PI 559298) using exclusively linked‐reads sequenced from a single Chromium library. We organized scaffolds into 20 chromosome‐scale pseudomolecules utilizing two genetic maps and the Glycine max (L.) Merr. genome sequence. High copy numbers of putative 91‐bp centromere‐specific tandem repeats were observed in consecutive blocks within predicted pericentromeric regions on several pseudomolecules. No 92‐bp putative centromeric repeats, which are abundant in G. max, were detected in G. latifolia or Glycine tomentella. Annotation of the assembled genome and subsequent filtering yielded a high confidence gene set of 54 475 protein‐coding loci. In comparative analysis with five legume species, genes related to defense responses were significantly overrepresented in Glycine‐specific orthologous gene families. A total of 304 putative nucleotide‐binding site (NBS)‐leucine‐rich‐repeat (LRR) genes were identified in this genome assembly. Unlike in other legume species, we observed a scarcity of TIR‐NBS‐LRR genes in G. latifolia. The G. latifolia genome was also predicted to contain genes encoding 367 LRR‐receptor‐like kinases, a family of proteins involved in basal defense responses and responses to abiotic stress. The genome sequence and annotation of G. latifolia provide a valuable source of alternative alleles and novel genes to facilitate soybean improvement. This study also highlights the efficacy and cost‐effectiveness of the application of Chromium linked‐reads in diploid plant genome de novo assembly.

7.
Cicer arietinum L. (chickpea) is the third most important food legume crop. We have generated the draft sequence of a desi‐type chickpea genome using next‐generation sequencing platforms, bacterial artificial chromosome end sequences and a genetic map. The 520‐Mb assembly covers 70% of the predicted 740‐Mb genome length, and more than 80% of the gene space. Genome analysis predicts the presence of 27 571 genes and 210 Mb as repeat elements. The gene expression analysis performed using 274 million RNA‐Seq reads identified several tissue‐specific and stress‐responsive genes. Although segmental duplicated blocks are observed, the chickpea genome does not exhibit any indication of recent whole‐genome duplication. Nucleotide diversity analysis provides an assessment of a narrow genetic base within the chickpea cultivars. We have developed a resource for genetic markers by comparing the genome sequences of one wild and three cultivated chickpea genotypes. The draft genome sequence is expected to facilitate genetic enhancement and breeding to develop improved chickpea varieties.

8.
9.
In this study, new chloroplast (cp) resources were developed for the genus Cynara, using whole cp genomes from 20 genotypes, by means of high‐throughput sequencing technologies. Our target species included seven globe artichokes, two cultivated cardoons, eight wild artichokes, and three other wild Cynara species (C. baetica, C. cornigera and C. syriaca). One complete cp genome was isolated using short reads from a whole‐genome sequencing project, while the others were obtained by means of long‐range PCR, for which primer pairs are provided here. A de novo assembly strategy combined with a reference‐based assembly allowed us to reconstruct each cp genome. Comparative analyses among the newly sequenced genotypes and two additional Cynara cp genomes (‘Brindisino’ artichoke and C. humilis) retrieved from public databases revealed 126 parsimony informative characters and 258 singletons in Cynara, for a total of 384 variable characters. Thirty‐nine SSR loci and 34 other INDEL events were detected. After data analysis, 37 primer pairs for SSR amplification were designed, and these molecular markers were subsequently validated in our Cynara genotypes. Phylogenetic analysis based on all cp variable characters provided the best resolution when compared to what was observed using only parsimony informative characters, or only short ‘variable’ cp regions. The evaluation of the molecular resources obtained from this study led us to support the ‘super‐barcode’ theory and consider the total cp sequence of Cynara as a reliable and valuable molecular marker for exploring species diversity and examining variation below the species level.
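
Detecting SSR loci, as reported here and in the Nothofagus abstract above, amounts to finding short motifs repeated in tandem. The Python sketch below shows one simple way to do that with a regular expression. It is an assumption-laden illustration, not the method used in either study: it finds only perfect repeats, and the thresholds (motifs of 1 to 6 bp, at least 5 copies) and the name `find_ssrs` are chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first,
    so ATATATATAT is reported as (AT)x5 and never as a longer motif.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Toy sequence with a dinucleotide repeat flanked by unique bases.
print(find_ssrs("GGATATATATATCC"))
```

Primer design and validation, which both abstracts describe as follow-up steps, are outside the scope of this sketch.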

10.
Advanced resources for genome‐assisted research in barley (Hordeum vulgare) including a whole‐genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole‐genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA‐coding exome reduces barley genomic complexity more than 50‐fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in‐solution hybridization‐based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full‐length cDNAs and de novo assembled RNA‐Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA‐coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping‐by‐sequencing and genetic diversity analyses.
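
The scale of the reduction can be checked with simple arithmetic. The short Python sketch below divides the genome size by the capture target size, using the two figures quoted in the abstract (5 Gb and 61.6 Mb). The function name `fold_reduction` is illustrative, and the assumption that "complexity" is the plain ratio of the two sizes is this sketch's, not necessarily the authors'.

```python
def fold_reduction(genome_size_mb, target_size_mb):
    """Ratio of whole-genome size to capture-target size."""
    return genome_size_mb / target_size_mb


# 5 Gb genome expressed in Mb, and the 61.6 Mb capture target.
barley_fold = fold_reduction(5000, 61.6)
print(round(barley_fold, 1))
```

Under that assumption the ratio comes out at roughly 81, which is consistent with the "more than 50‐fold" reduction stated above.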

11.
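[Background] Controlling root rot of Chinese prickly ash has long been an unresolved problem in production, and screening for superior biocontrol strains is an important direction in the development of microbial agents. [Objective] To characterize the genetic information of T-1, a strain antagonistic to Chinese prickly ash root rot, to mine its antagonistic gene cluster resources in depth, and to reveal its antagonistic mechanism. [Methods] The antagonistic strain was isolated and identified using the dual-culture plate assay, morphological observation, and physiological and biochemical tests combined with molecular biology methods. The whole genome of the strain was also sequenced, and the sequence was subjected to general and comparative genomic analyses. [Results] The isolated strain was identified as Bacillus velezensis and designated T-1. Its inhibition rate against Chinese prickly ash root rot reached 72%, and it severely impeded growth at the hyphal front. Antimicrobial spectrum tests and in vitro antagonism assays on prickly ash root slices showed that T-1 has fairly broad antimicrobial activity and a certain antagonistic effect on root slices in vitro. The whole-genome sequence data were submitted to the NCBI SRA database under accession number SRX11086663. The genome is 3 886 726 bp long with a GC content of 46.42% and contains 4015 coding genes, which account for 89.74% of the genome. Comparative genomic analysis showed that strain T-1 is highly similar to the B. velezensis model strain FZB42. Prediction of antagonistic gene clusters found 12 secondary-metabolite biosynthetic gene clusters in the B. velezensis T-1 genome sequence, 8 of which are highly similar to gene clusters of known function...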

12.
Casuarina equisetifolia (C. equisetifolia), a conifer‐like angiosperm that resists typhoons and tolerates stress, is mainly cultivated in the coastal areas of Australasia. These properties make C. equisetifolia a valuable model for studying genes associated with secondary growth and stress‐tolerance traits. However, the genome sequence is unavailable, and therefore wood‐associated growth rate and stress resistance remain largely unexplored at the molecular level. We therefore constructed a high‐quality draft genome sequence of C. equisetifolia by combining Illumina second‐generation sequencing reads and Pacific Biosciences single‐molecule real‐time (SMRT) long reads to advance the investigation of this species. Here, we report the genome assembly, which contains approximately 300 megabases (Mb) and has a scaffold N50 of 1.06 Mb. Additionally, gene annotation, assisted by a combination of prediction and RNA‐seq data, generated 29 827 annotated protein‐coding genes and 1983 non‐coding genes. Furthermore, we found that repetitive sequences account for one‐third of the genome assembly. We also constructed a genome‐wide map of DNA modifications, including two novel forms, N6‐methyladenine (6mA) and N4‐methylcytosine (4mC), at single‐nucleotide resolution using SMRT sequencing. Interestingly, we found that 17% of genes with 6mA modifications and 15% of genes with 4mC modifications also included alternative splicing events. Finally, we investigated cellulose‐, hemicellulose‐ and lignin‐related genes, which were associated with secondary growth and contained different DNA modifications. The high‐quality genome sequence and annotation of C. equisetifolia in this study provide a valuable resource to strengthen our understanding of the diverse traits of trees.

13.

There is a natural floral organ mutant of rice (var. Jugal) in which the florets, popularly known as spikelets, bear multiple carpels and produce multiple kernels in most of its grains. In our earlier work, we studied its morpho-anatomical structure in detail, together with the allelic diversity and expression of the major genetic loci associated with floral organ development. In the present study, high-throughput whole genome sequencing was performed, which generated about 3.7 million base pairs of genomic data for downstream analysis. The reads were about 101 bases long and were mapped to Oryza sativa var. Nipponbare as the reference genome. Genome-wide variant analysis detected 1,096,419 variants, which included 943,033 SNPs and 153,386 InDels. A total of 24,920 non-synonymous SNPs were identified in 11,529 genes. The chromosome-wise distribution of uniquely mapped reads on the reference genome showed that the most reads mapped to chromosome 1 and the fewest to chromosome 9. Chromosome 10 showed the highest density of variations (about 325.6 per 100 kb of genome sequence). Detailed sequence analysis of 23 floral organ developmental genes detected 419 potent variants; the DL (Drooping Leaf) and OSH1 (Oryza sativa Homeobox1) genes showed the highest number (32) of variants, whereas the MADS21 (Minichromosome Agamous Deficient Serum Factor 21) gene had the lowest number (5). The information generated in this study will enrich the genomics of floral organ development in indica rice and in cereal crops in general.
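
The variant counts and the "per 100 kb" density in this abstract follow from simple arithmetic, sketched below in Python. The SNP and InDel counts are taken from the abstract; the chromosome-level inputs in the density example (3256 variants over 1 Mb) are hypothetical values chosen only to reproduce the quoted rate, since the abstract does not give the underlying counts or chromosome length. The function name `density_per_100kb` is illustrative.

```python
def density_per_100kb(variant_count, region_length_bp):
    """Number of variants per 100 kb of sequence."""
    return variant_count * 100_000 / region_length_bp


# Counts quoted in the abstract: SNPs plus InDels give the variant total.
snps, indels = 943_033, 153_386
total_variants = snps + indels
print(total_variants)

# Hypothetical region: 3256 variants in 1 Mb gives 325.6 per 100 kb.
print(density_per_100kb(3256, 1_000_000))
```

The sum confirms that the reported total of 1,096,419 variants is exactly the SNP and InDel counts combined.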


14.
15.
Chimonanthus salicifolius, a member of the Calycanthaceae of magnoliids, is one of the most famous medicinal plants in Eastern China. Here, we report a chromosome‐level genome assembly of C. salicifolius, comprising 820.1 Mb of genomic sequence with a contig N50 of 2.3 Mb and containing 36 651 annotated protein‐coding genes. Phylogenetic analyses revealed that magnoliids were sister to the eudicots. Two rounds of ancient whole‐genome duplication were inferred in the C. salicifolius genome. One is shared by Calycanthaceae after its divergence from Lauraceae, and the other is in the ancestry of Magnoliales and Laurales. Notably, long genes of more than 20 kb were much more prevalent in the magnoliid genomes than in other angiosperms, which could be caused by the length expansion of introns through inserted transposable elements. Homologous genes within the flavonoid pathway of C. salicifolius were identified, and correlation of gene expression with the contents of flavonoid metabolites revealed potential critical genes involved in flavonoid biosynthesis. This study not only provides an additional whole‐genome sequence from the magnoliids, but also opens the door to functional genomic research and molecular breeding of C. salicifolius.

16.
Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean that is spreading across major soybean production regions worldwide. Increased SCN virulence has recently been observed in both the United States and China. However, no study has reported a genome assembly for H. glycines at the chromosome scale. Herein, the first chromosome‐level reference genome of X12, an unusual SCN race with high infection ability, is presented. Using whole‐genome shotgun (WGS) sequencing, Pacific Biosciences (PacBio) sequencing, Illumina paired‐end sequencing, 10X Genomics linked reads and high‐throughput chromatin conformation capture (Hi‐C) genome scaffolding techniques, a 141.01‐megabase (Mb) assembled genome was obtained with scaffold and contig N50 sizes of 16.27 Mb and 330.54 kilobases (kb), respectively. The assembly showed high integrity and quality, with over 90% of Illumina reads mapped to the genome. The assembly quality was evaluated using Core Eukaryotic Genes Mapping Approach and Benchmarking Universal Single‐Copy Orthologs. A total of 11,882 genes were predicted using de novo, homolog and RNAseq data generated from eggs, second‐stage juveniles (J2), third‐stage juveniles (J3) and fourth‐stage juveniles (J4) of X12, and 79.0% of homologous sequences were annotated in the genome. These high‐quality X12 genome data will provide valuable resources for research in a broad range of areas, including fundamental nematode biology, SCN–plant interactions and co‐evolution, and also contribute to the development of technology for overall SCN management.

17.
Dendrolimus spp. are important destructive pests of conifer forests, and Dendrolimus punctatus Walker (Lepidoptera; Lasiocampidae) is the most widely distributed Dendrolimus species. During periodic outbreaks, this species is said to make “fire without smoke” because large areas of pine forest can be quickly and heavily damaged. Yet, little is known about the molecular mechanisms that underlie the unique ecological characteristics of this forest insect. Here, we combined Pacific Biosciences (PacBio) RSII single‐molecule long reads and high‐throughput chromosome conformation capture (Hi‐C) genomics‐linked reads to produce a high‐quality, chromosome‐level reference genome for D. punctatus. The final assembly was 614 Mb with contig and scaffold N50 values of 1.39 and 22.15 Mb, respectively, and 96.96% of the contigs anchored onto 30 chromosomes. Based on the prediction, this genome contained 17,593 protein‐coding genes and 56.16% repetitive sequences. Phylogenetic analyses indicated that D. punctatus diverged from the common ancestor of Hyphantria cunea, Spodoptera litura and Thaumetopoea pityocampa ~ 108.91 million years ago. Many gene families that were expanded in the D. punctatus genome were significantly enriched for the xenobiotic biodegradation system, especially the cytochrome P450 gene family. This high‐quality, chromosome‐level reference genome will be a valuable resource for understanding mechanisms of D. punctatus outbreak and host resistance adaption. Because this is the first Lasiocampidae insect genome to be sequenced, it also will serve as a reference for further comparative genomics.

18.
Recent advances have highlighted the ubiquity of whole‐genome duplication (polyploidy) in angiosperms, although subsequent genome size change and diploidization (returning to a diploid‐like condition) are poorly understood. An excellent system to assess these processes is provided by Nicotiana section Repandae, which arose via allopolyploidy (approximately 5 million years ago) involving relatives of Nicotiana sylvestris and Nicotiana obtusifolia. Subsequent speciation in Repandae has resulted in allotetraploids with divergent genome sizes, including Nicotiana repanda and Nicotiana nudicaulis studied here, which have an estimated 23.6% genome expansion and 19.2% genome contraction from the early polyploid, respectively. Graph‐based clustering of next‐generation sequence data enabled assessment of the global genome composition of these allotetraploids and their diploid progenitors. Unexpectedly, in both allotetraploids, over 85% of sequence clusters (repetitive DNA families) had a lower abundance than predicted from their diploid relatives; a trend seen particularly in low‐copy repeats. The loss of high‐copy sequences predominantly accounts for the genome downsizing in N. nudicaulis. In contrast, N. repanda shows expansion of clusters already inherited in high copy number (mostly chromovirus‐like Ty3/Gypsy retroelements and some low‐complexity sequences), leading to much of the genome upsizing predicted. We suggest that the differential dynamics of low‐ and high‐copy sequences reveal two genomic processes that occur subsequent to allopolyploidy. The loss of low‐copy sequences, common to both allopolyploids, may reflect genome diploidization, a process that also involves loss of duplicate copies of genes and upstream regulators. In contrast, genome size divergence between allopolyploids is manifested through differential accumulation and/or deletion of high‐copy‐number sequences.

19.
The emergence of third‐generation sequencing (3GS; long‐reads) is bringing closer the goal of chromosome‐size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of nonmodel organisms. However, long‐read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short‐reads and long‐reads, provide an alternative efficient and cost‐effective approach to generate de novo, chromosome‐level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for nonexperts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of nonmodel organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a nonmodel cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this nonmodel organism using the dbg2olc pipeline.

20.
Powdery mildew of wheat (Triticum aestivum L.) is caused by the ascomycete fungus Blumeria graminis f.sp. tritici. Genomic approaches open new ways to study the biology of this obligate biotrophic pathogen. We started the analysis of the Bg tritici genome with the low-pass sequencing of its genome using the 454 technology and the construction of the first genomic bacterial artificial chromosome (BAC) library for this fungus. High-coverage contigs were assembled with the 454 reads. They allowed the characterization of 56 transposable elements and the establishment of the Blumeria repeat database. The BAC library contains 12,288 clones with an average insert size of 115 kb, which represents a maximum of 7.5-fold genome coverage. Sequencing of the BAC ends generated 12.6 Mb of random sequence representative of the genome. Analysis of BAC-end sequences revealed a massive invasion of transposable elements accounting for at least 85% of the genome. This explains the unusually large size of this genome which we estimate to be at least 174 Mb, based on a large-scale physical map constructed through the fingerprinting of the BAC library. Our study represents a crucial step in the perspective of the determination and study of the whole Bg tritici genome sequence.
