Similar articles
Found 20 similar articles; search took 31 ms
1.
A BAC-based integrated linkage map of the silkworm Bombyx mori   Cited by: 3 (self-citations: 0; citations by others: 3)

Background

In 2004, draft sequences of the model lepidopteran Bombyx mori were reported using whole-genome shotgun sequencing. Because of relatively shallow genome coverage, the silkworm genome remains fragmented, hampering annotation and comparative genome studies. For a more complete genome analysis, we developed extended scaffolds combining physical maps with improved genetic maps.

Results

We mapped 1,755 single nucleotide polymorphism (SNP) markers from bacterial artificial chromosome (BAC) end sequences onto 28 linkage groups using a recombining male backcross population, yielding an average inter-SNP distance of 0.81 cM (about 270 kilobases). We constructed 6,221 contigs by fingerprinting clones from three BAC libraries digested with different restriction enzymes, and assigned a total of 724 single copy genes to them by BLAST (basic local alignment search tool) search of the BAC end sequences and high-density BAC filter hybridization using expressed sequence tags as probes. We assigned 964 additional expressed sequence tags to linkage groups by restriction fragment length polymorphism analysis of a nonrecombining female backcross population. Altogether, 361.1 megabases of BAC contigs and singletons were integrated with a map containing 1,688 independent genes. A test of synteny using Oxford grid analysis with more than 500 silkworm genes revealed six versus 20 silkworm linkage groups containing eight or more orthologs of Apis versus Tribolium, respectively.

Conclusion

The integrated map contains approximately 10% of predicted silkworm genes and has an estimated 76% genome coverage by BACs. This provides a new resource for improved assembly of whole-genome shotgun data, gene annotation and positional cloning, and will serve as a platform for comparative genomics and gene discovery in Lepidoptera and other insects.

2.
Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene‐rich regions. Gene‐enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methylation filtration and the assembly quality of sequences subjected to the gene‐enrichment strategy, we have compared assemblies using methyl‐filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with the unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA‐seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds encoding enzymes of the sucrose/starch pathway discovered 291 single‐nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information provided by the MF dataset is a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop.

3.
4.

Background

The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.

Results

Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp.

Conclusions

The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.

5.
The sex chromosomes of the silkworm Bombyx mori are designated ZW (XY) for females and ZZ (XX) for males. Numerous long terminal repeat (LTR) and non-LTR retrotransposons, retroposons and DNA transposons have accumulated as strata on the W chromosome. However, there are nucleotide sequences on the W chromosome that do not show the characteristics of typical transposable elements. To analyse these uncharacterized nucleotide sequences, we used whole-genome shotgun (WGS) data and assembled data that were obtained using male genome DNA. Through these analyses, we found that almost all of these uncharacterized sequences were non-autonomous transposable elements that do not fit into the conventional classification. Notably, some of these transposable elements contained Bombyx short interspersed element (Bm1) sequences. We designated them as secondary-Bm1 transposable elements (SBTEs). Because putative ancestral SBTE nucleotide sequences without Bm1 do not occur in the WGS data, we suggest that the Bm1 sequences of SBTEs are not carried on each element merely as a package but are components of each element. Therefore, we confirmed that SBTEs should be classified as a new group of transposable elements.

6.
A second-generation linkage map was constructed for the silkworm, Bombyx mori, focusing on mapping Bombyx sequences appearing in public nucleotide databases and bacterial artificial chromosome (BAC) contigs. A total of 874 BAC contigs containing 5067 clones (22% of the library) were constructed by PCR-based screening with sequence-tagged sites (STSs) derived from whole-genome shotgun (WGS) sequences. A total of 523 BAC contigs, including 342 independent genes registered in public databases and 85 expressed sequence tags (ESTs), were placed onto the linkage map. We found significant synteny and conserved gene order between B. mori and a nymphalid butterfly, Heliconius melpomene, in four linkage groups (LGs), strongly suggesting that using B. mori as a reference for comparative genomics in Lepidoptera is highly feasible.

7.
Microsatellites, or simple sequence repeats (SSRs), are highly polymorphic and universally distributed in eukaryotes. SSRs have been used extensively as sequence tagged markers in genetic studies. Recently, the functional and evolutionary importance of SSRs has received considerable attention. Here we report the mining and characterization of the SSRs in the papaya genome. We analyzed SSRs from 277.4 Mb of whole genome shotgun (WGS) sequences, 51.2 Mb of bacterial artificial chromosome (BAC) end sequences (BES), and 13.4 Mb of expressed sequence tag (EST) sequences. The papaya SSR density was one SSR per 0.7 kb of DNA sequence in the WGS, which was higher than that in BES and EST sequences. SSR abundance was dramatically reduced as the repeat length increased. According to SSR motif length, dinucleotide repeats were the most common motif in class I, whereas hexanucleotides were the most copious in class II SSRs. The tri- and hexanucleotide repeats of both classes were more abundant in EST sequences than in genomic sequences. In class I SSRs, AT and AAT were the most frequent motifs in BES and WGS sequences. By contrast, AG and AAG were the most abundant in EST sequences. For SSR marker development, 9,860 primer pairs were surveyed for amplification and polymorphism. Successful amplification and polymorphic rates were 66.6% and 17.6%, respectively. The highest polymorphic rates were achieved by AT, AG, and ATG motifs. The genome-wide analysis of microsatellites revealed their frequency and distribution in the papaya genome, which vary among plant genomes. This complete set of SSR markers throughout the genome will assist diverse genetic studies in papaya and related species.
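
The abstract does not say which tool or thresholds were used to mine SSRs, so the following is only a minimal sketch of the general idea: scanning a sequence for short motifs repeated in tandem. The function name `find_ssrs` and its parameters (motif length 2 to 6 bp, at least 3 repeats) are hypothetical choices for illustration and are not the class I/class II criteria of the study.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    Thresholds are illustrative assumptions, not those of the papaya study.
    """
    seq = seq.upper()
    # Lazy motif group so the shortest repeating unit is preferred,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAA reported as motif "AA".
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Toy sequence with one dinucleotide and one trinucleotide repeat.
toy = "GGCATATATATCCAAGAAGAAGTT"
for start, motif, repeats in find_ssrs(toy):
    print(start, motif, repeats)
```

Dividing the number of hits by the sequence length in kb gives a density comparable in kind to the "one SSR per 0.7 kb" figure quoted above, although the absolute value depends entirely on the thresholds chosen.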

8.
The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole‐genome shotgun (WGS) approach, without the use of costly and time‐consuming methods, such as fosmid or BAC clone‐based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS‐FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired‐end and mate‐pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired‐end reads and contaminants detected, resulting in a total of 17 910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
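
The N50 statistic quoted above (half of the assembly contained in 1468 scaffolds, each at least 260 kb long) can be illustrated with a short sketch. The function name `n50_l50` and the toy lengths are assumptions made for illustration; this is the standard definition of N50, not code from the oak genome project.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the assembly size.
    L50 is the number of scaffolds needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count

# Toy scaffold lengths in kb (hypothetical, not the oak data).
toy_lengths = [400, 300, 260, 200, 100, 80, 60]
print(n50_l50(toy_lengths))
```

In the terms of the abstract, the oak assembly would correspond to an L50 of 1468 and an N50 of 260 kb.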

9.
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.

10.
Chibana H, Oka N, Nakayama H, Aoyama T, Magee BB, Magee PT, Mikami Y. Genetics 2005, 170(4): 1525-1537
The size of the genome in the opportunistic fungus Candida albicans is 15.6 Mb. Whole-genome shotgun sequencing was carried out at Stanford University, where the sequences were assembled into 412 contigs. C. albicans is essentially diploid, and analysis of the sequence is complicated by repeated sequences and by sequence polymorphism between homologous chromosomes. Chromosome 7 is 1 Mb in size and the best characterized of the 8 chromosomes in C. albicans. We assigned 16 of the contigs, ranging in length from 7309 to 267,590 bp, to chromosome 7 and determined sequences of 16 regions. These regions included four gaps, a misassembled sequence, and two major repeat sequences (MRS) of >16 kb. The length of the continuous sequence attained was 949,626 bp and provided complete coverage of chromosome 7 except for telomeric regions. Sequence analysis predicted 404 genes, 11 of which included at least one intron. A 7-kb indel, which might be caused by a retrotransposon, was identified as the largest difference between the homologous chromosomes. Synteny analysis revealed that the degree of synteny between C. albicans and Saccharomyces cerevisiae is too weak to use for completion of the genomic sequence in C. albicans.

11.

Background

Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps, of which 308 lay in an estimated 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.

Results

We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

Conclusion

Thus, we have made significant progress toward completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

12.
Japanese chestnut (Castanea crenata Sieb. et Zucc.), unlike other Castanea species, is resistant to most diseases and wasps. However, genomic data of Japanese chestnut that could be used to determine its biotic stress resistance mechanisms have not been reported to date. In this study, we employed long-read sequencing and genetic mapping to generate genome sequences of Japanese chestnut at the chromosome level. Long reads (47.7 Gb; 71.6× genome coverage) were assembled into 781 contigs, with a total length of 721.2 Mb and a contig N50 length of 1.6 Mb. Genome sequences were anchored to the chestnut genetic map, comprising 14,973 single nucleotide polymorphisms (SNPs) and covering 1,807.8 cM map distance, to establish a chromosome-level genome assembly (683.8 Mb), with 69,980 potential protein-encoding genes and 425.5 Mb repetitive sequences. Furthermore, comparative genome structure analysis revealed that Japanese chestnut shares conserved chromosomal segments with woody plants, but not with herbaceous plants, of rosids. Overall, the genome sequence data of Japanese chestnut generated in this study is expected to enhance not only its genetics and genomics but also the evolutionary genomics of woody rosids.

13.
We report the results of a study on the effectiveness of Cot filtration (CF) in the characterization of the gene space of bread wheat (Triticum aestivum L.), a large genome species (1C = 16,700 Mb) of tremendous agronomic importance. Using published Cot data as a guide, 2 genomic libraries for hexaploid wheat were constructed from the single-stranded DNA collected at Cot values > 1188 and 1639 M x s. Compared with sequences from a whole genome shotgun library from Aegilops tauschii (the D genome donor of bread wheat), the CF libraries exhibited 13.7-fold enrichment in genes, 5.8-fold enrichment in unknown low-copy sequences, and a 3-fold reduction in repetitive DNA. CF is twice as efficient as methylation filtration at enriching wheat genes. This research suggests that, with improvements, CF will be a highly useful tool in sequencing the gene space of wheat.

14.
A new approach to sequencing and assembling a highly heterozygous genome, that of the grape Vitis vinifera cv Pinot Noir, is described. Combining genome shotgun paired reads produced by Sanger sequencing with unpaired reads produced by sequencing by synthesis was shown to be an efficient procedure for decoding a complex genome. About 2 million SNPs and more than a million heterozygous gaps have been identified in the 500 Mb genome of grape. More than 91% of the sequence assembled into 58,611 contigs is now anchored to the 19 linkage groups of V. vinifera.

15.
16.
17.
Because of its unusually high degree of compaction and paucity of repetitive sequences, the genome of the smooth pufferfish Tetraodon nigroviridis is the subject of a well-advanced sequencing project. An astonishing diversity of transposable elements not found in the human and the mouse has been observed in the genome of T. nigroviridis. Due to the difficulty of assembling repeat-rich regions, the whole genome shotgun sequencing approach will probably fail to reveal the general organisation of this compact vertebrate genome. Therefore, in order to gain new insights into the global distribution pattern of repeated DNA in the genome of T. nigroviridis, we have reconstructed partial/complete repetitive sequences from data generated by the genome project and performed double-colour fluorescent in situ hybridization (FISH) analysis for representatives of three major categories of repeated sequences including two minisatellites (ms100 and ms104), two DNA transposons (Tol2 and Buffy1) and two non-long terminal repeat (LTR) retrotransposons (Rex3 and Babar). We show that DNA transposons and retroelements very frequently colocalize with minisatellites and mostly accumulate within heterochromatic regions. These results, which have not been reported so far for the fugu Takifugu rubripes, show that repeated elements are generally excluded from gene-rich regions in T. nigroviridis and underline the extreme degree of compartmentalization of this compact genome. The genome organization of the pufferfish is clearly different from that observed in humans, where repeated sequences make up an important fraction of euchromatic DNA, and is more similar to that observed in the fruit fly Drosophila melanogaster.

18.

Background

Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has the potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented.

Principal Findings

We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before).

Conclusions

Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape.

19.
MOTIVATION: Since the simultaneous publication of the human genome assembly by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics, several comparisons have been made of various aspects of these two assemblies. In this work, we set out to provide a more comprehensive comparative analysis of the two assemblies and their associated gene sets. RESULTS: The local sequence content for both draft genome assemblies has been similar since the early releases; however, it took a year for the quality of the Celera assembly to approach that of HGSC, suggesting an advantage of HGSC's hierarchical shotgun (HS) sequencing strategy over Celera's whole genome shotgun (WGS) approach. While similar numbers of ab initio predicted genes can be derived from both assemblies, Celera's Otto approach consistently generated larger, more varied gene sets than the Ensembl gene build system. The presence of a non-overlapping gene set has persisted with successive data releases from both groups. Since most of the unique genes from either genome assembly could be mapped back to the other assembly, we conclude that the gene set discrepancies do not reflect differences in local sequence content but rather in the assemblies and especially the different gene-prediction methodologies.

20.
As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
