Similar articles (20 found)
1.
Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate‐pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi‐C and Dovetail Genomics Chicago libraries, and long‐read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high‐quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high‐quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi‐C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome‐level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and the scaffolds were then comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi‐C libraries increased the longest scaffold over 12‐fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50‐fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long‐read sequencing.
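Scaffold N50 is the length at which scaffolds of that size or longer contain at least half of the total assembly; the companion L50 is how many scaffolds it takes to reach that point. A minimal sketch in Python, assuming the scaffold lengths are available as a plain list of numbers (the toy lengths are hypothetical; only the two fold-change inputs come from the abstract):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count


def fold_change(before, after):
    """Fold increase of an assembly metric such as N50 or longest scaffold."""
    return after / before


# Hypothetical scaffold lengths in Mbp.
print(n50_l50([10, 8, 5, 3, 2, 1, 1]))  # (8, 2)
# Scaffold N50 and longest-scaffold figures quoted in the abstract.
print(round(fold_change(1.48, 75.02), 1), round(fold_change(9.71, 124.99), 1))  # 50.7 12.9
```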

2.
The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by sequencing such a large, complex and highly heterozygous genome with a whole‐genome shotgun (WGS) approach, without the use of costly and time‐consuming methods such as hierarchical sequencing based on fosmid or BAC clones. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS‐FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired‐end and mate‐pair libraries of different insert sizes to build scaffolds. Errors were corrected, gaps were filled with Illumina paired‐end reads, and contaminants were detected, resulting in a total of 17 910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the gene models of the phylogenetically related Prunus persica indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.

3.
DNA fingerprints and end sequences from bacterial artificial chromosomes (BACs) from two new libraries were generated to improve the first-generation integrated physical and genetic map of the rainbow trout (Oncorhynchus mykiss) genome. The current version of the physical map is composed of 167,989 clones, of which 158,670 are assembled into contigs and 9,319 are singletons. The number of contigs was reduced from 4,173 to 3,220. End sequencing of clones from the new libraries generated a total of 11,958 high-quality sequence reads. The end sequences were used to develop 238 new microsatellites, of which 42 were added to the genetic map. Conserved synteny between the rainbow trout genome and model fish genomes was analyzed using 188,443 BAC end sequence (BES) reads. The fractions of BES reads with significant BLASTN hits against the zebrafish, medaka, and stickleback genomes were 8.8%, 9.7%, and 10.5%, respectively, while the fractions of significant BLASTX hits against the zebrafish, medaka, and stickleback protein databases were 6.2%, 5.8%, and 5.5%, respectively. The overall number of unique regions of conserved synteny identified by grouping the rainbow trout BES into fingerprinting contigs was 2,259, 2,229, and 2,203 for stickleback, medaka, and zebrafish, respectively. These numbers are approximately three to five times greater than those we previously identified using BAC paired ends. Clustering the conserved synteny results by linkage groups derived from the integrated physical and genetic map revealed that, despite the low sequence homology, large blocks of macrosynteny are conserved between chromosome arms of rainbow trout and the model fish species.

4.
MingCheng Luo  Kavitha Madishetty  Jan T. Svensson  Matthew J. Moscou  Steve Wanamaker  Tao Jiang  Andris Kleinhofs  Gary J. Muehlbauer  Roger P. Wise  Nils Stein  Yaqin Ma  Edmundo Rodriguez  Dave Kudrna  Prasanna R. Bhat  Shiaoman Chao  Pascal Condamine  Shane Heinen  Josh Resnik  Rod Wing  Heather N. Witt  Matthew Alpert  Marco Beccuti  Serdar Bozdag  Francesca Cordero  Hamid Mirebrahim  Rachid Ounit  Yonghui Wu  Frank You  Jie Zheng  Hana Simková  Jaroslav Dolezel  Jane Grimwood  Jeremy Schmutz  Denisa Duma  Lothar Altschmied  Tom Blake  Phil Bregitzer  Laurel Cooper  Muharrem Dilbirligi  Anders Falk  Leila Feiz  Andreas Graner  Perry Gustafson  Patrick M. Hayes  Peggy Lemaux  Jafar Mammadov  Timothy J. Close 《The Plant journal : for cell and molecular biology》2015,84(1):216-227
Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole‐genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene‐containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physically mapped gene‐bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene‐enriched BACs and are characterized by high recombination rates, there are also gene‐dense regions with suppressed recombination. We made use of published map‐anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D‐genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley–Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map‐based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene‐dense but have low recombination is particularly relevant.
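A minimal tiling path is the smallest set of overlapping clones that still spans a region. As a simplified illustration, assuming each gene-bearing BAC has already been placed at start and end coordinates on the physical map (real pipelines derive the path from fingerprint overlaps, which is not reproduced here), the classic greedy interval-cover algorithm picks such a path. Clone names and coordinates are hypothetical:

```python
def minimal_tiling_path(bacs, start, end):
    """Choose the fewest BACs whose intervals jointly cover [start, end].

    bacs: list of (name, left, right) positions on the physical map.
    Returns clone names in map order; raises ValueError if coverage has a gap.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])
    path, covered, i = [], start, 0
    while covered < end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone coordinates in kb.
clones = [("A", 0, 150), ("B", 100, 260), ("C", 120, 200), ("D", 240, 400), ("E", 390, 520)]
print(minimal_tiling_path(clones, 0, 500))  # ['A', 'B', 'D', 'E']
```

Clone C is dropped because B, which also overlaps A, reaches further; greedy choice of the furthest-reaching overlap is what keeps the path minimal.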

5.
The CHORI-212 bacterial artificial chromosome (BAC) library was constructed by cloning EcoRI/EcoRI partially digested DNA into the pTARBAC2.1 vector. The library has an average insert size of 161 kb and provides 10.6-fold coverage of the channel catfish haploid genome. Screening of 32 genes using overgo or cDNA probes indicated that this library had a good representation of the genome, as all tested genes were present in the library. We previously reported sequencing of approximately 25,000 BAC ends that generated 20,366 high-quality BAC end sequences (BES) and identified a large number of sequences similar to known genes using BLASTX searches. In this work, particular attention was given to the identification of BAC mate pairs with known genes at both ends. When such mate pairs were identified, comparative genome analysis was conducted to determine syntenic regions of the catfish genome with the genomes of zebrafish and Tetraodon. Of the 141 mate pairs with known genes from channel catfish, conserved syntenies were identified in 34 (24.1%), with 30 conserved in the zebrafish genome and 14 conserved in the Tetraodon genome. Additional analysis of three of the 34 conserved syntenic groups by direct sequencing indicated conserved gene contents in all three species. This indicates that comparative genome analysis may provide shortcuts to genome analysis in catfish, especially for short genomic regions once the conserved syntenies are identified. Shaolin Wang and Peng Xu contributed equally to the article.
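The 10.6-fold figure is the usual genome-equivalents calculation (clones × insert size ÷ genome size), and the Clarke–Carbon formula converts it into the chance that a given single-copy gene is present in at least one clone, which is consistent with all 32 screened genes being found. A sketch using the textbook formulas; the abstract does not say which calculation the authors used:

```python
import math


def genome_equivalents(n_clones, insert_kb, genome_kb):
    """Library coverage: total cloned DNA divided by the haploid genome size."""
    return n_clones * insert_kb / genome_kb


def prob_locus_present(n_clones, insert_kb, genome_kb):
    """Clarke-Carbon probability that a given locus lies in at least one clone."""
    return 1 - (1 - insert_kb / genome_kb) ** n_clones


def prob_from_coverage(coverage):
    """Poisson approximation 1 - exp(-c), accurate when inserts are tiny relative to the genome."""
    return 1 - math.exp(-coverage)


# At the reported 10.6-fold coverage, a single-copy locus is almost certainly present.
print(f"{prob_from_coverage(10.6):.5f}")
```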

6.
Cultivated potato (Solanum tuberosum L.) is a highly heterozygous autotetraploid that presents challenges in genome analyses and breeding. Wild potato species serve as a resource for the introgression of important agronomic traits into cultivated potato. One key species is Solanum chacoense, and its diploid, inbred clone M6 is self‐compatible and has desirable tuber market quality and disease resistance traits. Sequencing and assembly of the genome of the M6 clone of S. chacoense generated an assembly of 825 767 562 bp in 8260 scaffolds with an N50 scaffold size of 713 602 bp. Pseudomolecule construction anchored 508 Mb of the genome assembly into 12 chromosomes. Genome annotation yielded 49 124 high‐confidence gene models representing 37 740 genes. Comparative analyses of the M6 genome with six other Solanaceae species revealed a core set of 158 367 Solanaceae genes and 1897 genes unique to three potato species. Analysis of single nucleotide polymorphisms across the M6 genome revealed enhanced residual heterozygosity on chromosomes 4, 8 and 9 relative to the other chromosomes. Access to the M6 genome provides a resource for identifying key genes for important agronomic traits and aids the genome‐enabled development of inbred diploid potatoes, with the potential to accelerate potato breeding.
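Residual heterozygosity in an inbred clone can be summarised as the fraction of called SNP genotypes that are heterozygous on each chromosome. A sketch assuming VCF-style genotype strings for the M6 clone; the chromosome names and calls in the example are made up:

```python
from collections import defaultdict


def heterozygosity_by_chromosome(calls):
    """Fraction of heterozygous genotype calls per chromosome.

    calls: iterable of (chromosome, genotype) where genotype is a VCF-style
    string such as "0/0", "0/1", "1/1" or phased "0|1". Missing calls ("./.")
    are skipped.
    """
    het = defaultdict(int)
    total = defaultdict(int)
    for chrom, gt in calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue
        total[chrom] += 1
        if len(set(alleles)) > 1:
            het[chrom] += 1
    return {chrom: het[chrom] / total[chrom] for chrom in total}


# Hypothetical calls: chr04 shows more residual heterozygosity than chr01.
calls = [("chr04", "0/1"), ("chr04", "0|1"), ("chr04", "1/1"),
         ("chr01", "1/1"), ("chr01", "0/0"), ("chr01", "0/1"), ("chr01", "./.")]
print(heterozygosity_by_chromosome(calls))
```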

7.
The Tetraodontidae family is known to have relatively small and compact genomes compared with other vertebrates. The obscure puffer fish Takifugu obscurus is an anadromous species that migrates from the sea to freshwater for spawning. Thus, the euryhaline characteristics of T. obscurus have been investigated to gain understanding of their survival ability, osmoregulation, and other homeostatic mechanisms in both freshwater and seawater. In this study, a high-quality chromosome‐level reference genome for T. obscurus was constructed using long‐read Pacific Biosciences (PacBio) Sequel sequencing and a Hi‐C‐based chromatin contact map platform. The final genome assembly of T. obscurus is 381 Mb, with a contig N50 length of 3,296 kb and a maximum contig length of 10.7 Mb, from a total of 62 Gb of raw reads generated by single‐molecule real‐time sequencing on a PacBio Sequel platform. The PacBio data were further clustered into chromosome‐scale scaffolds using a Hi‐C approach, resulting in a 373 Mb genome assembly with a contig N50 length of 15.2 Mb and a maximum length of 28 Mb. When we directly compared the 22 longest scaffolds of T. obscurus with the 22 chromosomes of the tiger puffer Takifugu rubripes, a clear one‐to‐one orthologous relationship was observed between the two species, supporting the chromosome‐level assembly of T. obscurus. This genome assembly can serve as a valuable genetic resource for exploring fugu‐specific compact genome characteristics and will provide essential genomic information for understanding molecular adaptations to salinity fluctuations and the evolution of osmoregulatory mechanisms.
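One simple way to check a scaffold-to-chromosome correspondence, assuming orthologous gene pairs between T. obscurus scaffolds and T. rubripes chromosomes have already been identified (for example by reciprocal best hits), is to assign each scaffold to the chromosome holding most of its orthologs and confirm that no chromosome is claimed twice. This is an illustrative sketch with hypothetical names, not the authors' pipeline:

```python
from collections import Counter, defaultdict


def best_chromosome_per_scaffold(ortholog_pairs):
    """Assign each query scaffold to the reference chromosome holding most of its orthologs."""
    counts = defaultdict(Counter)
    for scaffold, chrom in ortholog_pairs:
        counts[scaffold][chrom] += 1
    return {scaffold: c.most_common(1)[0][0] for scaffold, c in counts.items()}


def is_one_to_one(assignment):
    """True when no two scaffolds were assigned to the same chromosome."""
    return len(set(assignment.values())) == len(assignment)


# Hypothetical (scaffold, chromosome) ortholog pairs.
pairs = [("Sc1", "chr1"), ("Sc1", "chr1"), ("Sc1", "chr5"),
         ("Sc2", "chr2"), ("Sc2", "chr2")]
assignment = best_chromosome_per_scaffold(pairs)
print(assignment, is_one_to_one(assignment))
```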

8.
As a newly developed vector system, the bacterial artificial chromosome (BAC) has been used widely in constructing genomic libraries and in generating transgenic animals. Isolation of the BAC insert end is useful for analysing a BAC clone. Here, we describe a fast and efficient method to obtain the BAC end by ligating BAC fragments digested with NotI and another selected restriction enzyme into a universal cloning vector, followed by identifying the correct clones by HindIII digestion. DNA sequencing analysis further confirmed these results.

9.
10.
The European rabbit (Oryctolagus cuniculus) is a domesticated species with one of the broadest ranges of economic and scientific applications and fields of investigation. Rabbit genome information and an assembly are available (oryCun2.0), but so far few studies have investigated its variability, and no large-scale discovery of polymorphisms has yet been published for this species. Here, we sequenced two reduced representation libraries (RRLs) to identify single nucleotide polymorphisms (SNPs) in the rabbit genome. Genomic DNA of 10 rabbits belonging to different breeds was pooled and digested with two restriction enzymes (HaeIII and RsaI) to create two RRLs, which were sequenced using the Ion Torrent Personal Genome Machine. The two RRLs produced 2 917 879 and 4 046 871 reads, for a total of 280.51 Mb (248.49 Mb with quality >20) and 417.28 Mb (360.89 Mb with quality >20) of sequenced DNA, respectively. About 90% and 91%, respectively, of the obtained reads were mapped on the rabbit genome, covering a total of 15.82% of the oryCun2.0 genome version. The mapping and ad hoc filtering procedures allowed us to reliably call 62 491 SNPs. SNPs in a few genomic regions were validated by Sanger sequencing. The Variant Effect Predictor Web tool was used to map SNPs on the current version of the rabbit genome. The obtained results will be useful for many applied and basic research programs for this species and will contribute to the development of cost‐effective solutions for high‐throughput SNP genotyping in the rabbit.
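A reduced representation library samples only the restriction fragments that survive size selection, so the fraction of a genome it covers can be estimated in silico. A sketch assuming complete digestion and a caller-chosen size window (the abstract does not state the window used); HaeIII cuts GG^CC and RsaI cuts GT^AC, both leaving blunt ends:

```python
import re

# Recognition site and cut position within the site for each blunt cutter.
ENZYMES = {"HaeIII": ("GGCC", 2), "RsaI": ("GTAC", 2)}


def digest(seq, enzyme):
    """Fragment lengths after complete digestion of seq with one enzyme."""
    site, offset = ENZYMES[enzyme]
    # Both sites are palindromic, so scanning one strand finds every site;
    # the lookahead also catches overlapping occurrences.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fraction_in_window(seq, enzyme, lo, hi):
    """Fraction of the sequence lying in fragments of lo..hi bp (the size-selected RRL)."""
    kept = sum(n for n in digest(seq, enzyme) if lo <= n <= hi)
    return kept / len(seq)


toy = "AAGGCCTTTTGTACAA"
print(digest(toy, "HaeIII"), digest(toy, "RsaI"))  # [4, 12] [12, 4]
```

Run on a real reference such as oryCun2.0 with the actual size window, the same calculation would give the expected representation to compare with the 15.82% observed.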

11.
To isolate genes of interest in plants, it is essential to construct bacterial artificial chromosome (BAC) libraries from specific genotypes. Construction and organisation of BAC libraries are laborious and costly, especially for organisms with large and complex genomes. In the present study, we developed a pooled BAC library strategy that allows rapid, low-cost generation and screening of genomic libraries from any genotype of interest. The BAC library is constructed, directly organised into a few pools and screened for BAC clones of interest using PCR and hybridisation steps, without requiring organisation into individual clones. As a proof of concept, a pooled BAC library of approximately 177,000 recombinant clones was constructed from the barley cultivar Cebada Capa, which carries the Rph7 leaf rust resistance gene. The library has an average insert size of 140 kb and a coverage of six barley genome equivalents, and is organised in 138 pools of about 1,300 clones each. We rapidly established a single contig of six BAC clones spanning 230 kb at the Rph7 locus on chromosome 3HS. The described low-cost cloning strategy is fast and will greatly facilitate direct targeting of genes and large-scale intra- and inter-species comparative genome analysis. Edwige Isidore and Beatrice Scherrer contributed equally to the work.

12.
Bivalves, a highly diverse and the most evolutionarily successful class of invertebrates native to aquatic habitats, provide valuable molecular resources for understanding evolutionary adaptation and aquatic ecology. Here, we report a high‐quality chromosome‐level genome assembly of the razor clam Sinonovacula constricta using Pacific Biosciences single‐molecule real‐time sequencing, Illumina paired‐end sequencing, 10X Genomics linked reads and Hi‐C reads. The genome size was 1,220.85 Mb, with a scaffold N50 of 65.93 Mb and a contig N50 of 976.94 kb. A total of 899 complete (91.92%) and seven partial (0.72%) matches to the 978 metazoa Benchmarking Universal Single‐Copy Orthologs were found in this genome assembly. Hi‐C scaffolding of the genome resulted in 19 pseudochromosomes. A total of 28,594 protein‐coding genes were predicted in the S. constricta genome, of which 25,413 (88.88%) were functionally annotated. In addition, 39.79% of the assembled genome was composed of repetitive sequences, and 4,372 noncoding RNAs were identified. Enrichment analyses of the significantly expanded and contracted genes suggested an evolutionary adaptation of S. constricta to highly stressful living environments. In summary, the genomic resources generated in this work not only provide a valuable reference genome for investigating the molecular mechanisms of S. constricta biological functions and evolutionary adaptation, but also facilitate its genetic improvement and disease treatment. The genome also greatly improves our understanding of the genetics of molluscs and their comparative evolution.

13.
Research in evolutionary biology involving nonmodel organisms is rapidly shifting from traditional molecular markers such as mtDNA and microsatellites to higher-throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double‐digest RAD sequencing on Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single‐end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool for processing RAD sequencing data. On average, we obtained ~11 000 polymorphic loci per library of 6–30 individuals. We validated our new method by technical and biological replication, by reconstructing phylogenetic relationships, and by using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between the sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used on Ion semiconductor sequencing platforms for the rapid and cost‐effective generation of variable and reproducible genetic markers.
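The filtering step can be pictured as keeping only loci that are scored in enough individuals and carry more than one allele. The sketch below is a hypothetical, simplified stand-in for such per-locus filters (the 80% call-rate threshold and the data layout are assumptions), not the Stacks implementation:

```python
def filter_loci(genotypes, min_call_rate=0.8):
    """Keep loci that are genotyped often enough and are polymorphic.

    genotypes: dict mapping locus ID -> list of per-individual calls, where each
    call is a tuple of two alleles or None if the individual was not genotyped.
    """
    kept = {}
    for locus, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        if not calls or len(observed) / len(calls) < min_call_rate:
            continue  # too much missing data
        alleles = {a for call in observed for a in call}
        if len(alleles) > 1:  # polymorphic across the sample
            kept[locus] = calls
    return kept


# Hypothetical loci scored in five individuals.
example = {
    "L1": [("A", "A"), ("A", "G"), None, ("G", "G"), ("A", "A")],  # 80% called, polymorphic
    "L2": [("C", "C")] * 5,                                        # monomorphic
    "L3": [("T", "C"), None, None, ("T", "T"), None],              # 40% called
}
print(sorted(filter_loci(example)))  # ['L1']
```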

14.
A first-generation clone-based physical map of the bovine genome was constructed by combining fluorescent double-digestion fingerprinting and sequence tagged site (STS) marker screening. The BAC clones were selected from an INRA BAC library (105 984 clones) and part of the CHORI-240 BAC library (26 500 clones). The contigs were anchored using screening information for a total of 1303 markers (451 microsatellites, 471 genes, 127 ESTs, and 254 BAC ends). The final map, which consists of 6615 contigs assembled from 100 923 clones, will be a valuable tool for genomic research in ruminants, including targeted marker production, positional cloning or targeted sequencing of regions of specific interest.

15.
Powdery mildew of wheat (Triticum aestivum L.) is caused by the ascomycete fungus Blumeria graminis f.sp. tritici. Genomic approaches open new ways to study the biology of this obligate biotrophic pathogen. We started the analysis of the Bg tritici genome with low-pass sequencing using 454 technology and the construction of the first genomic bacterial artificial chromosome (BAC) library for this fungus. High-coverage contigs were assembled from the 454 reads. They allowed the characterization of 56 transposable elements and the establishment of the Blumeria repeat database. The BAC library contains 12,288 clones with an average insert size of 115 kb, which represents a maximum of 7.5-fold genome coverage. Sequencing of the BAC ends generated 12.6 Mb of random sequence representative of the genome. Analysis of BAC-end sequences revealed a massive invasion of transposable elements accounting for at least 85% of the genome. This explains the unusually large size of this genome, which we estimate to be at least 174 Mb based on a large-scale physical map constructed by fingerprinting the BAC library. Our study represents a crucial step towards determining and studying the whole Bg tritici genome sequence.

16.
A bacterial artificial chromosome (BAC) library of common carp Cyprinus carpio L. was constructed as part of the ongoing common carp genome project, which aims to assemble the common carp genome. The library, containing a total of 92,160 BAC clones with an average insert size of 141 kb, was constructed at the HindIII restriction site of the BAC vector CopyControl pCC1BAC and covers 7.7 haploid genome equivalents. Three-dimensional pools and superpools of the BAC library were established, and 23 positive clones for 14 targets were identified from one-fifth of the library. A pilot BAC end sequencing project was conducted on 2,688 BAC ends from 1,344 clones and yielded 2,522 high-quality (Q20) sequences with an average length of 677 bp. The sequencing success rate was 93.8% and the paired-end success rate was 92.3%. A total of 212 microsyntenies were established between the common carp and zebrafish genomes as a trial for genome-wide comparative genomics in these two closely related species.
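In a three-dimensional pooling scheme, each clone is represented in one plate pool, one row pool and one column pool, so a positive PCR signal in all three localises it. A sketch assuming 384-well plates with rows A to P and columns 1 to 24 (the abstract gives the clone count but not the plate layout; 92,160 clones would fill 240 such plates):

```python
from itertools import product


def candidate_clones(positive_plates, positive_rows, positive_cols):
    """Every (plate, row, column) address consistent with the 3D pool hits.

    A clone is a candidate only if its plate, row and column pools all tested
    positive; with several positive clones the product also contains false
    candidates that must be confirmed individually.
    """
    return list(product(sorted(positive_plates), sorted(positive_rows), sorted(positive_cols)))


def address_to_index(plate, row, col, n_rows=16, n_cols=24):
    """Convert a 1-based (plate, row letter, column) address to a 0-based clone index."""
    row_idx = ord(row.upper()) - ord("A")
    return (plate - 1) * n_rows * n_cols + row_idx * n_cols + (col - 1)


# Two positive plates and two positive columns give four candidate addresses.
print(candidate_clones({12, 40}, {"C"}, {7, 19}))
# [(12, 'C', 7), (12, 'C', 19), (40, 'C', 7), (40, 'C', 19)]
```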

17.
Despite recent advances in high‐throughput sequencing, difficulties are often encountered when developing microsatellites for species with large and complex genomes. This probably reflects the close association in many species of microsatellites with cryptic repetitive elements. We therefore developed a novel approach for isolating polymorphic microsatellites from the club‐legged grasshopper (Gomphocerus sibiricus), an emerging quantitative genetic and behavioral model system. Whole genome shotgun Illumina MiSeq sequencing was used to generate over three million 300 bp paired‐end reads, of which 67.75% were grouped into 40,548 clusters within RepeatExplorer. Annotations of the top 468 clusters, which represent 60.5% of the reads, revealed homology to satellite DNA and a variety of transposable elements. Evaluating 96 primer pairs in eight wild‐caught individuals, we found that primers mined from singleton reads were six times more likely to amplify a single polymorphic microsatellite locus than primers mined from clusters. Our study provides experimental evidence in support of the notion that microsatellites associated with repetitive elements are less likely to successfully amplify. It also reveals how advances in high‐throughput sequencing and graph‐based repetitive DNA analysis can be leveraged to isolate polymorphic microsatellites from complex genomes.
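Mining microsatellites from reads amounts to scanning for short motifs repeated in tandem. A minimal regular-expression sketch; the 2 to 6 bp motif range and the five-repeat minimum are assumptions, and the RepeatExplorer clustering that separates singleton from clustered reads is not reproduced:

```python
import re

# Motif of 2-6 bp repeated at least MIN_REPEATS times in total (hypothetical threshold).
MIN_REPEATS = 5
# The lazy quantifier tries the shortest motif first, so ATATAT... is reported as "AT".
SSR = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_microsatellites(read):
    """Yield (start, motif, repeat_count) for perfect tandem repeats in a read."""
    for m in SSR.finditer(read.upper()):
        motif = m.group(1)
        # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA").
        if re.fullmatch(r"(.+?)\1+", motif):
            continue
        yield m.start(), motif, len(m.group(0)) // len(motif)


read = "GGTT" + "CA" * 8 + "TTGA" + "AAT" * 6 + "GC"
print(list(find_microsatellites(read)))  # [(4, 'CA', 8), (24, 'AAT', 6)]
```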

18.
A complete and high‐quality genome reference sequence of an organism provides a solid foundation for a wide research community and determines the outcomes of relevant genomic, genetic, molecular and evolutionary research. Rice is an important food crop and a model plant for grasses, and was therefore the first crop plant chosen for whole genome sequencing. The genome of the representative japonica rice variety, Nipponbare, was sequenced using the gold‐standard, map‐based clone‐by‐clone strategy. However, although the Nipponbare reference sequence (RefSeq) is of the highest quality among existing crop genome sequences, it still contains many assembly errors and gaps. To improve the Nipponbare RefSeq, a robust method for detecting hidden assembly errors is first required. By aligning the BAC‐end sequences (BESs) embedded in the Nipponbare bacterial artificial chromosome (BAC) physical map to the Nipponbare RefSeq, we detected locations on the RefSeq that were inversely matched by BESs and could therefore be candidates for spurious assembly inversions. We further analysed five potential locations and confirmed assembly errors at those locations; four of them, two on chr4 and two on chr11 of the Nipponbare RefSeq (IRGSP build 5), were found to be caused by reverse repetitive sequences flanking the locations. Our approach is effective in detecting spurious inversions in the Nipponbare RefSeq and can be applied to improve the sequence quality of other genomes as well.
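The reasoning behind using paired BAC-end sequences is that the two ends of one clone should align to the reference on opposite strands, facing each other, at a distance close to the insert size; if one end lies inside a mis-assembled inverted segment, both ends align on the same strand. A sketch of that classification, with hypothetical insert-size bounds and category names:

```python
def classify_bes_pair(pos_a, strand_a, pos_b, strand_b, min_insert=50_000, max_insert=300_000):
    """Classify one BAC-end pair aligned to the same reference chromosome.

    Positions are alignment starts; strands are "+" or "-". A concordant pair
    points inward ("+" end upstream of the "-" end) with a span consistent with
    BAC insert sizes. Same-strand pairs are the signature of an inversion
    relative to the reference.
    """
    if strand_a == strand_b:
        return "inversion_candidate"
    (left_pos, left_strand), (right_pos, _) = sorted([(pos_a, strand_a), (pos_b, strand_b)])
    if left_strand != "+":
        return "outward_facing"
    span = right_pos - left_pos
    return "concordant" if min_insert <= span <= max_insert else "size_discordant"


print(classify_bes_pair(1_000, "+", 151_000, "-"))  # concordant
print(classify_bes_pair(1_000, "+", 151_000, "+"))  # inversion_candidate
```

Locations where several independent clones are flagged as inversion candidates would then be the ones worth checking by hand, as done for the five locations in the abstract.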

19.
Rice is a leading grain crop and the staple food for over half of the world population. Rice is also an ideal species for genetic and biological studies of cereal crops and other monocotyledonous plants because of its small genome and well-developed genetic system. To facilitate rice genome analysis leading to physical mapping, the identification of molecular markers closely linked to economic traits, and map-based cloning, we have constructed two rice bacterial artificial chromosome (BAC) libraries from the parents (Lemont and Teqing) of a permanent mapping population consisting of 400 F9 recombinant inbred lines (RILs). Lemont (japonica) and Teqing (indica) represent the two major genomes of cultivated rice; both are leading commercial varieties and widely used germplasm in rice breeding programs. The Lemont library contains 7296 clones with an average insert size of 150 kb, which represents 2.6 rice haploid genome equivalents. The Teqing library contains 14208 clones with an average insert size of 130 kb, which represents 4.4 rice haploid genome equivalents. Three single-copy DNA probes were used to screen the libraries, and at least two overlapping BAC clones, ranging from 45 to 260 kb in insert size, were isolated with each probe from each library. Hybridization of BAC clones with chloroplast DNA probes and fluorescent in situ hybridization using BAC DNA as probes demonstrated that both libraries contain very few clones of chloroplast DNA origin and are likely free of chimeric clones. These data indicate that both BAC libraries should be suitable for map-based cloning of rice genes and physical mapping of the rice genome.
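The genome-equivalent figures follow from the number of clones N, the average insert size Ī and the haploid genome size G. Working backwards from the numbers quoted, both libraries imply G of roughly 420 Mb; the abstract does not state the genome size used, so this back-calculation is an inference:

```latex
\text{genome equivalents} = \frac{N\,\bar{I}}{G}
\;\Rightarrow\;
G_{\text{Lemont}} \approx \frac{7296 \times 150\ \text{kb}}{2.6} \approx 421\ \text{Mb},
\qquad
G_{\text{Teqing}} \approx \frac{14\,208 \times 130\ \text{kb}}{4.4} \approx 420\ \text{Mb}
```

The two libraries agree on the implied genome size, which is a quick consistency check on the reported coverages.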

20.
High‐throughput sequencing has revolutionized population and conservation genetics. RAD sequencing methods, such as 2b‐RAD, can be used on species lacking a reference genome. However, transferring protocols across taxa can potentially lead to poor results. We tested two different type IIB restriction enzymes (AlfI and CspCI) on two species with different genome sizes (the loggerhead turtle Caretta caretta and the sharpsnout seabream Diplodus puntazzo) to build a set of guidelines to improve 2b‐RAD protocols on non‐model organisms while optimising costs. Good results were obtained even with degraded samples, showing the value of 2b‐RAD in studies with poor DNA quality. However, library quality was found to be a critical parameter for the number of reads and loci obtained for genotyping. Resampling analyses with different numbers of reads per individual showed a trade‐off between the number of loci and the number of reads per sample. The resulting accumulation curves can be used as a tool to calculate the number of sequences per individual needed to reach a mean depth of ≥20 reads for good genotyping results. Finally, we demonstrated that selective‐base ligation does not affect genomic differentiation between individuals, indicating that this technique can be used in species with large genome sizes to adjust the number of loci to the study scope, reduce sequencing costs and maintain suitable sequencing depth for reliable genotyping without compromising the results. Here, we provide a set of guidelines to improve 2b‐RAD protocols on non‐model organisms with different genome sizes, helping decision‐making for reliable and cost‐effective genotyping.
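Because type IIB enzymes cut on both sides of their recognition site, every site yields one short tag, so the number of loci, and from it a rough sequencing budget per individual, can be estimated from a site count. A sketch using the AlfI site GCA(N6)TGC (CspCI is omitted); the factor of 1/4 per fully specified selective base and the neglect of reads lost to filtering are simplifying assumptions, so the read count is a lower bound:

```python
import re

# AlfI recognition site GCA(N6)TGC; it is its own reverse complement,
# so scanning one strand finds every site. The lookahead allows overlaps.
ALFI = re.compile(r"(?=GCA[ACGT]{6}TGC)")


def count_alfi_sites(seq):
    """Count AlfI recognition sites, each of which yields one 2b-RAD tag."""
    return sum(1 for _ in ALFI.finditer(seq.upper()))


def expected_loci(n_sites, selective_bases=0):
    """Loci retained when adaptors select on fully specified bases (1/4 per base, assumed)."""
    return n_sites * 0.25 ** selective_bases


def reads_needed(n_loci, target_depth=20):
    """Minimum reads per individual for the target mean depth per locus."""
    return n_loci * target_depth


# Hypothetical genome with 10,000 AlfI sites and two selective bases.
loci = expected_loci(10_000, selective_bases=2)
print(loci, reads_needed(loci))  # 625.0 12500.0
```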


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP No. 09084417)