Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
In the finishing phase of the Chromobacterium violaceum genome project, the shotgun sequences were assembled into 57 contigs that were then organized into 19 scaffolds, using the information from shotgun and cosmid clones. Among the 38 ends resulting from the 19 scaffolds, 10 ended with sequences corresponding to rRNA genes (seven ended with the 5S rRNA gene and three ended with the 16S rRNA gene). The 28 non-ribosomal ends were extended using the PCR-assisted contig extension (PACE) methodology, which immediately closed 15 real gaps. We then applied PACE to the 16S rRNA gene-containing ends, resulting in eight different sequences that were correctly assembled within the C. violaceum genome by a combinatory PCR strategy, with primers derived from the non-repetitive genomic region flanking the 16S and 5S rRNA genes. An oriented combinatory PCR was used to correctly position the two versions (copy A and copy B, which differ by the presence or absence of a 100-bp insert); it revealed six copies corresponding to copy A, and two to copy B. We estimate that the use of PACE, followed by combinatory PCR, accelerated the finishing phase of the C. violaceum genome project by at least 40%.

2.
3.
Large-scale genomic sequencing projects generally rely on random sequencing of shotgun clones, followed by different gap closing strategies. To reduce the overall effort and cost of those projects and to accelerate the sequencing throughput, we have developed an efficient, high throughput oligonucleotide fingerprinting protocol to select optimal shotgun clone sets prior to sequencing. Both computer simulations and experimental results, obtained from five PAC-derived shotgun libraries spanning 535 kb of the 17p11.2 region of the human genome, demonstrate that at least a 2-fold reduction in the number of sequence reads required to sequence an individual genomic clone (cosmid, PAC, etc.) can be achieved. Treatment of clone contigs with significant clone overlaps will allow an even greater reduction.

4.
PRIMO is a computer program that designs walking primers for large-scale DNA sequencing projects. Oligonucleotide primers are predicted automatically, using quality information associated with each base call, eliminating the need for manually viewing the sequence traces or inspecting contig assemblies to determine appropriate locations for primer design. This allows PRIMO to run in batch mode on an arbitrarily large number of templates. For shotgun sequencing, PRIMO reads assembled sequence contigs with corresponding base quality statistics and automatically designs walking primers as needed to extend and join contigs, or improve their overall quality. At the opposite extreme of single-pass or completely directed sequencing, PRIMO reads the unassembled sequence for each template and designs walking primers for extending each read. If the base-calling software does not provide base quality statistics, PRIMO assigns its own measure of base quality determined by the shapes of individual peaks in the trace data for each template. In this way, PRIMO can be used in the finishing stages of a shotgun sequencing project, in sequencing by directed primer walking, or in some intermediate strategy. The code is written in ANSI C and maintained in two versions: one for the Macintosh and the other for UNIX.

5.
A new approach to sequencing and assembling a highly heterozygous genome, that of grape, species Vitis vinifera cv Pinot Noir, is described. Combining shotgun paired reads produced by Sanger sequencing with unpaired reads produced by sequencing by synthesis was shown to be an efficient procedure for decoding a complex genome. About 2 million SNPs and more than a million heterozygous gaps have been identified in the 500 Mb genome of grape. More than 91% of the sequence assembled into 58,611 contigs is now anchored to the 19 linkage groups of V. vinifera.

6.
Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.
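To make the idea of aligning contigs to a restriction map concrete, the following minimal sketch digests a contig in silico and returns its ordered fragment lengths, which is the kind of pattern that could be compared against an optical map. The function name `restriction_fragments`, the toy sequence, and the choice of EcoRI (site GAATTC, cut after the first base) are illustrative assumptions and are not taken from the abstract.

```python
def restriction_fragments(seq: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return ordered fragment lengths from digesting a linear sequence.

    seq        -- DNA sequence of the contig (hypothetical input)
    site       -- recognition site of the enzyme, e.g. "GAATTC"
    cut_offset -- position of the cut within the site (1 for G^AATTC)
    """
    seq = seq.upper()
    site = site.upper()

    # Collect every cut position, allowing overlapping site occurrences.
    cuts = []
    start = 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1

    # Convert cut positions into consecutive fragment lengths.
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy contig with two EcoRI sites; real contigs would be read from FASTA.
toy_contig = "AAGAATTCTTTTGAATTCCC"
print(restriction_fragments(toy_contig, "GAATTC", cut_offset=1))  # [3, 10, 7]
```

An ordered list of fragment lengths is orientation-sensitive, so comparing both the list and its reverse against the map would be needed to decide contig orientation.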

7.
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

8.
Common bean (Phaseolus vulgaris L.) is a legume that is an important source of dietary protein in developing countries throughout the world. Utilizing the G19833 BAC library for P. vulgaris from Clemson University, 89,017 BAC-end sequences were generated giving 62,588,675 base pairs of genomic sequence covering approximately 9.54% of the genome. Analysis of these sequences in combination with 1,404 shotgun sequences from the cultivar Bat7 revealed that approximately 49.2% of the genome contains repetitive sequence and 29.3% is genic. Compared to other legume BAC-end sequencing projects, it appears that P. vulgaris has higher predicted levels of repetitive sequence, but this may be due to a more intense identification strategy combining both similarity-based matches as well as de novo identification of repeats. In addition, fingerprints for 41,717 BACs were obtained and assembled into a draft physical map consisting of 1,183 clone contigs and 6,385 singletons with ~9x coverage of the genome.
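The coverage figures in this abstract imply a genome size and a mean read length that the abstract does not state directly. The short calculation below is a back-of-the-envelope sketch of that arithmetic; the function name `genome_size_from_coverage` is hypothetical, and the result is only as accurate as the rounded 9.54% figure.

```python
def genome_size_from_coverage(sequenced_bp: int, fraction_covered: float) -> float:
    """Estimate genome size from sequenced bases and the fraction they cover."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return sequenced_bp / fraction_covered


# Figures quoted in the abstract.
bes_bp = 62_588_675      # total base pairs of BAC-end sequence
bes_reads = 89_017       # number of BAC-end sequences
coverage = 0.0954        # approximately 9.54% of the genome

implied_size = genome_size_from_coverage(bes_bp, coverage)
mean_read_length = bes_bp / bes_reads

print(f"implied genome size: {implied_size / 1e6:.0f} Mb")        # about 656 Mb
print(f"mean BAC-end read length: {mean_read_length:.0f} bp")     # about 703 bp
```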

9.
Gap closure is a challenging phase in microbial random shotgun genome sequencing projects, particularly since genome assemblies are often complicated by the presence of repeat elements, insertion sequences and other similar factors that contribute to sequence misassemblies. While it is well recognized that the conservation of genetic information between microbial genomes, combined with the exponential increase in available microbial sequences, can be exploited to increase the efficiency of gap closure, we lack the computational tools to aid in this process. We describe here a new tool, MGView, which was developed to create a graphical depiction of the alignment of a set of microbial contigs against a completed microbial genome. The results of our assembly of the Staphylococcus aureus RF122 genome show that MGView enables a considerable reduction in time and economic cost associated with closure. Together, the results also show that the application of MGView not only enables a reduction in fold-coverage requirements of the random shotgun sequence phase, but also provides interesting insights into differences in gene content and organization between finished and unfinished microbial genomes.

10.
We have developed the software package Tomato and Potato Assembly Assistance System (TOPAAS), which automates the assembly and scaffolding of contig sequences for low-coverage sequencing projects. The order of contigs predicted by TOPAAS is based on read pair information; alignments between genomic, expressed sequence tags, and bacterial artificial chromosome (BAC) end sequences; and annotated genes. The contig scaffold is used by TOPAAS for automated design of nonredundant sequence gap-flanking PCR primers. We show that TOPAAS builds reliable scaffolds for tomato (Solanum lycopersicum) and potato (Solanum tuberosum) BAC contigs that were assembled from shotgun sequences covering the target at 6- to 8-fold coverage. More than 90% of the gaps are closed by sequence PCR, based on the predicted ordering information. TOPAAS also assists the selection of large genomic insert clones from BAC libraries for walking. For this, tomato BACs are screened by automated BLAST analysis and in parallel, high-density nonselective amplified fragment length polymorphism fingerprinting is used for constructing a high-resolution BAC physical map. BLAST and amplified fragment length polymorphism analysis are then used together to determine the precise overlap. Assembly onto the seed BAC consensus confirms the BACs are properly selected for having an extremely short overlap and largest extending insert. This method will be particularly applicable where related or syntenic genomes are sequenced, as shown here for the Solanaceae, and potentially useful for the monocots, Brassicaceae, and Leguminosae.

11.
PGAAS: a prokaryotic genome assembly assistant system   Total citations: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: In order to accelerate the finishing phase of genome assembly, especially for the whole genome shotgun approach of prokaryotic species, we have developed a software package designated prokaryotic genome assembly assistant system (PGAAS). The approach upon which PGAAS is based is to confirm the order of contigs and fill gaps between contigs through peptide links obtained by searching each contig end with BLASTX against protein databases. RESULTS: We used the contig dataset of the cyanobacterium Synechococcus sp. strain PCC7002 (PCC7002), which was sequenced with six-fold coverage and assembled using the Phrap package. The subject database is the protein database of the cyanobacterium Synechocystis sp. strain PCC6803 (PCC6803). We found more than 100 non-redundant peptide segments which can link at least 2 contigs. We tested one pair of linked contigs by sequencing and obtained a satisfactory result. PGAAS provides a graphic user interface to show the bridge peptides and pier contigs. We integrated Primer3 into our package to design PCR primers at the adjacent ends of the pier contigs. AVAILABILITY: We tested PGAAS on a Linux (Redhat 6.2) PC machine. It is developed with free software (MySQL, PHP and Apache). The whole package is distributed freely and can be downloaded as a UNIX compressed file: ftp://ftp.cbi.pku.edu.cn/pub/software/unix/pgaas1.0.tar.gz. The package is being continually updated.
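The peptide-link idea can be illustrated with a small sketch: if the ends of two different contigs both hit the same protein, that protein is a candidate "bridge" between two "pier" contigs. The code below is not PGAAS itself. It assumes the BLASTX results have already been reduced to a mapping from contig ends to protein identifiers, and the names `peptide_links`, `end_hits`, `protA`, and `protB` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def peptide_links(end_hits: dict) -> dict:
    """Find proteins that bridge the ends of two different contigs.

    end_hits maps (contig_id, end_label) -> set of protein IDs hit by
    that contig end (assumed to come from a prior BLASTX search).
    Returns a dict mapping a pair of contig ends -> set of bridging proteins.
    """
    # Invert the mapping: protein -> contig ends that hit it.
    ends_by_protein = defaultdict(set)
    for contig_end, proteins in end_hits.items():
        for protein in proteins:
            ends_by_protein[protein].add(contig_end)

    links = {}
    for protein, ends in ends_by_protein.items():
        for end_a, end_b in combinations(sorted(ends), 2):
            # Two ends of the same contig are not a link between contigs.
            if end_a[0] != end_b[0]:
                links.setdefault((end_a, end_b), set()).add(protein)
    return links


# Toy input: contig c1's right end and c2's left end share protA, and so on.
toy_hits = {
    ("c1", "right"): {"protA"},
    ("c2", "left"): {"protA", "protB"},
    ("c3", "left"): {"protB"},
    ("c1", "left"): {"protC"},
}
for pair, proteins in sorted(peptide_links(toy_hits).items()):
    print(pair, sorted(proteins))
```

A real workflow would also need the hit coordinates on the protein to decide contig order and orientation, which this sketch deliberately leaves out.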

12.
Kabir MA  Rustchenko E 《Gene》2005,345(2):279-287
We have adopted a method of telomere-mediated chromosome fragmentation in order to demonstrate the alignment of contigs and determination of gaps. We established the order and orientation of four contigs of Candida albicans chromosome 5 and determined the sizes of three gaps between these contigs. We confirmed this proposed alignment of contigs, as well as gap sizes, by sequencing one gap and analyzing three mega deletions of approximately 41 kbp, 58 kbp, and 77 kbp, which covered two other gaps. These gaps could also be conveniently sequenced, which is an important step in establishing a complete sequence. The combined length of contigs and gaps covered approximately 422 kbp, which is one third of chromosome 5. Telomere-mediated chromosome fragmentation, used here for the first time to align the contigs of C. albicans and determine the gaps, proved to be a reliable method. The method could be helpful in sequencing projects of other diploid organisms, in particular those in which centromeres have not been identified. In addition, our approach can be used to assign any contig to a chromosome, or to induce the loss of a specific chromosome.

13.
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented, one that minimizes pipetting and another that minimizes the number of reactions. The pipette-optimized multiplex PCR method has been employed in the final phases of closing the Streptococcus pneumoniae genome sequence, with excellent results.

14.

Background  

Recent high throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence.

15.

Background  

Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method.

16.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

17.
Fifty-four new markers were developed to fill in gaps in the current map of canine microsatellites and to complement existing markers that may not be sufficiently informative in highly inbred canine pedigrees. Canine genes contained on the radiation hybrid map were used to obtain the sequence of the human homolog. A BLAST search versus the canine whole genome shotgun (wgs) sequence resource was used to obtain the sequence of the canine genomic contigs containing the homolog of the corresponding human gene. Canine sequences that contained microsatellites and mapped back to the correct location in the human genome were used to design primers for amplification of the microsatellites from canine genomic DNA. Heterozygosities of the markers were tested by genotyping grandparental DNAs obtained from the Nestle Purina Reference family DNA distribution center plus DNAs from unrelated Bouviers and Irish wolfhounds. Canine map positions of markers on the July 2004 freeze of the canine genome assembly were determined by in silico PCR or BLAST.

18.

Background  

At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless.
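One simple way to obtain such a rough placement, sketched here and not taken from the abstract, is to group local matches by their dot-plot diagonal and report the reference span of the heaviest group. The sketch assumes forward-strand matches already parsed into `(contig_start, ref_start, length)` tuples; the function name `estimate_region` and the `diag_tol` parameter are hypothetical.

```python
def estimate_region(matches, diag_tol=50):
    """Estimate where a contig lies on the reference.

    matches  -- list of (contig_start, ref_start, length) local matches,
                all on the forward strand (an assumption of this sketch)
    diag_tol -- maximum diagonal shift between neighbouring matches that
                are still treated as part of the same region
    Returns (ref_start, ref_end) with ref_end exclusive, or None.
    """
    if not matches:
        return None

    def diagonal(match):
        # Matches on the same dot-plot diagonal share ref_start - contig_start.
        return match[1] - match[0]

    # Chain matches whose diagonals are close; small indels shift the diagonal.
    ordered = sorted(matches, key=diagonal)
    clusters = [[ordered[0]]]
    for match in ordered[1:]:
        if diagonal(match) - diagonal(clusters[-1][-1]) <= diag_tol:
            clusters[-1].append(match)
        else:
            clusters.append([match])

    # Keep the cluster with the most matched bases and report its span.
    best = max(clusters, key=lambda c: sum(m[2] for m in c))
    return min(m[1] for m in best), max(m[1] + m[2] for m in best)


# Three matches near one diagonal plus one stray repeat hit far away.
toy_matches = [(0, 1000, 200), (250, 1260, 300), (600, 1615, 150), (100, 9000, 120)]
print(estimate_region(toy_matches))  # (1000, 1765)
```

Choosing the heaviest diagonal cluster discards isolated repeat hits, at the cost of missing contigs that genuinely span a large rearrangement.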

19.
We propose a genome sequencing strategy, which is neither divide-and-conquer (clone by clone) nor the shotgun approach. Random PCR-based and PCR relay sequencing constitute the basis of this novel strategy. Most of the genome is sequenced by the former process that requires only a set of non-specific primers and a template DNA. Random PCR-based sequencing reduces redundancy in sequencing by exploiting known sequence information. The number of primers required for random PCR was significantly diminished by using a combination of primers. The former process can be partially replaced by the shotgun method, if necessary. The gap-filling process can be effectively performed by way of PCR relay. The feasibility of this strategy was demonstrated using the Escherichia coli genome. This strategy enhances the global effort towards genome sequencing by being available through the Internet and by allowing the use of preexisting sequence data.

20.
Chibana H  Oka N  Nakayama H  Aoyama T  Magee BB  Magee PT  Mikami Y 《Genetics》2005,170(4):1525-1537
The size of the genome in the opportunistic fungus Candida albicans is 15.6 Mb. Whole-genome shotgun sequencing was carried out at Stanford University where the sequences were assembled into 412 contigs. C. albicans is basically diploid, and analysis of the sequence is complicated due to repeated sequences and to sequence polymorphism between homologous chromosomes. Chromosome 7 is 1 Mb in size and the best characterized of the 8 chromosomes in C. albicans. We assigned 16 of the contigs, ranging in length from 7309 to 267,590 bp, to chromosome 7 and determined sequences of 16 regions. These regions included four gaps, a misassembled sequence, and two major repeat sequences (MRS) of >16 kb. The length of the continuous sequence attained was 949,626 bp and provided complete coverage of chromosome 7 except for telomeric regions. Sequence analysis was carried out and predicted 404 genes, 11 of which included at least one intron. A 7-kb indel, which might be caused by a retrotransposon, was identified as the largest difference between the homologous chromosomes. Synteny analysis revealed that the degree of synteny between C. albicans and Saccharomyces cerevisiae is too weak to use for completion of the genomic sequence in C. albicans.
