Similar literature
 Found 20 similar documents; search time 31 ms
1.
Automated restriction enzyme fingerprinting of 7900 cosmids from chromosome 19 and calculation of the likelihood of their overlap based on shared fragments have resulted in the assembly of 743 sets of overlapping cosmids (contigs). We have mapped 22% of the formed contigs (n = 165) and all of the contigs with minimal tiling paths exceeding 6 members (n = 50) to chromosomal bands by fluorescence in situ hybridization using DNA from at least one member cosmid. The estimated average size of the formed contigs is 60-70 kb. Thus, members of a correctly formed contig are expected to lie close to each other in metaphase and interphase chromatin. Therefore, we tested the contig assembly process by comparing the band assignment of two or more members selected from each of 97 contigs. Forty-two of these contigs were further characterized for valid assembly by determining the proximity of members in interphase chromatin. Using these tests, we surveyed a total of 431 joins counted along the minimal tiling path (280 in interphase as well as metaphase) and found 6 erroneous joins, one in each of 6 contigs (6% of tested).

2.
Despite the power of massively parallel sequencing platforms, a drawback is the short length of the sequence reads produced. We demonstrate that short reads can be locally assembled into longer contigs using paired-end sequencing of restriction-site associated DNA (RAD-PE) fragments. We use this RAD-PE contig approach to identify single nucleotide polymorphisms (SNPs) and determine haplotype structure in threespine stickleback and to sequence E. coli and stickleback genomic DNA with overlapping contigs of several hundred nucleotides. We also demonstrate that adding a circularization step allows the local assembly of contigs up to 5 kilobases (kb) in length. The ease of assembly and accuracy of the individual contigs produced from each RAD site sequence suggests RAD-PE sequencing is a useful way to convert genome-wide short reads into individually-assembled sequences hundreds or thousands of nucleotides long.

3.
A contig assembly program based on sensitive detection of fragment overlaps.   Total citations: 23, self-citations: 0, citations by others: 23
X Huang 《Genomics》1992,14(1):18-25
An effective computer program for assembling DNA fragments, the contig assembly program (CAP), has been developed. In the CAP program, a filter is used to quickly eliminate fragment pairs that could not possibly overlap, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each remaining pair of fragments, and a simple greedy approach is employed to assemble fragments in order of alignment scores. To identify the true fragment overlaps, the dynamic programming algorithm uses specially chosen sets of alignment parameters to tolerate sequencing errors and to penalize "mutational" changes between different copies of a repetitive sequence. The performance tests of the program on fragment data from genomic sequencing projects produced satisfactory results. The CAP program is efficient in computer time and memory; it took about 4 h to assemble a set of 1015 fragments into long contigs on a Sun workstation.
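
The overlap-scoring step described in this abstract can be illustrated with a minimal sketch. It is not the CAP implementation: the function name `overlap_score` and the scoring values (match +2, mismatch -3, gap -4) are assumptions chosen only for illustration, and the filter and greedy assembly steps are omitted. The sketch scores the best alignment of a suffix of one fragment against a prefix of another by dynamic programming.

```python
def overlap_score(a: str, b: str, match: int = 2, mismatch: int = -3, gap: int = -4) -> int:
    """Best score of aligning some suffix of `a` with some prefix of `b`.

    Scoring values are illustrative assumptions, not CAP's parameters.
    """
    # Row 0: a prefix of `b` aligned against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0 stays 0: skipping a leading part of `a` is free.
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # The alignment must reach the end of `a` but may stop anywhere in `b`.
    return max(prev)


if __name__ == "__main__":
    # The last four bases of the first fragment match the first four of the second.
    print(overlap_score("ACGTACGT", "ACGTTTTT"))
```

A greedy assembler in the spirit of the abstract would compute such scores for every candidate pair that survives the filter and merge pairs in decreasing order of score.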

4.
Ustilago maydis, a basidiomycete, is a model organism among phytopathogenic fungi. A physical map of U. maydis strain 521 was developed from bacterial artificial chromosome (BAC) clones. BAC fingerprints used polyacrylamide gel electrophoresis to separate restriction fragments. Fragments were labeled at the HindIII site and co-digested with HaeIII to reduce fragments to 50-750 bp. Contiguous overlapping sets of clones (contigs) were assembled at nine stringencies (from P ≤ 1 × 10^-6 to 1 × 10^-24). Each assembly nucleated contigs with different percentages of bands overlapping between clones (from 20% to 97%). The number of clones per contig decreased linearly from 41 to 12 from P ≤ 1 × 10^-7 to 1 × 10^-12. The number of separate contigs increased from 56 to 150 over the same range. A hybridization-based physical map of the same BAC clones was compared with the fingerprint contigs built at P ≤ 1 × 10^-7. The two methods provided consistent physical maps that were largely validated by genome sequence. The combined hybridization and fingerprint physical map provided a minimum tile path composed of 258 BAC clones (18-20 Mbp) distributed among 28 merged contigs. The genome of U. maydis was estimated to be 20.5 Mbp by pulsed-field gel electrophoresis and 24 Mbp by BAC fingerprints. There were 23 separate chromosomes inferred by both pulsed-field gel electrophoresis and fingerprint contigs. Only 11 of the tile path BAC clones contained recognizable centromere, telomere, and subtelomere repeats (high-copy DNA), suggesting that repeats caused some false merges. There were 247 tile path BAC clones that encompassed about 17.5 Mbp of low-copy DNA sequence. BAC clones are available for repeat and unique gene cluster analysis including tDNA-mediated transformation. Program FingerPrint Contigs maps aligned with each chromosome can be viewed at http://www.siu.edu/~meksem/ustilago_maydis/.

5.
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
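
To make the idea in this abstract concrete, here is a toy sketch under idealised assumptions (error-free sequence, mate k-mers separated by an exactly known distance `d`). The function names, the example sequence, and the values of `k` and `d` are all hypothetical; this is not the authors' implementation. It compares branching in an ordinary de Bruijn graph with branching in a graph whose nodes are pairs of (k-1)-mers.

```python
from collections import defaultdict


def de_bruijn(seq: str, k: int) -> dict:
    """Ordinary de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def paired_kmers(seq: str, k: int, d: int) -> list:
    """All (a, b) k-mer pairs where b starts exactly d positions after a."""
    return [(seq[i:i + k], seq[i + d:i + d + k])
            for i in range(len(seq) - d - k + 1)]


def paired_de_bruijn(pairs: list) -> dict:
    """Paired graph: each edge joins the pair of prefixes to the pair of suffixes."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[(a[:-1], b[:-1])].add((a[1:], b[1:]))
    return graph


def max_out_degree(graph: dict) -> int:
    return max(len(successors) for successors in graph.values())


if __name__ == "__main__":
    seq = "AAGCTTTGCTA"  # "GCT" occurs twice, followed by different bases
    print(max_out_degree(de_bruijn(seq, 3)))                        # branching present
    print(max_out_degree(paired_de_bruijn(paired_kmers(seq, 3, 4))))  # branching removed
```

In this toy case the repeated k-mer creates a branch in the ordinary graph, while the mate k-mer attached to each occurrence tells the two copies apart in the paired graph.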

6.
We have developed an automated, high-throughput fingerprinting technique for large genomic DNA fragments suitable for the construction of physical maps of large genomes. In the technique described here, BAC DNA is isolated in a 96-well plate format and simultaneously digested with four 6-bp-recognizing restriction endonucleases that generate 3' recessed ends and one 4-bp-recognizing restriction endonuclease that generates a blunt end. Each of the four recessed 3' ends is labeled with a different fluorescent dye, and restriction fragments are sized on a capillary DNA analyzer. The resulting fingerprints are edited with a fingerprint-editing computer program and contigs are assembled with the FPC computer program. The technique was evaluated by repeated fingerprinting of several BACs included as controls in plates during routine fingerprinting of a BAC library and by reconstruction of contigs of rice BAC clones with known positions on rice chromosome 10.

7.
In physical mapping, one orders a set of genetic landmarks or a library of cloned fragments of DNA according to their position in the genome. Our approach to physical mapping divides the problem into smaller and easier subproblems by partitioning the probe set into independent parts (probe contigs). For this purpose we introduce a new distance function between probes, the averaged rank distance (ARD) derived from bootstrap resampling of the raw data. The ARD measures the pairwise distances of probes within a contig and smooths the distances of probes across different contigs. It shows distinct jumps at contig borders. This makes it appropriate for contig selection by clustering. We have designed a physical mapping algorithm that makes use of these observations and seems to be particularly well suited to the delineation of reliable contigs. We evaluated our method on data sets from two physical mapping projects. On data from the recently sequenced bacterium Xylella fastidiosa, the probe contig set produced by the new method was evaluated using the probe order derived from the sequence information. Our approach yielded a basically correct contig set. On these data we also compared our method to an approach which uses the number of supporting clones to determine contigs. Our map is much more accurate. In comparison to a physical map of Pasteurella haemolytica that was computed using simulated annealing, the newly computed map is considerably cleaner. The results of our method have already proven helpful for the design of experiments aimed at further improving the quality of a map.

8.
A BAC-based physical map of the channel catfish genome   Total citations: 3, self-citations: 0, citations by others: 3
Xu P  Wang S  Liu L  Thorsen J  Kucuktas H  Liu Z 《Genomics》2007,90(3):380-388
Catfish is the major aquaculture species in the United States. To enhance its genome studies involving genetic linkage and comparative mapping, a bacterial artificial chromosome (BAC) contig-based physical map of the channel catfish (Ictalurus punctatus) genome was generated using four-color fluorescence-based fingerprints. Fingerprints of 34,580 BAC clones (5.6× genome coverage) were generated for the FPC assembly of the BAC contigs. A total of 3307 contigs were assembled using a cutoff value of 1 × 10^-20. Each contig contains an average of 9.25 clones with an average size of 292 kb. The combined contig size for all contigs was 0.965 Gb, approximately the genome size of the channel catfish. The reliability of the contig assembly was assessed by both hybridization of gene probes to BAC clones contained in the fingerprinted assembly and validation of randomly selected contigs using overgo probes designed from BAC end sequences. The presented physical map should greatly enhance genome research in the catfish, particularly aiding in the identification of genomic regions containing genes underlying important performance traits.
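
The figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are hypothetical, and reading the difference between fingerprinted clones and clones in contigs as unassembled clones is an assumption, not something the abstract states.

```python
# Values quoted in the abstract.
contigs = 3307
mean_contig_kb = 292
mean_clones_per_contig = 9.25
fingerprinted_clones = 34_580

# Combined contig length: number of contigs times mean contig size, in Gb.
total_gb = contigs * mean_contig_kb / 1_000_000

# Clones placed in contigs, implied by the mean number of clones per contig.
clones_in_contigs = contigs * mean_clones_per_contig

# Assumption: the remainder are clones that did not join any contig.
clones_outside_contigs = fingerprinted_clones - clones_in_contigs

if __name__ == "__main__":
    print(f"combined contig size: {total_gb:.4f} Gb")
    print(f"clones in contigs: {clones_in_contigs:.0f}")
    print(f"clones outside contigs: {clones_outside_contigs:.0f}")
```

The product of contig count and mean size agrees with the reported combined size of 0.965 Gb to within rounding.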

9.
A long-range physical map of the carcinoembryonic antigen (CEA) gene family cluster, which is located on the long arm of chromosome 19, has been constructed. This was achieved by hybridization analysis of large DNA fragments separated by pulsed-field gel electrophoresis and of DNA from human/rodent somatic cell hybrids, as well as the assembly of ordered sets of cosmids for this gene region into contigs. The different approaches yielded very similar results and indicate that the entire gene family is contained within a region located at position 19q13.1-q13.2 between the CYP2A and the D19S15/D19S8 markers. The physical linkage of nine genes belonging to the CEA subgroup and their location with respect to the pregnancy-specific glycoprotein (PSG) subgroup genes have been determined, and the latter are located closer to the telomere. From large groups of ordered cosmid clones, the identity of all known CEA subgroup genes has been confirmed either by hybridization using gene-specific probes or by DNA sequencing. These studies have identified a new member of the CEA subgroup (CGM8), which probably represents a pseudogene due to the existence of two stop codons, one in the leader and one in the N-terminal domain exons. The gene order and orientation, which were determined by hybridization with probes from the 5' and 3' regions of the genes, are as follows: cen/3'-CGM7-5'/3'-CGM2-5'/5'-CEA-3'/5'-NCA-3'/5'-CGM1-3'/3'-BGP-5'/3'-CGM9-5'/3'-CGM6-5'/5'-CGM8-3'/PSGcluster/qter.

10.
Here we describe a practical procedure for sequencing long PCR products. The method relies on ultrasonic shearing of PCR products, resulting in fragments 700-1,000 nt long. Termini are subsequently repaired to obtain blunt ends and 3' A-overhangs are added before TA cloning. A predetermined number of clones are sequenced using an insert-independent primer to obtain an overlapping contig covering the full length of the PCR product. This method is cost effective and enables the complete sequencing of any large PCR product in a high-throughput format. Processing of amplified DNA requires 3 h handling time prior to the ligation step, and the clone library is available 2 d later. The complete sequence information is obtained approximately 5 d after the PCR step, depending on the sequencing procedure adopted.

11.
12.
As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and also insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.

13.
14.
The occurrence and nature of repeated DNA sequences has been analysed within an 850 kb YAC contig on Arabidopsis thaliana chromosome 4. Hybridization analysis with seven RFLP markers, six cosmid contigs, 29 YAC end probes and eight YAC clones showed that at least 585 kb of the 850 kb contained only low-copy sequences. One YAC end probe, EG15C8LE, hybridized to multiple genomic fragments and contained a sequence with predicted protein homology to cytochrome P450 monooxygenases. Another one, EG11B7RE, was found to be non-contiguous with the other YAC clones and contained a dispersed repetitive sequence associated with centromeric regions.

15.
Most shotgun sequencing projects undergo a long and costly phase of finishing, in which a partial assembly forms several contigs whose order, orientation, and relative distance are unknown. We propose here a new technique that supplements the shotgun assembly data by experimentally simple and commonly used complete restriction digests of the target. By computationally combining information from the contig sequences and the fragment sizes measured for several different enzymes, we seek to form a "scaffold" on which the contigs will be placed in their correct orientation, order, and distance. We give a heuristic search algorithm for solving the problem and report on promising preliminary simulation results. The key to the success of the search scheme is the very rapid solution of two time-critical subproblems that are solved to optimality in linear time. Our simulations indicate that with noise levels of some 3% relative error in measuring fragment sizes, using six enzymes, most datasets of 13 contigs spanning 300 kb can be correctly ordered, and the remaining ones have most of their pairs of neighboring contigs correct. Hence, the technique has a potential to provide real help to finishing. Even without closing all gaps, the ability to order and orient the contigs correctly makes the partial assembly both more accessible and more useful for biologists.

16.
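Establishment of a method for PCR screening of BAC libraries and direct BAC-end sequencing   Total citations: 3, self-citations: 0, citations by others: 3
He Congfen (何聪芬)  Komatsuda Takao (小松田隆夫) 《遗传学报》 (Acta Genetica Sinica) 2004,31(11):1262-1267
A method was established for PCR screening of a barley BAC DNA library rich in highly repetitive sequences and for direct end sequencing of BAC DNA. PCR screening of the barley BAC DNA library (816 plates, each containing 384 clones) was carried out in four steps. Two levels of BAC DNA pools were built (primary pools and secondary pools). A secondary pool consists of the DNA from one plate (384 clones); a primary pool is a mixture of the DNA from 10 consecutive secondary pools, each diluted 100-fold (e.g. 1-10, 11-20, and so on), giving 82 primary pools in total. The first step of library screening is to screen the 82 primary pools. Once a positive primary pool is found (e.g. primary pool 2), its 10 secondary pools (11 to 20) are screened in the second step. Once a positive secondary pool is found, all 384 clones of the corresponding positive plate are cultured; starting from the top left, every 4 adjacent clones form one group, and a third round of PCR is run on a 96-well plate (4 × 96 = 384). Colony PCR (the fourth round) is then performed separately on each of the 4 clones of a positive group to obtain a single positive clone. Based on HindIII restriction fingerprints of the BAC DNA, the BACs selected with the same primer were assembled into contigs. The BAC DNA at the two ends of each contig was end-sequenced directly, and the sequencing results were searched with BLAST to test whether they are single-copy sequences in barley. Primers were designed from the sequencing and BLAST results; the BAC-derived primers were assigned to chromosomes using Chinese Spring addition lines (carrying added barley chromosomes) and mapped genetically in a segregating population, to determine whether they can be used for the next step of chromosome walking.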
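
The four-step pooling scheme in this abstract can be written out as a short worked example. The constants come from the abstract; the function names are hypothetical, and the reaction count assumes exactly one positive clone in the library and a positive primary pool that holds a full 10 plates.

```python
import math

# Values stated in the abstract.
PLATES = 816                  # one secondary pool per plate
CLONES_PER_PLATE = 384
PLATES_PER_PRIMARY_POOL = 10
CLONES_PER_GROUP = 4          # adjacent clones pooled for the third round


def primary_pool_of(plate: int) -> int:
    """1-based primary pool that contains a given 1-based plate number."""
    return (plate - 1) // PLATES_PER_PRIMARY_POOL + 1


def reactions_for_one_positive() -> dict:
    """PCR reactions per screening round, assuming a single positive clone."""
    return {
        "round 1 (primary pools)": math.ceil(PLATES / PLATES_PER_PRIMARY_POOL),
        "round 2 (secondary pools)": PLATES_PER_PRIMARY_POOL,
        "round 3 (groups of 4 clones)": CLONES_PER_PLATE // CLONES_PER_GROUP,
        "round 4 (single colonies)": CLONES_PER_GROUP,
    }


if __name__ == "__main__":
    rounds = reactions_for_one_positive()
    for name, count in rounds.items():
        print(f"{name}: {count}")
    print("total reactions:", sum(rounds.values()))
    print("clones in library:", PLATES * CLONES_PER_PLATE)
```

Under these assumptions, one positive clone is isolated with a small number of reactions compared with testing every clone individually. The last primary pool covers only plates 811 to 816, so a hit there would need fewer second-round reactions.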

17.
18.
PGAAS: a prokaryotic genome assembly assistant system   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: In order to accelerate the finishing phase of genome assembly, especially for the whole genome shotgun approach of prokaryotic species, we have developed a software package designated prokaryotic genome assembly assistant system (PGAAS). The approach upon which PGAAS is based is to confirm the order of contigs and fill gaps between contigs through peptide links obtained by searching each contig end with BLASTX against protein databases. RESULTS: We used the contig dataset of the cyanobacterium Synechococcus sp. strain PCC7002 (PCC7002), which was sequenced with six-fold coverage and assembled using the Phrap package. The subject database is the protein database of the cyanobacterium Synechocystis sp. strain PCC6803 (PCC6803). We found more than 100 non-redundant peptide segments which can link at least 2 contigs. We tested one pair of linked contigs by sequencing and obtained a satisfactory result. PGAAS provides a graphic user interface to show the bridge peptides and pier contigs. We integrated Primer3 into our package to design PCR primers at the adjacent ends of the pier contigs. AVAILABILITY: We tested PGAAS on a Linux (Redhat 6.2) PC machine. It is developed with free software (MySQL, PHP and Apache). The whole package is distributed freely and can be downloaded as a UNIX compressed file: ftp://ftp.cbi.pku.edu.cn/pub/software/unix/pgaas1.0.tar.gz. The package is being continually updated.
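
The peptide-link idea in this abstract can be sketched as a simple grouping step. This is an illustration only, not PGAAS code: it assumes the BLASTX results have already been parsed into `(contig_id, protein_id)` pairs, and the function name, contig names, and protein names are all made up.

```python
from collections import defaultdict
from itertools import combinations


def peptide_links(hits):
    """Map each pair of contigs to the proteins hit by ends of both contigs.

    `hits` is an iterable of (contig_id, protein_id) pairs, assumed to come
    from an earlier BLASTX search of contig ends (not performed here).
    """
    contigs_by_protein = defaultdict(set)
    for contig, protein in hits:
        contigs_by_protein[protein].add(contig)

    links = defaultdict(set)
    for protein, contigs in contigs_by_protein.items():
        # A protein hit by two or more contigs is a candidate "bridge peptide".
        for a, b in combinations(sorted(contigs), 2):
            links[(a, b)].add(protein)
    return dict(links)


if __name__ == "__main__":
    example_hits = [  # hypothetical data
        ("contig1", "protA"),
        ("contig2", "protA"),
        ("contig3", "protB"),
    ]
    print(peptide_links(example_hits))
```

A fuller version would also compare the hit coordinates on the protein, so that only contig ends matching adjacent, non-overlapping regions of the same protein are proposed as neighbours.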

19.
Phytophthora spp. are serious pathogens that threaten numerous cultivated crops, trees, and natural vegetation worldwide. The soybean pathogen P. sojae has been developed as a model oomycete. Here, we report a bacterial artificial chromosome (BAC)-based, integrated physical map of the P. sojae genome. We constructed two BAC libraries, digested 8,681 BACs with seven restriction enzymes, end labeled the digested fragments with four dyes, and analyzed them with capillary electrophoresis. Fifteen data sets were constructed from the fingerprints, using individual dyes and all possible combinations, and were evaluated for contig assembly. In all, 257 contigs were assembled from the XhoI data set, collectively spanning approximately 132 Mb in physical length. The BAC contigs were integrated with the draft genome sequence of P. sojae by end sequencing a total of 1,440 BACs that formed a minimal tiling path. This enabled the 257 contigs of the BAC map to be merged with 207 sequence scaffolds to form an integrated map consisting of 79 superscaffolds. The map represents the first genome-wide physical map of a Phytophthora sp. and provides a valuable resource for genomics and molecular biology research in P. sojae and other Phytophthora spp. In one illustration of this value, we have placed the 350 members of a superfamily of putative pathogenicity effector genes onto the map, revealing extensive clustering of these genes.

20.
The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.
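
The abstract does not say how its low complexity filter works. One common way to build such a filter is to threshold the Shannon entropy of a read's base composition, sketched below. The function names and the 1.5-bit threshold are assumptions for illustration and are not taken from the study.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the base composition of `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_low_complexity(seq: str, min_entropy: float = 1.5) -> bool:
    """Flag reads whose composition entropy is below a hypothetical threshold."""
    return len(seq) == 0 or shannon_entropy(seq) < min_entropy


if __name__ == "__main__":
    for read in ["AAAAAAAAAAAA", "ATATATATATAT", "ACGTTGCAAGCT"]:
        print(read, round(shannon_entropy(read), 2), is_low_complexity(read))
```

Composition entropy ignores base order, so a production filter would more likely score dinucleotides or longer k-mers; the single-base version is kept here only for brevity.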
