Similar articles (20 results)
1.
Chibana H  Oka N  Nakayama H  Aoyama T  Magee BB  Magee PT  Mikami Y 《Genetics》2005,170(4):1525-1537
The size of the genome in the opportunistic fungus Candida albicans is 15.6 Mb. Whole-genome shotgun sequencing was carried out at Stanford University, where the sequences were assembled into 412 contigs. C. albicans is basically diploid, and analysis of the sequence is complicated by repeated sequences and by sequence polymorphism between homologous chromosomes. Chromosome 7 is 1 Mb in size and is the best characterized of the 8 chromosomes in C. albicans. We assigned 16 of the contigs, ranging in length from 7309 to 267,590 bp, to chromosome 7 and determined sequences of 16 regions. These regions included four gaps, a misassembled sequence, and two major repeat sequences (MRS) of >16 kb. The length of the continuous sequence attained was 949,626 bp and provided complete coverage of chromosome 7 except for telomeric regions. Sequence analysis predicted 404 genes, 11 of which included at least one intron. A 7-kb indel, which might be caused by a retrotransposon, was identified as the largest difference between the homologous chromosomes. Synteny analysis revealed that the degree of synteny between C. albicans and Saccharomyces cerevisiae is too weak to use for completion of the genomic sequence in C. albicans.

2.
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel blaNDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

3.
4.
To exploit the genome sequence information of the Lotus microsymbiont Mesorhizobium loti MAFF303099 for the discovery of gene functions, we have constructed an ordered and mutually overlapping cosmid library using an IncP broad host-range vector. The library consisted of 480 clones covering approximately 99.6% of the genome, with an average insert size and overlap of 26.9 and 11.1 kbp, respectively. The genome of M. loti consists of a single chromosome and two plasmids. The chromosome (7,036,071 bp) was covered 99.68% by 445 clones with four gaps, although two clones were unstable in E. coli. The larger plasmid pMLa (351,911 bp) was completely covered by 23 clones, while the smaller pMLb (208,315 bp) was covered 98.85% by 12 clones with two gaps. We have also made ancillary plasmids to facilitate the construction of deletion mutants using derivatives of the library clones. As a pilot experiment to uncover regions that contain novel symbiotic genes, 13 deletion mutants were constructed that together lack 180.5 kbp of the genome. All the mutants formed apparently normal nodules and supported symbiotic nitrogen fixation; however, one mutant that lacked a 5.3 kbp chromosomal region, 4,551,930-4,557,222, did not produce normal exopolysaccharides as judged by fluorescence on medium containing Calcofluor. The results supported the effectiveness of the approach for detecting gene functions.
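The library figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is an illustration only, not part of the cited work: it assumes an idealized tiling in which every clone contributes (insert size minus overlap) of new sequence, and it ignores the fact that the three replicons are tiled separately. The function name `tiled_span_kbp` is hypothetical.

```python
# Figures quoted in the abstract.
N_CLONES = 480
AVG_INSERT_KBP = 26.9
AVG_OVERLAP_KBP = 11.1
REPLICON_BP = {"chromosome": 7_036_071, "pMLa": 351_911, "pMLb": 208_315}
CLONES_PER_REPLICON = {"chromosome": 445, "pMLa": 23, "pMLb": 12}


def tiled_span_kbp(n_clones, insert_kbp, overlap_kbp):
    """Length spanned if each clone adds (insert - overlap) kbp of new sequence.

    This is an idealized assumption, not the authors' coverage calculation.
    """
    return n_clones * (insert_kbp - overlap_kbp)


genome_kbp = sum(REPLICON_BP.values()) / 1000
span_kbp = tiled_span_kbp(N_CLONES, AVG_INSERT_KBP, AVG_OVERLAP_KBP)

print(f"clones summed over replicons: {sum(CLONES_PER_REPLICON.values())}")
print(f"genome size: {genome_kbp:.1f} kbp")
print(f"idealized tiled span: {span_kbp:.1f} kbp")
print(f"span / genome: {span_kbp / genome_kbp:.4f}")
```

Under these assumptions the per-replicon clone counts add up to the library size, and the idealized span comes out slightly below the summed replicon lengths, which is in line with the near-complete coverage the abstract reports.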

5.
Most shotgun sequencing projects undergo a long and costly phase of finishing, in which a partial assembly forms several contigs whose order, orientation, and relative distance are unknown. We propose here a new technique that supplements the shotgun assembly data with experimentally simple and commonly used complete restriction digests of the target. By computationally combining information from the contig sequences and the fragment sizes measured for several different enzymes, we seek to form a "scaffold" on which the contigs will be placed in their correct orientation, order, and distance. We give a heuristic search algorithm for solving the problem and report on promising preliminary simulation results. The key to the success of the search scheme is the very rapid solution of two time-critical subproblems that are solved to optimality in linear time. Our simulations indicate that with noise levels of some 3% relative error in measuring fragment sizes, using six enzymes, most datasets of 13 contigs spanning 300 kb can be correctly ordered, and the remaining ones have most of their pairs of neighboring contigs correct. Hence, the technique has the potential to provide real help in finishing. Even without closing all gaps, the ability to order and orient the contigs correctly makes the partial assembly both more accessible and more useful for biologists.
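Why complete-digest fragment sizes carry ordering information can be seen in a toy example. The sketch below is not the heuristic search described in the abstract; it only computes the fragment sizes of an in silico complete digest and shows that two candidate contig orders predict different size lists. The function name, the toy sequences, and the assumption that the contigs abut with no gap are all hypothetical. The recognition site GAATTC with a cut one base into the site corresponds to EcoRI.

```python
def digest_fragment_sizes(seq, site, cut_offset=0):
    """Fragment lengths from a complete digest of a linear sequence.

    The enzyme is assumed to cut `cut_offset` bases into every occurrence
    of `site`. Fragments are returned in left-to-right order.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:]) if right > left]


# Two toy contigs, each containing one GAATTC site (hypothetical data).
contig_a = "AAGAATTCCC"
contig_b = "CGAATTCTTTT"

# Predicted fragment sizes for the two possible orders, assuming no gap.
order_ab = digest_fragment_sizes(contig_a + contig_b, "GAATTC", cut_offset=1)
order_ba = digest_fragment_sizes(contig_b + contig_a, "GAATTC", cut_offset=1)
print("a then b:", sorted(order_ab))
print("b then a:", sorted(order_ba))
```

The fragment that spans the contig junction changes size with the order, so comparing predicted sizes against measured ones can favor one arrangement over another. With real data the comparison would have to tolerate measurement error and unknown gap lengths, which is where a search procedure such as the one the abstract proposes comes in.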

6.

Background  

At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless.  相似文献   
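The goal stated above, a rough start and end position for each contig on the reference rather than a base-level alignment, can be illustrated with a naive sketch. The listing shows only the background of this abstract, so the code below is not the authors' method. It assumes the local hits have already been produced by a tool such as BLAST or MUMmer and parsed into tuples, considers the forward strand only, and groups hits into crude diagonal bins. The function name and the `diagonal_tolerance` parameter are hypothetical.

```python
from collections import defaultdict


def estimate_placement(hits, diagonal_tolerance=50):
    """Rough (start, end) of a contig on the reference from local hits.

    hits: list of (ref_start, ref_end, contig_start, contig_end) tuples,
    forward strand only. Hits are binned by diagonal (ref_start minus
    contig_start); the bin with the most matched reference bases wins.
    Binning by floor division is crude: nearby diagonals can fall into
    different bins.
    """
    if not hits:
        return None
    bins = defaultdict(list)
    for hit in hits:
        ref_start, _ref_end, contig_start, _contig_end = hit
        bins[(ref_start - contig_start) // diagonal_tolerance].append(hit)
    best = max(bins.values(), key=lambda group: sum(h[1] - h[0] for h in group))
    return min(h[0] for h in best), max(h[1] for h in best)


# Hypothetical hits: two on nearly the same diagonal, one spurious repeat hit.
hits = [
    (1000, 1400, 0, 400),
    (1420, 1900, 410, 890),
    (5000, 5100, 200, 300),
]
print(estimate_placement(hits))
```

The two hits that lie along the same diagonal outweigh the isolated repeat hit, so the reported region is the span they cover together. A practical tool would also need to handle reverse-strand hits, rearrangements, and contigs that map to several places.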

7.
The finished human genome assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ~99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum. One of these CpG islands was differentially methylated and paternally hypermethylated. We found all chr 20 gaps to comprise structured non-coding RNAs (ncRNAs) and to be conserved in primates. We verified expression for 13 candidate ncRNAs, some of which showed tissue specificity. Four ncRNAs expressed within the gap at DLGAP4 show elevated expression in the human brain. Our data suggest that unfinished human genome gaps are likely to comprise numerous functional elements.

8.
In order to generate a physical map of Arabidopsis thaliana chromosome 5, 142 molecular markers mapping to chromosome 5 have been used in colony hybridization experiments with four Arabidopsis, ecotype Columbia, yeast artificial chromosome (YAC) libraries. This resulted in 634 YAC clones being anchored on chromosome 5. Southern blot analysis confirmed their positioning and provided data, which along with knowledge of the sizes of all the YAC clones, enabled the clones to be arranged into 31 contigs. Genetic mapping of markers located within 29 of these contigs on the Landsberg erecta/Columbia recombinant inbred lines allowed positioning of the contigs along the chromosome. A high proportion of the YAC clones were found to contain chimaeric inserts. The availability of this YAC contig map will accelerate chromosome-walking experiments, provide substrates for large-scale genomic sequencing projects and facilitate the mapping of new probes to this chromosome.  相似文献   

9.
We have constructed approximately 1-Mb contigs of yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC) and cosmid clones covering the imprinted region in mouse chromosome band 7F4/F5. This region is syntenic to human chromosome 11p15.5, which is associated with Beckwith-Wiedemann syndrome (BWS) and certain childhood and adult tumors. These contigs provide the basis for genomic sequencing, identification of genes and their regulatory elements, and functional studies in transgenic and knockout mice, which should help in understanding not only the mechanisms of imprinting but also the molecular events involved in the genesis of BWS and tumors.

10.
A total of 207 BAC clones containing 155 loci were isolated and arranged into a map of linearly ordered overlapping clones over the proximal part of horse chromosome 21 (ECA21), which corresponds to the proximal half of the short arm of human chromosome 19 (HSA19p) and part of HSA5. The clones form two contigs - each corresponding to the respective human chromosomes - that are estimated to be separated by a gap of approximately 200 kb. Of the 155 markers present in the two contigs, 141 (33 genes and 108 STS) were generated and mapped in this study. The BACs provide a 4-5x coverage of the region and span an estimated length of approximately 3.3 Mb. The region presently contains one mapped marker per 22 kb on average, which represents a major improvement over the previous resolution of one marker per 380 kb obtained through the generation of a dense RH map for this segment. Dual color fluorescence in situ hybridization on metaphase and interphase chromosomes verified the relative order of some of the BACs and helped to orient them accurately in the contigs. Despite having similar gene order and content, the equine region covered by the contigs appears to be distinctly smaller than the corresponding region in human (3.3 Mb vs. 5.5-6 Mb) because the latter harbors a host of repetitive elements and gene families unique to humans/primates. Considering limited representation of the region in the latest version of the horse whole genome sequence EquCab2, the dense map developed in this study will prove useful for the assembly and annotation of the sequence data on ECA21 and will be instrumental in rapid search and isolation of candidate genes for traits mapped to this region.  相似文献   

11.
A cosmid contig physical map of human chromosome 16 has been developed by repetitive sequence fingerprinting of approximately 4000 cosmid clones obtained from a chromosome 16-specific cosmid library. The arrangement of clones in contigs is determined by (1) estimating cosmid length and determining the likelihoods for all possible pairwise clone overlaps, using the fingerprint data, and (2) using an optimization technique to fit contig maps to these estimates. Two important questions concerning this contig map are how much of chromosome 16 is covered and how accurate the assembled contigs are. Both questions can be addressed by hybridization of single-copy sequence probes to gridded arrays of the cosmids. All of the fingerprinted clones have been arrayed on nylon membranes so that any region of interest can be identified by hybridization. The hybridization experiments indicate that approximately 84% of the euchromatic arms of chromosome 16 are covered by contigs and singleton cosmids. Both grid hybridization (26 contigs) and pulsed-field gel electrophoresis experiments (11 contigs) confirmed the assembled contigs, indicating that false positive overlaps occur infrequently in the present map. Furthermore, regional localization of 93 contigs and singleton cosmids to a somatic cell hybrid mapping panel indicates that there is no bias in the coverage of the euchromatic arms.

12.
An integrated large-insert clone map of the region Xq11-q12 is presented. A physical map containing markers within a few hundred kilobases of the centromeric locus DXZ1 to DXS1125 spans nearly 5 Mb in two contigs separated by a gap estimated to be approximately 100-250 kb. The contigs combine 75 yeast artificial chromosome clones, 12 bacterial artificial chromosome clones, and 17 P1-derived artificial chromosome clones with 81 STS or EST markers. Overall marker density across this region is approximately 1 STS/60 kb. Mapped within the contigs are 12 ESTs as well as 5 known genes, moesin (MSN), hephaestin (HEPH), androgen receptor (AR), oligophrenin-1 (OPHN1), and Eph ligand-2 (EPLG2). Orientation of the contigs on the X chromosome, as well as marker order within the contigs, was unambiguously determined by reference to a number of X chromosome breakpoints. In addition, the distal contig spans deletions from chromosomes of three patients exhibiting either complete androgen insensitivity (CAI) or a contiguous gene syndrome that includes CAI, impaired vision, and mental retardation.  相似文献   

13.
We describe the construction of BAC contigs of the genome of an indica variety of Oryza sativa, Guang Lu Ai 4. A complete, representative (sixfold coverage of the rice chromosomes) and genetically stable BAC library of the rice genome constructed in this lab has been systematically analysed by restriction enzyme fragmentation and polyacrylamide gel electrophoresis. All the images thus obtained were subjected to image processing, which consisted of preliminary location of bands, cooperative tracking of lanes by correlation of adjacent bands, a precise densitometric pass, alignment at the marker bands with the standard, optional interactive editing, and normalization of the accepted bands. The contigs were generated with computer software specially designed for genome mapping. There were 464 contigs averaging 600 kb in length, 107 contigs averaging 1000 kb, and 23 contigs averaging 1500 kb. The contigs we have obtained therefore amount to 420 megabases in total. Considering the size of the rice genome (430 megabases), the contigs generated in this lab cover nearly 98% of the rice genome. We are now in the process of mapping the contigs to chromosomes.
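The totals in this abstract follow from the contig counts and average lengths. The short check below is an illustration only; it takes the three contig classes as 464, 107 and 23 contigs averaging 600, 1000 and 1500 kb respectively, which is the reading under which the quoted total and coverage are reproduced.

```python
# (number of contigs, average length in kb) for each class, as read from the abstract.
CONTIG_CLASSES = [(464, 600), (107, 1000), (23, 1500)]
GENOME_KB = 430_000  # rice genome size quoted in the abstract (430 megabases)

total_kb = sum(count * avg_kb for count, avg_kb in CONTIG_CLASSES)
coverage_percent = 100 * total_kb / GENOME_KB

print(f"total contig length: {total_kb / 1000:.1f} Mb")
print(f"share of the genome: {coverage_percent:.1f}%")
```

The sum lands just under the 420 megabases stated in the abstract, and the resulting share of the genome rounds to the quoted figure of nearly 98%. The calculation treats the summed contig length as non-overlapping, which the abstract itself does implicitly.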

14.
To facilitate construction of a physical map of the rice genome, a bacterial artificial chromosome (BAC) library of IR64 genomic DNA was constructed. It consists of 18,432 clones and contains 3.28 rice genomic equivalents. The insert size ranged from 37 to 364 kb with an average of 107 kb. We used 31 RFLP markers on chromosome 4 to screen the library by colony hybridization. Sixty-eight positive clones were identified, an average of 2.2 positive clones per RFLP marker. The positive clones were analyzed to generate 29 contigs whose sizes ranged from 50 to 384 kb with an average of 145.6 kb. Chromosome walking was initiated for ten contigs linked to resistance genes. Thirty-eight BAC clones were obtained and two contigs were integrated. Altogether, they covered 5.65 Mb (15.1%) of chromosome 4. These contigs may be used as landmarks for physical mapping of chromosome 4, and as starting points for chromosome walking towards the map-based cloning of disease resistance genes located nearby. Received: 15 November 1996 / Accepted: 24 January 1997

15.
Bacterial artificial chromosome (BAC) libraries are an important tool for positional cloning, gene analysis and physical mapping. During studies using BAC clones, it is often necessary to organize them into contiguous sequences (contigs). To finalize, join and extend the contigs, both cloning and sequencing of the ends of the inserts are required. Here, we describe a low-cost, accessible, fast and powerful method for the routine isolation of BAC ends. This method allows the isolation of 20 BAC clone ends in one day. The analysis of the ends reveals fragment sizes compatible with sequencing, and the structure of these clones allows the sequencing of both ends using the same plasmid. Moreover, long end fragments can be sequenced in both directions.  相似文献   

16.
Finishing is rate limiting for genome projects, and improvements in the efficiency of complete genome-sequence compilation will require improved protocols for gap closure. Here we report a novel approach for extending shotgun contigs and closing gaps that we termed PCR-assisted contig extension (PACE). PACE depends on the capture of rare mismatched interactions that occur between arbitrary primers and template DNA of unknown sequence, even under highly stringent conditions, by means of elevated PCR-cycle repetition and the use of specific anchoring primers corresponding to adjacent regions of known sequence. Using PACE, we have generated extensions with an average of 1 kb from all contigs generated from the shotgun sequencing of a 5-Mb genome, which closed the majority of gaps with a single round of experimentation. This included the generation of multiple extensions for contigs that terminated in one of the eight copies of the rRNA operon. We calculate that the switch from shotgun sequencing to PACE should occur between 5- and 8-fold genome coverage for maximum benefit and minimum overall cost. PACE is a robust and straightforward strategy that should simplify the finishing phase of bacterial genome projects.  相似文献   

17.
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

18.
Draft sequence derived from the 46-Mb gene-rich euchromatic portion of human chromosome 19 (HSA19) was utilized to generate a sequence-ready physical map spanning homologous regions of mouse chromosomes. Sequence similarity searches with the human sequence identified more than 1000 individual orthologous mouse genes from which 382 overgo probes were developed for hybridization. Using human gene order and spacing as a model, these probes were used to isolate and assemble bacterial artificial chromosome (BAC) clone contigs spanning homologous mouse regions. Each contig was verified, extended, and joined to neighboring contigs by restriction enzyme fingerprinting analysis. Approximately 3000 mouse BACs were analyzed and assembled into 44 contigs with a combined length of 41.4 Mb. These BAC contigs, covering 90% of HSA19-related mouse DNA, are distributed throughout 15 homology segments derived from different regions of mouse chromosomes 7, 8, 9, 10, and 17. The alignment of the HSA19 map with the ordered mouse BAC contigs revealed a number of structural differences in several overtly conserved homologous regions and more precisely defined the borders of the known regions of HSA19-syntenic homology. Our results demonstrate that given a human draft sequence, BAC contig maps can be constructed quickly for comparative sequencing without the need for preestablished mouse-specific genetic or physical markers and indicate that similar strategies can be applied with equal success to genomes of other vertebrate species.  相似文献   

19.
We describe the assembly of a cosmid and PAC contig of approximately 700 kb on human chromosome 18q12 spanning the DSC and DSG genes coding for the desmocollins and desmogleins. These are members of the cadherin superfamily of calcium-dependent cell adhesion proteins present in the desmosome type of cell junction found especially in epithelial cells. They provide the strong cell-cell adhesion generated by this type of cell junction, for which expression of both a desmocollin and a desmoglein is required. In the autoimmune skin diseases pemphigus foliaceus and pemphigus vulgaris (PV), where the autoantigens are, respectively, encoded by the DSG1 and DSG3 genes, severe areas of acantholysis (cell separation), potentially life-threatening in the case of PV, are evident. Dominant mutations in the DSG1 gene causing striate palmoplantar keratoderma result in hyperkeratosis of the skin on the parts of the body where pressure and abrasion are greatest, viz., on the palms and soles. These genes are also candidate tumor suppressor genes in squamous cell carcinomas and other epithelial cancers. We have screened two chromosome 18-specific cosmid libraries by hybridization with previously isolated YAC clones and DSC and DSG cDNAs, and a whole genome PAC library, both by hybridization with the YACs and by PCR screening using cDNA sequences and YAC end sequence. The contigs were extended by further PCR screens using STSs generated by vectorette walking from the ends of the cosmids and PACs, together with sequence from PAC ends. Despite screening of two libraries, the cosmid contig still had four gaps. The PAC contig filled these gaps and in fact covered the whole locus. The positions of 45 STSs covering the whole of this region are presented.
The desmocollin and desmoglein genes, which are about 30-35 kb in size, are quite well separated at approximately 20-30 kb apart and are arranged in two clusters, one DSC cluster and one DSG cluster, which are transcribed outward from the interlocus region. The order of the genes is correlated with the spatial order of gene expression in the developing mouse embryo, and this, together with previous transgenic experiments, suggests that long-range genetic elements that coordinate expression of these genes may be present. The complete bacterial clone contig described in this paper is thus a resource not only for future sequencing but also for investigations into the control of expression of these clustered genes.

20.
As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage groups R9 and R3 were sequenced using a bacterial artificial chromosome (BAC)-by-BAC strategy. The current physical contigs are expected to cover approximately 90% of the euchromatin of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a randomly sheared fosmid library was constructed. The library consists of 97,536 clones with an average insert size of approximately 40 kb, corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the ends of the sequences at nine scaffold gaps where BAC clones could not be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences yielded two completed gap fillings and seven sequence extensions, which can be used for further selection of BAC clones, confirming that the fosmid library will facilitate the sequence completion of B. rapa.
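The "seven genome equivalents" figure can be reproduced from the other numbers in the abstract. The helper below is a minimal sketch with a hypothetical name; it assumes every clone carries an insert of the average size.

```python
def genome_equivalents(n_clones, avg_insert_kb, genome_size_mb):
    """Library depth: total cloned DNA divided by the genome size.

    Assumes every clone carries an insert of the average size.
    """
    total_insert_kb = n_clones * avg_insert_kb
    return total_insert_kb / (genome_size_mb * 1000)


# Fosmid library figures quoted in the abstract.
depth = genome_equivalents(n_clones=97_536, avg_insert_kb=40, genome_size_mb=550)
print(f"library depth: {depth:.2f} genome equivalents")
```

The result is a little above seven, consistent with the figure the abstract gives for an assumed genome size of 550 Mb.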


Copyright©北京勤云科技发展有限公司  京ICP备09084417号