Similar Literature
1.
Complex genomic libraries are increasingly being used to retrieve complete genes, operons or large genomic fragments directly from environmental samples, without the need to cultivate the respective microorganisms. We report on the construction of three large-insert fosmid libraries in total covering 3 Gbp of community DNA from two different soil samples, a sandy ecosystem and a mixed forest soil. In a fosmid end sequencing approach including 5376 sequence tags of approximately 700 bp length, we show that mostly bacterial and, to a much lesser extent, archaeal and eukaryotic genome fragments (approximately 1% each) have been captured in our libraries. The diversity of putative protein-encoding genes, as reflected by their distribution into different COG clusters, was comparable to that encoded in complete genomes of cultivated microorganisms. A huge variety of genomic fragments has been captured in our libraries, as seen by comparison with sequences in the public databases and by the large variation in G+C contents. We dissect differences between the libraries, which relate to the different ecosystems analysed and to biases introduced by different DNA preparations. Furthermore, a range of taxonomic marker genes (other than 16S rRNA) has been identified that allows the assignment of genome fragments to specific lineages. The complete sequences of two genome fragments identified as being affiliated with Archaea, based on a gene encoding a CDC48 homologue and a thermosome subunit, respectively, are presented and discussed. We thereby extend the genomic information of uncultivated crenarchaeota from soil and offer hints to specific metabolic traits present in this group.

2.
Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we have sequenced and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC bias and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries show comparable results with unamplified libraries. To realize the full potential of WGA for questions of real biological interest, this article highlights the importance of recognizing additional sources of error in amplified sequence reads and discusses the potential implications for downstream analyses.

3.
SUMMARY: We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST-based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large-insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. MOTIVATION: Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large-insert libraries.

4.
Nearly 7000 Arabidopsis thaliana expressed sequence tags (ESTs) from 10 cDNA libraries have been sequenced, of which almost 5000 non-redundant tags have been submitted to the EMBL data bank. The quality of the cDNA libraries used is analysed. Similarity searches in international protein data banks have allowed the detection of significant similarities to a wide range of proteins from many organisms. Alignment with ESTs from the rice systematic sequencing project has allowed the detection of amino acid motifs which are conserved between the two organisms, thus identifying tags to genes encoding highly conserved proteins. These genes are candidates for a common framework in genome mapping projects in different plants.

5.
The first sequenced plant genome, from the small mustard plant Arabidopsis thaliana, was published at the end of 2000. The sequencing of the rice genome is well under way. The sizes of plant genomes vary by a factor of up to 1000, and many important crop plants have genomes that are several times larger than the human genome. To gain insight into the gene toolbox of plant species, numerous large-scale EST sequencing projects have been launched successfully, and analysis procedures are constantly being refined to add maximum value to the sequence data. In addition, an alternative approach to exclude repetitive noncoding DNA and to enrich sequence libraries for gene-containing genomic regions has been developed. This strategy has the potential to deliver information about both genes and regulatory regions outside the transcribed regions.

6.
BAC libraries generated from restriction-digested genomic DNA display representational bias and lack some sequences. To facilitate completion of genome projects, procedures have been developed to create BACs from DNA physically sheared to create fragments extending up to 200 kb. The DNA fragments were repaired to create blunt ends and ligated to a new BAC vector. This approach has been tested by generating BAC libraries from Drosophila DNA with insert lengths between 50 and 150 kb. The libraries lack chimeric clone problems as determined by mapping paired BAC-end sequences to the assembled fly genome sequence. The utility of "sheared" libraries was demonstrated by closure of a previous clone gap and by isolation of clones from telomeric regions, which were notably absent from previous Drosophila BAC libraries.

7.
8.

Background

Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps, of which 308 lay in an estimated 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.

Results

We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

Conclusion

Thus, we have made significant progress toward completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

9.
The development of an effective HIV vaccine is both a pressing and a formidable problem. The most encouraging results to date have been achieved using live-attenuated immunodeficiency viruses. However, the frequency of pathogenic breakthroughs has been a deterrent to their development. We suggest that expression libraries generated from viral DNA can produce the immunologic advantages of live vaccines without risk of reversion to pathogenic viruses. The plasmid libraries could be deconvoluted into useful components or administered as complex mixtures. To explore this approach, we designed and tested several of these genetic live vaccines (GLVs) for HIV. We constructed libraries by cloning overlapping fragments of the proviral genome into mammalian expression plasmids, then used them to immunize mice. We found that inserting library fragments into a vector downstream of a secretory gene sequence led to augmented antibody responses, and insertion downstream of a ubiquitin sequence enhanced cytotoxic lymphocyte responses. Also, fragmentation of gag into subgenes broadened T-cell epitope recognition. We have fragmented the genome by sequence-directed and random methods to create libraries with different features. We propose that the characteristics of GLVs support their further investigation as an approach to protection against HIV and other viral pathogens.

10.
The availability of entire genome sequences is expected to revolutionize the way in which biology and medicine are conducted for years to come. However, achieving this promise still requires significant effort in the areas of gene annotation, cloning and expression of thousands of known and heretofore unknown protein-encoding genes. Traditional technologies of manipulating genes are too cumbersome and inefficient when one is dealing with more than a few genes at a time. Entire libraries composed of all protein-encoding open reading frames (ORFs) cloned in highly flexible vectors will be needed to take full advantage of the information found in any genome sequence. The creation of such ORFeome resources using novel technologies for cloning and expressing entire proteomes constitutes an effective gateway from whole genome sequencing efforts to downstream 'omics' applications.

11.
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel blaNDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

12.
Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.

13.
Second generation sequencing has been widely used to sequence whole genomes. Though various paired-end sequencing methods have been developed to construct the long scaffold from contigs derived from shotgun sequencing, the classical paired-end sequencing of the Bacterial Artificial Chromosome (BAC) or fosmid libraries by the Sanger method still plays an important role in genome assembly. However, sequencing libraries with the Sanger method is expensive and time-consuming. Here we report a new strategy to sequence the paired-ends of genomic libraries with parallel pyrosequencing, using a Chinese amphioxus (Branchiostoma belcheri) BAC library as an example. In total, approximately 12,670 non-redundant paired-end sequences were generated. Mapping them to the primary scaffolds of Chinese amphioxus, we obtained 413 ultra-scaffolds from 1,182 primary scaffolds, and the N50 scaffold length was increased by approximately 55 kb, which is about a 10% improvement. We provide a universal and cost-effective method for sequencing the ultra-long paired-ends of genomic libraries. This method can be very easily implemented in other second generation sequencing platforms.
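
The N50 statistic quoted in item 13 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' pipeline: the function name `n50` and the scaffold lengths are hypothetical, chosen only to show how joining scaffolds with paired-end links raises the N50.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical scaffold lengths in kb (illustrative only, not study data).
before = [500, 400, 300, 200, 100]
# Assume paired-end links join the 300 kb and 200 kb scaffolds into one.
after = [500, 500, 400, 100]

print(n50(before), n50(after))  # prints: 400 500
```

The total assembled length is unchanged in this toy case; only the way it is partitioned into scaffolds differs, which is why N50 is a common measure of scaffolding improvement.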

14.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.

15.
Genomic libraries have been constructed from bovine C. parvum DNA in the lambda ZAP and lambda DASH vectors. Based on an estimated genome size of 2 × 10^4 kilobases (kb), each recombinant library contains greater than 10 genomic equivalents. The average recombinant size for the lambda ZAP library is 2.1 kb and for the lambda DASH library is 14 kb. We have identified genes for major antigens recognized by hyperimmune bovine antiserum. These recombinants are currently being purified and characterized. Limited DNA sequence analysis of random C. parvum clones confirms suggestions that the genome is quite AT-rich. The DNA sequence of random lambda ZAP fusion proteins has identified a potential ATPase, a structural protein and a DNA-binding protein.
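
The figures in item 15 (a genome of about 2 × 10^4 kb, average inserts of 2.1 kb and 14 kb, and more than 10 genome equivalents per library) imply a minimum clone count, which the sketch below works out. The function names are hypothetical. The second function applies the standard Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - insert/genome), which is not mentioned in the abstract and is included only as a common companion calculation.

```python
import math


def clones_for_coverage(genome_kb, insert_kb, equivalents):
    """Smallest clone count whose combined insert length reaches
    `equivalents` times the genome size."""
    return math.ceil(equivalents * genome_kb / insert_kb)


def clones_for_probability(genome_kb, insert_kb, probability):
    """Clarke-Carbon estimate of the clones needed so that any given
    sequence is present with the requested probability."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


GENOME_KB = 2e4  # estimated genome size from the abstract

# Minimum clones implied by "greater than 10 genomic equivalents".
print(clones_for_coverage(GENOME_KB, 2.1, 10))  # lambda ZAP: 95239
print(clones_for_coverage(GENOME_KB, 14, 10))   # lambda DASH: 14286

# Assumed 99% target, chosen here for illustration only.
print(clones_for_probability(GENOME_KB, 14, 0.99))
```

The abstract does not report actual clone counts, so these numbers are lower bounds derived from the stated coverage rather than properties of the libraries themselves.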

16.
The common marmoset is a new world monkey, which has become a valuable experimental animal for biomedical research. This study developed cDNA libraries for the common marmoset from five different tissues. A total of 290 426 high-quality EST sequences were obtained, where 251 587 sequences (86.5%) had homology (1E−100) with the Refseqs of six different primate species, including human and marmoset. In parallel, 270 673 sequences (93.2%) were aligned to the human genome. When 247 090 sequences were assembled into 17 232 contigs, most of the sequences (218 857 or 15 089 contigs) were located in exonic regions, indicating that these genes are expressed in human and marmoset. The other 5578 sequences (or 808 contigs) mapping to the human genome were not located in exonic regions, suggesting that they are not expressed in human. Furthermore, a different set of 118 potential coding sequences were not similar to any Refseqs in any species, and, thus, may represent unknown genes. The cDNA libraries developed in this study are available through RIKEN Bio Resource Center. A Web server for the marmoset cDNAs is available at http://marmoset.nig.ac.jp/index.html, where each marmoset EST sequence has been annotated by reference to the human genome. These new libraries will be a useful genetic resource to facilitate research in the common marmoset.

17.
Simple sequence repeat (SSR) loci are an important marker type for population genetic studies despite the limitation that development of novel loci requires construction and screening of genomic DNA libraries. The common practice of size fractioning genomic DNA before cloning could lead to differential representation of SSR loci within genomic libraries. In addition, linkage mapping studies have shown that small numbers of SSR markers are not randomly distributed within the genomes from which they are isolated. From attempts to clone five SSR repeat sequences in two wild plant species we show that the numbers and repeat type of potential SSR markers depend on the restriction endonuclease used to sample the genome when constructing DNA libraries. This observation is consistent with unequal sampling of the genome by different restriction enzymes. However, as a group the five SSR repeat sequences are not associated with a given restriction enzyme, suggesting they are not clumped within the genome. Use of multiple restriction enzymes to construct DNA libraries may help ensure that cloned SSR loci are drawn from diverse locations in the genome, helping to meet the assumption of randomly located marker loci required for population genetic inferences.

18.
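Sequencing and analysis of the complete mitochondrial genome of the elongated tortoise (Indotestudo elongata)   Total citations: 4, self-citations: 0, citations by others: 4
Zhang Ying, Nie Liuwang, Song Jiaolian. Acta Zoologica Sinica (《动物学报》), 2007, 53(1): 151-158
Using the mitochondrial genome sequences of closely related species as references, we designed 17 pairs of specific primers and used LD-PCR, PCR, and sequencing to obtain the complete mitochondrial genome sequence of an elongated tortoise from Guangxi, China, and analysed the characteristics of the genome and the location of each gene. The mitochondrial genome is 16813 bp long, with a base composition of 35.30% A, 26.47% T, 12.09% G, and 26.14% C, and comprises 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 1 non-coding control region (D-loop). The lengths and positions of its genes are similar to those of typical vertebrates, and its protein-coding regions and rRNA genes are highly homologous to those of other vertebrates, indicating that the turtle mitochondrial genome is evolutionarily highly conserved. The sequence was submitted to GenBank under accession number DQ656607. Combining it with the published mitochondrial genome sequences of 16 other turtle species in GenBank, we also examine the phylogenetic relationships among turtle families.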
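
The base composition reported in item 18 (35.30% A, 26.47% T, 12.09% G, 26.14% C) can be summarized as G+C and A+T content, as in the sketch below. The function `base_composition` and the toy sequence are hypothetical illustrations; the actual sequence would have to be retrieved from GenBank (accession DQ656607), which this example does not do.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, G and C among the unambiguous bases of `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}


# Percentages as reported in the abstract.
reported = {"A": 35.30, "T": 26.47, "G": 12.09, "C": 26.14}
gc_content = round(reported["G"] + reported["C"], 2)
at_content = round(reported["A"] + reported["T"], 2)
print(gc_content, at_content)  # prints: 38.23 61.77

# Toy sequence (not the tortoise genome) to show the function in use.
print(base_composition("AATTGGCCAA"))
```

The four reported percentages sum to 100%, and the resulting G+C content of 38.23% follows directly from the abstract's own figures.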

19.
The gene for Saccharomyces cerevisiae inorganic pyrophosphatase, PPA, has been cloned by hybridization of "long" oligonucleotide probes with both cDNA and genomic S. cerevisiae libraries. The nucleotide sequence of 1612 bp from a genomic subclone that includes the entire coding region gives a deduced amino acid sequence that has nine differences (out of a total of 286 residues) from the previously published amino acid sequence that was determined directly. The codon usage in PPA is as expected for a "highly expressed" yeast gene. The upstream region contains a poly dA/dT sequence that might comprise a constitutive promoter. The PPA gene appears to be present in a single copy within the S. cerevisiae genome and has been localized to chromosome II.

20.
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.

Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it. Milne 1926
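
The densities and percentages quoted in item 20 can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the abstract's own numbers; the helper names are hypothetical, and the exact length of the sequenced region is not given in the abstract beyond "nearly 3 Mb".

```python
def implied_span_kb(count, spacing_kb):
    """Region length implied by `count` features spaced `spacing_kb` apart."""
    return count * spacing_kb


def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


GENES = 218
TRANSPOSABLE_ELEMENTS = 17

# Both densities should point to a region of just under 3000 kb.
print(implied_span_kb(GENES, 13))                   # 2834
print(implied_span_kb(TRANSPOSABLE_ELEMENTS, 171))  # 2907

# Shares of the 218 genes with an EST match or a protein similarity.
print(percent(95, GENES), percent(144, GENES))      # 44 66
```

Both implied spans fall between 2.8 and 3.0 Mb, which agrees with the "nearly 3 Mb" stated at the start of the abstract.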


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration: 京ICP备09084417号