Similar Documents
1.

Background

Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed.

Results

In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR outputs a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we tested CAR on a real dataset composed of several prokaryotic genomes and compared its performance with several other reference-based contig assembly tools. Our experimental results show that CAR outperforms all of these tools in terms of sensitivity, precision and genome coverage.
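The abstract does not describe CAR's internals. As a minimal sketch of what "ordering and orienting contigs against a reference" means, assume each contig has already been aligned to the reference and reduced to a single best hit (the hypothetical `Hit` record below). Sorting the hits by reference position gives the order, the alignment strand gives the orientation, and a hypothetical `max_gap` threshold splits the result into separate scaffolds. This is an illustration of the general idea, not CAR's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    contig: str     # contig identifier from the multi-FASTA input
    ref_start: int  # 0-based start of the contig's best alignment on the reference
    ref_end: int    # exclusive end of that alignment
    strand: str     # "+" if the contig aligns as given, "-" if it aligns reverse-complemented

def order_and_orient(hits, max_gap=10_000):
    """Group aligned contigs into scaffolds of (contig, strand) pairs.

    Contigs are ordered by where they align on the reference and keep the
    strand of their alignment ("-" means the contig should be reverse-complemented).
    A hypothetical rule starts a new scaffold whenever the reference gap to the
    previous contig exceeds max_gap.
    """
    scaffolds, current, prev_end = [], [], 0
    for hit in sorted(hits, key=lambda h: h.ref_start):
        if current and hit.ref_start - prev_end > max_gap:
            scaffolds.append(current)  # gap too large: close the current scaffold
            current = []
        if not current:
            prev_end = hit.ref_end     # first contig of a new scaffold
        current.append((hit.contig, hit.strand))
        prev_end = max(prev_end, hit.ref_end)
    if current:
        scaffolds.append(current)
    return scaffolds
```

For example, hits for `c2` (reference 5,000 to 9,000, minus strand), `c1` (0 to 4,500, plus strand) and `c3` (30,000 to 36,000, plus strand) yield two scaffolds: `c1` followed by reverse-complemented `c2`, then `c3` on its own, because the 21 kb gap exceeds the assumed threshold.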

Conclusions

CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0381-3) contains supplementary material, which is available to authorized users.

2.
Polyploidy, the presence of multiple sets of chromosomes that are similar but not identical, complicates both chromosome walking and assembly of sequence-ready contigs for many plant taxa including a large number of economically-significant crops. Traditional ‘dot-blot hybridization’ or PCR-based assays for identifying BAC clones corresponding to a mapped DNA landmark usually do not provide sufficient information to distinguish between allelic and non-allelic loci. A restriction fragment matching method using pools of BAC DNA in combination with dot-blots reveals the locus specificity of individual BACs that correspond to multi-locus DNA probes, in a manner that can efficiently be applied on a large scale. This approach also provides an alternative means of mapping DNA loci that exploits many advantages of ‘radiation hybrid’ mapping in taxa for which such hybrids are not available. The BAC-RF method is a practical and reliable approach for using high-density RFLP maps to anchor sequence-ready BAC contigs in highly-duplicated genomes, provides an alternative to high-density robotic gridding for screening BAC libraries when the necessary equipment is not available, and permits the expedient isolation of individual members of multigene or repetitive DNA families for a wide range of genetic and evolutionary investigations.

3.
4.
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read." Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of "overlaps," i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the "UMD Overlapper," can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera's Drosophila reads. When we replaced Celera's overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
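Two ideas in this abstract can be made concrete. First, the abstract does not say which scale the quality values use; the sketch assumes the widely used Phred scale, Q = -10·log10(p), where p is the estimated error probability. Second, an "overlap" pairs a suffix of one read with a prefix of another. The hypothetical `suffix_prefix_overlap` helper below finds only exact overlaps; the UMD Overlapper corrects errors and tolerates mismatches, so this is a simplified illustration rather than its method.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality value Q = -10*log10(p)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred-scaled quality value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of read a that exactly equals a prefix of read b.

    Returns 0 if no exact overlap of at least min_len bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Try the longest possible overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0
```

Under the Phred assumption, Q30 corresponds to a 1-in-1,000 chance that a base is wrong, and reads `ACGTTGCA` and `TGCAAAC` share a 4-base exact overlap (`TGCA`).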

5.
Gene finding in novel genomes

Background  

Computational gene prediction continues to be an important problem, especially for genomes with little experimental data.

6.

Background

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.

Results

To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.

Conclusions

Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.

7.

Background  

Physical maps are the substrate of genome sequencing and map-based cloning, and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contigs (3-5 clones before merging) as well as unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, the low power of clone-ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).

8.
Advances in the field of genomics and 'metagenomics' have dramatically revised our view of microbial biodiversity and its potential for biotechnological applications. Given estimates that >99% of microorganisms in most environments are not amenable to culturing, very little is known about their genomes, genes and encoded enzymatic activities. The isolation, archiving and analysis of environmental DNA (so-called 'metagenomes') has enabled us to mine microbial diversity, allowing us to access their genomes, identify protein-coding sequences and even reconstruct biochemical pathways, providing insights into the properties and functions of these organisms. The generation and analysis of (meta)genomic libraries is thus a powerful approach to harvest and archive environmental genetic resources. It will enable us to identify which organisms are present, what they do, and how their genetic information can be beneficial to mankind.

9.

Background

Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome source of the reads. The results were presented in a website, which includes a data graphing tool, all created to help users rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.
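To illustrate what "simulated paired-end reads with varying spacing" involves, here is a minimal sketch of an error-free paired-end read simulator. The function name and parameters (`simulate_pairs`, `read_len`, `insert_size`) are hypothetical and are not Plantagora's interface; real simulators also model platform-specific sequencing errors and a distribution of insert sizes rather than a fixed value.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pairs(genome, n_pairs, read_len, insert_size, seed=0):
    """Draw error-free paired-end reads from random fragments of genome.

    Each fragment is insert_size bp long; mate 1 is its first read_len bases
    (forward strand) and mate 2 is the reverse complement of its last read_len
    bases. Returns (fragment_start, mate1, mate2) tuples.
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        pairs.append((start, fragment[:read_len], reverse_complement(fragment[-read_len:])))
    return pairs
```

Varying `insert_size` across runs reproduces, in a very simplified way, the "varying paired end spacing" the abstract describes; the resulting pairs could then be handed to an assembler and the output scored against the known source sequence.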

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, as well as conclusions about some specific approaches. Plantagora also provides a platform of metrics and tools for further study of the sequencing and assembly process.

10.
Studies of the complete hepatitis C virus (HCV) life cycle have become possible with the development of an infectious cell culture system using the genotype 2a isolate JFH-1. Taking advantage of this system in the present study, we investigated whether HCV infection leads to superinfection exclusion, a state in which HCV-infected cells are resistant to secondary HCV infection. To discriminate between viral genomes, we inserted genes encoding fluorescent proteins in frame into the 3'-terminal NS5A coding region. These genomes replicated to wild-type levels and supported the production of infectious virus particles. Upon simultaneous infection of Huh-7 cells, co-replication of both viral genomes in the same cell was detected. However, when infections were performed sequentially, secondary infection was severely impaired. This superinfection exclusion was neither due to a reduction of cell surface expression of CD81 and scavenger receptor BI, two molecules implicated in HCV entry, nor due to a functional block at the level of virus entry. Instead, superinfection exclusion was mediated primarily by interference at the level of HCV RNA translation and, presumably, also replication. In summary, our results describe the construction and characterization of viable monocistronic HCV reporter genomes allowing detection of viral replication in infected living cells. By using these genomes, we found that HCV induces superinfection exclusion, which is primarily due to interference at a post-entry step.

11.
Assembly of DNA ‘parts’ to create larger constructs is an essential enabling technique for bioengineering and synthetic biology. Here we describe a simple method, PaperClip, which allows flexible assembly of multiple DNA parts from currently existing libraries cloned in any vector. No restriction enzymes, mutagenesis of internal restriction sites, or reamplification to add end homology are required. Order of assembly is directed by double-stranded oligonucleotides, or ‘Clips’. Clips are formed by ligation of pairs of oligonucleotides corresponding to the ends of each part. PaperClip assembly can be performed by polymerase chain reaction or by cell extract-mediated recombination. Once multi-use Clips have been prepared, assembly of at least six DNA parts in any order can be accomplished with high efficiency within several hours.

12.
Binding of the N-terminus of fibronectin to assembly sites on the cell surface is an essential step in fibronectin fibrillogenesis. Fibronectin matrix assembly sites have customarily been quantified using an iodinated 70 kDa N-terminal fibronectin fragment. The 125I-70 K fragment is a less than ideal reagent because its preparation requires large amounts of plasma fibronectin and it has a fairly short shelf life. An additional limitation is that the cells responsible for binding the 125I-70 K cannot be quantified or identified directly but must be assessed in parallel cultures. To overcome these disadvantages, we developed an ELISA-based assay using a recombinant HA-tagged 70 K fragment. This assay allows for the simultaneous quantification and localization of matrix assembly sites on the surface of adherent cells.

13.

Background

There is a need to characterize genomes of the foodborne pathogen, Salmonella enterica serovar Enteritidis (SE) and identify genetic information that could be ultimately deployed for differentiating strains of the organism, a need that is yet to be addressed mainly because of the high degree of clonality of the organism. In an effort to achieve the first characterization of the genomes of SE of Canadian origin, we carried out massively parallel sequencing of the nucleotide sequence of 11 SE isolates obtained from poultry production environments (n = 9), a clam and a chicken, assembled finished genomes and investigated diversity of the SE genome.

Results

The median genome size was 4,678,683 bp. A total of 4,833 chromosomal genes defined the pan genome of our field SE isolates, consisting of 4,600 genes present in all the genomes (the core genome) and 233 genes absent in at least one genome (the accessory genome). Genome diversity was demonstrated by 1,360 loci showing single nucleotide polymorphism (SNP) in the core genome, which were used to portray the genetic distances among the SE isolates by means of a phylogenetic tree. The accessory genome consisted mostly of previously identified SE prophage sequences as well as two apparently full-sized novel prophages, namely a 28 kb sequence provisionally designated as SE-OLF-10058 (3) prophage and a 43 kb sequence provisionally designated as SE-OLF-10012 prophage.
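The pan/core/accessory split is plain set arithmetic: the pan genome is the union of all isolates' gene sets, the core genome is their intersection, and the accessory genome is the difference, which is why 4,600 core plus 233 accessory genes gives the 4,833-gene pan genome. The sketch below shows that arithmetic plus a pairwise count of differing SNP alleles, the kind of distance a tree could be built from. Gene identifiers and SNP strings are assumed inputs (real pipelines must first cluster genes into families and call SNPs), and the abstract does not say which tree-building method was used, so none is implemented here.

```python
from itertools import combinations

def pan_core_accessory(gene_sets):
    """Split gene families into pan, core and accessory sets.

    gene_sets maps isolate name -> set of gene (family) identifiers.
    """
    sets = list(gene_sets.values())
    pan = set().union(*sets)        # present in at least one isolate
    core = set.intersection(*sets)  # present in every isolate
    return pan, core, pan - core    # accessory: absent from at least one isolate

def snp_distances(snp_profiles):
    """Pairwise count of differing alleles between equal-length SNP strings.

    snp_profiles maps isolate name -> string with one allele per core-genome SNP locus.
    """
    distances = {}
    for (a, seq_a), (b, seq_b) in combinations(sorted(snp_profiles.items()), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("SNP profiles must cover the same loci")
        distances[(a, b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return distances
```

With three toy isolates whose gene sets differ only by one prophage gene each, the two prophage genes land in the accessory set while the shared genes form the core; by construction, the pan-genome size always equals core plus accessory.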

Conclusions

The number of SNPs identified in the relatively large core genome of SE reflects substantial diversity that could be exploited for strain differentiation, as shown by the development of an informative phylogenetic tree. Prophage sequences can also be exploited for SE strain differentiation and lineage tracking. This work has laid the groundwork for further studies to develop a readily adoptable laboratory test for the subtyping of SE.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-713) contains supplementary material, which is available to authorized users.

14.
Fragment-based lead discovery is a new approach for lead generation that has emerged in the past decade. Because the initial fragments identified in the fragment screening typically show weak binding affinity, an intensive medicinal chemistry effort would be required to grow initial fragments into a potential lead compound. Here we demonstrate a kinase focused evolved fragment (KFEF) library, constructed by click chemistry-based fragment assembly, that is a valuable source of kinase inhibitors. This combinatorial assembly of two fragments, kinase-privileged alkyne fragments and diversified azide fragments, by two cycloaddition reactions shows a unique potential for the one-step synthesis of structurally diverse evolved fragments. The screening of this triazole-based KFEF library allowed the rapid identification of potent lead candidates for FLT3 and GSK3β kinase.

15.
Genome Biology 2014, 15(3):R59

Background

The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next-generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on a de novo analysis of the repetitive content, estimated to encompass 82% of the genome.
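The "N50 scaffold size of 66.9 kbp" quoted above uses a standard assembly statistic: N50 is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal sketch of the usual computation follows; note that some reports instead take half of an estimated genome size (often called NG50), which can give a different value.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the length at
    which the running total first reaches half of all assembled bases is N50.
    """
    if not lengths:
        raise ValueError("no sequences")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
```

For lengths 2, 3, 4, 5, 6, 7 and 8 (35 bases in total), the running sum 8 + 7 + 6 = 21 is the first to reach 17.5, so the N50 is 6.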

Conclusions

In addition to their value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

16.
Genome shrinkage is a common feature of most intracellular pathogens and symbionts. Genome reduction is among the best-characterized evolutionary strategies by which intracellular organisms avoid maintaining costly, redundant biological processes. Endosymbiotic bacteria of insects are examples of biological economy taken to completion because their genomes are dramatically reduced. These bacteria are nonmotile, and their biochemical processes are intimately related to those of their host. Because of this relationship, many of the processes in these bacteria have been either lost or massively remodeled to adapt to the intracellular symbiotic lifestyle. An example of such changes is the flagellum, a structure that is essential for bacterial motility and infectivity. Our analysis indicates that genes responsible for flagellar assembly have been partially or totally lost in most intracellular symbionts of gamma-Proteobacteria. Comparative genomic analyses show that flagellar genes have been differentially lost in endosymbiotic bacteria of insects. Only proteins involved in protein export within the flagellar assembly pathway (the type III secretion system and the basal body) have been kept in most of the endosymbionts, whereas those involved in building the filament and hook of flagella have been kept in only a few instances, indicating a change in the functional purpose of this pathway. In some endosymbionts, genes controlling the protein-export switch and hook length have undergone functional divergence, as shown through an analysis of their evolutionary dynamics. Based on our results, we suggest that flagellar genes have diverged functionally so as to specialize in the export of proteins from the bacterium to the host.

17.
Anti-neoplastic cytostatic antiproliferative agents, such as methotrexate, 6-mercaptopurine and cyclophosphamide, were originally used as immunosuppressive drugs. Although these agents induced only modest anti-rejection activity, they caused serious non-specific bone marrow suppression, impairing host resistance and increasing the incidence of infections. Unlike these non-selective agents, cyclosporine A, tacrolimus and sirolimus act more selectively on different stages of the T-lymphocyte (T-cell) and B-lymphocyte (B-cell) activation cycles; however, cyclosporine and tacrolimus are nephrotoxic, whereas sirolimus causes hypertriglyceridaemia. Thus, despite this progress, continued efforts must be made to develop and test new, potentially very selective agents. The agent 15-deoxyspergualin moderately inhibits both mitogen-stimulated T-cell proliferation and the generation of cytotoxic T lymphocytes (CTLs) but does not affect the production of interleukin 2 (IL-2). Another drug, FTY720, has a unique action to prevent rejection, by altering the homing of lymphocytes to the lymphoid compartments. The newest members of the family of antiproliferative agents, namely mycophenolate mofetil, leflunomide and brequinar, are potentially more selective than their predecessors. However, the most promising agents are produced using antisense technology. This approach involves the design of antisense oligodeoxynucleotides; these novel drugs are designed to block allograft rejection by blocking selected messenger RNA (mRNA). This review outlines the mechanisms of action, the limitations of application and the molecular or cellular targets of traditional agents, newly developed drugs and also antisense technology, which is an example of a new application of molecular medicine.

18.
We offer a guide to de novo genome assembly using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software.

19.
Uncovering functional associations for genes and gene products remains one of the most significant challenges in biology. The classical approaches, such as homology detection, are mainly suited for predicting approximate molecular function of a protein and should be used in context with other methods. Several studies have emerged that employ knowledge-based procedures to extract functional data for genes from a variety of biological sources. However, data derived from a single biological resource often provides only a limited perspective on their functional associations largely due to systematic bias in the underlying data. The post-genomic era has witnessed the emergence of knowledge-based studies that aim to decipher functional associations by combining several biological evidence types. These are expected to provide better insights into the functional aspects of diverse genes, genomes and networks.

20.
Polyomaviruses are small nonenveloped particles with a circular double-stranded genome, approximately 5 kbp in size. The mammalian polyomaviruses mainly cause persistent subclinical infections in their natural nonimmunocompromised hosts. In contrast, the polyomaviruses of birds, avian polyomavirus (APV) and goose hemorrhagic polyomavirus (GHPV), are the primary agents of acute and chronic disease with high mortality rates in young birds. Screening of field samples of diseased birds by consensus PCR revealed the presence of two novel polyomaviruses in the liver of a Eurasian bullfinch (Pyrrhula pyrrhula griseiventris) and in the spleen of a Eurasian jackdaw (Corvus monedula), tentatively designated as finch polyomavirus (FPyV) and crow polyomavirus (CPyV), respectively. The genomes of the viruses were amplified by using multiply primed rolling-circle amplification and cloned. Analysis of the FPyV and CPyV genome sequences revealed a close relationship to APV and GHPV, indicating the existence of a distinct avian group among the polyomaviruses. The main characteristics of this group are (i) involvement in fatal disease, (ii) the existence of an additional open reading frame in the 5' region of the late mRNAs, and (iii) a different manner of DNA binding of the large tumor antigen compared to that of the mammalian polyomaviruses.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417