1.

Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats

Manisha Brahmachary Audrey Guilmatre Javier Quilez Dan Hasson Christelle Borel Peter Warburton Andrew J. Sharp 《PLoS genetics》2014,10(6)

Tandem repeats are common in eukaryotic genomes, but due to difficulties in assaying them remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach to perform accurate measurement of tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10 fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs, and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations of tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies, and that copy number variation at these loci can exert functionally significant effects. 
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.
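In GWAS practice, a variant counts as "tagged" when some genotyped SNP is in strong linkage disequilibrium with it, usually measured as the squared correlation r² across samples. A minimal sketch of that check for a multiallelic copy-number variant, treating SNP genotypes as 0/1/2 dosages and copy number as a numeric value; the r² ≥ 0.8 cutoff and the sample values are illustrative assumptions, not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def is_tagged(snp_dosage, copy_number, r2_threshold=0.8):
    """True if r^2 between SNP dosage and copy number reaches the (assumed) threshold."""
    r = pearson_r(snp_dosage, copy_number)
    return r * r >= r2_threshold

# Hypothetical samples: SNP dosage (0/1/2) and estimated repeat copy number
dosage = [0, 0, 1, 1, 1, 2, 2, 2]
copies = [12, 30, 8, 25, 41, 15, 33, 9]
print(is_tagged(dosage, copies))
```

Because r is squared, a SNP whose minor allele tracks fewer copies tags the variant just as well as one tracking more copies.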

2.

De novo assembly and characterization of the complete chloroplast genome of radish (Raphanus sativus L.)

Young-Min Jeong Won-Hyung Chung Jeong-Hwan Mun Namshin Kim Hee-Ju Yu 《Gene》2014

Radish (Raphanus sativus L.) is an edible root vegetable crop that is cultivated worldwide and whose genome has been sequenced. Here we report the complete nucleotide sequence of the radish cultivar WK10039 chloroplast (cp) genome, along with a de novo assembly strategy using whole genome shotgun sequence reads obtained by next generation sequencing. The radish cp genome is 153,368 bp in length and has a typical quadripartite structure, composed of a pair of inverted repeat regions (26,217 bp each), a large single copy region (83,170 bp), and a small single copy region (17,764 bp). The radish cp genome contains 87 predicted protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence analysis revealed the presence of 91 simple sequence repeats (SSRs) in the radish cp genome.
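Because a quadripartite plastome consists of one large single-copy region, one small single-copy region and two copies of the inverted repeat, the reported region lengths should sum to the total genome length. A small consistency check using the figures given above:

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat (IR)."""
    return lsc_bp + ssc_bp + 2 * ir_bp

# Radish cultivar WK10039 figures from the abstract
radish_total = quadripartite_length(lsc_bp=83_170, ssc_bp=17_764, ir_bp=26_217)
assert radish_total == 153_368

# Gene tally from the abstract: protein-coding + tRNA + rRNA genes
radish_genes = 87 + 37 + 8
print(radish_total, radish_genes)
```

The same arithmetic applies to any quadripartite plastome; a mismatch usually means one region boundary was reported inconsistently.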

3.

Biased distributions and decay of long interspersed nuclear elements in the chicken genome

Abrusán G Krambeck HJ Junier T Giordano J Warburton PE 《Genetics》2008,178(1):573-581

The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and as a consequence the genome size of birds, we analyzed the distributions of LINEs (CR1s) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.

4.

Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs

Tammi MT Arner E Britton T Andersson B 《Bioinformatics (Oxford, England)》2002,18(3):379-388

An increasingly important problem in genome sequencing is the failure of the commonly used shotgun assembly programs to correctly assemble repetitive sequences. The assembly of non-repetitive regions or regions containing repeats considerably shorter than the average read length is in practice easy to solve, while longer repeats have been a difficult problem. We here present a statistical method to separate arbitrarily long, almost identical repeats, which makes it possible to correctly assemble complex repetitive sequence regions. The differences between repeat units may be as low as 1% and the sequencing error may be up to ten times higher. The method is based on the realization that a comparison of only a part of all overlapping sequences at a time in a data set does not generate enough information for a conclusive analysis. Our method uses optimal multi-alignments consisting of all the overlaps of each read. This makes it possible to determine defined nucleotide positions, DNPs, which constitute the differences between the repeat units. Differences between repeats are distinguished from sequencing errors using statistical methods, where the probabilities of obtaining certain combinations of candidate DNPs are calculated using the information from the multi-alignments. The use of DNPs and combinations of DNPs will allow for optimal and rapid assemblies of repeated regions. This method can solve repeats that differ in only two positions in a read length, which is the theoretical limit for repeat separation. We predict that this method will be highly useful in shotgun sequencing in the future.
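The underlying statistical question, whether a minority base seen in several reads at one alignment column is a real repeat difference or just sequencing error, can be illustrated with a single-column binomial tail probability. This is a deliberately simplified stand-in that assumes independent errors at a uniform per-base rate; the published DNP method scores combinations of candidate DNPs across multi-alignments, which this sketch does not attempt. The significance level `alpha` is an illustrative assumption.

```python
from math import comb

def error_tail_probability(n_reads, n_discordant, error_rate):
    """P(X >= n_discordant) for X ~ Binomial(n_reads, error_rate): the chance that
    at least n_discordant of n_reads show a different base purely by sequencing error."""
    if not 0 <= n_discordant <= n_reads:
        raise ValueError("n_discordant must lie between 0 and n_reads")
    return sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(n_discordant, n_reads + 1)
    )

def is_candidate_dnp(n_reads, n_discordant, error_rate, alpha=1e-3):
    """Flag a column whose discordant bases are too many to blame on error alone."""
    return error_tail_probability(n_reads, n_discordant, error_rate) < alpha

# Hypothetical column: 20 overlapping reads, 10 carry a different base, 1% error rate
print(is_candidate_dnp(20, 10, 0.01))
```

A single discordant read among 20 is unremarkable at a 1% error rate, whereas half the reads disagreeing is far beyond what random errors would produce, which is the intuition behind separating repeat copies by consistent differences.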

5.

Complete Chloroplast Genome of Tanaecium tetragonolobum: The First Bignoniaceae Plastome

Alison Gonçalves Nazareno Monica Carlsen Lúcia Garcez Lohmann 《PloS one》2015,10(6)

Bignoniaceae is a Pantropical plant family that is especially abundant in the Neotropics. Members of the Bignoniaceae are diverse in many ecosystems and represent key components of the Tropical flora. Despite the ecological importance of the Bignoniaceae and all the efforts to reconstruct the phylogeny of this group, whole chloroplast genome information has not yet been reported for any members of the family. Here, we report the complete chloroplast genome sequence of Tanaecium tetragonolobum (Jacq.) L.G. Lohmann, which was reconstructed using de novo and reference-based assembly of single-end reads generated by shotgun sequencing of total genomic DNA on an Illumina platform. The gene order and organization of the chloroplast genome of T. tetragonolobum exhibit the general structure of flowering plants, and are similar to other Lamiales chloroplast genomes. The chloroplast genome of T. tetragonolobum is a circular molecule of 153,776 base pairs (bp) with a quadripartite structure containing two single copy regions, a large single copy region (LSC, 84,612 bp) and a small single copy region (SSC, 17,586 bp) separated by inverted repeat regions (IRs, 25,789 bp). In addition, the chloroplast genome of T. tetragonolobum has 38.3% GC content and includes 121 genes, of which 86 are protein-coding, 31 are transfer RNA, and four are ribosomal RNA. The chloroplast genome of T. tetragonolobum presents a total of 47 tandem repeats and 347 simple sequence repeats (SSRs) with mononucleotides being the most common and di-, tri-, tetra-, and hexanucleotides occurring less frequently. The results obtained here were compared to other chloroplast genomes of Lamiales available to date, providing new insight into the evolution of chloroplast genomes within Lamiales. Overall, the evolutionary rates of genes in Lamiales are lineage-, locus-, and region-specific, indicating that the evolutionary pattern of nucleotide substitution in chloroplast genomes of flowering plants is complex. 
The discovery of tandem repeats within T. tetragonolobum and the presence of divergent regions between chloroplast genomes of Lamiales provide the basis for the development of markers at various taxonomic levels. The newly developed markers have the potential to greatly improve the resolution of molecular phylogenies.

6.

Shotgun haplotyping: a novel method for surveying allelic sequence variation

Lindsay SJ Bonfield JK Hurles ME 《Nucleic acids research》2005,33(18):e152

Haplotypic sequences contain significantly more information than genotypes of genetic markers and are critical for studying disease association and genome evolution. Current methods for obtaining haplotypic sequences require the physical separation of alleles before sequencing, are time consuming and are not scalable for large surveys of genetic variation. We have developed a novel method for acquiring haplotypic sequences from long PCR products using simple, high-throughput techniques. This method applies modified shotgun sequencing protocols to sequence both alleles concurrently, with read-pair information allowing the two alleles to be separated during sequence assembly. Although the haplotypic sequences can be assembled manually from the resultant data using pre-existing sequence assembly software, we have devised a novel heuristic algorithm to automate assembly and remove human error. We validated the approach on two long PCR products amplified from the human genome and confirmed the accuracy of our sequences against full-length clones of the same alleles. This method presents a simple high-throughput means to obtain full haplotypic sequences potentially up to 20 kb in length and is suitable for surveying genetic variation even in poorly-characterized genomes as it requires no prior information on sequence variation.

7.

Correcting base-assignment errors in repeat regions of shotgun assembly

Zhi D Keich U Pevzner P Heber S Tang H 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(1):54-64

Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Many genome sequencing projects other than Wolbachia have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project.

8.

Species differences in tandem repeat sequences and their biological functions

Gao Huan Kong Jie 《Zoological Research》2005,26(5):555-564

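Tandem repeats are sequences in which a core repeat unit of roughly 1–200 bases is repeated many times in head-to-tail fashion. They are widespread in the genomes of eukaryotes and some prokaryotes, and show specificity with respect to species, base composition and other features. At the whole-genome level, the predominant repeat types differ among species. Even within a single repeat type, different repeat motif classes (e.g., AT, AC) vary greatly in their abundance in the genome. These repeat types and motif classes likewise show species- and base-composition-specific differences among the chromosomes of a single species, and between the coding and non-coding regions of genes. These differences reflect the complex origin and evolution of repeat sequences, which may involve multiple mechanisms and factors and are closely related to biological function. In addition, repeat-analysis software and statistical criteria still have unresolved issues concerning algorithms, repeat length and repeat perfection, which need further study. The evolutionary relationships among tandem repeats themselves, their evolutionary status at the whole-genome level, their biological functions in the genome, and the construction and application of repeat databases will be the main topics of future research.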
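The definition above (a core unit repeated head-to-tail) translates directly into a scan for perfect tandem repeats. A minimal sketch: the default unit-length range (1–6 bp, the microsatellite range) and minimum copy number are illustrative parameters, and unlike dedicated tools such as Tandem Repeats Finder, this version does not tolerate mismatches or indels between copies.

```python
def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter motif (e.g. 'ATAT' is not)."""
    return unit not in (unit + unit)[1:-1]

def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Greedy left-to-right scan for perfect tandem repeats.

    Returns (start, motif, copies) tuples with 0-based starts; at each position
    the shortest primitive motif reaching min_copies is reported.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for u in range(1, max_unit + 1):
            unit = seq[i:i + u]
            if len(unit) < u or not is_primitive(unit):
                continue
            j = i + u
            while seq[j:j + u] == unit:  # extend copy by copy, head to tail
                j += u
            copies = (j - i) // u
            if copies >= min_copies:
                best = (i, unit, copies)
                break
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip past the whole repeat array
        else:
            i += 1
    return hits

print(find_tandem_repeats("GGACACACACTT"))
```

Raising `max_unit` toward 200 covers the full unit range in the definition, at the cost of a slower scan.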

9.

Whole-genome sequencing and assembly with high-throughput, short-read technologies

Sundquist A Ronaghi M Tang H Pevzner P Batzoglou S 《PloS one》2007,2(5):e484

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.

10.

Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data

Finotello F Lavezzo E Fontana P Peruzzo D Albiero A Barzon L Falda M Di Camillo B Toppo S 《Briefings in bioinformatics》2012,13(3):269-280

Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need for a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of read coverage and repeat content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess assemblers' performance, quantifying and characterizing assembly gaps and errors, and evaluating their ability to solve complex genomic regions containing repeats. The final goal of this analysis is to highlight pros and cons of each method, in order to provide the final user with general criteria for the right choice of the appropriate assembly strategy, depending on the specific needs. A further aspect we have explored is the relationship between coverage of a sequencing project and quality of the obtained results. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30×.
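Coverage in this sense is usually the redundancy c = N·L/G (number of reads × mean read length / genome size). A quick calculator for turning a 20-30× target into a read budget; the 5 Mb genome and 400 bp mean read length below are illustrative assumptions, not figures from the study.

```python
import math

def coverage(num_reads, mean_read_length, genome_size):
    """Average sequencing depth c = N * L / G."""
    return num_reads * mean_read_length / genome_size

def reads_needed(target_coverage, mean_read_length, genome_size):
    """Smallest number of reads that reaches the target depth."""
    return math.ceil(target_coverage * genome_size / mean_read_length)

# Hypothetical bacterial project: 5 Mb genome, 400 bp mean read length
for target in (20, 30):
    print(target, reads_needed(target, 400, 5_000_000))
```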

11.

Genome wide survey of microsatellites in ssDNA viruses infecting vertebrates

Ankit Jain Nikhil Mittal Prakash C. Sharma 《Gene》2014

Microsatellites or Simple Sequence Repeats (SSRs) are tandem iterations of one to six base pairs, non-randomly distributed throughout prokaryotic and eukaryotic genomes. Limited knowledge is available about the distribution of microsatellites in single stranded DNA (ssDNA) viruses, particularly vertebrate-infecting viruses. We studied microsatellite distribution in 118 ssDNA virus genomes belonging to three families of vertebrate-infecting viruses, namely Circoviridae, Parvoviridae, and Anelloviridae, and found that microsatellites constitute an important component of these virus genomes. Mononucleotide repeats were predominant followed by dinucleotide and trinucleotide repeats. A strong positive relationship existed between number of mononucleotide repeats and genome size among all the three virus families. A similar relationship existed for the occurrence of DTTPH (di-, tri-, tetra-, penta- and hexa-nucleotide) repeats in the families Anelloviridae and Parvoviridae only. Relative abundance and relative density of mononucleotide repeats showed a strong positive relationship with genome size in Circoviridae and Parvoviridae. However, in the case of DTTPH repeats, these features showed a strong relationship with genome size in Circoviridae only. On the other hand, relative microsatellite abundance and relative density of mononucleotide repeats were negatively correlated with GC content (%) in Parvoviridae genomes. On the basis of available annotations, our analysis revealed maximum occurrence of mononucleotide as well as DTTPH repeats in the coding regions of these virus genomes. Interestingly, after normalizing the length of the coding and non-coding regions of each virus genome, we found relative density of microsatellites much higher in the non-coding regions. 
We believe that the present study will help in the better characterization of the stability, genome organization and evolution of these virus classes and may provide useful leads to decipher the etiopathogenesis of these viruses.
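In microsatellite surveys, relative abundance and relative density are commonly defined as the number of microsatellites per kb of sequence and the total microsatellite length (bp) per kb, respectively. Assuming those definitions (the abstract itself does not spell them out), both metrics and the coding/non-coding normalization can be sketched as follows; the region sizes and repeat lengths are hypothetical.

```python
def relative_abundance(num_microsatellites, seq_length_bp):
    """Number of microsatellites per kb of sequence."""
    return num_microsatellites / (seq_length_bp / 1000)

def relative_density(microsatellite_lengths_bp, seq_length_bp):
    """Total microsatellite length (bp) per kb of sequence."""
    return sum(microsatellite_lengths_bp) / (seq_length_bp / 1000)

# Hypothetical 2 kb genome: 1.5 kb coding and 0.5 kb non-coding sequence
coding_repeats = [10, 12]   # repeat lengths (bp) found in coding regions
noncoding_repeats = [15]    # repeat lengths (bp) found in non-coding regions
print(relative_density(coding_repeats, 1500), relative_density(noncoding_repeats, 500))
```

In this toy case the coding region holds more repeats in absolute terms, yet the non-coding region has the higher density once each is divided by its own length, which is the kind of reversal that normalization can reveal.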

12.

Fragment assembly with double-barreled data

Pevzner PA Tang H 《Bioinformatics (Oxford, England)》2001,17(Z1):S225-S233

For the last twenty years fragment assembly has been dominated by the "overlap - layout - consensus" algorithms that are used in all currently available assembly tools. However, the limits of these algorithms are being tested in the era of genomic sequencing and it is not clear whether they are the best choice for large-scale assemblies. Although the "overlap - layout - consensus" approach proved to be useful in assembling clones, it faces difficulties in genomic assemblies: the existing algorithms make assembly errors even in bacterial genomes. We abandoned the "overlap - layout - consensus" approach in favour of a new Eulerian Superpath approach that outperforms the existing algorithms for genomic fragment assembly (Pevzner et al. 2001 In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB-01), 256-26). In this paper we describe our new EULER-DB algorithm that, similarly to the Celera assembler, takes advantage of clone-end sequencing by using the double-barreled data. However, in contrast to the Celera assembler, EULER-DB does not mask repeats but uses them instead as a powerful tool for contig ordering. We also describe a new approach for the Copy Number Problem: "How many times is a given repeat present in the genome?". For long nearly-perfect repeats this question is notoriously difficult and some copies of such repeats may be "lost" in genomic assemblies. We describe our EULER-CN algorithm for the Copy Number Problem that proved to be successful in difficult sequencing projects.
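The Eulerian idea behind this line of work can be illustrated with the textbook de Bruijn construction: each k-mer becomes an edge from its (k−1)-mer prefix to its (k−1)-mer suffix, and a sequence with complete, error-free k-mer coverage is spelled by an Eulerian path through that graph. This sketch shows only that generic construction with Hierholzer's algorithm; it is not EULER-DB or EULER-CN, and it assumes error-free k-mers and that an Eulerian path exists.

```python
from collections import Counter, defaultdict

def de_bruijn_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm on a directed multigraph; assumes a path exists."""
    out_deg, in_deg = Counter(), Counter()
    for node, succs in graph.items():
        out_deg[node] += len(succs)
        for succ in succs:
            in_deg[succ] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start where out-degree exceeds in-degree by one; in a balanced graph
    # (an Eulerian cycle) any node with outgoing edges will do.
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    start = starts[0] if starts else next(n for n in nodes if out_deg[n] > 0)
    remaining = {node: list(succs) for node, succs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in this sub-tour
    return path[::-1]

def assemble(kmers):
    """Spell a sequence from an Eulerian path through the de Bruijn graph."""
    path = eulerian_path(de_bruijn_graph(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])

def kmers_of(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(assemble(kmers_of("ACGTTGCA", 3)))  # prints ACGTTGCA
```

When a (k−1)-mer recurs, as "ATG" does in `TAATGCCATGGGATGTT` with k = 3, several Eulerian paths use the same k-mers, so the assembled string is consistent with the data but need not equal the original; this ambiguity is precisely what makes repeats hard.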

13.

Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum

Clícia Grativol Michael Regulski Marcelo Bertalan W. Richard McCombie Felipe Rodrigues da Silva Adhemar Zerlotini Neto Renato Vicentini Laurent Farinelli Adriana Silva Hemerly Robert A. Martienssen Paulo Cavalcanti Gomes Ferreira 《The Plant journal : for cell and molecular biology》2014,79(1):162-172

Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene‐rich regions. Gene‐enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methylation filtration and the assembly quality of sequences submitted to gene‐enrichment strategy, we have compared assemblies using methyl‐filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA‐seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds from encoded enzymes of the sucrose/starch pathway discovered 291 single‐nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information obtained from the MF dataset provides a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop.

14.

Genome-Wide Analysis of Repetitive Elements in Papaya

Niranjan Nagarajan Rafael Navajas-Pérez Mihai Pop Maqsudul Alam Ray Ming Andrew H. Paterson Steven L. Salzberg 《Tropical plant biology》2008,1(3-4):191-201

Papaya (Carica papaya L.) is an important fruit crop cultivated in tropical and subtropical regions worldwide. A first draft of its genome sequence has been recently released. Together with Arabidopsis, rice, poplar, grapevine and other genomes in the pipeline, it represents a good opportunity to gain insight into the organization of plant genomes. Here we report a detailed analysis of repetitive elements in the papaya genome, including transposable elements (TEs), tandemly-arrayed sequences, and high copy number genes. These repetitive sequences account for ~56% of the papaya genome with TEs being the most abundant at 52%, tandem repeats at 1.3% and high copy number genes at 3%. Most common types of TEs are represented in the papaya genome with retrotransposons being the dominant class, accounting for 40% of the genome. The most prevalent retrotransposons are Ty3-gypsy (27.8%) and Ty1-copia (5.5%). Among the tandem repeats, microsatellites are the most abundant in number, but represent only 0.19% of the genome. Minisatellites and satellites are less abundant, but represent 0.68% and 0.43% of the genome, respectively, due to greater repeat length. Despite an overall smaller gene repertoire in papaya than many other angiosperms, a significant fraction of genes (>2%) are present in large gene families with copy number greater than 20. This repeat database clarified a major part of the papaya genome organization and partly explained the lower gene repertoire in papaya than in Arabidopsis.

15.

Gap statistics for whole genome shotgun DNA sequencing projects

Wendl MC Yang SP 《Bioinformatics (Oxford, England)》2004,20(10):1527-1534

MOTIVATION: Investigators utilize gap estimates for DNA sequencing projects. Standard theories assume sequences are independently and identically distributed, leading to appreciable under-prediction of gaps. RESULTS: Using a statistical scaling factor and data from 20 representative whole genome shotgun projects, we construct regression equations that relate coverage to a normalized gap measure. For prokaryotic genomes the gap measure does not correlate with sequence coverage, while eukaryotes show strong correlation if the chaff is ignored. Gaps decrease at an exponential rate of only about one-third of that predicted via theory alone. Case studies suggest that departure from theory can largely be attributed to assembly difficulties for repeat-rich genomes, but bias and coverage anomalies are also important when repeats are sparse. Such factors cannot be readily characterized a priori, suggesting upper limits on the accuracy of gap prediction. We also find that diminishing coverage probability discussed in other studies is a theoretical artifact that does not arise for the typical project.
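The "standard theory" here is the Lander–Waterman model, in which reads land independently and uniformly at random. In that model, with coverage c = N·L/G and a minimum detectable overlap T, the expected number of contigs (and hence of gaps between them) is N·e^(−c(1−T/L)), and the expected fraction of bases left uncovered is e^(−c). The sketch below evaluates those idealized predictions; per the abstract, real projects lose gaps at only about a third of this exponential rate, so these numbers should be read as optimistic lower bounds on gaps. The project sizes used are hypothetical.

```python
import math

def lander_waterman(num_reads, read_length, genome_size, min_overlap=0):
    """Idealized Lander-Waterman predictions for a shotgun project.

    Returns (coverage, expected_contigs, uncovered_fraction).
    """
    c = num_reads * read_length / genome_size
    sigma = 1 - min_overlap / read_length  # fraction of a read usable for detecting overlaps
    expected_contigs = num_reads * math.exp(-c * sigma)
    uncovered_fraction = math.exp(-c)
    return c, expected_contigs, uncovered_fraction

# Hypothetical project: 1,000 reads of 500 bp on a 100 kb genome (5x coverage)
print(lander_waterman(1_000, 500, 100_000))
```

Requiring a longer overlap before two reads are joined (larger `min_overlap`) raises the predicted number of contigs, because fewer neighbouring reads are recognized as overlapping.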

16.

Towards a whole‐genome sequence for rye (Secale cereale L.)

Eva Bauer Thomas Schmutzer Ivan Barilar Martin Mascher Heidrun Gundlach Mihaela M. Martis Sven O. Twardziok Bernd Hackauf Andres Gordillo Peer Wilde Malthe Schmidt Viktor Korzun Klaus F.X. Mayer Karl Schmid Chris‐Carolin Schön Uwe Scholz 《The Plant journal : for cell and molecular biology》2017,89(5):853-869

We report on a whole‐genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole‐genome shotgun sequencing of the 7.9‐Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low‐copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high‐density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high‐density genetic map and establishing a synteny‐based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome‐wide map of selection signals differentiating the divergent gene pools. This rye whole‐genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome‐based breeding in rye.

17.

Terminally Repeated Sequences on a Herpesvirus Genome Are Deleted following Circularization but Are Reconstituted by Duplication during Cleavage and Packaging of Concatemeric DNA

Daniel E. Nixon Michael A. McVoy 《Journal of virology》2002,76(4):2009-2013

The mechanisms underlying cleavage of herpesvirus genomes from replicative concatemers are unknown. Evidence from herpes simplex virus type 1 suggests that cleavage occurs by a nonduplicative process; however, additional evidence suggests that terminal repeats may also be duplicated during the cleavage process. This issue has been difficult to resolve due to the variable numbers of reiterated terminal repeats that the herpes simplex virus type 1 genome can contain. Guinea pig cytomegalovirus is a herpesvirus with a simple terminal repeat arrangement that defines two genome types. Type II genomes have a single copy of a 1-kb terminal repeat at both their left and right termini, whereas type I genomes have only one copy at their left termini and lack the repeat at their right termini. In a previous study, we constructed a recombinant guinea pig cytomegalovirus in which certain cis elements were disrupted such that only type II genomes were produced. Here we show that double repeats that are formed by circularization of infecting genomes are rapidly converted to single repeats, such that the junctions between genomes within replicative concatemers formed late in infection almost exclusively contain single copies of the terminal repeat. Therefore, for the recombinant virus, each cleavage event begins with a single repeat within a concatemer yet produces two repeats, one at each of the resulting termini, demonstrating that terminal repeat duplication occurs in conjunction with cleavage. For wild-type guinea pig cytomegalovirus, the formation of type I genomes further suggests that cleavage can also occur by a nonduplicative process and that duplicative and nonduplicative cleavage can occur concurrently. Other herpesviruses having terminal repeats, such as the herpes simplex viruses and human cytomegalovirus, may also utilize repeat duplication and deletion; however, the biological importance of these events remains unknown.

18.

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Jiang Du Robert D. Bjornson Zhengdong D. Zhang Yong Kong Michael Snyder Mark B. Gerstein 《PLoS computational biology》2009,5(7)

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. 
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
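A mappability map records, for each position of a reference, whether the k-mer starting there occurs only once, which marks where reads of that length can be placed unambiguously. A minimal sketch on a toy sequence, counting forward-strand exact matches only; the function names are illustrative, and practical mappability tracks also account for the reverse complement and for mismatches.

```python
from collections import Counter

def mappability(seq, k):
    """1 where the k-mer starting at that position is unique in seq, else 0
    (forward strand, exact matches only)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1 if counts[kmer] == 1 else 0 for kmer in kmers]

def unique_fraction(seq, k):
    """Fraction of k-mer start positions that are uniquely mappable."""
    track = mappability(seq, k)
    return sum(track) / len(track) if track else 0.0

toy = "ACGTACGTTT"
for k in (4, 5):
    print(k, mappability(toy, k), unique_fraction(toy, k))
```

Lengthening k from 4 to 5 makes every position unique in this toy sequence, a small-scale version of why longer reads resolve more repeat-containing regions.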

19.

Multiplex sequencing of bacterial artificial chromosomes for assembling complex plant genomes

Sebastian Beier Axel Himmelbach Thomas Schmutzer Marius Felder Stefan Taudien Klaus F. X. Mayer Matthias Platzer Nils Stein Uwe Scholz Martin Mascher 《Plant biotechnology journal》2016,14(7):1511-1522

Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

20.

Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes.

E P Rocha A Danchin A Viari 《Molecular biology and evolution》1999,16(9):1219-1230

Prokaryotic genomes seem to be optimized toward compactness and have therefore been thought to lack long redundant DNA sequences. However, we identified a large number of long strict repeats in eight prokaryotic complete genomes and found that their density is negatively correlated with genome size. A detailed analysis of the long repeats present in the genome of Bacillus subtilis revealed a very strict constraint on the spatial distribution of repeats in this genome. We interpret this as the hallmark of selection processes leading to the addition of new genetic information. Such addition is independent of insertion sequences and relies on the nonspecific DNA uptake by the competent cell and its subsequent integration in the chromosome in a circular form through a Campbell-like mechanism. Similar patterns are found in other competent genomes of Gram-negative bacteria and Archaea, suggesting a similar evolutionary mechanism. The correlation of the spatial distribution of repeats and the absence of insertion sequences in a genome may indicate, in the framework of our model, that mechanisms aiming at their avoidance/elimination have been developed.
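Long strict (exact) repeats of at least a given length can be located by indexing every substring of that length and extending each shared seed to the right. A minimal sketch, quadratic in the number of occurrences of each seed, covering forward-strand repeats only; the minimum length is an illustrative parameter, and the repeat search used in the study may differ (for example, it may also consider reverse-complement repeats).

```python
from collections import defaultdict

def long_exact_repeats(seq, min_len):
    """Return sorted (i, j, length) for maximal exact forward repeats,
    i < j, with seq[i:i+length] == seq[j:j+length] and length >= min_len."""
    index = defaultdict(list)
    for p in range(len(seq) - min_len + 1):
        index[seq[p:p + min_len]].append(p)  # seed table of all min_len-mers
    results = []
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                # Skip pairs that also match one base to the left: the same
                # repeat is reported once, from its leftmost seed.
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue
                length = min_len
                while j + length < len(seq) and seq[i + length] == seq[j + length]:
                    length += 1  # extend the match to the right
                results.append((i, j, length))
    return sorted(results)

print(long_exact_repeats("GATTACACCCCGATTACATT", 6))
```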