Similar Articles
1.
Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, because NGS reads are too short to resolve long repeats, whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer based multiple sequence alignment, for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves the efficiency of data processing; and (v) by adopting corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves more satisfactory results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
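The abstract does not describe how overlap sequences between assemblies are found. As a rough illustration only, using a generic shared-k-mer approach rather than LongRepMarker's actual algorithm, the regions of one assembly that share k-mers with another can be located and merged into intervals. The function names and the choice of `k` are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_intervals(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Half-open intervals of `a` covered by k-mers that also occur in `b`.

    Overlapping or abutting k-mer hits are merged into a single interval,
    giving candidate shared (possibly repeated) regions between two sequences.
    """
    in_b = kmers(b, k)
    intervals: list[tuple[int, int]] = []
    for i in range(len(a) - k + 1):
        if a[i:i + k] not in in_b:
            continue
        start, end = i, i + k
        if intervals and start <= intervals[-1][1]:
            # Hit overlaps or touches the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    a = "AAAAGATTACATTTT"
    print(shared_intervals(a, "CCGATTACACC", 4))  # [(4, 11)], i.e. a[4:11] == "GATTACA"
```

A real tool would also need to handle reverse complements, sequencing errors and k-mers that occur many times; this sketch ignores all three.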

2.
Summary. Dialect-1, a species-specific repetitive DNA sequence of barley (Hordeum vulgare), was cloned and analysed by Southern blot and in situ hybridization. Dialect-1 is dispersed throughout all barley chromosomes, with a copy number of 5,000 per genome. Two DNA fragments related to Dialect-1 were identified in a phage library, subcloned and mapped. All three clones are structurally heterogeneous, suggesting that the full-length genomic repeat encompassing Dialect-1 is large. The Dialect-1 repeat is present in similar form and copy number in the genomes of H. vulgare and ssp. agriocrithon and spontaneum; it is present in rearranged form with reduced copy number in the genomes of H. bulbosum and H. murinum, and it is absent from the genomes of several wild barley species as well as from those of wheat, rye, oats and maize. The Dialect-1 repeat may be used as a molecular marker in taxonomic studies and for identifying barley chromosomes in interspecies hybrids.

3.
A clustering method for repeat analysis in DNA sequences
Volfovsky N, Haas BJ, Salzberg SL. Genome Biology, 2001, 2(8): research0027.1-research0027.11

Background

A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.

Results

The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi-FASTA) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome (BAC) end sequences.

Conclusions

We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences and organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software, RepeatFinder, should prove helpful in analysing the repeat structure of both complete and partial genome sequences.
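RepeatFinder organizes repeats with suffix trees. The sketch below swaps in a simpler structure, a naively built suffix array plus an LCP (longest common prefix) array, to show how exact repeats fall out of sorted suffixes. It is quadratic or worse in the worst case, is not RepeatFinder's code, and the function names are hypothetical.

```python
def suffix_array(s: str) -> list[int]:
    """Naive construction: sort suffix start positions by suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    """lcp[j] = length of the common prefix of suffixes sa[j-1] and sa[j]."""
    lcp = [0] * len(sa)
    for j in range(1, len(sa)):
        a, b = sa[j - 1], sa[j]
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        lcp[j] = n
    return lcp


def exact_repeats(s: str, min_len: int) -> dict[str, list[int]]:
    """Exact repeats of length >= min_len, mapped to their sorted start positions.

    Each adjacent pair of sorted suffixes sharing a prefix of at least min_len
    contributes that shared prefix as a repeat. Repeats that are not
    left-maximal (e.g. a suffix of a longer repeat) are also reported.
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    repeats: dict[str, set[int]] = {}
    for j in range(1, len(sa)):
        if lcp[j] >= min_len:
            rep = s[sa[j]:sa[j] + lcp[j]]
            repeats.setdefault(rep, set()).update((sa[j - 1], sa[j]))
    return {rep: sorted(pos) for rep, pos in repeats.items()}


if __name__ == "__main__":
    print(exact_repeats("GATTACAxGATTACAyGATTACA", 7))  # {'GATTACA': [0, 8, 16]}
```

A suffix tree answers the same question in linear time and also groups occurrences naturally, which is why it is the structure of choice for genome-scale inputs.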

4.
Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we developed a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence among copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that were missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering of host DNA during sequencing of the parasite genomes failed to exclude host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
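The abstract does not describe REPdenovo's internals. Assuming a common k-mer strategy (count k-mers across raw reads, treat the high-frequency ones as repeat-derived, then extend them into contigs), a minimal sketch looks like this. The threshold `min_count`, the greedy extension rule and the function names are illustrative assumptions, not REPdenovo's method.

```python
from collections import Counter


def frequent_kmers(reads: list[str], k: int, min_count: int) -> set[str]:
    """k-mers occurring at least min_count times across all reads."""
    counts: Counter[str] = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {km for km, c in counts.items() if c >= min_count}


def greedy_extend(seed: str, kmers: set[str]) -> str:
    """Extend seed to the right while exactly one unused frequent k-mer continues it.

    Requires len(seed) >= 2; each step overlaps the last k-1 bases of the contig.
    """
    k = len(seed)
    contig = seed
    used = {seed}
    while True:
        suffix = contig[-(k - 1):]
        nexts = [suffix + b for b in "ACGT" if suffix + b in kmers and suffix + b not in used]
        if len(nexts) != 1:  # stop at a dead end or a branch
            break
        used.add(nexts[0])
        contig += nexts[0][-1]
    return contig


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 5 + ["GGCCAATC"]  # one repeated element plus background
    frequent = frequent_kmers(reads, 4, 3)
    print(greedy_extend("ACGT", frequent))  # ACGTTGCA
```

Stopping at branches is the simplest way to avoid chimeric contigs; handling sequencing errors and divergent copies would need a tolerant graph, which this sketch omits.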

5.

Background  

Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes.

6.
7.
Information about evolutionary relationships between species of the genus Allium is desirable in order to facilitate breeding programmes. One approach is to study the distribution of repetitive DNA sequences among species thought, on taxonomic grounds, to be closely related. We used fluorescent in situ hybridisation (FISH) to examine seven species within sect. Cepa of the genus (A. altaicum, A. cepa, A. fistulosum, A. galanthum, A. pskemense, A. oschaninii and A. vavilovii), one species from sect. Rhizirideum (A. roylei), two species from sect. Allium (A. sativum and A. porrum) and one species from sect. Schoenoprasum (A. schoenoprasum). Each species was probed with a 375 bp repeat sequence isolated from A. cepa (Barnes & al. 1985), which was generated and labelled by polymerase chain reaction (PCR). No signals were detected in any Allium species outside sect. Cepa, with the exception of A. roylei, whose placement in sect. Rhizirideum is now questioned. Within sect. Cepa, the probe hybridized to the terminal regions of the chromosome arms in all species examined. In addition, a number of interstitial bands were detected. FISH reveals a more detailed map of the location of the repeat sequences than has previously been obtained by C-banding and other staining procedures. Comparing the distribution of the terminal and interstitial sites allows us to identify three species groups, namely A. altaicum and A. fistulosum; A. cepa, A. roylei, A. oschaninii and A. vavilovii; and A. galanthum and A. pskemense.

8.
9.
10.
To date, methylation of vertebrate DNA at the 5-position of cytosine has been found exclusively within CpG or CpNpG sequences. In contrast, we determined that cytosine is unusually methylated in the GpC dinucleotide of 5′-GGCC-3′ sequences in the EcoRI satellite DNA family of the teleost Sparus aurata. This finding is the first example of methylated GpC sequences in eukaryotic genomes. In this regard, we examined the relative methylation levels at this site of the highly repetitive EcoRI satellite DNA family in different tissues of Sparus aurata. At the HaeIII restriction site (GpC), the EcoRI repeat was markedly more methylated in male germ cells but hypomethylated in female germ cells. The novel modification and the differential methylation pattern suggest that the EcoRI satellite could have a structural and/or functional role at the centromeres of Sparus aurata.
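The site discussed above is the HaeIII recognition sequence 5′-GGCC-3′, whose internal GpC step carries the unusual methylated cytosine. A small helper (hypothetical names, a sketch rather than any published tool) that lists such sites on the given strand of a sequence:

```python
import re


def ggcc_sites(seq: str) -> list[int]:
    """0-based start positions of every 5'-GGCC-3' site (HaeIII recognition sequence).

    GGCC cannot overlap itself, so a plain non-overlapping search finds all sites.
    """
    return [m.start() for m in re.finditer("GGCC", seq.upper())]


def gpc_cytosines(seq: str) -> list[int]:
    """Positions of the cytosine in the internal GpC step (GG[C]C) of each site.

    GGCC is palindromic, so the same site exists on the complementary strand;
    there, the equivalent GpC cytosine lies opposite position start + 1.
    """
    return [i + 2 for i in ggcc_sites(seq)]


if __name__ == "__main__":
    s = "AAGGCCTTGGCCA"
    print(ggcc_sites(s), gpc_cytosines(s))  # [2, 8] [4, 10]
```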

11.
12.

Motivation

Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies, such as the Human 1000 Genomes Project, use low-coverage NGS data to make sequencing of hundreds of individuals affordable. In such data, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations, such as novel insertions or highly diverged sequences, from low-coverage NGS data are still lacking.

Results

We present LOCAS, a new NGS assembler designed particularly for low-coverage assembly of eukaryotic genomes using a mismatch-sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner, and performs de novo assembly of insertions and highly polymorphic target regions after an alignment-consensus step. LOCAS has been evaluated in homology-guided assembly scenarios using low-coverage sequence data from Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same number of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime.

Conclusion

LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.
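LOCAS's exact overlap scoring is not given in the abstract. As an assumption, a "mismatch-sensitive" overlap, the core test in the overlap stage of overlap-layout-consensus, can be sketched as the longest suffix-prefix match between two reads that tolerates a bounded number of substitutions (Hamming distance, no indels). The function name and parameters are hypothetical.

```python
def best_overlap(a: str, b: str, min_len: int, max_mismatch: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    with at most `max_mismatch` substitutions; 0 if no overlap >= min_len.

    Assumes min_len >= 1. Lengths are tried from longest to shortest,
    so the first acceptable one is the longest.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch:
            return length
    return 0


if __name__ == "__main__":
    a = "TTACGTACGA"
    b = "ACGAACGATT"  # differs from a's 8-bp suffix at one position
    print(best_overlap(a, b, 5, 1))  # 8
    print(best_overlap(a, b, 5, 0))  # 0
```

Making the mismatch budget explicit is what lets an assembler distinguish true overlaps between nearly identical reads from overlaps between diverged paralogous copies.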

13.
Scirtothrips perseae Nakahara was discovered attacking avocados in California, USA, in 1996. Host plant surveys in California indicated that S. perseae has a highly restricted host range: larvae were found only on avocados, while adults were collected from 11 different plant species. As part of a management program for this pest, a “classical” biological control program was initiated, and foreign exploration was conducted to delineate the home range of S. perseae, to survey for associated natural enemies, and to inventory other species of phytophagous thrips on avocados grown in Mexico, Guatemala, Costa Rica, the Dominican Republic, Trinidad, and Brazil. Foreign exploration efforts indicate that S. perseae occurs on avocados grown at high altitudes (>1500 m) from Uruapan in Mexico south to areas around Guatemala City in Guatemala. In Costa Rica, S. perseae is replaced by an undescribed congener as the dominant phytophagous thrips on avocados grown at high altitudes (>1300 m). No species of Scirtothrips were found on avocados in the Dominican Republic, Trinidad, or Brazil. In total, 2136 phytophagous thrips were collected and identified, representing over 47 identified species from at least 19 genera. The significance of these species records is discussed. Of the collected material, 4% were potential biological control agents of thrips. Natural enemies were dominated by six genera of predatory thrips (Aeolothrips, Aleurodothrips, Franklinothrips, Leptothrips, Scolothrips, and Karnyothrips). One genus each of parasitoid (Ceranisus) and predatory mite (Balaustium) was found. Based on the results of our sampling techniques, prospects for the importation of thrips natural enemies for use in a “classical” biological control program in California against S. perseae are not promising.

14.
Summary. A set of species-specific repetitive DNA sequences was isolated from Lolium multiflorum and Festuca arundinacea. The degree of their species specificity, as well as possible homologies among them, was determined by dot-blot hybridization analysis. To understand the genomic organization of representative Lolium- and Festuca-specific repetitive DNA sequences, we performed Southern blot hybridization and in situ hybridization to metaphase chromosomes. Southern blot hybridization analysis of eight different repetitive DNA sequences of L. multiflorum and one of F. arundinacea indicated either tandem, clustered arrangements or partially dispersed localization in their respective genomes. Some of these sequences, e.g. LMB3, showed a similar genomic organization in F. arundinacea and F. pratensis, but a slightly different organization and degree of redundancy in L. multiflorum. Cloned sequences varied in size between 100 bp and 1.2 kb. Estimated copy numbers in the corresponding haploid genomes varied between 300 and 2 × 10^4. Sequence analysis of the highly species-specific sequences from plasmids pLMH2 and pLMB4 (L. multiflorum specific) and from pFAH1 (F. arundinacea specific) revealed some internal repeats without higher-order organization. No homologies between the sequences or to other repetitive sequences were observed. In situ hybridization of these latter sequences to metaphase chromosomes from L. multiflorum, F. arundinacea and a symmetric sexual Festulolium hybrid revealed their relatively even distribution in the corresponding genomes.
In situ hybridization thus also allowed clear-cut, simple identification of parental chromosomes in the Festulolium hybrid. The potential use of these species-specific clones as hybridization probes in quantitative dot-blot analysis of the genomic make-up of Festulolium (sexual and somatic) hybrids is also demonstrated.
Abbreviations: bp, base pair(s); CMA, chromomycin A3; DAPI, 4′,6-diamidino-2-phenylindole; IPTG, isopropyl β-D-thiogalactopyranoside; kb, kilobase pair(s); NBT, nitroblue tetrazolium chloride; X-gal, 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside

15.
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of directly mapping the symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. This yields a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of a ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences, and identification of long segmental duplications. The GRM algorithm is convenient to use, in particular for large repeat units, highly mutated and/or complex repeats, and global repeat maps of large genomic sequences (chromosomes and genomes).
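GRM itself maps K-strings into the frequency domain. The sketch below is a much simpler, related idea and not the GRM algorithm: it histograms the distances between successive occurrences of each k-mer, so that a tandem repeat shows up as a peak at its unit length. The function name and `k` are hypothetical.

```python
from collections import Counter


def repeat_distance_spectrum(seq: str, k: int) -> Counter[int]:
    """Histogram of distances between successive occurrences of the same k-mer.

    Peaks in the histogram suggest repeat unit lengths (tandem repeats) or
    spacings between dispersed copies.
    """
    last_seen: dict[str, int] = {}
    spectrum: Counter[int] = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


if __name__ == "__main__":
    tandem = "ACGTTGCA" * 5  # five copies of an 8-bp unit
    spec = repeat_distance_spectrum(tandem, 4)
    print(spec.most_common(1))  # [(8, 29)]
```

Unlike this exact-match version, the published method is described as robust to substitutions and indels, which an exact k-mer histogram is not.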

16.
To improve the analysis of unknown flanking DNA sequences adjacent to known sequences in nuclear genomes of photoautotrophic eukaryotic organisms, we established the technique of ligation-mediated suppression-PCR (LMS-PCR) in the green alga Chlamydomonas reinhardtii for (1) walking from a specific nuclear insertion fragment of random knockout mutants into the unknown flanking DNA sequence to identify and analyse disrupted genomic DNA regions, and (2) walking from highly conserved DNA regions derived from known gene isoforms into flanking DNA sequences to identify new members of protein families. The feasibility of LMS-PCR for these applications was demonstrated in two different approaches. The first resulted in the identification of a genomic DNA fragment flanking a nuclear insertion vector in a random knockout mutant whose phenotype was characterised by its inability to perform functional LHC state transitions. The second approach targeted the cab gene family. An oligonucleotide from a highly conserved region of a cabII gene was used to identify potential cab gene regions in the nuclear genome of Chlamydomonas. LMS-PCR combined with 3′ rapid amplification of cDNA ends (3′ RACE) and PCR-based screening of a cDNA library resulted in the identification of the new cabII gene lhcb4. Both results clearly indicate that LMS-PCR is a powerful tool for identifying flanking DNA sequences in the nuclear genome of Chlamydomonas reinhardtii. This revised version was published online in June 2006 with corrections to the Cover Date.

17.
A long-range repeat family with a repeat size of more than 50 kb is clustered on Chromosome (Chr) 1 of Mus musculus and M. spretus. In M. musculus, this long-range repeat family shows considerable variation in copy-number frequency and contains coding regions for at least two genes. In an intron of a gene that is part of the repeat, a B2 short interspersed repetitive element (SINE) is inserted at identical positions. The B2 element is present in all copies of the long-range repeat family; it was presumably a component of the ancestral single-copy precursor sequence that gave rise to the repeat family by amplification. Copies of the long-range repeat family vary in the number of TAAA tandem repeats in the A-rich 3′ end region of the B2 element. As inferred from polymerase chain reaction (PCR) data, the presence and frequency of repeat-number variants in the (TAAA)n block are strain- and species-specific. The B2 element and its flanking regions were sequenced from two copies of the long-range repeat family. Sequence divergence between the two copies (counting only non-CG base substitutions and deletions/insertions) was determined to be 2.6%. Based on the drift rate of human Alu elements, corrected for the higher drift rates in rodents, a divergence time of 1.7 million years was estimated. Since the long-range repeat family is present in both M. musculus and M. spretus, it must have evolved by amplification before the separation of the two species about 1–4 million years ago.

18.

Background  

Repetitive DNA is a major fraction of eukaryotic genomes and is particularly abundant in plants. The sugar beet (Beta vulgaris) genome is currently being sequenced, and knowledge of repetitive DNA sequences is critical for its annotation. We generated a C0t-1 library, representing highly to moderately repetitive sequences, to characterize the major B. vulgaris repeat families. While highly abundant satellites are well described, minisatellites have been poorly investigated in plants. Therefore, we focused on the identification and characterization of these tandemly repeated sequences.
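Minisatellites are tandem arrays of a repeated unit. A naive, exact-match scan that reports the longest tandem array starting at each position might look like this; the function name and parameters are hypothetical, and real tandem-repeat finders tolerate mismatches and indels, which this sketch does not.

```python
def find_tandem_repeats(seq: str, min_period: int, max_period: int,
                        min_copies: int) -> list[tuple[int, int, int]]:
    """Exact tandem arrays as (start, period, copies), scanning left to right.

    At each position, every period in [min_period, max_period] is tried and
    the array covering the most bases wins (ties keep the shorter period).
    After a hit, scanning resumes just past the array.
    """
    hits: list[tuple[int, int, int]] = []
    i = 0
    while i < len(seq):
        best = None
        for p in range(min_period, max_period + 1):
            unit = seq[i:i + p]
            if len(unit) < p:  # not enough sequence left for this period
                break
            copies = 1
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * p > best[1] * best[2]):
                best = (i, p, copies)
        if best:
            hits.append(best)
            i += best[1] * best[2]
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GG" + "TTAGGC" * 4 + "A", 3, 8, 3))  # [(2, 6, 4)]
```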

19.
Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference. Here, we compare 174 chloroplast genomes by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs fall into two major classes: tip (unique to a single genome) and group (shared by a subset of genomes). Prior to assembly, we found that ∼18% of the chloroplast genome was duplicated in the inverted repeat (IR) region across a four-fold range of genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single-copy and duplicated sequence was basal among green plants, independent of photosynthesis and of the mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants showed the expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained much more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
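The tip/group distinction can be illustrated by recording which genomes contain each k-mer. In this sketch the extra label `all` (present in every genome) is an assumption added for completeness, since the abstract defines only tip and group; the function name is hypothetical.

```python
def classify_kmers(genomes: dict[str, str], k: int) -> dict[str, str]:
    """Label each k-mer by how many genomes contain it.

    tip   -> found in exactly one genome
    group -> found in more than one, but not all, genomes
    all   -> found in every genome (assumed extra class)
    """
    owners: dict[str, set[str]] = {}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(name)
    n = len(genomes)
    labels: dict[str, str] = {}
    for kmer, names in owners.items():
        if len(names) == 1:
            labels[kmer] = "tip"
        elif len(names) < n:
            labels[kmer] = "group"
        else:
            labels[kmer] = "all"
    return labels


if __name__ == "__main__":
    g = {"a": "ACGTA", "b": "ACGTC", "c": "ACGGG"}
    print(classify_kmers(g, 3))
```

Contigs could then be seeded only from tip or group k-mers, which is one way to confine assembly to the informative partition of the data.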

20.
Long terminal repeat (LTR) retrotransposons are closely related to retroviruses, and their activities shape eukaryotic genomes. Here, we present a complete Lotus japonicus insertion mutant collection generated by identification of 640 653 new insertion events following de novo activation of the LTR element Lotus retrotransposon 1 (LORE1) (http://lotus.au.dk). Insertion preferences are critical for effective gene targeting, and we exploit our large dataset to analyse LTR element characteristics in this context. We infer the mechanism that generates the consensus palindromes typical of retroviral and LTR retrotransposon insertion sites, identify a short relaxed insertion site motif, and demonstrate selective integration into CHG‐hypomethylated genes. These characteristics result in a steep increase in deleterious mutation rate following activation, and allow LORE1 active gene targeting to approach saturation within a population of 134 682 L. japonicus lines. We suggest that saturation mutagenesis using endogenous LTR retrotransposons with germinal activity can be used as a general and cost‐efficient strategy for generation of non‐transgenic mutant collections for unrestricted use in plant research.
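Insertion-site consensus and palindromy can be checked with a few lines of code. This sketch assumes the insertion-site windows are already extracted and aligned at the insertion point; it only computes a majority-rule consensus and tests whether it equals its reverse complement, and does not reproduce the authors' inference of the mechanism. Function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus(sites: list[str]) -> str:
    """Majority base in each column of equal-length, pre-aligned site windows.

    Ties resolve to the base seen first in that column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


def is_palindrome(seq: str) -> bool:
    """True if seq equals its own reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    windows = ["GAATTC", "GAATTC", "GTATTC"]
    cons = consensus(windows)
    print(cons, is_palindrome(cons))  # GAATTC True
```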


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417