Similar literature
 20 similar articles found.
1.
We have sequenced two complete chloroplast genomes in the Asteraceae, Helianthus annuus (sunflower), and Lactuca sativa (lettuce), which belong to the distantly related subfamilies, Asteroideae and Cichorioideae, respectively. The Helianthus chloroplast genome is 151,104 bp and the Lactuca genome is 152,772 bp long, which is within the usual size range for chloroplast genomes in flowering plants. When compared to tobacco, both genomes have two inversions: a large 22.8-kb inversion and a smaller 3.3-kb inversion nested within it. Pairwise sequence divergence across all genes, introns, and spacers in Helianthus and Lactuca has resulted in the discovery of new, fast-evolving DNA sequences for use in species-level phylogenetics, such as the trnY-rpoB, trnL-rpl32, and ndhC-trnV spacers. Analysis and categorization of shared repeats resulted in seven classes useful for future repeat studies: double tandem repeats, three or more tandem repeats, direct repeats dispersed in the genome, repeats found in reverse complement orientation, hairpin loops, runs of A's or T's in excess of 12 bp, and gene or tRNA similarity. Results from BLAST searches of our genomic sequence against expressed sequence tag (EST) databases for both genomes produced eight likely RNA edited sites (C → U changes). These detailed analyses in Asteraceae contribute to a broader understanding of plastid evolution across flowering plants.

2.
Mononucleotide repeats (MNRs) are abundant in eukaryotic genomes and exhibit a high degree of length variability due to insertion and deletion events. However, the relationship between these repeats and mutation rates in surrounding sequences has not been systematically investigated. We have analyzed the frequency of single nucleotide polymorphisms (SNPs) at positions close to and within MNRs in the human genome. Overall, we find a 2- to 4-fold increase in the SNP frequency at positions immediately adjacent to the boundaries of MNRs, relative to that at more distant bases. This relationship exhibits a strong asymmetry between 3' and 5' ends of repeat tracts and is dependent upon the repeat motif, length and orientation of surrounding repeats. Our analysis suggests that the incorporation or exclusion of bases adjacent to the boundary of the repeat through substitutions, in which these nucleotides mutate towards or away from the base present within the repeat, respectively, may be another mechanism by which MNRs expand and contract in the human genome.

3.
Repetitive sequences are a major constituent of many eukaryote genomes and play roles in gene regulation, chromosome inheritance, nuclear architecture, and genome stability. The identification of repetitive elements has traditionally relied on in-depth, manual curation and computational determination of close relatives based on DNA identity. However, the rapid divergence of repetitive sequence has made identification of repeats by DNA identity difficult even in closely related species. Hence, the presence of unidentified repeats in genome sequences affects the quality of gene annotations and annotation-dependent analyses (e.g. microarray analyses). We have developed an enhanced repeat identification pipeline using two approaches. First, the de novo repeat finding program PILER-DF was used to identify interspersed repetitive elements in several recently finished Dipteran genomes. Repeats were classified, when possible, according to their similarity to known elements described in Repbase and GenBank, and also screened against annotated genes as one means of eliminating false positives. Second, we used a new program called RepeatRunner, which integrates results from both RepeatMasker nucleotide searches and protein searches using BLASTX. Using RepeatRunner with PILER-DF predictions, we masked repeats in thirteen Dipteran genomes and conclude that combining PILER-DF and RepeatRunner greatly enhances repeat identification in both well-characterized and un-annotated genomes.

4.
WindowMasker: window-based masker for sequenced genomes   Cited by: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query, if they can be masked in the database. RepeatMasker/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes. RESULTS: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RM because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM, and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results hold for transcribed regions as well. WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis. AVAILABILITY: WM is included in the NCBI C++ toolkit. The source code for the entire toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. Once the toolkit source is unpacked, the instructions for building WindowMasker application in the UNIX environment can be found in file src/app/winmasker/README.build. SUPPLEMENTARY INFORMATION: Supplementary data are available at ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf
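
The library-free, linear-scan idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain k-mer count threshold, which is a simplification and not WindowMasker's actual scoring scheme; the function name `mask_frequent_kmers` and the parameters `k` and `threshold` are hypothetical.

```python
from collections import Counter


def mask_frequent_kmers(seq: str, k: int = 4, threshold: int = 3) -> str:
    """Soft-mask (lowercase) every base covered by a k-mer seen >= threshold times.

    Illustrative assumption only: real maskers use more refined scores.
    """
    seq = seq.upper()
    last_start = len(seq) - k
    # Scan 1: count every k-mer of the sequence itself (no repeat library needed).
    counts = Counter(seq[i:i + k] for i in range(last_start + 1))
    # Scan 2: flag the positions covered by over-represented k-mers.
    masked = [False] * len(seq)
    for i in range(last_start + 1):
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


if __name__ == "__main__":
    print(mask_frequent_kmers("ACGTACGTACGTTTGCA", k=4, threshold=3))
```

Both scans are linear in the sequence length for fixed `k`, which is the property the abstract credits for the speed advantage over alignment against a library.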

5.
A substantial fraction of vertebrate and invertebrate genomes is composed of mobile elements and their derivatives. One of the most intensively studied transposon families, the P elements of Drosophila, was thought to exist exclusively in the genomes of dipteran insects. Based on the data provided by the human genome project, our group identified a P element-homologous sequence in the human genome in 2001. This P element-homologous human gene, named Phsa, is 19,533 nucleotides long, comprises six exons and five introns, and encodes a protein of still unknown function with a length of 903 amino acid residues. The N-terminal THAP domain of the putative Phsa protein shows similarities to the site-specific DNA-binding domain of the Drosophila P element transposase. In the present study, FISH analysis and the screening of a human lambda genomic library revealed a single copy of Phsa located on the long arm of chromosome 4, upstream of a gene coding for the hypothetical protein DKFZp686L1814. The same gene arrangement was found for the homologous gene Pgga in the genome of chicken, thus placing Pgga at an orthologous position on the long arm of chromosome 4. The single-copy gene status and the absence of terminal inverted repeats and target-site duplications indicate that Phsa and Pgga constitute domesticated stationary sequences. In contrast, a considerable number of P-homologous sequences with terminal inverted repeats and intact target-site duplications could be identified in zebrafish, strongly indicating that Pdre elements were mobile within the zebrafish genome. Pdre elements are the first P-like transposons identified in a vertebrate species. With respect to Phsa, gene expression studies showed that Phsa is expressed in a broad range of human tissues, suggesting that the putative Phsa protein plays a not yet understood but essential role in a specific metabolic pathway.
We demonstrate that P-homologous DNA sequences occur in the genomes of 21 analyzed vertebrates but only as rudiments in the rodents. Finally, the evolutionary history of P element-homologous vertebrate sequences is discussed in the context of the "molecular domestication" hypothesis versus the "source gene hypothesis."

6.
Interactions between the termini of adeno-associated virus DNA   Cited by: 10 (self-citations: 0, citations by others: 10)

7.
Isolation and Characterization of Microsatellites in Snap Bean   Cited by: 1 (self-citations: 0, citations by others: 1)
The objectives of this study were to isolate and characterize microsatellites from a heat-tolerant variety of snap bean (Phaseolus vulgaris L.) in order to generate polymorphic genetic markers linked to quantitative trait loci for heat tolerance. A genomic library containing 400-800 bp inserts was constructed and screened for the presence of (GA/CT)n and (CA/GT)n repeats. The proportion of positive clones yielded an estimate of 3.72 × 10^4 such dinucleotide repeats per genome, roughly comparable to the abundance reported in other eukaryotic genomes. Twenty-six positive clones were sequenced. In contrast to mammalian genomes, the (GA/CT)n motif was much more abundant than the (CA/GT)n motif in these clones. The (GA/CT)n repeats also showed longer average repeat length (mean n=10.4 versus 6.5), suggesting that they are better candidates for yielding polymorphic genetic markers in the snap bean genome.

8.
The mechanisms underlying cleavage of herpesvirus genomes from replicative concatemers are unknown. Evidence from herpes simplex virus type 1 suggests that cleavage occurs by a nonduplicative process; however, additional evidence suggests that terminal repeats may also be duplicated during the cleavage process. This issue has been difficult to resolve due to the variable numbers of reiterated terminal repeats that the herpes simplex virus type 1 genome can contain. Guinea pig cytomegalovirus is a herpesvirus with a simple terminal repeat arrangement that defines two genome types. Type II genomes have a single copy of a 1-kb terminal repeat at both their left and right termini, whereas type I genomes have only one copy at their left termini and lack the repeat at their right termini. In a previous study, we constructed a recombinant guinea pig cytomegalovirus in which certain cis elements were disrupted such that only type II genomes were produced. Here we show that double repeats that are formed by circularization of infecting genomes are rapidly converted to single repeats, such that the junctions between genomes within replicative concatemers formed late in infection almost exclusively contain single copies of the terminal repeat. Therefore, for the recombinant virus, each cleavage event begins with a single repeat within a concatemer yet produces two repeats, one at each of the resulting termini, demonstrating that terminal repeat duplication occurs in conjunction with cleavage. For wild-type guinea pig cytomegalovirus, the formation of type I genomes further suggests that cleavage can also occur by a nonduplicative process and that duplicative and nonduplicative cleavage can occur concurrently. Other herpesviruses having terminal repeats, such as the herpes simplex viruses and human cytomegalovirus, may also utilize repeat duplication and deletion; however, the biological importance of these events remains unknown.

9.
For the Xanthomonas campestris pathovar campestris wild-type strain B100 a plasmid-based clone library was constructed. The plasmids carried chromosomal fragments of 3-4 kb in size that were tagged in vitro with the artificial transposon KAN-2. More than 3000 of the transposon target sites were characterized by DNA sequencing. The sequences obtained were compared to the recently published genome of Xanthomonas campestris pathovar campestris strain ATCC 33913. Most of the sequenced clones derived from strain B100 matched the chromosomal sequence of strain ATCC 33913. An alignment to the circular map of this chromosome revealed that the similarities were statistically distributed over the entire genome of strain ATCC 33913. The similarity was obvious for protein coding sequences, as well as for mobile genetic elements. However, four regions in the genome of Xanthomonas campestris pathovar campestris strain ATCC 33913, ranging in size from 11 to 37 kb, were not represented in the sequenced clone library of Xanthomonas campestris pathovar campestris strain B100. On the other hand, 1.2% of the sequenced clones originating from Xanthomonas campestris pathovar campestris strain B100 showed no or insignificant similarities to the genome of strain ATCC 33913.

10.
Transposable genetic elements are ubiquitous, yet their presence or absence at any given position within a genome can vary between individual cells, tissues, or strains. Transposable elements have profound impacts on host genomes by altering gene expression, assisting in genomic rearrangements, causing insertional mutations, and serving as sources of phenotypic variation. Characterizing a genome's full complement of transposons requires whole genome sequencing, precluding simple studies of the impact of transposition on interindividual variation. Here, we describe a global mapping approach for identifying transposon locations in any genome, using a combination of transposon-specific DNA extraction and microarray-based comparative hybridization analysis. We use this approach to map the repertoire of endogenous transposons in different laboratory strains of Saccharomyces cerevisiae and demonstrate that transposons are a source of extensive genomic variation. We also apply this method to mapping bacterial transposon insertion sites in a yeast genomic library. This unique whole genome view of transposon location will facilitate our exploration of transposon dynamics, as well as defining bases for individual differences and adaptive potential.

11.
A novel Tc1-like transposable element has been identified as a new DNA transposon in the mud loach, Misgurnus mizolepis. The M. mizolepis Tc1-like transposon (MMTS) comprises inverted terminal repeats and a single gene that encodes a Tc1-like transposase. The deduced amino acid sequence of the transposase-encoding region of the MMTS transposon contains motifs, including the DDE motif, that were previously recognized in other Tc1-like transposons. However, the putative MMTS transposase has only 34-37% identity with the well-known Tc1, PPTN, and S elements at the amino acid level. Dot-hybridization analysis, used to measure the copy number of the MMTS transposon in the mud loach genome, showed that the MMTS transposon is present at about 3.36 × 10^4 copies per 2 × 10^9 bp, and accounts for approximately 0.027% of the mud loach genome. Here, we also describe novel MMTS-like transposons from the genomes of carp-like fishes, flatfish species, and cichlid fishes, which bear conserved inverted repeats flanking an apparently intact transposase gene. Additionally, BLAST searches and phylogenetic analysis indicated that MMTS-like transposons evolved uniquely in fishes, and comprise a new subfamily of Tc1-like transposons, with only modest similarity to Drosophila melanogaster (foldback element FB4, HB2, HB1), Xenopus laevis, Xenopus tropicalis, and Anopheles gambiae (Frisky).

12.
The large-scale bacterial artificial chromosome-end sequencing project of Nile tilapia (Oreochromis niloticus) has generated extensive sequence data that allowed the examination of the repeat content in this fish genome and the building of a repeat library specific for this species. This library was established based on Tilapiini repeat sequences from GenBank, sequences orthologous to the repeat library of zebrafish in Repbase, and novel repeats detected by genome analysis using the MIRA assembler. We estimate that repeats constitute about 14% of the tilapia genome and also give estimates for the occurrence of the different repeats based on Basic Local Alignment Search Tool searches within the database of known tilapia sequences. The frequent occurrence of novel repeats in the tilapia genome indicates the importance of using a species-specific repeat masker prior to sequence analyses. A web tool based on the RepeatMasker software was designed to assist tilapia genomics.

13.
A clustering method for repeat analysis in DNA sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
Volfovsky N  Haas BJ  Salzberg SL 《Genome biology》2001,2(8):research0027.1-research002711

Background

A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.

Results

The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi fasta), that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences.

Conclusions

We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.
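
As a rough illustration of how a suffix-based index exposes exact repeats, the sketch below uses a naively built suffix array, a simpler relative of the suffix tree named in the abstract. It is an assumption-laden toy, not RepeatFinder's implementation: the function name `longest_exact_repeat` is hypothetical, and the quadratic-time construction is only suitable for short sequences.

```python
def longest_exact_repeat(seq: str) -> str:
    """Return the longest substring occurring at least twice (overlaps allowed)."""
    # Naive suffix array: suffix start positions sorted lexicographically.
    order = sorted(range(len(seq)), key=lambda i: seq[i:])
    best = ""
    # Any repeated substring is a common prefix of two suffixes that are
    # adjacent in sorted order, so only neighbouring pairs need comparing.
    for a, b in zip(order, order[1:]):
        n = 0
        while a + n < len(seq) and b + n < len(seq) and seq[a + n] == seq[b + n]:
            n += 1
        if n > len(best):
            best = seq[a:a + n]
    return best


if __name__ == "__main__":
    print(longest_exact_repeat("GATTACAGATTACA"))
```

Grouping the repeats found this way into classes, as the abstract describes, would be a separate clustering step that this sketch does not attempt.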

14.
X Zhao  Y Tian  R Yang  H Feng  Q Ouyang  Y Tian  Z Tan  M Li  Y Niu  J Jiang  G Shen  R Yu 《BMC genomics》2012,13(1):435
ABSTRACT: BACKGROUND: The relationship between the level of repetitiveness in genomic sequence and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but relevant studies have rarely been made in virus genomes. RESULTS: In this study, a total of 257 viruses were examined, which cover 90% of genera. The results showed that the occurrence of simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. Each repeat class is distributed within a certain range of genome sequence length. Mono-, di- and trinucleotide repeats are widely distributed in all virus genomes; tetranucleotide SSRs are a common component of genomes larger than 100 kb; among genomes smaller than 100 kb, no more than 50% contain penta- and hexanucleotide SSRs. Principal components analysis (PCA) indicated that dinucleotide repeats contribute most strongly to the differences in SSRs among virus genomes. The results showed that SSRs tend to accumulate in larger virus genomes, and that the longer the genome sequence, the longer the repeat units. CONCLUSIONS: We conducted this research at the level of viruses as a whole. We conclude that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible for the variance in SSR content to a certain degree.
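
For readers unfamiliar with how mono- to hexanucleotide SSRs are typically located, here is a minimal regex-based sketch. The abstract does not say which tool or thresholds the authors used, so the function name `find_ssrs`, the default `min_repeats` of 3, and the restriction to the A/C/G/T alphabet are all assumptions made for illustration.

```python
import re


def find_ssrs(seq: str, max_unit: int = 6, min_repeats: int = 3):
    """Return (start, unit, copies) for perfect tandem repeats.

    Unit lengths run from 1 (mono-) to max_unit (hexa- by default).
    """
    # The lazy quantifier tries the shortest unit first, so "AAAAAA" is
    # reported as six copies of "A" rather than three copies of "AA".
    pattern = re.compile(
        rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_repeats - 1},}}"
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    print(find_ssrs("TTGACACACACGG"))
```

Counting hits per unit length across genomes of different sizes would give the kind of per-class tallies the abstract correlates with genome size.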

15.
Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Besides Wolbachia, many other genome sequencing projects have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project.

16.
Fast-sequencing throughput methods have increased the number of completely sequenced bacterial genomes to about 400 by December 2006, with the number increasing rapidly. These include several strains. In silico methods of comparative genomics are of use in categorizing and phylogenetically sorting these bacteria. Various word-based tools have been used for quantifying the similarities and differences between entire genomes. The simple di-nucleotide frequency comparison, codon specificity and k-mer repeat detection are among some of the well-known methods. In this paper, we show that the Mutual Information function, which is a measure of correlations and a concept from Information Theory, is very effective in determining the similarities and differences among genome sequences of various strains of bacteria such as the plant pathogen Xylella fastidiosa, marine Cyanobacteria Prochlorococcus marinus or animal and human pathogens such as species of Ehrlichia and Legionella. The short-range three-base periodicity, small sequence repeats and long-range correlations taken together constitute a genome signature that can be used as a technique for identifying new bacterial strains with the help of strains already catalogued in the database. There have been several applications of using the Mutual Information function as a measure of correlations in genomics but this is the first whole genome analysis done to detect strain similarities and differences.
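
The Mutual Information function mentioned above is commonly defined, for a base separation k, as I(k) = Σ p_ab(k) · log2( p_ab(k) / (p_a · p_b) ), summed over all base pairs (a, b). The sketch below computes that standard quantity; whether the authors used exactly this estimator is not stated in the abstract, so treat the function name `mutual_information` and the choice of marginals (taken from the pair counts themselves) as assumptions.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (in bits) between bases separated by k positions."""
    pairs = Counter((seq[i], seq[i + k]) for i in range(len(seq) - k))
    n = sum(pairs.values())
    if n == 0:
        return 0.0
    # Marginal counts of the first and second base of each pair.
    first, second = Counter(), Counter()
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    total = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        total += p_ab * math.log2(p_ab / ((first[a] / n) * (second[b] / n)))
    return total


if __name__ == "__main__":
    # Profile over separations 1..6 for a toy sequence with period 4.
    toy = "ACGT" * 10
    print([round(mutual_information(toy, k), 3) for k in range(1, 7)])
```

Plotting I(k) against k for whole genomes is what reveals features such as the three-base periodicity the abstract refers to; comparing such profiles between strains is the signature idea, and the toy sequence here only demonstrates the calculation.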

17.
A novel family of miniature inverted repeat transposable elements (MITEs) named Pony was discovered in the yellow fever mosquito, Aedes aegypti. It has all the characteristics of MITEs, including terminal inverted repeats, no coding potential, A+T richness, small size, and the potential to form stable secondary structures. Past mobility of Pony was indicated by the identification of two Pony insertions which resulted in the duplication of the TA dinucleotide targets. Two highly divergent subfamilies, A and B, were identified in A. aegypti based on sequence comparison and phylogenetic analysis of 38 elements. These subfamilies showed less than 62% sequence similarity. However, within each subfamily, most elements were highly conserved, and multiple subgroups could be identified, indicating recent amplifications from different source genes. Different scenarios are presented to explain the evolutionary history of these subfamilies. Both subfamilies share conserved terminal inverted repeats similar to those of the Tc2 DNA transposons in Caenorhabditis elegans, indicating that Pony may have been borrowing the transposition machinery from a Tc2-like transposon in mosquitoes. In addition to the terminal inverted repeats, full-length and partial subterminal repeats of a sequence motif TTGATTCAWATTCCGRACA represent the majority of the conservation between the two subfamilies, indicating that they may be important structural and/or functional components of the Pony elements. In contrast to known autonomous DNA transposons, both subfamilies of Pony are highly reiterated in the A. aegypti genome (8,400 and 9,900 copies, respectively). Together, they constitute approximately 1.1% of the entire genome. Pony elements were frequently found near other transposable elements or in the noncoding regions of genes.
The relative abundance of MITEs varies in eukaryotic genomes, which may have in part contributed to the different organizations of the genomes and reflect different types of interactions between the hosts and these widespread transposable elements.

18.
19.
We studied the occurrence of mammalian interspersed repeats (MIRs) in DNA and RNA of vertebrates, invertebrates, and bacteria using the data from GenBank. A special algorithm based on a weight position matrix with optimal alignment using dynamic programming was developed to search for the traces of MIR dissemination. This allowed us to search for highly divergent MIRs carrying deletions and insertions. MIRs were detected in genomes of various fishes, including Latimeria. This suggests that the origin of MIRs dates back more than 400 million years. The method to search for similarity between highly divergent sequences may be used to find the genome fragments from various ancient repeat families and from various gene families.
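
To make the combination of a position weight matrix with dynamic programming more concrete, the following sketch aligns every matrix column to some stretch of a sequence while tolerating insertions and deletions. It is a generic profile-to-sequence alignment written under assumptions, not the authors' algorithm: the function name `best_profile_alignment`, the flat `gap` penalty, and the toy +1/-1 matrix are all hypothetical.

```python
def best_profile_alignment(pwm, seq, gap=-2.0):
    """Best score for aligning all PWM columns to any substring of seq.

    pwm is a list of dicts mapping each base to a score for that column.
    """
    n = len(seq)
    # Zero columns aligned: the match may start anywhere at no cost.
    prev = [0.0] * (n + 1)
    for column in pwm:
        # Before any base is consumed, this column can only be skipped.
        cur = [prev[0] + gap] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = max(
                prev[j - 1] + column[seq[j - 1]],  # column aligned to base j
                prev[j] + gap,                      # column skipped (deletion)
                cur[j - 1] + gap,                   # extra base (insertion)
            )
        prev = cur
    # The match may end anywhere in the sequence.
    return max(prev)


if __name__ == "__main__":
    # Toy matrix for the motif "ACG": +1 for the expected base, -1 otherwise.
    toy = [{b: (1.0 if b == m else -1.0) for b in "ACGT"} for m in "ACG"]
    print(best_profile_alignment(toy, "TTACGTT", gap=-0.5))
    print(best_profile_alignment(toy, "TTAGTT", gap=-0.5))
```

In the second call the middle column has no matching base, so the best alignment skips it and pays one gap penalty; tolerating such gaps is what lets a matrix search recover copies that carry deletions and insertions.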

20.
The interspersed repeat content of mammalian genomes has been best characterized in human, mouse and cow. In this study, we carried out de novo identification of repeated elements in the equine genome and identified previously unknown elements present at low copy number. The equine genome contains typical eutherian mammal repeats, but also has a significant number of hybrid repeats in addition to clade-specific Long Interspersed Nuclear Elements (LINE). Equus caballus clade specific LINE 1 (L1) repeats can be classified into approximately five subfamilies, three of which have undergone significant expansion. There are 1115 full-length copies of these equine L1, but of the 103 presumptive active copies, 93 fall within a single subfamily, indicating a rapid recent expansion of this subfamily. We also analysed both interspersed and simple sequence repeats (SSR) genome-wide, finding that some repeat classes are spatially correlated with each other as well as with G+C content and gene density. Based on these spatial correlations, we have confirmed that recently-described ancestral vs. clade-specific genome territories can be defined by their repeat content. The clade-specific Short Interspersed Nuclear Element correlations were scattered over the genome and appear to have been extensively remodelled. In contrast, territories enriched for ancestral repeats tended to be contiguous domains. To determine if the latter territories were evolutionarily conserved, we compared these results with a similar analysis of the human genome, and observed similar ancestral repeat enriched domains. These results indicate that ancestral, evolutionarily conserved mammalian genome territories can be identified on the basis of repeat content alone. Interspersed repeats of different ages appear to be analogous to geologic strata, allowing identification of ancient vs. newly remodelled regions of mammalian genomes.
