Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
2.
3.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.
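
The difference between contiguous and spaced seeds described in this abstract can be illustrated with a small sketch. It is not the authors' implementation and does not cover indel seeds; the function name `seed_hits` and the seed notation (`1` for a position that must match, `0` for a "don't care" position) are assumptions made for illustration.

```python
def seed_hits(query: str, target: str, seed: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs where query[i:] and target[j:] agree at every
    '1' position of the seed; '0' positions are ignored (hypothetical helper)."""
    span = len(seed)
    care = [k for k, c in enumerate(seed) if c == "1"]

    # Index every window of the target by the letters at the "care" positions.
    index: dict[str, list[int]] = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up every window of the query in that index.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


query = "ACGTACGT"
target = "ACCTACGT"  # differs from the query at position 2 only

contiguous = seed_hits(query, target, "1111")
spaced = seed_hits(query, target, "1101")
```

In this toy input the spaced seed `1101` reports a hit at offset (0, 0) because the mismatch falls on its "don't care" position, whereas the contiguous seed `1111` does not. Neither seed tolerates an insertion or deletion inside its span, which is the gap the indel seeds of the abstract are meant to address.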

4.
The nucleotide sequence of the Korean ginseng (Panax schinseng Nees) chloroplast genome has been completed (AY582139). The circular double-stranded DNA, which consists of 156,318 bp, contains a pair of inverted repeat regions (IRa and IRb) of 26,071 bp each, which are separated by large and small single copy regions of 86,106 bp and 18,070 bp, respectively. The inverted repeat region is further extended into the large single copy region, which includes the 5' part of the rps19 gene. Four short inversions associated with short palindromic sequences that form stem-loop structures were also observed in the chloroplast genome of P. schinseng compared to that of Nicotiana tabacum. The genome content and the relative positions of 114 genes (75 peptide-encoding genes, 30 tRNA genes, 4 rRNA genes, and 5 conserved open reading frames [ycfs]), however, are identical to those of the chloroplast DNA of N. tabacum. Sixteen genes contain one intron while two genes have two introns. Of these introns, only one (trnL-UAA) belongs to the self-splicing group I; all remaining introns have the characteristics of six domains belonging to group II. Eighteen simple sequence repeats (SSRs) have been identified from the chloroplast genome of Korean ginseng. Several of these SSR loci show infra-specific variations. A detailed comparison of 17 known completed chloroplast genomes from the vascular plants allowed the identification of evolutionary modes of coding segments and intron sequences, as well as the evaluation of the phylogenetic utilities of chloroplast genes. Furthermore, through the detailed comparisons of several chloroplast genomes, evolutionary hotspots predominated by the inversion end points, indel mutation events, and high frequencies of base substitutions were identified. Large-sized indels were often associated with direct repeats at the end of the sequences facilitating intra-molecular recombination.

5.
Rare genomic changes as a tool for phylogenetics   (Cited by: 1 total; 0 self-citations; 1 by others)
DNA sequence data have offered valuable insights into the relationships between living organisms. However, most phylogenetic analyses of DNA sequences rely primarily on single nucleotide substitutions, which might not be perfect phylogenetic markers. Rare genomic changes (RGCs), such as intron indels, retroposon integrations, signature sequences, mitochondrial and chloroplast gene order changes, gene duplications and genetic code changes, provide a suite of complementary markers with enormous potential for molecular systematics. Recent exploitation of RGCs has already started to yield exciting phylogenetic information.

6.
Abstract: A method is described to assess directly the number of DNA sequence transformations, evolutionary events, required by a phylogenetic topology without the use of multiple sequence alignment. This is accomplished through a generalization of existing character optimization procedures to include insertion and deletion events (indels) in addition to base substitutions. The crux of the model is the treatment of indels as processes as opposed to the patterns implied by multiple sequence alignment. The results of this procedure are directly compatible with parsimony-based tree lengths. In addition to the simplicity of the method, it appears to generate more efficient (simpler) explanations of sequence variation than does multiple alignment.

7.
Microstructural changes such as insertions and deletions (indels) are a major driving force in the evolution of non-coding DNA sequences. To better understand the mechanisms by which indel mutations arise, as well as the molecular evolution of non-coding regions, the number and pattern of indels and nucleotide substitutions were compared in whole chloroplast genomes. Comparisons were made for a total of over 38 kb of non-coding DNA sequence from 126 intergenic regions in two data sets representing species with different divergence times: sugarcane and maize, and Oryza sativa var. indica and japonica. The main findings of this study are: (i) Approximately half of all indels are single nucleotide indels. This observation agrees with previous studies in various organisms. (ii) The distribution and number of indels differed between the two data sets, and different patterns were observed for tandem repeat and non-repeat indels. (iii) The distribution pattern of tandem repeat indels showed a statistically significant bias towards A/T-rich sequences. (iv) The rate of indel mutation was estimated to be approximately (0.8 ± 0.04) × 10^-9 per site per year, which was similar to previous estimates in other organisms. (v) The frequencies of nucleotide substitutions and indels were significantly lower in the inverted repeat (IR).

8.
In Neurospora crassa, DNA sequence duplications are detected and altered efficiently during the sexual cycle by a process known as RIP (repeat-induced point mutation). Affected sequences are subjected to multiple GC-to-AT mutations. To explore the pattern in which base changes are laid down by RIP we examined two sets of strains. First, we examined the products of a presumptive spontaneous RIP event at the mtr locus. Results of sequencing suggested that a single RIP event produces two distinct patterns of change, descended from the two strands of an affected DNA duplex. Equivalent results were obtained using an exceptional tetrad from a cross with a known duplication flanking the zeta-eta (ζ-η) locus. The mtr sequence data were also used to further examine the basis for the differential severity of C-to-T mutations on the coding and noncoding strands in genes. The known bias of RIP toward CpA/TpG sites in conjunction with the sequence bias of Neurospora accounts for the differential effect. Finally, we used a collection of tandem repeats (from 16 to 935 bp in length) within the mtr gene to examine the length requirement for RIP. No evidence of RIP was found with duplications shorter than 400 bp while all longer tandem duplications were frequently affected. A comparison of these results with vegetative reversion data for the same duplications is consistent with the idea that reversion of long tandem duplications and RIP share a common step.

9.
The macroevolutionary transition of whales (cetaceans) from a terrestrial quadruped to an obligate aquatic form involved major changes in sensory abilities. Compared to terrestrial mammals, the olfactory system of baleen whales is dramatically reduced, and in toothed whales is completely absent. We sampled the olfactory receptor (OR) subgenomes of eight cetacean species from four families. A multigene tree of 115 newly characterized OR sequences from these eight species and published data for Bos taurus revealed a diverse array of class II OR paralogues in Cetacea. Evolution of the OR gene superfamily in toothed whales (Odontoceti) featured a multitude of independent pseudogenization events, supporting anatomical evidence that odontocetes have lost their olfactory sense. We explored the phylogenetic utility of OR pseudogenes in Cetacea, concentrating on delphinids (oceanic dolphins), the product of a rapid evolutionary radiation that has been difficult to resolve in previous studies of mitochondrial DNA sequences. Phylogenetic analyses of OR pseudogenes using both gene-tree reconciliation and supermatrix methods yielded fully resolved, consistently supported relationships among members of four delphinid subfamilies. Alternative minimizations of gene duplications, gene duplications plus gene losses, deep coalescence events, and nucleotide substitutions plus indels returned highly congruent phylogenetic hypotheses. Novel DNA sequence data for six single-copy nuclear loci and three mitochondrial genes (> 5000 aligned nucleotides) provided an independent test of the OR trees. Nucleotide substitutions and indels in OR pseudogenes showed a very low degree of homoplasy in comparison to mitochondrial DNA and, on average, provided more variation than single-copy nuclear DNA. Our results suggest that phylogenetic analysis of the large OR superfamily will be effective for resolving relationships within Cetacea whether supermatrix or gene-tree reconciliation procedures are used.

10.
We developed a system to examine forward mutations that occurred in the rpsL gene of Escherichia coli placed on a multicopy plasmid. Using this system we determined the mutational specificity for a dnaE173 mutator strain in which the editing function of DNA polymerase III is impeded. The frequency of rpsL- mutations increased 32,000-fold due to the dnaE173 mutator, and 87 independent rpsL- mutations in the mutator strain were analyzed by DNA sequencing, together with 100 mutants recovered from a dnaE+ strain as the control. While half of the mutations that occurred in the wild-type strain were caused by insertion elements, no such mutations were recovered from the mutator strain. A novel class of mutation, named "sequence substitution", was present in mutants raised in the dnaE173 strain; seven sequence substitutions induced in the mutator strain occurred at six sites, and all were located in quasipalindromic sequences, carrying the GTG or CAC sequence at one or both endpoints. While other types of mutation were found in both strains, single-base frameshifts were the most frequent events in the mutator strain. Thus, the mutator effect on this class of mutation was 175,000-fold. A total of 95% of the single-base frameshifts in the mutator strain were additions, most of which occurred at runs of A or C bases so as to increase the number of identical residues. Base substitutions, the frequency of which was enhanced 25,000-fold by the mutator effect, occurred primarily at several hotspots in the mutator strain, whereas those induced in the wild-type strain were more randomly distributed throughout the rpsL sequence. The dnaE173 mutator also increased the frequency of duplications 28,000-fold. Of the three duplications recovered from the mutator strain, one was a simple duplication, the region of which was flanked by direct repeats. The other duplications were complex, one half of which was in the inverted orientation of a region containing two sets of inverted repeats. The same duplications were also recovered from the wild-type strain. The present data suggest that dnaE173 is a novel class of mutator that sharply induces sequence-directed mutagenesis, yielding high frequencies of single-base frameshifts, duplications with inversions, sequence substitutions and base substitutions at hotspots.

11.
Nucleotide substitutions, insertions, and deletions constitute the principal molecular mechanisms generating genetic variation on small length scales. In contrast to substitutions, the nature of short DNA insertions and deletions (indels) is far less understood. With the recent availability of whole-genome multiple alignments between human and other primates, detailed investigations on indel characteristics and origin have come within reach. Here, we show that the majority of short (1-100 bp) DNA insertions in the human lineage are tandem duplications of directly adjacent sequence segments with conserved polarity. Indels in microsatellites comprise only a small fraction. The underlying molecular processes generating indels do not necessarily rely on the presence of preexisting duplicates, as would be expected for unequal crossing over, as well as replication slippage. Instead, our findings point toward a mechanism that preferentially occurs in the male germline and is not recombination-mediated. Surprisingly, nonframeshifting tandem duplications and deletions in coding regions still occur at approximately 50% of their genomic background rates. As is already well established in the context of gene and segmental duplications, our results demonstrate that duplications are also likely to constitute the predominant process for rapid generation of new genetic material and function on smaller scales.

12.
Repseek, a tool to retrieve approximate repeats from large DNA sequences   (Cited by: 2 total; 0 self-citations; 2 by others)
Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases both at the seed detection and alignment levels. Availability: http://wwwabi.snv.jussieu.fr/public/RepSeek/

13.
Proof of authenticity is the greatest challenge in palaeogenetic research, and many safeguards have become standard routine in laboratories specializing in ancient DNA research. Here we describe an as-yet unknown source of artifacts that will require special attention in the future. We show that ancient DNA extracts on their own can have an inhibitory and mutagenic effect under PCR. We have spiked PCR reactions including known human test DNA with 14 selected ancient DNA extracts from human and nonhuman sources. We find that the ancient DNA extracts inhibit the amplification of large fragments to different degrees, suggesting that the usual control against contaminations, i.e., the absence of long amplifiable fragments, is not sufficient. But even more important, we find that the extracts induce mutations in a nonrandom fashion. We have amplified a 148-bp stretch of the mitochondrial HVRI from contemporary human template DNA in spiked PCR reactions. Subsequent analysis of 547 sequences from cloned amplicons revealed that the vast majority (76.97%) differed from the correct sequence by single nucleotide substitutions and/or indels. In total, 34 positions of a 103-bp alignment are affected, and most mutations occur repeatedly in independent PCR amplifications. Several of the induced mutations occur at positions that have previously been detected in studies of ancient hominid sequences, including the Neandertal sequences. Our data imply that PCR-induced mutations are likely to be an intrinsic and general problem of PCR amplifications of ancient templates. Therefore, ancient DNA sequences should be considered with caution, at least as long as the molecular basis for the extract-induced mutations is not understood.

14.
In the class of repeated sequences that occur in DNA, minisatellites have been found to be polymorphic and have become useful tools in genetic mapping and forensic studies. They consist of a heterogeneous tandem array of a short repeat unit. The slightly different units along the array are called variants. Minisatellites evolve mainly through tandem duplications and tandem deletions of variants. Jeffreys et al. (1997) devised a method to obtain the sequence of variants along the array in a digital code and called such sequences maps. Minisatellite maps give access to the detail of mutation processes at work on such loci. In this paper, we design an algorithm to compare two maps under an evolutionary model that includes deletion, insertion, mutation, tandem duplication, and tandem deletion of a variant. Our method computes an optimal alignment in reasonable time; and the alignment score, i.e., the weighted sum of its elementary operations, is a distance metric between maps. The main difficulty is that the optimal sequence of operations depends on the order in which they are applied to the map. Taking the maps of the minisatellite MSY1 of 609 men, we computed all pairwise distances and reconstructed an evolutionary tree of these individuals. MSY1 (DYF155S1) is a hypervariable locus on the Y chromosome. In our tree, the populations of some haplogroups are monophyletic, showing that one can decipher a microevolutionary signal using minisatellite map comparison.

15.
The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

16.
T. Q. Trinh, R. R. Sinden. Genetics, 1993, 134(2): 409-422
We describe a system to measure the frequency of both deletions and duplications between direct repeats. Short 17- and 18-bp palindromic and nonpalindromic DNA sequences were cloned into the EcoRI site within the chloramphenicol acetyltransferase gene of plasmids pBR325 and pJT7. This creates an insert between directly repeated EcoRI sites and results in a chloramphenicol-sensitive phenotype. Selection for chloramphenicol resistance was used to recover chloramphenicol-resistant revertants that included those with precise deletion of the insert from plasmid pBR325 and duplication of the insert in plasmid pJT7. The frequency of deletion or duplication varied more than 500-fold depending on the sequence of the short insert in the EcoRI site. For the nonpalindromic inserts, multiple internal direct repeats and the length of the direct repeats appear to influence the frequency of deletion. Certain palindromic DNA sequences with the potential to form DNA hairpin structures that might stabilize the misalignment of direct repeats had a high frequency of deletion. Other DNA sequences with the potential to form structures that might destabilize misalignment of direct repeats had a very low frequency of deletion. Duplication mutations occurred at the highest frequency when the DNA between the direct repeats contained no direct or inverted repeats. The presence of inverted repeats dramatically reduced the frequency of duplications. The results support the slippage-misalignment model, suggesting that misalignment occurring during DNA replication leads to deletion and duplication mutations. The results also support the idea that the formation of DNA secondary structures during DNA replication can facilitate and direct specific mutagenic events.

17.
Brandström M, Ellegren H. Genetics, 2007, 176(3): 1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10^-4 indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

18.
Biophysical Journal, 2021, 120(20): 4325-4336
Repeat-induced point mutation is a genetic process that creates cytosine-to-thymine (C-to-T) transitions in duplicated genomic sequences in fungi. Repeat-induced point mutation detects duplications (irrespective of their origin, specific sequence, coding capacity, and genomic positions) by a recombination-independent mechanism that likely matches intact DNA double helices directly, without relying on the annealing of complementary single strands. In the fungus Neurospora crassa, closely positioned repeats can induce mutation of the adjoining nonrepetitive regions. This process is related to heterochromatin assembly and requires the cytosine methyltransferase DIM-2. Using DIM-2-dependent mutation as a readout of homologous pairing, we find that GC-rich repeats produce a much stronger response than AT-rich repeats, independently of their intrinsic propensity to become mutated. We also report that direct repeats trigger much stronger DIM-2-dependent mutation than inverted repeats. These results can be rationalized in the light of a recently proposed model of homologous DNA pairing, in which DNA double helices associate by forming sequence-specific quadruplex-based contacts with a concomitant release of supercoiling. A similar process featuring pairing-induced supercoiling may initiate epigenetic silencing of repetitive DNA in other organisms, including humans.

19.
The diploid genome sequence of an individual human   (Cited by: 4 total; 1 self-citation; 3 by others)
Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
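
The 22% figure quoted in this abstract can be checked against the event counts it lists. The sketch below is only an arithmetic illustration: the dictionary keys and the function name `non_snp_fraction` are assumptions, and the segmental duplications and copy number variation regions, for which the abstract gives no counts, are left out.

```python
# Event counts as stated in the abstract (categories without a stated count are omitted).
EVENT_COUNTS = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}


def non_snp_fraction(counts: dict[str, int]) -> float:
    """Fraction of counted variant events that are not SNPs (hypothetical helper)."""
    total = sum(counts.values())
    return (total - counts["snp"]) / total


total_events = sum(EVENT_COUNTS.values())
fraction = non_snp_fraction(EVENT_COUNTS)
print(f"{total_events:,} events, {fraction:.1%} non-SNP")
```

Under these assumptions the listed counts sum to just over 4.1 million events and the non-SNP share rounds to 22%, consistent with the figures stated in the abstract.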

20.
Summary: Hybridization experiments indicated that the maize genome contains a family of sequences closely related to the Ds1 element originally characterized from the Adh1-Fm335 allele of maize. Examples of these Ds1-related segments were cloned and sequenced. They also had the structural properties of mobile genetic elements, i.e., similar length and internal sequence homology with Ds1, 10- or 11-bp terminal inverted repeats, and characteristic duplications of flanking genomic DNA. All sequences with 11-bp terminal inverted repeats were flanked by 8-bp duplications, but the duplication flanking one sequence with 10-bp inverted repeats was only 6 bp. Similar Ds1-related sequences were cloned from Tripsacum dactyloides. They showed no more divergence from the maize sequences than the individual maize sequences showed when compared with each other. No consensus sequence was evident for the sites at which these sequences had inserted in genomic DNA.
