Similar literature
 20 similar documents found
1.
The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

2.
Little is known about variation of nucleotide insertions/deletions (indels) within species. In Arabidopsis thaliana, we investigated indel polymorphism patterns between two genome sequences and among 96 accessions at 1215 loci. Our study identified patterns in the variation of indel density, size, GC content and distribution, and a correlation between indels and substitutions. We found that the GC content in indel sequences was lower than that in non-indel sequences and that indels typically occur in regions with lower GC content. Patterns of indel frequency distribution among populations were more consistent with neutral expectation than substitution patterns. We also found that the local level of substitutions is positively correlated with indel density and negatively correlated with their distance to the closest indel, suggesting that indels play an important role in nucleotide variation.

3.
4.
5.
Sequence variation in 2.2 kb of non-coding regions of the chloroplast genome of eight dandelions (Taraxacum: Lactuceae) from Asia and Europe is interpreted in the light of the phylogenetic signal of base substitutions vs. indels (insertions-deletions). The four non-coding regions displayed a total of approximately 30 structural mutations of which 9 are potentially phylogenetically informative. Insertions, deletions, and an inversion were found that involved consecutive stretches of up to 172 bases. When compared to phylogenetic relationships of the chloroplast genomes based on nucleotide substitutions only, many homoplasious indels (33%) were detected that differed considerably in length and did not comprise simple sequence repeats typically associated with replication slippage. Though many indels in the intergenic spacers were associated with direct repeats, frequently, the variable stretches participated in inverted repeat stabilized hairpins. In each intergenic spacer or intron examined, nucleotide stretches ranging from 30 to 60 bp were able to fold into stabilized secondary structures. When these indels were homoplasious, they always ranked among the most stabilized hairpins in the non-coding regions. The association of higher order structures that involve both classes of repeats and parallel structural mutations in hot spot regions of the chloroplast genome can be used to differentiate among mutations that differ in phylogenetic reliability.

6.
Insertions and deletions (indels) in chloroplast noncoding regions are common genetic markers to estimate population structure and gene flow, although relatively little is known about indel evolution among recently diverged lineages such as within plant families. Because indel events tend to occur nonrandomly along DNA sequences, recurrent mutations may generate homoplasy for indel haplotypes. This is a potential problem for population studies, because indel haplotypes may be shared among populations after recurrent mutation as well as gene flow. Furthermore, indel haplotypes may differ in fitness and therefore be subject to natural selection detectable as rate heterogeneity among lineages. Such selection could contribute to the spatial patterning of cpDNA haplotypes, greatly complicating the interpretation of cpDNA population structure. This study examined both nucleotide and indel cpDNA variation and divergence at six noncoding regions (psbB-psbH, atpB-rbcL, trnL-trnH, rpl20-5'rps12, trnS-trnG, and trnH-psbA) in 16 individuals from eight species in the Lecythidaceae and a Sapotaceae outgroup. We described patterns of cpDNA changes, assessed the level of indel homoplasy, and tested for rate heterogeneity among lineages and regions. Although regression analysis of branch lengths suggested some degree of indel homoplasy among the most divergent lineages, there was little evidence for indel homoplasy within the Lecythidaceae. Likelihood ratio tests applied to the entire phylogenetic tree revealed a consistent pattern rejecting a molecular clock. Tajima's 1D and 2D tests revealed two taxa with consistent rate heterogeneity, one showing relatively more and one relatively fewer changes than other taxa. In general, nucleotide changes showed more evidence of rate heterogeneity than did indel changes. 
The rate of evolution was highly variable among the six cpDNA regions examined, with the trnS-trnG and trnH-psbA regions showing as much as 10% and 15% divergence within the Lecythidaceae. Deviations from rate homogeneity in the two taxa were constant across cpDNA regions, consistent with lineage-specific rates of evolution rather than cpDNA region-specific natural selection. There is no evidence that indels are more likely than nucleotide changes to experience homoplasy within the Lecythidaceae. These results support a neutral interpretation of cpDNA indel and nucleotide variation in population studies within species such as Corythophora alta.

7.
Recombination between homologous loci is accompanied by formation of heteroduplexes. Repairing mismatches in heteroduplexes often leads to single nucleotide substitutions in a process known as gene conversion. Gene conversion was shown to be GC-biased in different organisms; that is, a W (A or T) → S (G or C) substitution is more likely in this process than an S → W substitution. Here, we show that the insertion/deletion ratio for short noncoding indels that reach fixation between species is positively correlated with the recombination rate in Drosophila melanogaster, Homo sapiens, and Saccharomyces cerevisiae. This correlation is due both to an increase in the fixation rate of insertions and to a decrease in the fixation rate of deletions in regions of high recombination. Whole-genome data on indel polymorphism and divergence in D. melanogaster rule out mutation biases and selection as the cause of this trend, pointing to insertion-biased gene conversion as the most likely explanation. The bias toward insertions is strongest for single-nucleotide indels and decreases with indel length. In regions of high recombination rate this bias leads to an up to ~5-fold excess of fixed short insertions over deletions, and substantially affects the evolution of DNA segments.

8.
Brandström M, Ellegren H. Genetics, 2007, 176(3): 1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10^-4 indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

9.
An insertion/deletion polymorphism (Ind2) in the Brassica nigra CONSTANS LIKE 1 (Bni COL1) gene was previously found to be associated with variation in flowering time. In the present study we examine the inter-specific divergence of COL1 in the family Brassicaceae. Analysis of codon substitution models did not reveal evidence of positive Darwinian selection, but comparisons of the COL1 gene in different species revealed a surprising number of indels. A total of 24 indels were found in the 650 bp of the middle variable region of the gene. This high number of indels could reflect a lack of constraint on the length of this region of the protein, or the effect of positive selection. The number of indels was close to that expected in non-coding DNA, but the indels were longer in COL1 than those observed in non-coding regions. Reconstruction of indel evolution indicated that most indels resulted from deletions rather than insertions. The Ind2 indel that has shown association with flowering time in Brassica nigra exhibited a remarkable distribution in the Brassicaceae family, indicating that the polymorphism may have persisted for more than ten million years. Considering the presumed historic population sizes of Brassicaceae species, such a long persistence time seems unlikely for a neutral polymorphism.

10.
Traditional sequence comparison by alignment employs a mutation model comprising two events, substitutions and indels (insertions or deletions) of single positions. However, modern genetic analysis knows a variety of more complex mutation events (e.g., duplications, excisions, and rearrangements), especially regarding DNA. With ever more DNA sequence data becoming available, the need to accurately compare sequences which have clearly undergone more complicated types of mutational processes is becoming critical. Herein we introduce a new method for pairwise alignment and comparison of sequences with respect to the special evolution of tandem repeats: substitutions and indels of single positions and, additionally, duplications and excisions of variable degree (i.e., of one or more repeat copies simultaneously) are taken into account. To evaluate our method, we apply it to the spa VNTR (variable number of tandem repeats) cluster of Staphylococcus aureus, a bacterium of high medical importance.
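
For reference, the traditional mutation model named in this abstract (substitutions plus indels of single positions) corresponds to classic edit-distance alignment. The following is a minimal Python sketch under assumed unit costs; the function name `edit_distance` and the example strings are hypothetical, and it does not implement the authors' tandem-repeat method, which additionally models duplications and excisions of whole repeat copies.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions and single-position indels turning a into b.

    Unit costs are an assumption made for illustration only.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


# Losing one 4-base repeat copy is scored as four separate single-position deletions.
print(edit_distance("ACGTACGT", "ACGT"))
```

Under this model the loss of a single repeat copy is counted as several independent events, which is the kind of mismatch between model and biology that motivates treating duplications and excisions as events in their own right.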

11.
The principal sources of genetic variation that can be assayed with restriction enzymes are base substitutions and insertions/deletions (indels). The likelihood of detecting indels as restriction fragment length polymorphisms (RFLPs) is determined by the size and frequency of the indels, and the ability to resolve small indels as RFLPs is limited by the distribution of restriction fragment sizes. In this study, we use aligned sequences from the indica and japonica subspecies of rice (Oryza sativa L.) to quantify and compare the ability of restriction enzymes to detect indels. We look specifically at two abundant transposable element-derived indel sources: miniature inverted repeat transposable elements (MITEs) and long terminal repeat (LTR) retroelements. From this analysis we conclude that indels rather than base substitutions are the prevailing source of the polymorphism detected in rice. We show that, although MITE-derived indels are more abundant than LTR-retroelement-derived indels, LTR-retroelements have a greater capacity to generate visible restriction fragment length polymorphism because of their larger size. We find that the variation in the detectability of indels among restriction enzymes can be explained by differences in the frequency and dispersion of their restriction sites in the genome. The parameters that describe the fragment size distributions obtained with the restriction enzymes are highly correlated across the sequenced genomes of rice, Arabidopsis and human, with the exception of some extreme deviations in frequency for particular recognition sequences corresponding to variations in the levels and modes of DNA methylation in the three disparate organisms. 
Thus, we can predict the relative ability of a restriction enzyme to detect indels derived from a specific source based on the distribution of restriction fragment sizes, even when this is estimated for a distantly related genome. Electronic Supplementary Material: Supplementary material is available in the online version of this article. Communicated by M.-A. Grandbastien.

12.
Yang H, Wu Y, Feng J, Yang S, Tian D. Genomics, 2009, 93(1): 90-97
Mutations, which can alter amino acid constitution, contribute greatly to protein evolution. However, little is reported of their pattern during protein structural evolution. We investigated the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) and insertions/deletions (indels) along mammal and fruit fly proteins. We found that nsSNPs (and d(N)) and indels increased in protein boundary regions, and this pattern is inversely correlated with the distribution of protein domain density. Additionally, synonymous substitutions (and d(S)) are reduced in 5' and 3' regions, indicating more variable protein boundaries compared with the central interior. All evidence suggests that the inner part of coding sequences (CDSs) is comparatively conserved, whereas the 5' and 3' regions, with higher evolution rates, are more variable. We assumed that, because nsSNPs and indels are more frequent in adaptive regions of CDSs, amino acids there could more easily be altered, gained, or lost, making these regions the front line of protein evolution.

13.
We estimate DNA sequence error rates in GenBank records containing protein-coding and non-coding DNA sequences by comparing sequences of the inbred mouse strain C57BL/6J, sequenced as part of the mouse genome project and independently by other laboratories. C57BL/6J was produced by more than 100 generations of brother-sister mating, and can be assumed to be virtually free of residual polymorphism and mutational variation, so differences between independent sequences can be attributed to error. The estimated single nucleotide error rate for coding DNA is 0.10% (SE 0.012%), which is substantially lower than previous estimates for error rates in GenBank accessions. The estimated single nucleotide error rate for intronic DNA sequences (0.22%; SE 0.051%) is significantly higher than the rate for coding DNA. Since error rates for the mouse genome sequence are very low, the vast majority of the errors we detected are likely to be in individual GenBank accessions. The frequency of insertion-deletion (indel) errors in non-coding DNA approaches that of single nucleotide errors in non-coding DNA, whereas indel errors are uncommon in coding sequences.

14.
Mitochondrial DNA (mtDNA) major non-coding regions were amplified from 73 dogs of eight Japanese native dog breeds and from 21 dogs of 16 non-Japanese dog breeds by the polymerase chain reaction, and their DNA sequences were determined. A total of 51 nucleotide positions within the non-coding region (969-972 base pairs) showed nucleotide variations, of which 48 were caused by transition. These nucleotide substitutions were abundant in the region proximate to tRNAPro. In addition to the nucleotide substitutions, the dog mtDNA D-loop sequences had a heteroplasmic repetitive sequence (TACACGT(A/G)CG) involving size variation. The DNA sequences of the non-coding region were classified into four different groups by phylogenetic analysis, and the deepest branchpoint of this dog phylogeny was calculated to be about 100 000 years before the present. Phylogenetic analysis showed that Japanese native dog breeds could not be clearly delimited as distinct breeds. Many haplotypes found in members of some clustering groups were seen in each dog breed, and interbreed nucleotide differences between Japanese dog breeds were almost the same as the intrabreed nucleotide diversities.

15.
16.
We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., insertions and deletions, or indels) and present a manual alignment strategy that incorporates indels for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK alignments with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships. 
From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

17.
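The animal mtDNA control region: conservation and heteroplasmy   Total citations: 6; self-citations: 1; citations by others: 5
Su Ying (苏瑛). Sichuan Journal of Zoology (《四川动物》), 2005, 24(4): 669-672
This paper reviews the literature on the animal mitochondrial DNA control region. Starting from studies of the mitochondrial control region, it focuses on the structural features of the animal mitochondrial control region. Main conclusions: base substitutions, insertions and deletions, and variation in the number of repeat sequences make the D-loop the most variable region of mtDNA, but mutations and structural rearrangements do not occur throughout the D-loop; they are concentrated in the hypervariable regions. Most studies have focused on the conserved regions of the mtDNA D-loop and on heteroplasmy. Analysis of D-loop sequences can help clarify the origin of animals, and has broad research and application prospects in areas such as the identification of genetic relationships among animals, phylogenetics, and the study of modes of speciation.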

18.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce the mechanisms of genome rearrangement that formed each of them. We first compared strains N315 and Mu50, which form one of the most closely related strain pairs, at single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed the rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units, likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

19.
20.