Similar literature
 20 similar articles found; search took 31 ms
1.
Traditional sequence comparison by alignment employs a mutation model comprising two events: substitutions and indels (insertions or deletions) of single positions. However, modern genetic analysis recognizes a variety of more complex mutation events (e.g., duplications, excisions, and rearrangements), especially in DNA. With ever more DNA sequence data becoming available, the need to accurately compare sequences that have clearly undergone more complicated mutational processes is becoming critical. Herein we introduce a new method for pairwise alignment and comparison of sequences with respect to the special evolution of tandem repeats: substitutions and indels of single positions and, additionally, duplications and excisions of variable degree (i.e., of one or more repeat copies simultaneously) are taken into account. To evaluate our method, we apply it to the spa VNTR (variable number of tandem repeats) cluster of Staphylococcus aureus, a bacterium of high medical importance.

2.

Background  

Insertions and deletions of DNA segments (indels) are, together with substitutions, the major mutational processes that generate genetic variation. Here we focus on recent DNA insertions and deletions in protein-coding regions of the human genome to investigate selective constraints on indels in protein evolution.

3.
Microstructural changes such as insertions and deletions (indels) are a major driving force in the evolution of non-coding DNA sequences. To better understand the mechanisms by which indel mutations arise, as well as the molecular evolution of non-coding regions, the number and pattern of indels and nucleotide substitutions were compared across whole chloroplast genomes. Comparisons were made for a total of over 38 kb of non-coding DNA sequence from 126 intergenic regions in two data sets representing species with different divergence times: sugarcane and maize, and Oryza sativa var. indica and japonica. The main findings of this study are: (i) Approximately half of all indels are single-nucleotide indels. This observation agrees with previous studies in various organisms. (ii) The distribution and number of indels differed between the two data sets, and different patterns were observed for tandem repeat and non-repeat indels. (iii) The distribution pattern of tandem repeat indels showed a statistically significant bias towards A/T-rich sequence. (iv) The rate of indel mutation was estimated to be approximately 0.8 ± 0.04 × 10^-9 per site per year, which is similar to previous estimates in other organisms. (v) The frequencies of nucleotide substitutions and indels were significantly lower in the inverted repeat (IR).

4.
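The control region of animal mtDNA: conservation and heteroplasmy   Total citations: 6, self-citations: 1, citations by others: 5
Su Ying 《Sichuan Journal of Zoology (四川动物)》2005,24(4):669-672
This paper reviews the literature on the control region of animal mitochondrial DNA. Starting from studies of the mitochondrial control region, it focuses on the structural features of the control region in animals. The main conclusions are as follows. Base substitutions, insertions and deletions, and variation in the number of repeat sequences make the D-loop the most variable region of mtDNA, yet mutations and structural rearrangements do not occur throughout the D-loop; they are concentrated in the hypervariable regions. Most studies have focused on the conserved regions and heteroplasmy of the mtDNA D-loop. Analysis of D-loop sequences can help clarify the origin of animals and has broad research and application prospects in identifying relationships among animals, in phylogenetics, and in the study of modes of speciation.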

5.
A 5500 base-pair fragment including the beta-globin gene downstream from codon 122 and about 4000 base-pairs of its 5' flanking sequence was cloned from chimpanzee DNA and thoroughly sequenced before being compared with the corresponding human sequence: 88 point differences (83 substitutions and 5 deletions or insertions of 1 base-pair) were detected as well as seven more important deletion/insertion events. These changes occur preferentially in two kinds of structure. First, 40% of the CpG dinucleotides present in either human or chimpanzee sequences are affected by nucleotide variations. This corresponds to a divergence level considerably higher than that expected. Second, most short repeated sequences found in the 5' extragenic sequence are involved in mutational events (amplification or contraction of the number of basic motifs as well as point substitutions or deletions/insertions of 1 base-pair). Considering the very low level of nucleotide sequence divergence between these two closely related species, our data provide direct evidence for CpG and tandem array instability.

6.
Brandström M  Ellegren H 《Genetics》2007,176(3):1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10^-4 indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

7.
Large indels greatly impact the observable phenotypes of different organisms, including plants and humans. Hence, extracting large indels with high precision and sensitivity is important. Here, we developed IndelEnsembler to detect large indels in whole-genome sequencing data from 1047 Arabidopsis accessions. IndelEnsembler identified 34 093 deletions, 12 913 tandem duplications and 9773 insertions. Our large indel dataset was more comprehensive and accurate than the previous AthCNV dataset (1). We captured nearly twice as many ground-truth deletions and on average 27% more ground-truth duplications than AthCNV, although our dataset contains fewer large indels than AthCNV. Our large indels were positively correlated with transposable elements across the Arabidopsis genome. Non-homologous recombination events were the major formation mechanism of deletions in the Arabidopsis genome. The neighbor-joining (NJ) tree constructed from IndelEnsembler's deletions clearly separated the geographic subgroups of the 1047 Arabidopsis accessions. More importantly, our large indels represent a previously unassessed source of genetic variation. Approximately 49% of the deletions have low linkage disequilibrium (LD) with surrounding single nucleotide polymorphisms. Some of them could affect trait performance. For instance, in a deletion-based genome-wide association study (DEL-GWAS), accessions containing a 182-bp deletion in AT1G11520 had delayed flowering time, and all accessions in north Sweden carried the 182-bp deletion. We also found that accessions with a 65-bp deletion in the first exon of AT4G00650 (FRI) flowered earlier than those without it. These two deletions cannot be detected in AthCNV and, interestingly, they do not co-occur in any Arabidopsis thaliana accession. In SNP-GWAS, the SNPs surrounding these two deletions do not correlate with flowering time. This example demonstrates that existing large indel datasets miss phenotypic variation and that our large indel dataset fills the gap.

8.
9.
10.
Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions; thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.

11.
Small tandem DNA duplications in the range of 15 to 300 base-pairs play an important role in the aetiology of human disease and contribute to genome diversity. Here, we discuss different proposed mechanisms for their occurrence and argue that this type of structural variation mainly results from mutagenic repair of chromosomal breaks. This hypothesis is supported both by bioinformatic analysis of insertions occurring in the genomes of different species and in disease alleles, and by CRISPR/Cas9-based experimental data from different model systems. Recent work points to fill-in synthesis at double-stranded DNA breaks with complementary sequences, regulated by end-joining mechanisms, to account for small tandem duplications. We will review the prevalence of small tandem duplications in the population, and we will speculate on the potential sources of DNA damage that could give rise to this mutational signature. With the development of novel algorithms to analyse sequencing data, small tandem duplications are now more frequently detected in the human genome and identified as oncogenic gain-of-function mutations. Understanding their origin could lead to optimized treatment regimens to prevent therapy-induced activation of oncogenes and might expose novel vulnerabilities in cancer.

12.
13.
14.
Singh ND  Arndt PF  Petrov DA 《Genetics》2005,169(2):709-722
Mutation is the underlying force that provides the variation upon which evolutionary forces can act. It is important to understand how mutation rates vary within genomes and how the probabilities of fixation of new mutations vary as well. If substitutional processes across the genome are heterogeneous, then examining patterns of coding sequence evolution without taking these underlying variations into account may be misleading. Here we present the first rigorous test of substitution rate heterogeneity in the Drosophila melanogaster genome using almost 1500 nonfunctional fragments of the transposable element DNAREP1_DM. Not only do our analyses suggest that substitutional patterns in heterochromatic and euchromatic sequences are different, but they also provide support for a recombination-associated substitutional bias toward G and C in this species. The magnitude of this bias is entirely sufficient to explain recombination-associated patterns of codon usage on the autosomes of the D. melanogaster genome. We also document a bias toward lower GC content in the pattern of small insertions and deletions (indels). In addition, the GC content of noncoding DNA in Drosophila is higher than would be predicted on the basis of the pattern of nucleotide substitutions and small indels. However, we argue that the fast turnover of noncoding sequences in Drosophila makes it difficult to assess the importance of the GC biases in nucleotide substitutions and small indels in shaping the base composition of noncoding sequences.

15.
The centromeric region of human chromosome 21 comprises two long alphoid DNA arrays: the well homogenized and CENP-B box-rich alpha21-I, and alpha21-II, which contains a set of less homogenized and CENP-B box-poor subfamilies located closer to the short arm of the chromosome. A continuous alphoid fragment of 100 monomers bordering the non-satellite sequences in human chromosome 21 was mapped to the pericentromeric short arm region by fluorescence in situ hybridization (alpha21-II locus). The alphoid sequence contained several rearrangements, including five large deletions within monomers and insertions of three truncated L1 elements. No binding sites for centromeric protein CENP-B were found. We analyzed sequences with alphoid/non-alphoid junctions selectively screened from current databases and revealed various rearrangements disrupting the regular tandem alphoid structure, namely, deletions, duplications, inversions, expansions of short oligonucleotide motifs and insertions of different dispersed elements. The detailed analysis of more than 1100 alphoid monomers from junction regions showed that the vast majority of structural alterations and joinings with non-alphoid DNAs occur in alpha satellite families lacking CENP-B boxes. Most analyzed events were found in sequences located toward the edges of the centromeric alphoid arrays. Different dispersed elements were inserted into alphoid DNA at kinkable dinucleotides (TG, CA or TA) situated between pyrimidine/purine tracks. DNA rearrangements resulting from different processes such as recombination and replication occur at kinkable DNA sites, as insertions do, but irrespective of the occurrence of pyrimidine/purine tracks. It seems that the kinkable dinucleotides TG, CA and TA are part of recognition signals for many proteins involved in recombination, replication, and insertional events. Alphoid DNA is a good model for studying these processes.

16.
17.
Insertions, substitutions, and the origin of microsatellites   Total citations: 7, self-citations: 0, citations by others: 7
This paper uses data from the Human Gene Mutation Database to contrast two hypotheses for the origin of short DNA repeats: substitutions and insertions that duplicate adjacent sequences. Because substitutions are much more common than insertions, they are the dominant source of new 2-repeat loci. Insertions are rarer, but over 70% of the 2-4 base insertion mutations are duplications of adjacent sequences, and over half of these generate new repeat regions. Insertions contribute fewer new repeat loci than substitutions, but their relative importance increases rapidly with repeat number so that all new 4-5-repeat mutations come from insertions, as do all 3-repeat mutations of tetranucleotide repeats. This suggests that the process of repeat duplication that dominates microsatellite evolution at high repeat numbers is also important very early in microsatellite evolution. This result sheds light on the puzzle of the origin of short tandem repeats. It also suggests that most short insertion mutations derive from a slippage-like process during replication.
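The notion of an insertion that "duplicates adjacent sequence" in the abstract above can be illustrated with a minimal Python sketch. This is not the paper's method: the function name, the 0-based position convention, and the exact-match rule are assumptions made for illustration, and a real analysis would also need to handle alignment ambiguity at repeat boundaries.

```python
def duplicates_adjacent(reference: str, pos: int, inserted: str) -> bool:
    """Return True if `inserted`, placed just before reference[pos],
    is an exact copy of the bases immediately to its left or right.

    Hypothetical helper: `pos` is a 0-based index into `reference`.
    """
    k = len(inserted)
    if k == 0:
        return False
    left = reference[max(0, pos - k):pos]   # k bases upstream of the insertion point
    right = reference[pos:pos + k]          # k bases downstream of the insertion point
    return inserted == left or inserted == right


# Inserting "AT" after "GAT" copies the upstream "AT": a tandem duplication.
print(duplicates_adjacent("GATTACA", 3, "AT"))
# Inserting "GG" at the same point matches neither flank.
print(duplicates_adjacent("GATTACA", 3, "GG"))
```

Under these assumptions the first call reports a duplication and the second does not; counting such calls over a set of insertion records would give the kind of proportion the abstract reports.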

18.
Recombination between homologous loci is accompanied by formation of heteroduplexes. Repairing mismatches in heteroduplexes often leads to single nucleotide substitutions in a process known as gene conversion. Gene conversion was shown to be GC-biased in different organisms; that is, a W (A or T) → S (G or C) substitution is more likely in this process than an S → W substitution. Here, we show that the insertion/deletion ratio for short noncoding indels that reach fixation between species is positively correlated with the recombination rate in Drosophila melanogaster, Homo sapiens, and Saccharomyces cerevisiae. This correlation is due both to an increase in the fixation rate of insertions and to a decrease in the fixation rate of deletions in regions of high recombination. Whole-genome data on indel polymorphism and divergence in D. melanogaster rule out mutation biases and selection as the cause of this trend, pointing to insertion-biased gene conversion as the most likely explanation. The bias toward insertions is strongest for single-nucleotide indels and decreases with indel length. In regions of high recombination rate this bias leads to an up to ~5-fold excess of fixed short insertions over deletions, and substantially affects the evolution of DNA segments.
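The W/S terminology used in the abstract above (W = A or T, S = G or C) can be made concrete with a short Python sketch. The function name and the label strings are hypothetical choices for illustration; the sketch only classifies a single substitution by direction and says nothing about how the cited study polarized or counted its events.

```python
WEAK = {"A", "T"}     # W: bases forming two hydrogen bonds
STRONG = {"G", "C"}   # S: bases forming three hydrogen bonds


def classify_substitution(ancestral: str, derived: str) -> str:
    """Label a substitution as 'W>S', 'S>W', 'W>W' or 'S>S'.

    Hypothetical helper: both arguments are single DNA bases.
    """
    a, d = ancestral.upper(), derived.upper()
    valid = WEAK | STRONG
    if a not in valid or d not in valid:
        raise ValueError("expected single bases from A, C, G, T")
    if a == d:
        raise ValueError("ancestral and derived bases are identical")
    side = lambda base: "W" if base in WEAK else "S"
    return f"{side(a)}>{side(d)}"


print(classify_substitution("A", "G"))  # GC-increasing change
print(classify_substitution("C", "T"))  # GC-decreasing change
```

GC-biased gene conversion, as described above, would show up as an excess of fixed 'W>S' events relative to 'S>W' events in regions of high recombination.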

19.
Rare genomic changes as a tool for phylogenetics   Total citations: 1, self-citations: 0, citations by others: 1
DNA sequence data have offered valuable insights into the relationships between living organisms. However, most phylogenetic analyses of DNA sequences rely primarily on single nucleotide substitutions, which might not be perfect phylogenetic markers. Rare genomic changes (RGCs), such as intron indels, retroposon integrations, signature sequences, mitochondrial and chloroplast gene order changes, gene duplications and genetic code changes, provide a suite of complementary markers with enormous potential for molecular systematics. Recent exploitation of RGCs has already started to yield exciting phylogenetic information.

20.
Ancestral allele information is useful for genetic studies. Previously, the identification of ancestral alleles was primarily based on sequence alignments between species. This study proposes alternative ways to identify ancestral alleles based on population sequencing data. The methods described here use the diversity between haplotypes harboring ancestral and newly emerged alleles. Simulations showed that these methods were reliable for identifying ancestral alleles when the variants were not too old. Application to human genome sequencing data suggested a role for indels in maintaining the GC content of the human genome. The deletion-to-insertion ratios and GC proportions were correlated depending on the sizes of insertions and deletions in the direction of increasing GC content. There were GC-biased fixations in single base-pair insertions and AT-biased fixations in single base-pair deletions in the results based on the proposed methods. In the current study, GC-biased gene conversions in nucleotide substitutions were very slight or insignificant. In the variants of several quantitative trait loci (QTLs), slight GC-biased gene conversion was observed in nucleotide substitutions. For the QTL indels, insertions were observed more often than deletions, and deletion-biased fixation was observed, providing new insights into the evolution of functional genes.
