Similar literature
 Found 20 similar articles; search took 46 ms
1.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.
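The difference between contiguous and spaced seeds described above can be illustrated with a minimal sketch. It assumes an ungapped alignment of two equal-length strings and encodes a seed as a string of `1` (position must match) and `0` (position is ignored). The function name `seed_hits` and the example sequences are hypothetical, and the sketch does not implement the indel seed model or the waiting time formula of the paper.

```python
def seed_hits(a: str, b: str, seed: str) -> list[int]:
    """Return the offsets at which `seed` hits the ungapped alignment of a and b.

    `seed` is a string of '1' (this position must match) and '0' (don't care).
    A contiguous seed is all '1's; a spaced seed mixes '1's and '0's.
    """
    n = min(len(a), len(b))
    hits = []
    for i in range(n - len(seed) + 1):
        # A hit requires a match at every '1' position of the seed.
        if all(a[i + j] == b[i + j] for j, c in enumerate(seed) if c == "1"):
            hits.append(i)
    return hits


# Two illustrative sequences that differ at positions 4 and 10.
query = "ACGTACGTACGT"
database = "ACGTTCGTACCT"

contiguous = "1111"   # weight 4, span 4
spaced = "110101"     # weight 4, span 6

print("contiguous hits:", seed_hits(query, database, contiguous))
print("spaced hits:    ", seed_hits(query, database, spaced))
```

Both seeds have the same weight (four required matches), so their random hit rates are comparable, but they hit at different offsets because the spaced seed can straddle a mismatch that falls on a don't-care position.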

2.
Brandström M  Ellegren H 《Genetics》2007,176(3):1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10⁻⁴ indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.
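As a back-of-the-envelope check on the densities quoted above, the sketch below derives the SNP density implied by the two reported figures. The implied value is an inference from "approximately 5%", not a number stated in the abstract, and the function name is hypothetical.

```python
def implied_snp_density(indel_density: float, indel_to_snp_ratio: float) -> float:
    """SNP density implied by an indel density and the indel/SNP density ratio."""
    return indel_density / indel_to_snp_ratio


indel_density = 1.9e-4     # indel events per bp (from the abstract)
indel_to_snp_ratio = 0.05  # "approximately 5%" (from the abstract)

snp_density = implied_snp_density(indel_density, indel_to_snp_ratio)
print(f"implied SNP density: {snp_density:.1e} per bp")
print(f"about {indel_density * 1000:.2f} indels and {snp_density * 1000:.1f} SNPs per kb")
```

Because the ratio is only approximate, the result should be read as an order-of-magnitude estimate.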

3.
Traditional sequence comparison by alignment employs a mutation model comprising two events, substitutions and indels (insertions or deletions) of single positions. However, modern genetic analysis knows a variety of more complex mutation events (e.g., duplications, excisions, and rearrangements), especially regarding DNA. With ever more DNA sequence data becoming available, the need to accurately compare sequences which have clearly undergone more complicated types of mutational processes is becoming critical. Herein we introduce a new method for pairwise alignment and comparison of sequences with respect to the special evolution of tandem repeats: substitutions and indels of single positions and, additionally, duplications and excisions of variable degree (i.e., of one or more repeat copies simultaneously) are taken into account. To evaluate our method, we apply it to the spa VNTR (variable number of tandem repeats) cluster of Staphylococcus aureus, a bacterium of high medical importance.

4.
Positive and negative selection on indel variation may explain the correlation between intron length and recombination levels in natural populations of Drosophila. A nucleotide sequence analysis of the 3.5 kilobase sequence of the alcohol dehydrogenase (Adh) region from 139 Drosophila pseudoobscura strains and one D. miranda strain was used to determine whether positive or negative selection acts on indel variation in a gene that experiences high levels of recombination. A total of 30 deletion and 36 insertion polymorphisms were segregating within D. pseudoobscura populations and no indels were fixed between D. pseudoobscura and its two sibling species D. miranda and D. persimilis. The ratio of Tajima's D to its theoretical minimum value (D(min)) was proposed as a metric to assess the heterogeneity in D among D. pseudoobscura loci when the number of segregating sites differs among loci. The magnitude of the D/D(min) ratio was found to increase as the rate of population expansion increases, allowing one to assess which loci have an excess of rare variants due to population expansion versus purifying selection. D. pseudoobscura populations appear to have had modest increases in size accounting for some of the observed excess of rare variants. The D/D(min) ratio rejected a neutral model for deletion polymorphisms. Linkage disequilibrium among pairs of indels was greater than between pairs of segregating nucleotides. These results suggest that purifying selection removes deletion variation from intron sequences, but not insertion polymorphisms. Genome rearrangement and size-dependent intron evolution are proposed as mechanisms that limit runaway intron expansion.

5.
MOTIVATION: Tandem repeats (TRs) are associated with human disease, play a role in evolution and are important in regulatory processes. Despite their importance, locating and characterizing these patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect conservation of patterns and complex pattern structures. We study recognition algorithms for two complex pattern structures: variable length tandem repeats (VLTRs) and multi-period tandem repeats (MPTRs). RESULTS: We extend previous algorithmic research to a class of regular tandem repeats (RegTRs). We formally define RegTRs, as well as two important subclasses: VLTRs and MPTRs. We present algorithms for identification of TRs in these classes. Furthermore, our algorithms identify degenerate VLTRs and MPTRs: repeats containing substitutions, insertions and deletions. To illustrate our work, we present results of our analysis for two difficult regions in cattle and human data which reflect practical occurrences of these subclasses in GenBank sequence data. In addition, we show the applicability of our algorithmic techniques for identifying Alu sequences, gene clusters and other distant regions of similarity. We illustrate this with an example from yeast chromosome I.

6.
Nuclear DNA intron sequences are increasingly used to investigate evolutionary relationships among closely related organisms. The phylogenetic usefulness of intron sequences at higher taxonomic levels has, however, not been firmly established and very few studies have used these markers to address evolutionary questions above the family level. In addition, the mechanisms driving intron evolution are not well understood. We compared DNA sequence data derived from three presumably independently segregating introns (THY, PRKC I and MGF) across 158 mammalian species. All currently recognized extant eutherian mammalian orders were included with the exception of Cingulata, Dermoptera and Scandentia. The total aligned length of the data was 6366 base pairs (bp); after the exclusion of autapomorphic insertions, 1511 bp were analyzed. In many instances the Bayesian and parsimony analyses were complementary and gave significant posterior probability and bootstrap support (>80) for the monophyly of Afrotheria, Euarchontoglires, Laurasiatheria and Boreoeutheria. Apart from finding congruent support when using these methods, the intron data also provided several indels longer than 3 bp that support, among others, the monophyly of Afrotheria, Paenungulata, Ferae and Boreoeutheria. A quantitative analysis of insertions and deletions suggested that there was a 75% bias towards deletions. The average insertion size in the mammalian data set was 16.49 ± 57.70 bp while the average deletion was much smaller (4.47 ± 14.17 bp). The tendency towards large insertions and small deletions is highlighted by the observation that out of a total of 17 indels larger than 100 bp, 15 were insertions. The majority of indels (>60% of all events) were 1 or 2 bp changes. Although the average overall indel substitution rate of 0.00559 per site is comparable to that previously reported for rodents and primates, individual analyses among different evolutionary lineages provide evidence for differences in the formation rate of indels among the different mammalian groups.

7.
Three types of sequence variations, namely single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), and short tandem repeats (STRs), have been extensively reported in mammalian genomes. In this study, we discovered a novel type of sequence variation, i.e., multiple-nucleotide length polymorphisms (MNLPs), in the bovine UCN3 (Urocortin 3) gene and the gene of its receptor CRHR2 (corticotropin-releasing hormone receptor 2). Both MNLPs featured involvement of multiple-nucleotide length polymorphisms (5-18 bases), low sequence identity, and 1.7- to 11-fold changes in promoter activity between two alleles. Therefore, this novel genetic complexity would contribute significantly to the evolutionary, functional, and phenotypic complexity of genomes within or among species.

8.
Nucleotide substitutions, insertions, and deletions constitute the principal molecular mechanisms generating genetic variation on small length scales. In contrast to substitutions, the nature of short DNA insertions and deletions (indels) is far less understood. With the recent availability of whole-genome multiple alignments between human and other primates, detailed investigations on indel characteristics and origin have come within reach. Here, we show that the majority of short (1-100 bp) DNA insertions in the human lineage are tandem duplications of directly adjacent sequence segments with conserved polarity. Indels in microsatellites comprise only a small fraction. The underlying molecular processes generating indels do not necessarily rely on the presence of preexisting duplicates, as would be expected for unequal crossing over, as well as replication slippage. Instead, our findings point toward a mechanism that preferentially occurs in the male germline and is not recombination-mediated. Surprisingly, nonframeshifting tandem duplications and deletions in coding regions still occur at approximately 50% of their genomic background rates. As is already well established in the context of gene and segmental duplications, our results demonstrate that duplications are also likely to constitute the predominant process for rapid generation of new genetic material and function on smaller scales.

9.
Jo YD  Park J  Kim J  Song W  Hur CG  Lee YH  Kang BC 《Plant cell reports》2011,30(2):217-229
Plants in the family Solanaceae are used as model systems in comparative and evolutionary genomics. The complete chloroplast genomes of seven solanaceous species have been sequenced, including tobacco, potato and tomato, but not peppers. We analyzed the complete chloroplast genome sequence of the hot pepper, Capsicum annuum. The pepper chloroplast genome was 156,781 bp in length, including a pair of inverted repeats (IR) of 25,783 bp. The content and the order of the 133 genes in the pepper chloroplast genome were identical to those of other solanaceous plastomes. To characterize the pepper plastome sequence, we performed a comparative analysis using the complete plastome sequences of pepper and seven solanaceous plastomes. The frequency and contents of large indels and tandem repeat sequences and the distribution pattern of genome-wide sequence variations were investigated. In addition, a phylogenetic analysis using concatenated alignments of coding sequences was performed to determine the evolutionary position of pepper in Solanaceae. Our results revealed two distinct features of the pepper plastome compared to other solanaceous plastomes. Firstly, large indels, including insertions in the accD and rpl20 gene sequences, were predominantly detected in the pepper plastome. Secondly, tandem repeat sequences were particularly frequent in the pepper plastome. Taken together, our study presents unique features of the evolution of the pepper plastome among solanaceous plastomes.

10.
Ptak SE  Petrov DA 《Genetics》2002,162(3):1233-1244
Studies of "dead-on-arrival" transposable elements in Drosophila melanogaster found that deletions outnumber insertions approximately 8:1 with a median size for deletions of approximately 10 bp. These results are consistent with the deletion and insertion profiles found in most other Drosophila pseudogenes. In contrast, a recent study of D. melanogaster introns found a deletion/insertion ratio of 1.35:1, with 84% of deletions being shorter than 10 bp. This discrepancy could be explained if deletions, especially long deletions, are more frequently strongly deleterious than insertions and are eliminated disproportionately from intron sequences. To test this possibility, we use analysis and simulations to examine how deletions and insertions of different lengths affect different components of splicing and determine the distribution of deletions and insertions that preserve the original exons. We find that, consistent with our predictions, longer deletions affect splicing at a much higher rate compared to insertions and short deletions. We also explore other potential constraints in introns and show that most of these also disproportionately affect large deletions. Altogether we demonstrate that constraints in introns may explain much of the difference in the pattern of deletions and insertions observed in Drosophila introns and pseudogenes.

11.
Microsatellites (simple sequence repeats [SSRs]) are highly variable molecular markers that are a rich and readily assayed source of variation for population genetic studies. Cross-amplification between closely related species is possible when there are no (or few) sequence differences in the primer binding sites. The occurrence of nonhomologous fragments of the same size (size homoplasy) is a constraint of microsatellites. Size homoplasy can be caused by insertions/deletions (indels) in SSR flanking regions. We found that size variation in locus ssrQZAG9 is due to different repeat numbers of the SSR motifs but also to indels in SSR flanking regions. Indels were found within species belonging to sections Robur and Cerris of genus Quercus and also between species of the 2 sections. In section Robur (Quercus robur L., Quercus petraea [Matt.] Liebl., Quercus pubescens Willd.), we detected rare alleles with an indel of 57 bp or 62 bp followed by a smaller indel of 12 bp in the SSR flanking regions. These alleles show a size range overlapping with that of alleles amplified in Quercus cerris L. (section Cerris). Multiple alignments with sequences of section Robur revealed the same SSR repeat motif but multiple indels in SSR flanking regions in Q. cerris. We discuss the effects of size homoplasy of SSR loci for the study of interspecific gene flow and on estimates of population differentiation.

12.
BACKGROUND: Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels.

13.
Slipped-strand mispairing (SSM) may play a major role in repetitive DNA sequence evolution by generating large numbers of short frameshift mutations within simple tandem repeats. Here we examine the frequency and size spectrum of frameshifts generated within poly-CA/TG sequences inserted into bacteriophage M13 in Escherichia coli hosts. The frequency of detectable frameshifts within a 40 bp tract of poly-CA/TG is greater than one percent and increases more than linearly with length, being lower by a factor of four in a 22 bp target sequence. The frequency increases more than 13-fold in mutL and mutS host cells, suggesting that a high proportion of frameshift events are normally repaired by methyl-directed mismatch repair. Of the 87 sequenced frameshifts in this study, 96% result from deletion or insertion of only one or two 2 bp repeat units. The most frequent events are 2 bp deletions, 2 bp insertions, and 4 bp deletions, the relative frequencies of these events being about 18:6:1.
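The 18:6:1 ratio quoted above can be converted into shares of the three most frequent event types with a short sketch. The helper name `ratio_to_shares` is hypothetical, and the shares are relative to these three event types only, not to all 87 sequenced frameshifts.

```python
from fractions import Fraction


def ratio_to_shares(ratio: dict[str, int]) -> dict[str, Fraction]:
    """Convert relative frequencies (e.g. 18:6:1) into exact fractional shares."""
    total = sum(ratio.values())
    return {event: Fraction(count, total) for event, count in ratio.items()}


# Relative frequencies as reported in the abstract ("about 18:6:1").
events = {"2 bp deletion": 18, "2 bp insertion": 6, "4 bp deletion": 1}

shares = ratio_to_shares(events)
for event, share in shares.items():
    print(f"{event}: {share} ({float(share):.0%})")
```

Under this reading, deletions of a single repeat unit are three times as frequent as insertions of a single unit.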

14.
15.
16.
A new mathematical method was used for the first time to search for tandem repeats with insertions and deletions in the full-length sequence of the A. thaliana genome. The method is based on a new algorithm for multiple alignment of sequences of certain periods without using paired comparisons of sequences. We identified 13997 periodic sites 2 to 50 characters long, only approximately 30% of which were known earlier. The possible origin and use of the identified sites with tandem repeats are discussed.
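For orientation, a naive scan for exact tandem repeats is sketched below. It is not the method of the abstract: it finds only perfect copies, tolerates no insertions, deletions or substitutions, and does not use multiple alignment. The function name, the default period range of 2 to 50 (chosen to mirror the range quoted above) and the `min_copies` threshold are all assumptions made for illustration.

```python
def find_exact_tandem_repeats(seq: str, min_period: int = 2,
                              max_period: int = 50,
                              min_copies: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, period, copies) for runs of exact tandem repeats.

    Naive scan: for each period, slide along the sequence and count how many
    times the unit starting at the current position repeats back to back.
    Non-primitive periods (e.g. "CACA" inside a long "CA" run) are also reported.
    """
    n = len(seq)
    results = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period * min_copies <= n:
            unit = seq[i:i + period]
            copies = 1
            # Extend the run while the next block equals the repeat unit.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                results.append((i, period, copies))
                i += copies * period  # skip past the run just reported
            else:
                i += 1
    return results


print(find_exact_tandem_repeats("GGCACACACATT"))
```

Detecting degenerate repeats with indels, as the abstract describes, needs an alignment-based or statistical approach rather than exact string comparison.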

17.
18.
19.
T Hong  K Drlica  A Pinter  E Murphy 《Journal of virology》1991,65(1):551-555
During infection of cells by retroviruses, some of the nonintegrated viral DNA can be found as a circular form containing two tandem, directly repeated long terminal repeats. The nucleotide sequence at the point where the long terminal repeats join (the circle junction) can be used to deduce the terminal nucleotides of the linear form of the viral DNA. Comparison of the termini of linear viral DNA with sequences at the junctions between the integrated provirus and the host chromosome has revealed that for most retroviruses 2 bp are removed from each end of the linear viral DNA during integration. For human immunodeficiency virus type 1 (HIV-1), however, sequence considerations involving primer-binding sites had suggested that only 1 bp is removed during integration. We obtained the nucleotide sequences at the ends of HIV-1 DNA by using the polymerase chain reaction to amplify fragments corresponding to the HIV-1 circle junction. Of 17 clones containing amplified sequences, 10 had identical circle junctions that contained an additional 4 bp (GTAC) relative to the integrated provirus. This indicates that, as for other retroviruses, 2 bp are removed from each end of the linear HIV-1 viral DNA during integration. The remaining seven isolates contained insertions or deletions at the circle junction.

20.
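The animal mtDNA control region: conservation and heteroplasmy   Total citations: 6, self-citations: 1, citations by others: 5
Su Ying 《四川动物 (Sichuan Journal of Zoology)》2005,24(4):669-672
This literature review describes the control region of animal mitochondrial DNA. Starting from genomic studies of the mitochondrial control region, it focuses on the structural features of the animal mitochondrial control region. Main conclusions: base substitutions, insertions and deletions, and variation in the number of repeat sequences make the D-loop the most variable region of mtDNA, but mutations and structural rearrangements do not occur across the whole D-loop; they are concentrated in the hypervariable regions. Most studies have focused on the conserved regions and heteroplasmy of the mtDNA D-loop. Analysis of D-loop sequences can help clarify the origins of animals and has broad research and application prospects in the identification of kinship among animals, in phylogenetics, and in the study of modes of speciation.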
