Similar literature
1.
Brandström M  Ellegren H 《Genetics》2007,176(3):1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10⁻⁴ indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

2.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.
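To make the seed terminology concrete, here is a minimal Python sketch of how contiguous and spaced seeds register a hit on an ungapped alignment. The function name `seed_hits`, the seed patterns, and the toy sequences are illustrative assumptions rather than anything taken from the paper, and the authors' indel seeds are not implemented here.

```python
def seed_hits(query: str, target: str, seed: str) -> list[int]:
    """Return start offsets where `seed` hits the ungapped alignment.

    `seed` is a string over {'1', '0'}: positions marked '1' must match,
    positions marked '0' are "don't care". A contiguous seed is all '1's.
    """
    n = min(len(query), len(target))
    hits = []
    for start in range(n - len(seed) + 1):
        if all(
            s == "0" or query[start + k] == target[start + k]
            for k, s in enumerate(seed)
        ):
            hits.append(start)
    return hits


# Toy sequences that differ only at position 4 (hypothetical data).
query = "ACGTACGT"
target = "ACGTTCGT"

print(seed_hits(query, target, "11111"))   # contiguous seed, weight 5
print(seed_hits(query, target, "110111"))  # spaced seed, weight 5
```

In this toy case every window of the contiguous seed covers the mismatch, so it reports no hit, whereas the spaced seed of the same weight hits at offset 2 because its don't-care position falls on the mismatch. Neither pattern tolerates a shift between the two sequences, which is the gap an indel-aware seed model is meant to close.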

3.
Little is known about variation of nucleotide insertion/deletions (indels) within species. In Arabidopsis thaliana, we investigated indel polymorphism patterns between two genome sequences and among 96 accessions at 1215 loci. Our study identified patterns in the variation of indel density, size, GC content and distribution, and a correlation between indels and substitutions. We found that the GC content in indel sequences was lower than that in non-indel sequences and that indels typically occur in regions with lower GC content. Patterns of indel frequency distribution among populations were more consistent with neutral expectation than substitution patterns. We also found that the local level of substitutions is positively correlated with indel density and negatively correlated with their distance to the closest indel, suggesting that indels play an important role in nucleotide variation.

4.
The evolution and diversification of different types of photosynthetic reaction centers (RCs) remains an important unresolved problem. We report here novel sequence features of the core proteins from Type I RCs (RC-I) and Type II RCs (RC-II) whose analyses provide important insights into the evolution of the RCs. The sequence alignments of the RC-I core proteins contain two conserved inserts or deletions (indels), a 3 amino acid (aa) indel that is uniquely found in all RC-I homologs from Cyanobacteria (both PsaA and PsaB) and a 1 aa indel that is specifically shared by the Chlorobi and Acidobacteria homologs. Ancestral sequence reconstruction provides evidence that the RC-I core protein from Heliobacteriaceae (PshA), lacking these indels, is most closely related to the ancestral RC-I protein. Thus, the identified 3 aa and 1 aa indels in the RC-I protein sequences must have been deletions, which occurred, respectively, in an ancestor of the modern Cyanobacteria containing a homodimeric form of RC-I and in a common ancestor of the RC-I core protein from Chlorobi and Acidobacteria. We also report a conserved 1 aa indel in the RC-II protein sequences that is commonly shared by all homologs from Cyanobacteria but not found in the homologs from Chloroflexi, Proteobacteria and Gemmatimonadetes. Ancestral sequence reconstruction provides evidence that the RC-II subunits lacking this indel are more similar to the ancestral RC-II protein. The results of flexible structural alignments of the indel-containing region of the RC-II protein with the homologous region in the RC-I core protein, which shares structural similarity with the RC-II homologs, support the view that the 1 aa indel present in the RC-II homologs from Cyanobacteria is a deletion, which was not present in the ancestral form of the RC-II protein. 
Our analyses of the conserved indels found in the RC-I and RC-II proteins, thus, support the view that the earliest photosynthetic lineages with living descendants likely contained only a single RC (RC-I or RC-II), and the presence of both RC-I and RC-II in a linked state, as found in the modern Cyanobacteria, is a derivation from these earlier phototrophs.

5.
MOTIVATION: The two mutation processes that have the largest impact on genome evolution at small scales are substitutions, and sequence insertions and deletions (indels). While the former have been studied extensively, indels have received less attention, and in particular, the problem of inferring indel rates between pairs of divergent sequences remains unsolved. Here, I describe a novel and accurate method for estimating neutral indel rates between divergent pairs of genomes. RESULTS: Simulations suggest that the new method for estimating indel rates is accurate to within 2%, at divergences corresponding to that of human and mouse. Applying the method to these species, I show that indel rates are up to twice as high as is apparent from alignments, and depend strongly on the local G + C content. These results indicate that at these evolutionary distances, the contribution of indels to sequence divergence is much larger than hitherto appreciated. In particular, the ratio of substitution to indel rates between human and mouse appears to be around gamma = 8, rather than the currently accepted value of about gamma = 14.

6.
7.
Insertions and deletions (indels) are important types of structural variations. Obtaining accurate genotypes of indels may facilitate further genetic study. There are a few existing methods for calling indel genotypes from sequence reads. However, none of these tools can accurately call indel genotypes for indels of all lengths, especially for low coverage sequence data. In this paper, we present GINDEL, an approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next generation sequencing data. We test our approach on both simulated and real data and compare with existing tools, including Genome STRiP, Pindel and Clever-sv. Results show that GINDEL works well for deletions larger than 50 bp on both high and low coverage data. Also, GINDEL performs well for insertion genotyping on both simulated and real data. For comparison, Genome STRiP performs less well for shorter deletions (50–200 bp) on both simulated and real sequence data from the 1000 Genomes Project. Clever-sv performs well for intermediate deletions (200–1500 bp) but is less accurate when coverage is low. Pindel only works well for high coverage data, but does not perform well at low coverage. To summarize, we show that GINDEL not only can call genotypes of insertions and deletions (both short and long) for high and low coverage population sequence data, but also is more accurate and efficient than other approaches. The program GINDEL can be downloaded at: http://sourceforge.net/p/gindel

8.
Single-nucleotide polymorphisms (SNPs) are the most frequent variations in the genome of any organism. SNP discovery approaches such as resequencing or data mining enable the identification of insertion-deletion (indel) polymorphisms. These indels can be treated as biallelic markers and can be utilized for genetic mapping and diagnostics. In this study 655 indels have been identified by resequencing 502 maize (Zea mays) loci across 8 maize inbreds (selected for their high allelic variation). Of these 502 loci, 433 were polymorphic, with indels identified in 215 loci. Of the 655 indels identified, single-nucleotide indels accounted for more than half (54.8%) followed by two- and three-nucleotide indels. A high frequency of 6-base (3.4%) and 8-base (2.3%) indels was also observed. When analysis is restricted to the B73 and Mo17 genotypes, 53% of the loci analyzed contained indels, with 42% having an amplicon size difference. Three novel miniature inverted-repeat transposable element (MITE)-like sequences were identified as insertions near genes. The utility of indels as genetic markers was demonstrated by using indel polymorphisms to map 22 loci in a B73 × Mo17 recombinant inbred population. This paper clearly demonstrates that the resequencing of 3′ EST sequences and the discovery and mapping of indel markers will position corresponding expressed genes on the genetic map.

9.
10.
Indels in DNA sequences frequently affect more than a single nucleotide, creating problems for alignment, character coding and phylogenetic analysis. However, the size and frequency of multiple‐residue indels are not usually tested, and with popular alignment packages their reconstruction is indirectly achieved by reducing the affine (gap extension) cost. We explored the length distribution of indels in intron sequences of the gene Mp20 by modifying the gap opening and gap extension costs. Given a “known” tree for the study group, global homology levels were greatest under low gap cost, with gap extension costs of roughly 0.4‐fold the opening cost. Different approaches to gap coding and weighting suggested that taxonomic congruence was correlated with high frequencies of multiple‐position indels, with a maximum indel length of 2–5 bp and few indels above 15 bp, but also including a proportion of indels > 100 bp. Only a small minority of indels could be reconstructed as single‐position indels. Consequently, tree topologies improved when homologous multinucleotide indels were recoded as binary characters which are otherwise highly homoplastic and weighted characters in single‐position coding. In tree‐generating alignment procedures as implemented in POY, where gap penalty determines the character weight during tree search, the problem of assigning inappropriately high weight to multiple‐residue indels could partly be overcome by setting the extension costs to about 0.4‐fold lower than gap opening costs. We conclude that multiple consecutive gap positions are not independent characters and hence methods for parsimony reconstruction of long indels are required. Finally, we also observed a general lack of correlation between taxonomic and character congruence, demonstrating the difficulties of applying congruence criteria to decide among competing alignments. 
This highlights the value of recent model‐based alignment procedures which can implement the statistical distributions of indel size classes, and do not rely on potentially circular strategies for optimizing overall congruence. © The Willi Hennig Society 2006.
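The effect of a reduced extension cost can be illustrated with a small Python sketch of affine gap scoring. It assumes one common convention, in which a gap of length L costs the opening cost plus (L - 1) extension steps, and it reads the abstract's "0.4‐fold" as an extension cost equal to 0.4 times the opening cost. The function name, the default ratio, and the example opening cost of 10 are hypothetical and are not POY settings.

```python
def affine_gap_cost(length: int, gap_open: float, extend_ratio: float = 0.4) -> float:
    """Cost of a single gap of `length` positions under affine scoring.

    Assumed convention: the first position pays `gap_open`; each further
    position pays `gap_open * extend_ratio`.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_open * extend_ratio


# One 5-position indel versus five independent single-position gaps.
one_long_gap = affine_gap_cost(5, gap_open=10)
five_single_gaps = 5 * affine_gap_cost(1, gap_open=10)
print(one_long_gap, five_single_gaps)
```

Under these assumptions the single 5-position gap costs about 26 while five separate gaps cost 50, which is the sense in which a lower extension cost stops consecutive gap positions from being weighted as independent characters.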

11.
Microsatellites (simple sequence repeats [SSRs]) are highly variable molecular markers that are a rich and readily assayed source of variation for population genetic studies. Cross-amplification between closely related species is possible when there are no (or few) sequence differences in the primer binding sites. The occurrence of nonhomologous fragments of the same size (size homoplasy) is a constraint of microsatellites. Size homoplasy can be caused by insertions/deletions (indels) in SSR flanking regions. We found that size variation in locus ssrQZAG9 is due to different repeat numbers of the SSR motifs but also to indels in SSR flanking regions. Indels were found within species belonging to sections Robur and Cerris of genus Quercus and also between species of the 2 sections. In section Robur (Quercus robur L., Quercus petraea [Matt.] Liebl., Quercus pubescens Willd.), we detected rare alleles with an indel of 57 bp or 62 bp followed by a smaller indel of 12 bp in the SSR flanking regions. These alleles show a size range overlapping with that of alleles amplified in Quercus cerris L. (section Cerris). Multiple alignments with sequences of section Robur revealed the same SSR repeat motif but multiple indels in SSR flanking regions in Q. cerris. We discuss the effects of size homoplasy of SSR loci for the study of interspecific gene flow and on estimates of population differentiation.

12.
13.
The plant mitochondrial rps3 intron was analyzed for substitution and indel rate variation among 15 monocot and dicot angiosperms from 10 genera, including perennial and annual taxa. Overall, the intron sequence was very conserved among angiosperms. Based on length polymorphism, 10 different alleles were identified among the 10 genera. These allelic differences were mainly attributable to large indels. An insertion of 133 nucleotides, observed in the Alnus intron was partially or completely absent in the other lineages of the family Betulaceae. This insertion was located within domain IV of the secondary-structure model of this group IIA intron. A mobile element of 47 nucleotides that showed homology to sequences located in rice rps3 intron and in intergenic plant mitochondrial genomes was found within this insertion. Both substitution and indel rates were low among the Betulaceae sequences, but substitution rates were increasingly larger than indel rates in comparisons involving more distantly related taxa. From a secondary-structure model, regions involved in helical structures were shown to be well preserved from indels as compared to substitutions, but compensatory changes were not observed among the angiosperm sequences analyzed. Using approximate divergence times based on the fossil record, substitution and indel rate heterogeneity was observed between different pairs of annual and perennial taxa. In particular, the annual petunia and primrose evolved more than 15 and 10 times faster, for substitution and indel rates respectively, than the perennial birch and alder. This is the first demonstration of an evolutionary rate difference between perennial and annual forms in noncoding DNA, lending support to neutral causes such as the generation time, population size, and speciation rate effects to explain such rate heterogeneity. 
Surprisingly, the sequence from the rps3 intron had a high identity with the sequence of intron 1 from the angiosperm mitochondrial nad5 gene, suggesting a common origin of these two group IIA introns.

14.
A search was performed for single-nucleotide polymorphisms (SNP) and short insertions-deletions (indels) in 34 melon (Cucumis melo L.) expressed sequence tag (EST) fragments between two distantly related melon genotypes, a group Inodorus 'Piel de sapo' market class breeding line T111 and the Korean accession PI 161375. In total, we studied 15 kb of melon sequence. The average frequency of SNPs between the two genotypes was one every 441 bp. One indel was also found every 1666 bp. Seventy-five percent of the polymorphisms were located in introns and the 3' untranslated regions. On average, there were 1.26 SNPs plus indels per amplicon. We explored three different SNP detection systems to position five of the SNPs in a melon genetic map. Three of the SNPs were mapped using cleaved amplified polymorphic sequence (CAPS) markers, one SNP was mapped using the single primer extension reaction with fluorescent-labelled dideoxynucleotides, and one indel was mapped using polyacrylamide gel electrophoresis separation. The discovery of SNPs based on ESTs and a suitable system for SNP detection has broad potential utility in melon genome mapping.
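The reported frequencies can be cross-checked with simple arithmetic. The Python sketch below assumes exactly 15,000 bp of sequence and one amplicon per EST fragment (34 in total); neither assumption is stated in the abstract, and the function name `polymorphism_summary` is hypothetical.

```python
def polymorphism_summary(total_bp: int, bp_per_snp: float,
                         bp_per_indel: float, amplicons: int):
    """Estimate SNP and indel counts from per-bp frequencies.

    Returns (snps, indels, polymorphisms_per_amplicon), with counts
    rounded to the nearest whole event.
    """
    snps = round(total_bp / bp_per_snp)
    indels = round(total_bp / bp_per_indel)
    return snps, indels, (snps + indels) / amplicons


# Assumed inputs: 15 kb taken as 15,000 bp, 34 amplicons (one per EST).
snps, indels, per_amplicon = polymorphism_summary(15_000, 441, 1666, 34)
print(snps, indels, round(per_amplicon, 2))  # prints: 34 9 1.26
```

Under these assumptions the estimate is about 34 SNPs and 9 indels, which works out to roughly 1.26 polymorphisms per amplicon, consistent with the figure quoted in the abstract.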

15.
16.
The presence of heterozygous indels in a DNA sequence usually results in the sequence being discarded. If the sequence trace is of high enough quality, however, it will contain enough information to reconstruct the two constituent sequences with very little ambiguity. Solutions already exist using comparisons with a known reference sequence, but this is often unavailable for nonmodel organisms or novel DNA regions. I present a program which determines the sizes and positions of heterozygous indels in a DNA sequence and reconstructs the two constituent haploid sequences. No external data such as a reference sequence or other prior knowledge are required. Simulation suggests an accuracy of >99% from a single read, with errors being eliminable by the inclusion of a second sequencing read, such as one using a reverse primer. Diploid sequences can be fully reconstructed across any number of heterozygous indels, with two overlapping sequencing reads almost always sufficient to infer the entire DNA sequence. This eliminates the need for costly and laborious cloning, and allows data to be used which would otherwise be discarded. With no more laboratory work than is needed to produce two normal sequencing reads, two aligned haploid sequences can be produced quickly and accurately and with extensive phasing information.

17.
Sawyer SL  Howell WM  Brookes AJ 《BioTechniques》2003,35(2):292-6, 298
Genome variation provides researchers with thousands of markers with which to study human demographic history and phenotypes. Insertion-deletion (indel) polymorphism is an important and abundant form of human genome variation, and convenient methods for genotyping indels are therefore needed. Here we evaluate dynamic allele-specific hybridization (DASH) for its ability to score indels. Evaluation of six model indel DASH assays based on synthetic oligonucleotides showed that length differences of 1-5 bp were accurately scored. Only single probes were required to assay indels of 3-4 bp or less, while longer indels tended to require the use of both allele probes serially. The best results were obtained by central placing of the probe over the indel. Model study findings were confirmed by running indel DASH assays upon PCR-amplified targets representing four polymorphisms from Alzheimer's disease candidate genes APBB1 and LRP1. These indels were genotyped in a set of 121 patients and 156 controls. While no disease association was found, the data quality confirmed that DASH is a robust and useful procedure for genotyping indels of the size range typically found in the human genome.

18.
Nuclear DNA intron sequences are increasingly used to investigate evolutionary relationships among closely related organisms. The phylogenetic usefulness of intron sequences at higher taxonomic levels has, however, not been firmly established and very few studies have used these markers to address evolutionary questions above the family level. In addition, the mechanisms driving intron evolution are not well understood. We compared DNA sequence data derived from three presumably independently segregating introns (THY, PRKC I and MGF) across 158 mammalian species. All currently recognized extant eutherian mammalian orders were included with the exception of Cingulata, Dermoptera and Scandentia. The total aligned length of the data was 6366 base pairs (bp); after the exclusion of autapomorphic insertions, 1511 bp were analyzed. In many instances the Bayesian and parsimony analyses were complementary and gave significant posterior probability and bootstrap support (>80) for the monophyly of Afrotheria, Euarchontoglires, Laurasiatheria and Boreoeutheria. Apart from finding congruent support when using these methods, the intron data also provided several indels longer than 3 bp that support, among others, the monophyly of Afrotheria, Paenungulata, Ferae and Boreoeutheria. A quantitative analysis of insertions and deletions suggested that there was a 75% bias towards deletions. The average insertion size in the mammalian data set was 16.49 bp +/- 57.70 while the average deletion was much smaller (4.47 bp +/- 14.17). The tendency towards large insertions and small deletions is highlighted by the observation that out of a total of 17 indels larger than 100 bp, 15 were insertions. The majority of indels (>60% of all events) were 1 or 2 bp changes. 
Although the average overall indel substitution rate of 0.00559 per site is comparable to that previously reported for rodents and primates, individual analyses among different evolutionary lineages provide evidence for differences in the formation rate of indels among the different mammalian groups.

19.
Nucleotide substitution, insertion and deletion (indel) events are the major driving forces that have shaped genomes. Using the recently identified human ribosomal protein (RP) pseudogene sequences, we have thoroughly studied DNA mutation patterns in the human genome. We analyzed a total of 1726 processed RP pseudogene sequences, comprising more than 700 000 bases. To be sure to differentiate the sequence changes occurring in the functional genes during evolution from those occurring in pseudogenes after they were fixed in the genome, we used only pseudogene sequences originating from parts of RP genes that are identical in human and mouse. Overall, we found that nucleotide transitions are more common than transversions, by roughly a factor of two. Moreover, the substitution rates amongst the 12 possible nucleotide pairs are not homogeneous as they are affected by the type of immediately neighboring nucleotides and the overall local G+C content. Finally, our dataset is large enough that it has many indels, thus allowing for the first time statistically robust analysis of these events. Overall, we found that deletions are about three times more common than insertions (3740 versus 1291). The frequencies of both these events follow characteristic power–law behavior associated with the size of the indel. However, unexpectedly, the frequency of 3 bp deletions (in contrast to 3 bp insertions) violates this trend, being considerably higher than that of 2 bp deletions. The possible biological implications of such a 3 bp bias are discussed.
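For readers less familiar with the terminology, the Python sketch below classifies a single-base substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse), and recomputes the deletion-to-insertion ratio from the two counts quoted in the abstract. The function name and structure are illustrative assumptions and do not reproduce the authors' pipeline.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution.

    A transition keeps the base class (A<->G or C<->T);
    a transversion switches between purine and pyrimidine.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Counts quoted in the abstract: 3740 deletions versus 1291 insertions.
deletion_to_insertion = 3740 / 1291
print(substitution_type("A", "G"), substitution_type("A", "C"))
print(round(deletion_to_insertion, 1))
```

Of the 12 possible ordered substitutions, only 4 are transitions, so a roughly twofold excess of transitions over transversions is a stronger bias than the raw ratio suggests. The ratio of the quoted indel counts is about 2.9, in line with the "about three times" stated in the abstract.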

20.