Similar articles
20 similar articles found; search took 15 ms
1.
Heteroplasmic tandem repeats in the mitochondrial control region have been documented in a wide variety of vertebrate species. We have examined the control region from 11 species in the family Crocodylidae and identified two different types of heteroplasmic repetitive sequences in the conserved sequence block (CSB) domain: an extensive poly-A tract that appears to be involved in the formation of secondary structure, and a series of tandem repeats located downstream, ranging from approximately 50 to approximately 80 bp in length. We describe this portion of the crocodylian control region in detail and focus on members of the family Crocodylidae. We then address the origins of the tandemly repeated sequences in this family and suggest hypotheses to explain possible mechanisms of expansion/contraction of the sequences. We have also examined control region sequences from Alligator and Caiman and offer hypotheses for the origin of tandem repeats found in those taxa. Finally, we present a brief analysis of intraindividual and interindividual haplotype variation by examining representatives of Morelet's crocodile (Crocodylus moreletii).

2.
3.
4.
Epstein-Barr virus DNA is known to have partially homologous segments, designated DL and DR, near the left and right ends of the long unique region (Raab-Traub et al., Cell 22:257-267, 1980). DL and DR are each partially composed of tandem direct repeat sequences. DL contains 11 to 14 repeats of a 124-base-pair sequence designated IR2. DR contains approximately 30 direct repeats of a 103-base-pair sequence designated IR4. The DL and DR sequences have colinear partial homology for approximately 2.4 and 1.5 kilobase pairs to the right of IR2 and IR4, respectively. IR2 and IR4 are similar sequences and evolved in part from a common ancestor. Both sequences are 84% guanine and cytosine and have limited homology to Epstein-Barr virus IR1 and to the herpes simplex virus type 1 inverted terminal repeat "a" sequence. IR2 encodes part of an abundant 2.5-kilobase persistent early EBV RNA expressed in productively infected cells, but does not encode part of the 3-kilobase Epstein-Barr virus RNA which is transcribed from the adjacent IR1-U2 region of the Epstein-Barr virus genome in latently infected cells.

5.
Genomic DNA contains a wide variety of repetitive sequences. In Escherichia coli, several classes of repetitive sequences have been reported, some of which cluster as tandem repeats. We propose a novel method for analyzing symbolic sequences by two-dimensional pattern formation with color-coding. We applied this method to search for tandem repeats in the E. coli genome and found approximately 50 repeats with periods longer than 30 bases. The longest repeat has a period of 1267 bases.
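The abstract does not spell out how the two-dimensional pattern is formed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure: the sequence is wrapped into rows of a fixed width, and when that width equals the repeat period the same bases stack up in columns. The function names `wrap`, `column_agreement`, and `best_period` are hypothetical, introduced here for illustration.

```python
def wrap(seq, width):
    """Split a sequence into consecutive rows of the given width (last row may be shorter)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def column_agreement(seq, width):
    """Fraction of positions whose base equals the base directly below it
    when the sequence is wrapped at this width (i.e. seq[i] == seq[i + width])."""
    pairs = len(seq) - width
    if pairs <= 0:
        return 0.0
    same = sum(seq[i] == seq[i + width] for i in range(pairs))
    return same / pairs


def best_period(seq, max_period):
    """Smallest width in 1..max_period with the highest column agreement."""
    return max(range(1, max_period + 1), key=lambda w: column_agreement(seq, w))


if __name__ == "__main__":
    toy = "ACGGT" * 6  # assumed toy input: a perfect repeat with period 5
    for row in wrap(toy, 5):
        print(row)  # identical rows, so every column holds a single base
    print(best_period(toy, 12))
```

In a color-coded rendering each base would get its own color, so a matching width shows up as vertical stripes; the agreement score is simply a numeric stand-in for that visual cue, and it tolerates a few mismatches because it is a fraction rather than an exact test.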

6.
The Tibetan macaque, which is endemic to China, is currently listed as a Near Endangered primate species by the International Union for Conservation of Nature (IUCN) (2017). Short tandem repeats (STRs) are repetitive elements of genome sequence whose repeat units range in length from 1 to 6 bp. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs with next-generation sequencing of five macaque samples. A total of 1,077,790 perfect STRs were mined from our assembly, with an N50 of 4,966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. Analysis of GC content and repeats showed consistent results with other macaques. Furthermore, using STR analysis software (lobSTR), we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P < 0.05, t-test). We also found a greater number of homozygous STRs than heterozygous STRs (P < 0.05, t-test), with the Emei and Jianyang Tibetan macaques showing more heterozygous loci than Huangshan Tibetan macaques. The proportion of insertions and mean variation of alleles in the Emei and Jianyang individuals were slightly higher than those in the Huangshan individuals, thus revealing differences in STR allele size between the two populations. The polymorphic STR loci identified based on the reference genome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations.
We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.
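To make the notion of a "perfect STR" with a 1 to 6 bp unit concrete, here is a minimal regular-expression sketch. It is not the pipeline used in the study (which relied on its own assembly and on lobSTR); the function names and the `min_copies` threshold are assumptions chosen for illustration.

```python
import re
from collections import Counter


def find_perfect_strs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem run in seq.

    The lazy quantifier makes the engine try the shortest unit first, so
    ACACACAC is reported with unit AC rather than ACAC.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


def count_by_unit_length(hits):
    """Tally runs by unit length (1 = mono-, 2 = di-, ..., 6 = hexa-nucleotide)."""
    return Counter(len(unit) for _, unit, _ in hits)


if __name__ == "__main__":
    toy = "TTACACACGGAAAAAT"  # assumed toy sequence, not real macaque data
    hits = find_perfect_strs(toy)
    print(hits)
    print(count_by_unit_length(hits))
```

A tally like `count_by_unit_length` is the kind of summary behind a statement such as "mono-nucleotide repeats were the most abundant"; real surveys usually apply different minimum copy numbers per unit length, which this sketch does not attempt.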

7.
Finding approximate tandem repeats in genomic sequences. (Cited 1 time: 0 self-citations, 1 by others)
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.

8.
The genomes of many species are dominated by short sequences repeated consecutively. It is estimated that over 10% of the human genome consists of tandemly repeated sequences. Finding repeated regions in long sequences is important in sequence analysis. We develop a software tool, LocRepeat, that finds regions of pseudo-periodic repeats in a long sequence. We use the definition of Li et al. [1] for the pseudo-periodic partition of a region and extend the algorithm so that it can select the repeated region from a given long sequence and give the pseudo-periodic partition of that region.

9.
We describe a remarkably conserved nucleotide sequence, the many copies of which may occupy up to 1% of the genomes of E. coli and S. typhimurium. This sequence, the REP (repetitive extragenic palindromic) sequence, is about 35 nucleotides long, includes an inverted repeat, and can occur singly or in multiple adjacent copies. A possible role for the REP sequences in regulation of gene expression has been thoroughly investigated. While the REP sequences do not appear to modulate differential gene expression within an operon, they can affect the expression of both upstream and downstream genes to a small extent, probably by affecting the rate of mRNA degradation. Possible roles for the REP sequence in mRNA degradation, chromosome structure, and recombination are discussed.

10.
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails, that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of a dimer.
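The length distributions discussed in this abstract can be tabulated along the following lines. This is a hedged sketch rather than the authors' code: the function name `dimer_run_lengths` and the `min_copies` cutoff are assumptions, and runs of each dimer are counted independently of the other dimers.

```python
import re
from collections import Counter, defaultdict
from itertools import product


def dimer_run_lengths(seq, min_copies=2):
    """For each of the 16 dimers, count how many tandem runs have each copy number.

    Returns {dimer: Counter({copies: number_of_runs})}; dimers with no run of
    at least min_copies copies are absent from the result.
    """
    seq = seq.upper()
    dist = defaultdict(Counter)
    for first, second in product("ACGT", repeat=2):
        dimer = first + second
        # Non-overlapping maximal runs of this dimer, e.g. (?:AC){2,}
        for match in re.finditer("(?:%s){%d,}" % (dimer, min_copies), seq):
            dist[dimer][len(match.group(0)) // 2] += 1
    return dist


if __name__ == "__main__":
    toy = "ACACACTTTT"  # assumed toy sequence for illustration only
    for dimer, counts in sorted(dimer_run_lengths(toy).items()):
        print(dimer, dict(counts))
```

Once such counts are available for a real genome, the two shapes named in the abstract can be told apart graphically: an exponential distribution is a straight line when log(count) is plotted against copy number, whereas a power law is a straight line when log(count) is plotted against log(copy number).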

11.
Candida glabrata is an opportunistic pathogen in humans, responsible for approximately 20% of disseminated candidiasis. Candida glabrata's ability to adhere to host tissue is mediated by GPI-anchored cell wall proteins (GPI-CWPs); the corresponding genes contain long tandem repeat regions. These repeat regions resulted in assembly errors in the reference genome. Here, we performed a de novo assembly of the C. glabrata type strain CBS138 using long single-molecule real-time reads, with short read sequences (Illumina) for refinement, and constructed telomere-to-telomere assemblies of all 13 chromosomes. Our assembly has excellent agreement overall with the current reference genome, but we made substantial corrections within tandem repeat regions. Specifically, we removed 62 genes of which 45 were scrambled due to misassembly in the reference. We annotated 31 novel ORFs of which 24 ORFs are GPI-CWPs. In addition, we corrected the tandem repeat structure of an additional 21 genes. Our corrections to the genome were substantial, with the length of new genes and tandem repeat corrections amounting to approximately 3.8% of the ORFeome length. As most corrections were within the coding regions of GPI-CWP genes, our genome assembly establishes a high-quality reference set of genes and repeat structures for the functional analysis of these cell surface proteins.

12.
A unique group of large icosahedral viruses that infect a unicellular green alga (Chlorella sp. NC64A) were isolated from freshwater sources in Japan. These viruses contain a linear double-stranded DNA (dsDNA) genome with hairpin ends. A physical map was constructed for the genomic DNA of CVK1 (Chlorella virus isolated in Kyoto, no. 1) by pulsed-field gel electrophoresis of restriction fragments. The nucleotide sequences around both termini of the CVK1 DNA revealed the presence of inverted terminal repeats (ITR) of approximately 1.0 kb. Adjacent to the ITR, unique sequence elements of 10 to 20 bp were directly repeated 20 to 30 times in tandem array. Several copies of these repeat elements were deleted in virus mutants that were occasionally generated from Chlorella cells that were in a putative CVK1 carrier state. These repeats might represent a hot spot of rearrangement in the CVK1 genome.

13.
DNA from ground squirrels of the genus Citellus (Rodentia, Sciuridae) was analysed by centrifugation in the presence of CsCl followed by digestion with restriction endonucleases. Digestion of DNA of two species, C. undulatus and C. fulvus, by 10 of the 16 restriction endonucleases used produced electrophoretically discrete fragments whose lengths are multiples of 330 bp, which indicates a tandem organization of repetitive sequences similar to the satellite DNA of many mammalian species. However, centrifugation failed to reveal a satellite band in these species; hence the tandem repeats of the ground squirrels belong to the class of cryptic satellites and do not differ in base composition from the rest of the DNA. The largest fraction of the genome was recovered as discrete fragments after cleavage with HindIII and AluI. Both of these restriction endonucleases were used for comparative analysis of DNA of 12 Citellus species. DNA of all species is digested by HindIII and yields a series of fragments whose lengths are multiples of 330-30 bp and whose total content varies from species to species within 4-22%. The fraction of tandem repeats correlates neither with the systematic position of the species nor with the amount of heterochromatin in the chromosomes. AluI cuts the DNA of 11 species to yield 110 and 220 bp fragments, compared with only 60 and 280 bp fragments in the DNA of C. dauricus. HindIII digestion also reveals tandem repeats in marmots (genus Marmota), which are phylogenetically close to Citellus, but these repeats have a different periodicity of 180 bp. We propose that the ground squirrel repeats are 2-3 million years old and significantly younger than the marmot repeats.

14.
The placozoan Trichoplax adhaerens has a compact genome with many primitive eumetazoan characteristics. In order to gain a better understanding of its genome architecture, we conducted a detailed analysis of repeat content in this genome. The transposable element (TE) content is lower than that of other metazoans, and the few TEs present in the genome appear to be inactive. A new phylogenetic clade of the gypsy-like LTR retrotransposons was identified, which includes the majority of gypsy-like elements in Trichoplax. A particular microsatellite motif (ACAGT) exhibits unexpectedly high abundance, and also has strong association with its nearby genes.

15.
Short tandem repeats, specifically microsatellites, are widely used genetic markers, are associated with human genetic diseases, and play an important role in various regulatory mechanisms and evolution. Despite their importance, much is yet unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without an informatics background. This article introduces the major concepts behind repeat-detecting software that are essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency, using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite-detecting programs.

16.
Polymers of arbitrary oligonucleotides can be used to detect polymorphic loci in a wide range of vertebrate genomes. Using 60 such probes, we previously reported the selection of the most efficient STR probes for polymorphism detection in the set of genomes investigated. We now report the use of this selection for the mouse genome and its contribution to genetic mapping. Twenty-three synthetic tandem repeat (STR) sequences were probed on a recombinant inbred panel C57BL/6 x DBA/2. The loci detected are distributed in 70 linkage groups; 42 of these groups, corresponding to about 100 different polymorphic loci, include reference markers. These linkage groups appear to be evenly distributed across all 20 mouse chromosomes, with no apparent bias in distribution towards telomeres or centromeres.

17.

Background

Polymorphic tandem repeat typing is a new generic technology which has been proved to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila, Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison.

Results

In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors.

Conclusions

We are presenting an internet-based resource to help develop and perform tandem repeats based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.
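The strain comparison described under Results can be pictured with a small sketch. It assumes, purely for illustration, that tandem repeats have already been located in each strain and matched by a shared locus name; the function name, the input layout, and the `min_unit` filter are hypothetical and say nothing about how the actual web resource is implemented.

```python
def polymorphic_repeats(strain_a, strain_b, min_unit=1):
    """Compare two {locus: (unit_length_bp, copy_number)} tables.

    Returns (locus, unit_length, copies_a, copies_b, size_difference_bp) for
    loci present in both strains whose copy numbers differ. The size
    difference is the predicted change in amplification product length.
    """
    results = []
    for locus in sorted(strain_a.keys() & strain_b.keys()):
        unit_a, copies_a = strain_a[locus]
        unit_b, copies_b = strain_b[locus]
        if unit_a != unit_b or unit_a < min_unit:
            continue  # only compare like with like, and honor the unit-length filter
        size_difference = (copies_b - copies_a) * unit_a
        if size_difference != 0:
            results.append((locus, unit_a, copies_a, copies_b, size_difference))
    return results


if __name__ == "__main__":
    # Invented example tables, not data from any real genome.
    strain_1 = {"locus1": (9, 4), "locus2": (12, 3), "locus3": (6, 10)}
    strain_2 = {"locus1": (9, 7), "locus2": (12, 3), "locus4": (6, 2)}
    print(polymorphic_repeats(strain_1, strain_2))
```

Filtering on unit length and predicted size difference mirrors the query criteria mentioned in the abstract; loci that appear in only one strain are skipped here because no size comparison is possible for them.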

18.
Large variation in genome size, as determined by the nuclear DNA content and the mitotic chromosome size, among diploid rice species is revealed using flow cytometry and image analyses. Both the total chromosomal length (r = 0.939) and the total chromosomal area (r = 0.927) correlated well with the nuclear DNA content. Among all the species examined, Oryza australiensis (E genome) and O. brachyantha (F genome) were, respectively, the largest and smallest in genome size. O. sativa (A genome), which includes all the cultivated species, showed an intermediate genome size between them. The distribution patterns of genome-specific repetitive DNA sequences were physically determined using fluorescence in situ hybridization (FISH). O. brachyantha had limited sites of the repetitive DNA sequences specific to the F genome. O. australiensis showed overall amplification of genome-specific DNA sequences throughout the chromosomes. The amplification of the repetitive DNA sequences causes the variation in chromosome morphology and thus in genome size among diploid species in the genus Oryza.

19.
A unique group of large icosahedral viruses that infect a unicellular green alga (Chlorella sp. NC64A) were isolated from freshwater sources in Japan. These viruses contain a linear double-stranded DNA (dsDNA) genome with hairpin ends. A physical map was constructed for the genomic DNA of CVK1 (Chlorella virus isolated in Kyoto, no. 1) by pulsed-field gel electrophoresis of restriction fragments. The nucleotide sequences around both termini of the CVK1 DNA revealed the presence of inverted terminal repeats (ITR) of approximately 1.0 kb. Adjacent to the ITR, unique sequence elements of 10 to 20 bp were directly repeated 20 to 30 times in tandem array. Several copies of these repeat elements were deleted in virus mutants that were occasionally generated from Chlorella cells that were in a putative CVK1 carrier state. These repeats might represent a hot spot of rearrangement in the CVK1 genome.

20.
MOTIVATION: Tandem repeats are associated with disease genes, play an important role in evolution and are important in genomic organization and function. Although much research has been done on short perfect patterns of repeats, there has been less focus on imperfect repeats. Thus, there is an acute need for a tandem repeats database that provides reliable and up to date information on both perfect and imperfect tandem repeats in the human genome and relates these to disease genes. RESULTS: This paper presents a web-accessible relational tandem repeats database that relates tandem repeats to gene locations and disease genes of the human genome. In contrast to other available databases, this database identifies both perfect and imperfect repeats of 1-2000 bp unit lengths. The utility of this database has been illustrated by analysing these repeats for their distribution and frequencies across chromosomes and genomic locations and between protein-coding and non-coding regions. The applicability of this database to identify diseases associated with previously uncharacterized tandem repeats is demonstrated.
