Similar literature
20 similar documents found (search time: 15 ms)
1.

Background

Polymorphic tandem repeat typing is a new generic technology which has been proved to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila, Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison.

Results

In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors.
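The query criteria mentioned above (repeat unit length, predicted size difference) can be illustrated with a minimal Python sketch. It is not the database's own code: the `Locus` record, its field names, the `select_loci` helper and the default thresholds are all hypothetical, and the size difference is assumed to be simply the copy-number difference multiplied by the unit length.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    """Hypothetical record for one tandem repeat compared between two strains."""
    name: str
    unit_length: int   # repeat unit length in bp
    copies_a: int      # copy number in strain A
    copies_b: int      # copy number in strain B

    def size_difference(self) -> int:
        # Assumed model: the length difference between the two alleles is the
        # copy-number difference times the unit length.
        return abs(self.copies_a - self.copies_b) * self.unit_length

def select_loci(loci, min_unit=9, min_diff=9):
    """Keep loci whose unit length and predicted size difference reach the thresholds."""
    return [l for l in loci
            if l.unit_length >= min_unit and l.size_difference() >= min_diff]

loci = [
    Locus("TR1", 12, 5, 7),    # 2 copies x 12 bp = 24 bp difference
    Locus("TR2", 3, 10, 11),   # unit too short for the chosen threshold
    Locus("TR3", 18, 4, 4),    # identical in both strains
]
print([l.name for l in select_loci(loci)])
```

With the illustrative thresholds, only loci that are both long enough per unit and different enough in size between the two strains are kept as typing candidates.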

Conclusions

We are presenting an internet-based resource to help develop and perform tandem repeats based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.

2.
All complete or nearly complete mitochondrial genomes of Metazoa (2819) have been subjected to bioinformatic analysis to investigate the distribution and features of repeated and palindromic sequences. Repeats are ubiquitous, with 29.9% of genomes containing at least one and 1.95% of total genome length being repeated. Repeat boundaries were tested for the presence of secondary structure motifs, consensus sequences or small repeats, features generally reported as associated with duplications. No significant relationship was detected, suggesting that such features are not ubiquitous. A mechanism related to gene conversion is proposed to explain the origin of small interspersed repeats.

3.

Background  

Genome-wide and cross-species comparison of amino acid repeats is an intriguing problem in biology, mainly because of the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as is evident from the analysis of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB is a relational database of perfect and mismatch repeats, designed as a resource and a collection of tools for the detection and cross-species comparison of different types of amino acid repeats.

4.
Taylor JS, Breden F. Genetics, 2000, 155(3): 1313-1320
The standard slipped-strand mispairing (SSM) model for the formation of variable number tandem repeats (VNTRs) proposes that a few tandem repeats, produced by chance mutations, provide the "raw material" for VNTR expansion. However, this model is unlikely to explain the formation of VNTRs with long motifs (e.g., minisatellites), because the likelihood of a tandem repeat forming by chance decreases rapidly as the length of the repeat motif increases. Phylogenetic reconstruction of the birth of a mitochondrial (mt) DNA minisatellite in guppies suggests that VNTRs with long motifs can form as a consequence of SSM at noncontiguous repeats. VNTRs formed in this manner have motifs longer than the noncontiguous repeat originally formed by chance and are flanked by one unit of the original, noncontiguous repeat. SSM at noncontiguous repeats can therefore explain the birth of VNTRs with long motifs and the "imperfect" or "short direct" repeats frequently observed adjacent to both mtDNA and nuclear VNTRs.

5.
The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor-based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
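The general idea of nearest-neighbor haplogroup prediction can be sketched in Python as follows. This is a generic k-nearest-neighbor illustration, not the project's actual tool: the function name, the distance measure (number of variants not shared between two HVS-I motifs), the value of `k`, and the toy variant labels and haplogroup names are all assumptions made for the example.

```python
from collections import Counter

def predict_haplogroup(query, reference, k=3):
    """Predict a haplogroup by majority vote among the k closest reference motifs.

    query     -- set of HVS-I variant labels for the sample
    reference -- list of (variant_set, haplogroup) pairs with known assignments
    """
    # Assumed distance: size of the symmetric difference between variant sets.
    ranked = sorted(reference, key=lambda item: len(query ^ item[0]))
    votes = Counter(haplogroup for _, haplogroup in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy reference panel; labels are illustrative only, not real haplogroup definitions.
reference = [
    ({"16129A", "16223T"}, "X1"),
    ({"16129A", "16223T", "16311C"}, "X1"),
    ({"16189C"}, "Y2"),
    ({"16189C", "16311C"}, "Y2"),
]
print(predict_haplogroup({"16129A", "16223T", "16362C"}, reference))
```

Unlike a rule-based scheme, which requires an exact diagnostic motif, a vote among neighbors can still return an assignment when the sample carries an extra private variant, as the query above does.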

6.
The rapid proliferation of genomic DNA sequences has created a significant need for software that can both focus on relatively small areas (such as within genes or promoters) and provide wide-zoom views of patterns across entire genomes. We present our DNA Motif Lexicon, which enables users to perform genome-wide searches for motifs of interest and create customizable results pages, where results differ in the degree and extent of annotation. Searching for a particular motif is akin to a word search in a natural language; our motif lexicon speaks to this new time when we will increasingly rely upon DNA dictionaries that offer rich types of annotation. Indeed, the concept of "lexomics", introduced in this paper, may be appropriate to the types of meta-analyses needed for deciphering regulatory information. Currently supporting five genomes, our web-based lexicon allows users to look up motifs of interest and build user-defined result pages that include the following: (1) all base pair locations where a motif is found, with links to further search the "neighborhoods" near each of these locations, and whether each location of the motif is genic (within a gene), intergenic, or a bridging sequence (overlapping a gene boundary); (2) NCBI hot-links to the nearest upstream and downstream genes for each location; (3) statistical information about the query; (4) whether the motif is a certain type of repeat; (5) links for the reverse, complement and reverse-complement of the motif of interest; and (6) hot-links to PubMed abstracts which mention the motif of interest. A software framework facilitates the continual development of new annotation modules. The tool is located at: http://genomics.wheatoncollege.edu/cgi-bin/lexicon.exe.
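The kind of lookup described above (motif locations, the genic / intergenic / bridging distinction, and the reverse-complement of a motif) can be sketched in Python. This is only an illustration under stated assumptions, not the lexicon's implementation: the function names are hypothetical, positions are 0-based, and genes are assumed to be given as half-open `(start, end)` intervals.

```python
def reverse_complement(motif):
    """Reverse-complement of a DNA motif over the alphabet A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(motif.upper()))

def find_motif(genome, motif):
    """0-based start positions of every occurrence, overlapping ones included."""
    genome, motif = genome.upper(), motif.upper()
    positions = []
    i = genome.find(motif)
    while i != -1:
        positions.append(i)
        i = genome.find(motif, i + 1)
    return positions

def classify_location(start, length, genes):
    """Label a hit as genic, bridging or intergenic.

    genes -- list of (gene_start, gene_end) half-open intervals (assumed format)
    """
    end = start + length
    if any(gs <= start and end <= ge for gs, ge in genes):
        return "genic"       # hit lies entirely within a gene
    if any(start < ge and gs < end for gs, ge in genes):
        return "bridging"    # hit overlaps a gene boundary
    return "intergenic"

genome = "ATGAAACCCGGGTTTAAA"
genes = [(0, 9)]
for pos in find_motif(genome, "AAA"):
    print(pos, classify_location(pos, 3, genes))
```

Searching for `reverse_complement(motif)` with the same `find_motif` call covers occurrences on the opposite strand.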

7.

Background

CRISPR has become a hot topic as a powerful technique for genome editing in humans and other higher organisms. The original CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats coupled with CRISPR-associated proteins) is an important adaptive defence system for prokaryotes that provides resistance against invading elements such as viruses and plasmids. A CRISPR cassette contains short nucleotide sequences called spacers. These unique regions retain a history of the interactions between prokaryotes and their invaders in individual strains and ecosystems. One important ecosystem in the human body is the human gut, a rich habitat populated by a great diversity of microorganisms. Gut microbiomes are important for human physiology and health. Metagenome sequencing has been widely applied for studying the gut microbiomes. Most efforts in metagenome studies have focused on profiling taxa compositions and gene catalogues and identifying their associations with human health. Less attention has been paid to the analysis of the ecosystems of the microbiomes themselves, especially their CRISPR composition.

Results

We conducted a preliminary analysis of CRISPR sequences in a human gut metagenomic data set from Chinese individuals, comprising type-2 diabetes patients and healthy controls. Applying an available CRISPR-identification algorithm, PILER-CR, we identified 3169 CRISPR cassettes in the data, from which we constructed a set of 1302 unique repeat sequences and 36,709 spacers. A more extensive analysis was made for the CRISPR repeats: these repeats were submitted to a more comprehensive clustering and classification using the web server tool CRISPRmap. All repeats were compared with known CRISPRs in the database CRISPRdb. A total of 784 repeats had matches in the database, and the remaining 518 repeats from our set are potentially novel.

Conclusions

The computational analysis of CRISPR composition based on contigs of metagenome sequencing data is feasible. It provides an efficient approach for finding potential novel CRISPR arrays and for analysing the ecosystem and history of human microbiomes.

8.
Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

9.
The terminal structure of the linear mitochondrial DNA (mtDNA) from the yeast Candida parapsilosis was investigated. This mtDNA, 30 kb long, has symmetrical ends forming inverted terminal repeats. These repeats are made up of a variable number of tandemly repeating units of 738 bp each; the terminal nucleotide corresponds to a precise position within the last repeat unit sequence. The ends had an open structure accessible to enzymes, with a 5′ single-stranded extension of about 110 nucleotides. No circular forms were detected in the DNA preparations. Two other unrelated species, Pichia philodendra and Candida salmanticensis, also appear to have a linear mtDNA of similar organization. These linear DNAs (which we name Type 2 linear mtDNAs) are distinct from the previously described linear mtDNAs of yeasts whose termini are formed by a closed hairpin loop (Type 1 linear mtDNA). The terminal structure of C. parapsilosis mtDNA is reminiscent of the linear mitochondrial genomes of the ciliate Tetrahymena although, in the latter, the telomeric tandem repeat unit is considerably shorter.

10.
Microsatellites, also known as Simple Sequence Repeats, are short tandem repeats of 1–6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200-nucleotide-long flanking regions. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition, 82 polymorphic, 13,866 unique and 2390 common microsatellites were detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
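The definition above (perfect tandem repeats with a unit of 1–6 nucleotides) lends itself to a short regular-expression sketch in Python. It is an illustration only and not the pipeline behind CyanoSat: the function name and the `min_copies` threshold are assumptions, and imperfect or compound microsatellites are not handled.

```python
import re

def find_perfect_microsatellites(seq, min_copies=3, max_unit=6):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Units of 1..max_unit bases are considered; a hit needs at least min_copies
    consecutive copies (an assumed, adjustable threshold).
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # One unit, followed by at least (min_copies - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif
            # (e.g. "AA" or "ATAT"); those runs are reported under the shorter unit.
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

print(find_perfect_microsatellites("GGCATATATATCCAAAAAGT"))
```

Positions are 0-based. Because the regular-expression search is greedy and non-overlapping, this is adequate for short illustrative sequences rather than whole-genome mining.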

11.
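Using 31 genome sequences of human adenovirus subspecies B and 39 genome sequences of subspecies D as study material, we systematically analyzed and compared the distribution of simple sequence repeats (SSRs) in these genomes with the Imperfect Microsatellite Extractor and DNAMAN software. The results show that the average relative density of SSRs is very similar in the genomes of subspecies B and D, but that the distribution differs among SSR types. Dinucleotide SSRs are markedly more abundant in subspecies D than in subspecies B. Among the mononucleotide SSRs of both subspecies, (A)n and (T)n are relatively frequent, whereas among the dinucleotide SSRs of both subspecies, (CG/GC)n shows a strong preference. In multiple sequence alignments within each subspecies, subspecies D showed higher stability. This specific distribution of SSRs in subspecies B and D may be related to their evolutionary mechanisms and pathogenicity.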

12.
A tool for searching pattern and fingerprint databases is described. Fingerprints are groups of motifs excised from conserved regions of sequence alignments and used for iterative database scanning. The constituent motifs are thus encoded as small alignments in which sequence information is maximised with each database pass; they therefore differ from regular-expression patterns, in which alignments are reduced to single consensus sequences. Different database formats have evolved to store these disparate types of information, namely the PROSITE dictionary of patterns and the PRINTS fingerprint database, but programs have not been available with the flexibility to search them both. We have developed a facility to do this: the system allows query sequences to be scanned against either PROSITE, the full PRINTS database, or against individual fingerprints. The results of fingerprint searches are displayed simultaneously in both text and graphical windows to render them more tangible to the user. Where structural coordinates are available, identified motifs may be visualised in a 3D context. The program runs on Silicon Graphics machines using GL graphics libraries and on machines with X servers supporting the PEX extension: its use is illustrated here by depicting the location of low-density lipoprotein-binding (LDL) motifs and leucine-rich repeats in a mosaic G-protein-coupled receptor (GPCR).

13.
The complete mitochondrial DNA (mtDNA) genome of Eunapius subterraneus (Porifera, Demospongiae), a unique stygobitic sponge, was analyzed and compared with previously published mitochondrial genomes from this group. The 24,850 bp long mtDNA genome is circular, with the same gene composition as found in other metazoans. Intergenic regions (IGRs) comprise 24.7% of the mtDNA and are rich in direct and inverted repeats and palindromic elements, as well as in open reading frames (ORFs) whose distribution and homology were compared with other available mt genomes, with a special focus on freshwater sponges. Phylogenetic analyses based on concatenated amino acid sequences from 12 mt protein genes placed E. subterraneus in a well-supported monophyletic clade with the freshwater sponges Ephydatia muelleri and Lubomirskia baicalensis. Our study showed high homology of mtDNA genomes among freshwater sponges, implying their recent split.

14.
Short interspersed elements (SINEs) are ubiquitous in mammalian genomes. Remarkable variety of these repeats among placental orders indicates that most of them amplified in each lineage independently, following mammalian radiation. Here, we present an ancient family of repeats, whose sequence divergence and common occurrence among placental mammals, marsupials and monotremes indicate their amplification during the Mesozoic era. They are called MIRs for abundant Mammalian-wide Interspersed Repeats. With approximately 120,000 copies still detectable in the human genome (0.2-0.3% DNA), MIRs represent a 'fossilized' record of a major genetic event preceding the radiation of placental orders.

15.

Background  

Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel type of direct repeat found in a wide range of bacteria and archaea. CRISPRs are beginning to attract attention because of their proposed mechanism; that is, defending their hosts against invading extrachromosomal elements such as viruses. Existing repeat detection tools do a poor job of identifying CRISPRs due to the presence of unique spacer sequences separating the repeats. In this study, a new tool, CRT, is introduced that rapidly and accurately identifies CRISPRs in large DNA strings, such as genomes and metagenomes.

16.
Motivation: Genomes contain biologically significant information that extends beyond that encoded in genes. Some of this information relates to various short dispersed repeats distributed throughout the genome. The goal of this work was to combine tools for detection of statistically significant dispersed repeats in DNA sequences with tools to aid development of hypotheses regarding their possible physiological functions in an easy-to-use web-based environment. Results: Ab Initio Motif Identification Environment (AIMIE) was designed to facilitate investigations of dispersed sequence motifs in prokaryotic genomes. We used AIMIE to analyze the Escherichia coli and Haemophilus influenzae genomes in order to demonstrate the utility of the new environment. AIMIE detected repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences, intergenic dyad sequences and several other over-represented sequence motifs. Distributional patterns of these motifs were analyzed using the tools included in AIMIE. Availability: AIMIE and the related software can be accessed at our web site http://www.cmbl.uga.edu/software.html. Contact: mrazek{at}uga.edu Associate Editor: Alex Bateman

17.
In most yeast species, the mitochondrial DNA (mtDNA) has been reported to be a circular molecule. However, two cases of linear mtDNA with specific termini have previously been described. We examined the frequency of occurrence of linear forms of mtDNA among yeasts by pulsed-field gel electrophoresis. Among the 58 species from the genera Pichia and Williopsis that we examined, linear mtDNA was found with unexpectedly high frequency. Thirteen species contained a linear mtDNA, as confirmed by restriction mapping, end labeling, and electron microscopy. The mtDNAs from Pichia pijperi, Williopsis mrakii, and P. jadinii were studied in detail. In each case, the left and right terminal fragments shared homologous sequences. Between the terminal repeats, the order of mitochondrial genes was the same in all of the linear mtDNAs examined, despite a large variation in genome size. This constancy of gene order is in contrast with the great variation of gene arrangement in circular mitochondrial genomes of yeasts. The coding sequences determined for several genes were highly homologous to those of the circular mtDNAs, suggesting that these two forms of mtDNA are not of distant origins.

18.
MRD is a database system for accessing microsatellite repeat information for genomes, such as those of archaea, eubacteria and eukaryotes, whose sequence information is available in the public domain. MRD stores information about simple tandemly repeated k-mer sequences where k = 1 to 6, i.e. monomer to hexamer. The web interface allows users to search for a repeat of interest and to learn about the association of the repeat with genes and genomic regions in the specific organism. The data contain the abundance and distribution of microsatellites in the coding and non-coding regions of the genome. The exact location of repeats with respect to genomic regions of interest (such as UTR, exon, intron or intergenic regions), whichever is applicable to the organism, is highlighted. MRD is available on the World Wide Web at and/or . The database is designed as an open-ended system to accommodate the microsatellite repeat information of other genomes whose complete sequences become available in the public domain in future.

19.
A method for fast database search for all k-nucleotide repeats.
A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). Although the functions of these repeating regions are not well understood, they appear important for understanding the expression, regulation and evolution of DNA. For example, increases in the number of trinucleotide repeats have been associated with human genetic disease, including Fragile-X mental retardation and Huntington's disease. Repeats are also useful as a tool in mapping and identifying DNA; the number of copies of a particular pattern at a site is often variable among individuals (polymorphic) and is therefore helpful in locating genes via linkage studies and also in providing DNA fingerprints of individuals. The number of repeating regions is unknown as is the distribution of pattern sizes. It would be useful to search for such regions in the DNA database in order that they may be studied more fully. The DNA database currently consists of approximately 150 million basepairs and is growing exponentially. Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides. It is our hope that the examples in this paper will illustrate the unrealized diversity of repeats in DNA and that the program we have developed will be a useful tool for locating new and interesting repeats.

20.
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.
