Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
3.
4.
Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
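The idea of classifying reads by exact k-mer matches can be illustrated with a minimal sketch. It is not Kraken's implementation: the abstract does not describe how hits are combined, so the majority vote below is an assumption made for brevity, and the reference sequences, taxon names and function names are all hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index

def classify(read, index, k):
    """Return the taxon with the most exact k-mer hits, or None if nothing hits.

    Majority voting is an assumption of this sketch, not Kraken's rule.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Toy references (hypothetical sequences, not real genomes).
references = {
    "taxon_A": "ACGTACGTTAGCTAGCTAGGATCC",
    "taxon_B": "TTGACCGGTTAACCGGTTGACATG",
}
index = build_index(references, k=8)
label = classify("GTTAGCTAGCTAGG", index, k=8)   # read taken from taxon_A
unknown = classify("CCCCCCCCCCCC", index, k=8)   # no k-mer is in the index
```

Because every lookup is an exact dictionary hit, the cost per read grows with the read length rather than with the size of the reference set, which is the property that makes k-mer approaches fast.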

5.
DNA replication is a highly regulated process that is initiated from replication origins, but the elements of chromatin structure that contribute to origin activity have not been fully elucidated. To identify histone post-translational modifications important for DNA replication, we initiated a genetic screen to identify interactions between genes encoding chromatin-modifying enzymes and those encoding proteins required for origin function in the budding yeast Saccharomyces cerevisiae. We found that enzymes required for histone H3K4 methylation, both the histone methyltransferase Set1 and the E3 ubiquitin ligase Bre1, are required for robust growth of several hypomorphic replication mutants, including cdc6-1. Consistent with a role for these enzymes in DNA replication, we found that both Set1 and Bre1 are required for efficient minichromosome maintenance. These phenotypes are recapitulated in yeast strains bearing mutations in the histone substrates (H3K4 and H2BK123). Set1 functions as part of the COMPASS complex to mono-, di-, and tri-methylate H3K4. By analyzing strains lacking specific COMPASS complex members or containing H2B mutations that differentially affect H3K4 methylation states, we determined that these replication defects were due to loss of H3K4 di-methylation. Furthermore, histone H3K4 di-methylation is enriched at chromosomal origins. These data suggest that H3K4 di-methylation is necessary and sufficient for normal origin function. We propose that histone H3K4 di-methylation functions in concert with other histone post-translational modifications to support robust genome duplication.

6.
7.
8.

Background

Mitochondrial DNA (mtDNA) deletions cause disease and accumulate during aging, yet our understanding of the molecular mechanisms underlying their formation remains rudimentary. Guanine-quadruplex (GQ) DNA structures are associated with nuclear DNA instability in cancer; recent evidence indicates they can also form in mitochondrial nucleic acids, suggesting that these non-B DNA structures could be associated with mtDNA deletions. Currently, the multiple types of GQ sequences and their association with human mtDNA stability are unknown.

Results

Here, we show an association between human mtDNA deletion breakpoint locations (sites where DNA ends rejoin after deletion of a section) and sequences with G-quadruplex forming potential (QFP), and establish the ability of selected sequences to form GQ in vitro. QFP contain four runs of either two or three consecutive guanines (2G and 3G, respectively), and we identified four types of QFP for subsequent analysis: intrastrand 2G, intrastrand 3G, duplex derived interstrand (ddi) 2G, and ddi 3G QFP sequences. We analyzed the position of each motif set relative to either 5' or 3' unique mtDNA deletion breakpoints, and found that intrastrand QFP sequences, but not ddi QFP sequences, showed significant association with mtDNA deletion breakpoint locations. Moreover, a large proportion of these QFP sequences occur at smaller distances to breakpoints relative to distribution-matched controls. The positive association of 2G QFP sequences persisted when breakpoints were divided into clinical subgroups. We tested in vitro GQ formation of representative mtDNA sequences containing these 2G QFP sequences and detected robust GQ structures by UV–VIS and CD spectroscopy. Notably, the most frequent deletion breakpoints, including those of the "common deletion", are bounded by 2G QFP sequence motifs.
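A rough sketch of how intrastrand 2G and 3G QFP motifs might be located, and how their distance to a breakpoint might be measured, is given below. The loop length of 1 to 7 nucleotides between guanine runs is an assumption (a common convention in G-quadruplex motif searches, not a value stated here), the function names are hypothetical, and the ddi (interstrand) motif types are not covered.

```python
import re

def qfp_pattern(run_length, max_loop=7):
    """Regex for four G-runs of run_length separated by loops of 1..max_loop nt.

    max_loop=7 is an assumed convention, not a parameter from the study.
    """
    run = "G{%d}" % run_length
    loop = "[ACGT]{1,%d}?" % max_loop
    return re.compile("%s(?:%s%s){3}" % (run, loop, run))

def find_qfp(seq, run_length):
    """Return (start, end) spans of non-overlapping candidate motifs on one strand."""
    return [m.span() for m in qfp_pattern(run_length).finditer(seq.upper())]

def distance_to_nearest(breakpoint, spans):
    """Distance in nt from a breakpoint position to the closest motif (0 if inside)."""
    if not spans:
        return None
    distances = []
    for start, end in spans:
        if start <= breakpoint < end:
            distances.append(0)
        else:
            distances.append(min(abs(breakpoint - start), abs(breakpoint - (end - 1))))
    return min(distances)

# Toy sequence (hypothetical, not mtDNA): four GG runs with 2-nt loops.
toy = "ATGGTAGGCTGGAAGGTT"
spans_2g = find_qfp(toy, 2)   # 2G QFP candidates
spans_3g = find_qfp(toy, 3)   # 3G QFP candidates (none in this toy sequence)
gap = distance_to_nearest(0, spans_2g)
```

Comparing such distances against those obtained for shuffled or otherwise matched control positions is one way to test for an association, although the control design used in the study is not reproduced here.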

Conclusions

The potential for GQ to influence mitochondrial genome stability supports a high-priority investigation of these structures and their regulation in normal and pathological mitochondrial biology. These findings emphasize the potential importance of helicases that subsequently resolve GQ to maintain the stability of the mitochondrial genome.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-677) contains supplementary material, which is available to authorized users.

9.

Background

Selecting an appropriate substitution model and deriving a tree topology for a given sequence set are essential in phylogenetic analysis. However, such time-consuming, computationally intensive tasks rely on knowledge of substitution model theories and related expertise to run through all possible combinations of several separate programs. To ensure a thorough and efficient analysis and avert tedious manipulations of various programs, this work presents an intuitive framework, the phylogenetic reconstruction with automatic likelihood model selectors (PALM), with convincing, updated algorithms and a best-fit model selection mechanism for seamless phylogenetic analysis.

Methodology

As an integrated framework of ClustalW, PhyML, MODELTEST, ProtTest, and several in-house programs, PALM evaluates the fitness of 56 substitution models for nucleotide sequences and 112 substitution models for protein sequences using scores under various criteria. The input for PALM can be either sequences in FASTA format or a sequence alignment file in PHYLIP format. To accelerate the computing of maximum likelihood and bootstrapping, this work integrates MPICH2/PhyML, PalmMonitor and Palm job controller across several machines with multiple processors and adopts the task parallelism approach. Moreover, an intuitive and interactive web component, PalmTree, is developed for displaying and operating the output tree with options for tree rooting, branch swapping, viewing the branch length values, and viewing bootstrap scores, as well as removing nodes to restart analysis iteratively.
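The criteria used to score the substitution models are not named above. As an illustration only, the sketch below ranks fitted models with two widely used information criteria, AIC and BIC; choosing these two is an assumption, and the model names, log-likelihoods and parameter counts are hypothetical values rather than PALM output.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: penalises parameters more as data grow."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def best_model(fits, n_sites, criterion="aic"):
    """Pick the model with the lowest score.

    fits maps model name -> (log-likelihood, number of free model parameters).
    Branch-length parameters are left out because they are shared by all
    models when the topology is fixed.
    """
    def score(item):
        _, (lnl, k) = item
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)
    return min(fits.items(), key=score)[0]

# Hypothetical fits for a 1,000-site alignment.
fits = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5102.7, 4),
    "GTR+G": (-5095.0, 9),
}
by_aic = best_model(fits, n_sites=1000, criterion="aic")
by_bic = best_model(fits, n_sites=1000, criterion="bic")
```

With these numbers the two criteria disagree: AIC prefers the richer model, while BIC's stronger penalty favours the simpler one, which is why reporting scores under several criteria is useful.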

Significance

The workflow of PALM is straightforward and coherent. Via a succinct, user-friendly interface, researchers unfamiliar with phylogenetic analysis can easily use this server to submit sequences, retrieve the output, and re-submit a job based on a previous result if some sequences are to be deleted or added for phylogenetic reconstruction. PALM infers phylogenetic relationships not only by overcoming the computational difficulty of ML methods but also by providing statistical methods for model selection and bootstrapping. The proposed approach can reduce calculation time, which is particularly relevant when querying a large data set. PALM can be accessed online at http://palm.iis.sinica.edu.tw.

10.
It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N²) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at: http://rna.urmc.rochester.edu/RNAstructure.html.
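One way to assemble a structure from a base-pairing probability matrix in O(N²) time is to pair two bases whenever each is the other's most probable partner. The sketch below uses that mutual-maximum rule as an assumption; the abstract does not spell out the assembly rule, and the probability matrix here is a hand-made toy rather than the output of a partition function calculation. The second function computes the two reported accuracy measures.

```python
def mutual_best_pairs(prob, min_prob=0.0):
    """Return pairs (i, j), i < j, where i and j are each other's best partner.

    prob is a symmetric N x N list of lists of pairing probabilities.
    Finding each row maximum costs O(N), so the whole scan is O(N^2).
    """
    n = len(prob)
    best = [max(range(n), key=lambda j, i=i: prob[i][j]) for i in range(n)]
    pairs = []
    for i in range(n):
        j = best[i]
        if i < j and best[j] == i and prob[i][j] > min_prob:
            pairs.append((i, j))
    return pairs

def sensitivity_ppv(predicted, known):
    """Fraction of known pairs predicted, and fraction of predicted pairs known."""
    predicted, known = set(predicted), set(known)
    correct = len(predicted & known)
    return correct / len(known), correct / len(predicted)

# Toy 6-nt probability matrix (hypothetical values).
n = 6
prob = [[0.0] * n for _ in range(n)]
for i, j, p in [(0, 3, 0.9), (2, 5, 0.8), (1, 3, 0.2), (1, 4, 0.1)]:
    prob[i][j] = prob[j][i] = p

pairs = mutual_best_pairs(prob)
sens, ppv = sensitivity_ppv(pairs, known=[(0, 3), (2, 5), (1, 4)])
```

The two pairs returned for the toy matrix cross each other (0 < 2 < 3 < 5), which is a pseudoknot; nothing in the rule forbids crossing pairs, so no topology is excluded.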

11.
12.
Members of the RecQ family of helicases are known for their roles in DNA repair, replication, and recombination. Mutations in the human RecQ helicases, WRN and BLM, cause Werner and Bloom syndromes, which are diseases characterized by genome instability and an increased risk of cancer. While WRN contains both a helicase and an exonuclease domain, the Drosophila melanogaster homolog, WRNexo, contains only the exonuclease domain. Therefore the Drosophila model system provides a unique opportunity to study the exonuclease functions of WRN separate from the helicase. We created a null allele of WRNexo via imprecise P-element excision. The null WRNexo mutants are not sensitive to double-strand break-inducing reagents, suggesting that the exonuclease does not play a key role in homologous recombination-mediated repair of DSBs. However, WRNexo mutant embryos have a reduced hatching frequency and larvae are sensitive to the replication fork-stalling reagent, hydroxyurea (HU), suggesting that WRNexo is important in responding to replication stress. The role of WRNexo in the HU-induced stress response is independent of Rad51. Interestingly, the hatching defect and HU sensitivity of WRNexo mutants do not occur in flies containing an exonuclease-dead copy of WRNexo, suggesting that the role of WRNexo in replication is independent of exonuclease activity. Additionally, WRNexo and Blm mutants exhibit similar sensitivity to HU and synthetic lethality in combination with mutations in structure-selective endonucleases. We propose that WRNexo and BLM interact to promote fork reversal following replication fork stalling and in their absence regressed forks are restarted through a Rad51-mediated process.

13.
Aggregatibacter actinomycetemcomitans is a major etiological agent of periodontitis. Here we report the complete genome sequence of serotype c strain D11S-1, which was recovered from the subgingival plaque of a patient diagnosed with generalized aggressive periodontitis.

Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontal disease, in particular aggressive periodontitis (12). The natural population of A. actinomycetemcomitans is clonal (7). Six A. actinomycetemcomitans serotypes are distinguished based on the structural and serological characteristics of the O antigen of LPS (6, 7). Three of the serotypes (a, b, and c) comprise >80% of all strains, and each serotype represents a distinct clonal lineage (1, 6, 7).

Serotype c strain D11S-1 was cultured from a subgingival plaque sample of a patient diagnosed with generalized aggressive periodontitis. The complete genome sequence of the strain was determined by 454 pyrosequencing (10), which achieved 25× coverage. Assembly was performed using the Newbler assembler (454, Branford, CT) and generated 199 large contigs, with 99.3% of the bases having a quality score of 40 or above. The contigs were aligned with the genome of the sequenced serotype b strain HK1651 (http://www.genome.ou.edu/act.html) using software written in house. The putative contig gaps were then closed by primer walking and sequencing of PCR products over the gaps. The final genome assembly was further confirmed by comparison of an in silico NcoI restriction map to the experimental map generated by optical mapping (8). The genome structure of the D11S-1 strain was compared to that of the sequenced strain HK1651 using the program MAUVE (2, 3).

The automated annotation was done using a protocol similar to the annotation engine service at The Institute for Genomic Research/J. Craig Venter Institute with some local modifications. Briefly, protein-coding genes were identified using Glimmer3 (4). Each protein sequence was then annotated by comparing to the GenBank nonredundant protein database. BLAST-Extend-Repraze was applied to the predicted genes to identify genes that might have been truncated due to a frameshift mutation or premature stop codon. tRNA and rRNA genes were identified by using tRNAScan-SE (9) and a similarity search to our in-house RNA database, respectively.

The D11S-1 circular genome contains 2,105,764 nucleotides, with a GC content of 44.55%, 2,134 predicted coding sequences, and 54 tRNA and 19 rRNA genes (see additional data at http://expression.washington.edu/bumgarnerlab/publications.php). The distribution of predicted genes based on functional categories was similar between D11S-1 and HK1651 (http://expression.washington.edu/bumgarnerlab/publications.php). One hundred six and 86 coding sequences were unique to strains D11S-1 and HK1651, respectively (http://expression.washington.edu/bumgarnerlab/publications.php). Genomic islands were identified based on annotations for strain HK1651 and based on manual inspection of contiguous D11S-1 specific DNA regions with G+C bias (http://expression.washington.edu/bumgarnerlab/publications.php). Among 12 identified genomic islands, 5 (B, C, D, E and G; cytolethal distending toxin gene cluster, tight adherence gene cluster, O-antigen biosynthesis and transport gene cluster, leukotoxin gene cluster, and lipooligosaccharide biosynthesis enzyme gene, respectively) correspond to islands 2 to 5 and 8 of strain HK1651 (http://www.oralgen.lanl.gov/) (5). Island F (∼5 kb) is homologous to a portion of the 12.5-kb island 7 in HK1651. Five genomic islands (H to L) were unique to strain D11S-1. The remaining island (A) is a fusion of genomic islands 1 and 6 of strain HK1651. The genome of D11S-1 is largely in synteny with the genome of the sequenced serotype b strain HK1651 but contains several large-scale genomic rearrangements.

Strain D11S-1 harbors a 43-kb bacteriophage and two plasmids of 31 and 23 kb (http://expression.washington.edu/bumgarnerlab/publications.php). Excluding an ∼9-kb region of low homology, the phage showed >90% nucleotide sequence identity with AaΦ23 (11). A 49-bp attB site (11) was identified at coordinates 2,024,825 to 2,024,873. The location of the inserted phage was identified in the optical map of strain D11S-1 and further confirmed by PCR amplification and sequencing of the regions flanking the insertion site. A closed circular form of the phage was also detected in strain D11S-1 by PCR analysis of the phage ends. The 23-kb plasmid is homologous to pVT745 (92% nucleotide identity). The 31-kb plasmid is a novel plasmid. It has significant homology in short regions (<2 kb) to Haemophilus influenzae biotype aegyptius plasmid pF1947 and other plasmids.
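Two of the quantities mentioned in this report, the in silico NcoI restriction map and the GC content, are simple to compute from an assembled sequence. The sketch below is an illustration with hypothetical function names and a toy sequence; it relies on NcoI recognising CCATGG and does not reproduce the authors' in-house software. Recognition sites that span the origin of the circular sequence are not detected in this simplified version.

```python
def find_sites(seq, site="CCATGG"):
    """Return the start positions of every occurrence of the recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def circular_fragments(seq, site="CCATGG"):
    """Fragment lengths, largest first, after complete digestion of a circle.

    On a circular molecule the lengths do not depend on where inside the
    site the enzyme cuts, so site start positions are used directly.
    """
    positions = find_sites(seq, site)
    if len(positions) < 2:
        return [len(seq)]  # uncut circle, or one cut that only linearises it
    fragments = [b - a for a, b in zip(positions, positions[1:])]
    fragments.append(len(seq) - positions[-1] + positions[0])  # wrap-around piece
    return sorted(fragments, reverse=True)

def gc_content(seq):
    """Percentage of G and C bases in seq."""
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)

# Toy circular sequence (hypothetical) with two NcoI sites.
toy = "AACCATGGTTTTTTCCATGGAA"
fragment_sizes = circular_fragments(toy)
gc = gc_content(toy)
```

The sorted list of fragment sizes is what would be lined up against the fragment sizes measured by optical mapping to confirm an assembly.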

14.
15.
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomics analysis of shotgun DNA sequences. The program includes
  1. a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
  2. a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
  3. two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
  4. effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.
GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies. It is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.
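The general idea of scoring a query against genomic-signature profiles can be sketched with a plain naive Bayes classifier over k-mer frequencies. The modification of the Bayesian method used by GSTaxClassifier is not described above, so the version below (uniform priors, add-one smoothing, trinucleotides) is an assumption, and the reference sequences and names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train_profile(seq, k, pseudocount=1.0):
    """Log-probability of each of the 4^k k-mers, smoothed with a pseudocount."""
    counts = kmer_counts(seq, k)
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in alphabet) + pseudocount * len(alphabet)
    return {m: math.log((counts[m] + pseudocount) / total) for m in alphabet}

def log_likelihood(query, profile, k):
    """Sum of log-probabilities of the query's k-mers; unknown k-mers are skipped."""
    return sum(profile[m] * n for m, n in kmer_counts(query, k).items() if m in profile)

def classify(query, profiles, k):
    """Return the reference with the highest likelihood (uniform priors assumed)."""
    return max(profiles, key=lambda name: log_likelihood(query, profiles[name], k))

# Toy references (hypothetical): one AT-rich, one GC-rich.
K = 3
profiles = {
    "AT_rich": train_profile("ATATTTAAATATTAAT" * 4, K),
    "GC_rich": train_profile("GCGGCCGCGCCGGCGC" * 4, K),
}
first = classify("TTAATATA", profiles, K)
second = classify("GCCGCG", profiles, K)
```

Predictions at higher taxonomic ranks could then be obtained by mapping each reference to its lineage, although how GSTaxClassifier does this is not shown here.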

16.
We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome; (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.
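Capabilities (a) and (b) can be pictured with a small sketch: the barcode is treated as a matrix with one row per sequence window and one column per k-mer, and windows whose row lies far from the average row are flagged. The window size, the value of k, the Euclidean distance and the threshold are all assumptions chosen for this toy example, not the server's settings.

```python
import math
from collections import Counter
from itertools import product

def barcode(seq, k=2, window=8):
    """Return (columns, rows): k-mer frequencies for each non-overlapping window."""
    columns = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        fragment = seq[start:start + window]
        total = window - k + 1
        counts = Counter(fragment[i:i + k] for i in range(total))
        rows.append([counts[c] / total for c in columns])
    return columns, rows

def atypical_windows(rows, threshold):
    """Indices of windows whose row is farther than threshold from the mean row."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    flagged = []
    for index, row in enumerate(rows):
        distance = math.sqrt(sum((x - m) ** 2 for x, m in zip(row, mean)))
        if distance > threshold:
            flagged.append(index)
    return flagged

# Toy genome (hypothetical): three AT-repeat windows and one GC-repeat window.
genome = "ATATATAT" * 3 + "GCGCGCGC"
columns, rows = barcode(genome)
outliers = atypical_windows(rows, threshold=0.5)
```

Rendering each row as a strip of grey levels would give the image form of the barcode, with the flagged window standing out as a visibly different band.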

17.
G-quadruplexes are non-canonical structures of nucleic acids, in which guanine bases form planar G-tetrads (G·G·G·G) that stack on each other in the core of the structure. G-quadruplexes generally contain a multiple of four (4n) guanines in the core. Here, we study the structure of G-quadruplexes with only (4n - 1) guanines in the core. The solution structure of a DNA sequence containing 11 guanines showed the formation of a parallel G-quadruplex involving two G-tetrads and one G-triad with a vacant site. Molecular dynamics simulation established the formation of a stable G-triad·water complex, where water molecules mimic the position of the missing guanine in the vacant site. The concept of forming G-quadruplexes with missing guanines in the core broadens the current definition of G-quadruplex-forming sequences. The potential ability of such structures to bind different metabolites, including guanine, guanosine and GTP, in the vacant site, could have biological implications in regulatory functions. Formation of this unique binding pocket in the G-triad could be used as a specific target in drug design.

18.
19.
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.
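Storing a read as a position within an assembled contig can be illustrated with a minimal lossless round trip. This is a sketch under strong simplifications, not Quip's encoder: only exact matches are handled, reads that match nowhere are kept verbatim, the statistical compression stage is not modelled, and the record layout and function names are hypothetical.

```python
def encode_read(read, contigs):
    """Encode a read as (contig index, offset, length) if it matches exactly.

    Reads that are not found in any contig are stored verbatim so that
    decoding stays lossless.
    """
    for contig_id, contig in enumerate(contigs):
        offset = contig.find(read)
        if offset != -1:
            return ("ref", contig_id, offset, len(read))
    return ("raw", read)

def decode_read(record, contigs):
    """Rebuild the original read from an encoded record."""
    if record[0] == "ref":
        _, contig_id, offset, length = record
        return contigs[contig_id][offset:offset + length]
    return record[1]

# Toy contigs and reads (hypothetical sequences).
contigs = ["ACGTTGCATGCCGTA", "TTTTGGGGCCCCAAAA"]
reads = ["TGCATGCC", "GGGGCCCC", "NNNNACGT"]
encoded = [encode_read(r, contigs) for r in reads]
decoded = [decode_read(rec, contigs) for rec in encoded]
```

The saving comes from replacing a string of bases with three small integers, which a downstream entropy coder can shrink further.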

20.