Similar articles
Found 20 similar articles.
1.
Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.

2.
3.
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).
This is a PLOS Computational Biology Software Article

4.
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).
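The aggregation step mentioned in this abstract (combining read-level predictions into a sample-level call) can be illustrated with a minimal Python sketch. It assumes that some classifier has already assigned each read a vector of class probabilities. The function name `aggregate_read_predictions`, the example labels, and the simple averaging rule are illustrative assumptions and are not taken from the Read2Pheno code.

```python
from typing import Dict, Sequence


def aggregate_read_predictions(
    read_probs: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, object]:
    """Average per-read class probabilities and pick the top class.

    Hypothetical helper: averaging is one simple aggregation rule,
    chosen here for illustration only.
    """
    if not read_probs:
        raise ValueError("at least one read is required")
    n_classes = len(labels)
    totals = [0.0] * n_classes
    for probs in read_probs:
        if len(probs) != n_classes:
            raise ValueError("each read needs one probability per label")
        for i, p in enumerate(probs):
            totals[i] += p
    # Mean probability of each class across all reads in the sample.
    means = [t / len(read_probs) for t in totals]
    best = max(range(n_classes), key=lambda i: means[i])
    return {"label": labels[best], "mean_probs": means}
```

Averaging keeps every read's contribution proportional to its confidence; a majority vote over per-read labels would be an equally plausible alternative.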

5.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.

6.
Many layouts exist for visualizing phylogenetic trees, allowing the same information (evolutionary relationships) to be displayed in different ways. For large phylogenies, the choice of the layout is a key element, because the printable area is limited, and because interactive on-screen visualizers can lead to unreadable phylogenetic relationships at high zoom levels. A visual inspection of available layouts for rooted trees reveals large empty areas that one may want to fill in order to use less drawing space and eventually gain readability. This can be achieved by using the nonlayered tidy tree layout algorithm, which was proposed earlier but has never been used in a phylogenetic context. Here, we present its implementation, and we demonstrate its advantages on simulated and biological data (the measles virus phylogeny). Our results call for the integration of this new layout in phylogenetic software. We implemented the nonlayered tidy tree layout in the R language as a stand-alone function (available at https://github.com/damiendevienne/non-layered-tidy-trees), as an option in the tree plotting function of the R package ape, and in the recent tool for visualizing reconciled phylogenetic trees thirdkind (https://github.com/simonpenel/thirdkind/wiki).

7.
Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK) are essential to initiate DNA replication at individual origins. During replication stress, the S-phase checkpoint inhibits the DDK- and CDK-dependent activation of late replication origins. Rad53 kinase is a central effector of the replication checkpoint and both binds to and phosphorylates Dbf4 to prevent late-origin firing. The molecular basis for the Rad53-Dbf4 physical interaction is not clear but occurs through the Dbf4 N terminus. Here we found that both Rad53 FHA1 and FHA2 domains, which specifically recognize phospho-threonine (pT), interacted with Dbf4 through an N-terminal sequence and an adjacent BRCT domain. Purified Rad53 FHA1 domain (but not FHA2) bound to a pT Dbf4 peptide in vitro, suggesting a possible phospho-threonine-dependent interaction between FHA1 and Dbf4. The Dbf4-Rad53 interaction is governed by multiple contacts that are separable from the Cdc5- and Msa1-binding sites in the Dbf4 N terminus. Importantly, abrogation of the Rad53-Dbf4 physical interaction blocked Dbf4 phosphorylation and allowed late-origin firing during replication checkpoint activation. This indicated that Rad53 must stably bind to Dbf4 to regulate its activity.

8.
9.
Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more correct joins than, or a similar number to, other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
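The linking rule described in this abstract (join when the join is unambiguous, or when its likelihood gain clearly exceeds that of competing joins) can be sketched as follows. This is an illustration under stated assumptions, not Swalo's implementation: the function `choose_join`, the treatment of a single positive-gain candidate as "unambiguous", and the `min_margin` threshold are all hypothetical.

```python
from typing import Dict, Hashable, Optional


def choose_join(
    gains: Dict[Hashable, float], min_margin: float = 10.0
) -> Optional[Hashable]:
    """Return the candidate join to accept, or None if no join is safe.

    gains maps each candidate join of a contig end to its estimated
    increase in assembly log-likelihood. min_margin is a hypothetical
    threshold for "substantially greater".
    """
    # Only joins that would increase the likelihood are considered.
    positive = {join: g for join, g in gains.items() if g > 0}
    if not positive:
        return None
    ranked = sorted(positive.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]  # assumed meaning of "unambiguous"
    best, runner_up = ranked[0], ranked[1]
    if best[1] - runner_up[1] >= min_margin:
        return best[0]
    return None  # competing joins are too close to call
```

Returning `None` when two joins are close reflects the conservative behaviour the abstract reports, where few contigs are linked incorrectly at the cost of leaving ambiguous ends unjoined.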

10.
Strain HIMB11 is a planktonic marine bacterium isolated from coastal seawater in Kaneohe Bay, Oahu, Hawaii, belonging to the ubiquitous and versatile Roseobacter clade of the alphaproteobacterial family Rhodobacteraceae. Here we describe the preliminary characteristics of strain HIMB11, including annotation of the draft genome sequence and comparative genomic analysis with other members of the Roseobacter lineage. The 3,098,747 bp draft genome is arranged in 34 contigs and contains 3,183 protein-coding genes and 54 RNA genes. Phylogenomic and 16S rRNA gene analyses indicate that HIMB11 represents a unique sublineage within the Roseobacter clade. Comparison with other publicly available genome sequences from members of the Roseobacter lineage reveals that strain HIMB11 has the genomic potential to utilize a wide variety of energy sources (e.g. organic matter, reduced inorganic sulfur, light, carbon monoxide), while possessing a reduced number of substrate transporters.

11.
eIF5A is an essential and evolutionarily conserved translation elongation factor, which has recently been proposed to be required for the translation of proteins with consecutive prolines. The binding of eIF5A to ribosomes occurs upon its activation by hypusination, a modification that requires spermidine, an essential factor for mammalian fertility that also promotes yeast mating. We show that in response to pheromone, hypusinated eIF5A is required for shmoo formation, localization of polarisome components, induction of cell fusion proteins, and actin assembly in yeast. We also show that eIF5A is required for the translation of Bni1, a proline-rich formin involved in polarized growth during shmoo formation. Our data indicate that translation of the polyproline motifs in Bni1 is eIF5A dependent and this translation dependency is lost upon deletion of the polyprolines. Moreover, an exogenous increase in Bni1 protein levels partially restores the defect in shmoo formation seen in eIF5A mutants. Overall, our results identify eIF5A as a novel and essential regulator of yeast mating through formin translation. Since eIF5A and polyproline formins are conserved across species, our results also suggest that eIF5A-dependent translation of formins could regulate polarized growth in such processes as fertility and cancer in higher eukaryotes.

12.
The unc-17 gene encodes the vesicular acetylcholine transporter (VAChT) in Caenorhabditis elegans. unc-17 reduction-of-function mutants are small, slow growing, and uncoordinated. Several independent unc-17 alleles are associated with a glycine-to-arginine substitution (G347R), which introduces a positive charge in the ninth transmembrane domain (TMD) of UNC-17. To identify proteins that interact with UNC-17/VAChT, we screened for mutations that suppress the uncoordinated phenotype of UNC-17(G347R) mutants. We identified several dominant allele-specific suppressors, including mutations in the sup-1 locus. The sup-1 gene encodes a single-pass transmembrane protein that is expressed in a subset of neurons and in body muscles. Two independent suppressor alleles of sup-1 are associated with a glycine-to-glutamic acid substitution (G84E), resulting in a negative charge in the SUP-1 TMD. A sup-1 null mutant has no obvious deficits in cholinergic neurotransmission and does not suppress unc-17 mutant phenotypes. Bimolecular fluorescence complementation (BiFC) analysis demonstrated close association of SUP-1 and UNC-17 in synapse-rich regions of the cholinergic nervous system, including the nerve ring and dorsal nerve cords. These observations suggest that UNC-17 and SUP-1 are in close proximity at synapses. We propose that electrostatic interactions between the UNC-17(G347R) and SUP-1(G84E) TMDs alter the conformation of the mutant UNC-17 protein, thereby restoring UNC-17 function; this is similar to the interaction between UNC-17/VAChT and synaptobrevin.

13.
Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that presents challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions. We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. Using publicly available data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. Users can email microbetrace@cdc.gov for support. The source code is available at https://github.com/cdcgov/microbetrace.
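To make the idea of a genetic distance network concrete, the following Python sketch links aligned sequences whose pairwise distance falls at or below a threshold and then reduces the network with a minimum spanning forest. It is a sketch under stated assumptions: the distance is a plain proportion of differing sites (p-distance), the spanning step is textbook Kruskal rather than the novel approach the abstract refers to, and all function names are hypothetical. None of this is taken from the MicrobeTrace source code, which is a JavaScript application.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_network(seqs: Dict[str, str], threshold: float) -> List[Edge]:
    """Link every pair of sequences whose distance is at most threshold."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(sorted(seqs.items()), 2):
        d = p_distance(s1, s2)
        if d <= threshold:
            edges.append((n1, n2, d))
    return edges


def minimum_spanning_forest(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Textbook Kruskal: keep the lightest edges that do not form a cycle."""
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    kept = []
    for n1, n2, d in sorted(edges, key=lambda e: e[2]):
        r1, r2 = find(n1), find(n2)
        if r1 != r2:  # the edge joins two separate components
            parent[r1] = r2
            kept.append((n1, n2, d))
    return kept
```

The result is a forest rather than a single tree because sequences farther apart than the threshold stay in separate components, which is how clusters appear in a thresholded network.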

14.
Ribosome biogenesis has been studied extensively in the yeast Saccharomyces cerevisiae. Yeast Ltv1 is a conserved 40S-associated biogenesis factor that has been proposed to function in small subunit nuclear export. Here we show that Ltv1 has a canonical leucine-rich nuclear export signal (NES) at its extreme C terminus that is necessary for both Crm1 interaction and Ltv1 export. The C terminus of Ltv1 can substitute for the NES in the 60S-export adapter Nmd3, demonstrating that it is a functional NES. Overexpression of an Ltv1 lacking its NES (Ltv1∆C13) was strongly dominant negative and resulted in the nuclear accumulation of RpS3-GFP; however, export of the pre-40S was not affected. In addition, expression of endogenous levels of Ltv1∆C protein complemented both the slow-growth phenotype and the 40S biogenesis defect of an ltv1 deletion mutant. Thus, if Ltv1 is a nuclear export adapter for the pre-40S subunit, its function must be fully redundant with additional export factors. The dominant negative phenotype of Ltv1∆NES overexpression was suppressed by co-overexpressing RpS3 and its chaperone, Yar1, or by deletion of the RpS3-binding site in Ltv1∆NES, suggesting that titration of RpS3 by Ltv1∆NES is deleterious in yeast. The dominant-negative phenotype did not correlate with a decrease in 40S levels but rather with a reduction in the polysome-to-monosome ratio, indicating reduced rates of translation. We suggest that titration of RpS3 by excess nuclear Ltv1 interferes with 40S function or with a nonribosomal function of RpS3.

15.
Genomic stability, stress response, and nutrient signaling all play critical, evolutionarily conserved roles in lifespan determination. However, the molecular mechanisms coordinating these processes with longevity remain unresolved. Here we investigate the involvement of the yeast anaphase promoting complex (APC) in longevity. The APC governs passage through M and G1 via ubiquitin-dependent targeting of substrate proteins and is associated with cancer and premature aging when defective. Our two-hybrid screen utilizing Apc5 as bait recovered the lifespan determinant Fob1 as prey. Fob1 is unstable specifically in G1, cycles throughout the cell cycle in a manner similar to Clb2 (an APC target), and is stabilized in APC (apc5CA) and proteasome (rpn10) mutants. Deletion of FOB1 increased replicative lifespan (RLS) in wild type (WT), apc5CA, and apc10 cells, and suppressed apc5CA cell cycle progression and rDNA recombination defects. Alternatively, increased FOB1 expression decreased RLS in WT cells, but did not reduce the already short apc5CA RLS, suggesting an epistatic interaction between apc5CA and fob1. Mutation to a putative L-Box (Fob1E420V), a Destruction Box-like motif, abolished Fob1 modifications, stabilized the protein, and increased rDNA recombination. Our work identifies a mechanistic role for the APC in promoting replicative longevity and genomic stability in yeast.

16.
ChIP-seq is a powerful method for obtaining genome-wide maps of protein-DNA interactions and epigenetic modifications. CHANCE (CHip-seq ANalytics and Confidence Estimation) is a standalone package for ChIP-seq quality control and protocol optimization. Our user-friendly graphical software quickly estimates the strength and quality of immunoprecipitations, identifies biases, compares the user's data with ENCODE's large collection of published datasets, performs multi-sample normalization, checks against quantitative PCR-validated control regions, and produces informative graphical reports. CHANCE is available at https://github.com/songlab/chance.

17.
18.
Saccharomonospora cyanea Runmao et al. 1988 is a member of the genus Saccharomonospora in the family Pseudonocardiaceae that is moderately well characterized at the genome level thus far. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as soil, leaf litter, manure, compost, surface of peat, moist, over-heated grain, and ocean sediment, where they probably play a role in the primary degradation of plant material by attacking hemicellulose. Species of the genus Saccharomonospora are usually Gram-positive, non-acid fast, and are classified among the actinomycetes. S. cyanea is characterized by a dark blue (= cyan blue) aerial mycelium. After S. viridis, S. azurea, and S. marina, S. cyanea is only the fourth member in the genus for which a completely sequenced (non-contiguous finished draft status) type strain genome will be published. Here we describe the features of this organism, together with the draft genome sequence, and annotation. The 5,408,301 bp long chromosome with its 5,139 protein-coding and 57 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).

19.
Rhizobium leguminosarum bv. trifolii SRDI943 (strain syn. V2-2) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from a root nodule of Trifolium michelianum Savi cv. Paradana that had been grown in soil collected from a mixed pasture in Victoria, Australia. This isolate was found to have a broad clover host range but was sub-optimal for nitrogen fixation with T. subterraneum (fixing 20-54% of reference inoculant strain WSM1325) and was found to be totally ineffective with the clover species T. polymorphum and T. pratense. Here we describe the features of R. leguminosarum bv. trifolii strain SRDI943, together with genome sequence information and annotation. The 7,412,387 bp high-quality-draft genome is arranged into 5 scaffolds of 5 contigs, contains 7,317 protein-coding genes and 89 RNA-only encoding genes, and is one of 100 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.

20.
Over the past 35 years, developmental geneticists have made impressive progress toward an understanding of how genes specify morphology and function, particularly as they relate to the specification of each physical component of an organism. In the last 20 years, male courtship behavior in Drosophila melanogaster has emerged as a robust model system for the study of genetic specification of behavior. Courtship behavior is both complex and innate, and a single gene, fruitless (fru), is both necessary and sufficient for all aspects of the courtship ritual. Typically, loss of male-specific Fruitless protein function results in male flies that perform the courtship ritual incorrectly, slowly, or not at all. Here we describe a novel requirement for fru: we have identified a group of cells in which male Fru proteins are required to reduce the speed of courtship initiation. In addition, we have identified a gene, Trapped in endoderm 1 (Tre1), which is required in these cells for normal courtship and mating behavior. Tre1 encodes a G-protein-coupled receptor required for establishment of cell polarity and cell migration and has not previously been shown to be involved in courtship behavior. We describe the results of feminization of the Tre1-expressing neurons, as well as the effects on courtship behavior of mutation of Tre1. In addition, we show that Tre1 is expressed in a sexually dimorphic pattern in the central and peripheral nervous systems and investigate the role of the Tre1 cells in mate identification.
