Similar articles
 20 similar articles found (search time: 46 ms)
1.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools–Lumpy, Delly and SoftSearch–and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from github (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.

2.
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
This is a “Topic Page” article for PLOS Computational Biology.
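The likelihood-free idea described in this abstract can be illustrated with rejection sampling, the simplest ABC scheme. The sketch below is only a toy under stated assumptions: the model (a normal distribution with known standard deviation 1), the uniform prior, the summary statistic (the sample mean), the tolerance `epsilon`, and all function names are hypothetical choices made for illustration and do not come from the article.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, distance, epsilon, n_draws, rng):
    """Keep prior draws whose simulated data fall within epsilon of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)          # draw a candidate parameter from the prior
        simulated = simulator(theta, rng)   # simulate data; no likelihood is evaluated
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)          # accepted draws approximate the posterior
    return accepted


# Toy setup (assumed for illustration): infer the mean of a normal with known sd = 1.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]


def prior_sampler(rng):
    # Uniform prior on the unknown mean.
    return rng.uniform(-5.0, 5.0)


def simulator(theta, rng):
    # Simulate a data set of the same size as the observed one.
    return [rng.gauss(theta, 1.0) for _ in range(50)]


def distance(a, b):
    # Compare data sets through one summary statistic, the sample mean.
    return abs(statistics.fmean(a) - statistics.fmean(b))


posterior = abc_rejection(observed, prior_sampler, simulator, distance,
                          epsilon=0.1, n_draws=5000, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

Shrinking `epsilon` tightens the approximation but lowers the acceptance rate, which is one of the trade-offs the abstract says must be assessed carefully.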

3.
4.
It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When lra is applied to calling variation from de novo assembly contigs, the Truvari F1 score increases by 3.2% compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA).
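To make the difference between linear and concave gap costs concrete, here is a minimal sketch of chaining exact-match anchors with a pluggable gap penalty. It is not lra's algorithm: it uses a naive quadratic-time dynamic program rather than sparse dynamic programming, and the anchor format, the logarithmic cost, and the `scale` and `per_base` parameters are assumptions chosen for illustration.

```python
import math


def linear_gap(gap, per_base=1.0):
    # Cost grows in proportion to gap length.
    return per_base * gap


def concave_gap(gap, scale=4.0):
    # Hypothetical concave cost: each extra base costs less than the one before.
    return scale * math.log1p(gap)


def chain_score(anchors, gap_cost):
    """Best chain score over anchors given as (read_pos, genome_pos, length).

    Naive O(n^2) chaining: each anchor either starts a chain or extends the
    best compatible earlier chain, paying gap_cost for the indel between them.
    """
    anchors = sorted(anchors)
    best = [float(length) for _, _, length in anchors]
    for j, (rj, gj, lj) in enumerate(anchors):
        for i in range(j):
            ri, gi, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if ri + li <= rj and gi + li <= gj:
                gap = abs((gj - gi) - (rj - ri))  # indel size between diagonals
                best[j] = max(best[j], best[i] + lj - gap_cost(gap))
    return max(best, default=0.0)


# Two 50-bp exact matches separated by a 1000-bp deletion in the read.
anchors = [(0, 0, 50), (50, 1050, 50)]
print(chain_score(anchors, linear_gap))   # linear cost: chaining across the gap does not pay
print(chain_score(anchors, concave_gap))  # concave cost: the chain spans the deletion
```

With the linear cost the two anchors stay in separate chains, whereas the concave cost lets a single chain bridge the large gap, which is the behavior the abstract associates with better SV discovery.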

5.
Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods. 
Open standards, data, and software are also key parts of flow cytometry bioinformatics. Data standards include the widely adopted Flow Cytometry Standard (FCS) defining how data from cytometers should be stored, but also several new standards under development by the International Society for Advancement of Cytometry (ISAC) to aid in storing more detailed information about experimental design and analytical steps. Open data is slowly growing with the opening of the CytoBank database in 2010 and FlowRepository in 2012, both of which allow users to freely distribute their data, and the latter of which has been recommended as the preferred repository for MIFlowCyt-compliant data by ISAC. Open software is most widely available in the form of a suite of Bioconductor packages, but is also available for web execution on the GenePattern platform.
This is a “Topic Page” article for PLOS Computational Biology.

6.
Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that presents challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions. We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. Using publicly available data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. Users can email microbetrace@cdc.gov for support. The source code is available at https://github.com/cdcgov/microbetrace.

7.
Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively.

8.
Is it possible to learn and create a first Hidden Markov Model (HMM) without programming skills or understanding the algorithms in detail? In this concise tutorial, we present the HMM through the two general questions it was initially developed to answer and describe its elements. The HMM elements include variables, hidden and observed parameters, the vector of initial probabilities, and the transition and emission probability matrices. Then, we suggest a set of ordered steps for modeling the variables and illustrate them with a simple exercise of modeling and predicting transmembrane segments in a protein sequence. Finally, we show how to interpret the results of the algorithms for this particular problem. To guide the process of information input and explicit solution of the basic HMM algorithms that answer the HMM questions posed, we developed an educational webserver called HMMTeacher. Additional solved HMM modeling exercises can be found in the user’s manual and answers to frequently asked questions. HMMTeacher is available at https://hmmteacher.mobilomics.org, mirrored at https://hmmteacher1.mobilomics.org. A repository with the code of the tool and the webpage is available at https://gitlab.com/kmilo.f/hmmteacher.
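The HMM elements listed in this abstract (initial probabilities, transition matrix, emission matrix) can be assembled into a small decoding example. The sketch below is not HMMTeacher's code: the two states (`M` for membrane, `S` for soluble), the two-letter observation alphabet (`H` hydrophobic, `P` polar), and every probability value are invented for illustration, and the Viterbi routine is a generic textbook version.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path, computed in log space.

    Assumes a non-empty observation sequence and strictly positive probabilities.
    """
    first = observations[0]
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at position t
    for obs in observations[1:]:
        current, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, scores[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            current[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Hypothetical parameters, chosen only to make the example readable.
states = ["M", "S"]
start_p = {"M": 0.2, "S": 0.8}                 # vector of initial probabilities
trans_p = {"M": {"M": 0.9, "S": 0.1},          # transition probability matrix
           "S": {"M": 0.1, "S": 0.9}}
emit_p = {"M": {"H": 0.8, "P": 0.2},           # emission probability matrix
          "S": {"H": 0.3, "P": 0.7}}

residues = "PPPHHHHHHPPP"  # sequence pre-encoded as hydrophobic (H) or polar (P)
print(viterbi(residues, states, start_p, trans_p, emit_p))
```

Each position of the printed string labels the corresponding residue as membrane or soluble, which is the kind of segment prediction the tutorial's exercise describes.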

9.
Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.

10.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.

11.
microRNAs (miRNAs) are noncoding short (s)RNAs, 18-22 nt long, that suppress gene expression by targeting the 3’ untranslated region of target mRNAs. This occurs through the seed sequence located in position 2-7/8 of the miRNA guide strand, once it is loaded into the RNA induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This results in induction of Death Induced by Survival gene Elimination (DISE), through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small (sm)RNA sequencing to the genome. However, the analysis of any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow, solely based on the exact position 2–7 of any short (s)RNA that can enter the RISC. Therefore, we developed SPOROS, a semi-automated pipeline that produces multiple useful outputs to predict and compare 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. We provide two examples to illustrate the capabilities of SPOROS: Example one involves the analysis of RISC-bound sRNAs in a cancer cell line (either wild-type or two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (either from normal or Alzheimer’s patients). Our methods (found at https://github.com/ebartom/SPOROS and at Code Ocean: https://doi.org/10.24433/CO.1732496.v1) are designed to be used to analyze a variety of smRNA Seq data in various normal and disease settings.
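The core operation this abstract describes, taking exactly positions 2–7 of each short RNA read, can be sketched in a few lines. This is not the SPOROS implementation: the function names, the input format (a mapping from read sequence to read count), and the example reads are assumptions made for illustration.

```python
from collections import Counter


def seed_6mer(read):
    """Return positions 2-7 (1-based) of a read, or None if the read is too short."""
    read = read.upper()
    # 1-based positions 2..7 correspond to the 0-based slice [1:7].
    return read[1:7] if len(read) >= 7 else None


def seed_counts(read_counts):
    """Sum read counts per 6mer seed; read_counts maps sequence -> count."""
    counts = Counter()
    for read, n in read_counts.items():
        seed = seed_6mer(read)
        if seed is not None:
            counts[seed] += n
    return counts


# Illustrative reads (DNA alphabet, as they would appear in sequencing output).
example_reads = {
    "TGAGGTAGTAGGTTGTATAGTT": 10,
    "TGAGGTAGTAGGTTGTGTGGTT": 5,   # different 3' end, same positions 2-7
    "ACG": 3,                      # too short to contain a 6mer seed
}
print(seed_counts(example_reads))
```

Because only positions 2–7 are used, reads that differ elsewhere collapse onto the same seed, so abundance is compared per seed rather than per annotated miRNA.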

12.
Many layouts exist for visualizing phylogenetic trees, allowing the same information (evolutionary relationships) to be displayed in different ways. For large phylogenies, the choice of the layout is a key element, because the printable area is limited, and because interactive on-screen visualizers can lead to unreadable phylogenetic relationships at high zoom levels. A visual inspection of available layouts for rooted trees reveals large empty areas that one may want to fill in order to use less drawing space and eventually gain readability. This can be achieved by using the nonlayered tidy tree layout algorithm, which was proposed earlier but has never been used in a phylogenetic context. Here, we present its implementation, and we demonstrate its advantages on simulated and biological data (the measles virus phylogeny). Our results call for the integration of this new layout in phylogenetic software. We implemented the nonlayered tidy tree layout in the R language as a stand-alone function (available at https://github.com/damiendevienne/non-layered-tidy-trees), as an option in the tree plotting function of the R package ape, and in the recent tool for visualizing reconciled phylogenetic trees thirdkind (https://github.com/simonpenel/thirdkind/wiki).

13.
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).

14.
Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state-of-the-art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.

15.
Despite the importance of clathrin-mediated endocytosis (CME) for cell biology, it is unclear if all components of the machinery have been discovered and many regulatory aspects remain poorly understood. Here, using Saccharomyces cerevisiae and a fluorescence microscopy screening approach, we identify previously unknown regulatory factors of the endocytic machinery. We further studied the top scoring protein identified in the screen, Ubx3, a member of the conserved ubiquitin regulatory X (UBX) protein family. In vivo and in vitro approaches demonstrate that Ubx3 is a new coat component. Ubx3-GFP has typical endocytic coat protein dynamics with a patch lifetime of 45 ± 3 sec. Ubx3 contains a W-box that mediates physical interaction with clathrin and Ubx3-GFP patch lifetime depends on clathrin. Deletion of the UBX3 gene caused defects in the uptake of Lucifer Yellow and the methionine transporter Mup1, demonstrating that Ubx3 is needed for efficient endocytosis. Further, the UBX domain is required both for localization and function of Ubx3 at endocytic sites. Mechanistically, Ubx3 regulates dynamics and patch lifetime of the early arriving protein Ede1 but not later arriving coat proteins or actin assembly. Conversely, Ede1 regulates the patch lifetime of Ubx3. Ubx3 likely regulates CME via the AAA-ATPase Cdc48, a ubiquitin-editing complex. Our results uncovered new components of the CME machinery that regulate this fundamental process.

16.
The Saccharomyces cerevisiae type 2C protein phosphatase Ptc1 is required for a wide variety of cellular functions, although only a few cellular targets have been identified. A genetic screen in search of mutations in protein kinase–encoding genes able to suppress multiple phenotypic traits caused by the ptc1 deletion yielded a single gene, MKK1, coding for a MAPK kinase (MAPKK) known to activate the cell-wall integrity (CWI) Slt2 MAPK. In contrast, mutation of the MKK1 paralog, MKK2, had a less significant effect. Deletion of MKK1 abolished the increased phosphorylation of Slt2 induced by the absence of Ptc1 both under basal and CWI pathway stimulatory conditions. We demonstrate that Ptc1 acts at the level of the MAPKKs of the CWI pathway, but only the Mkk1 kinase activity is essential for ptc1 mutants to display high Slt2 activation. We also show that Ptc1 is able to dephosphorylate Mkk1 in vitro. Our results reveal the preeminent role of Mkk1 in signaling through the CWI pathway and strongly suggest that hyperactivation of Slt2 caused by upregulation of Mkk1 is at the basis of most of the phenotypic defects associated with lack of Ptc1 function.

17.
Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK) are essential to initiate DNA replication at individual origins. During replication stress, the S-phase checkpoint inhibits the DDK- and CDK-dependent activation of late replication origins. Rad53 kinase is a central effector of the replication checkpoint and both binds to and phosphorylates Dbf4 to prevent late-origin firing. The molecular basis for the Rad53–Dbf4 physical interaction is not clear but occurs through the Dbf4 N terminus. Here we found that both Rad53 FHA1 and FHA2 domains, which specifically recognize phospho-threonine (pT), interacted with Dbf4 through an N-terminal sequence and an adjacent BRCT domain. Purified Rad53 FHA1 domain (but not FHA2) bound to a pT Dbf4 peptide in vitro, suggesting a possible phospho-threonine-dependent interaction between FHA1 and Dbf4. The Dbf4–Rad53 interaction is governed by multiple contacts that are separable from the Cdc5- and Msa1-binding sites in the Dbf4 N terminus. Importantly, abrogation of the Rad53–Dbf4 physical interaction blocked Dbf4 phosphorylation and allowed late-origin firing during replication checkpoint activation. This indicated that Rad53 must stably bind to Dbf4 to regulate its activity.

18.
Kinetochores are conserved protein complexes that bind the replicated chromosomes to the mitotic spindle and then direct their segregation. To better comprehend Saccharomyces cerevisiae kinetochore function, we dissected the phospho-regulated dynamic interaction between conserved kinetochore protein Cnn1(CENP-T), the centromere region, and the Ndc80 complex through the cell cycle. Cnn1 localizes to kinetochores at basal levels from G1 through metaphase but accumulates abruptly at anaphase onset. How Cnn1 is recruited and which activities regulate its dynamic localization are unclear. We show that Cnn1 harbors two kinetochore-localization activities: a C-terminal histone-fold domain (HFD) that associates with the centromere region and an N-terminal Spc24/Spc25 interaction sequence that mediates linkage to the microtubule-binding Ndc80 complex. We demonstrate that the established Ndc80 binding site in the N terminus of Cnn1, Cnn1(60–84), should be extended with flanking residues, Cnn1(25–91), to allow near maximal binding affinity to Ndc80. Cnn1 localization was proposed to depend on Mps1 kinase activity at Cnn1–S74, based on in vitro experiments demonstrating the Cnn1–Ndc80 complex interaction. We demonstrate that from G1 through metaphase, Cnn1 localizes via both its HFD and N-terminal Spc24/Spc25 interaction sequence, and deletion or mutation of either region results in anomalous Cnn1 kinetochore levels. At anaphase onset (when Mps1 activity decreases) Cnn1 becomes enriched mainly via the N-terminal Spc24/Spc25 interaction sequence. In sum, we provide the first in vivo evidence of Cnn1 preanaphase linkages with the kinetochore and enrichment of the linkages during anaphase.

19.
The unc-17 gene encodes the vesicular acetylcholine transporter (VAChT) in Caenorhabditis elegans. unc-17 reduction-of-function mutants are small, slow growing, and uncoordinated. Several independent unc-17 alleles are associated with a glycine-to-arginine substitution (G347R), which introduces a positive charge in the ninth transmembrane domain (TMD) of UNC-17. To identify proteins that interact with UNC-17/VAChT, we screened for mutations that suppress the uncoordinated phenotype of UNC-17(G347R) mutants. We identified several dominant allele-specific suppressors, including mutations in the sup-1 locus. The sup-1 gene encodes a single-pass transmembrane protein that is expressed in a subset of neurons and in body muscles. Two independent suppressor alleles of sup-1 are associated with a glycine-to-glutamic acid substitution (G84E), resulting in a negative charge in the SUP-1 TMD. A sup-1 null mutant has no obvious deficits in cholinergic neurotransmission and does not suppress unc-17 mutant phenotypes. Bimolecular fluorescence complementation (BiFC) analysis demonstrated close association of SUP-1 and UNC-17 in synapse-rich regions of the cholinergic nervous system, including the nerve ring and dorsal nerve cords. These observations suggest that UNC-17 and SUP-1 are in close proximity at synapses. We propose that electrostatic interactions between the UNC-17(G347R) and SUP-1(G84E) TMDs alter the conformation of the mutant UNC-17 protein, thereby restoring UNC-17 function; this is similar to the interaction between UNC-17/VAChT and synaptobrevin.

20.
Cdk1 activity drives both mitotic entry and the metaphase-to-anaphase transition in all eukaryotes. The kinase Wee1 and the phosphatase Cdc25 regulate the mitotic activity of Cdk1 by the reversible phosphorylation of a conserved tyrosine residue. Mutation of cdc25 in Schizosaccharomyces pombe blocks Cdk1 dephosphorylation and causes cell cycle arrest. In contrast, deletion of MIH1, the cdc25 homolog in Saccharomyces cerevisiae, is viable. Although Cdk1-Y19 phosphorylation is elevated during mitosis in mih1∆ cells, Cdk1 is dephosphorylated as cells progress into G1, suggesting that additional phosphatases regulate Cdk1 dephosphorylation. Here we show that the phosphatase Ptp1 also regulates Cdk1 dephosphorylation in vivo and can directly dephosphorylate Cdk1 in vitro. Using a novel in vivo phosphatase assay, we also show that PP2A bound to Rts1, the budding yeast B56-regulatory subunit, regulates dephosphorylation of Cdk1 independently of a function regulating Swe1, Mih1, or Ptp1, suggesting that PP2A-Rts1 either directly dephosphorylates Cdk1-Y19 or regulates an unidentified phosphatase.
