首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster and render the kernel as possibly the top contender in a low ratio of speed/performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.  相似文献   

2.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools–Lumpy, Delly and SoftSearch–and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from github (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is PLOS Computational Biology software paper.
  相似文献   

3.
4.
5.
Is it possible to learn and create a first Hidden Markov Model (HMM) without programming skills or understanding the algorithms in detail? In this concise tutorial, we present the HMM through the 2 general questions it was initially developed to answer and describe its elements. The HMM elements include variables, hidden and observed parameters, the vector of initial probabilities, and the transition and emission probability matrices. Then, we suggest a set of ordered steps, for modeling the variables and illustrate them with a simple exercise of modeling and predicting transmembrane segments in a protein sequence. Finally, we show how to interpret the results of the algorithms for this particular problem. To guide the process of information input and explicit solution of the basic HMM algorithms that answer the HMM questions posed, we developed an educational webserver called HMMTeacher. Additional solved HMM modeling exercises can be found in the user’s manual and answers to frequently asked questions. HMMTeacher is available at https://hmmteacher.mobilomics.org, mirrored at https://hmmteacher1.mobilomics.org. A repository with the code of the tool and the webpage is available at https://gitlab.com/kmilo.f/hmmteacher.  相似文献   

6.
Accurate quantification of nerves in cancer specimens is important to understand cancer behaviour. Typically, nerves are manually detected and counted in digitised images of thin tissue sections from excised tumours using immunohistochemistry. However the images are of a large size with nerves having substantial variation in morphology that renders accurate and objective quantification difficult using existing manual and automated counting techniques. Manual counting is precise, but time-consuming, susceptible to inconsistency and has a high rate of false negatives. Existing automated techniques using digitised tissue sections and colour filters are sensitive, however, have a high rate of false positives. In this paper we develop a new automated nerve detection approach, based on a deep learning model with an augmented classification structure. This approach involves pre-processing to extract the image patches for the deep learning model, followed by pixel-level nerve detection utilising the proposed deep learning model. Outcomes assessed were a) sensitivity of the model in detecting manually identified nerves (expert annotations), and b) the precision of additional model-detected nerves. The proposed deep learning model based approach results in a sensitivity of 89% and a precision of 75%. The code and pre-trained model are publicly available at https://github.com/IA92/Automated_Nerves_Quantification.  相似文献   

7.

Background

The Smith-Waterman algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools for next-generation sequencing data. Though various fast Smith-Waterman implementations are developed, they are either designed as monolithic protein database searching tools, which do not return detailed alignment, or are embedded into other tools. These issues make reusing these efficient Smith-Waterman implementations impractical.

Results

To facilitate easy integration of the fast Single-Instruction-Multiple-Data Smith-Waterman algorithm into third-party software, we wrote a C/C++ library, which extends Farrar’s Striped Smith-Waterman (SSW) to return alignment information in addition to the optimal Smith-Waterman score. In this library we developed a new method to generate the full optimal alignment results and a suboptimal score in linear space at little cost of efficiency. This improvement makes the fast Single-Instruction-Multiple-Data Smith-Waterman become really useful in genomic applications. SSW is available both as a C/C++ software library, as well as a stand-alone alignment tool at: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library.

Conclusions

The SSW library has been used in the primary read mapping tool MOSAIK, the split-read mapping program SCISSORS, the MEI detector TANGRAM, and the read-overlap graph generation program RZMBLR. The speeds of the mentioned software are improved significantly by replacing their ordinary Smith-Waterman or banded Smith-Waterman module with the SSW Library.  相似文献   

8.
Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near/across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrated two strategies called ‘Reverse-Alignment’ and ‘Deep-Scan’ to improve the accuracy of read-alignment. In our experiments, BatAlign was able to obtain the highest F-measures in read-alignments on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign were able to recover 4.3% more PCR-validated SVs with 73.3% less callings. These suggest BatAlign to be effective in detecting SVs and other polymorphic-variants accurately using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.  相似文献   

9.
Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoires post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets for a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.  相似文献   

10.

Background and Purpose

The measurement of cortical shrinkage is a candidate marker of disease progression in Alzheimer’s. This study evaluated the performance of two pipelines: Civet-CLASP (v1.1.9) and Freesurfer (v5.3.0).

Methods

Images from 185 ADNI1 cases (69 elderly controls (CTR), 37 stable MCI (sMCI), 27 progressive MCI (pMCI), and 52 Alzheimer (AD) patients) scanned at baseline, month 12, and month 24 were processed using the two pipelines and two interconnected e-infrastructures: neuGRID (https://neugrid4you.eu) and VIP (http://vip.creatis.insa-lyon.fr). The vertex-by-vertex cross-algorithm comparison was made possible applying the 3D gradient vector flow (GVF) and closest point search (CPS) techniques.

Results

The cortical thickness measured with Freesurfer was systematically lower by one third if compared to Civet’s. Cross-sectionally, Freesurfer’s effect size was significantly different in the posterior division of the temporal fusiform cortex. Both pipelines were weakly or mildly correlated with the Mini Mental State Examination score (MMSE) and the hippocampal volumetry. Civet differed significantly from Freesurfer in large frontal, parietal, temporal and occipital regions (p<0.05). In a discriminant analysis with cortical ROIs having effect size larger than 0.8, both pipelines gave no significant differences in area under the curve (AUC). Longitudinally, effect sizes were not significantly different in any of the 28 ROIs tested. Both pipelines weakly correlated with MMSE decay, showing no significant differences. Freesurfer mildly correlated with hippocampal thinning rate and differed in the supramarginal gyrus, temporal gyrus, and in the lateral occipital cortex compared to Civet (p<0.05). In a discriminant analysis with ROIs having effect size larger than 0.6, both pipelines yielded no significant differences in the AUC.

Conclusions

Civet appears slightly more sensitive to the typical AD atrophic pattern at the MCI stage, but both pipelines can accurately characterize the topography of cortical thinning at the dementia stage.  相似文献   

11.
Conformational entropy for atomic-level, three dimensional biomolecules is known experimentally to play an important role in protein-ligand discrimination, yet reliable computation of entropy remains a difficult problem. Here we describe the first two accurate and efficient algorithms to compute the conformational entropy for RNA secondary structures, with respect to the Turner energy model, where free energy parameters are determined from UV absorption experiments. An algorithm to compute the derivational entropy for RNA secondary structures had previously been introduced, using stochastic context free grammars (SCFGs). However, the numerical value of derivational entropy depends heavily on the chosen context free grammar and on the training set used to estimate rule probabilities. Using data from the Rfam database, we determine that both of our thermodynamic methods, which agree in numerical value, are substantially faster than the SCFG method. Thermodynamic structural entropy is much smaller than derivational entropy, and the correlation between length-normalized thermodynamic entropy and derivational entropy is moderately weak to poor. In applications, we plot the structural entropy as a function of temperature for known thermoswitches, such as the repression of heat shock gene expression (ROSE) element, we determine that the correlation between hammerhead ribozyme cleavage activity and total free energy is improved by including an additional free energy term arising from conformational entropy, and we plot the structural entropy of windows of the HIV-1 genome. Our software RNAentropy can compute structural entropy for any user-specified temperature, and supports both the Turner’99 and Turner’04 energy parameters. It follows that RNAentropy is state-of-the-art software to compute RNA secondary structure conformational entropy. Source code is available at https://github.com/clotelab/RNAentropy/; a full web server is available at http://bioinformatics.bc.edu/clotelab/RNAentropy, including source code and ancillary programs.  相似文献   

12.
microRNAs (miRNAs) are (18-22nt long) noncoding short (s)RNAs that suppress gene expression by targeting the 3’ untranslated region of target mRNAs. This occurs through the seed sequence located in position 2-7/8 of the miRNA guide strand, once it is loaded into the RNA induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This results in induction of Death Induced by Survival gene Elimination (DISE), through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small (sm)RNA sequencing to the genome. However, the analysis of any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow, solely based on the exact position 2–7 of any short (s)RNA that can enter the RISC. Therefore, we developed SPOROS, a semi-automated pipeline that produces multiple useful outputs to predict and compare 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. We provide two examples to illustrate the capabilities of SPOROS: Example one involves the analysis of RISC-bound sRNAs in a cancer cell line (either wild-type or two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (either from normal or Alzheimer’s patients). Our methods (found at https://github.com/ebartom/SPOROS and at Code Ocean: https://doi.org/10.24433/CO.1732496.v1) are designed to be used to analyze a variety of smRNA Seq data in various normal and disease settings.  相似文献   

13.
We report the development and testing of software called QuantiFly: an automated tool to quantify Drosophila egg laying. Many laboratories count Drosophila eggs as a marker of fitness. The existing method requires laboratory researchers to count eggs manually while looking down a microscope. This technique is both time-consuming and tedious, especially when experiments require daily counts of hundreds of vials. The basis of the QuantiFly software is an algorithm which applies and improves upon an existing advanced pattern recognition and machine-learning routine. The accuracy of the baseline algorithm is additionally increased in this study through correction of bias observed in the algorithm output. The QuantiFly software, which includes the refined algorithm, has been designed to be immediately accessible to scientists through an intuitive and responsive user-friendly graphical interface. The software is also open-source, self-contained, has no dependencies and is easily installed (https://github.com/dwaithe/quantifly). Compared to manual egg counts made from digital images, QuantiFly achieved average accuracies of 94% and 85% for eggs laid on transparent (defined) and opaque (yeast-based) fly media. Thus, the software is capable of detecting experimental differences in most experimental situations. Significantly, the advanced feature recognition capabilities of the software proved to be robust to food surface artefacts like bubbles and crevices. The user experience involves image acquisition, algorithm training by labelling a subset of eggs in images of some of the vials, followed by a batch analysis mode in which new images are automatically assessed for egg numbers. Initial training typically requires approximately 10 minutes, while subsequent image evaluation by the software is performed in just a few seconds. Given the average time per vial for manual counting is approximately 40 seconds, our software introduces a timesaving advantage for experiments starting with as few as 20 vials. We also describe an optional acrylic box to be used as a digital camera mount and to provide controlled lighting during image acquisition which will guarantee the conditions used in this study.  相似文献   

14.
Female genital schistosomiasis (FGS) is characterized by a pattern of lesions which manifest at the cervix and the vagina, such as homogeneous and grainy sandy patches, rubbery papules in addition to neovascularization. A tool for quantification of the lesions is needed to improve FGS research and control programs. Hitherto, no tools are available to quantify clinical pathology at the cervix in a standardized and reproducible manner. This study aimed to develop and validate a cervical lesion proportion (CLP) measure for quantification of cervical pathology in FGS. A digital imaging technique was applied in which a grid containing 424 identical squares was positioned on high resolution digital images from the cervix of 70 women with FGS. CLP was measured for each image by observers counting the total number of squares containing at least one type of FGS associated lesion. For assessment of inter- and intra-observer reliability, three different observers measured CLP independently. In addition, a rubbery papule count (RPC) was determined in a similar manner. The intraclass correlation coefficient was 0.94 (excellent) for the CLP inter-rater reliability and 0.90 (good) for intra-rater reliability and the coefficients for the RPC were 0.88 and 0.80 (good), respectively. The CLP facilitated a reliable and reproducible quantification of FGS associated lesions of the cervix. In the future, grading of cervical pathology by CLP may provide insight into the natural course of schistosome egg-induced pathology of the cervix and may have a role in assessing praziquantel treatment efficacy against FGS.Trial Registration: ClinicalTrials.gov, trial number NCT04115072; trial URL https://clinicaltrials.gov/ct2/show/NCT04115072?term=Female+genital+schistosomiasis+AND+Madagascar&draw=2&rank=1.  相似文献   

15.
Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that present challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions. We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. Using publicly available data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. Users can email vog.cdc@ecarteborcim for support. The source code is available at https://github.com/cdcgov/microbetrace.  相似文献   

16.
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trails corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems’ output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems’ annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested with the individual corpus providers.  相似文献   

17.
Cellular force generation and force transmission are of fundamental importance for numerous biological processes and can be studied with the methods of Traction Force Microscopy (TFM) and Monolayer Stress Microscopy. Traction Force Microscopy and Monolayer Stress Microscopy solve the inverse problem of reconstructing cell-matrix tractions and inter- and intra-cellular stresses from the measured cell force-induced deformations of an adhesive substrate with known elasticity. Although several laboratories have developed software for Traction Force Microscopy and Monolayer Stress Microscopy computations, there is currently no software package available that allows non-expert users to perform a full evaluation of such experiments. Here we present pyTFM, a tool to perform Traction Force Microscopy and Monolayer Stress Microscopy on cell patches and cell layers grown in a 2-dimensional environment. pyTFM was optimized for ease-of-use; it is open-source and well documented (hosted at https://pytfm.readthedocs.io/) including usage examples and explanations of the theoretical background. pyTFM can be used as a standalone Python package or as an add-on to the image annotation tool ClickPoints. In combination with the ClickPoints environment, pyTFM allows the user to set all necessary analysis parameters, select regions of interest, examine the input data and intermediary results, and calculate a wide range of parameters describing forces, stresses, and their distribution. In this work, we also thoroughly analyze the accuracy and performance of the Traction Force Microscopy and Monolayer Stress Microscopy algorithms of pyTFM using synthetic and experimental data from epithelial cell patches.  相似文献   

18.
Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database and the source code is available under the GPLV3 license available on GitHub: https://github.com/UcarLab/QuIN/.
This is a PLOS Computational Biology Software paper.
  相似文献   

19.
20.
Intrinsically disordered proteins or, regions perform important biological functions through their dynamic conformations during binding. Thus accurate identification of these disordered regions have significant implications in proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict was conducted with independent test datasets. The results were consistent with both the training and test error minimal. The use of multiple data sources, makes the predictor generic. The datasets used in developing the model include disordered regions of various length which are categorized as short and long having different compositions, different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state of the art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号