Similar articles
1.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch) and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.

2.
3.
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

4.
5.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves a quality of homology detection comparable to that of a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. PROCAIN, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.

6.
Read-depths (RDs) are frequently used in identifying structural variants (SVs) from sequencing data. Existing RD-based SV callers struggle to determine breakpoints at single-nucleotide resolution because of the noisiness of RD data and the bin-based calculation. In this paper, we propose to use the deep segmentation model UNet to learn base-wise RD patterns surrounding breakpoints of known SVs. We integrate model predictions with an RD-based SV caller to refine breakpoints to single-nucleotide resolution. We show that UNet can be trained with a small amount of data and can be applied both in-sample and cross-sample. An enhancement pipeline named RDBKE significantly increases the number of SVs with more precise breakpoints on simulated and real data. The source code of RDBKE is freely available at https://github.com/yaozhong/deepIntraSV.
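
The bin-size limitation mentioned in this abstract can be illustrated with a toy calculation. The sketch below is not RDBKE or its UNet model; the function names, the depth values and the bin size are all assumptions chosen for illustration. It only shows that once per-base depth is averaged into bins, a depth change can be placed no more precisely than the bin that contains it.

```python
def binned_depth(per_base_depth, bin_size):
    """Average per-base read depth over consecutive, non-overlapping bins."""
    bins = []
    for start in range(0, len(per_base_depth), bin_size):
        window = per_base_depth[start:start + bin_size]
        bins.append(sum(window) / len(window))
    return bins


def largest_step(values):
    """Index i with the largest absolute change between values[i - 1] and values[i]."""
    return max(range(1, len(values)), key=lambda i: abs(values[i] - values[i - 1]))


# Hypothetical signal: depth 30 for 13 bases, then 15 (e.g. a heterozygous deletion).
depth = [30] * 13 + [15] * 17
bins = binned_depth(depth, bin_size=10)

print(largest_step(depth))  # base-wise: the change starts at base index 13
print(bins)                 # binned: the change is smeared across the middle bin
print(largest_step(bins))   # bin index 1, i.e. somewhere within bases 10-19
```

Real read-depth data are noisy, so a base-wise argmax like `largest_step(depth)` would not be reliable in practice; that gap is what the abstract's learned model is meant to address.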

7.
To uncover the meaning of the ‘book of life’, this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.

8.
Identifying copy number variants (CNVs) can provide diagnoses to patients and provide important biological insights into human health and disease. Current exome and targeted sequencing approaches cannot detect clinically and biologically-relevant CNVs outside their target area. We present SavvyCNV, a tool which uses off-target read data from exome and targeted sequencing data to call germline CNVs genome-wide. Up to 70% of sequencing reads from exome and targeted sequencing fall outside the targeted regions. We have developed a new tool, SavvyCNV, to exploit this ‘free data’ to call CNVs across the genome. We benchmarked SavvyCNV against five state-of-the-art CNV callers using truth sets generated from genome sequencing data and Multiplex Ligation-dependent Probe Amplification assays. SavvyCNV called CNVs with high precision and recall, outperforming the five other tools at calling CNVs genome-wide, using off-target or on-target reads from targeted panel and exome sequencing. We then applied SavvyCNV to clinical samples sequenced using a targeted panel and were able to call previously undetected clinically-relevant CNVs, highlighting the utility of this tool within the diagnostic setting. SavvyCNV outperforms existing tools for calling CNVs from off-target reads. It can call CNVs genome-wide from targeted panel and exome data, increasing the utility and diagnostic yield of these tests. SavvyCNV is freely available at https://github.com/rdemolgen/SavvySuite.

9.
A synchrotron X-ray microscope is a powerful imaging apparatus for taking high-resolution and high-contrast X-ray images of nanoscale objects. A sufficient number of X-ray projection images from different angles is required for constructing 3D volume images of an object. Because a synchrotron light source is immobile, a rotational object holder is required for tomography. At a resolution of 10 nm per pixel, the vibration of the holder caused by rotating the object cannot be disregarded if tomographic images are to be reconstructed accurately. This paper presents a computer method to compensate for the vibration of the rotational holder by aligning neighboring X-ray images. This alignment process involves two steps. The first step is to match the “projected feature points” in the sequence of images. The matched projected feature points in the - plane should form a set of sine-shaped loci. The second step is to fit the loci to a set of sine waves to compute the parameters required for alignment. The experimental results show that the proposed method outperforms two previously proposed methods, Xradia and SPIDER. The developed software system can be downloaded from the URL, http://www.cs.nctu.edu.tw/~chengchc/SCTA or http://goo.gl/s4AMx.
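
The second step of this abstract (fitting a locus to a sine wave) can be sketched as an ordinary linear least-squares problem. This is a minimal illustration, not the authors' software. The model x(θ) = a·sin θ + b·cos θ + c, the function names `solve3`, `fit_sine` and `residuals`, and the synthetic data are all assumptions. Reading the per-projection residual as the shift that would realign a projection is likewise an assumption made here for illustration.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def fit_sine(thetas, xs):
    """Least-squares fit of x = A*sin(theta + phase) + offset via normal equations."""
    rows = [(math.sin(t), math.cos(t), 1.0) for t in thetas]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * x for r, x in zip(rows, xs)) for i in range(3)]
    a, b, c = solve3(ata, atx)
    # A*sin(theta + phase) = (A*cos(phase))*sin(theta) + (A*sin(phase))*cos(theta)
    return math.hypot(a, b), math.atan2(b, a), c


def residuals(thetas, xs, amplitude, phase, offset):
    """Observed position minus fitted sine value for every projection angle."""
    return [x - (amplitude * math.sin(t + phase) + offset) for t, x in zip(thetas, xs)]


# Synthetic locus of one feature point over 36 projections (10 degree steps).
thetas = [math.radians(10 * k) for k in range(36)]
xs = [5.0 * math.sin(t + 0.3) + 2.0 for t in thetas]
xs[7] += 1.5  # simulated holder jitter in a single projection

amplitude, phase, offset = fit_sine(thetas, xs)
res = residuals(thetas, xs, amplitude, phase, offset)
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(worst)  # index of the projection that deviates most from the fitted sine
```

The fit is linear in (a, b, c), so no iterative optimizer or starting guess is needed; a robust variant would be preferable if many projections are displaced.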

10.
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the ‘structural alignment’ space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The ‘best’ centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.

11.
12.
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/~aniba/alexsys.

13.
The functional integrity of neurons requires the bidirectional active transport of synaptic vesicles (SVs) in axons. The kinesin motor KIF1A transports SVs from somas to stable SV clusters at synapses, while dynein moves them in the opposite direction. However, it is unclear how SV transport is regulated and how SVs at clusters interact with motor proteins. We addressed these questions by isolating a rare temperature-sensitive allele of Caenorhabditis elegans unc-104 (KIF1A) that allowed us to manipulate SV levels in axons and dendrites. Growth at 20° and 14° resulted in locomotion rates that were ∼3 and 50% of wild type, respectively, with similar effects on axonal SV levels. Corresponding with the loss of SVs from axons, mutants grown at 14° and 20° showed a 10- and 24-fold dynein-dependent accumulation of SVs in their dendrites. Mutants grown at 14° and switched to 25° showed an abrupt irreversible 50% decrease in locomotion and a 50% loss of SVs from the synaptic region 12-hr post-shift, with no further decreases at later time points, suggesting that the remaining clustered SVs are stable and resistant to retrograde removal by dynein. The data further showed that the synapse-assembly proteins SYD-1, SYD-2, and SAD-1 protected SV clusters from degradation by motor proteins. In syd-1, syd-2, and sad-1 mutants, SVs accumulate in an UNC-104-dependent manner in the distal axon region that normally lacks SVs. In addition to their roles in SV cluster stability, all three proteins also regulate SV transport.

14.
15.
Idealized models of walking and running demonstrate that, energetically, walking should be favoured up to, and even somewhat over, those speeds and step lengths that can be achieved while keeping the stance leg under compression. Around these speeds, and especially with relatively long step lengths, computer optimization predicts a third, ‘hybrid’, gait: (inverted) pendular running (Srinivasan & Ruina 2006 Nature 439, 72–75 (doi:10.1038/nature04113)). This gait involves both walking-like vaulting mechanics and running-like ballistic paths. Trajectories of horizontal versus vertical centre of mass velocities (‘hodographs’) over the step cycle are distinctive for each gait: anticlockwise for walk; clockwise for run; figure-of-eight for the hybrid gait. Both pheasants and guineafowl demonstrate each gait at close to the predicted speed/step length combinations, although fully aerial ballistic phases are never achieved during the hybrid or ‘Grounded Inverted Pendular Running’ gait.
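
One simple way to tell the three hodograph shapes apart numerically is the signed area enclosed by the closed trajectory (the shoelace formula). This is a rough sketch, not the analysis used in the paper. It assumes horizontal velocity on the x axis and vertical velocity on the y axis of a standard right-handed plot; the function names, the tolerance and the synthetic loops are hypothetical. A symmetric figure-of-eight has lobes of opposite sense that cancel, so it shows up here as zero net rotation; an asymmetric one would not cancel exactly.

```python
import math


def signed_area(points):
    """Shoelace formula: positive for an anticlockwise closed loop, negative for clockwise."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def rotation_sense(points, tol=1e-6):
    """Classify a closed velocity trajectory by the sign of its net enclosed area."""
    area = signed_area(points)
    if area > tol:
        return "anticlockwise"
    if area < -tol:
        return "clockwise"
    return "no net rotation"


# Synthetic closed loops sampled at n points over one step cycle.
n = 100
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
eight = [(math.sin(2 * math.pi * k / n), math.sin(4 * math.pi * k / n) / 2) for k in range(n)]

print(rotation_sense(circle))        # anticlockwise loop
print(rotation_sense(circle[::-1]))  # same loop traversed clockwise
print(rotation_sense(eight))         # symmetric figure-of-eight: lobes cancel
```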

16.
The present paper is a commentary on ‘Identification and characterization of hADSC-derived exosome proteins from different isolation methods’ (Huang et al. 2021; 10.1111/jcmm.16775). Given the enthusiasm for the potential of mesenchymal stromal cell‐derived extracellular vesicles (MSC‐EVs), some considerations deserve attention as they move through successive stages of research and application in humans. We highlight the need to generate evidence that ensures high consistency in the safety, composition and biological activity of the intended MSC‐EV preparations, and that establishes the suitability of disparate isolation techniques for producing efficacious EV preparations and fulfilling requirements for standardized clinical‐grade biomanufacturing.

17.

Background

Computational biology contributes to a variety of areas related to life sciences and, owing to the growing impact of translational medicine (the scientific approach to medicine in close relation with basic science), it is becoming an important player in clinical areas. In this study, we use computational methods to improve our understanding of the complex interactions that occur between molecules related to Rheumatoid Arthritis (RA).

Methodology

Due to the complexity of the disease and the numerous molecular players involved, we devised a method to construct a systemic network of interactions of the processes ongoing in patients affected by RA. The network is based on high-throughput data, refined semi-automatically with carefully curated literature-based information. This global network has then been topologically analysed, as a whole and tissue-specifically, in order to translate the experimental molecular connections into topological motifs meaningful in the identification of tissue-specific markers and targets in the diagnosis, and possibly in the therapy, of RA.

Significance

We find that some nodes in the network that prove to be topologically important, in particular AKT2, IL6, MAPK1 and TP53, are also known to be associated with drugs used for the treatment of RA. Importantly, based on topological considerations, we are also able to suggest CRKL as a novel, potentially relevant molecule for the diagnosis or treatment of RA. This type of finding demonstrates the potential of in silico analyses to produce highly refined hypotheses, based on vast experimental data, that can be tested further and more efficiently. As research on RA is ongoing, the present map is in fieri, despite being, at the moment, a reflection of the state of the art. For this reason we make the network freely available in the standardised and easily exportable .xml CellDesigner format at ‘www.picb.ac.cn/ClinicalGenomicNTW/temp.html’ and ‘www.celldesigner.org’.

18.
DNA nanotechnology exploits the programmable specificity afforded by base-pairing to produce self-assembling macromolecular objects of custom shape. For building megadalton-scale DNA nanostructures, a long ‘scaffold’ strand can be employed to template the assembly of hundreds of oligonucleotide ‘staple’ strands into a planar antiparallel array of cross-linked helices. We recently adapted this ‘scaffolded DNA origami’ method to produce 3D shapes formed as pleated layers of double helices constrained to a honeycomb lattice. However, completing the required design steps can be cumbersome and time-consuming. Here we present caDNAno, an open-source software package with a graphical user interface that aids in the design of DNA sequences for folding 3D honeycomb-pleated shapes. A series of rectangular-block motifs were designed, assembled, and analyzed to identify a well-behaved motif that could serve as a building block for future studies. The use of caDNAno significantly reduces the effort required to design 3D DNA-origami structures. The software is available at http://cadnano.org/, along with example designs and video tutorials demonstrating their construction. The source code is released under the MIT license.

19.
microRNAs (miRNAs) are (18-22nt long) noncoding short (s)RNAs that suppress gene expression by targeting the 3’ untranslated region of target mRNAs. This occurs through the seed sequence located in position 2-7/8 of the miRNA guide strand, once it is loaded into the RNA induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This results in induction of Death Induced by Survival gene Elimination (DISE), through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small (sm)RNA sequencing to the genome. However, the analysis of any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow, solely based on the exact position 2–7 of any short (s)RNA that can enter the RISC. Therefore, we developed SPOROS, a semi-automated pipeline that produces multiple useful outputs to predict and compare 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. We provide two examples to illustrate the capabilities of SPOROS: Example one involves the analysis of RISC-bound sRNAs in a cancer cell line (either wild-type or two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (either from normal or Alzheimer’s patients). Our methods (found at https://github.com/ebartom/SPOROS and at Code Ocean: https://doi.org/10.24433/CO.1732496.v1) are designed to be used to analyze a variety of smRNA Seq data in various normal and disease settings.
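
The core idea of working from "the exact position 2–7" of each short RNA can be sketched in a few lines. This is not the SPOROS pipeline; the function names and the toy reads are assumptions. The G fraction computed here is only a crude stand-in prompted by the abstract's remark about G-rich seeds, not the toxicity score SPOROS reports.

```python
from collections import Counter


def seed_6mer(read):
    """Return positions 2-7 (1-based) of a short RNA read, or None if the read is too short.

    Sequencing reads are often written with T instead of U, so T is converted to U.
    """
    seq = read.upper().replace("T", "U")
    return seq[1:7] if len(seq) >= 7 else None


def seed_counts(reads):
    """Count how often each 6mer seed occurs across a collection of reads."""
    seeds = (seed_6mer(read) for read in reads)
    return Counter(seed for seed in seeds if seed is not None)


def g_fraction(seed):
    """Fraction of G nucleotides in a seed (illustrative proxy only)."""
    return seed.count("G") / len(seed)


# Toy reads for illustration; the third is too short to contain a full seed.
reads = ["UGAGGUAGUAGGUUGUAUAGUU", "TGAGGTAGTAGGTTGTATAGTT", "UGAGG"]
counts = seed_counts(reads)
print(counts)                # both full-length reads share the seed GAGGUA
print(g_fraction("GAGGUA"))  # 0.5
```

Counting by seed rather than by genomic alignment means that reads from different loci, or of different RNA classes, are pooled whenever they share positions 2–7, which matches the abstract's "regardless of their nature".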

20.