Similar literature
 20 similar documents found
1.
Existing methods for identifying structural variants (SVs) from short read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools–Lumpy, Delly and SoftSearch–and demonstrate Wham’s ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from github (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology software paper.

2.
A variety of protein domain predictors have been developed in recent years to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Considering that nearly 40% of multidomain proteins contain one or more discontinuous domains, we have developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling the continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments with the profiles from a single-domain library derived from SCOP, CATH, and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score and a profile-profile alignment score. DomEx recalled 32.3% of discontinuous domains with 86.5% precision when tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, in which the predicted domain segments are within ±20 residues of the boundary definitions in CATH 3.5. Compared with our recently developed predictor, ThreaDom, which is the state-of-the-art tool to detect discontinuous domains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision in a benchmark with 29 discontinuous-domain chains, where ThreaDom failed to predict any discontinuous domains. Furthermore, combined with ThreaDom, the method ranked number one among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.

3.
4.
5.
Metabolomics and proteomics, like other omics domains, usually face a data mining challenge in providing an understandable output to advance in biomarker discovery and precision medicine. Often, statistical analysis is one of the most difficult challenges and it is critical in the subsequent biological interpretation of the results. Because of this, combined with the computational programming skills needed for this type of analysis, several bioinformatic tools aimed at simplifying metabolomics and proteomics data analysis have emerged. However, sometimes the analysis is still limited to a few hidebound statistical methods and to data sets with limited flexibility. POMAShiny is a web-based tool that provides a structured, flexible and user-friendly workflow for the visualization, exploration and statistical analysis of metabolomics and proteomics data. This tool integrates several statistical methods, some of them widely used in other types of omics, and it is based on the POMA R/Bioconductor package, which increases the reproducibility and flexibility of analyses outside the web environment. POMAShiny and POMA are both freely available at https://github.com/nutrimetabolomics/POMAShiny and https://github.com/nutrimetabolomics/POMA, respectively.

6.
Septin proteins bind GTP and heterooligomerize into filaments with conserved functions across a wide range of eukaryotes. Most septins hydrolyze GTP, altering the oligomerization interfaces; yet mutations designed to abolish nucleotide binding or hydrolysis by yeast septins perturb function only at high temperatures. Here, we apply an unbiased mutational approach to this problem. Mutations causing defects at high temperature mapped exclusively to the oligomerization interface encompassing the GTP-binding pocket, or to the pocket itself. Strikingly, cold-sensitive defects arise when certain of these same mutations are coexpressed with a wild-type allele, suggestive of a novel mode of dominance involving incompatibility between mutant and wild-type molecules at the septin–septin interfaces that mediate filament polymerization. A different cold-sensitive mutant harbors a substitution in an unstudied but highly conserved region of the septin Cdc12. A homologous domain in the small GTPase Ran allosterically regulates GTP-binding domain conformations, pointing to a possible new functional domain in some septins. Finally, we identify a mutation in septin Cdc3 that restores the high-temperature assembly competence of a mutant allele of septin Cdc10, likely by adopting a conformation more compatible with nucleotide-free Cdc10. Taken together, our findings demonstrate that GTP binding and hydrolysis promote, but are not required for, one-time events (presumably oligomerization-associated conformational changes) during assembly of the building blocks of septin filaments. Restrictive temperatures impose conformational constraints on mutant septin proteins, preventing new assembly and in certain cases destabilizing existing assemblies. These insights from yeast relate directly to disease-causing mutations in human septins.

7.
Perturbations of protein-protein interactions (PPIs) were found to be the main cause of cancer. Previous PPI prediction methods, which were trained with non-disease general PPI data, were not suited to mapping the PPI network in cancer. Therefore, we established a novel cancer-specific PPI prediction method dubbed NECARE, which is based on a relational graph convolutional network (R-GCN) with knowledge-based features. It achieved the best performance compared with other methods, with a Matthews correlation coefficient (MCC) = 0.84±0.03 and an F1 = 91±2%. With NECARE, we mapped the cancer interactome atlas and revealed that the perturbations of PPIs were enriched on 1362 genes, which were named cancer hub genes. Those genes were found to be enriched for mutations occurring at protein-macromolecule binding interfaces. Furthermore, over 56% of cancer treatment-related genes belonged to hub genes and they were significantly related to the prognosis of 32 types of cancers. Finally, by coimmunoprecipitation, we confirmed that the NECARE prediction method was highly reliable, with a 90% accuracy. Overall, we provide a novel network-based cancer protein-protein interaction prediction method and map the perturbation of the cancer interactome. NECARE is available at: https://github.com/JiajunQiu/NECARE.
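The two metrics quoted in the abstract above (MCC and F1) can be computed from a binary confusion matrix. The following is a minimal sketch of the standard definitions only; it is not taken from the NECARE code base, the function names are hypothetical, and returning 0.0 for an undefined ratio is a convention assumed here.

```python
from math import sqrt


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score (harmonic mean of precision and recall)."""
    denom = 2 * tp + fp + fn
    # Assumed convention: report 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0
```

For example, a hypothetical predictor with 90 true positives, 10 false positives, 10 false negatives and 90 true negatives would be scored with `mcc(90, 10, 10, 90)` and `f1_score(90, 10, 10)`.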

8.
We present Virtual Pharmacist, a web-based platform that takes common types of high-throughput data, namely microarray SNP genotyping data, FASTQ and Variant Call Format (VCF) files as inputs, and reports potential drug responses in terms of efficacy, dosage and toxicity at a glance. Batch submission facilitates multivariate analysis or data mining of targeted groups. Individual analysis consists of a report that is readily comprehensible to patients and practitioners who have basic knowledge in pharmacology, a table that summarizes variants and potential affected drug response according to the US Food and Drug Administration pharmacogenomic biomarker labeled drug list and PharmGKB, and visualization of a gene-drug-target network. Group analysis provides the distribution of the variants and potential affected drug response of a target group, a sample-gene variant count table, and a sample-drug count table. Our analysis of genomes from the 1000 Genomes Project underlines the potentially differential drug responses among different human populations. Even within the same population, the findings from Watson’s genome highlight the importance of personalized medicine. Virtual Pharmacist can be accessed freely at http://www.sustc-genome.org.cn/vp or installed as a local web server. The code and documentation are available at the GitHub repository (https://github.com/VirtualPharmacist/vp). Administrators can download the source code to customize access settings for further development.

9.
Genomic enrichment methods and next-generation sequencing produce uneven coverage for the portions of the genome (the loci) they target; this information is essential for ascertaining the suitability of each locus for further analysis. lociNGS is a user-friendly accessory program that takes multi-FASTA formatted loci, next-generation sequence alignments and demographic data as input and collates, displays and outputs information about the data. Summary information includes the parameters coverage per locus, coverage per individual and number of polymorphic sites, among others. The program can output the raw sequences used to call loci from next-generation sequencing data. lociNGS also reformats subsets of loci in three commonly used formats for multi-locus phylogeographic and population genetics analyses – NEXUS, IMa2 and Migrate. lociNGS is available at https://github.com/SHird/lociNGS and is dependent on installation of MongoDB (freely available at http://www.mongodb.org/downloads). lociNGS is written in Python and is supported on MacOSX and Unix; it is distributed under a GNU General Public License.

10.
Immunotherapies provide effective treatments for previously untreatable tumors and identifying tumor-specific epitopes can help elucidate the molecular determinants of therapy response. Here, we describe a pipeline, ISOTOPE (ISOform-guided prediction of epiTOPEs In Cancer), for the comprehensive identification of tumor-specific splicing-derived epitopes. Using RNA sequencing and mass spectrometry for MHC-I associated proteins, ISOTOPE identified neoepitopes from tumor-specific splicing events that are potentially presented by MHC-I complexes. Analysis of multiple samples indicates that splicing alterations may affect the production of self-epitopes and generate more candidate neoepitopes than somatic mutations. Although there was no difference in the number of splicing-derived neoepitopes between responders and non-responders to immune therapy, higher MHC-I binding affinity was associated with a positive response. Our analyses highlight the diversity of the immunogenic impacts of tumor-specific splicing alterations and the importance of studying splicing alterations to fully characterize tumors in the context of immunotherapies. ISOTOPE is available at https://github.com/comprna/ISOTOPE.

11.
Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and yields essentially unbiased estimates with minimal sampling variances at moderately high depths of coverage, regardless of the mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy–Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.
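To make the idea of testing genotypic deviation from Hardy–Weinberg equilibrium concrete, here is a minimal textbook sketch. It assumes genotypes are already known without error, which is exactly the simplification the method above avoids, so it should be read as background rather than as the authors' estimator. The function names are hypothetical and the test shown is the classical one-degree-of-freedom chi-square test.

```python
from math import erfc, sqrt


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Maximum-likelihood frequency of allele A from known genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no individuals sampled")
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square statistic and p-value (1 d.f.) for deviation from HWE."""
    n = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("site is monomorphic; the test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 30 AA, 40 Aa and 30 aa individuals, the allele frequency is 0.5 and the heterozygote deficit gives a chi-square statistic of 4.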

12.
DNA methylation is an epigenetic modification critical for normal development and diseases. The determination of genome-wide DNA methylation at single-nucleotide resolution is made possible by sequencing bisulfite treated DNA with next generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging as only a limited proportion of them (around 50–70%) can be aligned uniquely; a significant proportion, known as multireads, are mapped to multiple locations and thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we develop a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in the overall mapping efficiency. Moreover, the assignment model shows robust performance with low coverage depth, making it particularly attractive considering the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge on mutation rate and context specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS.

13.
We present a de novo re-determination of the secondary (2°) structure and domain architecture of the 23S and 5S rRNAs, using 3D structures, determined by X-ray diffraction, as input. In the traditional 2° structure, the center of the 23S rRNA is an extended single strand, which in 3D is seen to be compact and double helical. Accurately assigning nucleotides to helices compels a revision of the 23S rRNA 2° structure. Unlike the traditional 2° structure, the revised 2° structure of the 23S rRNA shows architectural similarity with the 16S rRNA. The revised 2° structure also reveals a clear relationship with the 3D structure and is generalizable to rRNAs of other species from all three domains of life. The 2° structure revision required us to reconsider the domain architecture. We partitioned the 23S rRNA into domains through analysis of molecular interactions, calculations of 2D folding propensities and compactness. The best domain model for the 23S rRNA contains seven domains, not six as previously ascribed. Domain 0 forms the core of the 23S rRNA, to which the other six domains are rooted. Editable 2° structures mapped with various data are provided (http://apollo.chemistry.gatech.edu/RibosomeGallery).

14.
15.
microRNAs (miRNAs) are noncoding short (s)RNAs (18-22 nt long) that suppress gene expression by targeting the 3’ untranslated region of target mRNAs. This occurs through the seed sequence located in position 2-7/8 of the miRNA guide strand, once it is loaded into the RNA induced silencing complex (RISC). G-rich 6mer seed sequences can kill cells by targeting C-rich 6mer seed matches located in genes that are critical for cell survival. This results in induction of Death Induced by Survival gene Elimination (DISE), through a mechanism we have called 6mer seed toxicity. miRNAs are often quantified in cells by aligning the reads from small (sm)RNA sequencing to the genome. However, the analysis of any smRNA Seq data set for predicted 6mer seed toxicity requires an alternative workflow, solely based on the exact position 2–7 of any short (s)RNA that can enter the RISC. Therefore, we developed SPOROS, a semi-automated pipeline that produces multiple useful outputs to predict and compare 6mer seed toxicity of cellular sRNAs, regardless of their nature, between different samples. We provide two examples to illustrate the capabilities of SPOROS: Example one involves the analysis of RISC-bound sRNAs in a cancer cell line (either wild-type or two mutant lines unable to produce most miRNAs). Example two is based on a publicly available smRNA Seq data set from postmortem brains (either from normal or Alzheimer’s patients). Our methods (found at https://github.com/ebartom/SPOROS and at Code Ocean: https://doi.org/10.24433/CO.1732496.v1) are designed to be used to analyze a variety of smRNA Seq data in various normal and disease settings.
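The core operation the abstract above relies on, taking positions 2–7 of each short RNA and tallying the resulting 6mer seeds, is simple to sketch. This is an illustration only and not SPOROS code: the function names, the choice to normalise reads to the RNA alphabet, and the (sequence, count) input format are all assumptions.

```python
from collections import Counter


def seed_6mer(srna: str) -> str:
    """Return the 6mer seed (positions 2-7, counted from 1) of a short RNA."""
    # Assumption: reads may arrive as DNA, so T is normalised to U.
    seq = srna.strip().upper().replace("T", "U")
    if len(seq) < 7:
        raise ValueError("read is too short to contain positions 2-7")
    return seq[1:7]  # 0-based slice covering 1-based positions 2..7


def g_fraction(seed: str) -> float:
    """Fraction of G nucleotides in a seed, a crude proxy for G-richness."""
    return seed.count("G") / len(seed)


def seed_counts(reads: list[tuple[str, int]]) -> Counter:
    """Sum read counts per 6mer seed over (sequence, count) pairs."""
    counts: Counter = Counter()
    for sequence, n in reads:
        counts[seed_6mer(sequence)] += n
    return counts
```

Applied to the let-7a guide strand `UGAGGUAGUAGGUUGUAUAGUU`, `seed_6mer` returns `GAGGUA`; reads from different sRNA species that share positions 2–7 collapse into the same seed in `seed_counts`, which is what makes the comparison independent of the sRNA's origin.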

16.
SPOR domains are ∼70 amino acids long and occur in >1,500 proteins identified by sequencing of bacterial genomes. The SPOR domains in the FtsN cell division proteins from Escherichia coli and Caulobacter crescentus have been shown to bind peptidoglycan. Besides FtsN, E. coli has three additional SPOR domain proteins: DamX, DedD, and RlpA. We show here that all three of these proteins localize to the septal ring in E. coli. The loss of DamX or DedD either alone or in combination with mutations in genes encoding other division proteins resulted in a variety of division phenotypes, demonstrating that DamX and DedD participate in cytokinesis. In contrast, RlpA mutants divided normally. Follow-up studies revealed that the SPOR domains themselves localize to the septal ring in vivo and bind peptidoglycan in vitro. Even SPOR domains from heterologous organisms, including Aquifex aeolicus, localized to septal rings when produced in E. coli and bound to purified E. coli peptidoglycan sacculi. We speculate that SPOR domains localize to the division site by binding preferentially to septal peptidoglycan. We further suggest that SPOR domain proteins are a common feature of the division apparatus in bacteria. DamX was characterized further and found to interact with multiple division proteins in a bacterial two-hybrid assay. One interaction partner is FtsQ, and several synthetic phenotypes suggest that DamX is a negative regulator of FtsQ function.

Cell division in Escherichia coli is mediated by a collection of approximately 20 proteins, all of which localize to the midcell, where they form a structure called the septal ring, or divisome. About half of these proteins are essential for cell division. The corresponding temperature-sensitive mutants or depletion strains become filamentous and die under nonpermissive conditions. The remaining proteins are not essential under most laboratory conditions. In some cases null mutations reveal modest division defects, but in other cases division defects become apparent only under certain growth conditions or in combination with mutations in genes for other division proteins. For reviews of this topic, see references 18, 22, 29, and 67.

One of the essential cell division proteins is a bitopic membrane protein named FtsN (see Fig. 1A) (13, 14). How FtsN facilitates cell division is not clear. Because overproduction of FtsN rescues a variety of mutants with lesions in genes for other cell division proteins [ftsA(Ts), ftsI(Ts), ftsQ(Ts), ftsEX null, ftsK null, and ftsP (sufI) null strains], it seems likely that one function of FtsN is to improve the assembly and/or stability of the septal ring (13, 20, 24, 30, 58, 63). Very recent evidence indicates that FtsN plays an important role in triggering constriction, probably by allosteric activation of some other component of the septal ring (26).

FIG. 1. SPOR domain proteins included in this study. (A) Membrane topology and number of amino acids in each domain as retrieved from UniProt release 15.7 (http://www.uniprot.org) or the GTOP update of 15 December 2008 (http://spock.genes.nig.ac.jp/∼genome/gtop.html). N, amino terminus; CM, cytoplasmic membrane; OM, outer membrane. RlpA and VPA1294 have a covalently attached lipid at their amino termini. (B) Multiple-sequence alignment of SPOR domains shown in the present study to localize to the septal ring of E. coli. Sequences were aligned manually to the position-specific scoring matrix (PSSM) from http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi with the SPOR domain (Pfam accession no. 05036) as the PSSM identifier (PSSM ID). Residues with identity to those in the consensus sequence from the PSSM alignment are shaded gray. Numbers to the left refer to the first positions of the SPOR domains in the indicated proteins.

A notable feature of FtsN is that it contains at its C terminus a peptidoglycan (PG) binding domain known as a SPOR domain (Pfam accession no. 05036) (23, 65, 72). SPOR domains are both common and widespread in bacteria. At the time of this writing (August 2009), over 1,500 proteins that contain a SPOR domain are listed in the Pfam database (23). These proteins come from over 500 bacterial species. The domain is named after the founding member of the protein family, a Bacillus subtilis protein named CwlC that is produced relatively late in the process of sporulation (41). CwlC, which comprises an N-terminal amidase domain and a C-terminal SPOR domain, facilitates release of the mature spore by degrading PG in the mother cell (48, 61).

Our interest in SPOR domain proteins was piqued during a study of Vibrio parahaemolyticus (in collaboration with Linda McCarter) when we observed that a gene of unknown function, designated vpa1294, is highly induced in V. parahaemolyticus swarmer cells. The VPA1294 protein was annotated as a “putative DamX-related protein” (44; http://genome.gen-info.osaka-u.ac.jp/bacteria/vpara/). To learn about DamX, we turned to the EcoGene website (http://ecogene.org/) (57), which noted that (i) DamX from E. coli has an essentially unknown function, (ii) overproduction of DamX inhibits cell division (43), and (iii) DamX is one of four E. coli proteins that contain a SPOR domain, the others being the cell division protein FtsN and two proteins of unknown function, DedD and RlpA. Based on this information, we decided to investigate whether DamX, DedD, and RlpA are involved in cell division in E. coli. While this work was in progress, the Thanbichler laboratory demonstrated that Caulobacter crescentus has an FtsN-like protein that is needed for cell division (49) and the de Boer laboratory published a report on DamX, DedD, and RlpA from E. coli (26). We also learned that J. Maddock's laboratory has been investigating DamX, DedD, and RlpA from E. coli (personal communication). Importantly, the major findings from all four laboratories are in general agreement: SPOR domain proteins are widespread in bacteria, many of these proteins are involved in cell division, and SPOR domains are sufficient for septal localization, probably because SPOR domains bind to septal PG.

17.
Evolutionary conservation is a fundamental resource for predicting the substitutability of amino acids and the loss of function in proteins. The use of multiple sequence alignment alone, without considering the evolutionary relationships among sequences, results in the redundant counting of evolutionarily related alteration events, as if they were independent. Here, we propose a new method, PHACT, that predicts the pathogenicity of missense mutations directly from the phylogenetic tree of proteins. PHACT travels through the nodes of the phylogenetic tree and evaluates the deleteriousness of a substitution based on the probability differences of ancestral amino acids between neighboring nodes in the tree. Moreover, PHACT assigns weights to each node in the tree based on their distance to the query organism. For each potential amino acid substitution, the algorithm generates a score that is used to calculate the effect of substitution on protein function. To analyze the predictive performance of PHACT, we performed various experiments over the subsets of two datasets that include 3,023 proteins and 61,662 variants in total. The experiments demonstrated that our method outperformed the widely used pathogenicity prediction tools (i.e., SIFT and PolyPhen-2) and achieved a better predictive performance than other conventional statistical approaches presented in dbNSFP. The PHACT source code is available at https://github.com/CompGenomeLab/PHACT.

18.
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5′ UTR; the reverse complement of the 5′ UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3′ UTR. For eleven of these elements (the stems in SL1–8, reverse complement of SL1–4, FSE, s2m and 3′ UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets (‘FARFAR2-SARS-CoV-2’, https://github.com/DasLab/FARFAR2-SARS-CoV-2; and ‘FARFAR2-Apo-Riboswitch’, https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.

19.
Target identification is one of the most critical steps following cell-based phenotypic chemical screens aimed at identifying compounds with potential uses in cell biology and for developing novel disease therapies. Current in silico target identification methods, including chemical similarity database searches, are limited to single or sequential ligand analyses that have limited capabilities for accurate deconvolution of a large number of compounds with diverse chemical structures. Here, we present CSNAP (Chemical Similarity Network Analysis Pulldown), a new computational target identification method that utilizes chemical similarity networks for large-scale chemotype (consensus chemical pattern) recognition and drug target profiling. Our benchmark study showed that CSNAP can achieve an overall higher accuracy (>80%) of target prediction with respect to representative chemotypes in large (>200) compound sets, in comparison to the SEA approach (60–70%). Additionally, CSNAP is capable of integrating with biological knowledge-based databases (UniProt, GO) and high-throughput biology platforms (proteomic, genetic, etc.) for system-wise drug target validation. To demonstrate the utility of the CSNAP approach, we combined CSNAP's target prediction with experimental ligand evaluation to identify the major mitotic targets of hit compounds from a cell-based chemical screen and we highlight novel compounds targeting microtubules, an important cancer therapeutic target. The CSNAP method is freely available and can be accessed from the CSNAP web server (http://services.mbi.ucla.edu/CSNAP/).

20.

Background

Assembling genes from next-generation sequencing data is not only time consuming but computationally difficult, particularly for taxa without a closely related reference genome. Assembling even a draft genome using de novo approaches can take days, even on a powerful computer, and these assemblies typically require data from a variety of genomic libraries. Here we describe software that will alleviate these issues by rapidly assembling genes from distantly related taxa using a single library of paired-end reads: aTRAM, automated Target Restricted Assembly Method. The aTRAM pipeline uses a reference sequence, BLAST, and an iterative approach to target and locally assemble the genes of interest.

Results

Our results demonstrate that aTRAM rapidly assembles genes across distantly related taxa. In comparative tests with a closely related taxon, aTRAM assembled the same sequence as reference-based and de novo approaches taking on average < 1 min per gene. As a test case with divergent sequences, we assembled >1,000 genes from six taxa ranging from 25 – 110 million years divergent from the reference taxon. The gene recovery was between 97 – 99% from each taxon.

Conclusions

aTRAM can quickly assemble genes across distantly related taxa, obviating the need for draft genome assembly of all taxa of interest. Because aTRAM uses a targeted approach, loci can be assembled in minutes depending on the size of the target. Our results suggest that this software will be useful in rapidly assembling genes for phylogenomic projects covering a wide taxonomic range, as well as other applications. The software is freely available at http://www.github.com/juliema/aTRAM.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0515-2) contains supplementary material, which is available to authorized users.
