Similar Literature
1.

Background

When studying the genetics of a human trait, we typically have to manage both genome-wide and targeted genotype data. The people and markers from different genotyping experiments can overlap, and this overlap can introduce several kinds of problems. Usually the overlapping genotypes agree, but sometimes they differ. Occasionally, the lab will return genotypes using a different allele labeling scheme (for example, 1/2 vs A/C). Sometimes the genotype for a person/marker index is unreliable or missing. Further, over time some markers are merged, and bad samples are re-run under a different sample name. We need a consistent picture of the subset of data we have chosen to work with, even though there may be conflicting measurements from multiple data sources.

Results

We have developed the dbVOR database, which is designed to hold data efficiently for both genome-wide and targeted experiments. The data are indexed for fast retrieval by person and marker. In addition, we store pedigree and phenotype data for our subjects. The dbVOR database allows us to select subsets of the data by several different criteria and to merge their results into a coherent and consistent whole. Data may be filtered by: family, person, trait value, markers, chromosomes, and chromosome ranges. The results can be presented in columnar, Mega2, or PLINK format.
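The abstract does not say how dbVOR reconciles overlapping calls internally. The following minimal Python sketch, assuming each experiment is an in-memory list of (person, marker, genotype) tuples, illustrates the kind of merge described above: recoding alleles reported as 1/2 into letters, discarding missing calls, and flagging person/marker pairs whose remaining calls disagree. `ALLELE_MAP`, `normalize` and `merge_genotypes` are hypothetical names, not part of dbVOR.

```python
from collections import defaultdict

# Hypothetical recoding of numeric allele labels to nucleotides. Real data would
# need a per-marker table, since "1" does not mean the same base at every marker.
ALLELE_MAP = {"1": "A", "2": "C"}
MISSING = {"0/0", "./.", ""}


def normalize(genotype, allele_map=ALLELE_MAP):
    """Return an unordered genotype as a sorted allele tuple, or None if missing."""
    if genotype is None or genotype in MISSING:
        return None
    return tuple(sorted(allele_map.get(a, a) for a in genotype.split("/")))


def merge_genotypes(sources):
    """Merge (person, marker, genotype) records from several experiments.

    Returns (merged, conflicts): `merged` maps (person, marker) to the single
    genotype that all non-missing calls agree on; `conflicts` lists the keys
    whose calls disagree, which are left out of `merged`.
    """
    calls = defaultdict(set)
    for records in sources:
        for person, marker, genotype in records:
            g = normalize(genotype)
            if g is not None:
                calls[(person, marker)].add(g)
    merged, conflicts = {}, []
    for key, values in calls.items():
        if len(values) == 1:
            merged[key] = next(iter(values))
        else:
            conflicts.append(key)
    return merged, sorted(conflicts)


# Usage: the same people typed on a genome-wide array and a targeted panel.
genome_wide = [("P1", "rs1", "A/C"), ("P2", "rs1", "A/A")]
targeted = [("P1", "rs1", "1/2"), ("P2", "rs1", "C/C"), ("P3", "rs1", "0/0")]
merged, conflicts = merge_genotypes([genome_wide, targeted])
```

Here P1's "A/C" and "1/2" calls agree after recoding, P2's calls conflict, and P3's missing call is dropped. Conflicts are reported rather than silently resolved; a real pipeline would need a policy (for example, preferring a trusted source) that this sketch does not assume.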

Conclusions

dbVOR serves our needs well. It is freely available from https://watson.hgen.pitt.edu/register. Documentation for dbVOR can be found at https://watson.hgen.pitt.edu/register/docs/dbvor.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0505-4) contains supplementary material, which is available to authorized users.

2.
3.
As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets have become available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although several computational methods are available that can remove batch effects, evaluating which method performs best is not straightforward. Here, we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

4.
We present a de novo re-determination of the secondary (2°) structure and domain architecture of the 23S and 5S rRNAs, using 3D structures, determined by X-ray diffraction, as input. In the traditional 2° structure, the center of the 23S rRNA is an extended single strand, which in 3D is seen to be compact and double helical. Accurately assigning nucleotides to helices compels a revision of the 23S rRNA 2° structure. Unlike the traditional 2° structure, the revised 2° structure of the 23S rRNA shows architectural similarity with the 16S rRNA. The revised 2° structure also reveals a clear relationship with the 3D structure and is generalizable to rRNAs of other species from all three domains of life. The 2° structure revision required us to reconsider the domain architecture. We partitioned the 23S rRNA into domains through analysis of molecular interactions, calculations of 2D folding propensities and compactness. The best domain model for the 23S rRNA contains seven domains, not six as previously ascribed. Domain 0 forms the core of the 23S rRNA, to which the other six domains are rooted. Editable 2° structures mapped with various data are provided (http://apollo.chemistry.gatech.edu/RibosomeGallery).

5.
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but prediction performance has plateaued over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast, with an inference time of about 160 ms per sequence for sequences up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
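As an illustration of what an "image-like" representation of an RNA sequence can look like, the sketch below builds an L × L map with one channel per ordered nucleotide pair (the outer product of one-hot encodings), plus one extra channel marking Watson-Crick and G-U wobble pairs as a base-pairing-rule channel. The 17-channel layout and the name `pairwise_channels` are assumptions for illustration; the UFold repository is the authority on the exact input the model uses.

```python
BASES = "ACGU"
# Watson-Crick and G-U wobble pairs, assumed here as the "base-pairing rules".
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairwise_channels(seq):
    """Return a 17 x L x L nested list built from an RNA sequence.

    Channels 0-15: channel 4*a + b is 1.0 at (i, j) when seq[i] is BASES[a]
    and seq[j] is BASES[b], i.e. the outer product of one-hot encodings.
    Channel 16: 1.0 at (i, j) when seq[i] and seq[j] can form a canonical
    or G-U pair. Unknown characters leave all channels at 0.0.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    channels = [[[0.0] * n for _ in range(n)] for _ in range(17)]
    for i in range(n):
        for j in range(n):
            a, b = BASES.find(s[i]), BASES.find(s[j])
            if a >= 0 and b >= 0:
                channels[4 * a + b][i][j] = 1.0
            if (s[i], s[j]) in CANONICAL:
                channels[16][i][j] = 1.0
    return channels


image = pairwise_channels("GCAU")  # 17 channels, each 4 x 4
```

Because every channel is an L × L grid, the same convolution filters can slide over sequences of any length, which is one reason a fully convolutional network suits this kind of input.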

6.
Chemical graph generators are software packages that generate computer representations of chemical structures adhering to certain boundary conditions. Their development is a research topic in cheminformatics. Chemical graph generators are used in areas such as virtual library generation in drug design, molecular design with specified properties (called inverse QSAR/QSPR), organic synthesis design, retrosynthesis, and systems for computer-assisted structure elucidation (CASE). CASE systems have regained interest for the structure elucidation of unknowns in computational metabolomics, a current area of computational biology.

7.
Small RNA sequencing (small RNA-seq) for microRNAs (miRNAs) is a rapidly developing field where opportunities still exist to create better bioinformatics tools to process these large datasets and generate new, useful analyses. We built miRge to be a fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries. miRNAs are summarized at the level of raw reads in addition to reads per million (RPM). Reads for all other RNA species (tRNA, rRNA, snoRNA, mRNA) are provided, which is useful for identifying potential contaminants and optimizing small RNA purification strategies. miRge was designed to optimally identify miRNA isomiRs and employs an entropy-based statistical measurement to identify differential production of isomiRs. This allowed us to identify decreasing entropy in isomiRs as stem cells mature into retinal pigment epithelial cells. Conversely, we show that pancreatic tumor miRNAs have similar entropy to matched normal pancreatic tissues. In a head-to-head comparison with other miRNA analysis tools (miRExpress 2.0, sRNAbench, omiRAs, miRDeep2, Chimira, UEA small RNA Workbench), miRge was faster (4- to 32-fold) and was among the top two methods in maximally aligning miRNA reads per sample. Moreover, miRge has no inherent limits to its multiplexing: it was capable of simultaneously analyzing 100 small RNA-seq samples in 52 minutes, providing an integrated analysis of miRNA expression across all samples. As miRge was designed for analysis of single as well as multiple samples, it is an ideal tool for both high- and low-throughput users. miRge is freely available at http://atlas.pathology.jhu.edu/baras/miRge.html.
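The abstract does not define miRge's entropy measure. One common way to quantify isomiR diversity, assumed here, is the Shannon entropy of the fraction of a miRNA's reads assigned to each isomiR. The sketch also shows the reads-per-million (RPM) normalization mentioned above. Both function names are hypothetical and are not miRge's code.

```python
import math


def isomir_entropy(counts):
    """Shannon entropy (bits) of one miRNA's isomiR read-count distribution.

    0.0 means every read maps to a single isoform; higher values mean the
    reads are spread more evenly across isomiRs.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def reads_per_million(count, total_mapped):
    """Normalize a raw read count to reads per million mapped reads (RPM)."""
    return count * 1_000_000 / total_mapped
```

Under this assumed measure, "decreasing entropy in isomiRs" would mean that reads concentrate on fewer isoforms as cells mature, for example moving from counts like `[40, 30, 30]` toward `[90, 5, 5]`.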

8.
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1–8 in the extended 5′ UTR; the reverse complement of the 5′ UTR SL1–4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3′ UTR. For eleven of these elements (the stems in SL1–8, reverse complement of SL1–4, FSE, s2m and 3′ UTR pseudoknot), modeling convergence supports the accuracy of predicted low-energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small-molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets (‘FARFAR2-SARS-CoV-2’, https://github.com/DasLab/FARFAR2-SARS-CoV-2; and ‘FARFAR2-Apo-Riboswitch’, https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.

9.
We have designed and studied antisense oligodeoxynucleotides (oligonucleotides; oligos), which we call ‘pseudo-cyclic oligonucleotides’ (PCOs). PCOs contain two oligonucleotide segments attached through their 3′-3′- or 5′-5′-ends. One of the segments of the PCO is an antisense oligo complementary to a target mRNA, and the other is a short protective oligo that is 5–8 nucleotides long and complementary to the 3′- or 5′-end of the antisense oligo. As a result of complementarity between the antisense and protective oligo segments, PCOs form intramolecular pseudo-cyclic structures in the absence of the target RNA. The antisense oligo segment of the PCOs used for the studies described here is complementary to an 18-nucleotide-long site on the mRNA of the protein kinase A regulatory subunit RI (PKA-RI). Thermal melting studies of PCOs in the absence and presence of the complementary RNA suggest that the pseudo-cyclic structures formed in the absence of the target RNA dissociate, bind to the target RNA, and form heteroduplexes. The results of RNase H cleavage assays suggest that PCOs bind to complementary RNA and activate RNase H in a manner similar to that of an 18-mer conventional antisense PS-oligo. In snake venom (a 3′-exonuclease) or spleen (a 5′-exonuclease) phosphodiesterase digestion studies, PCOs are more stable than conventional antisense oligos because of the presence of 3′-3′- or 5′-5′-linkages and the formation of intramolecular pseudo-cyclic structures. PCOs with a phosphorothioate antisense oligo segment inhibited the growth of MDA-MB-468 and GEO cancer cell lines to an extent similar to that of the conventional antisense PS-oligo, suggesting efficient cellular uptake and target binding. Nuclease stability studies in mice suggest that PCOs have higher in vivo stability than antisense PS-oligos. The studies in mice also showed that PCOs have pharmacokinetic and tissue distribution profiles generally similar to those of antisense PS-oligos, but are rapidly eliminated from selected tissues.

10.
11.
Supervised machine learning is an essential but difficult-to-use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.

This is a PLOS Computational Biology Software paper.

12.
13.
Recent studies have shown that RNA structural motifs play essential roles in RNA folding and interaction with other molecules. Computational identification and analysis of RNA structural motifs remains a challenging task. Existing motif identification methods based on 3D structure may not properly compare motifs with high structural variation. Other structural motif identification methods consider only nested canonical base-pairing structures and cannot be used to identify complex RNA structural motifs, which often contain various non-canonical base pairs due to uncommon hydrogen bond interactions. In this article, we present RNAMotifScan, a novel RNA structural alignment method for RNA structural motif identification that takes into consideration the isosteric (both canonical and non-canonical) base pairs and multi-pairings in RNA structural motifs. The utility and accuracy of RNAMotifScan are demonstrated by searching for kink-turn, C-loop, sarcin-ricin, reverse kink-turn and E-loop motifs against a 23S rRNA (PDB ID: 1S72), in which the occurrences of these motifs are well characterized. Finally, we search for these motifs across all RNA structures in the Protein Data Bank and estimate their abundances. RNAMotifScan is freely available at our supplementary website (http://genome.ucf.edu/RNAMotifScan).

14.
Novel tools for the in silico design of RNA constructs such as riboregulators are needed to reduce the time and cost to production in the development of diagnostic and therapeutic advances. Here, we present MoiRNAiFold, a versatile and user-friendly tool for de novo synthetic RNA design. MoiRNAiFold is based on Constraint Programming, and it includes novel variable types, heuristics and restart strategies for Large Neighborhood Search. Moreover, it can handle dozens of design constraints and quality measures, and it improves features for RNA-based regulation of gene expression, such as Translation Efficiency calculation. We demonstrate that MoiRNAiFold outperforms all previous software on benchmark structural RNA puzzles from EteRNA. Importantly, with regard to biologically relevant RNA designs, we focus on RNA riboregulators, demonstrating that the designed RNA sequences are functional both in vitro and in vivo. Overall, we have generated a powerful tool for de novo complex RNA design that we make freely available as a web server (https://moiraibiodesign.com/design/).
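MoiRNAiFold's constraint model is not described in the abstract. As a minimal illustration of one basic constraint any RNA design tool must respect, the sketch below checks whether a candidate sequence can form every base pair of a target structure in dot-bracket notation, assuming Watson-Crick and G-U wobble pairs are the allowed pairs. `parse_dot_bracket` and `violated_pairs` are hypothetical helpers; this is a feasibility check, not Constraint Programming or Large Neighborhood Search.

```python
# Watson-Crick and G-U wobble pairs, assumed to be the allowed pairs.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def violated_pairs(sequence, structure):
    """List the target base pairs that the candidate sequence cannot form."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = sequence.upper().replace("T", "U")
    return [(i, j) for i, j in parse_dot_bracket(structure)
            if (seq[i], seq[j]) not in ALLOWED_PAIRS]


# Usage: a hairpin whose outermost pair (positions 0 and 8) is G-A.
problems = violated_pairs("GGGAAACCA", "(((...)))")
```

In a constraint-programming formulation, a rule like this would be posted as a constraint that prunes incompatible nucleotide assignments during search rather than checked after the fact; the sketch does not model that search.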

15.
Knowledge of the interactions between proteins and nucleic acids is the basis for understanding various biological activities and designing new drugs. Accurately identifying nucleic-acid-binding residues remains a challenging task. In this paper, we propose an accurate predictor, GraphBind, for identifying nucleic-acid-binding residues on proteins based on an end-to-end graph neural network. Considering that binding sites often exhibit highly conserved patterns in their local tertiary structures, we first construct graphs based on the structural contexts of target residues and their spatial neighborhoods. Then, hierarchical graph neural networks (HGNNs) are used to embed the latent local patterns of structural and bio-physicochemical characteristics for binding residue recognition. We comprehensively evaluate GraphBind on DNA/RNA benchmark datasets. The results demonstrate that GraphBind outperforms state-of-the-art methods. Moreover, GraphBind is extended to other ligand-binding residue prediction tasks to verify its generalization capability. The GraphBind web server is freely available at http://www.csbio.sjtu.edu.cn/bioinf/GraphBind/.
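The abstract says GraphBind builds a graph from each target residue's spatial neighborhood but does not give the construction details. A minimal sketch, assuming one 3D coordinate per residue (for example its C-alpha atom) and a hypothetical distance cutoff `radius`, is a radius graph around the target residue; GraphBind's actual node features, edge features and cutoff are not reproduced here, and the HGNN itself is not shown.

```python
import math


def local_residue_graph(coords, target, radius=10.0):
    """Build the local graph around one target residue.

    coords: list of (x, y, z) residue positions (one point per residue is an
    assumption, e.g. C-alpha atoms). radius: hypothetical cutoff in the same
    units as coords.
    Returns (nodes, edges): nodes are residue indices within `radius` of the
    target, with the target first; edges are (u, v) positions in `nodes`
    for node pairs that are themselves within `radius` of each other.
    """
    center = coords[target]
    nodes = [target] + [i for i, p in enumerate(coords)
                        if i != target and math.dist(p, center) <= radius]
    edges = [(u, v)
             for u in range(len(nodes))
             for v in range(u + 1, len(nodes))
             if math.dist(coords[nodes[u]], coords[nodes[v]]) <= radius]
    return nodes, edges


# Usage: residue 2 is far from residue 0 and is excluded from its neighborhood.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
nodes, edges = local_residue_graph(coords, target=0, radius=6.0)
```

Building one small graph per target residue keeps the input local, which matches the abstract's premise that binding-site patterns are conserved in local tertiary structure.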

16.
17.
18.
19.
20.
Practical identifiability of Systems Biology models has received a lot of attention in recent scientific research. It addresses a question crucial to a model's predictive power: how accurately can the model's parameters be recovered from the available experimental data? Methods based on profile likelihood are among the most reliable methods of practical identification. However, these methods are often computationally demanding or yield inaccurate estimates of parameters' confidence intervals. Developing methods that can accurately produce parameters' confidence intervals in reasonable computational time is of utmost importance for Systems Biology and QSP modeling.

We propose an algorithm, Confidence Intervals by Constraint Optimization (CICO), based on profile likelihood and designed to speed up confidence interval estimation and reduce computational cost. The numerical implementation of the algorithm includes settings to control the accuracy of the confidence interval estimates. The algorithm was tested on a number of Systems Biology models, including the Taxol treatment model and the STAT5 Dimerization model discussed in this article.

The CICO algorithm is implemented in a software package freely available in Julia (https://github.com/insysbio/LikelihoodProfiler.jl) and Python (https://github.com/insysbio/LikelihoodProfiler.py).
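CICO's constrained-optimization formulation and the LikelihoodProfiler API are not reproduced here. The sketch below only illustrates the quantity that profile-likelihood methods estimate: the interval endpoints where the profile deviance rises above its minimum by the 0.95 quantile of a chi-square distribution with one degree of freedom. It assumes a toy model (a normal sample, profiling the mean with the variance maximized out analytically) and uses brute-force bisection, not CICO. The sample is assumed not to be constant.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 degree of freedom


def profile_deviance(mu, data):
    """-2 log profile likelihood of the mean, up to an additive constant.

    For a normal sample, the variance maximizing the likelihood at fixed mu
    is s2(mu) = mean((x - mu)^2), giving n * log(s2(mu)) + const.
    """
    s2 = sum((x - mu) ** 2 for x in data) / len(data)
    return len(data) * math.log(s2)


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of f between lo and hi; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(max_iter):
        if abs(hi - lo) <= tol:
            break
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profile_ci(data, threshold=CHI2_1_95):
    """Profile-likelihood confidence interval for the mean of a normal sample."""
    mu_hat = sum(data) / len(data)
    d_min = profile_deviance(mu_hat, data)

    def excess(mu):
        # Negative inside the interval, positive outside it.
        return profile_deviance(mu, data) - d_min - threshold

    step = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))
    bounds = []
    for direction in (-1.0, 1.0):
        s = step
        while excess(mu_hat + direction * s) < 0:  # step outward until crossing
            s *= 2.0
        bounds.append(bisect(excess, mu_hat, mu_hat + direction * s))
    return bounds[0], bounds[1]


lower, upper = profile_ci([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this toy model the endpoints have a closed form, mu_hat ± sqrt(s2_hat * (exp(threshold / n) - 1)), which makes it a convenient check; real Systems Biology models need numerical optimization of the nuisance parameters at every profile point, which is the cost that CICO is designed to reduce.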


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP filing No. 09084417