首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Background

Superpositioning is an important problem in structural biology. Determining an optimal superposition requires a one-to-one correspondence between the atoms of two proteins structures. However, in practice, some atoms are missing from their original structures. Current superposition implementations address the missing data crudely by ignoring such atoms from their structures.

Results

In this paper, we propose an effective method for superpositioning pairwise and multiple structures without sequence alignment. It is a two-stage procedure including data reduction and data registration.

Conclusions

Numerical experiments demonstrated that our method is effective and efficient. The code package of protein structure superposition method for addressing the cases with missing data is implemented by MATLAB, and it is freely available from: http://sourceforge.net/projects/pssm123/files/?source=navbar
  相似文献   

2.
Lyu  Chuqiao  Wang  Lei  Zhang  Juhua 《BMC genomics》2018,19(10):905-165

Background

The DNase I hypersensitive sites (DHSs) are associated with the cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance the understanding on the accessibility of chromatin. Despite a multitude of resources available on line including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) which combined Inception like networks with a gating mechanism for the response of multiple patterns and longterm association in DNA sequences to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.

Results

Our method obtains 0.961 area under curve (AUC) on Arabidopsis, 0.969 AUC on rice and 0.918 AUC on Homo sapiens.

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHSs sequences by deep learning.
  相似文献   

3.

Background

DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the “words” based only on the DNA sequences.

Methods

We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract “DNA words” that have potential functional meaning. A Kolmogorov-Smirnov test was used for consistency test of uniform distribution of DNA sequences, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative control, and an English book was used as positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods.

Results

The results provide strong evidences that the algorithm is a promising tool for ab initio building a DNA dictionary.

Conclusions

Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
  相似文献   

4.

Introduction

Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
  相似文献   

5.

Aims

This paper describes a simple technique of axillary and breast massage which improves the successful identification of blue sentinel nodes using patent blue dye alone.

Methods

Patent blue dye was injected in the subdermal part of the retroaroelar area in 167 patients having surgical treatment for invasive breast cancer. Three stage axillary lymphatic massage was performed prior to making the axillary incision for sentinel lymph node biopsy. All patients had completion axillary sampling or clearance.

Results

A blue lymphatic duct leading to lymph nodes of the first drainage was identified in 163 (97%) of the patients. Results are compared with 168 patients who had sentinel lymph node biopsy using blue dye without axillary massage. Allergic reactions were observed in four patients (1.2%).

Conclusion

Three stage axillary lymphatic massage improves the successful identification of a blue sentinel lymph node in breast cancer patients.
  相似文献   

6.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated to open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.
  相似文献   

7.
8.
9.

Background

Proteins play fundamental and crucial roles in nearly all biological processes, such as, enzymatic catalysis, signaling transduction, DNA and RNA synthesis, and embryonic development. It has been a long-standing goal in molecular biology to predict the tertiary structure of a protein from its primary amino acid sequence. From visual comparison, it was found that a 2D triangular lattice model can give a better structure modeling and prediction for proteins with short primary amino acid sequences.

Methods

This paper proposes a hybrid of hill-climbing and genetic algorithm (HHGA) based on elite-based reproduction strategy for protein structure prediction on the 2D triangular lattice.

Results

The simulation results show that the proposed HHGA can successfully deal with the protein structure prediction problems. Specifically, HHGA significantly outperforms conventional genetic algorithms and is comparable to the state-of-the-art method in terms of free energy.

Conclusions

Thanks to the enhancement of local search on the global search, the proposed HHGA achieves promising results on the 2D triangular protein structure prediction problem. The satisfactory simulation results demonstrate the effectiveness of the proposed HHGA and the utility of the 2D triangular lattice model for protein structure prediction.
  相似文献   

10.

Background:

Reliable information extraction applications have been a long sought goal of the biomedical text mining community, a goal that if reached would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing.

Results:

Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist.

Conclusion:

Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the internet http://bionlp.sourceforge.net.
  相似文献   

11.

Background

Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences and these can be exploited to perform an unbiased SSU--based taxonomic analysis.

Results

Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.

Conclusions

RiboTagger permits fast recovery of SSU-RNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.
  相似文献   

12.

Background

Function prediction by transfer of annotation from the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, accuracy of function prediction for multi-domain proteins depends on all members having the same overall domain structure. By contrast, most common homolog detection methods are optimized for retrieving local homologs, and do not address this requirement.

Results

We present FlowerPower, a novel clustering algorithm designed for the identification of global homologs as a precursor to structural phylogenomic analysis. Similar to methods such as PSIBLAST, FlowerPower employs an iterative approach to clustering sequences. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discrimination between proteins in the same domain architecture class and those having different overall domain structures.

Conclusion

Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences based on sharing the same domain architecture is a critical first step in this process. FlowerPower is shown to consistently identify homologous sequences having the same domain architecture as the query.

Availability

FlowerPower is available as a webserver at http://phylogenomics.berkeley.edu/flowerpower/.
  相似文献   

13.
14.

Background

Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism.

Results

The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates.

Conclusions

In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/.
  相似文献   

15.

Background

Massively parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA (aDNA) studies. Variant identification from short-read alignment could be hindered, however, by low DNA concentrations common to historic samples, which constrain sequencing depths, and post-mortem DNA damage patterns.

Results

We simulated pairs of sequences to act as reference and sample genomes at varied GC contents and divergence levels. Short-read sequence pools were generated from sample sequences, and subjected to varying levels of “post-mortem” damage by adjusting levels of fragmentation and fragmentation biases, transition rates at sequence ends, and sequencing depths. Mapping of sample read pools to reference sequences revealed several trends, including decreased alignment success with increased read length and decreased variant recovery with increased divergence. Variants were generally called with high accuracy, however identification of SNPs (single-nucleotide polymorphisms) was less accurate for high damage/low divergence samples. Modest increases in sequencing depth resulted in rapid gains in total variant recovery, and limited improvements to recovery of heterozygous variants.

Conclusions

This in silico study suggests aDNA-associated damage patterns minimally impact variant call accuracy and recovery from short-read alignment, while modest increases in sequencing depth can greatly improve variant recovery.
  相似文献   

16.

Background

In recent years the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations became popular.

Methods

The physical basis of these intuitive maps is clarified by means of analytically solvable problems.

Results

Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.

Conclusion

Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.
  相似文献   

17.
18.

Background

Miniature inverted-repeat transposable element (MITE) is a type of class II non-autonomous transposable element playing a crucial role in the process of evolution in biology. There is an urgent need to develop bioinformatics tools to effectively identify MITEs on a whole genome-wide scale. However, most of currently existing tools suffer from low ability to deal with large eukaryotic genomes.

Methods

In this paper, we proposed a novel tool MiteFinderII, which was adapted from our previous algorithm MiteFinder, to efficiently detect MITEs from genomics sequences. It has six major steps: (1) build K-mer Index and search for inverted repeats; (2) filtration of inverted repeats with low complexity; (3) merger of inverted repeats; (4) filtration of candidates with low score; (5) selection of final MITE sequences; (6) selection of representative sequences.

Results

To test the performance, MiteFinderII and three other existing algorithms were applied to identify MITEs on the whole genome of oryza sativa. Results suggest that MiteFinderII outperforms existing popular tools in terms of both specificity and recall. Additionally, it is much faster and more memory-efficient than other tools in the detection.

Conclusion

MiteFinderII is an accurate and effective tool to detect MITEs hidden in eukaryotic genomes. The source code is freely accessible at the website: https://github.com/screamer/miteFinder.
  相似文献   

19.

Background

Most of hydrophilic and hydrophobic residues are thought to be exposed and buried in proteins, respectively. In contrast to the majority of the existing studies on protein folding characteristics using protein structures, in this study, our aim was to design predictors for estimating relative solvent accessibility (RSA) of amino acid residues to discover protein folding characteristics from sequences.

Methods

The proposed 20 real-value RSA predictors were designed on the basis of the support vector regression method with a set of informative physicochemical properties (PCPs) obtained by means of an optimal feature selection algorithm. Then, molecular dynamics simulations were performed for validating the knowledge discovered by analysis of the selected PCPs.

Results

The RSA predictors had the mean absolute error of 14.11% and a correlation coefficient of 0.69, better than the existing predictors. The hydrophilic-residue predictors preferred PCPs of buried amino acid residues to PCPs of exposed ones as prediction features. A hydrophobic spine composed of exposed hydrophobic residues of an α-helix was discovered by analyzing the PCPs of RSA predictors corresponding to hydrophobic residues. For example, the results of a molecular dynamics simulation of wild-type sequences and their mutants showed that proteins 1MOF and 2WRP_H16I (Protein Data Bank IDs), which have a perfectly hydrophobic spine, have more stable structures than 1MOF_I54D and 2WRP do (which do not have a perfectly hydrophobic spine).

Conclusions

We identified informative PCPs to design high-performance RSA predictors and to analyze these PCPs for identification of novel protein folding characteristics. A hydrophobic spine in a protein can help to stabilize exposed α-helices.
  相似文献   

20.

Introduction

Metabolomic profiling combines Nuclear Magnetic Resonance spectroscopy with supervised statistical analysis that might allow to better understanding the mechanisms of a disease.

Objectives

In this study, the urinary metabolic profiling of individuals with porphyrias was performed to predict different types of disease, and to propose new pathophysiological hypotheses.

Methods

Urine 1H-NMR spectra of 73 patients with asymptomatic acute intermittent porphyria (aAIP) and familial or sporadic porphyria cutanea tarda (f/sPCT) were compared using a supervised rule-mining algorithm. NMR spectrum buckets bins, corresponding to rules, were extracted and a logistic regression was trained.

Results

Our rule-mining algorithm generated results were consistent with those obtained using partial least square discriminant analysis (PLS-DA) and the predictive performance of the model was significant. Buckets that were identified by the algorithm corresponded to metabolites involved in glycolysis and energy-conversion pathways, notably acetate, citrate, and pyruvate, which were found in higher concentrations in the urines of aAIP compared with PCT patients. Metabolic profiling did not discriminate sPCT from fPCT patients.

Conclusion

These results suggest that metabolic reprogramming occurs in aAIP individuals, even in the absence of overt symptoms, and supports the relationship that occur between heme synthesis and mitochondrial energetic metabolism.
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号