Similar articles
 Found 20 similar articles (search time: 484 ms)
1.

Background

The Clusters of Orthologous Groups (COGs) of proteins systematize evolutionarily related proteins into specific groups with similar functions. However, the available databases do not provide a means to assess the extent of similarity between COGs.

Aim

We aimed to provide a method for the identification and visualization of evolutionary relationships between COGs, as well as a corresponding web server.

Results

Here we introduce COGcollator, a web tool for identifying evolutionarily related COGs and analyzing them further. We demonstrate the utility of this tool by identifying the COGs that contain distant homologs of (i) the catalytic subunit of bacterial rotary membrane ATP synthases and (ii) the DNA/RNA helicases of superfamily 1.

Reviewers

This article was reviewed by Drs. Igor N. Berezovsky, Igor Zhulin and Yuri Wolf.

2.

Background

DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about the sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences themselves.

Methods

We considered non-uniform distribution and integrity to be two important features of a word, and on this basis developed an ab initio algorithm to extract "DNA words" with potential functional meaning. A Kolmogorov-Smirnov test was used to test the positional distribution of candidate words against a uniform distribution, and integrity was judged by sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify the algorithm. We applied the algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to demonstrate its utility.
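
The uniformity test described above can be sketched with a one-sample Kolmogorov-Smirnov statistic computed over word start positions. This is an illustrative pure-Python sketch, not the authors' implementation; the function name and example positions are invented:

```python
def ks_uniform_statistic(positions, seq_len):
    """One-sample Kolmogorov-Smirnov statistic comparing word start
    positions against a Uniform(0, seq_len) null distribution."""
    xs = sorted(positions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = x / seq_len  # uniform CDF at this position
        # distance from the empirical CDF just before and after the step
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d

# A word whose occurrences cluster at one end of the sequence deviates
# far more from uniformity than one spread evenly along it.
spread = ks_uniform_statistic([100 * k for k in range(1, 11)], 1000)
clustered = ks_uniform_statistic(list(range(1, 11)), 1000)
```

A large statistic flags a non-uniformly distributed (and hence potentially functional) word; the significance threshold would come from the KS distribution.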

Results

The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio.

Conclusions

Our method provides a fast way to screen important DNA elements at large scale and offers potential insights into the understanding of a genome.

3.

Purpose of review

Black yeast-like fungi are capable of causing a wide range of infections, including invasive disease. The diagnosis of infections caused by these species can be problematic. We review the changes in the nomenclature and taxonomy of these fungi, and methods used for detection and species identification that aid in diagnosis.

Recent findings

Molecular assays, including DNA barcode analysis and rolling circle amplification, have improved our ability to correctly identify these species. A proteomic approach using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has also shown promising results. While progress has been made with molecular techniques using direct specimens, data are currently limited.

Summary

Molecular and proteomic assays have improved the identification of black yeast-like fungi. However, improved molecular and proteomic databases and better assays for the detection and identification in direct specimens are needed to improve the diagnosis of disease caused by black yeast-like fungi.

4.

Background

In the context of sensory and cognitive-processing deficits in ADHD patients, there is considerable evidence of altered event-related potentials (ERPs). Most studies, however, were conducted in children with ADHD. Using independent component analysis (ICA), ERPs can be decomposed into functionally distinct components. Using a support vector machine classifier, this study investigated whether features of independent ERP components can be used to discriminate adults with ADHD from healthy subjects.

Methods

Two groups of age- and sex-matched adults (74 ADHD, 74 controls) performed a visual two-stimulus GO/NOGO task. ERP responses were decomposed into independent components by means of ICA. A feature selection algorithm defined a set of independent-component features, which was entered into a support vector machine.
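
The ICA-then-SVM pipeline can be sketched with scikit-learn. This is a hedged illustration on synthetic data, not the study's EEG pipeline: the group sizes match the abstract, but the data, number of components, and classifier settings are invented:

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for ERP measures: 148 subjects x 64 "channels".
# A group difference is injected into two latent sources.
n_sub, n_chan = 148, 64
labels = np.array([0] * 74 + [1] * 74)      # 0 = control, 1 = ADHD
mixing = rng.normal(size=(8, n_chan))
sources = rng.normal(size=(n_sub, 8))
sources[labels == 1, :2] += 1.5             # group effect on two sources
X = sources @ mixing + 0.5 * rng.normal(size=(n_sub, n_chan))

# Decompose into independent components, then classify their features
# with a non-linear SVM under 10-fold cross-validation.
features = FastICA(n_components=8, random_state=0).fit_transform(X)
acc = cross_val_score(
    SVC(kernel="rbf"), features, labels,
    cv=StratifiedKFold(10, shuffle=True, random_state=0)).mean()
```

The real study selected latency features from specific components rather than using raw ICA scores, but the cross-validated accuracy estimate follows the same pattern.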

Results

The feature set consisted of five latency measures in specific time windows, collected from four different independent components: a novelty component, a sensory-related component, and two executive-function-related components. Using a 10-fold cross-validation approach, classification accuracy was 92%.

Conclusions

This study was a first attempt to classify adults with ADHD by means of a support vector machine; it indicates that classification with non-linear methods is feasible in the context of clinical groups. Furthermore, independent ERP components have been shown to provide features that can be used to characterize clinical populations.

5.

Background

Wild orchids are illegally harvested and traded in Nepal for use in local traditional medicine, horticulture, and international trade. This study aims to: 1) identify the diversity of species of wild orchids in trade in Nepal; 2) study the chain of commercialization from collector to client and/or export; 3) map traditional knowledge and medicinal use of orchids; and 4) integrate the collected data to propose a more sustainable approach to orchid conservation in Nepal.

Methods

Trade, species diversity, and traditional use of wild-harvested orchids were documented during field surveys of markets and through interviews. Trade volumes and approximate income were estimated based on surveys and current market prices. Orchid material samples were identified to species level using a combination of morphology and DNA barcoding.

Results

Orchid trade is a long-standing tradition, and illegal export to China, India and Hong Kong is rife. Estimates show that 9.4 tons of wild orchids were illegally traded from the study sites during 2008/2009. A total of 60 species of wild orchids were reported to be used in traditional medicinal practices to treat at least 38 different ailments, including use as energizers and aphrodisiacs and as treatments for burnt skin, fractured or dislocated bones, headaches, fever and wounds. DNA barcoding successfully identified to species level orchid material that remained sterile after culturing.

Conclusions

Collection of wild orchids was found to be widespread in Nepal, but illegal trade is threatening many species in the wild. Establishing small-scale sustainable orchid-breeding enterprises could be a valuable alternative for producing medicinal orchids for local communities. Critically endangered species should be placed on CITES Appendix I to provide them extra protection. DNA barcoding is an effective method for species identification and for monitoring illegal cross-border trade.

6.

Background

With improvements in biosensors and high-throughput image acquisition technologies, life science laboratories can perform an increasing number of experiments that generate large amounts of images at different imaging modalities and scales. This stresses the need for computer vision methods that automate image classification tasks.

Results

We illustrate the potential of our image classification method in cell biology by evaluating it on four datasets of images related to protein distributions, subcellular localizations, and red-blood-cell shapes. Accuracy results are quite good without any specific pre-processing or incorporation of domain knowledge. The method is implemented in Java and available upon request for evaluation and research purposes.

Conclusion

Our method is directly applicable to any image classification problem. We foresee the use of this automatic approach as a baseline method and first attempt on various biological image classification problems.

7.
Lyu Chuqiao, Wang Lei, Zhang Juhua. BMC Genomics 2018, 19(10):905-165

Background

DNase I hypersensitive sites (DHSs) are associated with cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance our understanding of chromatin accessibility. Despite a multitude of resources available online, including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using a state-of-the-art machine learning approach. We present a novel convolutional neural network (CNN) that combines Inception-like networks with a gating mechanism to capture multiple patterns and long-term associations in DNA sequences, and use it to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.

Results

Our method obtains an area under the curve (AUC) of 0.961 on Arabidopsis, 0.969 on rice and 0.918 on Homo sapiens.
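
As a reminder of what these numbers mean, the AUC equals the probability that a randomly chosen positive example receives a higher prediction score than a randomly chosen negative one (the Mann-Whitney formulation). A minimal, self-contained sketch with invented scores, unrelated to the authors' evaluation code:

```python
def auc_score(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive outscores the
    negative (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect = auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])  # fully separated
random_like = auc_score([1, 0], [0.5, 0.5])              # pure ties
```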

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHS sequences by deep learning.

8.

Objective

To develop a simple method for efficient expression of classical swine fever virus (CSFV) E2 protein.

Results

The pFastBac HT B vector (pFastHTB-M1) was modified by adding a melittin signal peptide sequence. The E2 gene fragment without the transmembrane region was cloned into pFastHTB-M1. The modified vector has a clear advantage over the original one, as evidenced by purified recombinant E2 protein that was clearly detected by SDS-PAGE.

Conclusions

The modified vector has the potential for large-scale production and easy purification of the CSFV E2 protein or other proteins of interest.

9.

Background

Mixtures of beta distributions are a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from singularities in the log-likelihood function when some observations take the values 0 or 1.

Methods

While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm combines latent variables with the method of moments instead of maximum likelihood, which has computational advantages over the popular EM algorithm.
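
The moment-based step can be illustrated for a single beta component: with sample mean m and variance v, set t = m(1 - m)/v - 1, then alpha = m·t and beta = (1 - m)·t. This minimal sketch covers one component only, not the full latent-variable un-mixing of betamix, and the sample data are simulated:

```python
import random
from statistics import fmean, pvariance

def beta_method_of_moments(xs):
    """Estimate (alpha, beta) of a beta distribution from sample moments.
    Because no likelihood is evaluated, observations at exactly 0 or 1
    cause no singularities."""
    m = fmean(xs)
    v = pvariance(xs)
    t = m * (1 - m) / v - 1
    return m * t, (1 - m) * t

# Recover known parameters from simulated data.
random.seed(1)
sample = [random.betavariate(2.0, 5.0) for _ in range(100_000)]
a_hat, b_hat = beta_method_of_moments(sample)
```

In the mixture setting, betamix alternates this estimation per component with likelihood-based responsibility weights, in the spirit of EM but with moments replacing the M-step.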

Results

As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels. We also demonstrate that we can accurately infer the number of mixture components.

Conclusions

The hybrid algorithm, combining likelihood-based component un-mixing with moment-based parameter estimation, is a robust and efficient method for beta mixture estimation. We provide an implementation of the method ("betamix") as open-source software under the MIT license.

10.

Background

Methylation analysis of cell-free DNA is an encouraging tool for tumor diagnosis, monitoring and prognosis. The sensitivity of methylation analysis is critical because of the tiny amounts of cell-free DNA available in plasma. Most current methods of DNA methylation analysis are based on the differential bisulfite-mediated deamination of cytosine versus 5-methylcytosine. However, the recovery of bisulfite-converted DNA with current methods is too poor for methylation analysis of cell-free DNA.

Results

We optimized a rapid method for the crucial steps of bisulfite conversion with high recovery of cell-free DNA. A rapid deamination step and alkaline desulfonation were combined with purification of the DNA on a silica column. The conversion efficiency and recovery of bisulfite-treated DNA were investigated by droplet digital PCR. The optimized reaction achieves complete cytosine conversion in 30 min at 70 °C and about 65% recovery of bisulfite-treated cell-free DNA, which is higher than current methods.

Conclusions

The method allows high recovery from low levels of bisulfite-treated cell-free DNA, enhancing the sensitivity of methylation detection from cell-free DNA.

11.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. Significant advances have been made in computational approaches for analyzing the complex raw data, yet it is not clear how exhaustive and reliable the results of such analyses are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

For each study, we report omissions of at least 50 relevant compounds from the original results, as well as examples of representative mistakes.

Conclusion

Incomplete raw data processing reveals unexplored potential in current and legacy data.

12.

Background

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of sequencing reads may be classified as unknown, which greatly impairs our understanding of the whole sample.

Result

Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100-bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of the MetaSUB Inter-City Challenge provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.
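
Composition-based classification, as opposed to alignment against reference genomes, starts from a k-mer frequency vector per read. The sketch below shows the general idea only; MetaBinG2's actual model (and its GPU kernels) is more involved, and the function name and example sequence are invented:

```python
from itertools import product

def kmer_composition(seq, k=3):
    """Normalized k-mer frequency vector over the 4**k possible DNA
    k-mers. Composition-based classifiers compare such vectors rather
    than aligning reads to known reference genomes."""
    kmers = [''.join(p) for p in product('ACGT', repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]

vec = kmer_composition("ACGTACGTACGT", k=3)
```

Because the vector depends only on the read itself, such methods degrade gracefully for organisms absent from reference databases.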

Conclusion

Compared to existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms.

Reviewers

This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.

13.
Gao S, Xu S, Fang Y, Fang J. Proteome Science 2012, 10(Z1):S7

Background

Identification of phosphorylation sites by computational methods is becoming increasingly important because it reduces labor-intensive and costly experiments and can improve our understanding of the common properties and underlying mechanisms of protein phosphorylation.

Methods

A multitask learning framework for learning four kinase families simultaneously, instead of studying each kinase family of phosphorylation sites separately, is presented in this study. The framework includes two multitask classification methods: the Multi-Task Least Squares Support Vector Machines (MTLS-SVMs) and the Multi-Task Feature Selection (MT-Feat3).

Results

Using the multitask learning framework, we successfully identify 18 common features shared by four kinase families of phosphorylation sites. The reliability of selected features is demonstrated by the consistent performance in two multi-task learning methods.

Conclusions

The selected features can be used to build efficient multitask classifiers with good performance, suggesting that they are important to protein phosphorylation across the four kinase families.

14.

Objectives

To develop a method for reliable quantification of viral vectors, which is necessary for determining the optimal dose of vector particles in clinical trials to obtain the desired effects without severe unwanted immune responses.

Results

A significant level of vector plasmid remained in retroviral and lentiviral vector samples, which led to overestimation of viral titers by the conventional RT-qPCR-based genomic titration method. To address this problem, we developed a new method in which the residual plasmid is quantified by an additional RT-qPCR step, with optimized standard molecules and primer sets. The obtained counts are then used to correct the conventionally measured genomic titers of viral samples. While the conventional method produced significantly higher genomic titers for mutant retroviral vectors than for wild-type vectors, our method produced slightly higher or equivalent titers, consistent with the general expectation that mutation of viral components mostly results in reduced or, at best, retained titers.
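
The arithmetic of the correction is simple: subtract the separately quantified residual-plasmid copies from the conventionally measured genomic copies. A hypothetical sketch (function name and numbers invented; the paper's protocol also involves optimized standards and primers not modeled here):

```python
def corrected_genomic_titer(measured_copies_per_ml, plasmid_copies_per_ml):
    """Correct a conventionally measured genomic titer for residual
    vector plasmid by subtracting separately quantified plasmid copies.
    A negative result would mean the plasmid signal exceeds the total
    signal, indicating a measurement problem rather than a real titer."""
    corrected = measured_copies_per_ml - plasmid_copies_per_ml
    if corrected < 0:
        raise ValueError("plasmid copies exceed measured copies")
    return corrected

titer = corrected_genomic_titer(1.0e9, 2.5e8)
```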

Conclusion

Subtraction of the number of residual vector plasmid molecules from the conventionally measured genomic titer can yield reliable quantification of retroviral and lentiviral vector samples, a prerequisite to advancing the safety of gene therapy applications.

15.
16.

Background

Dementia is an age-related cognitive decline marked by early degeneration of cortical and sub-cortical structures. Characterizing these morphological changes can help us understand disease development and contribute to early prediction and prevention. However, building models that best capture brain structural variability while remaining valid for both disease classification and interpretation is extremely challenging. The current study aimed to establish a computational approach for modeling magnetic resonance imaging (MRI)-based structural complexity of the brain using the framework of hidden Markov models (HMMs) for dementia recognition.

Methods

Regularity dimension and semi-variogram were used to extract structural features of the brains, and a vector quantization (VQ) method was applied to convert the extracted feature vectors into prototype vectors. The output VQ indices were then used to estimate the HMM parameters. To validate accuracy and robustness, experiments were carried out on individuals characterized as non-demented or as having mild Alzheimer's disease. Four HMMs were constructed separately for cohorts of non-demented young, middle-aged and elderly subjects, and demented elderly subjects. Classification was carried out on a data set including both non-demented and demented individuals across a wide age range.

Results

The proposed HMMs succeeded in recognizing individuals with mild Alzheimer's disease and achieved better classification accuracy than related works using different classifiers. The results demonstrate the ability of the proposed modeling approach to recognize early dementia.

Conclusion

The findings from this research will allow individual classification to support the early diagnosis and prediction of dementia. The brain MRI-based HMMs developed in this research are efficient and robust, and can easily be used by clinicians as a computer-aided tool for validating imaging biomarkers for early prediction of dementia.

17.

Background

Staged palliative surgery markedly shifts the balance of volume load between the single ventricle and the pulmonary vascular bed. A Blalock-Taussig shunt requires the single ventricle to eject blood to both the systemic and pulmonary circulations. In contrast, a bidirectional cavopulmonary shunt releases the single ventricle from the pulmonary circulation.

Case presentation

We report the case of a patient with tricuspid atresia who underwent first and second palliative surgeries. The volume-loading condition was assessed intraoperatively using energetic parameters (energy loss, kinetic energy) derived from vector flow mapping. These energetic parameters provide a simple indication of the volume-loading condition.

Conclusion

Vector flow mapping was a useful tool for monitoring the volume-loading condition in congenital heart disease surgery.

18.

Background:

The wide availability of genome-scale data for several organisms has stimulated interest in computational approaches to gene function prediction. Diverse machine learning methods have been applied to unicellular organisms with some success, but few have been extensively tested on higher-level, multicellular organisms. A recent mouse function prediction project (MouseFunc) brought together nine bioinformatics teams applying a diverse array of methodologies to mount the first large-scale effort to predict gene function in the laboratory mouse.

Results:

In this paper, we describe our contribution to this project, an ensemble framework based on the support vector machine that integrates diverse datasets in the context of the Gene Ontology hierarchy. We carry out a detailed analysis of the performance of our ensemble and provide insights into which methods work best under a variety of prediction scenarios. In addition, we applied our method to Saccharomyces cerevisiae and have experimentally confirmed functions for a novel mitochondrial protein.

Conclusion:

Our method consistently performs among the top methods in the MouseFunc evaluation. Furthermore, it exhibits good classification performance across a variety of cellular processes and functions in both a multicellular organism and a unicellular organism, indicating its ability to discover novel biology in diverse settings.

19.

Introduction

Untargeted metabolomics studies for biomarker discovery often involve hundreds to thousands of human samples. Data acquisition for large-scale studies has to be divided into several batches and may span months to several years. Signal drift of metabolites during data acquisition (intra- and inter-batch) is unavoidable and is a major confounding factor in large-scale metabolomics studies.

Objectives

We aim to develop a data normalization method to reduce unwanted variations and integrate multiple batches in large-scale metabolomics studies prior to statistical analyses.

Methods

We developed a machine learning-based method using support vector regression (SVR) for large-scale metabolomics data normalization and integration. An R package named MetNormalizer was developed for data processing using SVR normalization.
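
The core idea of QC-based SVR normalization can be sketched as follows: fit each peak's intensity drift on quality-control injections as a smooth function of injection order, then divide the drift out of all samples. MetNormalizer itself is an R package; this is a hedged Python illustration with scikit-learn, and all data, masks, and hyperparameters here are invented:

```python
import numpy as np
from sklearn.svm import SVR

def svr_normalize(intensity, injection_order, qc_mask):
    """Model relative drift of one metabolite peak on QC samples as a
    smooth function of injection order, then divide it out everywhere."""
    order = np.asarray(injection_order, float).reshape(-1, 1)
    qc_median = np.median(intensity[qc_mask])
    model = SVR(kernel="rbf", C=100.0, gamma="scale", epsilon=0.01)
    model.fit(order[qc_mask], intensity[qc_mask] / qc_median)
    drift = model.predict(order)          # relative drift curve
    return intensity / drift              # drift-corrected, near qc_median

rng = np.random.default_rng(0)
order = np.arange(100)
drifted = (1000 - 5 * order) * rng.normal(1.0, 0.02, 100)  # downward drift
qc = order % 10 == 0                                       # every 10th is a QC
normalized = svr_normalize(drifted, order, qc)

rsd_before = drifted.std() / drifted.mean()
rsd_after = normalized.std() / normalized.mean()
```

Dividing by a fitted curve rather than batch means is what lets the method track smooth intra-batch drift as well as inter-batch jumps.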

Results

After SVR normalization, the proportion of metabolite ion peaks with relative standard deviations (RSDs) below 30% increased to more than 90% of the total peaks, which is much better than other common normalization methods. The reduction of unwanted analytical variation improves the performance of both unsupervised and supervised multivariate statistical analyses, in terms of classification and prediction accuracy, so that subtle metabolic changes in epidemiological studies can be detected.

Conclusion

SVR normalization can effectively remove unwanted intra- and inter-batch variation, and is much better than other common normalization methods.

20.

Introduction

Mass spectrometry imaging (MSI) experiments result in complex multi-dimensional datasets, which require specialist data analysis tools.

Objectives

We have developed massPix, an R package for analysing and interpreting data from MSI of lipids in tissue.

Methods

massPix produces single-ion images, performs multivariate statistics and provides putative lipid annotations based on accurate-mass matching against generated lipid libraries.
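
Accurate-mass annotation typically accepts library entries whose m/z lies within a parts-per-million tolerance of the observed peak. massPix is an R package; the Python sketch below illustrates only the general matching logic, and the library entries, masses, and tolerance are illustrative assumptions, not massPix's actual library:

```python
def annotate_peak(mz, lipid_library, ppm_tol=5.0):
    """Return putative lipid annotations whose library m/z lies within
    ppm_tol parts-per-million of the observed peak, closest first."""
    hits = []
    for name, lib_mz in lipid_library.items():
        ppm = abs(mz - lib_mz) / lib_mz * 1e6
        if ppm <= ppm_tol:
            hits.append((name, round(ppm, 2)))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical two-entry library; only the first entry is within 5 ppm.
library = {"PC(34:1) [M+H]+": 760.5851, "PE(38:4) [M+H]+": 768.5538}
hits = annotate_peak(760.5874, library, ppm_tol=5.0)
```

Because accurate mass alone cannot separate isobaric lipids, such annotations are reported as putative, as the abstract notes.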

Results

Classification of tissue regions with high spectral similarity can be carried out by principal components analysis (PCA) or k-means clustering.
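
The PCA-then-k-means segmentation of pixel spectra can be sketched with scikit-learn on synthetic data. This is a Python illustration of the general workflow, not massPix's R implementation; the pixel counts, m/z bins, and injected region signal are all invented:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical MSI data: 200 pixels x 500 m/z bins, with two tissue
# regions whose spectra are enriched in different sets of peaks.
spectra = rng.normal(0.0, 0.1, size=(200, 500))
spectra[:100, :50] += 1.0     # region A enriched in the first peaks
spectra[100:, 50:100] += 1.0  # region B enriched in different peaks

# Reduce to the leading principal components, then cluster pixels.
scores = PCA(n_components=10).fit_transform(spectra)
regions = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```

Running k-means on PCA scores rather than raw spectra denoises the clustering and keeps it tractable for images with many m/z bins.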

Conclusion

massPix is an open-source tool for the analysis and statistical interpretation of MSI data, and is particularly useful for lipidomics applications.
