Similar Literature
20 similar documents found.
1.

Background

Fourier Transform Mass Spectrometry coupled with Liquid Chromatography (LC-FTMS) has been widely used in proteomics. Past investigation has revealed an intensity-dependent random suppression in peptide elution profiles in LC-FTMS data. The suppression is homogeneous for the same peptide but non-homogeneous across different peptides. Correcting suppressed profiles and estimating the range of suppression are necessary for accurate and reliable quantification using FTMS data.

Results

A software package, Gcorr, is presented. The software corrects peptide profiles that satisfy correction conditions and can predict fold-change null distributions at different intensity levels. The significance P-values of measured fold changes can then be estimated from the predicted null distributions. We used a 1:1 LC-FTMS label-free dataset pair collected from the same sample to verify that our predicted null distributions conform to the observed null distribution.
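Gcorr's internals are not described here, but the final step (scoring an observed fold change against a null distribution predicted for the same intensity level) can be sketched. This is a minimal sketch assuming the null is available as a list of fold changes; the function name and the add-one correction are illustrative choices, not Gcorr's API.

```python
import math
from bisect import bisect_left


def empirical_p_value(observed_fc, null_fcs):
    """Two-sided empirical P-value of a fold change against null fold changes.

    Fold changes are compared as |log2(FC)|, so a 2-fold increase and a
    2-fold decrease are treated as equally extreme.
    """
    obs = abs(math.log2(observed_fc))
    null_abs = sorted(abs(math.log2(fc)) for fc in null_fcs)
    # Number of null fold changes at least as extreme as the observed one.
    n_extreme = len(null_abs) - bisect_left(null_abs, obs)
    # Add-one correction keeps the estimate strictly positive.
    return (n_extreme + 1) / (len(null_abs) + 1)


# Hypothetical null fold changes for peptides at one intensity level.
null = [0.8, 0.9, 1.0, 1.1, 1.25]
p = empirical_p_value(2.0, null)
```

In practice one null distribution per intensity bin would be used, since the abstract stresses that suppression, and therefore the spread of null fold changes, depends on intensity.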

Conclusions

This software provides suppression correction for peptide profiles, suppression distribution analysis, and peptide differential expression analysis based on fold-change significance. The software is freely available at http://compgenomics.utsa.edu/Suppression_Study.html.

2.
The gray mouse lemur (Microcebus murinus) is considered a useful primate model for translational research. In the framework of the IMI PharmaCog project (Grant Agreement n°115009, www.pharmacog.org), we tested the hypothesis that spectral electroencephalographic (EEG) markers of motor and locomotor activity in gray mouse lemurs reflect the typical movement-related desynchronization of alpha rhythms (about 8–12 Hz) seen in humans. To this aim, EEG (bipolar electrodes in the frontal cortex) and electromyographic (EMG; bipolar electrodes sutured in neck muscles) data were recorded in 13 adult male lemurs (about 3 years old). Artifact-free EEG segments during the active state (gross movements, exploratory movements or locomotor activity) and the awake passive state (no sleep) were selected on the basis of instrumental measures of animal behavior and used as input for EEG power density analysis. Results showed a clear peak of EEG power density in the alpha range (7–9 Hz) during the passive state. During the active state, there was a reduction in alpha power density (8–12 Hz) and an increase in power density at slow frequencies (1–4 Hz). Relative EMG activity was positively correlated with EEG power density at 2–4 Hz and negatively correlated with it at 8–12 Hz. These results suggest for the first time that gray mouse lemurs and humans may share basic neurophysiological mechanisms of synchronization of frontal alpha rhythms in the awake passive state and their desynchronization during motor and locomotor activity. These EEG markers may be an ideal experimental model for basic (motor science) and applied (pharmacological and non-pharmacological interventions) translational research in neurophysiology.
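The alpha-band comparison above can be illustrated with a small sketch. It assumes a single-channel EEG segment sampled at a known rate `fs` and uses a plain periodogram computed by direct DFT; the band edges follow the abstract, while the function names and the spectral estimator are assumptions, not the authors' analysis pipeline.

```python
import math


def band_power(signal, fs, f_lo, f_hi):
    """Summed periodogram power of `signal` in the band [f_lo, f_hi] Hz.

    Uses a direct DFT, which is slow (O(n^2)) but dependency-free.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # frequency of DFT bin k
        if f_lo <= f <= f_hi:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / n
    return power


def relative_alpha(signal, fs, alpha=(8.0, 12.0), total=(1.0, 30.0)):
    """Fraction of 1-30 Hz power that falls in the 8-12 Hz alpha band."""
    return band_power(signal, fs, *alpha) / band_power(signal, fs, *total)
```

Under this sketch, movement-related desynchronization would appear as a lower `relative_alpha` for active-state segments than for passive-state segments of the same animal.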

3.
Structural variation (SV) has been reported to be associated with numerous diseases such as cancer. With the advent of next-generation sequencing (NGS) technologies, various types of SV can potentially be identified. We propose a model-based clustering approach utilizing a set of features defined for each type of SV event. Our method, termed SVMiner, not only provides a probability score for each candidate but also predicts the heterozygosity of genomic deletions. Extensive experiments on genome-wide deep sequencing data have demonstrated that SVMiner is robust against the variability of a single cluster feature and significantly outperforms several commonly used SV detection programs. SVMiner can be downloaded from http://cbc.case.edu/svminer/.
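SVMiner's exact features and mixture model are not given above. As a generic illustration of how model-based clustering yields a probability score per candidate, the sketch below fits a two-component Gaussian mixture by expectation-maximization to one hypothetical candidate feature (for example, a normalized count of supporting read pairs); the feature, initialization and function names are all assumptions.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, w, mu, var):
    """Posterior probability that each point belongs to component 1."""
    resp = []
    for x in xs:
        p0 = w[0] * normal_pdf(x, mu[0], var[0])
        p1 = w[1] * normal_pdf(x, mu[1], var[1])
        resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
    return resp


def fit_two_gaussians(xs, iters=100):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (means, posteriors), where posteriors[i] is the probability that
    xs[i] comes from the component with the larger mean.
    """
    n = len(xs)
    srt = sorted(xs)
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n or 1.0
    # Start the two components at the lower and upper quartiles.
    w, mu, var = [0.5, 0.5], [srt[n // 4], srt[(3 * n) // 4]], [var0, var0]
    for _ in range(iters):
        r1 = e_step(xs, w, mu, var)
        for k, r in enumerate(([1 - p for p in r1], r1)):
            nk = sum(r) or 1e-12
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            # Floor the variance so a collapsing component cannot divide by zero.
            var[k] = max(sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk, 1e-6)
    post = e_step(xs, w, mu, var)
    if mu[0] > mu[1]:
        post = [1 - p for p in post]
    return sorted(mu), post
```

The posterior for the "true event" component plays the role of a per-candidate probability score; a real SV caller would use several features jointly rather than one.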

4.
5.
6.
7.
Louisa A. Stark, Genetics, 2015, 200(3): 679–680
The Genetics Society of America’s Elizabeth W. Jones Award for Excellence in Education recognizes significant and sustained impact on genetics education. The 2015 awardee, Louisa Stark, has had a major impact on global access to genetics education through her work as director of the University of Utah Genetic Science Learning Center. The Center’s Learn.Genetics and Teach.Genetics websites are the most widely used online genetics education resources in the world. In 2014, they were visited by 18 million students, educators, scientists, and members of the public. With over 60 million page views annually, Learn.Genetics is among the most used sites on the Web.

8.
Aspirin-exacerbated respiratory disease (AERD) remains widely underdiagnosed in asthmatics, primarily because of insufficient awareness of the relationship between aspirin ingestion and asthma exacerbation. Identifying aspirin hypersensitivity is therefore essential to avoid serious aspirin-induced complications. The goal of the study was to develop plasma biomarkers to predict AERD. We identified differentially expressed genes in peripheral blood mononuclear cells (PBMC) between subjects with AERD and those with aspirin-tolerant asthma (ATA). The genes were matched against the secreted protein database (http://spd.cbi.pku.edu.cn/) to select candidate proteins in the plasma. Plasma levels of the candidate proteins were then measured in AERD (n = 40) and ATA (n = 40) subjects using an enzyme-linked immunosorbent assay (ELISA). Target genes were validated as AERD biomarkers using ROC curve analysis. From 175 differentially expressed genes (p-value < 0.0001) queried against the secreted protein database, 11 secreted proteins were retrieved. Expression was predicted to be elevated for 7 genes and decreased for 4 genes in AERD compared with ATA subjects. Among these genes, significantly higher plasma levels of eosinophil-derived neurotoxin (RNASE2) were observed in AERD than in ATA subjects (70 (14.62–311.92) µg/ml vs. 12 (2.55–272.84) µg/ml, p-value < 0.0003). In the ROC curve analysis, the AUC was 0.74 (p-value = 0.0001, asymptotic 95% confidence interval [lower bound: 0.62, upper bound: 0.83]), with 95% sensitivity and 60% specificity at a cut-off value of 27.15 µg/ml. Eosinophil-derived neurotoxin represents a novel biomarker to distinguish AERD from ATA.
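The ROC quantities reported above (AUC, and sensitivity/specificity at a fixed cut-off) can be computed as sketched below. The function names and the toy plasma levels are hypothetical and are not the study's data; only the 27.15 µg/ml cut-off comes from the abstract.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, cutoff):
    """Call a subject positive when its level is at or above `cutoff`."""
    tp = sum(1 for s in pos_scores if s >= cutoff)
    tn = sum(1 for s in neg_scores if s < cutoff)
    return tp / len(pos_scores), tn / len(neg_scores)


# Hypothetical plasma levels (ug/ml); these are NOT the study's measurements.
aerd = [30.0, 50.0, 20.0]
ata = [10.0, 25.0, 40.0]
summary = (auc(aerd, ata), sensitivity_specificity(aerd, ata, 27.15))
```

Moving the cut-off trades sensitivity against specificity; the AUC summarizes that trade-off over all cut-offs.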

9.
Biclustering extends traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy has yet to be fully realized, owing to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form than existing algorithms, by combining qualitative (or semi-quantitative) measures of gene expression data with a combinatorial optimization technique. One distinctive feature of QUBIC is that it can identify all statistically significant biclusters, including biclusters with so-called ‘scaling patterns’, a problem considered rather challenging; another is that it solves such general biclustering problems very efficiently, handling tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated considerably improved biclustering performance of our algorithm compared with existing algorithms on various benchmark sets and on data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/~maqin/bicluster. A server version of QUBIC is also available upon request.
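QUBIC first represents each gene's expression profile qualitatively. The sketch below is a simplified, assumed version of that idea: a three-level representation (-1, 0, +1) from per-gene quantiles, where the parameter `q` and the helper names are hypothetical and QUBIC's full discretization and graph-based seeding are not reproduced. Because the labels depend only on ranks, a gene multiplied by a positive constant gets identical labels, which hints at how a qualitative representation can capture scaling patterns.

```python
def discretize(values, q=0.1):
    """Qualitative representation of one gene: -1 (low), 0, +1 (high).

    The lowest and highest `q` fraction of conditions are labelled -1 and +1.
    """
    n = len(values)
    k = min(max(1, int(n * q)), n // 2)  # tail size, never overlapping
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for i in order[:k]:
        labels[i] = -1
    for i in order[n - k:]:
        labels[i] = 1
    return labels


def matching_conditions(a, b):
    """Conditions where two discretized genes share the same non-zero label."""
    return [j for j, (x, y) in enumerate(zip(a, b)) if x == y != 0]
```

Gene pairs with many matching conditions are natural seeds from which a bicluster can be grown.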

10.

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues from protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin-binding propensity are combined to form the original feature space; several feature subspaces are then selected by applying different feature selection methods. Finally, heterogeneous SVMs are trained on the selected feature subspaces and combined into an ensemble for prediction.
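TargetVita's exact window size and combination rule are not stated above. As an assumption-laden sketch, a common way to turn a per-protein PSSM into per-residue feature vectors is a sliding window centred on each residue; the default half-window of 8 (a 17-residue window) and the plain score averaging used to combine ensemble members are illustrative choices, not TargetVita's published settings.

```python
def window_features(pssm, center, half_window=8):
    """Concatenate the PSSM rows in a window centred on residue `center`.

    Positions beyond either terminus are zero-padded, so every residue yields
    a vector of the same length: (2 * half_window + 1) * len(pssm[0]).
    """
    n_cols = len(pssm[0])
    feats = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pssm):
            feats.extend(pssm[pos])
        else:
            feats.extend([0.0] * n_cols)
    return feats


def average_ensemble(member_scores):
    """Average per-residue binding scores from several ensemble members."""
    return [sum(col) / len(col) for col in zip(*member_scores)]
```

Each ensemble member would be trained on a different selected feature subspace; averaging is only one of several possible combination rules.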

Conclusions

The experimental results obtained on four separate vitamin-binding benchmark datasets demonstrate that TargetVita is superior to the state-of-the-art vitamin-specific predictor, with an average improvement of 10% in the Matthews correlation coefficient (MCC) in independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
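The MCC is a natural metric here because binding residues are a small minority of all residues, so plain accuracy can look high for a predictor that rarely predicts binding. The standard formula is sketched below; the function name is illustrative.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the coefficient is undefined).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A predictor that calls every residue non-binding: 95% accuracy, MCC 0.
trivial = mcc(tp=0, tn=950, fp=0, fn=50)
```

The example shows why MCC is preferred for imbalanced residue-level tasks: the trivial predictor scores 0 despite high accuracy.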

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.

11.
Virtual screening is an important step in the early phase of the drug discovery process. Because thousands of compounds must be screened, this step should be both fast and effective in distinguishing drug-like from nondrug-like molecules. Statistical machine learning methods are widely used for classification in drug discovery studies. Here, we aim to develop a new tool that classifies molecules as drug-like or nondrug-like using various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, we first compared the performance of twenty-three machine learning algorithms using ten different measures, and then selected the ten best-performing algorithms based on principal component and hierarchical cluster analysis results. Besides classification, the application can create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
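The dendrograms mentioned above come from hierarchical cluster analysis. A minimal sketch, assuming each molecule is a numeric descriptor vector and using Euclidean distance with average linkage (MLViS's actual distance and linkage choices are not stated, and the function names are hypothetical):

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    with clusters as frozensets of point indices; a dendrogram draws this.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean distance over all cross-cluster point pairs.
                d = sum(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

This naive version is cubic in the number of molecules, which is fine for illustration but not for large libraries.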

12.

Background

Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining their specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to lead to effective drug design (especially of HIV-1 protease inhibitors) against this life-threatening virus. However, several obstacles (such as the shortage of available training data and the high dimensionality of the feature space) make this a hard classification problem. Various machine learning techniques, in particular classification methods, have therefore been proposed to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy.

Results

For HIV-1 data, we propose a consistency-based feature selection approach combined with recursive feature elimination of support vector machines (SVMs). We used various classifiers to evaluate the results of the feature selection process. We further demonstrated the effectiveness of our method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results based on attributes selected from different combinations.
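HIV-1 protease cleavage datasets are usually octamer peptides (positions P4 to P4'). The sketch below assumes the common orthogonal (one-hot) encoding and the inconsistency-rate criterion used in consistency-based feature selection; the helper names are hypothetical, and the SVM-RFE step and the authors' search strategy are not reproduced. A feature subset whose inconsistency rate matches that of the full feature set separates the classes as well as all features do.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_octamer(peptide):
    """Orthogonal (one-hot) encoding of an 8-residue peptide: 8 x 20 = 160 bits."""
    bits = []
    for residue in peptide:
        bits.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return bits


def inconsistency_rate(rows, labels, subset):
    """Inconsistency rate of a feature subset (consistency-based selection).

    Instances with identical values on every feature in `subset` form a group;
    each group contributes its size minus its majority-class count.
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[f] for f in subset)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)
```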

Conclusion

Applying feature selection to the training data before performing the classification task seems to be a reasonable data-mining process when working with data similar to HIV-1 data. On HIV-1 data, several feature selection or extraction operations combined with different classifiers have been tested and noteworthy results reported. These observations motivated the work presented in this paper.

Software availability

The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar. The software can also be downloaded at esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; it includes a readme file that explains how to set up the software.

13.
Recent studies have revealed that microRNAs (miRNAs), a class of small non-coding RNAs, down-regulate their mRNA targets. This regulation plays an important role in various biological processes. Many studies have been devoted to predicting miRNA-target interactions. These studies indicate that the interactions may be functional only in some specific tissues, depending on the characteristics of the miRNA. No systematic method has been established in the literature to investigate the correlation between miRNA-target interactions and tissue specificity using microarray data. In this study, we propose a method, based on experimentally validated miRNA-target interactions, to identify the tissues that support each miRNA-target interaction. The tissue specificity results obtained with our method agree with experimental results in the literature.
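The abstract does not detail the method. As a hedged illustration only, a common proxy for a functional, down-regulating miRNA-target interaction is a negative correlation between miRNA and target mRNA expression across samples; the sketch below computes it, with the -0.5 threshold and the function names being arbitrary assumptions rather than the authors' criterion.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_anticorrelated(mirna_expr, target_expr, threshold=-0.5):
    """Flag an interaction whose expression profiles are negatively correlated."""
    return pearson(mirna_expr, target_expr) <= threshold
```

Applied per tissue (over that tissue's samples), such a test would give one yes/no vote per tissue for each validated interaction.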

Availability and Implementation

Our analysis results are available at http://tsmti.mbc.nctu.edu.tw/ and http://www.stat.nctu.edu.tw/hwang/tsmti.html.

14.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods compare the evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Using a simple and tractable profile methodology and primitive measures of the similarity of horizontal MSA patterns, the method achieves homology detection quality comparable to a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function prediction is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. PROCAIN, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.
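The 'vertical' context above amounts to comparing alignment columns of two profiles. A generic sketch of that ingredient, assuming additive pseudocounts, a uniform background of 1/20 and a symmetrized log-odds column score (PROCAIN's actual scoring, which also uses horizontal context, differs; all names are hypothetical):

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(msa, pseudocount=1.0):
    """Per-column amino-acid frequencies (gaps ignored, additive pseudocounts)."""
    profile = []
    for col in zip(*msa):
        counts = {a: pseudocount for a in AMINO}
        for res in col:
            if res in counts:
                counts[res] += 1
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile


def column_score(p, q, background=1 / 20):
    """Symmetrized log-odds score for aligning two profile columns."""
    s_pq = sum(p[a] * math.log(q[a] / background) for a in AMINO)
    s_qp = sum(q[a] * math.log(p[a] / background) for a in AMINO)
    return 0.5 * (s_pq + s_qp)
```

A profile-profile aligner would fill a dynamic-programming matrix with such column scores instead of a substitution matrix.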

15.
A genomic island (GI) is a chunk of DNA sequence in a genome whose origin can be traced back to other organisms or viruses. The detection of GIs plays an indispensable role in biomedical research, because GIs are closely associated with special functions; for example, pathogenicity islands are disease-causing GIs. It is also very important to visualize genomic islands in the genome, together with the features that support them. We have developed a program, Genomic Island Visualization (GIV), which displays the locations of genomic islands in a genome, as well as the corresponding supporting feature information for the GIs. GIV was implemented in C++ and compiled and executed on Linux/Unix operating systems.

Availability

GIV is freely available for non-commercial use at http://www5.esu.edu/cpsc/bioinfo/software/GIV

16.
Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering sequence-level genomic and epigenomic data, e.g. ChIP-based data. Here we introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering of genomic tracks to be performed with standard clustering algorithms. The methodology is implemented in a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to cluster the occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks" and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
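One simple similarity measure between two genomic tracks is the base-pair Jaccard index of the regions they cover. The sketch below assumes each track is a list of half-open `(start, end)` intervals on a single chromosome; ClusTrack's own feature extraction and similarity options are configurable and may differ, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Sort and merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of base pairs covered by a track."""
    return sum(e - s for s, e in merge_intervals(intervals))


def intersection_length(a, b):
    """Base pairs covered by both tracks (linear sweep over merged intervals)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    inter = intersection_length(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 0.0
```

A matrix of pairwise `jaccard` values (or 1 minus it as a distance) can be handed to any standard clustering algorithm.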

17.
In this study, we introduce the fast wavelet transform (WT) as a method for investigating the effects of morphine on the electroencephalogram (EEG), respiratory activity and blood pressure in fetal lambs. Morphine was infused intravenously at 25 mg/h. The EEG, respiratory activity and blood pressure signals were analyzed using WT. We performed wavelet decomposition for five sets of parameters $D_{2^j}$, where $1 \le j \le 5$. The five WT detail signals correspond to the bandwidths: level 1, 16–32 Hz; level 2, 8–16 Hz; level 3, 4–8 Hz; level 4, 2–4 Hz; level 5, 1–2 Hz. Before injection of the high-dose morphine, power in the EEG was high in all six frequency bandwidths. The respiratory and blood pressure signals showed common frequency components with respect to time and were coincident with the low-voltage fast activity (LVFA) EEG signal. Respiratory activity was observed during only some of the LVFA periods and was completely absent during high-voltage slow activity (HVSA) EEG. The respiratory signal showed dominant power in the fourth wavelet band and less power in the third and fifth bands. The blood pressure signal was also characterized by dominant power in the fourth wavelet band; this power was significantly increased during periods of respiratory activity. There was a strong relationship between fetal EEG, blood pressure and breathing movements. However, the injection of high-dose morphine disrupted the normal cyclic pattern between the two EEG states and significantly increased power in the first wavelet band. In addition, the high-dose drug significantly increased the power of the respiratory signal in the fourth and fifth wavelet bands, while power was reduced in the third wavelet band. Breathing activity was also continuous after drug administration. The high-dose morphine also caused a temporary power shift from the third wavelet band to the fourth wavelet band during the 30 min after drug injection.
Finally, high-dose morphine completely abolished the correlation between EEG, breathing and blood pressure signals.
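The abstract does not name the mother wavelet. The sketch below assumes a Haar wavelet and a sampling rate of 64 Hz, inferred from the stated bands (level 1 covering 16–32 Hz implies a Nyquist frequency of 32 Hz); it computes the energy in each detail level, the kind of per-band power compared across conditions above. Function names are hypothetical.

```python
import math


def haar_details(signal, levels):
    """Multilevel Haar DWT; returns the detail coefficients, level 1 first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop a trailing sample so samples pair up
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
    return details


def level_energies(signal, levels):
    """Sum of squared detail coefficients at each level."""
    return [sum(c * c for c in d) for d in haar_details(signal, levels)]


def band_edges(fs, level):
    """Nominal band (Hz) of detail level j: fs / 2**(j + 1) to fs / 2**j."""
    return fs / 2 ** (level + 1), fs / 2 ** level
```

With `fs = 64`, `band_edges` reproduces the five bands listed in the abstract, from (16.0, 32.0) at level 1 down to (1.0, 2.0) at level 5.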

18.
Aggregatibacter actinomycetemcomitans is a major etiological agent of periodontitis. Here we report the complete genome sequence of serotype c strain D11S-1, which was recovered from the subgingival plaque of a patient diagnosed with generalized aggressive periodontitis.

Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontal disease, in particular aggressive periodontitis (12). The natural population of A. actinomycetemcomitans is clonal (7). Six A. actinomycetemcomitans serotypes are distinguished based on the structural and serological characteristics of the O antigen of LPS (6, 7). Three of the serotypes (a, b, and c) comprise >80% of all strains, and each serotype represents a distinct clonal lineage (1, 6, 7). Serotype c strain D11S-1 was cultured from a subgingival plaque sample of a patient diagnosed with generalized aggressive periodontitis. The complete genome sequence of the strain was determined by 454 pyrosequencing (10) at 25× coverage. Assembly was performed using the Newbler assembler (454, Branford, CT) and generated 199 large contigs, with 99.3% of the bases having a quality score of 40 or above. The contigs were aligned with the genome of the sequenced serotype b strain HK1651 (http://www.genome.ou.edu/act.html) using in-house software. The putative gaps between contigs were then closed by primer walking and sequencing of PCR products spanning the gaps. The final genome assembly was further confirmed by comparing an in silico NcoI restriction map with the experimental map generated by optical mapping (8). The genome structure of strain D11S-1 was compared with that of the sequenced strain HK1651 using the program MAUVE (2, 3). Automated annotation was done using a protocol similar to the annotation engine service at The Institute for Genomic Research/J. Craig Venter Institute, with some local modifications. Briefly, protein-coding genes were identified using Glimmer3 (4).
Each protein sequence was then annotated by comparison with the GenBank nonredundant protein database. BLAST-Extend-Repraze was applied to the predicted genes to identify genes that might have been truncated by a frameshift mutation or premature stop codon. tRNA and rRNA genes were identified using tRNAScan-SE (9) and a similarity search against our in-house RNA database, respectively.

The D11S-1 circular genome contains 2,105,764 nucleotides, with a GC content of 44.55%, 2,134 predicted coding sequences, and 54 tRNA and 19 rRNA genes (see additional data at http://expression.washington.edu/bumgarnerlab/publications.php). The distribution of predicted genes across functional categories was similar between D11S-1 and HK1651 (http://expression.washington.edu/bumgarnerlab/publications.php). One hundred six and 86 coding sequences were unique to strains D11S-1 and HK1651, respectively (http://expression.washington.edu/bumgarnerlab/publications.php). Genomic islands were identified based on annotations for strain HK1651 and on manual inspection of contiguous D11S-1-specific DNA regions with G+C bias (http://expression.washington.edu/bumgarnerlab/publications.php). Among 12 identified genomic islands, 5 (B, C, D, E, and G: the cytolethal distending toxin gene cluster, tight adherence gene cluster, O-antigen biosynthesis and transport gene cluster, leukotoxin gene cluster, and lipooligosaccharide biosynthesis enzyme gene, respectively) correspond to islands 2 to 5 and 8 of strain HK1651 (http://www.oralgen.lanl.gov/) (5). Island F (∼5 kb) is homologous to a portion of the 12.5-kb island 7 in HK1651. Five genomic islands (H to L) are unique to strain D11S-1. The remaining island (A) is a fusion of genomic islands 1 and 6 of strain HK1651.
The genome of D11S-1 is largely syntenic with the genome of the sequenced serotype b strain HK1651 but contains several large-scale genomic rearrangements.

Strain D11S-1 harbors a 43-kb bacteriophage and two plasmids of 31 and 23 kb (http://expression.washington.edu/bumgarnerlab/publications.php). Excluding a ∼9-kb region of low homology, the phage showed >90% nucleotide sequence identity with AaΦ23 (11). A 49-bp attB site (11) was identified at coordinates 2,024,825 to 2,024,873. The location of the inserted phage was identified in the optical map of strain D11S-1 and further confirmed by PCR amplification and sequencing of the regions flanking the insertion site. A closed circular form of the phage was also detected in strain D11S-1 by PCR analysis of the phage ends. The 23-kb plasmid is homologous to pVT745 (92% nucleotide identity). The 31-kb plasmid is novel; it shares significant homology over short regions (<2 kb) with Haemophilus influenzae biotype aegyptius plasmid pF1947 and other plasmids.
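Two of the checks described above, the GC content and an in silico NcoI map of a circular chromosome, can be sketched as follows. This is not the authors' pipeline: it assumes the genome is a plain string, uses NcoI's recognition site C^CATGG (cut after the first C), and the function names are hypothetical. Predicted fragment sizes can then be compared with an optical map to catch misassemblies.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def circular_fragments(seq, site="CCATGG", cut_offset=1):
    """Fragment lengths from cutting a circular sequence at every site occurrence.

    NcoI recognizes C^CATGG, so each cut falls `cut_offset` = 1 nt into the site.
    """
    seq = seq.upper()
    n = len(seq)
    extended = seq + seq[:len(site) - 1]  # catch sites spanning the origin
    cuts = sorted({(p + cut_offset) % n
                   for p in range(n) if extended.startswith(site, p)})
    if len(cuts) <= 1:
        return [n]  # zero or one cut leaves a single molecule of full length
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(n - cuts[-1] + cuts[0])  # fragment that wraps the origin
    return frags
```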

19.
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold introduces a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to traditional methods when trained and tested on distinct RNA families. UFold can also predict pseudoknots accurately. Its prediction is fast, with an inference time of about 160 ms per sequence for sequences up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
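An image-like representation of an RNA sequence can be built as an L × L grid of per-pair channels. The sketch below assumes 16 channels marking which pair of bases (from A, U, C, G) occurs at positions i and j, plus one extra channel; here the extra channel is a simple binary mask of canonical pairs (A-U, G-C, G-U) separated by more than three positions, a stand-in that is not necessarily the feature UFold uses. Function and constant names are hypothetical.

```python
BASES = "AUCG"
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_channels(seq, min_loop=3):
    """L x L x 17 nested-list tensor: 16 base-pair indicator channels + 1 mask.

    Channel 4 * BASES.index(a) + BASES.index(b) is 1 at (i, j) when seq[i] == a
    and seq[j] == b; channel 16 marks canonical pairs more than `min_loop` apart.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    tensor = [[[0.0] * 17 for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(seq):
        for j, b in enumerate(seq):
            if a in BASES and b in BASES:
                tensor[i][j][4 * BASES.index(a) + BASES.index(b)] = 1.0
            if (a, b) in CANONICAL and abs(i - j) > min_loop:
                tensor[i][j][16] = 1.0
    return tensor
```

Because every position pair becomes a "pixel", a fully convolutional network can predict a pairing score for each (i, j) cell directly.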

20.
Functional protein annotation is an important matter for in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. No single method performs best in all functional classification problems, because the information obtained from any of these features depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We present a method to form positive and negative training examples that takes into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).
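GOPred's exact rules for building training sets are not spelled out above. The sketch below shows one common scheme under stated assumptions: positives are proteins annotated to the term or any of its descendants, negatives are proteins annotated to none of the term, its descendants or its ancestors (proteins annotated only to ancestors are ambiguous and skipped), and electronic (IEA) annotations are excluded. The function names and the default excluded code set are hypothetical choices.

```python
from collections import defaultdict


def reachable(start, edges):
    """All nodes reachable from `start` by following `edges` (a dict of sets)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def training_examples(term, parents, annotations, excluded_codes=frozenset({"IEA"})):
    """Split annotated proteins into positives and negatives for one GO term.

    parents: dict mapping a GO term to the set of its parent terms.
    annotations: iterable of (protein, go_term, evidence_code) triples.
    """
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    positive_terms = {term} | reachable(term, children)  # term and descendants
    ancestor_terms = reachable(term, parents)
    by_protein = defaultdict(set)
    for protein, go_term, code in annotations:
        if code not in excluded_codes:  # drop e.g. electronic annotations
            by_protein[protein].add(go_term)
    positives = {p for p, ts in by_protein.items() if ts & positive_terms}
    # Proteins annotated only to ancestors are ambiguous, so they are skipped.
    negatives = {p for p, ts in by_protein.items()
                 if not (ts & (positive_terms | ancestor_terms))}
    return positives, negatives
```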

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417