Similar Documents
20 similar documents found (search time: 15 ms)
1.
The computer program LUDI for automated structure-based drug design is described. The program constructs possible new ligands for a given protein of known three-dimensional structure. This novel approach is based upon rules about energetically favourable non-bonded contact geometries between functional groups of the protein and the ligand, which are derived from a statistical analysis of crystal packings of organic molecules. In a first step, small fragments are docked into the protein binding site in such a way that hydrogen bonds and ionic interactions can be formed with the protein and hydrophobic pockets are filled with lipophilic groups of the ligands. The program can then append further fragments onto a previously positioned fragment or onto an already existing ligand (e.g., a lead structure that one seeks to improve). It is also possible to link several fragments together by bridge fragments to form a complete molecule. All putative ligands retrieved or constructed by LUDI are scored. We use a simple scoring function that was fitted to experimentally determined binding constants of protein–ligand complexes. LUDI is a very fast program, with typical execution times of 1–5 min on a workstation, and is therefore suitable for interactive usage.
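The abstract does not reproduce the scoring function itself. As a rough illustration of what an empirical score of this kind looks like (additive terms for hydrogen bonds, ionic contacts, lipophilic contact area and ligand flexibility, with weights fitted to measured binding constants), consider the following Python sketch; every weight and term definition here is an illustrative assumption, not LUDI's published parameterization.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One protein-ligand contact with its geometry quality in [0, 1]."""
    kind: str               # "hbond" or "ionic"
    geometry_factor: float  # 1.0 = ideal distance/angle, 0.0 = worst

def score_ligand(interactions, lipo_area, n_rot_bonds):
    """Toy empirical binding score (higher = better predicted binding).

    All weights are illustrative placeholders, not LUDI's fitted values.
    """
    W_HBOND, W_IONIC = 1.2, 1.7   # per ideal polar contact (assumed)
    W_LIPO, W_ROT = 0.04, -0.3    # per A^2 buried lipophilic area / per rotor
    score = 0.0
    for it in interactions:
        w = W_HBOND if it.kind == "hbond" else W_IONIC
        score += w * it.geometry_factor   # degrade non-ideal geometries
    score += W_LIPO * lipo_area           # reward filled hydrophobic pockets
    score += W_ROT * n_rot_bonds          # penalize ligand flexibility
    return score

print(score_ligand([Interaction("hbond", 0.9), Interaction("ionic", 0.8)],
                   lipo_area=120.0, n_rot_bonds=4))
```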

2.
3.

Background

Single-cell RNA sequencing (scRNA-Seq) is an emerging technology that has revolutionized research on tumor heterogeneity. However, the highly sparse data matrices generated by the technology have posed an obstacle to the analysis of differential gene regulatory networks.

Results

To address these challenges, this study presents what is, to our knowledge, the first bioinformatics tool for scRNA-Seq-based differential network analysis (scdNet). The tool features a sample-size adjustment of gene–gene correlation, comparison of inter-state correlations, and construction of differential networks. A simulation analysis demonstrated the power of scdNet in analyzing sparse scRNA-Seq data matrices, with low sample-size requirements, high computational efficiency, and tolerance of sequencing noise. Applying the tool to analyze two datasets of single circulating tumor cells (CTCs) of prostate cancer and early mouse embryos, our data demonstrated that differential gene regulation plays crucial roles in anti-androgen resistance and early embryonic development.
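The abstract does not spell out the statistics. A textbook way to compare a gene pair's correlation between two cell states with an explicit sample-size adjustment is Fisher's z-transformation, sketched below; this is a plausible reading of the description, not scdNet's actual MATLAB implementation.

```python
import numpy as np
from scipy import stats

def differential_correlation_z(x1, y1, x2, y2):
    """Compare the Pearson correlation of a gene pair between two states.

    Uses Fisher's z-transform; the 1/(n-3) variance term is where the
    sample size enters. A generic method, not scdNet itself.
    """
    r1 = np.corrcoef(x1, y1)[0, 1]
    r2 = np.corrcoef(x2, y2)[0, 1]
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (z1 - z2) / se
    return z, 2 * stats.norm.sf(abs(z))   # two-sided p-value

rng = np.random.default_rng(0)
x = rng.normal(size=80)
z, p = differential_correlation_z(
    x, x + rng.normal(size=80),                    # correlated state
    rng.normal(size=80), rng.normal(size=80))      # uncorrelated state
print(f"z = {z:.2f}, p = {p:.3g}")
```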

Conclusions

Overall, the tool is widely applicable to datasets generated by this emerging technology, bringing biological insights into tumor heterogeneity and other studies. A MATLAB implementation of scdNet is available at https://github.com/ChenLabGCCRI/scdNet.

4.
ACUA: a software tool for automated codon usage analysis
Currently available codon usage analysis tools lack an intuitive graphical user interface and are limited to built-in calculations. ACUA (Automated Codon Usage Analysis) has been developed to perform high-throughput sequence analysis, aiding statistical profiling of codon usage. The results of ACUA are presented in a spreadsheet with all prerequisite codon usage data required for statistical analysis, displayed in a graphical interface. The package is also capable of on-click sequence retrieval from the results interface, a feature unique to ACUA. AVAILABILITY: The package is available for non-commercial purposes and can be downloaded from: http://www.bioinsilico.com/acua.
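For readers unfamiliar with the statistics such a spreadsheet would feed, relative synonymous codon usage (RSCU) is one of the standard per-codon quantities; the sketch below computes it for a coding sequence. The codon table is deliberately truncated, and this is a generic calculation rather than ACUA's own code.

```python
from collections import Counter

# Codon table for a few amino acids (illustrative subset; extend as needed).
SYNONYMS = {
    "F": ["TTT", "TTC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
}

def rscu(cds):
    """Relative synonymous codon usage: observed count divided by the
    count expected if all synonymous codons were used equally."""
    codons = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    out = {}
    for aa, syn in SYNONYMS.items():
        total = sum(codons[c] for c in syn)
        if total:
            for c in syn:
                out[c] = codons[c] * len(syn) / total
    return out

print(rscu("TTTTTCTTTAAAAAGCTG"))   # e.g. RSCU(TTT) = 2 * 2/3 = 1.33
```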

5.
Crystalline bacterial cell surface layers (S-layers) have been identified in a great number of different species of bacteria and represent an almost universal feature of archaea. Isolated native S-layer proteins and S-layer fusion proteins incorporating functional sequences self-assemble into monomolecular crystalline arrays in suspension, on a great variety of solid substrates and on various lipid structures including planar membranes and liposomes. S-layers have proven to be particularly suited as building blocks and patterning elements in a biomolecular construction kit involving all major classes of biological molecules (proteins, lipids, glycans, nucleic acids and combinations of them) enabling innovative approaches for the controlled 'bottom-up' assembly of functional supramolecular structures and devices. Here, we review the basic principles of S-layer proteins and the application potential of S-layers in nanobiotechnology and biomimetics including life and nonlife sciences.

6.
Kopf E, Shnitzer D, Zharhary D. Proteomics 2005, 5(9):2412–2416
Antibody arrays are a promising new tool for mass analysis of protein-level changes in cells responding to different stimuli. Here we describe a novel antibody array system, called Panorama Ab Microarray Cell Signaling, that contains 224 antibodies spotted on FAST nitrocellulose-coated slides and can detect protein levels as low as a few nanograms per mL. The spotted antibodies are specific for proteins important in various areas of cell signaling, such as phosphorylation, cell cycle, apoptosis, nuclear signaling and cytoskeleton proteins. Furthermore, for some of the protein targets, the Panorama Ab Microarray can detect phosphorylated and nonphosphorylated forms of the target protein. We found that treatment of the slides post-spotting is important for the array performance (ratio of signal to background) and its stability. The Panorama Ab Microarray was used to analyze changes in protein expression in F9 embryonic carcinoma cells stimulated to differentiate by all-trans retinoic acid. We found that the levels of several proteins, among them cell cycle regulators and kinases, were either up- or down-regulated. For more than ten protein targets, the results obtained by the Panorama Ab Microarray were confirmed by immunoblotting.

7.
Angiogenesis is the generation of mature vascular networks from pre-existing vessels. Angiogenesis is crucial during the organism's development, for wound healing and for the female reproductive cycle. Several murine experimental systems are well suited for studying developmental and pathological angiogenesis. They include the embryonic hindbrain, the post-natal retina and allantois explants. In these systems, vascular networks are visualised by appropriate staining procedures followed by microscopical analysis. Nevertheless, quantitative assessment of angiogenesis is hampered by the lack of readily available, standardized metrics and software analysis tools. Non-automated protocols are widely used and are, in general, time- and labour-intensive, prone to human error and do not permit computation of complex spatial metrics. We have developed a lightweight, user-friendly software tool, AngioTool, which allows quick, hands-off and reproducible quantification of vascular networks in microscopic images. AngioTool computes several morphological and spatial parameters including the area covered by a vascular network, the number of vessels, vessel length, vascular density and lacunarity. In addition, AngioTool calculates the so-called "branching index" (branch points per unit area), providing a measurement of the sprouting activity of a specimen of interest. We have validated AngioTool using images of embryonic murine hindbrains, post-natal retinas and allantois explants. AngioTool is open source and can be downloaded free of charge.
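The "branching index" quoted above is branch points per unit area. A minimal sketch of that metric on a skeletonized binary vessel mask follows; the neighbour-counting rule for branch points is a common convention, assumed here rather than taken from AngioTool's source.

```python
import numpy as np
from scipy import ndimage

def branching_index(skeleton, pixel_area=1.0):
    """Branch points per unit area on a skeletonized binary vessel mask.

    A skeleton pixel counts as a branch point when it has three or more
    8-connected skeleton neighbours (a common convention, not necessarily
    AngioTool's exact rule).
    """
    skel = skeleton.astype(bool)
    kernel = np.ones((3, 3), int)
    kernel[1, 1] = 0                       # count neighbours, not the pixel
    neighbours = ndimage.convolve(skel.astype(int), kernel,
                                  mode="constant", cval=0)
    branch_points = np.logical_and(skel, neighbours >= 3).sum()
    return branch_points / (skel.size * pixel_area)

# A tiny Y-shaped skeleton: one branch point at the fork.
y = np.zeros((5, 5), int)
for r, c in [(0, 0), (1, 1), (0, 4), (1, 3), (2, 2), (3, 2), (4, 2)]:
    y[r, c] = 1
print(branching_index(y))   # 1 branch point / 25 pixels = 0.04
```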

8.
9.
Phylomat: an automated protein motif analysis tool for phylogenomics
Recent progress in genomics, proteomics, and bioinformatics enables unprecedented opportunities to examine the evolutionary history of molecular, cellular, and developmental pathways through phylogenomics. Accordingly, we have developed a motif analysis tool for phylogenomics (Phylomat, http://alg.ncsa.uiuc.edu/pmat) that scans predicted proteome sets for proteins containing highly conserved amino acid motifs or domains for in silico analysis of the evolutionary history of these motifs/domains. Phylomat enables the user to download results as full-protein or extracted motif/domain sequences from each protein. Tables containing the percent distribution of a motif/domain in organisms, normalized to proteome size, are displayed. Phylomat can also align the set of full-protein or extracted motif/domain sequences and predict a neighbor-joining tree from relative sequence similarity. Overall, Phylomat serves as a user-friendly data-mining tool for the phylogenomic analysis of conserved sequence motifs/domains in annotated proteomes from the three domains of life.
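A toy version of the central bookkeeping, scanning each proteome for a motif and normalizing hits to proteome size as the abstract describes, might look like this; the motif, sequences and organisms are made up for illustration, and this is not Phylomat's pipeline.

```python
import re

def motif_distribution(proteomes, motif):
    """Percent of proteins in each proteome containing a regex motif,
    i.e. a hit count normalized to proteome size."""
    pattern = re.compile(motif)
    table = {}
    for organism, proteins in proteomes.items():
        hits = sum(1 for seq in proteins.values() if pattern.search(seq))
        table[organism] = 100.0 * hits / len(proteins)
    return table

proteomes = {
    "E. coli":       {"p1": "MKTAYIAKQR", "p2": "MSGSGSGKDE"},
    "S. cerevisiae": {"q1": "MTEYKLVVVG", "q2": "MKKSGSGLLL"},
}
# Scan for an internal "SGSG" linker-like motif (illustrative).
print(motif_distribution(proteomes, r"SGSG"))
```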

10.
Identification of specific protein phosphorylation sites provides predictive signatures of cellular activity and specific disease states such as cancer, diabetes, Alzheimer disease, and rheumatoid arthritis. Recent progress in phosphopeptide isolation technology and tandem mass spectrometry has provided the means to identify thousands of phosphorylation sites from a single biological sample. These advances now make it possible to profile global changes in the phosphoproteome at an unprecedented level. However, although this technology is generating a wealth of information, there is currently no efficient means to identify phosphoprotein signatures shared among large phosphoprotein databases. Identification of common phosphoprotein signatures found in biologically relevant systems and their conservation throughout evolution would provide valuable insight into mechanisms of signal transduction and cell function. Here we describe the development of a computational program (PhosphoBlast) that can rapidly match thousands of phosphopeptides that share phosphorylation sites within and across species. PhosphoBlast analysis of several large phosphoprotein datasets from the literature revealed common phosphorylation signatures shared across diverse experimental platforms and species. Moreover, PhosphoBlast is a powerful analysis tool to identify specific phosphosite mutations. Comparison of the mouse and human phosphoproteomes revealed more than 130 specific phosphoamino acid mutations, some of which are predicted to alter protein function. Further analysis revealed that known phosphorylated amino acids are more evolutionarily conserved than the Ser/Thr/Tyr amino acids not known to be phosphorylated. Together, our results demonstrate that PhosphoBlast is a versatile mining tool capable of identifying related phosphorylation signatures and phosphoamino acid mutations among complex proteomics datasets in a highly efficient and accurate manner. PhosphoBlast will aid in the informatics analysis of the phosphoproteome and the identification of phosphoprotein biomarkers of disease.
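The cross-species phosphosite-mutation comparison can be pictured with a small sketch: given two aligned orthologous sequences and the known phosphosites in one of them, flag positions where the other species carries a different residue. The alignment step is assumed to be done, the sequences are invented, and PhosphoBlast's own matching is far more general than this.

```python
def phosphosite_mutations(seq_a, seq_b, sites_a):
    """Given two aligned orthologous sequences (equal length, gaps as '-')
    and 0-based phosphosite positions in species A, report sites where
    species B carries a different residue: candidate phosphosite
    mutations in the sense of the abstract."""
    mutations = []
    for pos in sites_a:
        a, b = seq_a[pos], seq_b[pos]
        if a in "STY" and b != a:
            mutations.append((pos, a, b))
    return mutations

mouse = "MAGSSTYKLRRSQSFF"
human = "MAGSSAYKLRRSQSFF"   # T->A at position 5 (illustrative)
print(phosphosite_mutations(mouse, human, sites_a=[5, 13]))
# [(5, 'T', 'A')]; position 13 (S) is conserved
```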

11.
SUMMARY: DNAfan (DNA Feature ANalyzer) is a tool combining sequence filtering and pattern searching. DNAfan automatically extracts user-defined sets of sequence fragments from large sequence sets. Fragments are defined by annotated gene feature keys and co-occurring or non-occurring patterns within the feature or close to it. A gene feature parser and a pattern-based filter tool localize and extract the specific subset of sequences. The selected sequence data can subsequently be retrieved for analyses or further processed with DNAfan to find the occurrence of specific patterns or structural motifs. DNAfan is a powerful tool for pattern analysis. Its filtering features restrict the pattern search to a well-defined set of sequences, allowing a drastic reduction in false-positive hits. AVAILABILITY: http://bighost.ba.itb.cnr.it:8080/Framework.
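The filter-then-search idea reduces to restricting a pattern scan to annotated feature ranges. A minimal sketch, assuming feature parsing has already produced (key, start, end) tuples; this is not DNAfan's own parser or coordinate convention.

```python
import re

def feature_restricted_search(seq, features, feature_key, pattern, flank=0):
    """Search a regex only inside features of a given key, plus optional
    flanking bases. Coordinates are 0-based, half-open."""
    hits = []
    for key, start, end in features:
        if key != feature_key:
            continue
        lo, hi = max(0, start - flank), min(len(seq), end + flank)
        for m in re.finditer(pattern, seq[lo:hi]):
            hits.append((lo + m.start(), m.group()))   # global coordinates
    return hits

seq = "AAATATAATGGCGTATAAGGGCCCTTT"
features = [("promoter", 2, 10), ("CDS", 10, 24)]
print(feature_restricted_search(seq, features, "promoter", "TATAA"))
# [(3, 'TATAA')]; the TATAA inside the CDS is ignored
```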

12.
In the following work we discuss the application of image processing and pattern recognition to the field of quantitative phycology. We overview the area of image processing and review previously published literature pertaining to the image analysis of phycological images and, in particular, cyanobacterial image processing. We then discuss the main operations used to process images and quantify data contained within them. To demonstrate the utility of image processing for cyanobacteria classification, we present details of an image analysis system for automatically detecting and classifying several cyanobacterial taxa of Lake Biwa, Japan. Specifically, we initially target the genus Microcystis for detection and classification from among several species of Anabaena. We subsequently extend the system to classify a total of six cyanobacteria species. High-resolution microscope images containing a mix of the above species and other nontargeted objects are analyzed, and any detected objects are removed from the image for further analysis. Following image enhancement, we measure object properties and compare them to a previously compiled database of species characteristics. Classification of an object as belonging to a particular class membership (e.g., "Microcystis", "A. smithii", "Other", etc.) is performed using parametric statistical methods. Leave-one-out classification results suggest a system error rate of approximately 3%. Received: September 6, 1999 / Accepted: February 6, 2000
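The classification scheme described, parametric statistics on measured object properties evaluated by leave-one-out, corresponds to something like linear discriminant analysis under Gaussian class models. The sketch below reproduces that evaluation loop on synthetic feature vectors; the features and data are placeholders, not the Lake Biwa measurements.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Toy stand-in for measured object properties (area, elongation, ...):
# two Gaussian classes, echoing the parametric classifier plus
# leave-one-out evaluation described in the abstract.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 4)),
               rng.normal(2.5, 1.0, (40, 4))])
y = np.array([0] * 40 + [1] * 40)   # e.g. Microcystis vs. other

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(f"leave-one-out error rate: {1 - acc.mean():.1%}")
```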

13.

Background

Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis.

Results

Here we have developed a web application called SVAw (Surrogate Variable Analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software is built on the open-source Bioconductor sva package. In our software, we have extended the SVA program's functionality in three respects: (i) SVAw performs a fully automated and user-friendly analysis workflow; (ii) it calculates probe/gene statistics for both pre- and post-SVA analysis and provides a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file, including a graphical comparison of the outcome for the user.
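Point (ii), regressing expression on the primary variable before and after adjusting for surrogate variables, can be illustrated with ordinary least squares. The sketch assumes the surrogate variables have already been estimated (that estimation is what the underlying sva package performs) and uses simulated data.

```python
import numpy as np

def primary_effect(expr, primary, svs=None):
    """OLS coefficient of the primary variable for one gene's expression,
    optionally adjusting for surrogate variables (SVs) as covariates.
    Estimating the SVs themselves is the sva package's job, not shown."""
    cols = [np.ones_like(primary), primary]
    if svs is not None:
        cols.extend(np.asarray(svs).T)      # one design column per SV
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return beta[1]                          # coefficient of `primary`

rng = np.random.default_rng(0)
n = 60
group = rng.integers(0, 2, n).astype(float)      # primary variable
hidden = rng.normal(size=n)                      # unmeasured factor
expr = 0.5 * group + 2.0 * hidden + rng.normal(0, 0.3, n)
print(primary_effect(expr, group))                    # noisy, unadjusted
print(primary_effect(expr, group, svs=hidden[:, None]))  # near the true 0.5
```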

Conclusions

SVAw is a freely accessible, web-based solution for surrogate variable analysis of high-throughput datasets that facilitates removing unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. Executable packages for both the web and standalone applications, along with installation instructions, can be downloaded from our website.

14.
Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes, clean and contaminant, using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). The procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.

Recent technological advancements have enabled the large-scale sampling of genomes from uncultured microbial taxa, through the high-throughput sequencing of single amplified genomes (SAGs; Rinke et al., 2013; Swan et al., 2013) and assembly and binning of genomes from metagenomes (GMGs; Cuvelier et al., 2010; Sharon and Banfield, 2013). The importance of these products in assessing community structure and function has been established beyond doubt (Kalisky and Quake, 2011). Multiple Displacement Amplification (MDA) and sequencing of single cells have been immensely successful in capturing rare and novel phyla, generating valuable references for phylogenetic anchoring. However, efforts to conduct MDA and sequencing in a high-throughput manner have been heavily impaired by contamination from DNA introduced by the environmental sample, as well as introduced during the MDA or sequencing process (Woyke et al., 2011; Engel et al., 2014; Field et al., 2014). Similarly, metagenome binning and assembly often carry various errors and artifacts depending on the methods used (Nielsen et al., 2014). Even cultured isolate genomes have been shown to lack immunity to contamination with other species (Parks et al., 2014; Mukherjee et al., 2015). As sequencing of these genome product types rapidly increases, contaminant sequences are finding their way into public databases as reference sequences. It is therefore extremely important to define standardized and automated protocols for quality control and decontamination, which would go a long way towards establishing quality standards for all microbial genome product types.

Current procedures for decontamination and quality control of genome sequences in single cells and metagenome bins are heavily manual and can consume hours per megabase when performed by expert biologists. Supervised decontamination typically involves homology-based inspection of ribosomal RNA sequences and protein-coding genes, as well as visual analysis of k-mer frequency plots and guanine–cytosine content (Clingenpeel, 2015). Manual decontamination is also possible through the software SmashCell (Harrington et al., 2010), which contains a tool for visual identification of contaminants from a self-organizing map and corresponding U-matrix. Another existing software tool, DeconSeq (Schmieder and Edwards, 2011), automatically removes contaminant sequences; however, the contaminant databases are required input.
The former lacks automation, whereas the latter requires prior knowledge of contaminants, rendering both applications impractical for high-throughput decontamination.

Here, we introduce ProDeGe, the first fully automated computational protocol for decontamination of genomes. ProDeGe uses a combination of homology-based and sequence composition-based approaches to separate contaminant sequences from the target genome draft. It has been pre-calibrated to discard at least 84% of the contaminant sequence, which results in retention of a median 84% of the target sequence. The standalone software is freely available at http://prodege.jgi-psf.org//downloads/src and can be run on any system that has Perl, R (R Core Team, 2014), Prodigal (Hyatt et al., 2010) and NCBI Blast (Camacho et al., 2009) installed. A graphical viewer allowing further exploration of data sets and exporting of contigs accompanies the web application for ProDeGe at http://prodege.jgi-psf.org, which is open to the wider scientific community as a decontamination service (Supplementary Figure S1).

The assembly and corresponding NCBI taxonomy of the data set to be decontaminated are required inputs to ProDeGe (Figure 1a). Contigs are annotated with genes, following which eukaryotic contamination is removed based on homology of genes at the nucleotide level, using the eukaryotic subset of NCBI's Nucleotide database as the reference. For detecting prokaryotic contamination, a curated database of reference contigs from the set of high-quality genomes within the Integrated Microbial Genomes (IMG; Markowitz et al., 2014) system is used as the reference. This ensures that errors in public reference databases due to poor quality of sequencing, assembly and annotation do not negatively impact the decontamination process. Contigs determined as belonging to the target organism based on nucleotide-level homology to sequences in the above database are defined as 'Clean', whereas those aligned to other organisms are defined as 'Contaminant'. Contigs whose origin cannot be determined based on alignment are classified as 'Undecided'. Classified clean and contaminated contigs are used to calibrate the separation in the subsequent 5-mer-based binning module, which classifies undecided contigs as 'Clean' or 'Contaminant' using principal components analysis (PCA) of 5-mer frequencies. This parameter can also be specified by the user. When data sets do not have taxonomy deeper than phylum level, or a single confident taxonomic bin cannot be detected using sequence alignment, solely 9-mer-based binning is used due to more accurate overall classification. In the absence of a user-defined cutoff, a pre-calibrated cutoff for 80% or more specificity separates the clean contigs from contaminated sequences in the resulting PCA of the 9-mer frequency matrix. Details on ProDeGe's custom database, evaluation of the performance of the system and exploration of the parameter space to calibrate ProDeGe for a highly accurate classification rate are provided in the Supplementary Material.

Figure 1. (a) Schematic overview of the ProDeGe engine. (b) Features of data sets used to validate ProDeGe: SAGs from the Arabidopsis endophyte sequencing project, the MDM project, public data sets found in IMG but not sequenced at the JGI, as well as genomes from metagenomes.
All the data and results can be found in Supplementary Table S3.

The performance of ProDeGe was evaluated using 182 manually screened SAGs (Figure 1b, Supplementary Table S1) from two studies whose data sets are publicly available within the IMG system: genomes of 107 SAGs from an Arabidopsis endophyte sequencing project and 75 SAGs from the Microbial Dark Matter (MDM) project (only 75/201 SAGs from the MDM project had 1:1 mapping between contigs in the unscreened and the manually screened versions, hence these were used; Rinke et al., 2013). Manual curation of these SAGs demonstrated that the use of ProDeGe prevented 5311 potentially contaminated contigs in these data sets from entering public databases. Figure 2a shows the sensitivity vs specificity plot of ProDeGe results for the above data sets. Most of the data points in Figure 2a cluster in the top right of the box, reflecting a median retention of 89% of the clean sequence (sensitivity) and a median rejection of 100% of the sequence of contaminant origin (specificity). In addition, on average, 84% of the bases of a data set are accurately classified. ProDeGe performs best when the target organism has sequenced homologs at the class level or deeper in its high-quality prokaryotic nucleotide reference database. If the target organism's taxonomy is unknown or not deeper than domain level, or there are few contigs with taxonomic assignments, a target bin cannot be assessed and thus ProDeGe removes contaminant contigs using sequence composition only. The few samples in Figure 2a that demonstrate a higher rate of false positives (lower specificity) and/or reduced sensitivity typically occur when the data set contains few contaminant contigs or ProDeGe incorrectly assumes that the largest bin is the target bin. Some data sets contain a higher proportion of contamination than target sequence, and ProDeGe's performance can suffer under this condition. However, under all other conditions, ProDeGe demonstrates high speed, specificity and sensitivity (Figure 2). In addition, ProDeGe demonstrates better performance in overall classification when nucleotides are considered than when contigs are considered, illustrating that longer contigs are more accurately classified (Supplementary Table S1).

Figure 2. ProDeGe accuracy and performance scatterplots of 182 manually curated single amplified genomes (SAGs), where each symbol represents one SAG data set. (a) Accuracy shown by sensitivity (proportion of bases confirmed 'Clean') vs specificity (proportion of bases confirmed 'Contaminant') from the Endophyte and Microbial Dark Matter (MDM) data sets. Symbol size reflects input data set size in megabases. Most points cluster in the top right of the plot, showing ProDeGe's high accuracy. Median and average overall results are shown in Supplementary Table S1. (b) ProDeGe completion time in central processing unit (CPU) core hours for the 182 SAGs. ProDeGe operates successfully at an average rate of 0.30 CPU core hours per megabase of sequence. Principal components analysis (PCA) of a 9-mer frequency matrix costs more computationally than PCA of a 5-mer frequency matrix used with blast-binning. The lack of known taxonomy for the MDM data sets prevents blast-binning, thus showing longer finishing times than the endophyte data sets, which have known taxonomy for use in blast-binning.

All SAGs used in the evaluation of ProDeGe were assembled using SPAdes (Bankevich et al., 2012).
In-house testing has shown that reads assembled with SPAdes from different strains or even slightly divergent species of the same genera may be combined into the same contig (personal communication, KT and Robert Bowers). Ideally, the DNA in a well that gets sequenced belongs to a single cell. In the best case, contaminant sequences need to be at least from a different species to be recognized as such by the homology-based screening stage. In the absence of closely related sequenced organisms, contaminant sequences need to be at least from a different genus to be recognized as such by the composition-based screening stage (Supplementary Material). Thus, there is little risk of ProDeGe separating sequences from clonal populations or strains. We have found species- and genus-level contamination in MDA samples to be rare.

To evaluate the quality of publicly available uncultured genomes, ProDeGe was used to screen 185 SAGs and 14 GMGs (Figure 1b). Compared with CheckM (Parks et al., 2014), a tool which calculates an estimate of genome sequence contamination using marker genes, ProDeGe generally marks a higher proportion of sequence as 'Contaminant' (Supplementary Table S2). This is because ProDeGe has been calibrated to perform at high specificity levels. The command-line version of ProDeGe allows users to conduct their own calibration and specify a user-defined distance cutoff. Further, CheckM only outputs the proportion of contamination, but ProDeGe actually labels each contig as 'Clean' or 'Contaminant' during the process of automated removal.

The web application for ProDeGe allows users to export clean and contaminant contigs, examine contig gene calls with their corresponding taxonomies, and discover contig clusters in the first three components of their k-dimensional space. Non-linear approaches for dimensionality reduction of k-mer vectors are gaining popularity (van der Maaten and Hinton, 2008), but we observed no systematic advantage of using t-Distributed Stochastic Neighbor Embedding over PCA (Supplementary Figure S2).

ProDeGe is the first step towards establishing a standard for quality control of genomes from both cultured and uncultured microorganisms. It is valuable for preventing the dissemination of contaminated sequence data into public databases, avoiding resulting misleading analyses. The fully automated nature of the pipeline relieves scientists of hours of manual screening, producing reliably clean data sets and enabling the high-throughput screening of data sets for the first time. ProDeGe, therefore, represents a critical component in our toolkit during an era of next-generation DNA sequencing and cultivation-independent microbial genomics.
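The composition-based stage, PCA of k-mer frequency vectors, is easy to sketch. The following toy version builds 5-mer profiles for a set of contigs and projects them onto principal components, where target and contaminant contigs of sufficiently different composition separate; it is a bare illustration of the idea, not ProDeGe's calibrated pipeline.

```python
import numpy as np
from itertools import product

def kmer_profile(seq, k=5):
    """Normalized k-mer frequency vector for one contig."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:          # skip k-mers containing N, etc.
            v[j] += 1
    return v / max(v.sum(), 1.0)

def pca_coords(profiles, n_components=3):
    """Project contig k-mer profiles onto principal components via SVD."""
    X = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T

rng = np.random.default_rng(0)
def contig(gc, n=2000):
    # Random sequence with a given GC content (base order: A, C, G, T).
    return "".join(rng.choice(list("ACGT"), size=n,
                              p=[(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]))

contigs = [contig(0.35) for _ in range(5)] + [contig(0.65) for _ in range(5)]
coords = pca_coords(np.array([kmer_profile(c) for c in contigs]))
print(coords[:, 0].round(3))   # the two composition groups separate on PC1
```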

15.
Because most macroecological and biodiversity data are spatially autocorrelated, special tools for describing spatial structures and dealing with hypothesis testing are usually required. Unfortunately, most of these methods have not been available in a single statistical package. Consequently, using these tools is still a challenge for most ecologists and biogeographers. In this paper, we present sam (Spatial Analysis in Macroecology), a new, easy-to-use, freeware package for spatial analysis in macroecology and biogeography. Through an intuitive, fully graphical interface, this package allows the user to describe spatial patterns in variables and provides an explicit spatial framework for standard techniques of regression and correlation. Moran's I autocorrelation coefficient can be calculated based on a range of matrices describing spatial relationships, for original variables as well as for residuals of regression models, which can also include filtering components (obtained by standard trend surface analysis or by principal coordinates of neighbour matrices). sam also offers tools for correcting the number of degrees of freedom when calculating the significance of correlation coefficients. Explicit spatial modelling using several forms of autoregression and generalized least-squares models is also available. We believe this new tool will provide researchers with the basic statistical tools to resolve autocorrelation problems and, simultaneously, to explore spatial components in macroecological and biogeographical data. Although the program was designed primarily for applications in macroecology and biogeography, most of sam's statistical tools will be useful for all kinds of surface-pattern spatial analysis. The program is freely available at http://www.ecoevol.ufg.br/sam (permanent URL at http://purl.oclc.org/sam/).
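Moran's I, the coefficient sam exposes for a range of spatial weight matrices, is I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean(x) and W is the sum of all weights. A direct, generic sketch (not sam's implementation):

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I spatial autocorrelation coefficient.

    `weights` is an n x n spatial weight matrix with a zero diagonal;
    positive I indicates that neighbouring sites have similar values.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    n = len(x)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Four sites on a line, rook neighbours, smoothly increasing values:
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(morans_i([1.0, 2.0, 3.0, 4.0], w))   # 0.333: positive autocorrelation
```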

16.
17.
We describe Abacus, a computational tool for extracting spectral counts from MS/MS data sets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison of gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface, making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy-to-use solution for extracting this information from proteomic data sets for subsequent, more sophisticated statistical analysis.
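One common way to "adjust spectral counts for peptides shared across multiple proteins" is to distribute a shared peptide's counts in proportion to each protein's unique-peptide evidence. The sketch below implements that weighting; Abacus's exact scheme may differ.

```python
from collections import defaultdict

def adjusted_spectral_counts(peptide_counts, peptide_to_proteins):
    """Distribute each shared peptide's spectral counts across its
    proteins, weighted by each protein's unique-peptide evidence."""
    unique = defaultdict(float)
    for pep, prots in peptide_to_proteins.items():
        if len(prots) == 1:
            unique[prots[0]] += peptide_counts[pep]
    counts = defaultdict(float, unique)
    for pep, prots in peptide_to_proteins.items():
        if len(prots) > 1:
            total = sum(unique[p] for p in prots)
            for p in prots:
                # Fall back to an even split if no unique evidence exists.
                share = unique[p] / total if total else 1 / len(prots)
                counts[p] += peptide_counts[pep] * share
    return dict(counts)

counts = {"pepA": 10, "pepB": 4, "pepShared": 6}
mapping = {"pepA": ["P1"], "pepB": ["P2"], "pepShared": ["P1", "P2"]}
print(adjusted_spectral_counts(counts, mapping))
# P1 gets 10 + 6*10/14, P2 gets 4 + 6*4/14
```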

18.
The iTRAQ labeling method combined with shotgun proteomic techniques represents a new dimension in multiplexed quantitation for relative protein expression measurement in different cell states. To expedite the analysis of vast amounts of spectral data, we present a fully automated software package, called Multi-Q, for multiplexed iTRAQ-based quantitation in protein profiling. Multi-Q is designed as a generic platform that can accommodate various input data formats from search engines and mass spectrometer manufacturers. To calculate peptide ratios, the software automatically processes iTRAQ's signature peaks, including peak detection, background subtraction, isotope correction, and normalization to remove systematic errors. Furthermore, Multi-Q allows users to define their own data-filtering thresholds based on semiempirical values or statistical models, so that the computed fold changes in peptide ratios are statistically significant. This feature facilitates the use of Multi-Q with various instrument types with different dynamic ranges, which is an important aspect of iTRAQ analysis. The performance of Multi-Q was evaluated with a mixture of 10 standard proteins and human Jurkat T cells. The results are consistent with expected protein ratios and thus demonstrate the high accuracy, full automation, and high-throughput capability of Multi-Q as a large-scale quantitative proteomics tool. These features allow rapid interpretation of output from large proteomic datasets without the need for manual validation. Executable Multi-Q files are available for the Windows platform at http://ms.iis.sinica.edu.tw/Multi-Q/.
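The isotope-correction step can be written as a small linear system: observed reporter intensities equal the purity matrix times the true intensities. The sketch below builds a 4-plex purity matrix from an impurity table and solves for corrected ratios; the impurity percentages are illustrative placeholders (real values come from the reagent certificate), and peak detection, background subtraction and normalization are omitted.

```python
import numpy as np

# Illustrative 4-plex isotope-impurity table: percent of each tag's signal
# that leaks -2/-1/+1/+2 Da. Real values come from the reagent data sheet.
impurity = {114: (0.0, 1.0, 5.9, 0.2),
            115: (0.0, 2.0, 5.6, 0.1),
            116: (0.0, 3.0, 4.5, 0.1),
            117: (0.1, 4.0, 3.5, 0.0)}

def correct_and_ratio(observed, reference=114):
    """Solve observed = C @ true for the true reporter intensities, then
    report ratios relative to a reference channel."""
    tags = sorted(impurity)
    n = len(tags)
    C = np.zeros((n, n))
    for j, tag in enumerate(tags):
        m2, m1, p1, p2 = (v / 100 for v in impurity[tag])
        C[j, j] = 1 - (m2 + m1 + p1 + p2)   # signal staying in channel j
        for d, frac in ((-2, m2), (-1, m1), (1, p1), (2, p2)):
            if 0 <= j + d < n:
                C[j + d, j] = frac          # leakage into neighbour channels
    true = np.linalg.solve(C, np.asarray(observed, float))
    ref = true[tags.index(reference)]
    return {t: v / ref for t, v in zip(tags, true)}

print(correct_and_ratio([1000.0, 980.0, 2100.0, 1050.0]))
```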

19.
We present AUDENS, a new platform-independent, open-source tool for automated de novo sequencing of peptides from MS/MS data. We implemented a dynamic programming algorithm and combined it with a flexible preprocessing module designed to distinguish between signal and other peaks. By applying a user-defined set of heuristics, AUDENS screens through the spectrum and assigns high relevance values to putative signal peaks. The algorithm constructs a sequence path through the MS/MS spectrum, using the peak relevances to score each suggested sequence path, i.e., the corresponding amino acid sequence. At present, we consider AUDENS a prototype that unfolds its biggest potential when used in parallel with other de novo sequencing tools. AUDENS is open source and can be downloaded with further documentation at http://www.ti.inf.ethz.ch/pw/software/audens/.
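The dynamic-programming core can be caricatured as a best-path search over putative prefix masses, where an edge exists when two masses differ by an amino-acid residue mass. A bare-bones sketch with a truncated residue-mass table follows; AUDENS's preprocessing and relevance heuristics are much richer than this.

```python
# Monoisotopic residue masses for a few amino acids (Da, truncated table).
AA = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
      "L": 113.08406, "K": 128.09496, "E": 129.04259}
TOL = 0.02  # mass tolerance in Da

def best_sequence(prefix_masses, relevance, total_mass):
    """Dynamic program over putative prefix masses: find the
    highest-relevance chain 0 -> total_mass whose steps match residue
    masses. Unknown peaks default to relevance 1.0."""
    nodes = [0.0] + sorted(prefix_masses) + [total_mass]
    score, back = {0: 0.0}, {}
    for i in range(1, len(nodes)):
        for j in range(i):
            if j not in score:
                continue
            delta = nodes[i] - nodes[j]
            for aa, m in AA.items():
                if abs(delta - m) <= TOL:
                    s = score[j] + relevance.get(round(nodes[i], 2), 1.0)
                    if s > score.get(i, float("-inf")):
                        score[i], back[i] = s, (j, aa)
    i = len(nodes) - 1
    if i not in score:
        return None                 # no complete sequence path found
    seq = []
    while i:
        j, aa = back[i]
        seq.append(aa)
        i = j
    return "".join(reversed(seq))

# Peptide "GAS": prefix masses 57.02, 128.06; total residue mass 215.09.
print(best_sequence([57.02146, 128.05857], {}, 215.09060))   # -> "GAS"
```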

20.
Microsatellites (MSs) are DNA regions consisting of repeated short motif(s). MSs are linked to several diseases and have important biomedical applications. Thus, researchers have developed several computational tools to detect MSs. However, the currently available tools require adjusting many parameters, or depend on a list of motifs or on a library of known MSs. Therefore, two laboratories analyzing the same sequence with the same computational tool may obtain different results due to the user-adjustable parameters. Recent studies have indicated the need for a standard computational tool for detecting MSs. To this end, we applied machine-learning algorithms to develop a tool called MsDetector. The system is based on a hidden Markov model and a general linear model. The user is not obligated to optimize the parameters of MsDetector. Neither a list of motifs nor a library of known MSs is required. MsDetector is memory- and time-efficient. We applied MsDetector to several species. MsDetector located the majority of MSs found by other widely used tools. In addition, MsDetector identified novel MSs. Furthermore, the system has a very low false-positive rate resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.
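For contrast with the motif-free HMM/GLM approach the abstract describes, here is the kind of naive, motif-enumerating regex scanner that MsDetector is designed to improve upon; it illustrates the detection task itself, not MsDetector's method, and every threshold is an arbitrary choice of exactly the sort MsDetector avoids.

```python
import re

def find_microsatellites(seq, min_motif=1, max_motif=6, min_copies=4):
    """Naive tandem-repeat scan: for each motif length k, find k-mers
    repeated at least `min_copies` times in a row."""
    hits = []
    for k in range(min_motif, max_motif + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            hits.append((m.start(), m.end(), motif,
                         (m.end() - m.start()) // k))   # copy number
    return hits

print(find_microsatellites("TTGCACACACACACGGTAGTAGTAGTAGCC"))
# [(3, 13, 'CA', 5), (16, 28, 'TAG', 4)]
```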
