Similar literature
 20 similar articles found (search time: 46 ms)
1.
MOTIVATION: This paper gives a new and efficient algorithm for the sparse logistic regression problem. The proposed algorithm is based on the Gauss-Seidel method and is asymptotically convergent. It is simple and extremely easy to implement; it neither uses any sophisticated mathematical programming software nor needs any matrix operations. It can be applied to a variety of real-world problems like identifying marker genes and building a classifier in the context of cancer diagnosis using microarray data. RESULTS: The gene selection method suggested in this paper is demonstrated on two real-world data sets and the results were found to be consistent with the literature. AVAILABILITY: The implementation of this algorithm is available at the site http://guppy.mpe.nus.edu.sg/~mpessk/SparseLOGREG.shtml SUPPLEMENTARY INFORMATION: Supplementary material is available at the site http://guppy.mpe.nus.edu.sg/~mpessk/SparseLOGREG.shtml
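The abstract above does not spell out its update rule, so the following is only a minimal sketch of one way a Gauss-Seidel-style (one coordinate at a time, always using the latest values) solver for L1-penalized logistic regression can look. It is not the authors' algorithm: the soft-thresholding step, the curvature bound of 1/4, and the names `sparse_logreg`, `penalty` and `sweeps` are all assumptions made for illustration. Labels are assumed to be 0 or 1.

```python
import math

def sigmoid(score):
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zeros give sparsity."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_logreg(rows, labels, penalty, sweeps=200):
    """Cyclic coordinate descent on log-loss + penalty * sum(|w_j|).

    Hypothetical illustration only. Each coordinate step minimizes a
    quadratic upper bound of the loss, so no matrix operations are needed.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    scores = [0.0] * len(rows)  # scores[i] = dot(weights, rows[i])
    for _ in range(sweeps):
        for j in range(n_features):
            # The logistic loss has second derivative <= 1/4 per unit x^2.
            curvature = 0.25 * sum(row[j] ** 2 for row in rows)
            if curvature == 0.0:
                continue
            gradient = sum(
                row[j] * (sigmoid(s) - y)
                for row, s, y in zip(rows, scores, labels)
            )
            new_w = soft_threshold(
                weights[j] - gradient / curvature, penalty / curvature
            )
            delta = new_w - weights[j]
            if delta != 0.0:
                # Gauss-Seidel flavour: later coordinates see this update.
                for i, row in enumerate(rows):
                    scores[i] += delta * row[j]
                weights[j] = new_w
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(1.0, 0.3), (2.0, -0.2), (-1.0, 0.1), (-2.0, -0.1)]
labels = [1, 1, 0, 0]
weights = sparse_logreg(rows, labels, penalty=0.5)
```

On this toy data the noise feature keeps a weight of exactly zero, which is the behaviour that makes a sparse model usable for gene selection: genes with zero weight are simply dropped.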

2.
SUMMARY: Eukaryotes have both 'intron containing' and 'intron less' genes. Several databases are available for 'intron containing' genes in eukaryotes. In this note, we describe a database for 'intron less' genes from eukaryotes. 'Intron less' eukaryotic genes, which have a prokaryotic architecture, should help in understanding gene evolution in a much simpler way than 'intron containing' genes. AVAILABILITY: SEGE is available at http://intron.bic.nus.edu.sg/seg/ CONTACT: mmeena@ntu.edu.sg

3.
BLAST++ is a tool that is integrated with NCBI BLAST, allowing multiple, say K, queries to be searched against a database concurrently. The results obtained by BLAST++ are identical to those obtained by executing BLAST on each of the K queries, but BLAST++ completes the processing in a much shorter time. AVAILABILITY: http://xena1.ddns.comp.nus.edu.sg/~genesis/blast++ SUPPLEMENTARY INFORMATION: http://xena1.ddns.comp.nus.edu.sg/~genesis/blast++

4.
MOTIVATION: Analysis of gene expression data can provide insights into the positive and negative co-regulation of genes. However, existing methods such as association rule mining are computationally expensive, and the quality and quantity of the rules are sensitive to the support and confidence values. In this paper, we introduce the concept of the positive and negative co-regulated gene cluster (PNCGC), which more accurately reflects the co-regulation of genes, and propose an efficient algorithm to extract PNCGCs. RESULTS: We experimented with the Yeast dataset and compared our resulting PNCGCs with the association rules generated by the Apriori mining algorithm. Our results show that our PNCGCs identify some co-regulations missed by the association rules, and that our algorithm greatly reduces the large number of rules involving uncorrelated genes generated by the Apriori scheme. AVAILABILITY: The software is available upon request.

5.
SUMMARY: The relationship between intron distribution in the eukaryotic gene and protein structural elements is essential for understanding the origin and evolution of genes. XdomView is a web-based viewer that maps protein structural domains and intron positions in eukaryotic homologues onto the protein's tertiary structure. The association of sequence signals with 3D structure in XdomView provides a valuable visualization environment for eukaryotic gene organization, gene evolution, protein folding and protein structure classification. AVAILABILITY: Freely available from http://surya.bic.nus.edu.sg/xdom.

6.
7.
MOTIVATION: Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements. RESULTS: We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters. AVAILABILITY: The open-source package gimm is available at http://eh3.uc.edu/gimm.

8.
We propose a detailed protein structure alignment method named "MatAlign". It is a two-step algorithm. Firstly, we represent 3D protein structures as 2D distance matrices, and align these matrices by means of dynamic programming in order to find the initially aligned residue pairs. Secondly, we refine the initial alignment iteratively into the optimal one according to an objective scoring function. We compare our method against DALI and CE, which are among the most accurate and the most widely used of the existing structural comparison tools. On the benchmark set of 68 protein structure pairs by Fischer et al., MatAlign provides better alignment results, according to four different criteria, than both DALI and CE in a majority of cases. MatAlign also performs as well in structural database search as DALI does, and much better than CE does. MatAlign is about two to three times faster than DALI, and has about the same speed as CE. The software and the supplementary information for this paper are available at http://xena1.ddns.comp.nus.edu.sg/~genesis/MatAlign/.
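The first step described above, representing a 3D structure as a 2D distance matrix, can be sketched as follows. This is an illustration only, not MatAlign's code: using one representative atom per residue (for example the C-alpha atom) and the function name `distance_matrix` are assumptions, and the coordinates are made up.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

# Hypothetical coordinates (in angstroms), one point per residue.
ca_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = distance_matrix(ca_coords)
```

A distance matrix does not change when the structure is rotated or translated, which is why two structures can be compared through their matrices without first superimposing them.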

9.
MOTIVATION: One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called the LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from the leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. RESULTS: We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. AVAILABILITY: A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). CONTACT: ekzmao@ntu.edu.sg.
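For orientation, Fisher's ratio, one of the baseline filter criteria named in the abstract above, can be sketched as below. This is not the LS Bound measure. Definitions of Fisher's ratio vary slightly between papers; the form used here (squared difference of class means divided by the sum of population variances), the gene-major data layout, and the names `fisher_ratio` and `rank_genes` are assumptions.

```python
from statistics import mean, pvariance

def fisher_ratio(class_a, class_b):
    """Fisher's ratio for one gene: squared mean gap over summed variances."""
    gap = mean(class_a) - mean(class_b)
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Constant expression in both classes: perfectly separated or useless.
        return float("inf") if gap != 0 else 0.0
    return gap * gap / spread

def rank_genes(expr_a, expr_b):
    """Return gene indices sorted from most to least discriminative.

    expr_a[g] and expr_b[g] hold the expression values of gene g in the
    samples of class A and class B respectively (hypothetical layout).
    """
    ratios = [fisher_ratio(a, b) for a, b in zip(expr_a, expr_b)]
    return sorted(range(len(ratios)), key=lambda g: ratios[g], reverse=True)

# Made-up expression values for two genes and two samples per class.
expr_a = [[1.0, 3.0], [2.0, 4.0]]
expr_b = [[5.0, 7.0], [3.0, 5.0]]
ranking = rank_genes(expr_a, expr_b)
```

A filter criterion of this kind scores each gene independently, which keeps it cheap but blind to redundancy between genes.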

10.
ExInt: an Exon Intron Database (total citations: 5; self-citations: 0; citations by others: 5)
The Exon/Intron Database (ExInt) stores information on all GenBank eukaryotic entries containing an annotated intron sequence. Data are available through a retrieval system, as flat files and as a MySQL dump file. In this report we discuss several implementations added to ExInt, which is accessible at http://intron.bic.nus.edu.sg/exint/newexint/exint.html.

11.
The Exon/Intron (ExInt) database incorporates information on the exon/intron structure of eukaryotic genes. Features in the database include: intron nucleotide sequence, amino acid sequence of the corresponding protein, position of the introns at the amino acid level and intron phase. From ExInt, we have also generated four additional databases, each with ExInt entries containing predicted introns, experimentally defined introns, organelle introns or nuclear introns. ExInt is accessible through a retrieval system with pointers to GenBank. The database can be searched by keywords, locus name, NID, accession number or length of the protein. ExInt is freely accessible at http://intron.bic.nus.edu.sg/exint/exint.html

12.
13.
SUMMARY: Disease processes often involve crosstalk between proteins in different pathways. Different proteins have been used as separate therapeutic targets for the same disease. Synergetic targeting of multiple targets has been explored in combination therapy for a number of diseases. Potentially harmful interactions arising from multiple targeting have also been closely studied. To facilitate the mechanistic study of drug actions and a more comprehensive understanding of the relationship between different targets of the same disease, it is useful to develop a database of known therapeutically relevant multiple pathways (TRMPs). Information about non-target proteins and natural small molecules involved in these pathways also provides useful hints for finding new therapeutic targets and facilitates the understanding of how therapeutic targets interact with other molecules in performing specific tasks. The TRMPs database is designed to provide information about such multiple pathways along with related therapeutic targets, corresponding drugs/ligands, targeted disease conditions, constituent individual pathways, and structural and functional information about each protein in the pathways. Cross-links to other databases are also provided to facilitate access to information about individual pathways and proteins. AVAILABILITY: This database can be accessed at http://bidd.nus.edu.sg/group/trmp/trmp.asp and it currently contains 11 entries of multiple pathways, 97 entries of individual pathways, and 120 targets covering 72 disease conditions, together with 120 sets of drugs directed at each of these targets. Each entry can be retrieved through multiple methods, including multiple pathway name, individual pathway name and disease name. SUPPLEMENTARY INFORMATION: http://bidd.nus.edu.sg/group/trmp/sm.pdf

14.
MOTIVATION: Intron sliding is the relocation of intron-exon boundaries over short distances and is also often referred to as intron slippage, intron migration or intron drift. We have generated a database containing discordant intron positions in homologous genes (MIDB: Mismatched Intron DataBase). Discordant intron positions are those that are either closely located in homologous genes (within a window of 10 nucleotides) or present in one gene but not in any of its homologs. The MIDB database aims at systematically collecting information about mismatched introns in the genes from GenBank and organizing it into a form useful for understanding the genomics and dynamics of introns, thereby helping to understand the evolution of genes. RESULTS: Intron displacement or sliding is critically important for explaining the present distribution of introns among orthologous and paralogous genes. MIDB allows intron movements to be examined and intron positions from homologous proteins to be mapped onto a single sequence. The database is of potential use for molecular biologists in general and for researchers who are interested in gene evolution and eukaryotic gene structure. Partial analysis of this database allowed us to identify a few putative cases of intron sliding. AVAILABILITY: http://intron.bic.nus.edu.sg/midb/midb.html
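The definition of discordant intron positions given above can be illustrated with a small sketch for a single pair of homologs. It is not MIDB's pipeline: it assumes that intron positions have already been mapped onto shared alignment coordinates, that the 10-nucleotide window is inclusive, and that the function name `classify_introns` and the labels it returns are free choices made here.

```python
def classify_introns(positions_a, positions_b, window=10):
    """Classify each intron of gene A against the introns of homolog B.

    Positions are assumed to be in shared alignment coordinates.
    "concordant": B has an intron at exactly the same position.
    "shifted":    B has an intron within `window` nucleotides (candidate
                  sliding event).
    "absent":     B has no intron nearby.
    Both "shifted" and "absent" count as discordant.
    """
    result = {}
    for pos in positions_a:
        nearest = min((abs(pos - other) for other in positions_b), default=None)
        if nearest == 0:
            result[pos] = "concordant"
        elif nearest is not None and nearest <= window:
            result[pos] = "shifted"
        else:
            result[pos] = "absent"
    return result

# Made-up intron positions for two homologous genes.
calls = classify_introns([120, 305, 900], [120, 311, 640])
```

With more than two homologs, the "absent" label would apply only when none of the homologs has an intron in the window, in line with the definition in the abstract.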

15.
High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis of non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA; http://ctrad-csi.nus.edu.sg/gbsa), free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions for which methylation studies could lead to the discovery of new regulatory mechanisms.

16.
SUMMARY: Microarrays have been used to perform high-throughput genetic analyses such as single-nucleotide polymorphism detection and microbial genome analysis. Some of these analyses require real-time monitoring of the hybridization signals with respect to a varying experimental condition, such as temperature. However, current microarray imaging and analysis packages typically do not possess such real-time capabilities. Therefore, microarray image analyses are often time-consuming and labour-intensive. LabArray was developed to expedite such processes by enabling real-time monitoring of microarray signals. AVAILABILITY: LabArray is available at http://www.eng.nus.edu.sg/civil/Labarray/labarray.htm CONTACT: cveliuwt@nus.edu.sg SUPPLEMENTARY INFORMATION: Screenshots and instructions for use are available at the above website.

17.
SUMMARY: Data processing, analysis and visualization (datPAV) is an exploratory tool that allows experimentalists to quickly assess the general characteristics of their data. This platform-independent software is designed as a generic tool to process and visualize data matrices. The tool explores the organization of the data, detects errors and supports basic statistical analyses. Processed data can be reused, so that different step-by-step data processing/analysis workflows can be created to carry out detailed investigation. The visualization option provides publication-ready graphics. Applications of this tool are demonstrated at the web site for three cases of metabolomics, environmental and hydrodynamic data analysis. AVAILABILITY: datPAV is available free for academic use at http://www.sdwa.nus.edu.sg/datPAV/.

18.
In this paper, we present a new scheme named ProtClass for automatic classification of three-dimensional (3D) protein structures. It is a dedicated and unified multiclass classification scheme. Neither detailed structural alignment nor multiple binary classifications are required in this scheme. We adopt a nearest neighbor-based classification strategy. We use a filter-and-refine scheme. In the first step, we filter out the improbable answers using the precalculated parameters from the training data. In the second, we perform a relatively more detailed nearest neighbor search on the remaining answers. We use very concise and effective encoding schemes of the 3D protein structures in both steps. We compare our proposed method against two other dedicated protein structure classification schemes, namely SGM and CPMine. The experimental results show that ProtClass is slightly better in accuracy than SGM and much faster. In comparison with CPMine, ProtClass is much more accurate, while their running times are about the same. We also compare ProtClass against a structural alignment-based classification scheme named DALI, which is found to be more accurate, but extremely slow. The software is available upon request from the authors. The supplementary information on the ProtClass method can be found at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PClass.htm.
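The filter-and-refine idea described above can be sketched generically as follows. This is not ProtClass itself: the abstract does not say which precalculated parameters or encodings are used, so the class centroids used for filtering, the plain Euclidean feature vectors, and the names `build_centroids`, `classify` and `keep` are all assumptions made for illustration.

```python
import math

def build_centroids(training):
    """Precompute one centroid per class from the training vectors.

    training maps a class label to a list of equal-length feature vectors.
    """
    centroids = {}
    for label, vectors in training.items():
        dims = len(vectors[0])
        centroids[label] = tuple(
            sum(v[i] for v in vectors) / len(vectors) for i in range(dims)
        )
    return centroids

def classify(query, training, centroids, keep=2):
    """Filter to the `keep` closest classes, then refine by nearest neighbor."""
    # Filter step: cheap comparison against one centroid per class.
    candidates = sorted(
        centroids, key=lambda label: math.dist(query, centroids[label])
    )[:keep]
    # Refine step: exact nearest-neighbor search within surviving classes only.
    best_label, best_dist = None, math.inf
    for label in candidates:
        for vector in training[label]:
            d = math.dist(query, vector)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# Made-up two-dimensional encodings for three hypothetical structure classes.
training = {
    "alpha": [(0.0, 0.0), (1.0, 0.0)],
    "beta": [(10.0, 10.0), (11.0, 10.0)],
    "mixed": [(5.0, 5.0), (6.0, 5.0)],
}
centroids = build_centroids(training)
predicted = classify((9.0, 9.0), training, centroids)
```

The filter step touches one vector per class and the refine step only the members of the surviving classes, which is where a scheme of this shape saves time compared with a full nearest-neighbor scan.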

19.
20.
RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests F1 and F2. This problem asks for a substructure of F1 and a substructure of F2 that exhibit a high similarity. Since an RNA molecule's secondary structure can be represented as a labeled ordered forest, the problem we study has a direct application to finding potential motifs. We generalize the previously studied concept of a closed subforest to a gapped subforest and present the first algorithm for computing the optimal local gapped subforest alignment of F1 and F2. We also show that our technique can improve the time and space complexity of the previously most efficient algorithm for optimal local closed subforest alignment. Furthermore, we prove that a special case of our local gapped subforest alignment problem is equivalent to a problem known in the literature as the local sequence-structure alignment problem (lssa) and modify our main algorithm to obtain a much faster algorithm for lssa than the one previously proposed. An implementation of our algorithm is available at www.comp.nus.edu.sg/~bioinfo/LGSFAligner/. It runs significantly faster than the original lssa program.
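The statement that an RNA secondary structure can be represented as a labeled ordered forest can be made concrete with a short sketch. It assumes the structure is given in dot-bracket notation without pseudoknots, and it uses one common encoding (a base pair becomes an internal node, an unpaired base becomes a leaf); the paper may use a different encoding, and the function name `parse_forest` is hypothetical.

```python
def parse_forest(structure, sequence):
    """Convert dot-bracket notation into a labeled ordered forest.

    Each node is a (label, children) tuple. An unpaired base becomes a leaf
    labeled with its nucleotide; a base pair becomes an internal node labeled
    with both nucleotides, whose children are the enclosed nodes in order.
    """
    if len(structure) != len(sequence):
        raise ValueError("structure and sequence must have equal length")
    stack = [[]]  # stack[-1] collects children of the innermost open pair
    opens = []    # sequence indices of unmatched '('
    for i, symbol in enumerate(structure):
        if symbol == ".":
            stack[-1].append((sequence[i], []))
        elif symbol == "(":
            opens.append(i)
            stack.append([])
        elif symbol == ")":
            if not opens:
                raise ValueError("unbalanced ')' at position %d" % i)
            children = stack.pop()
            j = opens.pop()
            stack[-1].append((sequence[j] + sequence[i], children))
        else:
            raise ValueError("unexpected symbol %r" % symbol)
    if opens:
        raise ValueError("unbalanced '('")
    return stack[0]

# A made-up hairpin with one trailing unpaired base.
forest = parse_forest("((..)).", "GCAAGCU")
```

The result is a list of trees in left-to-right order, so sibling order follows the 5' to 3' direction of the sequence, which is what makes the forest "ordered".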

Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)