Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: An unmanageably large amount of genome sequence data is accumulating, prompting researchers to develop new methods to analyze it. We have devised a novel method, designated oligostickiness analysis, which uses a measure roughly proportional to the binding affinity of an oligonucleotide to a DNA of interest, in order to analyze genome sequences as a whole. RESULTS: Fifteen representative genomes, including Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens, were analyzed by this method using more than 50 probe dodecanucleotides, yielding the following findings: (i) Genome sequences can be specifically characterized by oligostickiness maps. (ii) Oligostickiness analysis, which is similar to but more informative than (G + C) content or repetitive sequence analysis, can reveal intra-genomic structures such as mosaic structures (E. coli and B. subtilis) and biologically meaningful highly sticky/non-sticky regions. (iii) Some probe oligonucleotides such as dC(12) and dT(12) can be used for classifying genomes, clearly discriminating prokaryotes from eukaryotes. (iv) Based on global oligostickiness, which is the average value of the local oligostickinesses, the features of a genome can be visualized as a spider web plot. The pattern of the spider web, as well as the set of oligostickiness maps, is highly characteristic of each genome or chromosome. We therefore call it chromosome texture, which led to the finding that all the chromosomes contained in a cell, so far as investigated, have a common texture. AVAILABILITY: Oligostickiness maps used in this work are available at http://gp.fms.saitama-u.ac.jp/ CONTACT: koichi@fms.saitama-u.ac.jp

2.
SUMMARY: P-cats is a web server that predicts the catalytic residues in proteins from their atomic coordinates. P-cats receives a coordinate file of the tertiary structure and sends out analytical results via e-mail. The reply contains a summary and two URLs that allow the user to examine the conserved residues: one for interactive images of the prediction results and the other for a graphical view of the multiple sequence alignment. AVAILABILITY: P-cats is freely available at http://p-cats.hgc.jp/p-cats CONTACT: kino@ims.u-tokyo.ac.jp

3.
TRFMA provides a Web environment for analyzing T-RFLP results based on the molecular weights of the fragments, rather than the numbers of nucleotides, to increase accuracy. The 16S rRNA data are saved as an XML file containing around 650 sequences (light version) and a MySQL database containing around 50 000 sequences (full version), which are connected to a Web server via PHP5 and manipulated in an Internet browser. AVAILABILITY: TRFMA is freely available at http://myamagu.dent.kyushu-u.ac.jp/bioinformatics/trfma/index.html and can be downloaded from the same site.
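The idea of comparing terminal restriction fragments by molecular weight rather than by length can be illustrated with a minimal sketch. This is not TRFMA's code: the function names, the `cut_offset` parameter and the example sequences are hypothetical, the residue masses are the commonly quoted approximate averages for single-stranded DNA, and the mass of the fluorescent label used in real T-RFLP is ignored.

```python
# Approximate average masses (g/mol) of deoxynucleotide residues in a DNA strand.
# These are commonly quoted textbook values, used here as an assumption.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Usual end correction for a single strand without a 5' phosphate (assumption).
END_CORRECTION = -61.96


def fragment_mass(seq):
    """Approximate molecular weight of a single-stranded DNA fragment."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) + END_CORRECTION


def terminal_fragment(seq, site, cut_offset):
    """Return the 5' terminal fragment produced by the first occurrence of `site`.

    `cut_offset` is the number of bases of the recognition site that stay on
    the 5' fragment (hypothetical parameter; e.g. 1 for a C^CGG cutter).
    If the site is absent, the whole sequence is returned.
    """
    pos = seq.upper().find(site)
    if pos == -1:
        return seq
    return seq[: pos + cut_offset]


# Two fragments of equal length can differ noticeably in mass.
for s in ("AAAA", "GGGG"):
    print(s, len(s), round(fragment_mass(s), 2))
```

Fragments with the same nucleotide count but different base composition get different masses here, which is the kind of difference that a length-only comparison cannot see.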

4.
The non-redundant Bacillus subtilis database (NRSub) has been developed in the context of the sequencing project devoted to this bacterium. As this project has reached completion, the whole genome is now available as a single contig. Thanks to the ACNUC database management system and its associated retrieval system Query_win, each functional region of the genome can be accessed individually. Extra annotations have been added such as accession numbers for the genes, locations on the genetic map, codon adaptation index values, as well as cross-references with other collections. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access NRSub through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).

5.
A depository of low-molecular-weight compounds or metabolites detected in various organisms in a non-targeted manner is indispensable for metabolomics research. Because of the diversity of chemical compounds, various mass spectrometry (MS) setups with state-of-the-art technologies have been used. Over the past two decades, we have analyzed various biological samples by using gas chromatography-mass spectrometry, liquid chromatography-mass spectrometry, or capillary electrophoresis-mass spectrometry, and archived the datasets in the depository MassBase (http://webs2.kazusa.or.jp/massbase/). As the format of MS datasets depends on the MS setup used, we converted each raw binary dataset of the mass chromatogram to text file format, and then extracted the chromatographic peak information from the converted file into a text file. In total, the depository comprises 46,493 datasets, of which 38,750 belong to plant species and 7,743 are authentic or mixed chemicals as well as other sources (microorganisms, animals, and foods), as of August 1, 2020. All files in the depository can be downloaded in bulk from the website. Mass chromatograms of 90 plant species obtained by LC-Fourier transform ion cyclotron resonance MS or Orbitrap MS, which detect the ionized molecules with high accuracy, allowing chemical compositions to be inferred, were converted to text files by the software PowerGet, and the chemical annotation of each peak was added. The processed datasets were deposited in the annotation database KomicMarket2 (http://webs2.kazusa.or.jp/km2/). The archives provide fundamental resources for comparative metabolomics and functional genomics, which may result in a deeper understanding of living organisms.

6.
A grid layout algorithm for automatic drawing of biochemical networks   Total citations: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: Visualization is indispensable in the research of complex biochemical networks. Available graph layout algorithms are not adequate for drawing such networks satisfactorily. New methods are required to automatically visualize the topological architectures and facilitate the understanding of the functions of the networks. RESULTS: We propose a novel layout algorithm to draw complex biochemical networks. A network is modeled as a system of interacting nodes on a square grid. A discrete cost function between each node pair is designed based on the topological relation and the geometric positions of the two nodes. The layouts are produced by minimizing the total cost. We design a fast algorithm to minimize the discrete cost function, by which candidate layouts can be produced efficiently. A simulated annealing procedure is used to choose better candidates. Our algorithm demonstrates its ability to exhibit cluster structures clearly in relatively compact layout areas without any prior knowledge. We developed Windows software to implement the algorithm for CADLIVE. AVAILABILITY: All materials can be freely downloaded from http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/ SUPPLEMENTARY INFORMATION: http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/
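The general scheme described in this abstract (nodes on a grid, a pairwise discrete cost, minimization of the total cost with simulated annealing) can be sketched as follows. The cost function below is an assumption made for illustration only, not the one designed by the authors: connected pairs pay their Manhattan distance, and unconnected pairs pay a penalty when they sit closer than three cells. The grid size, cooling schedule and all names are likewise hypothetical.

```python
import math
import random


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_cost(pos, edges, nodes):
    """Sum of pairwise costs (illustrative cost, not the published one)."""
    cost = 0
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d = manhattan(pos[a], pos[b])
            if (a, b) in edges or (b, a) in edges:
                cost += d              # connected nodes should be close
            else:
                cost += max(0, 3 - d)  # unconnected nodes should not crowd
    return cost


def anneal_layout(nodes, edges, size=6, steps=5000, t0=2.0, seed=0):
    """Place nodes on a size x size grid by simulated annealing."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    pos = dict(zip(nodes, rng.sample(cells, len(nodes))))
    cost = total_cost(pos, edges, nodes)
    best, best_cost = dict(pos), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9   # linear cooling
        node, target = rng.choice(nodes), rng.choice(cells)
        old = pos[node]
        # If the target cell is occupied, swap the two nodes.
        occupant = next((n for n, p in pos.items() if p == target), None)
        pos[node] = target
        if occupant is not None and occupant != node:
            pos[occupant] = old
        new_cost = total_cost(pos, edges, nodes)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(pos), cost
        else:
            # Reject the move and restore the previous positions.
            pos[node] = old
            if occupant is not None and occupant != node:
                pos[occupant] = target
    return best, best_cost


nodes = ["A", "B", "C", "D", "E"]
edges = {("A", "B"), ("B", "C"), ("D", "E")}
layout, cost = anneal_layout(nodes, edges)
print(layout, cost)
```

Recomputing the total cost on every move keeps the sketch short; the abstract mentions a fast minimization algorithm, which this naive loop does not attempt to reproduce.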

7.
SUMMARY: PreDs is a WWW server that predicts the dsDNA-binding sites on protein molecular surfaces generated from atomic coordinates in PDB format. The prediction is made by evaluating the electrostatic potential, the local curvature and the global curvature on the surfaces. The prediction results can be checked interactively with our original surface viewer. AVAILABILITY: PreDs is available free of charge from http://pre-s.protein.osaka-u.ac.jp/~preds/ CONTACT: kino@ims.u-tokyo.ac.jp.

8.
In the context of the international project aiming at sequencing the whole genome of Bacillus subtilis we have developed NRSub, a non-redundant database of sequences from this organism. Starting from the B.subtilis sequences available in the repository collections we have removed all encountered duplications, then we have added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage index). We have also added cross-references with EMBL/GenBank/DDBJ, MEDLINE, SWISS-PROT and ENZYME databases. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access the database through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).

9.
MOTIVATION: We developed an algorithm to reconstruct ancestral sequences, taking into account the rate variation among sites of the protein sequences. Our algorithm maximizes the joint probability of the ancestral sequences, assuming that the rate is gamma distributed among sites. Our algorithm provably finds the global maximum. The use of 'joint' reconstruction is motivated by studies that use the sequences at all the internal nodes in a phylogenetic tree, for instance the inference of patterns of amino-acid replacement, or tracing the biochemical changes that occurred during the evolution of a given protein family. RESULTS: We give an algorithm that guarantees finding the global maximum. The efficient search method makes our method applicable to datasets with large numbers of sequences. We analyze ancestral sequences of five gene families, exploring the effect of the amount of among-site rate variation and the degree of sequence divergence on the resulting ancestral states. AVAILABILITY AND SUPPLEMENTARY INFORMATION: http://evolu3.ism.ac.jp/~tal/ Contact: tal@ism.ac.jp

10.
DNA Data Bank of Japan (DDBJ) for genome scale research in life science   Total citations: 5 (self-citations: 0, citations by others: 5)
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible, mainly from Japanese researchers. The increase rates of the data we collected, annotated and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase has accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp), which now includes more than 50 complete microbial genomes and Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides users with a set of sequences that is as continuous as possible for any one of the 24 chromosomes. Both GIB and HGS have been updated to incorporate newly available data and retrieval tools.

11.
CONSEL: for assessing the confidence of phylogenetic tree selection.   Total citations: 10 (self-citations: 0, citations by others: 10)
CONSEL is a program to assess the confidence of the tree selection by giving the p-values for the trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multi-scale bootstrap technique. This p-value is less biased than the other conventional p-values such as the Bootstrap Probability (BP), the Kishino-Hasegawa (KH) test, the Shimodaira-Hasegawa (SH) test, and the Weighted Shimodaira-Hasegawa (WSH) test. CONSEL calculates all these p-values from the output of phylogeny program packages such as Molphy, PAML, and PAUP*. Furthermore, CONSEL is applicable to a wide class of problems where the BPs are available. AVAILABILITY: The programs are written in C. The source code for Unix and the executable binary for DOS are found at http://www.ism.ac.jp/~shimo/ CONTACT: shimo@ism.ac.jp

12.
The hydrophobic cores of proteins predicted by wavelet analysis   Total citations: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: In the process of protein construction, buried hydrophobic residues tend to assemble in a core of a protein. Methods used to predict these cores either use or do not use sequential alignment. In the case of a close homology, prediction was more accurate if sequential alignment was used. If the homology was weak, predictions would be unreliable. A hydrophobicity plot involving the hydropathy index is useful for purposes of prediction, and smoothing is essential. However, the proposed methods are insufficient. We attempted to predict hydrophobic cores from a low-frequency component extracted from the hydrophobicity plot, using wavelet analysis. RESULTS: The cores were predicted at a rate of 68.7%, by cross-validation. Using wavelet analysis, the cores of non-homologous proteins can be predicted with close to 70% accuracy, without sequential alignment. AVAILABILITY: The program used in this study is available from Intergalactic Reality (http://www.intergalact.com). CONTACT: hirakawa@grt.kyushu-u.ac.jp, kuhara@grt.kyushu-u.ac.jp
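The step of extracting a low-frequency component from a hydrophobicity plot can be sketched with the simplest wavelet, the Haar wavelet, whose level-n approximation is just the mean over blocks of 2^n residues. The abstract does not say which wavelet, scale or threshold the authors used, so the Haar choice, the `levels` and `threshold` parameters and the function names below are all assumptions. The hydropathy values are the Kyte-Doolittle index.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(seq):
    """Hydrophobicity plot: one hydropathy value per residue."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def haar_lowpass(signal, levels):
    """Haar approximation with all detail coefficients dropped.

    Equivalent to replacing each block of 2**levels samples by its mean;
    a shorter final block is averaged on its own.
    """
    block = 2 ** levels
    smooth = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        smooth.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smooth


def candidate_core_positions(seq, levels=2, threshold=1.0):
    """Residue indices whose smoothed hydropathy exceeds `threshold`."""
    smooth = haar_lowpass(hydropathy(seq), levels)
    return [i for i, value in enumerate(smooth) if value > threshold]


print(candidate_core_positions("DDDDIIIIKKKK"))
```

For the toy sequence, only the block of isoleucines stays above the threshold after smoothing, so positions 4 to 7 are reported as core candidates.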

13.
MOTIVATION: Glycans are the third major class of biomolecules following DNA and proteins. They are vital for the functioning of multicellular organisms. However, compared with the rapid development of sequence analysis techniques, informatics work on glycans has a long way to go. Alignment algorithms for glycan tree structures are one of the foremost concerns. In addition, the statistical analysis of these algorithms in terms of biological significance needs to be addressed. RESULTS: We developed a tree-structure alignment algorithm for glycans and performed a statistical analysis of these alignment scores such that biologically interesting features could be captured in a score matrix for glycans. We generated our score matrix in a manner similar to BLOSUM, but with slight variations to accommodate our glycan data, including the incorporation of linkage information. We verified the effectiveness of our new glycan score matrix by illustrating how well the resulting score matrix entries correspond with biological knowledge. Future work on further improvements, using a variety of score matrices for different subclasses of glycans because of their complexity, is also discussed. CONTACT: mami@kuicr.kyoto-u.ac.jp SUPPLEMENTARY INFORMATION: The glycan score matrix can be downloaded from http://kanehisa.kuicr.kyoto-u.ac.jp/Paper/kcam/glycanMatrix0.1.txt.
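A BLOSUM-style score is a log-odds ratio between how often two symbols are observed aligned and how often they would be aligned by chance. The sketch below shows that calculation on toy data. It is only the generic BLOSUM recipe, without the variations the authors mention (such as linkage information), and the residue labels and counts are made up for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """BLOSUM-like log-odds scores (in bits) from aligned symbol pairs.

    `aligned_pairs` is an iterable of (symbol, symbol) tuples taken from
    trusted alignments. Pairs are treated as unordered.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)
    total = sum(pair_counts.values())

    # Background frequency of each symbol, derived from the pair counts.
    background = Counter()
    for (a, b), n in pair_counts.items():
        background[a] += n / (2 * total)
        background[b] += n / (2 * total)

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        if a == b:
            expected = background[a] ** 2
        else:
            expected = 2 * background[a] * background[b]
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy data: hypothetical counts of aligned monosaccharide pairs.
pairs = [("Man", "Man")] * 6 + [("Glc", "Glc")] * 2 + [("Man", "Glc")] * 2
for key, value in sorted(log_odds_matrix(pairs).items()):
    print(key, round(value, 3))
```

Pairs that never occur in the input receive no entry; a real matrix would need a rule (for example pseudocounts) for such cases.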

14.
Iwasaki W, Yamamoto Y, Takagi T. PLoS ONE, 2010, 5(12): e15305
In this paper, we describe a server/client literature management system specialized for the life science domain, the TogoDoc system (Togo, pronounced Toe-Go, is a romanization of a Japanese word for integration). The server and the client program cooperate closely over the Internet to provide life scientists with an effective literature recommendation service and efficient literature management. The content-based and personalized literature recommendation helps researchers to isolate interesting papers from the "tsunami" of literature, in which, on average, more than one biomedical paper is added to MEDLINE every minute. Because researchers these days need to follow updates on much wider topics to generate hypotheses using massive datasets obtained from public databases or omics experiments, the importance of having an effective literature recommendation service is rising. The automatic recommendation is based on the content of personal literature libraries of electronic PDF papers. The client program automatically analyzes these files, which are sometimes deeply buried in the storage disks of researchers' personal computers. Simply saving PDF papers to the designated folders makes the client program automatically analyze them and retrieve metadata, rename the files, synchronize the data to the server, and receive the recommendation lists of newly published papers, thus accomplishing effortless literature management. In addition, the tag suggestion and associative search functions are provided for easy classification of and access to past papers (researchers who read many papers sometimes only vaguely remember or completely forget what they read in the past). The TogoDoc system is available for both Windows and Mac OS X and is free. The TogoDoc Client software is available at http://tdc.cb.k.u-tokyo.ac.jp/, and the TogoDoc server is available at https://docman.dbcls.jp/pubmed_recom.

15.
This data paper describes the native vascular aquatic plant floras of 268 Japanese lakes recorded from 1899 to 2011. The data were compiled from 201 literature sources, most of which were written in Japanese and published in local journals or individual reports rather than in major scientific journals. The literature was searched using web-based services (i.e., Google Scholar, http://scholar.google.com/; CiNii, http://ci.nii.ac.jp/en; JDreamII, http://pr.jst.go.jp/jdream2/; and ISI, http://apps.webofknowledge.com) and by private communication with experts or local governments. Scientific names were consolidated under currently accepted nomenclature. Four datasets, FloraDB, LakeDB, SpeciesDB, and LiteratureDB, were created to include records of the flora of each lake in each year, the names and locations of the lakes, the scientific names and synonyms of the aquatic vascular plants, and a literature list, respectively. These data can be used to study long-term changes in the species composition and/or richness of aquatic plants in Japanese lakes.

16.
DNA Data Bank of Japan at work on genome sequence data.   Total citations: 5 (self-citations: 3, citations by others: 2)
We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on the WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data.

17.
A database of mutations in human eye disease genes has been constructed. This database, KMeyeDB, employs the database software MutationView, which provides graphical data presentation and analysis through a smooth user interface. Currently, KMeyeDB contains mutation data for 16 different genes associated with 18 eye diseases. KMeyeDB is accessible through http://mutview.dmb.med.keio.ac.jp with advanced internet browsers.

18.
'Melina' helps users compare the results of four public software tools for DNA motif extraction, in order both to confirm the reliability of each finding and to avoid missing important motifs. It is also useful for optimizing the sensitivity of each tool with a series of different parameter settings. AVAILABILITY: Melina is available at http://www.hgc.ims.u-tokyo.ac.jp/Melina/.

19.
Gene recognition by combination of several gene-finding programs   Total citations: 8 (self-citations: 1, citations by others: 7)
MOTIVATION: A number of programs have been developed to predict eukaryotic gene structures in DNA sequences. However, gene finding is still a challenging problem. RESULTS: We have explored the effectiveness of reanalyzing and combining the results of several gene-finding programs. We studied several methods with four programs (FEXH, GeneParser3, GENSCAN and GRAIL2). With the HIGHEST-policy combination method or the BOUNDARY method, approximate correlation (AC) improved by 3-5% in comparison with the best single gene-finding program. From another viewpoint, OR-based combination of the four programs is the most reliable way to determine whether a candidate exon overlaps with the real exon or not, although it is less sensitive than GENSCAN for exon-intron boundaries. Our methods can easily be extended to combine other programs. AVAILABILITY: We have developed a server program (Shirokane System) and a client program (GeneScope) to use the methods. GeneScope is available through a WWW site (http://gf.genome.ad.jp/). CONTACT: katsu,takagi@ims.u-tokyo.ac.jp
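An OR-based combination can be read as taking the union of the exon intervals predicted by all programs. The abstract gives no implementation details, so the sketch below is only one plausible interpretation: interval coordinates are assumed to be inclusive (start, end) pairs, and the function names and example coordinates are hypothetical.

```python
def or_combine(predictions):
    """Union of exon intervals predicted by several programs.

    `predictions` is a list with one entry per program; each entry is a
    list of (start, end) intervals with inclusive coordinates.
    Overlapping intervals are merged into one.
    """
    intervals = sorted(iv for program in predictions for iv in program)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(candidate, exons):
    """True if the candidate interval overlaps at least one exon."""
    return any(candidate[0] <= end and start <= candidate[1]
               for start, end in exons)


# Hypothetical predictions from three programs for the same sequence.
program_a = [(100, 200)]
program_b = [(150, 250), (400, 500)]
program_c = [(480, 520)]
combined = or_combine([program_a, program_b, program_c])
print(combined)  # [(100, 250), (400, 520)]
```

The merged intervals show why such a union is good at detecting that some exon is present while being imprecise about boundaries: the union keeps the widest extent reported by any program.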

20.
MOTIVATION: Since their initial development, integration and construction of databases for molecular-level data have progressed. Though biological molecules are related to each other and form a complex system, the information is stored in the vast archives of the literature or in diverse databases. There is no unified naming convention for biological objects, and biological terms may be ambiguous or polysemic. This makes the integration and interaction of databases difficult. In order to eliminate these problems, machine-readable natural language resources appear to be quite promising. We have developed a workbench for building a protein name abbreviation dictionary (PNAD). RESULTS: We have developed the PNAD Construction Support System (PNAD-CSS), which offers various convenient facilities to decrease the construction costs of a protein name abbreviation dictionary whose entries are collected from abstracts of biomedical papers. The system allows users to concentrate on higher-level interpretation by removing some troublesome tasks, e.g. management of abstracts, extraction of protein names and their abbreviations, and so on. To extract pairs of protein names and abbreviations, we have developed a hybrid system composed of the PROPER System and the PNAD System. The PNAD System can extract the pairs from parenthetical paraphrases involving protein names. The PROPER System identified these pairs with 98.95% precision, 95.56% recall and 97.58% complete precision. AVAILABILITY: The PROPER System is freely available from http://www.hgc.inc.u-tokyo.ac.jp/service/tooldoc/KeX/intro.html. The other software is also available on request. Contact the authors. CONTACT: mikio@ims.u-tokyo.ac.jp
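Extracting name and abbreviation pairs from parenthetical paraphrases can be illustrated with a deliberately naive heuristic: find a short token in parentheses and check whether its characters match the initials of the words just before it. This is not the PNAD or PROPER method, whose rules the abstract does not describe. The regular expression, the function name and the example sentence are assumptions, and hyphenated words are split into separate words by this sketch.

```python
import re

# A short alphanumeric token in parentheses, e.g. "(EGFR)".
PARENTHETICAL = re.compile(r"\(([A-Za-z0-9-]{2,10})\)")
WORD = re.compile(r"[A-Za-z0-9]+")


def extract_pairs(text):
    """Return (long form, abbreviation) pairs using an initial-letter match."""
    pairs = []
    for match in PARENTHETICAL.finditer(text):
        abbreviation = match.group(1)
        letters = [c for c in abbreviation if c.isalnum()]
        # Take as many preceding words as the abbreviation has characters.
        words = WORD.findall(text[:match.start()])
        candidate = words[-len(letters):]
        if len(candidate) == len(letters) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, letters)
        ):
            pairs.append((" ".join(candidate), abbreviation))
    return pairs


sentence = ("The epidermal growth factor receptor (EGFR) binds "
            "tumor necrosis factor (TNF).")
print(extract_pairs(sentence))
```

Abbreviations that skip words, reorder letters or use internal letters are missed by this check, which is one reason real systems combine several strategies.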


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration: 京ICP备09084417号