
1.

GB2sequin - A file converter preparing custom GenBank files for database submission

Pascal Lehwark Stephan Greiner 《Genomics》2019,111(4):759-761

The typical wet lab user often annotates smaller sequences in the GenBank format, but resulting files are not accepted for database submission by NCBI. This makes submission of such annotations a cumbersome task. Here we present “GB2sequin”, an easy-to-use web application that converts custom annotations in the GenBank format into the NCBI direct submission format Sequin. Additionally, the program generates a “five-column, tab-delimited feature table” and a FASTA file. Those are required for submission through BankIt or the update of an existing GenBank entry. We specifically developed “GB2sequin” for the regular wet lab researcher with strong focus on user-friendliness and flexibility. The application is equipped with an intuitive graphical interface and comprehensive documentation. It can be employed to prepare any GenBank file for database submission and is freely available online at https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html.

2.

NCL: a C++ class library for interpreting data files in NEXUS format

Lewis PO 《Bioinformatics (Oxford, England)》2003,19(17):2330-2331

The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored therein. Because users are not required to reformat the data file for each program, use of the NEXUS format prevents cut-and-paste errors as well as the proliferation of copies of the original data file. The purpose of making the NCL available is to encourage the use of the NEXUS format by making it relatively easy for programmers to add the ability to interpret NEXUS files in newly developed software. AVAILABILITY: The NCL is freely available under the GNU General Public License from http://hydrodictyon.eeb.uconn.edu/ncl/ Supplementary information: Documentation for the NCL (general information and source code documentation) is available in HTML format at http://hydrodictyon.eeb.uconn.edu/ncl/

3.

NEXUS: an extensible file format for systematic information

Maddison DR Swofford DL Maddison WP 《Systematic biology》1997,46(4):590-621


4.

GB‐to‐TNT: facilitating creation of matrices from GenBank and diagnosis of results in TNT

Pablo A. Goloboff Santiago A. Catalano 《Cladistics : the international journal of the Willi Hennig Society》2012,28(5):503-513

This paper presents a pipeline, implemented in an open‐source program called GB→TNT (GenBank‐to‐TNT), for creating large molecular matrices, starting from GenBank files and finishing with TNT matrices which incorporate taxonomic information in the terminal names. GB→TNT is designed to retrieve a defined genomic region from a bulk of sequences included in a GenBank file. The user defines the genomic region to be retrieved and several filters (genome, length of the sequence, taxonomic group, etc.); each genomic region represents a different data block in the final TNT matrix. GB→TNT first generates Fasta files from the input GenBank files, then creates an alignment for each of those (by calling an alignment program), and finally merges all the aligned files into a single TNT matrix. The new version of TNT can make use of the taxonomic information contained in the terminal names, allowing easy diagnosis of results, evaluation of fit between the trees and the taxonomy, and automatic labelling or colouring of tree branches with the taxonomic groups they represent. © The Willi Hennig Society 2012.

5.

Interpreting peptide mass spectra by VEMS

Matthiesen R Lundsgaard M Welinder KG Bauw G 《Bioinformatics (Oxford, England)》2003,19(6):792-793

Most existing Mass Spectra (MS) analysis programs are automatic and provide limited opportunity for editing during the interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks retained, and the b- and y-ion series identified in an interactive manner. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and the experimental mass spectrum of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.

6.

Flower: extracting information from pyrosequencing data

Malde K 《Bioinformatics (Oxford, England)》2011,27(7):1041-1042

The SFF file format produced by Roche's 454 sequencing technology is a compact, binary format that contains the flow values that are used for base and quality calling of the reads. Applications, e.g. in metagenomics, often depend on accurate sequence information, and access to flow values is important to estimate the probability of errors. Unfortunately, the programs supplied by Roche for accessing this information are not publicly available. Flower is a program that can extract the information contained in SFF files, and convert it to various textual output formats. AVAILABILITY: Flower is freely available under the General Public License.

7.

Genomer — A Swiss Army Knife for Genome Scaffolding

Michael D. Barton Hazel A. Barton 《PloS one》2013,8(6)

The increasing accessibility and reduced costs of sequencing have made genome analysis accessible to more and more researchers. Yet there remains a steep learning curve in the subsequent computational steps required to process raw reads into a database-deposited genome sequence. Here we describe “Genomer,” a tool to simplify the manual tasks of finishing and uploading a genome sequence to a database. Genomer can format a genome scaffold into the common files required for submission to GenBank. This software also simplifies updating a genome scaffold by allowing a human-readable YAML format file to be edited instead of large sequence files. Genomer is written as a command line tool and is an effort to make the manual process of genome scaffolding more robust and reproducible. Extensive documentation and video tutorials are available at http://next.gs.

8.

Display and implementation of DICOM-standard medical images based on the Visual C# language

杨汝奚弘光高建新丁祖泉《上海生物医学工程》2007,28(1):12-15,56

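This paper introduces the DICOM 3.0 medical image file format and the features of the C# language. Visual C# is used, for the first time, to display and process images in this standard: the program reads raw image data directly from DICOM files and can batch-convert them to BMP and other formats for processing. This work can lay a foundation for medical image processing research and the development of related medical imaging software.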

9.

GBParsy: A GenBank flatfile parser library with high speed

Tae-Ho Lee Yeon-Ki Kim Baek Hie Nahm 《BMC bioinformatics》2008,9(1):321


10.

Computer programs in nucleic acid synthesis: synthetic strategy development using solid-phase chemical techniques with data storage, retrieval and analysis capabilities.


S Lombardi H Seidell S Pulford W Dutton S Parekh 《Nucleic acids research》1984,12(5):2581-2591

A computer program has been designed to aid development of synthetic strategies for oligonucleotides produced by solid-phase chemical techniques. The program reduces the time required to develop a strategy and a data file from hours to minutes. The program contains inventories, provides cost analyses, and generates and stores other associated data. The program searches an inventory of sequences for that sequence to avoid duplicate synthesis. If the sequence is not in the inventory the program devises a synthetic strategy, calculates the amounts of reagents and labor costs necessary to complete the synthetic oligonucleotide. The program also deducts the reagents from inventory files. Physical data is also calculated. A file is generated in a sequence inventory for storage of the data as well as other data that will be generated during the purification processes. All variable parameters can be easily edited. The programs were designed to provide a cross-referencing feature for data analysis and can use several parameters as a constant.

11.

ESPript: analysis of multiple sequence alignments in PostScript.

P Gouet E Courcelle D I Stuart F Métoz 《Bioinformatics (Oxford, England)》1999,15(4):305-308

MOTIVATION: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as that created by the program DSSP) to produce a synthesis of both sequence and structural information. RESULTS: ESPript can be run via a command file or a friendly html-based user interface. The program calculates a homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program. AVAILABILITY: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and helpfiles can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.

12.

Predict7, a program for protein structure prediction

R S Cármenes J P Freije M M Molina J M Martín 《Biochemical and biophysical research communications》1989,159(2):687-693

We describe a program for protein sequence analysis which runs on IBM PC computers. Protein sequences are loaded from files in Mount-Conrad and Lipman-Pearson format. Seven features are analyzed: hydrophilicity, hydropathy, surface probability, side chain flexibility, antigenicity, secondary structure and N-glycosylation sites. Numeric results can be shown, printed or stored in files exportable to other programs. Graphics of up to four predictions can be displayed on the screen, printed out or plotted, with several definable options. This program has been designed to be fast, user-friendly and to be shared with the scientific community.

13.

Codon usage tabulated from international DNA sequence databases: status for the year 2000

Nakamura Y Gojobori T Ikemura T 《Nucleic acids research》2000,28(1):292

The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/. The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes.
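As a rough illustration of what tabulating codon usage for one CDS involves, the following Python sketch counts in-frame codons and scales the counts to a per-thousand frequency. It is not the database's own code: the function names `codon_counts` and `per_thousand`, the per-thousand scaling, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence, reading in frame from position 0.

    Assumption: the input is a single CDS already in the correct frame.
    RNA input (U) is normalised to DNA (T); a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop any incomplete final codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts: Counter) -> dict:
    """Convert raw codon counts to frequencies per 1000 codons (illustrative scaling)."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Hypothetical five-codon example: ATG GCT GCT AAA TAA
counts = codon_counts("ATGGCTGCTAAATAA")
for codon, freq in sorted(per_thousand(counts).items()):
    print(codon, counts[codon], round(freq, 1))
```

Summing the `Counter` objects of many CDSs from one organism would give a per-organism table of the kind the abstract describes.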

14.

A mass spectrometry proteomics data management platform

Sharma V Eng JK Maccoss MJ Riffle M 《Molecular & cellular proteomics : MCP》2012,11(9):824-831

Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) file formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.

15.

PISCES: a protein sequence culling server

Wang G Dunbrack RL 《Bioinformatics (Oxford, England)》2003,19(12):1589-1591

PISCES is a public server for culling sets of protein sequences from the Protein Data Bank (PDB) by sequence identity and structural quality criteria. PISCES can provide lists culled from the entire PDB or from lists of PDB entries or chains provided by the user. The sequence identities are obtained from PSI-BLAST alignments with position-specific substitution matrices derived from the non-redundant protein sequence database. PISCES therefore provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity and often overestimates sequence identity by aligning only well-conserved fragments. PDB sequences are updated weekly. PISCES can also cull non-PDB sequences provided by the user as a list of GenBank identifiers, a FASTA format file, or BLAST/PSI-BLAST output.

16.

GeneMachine: gene prediction and sequence annotation

Makalowska I Ryan JF Baxevanis AD 《Bioinformatics (Oxford, England)》2001,17(9):843-844

MOTIVATION: A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously identified critical region are identified. RESULTS: We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules is used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. AVAILABILITY: GeneMachine is freely available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.

17.

A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis.


J Pustell F C Kafatos 《Nucleic acids research》1986,14(1):479-488

We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen-oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 × 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.

18.

Microcomputer programs for DNA sequence analysis.


B Conrad D W Mount 《Nucleic acids research》1982,10(1):31-38

Computer programs are described which allow (a) analysis of DNA sequences to be performed on a laboratory microcomputer or (b) transfer of DNA sequences between a laboratory microcomputer and another computer system, such as a DNA library. The sequence analysis programs are interactive, do not require prior experience with computers and in many other respects resemble programs which have been written for larger computer systems (1-7). The user enters sequence data into a text file, accesses this file with the programs, and is then able to (a) search for restriction enzyme sites or other specified sequences, (b) translate in one or more reading frames in one or both directions in order to find open reading frames, or (c) determine codon usage in the sequence in one or more given reading frames. The results are given in table format and a restriction map is generated. The modem program permits collection of large amounts of data from a sequence library into a permanent file on the microcomputer disc system, or transfer of laboratory data in the reverse direction to a remote computer system.

19.

concatenator: sequence data matrices handling made easy

Pina-Martins F Paulo OS 《Molecular ecology resources》2008,8(6):1254-1255

concatenator is a simple, user-friendly program that implements two very useful functions for phylogenetic data analysis. It concatenates NEXUS files of several fragments into a single NEXUS file ready to be used in phylogenetics software such as PAUP and MrBayes, and it converts FASTA sequence data files to NEXUS and vice versa. Additionally, concatenated files can be prepared for partition tests in PAUP. It is freely available at http://cobig2.fc.ul.pt.
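The two operations named in this abstract, FASTA-to-NEXUS conversion and concatenation of per-fragment matrices, can be sketched in a few lines of Python. This is a minimal sketch and not concatenator's implementation: the function names, the choice of a DATA block with `DATATYPE=DNA`, and the use of `?` to pad taxa missing from a fragment are assumptions.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}; the name is the first word of each header."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate(matrices: list) -> dict:
    """Join aligned fragments taxon by taxon, padding absent taxa with '?' (assumed convention)."""
    taxa = []
    for matrix in matrices:
        for name in matrix:
            if name not in taxa:
                taxa.append(name)
    combined = {name: "" for name in taxa}
    for matrix in matrices:
        if not matrix:
            continue
        nchar = len(next(iter(matrix.values())))
        for name in taxa:
            combined[name] += matrix.get(name, "?" * nchar)
    return combined


def to_nexus(records: dict) -> str:
    """Write an aligned {name: sequence} matrix as a simple NEXUS DATA block."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    nchar = lengths.pop()
    width = max(len(name) for name in records)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(records)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in records.items():
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical input: two aligned fragments for two taxa.
frag1 = parse_fasta(">taxonA\nACGT\n>taxonB\nAC-T\n")
frag2 = parse_fasta(">taxonA\nGGCC\n")
print(to_nexus(concatenate([frag1, frag2])))
```

Taxon names containing spaces or punctuation would need quoting in real NEXUS output; the sketch avoids that by keeping only the first word of each FASTA header.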

20.

A microcomputer program for hydropathic analysis of proteins with I/O through word processing and graphics software

Weise Michael J. 《Bioinformatics (Oxford, England)》1986,2(2):103-106

A BASIC program has been devised for the hydropathic analysis of protein sequences according to the method of Kyte and Doolittle (1982). The program uses sequence data from input files that are created with a word processor and produces two types of output file: one contains a bar graph of the hydropathic profile in a format that can be easily edited; the other is a tabulation of hydropathic indices along a protein's sequence that can be used as input by the program for the production of a bar graph or as input into other graphics and analysis software. An MS-DOS microcomputer, operating under IBM BASICA or GWBASIC, and a dot matrix printer with block graphics capabilities are the only hardware requirements for graphic display of hydropathy profiles. The program is capable of unattended analysis from a list of up to 15 input files. Accepted on March 10, 1986.
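The Kyte and Doolittle method averages per-residue hydropathy indices over a sliding window along the sequence. The Python sketch below shows that calculation together with a text bar graph similar in spirit to the output described; it is not a port of the BASIC program. The function names, the default window of 9 residues, the bar scaling, and the example peptide are assumptions.

```python
# Kyte-Doolittle hydropathy indices for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Mean hydropathy index for every full window along the sequence.

    Assumption: the window length is a free parameter; 9 is only a default here.
    Non-standard residue letters raise KeyError.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def bar_graph(profile: list, scale: int = 4) -> list:
    """Render the profile as text lines: '+' bars for hydrophobic, '-' for hydrophilic windows."""
    lines = []
    for i, value in enumerate(profile):
        bar = ("+" if value >= 0 else "-") * round(abs(value) * scale)
        lines.append(f"{i + 1:4d} {value:6.2f} {bar}")
    return lines


# Hypothetical peptide, used only to exercise the functions.
for line in bar_graph(hydropathy_profile("MKTLLILAVVAAALAEDKRS", window=7)):
    print(line)
```

The list returned by `hydropathy_profile` corresponds to the tabulation of indices mentioned in the abstract and could be written to a file for use by other plotting software.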