首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 609 毫秒
1.
The typical wet lab user often annotates smaller sequences in the GenBank format, but resulting files are not accepted for database submission by NCBI. This makes submission of such annotations a cumbersome task. Here we present “GB2sequin” an easy-to-use web application that converts custom annotations in the GenBank format into the NCBI direct submission format Sequin. Additionally, the program generates a “five-column, tab-delimited feature table” and a FASTA file. Those are required for submission through BankIt or the update of an existing GenBank entry. We specifically developed “GB2sequin” for the regular wet lab researcher with strong focus on user-friendliness and flexibility. The application is equipped with an intuitive graphical interface and a comprehensive documentation. It can be employed to prepare any GenBank file for database submission and is freely available online at https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html.  相似文献   

2.
The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored therein. Because users are not required to reformat the data file for each program, use of the NEXUS format prevents cut-and-paste errors as well as the proliferation of copies of the original data file. The purpose of making the NCL available is to encourage the use of the NEXUS format by making it relatively easy for programmers to add the ability to interpret NEXUS files in newly developed software. AVAILABILITY: The NCL is freely available under the GNU General Public License from http://hydrodictyon.eeb.uconn.edu/ncl/ Supplementary information: Documentation for the NCL (general information and source code documentation) is available in HTML format at http://hydrodictyon.eeb.uconn.edu/ncl/  相似文献   

3.
4.
This paper presents a pipeline, implemented in an open‐source program called GB→TNT (GenBank‐to‐TNT), for creating large molecular matrices, starting from GenBank files and finishing with TNT matrices which incorporate taxonomic information in the terminal names. GB→TNT is designed to retrieve a defined genomic region from a bulk of sequences included in a GenBank file. The user defines the genomic region to be retrieved and several filters (genome, length of the sequence, taxonomic group, etc.); each genomic region represents a different data block in the final TNT matrix. GB→TNT first generates Fasta files from the input GenBank files, then creates an alignment for each of those (by calling an alignment program), and finally merges all the aligned files into a single TNT matrix. The new version of TNT can make use of the taxonomic information contained in the terminal names, allowing easy diagnosis of results, evaluation of fit between the trees and the taxonomy, and automatic labelling or colouring of tree branches with the taxonomic groups they represent. © The Willi Hennig Society 2012.  相似文献   

5.
Most existing Mass Spectra (MS) analysis programs are automatic and provide limited opportunity for editing during the interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks retained, and the b-and y-ion series identified in an interactive manner. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and the experimental mass spectrum of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.  相似文献   

6.
The SFF file format produced by Roche's 454 sequencing technology is a compact, binary format that contains the flow values that are used for base and quality calling of the reads. Applications, e.g. in metagenomics, often depend on accurate sequence information, and access to flow values is important to estimate the probability of errors. Unfortunately, the programs supplied by Roche for accessing this information are not publicly available. Flower is a program that can extract the information contained in SFF files, and convert it to various textual output formats. AVAILABILITY: Flower is freely available under the General Public License.  相似文献   

7.
The increasing accessibility and reduced costs of sequencing has made genome analysis accessible to more and more researchers. Yet there remains a steep learning curve in the subsequent computational steps required to process raw reads into a database-deposited genome sequence. Here we describe “Genomer,” a tool to simplify the manual tasks of finishing and uploading a genome sequence to a database. Genomer can format a genome scaffold into the common files required for submission to GenBank. This software also simplifies updating a genome scaffold by allowing a human-readable YAML format file to be edited instead of large sequence files. Genomer is written as a command line tool and is an effort to make the manual process of genome scaffolding more robust and reproducible. Extensive documentation and video tutorials are available at http://next.gs.  相似文献   

8.
介绍DICOM3.0医学图像文件的格式和C#语言的特点,首次利用Visual C#语言对该标准的图像进行显示和处理,能够直接读取DICOM格式原始图像数据,并可批量转换成BMP等格式进行处理,此项工作可为医学图像处理研究及相关医学图像软件开发奠定基础。  相似文献   

9.
10.
A computer program has been designed to aid development of synthetic strategies for oligonucleotides produced by solid-phase chemical techniques. The program reduces the time required to develop a strategy and a data file from hours to minutes. The program contains inventories, provides cost analyses, and generates and stores other associated data. The program searches an inventory of sequences for that sequence to avoid duplicate synthesis. If the sequence is not in the inventory the program devises a synthetic strategy, calculates the amounts of reagents and labor costs necessary to complete the synthetic oligonucleotide. The program also deducts the reagents from inventory files. Physical data is also calculated. A file is generated in a sequence inventory for storage of the data as well as other data that will be generated during the purification processes. All variable parameters can be easily edited. The programs were designed to provide a cross-referencing feature for data analysis and can use several parameters as a constant.  相似文献   

11.
MOTIVATION: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as that created by the program DSSP) to produce a synthesis of both sequence and structural information. RESULTS: ESPript can be run via a command file or a friendly html-based user interface. The program calculates an homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program. AVAILABILITY: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and helpfiles can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.  相似文献   

12.
Predict7, a program for protein structure prediction   总被引:4,自引:0,他引:4  
We describe a program for protein sequence analysis which runs in IBM PC computers. Protein sequences are loaded from files in Mount-Conrad and Lipman-Pearson format. Seven features are analyzed: hydrophilicity, hydropathy, surface probability, side chain flexibility, antigenicity, secondary structure and N-glycosylation sites. Numeric results can be shown, printed or stored in files exportable to other programs. Graphics of up to four predictions can be displayed on the screen, printed out or plotted, with several definable options. This program has been designed to be fast, user-friendly and to be shared with the scientific community.  相似文献   

13.
The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes.  相似文献   

14.
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) files formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.  相似文献   

15.
PISCES: a protein sequence culling server   总被引:21,自引:0,他引:21  
PISCES is a public server for culling sets of protein sequences from the Protein Data Bank (PDB) by sequence identity and structural quality criteria. PISCES can provide lists culled from the entire PDB or from lists of PDB entries or chains provided by the user. The sequence identities are obtained from PSI-BLAST alignments with position-specific substitution matrices derived from the non-redundant protein sequence database. PISCES therefore provides better lists than servers that use BLAST, which is unable to identify many relationships below 40% sequence identity and often overestimates sequence identity by aligning only well-conserved fragments. PDB sequences are updated weekly. PISCES can also cull non-PDB sequences provided by the user as a list of GenBank identifiers, a FASTA format file, or BLAST/PSI-BLAST output.  相似文献   

16.
MOTIVATION: A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. RESULTS: We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. AVAILABILITY: GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.  相似文献   

17.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen oriented user interface, and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 X 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.  相似文献   

18.
Microcomputer programs for DNA sequence analysis.   总被引:21,自引:5,他引:16       下载免费PDF全文
Computer programs are described which allow (a) analysis of DNA sequences to be performed on a laboratory microcomputer or (b) transfer of DNA sequences between a laboratory microcomputer and another computer system, such as a DNA library. The sequence analysis programs are interactive, do not require prior experience with computers and in many other respects resemble programs which have been written for larger computer systems (1-7). The user enters sequence data into a text file, accesses this file with the programs, and is then able to (a) search for restriction enzyme sites or other specified sequences, (b) translate in one or more reading frames in one or both directions in order to find open reading frames, or (c) determine codon usage in the sequence in one or more given reading frames. The results are given in table format and a restriction map is generated. The modem program permits collection of large amounts of data from a sequence library into a permanent file on the microcomputer disc system, or transfer of laboratory data in the reverse direction to a remote computer system.  相似文献   

19.
concatenator is a simple and user-friendly software that implements two very useful functions for phylogenetics data analysis. It concatenates NEXUS files of several fragments in a single NEXUS file ready to be used in phylogenetics software, such as paup and mrbayes and it converts FASTA sequence data files to NEXUS and vice-versa. Additionally, concatenated files can be prepared for partition tests in paup. It is freely available in http://cobig2.fc.ul.pt.  相似文献   

20.
A BASIC program has been devised for the hydropathic analysisof protein sequences according to the method of Kyte and Doolittle(1982). The program uses sequence data from input files thatare created with a word processor and produces two types ofoutput file: one contains a bar graph of the hydropathic profilein a format that can be easily edited; the other is a tabulationof hydropathic indices along a protein's sequence that can beused as input by the program for the production of a bar graphor as input into other graphics and analysis software. An MS-DOSmicrocomputer, operating under IBM BASICA or GWBASIC and a dotmatrix printer with block graphics capabilities are the onlyhardware requirements for graphic display of hydropathy profiles.The program is capable of unattended analysis from a list ofup to 15 input files. ; accepted on March 10, 1986  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号