Similar Articles
20 similar articles found (search time: 15 ms)
1.
Microcomputer programs for DNA sequence analysis.   Times cited: 21 (self-citations: 5; by others: 16)
Computer programs are described which allow (a) analysis of DNA sequences to be performed on a laboratory microcomputer or (b) transfer of DNA sequences between a laboratory microcomputer and another computer system, such as a DNA library. The sequence analysis programs are interactive, do not require prior experience with computers and in many other respects resemble programs which have been written for larger computer systems (1-7). The user enters sequence data into a text file, accesses this file with the programs, and is then able to (a) search for restriction enzyme sites or other specified sequences, (b) translate in one or more reading frames in one or both directions in order to find open reading frames, or (c) determine codon usage in the sequence in one or more given reading frames. The results are given in table format and a restriction map is generated. The modem program permits collection of large amounts of data from a sequence library into a permanent file on the microcomputer disc system, or transfer of laboratory data in the reverse direction to a remote computer system.
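The restriction-site search and codon-usage steps can be sketched in a few lines of Python. This sketch illustrates the idea and does not reproduce the original microcomputer programs. The enzyme table is only an example input, positions are 1-based, and only forward reading frames are counted.

```python
import re
from collections import Counter

# Example input only: EcoRI and BamHI recognition sites.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

def find_sites(seq: str, site: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match of site."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]

def codon_usage(seq: str, frame: int = 0) -> Counter:
    """Count complete codons in one forward reading frame (frame = 0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))

if __name__ == "__main__":
    dna = "ATGGAATTCGGATCCTAA"
    for name, site in ENZYMES.items():
        print(name, find_sites(dna, site))
    print(codon_usage(dna))
```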

2.
The partial amino acid sequences of 121 rice proteins separated by two-dimensional gel electrophoresis (2D-PAGE) were determined for a protein sequence data file. In the Rice Genome Research Program (RGP), more than 20,000 cDNA clones randomly selected from rice cDNA libraries have been sequenced to construct a cDNA catalog. Complementary DNAs encoding about 30% of the proteins in the protein sequence data file could be identified in the catalog by computer search. It was deduced that 20,000–40,000 genes are present in the rice genome. After gene redundancy is taken into account, only about half of the roughly 20,000 cDNAs sequenced in the RGP should have unique sequences, corresponding to 1/4–1/2 of the genes in the entire rice genome. This is consistent with the finding that cDNAs encoding about 30% of the sequenced proteins could be identified in the catalog. If the cDNA catalog is enlarged further, cDNAs encoding all proteins separated by 2D-PAGE could easily be identified from the catalog by using the protein sequence data.
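The abstract's coverage estimate follows directly from its own figures: about half of the roughly 20,000 sequenced cDNAs are unique, and the genome holds 20,000 to 40,000 genes. The observed 30% falls within the resulting range:

```latex
\begin{aligned}
\text{unique cDNAs} &\approx \tfrac{1}{2} \times 20{,}000 = 10{,}000 \\
\text{fraction of genes covered} &\approx \frac{10{,}000}{40{,}000} = \tfrac{1}{4}
  \quad\text{to}\quad \frac{10{,}000}{20{,}000} = \tfrac{1}{2}
\end{aligned}
```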

3.
SPLICE, a software tool for the extraction of sequences from files in GenBank tape format, has been developed. The program can analyze the features table in this format and use any of the information provided to write the corresponding sequences into a standard sequence file format suitable for use with sequence analysis programs. Sequences that are present as several subsequent fragments in a single GenBank file, such as those encoding a peptide, can be spliced together by the program. Further, sequences that are present in more than one GenBank file, such as an exon which spans several different files, can also be spliced into one sequence. SPLICE runs under the MS-DOS and Unix operating systems, can be called as a sub-process by other programs and can process batches of files. Received on December 26, 1989; accepted on May 30, 1990
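Splicing fragments within a record means cutting the segments named in a feature location out of the sequence and concatenating them. The Python sketch below assumes simple GenBank-style locations such as `join(1..3,7..9)`, optionally wrapped in an outer `complement(...)`, within a single record. It is not SPLICE itself: the function names are hypothetical, and it does not handle cross-file (remote) references, single-base positions, or fuzzy positions such as `<1`.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def splice(location: str, seq: str) -> str:
    """Extract and concatenate the segments named by a simple GenBank-style location.

    Handles "a..b" and "join(a..b,c..d,...)", optionally wrapped in an outer
    complement(...). Coordinates are 1-based and inclusive, as in the feature table.
    """
    location = location.replace(" ", "")
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return reverse_complement(splice(inner, seq))
    ranges = re.findall(r"(\d+)\.\.(\d+)", location)
    # Convert each 1-based inclusive range to a Python slice and join the pieces.
    return "".join(seq[int(start) - 1:int(end)] for start, end in ranges)
```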

4.
Using VBA to find conserved DNA sequences in nucleic acid databases   Times cited: 1 (self-citations: 0; by others: 1)
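Four related programs for finding conserved sequences in nucleic acid databases were written in VBA. The "Import DNA Sequences" program stores the DNA sequences from a FASTA-format text file in column A of Excel Sheet1, keeping each sequence's GI number and deleting the superfluous annotation. The "Organize DNA Sequences" program places the GI numbers in column A and the complete sequence for each GI number in column B. The "Random DNA Sequences" program generates random DNA sequences. The "Find Conserved DNA Sequences" program compares the random sequences with the downloaded DNA sequences to find the occurrence frequency of each random sequence. The use of these programs is illustrated with soybean genome sequences as an example. The programs make up for shortcomings of popular sequence alignment software and provide a new tool for PCR primer design, gene function analysis, and germplasm resource identification.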
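A Python analogue of the VBA workflow is sketched below: import FASTA while keeping the GI number, generate random probes, then count how often each probe occurs. It illustrates the workflow and is not a port of the VBA code. Two points are assumptions. "Occurrence frequency" is taken to mean the fraction of downloaded sequences that contain the probe. All function names are hypothetical.

```python
import random
import re

def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's GI number (or first header token) to its sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Keep only the GI number from headers like ">gi|12345|gb|...".
            m = re.match(r">gi\|(\d+)", line)
            name = m.group(1) if m else line[1:].split()[0]
            records[name] = ""
        elif name is not None and line:
            records[name] += line.upper()
    return records

def random_dna(length: int, rng: random.Random) -> str:
    """Generate one random DNA probe of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def occurrence_frequency(probe: str, records: dict[str, str]) -> float:
    """Assumed definition: fraction of records containing probe at least once."""
    if not records:
        return 0.0
    hits = sum(probe in seq for seq in records.values())
    return hits / len(records)
```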

5.
A method for the rapid correlation of tandem mass spectra to a list of protein sequences in a database has been developed. The combination of the fast and accurate computational search algorithm, X!Tandem, and a Linux cluster parallel computing environment with PVM or MPI, significantly reduces the time required to perform the correlation of tandem mass spectra to protein sequences in a database. A file of tandem mass spectra is divided into a specified number of files, each containing an equal number of the spectra from the larger file. These files are then searched in parallel against a protein sequence database. The results of each parallel output file are collated into one file for viewing through a web interface. Thousands of spectra can be searched in an accurate, practical, and time-effective manner. The source code for running Parallel Tandem with either PVM or MPI on the Linux operating system is available from http://www.thegpm.org. This source code is made available by the authors under the Artistic License.
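The divide, search in parallel, and collate pattern can be sketched in Python. Here a thread pool stands in for the PVM or MPI cluster, and `search_chunk` is a hypothetical placeholder for an X!Tandem run on one file. When the total is not evenly divisible, chunk sizes differ by at most one spectrum.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items: list, n_parts: int) -> list[list]:
    """Divide items into n_parts (>= 1) chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def search_chunk(spectra: list[str]) -> list[str]:
    """Hypothetical stand-in for searching one chunk of spectra against a database."""
    return [f"{s}: result" for s in spectra]

def parallel_search(spectra: list[str], n_parts: int) -> list[str]:
    chunks = split_evenly(spectra, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # map() preserves chunk order, so the collated output follows input order.
        partial_results = list(pool.map(search_chunk, chunks))
    return [line for part in partial_results for line in part]
```

In a real deployment each chunk would be written to its own file and dispatched to a separate node. The collation step is the same concatenation shown here.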

6.
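Using rapid amplification of cDNA ends (RACE), the plastocyanin gene was cloned from Ulva pertusa. The full-length cDNA is 787 bp. It comprises a 40 bp 5' untranslated region, a 327 bp 3' untranslated region, and a 420 bp open reading frame encoding a protein of 139 amino acids. The gene encodes the plastocyanin precursor peptide, whose N-terminal 41 amino acid residues form a signal peptide followed by a mature peptide of 98 residues. Thirteen plastocyanin precursor genes were selected from GenBank for sequence alignment and phylogenetic tree construction. The U. pertusa plastocyanin gene shares 48.2% to 78.8% homology with the other plastocyanin genes. The phylogenetic tree clustered together seven plastocyanin genes from six algal species, indicating their close evolutionary relationship. It likewise reflected the molecular evolutionary relationships among 11 organisms. Sequence alignment revealed two highly conserved motifs in the plastocyanin gene sequences, which encode the copper-binding active site of the plastocyanin protein.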
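The lengths reported for this cDNA are mutually consistent, assuming the 420 bp open reading frame includes the stop codon:

```latex
\begin{aligned}
40\ \text{bp (5' UTR)} + 420\ \text{bp (ORF)} + 327\ \text{bp (3' UTR)} &= 787\ \text{bp} \\
420\ \text{bp} / 3 = 140\ \text{codons} &= 139\ \text{sense codons} + 1\ \text{stop codon} \\
41\ \text{(signal peptide)} + 98\ \text{(mature peptide)} &= 139\ \text{residues}
\end{aligned}
```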

7.
Synthetic oligonucleotides have proven to be extremely useful probes for screening cDNA and genomic libraries. Selection of the appropriate probe can be more easily and accurately achieved with the computer program PROBFIND. The user enters the amino acid sequence from a file or from the keyboard, then selects the minimum length allowed for the probe and the maximum allowable degeneracy. The computer prints, to the screen and to a file, a list of potential probe sequences that meet these minimum specifications, together with the location of the corresponding sequence in the protein. The user may modify the specifications for length and degeneracy at any time during the output of data, which allows rapid selection of the desired probe. The program is interactive, accepts any file format with only a single modification of the file, is written in BASIC, and requires less than 6 kbytes of memory. This makes the program easy to use and adaptable even to unsophisticated microcomputers.
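The core of a PROBFIND-style search is computing, for each stretch of the protein, how many oligonucleotides could encode it. The Python sketch below uses the standard genetic code and is not the original BASIC program. It scans only windows of exactly `min_residues` residues, and it counts all synonymous codons for every residue, including the last one. The function name is hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def candidate_probes(protein: str, min_residues: int, max_degeneracy: int):
    """Yield (1-based start, peptide, degeneracy) for windows within the degeneracy limit.

    A window of k residues corresponds to a probe of 3*k nucleotides. Its degeneracy
    is the number of distinct oligonucleotides that can encode it.
    """
    protein = protein.upper()
    k = min_residues
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        degeneracy = prod(CODON_COUNTS[aa] for aa in window)
        if degeneracy <= max_degeneracy:
            yield i + 1, window, degeneracy
```

For example, `list(candidate_probes("MWKCFL", 3, 8))` keeps the windows MWK, WKC and KCF, with degeneracies 2, 4 and 8. It rejects CFL, whose degeneracy is 2 × 2 × 6 = 24.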

8.
The primary structure of chicken ribosomal protein L5.   Times cited: 1 (self-citations: 0; by others: 1)
The nucleotide sequence of a cDNA for chicken ribosomal protein L5, which is considered to associate with 5S rRNA, was determined. The cDNA is 975 bp long. The deduced protein has 297 amino acids and has a molecular mass of 34,090 Da. A comparative analysis of the amino acid sequences of chicken L5 and its homologous proteins revealed an extremely conserved region which contains a cluster of basic amino acids.

9.
Reference cDNA library facilities available from European sources   Times cited: 1 (self-citations: 0; by others: 1)
cDNA libraries are the cornerstone of efforts to identify the relatively small regions of genomes that are responsible for biological effects. Gene hunters seeking candidate genes via a variety of approaches ultimately focus on the cloning, sequencing, and expression of cDNAs. Assistance is now available to researchers in the form of genome programs, whose initial goals include assembly of a complete collection of expressed sequences derived from the genome of interest. The concept behind reference sets of cDNA libraries is that the aims of genome programs are best served when different laboratories work on a common set of high-quality arrayed cDNA libraries with different experimental approaches. This reduces unnecessary duplication of effort and maximizes the amount of information that one set of resources can provide.

10.
Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter is composed of consensus TE sequences grouped into families that can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%), and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST as the vast majority of their outputs are false-positives.
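MITEs take their name from their terminal inverted repeats (TIRs). The Python sketch below checks whether a candidate element's two ends are reverse complements of each other, allowing a few mismatches. It covers only one ingredient of a discovery pipeline and is not MITE-Hunter's code. The default TIR length and mismatch limit are arbitrary assumptions, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def has_tir(element: str, tir_len: int = 10, max_mismatches: int = 1) -> bool:
    """True if the first tir_len bases match the reverse complement of the last tir_len."""
    element = element.upper()
    if len(element) < 2 * tir_len:
        return False
    left = element[:tir_len]
    # For an inverted repeat, the right end read on the opposite strand equals the left end.
    right_rc = revcomp(element[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches
```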

11.

Background  

The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been used widely to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat-file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries.

12.
MOTIVATION: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as those created by the program DSSP) to produce a synthesis of both sequence and structural information. RESULTS: ESPript can be run via a command file or a friendly HTML-based user interface. The program calculates a homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program. AVAILABILITY: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and help files can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.
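A per-column score of the kind ESPript displays can be approximated as the fraction of sequences that carry the column's most common residue. The Python sketch below uses that simple identity measure as an assumption. ESPript's own scoring scheme is not reproduced here, and the function name is hypothetical.

```python
from collections import Counter

def column_identity(alignment: list[str]) -> list[float]:
    """Per-column fraction of sequences carrying that column's most common residue.

    Gap characters ('-' or '.') never count as the consensus residue but still
    count in the denominator.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(width):
        residues = [row[col].upper() for row in alignment if row[col] not in "-."]
        top = Counter(residues).most_common(1)
        scores.append(top[0][1] / len(alignment) if top else 0.0)
    return scores
```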

13.
A search for new potential coding sequences was conducted within two overlapping cosmid genomic DNA clusters of about 170 and 45 kb from the swine major histocompatibility complex class III region. The sequences were detected with various probes, including pools of swine cDNA, homologous and heterologous genomic sequences, and synthetic oligonucleotides. The 170 kb cluster was centered on the tumor necrosis factor genes (TNF), and the 45 kb cluster contained the heat-shock protein 70 genes (HSP70). The TNF cluster revealed the presence of five new genes: lymphotoxin , BAT1, BAT2, BAT3, and a sequence related to DNA-binding factors. No sequence homologous to B144 was found in the TNF cluster, although other unidentified coding sequences may be present in this cluster. The HSP70 cluster contained a gene identified as BAT6, that is, valyl-tRNA synthetase. These results provide new evidence that the genomic maps of these various genes in the TNF and HSP70 sub-regions are similar in swine and human.

14.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen-oriented user interface and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user, and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to automatically annotate printouts and to translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 × 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.
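Locating potential protein coding regions usually begins with open reading frame (ORF) detection. The Python sketch below scans all six frames for ATG...stop stretches of a minimum length. It does not reproduce the package's program, which the abstract does not describe in detail. Practical coding-region predictors often add statistical measures such as codon bias.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, frame, start, end) for ATG...stop ORFs of >= min_codons codons.

    The codon count includes the stop codon. Coordinates are 0-based and half-open
    on the strand being scanned ("-" is the reverse complement). ATGs nested inside
    an already open ORF are not reported separately.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(_COMPLEMENT)[::-1]))
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None
```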

15.
An intact cDNA from Arabidopsis thaliana for adenine phosphoribosyltransferase (APRT) was isolated and sequenced. The cDNA is 729 nucleotides in length and predicts a protein of Mr 27,140. The deduced amino acid sequence has been compared with those of other APRTs and shown to be most similar to the Escherichia coli protein. Construction of a molecular tree of the known APRT amino acid sequences indicates the A. thaliana and E. coli APRT sequences form one cluster and the currently available vertebrate and invertebrate sequences form a separate grouping. Since it is possible to select either for or against the expression of APRT, the isolation of this APRT cDNA clone will allow these selection schemes to be used in plant genetic experiments.

16.
cid is a computational tool developed in the Web environment to process cloned DNA fragments with the objective of masking the vector and adaptor regions, detecting the presence of microsatellites and designing the most appropriate primer pairs for the amplification of the identified repetitive sequences. This entire process is executed by the user in a simple and automated manner with the data input as a Zip file of chromatograms or a multiFASTA file. Thus, it is possible to analyse dozens of sequences at the same time, optimizing data processing and the search for the information of interest. cid is freely available on http://www.shrimp.ufscar.br/cid/index.php.
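Microsatellite detection of the kind cid performs can be approximated with a backreference regular expression. In the hypothetical Python sketch below, a motif of 2 to `max_motif` bases must occur at least `min_repeats` times in tandem, and mononucleotide runs are skipped. Vector masking and primer design are not covered, and the thresholds are assumptions rather than cid's settings.

```python
import re

def find_microsatellites(seq: str, min_repeats: int = 5, max_motif: int = 6):
    """Return (1-based start, motif, copies) for tandem repeats of 2..max_motif-base motifs."""
    seq = seq.upper()
    # The lazy {2,max}? tries the shortest motif first; \1 demands exact tandem copies.
    pattern = re.compile(r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((m.start() + 1, motif, len(m.group(0)) // len(motif)))
    return hits
```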

17.
18.

Background  

The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences.

19.
The present century has witnessed an unprecedented rise in the number of genome sequences owing to various genome-sequencing programs. However, this rise has not been matched by cDNA or expressed sequence tag (EST) data. Hence, predicting the protein-coding sequences of genes from this enormous collection of genomic sequences presents a significant challenge. While robust high-throughput methods of cloning and expression could be used to meet protein requirements, the lack of intron information creates a bottleneck. Computational programs designed to recognize intron–exon boundaries for a particular organism or group of organisms have their own limitations. Keeping this in view, we describe here a method for constructing an intron-less gene from genomic DNA in the absence of cDNA/EST information and of an organism-specific gene prediction program. The method combines two sequential steps. Bioinformatics is used to predict the correct intron–exon boundaries, and splicing by overlap extension PCR is then used to synthesize the spliced gene. The resulting gene construct can then be cloned for protein expression. The method is simple and can be used for the expression of any eukaryotic gene.
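Splicing by overlap extension needs a pair of chimeric primers at each exon-exon junction, with tails that overlap. The Python sketch below assumes the exon boundaries have already been predicted. The default `overlap` and `anneal` lengths are arbitrary, melting temperature and secondary structure are not checked, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]

def junction_primers(upstream_exon: str, downstream_exon: str,
                     overlap: int = 12, anneal: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) chimeric primers spanning one exon-exon junction.

    forward: a tail from the upstream exon's 3' end, then `anneal` bases priming
             on the downstream exon.
    reverse: reverse complement of the upstream exon's last `anneal` bases plus the
             downstream exon's first `overlap` bases, so it primes on the upstream exon.
    """
    up, down = upstream_exon.upper(), downstream_exon.upper()
    forward = up[-overlap:] + down[:anneal]
    reverse = revcomp(up[-anneal:] + down[:overlap])
    return forward, reverse
```

The upstream exon is amplified with an outer forward primer and `reverse`, and the downstream exon with `forward` and an outer reverse primer. The two products then share `2 * overlap` bases at the junction, which lets them be fused in a subsequent PCR.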

20.
MOTIVATION: Clustering sequences of a full-length cDNA library into alternative splice form candidates is a very important problem. RESULTS: We developed a new efficient algorithm to cluster sequences of a full-length cDNA library into alternative splice form candidates. Current clustering algorithms for cDNAs tend to produce too many clusters containing incorrect splice form candidates. Our algorithm is based on a spliced sequence alignment algorithm that considers splice sites. The spliced sequence alignment algorithm is a variant of an ordinary dynamic programming algorithm, which requires O(nm) time for checking a pair of sequences, where n and m are the lengths of the two sequences. Since this time bound is too large to perform all-pair comparison for a large set of sequences, we developed new techniques to reduce the computation time without affecting the accuracy of the output clusters. Our algorithm was applied to 21,076 mouse cDNA sequences of the FANTOM 1.10 database to examine its performance and accuracy. In these experiments, we achieved about a 2- to 12-fold speedup over a method using only a traditional hash-based technique. Moreover, without using any information from the mouse genome sequence data or any gene data in public databases, we succeeded in listing 87–89% of all the clusters that biologists have annotated manually. AVAILABILITY: We provide a web service for cDNA clustering located at https://access.obigrid.org/ibm/cluspa/, for which registration for the OBIGrid (http://www.obigrid.org) is required.
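The abstract compares its method against a "traditional hash-based technique" that cheaply proposes candidate pairs before any expensive alignment. The Python sketch below shows that general idea and is not the authors' algorithm. Sequences sharing at least `min_shared` k-mers become candidate pairs, and an optional `verify` callback stands in for the O(nm) spliced alignment. A union-find structure then merges accepted pairs into clusters. The names `cluster`, `k`, `min_shared` and `verify` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster(seqs: dict[str, str], k: int = 12, min_shared: int = 3,
            verify=None) -> list[list[str]]:
    """Shared k-mers propose pairs, verify() confirms them, union-find merges them."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Hash index: k-mer -> IDs of the sequences that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in seqs.items():
        for kmer in kmer_set(seq.upper(), k):
            index[kmer].add(name)

    # Count shared k-mers per pair; pairs sharing none are never examined.
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    # Only candidate pairs reach the (expensive) verification step.
    for (a, b), count in shared.items():
        if count >= min_shared and (verify is None or verify(seqs[a], seqs[b])):
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(sorted(members) for members in groups.values())
```

A k-mer present in many sequences produces quadratically many pairs, so a practical tool would cap or skip such high-frequency k-mers.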
