Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
MOTIVATION: One of the most interesting features of genomes (in both coding and non-coding regions) is the presence of relatively short, tandemly repeated DNA sequences known as tandem repeats (TRs). We developed a new PC-based stand-alone software analysis program that combines sequence motif searches with keywords such as organs, tissues, cell lines or development stages to find exact, inexact and compound TRs. Tandem Repeats Analyzer 1.5 (TRA) offers several advanced repeat search parameters/options over other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files but also analyzes multiple files containing multiple sequences. Advanced user-defined parameters/options let researchers apply different search criteria to motifs of different lengths simultaneously. The outputs show statistical results to be evaluated by the user. The discovery of TRs in ESTs could be useful both for gene mapping and association studies and for discovering TRs located in coding regions of important genes that are expressed under various conditions of environment, stress, organ, tissue and development stage. RESULTS: In this paper, we demonstrate applications of TRA using 175,899 EST sequences for three Arabidopsis spp. downloaded from GenBank. The EST-SSR/EST ratios were found to be 43.1%, 15.3% and 2.34% in A. lyrata, A. thaliana and A. halleri, respectively. Analysis revealed that organs, tissues and development stages possessed different amounts of repeats and repeat compositions. This indicates that the distribution of TRs among tissues or organs may not be random, unlike that of the untranscribed repeats found in genomes. AVAILABILITY: The program can be obtained free by anonymous FTP from ftp.akdeniz.edu.tr/Araclar/TRA.
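
As a rough illustration of what an exact tandem repeat search involves, the sketch below scans a sequence for motifs of a given length range repeated back to back. It is not TRA's algorithm, which the abstract does not describe; the function names, default thresholds and the regular-expression approach are all assumptions made for this example, and inexact or compound repeats are not handled.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ACAC')."""
    return motif not in (motif + motif)[1:-1]


def find_exact_tandem_repeats(seq, min_motif=2, max_motif=6, min_copies=3):
    """Return (start, motif, copies) tuples for exact tandem repeats in seq.

    Hypothetical helper: the thresholds are illustrative defaults, not TRA settings.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # skip e.g. 'AA' so homopolymers are not reported as dinucleotides
            copies = len(match.group(0)) // k
            hits.append((match.start(), motif, copies))
    return hits


print(find_exact_tandem_repeats("TTACACACACGG"))
```

Varying `min_copies` per motif length would mimic the idea of using different search criteria for different motif lengths, though how TRA exposes this is not stated in the abstract.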

2.
Using the Phred/Phrap/Polyphred/Consed pipeline established in the National Livestock Research Institute of Korea, we predicted candidate coding single nucleotide polymorphisms (cSNPs) from 7,600 expressed sequence tags (ESTs) derived from three cDNA libraries (liver, M. longissimus dorsi, and intermuscular fat) of Hanwoo (Korean native cattle) steers. From the 7,600 ESTs, 829 contigs comprising more than two EST reads were assembled using the Phrap assembler. Based on the contig analysis, 201 candidate cSNPs were identified in 129 contigs, in which transitions (69%) outnumbered transversions (31%). To verify whether the predicted cSNPs are real, 17 SNPs involved in lipid and energy metabolism were selected from the ESTs. Twelve of these were confirmed to be real while five were identified as artifacts, possibly due to EST sequencing errors. Further analysis of the 12 verified cSNPs was performed using the program BLASTX. Five were identified as nonsynonymous cSNPs, five were synonymous cSNPs, and two SNPs were located in 3'-UTRs. Our data indicated that a relatively high SNP prediction rate (71%) from a large EST database could produce abundant cSNPs rapidly, which can be used as valuable genetic markers in cattle.
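
The transition/transversion split mentioned above rests on a standard definition: a transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), while a transversion swaps one class for the other. A minimal sketch of that classification follows; the function names and the sample allele pairs are hypothetical and are not taken from the study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally(substitutions):
    """Count transitions and transversions in an iterable of (ref, alt) pairs."""
    return Counter(classify_substitution(ref, alt) for ref, alt in substitutions)


# Made-up allele pairs, for illustration only.
print(tally([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))
```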

3.
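Using bioinformatics methods, all 637 goat-derived expressed sequence tags (ESTs) deposited in GenBank as of September 2006 were pooled and classified. Of the 392 ESTs derived from hair-follicle-containing skin of cashmere goats, 48 corresponded to genes encoding keratins or keratin-associated proteins. None of the 245 ESTs from mammary gland corresponded to such genes; instead they represented many other kinds of genes, such as immunoglobulin genes, vitamin transport protein genes and MHC genes, with immune-related and enzyme genes predominating. The most striking similarity between the ESTs from the two tissues was in genes encoding ribosomal proteins, which accounted for 15.1% of the ESTs from hair-follicle-containing skin and 21.6% of those from mammary gland; moreover, of the 26 contigs composed of ESTs from both tissues, 17 (65.4%) encoded ribosomal proteins.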

4.
An expressed sequence tag (EST) library was constructed from hemocytes of the black tiger shrimp (Penaeus monodon) to identify genes associated with immunity in this economically important species. The number of complementary DNA clones in the constructed library was approximately 4 × 10^5. Of these, 615 clones having inserts larger than 500 bp were unidirectionally sequenced and analyzed by homology searches against data in GenBank. Significant homology to known genes was found in 314 (51%) of the 615 clones, but the remaining 301 sequences (49%) did not match any sequence in GenBank. Approximately 35% of the matched ESTs were significantly identified by the BLASTN and BLASTX programs, while 65% were recognized only by the BLASTX program. Of the 615 clones, 55 (8.9%) were identified as putative immune-related genes. The isolated genes were composed of those coding for enzymes and proteins in the clotting system and the prophenoloxidase-activating system, antioxidative enzymes, antimicrobial peptides, and serine proteinase inhibitors. Three full-length ESTs encoding antimicrobial peptides (antilipopolysaccharide and penaeidin homologues) and a heat shock protein (cpn10 homologue) are reported.

5.
Bovine coding region single nucleotide polymorphisms located proximal to quantitative trait loci were identified to facilitate bovine QTL fine mapping research. A total of 692,763 bovine SNPs were extracted from 39,432 UniGene clusters, and 53,446 candidate SNPs were found to have a depth >3. In order to validate the in silico SNPs experimentally, 186 animals representing 14 breeds and 100 mixed breeds were analyzed. Genotyping of 40 randomly selected candidate SNPs revealed that 43% of these SNPs ranged in frequency from 0.009 to 0.498. To identify non-synonymous SNPs and to correct for possible frameshift errors in the ESTs at the predicted SNP positions, we designed a program that determines coding regions by protein-sequence referencing, and identified 17,735 nsSNPs. The SNPs and bovine quantitative trait loci information were integrated into a bovine SNP database, BcSNPdb (http://snugenome.snu.ac.kr/BtcSNP/). Currently there are 43 different kinds of quantitative traits available. Thus, these SNPs would serve as valuable resources for exploiting genomic variation that influences economically and agriculturally important traits in cows.

6.
A total of 10,154 5'-end expressed sequence tags (ESTs) were established from the normalized and size-selected cDNA libraries of a marine red alga, Porphyra yezoensis. Among the ESTs, 2140 were unique species, and the remaining 8014 were grouped into 1127 species. Database search of the 3267 non-redundant ESTs by the BLAST algorithm showed that the sequences of 1080 species (33.1%) have similarity to those of registered genes from various organisms including higher plants, mammals, yeasts, and cyanobacteria, while 2187 (66.9%) are novel. Codon usage analysis in the coding regions of 101 non-redundant EST groups showing significant similarity to known genes indicated higher GC content at the third position of codons (79.4%) than at the first (62.2%) and second positions (45.0%), suggesting that the genome has been exposed to high GC pressure during evolution. The sequence data of individual ESTs are available at the web site http://www.kazusa.or.jp/en/plant/porphyra/EST/.
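
Position-specific GC content of the kind reported above can be computed by walking a coding sequence in frame and counting G or C at each of the three codon positions. The sketch below assumes the input is already an in-frame coding sequence starting at codon position 1; the function name and the toy sequence are illustrative and are not drawn from the study.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Assumes cds starts at the first base of a codon; trailing bases that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(count / n_codons for count in counts)


# Toy sequence: codons ATG and GCC.
print(gc_by_codon_position("ATGGCC"))  # (0.5, 0.5, 1.0)
```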

7.
Expressed sequence tags (ESTs) are partial cDNA sequences read from both ends of random expressed gene fragments used for discovering new genes. DNA libraries from four different developmental stages of Schistosoma mansoni used in this study generated 141 ESTs representing about 2.5% of S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank for homology searching in nonredundant databases using Basic Local Alignment Search Tool for DNA (BLASTN) alignment and for protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Among submitted ESTs, 29 were derived from lambdagt11 sporocyst library, 70 from lambdaZap adult worm library, 31 from lambdaZap cercarial library, and 11 from lambdaZap female B worm library. Homology search revealed that eight (5.6%) ESTs shared homology to previously identified S. mansoni genes in dbEST, 15 (10.6%) are homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homologies to rRNA or mitochondrial DNA sequences. Thus, among the 141 ESTs studied, 116 sequences are derived from novel, uncharacterized S. mansoni genes. Those 116 ESTs are important for identification of coding regions in the sequences, helping in mapping of the schistosome genome, and identifying genes of immunological and pharmacological significance.

8.
9.
10.
Expressed sequence tags (ESTs) have been obtained from several hundred brain cDNAs as an initial effort to characterize expressed brain genes. These ESTs will become tools for human genome mapping and they will also provide candidate causative genes for inherited disorders affecting the central nervous system. We have developed a procedure for the rapid chromosomal assignment of these ESTs: cDNA sequences are first analyzed by a computer program to determine regions likely not to be interrupted by introns in the genomic DNA. A pair of oligonucleotide primers is then designed to amplify this region by the polymerase chain reaction using DNA template from human-rodent somatic cell hybrid chromosomal panels. The chromosomal assignment of the cDNA is determined by studying the segregation of the amplified products in these panels. In this paper we describe the mapping of 46 brain ESTs, as well as observations on the amplification of rodent sequences.

11.
MOTIVATION: Data accuracy is critical to large-scale gene discovery projects. The data should not only be trustworthy but should also be correctly annotated for the various features they contain. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. RESULTS: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html

12.
13.
MicroRNAs are small (20-22 nucleotides), non-coding regulatory RNAs whose pivotal role in gene expression has been associated with a number of diseases; prediction of miRNAs is therefore an essential yet challenging field. In this study, miRNAs of C. roseus are predicted along with their possible target genes. A total of 19,899 ESTs were downloaded from the dbEST database and processed and trimmed with SeqClean. Nine sequences were trashed and 31 sequences were trimmed by the program, and the resulting sequences were submitted to RepeatMasker and TGICL for clustering and assembly. The resulting contig database was then used to find putative miRNAs by performing a local BLAST against the miRNAs of B. rapa retrieved from miRBase. The targets were scanned by hybridizing the screened ESTs with human UTRs using the miRanda software. Finally, 7 putative miRNAs were found to hybridize with various targets in signal transduction and apoptosis that may play a significant role in preventing diseases such as leukemia, arthritis and Alzheimer's disease.

14.
A rapid PCR-based method for genetically mapping ESTs   Total citations: 12, self-citations: 0, citations by others: 12
A simple, semi-automatable procedure was developed for converting expressed sequence tags (ESTs) into mappable genetic markers. The polymerase chain reaction is used to amplify regions immediately 5′ or 3′ to the coding regions of genes in order to maximise sequence variability between alleles. Fragment length and nucleotide substitution polymorphisms among amplified alleles can be detected using either ethidium bromide staining or automated laser-based fluorescence. A 6% non-denaturing acrylamide gel, analysed with an ABI 377 DNA sequencer, proved capable of resolving homoduplexes and heteroduplexes formed between amplified alleles containing nucleotide substitutions as well as resolving allelic length differences. With this approach 75% of 60 ESTs from a range of Pinus species could be genetically mapped in each of three pedigrees from P. radiata and P. taeda. Furthermore, three or four alleles were detected in each pedigree for 42% of the EST markers. Received: 4 January 2000 / Accepted: 26 May 2000

15.
EST clustering error evaluation and correction   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: The gene expression intensity information conveyed by expressed sequence tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in the analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile.
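
To make the two error types concrete, the sketch below flags them on a toy data set where the true gene of each EST is known. This is only one simple way to operationalize the definitions given in the abstract: it reports genes whose ESTs are split across clusters (Type I) and clusters that mix ESTs from several genes (Type II). The paper's actual error-rate formulas and its ISO correction are not reproduced here, and all identifiers and data are hypothetical.

```python
from collections import defaultdict


def clustering_errors(gene_of, cluster_of):
    """Flag Type I and Type II clustering errors, given known gene labels.

    gene_of:    dict mapping EST id -> true gene id
    cluster_of: dict mapping EST id -> assigned cluster id
    Returns (split_genes, merged_clusters), both sorted lists.
    """
    clusters_per_gene = defaultdict(set)
    genes_per_cluster = defaultdict(set)
    for est, gene in gene_of.items():
        cluster = cluster_of[est]
        clusters_per_gene[gene].add(cluster)
        genes_per_cluster[cluster].add(gene)
    # Type I: ESTs from one gene ended up in more than one cluster.
    split_genes = sorted(g for g, cs in clusters_per_gene.items() if len(cs) > 1)
    # Type II: one cluster contains ESTs from more than one gene.
    merged_clusters = sorted(c for c, gs in genes_per_cluster.items() if len(gs) > 1)
    return split_genes, merged_clusters


# Toy labels, for illustration only.
gene_of = {"e1": "g1", "e2": "g1", "e3": "g2", "e4": "g3"}
cluster_of = {"e1": "c1", "e2": "c2", "e3": "c3", "e4": "c3"}
print(clustering_errors(gene_of, cluster_of))
```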

16.
17.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

18.
To obtain an initial overview of gene diversity and expression pattern in porcine thymus, 11,712 ESTs (expressed sequence tags) from 100-day-old porcine thymus (FTY) were sequenced and 7,071 cleaned ESTs were used for gene expression analysis. Clustering with the PHRAP program yielded 959 contigs and 3,074 singlets. A BLAST search showed that 806 contigs and 1,669 singlets (5,442 ESTs in total) had homologues in GenBank and 1,629 ESTs were novel. According to the Gene Ontology classification, 36.99% of the ESTs were assigned to the gene expression group, indicating that although functional thymus genes (18.78% in the defense group) are expressed to a certain degree, the 100-day-old porcine thymus is still at a developmental stage. Comparative analysis showed that the gene expression pattern of the 100-day-old porcine thymus is similar to that of the human infant thymus.

19.
20.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
