Similar literature (20 results)
1.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted renewed attention, thanks to the recent discovery of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment is itself a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when given only a limited number of unaligned RNA sequences, and that it outperforms current algorithms in sensitivity and accuracy.

2.
3.
Expressed sequence tags (ESTs) represent 500-1000-bp-long sequences corresponding to mRNAs derived from different sources (cell lines, tissues, etc.). The human EST database contains over 8,000,000 sequences, with over 4,000,000,000 total nucleotides. RNA molecules are transcribed from a genomic DNA template; therefore, all ESTs should match the corresponding genome. Nevertheless, we have found approximately 11,000 ESTs in the human EST database that do not match sequences in the human genome database. The presence of "trash" ESTs (TESTs) in the EST database could result from DNA or RNA contamination of laboratory equipment, tissues, or cell lines. TESTs could also represent sequences from unidentified human genes or from species inhabiting the human body. Here, we attempt to identify the sources of contamination in the human EST database. In particular, we discuss systematic contamination of the mammalian EST databases with plant sequences.

4.
5.
MOTIVATION: A complete set of expressed sequence tags (ESTs) from the Sf9 cell line of Spodoptera frugiperda is presented here for the first time. With it, we aim to identify both conserved and species-specific genes of this pest. We also expect the analysis to yield a class of protein sequences that can serve as a tool for exploring the genomic features and phylogeny of Lepidoptera. RESULTS: The ESTs include both housekeeping and developmentally regulated genes, and a high percentage of sequences of unknown function. Among the identified ORFs, almost all ribosomal proteins (RPs) were found with high EST redundancy and hence high sequence accuracy. Codon usage among RP genes is, on average, surprisingly much less biased in Lepidoptera than in other organisms. Other Spodoptera genes also displayed a low bias, suggesting a general feature of genome expression in this lepidopteran. We also found that the L35A and L36 RP sequences display insertions of 40 and 10 amino acids, respectively, both present only in insects. Sequence analysis suggests that these insertions are probably not subject to strong selective pressure and may be good phylogenetic markers for Lepidoptera. Most interestingly, the Lepidoptera sequences of 9 RP genes displayed a specific signature different from the canonical one. We conclude that the RP family allows valuable comparative genomics and phylogeny of Lepidoptera. AVAILABILITY: All EST sequence data are available from the private 'Spodo-Base' upon request.

7.
With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited in public databases at a brisk rate. Open access to the large amount of expressed sequence tag (EST) data in public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice because of their high reproducibility, abundant polymorphism, and high inter-specific transferability. Mining SSRs from ESTs requires several high-throughput computational tools that must be run individually, which is computationally intensive and time-consuming. To reduce the time lag and streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, "EST-SSR-MARKER PIPELINE (ESMP)". This pipeline integrates EST pre-processing, clustering, and assembly, followed by mining of SSRs from the assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding. AVAILABILITY: The database is available for free at http://bioinfo.aau.ac.in/ESMP.
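The SSR-mining step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not ESMP's implementation: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, and it detects only perfect (uninterrupted) repeats in a single sequence.

```python
import re

# Hypothetical thresholds (assumed, not taken from ESMP):
# motif length -> minimum number of tandem copies to call an SSR.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases, then at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so a run is reported only under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    est = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
    for start, motif, copies in find_ssrs(est):
        print(start, motif, copies)
```

In a pipeline of the kind described, a function like this would run on the assembled contigs and singletons rather than on raw ESTs, so that redundant reads of the same transcript do not inflate the SSR counts.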

8.
The presence of at least ten mouse LDH-A pseudogenes was demonstrated by genomic blot analysis, and four different processed pseudogenes have thus far been isolated and characterized. In this report, the nucleotide sequences of two different mouse lactate dehydrogenase-A processed pseudogenes, M11 and M14, were determined and compared with the protein-coding sequences of the mouse and rat LDH-A functional genes. In pseudogene M11, the sequence of 64 nucleotides from codon no. 257 to 278 was tandemly duplicated. In pseudogene M14, the sequence of 22 nucleotides from codon no. 68 to 75 was replaced by an inserted repetitive sequence of 242 nucleotides homologous to a mouse truncated R element. The pattern of nucleotide substitutions accumulated in mouse LDH-A pseudogenes M11 and M14, as well as in the previously identified pseudogene M10, was analyzed, and the substitution frequencies of the C or G at the CG dinucleotide were found to be high.

9.
The pineal gland is the circadian oscillator in the chicken, regulating diverse functions ranging from egg laying to feeding. Here, we describe the isolation and characterization of expressed sequence tags (ESTs) from a chicken pineal gland cDNA library. A total of 192 unique sequences were analysed and submitted to GenBank; 6% of the ESTs matched neither GenBank cDNA sequences nor the newly assembled chicken genomic DNA sequence. Three ESTs aligned with sequences assigned to Z_random, while one matched a W chromosome sequence and could be useful in cataloguing functionally important genes on this sex chromosome. Additionally, single nucleotide polymorphisms (SNPs) were identified and validated in 10 ESTs that showed 98% or higher sequence similarity to known chicken genes. The resources described here may be useful in comparative and functional genomic analysis of genes expressed in an important organ, the pineal gland, in a model and agriculturally important organism.

10.
The fibroin gene expression pattern and regulation of the posterior silk gland were studied by means of expressed sequence tags (ESTs) using larvae on the first and fifth days of the fifth instar of the silkworm, Bombyx mori L. (strain: C 108). The results showed 911 repetitive ESTs and 1950 single sequences (singlets) among a total of 2861 assembled consensus sequences. Of these, 1335 sequences were identified and the other 1526 were unknown. 5560 sequences (55.89%) in the posterior silk gland cells of the silkworm were new ESTs without homology to the EST data published by Mita et al. The number of repetitive ESTs and single sequences from first-day larvae of the fifth instar was more than double that from fifth-day larvae of the same instar. Unigenes with a repetitive EST size (contig size) of more than 50 made up only about 0.5% of the total consensus sequences. Gene expression frequencies differed significantly, and the expressed genes were related to fibroin synthesis, secretion, and composition. Comparing the fifth day with the first day of the fifth instar, expression of the fibroin heavy-chain gene was 18-fold higher, that of the fibroin light-chain gene 9-fold higher, and that of the fibroin P25 gene 8-fold higher. Functional annotation assigned 508 genes to cellular components and 315 to enzymes. These results imply that gene expression on the first day served mainly to prepare for fibroin synthesis, apart from the growth of silk gland cells, whereas gene expression on the fifth day of the fifth instar served mainly to synthesize and secrete fibroin. Because the ratio of heavy chain, light chain, and P25 of fibroin was not the theoretically expected 6:6:1, or because of the special structure of the H chain, the H-chain gene was not easy to detect by the EST technique. Most genes among the 2861 consensus sequences functioned in fibroin synthesis and secretion. This suggests that fibroin synthesis and secretion in the posterior silk gland are more complex than currently understood.

12.
Computational analysis of alternative splicing using EST tissue information   Total citations: 2, self-citations: 0, citations by others: 2
Expressed sequence tags (ESTs) from normal and tumor tissues have been deposited in public databases. These ESTs and all mRNA sequences were aligned with the human genome sequence using LEADS, Compugen's alternative splicing modeling platform. We developed a novel computational approach to analyze tissue information of aligned ESTs in order to identify cancer-specific alternative splicing and gene segments highly expressed in particular cancers. Several genes, including one encoding a possible pre-mRNA splicing factor, displayed cancer-specific alternative splicing. In addition, multiple candidate gene segments highly expressed in colon cancers were identified.

13.
L Gieser, A Swaroop. Genomics, 1992, 13(3): 873-876
Expressed sequence tags (ESTs) provide useful molecular landmarks for physical mapping and identify the position of an expressed region in the genome. The use of subtracted cDNA libraries enriched for tissue-specific genes as a source of ESTs should reduce the repetitive isolation of constitutively expressed sequences. We report here the sequence tags from the 3'-end region of 58 new directionally cloned cDNAs from a subtracted human retinal pigment epithelium (RPE) cell line library. Eight of the cDNAs have been assigned to human chromosomes using PCR-based EST assays. Chromosomal mapping of subtracted RPE cDNA clones may also help in identifying candidate genes for inherited eye diseases.

14.
A PCR primer sequence is called degenerate if some of its positions have several possible bases. The degeneracy of the primer is the number of unique sequence combinations it contains. We study the problem of designing a pair of primers with prescribed degeneracy that match a maximum number of given input sequences. Such problems occur when studying a family of genes that is known only in part, or is known in a related species. We prove that various simplified versions of the problem are hard, show the polynomiality of some restricted cases, and develop approximation algorithms for one variant. Based on these algorithms, we implemented a program called HYDEN for designing highly degenerate primers for a set of genomic sequences. We report on the success of the program in an experimental scheme for identifying all human olfactory receptor (OR) genes. In that project, HYDEN was used to design primers with degeneracies up to 10^10 that amplified with high specificity many novel genes of that family, tripling the number of OR genes known at the time.
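The definition of degeneracy given in this abstract can be made concrete with a short Python sketch. It assumes primers are written in standard IUPAC nucleotide codes, which the abstract does not state; the function names `degeneracy` and `matches` are illustrative and are not part of HYDEN. The sketch covers only the definition and an exact-match check, not the primer design algorithms themselves.

```python
from math import prod

# Standard IUPAC nucleotide codes: each symbol maps to the bases it allows.
# (Assumption: primers are written in this notation.)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct plain sequences the degenerate primer represents."""
    # Each position contributes independently, so the counts multiply.
    return prod(len(IUPAC[symbol]) for symbol in primer.upper())


def matches(primer, target):
    """True if target (same length, plain bases) is one of those sequences."""
    return len(primer) == len(target) and all(
        base in IUPAC[symbol]
        for symbol, base in zip(primer.upper(), target.upper())
    )


if __name__ == "__main__":
    print(degeneracy("ARYN"))       # 1 * 2 * 2 * 4 = 16
    print(matches("ARYN", "AGCT"))  # True
    print(matches("ARYN", "ACCT"))  # False: C is not allowed by R
```

Because the count is a product over positions, degeneracy grows quickly with the number of ambiguous positions, which is why a prescribed degeneracy acts as a real constraint on how many input sequences one primer can match.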

15.
16.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

17.
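Construction of an SSH library and analysis of expressed sequence tags in Stylosanthes under aluminum stress   Total citations: 1, self-citations: 0, citations by others: 1
The Stylosanthes cultivar Reyan 2 (Stylosanthes guianensis 'Reyan 2') is highly tolerant of aluminum toxicity. To identify genes induced under aluminum stress, suppression subtractive hybridization (SSH) was used to construct a forward cDNA library under 300 μmol·L⁻¹ aluminum stress. Six hundred clones with inserts longer than 300 bp were selected for sequencing, yielding 504 expressed sequence tags (ESTs). Redundancy analysis showed that 12.1% of the ESTs occurred only once and 61.4% occurred 2-16 times; the most frequently recurring ESTs were cytochrome P450 (53 times, 10.5%), a pathogen-inducible trypsin inhibitor (44 times, 8.7%), and a senescence-associated protein (37 times, 7.3%). BLASTX analysis showed that the 504 ESTs represent 97 non-redundant genes, comprising 46 genes of known function and 51 sequences of unknown function. Of the 46 ESTs of known function, 30 are genes previously reported to be associated with aluminum stress and 16 are newly identified aluminum stress-related genes. The information provided by the SSH cDNA library offers important clues for elucidating the molecular mechanism of aluminum tolerance in Stylosanthes.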

18.
The rat K and T kininogen genes show different modes of mRNA production. The K gene encodes two distinct mRNAs for high molecular weight (HMW) and low molecular weight (LMW) kininogens. These two mRNAs are generated by differential usage of the 3'-terminal exon (LMW exon) and the exon next to and upstream from the LMW exon (HMW exon) through alternative splicing and polyadenylation. In contrast, the T gene generates one mRNA by using selectively the LMW exon, although the T gene is extremely homologous to the K gene. In this study, we constructed a series of chimeric kininogen genes by not only exchanging equivalent restriction fragments of the two genes but also replacing nucleotides that differ between the two genes. We then examined the sequences and the mechanisms governing the different expression patterns of the two genes by transfecting the chimeric genes into heterologous COS cells. The results indicated that the different expression patterns of the K and T genes are governed by two separate internal sequences of the HMW and LMW exons. The internal HMW sequence contains a set of five repetitive sequences, and these repetitive sequences are highly complementary to the 5' portion of U1 snRNA. Furthermore, the nucleotide differences in the U1 snRNA-complementary sequences between the K and T genes have marked effects on the relative formation of the HMW and LMW mRNAs; this indicates that the repetitive sequences complementary to U1 snRNA play a crucial role in determining the relative expression of the two mRNAs. Based on these findings, we discuss a novel mechanism for alternative RNA processing, in which splicing efficiency is controlled by the interaction of U1 small nuclear ribonucleoproteins and the U1 snRNA-complementary repetitive sequences of the kininogen pre-mRNA.

19.
Using a strategy requiring only modest computational resources, wheat expressed sequence tag (EST) sequences from various sources were assembled into contigs and compared with a nonredundant barley sequence assembly, with ESTs, with complete draft genome sequences of rice and Arabidopsis thaliana, and with ESTs from other plant species. These comparisons indicate that (i) wheat sequences available from public sources represent a substantial proportion of the diversity of wheat coding sequences, (ii) prediction of open reading frames in the whole genome sequence improves when supplemented with EST information from other species, (iii) a substantial number of candidates for novel genes that are unique to wheat or related species can be identified, and (iv) a smaller number of genes can be identified that are common to monocots and dicots but absent from Arabidopsis. The sequences in the last group may have been lost from Arabidopsis after descent from a common ancestor. Examples of potential novel wheat genes and Triticeae-specific genes are presented.

20.
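Gentiana officinalis is an important alpine plant used in Tibetan medicine. In this study, a cDNA library was constructed for the species at the flowering stage. Testing showed that the library reached the level of a medium-quality cDNA library, with a titer of 1.2×10^7 pfu/ml, a recombination rate of 95.9%, and an average insert length greater than 500 bp. Partial sequencing of 343 randomly selected recombinant clones yielded 181 valid sequences after editing. Bioinformatic analysis showed that the 181 expressed sequence tags (ESTs) represent 144 unique clone sequences, of which 55 are homologous to identified genes, 35 match uncharacterized ESTs, and 54 have no detectable homolog; for the latter two groups, 89 EST sequences in total, no functionally similar protein was found. Functional analysis of the identified ESTs showed that the corresponding genes mainly encode proteins in the following categories: protein expression, 35%; photosynthesis, 22%; metabolism, 18%; resistance, 11%; plasma membrane transport and cell division, 5% each; chromosomal changes and cell signal transduction, 2% each. Primers were designed from the valid EST sequences, and RT-PCR further verified the accuracy of the ESTs obtained. These results provide a foundation for future studies of functional genes in Gentiana officinalis and of the population genetics and evolutionary biology of this species and its relatives.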
