Similar articles
20 similar articles found.
1.
Huntley MA, Golding GB. Genetics 2004, 166(3): 1141-1154
Proteins associated with disease and development of the nervous system are thought to contain repetitive, simple sequences. However, genome-wide surveys for simple sequences within proteins have revealed that repetitive peptide sequences are the most frequent shared peptide segments among eukaryotic proteins, including those of Saccharomyces cerevisiae, which has few to no specialized developmental and neurological proteins. It is therefore of interest to determine if these specialized proteins have an excess of simple sequences when compared to other sets of compositionally similar proteins. We have determined the relative abundance of simple sequences within neurological proteins and find no excess of repetitive simple sequence within this class. In fact, polyglutamine repeats that are associated with many neurodegenerative diseases are no more abundant within neurological specialized proteins than within nonneurological collections of proteins. We also examined the codon composition of serine homopolymers to determine what forces may play a role in the evolution of extended homopolymers. Codon type homogeneity tends to be favored, suggesting replicative slippage instead of selection as the main force responsible for producing these homopolymers.
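
The codon-composition analysis of serine homopolymers mentioned above can be illustrated with a minimal Python sketch. It is not the authors' method: the function names, the minimum run length of 5 codons, and the grouping of serine codons into two "types" (TCN and AGY) are assumptions made here for illustration only.

```python
from collections import Counter

# The six standard serine codons.
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def serine_runs(cds, min_len=5):
    """Yield the codons of each in-frame run of at least min_len serine codons.

    min_len=5 is an assumed threshold, not a value taken from the abstract.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    run = []
    for codon in codons + [None]:  # the None sentinel flushes the final run
        if codon in SERINE_CODONS:
            run.append(codon)
        else:
            if len(run) >= min_len:
                yield run
            run = []


def codon_type(codon):
    """Assumed grouping: TCN codons versus AGY codons."""
    return "TCN" if codon.startswith("TC") else "AGY"


def type_homogeneity(run):
    """Fraction of codons in the run that belong to the majority codon type."""
    counts = Counter(codon_type(codon) for codon in run)
    return max(counts.values()) / len(run)
```

A run built entirely from one codon type scores 1.0; a mixed run scores lower. Under a slippage model one would expect scores close to 1.0, because slippage copies existing codons rather than introducing synonymous alternatives.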

2.
3.
Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway that are leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited because reliable molecular markers are not available. A large volume of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs showed the highest frequency (60.44% in contigs and 62.16% in singletons), whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif coded for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. The SSR-containing sequences were functionally categorized using Gene Ontology terms: biological process, cellular component and molecular function.
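
To make the SSR mining step concrete, here is a minimal Python sketch that scans a sequence for mono- to hexanucleotide repeats and reports the share of each motif class. The abstract does not name the tool or the thresholds used, so the regular-expression approach, the function names, and the minimum repeat counts in `MIN_REPEATS` are all assumptions.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (1 = mono ... 6 = hexa).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured motif followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def class_distribution(hits):
    """Percentage of SSRs in each motif-length class."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {length: 100.0 * count / total
            for length, count in sorted(counts.items())}
```

Running `find_ssrs` over every contig and singleton and passing the pooled hits to `class_distribution` would give a frequency table of the kind summarized in the abstract. The resulting percentages depend strongly on the chosen thresholds.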

4.
Construction and application of a PC/Linux-based system for in silico elongation of nucleic acid sequences   Total citations: 5 (self-citations: 0; citations by others: 5)
Obtaining the full-length cDNA sequence of a novel gene is often a difficult problem for molecular biologists. The Human Genome Project and related projects have generated a large number of expressed sequence tags (ESTs), and powerful systems are needed to mine these data. With suitable bioinformatic algorithms, EST sequences can often be used to elongate fragments of novel genes. Using the Linux operating system, the Blast and Phrap software and an EST database, we built a system for in silico elongation of EST sequences on a personal computer. We applied it to 11386 EST sequences and 511 full-insert cDNA sequences from human fetal liver. Of these, 8373 EST sequences and 389 cDNA sequences were elongated to varying degrees, and some of the results were confirmed by RACE experiments. The system elongates EST sequences efficiently and at scale, and can provide important clues for obtaining the full-length cDNA sequences of novel genes experimentally.
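
The idea of in silico elongation can be sketched in a few lines of Python. This toy version is only a stand-in for the Blast and Phrap steps described above: it assumes exact suffix-prefix overlaps, extends the seed in the 3′ direction only, and uses hypothetical function names and an assumed default overlap of 20 bases.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def extend_3prime(seed, ests, min_overlap=20):
    """Greedily extend seed with ESTs that overlap its 3' end exactly."""
    contig = seed
    remaining = list(ests)
    extended = True
    while extended:
        extended = False
        for est in remaining:
            k = overlap(contig, est, min_overlap)
            if k and k < len(est):        # the EST adds new sequence
                contig += est[k:]
                remaining.remove(est)
                extended = True
                break
    return contig
```

A real system would tolerate sequencing errors, search both strands and both ends, and build a consensus from all overlapping reads, which is what the similarity search and assembly programs provide.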

5.
With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited in public databases at a brisk rate. Open access to a large amount of expressed sequence tag (EST) data in the public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice for their high reproducibility, abundant polymorphism and high inter-specific transferability. Mining SSRs from ESTs requires several high-throughput computational tools that must be executed individually, which is computationally intensive and time consuming. To reduce the time lag and to streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, "EST-SSR-MARKER PIPELINE (ESMP)". This pipeline integrates EST pre-processing, clustering and assembly, followed by mining of SSRs from the assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding. AVAILABILITY: The database is available for free at http://bioinfo.aau.ac.in/ESMP.

6.
7.

Background

Neurocysticercosis is a disease caused by the oral ingestion of eggs from the human parasitic worm Taenia solium. Although drugs are available they are controversial because of the side effects and poor efficiency. An expressed sequence tag (EST) library is a method used to describe the gene expression profile and sequence of mRNA from a specific organism and stage. Such information can be used in order to find new targets for the development of drugs and to get a better understanding of the parasite biology.

Methods and Findings

Here an EST library consisting of 5760 sequences from the pig cysticerca stage has been constructed. In the library 1650 unique sequences were found and of these, 845 sequences (52%) were novel to T. solium and not identified within other EST libraries. Furthermore, 918 sequences (55%) were of unknown function. Amongst the 25 most frequently expressed sequences 6 had no relevant similarity to other sequences found in the Genbank NR DNA database. A prediction of putative signal peptides was also performed and 4 among the 25 were found to be predicted with a signal peptide. Proposed vaccine and diagnostic targets T24, Tsol18/HP6 and Tso31d could also be identified among the 25 most frequently expressed.

Conclusions

An EST library has been produced from pig cysticerca and analyzed. More than half of the different ESTs sequenced contained a sequence with no suggested function and 845 novel EST sequences have been identified. The library increases the knowledge about what genes are expressed and to what level. It can also be used to study different areas of research such as drug and diagnostic development together with parasite fitness via e.g. immune modulation.

8.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user-defined parameters/options let researchers use different minimum motif repeat search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the un-transcribed repeats found in genomes.

9.
10.
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
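
As a rough illustration of how a hypothetical protein sequence might be derived from an EST, the Python sketch below translates a nucleotide sequence in all six reading frames and returns the longest stretch without a stop codon. The abstract does not describe how trEST makes its predictions, so this naive six-frame approach and every name in the code are assumptions; dedicated coding-region predictors also handle frameshifts and sequencing errors, which this sketch does not.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate seq from its first base; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Yield the translation of each of the six reading frames."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse):
        for offset in range(3):
            yield translate(strand[offset:])


def longest_peptide(seq):
    """Longest stop-free stretch found in any of the six frames."""
    segments = (segment
                for frame in six_frames(seq)
                for segment in frame.split("*"))
    return max(segments, key=len)
```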

11.
Sim KL, Creamer TP. Proteins 2004, 54(4): 629-638
Protein simple sequences, a subset of low-complexity sequences, are regions of sequence highly enriched in one or a few residue types. Simple sequences are exceedingly common, the average being more than one per protein sequence. Despite being so common, such sequences are not well-studied. The simple sequences that have been subjected to detailed study are often found to possess important functions. Here we present a survey of protein simple sequences, generally enriched in a single residue type, with the aim of studying their conservation. We find that the majority of such simple sequences are not conserved. However, conserved protein simple sequences are relatively common, with approximately 11% of the surveyed protein families possessing a conserved simple sequence. The data obtained in this study support the idea that simple sequences are conserved for functional reasons. Such functions can range from substrate binding, to mediating protein-protein interactions, to structural integrity. A perhaps surprising finding is that the residue enriching a conserved simple sequence is itself not necessarily conserved. Neither is the length of many of the highly conserved simple sequences. In the few cases where structural and functional data is available it is found that the conserved simple sequences are consistent with both local structure and function. The data presented support the idea that protein simple sequences can be conserved and have important roles in protein structure and function.
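
The definition above, a region highly enriched in a single residue type, can be made concrete with a small sliding-window sketch in Python. The abstract does not give the detection criteria used in the survey, so the window size of 10, the 70% threshold, and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def simple_regions(protein, window=10, min_fraction=0.7):
    """Return merged (start, end, residue) spans enriched in one residue.

    A window qualifies when its most common residue makes up at least
    min_fraction of it. Overlapping qualifying windows for the same residue
    are merged, so a span may include a few flanking residues. Coordinates
    are 0-based and half-open.
    """
    spans = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        residue, count = Counter(segment).most_common(1)[0]
        if count / window >= min_fraction:
            end = start + window
            if spans and spans[-1][2] == residue and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end, residue)   # merge with previous
            else:
                spans.append((start, end, residue))
    return spans
```

Comparing such spans across aligned members of a protein family is one way conservation of position, length and enriching residue could then be assessed.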

12.
13.
Yang HL, Cho EY, Han KH, Kim H, Kim SJ. Gene 2007, 395(1-2): 144-150
Using in silico approaches, we cloned a novel mouse gene (mbu-1) that was strictly expressed in the central nervous system. mbu-1 was first identified as an EST after carrying out digital differential display for unigene libraries from various mouse tissues. The full-length cDNA sequence was obtained by extending the ends of the EST by RACE. The cDNA sequence was 2611 bp long and contained an ORF of 597 AA. A positive cis-acting region was found in the neuroblastoma x glioma hybrid cell line NG108-15 and in the human embryonic kidney cell line HEK293. RT-PCR and in situ hybridization analysis showed that the mbu-1 gene was only expressed in the brain and spinal cord during the embryonic stages, and throughout all regions of the adult brain, showing higher levels in the hippocampus and hypothalamus.

14.
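Bioinformatic analysis of EST resources in oil crops   Total citations: 1 (self-citations: 0; citations by others: 1)
Using bioinformatic methods, we collected the expressed sequence tag (EST) sequence information deposited in GenBank up to May 2008 for eight oil crops: rapeseed, peanut, sesame, soybean, sunflower, castor bean, flax and oil palm. A total of 1,185,911 EST sequences were obtained. Comprehensive and per-category analyses were carried out on the Linux operating system using Crosmatch, RepeatMasker, Phrap, CAP3, EMBOSS, Blast, EST-pipeline, ORF finder, Interproscan, blast2go and IdentiCS. In total, 289,892 UniEST sequences were obtained. From the gene annotations of these EST sequences, genes related to lipid metabolism were screened, and on this basis a comparative diagram of lipid metabolic pathways in oil crops was constructed. This study lays the foundation for building a database of lipid metabolism genes in oil crops and for comparing the similarities and differences in lipid metabolism among oil crops.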

15.
The central and peripheral nervous systems (CNS and PNS) of the ascidian tadpole larva are comparatively simple, consisting of only about 350 cells. However, studies of the expression of neural patterning genes have demonstrated overall similarity between the ascidian CNS and the vertebrate CNS, suggesting that the ascidian CNS is sufficiently complex to be relevant to those of vertebrates. Recent progress in the Ciona intestinalis genome project and cDNA project together with considerable EST information has made Ciona an ideal model for investigating molecular mechanisms underlying the formation and function of the chordate nervous system. Here, we characterized 56 genes specific to the nervous system by determining their full-length cDNA sequences and confirming their spatial expression patterns. These genes included those that function in the nervous systems of other animals, especially those involved in photoreceptor-mediated signaling and neurotransmitter release. Thus, the nervous system-specific genes in Ciona larvae will provide not only probes for determining their function but also clues for exploring the complex network of nervous system-specific genes.

16.
The analysis of expressed sequences from a diverse set of plant species has fueled the increase in understanding of the complex molecular mechanisms underlying plant growth regulation. While representative data sets can be found for the major branches of plant evolution, fern species data are lacking. To further the availability of genetic information in pteridophytes, a normalized cDNA library of Adiantum capillus-veneris was constructed from prothallia grown under white light. A total of 10,420 expressed sequence tags (ESTs) were obtained and clustering of these sequences resulted in 7,100 nonredundant clusters. Of these, 1,608 EST clusters were found to be similar to sequences of known function and 1,092 EST clusters showed similarity to sequences of unknown function. Given the usefulness of Adiantum for developmental studies, the sequence data represented in this report stand to make a significant contribution to the understanding of plant growth regulation, particularly for pteridophytes.

17.
We determined 36 310 bovine expressed sequence tag (EST) sequences using 10 different cDNA libraries. For massive EST sequencing, we devised a new system with two major features. First, we constructed cDNA libraries in which the poly(A) tails were removed using nested deletion at the 3′-ends. This permitted high quality reading of sequences from the 3′-end of the cDNA, which is otherwise difficult to do. Second, we increased throughput by sequencing directly on templates generated by colony PCR. Using this system, we determined 600 cDNA sequences per day. The read-out length was >450 bases in >90% of the sequences. Furthermore, we established a data management system for analyses, storage and manipulation of the sequence data. Finally, 16 358 non-redundant ESTs were derived from ~6900 independent genes. These data will facilitate construction of a precise comparative map across mammalian species and isolate the functional genes that govern economic traits. This system is applicable to other organisms, including livestock, for which EST data are limited.

18.
A molecular understanding of porcine reproduction is of biological interest and economic importance. Our Midwest Consortium has produced cDNA libraries containing the majority of genes expressed in major female reproductive tissues, and we have deposited into public databases 21,499 expressed sequence tag (EST) gene sequences from the 3′ end of clones from these libraries. These sequences represent 10,574 different genes, based on sequence comparison among these data, and comparison with existing porcine ESTs and genes indicates that as many as 4652 of these EST clusters are novel. In silico analysis identified sequences that are expressed in specific pig tissues or organs and confirmed the broad expression in pig for many genes ubiquitously expressed in human tissues. Furthermore, we have developed computer software to identify sequence similarity of these pig genes with their human counterparts, and to extract the mapping information of these human homologues from genome databases. We demonstrate the utility of this software for comparative mapping by localizing 61 genes on the porcine physical map for Chromosomes (Chrs) 5, 10, and 14. The following Accession numbers were assigned to our deposited sequences: BF701840 – BF704551, BF708383, BF708386 – BF713604, BG322266 – BG322271, BI398567 – BI405235, BQ597354 – BQ605166.

19.
20.