1.

The alpha/beta fold family of proteins database and the cholinesterase gene server ESTHER. Total citations: 2; self-citations: 1; citations by others: 1

X Cousin T Hotelier K Giles P Lievin J P Toutant A Chatonnet 《Nucleic acids research》1997,25(1):143-146

ESTHER (for esterases, alpha/beta hydrolase enzyme and relatives) is a database of sequences phylogenetically related to cholinesterases. These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) sharing a similar structure of a central beta-sheet surrounded by alpha-helices. Among these proteins a wide range of functions can be found (hydrolases, adhesion molecules, hormone precursors). The purpose of ESTHER is to help comparison of structures and functions of members of the family. Since the last release, new features have been added to the server. A BLAST comparison tool allows sequence homology searches within the database sequences. New sections are available: kinetics and inhibitors of cholinesterases, fasciculin-acetylcholinesterase interaction and a gene structure review. The mutation analysis compilation has been improved with three-dimensional images. A mailing list has been created.

2.

Gene expression versus sequence for predicting function: Glia Maturation Factor gamma is not a glia maturation factor

Walker MG 《Genomics, Proteomics & Bioinformatics (English edition)》2003,1(1):52-57

It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (co-expression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm functions incorrectly predicted. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. Highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expre

3.

Genomics of phytopathogenic fungi and the development of bioinformatic resources

Soanes DM Skinner W Keon J Hargreaves J Talbot NJ 《Molecular plant-microbe interactions : MPMI》2002,15(5):421-427

Genomic resources available to researchers studying phytopathogenic fungi are limited. Here, we briefly review the genomic and bioinformatic resources available and the current status of fungal genomics. We also describe a relational database containing sequences of expressed sequence tags (ESTs) from three phytopathogenic fungi, Blumeria graminis, Magnaporthe grisea, and Mycosphaerella graminicola, and the methods and underlying principles required for its construction. The database contains significant annotation for each EST sequence and is accessible at http://cogeme.ex.ac.uk. An easy-to-use interface allows the user to identify gene sequences by using simple text queries or homology searches. New querying functions and large sequence sets from a variety of phytopathogenic species will be incorporated in due course.

4.

Clustering of Schistosoma mansoni mRNA sequences and analysis of the most transcribed genes: implications in metabolism and biology of different developmental stages

Prosdocimi F Faria-Campos AC Peixoto FC Pena SD Ortega JM Franco GR 《Memórias do Instituto Oswaldo Cruz》2002,97(Z1):61-69

The study of the genome of Schistosoma mansoni, one of the etiologic agents of human schistosomiasis, is essential for a better understanding of the biology and development of this parasite. In order to get an overview of all S. mansoni catalogued gene sequences, we performed a clustering analysis of the parasite mRNA sequences available in public databases. This was done using the software packages PHRAP and CAP3. The consensus sequences, generated after the alignment of cluster constituent sequences, allowed the identification by database homology searches of the most expressed genes in the worm. We analyzed these genes and looked for a correlation between their high expression and parasite metabolism and biology. We observed that the majority of these genes are related to the maintenance of basic cell functions, encoding products related to the cytoskeleton, intracellular transport and energy metabolism. Evidence is presented here that genes for aerobic energy metabolism are expressed in all the developmental stages analyzed. Some of the most expressed genes could not be identified by homology searches and may have some specific functions in the parasite.

5.

The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification. Total citations: 3; self-citations: 0; citations by others: 3

S D Rounsley A Glodek G Sutton M D Adams C R Somerville J C Venter A R Kerlavage 《Plant physiology》1996,112(3):1177-1183

The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.

6.

Simultaneous mutation detection of three homoeologous genes in wheat by High Resolution Melting analysis and Mutation Surveyor®

Chongmei Dong Kate Vincent Peter Sharp 《BMC plant biology》2009,9(1):143

Background

TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful tool for reverse genetics, combining traditional chemical mutagenesis with high-throughput PCR-based mutation detection to discover induced mutations that alter protein function. The most popular mutation detection method for TILLING is a mismatch cleavage assay using the endonuclease CelI. For this method, locus-specific PCR is essential. Most wheat genes are present as three similar sequences with high homology in exons and low homology in introns. Locus-specific primers can usually be designed in introns. However, it is sometimes difficult to design locus-specific PCR primers in a conserved region with high homology among the three homoeologous genes, or in a gene lacking introns, or if information on introns is not available. Here we describe a mutation detection method which combines High Resolution Melting (HRM) analysis of mixed PCR amplicons containing three homoeologous gene fragments and sequence analysis using Mutation Surveyor® software, aimed at simultaneous detection of mutations in three homoeologous genes.

7.

Ro-associated Y RNAs in metazoans: evolution and diversification. Total citations: 2; self-citations: 0; citations by others: 2

Perreault J Perreault JP Boire G 《Molecular biology and evolution》2007,24(8):1678-1689

The Y genes encode small noncoding RNAs whose functions remain elusive, whose numbers vary between species, and whose major property is to be bound by the Ro60 protein (or its ortholog in other species). To better understand the evolution of the Y gene family, we performed a homology search in 27 different genomes along with a structural search using Y RNA specific motifs. These searches confirmed that Y RNAs are well conserved in the animal kingdom and resulted in the detection of several new Y RNA genes, including the first Y RNAs in insects and a second Y RNA detected in Caenorhabditis elegans. Unexpectedly, Y5 genes were retrieved almost as frequently as Y1 and Y3 genes and, consequently, are not the result of a relatively recent appearance as is generally believed. Investigation of the organization of the Y genes demonstrated that the synteny was conserved among species. Interestingly, it revealed the presence of six putative "fossil" Y genes, all of which were Y4 and Y5 related. Sequence analysis led to inference of the ancestral sequences for all Y RNAs. In addition, the evolution of existing Y RNAs was deduced for many families, orders and classes. Moreover, a consensus sequence and secondary structure for each Y species was determined. Further evolutionary insight was obtained from the analysis of several thousand Y retropseudogenes among various species. Taken together, these results confirm the rich and diversified evolutionary history of Y RNAs.

8.

nWayComp: a genome-wide sequence comparison tool for multiple strains/species of phylogenetically related microorganisms

Yao J Lin H Doddapaneni H Civerolo EL 《In silico biology》2007,7(2):195-200

The increasing number of whole genomic sequences of microorganisms has made genome-wide annotation and gene sequence comparison among multiple microorganisms more complex. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELS is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.

9.

Structural analysis of a Lotus japonicus genome. V. Sequence features and mapping of sixty-four TAC clones which cover the 6.4 Mb regions of the genome.

Tomohiko Kato Shusei Sato Yasukazu Nakamura Takakazu Kaneko Erika Asamizu Satoshi Tabata 《DNA research》2003,10(6):277-285

We determined the nucleotide sequences of 64 TAC (transformation-competent artificial chromosome) clones selected from genomic libraries of Lotus japonicus accession Miyakojima MG-20 based on the sequence information of expressed sequence tags (ESTs), cDNAs, genes and DNA markers from L. japonicus and other legumes. The length of the DNA regions sequenced in this study was 6,370,255 bp, and the total length of the L. japonicus genome sequenced so far is 32,537,698 bp together with the nucleotide sequences of 256 TAC clones previously reported. Five hundred forty-eight potential protein-encoding genes with known or predicted functions, 127 gene segments and 224 pseudogenes were assigned to the newly sequenced regions by computer prediction and similarity searches against the sequences in protein and EST databases. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was genetically localized onto the linkage map of two accessions of L. japonicus, MG-20 and Gifu B-129. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

10.

Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases

Wallqvist A Fukunishi Y Murphy LR Fadel A Levy RM 《Bioinformatics (Oxford, England)》2000,16(11):988-1002

MOTIVATION: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was used which incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently contributions from secondary structure are phased in and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate.

RESULTS: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in an 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query. However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data are pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods.

AVAILABILITY: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas

CONTACT: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu

Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.
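
The linear combination of amino acid and secondary structure scores inside a dynamic programming alignment can be sketched roughly as follows. This is a minimal illustration only: the function name, the identity-based +1/-1 scoring, the linear gap penalty and the `weight` parameter are all assumptions made for the example, whereas the abstract describes a secondary structure similarity matrix derived from three-dimensionally aligned proteins and does not give the scoring details.

```python
def combined_alignment_score(aa_a: str, ss_a: str, aa_b: str, ss_b: str,
                             weight: float, gap: float = -2.0) -> float:
    """Global alignment score using a weighted mix of two similarity scores.

    aa_* are amino acid strings, ss_* are per-residue secondary structure
    strings (e.g. H/E/C) of the same length. `weight` is the fraction of the
    score taken from secondary structure (0.0 = sequence only).
    Toy assumption: both similarity scores are +1 for identity, -1 otherwise.
    """
    if len(aa_a) != len(ss_a) or len(aa_b) != len(ss_b):
        raise ValueError("each sequence needs one structure symbol per residue")
    n, m = len(aa_a), len(aa_b)
    # prev[j] holds the best score for aligning the first i-1 residues of A
    # with the first j residues of B (row-by-row Needleman-Wunsch).
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            aa_score = 1.0 if aa_a[i - 1] == aa_b[j - 1] else -1.0
            ss_score = 1.0 if ss_a[i - 1] == ss_b[j - 1] else -1.0
            pair = (1.0 - weight) * aa_score + weight * ss_score
            curr[j] = max(prev[j - 1] + pair,   # align residue i with j
                          prev[j] + gap,        # gap in B
                          curr[j - 1] + gap)    # gap in A
        prev = curr
    return prev[m]


if __name__ == "__main__":
    # Phase in the secondary structure contribution step by step.
    for w in (0.0, 0.5, 1.0):
        print(w, combined_alignment_score("ACD", "HHE", "AGD", "HHE", w))
```

Calling the function repeatedly with increasing `weight` mimics, in a very simplified way, the idea of starting from primary sequence alone and then phasing in secondary structure; calibrating scores against an error rate, as the abstract describes, is not attempted here.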

11.

Identification and molecular characterization of myosin gene family in Oryza sativa genome

Jiang S Ramachandran S 《Plant & cell physiology》2004,45(5):590-599

Myosins play an important role in various developmental processes in plants. We have identified 14 myosin genes in the rice (Oryza sativa cv. Nipponbare) genome using sequence information available in public databases. Phylogenetic analysis of these sequences with other plant and non-plant myosins revealed that two of the predicted sequences belonged to class VIII and the others to class XI. All of these genes were distributed on seven chromosomes in the rice genome. Domain searches on these sequences indicated that a typical rice myosin consisted of Myosin_N, head domain, neck (IQ motifs), tail, and dilute (DIL) domain. Based on the sequence information obtained from predicted myosins, we isolated and sequenced two full-length cDNAs, OsMyoVIIIA and OsMyoXIE, representing each of the two classes of myosins. These two cDNAs isolated from different organs existed in isoforms due to differential splicing and showed minor differences from the predicted myosin in exon organization. Out of 14 myosin genes, 11 were expressed in three major organs: leaves, panicles, and roots, among which three myosins exhibited different expression levels. On the other hand, three of the total myosin sequences showed organ-specific expression. The existence of different myosin genes and their isoforms in different organs or tissues indicates the diversity of myosin functions in rice.

12.

Structural analysis of a Lotus japonicus genome. IV. Sequence features and mapping of seventy-three TAC clones which cover the 7.5 Mb regions of the genome.

Erika Asamizu Tomohiko Kato Shusei Sato Yasukazu Nakamura Takakazu Kaneko Satoshi Tabata 《DNA research》2003,10(3):115-122

Using the sequence information of expressed sequence tags (ESTs), cDNAs and genes from Lotus japonicus and other legumes, 73 TAC (transformation-competent artificial chromosome) clones were selected from a genomic library of L. japonicus accession MG-20, and their nucleotide sequences were determined. The length of the DNA sequenced in this study was 7,455,959 bp, and the total length of the DNA regions sequenced so far is 26,167,443 bp together with the nucleotide sequences of 183 TAC clones previously reported. By similarity searches against the sequences in protein and EST databases and prediction by computer programs, a total of 699 potential protein-encoding genes with known or predicted functions, 163 gene segments and 267 pseudogenes were assigned to the newly sequenced regions. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was located onto the linkage map of two accessions of L. japonicus, Gifu B-129 and Miyakojima MG-20. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

13.

Sequences, sequence clusters and bacterial species

Hanage WP Fraser C Spratt BG 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2006,361(1475):1917-1927

Whatever else they should share, strains of bacteria assigned to the same species should have house-keeping genes that are similar in sequence. Single gene sequences (or rRNA gene sequences) have very few informative sites to resolve the strains of closely related species, and relationships among similar species may be confounded by interspecies recombination. A more promising approach (multilocus sequence analysis, MLSA) is to concatenate the sequences of multiple house-keeping loci and to observe the patterns of clustering among large populations of strains of closely related named bacterial species. Recent studies have shown that large populations can be resolved into non-overlapping sequence clusters that agree well with species assigned by the standard microbiological methods. The use of clustering patterns to inform the division of closely related populations into species has many advantages for poorly studied bacteria (or to re-evaluate well-studied species), as it provides a way of recognizing natural discontinuities in the distribution of similar genotypes. Clustering patterns can be used by expert groups as the basis of a pragmatic approach to assigning species, taking into account whatever additional data are available (e.g. similarities in ecology, phenotype and gene content). The development of large MLSA Internet databases provides the ability to assign new strains to previously defined species clusters and an electronic taxonomy. The advantages and problems in using sequence clusters as the basis of species assignments are discussed.

14.

Genomic BLAST: custom-defined virtual databases for complete and unfinished genomes. Total citations: 10; self-citations: 0; citations by others: 10

Cummings L Riley L Black L Souvorov A Resenchuk S Dondoshansky I Tatusova T 《FEMS microbiology letters》2002,216(2):133-138

BLAST (Basic Local Alignment Search Tool) searches against DNA and protein sequence databases have become an indispensable tool for biomedical research. The proliferation of the genome sequencing projects is steadily increasing the fraction of genome-derived sequences in the public databases and their importance as a public resource. We report here the availability of Genomic BLAST, a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences. This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes. The organisms for such a database can be selected using either a graphic taxonomy-based tree or an alphabetical list of organism-specific sequences. The first option is designed to help explore the evolutionary relationships among organisms within a certain taxonomy group when performing BLAST searches. The use of an alphabetical list allows the user to perform a more elaborate set of selections, assembling any given number of organism-specific databases from unfinished or complete genomes. This tool, available at the NCBI web site http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi, currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

15.

Essential factors determining codon usage in ubiquitin genes

Kazuei Mita Sachiko Ichimura Mitsuru Nenoi 《Journal of molecular evolution》1991,33(3):216-225

Summary: Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating that the codon choices among synonymous codons reflect the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which is different from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern compared with that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Plural primitive ubiquitin sequences were dispersed on the genome in ancestral eukaryotes. Some of them situated in a particular environment fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded reflect the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.
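
The quantity central to this abstract, the G+C content at the codon third position, can be computed in a few lines. The sketch below is illustrative only: the function names and the toy sequence are assumptions, and it simply reads an in-frame coding sequence without any of the gene compilation or regression analysis the paper describes.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(coding_seq: str) -> float:
    """G+C fraction at the third position of each complete codon.

    Assumes the sequence starts in frame; a trailing partial codon is ignored.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    third_bases = seq[2:usable:3]         # bases at positions 3, 6, 9, ...
    return gc_content(third_bases)


if __name__ == "__main__":
    toy = "ATGGCCAAA"                     # codons ATG, GCC, AAA (toy example)
    print(gc_content(toy), gc3_content(toy))
```

Plotting `gc3_content` of each gene against the genome-wide `gc_content` of its species would give the kind of scatter on which a linear correlation like the one reported could be assessed.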

16.

Motifer, a search tool for finding amino acid sequence patterns from nucleotide sequence databases.

H Jörnvall 《FEBS letters》1999,456(1):85-88

Motifer is a software tool able to find, directly in nucleotide databases, very distant homologues to an amino acid query sequence. It focuses searches on a specific amino acid pattern, scoring the matching and intervening residues as specified by the user. The program has been developed for searching databases of expressed sequence tags (ESTs), but it is also well suited to search genomic sequences. The query sequence can be a variable pattern with alternative amino acids or gaps and the sequences searched can contain introns or sequencing errors with accompanying frame shifts. Other features include options to generate a searchable output, set the maximal sequencing error frequency, limit searches to given species, or exclude already known matches. Motifer can find sequence homologues that other search algorithms would deem unrelated or would not find because of sequencing errors or too large a number of other homologues. The ability of Motifer to find relatives to a given sequence is exemplified by searches for members of the transforming growth factor-beta family and for proteins containing a WW-domain. The functions aimed at enhancing EST searches are illustrated by the 'in silico' cloning of a novel cytochrome P450 enzyme.
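
The general idea of looking for an amino acid pattern directly in nucleotide data can be illustrated with a much simpler stand-in: translate the DNA in all six reading frames and scan each translation with a regular expression. This is not Motifer's algorithm (it does no scoring and tolerates neither introns nor frame shifts), and the function names are hypothetical; it assumes the standard genetic code and plain A/C/G/T input.

```python
import re
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_pattern(dna: str, pattern: str) -> List[Tuple[str, int, str]]:
    """Return (frame, residue index, matched text) for every regex hit."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for match in re.finditer(pattern, protein):
                hits.append((f"{strand}{offset + 1}", match.start(), match.group()))
    return hits


if __name__ == "__main__":
    # Toy pattern: two tryptophans followed by a proline.
    print(find_pattern("ATGTGGTGGCCC", "WWP"))
```

A pattern with alternatives or variable spacing, such as `W.{2,4}[FY]`, would be written in ordinary regular expression syntax; handling sequencing errors and frame shifts, as the abstract describes, would need a considerably more elaborate search.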

17.

CATMA: a complete Arabidopsis GST database. Total citations: 9; self-citations: 0; citations by others: 9

Crowe ML Serizet C Thareau V Aubourg S Rouzé P Hilson P Beynon J Weisbeek P van Hummelen P Reymond P Paz-Ares J Nietfeld W Trick M 《Nucleic acids research》2003,31(1):156-158

18.

Histone and histone fold sequences and structures: a database. Total citations: 4; self-citations: 3; citations by others: 1

A D Baxevanis D Landsman 《Nucleic acids research》1997,25(1):272-273

A database of aligned histone protein sequences has been constructed based on the results of homology searches of the major public sequence databases. In addition, sequences of proteins identified as containing the histone fold motif and structures of all known histone and histone fold proteins have been included in the current release. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, and links to the Entrez integrated information retrieval system at the National Center for Biotechnology Information (NCBI). The database currently contains over 1000 protein sequences. All sequences and alignments in this database are available through the World Wide Web at: http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES/.

19.

MITOP, the mitochondrial proteome database: 2000 update

Scharfe C Zaccaria P Hoertnagel K Jaksch M Klopstock T Dembowski M Lill R Prokisch H Gerbitz KD Neupert W Mewes HW Meitinger T 《Nucleic acids research》2000,28(1):155-158

MITOP (http://www.mips.biochem.mpg.de/proj/medgen/mitop/) is a comprehensive database for genetic and functional information on both nuclear- and mitochondrial-encoded proteins and their genes. The five species files (Saccharomyces cerevisiae, Mus musculus, Caenorhabditis elegans, Neurospora crassa and Homo sapiens) include annotated data derived from a variety of online resources and the literature. A wide spectrum of search facilities is given in the overlapping sections 'Gene catalogues', 'Protein catalogues', 'Homologies', 'Pathways and metabolism' and 'Human disease catalogue' including extensive references and hyperlinks to other databases. Central features are the results of various homology searches, which should facilitate the investigations into interspecies relationships. Precomputed FASTA searches using all the MITOP yeast protein entries and a list of the best human EST hits with graphical cluster alignments related to the yeast reference sequence are presented. The orthologue tables with cross-listings to all the protein entries for each species in MITOP have been expanded by adding the genomes of Rickettsia prowazekii and Escherichia coli. To find new mitochondrial proteins the complete yeast genome has been analyzed using the MITOPROT program which identifies mitochondrial targeting sequences. The 'Human disease catalogue' contains tables with a total of 110 human diseases related to mitochondrial protein abnormalities, sorted by clinical criteria and age of onset. MITOP should contribute to the systematic genetic characterization of the mitochondrial proteome in relation to human disease.

20.

Signals for the selection of a splice site in pre-mRNA. Computer analysis of splice junction sequences and like sequences. Total citations: 50; self-citations: 0; citations by others: 50

Y Ohshima Y Gotoh 《Journal of molecular biology》1987,195(2):247-259

To evaluate the importance of the surrounding nucleotide sequence in the selection of a splice site for mRNA, we have carried out computer studies of eukaryotic protein genes whose entire nucleotide sequences were available. A splice site-like sequence that has a significant homology to the consensus splice junction sequences is frequently found within an intron and exon. It is found that the higher the homology of a candidate donor site sequence to the nine-nucleotide consensus sequence, the higher is its probability of being a donor site. For most of the donors, the stability of presumed base-pairing with U1-RNA is higher than that of donor-like sequences, if any, in the adjacent exon and intron. However, homology of a candidate acceptor sequence to the 15-nucleotide consensus is a poor criterion of an acceptor site. The presence of a sequence that could serve as a branch-point 18 to 37 nucleotides before an acceptor does not seem to be critical in distinguishing it from an acceptor-like sequence. For genes of human, rat, mouse and chicken, respectively, nucleotide frequencies around splice junctions of many genes have been calculated. They seem to be different at some positions around a donor site from species to species. The acceptors for these vertebrates have longer pyrimidine-rich regions than the previous consensus sequence. The newly derived nucleotide frequencies were used as the standard to calculate the weighted homology score of a candidate splice site sequence in a gene of the four species. This weighted homology score of the 40 to 60-nucleotide intron-exon sequence is a much better criterion of an acceptor. These results suggest that the most important signal in the selection of a splice site resides in the surrounding nucleotide sequence. It is also suggested that the surrounding nucleotide sequence alone is not generally sufficient for the selection.
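
A frequency-weighted score of the kind mentioned in this abstract can be sketched as below. The abstract does not state the exact formula, so the normalisation used here (observed frequency sum rescaled between the worst and best possible sums) is an assumption, and the three-position frequency table is invented purely for illustration; it is not data from the paper.

```python
from typing import Dict, List

FreqTable = List[Dict[str, float]]   # one {base: frequency} dict per position

# Hypothetical frequencies for a 3-nucleotide window; NOT values from the paper.
TOY_DONOR_FREQS: FreqTable = [
    {"A": 0.125, "C": 0.125, "G": 0.625, "T": 0.125},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]


def weighted_score(site: str, freqs: FreqTable) -> float:
    """Score a candidate site against per-position nucleotide frequencies.

    Returns 1.0 when every position carries the most frequent base and 0.0
    when every position carries the least frequent base (assumed scaling).
    """
    if len(site) != len(freqs):
        raise ValueError("site length must match the frequency table")
    observed = sum(col.get(base, 0.0) for base, col in zip(site.upper(), freqs))
    best = sum(max(col.values()) for col in freqs)
    worst = sum(min(col.values()) for col in freqs)
    if best == worst:                 # uninformative table: every site ties
        return 1.0
    return (observed - worst) / (best - worst)


if __name__ == "__main__":
    for candidate in ("GGT", "AGT", "ACA"):
        print(candidate, weighted_score(candidate, TOY_DONOR_FREQS))
```

With a real table covering the longer intron-exon window described above, the same function could rank candidate acceptor-like sequences, although any threshold for calling a site would have to be chosen from data.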