
1.

Putative in silico mapping of DNA sequences to livestock genome maps using SSLP flanking sequences

Farber CR Medrano JF 《Animal genetics》2003,34(1):11-18

In this study, an in silico approach was developed to identify homologies existing between livestock microsatellite flanking sequences and GenBank nucleotide sequences. Initially, 1955 bovine, 1570 porcine and 1121 chicken microsatellites were downloaded and the flanking sequences were compared with the nr and dbEST databases of GenBank. A total of 74 bovine, 44 porcine and 37 chicken microsatellite flanking sequences passed our criteria and had at least one significant match to human genomic sequence, genes/expressed sequence tags (ESTs) or both. GenBank annotation and BLAT searches of the UCSC human genome assembly revealed that 38 bovine, 13 porcine and 17 chicken microsatellite flanking sequences were highly similar to known human genes. Map locations were available for 67 bovine, 44 porcine and 21 chicken microsatellite flanking sequences, providing useful links in the comparative maps of humans and livestock. In support of our approach, 112 alignments with both microsatellite and match mapping information were located in the expected chromosomal regions based on previously reported syntenic relationships. The development of this in silico mapping approach has significantly increased the number of genes and EST sequences anchored to the bovine, porcine and chicken genome maps and the number of links in various human-livestock comparative maps.

2.

Improved resolution of the comparative horse-human map: investigating markers with in silico and linkage mapping approaches

Tozaki T Swinburne J Hirota K Hasegawa T Ishida N Tobe T 《Gene》2007,392(1-2):181-186

Genetic maps are extremely important tools for tracing the genes that govern economically significant traits, and microsatellites are a significant component of these. In this study, we isolated 2346 novel horse microsatellites as resources for the construction of high-density horse genetic maps. Of these 2346 markers, 339 (14.5%) horse sequences showed sequence homology to DNA sequences in the human genome, demonstrating that microsatellites as type II markers are valuable resources for developing linkage maps and that they have a potential equal to that of type I markers for developing comparative maps. Of the 339 markers, 206 (60.8%) were assigned to horse chromosomes using the Animal Health Trust (AHT) full-sib reference family, and 195 (94.6%) of these localized to the expected syntenic locations on the human genome. These results confirmed the high level of accuracy of in silico mapping. Thus, the 339 markers that exhibited homology to the human genome increased the density of markers on the horse-human comparative map. The resulting comparative map will facilitate the use of horse microsatellites as genetic markers for the identification of quantitative trait loci (QTL) that have been mapped on the human genome. In addition, although the in silico and linkage mapping data did not agree for the other 11 (5.4%) of the assigned 206 markers, these may represent new putative regions of horse-human synteny.

3.

A human-horse comparative map based on equine BAC end sequences

Leeb T Vogl C Zhu B de Jong PJ Binns MM Chowdhary BP Scharfe M Jarek M Nordsiek G Schrader F Blöcker H 《Genomics》2006,87(6):772-776

In an effort to increase the density of sequence-based markers for the horse genome we generated 9473 BAC end sequences (BESs) from the CHORI-241 BAC library with an average read length of 677 bp. BLASTN searches with the BESs revealed 4036 meaningful hits (E

4.

A new contribution to the integration of human and porcine genome maps: 623 new points of homology

Robic A Faraut T Iannuccelli N Lahbib-Mansais Y Cantegrel V Alexander L Milan D 《Cytogenetic and genome research》2003,102(1-4):100-108

In this study we examined homologies between 1,735 porcine microsatellites and human sequence. For 1,710 microsatellites we directly used the sequence flanking the repeat available in GenBank. For a set of 305 microsatellites, a BAC library was screened and end-sequencing provided 461 additional sequences. Altogether 2,171 porcine sequences were tentatively aligned with the sequence of the human genome using the fasta program. Human homologies were observed for 652 microsatellite loci and porcine chromosome assignments available for 623 microsatellites provide useful links in the human and pig comparative map. Moreover for 92 STS, a significant sequence similarity was detected using at least two sequences and in all cases corresponding human locations were consistent. The present study allowed the integration of anonymous markers and the porcine linkage map into the framework of the comparative data between human and porcine genomes (http://w3.toulouse.inra.fr/lgc/pig/msat/). Moreover all conserved syntenic segments were defined on human chromosomes.

5.

NotI flanking sequences: a tool for gene discovery and verification of the human genome (Total citations: 1; self-citations: 0; citations by others: 1)

Kutsenko AS Gizatullin RZ Al-Amin AN Wang F Kvasha SM Podowski RM Matushkin YG Gyanchandani A Muravenko OV Levitsky VG Kolchanov NA Protopopov AI Kashuba VI Kisselev LL Wasserman W Wahlestedt C Zabarovsky ER 《Nucleic acids research》2002,30(14):3163-3170

A set of 22 551 unique human NotI flanking sequences (16.2 Mb) was generated. More than 40% of the set had regions with significant similarity to known proteins and expressed sequences. The data demonstrate that regions flanking NotI sites are less likely to form nucleosomes efficiently and resemble promoter regions. The draft human genome sequence contained 55.7% of the NotI flanking sequences, Celera’s database contained matches to 57.2% of the clones and all public databases (including non-human and previously sequenced NotI flanks) matched 89.2% of the NotI flanking sequences (identity ≥90% over at least 50 bp, data from December 2001). The data suggest that the shotgun sequencing approach used to generate the draft human genome sequence resulted in a bias against cloning and sequencing of NotI flanks. A rough estimation (based primarily on chromosomes 21 and 22) is that the human genome contains 15 000–20 000 NotI sites, of which 6000–9000 are unmethylated in any particular cell. The results of the study suggest that the existing tools for computational determination of CpG islands fail to identify a significant fraction of functional CpG islands, and unmethylated DNA stretches with a high frequency of CpG dinucleotides can be found even in regions with low CG content.

6.

Conserved sequences in both coding and 5' flanking regions of mammalian opal suppressor tRNA genes. (Total citations: 3; self-citations: 0; citations by others: 3)

K Pratt F C Eden K H You V A O'Neill D Hatfield 《Nucleic acids research》1985,13(13):4765-4775

The rabbit genome encodes an opal suppressor tRNA gene. The coding region is strictly conserved between the rabbit gene and the corresponding gene in the human genome. The rabbit opal suppressor gene contains the consensus sequence in the 3' internal control region but like the human and chicken genes, the rabbit 5' internal control region contains two additional nucleotides. The 5' flanking sequences of the rabbit and the human opal suppressor genes contain extensive regions of homology. A subset of these homologies is also present 5' to the chicken opal suppressor gene. Both the rabbit and the human genomes also encode a pseudogene. That of the rabbit lacks the 3' half of the coding region. Neither pseudogene has homologous regions to the 5' flanking regions of the genes. The presence of 5' homologies flanking only the transcribed genes and not the pseudogenes suggests that these regions may be regulatory control elements specifically involved in the expression of the eukaryotic opal suppressor gene. Moreover the strict conservation of coding sequences indicates functional importance for the opal suppressor tRNA genes.

7.

Fifty-four new gene-based canine microsatellite markers

Litt M Bestwick ML Winther MJ Jakobs PM 《The Journal of heredity》2005,96(7):843-846

Fifty-four new markers were developed to fill in gaps in the current map of canine microsatellites and to complement existing markers that may not be sufficiently informative in highly inbred canine pedigrees. Canine genes contained on the radiation hybrid map were used to obtain the sequence of the human homolog. A BLAST search versus the canine whole genome shotgun (wgs) sequence resource was used to obtain the sequence of the canine genomic contigs containing the homolog of the corresponding human gene. Canine sequences that contained microsatellites and mapped back to the correct location in the human genome were used to design primers for amplification of the microsatellites from canine genomic DNA. Heterozygosities of the markers were tested by genotyping grandparental DNAs obtained from the Nestle Purina Reference family DNA distribution center plus DNAs from unrelated Bouviers and Irish wolfhounds. Canine map positions of markers on the July 2004 freeze of the canine genome assembly were determined by in silico PCR or BLAST.

8.

A strategy for finding regions of similarity in complete genome sequences (Total citations: 3; self-citations: 2; citations by others: 1)

Vincens P; Buffat L; Andre C; Chevrolat JP; Boisvieux JF; Hazout S 《Bioinformatics (Oxford, England)》1998,14(8):715-725

MOTIVATION: Complete genomic sequences will become available in the future. New methods to deal with very large sequences (sizes beyond 100 kb) efficiently are required. One of the main aims of such work is to increase our understanding of genome organization and evolution. This requires studies of the locations of regions of similarity. RESULTS: We present here a new tool, ASSIRC ('Accelerated Search for SImilarity Regions in Chromosomes'), for finding regions of similarity in genomic sequences. The method involves three steps: (i) identification of short exact chains of fixed size, called 'seeds', common to both sequences, using hashing functions; (ii) extension of these seeds into putative regions of similarity by a 'random walk' procedure; (iii) final selection of regions of similarity by assessing alignments of the putative sequences. We used simulations to estimate the proportion of regions of similarity not detected for particular region sizes, base identity proportions and seed sizes. This approach can be tailored to the user's specifications. We looked for regions of similarity between two yeast chromosomes (V and IX). The efficiency of the approach was compared to those of conventional programs BLAST and FASTA, by assessing CPU time required and the regions of similarity found for the same data set. AVAILABILITY: Source programs are freely available at the following address: ftp://ftp.biologie.ens.fr/pub/molbio/assirc.tar.gz CONTACT: vincens@biologie.ens.fr, hazout@urbb.jussieu.fr
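
Step (i) of the method above, finding fixed-size exact seeds shared by two sequences, can be illustrated with a short sketch. This is not the ASSIRC implementation: the function name find_seeds, the use of a Python dictionary as the hash table, and the toy sequences are all assumptions made for illustration, and the random-walk extension and alignment steps are not covered.

```python
from collections import defaultdict


def find_seeds(seq_a, seq_b, k):
    """Return (pos_a, pos_b) pairs where seq_a and seq_b share an identical k-mer.

    Hypothetical helper: a dict stands in for the hashing functions
    mentioned in the abstract.
    """
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    # Look up every k-mer of the second sequence in that index.
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))
    return seeds


# Toy example with seeds of size 5.
print(sorted(find_seeds("ACGTACGT", "TTACGTAA", 5)))
```

Indexing one sequence and scanning the other keeps the work roughly linear in the combined sequence length, which is the usual reason for choosing a hash-based seed search over pairwise comparison of all positions.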

9.

Characterization of RFLP probe sequences for gene discovery and SSR development in Sorghum bicolor (L.) Moench

Schloss J Mitchell E White M Kukatla R Bowers E Paterson H Kresovich S 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2002,105(6-7):912-920

In this study, we collected and analyzed DNA sequence data for 789 previously mapped RFLP probes from Sorghum bicolor (L.) Moench. DNA sequences, comprising 894 non-redundant contigs and end sequences, were searched against three GenBank databases, nucleotide (nt), protein (nr) and EST (dbEST), using BLAST algorithms. Matching ESTs were also searched against nt and nr. Translated DNA sequences were then searched against the conserved domain database (CDD) to determine if functional domains/motifs were congruent with the proteins identified in previous searches. More than half (500/894 or 56%) of the query sequences had significant matches in at least one of the GenBank searches. Overall, proteins identified for 148 sequences (17%) were consistent among all searches, of which 66 sequences (7%) contained congruent coding domains. The RFLP probe sequences were also evaluated for the presence of simple sequence repeats (SSRs) and 60 SSRs were developed and assayed in an array of sorghum germplasm comprising inbreds, landraces and wild relatives. Overall, these SSR loci had lower levels of polymorphism (D = 0.46, averaged over 51 polymorphic loci) compared with sorghum SSRs that were isolated by library hybridization screens (D = 0.69, averaged over 38 polymorphic loci). This result was probably due to the relatively small proportion of di-nucleotide repeat-containing markers (42% of the total SSR loci) obtained from the DNA sequence data. These di-nucleotide markers also contained shorter repeat motifs than those isolated from genomic libraries. Based on BLAST results, 24 SSRs (40%) were located within, or near, previously annotated or hypothetical genes. We determined the location of 19 of these SSRs relative to putative coding regions. In general, SSRs located in coding regions were less polymorphic (D = 0.07, averaged over three loci) than those from gene flanking regions, UTRs and introns (D = 0.49, averaged over 16 loci).
The sequence information and SSR loci generated through this study will be valuable for application to sorghum genetics and improvement, including gene discovery, marker-assisted selection, diversity and pedigree analyses, comparative mapping and evolutionary genetic studies.

10.

Matching nucleotide sequences of human antibodies with other known sequences

T T Wu 《Journal of theoretical biology》1988,131(2):231-234

From an evolutionary point of view, the complementarity-determining regions of antibodies are distinct from other proteins including the framework regions of antibodies. A search of other known sequences for exact matches to eighty-four stretches of 15 consecutive bp from the complementarity-determining regions of human antibody heavy chains yielded four matches: two sequential 15-bp matches, or one 16-bp match, with the coding region of a sea-urchin testis histone H2b-2, one 15-bp match with the promoter region of a cauliflower mosaic virus inclusion body protein, and a 15-bp match with an intron between exons 1 and 2 of human factor IX. As a control, an identical search with eighty-four stretches of 15 consecutive bp from the framework regions of human antibody heavy chains yielded no matches with other sequences except those from other antibody framework regions. Since the currently available nucleotide sequence database used in the search consisted of about 1 × 10^7 bp, finding such matches in the complementarity-determining regions might not be random.
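
The closing argument above turns on how many exact 15-bp matches chance alone would produce. A back-of-the-envelope sketch follows. It rests on assumptions that are not in the abstract: equal frequencies for the four bases, independent positions, a single strand searched, and every database position counted as a possible start. The variable names are illustrative, and the abstract does not state how the author assessed randomness.

```python
# Figures taken from the abstract.
n_probes = 84           # stretches of 15 consecutive bp used as queries
probe_length = 15       # bp per query
database_size = 1e7     # approximate database size in bp

# Assumption: each base is A, C, G or T with probability 1/4, independently.
p_exact_match = 0.25 ** probe_length

# Expected number of chance matches over all probes and start positions.
expected_matches = n_probes * database_size * p_exact_match

print(f"P(exact 15-bp match at one position) = {p_exact_match:.3e}")
print(f"Expected chance matches = {expected_matches:.3f}")
```

Under these simplifying assumptions fewer than one chance match is expected. Real sequences are not uniformly random, so the figure is only a rough baseline against which the reported matches can be read.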

11.

Highly informative nature of inter simple sequence repeat (ISSR) sequences amplified using tri- and tetra-nucleotide primers from DNA of cauliflower (Brassica oleracea var. botrytis L.).

B Bornet C Muller F Paulus M Branchard 《Génome》2002,45(5):890-896

Inter simple sequence repeat (ISSR) sequences as molecular markers can lead to the detection of polymorphism and also be a new approach to the study of SSR distribution and frequency. In this study, ISSR amplification with nonanchored primers was performed in closely related cauliflower lines. Forty-four different amplified fragments were sequenced. Sequences of PCR products are delimited by the expected motifs and number of repeats, which validates the ISSR nonanchored primer amplification technique. DNA and amino acid homology searches between internal sequences and databases (i) show that the majority of the internal regions of ISSR had homologies with known sequences, mainly with genes coding for proteins implicated in DNA interaction or gene expression, which reflected the significance of amplified ISSR sequences and (ii) display long and numerous homologies with the Arabidopsis thaliana genome. ISSR amplifications revealed a high conservation of these sequences between Arabidopsis thaliana and Brassica oleracea var. botrytis. Thirty-four of the 44 ISSRs had one or several perfect or imperfect internal microsatellites. Such distribution indicates the presence in genomes of highly concentrated regions of SSR, or "SSR hot spots." Among the four nonanchored primers used in this study, trinucleotide repeats, and especially (CAA)5, were the most powerful primers for ISSR amplifications regarding the number of amplified bands, level of polymorphism, and their nature.

12.

Heterogeneous Nature and Distribution of Interruptions in Dinucleotides May Indicate the Existence of Biased Substitutions Underlying Microsatellite Evolution (Total citations: 1; self-citations: 1; citations by others: 0)

Varela MA Sanmiguel R Gonzalez-Tizon A Martinez-Lage A 《Journal of molecular evolution》2008,66(6):575-580

Some aspects of microsatellite evolution, such as the role of base substitutions, are far from being fully understood. To examine the significance of base substitutions underlying the evolution of microsatellites we explored the nature and the distribution of interruptions in dinucleotide repeats from the human genome. The frequencies that we inferred in the repetitive sequences were statistically different from the frequencies observed in other noncoding sequences. Additionally, we found that the interruptions tended to lie towards the ends of the microsatellites and showed 5'-3' asymmetry. In all the estimates nucleotides forming the same repetitive motif seem to be affected by different base substitution rates in AC and AG. This tendency itself could generate patterning and similarity in flanking sequences and reconcile these phenomena with the high mutation rate found in flanking sequences without invoking convergent evolution. Nevertheless, our data suggest that there is a regional bias in the substitution pattern of microsatellites. The accumulation of random substitutions alone cannot explain the heterogeneity and the asymmetry of interruptions found in this study or the relative frequency of different compound microsatellites in the human genome. Therefore, we cannot rule out the possibility of a mutational bias leading to convergent or parallel evolution in flanking sequences.

13.

Characterisation of IS901 integration sites in the Mycobacterium avium genome

Inglis NF Stevenson K Heaslip DG Sharp JM 《FEMS microbiology letters》2003,221(1):39-47

Data are presented on the identification and characterisation of 17 chromosomal integration loci of the insertion element IS901 in the Mycobacterium avium (cervine strain JD88/118) genome. Thirteen of these integration loci have been mapped to their corresponding positions on the M. avium strain 104 (an IS901(-) strain) genome (The Institute for Genomic Research (TIGR) unfinished genome-sequencing project). Sequence data for both upstream and downstream flanking regions were obtained for 12 insertion loci, while upstream sequence was obtained for five others. A consensus IS901 insertion target sequence compiled from all 17 integration sites was in broad agreement with earlier reports that were based on only two such loci. Analysis of IS901 integration site flanking sequences revealed that, like IS900 in M. avium subspecies paratuberculosis, IS901 inserts preferentially between a putative ribosome-binding sequence (RBS) and the translational start codon of an open reading frame (ORF). In BLASTX and BLASTP searches of the GenBank database, these ORFs were shown to share significant homologies with a number of other prokaryotic genes.

14.

Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. (Total citations: 10; self-citations: 2; citations by others: 10)

M Borodovsky K E Rudd E V Koonin 《Nucleic acids research》1994,22(22):4756-4767

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
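
The percentages quoted above follow directly from the ORF counts, as the short check below shows. The counts come from the abstract; the variable names and the pct helper are illustrative only.

```python
# Counts reported in the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs significantly similar to other protein sequences
both = 182            # ORFs supported by both GeneMark and BLAST
conserved = 73        # putative new genes in ancient conserved protein families


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(pct(both, genemark_orfs))       # share of GeneMark predictions confirmed by BLAST
print(pct(both, blast_orfs))          # share of BLAST hits confirmed by GeneMark
print(pct(conserved, genemark_orfs))  # share of GeneMark predictions in conserved families

# ORFs predicted by GeneMark that have no significant BLAST hit.
genemark_only = genemark_orfs - both
print(genemark_only)
```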

15.

A protocol for rapid isolation of flanking regions from short known sequences

Haijun Meng Juan Xu Wenwu Guo Qiang Xu Dingli Li Xiuxin Deng 《Plant Molecular Biology Reporter》2005,23(1):75-75

We present a method for rapid isolation of flanking regions from amplified fragment length polymorphism (AFLP) fragments based on thermal asymmetric interlaced (TAIL)-PCR, in which one sequence-specific primer and one degenerate primer derived from a conserved motif found in homologs of the known sequence were used. The final result showed this to be a simple and efficient strategy, especially for short known sequences containing coding regions. Moreover, this protocol was especially useful for species with little available genome information, such as Hongkong Kumquat (Fortunella hindsii), since most of their genes have known homologs in other species such as Arabidopsis and rice.

16.

Evolutionary dynamics of multilocus microsatellite arrangements in the genome of the butterfly Bicyclus anynana, with implications for other Lepidoptera

Van't Hof AE Brakefield PM Saccheri IJ Zwaan BJ 《Heredity》2007,98(5):320-328

The sequences flanking microsatellites isolated from the butterfly Bicyclus anynana display high levels of similarity among different loci. We examined sequence data for evidence of the two mechanisms most likely to generate these similarities, namely recombination-mediated events, such as unequal crossing over or gene conversion, and transposition of mobile elements (MEs). Many sequences contained tandemly arranged microsatellites, lending support to recombination as the multiplication mechanism. There is, however, also support for ME-mediated multiplication of microsatellites and their flanking sequences. Homology with a known Lepidopteran ME was found in B. anynana microsatellite regions, and polymorphic microsatellite markers with partial similarities in their flanking sequences were passed on to the next generation independently, indicating that they are not linked. Therefore, the rise of these similarities appears to be mediated through both processes, either as an interaction between the two, or by each being responsible for part of the observations. A large proportion of microsatellites embedded in repetitive DNA is representative for most studied butterflies and moths, and a BLAST survey of the B. anynana sequences revealed four short microsatellite-associated sequences that were present in many species of Lepidoptera. The similarities usually start to deviate beyond these sequences, which suggests that they define the extremes of a repeated unit. Further study of these conserved sequences may help to understand the mechanism underlying the multiplication events, and answer the question of why these redundancies are predominantly found in this insect group.

17.

Identification of 10 882 porcine microsatellite sequences and virtual mapping of 4528 of these sequences

Karlskov-Mortensen P Hu ZL Gorodkin J Reecy JM Fredholm M 《Animal genetics》2007,38(4):401-405

A total of 10 882 porcine microsatellite repeats were identified in genomic shotgun sequences from the Sino-Danish Pig Genome Sequencing Consortium (http://www.piggenome.dk). Of these, 4528 microsatellites were placed on a pig-human comparative map by BLAST analysis of porcine sequences against the human genome (BLAST cut-off threshold = 1 × 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html. These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude. They are a new resource of microsatellite sequences for generating markers to be used in linkage studies and in fine mapping and positional cloning of quantitative trait loci.
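
Applying a cut-off such as the one above to BLAST results might look like the following sketch. Several things here are assumptions rather than details from the abstract: that the hits are in the 12-column tabular format (where the E-value is the 11th column), that the threshold is applied inclusively, and that the single lowest E-value hit per query is the one kept. The function name filter_hits and the sample records are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue)} for the best hit at or below max_evalue.

    Assumes 12-column tab-separated BLAST output with the E-value in column 11.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # hit fails the cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Invented records for illustration only.
sample = [
    "ms1\tchr1\t95.0\t100\t5\t0\t1\t100\t500\t599\t1e-30\t180",
    "ms1\tchr2\t88.0\t60\t7\t0\t1\t60\t10\t69\t1e-8\t60",
    "ms2\tchr3\t80.0\t40\t8\t0\t1\t40\t1\t40\t0.01\t30",
]
print(filter_hits(sample))
```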

18.

Assignment of orthologous relationships among mammalian alpha-globin genes by examining flanking regions reveals a rapid rate of evolution (Total citations: 1; self-citations: 0; citations by others: 1)

Hardison RC; Gelinas RE 《Molecular biology and evolution》1986,3(3):243-261

In order to study the relationships among mammalian alpha-globin genes, we have determined the sequence of the 3' flanking region of the human alpha 1 globin gene and have made pairwise comparisons between sequenced alpha-globin genes. The flanking regions were examined in detail because sequence matches in these regions could be interpreted with the least complication from the gene duplications and conversions that have occurred frequently in mammalian alpha-like globin gene clusters. We found good matches between the flanking regions of human alpha 1 and rabbit alpha 1, human psi alpha 1 and goat I alpha, human alpha 2 and goat II alpha, and horse alpha 1 and goat II alpha. These matches were used to align the alpha-globin genes in gene clusters from different mammals. This alignment shows that genes at equivalent positions in the gene clusters of different mammals can be functional or nonfunctional, depending on whether they corrected against a functional alpha-globin gene in recent evolutionary history. The number of alpha-globin genes (including pseudogenes) appears to differ among species, although highly divergent pseudogenes may not have been detected in all species examined. Although matching sequences could be found in interspecies comparisons of the flanking regions of alpha-globin genes, these matches are not as extensive as those found in the flanking regions of mammalian beta-like globin genes. This observation suggests that the noncoding sequences in the mammalian alpha-globin gene clusters are evolving at a faster rate than those in the beta-like globin gene clusters. The proposed faster rate of evolution fits with the poor conservation of the genetic linkage map around alpha-globin gene clusters when compared to that of the beta-like globin gene clusters.
Analysis of the 3' flanking regions of alpha-globin genes has revealed a conserved sequence approximately 100-150 bp 3' to the polyadenylation site; this sequence may be involved in the expression or regulation of alpha-globin genes.

19.

Characterization of microsatellites and repetitive flanking sequences (ReFS) from the topmouth culter (Culter alburnus Basilewsky)

《Biochemical Systematics and Ecology》2015

The topmouth culter (Culter alburnus) is an economically important freshwater fish in China. We obtained 159 microsatellite-containing sequences (MCSs) from genomic DNA of this species enriched with (CAA)₈ and (GAA)₈ probes. Careful examination of these sequences revealed the existence of cryptic repeated elements in presumed unique flanking regions. These cryptic elements can be grouped into three families, with the MCSs of each family sharing regions of similarity ranging between 40 and 130 bp in length, with 96% sequence similarity. Repbase scans revealed that a large proportion of the cryptic repetitive DNA was identified as transposable elements (TEs). Complex patterns were apparent among these sequences. In most cases (89.2%), a single TE was identified in an MCS; in three instances, the same TE was observed twice in the same MCS. Some MCSs had two or even four different TEs. We isolated nine polymorphic microsatellite loci from sequences with no matches to TEs. In a sample of 30 cultured C. alburnus, we found that the average allele number was 8.1 per locus (range = 4–17), with polymorphism information content ranging from 0.364 to 0.898. These microsatellites can be used to study the population genetic diversity of this species.
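
The abstract reports polymorphism information content (PIC) values without saying how they were computed. A common definition is PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are the allele frequencies at a locus. The sketch below uses that formula as an assumption; the function name pic and the example frequencies are illustrative and are not data from the study.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from a list of allele frequencies.

    Assumes the frequencies sum to 1 and uses the widely cited formula
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Two equally frequent alleles versus four equally frequent alleles.
print(pic([0.5, 0.5]))
print(pic([0.25, 0.25, 0.25, 0.25]))
```

More alleles at more even frequencies push the value towards 1, which is consistent with the higher end of the reported range belonging to loci with many alleles.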

20.

Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation

Stevens FJ 《Journal of molecular recognition : JMR》2005,18(2):139-149

A substantial fraction of protein sequences derived from genomic analyses is currently classified as representing 'hypothetical proteins of unknown function'. In part, this reflects the limitations of methods for comparison of sequences with very low identity. We evaluated the effectiveness of a Psi-BLAST search strategy to identify proteins of similar fold at low sequence identity. Psi-BLAST searches for structurally characterized low-sequence-identity matches were carried out on a set of over 300 proteins of known structure. Searches were conducted in NCBI's non-redundant database and were limited to three rounds. Some 614 potential homologs with 25% or lower sequence identity to 166 members of the search set were obtained. Disregarding the expect value, level of sequence identity and span of alignment, correspondence of fold between the target and potential homolog was found in more than 95% of the Psi-BLAST matches. Restrictions on expect value or span of alignment improved the false positive rate at the expense of eliminating many true homologs. Approximately three-quarters of the putative homologs obtained by three rounds of Psi-BLAST revealed no significant sequence similarity to the target protein upon direct sequence comparison by BLAST, and therefore could not be found by a conventional search. Although three rounds of Psi-BLAST identified many more homologs than a standard BLAST search, most homologs were undetected. It appears that more than 80% of all homologs to a target protein may be characterized by a lack of significant sequence similarity. We suggest that conservative use of Psi-BLAST has the potential to propose experimentally testable functions for the majority of proteins currently annotated as 'hypothetical proteins of unknown function'.