Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
The complement component C4 genes of Old World primates exhibit a long/short dichotomous size variation, except that chimpanzee and gorilla only contain short C4 genes. In human it has been shown that the long C4 gene is attributed to the integration of an endogenous retrovirus, HERV-K(C4), into intron 9. This 6.36 kilobase retroviral element is absent in short C4 genes. Here it is shown that the homologous endogenous retrovirus, ERV-K(C4), is present at precisely the same position in the long C4 gene of orangutan and African green monkey. Determination of the short C4 gene intron 9 sequences from human, three apes, two Old World monkeys, and a New World monkey allowed the establishment of consistent phylogenetic trees for primates, which favor a chimpanzee-gorilla clade. The 5' long terminal repeats (LTR) and 3' LTR of ERV-K(C4) in long C4 genes of human, orangutan, and African green monkey have similar sequence divergence values of 9.1%–10.5%. These values are more than five-fold higher than the sequence divergence of the homologous intron 9 sequences between the long and short C4 genes in higher primates. The latter is probably a result of homogenization or concerted evolution. We suggest that the 5' LTR and 3' LTR of an endogenous retrovirus can serve as a reliable reference point or a molecular clock for studies of gene duplication and gene evolution. This is because the 5'/3' LTR sequences were identical at the time of retroviral integration and evolved independently of each other afterwards. Our data provide strong evidence for the short C4 gene being the ancestral form in primates, trans-species evolution, and the slow-down phenomenon of the sequence divergence in great apes. The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the accession numbers L38796-L38807.

2.
We have generated a mouse × human heterohybridoma that contains a single copy of chromosome 14 and, thus, a haploid set of Ig VH genes. This cell line was used to investigate the germ-line content and nucleotide sequences of members of the VH4 gene family in a polymerase chain reaction-based approach. The analysis of 58 full-length sequences revealed the presence of 12 different germ-line VH4 genes, each of which is potentially functional. These germ-line VH4 genes were compared with the nucleotide sequences of published VH4 genes. Three VH4 genes were 100% identical to previously published sequences and belong to a group of VH4 genes that are strongly conserved and highly prevalent in the human population. Three VH4 genes in our collection displayed greater than 99.3% sequence identity with reported germ-line VH4 sequences and likely represent allelic counterparts of these genes. Six genes displayed less than 97.2% sequence identity with published VH4 genes and were identified as novel members of the human VH4 gene family or more distantly related alleles of known VH4 genes. Collectively, these data suggest that, overall, the human VH4 gene family may be more diverse than hitherto assumed, whereas a number of individual members are nonpolymorphic and extremely well conserved.

3.
Summary. A cDNA clone in pBR322 that cross-hybridizes with a mouse carbonic anhydrase form II (CAII) probe has been sequenced and identified as mouse carbonic anhydrase form I (CAI). The 1224-base-pair clone encodes the entire 260-amino-acid protein and appears to contain an Alu-like element in the 3' untranslated region. The deduced amino acid sequence exhibits 77% homology to human CAI and contains 17 of the 20 residues that are considered unique to and invariant for all mammalian CAI isozymes. The results of a detailed comparison of the nucleic acid sequences spanning the coding regions of mouse CAI and rabbit CAI have been used to calibrate an evolutionary clock for the carbonic anhydrases (CAs). These data have been applied to a comparison of the mouse CAI and CAII nucleic acid sequences to calculate the divergence time between the two genes. The divergence-time calculation provides the first estimation of the evolutionary relationship between CAs based entirely on nucleotide sequence comparison.

6.
To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multigene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multigene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.
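
The decision problem stated in this abstract can be illustrated with a brute-force check. This is only a minimal sketch, not the authors' exact algorithm, which the abstract does not describe. It assumes that "k genes sampled from m species" means every one of the k genes has a sequence for every one of the m species; the function name `has_complete_dataset` and the toy gene-to-species mapping are hypothetical.

```python
from itertools import combinations

def has_complete_dataset(gene_to_species, k, m):
    """Return True if some k genes are all sampled from at least m common species.

    gene_to_species maps a gene name to the set of species that have a
    sequence for that gene. The search tries every combination of k genes,
    so its cost grows exponentially, in line with the problem being NP-complete.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    for genes in combinations(gene_to_species, k):
        # Species for which every gene in this combination has been sequenced.
        shared = set.intersection(*(gene_to_species[g] for g in genes))
        if len(shared) >= m:
            return True
    return False

# Hypothetical toy database.
toy = {
    "rbcL": {"sp_a", "sp_b", "sp_c"},
    "matK": {"sp_a", "sp_b"},
    "atpB": {"sp_b", "sp_c"},
}
print(has_complete_dataset(toy, k=2, m=2))
print(has_complete_dataset(toy, k=3, m=2))
```

Exhaustive enumeration is only practical for small inputs; the abstract's point is that the skewed distribution of real databases makes an exact search feasible in practice.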

7.
MOTIVATION: Discovery of host and pathogen genes expressed at the plant-pathogen interface often requires the construction of mixed libraries that contain sequences from both genomes. Sequence identification requires high-throughput and reliable classification of genome origin. When using single-pass cDNA sequences, difficulties arise from the short sequence length, the lack of sufficient taxonomically relevant sequence data in public databases and ambiguous sequence homology between plant and pathogen genes. RESULTS: A novel method is described, which is independent of the availability of homologous genes and relies on subtle differences in codon usage between plant and fungal genes. We used support vector machines (SVMs) to identify the probable origin of sequences. SVMs were compared to several other machine learning techniques and to a probabilistic algorithm (PF-IND) for expressed sequence tag (EST) classification also based on codon bias differences. Our software (Eclat) has achieved a classification accuracy of 93.1% on a test set of 3217 EST sequences from Hordeum vulgare and Blumeria graminis, which is a significant improvement compared to PF-IND (prediction accuracy of 81.2% on the same test set). EST sequences with at least 50 nt of coding sequence can be classified using Eclat with high confidence. Eclat allows training of classifiers for any host-pathogen combination for which there are sufficient classified training sequences. AVAILABILITY: Eclat is freely available on the Internet (http://mips.gsf.de/proj/est) or on request as a standalone version. CONTACT: friedel@informatik.uni-muenchen.de.
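
To make the codon-usage idea concrete, the sketch below turns an in-frame coding sequence into a 64-dimensional vector of relative codon frequencies, the kind of feature vector that could be passed to an SVM. It is an assumption-laden illustration: the abstract does not state which features Eclat actually uses, and the function name `codon_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)

def codon_frequencies(cds):
    """Relative codon frequencies of a coding sequence read in frame 0.

    Trailing bases that do not fill a codon, and codons containing
    ambiguous bases such as N, are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]

features = codon_frequencies("ATGATGAAAN")
print(len(features), features[CODONS.index("ATG")])
```

Vectors like this, computed for training sequences of known origin, would be the input to whichever classifier is chosen; short ESTs yield noisy frequencies, which is consistent with the abstract's minimum of 50 nt of coding sequence.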

8.
Intraepithelial lymphocytes (IELs) play a critical role in protective immune response to intestinal pathogens such as Eimeria, the etiologic agent of avian coccidiosis. A list of genes expressed by intestinal IELs of Eimeria-infected chickens was compiled using the expressed sequence tag (EST) strategy. The 14,409 ESTs consisted of 1851 clusters and 7595 singletons, which revealed 9446 unique genes in the data set. Comparison of the sequence data with chicken DNA sequences in GenBank identified 125 novel clones. This EST library will provide a valuable resource for profiling global gene expression in normal and pathogen-infected chickens and identifying additional unique immune-related genes.

9.
A total of 1000 expressed sequence tags (ESTs) corresponding to 760 unique sequence sets were identified using random sequencing of clones from a cDNA library constructed from mycelial RNA of Phytophthora infestans. A number of software programs, represented by a relational database and an analysis pipeline, were developed for the automated analysis and storage of the EST sequence data. A set of 419 nonredundant sequences, which correspond to a total of 632 ESTs (63.2%), were identified as showing significant matches to sequences deposited in public databases. A putative cellular identity and role was assigned to all 419 sequences. All major functional categories were represented by at least several ESTs. Four novel cDNAs containing sequences related to elicitins, a family of structurally related proteins that induce the hypersensitive response and condition avirulence of P. infestans on Nicotiana plants, were among the most notable genes identified. Two of these elicitin-like cDNAs were among the most abundant cDNAs examined. The set also contained several ESTs with high sequence similarity to unique plant genes.

10.
Phytophthora infestans is a devastating phytopathogenic oomycete that causes late blight on tomato and potato. Recent genome sequencing efforts of P. infestans and other Phytophthora species are generating vast amounts of sequence data providing opportunities to unlock the complex nature of pathogenesis. However, accurate annotation of Phytophthora genomes will be a significant challenge. Most of the information about gene structure in these species was gathered from a handful of genes resulting in significant limitations for development of ab initio gene-calling programs. In this study, we collected a total of 150 bioinformatically determined near full-length cDNA (FLcDNA) sequences of P. infestans that were predicted to contain full open reading frame sequences. We performed detailed computational analyses of these FLcDNA sequences to obtain a snapshot of P. infestans gene structure, gauge the degree of sequence conservation between P. infestans genes and those of Phytophthora sojae and Phytophthora ramorum, and identify patterns of gene conservation between P. infestans and various eukaryotes, particularly fungi, for which genome-wide translated protein sequences are available. These analyses helped us to define the structural characteristics of P. infestans genes using a validated data set. We also determined the degree of sequence conservation within the genus Phytophthora and identified a set of fast evolving genes. Finally, we identified a set of genes that are shared between Phytophthora and fungal phytopathogens but absent in animal fungal pathogens. These results confirm that plant pathogenic oomycetes and fungi share virulence components, and suggest that eukaryotic microbial pathogens that share similar lifestyles also share a similar set of genes independently of their phylogenetic relatedness.

11.
Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.

12.
Human prolactin. cDNA structural analysis and evolutionary comparisons (cited 33 times: 0 self-citations, 33 by others)
Prolactin (Prl), growth hormone, and chorionic somatomammotropin form a set (the "Prl set") of hormones which is thought to have evolved from a common ancestral gene. This assumption is based on several lines of evidence: overlap in their biological and immunological properties, similarities in their amino acid sequences, and homologies in the nucleic acid sequences of their structural genes. In the current study we report the cloning, amplification in bacteria, and sequence analysis of DNA complementary to Prl mRNA isolated from human pituitary Prl-secreting adenomas. The cloned DNA contains 914 bases, which includes the entire coding sequence of human prePrl as well as portions of the 5'- and 3'-untranslated regions of the mRNA. The amino acid sequence predicted by our data differs from a previously reported amino acid sequence in 8 positions. With the results of this study we can now compare in one species the nucleotide sequences of the structural gene coding for each of the hormones of the Prl set. The sequence divergence at replacement sites is used to establish an evolutionary clock for the Prl set of genes. Using this clock, we postulate that the chromosomal segregation of human Prl and human growth hormone occurred about 392 million years ago and that growth hormone and chorionic somatomammotropin underwent an intrachromosomal recombination within the last 10 million years.

13.
We describe TRiFLe, a freely accessible computer program that generates theoretical terminal restriction fragments (T-RFs) from any user-supplied sequence set tailored to a particular group of organisms, sequences from clone libraries, or sequences from specific genes. The program allows a rapid identification of the most polymorphic enzymes, creates a collection of T-RFs for the data set, and can potentially identify specific T-RFs in T-RF length polymorphism (T-RFLP) patterns by comparing theoretical and experimental results. TRiFLe was used for analyzing T-RFLP data generated for the amoA and pmoA genes. The peaks identified in the T-RFLP patterns show an overlap of ammonia- and methane-oxidizing bacteria in the metalimnion of a subtropical lake.
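
The idea of a theoretical T-RF can be sketched in a few lines. This is not TRiFLe's code; it is a minimal illustration that assumes each sequence starts at the labeled 5' primer end, so the terminal fragment runs from position 0 to the first cut. The two enzymes are given only as examples (MspI cuts C^CGG, HhaI cuts GCG^C), and the function names and toy sequences are hypothetical.

```python
# Example enzyme table: name -> (recognition site, cut offset within the site).
ENZYMES = {
    "MspI": ("CCGG", 1),  # C^CGG
    "HhaI": ("GCGC", 3),  # GCG^C
}

def terminal_fragment_length(sequence, site, cut_offset):
    """Length of the 5' terminal fragment, or None if the site is absent."""
    pos = sequence.upper().find(site)
    return None if pos == -1 else pos + cut_offset

def predict_trfs(sequences, enzymes=ENZYMES):
    """Map each enzyme to {sequence name: predicted T-RF length}."""
    return {
        enzyme: {
            name: terminal_fragment_length(seq, site, offset)
            for name, seq in sequences.items()
        }
        for enzyme, (site, offset) in enzymes.items()
    }

def rank_enzymes(sequences, enzymes=ENZYMES):
    """Order enzymes by the number of distinct T-RF lengths they produce."""
    trfs = predict_trfs(sequences, enzymes)
    distinct = {
        enzyme: len({n for n in lengths.values() if n is not None})
        for enzyme, lengths in trfs.items()
    }
    return sorted(distinct, key=distinct.get, reverse=True)

# Hypothetical toy sequences.
clones = {"clone_a": "AAAACCGGTTTTGCGC", "clone_b": "AACCGGTTTTTTGCGC"}
print(predict_trfs(clones))
print(rank_enzymes(clones))
```

Counting distinct fragment lengths is one simple proxy for how "polymorphic" an enzyme is on a given data set; a real tool would also need to handle primer trimming and enzymes with degenerate recognition sites.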

14.
The increasing number of whole genomic sequences of microorganisms has led to the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in the descending order of the sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELS is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences are generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.

15.
Operational taxonomic units (OTUs) are conventionally defined at a phylogenetic distance (0.03 for species, 0.05 for genus, 0.10 for family) based on full-length 16S rRNA gene sequences. However, partial sequences (700 bp or shorter) have been used in most studies. This discord may affect analysis of diversity and species richness because sequence divergence is not distributed evenly along the 16S rRNA gene. In this study, we compared a set each of bacterial and archaeal 16S rRNA gene sequences of nearly full length with multiple sets of different partial 16S rRNA gene sequences derived therefrom (approximately 440-700 bp), at conventional and alternative distance levels. Our objective was to identify partial sequence region(s) and distance level(s) that allow more accurate phylogenetic analysis of partial 16S rRNA genes. Our results showed that no partial sequence region could estimate OTU richness or define OTUs as reliably as nearly full-length genes. However, the V1-V4 regions can provide more accurate estimates than others. For analysis of archaea, we recommend the V1-V3 and the V4-V7 regions and clustering of species-level OTUs at 0.03 and 0.02 distances, respectively. For analysis of bacteria, the V1-V3 and the V1-V4 regions should be targeted, with species-level OTUs being clustered at 0.04 distance in both cases.

16.
This paper reports a novel symbol-to-signal mapping for DNA sequences, based on the concept of categorical periodograms. A categorical periodogram is a numeric sequence with the n-th element of the sequence indicating the number of occurrences of cycles with period n in it. The period of the cycle is defined as the number of intervening events plus one. Spectral analysis studies have been conducted on Cumulative Categorical Periodogram (CCP) of 10 genes from the data set of Burset and Guigo. It is observed that the spectral signatures in CCP are functionally equivalent to the established N/3 peak in the spectrum of indicator sequences of genomes. Being a single sequence compared to four sequences in the case of indicator sequence representation, the method is claimed to be functionally equivalent, but computationally better for identification of gene coding regions in sequences.
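
One possible reading of the definition above can be written down directly. The sketch assumes that a "cycle" is the gap between two consecutive occurrences of the same symbol, so that two occurrences at positions i < j have period j - i (the j - i - 1 intervening events plus one). How the paper builds the cumulative version (CCP) is not stated in the abstract and is not attempted here; the function name `categorical_periodogram` is hypothetical.

```python
from collections import Counter

def categorical_periodogram(seq, symbol):
    """Count, for each period n, the consecutive recurrences of `symbol` with gap n.

    Element n-1 of the returned list is the number of cycles with period n.
    The list is empty when the symbol occurs fewer than two times.
    """
    positions = [i for i, s in enumerate(seq) if s == symbol]
    # Period = distance between consecutive occurrences of the symbol.
    counts = Counter(b - a for a, b in zip(positions, positions[1:]))
    longest = max(counts, default=0)
    return [counts.get(n, 0) for n in range(1, longest + 1)]

print(categorical_periodogram("ATGATGATG", "A"))
```

In this toy sequence every recurrence of A has period 3, which is the kind of period-3 regularity that the abstract relates to the N/3 peak of coding regions.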

19.
    
γ-Crystallin is the major and most abundant lens protein present in the eye lens of lower vertebrates such as amphibian and piscine species. To facilitate structural characterization of γ-crystallins isolated from the lens of the bullfrog (Rana catesbeiana), a cDNA mixture was synthesized from the poly(A)+ mRNA isolated from fresh eye lenses. cDNA encoding γ-crystallin was then amplified using polymerase chain reaction (PCR) based on two primers designed according to the relatively conserved N- and C-terminal sequences of known γ-crystallins from teleostean fishes. PCR-amplified product corresponding to γ-crystallin isoforms was obtained, which was then subcloned in pUC18 vector and transformed into Escherichia coli strain JM109. Plasmids containing amplified γ-crystallin cDNAs were purified and prepared for nucleotide sequencing by the dideoxynucleotide chain-termination method. Sequencing several clones containing DNA inserts of about 0.54 kb revealed the presence of two isoforms with an open reading frame of 534 base pairs, covering two γ-crystallins each with a deduced protein sequence of 177 amino acids including the translation-initiating methionine. These γ-crystallins of pI 6.364 and 6.366 have a low methionine content of 2.81%, in contrast to 11–16% obtained for those γ-crystallins with high methionine content from most teleostean lenses. Pairwise sequence comparison of bullfrog γ-crystallins with those published sequences of γ-crystallins from carp, shark, Xenopus and another Rana frog, bovine, and human lenses indicates that there is only 46–63% sequence similarity among these species, revealing that amphibians possess a very complex and heterogeneous group of γ-crystallins even from closely related species of Rana frogs. The sequence analysis and comparison of various isoforms of the frog γ-crystallin family provide a firm basis for identifying these lens proteins as members of a multigene family more complex than that reported for mammalian γ-crystallins.

20.
Gene structure conservation aids similarity based gene prediction (cited 4 times: 1 self-citation, 3 by others)
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein coding genes. Existing computational methods for gene prediction include ab initio methods which use the DNA sequence itself as the only source of information, comparative methods using multiple genomic sequences, and similarity based methods which employ the cDNA or protein sequences of related genes to aid the gene prediction. We present here an algorithm implemented in a computer program called Projector which combines comparative and similarity approaches. Projector employs similarity information at the genomic DNA level by directly using known genes annotated on one DNA sequence to predict the corresponding related genes on another DNA sequence. It therefore makes explicit use of the conservation of the exon–intron structure between two related genes in addition to the similarity of their encoded amino acid sequences. We evaluate the performance of Projector by comparing it with the program Genewise on a test set of 491 pairs of independently confirmed mouse and human genes. It is more accurate than Genewise for genes whose proteins are <80% identical, and is suitable for use in a combined gene prediction system where other methods identify well conserved and non-conserved genes, and pseudogenes.
