Similar literature
 Found 20 similar articles (search time: 218 ms)
1.
U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.

2.
3.
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, uses an exhaustive search over each pair of fragments in the candidate set to find potential linkage, and then assembles the linked fragments together. The complexity of our projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested our algorithm with both simulated data and real biological data, and the results show that our projection-assemble algorithm is efficient. By means of this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
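
The projection step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the fragment length `l`, the number of projected positions `k`, and the mismatch threshold are all assumptions chosen for illustration, and the assembly step is omitted. Fragments that agree at `k` randomly chosen offsets fall into the same bucket, and pairs inside a bucket are then compared directly.

```python
import random
from collections import defaultdict
from itertools import combinations


def projection_candidates(seq, l=12, k=8, seed=0):
    """Bucket l-mer start positions by the letters at k random offsets.

    Hypothetical helper: l, k and seed are illustrative defaults,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    offsets = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        key = "".join(window[i] for i in offsets)
        buckets[key].append(start)
    # Only buckets holding at least two fragments are repeat candidates.
    return {key: starts for key, starts in buckets.items() if len(starts) > 1}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def confirmed_pairs(seq, candidates, l=12, max_mismatch=1):
    """Exhaustively compare fragment pairs inside each bucket."""
    pairs = set()
    for starts in candidates.values():
        for i, j in combinations(starts, 2):
            if hamming(seq[i:i + l], seq[j:j + l]) <= max_mismatch:
                pairs.add((i, j))
    return pairs
```

Because the pairwise comparison is restricted to fragments that share a bucket, the expensive step runs only on a small candidate set, which is the intuition behind the near-linear behaviour the abstract reports.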

4.
Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence, and they are then combined into a single predictor using a support vector machine (SVM). More importantly, domain detection is first treated as an imbalanced data learning problem. A novel undersampling method based on distance-based maximal entropy in the SVM feature space is proposed. The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in machine learning systems on general imbalanced datasets.

5.
6.
7.
Two keratin-like proteins of 64 and 55 kDa were purified from suspension cells of Daucus carota L., and their partial amino acid sequences were determined. Homology analysis showed that the sequence from the 64 kDa protein was highly homologous to β-glucosidase, whereas that from the 55 kDa protein had no significant homolog in GenBank. Using a conserved sequence of animal intermediate filament (IF) proteins as a primer, we cloned a cDNA fragment from Daucus carota L. Southern blot and Northern blot results indicated that this cDNA fragment corresponds to a single-copy gene that is expressed in both suspension cells and leaves. Homology analysis revealed that it had moderate homology to a variety of α-helical proteins. Our results might shed more light on the molecular characterization of IFs in higher plants.

8.
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate if the sequence identity between the sequences to be aligned is less than 30%. This is because, for such sequences, different residues may play similar structural roles and are incorrectly aligned when a substitution matrix consisting of 20 types of residues is used. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar regions in proteins by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet, with N around 9.
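
The effect of a reduced alphabet can be shown with a short sketch. The nine groups below are a hypothetical grouping by broad physicochemical similarity, chosen only for illustration; they are not the DAPS-derived clusters of the paper, and the function names are likewise assumptions. Each residue is replaced by a representative letter for its group, and identity is then computed over two already-aligned, gap-free sequences.

```python
# Hypothetical 9-group alphabet (illustrative only, not the paper's clustering).
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KR", "H", "G", "P"]

# Map every residue to the first letter of its group.
RESIDUE_TO_GROUP = {res: group[0] for group in GROUPS for res in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP[res] for res in seq)


def identity(a, b):
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


full = identity("LKDE", "IRED")
reduced = identity(reduce_sequence("LKDE"), reduce_sequence("IRED"))
print(full, reduced)
```

In this toy pair no position matches in the 20-letter alphabet, yet every position matches after reduction, because each substitution stays within one group. That is the sense in which a reduced alphabet can recover similarity that low sequence identity hides.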

9.
Whole genome sequencing of buffalo is yet to be completed, and in the near future it may not be possible to identify an exome (coding region of the genome) through bioinformatics for designing probes to capture it. In the present study, we employed in-solution hybridization to sequence tissue-specific temporal exomes (TST exomes) in buffalo. We utilized cDNA prepared from buffalo muscle tissue as a probe to capture TST exomes from the buffalo genome. This resulted in a prominent reduction of repeat sequences (up to 40%) and an enrichment of coding sequences (up to 60%). Enriched targets were sequenced on a 454 pyrosequencing platform, generating 101,244 reads containing 24,127,779 high-quality bases. The data revealed 40,100 variations, of which 403 were indels and 39,218 SNPs containing 195 nonsynonymous candidate SNPs in protein-coding regions. The study has indicated that 80% of the total genes identified from capture data were expressed in muscle tissue. The present study is the first of its kind to sequence TST exomes captured by use of cDNA molecules for SNPs found in the coding region without any prior sequence information of targeted molecules.

10.
Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended our observations to higher eukaryotes and identified a similar periodicity of 175 nt in length in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails to facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5′-end to the 3′-end, and also learnt that the nucleosome-enriched regions are GC-rich compared to the nucleosome-free sequences, as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally, and based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences and yet bind to different genome regions with variable stability, that well-positioned nucleosomes leave sequence signatures on DNA, and that statistical positioning of nucleosomes across the genome can be decoded computationally based on these sequence signatures.
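
One simple way to look for a sequence periodicity of the kind discussed in this abstract is to count how often a dinucleotide recurs at each distance. The sketch below is a toy illustration under stated assumptions, not the authors' method: the function name, the choice of the `AA` dinucleotide, and the synthetic test sequence are all hypothetical, and a non-integer period such as 10.5 nt would in practice call for spectral analysis rather than integer lags.

```python
def dinucleotide_lag_counts(seq, dinuc="AA", max_lag=20):
    """Count pairs of `dinuc` occurrences separated by each lag from 1 to max_lag."""
    starts = {i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc}
    return {
        lag: sum((i + lag) in starts for i in starts)
        for lag in range(1, max_lag + 1)
    }


# Synthetic sequence with an "AA" planted every 10 nt (illustrative only).
toy = ("AA" + "CGTCGTCG") * 20
counts = dinucleotide_lag_counts(toy)
best_lag = max(counts, key=counts.get)
print(best_lag)
```

On the synthetic sequence the count peaks at the planted spacing, with a smaller peak at twice that distance. Real genomic signal is far noisier, so a peak like this would only be a starting point for a statistical model such as the HMM the abstract describes.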

11.
U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates, including the upstream sequence elements characteristic of snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.

12.
The nucleotide sequence of Physarum polycephalum U4 snRNA was determined and compared to published U4 snRNA sequences. The primary structure of P. polycephalum U4 snRNA is closer to that of plants and animals than to that of fungi. However, both fungal and P. polycephalum U4 snRNAs are missing the 3' terminal hairpin, and this may be a common feature of lower eucaryote U4 snRNAs. We found that the secondary structure model we previously proposed for 'free' U4 snRNA is compatible with the various U4 snRNA sequences published. The possibility of forming this tetrahelix structure is preserved by several compensatory base substitutions and by compensatory nucleotide insertions and deletions. According to this finding, association between U4 and U6 snRNAs implies the disruption of 2 internal helical structures of U4 snRNA. One has a very low free energy, but the other, which represents one-half of the helical region of the 5' hairpin, requires 4 to 5 kcal to open. The remaining part of the 5' hairpin is maintained in the U4/U6 complex, and we observed the conservation, in all U4 snRNAs studied, of a U bulge residue at the limit between the helical region that has to be melted and the region that is maintained. The 3' domain of U4 snRNA is less conserved in both size and primary structure than the 5' domain; its structure is also more compact in the RNA in solution. In this domain, only the Sm binding site and the presence of a bulge nucleotide in the hairpin on the 5' side of the Sm site are conserved throughout evolution.

13.
The spliceosome is a large, dynamic ribonuclear protein complex, required for the removal of intron sequences from newly synthesized eukaryotic RNAs. The spliceosome contains five essential small nuclear RNAs (snRNAs): U1, U2, U4, U5, and U6. Phylogenetic comparisons of snRNAs from protists to mammals have long demonstrated remarkable conservation in both primary sequence and secondary structure. In contrast, the snRNAs of the hemiascomycetous yeast Saccharomyces cerevisiae have highly unusual features that set them apart from the snRNAs of other eukaryotes. With an emphasis on the pathogenic yeast Candida albicans, we have now identified and compared snRNAs from newly sequenced yeast genomes, providing a perspective on spliceosome evolution within the hemiascomycetes. In addition to tracing the origins of previously identified snRNA variations present in Saccharomyces cerevisiae, we have found numerous unexpected changes occurring throughout the hemiascomycetous lineages. Our observations reveal interesting examples of RNA and protein coevolution, giving rise to altered interaction domains, losses of deeply conserved snRNA-binding proteins, and unique snRNA sequence changes within the catalytic center of the spliceosome. These same yeast lineages have experienced exceptionally high rates of intron loss, such that modern hemiascomycetous genomes contain introns in only approximately 5% of their genes. Also, the splice site sequences of those introns that remain adhere to an unusually strict consensus. Some of the snRNA variations we observe may thus reflect the altered intron landscape with which the hemiascomycetous spliceosome must contend.

14.
Splicing of U12-dependent introns requires the function of U11, U12, U6atac, U4atac, and U5 snRNAs. Recent studies have suggested that U6atac and U12 snRNAs interact extensively with each other, as well as with the pre-mRNA by Watson-Crick base pairing. The overall structure and many of the sequences are very similar to the highly conserved analogous regions of U6 and U2 snRNAs. We have identified the homologs of U6atac and U12 snRNAs in the plant Arabidopsis thaliana. These snRNAs are significantly diverged from human, showing overall identities of 65% for U6atac and 55% for U12 snRNA. However, there is almost complete conservation of the sequences and structures that are implicated in splicing. The sequence of plant U6atac snRNA shows complete conservation of the nucleotides that base pair to the 5' splice site sequences of U12-dependent introns in human. The immediately adjacent AGAGA sequence, which is found in human U6atac and all U6 snRNAs, is also conserved. High conservation is also observed in the sequences of U6atac and U12 that are believed to base pair with each other. The intramolecular U6atac stem-loop structure immediately adjacent to the U12 interaction region differs from the human sequence in 9 out of 21 positions. Most of these differences are in base pairing regions with compensatory changes occurring across the stem. To show that this stem-loop was functional, it was transplanted into a human suppressor U6atac snRNA expression construct. This chimeric snRNA was inactive in vivo but could be rescued by coexpression of a U4atac snRNA expression construct containing compensatory mutations that restored base pairing to the chimeric U6atac snRNA. These data show that base pairing of U4atac snRNA to U6atac snRNA has a required role in vivo and that the plant U6atac intramolecular stem-loop is the functional analog of the human sequence.

15.
Differences observed between plant and animal pre-mRNA splicing may be the result of primary or secondary structure differences in small nuclear RNAs (snRNAs). A cDNA library of pea snRNAs was constructed from anti-trimethylguanosine (m3(2,2,7)G) immunoprecipitated pea nuclear RNA. The cDNA library was screened using oligo-deoxyribonucleotide probes specific for the U1, U2, U4 and U5 snRNAs. cDNA clones representing U1, U2, U4 and U5 snRNAs expressed in seedling tissue have been isolated and sequenced. Comparison of the pea snRNA variants with those of other organisms suggests that functionally important primary sequences are conserved phylogenetically even though the overall sequences have diverged substantially. Structural variations in U1 snRNA occur in regions required for U1-specific protein binding. In light of this sequence analysis, it is clear that the dicot snRNA variants do not differ in sequences implicated in RNA:RNA interactions with pre-mRNA. Instead, sequence differences occur in regions implicated in the binding of small ribonucleoproteins (snRNPs) to snRNAs and may result in the formation of unique snRNP particles.

16.
A phage containing two sequences homologous to U1 snRNA was isolated from a Drosophila melanogaster genomic library, and identified with a previously cloned D. melanogaster U1 snRNA gene. DNA sequence analysis showed that complete and truncated U1 snRNA genes are present, both of which have base substitutions relative to U1 snRNA. These genes show conservation of 5' and 3' flanking regions relative to other U1 and U2 snRNA genes of Drosophila. Intramolecular renaturation experiments and electron microscope mapping demonstrate that the two U1 snRNA sequences are present as inverted repeats about 2.7 kb apart, separated by a smaller pair of inverted repeats of an unrelated sequence. These U1 snRNA sequences were located by in situ hybridization at 82E, and related sequences were found at 21D and 95C on the polytene chromosome map. The results are discussed with reference to the origin and function of snRNAs.

17.
Architecture of the U5 small nuclear RNA.   Cited by: 5 (self-citations: 1; citations by others: 4)
We have used comparative sequence analysis and deletion analysis to examine the secondary structure of the U5 small nuclear RNA (snRNA), an essential component of the pre-mRNA splicing apparatus. The secondary structure of Saccharomyces cerevisiae U5 snRNA was studied in detail, while sequences from six other fungal species were included in the phylogenetic analysis. Our results indicate that fungal U5 snRNAs, like their counterparts from other taxa, can be folded into a secondary structure characterized by a highly conserved stem-loop (stem-loop 1) that is flanked by a moderately conserved internal loop (internal loop 1). In addition, several of the fungal U5 snRNAs include a novel stem-loop structure (ca. 30 nucleotides) that is adjacent to stem-loop 1. By deletion analysis of the S. cerevisiae snRNA, we have demonstrated that the minimal U5 snRNA that can complement the lethal phenotype of a U5 gene disruption consists of (i) stem-loop 1, (ii) internal loop 1, (iii) a stem closing internal loop 1, and (iv) the conserved Sm protein binding site. Remarkably, all essential, U5-specific primary sequence elements are encoded by a 39-nucleotide domain consisting of stem-loop 1 and internal loop 1. This domain must, therefore, contain all U5-specific sequences that are essential for splicing activity, including binding sites for U5-specific proteins.

18.
C. Tschudi, S. P. Williams, E. Ullu. Gene, 1990, 91(1): 71-77
The U2 small nuclear RNA (snRNA) of Trypanosoma brucei gambiense, a flagellated protozoon of the order Kinetoplastida, is 148 nucleotides (nt) long, and thus the smallest U2 snRNA identified so far. To examine the evolutionary conservation of this RNA among Kinetoplastida, we have cloned and sequenced the U2 genes from Trypanosoma congolense and Leishmania mexicana amazonensis, which are 145 and 141 nt in length, respectively. The sequences of the Kinetoplastida U2 snRNAs are essentially identical in the 5' half of the molecule. Surprisingly, the putative branch site recognition sequence of L. m. amazonensis U2 snRNA shows two nt changes when compared with the other two U2 snRNAs. The sequence of the 3' half of the Kinetoplastida U2 snRNAs is less conserved, with T. congolense and L. m. amazonensis RNAs showing 23 and 35 nt sequence variations, respectively, when compared with the corresponding sequence of the T. b. gambiense U2 snRNA. Alignment of the flanking regions of the U2 genes revealed several elements which are conserved both in sequence and in position relative to the U2 coding region and which may function in the biosynthesis of U2 snRNAs. One upstream element specifically binds protein factor(s) present in T. brucei nuclear extracts.

19.
Maize U2 snRNAs: gene sequence and expression.   Cited by: 12 (self-citations: 8; citations by others: 4)
The complexity of plant U-type small nuclear ribonucleoprotein particles (UsnRNPs) may represent one level at which differences in splicing between animals and plants and between monocotyledonous and dicotyledonous plants could be effected. The maize (monocot) U2snRNA multigene family consists of some 25 to 40 genes which, according to RNA blot and RNase protection analyses, produce U2snRNAs varying in both size and sequence. The first 77 nucleotides of the maize U2-27 snRNA gene are identical to U2snRNA genes of Arabidopsis (dicot). Despite much lower sequence homology in the remaining 120 nucleotides, the secondary structure of the RNA is conserved. The difference in splicing between monocot and dicot plants cannot be explained on the basis of sequence differences between monocot and dicot U2snRNAs in the region which may interact with intron branch point sequences.

20.