Similar literature
Found 20 similar articles (search time: 15 ms)
1.
2.
3.
4.
5.
The primary structure of thousands of genes is being determined in laboratories worldwide. While it is relatively easy to analyse the coding region(s) of genes, it is usually hard to understand what is located in non-coding regions. A non-coding region may contain valuable information about how a given gene functions, e.g. promoters, enhancers, and silencers. The regulatory function of these sequences is determined by their interaction with sequence-specific proteins; that is, the presence of a certain DNA sequence in a non-coding region of a gene may suggest that the gene is regulated by a specific protein factor. This minireview summarizes recent data on most of the known eukaryotic sequence-specific DNA-binding protein factors, including their origin, DNA consensus sequences, and their role in the expression of the corresponding genes.

6.
Cancer is a disease involving multi-step dynamic changes in the genome. However, studies of the cancer genome have so far focused most heavily on protein-coding genes, and knowledge of alterations in functional non-coding sequences in cancer is largely lacking. MicroRNAs (miRNAs) are ~22 nt non-coding RNAs that regulate gene expression in a sequence-specific manner via translational inhibition or mRNA degradation. Mounting evidence shows that miRNAs may play important roles in tumor development, and a better understanding of their alterations in the cancer genome and of their oncogenic properties should contribute to the diagnosis and treatment of cancer.

7.
8.
In this article we introduce a new approach to distinguishing protein-coding sequences from non-coding sequences, using a period-3 free energy signal that arises from the interactions of the 3′-terminal nucleotides of the 18S rRNA with mRNA. We extracted the special features of the amplitude and phase of the period-3 signal in protein-coding regions, which are not found in non-coding regions, and used them to distinguish protein-coding sequences from non-coding sequences. We tested the method on all the experimental genes from Saccharomyces cerevisiae and Schizosaccharomyces pombe. The identification was consistent with the corresponding information from GenBank and performed better than existing methods that use a period-3 signal. Preliminary tests on some fly, mouse and human genes suggest that our method is applicable to higher eukaryotic genes. Tests on pseudogenes indicated that most pseudogenes have no period-3 signal. Some exploration of the 3′-tail of 18S rRNA and pattern analysis of protein-coding sequences further supported our assumption that the 3′-tail of 18S rRNA plays a synchronizing role throughout the translation elongation process. This, in turn, can be used to identify protein-coding sequences.
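
The period-3 idea in the abstract above can be illustrated with a minimal sketch. It does not reproduce the authors' free energy signal from 18S rRNA and mRNA interactions; it assumes the simpler nucleotide-indicator approach instead, comparing the Fourier power at frequency 1/3 with the average power over all non-zero frequencies. The function name `period3_score` and the toy sequences are hypothetical.

```python
import cmath
from collections import Counter

def period3_score(seq: str) -> float:
    """Power at frequency 1/3 divided by the mean power over non-zero
    frequencies, summed over the A, C, G and T indicator sequences.
    Assumption: a plain indicator-sequence DFT, not a free energy signal."""
    seq = seq.upper()
    n = len(seq)
    if n < 3:
        raise ValueError("need at least 3 bases")

    # The phase factor exp(-2*pi*i*j/3) only depends on j mod 3.
    w = cmath.exp(-2j * cmath.pi / 3)
    phases = [w ** r for r in range(3)]

    # Power at frequency 1/3, accumulated base by base.
    p3 = 0.0
    for base in "ACGT":
        total = sum(phases[j % 3] for j, b in enumerate(seq) if b == base)
        p3 += abs(total) ** 2

    # By Parseval, the total power of one 0/1 indicator sequence is
    # n * count(base); the zero-frequency (DC) term is count(base) ** 2.
    counts = Counter(b for b in seq if b in "ACGT")
    total_power = n * sum(counts.values())
    dc_power = sum(c * c for c in counts.values())
    noise = (total_power - dc_power) / (n - 1)
    return p3 / noise if noise > 0 else 0.0

if __name__ == "__main__":
    print(period3_score("ATG" * 30))   # strongly period-3 toy sequence
    print(period3_score("ATGC" * 24))  # period-4 toy sequence
```

A high score suggests codon-like periodicity in the window; any threshold separating coding from non-coding windows would have to be calibrated on real data.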

9.
Identifying non-coding RNA regions in the genome using computational methods is currently receiving a lot of attention. In general, it is substantially more difficult than detecting protein-coding genes, because non-coding RNA regions have only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures that are characteristic of their molecular function in a cell. These are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that extends a pairwise structural alignment method for RNA sequences to handle position-specific scoring matrices (PSSMs), which we use to model sequence motifs, and hence to incorporate motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially if we have biological knowledge such as sequence motifs. K. Sato and K. Morita contributed equally to this work.
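
As an illustration of how a PSSM models a sequence motif, the sketch below builds a log-odds matrix from aligned motif instances and scans a sequence for the best-scoring window. It covers only the sequence-scoring part; the structural alignment described in the abstract is not attempted. The function names, the pseudocount, the uniform background of 0.25, and the toy motif are all assumptions made for the example.

```python
import math

ALPHABET = "ACGU"  # RNA alphabet; other symbols are not handled here

def build_pssm(instances, pseudocount=1.0, background=0.25):
    """Return one {base: log2 odds} dict per motif column.
    Assumes equal-length, gap-free instances and a uniform background."""
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all motif instances must have the same length")
    total = len(instances) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(width):
        column = [s[col] for s in instances]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm

def best_hit(pssm, seq):
    """Return (start, score) of the highest-scoring window, or None
    if the sequence is shorter than the motif."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i][seq[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best

if __name__ == "__main__":
    motif = ["GGAU", "GGAU", "GGAC", "GGAU"]  # toy aligned instances
    pssm = build_pssm(motif)
    print(best_hit(pssm, "ACUGGAUCA"))
```

The pseudocount keeps unseen bases from producing a log of zero, so a window with a single mismatch is penalized rather than excluded.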

10.
11.
12.
MOTIVATION: Several pattern discovery methods have been proposed to detect over-represented motifs in upstream sequences of co-regulated genes; they are used, for example, to predict cis-acting elements from clusters of co-expressed genes. The clusters to be analyzed are often noisy, containing a mixture of co-regulated and non-co-regulated genes. We propose a method to discriminate co-regulated from non-co-regulated genes on the basis of counts of pattern occurrences in their non-coding sequences. METHODS: String-based pattern discovery is combined with discriminant analysis to classify genes on the basis of putative regulatory motifs. RESULTS: The approach is evaluated by comparing the significance of patterns detected in annotated regulons (positive control), random gene selections (negative control) and high-throughput regulons (noisy data) from the yeast Saccharomyces cerevisiae. The classification is evaluated on the annotated regulons, and the robustness and rejection power are assessed with mixtures of co-regulated and random genes.
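
The feature the abstract describes, counts of pattern occurrences in each gene's non-coding sequence, can be sketched as follows. This is only an assumed illustration of the counting step: it counts overlapping matches on a single strand, and the discriminant analysis that would consume these counts is omitted. The gene names, sequences, and patterns are made up.

```python
def count_occurrences(seq: str, pattern: str) -> int:
    """Count overlapping occurrences of pattern in seq (one strand only)."""
    count = 0
    start = seq.find(pattern)
    while start != -1:
        count += 1
        start = seq.find(pattern, start + 1)
    return count

def pattern_count_matrix(upstream: dict, patterns: list) -> dict:
    """Map each gene to a vector of per-pattern occurrence counts,
    in the same order as the patterns list."""
    return {
        gene: [count_occurrences(seq.upper(), p) for p in patterns]
        for gene, seq in upstream.items()
    }

if __name__ == "__main__":
    # Hypothetical upstream sequences and candidate patterns.
    upstream = {
        "geneA": "TTCACGTGAACACGTGTT",
        "geneB": "ATATATATAT",
    }
    patterns = ["CACGTG", "ATA"]
    for gene, counts in pattern_count_matrix(upstream, patterns).items():
        print(gene, counts)
```

Each row of the resulting matrix is one gene's feature vector; a classifier would then be trained on rows from annotated regulons versus random gene selections.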

13.
14.
We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. The element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. The element may have a cis-acting functional role in the genome.

15.
The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the most significant fraction of CSTs observed in human–mouse comparisons corresponds to protein-coding exons, owing to their strong evolutionary constraints. As we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the observation of the specific evolutionary dynamics of coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help in the validation of predicted genes, the prediction of alternative splicing patterns in known and unknown genes, and the definition of a dictionary of non-coding regulatory elements.

16.
17.
Barry AE, Leliwa-Sytek A, Man K, Kasper JM, Hartl DL, Day KP. Gene, 2006, 376(2): 163-173
An analysis of the diversity of the aspartyl proteases of Plasmodium falciparum, known as plasmepsins (PMs), was completed in view of their possible role as drug targets. DNA sequence polymorphisms were identified in nine pm genes, including their non-coding (intron and 5' flanking) sequences. All genes contained at least one single nucleotide polymorphism (SNP). Extensive microsatellite diversity was observed, predominantly in non-coding sequences. All but one non-synonymous polymorphism (a conservative substitution) were mapped to the surface of the predicted protein, arguing against a possible role in enzymatic activity. The distribution of SNPs was found to be non-random among pm genes, with pm6 and pm10 having significantly higher SNP densities, suggesting that they were under selection. For pm6, the majority of the SNPs were in introns, and some of these may contribute to splice site variation. SNPs were found at a high density in both the coding and non-coding sequences of pm10. Recombination was important in generating additional diversity at this locus. Although direct selection for pm10 mutations could not be ruled out, the presence of balancing selection and a high density of SNPs in non-coding sequence led us to propose that another gene under selection may be influencing the diversity in the region. By sequencing short DNA tags in a 200 kb region flanking pm10, we show that a cluster of antigen genes, known to be under diversifying selection, may contribute to the observed diversity. We discuss the importance of diversity and local selection effects when choosing drug targets for intervention strategies.

18.
Contrary to the classical view, a large amount of non-coding DNA seems to be selectively constrained in Drosophila and other species. Here, using Drosophila miranda BAC sequences and the Drosophila pseudoobscura genome sequence, we aligned coding and non-coding sequences between D. pseudoobscura and D. miranda, and investigated their patterns of evolution. We found two patterns that have previously been observed in comparisons between Drosophila melanogaster and its relatives. First, there is a negative correlation between intron divergence and intron length, suggesting that longer non-coding sequences may contain more regulatory elements than shorter sequences. Our other main finding is a negative correlation between the rate of non-synonymous substitutions (dN) and codon usage bias (Fop), showing that fast-evolving genes have a lower codon usage bias, consistent with strong positive selection interfering with weak selection for codon usage.

19.
20.
Arquès DG, Lacan J, Michel CJ. Bio Systems, 2002, 66(1-2): 73-92
A new statistical approach using functions based on the circular code correctly classifies more than 93% of bases in protein (coding) genes and non-coding genes of human sequences. Based on this statistical study, a research software package called 'Analysis of Coding Genes' (ACG) has been developed for identifying protein genes in genomes and for determining their frame. Furthermore, ACG also allows an evaluation of the length of protein genes, their position in the genome, their positions relative to one another, and the prediction of internal frames in protein genes.
