Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Investigating extended regulatory regions of genomic DNA sequences.   Total citations: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Despite the growing volume of data on primary nucleotide sequences, the regulatory regions remain a major puzzle with regard to their function. Numerous recognition programs that consider a diversity of properties of regulatory regions have been developed. The system proposed here allows the specific contextual, conformational and physico-chemical properties to be revealed based on analysis of extended DNA regions. RESULTS: The Internet-accessible computer system RegScan, designed to analyse the extended regulatory regions of eukaryotic genes, has been developed. The computer system comprises the following software: (i) programs for classification dividing a set of promoters into TATA-containing and TATA-less promoters and promoters with and without CpG islands; (ii) programs for constructing (a) nucleotide frequency profiles, (b) sequence complexity profiles and (c) profiles of conformational and physico-chemical properties; (iii) the program for constructing the sets of degenerate oligonucleotide motifs of a specified length; and (iv) the program searching for and visualising repeats in nucleotide sequences. The system has allowed us to demonstrate the following characteristic patterns of vertebrate promoter regions: the TATA box region is flanked by regions with an increased G+C content and increased bending stiffness, the TATA box content is asymmetric and promoter regions are saturated with both direct and inverted repeats. AVAILABILITY: The computer system RegScan is available via the Internet at http://www.mgs.bionet.nsc.ru/Systems/RegScan, http://www.cbil.upenn.edu/mgs/systems/regscan/.
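
One of the profile types mentioned above, G+C content along a sequence, can be illustrated with a sliding window. This is only a minimal sketch and not RegScan's implementation: the function name `gc_profile` and the `window` and `step` defaults are assumptions chosen for illustration.

```python
def gc_profile(seq: str, window: int = 50, step: int = 10) -> list[tuple[int, float]]:
    """Return (window start, G+C fraction) pairs along a nucleotide sequence.

    Hypothetical helper: window and step sizes are illustrative assumptions,
    not parameters taken from RegScan.
    """
    seq = seq.upper()
    profile = []
    # Slide a fixed-size window over the sequence; incomplete windows are skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = sum(base in "GC" for base in chunk)
        profile.append((start, gc_count / window))
    return profile
```

Plotting such a profile around an aligned TATA box position is one way to look for the flanking G+C enrichment that the abstract reports.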

3.
This paper describes a computer method that uses codon preference to help find protein coding regions in long DNA sequences. The method can distinguish between introns and exons and can help to detect sequencing errors.

4.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

5.
Computer analysis of DNA and protein sequences.   Total citations: 2 (self-citations: 0; citations by others: 2)
Some recent trends in the development of theoretical methods for DNA and protein sequence analysis are reviewed, with particular emphasis on the design of new databases, motif searches, sequence alignment algorithms and applications of neural networks.

6.
P McCaldon, P Argos. Proteins, 1988, 4(2): 99-122
We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. All sequences in the Protein Identification Resource database were carefully classified by sensitive homology searches into superfamilies to obtain unbiased oligopeptide counts. The results, contrary to previous studies, show clear prejudices in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies and to improve the more general methods for detecting protein coding regions within nucleotide sequences.

7.
A method for identifying significant conservative and variable regions in homologous protein sequences is presented. A set of aligned homologous sequences is divided into two groups consisting of the m and n most related sequences. Each pair of sequences from different groups is compared using a unitary similarity matrix. The superposition of pairwise comparisons scanned by a window of 10 amino acid residues gives the intergroup local variability profile (VP). Area S of the figure between the VP and its mean value line is compared with the averaged area S(r) of 1000 VPs of artificial homologous protein families. The difference (S-S(r)), given in standard deviation units sigma r, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences: OI = (S-S(r))/sigma r. If OI is greater than 2, the real VP extrema containing the surplus of area S-(S(r) + 2 sigma r) are cut off. The cut-off stretches are likely to be significant conservative and variable regions. The significant conservative and variable regions of six homologous sequence families (phospholipases A2, cytochromes b, alpha-subunits of Na, K-ATPase, L- and M-subunits of photosynthetic bacteria photoreaction centre and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural proteins the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different length L, the value of the standard substitution overall irregularity for L = 250 is proposed.
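
The quantities in this abstract can be restated compactly. The notation is an assumption made here for readability (S_r stands for S(r) and \sigma_r for "sigma r"); the proportionality constant in the last relation is not given in the abstract and is left unspecified.

```latex
\[
  \mathrm{OI} = \frac{S - S_r}{\sigma_r},
  \qquad
  \text{if } \mathrm{OI} > 2:\ \text{cut off the surplus area } S - \left(S_r + 2\sigma_r\right),
  \qquad
  \mathrm{OI} \propto \sqrt{k}.
\]
```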

8.
Melon (Cucumis melo) satellite DNA consists of two components, Q and S, each with a buoyant density in CsCl of 1.707 g/ml, but differing by 9 degrees C in "melting" temperature. These physical properties appear to be in contradiction, since both depend on G + C content. In order to resolve this anomaly, base compositions were directly determined for isolated fractions. The low-"melting" component S contains 41.8% G + C, with 6% of C present as 5-methylcytosine, whereas Q DNA contains 54% G + C, with 41% of C methylated. Analyses of restriction site loss agreed well with the direct determinations of methylation and divergence, and indicated some clustering of methylated sites in Q DNA. Analysis of restricted main-band DNA by hybridization with RNA complementary to Q satellite DNA ("Southern transfer") showed satellite Q tandem arrays interspersed in DNA of main-band density. Sequence divergence and extent of methylation did not appear to depend on whether a repeat array was present as satellite or interspersed in main-band DNA. Hybridization in situ indicated considerable heterogeneity in the genomic proportion of the Q-DNA sequences in melon fruit nuclei, implying over- and under-representation consistent with extensive unequal recombination in satellite Q tandem arrays. The cucumber, Cucumis sativus, contains less than 8% as much Q-homologous DNA per genome as the melon, suggesting rapid evolutionary gain or loss of these tandem repeat sequences.

9.
We propose a new approach to studying protein coding and non-coding regions in DNA sequences that makes use of two complementary statistical methods. Principal component analysis (PCA) is a graphical method for representing DNA sequences characterized by some quantitative parameters; it serves as an aid to intuition. Discriminant analysis (DA) is a quantitative method that permits classification of the DNA sequences. It leads to an evaluation of the first method and to a decision. The value of this approach is confirmed by the fact that we recovered some results recently described in the literature. Furthermore, this general methodology has permitted us to show the existence of parameters which identify the functional domains of nucleic acid sequences, without having to make use of the properties of the genetic code.

10.
DNA sequences of promoter regions for rRNA operons rrnE and rrnA in E. coli.   Total citations: 45 (self-citations: 0; citations by others: 45)
H A de Boer, S F Gilbert, M Nomura. Cell, 1979, 17(1): 201-209

11.
We present an algorithm to detect protein sub-structural motifs from primary sequence. The input to the algorithm is a set of aligned multiple protein sequences. It uses wavelet transforms to decompose protein sequences represented numerically by different indices (such as polarity, accessible surface area or electron-ion interaction potentials of the amino acids). The numerical representation of a protein sequence has significant correlation with its biological activity, thus common motifs are expected to be observable from the wavelet spectrum. The decomposed signals are then up-sampled and similarity search techniques are used to identify similar regions across all the proteins at multiple scales. Results indicate that wavelet transform techniques are a promising approach for rapid motif detection.

12.

Background  

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.

13.
14.
15.
The DNA at human centromeric regions was characterized by using a repetitive sequence, 308, which localizes in situ exclusively to centromeres of all chromosomes. We previously noted that this sequence is enriched on chromosome 6 and has chromosome-specific organization on 6, 3, 7, 14, X, and Y. In addition to this basic organization, sequences homologous to 308 are polymorphic among normal individuals. The variants are transmitted in a Mendelian manner within a family. To determine the chromosome origin of the variants, we studied their linkage to markers of various chromosomes. Linkage analysis of one pedigree segregating two polymorphisms shows that the 2.6-kilobase (kb) BamHI and 2.6-kb TaqI fragments are linked to each other and to the HLA loci on chromosome 6. Data from another family show that 2.8-kb TaqI, 4.0-kb TaqI, and 1.3-kb BamHI polymorphic fragments are linked and are probably near the Fy locus on chromosome 1. By dot blot analysis, we determined that the relative amount of these sequences in the genome is not measurably different between unrelated individuals. Thus, the polymorphisms represent changes in homologous 308 sequences on specific chromosomes and can be used as chromosome-specific markers. Linkage studies using polymorphisms of repeated sequences will be most useful within a kindred, especially from an inbred population, because polymorphic repeats of the same restriction size may be heterogeneous in origin.

16.
MOTIVATION: Pair-wise alignment of protein sequences and local similarity searches produce many false positives because of compositionally biased regions, also called low-complexity regions (LCRs), of amino acid residues. Masking and filtering such regions significantly improves the reliability of homology searches and, consequently, functional predictions. Most of the available algorithms are based on a statistical approach. We wished to investigate the structural properties of LCRs in biological sequences and develop an algorithm for filtering them. RESULTS: We present an algorithm for detecting and masking LCRs in protein sequences to improve the quality of database searches. We developed the algorithm based on the complexity analysis of subsequences delimited by a pair of identical, repeating subsequences. Given a protein sequence, the algorithm first computes the suffix tree of the sequence. It then collects repeating subsequences from the tree. Finally, the algorithm iteratively tests whether each subsequence delimited by a pair of repeating subsequences meets a given criterion. Test results with 1000 proteins from 20 families in Pfam show that the repeating subsequences are a good indicator of low-complexity regions, and that the algorithm based on such structural information competes strongly with others. AVAILABILITY: http://bioinfo.knu.ac.kr/research/CARD/ CONTACT: swshin@bioinfo.knu.ac.kr
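
For contrast with the suffix-tree method described above, the kind of statistical filter that the abstract says most existing tools rely on can be sketched with windowed Shannon entropy. This is a swapped-in baseline, not the authors' algorithm; the function names, the window length of 12, the entropy threshold of 2.2 bits and the mask character "X" are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues in low-entropy windows with 'X'.

    Illustrative statistical baseline only; window and threshold are
    assumed values, not parameters from the paper.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        # A window whose composition entropy falls below the threshold is masked whole.
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)
```

A homopolymer run has entropy 0 and is always masked, while a window in which every residue differs has the maximum entropy for its length and is left untouched.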

17.
18.
A homogeneous region in a protein sequence is a set of contiguous residues that share common features, concerning physico-chemical, structural and mutational information. This paper presents a method for identifying such homogeneous regions. From a profile describing a given type of biological information along the sequence, the algorithm allows the segmentation of the sequence by optimizing a criterion characterized by two user-defined control parameters: the 'homogenizing degree' of the regions and the 'site neighbourhood' size. We apply the method to the envelope proteins of the human immunodeficiency virus HIV-1, for the identification of homogeneous regions in a hydrophobicity profile and the delineation of variable and conserved regions in a variability profile.

19.
This paper describes a comprehensive program for translating one or two DNA sequences into amino acid sequences. Written in FORTRAN, it was designed for maximum flexibility of use and easy maintenance, modification and portability. It has full comments throughout.
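
The core operation of such a program, translating a DNA sequence codon by codon, can be sketched as follows. This is not the FORTRAN program described above: it is a minimal Python illustration that assumes the standard genetic code, and the names `CODON_TABLE` and `translate`, the `frame` parameter and the use of "X" for unrecognized codons are hypothetical choices.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string in the given reading frame (0, 1 or 2).

    Stop codons become '*'; codons with unrecognized bases become 'X'.
    Trailing bases that do not form a full codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Calling `translate` once per sequence and per frame covers the "one or two DNA sequences" case in a simple way; a fuller tool would also handle the reverse strand and alternative genetic codes.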

20.
He-T DNA is a complex set of repeated DNA sequences with sharply defined locations in the polytene chromosomes of Drosophila melanogaster. He-T sequences are found only in the chromocenter and in the terminal (telomere) band on each chromosome arm. Both of these regions appear to be heterochromatic and He-T sequences are never detected in the euchromatic arms of the chromosomes (Young et al. 1983). In the study reported here, in situ hybridization to metaphase chromosomes was used to study the association of He-T DNA with heterochromatic regions that are under-replicated in polytene chromosomes. Although the metaphase Y chromosome appears to be uniformly heterochromatic, He-T DNA hybridization is concentrated in the pericentric region of both normal and deleted Y chromosomes. He-T DNA hybridization is also concentrated in the pericentric regions of the autosomes. Much lower levels of He-T sequences were found in pericentric regions of normal X chromosomes; however compound X chromosomes, constructed by exchanges involving Y chromosomes, had large amounts of He-T DNA, presumably residual Y sequences. The apparent co-localization of He-T sequences with satellite DNAs in pericentric heterochromatin of metaphase chromosomes contrasts with the segregation of satellite DNA to alpha heterochromatin while He-T sequences hybridize to beta heterochromatin in polytene nuclei. This comparison suggests that satellite sequences do not exist as a single block within each chromosome but have interspersed regions of other sequences, including He-T DNA. If this is so, we assume that the satellite DNA blocks must associate during polytenization, leaving the interspersed sequences looped out to form beta heterochromatin. DNA from D. melanogaster has many restriction fragments with homology to He-T sequences. Some of these fragments are found only on the Y. Two of the repeated He-T family restriction fragments are found entirely on the short arm of the Y, predominantly in the pericentric region. Under conditions of moderate stringency, a subset of He-T DNA sequences cross-hybridizes with DNA from D. simulans and D. miranda. In each species, a large fraction of the cross-hybridizing sequences is on the Y chromosome.
