Similar literature
Found 20 similar articles; search took 15 ms
1.
MOTIVATION: The completion of the Arabidopsis genome offers the first opportunity to analyze all of the membrane protein sequences of a plant. The majority of integral membrane proteins, including transporters, channels, and pumps, contain hydrophobic alpha-helices and can be selected based on TransMembrane Spanning (TMS) domain prediction. By clustering the predicted membrane proteins based on sequence, it is possible to sort the membrane proteins into families of known function, based on experimental evidence or homology, or unknown function. This provides a way to identify target sequences for future functional analysis. RESULTS: An automated approach was used to select potential membrane protein sequences from the set of all predicted proteins and cluster the sequences into related families. The recently completed sequence of Arabidopsis thaliana, a model plant, was analyzed. Of the 25,470 predicted protein sequences, 4589 (18%) were identified as containing two or more membrane spanning domains. The membrane protein sequences clustered into 628 distinct families containing 3208 sequences. Of these, 211 families (1764 sequences) either contained proteins of known function or showed homology to proteins of known function in other species. However, 417 families (1444 sequences) contained only sequences with no known function and no homology to proteins of known function. In addition, 1381 sequences did not cluster with any family and no function could be assigned to 1337 of these.
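The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers stated in the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract; names are illustrative, not from the paper.
TOTAL_PREDICTED = 25_470   # all predicted protein sequences
MEMBRANE = 4_589           # sequences with two or more membrane spanning domains
UNCLUSTERED = 1_381        # membrane sequences that joined no family
FAMILIES = {
    "known_function": (211, 1_764),    # (families, sequences)
    "unknown_function": (417, 1_444),
}


def summarize():
    """Re-derive the totals that the abstract reports."""
    n_families = sum(f for f, _ in FAMILIES.values())
    n_clustered = sum(s for _, s in FAMILIES.values())
    return {
        "families": n_families,
        "clustered_sequences": n_clustered,
        "accounted_for": n_clustered + UNCLUSTERED,
        "membrane_percent": 100 * MEMBRANE / TOTAL_PREDICTED,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

The family and sequence subtotals add up to the 628 families and 3208 sequences reported, and clustered plus unclustered sequences recover the 4589 membrane proteins.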

2.

Background  

When several hundred or several thousand sequences are aligned to reconstruct an epidemiological history or a phylogeny, for example epidemic virus sequences or homologous/orthologous sequences from large gene families, analyzing and visualizing the resulting alignment becomes a new challenge for computational biologists. Although several tools are available for visualizing very long sequence alignments, few of them are applicable to alignments of many sequences.

3.
A catalogue of splice junction sequences.   Total citations: 763 (self-citations: 139; citations by others: 624)
Splice junction sequences from a large number of nuclear and viral genes encoding protein have been collected. The sequence (C/A)AG/GT(A/G)AGT was found to be a consensus of 139 exon-intron boundaries (or donor sequences) and (T/C)nN(C/T)AG/G was found to be a consensus of 130 intron-exon boundaries (or acceptor sequences). The possible role of splice junction sequences as signals for processing is discussed.
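As a rough illustration of how a donor consensus can be used to scan a sequence, the sketch below reads the consensus as (C/A)AG/GT(A/G)AGT, with the slash between AG and GT marking the exon-intron boundary. That reading of the stacked alternatives is an assumption, and `find_donor_sites` is a hypothetical helper, not a tool from the paper. Real donor sites often deviate from the consensus, so an exact-match scan like this would miss many genuine sites.

```python
import re

# Assumed reading of the donor consensus: (C/A)AG | GT(A/G)AGT.
# A lookahead is used so that overlapping candidate sites are all reported.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")


def find_donor_sites(seq):
    """Return 0-based indices of the first intron base (the G of GT)
    for every exact match to the assumed donor consensus."""
    seq = seq.upper()
    # The match starts at the (C/A); the boundary lies three bases later.
    return [m.start() + 3 for m in DONOR.finditer(seq)]


if __name__ == "__main__":
    print(find_donor_sites("TTCAGGTAAGTCC"))
```

A position weight matrix would be the more usual choice for scoring near-matches; the exact-match regular expression is used here only because it keeps the example short.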

4.
We consider the effects of fully or partially random sequences on the estimation of four-taxon phylogenies. Fully or partially random sequences occur when whole subsets of sequences or some sites for subsets of sequences are independent of sequence data for the other taxa. Random sequences can be a consequence of misalignment or can arise because sites evolve at very fast rates in some portions of a tree, a situation that occurs especially in analyses involving deep divergence times. One might reasonably speculate that random sites will only add noise to the estimation of a phylogeny. We show that when a random sequence is added to a three-taxon alignment, it is more likely to be a neighbor of the sequence corresponding to the longest branch in the three-taxon tree. Surprisingly, when only about half of the sites show randomness, a long-branch-repels form of small sample bias occurs, and when a minority of sites show randomness this becomes a long-branch-attraction bias again. The most serious bias, one that does not vanish with increasing sequence length, occurs when more than one sequence is partially random. If there is a large amount of overlap in the random sites for two sequences, those two sequences will be attracted to each other; otherwise, they will repel each other. Random sequences or sites can, therefore, cause complicated biases in phylogenetic inference. We suggest performing analyses with and without potentially saturated sequences and/or misaligned sites, to check that these biases are not affecting the inferred branching pattern. [Reviewing Editor: Dr. J. Rasmus Nielson]

5.
Monte Carlo simulations are useful for verifying the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful for estimating the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences of a chosen length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphical interface makes it easy to set the parameters that characterize the sequences, which are produced and saved as a text file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
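The core idea, drawing a random sequence of a given length with a specified nucleotide composition, can be sketched in a few lines. This is not RANDNA's implementation: it uses Python's built-in Mersenne Twister rather than the Borland Delphi 6 generator, and `random_sequence` and its parameters are hypothetical names. Each base is drawn independently, so the requested composition is matched in expectation rather than exactly.

```python
import random


def random_sequence(length, composition, seed=None):
    """Draw a random sequence whose bases follow `composition`.

    composition: mapping of base -> relative weight (e.g. percentages).
    seed: optional seed so that a run can be reproduced.
    """
    rng = random.Random(seed)
    bases = list(composition)
    weights = [composition[b] for b in bases]
    # Each position is sampled independently with the given weights.
    return "".join(rng.choices(bases, weights=weights, k=length))


if __name__ == "__main__":
    at_rich = {"A": 30, "C": 20, "G": 20, "T": 30}
    print(random_sequence(60, at_rich, seed=1))
```

For RNA, the same function can be called with a mapping that uses "U" in place of "T". If an exact composition is required, shuffling a pre-built list of bases would be the alternative design.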

6.
DNA condensation with polyamines. II. Electron microscopic studies   Total citations: 24 (self-citations: 0; citations by others: 24)
Approximately 75% of the wheat and rye genomes consist of repeated sequence DNA. Three-quarters of the non-repeated or few copy sequences in wheat are less than 1000 base-pairs long, whilst in rye approximately half of the non-repeated or few copy sequences are in this size class. Most of the remaining non-repeated or few copy sequences appear to be a few thousand base-pairs long. In this paper a somewhat novel approach has been used to quantitatively analyse the linear organisation of the large proportion of repeated sequence DNA as well as the non-repeated DNA in the wheat and rye genomes. Repeated sequences in the genomes of oats, barley, wheat and rye have been used as probes to distinguish and isolate four different groups of repeated sequences and their neighbouring sequences from the wheat and rye genomes. Radioactively labelled wheat or rye DNA fragments ranging from 200 to over 9000 nucleotides long were incubated separately with large excesses of denatured unlabelled oats, barley, wheat and rye DNAs to Cot values which enable all the repeated sequences of the unlabelled DNA to renature. The following parameters were then determined from the proportions of total labelled DNA in fragments which had at least partially renatured: (1) the proportions of the repeated sequences in the labelled DNAs that were able to hybridise to each unlabelled DNA; (2) the mean distance apart of the hybridising sequences on the longer labelled fragments; and (3) the proportion of the genome in which the hybridising sequences were concentrated.
Analysis of these results, together with those of separate experiments designed to quantitatively estimate the nature of sequences unable to reanneal with the repeated sequences of each of the probe DNAs, has enabled schematic maps to be drawn which show how the repeated and non-repeated sequences are arranged in the wheat and rye genomes. Both genomes are constructed from millions of relatively short sequences, most of them considerably shorter than 3000 base-pairs. This structure was recognised because adjacent sequences can be distinguished by their frequency of repetition (i.e. repeated or non-repeated) or by their evolutionary origin. Approximately 40 to 45% of the wheat genome and 30 to 35% of the rye genome consists of short non-repeated sequences interspersed between short repeated sequences. Approximately 50% of the wheat genome and 60% of the rye genome consists of tandemly arranged repeated sequences of different evolutionary origins. It is postulated that much of this complex repeated sequence DNA could have arisen from amplification of compound sequences, each containing repeated and non-repeated sequence DNA. Short repeated sequences with a number average length of around 200 base-pairs and which occupy about 20% of the wheat and rye genomes are related to repeated sequences also found in oats and barley. They are concentrated in 60 to 70% of the wheat and rye genomes, being interspersed with different short repeated sequences and a significant proportion of the short non-repeated sequences. Rye chromosomes contain more DNA than wheat chromosomes. This is principally, but not entirely, due to additional repeated sequence DNA. Many quantitative changes appear to have occurred in both genomes, possibly affecting most families of repeated sequences, since wheat and rye diverged from a common ancestor.
Both species contain species-specific repeated sequences (24% of rye genome; 16% of wheat genome) but a large proportion of these are closely interspersed with repeated sequences found in both genomes.

7.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and the graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. Phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphical representations, as well as to general two-dimensional shape matching.
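For orientation, the classical Hausdorff distance between two finite point sets, which the Yau-Hausdorff distance builds on, can be sketched as follows. This is only the plain Hausdorff distance: it does not minimize over translations and rotations as the abstract describes, and it does not implement the paper's graphical representation of sequences. The function names are illustrative, and `math.dist` assumes Python 3.8 or later.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    curve_1 = [(0, 0), (1, 1), (2, 0)]
    curve_2 = [(0, 0), (1, 2), (2, 0)]
    print(hausdorff(curve_1, curve_2))
```

This brute-force version compares every pair of points, which is adequate for short curves but quadratic in the number of points.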

8.
9.
Selective amplification in PCR is principally determined by the sequence of the primers and the temperature of the annealing step. We have developed a new PCR technique for distinguishing related sequences in which additional selectivity is dependent on sequences within the amplicon. A 5′ extension is included in one (or both) primer(s) that corresponds to sequences within one of the related amplicons. After copying and incorporation into the PCR product, this sequence is able to loop back, anneal to the internal sequences and prime to form a hairpin structure; this structure is then refractory to further amplification. Thus, amplification of sequences containing a perfect match to the 5′ extension is suppressed, while amplification of sequences containing mismatches or lacking the sequence is unaffected. We have applied Headloop PCR to DNA that had been bisulphite-treated for the selective amplification of methylated sequences of the human GSTP1 gene in the presence of up to a 10^5-fold excess of unmethylated sequences. Headloop PCR has potential for clinical application in the detection of differently methylated DNAs following bisulphite treatment, as well as for selective amplification of sequence variants or mutants in the presence of an excess of closely related DNA sequences.

10.
Informativeness of human (dC-dA)n.(dG-dT)n polymorphisms   Total citations: 133 (self-citations: 0; citations by others: 133)
J L Weber. Genomics 1990, 7(4): 524-530
Abundant human interspersed repetitive DNA sequences of the form (dC-dA)n.(dG-dT)n have been shown to exhibit length polymorphisms. Examination of over 100 human (dC-dA)n.(dG-dT)n sequences revealed that the sequences differed from each other both in numbers of repeats and in repeat sequence type. Using a set of precise classification rules, the sequences were divided into three categories: perfect repeat sequences without interruptions in the runs of CA or GT dinucleotides (64% of total), imperfect repeat sequences with one or more interruptions in the run of repeats (25%), and compound repeat sequences with adjacent tandem simple repeats of a different sequence (11%). Informativeness of (dC-dA)n.(dG-dT)n markers in the perfect sequence category was found to increase with increasing average numbers of repeats. PIC values ranged from 0 at about 10 or fewer repeats to above 0.8 for sequences with about 24 or more repeats. (dC-dA)n.(dG-dT)n polymorphisms in the imperfect sequence category showed lower informativeness than expected on the basis of the total numbers of repeats. The longest run of uninterrupted CA or GT repeats was found to be the best predictor of informativeness of (dC-dA)n.(dG-dT)n polymorphisms regardless of the repeat sequence category.
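The PIC (polymorphism information content) values mentioned above are computed from allele frequencies. The abstract does not state the formula; the sketch below assumes the standard definition of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and `pic` is an illustrative function name.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for a list of allele frequencies.

    Assumes the Botstein et al. (1980) definition; frequencies must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term for matings that are uninformative despite heterozygosity.
    cross = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    print(pic([1.0]))          # a monomorphic marker carries no information
    print(pic([0.5, 0.5]))     # two equally frequent alleles
    print(pic([0.1] * 10))     # ten equally frequent alleles
```

A marker with a single allele has a PIC of 0, and the value rises towards 1 as the number of roughly equally frequent alleles grows, which is consistent with the trend the abstract reports for longer repeat runs.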

11.
Inspection of many proposed recognition signal sequences shows that TGTG/CACA, GAGA/TCTC or their triplet subsets, and TGA/TCA occur frequently. These repeated elements, conserved in recognition sequences from evolutionarily distant organisms, are likely to possess unique structural characteristics. Recurrence of these oligomers may aid in identification of further regulatory sequences in upstream or other regions. Another class of recognition sequences is GC-rich. At present there are only a few examples of this class. It is likely that these sequences function via a different mechanism.

12.
Three clones of non-repetitive sequences and six clones containing repetitive sequences were obtained from micronuclear DNA of Tetrahymena thermophila. All the non-repetitive and three of the repetitive sequences had the same organization in micro- and macronuclear DNAs, as revealed by blot hybridization. The remaining three clones with repetitive sequences, on the other hand, had apparently different organization in the two nuclear DNAs. All these repetitive sequences showed a smear on the blot in addition to a number of discrete bands when micronuclear DNA was digested with EcoRI. In macronuclear DNAs, these sequences invariably became one or two bands and the smear disappeared. We conclude that, when a macronucleus develops from a micronucleus, the non-repetitive sequences are amplified more than 20-fold with relatively little rearrangement, whereas some selected portions of repeated and/or repeat-contiguous sequences are amplified with rather extensive reorganization.

13.
14.
We investigated the ability of rats to recall sequences of nose-poke holes with a modified serial reaction time task. In each trial, a sequence was randomly selected and the position of the first illuminated hole, which functioned as a cue stimulus, informed the rats whether the following sequence was a predictable one or not, based on prior training. The rats responded predictively only when the cues of the predictable sequences were presented. They did not show predictive responses when the cues of unpredictable sequences were presented, even though the unpredictable sequences partially had the same order of holes as the predictable sequences. These results indicate that rats can recall sequences on the basis of the first cue stimulus, which signals whether the sequence is predictable or unpredictable. Recording neuronal activity while rats perform this behavioral task would be useful for elucidating the neuronal mechanisms that mediate sequence recall.

15.
Conserved features of coordinately regulated E. coli promoters   Total citations: 46 (self-citations: 15; citations by others: 31)
E. coli promoters which are coordinately regulated in response to amino acid limitation contain conserved nucleotide sequences immediately 3' to the -10 region. These sequences contain predominantly GC or AT residues, depending on whether the response is negative or positive, respectively. Certain classes of promoters also contain conserved sequences upstream of the primary promoter. In tRNA genes these sequences could act as a secondary polymerase binding site.

16.
Two approaches to the understanding of biological sequences are confronted. While the recognition of particular signals in sequences relies on complex physical interactions, the problem is often analysed in terms of the presence or absence of literal motifs (strings) in the sequence. We present here a test-case for evaluating the potential of this approach. We classify DNA sequences as positive or negative depending on whether they contain a single melted domain in the middle of the sequence, which is a global physical property. Two sets of positive "biological" sequences were generated by a computer simulation of evolutionary divergence along the branches of a phylogenetic tree, under the constraint that each intermediate sequence be positive. These two sets and a set of random positive sequences were subjected to pattern analysis. The observed local patterns were used to construct expert systems to discriminate positive from negative sequences. The experts achieved 79% to 90% success on random positive sequences and up to 99% on the biological sets, while making fewer than 2% errors on negative sequences. Thus, the global constraints imposed on sequences by a physical process may generate local patterns that are sufficient to predict, with a reasonable probability, the behaviour of the sequences. However, rather large sets of biological sequences are required to generate patterns free of illegitimate constraints. Furthermore, depending upon the initial sequence, the sets of sequences generated on a phylogenetic tree may be amenable or refractory to string analysis, while obeying identical physical constraints. Our study clarifies the relationship between experts' errors on positive and negative sequences, and the contributions of legitimate and illegitimate patterns to these errors. The test-case appears suitable both for further investigations of problems in the theory of sequence evolution and for further testing of pattern analysis techniques.

17.
Yersinia species utilize a type III secretion system to inject toxins, called Yops (Yersinia outer proteins), into eukaryotic cells. The N-termini of the Yops serve as type III secretion signals, but they do not share a consensus sequence. To simplify the analysis of type III secretion signals, we replaced amino acids 2-8 of the secreted protein YopE with all permutations (2^7, or 128) of synthetic serine/isoleucine sequences. The results demonstrate that amphipathic N-terminal sequences, containing four or five serine residues, have a much greater probability than hydrophobic or hydrophilic sequences of targeting YopE for secretion. Multiple linear regression analysis of the synthetic sequences was used to obtain a model for N-terminal secretion signals. The model accurately classifies the N-terminal sequences of native type III substrates as efficient secretion signals.
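The size of this synthetic library follows from having two choices (serine or isoleucine) at each of seven positions, 2^7 = 128. A minimal sketch that enumerates the variants and tallies them by serine count is given below; the function names are illustrative, and serine count alone is only a crude stand-in for the amphipathicity the authors analysed.

```python
from itertools import product


def ser_ile_variants(length=7):
    """All serine/isoleucine strings of the given length (positions 2-8 of YopE)."""
    return ["".join(p) for p in product("SI", repeat=length)]


def count_by_serines(variants):
    """Map number of serines -> number of variants with that many serines."""
    counts = {}
    for v in variants:
        n = v.count("S")
        counts[n] = counts.get(n, 0) + 1
    return counts


if __name__ == "__main__":
    variants = ser_ile_variants()
    print(len(variants))
    print(count_by_serines(variants))
```

The tallies are binomial coefficients, so 35 variants carry four serines and 21 carry five, which is 56 of the 128 sequences.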

18.
19.
High-efficiency thermal asymmetric interlaced (HE-TAIL) PCR is a modified thermal asymmetric interlaced (TAIL) method for finding unknown genomic DNA sequences adjacent to known sequences in GC-rich plant DNA. Necessary modifications to obtain high-efficiency amplification of flanking sequences are the inclusion of 2 control reactions during tertiary cycling and the design of long gene-specific primers, which can be used during single-step annealing-extension PCR. The modified protocol is suitable to walk from short known sequences, such as sequence-tagged sites (STS), expressed sequence tags (EST), or short exon sequences, and enables researchers to clone full-length open reading frames (ORFs) without library screening. Moreover, the HE-TAIL method can be used to identify DNA sequences flanking T-DNA insertions or to isolate promoter regions. Although individual steps are limited to about 4 kb, multiple steps can be done to walk upstream or downstream of known regions.

20.
Activation of the transformation potential of the cellular fps gene   Total citations: 27 (self-citations: 0; citations by others: 27)
D A Foster, M Shibuya, H Hanafusa. Cell 1985, 42(1): 105-115
Chicken cellular-fps (c-fps) sequences were substituted for viral-fps (v-fps) sequences in two retroviral genome structures, one that expressed a c-fps gene product that was indistinguishable from the normal c-fps gene product expressed in chicken bone marrow cells, and another that expressed a gag-fps fusion protein. When c-fps gene sequences (without linked gag gene sequences) were expressed at high levels in a viral vector, no transformation of fibroblasts was detected. It was previously demonstrated that the corresponding v-fps sequences could transform fibroblasts. When the same c-fps sequences were expressed in a form linked to gag gene sequences, transformation of fibroblasts and induction of tumors were observed. The data suggest that the c-fps gene product lacks transformation potential by itself even when overexpressed and that the transformation potential of the c-fps gene can be activated by either mutation (or mutations) in the fps coding region or by fusion with viral gag gene sequences.
