Similar articles
 20 similar articles found
1.
We describe a new computer program that identifies conserved secondary structures in aligned nucleotide sequences of related single-stranded RNAs. The program employs a series of hash tables to identify and sort common base-paired helices that are located in identical positions in more than one sequence. The program reports the total number of base-paired helices that are conserved between related sequences and provides detailed information about common helices that have at least one compensating base change. The program is useful in the analysis of large biological sequences. We have used it to examine the number and type of complementary segments (potential base-paired helices) that can be found in common among related random sequences similar in base composition to 16S rRNA from Escherichia coli. Two types of random sequences were analyzed. One set consisted of independent sequences with the same mononucleotide composition as the 16S rRNA. The second set contained sequences that were 80% similar to one another. The two types of random sequences gave different results. When 5 sequences that were 80% similar to one another were analyzed, significant numbers of potential helices with two or more independent base changes were observed. When 5 independent sequences were analyzed, no potential helices were found in common. The results of the analyses with random sequences were compared with the number and type of helices found in the phylogenetic model of the secondary structure of 16S ribosomal RNA. Many more helices are conserved among the ribosomal sequences than are found in common among similar random sequences. In addition, conserved helices in the 16S rRNAs are, on average, longer than the complementary segments found in comparable random sequences. The significance of these results and their application in the analysis of long non-ribosomal nucleotide sequences are discussed.

2.
3.
The exact distribution of word counts in random sequences and several approximations have been proposed in the past few years. The exact distribution has no theoretical limit but may require prohibitive computation time. On the other hand, approximate distributions can be rapidly calculated but, in practice, are only accurate under specific conditions. After making a survey of these distributions, we compare them according to both their accuracy and computational cost. Rules are suggested for choosing between Gaussian approximations, compound Poisson approximation, and exact distribution. This work is illustrated with the detection of exceptional words in the phage Lambda genome.

4.
5.
6.
Summary. We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
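
The binary encoding and run test described in this abstract can be illustrated with a short sketch. It uses the textbook Wald-Wolfowitz runs z-score, which is approximately N(0,1) for random sequences; this is an assumption, since the abstract does not give Mood's formula and the exact form and sign convention of r_o may differ. The function names `binary_encode` and `runs_z_score` are hypothetical.

```python
import math


def binary_encode(sequence, residues_of_interest):
    """Map residues of interest to 1 and all other residues to 0."""
    return [1 if aa in residues_of_interest else 0 for aa in sequence]


def runs_z_score(bits):
    """Standardized number of runs (Wald-Wolfowitz form, assumed here).

    Negative values mean fewer runs than expected, i.e. clustering;
    positive values mean the two symbols alternate more than expected.
    """
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both symbols must occur at least once")
    # A new run starts wherever two neighbouring symbols differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for a meaningful variance")
    return (runs - mean) / math.sqrt(var)


clustered = binary_encode("AAAABBBB", {"A"})
alternating = binary_encode("ABABABAB", {"A"})
print(runs_z_score(clustered), runs_z_score(alternating))
```

In this form a strongly clustered residue type gives a negative score and a strictly alternating one gives a positive score, so the two toy sequences fall on opposite sides of zero.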

7.
An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences.
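
The quantity studied in this abstract, the length of the longest word shared by two sequences, can be computed directly and its distribution estimated by simulation. The sketch below is not the paper's analytical approximation. It is a Monte Carlo stand-in that assumes independent, equiprobable bases, and the names `longest_common_word` and `simulate_lengths` are hypothetical.

```python
import random


def longest_common_word(a, b):
    """Length of the longest contiguous word present in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def simulate_lengths(n, m, trials, seed=0):
    """Empirical counts of the longest common word length for random DNA."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(n))
        b = "".join(rng.choice("ACGT") for _ in range(m))
        k = longest_common_word(a, b)
        counts[k] = counts.get(k, 0) + 1
    return counts


print(longest_common_word("ACGTACGT", "TTACGA"))
print(sorted(simulate_lengths(100, 100, 200).items()))
```

An observed match between two real sequences could then be compared with the simulated counts, although neighbouring bases in real sequences are not independent, as the abstract notes.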

8.
9.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.

10.
Proteins of related functions are often similar in sequence, reflecting a common phylogenetic origin. Proteins with no known homology are probably diversified proteins, too distantly related to known sequences in databases to retain significant similarity. All proteins, however, probably share common ancestries if one moves far enough back in evolution; therefore, given the huge accumulation of protein sequences in current databases, it could be expected that some proteins with no obvious sequence resemblance to any other share some residues that could represent footprints of ancient common ancestries. To identify such putative footprints, we have searched for short stretches of amino acids present in a given protein sequence that are also found in a significant number of nonrelated proteins in the database. The significantly high frequency of occurrence of these patterns in the database would support a common evolutionary source, and a diversity of non-related proteins that contain the pattern would express their ancient origin. Using this strategy, significant patterns were found in actual exons, but not in randomized amino acid sequences, nor in translated sequences of noncoding DNA, suggesting that this strategy actually leads to the identification of patterns with a biological significance. These significant patterns are not randomly positioned along the sequences analyzed, but they tend to accumulate within specific regions, producing a profile of discrete domains. In some well-known proteins analyzed in this study, some of these domains are coincident with known motifs. Thus, the procedure described in this paper could be useful for identifying ancient patterns and domains in protein sequences, some of which could also have a functional or structural significance.

11.
The aim of this paper is to summarize and to compare some known mathematical models of orientation perception in random dot patterns and to propose new solutions of this question. The model adequacy is judged from the previously obtained experimental results. Apart from the models based on some simple function of the coordinates of dots forming a pattern, also models derived from the so-called image function of the pattern are analysed. The latter ones were found more flexible to render the different features of experimentally obtained data, mainly in the case of orientation ambiguity for some special patterns. The stochastic variants of the deterministic models are introduced.

12.
This article introduces a new method to represent bone surface geometry for simulations of joint contact. The method uses the inner product of two basis functions to provide a mathematical representation of the joint surfaces. This method guarantees a continuous transition in the direction of the surface normals, an important property for computation of joint contact. Our formulation handles experimental data that are not evenly distributed, a common characteristic of digitized data of musculoskeletal morphologies. The method makes it possible to represent highly curved surfaces, which are encountered in many anatomical structures. The accuracy of this method is demonstrated by modeling the human knee joint. The mean relative percentage error in the representation of the patellar track surface was 0.25% (range 0-1.56%) which corresponded to an absolute error of 0.17 mm (range 0-0.16 mm).

13.
Methods for calculating the probabilities of finding patterns in sequences
This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences. A higher-level structure, the pattern, is defined as a list of motifs. A pattern also specifies the permitted ranges of spacing allowed between its constituent motifs. Equations for calculating the expected numbers of matches to patterns are given. Received on March 1, 1988; accepted on September 30, 1988

14.

Aim

Coastal fishes have a fundamental role in marine ecosystem functioning and contributions to people, but face increasing threats due to climate change, habitat degradation and overexploitation. The extent to which human pressures are impacting coastal fish biodiversity in comparison with geographic and environmental factors at large spatial scale is still under scrutiny. Here, we took advantage of environmental DNA (eDNA) metabarcoding to investigate the relationship between fish biodiversity, including taxonomic and genetic components, and environmental but also socio-economic factors.

Location

Tropical, temperate and polar coastal areas.

Time period

Present day.

Major taxa studied

Marine fishes.

Methods

We analysed fish eDNA in 263 stations (samples) in 68 sites distributed across polar, temperate and tropical regions. We modelled the effect of environmental, geographic and socio-economic factors on α- and β-diversity. We then computed the partial effect of each factor on several fish biodiversity components using taxonomic molecular units (MOTU) and genetic sequences. We also investigated the relationship between fish genetic α- and β-diversity measured from our barcodes, and phylogenetic but also functional diversity.

Results

We show that fish eDNA MOTU and sequence α- and β-diversity have the strongest correlation with environmental factors on coastal ecosystems worldwide. However, our models also reveal a negative correlation between biodiversity and human dependence on marine ecosystems. In areas with high dependence, diversity of all fish, cryptobenthic fish and large fish MOTUs declined steeply. Finally, we show that a sequence diversity index, accounting for genetic distance between pairs of MOTUs, within and between communities, is a reliable proxy of phylogenetic and functional diversity.

Main conclusions

Together, our results demonstrate that short eDNA sequences can be used to assess climate and direct human impacts on marine biodiversity at large scale in the Anthropocene and can further be extended to investigate biodiversity in its phylogenetic and functional dimensions.

15.
A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a ‘basic’ one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, given the common segments are present in the basic sequence). The running time of the presented algorithm is proportional to n·L^2 when n sequences of length L are analyzed. The proposed algorithm is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures as well as the dependence of the results on the choice of basic sequence are discussed.

16.
17.
Cells containing pathogenic mutations in mitochondrial DNA (mtDNA) generally also contain the wild-type mtDNA, a condition called heteroplasmy. The amount of mutant mtDNA in a cell, called the heteroplasmy level, is an important factor in determining the amount of mitochondrial dysfunction and therefore the disease severity. mtDNA is inherited maternally, and there are large random shifts in heteroplasmy level between mother and offspring. Understanding the distribution in heteroplasmy levels across a group of offspring is an important step in understanding the inheritance of diseases caused by mtDNA mutations. Previously, our understanding of the heteroplasmy distribution has been limited to just the mean and variance of the distribution. Here we give equations, adapted from the work of Kimura on random genetic drift, for the full mtDNA heteroplasmy distribution. We describe how to use the Kimura distribution in mitochondrial genetics, and we test the Kimura distribution against human, mouse, and Drosophila data sets.

18.
The task-oriented groups considered here consist of a number of individuals, each having initially one piece of information which must be transmitted to all the others to complete the task. Interest is centered on the communication net which restricts the possible channels for messages. At every sending time each individual sends all the information he has acquired to one other individual; the major assumption here is that this recipient is chosen at random from the possibilities given by the communication net. The information state is defined as a matrix which shows where the initial information has spread. These matrices can be considered as the states of a Markov chain, and in this way the distribution of completion times for the task is obtained. Some special cases are worked out and generalizations are indicated. A proof is given of the formula for the shortest possible completion time in any net when a fixed number of messages is sent by each individual at each sending time.
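
The sending rule in this abstract lends itself to a small simulation. The sketch below estimates completion times empirically rather than through the Markov chain analysis the paper describes. It assumes that all individuals send simultaneously at each sending time and that recipients are chosen uniformly at random. The name `completion_time` and the dictionary form of the net are hypothetical.

```python
import random


def completion_time(net, rng, max_steps=10_000):
    """Number of sending times until everyone holds every piece of information.

    `net` maps each individual to a non-empty list of permitted recipients.
    """
    people = list(net)
    everything = set(people)
    # Each individual starts with only their own piece of information.
    known = {p: {p} for p in people}
    for step in range(1, max_steps + 1):
        # Simultaneous sending: messages carry what was known before this step.
        snapshot = {p: set(info) for p, info in known.items()}
        for sender in people:
            receiver = rng.choice(net[sender])
            known[receiver] |= snapshot[sender]
        if all(known[p] == everything for p in people):
            return step
    raise RuntimeError("task not completed within max_steps")


# A three-person ring in which everyone may send to either of the others.
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(1)
times = [completion_time(ring, rng) for _ in range(1000)]
print(sum(times) / len(times))
```

Repeating the simulation many times gives an empirical distribution of completion times for a given net, which can be set against exact results for small cases.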

19.
20.