Similar articles
Found 20 similar articles (search time: 31 ms)
1.
A new algorithm is introduced for analyzing gene-duplication-independent (orthologous) and gene-duplication-dependent amino acid sequence similarities between proteins of different species. It is based on the calculation of an autocorrelation function D(x) as a Fourier series analogous to that used in crystal analysis by X-ray diffraction. The primary structure of the protein is decomposed into "homopolypeptide-defective sequences" containing identical or similar amino acid residues and vacancies corresponding to the missing amino acid residues. The Fourier transforms F(h) simulating the diffraction patterns of defective linear gratings corresponding to the defective homopolypeptide sequences are calculated. The squared F(h) values are then used as coefficients of Fourier series corresponding to the autocorrelation functions D(x). A peak of D(x) corresponds to a vector of length x, which is the distance between two identical amino acid residues. It is pointed out that optical diffraction methods, instead of computer methods, would also be useful. It is shown through a number of examples that this method allows satisfactory pattern recognition of homologies and internal duplications of an initial segment of the polypeptide chain. In the latter case the value of the above method may be seen from the fact that it detects repeated duplications in proteins such as spinach ferredoxin and myoglobin, for which other methods had either failed or given inconclusive results. The above approach appears most promising for studies of molecular evolution and structure-sequence correlations.
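
The route from F(h) to D(x) described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' program: the function names `dft` and `residue_autocorrelation` are hypothetical, a plain discrete Fourier transform stands in for the Fourier series, and the zero padding (which keeps lags from wrapping around) is an assumption of this sketch.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse is scaled by 1/n)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for h in range(n):
        total = 0j
        for x, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * h * x / n)
        out.append(total / n if inverse else total)
    return out

def residue_autocorrelation(sequence, residue):
    """Return D(x) for x = 0..len(sequence)-1 for one residue type.

    The sequence is reduced to a 0/1 "defective" sequence (1 where the
    residue occurs, 0 for a vacancy), transformed to F(h), and |F(h)|^2
    is transformed back to give the autocorrelation.
    """
    n = len(sequence)
    # Zero padding to length 2n is an assumption that avoids circular wraparound.
    indicator = [1.0 if c == residue else 0.0 for c in sequence] + [0.0] * n
    power = [abs(f) ** 2 for f in dft(indicator)]
    d = dft(power, inverse=True)
    return [round(v.real) for v in d[:n]]

# Residue "A" occurs at positions 0, 3 and 6, so D(x) peaks at x = 3 and x = 6.
print(residue_autocorrelation("ACDACDACD", "A"))
```

Each nonzero D(x) counts the pairs of identical residues separated by x positions, which is the "vector of length x" the abstract refers to.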

2.
Genetic alphabet expansion of DNA by introducing unnatural bases (UBs), as a fifth letter, dramatically augments the affinities of DNA aptamers that bind to target proteins. To determine whether UB-containing DNA (UB-DNA) aptamers obtained by affinity selection could spontaneously achieve high specificity, we have generated a series of UB-DNA aptamers (KD: 27–182 pM) targeting each of four dengue non-structural protein 1 (DEN-NS1) serotypes. The specificity of each aptamer is remarkably high, and the aptamers can recognize the subtle variants of DEN-NS1 with at least 96.9% amino acid sequence identity, beyond the capability of serotype identification (69–80% sequence identities). Our UB-DNA aptamers specifically identified two major variants of dengue serotype 1 with 10-amino acid differences in the DEN-NS1 protein (352 aa) in Singaporeans’ clinical samples. These results suggest that the high-affinity UB-DNA aptamers generated by affinity selection also acquire high target specificity. Intriguingly, one of the aptamers contained two different UBs as fifth and sixth letters, which are essential for the tight binding to the target. These two types of unnatural bases with distinct physicochemical properties profoundly expand the potential of DNA aptamers. Detection methods incorporating the UB-DNA aptamers will facilitate precise diagnoses of viral infections and other diseases.

3.
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

4.
An Eulerian path approach to global multiple alignment for DNA sequences.   Cited by: 3 (self-citations: 0; citations by others: 3)
With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing that transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to an Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm with almost linear computational speed with respect to the total size (number of letters) of sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and as low as 70% pairwise identity) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project are used to demonstrate the performance.
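
The fragment-assembly idea that motivates this paper (fragments become a de Bruijn graph, and assembly becomes an Eulerian path) can be sketched on a toy example. This is only an illustration of that background idea, not the alignment algorithm of the paper: the function names are hypothetical, and the sketch assumes error-free fragments in which every k-mer occurs at one place in the underlying sequence.

```python
from collections import defaultdict

def de_bruijn_graph(fragments, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer."""
    kmers = set()
    for fragment in fragments:
        for i in range(len(fragment) - k + 1):
            kmers.add(fragment[i:i + k])  # duplicates across fragments collapse
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u in graph if len(graph[u]) - in_degree[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def spell(path):
    """Concatenate a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])

fragments = ["ATGC", "TGCG", "GCGA"]
print(spell(eulerian_path(de_bruijn_graph(fragments, 3))))
```

Building the graph takes time linear in the total number of letters, which is the property the abstract relies on for its near-linear running time.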

5.
Digital signal processing methods for biosequence comparison.   Cited by: 1 (self-citations: 1; citations by others: 0)
A method is discussed for DNA or protein sequence comparison using a finite field fast Fourier transform, a digital signal processing technique, and statistical methods are discussed for analyzing the output of this algorithm. This method compares two sequences of length N in computing time proportional to N log N, compared with N² for methods currently used. This method makes it feasible to compare very long sequences. An example is given to show that the method correctly identifies sites of known homology.
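
The N log N comparison can be illustrated with a short sketch that counts matching letters at every relative shift of two sequences. Two points are assumptions of this sketch rather than details from the abstract: an ordinary complex FFT is swapped in for the finite field transform, and the names `fft` and `match_counts` are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unscaled inverse)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def match_counts(s, t, alphabet="ACGT"):
    """Return {d: number of positions i with s[i] == t[i + d]} for every shift d."""
    size = 1
    while size < len(s) + len(t):  # pad so that shifts do not alias
        size *= 2
    totals = [0.0] * size
    for letter in alphabet:
        a = [1.0 if c == letter else 0.0 for c in s] + [0.0] * (size - len(s))
        b = [1.0 if c == letter else 0.0 for c in t] + [0.0] * (size - len(t))
        fa, fb = fft(a), fft(b)
        # Cross-correlation theorem: corr = IFFT(conj(FFT(a)) * FFT(b)).
        corr = fft([x.conjugate() * y for x, y in zip(fa, fb)], invert=True)
        for d in range(size):
            totals[d] += corr[d].real / size
    return {d: round(totals[d % size]) for d in range(-(len(s) - 1), len(t))}

counts = match_counts("ACGT", "TTACGT")
best_shift = max(counts, key=counts.get)
print(best_shift, counts[best_shift])
```

A direct comparison of all shifts costs on the order of N² letter comparisons, whereas each letter here needs only three transforms of length proportional to N.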

6.
Many phylogenetic inference methods are based on Markov models of sequence evolution. These are usually expressed in terms of a matrix (Q) of instantaneous rates of change but some models of amino acid replacement, most notably the PAM model of Dayhoff and colleagues, were originally published only in terms of time-dependent probability matrices (P(t)). Previously published methods for deriving Q have used eigen-decomposition of an approximation to P(t). We show that the commonly used value of t is too large to ensure convergence of the estimates of elements of Q. We describe two simpler alternative methods for deriving Q from information such as that published by Dayhoff and colleagues. Neither of these methods requires approximation or eigen-decomposition. We identify the methods used to derive various different versions of the Dayhoff model in current software, perform a comparison of existing and new implementations, and, to facilitate agreement among scientists using supposedly identical models, recommend that one of the new methods be used as a standard.

7.
Rapid acquisition of high-resolution 2D and 3D NMR spectra is essential for studying biological macromolecules. In order to minimize the experimental time, a non-linear sampling scheme is proposed for the indirect dimensions of multidimensional experiments. These data can be processed using the algorithm proposed by Dutt and Rokhlin (Appl. Comp. Harm. Anal. 1995, 2, 85–100) for fast Fourier transforms of non-equispaced data. Examples of ¹H–¹⁵N HSQC spectra are shown, where crowded correlation peaks can be resolved using non-linear acquisition. Simulated data have been used to analyze the artefacts produced by the Lagrange interpolation. As compared to non-linear processing methods, this algorithm is simple and highly robust since no parameters need to be adjusted by the user.

8.
A method based on Fourier transforms is described for obtaining a 3-D reconstruction from a paracrystalline object with static disorder. The method is derived from the standard methods used in 3-D reconstruction of 2-D crystals except that all of the Fourier coefficients are used and not just the sampled data from the periodic lattice. Thus, not only is the spatially ordered part of the structure visualized in 3-D, but also the spatially disordered part. Application of the method to 3-D reconstructions of insect flight muscle is described as well as prospects for extension of the method to radiation-sensitive specimens.

9.
10.
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; this signature is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

11.
In 3D single particle reconstruction, which involves the translational and rotational matching of a large number of electron microscopy (EM) images, the algorithmic performance is largely dependent on the efficiency and accuracy of the underlying 2D image alignment kernel. We present a novel fast rotational matching kernel for 2D images (FRM2D) that significantly reduces the cost of this alignment. The alignment problem is formulated using one translational and two rotational degrees of freedom. This allows us to take advantage of fast Fourier transforms (FFTs) in rotational space to accelerate the search of the two angular parameters, while the remaining translational parameter is explored, within a limited range, by exhaustive search. Since there are no boundary effects in FFTs of cyclic angular variables, we avoid the expensive zero padding associated with Fourier transforms in linear space. To verify the robustness of our method, efficiency and accuracy tests were carried out over a range of noise levels in realistic simulations of EM images. Performance tests against two standard alignment methods, resampling to polar coordinates and self-correlation, demonstrate that FRM2D compares very favorably to the traditional methods. FRM2D exhibits a comparable or higher robustness against noise and a significant gain in efficiency that depends on the fineness of the angular sampling and linear search range.

12.
Many studies of biological sequence data have examined sequence structure in terms of periodicity, and various methods for measuring periodicity have been suggested for this purpose. This paper compares two such methods, autocorrelation and the Fourier transform, using synthetic periodic sequences, and explains the differences in periodicity estimates produced by each. A hybrid of autocorrelation and the integer-period discrete Fourier transform is proposed that combines the advantages of both techniques. Collectively, this representation and a recently proposed variant on the discrete Fourier transform offer alternatives to the widely used autocorrelation for the periodicity characterization of sequence data. Finally, these methods are compared for various tetramers of interest in C. elegans chromosome I.

13.
The application of the Fourier transform to processing 3D NMR spectra with random sampling of the evolution time space is presented. The 2D FT is calculated for pairs of frequencies, instead of the conventional sequence of one-dimensional transforms. Signal-to-noise ratios and linewidths for different random distributions were investigated by simulations and experiments. The experimental examples include 3D HNCA, HNCACB and ¹⁵N-edited NOESY-HSQC spectra of a ¹³C,¹⁵N-labeled ubiquitin sample. The results revealed the general applicability of the proposed method and a significant improvement in resolution in comparison with conventional spectra recorded in the same time.

14.
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider here two types of functions, one derived from the cosine Fourier series, and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed and in their most parameter-rich versions, both Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. These models are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.

15.
Equivalence of two Fourier methods for biological sequences   Cited by: 1 (self-citations: 0; citations by others: 1)
Two methods for defining Fourier power spectra for DNA sequences or other biological sequences are compared. The first method uses indicator sequences for each letter. The second method, by Silverman and Linsker, assigns to each letter a vertex of a regular tetrahedron in space, and this can be generalized to any dimension. Although the two methods give different Fourier transforms, it is shown that their power spectra are essentially the same. This is also true if one replaces the Fourier transform in both methods with another linear transform, such as the Walsh transform. Received 4 December 1995.
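
The equivalence can be checked numerically with a small sketch. The particular letter-to-vertex assignment, the unit-length scaling and the function names below are assumptions made for illustration. With unit vertices, any two distinct vertices have dot product -1/3, and the four indicator sequences sum to a constant whose transform vanishes at every nonzero frequency, so the tetrahedron spectrum should equal 4/3 times the summed indicator spectrum there.

```python
import cmath
import math

# Hypothetical assignment of letters to the vertices of a regular tetrahedron.
VERTICES = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}

def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * x / n) for x, v in enumerate(values))
        for k in range(n)
    ]

def indicator_power(seq):
    """Sum over letters of |DFT of the 0/1 indicator sequence|^2."""
    total = [0.0] * len(seq)
    for letter in "ACGT":
        spectrum = dft([1.0 if c == letter else 0.0 for c in seq])
        for k, f in enumerate(spectrum):
            total[k] += abs(f) ** 2
    return total

def tetrahedron_power(seq):
    """Sum over the three coordinates of |DFT of the vertex coordinate|^2."""
    scale = 1.0 / math.sqrt(3.0)  # makes every vertex a unit vector
    total = [0.0] * len(seq)
    for axis in range(3):
        spectrum = dft([VERTICES[c][axis] * scale for c in seq])
        for k, f in enumerate(spectrum):
            total[k] += abs(f) ** 2
    return total

seq = "ACGTTGCAAGCT"
ind = indicator_power(seq)
tet = tetrahedron_power(seq)
# Ratio at each nonzero frequency where the spectrum is not numerically zero.
print([round(tet[k] / ind[k], 6) for k in range(1, len(seq)) if ind[k] > 1e-9])
```

The zero-frequency terms differ because they depend only on letter counts, which is presumably why the abstract says "essentially" the same.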

16.
Fourier transform infrared spectroscopy was used to investigate the small conformational differences which exist between ribonuclease A and ribonuclease S in aqueous systems. Deconvolution and derivative methods were used to observe the overlapping components of the amide I and II bands. These proteins give identical spectra in H₂O and after complete exchange in ²H₂O. However, structural differences are revealed by monitoring the rate of ¹H-²H exchange by Fourier transform infrared spectroscopy. At equivalent times of exposure in ²H₂O buffer, ribonuclease S undergoes greater isotopic exchange than ribonuclease A. Thus complete exchange takes place for ribonuclease S but not ribonuclease A after incubation at room temperature for 8 days. Complete ¹H-²H exchange of ribonuclease A was achieved by incubation at 62 °C for 30 min. The available X-ray data and comparison with the infrared spectra of other soluble proteins were used to assign the components of the amide I and II bands to various secondary structures. In particular, band shifts observed during the later stages of exchange are associated with slowly exchanging residues in beta-strand and alpha-helical regions. The higher rate of exchange for ribonuclease S is associated with a greater conformational flexibility and a more open structure. The results show that it is necessary to be cautious in making band assignments based on exchange methods unless the extent of exchange is known. Furthermore, it is seen that the combination of Fourier transform infrared spectroscopy and hydrogen-deuterium exchange is a powerful technique for revealing small differences in protein secondary structure.

17.
There are over 10,000 C2H2-type zinc finger (ZF) domains distributed among more than 1,000 ZF proteins in the human genome. These domains are frequently observed to be involved in sequence-specific DNA binding, and uncharacterized domains are typically assumed to facilitate DNA interactions. However, some ZFs also facilitate binding to proteins or RNA. Over 100 Cys2-His2 (C2H2) ZF-protein interactions have been described. We initially attempted a bioinformatics analysis to identify sequence features that would predict a DNA- or protein-binding function. These efforts were complicated by several issues, including uncertainties about the full functional capabilities of the ZFs. We therefore applied an unbiased approach to directly examine the potential for ZFs to facilitate DNA or protein interactions. The human OLF-1/EBF associated zinc finger (OAZ) protein was used as a model. The human O/E-1-associated zinc finger protein (hOAZ) contains 30 ZFs in 6 clusters, some of which have been previously implicated in DNA or protein interactions. DNA binding was assessed using a target site selection (CAST) assay, and protein binding was assessed using a yeast two-hybrid assay. We observed that clusters known to bind DNA could facilitate specific protein interactions, but clusters known to bind protein did not facilitate specific DNA interactions. Our primary conclusion is that DNA binding is a more restricted function of ZFs, and that their potential for mediating protein interactions is likely greater. These results suggest that the role of C2H2 ZF domains in protein interactions has probably been underestimated. The implication of these findings for the prediction of ZF function is discussed.

18.
H⁺-ATPase/synthases are membrane-bound rotary nanomotors that are essential for energy conversion in nearly all life forms. A member of the family of the vacuolar-type ATPases (V-ATPases) from Thermus thermophilus, sometimes also termed A-type ATPase, was purified to homogeneity and subjected to two-dimensional (2D) crystallization trials. A novel approach to the 2D crystallization of unstable complexes yielded densely packed sheets of V-ATPase, exhibiting crystalline arrays. Aggregation of the V-ATPase under acidic conditions during reconstitution circumvented the continuous dissociation of the whole complex into the V₁ and Vₒ domains. The resulting three-dimensional aggregates were converted into 2D sheets by the use of a basic buffer, and after a short annealing cycle, ordered arrays of up to 1.5 µm diameter appeared. Fourier transforms calculated from micrographs taken from the negatively stained sample showed diffraction spots to a resolution of 23 Å. The Fourier transforms of the untilted images revealed unit-cell dimensions of a = 232 Å, b = 132 Å, and γ = 90°, and a projection map was calculated by merging 11 images. The most probable molecular packing suggests p22₁2₁ symmetry of the crystals and dimer contacts between the V₁ domains.

19.
Comparative analysis of related DNA sequences has been simplified by the transformation of data in the standard A, G, C, T format into a set of geometric symbols that promote pattern recognition. Previously, comparing more than 2 or 3 sequences simultaneously has been difficult because of the monotonous patterns established by letters. Here 33 sequences are simultaneously compared to demonstrate the ease with which nucleotide substitutions are accurately identified. This has been accomplished by writing a WordPerfect macro program to facilitate this transformation. Since this word processing program is widely used, performing this kind of analysis is readily achievable in most laboratories involved in DNA sequence analysis.

20.
All the elements of a Fourier analysis can be derived from the experiments of Graham and Robson on contrast sensitivity. Once their experiment is posed as an eigenvalue problem, a complete orthonormal set of eigenfunctions results from solving the associated differential equation. Neither sine and cosine nor Gabor functions result. Instead, the Hermite functions arise as the eigenfunctions of a space-variant differential operator used to model the contrast sensitivity of human observers. These functions, up to a constant, are their own Fourier transforms, and in principle can be used to exactly represent the Fourier transform of naturally occurring visual images.
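
The statement that the Hermite functions are their own Fourier transforms up to a constant can be written out explicitly. The normalization and the unitary transform convention below are the standard textbook ones and are assumed here; the abstract does not say which convention the paper uses.

```latex
% Hermite functions, with H_n the (physicists') Hermite polynomials:
\psi_n(x) = \left(2^n \, n! \, \sqrt{\pi}\right)^{-1/2} H_n(x) \, e^{-x^2/2},
\qquad n = 0, 1, 2, \dots

% Under the unitary Fourier transform, each \psi_n is an eigenfunction:
(\mathcal{F}\psi_n)(\omega)
  = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \psi_n(x) \, e^{-i\omega x} \, dx
  = (-i)^n \, \psi_n(\omega).
```

Under this convention the constant is (-i)^n, so the Gaussian (n = 0) is exactly its own transform and the higher-order functions pick up a factor of -i, -1, i, and so on.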
