Similar literature
20 similar documents found (search time: 15 ms)
1.
Most molecular analyses, including phylogenetic inference, are based on sequence alignments. We present an algorithm that estimates relatedness between biomolecules without requiring sequence alignment, using a protein frequency matrix that is reduced by singular value decomposition (SVD) in a latent semantic indexing information retrieval system. Two databases were used: one with 832 proteins from 13 mitochondrial gene families and another composed of 1000 sequences from nine types of proteins retrieved from GenBank. First, 208 sequences from the first database and 200 from the second were randomly selected, and each pair of sequences was compared using edit distance and the corresponding cosines and Euclidean distances from SVD. The correlation between cosine and edit distance was -0.32 (P < 0.01) and that between Euclidean distance and edit distance was +0.70 (P < 0.01). To check the ability of SVD to classify sequences according to their categories, we used a sample of 202 sequences from the 13 gene families as queries (test set), and the other 630 proteins were used to generate the frequency matrix (training set). The classification algorithm applies a voting scheme based on the five sequences most similar to each query. With a 3-peptide frequency matrix, all 202 queries were correctly classified (accuracy = 100%). This algorithm is very attractive because sequence alignments are neither generated nor required. To achieve results similar to those obtained with edit distance analysis, we recommend that Euclidean distance be used as the similarity measure for protein sequences in latent semantic indexing methods.
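
The pipeline in this abstract (3-peptide frequency matrix, SVD reduction, Euclidean distance, vote among the five nearest sequences) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy sequences, the family labels and the choice of rank 2 are all assumptions, and NumPy is assumed to be installed.

```python
from collections import Counter

import numpy as np


def kmer_counts(seq, k=3):
    """Count overlapping k-peptides in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_matrix(seqs, k=3):
    """Frequency matrix: rows are k-peptides, columns are training sequences."""
    vocab = sorted({kmer for s in seqs for kmer in kmer_counts(s, k)})
    index = {kmer: i for i, kmer in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(seqs)))
    for j, s in enumerate(seqs):
        for kmer, count in kmer_counts(s, k).items():
            matrix[index[kmer], j] = count
    return matrix, index


def fit_lsi(matrix, rank):
    """Truncated SVD; returns the peptide basis and the sequence coordinates."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    basis = u[:, :rank]
    coords = vt[:rank, :].T * s[:rank]  # one row per training sequence
    return basis, coords


def classify(query, index, basis, coords, labels, k=3, n_votes=5):
    """Project the query into the reduced space, then vote among neighbours."""
    q = np.zeros(len(index))
    for kmer, count in kmer_counts(query, k).items():
        if kmer in index:  # peptides unseen in training are ignored
            q[index[kmer]] = count
    q_coords = q @ basis
    dist = np.linalg.norm(coords - q_coords, axis=1)  # Euclidean distance
    nearest = np.argsort(dist)[:n_votes]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy data, invented for illustration only.
train = ["ACDACDACDE", "ACDACDACDF", "ACDACDGACD",
         "WYKWYKWYKW", "WYKWYKWYKL", "WYKWYKMWYK"]
labels = ["A", "A", "A", "B", "B", "B"]
matrix, index = build_matrix(train)
basis, coords = fit_lsi(matrix, rank=2)
print(classify("ACDACDACD", index, basis, coords, labels))
```

Projecting the query with `q @ basis` places it in the same coordinates as the training sequences, because the transposed matrix times the truncated left basis equals the right singular vectors scaled by the singular values.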

2.
Singular value decomposition (SVD) is a technique commonly used in the analysis of spectroscopic data that both acts as a noise filter and reduces the dimensionality of subsequent least-squares fits. To establish the applicability of SVD to crystallographic data, we applied SVD to calculated difference Fourier maps simulating those to be obtained in a time-resolved crystallographic study of photoactive yellow protein. The atomic structures of one dark state and three intermediates were used in qualitatively different kinetic mechanisms to generate time-dependent difference maps at specific time points. Random noise of varying levels in the difference structure factor amplitudes, different extents of reaction initiation, and different numbers of time points were all employed to simulate a range of realistic experimental conditions. Our results show that SVD allows for an unbiased differentiation between signal and noise; a small subset of singular values and vectors represents the signal well, reducing the random noise in the data. As a result, phase information of the difference structure factors can be obtained. After identifying and fitting a kinetic mechanism, the time-independent structures of the intermediates could be recovered. This demonstrates that SVD will be a powerful tool in the analysis of experimental time-resolved crystallographic data.
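
The noise-filtering role of SVD described above can be illustrated with a small synthetic example: keep only the largest singular values and vectors, then rebuild the data. This is a sketch under stated assumptions, not the study's procedure. The matrix dimensions, the two exponential time courses, the noise level and the helper name `svd_filter` are all invented, and NumPy is assumed to be installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "difference maps": 500 grid points x 30 time points, built from
# two spatial components with exponential time courses (all values invented).
t = np.linspace(0.0, 5.0, 30)
spatial = rng.normal(size=(500, 2))
kinetics = np.vstack([np.exp(-t / 0.5), np.exp(-t / 3.0)])
clean = spatial @ kinetics
noisy = clean + rng.normal(scale=0.2, size=clean.shape)


def svd_filter(data, n_keep):
    """Rebuild the data from its n_keep largest singular values and vectors."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return (u[:, :n_keep] * s[:n_keep]) @ vt[:n_keep, :]


filtered = svd_filter(noisy, n_keep=2)
err_noisy = np.linalg.norm(noisy - clean)
err_filtered = np.linalg.norm(filtered - clean)
print(f"error before filtering: {err_noisy:.2f}")
print(f"error after filtering:  {err_filtered:.2f}")
```

In this toy case the number of components to keep is known in advance; with experimental data it would have to be judged from the singular value spectrum.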

3.
We have used three reference sequences representative of bacterial drug resistance pumps and sugar transport proteins to collect the 91 most closely related sequences from a composite, nonredundant protein sequence database. After certain very close relatives were eliminated, the remainder were subjected to analysis and alignment using two different similarity matrices: one based on structural conservation of amino acid residues in proteins of known conformation and the other based on the more familiar mutational matrix. Unrooted similarity trees for these proteins were constructed for each matrix and compared. A systematic analysis of the differences between these trees was undertaken and the sequences were analyzed for the presence or absence of certain sequence motifs. The results show that the clades created by the two methods are broadly comparable but that there are some clusters of sequences that are significantly different. Further analysis confirmed that (1) the sequences collected by this objective method are all known or putative 12-helix (in some cases reported as 14-helix) transmembrane proteins, (2) there is evidence for few cases of an origin based on gene duplication, (3) the bacterial drug resistance pumps are distributed in more than one clade and cannot be regarded as a definitive subset of these proteins, and (4) the diversity is such that there is no evidence of a single ancestral protein. The possible extension of the methods to other cases of divergent protein sequences is discussed.

4.
SVDMAN--singular value decomposition analysis of microarray data   Total citations: 1 (self-citations: 0, citations by others: 1)
SUMMARY: We have developed two novel methods for singular value decomposition (SVD) analysis of microarray data. The first is a threshold-based method for obtaining gene groups, and the second is a method for obtaining a measure of confidence in SVD analysis. Gene groups are obtained by identifying elements of the left singular vectors, or gene coefficient vectors, that are greater in magnitude than the threshold W × N^(-1/2), where N is the number of genes and W is a weight factor whose default value is 3. The groups are non-exclusive and may contain genes of opposite (i.e. inversely correlated) regulatory response. The confidence measure is obtained by systematically deleting assays from the data set, interpolating the SVD of the reduced data set to reconstruct the missing assay, and calculating the Pearson correlation between the reconstructed assay and the original data. This confidence measure is applicable when each experimental assay corresponds to a value of a parameter that can be interpolated, such as time, dose or concentration. Algorithms for the grouping method and the confidence measure are available in a software application called SVD Microarray ANalysis (SVDMAN). In addition to calculating the SVD for generic analysis, SVDMAN provides a new means for using microarray data to develop hypotheses for gene associations and provides a measure of confidence in the hypotheses, thus extending current SVD research in the area of global gene expression analysis.
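
The grouping rule stated in this summary (select genes whose coefficient exceeds W × N^(-1/2) in magnitude) is simple enough to sketch directly. The function name `gene_group` and the coefficient values below are hypothetical; this is not code from SVDMAN, and the toy vector is not normalized.

```python
import math


def gene_group(left_singular_vector, weight=3.0):
    """Indices of genes whose coefficient magnitude exceeds W * N**-0.5."""
    n_genes = len(left_singular_vector)
    threshold = weight / math.sqrt(n_genes)
    return [i for i, c in enumerate(left_singular_vector) if abs(c) > threshold]


# Invented coefficients for 100 genes; the threshold is 3 / sqrt(100) = 0.3.
coeffs = [0.05] * 100
coeffs[2] = 0.5
coeffs[7] = -0.4
print(gene_group(coeffs))  # -> [2, 7]
```

Because the test uses the magnitude, genes 2 and 7 land in the same group despite their opposite signs, which matches the remark that groups may contain inversely correlated genes. One way to read the threshold: a unit-length vector with N equal entries has magnitude N^(-1/2) in each entry, so W expresses the cutoff as a multiple of that uniform level.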

5.
As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
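
The species comparison step (sum protein vectors into species vectors, then take cosines) can be sketched as below. This is a simplified illustration that works on raw overlapping tetrapeptide counts and omits the SVD step the abstract relies on. The function names and the short sequences are invented.

```python
import math
from collections import Counter


def tetrapeptide_vector(protein):
    """Sparse vector of overlapping tetrapeptide counts for one protein."""
    return Counter(protein[i:i + 4] for i in range(len(protein) - 3))


def species_vector(proteins):
    """Sum the protein vectors of one species into a single vector."""
    total = Counter()
    for protein in proteins:
        total.update(tetrapeptide_vector(protein))
    return total


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy "proteomes".
species_x = species_vector(["MKTAYIAK", "GAVLIMKT"])
species_y = species_vector(["MKTAYIAR", "GAVLIMKT"])
species_z = species_vector(["WWPPWWPP"])
print(cosine(species_x, species_y), cosine(species_x, species_z))
```

A matrix of `1 - cosine` values over all species pairs is one possible input to a distance-based tree-building method; whether the authors used that particular transformation is not stated in the abstract.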

6.
Two species of alcyonarian corals, Lobophytum crassum and Sinularia polydactyla, are closely related to each other. It is reported that the calcified organic substances in the skeletons of both contain a protein–polysaccharide complex playing a key role in the regulation of biocalcification. However, information on the matrix proteins of endoskeletal sclerites has been lacking. Hence we studied the proteinaceous organic matrices of sclerites from both species to analyze the sequences and the functional properties of the proteins present. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) analysis of the preparations showed four protein bands with apparent molecular masses of 102, 67, 48, and 37 kDa for L. crassum and seven bands of 109, 83, 70, 63, 41, 30, and 22 kDa for S. polydactyla. A major protein band of about 67 kDa in L. crassum and two protein bands of about 70 and 63 kDa in S. polydactyla yielded N-terminal amino acid sequences. Periodic acid-Schiff staining indicated that the 67-kDa protein in L. crassum and the 83- and 63-kDa proteins in S. polydactyla were glycosylated. For detection of calcium binding proteins, a Ca2+ overlay analysis was conducted on the extract via 45Ca autoradiography. The 102- and 67-kDa calcium binding proteins in L. crassum and the 109- and 63-kDa Ca2+ binding proteins in S. polydactyla were found to be radioactive. An assay for carbonic anhydrase (CA), which is thought to play an important role in the process of calcification, revealed specific activities. Newly derived protein sequences were subjected to standard sequence analysis involving identification of similarities to other proteins in databases. Protein expression and sequence composition were shown to differ significantly between the two species.

7.
8.
Elmer WH, Marra RE. Mycologia, 2011, 103(4): 806-819
Sudden vegetation dieback (SVD) is the loss of smooth cordgrass (Spartina alterniflora) along intertidal creeks in salt marshes of the Atlantic and Gulf states. The underlying cause of SVD remains unclear, but earlier work suggested a contributing role for Fusarium spp. in Louisiana. This report investigated whether these or other Fusarium species were associated with S. alterniflora dieback in mid- to north-Atlantic states. Isolations from seven SVD sites yielded 192 isolates of Fusarium spp., with more than 75% isolated from aboveground tissue. Most isolates (88%) fell into two undescribed morphospecies (MS) distinguished from each other by macroconidial shape, phialide ontogeny and growth rates. Pathogenicity tests on wound-inoculated S. alterniflora stems and seedling roots revealed that isolates in MS1 were more virulent than those in MS2 but no single isolate caused plant mortality. No matches to known species of Fusarium were revealed by DNA sequence queries of translation elongation factor 1-α (tef1) sequences. A phylogenetic analysis of partial sequences of three genes, β-tubulin (β-tub), calmodulin (cal) and tef1, was conducted on representative isolates from MS1 (n = 20) and MS2 (n = 18); it provided strong evidence that the MS1 isolates form a clade that represents a heretofore undescribed species, which we designate Fusarium palustre sp. nov. Isolates from the more variable MS2 clustered with the F. incarnatum-equiseti species complex as F. cf. incarnatum. Although a strong association exists between both species and declining S. alterniflora in SVD sites, neither appears to play a primary causal role in SVD. However, our findings suggest that F. palustre might play an important secondary role in the ecological disruption of the salt marshes.

9.
Bindin is the sea urchin sperm acrosomal protein that is responsible for the species-specific adhesion of the sperm to the egg. Two new bindin cDNA sequences that contain the entire open reading frame for the bindin precursor are reported: one for Strongylocentrotus franciscanus and one for Lytechinus variegatus. Both contain inverted repetitive sequences in their 3' untranslated regions, and the S. franciscanus cDNA contains an inverted repetitive sequence match between the 5' untranslated region and the coding region. The middle third of the mature bindin sequence is highly conserved in all three species, and the flanking sequences share short repeated sequences that vary in number between the species. Cross-fertilization data are reported for the species S. purpuratus, S. franciscanus, L. variegatus, and L. pictus. A barrier to cross-fertilization exists between the sympatric Strongylocentrotus species, but there is no barrier between the allopatric Lytechinus species.

10.
MOTIVATION: Multidimensional scaling (MDS) is a well-known multivariate statistical analysis method used for dimensionality reduction and visualization of similarities and dissimilarities in multidimensional data. The advantage of MDS over singular value decomposition (SVD) based methods such as principal component analysis is its superior fidelity in representing the distance between different instances, especially for high-dimensional geometric objects. Here, we investigate the importance of the choice of initial conditions for MDS, and show that SVD is the best choice to initiate MDS. Furthermore, we demonstrate that using the first principal components of SVD to initiate the MDS algorithm is more efficient than iterating through all the principal components. Contrary to previous suggestions, adding stochasticity to the molecular dynamics simulations typically used for MDS of large datasets likewise does not increase accuracy. Finally, we introduce a k nearest neighbor method to analyze the local structure of the geometric objects and use it to control the quality of the dimensionality reduction. RESULTS: We demonstrate what is, to our knowledge, the most efficient and accurate initialization strategy for MDS algorithms, considerably reducing the computational load. SVD-based initialization renders MDS methodology much more useful in the analysis of high-dimensional data such as functional genomics datasets.

11.
The thermal denaturation of synthetic deoxypolynucleotides of defined sequence was studied by a three-dimensional melting technique in which complete UV absorbance spectra were recorded as a function of temperature. The results of such an experiment defined a surface bounded by absorbance, wavelength, and temperature. A matrix of the experimental data was built and analyzed by the method of singular value decomposition (SVD). SVD provides a rigorous, model-free analytical tool for evaluating the number of significant spectral species required to account for the changes in UV absorbance accompanying the duplex-to-single-strand transition. For all of the polynucleotides studied (Poly dA – Poly dT; [Poly(dAdT)]2; Poly dG – Poly dC; [Poly(dGdC)]2), SVD indicated the existence of at least 4 to 5 significant spectral species. The DNA melting transition for even these simple repeating sequences cannot, therefore, be a simple two-state process. The basis spectra obtained by SVD analysis were found to be unique for each polynucleotide studied. Differential scanning calorimetry was used to obtain model-free estimates for the enthalpy of melting for the polynucleotides studied, with results in good agreement with previously published values. Received: 16 April 1997 / Accepted: 9 July 1997
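
Counting significant spectral species with SVD amounts to counting the singular values of the wavelength-by-temperature absorbance matrix that stand clear of the noise floor. The sketch below uses noise-free synthetic data with three invented Gaussian bands and sigmoidal populations; every number in it is an assumption, and NumPy is assumed to be installed.

```python
import numpy as np

wavelengths = np.linspace(220.0, 320.0, 101)  # nm
temperatures = np.linspace(20.0, 90.0, 36)    # degrees C


def band(center, width):
    """Gaussian-shaped absorbance band (invented basis spectrum)."""
    return np.exp(-((wavelengths - center) / width) ** 2)


# Three hypothetical species and their temperature-dependent fractions.
spectra = np.column_stack([band(255.0, 15.0), band(265.0, 20.0), band(280.0, 12.0)])
f1 = 1.0 / (1.0 + np.exp((temperatures - 50.0) / 3.0))
f3 = 1.0 / (1.0 + np.exp(-(temperatures - 70.0) / 3.0))
f2 = 1.0 - f1 - f3
populations = np.vstack([f1, f2, f3])
absorbance = spectra @ populations  # rows: wavelengths, columns: temperatures


def significant_components(data, rel_tol=1e-8):
    """Number of singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


print(significant_components(absorbance))
```

A strict two-state transition would give two significant components. With measured spectra the tolerance would have to be set from the experimental noise level rather than from floating-point precision as it is here.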

12.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pairwise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, or "islands", between gaps. Approximately one third of the islands in the gold standard alignments have a negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the perspective of alignment accuracy, the time the algorithm spends in these unalignable regions is wasted. We considered the features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm that explicitly addresses high-scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, while the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that a decrease in alignment accuracy is not necessarily the price of computational efficiency.
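
For orientation, the Smith-Waterman baseline discussed above can be sketched as a short dynamic program that returns the best local alignment score. It is deliberately simplified: it uses identity scoring (match/mismatch) and a linear gap penalty, where a practical protein aligner would use a substitution matrix and typically affine gaps. The parameter values are arbitrary, and the hierarchical algorithm proposed in the abstract is not reproduced here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared ungapped segment "ACDEF" scores 5 matches x 2 = 10.
print(smith_waterman_score("GGGACDEFGGG", "TTTACDEFTTT"))  # -> 10
```

The shared segment in the example is a single ungapped "island" in the abstract's terminology. The algorithm still fills every cell of the table, including the flanking regions where no alignment exists.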

13.
14.
15.

Background

Ligands of peroxisome-proliferator activated receptors (PPARs), such as non-esterified fatty acids (NEFAs), induce expression of angiopoietin-like protein 4 (ANGPTL4). Recently ANGPTL4 has been reported to be a mediator of intracellular adipose lipolysis induced by glucocorticoids.

Objective

To determine the concentrations of ANGPTL4 in cord serum of neonates born by spontaneous vaginal delivery (SVD) and by pre-labor cesarean section (CS) from healthy women, and to relate them to parameters of neonatal lipolytic activity at birth.

Measurements

In 54 neonates born by SVD and in 56 neonates born by CS, arterial cord blood was drawn to determine insulin, cortisol, triacylglycerols (TAGs), glycerol, non-esterified fatty acids (NEFAs), individual fatty acids, ANGPTL4, adiponectin, retinol binding protein 4 (RBP4) and leptin.

Results

Birth weight and neonatal fat mass did not differ between SVD and CS, but the concentrations of glycerol, adiponectin, RBP4, NEFAs and most individual fatty acids were higher in cord serum of neonates born by SVD than in those born by CS, indicating higher adipose tissue breakdown in the SVD group. The concentrations of TAG and cortisol were also higher, and that of insulin was lower, in cord serum of the SVD group compared to the CS group. However, the cord serum concentration of ANGPTL4 did not differ between the two groups, and no positive correlation with either NEFA or glycerol concentrations was detected.

Conclusion

ANGPTL4 is known to stimulate lipolysis in adults, but does not appear to mediate the increased lipolytic activity in neonates born by SVD, indicating the presence of different regulatory inputs.

16.
17.
18.
Choong MK, Yan H. Bioinformation, 2008, 2(7): 273-278
This paper presents a new method for exon detection in DNA sequences based on multi-scale parametric spectral analysis. A forward-backward linear prediction (FBLP) algorithm combined with singular value decomposition (SVD), FBLP-SVD, is applied to the double-base curves (DB-curves) of a DNA sequence using variable moving window sizes to estimate the signal spectrum at multiple scales. Simulations were done on short human genes in the range of 11 bp to 2032 bp, and the results show that our proposed method outperforms the classical Fourier transform method. The multi-scale approach is shown to be more effective than using a single scale with a fixed window size. In addition, our method is flexible as it requires no training data.
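
The "classical Fourier transform method" used as the baseline here is commonly understood as measuring spectral power at period 3 in the base indicator sequences of a DNA window, since coding regions tend to show a peak there. The sketch below shows that baseline only, under that assumption; it is not the FBLP-SVD method, and the function name and test sequences are invented.

```python
import cmath


def period3_power(dna):
    """Summed power at frequency 1/3 over the A, C, G, T indicator sequences."""
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence of this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, symbol in enumerate(dna) if symbol == base)
        total += abs(coeff) ** 2
    return total


print(period3_power("ATG" * 20))   # strong period-3 signal
print(period3_power("ACGT" * 15))  # period-4 repeat: essentially zero
```

In practice this quantity is evaluated in a sliding window along the sequence, and the window length trades resolution against noise, which is the limitation the multi-scale approach in the abstract addresses.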

19.
Yau SS, Yu C, He R. DNA and Cell Biology, 2008, 27(5): 241-250
Graphical representation of gene sequences provides a simple way of viewing, sorting, and comparing various gene structures. Here we first report a two-dimensional graphical representation for protein sequences. With this method, we constructed the moment vectors for protein sequences, and mathematically proved that the correspondence between moment vectors and protein sequences is one-to-one. Therefore, each protein sequence can be represented as a point in a map, which we call a protein map, and cluster analysis can be used for comparison between the points. Sixty-six proteins from five protein families were analyzed using this method. Our data showed that for proteins in the same family, their corresponding points in the map are close to each other. We also illustrate the efficiency of this approach by performing an extensive cluster analysis of the protein kinase C family. These results indicate that this protein map could be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence.

20.
We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy of 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set falls to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425-chain data set used in this analysis, the overall beta-turn prediction accuracy was consistent when tested on the 30-protein data set used earlier (64.62%), on a more recent representative data set comprising 619 protein chains (64.66%), and on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains reported here, which give more realistic and consistent predictions of beta-turns from amino acid sequences.
