Similar literature
1.
SUMMARY: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. AVAILABILITY: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. CONTACT: gorodkin@cbs.dtu.dk
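
The kind of matrix that a mutual information plot displays can be sketched as follows. This is a minimal illustration, not MatrixPlot's own code: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings, and gaps are treated as ordinary symbols.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Compare the observed pair frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mutual_information_matrix(alignment):
    """Square matrix of pairwise column mutual information values."""
    length = len(alignment[0])
    return [
        [column_mutual_information(alignment, i, j) for j in range(length)]
        for i in range(length)
    ]
```

Two columns that covary perfectly over four distinct symbols give 2 bits, while a fully conserved column gives 0 bits against any other column.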

2.
Peruani F, Tabourier L. PLoS ONE, 2011, 6(12): e28860
Without having direct access to the information that is being exchanged, traces of information flow can be obtained by looking at temporal sequences of user interactions. These sequences can be represented as causality trees whose statistics result from a complex interplay between the topology of the underlying (social) network and the time correlations among the communications. Here, we study causality trees in mobile-phone data, which can be represented as a dynamical directed network. This representation of the data reveals the existence of super-spreaders and super-receivers. We show that the tree statistics, respectively the information spreading process, are extremely sensitive to the in-out degree correlation exhibited by the users. We also learn that a given piece of information, e.g., a rumor, would require users to retransmit it for more than 30 hours in order to cover a macroscopic fraction of the system. Our analysis indicates that topological node-node correlations of the underlying social network, while allowing the existence of information loops, also promote information spreading. Temporal correlations, and therefore causality effects, are only visible as local phenomena and during short time scales. Consequently, the very idea that there is (intentional) information spreading beyond a small vicinity is called into question. These results are obtained through a combination of theory and data analysis techniques.

3.
Database on the structure of large ribosomal subunit RNA. Cited by: 3 (self-citations: 0, citations by others: 3)
Our database on large ribosomal subunit RNA contained 334 sequences in July, 1995. All sequences in the database are aligned, taking into account secondary structure. The aligned sequences are provided, together with incorporated secondary structure information, in several computer-readable formats. These data can easily be obtained through the World Wide Web. The files in the database are also available via anonymous ftp.

4.
HvrBase is a compilation of human and ape mtDNA control region sequences. Sequences and related information on individuals, such as from where the sequences were obtained, are stored in three ASCII files as described previously. Moreover, the collection is also available as a Mac/PC database application with a graphical user interface. It can be accessed through the WWW at URL http://www.eva.mpg.de/hvrbase. The current collection comprises 5846 human sequences from hypervariable region I (HVRI) and 2302 human sequences from hypervariable region II (HVRII). From apes, 295 HVRI sequences and 13 HVRII sequences are available.

5.
The latest release of the large ribosomal subunit RNA database contains 429 sequences. All these sequences are aligned, and incorporate secondary structure information. The rRNA WWW Server at URL http://rrna.uia.ac.be/ provides researchers with an easily accessible resource to obtain the data in this database in a number of computer-readable formats. A new query interface has been added to the server. If necessary, the data can also be obtained by anonymous ftp from the same site.

6.
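A new measure of DNA sequence information. Cited by: 4 (self-citations: 3, citations by others: 1)
Based on information theory, a new method for measuring the information in DNA sequences is presented, yielding information measures at four levels: Ib, If(1), If(2) and If(3). These four measures can be used, respectively, to quantify the information content of the DNA base sequence, the codon sequence, the protein-coding sequence and the functional protein sequence. Results computed from two relatively short protein-coding DNA sequences in the mitochondrial genome of M. edulis, and from simulated DNA sequences built from groups of degenerate codons with different degrees of degeneracy, show that these information measures can indeed be used to reveal …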

7.
The large-scale collection of partial cDNA sequences is becoming a powerful tool in biology. Similarity or motif searches in DNA databases using these partial cDNA sequences have facilitated the discovery of new genes of interest. By collecting and registering large numbers of partial sequences with a well designed non-biased cDNA library, an expression profile of active genes in a particular tissue can be obtained. Tissue-specific or stage-specific genes can be discovered by comparing the profiles from different tissues or from a tissue at different stages of development, respectively. The compilation of such expression profiles enables genes to be mapped to the tissue(s) where they are actively transcribed. The large-scale collation of gene sequences actively expressed in the body into databases complements efforts directed towards the structural analysis of the genome, with the ultimate aim of decoding all the genetic information carried in the human genome. This cDNA strategy is also being widely applied to organisms other than man.

8.
The European large subunit ribosomal RNA database. Cited by: 5 (self-citations: 1, citations by others: 4)
The European Large Subunit (LSU) Ribosomal RNA (rRNA) database is accessible via the rRNA WWW Server at URL http://rrna.uia.ac.be/lsu/. It is a curated database that compiles complete or nearly complete LSU rRNA sequences in aligned form, and also incorporates secondary structure information for each sequence. Taxonomic information, literature references and other information about the sequences are also available, and can be searched via the WWW interface.

9.
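Digital coding of the genetic code and DNA sequences in high-dimensional space. Cited by: 13 (self-citations: 7, citations by others: 6)
Binary digital coding is the most basic coding scheme in information science. When the four digits 0 (00), 1 (01), 2 (10) and 3 (11) are used to binary-code the four bases (C, T, A, G), there are 24 possible coding combinations, of which 8 satisfy the base complementarity rule; these 8 are topologically equivalent. The coding format ordered by the molecular weight of the bases, 0123/CTAG, is the most suitable one. Coding the character sequence of DNA with binary numbers has the following advantages: 1) it compresses information redundancy and improves coding efficiency; 2) the structure and functional groups of the bases can be …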
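
The counting argument above can be checked with a short sketch. It rests on one assumption that the abstract does not spell out: the complementarity rule is taken to mean that complementary bases (A/T and C/G) carry bitwise-complementary 2-bit codes. The names `satisfies_complement_rule`, `CTAG_CODE` and `encode` are hypothetical.

```python
from itertools import permutations

BASES = "CTAG"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def satisfies_complement_rule(code):
    # Assumption: complementary bases must have bitwise-complementary
    # 2-bit codes, i.e. their XOR is 0b11 (0 pairs with 3, 1 pairs with 2).
    return all(code[b] ^ code[COMPLEMENT[b]] == 0b11 for b in BASES)


# Every assignment of the digits 0..3 to the four bases.
all_codes = [dict(zip(BASES, p)) for p in permutations(range(4))]
complementary_codes = [c for c in all_codes if satisfies_complement_rule(c)]

# The 0123/CTAG format named in the abstract.
CTAG_CODE = {"C": 0, "T": 1, "A": 2, "G": 3}


def encode(seq, code=CTAG_CODE):
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | code[base]
    return value
```

Under this reading, `len(all_codes)` is 24 and `len(complementary_codes)` is 8, matching the counts in the abstract, and the 0123/CTAG assignment is among the 8.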

10.
11.
Ferretti L, Raineri E, Ramos-Onsins S. Genetics, 2012, 191(4): 1397-1401
Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θ(W), Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
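
For orientation, the standard Watterson estimator for complete data can be sketched as below. This is the textbook form, not the modified estimator the abstract proposes: missing-data handling is deliberately left out, and the function names are hypothetical. The input is assumed to be a list of equal-length aligned sequences with no missing bases.

```python
def harmonic_coefficient(n):
    """a_n = sum of 1/i for i = 1 .. n-1, for n sampled sequences."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(sequences):
    """Number of alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences):
    """Standard Watterson estimator: theta_W = S / a_n (complete data only)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    return segregating_sites(sequences) / harmonic_coefficient(n)
```

With the three sequences `AAAA`, `AATA` and `ATAA` there are two segregating sites and a_3 = 1.5, so the estimate is 2 / 1.5.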

12.
Comparative analysis of human and bovine papillomaviruses. Cited by: 4 (self-citations: 0, citations by others: 4)
A method is presented for the analysis and comparison of nucleic acid and protein sequences utilizing all identity blocks (the term "identity block" refers to a set of consecutive matches between two sequences) above a prescribed length. Moreover, such identity blocks are determined for various groupings of amino acids according to chemical, functional, charge, and hydrophobic classifications. Alignment maps based on these classifications and containing all statistically significant identity blocks between two or more sequences are constructed. New theoretical results for determining the expected length of the longest identity block between sequences are also presented and are used, along with permutation procedures, to ascertain the significance of sequence identity blocks. As an example of the type of information that can be obtained, comparison has been made of the complete DNA sequences and the E1, E2, L1, and L2 genes of human and bovine papillomaviruses based on the classification schemes described above.
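
The notion of an identity block can be illustrated with a brute-force sketch that scans every diagonal of the two-sequence comparison and reports each maximal run of consecutive matches at or above a chosen length. This is an assumption-laden illustration rather than the paper's procedure: the function name is hypothetical, matching is exact character equality (no amino acid groupings), and no significance test is applied.

```python
def identity_blocks(a, b, min_length):
    """Return (start_in_a, start_in_b, length) for each maximal run of
    consecutive matches between a and b that is at least min_length long."""
    blocks = []
    # Each diagonal is identified by offset = i - j.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)
        j = i - offset
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_length:
                    blocks.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may extend to the end of the diagonal.
        if run >= min_length:
            blocks.append((i - run, j - run, run))
    return blocks
```

The scan costs time proportional to the product of the two sequence lengths, which is adequate for a sketch; longer inputs would call for a suffix-based index.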

13.
R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).

14.
A large number of transmembrane proteins form aqueous pores or channels in the phospholipid bilayer, but the structural bases of pore formation and assembly have been determined experimentally for only a few of the proteins and protein complexes. The polypeptide segments that form the transmembrane pore and the secondary structure that creates the aqueous-lipid interface can be identified using multiple independent fluorescence techniques (MIFT). The information obtained from several different, but complementary, fluorescence analyses, including measurements of emission intensity, fluorescence lifetime, accessibility to aqueous and to lipophilic quenching agents, and fluorescence resonance energy transfer (FRET) can be combined to characterize the nature of the protein-membrane interaction directly and unambiguously. The assembly pathway can also be determined by measuring the kinetics of the spectral changes that occur upon pore formation. The MIFT approach therefore allows one to obtain structural information that cannot be obtained easily using alternative techniques such as crystallography. This review briefly outlines how MIFT can reveal the identity, location, conformation, and topography of the polypeptide sequences that interact with the membrane.

15.
Database on the structure of large subunit ribosomal RNA. Cited by: 7 (self-citations: 0, citations by others: 7)
The Antwerp database on large subunit ribosomal RNA now contains 607 complete or nearly complete aligned sequences. The alignment incorporates secondary structure information for each sequence. Other information about the sequences, such as literature references, accession numbers and taxonomic information is also available. Information from the database can be downloaded or searched on the rRNA WWW Server at URL http://rrna.uia.ac.be/

16.
Summary: The concepts of a super information source and ensemble averaging are used to estimate the amount of information stored in protein and t-RNA sequences. Specifically applied to cytochrome c and hemoglobins, information measures analogous to those found to be highly significant for DNA pair frequency data (D2 vs. R) by Gatlin (1968) prove to be extremely highly correlated with organism complexity. Super source stability and the possible taxonomic utility of the extraordinary clusterings obtained are discussed. A restriction on the construction of ancestral sequences and a possible handle on homology are also detailed.

17.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite the many methods available for sequence comparison, few methods retain the information content of sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information and the Yau-Hausdorff distance is mathematically proven to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. The phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphic representations, as well as general two-dimensional shape matching.
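
As background, the classical Hausdorff distance between two fixed point sets can be sketched as follows. This is not the Yau-Hausdorff distance itself: the minimization over translations and rotations described in the abstract is omitted, and no particular graphical representation of DNA is assumed. The function names are hypothetical, and points are taken to be coordinate tuples.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in points_a to its nearest point in points_b."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(
        directed_hausdorff(points_a, points_b),
        directed_hausdorff(points_b, points_a),
    )
```

The directed distance is not symmetric, which is why the symmetric version takes the larger of the two directions. `math.dist` requires Python 3.8 or later.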

18.
Database on the structure of large ribosomal subunit RNA. Cited by: 5 (self-citations: 0, citations by others: 5)
A database on large ribosomal subunit RNA is made available. It contains 258 sequences. It provides sequence, alignment and secondary structure information in computer-readable formats. Files can be obtained using ftp.

19.
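Using the genomes of 7 archaea, 46 bacteria and 10 eukaryotes as samples, and taking into account both short-range and long-range correlations between bases, dinucleotide frequencies at different positions were obtained for codon pairs in coding sequences and for triplet pairs in intergenic sequences. Phylogenetic relationships based on coding sequences and on intergenic sequences were then constructed from these frequencies. Whether clustering used coding-sequence or intergenic-sequence information, the archaea and the eukaryotes were each grouped on a single branch, indicating that the choice of clustering parameters was appropriate. Pairwise comparison with the phylogeny constructed from amino acid sequences showed that, for most Firmicutes, evolution differs considerably between coding and intergenic sequences, and between coding and amino acid sequences. The analysis suggests that a more natural phylogenetic relationship can be obtained only by jointly considering the evolutionary information in all three types of sequence.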

20.
The recent introduction of massively parallel pyrosequencers allows rapid, inexpensive analysis of microbial community composition using 16S ribosomal RNA (rRNA) sequences. However, a major challenge is to design a workflow so that taxonomic information can be accurately and rapidly assigned to each read, so that the composition of each community can be linked back to likely ecological roles played by members of each species, genus, family or phylum. Here, we use three large 16S rRNA datasets to test whether taxonomic information based on the full-length sequences can be recaptured by short reads that simulate the pyrosequencer outputs. We find that different taxonomic assignment methods vary radically in their ability to recapture the taxonomic information in full-length 16S rRNA sequences: most methods are sensitive to the region of the 16S rRNA gene that is targeted for sequencing, but many combinations of methods and rRNA regions produce consistent and accurate results. To process large datasets of partial 16S rRNA sequences obtained from surveys of various microbial communities, including those from human body habitats, we recommend the use of Greengenes or RDP classifier with fragments of at least 250 bases, starting from one of the primers R357, R534, R798, F343 or F517.
