
31.

What's in the mix: phylogenetic classification of metagenome sequence samples

McHardy AC Rigoutsos I 《Current opinion in microbiology》2007,10(5):499-503

Metagenomics is a novel field which deals with the sequencing and study of microbial organisms or viruses isolated directly from a particular environment. This has already provided a wealth of information and new insights for the inhabitants of various environmental niches. For a given sample, one would like to determine the phylogenetic provenance of the obtained fragments, the relative abundance of its different members, their metabolic capabilities, and the functional properties of the community as a whole. To this end, computational analyses are becoming increasingly indispensable tools. In this review, we focus on the problem of determining the phylogenetic identity of the sample fragments, a procedure known as 'binning'. This step is essential for the reconstruction of the metabolic capabilities of individual organisms or phylogenetic clades of a community, and the study of their interactions.

32.

Dictionary-driven protein annotation

Rigoutsos I Huynh T Floratos A Parida L Platt D 《Nucleic acids research》2002,30(17):3901-3916

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.

33.

In silico pattern-based analysis of the human cytomegalovirus genome (total citations: 4; self-citations: 0; citations by others: 4)


Rigoutsos I Novotny J Huynh T Chin-Bow ST Parida L Platt D Coleman D Shenk T 《Journal of virology》2003,77(7):4326-4344

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).

34.

Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

Rigoutsos I Riek P Graham RM Novotny J 《Nucleic acids research》2003,31(15):4625-4631


35.

Non-alpha-helical elements modulate polytopic membrane protein architecture

Riek RP Rigoutsos I Novotny J Graham RM 《Journal of molecular biology》2001,306(2):349-362

In "all alpha-fold" transmembrane proteins, including ion channels, G-protein-coupled receptors (GPCRs), bacterial rhodopsins and photosynthetic reaction centers, relatively long alpha-helices, straight, curved or kinked, pack into compact elliptical or circular domains. Using both existing and newly developed tools to analyze transmembrane segments of all available membrane protein three-dimensional structures, including that very recently elucidated for the GPCR, rhodopsin, we report here the finding of frequent non-alpha-helical components, i.e. 3(10)-helices ("tight turns"), pi-helices ("wide turns") and intrahelical kinks (often due to residues other than proline). Often, diverse helical types and kinks concatenate over long segments and produce complex inclinations of helical axis, and/or diverse frame shifts in the "canonical", alpha-helical side-chain pattern. Marked differences in transmembrane architecture exist even between seemingly structurally related proteins, such as bacteriorhodopsin and rhodopsin. Deconvolution of these non-canonical features into their composite elements is essential for understanding the pleiotropy of polytopic protein structure and function, and must be considered in developing valid macromolecular models.

36.

In silico structural and functional analysis of the human cytomegalovirus (HHV5) genome

Novotny J Rigoutsos I Coleman D Shenk T 《Journal of molecular biology》2001,310(5):1151-1166

The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program, ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors etc.); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer-alphabetaalpha, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. An introduction of summed, per-residue-normalized scores has been essential for discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 3beta1alpha fold with one or two disulfide bridges, present in otherwise unrelated proteins.

37.

MiR-103a-3p targets the 5′ UTR of GPRC5A in pancreatic cells

Honglei Zhou Isidore Rigoutsos 《RNA (New York, N.Y.)》2014,20(9):1431-1439


38.

A generic motif discovery algorithm for sequential data

Jensen KL Styczynski MP Rigoutsos I Stephanopoulos GN 《Bioinformatics (Oxford, England)》2006,22(1):21-28

MOTIVATION: Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. RESULTS: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. As well, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. AVAILABILITY: Gemoda is freely available at http://web.mit.edu/bamel/gemoda
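For readers unfamiliar with the (l,d)-motif problem mentioned in this abstract, the following minimal sketch may help. It assumes the usual formulation: a candidate string of length l counts as an (l,d)-motif if every input sequence contains some window that differs from it in at most d positions. This is only a brute-force check of that definition, not Gemoda's algorithm; the function names and the toy DNA sequences are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def is_ld_motif(motif: str, sequences: List[str], d: int) -> bool:
    """True if motif occurs (with at most d mismatches) in every sequence."""
    return all(occurs_within(motif, s, d) for s in sequences)


# Toy data (hypothetical): "ACGT" appears exactly in the first sequence
# and with one mismatch in the other two.
toy_sequences = ["TTACGTAA", "GGACCTCC", "ATGCGTTT"]
```

With these toy sequences, `is_ld_motif("ACGT", toy_sequences, 1)` holds, while requiring exact matches (`d = 0`) does not, because the second sequence contains no exact copy of `ACGT`.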

39.

The miR-17/92 cluster: a comprehensive update on its genomics, genetics, functions and increasingly important and numerous roles in health and disease

E Mogilyansky I Rigoutsos 《Cell death and differentiation》2013,20(12):1603-1614


40.

Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm [published erratum appears in Bioinformatics 1998;14(2):229] (total citations: 1; self-citations: 0; citations by others: 1)

Rigoutsos I; Floratos A 《Bioinformatics (Oxford, England)》1998,14(1):55-67

MOTIVATION: The discovery of motifs in biological sequences is an important problem. RESULTS: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (user-defined) minimum number of sequences, yet it manages to be very efficient by avoiding the enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: any reported pattern cannot be made more specific and still keep on appearing at the exact same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to: (i) validate the approach through the discovery of previously reported patterns; (ii) demonstrate the capability to identify automatically highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasi-linear to the size of the generated output.
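The notions of support and maximality used in this abstract can be illustrated with a small sketch. It assumes that a rigid pattern is a fixed-length string in which `.` stands for a "don't care" position; the wildcard symbol, the function names, and the toy sequences are assumptions made for illustration. The code only evaluates a given pattern; it does not implement the TEIRESIAS discovery procedure.

```python
from typing import Dict, List

WILDCARD = "."  # assumed "don't care" symbol for this sketch


def occurrences(pattern: str, sequences: List[str]) -> Dict[int, List[int]]:
    """Map each matching sequence index to the start offsets of its matches."""
    hits: Dict[int, List[int]] = {}
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - len(pattern) + 1):
            window = seq[start:start + len(pattern)]
            if all(p == WILDCARD or p == c for p, c in zip(pattern, window)):
                hits.setdefault(idx, []).append(start)
    return hits


def support(pattern: str, sequences: List[str]) -> int:
    """Number of distinct sequences in which the pattern appears."""
    return len(occurrences(pattern, sequences))


def same_positions(p1: str, p2: str, sequences: List[str]) -> bool:
    """True if two patterns match at exactly the same start offsets."""
    return occurrences(p1, sequences) == occurrences(p2, sequences)


# Toy data (hypothetical).
toy_sequences = ["XAGCTY", "QAGCTZ", "AGCT"]
```

On these toy sequences, `A.C` has support 3, but the more specific `AGC` matches at exactly the same positions, so `A.C` would not be maximal in the sense described above. Note that `same_positions` compares start offsets only, so it covers filling in wildcards or extending a pattern to the right, not extending it to the left.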