首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Background

The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods

We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of \(k \le 3\) and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice.

Results

The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR.

Conclusions

We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.
  相似文献   

2.

Background

Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values.

Results

We investigate the locus tree inference problem as a possible alternative that combines the advantages of both approaches. We present several algorithms to solve the problem in the parsimony framework. We introduce a novel tree mapping, which allows us to obtain a heuristic solution to the problems of locus tree inference and duplication classification.

Conclusions

Our approach allows for faster comparisons of gene and species trees and improves known algorithms for duplication inference in the presence of polytomies in the species trees. We have implemented our algorithms in a software tool available at https://github.com/mciach/LocusTreeInference.
  相似文献   

3.

Background

The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results

We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an, in practice, efficient algorithm that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data.

Conclusions

In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.
  相似文献   

4.

Background

Most genes in Arabidopsis thaliana are members of gene families. How do the members of gene families arise, and how are gene family copy numbers maintained? Some gene families may evolve primarily through tandem duplication and high rates of birth and death in clusters, and others through infrequent polyploidy or large-scale segmental duplications and subsequent losses.

Results

Our approach to understanding the mechanisms of gene family evolution was to construct phylogenies for 50 large gene families in Arabidopsis thaliana, identify large internal segmental duplications in Arabidopsis, map gene duplications onto the segmental duplications, and use this information to identify which nodes in each phylogeny arose due to segmental or tandem duplication. Examples of six gene families exemplifying characteristic modes are described. Distributions of gene family sizes and patterns of duplication by genomic distance are also described in order to characterize patterns of local duplication and copy number for large gene families. Both gene family size and duplication by distance closely follow power-law distributions.

Conclusions

Combining information about genomic segmental duplications, gene family phylogenies, and gene positions provides a method to evaluate contributions of tandem duplication and segmental genome duplication in the generation and maintenance of gene families. These differences appear to correspond meaningfully to differences in functional roles of the members of the gene families.
  相似文献   

5.

Background

Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

Results

We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm.

Conclusions

PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos.
  相似文献   

6.

Background

The mutual exclusivity of somatic genome alterations (SGAs), such as somatic mutations and copy number alterations, is an important observation of tumors and is widely used to search for cancer signaling pathways or SGAs related to tumor development. However, one problem with current methods that use mutual exclusivity is that they are not signal-based; another problem is that they use heuristic algorithms to handle the NP-hard problems, which cannot guarantee to find the optimal solutions of their models.

Method

In this study, we propose a novel signal-based method that utilizes the intrinsic relationship between SGAs on signaling pathways and expression changes of downstream genes regulated by pathways to identify cancer signaling pathways using the mutually exclusive property. We also present a relatively efficient exact algorithm that can guarantee to obtain the optimal solution of the new computational model.

Results

We have applied our new model and exact algorithm to the breast cancer data. The results reveal that our new approach increases the capability of finding better solutions in the application of cancer research. Our new exact algorithm has a time complexity of \(O^{*}(1.325^{m})\)(Note: Following the recent convention, we use a star * to represent that the polynomial part of the time complexity is neglected), which has solved the NP-hard problem of our model efficiently.

Conclusion

Our new method and algorithm can discover the true causes behind the phenotypes, such as what SGA events lead to abnormality of the cell cycle or make the cell metastasis lose control in tumors; thus, it identifies the target candidates for precision (or target) therapeutics.
  相似文献   

7.

Introduction

Data sharing is being increasingly required by journals and has been heralded as a solution to the ‘replication crisis’.

Objectives

(i) Review data sharing policies of journals publishing the most metabolomics papers associated with open data and (ii) compare these journals’ policies to those that publish the most metabolomics papers.

Methods

A PubMed search was used to identify metabolomics papers. Metabolomics data repositories were manually searched for linked publications.

Results

Journals that support data sharing are not necessarily those with the most papers associated to open metabolomics data.

Conclusion

Further efforts are required to improve data sharing in metabolomics.
  相似文献   

8.

Background

Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.

Results

A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and Pinus, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.

Conclusion

Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.
  相似文献   

9.

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advancements of single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. Subsequently, a challenging computational problem is to cluster high dimensional noisy datasets with substantially fewer cells than the number of genes.

Methods

In this paper, we introduced a consensus clustering framework conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions to consensus clusters.

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.
  相似文献   

10.

Background

The current literature establishes the importance of gene functional category and expression in promoting or suppressing duplicate gene loss after whole genome doubling in plants, a process known as fractionation. Inspired by studies that have reported gene expression to be the dominating factor in preventing duplicate gene loss, we analyzed the relative effect of functional category and expression.

Methods

We use multivariate methods to study data sets on gene retention, function and expression in rosids and asterids to estimate effects and assess their interaction.

Results

Our results suggest that the effect on duplicate gene retention fractionation by functional category and expression are independent and have no statistical interaction.

Conclusion

In plants, functional category is the more dominant factor in explaining duplicate gene loss.
  相似文献   

11.

Introduction

Swine dysentery caused by Brachyspira hyodysenteriae is a production limiting disease in pig farming. Currently antimicrobial therapy is the only treatment and control method available.

Objective

The aim of this study was to characterize the metabolic response of porcine colon explants to infection by B. hyodysenteriae.

Methods

Porcine colon explants exposed to B. hyodysenteriae were analyzed for histopathological, metabolic and pro-inflammatory gene expression changes.

Results

Significant epithelial necrosis, increased levels of l-citrulline and IL-1α were observed on explants infected with B. hyodysenteriae.

Conclusions

The spirochete induces necrosis in vitro likely through an inflammatory process mediated by IL-1α and NO.
  相似文献   

12.

Introduction

Collecting feces is easy. It offers direct outcome to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a step filtration (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.
  相似文献   

13.

Background

Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome’s copy-number profile. Understanding how such profiles evolve in cancer can assist in both diagnosis and prognosis.

Results

We model the evolution of a tumor by segmental deletions and amplifications, and gauge distance from profile \(\mathbf {a}\) to \(\mathbf {b}\) by the minimum number of events needed to transform \(\mathbf {a}\) into \(\mathbf {b}\). Given two profiles, our first problem aims to find a parental profile that minimizes the sum of distances to its children. Given k profiles, the second, more general problem, seeks a phylogenetic tree, whose k leaves are labeled by the k given profiles and whose internal vertices are labeled by ancestral profiles such that the sum of edge distances is minimum.

Conclusions

For the former problem we give a pseudo-polynomial dynamic programming algorithm that is linear in the profile length, and an integer linear program formulation. For the latter problem we show it is NP-hard and give an integer linear program formulation that scales to practical problem instance sizes. We assess the efficiency and quality of our algorithms on simulated instances.
  相似文献   

14.

Background

Restriction site analysis involves determining the locations of restriction sites after the process of digestion by reconstructing their positions based on the lengths of the cut DNA. Using different reaction times with a single enzyme to cut DNA is a technique known as a partial digestion. Determining the exact locations of restriction sites following a partial digestion is challenging due to the computational time required even with the best known practical algorithm.

Results

In this paper, we introduce an efficient algorithm to find the exact solution for the partial digest problem. The algorithm is able to find all possible solutions for the input and works by traversing the solution tree with a breadth-first search in two stages and deleting all repeated subproblems. Two types of simulated data, random and Zhang, are used to measure the efficiency of the algorithm. We also apply the algorithm to real data for the Luciferase gene and the E. coli K12 genome.

Conclusion

Our algorithm is a fast tool to find the exact solution for the partial digest problem. The percentage of improvement is more than 75% over the best known practical algorithm for the worst case. For large numbers of inputs, our algorithm is able to solve the problem in a suitable time, while the best known practical algorithm is unable.
  相似文献   

15.
16.

Background

Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criteria alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. physical proximity of the DNA segments implicated or repetitive sequences.

Results

We propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable to the representation of sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum weight double cut and join scenario among all minimum length scenarios. In the process we solve an optimization problem on colored noncrossing partitions, which is a generalization of the Maximum Independent Set problem on circle graphs.

Conclusions

We introduce a model for weighting genome rearrangements and show that under simple yet reasonable conditions, a fundamental distance can be computed in polynomial time. This is achieved by solving a generalization of the Maximum Independent Set problem on circle graphs. Several variants of the problem are also mentioned.
  相似文献   

17.

Background

Gastric duplication cysts are uncommon congenital anomalies found primarily in children and rarely seen in the adult population. Accurate diagnosis of cysts before resection is difficult even using the most advanced imaging techniques.

Case presentation

In this report, we describe a 28-year-old Moroccan patient with a history of autoimmune hemolytic anemia who presented with an asymptomatic abdominal cystic mass detected during abdominal computed tomography performed before splenectomy. Magnetic resonance imaging performed for accurate characterization showed a high-signal-intensity cystic mass on T2-weighted images, located between the patient’s stomach and spleen. The patient underwent a complete cyst resection during exploratory laparotomy. The histological examination showed a cyst lined by three different epithelia with bundles of smooth muscle, which suggested a gastric duplication cyst.

Conclusions

We report a case of gastric cyst duplication in an adult with autoimmune hemolytic anemia, and we discuss this rare association, radiological findings, and the unique histological findings of this case.
  相似文献   

18.

Background

The protein encoded by the gene ybgI was chosen as a target for a structural genomics project emphasizing the relation of protein structure to function.

Results

The structure of the ybgI protein is a toroid composed of six polypeptide chains forming a trimer of dimers. Each polypeptide chain binds two metal ions on the inside of the toroid.

Conclusion

The toroidal structure is comparable to that of some proteins that are involved in DNA metabolism. The di-nuclear metal site could imply that the specific function of this protein is as a hydrolase-oxidase enzyme.
  相似文献   

19.

Background

Dentatorubropallidoluysian atrophy (DRPLA) is a rare autosomal dominant neurodegenerative disease that is associated with numerous movement disorders. Ocular problems also occur with DRPLA with reports of corneal endothelial degeneration in some patients living with the disease. We report a new visual problem associated with DRPLA, optic atrophy.

Case presentation

A 47 year-old man presented complaining of progressive visual loss associated with optic atrophy on ophthalmological evaluation. He gradually developed a progressive ataxia with dystonia. Brain MRI revealed a diffuse leukoencephalopathy. Genetic analysis revealed 62 CAG repeats in one allele of the DRPLA gene and he was diagnosed with DRPLA.

Conclusion

Optic atrophy should be included in the clinical spectrum of DRPLA.
  相似文献   

20.

Background

In recent years the visualization of biomagnetic measurement data by so-called pseudo current density maps or Hosaka-Cohen (HC) transformations became popular.

Methods

The physical basis of these intuitive maps is clarified by means of analytically solvable problems.

Results

Examples in magnetocardiography, magnetoencephalography and magnetoneurography demonstrate the usefulness of this method.

Conclusion

Hardware realizations of the HC-transformation and some similar transformations are discussed which could advantageously support cross-platform comparability of biomagnetic measurements.
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号