1.

Estimation of alternative splicing isoform frequencies from RNA-Seq data (total citations: 1; self-citations: 0; citations by others: 1)

Nicolae M, Mangul S, Măndoiu II, Zelikovsky A. Algorithms for molecular biology : AMB, 2011, 6(1):9

2.

Design and validation of methods searching for risk factors in genotype case-control studies.

Dumitru Brinza, Alexander Zelikovsky. Journal of computational biology, 2008, 15(1):81-90

The accessibility of high-throughput genotyping technology enables genome-wide association studies for common complex diseases. This paper addresses two challenges that such studies commonly face: (i) searching an enormous number of possible gene interactions and (ii) finding reproducible associations. These challenges have traditionally been addressed with statistics, whereas here we apply computational approaches: optimization and cross-validation. A complex risk factor is modeled as a subset of single nucleotide polymorphisms (SNPs) with specified alleles, and the optimization formulation asks for the subset with the maximum odds ratio. To measure and compare the ability of search methods to find reproducible risk factors, we propose applying a cross-validation scheme usually used for prediction validation. We have applied and cross-validated known search methods with the proposed enhancements on real case-control studies for several diseases (Crohn's disease, autoimmune disorder, tick-borne encephalitis, lung cancer, and rheumatoid arthritis). The proposed methods compare favorably with exhaustive search: they are faster, more frequently find statistically significant risk factors, and have a significantly higher leave-half-out cross-validation rate.
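The objective described in this abstract, the odds ratio of a multi-SNP risk factor, can be illustrated with a minimal Python sketch. It is not the authors' code: the genotype encoding (tuples of integers), the function name `odds_ratio`, and the choice to return infinity when the denominator is zero are all assumptions made for illustration.

```python
def odds_ratio(cases, controls, risk_factor):
    """Odds ratio of a multi-SNP risk factor in a case-control sample.

    cases, controls: lists of genotypes, each a tuple of integer codes
                     (hypothetical encoding, one entry per SNP).
    risk_factor:     dict mapping SNP index -> required genotype code.
    """
    def carries(genotype):
        # An individual is "exposed" if every listed SNP has the required value.
        return all(genotype[i] == value for i, value in risk_factor.items())

    exposed_cases = sum(carries(g) for g in cases)
    unexposed_cases = len(cases) - exposed_cases
    exposed_controls = sum(carries(g) for g in controls)
    unexposed_controls = len(controls) - exposed_controls

    denominator = unexposed_cases * exposed_controls
    if denominator == 0:
        # Assumption: report infinity rather than apply a continuity correction.
        return float("inf")
    return (exposed_cases * unexposed_controls) / denominator


cases = [(1, 0), (1, 0), (1, 1), (0, 0)]
controls = [(1, 0), (0, 0), (0, 1), (0, 0)]
print(odds_ratio(cases, controls, {0: 1, 1: 0}))
```

A search method in the sense of the abstract would evaluate this quantity over many candidate SNP subsets; the sketch only scores a single candidate.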

3.

2SNP: scalable phasing method for trios and unrelated individuals

Brinza D, Zelikovsky A. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2008, 5(2):313-318

Emerging microarray technologies allow affordable typing of very long genome sequences. A key challenge in analyzing such a huge amount of data is the scalable and accurate computational inference of haplotypes (i.e., splitting each genotype into a pair of corresponding haplotypes). In this paper, we first phase genotypes consisting of only two SNPs using genotype frequencies adjusted to the random mating model, and then extend the phasing of two-SNP genotypes to the phasing of complete genotypes using maximum spanning trees. The runtime of the proposed 2SNP algorithm is O(nm(n + log m)), where n and m are the numbers of genotypes and SNPs, respectively, and it can handle genotypes spanning entire chromosomes in a matter of hours. On datasets across 23 chromosomal regions from HapMap [11], 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality as measured by the number of correctly phased genotypes and by single-site and switching errors. For example, the 2SNP software phases an entire chromosome (10^5 SNPs from HapMap) for 30 individuals in 2 hours with an average switching error of 7.7%. We have also enhanced the 2SNP algorithm to phase family trio data and compared it with four other well-known phasing methods on simulated data from [15]. 2SNP is much faster than all of them while losing in quality only to PHASE. The 2SNP software is publicly available at http://alla.cs.gsu.edu/~software/2SNP.
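The abstract mentions maximum spanning trees as the device for extending pairwise phasing to complete genotypes. The following Python sketch shows only that generic building block (Kruskal's algorithm with union-find, taking edges in descending weight). It is not the 2SNP implementation: treating SNPs as nodes and using a pairwise confidence score as the edge weight is an assumption, and the weights below are made up.

```python
def maximum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    weighted_edges: iterable of (weight, u, v) with 0 <= u, v < num_nodes.
    Returns the chosen edges as (u, v, weight) in the order they were added.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical pairwise scores between four SNPs.
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.5, 2, 3)]
print(maximum_spanning_tree(4, edges))
```

Sorting the edges dominates the cost of this step; how the edge weights are derived from two-SNP genotype frequencies is specific to the paper and is not reproduced here.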

4.

GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences

Yu Ning, Guo Xuan, Zelikovsky Alexander, Pan Yi. BMC genomics, 2017, 18(4):392-9

Background

As crucial markers for identifying biological elements and processes in mammalian genomes, CpG islands (CGIs) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) a %G+C content ≥ 50%, (b) a ratio of observed to expected CpG content ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for predicting CpG islands are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG_obs/CpG_exp varies, and the length of a CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not a purely statistical task and that some undiscovered rules probably remain hidden.
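The three rule-based criteria listed above can be made concrete with a short Python sketch. It illustrates the conventional thresholds only, not GaussianCpG. The abstract does not define the expected CpG count; the sketch assumes the commonly used definition (number of C times number of G, divided by sequence length), and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (%G+C, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed_cpg = seq.count("CG")       # CpG dinucleotides on this strand
    gc_percent = 100.0 * (c_count + g_count) / n
    expected_cpg = c_count * g_count / n  # assumed definition of "expected"
    ratio = observed_cpg / expected_cpg if expected_cpg else 0.0
    return gc_percent, ratio


def meets_classic_criteria(seq, min_gc=50.0, min_ratio=0.6, min_len=200):
    """Apply criteria (a)-(c): GC content, obs/exp ratio, and length."""
    gc_percent, ratio = cpg_stats(seq)
    return len(seq) > min_len and gc_percent >= min_gc and ratio >= min_ratio


print(meets_classic_criteria("CG" * 101))  # long, GC-rich, CpG-dense
print(meets_classic_criteria("AT" * 101))  # long but contains no G or C
```

A sequence that fails one of these fixed thresholds can still be an experimentally verified CGI, which is the limitation this background section describes.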

Results

A novel Gaussian model, GaussianCpG, is developed for detecting CpG islands in the human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and adopt parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG achieves better performance in CGI detection.

Conclusions

Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by a linear statistical method but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets.

5.

Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction

Sergey Knyazev, Viachaslau Tsyvina, Anupama Shankar, Andrew Melnyk, Alexander Artyomenko, Tatiana Malygina, Yuri B Porozov, Ellsworth M Campbell, William M Switzer, Pavel Skums, Serghei Mangul, Alex Zelikovsky. Nucleic acids research, 2021, 49(17):e102

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, which is based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.

6.

2SNP: scalable phasing based on 2-SNP haplotypes

Brinza D, Zelikovsky A. Bioinformatics (Oxford, England), 2006, 22(3):371-373

The 2SNP software package implements a new, very fast, scalable algorithm for haplotype inference based on genotype statistics collected only for pairs of SNPs. This software can be used for comparatively accurate phasing of a large number of long genome sequences, e.g. those obtained from DNA arrays. As input, 2SNP takes a genotype matrix and outputs the corresponding haplotype matrix. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality as measured by the number of correctly phased genotypes and by single-site and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week and make no fewer errors than 2SNP.

7.

Guest editors' introduction to the special section on bioinformatics research and applications

Borodovsky M, Przytycka TM, Rajasekaran S, Zelikovsky A. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2011, 8(4):865-866

8.

Inference of genetic relatedness between viral quasispecies from sequencing data

Olga Glebova, Sergey Knyazev, Andrew Melnyk, Alexander Artyomenko, Yury Khudyakov, Alex Zelikovsky, Pavel Skums. BMC genomics, 2017, 18(10):918

Background

RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make it possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inferring relatedness between viral samples and for identifying transmission clusters and sources of infection, which are crucial tasks in viral outbreak investigations.

Results

We present (i) an evolutionary simulation algorithm Viral Outbreak InferenCE (VOICE) inferring genetic relatedness, (ii) an algorithm MinDistB detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm Relatedness Depth (ReD) analyzing clusters’ structure to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks.
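One ingredient named above, the minimal distance between two intra-host viral populations, can be sketched in Python as follows. This is an illustration only and not the MinDistB algorithm: it assumes aligned sequences of equal length and Hamming distance as the metric, and it omits the border-size component entirely. The function names and example sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimal_distance(population_a, population_b):
    """Smallest pairwise Hamming distance between two sets of variants."""
    return min(hamming(a, b) for a, b in product(population_a, population_b))


# Toy populations of aligned variants from two hosts.
host_1 = ["ACGT", "ACGA"]
host_2 = ["TCGA", "TTTT"]
print(minimal_distance(host_1, host_2))
```

The brute-force comparison is quadratic in the population sizes, which is fine for a toy example but may be too slow for deep sequencing data.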

Conclusions

All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources.

9.

A novel method for signal transduction network inference from indirect experimental evidence. (total citations: 1; self-citations: 0; citations by others: 1)

Réka Albert, Bhaskar DasGupta, Riccardo Dondi, Sema Kachalo, Eduardo Sontag, Alexander Zelikovsky, Kelly Westbrooks. Journal of computational biology, 2007, 14(7):927-949

10.

Guest editors’ introduction to the special section on bioinformatics research and applications

J Chen, A Zelikovsky. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2012, 9(4):1002-1003
