Found 20 similar articles (search time: 15 ms)
1.
2.
3.
To classify proteins into functional families based on their primary sequences, popular algorithms such as the k-NN-, HMM-, and SVM-based algorithms are often used. For many of these algorithms to perform their tasks, protein sequences need to be properly aligned first. Since the alignment process can be error-prone, protein classification may not be performed very accurately. To improve classification accuracy, we propose an algorithm, called the Unaligned Protein SEquence Classifier (UPSEC), which can perform its tasks without sequence alignment. UPSEC makes use of a probabilistic measure to identify residues that are useful for classification in both positive and negative training samples, and can handle multi-class classification with a single classifier and a single pass through the training data. UPSEC has been tested with real protein data sets. Experimental results show that UPSEC can effectively classify unaligned protein sequences into their corresponding functional families, and the patterns it discovers during the training process can be biologically meaningful.
4.
5.
Case-control studies are commonly used to study whether a candidate allele and a disease are associated. However, spurious association can arise due to population substructure or cryptic relatedness, which cause the variance of the trend test to increase. Devlin and Roeder derived the appropriate variance inflation factor (VIF) for the trend test and proposed a novel genomic control (GC) approach to estimate VIF and adjust the test statistic. Their results were derived assuming an additive genetic model and the corresponding VIF is independent of the candidate allele frequency. We determine the appropriate VIFs for recessive and dominant models. Unlike the additive test, the VIFs for the optimal tests for these two models depend on the candidate allele frequency. Simulation results show that, when the null loci used to estimate the VIF have allele frequencies similar to that of the candidate gene, the GC tests derived for recessive and dominant models remain optimal. When the underlying genetic model is unknown or the null loci and candidate gene have quite different allele frequencies, the GC tests derived for the recessive or dominant models cannot be used while the GC test derived for the additive model can be.
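The genomic control adjustment described in this abstract can be illustrated with a minimal Python sketch. It assumes the widely used median-based estimator for the additive trend test: the median of the statistics at null loci divided by the median of a chi-square distribution with 1 degree of freedom, floored at 1. This is an illustrative assumption and not the frequency-dependent VIFs that the abstract derives for recessive and dominant models. The function names and the example statistics are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom
# (the square of the 0.75 quantile of the standard normal).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate the variance inflation factor from null-locus trend statistics.

    Assumption: median-based estimator for the additive model, floored at 1
    so that the adjustment never inflates the candidate statistic.
    """
    lam = median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def gc_adjust(candidate_stat, null_stats):
    """Divide the candidate-locus statistic by the estimated inflation factor."""
    return candidate_stat / inflation_factor(null_stats)


if __name__ == "__main__":
    # Hypothetical trend-test statistics at five null loci.
    null_stats = [0.2, 0.5, 0.91, 1.4, 3.0]
    print(inflation_factor(null_stats))
    print(gc_adjust(8.0, null_stats))
```

With these made-up inputs the estimated inflation factor is roughly 2, so the candidate statistic is roughly halved before it is compared with the chi-square reference distribution.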
6.
This paper reviews recent applications of quantile regression in genetics and the emerging -omics fields. It begins with a general background on this statistical approach, following the seminal paper of Koenker and Bassett (Econometrica 46:33–50, 1978). The applications described are as diverse as genetic association studies, penetrance estimation, gene expression, CGH array experiments, RNAseq experiments, methylation data and proteomics. The paper also introduces recent extensions of quantile regression, with a particular focus on Copula-quantile regression, an approach we recently proposed for sib-pair analysis. A real data example from eQTL analysis is then presented, and the R code that runs the analyses is provided. Finally, we conclude with an overview of available statistical software and some general remarks on the potential and value of quantile regression in modern biological experiments.
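As background for the quantile regression approach reviewed in this abstract, the following minimal Python sketch shows the standard check (pinball) loss of Koenker and Bassett and verifies on a toy sample that the constant minimizing it at tau = 0.5 is the sample median. The function names and the data are hypothetical, and the brute-force search over observed values is only for illustration; it is not how the reviewed software fits models.

```python
def check_loss(u, tau):
    """Check (pinball) loss: u * (tau - 1) if u < 0, else u * tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Return the observed value that minimizes the total check loss.

    Brute-force search over the sample itself; a sample quantile always
    lies among the minimizers, so this is enough for a toy illustration.
    """
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))


if __name__ == "__main__":
    ys = [1, 2, 3, 4, 100]  # hypothetical data with one outlier
    print(best_constant(ys, 0.5))
```

The outlier at 100 does not move the minimizer away from the middle observation, which is the robustness property that motivates quantile-based analyses of skewed data.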
7.
Latent amino acid repeats seem to be widespread in genetic sequences and to reflect their structure, function, and evolution. We have recently identified latent periodicity in more than 150 protein families, including protein kinases and various nucleotide-binding proteins. The latent repeats in these families were correlated with their structure and evolution. However, a majority of known protein families were not identified with our latent periodicity search algorithm. The presumed main reason for this was the inability of our techniques to identify periodicities interspersed with insertions and deletions. We designed a new latent periodicity search algorithm that is capable of taking insertions and deletions into account. As a result, we identified many novel cases of latent periodicity peculiar to protein families. Possible origins of the periodic structure of these families are discussed. In summary, we presume that latent periodicity is present in a substantial portion of known protein families. The latent periodicity matrices and the results of Swiss-Prot scans are available from http://bioinf.narod.ru/del/.
8.
9.
A genetic analysis of various functions of the TyrR protein of Escherichia coli. Total citations: 3; self-citations: 9; citations by others: 3
The TyrR protein is involved in both repression and activation of the genes of the TyrR regulon. Correction of an error in a previously published sequence has revealed a Cro-like helix-turn-helix DNA-binding domain near the carboxyl terminus. Site-directed mutagenesis in this region has generated a number of mutants that can no longer repress or activate. Deletions of amino acid residues 5 to 42 produced a protein that could repress but not activate. The central domain of TyrR contains an ATP-binding site and is homologous with the NtrC family of activator proteins. A mutation to site A of the ATP-binding site and other mutations in this region affect tyrosine-mediated repression but do not prevent activation or phenylalanine-mediated repression of aroG.
10.
Various models proposed for heparin have been examined by a stereochemical approach involving contact distance criteria and potential energy calculations. The present study suggests that the model favored by Atkins and coworkers [Biochem. J. (1973) 135, 729–733 and (1974) 143, 251–252] is not stereochemically satisfactory. An alternative model has been proposed with N-acetyl-D-glucosamine and one of the uronides in the 4C1 conformation and the other uronide (probably sulfated) in the 1C4 conformation. The observed variations in the tetrasaccharide periodicities (16.5–17.3 Å) in different crystalline modifications of heparin have been attributed to possible changes in the rotational angles about the interunit glycosidic bonds rather than a change in the pyranose ring conformation. The proposed model is also independent of the observed variation in the relative composition of uronic acid residues in heparin. These conclusions are in disagreement with those of earlier workers.
11.
12.
Modern protein Fourier transform infrared (FT-IR) spectroscopy has proven to be a versatile and sensitive technique, applicable to many aspects of protein characterization. The major practical drawback for the FT-IR spectroscopy of proteins is the large absorbance band of water, which overlaps the amide I resonances. D2O is often substituted for H2O in infrared experiments. Removal of water from protein samples can be complicated and tedious and potentially lead to denaturation, aggregation, or sample loss. Solvent removal by dialysis is difficult for suspensions and sols. A new method called the D2O dilution technique (Ddt) is described which simplifies the sample preparation step and improves the solvent subtraction. The effect of the D2O concentration on the IR spectrum of aqueous solutions of several model proteins was studied. Dilution of aqueous samples with D2O yields good quality spectra. The Ddt has been evaluated for quantitative analysis using standard proteins and its applicability to solutions and suspensions of a genetically engineered malaria antigen is demonstrated. Use of resolution-enhancement techniques with spectra in mixed solvents has also been investigated.
13.
DNA fingerprinting analyses such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program for a clustering-based peak alignment algorithm, and its application to the analysis of various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by BioNumerics software showed high similarity. Since several R packages have been established to statistically analyze various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which were not previously performed but are useful in DNA fingerprinting studies.
14.
We have created a program that searches densely genotyped regions for associated non-contiguous haplotypes using a standard family based haplotype association test. This program was designed to expand upon the 'sliding window' methodologies commonly used for haplotype construction by allowing the association of subsets of single nucleotide polymorphisms (SNPs) to drive the construction of the haplotype. This strategy permits HaploBuild to construct more biologically relevant haplotypes that are not constrained by arbitrary length and contiguous orientation. Availability: http://snp.bumc.bu.edu.
15.
Marushchak D Grenklo S Johansson T Karlsson R Johansson LB 《Biophysical journal》2007,93(9):3291-3299
A new method, in which a genetic algorithm was combined with Brownian dynamics and Monte Carlo simulations, was developed to analyze fluorescence depolarization data collected by the time-correlated single photon-counting technique. It was applied to studies of BODIPY-labeled filamentous actin (F-actin). The technique registered the local order and reorienting motions of the fluorophores, which were covalently coupled to cysteine 374 (C374) in actin and interacted by electronic energy migration within the actin polymers. Analyses of F-actin samples composed of different fractions of labeled actin molecules revealed the known helical organization of F-actin, demonstrating the usefulness of this technique for structure determination of complex protein polymers. The distance from the filament axis to the fluorophore was found to be considerably less than expected from the proposed position of C374 at a high filament radius. In addition, polymerization experiments with BODIPY-actin suggest a 25-fold more efficient signal for filament formation than pyrene-actin.
16.
Maximum likelihood estimation of genetic parameters of HLA-linked diseases using data from families of various sizes. Total citations: 2; self-citations: 2; citations by others: 2
This paper is concerned with estimating parameters associated with HLA-linked diseases. We consider a single disease locus closely linked to HLA, allowing a disease and a normal allele. The parameters to be estimated are the penetrances of the genotypes at the disease locus, the population frequency of the disease allele, and the distance of the disease locus from HLA. The presently used method of estimation uses HLA-sharing information from affected sib-pairs. The method proposed here generalizes the previous approach, using data from all sibs (affected or unaffected) in a family of any size. It allows immediate generalizations to the use of information on parental affectedness status and population prevalence.
17.
Population genetic studies of serum protein polymorphisms in four Spanish populations. II. Total citations: 2; self-citations: 0; citations by others: 2
H W Goedde L Hirth H G Benkmann A Pellicer T Pellicer M Stahn S Singh 《Human heredity》1973,23(2):135-146
18.
19.
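Melting temperature shift (Tm-shift) is a new genotyping method in which GC sequences of different lengths are added to the 5′ ends of two specific primers; after PCR amplification, genotypes are called from the differences in product Tm values in the melting curves. In this study, the Tm-shift method was established to genotype 29 SNPs in 2,048 samples, and genotyping performance was evaluated by the genotyping success rate, the concordance of repeated assays, and the accuracy verified by sequencing. The results showed that 27 of the 29 SNPs could be genotyped with this method, a success rate of 93.1%. Accuracy verified by sequencing reached 100%. The concordance of repeated assays for positive standard controls of the three genotypes was 100%, and the reproducibility for 100 randomly selected samples assayed repeatedly was 97%. Tm-shift genotyping is therefore a low-cost, accurate and sensitive, stable and reliable, easy-to-use method with flexible throughput that can be widely applied in genetic research.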