Similar literature
 20 similar articles found (search time: 203 ms)
1.
Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

2.
MOTIVATION: Some genomic islands contain horizontally transferred genes, which play critical roles in altering the genotypes and phenotypes of organisms, and horizontal gene transfer has been recognized as a universal event throughout bacterial evolution. A windowless method to display the distribution of genomic GC content, the cumulative GC profile, is proposed to identify genomic islands in genomes whose complete genome sequences are available. Two new indices are proposed to assess the codon usage bias and amino acid usage bias in genomic islands. RESULTS: A 211 kb genomic island (CGGI-1) has been identified in the genome of Corynebacterium glutamicum, and three genomic islands VVGI-1, VVGI-2 and VVGI-3, with lengths 167, 40 and 33 kb, respectively, have been identified in the genome of Vibrio vulnificus CMCP6 chromosome I. CGGI-1 is flanked by two approximately 500 bp direct repeats, and utilizes a Val-tRNA as the integration site. VVGI-1 and VVGI-2 each have an integrase gene at the 5' junction. All the identified genomic islands show unusual GC content, codon usage and amino acid usage, compared with the rest of the genomes. In addition, it is found that genomic islands are fairly homogeneous in terms of GC content variation. An index, h, to quantify the homogeneity of GC content for genomic islands is proposed, and it is shown that h is less than 0.1 for all the genomic islands analyzed. The cumulative GC profile, as well as various indices to assess the codon usage bias, amino acid usage bias and homogeneity of the genomic islands, will be useful in the analysis of other genomes. AVAILABILITY: Programs used in this work and numerical results are available upon request.
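
The windowless idea in the abstract above can be illustrated with a short sketch. The abstract does not give the formula, so the definition used here is an assumption: a running sum that steps +1 for A/T and -1 for G/C, with the best-fit straight line through the origin subtracted so that regions of unusual GC content show up as sustained rises or falls. The function name `cumulative_gc_profile` is hypothetical and this is not the authors' program.

```python
def cumulative_gc_profile(seq):
    """Detrended cumulative (A+T) - (G+C) curve; an assumed definition.

    A sustained fall marks a region richer in GC than the genome average,
    a sustained rise marks a region poorer in GC.
    """
    seq = seq.upper()
    z, running = [], 0
    for base in seq:
        if base in "AT":
            running += 1
        elif base in "GC":
            running -= 1
        # Ambiguous symbols such as N leave the running sum unchanged.
        z.append(running)

    # Least-squares slope of a line through the origin: k = sum(n*z_n) / sum(n^2).
    num = sum((i + 1) * zi for i, zi in enumerate(z))
    den = sum((i + 1) ** 2 for i in range(len(z)))
    k = num / den if den else 0.0

    # Subtract the genome-wide trend so only local deviations remain.
    return [zi - k * (i + 1) for i, zi in enumerate(z)]


if __name__ == "__main__":
    # Toy sequence: 50% GC background with a pure-G "island" in the middle.
    toy = "AG" * 100 + "G" * 50 + "AG" * 100
    profile = cumulative_gc_profile(toy)
    print(profile[199], profile[249])  # the curve falls across the island
```

Because no sliding window is involved, the resolution is a single base; the cost is that the curve has to be read by eye or segmented by a separate step, which this sketch does not attempt.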

3.
A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters.

4.
Correlations between genomic GC content and amino acid frequencies were studied in the homologous sequences of 12 eubacterial genomes. Results show that the frequencies of amino acids encoded by GC-rich codons increase significantly with genomic GC content, whereas the opposite trend was observed for amino acids encoded by GC-poor codons. Further analysis shows that not all amino acids change in the direction predicted by genomic GC pressure, suggesting that protein evolution is not entirely dictated by nucleotide frequencies. An amino acid substitution matrix calculated among hydrophobic, amphipathic and hydrophilic amino acid groups shows that amphipathic and hydrophilic amino acids are more frequently substituted by hydrophobic amino acids than hydrophobic amino acids are by hydrophilic or amphipathic ones. This indicates that nucleotide bias induces directional changes in proteome composition that strongly alter hydropathy values. In fact, significant increases in hydrophobicity values have also been observed with increasing genomic GC content. Correlations between GC content and amino acid composition in three different predicted protein secondary structures show that hydropathy values increase significantly with GC content in aperiodic and helix structures, whereas strand structures remain insensitive to genomic GC levels. The relative importance of mutation and selection in the evolution of proteins is discussed on the basis of these results.

5.
Bielawski JP, Dunn KA, Yang Z. Genetics, 2000, 156(3): 1299-1308
Rates and patterns of synonymous and nonsynonymous substitutions have important implications for the origin and maintenance of mammalian isochores and the effectiveness of selection at synonymous sites. Previous studies of mammalian nuclear genes largely employed approximate methods to estimate rates of nonsynonymous and synonymous substitutions. Because these methods did not account for major features of DNA sequence evolution such as transition/transversion rate bias and unequal codon usage, they might not have produced reliable results. To evaluate the impact of the estimation method, we analyzed a sample of 82 nuclear genes from the mammalian orders Artiodactyla, Primates, and Rodentia using both approximate and maximum-likelihood methods. Maximum-likelihood analysis indicated that synonymous substitution rates were positively correlated with GC content at the third codon positions, but independent of nonsynonymous substitution rates. Approximate methods, however, indicated that synonymous substitution rates were independent of GC content at the third codon positions, but were positively correlated with nonsynonymous rates. Failure to properly account for transition/transversion rate bias and unequal codon usage appears to have caused substantial biases in approximate estimates of substitution rates.

6.
7.
Point estimation in group sequential and adaptive trials is an important issue in analysing a clinical trial. Most literature in this area is only concerned with estimation after completion of a trial. Since adaptive designs allow reassessment of sample size during the trial, reliable point estimation of the true effect when continuing the trial is additionally needed. We present a bias-adjusted estimator which allows a more exact sample size determination based on the conditional power principle than the naive sample mean does.

8.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distribution of GC levels of the third codon positions of genes from cold-blooded vertebrates is distinctly different from that of warm-blooded vertebrates in that it does not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

9.
Comparative genomics has revealed that variations in bacterial and archaeal genome DNA sequences cannot be explained by only neutral mutations. Virus resistance and plasmid distribution systems have resulted in changes in bacterial and archaeal genome sequences during evolution. The restriction-modification system, a virus resistance system, leads to avoidance of palindromic DNA sequences in genomes. Clustered, regularly interspaced, short palindromic repeats (CRISPRs) found in genomes represent yet another virus resistance system. Comparative genomics has shown that bacteria and archaea have failed to gain any DNA with GC content higher than the GC content of their chromosomes. Thus, horizontally transferred DNA regions have lower GC content than the host chromosomal DNA does. Some nucleoid-associated proteins bind DNA regions with low GC content and inhibit the expression of genes contained in those regions. This form of gene repression is another type of virus resistance system. On the other hand, bacteria and archaea have used plasmids to gain additional genes. Virus resistance systems influence plasmid distribution. Interestingly, the restriction-modification system and nucleoid-associated protein genes have been distributed via plasmids. Thus, GC content and genomic signatures do not reflect bacterial and archaeal evolutionary relationships.

10.
Cai J, Sen PK, Zhou H. Biometrics, 1999, 55(1): 182-189
A random effects model for analyzing multivariate failure time data is proposed. The work is motivated by the need for assessing the mean treatment effect in a multicenter clinical trial study, assuming that the centers are a random sample from an underlying population. An estimating equation for the mean hazard ratio parameter is proposed. The proposed estimator is shown to be consistent and asymptotically normally distributed. A variance estimator, based on large sample theory, is proposed. Simulation results indicate that the proposed estimator performs well in finite samples. The proposed variance estimator effectively corrects the bias of the naive variance estimator, which assumes independence of individuals within a group. The methodology is illustrated with a clinical trial data set from the Studies of Left Ventricular Dysfunction. This shows that the variability of the treatment effect is higher than found by means of simpler models.

11.
Human cytomegalovirus (HCMV) infection, a worldwide contagion, causes a serious disorder in infected individuals. Analysis of codon usage can reveal much molecular information about this virus. The effective number of codons (ENC) values, relative synonymous codon usage (RSCU) values, codon adaptation index (CAI), and nucleotide contents were investigated in approximately 160 coding sequences (CDS) among 17 human cytomegalovirus genomes using the software CodonW. Linear regression analysis and logistic regression were performed to explore the preliminary data. The results showed that, overall, HCMV genomes had low codon usage bias (mean ENC = 47.619). However, the ENC of individual CDS varied widely and was distributed unevenly between host-related genes and viral-self-function genes (P = 0.002, odds ratio (OR) = 3.194), as did the GC content (P = 0.016, OR = 2.178). The ENC values correlated with CAI, GC content, and the nucleotide composition at the third codon position (GC3s) (P < 0.001). There was a significant variation in the codon preference that depended on the RSCU data. The predicted ENC curve suggested that mutational pressure, rather than natural selection, was one of the main factors that determined the codon usage bias in HCMV. Among 123 genes with known function, the genes related to viral self-replication and viral–host interaction showed different ENC and CAI values, and GC and GC3s contents. In conclusion, the detailed codon usage bias theoretically revealed information concerning HCMV evolution and could be a valuable additional parameter for HCMV gene function research.
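
Two of the indices named in the abstract above, RSCU and GC3s, are simple enough to sketch directly. The code below is a minimal illustration, not CodonW itself: it assumes the standard genetic code, an in-frame coding sequence, and the common definitions (RSCU as a codon's count divided by the mean count over its synonymous family; GC3s as the share of G or C at the third position of codons other than Met, Trp and stop). The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons, skipping any triplet with ambiguous bases."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(cds):
    """RSCU per codon: observed count / mean count within its synonymous family."""
    counts = codon_counts(cds)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        # Stop codons, single-codon amino acids and unseen families carry no signal.
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons."""
    counts = codon_counts(cds)
    syn = {c: n for c, n in counts.items() if CODON_TABLE[c] not in "MW*"}
    total = sum(syn.values())
    gc = sum(n for c, n in syn.items() if c[2] in "GC")
    return gc / total if total else 0.0


if __name__ == "__main__":
    toy = "GCTGCTGCCGCA"  # four alanine codons
    print(rscu(toy)["GCT"], gc3s(toy))
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage within its family; values above 1 indicate preference.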

12.
Codon usage bias varies considerably among genomes and even within the genes of the same genome. In eukaryotic organisms, energy production in the form of oxidative phosphorylation (OXPHOS) is the only process under control of both nuclear and mitochondrial genomes. Although factors affecting codon usage in a single genome have been studied, this has not been done for cases in which two interacting genomes are involved. Consequently, we investigated whether or not other factors influence codon usage of coevolved genes. We used Drosophila melanogaster as a model organism. A χ² test showed that the numbers of codons of nuclear and mitochondrial genes involved in the OXPHOS system were significantly different (χ² = 7945.16, P < 0.01). A plot of effective number of codons against GC3s content of nuclear genes showed that few genes lie on the expected curve, indicating that codon usage was random. Correspondence analysis indicated a significant correlation between axis 1 and codon adaptation index (R = 0.947, P < 0.01) in every nuclear gene sequence. Thus, codon usage bias of nuclear genes appeared to be affected by translational selection. Correlation between axis 1 coordinates and GC content (R = 0.814, P < 0.01) indicated that the codon usage of nuclear genes was also affected by GC composition. Analysis of mitochondrial genes did not reveal a significant correlation between axis 1 and any parameter. Statistical analyses indicated that codon usages of both nDNA and mtDNA were subjected to context-dependent mutations.
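
The "expected curve" in an ENC versus GC3s plot is usually taken from Wright's approximation, which gives the ENC a gene would have if GC3s alone constrained its codon usage. The abstract does not state which formula was used, so the sketch below assumes that standard form, ENC = 2 + s + 29 / (s^2 + (1 - s)^2); the function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon choice is constrained only by GC3s.

    Assumes Wright's approximation: 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must lie between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


if __name__ == "__main__":
    # Tabulate the curve at a few GC3s values for plotting or comparison.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"GC3s={s:.2f}  expected ENC={expected_enc(s):.2f}")
```

Genes whose observed ENC falls well below this curve are typically read as being shaped by something beyond base composition, for example translational selection.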

13.
Analysis of codon usage patterns is important for understanding the genetic and evolutionary characteristics of genomes. We used bioinformatic approaches to analyze the codon usage bias (CUB) of the genes located on the human Y chromosome. The codon bias index (CBI) indicated that the overall extent of codon usage bias was low. The relative synonymous codon usage (RSCU) analysis suggested that approximately half of the 59 synonymous codons were most frequently used, and possessed a T or G at the third codon position. The codon usage pattern differed among genes, as revealed by correspondence analysis (COA). A significant correlation between the effective number of codons (ENC) and various GC contents suggests that both mutation pressure and natural selection affect the codon usage pattern of genes located on the human Y chromosome. In addition, Y-linked genes show significant differences in GC content at the second and third codon positions, expression level, and usage of some codons, like the SPANX genes on the X chromosome.

14.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.

15.
16.
Estimating effective population size or mutation rate with microsatellites   Cited by: 4 (self-citations: 0, citations by others: 4)
Xu H, Fu YX. Genetics, 2004, 166(1): 555-563
Microsatellites are short tandem repeats that are widely dispersed among eukaryotic genomes. Many of them are highly polymorphic; they have been used widely in genetic studies. Statistical properties of all measures of genetic variation at microsatellites critically depend upon the composite parameter θ = 4Nμ, where N is the effective population size and μ is the mutation rate per locus per generation. Since mutation leads to expansion or contraction of a repeat number in a stepwise fashion, the stepwise mutation model has been widely used to study the dynamics of these loci. We developed an estimator of θ, θ̂_F, on the basis of sample homozygosity under the single-step stepwise mutation model. The estimator is unbiased and is much more efficient than the variance-based estimator under the single-step stepwise mutation model. It also has smaller bias and mean square error (MSE) than the variance-based estimator when the mutation follows the multistep generalized stepwise mutation model. Compared with the maximum-likelihood estimator θ̂_L, θ̂_F has less bias and smaller MSE in general. θ̂_L has a slight advantage when θ is small, but in such a situation the bias in θ̂_L may be more of a concern.
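
To make the link between homozygosity and the composite parameter concrete, the sketch below inverts the classical single-step stepwise-mutation expectation F = 1 / sqrt(1 + 2θ), which gives θ = (1 / F^2 - 1) / 2. This plain moment inversion is an assumption chosen for illustration and is not the authors' estimator, which the abstract describes as unbiased; a direct inversion like this one generally is not. Function names and the toy data are hypothetical.

```python
from collections import Counter


def sample_homozygosity(alleles):
    """Probability that two distinct sampled gene copies carry the same allele."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def theta_from_homozygosity(alleles):
    """Invert F = 1 / sqrt(1 + 2*theta); a naive moment estimate of 4*N*mu."""
    f = sample_homozygosity(alleles)
    if f <= 0.0:
        raise ValueError("no repeated alleles in the sample; estimate is unbounded")
    return (1.0 / f ** 2 - 1.0) / 2.0


if __name__ == "__main__":
    # Repeat counts observed at one locus in six sampled chromosomes (toy data).
    repeats = [10, 10, 10, 11, 11, 12]
    print(sample_homozygosity(repeats), theta_from_homozygosity(repeats))
```

Given an independent estimate of the mutation rate, the same value can be read as an estimate of effective population size, and vice versa, which is the trade-off the article title refers to.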

17.
Ciliated protists contain both a germline micronucleus (MIC) and a somatic macronucleus (MAC) in a single cytoplasm. Programmed genome rearrangements occur in ciliates during sexual processes, and the extent of rearrangements varies dramatically among species, which leads to significant differences in genomic architectures. However, genomic sequences remain largely unknown for most ciliates due to the difficulty in culturing and in separating the germline from the somatic genome in a single cell. Single-cell whole genome amplification (WGA) has emerged as a powerful technology to characterize the genomic heterogeneity at the single-cell level. In this study, we compared two single-cell WGA methods, multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC), for characterizing the germline and somatic genomes in ciliates with different genomic architectures. Our results showed that: 1) MALBAC exhibits strong amplification bias towards the MAC genome while MDA shows bias towards the MIC genome of ciliates with an extensively fragmented MAC genome; 2) both MDA and MALBAC could amplify the MAC genome more efficiently in ciliates with a moderately fragmented MAC genome. Moreover, we found that more sample replicates could help to obtain more genomic data. Our work provides a reference for selecting the appropriate method to characterize germline and somatic genomes of ciliates.

18.
Kono N, Arakawa K, Tomita M. PLoS ONE, 2012, 7(4): e34526
In bacterial circular chromosomes and most plasmids, the replication is known to be terminated when either of the following occurs: the forks progressing in opposite directions meet at the distal end of the chromosome or the replication forks become trapped by Tus proteins bound to Ter sites. Most bacterial genomes have various polarities in their genomic structures. The most notable feature is polar genomic compositional asymmetry of the bases G and C in the leading and lagging strands, called GC skew. This asymmetry is caused by replication-associated mutation bias, and this "footprint" of the replication machinery suggests that, in contrast to the two known mechanisms, replication termination occurs near the chromosome dimer resolution site dif. To understand this difference between the known replication machinery and genomic compositional bias, we undertook a simulation study of genomic mutations, and we report here how different replication termination models contribute to the generation of replication-related genomic compositional asymmetry. Contrary to naive expectations, our results show that a single finite termination site at dif or at the GC skew shift point is not sufficient to reconstruct the genomic compositional bias as observed in published sequences. The results also show that the known replication mechanisms are sufficient to explain the position of the GC skew shift point.
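
GC skew and its shift points, as discussed in the abstract above, can be computed with a few lines. The sketch assumes the usual definition, (G - C) / (G + C) in non-overlapping windows, and uses the widely applied heuristic that the cumulative skew reaches its minimum near the replication origin and its maximum near the terminus. The window size and function names are illustrative choices, and this is not the simulation used in the study.

```python
def gc_skew_windows(seq, window=1000):
    """(G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of window skews; extrema mark the skew shift points."""
    total, out = 0.0, []
    for s in skews:
        total += s
        out.append(total)
    return out


def skew_shift_points(seq, window=1000):
    """Return (position of cumulative minimum, position of cumulative maximum).

    Positions are the end coordinates of the windows where the extrema occur.
    """
    cum = cumulative_skew(gc_skew_windows(seq, window))
    if not cum:
        raise ValueError("sequence is shorter than one window")
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    return (i_min + 1) * window, (i_max + 1) * window


if __name__ == "__main__":
    # Toy linearized "chromosome": G-rich first half, C-rich second half.
    toy = "G" * 40 + "C" * 40
    print(skew_shift_points(toy, window=10))
```

On a real circular chromosome the sequence start is arbitrary, so the coordinates returned here are relative to the chosen linearization.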

19.
Recent studies have suggested that the thermodynamic stability of mRNA secondary structure near the start codon can regulate translation efficiency in Escherichia coli, and that translation is more efficient the less stable the secondary structure. We survey the complete genomes of 340 species for signals of reduced mRNA secondary structure near the start codon. Our analysis includes bacteria, archaea, fungi, plants, insects, fishes, birds, and mammals. We find that nearly all species show evidence for reduced mRNA stability near the start codon. The reduction in stability generally increases with increasing genomic GC content. In prokaryotes, the reduction also increases with decreasing optimal growth temperature. Within genomes, there is variation in the stability among genes, and this variation correlates with gene GC content, codon bias, and gene expression level. For birds and mammals, however, we do not find a genome-wide trend of reduced mRNA stability near the start codon. Yet the most GC rich genes in these organisms do show such a signal. We conclude that reduced stability of the mRNA secondary structure near the start codon is a universal feature of all cellular life. We suggest that the origin of this reduction is selection for efficient recognition of the start codon by initiator-tRNA.

20.
Outcome misclassification occurs frequently in binary-outcome studies and can result in biased estimation of quantities such as the incidence, prevalence, cause-specific hazards, cumulative incidence functions, and so forth. A number of remedies have been proposed to address the potential misclassification of the outcomes in such data. The majority of these remedies lie in the estimation of misclassification probabilities, which are in turn used to adjust analyses for outcome misclassification. A number of authors advocate using a gold-standard procedure on a sample internal to the study to learn about the extent of the misclassification. With this type of internal validation, the problem of quantifying the misclassification also becomes a missing data problem as, by design, the true outcomes are only ascertained on a subset of the entire study sample. Although the process of estimating misclassification probabilities appears simple conceptually, the estimation methods proposed so far have several methodological and practical shortcomings. Most methods rely on missing outcome data being missing completely at random (MCAR), a rather stringent assumption which is unlikely to hold in practice. Some of the existing methods also tend to be computationally intensive. To address these issues, we propose a computationally efficient, easy-to-implement, pseudo-likelihood estimator of the misclassification probabilities under a missing at random (MAR) assumption, in studies with an available internal-validation sample. We present the estimator through the lens of studies with competing-risks outcomes, though the estimator extends beyond this setting. We describe the consistency and asymptotic distributional properties of the resulting estimator, and derive a closed-form estimator of its variance. The finite-sample performance of this estimator is evaluated via simulations. Using data from a real-world study with competing-risks outcomes, we illustrate how the proposed method can be used to estimate misclassification probabilities. We also show how the estimated misclassification probabilities can be used in an external study to adjust for possible misclassification bias when modeling cumulative incidence functions.
