CHEMICAL ENGINEERING (153 journals)

ACS Combinatorial Science     Full-text available via subscription   (Followers: 9)
Acta Crystallographica Section B: Structural Science, Crystal Engineering and Materials     Hybrid Journal   (Followers: 3)
Acta Polymerica     Hybrid Journal   (Followers: 6)
Additives for Polymers     Full-text available via subscription   (Followers: 19)
Adhesion Adhesives & Sealants     Hybrid Journal   (Followers: 4)
Advanced Chemical Engineering Research     Open Access   (Followers: 8)
Advanced Powder Technology     Hybrid Journal   (Followers: 13)
Advances in Chemical Engineering     Full-text available via subscription   (Followers: 15)
Advances in Chemical Engineering and Science     Open Access   (Followers: 21)
Advances in Polymer Technology     Hybrid Journal   (Followers: 11)
African Journal of Pure and Applied Chemistry     Open Access   (Followers: 4)
Annual Review of Analytical Chemistry     Full-text available via subscription   (Followers: 9)
Annual Review of Chemical and Biomolecular Engineering     Full-text available via subscription   (Followers: 10)
Anti-Corrosion Methods and Materials     Hybrid Journal   (Followers: 4)
Applied Petrochemical Research     Open Access   (Followers: 3)
Asia-Pacific Journal of Chemical Engineering     Hybrid Journal   (Followers: 6)
Biochemical Engineering Journal     Hybrid Journal   (Followers: 8)
Biomass Conversion and Biorefinery     Partially Free   (Followers: 5)
BMC Chemical Biology     Open Access   (Followers: 4)
Brazilian Journal of Chemical Engineering     Open Access   (Followers: 2)
Bulletin of the Chemical Society of Ethiopia     Open Access   (Followers: 1)
Carbohydrate Polymers     Hybrid Journal   (Followers: 8)
Catalysts     Open Access   (Followers: 6)
Chemical and Petroleum Engineering     Hybrid Journal   (Followers: 7)
Chemical and Process Engineering     Open Access   (Followers: 3)
Chemical and Process Engineering Research     Open Access   (Followers: 5)
Chemical Communications     Full-text available via subscription   (Followers: 29)
Chemical Engineering & Technology     Hybrid Journal   (Followers: 24)
Chemical Engineering and Processing: Process Intensification     Hybrid Journal   (Followers: 10)
Chemical Engineering and Science     Open Access   (Followers: 2)
Chemical Engineering Communications     Hybrid Journal   (Followers: 10)
Chemical Engineering Journal     Hybrid Journal   (Followers: 18)
Chemical Engineering Research and Design     Hybrid Journal   (Followers: 15)
Chemical Engineering Science     Hybrid Journal   (Followers: 10)
Chemical Geology     Hybrid Journal   (Followers: 9)
Chemical Papers     Hybrid Journal   (Followers: 3)
Chemical Product and Process Modeling     Full-text available via subscription   (Followers: 3)
Chemical Reviews     Full-text available via subscription   (Followers: 280)
Chemical Society Reviews     Full-text available via subscription   (Followers: 28)
Chemical Technology     Open Access   (Followers: 4)
ChemInform     Hybrid Journal   (Followers: 3)
Chemistry & Industry     Hybrid Journal   (Followers: 2)
Chemistry Central Journal     Open Access   (Followers: 5)
Chemistry of Materials     Full-text available via subscription   (Followers: 194)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 6)
ChemSusChem     Hybrid Journal   (Followers: 7)
Chinese Chemical Letters     Full-text available via subscription   (Followers: 1)
Chinese Journal of Chemical Engineering     Full-text available via subscription   (Followers: 3)
Chinese Journal of Chemical Physics     Hybrid Journal   (Followers: 1)
Coke and Chemistry     Hybrid Journal  
Coloration Technology     Hybrid Journal   (Followers: 1)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 8)
Computer Aided Chemical Engineering     Full-text available via subscription   (Followers: 2)
Computers & Chemical Engineering     Hybrid Journal   (Followers: 6)
Corrosion Reviews     Full-text available via subscription   (Followers: 4)
Crystal Research and Technology     Hybrid Journal   (Followers: 2)
Current Opinion in Chemical Engineering     Open Access   (Followers: 3)
Education for Chemical Engineers     Hybrid Journal   (Followers: 4)
European Polymer Journal     Hybrid Journal   (Followers: 41)
Fibers and Polymers     Full-text available via subscription   (Followers: 3)
Focusing on Modern Food Industry     Open Access   (Followers: 3)
Frontiers of Chemical Science and Engineering     Hybrid Journal   (Followers: 1)
Geochemistry International     Hybrid Journal  
Handbook of Powder Technology     Full-text available via subscription   (Followers: 2)
High Performance Polymers     Hybrid Journal  
Indian Chemical Engineer     Hybrid Journal   (Followers: 3)
Indian Journal of Chemical Technology (IJCT)     Open Access   (Followers: 12)
Industrial & Engineering Chemistry     Full-text available via subscription   (Followers: 9)
Industrial & Engineering Chemistry Research     Full-text available via subscription   (Followers: 17)
Industrial Chemistry Library     Full-text available via subscription   (Followers: 4)
International Journal of Chemical and Petroleum Sciences     Open Access   (Followers: 1)
International Journal of Chemical Engineering     Open Access   (Followers: 6)
International Journal of Chemical Reactor Engineering     Full-text available via subscription   (Followers: 3)
International Journal of Chemical Technology     Open Access   (Followers: 3)
International Journal of Chemoinformatics and Chemical Engineering     Full-text available via subscription   (Followers: 2)
International Journal of Food Science     Open Access   (Followers: 2)
International Journal of Industrial Chemistry     Open Access  
International Journal of Polymeric Materials     Hybrid Journal   (Followers: 3)
International Journal of Science and Engineering     Open Access   (Followers: 7)
International Journal of Waste Resources     Open Access   (Followers: 5)
ISRN Chemical Engineering     Open Access   (Followers: 4)
ISRN Polymer Science     Open Access   (Followers: 11)
Journal of Applied Crystallography     Hybrid Journal   (Followers: 4)
Journal of Applied Electrochemistry     Hybrid Journal   (Followers: 7)
Journal of Applied Polymer Science     Hybrid Journal   (Followers: 178)
Journal of Biomaterials Science, Polymer Edition     Hybrid Journal   (Followers: 8)
Journal of Chemical & Engineering Data     Full-text available via subscription   (Followers: 6)
Journal of Chemical Ecology     Hybrid Journal   (Followers: 1)
Journal of Chemical Engineering     Open Access   (Followers: 4)
Journal of Chemical Engineering and Materials Science     Open Access  
Journal of Chemical Science and Technology     Open Access   (Followers: 1)
Journal of Chemical Sciences     Partially Free   (Followers: 15)
Journal of Chemical Technology & Biotechnology     Hybrid Journal   (Followers: 2)
Journal of Chemical Theory and Computation     Full-text available via subscription   (Followers: 9)
Journal of Coatings     Open Access   (Followers: 2)
Journal of Crystallization Process and Technology     Open Access   (Followers: 5)
Journal of Food Measurement and Characterization     Hybrid Journal  
Journal of Fuel Chemistry and Technology     Full-text available via subscription   (Followers: 5)
Journal of Fuels     Open Access  
Journal of Geochemical Exploration     Hybrid Journal  


Computational Biology and Chemistry
   [10 followers]
   Hybrid Journal (it can contain Open Access articles)
     ISSN (Print) 1476-9271
     Published by Elsevier   [2566 journals]   [SJR: 0.558]   [H-I: 39]
  • Metabolic network motifs can provide novel insights into evolution: The
           evolutionary origin of Eukaryotic organelles as a case study
    • Abstract: Publication date: Available online 21 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Erin R. Shellman , Yu Chen , Xiaoxia Lin , Charles F. Burant , Santiago Schnell
      Phylogenetic trees are typically constructed using genetic and genomic data, and provide a robust relationship in the evolutionary origin of species from the genomic point of view. We present an application of network motif mining and analysis of metabolic pathways that when used in combination with phylogenetic trees can provide a more complete picture of evolution. By using distributions of three-node motifs as a proxy for metabolic similarity, we analyze the ancestral origin of Eukaryotic organelles from the metabolic point of view to illustrate the application of our motif mining and analysis network approach. Our analysis suggests that the hypothesis of an early proto-Eukaryote could be valid. It also suggests that a δ- or ϵ-Proteobacteria may have been the endosymbiotic partner that gave rise to modern mitochondria. Our evolutionary analysis needs to be extended by building metabolic network reconstructions of species from the phylum Crenarchaeota, which is considered to be a possible archaeal ancestor of the eukaryotic cell. In this paper, we also propose a methodology for constructing phylogenetic trees that incorporates metabolic network signatures to identify regions of genomically-estimated phylogenies that may be spurious. We find that results generated from our approach are consistent with a parallel phylogenetic analysis using the method of feature frequency profiles.

      PubDate: 2014-09-23T22:32:24Z
  • Computational insight into nitration of human myoglobin
    • Abstract: Publication date: Available online 18 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Ying-Wu Lin , Xiao-Gang Shu , Ke-Jie Du , Chang-Ming Nie , Ge-Bo Wen
      Protein nitration is an important post-translational modification regulating protein structure and function, especially for heme proteins. Myoglobin (Mb) is an ideal protein model for investigating the structure and function relationship of heme proteins. With limited structural information available for nitrated heme proteins from experiments, we herein performed a molecular dynamics study of human Mb with successive nitration of Tyr103, Tyr146, Trp7 and Trp14. We made a detailed comparison of protein motions, intramolecular contacts and internal cavities of nitrated Mbs with those of native Mb. The results showed that although nitration of both Tyr103 and Tyr146 slightly alters the local conformation of the heme active site, further nitration of both Trp7 and Trp14 shifts helix A away from the rest of the protein, which alters the internal cavities and forms a water channel, representing an initial stage of Mb unfolding. The computational study provides insight into the nitration of heme proteins at the atomic level, which is valuable for understanding the structure and function relationship of heme proteins in non-native states induced by nitration.

      PubDate: 2014-09-19T22:09:43Z
  • Orphan and gene related CpG Islands follow power-law-like distributions in
           several genomes: Evidence of function-related and taxonomy-related modes
           of distribution
    • Abstract: Publication date: Available online 16 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Giannis Tsiagkas , Christoforos Nikolaou , Yannis Almirantis
      CpG Islands (CGIs) are compositionally defined short genomic stretches, which have been studied in the human, mouse, chicken and later in several other genomes. Initially, they were assigned the role of transcriptional regulation of protein-coding genes, especially the house-keeping ones, while more recently evidence has been found that they are involved in several other functions as well, which might include regulation of the expression of RNA genes, DNA replication, etc. Here, an investigation of their distributional characteristics in a variety of genomes is undertaken both for whole CGI populations and for CGI subsets that lie away from known genes (gene-unrelated or “orphan” CGIs). In both cases power-law-like linearity in double logarithmic scale is found. An evolutionary model, initially put forward to explain a similar pattern found in gene populations, is implemented. It includes segmental duplication events and eliminations of most of the duplicated CGIs, while a moderate rate of non-duplicated CGI eliminations is also applied in some cases. Simulations reproduce all the main features of the observed inter-CGI chromosomal size distributions. Our results on power-law-like linearity found in orphan CGI populations suggest that the observed distributional pattern is independent of the analogous pattern that protein-coding segments were reported to follow. The power-law-like patterns in the genomic distributions of CGIs described herein are found to be compatible with several other features of the composition, abundance or functional role of CGIs reported in the current literature across several genomes, on the basis of the proposed evolutionary model.

      PubDate: 2014-09-19T22:09:43Z
  • Seeding-inspired Chemotaxis Genetic Algorithm for the Inference of
           Biological Systems
    • Abstract: Publication date: Available online 18 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Shinq-Jen Wu , Cheng-Tao Wu
      A major challenge in the post-genomic era is to obtain quantitative, dynamic information on the interactions among the important constituents of underlying systems. The S-system is a dynamic and structurally rich model that determines the net strength of interactions between genes and/or proteins. Good generation characteristics without the need for prior information have allowed S-systems to become one of the most promising canonical models. Various evolutionary computation technologies have recently been developed for the identification of system parameters and skeletal-network structures. However, the gaps between the truncated and preserved terms remain too small. Additionally, current research methods fail to identify the structures of high-dimensional systems (e.g., 30 genes with 1800 connections). Optimization technologies should converge fast and have the ability to adaptively adjust the search. In this study, we propose a seeding-inspired chemotaxis genetic algorithm (SCGA) that can force evolution to adjust the population movement to identify a favorable location. The seeding-inspired training strategy is a method to achieve optimal results with limited resources. SCGA introduces seeding-inspired genetic operations to allow a population to possess competitive power (exploitation and exploration) and a winner-chemotaxis-induced population migration to force a population to repeatedly tumble away from an attractor and swim toward another attractor. SCGA was tested on several canonical biological systems. SCGA not only learned the correct structure within only one to three pruning steps but also ensured pruning safety. The values of the truncated terms were all smaller than 10⁻¹⁴, even for a 30-gene system.

      PubDate: 2014-09-19T22:09:43Z
  • Identification of potential drug targets by subtractive genome analysis of
           Bacillus anthracis A0248: An In silico approach
    • Abstract: Publication date: Available online 18 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Md. Anisur Rahman , Md. Sanaullah Noore , Md. Anayet Hasan , Md. Rakib Ullah , Md. Hafijur Rahman , Md. Amzad Hossain , Yeasmeen Ali , Md. Saiful Islam
      Background: Bacillus anthracis is a gram-positive, spore-forming, rod-shaped bacterium that is the etiologic agent of cutaneous, pulmonary and gastrointestinal anthrax. A recent outbreak of anthrax in a tropical region uncovered natural and in vitro resistance against penicillin, ciprofloxacin and quinolone due to overexposure of the pathogen to these antibiotics. This fact, combined with the ongoing threat of Bacillus anthracis being used as a biological weapon, shows that the identification of new therapeutic targets is urgently needed.
      Methods: In this computational approach, various databases and online servers were used to detect essential proteins of Bacillus anthracis A0248. Protein sequences of the Bacillus anthracis A0248 strain were retrieved from the NCBI database and then run through the CD-hit suite for clustering. NCBI BlastP against the human proteome and a similarity search against DEG were performed to identify essential human non-homologous proteins. Proteins involved in unique pathways were analyzed using the KEGG genome database, and three tools (PSORTb, CELLO v.2.5 and ngLOC) were used to deduce putative cell surface proteins.
      Results: Successive analysis revealed 116 proteins to be essential human non-homologs, among which 17 were involved in unique metabolic pathways and 28 were predicted to be membrane-associated proteins. Both types of proteins can be exploited as they are unlikely to have homologous counterparts in the human host.
      Conclusion: Being human non-homologous, these proteins can be targeted for potential therapeutic drug development in the future. Targeting unique metabolic and membrane-bound proteins can block cell wall synthesis, bacterial replication and signal transduction.

      PubDate: 2014-09-19T22:09:43Z
  • Hierarchical closeness efficiently predicts disease genes in a directed
           signaling network
    • Abstract: Publication date: Available online 19 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Tien-Dzung Tran , Yung-Keun Kwon
      Background: Many structural centrality measures have been proposed to predict putative disease genes on biological networks. Closeness is one of the best-known structural centrality measures, and its effectiveness for disease gene prediction on undirected biological networks has been frequently reported. However, it is not clear whether closeness is effective for disease gene prediction on directed biological networks such as signaling networks.
      Results: In this paper, we first show that closeness does not significantly outperform other well-known centrality measures such as Degree, Betweenness, and PageRank for disease gene prediction on a human signaling network. In addition, we observed that prediction accuracy by the closeness measure was worse than that by a reachability measure, but closeness could efficiently predict disease genes among a set of genes with the same reachability value. Based on this observation, we devised a novel structural measure, hierarchical closeness, by combining reachability and closeness such that all genes are first ranked by the degree of reachability and then the tied genes are further ranked by closeness. We discovered that hierarchical closeness outperforms other structural centrality measures in disease gene prediction. We also found that the set of highly ranked genes in terms of hierarchical closeness is clearly different from that of hub genes with high connectivity. More interestingly, these findings were consistently reproduced in a random Boolean network model. Finally, we found that genes with relatively high hierarchical closeness are significantly likely to encode proteins in the extracellular matrix and receptor proteins in a human signaling network, supporting the fact that half of all modern medicinal drugs target receptor-encoding genes.
      Conclusion: Taken together, hierarchical closeness proposed in this study is a novel structural measure to efficiently predict putative disease genes in a directed signaling network.

      PubDate: 2014-09-19T22:09:43Z
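The ranking rule described in the abstract above (rank by reachability first, then break ties by closeness) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that reachability is the number of nodes reachable from a gene along directed edges, and that closeness is that count divided by the sum of shortest-path lengths. The function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths from source to every other node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[source]  # the source itself does not count as reached
    return dist


def hierarchical_ranking(graph):
    """Rank nodes by reachability, breaking ties by closeness (both assumed definitions)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    scores = {}
    for node in nodes:
        dist = bfs_distances(graph, node)
        reach = len(dist)
        closeness = reach / sum(dist.values()) if dist else 0.0
        scores[node] = (reach, closeness)
    # Higher reachability first, then higher closeness, then name for determinism.
    ranking = sorted(nodes, key=lambda n: (-scores[n][0], -scores[n][1], n))
    return ranking, scores


# Hypothetical toy network: X and P both reach two nodes, but X reaches them faster.
toy_graph = {"X": ["Y", "Z"], "P": ["Q"], "Q": ["R"]}
ranking, scores = hierarchical_ranking(toy_graph)
print(ranking)
```

In the toy network, X and P are tied on reachability, so closeness alone decides their order; closeness never overrides a difference in reachability.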
  • Comparative analysis of periodicity search methods in DNA sequences
    • Abstract: Publication date: Available online 15 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Yulia M. Suvorova , Maria A. Korotkova , Eugene V. Korotkov
      To determine the periodicity of a DNA sequence, different spectral approaches are applied: discrete Fourier transform (DFT), autocorrelation (CORR), information decomposition (ID), hybrid method (HYB), concept of spectral envelope for spectral analysis (SE), normalized autocorrelation (CORR_N) and profile analysis (PA). In this work, we investigated how the possibility of finding the true period length depends on the average number of accumulated changes in DNA bases (PM) for the methods stated above. The results show that for short periods (≤4 b.p.), it is possible to use the hybrid method (HYB), which combines properties of autocorrelation, Fourier transform, and information decomposition (ID). For larger period lengths (>4 b.p.) with point mutation (PM) values of 1.0 or more per nucleotide, it is preferable to use the information decomposition method (ID), as the other spectral approaches cannot correctly determine the period length present in the analyzed sequence.

      PubDate: 2014-09-15T21:42:45Z
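To make the autocorrelation idea mentioned in the abstract above concrete, here is a minimal sketch of a match-based autocorrelation for a symbolic sequence. It is only an illustration under simple assumptions (exact symbol matches, no normalization against base composition) and does not reproduce the CORR, CORR_N or other methods compared in the paper. The function names and the toy sequence are hypothetical.

```python
def match_autocorrelation(sequence, lag):
    """Fraction of positions i where sequence[i] equals sequence[i + lag]."""
    pairs = len(sequence) - lag
    if pairs <= 0:
        raise ValueError("lag must be shorter than the sequence")
    matches = sum(1 for i in range(pairs) if sequence[i] == sequence[i + lag])
    return matches / pairs


def estimate_period(sequence, max_lag):
    """Return the smallest lag in 1..max_lag with the highest match fraction."""
    best_lag, best_score = 0, -1.0
    for lag in range(1, max_lag + 1):
        score = match_autocorrelation(sequence, lag)
        if score > best_score:  # strict comparison keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag, best_score


# Hypothetical toy sequence: a perfect repeat of a 5-base unit, without mutations.
toy_sequence = "ACGTT" * 20
period, score = estimate_period(toy_sequence, max_lag=12)
print(period, score)
```

On a mutation-free repeat the true period gives a match fraction of 1; as point mutations accumulate, that peak drops toward the background level, which is the regime the abstract is concerned with.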
  • newDNA-Prot: Prediction of DNA-binding proteins by employing support
           vector machine and a comprehensive sequence representation
    • Abstract: Publication date: Available online 15 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Yanping Zhang , Jun Xu , Wei Zheng , Chen Zhang , Xingye Qiu , Ke Chen , Jishou Ruan
      Identification of DNA-binding proteins is essential in studying cellular activities as the DNA-binding proteins play a pivotal role in gene regulation. In this study, we propose newDNA-Prot, a DNA-binding protein predictor that employs a support vector machine classifier and a comprehensive feature representation. The sequence features are categorized into 6 groups: primary sequence based, evolutionary profile based, predicted secondary structure based, predicted relative solvent accessibility based, physicochemical property based and biological function based features. The mRMR, Wrapper and two-stage feature selection methods are employed for removing irrelevant features and reducing redundant features. Experiments demonstrate that the two-stage method performs better than the mRMR and Wrapper methods. We also perform a statistical analysis on the selected features, and the results show that more than 95% of the selected features are statistically significant and that they cover all 6 feature groups. The newDNA-Prot method is compared with several state-of-the-art algorithms, including iDNA-Prot, DNAbinder and DNA-Prot. The results demonstrate that the newDNA-Prot method outperforms the iDNA-Prot, DNAbinder and DNA-Prot methods. More specifically, newDNA-Prot improves on the runner-up method, DNA-Prot, by around 10% on several evaluation measures. The proposed newDNA-Prot method is available at

      PubDate: 2014-09-15T21:42:45Z
  • Genome-wide evidence of positive selection in Bacteroides fragilis
    • Abstract: Publication date: Available online 6 September 2014
      Source:Computational Biology and Chemistry
      Author(s): Sumio Yoshizaki , Toshiaki Umemura , Kaori Tanaka , Kunitomo Watanabe , Masahiro Hayashi , Yoshinori Muto
      We used an evolutionary genomics approach to identify genes that are under lineage-specific positive selection in six species of the genus Bacteroides, including three strains of pathogenic B. fragilis. Using OrthoMCL, we identified 1275 orthologous gene clusters present in all eight Bacteroides genomes. A total of 52 genes were identified as under positive selection in the branch leading to the B. fragilis lineage, including a number of genes encoding cell surface proteins such as TonB-dependent receptor. Three-dimensional structural mapping of positively selected sites indicated that many residues under positive selection occur in the extracellular loops of the proteins. The adaptive changes in these positively selected genes might be related to dynamic interactions between the host immune systems and the surrounding intestinal environment.

      PubDate: 2014-09-06T21:11:41Z
  • Conserved patterns in bacterial genomes: a conundrum physically tailored
           by evolutionary tinkering
    • Abstract: Publication date: Available online 30 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Ivan Junier
      The proper functioning of bacteria is encoded in their genome at multiple levels or scales, each of which is constrained by specific physical forces. At the smallest spatial scales, interatomic forces dictate the folding and function of proteins and nucleic acids. On longer length scales, stochastic forces emerging from the thermal jiggling of proteins and RNAs impose strong constraints on the organization of genes along chromosomes, more particularly in the context of the building of nucleoprotein complexes and the operational mode of regulatory agents. At the cellular level, transcription, replication and cell division activities generate forces that act on both the internal structure and cellular location of chromosomes. The overall result is a complex multi-scale organization of genomes that reflects the evolutionary tinkering of bacteria. The goal of this review is to highlight avenues for deciphering this complexity by focusing on patterns that are conserved among evolutionarily distant bacteria. To this end, I discuss three different organizational scales: the protein structures, the chromosomal organization of genes and the global structure of chromosomes.

      PubDate: 2014-09-02T20:59:48Z
  • Exploring the complexity of pathway-drug relationships using latent
           Dirichlet allocation
    • Abstract: Publication date: Available online 24 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Naruemon Pratanwanich , Pietro Lio’
      Analysis of cellular responses to diverse stimuli enables the exploration of the complexity of functional genomics. Typically, high-throughput microarray data allow us to identify genes that are differentially expressed under a phenomenon of interest. To extract meaning from the long list of differentially expressed genes, we present a new method, “pathway-based LDA”, to determine pathways/gene sets that are perturbed after exposure to different chemicals. In this study, a pathway is defined as a group of functionally related genes. Specifically, we have implemented a probabilistic Latent Dirichlet Allocation (LDA) model to learn drug-pathway-gene relations by taking known gene-pathway memberships as prior knowledge. We applied the pathway-based LDA model with 236 known pathways in order to determine pathway responsiveness from gene expression data of 1169 drugs. Our method yielded a better predictive performance on pathway responsiveness to drug treatments than existing methods. Moreover, the pathway-based LDA also revealed the genes contributing the most to each pre-defined pathway through a probabilistic distribution of genes. In achieving that, our method could provide a useful estimator of the pathway complexity of a genome.

      PubDate: 2014-09-02T20:59:48Z
  • Entropy and long-range correlations in DNA sequences
    • Abstract: Publication date: Available online 27 August 2014
      Source:Computational Biology and Chemistry
      Author(s): S.S. Melnik , O.V. Usatenko
      We analyze the structure of DNA molecules of different organisms by using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform statistical analysis of the corresponding “texts”. We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Supposing that the correlations are weak, we express the conditional probability function of the chain by means of the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses two-point correlators instead of block occurrence probabilities, it makes it possible to calculate the entropy of subsequences at much longer distances than with the standard methods. We utilize the obtained analytical result for numerical evaluation of the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for biological classification of living species.

      PubDate: 2014-09-02T20:59:48Z
  • Self-organizing approach for meta-genomes
    • Abstract: Publication date: Available online 24 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Jianfeng Zhu , Wei-Mou Zheng
      We extend the self-organizing approach for annotation of a bacterial genome to analyzing the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven ‘phases’, among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or ‘codon usages’. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome.

      PubDate: 2014-09-02T20:59:48Z
  • Large replication skew domains delimit GC-poor gene deserts in human
    • Abstract: Publication date: Available online 27 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Lamia Zaghloul , Guénola Drillon , Rasha E. Boulos , Françoise Argoul , Claude Thermes , Alain Arneodo , Benjamin Audit
      Besides their large-scale organization in isochores, mammalian genomes display megabase-sized regions, spanning both genes and intergenes, where the strand nucleotide composition asymmetry decreases linearly, possibly due to replication activity. These so-called skew-N domains cover about a third of the human genome and are bordered by two skew upward jumps that were hypothesized to compose a subset of “master” replication origins active in the germline. Skew-N domains were shown to exhibit a particular gene organization. Genes with CpG-rich promoters likely expressed in the germline are overrepresented near the master replication origins, with large genes being co-oriented with replication fork progression, which suggests some coordination of replication and transcription. In this study, we describe another skew structure that covers ∼13% of the human genome and that is bordered by putative master replication origins similar to the ones flanking skew-N domains. These skew-split-N domains have a shape reminiscent of an N, but split in half, leaving in the center a region of null skew whose length increases with domain size. These central regions (median size ∼860 kb) have a homogeneous composition, i.e. both a null and constant skew and a constant and low GC content. They correspond to heterochromatin gene deserts found in low-GC isochores with an average gene density of 0.81 promoters/Mb as compared to 7.73 promoters/Mb genome-wide. The analysis of epigenetic marks and replication timing data confirms that, in these late-replicating heterochromatic regions, the initiation of replication is likely to be random. This contrasts with the transcriptionally active euchromatin state found around the bordering well-positioned master replication origins. Altogether, skew-N domains and skew-split-N domains cover about 50% of the human genome.

      PubDate: 2014-09-02T20:59:48Z
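The strand composition asymmetry (skew) discussed in the abstract above can be illustrated with a small windowed calculation. This is a minimal sketch, not the authors' pipeline: it assumes the commonly used definition S = S_TA + S_GC with S_TA = (T−A)/(T+A) and S_GC = (G−C)/(G+C), which the abstract itself does not spell out, and the function name `skew_profile` and the non-overlapping window scheme are hypothetical choices.

```python
def skew_profile(seq, window):
    """Total skew S = S_TA + S_GC in consecutive non-overlapping windows.

    Assumed definition (not stated in the abstract):
    S_TA = (T - A) / (T + A), S_GC = (G - C) / (G + C).
    """
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        a, c, g, t = (chunk.count(base) for base in "ACGT")
        # Guard against windows that lack A/T or G/C entirely.
        s_ta = (t - a) / (t + a) if (t + a) else 0.0
        s_gc = (g - c) / (g + c) if (g + c) else 0.0
        profile.append(s_ta + s_gc)
    return profile
```

On real data the window would be in the kilobase range; a skew-N domain would then appear as an upward jump followed by a roughly linear decrease in the returned profile.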
  • On K-Peptide Length in Composition Vector Phylogeny of Prokaryotes
    • Abstract: Publication date: Available online 20 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Guanghong Zuo , Qiang Li , Bailin Hao
      Using an enlarged alphabet of K-tuples is the way to carry out alignment-free comparison of genomes in the composition vector (CV) approach to prokaryotic phylogeny. We summarize the known aspects concerning the choice of K and examine the results of using CVs with subtraction of a statistical background for K = 3 to 9 and using raw CVs without subtraction for K = 1 to 12. The criterion for evaluation consists in direct comparison with taxonomy. For prokaryotes the best performances are obtained for K = 5 and 6 with subtraction and for K = 11, 12 or even more without subtraction. In general, CVs with subtractions are slightly better and less CPU-consuming, but CVs without subtraction may provide complementary information.

      PubDate: 2014-09-02T20:59:48Z
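To make the raw composition vector idea from the abstract above concrete, here is a minimal sketch that builds K-tuple frequency vectors without background subtraction and compares two sequences. The helper names `raw_cv` and `cv_distance` are hypothetical, and the cosine-based dissimilarity (1 − C)/2 is an assumption about how such vectors are typically compared, not something stated in the abstract. Sequences are assumed to be at least K letters long.

```python
from collections import Counter
from math import sqrt


def raw_cv(seq, k):
    """Raw composition vector: relative frequency of every K-tuple in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cv_distance(u, v):
    """Assumed dissimilarity (1 - cosine similarity) / 2, ranging from 0 to 1."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return (1.0 - dot / (norm_u * norm_v)) / 2.0
```

Two sequences sharing no K-tuples score 0.5 under this measure, and identical compositions score 0. The subtraction variant described in the abstract would replace the raw frequencies with deviations from a statistical background before the comparison.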
  • Complexity Measures for the Evolutionary Categorisation of Organisms
    • Abstract: Publication date: Available online 28 August 2014
      Source:Computational Biology and Chemistry
      Author(s): A. Provata , C. Nicolis , G. Nicolis
      Complexity measures are used to compare the genomic characteristics of five organisms belonging to distinct classes spanning the evolutionary tree: higher eukaryotes, amoebae, unicellular eukaryotes and bacteria. The comparisons are undertaken using the full four-letter alphabet and the coarse grained two-letter alphabets AG-CT and AT-CG. We show that the conditional probability matrix for the four-letter and AT-CG alphabet is markedly asymmetric in eukaryotes while it is nearly symmetric in bacterial genomes. Spatial asymmetry is revealed in the four-letter alphabet, signifying that the probability fluxes are nonvanishing and thus the reading sense of a sequence is irreversible for all organisms. Calculations of the block entropy and excess entropy demonstrate that the human genome better accommodates all possible block configurations, especially for long blocks. With respect to point-to-point details and to the spatial arrangement of blocks, the exit distance distributions from a particular letter demonstrate long distance characteristics in the eukaryotic sequences for all three alphabets, while the bacterial (prokaryotic) genomes deviate, indicating short range characteristics. Overall, the conditional probability, the fluxes, the block entropy content and the exit distance distributions can be used as markers discriminating between eukaryotic and prokaryotic DNA, in many cases allowing details related to finer classes to be discerned. In all cases the reduction from four letters to two masks some important statistical and spatial properties, with the AT-CG alphabet having higher ability of discrimination than the AG-CT one. In particular, the AT-CG alphabet reduction accentuates the CpG related properties (conditional probabilities w 32 , long ranged exit distance distribution for A and T nucleotides), but masks sequence asymmetry and irreversibility in all examined organisms.

      PubDate: 2014-09-02T20:59:48Z
  • Characterizing regions in the human genome unmappable by
           next-generation-sequencing at the read length of 1000 bases
    • Abstract: Publication date: Available online 20 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Wentian Li , Jan Freudenberg
      Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: hypothetical reads with length 1kb which cannot be uniquely mapped with zero-mismatch alignment for the described regions, considering both the forward and reverse strand. The respective collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (overall 1.23%). Not surprisingly, our unmappable regions overlap greatly with segmental duplication, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplication or transposable elements and 98.3% overlap structural variant annotations. Notably, some of these regions overlap units with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with the ultraconserved elements, very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project.

      PubDate: 2014-09-02T20:59:48Z
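The definition of unmappability in the abstract above (reads that cannot be placed uniquely with zero mismatches on either strand) can be sketched on a toy scale as follows. This is an illustrative brute-force version, not the authors' method: the function names are hypothetical, the whole sequence is held in memory, and a read that equals its own reverse complement is treated as unique, which is an assumption.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def non_unique_starts(genome, read_len):
    """Start positions whose exact read also occurs elsewhere on either strand."""
    n_reads = len(genome) - read_len + 1
    counts = Counter(genome[i:i + read_len] for i in range(n_reads))
    starts = []
    for i in range(n_reads):
        read = genome[i:i + read_len]
        rc = reverse_complement(read)
        # Forward-strand copies plus copies of the read on the reverse strand.
        hits = counts[read] + (counts[rc] if rc != read else 0)
        if hits > 1:
            starts.append(i)
    return starts
```

At the scale of a full human reference with 1 kb reads, an exhaustive dictionary like this is impractical; the sketch only makes the zero-mismatch, two-strand uniqueness criterion explicit.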
  • Evidence of a cancer type-specific distribution for consecutive somatic
           mutation distances
    • Abstract: Publication date: Available online 23 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Jose M. Muiño , Ercan E. Kuruoğlu , Peter F. Arndt
      Specific molecular mechanisms may affect the pattern of mutation in particular regions, thereby leaving a footprint or signature of their activity in the DNA. The common approach to identifying these signatures is to study the frequency of substitutions. However, such an analysis ignores spatial information, which is important with regard to the mutation occurrence statistics. In this work, we propose that the study of the distribution of distances between consecutive mutations along the DNA molecule can provide information about the types of somatic mutational processes. In particular, we have found that specific cancer types show a power law in interoccurrence distances, instead of the exponential distribution expected under the Poisson assumption commonly made in the literature. Cancer genomes exhibiting power-law interoccurrence distances were enriched in cancer types where the main mutational process is described to be the activity of the APOBEC protein family, which produces a particular pattern of mutations called Kataegis. Therefore, the observation of a power law in interoccurrence distances could be used to identify cancer genomes with Kataegis.

      PubDate: 2014-09-02T20:59:48Z
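The interoccurrence distances described in the abstract above are simple to compute from a list of mutation coordinates. The sketch below also reports the coefficient of variation as a crude diagnostic: for an exponential distribution the standard deviation equals the mean, so values well above 1 hint at clustering or heavy tails. This is an assumed stand-in for illustration, not the statistical procedure used in the paper, and both function names are hypothetical.

```python
from statistics import mean, pstdev


def interoccurrence_distances(positions):
    """Distances between consecutive mutation positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def coefficient_of_variation(distances):
    """Population standard deviation divided by the mean.

    Close to 1 for exponentially distributed distances (Poisson process);
    larger values suggest clustered, heavy-tailed spacing.
    """
    return pstdev(distances) / mean(distances)
```

A proper comparison between a power law and an exponential would require fitting both models and testing goodness of fit, which is beyond this sketch.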
  • DNA clustering and genome complexity
    • Abstract: Publication date: Available online 23 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Francisco Dios , Guillermo Barturen , Ricardo Lebrón , Antonio Rueda , Michael Hackenberg , José L. Oliver
      Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, then quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can be in turn clustered into high-level super-clusters. The observation of ‘clusters-within-clusters’ parallels the ‘domains within domains’ phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.

      PubDate: 2014-09-02T20:59:48Z
  • Bacterial genomes lacking long-range correlations may not be modeled by
           low-order Markov chains: the role of mixing statistics and frame shift of
           neighboring genes
    • Abstract: Publication date: Available online 30 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Germinal Cocho , Pedro Miramontes , Ricardo Mansilla , Wentian Li
      We examine in detail the relationship between exponential correlation functions and Markov models in a bacterial genome. Despite the well-known fact that Markov models generate sequences whose correlation functions decay exponentially, simply constructed Markov models based on nearest-neighbor dimer (first-order), trimer (second-order), up to hexamer (fifth-order), and treating the DNA sequence as being homogeneous, all fail to predict the value of the exponential decay rate. Even reading-frame-specific Markov models (both first- and fifth-order) could not explain the fact that the exponential decay is very slow. Starting with the in-phase coding-DNA-sequence (CDS), we investigated correlation within a fixed-codon-position subsequence, and in artificially constructed sequences by packing CDSs with out-of-phase spacers, as well as altering CDS length distribution by imposing an upper limit. From these targeted analyses, we conclude that the correlation in the bacterial genomic sequence is mainly due to a mixing of heterogeneous statistics at different codon positions, and the decay of correlation is due to the possible out-of-phase arrangement of neighboring CDSs. There are also small contributions to the correlation from bases at the same codon position, as well as from non-coding sequences. These show that the seemingly simple exponential correlation functions in the bacterial genome hide a complexity in correlation structure which is not suitable for modeling by a Markov chain in a homogeneous sequence. Other results include: use of the (absolute value) second largest eigenvalue to represent the 16 correlation functions and the prediction of a 10–11 base periodicity from the hexamer frequencies.

      PubDate: 2014-09-02T20:59:48Z
  • A new method for predicting essential proteins based on dynamic network
           topology and complex information
    • Abstract: Publication date: October 2014
      Source:Computational Biology and Chemistry, Volume 52
      Author(s): Jiawei Luo , Ling Kuang
      Predicting essential proteins is highly significant because organisms cannot survive or develop even if only one of these proteins is missing. Improvements in high-throughput technologies have resulted in a large number of available protein–protein interactions. By taking advantage of these interaction data, researchers have proposed many computational methods to identify essential proteins at the network level. Most of these approaches focus on the topology of a static protein interaction network. However, the protein interaction network changes with time and condition. These important inherent dynamics of the protein interaction network are overlooked by previous methods. In this paper, we introduce a new method named CDLC to predict essential proteins by integrating dynamic local average connectivity and in-degree of proteins in complexes. CDLC is applied to the protein interaction network of Saccharomyces cerevisiae. The results show that CDLC outperforms five other methods (Degree Centrality (DC), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), PeC and Co-Expression Weighted by Clustering coefficient (CoEWC)). In particular, CDLC could improve the prediction precision by more than 45% compared with DC methods. CDLC is also compared with the latest algorithm CEPPK, and a higher precision is achieved by CDLC. CDLC is available as Supplementary materials. The default settings of active threshold and alpha-parameter are 0.8 and 0.1, respectively.

      PubDate: 2014-09-02T20:59:48Z
  • The complex task of choosing a de novo assembly: lessons from fungal
    • Abstract: Publication date: Available online 28 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Juan Esteban Gallo , José Fernando Muñoz , Elizabeth Misas , Juan Guillermo McEwen , Oliver Keatinge Clay
      Selecting the values of parameters used by de novo genomic assembly programs, or choosing an optimal de novo assembly from several runs obtained with different parameters or programs, are tasks that can require complex decision-making. A key parameter that must be supplied to typical next generation sequencing (NGS) assemblers is the k-mer length, i.e., the word size that determines which de Bruijn graph the program should map out and use. The topic of assembly selection criteria was recently revisited in the Assemblathon 2 study (Bradnam et al., 2013). Although no clear message was delivered with regard to optimal k-mer lengths, it was shown with examples that it is sometimes important to decide if one is most interested in optimizing the sequences of protein-coding genes (the gene space) or in optimizing the whole genome sequence including the intergenic DNA, as what is best for one criterion may not be best for the other. In the present study, our aim was to better understand how the assembly of unicellular fungi (which are typically intermediate in size and complexity between prokaryotes and metazoan eukaryotes) can change as one varies the k-mer values over a wide range. We used two different de novo assembly programs (SOAPdenovo2 and ABySS), and simple assembly metrics that also focused on success in assembling the gene space and repetitive elements. A recent increase in Illumina read length to around 150 bp allowed us to attempt de novo assemblies with a larger range of k-mers, up to 127 bp. We applied these methods to Illumina paired-end sequencing read sets of fungal strains of Paracoccidioides brasiliensis and other species. By visualizing the results in simple plots, we were able to track the effect of changing k-mer size and assembly program, and to demonstrate how such plots can readily reveal discontinuities or other unexpected characteristics that assembly programs can present in practice, especially when they are used in a traditional molecular microbiology laboratory with a ‘genomics corner’. Here we propose and apply a component of a first pass validation methodology for benchmarking and understanding fungal genome de novo assembly processes.
      Highlights: The success of a short-read based genome assembly process in faithfully reproducing the sequences of a real genome, or its genes, can be modulated by some or all of three key parameters: read length r, insert size I, and a bioinformatics parameter, the word length k (k-mer length), which is used in most modern assembly tools based on de Bruijn graphs. The present study focuses on how plots of simple assembly success metrics, and their variation as a function of k, can serve as succinct graphical representations of how the assembly process deals with a given genomic context.

      PubDate: 2014-09-02T20:59:48Z
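The role of the k-mer length in the abstract above can be made concrete with a toy de Bruijn graph builder. This is a minimal sketch under simplifying assumptions (error-free reads, forward strand only, no coverage counts); it is not how SOAPdenovo2 or ABySS are implemented, and the function name `de_bruijn_edges` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the sorted list of (k-1)-mers it points to.

    Every k-mer found in a read contributes one edge from its prefix
    (first k-1 letters) to its suffix (last k-1 letters).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(targets) for node, targets in graph.items()}
```

For the read `ACGTACGA`, k = 3 yields a branching node (`CG` points to both `GA` and `GT`) because the repeat `ACG` is longer than k − 1, while k = 5 resolves the repeat into a simple path. This is the kind of dependence on k that the study tracks with assembly metrics.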
  • QM/MM–PB/SA scoring of the interaction strength between Akt kinase
           and apigenin analogues
    • Abstract: Publication date: October 2014
      Source:Computational Biology and Chemistry, Volume 52
      Author(s): Jian Lu , Zhuyi Zhang , Zhong Ni , Haijun Shen , Zhigang Tu , Hanqing Liu , Rongzhu Lu
      Identification of small-molecule compounds that can bind specifically and stably to protein targets of biological interest is a challenging task in structure-based drug design. Traditionally, several fast approaches such as empirical scoring functions and free energy analysis have been widely used for this purpose. In the current study, we propose rigorous quantum mechanics/molecular mechanics in combination with semi-empirical Poisson–Boltzmann/surface area (QM/MM–PB/SA) as an efficient strategy to characterize the intermolecular interaction between Akt kinase and its small-molecule ligands, although this hybrid approach is computationally expensive compared with those empirical methods. In a round of experimental activity reproduction tests based on a set of known Akt–inhibitor complexes, QM/MM–PB/SA has been shown to perform much better than two widely used scoring functions as well as the sophisticated MM-PB/SA analysis with or without improvement by molecular dynamics (MD) simulations. Next, QM/MM–PB/SA was employed to screen for strong Akt binders from an apigenin analogue set. Consequently, four compounds, namely apigenin, quercetin, gallocatechin and myricetin, were suggested to have high binding potency to the Akt active site. A further kinase assay was conducted to determine the inhibitory activity of the four promising candidates against Akt kinase, resulting in IC50 values of 38.4, 67.5, 157.1 and 25.5 nM, respectively.

      PubDate: 2014-09-02T20:59:48Z
  • Editorial: Complexity in Genomes
    • Abstract: Publication date: Available online 19 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Yannis Almirantis , Peter Arndt , Wentian Li , Astero Provata

      PubDate: 2014-09-02T20:59:48Z
  • Analysis of correlation structures in the Synechocystis PCC6803 genome
    • Abstract: Publication date: Available online 19 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Zuo-Bing Wu
      Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding regions of genes. The non-periodic correlation structures are generated by non-periodic transfer of several substrings covering or overlapping with the coding regions of genes. In the periodic and non-periodic transfer, some gaps divide the long nucleotide strings into the substrings and prevent their global transfer. Most of the gaps are either the replacement of one base or the insertion/reduction of one base. In the reconstructed phase space, the points generated from two or three steps for the continuous iterative transfer via the second maximal distance can be fitted by two lines. It partly reveals an intrinsic dynamics in the transfer of nucleotide strings. Due to the comparison of the relative positions and lengths, the substrings concerned with the non-periodic correlation structures are almost identical to the mobile elements annotated in the genome. The mobile elements are thus endowed with the basic results on the correlation structures.

      PubDate: 2014-09-02T20:59:48Z
  • Investigating Long Range Correlation in DNA Sequences Using Significance
           Tests of Conditional Mutual Information
    • Abstract: Publication date: Available online 20 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Maria Papapetrou , Dimitris Kugiumtzis
      This study exploits the use of Markov chain order estimation from symbol sequences of systems exhibiting long memory or long range correlations (LRC), such as DNA sequences. In the presence of limited sequence length, an LRC chain can be approximated by a high order Markov chain. For the order estimation, the parametric significance test of conditional mutual information I_C(m) is applied, found in an earlier work to be suitable for high order estimation. Here, it is computationally optimized by applying an iterative algorithm for calculating I_C(m) at increasing order m, enabling the analysis of long symbol sequences of high Markov chain order or LRC. The simulation study shows that when the true order is reasonably small, the estimated order saturates at the true order with the increase of the symbol sequence length, while when the true order is very large or the chain has LRC, the estimated order increases logarithmically with the symbol sequence length. The order estimation shows a different dependence on the DNA sequence length for bacteria, the plant Arabidopsis thaliana and the human chromosome, indicating a different long memory structure in their DNA.

      PubDate: 2014-09-02T20:59:48Z
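The conditional mutual information at order m mentioned in the abstract above can be sketched with plug-in block entropies. The sketch assumes that I_C(m) denotes the mutual information between a symbol and the symbol m steps earlier, conditioned on the m − 1 symbols in between; under stationarity this equals 2·H(m) − H(m−1) − H(m+1), where H(n) is the entropy of words of length n. The abstract does not give this formula, and the code below implements neither the significance test nor the iterative algorithm of the paper; function names are hypothetical.

```python
from collections import Counter
from math import log


def block_entropy(seq, n):
    """Plug-in Shannon entropy (in nats) of overlapping words of length n."""
    if n == 0:
        return 0.0
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())


def conditional_mutual_information(seq, m):
    """Assumed estimate of I_C(m) = 2*H(m) - H(m-1) - H(m+1) for m >= 1."""
    return (2.0 * block_entropy(seq, m)
            - block_entropy(seq, m - 1)
            - block_entropy(seq, m + 1))
```

For a first-order chain such as the periodic sequence `ACGT` repeated many times, the estimate is large at m = 1 and close to zero at m = 2, which is the drop a Markov order estimator would look for. Plug-in entropies are biased for long words on short sequences, which is one reason a formal significance test is needed in practice.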
  • Menzerath–Altmann law in mammalian exons reflects the dynamics of
           gene structure evolution
    • Abstract: Publication date: Available online 20 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Christoforos Nikolaou
      Genomic sequences exhibit self-organization properties at various hierarchical levels. One such is the gene structure of higher eukaryotes with its complex exon/intron arrangement. Exon sizes and exon numbers in genes have been shown to conform to a law derived from statistical linguistics and formulated by Menzerath and Altmann, according to which the mean size of the constituents of an entity is inversely related to the number of these constituents. We herein perform a detailed analysis of this property in the complete exon set of the mouse genome in correlation to the sequence conservation of each exon and the transcriptional complexity of each gene locus. We show that extensive linear fits, representative of accordance to Menzerath–Altmann law are restricted to a particular subset of genes that are formed by exons under low or intermediate sequence constraints and have a small number of alternative transcripts. Based on this observation we propose a hypothesis for the law of Menzerath–Altmann in mammalian genes being predominantly due to genes that are more versatile in function and thus, more prone to undergo changes in their structure. To this end we demonstrate one test case where gene categories of different functionality also show differences in the extent of conformity to Menzerath–Altmann law.

      PubDate: 2014-09-02T20:59:48Z
  • Molecular dynamics simulations of lectin domain of FimH and
           Immunoinformatics for the design of potential vaccine candidates
    • Abstract: Publication date: Available online 15 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Muthukumar Singaravelu , Anitha Selvan , Sharmila Anishetty
      Adhesion of Uropathogenic E. coli (UPEC) to uroepithelial cell receptors is facilitated through the lectin domain of FimH adhesin. In the current study, Molecular Dynamics (MD) simulations were performed for the lectin domain of FimH from UPEC J96. The high affinity state lectin domain was found to be stable and rigid during the simulations. Further, based on conserved subsequences around one of the disulphide forming cysteines, two sequence motifs were designed. An immunoinformatics approach was utilized to identify linear and discontinuous epitopes for the lectin domain of FimH. We propose that the accessibility of predicted epitopes should also be assessed in a dynamic aqueous environment to evaluate the potential of vaccine candidates. Since MD simulation data enable assessing the accessibility in a dynamic environment, we evaluated the accessibility of the top ranked discontinuous and linear epitopes using structures obtained at every nanosecond (ns) in the 1–20 ns MD simulation timeframe. Knowledge gained in this study has potential utility in the design of vaccine candidates for Urinary Tract Infection (UTI).

      PubDate: 2014-08-18T20:11:46Z
  • Circular code motifs in the ribosome decoding center
    • Abstract: Publication date: Available online 4 August 2014
      Source:Computational Biology and Chemistry
      Author(s): Karim El Soufi , Christian J. Michel
      A translation (framing) code based on the circular code was proposed in Michel (2012) with the identification of X circular code motifs (X motifs shortly) in the bacterial rRNA of Thermus thermophilus, in particular in the ribosome decoding center. Three classes of X motifs are now identified in the rRNAs of bacteria Escherichia coli and Thermus thermophilus, archaea Pyrococcus furiosus, nuclear eukaryotes Saccharomyces cerevisiae, Triticum aestivum and Homo sapiens, and chloroplast Spinacia oleracea. The universally conserved nucleotides A1492 and A1493 in all studied rRNAs (bacteria, archaea, nuclear eukaryotes and chloroplasts) belong to X motifs (called m AA). The conserved nucleotide G530 in rRNAs of bacteria and archaea belongs to X motifs (called m G). Furthermore, the X motif m G is also found in rRNAs of nuclear eukaryotes and chloroplasts. Finally, a potentially important X motif, called m, is identified in all studied rRNAs. With the available crystallographic structures of the Protein Data Bank PDB, we also show that these X motifs m AA, m G and m belong to the ribosome decoding center of all studied rRNAs with possible interaction with the mRNA X motifs and the tRNA X motifs. The three classes of X motifs identified here in rRNAs of several and different organisms strengthen the concept of translation code based on the circular code.

      PubDate: 2014-08-06T19:31:13Z
  • Title page
    • Abstract: Publication date: August 2014
      Source:Computational Biology and Chemistry, Volume 51

      PubDate: 2014-08-02T19:02:24Z
  • IFC Editorial Board
    • Abstract: Publication date: August 2014
      Source:Computational Biology and Chemistry, Volume 51

      PubDate: 2014-08-02T19:02:24Z
  • Genome-wide identification and predictive modeling of lincRNAs
           polyadenylation in cancer genome
    • Abstract: Publication date: Available online 27 July 2014
      Source:Computational Biology and Chemistry
      Author(s): Shanxin Zhang , Jiuqiang Han , Dexing Zhong , Ruiling Liu , Jiguang Zheng
      Long noncoding RNAs (lncRNAs) play essential regulatory roles in the human cancer genome. Many identified lncRNAs are transcribed by RNA polymerase II and polyadenylated, and the long intervening noncoding RNAs (lincRNAs) have been widely used in lncRNA research. To date, the mechanism of lincRNA polyadenylation related to cancer is not yet fully understood. In this paper, first we report a comprehensive map of global lincRNA polyadenylation sites (PASs) in five human cancer genomes; second we propose a grouping method based on the pattern of gene expression and the manner of alternative polyadenylation (APA); third we investigate the distribution of motifs surrounding PASs. Our analysis reveals that about 70% of PASs are located in the sense strand of lincRNAs. Also, more than 90% of PASs in the antisense strand of lincRNAs are located in the intron regions. In addition, around 40% of lincRNA genes with PASs have APA sites. Four obvious motifs, i.e. AATAAA, TTTTTTTT, CCAGSCTGG and RGYRYRGTGG, were detected in the sequences surrounding PASs in the normal and cancer tissues. Furthermore, a novel algorithm based on a support vector machine (SVM) was proposed to recognize the lincRNA PASs of tumor tissues. The algorithm achieves accuracies of up to 96.55% and 89.48% for distinguishing the tumor lincRNA PASs from the non-polyadenylation sites and the non-lincRNA PASs, respectively.

      PubDate: 2014-07-28T18:36:12Z
  • Advances in bioinformatics: Selected papers from APBC 2014
    • Abstract: Publication date: June 2014
      Source:Computational Biology and Chemistry, Volume 50
      Author(s): Shuigeng Zhou , Yi-Ping Phoebe Chen

      PubDate: 2014-07-25T18:10:58Z
  • IFC Editorial Board
    • Abstract: Publication date: June 2014
      Source:Computational Biology and Chemistry, Volume 50

      PubDate: 2014-07-25T18:10:58Z
  • Title page
    • Abstract: Publication date: June 2014
      Source:Computational Biology and Chemistry, Volume 50

      PubDate: 2014-07-25T18:10:58Z
  • Structure and evolution analysis of pollen receptor-like kinase in Zea
           mays and Arabidopsis thaliana
    • Abstract: Publication date: August 2014
      Source:Computational Biology and Chemistry, Volume 51
      Author(s): Dongxu Wang , He Wang , Muhammad Irfan , Mingxia Fan , Feng Lin
      Receptor-like kinases (RLKs) are important members of the protein kinase family and are widely involved in plant growth, development and defense responses. Analyzing the kinase structure and evolution of pollen RLKs is important for studying their mechanisms. In our study, 64 and 73 putative pollen RLKs were chosen from maize and Arabidopsis. Phylogenetic analysis showed that the pollen RLKs were conserved, might have existed before the divergence between monocots and dicots, and were mainly concentrated in two subfamilies, RLCK-VII and LRR-III. Chromosomal localization and gene duplication analysis showed that the expansion of pollen RLKs was mainly caused by segmental duplication. By calculating the Ka/Ks value of the extracellular domain, intracellular domain and kinase domain in pollen RLKs, we found that the duplicated pollen RLK genes had mainly experienced purifying selection, while maize might have experienced weaker purifying selection. Meanwhile, the extracellular domain might have experienced stronger diversifying selection than the intracellular domain in both species. Estimation of duplication time showed that the duplication events of Arabidopsis occurred approximately between 18 and 69 million years ago, compared to 0.67–170 million years ago for maize.

      PubDate: 2014-07-25T18:10:58Z
  • Gene Cloning, Homology Comparison and Analysis of the Main Functional
           Structure Domains of beta Estrogen Receptor in Jining Grey Goat
    • Abstract: Publication date: Available online 1 June 2014
      Source:Computational Biology and Chemistry
      Author(s): Hai-gang Liu , Hong-mei Li , Shuying Wang , Libo Huang , Hui-jun Guo
      To clarify the molecular evolution and characteristics of the beta estrogen receptor (ERβ) gene in Jining Grey goat in China, the entire ERβ gene from Jining Grey goat ovary was amplified, identified and sequenced, and the gene sequences were compared with those of other animals. Functional structural domains and variations in DNA binding domains (DBD) and ligand binding domains (LBD) between Jining Grey goat and Boer goat were analyzed. The results indicate that the ERβ gene in Jining Grey goat includes a 1584 bp sequence with a complete open reading frame (ORF), encoding a 527 amino acid (aa) receptor protein. Compared to other species, the nucleotide homology is 73.9%-98.9% and the amino acid homology is 79.5%-98.5%. The main antigenic structural domains lie from the 97th aa to the 286th aa and from the 403rd aa to the 527th aa. The hydrophilicity and the surface probability of the structural domains are distributed throughout a range of amino acids. There are two different amino acids in the DBD and three different amino acids in the LBD between Jining Grey and Boer goats, resulting in dramatically different spatial structures for ERβ protein. These differences may explain the different biological activities of ERβ between the two goat species. This study is the first to acquire the whole ERβ gene sequence of Jining Grey goat with a complete open reading frame, analyze its evolutionary relationships and predict its main functional structural domains, which may greatly help in further understanding the genome evolution and gene diversity of goat ERβ.

      PubDate: 2014-06-05T14:50:14Z
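      The relationship between the ORF length and the protein length quoted in the abstract above (1584 bp, 527 aa) follows from codon arithmetic, assuming the stated length includes the stop codon. A minimal sketch with a hypothetical helper name:

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` nucleotides."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


print(protein_length_from_orf(1584))  # 1584 / 3 = 528 codons, minus stop = 527
```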
  • In silico study of potential autoimmune threats from rotavirus infection
    • Abstract: Publication date: Available online 4 June 2014
      Source:Computational Biology and Chemistry
      Author(s): Tapati Sarkar , Sukhen Das , Papiya Nandy , Rahul Bhowmick , Ashesh Nandy
      Rotavirus, the major cause of infantile nonbacterial diarrhea, has been found to be associated with the development of diabetes-associated autoantibodies. In this study we searched for further potential autoimmune threats from this virus using a bioinformatics approach. We compared rotaviral proteins with the Homo sapiens proteome and found that the most conserved structural protein, VP6, matches the ryanodine receptor, an autoimmune target associated with myasthenia gravis, at two regions. Myasthenia gravis, a chronic neurodegenerative autoimmune disorder with no typical known cause, is characterized by fluctuating muscle weakness that is typically enhanced during muscular effort. Affected patients generate autoantibodies mainly against the acetylcholine receptor and the sarcoplasmic reticulum calcium-release channel protein, the ryanodine receptor. Further, we observed that the two regions that matched the ryanodine receptor remain conserved in all circulating rotaviral strains and showed significant antigenicity with respect to myasthenia gravis-associated HLA haplotypes. Overall, our study identified rotaviral VP6 as a potential threat for myasthenia gravis and sheds light on an area of virus-associated autoimmune research.

      PubDate: 2014-06-05T14:50:14Z
  • A Computational Prospect to Aspirin Side Effects: Aspirin and COX-1
           Interaction Analysis based on Non-Synonymous SNPs
    • Abstract: Publication date: Available online 2 June 2014
      Source:Computational Biology and Chemistry
      Author(s): Mojtabavi Naeini Marjan , Mesrian Tanha Hamzeh , Emamzadeh Rahman , Vallian Sadeq
      Aspirin (ASA) is a commonly used nonsteroidal anti-inflammatory drug (NSAID) that exerts its therapeutic effects through inhibition of cyclooxygenase isoform 2 (COX-2), while the inhibition of COX-1 by ASA leads to apparent side effects. In the present study, the relationship between COX-1 non-synonymous single nucleotide polymorphisms (nsSNPs) and aspirin-related side effects was investigated. The functional impact of 37 nsSNPs on the potency of aspirin inhibition of COX-1 was analyzed computationally by COX-1/aspirin molecular docking, and each SNP was scored based on the DOCK AMBER score. The data predicted that 22 nsSNPs could reduce COX-1 inhibition, while 15 nsSNPs showed an increased inhibition level in comparison to the regular COX-1 protein. For comparison, the AMBER scores for two Arg119 mutants (R119A and R119Q) were also calculated. Among the nsSNP variants, rs117122585 had the AMBER score closest to that of the R119A mutant. A separate docking computation validated the score and revealed a new binding position for ASA in which the acetyl group was located 3.86 Å from the Ser529 OH group. This could predict an associated loss of ASA activity with this nsSNP variant. Our data represent a computational sub-population pattern for aspirin COX-1 related side effects, and provide a basis for further research on the COX-1/ASA interaction.

      PubDate: 2014-06-05T14:50:14Z
  • A computational method of predicting regulatory interactions in
           Arabidopsis based on gene expression data and sequence information
    • Abstract: Publication date: Available online 9 May 2014
      Source:Computational Biology and Chemistry
      Author(s): Xiaoqing Yu , Hongyun Gao , Xiaoqi Zheng , Chun Li , Jun Wang
      Inferring transcriptional regulatory interactions between transcription factors (TFs) and their targets is of utmost importance for understanding the complex regulatory mechanisms in cellular systems. In this paper, we introduce a computational method to predict regulatory interactions in Arabidopsis based on gene expression data and sequence information. A support vector machine (SVM) and the jackknife cross-validation test were used to evaluate our method on a collected dataset including 178 positive samples and 1068 negative samples. Results showed that our method achieved an overall accuracy of 98.39% with a sensitivity of 94.88% and a specificity of 93.82%, which suggests that our method can serve as a potential and cost-effective tool for predicting regulatory interactions in Arabidopsis.

      PubDate: 2014-05-10T10:33:11Z
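      The performance measures named in the abstract above (accuracy, sensitivity, specificity) are defined from confusion-matrix counts. The sketch below shows the standard definitions; the function name is hypothetical, and the counts are invented for illustration, chosen only so that they sum to the dataset sizes given in the abstract (178 positives, 1068 negatives).

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / positives,          # true positive rate
        "specificity": tn / negatives,          # true negative rate
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Hypothetical counts: 178 positives and 1068 negatives in total.
metrics = binary_metrics(tp=160, fn=18, tn=1000, fp=68)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

      Accuracy computed this way is a weighted average of sensitivity and specificity, with weights given by the class proportions, so it lies between the two.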
  • Structural evaluation of BTK and PKCδ mediated phosphorylation of MAL
           at positions Tyr86 and Tyr106
    • Abstract: Publication date: Available online 19 April 2014
      Source:Computational Biology and Chemistry
      Author(s): Rehan Zafar Paracha , Amjad Ali , Jamil Ahmad , Riaz Hussain , Umar Niazi , Syed Aun Muhammad
      A number of diseases including sepsis, rheumatoid arthritis, diabetes, cardiovascular diseases and hyperinflammatory immune disorders have been associated with Toll-like receptor (TLR) 2 and TLR4. The endogenous adaptor protein known as MyD88 adapter-like protein (MAL) binds exclusively to the cytosolic portions of TLR2 and TLR4 to initiate downstream signalling. Bruton's tyrosine kinase (BTK) and protein kinase C delta (PKCδ) have been implicated in phosphorylating MAL and activating it to initiate downstream signalling. BTK has been associated with phosphorylation at positions Tyr86 and Tyr106, which is necessary for the activation of MAL, but the definite target residue of PKCδ in MAL is still to be explored. To better understand the functional domains involved in the formation of MAL-kinase complexes, computer-aided studies were used to characterize the protein-protein interactions (PPIs) of phosphorylated BTK and PKCδ with MAL. Docking and physicochemical studies indicated that BTK was in close contact with Tyr86 and Tyr106 of MAL, whereas PKCδ may phosphorylate Tyr106 only. Moreover, the electrostatic charge distributions of the binding interfaces of BTK and PKCδ were distinct but compatible with the respective regions of MAL. Our results imply that Tyr86 is specifically phosphorylated by BTK, whereas Tyr106 can be phosphorylated by the competitive action of both BTK and PKCδ. Additionally, the residues of MAL that are necessary for interaction with TLR2, TLR4, MyD88 and SOCS-1 also play roles in maintaining interaction with the kinases and can be targeted in future to reduce TLR2- and TLR4-induced pathological responses.

      PubDate: 2014-04-23T09:25:06Z
  • Investigation of phase shifts for different period lengths in the genomes
           of C.elegans, D.melanogaster and S.cerevisiae
    • Abstract: Publication date: Available online 13 April 2014
      Source:Computational Biology and Chemistry
      Author(s): Pugacheva Valentina , Frenkel Felix , Korotkov Eugene
      We describe a new mathematical method for finding highly diverged short tandem repeats containing a single indel. The method involves comparison of two frequency matrices: a first matrix for the subsequence before the shift and a second one for the subsequence after it. The measure of comparison is based on matrix similarity. The approach was applied to the analysis of the genomes of C.elegans, D.melanogaster and S.cerevisiae, which were investigated for the presence of tandem repeats with repeat lengths of 2 and 4-11 nucleotides. The number of phase shift regions in these genomes was approximately 2.2×10⁴, 1.5×10⁴ and 1.7×10², respectively. The type I error was less than 5%. The mean length of fuzzy periodicity and phase shift regions was about 220 nucleotides. The regions of fuzzy periodicity with a single insertion or deletion occupy substantial parts of the genomes: 5%, 3% and 0.3%, respectively. Less than 10% of these regions have been detected previously. That is, the number of such regions in the genomes of C.elegans, D.melanogaster and S.cerevisiae is dramatically higher than has been revealed by any known method. We suppose that some of the regions of fuzzy periodicity found could be regions for protein binding.

      PubDate: 2014-04-18T08:31:10Z
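      The idea of comparing a frequency matrix built before a phase shift with one built after it can be sketched as follows. This is only an illustration under stated assumptions: the matrix counts bases at each position modulo the period, and cosine similarity stands in for the similarity measure, which the abstract does not specify. All function names are hypothetical and this is not the authors' algorithm.

```python
from typing import List

BASES = "ACGT"


def period_matrix(seq: str, period: int) -> List[List[int]]:
    """Count bases at each position modulo `period` (rows: phase, columns: A, C, G, T)."""
    matrix = [[0] * len(BASES) for _ in range(period)]
    for i, base in enumerate(seq):
        if base in BASES:
            matrix[i % period][BASES.index(base)] += 1
    return matrix


def similarity(m1: List[List[int]], m2: List[List[int]]) -> float:
    """Cosine similarity of two flattened count matrices (an assumed measure)."""
    a = [x for row in m1 for x in row]
    b = [x for row in m2 for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0


def best_phase_shift(before: str, after: str, period: int) -> int:
    """Cyclic row shift of the `after` matrix that best matches the `before` matrix."""
    m_before = period_matrix(before, period)
    m_after = period_matrix(after, period)
    scored = []
    for shift in range(period):
        rotated = m_after[shift:] + m_after[:shift]
        scored.append((similarity(m_before, rotated), shift))
    return max(scored)[1]


# Toy example: the same 3-nucleotide repeat, read in a different phase.
print(best_phase_shift("ACG" * 10, "CGA" * 10, period=3))
```

      A nonzero best shift between two adjacent subsequences of the same period is the kind of signal a single indel inside a tandem repeat would produce.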
  • All-atomic Molecular Dynamic Studies of Human CDK8: Insight into the
           A-loop, Point Mutations and Binding with Its Partner CycC
    • Abstract: Publication date: Available online 3 April 2014
      Source:Computational Biology and Chemistry
      Author(s): Wu Xu , Benjamin Amire-Brahimi , Xiao-Jun Xie , Liying Huang , Jun-Yuan Ji
      The Mediator, a conserved multisubunit protein complex in eukaryotic organisms, regulates gene expression by bridging sequence-specific DNA-binding transcription factors to the general RNA polymerase II machinery. In yeast, the Mediator complex is organized into three core modules (head, middle and tail) and a separable ‘CDK8 submodule’ consisting of four subunits: cyclin-dependent kinase 8 (CDK8), Cyclin C (CycC), MED12 and MED13. The 3-D structure of the human CDK8-CycC complex has recently been determined experimentally. To take advantage of this structure and of improved theoretical calculation methods, we performed molecular dynamics (MD) simulations to study the dynamics of CDK8 and of two CDK8 point mutations (D173A and D189N) that have been identified in human cancers, with and without the full-length A-loop, as well as the binding between CDK8 and CycC. We found that the CDK8 structure gradually loses two helical structures during the 50-ns MD simulation, likely due to the presence of the full-length A-loop. In addition, our studies showed that the hydrogen bond occupancy of the CDK8 A-loop increases during the first 20 ns of the MD simulation and stays stable during the later 30 ns. Four residues in the A-loop of CDK8 have high hydrogen bond occupancy, while the remaining residues have low or no hydrogen bond occupancy. The hydrogen bond dynamics of the A-loop residues exhibit three types of change: increasing, decreasing and stable. Furthermore, the 3-D structures of the CDK8 point mutations D173A, D189N, T196A and T196D were built by molecular modeling and further investigated by 50-ns MD simulations. D173A has the highest average potential energy, while T196D has the lowest average potential energy, indicating that T196D is the most stable structure. Finally, we calculated the theoretical binding energy of CDK8 and CycC by the MM/PBSA and MM/GBSA methods, and the negative values obtained from both methods demonstrate the stability of the CDK8-CycC complex. Taken together, these analyses will improve our understanding of the exact functions of CDK8 and of its interaction with its partner CycC.

      PubDate: 2014-04-04T04:24:48Z
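      Hydrogen bond occupation (occupancy) in a simulation, as discussed in the abstract above, is commonly reported as the fraction of trajectory frames in which a residue takes part in a hydrogen bond. A minimal sketch, assuming the per-frame hydrogen bond detection has already been done by some MD analysis tool; the function name and residue labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def hbond_occupancy(frames: List[Set[str]]) -> Dict[str, float]:
    """Fraction of frames in which each residue is hydrogen bonded.

    `frames` holds, for every trajectory frame, the set of residue labels
    that were found to form at least one hydrogen bond in that frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    counts: Counter = Counter()
    for bonded in frames:
        counts.update(bonded)
    return {residue: n / len(frames) for residue, n in counts.items()}


# Four hypothetical frames with placeholder residue labels.
frames = [{"R1", "R2"}, {"R1"}, {"R1", "R3"}, {"R1", "R2"}]
print(hbond_occupancy(frames))
```

      Computing the same quantity over separate time windows (for example the first 20 ns and the last 30 ns) would show whether occupancy is increasing, decreasing or stable.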
  • In-silico study of anti-carcinogenic Lysyl Oxidase-Like 2 inhibitors
    • Abstract: Publication date: Available online 20 March 2014
      Source:Computational Biology and Chemistry
      Author(s): Syed Aun Muhammad , Amjad Ali , Tariq Ismail , Rehan Zafar , Umair Ilyas , Jamil Ahmad
      Lysyl oxidase homolog 2 (LOXL2), also known as lysyl oxidase-like protein 2, has recently been explored as a regulator of carcinogenesis and has been shown to be involved in tumor progression and metastasis of several carcinomas. LOXL2 has therefore been considered a potential therapeutic target. Accordingly, two inhibitors were synthesized as new chemotherapeutic lead molecules by the fusion method (refluxed at 160°C): 4-Amino-5-(2-Hydroxyphenyl)-1,2,4-Triazol-3-Thione (2a) and 4-(2-hydroxybenzalidine) amine-5-(2-hydroxy) phenyl-1,2,4-triazole-3-thiol (2b). These triazole derivatives were characterized by FTIR and NMR spectral analysis. The active binding sites and the quality of the LOXL2 model were assessed by Ramachandran plots, and drug-target analysis was performed with computational virtual screening tools. Compounds 2a and 2b showed optimum target binding affinity, with binding energies of -6.2 kcal/mol and -8.9 kcal/mol. This in-silico study adds to our understanding of drug design and development, and helps to target cancer-causing proteins more precisely and quickly than before.

      PubDate: 2014-03-22T23:22:05Z
  • Determining common insertion sites based on retroviral insertion
           distribution across tumors
    • Abstract: Publication date: Available online 12 March 2014
      Source:Computational Biology and Chemistry
      Author(s): Feng Chen , Zhoufang Li , Yi-Ping Phoebe Chen
      A CIS (Common Insertion Site) is a genome region that is hit by retroviral insertions more frequently than expected by chance. Such a region is strongly related to cancer gene loci, which leads to the detection of cancer genes. An algorithm for detecting CISs should satisfy the following: 1) it does not require any prior knowledge of the underlying insertion distribution; 2) it can resolve the insertion biases caused by hotspots; 3) it can detect CISs of any biological width; 4) it can identify noise resulting from statistical errors and non-CIS insertions; and 5) it can identify the widths of CISs as accurately as possible. We develop a method to resolve these difficulties. We verify a region's significance from two perspectives: distribution width and distribution depth. The former indicates how many insertions fall in a region, while the latter evaluates the insertion distribution across the tumors in a region. We compare our method with kernel density estimation and a sliding window on simulated data, showing that our method not only identifies cancer-related insertions effectively, but also filters noise correctly. The experiments on real data show that taking insertion distribution into account can highlight significant CISs. We detect 53 novel CISs, some of which are supported by the biological literature.

      PubDate: 2014-03-14T22:32:16Z
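      The sliding-window approach that the abstract above uses as a comparison baseline can be sketched in a few lines. This is the baseline idea only, not the authors' method, and it ignores the across-tumor distribution they emphasize. The function names, window width, threshold and positions are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def window_counts(positions: List[int], width: int) -> List[Tuple[int, int]]:
    """For a window starting at each insertion, count insertions in [start, start + width)."""
    pos = sorted(positions)
    return [(p, bisect_left(pos, p + width) - bisect_left(pos, p)) for p in pos]


def candidate_windows(positions: List[int], width: int, min_count: int) -> List[Tuple[int, int]]:
    """Windows containing at least `min_count` insertions."""
    return [(start, n) for start, n in window_counts(positions, width) if n >= min_count]


# Hypothetical insertion coordinates on one chromosome.
insertions = [100, 120, 150, 5000, 9000, 9010]
print(candidate_windows(insertions, width=100, min_count=3))
```

      A fixed window width is the main limitation of this baseline, which is one reason requirement 3 in the abstract asks for detection of CISs of any width.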
  • Fast detection of high-order epistatic interactions in genome-wide
           association studies using information theoretic measure
    • Abstract: Publication date: Available online 27 January 2014
      Source:Computational Biology and Chemistry
      Author(s): Sangseob Leem , Hyun-hwan Jeong , Jungseob Lee , Kyubum Wee , Kyung-Ah Sohn
      There are many algorithms for detecting epistatic interactions in genome-wide association studies (GWAS). However, most of these algorithms are applicable only to detecting two-locus interactions. Some are designed from the outset to detect only two-locus interactions. Others place no limit on the order of interactions, but in practice take a very long time to detect higher-order interactions in real GWAS data. Even the better ones take days to detect higher-order interactions in WTCCC data. We propose a fast algorithm for the detection of high-order epistatic interactions in GWAS. It runs the k-means clustering algorithm on the set of all SNPs. Candidates are then selected from each cluster and examined to find the causative SNPs of k-locus interactions. We use mutual information from information theory as the measure of association between genotypes and phenotypes. We tested the power and speed of our method on extensive sets of simulated data. The results show that our method has equal or greater power and runs much faster than previously reported methods. We also applied our algorithm to each of the seven diseases in the WTCCC data to analyze up to 5-locus interactions. It takes only a few hours to analyze 5-locus interactions in one dataset. From the results we make some interesting and meaningful observations on each disease in the WTCCC data. In this study, a simple yet powerful two-step approach is proposed for fast detection of high-order epistatic interactions. Our algorithm makes it possible to detect high-order epistatic interactions in GWAS in a matter of hours on a PC.

      PubDate: 2014-01-30T17:17:58Z
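      The association measure named in the abstract above, mutual information between genotypes and phenotypes, can be sketched for a k-locus genotype combination as follows. The clustering and candidate selection steps are not shown, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(genotypes: Sequence[Hashable], phenotypes: Sequence[Hashable]) -> float:
    """Mutual information in bits between genotype combinations and phenotypes.

    Each element of `genotypes` describes one individual, for example a tuple
    with that individual's genotype at each of k loci.
    """
    if len(genotypes) != len(phenotypes) or not genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))
    g_counts = Counter(genotypes)
    p_counts = Counter(phenotypes)
    return sum(
        (c / n) * log2(c * n / (g_counts[g] * p_counts[p]))
        for (g, p), c in joint.items()
    )


# Toy XOR-like interaction: neither SNP alone is informative, the pair is.
pair = [(0, 0), (0, 1), (1, 0), (1, 1)]
status = [0, 1, 1, 0]
print(mutual_information(pair, status))
print(mutual_information([g[0] for g in pair], status))
```

      The toy data illustrate why single-locus screening can miss epistasis: the marginal mutual information of each SNP is zero while the two-locus combination fully determines the phenotype.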
  • lncRNAMap: A map of putative regulatory functions in the long non-coding
    • Abstract: Publication date: Available online 23 January 2014
      Source:Computational Biology and Chemistry
      Author(s): Wen-Ling Chan , Hsien-Da Huang , Jan-Gowth Chang
      Background Recent studies have demonstrated the importance of long non-coding RNAs (lncRNAs) in chromatin remodeling, and in transcriptional and post-transcriptional regulation. However, only a few specific lncRNAs are well understood, whereas others are completely uncharacterised. To address this, there is a need for a user-friendly platform for studying the putative regulatory functions of human lncRNAs. Description lncRNAMap is an integrated and comprehensive database for exploring the putative regulatory functions of human lncRNAs through two mechanisms of regulation: encoding siRNAs and acting as miRNA decoys. To investigate lncRNAs producing siRNAs that regulate protein-coding genes, lncRNAMap integrated small RNAs (sRNAs) that were supported by publicly available deep sequencing data from various sRNA libraries and constructed lncRNA-derived siRNA-target interactions. In addition, lncRNAMap demonstrated that lncRNAs can act as targets for miRNAs that would otherwise regulate protein-coding genes. Previous studies indicated that intergenic lncRNAs (lincRNAs) either positively or negatively regulate neighboring genes; therefore, lncRNAMap surveyed neighboring genes within a 1 Mb distance of the genomic location of specific lncRNAs and provided the expression profiles of each lncRNA and its neighboring genes. The gene expression profiles may indicate the relationship between a lncRNA and its neighboring genes. Conclusions lncRNAMap is a powerful user-friendly platform for the investigation of putative regulatory functions of human lncRNAs that produce siRNAs or act as miRNA decoys. lncRNAMap is freely available on the web at ht*tp://

      PubDate: 2014-01-26T22:15:30Z
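      The neighboring-gene survey described in the abstract above (genes within 1 Mb of a lncRNA) amounts to a distance query on genomic intervals. A minimal sketch with invented coordinates and names; it does not reflect how lncRNAMap itself is implemented.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]          # (chromosome, start, end)
Gene = Tuple[str, str, int, int]         # (name, chromosome, start, end)


def neighbors_within(lnc: Interval, genes: List[Gene], max_dist: int = 1_000_000) -> List[Tuple[str, int]]:
    """Genes on the same chromosome whose gap to the lncRNA is at most `max_dist` bp."""
    chrom, start, end = lnc
    hits = []
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        # Gap between the two intervals; overlapping intervals have gap 0.
        gap = max(0, g_start - end, start - g_end)
        if gap <= max_dist:
            hits.append((name, gap))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical lncRNA and genes.
lnc = ("chr1", 5_000_000, 5_010_000)
genes = [
    ("geneA", "chr1", 5_500_000, 5_520_000),
    ("geneB", "chr1", 7_000_000, 7_030_000),
    ("geneC", "chr2", 5_005_000, 5_006_000),
    ("geneD", "chr1", 4_200_000, 4_250_000),
]
print(neighbors_within(lnc, genes))
```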
  • Predicting Essential Genes for Identifying Potential Drug Targets In
           Aspergillus fumigatus
    • Abstract: Publication date: Available online 23 January 2014
      Source:Computational Biology and Chemistry
      Author(s): Yao Lu , Jingyuan Deng , Judith C. Rhodes , Hui Lu , Long Jason Lu
      Background Aspergillus fumigatus (Af) is a ubiquitous and opportunistic pathogen capable of causing acute, invasive pulmonary disease in susceptible hosts. Despite current therapeutic options, mortality associated with invasive Af infections remains unacceptably high, increasing 357% since 1980. Therefore, there is an urgent need for the development of novel therapeutic strategies, including more efficacious drugs acting on new targets. Thus, as noted in a recent review, “the identification of essential genes in fungi represents a crucial step in the development of new antifungal drugs”. Expanding the target space by rapidly identifying new essential genes has thus been described as “the most important task of genomics-based target validation”. Results In previous research, we were the first to show that essential gene annotation can be reliably transferred among four distantly related prokaryotic species. In this study, we extend our machine learning approach to the much more complex eukaryotic fungal species. A compendium of essential genes is predicted in Af by transferring known essential gene annotations from another filamentous fungus, N. crassa. This approach predicts essential genes by integrating diverse types of intrinsic and context-dependent genomic features encoded in microbial genomes. The predicted essential dataset contained 1,674 genes. We validated our results by comparing our predictions with known essential genes in Af, by comparing our predictions with those obtained by homology mapping, and by experiments with conditionally expressed alleles. We applied several layers of filters and selected a set of potential drug targets from the predicted essential genes. Finally, we conducted wet lab knockout experiments to verify our predictions, which further validates the accuracy and wide applicability of the machine learning approach.
      Conclusions The approach presented here significantly extends our ability to predict essential genes beyond orthologs and makes it possible to predict an inventory of essential genes in eukaryotic fungal species, from which a preferred subset of suitable drug targets may be selected. By selecting the best new targets, we believe that the resultant drugs would exhibit an unparalleled clinical impact against a naive pathogen population. A compendium of essential genes can additionally provide important information on cell function and evolutionary biology. Furthermore, mapping essential genes to pathways may also reveal critical checkpoints in the pathogen's metabolism. Finally, this approach is highly reproducible and portable, and can easily be applied to predict essential genes in many more pathogenic microbes, especially those that are unculturable.

      PubDate: 2014-01-26T22:15:30Z
  • Identification and characterization of lysine-methylated sites on histones
           and non-histone proteins
    • Abstract: Publication date: Available online 24 January 2014
      Source:Computational Biology and Chemistry
      Author(s): Tzong-Yi Lee , Cheng-Wei Chang , Cheng-Tzung Lu , Tzu-Hsiu Cheng , Tzu-Hao Chang
      Protein methylation is a type of post-translational modification (PTM) that typically takes place on lysine and arginine residues. Protein methylation is involved in many important biological processes, and most recent studies have focused on lysine methylation of histones because of its critical roles in regulating transcriptional repression and activation. Histones possess highly conserved sequences and are homologous in most species. However, there is much less sequence conservation among non-histone proteins. Therefore, mechanisms for identifying lysine-methylated sites may differ greatly between histones and non-histone proteins. Nevertheless, this point was not considered in previous studies. Here we constructed two support vector machine (SVM) models for predicting lysine-methylated sites, using lysine-methylation data from histones and from non-histone proteins. Numerous features, such as the amino acid composition (AAC) and accessible surface area (ASA), were used in the SVM models, and the predictive performance was evaluated using five-fold cross-validation. For histones, the predictive sensitivity was 85.62% and the specificity was 80.32%. For non-histone proteins, the predictive sensitivity was 69.1% and the specificity was 88.72%. Results showed that our model significantly improved the predictive accuracy for histones compared to previous approaches. In addition, features of the flanking regions of lysine-methylated sites on histones and non-histone proteins were characterized and are discussed. A gene ontology functional analysis of lysine-methylated proteins and the correlations of lysine-methylated sites with other PTMs in histones are also presented in detail. Finally, a web server, MethK, was constructed to identify lysine-methylated sites. MethK is now available at

      PubDate: 2014-01-26T22:15:30Z
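      The amino acid composition (AAC) feature mentioned in the abstract above can be sketched as a 20-dimensional frequency vector computed over a sequence window centred on a candidate lysine. The window size, the padding character and the function names are assumptions for illustration; the abstract does not state the actual encoding used.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_around(seq: str, index: int, flank: int, pad: str = "-") -> str:
    """Window of `flank` residues on each side of seq[index], padded at the termini."""
    left = seq[max(0, index - flank):index]
    right = seq[index + 1:index + 1 + flank]
    return pad * (flank - len(left)) + left + seq[index] + right + pad * (flank - len(right))


def aac(window: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids, ignoring padding."""
    residues = [r for r in window if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


# Hypothetical peptide with a lysine at index 2.
win = window_around("MAKTG", index=2, flank=3)
print(win)
print(aac(win))
```

      Vectors of this kind, one per candidate site, are the sort of input an SVM would be trained on, typically alongside other features such as accessible surface area.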
  • Deciphering Histone Code of Transcriptional Regulation in Malaria
           Parasites by Large-scale Data Mining
    • Abstract: Publication date: Available online 23 January 2014
      Source:Computational Biology and Chemistry
      Author(s): Haifen Chen , Stefano Lonardi , Jie Zheng
      Histone modifications play a major role in the regulation of gene expression. Evidence has accumulated that histone modifications cooperatively mediate biological processes such as transcription. This has led to the ‘histone code’ hypothesis, which suggests that combinations of different histone modifications correspond to unique chromatin states and have distinct functions. In this paper, we propose a framework based on association rule mining to discover potential regulatory relations between histone modifications and gene expression in Plasmodium falciparum. Our approach outputs rules with statistical significance. Some of the discovered rules are supported by experimental results in the literature. Moreover, we have also discovered de novo rules that can guide further research on the epigenetic regulation of transcription. Based on our association rules we build a model to predict gene expression, which outperforms a published Bayesian network model of gene regulation by histone modifications. The results of our study reveal mechanisms by which histone modifications regulate transcription on a large scale. Among our findings, the cooperation among histone modifications provides new evidence for the histone code hypothesis. Furthermore, the rules output by our method can be used to predict changes in gene expression.

      PubDate: 2014-01-23T18:42:06Z
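      Association rule mining, the framework named in the abstract above, scores a rule "antecedent implies consequent" by its support and confidence. The sketch below shows these two standard measures on invented data in which each gene is a set of items (histone marks plus an expression label). The item names and the function name are hypothetical, and the statistical significance testing mentioned in the abstract is not shown.

```python
from typing import FrozenSet, List, Tuple


def rule_stats(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = with_both / len(transactions)
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Four hypothetical genes described by their marks and expression level.
genes = [
    frozenset({"H3K4me3", "H3K9ac", "high_expression"}),
    frozenset({"H3K4me3", "H3K9ac", "high_expression"}),
    frozenset({"H3K4me3", "low_expression"}),
    frozenset({"H3K9me3", "low_expression"}),
]
print(rule_stats(genes, frozenset({"H3K4me3", "H3K9ac"}), frozenset({"high_expression"})))
```

      In this toy data the combination of two marks predicts high expression with full confidence while one mark alone does not, which is the kind of cooperative pattern the histone code hypothesis describes.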
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327

JournalTOCs © 2009-2014