Pharmaceutical Bioinformatics - University of Freiburg



The Pharmaceutical Bioinformatics lab is working on several projects dealing with the discovery of new drugs, analysis of the mechanisms of known compounds as well as the biosynthesis of natural compounds. Some selected projects are described here.

Automated recognition of functional compound-protein relationships in literature

PubMed is a database containing millions of references to biomedical publications. Searching for protein-compound interactions in this continuously growing body of literature is difficult and time-consuming.

Text mining and machine learning techniques are applied here to identify the functional compound-protein relationships in all titles and abstracts of the PubMed database. (Döring et al., 2020)
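As a rough illustration of the first step of such a pipeline, the sketch below collects compound-protein pairs that co-occur in a sentence and flags whether a relation cue word is present. It is not the method of Döring et al.; the dictionaries, cue words and function names are hypothetical, and a real system would pass such candidates to a trained classifier.

```python
import re

# Hypothetical toy dictionaries; a real system would use curated name lists.
COMPOUNDS = {"aspirin", "imatinib"}
PROTEINS = {"cox-1", "bcr-abl"}
# Hypothetical cue words hinting at a functional relation rather than a mere co-mention.
RELATION_CUES = {"inhibits", "activates", "binds", "blocks"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_relations(abstract):
    """Return (compound, protein, has_cue) for every pair co-occurring in a sentence."""
    results = []
    for sentence in split_sentences(abstract):
        tokens = set(re.findall(r"[a-z0-9\-]+", sentence.lower()))
        has_cue = bool(tokens & RELATION_CUES)
        for compound in sorted(tokens & COMPOUNDS):
            for protein in sorted(tokens & PROTEINS):
                results.append((compound, protein, has_cue))
    return results


example = "Imatinib inhibits BCR-ABL. Aspirin and COX-1 were both mentioned."
pairs = candidate_relations(example)
```

Here the first sentence yields a pair with a cue word, while the second yields a pair that is only co-mentioned; separating these two cases is the actual classification problem.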

DNA Methylation

Methylation of cytosines within CpG dinucleotides is a common epigenetic DNA modification and may arrest cells in a pathogenic state in complex disorders, e.g. cancer or rheumatoid arthritis. CpGs occur mainly in clusters called CpG islands (CPIs), which are present in the promoter regions of nearly 70% of human genes. The Illumina HumanMethylation450 BeadChip platform provides genome-wide coverage of 485,577 CpGs. Analysis of these CpGs reveals a correlation between changes in DNA methylation and gene expression, even though not all sites have the same impact.

The methylation state of a single CpG or a whole CPI may influence the expression of the corresponding gene through the binding of methyl-CpG-binding domain (MBD) proteins and other methylation-dependent proteins. To identify CpGs that influence gene expression, as well as common methylation patterns, we use several approaches, e.g. network analysis and machine learning techniques. (Related article: Heßelbach et al., 2017.)
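One simple way to screen for CpGs associated with gene expression is to correlate each CpG's methylation level with the expression of its gene across samples. The sketch below does this with a Pearson correlation; it is a minimal illustration rather than the lab's analysis, and the CpG names, beta values and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_cpgs(methylation, expression):
    """Rank CpGs by absolute correlation between methylation and expression."""
    scores = {cpg: pearson(betas, expression) for cpg, betas in methylation.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up beta values (fraction methylated, 0 to 1) for two CpGs in five samples.
methylation = {
    "cpg_A": [0.10, 0.30, 0.50, 0.70, 0.90],
    "cpg_B": [0.40, 0.42, 0.39, 0.41, 0.40],
}
# Made-up expression values of the corresponding gene in the same five samples.
expression = [9.0, 7.0, 5.0, 3.0, 1.0]

ranking = rank_cpgs(methylation, expression)
```

In this toy data, cpg_A is strongly anti-correlated with expression and ranks first, while cpg_B barely varies and ranks last, mirroring the observation that not all sites have the same impact.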

Epigenetic drug discovery

Epigenetic mechanisms are essential for normal cellular development and maintenance of cellular homeostasis. In the past few years, it has been well established that epigenetic aberrations play an important role in a wide range of human diseases, including cancer. Unlike genetic mutations, epigenetic modifications are reversible, which makes them an attractive target for disease therapy. As a consequence, the last decade has seen the emergence of small-molecule inhibitors against new epigenetic targets implicated in various diseases, particularly cancer.

We use in silico methods (including molecular docking and virtual screening, ligand-based approaches such as pharmacophore modeling and QSAR, network analysis and MD simulations) to design inhibitors for a broad array of potential targets, including epigenetic targets (e.g. bromodomains, chromodomains and HDACs), methyltransferases and cofactor-binding proteins. (Publication in preparation.)

The diagram displays the superposition of the KAc binding sites of p300 (green) and CBP (yellow) in complex with XDM-CBP (Hügle et al., 2017).

Genome-based secondary metabolite prediction

The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of gene clusters by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not yet been identified. Structural elucidation of the actual secondary metabolite remains challenging, especially because post-modifications are currently unpredictable.

To address this, SeMPI was designed as a web server providing a Secondary Metabolite Prediction and Identification pipeline for natural products synthesized by modular type I polyketide synthases (Zierep et al., 2017). Further extensions of SeMPI require the improvement of state-of-the-art prediction algorithms, but also the implementation of new algorithms adapted to the metabolite in focus.

The core of all secondary metabolite prediction approaches is the accurate functional classification of the proteins responsible for the metabolite's synthesis. To facilitate this task, a pipeline was designed that merges all crucial steps for efficient protein classification. It allows for the collection and annotation of related sequences. These can be used for parallel benchmarking of suitable machine learning algorithms, such as profile hidden Markov models, position-specific scoring matrices and optimized decision trees, accompanied by careful parameter optimization of each design. The most efficient classification system can then be incorporated into the prediction software. Advantages of this set-up are the straightforward evaluation of a newly created classification algorithm and an update option that keeps the learning data up to date. Together, these make it possible to build prediction rules for various kinds of gene cluster products, such as iterative type I polyketides and nonribosomal peptides.
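To make one of the listed classifier types concrete, the following sketch builds a position-specific scoring matrix (PSSM) per class from gap-free aligned fragments and assigns a query to the best-scoring class. It is an illustration only, not the pipeline's implementation: the class labels and sequences are made up, the background distribution is assumed to be uniform, and the function names are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free sequences.

    Assumes a uniform background; residues outside ALPHABET raise KeyError.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    pssm = []
    for pos in range(length):
        # Start every count at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in aligned_seqs:
            counts[seq[pos]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in ALPHABET})
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores of a sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def classify(query, pssms):
    """Return the label whose PSSM gives the query the highest score."""
    return max(pssms, key=lambda label: score(pssms[label], query))


# Made-up motif fragments for two hypothetical domain classes.
training = {
    "class_1": ["GHSAG", "GHSLG", "GHSMG"],
    "class_2": ["GDSLS", "GDSVS", "GDSAS"],
}
pssms = {label: build_pssm(seqs) for label, seqs in training.items()}
predicted = classify("GHSVG", pssms)
```

Benchmarking in the sense described above would repeat this for each candidate algorithm on held-out sequences and compare the resulting accuracies.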

Currently, the feasibility of structure-based classification is also being evaluated by incorporating secondary structure assignment tools.

Genome-scale metabolic modelling and Flux Balance Analysis

Streptomyces is a genus of ubiquitous soil bacteria known for the rich diversity of biologically active secondary metabolites it produces. The production of these compounds is often coupled to complex phenotypic changes, including morphological differentiation and reorganization of the organism's metabolic machinery. Despite these comprehensive changes, the yield of the produced substances is typically rather low.

In order to explore these intricate regulatory mechanisms and their effects, we employ genome-scale metabolic models. These models are generated from genomic sequence data and include all metabolic processes the organism is capable of. They can be used to systematically organize and interpret data from all kinds of ‘omics’ experiments. With their help, the effect of transcriptional changes or differing protein levels can be deduced using a detailed metabolic simulation. (Publication in preparation.)
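For orientation, flux balance analysis, the simulation method named in the heading, is usually posed as a linear program over the reaction fluxes of the model. The formulation below is the generic textbook form, not a description of the lab's specific models. The symbols are generic: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights (for example biomass formation or production of a target metabolite), and v^min and v^max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for all reactions } j
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption that no internal metabolite accumulates. Experimental data such as transcript or protein levels can enter the simulation by tightening the bounds of the affected reactions.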

Additionally, we use the insights gained to develop efficient and novel metabolic engineering strategies. The models open up the possibility of predicting the consequences of genetic modifications on a global level, allowing for a rational pre-selection of in vitro experiments.

With these combined efforts we want to improve access to rare and therefore often expensive biogenic drugs.

Molecular Dynamics (MD) simulation of therapeutically relevant protein targets

We are interested in understanding the structural dynamics and signaling cascade mechanisms of biological targets with therapeutic potential. We study the microscale atomistic dynamics through distributed MD simulations on the BinAC high-performance computing cluster. Currently, we are working on kinases and epigenetic drug targets. (Publication in preparation.)

Structural Analysis of Privileged Chemotypes in Drug Discovery

CovPDB is a freely accessible web database dedicated solely to high-resolution 3D structures of biologically relevant covalent protein-ligand (cP-L) complexes, mined from the Protein Data Bank (PDB). We created CovPDB to assist structure-based approaches in chemical biology and drug design by identifying covalent binding sites suitable for the docking of drug-like ligands and, likewise, typical ligands that covalently modify targetable binding sites. For these curated complexes, the chemical structures and warheads of the pre-reactive electrophilic ligands as well as the covalent bonding mechanisms to their target proteins were manually annotated by experts. In total, CovPDB contains 733 proteins and 1,501 ligands, relating to 2,294 cP-L complexes, 93 reactive warheads, 14 targetable residues, and 21 covalent mechanisms. Users are provided with an intuitive and interactive web interface that offers multiple search and browsing options to explore the covalent interactome at a molecular level in order to develop novel targeted covalent inhibitors (TCIs) (Gao et al., 2021). Based on a comprehensive characterization of the covalently targeted cysteine residues, we developed a machine learning model, the covalent cysteine predictor (CoCyPred), to identify targeted cysteine residues.