GCB 2021-Logo


Metabolic Reconstruction and Flux Modelling

Sam Seaver, Nadine Töpfer
IPK Gatersleben

Part I: Introduction to Model Reconstruction
Genome-scale metabolic reconstructions help us to understand and engineer metabolism. Next-generation sequencing technologies are delivering genomes and transcriptomes for an ever-widening range of species. While such omic data can, in principle, be used to compare metabolic reconstructions in different species, organs and environmental conditions, these comparisons require a standardized framework for the reconstruction of metabolic networks from transcript data. We introduce PlantSEED as such a framework to cover the primary metabolism for any plant species.

Part II: Introduction to Flux Balance Analysis
Flux Balance Analysis (FBA) is a linear programming technique for predicting metabolic flux distributions by using the underlying stoichiometry of a metabolic network and a biologically motivated objective function. FBA was initially developed to model steady-state metabolism of microbial systems. Over the past decade various applications to modelling plant metabolic systems have emerged. Genome-scale metabolic reconstructions, for instance those reconstructed by the PlantSEED framework, can be analysed using FBA and extensions of it.
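In its standard form, the linear program behind FBA can be written as below. The symbols are notation chosen here for illustration and are not taken from the tutorial material: S is the stoichiometric matrix, v the vector of reaction fluxes, c the weights of the objective function, and v^lb, v^ub the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

The constraint S v = 0 encodes the steady-state assumption: for every internal metabolite, production and consumption balance.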


Modern hashing for alignment-free sequence analysis

Sven Rahmann - Center for Bioinformatics, Saarland University
Jens Zentgraf - TU Dortmund University

In recent years, alignment-free sequence analysis methods have gained importance because they are considerably faster than traditional mapping- and alignment-based methods while producing equivalent results. Recently, methods have emerged that are able to index very large collections of sequenced DNA samples (e.g. any genome ever sequenced).

The basis of each alignment-free method is a so-called k-mer dictionary (or key-value store) that associates a value (e.g., a transcript ID, chromosome number, species ID or counter) with each DNA substring of length k (from a genome or a sequenced sample). Almost always, such a dictionary is implemented via hashing. Ideally, considering that billions of k-mers have to be processed, such a hash table is both small and fast. It is both a science and an art to design fast and small hash tables for a given task.
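As a minimal sketch of the idea, the following Python snippet uses the built-in hash-based dictionary as a k-mer counter. The function name and the toy sequence are assumptions made for illustration; tools that process billions of k-mers rely on far more compact, specialized hash tables.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring that consists only of A, C, G, T."""
    counts = Counter()
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Toy input (made up for this example); the N invalidates three 3-mers.
counts = count_kmers("ACGTACGTNACG", 3)
```

Here the key is the k-mer string and the value is a counter; replacing the counter with a species or transcript ID gives the other kinds of dictionaries mentioned above.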

In this two-part tutorial/workshop, we will first give an overview of state-of-the-art hashing methods (and low-level engineering tricks) and then have a small number of contributed application-specific talks.

The tutorial part is addressed to bioinformaticians who would like to know more about the underlying hashing algorithms. Following the tutorial will enable you to better understand the underlying methods (and their limitations) of many state-of-the-art sequence analysis tools in genomics, transcriptomics, metagenomics and pangenomics. It will also help you to design your own method efficiently when the need arises.

The tutorial will cover:
- introduction to k-mer key-value stores
- canonical encodings of k-mers
- basics of Cuckoo hashing
- extensions of Cuckoo hashing: buckets, several hash functions
- theoretical load thresholds
- bit-level layout of a hash table
- performance engineering:
- importance of cache locality and prefetching
- shortcuts for unsuccessful lookups
- compact encoding of hash choices and values
- optimization of a hash table
- parallelization
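Two of the topics above, canonical encodings and basic Cuckoo hashing, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and not the presenters' implementation: the class and function names are hypothetical, the two hash functions are deliberately simple (real tables use well-mixing hash functions), and each bucket holds a single entry.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | ENCODE[base]
    return code


def canonical_code(kmer: str) -> int:
    """Smaller of the codes of a k-mer and its reverse complement."""
    revcomp = kmer[::-1].translate(COMPLEMENT)
    return min(encode(kmer), encode(revcomp))


class CuckooTable:
    """Toy Cuckoo hash table: two hash functions, one entry per bucket."""

    def __init__(self, size: int, max_kicks: int = 50):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def _positions(self, key: int):
        # Illustrative hash functions only (an assumption of this sketch).
        return key % self.size, (key // self.size) % self.size

    def get(self, key: int, default=None):
        # A key can only ever be in one of its two candidate slots.
        for pos in self._positions(key):
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key: int, value) -> bool:
        positions = self._positions(key)
        for pos in positions:  # update an existing entry
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                self.slots[pos] = (key, value)
                return True
        for pos in positions:  # use a free candidate slot
            if self.slots[pos] is None:
                self.slots[pos] = (key, value)
                return True
        item, pos = (key, value), positions[0]
        for _ in range(self.max_kicks):
            # Place the item and pick up the entry that was stored there.
            item, self.slots[pos] = self.slots[pos], item
            p1, p2 = self._positions(item[0])
            pos = p2 if pos == p1 else p1  # the evicted entry's other slot
            if self.slots[pos] is None:
                self.slots[pos] = item
                return True
        # Giving up: the entry held in `item` is not stored any more.
        # A real implementation would rebuild or enlarge the table here.
        return False


# Count canonical 3-mers of a toy sequence.
table = CuckooTable(size=11)
seq = "ACGTACGT"
for i in range(len(seq) - 3 + 1):
    code = canonical_code(seq[i:i + 3])
    table.put(code, table.get(code, 0) + 1)
```

Because ACG and CGT are reverse complements of each other, they share one canonical code and therefore one counter. A lookup inspects at most two slots, which is the property that makes Cuckoo hashing attractive for fast k-mer queries.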

The workshop part will cover several application examples.
The following list is preliminary and may change before the conference.
- general-purpose k-mer counting
- metagenomic read classification
- xenograft sorting, i.e., classifying reads from a mixed sample containing graft tumor and host reads
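To make the read classification idea concrete, here is a small Python sketch of k-mer based sorting of reads into graft and host. It illustrates the general principle only and is not the method presented in the workshop: the function names and toy sequences are made up, reverse complements are ignored for brevity, and a plain dict stands in for a specialized hash table.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield all length-k substrings of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read: str, index: dict, k: int) -> str:
    """Vote with k-mers that occur in exactly one reference."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, ())
        if len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return "unknown"
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "ambiguous"
    return ranked[0][0]


# Made-up toy references; real references are whole genomes.
references = {"graft": "ACGTACGTTTGACC", "host": "GGCATCGATTAGCA"}
index = build_index(references, k=4)
label = classify("ACGTTTGA", index, k=4)
```

k-mers shared by both references carry no information about the origin of a read, so this sketch lets only the reference-specific k-mers vote.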


Bioinformatics tools for analyzing clinical metaproteomics samples of the human gut

Dr. Robert Heyer - Otto von Guericke University, Magdeburg
Dr. Stephan Fuchs - Robert Koch-Institut, Berlin
Dr. Thilo Muth - Federal Institute for Materials Research and Testing, Berlin
Kay Schallert - Otto von Guericke University, Magdeburg
Prof. D. Benndorf - Otto von Guericke University, Magdeburg, FH Köthen

Metaproteomics analyzes the entirety of proteins from whole microbial communities, such as complex microbiomes in medical and technical applications, e.g., fecal diagnostics and the operation of biogas plants or wastewater treatment plants. Successful metaproteomics studies require, in addition to experimental knowledge, comprehensive knowledge of bioinformatics data evaluation (Heyer et al., 2017). This hands-on workshop trains participants in the bioinformatics tools and skills required for the complete workflow of metaproteomics data analysis. It starts with identifying peptides and inferring proteins from mass spectrometry data using the MetaProteomeAnalyzer (Muth et al., 2015; Muth et al., 2018; Heyer et al., 2019), followed by taxonomic and functional annotation using Prophane (Schiebenhoefer et al., 2020). Subsequently, we will cover biostatistical data analysis and data visualization. As a use case, we selected fecal samples from patients with inflammatory bowel disease.

Heyer, R., Schallert, K., Zoun, R., Becher, B., Saake, G., Benndorf, D., (2017). Challenges and perspectives of metaproteomic data analysis. Journal of Biotechnology, 261:24-36. doi: 10.1016/j.jbiotec.2017.06.1201
Heyer, R., Schallert, K., (shared first), Büdel, A., Zoun, R., Dorl, S., Kohrs, F., Püttker, S., Siewert, C., Muth, T., Saake, G., Reichl, U., Benndorf, D., (2019) MPA-WORKFLOW: A robust and universal metaproteomics workflow for research studies and routine diagnostics within 24 h using phenol extraction, FASP digest, and the MetaProteomeAnalyzer. Frontiers in Microbiology, 10: 1883. doi: 10.3389/fmicb.2019.01883
Muth, T., Behne, A., Heyer, R., Kohrs, F., Benndorf, D., Hoffmann, M., Lehtevä, M., Reichl, U., Martens, L., Rapp, E. (2015). The MetaProteomeAnalyzer: a powerful open-source software suite for metaproteomics data analysis and interpretation. Journal of Proteome Research, 14, 1557-1565. doi: 10.1002/pmic.201400559
Muth, T., Kohrs, F., Heyer, R., Benndorf, D., Rapp, E., Reichl, U., Martens, L., Renard, B.Y. (2018) MPA Portable: a Stand-alone Software Package for Analyzing Metaproteome Samples on the Go. Analytical Chemistry, 90 (1), pp 685–689, doi: 10.1021/acs.analchem.7b03544
Schiebenhoefer, H., Schallert, K., Renard, B.Y., Trappe, K., Schmid, E., Benndorf, E., Riedel, K., Muth, T., Fuchs, S., (2020), A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nature Protocols, 15, 3212–3239. https://doi.org/10.1038/s41596-020-0368-7


Cell segmentation using KNIME Analytics Platform and its Tensorflow2 Integration

Janina Mothes, PhD - Data Scientist (Life Sciences), KNIME GmbH
Temesgen H. Dadi, PhD - Technical Data Scientist (Life Sciences), KNIME GmbH
Jeanette Prinz, PhD - Team Lead (Life Sciences), KNIME GmbH

Image analysis is one of the hallmarks of biomedical research due to its wide range of potential applications. This includes enhancing our understanding of brain function by analyzing the connectivity of individual neuronal processes and synapses through serial transmission electron microscopy (EM). Machine learning approaches, in particular convolutional neural networks, allow the automatic segmentation of neural structures in EM images, an important step towards automating the extraction of neuronal connectivity.
The open source KNIME Analytics Platform offers an accessible tool based on the visual programming paradigm to analyse diverse kinds of data, including images. In addition, one can choose from a wide array of data transformations, machine learning algorithms, and visualizations and combine those in one reproducible workflow. KNIME Analytics Platform is freely available from https://www.knime.com/downloads.
In this hands-on tutorial, participants will build a workflow to create and train a specific convolutional network (U-Net) for segmenting cell images. We will start by importing and cleaning up the input data (transmission electron microscopy data) [1,2]. Then, with the help of the KNIME Tensorflow2 integration, we will train a U-Net model and use the trained network to predict the segmentation of unseen data. In the last step, we will visualize our results.

Participants will learn how to
- Use the open source KNIME Analytics Platform for importing, blending and transforming data
- Work with images in KNIME Analytics Platform
- Train a U-Net model and apply it to unseen data
- Visualize the results

Intended audience and level - Beginner
Students (graduate and undergraduate), researchers, and principal investigators with an interest in machine learning, images, or data manipulation are welcome to attend the tutorial. A little background on machine learning and imaging data is a plus. We will provide a short introduction to the KNIME Analytics Platform, cell segmentation, and convolutional neural networks before starting the hands-on sessions.

As this is a hands-on tutorial, participants need to bring their own laptops. All the necessary software and data will be made available for download before the tutorial day.

[1] Arganda-Carreras, Ignacio, et al. "Crowdsourcing the creation of image segmentation algorithms for connectomics." Frontiers in neuroanatomy 9 (2015): 142.
[2] Cardona, Albert, et al. "An integrated micro-and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy." PLoS Biol 8.10 (2010): e1000502.


How to build and analyse mathematical models of biological systems using Python & modelbase

Dr Anna Matuszyńska, Marvin van Aalst
Heinrich Heine University Düsseldorf

Computational mathematical models of biological and biomedical systems have been successfully applied to advance our understanding of various regulatory processes, metabolic fluxes, effects of drug therapies and disease evolution or transmission. Many computational approaches have been developed to support model construction and analysis. We have developed modelbase, an open-source toolbox, to facilitate synergies within the emerging research fields of systems biology and medicine, making the overall process of model construction more consistent, understandable, transparent and reproducible. During this workshop, we will show how to use the software by building a new model of a biochemical network and analysing its dynamic behaviour. Considering the increasing use of Python by computational biologists, a fully embedded modelling package like modelbase is desired.
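For readers new to this kind of modelling, the sketch below shows what building and simulating a small kinetic model involves. It deliberately does not use the modelbase API: it is plain Python with a hand-written Runge-Kutta integrator, and the reaction chain A -> B -> C, the rate constants and all function names are assumptions made for illustration.

```python
import math


def rates(state, k1=1.0, k2=0.5):
    """Right-hand side of the chain A -> B -> C with mass-action kinetics."""
    a, b, c = state
    v1 = k1 * a  # rate of A -> B
    v2 = k2 * b  # rate of B -> C
    return (-v1, v1 - v2, v2)


def rk4_step(f, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    d1 = f(state)
    d2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, d1)))
    d3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, d2)))
    d4 = f(tuple(s + dt * d for s, d in zip(state, d3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, d1, d2, d3, d4)
    )


def simulate(f, state, t_end, dt=0.01):
    """Integrate from t = 0 to t_end with a fixed step size."""
    for _ in range(round(t_end / dt)):
        state = rk4_step(f, state, dt)
    return state


# Start with all mass in A and simulate two time units.
final = simulate(rates, (1.0, 0.0, 0.0), t_end=2.0)
```

Two simple sanity checks apply to this toy model: the concentration of A should follow exp(-k1 t), and the total amount of A, B and C should stay constant because the system is closed.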

modelbase software: https://gitlab.com/qtb-hhu/modelbase-software

Documentation: https://modelbase.readthedocs.io/en/latest/


Exploring Target Structures with ProteinsPlus

Katrin Schöning-Stierand, Christiane Ehrt and Matthias Rarey
Universität Hamburg, ZBH – Center for Bioinformatics, Hamburg, Germany

Three-dimensional protein structures are a fundamental basis for understanding, modulating, and manipulating protein functionality. With almost 175,000 structures (access date: March 1st, 2021), the Protein Data Bank (PDB) is one of the most important bioinformatics resources for life sciences. Roughly 1,000 structures are SARS-CoV-2 structures, forming a good basis for structure-based modeling processes.
In this workshop, we present the ProteinsPlus server [1-3], which enriches structural knowledge from the PDB with additional computed information required for typical biological research questions. ProteinsPlus enables easy access to this information for all researchers in the fields of molecular life sciences. The provided computational services comprise various tools for the assessment, representation, preprocessing, and interconnection of structural data. Many of the provided tools focus on protein binding pockets and molecular interactions with small molecules due to their relevance for drug design.
Participants will get to know a combination of tools and web services for searching and analyzing protein structure data. The focus will be on protein preparation for molecular docking scenarios related to COVID-19. We will work with the ProteinsPlus web service, which contains a diverse range of software solutions for the analysis of protein structures and their application in molecular modeling approaches.

Learning goals:
This course is designed for life and computer scientists with interest in protein structures, but only very basic experience in 3D modeling. Topics include: Finding and selecting protein structure data, evaluating the quality of experimental data, preprocessing structure data for modeling, first modeling steps like the analysis of binding site properties and conformational flexibility, fully automated docking. The usage of the ProteinsPlus tools is free and open to all users.

Prerequisites:
General knowledge of proteins and their role in life sciences.

[1] https://proteins.plus
[2] K. Schöning-Stierand, K. Diedrich, R. Fährrolfes, F. Flachsenberg, A. Meyder, E. Nittinger, R. Steinegger, M. Rarey. Nucleic Acids Res. 2020, 48: W48-W53.
[3] R. Fährrolfes, S. Bietz, F. Flachsenberg, A. Meyder, E. Nittinger, T. Otto, A. Volkamer, M. Rarey, Nucleic Acids Res. 2017, 45: W337–W343.


Non-targeted label-free proteomics

Dr. Timo Sachsenberg
de.NBI/CIBI Tübingen

The course introduces key concepts of non-targeted label-free proteomics. Non-targeted methods are ideal for unbiased discovery studies and scale well for large-scale studies (e.g., clinical proteomics). Based on example datasets we will then introduce several open-source software tools for proteomics primarily focusing on OpenMS (www.openms.org). We will demonstrate how these tools can be combined into complex data analysis workflows including visualization of results. Participants will have the opportunity to design custom analysis workflows together with instructors.

The target audience is computational scientists interested in working with raw mass spectrometric data.

Learning goals:
- Introduction to computational mass spectrometry proteomics
- OpenMS and the integration platform KNIME
- Hands-on: Identification and Quantification workflow for Label-free quantitative proteomics
- Optional: Developing tools with the OpenMS library
- Optional: Large-scale data processing with OpenMS (Nextflow or Galaxy)


BioC++ - solving daily bioinformatic tasks with C++ efficiently

Marcel Ehrhardt, René Rahn, Enrico Seiler
Free University Berlin

[description coming soon]



Supported by

Uni Leipzig
Leibniz IPK
Uni Halle