BioHackathon Europe 2025, Berlin, Germany, 2025

BioHackathon Europe is an annual event that brings together bioinformaticians and computational biologists from around the world. It’s organised by ELIXIR Europe, and offers an intense week of hacking, with participants working on diverse and exciting projects. BioHackathon is a community-driven event, which provides an opportunity for members of the life sciences community to meet and work together on topics of common interest. The goal is to create code that addresses challenges in bioinformatics research.

Source

Previous BioHackathon Europe preprints

YAML instructions

biohackathon_name: "BioHackathon Europe 2025"
biohackathon_url: "https://biohackathon-europe.org/"
biohackathon_location: "Berlin, Germany, 2025"

Preprints

  • Decoding Complex Genotype-Phenotype Interactions by Discretizing the Genome

    Background: Despite the ease and affordability of genome sequencing in biomedical research, the genetic causes of many diseases or their subtypes remain unknown due to diverse biological mechanisms that complicate genotype-phenotype relationships. Most previous studies have focused on single variants or sets of variants presumed to be directly causal for the disease. However, incomplete penetrance, in which some individuals carry disease-associated variants yet exhibit no phenotype, suggests that these variants, the genomic background and other secondary factors combine to shape the susceptibility to the disease.
  • BioHackEU25 report: Towards a Robust Validation Service for Data and Metadata in ARC RO-Crates

    Robust validation of both research data and its accompanying metadata is essential for ensuring adherence to FAIR principles. Current approaches often handle these aspects separately, hindering a holistic quality assessment. Building upon previous BioHackathon work establishing ARCs (Annotated Research Context) as RO-Crates (ARC RO-Crate), we aim to develop and demonstrate an integrated validation strategy for FAIR digital objects. The strategy distinguishes between validating the metadata descriptor and validating the payload data files. For the metadata descriptor, validation will ensure structural and semantic compliance with the base RO-Crate specification and the ARC-ISA family of RO-Crate profiles, using and extending the RO-Crate validator tool. For the payload data files, validation targets the actual content: data files often require domain-specific structural and value constraints, which in turn require explicit schema definitions. For this, we will integrate Frictionless for checking data content against community standards (e.g. MIAPPE, as demonstrated in the HORIZON project AGENT). Crucially, this project will also explore mechanisms for specifying the requirements on expected data structures within the ARC RO-Crate itself. This aims to provide a more self-contained description of data, investigating how such internal requirements can be linked to data validation frameworks, complementing the crate’s metadata validation. The overall goal is to provide a powerful, holistic validation mechanism for ARC RO-Crates, enhancing their reliability, trustworthiness, and FAIRness. A MIAPPE-compliant plant phenomics dataset will serve as a use case. This integrated validation approach aims to streamline quality control for researchers and will be packaged as a deployable microservice, offering broad applicability across diverse research workflows.
  • Mining the potential of knowledge graphs for metadata on training

    Training metadata in the life-science community is increasingly standardized through Bioschemas, yet remains fragmented and under-utilized. In this work we harvested training records from ELIXIR’s TeSS platform and the Galaxy Training Network, converting them into a unified knowledge graph. A dedicated pipeline parses RDF/Turtle dumps, deduplicates entries, and builds rich indexes (keyword, provider, location, date, topic) that power a Model Context Protocol (MCP) server. The MCP server offers live and offline search tools (keyword, provider, location, date, topic, and SPARQL queries), enabling natural-language access to training resources via LLM-driven clients. User-story-driven evaluations demonstrate the system’s ability to generate custom learning paths, assemble trainer profiles, and link training data to external repositories. Findings highlight gaps in persistent identifiers (ORCID, ROR) and location granularity, informing recommendations for metadata providers. The project showcases how knowledge-graph-backed metadata can enhance discoverability, interoperability, and AI-assisted exploration of scientific training materials.
  • BioHackEU25 report: Scop3PTM Next - Interactive visualization of PTM data across sequence, structure and interactions

    Scop3PTM Next was developed during BioHackathon Europe 2025 to address the need for integrated visualization of protein-centric data across sequence (and modification), interaction and structural contexts. The project delivers an open-source library of modular JavaScript components, built with Vue.js and documented with Storybook, enabling reusable and interoperable visualizations. The framework provides 1D sequence tracks, contact-map networks and interactive 3D structural renders, using MolViewSpec for rendering 3D structures linked to Nightingale 1D tracks. Together, these components offer a unified interface for exploring PTM features across multiple representational layers. This work establishes the basis for a community-oriented visualization library to support proteomics analysis.
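
As an illustration of the deduplicate-and-index step described in the knowledge-graph preprint above, the following is a minimal sketch only, not the authors' pipeline. It assumes records have already been parsed out of the RDF/Turtle dumps into plain dictionaries. The field names (`url`, `name`, `provider`, `keywords`), the example URLs and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-parsed training records. Field names and URLs are
# illustrative assumptions, not the TeSS or Galaxy Training Network schema.
RECORDS = [
    {"url": "https://example.org/t/1", "name": "Intro to RDF",
     "provider": "Provider A", "keywords": ["RDF", "SPARQL"]},
    {"url": "https://example.org/t/1/", "name": "Intro to RDF",
     "provider": "Provider A", "keywords": ["rdf"]},
    {"url": "https://example.org/t/2", "name": "Galaxy basics",
     "provider": "Provider B", "keywords": ["Galaxy"]},
]


def normalise_url(url):
    """Treat URLs differing only in case or a trailing slash as the same entry."""
    return url.strip().rstrip("/").lower()


def deduplicate(records):
    """Merge records that share a normalised URL, taking the union of keywords."""
    merged = {}
    for rec in records:
        key = normalise_url(rec["url"])
        keywords = {k.lower() for k in rec["keywords"]}
        if key not in merged:
            merged[key] = {**rec, "url": key, "keywords": keywords}
        else:
            merged[key]["keywords"] |= keywords
    return list(merged.values())


def build_index(records, field):
    """Map each lower-cased value of `field` to the set of record URLs carrying it."""
    index = defaultdict(set)
    for rec in records:
        values = rec[field]
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value.lower()].add(rec["url"])
    return index


unique = deduplicate(RECORDS)
by_keyword = build_index(unique, "keywords")
by_provider = build_index(unique, "provider")
```

A keyword or provider search tool could then answer a query with a single dictionary lookup, for example `by_keyword["sparql"]`. A real pipeline would likely need a more careful identity rule than URL normalisation alone.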