Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2024
3rd BioHackathon Germany
DBCLS BioHackathon 2024
ELIXIR INTOXICOM
Recent preprints
AI for Computational Biology: Highlights from the first BioAI Hackathon at University of Warsaw
The BioAI Hackathon at the Centre of New Technologies at the University of Warsaw convened 43 international researchers to collaboratively explore artificial intelligence (AI) approaches for solving complex challenges in computational biology. Nine interdisciplinary and multi-institutional teams addressed the following problems: disease-gene prioritization, microbiome analysis, drug-protein interactions, alternative splicing prediction, chromatin architecture study and toxicological profiling. Using cutting-edge tools such as graph neural networks (GNNs), large language models (LLMs), and multi-omics integration frameworks, participants developed scalable and reproducible analytical pipelines. The results include a disease-gene prioritization framework using GNNs, a microbiome dynamics analysis for poultry health prediction, and the construction of chromatin structure-aware regulatory networks. All projects follow open science principles and display translational potential. This hackathon underscores the transformative role of AI in biomedicine and the value of collaborative, time-bounded innovation for accelerating discovery in life sciences. All projects are publicly available on GitHub: https://github.com/SFGLab
Enhancing Digital Infrastructures and Data Handling Practices for Single Specimen Barcoding - the 2024 BGE Barcoding Hackathon
The 2024 BGE Barcoding Hackathon, hosted by the Biodiversity Genomics Europe (BGE) project in Leiden, Netherlands, focused on advancing digital infrastructures and data handling practices for single specimen barcoding. This event brought together 24 participants from various consortium member institutes to enhance workflows in DNA barcoding, which complements genome sequencing efforts within BGE. The hackathon was structured around four thematic pillars: Data Generation and Processing, BOLD Release Candidate, Wider Data Integration, and Reference Data Curation. Participants worked on optimizing barcode generation pipelines, developing a new version of the Barcode of Life Data Systems (BOLD) data publishing platform, harmonizing data standards for better integration with databases like ENA and UNITE, and automating curation processes for DNA barcoding records. The outcomes include improved workflows, enhanced data interoperability, and refined curation standards, which collectively support BGE’s goals of achieving a step change in molecular biodiversity monitoring and research.
Oncomatch - Optimizing Oncology Combination Therapy Prediction
Advances in precision medicine are reshaping cancer treatment by tailoring therapies to a patient’s specific genetic profile. Despite this, matching cancer mutations to effective drugs remains a complex task due to variability in mutations across cancer types and limited tools for practical clinical application. In this project, initially developed during the BioIT Hackathon 2025, we created OncoMatch, an open-data-powered web application designed to bridge this gap by integrating genomic, transcriptomic, proteomic, and drug-target interaction data to support therapy selection. Building on prior work in colorectal cancer, we expanded our scope to include bladder, ovarian, and non-small cell lung cancer (NSCLC), using the COSMIC and DrugCentral databases to identify relevant gene mutations and therapeutics. We developed two novel scoring systems, the Cancer Precision Score (CPS) and the Gene Precision Score (GPS), to evaluate drug specificity and potential effectiveness. Using data from DrugCentral, LINCS L1000, and DeepCoverMOA, we created a unified bioactivity dataset for over 4,000 drugs, including measures such as IC50 and Kd values. The OncoMatch platform features interactive tools to visualize drug bioactivity, assess multiomic and structural similarity among compounds, and explore potential drug combinations. Users can query drugs by cancer type and gene mutation, generating insights into the most promising therapies and alternatives. Our open-source approach not only democratizes access to high-quality bioinformatics tools but also encourages data-driven, personalized cancer care. Future directions include refining subtype-level predictions and improving the platform’s utility for combinatorial therapy planning. We have developed a Streamlit app to make this data easy to access. The app can be found at https://oncomatchapp-precision-medicine.streamlit.app.
Software Quality Indicators: extraction, categorisation and recommendations from canonical sources
Research software plays a central role in modern science, and its quality is increasingly recognized as essential for reproducibility, sustainability, and trust. Numerous initiatives have proposed indicators to guide quality assessment, yet these indicators are dispersed across domains and vary in scope, terminology, and practical use. This work presents a curated catalogue of software quality indicators tailored to the needs of research software. Developed during BioHackathon Europe 2024 and refined in collaboration with the ELIXIR Tools Platform and EVERSE project, the catalogue consolidates and structures indicators from a range of authoritative sources.
Addressing Background Genomic and Environmental Effects on Health through Accelerated Computing and Machine Learning: Results from the 2025 Hackathon at Carnegie Mellon University
In March 2025, 34 scientists from the United States, Ireland, the United Kingdom, Switzerland, France, Germany, Spain, India, and Australia gathered in Pittsburgh, Pennsylvania and virtually for a collaborative biohackathon, hosted by DNAnexus and Carnegie Mellon University Libraries. The goal of the hackathon was to explore machine learning approaches for multimodal problems in computational biology using public datasets. Teams worked on the following innovative projects: applying machine learning techniques for clustering and similarity analysis of haplotypes; adapting the StructLMM framework to study Gene-Gene (GxG) interactions; creating a Nextflow workflow for generating an imputation reference panel using large-scale cohort data; optimizing discovery of causal relationships in large electronic health record (EHR) datasets using the open source causal analysis software Tetrad; examining the evolution of a graph neural network in a Lenski-esque experiment; and developing tools and workflows for generating pathway intersection diagrams and graph-based analyses for multiomics data. All projects were dedicated to studying the background genomic and environmental effects underlying complex genotype-phenotype relationships. Their objective was to set foundations for further studies on predicting complex phenotypic traits using integrative multi-omic and environmental analyses.
Leveraging RDF and CURIE metadata resolution with identifiers.org
Identifiers.org provides two core services for CURIEs in life sciences. One is a registry of CURIE prefixes and URL locations that contains entries for the main life sciences datasets. The other is a resolver that allows for consistent data access by using registry information to redirect CURIE identifiers to their current URLs. For this work, we aimed to expand these services to facilitate the integration of CURIE-related metadata into different contexts. The first part exports the registry in RDF with a SPARQL server to allow queries on the dataset. Through these, RDF-based systems can associate with registry metadata on different data collections, allowing, for example, services that hold identifiers.org URLs to collect metadata on the collections they reference. The second part expands the existing metadata resolver so that it can collect CURIE-related metadata from different metadata providers. While the previous resolver could only collect LDJSON notations from pages, it can now be extended to collect from any metadata provider. For this work, we implement two proof-of-concept retrievers, one for EBI Search, a text search engine that allows for metadata acquisition, and one for TogoID, an ID mapping service for life sciences. Finally, we gather some future tasks for identifiers.org services.
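The resolver idea described in this abstract can be illustrated with a minimal sketch. It assumes the publicly documented identifiers.org URL pattern, in which the resolver address is followed by `prefix:accession`. The function name `curie_to_url` and the example CURIE are hypothetical, chosen only for illustration. The sketch builds the URL locally and makes no network request, so it does not show the redirect itself or any of the metadata retrieval discussed in the preprint.

```python
# Minimal sketch: turn a CURIE (prefix:accession) into an identifiers.org URL.
# `curie_to_url` is a hypothetical helper, not part of any identifiers.org client.

def curie_to_url(curie: str, resolver: str = "https://identifiers.org/") -> str:
    """Split a CURIE at the first colon and append it to the resolver base URL."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a CURIE of the form prefix:accession: {curie!r}")
    return f"{resolver}{prefix}:{accession}"


if __name__ == "__main__":
    # Example CURIE chosen for illustration only.
    print(curie_to_url("uniprot:P12345"))
```

Requesting the resulting URL would let the resolver redirect to whichever location the registry currently lists for that prefix, which is why the caller does not need to know the provider's own URL scheme.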
BioHackEU24 report: Expanding FAIR database integration through elucidation and transformation of underlying graph schemas
The BioDataFuse (BDF) project aims to enhance the interoperability of biomedical data through modular integration of data from diverse life sciences resources into context-specific knowledge graphs. This paper discusses the efforts made during BioHackathon Europe 2024 to improve the FAIR (Findable, Accessible, Interoperable, and Reusable) data integration process by clarifying and transforming graph schemas. We explored tools such as VoID-generator, RDF-config, and sheXer for data schema extraction and the integration of RDF Portal data into the BDF framework. By leveraging these tools, we automated the generation of SPARQL queries, created GraphQL endpoints, and enhanced BDF’s ability to integrate new databases. Additionally, we explored the potential of large language models (LLMs) for automated reasoning and data interpretation within the BDF ecosystem. This work lays the foundation for building more efficient and standardized data models, contributing to the seamless integration of multiple biomedical databases.