Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2024
3rd BioHackathon Germany
DBCLS BioHackathon 2024
ELIXIR INTOXICOM
Recent preprints
BioHackEU24 report: Integrating Bioconductor packages with the ELIXIR Research Software Ecosystem using EDAM
This project seeks to enhance the ELIXIR Research Software Ecosystem (RSEc) by increasing the findability, accessibility, interoperability, and reusability (FAIR principles) of Bioconductor’s extensive collection of over 2,000 bioinformatics packages. By aligning Bioconductor metadata with the EDAM ontology and integrating detailed package descriptions into the bio.tools registry, we aim to improve the discoverability and usability of bioinformatics analysis tools. Short-term goals include mapping Bioconductor’s biocViews controlled vocabulary to EDAM concepts, developing a set of manually annotated “gold standard” packages, and evaluating tools for automated EDAM concept suggestions. Long-term, we intend to expand EDAM coverage across Bioconductor, phase out biocViews, and implement automated synchronisation with bio.tools. This initiative fosters collaboration between Bioconductor and ELIXIR, establishing a foundation for sustainable software management in European bioinformatics.
Key results from the ELIXIR BioHackathon 2024 week include substantial progress in mapping the biocViews vocabulary to EDAM concepts, initiating the curation of a reference set of packages with manual annotations, integrating Bioconductor metadata into the ELIXIR Research Software Ecosystem (RSEc) with automated updates, and prototyping a tool for automated EDAM concept suggestions. Together, these achievements establish a strong foundation for further integration and refinement.
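The vocabulary mapping described in this abstract can be pictured as a lookup from biocViews terms to EDAM concept labels. The following is a minimal sketch, not the project's actual mapping or tooling: the term pairs in `BIOCVIEWS_TO_EDAM` and the function name `suggest_edam` are illustrative assumptions.

```python
# Illustrative pairs only: placeholders, not the curated biocViews-to-EDAM mapping.
BIOCVIEWS_TO_EDAM = {
    "RNASeq": "RNA-Seq",
    "GeneExpression": "Gene expression",
    "Sequencing": "Sequencing",
}


def suggest_edam(biocviews):
    """Split a package's biocViews terms into mapped EDAM labels and unmapped terms."""
    mapped, unmapped = [], []
    for term in biocviews:
        label = BIOCVIEWS_TO_EDAM.get(term)
        if label is None:
            unmapped.append(term)      # needs manual curation
        elif label not in mapped:
            mapped.append(label)       # keep order, avoid duplicates
    return mapped, unmapped


mapped, unmapped = suggest_edam(["RNASeq", "GeneExpression", "Software"])
print(mapped)    # ['RNA-Seq', 'Gene expression']
print(unmapped)  # ['Software']
```

Terms that fall into the unmapped list are the ones a curator would need to review by hand, which is where a manually annotated reference set would come in.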
An assessment of Croissant ML metadata descriptors for AI-ready datasets
To advance the use of machine learning to address humanity’s grand challenges, such as understanding disease conditions and biodiversity loss in the Anthropocene, it is important to promote FAIR AI-ready datasets, since data scientists and bioinformaticians spend 80% of their time finding and preparing data. Metadata descriptors for datasets are pivotal for the creation of machine learning models, as they facilitate the definition of strategies for data discovery, feature selection, data cleaning, and data pre-processing. ML-ready datasets, whether by design or after pre-processing, can be enriched with metadata so they become FAIRer, i.e., autonomously discoverable and processable by machines (machine-actionable). Croissant ML is an extension of schema.org to better describe ML-ready datasets, released in early 2024 and already adopted by some ML-model platforms such as Hugging Face (see Croissant ML viewer documentation) and OpenML. However, as commonly happens with metadata, there are limits to the amount of metadata that can be automatically extracted. How much Croissant metadata can be programmatically extracted from ML-ready datasets? And how could this automation be improved? In this project, we explored answers to these two questions.
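One simple way to frame the "how much can be extracted" question is to measure which properties of a dataset description are actually filled after an automated extraction step. The sketch below is an assumption about how such a check might look, not the method used in the project: the property checklist, the `coverage` function, and the example record are all hypothetical.

```python
# Hypothetical checklist: a subset of schema.org/Croissant dataset properties
# chosen for illustration, not the list of properties required by the spec.
CHECKED_PROPERTIES = [
    "name", "description", "license", "url",
    "creator", "distribution", "recordSet",
]


def coverage(metadata):
    """Return (fraction of checked properties that are non-empty, missing properties)."""
    missing = [p for p in CHECKED_PROPERTIES if not metadata.get(p)]
    filled = len(CHECKED_PROPERTIES) - len(missing)
    return filled / len(CHECKED_PROPERTIES), missing


# Made-up record, as it might come out of an automated extraction step.
extracted = {
    "@type": "sc:Dataset",
    "name": "example-dataset",
    "url": "https://example.org/dataset",
    "distribution": [{"@type": "cr:FileObject", "name": "data.csv"}],
    "description": "",  # present but empty, so counted as missing
}

score, missing = coverage(extracted)
print(f"{score:.0%} of checked properties filled; missing: {missing}")
```

A real assessment would also need to judge the quality of the values, not only their presence.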
2024 OME-NGFF workflows hackathon
The 2024 OME-NGFF Workflows Hackathon, held at the BioVisionCenter at the University of Zurich, brought together an international group of researchers and developers to develop the ecosystem around the open, scalable, and FAIR bioimage file format OME-Zarr. Over five days, participants tackled key challenges in four main areas: (1) advancing the OME-Zarr specification, (2) enabling workflow interoperability by integrating OME-Zarr image processing tasks across multiple open-source frameworks, (3) expanding Java support for Zarr v3 and enhancing the compatibility of OME-Zarr with the popular bioimage analysis software Fiji, and (4) improving the Python resources supporting OME-Zarr. The event led to the release of OME-Zarr 0.5, which formalizes the adoption of Zarr v3 and introduces a sharding strategy to reduce file system overhead. This report provides an overview of the key discussions, outcomes, and future directions emerging from the hackathon, with the goal of fostering continued community engagement in developing OME-Zarr as a robust open bioimaging standard.
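To see why sharding reduces file system overhead, it helps to count stored objects: without sharding each chunk is typically stored as its own object, whereas a shard groups many chunks into one. The arithmetic below is a rough sketch with made-up array, chunk, and shard sizes; it does not read or write any Zarr data, and `count_blocks` is a hypothetical helper.

```python
from math import ceil, prod


def count_blocks(shape, block):
    """Number of blocks of size `block` needed to tile an array of size `shape`."""
    return prod(ceil(s / b) for s, b in zip(shape, block))


# Made-up sizes for illustration, in (z, y, x) order.
image_shape = (512, 4096, 4096)
chunk_shape = (64, 256, 256)
shard_shape = (256, 2048, 2048)  # each shard holds a 4 x 8 x 8 grid of chunks

chunks = count_blocks(image_shape, chunk_shape)
shards = count_blocks(image_shape, shard_shape)
print(f"{chunks} chunk objects without sharding, {shards} shard objects with it")
```

With these assumed sizes, the number of stored objects drops from 2048 to 8 while the chunk size, and therefore the unit of partial reads, stays the same.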
Development of FAIR image analysis workflows and training in Galaxy
Although image analysis tools are available within the Galaxy platform, they remain underutilised. During the 2023 BioHackathon Europe, our efforts focused on enhancing the image analysis community in Galaxy by cataloguing and annotating tools and facilitating community discussions to establish naming conventions that promote standardisation. These initial efforts, detailed in the project outcomes, laid the foundation for the ongoing expansion of Galaxy’s image analysis capabilities.
Building on these achievements, this year’s work aimed to exploit and demonstrate the Galaxy platform’s full potential to address the needs of the image analysis community. This project involved developing FAIR (Findable, Accessible, Interoperable, and Reusable) image analysis workflows, creating tutorials for the Galaxy Training Network (GTN) to provide documentation, and fostering broader adoption and facilitating the application of these workflows across scientific domains.
Secure Processing Environments as a Service in the de.NBI Cloud
Sensitive human data is crucial for biomedical research, enabling faster drug development and better understanding of diseases. The BioHackathon Germany project utilized ELIXIR Europe’s services and external tools to create Secure Processing Environments, ensuring high protection of sensitive data while facilitating research across Germany and Europe.
Report: Workshop on connecting Knowledge Graphs with BioChatter
The workshop on connecting Knowledge Graphs (KGs) with BioChatter convened experts from biology, computer science, and bioinformatics to tackle challenges in integrating and accessing dispersed datasets in plant sciences. The goal was to create user-friendly interfaces for querying these datasets using natural language, bypassing the need for expertise in semantic technologies or query languages like SPARQL or Cypher.
Key use cases included the BrAPI project, which aimed to simplify data retrieval from plant research datasets. While BioChatter effectively generated simple API queries, complex multi-step queries posed challenges, suggesting that programmatic approaches are better suited for such tasks. Integrating BrAPI with BioCypher enabled successful querying of KGs for questions like identifying studies involving specific plant varieties.
The RDF adapter use case focused on enhancing the Plant Phenotyping Experiment Ontology (PPEO) by converting it into a BioCypher-compatible KG, thereby improving data interoperability and enabling LLMs to generate context-aware responses. The Mobile Element Knowledge Graph use case explored the relationship between transposable elements and gene regulation networks, utilizing BioChatter to assist users unfamiliar with Cypher.
The Stress Knowledge Map (SKM) use case integrated a highly curated model of plant stress signaling with BioCypher and BioChatter, allowing natural language queries and improving access to complex biological data. The Chem and Plant KG use case aimed to integrate diverse scientific resources into a unified KG, enhancing data interoperability and accessibility.
Challenges included the need for human-readable concepts within KGs to improve LLM interaction and aligning LLMs with user demands. Future work will focus on refining KG schemas, improving LLM integration, and expanding documentation to support broader adoption and utility in scientific research.
The workshop highlighted the potential of combining KGs with LLMs to enhance data accessibility and drive new insights in biological and agricultural sciences.
Simplifying and Standardizing the Creation of Data Use Agreements for Life Sciences and Beyond - BH Germany2024
The primary goal of this project is to develop a web application for creating data use agreements (DUAs) in a way that allows automated evaluation of access permissions. Specifically, we want to adhere to the Open Digital Rights Language (ODRL) standard [1] and model permissions and prohibitions for the use of digital objects. ODRL is a policy expression language developed and adopted by the W3C. It provides a flexible and interoperable data model and vocabulary to enable fine-grained statements about the use of digital content and services. Recently, the Data Governance Act (DGA) was published as an implementation of the EU Data Act and defined roles for data intermediaries such as data trustees with certain prohibitions and obligations. The high expectations placed on data trustees require that they have technical measures in place to facilitate the negotiation and enforcement of data use agreements. The “Ethical, Legal & Social Aspects” (ELSA) section of the NFDI has also issued a statement on the DGA [2], underlining the importance of this issue.
Usually, DUAs are negotiated individually between parties and are not stored in a machine-readable format, which prevents automated modeling and verification of access rights for digital objects. Our web application will allow users to create a DUA step by step via a configurable graphical user interface that uses the ODRL data model in the background. This enables legal laypeople to create data use agreements without much effort. The use of ODRL makes it possible to query the data use agreements programmatically and to answer access requests automatically. To this end, the resulting DUAs can be persisted and queried through an API, e.g. according to the Gaia-X specifications of the Eclipse Dataspace Components (EDC) [3, 4], to exchange data in compliance with rules and policies. Additionally, we want to address the integration of ODRL policies in FDOs, such as the ARC-RO-Crate of the DataPlant consortium, and discuss extensions of the RO-Crate profiles.
Finally, for legal review and formal signing, negotiated DUAs can be rendered as PDFs.
In summary, we will simplify the process of creating DUAs by adhering to international standards and will contribute to efforts to harmonize technical solutions, as the EOSC describes ODRL as a core metadata schema for legal interoperability [5]. DUAs can serve as a platform to gain the trust of data owners with protected, sensitive data and thus enable access to such resources. The project is in line with ELIXIR-DE/de.NBI’s objective to improve the accessibility of resources and to ensure efficient, interoperable and secure resource sharing. It also aligns with the goals of the NFDI by handling sensitive data and enabling data protection, especially for data from the health sector but also for agronomic data such as land survey data or data from breeding programs.
The project is a joint activity of the Leibniz IPK in Gatersleben (ELIXIR-DE/de.NBI Service Center GCBN) and the Justus-Liebig-University Giessen (ELIXIR-DE/de.NBI Service Center BiGi) and contributes to NFDI4Biodiversity, FAIRAgro, NFDI4Microbiota, DataPlant and FAIR-DS as well as to European initiatives such as EOSC and Gaia-X.
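To illustrate what answering access requests automatically could mean in practice, the sketch below encodes a tiny ODRL-style agreement as a Python dictionary and checks a request against it. It is an assumption-laden simplification, not the project's application: all identifiers are made up, `is_allowed` is a hypothetical helper, and real ODRL evaluation also involves constraints, duties, the action hierarchy, and conflict-resolution strategies, all of which are ignored here.

```python
# Simplified ODRL-style agreement; all identifiers are made up for illustration.
DATASET = "https://example.org/dataset/42"
RESEARCHER = "https://example.org/party/researcher"

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "https://example.org/dua/1",
    "permission": [{
        "target": DATASET,
        "assigner": "https://example.org/party/data-owner",
        "assignee": RESEARCHER,
        "action": "use",
    }],
    "prohibition": [{
        "target": DATASET,
        "assignee": RESEARCHER,
        "action": "distribute",
    }],
}


def _matches(rule, assignee, action, target):
    """True if the rule names exactly this assignee, action, and target."""
    return (
        rule.get("assignee") == assignee
        and rule.get("action") == action
        and rule.get("target") == target
    )


def is_allowed(policy, assignee, action, target):
    """Deny if any prohibition matches; otherwise require a matching permission."""
    if any(_matches(r, assignee, action, target) for r in policy.get("prohibition", [])):
        return False
    return any(_matches(r, assignee, action, target) for r in policy.get("permission", []))


print(is_allowed(policy, RESEARCHER, "use", DATASET))         # True
print(is_allowed(policy, RESEARCHER, "distribute", DATASET))  # False
```

Because the agreement is plain structured data, the same object could be stored, queried through an API, and rendered into a human-readable document for signing.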