Meetings

Recent preprints

Dec 11, 2024

BioHack24 report: Using discovered RDF schemes: a compilation of potential use cases for shapes reusage
RDF shapes are formal expressions of schema structures in RDF data. Their primary purpose is twofold: describing and validating RDF data. However, as machine-readable representations of the expected structures in a given data source, RDF shapes can be applied to various tasks that require automatic comprehension of data schemas. In this paper, we present our work conducted during the DBCLS BioHackathon 2024 in Fukushima, Japan, to harness the potential of RDF shapes. The identified and partially implemented use cases include the generation and validation of SPARQL queries, data and schema visualization, mappings to other formal syntaxes, and applications in data modeling scenarios. less than 1 minute read
Dec 9, 2024

Publishing FAIR datasets from Bioimaging repositories
We assessed the implementation of the FAIR principles in current bioimaging and clinical imaging data repositories. Additionally, to make the RDF triples exported from the IDR discoverable, we explored the FAIR Data Point interface (Silva Santos et al., 2023) as a mechanism for exposing machine-actionable metadata, and how it could be added to the Imaging Data Resource (IDR) portal. less than 1 minute read
Nov 5, 2024

INTOXICOM Workshop Report: FAIRification of Toxicological Research Output: Leveraging ELIXIR Resources
This report documents the first workshop of the ELIXIR Toxicology Community (Martens et al., 2023), held in Utrecht on May 28-29, 2024 (FAIRification of Toxicological Research Output: Leveraging ELIXIR Resources, 2024), as part of the INTOXICOM Implementation Study workshop series (Integrating the Toxicology Community into ELIXIR, 2024). The main topic of the meeting was the FAIRification of toxicological research outputs and exploring the potential role of ELIXIR resources in this process. A team of ten people from the ELIXIR Toxicology Community, including Marvin Martens, Penny Nymark, Iseult Lynch, Meike Bünger, Rob Stierum, Thomas Exner, Egon Willighagen, Ammar Ammar, Dominik Martinát, and Karel Berka, coordinated the event. less than 1 minute read
Oct 28, 2024

Unveiling ecological dynamics through simulation and visualization of biodiversity data cubes
The gcube R package, developed during the B-Cubed hackathon (Hacking Biodiversity Data Cubes for Policy), provides a flexible framework for generating biodiversity data cubes using minimal input. The package assumes three consecutive steps: (1) the occurrence process, (2) the detection process, and (3) the grid designation process, each with a corresponding main function: simulate_occurrences(), sample_observations(), and grid_designation(). It allows for customisable spatial and temporal patterns, detection probabilities, and sampling biases. During the hackathon, collaboration was highly efficient due to thorough preparation, task division, and the use of a scrum board. Fourteen participants contributed 209 commits, resulting in a functional package with a pkgdown website, 67% code coverage, and successful CMD checks. However, certain limitations were identified, such as the lack of spatiotemporal autocorrelation in the occurrence simulations, which affects the model's realism. Future development will focus on improving spatiotemporal dynamics, adding comprehensive documentation and testing, and expanding functionality to support multi-species simulations. The package also aims to incorporate a virtual species workflow, linking the virtualspecies package to the gcube processes. Despite these challenges, gcube strikes a balance between usability and complexity, offering researchers a valuable tool for simulating biodiversity data cubes to assess research questions under different parameter settings, such as the effect of spatial clustering on the occurrence-to-grid designation and the effect of different patterns of missingness on data quality and robustness of derived biodiversity indicators. 1 minute read
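The three-step flow described above (occurrence, detection, grid designation) can be sketched in a few lines. This is a conceptual illustration in Python only, not the gcube API: the function names with the `_sketch` suffix, their parameters, and the square study area are all assumptions made for the example. Points are drawn independently and uniformly, so the sketch has no spatial autocorrelation either.

```python
import random
from collections import Counter

# Hypothetical stand-ins for the three conceptual steps; these are NOT gcube functions.

def simulate_occurrences_sketch(n, extent, rng):
    """Step 1 (occurrence): place n individuals uniformly in a square of side `extent`."""
    return [(rng.random() * extent, rng.random() * extent) for _ in range(n)]

def sample_observations_sketch(points, detection_probability, rng):
    """Step 2 (detection): keep each individual with a fixed detection probability."""
    return [p for p in points if rng.random() < detection_probability]

def grid_designation_sketch(points, cell_size):
    """Step 3 (grid designation): count observations per square grid cell."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

rng = random.Random(42)  # fixed seed so repeated runs give the same cube
occurrences = simulate_occurrences_sketch(n=200, extent=10.0, rng=rng)
observed = sample_observations_sketch(occurrences, detection_probability=0.5, rng=rng)
cube = grid_designation_sketch(observed, cell_size=2.0)

print(f"{len(occurrences)} occurrences, {len(observed)} detected, {len(cube)} occupied cells")
```

Varying `detection_probability` or `cell_size` in this toy setup shows how the detection and gridding steps change the resulting cell counts for the same underlying occurrences.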
Sep 5, 2024

Expanding data on cultivation media and microbial traits
The standardization and integration of cultivation media data are essential for advancing microbial research and enabling AI-based predictions of optimal growth conditions. This study addresses the challenges of data fragmentation by aligning terminologies and mapping ingredients between two prominent databases: MediaDive (DSMZ) and TogoMedium (DBCLS). We successfully linked 870 ingredients, expanded the Growth Media Ontology (GMO), and prepared data for media similarity calculations, thereby enhancing the interoperability of these resources. Additionally, we developed the first version of a BacDive RDF knowledge graph, incorporating mapping rules for 24 key entities and materializing the data in Turtle format to facilitate integration into broader knowledge networks. We also propose a novel process for the standardized registration of media recipes by depositors, ensuring that these recipes can be cited and shared consistently. Together, these efforts contribute to the creation of a more cohesive and accessible microbial data ecosystem, supporting future research and innovation. less than 1 minute read
Aug 31, 2024

Revisiting SRAmetadb.sqlite
The SRAmetadb.sqlite database, which compiles Sequence Read Archive (SRA) metadata into an offline SQLite format, has been a crucial resource for bioinformatics tools like the SRAdb R package and pysradb. Despite its utility, the database has not been regularly updated, with the last refresh occurring in late 2023. Moreover, no public tools exist to rebuild or update this database. This report introduces an open-source pipeline developed during the 2024 international BioHackathon, designed to generate and update a similar SRAmetadb.sqlite database from SRA metadata, addressing the gap left by the lack of recent updates. The SRAmetadb.sqlite database's value extends beyond its original use cases, offering potential integration with other tools such as DuckDB and programmatic access from custom scripts. The proposed pipeline introduces features like the generation of metadata subsets, enabling researchers to focus on specific species. It also offers offline access to SRA metadata, significantly enhancing query speed and efficiency. This adaptability is particularly relevant as new use cases emerge, including applications in large language models (LLMs) and Retrieval-Augmented Generation (RAG). This pipeline prioritizes low resource usage and ease of maintenance. It is not intended as a direct replacement for the original SRAmetadb.sqlite but seeks to maintain compatibility while exploring the benefits of modern SQLite features. By providing this tool as an open-source resource, the project encourages community involvement to ensure its ongoing development and relevance in the evolving landscape of bioinformatics research. 1 minute read
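As a rough illustration of the species-subset and scripted-access ideas above, the following Python sketch copies the rows for one taxon from an SQLite metadata table into a smaller database. The table name `run_metadata`, its columns, and the accession values are assumptions invented for this example; they do not reflect the actual SRAmetadb.sqlite schema or the pipeline's implementation.

```python
import sqlite3

# Hypothetical, simplified schema; the real SRAmetadb.sqlite layout differs.
SCHEMA = (
    "CREATE TABLE run_metadata ("
    "run_accession TEXT PRIMARY KEY, taxon_id INTEGER, library_strategy TEXT)"
)

def build_species_subset(source: sqlite3.Connection, taxon_id: int) -> sqlite3.Connection:
    """Copy all rows for one taxon into a fresh in-memory database."""
    subset = sqlite3.connect(":memory:")
    subset.execute(SCHEMA)
    rows = source.execute(
        "SELECT run_accession, taxon_id, library_strategy "
        "FROM run_metadata WHERE taxon_id = ?",
        (taxon_id,),
    ).fetchall()
    subset.executemany("INSERT INTO run_metadata VALUES (?, ?, ?)", rows)
    subset.commit()
    return subset

# Toy source database with made-up accessions standing in for SRA metadata.
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany(
    "INSERT INTO run_metadata VALUES (?, ?, ?)",
    [
        ("RUN0001", 9606, "RNA-Seq"),   # 9606 is the NCBI taxonomy ID for Homo sapiens
        ("RUN0002", 10090, "WGS"),      # 10090 is the NCBI taxonomy ID for Mus musculus
        ("RUN0003", 9606, "WGS"),
    ],
)
source.commit()

human = build_species_subset(source, 9606)
human_runs = [
    row[0]
    for row in human.execute("SELECT run_accession FROM run_metadata ORDER BY run_accession")
]
print(human_runs)
```

With a file-based database, the same function would work by passing `sqlite3.connect("path/to/file.sqlite")` as the source, provided the table and column names match.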
Aug 31, 2024

DBCLS BioHackathon 2024 Report for Project: Human Glycome Atlas
As part of BioHackathon 2024, we report on our analysis of the tools this group reviewed for implementing a new knowledgebase called TOHSA for the Human Glycome Atlas (HGA) Project. In particular, we focus on our experience integrating the QLever framework, a promising Semantic Web tool for handling triple stores and SPARQL, with the aim of creating a reliable and performant semantic knowledge base (infrastructure and portal) for TOHSA. QLever highlights the ongoing relevance and potential of these technologies to deliver scalable and reliable solutions. It was encouraging to see that triple store and SPARQL implementations continue to be developed for the community, and that increasingly useful, performant, scalable, and reliable open-source software is becoming available. We also carried out a general review and comparison of Semantic Web frameworks relevant to our use case. less than 1 minute read
