Meetings

Recent preprints

Nov 17, 2022

Enhancement and Reusage of Biomedical Knowledge Graph Subsets
Knowledge Graphs (KGs) such as Wikidata act as a hub of information from multiple domains and disciplines, and is crowdsourced by multiple stakeholders. The vast amount of available information makes it difficult for researchers to manage the entire KG, which is also continually being edited. It is necessary to develop tools that extract subsets for domains of interest. These subsets will help researchers to reduce costs and time, making data of interest more accessible. In the last two BioHackathons (BH20, BH21), we have created prototypes to extract subsets easily applicable to Wikidata, as well as to define a map of the different approaches used to tackle this problem. Building on those outcomes, we aim to enhance subsetting in both definitions using Entity schemas based on Shape Expressions (ShEx) and extraction algorithms, with a special focus on the biomedical domain. Our first aim is to develop complex subsetting patterns based on qualifiers and references for enhancing credibility of datasets. Our second aim is to establish a faster subsetting extraction platform applying new algorithms based on Apache Spark and new tools like a document-oriented DBMS platform. 1 minute read
Nov 12, 2022

CWLD: Mapping colloquial wet lab language to ontologies
The use of ontology terms can make data more FAIR and tractable by machines. However, the highly formalised terminology used by these ontology terms does not always match the colloquial language used by practitioners. This disparity can (a) make it difficult for practitioners to understand the language used by knowledge stored in ontologies; and (b) make it difficult to machine-interpret information written by practitioners to map it to ontologies. This problem is particularly relevant in the ELIXIR Microbial Biotechnology (MB) community, as although the domain has adopted ontologies and data standards such as SO, SBO, GO, and SBOL for data representation, the tools developed often use ontology terms directly rather than the language used in the wet lab (i.e. by the people using the tools.) At the BioHackathon 2022 in Paris, France, we initiated an effort to address this problem by (a) mining the internet for colloquial language used by biologists; (b) constructing a dictionary (CWLD: colloquial wet lab dictionary) of this language and its mappings to ontology terms; and (c) constructing a table of the occurrences of different terminology used in MB tools and resources. While initially developed to serve the MB community, we hope that the dictionary will serve as a helpful resource for anyone hoping to map from colloquial wet lab language to ontology terms for e.g. text mining applications. 1 minute read
Sep 27, 2022

GEM: Genome Editing Meta-database
Genome editing is a widely used tool to create precise changes in a genome. However, no specialized database for genome editing is available. Therefore, we have been developing genome editing meta-database (GEM) which aims to collect the exhaustive dataset of metadata related to genome editing. Currently, GEM consists primarily of a subset of genome editing- related metadata from PubMed articles. Metadata is extracted from research articles that have the contents with experiments using either of 7 types of genome editing tools: CRISPR-Cas9, Transcription activator-like effector nuclease (TALEN), Zinc finger nuclease (ZFN), CRISPR- Cas12, CRISPR-Cas3, Base editor, and Prime editor. Those tools are often used for knock-out or knock-in of genes to elucidate the biological functions of them. In domestic version of BioHackathon in 2022 (BH22.9), we have discussed the datasets and the usage of GEM, and also updated the scripts for GEM in github based on the discussion. less than 1 minute read
May 25, 2022

NBS Hack Week: Pilot study on ARRIVE guidelines E10 compliance
The ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines are reportingguidelines designed to improve transparency and facilitate critical assessment ofexperiments involving animal research. Although it represents essential reportinginformation for animal experiments, compliance with ARRIVE items is not commonlydemanded by journals and is often lacking in animal studies. In this small pilot project, weevaluated compliance with ARRIVE 2.0 essential 10 items in 64 papers from 2018 and 2020,either citing or not the ARRIVE manuscripts. Papers that cited the ARRIVE guidelines hadslightly higher reporting scores, but we did not detect an effect of the time period nor aninteraction effect between ARRIVE versions 1.0 and 2.0. This work was conducted during theNo-Budget Science Hack Week 2021 event, an extended hackathon to discuss and developprojects in metascience. In future work, this pilot can be expanded to better estimate theeffects of the ARRIVE guidelines on reporting practices. less than 1 minute read
May 16, 2022

An iNaturalist-Pl@ntNet-workflow to identify plant-pollinator interactions – a case study of Isodontia mexicana
Not only the primary observations of species occurrences are used for research, but also the additional, mostly unintentionally documented information in citizen science observations is beneficial for research, the so-called secondary data. In this study, we investigated the plant-pollinator interactions of the Mexican grass-carrying wasp Isodontia mexicana using a data exchange workflow from two globally operating citizen science platforms, iNaturalist and Pl@ntNet. Images from iNaturalist observations of the target species were used to query the Pl@ntNet application to identify possible plant species present in the pictures. At the same time, botanists manually identified the plants at family, genus and species level from the images. The goals were to calibrate Pl@ntNet’s accuracy in relation to this workflow, to update the list of plant species that I. mexicana visits, and to investigate colour preferences and other interactions of the wasp recorded by citizen scientists. Although the list of plant species visited could be confirmed and expanded, identifying plants from images that predominantly show an insect proved similarly difficult for both experts and the Pl@ntNet app. The presented approach can nevertheless help to get an overview or first insight into species interactions and generate more specific research questions. 1 minute read
Mar 25, 2022

Bioschemas data harvesting project report
The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). The harvested data will be made available for others and loaded into a triplestore to allow for further exploration. less than 1 minute read
Dec 16, 2021

DS Wizard Meets DAISY: A Romance Solving Data Protection Requirements in Data Management Planning
This report summarises our activities and achievements in integrating the Data Stewardship Wizard (DSW) and Data Information System (DAISY) tools during the ELIXIR BioHackathon Europe 2021. As a data information system for GDPR compliance, DAISY is focused on a single goal – gathering all information required for GDPR accountability of biomedical research projects. On the other hand, DSW is very flexible and can be used beyond data management planning. We worked on the integration between both tools on two fronts. Firstly, we created a new Knowledge Model in DSW together with a document output template to be able to generate a data protection impact assessment (DPIA). Secondly, we introduced a new integration type between projects in DSW and DAISY that allows the querying of DAISY data upon document generation in DSW. Both of these independent activities brought successful results that were polished and published after the actual BioHackathon. Finally, we provide the related materials as an on-demand training course in the ELIXIR eLearning Platform. less than 1 minute read

Meetings

Recent preprints

Enhancement and Reusage of Biomedical Knowledge Graph Subsets

CWLD: Mapping colloquial wet lab language to ontologies

GEM: Genome Editing Meta-database

NBS Hack Week: Pilot study on ARRIVE guidelines E10 compliance

An iNaturalist-Pl@ntNet-workflow to identify plant-pollinator interactions – a case study of Isodontia mexicana

Bioschemas data harvesting project report

DS Wizard Meets DAISY: A Romance Solving Data Protection Requirements in Data Management Planning