Meetings

Recent preprints

Nov 11, 2023

Rendering co-author graphs using linked-open-data from Wikidata
Wikidata is the linked-open-data graph of the Wikimedia Foundation and a sibling of its best-known project, Wikipedia (Vrandečić, 2012). What Wikipedia is to text, Wikidata is to data. As in Wikipedia, linked data can be added by anyone, for everyone. This makes Wikidata a very rich source of data. A substantial part of the data on Wikidata concerns scientific publications and their authors (Taraborelli et al., 2016). Scholia is a tool that uses this data to create profile pages for authors and publications (Nielsen et al., 2017). This report describes a workflow to create co-author graphs using the data from Scholia.
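The abstract does not spell out the workflow. As a minimal sketch, assuming (work, author) pairs are fetched from the Wikidata Query Service through the "author" property (P50), a co-author graph can be built by counting the works each pair of authors shares and writing the result in Graphviz DOT format. The function names, the placeholder QID, and the choice of DOT are illustrative assumptions, not Scholia's implementation.

```python
from collections import Counter
from itertools import combinations


def build_query(author_qid: str) -> str:
    """SPARQL for all works of one author (wdt:P50 = "author") and all their authors."""
    # The wd: and wdt: prefixes are predefined on the Wikidata Query Service.
    return f"""SELECT ?work ?author WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  ?work wdt:P50 ?author .
}}"""


def coauthor_edges(rows):
    """Map each sorted (author, author) pair to the number of works they share."""
    authors_by_work = {}
    for work, author in rows:
        authors_by_work.setdefault(work, set()).add(author)
    edges = Counter()
    for authors in authors_by_work.values():
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return edges


def to_dot(edges) -> str:
    """Render the weighted edges as an undirected Graphviz graph."""
    lines = ["graph coauthors {"]
    for (a, b), n in sorted(edges.items()):
        # Thicker lines for pairs with more shared works.
        lines.append(f'  "{a}" -- "{b}" [penwidth={n}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical query results: one (work, author) pair per row.
rows = [("W1", "Ann"), ("W1", "Bo"), ("W1", "Cy"), ("W2", "Ann"), ("W2", "Bo")]
print(to_dot(coauthor_edges(rows)))
```

Saved to a file, the DOT text could then be laid out with a Graphviz tool such as `neato -Tsvg coauthors.dot -o coauthors.svg`; any other graph library would work equally well.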
Nov 3, 2023

Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization
The landscape of genomic wastewater surveillance in the context of infectious disease monitoring is rapidly evolving, and this came into sharp focus during the COVID-19 pandemic. Here we highlight the significance of wastewater surveillance as a passive monitoring system complementary to clinical genomic surveillance activities. Emphasizing the need for coordination, standardization, and the development of a unified catalog of software tools and services, we aim to streamline the implementation of end-to-end genomic wastewater surveillance pipelines. Key considerations such as defining variants, understanding antimicrobial resistance, and assessing viral fitness within the framework of wastewater surveillance are explored, linking to examples of respective tools and existing pipelines. The challenges of wastewater data analysis, the need for specialized tools and bioinformatics workflows, and the significance of integrated pipelines are also discussed in detail. The article presents case studies, including the V-pipe integrated bioinformatics workflow and the integration of tools into the Galaxy platform, underscoring their role in enhancing data analysis efficiency and standardization within the field. Overall, the review highlights the critical importance of continued research efforts to advance understanding and implementation of bioinformatic approaches in wastewater surveillance for the effective monitoring and management of infectious diseases.
Oct 26, 2023

Efforts to analyze pathways in non-model organisms
In addition to the functional annotation of genes, annotating genes to pathways is important in current molecular biology. However, annotating genes to the nodes of a pathway first requires pathway diagrams. It is therefore important to draw pathway diagrams in which genes and metabolites are assigned to nodes. Existing metabolic pathway databases focus on generic pathways, whereas secondary metabolism is of particular interest in organisms that produce useful substances. Moreover, these databases do not accept third-party annotations. A practical system for pathway analysis is therefore needed. Following on from the previous BioHackathon (BH23), we first discussed how to create a database of pathway information for non-model species at BH23.9, a domestic version of the BioHackathon held in Shirahama, Wakayama, Japan (25-29 September 2023). We then gave a tutorial on drawing pathway diagrams with PathVisio, free, open-source software for drawing, editing, and analyzing biological pathways. Finally, we worked toward a system, called txt2gpml, that converts text data into Graphical Pathway Markup Language (GPML). txt2gpml will drastically reduce the time and effort required to create pathway diagrams. Through stimulating discussions at BH23 and BH23.9, we were able to clarify the current issues in pathway analysis for non-model organisms.
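The input format of txt2gpml is not described here. As a minimal sketch of the general idea only, assuming a hypothetical tab-separated "source, target" edge list, the function below emits a GPML-like document with one DataNode per distinct label and one Interaction (with an arrow) per edge. It targets the GPML 2013a namespace as an assumption; the layout, empty Xref identifiers, and function names are illustrative, and a file loaded into PathVisio may need further elements and attributes.

```python
import xml.etree.ElementTree as ET

# GPML 2013a namespace (assumed target version), serialized as the default namespace.
GPML_NS = "http://pathvisio.org/GPML/2013a"
ET.register_namespace("", GPML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the GPML namespace."""
    return f"{{{GPML_NS}}}{tag}"


def edges_to_gpml(tsv_text: str, name: str = "Sketch", node_type: str = "GeneProduct") -> str:
    """Convert hypothetical 'source<TAB>target' lines into a minimal GPML-like pathway."""
    edges = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, target = (field.strip() for field in line.split("\t"))
        edges.append((source, target))

    # One node per distinct label, laid out left to right in order of appearance.
    positions = {}
    for label in (lbl for edge in edges for lbl in edge):
        if label not in positions:
            positions[label] = (100 + 150 * len(positions), 100)

    pathway = ET.Element(q("Pathway"), Name=name)
    ids = {}
    for i, (label, (x, y)) in enumerate(positions.items()):
        ids[label] = f"n{i}"
        node = ET.SubElement(pathway, q("DataNode"), TextLabel=label, GraphId=ids[label], Type=node_type)
        ET.SubElement(node, q("Graphics"), CenterX=str(x), CenterY=str(y), Width="90", Height="25")
        ET.SubElement(node, q("Xref"), Database="", ID="")  # database identifiers left empty

    # Interactions come after all DataNodes, each drawn from source to target.
    for j, (source, target) in enumerate(edges):
        interaction = ET.SubElement(pathway, q("Interaction"), GraphId=f"e{j}")
        graphics = ET.SubElement(interaction, q("Graphics"))
        (sx, sy), (tx, ty) = positions[source], positions[target]
        ET.SubElement(graphics, q("Point"), X=str(sx), Y=str(sy), GraphRef=ids[source])
        ET.SubElement(graphics, q("Point"), X=str(tx), Y=str(ty), GraphRef=ids[target], ArrowHead="Arrow")

    return ET.tostring(pathway, encoding="unicode")


print(edges_to_gpml("G6P\tF6P\nF6P\tF16BP", node_type="Metabolite"))
```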
Sep 14, 2023

BioHackJP 2023 Report R3: Plant data integration for findability across multiple databases
Plant research generates vast amounts of heterogeneous data, available in dispersed repositories. Accessing, integrating, and analyzing these datasets is therefore challenging because of their low findability and the variability of their formats and standards. Several solutions, including data standards (MIAPPE, BrAPI) and portals (FAIDARE), are recommended by the ELIXIR plant community through the RDM Kit plant pages. The BioHackathon Japan 2023 was an ideal event to promote these solutions among Japanese researchers and bioinformaticians, in order to increase the visibility of Japanese databases in the plant research data discovery portal FAIDARE and to explore the use of the Breeding API for knowledge graphs.
Jul 30, 2023

BioHackEU22 Report: Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates
This report describes the integration of RO-Crates into Data Stewardship Wizard and Galaxy during the BioHackathon Europe 2022, aiming to improve data management and sharing in scientific research. By using RO-Crates, researchers can easily create machine-readable metadata for their datasets, supporting long-term discoverability, accessibility, and reusability. The integration of RO-Crates into these platforms enhances collaboration between researchers and institutions, facilitating data sharing and reuse across projects and domains. Future efforts may focus on improving RO-Crate’s interoperability with other standards and platforms, and on promoting wider adoption through outreach and education initiatives, to meet the evolving needs of researchers and institutions in data stewardship.
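To make "machine-readable metadata" concrete, the sketch below builds a minimal RO-Crate 1.1 metadata document by hand: a metadata descriptor, a root dataset, and one entity per file. The function name and example values are hypothetical; this is not how Galaxy or Data Stewardship Wizard generate their crates, and real crates usually add further entities such as authors, workflows, and provenance.

```python
import json


def make_ro_crate(name, description, date_published, license_url, files):
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict."""
    # Descriptor entity: identifies this JSON-LD file and the crate it describes.
    metadata_descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root data entity: the dataset as a whole, listing its parts.
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [metadata_descriptor, root_dataset, *file_entities],
    }


# Hypothetical dataset; the JSON would be saved as ro-crate-metadata.json
# in the dataset's root directory.
crate = make_ro_crate(
    name="Example variant calls",
    description="Hypothetical outputs of an analysis workflow.",
    date_published="2023-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["variants.vcf", "report.html"],
)
print(json.dumps(crate, indent=2))
```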
Jul 22, 2023

Infrastructure for synthetic health data
Machine learning (ML) methods are becoming ever more prevalent across all domains of the life sciences. However, a key component of effective ML is the availability of large datasets that are diverse and representative. In the context of health systems, with significant heterogeneity of clinical phenotypes and diversity of healthcare systems, there is a need to develop and refine unbiased and fair ML models. Synthetic data are increasingly being used to protect patients’ right to privacy and to overcome the paucity of annotated open-access medical data. Here, we present our proof of concept for the generation of synthetic health data and our proposed FAIR implementation of the generated synthetic datasets. The work was developed during and after the week-long BioHackathon Europe by a total of 20 participants (10 of them new to the project) from different countries (NL, ES, LU, UK, GR, FL, DE, ...).
Jul 18, 2023

Redesign of the validation framework in LinkML
LinkML is a data modeling language that can be used to describe the structure and semantics of data from a specific domain. As with any modeling language, however, tools are needed to validate data against the models. LinkML provides a set of validation tools, but there is a growing need to adapt them for a broader audience. This report describes the effort to redesign the validation framework in LinkML to support a wider range of validation scenarios and use cases.
