Meetings

Recent preprints

  • Rendering co-author graphs using linked-open-data from Wikidata

    Wikidata is the linked-open-data graph of the Wikimedia Foundation, whose best-known project is Wikipedia (Vrandečić, 2012). What Wikipedia is to text, Wikidata is to data. As in Wikipedia, linked data can be added for everyone, by everyone, which makes Wikidata a very rich source of data. A substantial part of the data on Wikidata is about scientific publications and their authors (Taraborelli et al., 2016). Scholia is a tool that uses this data to create profile pages for authors and publications (Nielsen et al., 2017). This report describes a workflow to create co-author graphs using the data from Scholia.
  • Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization

    The landscape of genomic wastewater surveillance in the context of infectious disease monitoring is rapidly evolving, and this came into sharp focus during the COVID-19 pandemic. Here we highlight the significance of wastewater surveillance as a passive monitoring system complementary to clinical genomic surveillance activities. Emphasizing the need for coordination, standardization, and the development of a unified catalog of software tools and services, we aim to streamline the implementation of end-to-end genomic wastewater surveillance pipelines. Key considerations such as defining variants, understanding antimicrobial resistance, and assessing viral fitness within the framework of wastewater surveillance are explored, linking to examples of respective tools and existing pipelines. The challenges of wastewater data analysis, the need for specialized tools and bioinformatics workflows, and the significance of integrated pipelines are also discussed in detail. The article presents case studies, including the V-pipe integrated bioinformatics workflow and the integration of tools into the Galaxy platform, underscoring their role in enhancing data analysis efficiency and standardization within the field. Overall, the review highlights the critical importance of continued research efforts to advance understanding and implementation of bioinformatic approaches in wastewater surveillance for the effective monitoring and management of infectious diseases.
  • Efforts to analyze pathways in non-model organisms

    In addition to functional annotation of genes, annotating genes to pathways is important in current molecular biology. However, annotating genes to pathway nodes requires pathway diagrams, so it is important to draw pathway diagrams with genes and metabolites assigned. Existing metabolic pathway databases focus on generic pathways, whereas secondary metabolism is what matters in organisms that produce useful substances. Moreover, these databases cannot accept third-party annotation of their data. A practical system for pathway analysis is therefore needed. Following on from the previous BioHackathon (BH23), we first discussed how to create a database of pathway information for non-model species at a domestic version of the BioHackathon called BH23.9, held in Shirahama, Wakayama, Japan (25-29 September 2023). We then gave a tutorial on how to draw a pathway diagram using PathVisio, a free open-source tool for drawing, editing, and analyzing biological pathways. Finally, we tried to establish a system, called txt2gpml, for converting text data to Graphical Pathway Markup Language (GPML). txt2gpml will drastically reduce the time and effort required to create pathway diagrams. Through stimulating discussions at BH23 and BH23.9, we were able to clarify the current issues in pathway analysis for non-model organisms.
  • BioHackJP 2023 Report R3: Plant data integration for findability across multiple databases

    Plant research generates vast amounts of heterogeneous data held in dispersed repositories. Accessing, integrating, and analyzing these datasets is therefore a challenge, caused by their low findability and by variability in formats and standards. Several solutions, including data standards (MIAPPE, BrAPI) and portals (FAIDARE), are recommended by the ELIXIR plant community through the RDM Kit plant pages. The BioHackathon Japan 2023 was an ideal event to promote those solutions to Japanese researchers and bioinformaticians, in order to increase the visibility of Japanese databases in the plant research data discovery portal FAIDARE and to explore the use of the Breeding API for knowledge graphs.
  • BioHackEU22 Report: Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates

    This report describes the integration of RO-Crates into Data Stewardship Wizard and Galaxy during the BioHackathon Europe 2022, aiming to improve data management and sharing in scientific research. By utilizing RO-Crates, researchers can easily create machine-readable metadata for their datasets, ensuring long-term discoverability, accessibility, and reusability. The seamless integration of RO-Crates in these platforms enhances collaboration between researchers and institutions, facilitating data sharing and reuse across projects and domains. Future efforts may focus on enhancing RO-Crate’s interoperability with other standards and platforms, as well as promoting wider adoption through outreach and education initiatives to meet the evolving needs of researchers and institutions in data stewardship.
  • Infrastructure for synthetic health data

    Machine learning (ML) methods are becoming ever more prevalent across all domains of life sciences. However, a key component of effective ML is the availability of large datasets that are diverse and representative. In the context of health systems, with significant heterogeneity of clinical phenotypes and diversity of healthcare systems, there exists a necessity to develop and refine unbiased and fair ML models. Synthetic data are increasingly being used to protect the patient’s right to privacy and overcome the paucity of annotated open-access medical data. Here, we present our proof of concept for the generation of synthetic health data and our proposed FAIR implementation of the generated synthetic datasets. The work was developed during and after the one-week-long BioHackathon Europe by a total of 20 participants (10 new to the project) from different countries (NL, ES, LU, UK, GR, FL, DE, ...).
  • Redesign of the validation framework in LinkML

    LinkML is a data modeling language that can be used to describe the structure and semantics of data from a specific domain. As with any modeling language, there is a need for tools that support validation of data. LinkML provides a set of validation tools, but there is a growing need to adapt them for a broader audience. This report describes the redesign of the validation framework in LinkML to better support a wider range of validation scenarios and use cases.