<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://index.biohackrxiv.org//feed/by_tag/BH22EU.xml" rel="self" type="application/atom+xml" /><link href="https://index.biohackrxiv.org//" rel="alternate" type="text/html" /><updated>2026-06-14T20:19:19+00:00</updated><id>https://index.biohackrxiv.org//feed/by_tag/BH22EU.xml</id><title type="html">BioHackrXiv Preprints</title><subtitle>Preprints for BioHackathons</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization</title><link href="https://index.biohackrxiv.org//2023/11/03/rtgk9.html" rel="alternate" type="text/html" title="Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization" /><published>2023-11-03T00:00:00+00:00</published><updated>2023-11-03T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/11/03/rtgk9</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/11/03/rtgk9.html"><![CDATA[<p>The landscape of genomic wastewater surveillance in the context of infectious disease monitoring is rapidly evolving, and this came into sharp focus during the COVID-19 pandemic. Here we highlight the significance of wastewater surveillance as a passive monitoring system complementary to clinical genomic surveillance activities. 
Emphasizing the need for coordination, standardization, and the development of a unified catalog of software tools and services, we aim to streamline the implementation of end-to-end genomic wastewater surveillance pipelines. Key considerations such as defining variants, understanding antimicrobial resistance, and assessing viral fitness within the framework of wastewater surveillance are explored, with links to examples of relevant tools and existing pipelines. The challenges of wastewater data analysis, the need for specialized tools and bioinformatics workflows, and the significance of integrated pipelines are also discussed in detail. The article presents case studies, including the V-pipe integrated bioinformatics workflow and the integration of tools into the Galaxy platform, underscoring their role in enhancing data analysis efficiency and standardization within the field. Overall, the review highlights the critical importance of continued research to advance the understanding and implementation of bioinformatic approaches in wastewater surveillance for the effective monitoring and management of infectious diseases.</p>]]></content><author><name>Fotis E. Psomopoulos</name></author><category term="BH22EU" /><summary type="html"><![CDATA[The landscape of genomic wastewater surveillance in the context of infectious disease monitoring is rapidly evolving, and this came into sharp focus during the COVID-19 pandemic. Here we highlight the significance of wastewater surveillance as a passive monitoring system complementary to clinical genomic surveillance activities. 
Emphasizing the need for coordination, standardization, and the development of a unified catalog of software tools and services, we aim to streamline the implementation of end-to-end genomic wastewater surveillance pipelines. Key considerations such as defining variants, understanding antimicrobial resistance, and assessing viral fitness within the framework of wastewater surveillance are explored, with links to examples of relevant tools and existing pipelines. The challenges of wastewater data analysis, the need for specialized tools and bioinformatics workflows, and the significance of integrated pipelines are also discussed in detail. The article presents case studies, including the V-pipe integrated bioinformatics workflow and the integration of tools into the Galaxy platform, underscoring their role in enhancing data analysis efficiency and standardization within the field. Overall, the review highlights the critical importance of continued research to advance the understanding and implementation of bioinformatic approaches in wastewater surveillance for the effective monitoring and management of infectious diseases.]]></summary></entry><entry><title type="html">BioHackEU22 Report: Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates</title><link href="https://index.biohackrxiv.org//2023/07/30/24jst.html" rel="alternate" type="text/html" title="BioHackEU22 Report: Enhancing Research Data Management in Galaxy and Data Stewardship Wizard by utilising RO-Crates" /><published>2023-07-30T00:00:00+00:00</published><updated>2023-07-30T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/30/24jst</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/30/24jst.html"><![CDATA[<p>This report describes the integration of RO-Crates into the Data Stewardship Wizard and Galaxy during BioHackathon Europe 2022, with the aim of improving data management and sharing in scientific research. 
By utilizing RO-Crates, researchers can easily create machine-readable metadata for their datasets, ensuring long-term discoverability, accessibility, and reusability. The seamless integration of RO-Crates in these platforms enhances collaboration between researchers and institutions, facilitating data sharing and reuse across projects and domains. Future efforts may focus on enhancing RO-Crate’s interoperability with other standards and platforms, as well as promoting wider adoption through outreach and education initiatives to meet the evolving needs of researchers and institutions in data stewardship.</p>]]></content><author><name>Ignacio Eguinoa</name></author><category term="BH22EU" /><summary type="html"><![CDATA[This report describes the integration of RO-Crates into the Data Stewardship Wizard and Galaxy during BioHackathon Europe 2022, with the aim of improving data management and sharing in scientific research. By utilizing RO-Crates, researchers can easily create machine-readable metadata for their datasets, ensuring long-term discoverability, accessibility, and reusability. The seamless integration of RO-Crates in these platforms enhances collaboration between researchers and institutions, facilitating data sharing and reuse across projects and domains. 
Future efforts may focus on enhancing RO-Crate’s interoperability with other standards and platforms, as well as promoting wider adoption through outreach and education initiatives to meet the evolving needs of researchers and institutions in data stewardship.]]></summary></entry><entry><title type="html">Infrastructure for synthetic health data</title><link href="https://index.biohackrxiv.org//2023/07/22/q4zgx.html" rel="alternate" type="text/html" title="Infrastructure for synthetic health data" /><published>2023-07-22T00:00:00+00:00</published><updated>2023-07-22T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/07/22/q4zgx</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/07/22/q4zgx.html"><![CDATA[<p>Machine learning (ML) methods are becoming ever more prevalent across all domains of the life sciences. However, a key component of effective ML is the availability of large datasets that are diverse and representative. In healthcare, given the significant heterogeneity of clinical phenotypes and the diversity of healthcare systems, there is a need to develop and refine unbiased and fair ML models. Synthetic data are increasingly being used to protect patients’ right to privacy and to overcome the paucity of annotated open-access medical data. Here, we present our proof of concept for the generation of synthetic health data and our proposed FAIR implementation of the generated synthetic datasets. The work was carried out during and after the week-long BioHackathon Europe by a total of 20 participants (10 new to the project) from different countries (NL, ES, LU, UK, GR, FL, DE, . . . ).</p>]]></content><author><name>Núria Queralt-Rosinach</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Machine learning (ML) methods are becoming ever more prevalent across all domains of the life sciences. However, a key component of effective ML is the availability of large datasets that are diverse and representative. 
In healthcare, given the significant heterogeneity of clinical phenotypes and the diversity of healthcare systems, there is a need to develop and refine unbiased and fair ML models. Synthetic data are increasingly being used to protect patients’ right to privacy and to overcome the paucity of annotated open-access medical data. Here, we present our proof of concept for the generation of synthetic health data and our proposed FAIR implementation of the generated synthetic datasets. The work was carried out during and after the week-long BioHackathon Europe by a total of 20 participants (10 new to the project) from different countries (NL, ES, LU, UK, GR, FL, DE, . . . ).]]></summary></entry><entry><title type="html">BioHackEU22 Report for Project 31: The What &amp;amp; How in data management: Improving connectivity between RDMkit and FAIR Cookbook</title><link href="https://index.biohackrxiv.org//2023/06/15/emc2f.html" rel="alternate" type="text/html" title="BioHackEU22 Report for Project 31: The What &amp; How in data management: Improving connectivity between RDMkit and FAIR Cookbook" /><published>2023-06-15T00:00:00+00:00</published><updated>2023-06-15T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/06/15/emc2f</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/06/15/emc2f.html"><![CDATA[<p>This report describes the work completed during the ELIXIR BioHackathon 2022 for project 31: The What &amp; How in data management: Improving connectivity between RDMkit and FAIR Cookbook. 
The project covered three topics: the technical connectivity between the two primary resources, an editorial alignment and gap analysis of their content, and the creation of user journeys incorporating the wider ELIXIR Research Data Management (RDM) ecosystem.</p>]]></content><author><name>Danielle Welter</name></author><category term="BH22EU" /><summary type="html"><![CDATA[This report describes the work completed during the ELIXIR BioHackathon 2022 for project 31: The What &amp; How in data management: Improving connectivity between RDMkit and FAIR Cookbook. The project covered three topics: the technical connectivity between the two primary resources, an editorial alignment and gap analysis of their content, and the creation of user journeys incorporating the wider ELIXIR Research Data Management (RDM) ecosystem.]]></summary></entry><entry><title type="html">BioHackathon Europe 2022 Paper for Project 3: Bioinforming</title><link href="https://index.biohackrxiv.org//2023/06/15/p8n2t.html" rel="alternate" type="text/html" title="BioHackathon Europe 2022 Paper for Project 3: Bioinforming" /><published>2023-06-15T00:00:00+00:00</published><updated>2023-06-15T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/06/15/p8n2t</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/06/15/p8n2t.html"><![CDATA[<p>Short courses are an optimal format for informing and engaging young students in novel biology-related fields. Training schools, e.g. those lasting five days,
can provide enough content to introduce students to an extensive overview of bioinformatics and scientific career opportunities. In this work, we define a
five-day training school format tailored to three target groups of young students: high school students, undergraduate students in biology-related fields
and undergraduate students in computational fields. We structure the content and sessions around learning areas consisting of learning topics, detailing
the dependencies between them. For each learning topic, we define learning outcomes and learning activities. Moreover, we conceptualize a teaching platform
to manage FAIRyfied (Findable, Accessible, Interoperable, Reusable) training materials that anyone will be able to use to design a new training school in
bioinformatics.</p>]]></content><author><name>Marco Anteghini</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Short courses are an optimal format for informing and engaging young students in novel biology-related fields. Training schools, e.g. those lasting five days, can provide enough content to introduce students to an extensive overview of bioinformatics and scientific career opportunities. In this work, we define a five-day training school format tailored to three target groups of young students: high school students, undergraduate students in biology-related fields and undergraduate students in computational fields. We structure the content and sessions around learning areas consisting of learning topics, detailing the dependencies between them. For each learning topic, we define learning outcomes and learning activities. Moreover, we conceptualize a teaching platform to manage FAIRyfied (Findable, Accessible, Interoperable, Reusable) training materials that anyone will be able to use to design a new training school in bioinformatics.]]></summary></entry><entry><title type="html">Onboarding suite for Federated EGA nodes</title><link href="https://index.biohackrxiv.org//2023/04/06/dsz3y.html" rel="alternate" type="text/html" title="Onboarding suite for Federated EGA nodes" /><published>2023-04-06T00:00:00+00:00</published><updated>2023-04-06T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/04/06/dsz3y</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/04/06/dsz3y.html"><![CDATA[<p>The European Genome-phenome Archive (EGA) (Freeberg et al., 2022), also known as Central EGA (cEGA), is a service for the permanent archiving and sharing of personally identifiable genetic and phenotypic data resulting from biomedical research projects. The Federated EGA (EGA Consortium, n.d.), consisting of the Central and Federated EGA nodes, will be a distributed network of repositories for sharing human -omics data and phenotypes. 
Each node of the federation is responsible for its own infrastructure and its connection to the Central EGA. Currently, adopting and deploying a new federated node is challenging due to the complexity of the project and the diversity of technological solutions used to ensure the secure archiving of the data and the transfer of information between the nodes. The goal of this project was to develop an onboarding suite consisting of simple scripts, supplemented by documentation, that would help newcomers to the EGA federation understand the main concepts in depth, while enabling them to get involved in the development of the technology as quickly as possible. At the same time, we aimed to identify existing technologies and standards across FEGA nodes that can serve as a reference for upcoming nodes.</p>]]></content><author><name>Stefan Negru</name></author><category term="BH22EU" /><summary type="html"><![CDATA[The European Genome-phenome Archive (EGA) (Freeberg et al., 2022), also known as Central EGA (cEGA), is a service for the permanent archiving and sharing of personally identifiable genetic and phenotypic data resulting from biomedical research projects. The Federated EGA (EGA Consortium, n.d.), consisting of the Central and Federated EGA nodes, will be a distributed network of repositories for sharing human -omics data and phenotypes. 
Each node of the federation is responsible for its own infrastructure and its connection to the Central EGA. Currently, adopting and deploying a new federated node is challenging due to the complexity of the project and the diversity of technological solutions used to ensure the secure archiving of the data and the transfer of information between the nodes. The goal of this project was to develop an onboarding suite consisting of simple scripts, supplemented by documentation, that would help newcomers to the EGA federation understand the main concepts in depth, while enabling them to get involved in the development of the technology as quickly as possible. At the same time, we aimed to identify existing technologies and standards across FEGA nodes that can serve as a reference for upcoming nodes.]]></summary></entry><entry><title type="html">Operator dashboard for controlling the NeIC Sensitive Data Archive</title><link href="https://index.biohackrxiv.org//2023/04/06/nstjb.html" rel="alternate" type="text/html" title="Operator dashboard for controlling the NeIC Sensitive Data Archive" /><published>2023-04-06T00:00:00+00:00</published><updated>2023-04-06T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/04/06/nstjb</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/04/06/nstjb.html"><![CDATA[<p>Human genome and phenome data are classified as special category data under the EU GDPR (Art. 9 GDPR). This requires special care to be taken when processing and reusing these data for research. To enable this in a compliant way, a federated approach was applied to the existing European Genome-phenome Archive (EGA, https://ega-archive.org/) (Freeberg et al., 2022), creating the Federated EGA (FEGA, https://ega-archive.github.io/FEGA-onboarding/#what-is-federated-ega) (EGA Consortium, n.d.) in 2022. 
The Nordic countries (Norway, Finland and Sweden), together with Spain and Germany, are the first federated partners. In the Nordics, we have collaborated on our own implementation of our federated EGA nodes. We have done this under the umbrella of the Nordic e-Infrastructure Collaboration (NeIC, https://neic.no/) (NeIC, n.d.), where we have had three projects over the last 7 years: Tryggve1 (NeIC, 2014-2017), Tryggve2 (NeIC, 2017-2020) and now Heilsa (NeIC, 2021-2024). As we in the Nordics now move into production, both system administrators and helpdesk staff need to be able to control and inspect the system. We need to answer operational questions and identify errors in order to better manage the services and infrastructure. To standardize this workflow and make the system easier to use, we decided to build a Minimum Viable Product (MVP) for such an “Operator Dashboard” during the ELIXIR BioHackathon 2022.</p>]]></content><author><name>Johan Viklund</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Human genome and phenome data are classified as special category data under the EU GDPR (Art. 9 GDPR). This requires special care to be taken when processing and reusing these data for research. To enable this in a compliant way, a federated approach was applied to the existing European Genome-phenome Archive (EGA, https://ega-archive.org/) (Freeberg et al., 2022), creating the Federated EGA (FEGA, https://ega-archive.github.io/FEGA-onboarding/#what-is-federated-ega) (EGA Consortium, n.d.) in 2022. The Nordic countries (Norway, Finland and Sweden), together with Spain and Germany, are the first federated partners. In the Nordics, we have collaborated on our own implementation of our federated EGA nodes. 
We have done this under the umbrella of the Nordic e-Infrastructure Collaboration (NeIC, https://neic.no/) (NeIC, n.d.), where we have had three projects over the last 7 years: Tryggve1 (NeIC, 2014-2017), Tryggve2 (NeIC, 2017-2020) and now Heilsa (NeIC, 2021-2024). As we in the Nordics now move into production, both system administrators and helpdesk staff need to be able to control and inspect the system. We need to answer operational questions and identify errors in order to better manage the services and infrastructure. To standardize this workflow and make the system easier to use, we decided to build a Minimum Viable Product (MVP) for such an “Operator Dashboard” during the ELIXIR BioHackathon 2022.]]></summary></entry><entry><title type="html">Enabling profile updates through the Data Discovery Engine (DDE)</title><link href="https://index.biohackrxiv.org//2023/04/04/3b9gp.html" rel="alternate" type="text/html" title="Enabling profile updates through the Data Discovery Engine (DDE)" /><published>2023-04-04T00:00:00+00:00</published><updated>2023-04-04T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/04/04/3b9gp</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/04/04/3b9gp.html"><![CDATA[<p>Bioschemas is a grassroots community effort to improve the FAIRness of resources in the life sciences by defining specific life science metadata schemas and exposing that metadata from resources that have adopted it. Now that some initial types have been adopted directly into schema.org, an improved mechanism is required to reignite community engagement and encourage profile development. The current process for creating or updating Bioschemas profiles and types is technical and convoluted, which creates accessibility issues that can hamper community participation. 
As adoption of Bioschemas grows and more of the life science community considers contributing specific types and profiles, a more accessible creation/modification process is necessary to avoid a loss in engagement. To address this issue, and to drive further Bioschemas adoption, the community has turned to the Data Discovery Engine (DDE) for profile and type development. DDE provides a schema registry and user-friendly tools for creating and editing schemas. The goal of this project is to update existing Bioschemas community profiles in a targeted and crowd-sourced manner, add new profiles as required, and ensure the documentation is fit for purpose to enable further Bioschemas contributions at scale.</p>]]></content><author><name>Ginger Tsueng</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Bioschemas is a grassroots community effort to improve the FAIRness of resources in the life sciences by defining specific life science metadata schemas and exposing that metadata from resources that have adopted it. Now that some initial types have been adopted directly into schema.org, an improved mechanism is required to reignite community engagement and encourage profile development. The current process for creating or updating Bioschemas profiles and types is technical and convoluted, which creates accessibility issues that can hamper community participation. As adoption of Bioschemas grows and more of the life science community considers contributing specific types and profiles, a more accessible creation/modification process is necessary to avoid a loss in engagement. To address this issue, and to drive further Bioschemas adoption, the community has turned to the Data Discovery Engine (DDE) for profile and type development. DDE provides a schema registry and user-friendly tools for creating and editing schemas. 
The goal of this project is to update existing Bioschemas community profiles in a targeted and crowd-sourced manner, add new profiles as required, and ensure the documentation is fit for purpose to enable further Bioschemas contributions at scale.]]></summary></entry><entry><title type="html">Streamlining data brokering from Research Data Management platforms to ELIXIR Repositories</title><link href="https://index.biohackrxiv.org//2023/02/23/mwk9f.html" rel="alternate" type="text/html" title="Streamlining data brokering from Research Data Management platforms to ELIXIR Repositories" /><published>2023-02-23T00:00:00+00:00</published><updated>2023-02-23T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/02/23/mwk9f</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/02/23/mwk9f.html"><![CDATA[<p>Mobilizing data from data producers to data deposition databases is an integral service that research data management (RDM) platforms could offer. However, brokering the heterogeneous mixture of scientific data requires systems that are compatible with the diverse (meta)data models of the different RDM platforms and with the diverse submission routes of different domain- and technique-specific repositories. Existing tools for brokering research (meta)data in the life sciences are often technique- or domain-specific and target only one deposition database at a time, which does not reflect the way scientific projects are often conducted. As a result, infrastructure providers or research laboratories have to invest resources in manual curation and mapping of (meta)data in order to help researchers deposit their outputs into specialized repositories. This BioHackathon 2022 project specifically focused on designing and implementing a prototype of a data brokering system from ISA-JSON to multiple ELIXIR Deposition Databases, starting with the European Nucleotide Archive (ENA). 
Specifically, we started from an ISA-JSON file exported from the DataHub, a metadata management platform (an instance of the FAIRDOM-SEEK software) that uses the well-established ISA (Investigation Study Assay) framework to describe multi-omics metadata and link to the location of data files. During this project, we performed a high-level mapping of the ISA-JSON schema to the ENA XML files necessary for metadata submission. We also described a flexible, sustainable and domain/technique-agnostic brokering strategy from ISA-JSON to multiple ELIXIR deposition databases and developed a prototype of an EBI multi-repository converter tool.</p>]]></content><author><name>Flora D&apos;Anna</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Mobilizing data from data producers to data deposition databases is an integral service that research data management (RDM) platforms could offer. However, brokering the heterogeneous mixture of scientific data requires systems that are compatible with the diverse (meta)data models of the different RDM platforms and with the diverse submission routes of different domain- and technique-specific repositories. Existing tools for brokering research (meta)data in the life sciences are often technique- or domain-specific and target only one deposition database at a time, which does not reflect the way scientific projects are often conducted. As a result, infrastructure providers or research laboratories have to invest resources in manual curation and mapping of (meta)data in order to help researchers deposit their outputs into specialized repositories. This BioHackathon 2022 project specifically focused on designing and implementing a prototype of a data brokering system from ISA-JSON to multiple ELIXIR Deposition Databases, starting with the European Nucleotide Archive (ENA). 
Specifically, we started from an ISA-JSON file exported from the DataHub, a metadata management platform (an instance of the FAIRDOM-SEEK software) that uses the well-established ISA (Investigation Study Assay) framework to describe multi-omics metadata and link to the location of data files. During this project, we performed a high-level mapping of the ISA-JSON schema to the ENA XML files necessary for metadata submission. We also described a flexible, sustainable and domain/technique-agnostic brokering strategy from ISA-JSON to multiple ELIXIR deposition databases and developed a prototype of an EBI multi-repository converter tool.]]></summary></entry><entry><title type="html">An evaluation of EDAM coverage in the Tools Ecosystem and prototype integration of Galaxy and WorkflowHub systems</title><link href="https://index.biohackrxiv.org//2023/02/16/79kje.html" rel="alternate" type="text/html" title="An evaluation of EDAM coverage in the Tools Ecosystem and prototype integration of Galaxy and WorkflowHub systems" /><published>2023-02-16T00:00:00+00:00</published><updated>2023-02-16T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/02/16/79kje</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/02/16/79kje.html"><![CDATA[<p>Here we report the results of a project started at the BioHackathon Europe 2022. Its goals were to cross-compare and analyze the metadata centralized in the Tools Ecosystem and linked to the EDAM ontology, as well as to explore methods for connecting tools used in registered Galaxy workflows (i.e. WorkflowHub entries) to the annotations available in bio.tools.</p>]]></content><author><name>Lucie Lamothe</name></author><category term="BH22EU" /><summary type="html"><![CDATA[Here we report the results of a project started at the BioHackathon Europe 2022. 
Its goals were to cross-compare and analyze the metadata centralized in the Tools Ecosystem and linked to the EDAM ontology, as well as to explore methods for connecting tools used in registered Galaxy workflows (i.e. WorkflowHub entries) to the annotations available in bio.tools.]]></summary></entry></feed>