<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://index.biohackrxiv.org//feed/by_tag/BH21EU.xml" rel="self" type="application/atom+xml" /><link href="https://index.biohackrxiv.org//" rel="alternate" type="text/html" /><updated>2026-04-10T13:10:20+00:00</updated><id>https://index.biohackrxiv.org//feed/by_tag/BH21EU.xml</id><title type="html">BioHackrXiv Preprints</title><subtitle>Preprints for BioHackathons</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Executing workflows in the cloud with WESkit</title><link href="https://index.biohackrxiv.org//2023/02/21/2z6nu.html" rel="alternate" type="text/html" title="Executing workflows in the cloud with WESkit" /><published>2023-02-21T00:00:00+00:00</published><updated>2023-02-21T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/02/21/2z6nu</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/02/21/2z6nu.html"><![CDATA[<p>With the exponential increase in genomic data, analyzing and processing large datasets has become a challenging task in healthcare. To address this issue, the Global Alliance for Genomics and Health (GA4GH) has proposed a set of community standards for enabling the adoption of FAIR principles for data, software, and infrastructure. These standards promote the concept of sending analysis and processing workflows to the data rather than transferring large datasets, thereby increasing efficiency and data security. In this paper, we present the outcomes of the ELIXIR Biohackathon 2021 project, where we worked on our software WESkit, which implements the GA4GH WES standard for running Snakemake and Nextflow workflows. 
During the hackathon, we implemented basic GA4GH TRS support, deployed a cloud platform, and added S3 support for downloading result files.</p>]]></content><author><name>Philip Reiner Kensche</name></author><category term="BH21EU" /><summary type="html"><![CDATA[With the exponential increase in genomic data, analyzing and processing large datasets has become a challenging task in healthcare. To address this issue, the Global Alliance for Genomics and Health (GA4GH) has proposed a set of community standards for enabling the adoption of FAIR principles for data, software, and infrastructure. These standards promote the concept of sending analysis and processing workflows to the data rather than transferring large datasets, thereby increasing efficiency and data security. In this paper, we present the outcomes of the ELIXIR Biohackathon 2021 project, where we worked on our software WESkit, which implements the GA4GH WES standard for running Snakemake and Nextflow workflows. During the hackathon, we implemented basic GA4GH TRS support, deployed a cloud platform, and added S3 support for downloading result files.]]></summary></entry><entry><title type="html">CiTO support for BioHackrXiv</title><link href="https://index.biohackrxiv.org//2023/02/03/6rjvc.html" rel="alternate" type="text/html" title="CiTO support for BioHackrXiv" /><published>2023-02-03T00:00:00+00:00</published><updated>2023-02-03T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/02/03/6rjvc</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/02/03/6rjvc.html"><![CDATA[<p>In this paper we present the work executed on BioHackrXiv during the international ELIXIR BioHackathon in Barcelona, Spain, 2021.</p>]]></content><author><name>Egon Willighagen</name></author><category term="BH21EU" /><summary type="html"><![CDATA[In this paper we present the work executed on BioHackrXiv during the international ELIXIR BioHackathon in Barcelona, Spain, 
2021.]]></summary></entry><entry><title type="html">Addressing sex bias in biological databases worldwide</title><link href="https://index.biohackrxiv.org//2023/02/02/n9dkg.html" rel="alternate" type="text/html" title="Addressing sex bias in biological databases worldwide" /><published>2023-02-02T00:00:00+00:00</published><updated>2023-02-02T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2023/02/02/n9dkg</id><content type="html" xml:base="https://index.biohackrxiv.org//2023/02/02/n9dkg.html"><![CDATA[<p>Precision medicine aims at tailoring treatments to individual patient needs. In this context, artificial intelligence (AI)-based technologies are viewed as revolutionary since they have the capacity to identify key features that link genomic and phenotypic traits at the individual level. AI techniques therefore depend on the quantity and quality of patient data. When variables like sex, age, or race are ignored in sample records, predictions can be biased because these variables are not considered when training the AI algorithm. To this end, the European Genome-phenome Archive (EGA) took action in 2018 and put into place a rule that requires data providers to declare the sex of donor samples uploaded into their repository to improve data quality and prevent the spread of biased results. In this work we quantified biases in sex classification over time in human data from studies deposited in EGA and the database of Genotypes and Phenotypes (dbGaP), which represents the EGA’s equivalent in the USA. The main result is that the EGA policy is effective at fighting sex classification biases, because significantly fewer samples are classified as unknown after 2018 in this repository than in dbGaP. Additionally, we qualitatively assessed public opinion on this issue. A survey addressed to users, creators, maintainers, and developers of biological databases revealed that specialized training and additional knowledge about diversity criteria are required. 
Based on our findings, we raise awareness of sample bias problems and provide a list of recommendations for enhancing biomedical research practices.</p>]]></content><author><name>Victoria Ruiz-Serra</name></author><category term="BH21EU" /><summary type="html"><![CDATA[Precision medicine aims at tailoring treatments to individual patient needs. In this context, artificial intelligence (AI)-based technologies are viewed as revolutionary since they have the capacity to identify key features that link genomic and phenotypic traits at the individual level. AI techniques therefore depend on the quantity and quality of patient data. When variables like sex, age, or race are ignored in sample records, predictions can be biased because these variables are not considered when training the AI algorithm. To this end, the European Genome-phenome Archive (EGA) took action in 2018 and put into place a rule that requires data providers to declare the sex of donor samples uploaded into their repository to improve data quality and prevent the spread of biased results. In this work we quantified biases in sex classification over time in human data from studies deposited in EGA and the database of Genotypes and Phenotypes (dbGaP), which represents the EGA’s equivalent in the USA. The main result is that the EGA policy is effective at fighting sex classification biases, because significantly fewer samples are classified as unknown after 2018 in this repository than in dbGaP. Additionally, we qualitatively assessed public opinion on this issue. A survey addressed to users, creators, maintainers, and developers of biological databases revealed that specialized training and additional knowledge about diversity criteria are required. 
Based on our findings, we raise awareness of sample bias problems and provide a list of recommendations for enhancing biomedical research practices.]]></summary></entry><entry><title type="html">Mapping OHDSI OMOP Common Data Model and GA4GH Phenopackets for COVID-19 disease epidemics and analytics</title><link href="https://index.biohackrxiv.org//2022/11/26/ep3xh.html" rel="alternate" type="text/html" title="Mapping OHDSI OMOP Common Data Model and GA4GH Phenopackets for COVID-19 disease epidemics and analytics" /><published>2022-11-26T00:00:00+00:00</published><updated>2022-11-26T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2022/11/26/ep3xh</id><content type="html" xml:base="https://index.biohackrxiv.org//2022/11/26/ep3xh.html"><![CDATA[<p>The COVID-19 crisis demonstrates a critical requirement for rapid and efficient sharing of data to facilitate the global response to this and future pandemics. Our project aims to enhance interoperability between health and research data by mapping Phenopackets and OMOP schemas, and representing COVID-19 metadata using the FAIR principles to enable discovery, integration and analysis of genotypic and phenotypic data. Here, we present our outcomes after one week of BioHacking that brought together 17 participants (10 new to the project) from different countries (CH, US, and EU countries) and continents.</p>]]></content><author><name>Núria Queralt-Rosinach</name></author><category term="BH21EU" /><summary type="html"><![CDATA[The COVID-19 crisis demonstrates a critical requirement for rapid and efficient sharing of data to facilitate the global response to this and future pandemics. Our project aims to enhance interoperability between health and research data by mapping Phenopackets and OMOP schemas, and representing COVID-19 metadata using the FAIR principles to enable discovery, integration and analysis of genotypic and phenotypic data. 
Here, we present our outcomes after one week of BioHacking that brought together 17 participants (10 new to the project) from different countries (CH, US, and EU countries) and continents.]]></summary></entry><entry><title type="html">Bioschemas data harvesting project report</title><link href="https://index.biohackrxiv.org//2022/03/25/y6gbq.html" rel="alternate" type="text/html" title="Bioschemas data harvesting project report" /><published>2022-03-25T00:00:00+00:00</published><updated>2022-03-25T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2022/03/25/y6gbq</id><content type="html" xml:base="https://index.biohackrxiv.org//2022/03/25/y6gbq.html"><![CDATA[<p>The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large-scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). The harvested data will be made available for others and loaded into a triplestore to allow for further exploration.</p>]]></content><author><name>Alasdair J. G. Gray</name></author><category term="BH21EU" /><summary type="html"><![CDATA[The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large-scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). 
The harvested data will be made available for others and loaded into a triplestore to allow for further exploration.]]></summary></entry><entry><title type="html">DS Wizard Meets DAISY: A Romance Solving Data Protection Requirements in Data Management Planning</title><link href="https://index.biohackrxiv.org//2021/12/16/cuvqw.html" rel="alternate" type="text/html" title="DS Wizard Meets DAISY: A Romance Solving Data Protection Requirements in Data Management Planning" /><published>2021-12-16T00:00:00+00:00</published><updated>2021-12-16T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2021/12/16/cuvqw</id><content type="html" xml:base="https://index.biohackrxiv.org//2021/12/16/cuvqw.html"><![CDATA[<p>This report summarises our activities and achievements in integrating the Data Stewardship Wizard (DSW) and Data Information System (DAISY) tools during the ELIXIR BioHackathon Europe 2021. As a data information system for GDPR compliance, DAISY is focused on a single goal – gathering all information required for GDPR accountability of biomedical research projects. On the other hand, DSW is very flexible and can be used beyond data management planning. We worked on the integration between both tools on two fronts. Firstly, we created a new Knowledge Model in DSW together with a document output template to be able to generate a data protection impact assessment (DPIA). Secondly, we introduced a new integration type between projects in DSW and DAISY that allows the querying of DAISY data upon document generation in DSW. Both of these independent activities brought successful results that were polished and published after the actual BioHackathon. 
Finally, we provide the related materials as an on-demand training course in the ELIXIR eLearning Platform.</p>]]></content><author><name>Marek Suchánek</name></author><category term="BH21EU" /><summary type="html"><![CDATA[This report summarises our activities and achievements in integrating the Data Stewardship Wizard (DSW) and Data Information System (DAISY) tools during the ELIXIR BioHackathon Europe 2021. As a data information system for GDPR compliance, DAISY is focused on a single goal – gathering all information required for GDPR accountability of biomedical research projects. On the other hand, DSW is very flexible and can be used beyond data management planning. We worked on the integration between both tools on two fronts. Firstly, we created a new Knowledge Model in DSW together with a document output template to be able to generate a data protection impact assessment (DPIA). Secondly, we introduced a new integration type between projects in DSW and DAISY that allows the querying of DAISY data upon document generation in DSW. Both of these independent activities brought successful results that were polished and published after the actual BioHackathon. Finally, we provide the related materials as an on-demand training course in the ELIXIR eLearning Platform.]]></summary></entry><entry><title type="html">Network analysis of specimen co-collection</title><link href="https://index.biohackrxiv.org//2021/12/07/4ahng.html" rel="alternate" type="text/html" title="Network analysis of specimen co-collection" /><published>2021-12-07T00:00:00+00:00</published><updated>2021-12-07T00:00:00+00:00</updated><id>https://index.biohackrxiv.org//2021/12/07/4ahng</id><content type="html" xml:base="https://index.biohackrxiv.org//2021/12/07/4ahng.html"><![CDATA[<p>We took data on the collectors of specimens from natural history collections. Co-collectors of specimens were extracted from the data and a network of co-collection was constructed. 
This network was used to analyze the age and gender balance of collectors and how this has changed with time. Men outnumber women in the network, but women's participation increases with time, as do the all-female pairs of collectors. Most collector pairs have an age difference of less than 50 years, and it is suggested that co-collections above this age difference should be checked for errors. This project has proven the value of analyzing co-collection data, but also highlighted the many additional avenues for future research on this subject.</p>]]></content><author><name>Sofie Meeus</name></author><category term="BH21EU" /><summary type="html"><![CDATA[We took data on the collectors of specimens from natural history collections. Co-collectors of specimens were extracted from the data and a network of co-collection was constructed. This network was used to analyze the age and gender balance of collectors and how this has changed with time. Men outnumber women in the network, but women's participation increases with time, as do the all-female pairs of collectors. Most collector pairs have an age difference of less than 50 years, and it is suggested that co-collections above this age difference should be checked for errors. This project has proven the value of analyzing co-collection data, but also highlighted the many additional avenues for future research on this subject.]]></summary></entry></feed>