Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2025
4th BioHackathon Germany
DBCLS BioHackathon 2025
ELIXIR INTOXICOM
Recent preprints
Addressing sex bias in biological databases worldwide
Precision medicine aims at tailoring treatments to individual patient needs. In this context, artificial intelligence (AI)-based technologies are viewed as revolutionary since they have the capacity to identify key features that link genomic and phenotypic traits at the individual level. The performance of AI techniques therefore depends on the quantity and quality of patient data. When variables such as sex, age, or race are missing from sample records, predictions can be biased because these variables are not considered when training the AI algorithm. To this end, the European Genome-phenome Archive (EGA) took action in 2018 and put in place a rule requiring data providers to declare the sex of donor samples uploaded into their repository, to improve data quality and prevent the spread of biased results. In this work we quantified biases in sex classification over time in human data from studies deposited in EGA and in the database of Genotypes and Phenotypes (dbGaP), the EGA’s equivalent in the USA. The main result is that the EGA policy is effective in reducing sex classification bias: after 2018, significantly fewer samples are classified as unknown in this repository than in dbGaP. Additionally, we qualitatively assessed public opinion on this issue. A survey addressed to users, creators, maintainers, and developers of biological databases revealed that specialized training and additional knowledge about diversity criteria are required. Based on our findings, we raise awareness of sample bias problems and provide a list of recommendations for enhancing biomedical research practices.
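A minimal sketch of the kind of comparison described above, assuming a hypothetical table of sample metadata with repository, year, and reported_sex columns (the column names and values are illustrative placeholders, not the actual EGA or dbGaP schemas):

    import pandas as pd

    # Hypothetical sample-level metadata; real EGA/dbGaP exports differ.
    samples = pd.DataFrame({
        "repository":   ["EGA", "EGA", "EGA", "dbGaP", "dbGaP", "dbGaP"],
        "year":         [2016, 2019, 2021, 2016, 2019, 2021],
        "reported_sex": ["unknown", "female", "male", "unknown", "unknown", "male"],
    })

    # Share of samples with unknown sex, before vs. after the 2018 EGA policy.
    samples["period"] = samples["year"].apply(lambda y: "post-2018" if y > 2018 else "pre-2018")
    unknown_share = (
        samples.assign(unknown=samples["reported_sex"].eq("unknown"))
               .groupby(["repository", "period"])["unknown"]
               .mean()
    )
    print(unknown_share)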
BioHackEU22 Report for Project 16: Make your own or favourite software available on your cluster with EasyBuild/EESSI
EasyBuild is a community effort to develop a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way. As its name suggests, EasyBuild makes software installation easy by automating builds, making previous builds reproducible, resolving dependencies, and retaining logs for traceability. It is also one of the components of the European Environment for Scientific Software Installations (EESSI), a collaboration between different European HPC sites and industry partners with the common goal of setting up a shared repository of scientific software installations that can be used on a variety of operating systems and computer architectures. It can be used on a full-size HPC cluster, in a cloud environment, in a container, or on a personal workstation. With the deluge of data in the genomics field (e.g., clinical data) and the concomitant development of new technologies, the number of data analysis tools has exploded in recent years. The fields of bioinformatics and cheminformatics follow this same trend, with ever more developments to optimize and parallelize analyses. The bioinformatics field is now the main provider of new software in EasyBuild. Developers of these tools are not always professional software engineers and therefore do not always follow best practices when releasing their software. As a result, many tools are complicated to install, making them ideal candidates for porting to EasyBuild so that they become more easily accessible to end users. We propose to introduce users to EasyBuild and EESSI, and to port new software to EasyBuild/EESSI (e.g., the participants’ own or favourite software), thereby making it available and discoverable to the entire EasyBuild community. In parallel, we would like to build bridges between EESSI and Galaxy to make scientific software more accessible to researchers in the domain.
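To illustrate what porting a tool involves, an EasyBuild installation is described by an easyconfig file written in Python syntax; the sketch below is a minimal, hypothetical easyconfig for an autotools-based tool (the tool name, version, and URLs are placeholders, not a real easyconfig from this project):

    # Minimal, hypothetical easyconfig; easyconfig files use Python syntax.
    easyblock = 'ConfigureMake'          # generic ./configure && make && make install

    name = 'MyFavouriteTool'             # placeholder tool name
    version = '1.2.3'

    homepage = 'https://example.org/myfavouritetool'
    description = "Example bioinformatics tool installed via EasyBuild."

    toolchain = {'name': 'GCC', 'version': '12.3.0'}

    source_urls = ['https://example.org/downloads']
    sources = [SOURCE_TAR_GZ]            # expands to MyFavouriteTool-1.2.3.tar.gz

    moduleclass = 'bio'

With such a file in place, a command along the lines of "eb MyFavouriteTool-1.2.3-GCC-12.3.0.eb --robot" would build the tool, resolve and install any missing dependencies, and generate an environment module for it.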
Validating Subtype Specific Oncology Drug Predictions
There is an impressive number of data and code reproducibility initiatives, both within Europe and across the world. To motivate researchers to use this infrastructure, we must show the translational research community that these initiatives are able to drive change in translational science. Here we demonstrate that, using public datasets, it is feasible to build a pipeline for proposing and validating driver-mutation- and subtype-specific colorectal cancer medications. While molecular, clinical, and chemical name harmonization were all necessary, open data and code initiatives, though varied in their approaches, made this project possible.
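A toy illustration of the chemical-name harmonization step mentioned above, assuming a hypothetical synonym table (the names and mappings are purely illustrative, not the project's actual harmonization resources):

    # Hypothetical mapping of drug name variants to a canonical name.
    DRUG_SYNONYMS = {
        "5-fu": "fluorouracil",
        "5-fluorouracil": "fluorouracil",
        "erbitux": "cetuximab",
    }

    def harmonize_drug_name(raw_name: str) -> str:
        """Normalize case/whitespace and map known synonyms to a canonical name."""
        key = raw_name.strip().lower()
        return DRUG_SYNONYMS.get(key, key)

    print(harmonize_drug_name(" Erbitux "))   # -> cetuximab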
BioHackEU22 Project 22: Plant data exchange and standard interoperability
Status of the discussions on data standard formats for plant sciences and their interoperability held at the BioHackathon Europe 2022 in Paris.
Metadata for BioHackrXiv Markdown publications
biohackrxiv.org is a scholarly publication service for BioHackathons and Codefests where papers are generated from Markdown templates whose header is a YAML/JSON record that includes the title, authors, affiliations, and tags. Many projects in BioHackathons are about using FAIR data. Because the current setup is lacking in the findable (F) and accessible (A) of FAIR, for the ELIXIR BioHackathon 2020 we decided to add an additional service that provides a SPARQL endpoint for queries and some simple HTML output that can be embedded in a BioHackathon website.
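A small sketch of what reading such a Markdown header could look like, assuming a paper whose front matter is a YAML block with title, tags, authors, and affiliations (the field names are illustrative and may differ from the exact BioHackrXiv template schema):

    import yaml  # pip install pyyaml

    paper_md = """\
    ---
    title: 'Example BioHackathon report'
    tags:
      - FAIR
      - SPARQL
    authors:
      - name: Jane Doe
        affiliation: 1
    affiliations:
      - name: Example Institute
        index: 1
    ---
    Body of the Markdown paper...
    """

    # Split off the YAML front matter between the first two '---' markers.
    _, header, body = paper_md.split("---", 2)
    metadata = yaml.safe_load(header)
    print(metadata["title"], metadata["tags"])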
Mapping OHDSI OMOP Common Data Model and GA4GH Phenopackets for COVID-19 disease epidemics and analytics
The COVID-19 crisis demonstrates a critical requirement for rapid and efficient sharing of data to facilitate the global response to this and future pandemics. Our project aims to enhance interoperability between health and research data by mapping the Phenopackets and OMOP schemas, and by representing COVID-19 metadata using the FAIR principles to enable discovery, integration, and analysis of genotypic and phenotypic data. Here, we present our outcomes after one week of BioHacking together with 17 participants (10 new to the project) from different countries (CH, US, and across the EU) and continents.
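A minimal sketch of the kind of schema mapping the project explores, converting a hypothetical OMOP-style condition record into a Phenopacket-like dictionary (the OMOP field names and the concept lookup are simplified placeholders, not the project's actual mapping tables):

    # Hypothetical OMOP-style record (column names simplified for illustration).
    omop_condition = {
        "person_id": 1234,
        "gender": "FEMALE",
        "condition_concept_id": 37311061,   # placeholder concept id
        "condition_name": "COVID-19",
    }

    # Illustrative lookup from OMOP concepts to ontology terms used by Phenopackets.
    CONCEPT_TO_ONTOLOGY = {
        37311061: {"id": "MONDO:0100096", "label": "COVID-19"},
    }

    def to_phenopacket(record: dict) -> dict:
        """Build a minimal Phenopacket-like structure (GA4GH Phenopackets v2 style)."""
        term = CONCEPT_TO_ONTOLOGY[record["condition_concept_id"]]
        return {
            "id": f"phenopacket-{record['person_id']}",
            "subject": {"id": str(record["person_id"]), "sex": record["gender"]},
            "diseases": [{"term": term}],
        }

    print(to_phenopacket(omop_condition))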
An ETL pipeline to construct the Intrinsically Disordered Proteins Knowledge Graph (IDP-KG) using Bioschemas JSON-LD data dumps
Schema.org and Bioschemas are lightweight vocabularies that aim at making the contents of web pages machine-readable so that software agents can consume that content and understand it in an actionable way. Due to the time needed to process each page, extracting markup by visiting each page of a site is not practical for large sites. This approach imposes processing requirements on both the publisher and the consumer. In February 2022, the Schema.org community proposed a method for exchanging markup from various pages as a DataFeed published at a recognized address. This would ease publisher and consumer processing requirements and accelerate data collection. In this work, we report on the implementation of a JSON-LD consumer ETL (Extract-Transform-Load) pipeline that enables data dumps to be ingested into knowledge graphs (KG). The pipeline loads scraped JSON-LD from the three sources, converts it to RDF, applies SPARQL CONSTRUCT queries to map the source RDF to a unified Bioschemas-based model, and stores the resulting KG as a Turtle file. This work was conducted during the one-week BioHackathon Europe 2022 in Paris, France, under Project 23, “Publishing and Consuming Schema.org DataFeeds”.
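A condensed sketch of such an ETL step using rdflib, assuming a locally scraped JSON-LD dump and a simplified identity-style CONSTRUCT mapping (the file names, namespace, and query are placeholders, not the pipeline's actual sources or queries):

    from rdflib import Graph

    # Extract: load a scraped Bioschemas JSON-LD dump into an RDF graph.
    source = Graph()
    source.parse("scraped_dump.jsonld", format="json-ld")   # placeholder file name

    # Transform: map the source RDF onto a unified Bioschemas-based model.
    construct_query = """
    PREFIX schema: <https://schema.org/>
    CONSTRUCT {
        ?protein a schema:Protein ;
                 schema:name ?name .
    }
    WHERE {
        ?protein a schema:Protein ;
                 schema:name ?name .
    }
    """
    kg = source.query(construct_query).graph

    # Load: store the resulting knowledge graph as a Turtle file.
    kg.serialize(destination="idp-kg.ttl", format="turtle")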