Meetings

Recent preprints

Jan 3, 2024

Benchmarks for Bioinformatics Workflow Bake Offs
This BioHackathon Project focused on establishing a “Great Bake Off of Bioinformatics Workflows” by developing workflow-level benchmarks for evaluating tools in computational tasks. Initially tested in proteomics, the project expanded to genomics and metabolomics. Collaborating with ELIXIR Implementation Studies, the team created rudimentary benchmarks, aiming for formalization before production use. The project consolidated efforts to produce a minimum set of workflow-specific benchmarks, aligning tools and workflow definitions. Short-term goals include drafting benchmarks with examples, while long-term plans involve implementing them in the Workflomics project and Proteomics Community ELIXIR Implementation Studies for community sharing.
Jan 3, 2024

BioHackEU23 report: Enabling continuous RDM using Annotated Research Contexts with RO-Crate profiles for ISA
A prevailing paradigm in Research Data Management (RDM) is to publish research datasets in designated archives upon conclusion of a research process. However, it is beneficial to abandon the notion of final or static data artifacts and instead adopt a continuous approach towards working with research data, where data is constantly shared, versioned, and updated. This immutable yet evolving perspective allows for the application of existing technologies and processes from software engineering, such as continuous integration, release practices, and version management backed by decades of experience, and adaptable to RDM. To facilitate this, we propose the Annotated Research Context (ARC), a data and metadata layout convention based on the well-established ISA model for metadata annotation and implemented using Git repositories. ARCs are amenable towards frequent, lightweight data management operations, such as (meta)data validation and transformation. The Omnipy Python library is designed to help develop stepwise validated (meta)data transformations as scalable data flows that can be incrementally designed, updated, and rerun as requirements or data evolve. To demonstrate the concept of continuous RDM we will use Omnipy to define and orchestrate Git-backed CI/CD (Continuous Integration/Continuous Delivery) data flows to convert ISA metadata present in ARCs into validated RO-Crate representations adhering to the Bioschemas convention. A RO-Crate package combines the actual research data with its metadata description. Downstream, this allows semantic interpretation by Galaxy for e.g. workflow execution as well as machine-readable data access and data harvesting for search engines such as FAIDARE.
Dec 2, 2023

BioHackEU23 report: Extending interoperability of experimental data using modular queries across biomedical resources
This report provides an overview of the significant accomplishments achieved during the ELIXIR Biohackathon 2023 under Project 17: “Extending interoperability of experimental data using modular queries across biomedical resources”. The project diligently addressed four key aspects: the expansion of data resources, the creation of knowledge graphs, advancements in data visualization, and the development of a use-case-driven pipeline. The collective efforts during the Biohackathon aimed to enhance the integration and accessibility of experimental data across diverse biomedical resources by developing a tool named BioDataFuse.
Nov 28, 2023

The fourth annual Carnegie Mellon Libraries hackathon for biomedical data management, knowledge graphs, and deep learning
In October 2023, a group of 44 scientists hailing from several U.S. states, Canada, Poland, and Switzerland came together for a hybrid in-person and virtual hackathon. The event was jointly hosted by Carnegie Mellon University Libraries and DNAnexus, a California-based cloud computing and bioinformatics company. This collaborative effort revolved around the theme of “Data Management and Graph Extraction for Large Transformer Models in the Biomedical Space.” In the spirit of fostering collaboration, participants organized themselves into five teams, which ultimately resulted in the successful completion of four hackathon projects. These projects encompassed a wide range of topics, from detecting features contributing to virus susceptibility to validating models using knowledge graphs. Repositories for the hackathon projects are available at https://github.com/collaborativebioinformatics. We hope that the insights and experiences shared by these teams, as detailed in the following manuscript, will prove valuable to the broader scientific community.
Nov 11, 2023

Rendering co-author graphs using linked-open-data from Wikidata
Wikidata is the linked-open-data graph of the Wikimedia Foundation, whose best-known sibling is Wikipedia (Vrandečić, 2012). What Wikipedia is to text, Wikidata is to data. As in Wikipedia, linked data can be added for everyone, by everyone. This makes Wikidata a very rich source of data. A substantial part of the data on Wikidata is about scientific publications and the authors of these publications (Taraborelli et al., 2016). Scholia is a tool that uses this data to create a profile page for authors and publications (Nielsen et al., 2017). This report describes a workflow to create co-author graphs using the data from Scholia.
Nov 3, 2023

Exploring the landscape of the genomic wastewater surveillance ecosystem: a roadmap towards standardization
The landscape of genomic wastewater surveillance in the context of infectious disease monitoring is rapidly evolving, and this came into sharp focus during the COVID-19 pandemic. Here we highlight the significance of wastewater surveillance as a passive monitoring system complementary to clinical genomic surveillance activities. Emphasizing the need for coordination, standardization, and the development of a unified catalog of software tools and services, we aim to streamline the implementation of end-to-end genomic wastewater surveillance pipelines. Key considerations such as defining variants, understanding antimicrobial resistance, and assessing viral fitness within the framework of wastewater surveillance are explored, linking to examples of respective tools and existing pipelines. The challenges of wastewater data analysis, the need for specialized tools and bioinformatics workflows, and the significance of integrated pipelines are also discussed in detail. The article presents case studies, including the V-pipe integrated bioinformatics workflow and the integration of tools into the Galaxy platform, underscoring their role in enhancing data analysis efficiency and standardization within the field. Overall, the review highlights the critical importance of continued research efforts to advance understanding and implementation of bioinformatic approaches in wastewater surveillance for the effective monitoring and management of infectious diseases.
Oct 26, 2023

Efforts to analyze pathways in non-model organisms
In addition to functional annotation of genes, annotating genes to pathways is important in current molecular biology. However, pathway diagrams are required in order to annotate genes to their nodes. It is therefore important to draw pathway diagrams with genes and metabolites assigned. Existing metabolic pathway databases focus on generic pathways, while secondary metabolism is emphasized in organisms producing useful substances. Moreover, they cannot accept third-party annotation of those data. A practical system for pathway analyses is therefore needed. Following on from the previous BioHackathon (BH23), we first discussed how to create a database of pathway information in non-model species in a domestic version of the BioHackathon called BH23.9, held in Shirahama, Wakayama, Japan (25-29 September 2023). We then gave a tutorial on how to draw a pathway diagram using PathVisio, a free open-source pathway analysis and drawing software that allows drawing, editing, and analyzing biological pathways. Finally, we tried to establish a system for converting text data to Graphical Pathway Markup Language (GPML), called txt2gpml. txt2gpml will drastically reduce the time and effort required to create pathway diagrams. After stimulating discussions in BH23 and BH23.9, we were able to clarify the current issues in pathway analysis for non-model organisms.
