Meetings
BioHackSWAT4HCLS 2025
BioHackathon Europe 2024
3rd BioHackathon Germany
DBCLS BioHackathon 2024
ELIXIR INTOXICOM
Recent preprints
Determining a novel feature-space for SARS-CoV-2 sequence data
The pandemic spread of SARS-CoV-2 and its ability to reinfect recovered individuals, among other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently needed to bring together experts from different areas of medicine and data science. The COVID19 Virtual BioHackathon, held from the 5th to the 11th of April, 2020, provided such a platform for addressing pressing issues ranging from text mining to genomics. Under the “Machine learning” track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinities to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
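As a rough illustration of the k-mer feature extraction mentioned in this abstract, the sketch below counts overlapping k-mers and turns them into a fixed-length frequency vector. It is not the authors' code: the function names `kmer_counts` and `kmer_vector`, the nucleotide alphabet `ACGT`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: nucleotide input; a protein alphabet would work the same way


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


def kmer_vector(sequence: str, k: int) -> list[float]:
    """Relative k-mer frequencies over all len(ALPHABET)**k possible k-mers, in sorted order."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in vocabulary]
```

Varying `k` changes the vector length (4**k for nucleotides), which is one reason the choice of k-mer length matters for downstream models.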
Global analysis of human SARS-CoV-2 infection and host-virus interaction
As part of the virtual BioHackathon 2020, we formed a working group that focused on the analysis of gene expression in the context of COVID-19. More specifically, we performed transcriptome analyses on published datasets to better understand the interaction between the human host and the SARS-CoV-2 virus. The ideas proposed during this hackathon were divided into five projects. Projects 1 and 2 aimed to identify human genes that are important in the process of viral infection of human cells. Projects 3 and 4 aimed to take the candidate genes identified in projects 1 and 2, as well as by independent studies, and relate them to clinical information and to possible therapeutic interventions. Finally, Project 5 aimed to package and containerize the software and workflows used and generated here in a reusable manner, ultimately providing scalable and reproducible workflows.
Comparison of SARS-CoV-2 variants with INSaFLU and galaxyproject
The development of workflows for NGS data analysis has facilitated the study of sequences. Such workflows have their own advantages and challenges based on the algorithms they use. As part of this study for BioHackathon 2020, we compared the SARS-CoV-2 variant outputs of the INSaFLU workflow with those produced by galaxyproject/SARS-CoV-2. Across 24 samples, 597 variants were shared between the two workflows, with almost half of them found within the coding sequence of replicase polyprotein 1ab. Among the shared variants, the number of non-synonymous variants was considerably higher, and nearly half of the variants were multiallelic. Prospective studies could help us evaluate the accuracy of these variants.
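The comparison described in this abstract amounts to intersecting two sets of variant calls. The sketch below shows one way this could be done; the `Variant` key (sample, position, reference allele, alternate allele) and the helper `compare_calls` are hypothetical, and the abstract does not say how the authors matched variants between workflows.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    # Hypothetical matching key; the real comparison may have used other fields.
    sample: str
    pos: int
    ref: str
    alt: str


def compare_calls(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict[str, set[Variant]]:
    """Split two call sets into shared variants and those unique to each workflow."""
    a, b = set(calls_a), set(calls_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Made-up calls, purely to show the shape of the input and output.
workflow_a = [Variant("s1", 241, "C", "T"), Variant("s1", 3037, "C", "T")]
workflow_b = [Variant("s1", 241, "C", "T"), Variant("s1", 14408, "C", "T")]
result = compare_calls(workflow_a, workflow_b)
```

Matching on the full tuple means a multiallelic site contributes one entry per alternate allele, so counts depend on how each workflow reports such sites.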
Logic Programming for the Biomedical Sciences
As part of the one-week BioHackathon 2019 in Fukuoka, Japan, we formed a working group on logic programming for the biomedical sciences. Logic programming is understood by many bioinformaticians when it is presented in the form of relational SQL queries or SPARQL queries. More advanced logic programming, however, is underutilized in bioinformatics.
Data validation and schema interoperability
Validating RDF data is necessary to ensure that the data comply with the conceptual model they follow, e.g., the schema or ontology behind the data, and to improve data consistency and completeness. There are different approaches to validating RDF data, for instance, JSON Schema, particularly for data in JSON-LD format, as well as Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL), which can also be used with other serializations, e.g., RDF/XML or Turtle. Currently, no validation approach prevails over the others; selection commonly depends on data characteristics, background knowledge, and personal preferences. In some cases the approaches are interchangeable; however, that is not always so, making it necessary to identify a subset among them that can be seamlessly translated from one to another. During the NBDC/DBCLS 2019 BioHackathon, we worked on a variety of topics related to RDF data validation, including (i) development of ShEx shapes for a number of datasets, (ii) development of a tool to semi-automatically create ShEx shapes, (iii) improvements to the RDFShape tool, and (iv) enabling validation schema conversion from one format to another. Here we report on our BioHackathon achievements.
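To make the idea of validating data against a "shape" concrete, here is a toy sketch in plain Python. It is deliberately not ShEx, SHACL, or JSON Schema, and it does not use any RDF library: triples are plain tuples of strings, and the function `validate`, the class name `ex:Protein`, and the predicate names are all invented for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # plain string standing in for the rdf:type IRI

Triple = tuple[str, str, str]


def validate(triples: list[Triple], target_class: str, required: set[str]) -> dict[str, list[str]]:
    """Return {subject: sorted missing predicates} for subjects typed as target_class."""
    by_subject: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))
    report = {}
    for subject, pairs in by_subject.items():
        if (RDF_TYPE, target_class) not in pairs:
            continue  # the toy shape only applies to instances of target_class
        present = {p for p, _ in pairs}
        missing = sorted(required - present)
        if missing:
            report[subject] = missing
    return report


# Invented data: ex:p2 lacks a label and is therefore reported.
data = [
    ("ex:p1", RDF_TYPE, "ex:Protein"),
    ("ex:p1", "rdfs:label", "Spike glycoprotein"),
    ("ex:p1", "ex:organism", "ex:SARS-CoV-2"),
    ("ex:p2", RDF_TYPE, "ex:Protein"),
    ("ex:p2", "ex:organism", "ex:SARS-CoV-2"),
    ("ex:g1", RDF_TYPE, "ex:Gene"),
]
report = validate(data, "ex:Protein", {"rdfs:label", "ex:organism"})
```

Real shape languages add cardinalities, datatypes, and value constraints on top of this basic "required property" check, which is part of why translating between them is not always lossless.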