Meetings

Recent preprints

  • Progress on Data Stewardship Wizard during BioHackathon Europe 2020

    We used the Virtual BioHackathon Europe 2020 to work on a number of projects to improve the Data Stewardship Wizard: (a) we took first steps toward analysing what is needed to make all questions and answers machine actionable; (b) we worked on supporting the Horizon 2020 Data Management Plan Template; (c) we added several new integrations, e.g. to ROR and Wikidata; (d) we drafted a plan for supporting multiple languages; and (e) we implemented many improvements to the knowledge model that had been suggested to us in the past. Shortly after the BioHackathon, the adapted knowledge model, the new integrations and the H2020 template were made available to all users of the wizard.
  • Disease and pathway maps for Rare Diseases

    In this article we present a workflow for constructing prototype rare disease maps based on the phenotypic description of a rare disease. We use stable disease and phenotype identifiers to i) retrieve disease-associated genes and genetic variants, ii) identify relevant mechanistic components from pathways, disease maps and interaction repositories, and iii) assemble these components into a single diagram, available for visualisation and pathway analysis. The workflow allows construction of prototype diagrams representing mechanisms related to a rare disease, useful for further refinement and data interpretation.
  • TogoEx: the integration of gene expression data

    Report of the ‘TogoEx’ group at BioHackathon Europe 2019.
  • Characterization of Potential Drug Treatments for COVID-19 using Twitter

    Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 280 million tweets of COVID-19 chatter to identify discourse around potential treatments. Although this seems a straightforward task, the informal nature of language use on Twitter means that machine learning methods are needed to aid in it, as we demonstrate. By applying these methods we are able to recover almost 21% more data than with traditional methods.
  • Determining a novel feature-space for SARS-CoV-2 sequence data

    The pandemic spread of SARS-CoV-2 and its ability to reinfect recovered patients, among its other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently needed to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, held from 5 to 11 April 2020, to address pressing issues ranging from text mining to genomics. Under the “Machine learning” track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinities to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
  • Global analysis of human SARS-CoV-2 infection and host-virus interaction

    As part of the virtual BioHackathon 2020, we formed a working group that focused on the analysis of gene expression in the context of COVID-19. More specifically, we performed transcriptome analyses on published datasets in order to better understand the interaction between the human host and the SARS-CoV-2 virus. The ideas proposed during this hackathon were divided into five projects. Projects 1 and 2 aimed to identify human genes that are important in the process of viral infection of human cells. Projects 3 and 4 aimed to take the candidate genes identified in projects 1 and 2, as well as by independent studies, and relate them to clinical information and to possible therapeutic interventions. Finally, Project 5 aimed to package and containerize the software and workflows used and generated here in a reusable manner, ultimately providing scalable and reproducible workflows.
  • Comparison of SARS-CoV-2 variants with INSaFLU and galaxyproject

    The development of workflows for NGS data analysis has facilitated the study of sequences. Such workflows have their own advantages and challenges, depending on the algorithms they use. As part of this study for BioHackathon 2020, we compared the SARS-CoV-2 variant outputs of the INSaFLU workflow with those produced by galaxyproject/SARS-CoV-2. Within 24 samples, 597 variants were shared between the two workflows, with almost half of them found within the coding sequence of replicase polyprotein 1ab. Among the shared variants, the number of non-synonymous variants was considerably higher, and nearly half of the variants were multiallelic. Prospective studies could help us evaluate the accuracy of these variants.
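
The kind of comparison described in the last abstract can be illustrated with a minimal sketch. It is not the authors' method: the `Variant` record, the `compare_calls` and `multiallelic_sites` helpers, and the example calls are all hypothetical, and a site is assumed to be "multiallelic" when more than one alternate allele is reported at the same position in the same sample.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real workflow output format)."""
    sample: str
    pos: int
    ref: str
    alt: str


def compare_calls(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict:
    """Split two call sets into shared and workflow-specific variants."""
    a, b = set(calls_a), set(calls_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def multiallelic_sites(variants: Iterable[Variant]) -> set:
    """Return (sample, pos) pairs with more than one distinct alternate allele."""
    alts = {}
    for v in variants:
        alts.setdefault((v.sample, v.pos), set()).add(v.alt)
    return {site for site, alleles in alts.items() if len(alleles) > 1}


# Made-up example calls, for illustration only.
workflow_a = [
    Variant("s1", 241, "C", "T"),
    Variant("s1", 3037, "C", "T"),
    Variant("s2", 11083, "G", "T"),
]
workflow_b = [
    Variant("s1", 241, "C", "T"),
    Variant("s2", 11083, "G", "T"),
    Variant("s2", 11083, "G", "A"),
]

result = compare_calls(workflow_a, workflow_b)
print(len(result["shared"]), "shared variants")
print("multiallelic sites in workflow_b:", multiallelic_sites(workflow_b))
```

Matching on the exact (sample, position, reference, alternate) tuple is the simplest possible criterion; real comparisons typically need variant normalisation first, which this sketch leaves out.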