NBDC/DBCLS BioHackathon, Fukuoka, Japan, 2019
Preprints
-
The COVID-19 epidemiology and monitoring ontology
The novel infectious disease COVID-19 emerged and spread, causing high mortality and morbidity worldwide. The OBO Foundry hosts more than one hundred ontologies for sharing and analysing large-scale datasets in the biological and biomedical sciences. However, this pandemic revealed that we lack tools for the efficient and timely exchange of the epidemiological data needed to assess the impact of disease outbreaks and the efficacy of mitigating interventions, and to provide a rapid response. In this study we present our findings and contributions for the bio-ontologies community.
-
Characterization of Potential Drug Treatments for COVID-19 using Twitter
Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, there are several drugs that can potentially help with symptoms. In this work, we mined a large Twitter dataset of 280 million tweets of COVID-19 chatter to identify discourse around potential treatments. While this is seemingly a straightforward task, the informal nature of language use on Twitter means that machine learning methods are needed to aid in it, as we demonstrate. By applying these methods we are able to recover almost 21% more data than with traditional methods.
-
Determining a novel feature-space for SARS-CoV-2 sequence data
The pandemic spread of SARS-CoV-2 and its ability to reinfect recovered subjects, among other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently needed to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, organized from the 5th to the 11th of April, 2020, to work on pressing related issues ranging from text mining to genomics. Under the “Machine learning” track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinities to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
-
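As an illustration of the k-mer feature extraction mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation. The function names `kmer_counts` and `kmer_vector`, the nucleotide alphabet, and the toy sequence are all assumptions made for this example; the abstract does not say how k-mers were counted or which k was found optimal.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Return a fixed-length feature vector with one count per possible k-mer.

    The k-mers are enumerated in the order produced by itertools.product,
    so every sequence maps to a vector of length len(alphabet) ** k.
    k-mers containing characters outside the alphabet are ignored.
    """
    counts = kmer_counts(sequence, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


# Toy sequence for illustration only (hypothetical, not real data).
toy_sequence = "ATGATGCA"
toy_counts = kmer_counts(toy_sequence, 3)
toy_vector = kmer_vector(toy_sequence, 2)
```

Because every sequence maps to a vector of the same length, such vectors can be compared or clustered without aligning the sequences first. Trying several values of `k` and comparing downstream results is one way to choose a k-mer length.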
Global analysis of human SARS-CoV-2 infection and host-virus interaction
As part of the virtual BioHackathon 2020, we formed a working group that focused on the analysis of gene expression in the context of COVID-19. More specifically, we performed transcriptome analyses on published datasets in order to better understand the interaction between the human host and the SARS-CoV-2 virus. The ideas proposed during this hackathon were divided into five projects. Projects 1 and 2 aimed to identify human genes that are important in the process of viral infection of human cells. Projects 3 and 4 aimed to take the candidate genes identified in projects 1 and 2, as well as by independent studies, and relate them to clinical information and to possible therapeutic interventions. Finally, Project 5 aimed to package and containerize software and workflows used and generated here in a reusable manner, ultimately providing scalable and reproducible workflows.
-
Comparison of SARS-CoV-2 variants with INSaFLU and galaxyproject
The development of workflows for NGS data analysis has facilitated the study of sequences. Such workflows have their own advantages and challenges, depending on the algorithms they use. As part of this study for BioHackathon 2020, we compared the SARS-CoV-2 variant outputs of the INSaFLU workflow with those of galaxyproject/SARS-CoV-2. Across 24 samples, 597 variants were shared between the two workflows, with almost half of them found within the coding sequence of replicase polyprotein 1ab. Among the shared variants, the number of non-synonymous variants was considerably higher, and nearly half of the variants were multiallelic. Prospective studies could help us evaluate the accuracy of these variants.
-
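The comparison of variant calls from two workflows described in the abstract above comes down to set operations on variant identifiers. The sketch below is a hypothetical illustration, not the study's code. It assumes each variant is keyed by a `(position, reference allele, alternate allele)` tuple; the helper `compare_calls` and the toy calls are invented for this example.

```python
def compare_calls(calls_a, calls_b):
    """Split two collections of variant calls into shared and workflow-specific sets.

    Each call is assumed to be a hashable key such as
    (position, reference_allele, alternate_allele).
    """
    set_a, set_b = set(calls_a), set(calls_b)
    return {
        "shared": set_a & set_b,   # called by both workflows
        "only_a": set_a - set_b,   # called by the first workflow only
        "only_b": set_b - set_a,   # called by the second workflow only
    }


# Hypothetical toy calls for a single sample (not real data).
workflow_a_calls = [(100, "A", "G"), (250, "C", "T"), (400, "G", "A")]
workflow_b_calls = [(100, "A", "G"), (400, "G", "A"), (900, "T", "C")]

comparison = compare_calls(workflow_a_calls, workflow_b_calls)
```

In practice the keys would be read from each workflow's variant output, and normalising how the two workflows represent the same variant is usually the harder part of such a comparison.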
Logic Programming for the Biomedical Sciences
As part of the one-week BioHackathon 2019 in Fukuoka, Japan, we formed a working group on logic programming for the biomedical sciences. Logic programming is understood by many bioinformaticians when it is presented in the form of relational SQL queries or SPARQL queries. More advanced logic programming, however, is underutilized in bioinformatics.
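To make the link between logic programming and relational queries concrete, here is a minimal sketch using Python's standard `sqlite3` module; it is not taken from the working group's material. The `is_a` table and its toy term hierarchy are hypothetical. The recursive query expresses the two Datalog-style rules `ancestor(X, Y) :- is_a(X, Y)` and `ancestor(X, Z) :- ancestor(X, Y), is_a(Y, Z)`.

```python
import sqlite3

# In-memory database holding a hypothetical toy "is_a" hierarchy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE is_a (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO is_a VALUES (?, ?)",
    [
        ("kinase", "enzyme"),
        ("enzyme", "protein"),
        ("protein", "macromolecule"),
    ],
)

# Recursive common table expression: the first SELECT is the base rule,
# the second SELECT is the recursive rule that follows one more is_a step.
ANCESTOR_QUERY = """
WITH RECURSIVE ancestor(term, anc) AS (
    SELECT child, parent FROM is_a
    UNION
    SELECT a.term, i.parent
    FROM ancestor AS a
    JOIN is_a AS i ON a.anc = i.child
)
SELECT anc FROM ancestor WHERE term = ? ORDER BY anc
"""

ancestors = [row[0] for row in conn.execute(ANCESTOR_QUERY, ("kinase",))]
```

A logic programming language states the two rules directly and leaves the evaluation strategy to the engine, which tends to become more convenient than SQL as the number of interdependent rules grows.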