The digitalization and integration of biodiversity information can generate substantial added value for existing data and yield novel scientific insights of relevance to bioeconomy, biotechnology, human health, and environmental protection. So far this potential has been exploited only rarely due the heterogeneity and fragmentation of data sources, and the little documentation, variable standards, and limited interoperability of data. For bacteria, research data are particularly diverse and broadly distributed; therefore these organisms will serve as the model group for the current project. The project DiASPora will establish an approach for synthesizing information for bacterial species by applying state-of-the-art data science methodology, genomics, and developing user-centric workflows. Extraction of phenotypic data from the microbiological literature will be achieved by large-scale text mining, applying artificial intelligence (AI) techniques that will be trained through the feedback of microbiologist curators. The data recovered will be hosted by the existing BacDive database and transformed into a machine readable and processable format using the Resource Description Framework (RDF). Subsequently, the transformed data will be used to establish a knowledge graph to generate innovative search options for the discovery of hidden data relationships. In parallel, phenotypic predictions will be derived from (meta)genomic data, through the application of metabolic models and comparison with the physiological and habitat data as obtained by data mining, and will be supported by an AI approach. The project is committed to an integral community engagement and an efficient dissemination of results. DiASPora builds upon the complementary expertise of three participating institutions, covering the fields of microbial databases and diversity research, bacterial genomics, text mining, artificial intelligence, and semantic technologies.
| Type | Activity |
|---|
|
Start
01.04.2020
End
31.03.2023
|
| Project type Third-party funding |
| Third-party funder Leibniz Wettbewerb |
| Funding organization Leibniz SAW |
| Funding reference number(s) K280/2019 |
| Coordinator facility DSMZ |
Project logo
|