Agora: Bringing Together Datasets, Algorithms, Models and More in a Unified Ecosystem [Vision]

SIGMOD Record, 49(4) 2020
Publication Date: 1.12.2020

Abstract

Data science and artificial intelligence are driven by a plethora of diverse data-related assets, including datasets, data streams, algorithms, processing software, compute resources, and domain knowledge. Because providing all these assets requires a huge investment, data science and artificial intelligence technologies are currently dominated by a small number of providers who can afford it. This leads to lock-in effects and hinders features that require a flexible exchange of assets among users. In this paper, we introduce Agora, our vision of a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience. Agora (i) treats assets as first-class citizens and supports a fine-grained exchange of assets, (ii) allows for combining assets into novel applications, and (iii) flexibly executes such applications on available resources. As a result, it enables the easy creation and composition of data science pipelines as well as their scalable execution. In contrast to existing data management systems, Agora operates in a heavily decentralized and dynamic environment: data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora opens novel research directions for the data management community as a whole: it requires combining our traditional expertise in scalable data processing and management with infrastructure provisioning as well as with the economic and application aspects of data, algorithms, and infrastructure.