Universite Paris Cite Seminar Series on Data Analytics
in collaboration with the diNo group

Invited Seminar Talk




Data Virtual Machines: Simplifying Data Sharing, Exploration and Querying in Big Data Environments
Prof Damianos Chatziantoniou, Athens University of Economics and Business (Greece)


when: 23 March 2023, 2pm
where: online (email the organizer for connection details), and in-person:
room Turing Conseil, 7th floor, Universite Paris Cite, 45 Rue Des Saints Peres, Paris 75006


Abstract

Today’s analytics environments are characterized by a high degree of heterogeneity in terms of data systems, formats and types of analysis. Many occasions call for the rapid, ad hoc, on-demand construction of a data model that represents (parts of) the data infrastructure of an organization, including ML tasks. This data model is given to data scientists to work with (express reports, build ML models, explore, etc.). We present a novel graph-based conceptual model, the Data Virtual Machine (DVM), representing the data (persistent, transient, derived) of an organization. A DVM can be built quickly and in an agile manner, offering schema flexibility. It is amenable to visual interfaces for schema and query management. Data framing, a frequent querying/preprocessing task in analytics applications, is usually carried out by experienced data engineers employing SQL (in the presence of a relational data warehouse) or Python/R: a procedural approach with all the known drawbacks. Data frames over DVMs are expressed declaratively and visually, via a simple and intuitive tool. This way, non-IT experts can be involved in data framing. In addition, query evaluation takes place within an algebraic framework, with all the known benefits. In other words, a DVM enables the delegation of data engineering tasks to less technical users. We have seen analogous cases in the past, e.g. with the introduction of SQL. Finally, a DVM offers a formalism that facilitates data sharing, data portability and a single view of any entity, because a DVM’s node is an attribute and an entity at the same time. In this respect, DVMs can serve well as a data virtualization technique, an emerging trend in the industry. We argue that DVMs can have a significant practical impact in today’s big data environments.

Short Bio

Damianos Chatziantoniou received his B.Sc. in Applied Mathematics from the University of Athens and continued his studies in Computer Science at the Courant Institute of Mathematical Sciences at New York University (M.Sc.) and at Columbia University (Ph.D.). His academic research interests include big data systems, business intelligence, large-scale analytics and real-time analytics. He is currently a Professor at Athens University of Economics and Business (AUEB) in the Department of Management Science and Technology and Director of AUEB’s international Master’s program in Business Analytics. He also serves as an External Reviewer for the "Business Analytics" and "Machine Learning and Data Sciences" Master’s programs at University College London. Prior to AUEB, Damianos was a tenure-track Assistant Professor at Stevens Institute of Technology in New Jersey, and held research collaborations with AT&T Research and the Columbia Medical Informatics Department. He has published more than 40 articles in top conferences and journals, such as VLDB, ICDE, EDBT, KDD, SIGMOD, CIKM, Journal of Information Systems, Journal of Data and Knowledge Engineering and elsewhere. His research work has influenced Microsoft’s SQL Server (query processor), Oracle’s 8i and 9i Systems (Analytic Functions for OLAP), and the ANSI SQL Standard (OLAP Amendment). Besides academia, he has been involved in several technology start-up companies. Panakea Software Inc. (founder, 1998), based in New York City, developed and marketed BI technology to make certain analytics easier to express and faster to evaluate. VoiceWeb (co-founder, 2001), based in Athens, focused on speech and telecom applications. Damianos also served as a senior research consultant at Aster Data, a pioneer of big data systems.


Hosted by: Themis Palpanas