Theme: Cloud Computing & Data Infrastructures

Large-Scale Data Management
(6 ECTS - 36h)

The ability to process large amounts of data is key to both industry and research today. As computing systems grow, they generate more data that must be analyzed to extract knowledge.

Data management infrastructures are growing fast, leading to the creation of large data centers and federations of data centers. Storing and processing data in this context requires suitable software infrastructures. Big Data software systems are built to exploit a large set of distributed resources to process massive amounts of data efficiently, while coping with the failures that are frequent at such a scale.

Prerequisites

Relational databases, Programming

Contents

The course is divided into two parts: (1) principles and systems for large-scale data management, and (2) advanced data management topics.

Principles and Systems for Large-Scale Data Management

Through lectures and practical sessions, this part provides an overview of the software systems that are used to store and process data at large scale. The following topics will be covered:

  • Map-Reduce programming model
  • In-memory data processing
  • Stream processing (data movement and processing)
  • Large-scale distributed data storage (distributed file systems, NoSQL databases)

The challenges associated with performance and fault tolerance will also be discussed.
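
As a small illustration of the first topic in the list above, the Map-Reduce programming model can be sketched in plain Python. This is a single-machine sketch for intuition only, not how any particular Big Data system implements it; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is mapped independently; in a real system this step
    # would run in parallel on many machines.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
counts = word_count(["big data systems", "big data storage"])
print(counts)
```

Because the map step treats each document independently and the reduce step treats each key independently, both steps can be distributed across machines, which is where the performance and fault-tolerance questions arise.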

Advanced Data Management Topics

This part covers advanced topics in modern DBMSs, including indexing, query optimization, and transaction management. It also covers data warehouse systems and the differences between OLTP and OLAP systems, from cube data models to their implementation in a DBMS.

Practical labs will be organized to illustrate the concepts.
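
To give a flavor of the cube data model mentioned above, the following Python sketch shows a roll-up, the typical OLAP operation of aggregating a measure along a subset of dimensions. The fact table, the dimension names, and the roll_up helper are all hypothetical and chosen only for illustration; a real data warehouse would express this in SQL or a dedicated engine.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, year, sales amount).
facts = [
    ("EU", "laptop", 2023, 100),
    ("EU", "phone", 2023, 50),
    ("US", "laptop", 2023, 70),
    ("US", "laptop", 2024, 30),
]
DIMENSIONS = ("region", "product", "year")


def roll_up(rows, keep):
    """Sum the sales measure, grouping by the dimensions listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


by_region = roll_up(facts, ["region"])
by_region_product = roll_up(facts, ["region", "product"])
grand_total = roll_up(facts, [])
```

An OLTP workload would instead read or update individual rows of such a table, whereas the OLAP queries above scan many rows to compute aggregates.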

Lecturers