GEST-S420

Advanced Data Science and Machine Learning

academic year
2024-2025

Course teacher(s)

Pierre DEVILLE (Coordinator)

ECTS credits

5

Language(s) of instruction

French

Course content

Data are everywhere. They pervade our world. Most of our daily routines now leave digital traces: sending emails, visiting a website, calling a friend, posting content on a social network, or using a loyalty card at the supermarket. In addition, the last few years have seen an explosion of connected devices, including smartphones of course, but also more specialised devices in areas such as health or home automation. The number of connected devices generating, collecting and sharing data could reach 75 billion in 2025. Other figures related to this exponential growth are staggering: an astonishing 90% of the world's data has been created in the last two years alone. The volume of generated data more than doubles every year; by 2023, the digital data universe was projected to reach approximately 100 trillion gigabytes, and the big data market a value of approximately $200 billion.

At the same time, our capacity to collect and store those data is also increasing extremely fast, meaning that we are now able to keep track of this huge amount of information. This spectacular growth in digital information is what we call Big Data: data that we generate and acquire far faster than we can process, analyse and exploit them.

Indeed, despite this flood of digital traces, few initiatives have succeeded in efficiently leveraging them at scale to address the many challenges facing the information business. This is partly due to the rise of ever-growing volumes of unstructured data such as images, videos or texts (which represent more than 80% of the generated data) and to the inadequacy of traditional approaches for managing and analysing such data.

Consequently, new concepts and approaches aimed at resolving these issues have been introduced over the last few years. Numerous marketing buzzwords, perhaps too many, have been used to describe this new analytics paradigm: Artificial Intelligence, Advanced Analytics, Machine Learning, Big Data Analytics, Deep Learning, Natural Language Processing or Cognitive Computing.

Given the rapid emergence of such concepts and the attention they receive in industry, but also the apparent complexity and confusion they may bring, we believe it is crucial to provide a better understanding of Artificial Intelligence in the context of Big Data. In this course, we will take a deep dive into some of the cutting-edge algorithms and analytical engines within AI, as well as into how large-scale data can be stored and managed.

More concretely, the course will be structured around four main modules, each containing a mix of theory and hands-on exercises. We will also engage with external experts from various industries.

Module 1 - NoSQL and distributed systems

Principles and advantages of non-relational and distributed database architectures, with a practical introduction to MongoDB through Python.

Module 2 - Recommendation Engine

Deep dive into the different types of Machine Learning algorithms used for recommendations, from content-based methods to collaborative filtering and matrix factorisation.

Module 3 - Image Recognition

Introduction to neural networks and the deep learning paradigm in the context of image processing, ranging from simple perceptron models to convolutional neural networks.

Module 4 - Natural Language Processing

This module will focus on the analysis of large-scale textual data, ranging from basic text-processing principles to learning semantic representations and advanced speech processing.

Objectives (and/or specific learning outcomes)

  • Understand the opportunities, challenges and limits associated with Artificial Intelligence in the context of large-scale data
  • Manage and exploit different types of large-scale data in the context of NoSQL databases
  • Discover, understand and exploit cutting-edge Artificial Intelligence algorithms to solve practical business problems

Prerequisites and Corequisites

Required and Corequired knowledge and skills

  • Background in Python
  • Background in basic principles of Analytics

Teaching methods and learning activities

30h of modules (theoretical introduction and hands-on sessions in Python notebooks)

6h of external speakers

2 group projects (groups of 3 students)

Other information

Contacts

pierre.deville at ulb.be

Campus

Solbosch

Evaluation

Method(s) of evaluation

  • Other

Other

Group Projects
Written Exam

Language(s) of evaluation

  • English

Programmes