Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

course_projects

Coin Counter: Deep Learning for Robust Multi-Class Coin Detection and Classification

This project explores multiple deep learning models (CNNs, ResNet50, and Vision Transformers) for classifying coins in images. We use advanced segmentation techniques, custom over-segmentation filtering, and data augmentation to improve generalization. Developed for the EPFL IAPR course, this system demonstrates competitive accuracy and robustness for automated coin recognition tasks.

Download Slides

Distributed Movie Recommendation Pipelines with Apache Spark

This project builds a full-scale movie recommendation system using Apache Spark, incorporating data analytics and keyword-based filtering. Implemented on the MovieLens dataset, the system supports efficient data preprocessing, incremental rating updates, and personalized movie recommendations through LSH and predictive models.

NameCoin on Peerster: A Blockchain-Based Decentralized DNS Implementation

This project explores the design and implementation of a decentralized DNS system using blockchain and a gossip-based peer protocol. The system supports secure domain registration, updates, transfers, and resolution with robust anti-entropy synchronization and Proof-of-Work consensus. The project evaluates network resilience, consensus reliability, and mining efficiency.

Download Report

Multimodal Modeling of Entrepreneurial Teams: Predicting Opportunity Generation from Audio and Text

This project combines personality and emotion detection with machine learning to predict the number of ideas generated by entrepreneurial teams. We process multimodal data (transcripts and audio) to extract MBTI traits, emotional profiles, and speaker features, and use these to model team-level idea generation. This work advances behavioral modeling in early-stage startup evaluation.

Download Report

Command-Collecting Robot: Embedded Systems Project for Restaurant Automation

A robotic system developed for real-time order collection in a restaurant environment, combining line-following, object detection, and GUI-based control. The robot uses image processing, multithreading, and Bluetooth communication to collect orders from tables and transmit them for analysis and optimization.

Download Report

journal

Speeding Up Graph Similarity Matching with Efficient Tensor Ops

The graph similarity algorithm for matching image-text graph pairs was too slow, particularly in the pairwise comparison step.

Reducing Padding Overhead with Sequence Bucketing

Group similar-length samples to minimize VRAM waste and stabilize throughput in NLP tasks.

Resolving OOM in PPO/GRPO with Large Models

PPO and GRPO training with models larger than 7B parameters caused OOM errors on A100 GPUs due to multiple full model replicas. This post details optimization strategies to fix them.

Speeding Up Distributed Training with vLLM, Flash Attention, and Checkpoint Resuming

Improving distributed training speed using vLLM, Flash Attention, LoRA, gradient checkpointing, and stable checkpoint recovery across multi-node systems.

Scaling Data Mining with API Efficiency Under TPM Limits

Efficiently mining structured text or graphs using GPT-4 APIs while staying under a 2M tokens-per-minute (TPM) limit.

Fixing Mixed Precision Underutilization for Speed Gains

Correctly configuring AMP and autocast led to 2× faster training on NVIDIA GPUs.

Speeding Up Evaluation with Cached Tokenization

Avoiding redundant tokenizer calls accelerated validation by up to 3× during fine-tuning.

publications

Generative Approaches to Kinetic Parameter Inference in Metabolic Networks via Latent Space Exploration

Published in bioRxiv, 2025

We present a novel generative framework that leverages latent space exploration to generate dynamic metabolic models with targeted properties. This work introduces a new approach to controllably infer kinetic parameters in large-scale biological systems using pretrained neural network generators such as REKINDLE and RENAISSANCE.

Download Paper

research_projects

Unified Graph-Based Matching: Text-Image Cross-Modal Retrieval using Scene Graph Alignment for Remote Sensing Applications

3 minute read

We introduce a unified framework for cross-modal retrieval in remote sensing, aligning scene graphs from satellite imagery and text descriptions. Using the STAR dataset, our method encodes graph structures from both modalities and aligns them via contrastive learning. We evaluate multiple similarity strategies—node, edge, global, and hybrid—and propose a benchmark protocol for retrieval. This work opens up new directions for structured vision-language understanding in geospatial domains.

Prompting Beyond Retrieval with GRAD: A Generative Retrieval-Aligned Demonstrator for Robust Few-Shot Reasoning

2 minute read

This project was conducted at DLab, EPFL. We propose GRAD: a generative, retrieval-free demonstration generator for LLMs. GRAD tailors concise, input-specific prompts to improve multi-step reasoning under strict token limits. Unlike RAG, GRAD requires no external retrieval and adapts across out-of-distribution (OOD) domains. Trained only on math data, it generalizes to OOD tasks in physics, chemistry, and CS. It enables scalable, low-cost few-shot learning in resource-constrained settings. This work has been submitted to EMNLP 2025. The code repository will be made public upon acceptance.

Kinetic Parameter Inference in Metabolic Networks via Latent Space Exploration

1 minute read

Published: April 05, 2025

We present a novel framework to interpret and control the latent spaces of generative neural network models for kinetic metabolic modeling. By perturbing structured latent spaces learned via REKINDLE or RENAISSANCE, our method generates new dynamic models with targeted properties such as specific response times, regulatory bottlenecks, or alternative physiologies, unlocking deeper insight and reusability across metabolic contexts.

Download Paper

GemmaEdu: Enhancing Scientific Learning via Fine-Tuned Language Models and RAG

1 minute read

We developed an educational chatbot built on the quantized Gemma 2 7B model, optimized with Direct Preference Optimization (DPO) and enhanced with Retrieval-Augmented Generation (RAG). By leveraging fine-tuning on student-generated preference data and incorporating relevant external documents, our system significantly improves accuracy in answering STEM multiple-choice questions, outperforming baseline models like Mistral and Llama2.

Download Report

From Novice to Expert: Dimensionality Reduction and Policy Distillation in Reinforcement Learning for Motor Control

2 minute read

This project investigates how to accelerate motor skill acquisition in reinforcement learning using curriculum-based learning, dimensionality reduction, and policy distillation. Using the MyoSuite Baoding balls task, we explore how expert policies can be transferred to novice agents via PCA-reduced feature and action spaces, offering an efficient alternative to prolonged training times.

Download Report

Learning-Based Multi-Robot Lane Navigation: Scalable Trajectory Prediction using Neural Networks

1 minute read

This project was conducted at DISAL, EPFL. We explore trajectory generation for multi-robot navigation using neural networks. We propose a scalable alternative to Webots simulation by training models using graph neural networks, reinforcement learning, and imitation learning. The final approach produces accurate trajectories in a lane-based environment, balancing precision and efficiency in robotic control.

Download Report

work

AI Research Intern — AXA Group Operations

I led applied research and prototyping efforts in multimodal AI, focusing on cross-modal representation learning, graph-based embeddings, and neural search systems. I developed scalable pipelines to generate scene graphs from satellite imagery and knowledge graphs from textual data, and to align their graph embeddings in a shared representation space.

Machine Learning Intern - Pixalione

Developed a machine learning pipeline to forecast daily ad spend on Google Ads based on client-specific campaign data. Deployed a web backend for dynamic budget strategy adjustment, automated alerts, and integration with Azure Cloud infrastructure.

Student Assistant — EPFL

During my studies, I served as a teaching assistant for multiple courses, assisting in lectures, labs, and tutorials.

Oussama Gabouj
