DSA4264: Sense-making Case Analysis

Course description

Interested in applying your data science skills for the public good? In this course, we will learn how data science and AI can be used to tackle public sector challenges and improve societal outcomes. We start with an overview of Singapore's public sector and the range of policy issues Singapore grapples with, before exploring examples of how data science is used in the public sector. We then dive into three content areas: geospatial data analysis, natural language processing and LLMs, and responsible AI, before sharpening your skills in data science scoping and technical communication. The course culminates in a group project focused on applying your data science skills and knowledge to real problems. Join us on an exciting journey of learning how to use data science for the public good!

What you will learn from this course:

How data science is used in the Singapore public sector
Technical knowledge and practical skills for delivering good data science and AI projects
Working in a team to apply data science to a public policy problem

View AY2024/25 Projects →

Course outline

In the first half of the course, we go through content and skills that will help you with your group project, covering both technical aspects, like geospatial and text data analysis, and crucial soft skills, like technical communication and project scoping. In the second half of the course, you will be given time to focus on your group projects, with optional consultations to help you along the way. The group presentations will be held in the last two weeks of the semester.

Week 1 (13 Jan 2026): Data science in the Singapore public sector
Week 2 (22 Jan 2026): Analysing geospatial data
Week 3 (29 Jan 2026): Analysing text data + introduction to LLMs
Week 4 (5 Feb 2026): Scoping data science projects + group project kickoff
Week 5 (12 Feb 2026): AI safety and fairness

We will take a 2-week break for Chinese New Year (CNY) and reading week. Note that the scoping document is due on 20 Feb 2026.

Week 7 (5 Mar 2026): Technical communication
Week 8 (12 Mar 2026): Building LLM applications with OpenAI by Gabriel Chua, Developer Experience Engineer at OpenAI (Note: OpenAI credits will be provided)
Weeks 9-11 (19 Mar 2026 - 2 Apr 2026): Virtual consultations (no in-person class)
Week 12 (9 Apr 2026): Presentations (Problem 1)
Week 13 (16 Apr 2026): Presentations (Problem 2)

Note: This course outline is mostly confirmed, but may be subject to minor changes.

Course requirements

To do well in this course, students should:

Have a strong understanding of key machine learning and data science concepts
Be proficient in programming with R or Python
Be comfortable with essential development tools (e.g. Git, venv, Docker)
Have a basic grasp of the Singapore public sector and policy issues
Be interested in applying data science to public policy problems

Useful readings

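To help you prepare for the course, the recommended online resources are grouped into four topics: Data Science in the Public Sector, Geospatial Data Analysis, Natural Language Processing & LLMs, and AI Safety & Fairness.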

About the lecturer

I am a Senior Data Scientist at GovTech Singapore. I'm currently the technical co-lead for the AI Practice's Responsible AI team, which focuses on applied research and experimentation for AI safety, fairness, and robustness. Our team has developed and open-sourced several tools, such as LionGuard for localised safety (see version 1 and version 2), KnowOrNot for out-of-knowledge-base robustness, LLM guardrails for off-topic prompts and system prompt leakage, and benchmarks like MinorBench (child safety) and RabakBench (localised safety). We also have a Responsible AI playbook to improve the public sector's technical understanding of key responsible AI concepts and tools.

Prior to this, I was the data science team lead at the Ministry of Manpower's Co-Lab unit, working on a wide range of data analytics, machine learning, and LLM-related projects to support the ministry's policymaking and operations. Before joining the Singapore government, I was the Lead Data Scientist at Lovelytics in Washington D.C., where I was responsible for the company’s data science projects with external clients.

I graduated from Columbia University in 2019 with an MA in Quantitative Methods in the Social Sciences (Data Science Focus), and from the University of Oxford in 2018 with a BA (Hons) in Philosophy, Politics and Economics.

See my website here for more details.
