Important! The first class will be held on 13 Jan 2026 (Tuesday) 6-9pm at UTown LT-52. All subsequent classes (from 22 Jan 2026 onwards) will be held on Thursdays 6-9pm at LT33 (Block S17).
Course description
Interested in applying your data science skills for the public good? In this course, we will learn how data science and AI can be used to tackle public sector challenges and improve societal outcomes. We start with an overview of Singapore's public sector and the range of policy issues Singapore grapples with, before exploring examples of how data science is used in the public sector. We then dive into three content areas: geospatial data analysis, natural language processing and LLMs, and responsible AI, before sharpening your skills in data science scoping and technical communication. The course culminates in a group project focused on applying your data science skills and knowledge to real problems. Join us on an exciting journey of learning how to use data science for the public good!
What you will learn from this course:
- How data science is used in the Singapore public sector
- Technical knowledge and practical skills for delivering good data science and AI projects
- Working in a team to apply data science to a public policy problem
View AY2024/25 Projects →
Course outline
In the first half of the course, we go through content and skills that will help you with your group project, covering both technical aspects, like geospatial and text data analysis, and crucial soft skills, like technical communication and project scoping. In the second half of the course, you will be given time to focus on your group projects, with optional consultations to help you along the way. The group presentations will be held in the last two weeks of the semester.
- Week 1 (13 Jan 2026): Data science in the Singapore public sector
- Week 2 (22 Jan 2026): Analysing geospatial data
- Week 3 (29 Jan 2026): Analysing text data + introduction to LLMs
- Week 4 (5 Feb 2026): Scoping data science projects + group project kickoff
- Week 5 (12 Feb 2026): AI safety and fairness
We will take a 2-week break for CNY and reading week. Note that the scoping document is due on 20 Feb 2026.
- Week 7 (5 Mar 2026): Technical communication
- Week 8 (12 Mar 2026): To be confirmed
- Weeks 9-11 (19 Mar 2026 - 2 Apr 2026): Virtual consultations (no in-person class)
- Week 12 (9 Apr 2026): Presentations (Problem 1)
- Week 13 (16 Apr 2026): Presentations (Problem 2)
Note: This course outline is mostly confirmed, but may be subject to minor changes.
Course requirements
To do well in this course, students should:
- Have a strong understanding of key machine learning and data science concepts
- Be proficient in programming with R or Python
- Be comfortable with essential development tools (e.g. Git, venv, Docker)
- Have a basic grasp of the Singapore public sector and policy issues
- Be interested in applying data science to public policy problems
Useful readings
To help you prepare for the course, here are some recommended online resources:
Data Science in the Public Sector
- Solving Real World Problems in the Public Service with AI - Chang Sau Sheong
- AI Practice Technical Blog - GovTech Singapore
- AI in Cybersecurity: Fighting scams with AI and overcoming data poisoning - GovTech Singapore
- data.gov.sg - Open Government Products
Geospatial Data Analysis
- Tutorial 1.2 - Spatial analysis with Python - Henrikki Tenkanen
- Mapping Motor Vehicle Collisions in New York City - Todd W. Schneider
- A linguistic streetmap of Singapore - Michelle Fullwood
Natural Language Processing & LLMs
- LLM Course - Hugging Face
- The Illustrated Word2Vec - Jay Alammar
- The Illustrated Transformer - Jay Alammar
- Training language models to follow instructions with human feedback - OpenAI
AI Safety & Fairness
- Responsible AI Playbook - GovTech Singapore
- Machine Bias - ProPublica
- Inside Amsterdam’s high-stakes experiment to create fair welfare AI - MIT Technology Review
About the lecturer
I am a Senior Data Scientist at GovTech Singapore. I'm currently the technical co-lead for the AI Practice's Responsible AI team, which focuses on applied research and experimentation for AI safety, fairness, and robustness. Our team has developed and open-sourced several tools, such as LionGuard for localised safety (see version 1 and version 2), KnowOrNot for out-of-knowledge-base robustness, LLM guardrails for off-topic prompts and system prompt leakage, and benchmarks like MinorBench (child safety) and RabakBench (localised safety). We also have a Responsible AI playbook to improve the public sector's technical understanding of key responsible AI concepts and tools.
Prior to this, I was the data science team lead at the Ministry of Manpower's Co-Lab unit, working on a wide range of data analytics, machine learning, and LLM-related projects to support the ministry's policymaking and operations. Before joining the Singapore government, I was the Lead Data Scientist at Lovelytics in Washington D.C., where I was responsible for the company’s data science projects with external clients.
I graduated from Columbia University in 2019 with an MA in Quantitative Methods in the Social Sciences (Data Science Focus), and from the University of Oxford in 2018 with a BA (Hons) in Philosophy, Politics and Economics.
See my website here for more details.