Moral Machine Project using Postgres, dbt, AWS S3, and Streamlit
- Christoph Nguyen
- Jan 6
Updated: Apr 26
December 2024 | Christopher Nguyen
This project presents a data pipeline designed to align artificial intelligence (AI) systems with human values. The data comes from the Moral Machine Experiment, a platform for gathering human perspectives on moral decisions made by autonomous vehicles (AVs). Participants can visit the Moral Machine website to take the experiment, which presents 13 different ethical dilemma scenarios in which an AV has to make a decision (e.g., hitting a barrier and killing everyone in the car, or hitting an elderly person).
Goal: predict the saved outcome of an AV's decision across various moral dilemmas
Data source
The data can be accessed and downloaded from the OSF homepage as a .csv file. The specific file used is SharedResponseSurvey.csv, which contains the responses of all participants who took the experiment and filled out the survey at the end.
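Before loading the file into Postgres, it can help to look at its structure. The sketch below, assuming standard comma-separated input, streams rows with Python's built-in csv module and maps empty or "NA" fields to None. The column names in the inline sample (ResponseID, Saved, PedPed, Barrier, CrossingSignal, UserCountry3, Review_political) are assumptions based on the published Moral Machine column list, and the sample values are made up. Check both against the real file header.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional

# Tokens treated as missing values (an assumption; adjust to what the file actually uses).
NA_TOKENS = {"", "NA", "NaN"}


def clean_value(raw: Optional[str]) -> Optional[str]:
    """Strip surrounding whitespace and map missing-value tokens to None."""
    if raw is None:  # DictReader fills short rows with None
        return None
    value = raw.strip()
    return None if value in NA_TOKENS else value


def iter_responses(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per CSV row, with missing values normalized to None."""
    for row in csv.DictReader(lines):
        yield {key: clean_value(val) for key, val in row.items()}


# Inline sample standing in for SharedResponseSurvey.csv (assumed columns, made-up values).
SAMPLE = """ResponseID,Saved,PedPed,Barrier,CrossingSignal,UserCountry3,Review_political
r1,1,1,0,0,USA,NA
r2,0,0,1,2,DEU,0.5
"""

for row in iter_responses(io.StringIO(SAMPLE)):
    print(row["ResponseID"], row["Saved"], row["Review_political"])
```

For the real file, pass an open handle instead, e.g. with open("SharedResponseSurvey.csv", newline="") as f, and iterate over iter_responses(f). Streaming row by row avoids holding the whole file in memory.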
OSF homepage: https://osf.io/3hvt2/
Tools:
- Postgres for local data storage 
- dbt to transform data with SQL
- Python and SQL for data handling, feature engineering, and ML modeling
- AWS S3 for cloud storage
- Streamlit, pandas, and Plotly for data visualization and deployment
Data Migration
This project starts by building the data engineering pipeline locally, then adds an ML model to Streamlit for prediction and visualization.
Transformation and data destinations
- Loaded the raw Moral Machine Experiment dataset into a local Postgres database using pgAdmin
- Connected the Postgres database to dbt
- Transformed the data in dbt staging models to clean columns, remove illegal characters and NAs, and change data types
- Created marts to prepare the dataset for ML classification of AV survival outcomes (survival_predictions.sql)
- Trained ML models (trained_model.py) on the survival_predictions table in marts, comparing logistic regression and random forest classifiers from scikit-learn with an XGBoost classifier; selected logistic regression because it had the highest accuracy (0.7012) and an F1 score of 0.69, though additional feature engineering could improve the model
- Uploaded the logistic_regression.pkl model to AWS S3 so the Streamlit app can load it for prediction
- Chose Streamlit Cloud for public deployment because of its simplicity, UI design, and cost-effectiveness
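The model comparison above ranks classifiers by accuracy and F1. As a minimal pure-Python sketch of those two metrics, and of picking the model with the highest accuracy, consider the block below. The labels and predictions are made-up toy values, not the project's results. In practice, scikit-learn's accuracy_score and f1_score compute the same quantities.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:  # no true positives means precision or recall is 0 (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_best(y_true: Sequence[int], predictions: Dict[str, List[int]]) -> str:
    """Return the model name with the highest accuracy, breaking ties by F1."""
    return max(
        predictions,
        key=lambda name: (accuracy(y_true, predictions[name]), f1(y_true, predictions[name])),
    )


# Toy labels: 1 = character group saved, 0 = not saved (made-up values).
y_test = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
toy_predictions = {
    "logistic_regression": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
    "random_forest": [1, 1, 1, 0, 0, 1, 1, 0, 1, 0],
    "xgboost": [0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
}

for name, preds in toy_predictions.items():
    print(f"{name}: accuracy={accuracy(y_test, preds):.2f}, F1={f1(y_test, preds):.2f}")
print("selected:", pick_best(y_test, toy_predictions))
```

Reporting F1 alongside accuracy is useful when saved and not-saved outcomes are imbalanced, because accuracy alone can look good for a model that mostly predicts the majority class.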
The final product shows predictions of saved outcomes for different scenarios, along with data visualizations of saved outcomes by country.
Here is a demo of the Streamlit app predicting the survival rate for a human (hoomans) with a pedestrian present, no crossing signal or barrier, country set to USA, and the political and religious review fields left null. The model predicted roughly a 75% survival rate for this type of scenario.
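A logistic regression outputs a probability, which is how a figure like "roughly 75%" arises. It applies the sigmoid function to an intercept plus a weighted sum of the scenario features. The sketch below uses hypothetical feature names and made-up coefficients purely to show that arithmetic. The trained model's actual features and weights are not given here. With the pickled scikit-learn model, the app would presumably get the same kind of number from its predict_proba method.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def survival_probability(features: Dict[str, float], weights: Dict[str, float], intercept: float) -> float:
    """P(saved = 1) under logistic regression: sigmoid(intercept + sum of weight * feature).

    Features without a weight contribute nothing.
    """
    score = intercept + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(score)


# Hypothetical one-hot scenario features (1 = present), loosely mirroring the demo inputs.
scenario = {"is_human": 1.0, "ped_present": 1.0, "crossing_signal": 0.0, "barrier": 0.0, "country_USA": 1.0}
# Made-up coefficients for illustration only; NOT the project's trained weights.
weights = {"is_human": 0.9, "ped_present": 0.2, "crossing_signal": -0.3, "barrier": -0.4, "country_USA": 0.1}
intercept = -0.3

print(f"predicted survival probability: {survival_probability(scenario, weights, intercept):.2f}")
```

Because the output is a probability rather than a hard label, the app can display a survival rate for each scenario instead of only "saved" or "not saved".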

Future research would include data visualizations on the ethical implications of designing an AV to prioritize certain groups over others.
Pipeline Workflow
This ETL pipeline semi-automates the workflow for turning raw data from the Moral Machine Experiment into machine-learning-ready datasets. The automated parts are generating new tables with dbt and training various machine learning models on those tables. Both the dbt tables and the Python ML models can also be easily reconfigured for different research needs. Additionally, the ML model is stored in AWS S3 so the Streamlit app can load it. However, the workflow has no orchestration, and most data processing steps are triggered manually across the different tools. For these reasons, this is a semi-automated ETL workflow.
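To make the manual hand-offs concrete, here is a hypothetical sketch that writes the same steps down in the order they are triggered. This is not orchestration: it just runs the steps in sequence from one script and, by default, only prints the commands (dry run). It assumes the dbt CLI, the trained_model.py script, and the AWS CLI are available, and that trained_model.py writes logistic_regression.pkl. The S3 bucket path is a placeholder.

```python
import shlex
import subprocess
from typing import List

# Placeholder S3 destination; replace with the real bucket and key.
S3_URI = "s3://example-bucket/models/logistic_regression.pkl"

# Pipeline steps in the order they are currently triggered by hand.
STEPS: List[List[str]] = [
    ["dbt", "run"],  # build the dbt models (staging and marts) in Postgres
    ["python", "trained_model.py"],  # train/compare models; assumed to write logistic_regression.pkl
    ["aws", "s3", "cp", "logistic_regression.pkl", S3_URI],  # upload the model for Streamlit
]


def run_pipeline(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Run each step in order, stopping at the first failure; in dry-run mode only print the commands."""
    executed = []
    for cmd in steps:
        line = shlex.join(cmd)
        print(("[dry run] " if dry_run else "$ ") + line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_pipeline(STEPS)
```

Keeping the steps in a single list makes the manual order explicit, which would also be the natural starting point if an orchestrator were added later.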