Hey all — I just submitted my Master’s thesis, and would really appreciate feedback from people thinking about non-task value alignment or LLM transparency metrics.
I explored whether Reinforcement Learning from Human Feedback (RLHF), using a constitutional rubric, could align a language model with transparency as a civic value. The case: Amsterdam’s parking permit system.
Methods:
Model: Mistral-7B-Instruct v3
Training: PPO using a 6-dimensional transparency rubric (Explainability, Completeness, Source Attribution, Procedural Clarity, Accuracy, Limitation Disclosure)
Key Results
The RLHF model beat the SFT baseline with a 61% win rate (p < .001), with the largest gains in Information Completeness (+21.6%) and Source Attribution (+19.5%). Explicit constitutional constraints accounted for 81% of the reward signal, and the transparency gains came at the cost of a 6.6% drop in factual accuracy.
Takeaways
Transparency had to be deliberately encoded rather than left to emerge from training, and the transparency-accuracy trade-off can't be optimized away; it calls for institutional governance and deliberation alongside the technical alignment work.
Would love feedback — especially on:
whether this rubric-based evaluator is generalizable
how others might have measured non-task values like “transparency”
if this work might overlap with existing alignment metric toolkits
I’ll post the paper & code when cleaned up.
Abstract
Large Language Models in government services often lack alignment with democratic values, particularly transparency, risking opaque AI-citizen interactions that undermine public trust. This thesis investigates whether Reinforcement Learning from Human Feedback (RLHF) can systematically embed transparency in government AI systems.
Using Amsterdam's parking regulations as a critical case, we fine-tuned a Mistral-7B model through constitutional RLHF, where a six-dimensional transparency rubric guided Proximal Policy Optimization. This approach was benchmarked against supervised fine-tuning across 300 citizen queries, with an independent LLM evaluating transparency outcomes.
The RLHF model achieved a 61% win rate (p < .001), with substantial improvements in Information Completeness (+21.6%) and Source Attribution (+19.5%). Critically, explicit constitutional constraints dominated the reward signal (81%), suggesting that democratic values require deliberate encoding rather than emergent learning. However, the transparency gains produced a 6.6% decrease in factual accuracy, revealing an inherent tension between comprehensive disclosure and precision.
These findings show that while RLHF offers a viable framework for value alignment in government AI, it cannot resolve fundamental democratic trade-offs through technical means alone. Responsible deployment requires integrating technical alignment with institutional governance and citizen participation, and explicitly acknowledging that competing public values, such as transparency versus accuracy, require democratic deliberation rather than algorithmic optimization.
Keywords: AI governance, transparency, RLHF, constitutional AI, public administration, value alignment
[Figure omitted: overview of the three phases of the approach and their respective elements]
The choice of Reinforcement Learning from AI Feedback (RLAIF) over traditional human feedback was motivated mainly by the need for scalable and consistent evaluation of complex transparency criteria. Government transparency spans multiple dimensions that require systematic assessment, which makes AI-based evaluation well suited to this domain, while acknowledging the limitation that the current study includes no human validation (Adam, 2025).
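To illustrate the aggregation step, the sketch below shows one way the six rubric dimension scores produced by the AI evaluator could be collapsed into a single scalar reward for PPO. Equal weights, the function name, and the toy scores are illustrative simplifications, not the exact training code.

```r
# Simplified illustration: collapse six rubric dimension scores (0-1 each,
# as produced by the AI evaluator) into one scalar reward for PPO.
# Equal weighting is assumed here for illustration only.
rubric_dims <- c("explainability", "completeness", "source_attribution",
                 "procedural_clarity", "accuracy", "limitation_disclosure")

transparency_reward <- function(scores, weights = rep(1 / 6, 6)) {
  stopifnot(all(rubric_dims %in% names(scores)))
  sum(weights * unlist(scores[rubric_dims]))
}

# Toy example: hypothetical scores for a single model response
transparency_reward(list(
  explainability = 0.8, completeness = 0.7, source_attribution = 0.6,
  procedural_clarity = 0.9, accuracy = 0.75, limitation_disclosure = 0.5
))
#> [1] 0.7083333
```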
Dataset Construction and Preparation
A comprehensive knowledge base of Amsterdam parking regulations was constructed through systematic web scraping of official government sources. The scraping pipeline, implemented in R with the rvest package, targeted content from five pages across the English portal (https://www.amsterdam.nl/en/parking/apply-resident-parking-permit/) and the Dutch portal (https://www.amsterdam.nl/parkeren-verkeer/), the latter covering Dutch-only documents (e.g. Blue Zones, parking locations, and new-build project rules).
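A simplified sketch of the scraping step is shown below. The two portal URLs are those listed above; the CSS selectors and the page list are abbreviated for illustration rather than taken from the actual pipeline.

```r
library(rvest)

# Target pages (abbreviated; the full pipeline covers five specific pages)
urls <- c(
  "https://www.amsterdam.nl/en/parking/apply-resident-parking-permit/",
  "https://www.amsterdam.nl/parkeren-verkeer/"
)

pages <- lapply(urls, function(url) {
  Sys.sleep(2)                              # 2-second delay between requests
  doc  <- read_html(url)                    # fetch and parse the page
  text <- doc |>
    html_elements("p, li, h1, h2, h3") |>   # main textual content
    html_text2()
  data.frame(url = url, text = paste(text, collapse = "\n"))
})
corpus <- do.call(rbind, pages)
```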
The scraping process adhered to ethical web-scraping practices, including robots.txt compliance and 2-second delays between requests to minimize server load. Quality assurance included validating content length (minimum 100 characters), verifying parking-related content through keyword matching, and manual review by a native Dutch speaker to ensure accuracy and completeness.
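The automated part of these checks can be sketched as follows; the keyword list is illustrative rather than the exact list used, and the manual review by a native Dutch speaker is not something code replaces.

```r
# Minimal quality checks: minimum content length and parking-related keywords.
# The keyword list below is illustrative, not the exact list from the thesis.
keywords <- c("parkeer", "parking", "vergunning", "permit")

passes_qa <- function(text) {
  nchar(text) >= 100 &&
    grepl(paste(keywords, collapse = "|"), text, ignore.case = TRUE)
}

# Applied to the corpus built in the previous sketch:
# corpus <- corpus[vapply(corpus$text, passes_qa, logical(1)), ]
```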