The MIT AI Risk Initiative is seeking support to build LLM-augmented pipelines to accelerate evidence synthesis and systematic reviews for AI risks and mitigations. The initial contract is part-time for six months, with the possibility of extension.
The immediate use case is to help build out modules to support our review of global organizations’ AI risk responses, where we identify public documents, screen for relevance, extract claims about AI risks/mitigations, and classify outputs against several taxonomies.
The bigger picture includes generalizing and adapting this pipeline to support living updates & extensions for our risk repository, incident tracker, mitigations review, and governance mapping work.
By contributing your skills to the MIT AI Risk Initiative, you’ll help us provide the authoritative data and frameworks that enable decision-makers across the AI ecosystem to understand & address AI risks.
What you’ll do:
Phase 1: Org review pipeline (Jan–Mar)
Build/improve modules for document identification, screening, extraction, and classification
Build/improve human validation / holdout sampling processes and interfaces so we can measure performance against humans at each step
Integrate modules into an end-to-end evidence synthesis pipeline
Ship something that helps us complete the org review by ~March
Phase 2: Generalization & learning (Mar onwards)
Refactor for reuse across different AI Risk Initiative projects (incidents, mitigations, governance mapping)
Implement adaptive example retrieval
Build change tracking: when prompts or criteria change, what shifts in outputs?
Help us understand where LLM judgments can exceed human performance and thus be fully automated, and what still needs human review (and design interfaces / processes to enable this)
Document architecture and findings for handoff
Required skills
Strong software engineering fundamentals
Hands-on experience building LLM pipelines
Python proficiency
Comfort working on ambiguous problems where "what should we build?" is part of the work
Clear communication with researchers who aren't software engineers
Nice to have
Prior work in research, systematic review, or annotation/labeling contexts
Experience with evaluation/QA/human validation
Familiarity with embeddings + vector search for example retrieval
API integrations (Airtable or similar); extract-transform-load (ETL) or scraping-adjacent work
Read more: https://futuretech.mit.edu/opportunities/ml-engineer---mit-ai-risk-initiative-contractor-part-time-6-months
Express interest:
https://mitfuturetech.atlassian.net/jira/core/form/a35da49a-3ed9-4722-8eda-2258b30bcc29
Please share with anyone relevant.