This is a response to Gwern's Guardian Angels post and draws terms from it quite extensively.
Epistemic status: draft and to-be-updated once I do more experiments.
Suppose you have a Guardian Angel (GA) model and you want to know how well it predicts your knowledge/personality/values/preferences.
However, you don't have an extensive list of personal writings to personalize the model (because you're not one of the most prominent internet writers), and you don't want to do a lot of data labeling/supervision either. How would you train/evaluate the model then?
Motivating examples
We give the GA model a held-out replay of your Twitter browsing session containing 10 tweets (either as text captions or as a stitched-together screenshot). Can the model actually predict which ones you'll click on?
Doing this correctly depends on a lot of personalization! At least for me, I don't click on most tweets on my timeline. What I click depends a lot on my specific, mostly implicit taste, and Twitter's recommender system doesn't seem to grok it well.
You're reading a technical blogpost/paper. Can the model predict which sections you'll read or which concepts you'll look up?
This depends on your knowledge (e.g. are you new to the domain?) and your taste (did you close the tab after reading only the abstract/intro, or did you read it top to bottom?)
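To make the tweet example concrete, here is a minimal sketch of how one held-out session could be scored. The post doesn't fix a metric yet, so everything here is an assumption: the function name `score_session` is hypothetical, tweets are identified by their position in the session, and F1 is just one reasonable choice. Plain accuracy would be misleading, because a model that predicts "no click" for every tweet already looks good when most tweets go unclicked.

```python
def score_session(predicted, clicked):
    """Precision, recall and F1 for one browsing session.

    predicted: indices of tweets the model thinks you will click (assumed format)
    clicked:   indices of tweets you actually clicked in the held-out replay
    """
    predicted, clicked = set(predicted), set(clicked)
    true_positives = len(predicted & clicked)

    # Guard against empty sets so the always-predict-nothing baseline scores 0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(clicked) if clicked else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical session of 10 tweets: the model picks tweets 1 and 4,
    # the user actually clicked tweets 4 and 7.
    print(score_session(predicted=[1, 4], clicked=[4, 7]))
```

Averaging this over many held-out sessions would give a single number to hill-climb, with "predict nothing" and "predict everything" as trivial baselines to beat.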
My plan
I vibe-coded a macOS app that continuously takes 4 fps screencasts alongside a 4 fps webcam feed (for possible eye tracking), plus timestamp-aligned user inputs, and sends them to Cloudflare's S3 equivalent.
So far I have ~8 hours of recordings at roughly 1 GB/hr. Doing this for 1 year comes to <10 TB / <$150 on Cloudflare based on my usage patterns. (The cost goes up linearly if you keep the recordings forever.)
I intend to make a benchmark out of this using a bunch of heuristics, then try a bunch of methods to hill-climb it. Will report back if there are any updates.
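For anyone who wants to redo this estimate with their own numbers, here is a back-of-the-envelope sketch. The hours per day and the price per GB-month are assumptions, not figures from my setup: `usd_per_gb_month=0.015` is a placeholder you should check against current object-storage pricing, and the function name `storage_estimate` is made up for illustration. It also ignores request and egress charges.

```python
def storage_estimate(hours_per_day, gb_per_hour=1.0, days=365,
                     usd_per_gb_month=0.015):
    """Return (total_gb, total_usd) for recording `days` days and keeping everything.

    usd_per_gb_month is an assumed placeholder price, not a quoted rate.
    """
    total_gb = hours_per_day * gb_per_hour * days

    # Data accumulates roughly linearly, so the average amount stored over
    # the period is about half of the final total.
    average_gb_stored = total_gb / 2

    months = days / 365 * 12
    total_usd = average_gb_stored * usd_per_gb_month * months
    return total_gb, total_usd


if __name__ == "__main__":
    # Hypothetical usage pattern: 4 recorded hours per day for one year.
    total_gb, total_usd = storage_estimate(hours_per_day=4)
    print(f"{total_gb:.0f} GB stored, about ${total_usd:.0f} for the year")
```

Once recording stops growing the archive, the monthly bill is simply the total stored times the per-GB price, which is why keeping everything forever makes the cumulative cost keep climbing.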
Other things to read
Michael S. Bernstein's lab at Stanford HCI is thinking about user modeling, among many other related things. (The company Simile.ai also spun out of it.)