[ Question ]

Value of building an online "knowledge web"

by Joe Kwon | 1 min read | 1st May 2020 | 8 comments

World Modeling
Personal Blog

Please excuse my naivety; I would really like to hear from more knowledgeable people about this.

I recently discovered a note-taking website called Roam, which lets you create pages, take bullet-point notes in those pages, and put double brackets around words and phrases to create a new, bidirectionally linked page. When you do this, you can see all the connected pages in a visual: a graph where each node is a page. I think this tool is valuable because it allows me to externalize the connectedness of ideas and concepts with clarity.
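To make the double-bracket mechanism concrete, here is a minimal sketch of how such links could be turned into a graph. It is an illustration only, not Roam's actual implementation; the `build_link_graph` function and the sample pages are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[Page Title]] and captures the title (no nested brackets).
LINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def build_link_graph(pages):
    """pages maps a page title to its note text.

    Returns (links, backlinks): for each page, the set of pages it links to,
    and the set of pages that link to it.
    """
    links = defaultdict(set)
    backlinks = defaultdict(set)
    for title, text in pages.items():
        for target in LINK.findall(text):
            links[title].add(target)
            backlinks[target].add(title)
    return links, backlinks

# Hypothetical sample notes.
pages = {
    "AI Safety": "Related: [[Value Alignment]] and [[Theory of Mind]].",
    "Value Alignment": "Builds on [[Theory of Mind]].",
}
links, backlinks = build_link_graph(pages)
```

Here "Theory of Mind" has no page text of its own yet, but it still becomes a node with two backlinks, which is roughly the behavior described above: bracketing a phrase creates the page.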

I'm wondering why a tool like this hasn't been popularized, especially as a way to onboard people who want to learn about a topic quickly. I want to know what I'm overlooking. It seems to me that having an organization of notes like this, with the most important TL;DRs and links to papers and websites for those interested in learning more, would save a lot of time and help create an understanding of how various ideas relate. Additionally, wouldn't a structure like this help with understanding how to build AI from an engineering perspective, by breaking down the desired functionalities into sub-topics, having an idea of how to put certain functionalities together, etc.?

I'm an undergrad who learned about AI Safety through my school's EA group. I've spent a lot of time in the past year trying to learn about this space and how I can contribute to it (from a cogsci -> theory of mind -> utility inference / value alignment perspective). A nontrivial amount of that time was spent finding and identifying good information from various sources (mainly LW/AN/80k/FHI/OpenAI pubs/DeepMind pubs...) while attempting to piece together how the subtopics relate.

In a sense, I found even the very roughly structured bibliography by CHAI (https://humancompatible.ai/bibliography) to be my main source of understanding how I could categorize the different work and ideas in AI Safety. Wouldn't building up a knowledge web on a platform like this be valuable in allowing a faster onboarding process, and in making the field more accessible to people? I imagine it would be a lot easier for aspiring independent researchers without heavy technical and research backgrounds (like myself) to get caught up and work on projects. Those who aren't in an organization like FHI/OpenAI/MIRI/etc. would have more room and structure for finding plausible research projects and working on subproblems in AI Safety if they had a better sense of what they can effectively contribute to.


Thank you for reading!

4 Answers

I have not found an interlinked knowledge graph to be a very effective teaching tool, nor a very effective learning tool. In some sense, this has already been created with the internet, Wikipedia, etc. There are various interlinked knowledge graphs available, like The Climate Web ( https://app.thebrain.com/brains/2cfec560-6321-4d32-a42f-f04ad33f1092/thoughts/d9fcbf68-f835-517d-aa79-fe76f30cca47/notes ).

I've tried various experiments with teaching using them, including giving them as resources to my students, presenting workshops from an interlinked knowledge graph which I could navigate if there were questions, and creating a crowdsourced one for a class.

The key problem with all of them is the lack of a linear learning path. Interlinked graphs of knowledge are excellent "What" resources (https://www.lesswrong.com/posts/oPEWyxJjRo4oKHzMu/the-3-books-technique-for-learning-a-new-skilll#The__What__Book), but they are horrible "How" resources ( https://www.lesswrong.com/posts/oPEWyxJjRo4oKHzMu/the-3-books-technique-for-learning-a-new-skilll#The__How__Book): there are too many paths, and it's too easy to get lost. You need an expert to show you what's important, what depends on what, and what would be most easily digested first.
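One way to picture the "what depends on what" part: if an expert records prerequisites explicitly, a single linear path can be derived mechanically with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the topics and the prerequisite map are made up for illustration, and the hard part (knowing the right dependencies and what matters) still has to come from the expert.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: topic -> set of topics to learn first.
prerequisites = {
    "vectors": set(),
    "matrices": {"vectors"},
    "linear maps": {"matrices"},
    "eigenvalues": {"linear maps"},
}

# static_order() yields every topic after all of its prerequisites.
learning_path = list(TopologicalSorter(prerequisites).static_order())
print(learning_path)
```

When several orders are valid, the sorter just picks one of them, so this gives a consistent path but not necessarily the most digestible one.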

I've been interested in this area for the last couple of years as well. Surprisingly, until very recently I had not found https://arbital.com/, which has got to be the closest thing to what would be ideal.


The main problem seems to be the amount of refactoring/reframing that can be done. As mentioned in @__nobody's answer, there is a fundamental problem in naming and concept drift. I say fundamental because it has become my belief that, on a practical level, defining words is essentially done at the community level, not by an individual. Coming up with narrative learning paths also requires authorship, and the time spent needs to be compensated, if not monetarily then at minimum socially.


https://arbital.com/ seems like an almost exact solution, although strategically it might work better as a browser extension symbiotic with existing social networks, as creating a new social network in this day and age is an immense task. This is probably one of the leading reasons for its abandonment: https://www.lesswrong.com/posts/kAgJJa3HLSZxsuSrf/arbital-postmortem
I'm incredibly sad such a network/tool does not exist currently. It seems the time is ripe for a place where the 'game' is to not only aggregate/decide/distribute knowledge, but also turn on itself and thus be able to aggregate/decide/distribute the best ways of aggregating/deciding/distributing knowledge.

I haven't created an account on their page, so this is based purely on what I'm seeing in the example collection / demo videos. It looks broadly similar enough to what I've been building/using over the last few years[1] that I think a summary of my experiences with my own tool, and of the features that you will need, might be useful:

In short: It looks awesome as long as you have only small amounts of content – and I think it may actually be awesome in those cases: For almost all my projects, I'm creating separate(!) maps/webs and collecting todos and their states to get a visual overview, and these are also useful to get back into the project if I come back months or years later – so they'd probably also help other people too. But… as things grow, it will get very confusing. (Unless you add lots of functionality into the tool and put in extra effort specifically to counter that.) All attempts to collect all my projects / ideas in a single big map have failed so far… That said, I haven't given up yet and am still trying to make it work.


Here's how things have been breaking down for me:

Naming things is hard, and over time you'll pick subtly different names (unless you have a fast way to look up what you called the thing a couple of months ago), and then spelling variants will point to different pages and the web breaks down… Also, there will be drift / shift in meaning or goals – you explore a new sub-topic and suddenly it looks like a good idea to rename / move a page or even a whole category, in sync, across all pages. Without tool support for both of these, things will fail. (p=1.0, N=3; after adding fast search (not just prefix-based like autocomplete!), the 4th map/web died of ontology instead of spelling variants and the 5th isn't dead yet…) Name changes are made simpler if you have page IDs that are independent of the name. (Roam actually has that, too – so these two shouldn't be a problem. A difference is that they're using auto-generated IDs whereas I'm manually choosing mine and using them to encode hierarchy or other stuff that I want in there… I personally find that useful (as long as you don't over-ontologize), but YMMV…)
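As a rough illustration of those two kinds of tool support, here is a minimal sketch of pages keyed by stable IDs plus a fuzzy (not prefix-based) title search. This is not how my script or Roam does it; the `Web` class and the IDs are hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for whatever search a real tool would use.

```python
import difflib

class Web:
    """Pages keyed by a stable ID; the title can change without breaking links."""

    def __init__(self):
        self.titles = {}  # page ID -> current title
        self.links = {}   # page ID -> set of linked page IDs

    def add(self, page_id, title, links=()):
        self.titles[page_id] = title
        self.links[page_id] = set(links)

    def rename(self, page_id, new_title):
        # Links store IDs, so nothing else has to be rewritten.
        self.titles[page_id] = new_title

    def find(self, query, cutoff=0.6):
        """Return (page ID, title) pairs whose title resembles the query."""
        by_lower = {title.lower(): pid for pid, title in self.titles.items()}
        hits = difflib.get_close_matches(query.lower(), list(by_lower), n=5, cutoff=cutoff)
        return [(by_lower[hit], self.titles[by_lower[hit]]) for hit in hits]

web = Web()
web.add("math.linalg", "Linear Algebra")
web.add("proj.notes", "Roam notes", links={"math.linalg"})

print(web.find("linear algebrra"))       # a misspelling still finds the page
web.rename("math.linalg", "Linear Algebra (LA)")
print(web.links["proj.notes"])           # the link survives the rename
```

Manually chosen IDs like `math.linalg` encode hierarchy, as described above; auto-generated IDs would work the same way for renames.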

Next is unweighted linking of everything: If you see all the gazillion uses, sub-topics, … of linear algebra when you look at that page, that's not much better than seeing none of them. (Same for visual graphs.) Automatic sorting (or even just highlighting) by centrality / importance or, failing that, manually curated sub-lists are a necessity. Related to that is not being able to hide stuff outside the local project/topic: Separate small maps / collections, that can still refer to each other but are largely independent, seem to work better than having everything in one big blob. (Technically, that's basically equivalent to assigning a unique name/ID prefix to each project, but having to manually add that everywhere is draining. It's bad enough that I started working on splitting things up, in spite of all the new problems that brings up… To name just one, things aren't truly separate at the mechanical level, as you'll still have to rename/refactor in sync.) Also, graphs can pick a better layout if they're not constrained by the placement of nodes that you don't want to see anyway.
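For the sorting-by-importance point, a very simple stand-in for "centrality" is the number of pages that link to a page (its in-degree). The sketch below ranks a page's outgoing links that way; the function name and the sample link map are hypothetical, and a real tool might prefer something like PageRank or manual weights.

```python
from collections import Counter

def rank_neighbors(links, page, top=3):
    """Order the pages linked from `page` by how many pages link to them overall."""
    in_degree = Counter(target for targets in links.values() for target in targets)
    # Most-linked first; ties are broken alphabetically for a stable result.
    ranked = sorted(links.get(page, ()), key=lambda t: (-in_degree[t], t))
    return ranked[:top]

# Hypothetical link map: page -> pages it links to.
links = {
    "linear algebra": {"eigenvalues", "determinants", "trace"},
    "pca": {"eigenvalues"},
    "quantum mechanics": {"eigenvalues"},
    "volume": {"determinants"},
}
print(rank_neighbors(links, "linear algebra", top=2))
```

Showing only the top few entries (and hiding the rest behind a "more" list) is one way to keep a hub page readable.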

Beyond that point, I don't know yet… the last two are still only partially implemented. (And I'm not seeing anything like these in Roam… so you'll probably run into problems there eventually.) So far, it looks like that might be enough and from that point on experience in steering / organizing the thing becomes more important.[2]


On graphs: What I'm seeing in Roam is… underwhelming. (Same for the example linked by mr-hire.) Unless I've looked at bad examples, the graphs that Roam gets you are a fixed grid-based unordered (i.e. not ordered to minimize total edge lengths or something like that) mess and the only thing you have is that you can click a node and see its immediate neighbors highlighted? (Or the per-page graphs are essentially a linear list, presented as a very wide 1-deep "graph"?)

If that's what you're working with, then of course that won't be an effective learning tool. When I read you talking about the "clarity" of seeing the connectedness I thought what you'd get from Roam is much closer to something like this:

[Image: sample graph]

This is a small-ish part of an older map of mine (with node labels censored)… rectangular nodes and the wide blue arrows show hierarchy, dotted arrows (not sure if there are any…) are references/mentions, solid arrows are dependencies. Green stuff is done, light beige stuff is dead/failed. The bright orange bubbles are temporary(!) todo markers – what do I want to achieve / understand now? The fat / colored bubbles are the response (a gradient from bright red thru purple and blue fading to white) – this is what you can work on next and how important it is. (Thin / gray nodes are off the active paths and can probably be ignored, nodes on the paths with open dependencies also show the max. flow through them in their border color so you can decide to cut further dependencies and bodge something in place of that sub-tree.) As you can mark multiple things at the same time (also with different weights/priorities), you can not only get local "to get X, you'll need A, B, C, …" information (plus information on the branching and where it might make sense to cut…) but you can also get "broad field" information. ("X, Y, Z all indirectly/weakly depend on A", so working on A might simplify work on all of these – something that you might not notice if you look at X, then Y, then Z one after another…)
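A much-simplified sketch of the "mark goals, get back what to work on next" idea described above. It is not the actual script: importance here is just the sum of the weights of all marked goals that depend on a node (directly or indirectly), rather than the flow-based coloring mentioned above, and all names are hypothetical.

```python
def next_steps(deps, done, goals):
    """deps: node -> set of nodes it depends on; done: set of finished nodes;
    goals: marked node -> weight.

    Returns [(node, importance)] for open nodes whose dependencies are all done,
    most important first.
    """
    importance = {}
    for goal, weight in goals.items():
        # Walk everything the goal still depends on, skipping finished nodes.
        seen, stack = set(), [goal]
        while stack:
            node = stack.pop()
            if node in seen or node in done:
                continue
            seen.add(node)
            stack.extend(deps.get(node, ()))
        for node in seen:
            importance[node] = importance.get(node, 0) + weight
    ready = [(node, score) for node, score in importance.items()
             if all(dep in done for dep in deps.get(node, ()))]
    return sorted(ready, key=lambda item: (-item[1], item[0]))

deps = {"X": {"A", "B"}, "Y": {"A"}, "Z": {"A", "C"}}
steps = next_steps(deps, done={"C"}, goals={"X": 1, "Y": 1, "Z": 1})
print(steps)
```

With X, Y and Z all marked, A comes out on top because all three depend on it, which is the "broad field" information that is easy to miss when looking at one goal at a time.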

If you have something like that – where you can ask questions and get responses based on topical dependencies and what you already know, and iterate those questions – I think that can be an effective learning tool. But I don't know for sure yet, this thing doesn't work well enough yet to manage large amounts of nodes/information… (My gut feeling currently says that you want to split things very fine-grained – every concept / theorem is its own node, so that you can mark them as done independently – and then when you work through, say, linear algebra, that'll be a lot of nodes. Not there yet, still not enough hiding/filtering…)

(Another thing I'm undecided on is whether I want/need "xor nodes" – "to get X, you'll need A or B or C; not A and B and C" – that might allow much fancier optimization but it also takes agency/information away from you and I'm not sure that you'd get the best decisions out of that, especially if map information is partial/wrong/incomplete and all that…)


As for why this isn't a thing yet… I guess it's (a) hard to make something that actually works well and lots of people try a bit and give up and then others see all those bad examples and conclude it can't work, so fewer people actually really try? Also, if you finally had the tooling, (b) getting it right would involve a lot of data entry, and you'd need more experience than you'd need to edit Wikipedia (I suspect closer to something like Wikidata) but it'd be a lot less work to just write a linear textbook and be done with it. And (c) machine learning can help with the tagging involved in (b) and simplify that down to Wikipedia-level, but the fancy stuff has only happened in the last couple of years (and it'd be even more work on the tooling side)… so it's possible that something exists somewhere that works fairly well, but it's still rather unlikely?


[1] It's a terrible terrible terrible "organically grown" ~1.5KLoC Lua script (about 20% of that in a single function…) that, via Graphviz and Pandoc, generates (static) colorful graphs and HTML pages. Primary focus is tracking / prioritizing todos and projects, but that includes learning new stuff and recording knowledge. (Especially when approaching new areas of math, you tend to get loops (to understand A, you want to understand B, which can be done via C, which relies on A…) and one of the tasks will be to break up those loops… None of the existing tools I found were able to record and work with that. So that's how this started…)

[2] E.g. for my planning thing I now have broad categories like "knowledge" (static, non-actionable, non-directed), "projects" (active, non-actionable, directed / actively carving actionable todos), "farts" (inactive projects, ideas to do stuff, …), "beacons" (fuzzy long-term goals / directions to move in, grouping many of the projects), "spikes" (actionable project carvings) and specific (sub-)projects/tasks are repeatedly moving between these – 'projects.foo' grows a spike somewhere in its text, it moves to 'spikes.foo.how_random_is_enough', if I stop working on it it moves back into the project (and once restarted back to spikes…), and when done it gets a write-up and moves to 'knowledge.foo.random.spike' (for archival purposes), plus extra nodes like 'knowledge.foo.random.lcg_is_not_enough', 'knowledge.foo.random.pcg_works', … (for fast knowledge access). So far, this seems to finally reach the point where it starts to work… for me.
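On the dependency loops mentioned in footnote [1]: a minimal sketch of how a tool could at least detect them, using `graphlib` from the Python standard library (3.9+). The `find_loop` helper and the sample map are hypothetical, and this only reports one loop; deciding where to break it is still up to you.

```python
from graphlib import CycleError, TopologicalSorter

def find_loop(deps):
    """Return one dependency loop as a list of nodes, or None if there is none.

    deps maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(deps)
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
    except CycleError as err:
        # args[1] is the cycle as a list whose first and last node are the same.
        return err.args[1]
    return None

# To understand A you want B, which can be done via C, which relies on A.
deps = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
loop = find_loop(deps)
print(loop)
```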

I tried to use Roam about a month ago. I like the idea, but Roam has bugs when I use it from my PC, and it wasn't working on my laptop at all. It is just not ready enough to be worth investing time in.