I agree that formatting abstracts as single paragraph blocks is surprisingly bad for comprehension; I think it is because abstracts are deceptively difficult for the reader, as they tend to invoke a lot of extremely novel & unusual keywords/concepts and make new claims within the space of a few sentences (not infrequently dumping in many numbers & statistical results into parentheticals, which might have a dozen stats in less space than this), and that they are deceptively easy for the authors to read because they suffer from the curse of expertise. Once the reader has paid the cognitive tax of recalling and organizing all the concepts, then suddenly the abstract stops being so confusing.
Introspecting the experience, it feels as if the lack of explicit keywords like 'Results:', or their equivalent paragraph-breaks, is 'the straw that breaks the camel's back'. It's not that it is inherently difficult to understand a single run-on paragraph, it's that it is an extra burden at the worst possible time. (The same run-on paragraph would be read effortlessly a few paragraphs later, after much of the terminology has been introduced.)
I have sometimes tried to read a single-paragraph abstract, found my eyes glazing over as I lose track of the topic amidst the flurry of jargon (is this sentence part of the intro or is it methodology or...), and had to force myself back to the start, read it sentence by sentence, and wait for my understanding to catch up, at which point the abstract suddenly makes sense and I feel a bit frustrated with myself. (As a generalist, I read all sorts of abstracts and have to pay the 'abstract tax' each time, so I've been sensitized to the ways in which, say, CS & math abstracts tend to be much harder to read than explicitly-standardized keyworded medical abstracts reporting a clinical trial, with machine learning abstracts intermediate because they usually follow the standard organization but without keyword markers.)
This is also why it is so painful to read a series of 1-paragraph abstracts: you are being slammed in the face repeatedly by ultra-dense prose which rubs salt into the wounds by removing the typographical affordances you have been trained to expect.
What I do on Gwern.net is:
paragraphizer.py
GPT-3 API script which runs automatically on new annotations: if there are no newlines in the abstract, it calls the API with the abstract, asks it for a newline-split version, and if the new version with the newlines removed == old version, returns it. It often fails, and I'm not sure why, because the task seems semantically quite simple. Probably the prompt is bad or I don't use enough shots. (If it fails, or the abstract contains any block element other than <p> - lists, blockquotes, tables etc - it prints out a warning so I will go and manually insert newlines.) I always find the processed versions to be much more readable than the originals, and I hope it helps readers navigating a sea of references.
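The core of it is roughly the following sketch (not the actual paragraphizer.py - the prompt wording, helper name, and token budget are illustrative, and it assumes the pre-chat openai Python library with text-davinci-003):

```python
# Illustrative sketch of the paragraphizer logic, not the real paragraphizer.py.
# Assumes the pre-ChatCompletion `openai` Python library and text-davinci-003.
import openai

PROMPT = ("Split the following scientific abstract into paragraphs "
          "(background, methods, results, conclusion), changing nothing else:\n\n")

def paragraphize(abstract: str) -> str:
    """Return a newline-split version of `abstract`, or the original on failure."""
    if "\n" in abstract:
        return abstract  # already has paragraph breaks; nothing to do
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=PROMPT + abstract,
        max_tokens=2048,  # generous budget; the output should be about as long as the input
        temperature=0,
    )
    split = response.choices[0].text.strip()
    # Accept the result only if stripping the new newlines reproduces the original,
    # i.e. the model added breaks but did not rewrite anything.
    if split.replace("\n", " ").split() == abstract.split():
        return split
    print("warning: paragraphizer failed; insert newlines manually")
    return abstract
```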
Have you considered switching to GPT-3.5 or -4? You can get much better results out of much less prompt engineering. GPT-4 is expensive but it's worth it.
It's currently at -003 and not the new ChatGPT 3.5 endpoint because when I dropped in the chat model name, the code errored out - apparently it's under a chat/ path and so the installed OA Py library errors out. I haven't bothered to debug it any further (do I need to specify the engine name as chat/turbo-gpt-3 or do I need to upgrade the library to some new version or what). I haven't even tried GPT-4 - I have the API access, just been too fashed and busy with other site stuff.
(Technical-wise, we've been doing a lot of Gwern.net refactoring and cleanup and belated documentation - I've written like 10k words the past month or two just explaining the link icon history, redirect & link archiving system, and the many popup system iterations and what we've learned.)
The better models do require using the chat endpoint instead of the completion endpoint. They are also, as you might infer, much more strongly RL trained for instruction following and the chat format specifically.
I definitely think it's worth the effort to try upgrading to gpt-3.5-turbo, and I would say even gpt-4, but the cost is significantly higher for the latter. (I think 3.5 is actually cheaper than davinci.)
If you're using the library you need to switch from Completion to ChatCompletion, and the API is slightly different -- I'm happy to provide sample code if it would help, since I've been playing with it myself, but to be honest it all came from GPT-4 itself (using ChatGPT Plus.) If you just describe what you want (at least for fairly small snippets), and ask GPT-4 to code it for you, directly in ChatGPT, you may be pleasantly surprised.
(As far as how to structure the query, I would suggest something akin to starting with a "user" chat message of the form "please complete the following:" followed by whatever completion prompt you were using before. Better instructions will probably get better results, but that will probably get something workable immediately.)
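Since I offered sample code, something like this is the whole shape of the change (an untested sketch - old_prompt stands in for whatever you currently send to the davinci completion endpoint, and it assumes an openai library version recent enough to include ChatCompletion):

```python
import openai

old_prompt = "..."  # whatever you were sending to the completion endpoint before

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # or "gpt-4" if the cost is acceptable
    messages=[
        {"role": "user",
         "content": "Please complete the following:\n\n" + old_prompt},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```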
Yeah, I will at some point, but frontend work with Said always comes first. If you want to patch it yourself, I'd definitely try it.
https://github.com/gwern/gwern.net/pull/6
It would be exaggerating to say I patched it; I would say that GPT-4 patched it at my request, and I helped a bit. (I've been doing a lot of that in the past ~week.)
Do you have a link to a specific part of the gwern site highlighting this, and/or a screenshot?
What's there to highlight, really? The point is that it looks like a normal abstract... but not one-paragraph. (I've mused about moving in a much more aggressive Elicit-style direction and trying to get a GPT to add the standardized keywords where valid but omitted. GPT-4 surely can do that adequately.)
I suppose if you want a comparison, skimming my newest annotations, the first entry right now is Sánchez-Izquierdo et al 2023, and that is an example of reformatting an abstract to add linebreaks which improve its readability.
This is not a complex abstract and far from the worst offender, but it's still harder to read than it needs to be.
It is written in the standard format, but the writing is ESL-awkward (the 'one of those' clause is either bad grammar or bad style), the order of points is a bit messy & confusing (defining the hazard ratio - usually not written in caps - before the point of the meta-analysis or what it's updating? horse/cart), and the line-wrapping does one no favors. Explicitly breaking it up into intro/method/results/conclusion makes it noticeably more readable.
(In addition, this shows some of the other tweaks I usually make: like being explicit about what 'Calvin' is, avoiding the highly misleading 'significance' language, avoiding unnecessary use of obsolete Roman numerals (newsflash, people: we have better, more compact, easier-to-read numbers - like '1' & '2'!), and linking fulltext rather than contemptuously making the reader fend for themselves even though one could so easily have linked it).
I'm one of the authors on the natural abstractions review you discuss and FWIW I basically agree with everything you say here. Thanks for the feedback!
We've shortened our abstract now:
We distill John Wentworth’s Natural Abstractions agenda by summarizing its key claims: the Natural Abstraction Hypothesis—many cognitive systems learn to use similar abstractions—and the Redundant Information Hypothesis—a particular mathematical description of natural abstractions. We also formalize proofs for several of its theoretical results. Finally, we critique the agenda’s progress to date, alignment relevance, and current research methodology.
At 62 words, it's still a bit longer than your final short version but roughly a third the length of our original version.
Also want to highlight that I strongly agree having TL;DRs at all is good. (Or intros where the first 1-2 paragraphs are a good TL;DR, like in your post here.)
Papers typically have ginormous abstracts that should actually be broken into multiple paragraphs.
I suspect you think this because papers are generally written with a specialist audience in mind. I skim many abstracts in my field a day to keep up to date with the literature, and I think they're quite readable even though many are a couple hundred words long. This is because generally speaking authors are just matter-of-factly saying what they did and what they found; if you don't get tripped up on jargon there's really nothing difficult to comprehend. If anything, your 69-word version reads more like a typical abstract I see day-to-day than the more verbose version you had earlier, which has way too much filler to be a good abstract. For example, sentences like these ones rarely show up in abstracts:
This post summarizes and reviews the key claims of said agenda, its relationship to prior work, as well as its results to date. Our hope is to make it easier for newcomers to get up to speed on natural abstractions, as well as to spur a discussion about future research priorities.
Or, put more bluntly, papers really just aren't textbooks or press articles. They are written to be understandable to specialists in the field, and maybe adjacent fields (a PRL paper would be written to address all physicists, for example), but there's simply no effort made towards making them easy to understand for others. Look at what I consider to be a fairly typical abstract: https://arxiv.org/abs/2101.05078
It's really just 'We designed A. It works like this. We describe A and associated subsystems in detail in the paper. We characterise A by doing B, C, D, and E. The performance agrees with simulation.' There are bad abstracts everywhere, of course, but I disagree that they're the norm. Many abstracts are quite reasonable, and effectively just say 'Here's what we did, and here's what we found'.
I buy that people who read abstracts all day get better at reading them, but I'm... pretty sure they're just kinda objectively badly formatted and this'd at least save time learning to scan it.
Like, looking at the one you just linked:
The ATLAS Fast TracKer (FTK) was designed to provide full tracking for the ATLAS high-level trigger by using pattern recognition based on Associative Memory (AM) chips and fitting in high-speed field programmable gate arrays. The tracks found by the FTK are based on inputs from all modules of the pixel and silicon microstrip trackers. The as-built FTK system and components are described, as is the online software used to control them while running in the ATLAS data acquisition system. Also described is the simulation of the FTK hardware and the optimization of the AM pattern banks. An optimization for long-lived particles with large impact parameter values is included. A test of the FTK system with the data playback facility that allowed the FTK to be commissioned during the shutdown between Run 2 and Run 3 of the LHC is reported. The resulting tracks from part of the FTK system covering a limited region of the detector are compared with the output from the FTK simulation. It is shown that FTK performance is in good agreement with the simulation.
Would you really rather read that than:
The ATLAS Fast TracKer (FTK) was designed to provide full tracking for the ATLAS high-level trigger by using pattern recognition based on Associative Memory (AM) chips and fitting in high-speed field programmable gate arrays. The tracks found by the FTK are based on inputs from all modules of the pixel and silicon microstrip trackers.
The as-built FTK system and components are described, as is the online software used to control them while running in the ATLAS data acquisition system. Also described is the simulation of the FTK hardware and the optimization of the AM pattern banks. An optimization for long-lived particles with large impact parameter values is included. A test of the FTK system with the data playback facility that allowed the FTK to be commissioned during the shutdown between Run 2 and Run 3 of the LHC is reported.
The resulting tracks from part of the FTK system covering a limited region of the detector are compared with the output from the FTK simulation. It is shown that FTK performance is in good agreement with the simulation.
I think once you think about breaking it into paragraphs, there are further optimizations that are pretty obvious (like, the middle paragraph reads like a bunch of bullet-points and would probably be easier to parse in that format).
I predict this'd be at least somewhat good for the specialists who are the primary audience for the thing, as well as "I think it's dumb for papers to only be legible to other specialists. Don't dumb things down for the masses obviously, but, like, do some basic readability passes so that people trying to get up-to-speed on a field have an easier time".
I genuinely don't see a difference either way, except the second one takes up more space. This is because, like I said, the abstract is just a simple list of things that are covered, things they did, and things they found. You can put it in basically any format, and as long as it's a field you're familiar with so your eyes don't glaze over from the jargon and acronyms, it really doesn't make a difference.
Or, put differently, there's essentially zero cognitive load to reading something like this because it just reads like a grocery list to me.
Regarding the latter:
I think it's dumb for papers to only be legible to other specialists. Don't dumb things down for the masses obviously, but, like, do some basic readability passes so that people trying to get up-to-speed on a field have an easier time
I generally agree. The problem isn't so much that scientists aren't trying. Science communication is quite hard, and to be quite honest scientists are often not great writers, simply because it takes a lot of time and training to become a good writer, and a lifetime is only 80 years. You have to recognise that scientists generally try quite hard to make papers readable; they/we are just often shitty writers and often are even non-native speakers (I am a native speaker, though of course internationally most scientists aren't). There are strong incentives to make papers readable, since if they aren't readable they won't get, well, read, and you want those citations.
The reality, I think, is that if you have a stronger focus on good writing, you end up with a reduced focus on science, because the incentives are already aligned quite strongly towards good writing.
I predict most people will have an easier time reading the second one than the first one, holding their jargon-familiarity constant. (The jargon basically isn't a crux for me at all.)
(I bet if we arranged some kind of reading comprehension test you would turn out to do better at reading-comprehension for paragraph-broken abstracts vs single-block abstracts. I'd bet this at like 70% confidence for you-specifically, and... like 97% confidence for most college-educated people)
A few reasons I expect this to be true (other than just generalizing from my example and hearing a bunch of people complain about Big Blocks of Text):
Keeping track of where you are in the text
If you're reading a long block of text, and then get distracted for any reason, you have to relocate where you left off to keep reading. A long block of text doesn't give you any hand-holds for doing that.
Pausing and digesting
I (and I think most people) can only digest so much information at once. Paragraph breaks are a way for the author to signal "here is a place you might want to pause briefly and consolidate your thoughts slightly before moving on."
The paragraph-break is both a signal that "now is maybe a time to do that", and it also helps you avoid losing your place after doing so (see previous section).
Skimming
Often when I start reading a paragraph, I'm like "okay, I roughly get this. I don't really need to fully absorb this info, I want to move on to the next bit." This could be either because I'm hunting for a specific set of information, or because I'm just trying to build up a high-level understanding of what the text is saying before reading it thoroughly. Paragraphs give me some hand-holds for skimming, because they typically group information in a sensible way.
In the example you link, I think there are basically three sections of text: one saying overall what the topic is, one saying "what things do we describe in our paper", and one roughly describing what the overall results of the paper were. Having them as separate paragraphs helps me, say, skip the results summary if I've already gotten a sense for what the overall paper was about.
Sure, it could easily be that I'm used to it, and so it's no problem for me. It's hard to judge this kind of thing since at some level it's very subjective and quite contingent on what kind of text you're used to reading.
When I wrote my thesis, my abstract was broken into 4 paragraphs. The examiners suggested making it all one paragraph because "an abstract should be just one paragraph". But the university template required the abstract to have a page to itself, and I thought the breaks helped, so I kept them. Arguably the abstract could have been shorter, but for a thesis-like document it's harder, because a thesis (in practice) is kind of a mash of different things you did over several years crammed together, so it doesn't have "a main point".
I would add an option to use the GPT-4 API to show a post summary, offloading it from a human. For the above abstract the bot suggests the following (a rough sketch of what such an API call could look like follows the quoted summary):
The text is about John Wentworth's Natural Abstraction agenda. This is an effort to understand and recover natural abstractions in realistic environments. The post provides a summary and review of the agenda, including its relationship to prior work and results. The goal is to help people understand natural abstractions and discuss future research priorities.
The post summarizes the intuition behind the agenda and relates it to previous work in various fields. It then lists key claims, including the Natural Abstraction Hypothesis and redundant information abstractions. The post also includes mathematical proofs for some of the key results in the redundant information abstraction line of work.
However, the post also critiques the agenda and its progress to date. It notes gaps in the theoretical framework and challenges its relevance to alignment. Additionally, it critiques John's current research methodology.
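For concreteness, the kind of call I have in mind is something like this sketch (illustrative only - the function name and prompt are made up, not anything LessWrong actually runs):

```python
# Illustrative sketch of an opt-in "summarize this post" feature, not LessWrong's code.
import openai

def summarize_post(post_text: str) -> str:
    """Ask GPT-4 (or gpt-3.5-turbo) for a short multi-paragraph summary of a post."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Summarize the following post in 2-3 short paragraphs."},
            {"role": "user", "content": post_text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```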
And in the year 2031 of the Common Era, all abstracts on LessWrong are suddenly replaced with the line:
Hello world.
Later this would be judged to mark the beginning of Year 1 of the Silicon Dominate.
I agree, it's time for LessWrong to start integrating ChatGPT (go devs!). There's a wait list to access the GPT-4 API, although maybe LessWrong can get themselves to the front of the line faster. GPT-3.5 turbo might suffice.
IMO ~170 words is a decent length for a well-written abstract (well maybe ~150 is better), and the problem is that abstracts are often badly written. Steve Easterbrook has a great guide on writing scientific abstracts; here's his example template which I think flows nicely:
(1) In widgetology, it’s long been understood that you have to glomp the widgets before you can squiffle them. (2) But there is still no known general method to determine when they’ve been sufficiently glomped. (3) The literature describes several specialist techniques that measure how wizzled or how whomped the widgets have become during glomping, but all of these involve slowing down the glomping, and thus risking a fracturing of the widgets. (4) In this thesis, we introduce a new glomping technique, which we call googa-glomping, that allows direct measurement of whifflization, a superior metric for assessing squiffle-readiness. (5) We describe a series of experiments on each of the five major types of widget, and show that in each case, googa-glomping runs faster than competing techniques, and produces glomped widgets that are perfect for squiffling. (6) We expect this new approach to dramatically reduce the cost of squiffled widgets without any loss of quality, and hence make mass production viable.
I still claim this should be three paragraphs. In this case, breaking at section 4 and section 6 seems to carve it at reasonable joints.
Yes, with one linebreak, I'd put it at (4). With 2 linebreaks, I'd put it at 4+5. With 3 breaks, 4/5/6. (Giving the full standard format: introduction/background, method, results, conclusion.) If I were annotating that, I would go with 3 breaks.
I wouldn't want to do a 4th break, and break up 1-3 at all, unless (3) was unusually long and complex and dug into the specialist techniques more than usual so there really was a sort of 'meaningless super universal background of the sort of since-the-dawn-of-time-man-has-yearned-to-x' vs 'ok real talk time, you do X/Y/Z but they all suck for A/B/C reasons; got it? now here's what you actually need to do:' genuine background split making it hard to distinguish where the waffle ends and the meat begins.
(Writing this because it might help me with my actual job one day)
John Wentworth’s Natural Abstraction agenda aims to understand and recover “natural” abstractions in realistic environments. We introduce the conceptual framework around it and review its key claims, relationship to prior work in a number of fields, and results to date. Of particular interest are the Natural Abstraction Hypothesis and Wentworth's specific formulation of natural abstractions (here called "redundant information abstractions"). We re-define and draw mathematical proofs for some of the key results amassed so far. We then discuss the agenda's progress to date, including the gaps in its theoretical framework, and challenge its methodology and relevance to alignment research.
It looks to me like academia figured out (correctly) that it's useful for papers to have an abstract that makes it easy to tell at a glance what a paper is about. They also figured out that an abstract should be about a paragraph. Then people goodharted on what "paragraph" means, trying to cram too much information into one block of text. Papers typically have ginormous abstracts that should actually be broken into multiple paragraphs.
I think LessWrong posts should probably have more abstracts, but I want them to be nice easy-to-read abstracts, not worst-of-all-worlds-goodharted-paragraph abstracts. Either admit that you've written multiple paragraphs and break it up accordingly, or actually streamline it into one real paragraph.
Sorry to pick on the authors of this particular post, but my motivating example today was bumping into the abstract for the Natural Abstractions: Key claims, Theorems, and Critiques. It's a good post; its opening summary just happened to be written in an academic-ish style that exemplified the problem. It opens with:
There are 179 words. They blur together, and I have a very hard time parsing it. If this were anything other than an abstract I expect you'd naturally write it in about 3 paragraphs:
If I try to streamline this without losing info, it's still hard to get it into something less than 3 paragraphs (113 words):
If I'm letting myself throw out significant information, I can get it down to 69 words. I'm not thrilled with this as a paragraph, but my eyes don't completely glaze over it.
I think what I actually want in most cases is a very short abstract (1 long sentence or 3 short sentences), followed by a few paragraphs.
I do notice that once you start letting the abstract be multiple paragraphs, it ends up not that different from the introduction to the post.
For comparison:
Honestly I'm not sure the abstract really adds that much over this. This is 430 words. The original abstract is 179, about 42% as long. The parts of the abstract that nail down "literally what are all the things we included in this post" don't really seem to add much that I wouldn't get by skimming the bullet points in the intro. And it's much easier to read in the intro. (I also bet you could streamline the intro somewhat, which would further reduce the benefit of having an abstract in the first place)
Rather than copying academic abstract style, I'd rather people basically write good introductions, where the first paragraph helps you make a decision about whether to read the rest of intro, and the rest of the intro helps you decide whether to read the rest of the piece.
In this case, I'd maybe just replace the abstract with:
and then jump into the introduction, which covers the rest of the information.