Thanks for feedback. 

I’ll probably do the title and trim the snippets. 

One way of getting a quote would be to run LLM inference and generate it from the text chunk. Would this help?

Update: HTTPS issue fixed. Should work now.

booksearch.samuelshadrach.com

Books Search for Researchers

Thanks for your patience. I'd be happy to receive any feedback. Negative feedback especially.

Update: HTTPS should work now

Search engine for books

http://booksearch.samuelshadrach.com

Aimed at researchers

 

Technical details (you can skip this if you want):

Dataset size: libgen 65 TB, (of which) unique English epubs 6 TB, (of which) plaintext 300 GB, (from which) embeddings 2 TB, (hosted on) 256+32 GB CPU RAM

Did not do LLM inference after embedding search step because human researchers are still smarter than LLMs as of 2025-03. This tool is meant for increasing quality for deep research, not for saving research time.
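For readers unfamiliar with the embedding search step, here is a minimal sketch of the general idea, not the actual implementation behind this tool. It assumes each text chunk has already been turned into a vector by some embedding model, and it ranks chunks by cosine similarity to the query vector. The function names, chunk IDs, and toy 3-dimensional vectors are all hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return the k (score, chunk_id) pairs with the highest similarity, best first."""
    scored = ((cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items())
    return heapq.nlargest(k, scored)


# Toy vectors standing in for real embeddings (hypothetical data).
chunks = {
    "book_a:chunk_0": [1.0, 0.0, 0.0],
    "book_b:chunk_7": [0.0, 1.0, 0.0],
    "book_c:chunk_2": [0.7, 0.7, 0.0],
}
results = top_k_chunks([1.0, 0.1, 0.0], chunks, k=2)
for score, chunk_id in results:
    print(f"{score:.3f}  {chunk_id}")
```

The ranked chunks go straight to the human reader, with no LLM step in between. A brute-force scan like this is only practical at toy scale; a dataset of the size described above would need a more memory-efficient layout or an approximate index.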

Main difficulty faced during the project: disk throughput is a bottleneck, and popular languages like nodejs and python tend to leak memory when dealing with large datasets. Most of my repo is in bash and perl. Scaling this project up further will require a way to increase disk throughput beyond what mdadm on a single machine allows. More funding would also have helped me complete this project sooner. It took maybe 6 months part-time and could have taken less.

Okay!

I'm not universally arguing against all technology. I'm not even saying that an arms race means this tech is not worth pursuing, just be aware you might be starting an arms race.

Intelligence-enhancing technologies (like superintelligent AI, connectome-mapping for whole brain emulation, human genetic engineering for IQ) are worth studying in a separate bracket IMO because a very small differential in intelligence leads to a very large differential in power (offensive and defensive, scientific and business and political, basically every kind of power).

@TsviBT I don't know if you were the one who downvoted my comment, but yeah I don't think you've engaged with the strongest version (steelman?) of my critique. Laws (including laws promoting genomic liberty) don't carry the same weight during a cold war as they do during peacetime. Incentives shape culture, culture shapes laws.

And the incentives change significantly when a technology upsets the fundamental balance of power between the world's superpowers.

Superhumans that are actually better than you at making money will eventually be obvious. Yes, there may be some lead time obtainable before everyone understands, but I expect it will only be a few years at maximum.

Yes, it’s possible we end up in a world where the US govt is basically competing with its own shadow yet again. A US startup builds some tech, it gets copied 6 months later by a non-US startup, the US startup feels pressure to move faster as a result and deploys the next tech, the next tech gets copied too, and so on.

I’m not saying this will definitely happen, but there’s a bunch of incentives pushing in this direction. 
