Internet Research (with tangent on intelligence analysis and collapse)

by Arkanj3l, 31st Jul 2013


Want to save time? Skip down to "I'm looking to compile a thread on Internet Research"!

Opinionated Preamble:

There is a lot of high-level thinking on Less Wrong, which is great. It's done wonders to structure and optimize my own decisions. But I think the political and futurology-related issues that Less Wrong covers can sometimes get out of sync with the reality and injustices of events in the immediate world. There are comprehensive treatments of how medical science is failing, or how academia cannot give unbiased results, and this is the milieu of programmers and philosophers in the middle-to-upper class of the planet. I believe that this circle of awareness can be expanded, even if that means treading into mind-killing territory. If anything, I want to give people a near-mode sense of the stakes aside from x-risk: all in all, the x-risk scenarios I've seen Less Wrong fear the most kill humanity more or less instantly. A slower descent into violence and poverty is to me much more horrifying, because I might have to live in it and I don't know how. As a matter of fact, I have no idea how to predict it.

This is one reason why I'm drawn to the intelligence operations ("IntelOps") performed by the military and crime units, among other things. Intelligence product delivery is about raw and immediate *fact*, and there is a lot of it. The problems featured in IntelOps are among the few things rationality is good for - highly uncertain scenarios with one-off executions and messy or noisy feedback. Facts get lost in translation as messages are passed along, and of course feeding and receiving fake facts is all part of the job - but nevertheless, knowing *everything* *everywhere* is in the job description, and some form of rationality has become a necessity.

It gets ugly. The demand for these kinds of skills often lies in industries that are highly competitive, violent, and illegal. I believe that once you take a close look at how force and power are applied in practice, there is no more pretending that human evils are an accident.

Open Source Intelligence, or "OSINT", is the mining of data and facts from public information databases, news articles, codebases, journals. Although the amount of classified data dwarfs the unclassified, the size and scope of the unclassified makes it responsible for a majority of intelligence reports - and thus it is involved in the great majority of executive decisions made by government entities. It's worth giving some thought to how much of what we know, they know too. As illustrated in this exposé, the processing of OSINT is a great big chunk of what modern intelligence is about, aside from many other things. I think understanding how rationality as developed on Less Wrong can contribute to better IntelOps, and how IntelOps can feed the rationality community, would be awesome, but that's a post for another time.

--

The Show

Through my investigations into IntelOps I've noticed the emphasis on search. Good search.

I'm looking to compile a thread on Internet Research. I'm wondering if there is any wisdom on Less Wrong on how to become more effective searchers. Here are some questions that could be answered specifically, but they are just guidelines - feel free to voice associated thoughts, we're exploring here.

  • Before actually going out and searching, what would be the most effective way of drafting and optimizing a collection plan? Are there any formal optimization models that inform our distribution of time and attention? Exploration vs exploitation comes to mind, but it would be worth formulating something specific. I heard that the multi-armed bandit problem is solved?
  • Do you have any links or resources regarding more effective search?
  • Do you have any experiences regarding internet research that you can share? Any patterns that you've noticed that have made you more effective at searching?
  • What are examples of closed-source information that are low-hanging fruit in terms of access (e.g. academic journals)? What are possible strategies for acquiring closed-source data (e.g. enrolling in small courses at universities, e-mailing researchers, coercion via the law/Freedom of Information Act, social engineering, etc.)?
  • I would like to hear from SEOs and software developers on how they interpret semantic web technologies and how those technologies are going to affect end-users. I am somewhat unfamiliar with the semantic web, but from my understanding, information that could not be indexed before is now indexed, and new ontologies will emerge as this information is mined. What should an end-user expect, and what opportunities will there be that didn't exist in the current generation of search?
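
On the exploration vs exploitation question in the first bullet: one standard heuristic for the stochastic multi-armed bandit is UCB1, which tries every option once and then keeps picking whichever has the highest average payoff plus an uncertainty bonus that shrinks the more an option is tried. Here is a minimal Python sketch of using it to split a fixed budget of queries across sources. Everything named in it (ucb1_allocate, try_source, the source names and their payoffs) is hypothetical and made up for illustration, and it assumes you can score each query's usefulness between 0 and 1, which is probably the hard part in practice.

```python
import math
from typing import Callable, Dict, List


def ucb1_allocate(
    sources: List[str],
    try_source: Callable[[str], float],
    budget: int,
) -> Dict[str, int]:
    """Spend `budget` queries across `sources` using the UCB1 rule.

    `try_source(name)` is a hypothetical callback: it runs one query against
    the named source and returns a usefulness score in [0, 1].
    Returns the number of queries each source received.
    """
    if budget < len(sources):
        raise ValueError("budget must cover at least one query per source")

    pulls = {s: 0 for s in sources}     # queries spent on each source
    totals = {s: 0.0 for s in sources}  # summed usefulness per source

    for t in range(1, budget + 1):
        untried = [s for s in sources if pulls[s] == 0]
        if untried:
            # Exploration phase: every source gets tried once first.
            choice = untried[0]
        else:
            # Pick the highest "average payoff + uncertainty bonus".
            choice = max(
                sources,
                key=lambda s: totals[s] / pulls[s]
                + math.sqrt(2.0 * math.log(t) / pulls[s]),
            )
        totals[choice] += try_source(choice)
        pulls[choice] += 1

    return pulls


if __name__ == "__main__":
    # Made-up, fixed payoffs purely for demonstration.
    payoff = {"search_engine": 0.2, "journal_db": 0.5, "mailing_list": 0.8}
    print(ucb1_allocate(list(payoff), lambda s: payoff[s], budget=1000))
```

With fixed payoffs like these, the allocation concentrates on the best source while still revisiting the others occasionally. Real usefulness scores are noisy and drift over time, so treat this as a starting point for thinking about a collection plan rather than an answer.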

That should be enough to get started. Below are some links that I have found useful with respect to Internet Research.

--

Meta-Search Engines or Assisted Search:

  • Carrot - http://search.carrot2.org/stable/search (concept clustering search engine)

Summarizers:

  • TextTeaser - http://www.textteaser.com/ - SOURCE: https://github.com/MojoJolo/textteaser
  • Copernic (Commercial Summarizing Feed Program) - http://www.copernic.com/en/products/summarizer/

Bots/Collectors/Automatic Filters:

  • Google Alerts - http://www.google.ca/alerts
  • Change Detection - http://www.changedetection.com/

Compilations and Directories:

  • How to Perform Industry Research - http://businesslibrary.uflib.ufl.edu/industryresearch

Guides:

  • From UC Berkeley - http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html 
  • "How to Solve Impossible Problems" - http://www.johntedesco.net/blog/2012/06/21/how-to-solve-impossible-problems-daniel-russells-awesome-google-search-techniques/ 
  • The NSA Guide to "Untangling the Web"; Internet Research - http://www.nsa.gov/public_info/_files/Untangling_the_Web.pdf [C. 2007]
  • Fravia's Learnings on searching (value in essays) - http://search.lores.eu/indexo.htm [C. 1990s - 2009]
  • "Power Searching With Google" Course - http://www.powersearchingwithgoogle.com/

Practice:

I don't really care how you use this information, but I hope I've jogged some thinking about why it could be important.
