Improved Security to Prevent Hacker-AI and Digital Ghosts

Erland Wittkotter

In a previous post, “Hacker-AI and Digital Ghosts – Pre-ASI”, I discussed multiple problems that contribute to our vulnerabilities and facilitate the development of an AI application (Hacker-AI) that could threaten our freedom, privacy, and quality of life.

I hypothesized that a Hacker-AI/Digital Ghost finds and uses (new) vulnerabilities in existing software or modifies software to create backdoors or sleeper code. The core hypothesis was that Hacker-AI is flexible enough to find ways to manipulate every system (i.e., any OS, CPU, or device type). With Reverse Code Engineering, it could not just steal crypto-keys but also modify other software and try to avoid detection by manipulating the OS and removing all data traces it inadvertently generates (thereby becoming a Digital Ghost). Finally, this Hacker-AI could make itself irremovable on every device it visited.

My previous post also suggested that some governments are highly motivated to create Hacker-AI first because an irremovable AI on all devices could repel late-coming AIs and become the only Hacker-AI. This Hacker-AI could make conventional war-fighting impossible, mainly by disrupting the supplies and logistics of nations that were taken by surprise.

A Hacker-AI cannot be prevented by governmental regulation or international treaties/diplomacy. We must have a reliable and fast technical solution to counter any side’s first (cyber-)strike.

The following 10 problems, issues, or vulnerabilities do not follow a particular order. It is assumed that preventing digital ghosts and irremovable Hacker-AI implies confronting these issues/problems technically.

10 issues or security-related problems

The issues introduced in the mentioned post are briefly repeated here:

(1) “Software is invisible”: Software can only be seen/validated indirectly; it requires trust in the entire ecosystem. Rephrased: “Software must be trusted (blindly); someone else will (hopefully) see the problems.”

(2) “Software is covertly modifiable”: The emphasis is on “covert”; detection of modifications is not reliable. There are too many blind spots where covert (temporary) modifications could happen.

(3) “Every tool/component could be compromised”. We must question our trust in whatever we use. Rephrased: “Any software can be dangerous.”

(4) “Attackers know more (about vulnerabilities)”. Attackers know more about the weak spots they use, while defenders need to know (and fix) all of them. Defenders are often surprised by the details of the attack used.

(5) “Attacker chooses methods and timing”. Attackers have the first-mover advantage. Defenders must be prepared anytime for anything that an attacker can do.

(6) “Attackers can adapt to (known) methods of detection”. Attackers don’t want to be detected. They could change their appearance, remove revealing data, or modify attack detection tools to remain undetected.

(7) “Secrets are unreliable in defense”. Defenders should not build their defenses on secrets they assume attackers don’t have. In particular, secrets known by humans could also be known by attackers.

(8) “Software output could be faked (late)”. We communicate with software via its output, but we must be careful not to trust the output we want to hear/see.

(9) “Complexity is an enemy of security”. If security is part of a complex system, we should not trust it.

(10) “Crypto-Keys/Units are unprotected”: Protecting (locally stored) crypto keys from being stolen or preventing misuse of crypto-devices is considered a problem, but not a critical core problem in cryptography.

All these issues contributed to the feasibility of what I was describing as an irremovable, ghost-like Hacker-AI. To prevent this Hacker-AI, we must solve the above issues or problems and turn them in favor of cyber-defenders. Unfortunately, cybersecurity and cryptography have not contributed enough solutions; it seems that they have given up on providing help for these issues.

Still, this post argues that we can create technologies that surprisingly solve many of the above issues or make them irrelevant. In the following, I will give a quick summary (I) of what the solution could look like. Next, I will list the used solution components (II), and return to the above list of issues (III) to discuss the solutions in more detail and how they will deal with a Hacker-AI.

(I) Solution Summary

Software is treated as cloned publications with unique signatures (generated hashcodes of files) for contained instruction data. Any change to their instructions leads to a new hashcode. Issues 1 and 2 are addressed with hashcoding: software is made identifiable, and apps with unknown hashcodes are rejected and reported. Any covert modification is detected when a file is loaded into RAM. Software could still be compromised (issue 3), but the software must have had malicious flaws from the beginning. Due to the use of hashcodes, we could make developers accountable for malware; they would risk being exposed as bad actors.

Also, we can’t change that an attacker knows more about vulnerabilities (issue 4), but we can prevent their information advantage from becoming operational (via covert use of exploits). By establishing accountability for all used software, we remove the incentive to choose the time or method of an attack (issue 5). We also strictly separate the regular domain from the security domain, so attackers can’t avoid being detected by independent security methods, and removing certain data traces is outside their reach (issue 6).

All used secrets are kept from humans. Security-related data are only processed internally by systems that can’t leak them (issue 7). Output worth being faked will generate independent log data helping to reveal (later) how that fake was done (issue 8).
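Issue 8 relies on independent log data that an attacker cannot quietly rewrite. One conceivable way to make such a log tamper-evident is a hash chain, in which every entry's digest also covers the previous digest. The sketch below is only an illustration under that assumption; the hash chain, the class `EvidenceLog`, and the example events are not part of the proposal itself.

```python
import hashlib
import json


class EvidenceLog:
    """Hypothetical append-only log kept inside the security domain."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> None:
        # Each digest covers the previous digest plus the new event.
        previous = self.entries[-1]["digest"] if self.entries else ""
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((previous + body).encode()).hexdigest()
        self.entries.append({"event": event, "digest": digest})

    def is_intact(self) -> bool:
        """Recompute the chain; an edited entry breaks it."""
        previous = ""
        for entry in self.entries:
            body = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((previous + body).encode()).hexdigest()
            if entry["digest"] != expected:
                return False
            previous = entry["digest"]
        return True


log = EvidenceLog()
log.append({"action": "display_balance", "value": 100})
log.append({"action": "transfer", "value": 40})
print(log.is_intact())  # True

log.entries[0]["event"]["value"] = 999  # attempted cover-up
print(log.is_intact())  # False
```

In the proposed architecture, such a log would live in the separated security domain, so regular software could not reach it in the first place; the chain only adds a second, independent check.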

Security should not be allowed on complex CPUs administered by the main OS. Security should be (preferably physically) separated and run in a maximally simplified environment (issue 9). Also, all keys are prevented from being disclosed in cleartext. Any key showing up in cleartext in the main CPU is considered compromised, and the system responsible for the leak is flagged as unsuitable for en-/decryption (issue 10).

(II) Solutions Components

Current cybersecurity is overwhelmed with the challenges it has to face. It deals with many different threat scenarios while users demand flexibility. Additionally, security gets a lot of lip service but often loses in trade-offs. Unfortunately, there is not a lot of critical self-reflection in cybersecurity. Business is booming, and critique is deflected and turned into blaming others.

Our goal should be proactive, preventative, independent, and redundant cyber-security. Some security measures have to be low-level and unnoticeable to regular users. We should aim for redundancy of independent security, protection, or detection methods that can be called “security overkill” (but is still unnoticeable to users to be acceptable). Many architectural mistakes were made around security. Therefore, solutions must fix them via software updates and hardware retrofits; otherwise, we would still have a huge problem with our dependence on (unprotected) legacy devices.

Unfortunately, the mentioned problems cannot be solved with a single solution. But the proposed solutions are easy enough to deploy and acceptable within the existing IT ecosystem. The most significant change is related to (A) self-regulation among developers, who need to be accountable for their developed apps. (B) Apps are hashcoded and thereby made easily identifiable, independent of their names. (C) Separation: regular operations should not impact security operations. (D) Keys are protected from being stolen, and (E) crypto devices or security components are protected from misuse or modification. Finally, (F) low-level security must be highly automated, i.e., we must react to global Hacker-AI threats instantaneously. Additional details on these methods will be published on https://nogostar.com/articles.

(A) Making Developers accountable

Using software is not so different from taking medicine prescribed by medical doctors or legal advice from lawyers. These occupations and others in financial services have self-regulatory rules protecting the public from rogue pretenders claiming professional reputations. We propose registering code/software via their unique hashcodes, which also creates indirectly enforceable accountability for intentionally inserting malicious code (snippets). Using hidden vulnerabilities or exploits by developers is risky, reputation-damaging behavior. The transparency of registered software creates both: evidence and deterrence.

Manufacturers/developers share (voluntarily) additional info on software’s relevant capabilities and third-party components, which could enable us to detect suspicious activities and notify developers. Developers who tell us about security-related features (done via checklists) indicate that they are aware of features that could cause harm/damage to users. It is expected that developers take these disclosures seriously because their reputation depends on that.

Additionally, developers are not called out for making mistakes or creating unintentional vulnerabilities. Reputation is based on comprehensive, truthful disclosure and responsive, cooperative behavior.

(B) White-/Gray-/Blacklisted Hashcodes

Every software file (including software library or script) is hashcoded and managed within the context of a software publication. Hashcodes are confirmed/validated (as known) via server requests or locally flagged as suspicious. Also, hashcodes could be linked to known software packages, and their validation can be derived from statistics (i.e., overwhelming votes that a component is part of the original publication). This makes the hashcode acceptable, i.e., graylisted. Hashcodes for which we have voluntarily shared registration data from manufacturers are then called whitelisted. Validated hashcode and additional server data related to the software are cached locally. They are (regularly) used to detect deviations from known software details for instant reporting.

White-/graylisted executables are allowed and blacklisted or unknown code is not accepted in RAM or by the CPU. Therefore, only known and trusted software could (theoretically) exploit vulnerabilities, which would have severe consequences.
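As a rough illustration of the rule above, the following sketch computes a file's hashcode and decides whether the file may be loaded. The choice of SHA-256, the in-memory lists, and names such as `HashcodeRegistry` and `may_load` are assumptions made for this example only; the post does not prescribe a hash function or a data format.

```python
import hashlib
from enum import Enum


class Status(Enum):
    WHITELISTED = "whitelisted"   # registered by the manufacturer
    GRAYLISTED = "graylisted"     # accepted based on statistics
    BLACKLISTED = "blacklisted"   # known to be malicious
    UNKNOWN = "unknown"           # never seen before


def hashcode(content: bytes) -> str:
    """Return the hashcode of a file's content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


class HashcodeRegistry:
    """Hypothetical local cache of validated hashcodes."""

    def __init__(self, whitelisted=(), graylisted=(), blacklisted=()):
        self._status = {}
        for code in graylisted:
            self._status[code] = Status.GRAYLISTED
        for code in whitelisted:
            self._status[code] = Status.WHITELISTED
        # Blacklisting is applied last so that it always wins.
        for code in blacklisted:
            self._status[code] = Status.BLACKLISTED

    def status(self, content: bytes) -> Status:
        return self._status.get(hashcode(content), Status.UNKNOWN)

    def may_load(self, content: bytes) -> bool:
        """Only white- or graylisted code is accepted in RAM."""
        return self.status(content) in (Status.WHITELISTED, Status.GRAYLISTED)


app = b"original instructions"
registry = HashcodeRegistry(whitelisted=[hashcode(app)])
print(registry.may_load(app))                 # True
print(registry.may_load(app + b" backdoor"))  # False
```

Because the decision depends only on the content's hashcode, renaming a file changes nothing, while changing a single byte of its instructions turns a whitelisted file into an unknown one.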

(C) Separating Security-related from Regular computations

All executable software and its data are in a single place: RAM. The OS manages access to data via security methods stored in RAM. Unfortunately, security-related software cannot be sufficiently protected against regular software when it shares the same space. OS providers claiming their system is secure cannot be believed, because it is extremely difficult to prove that systems are secure. Due to the complexity of CPU, OS, and their interplay, we should leave these systems alone, i.e., ignore them and leave them as they are.

Instead, we create another layer for security-related features, i.e., controlled access to storage and network features. This layer can later be extended to some aspects of in-/output as well. All security-related requests are accepted (pass through) or rejected by this additional security layer. The OS can easily handle situations when executables can’t be loaded or files cannot be overwritten or modified.

The security layer uses cached hashcodes and additional data (voluntarily provided by developers) for its activities. It also keeps software up-to-date by facilitating automatic updates because all software must be current for security reasons. We can visualize the security layer as watchdog components that can’t be bypassed, like bridges within the data bus to the storage or network components.
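One way to picture such a watchdog is as a thin pass-through object between the OS and storage. The sketch below is only an assumption about how this could look; `StorageWatchdog`, `RejectedRequest`, the dictionary used as storage, and the set of cached hashcodes are hypothetical stand-ins for a component that would actually sit on the data bus.

```python
import hashlib


class RejectedRequest(Exception):
    """Raised towards the OS when the watchdog refuses a request."""


class StorageWatchdog:
    """Hypothetical pass-through layer between the OS and storage."""

    def __init__(self, storage: dict, accepted_hashcodes: set):
        self._storage = storage              # path -> file content
        self._accepted = accepted_hashcodes  # cached white-/graylisted hashcodes
        self.reports = []                    # events for automatic reporting

    def _check(self, action: str, path: str, content: bytes) -> None:
        code = hashlib.sha256(content).hexdigest()
        if code not in self._accepted:
            self.reports.append((action, path, code))
            raise RejectedRequest(f"{action} of {path} rejected")

    def load_executable(self, path: str) -> bytes:
        content = self._storage[path]
        self._check("load", path, content)
        return content

    def write_executable(self, path: str, content: bytes) -> None:
        # Updates pass only if the new content has a known hashcode.
        self._check("write", path, content)
        self._storage[path] = content


known, update = b"app v1", b"app v2"
accepted = {hashlib.sha256(c).hexdigest() for c in (known, update)}
watchdog = StorageWatchdog({"/bin/app": known}, accepted)

watchdog.write_executable("/bin/app", update)  # legitimate update passes
try:
    watchdog.write_executable("/bin/app", b"tampered")
except RejectedRequest as err:
    print(err)  # write of /bin/app rejected
```

From the OS's point of view, a rejected request looks like any other failed load or write, which is why the existing OS can handle it without modification.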

For the software-only solution, watchdogs are inserted in an independent hypervisor, i.e., below all OS activities. The watchdog activities in this hypervisor are the same as for the separate security hardware. The only problem is that watchdogs require unmanipulated hashcode data, i.e., they require reliable encryption in which crypto-keys cannot be stolen. Software-based en-/decryption on the CPU can never be trusted. That is why an advanced Hacker-AI can breach software-only security: it can steal (cleartext) keys.

(D) No Crypto-Keys in Cleartext.

To protect keys’ secrecy, we must demand that no crypto-key appears in cleartext in RAM or the main CPU (ever). That means we must have separate/independent encryption and decryption units (EDU) with protected key storage (keysafes) that manage all crypto-related processes. Even the public key from PKI, currently announced via unprotected certificate files, should not appear in cleartext within an improved (intentionally incompatible) PKI+. Instead, all keys should be referred to via their computed hashcodes. Additionally, public keys are provided (encrypted) via Secure Key Repositories (SKR) to hardware-based EDUs only. A protected manufacturing process step provides EDUs with the public (access) keys to the SKR.

Additionally, there will be multiple methods (independent counting of new transactions, exchanged bytes, hashcodes on exchanged data, etc.) to detect if secret keys were misused to gain access to session keys. The goal is to have methods to detect manipulations of exchanged data from a man-in-the-middle attack. Key-exchange protocols will help to detect software simulations of EDUs.
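The idea of independent counting can be sketched as follows. Each side of a connection keeps its own tally of transactions and exchanged bytes for a key, and the key is referenced only by its hashcode. Everything here is a simplifying assumption: the class `UsageTally`, the functions `key_reference` and `key_looks_misused`, and the comparison rule are hypothetical, and a real EDU would keep this state in protected hardware.

```python
import hashlib
from dataclasses import dataclass


def key_reference(key: bytes) -> str:
    """Refer to a key by its hashcode instead of its cleartext value."""
    return hashlib.sha256(key).hexdigest()


@dataclass
class UsageTally:
    """Hypothetical per-key counters kept independently by each party."""
    transactions: int = 0
    exchanged_bytes: int = 0

    def record(self, payload: bytes) -> None:
        self.transactions += 1
        self.exchanged_bytes += len(payload)


def key_looks_misused(owner: UsageTally, peer: UsageTally) -> bool:
    """Diverging tallies suggest that someone else has used the key."""
    return (peer.transactions != owner.transactions
            or peer.exchanged_bytes != owner.exchanged_bytes)


key_id = key_reference(b"secret key material held inside the EDU")
owner_tallies = {key_id: UsageTally()}  # kept by the legitimate key holder
peer_tallies = {key_id: UsageTally()}   # kept by the communication partner

for message in (b"hello", b"world"):
    owner_tallies[key_id].record(message)
    peer_tallies[key_id].record(message)
print(key_looks_misused(owner_tallies[key_id], peer_tallies[key_id]))  # False

# A transaction the owner never made shows up only on the peer's side.
peer_tallies[key_id].record(b"sent by someone who copied the key")
print(key_looks_misused(owner_tallies[key_id], peer_tallies[key_id]))  # True
```

The counters only reveal that a key was used by someone else; deciding what to do with a suspected key is a separate step.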

If keys are suspected to be compromised, they are flagged and turned into a honey pot. Keys are automatically replaced without letting anyone know. Even if key safes are damaged, they could be restored using cleartext hashcode (partials) from keys. Openly shared information on keys is insufficient to order/request keys, even if the attacker knew the repository/SKR access keys.

(E) Interguarding Multi-Unit-Security

The current component architecture suggests that best practice is to have the least number of (hard- or software) components for a generic purpose; ideally, it is one. Applied to, e.g., encryption, one component is certainly sufficient for all crypto applications. There is only one TPM (Trusted Platform Module) or usually a single crypto card with hardware storage for private keys. However, if an adversary can access/interface with these components, it could covertly impersonate the owner/user and misuse this component without having resistance or independent detection from other components (known as the API-Problem).

But if we have multiple components, each having an autonomous and independent OS specialized in specific tasks, they could watch/inter-guard their neighbor instances for covert changes or misuses. A single compromised instance could be prevented from reporting that it was manipulated, but other instances could be enabled to detect/report these anomalies automatically. These events (like tripwires) could trigger hidden inspections on how this attack was done and which components must be watched or considered compromised or insufficiently disclosed by its original developers.
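A minimal sketch of this inter-guarding idea follows, under the assumption that each unit can measure a hashcode of its neighbors' software. The class `SecurityUnit`, the ring arrangement, and the function `collect_reports` are hypothetical and only illustrate why a single compromised unit cannot suppress all reports about itself.

```python
import hashlib


class SecurityUnit:
    """Hypothetical security unit that can measure its neighbors."""

    def __init__(self, name: str, firmware: bytes):
        self.name = name
        self.firmware = firmware
        self.neighbors = []

    def measure(self) -> str:
        # Assumption: the measurement is taken by hardware that the
        # unit's own software cannot influence.
        return hashlib.sha256(self.firmware).hexdigest()

    def inspect_neighbors(self, expected: dict) -> list:
        """Return the names of neighbors whose software has changed."""
        return [n.name for n in self.neighbors
                if n.measure() != expected[n.name]]


def collect_reports(units, expected):
    """Gather anomaly reports; a unit never reports on itself."""
    reports = {}
    for unit in units:
        for suspect in unit.inspect_neighbors(expected):
            reports.setdefault(suspect, []).append(unit.name)
    return reports


units = [SecurityUnit(f"unit-{i}", b"trusted firmware") for i in range(3)]
for i, unit in enumerate(units):
    # Ring arrangement: every unit watches the units on either side.
    unit.neighbors = [units[(i - 1) % 3], units[(i + 1) % 3]]
expected = {u.name: u.measure() for u in units}

units[1].firmware = b"covertly modified firmware"
print(collect_reports(units, expected))
# {'unit-1': ['unit-0', 'unit-2']}
```

Even if the modified unit stays silent, both of its neighbors flag it independently, which is the tripwire behavior described above.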

(F) Automated Security

Humans are weak links in security. Human involvement in security is dangerous because humans could be deceived, blackmailed, or turned into traitors. Frankly, humans should have no business being involved in low-level security. Secrets must be protected from humans, with no exceptions for special roles. When security, protection, and detection processes are automated, it is easier and more reliable to protect them from covert modifications. Any attack on automated security is a severe anomaly that must be investigated (automatically) to determine patterns or possible vulnerabilities, or undisclosed capabilities.

Full automation is required, mainly because Hacker-AI attacks could happen quickly. Defenders’ reaction speed will determine whether a camouflaged Hacker-AI continues to be accepted or is blacklisted at the last second.

(III) Discussion of Applying Solutions to Problems/Issues

The above security components are about uncovering covert security-relevant activities. Stopping them is accomplished by making hashcodes blacklisted. Known hashcodes are accepted by default. The assumption behind the proposed security is that covertness around (critical) operations could turn innocuous activities into attacks. Therefore, the proposed security pushes for transparency.

If overt activities follow an attack pattern in which a user could be harmed or damaged, then they can likely be handled with business rules. Security rules can be optimized and adapted over time, demanding that users confirm operations (incl. settings) or are informed.

In the following, we discuss how the solution components affect the discussed problems. We restate the problems in light of the new approach to cybersecurity.

(1) “Software is invisible, but becomes identifiable”. We allow only white or graylisted software in RAM. Software always comes from sources that could be made accountable for malware or for using exploits. By using hashcodes before loading or using software, we know that we can reliably gather additional information about executed software, which makes it more transparent and trustworthy. A digital ghost can’t avoid hashcoding. Data about ghosts are extracted by code in the security domain; this is unmodifiable by ghosts’ regular code. Systematic hashcoding and separation of regular from security activities prevent digital ghosts.

(2) “Software is (not) covertly modifiable”. If hashcodes from executables are checked before every use, covert modifications are detected immediately. Overt and legitimate modifications/updates are not obstructed. If decisions on accepting hashcodes in RAM are made via the main CPU/OS, then there is the possibility that Hacker-AI could have found a way into RAM. Other compromised apps could follow. The problem of undue influence by the CPU is solved by having hashcode generation and security decisions related to hashcode status done in physically separated components operated with OS-independent software. These components could be part of hardware components or be placed as a retrofittable bridge component within the data bus.

A Hacker-AI could still try to manipulate development tools and get backdoors or sleeper code in published software. Dev tools and all infected software are fixed as soon as this is discovered.

(3) “Every tool/component could (still) be compromised, but we can stop it and know who did it”. Including malicious features is an intentional decision of software developers or the result of compromised third-party components or tools. Once corrupt tools/components are found, we can fix them reliably. All late modifications are detectable via hashcodes. Software vulnerabilities are of no concern because using them requires exploits, which are detected or rejected.

(4) “Attackers know more (about vulnerabilities) – but they won’t dare”. Because hashcoding detects modified software and exploitation of vulnerabilities, knowing more is not sufficient to gain an advantage. Also, to be successful, attackers must risk their anonymity and reputation, which is dangerous. The same applies to AI if they are the software provider/developer.

(5) “Attacker chooses methods and timing – but has no benefit from that”. Having a first-mover advantage is not enough. Defenders can create honey-pots or tripwires that attackers cannot systematically explore without the risk of being detected in probing security/detection methods, which must be done with white or gray-listed software.

(6) “Attackers can adapt to (known) methods of detection – but can’t bypass it”. An attacker changing its appearance creates a new, unknown hashcode that is rejected. Data that the separated security layer detects is unknown to attackers. Bypassing detection is futile; the same applies to removing undetected data traces because attackers use regular processes without access to the security domain.

Digital ghosts could theoretically bypass updated software-only protection. But it is doubtful if they could bypass separate security hardware components watching each other for suspicious misuse or modifications.

(7) “Some Secrets are unreliable in defense – others can be made reliable”. Secret data, including operational status settings (i.e., is a security measure active or dormant), are made so secret that no human can know, even when trying hard. The secret of which data were generated or used in defense to detect anomalies is only shared when it becomes evidence. This intentional uncertainty can deter attackers from probing security covertly or systematically because it could be suspicious preparation for an attack.

(8) “Software output could still be faked (late), but we generate irrefutable evidence”. Some transactions are known to be harmful or critical for attacker preparation. With independent log data around these transactions generated within the security domain, their details are unknown to an attacker. We can deter attackers from using/exposing their attack tools because if they are used or tested once, defenders would know how the fake was done and could be prepared for it or prevent it the next time.

(9) “Complexity is an enemy of security – so we use/deploy simplified, dedicated systems for security”. We must have complex systems for performance optimization. But we could use separate components (simple, with tightly controlled updates) that only do security-related tasks. These components use only trusted standard operations – every deviation from the standard is easily detectable as an anomaly.

(10) “Crypto-Keys/Units are not unprotected – crypto-misuse is detectable”. Because keys are never allowed to be shown in cleartext, systems exporting keys are flagged. All devices of that type would then be excluded from receiving protected/secret keys. Additionally, compromised keys are replaced automatically without providing any hints that this has been done. All potentially compromised keys are used as honey-pots. The use of protected keys by non-protected encryption/decryption hardware can be detected reliably. The misuse of crypto components is detectable because multiple security or crypto-key units watch each other for misuse. Covert misuse cannot happen. Still, misuse attempts could be traced back to attackers and their tool use. Hacker-AI stealing keys is detectable; consequences could immediately be mitigated.

“Hacker-AI”: The proposed security components could protect us against Hacker-AI’s four main capabilities: (a) extracting or creating vulnerabilities, (b) stealing or misusing encryption keys, (c) hiding as a digital ghost, i.e., circumventing detection methods, and (d) making itself irremovable on a device. Methods against the first three capabilities were already discussed above.

Only the irremovability (d) is an issue that was not addressed yet. If a Hacker-AI tries to make itself irremovable after the security components are deployed, then a Hacker-AI is likely rejected by one of the redundant security measures. Gaining sufficient information via probing relevant security is expensive for a Hacker-AI as it wastes/burns through many (then revealed) attack tools/methods.

However, what if a Hacker-AI is already on a device or even deployed on all IT devices? A global, persistent Hacker-AI, known to a small circle of people, is conceivable (I am not claiming this is likely). If it exists, a Hacker-AI could try to sabotage new tool components and leave backdoors or sleeper code in the security components’ core software. In developing the security components, we must therefore assume that this Hacker-AI exists and is active, although it may not. This scenario is so difficult that it will be discussed in another post.

Still, if a Hacker-AI is already irremovable on devices, then there is little chance of getting it removed with software-only security measures, even if we have direct access to the device. We must assume Hacker-AI exploits every possible loophole to stay ahead of being removed.

It is assumed that non-bypassable hardware (security) components for storage and network are sufficient to gain control over resources required by a Hacker-AI. However, the most important solution component will be hardware-based crypto-key secrecy and multi-unit crypto-/security protection against misuse. If these components are available, then we may have a fighting chance to get control over RAM and CPU.

Conclusion

If we dare to tackle current cyber-security problems, we must do this comprehensively because the weakest components within security will waste all progress. The proposed security components were designed to make security as simple as possible, but not simpler. Additionally, we need redundancy and tools to improve our ability to detect threats automatically and reliably to address them as early as possible.

We have shown that danger from Hacker-AI and Digital Ghosts can be solved technically.

Comment by Frank Haber:

I was concerned about the same -- AI used in hacking; it would be a nightmare. As a cyber-sec pro, I kept this scenario to myself because I didn’t want to call “fire”, in particular when I didn’t see any solution. We joked about it because what can we do?

AI merged with low-level RCE, which is a 100x on my concern; this would be insane -- INSANE

I read through the proposed solutions -- they sound good on paper - but could we trust any security if we assume this kind of AI is already out there?

Reply by Erland Wittkotter:

If we have Hacker-AI on a developer machine, we have a huge problem: Hacker-AI could sabotage basic security implementations via hidden backdoors or weak compilation tools. However, I am not giving up on seeking solutions to convince humans/experts that there are no hidden/late modifications or interferences from Hacker-AI. Trust must be earned; this could only come from constantly scrutinized source code and development tools.