"The dot-com boom may be over, but demand for technology remains strong"
-Bill Gates, 2001
Bill Gates accurately predicted the resilience of internet demand, with nearly six billion people using the internet today. But what is the state of the supply feeding that demand?
TL;DR
Measuring how data sources affect model behavior enables early detection of contamination, manipulation, and degradation. This replaces trust-based assumptions with verifiable signals before damage moves downstream.
Rise of bots
In 2008 I was a teenager playing World of Warcraft, intrigued by the capabilities of botters and gold farmers. The dangers of engaging with bots were losing account access to scammers or getting banned. Although the risks didn't deter me at that age, there was something inexplicable about the authenticity of ownership and the sense of belonging in cooperative play that mattered more. Legal action eventually forced Blizzard and bot providers to change, and the integrity of one of the most popular games in history was restored. In hindsight, a negligible case: the scope was limited to a fraction of the population, and the impact to virtual gold, silver, and copper.
Cloud-induced exponential risk
Fast-forward to 2015. The cloud revolution is in full swing, and vendor lock-in is treated as a mere bottom-line concern. Cloud technology let organizations negotiate at a strategic level, assuming responsible rollouts and best practices. The result was unified identity and access management, hosting, system architecture, monitoring, and third-party marketplaces allowing anyone to bring their ideas to life. Abstraction and lowered barriers to entry for software development blinded its adopters, the suppliers of internet content, to the risk they took on. In response, the industry created roles to fill the gap, such as Cloud Strategist, Cloud Transformation Lead, and Cloud Governance Lead. Nevertheless, cybercrime costs the world an estimated $1 trillion per year, while the cybersecurity market totals "only" about $270 billion.
Societal Impact
Progress in privacy is being undermined by an increasingly unstable world in which the largest powers openly conduct cyberattacks, espionage, and social erosion, and Big Tech lobbies with little to no pushback from governments. Meta reduced its efforts to combat false information. Deloitte and the US government were caught delivering hallucinated evidence.
Masked by shaky productivity claims, synthetic data is increasingly entering the systems that build and oversee our systems. OpenAI's technological breakthrough cannot be held accountable for the absence of litigation and responsibility downstream at Big Tech. Combined with a growing, interdependent technological surface, this raises the question:
Can the world afford not to segregate organic supply from the artificial?
Model-generated outputs are feeding back into models while the line between truth and falsehood is increasingly blurred. There needs to be a way to understand what impact data has on the capabilities of the automated tools of today and tomorrow, to ensure reliability and bolster our society in an era of unprecedented technological development.
Intervention Mechanism
The proposed intervention is to make the influence of data auditable and measurable relative to a model's intended purpose. Instead of relying on trust-based accountability, organizations can independently detect contamination and adversarial manipulation. By starting with visibility into data impact, teams running pilot projects can make faster decisions about data selection and training priorities, cutting short the industry trend of drawn-out pilot phases. In the medium term, influence-aware knowledge distillation can be offered as a service that decouples quality and security assessments from model access, preserving the privacy of proprietary models during evaluation. In the long term, it can serve as a shield against model collapse by verifying data integrity at scale.
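To make "auditable influence" concrete, here is a minimal sketch of one established approach: scoring each training example by how strongly its loss gradient aligns with the gradient of a reference example (a TracIn-style dot product at a single checkpoint). The toy logistic-regression setup, variable names, and the label-flipping used to simulate contamination are all illustrative, not a production design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    """Gradient of the logistic loss for one example (x, y) at weights w."""
    return (sigmoid(w @ x) - y) * x

def influence_scores(w, X_train, y_train, x_ref, y_ref):
    """Dot product of each training gradient with a reference gradient.
    Strongly negative scores flag examples that push the model away from
    the reference behavior -- candidates for contamination review."""
    g_ref = grad_logloss(w, x_ref, y_ref)
    return np.array([grad_logloss(w, x, y) @ g_ref
                     for x, y in zip(X_train, y_train)])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
y[:5] = 1 - y[:5]  # simulate a few contaminated (label-flipped) examples

# Crude gradient-descent loop, just to obtain plausible weights.
w = np.zeros(5)
for _ in range(500):
    w -= 0.1 * np.mean([grad_logloss(w, x, t) for x, t in zip(X, y)], axis=0)

scores = influence_scores(w, X, y, X[100], y[100])
suspects = np.argsort(scores)[:10]  # most negative influence first
```

Real systems would aggregate such scores across many reference examples and checkpoints, but even this single-checkpoint version shows the core idea: influence becomes a measurable quantity per data point rather than a matter of trust in the supplier.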