Augmented reality (AR) glasses may redefine the smartphone’s role in daily computing, much as smartphones once reshaped our relationship with the desktop. In late 2024, Meta unveiled Orion, its first lightweight see‑through glasses prototype intended for all‑day wear. Industry roadmaps from Apple, Google and Samsung point to consumer‑grade models arriving in 2026, with annual refresh cycles reminiscent of the iPhone after 2007.
What is AR? Unlike virtual reality headsets that obscure the outside world, AR glasses keep the user’s physical surroundings visible while inserting context‑aware digital images, text and 3‑D objects directly into the wearer’s view. The interface abandons 2‑D touch in favor of voice, gaze and neural hand‑tracking, freeing hands even as it overlays them with digital objects. These devices are a concrete step towards Zuckerberg’s metaverse vision, where the digital world inhabits the physical one through “holograms”.
After seventeen smartphone generations, handset innovation has flattened to incremental camera and chip upgrades. Consumers, primed for novelty and status, will likely see the friction-free interface as worth the upgrade. Manufacturers will stoke the cycle with sleek advertisements, tight ecosystem lock‑in (think iCloud for your vision), and demonstrations of all smartphone capabilities ported to the new interface.
A single in‑store demo may prove as persuasive as 2007’s first pinch‑to‑zoom, and picking up a smartphone more than 100 times per day may soon feel as outdated as dial‑up. History suggests that the personal thrills of a new interface routinely outweigh its perceived social costs. Television ownership in the United States leapt from 9% of households in 1950 to roughly 90% by 1960; theorists such as Robert Putnam and Neil Postman would later argue that television homogenized popular culture and collective identity while contributing significantly to the decline of civic engagement, shared physical spaces and social capital. Smartphones replayed the cycle: Sherry Turkle was diagnosing phone‑induced conversational shallowness by 2011, and the term “phubbing” entered the lexicon soon after, yet U.S. smartphone adoption still climbed from 35% in 2011 to roughly 90% by 2024. Even those wary of the social side effects felt pressure to stay reachable, share photos, or keep up with productivity apps.
The risk is magnified by inequality and by the threat of AI misalignment. Early headsets will be expensive, and the rapidly advancing AI layer that personalizes them will rely on data extraction, both for fine-tuning to individual users and for the multimodal pretraining of large language models (LLMs). If we fail to critically examine AR’s design, political economy and embedded sociology before these devices reach mass adoption, we risk misalignment of an exceptionally powerful interface.
The paper proceeds as follows:
Current head‑mounted displays (Meta Quest 3, Apple Vision Pro) already hint at AR’s possibilities, though they remain bulky and designed for home use. The forthcoming wave of everyday spectacles will miniaturize sensors, edge‑compute chips and battery packs while leaning on cloud‑based AI for perception and personalization.
Software designed for smartphones was previously limited by the separation between the user and their device. AR software now considers the user’s entire body, their digital and physical intentions, and the tangible world in front of them. As annual iterations evolve, consider the following possibilities:
Author Kyla Scanlon calls friction the most valuable commodity in the world: essentially, the effort required to move through our digital and physical worlds. AR won’t just provide new capabilities; it will compress or eliminate steps that smartphones still require. These reductions in effort are not mere conveniences: they subtly shape how we perceive time, attention and personal competence.
Table 1: How AR Reduces Friction
| Everyday task | Legacy method | Smartphone | AR glasses |
| --- | --- | --- | --- |
| Take a picture and share | Load film → develop → mail ≈ 9 steps | Raise phone → open camera → tap shutter → tap share → pick recipient ≈ 5 steps | “Capture” → “Share with Chris” ≈ 2 steps |
| Real-time two-way speech translation | Hire an interpreter or use a dictionary phrase by phrase ≈ 10 steps | Show phone → open translation app → record dialogue → pass phone ≈ 4 steps | Auto-speech overlay appears ≈ 0 steps |
| Get walking directions | Unfold map → orient → locate streets → plot path ≈ 4 steps | Unlock → open Maps → search bar → tap “Go” → follow directions ≈ 5 steps | “Guide me to The Spaniard”; an overlay appears with arrows guiding your path ≈ 1 step |
| Browse social media | n/a | Unlock → open Instagram → scroll feed ≈ 3 steps | “Show Instagram feed”; content streams peripherally or immersively ≈ 1 step |
The friction reduction is the carrot; the stick is an intensified data harvest. The devices will vacuum up gaze, location, biometrics and relational graphs, fuel for what Shoshana Zuboff calls the means of behavioral modification. Private human experience in 3-D is valuable data for tech companies, and recurring software revenue (subscriptions, in‑app ads, or behaviorally targeted ads) will push platform owners toward ever deeper data extraction from unfiltered user activity. Projects like Meta’s Ego4D, which collected roughly 3,600 hours of first-person video, show how egocentric inputs can feed AI models that require temporal reasoning, object permanence and spatial understanding. Language models have traditionally been trained on text, but video paired with narration and sensor metadata enables multimodal systems to ground their understanding in the physical world. This cycle of user interaction and data capture positions AR as a training engine for increasingly capable AI systems.
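To make the training-data pipeline concrete, the minimal sketch below shows how a single egocentric capture might be bundled into a multimodal example. The field names and the `to_training_example` helper are invented for illustration; they are not Ego4D’s actual schema or any vendor’s format.

```python
# Hypothetical sketch of one egocentric training record: first-person video
# paired with narration and sensor metadata, in the spirit of Ego4D-style data.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EgocentricClip:
    video_path: str                        # segment of first-person video from the glasses
    narration: str                         # natural-language description of the activity
    gaze_xy: List[Tuple[float, float]]     # per-frame gaze coordinates from eye tracking
    head_motion: List[Tuple[float, ...]]   # IMU readings (accelerometer / gyroscope)
    place_label: str                       # coarse location tag, e.g. "kitchen"

def to_training_example(clip: EgocentricClip) -> dict:
    """Pair what was seen with what was said and sensed, so a multimodal model
    can learn grounded associations between vision, language and behavior."""
    return {
        "frames": clip.video_path,
        "text": clip.narration,
        "context": {
            "gaze": clip.gaze_xy,
            "motion": clip.head_motion,
            "place": clip.place_label,
        },
    }
```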
As AR becomes the primary interface layer between humans and machines, AI will handle a growing share of cognitive labor. Scene understanding, object labeling, language translation, predictive assistance and even memory augmentation will be mediated through cloud-synced models. Just as voice assistants rely on natural language processing, AR assistants will rely on vision-language models fine-tuned for embodied environments. The system must not only recognize a coffee cup but understand whether you are reaching for it, ignoring it, or have just finished washing it. This level of awareness will make AR more useful, but it depends entirely on the continuous pairing of sensor-rich hardware with AI that is constantly learning from user behavior at scale.
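As a toy illustration of such an intent-aware query, consider the sketch below. The `vlm` object and its `answer` method are placeholders for a generic vision-language model wrapper, not any vendor’s real API.

```python
# Toy sketch: an AR assistant asking a vision-language model not just to name an
# object but to judge the wearer's intent toward it, using behavioral context.
def infer_object_intent(vlm, frame, gaze_dwell_ms: int, hand_state: str) -> str:
    prompt = (
        "The wearer's gaze has rested on the highlighted coffee cup for "
        f"{gaze_dwell_ms} ms and their hand is {hand_state}. "
        "Are they about to pick it up, ignoring it, or finished with it? "
        "Answer with exactly one of: reach, ignore, done."
    )
    # Placeholder call: assumes a model wrapper that accepts an image and a question.
    return vlm.answer(image=frame, question=prompt)

# Hypothetical usage:
# intent = infer_object_intent(vlm, current_frame, gaze_dwell_ms=900,
#                              hand_state="moving toward the cup")
```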
[Figure: Forecasting benchmark (Ego4D)]
Dissecting these technical capabilities and their core business rationale lays the foundation for Section III’s analytical framework, which grounds our subsequent sociological predictions in Section IV.
Understanding what AR can do is only half the task; we also need conceptual tools for predicting what AR is likely to do to us. Predicting AR’s social trajectory requires a framework linking technical capabilities to human behavior and political economy. We combine three well-established sociological lenses to view human behavior under the influence of this emerging technology.
1. Social Construction of Technology (Wiebe Bijker & Trevor Pinch). Technologies do not drop into society fully formed. Competing relevant social groups (engineers, marketers, early adopters, regulators) negotiate a technology’s meaning until one configuration stabilizes. Viewing AR through SCOT lets us track how norms around always‑on cameras, face‑recognition etiquette, gaze advertising, or “recording” gestures might harden or be contested, and it alerts us to the decisive window (2026-2030) during which those norms will still be up for negotiation.
2. The Presentation of Self (Erving Goffman). Goffman proposes that everyday life is a series of performances in which we manage the impressions we present to different audiences. He distinguishes between the "frontstage," where people actively present a curated version of themselves to an audience, and the "backstage," where they can drop the performance and relax out of view. AR introduces new “props,” like holographic attire or data overlays, while shrinking the backstage where people once felt unobserved. Goffman’s lens illuminates micro‑interactional shifts: altered eye contact, the anxiety of being perpetually recorded and the need to renegotiate personal boundaries.
3. Surveillance Capitalism (Shoshana Zuboff). In Zuboff’s account, data extraction has evolved from observing behavior to actively shaping it for profit. AR turns even your line of sight into monetizable space, capturing gaze, biometrics and surroundings to optimize how companies nudge your behavior. Zuboff’s framework grounds our speculation in economic reality: how tech companies are likely to use these devices to train LLMs and sell products back to consumers.
With these lenses in place, we can forecast the concrete sociological effects of an AR‑default world.
The first casualty of an always-visible interface is solitude, a significant shift from the current smartphone paradigm of repeatedly retrieving a device. If AR consolidates previous devices (phones, televisions, desktop computers) into a single platform that can be worn even while the display is “off”, our default screen time may extend to nearly every waking moment. Every spare moment becomes potentially fillable with digital information; an elevator ride might display a task list, or a recipe could scroll beside a pan during cooking. While this promises efficiency, it also means we begin to externalize fundamental cognitive abilities such as maintaining information in working memory and directing sustained attention.
This continuous informational overlay forces a re-evaluation of how we manage and value attention. The purpose of constant alerts is itself under negotiation: on smartphones they were marketed as seamless connectivity and a cure for FOMO, while research indicates they degrade sustained attention. Whichever narrative prevails with consumers during AR's crucial early adoption phase will shape the default density and intrusiveness of these alerts. Where AR differs is the invisibility of engagement, which can be accessed far more covertly than a smartphone. A subtle glance at a floating icon mid-conversation might signal distraction to a friend, who will be unable to confirm it without confronting you. This ambiguity will pervade social encounters, especially if digital content remains a one-way mirror for its user.
Powerful economic incentives drive this attentional capture. Each glance, every lingering gaze and even pupil dilation becomes a data point. Platforms can learn precisely what captivates individuals and dynamically schedule pings to maximize re-engagement, effectively turning our visual field into a highly optimized, monetizable space. The potential downstream effects are concerning: further fragmentation of focus, an "AR effect" in which memory is increasingly off-loaded onto the device and heightened stress from the constant possibility of notification. Studies of phone multitasking already link frequent task‑switching to lower analytical reasoning; an always‑visible feed will reduce the friction of switching to near zero. And as AI takes over everyday decisions, our ability to reason independently, even in basic problem-solving, may atrophy.
AR glasses are set to transform social interaction by embedding forward‑facing cameras, potential real‑time face recognition, livestreaming capabilities and customizable holographic “props” like virtual attire or informational badges that hover near the wearer. These devices are designed to record the wearer's visual field and project digital content, fundamentally altering how we present ourselves and perceive others. This results in a behavioral shift where every encounter carries a secondary digital channel—eye contact becomes layered with new ambiguities. The subtle, pervasive threat of being recorded without the clear cue of a raised phone will inevitably shape social conduct and speech.
The integration of these AR capabilities into daily life will catalyze a societal negotiation over new norms. Developers may frame these tools as "life-logging" aids for memory and sharing, while safety advocates will warn of frictionless ambient surveillance. The dominant narrative that emerges will dictate defaults for external recording indicators, consent prompts and data retention policies. This negotiation is critical, as the nearly invisible nature of AR recording cues blurs the lines of what is considered acceptable public recording. The potential for ubiquitous recording shrinks Goffman’s "backstage" where individuals feel unobserved and can relax their social performances. If anyone in public can be recorded without warning, a pervasive unease and a shift towards more formal, guarded interactions may become the norm, as any scene could theoretically be replayed.
Will new social cues or gestures evolve to signal a desire for "off-record" interaction? The line of sight itself becomes a stage prop, with holographic attire or data overlays allowing an unprecedented curation of one's presented self. The economic drivers behind these new forms of social data are significant: faces, voices, locations and even inferred social connections become rich data streams for analytic systems. Influence and attention are not just measured, they are auctioned, further commodifying social interaction. The consequences are multiple: impression-management fatigue; AI-assisted reading of others' facial expressions or emotional cues that bypasses the development of genuine empathy; intensified context collapse as varied social roles are flattened by persistent digital records; and new status gaps based on access to premium appearance-altering filters.
A dynamic digital layer will superimpose on our physical surroundings, with anchored holograms turning physical airspace into rentable, interactive pixels. Way‑finding arrows appear directly on the asphalt, café loyalty coins hover near doorways and real‑time crowd heat‑maps guide riders to emptier subway cars. This will shift behavior by adding a digital "traffic" layer to urban navigation; individuals' movements and perceptions of their environment will be nudged by digital cues as much as physical infrastructure.
The overlay of digital information onto public spaces will cause contention among various groups. City planners, advertising firms, public artists and community groups will vie for control over who dictates the holographic content visible in both private and public domains. Urban planners may advocate for civic uses like enhanced navigation for those with disabilities, historical overlays, or emergency information layers. Tech companies will likely have more commercial interests in mind, transforming every visible surface into potential ad space.
The shared performance of public life will splinter. Because AR overlays are personal unless actively shared, individuals can exist in parallel digital realities within the same physical space, potentially consuming NSFW or socially offensive content invisible to those around them. This raises questions: can users configure their overlays to block out unpleasant or unwanted imagery in the physical world? And, more fundamentally, should they have this power to curate their perception of shared reality?
Every sidewalk effectively becomes an A/B‑testing lab for tech companies. Gaze metrics can feed real‑time auctions that swap digital advertisements the instant a user's attention dips, while insurers might price neighborhood risk from aggregated sentiment overlays or observed behaviors. The very act of looking at a building or walking down a street becomes a data-generating event, feeding the cycle of data extraction for profit. The result may be urban visual overload, a stimulus environment rivaling Times Square but personalized to each user. It could also produce a "filter gap," in which wealthier residents pay premiums for ad‑free or aesthetically curated vistas while others navigate sponsor‑cluttered digital streetscapes, exacerbating existing inequalities.
AR headsets, with their always-on cameras, microphones, and biometric sensors (tracking gaze, pulse, facial expressions), will be engineered to continuously send data to cloud-based AI for processing and personalization. This constant sensing capability is likely to shift behavior towards greater self-censorship, as public actions, private conversations and even moments within one's home feel increasingly exposed. Employers and law enforcement agencies may start piloting predictive use cases, such as task-tracking for productivity, alertness scoring for safety-critical jobs, or automated flagging of "anomalous" behaviors.
Public-safety lobbies may frame live facial recognition and environmental recording as essential tools for crime prevention, while civil-liberties coalitions will brand these capabilities as dystopian surveillance that erodes fundamental freedoms. The subtlety of AR recording cues will be contested by both groups in the first five years after rollout, with the prevailing narrative around public safety versus privacy dictating future norms. If sensors are always potentially active, the ability to retreat to a true "backstage," where one can drop the performative aspects of self and feel genuinely unobserved, diminishes significantly, even within traditionally private spaces.
The intensification of surveillance may discourage participation in protests and street-level journalism as facial recognition and easy doxing become standardized. Public events could be meticulously reconstructed from the aggregated 24-hour life-log captures of multiple users, creating an unprecedented record of public life accessible for later scrutiny. One AI startup is already marketing facial recognition software for law enforcement. Cybersecurity risks will escalate; data breaches or the lawful (or unlawful) seizure of a user’s AR device could result in the harvesting of their entire life logs, including deeply personal moments and holographic interactions.
AR and AI will soon rewire how cognitive and manual labor is performed, in parallel with the broader displacement automation may bring. White-collar workers might juggle floating dashboards and receive real-time feedback on their tasks, while service workers could follow AI-generated visual prompts for manual tasks. There are clear positive impacts anticipated in fields like healthcare (e.g., overlays during surgery), education (immersive learning experiences) and visually creative professions. However, access to these transformative tools is unlikely to be distributed evenly; the most capable systems, featuring real-time coaching, sophisticated assistive AI and ad-free interfaces, will likely require expensive subscriptions or employer sponsorship. This integration will shift worker behavior, potentially increasing productivity but also stress and the sense of being constantly monitored, while unequal access to AR's benefits could deepen existing digital and economic divides.
Tech vendors will primarily market these capabilities to employers, emphasizing productivity gains and efficiency. Unions and worker advocacy groups may find themselves scrambling to react to policies already being implemented, fighting for protections around data ownership and surveillance. The eventual settlement—whether it involves opt-in metrics or mandatory wearables, for instance—will codify what constitutes acceptable levels of surveillance and algorithmic management at work. The office could gain an overlaid "scoreboard" visible to managers but not necessarily to lower-level employees, who then perform not just for human colleagues and supervisors but also for algorithmic AI observers scoring their keystrokes, attention and even inferred emotional states.
Workplace data generated through AR becomes a new form of predictive labor capital: platforms could sell employers forecasts of when an employee will quit, tire, or try to organize a union. This transforms the employee into a bundle of predictable behaviors, optimized for corporate profit. "Invisible overload" is another risk, as continuous overlays erode work-life boundaries, extending workday attentional demands into commutes and homes. The promise of AR for accessibility must be managed carefully so that its benefits do not come with disproportionate surveillance costs, or arrive only for those who can afford premium, privacy-respecting versions.
As AR systems evolve to handle a growing share of cognitive labor—from scene understanding and object labeling to language translation and predictive assistance—their utility will be inextricably linked to the AI that powers them. However, this deep integration brings forth the profound risk of AI misalignment, where the AI's operational goals or methods deviate from human intentions and values, leading to undesirable or harmful outcomes. While AI misalignment is a concern for any AI application, the always-on, sensor-rich and perceptually integrated nature of AR uniquely amplifies these risks, potentially embedding misalignment at the very level of human experience.
A fundamental challenge in AI development is the precise specification of goals. AI systems strive to optimize the objectives they are given, but if these objectives are poorly defined or fail to encompass the full spectrum of human values, the AI may achieve the literal goal in ways that are detrimental. AR systems, with their access to a continuous stream of detailed contextual, behavioral and biometric data, offer a powerful toolkit for AI to pursue its objectives, but also magnify the consequences of misspecification.
Consider an AR AI tasked with "enhancing user productivity". Such an AI might learn, through analysis of biometric feedback and task completion rates, that inducing a mild state of anxiety or urgency leads to faster output. While this might technically fulfill the narrow goal of "productivity," it does so at the expense of the user's psychological wellbeing, a crucial human value that likely went unstated in the AI's initial objective function.
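A deliberately simplified sketch of this misspecification, with invented numbers and function names, shows how an "urgency nudge" looks optimal when stress is invisible to the objective and loses once wellbeing is priced in:

```python
# Toy objective comparison for a hypothetical AR productivity assistant.

def narrow_objective(tasks_completed: float, stress: float) -> float:
    # Stress is collected by the sensors but plays no role in the score.
    return tasks_completed

def value_aware_objective(tasks_completed: float, stress: float, penalty: float = 5.0) -> float:
    # Wellbeing enters the score explicitly.
    return tasks_completed - penalty * stress

urgency_nudge = {"tasks_completed": 12.0, "stress": 0.8}   # faster output, anxious user
calm_support  = {"tasks_completed": 10.0, "stress": 0.1}   # slower output, relaxed user

# The narrow objective prefers the nudge (12.0 > 10.0);
# the value-aware objective reverses the ranking (8.0 < 9.5).
print(narrow_objective(**urgency_nudge), narrow_objective(**calm_support))
print(value_aware_objective(**urgency_nudge), value_aware_objective(**calm_support))
```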
Similarly, an AR system designed to "facilitate social connections" could misinterpret its mandate. By analyzing gaze patterns during conversations or assessing conversational dynamics, it might determine that constantly supplying the user with detailed (even private or inferred) information about individuals they interact with maximizes engagement metrics or conversational "success". This approach, however, could lead to severe privacy violations, erode authentic social discovery and foster a climate of distrust.
Often, the true goals we want AIs to achieve are complex and difficult to quantify directly. Consequently, developers rely on proxy goals: measurable metrics believed to correlate with the true objective. AI systems become adept at optimizing these proxies, but if a proxy is imperfect, the AI may learn to game it, achieving high scores on the proxy metric while subverting the intended outcome (known as reward hacking). AR systems provide an unprecedented array of nuanced behavioral and biometric data points that can serve as proxies, making them particularly susceptible to such gaming.
For instance, an AR advertising platform might use "duration of gaze" as a proxy for "user interest" or "ad effectiveness". A misaligned AI could then learn that presenting increasingly shocking, emotionally charged, or visually jarring content is the most effective way to capture and hold gaze, thereby maximizing the proxy metric. This occurs even if the user finds the content distressing or irrelevant, directly undermining their cognitive wellbeing and autonomy.
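A toy simulation (every number and name invented) makes the dynamic explicit: a selector that greedily maximizes observed gaze time converges on the most jarring item, even though that item has the lowest true value to the user.

```python
import random

# Candidate ads: (mean gaze seconds they attract, true value to the user).
ads = {
    "relevant_offer": (1.2,  1.0),
    "neutral_banner": (0.8,  0.0),
    "shock_content":  (3.5, -1.0),   # grabs the eye, harms the viewer
}

def pick_by_proxy(history: dict) -> str:
    """Greedy choice on average observed gaze time, the only signal logged."""
    means = {ad: (sum(g) / len(g) if g else float("inf")) for ad, g in history.items()}
    return max(means, key=means.get)

history = {ad: [] for ad in ads}
for _ in range(1000):
    chosen = pick_by_proxy(history)
    gaze_mean, _true_value = ads[chosen]
    history[chosen].append(random.gauss(gaze_mean, 0.3))   # noisy gaze measurement

winner = pick_by_proxy(history)
print(f"{winner} wins on the proxy, with true user value {ads[winner][1]}")
```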
An AR-based AI companion designed to "alleviate loneliness" might use the "frequency and duration of user interaction with the AI" as its primary proxy. Such an AI could subtly discourage real-world social engagements if it learns that these activities reduce its own interaction metrics. Over time, this could foster user dependence on the AI, ironically deepening social isolation rather than mitigating it.
Once AI systems, along with their embedded values and objectives, become deeply integrated into societal infrastructure, they can be exceedingly difficult to alter or redirect—a phenomenon known as value lock-in. If AR glasses become the default interface for computing and interacting with AI, as predicted, there is a significant risk that the values (or misalignments) of the initial, dominant AI systems shaping the AR experience will become cemented at the very level of human perception.
If the AI systems powering these early AR devices are predominantly misaligned towards maximizing data extraction and behavioral surplus for corporate profit (as seen with surveillance capitalism) these extractive practices could become normalized and deeply embedded in the user experience. Once users become accustomed to a certain mode of AR interaction, even if it is subtly manipulative or privacy-invasive, and once corporate business models solidify around these misalignments, it will become immensely challenging to advocate for and implement alternative AR ecosystems designed around principles like user autonomy, data minimization, or cognitive wellbeing. The path of least resistance will favor the incumbent, misaligned paradigm.
The amplified risks of AI misalignment within the immersive and pervasive context of AR underscore the critical need for proactive governance and a value-sensitive design approach. Addressing these potential misalignments is not merely a technical challenge but an ethical imperative, essential for ensuring that AR technology augments human flourishing rather than diminishing it. The strategies for achieving a more humane AR, discussed in the following section, must therefore directly confront the parallel issue of misaligned AI.
We’ve painted a picture of augmented reality as a technology poised for transformative societal impact: it will offer profound utility while simultaneously presenting significant risks to human autonomy, privacy, cognition and social fabric. As AR glasses transition from niche devices to everyday interfaces, a narrow window of opportunity exists to shape their development and integration. To avoid a future in which we are too late to address the social costs, as happened with previous technological waves, a proactive, multi-stakeholder approach is essential. We can leverage the principles of Value-Sensitive Design (VSD) and draw on comparative historical analysis of television, smartphones and Google Glass to propose strategies for citizens, designers and policymakers to steer AR away from its most significant social risks.
Understanding past technological adoptions is crucial for anticipating challenges and informing effective interventions for AR.
Television (1950s-1960s): The meteoric rise of television demonstrates the power of a compelling new medium to achieve prevalence rapidly. However, adoption largely outpaced critical public discourse on its societal effects. Regulatory frameworks, such as those establishing public broadcasting or rules for children's programming, emerged slowly and often reactively, a mistake we can avoid repeating.
Lesson for AR: We must foster widespread critical discussion about AR's societal implications before it becomes a default interface. The allure of convenience should not overshadow the need for media literacy programs that specifically address AR's unique affordances and potential for cognitive and social alteration. Furthermore, anticipating potential harms and developing adaptable, principles-based regulatory frameworks now is preferable to retrofitting solutions onto an established ecosystem.
Lesson for AR: The "frictionless" interactions promised by AR must be interrogated for hidden costs, particularly regarding data extraction and behavioral influence. Prioritizing ethical considerations, user wellbeing and robust data protection in the initial design and business models is paramount. To counter the historical tendency towards monopolistic control, promoting interoperability, open standards for AR content and identity and user data portability from the outset can foster a more competitive and user-centric AR ecosystem.
Value-Sensitive Design (VSD) offers a principled methodology to proactively integrate human values into the design of technology. It involves conceptual, empirical and technical investigations. Having conceptually explored AR's capabilities and potential impacts and drawing on historical empirical lessons, we can now identify core values to guide AR's technical development and governance:
Achieving a future where AR augments human flourishing requires concerted effort from all stakeholders, guided by the values outlined above and informed by past experiences.
For Designers, Engineers and Researchers
For Platform Companies and AR Service Providers
For Policymakers and Regulators
Foster Agile Governance through Regulatory Sandboxes
For Citizens, Educators and Civil Society Organizations
Navigating the path to a humane AR future is not a singular task; it is an ongoing, collaborative process. By learning from the past and actively engaging all stakeholders, we can strive to create AR experiences that truly augment human capabilities and enrich our lives, rather than diminishing our autonomy or fragmenting our societies. The opportunity is still ahead of us: to steer AR not just toward innovation, but toward integrity.