What is the metaverse?
2 December 2021 | 10:38 pm

Trends in tech come along every so often, co-opting and organising markets and sub-technologies around them like iron filings around a magnet. “Metaverse” is the latest, big enough that Facebook has renamed itself Meta to symbolise its enlarged focus. So I wanted to organise some of my own thoughts about what it is.

A trend is a fuzzy-edged phenomenon, a hyperobject touching on: products, protocols for inter-op, technology stacks, typical business models, and so on. So any definition is incomplete. The sharp end is the product experience, which is where adoption happens, and that drives everything else.

So what is the product experience of the metaverse? Loosely I see it as having three essentials:

  • Immersive
  • Multiplayer
  • Economy

Is it immersive?

I think about immersion on a spectrum. At one end you’ve got VR:

  • Full embodiment is 3D virtual reality, headsets, and sensors that detect muscle movement
  • Slightly below that there are video games like Fortnite or Roblox, or 3D environments like Arium (the pseudo-VR browser-based art gallery platform)

But “immersion” doesn’t have to mean entering cyberspace. You can get lo-fi immersion if these qualities get across:

  • a persistent world – this exists even when you aren’t there.
  • place – a sense that I am somewhere other than the chair I’m sitting in right now.
  • shared objects – everyone here is seeing and interacting with the same things.

And you know what? You don’t need VR for that. Sure it’s easier with 3D graphics and avatars, but those aren’t essential. You can have persistent worlds with a strong sense of place (and moving between places!) with text-based games (MUDs and MOOs, to go way back) – 2D graphics on the web can work just as well.

Is it multiplayer?

Today our apps, docs, webpages and computing environments, by default, are personal – the P in the PC. It takes work to make them social. In the metaverse, it’s social by default, and it takes effort to have a non-shared experience.

This is something different from the social of “social media,” or the comments and ratings you get with (ugh) UGC, “user generated content.”

To differentiate from the old social of the existing web, the term of art is multiplayer.

And that connotes liveness. You need a sense of presence. The ability to collaborate on shared objects in the shared world, whether for work or fun. Faces, emotion, video – all of these contribute to a transporting sense that you are surrounded by other people.

Yes that’s easier with video games and virtual reality. Immersive features such as place and proximity (and distance) make it possible for crowds to co-exist.

But again it can be lo-fi. I was previously tracking how the web is going multiplayer and I recently ran across another great example: tldraw is a tiny web-based drawing app. It’s elegant and playful. You scribble on the page, that’s all.

…then look in the menu. There’s an option named Create a Multiplayer Room. Select it. Grab the address from the address bar and share it with a friend. Now you can see each other’s cursors and you’re drawing on the same canvas. No bother no fuss. A little glimpse of the metaverse, right there.
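
To show how little machinery that takes, here’s a minimal sketch of shared cursors – not tldraw’s actual code, just the general shape, assuming a hypothetical relay server that rebroadcasts each message to everyone else in the room:

```typescript
// Minimal shared-cursor sketch: every client broadcasts its pointer
// position to a room; every client renders everyone else's cursor.
// Assumes a hypothetical relay at wss://example.com that rebroadcasts
// each message to the other clients in the same room.

type CursorMsg = { id: string; x: number; y: number };

const me = crypto.randomUUID();                  // this client's identity
const room = location.hash.slice(1) || "lobby";  // room name from the URL
const ws = new WebSocket(`wss://example.com/rooms/${room}`);

// Broadcast my cursor position as it moves.
document.addEventListener("pointermove", (e) => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ id: me, x: e.clientX, y: e.clientY }));
  }
});

// Render everyone else's cursor as an absolutely-positioned dot.
const cursors = new Map<string, HTMLElement>();
ws.onmessage = (event) => {
  const msg: CursorMsg = JSON.parse(event.data);
  if (msg.id === me) return;
  let el = cursors.get(msg.id);
  if (!el) {
    el = document.createElement("div");
    el.style.cssText =
      "position:fixed;width:10px;height:10px;border-radius:5px;background:tomato";
    document.body.appendChild(el);
    cursors.set(msg.id, el);
  }
  el.style.transform = `translate(${msg.x}px, ${msg.y}px)`;
};
```

The shared canvas itself is the hard part (you need conflict resolution, CRDTs and so on), but presence – seeing each other’s cursors – really is about this cheap.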

Does it have an economy?

So we’ve got a shared, persistent world with shared, persistent objects. And it’s multiplayer. The last ingredient is economics.

By “economics” I don’t just mean buying objects (perhaps digital assets like costumes or upgrades) or even virtual land (such as in Decentraland, a land-based economy on the blockchain). What’s important is that these objects are assets. You must be able to sell them; they and their “ownership” must exist in a marketplace that transcends the platform in which they manifest.

So that implies a concept of identity, money, and rights that exists outside any one immersive, social platform.
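
To make that concrete, here’s a sketch of the minimum viable data structure: an asset record whose ownership can be checked without asking the originating platform. The shape is hypothetical (it’s not any real standard), and Node’s built-in signature check stands in for whatever scheme a real marketplace would use:

```typescript
// Hypothetical shape for a platform-independent asset: anyone holding
// the issuer's public key can verify the record and its ownership, so
// a marketplace doesn't need the originating platform's say-so.
import { verify } from "node:crypto";

interface AssetRecord {
  assetId: string;   // globally unique ID for the object
  platform: string;  // where the object manifests, e.g. "decentraland"
  owner: string;     // owner's identity: a public key, not a platform account
  issuedAt: number;  // Unix timestamp
  signature: string; // issuer's Ed25519 signature over the fields above, base64
}

function isOwnedBy(record: AssetRecord, issuerPublicKeyPem: string, claimant: string): boolean {
  const payload = Buffer.from(
    JSON.stringify({
      assetId: record.assetId,
      platform: record.platform,
      owner: record.owner,
      issuedAt: record.issuedAt,
    }),
  );
  return (
    record.owner === claimant &&
    // For Ed25519 keys, Node's crypto.verify takes null as the algorithm.
    verify(null, payload, issuerPublicKeyPem, Buffer.from(record.signature, "base64"))
  );
}
```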

Web3 is one obvious stack for this – or at least the collection of technologies; the stack hasn’t cohered yet. By Web3 I mean the crypto (cryptocurrency, rather than cryptography) world of: identity; payments; contracts; ownership; currency; and the entire pile of derivatives that can be created. Yes, NFTs are a big part of that. A powerful enabler.

But I don’t see that crypto is intrinsic to having an on-platform economy. It could happen with the dollar (or fiat currency generally).


So the metaverse is a product experience that is immersive and multiplayer with built-in economics.

And a metaverse company is a company that provides that or is somehow part of the stack. Maybe they provide an enabling technology, like easy-to-integrate presence or treasury, or maybe there’s a yet-to-be-identified marketing/distribution mechanism that has a particular requirement on analytics, or maybe they provide an interface like smart glasses. It’s hard to know at this point what the dynamics will be, or where value will be extracted in the value chain.


The history of the metaverse

It’s been a while since I’ve read Snow Crash, Neal Stephenson’s 1992 sci-fi novel in which he invented the concept, so I’ll just grab this from the Wikipedia page on Metaverse instead:

Neal Stephenson’s metaverse appears to its users as an urban environment developed along a 100-meter-wide road, called the Street, which spans the entire 65,536 km (2¹⁶ km) circumference of a featureless, black, perfectly spherical planet. The virtual real estate is owned by the Global Multimedia Protocol Group, a fictional part of the real Association for Computing Machinery, and is available to be bought and buildings developed thereupon.

Users of the metaverse access it through personal terminals that project a high-quality virtual reality display onto goggles worn by the user, or from grainy black and white public terminals in booths. The users experience it from a first-person perspective. …

Within the metaverse, individual users appear as avatars of any form, with the sole restriction of height, “to prevent people from walking around a mile high”. Transport within the metaverse is limited to analogs of reality by foot or vehicle, such as the monorail that runs the entire length of the Street.

So we’ve already got these three qualities: it’s a persistent world, social, with a built-in economy. It’s dogmatically physical and uses VR, which creates this sense of immersion, which I guess is why Meta née Facebook is working on haptic gloves.

And the economics is kinda ugly. Ruthlessly commercial, and no room for the open source ethos that was the foundation of Web 2.0, the current generation of the web.

But the lineage is clear.

The biggest difference, for me, is that Stephenson’s capital-M Metaverse is singular. There’s one of it. That’s evidently what The Corporation Formerly Known As Facebook imagines too, and they’ll own it. It’s possible that TCFKAF is right, and they’ll be the ones that win big.

The web - or rather the application and protocol WorldWideWeb - was a blip. I think that’s clear now. It was agnostic to document type, happy to link to email, gopher, image, and hypertext. It was frivolously free with assets: when you look at a webpage, the images are downloaded to your computer first and then assembled into a document! You can even “view source” to see the code behind a page. Websites are like applications that wear their source code on the outside.

The web isn’t how systems are typically architected. So we can’t take it for granted that we’ll end up with a small-m metaverse – a distributed network of interconnected metaverses, sharing identity and an economy but otherwise independently immersive. If that’s what we want, we’ll have to work for it.


Web 2.0

I think the last major technology trend like this was probably apps and the smartphone, but actually it’s hard to distinguish that from Web 2.0 that came just before. And it’s interesting to look back at O’Reilly Media’s catalysing 2005 essay: What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software.

What you’ve got in the “meme map” (on page 1 of the five-page essay) is an approach totally born of the new open, networked, social, apps-not-docs web. It’s a set of approaches that implies a set of technologies and commercial models.

Cited are ideas like:

  • Software that gets better the more people use it
  • Services, not packaged software
  • Architecture of Participation
  • Data as the ‘Intel Inside’
  • Granular Addressability of content
  • Play

And in the years since, we’ve seen these formalised into social media, software as a service (SaaS), tools like git, and ways of working like agile – all applicable to this fast-moving, fast-growing world. Cloud platforms starting with Amazon Web Services and pioneering tech like Ruby on Rails grew up with Web 2.0. The economics (and economies of scale) of cloud platforms prescribe a cost model for companies, and that prescribes a revenue model: subscription for B2B, and the attention economy for B2C.

Web 2.0 even included its own aesthetic, so participants in the trend could recognise one another versus older, “enemy” approaches like Enterprise. We still have the warm, bold colours, the chatty brands, the rounded-off corners, and the cutesy illustrations of 2005.

Web3 is the re-platforming of Web 2.0 to be crypto-native: new identity, new payments, and new modes of collaboration.

Web3 has its own aesthetic and characteristic visual style. Even its own art movement with NFTs – see the new Outland magazine for art criticism in this domain. And it has its own shibboleth words to identify in-group and out-group. (gm.)

But Web3 isn’t a comparable trend. It’s the metaverse which rivals Web 2.0. The technology stack, the aesthetics, the community, and the products all grow up together. Web3 is part of the puzzle; the metaverse is the whole shebang.

What happened with Web 2.0 is that it came true, but too much.

“Software that gets better the more people use it” is another way of saying that there aren’t any limits on network effects. Platform capitalism (Nick Srnicek’s term, mentioned in my Thingscon talk last year) is rapacious. We have one Facebook, not ten thousand. Whoops.

“Architecture of Participation” led to the sharing economy… which was co-opted and led to the gig economy, and to so-called “sharing” marketplaces like Airbnb and Uber mining the under-specified edges of the social contract.

The metaverse also contains as-yet unknown failure modes. It would be worth puzzling them out now.


Does the metaverse matter?

We can decompose the question of whether the metaverse trend matters into two parts:

  • Is the metaverse significantly different? Yes: it introduces the concept of scarcity (of space, time, and ownership) into the digital realm, which since its inception has been built on the approximately zero marginal cost of duplicating assets or of reaching one more person. Just as Web 2.0 led to new community structures, new business models (and the attention economy), new ways of working, and even new expectations for government (witness GOV.UK in the UK), the metaverse will have its own long-term cultural effects.
  • Will the metaverse come about? Well now that’s a matter of opinion.

I will say that the metaverse trend has two qualities in common with two other large scale trends that I saw up close, or maybe let’s call them movements: Web 2.0 (described above) and Tech City, London’s transformation into a global startup hub (I was part of the inception).

Both had multiple constituencies that pushed for the movement for often barely overlapping reasons. With Web 2.0: corporations, individuals, investors, and customers up and down the technology stack. In the case of Tech City: large corporations and government; founders and real estate owners; lawyers and journalists. Everyone felt they could immediately get more for their particular ship by working to raise this particular tide, and they did this without being instructed what to do.

With the metaverse we have crypto-libertarian tech nerds from Web3 somehow aligned with platform-monopolist VR-maximalists from Facebook. Their values couldn’t be more opposed, yet they are boosters for the same trend.

When a movement creates alignment without coordination, that’s a powerful force.


Thanks to Ed Cooke, Thomas O’Duffy, and others at Sparkle for knowledge and conversations. Blind spots and misconceptions all my own. As always this is a snapshot: thinking out loud rather than a final view.


Local streets for local people
26 November 2021 | 4:24 pm

I wonder how we can implement the social contract via technology, and how that can be done democratically.

A case study to explain what I mean…

One of the slow controversies in London over the past year has been the Low Traffic Neighbourhoods (LTN) programme: closing many residential streets to road traffic, sending cars onto main roads instead. There’s some background here including how it was built out of a schools-focused programme during the first lockdown (streets outside schools and on regular walking routes were closed to cars).

LTNs are a joy and a pain.

The future of the city involves fewer cars, we all know that. Walking on these quiet streets and having coffee in the parklets now built outside cafes is transformative. BUT the schemes channel cars onto already congested main roads, and semi-local trips that aren’t well served by buses are made much more difficult.

Jimmy Tidey’s brilliant research has shown how LTNs kicked off a culture war on Twitter – though catalysed by a relatively small number of vocal black cab drivers. There are posters in almost every local shop against LTNs and they’re often vitriolic. I spotted a banner headline, All Streets Matter. Breathtakingly tin-eared.

For me, the root of the vitriol is that two constituencies of people feel it is unfair.

  • Local people feel like these are our streets – why shouldn’t we be able to drive down them?
  • Black cab drivers used to have special access to The Knowledge: a detailed mental map of London’s short-cuts, effective precisely because it was specialist. But it was made accessible to everyone with Google Maps (which, I remember hearing, has contributed to a 30% traffic rise on residential streets over the past decade). And now we’re being “punished” for the traffic with street closures, but the cab drivers feel it’s not their fault and they should still have access.

The problem is exacerbated by technology. The LTNs are often in effect for only some of the day, so the street isn’t physically blocked. The closure is implemented by a road sign, cameras with automatic number plate recognition, and penalty fines sent through the post. One of my neighbours has been stung by a series of 65 quid fines, having sailed through computer-closed streets accidentally a number of times. So, poor software.

But technology is also perhaps part of the solution!

Long term we’ll have self-driving cars. We won’t need to close streets with bollards and impose fines – the cars can be programmed. The Low Traffic Neighbourhood policy will be a software point release.

So let’s think about how to bias that future pathfinding algorithm for fairness.

Perhaps what we’ve identified is that local people have more “moral right” to use their neighbourhood roads than people from across town who are using the street as a shortcut. Those people from across town feel like freeloaders: they’re taking the benefit of the cut-through but they don’t have to live here.

(Something similar happened in Los Angeles when Waze became popular. People went to extraordinary lengths to protect their local streets by fooling the Waze maps. As discussed here in December 2020.)

Could we say that fairness means: local streets for local people?

What if we had some way of categorising roads on a spectrum from small (local and residential) to big (thoroughfares)? If you live within 1 mile of a small road, it’s free to use. Over a mile, it’s thoroughfares only or you get a penalty.

The existing Low Traffic Neighbourhoods would see some cars again, but traffic volumes would be low: the streets would be closed to any car from outside the local area.

Ok, as a thought experiment that works for the future. NOW we can ask about how to implement this without waiting for robot cars. Could LTNs be implemented in software today?

From a product perspective, the answer is yes.

Let’s imagine we have multiple routing modes in Google Maps. Perhaps the different algorithms are embodied as different characters, just like each ghost in Pac-Man embodies a different chase algorithm. (I’m picking on Google Maps but I’m using this as a stand-in for all routing apps.)

In addition to the “quickest” mode and the “most fuel efficient” mode, there would be a “social contract” mode – and it would be the default. This mode would avoid residential streets outside a 1 mile radius of your home or 0.5 miles from your destination.
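
As a sketch of what “social contract mode” might mean in code: a hypothetical per-segment cost function that a standard shortest-path search (Dijkstra, A*) could use. Residential streets cost their normal travel time near home or destination, and become prohibitively expensive elsewhere – the radii are the 1 mile / 0.5 mile figures above:

```typescript
// Hypothetical edge-cost function for "social contract mode" routing.
// Residential streets are priced normally near home or destination and
// made prohibitively expensive elsewhere, so a shortest-path search
// naturally routes long trips onto thoroughfares.

type RoadClass = "residential" | "thoroughfare";

interface Point { lat: number; lon: number }

interface RoadSegment {
  roadClass: RoadClass;
  midpoint: Point;
  baseCostMinutes: number; // travel time used by "quickest" mode
}

const HOME_RADIUS_MILES = 1.0;
const DEST_RADIUS_MILES = 0.5;
const PENALTY = 1e6; // effectively forbids the segment

// Great-circle distance in miles (haversine formula).
function milesBetween(a: Point, b: Point): number {
  const R = 3958.8; // Earth radius in miles
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

function socialContractCost(seg: RoadSegment, home: Point, dest: Point): number {
  if (seg.roadClass === "thoroughfare") return seg.baseCostMinutes;
  const nearHome = milesBetween(seg.midpoint, home) <= HOME_RADIUS_MILES;
  const nearDest = milesBetween(seg.midpoint, dest) <= DEST_RADIUS_MILES;
  return nearHome || nearDest ? seg.baseCostMinutes : seg.baseCostMinutes + PENALTY;
}
```

Feed that cost function into any off-the-shelf router and the policy falls out of the pathfinding.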

Through legislation, “social contract mode” for map routing would be mandated for all in-car navigation from 2026.

Sounds plausible. The question then becomes… how could a policy like this get enacted? Three challenges:

  • Politicians and civil servants don’t know enough about technology to know what is hard and what is easy. Nor do the media or the public: technology is a specialist subject. So how can we have a meaningful public debate – which is what we need for democracy?
  • Mandating a product solution is the wrong level of abstraction. But say the outcome was somehow made law: how would it even be expressed? How do you write a requirement like this into legislation? Must the social contract be baked into product features?
  • Once we open the door to the state interfering in tech product decisions, how are stupid decisions defended against? What about dangerous decisions, or ones that reduce liberty? We have norms and laws and centuries of philosophy (and in some countries, a constitution) for the limits of how the state may limit the freedom of the individual, but we aren’t nearly as sophisticated when it comes to technology.

I don’t know the answers to these, but the utility of having a specific case study such as Low Traffic Neighbourhoods is that we have something concrete to debate.


London traffic is a specific case of something general and important, which is how society uses technology to enact its values, and what the mechanisms and limits on this should be.

Another instance of the general problem is Facebook’s engagement algorithm. Can society really tell Facebook how to tune its systems for chasing engagement, given that the ad-supported model requires it? Can we really insist that Facebook puts a cap on engagement, reducing its profit margins, or even changes its business model to include paid services – which will reduce accessibility?

I mean, yes we can and should be having that debate. The extremism caused by Facebook’s algorithms can be seen as a public health problem and, if that analogy holds, I can point out that we’re perfectly happy to tax the cigarette companies without outright banning them. (Paying for externalities is one of the uses of tax.) So maybe the same approach should be adopted with Facebook.

But the question is the same: how should the desired social outcome be expressed as a technology product requirement, and how can it be expressed in law?

There are social values baked into software already. We need democratic ways to tune the parameters.


AirPods, and the cognitive ergonomics of tools for thought
19 November 2021 | 6:21 pm

I’ve been trying out the dynamic head tracking feature of the new AirPods 3, and it makes me feel like the cognitive ergonomics of computer interfaces is - still - way too disconnected from everyday design.


The head tracking technology is intriguing.

First there is spatial sound, which arranges sound in a sphere instead of in stereo. Apple Music now has a bunch of music remastered spatially and personally I find it distracting when, say, the vocals are placed to the side and behind the drums. But anyway, it’s a thing, and spatial sound isn’t just for music. It’s an enabler.

So then there is head tracking which fixes the sphere in space even as you move your head.

For example: you walk down the street listening to regular stereo music. You turn to look to the right briefly, and the left and right channels remain fixed on the imaginary sphere around your head. The music that was previously (apparently) ahead of you is now only in your left ear.

It’s awesome.

And weird.
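
You can play with the underlying mechanics in a browser today. This isn’t Apple’s implementation, obviously, but the Web Audio API lets you pin a sound at a fixed position in the world and rotate the listener instead – given a stream of head-yaw readings from any tracker, the sound stays put as you turn:

```typescript
// Sketch of head-tracked spatial audio with the Web Audio API: the
// source sits at a fixed point in the world; head tracking rotates the
// *listener*, so turning your head moves the sound across your ears.

const ctx = new AudioContext();

// A sound pinned 1 metre ahead of where you were facing at the start.
const source = new OscillatorNode(ctx, { frequency: 440 });
const panner = new PannerNode(ctx, {
  panningModel: "HRTF",
  positionX: 0,
  positionY: 0,
  positionZ: -1, // -Z is "ahead" in Web Audio's coordinate space
});
source.connect(panner).connect(ctx.destination);
source.start();

// Called with each head-yaw reading (radians) from your tracker – the
// AirPods' motion data standing in for any orientation source.
function onHeadYaw(yaw: number) {
  const t = ctx.currentTime;
  // Rotate the listener's forward vector; the panner's world position
  // never changes, so the sound appears fixed in space.
  ctx.listener.forwardX.setValueAtTime(Math.sin(yaw), t);
  ctx.listener.forwardZ.setValueAtTime(-Math.cos(yaw), t);
  ctx.listener.upY.setValueAtTime(1, t);
}
```

(The AudioParam-based listener properties are in the spec but support varies by browser; it’s a sketch, not production code.)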


There are some problems with head tracking as a feature.

You can switch between modes, but check out Apple’s own documentation: you have to long-press the volume on your phone to find out what options are available.

Then head tracking isn’t available with all devices. My AirPods switch seamlessly between my devices, but they don’t all have the ultra-wideband chip that head tracking requires. (UWB is some clever radio magic. Apple call their chip the U1.) So it’s sometimes available and sometimes not.

Now the UWB chip is what allows for relative positioning with high precision (mm accuracy last I checked) and low latency (you need it to be low milliseconds to work well with audio). It is clearly a jigsaw piece for Apple’s as-yet-unannounced work with augmented reality, so the U1 (and therefore head tracking) will end up in all their devices. So that inconsistency gets sorted.

But even so, head tracking gets used in a few different ways and it’s not clear to me, the user, what’s going on.

For instance: the other day I was working at my Mac and playing music from my iPad, and it appeared that the music was originating from the iPad itself – it had been spatialised to be located in the device.

Did I imagine that feature? How did it happen?

So there’s a lot of confusion in the user experience: poor naming, hidden modes, and so on. The technology is rock solid but with inconsistent manifestations.

Which is fine! There is a ton of learning going on.

(You can see Apple releasing jigsaw pieces like head tracking, photogrammetry with ARKit, and LIDAR in the Pro phones. At a certain point, the supply chain will be de-risked and the developer community will have devices that can function as dev kit – and then it will be the moment to land smart glasses, whatever form they take, and the only “risk” is the consumer experience, and Apple has nailed how to launch and iterate that. The playbook in action is astounding to see.)


OBSERVATION #1

Stereo music usually feels like it is located at the centre of my head, right between my ears.

Spatialised, these AirPods place the music right in front of my third eye: about an inch in front of my face, and just above my eye line.

With head tracking, the apparent locus is as steady as a rock.

And it is super bizarre. Like, I can see why Apple has made this decision: music played from the centre of my head would not move with head tracking at all. It would be at the centre of the imaginary sphere.

But placed where it is, I go slightly cross-eyed. I end up focusing really close up and looking up slightly, at an invisible source of sound.


OBSERVATION #2

When the music apparently came from my iPad, while I was working on my desktop Mac, I found it way easier to focus on my work. Oh!

The background sound was physically separated (not actually, but using head tracking) from the point of my attention: the on-screen document. That separation seemed to allow me to concentrate better.

Which is… a fact worth paying attention to, right?


The question for me is this:

What are computers for?

Are they, as the name of Howard Rheingold’s 1985 book suggests, tools for thought?

If so, how do we work out how to bias interfaces to make better thinking easier – and what are the contributing factors to good thinking anyway?

Specifically questions like:

Is it milliseconds faster to respond to a device notification if the sound of the notification appears to emanate from that device?

Can more be held in working memory (and therefore information synthesised in a more sophisticated way, faster and smarter) if the documents are distributed - using sound feedback - over a wide surface, rather than being at a single point under the thumb? And is that ability to synthesise measurable?

I’ve asked similar questions a couple of times before:

  • Do Star Wars wipes (a particular style of scene transition) tap into underlying automatic mechanisms to more efficiently allocate and deallocate the brain’s scarce processing resources, a kind of attentional ergonomics?
  • Can we access the “memory palace” benefits of spatialising knowledge - in terms of capacity and organisation - simply by providing on-screen interfaces that visually resemble moving through doorways, a kind of hippocampus ergonomics?

I would generalise this to cognitive ergonomics: how do we make user interfaces that better match how we think? And by think I mean: synthesise, create, pattern match, abstract, linearise, and so on.

So much of today’s desktop user interface was driven by early psychological considerations: Fitts’ Law describing how quickly you can move a cursor to a target (and that you can think in the meantime), or the screen itself acting as a visual cache for working memory.
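
For the record, here’s Fitts’ Law in its common Shannon formulation – movement time grows with the log of distance over target width. The constants are fitted empirically per input device; the values below are illustrative only:

```typescript
// Fitts' Law, Shannon formulation: time to acquire a target grows with
// the log of (distance / width). a and b are empirical constants fitted
// per input device; the defaults here are illustrative only.

function fittsMovementTimeMs(
  distancePx: number,
  targetWidthPx: number,
  a = 100, // intercept (ms) – illustrative
  b = 150, // slope (ms per bit) – illustrative
): number {
  const indexOfDifficulty = Math.log2(distancePx / targetWidthPx + 1);
  return a + b * indexOfDifficulty;
}

// A far, small target is measurably slower than a near, big one:
fittsMovementTimeMs(800, 20); // ≈ 100 + 150 × log2(41)   ≈ 904 ms
fittsMovementTimeMs(100, 80); // ≈ 100 + 150 × log2(2.25) ≈ 276 ms
```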

I’m sure the HCI community has continued this good work.

I would love to know certain things, in addition to the above. For example, I have a hunch that the fundamental “tick” of the brain is around 100-150ms – that’s how long it takes for a signal to move across the thinking meat, if I remember right. Interfaces that respond within that time feel fluid, and outside that time make you feel like you have to wait. Is that true? Does it have an effect on, say, our ability to recall, or to have a novel idea?

Or is parallel thinking possible? Does the time taken to move a mouse cursor provide the ability to consider what happens next? Does using sound to create a cognitive map, loading and unloading data from working memory, allow for faster synthesis?

My dual wishes are these:

That Apple, Microsoft, Google and so on employ cognitive neuroscientists to develop quantifiable measures for good tools for thought, study modern interface approaches against these measures, and publish their research – just as they publish widely on machine learning and cryptography.

And that front-end code libraries bake in these rules. If 100ms is the cognitive tick, then that should be a top-level guarantee for any user interface toolkit. And so on.
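
What might baking it in look like? A hedged sketch: a wrapper that a hypothetical toolkit could put around every event handler, flagging anything that blows the ~100ms budget during development:

```typescript
// Sketch of the ~100ms cognitive tick baked in as a budget: a wrapper a
// toolkit could apply to every event handler, warning in development
// whenever the interface fails to respond within one tick. The wrapper
// and the budget value are hypothetical – the hunch from above, in code.

const COGNITIVE_TICK_MS = 100;

function withResponseBudget(name: string, handler: (e: Event) => void) {
  return (e: Event) => {
    const start = performance.now();
    handler(e); // synchronous work only; async work needs its own accounting
    const elapsed = performance.now() - start;
    if (elapsed > COGNITIVE_TICK_MS) {
      console.warn(`${name}: ${elapsed.toFixed(1)}ms – over the ${COGNITIVE_TICK_MS}ms budget`);
    }
  };
}

// Usage – in a real toolkit this wrapping would be automatic:
document.querySelector("button")?.addEventListener(
  "click",
  withResponseBudget("save-button", () => {
    // ...anything slow here gets flagged...
  }),
);
```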


