Hallucination should not be a dirty word
26 July 2024 | 9:41 am

One of my local schools, just down the road, is the Creative Computing Institute, part of University of the Arts London.

I was honoured to be a judge at the recent CCI Summer Festival. Students from the BSc and diploma courses were showing their projects.

Here’s the press piece: Creative industries experts recognise exceptional student work at the UAL Creative Computing Institute Summer Festival.

There was so much great work. All the awards have amazing winners, and there were many other projects right up there too.


I was awarding for Innovative Materials: "redefining the ways in which we use and conceive of the ‘stuff’ of computing practice."

Congratulations to the winners, Gus Binnersley, Kay Chapman, and Rebecca De Las Casas, for Talking to Strangers.

Artists’ statement: Talking to Strangers explores early theories of language development and symbiotic interspecies communication. Inspired by the work of linguist Jan Baudouin de Courtenay, this game of telephone explores his ‘bow-wow’ theory, which suggests that the beginnings of language involve progenitors mimicking sounds in their natural environment.

Ok I want to say something about this work and why it spoke to me, and about AI and hallucinations.


Let me describe the project, because I can’t find any pictures online:

  • Two sheets of metal hanging from the ceiling. Two telephone handsets, one at each end.
  • Speaking into a phone, your voice is transformed into different signals, transmitted through the metal, and reconstructed at the far end – that’s the game of telephone.
  • If you scratch or tap the metal, adding noise along the transmission path, the scratches and taps are reconstructed into what sounds like a voice.

Now my personal scoring rubric, for this particular award, was for using the material of computing - signal in the case of this project - as an intrinsic part of the work. And to tell a story about that material, rather than using it in service of another story.

And the story about the continuity of data is an interesting one. Voice remains regardless of the substrate. The invention of the category of data is a big deal!

But data-as-material is a well-trodden investigation.

SO:

What grabbed me here was the accidental voice reconstruction.

The project group used machine learning voice changing software, off the shelf, made for streamers.

The scratches and taps on the metal were transformed by the proto-AI into fragments of voice: burbles and syllables that sound something like a person speaking, but not quite. You strain to hear.

(I didn’t ask but I got the impression that the group didn’t originally intend for this to be part of their project, even though it was part of their demo by the time I spoke with them. That’s what you get from working directly with material.)

And this is something new:

Where does the voice come from?

Novelty in the signal.


Signal vs noise.

The story of our networked age is noise. Data rot. Lossy compression. Entropy. Message attenuation over distance and time. Lost in translation.

And yet - with modern gen-AI - something new: Novelty on the wire. Originality from… somewhere?

If we’re to take that idea seriously then first we need to encounter it and experience it for ourselves.

That’s the work that Talking to Strangers was embarking on, for me.

The project had put its finger on brand new ‘stuff’, so new we can barely see it, but it found it somehow and that’s special.


Because novelty from computers is special.

I think it’s hard to come to terms with originality from computers and AI because it’s so counter to our experience of what data does.

But I was using a prototype of an AI system yesterday and the bot said back to me:

Oh, that reminds me of the time I accidentally entangled my toaster with my neighbor’s cat. Poor Mr. Whiskers meowed in binary code for a week!

A trivial example. But like, where does this even come from?

Here’s one of my posts from September 2020, just after I used GPT-3 for the first time:

Here’s what I didn’t expect: GPT-3 is capable of original, creative ideas.

(It had told me about a three-mile wide black ring deep in the Pacific Ocean.)

Now, we call these “hallucinations” and the AI engineers try to hammer it out, and people swap prompts to steer outputs with great reliability. Apple Intelligence irons out world knowledge, SearchGPT gives chatbots ground truth.

It’s so easy to dismiss any output that looks new, calling it just a recombination of training data fed through the wood chipper. We often resist the idea that originality might be possible.

But here’s a thought: a major source of new knowledge and creativity for us humans is connecting together far-flung ideas that haven’t previously met. (That’s why multidisciplinary projects are so great.)

And as I said back in that 2020 post:

It occurred to me that GPT-3 has been fed all the text on the internet. And, because of this, maybe it can make connections and deductions that would escape us lesser-read mortals. What esoteric knowledge might be hidden in plain sight?

So, just in how it’s trained, the conditions are there.

I began my defence when I spoke in Milan in April about hallucinations, dreaming and fiction.

And I am even more convinced of it today.


Those babbling voices from the sheet metal are not noise in the signal. They’re the point. Sources of creation are rare and here’s a new one!

What would happen if we listened to the voices?

What if we built software to somehow harness and amplify and work with this new-ness? There are glimmers of it with Websim and so on. But I don’t think we’ve really grappled with this quality of gen-AI, not yet, not fully. We should!

Hallucination is not a bug, it’s the wind in our sails.


Congratulations again to the Talking to Strangers team, and thank you to UAL CCI for having me – a privilege and a joy to see all the work and speak with the students.


More posts tagged: gpt-3 (31).


The Times They Are A-Changin’
23 July 2024 | 9:39 am

The Times They Are A-Changin’ by Bob Dylan (1963).

Come gather ‘round people
Wherever you roam
And admit that the waters
Around you have grown
And accept it that soon
You’ll be drenched to the bone
If your time to you is worth savin’
Then you better start swimmin’ or you’ll sink like a stone
For the times they are a-changin’

Come writers and critics
Who prophesize with your pen
And keep your eyes wide
The chance won’t come again
And don’t speak too soon
For the wheel’s still in spin
And there’s no tellin’ who that it’s namin’
For the loser now will be later to win
For the times they are a-changin’

Come senators, congressmen
Please heed the call
Don’t stand in the doorway
Don’t block up the hall
For he that gets hurt
Will be he who has stalled
There’s a battle outside and it is ragin’
It’ll soon shake your windows and rattle your walls
For the times they are a-changin’

Come mothers and fathers
Throughout the land
And don’t criticize
What you can’t understand
Your sons and your daughters
Are beyond your command
Your old road is rapidly agin’
Please get out of the new one if you can’t lend your hand
For the times they are a-changin’

The line it is drawn
The curse it is cast
The slow one now
Will later be fast
As the present now
Will later be past
The order is rapidly fadin’
And the first one now will later be last
For the times they are a-changin’

I’ve had this on repeat the last week or so.

Ancient wisdom from the vibe shift of another generation.

It sounds so innocuous (YouTube). Dylan with his level tone and that harmonica. It’s timeless. But then you listen to the lyrics…

That fourth verse? Gives me shivers.

Some background over at Wikipedia.

You get those eras where everything up-ends. The axes on which the world is measured no longer make sense. On the other side, there are new institutions, new ways of operating, new ways of feeling. Until now, it hasn’t been like that.

To me, a Gen X tailender, the last couple of decades have felt a bit like an eternal 1990s. But I want to be baffled by music! I want to be surprised by culture!

And now – it’s happening? I love it. I love Gen Z. I love being continuously challenged to re-configure my internal scaffolding. I love their energy. It’s infectious. They’re grappling with the world - politics and identity and fashion and everything else - and that’s infectious too.

Anyway, what does a vibe shift feel like? And how should you act? Dylan wrote this song as documentary, last time round – “a voice that came from you and me” as Don McLean put it.

It’s funny, I must have heard it a thousand times, and I’ve really only heard it this week.

On a technical note, reading the final verse it looks like it should speed up. Shorter words, fewer beats per line.

As a writer, to me, that’s an acceleration.

BUT music is rhythmic. Each verse is the same length. So shorter lines mean the words have more room. It’s almost not noticeable but in that final verse, the words extend, they grow and lift.

So it ends on a note of power. Not urgency but strength. It hadn’t occurred to me before that music would work like this.


There’s a growing coalition around change. Change first, values next. Break the logjam, end the great stagnation, crack the egg on a societal level, whatever you want to call it. The coalition connects ugly politics burning it all to the ground and shitposting inventors on the socials. I mean, it’s not a coalition that can hold, clearly. And I’m sure many in it would deny their participation.

But if we are to (say) get through the climate crisis, the ability to change is a prerequisite. And it is all connected: opening the Overton window of weirdness is contagious.

Though when things do get moving, we’ll all start arguing about which way. And who knows how it’ll settle out. One side or the other or more likely some unimagined and unimaginable synthesis/detente.

That’s the ancient roadmap in the song.

For the wheel’s still in spin
And there’s no tellin’ who that it’s namin’

Vibe shifts eh. As previously discussed.


More posts tagged: poetry (5).


Mapping the landscape of gen-AI product user experience
19 July 2024 | 4:32 pm

I talk with a lot of clients and startups about their new generative AI-powered products.

One question is always: how should users use this? Or rather, how could they use this, because the best design patterns haven’t been invented yet? And what we want to do is to look at prior art. We can’t look at existing users because it’s a new product. So what UX challenges can we expect and how have others approached them?

The problem is that there are so many AI products. Everything overlaps and it’s all so noisy – which makes it hard to have a conversation about what kind of product you want to build.

So I’ve been working on mapping the landscape.

As a workshop tool, really.

You’ll recognise the map if you saw me speak at Future Frontend in Helsinki or UX London. I’ve also been testing this landscape recently with clients.

It’s a work in progress, but I think ready to share.

Let me show you…


A map of 1st generation AI products (c.2022)

To start, let’s look at the first generation of AI products that came out right after large language models got good enough (i.e. GPT-3) with a public API and sufficient market interest.

So we’re rewinding to around the time of the ChatGPT release in November 2022.

What are we looking at?

A large language model on its own isn’t enough to enable products. We need additional capabilities beyond the core LLM.

Different product archetypes rely on different capabilities to different extents. That gives us a way to tease apart the products into a landscape.

To my mind, there are three capabilities that really matter:

  • RAG/Large context. Being able to put more information into the prompt, either using retrieval augmented generation or large context windows. This allows for steering the generation.
  • Structured generation. When you can reliably output text as a specific format such as JSON, this enables interop and embedded AI, eventually leading to agents.
  • Real-time. Faster means interactive. Computers went through the same threshold once upon a time, going from batch processing to GUIs.

These aren’t purely technical capabilities. Sure, there’s an element of tuning the models for reliability in various ways. But mainly it’s know-how and software frameworks. RAG was invented in 2020; the ReAct paper (which built on chain-of-thought and led to agents) was published only in October 2022. It takes time for ideas to propagate.
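To make the structured-generation capability concrete, here’s a minimal sketch in Python: ask the model for JSON only, then validate the reply before any other software touches it. The `call_model` stub is a hypothetical stand-in for a real LLM API call (the exact vendor call is an assumption, so it’s faked here with a canned reply):

```python
import json

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call: returns a canned, well-formed
    # reply so the validation logic below is runnable on its own.
    return '{"name": "Ada Lovelace", "year": 1843}'

def extract_structured(text: str) -> dict:
    prompt = (
        'Return ONLY a JSON object with keys "name" and "year" '
        "describing the person in this text: " + text
    )
    reply = call_model(prompt)
    data = json.loads(reply)  # raises ValueError if the model drifts off-format
    missing = {"name", "year"} - data.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    return data

record = extract_structured("Ada Lovelace published her notes in 1843.")
```

The validation step is the point: interop and embedded AI only work if downstream software can machine-check what the model emitted, rather than trusting free text.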

I’ve used these capabilities as axes on a ternary diagram (I love a good triangle diagram).

Now we can plot the original, common gen-AI use cases… what product experiences do these capabilities allow?

  • Reliable large context windows led to products for automating copy and visual assets
  • Combine context and some structure: we’ve got semantic search
  • Combine context and real-time: there’s the “talk to a PDF” archetype, we see a lot of those
  • Structured generation opened up data extraction from unstructured data, like web scraping. It was a huge acceleration; here’s me from Feb 2023.
  • Pure real-time: we’ve got chat.

What this map is not is a prescriptive chart of all possible products. Rather, it’s a way of mapping what we already see emerging, as a way to orient and perhaps inspire thought.

I’m not thinking about games, and I’m not looking (much) at what’s happening in the AIUX prototyping space: I’m looking at where there’s a fit between product and market need.

So this is a map specifically about products and user experience. I don’t think there would be a 1:1 correspondence if we were looking at underlying software frameworks, for example.


Today’s gen-AI product landscape

As products lean more or less on different capabilities, I think we see four broad areas of user experience.

Users relate to the AI in different ways:

  • Tools. Users control AI to generate something.
  • Copilots. The AI works alongside the user in an app in multiple ways.
  • Agents. The AI has some autonomy over how it approaches a task.
  • Chat. The user talks to the AI as a peer in real-time.

(Note that because I’m mapping out user experience, these are all to do with collaboration.)

Now let’s break this down.

I’ll give some examples to bring these archetypes to life.

Tools:

  • There are generative tools like InteriorAI, though quickly we see a cluster of workflow products like Jasper being used for, say, marketing copy. The watchword here is dependability, and the products need non-AI features like team collaboration to succeed.
  • Get more real-time and the tools become more about individual use and move inline: some of Notion’s AI tools and Granola are both here, in different ways.
  • Highly real-time tools feel more like sculpting and are great for creative work. See Adobe Generative Fill and tldraw’s Make Real (the real breakthrough is the iteration). What will matter here is which high-level tools get designed; what’s the AI equivalent of the Photoshop palette?

Copilots:

Here we have apps that would work just as well without any AI involved, usually for working on a distinct document type.

GitHub Copilot is the breakthrough copilot product. Also see Sudowrite which has multiple ways to collaborate with you when you’re writing prose fiction.

Agents:

A broad church!

Pure structured generation gives you data extraction from fuzzy data, like web scraping or looking at PDFs. But then you have function calling (tool use) and agents…

  • Small agents can be highly reliable and work more like tools, such as micro-agent for writing code.
  • Give contained agents more access to context - and integrations - and the product archetype is that they’re presented as virtual employees, like Lindy. End-user programmability is fascinating here. Look at how Lindy allows for a Zendesk support bot to be programmed in natural language: "If the customer seems very angry, just escalate to me."
  • Move in the real-time direction: agents become UI. This is how the new Siri in Apple Intelligence is presented (see Lares, my smart home assistant prototype, for another example). You aren’t going to chat with these AIs, they’re super smart button pushers.
  • Even more in that direction, we get malleable interfaces. LangView (video) is a good example in prototype form; WebSim is the same as an open world code sandbox; Claude Artifacts brings micro-apps to regular users.
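The function-calling pattern behind these agents can be sketched in a few lines. This is an illustrative toy, not any vendor’s actual API: the stubbed model emits structured JSON naming a tool and its arguments, and a dispatch loop executes it. Both tools and the model reply are hypothetical:

```python
import json

# Hypothetical tools an agent might be allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def escalate(message: str) -> str:
    return f"Escalated: {message}"

TOOLS = {"get_weather": get_weather, "escalate": escalate}

def call_model(user_request: str) -> str:
    # Stand-in for the LLM: a real model would pick the tool and
    # arguments itself. Structured generation is what makes this
    # reply machine-readable rather than free text.
    return json.dumps({"tool": "get_weather", "args": {"city": "London"}})

def run_agent(user_request: str) -> str:
    reply = json.loads(call_model(user_request))
    tool = TOOLS[reply["tool"]]  # fail loudly on unknown tool names
    return tool(**reply["args"])

result = run_agent("What's the weather like in London?")
```

The whitelist dictionary is the design choice worth noticing: the model can only request actions the product has explicitly exposed, which is what makes Lindy-style natural-language rules ("if the customer seems very angry, escalate to me") safe to offer end users.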

Chat:

  • Purely reliant on the real-time capability is chat. The product archetype that is working here is character chat like character.ai – easy to dismiss as “virtual girlfriends,” but it’s incredibly popular.
  • Assistants: I make a distinction between “agents” (can use tools) and “assistants” (tools plus it presents itself as a general purpose helper). ChatGPT is an assistant, as is Google Gemini. I’d probably also put Perplexity somewhere around here. They all want to be the user’s point of first intent, competing with traditional search engines.
  • Overlapping with copilots now, and highly real-time: NPCs (non-player characters), when the AI acts like a human user. See AI Sidekicks from Miro, just released, and my own NPC work from last year.

(I have a ton of examples in my notes that I use as references.)


What do we learn?

Looking at this landscape, I’m able to see different UX challenges:

  • With generative tools, it’s about reliability and connecting to existing workflows. Live tools are about having the right high-level “brushes,” being able to explore latent space, and finding the balance between steering and helpful hallucination.
  • With copilots, it’s about integrating the AI into apps that already work, acknowledging the different phases of work. Also helping the user make use of all the functionality… which might mean clear names for things in menus, or it might mean ways for the AI to be proactive.
  • Agents are about interacting with long-running processes: directing them, having visibility over them, correcting them, and trusting them.
  • Chat has an affordances problem. As Simon Willison says, "tools like ChatGPT reward power users."

The affordances problem is more general, of course. I liked Willison’s analogy here:

It’s like Excel: getting started with it is easy enough, but truly understanding its strengths and weaknesses and how to most effectively apply it takes years of accumulated experience.

Which is not necessarily the worst thing in the world! But just as some startups are essentially an Excel sheet with a good UI plus integrations and workflow (that’s how value gets unlocked, because of the Excel affordances problem), we may see a proliferation of AI products that perform very similar functions, only in different contexts.


How am I using this map?

I’ve been using this map to help think around various AI products and how we might interact with them.

One process to do that is:

  • What kind of product are we making? Locate it on the landscape.
  • See what other products in this area are doing.

That is, it’s a way of focusing a collection of references in order to have a productive conversation.

But equally another process is:

  • Think about what we’re trying to achieve
  • Now imagine it as a tool, now a live tool, now a copilot, now an agent…

Generative!

It doesn’t help so much for inventing brand new ways of interacting. That’s why I hang out with and pay a ton of attention to the amazing and vibrant London coding scene. And that’s why I believe in acts not facts and rolling my sleeves up.

So it’s not a tool that gives me answers, it’s not that kind of map.

But it helps me communicate, and it’s a decent lens, and it’s a helpful framework in a workshop context.

Scaffolding for the imagination.




