🤖 + 🍺 Bots and Beer 2x11 - Conversation-Driven Development (CDD)

by Michael Szul on

The Bots + Beer newsletter ran for about 3 years, between 2017 and 2020, during a time when I was highly involved in chatbot and artificial intelligence development. It was eventually folded into Codepunk as part of the Codepunk newsletter.

Some portions of this newsletter were contributed by Bill Ahern.

Conversation-Driven Development

This newsletter was originally going to be about chatbots and craft beer. We ran out of chatbot content by about the fourth or fifth issue, but the newsletter evolved into a narrative, thematic look at the future of computing: this life in the new cyberia that we constantly talk about on Codepunk.

Well, I have good news: This issue is actually about chatbots.

More specifically, we're going to take a look at the concept of conversation-driven development (CDD), a term recently coined by Rasa CTO Alan Nichol. Nichol is no stranger to philosophical concepts in conversational software. Some of his original ideas centered on the 5 Levels of Conversational Artificial Intelligence, a tool for viewing the stages of conversational AI through both the user and developer perspectives. For end users, Nichol's stages move from the lowest level, where the end user does all the work themselves, all the way up to an adaptive conversational component that gives the end user the level of detail they require. From the developer perspective, we move from the programmer handling everything, all the way to an automated CDD approach to conversational software.

If you have time, it's worth checking out Nichol's updated levels from the recent L3-AI virtual conference.

I bring this up because Nichol has started to get laser-focused on the developer approach at level 3. At the L3-AI conference, he specifically updated the developer levels to include CDD. He says:

At level 3, we start to accommodate that users don't think about problems the same way that developers do, and that not every message can be neatly classified into an intent. As developers, we love splitting larger problems up into separate components. But to achieve fluid conversation, we have to accommodate that users don't respect the boundaries we draw.

That statement struck me as explicitly aligned with my own thinking, and it explains why I've chosen to approach conversational software from a greater social science perspective with conversation analysis. There is a noted discrepancy between human behavior and software development, and this is where philosophies like conversation analysis and conversation-driven development are needed.

CDD has 6 clear principles:

- Share your assistant with real users early
- Review the conversations users have with it
- Annotate real user messages as training data
- Test your assistant with end-to-end conversations
- Track the outcomes that matter
- Fix what those conversations reveal

Although Nichol numbers these in his blog post, I've left them bulleted, because the truth is that the process is non-linear (and his blog post says as much). Although you'll obviously start by sharing your conversational designs, it's not a 6-step process that you run through once and you're done. You're going to move up and down (or left and right) this ladder of principles as you follow a process of continuous improvement, like any Lean DevOps process.

With CDD, the idea is that we share our conversational designs and prototypes early and often with end users in order to be clear on how they would actually interact. Users are different from developers. In my conversation analysis presentation for the Global 2020 Summer AI/ML Fest (yes, a really long, awkward conference name), I mangled a version of a joke about programmers and testers that Nichol later tracked down. The original joke may have come from Brenan Keller:

A QA engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 99999999999 beers. Orders a lizard. Orders -1 beers. Orders a ueicbksjdhd. First real customer walks in and asks where the bathroom is. The bar bursts into flames, killing everyone.

Users are different when it comes to how they utilize software, mostly in ways that were never intended. This is why user experience and usability analysis are important. We have jobs starting to pop up in the field of conversational design, but what does that even mean? The reality is that we need to meet real users and run our scripts by them. How will they really interact? How will they talk?

Back to CDD: we also want to review the conversations that users have with our software once it's built. Every major chatbot or NLU software company offers some of these metrics. For example, with Microsoft's LUIS, you can easily see which intents are being routed correctly, which are hitting the special "None" intent, and what's falling through the cracks. But it's more than just tracking which intent a conversation falls into: Rather, reviewing these conversations can enhance every aspect of the design and development process from beginning to end.
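As a concrete illustration, here's a minimal sketch of what such a review pass might look like, independent of any particular vendor's API. The `LoggedTurn` shape, the "None" fallback label, and the confidence threshold are all assumptions for illustration; the idea is simply to surface the turns that fell through the cracks so a human can look at them.

```typescript
// A hypothetical log entry: each user utterance along with the intent
// and confidence score your NLU service (LUIS, Rasa, etc.) returned.
interface LoggedTurn {
  utterance: string;
  intent: string;      // e.g. "OrderBeer", or the fallback "None"
  confidence: number;  // 0..1 score from the NLU model
}

function flagTurnsForReview(log: LoggedTurn[], minConfidence = 0.7): LoggedTurn[] {
  // Surface anything routed to the fallback intent or classified with
  // low confidence -- these are the turns worth a human review.
  return log.filter(
    (turn) => turn.intent === "None" || turn.confidence < minConfidence
  );
}

const needsReview = flagTurnsForReview([
  { utterance: "I'd like to order a beer", intent: "OrderBeer", confidence: 0.94 },
  { utterance: "where's the bathroom?", intent: "None", confidence: 0.31 },
]);
console.log(needsReview); // only the second turn is flagged
```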

To improve on those metrics, we need to be sure to annotate our models based on real conversations and feedback. In fact, the CDD approach puts a threshold at about 10%: no more than 10% of your model's training data should be self- or team-generated. This ensures that most of your data comes from real production conversations, which makes that data more relevant.
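As a sketch of how you might keep yourself honest about that threshold, assume each training example is tagged with whether it came from a real conversation or was written by the team; the `source` field here is hypothetical, not any framework's schema.

```typescript
// A training example tagged with its provenance (hypothetical shape).
interface TrainingExample {
  text: string;
  intent: string;
  source: "real" | "synthetic";
}

// Fraction of the training set that the team wrote itself.
function syntheticRatio(examples: TrainingExample[]): number {
  if (examples.length === 0) return 0;
  const synthetic = examples.filter((e) => e.source === "synthetic").length;
  return synthetic / examples.length;
}

const trainingData: TrainingExample[] = [
  { text: "I want to order a beer", intent: "OrderBeer", source: "real" },
  { text: "order beer", intent: "OrderBeer", source: "synthetic" },
];

const ratio = syntheticRatio(trainingData);
if (ratio > 0.1) {
  console.warn(
    `Synthetic examples are ${(ratio * 100).toFixed(0)}% of the training set; ` +
      "annotate more real conversations before the next training run."
  );
}
```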

Testing is obviously another important component of CDD, but it's a little different from testing in traditional software development, because the best tests aren't unit tests and code coverage, but instead full end-to-end conversations. This is closer to integration testing, as you'll need to develop testing pathways from wake word to conclusion, whether that's a completed transaction or a human hand-off.

In traditional software development, if programmers write effective unit tests based on task requirements, and code coverage sits at around 80%, the application is likely to be decoupled and functionally sound, leaving quality assurance free to focus on feature testing and more exploratory processes.

But conversational software isn’t as simple as a handful of unit tests. An application can be functionally sound, but that means little in the context of behavior and conversation. This puts a greater burden on exploratory testing, but we can limit that strain by designing conversational scripts based on real user feedback (and usage) that we can run through an automated pipeline to test each resulting turn.
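Here's a minimal sketch of what one of those scripted, turn-by-turn tests might look like, written without any particular bot framework in mind. `runConversationTest`, `sendToBot`, and `fakeBot` are hypothetical stand-ins for however your pipeline actually drives the assistant.

```typescript
// One turn of a scripted conversation: what the user says and what the
// bot is expected to answer.
interface ScriptedTurn {
  user: string;
  expectedReply: string;
}

// Replay a whole conversation against the bot, asserting each turn.
async function runConversationTest(
  script: ScriptedTurn[],
  sendToBot: (utterance: string) => Promise<string>
): Promise<void> {
  for (const [i, turn] of script.entries()) {
    const reply = await sendToBot(turn.user);
    if (reply !== turn.expectedReply) {
      throw new Error(
        `Turn ${i + 1} failed: expected "${turn.expectedReply}", got "${reply}"`
      );
    }
  }
}

// A stand-in bot for demonstration; in practice this would call your
// real assistant through a test harness or HTTP endpoint.
const fakeBot = async (utterance: string): Promise<string> =>
  utterance === "hey bot" ? "Hi! What can I get you?" : "Sorry, I didn't catch that.";

runConversationTest(
  [{ user: "hey bot", expectedReply: "Hi! What can I get you?" }],
  fakeBot
).catch((err) => console.error(err));
```

Scripts like these can be lifted directly from real, successful conversations, which is where the fix principle below circles back to testing.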

Both review and track in the CDD framework look at the process of conversation throughout your software. With "review," we focus on the conversation being had; with "track," we examine the resulting action upon completion. This is essentially tracking lead generation: if the chatbot is meant to complete a purchase, did that happen? Are we closing the deal?
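A tiny, hypothetical sketch of that tracking step: tag each finished conversation with an outcome and measure how often the bot actually closes the deal. The outcome labels are assumptions for illustration.

```typescript
// Possible end states of a conversation (assumed labels).
type Outcome = "purchase" | "human-handoff" | "abandoned";

// Fraction of conversations that ended in a completed purchase.
function conversionRate(outcomes: Outcome[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter((o) => o === "purchase").length / outcomes.length;
}

console.log(conversionRate(["purchase", "abandoned", "purchase", "human-handoff"])); // 0.5
```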

Finally, and this goes without saying, fix your work. And you can do this with successful and unsuccessful conversations alike. Successful conversations can be adapted into end-to-end tests, and they help to inform successful patterns in conversation and usage. Failed conversations, meanwhile, give you insight into conversational and behavioral changes, as well as defects in your own code.

What is the future of CDD? For one, you can join the LinkedIn group that Rasa put together to voice your own opinions. CDD gets me excited because it fits my own exploration of conversation beyond code. Most of my recent research has been centered on conversation analysis (as you can tell by my recent talks), and my code explorations have been in prototyping a chatbot framework from scratch using these principles.

I think CDD is a good stepping stone (a good start) toward establishing an approach to conversational software that includes multi-disciplinary teams, understands the interconnectedness of conversational design, and forces us to consider human behavior and human-centered approaches to technology.

I'm excited to see where it leads.

Southern Tier Orange Creamsicle Milkshake IPA

Yes, read that again: Orange Creamsicle Milkshake IPA. I was repulsed and intrigued at the same time, so of course I bought some. They come in a 4-pack of pints, and I bought them when my parents came to visit. My mother drank 3 of the 4…

The beer sounds sweet, but as we know, IPAs are bitter. The good news is that the bitterness here was subtle. You still taste that hoppy IPA flavor, but it was muted enough that it wasn't overly bitter. Meanwhile, the orange creamsicle came in at the end as an aftertaste. It was like orange cream soda, but not real sweet.

Overall, this turned out to be a very flavorful and balanced beer. Everything in moderation.

Credits

Header photo of "Conversations" by Steve McClanahan.