Reducing JTBD Success Statement Quantities - v2.0

Yes, AI can actually do it better than humans

My name is Mike, and my goal is to help all of you achieve your innovation research outcomes faster, less expensively, with less bias, and with fewer assumptions. The obvious next step is giving you a way to sort through strategic options so you can accelerate into experiments quickly, cheaply, easily, and productively. This blog documents many of the thought exercises I go through as I continually strive to take existing knowledge and emerging technology and blend them together to address our unmet needs.

If you want a deeper dive, here are a few options:

Note: Paid subscribers get this a week early. Everyone else will get an email when it releases to the world.

For the “Too Long; Didn’t Read” Folks 🤫

My approach to using AI - in some cases - requires that we generate a quantity of outputs that is not necessarily conducive to survey construction. It just makes them too long.

Not too long ago, I published an article and video (Blogcast) that demonstrated an approach you can use - after the fact - to shrink those lists down to a level you can actually use.

The approach requires reviewing other factors in the model - such as situational factors and contexts - in combination with stakeholder collaboration, to filter the list down to only those metrics that are relevant to a subset of those factors. It also explained the reasoning for both inclusion and exclusion in the output. Still useful, but …

There is now a better way. In order to emulate what really happens when humans synthesize a large set of metrics, I decided to theme metrics in a way that provides the same level of coverage on a concept level while reducing the number of metrics.

For example, if you’ve been outputting twenty metrics to provide yourself a robust set to consider, this would allow you to output five instead. These five would essentially be themes of the larger set.

Why do this?

Well, I’ve gotten a lot of feedback from people using my prompts that their stakeholders are telling them that all twenty metrics they output for each step are good, and they don’t want to exclude any of them in a survey 😉.

Unfortunately, that’s a problem.

So, let’s take a deeper dive… 👇🏻


The Current State of AI Modeling

Trust and the Large Content Sets

Those of you who have accessed my Masterclass on JTBD prompting know that I tackled the problem we sometimes face when experienced practitioners challenge us to replicate their maps.

The challenge always seems to include a simple job statement, but a highly contextualized job map, many times not beginning at the traditional first stage [DEFINE]. 🤡 I also faced scenarios where some maps were ten steps, while others approached twenty (sometimes thirty).

This is what happens when people begin to lose their edge, and their egos begin to overshadow the instructions they repeat to others on a rote basis. “I’m good enough to take these risks”, they say.

I dealt with this demand for cognitive dissonance by adding some parameters to my Job Mapping prompt that allowed me to impose explicit control over the start and end points of a job map. This operated within the bounds of another parameter that controlled the level of detail of the job map. So, no matter where it is told to start or end, it would produce a ~10-step map or an ~18-step map, as well as a mid-level map.

Was I able to perfectly reproduce the bias of a human? No, not perfectly, because after all no one can be as good as a perfect human. But given that I can produce my map in about 1.5 seconds, I’m pretty sure I win on the grounds of faster and more cheaply.

If you’re willing to pay for a hand-crafted, hand-assembled Ferrari that elevates social and emotional jobs over functional jobs, maybe you’ll shoot for the hand-crafted version of JTBD research modeling. But…

How many Ferraris can they build in a year? How many problems do you need to solve in a year?

Still, I’ll hold my models up against anything I’ve seen done by humans. They are more than good enough, and in most cases, far better.

Using other content + humans to improve the situation

Now, I had to deal with another challenge: humans hand-craft their success metrics not based on what they hear, but based on a quality-assurance choke point staffed by people who had no involvement in the interviews at all. The end result is often completely different from what the research team came up with during the course of their interviews.

The focus is almost always on reducing the set, because the goal is to fit them into a survey that people can actually complete.

Initially, I lacked confidence in two areas. First, I didn’t trust the AI to prioritize success metrics for inclusion. Second, I didn’t trust my skills with prompting yet. So, I simply added an input parameter that allowed me to spit out as few, or as many, as I desired.

I figured that I could evaluate them myself for inclusion in a survey. Of course, this added time back into the equation, which is what I was trying to eliminate. So, I needed to address my prompting skills and see if I could work my way through this problem.

The Complication of JTBD Modeling with AI

Using AI for Synthesis

My first stab at this resulted in a blog post (which you may have read) about filtering down a set of metrics using ChatGPT.

What I described there leveraged Generative AI to help me reduce my set of metrics from the default of twenty to a smaller number. The approach also provided validation for both exclusion and inclusion so we could justify the result to others (like stakeholders).

It was a simple process of identifying three things:

  1. What are the key research questions?

  2. What are the most important contexts to consider?

  3. What are the most common situational factors that we know about?

If I do say so myself, for about 30 minutes of prompt-engineering and testing, the result was pretty darn good. However, I still have a lot of work to do with my on screen presence 🤣
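
If it helps to picture it, the general shape of that filtering step is sketched below. This is my own stripped-down illustration in Python, not the actual prompt from that post; the research questions, contexts, situational factors, and metrics are all placeholders.

```python
# Sketch: assemble a filtering prompt from the three inputs above plus the
# candidate metric list, asking for INCLUDE/EXCLUDE decisions with reasoning.
# All values are illustrative placeholders, not real project inputs.
research_questions = ["Which steps in the job cause the most delay?"]
contexts = ["first-time users", "regulated industries"]
situational_factors = ["limited budget", "time pressure"]
candidate_metrics = [
    "Minimize the time it takes to gather required inputs",
    "Minimize the likelihood of overlooking a key constraint",
    # ...the rest of the twenty metrics...
]

prompt = (
    "Research questions:\n- " + "\n- ".join(research_questions) + "\n\n"
    "Contexts:\n- " + "\n- ".join(contexts) + "\n\n"
    "Situational factors:\n- " + "\n- ".join(situational_factors) + "\n\n"
    "Review the following success metrics and mark each INCLUDE or EXCLUDE, "
    "with a one-sentence justification tied to the factors above:\n- "
    + "\n- ".join(candidate_metrics)
)
print(prompt)
```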

Getting Humans out of the equation

This approach was definitely faster than a purely human approach to synthesis. It eliminated metrics that weren’t deemed relevant to the input factors, something that would take humans far more time to replicate.

The problem was this: while the human process definitely goes off the reservation occasionally, the general goal is to theme-up an abundance of success metrics into statements that each encapsulate the meaning of two or more metrics.

This is the only time when authoritative figures deem it acceptable to append examples to the end of a metric.

In the ODI world, the authoritative rule is that perfect desired outcome statements shall not have examples! Sadly, this rule is commonly broken.

So if a human can theme-up metrics, why can’t Generative AI?

The Resolution

Theming - can it be done?

The current interfaces we have with large language models (LLMs) have some serious limitations. Most people who write one or two sentences (and then bash AI) probably won’t see these. But people like me, who spent over 20 years writing code, tend to test the boundaries.

A chat session with OpenAI (and others) does not allow you to store variables in memory. In a theming scenario, you need to establish a benchmark to theme-up from, or chunk-down from. So, step one is to establish the benchmark.

Unfortunately, there is nowhere to store it, or reference it.

When I tried to set and store variables, I would also instruct the LLM to be ready to back up its output with an explanation of how it themed-up metrics, step-by-step. It would gloriously do this. But, when I asked it to output the benchmark set so I could examine it, I was told essentially …

So, I had a problem. Upon further probing, this is when I determined that it couldn’t store or reference constants or variables dynamically.

What should I try next?

I did know that a chat thread could be used as memory. From what I’ve read (and this was about the Anthropic solution, specifically), for every additional message the entire thread is re-read, which is why it can slow down as the thread gets longer.

Therefore, the logical next test was to tell the model to generate a benchmark set and output it (I default mine to twenty, but this is a parameter I can change at run-time). Then, once that has been output, take the next parameter, which is my target output number, and output the themed set.
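
If you scripted that same test against an API instead of the chat window, the "thread as memory" idea looks roughly like the sketch below. It assumes the OpenAI Python client; the model name and the prompt wording are placeholders, not my actual prompt.

```python
# Minimal sketch of the two-pass test: keep the benchmark in the message
# history so the follow-up request can "re-read" it from the thread.
# Assumes the OpenAI Python client; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
bn, n = 20, 5  # benchmark size and target (themed) size

messages = [
    {"role": "system", "content": "You generate JTBD success statements."},
    {"role": "user", "content": f"Generate a benchmark set of {bn} success statements for this job step."},
]
benchmark = client.chat.completions.create(model="gpt-4o", messages=messages).choices[0].message.content

# The benchmark now lives in the thread, so the next message can build on it.
messages.append({"role": "assistant", "content": benchmark})
messages.append({"role": "user", "content": f"Theme that benchmark up into {n} success statements and explain your grouping step-by-step."})
themed = client.chat.completions.create(model="gpt-4o", messages=messages).choices[0].message.content
print(themed)
```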

During my testing, I also asked it to explain how it themed the benchmark set into the target set.

Frankly, I was pretty amazed at how it worked.

The end result is a prompt that will give you the number of success metrics you desire, or that helps you build a set that fits within your survey constraints. Let’s say you want a maximum of sixty metrics across a ten-step job map. You would simply ask the model to generate six success metrics per step.
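
For the arithmetic, a trivial helper (my own illustration, not part of the prompt) makes the constraint explicit:

```python
# Trivial helper: turn a survey-size budget into a per-step request.
def metrics_per_step(survey_budget: int, job_map_steps: int) -> int:
    return survey_budget // job_map_steps

print(metrics_per_step(60, 10))  # -> 6 themed success metrics per step
```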

Based on my initial experimentation, it’s quite successfully theming-up my benchmark set to the target number I’m looking for. So, I’m getting complete coverage based on a larger benchmark without having to field a survey with 3-4 times as many success metrics.

Caveat: I have to output the benchmark set, which takes time and clutters things. The solution will ultimately be a prompt-chaining approach where I can store outputs in memory. This also makes it possible to avoid all the cutting and pasting because at the end of the process I can simply construct a sequential output by taking the relevant variables I’ve stored and outputting them in the order that I want.

I can also then determine what format the output should be (markdown, JSON, etc.) and send it to a database or platform of my choosing via an API.
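
To make that direction concrete, here is a rough sketch of what such a prompt chain could look like; it is not the actual work-in-progress. The call_llm function, the job steps, and the endpoint URL are hypothetical placeholders.

```python
# Sketch of a prompt-chaining flow: each step's output is stored in a plain
# dict (the "memory" a chat window doesn't give you), then assembled into JSON
# at the end and pushed to another system. Everything here is a placeholder.
import json
import requests

def call_llm(prompt: str) -> str:
    """Stub so the sketch runs; swap in a real API call (OpenAI, Anthropic, etc.)."""
    return f"[model output for: {prompt[:48]}...]"

store = {}

for step in ["define objectives", "gather inputs", "analyze results"]:  # illustrative job steps
    benchmark = call_llm(f"Generate 20 benchmark success statements for: {step}")
    themed = call_llm(f"Theme these into 5 statements while preserving coverage:\n{benchmark}")
    store[step] = {"benchmark": benchmark, "themed": themed}

# Construct the sequential output in the order and format I want (JSON here)...
payload = json.dumps(store, indent=2)

# ...and send it to a database or survey platform of my choosing via an API.
requests.post("https://example.com/api/success-metrics", data=payload,
              headers={"Content-Type": "application/json"})
```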

Demonstrating theming language (without exposing the prompt)

While I won’t disclose my entire prompt (because there are many people who have paid good money to access them at my Masterclass) I do think it might help to break down the basic approach I used for theming. I will definitely be refactoring things once I am 100% convinced it works, but experimentation is messy!

Lest you think I’m always disorganized 👇🏻

Let me walk you through some things that I added to my 13k-plus character prompt that helped me to get the result I was looking for. Part I …

Establishing the Benchmark Metric Set

In the context of this conversation, establish a persistent variable called “benchmark” whose content I will refer to by wrapping the variable name in double curly brackets.

Generate an initial benchmark set of {{bn}} success statements per the specifications and assign them to the variable “benchmark”

This set will serve as the foundation for all subsequent theming or detailing of success metrics you output.

If the request is to produce (’n’) that is different than (’bn’), use {{benchmark}} as the baseline to either theme-up (for fewer statements) or break down into more detailed and specific statements (for more statements)

Ensure that the re-themed or detailed statements are based on the initial benchmark set {{benchmark}}, maintaining a direct connection to the original specifications

ALWAYS ADHERE TO THIS RULE: When theming-up, instead of grouping the verbs from multiple benchmarks, I would like you to create a net-new metric with a single verb. You may use the example space to express the rolled-up verbs or concepts.

Always be prepared, if subsequently asked, to validate the set of (’n’) that you output. That includes outputting {{benchmark}}, if asked.

Then in another location of the prompt…

After you store the initial list of {{bn}} success metrics in the variable (’benchmark’), then for the job step, generate a list of {{n}} success statements related to a(n) {{end user}} trying to {{step}} using the theming instructions mentioned earlier. Think step-by-step. You will be provided one step at a time. The next step will be in a subsequent message, you do not need to ask for it.

I know what you’re thinking! “He said that you couldn’t store outputs in variables! 🤬🤬”

You are correct! This was my initial pass at things. However, instead of throwing it all out, I simply added the following instruction (I did say I would refactor this later).

IF {{outputbn}} = 'Yes' THEN Output the contents of the {{benchmark}}

Then explain step-by-step, your approach and logic to theming up, or breaking down, the list of {{n}} success statements based on the (’Benchmark’) set END IF

That first input parameter is a switch I added to output the benchmark set. Remember when I said that it goes back and re-reads what it outputs? That is why I did this.

And it worked.
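
In case it helps to see how I think about those {{...}} parameters at run time, here is a minimal substitution sketch. The stripped-down template and the example values are illustrative, not the real 13k-plus-character prompt.

```python
# Illustrative run-time substitution for the {{...}} parameters shown above.
# The template is a heavily stripped-down stand-in for the actual prompt.
template = (
    "Generate an initial benchmark set of {{bn}} success statements and assign "
    "them to {{benchmark}}. Then generate {{n}} success statements related to "
    "a(n) {{end user}} trying to {{step}}, theming up from the benchmark. "
    "IF {{outputbn}} = 'Yes' THEN output the contents of {{benchmark}}."
)

params = {
    "bn": "20",                      # benchmark set size
    "n": "5",                        # target (themed) set size
    "end user": "home chef",         # illustrative end user
    "step": "plan a weekly menu",    # illustrative job step
    "outputbn": "Yes",               # switch: echo the benchmark into the thread
}

# Note: {{benchmark}} is deliberately absent from params; the model treats it
# as its own variable rather than something I substitute before sending.
prompt = template
for key, value in params.items():
    prompt = prompt.replace("{{" + key + "}}", value)

print(prompt)
```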

While it’s possible that I could instruct the LLM to theme up a set based on its collective analysis, there is no way to compare that to anything. You just have to trust it. I don’t … yet. So, until that time, this is what we’ve got.

In Closing

I have spent about as much time inside chat interfaces as I intend to. I’m looking forward to beginning the automation of these processes 100% (in fact, it’s a work-in-progress). That will include the following:

  1. Creating an agent to walk you through the construction of a model (with the ability to re-scope dynamically).

  2. Taking model outputs and constructing survey instruments

  3. Taking survey instruments and injecting them into various survey platforms

  4. Possibly creating an agent swarm that recruits respondents at low, or no cost, from a variety of sources a human simply couldn’t process

  5. Taking the outputs from a survey and building data models inside business intelligence platforms

  6. Analyzing the data model “Six ways to Sunday” in a matter of minutes (or seconds) - like no human can.

  7. Constructing an answer engine with core output structures and intent-based query capabilities that allow you to interact with forward-looking research data in new, dynamic, and potentially unlimited ways (including comparisons to continually updated behavioral data from other systems for customer journey research 🤫)

I know some of you will continue to value bespoke, hand-crafted models and analyses that take 4-6 months to produce. Especially those of you who provide that service.

I’m not one of those people. 😉


End Matters

If you'd like to learn more...

  1. I do offer end-to-end consulting if you’re just not ready to do it all on your own. I’m 20x faster and at least 10x cheaper than your alternatives. Big Brands: This means you can get many more problems solved with your existing budget (I work with a global team of experienced practitioners)

  2. I also offer coaching. If you’d like to know someone’s got your back while you do the heavy lifting and get some knowledge transfer, I'm there!

  3. I can help you get your qualitative research done in 2 days for mere budget scraps.

  4. I’ve also got an academy where you can find a number of options for a do-it-yourself experience. This portfolio of AI prompts eliminates the pain of learning how to perform proper qualitative JTBD research.

  5. Finally, I've recently opened up a JTBD community that is completely FREE! It's still early days and it's where I work with clients - in private spaces - and where I hang out to answer questions or just blather on. I hope you'll join us! There will be more and more free stuff, and there will also be some premium stuff eventually. I wonder what that will be?
