Can ChatGPT Answer Conjoint Analysis Questions to Replace Real Respondents?

Administrator | 10 May, 2024

By Bryan Orme, President, Sawtooth Software
Sawtooth Software is a privately held Provo, UT company that provides DIY tools for conducting market research surveys, especially those involving conjoint analysis and MaxDiff.

AI can do amazing things, but can it act like a human to answer conjoint analysis questions? If it can, would such answers be representative of real consumer preferences? If so, can we use AI instead of surveying real consumers...and trust the results?

The science fiction of yesterday can become today’s reality. But, when it comes to the critical second question posed above, is what ChatGPT seems to do just “Weird Science”?

At Sawtooth Software’s Analytics & Insights Summit (May 1-3, 2024, San Antonio, TX, USA), Chris Moore, Cameron Stronge, and Manjula Bhudiya (all of Ipsos) addressed whether AI can answer conjoint questions like respondents would. They conducted an impressive amount of work involving 250,000 AI-generated respondent records, comparing AI-generated conjoint responses with real human responses from recent conjoint studies Ipsos conducted. Also, a team from Microsoft (Dan Penney, Marianne Moran Peterson, and Jimbo Brand) presented related research on whether AI can realistically answer not only conjoint questions, but also MaxDiff, willingness-to-pay, and "counts" questions about volume of installations/usage at businesses.

Prior to seeing the findings from the Ipsos and Microsoft teams, out of curiosity, I decided to take a quick stab at using the free version of ChatGPT (v3.5) to see if it could answer conjoint analysis questions. This tinkering of mine couldn't settle the critical questions raised above, but ChatGPT's responses were at first glance surprising (at least to me)!

My first attempt asked ChatGPT to choose among three electric vehicles that differed on brand, range per charge, warranty, and price. It declined to venture an opinion, giving the noncommittal (but wise) “it depends on consumer preferences” response. It "knew" I had asked it a complex question that was probably beyond its ability to answer well. So, I repeated the question, imploring ChatGPT to give its best guess. As you can see below, it did so…and provided detailed reasoning about how it resolved the tradeoffs to predict what a typical consumer might do.

In forcing AI to do for me what it initially balked at doing, I'm reminded of the cynical phrase, "torturing the data until it confesses".

My Prompt:
Which of the following electric vehicle options would a consumer be most likely to prefer? Give your best guess.

• Option 1: Cost: $30,000, Brand: Ford, Range: 125 miles per charge, Warranty: 2 years
• Option 2: Cost: $40,000, Brand: Tesla, Range: 200 miles per charge, Warranty: None
• Option 3: Cost: $50,000, Brand: Tesla, Range: 250 miles per charge, Warranty: 2 years

ChatGPT’s Response:
Based on various consumer preferences and trends in the electric vehicle market, my best guess would be that Option 2 would be most likely preferred by consumers. Here's the breakdown:

Option 1:

• Cost: $30,000
• Brand: Ford
• Range: 125 miles per charge
• Warranty: 2 years

Option 2 (Most Likely Preference):

• Cost: $40,000
• Brand: Tesla
• Range: 200 miles per charge
• Warranty: None

Option 3:

• Cost: $50,000
• Brand: Tesla
• Range: 250 miles per charge
• Warranty: 2 years

Reasoning:

  1. Brand Reputation: Tesla has established itself as a leader in the electric vehicle market, known for its innovative technology and high-performance vehicles. The Tesla brand carries a strong reputation for quality and cutting-edge features, which may attract consumers despite the higher price point.
  2. Range vs. Cost Trade-off: Option 2 offers a balance between range and cost. While Option 3 provides a higher range of 250 miles per charge, it comes with a significantly higher price tag of $50,000. Option 2 provides a respectable 200-mile range per charge at a more competitive price point of $40,000.
  3. Preference for Range: While Option 1 may appeal to budget-conscious consumers, the 125-mile range per charge might be seen as limiting for many drivers, especially considering the growing availability of electric vehicles with longer ranges. Option 3 offers the highest range of 250 miles per charge but at a premium price. Option 2 strikes a balance by offering a reasonable range of 200 miles per charge at a more accessible price point.
  4. Warranty Consideration: Although Option 1 and Option 3 offer warranties, the absence of a warranty in Option 2 may be outweighed by consumers' preference for the Tesla brand and the perceived value of its vehicles' advanced features and technology.

Overall, Option 2, the Tesla electric car priced at $40,000 with a 200-mile range per charge and no warranty, is likely to be the most preferred choice among consumers.

Impressive! It parsed the complex question and gave a detailed and reasoned response that discussed the tradeoffs among the four attributes, leading to an evaluation that the second option would likely be preferred by consumers. It accessed enough information about the current state of electric vehicles and typical trips driven by consumers to say that 200 miles per charge was "respectable," and that paying an extra $10,000 to extend the range to 250 miles (while also getting the 2-year warranty) was not worth the upgrade.

Next, I repeated the question for the same three-vehicle EV scenario, but I didn’t just ask it for a discrete choice; I asked it to allocate probabilities of choice summing to 100%. Would ChatGPT be able to do it?

My Prompt:
Which of the following electric vehicle options would a consumer be most likely to prefer? Give your best guess, and please tell me the probabilities of consumer choice across the three options summing to 100%. 

• Option 1: Cost: $30,000 Brand: Ford Range: 125 miles per charge Warranty: 2 years
• Option 2: Cost: $40,000 Brand: Tesla Range: 200 miles per charge Warranty: None
• Option 3: Cost: $50,000 Brand: Tesla Range: 250 miles per charge Warranty: 2 years

ChatGPT's Response (summarized)
Along with its long explanation of how it considered the tradeoffs, ChatGPT replied that the probabilities of choice would be 30%, 40%, and 30% for the three options. Again, Option 2 was predicted to be the most preferred, but now with estimated probabilities of choice. Interesting! Though, suspiciously uncertain...40% vs. 30%.

I next changed the question to ask ChatGPT to consider a specific type of consumer making the choice.

My Prompt:
Which of the following electric vehicle options would a consumer be most likely to prefer? Assume the consumer is a 30-year-old female who makes an income of $60,000 annually. Give your best guess, and please tell me the probabilities of consumer choice across the three options summing to 100%. 

• Option 1: Cost: $30,000 Brand: Ford Range: 125 miles per charge Warranty: 2 years
• Option 2: Cost: $40,000 Brand: Tesla Range: 200 miles per charge Warranty: None
• Option 3: Cost: $50,000 Brand: Tesla Range: 250 miles per charge Warranty: 2 years

ChatGPT Shifted Its Prediction Considering the 30-Year-Old Female!
Based on my updated query, ChatGPT changed its probabilities for the three options to 40%, 30%, and 30%. Option 1 (the $30K Ford) was now predicted to be the most likely choice. Furthermore, it told me why it changed its prediction to favor Option 1 based on the demographics of the consumer I gave it. See its detailed response below:

ChatGPT’s Response:
To tailor the analysis for a 30-year-old female with a $60,000 annual income, we should consider factors that might be particularly relevant to her demographic. These could include budget constraints, lifestyle preferences, and the perceived value of the vehicle features.

Given the options provided: 

Option 1: Ford - $30,000, 125 miles range, 2-year warranty
Option 2: Tesla - $40,000, 200 miles range, no warranty mentioned
Option 3: Tesla - $50,000, 250 miles range, 2-year warranty

Here's a revised analysis considering these factors:

  1. Cost: A $60,000 annual income might make the $30,000 price point of Option 1 attractive, especially if she wants to allocate her income conservatively or has other financial obligations.
  2. Brand: While Tesla is known for its brand appeal and technological innovation, Ford might still hold appeal for those who value traditional brands or have loyalty to American-made vehicles.
  3. Range: Considering she might have an average daily commute and possibly occasional longer trips, the range might be a significant factor. Option 3 offers the highest range, which could be appealing for someone who values longer travel without frequent charging stops.
  4. Warranty: A warranty provides peace of mind, especially for someone who may not have the means for unexpected repairs. Both Options 1 and 3 offer warranties, which might weigh into her decision.

Given these considerations, here are the estimated probabilities:

• Option 1 (Ford): The lower cost and warranty might be particularly appealing given her income level and potentially cautious financial approach. Probability: 40%
• Option 2 (Tesla): While Tesla offers brand appeal and moderate range, the lack of a warranty could be a deterrent, especially for someone who might prioritize reliability. Probability: 30%
• Option 3 (Tesla): This option offers the highest range and a warranty, which could be attractive for someone who values both range and peace of mind. Probability: 30%

These probabilities sum up to 100%.

Again, individual preferences can vary greatly, but these estimates are based on the demographic information provided.

Wow, Spooky!
ChatGPT seems reasoned and well organized in its reply. It follows that if we could programmatically automate (say, through an API) asking blocks of CBC questions to ChatGPT for different representative demographic descriptions of consumers, ChatGPT could rapidly generate hundreds or thousands of pseudo-respondent records. We could analyze the CBC data as we normally would, using HB or Mixed Logit, and build a market simulator for EV choice that included filters and weights on the demographics we used in the prompts, such as gender, age, and income.
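To make the automation idea concrete, here is a minimal Python sketch of the loop it implies: build one CBC-style prompt per demographic profile, send it, and store the reply as a pseudo-respondent record. This is an illustration only, not the method the Ipsos or Microsoft teams used. The names `Concept`, `build_prompt`, `generate_records`, and `fake_ask` are hypothetical, the instruction "Reply with only the option number" is an assumed addition to the prompt, and `fake_ask` is a stand-in for a real API call so the sketch runs offline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Concept:
    """One product profile (hypothetical container for the four attributes)."""
    cost: int
    brand: str
    range_miles: int
    warranty: str


def build_prompt(concepts, age, gender, income):
    """Compose one CBC-style question for a described consumer."""
    lines = [
        "Which of the following electric vehicle options would a consumer "
        "be most likely to prefer? "
        f"Assume the consumer is a {age}-year-old {gender} who makes an "
        f"income of ${income:,} annually. "
        "Reply with only the option number."  # assumed instruction
    ]
    for i, c in enumerate(concepts, start=1):
        lines.append(
            f"Option {i}: Cost: ${c.cost:,}, Brand: {c.brand}, "
            f"Range: {c.range_miles} miles per charge, Warranty: {c.warranty}"
        )
    return "\n".join(lines)


def generate_records(concepts, ages, genders, incomes, ask):
    """Loop over demographic profiles; `ask` is any callable prompt -> reply text."""
    records = []
    for age, gender, income in product(ages, genders, incomes):
        reply = ask(build_prompt(concepts, age, gender, income))
        records.append({"age": age, "gender": gender, "income": income,
                        "choice": int(reply.strip())})
    return records


TASK = [
    Concept(30000, "Ford", 125, "2 years"),
    Concept(40000, "Tesla", 200, "None"),
    Concept(50000, "Tesla", 250, "2 years"),
]


def fake_ask(prompt):
    """Stand-in for a real API call; always answers Option 1."""
    return "1"


records = generate_records(TASK, [30, 50], ["female", "male"], [60000], fake_ask)
print(len(records), records[0])
```

Passing the model call in as the `ask` argument keeps the prompt logic testable without network access; a real run would swap `fake_ask` for a function that calls whichever AI service is being evaluated.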

Deliver the simulator to the client, and voila! Insta-data and no data collection costs! All with the perfection and magical mind-reading abilities of AI! (OK, everything I’ve said in this paragraph is dripping with sarcasm and skepticism).

Frankly, I'm very impressed that ChatGPT can parse such a complicated question and give what at face value seems like reasonable answers. To its credit, ChatGPT first declined to state which EV a consumer would prefer, replying that it would depend on varying consumer preferences. It's only when I forced it to make its best guess that it ventured to tell me which conjoint profile it thought consumers would prefer. When I asked it to project a choice for a 30-year-old female consumer, it followed its prediction of choice with the warning, "individual preferences can vary greatly, but these estimates are based on the demographic information provided." Well played!

What the Ipsos and Microsoft Teams Found
As I mentioned earlier, teams from Ipsos and Microsoft presented research at Sawtooth Software's recent conference on whether AI can answer conjoint questions like humans do, potentially replacing or supplementing human respondents. These teams fine-tuned the prompts to feed to AI and automated the submission of conjoint questions and collection of clean numeric choice data (sans the wordy AI response) via API calls.
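As a rough illustration of what "clean numeric choice data" can involve, the Python sketch below pulls the percentages out of a wordy reply and checks that they are usable. It assumes the reply states each share as "Probability: NN%", as ChatGPT did in the response quoted earlier; the function name `parse_shares` and the validation rules are hypothetical, and nothing here describes the teams' actual pipelines.

```python
import re


def parse_shares(reply, n_options):
    """Extract 'Probability: NN%' figures from a reply and validate them."""
    pattern = r"Probability:\s*(\d+(?:\.\d+)?)\s*%"
    shares = [float(x) for x in re.findall(pattern, reply)]
    if len(shares) != n_options:
        raise ValueError(
            f"expected {n_options} probabilities, found {len(shares)}")
    if abs(sum(shares) - 100.0) > 1e-6:
        raise ValueError(f"probabilities sum to {sum(shares)}, not 100")
    return shares


# Abbreviated version of the reply quoted in this article.
reply = """
Option 1 (Ford): The lower cost and warranty might appeal. Probability: 40%
Option 2 (Tesla): The lack of a warranty could deter. Probability: 30%
Option 3 (Tesla): Highest range and a warranty. Probability: 30%
"""

print(parse_shares(reply, 3))
```

Rejecting replies that have the wrong number of figures or that do not sum to 100 is a design choice: it surfaces malformed answers for a retry instead of letting them silently enter the data file.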

What did they find? For well-known product categories in markets (e.g., countries) well represented by the information available on the internet, AI respondents give directionally similar predictions to human respondents...usually. Sometimes, AI doesn't get the preference order right for levels within attributes or for concepts within simulated markets. And for less-known product categories and markets not well represented by information on the web, AI is going to be lost.

Mean preferences and mean estimates for well-known products and well-covered markets (across the internet) seem reasonable in most cases. But the variance of the estimates, which represents the diverse tastes of respondents, is vastly understated. The prompts the Ipsos and Microsoft teams gave AI to differentiate among customer groups didn't lead to nearly the degree of differentiation in tastes observed in real human respondents. Furthermore, the "temperature setting," which is meant to add more random variability to AI's responses, was largely inconsequential.
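For readers unfamiliar with the temperature setting, the Python sketch below shows the standard mechanism in language-model sampling: scores are divided by the temperature before being turned into probabilities, so a higher temperature flattens the distribution and a lower one sharpens it. The scores are hypothetical numbers chosen for illustration, not values from any study, and `softmax_with_temperature` is an illustrative name.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three options; the second is mildly favored.
scores = [1.0, 1.3, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Note that temperature only adds noise around the same underlying scores. One possible reading of the finding above (an interpretation, not something the presenters stated) is that random noise of this kind is different from the systematic differences in taste that separate one real consumer from another.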

The Ipsos team found that ChatGPT v4 performed better than v3.5. They also found that all AI systems tested understated the None option relative to humans. Retrieval-Augmented Generation (RAG), in which conjoint utilities and sample real respondent choices to scenarios were fed to the AI, led to much improved predictions of human choices to conjoint questions. But, of course we'd expect it would! One might ask: if we already know humans' responses, why would we need to ask AI to generate them? That said, RAG is a topic for future investigation into augmenting conjoint data sets with additional AI-generated conjoint records...would it work better than traditional imputation methods?

We can conclude that as of now the AI machines are not ready to displace human respondents for conjoint analysis studies. AI is an able, willing, and tireless assistant to the human researcher, but it isn't ready to take over conjoint analysis execution and data generation/collection any time soon.

What AI Can and Cannot Do
It's preposterous to think that ChatGPT's responses can be as accurate in predicting real consumer choice as human responses for less well-known products in niche marketplaces. Chris Moore from Ipsos said in a recent email to me that AI might do an OK job representing human preferences for conjoint studies involving EVs in the USA, but what about light bulb preferences in Estonia?

I believe that AI can do certain marketing research tasks well and obviously much faster than humans, such as summarizing main topics from hundreds of pages of unstructured open-end text. But, as for it being able to generate choices to conjoint questions that accurately represent the heterogeneity of tastes of humans, with their differential weights of attributes and preferences for attribute levels, that's too big of an ask.

Could AI still be useful for conjoint analysis studies? Perhaps AI's responses to CBC tasks could provide good enough prior information for use with some experimental design algorithms that take priors into account? Perhaps preferences expressed by AI on behalf of consumers could serve as priors in HB utility estimation? These are some possibilities that would be interesting to investigate.
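As one hedged illustration of the "priors" idea, the Python sketch below converts AI-stated choice probabilities into zero-centered utilities using the multinomial logit relationship, in which share is proportional to exp(utility). It uses the 30%/40%/30% split ChatGPT gave earlier in this article. The function names are hypothetical, and the result is concept-level utilities only; priors for HB estimation or experimental design would need part-worths by attribute level, which would require fitting a model across many such tasks.

```python
import math


def shares_to_utilities(shares):
    """Invert the logit rule: utility = ln(share), then zero-center."""
    logs = [math.log(s) for s in shares]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def utilities_to_shares(utilities):
    """Logit rule: share is proportional to exp(utility)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# ChatGPT's stated probabilities for the three EV options (no demographics).
ai_shares = [0.30, 0.40, 0.30]

prior_means = shares_to_utilities(ai_shares)
print([round(u, 3) for u in prior_means])

# Round trip: the utilities reproduce the shares they came from.
print([round(s, 2) for s in utilities_to_shares(prior_means)])
```

Zero-centering is a convention, since logit utilities are only identified up to an additive constant within a choice set.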

Technology evolves and AI improves. ChatGPT v4 already does better than v3.5, according to the Ipsos researchers, for answering conjoint questions. So, the story isn't over. As one famous AI terminator once pronounced, "I'll be back!"

Poetic Obituary (circa 2024)
No doubt, AI is very good at taking directions to write sensible new content. Based on what we saw at Sawtooth Software's 2024 A&I Summit, I coached ChatGPT to write me this poetic obituary (for now) in the style of a limerick:

In the digital realm, AI did strive,
To unravel conjoint, it did contrive.
But alas, it fell short,
In computational fort,
Human nuances it couldn't revive.

With data and algorithms in hand,
It ventured where humans do stand.
Yet the subtleties deep,
In choices we keep,
Confounded the AI's command.

It pondered and reasoned with might,
But couldn't grasp human insight.
Conjoint's complex dance,
Of preference and chance,
Remained beyond its digital sight.

So, while AI may excel in its quest,
Conjoint analysis remains human's best,
For in the dance of choice,
And the human voice,
AI falters, when put to the test.
