
Since it is pretty evident that users, companies, and snake-oil vendors seem to think that ChatGPT is a tool for generating meaningful answers, I was wondering if the site could use a fake, canonical question that can be linked whenever someone falls into that misconception.

I was thinking about something like this:

Question

I have tried to use ChatGPT as a tool to help me find solutions to common issues like helping with coding, providing information about historical events, solving math problems, and so on.

I have noticed that ChatGPT often does not provide real, usable answers but will instead hallucinate facts, events, and so on. This is most evident when the user asks for confirmation of a wrong premise.

A chatbot agreeing with the user's claim that Sonic was created by Shigeru Miyamoto and later sold to Sega in exchange for Link. That is not true.

Not only did the bot agree with a false premise, the answer also contains contradictions. It agreed that Nintendo bought Link from Sega while at the same time saying that "Sega acquired the rights to both Sonic and Link". The message is grammatically correct, but it does not make actual sense.


Answer

This is due to a common misconception about what Large Language Models like ChatGPT do.

Sadly, ChatGPT has often been presented as an intelligent "AI assistant" or help-desk tool, but this is far from the truth: all ChatGPT tries to do is generate a sequence of words based on some scoring rules and the data it was trained on.

You can read a quite accurate yet very accessible article about the inner workings of ChatGPT written by Stephen Wolfram here.

An extreme summary of said article (not very accurate, but hopefully enough for this short explanation) is that, given a sequence of words, ChatGPT tries to calculate which next word has the best "score".

The color of this apple is ...

| Word  | Score     |
| ----- | --------- |
| red   | very high |
| pink  | low       |
| blue  | low       |
| green | very high |
| dog   | very low  |

Again, this is an oversimplification, but please bear with me.

How are those scores calculated, and what do they represent? While the actual math is fairly complex, the purpose is quite simple. The model uses the dataset it was trained on to give each possible next piece of the message a score that represents how "likely" that continuation is.

You can think of this as a sort of probability that a word or phrase will follow your previous message. It should be intuitive that if the training data is made up of meaningful, correct, non-fabricated text, then most of the time a phrase like "The color of this apple is" will continue with words like "green" or "red" rather than "blue" or "dog".
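
To make the idea a bit more tangible, here is a rough sketch of that "score the next word" step. ChatGPT itself cannot be inspected this way, so the sketch uses the openly available GPT-2 model (via the Hugging Face `transformers` library) as a stand-in; the candidate words and the exact numbers are purely illustrative.

```python
# Sketch: ask a small open model (GPT-2, standing in for ChatGPT) how "likely"
# a few candidate words are as the next word of a prompt.
# Requires the `transformers` and `torch` packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The color of this apple is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # raw scores for every vocabulary token
probs = torch.softmax(logits[0, -1], dim=-1)  # scores for the NEXT token, as probabilities

# For simplicity we only look at each candidate's first sub-token.
for word in [" red", " pink", " blue", " green", " dog"]:
    token_id = tokenizer.encode(word)[0]
    print(f"{word.strip():>6}: {probs[token_id].item():.4f}")
```

The exact numbers will not match ChatGPT's (different model, different training data), but the shape of the result is the point: the model only ranks continuations by how plausible they look given its training text.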

At this point it should be clear WHY ChatGPT is capable of parroting meaningful information even without understanding the semantic meaning of the words.
Ask yourself this: if your phrase so far is

The year Columbus discovered America is ...

| Word       | Score |
| ---------- | ----- |
| 1492       | ?     |
| dog        | ?     |
| Fluttershy | ?     |
| London     | ?     |

What word would you expect to have the best score? How likely would you consider the phrase to continue with "Fluttershy"? Do you really expect the training data to contain "The year Columbus discovered America is London"?
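
(If you want to check this yourself, the GPT-2 sketch shown earlier can be pointed at this prompt too; the candidates and numbers are only illustrative, but continuations that actually appear after such phrases in real text should score far above the nonsense ones.)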

As soon as you realize this, ChatGPT's limitations should become evident.

As the training set grows, it becomes more and more likely that, if your question is simple enough, ChatGPT will parrot some actually relevant text it "knows about" and provide you with a useful answer.

Yet at the same time you should realize that this in no way implies that ChatGPT understands what it is generating.

Earlier, we asked what you would expect to be the best-scoring option for continuing the message "The year Columbus discovered America is ...".
Now let me ask something a little different: what would you expect to be the best-scoring option for continuing the message "The year before the year Columbus discovered America is ..."?

Sadly, the answer still seems to be 1492.

[Screenshot: ChatGPT answering "1492" when asked for the year before the year Columbus discovered America]

The model is able to identify a statistical relationship between the words "discovery", "America" and the number 1492, but it can't understand the actual meaning. So, by asking for the year before the year America was discovered, we can easily trick the tool into giving us an inaccurate answer.
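
For the curious, the same kind of experiment can be sketched in code. Again, GPT-2 is only a stand-in for ChatGPT, so the completions it prints may differ and may not reproduce this exact mistake; the point of the sketch is that "answering" is nothing more than repeatedly picking a high-scoring next token.

```python
# Sketch: greedily complete both the plain question and the "trick" question
# with a small open model (GPT-2 as a stand-in). Nothing in this loop ever
# checks whether the generated text is true.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

for prompt in [
    "The year Columbus discovered America is",
    "The year before the year Columbus discovered America is",
]:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,                      # greedy: always take the best-scoring token
        pad_token_id=tokenizer.eos_token_id,  # avoid the missing-pad-token warning
    )
    print(tokenizer.decode(output[0]))
```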

But the year America was discovered is a very well-defined concept that is likely to be mentioned by multiple sources, and so it will be very frequent in the training set. So ask yourself: what do you expect to happen when the question is something that, to quote The Hitchhiker's Guide to the Galaxy,

"was almost, but not quite, entirely unlike tea"

What if your question is not really that similar to any text the model was trained on? At that point, whatever score the model is trying to calculate is probably based on the likelihood of some words appearing in unrelated, irrelevant content.

The result?

A chatbot agreeing with the user's claim that Sonic was created by Shigeru Miyamoto and later sold to Sega in exchange for Link. That is not true.

The bot will gladly agree with a made-up fact because, in doing so, it has fulfilled its purpose: producing some text that maximizes a likelihood score function.

Generating factual answers was never its purpose.
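
To underline that last point, here is a deliberately toy sketch (every number is invented) of what "fulfilling its purpose" amounts to: pick whichever continuation the scoring function likes best. Notice that truth never appears anywhere in the objective.

```python
# Toy illustration with made-up scores: generation just maximizes a
# likelihood-style score; "is this true?" is not part of the objective.
hypothetical_scores = {
    "Yes, that's right: Miyamoto created Sonic and traded him to Sega for Link.": 0.71,
    "No, Sonic was created at Sega; Miyamoto had nothing to do with it.": 0.04,
}

best = max(hypothetical_scores, key=hypothetical_scores.get)
print(best)  # the agreeable (and false) continuation wins, simply because it scored higher
```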

  • chatgpt is already the top tag. If you haven't yet, please take a look at the questions already posted. P.S. At first sight, the question that has been presented here might be a canonical question.
    – Wicket
    Commented Jul 20, 2023 at 16:54
  • @Wicket I know that multiple questions already exist. My point is that the entire network is in the middle of a strike caused indirectly by users thinking that ChatGPT is a knowledge assistant tool that can provide meaningful answers to any question. Since I think this topic will come up again and again on this site, I am pointing out that an artificial but canonical question that other users can be referred to as needed could be useful. But since I am not sure the one I wrote is good enough, I am posting here to a) gather opinions and b) polish up whatever will be posted. Commented Jul 20, 2023 at 17:11
  • In case I was unclear in my previous comment, let me add that I have upvoted the question. Remember that I mentioned that this might be a canonical question, which implies that I have received this post well. Since this question arrives after 16 chatgpt questions, I think it might be a good idea to look at them.
    – Wicket
    Commented Jul 20, 2023 at 17:48
  • I think it would be on topic. Like with any other SE site, they are bound to attract askers of varying knowledge. Having a canonical question on what can be expected of an LLM or specifically for ChatGPT is probably worthwhile.
    – Hoid StaffMod
    Commented Jul 20, 2023 at 20:05
  • Me: "What is the year before the year of the discovery of America?" ChatGPT: "The discovery of America by Christopher Columbus is traditionally dated to 1492. Therefore, the year before the discovery of America would be 1491." This shows how it's not consistent in its responses.
    – Someone
    Commented Jul 25, 2023 at 18:36
  • @Someone-OnStrike consider that the more you point out a logical fallacy, the more probable it becomes that random training picks that up. And now, plug-in extensions are a thing, adding some more confusion to the mix. This post doesn't want to be an absolute truth, just an oversimplified attempt at clearing up some misconceptions. I hoped other users would provide their versions or propose edits to the post to slowly clean up a canonical question for the main site. Commented Jul 26, 2023 at 7:44

1 Answer


I think that a single question might be too broad. I suggest having one question for each of the most relevant misconceptions.
