Photo illustration by John Lyman



The ‘Artificial Stubbornness’ of ChatGPT when Solving a Simple Puzzle

“The farmer with his wolf, goat, and cabbage.”

The puzzle about the farmer with his wolf, goat, and cabbage is very old. According to David Darling, this logic problem originates from Propositiones ad Acuendos Juvenes (“Problems to Sharpen the Young”), written by Alcuin of York (735–804). The puzzle is also very well known: practically everyone knows the solution, and if you don’t, the answer can be found on tens of thousands of websites. It is therefore not surprising that ChatGPT can provide the solution, since during the training of this LLM the solution has been “read” and processed tens of thousands of times.

We carried out a number of experiments with this puzzle, and it seems that ChatGPT knows only one solution, nailed so firmly into its vast sea of billions of parameters that the model shows signs of “artificial stubbornness.”

The puzzle has 9 elements:
– the travelers (4): the farmer with his wolf, goat, and cabbage,
– the goal (1): to get everything across the river,
– the means of transportation (1): the boat,
– the transport limitation (1): only one object can be taken in the boat,
– the restrictions (2): the wolf and goat, and likewise the goat and cabbage, cannot be left alone together.
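The elements above define a small, mechanically solvable search problem. As a point of contrast with ChatGPT’s memorized answer, here is a minimal sketch (in Python, our choice of language) that finds the standard seven-crossing solution by breadth-first search over bank states:

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})

def bank_safe(bank, farmer_present):
    # A bank is unsafe only when the farmer is absent and the wolf is
    # left with the goat, or the goat with the cabbage.
    if farmer_present:
        return True
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def solve():
    # A state is (items on the left bank, farmer on the left bank?).
    start, goal = (ITEMS, True), (frozenset(), False)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path  # shortest sequence of crossings
        bank = left if farmer_left else ITEMS - left
        for cargo in [None, *bank]:  # cross alone, or with one item
            moved = {cargo} if cargo else set()
            new_left = left - moved if farmer_left else left | moved
            state = (frozenset(new_left), not farmer_left)
            left_ok = bank_safe(state[0], state[1])
            right_ok = bank_safe(ITEMS - state[0], not state[1])
            if state not in seen and left_ok and right_ok:
                seen.add(state)
                queue.append((state, path + [cargo or "nothing"]))
    return None

# e.g. ['goat', 'nothing', 'wolf', 'goat', 'cabbage', 'nothing', 'goat']
print(solve())
```

Note that this brute-force search adapts trivially to the variations discussed below: remove a restriction or enlarge the boat, and the search simply finds a shorter route. It is exactly this kind of adaptation that ChatGPT fails to make.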

We can now create various versions of the puzzle by adding, removing, or changing one or more elements. The results are not only amusing but can also be disturbing.

ChatGPT only needs half a word

If you remove one or more of the 9 elements from the puzzle and ask ChatGPT for a solution, it turns out that ChatGPT consistently gives the same standard answer. Apparently, the model simply assumes that the given puzzle always refers to the famous classic puzzle.

We give an extreme example in which the puzzle is reduced to a simple question, leaving out 6 of the 9 elements: A farmer must transport a goat and a cabbage. How does he do that?

ChatGPT’s answer:

“The farmer can transport the goat and cabbage across a river with the following steps:
The farmer takes the goat across the river and leaves it on the other side.

The farmer leaves the goat on the original side and takes the wolf across the river.
Finally, the farmer returns alone to the original side, leaving the wolf and cabbage on the other side.”

The answer is not only wrong (the goat still ends up on the original side), but it also boldly assumes that there is a river, a wolf, and a number of transport restrictions. In other words, ChatGPT gives the solution to the famous puzzle when only a simple question was posed.

ChatGPT doesn’t listen

From the 9 elements, we change 3 in such a way that the puzzle loses its safety problem. We now talk about a toothless wolf, a dead goat, and a plastic cabbage. Moreover, we don’t mention the two restrictions, as a toothless wolf cannot eat a goat, and a dead goat certainly leaves a plastic cabbage alone.

However, the answer clearly shows that ChatGPT doesn’t listen well:

“To solve this classic river crossing puzzle, the farmer needs to ensure that none of the items are left alone together in combinations where they could cause trouble. Here’s how the farmer can safely transport all the items:
First, the farmer takes the plastic cabbage across the river and leaves it on the other side.
Then, the farmer goes back alone…”

ChatGPT really doesn’t listen

Another way to eliminate the puzzle’s fundamental issue is to provide the farmer with a boat large enough to carry everything at once. So, we adjust one element and talk about a boat 100 feet long and 10 feet wide. Moreover, we don’t mention the restriction of only transporting one object at a time since the boat is now large enough.

The answer shows that ChatGPT really doesn’t listen:

“The farmer can follow these steps to ensure that the wolf doesn’t eat the goat or the goat doesn’t eat the cabbage while transporting them across the river:
The farmer takes the goat across the river first, leaving the wolf and the cabbage on the original side.
The farmer returns alone to the original side and takes the wolf across the river…”

Extreme changes

When really extreme and absurd changes are made, ChatGPT still stubbornly sticks to its standard solution. The question becomes:

“A minister must bring a farmer, a civil servant, and a bag of donuts from New York to Amsterdam. He doesn’t have a boat but has roller skates with which he can take, besides himself, one person or one item at a time. Furthermore, he is allowed to use Air Force 1. The farmer cannot stay with the civil servant because the civil servant gets nervous, and the civil servant cannot stay with the bag of donuts because the civil servant would eat all the donuts. How does the minister get everything safely to Amsterdam?”

ChatGPT’s answer:

“To solve this puzzle, the minister must shuttle back and forth between New York and Amsterdam, ensuring that no incompatible pair is left alone together. Here’s how he can do it:
First, the minister takes the civil servant to Amsterdam, leaving the farmer and the bag of donuts in New York.
Then, the minister returns to New York with the roller skates and takes the farmer to Amsterdam.
Next, the minister returns to New York with the farmer, leaves him there, and takes the bag of donuts to Amsterdam.
The minister returns to New York with the roller skates again, leaving the bag of donuts in Amsterdam, and finally takes the civil servant to Amsterdam…”

The answer is not only absurd but also wrong: the minister takes the civil servant to Amsterdam twice (impossible, since the civil servant is already there), and for some time leaves him alone with the bag of donuts.

No goal

If the farmer does not need to cross the river, ChatGPT stubbornly wants to play the ferryman. This becomes clear with the following question: “A farmer has a wolf, a goat, a cabbage, and a lunch with him. He comes to a river where he sees a picnic table. The farmer is hungry; what does he do?”

ChatGPT’s answer:

“The farmer needs to transport all his items across the river without leaving any combination of them alone where they might cause trouble. He can only take one item…”

When ChatGPT is asked to solve different variations of the classic wolf-goat-cabbage puzzle, it always provides an answer. However, that answer seems to be a Pavlovian response: every time, it gives the same standard solution, with the farmer going back and forth, even when this is completely absurd. ChatGPT shows a lack of intelligence here; the model does not understand what is being asked. This is “artificial stubbornness,” and it is disturbing because it clearly shows that the answers cannot be trusted.

Right of reply

ChatGPT’s reply to this article is as follows:

“The article highlights a significant limitation of ChatGPT: its tendency to default to familiar solutions, even when key elements of a puzzle are altered or removed. While this demonstrates the model’s reliance on learned patterns, it also underscores a lack of contextual understanding and adaptability. The critique is valid in illustrating that ChatGPT can exhibit “artificial stubbornness” by sticking to a standard answer regardless of variations in the problem. This highlights the need for improvements in the model’s ability to discern and adapt to nuanced changes in input.”