In Bard We Trust, Believing in Google

Since the introduction of ChatGPT last year, AI has been in the spotlight. That’s not surprising. ChatGPT, a large language model, can write poems, do homework, create stories, generate summaries, write theses, and answer questions, to name a few examples. One significant disadvantage of ChatGPT is its lack of real-time information: its knowledge doesn’t extend beyond September 2021. As of August 2023, ChatGPT therefore does not know that King Charles III is the current sovereign of the United Kingdom.

After the rapid success of ChatGPT, Google launched Bard in February. Bard is a large language model that keeps its knowledge up to date. Bard therefore knows who the current sovereign of the United Kingdom is.

ChatGPT, Bard, and other large language models do not have intelligence. They are nothing more than very large computational models that have absorbed billions of words and calculated the relationships between them. For example, they do not know what a banana is, but they can make up whole stories about bananas because they have calculated how the word relates to other words and word groups.

Because these language models lack intelligence, they can easily make mistakes, big mistakes. Where a child would succeed, a large language model can fail completely. This doesn’t have to be a problem, as long as everyone is aware of it. However, that’s not always the reality; misinformation and fake news abound on certain topics. When questions are posed to a large language model, the answers should always be verified. But if the answer needs verification, asking the question becomes pointless. Therefore, you should only ask questions if you already know the answer. Yet that, too, is futile.

Is it all that bad? Yes, it is. I would like to illustrate this misinformation and fake news with Google’s Bard and the American stock index Nasdaq-100. The Nasdaq-100 is a stock index consisting of the 100 to 105 “heaviest” stocks on the Nasdaq, that is, the largest non-financial companies by market capitalization.

The first task for Bard is to produce a numbered list of all the companies that are part of the Nasdaq-100. The result is a list of 50 companies. At present, the index comprises 101 stocks, so at least 51 stocks are missing. But there are more errors. Of the 50 companies on the list, only 13 are actually part of the Nasdaq-100, while 37 have nothing to do with the stock index. It’s a piece of cake for a child, but too difficult for Bard.
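
The numbers from this first task can be put into a small calculation. The sketch below is only an illustration: the function name `score_answer` and its parameters are hypothetical, and it assumes that every entry on Bard's list counts as one index constituent, which simplifies things because the index holds 101 stocks rather than 101 companies.

```python
def score_answer(listed: int, correct: int, index_size: int) -> dict:
    """Summarise how well a returned list matches a reference index.

    listed      -- number of entries the model returned
    correct     -- how many of those entries really belong to the index
    index_size  -- number of constituents in the real index
    """
    if not 0 <= correct <= listed:
        raise ValueError("correct must be between 0 and listed")
    if correct > index_size:
        raise ValueError("correct cannot exceed index_size")
    if index_size <= 0:
        raise ValueError("index_size must be positive")

    return {
        # Entries on the list that do not belong to the index.
        "wrong": listed - correct,
        # Index constituents that never made it onto the list.
        "missing": index_size - correct,
        # Share of the returned list that is right.
        "precision": correct / listed if listed else 0.0,
        # Share of the real index that was found.
        "recall": correct / index_size,
    }


# Figures taken from the article: 50 listed, 13 correct, 101 in the index.
result = score_answer(listed=50, correct=13, index_size=101)
print(f"wrong entries:     {result['wrong']}")
print(f"missing from list: {result['missing']}")
print(f"precision:         {result['precision']:.0%}")
print(f"recall:            {result['recall']:.0%}")
```

Under that assumption, 37 of the 50 entries are wrong and 88 of the 101 constituents are absent, so only 26% of the list is right and about 13% of the index is covered.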

Next, I ask whether a column with the latest stock price can be added. The new list now shows only 44 companies, each with a stock price. For the Nasdaq-100 stocks that are on the list, not a single price is correct, and the differences are large, sometimes very large. For example, the closing price of Tesla was $215.49, but Bard turned it into $723.50.

The last question asks for a column with the names of the companies’ CEOs. The list now shows only 38 companies, supplemented with the CEO column. This is again a difficult question. For example, a certain Suyash Madhavan is named CEO of Meta instead of Mark Zuckerberg. Bard gives the correct CEO for 30 of the 38 companies.

These errors are not exceptional. If the exercise is repeated with the French CAC 40 and the Dutch AEX index, it results in the same types of errors. In other words, the list of companies is not complete, the names of the CEOs are often incorrect, and the stock prices appear to be fabricated.

Furthermore, the errors are noticeably inconsistent; they seem random. For instance, one time Suyash Madhavan is the CEO of Meta, the next time it’s Mark Zuckerberg, and sometimes Meta doesn’t even appear on the list. In other words, even the errors are unreliable.

The conclusion is clear. While a child can handle the task, a large language model like Bard piles one error on top of another. The artificial intelligence used by Bard demonstrates little intelligence here.

And Bard, what is its opinion of my article? It took Bard only a few seconds to write the following text.

Large language models like Bard are powerful tools, but they are not to be trusted. They are still under development and can make mistakes, big mistakes. They can also be biased and spread misinformation.

The article you shared is a good example of the dangers of large language models. Bard was unable to provide a complete or accurate list of the companies in the Nasdaq-100 stock index, the stock prices of the companies, or the names of the CEOs. These are all basic facts that a child could easily find, but Bard struggled with them.

We should not rely on large language models for accurate information. We should always verify the information that they provide and be critical of their output. We should also be aware of the biases of large language models and interpret their output with caution.

Large language models are a new and exciting technology, but we need to use them with caution. We should not let them replace our critical thinking skills.