The race is on to standardize AI chatbots

With their ability to perfectly mimic pop stars and pass college exams, chatbots like ChatGPT offer limitless possibilities for use and misuse. Standards agencies must step in to ensure they are safe.

The emergence of AI-based chatbots is probably the biggest (and certainly the most talked-about) development to hit the electronics industry in the past year.

Ever since the launch of ChatGPT late last year, the industry, and the wider world, has been trying to come to terms with the implications of this potentially highly disruptive technology.

The main reason the technology has caused so many waves is simple: it works really well.

Chatbots like ChatGPT and Google Bard have proved themselves capable of imitating human behavior to a degree that is both remarkable and, frankly, a little scary.

Whether it’s creating viral hit songs that imitate musicians like Drake and The Weeknd or passing college exams on students’ behalf, the potential uses and misuses of the new technology seem boundless.

With such a powerful tool comes responsibility, and for this reason the IEC and ISO – two of the major international standards bodies – have begun looking into how to regulate these chatbots through standardization. The two organizations recently held a plenary session on artificial intelligence to discuss the implications of the new technology.

Much of the discussion concerned the role of synthetic data: artificially generated data that mimics real-world data, and which is either derived from real-world sources or produced purely by algorithms or mathematical models.
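As a minimal sketch of the second approach – generating synthetic data purely from a mathematical model – one can fit a simple distribution to a handful of real measurements and then sample new values from it. The function name, the example height data, and the choice of a normal distribution are all illustrative assumptions, not anything prescribed by the standards bodies:

```python
import random
import statistics

def synthesize(real_values, n, seed=None):
    """Fit a normal distribution to the real data and sample n new
    points from it. The synthetic values share the statistical shape
    of the originals without copying any individual measurement."""
    rng = random.Random(seed)
    mu = statistics.mean(real_values)       # fitted mean
    sigma = statistics.stdev(real_values)   # fitted standard deviation
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical real-world measurements (heights in cm)
real_heights_cm = [158.2, 171.5, 165.0, 180.3, 169.8, 174.1]
synthetic = synthesize(real_heights_cm, n=100, seed=1)
```

The synthetic sample preserves aggregate properties (roughly the same mean and spread) while containing none of the original data points, which is what makes this kind of generation attractive for privacy-sensitive applications.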

The artificially created Drake/The Weeknd track presumably relied on synthetic data generated from real-world recordings of the two artists’ voices. Depending on the source of the real-world data, the synthetic data can be anonymized so that any references to sensitive information are removed.

One way of doing this is through a process known as fuzzing, in which some values are varied by small random amounts to prevent the identification of specific individuals.
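The idea can be sketched in a few lines: perturb each numeric value by a small random fraction so that exact figures can no longer be matched back to a specific person. The record fields, the 5% noise scale, and the function name below are illustrative assumptions for the sketch, not part of any published standard:

```python
import random

def fuzz_records(records, field, scale=0.05, seed=None):
    """Return copies of the records with `field` perturbed by up to
    +/- scale (a fraction of the value), leaving the originals intact.
    Real anonymization would also drop or replace direct identifiers."""
    rng = random.Random(seed)
    fuzzed = []
    for rec in records:
        noisy = dict(rec)  # shallow copy so the source data is untouched
        noisy[field] = rec[field] * (1 + rng.uniform(-scale, scale))
        fuzzed.append(noisy)
    return fuzzed

# Hypothetical sensitive records
patients = [{"id": 1, "weight_kg": 82.0}, {"id": 2, "weight_kg": 61.5}]
safe = fuzz_records(patients, "weight_kg", seed=42)
```

Each fuzzed value stays within 5% of the original, so aggregate statistics remain usable even though individual values no longer match the source data exactly.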

There are a number of techniques for generating synthetic data. The naming of the ChatGPT chatbot references one of these techniques, generative pre-trained transformers (GPTs). This technique relies on the use of large language models (LLMs), as does Google’s Bard chatbot.

The usefulness of synthetic data depends heavily on how accurately the model mimics the original data, and LLMs have been found wanting in this regard. They frequently give different answers to the same question and sometimes fabricate entirely false information, a phenomenon known as “hallucination.”

This can have serious real-world implications. An Australian mayor, for example, recently threatened legal action after claiming that ChatGPT defamed him by incorrectly describing his role in a historical bribery case.

As AI-based chatbots become a bigger part of the technology landscape, creating standards for their development and use will be one of the big challenges facing standards agencies and certification companies.

As with autonomous vehicles, the question of liability is likely to be a thorny issue. But without standards (and perhaps even with them), it’s clear we are entering a future in which the line between the real and the unreal will become increasingly blurred.