Hiltzik: CNET’s chatbot stunt shows limits of AI

We’ve all been trained by decades of science fiction to think of artificial intelligence as a threat to our working futures. The idea is: If an AI robot can do a job as well as a human — cheaper and with less interpersonal unruliness — who needs the human?

The technology news site CNET tried to answer that question, quietly, even secretly. For months, the site employed an AI engine to write articles for its CNET Money personal finance page. The articles covered such topics as “What is compound interest?” and “What happens when you bounce a check?”

At first glance and to financial novices, the articles seemed cogent and informative. CNET continued the practice until early this month, when it was outed by the website Futurism.

But as Futurism determined, the bot-written articles have major limitations. For one thing, many are bristling with errors. For another, many are rife with plagiarism — in some cases from CNET itself or its sister websites.

Futurism’s Jon Christian put the error issue bluntly in an article stating that the problem with CNET’s article-writing AI is that “it’s kind of a moron.” Christian followed up with an article finding numerous cases ranging “from verbatim copying to moderate edits to significant rephrasings, all without properly crediting the original.”

This level of misbehavior would get a human student expelled or a journalist fired.

We’ve written before about the unappreciated limits of new technologies, especially those that look almost magical, such as artificial intelligence applications.

To quote Rodney Brooks, the robotics and AI scientist and entrepreneur I wrote about last week, “There is a veritable cottage industry on social media with two sides; one gushes over virtuoso performances of these systems, perhaps cherry picked, and the other shows how incompetent they are at very simple things, again cherry picked. The problem is that as a user you don’t know in advance what you are going to get.”

That brings us back to CNET’s article-writing bot. CNET hasn’t identified the specific AI application it was using, though the timing suggests that it isn’t ChatGPT, the AI language generator that has created a major stir among technologists and raised concerns among teachers because of its apparent ability to produce written work that can be hard to distinguish from human writing.

CNET didn’t make the AI contribution to its articles especially evident, appending only a small-print line reading, “This article was assisted by an AI engine and reviewed, fact-checked and edited by our editorial staff.” The more than 70 articles were attributed to “CNET Money Staff.” Since Futurism’s disclosure, the byline has been changed to simply “CNET Money.”

Last week, according to the Verge, CNET executives told staff members that the site would pause publication of the AI-generated material for the moment.

As Futurism’s Christian established, the errors in the bot’s articles ranged from fundamental misdefinitions of financial terms to unwarranted oversimplifications. In the article about compound interest, the CNET bot originally wrote, “if you deposit $10,000 into a savings account that earns 3% interest compounding annually, you’ll earn $10,300 at the end of the first year.”

That’s wrong — the annual earnings would be only $300. The article has since been corrected to read that “you’ll earn $300 which, added to the principal amount, you would have $10,300 at the end of the first year.”
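The arithmetic is easy to check. What follows is a minimal Python sketch, not anything CNET or Futurism published; the function name `compound_annually` and the two-year horizon are illustrative assumptions. It keeps the interest earned in a year separate from the year-end balance, which is the distinction the bot’s original sentence blurred.

```python
from decimal import Decimal


def compound_annually(principal, rate, years):
    """Return (year, interest_earned, end_balance) for each year.

    Interest is compounded once a year and rounded to the cent.
    """
    rows = []
    balance = Decimal(principal)
    rate = Decimal(rate)
    for year in range(1, years + 1):
        # Interest is earned on the current balance, not just the deposit.
        interest = (balance * rate).quantize(Decimal("0.01"))
        balance += interest
        rows.append((year, interest, balance))
    return rows


# The example from the article: $10,000 at 3%, compounded annually.
schedule = compound_annually("10000", "0.03", 2)
for year, interest, balance in schedule:
    print(f"Year {year}: interest ${interest}, balance ${balance}")
```

In the first year the interest is $300 and the balance is $10,300. In the second year interest is earned on $10,300 rather than on the original $10,000, which is what makes it compound.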

The bot also initially described interest payments on a $25,000 auto loan at 4% interest as “a flat $1,000 … per year.” It is the payments on auto loans, as on mortgages, that are fixed; interest is charged only on the outstanding balance, which shrinks as payments are made. Even on a one-year auto loan at 4%, interest will come to only $937. For longer-term loans, the interest paid falls each year.
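To see why interest on such a loan is not flat, here is a rough Python sketch of a fixed-payment loan. The article gives only the amount and the rate, so the five-year term, the monthly payment schedule and the function name `yearly_interest` are all assumptions made for illustration; the payment is computed with the standard amortization formula.

```python
def yearly_interest(principal, annual_rate, years):
    """Return (monthly_payment, [interest paid in each year]).

    Assumes equal monthly payments and interest charged monthly
    on the outstanding balance (standard amortization).
    """
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    payment = principal * r / (1 - (1 + r) ** -n)

    balance = principal
    totals = []
    for _ in range(years):
        paid_this_year = 0.0
        for _ in range(12):
            interest = balance * r            # charged on what is still owed
            paid_this_year += interest
            balance -= payment - interest     # the rest reduces the balance
        totals.append(paid_this_year)
    return payment, totals


# Hypothetical terms: $25,000 at 4% over five years, paid monthly.
payment, totals = yearly_interest(25_000, 0.04, 5)
print(f"Monthly payment: ${payment:,.2f}")
for year, interest in enumerate(totals, start=1):
    print(f"Year {year}: interest ${interest:,.2f}")
```

Under these assumptions the first year’s interest comes in below $1,000 and each later year’s total is smaller still, because every payment reduces the balance on which interest is charged.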

CNET corrected that too, along with five other errors in the same article. Put it all together, and the website’s assertion that its AI bot was being “fact-checked and edited by our editorial staff” begins to look a little thin.

The bot’s plagiarism is more striking and provides an important clue to how the program worked. Christian found that the bot appeared to have replicated text from sources including Forbes, the Balance and Investopedia, which all occupy the same field of personal financial advice as CNET Money.

In those cases, the bot used concealment techniques similar to those of human plagiarists, such as minor rephrasings and word swaps. In at least one case, the bot plagiarized from Bankrate, a sister publication of CNET.

None of this is especially surprising because one key to language bots’ function is their access to a huge volume of human-generated prose and verse. They may be good at finding patterns in the source material that they can replicate, but at this stage of AI development they’re still picking human brains.

The impressive coherence and cogency of the output of these programs, up to and including ChatGPT, appear to have more to do with their ability to select from human-generated raw material than with any ability to develop new concepts and express them.

Indeed, “a close examination of the work produced by CNET’s AI makes it seem less like a sophisticated text generator and more like an automated plagiarism machine, casually pumping out pilfered work,” Christian wrote.

Where we stand on the continuum between robot-generated incoherence and genuinely creative expression is hard to determine. Jeff Schatten, a professor at Washington and Lee University, wrote in an article in September that the most sophisticated language bot at the time, known as GPT-3, had obvious limitations.

“It stumbles over complex writing tasks,” he wrote. “It cannot craft a novel or even a decent short story. Its attempts at scholarly writing … are laughable. But how long before the capability is there? Six months ago, GPT-3 struggled with rudimentary queries, and today it can write a reasonable blog post discussing ‘ways an employee can get a promotion from a reluctant boss.’”

It’s likely that those needing to judge written work, such as teachers, will find it ever harder to distinguish AI-produced material from human output. One professor recently reported catching a student submitting a bot-written paper the old-fashioned way: It was too good.

Over time, confusion about whether something is bot- or human-produced may depend not on the capabilities of the bot, but on those of the humans in charge.
