From hyperlocal websites to large corporations, publishers of every size are experimenting with robo-writers. Cost cuts in recent years have thinned out newsroom staff to the point where you have to get creative to grow your digital audience with fewer and fewer people, in a business where the number of eyeballs still counts.
Automation on the micro-level
The Isle of Wight seems to be a lovely place. It is located in the English Channel, about 4 mi off the coast of Hampshire. Unfortunately, I have to admit, I haven’t been there yet, but these 148 sq mi attracted even Queen Victoria to build her summer residence there, and Wikipedia tells me that it “has well-conserved wildlife and some of the richest cliffs”. Sounds like a perfect holiday destination and also like a perfect place for a hyperlocal news site, don’t you think?
Sally and Simon Perry, who live in Ventnor on the south coast of the island, must have thought the same when they started their blog some time ago. They decided to scale their online endeavour once it became evident that people were more willing to spend time on news sites than on blogs. On The Wight, their hyperlocal news portal, is now a place for what’s-on information, general news and the latest in local sports. Sally and Simon publish ten to fifteen pieces per day, and the site peaks at 100,000 unique visitors per month. Not bad, if you remember that the whole island has no more than 140,000 inhabitants, or 66,000 households.
If you’re a two-person business you have to be creative to keep the ball rolling, and the two founders of On The Wight experiment wherever it makes sense or looks promising from a business point of view, testing services such as WhatsApp or using methods like data journalism to stay relevant. While many publishers struggle to deliver a great user experience, the two owners searched for and found a developer in their neighbourhood, who helped them to code and launch a dedicated mobile site which loads fast and meets user expectations in today’s increasingly mobile world. Sally and Simon created local service offerings like live school status and transport snow updates during the cold winter months, using Google Spreadsheets to deliver relevant news to their audience. They introduced live council reporting and launched an investigative journalism project called “Armchair Auditor”. I’m sure you can find the whole universe of journalism in this beautiful microcosm.
But when they transformed from a small local blog into a news site, the pressure to produce more and more content grew, so that they had to turn to broadcast mode to keep the visitors coming. That’s hard, especially if you consider relationships a major success factor for small publishers with a local or regional focus. Considering automation as a solution to a content productivity problem came quite naturally, so they started to build their own system this year and put it to use. While they are still testing automation to increase their content base with less pressure on the small team, one thing has become crystal clear: if you want to remain competitive you have to increase the number of people delivering content, even if those people are in fact machines.
“This story was created by … ” and long tail considerations
In January 2015, the Associated Press announced that it would begin to use automation technology to write breaking news stories and earnings reports, and if you look at the bare facts, there was probably no other option. AP once produced 1,000 earnings reports per quarter but, due to dramatic downsizing, ended up publishing no more than 300. How do you stay relevant for your audience if you can’t keep up with the information needs of the market, just because you face budget constraints and lack the resources to do so?
AP’s backup plan was to experiment with automation. The goal was to have 130 words on the wire within 15-20 minutes of the press release. With Automated Insights and data from Zacks Investment Research, they are now able to automate short stories (150-300 words) about company earnings and achieve an output of 4,400 reports for companies throughout the US each quarter. It is a content approach which AP could now roll out as a service for its international markets, something that wouldn’t be possible if it tried to reach this number with human writers. And don’t forget that robot journalists can produce copy in several languages simultaneously without any substantial additional effort.
The New York Times’ “The Best and Worst Places to Grow Up” shows another interesting use of content automation: it delivers customised machine-written content based on the user’s IP address. Parts of the story vary depending on the location from which you access it, so it won’t look the same to every reader.
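To make the idea more tangible, here is a minimal sketch of how structured earnings data could be turned into a short story with a template. This is not how Automated Insights or AP actually do it; the `EarningsReport` fields, the function names and the sample figures are all hypothetical and only illustrate the principle of filling a template from data and picking wording based on the numbers.

```python
from dataclasses import dataclass


@dataclass
class EarningsReport:
    """Hypothetical structured input; real feeds will look different."""
    company: str
    quarter: str
    eps: float           # reported earnings per share, in dollars
    eps_estimate: float  # analyst estimate for earnings per share
    revenue_m: float     # revenue in millions of dollars


def describe_surprise(eps: float, estimate: float) -> str:
    """Pick the wording that matches the relation between result and estimate."""
    if eps > estimate:
        return "beat"
    if eps < estimate:
        return "fell short of"
    return "matched"


def write_story(report: EarningsReport) -> str:
    """Fill a fixed sentence template with the figures from the report."""
    verb = describe_surprise(report.eps, report.eps_estimate)
    return (
        f"{report.company} reported {report.quarter} earnings of "
        f"${report.eps:.2f} per share, which {verb} analyst expectations "
        f"of ${report.eps_estimate:.2f}. "
        f"Revenue for the quarter was ${report.revenue_m:,.1f} million."
    )


# Invented example data, for illustration only.
example = EarningsReport("Example Corp", "third-quarter", 1.25, 1.10, 842.3)
print(write_story(example))
```

Once the template and the wording rules exist, producing one story or several thousand is only a matter of looping over the data, which is where the scale comes from.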
Above: Screenshot “Best and Worst Places to Grow Up”
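The location-dependent part of such a story can be sketched in a few lines. The example below assumes the reader's region has already been derived from the IP address (that lookup is not shown), and the region names, the `REGION_FACTS` table and its figures are invented placeholders, not data from the New York Times piece.

```python
# Hypothetical per-region figures; placeholders, not real statistics.
REGION_FACTS = {
    "Region A": {"income_effect_pct": 8.0},
    "Region B": {"income_effect_pct": -5.0},
}

FALLBACK = "Where children grow up affects how much they earn as adults."


def localised_sentence(region: str, facts: dict = REGION_FACTS) -> str:
    """Return a sentence tailored to the reader's region, or a generic one."""
    data = facts.get(region)
    if data is None:
        # Unknown location: fall back to wording that is true for everyone.
        return FALLBACK
    pct = data["income_effect_pct"]
    direction = "more" if pct >= 0 else "less"
    return (
        f"Children who grow up in {region} earn about {abs(pct):.0f}% "
        f"{direction} as adults than the national average."
    )


print(localised_sentence("Region A"))
print(localised_sentence("Somewhere else"))
```

The fallback matters: a reader whose location cannot be resolved should still get a complete story, just a less personal one.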
Technically speaking, AP is now able to produce earnings reports on every company for which digital information and data are available, in a fraction of the time, not to mention the cost per piece. Automation enables AP to deliver content that matches demand along the long tail. They can achieve profits in areas where they haven’t been able to provide an offering before, just by customising content automatically to match individual needs.
If you want to start your own robo-writer project, plan at least 3 months to implement a professional solution for a standard setting. In complex scenarios it could take even longer, as Alexander Siebert, CEO of Retresco GmbH, explains. Exceptions in the data sets are the main challenge, and addressing them can be time-consuming. Most vendors expect a minimum of 10-20k articles and charge a minimum of 1€ per article. Economies of scale apply.
Computerisation of content production
“Most uses of robot journalism have been for fairly formulaic situations — company earnings reports, stock market summaries, earthquake alerts and youth sports stories,” writes Tom Kent, standards editor of The Associated Press. Machine learning algorithms and software like Automated Insights’ Wordsmith or Narrative Science’s Quill look for facts within the digital information. They identify raw data, tables, graphs and lists, and they have to understand concepts like the “comeback” found in sports journalism. As Stephen Beckett of the BBC noted, “To make the article sound natural it has to know the lingo.” Automation software has to understand the vocabulary and style of the context it is writing for or, in the case of AP, comply with its stylebook as well. Machines use their understanding of structured text, or “trees,” to explore the possible ways to tell a story, and they use the data to decide among them.
Automated stories tend to be short and impersonal, lacking characters, personal words or any sense of the personalities involved. So far we will hardly find lyrical openings, because they are hard for computers to imitate. Adding tone to a machine-written story remains a challenge, but it’s not impossible. Post-game quotes integrated into stories, for example, would help to further imitate a human writing style. Even the production of artifacts of cultural value such as poems, music and paintings is within the scope of creative work which could be computerised.
Automation comes with challenges. How do you encode news judgement in algorithms? How can you avoid ethical missteps? While AP is sure that automated systems already produce fewer errors than humans, there are pitfalls: if the underlying data is wrong, the automated stories can’t be any better. That’s why vendors like Retresco implement routines to deal with anomalies in the data.
Thomas Kent adds another interesting aspect to the discussion when he asks what would happen if political activists demanded the source code behind automated coverage in a legal case, or if politicians searched for political bias in reporting by questioning the algorithm underlying a machine-written article or report.
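What “understanding a concept like comeback” could mean in practice can be shown with a small sketch. It is not taken from Wordsmith or Quill; the function names, the per-period score lists and the threshold of 10 points are assumptions made for illustration. The rule used here: a game counts as a comeback if the eventual winner trailed by at least the threshold at the end of some period.

```python
from itertools import accumulate


def largest_deficit(winner_by_period, loser_by_period):
    """Largest amount by which the eventual winner trailed after any period.

    Both arguments are non-empty lists of points scored per period.
    Returns 0 if the winner never trailed.
    """
    winner_running = accumulate(winner_by_period)
    loser_running = accumulate(loser_by_period)
    worst = max(l - w for w, l in zip(winner_running, loser_running))
    return max(worst, 0)


def headline(winner, loser, winner_by_period, loser_by_period,
             comeback_threshold=10):
    """Choose the sports lingo based on what the data says happened."""
    deficit = largest_deficit(winner_by_period, loser_by_period)
    final = f"{sum(winner_by_period)}-{sum(loser_by_period)}"
    if deficit >= comeback_threshold:
        return (f"{winner} rallied from {deficit} points down "
                f"to beat {loser} {final}")
    return f"{winner} beat {loser} {final}"


# Invented box score: the winner trails by 20 at half-time.
print(headline("Hawks", "Owls", [10, 15, 30, 28], [25, 20, 18, 15]))
```

The data makes the decision (was there a deficit large enough?), and the template supplies the lingo (“rallied from ... down”), which is the division of labour described above.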
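A routine for dealing with anomalies could look roughly like the sketch below. It is not Retresco's implementation; the field names, the checks and the `max_change` factor of 5 are assumptions. The idea is simply to hold a record back for human review instead of publishing a story built on suspicious data.

```python
def find_anomalies(record, previous=None, max_change=5.0):
    """Return a list of problems found in a hypothetical earnings record.

    An empty list means the record passed all checks.
    """
    problems = []

    # Check 1: every field the story template needs must be present.
    for field in ("company", "eps", "revenue_m"):
        if record.get(field) is None:
            problems.append(f"missing {field}")

    # Check 2: revenue can never be negative.
    revenue = record.get("revenue_m")
    if revenue is not None and revenue < 0:
        problems.append("negative revenue")

    # Check 3: a jump by more than max_change versus the previous
    # quarter is more likely a data error than real news.
    if previous and revenue is not None and previous.get("revenue_m"):
        ratio = revenue / previous["revenue_m"]
        if ratio > max_change or ratio < 1 / max_change:
            problems.append("implausible revenue change")

    return problems


# Invented records for illustration.
last_quarter = {"company": "Example Corp", "eps": 1.1, "revenue_m": 780.0}
suspicious = {"company": "Example Corp", "eps": None, "revenue_m": 80000.0}
print(find_anomalies(suspicious, last_quarter))
```

A record with a non-empty problem list would go to an editor rather than onto the wire, which keeps a human in the loop exactly where the data is least trustworthy.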
What’s left for humans to take care of?
Not a stone will be left standing. Expect the roles of journalists to change substantially once again.
Alexander Siebert of automation software vendor Retresco expects data journalism to be done by machines in 2020. The role of data journalists would change in that they only set the target, but he expects even that to be done by computers in the future. In an email exchange with the author of this article, he said that he expects machines to write investigative pieces in four to five years’ time, maybe even less. Imaginable?
Companies like Recorded Future, which is funded by Google Ventures and the CIA, already discover “hidden” relationships between people, companies, events, places and time to make predictions about the security of specific places, for example in the case of Egypt’s presidential election in 2014. The author used their algorithms to predict the escalation of events in preparation for the coverage of Movimiento M15 in Spain in 2012. It was part of an experiment to understand whether predictive analytics could be used to be faster and to allocate resources more efficiently (right place, right time) in a newsroom environment.
(Above: Recorded Future in a blog post, 2014)
So far it is still up to humans to give numbers a meaning, to interpret what was said in earnings calls, to identify trends and to find exclusive stories or breaking news, AP wrote in its announcement. And it will be the task of a human to add a human touch, by editing robo-written stories or by writing additional pieces or follow-ups. “Isn’t that our whole job: understanding the purpose of any kind of narrative before we do it?” asks Scott Klein of ProPublica rhetorically, and Ian Crouch of The New Yorker wants his readers to see it as a welcome opportunity to be freed up for the more interesting stuff.
Ian Crouch, New Yorker: “Perhaps human beat writers should make the most of this moment of technological innovation. They could leave the recaps to the algorithms, abandon the field as it were, and, newly liberated, put their knowledge and personal relationships to use in more compelling pursuits.”