Early in 2021, thousands of selfie videos from China’s Xinjiang region flooded social media, apparently showing ordinary Uighur men and women strongly denying what Western human rights groups and investigative journalists had already found: that the Uighur Muslim minority was suffering vicious government repression and forced labor there.
What could seem more credible than hairdressers, car mechanics, and clothing store clerks from the victimized group, personally and seemingly spontaneously scolding Western journalists and diplomats for exaggerating their plight?
That perceived credibility, it turns out, was the whole point of the more than 5,000 videos and their bot-powered online amplification, according to a four-month-long investigation by reporters at ProPublica and The New York Times. The team found the videos actually represented a coordinated state influence campaign that was carefully boosted on Western platforms, and likely re-exploited many of the individuals in those videos. Their statements were scripted. At least 600 used the Uighur phrase for “complete nonsense,” and hundreds more used phrases like “born and raised” and “we are very free.”
The reporting team warned of a possible authoritarian trend toward more of these mass disinformation campaigns using “ordinary people” giving scripted or coerced testimonials. For instance, could “ordinary Ukrainians” soon be seen claiming to welcome Russia’s invasion, or Russian military families insisting they support the war?
The Anatomy of a Selfie Disinfo Campaign
In a session on the topic at the NICAR22 data journalism conference, organized by Investigative Reporters & Editors, the Xinjiang project’s lead data reporters, ProPublica’s Jeff Kao and The New York Times’ Aaron Krolik, described the tools and techniques they used to untangle the multi-layered propaganda scheme. The same team, which includes reporters Raymond Zhong and Paul Mozur, also produced the “How China Censored the Coronavirus” exposé, which formed part of a broader entry that won a public service Pulitzer Prize in 2021.
Their Xinjiang story described the campaign as “one of China’s most elaborate efforts to shape global opinion,” and its use of testimonials as indicative of a “savvier” pivot to credible-sounding disinformation.
While denying human rights abuses of Uighurs was the common theme, Kao said the team found two disinformation subjects in the campaign (the first two points below) and three stages to its operation (the last three points):
- It first involved videos denouncing a statement made by outgoing US Secretary of State Mike Pompeo on January 19, 2021, which described the repression of Uighurs as “genocide.”
- A second vast batch of videos, starting in March, denounced Western clothing retailers for boycotting Xinjiang’s cotton industry out of concerns over forced labor.
- The campaign operation began by soliciting scripted video messages from targeted individuals for a local Communist Party-controlled news app, called Pomegranate Cloud, where Chinese-language subtitles were added.
- Government agents then polished the clips with English subtitles and codes to evade Western spam filters.
- The network then amplified them on Twitter and YouTube with bots and Twitter accounts that Kao refers to as “warehouse accounts,” which are set up purely to host and boost videos. The investigation flagged 300 of these accounts on Twitter.
The reporters revealed that two advanced tools were central to their investigation: the paid-for Google Cloud Vision (GCV) image-labeling tool, and youtube-dl, a free, open-source, command-line download manager which, they stressed, had become a stronger tool for journalists in the past year. They also used advanced tools and coding techniques to reverse-engineer the closed-source Pomegranate Cloud Android app so that they could search the original clips.
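As a rough illustration of how youtube-dl can be scripted for bulk archiving, the sketch below builds a command line that downloads every URL listed in a text file and saves each clip's metadata alongside it. The file name, folder name, and output template are assumptions made for this example; the reporters did not describe their exact invocation.

```python
import shlex
from pathlib import Path


def build_download_command(url_file, out_dir):
    """Return a youtube-dl argument list that archives every URL in url_file."""
    out_dir = Path(out_dir)
    return [
        "youtube-dl",
        "--batch-file", str(url_file),  # text file with one video URL per line
        "--output", str(out_dir / "%(uploader)s_%(id)s.%(ext)s"),
        "--write-info-json",  # save uploader, title and upload date as JSON
        "--download-archive", str(out_dir / "downloaded.txt"),  # skip clips already fetched
        "--ignore-errors",  # keep going when a clip has been removed
    ]


# Hypothetical inputs: a list of suspect URLs and a local archive folder.
cmd = build_download_command("suspect_urls.txt", "archive")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires youtube-dl to be installed, plus network access):
# import subprocess
# subprocess.run(cmd, check=False)
```

Keeping the metadata JSON next to each video means names and posting dates remain available even if the original post is later deleted or made private.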
Traditional Reporting Can Also Expose Selfie Propaganda
One important takeaway from the session was that a smaller but still effective version of the project could have been done manually — simply applying a set of analysis principles (see below) and clicking back and forth between YouTube and Twitter channels, with no need for automated scraping or coding skills.
Krolik said the volume of videos was the difference. Their team was forced to use automated tools because it chose to process and analyze all 5,000 unique and duplicate videos in the campaign, rather than an early sample of around 200 videos, which could have been analyzed manually and would have been sufficient to show coordination.
Traditional reporting, they say, can also reveal the campaign organizers.
For instance, reporters can simply try calling the people in the testimonials. In one case, a project reporter called the subject of one video whose used car dealership appeared in the background, and the man readily acknowledged that local government officials had produced his video. “Why don’t you ask the head of the propaganda department?” the man added, and even provided the official’s phone number.
After much experimenting with “fingerprinting” videos by their background features, the investigation ultimately succeeded, Krolik said, by transcribing the video subtitles, and finding telltale patterns in those transcriptions.
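One simple way to look for telltale patterns in transcriptions is to count the phrases that recur across many separate videos. The sketch below is an illustration of that idea, not the team's actual code. The sample transcripts are invented English stand-ins, and splitting on spaces is an assumption that would not hold for Chinese subtitles, which would need a word segmenter or character-level n-grams.

```python
from collections import Counter


def shared_phrases(transcripts, n=3, min_videos=2):
    """Map each n-word phrase to the number of distinct transcripts containing it."""
    counts = Counter()
    for text in transcripts:
        words = text.lower().split()
        # Use a set so a phrase repeated within one video is counted only once.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return {gram: c for gram, c in counts.items() if c >= min_videos}


# Invented example transcripts (not real campaign data).
transcripts = [
    "i was born and raised here and we are very free",
    "pompeo is talking complete nonsense we are very free",
    "my family was born and raised in this town",
]

for phrase, n_videos in sorted(shared_phrases(transcripts).items()):
    print(f"{n_videos} videos: {phrase}")
```

Phrases that turn up in a large share of supposedly spontaneous videos are candidates for scripted language, and a reporter who knows the language can then check them by hand.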
He said Twitter quickly took down the campaign accounts after the team flagged their URLs to the platform, weeks ahead of the story’s publication. Krolik suggested that reporters can treat the mere fact that platforms – which have unique technical insights about their accounts – remove the videos and accounts they highlight as additional confirmation of their toxicity.
Signs of a State-Backed “Selfie” Disinformation Campaign
- Testimonials using scripted language, or nearly identical word sequences.
- Slick, local-language videos duplicated across channels, with subtitles in English on Western platforms.
- Suspiciously similar backdrops. (Kao and Krolik never found different videos shot in the same store — but they did find that clothes racks and the walls of clothing stores, for instance, were a preferred campaign backdrop for solicited testimonials.)
- A short string of strange symbols or characters at the end of tweets and retweets, such as parentheses and percentage signs. For instance, many of the Xinjiang campaign tweets ended with five seemingly meaningless Chinese characters intended to confuse spam filters, which emerged as a core signifier of the campaign.
- A sudden deluge of similar testimonial content.
- Same-topic video clips of roughly the same duration — like 80 to 90 seconds.
- Video backgrounds that “happen” to show propaganda slogans. At least one Xinjiang testimonial took place in front of a banner that read “Happiness comes out of working” — a slogan, Krolik noted, with disturbing echoes of Nazi-era propaganda.
- Unbelievable statements — for instance, from a businessperson or an academic well-placed to know facts about a pressing issue documented by the media or the community, but who denies the existence of the issue outright.
- Twitter accounts that feature many videos on the same topic, but only a few followers. These could be “warehouse accounts” dedicated to spreading propaganda. “Ask yourself: ‘Where is this account with only five followers getting all these videos from?’” said Kao.
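The trailing-characters sign in the list above lends itself to a quick automated check. The sketch below flags tweets whose final token is a short run of symbols or non-Latin characters, and counts flagged tweets per account. It assumes the tweets are otherwise written in English, the length cap of eight characters is arbitrary, and the sample tweets and account names are invented. It will also flag innocent tweets that end in an emoji, so treat hits as leads to review rather than proof.

```python
import re
from collections import Counter


def trailing_noise(tweet, max_len=8):
    """Return the last token if it looks like filler meant to dodge spam filters."""
    tokens = tweet.split()
    if not tokens:
        return None
    last = tokens[-1]
    # Hashtags, mentions and links are normal ways to end a tweet.
    if last.startswith(("#", "@", "http")):
        return None
    # Ordinary English words and numbers contain ASCII letters or digits.
    if re.search(r"[A-Za-z0-9]", last) or len(last) > max_len:
        return None
    return last


def flagged_per_account(tweets):
    """Count tweets with suspicious trailing tokens for each account."""
    counts = Counter()
    for account, text in tweets:
        if trailing_noise(text) is not None:
            counts[account] += 1
    return counts


# Invented sample data (account names and text are hypothetical).
tweets = [
    ("acct_a", "Pompeo's claims are complete nonsense https://example.com/v1 )%("),
    ("acct_b", "We are very free in Xinjiang 们以我到他"),
    ("acct_c", "Lovely weather in Kashgar today."),
    ("acct_a", "Xinjiang cotton is the best #Xinjiang"),
]

print(flagged_per_account(tweets))
```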
“Just scrolling Twitter and browsing a lot helps,” Kao concluded.
But Kao stressed that language skills and cultural knowledge are crucial for debunking coordinated testimonials. “I would never do an investigation like this without someone who speaks the language and understands the context in the videos,” said Kao. “Otherwise, you could make yourself look silly — saying, ‘This is a propaganda video,’ and it turns out to be just an ad for snacks.”
Krolik said the team initially looked for videos shot in the same location, or professional actors saying the same thing from different locations, to match traditional disinformation patterns, and found neither. Instead, they found a much more ambitious propaganda project, in which thousands of truly ordinary people were individually enlisted to produce unique, scripted videos, which were then polished and amplified on Western platforms.
Tips for Investigating Selfie Propaganda Campaigns
The ProPublica/New York Times team recommended more than a dozen steps for digging into similar testimonial propaganda campaigns:
- Having identified suspicious videos, “crawl” the accounts and channels hosting them — and compare real-time notes on any breaking, viral campaign with your team on group chats.
- Enlist the help of team partners who know the language and cultural context in the videos.
- “Click around” between platforms, and pin clues like strange character strings, identical content, and similar posting times between accounts as scraping filters.
- Download suspicious videos immediately, as they are often quickly removed by the account owner or the platform, or made private. Try youtube-dl if you have basic command-line skills because, they say, this tool now has powerful scraping features. (Other reporters have recommended user-friendly download sites like twittervideodownloader.com that require no coding skills, but caution that using third-party sites can raise issues under platforms’ rights policies.)
- Establish the scale of the campaign: How many unique videos were posted and reposted on how many platforms, and in what time frame?
- If you gain access to a non-public app involved, like Pomegranate Cloud, look for sections dedicated to the campaign. The Xinjiang project team was surprised to find that Pomegranate had a section devoted to the cotton industry disinformation. Kao noted that apps like this often do not require strong authentication or logins, and that keys in the app can give data reporters access points to create their own scrapers.
- Explore the metadata on videos for names and posting dates.
- Convert videos into sample image frames, using GCV or the conversion tools you know best.
- “Fingerprint” video frames, especially from those that are too numerous to examine manually, with optical character recognition (OCR) tools like Google Cloud Vision. Krolik explained that this AI-powered tool uses OCR to efficiently annotate videos, and convert their features into searchable text. Another tip: complete this fingerprinting over a short time period, as tools like this, Krolik noted, could update their algorithm during your process, which could subtly change the next batch of results.
- Cluster videos according to their fingerprint features. “We didn’t have the time to watch 5,000 two-minute videos,” Kao explained. “So clustering by feature is helpful.” In this project, Krolik used the GCV tool to send Kao files that listed features like “hair salon,” “outdoors with trees,” “someone in front of clothes,” or “there’s a market.” Kao then turned that text into usable data and processed it with advanced set-comparison algorithms such as MinHash and locality-sensitive hashing (LSH).
- Create and analyze transcripts of all videos with subtitles, using any manual technique or software that works for you. (This project used a suite of tools including FFmpeg, ImageMagick, and Google’s Detect Text.)
- For automated projects: crop frames around subtitles, and stitch crops together in a vertical stack. Tip: since some OCR tools charge per image, Krolik suggested creating a very tall stack of cropped subtitle images as a single composite image to submit for processing, to save money.
- Exclude possible non-campaign videos with a keyword filter. Identify one or two keywords central to the campaign — like “Pompeo” and “cotton” in the Xinjiang project — and exclude videos that mention neither, using manual search or OCR tools.
- Share the campaign URLs with the relevant social media platforms prior to publication, and note which accounts they remove.
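The clustering step above mentions MinHash and LSH without detail, so the following is only a minimal pure-Python sketch of how those two techniques can group videos by their label sets. It is not the team's implementation. The clip names, labels, number of hash functions, and number of bands are all assumptions chosen for illustration, and every label set is assumed to be non-empty.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(token, seed):
    """Deterministic 64-bit hash of a label, varied by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(labels, num_hashes=64):
    """Keep the minimum hash value of the set under each seeded hash function."""
    return [min(_hash(label, seed) for label in labels) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The share of matching positions approximates the sets' Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Return pairs of video ids whose signatures agree on at least one whole band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for video_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(video_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Invented labels of the kind an image-labeling tool might return for each clip.
video_labels = {
    "clip_001": {"clothing", "clothes rack", "retail", "woman", "indoor"},
    "clip_002": {"clothing", "clothes rack", "retail", "man", "indoor"},
    "clip_003": {"tree", "outdoors", "market", "street"},
}

signatures = {vid: minhash_signature(labels) for vid, labels in video_labels.items()}
for a, b in sorted(lsh_candidate_pairs(signatures)):
    print(a, b, round(estimate_jaccard(signatures[a], signatures[b]), 2))
```

The banding step is what makes this practical at scale: only clips that land in the same bucket are compared, so thousands of videos do not need to be checked pair by pair.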
“When you see something weird in a video, make a mental model of what could be behind all of this,” Kao advised. “Sure, it could just be some happy villager, but when you see a pattern, a new model emerges.”
He added: “We thought it was so curious there were so many videos with such similar content. It would have been almost funny, if it wasn’t so disturbing.”
Rowan Philp is a reporter for GIJN. Rowan was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.