Al Jazeera Analyzed 6,500 Homepage Images. Here’s What They Learned


Picture Perfect: AJ Labs’ collage of the images its journalists used in news articles in 2017.

Choosing the right image to tell your story is just as important as a good news headline. As 2017 came to a close, we decided to collect every image we had published on our homepage throughout the year and ask ourselves: what types of images did our readers see when they came to our website?

Asking the Right Questions

To analyze both the contents and context of each image, we used Google’s Vision API. This powerful machine-learning model uses Google’s massive database of images to detect faces, landmarks, and everyday objects within an image. You upload any image to the service, and it returns the image’s characteristics in the form of weighted scores.

Uploading an image of US President Donald Trump returns:
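A request like this can be sketched with the google-cloud-vision Python client. This is a minimal illustration, not Al Jazeera's actual script: the live call requires the SDK and Google Cloud service-account credentials, so the import sits inside the function and the ranking helper works on any weighted keyword list.

```python
def detect_labels(path):
    """Send an image file to Google's Vision API; return (description, score) pairs.

    Requires the google-cloud-vision package and GOOGLE_APPLICATION_CREDENTIALS.
    """
    from google.cloud import vision  # imported here so the sketch loads without the SDK

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [(label.description, label.score) for label in response.label_annotations]


def top_labels(pairs, n=3):
    """Sort weighted (keyword, score) pairs and keep the n strongest."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
```

For a photo of a head of state, `top_labels(detect_labels("trump.jpg"))` would surface the highest-weighted keywords the API returned for that frame.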

To capitalize on this technology, we took two main factors into consideration:

What would we want to learn from our 6,500 images?

How can visual machine-learning techniques such as this one be used in the newsroom?

We started our exploration by asking ourselves the following questions:

  1. Which president or public figure appeared the most in our images in 2017?
  2. How many times did we use people’s faces and who were they?
  3. How many photos were of women and how many of men?
  4. How many times did we use photos of protesters?
  5. How often did we reuse the same photograph for another story?
  6. How many times did we use maps as our main image?
  7. What everyday items appeared most in our images?

Of course, we weren’t sure how accurate or granular Google’s Vision API was in analyzing our dataset, so we started with a small sample of images and kept working our way up until we ended up querying and intersecting more than 25,000 records of data.

Technicalities

We used Python for scripting and querying the data and MySQL for storing and sorting the data.

It took around eight hours to run the script and another four hours to perform the SQL queries and analysis.
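The storage-and-query side of that pipeline can be sketched with SQLite standing in for MySQL. The table layout, column names, and records below are our own illustration, not Al Jazeera's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labels (image_id TEXT, keyword TEXT, score REAL)")

# A few hypothetical records, as the analysis script might emit them
rows = [
    ("img_001", "protest", 0.91),
    ("img_001", "crowd", 0.84),
    ("img_002", "map", 0.88),
    ("img_003", "protest", 0.79),
]
conn.executemany("INSERT INTO labels VALUES (?, ?, ?)", rows)

# How many images were labeled 'protest' with reasonable confidence?
count = conn.execute(
    "SELECT COUNT(DISTINCT image_id) FROM labels "
    "WHERE keyword = 'protest' AND score > 0.7"
).fetchone()[0]
print(count)  # 2
```

Each of the questions above reduces to a query of this shape over the 25,000-plus records.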

Preliminary Findings

While Google’s Vision API is regarded as one of the most advanced image-detection platforms, it has its shortcomings. As expected, it doesn’t always correctly identify the objects within the frame. In some cases this margin of error is quite acceptable, but in others it misses the mark entirely.

Knowing this, here are some factors worth considering when using Google’s Vision API:

  • The most useful property for analyzing news images was definitely the “web entities” feature. Web entities returns a weighted keyword list as well as contextual links to stories containing the image. This was often very accurate for detecting well-known people.
  • In cases where people were less known, combining the “web entities” and “label entities” yielded better results.
  • Photos with groups of people didn’t perform very well. In several instances, a large group of refugees in boats wearing life jackets was mislabeled as “fun” with a high degree of certainty.
  • Sometimes important elements in a photo were neglected. For example, a photo of fighters on top of a pickup truck in the desert only returned “vehicle” as a keyword.
  • Hand-drawn images or illustrations performed very poorly.
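Combining the “web entities” and “label entities” results, as the second point suggests, can be as simple as merging the two weighted lists and keeping each keyword’s strongest score. The scores below are made up for illustration, not the API’s actual output:

```python
def merge_keywords(web_entities, labels):
    """Merge two weighted (keyword, score) lists, keeping the higher score per keyword."""
    merged = {}
    for keyword, score in list(web_entities) + list(labels):
        merged[keyword] = max(score, merged.get(keyword, 0.0))
    # Return keywords sorted strongest-first
    return sorted(merged.items(), key=lambda p: p[1], reverse=True)


web = [("Recep Tayyip Erdogan", 0.72), ("president", 0.40)]
labels = [("official", 0.85), ("president", 0.65)]
print(merge_keywords(web, labels))
# [('official', 0.85), ('Recep Tayyip Erdogan', 0.72), ('president', 0.65)]
```

When the web entities alone are weak, the label list often carries the signal, so taking the maximum per keyword lets the stronger source win.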

Group Dynamics: Google’s Vision API didn’t perform that well with hand-drawn images or pictures of groups of people.

Answers to Our Questions

  1. Which president or public figure appeared the most in our images in 2017?
    Trump, followed by Turkish President Recep Tayyip Erdogan and former US Secretary of State Rex Tillerson. Drilling down further, the emotions detected on Trump’s face broke down as 20 percent joy, 0.6 percent anger, 3 percent sorrow, and 2 percent surprise.
  2. How many times did we use people’s faces?
    3,726 times
  3. Did we use more photos of men or of women?
    Unfortunately, we weren’t able to answer this.
  4. How many times did we use photos of protesters?
    414 times
  5. How often did we reuse the same photograph for another story?
    We reused 1,703 images during the past year for news stories.
  6. How many times did we use maps as our main image?
    143 times
  7. What everyday items appeared most in our images?


The list of people who appeared 5 times or more:
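Producing a list like this comes down to counting detected names per image and applying a threshold. The sketch below uses invented data; the real tally ran as a query against the MySQL records:

```python
from collections import Counter


def frequent_people(detections, threshold=5):
    """Count how often each detected person appears; keep those at or above the threshold."""
    counts = Counter(name for _, name in detections)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# (image_id, detected_person) pairs, invented for illustration
detections = [("img_%03d" % i, "Donald Trump") for i in range(7)]
detections += [("img_%03d" % i, "Rex Tillerson") for i in range(5)]
detections += [("img_200", "Emmanuel Macron")]  # below the threshold

print(frequent_people(detections))
# [('Donald Trump', 7), ('Rex Tillerson', 5)]
```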

Final Thoughts

Using image analysis tools on their own means nothing without asking the right questions. To yield any actionable results, these kinds of technologies should ideally be integrated into existing newsroom processes to provide value for both journalists and viewers.

The plan now is to experiment with the following integrations:

  • Tag photo repositories inside our CMS to make it easier for our journalists to find specific images quickly. For example, find all images of Donald Trump next to Emmanuel Macron with a smile on his face.
  • Help journalists find the photo that best matches a story, or, better yet, filter out all the images that should not go with the story.
  • Use Google’s Cloud Video Intelligence API to analyze the contents of live video and extract newsworthy content on the fly.
  • Apply this technology to VR and 360-degree images, where objects in a given scene can be detected.

We believe 2018 will push machine learning forward, and we are excited to develop its applications within the newsroom.

Those were our questions. If you were to analyze the same set of photos, what questions would you ask?



This post first appeared on AJ Labs’ Medium page and is cross-posted here with permission.
