Introducing FBarchive: A New, Searchable Repository of Facebook Whistleblower Documents

by Clark Merrefield, The Journalist's Resource • October 20, 2023

In September 2021, the Wall Street Journal began publishing a series of articles exposing the inner workings of Facebook and subsidiaries such as Instagram, including evidence that company insiders knew Instagram made teen girls’ body image issues worse and that Facebook leaders did little to curb recruitment activities of human traffickers and drug cartels.

Much of that reporting was based on a trove of documents and images leaked by former Facebook product manager Frances Haugen, who came forward publicly several weeks after the series was published.

(In October 2021, Facebook Inc., the parent company, was rebranded as Meta Platforms Inc., an effort at least six months in the making that some commentators in the news media noted might have had the effect of blunting public backlash following Haugen’s leaks.)

In November 2021, Harvard Kennedy School’s Public Interest Tech Lab received an anonymous drop of information from the Haugen leak, comprising roughly 20,000 images and more than 800 internal Facebook documents, such as chat threads and research, starting from 2016.

As of October 18, 2023, that information is available to the public, in a searchable format, via an online tool called FBarchive. Users need to register for a free account to access the archive.

FBarchive is designed to help researchers, journalists, and policymakers understand how, why and when decisions have been made at some of the most influential social media platforms in the world. The project is led by Latanya Sweeney, a technology professor at Harvard who heads the Public Interest Tech Lab.

Sweeney says making these internal deliberations and thought processes public will help policymakers and technology researchers discover solutions to the problem of moderating content on social media platforms that billions of people use.

“We just don’t know how to do moderation at scale — we don’t have the technology, we don’t have the know-how — and that’s something that’s true on all of these platforms where we try to do moderation,” says Sweeney, a pioneer in the field of data privacy. “So, the question is, how should we do that? Can we look at these documents to see where the fault lines are and inspire new technologies, or new technological approaches?”

How to Use FBarchive

Go to fbarchive.org and hit “Enter.” This will bring you to a sign-in page. First-time visitors will receive directions to sign up for a new account via the Public Interest Tech Lab’s MyDataCan platform. Harvard-affiliated users can sign in with their university ID. All other users can click “sign up” to create a username and password.

The primary gateway to the FBarchive materials is a Boolean search bar, meaning that operators such as “and,” “or,” and “not” will either broaden or restrict results. Anyone who wants to view a document in FBarchive needs to be logged in.

The search bar is useful for researchers and reporters who already have some focus on what they are interested in — for example, specific keywords or phrases related to body image, gender issues or global conflicts. Journalists and researchers can also get a general sense of what is in the archive by using broader terms — “drug cartels” or “human trafficking,” for example.
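For readers new to Boolean searching, the broadening and narrowing effect of the operators can be sketched in a few lines of Python. This is only an illustration of the general idea: the sample documents, the `DOCS` dictionary and the `matching` helper are invented here, and nothing below reflects how FBarchive's own search is built or what query syntax it accepts.

```python
# Toy illustration of Boolean search logic.
# The documents and names below are hypothetical, not FBarchive data or code.
DOCS = {
    1: "research note on teen body image and instagram",
    2: "chat thread about drug cartels recruitment",
    3: "report on human trafficking and drug cartels",
    4: "body image survey results",
}


def matching(term: str) -> set:
    """Return the IDs of documents whose text contains the term."""
    return {doc_id for doc_id, text in DOCS.items() if term.lower() in text.lower()}


# "and" narrows: a document must contain both terms.
narrowed = matching("body image") & matching("instagram")

# "or" broadens: a document may contain either term.
broadened = matching("body image") | matching("drug cartels")

# "not" excludes: drop documents that contain the second term.
excluded = matching("drug cartels") - matching("human trafficking")

print(sorted(narrowed))   # [1]
print(sorted(broadened))  # [1, 2, 3, 4]
print(sorted(excluded))   # [2]
```

In practice, this means starting with a broad "or" query to see what exists, then adding "and" or "not" terms to cut the results down to a manageable set.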

Users can also search for information about particular people, such as executives at Meta. The FBarchive team redacted names of people who likely have an expectation of privacy — a software engineer outside of top management, for example. Names of public figures, such as C-suite executives, politicians and celebrities, are not redacted.

To help users understand what they’re reading, Sweeney and her team created a glossary of terms and phrases found in the documents. The “audience problem,” for example, is “a term used internally to describe the years-long trend of declining post numbers on Facebook,” according to the glossary.

“There’s a lot of inside Facebook language in there,” Sweeney says.

When using FBarchive, click the book icon, circled above, to see the glossary. Image: Screenshot, FBarchive

Users can bookmark particular documents and images, and create their own tags, which can be used to curate collections of images and documents. For example, a journalist reporting on how social media affects body image could collect relevant images and documents by adding a “bodyimage” tag to them.

To search for a specific topic, enter a phrase and then click the plus button to create a tag and apply it to a document you’re viewing. Image: Screenshot, FBarchive
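Reporters who export their notes or keep a parallel spreadsheet may find it useful to think of tags as a simple lookup from a label to a set of items. The sketch below models that idea. The `TagIndex` class and the item IDs such as `doc-012` are hypothetical, and the code does not describe how FBarchive stores tags internally.

```python
from collections import defaultdict


class TagIndex:
    """Minimal tag-to-items lookup (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps each tag to the set of item IDs carrying it.
        self._items_by_tag = defaultdict(set)

    def add(self, tag: str, item_id: str) -> None:
        """Apply a tag to a document or image."""
        self._items_by_tag[tag.lower()].add(item_id)

    def collection(self, tag: str) -> list:
        """Return every item carrying the tag, in sorted order."""
        return sorted(self._items_by_tag.get(tag.lower(), set()))


index = TagIndex()
index.add("bodyimage", "doc-012")    # invented document ID
index.add("bodyimage", "img-340")    # invented image ID
index.add("moderation", "doc-012")   # one item can carry several tags

print(index.collection("bodyimage"))  # ['doc-012', 'img-340']
```

Because one item can carry several tags, the same document can sit in more than one collection without being duplicated.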

FBarchive Story Ideas and Research Angles

The FBarchive is full of unexplored investigative story ideas and scholarly research topics. To get you started, Sweeney has offered questions needing more journalistic and academic attention, including the following, among others:

Is viral content more likely to increase Facebook’s revenues? How does Facebook handle this tension? Under what circumstances are the needs of human users traded for corporate revenue?
At least 95 countries are identified in the Facebook documents. What are the top concerns Facebook considers for people in these countries on the platform? Are the concerns and the way Facebook addresses them similar or different across countries?
Violence and political unrest exist around the world and are evidenced within the Facebook documents. What is the nature and extent of the role that Facebook itself plays in the proliferation of these tensions, if any? What role could Facebook play to help reduce these tensions?

Informing Future Regulation

The stories and studies prompted by the archive, along with the content of the archive itself, could inform potential regulation.

For legislators and officials interested in regulating tech, trying to understand how Facebook functions has, so far, been like trying to see what’s going on in a “black box,” Sweeney says.

She likens FBarchive to taking an opaque case off an overheating radio and replacing it with a clear one. Everyone can now see the hot spots inside causing problems.

“I just don’t think policymakers have ever had the opportunity to understand where real leverage points were,” Sweeney says. “They always had to depend on what the tech companies themselves said was possible, not possible. And seeing the inside content gives you a much better sense of, how does this really operate?”

This post was originally published by The Journalist’s Resource and is reprinted here with permission.

Clark Merrefield joined The Journalist’s Resource in 2019 after working as a reporter for Newsweek and The Daily Beast, as a researcher and editor on three books related to the Great Recession, and as a federal government communications strategist. He has been selected for fellowships in juvenile justice and solitary confinement at the John Jay College of Criminal Justice, and his work has been recognized by Investigative Reporters and Editors.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License
