Editor’s Note: A team of Harvard Law School researchers recently studied the 2.3 million hyperlinks appearing on The New York Times website between 1996 and 2019. They found that one-quarter of all linked sources were now completely inaccessible and that more than half (53 percent) of all articles that contained links suffered from at least one example of so-called “link rot.” It’s issues like this that underscore the need for reports such as the Donald W. Reynolds Journalism Institute’s new “The State of Digital News Preservation,” which we’ve excerpted below.
It’s no headline that newsrooms across the country are struggling to survive, battered by multiple economic forces, the manic march of digital competition and technology, the storm of political attacks, and in 2020, the sudden repercussions of a pandemic predator.
While these are well known across the news industry, one little-recognized, unlisted casualty of the struggle for newsroom survival is the impact on an irreplaceable resource that citizens around the world rely on: the public record of their communities as recorded by their local newspaper, radio, TV station, online newsroom, or other news outlet.
What if that record is going away? What if significant parts of that information stream, digital content especially, are getting lost, erased, chewed up by the machinery of technology, untended in the financial struggle and increasingly allowed to digitally decay, to get disconnected from its various components, to disintegrate? What if, because of the mind-boggling complexity of modern digital publishing systems, our first draft of history is dissolving?
That’s the unfortunate fact of what’s happening right now in newsrooms across the country. Quietly, in the background of the news industry’s public struggles, there is a nearly invisible but dramatic decline in efforts to preserve our daily news.
As the Donald W. Reynolds Journalism Institute’s new report “The State of Digital News Preservation” makes clear, there are several steps any newsroom can take right now, with no financial investment, to address the problem of digital news preservation. You may not be able to do everything you want to do right away, but getting started now is the best path forward toward solving these issues.
Here are tips from the researchers on what your newsroom can do right now.
1. Create a preservation policy, replace happenstance
We recommend that every news organization institute a written policy for news preservation. This need not be a long document; a page or two will suffice as a start. Outline the goals of the policy, including the content you want to preserve, for whom the content is being preserved, and what purposes you expect it to be used for (internal news research, for example). Once that’s in place, you may also want to specify other details, such as file formats, who should have access, and guidelines on what kinds of changes can be made after content is published, who can make them, and what permissions are needed, if any. For example, you should include your policy on unpublishing news (see below), as this can directly affect your saved content.
You may also want to involve other parts of the organization beyond the newsroom, especially technology staff, along with those who develop new market opportunities from improved content archives. For technology staff, our research shows that having a written policy is a major step in the preservation process. If your newsroom is like most of the ones we interviewed, your technology staff likely have no preservation guidelines to follow and would welcome them. Once the policy is written, it’s important to communicate it to everyone in the organization and arrange a mechanism to update and re-communicate any changes as the policy develops and improves over time.
2. Tap someone to be responsible for preservation
It’s important to tap someone in your newsroom to begin taking responsibility for content preservation. Most likely your newsroom no longer has a news librarian, or perhaps never did. Start the process now by assigning someone in your newsroom responsibility for beginning the process to ensure long-term preservation of your digital news content. Even if this is only part of their role to begin with, don’t wait. There’s a long road ahead that will take many months if not years, but this step is an essential prerequisite, and it cannot begin too soon. It won’t happen at all unless there’s someone to take ownership.
Ideally, it would be best to hire media archivists for this work, who could assist in creating digital news content preservation plans, manage metadata and workflow, and much more. Our study shows that organizations with dedicated archival staff, even part time, are doing better in preserving digital news content.
3. Review metadata to ensure you have what’s needed
Knowing and understanding your metadata is a critical step in improving content preservation for your unique, original content. A review will show what steps you can take to ensure you have clear metadata on content objects, including origin and ownership. This may call for changes in your existing technology, especially configuration changes to add or modify metadata fields needed with each type of content.
For example, one trend we noticed is that news content assignment functions are commonly done off-platform, not in any CMS but in tools such as Google Docs or Google Sheets, where the information gets overwritten or discarded over time. But this information is a gold mine for preservation because it shows the intentions and planning behind the news, which help identify the purpose, time and date, and individuals involved in creating that story, photo, or video.
If possible, work with your tech vendors to find a way to feed that info into metadata fields for your content, so it’s always there, helping to ensure that you know where that photo or video came from, when it was shot, who did it, and what the story was that generated that assignment. This is priceless information that most news organizations don’t take advantage of. And if possible, be sure to also do this with data that’s not in your production or publishing systems: the still photo outtakes, the original raw video, and audio files. This one step can go a very long way toward ensuring that you know the full story behind all the content saved over time.
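As a rough illustration of the idea above, the sketch below copies assignment details into a content item’s metadata. It is only a minimal sketch under stated assumptions: the `Assignment` and `ContentRecord` classes, their field names, and the `attach_assignment` helper are all hypothetical, and a real CMS or vendor integration would define its own fields and import mechanism.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Assignment:
    """One row from a hypothetical assignment sheet (for example, a spreadsheet export)."""
    assignment_id: str
    slug: str
    assigned_to: str
    planned_for: str  # ISO 8601 date/time string
    purpose: str


@dataclass
class ContentRecord:
    """A hypothetical content item as it might be exported from a CMS."""
    content_id: str
    content_type: str  # "story", "photo", "video", or "audio"
    assignment_id: str
    metadata: dict = field(default_factory=dict)


def attach_assignment(record, assignments):
    """Copy assignment details into the record's metadata when a match exists.

    `assignments` maps assignment_id -> Assignment. Unmatched records are
    flagged rather than silently skipped, so gaps can be reviewed later.
    """
    assignment = assignments.get(record.assignment_id)
    if assignment is None:
        record.metadata["assignment_status"] = "unmatched"
        return record
    record.metadata["assignment"] = asdict(assignment)
    record.metadata["assignment_status"] = "matched"
    return record
```

The same function could be run over outtakes, raw video, and audio files that never reach the publishing system, as long as each file can be tied to an assignment identifier.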
As part of this, check to make sure your CMS is utilizing one of the most common metadata standards in the industry: the IPTC photo and video metadata that’s created by cameras from the moment images are captured, including timestamps and geolocation data. Typically, the IPTC fields are supplemented with captions and other information as image data goes through a news publishing workflow of editing and packaging before it goes live.
But we also learned that some web CMS systems either do not read the IPTC fields when images are imported or accept only a fraction of what is there. If that is your only system, this could mean your primary photo file will be missing critical metadata. It’s okay if this data is stripped out in rendering the HTML for the public web page. But take the time to understand this process for your newsroom, to make sure that somewhere along the line the original image files with full metadata are stored.
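One way to see whether metadata survives a trip through your systems is to compare an original JPEG with the copy your CMS stores. The sketch below is a minimal illustration using only the Python standard library: it scans a JPEG’s header segments for the containers where embedded metadata conventionally lives (EXIF and XMP in APP1 segments, and the Photoshop block in APP13 that usually carries IPTC fields). The function names are hypothetical, and the check only detects whether a container is present, not whether every field inside it was kept.

```python
import struct


def metadata_segments(jpeg_bytes):
    """Return the set of metadata containers found before the image data starts."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")
    found = set()
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            break  # malformed header; stop scanning
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.add("EXIF")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.add("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            found.add("APP13 (Photoshop/IPTC)")
        pos += 2 + length
    return found


def dropped_metadata(original_bytes, derivative_bytes):
    """Containers present in the original file but absent from the derivative."""
    return metadata_segments(original_bytes) - metadata_segments(derivative_bytes)
```

To use it, read both files in binary mode and pass their contents to `dropped_metadata`. A non-empty result for the copy held in your only system of record would suggest the original file needs to be stored somewhere else as well.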
4. Establish a plan for handling “unpublishing” requests
Consider your newsroom’s plans for managing requests to remove, de-index, or otherwise alter published content, a process sometimes called unpublishing news. With growing concerns about internet privacy, the proliferation of reputation management services, and emerging legal restrictions such as the European Union’s data privacy legislation, this is a fairly urgent and rapidly growing area of concern for every news organization.
It’s not a question of whether this is happening. Chances are it has already happened. It’s typical that many people in newsroom or technology staffs could potentially unpublish without managers’ knowledge. The only way to control it is to have a plan and clearly communicate it to your entire organization. Communicating your policy publicly will support transparency and accountability.
5. Clarify content ownership in CMS, other systems
Clarify ownership and licensing for the content you publish now. While ownership information on past content may be difficult to establish or correct after-the-fact, there is no reason to wait to address any ownership uncertainties around content your newsroom and freelancers are currently creating and publishing.
We suggest you do some research on your company’s existing policies, contracts, and service agreements regarding any of the content you publish at present. For example, is there a clear expectation on who owns still photos or videos shot by your staff photographers? How about rights to news stories, graphics, audio, and video from interviews or database journalism projects? There’s no single answer to these questions. What matters is to ensure clarity for all involved.
This is especially an issue with content created by freelance news staff. It’s important to be aware of the implications of the U.S. Supreme Court’s 2001 decision in New York Times Co. v. Tasini, for example, which held that publishers could not republish freelancers’ articles in electronic databases without the authors’ permission. It’s critical that you take steps to ensure your newsroom is following the guidelines that came out of that case. And if you already have a clear policy or guidelines, they may still be worth updating to ensure they cover today’s digital realities, including all content types (reporters who shoot video, for example), social media, and others.
6. Run a self-assessment test on metadata, ownership, workflows
To check whether you may have issues to address related to the factors above, we suggest you conduct a simple self-assessment of how well your news organization is doing in ensuring preservation, identification, and ownership of the unique content it creates.
Here are some sample questions to ask, and additional resources on this process:
• Self-test: Here’s a test you can do any day to clarify what’s going on in your news workflows and systems. Check whether you can identify, locate and retrieve the original version of every still photo and video your newsroom published today. Not the multiple copies that may show up in your CMS, the resized and resampled versions, but the high-resolution originals. Can you do that? And if so, does that content have sufficient metadata to clearly show your company owns that image, that video?
• Going further on this, does your CMS include fields that clearly show whether content is staff-generated or comes from a freelance reporter or photographer, or some other third-party source? If those fields are not available, consider taking steps to get them added, visible, and editable in your content management systems, and add steps in the news workflow to ensure the fields are populated.
• Check to find out what news content is being saved, for how long, and what content is not being saved or is purged after a period of time. In today’s newsrooms this is often a function left to technology staff, who in many cases have little to no guidelines to follow. In the absence of any other guidance, tech staff may default to deleting content or purging version stacks as needed simply to maintain system performance, a practice that can inadvertently work against good preservation practices.
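The self-test described in the questions above could be sketched as a small script run against a daily export of published photos and videos. This is a minimal sketch, not a finished tool: the export format (a list of dictionaries), the `original_path` key, and the field names in `REQUIRED_FIELDS` are all assumptions standing in for whatever your own CMS provides.

```python
from pathlib import Path

# Hypothetical metadata fields that identify origin and ownership.
REQUIRED_FIELDS = ("creator", "source_type", "rights_holder")


def audit_item(item, archive_root):
    """Return a list of problems found for one published photo or video."""
    problems = []
    original = item.get("original_path")
    if not original:
        problems.append("no original file recorded")
    elif not (Path(archive_root) / original).is_file():
        problems.append("original file not found in archive")
    for name in REQUIRED_FIELDS:
        if not item.get(name):
            problems.append(f"missing metadata field: {name}")
    return problems


def audit_published(items, archive_root):
    """Map content_id -> problems for every item that fails the self-test."""
    report = {}
    for item in items:
        problems = audit_item(item, archive_root)
        if problems:
            report[item["content_id"]] = problems
    return report
```

An empty report for a given day would mean every published item has a retrievable original and the ownership fields filled in. Anything else is a list of specific gaps to raise with newsroom and technology staff.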
Finding a way to preserve copies of the original full resolution, full color-depth, full-size news photographs, and news videos, with full original metadata, is one of the best steps you can take for long-term preservation of this critical, irreplaceable content. It’s great if you can get these files into a system with a database and redundant storage and fully indexed content. But even a simple system of drives can work to get this process started. Note: it’s okay to save images in JPEG format rather than camera-raw image files, which are far larger, as JPEG is an accepted standard.
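For a simple system of drives, one common safeguard is to record a checksum for every saved file so that missing or silently altered files can be detected later. The report excerpt does not prescribe this technique; the sketch below is offered as an assumption about how a newsroom might get started, using only the Python standard library. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading in chunks to handle large video files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Report files that are missing or whose contents changed since the manifest was built."""
    current = build_manifest(root)
    missing = sorted(name for name in manifest if name not in current)
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return {"missing": missing, "changed": changed}
```

The manifest can be saved as a small text or JSON file kept separately from the archive folder, then re-checked on a schedule or whenever files are copied to a new drive.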
Digital News as Historic Record
Humankind can benefit greatly from the lessons of history, but only if a clear, accurate record is passed from one generation to the next. If a deadly pandemic strikes again in 100 years, will journalists, historians, and researchers in the year 2121 have the chance to benefit from our experience today? The answer to that question is not obvious. There currently is no clear pathway from the systems holding born-digital news content today to some version of publicly accessible archives of the future. It does seem apparent from this study that, despite many risks and challenges, contemporary news content is still valued enough to not be deleted. The window of opportunity for long-term preservation is still open for a great deal of born-digital news content, but that opportunity will not last indefinitely.
This post is an edited excerpt of the new report “The State of Digital News Preservation” published by the Donald W. Reynolds Journalism Institute at the University of Missouri. It is republished here under a Creative Commons Attribution License.
Edward McCain is the digital curator for journalism at the University of Missouri’s Reynolds Journalism Institute. He co-authored this report with Neil Mara, Kara Van Malssen, Dorothy Carner, Bernard Reilly, Kerri Willette, Sandy Schiefer, Joe Askins, and Sarah Buchanan.