Tips for Investigating Hate Crimes and Violence When Government Data Sources Fail

by Rachel Chitra • February 10, 2022

The National Crime Records Bureau (NCRB) in India, which has been tracking and publishing the country’s crime statistics since the 1980s, stopped tracking religious killings and farmer suicides in 2017. This leaves no way of checking on whether either of these is trending upwards, although the frequency of reports in the media would suggest, anecdotally, that they are.

[Screenshot: Hindustan Times Hate Tracker, pulled down]

Similarly, there is no official tracking of hate crimes. In 2017, the Hindustan Times launched a hate tracker to document victims, but the tracker was taken down under government pressure, and editor Bobby Ghosh was asked to step down. A similar fate met the Hate Tracker created by the website IndiaSpend, whose editor, Samar Halarnkar, also resigned.

In my fellowship project at the Reuters Institute for the Study of Journalism, I worked under the guidance of communication researcher Dr. Sílvia Majó-Vázquez to learn key lessons about how journalists can reliably step in to gather, clean, and publish data when the government fails to do so.

Sourcing the Data

First, I defined what would count as a hate crime: criminal acts committed with a bias motive in relation to a group characteristic of the victim such as race, ethnic background, religion, gender, physical or mental disability, or sexual orientation. Then I scoured English-language news media for reports of hate crimes between January 1, 2014, and December 31, 2020, copying links and noting details about each attack into a spreadsheet.

I excluded riots, because I did not have the time or resources to catalogue these fully. I excluded social media reports, because I didn’t have the resources to independently verify them. Finally, I excluded regional-language news outlets because I didn’t have the resources to accurately translate all of them.

The result is a Google Sheet with 212 incidents of hate crime reported in English-language media. Wherever possible, I have catalogued details such as the date, type of violence, gender, caste, and socio-economic details of victims and perpetrators, as well as their religion, politics, and the police response. I then analyzed the data for patterns and trends. You can download the full report for my findings.
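For readers who would rather enforce inclusion rules like these in code than by hand, the sketch below shows one way to express them in Python. It is only an illustration: the `Report` fields, the `"en"` and `"news"` codes, and the `in_scope` helper are hypothetical names chosen here, not the structure of the actual Google Sheet.

```python
from dataclasses import dataclass
from datetime import date

# Study window taken from the text: January 1, 2014 to December 31, 2020.
STUDY_START = date(2014, 1, 1)
STUDY_END = date(2020, 12, 31)


@dataclass
class Report:
    """One candidate media report. Field names are illustrative assumptions."""
    url: str
    incident_date: date
    language: str      # assumed code, e.g. "en" for English
    source_type: str   # assumed labels, e.g. "news" or "social_media"
    is_riot: bool


def in_scope(report: Report) -> bool:
    """Apply the exclusions described above: date window, language, source, riots."""
    return (
        STUDY_START <= report.incident_date <= STUDY_END
        and report.language == "en"
        and report.source_type == "news"
        and not report.is_riot
    )


# Made-up example, not a real entry from the dataset.
example = Report("https://example.com/story", date(2018, 5, 1), "en", "news", False)
print(in_scope(example))
```

Writing the rules down as a single function makes the exclusions explicit, so a collaborator can see at a glance what the dataset does and does not cover.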

In any database like this one – a database that relies on available sources, instead of sampling among all known occurrences – we cannot claim that the data is representative. The cases I gathered speak only to the characteristics of incidents that were recorded in English-language media. They also highlight the need to independently curate a comprehensive database for a better understanding of the prevalence and characteristics of hate crimes in India.

If you are planning to create your own database to plug a gap in official reporting, but don’t have the luxury of working with a communication researcher like Dr. Majó-Vázquez, I have gathered some helpful lessons from my time working with her.

1. Make Sure Your Data Sheet is Workable

In the beginning, I was very keen on adding every little detail about each case. For example, when documenting the profession of the victim, I would specify whether the person was a meat seller, an IT worker, a farmer, or a student. This resulted in nearly 40 categories of work, which didn’t give a sense of the socio-economic class of the victim. With Dr. Majó-Vázquez’s help, I narrowed down the professions of the victims to just five categories: blue-collar worker, white-collar worker, student, religious worker, and other. This let me create a clearer picture of who was most likely to be targeted in a hate crime.

[Screenshot: Data on profession of hate crime victims in India]
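The recoding described above can also be done programmatically once the sheet is exported. The following is a minimal sketch, assuming free-text profession labels; the mapping itself (which job goes in which bucket) is a guess made for illustration and is not the coding scheme used in the project.

```python
from collections import Counter

# Hypothetical mapping from detailed job labels to the five broad categories.
# The bucket assignments are assumptions for illustration only.
PROFESSION_GROUPS = {
    "meat seller": "blue-collar worker",
    "farmer": "blue-collar worker",
    "it worker": "white-collar worker",
    "student": "student",
    "priest": "religious worker",
}


def recode_profession(raw_label: str) -> str:
    """Return the broad category for a free-text profession label."""
    key = raw_label.strip().lower()
    # Anything unmapped falls into "other" so no row is silently dropped.
    return PROFESSION_GROUPS.get(key, "other")


# Made-up example labels, not taken from the real dataset.
raw_professions = ["Meat seller", "IT worker", "farmer ", "Student", "Tailor"]
recoded = [recode_profession(label) for label in raw_professions]
counts = Counter(recoded)
print(counts)
```

Keeping the original detailed label in its own column and adding the broad category as a second column means the recoding can be revised later without going back to the source articles.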

2. Prepare for Preconceived Ideas to be Challenged

Your database may overturn your preconceived notions. Do not exclude or include data to confirm them: doing so renders your work useless.

3. Beware of Typos

It sounds like a small issue, but one or two typos can make a worksheet unsearchable. For example, under region, Uttar Pradesh spelled as “Utar Pradesh” meant that those rows showed up as a different state when I filtered the results. Copy editing is dull work, but it meant that I and others could easily analyze the data.
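Part of this copy editing can be automated. The sketch below uses Python's standard `difflib` module to match each entry against a list of accepted spellings and to flag anything that is not close to a known name. The short state list and the 0.85 similarity cutoff are assumptions for illustration; a real sheet would need the full list of states and a cutoff tuned by hand.

```python
import difflib
from typing import Optional

# A short canonical list for illustration; a real sheet would include every
# state and union territory that appears in the data.
CANONICAL_STATES = ["Uttar Pradesh", "Madhya Pradesh", "Rajasthan", "Jharkhand"]
_LOOKUP = {name.lower(): name for name in CANONICAL_STATES}


def normalize_state(value: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the canonical spelling, or None if nothing is close enough."""
    # Collapse stray whitespace and ignore capitalization before comparing.
    key = " ".join(value.split()).lower()
    if key in _LOOKUP:
        return _LOOKUP[key]
    matches = difflib.get_close_matches(key, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[matches[0]] if matches else None


for entry in ["Utar Pradesh", "  uttar   pradesh ", "Kerala"]:
    print(repr(entry), "->", normalize_state(entry))
```

Entries that come back as `None` are best reviewed by a person rather than corrected automatically, since a fuzzy match can also pair two genuinely different places.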

4. Delete Columns That Do Not Have Enough Data

Initially, I had columns documenting the ethnicity of the victim, sexual orientation, and the number of passive observers of the crime. In the case of the first field, the information became so repetitive as to be redundant. I realized that the nationality of the vast majority in the attacks was Indian, and that filtering for this information would not be of use. In the case of the other two fields, the information was available so rarely that the columns were no longer useful, because the available data would not be representative. Focus on the data that is available within your set.
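A quick audit can show which columns fall into either of these traps. The sketch below flags a column as "sparse" when too few rows are filled in, and as "near-constant" when one value dominates. The thresholds (`min_filled`, `max_dominant`) and the column names are assumptions picked for the example, not figures from the project.

```python
from collections import Counter


def column_report(rows, min_filled=0.5, max_dominant=0.95):
    """Label each column 'sparse', 'near-constant', or 'keep'."""
    report = {}
    for col in rows[0].keys():
        values = [(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        fill_rate = len(filled) / len(values)
        if not filled or fill_rate < min_filled:
            # Too few rows have any value to support analysis.
            report[col] = "sparse"
        elif Counter(filled).most_common(1)[0][1] / len(filled) > max_dominant:
            # Nearly every filled row carries the same value.
            report[col] = "near-constant"
        else:
            report[col] = "keep"
    return report


# Made-up rows for illustration only.
rows = [
    {"region": "Uttar Pradesh", "nationality": "Indian", "sexual_orientation": ""},
    {"region": "Rajasthan", "nationality": "Indian", "sexual_orientation": ""},
    {"region": "Jharkhand", "nationality": "Indian", "sexual_orientation": "recorded"},
    {"region": "Uttar Pradesh", "nationality": "Indian", "sexual_orientation": ""},
]
report = column_report(rows)
print(report)
```

The report is a prompt for judgment, not a rule: a sparse column may still be worth keeping in a notes sheet even if it is dropped from the main analysis.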

5. The Need for Footnotes

While it makes sense in a database to refer to the Vishva Hindu Parishad [a Hindu religious organization] as the VHP, I often had to explain my acronyms and shorthand to others. Use a separate sheet or the comments tool to store footnotes about acronyms or links to explainers. This adds clarity so that anyone using the dataset will understand, without “breaking” your filters or cluttering your dataset.
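In code, the "separate sheet" idea amounts to keeping a glossary apart from the data rows and checking that every shorthand in the data has an entry. The following is a small sketch under that assumption; the column name `affiliation` and the placeholder code `XYZ` are invented for the example.

```python
# Glossary kept apart from the data rows, like a separate sheet.
GLOSSARY = {
    "VHP": "Vishva Hindu Parishad, a Hindu religious organization",
    "FIR": "First Information Report, the equivalent of a police complaint",
}


def undefined_codes(rows, column, glossary):
    """Return codes used in a column that have no glossary entry."""
    used = {row[column] for row in rows if row[column]}
    return sorted(used - set(glossary))


# Made-up rows; "XYZ" is a placeholder code with no glossary entry.
rows = [
    {"affiliation": "VHP"},
    {"affiliation": ""},
    {"affiliation": "XYZ"},
]
missing = undefined_codes(rows, "affiliation", GLOSSARY)
print(missing)
```

Because the data column holds only the short code, filters keep working, and the explanation lives in one place where it can be updated.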

6. The Value of Brevity

I realized that some of the easiest searches in a database use columns in a yes/no format. For instance, did the police file a First Information Report (FIR, the equivalent of a police complaint): yes or no? Did the police file an FIR against the victim or the perpetrator? By inputting the answers to such questions, I was able to see some unexpected results. For example, police filed FIRs against both the victim and perpetrator in 13% of cases.

[Screenshot: Police FIR filings of hate crimes in India]
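Yes/no columns also make this kind of percentage easy to compute. The sketch below counts the share of rows that match a combination of answers. The column names and the four rows are made up for illustration, so the result it prints is not the 13% figure from the real dataset.

```python
def share_where(rows, **conditions):
    """Fraction of rows whose columns equal all the given yes/no values."""
    if not rows:
        return 0.0
    hits = sum(
        all(row[col] == val for col, val in conditions.items())
        for row in rows
    )
    return hits / len(rows)


# Made-up rows; column names are hypothetical.
rows = [
    {"fir_against_perpetrator": "yes", "fir_against_victim": "no"},
    {"fir_against_perpetrator": "yes", "fir_against_victim": "yes"},
    {"fir_against_perpetrator": "no", "fir_against_victim": "no"},
    {"fir_against_perpetrator": "yes", "fir_against_victim": "no"},
]
both = share_where(rows, fir_against_perpetrator="yes", fir_against_victim="yes")
print(f"{both:.0%}")  # prints 25% for this toy data
```

The same helper answers other questions by changing the keyword arguments, which is the practical payoff of keeping each answer to a single yes or no.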

Filling the Void

The list of information that India’s current government withholds from the public keeps getting longer. I can envision independent journalist teams collecting data on topics like how many health workers have been infected with COVID-19, the number of rural child-care social workers who left their jobs during the pandemic, the number of school dropouts, the closure of small- and medium-sized enterprises, the number of attacks on Right to Information activists, and many other stories that require careful monitoring and data that may not be forthcoming from official sources.

In response to the growing void in official data on crucial issues, other journalists, academics, and activists in India have started creating databases. For example, Article 14 created a Decade of Darkness database project, in which it documents the number of sedition cases filed in the past 10 years. An increasingly common intimidation tactic in India involves the current government accusing journalists and activists of sedition and being “anti-national.”

Conclusion

My hope is that this paper helps journalists think about how they define the questions they want to answer, how they collect the data, and how they report on it.

Work on the hate crimes tracker is by no means complete: I intend to take it online and invite other journalists to gather and catalogue information. It currently only includes reports from English-language outlets, and could be dramatically expanded if we were to include regional-language reports too.

The risk to quality of information in crowdsourced projects like these lies in duplication of entries, authentication of the reports, and implicit bias in the interpretation of results. With full awareness of those risks – and the security risks to those who try to keep tallies – I believe it is still worth trying.
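The duplication risk, at least, can be partly managed with a simple check. The sketch below groups entries that share a key, first by a normalized source link and then by date and place, and returns the groups with more than one entry for human review. The field names, the placeholder values, and the normalization rules are assumptions for illustration, and a shared date and place only suggests a possible duplicate; it does not prove one.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a link to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Scheme, query string, and trailing slash are ignored.
    return host + parts.path.rstrip("/")


def find_repeats(rows, key):
    """Map each key shared by more than one row to the indexes of those rows."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[key(row)].append(index)
    return {k: idxs for k, idxs in groups.items() if len(idxs) > 1}


# Made-up submissions with placeholder values.
rows = [
    {"url": "https://www.example.com/story-1/", "date": "2020-01-01", "district": "District A"},
    {"url": "http://example.com/story-1?utm_source=x", "date": "2020-01-01", "district": "District A"},
    {"url": "https://example.org/other", "date": "2020-01-01", "district": "district a "},
]
same_link = find_repeats(rows, key=lambda r: normalize_url(r["url"]))
same_event = find_repeats(rows, key=lambda r: (r["date"], r["district"].strip().lower()))
print(same_link)
print(same_event)
```

Rows that share a link are almost certainly the same submission, while rows that share only a date and place may be two outlets covering one incident or two separate incidents, which is why the output is a review list rather than an automatic deletion.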

This article was originally published by the Reuters Institute for the Study of Journalism. GIJN added a small amount of additional material from the author, including from the full paper.

Additional Resources

Interpreting Data: Tips to Make Sure You Know How to Read the Numbers

How One Mexican Data Team Uncovered the Story of 4,000 Missing Women

How Bellingcat and Forensic Architecture Teamed Up with Colombia’s Cerosetenta to Map Police Violence

Rachel Chitra is a finance journalist and special correspondent for the Times of India with a passion for using data to tell human stories. She has previously worked for Reuters, New Indian Express, and Deccan Chronicle. Her project for the Reuters Institute looked at the documentation of lynching and rape of Muslims and Dalits in India.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License


