Top Ten #ddj: This Week’s Top Data Journalism

What’s the global #ddj community tweeting about? Our NodeXL mapping from June 5 to 11 includes #VisualTrumpery from @mcrosasb, analysis of Theresa May’s election disaster by @GuardianVisuals, dataviz structuring strategies from @eagereyes, and school enrollment woes in Delhi from @htTweets.

A Poor Journalist’s Text-Mining Toolkit

How can you search and analyze collections of documents on your own computer with simple tools? At DataHarvest, Robert Gebeloff and I ran a workshop to answer that question. As people seemed interested, here’s a write-up of the two key tools we worked with: Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.
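To give a rough sense of the regular-expression half of that workflow, here is a minimal sketch in Python rather than Sublime Text. It assumes the documents have already been extracted to plain text (the step Tika handles), and the file names, sample sentences, and date pattern are all made up for illustration; they are not taken from the workshop.

```python
import re

# Hypothetical sample texts standing in for documents already
# extracted to plain text (names and contents are invented).
documents = {
    "memo.txt": "Payment of $1,250.00 was approved on 2017-06-05.",
    "notes.txt": "No amounts here, but a date: 2017-06-11.",
}

# Illustrative pattern: ISO-style dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_matches(docs, pattern):
    """Return (document name, matched text) pairs for every match."""
    hits = []
    for name, text in docs.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    for name, value in find_matches(documents, DATE_PATTERN):
        print(name, value)
```

The same pattern could be pasted into an editor's regex search; the script form simply makes it easy to run one expression across many files and collect the hits in one place.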