3 thoughts on “On the Ethics of Web Scraping and Data Journalism”

  1. Hello.

    I’m a reporter in Australia who scrapes pages.
    I have to say, I disagree with identifying yourself while scraping.
    I don’t think that’s necessary, because there is no technical difference between sending a request via a web browser and sending one via a scraper.
    I see no difference between viewing HTML via a browser and saving it to a text file and reading it, if that’s how I wish to consume HTML.

    There is an ethical consideration that was not pointed out in the article, which I find to be by far the most important consideration:
    If you’re going to scrape many pages, then you should consider spacing out your requests.
    Sending 1,000 requests per minute to a server could slow it down considerably, and that would be very unethical.


  2. I am learning about digital journalism in Mexico. As I understand it, there is no law on the use of web scraping in Mexico, but you have to be very careful with private information. I consider it better to identify yourself as a reporter.

  3. In Honduras we do not have any law about it. No matter the topic, if you say you are a reporter, you are simply not going to get any information. Even public offices that are legally required to provide information refuse to give it.
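
The first comment's suggestion of padding out the time between requests could be sketched roughly as follows. This is only a minimal illustration, not any commenter's actual code: the function names `fetch_url` and `scrape_politely`, the 2-second default delay, and the 30-second timeout are all assumptions chosen for the example.

```python
import time
import urllib.request
from typing import Callable, Iterable, List


def fetch_url(url: str) -> bytes:
    """Download one page using only the standard library."""
    # The 30-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def scrape_politely(
    urls: Iterable[str],
    delay_seconds: float = 2.0,  # assumed default, tune per site
    fetch: Callable[[str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

With a 2-second pause the scraper sends at most about 30 requests per minute, far below the 1,000 per minute the comment warns about. The `fetch` and `sleep` parameters exist only so the pacing logic can be exercised without touching a real server.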

