Web site abuse by Yahoo

My colocation provider informed me that I went way over bandwidth for June, and would have to either pay a large overage charge, or upgrade to a higher bandwidth tier.  I couldn’t really afford to do either, and was surprised by the large jump in bandwidth, so I started analyzing the logs.  I was expecting that the server was under some kind of attack, but was very surprised at the nature of the attack and the identity of the attacker.

For a long time, I’ve had a directory on my web server containing local copies of DVD images of various Fedora Linux distributions, for convenience in local installs.  There aren’t any links to the directory, and the parent directory isn’t indexable, though some years back it was.  Inktomi found it in 2003 and started indexing it.  Their web crawler did download the files once, but after that they only asked the server if the files had changed.  Inktomi was later acquired by Yahoo.
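The "only asked the server if the files had changed" behavior is what HTTP calls a conditional request: the crawler sends an If-Modified-Since header, and the server answers 304 Not Modified with no body if the file is unchanged.  Below is a minimal sketch of the server-side decision in Python.  The function name conditional_status and the dates are made up for illustration; a real web server does this internally.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the file's last change.
    """
    if if_modified_since is None:
        return 200  # client claims no cached copy: send the full file
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full file
    if cached.tzinfo is None:
        # Some date forms parse as naive; treat them as UTC so they compare.
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= cached:
        return 304  # unchanged: headers only, no multi-gigabyte body
    return 200


# Hypothetical example: a file last changed on 1 May 2008.
changed = datetime(2008, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(changed, usegmt=True)
print(conditional_status(header, changed))
print(conditional_status(None, changed))
```

A 304 response costs a few hundred bytes, which is why a crawler that revalidates this way is nearly free to serve, while one that omits the header pulls the whole image every time.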

What I found was that on June 4, Yahoo started indexing them.  Since there is no current link to the files, they probably got the URL from an Inktomi database.  That would have been fine, except that they started downloading all of them every day!
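For readers who want to run a similar check, here is a rough sketch of this kind of log analysis in Python.  It is not the script used for this post.  It assumes the server writes the common "combined" access log format, and the sample lines, host addresses, paths, and the ExampleBot agent string are all invented.

```python
import re
from collections import defaultdict

# Assumed format: Apache-style "combined" log lines.  Requests containing
# escaped double quotes are not handled by this simple pattern.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def bytes_by_agent_and_day(lines):
    """Sum response bytes per (user agent, day); skip unparseable lines."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        # A size of "-" means no body was sent (for example a 304 reply).
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        totals[(m.group("agent") or "-", m.group("day"))] += size
    return dict(totals)


# Invented sample data, for illustration only.
sample = [
    '192.0.2.1 - - [04/Jun/2008:01:02:03 -0700] "GET /isos/a.iso HTTP/1.0" 200 1000 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [04/Jun/2008:02:00:00 -0700] "GET /isos/b.iso HTTP/1.0" 200 1000 "-" "ExampleBot/1.0"',
    '192.0.2.7 - - [04/Jun/2008:03:00:00 -0700] "GET /isos/a.iso HTTP/1.0" 304 - "-" "OtherBot/2.0"',
]
totals = bytes_by_agent_and_day(sample)
# Print the heaviest consumers first.
for (agent, day), size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(day, agent, size)
```

Grouping by user agent and day is what makes a pattern like "the same files, in full, every day" stand out: a well-behaved crawler shows one large day followed by near-zero totals.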

While the public is generally authorized to download any of the content of my web server, downloading the same large files on a daily basis is abusive, and constitutes unauthorized use of computer services and unauthorized access to a computer (California Penal Code sections 502(c)(3) and 502(c)(7), respectively).

I’ve taken steps to prevent this from happening again.

I would demand that Yahoo reimburse me for the bandwidth fees that they caused me to incur, and pursue legal remedy if they refuse, but as it turns out, when I explained the situation to my ISP, they agreed to a one-time waiver of the overage fees.

It seems bizarre to me that any web crawler would actually download ISO image files even once, let alone repeatedly.  Yahoo probably pays less than 1% of what I pay per megabit per second of bandwidth I use, but even at low rates I would have expected that they would not want to download content that is not suitable for indexing, or would only download it once.

2 Responses to Web site abuse by Yahoo

  1. Les Hildenbrandt says:

    It makes no sense why a search engine would download an ISO image. Are they looking for links? Can you get the cached page?

    How much was the bandwidth charge?

  2. Pingback: Bookmarks about Abuse
