
Robots.txt is an industry-standard, but non-legally-binding, method of denying access. If you wish to actually deny access, the proper, legally recognized means of doing so is to actually deny access (e.g., by using .htaccess or similar means).
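To illustrate the distinction, here is a minimal sketch (the paths and directory names are hypothetical): a robots.txt rule merely asks well-behaved crawlers to stay away, while a server-enforced rule refuses the request outright. The second snippet assumes Apache 2.4 syntax.

```
# robots.txt (site root) — advisory only; a crawler is free to ignore this
User-agent: *
Disallow: /private/

# /private/.htaccess — enforced by the server; every request is refused
# (Apache 2.4 syntax; older 2.2 servers would use "Deny from all" instead)
Require all denied
```

With only the first file in place, access is still technically possible; with the second, the server answers 403 Forbidden regardless of what the client claims to honor.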

Also, mens rea (i.e., "guilty mind" or colloquially, intent) is a criminal law concept that has no relevance in the civil/tort law realm. It doesn't matter whether the machine has been programmed to ignore robots.txt because no law requires a software program to obey robots.txt.



Indeed, your assertions about robots.txt are the expected logical response. However, when the issue of copyright infringement by Google (and other search engines) came up in the past [e.g., storing entire page text, thumbnailing, and the like], the response, which was apparently accepted, was that because a protocol exists (namely robots.txt) that enables a website owner to refuse access to works, there was no actual infringement. The onus was placed on the owner of the works to mark them as off-limits using robots.txt.

If that line of reasoning holds, then the corollary appears to be that programming a robot to duplicate linked content denied by robots.txt would certainly constitute copyright infringement.

Copyright law is a weird one: it sits in a sort of limbo between tort and crime, so it seems relevant to consider not only the balance of probabilities and the test for tortious infringement but also the criminal considerations. Moreover, in various jurisdictions it is a crime to access part of a computer system without authorisation, which would appear to apply to MS's directorship in this case; such laws might "require" obedience to established protocols for determining whether access is authorised. [I would be interested if anyone reading this can cite case law to support or discredit this line of thinking.]




