
Robots.txt is an industry-standard, but non-legally-binding, method of denying access. If you wish to actually deny access, the proper, legally recognized means of doing so is to actually deny access (e.g., by using .htaccess or similar means).
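To illustrate the distinction, here is a minimal sketch (the paths and directory names are hypothetical): a robots.txt rule merely asks well-behaved crawlers to stay away, while a server-enforced rule refuses the request outright. The second snippet assumes Apache 2.4 syntax.

```
# robots.txt (site root) — advisory only; a crawler is free to ignore this
User-agent: *
Disallow: /private/

# /private/.htaccess — enforced by the server; every request is refused
# (Apache 2.4 syntax; older 2.2 servers would use "Deny from all" instead)
Require all denied
```

With only the first file in place, access is still technically possible; with the second, the server answers 403 Forbidden regardless of what the client claims to honor.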

Also, mens rea (i.e., "guilty mind" or colloquially, intent) is a criminal law concept that has no relevance in the civil/tort law realm. It doesn't matter whether the machine has been programmed to ignore robots.txt because no law requires a software program to obey robots.txt.



Indeed, your assertions about robots.txt are the expected logical response. However, when the issue of copyright infringement by Google (and other search engines) came up in the past [e.g., storing entire page text, thumbnailing, and the like], the response, which was apparently accepted, was that because a protocol exists (namely robots.txt) that enables a website owner to refuse access to works, there was no actual infringement. The onus was placed on the owner of the works to mark them as off-limits using robots.txt.

If that line of reasoning holds, then the corollary appears to be that programming a robot to duplicate linked content denied by robots.txt would certainly constitute copyright infringement.

Copyright law is a weird one: it sits in a sort of limbo between tort and crime, so it seems relevant to consider not only the balance of probabilities and the test for tortious infringement but also the criminal considerations. Moreover, in various jurisdictions it is a crime to access part of a computer system without authorisation, which would appear to apply to MS's directorship in this case; such laws might "require" obedience to established protocols for determining whether access is authorised. [I would be interested if anyone reading this can cite case law to support or discredit this line of thinking.]




