If you don't want your data used by others, don't send it to them.
You explicitly give them permission to have it when you go out of your way to install a server on a common port, speaking a common API, hand it a directory full of documents to distribute, and use no form of authentication. The way the web works, answering a request is tantamount to granting permission to ask, and sending a file is tantamount to granting permission to have it. When you receive a file you don't first receive a permissions document; you receive the file. Authentication and contractual obligations have to come first, because there is no later. (This is like the tide: you may not like it, but that doesn't mean you can change it, especially not with laws.)
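To make that concrete, here is a minimal sketch of the situation described, using Python's stock `http.server` module (the function name and port handling are illustrative, not from the original): a server on a common port, speaking the common API (HTTP GET), that hands every file in a directory to anyone who asks, with no authentication anywhere.

```python
import http.server
import socketserver
import threading

def serve_directory(port=0):
    """Serve the current directory over plain HTTP.

    port=0 lets the OS pick a free port; a real deployment would use a
    common one like 80 or 8000. Returns (server, actual_port).
    """
    handler = http.server.SimpleHTTPRequestHandler  # serves files from cwd
    httpd = socketserver.TCPServer(("", port), handler)
    # serve_forever answers every GET with the requested file,
    # no credentials checked, no questions asked.
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd, httpd.server_address[1]
```

Anyone who can reach the port gets the documents; nothing in this setup distinguishes a browser from a scraper.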
You have many ways to require authentication, and legally they can be VERY weak (a 1-bit password is sufficient), but if you don't restrict access, it is open. Not just because open is the default, but because it's the technical reality: nobody hacked into your computer to get that file; they asked your document server and it gave it to them!
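The point about weak credentials can be sketched as a trivial HTTP Basic auth check (the function and the single-character password are illustrative assumptions): the strength of the secret is irrelevant; what matters is that access now requires presenting a credential rather than merely asking.

```python
import base64

def is_authorized(auth_header, password="x"):
    """Check an HTTP Basic Authorization header against a deliberately
    trivial password. Even this turns an open server into a restricted
    one: a request without the credential is refused rather than served.
    """
    if auth_header is None:
        return False
    try:
        scheme, encoded = auth_header.split(" ", 1)
        if scheme != "Basic":
            return False
        # Basic auth encodes "user:password" in base64.
        user, _, pw = base64.b64decode(encoded).decode().partition(":")
        return pw == password
    except (ValueError, UnicodeDecodeError):
        return False
```

A server that consults a gate like this before sending the file has restricted access; one that never asks has not.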
Robots.txt is a suggestion, and it exists for the scraper's benefit: it points them to the better links. You're allowed to see the rest (the server sends it to you without a password); you're just unlikely to find good content there.
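A short illustration of that advisory nature, using Python's stdlib `urllib.robotparser` (the file contents here are a made-up example): the parsing and the obeying both happen on the scraper's side; the server enforces nothing.

```python
import urllib.robotparser

# A hypothetical robots.txt. The server that publishes this will still
# happily send /drafts/ files to anyone who asks; only a polite crawler
# that chooses to consult these rules will skip them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Allow: /articles/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
```

`parser.can_fetch("*", "/articles/index.html")` answers True and `parser.can_fetch("*", "/drafts/notes.html")` answers False, but those answers bind only a scraper that asks the question.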
If you're afraid of someone examining data you send them, don't send them the data when they ask. Expecting them not to ask, or, once they've received it, not to manipulate it in certain ways because you can't then extract a fee for their doing so, is controlling and, moreover, doomed to fail.