It's amazing that we can refer to data in a globally-unique way with small content-based hashes. Hash collisions aren't usually worth worrying about.
Another benefit is that it's easy to store large numbers of hashes with basic metadata.
SHA-256 hashes are 32 bytes. If it takes 512 bytes on average to store author/title/publish-date/ISBN, then the hash is a small fraction of each item's footprint (though hashes themselves don't compress well). You can store the info for 2 million books in about a gigabyte.
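Sanity-checking the numbers (a quick Python sketch; the 512-byte metadata figure is the rough estimate above):

```python
HASH_BYTES = 32          # SHA-256 digest length
META_BYTES = 512         # rough average for author/title/publish-date/ISBN
BOOKS = 2_000_000

per_item = HASH_BYTES + META_BYTES   # 544 bytes per book
total = per_item * BOOKS             # whole-catalog size in bytes
print(total / 2**30)                 # ≈ 1.01 GiB
```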
Shadow librarians can also publish curated collections of books. I know a guy who tried to do this in a systematic way for college-level history textbooks covering a wide swathe of the world's history. The entire catalog with metadata and hashes is probably only a few hundred KB.
The tricky part is finding a balance between the blockchain approach, where everybody pins every item (wasteful) and the other end of the spectrum where people only pin the things they're thinking about right now (lossy).
There's some middle ground where we coordinate who pins what in a way where there's just enough redundancy to not worry about it disappearing, but not so much that any one of us is bearing an unnecessarily large burden.
I don't think we've quite figured that part out yet.
You bring up a great insight - that blockchain/pinning are on opposite ends of the spectrum of what one can choose to store, if one participates in the system at all.
We do have a good example of this already, I think: torrent client peer lists. When participating in a torrent, we can see the current availability of the data in the swarm, usually displayed as a bar visualization with each pixel representing a chunk. The darker the chunk, the more peers have a copy of it. The result is also summarized in a number capturing the general health of the swarm with respect to hosting the torrent.
All we need, then, is a mechanism where 1) the client has a list of items to pin, 2) the client uses this existing swarm-tracking mechanism to figure out which files need more hosts, and 3) the client picks the best candidates to host given its space and network constraints. The client can be smarter than just grabbing the lowest-seeded file: if a host is known to hold files reliably but is offline a few hours a day, the client doesn't need to rush to fetch those files, and can spend its resources on less-seeded ones instead.
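Step 3's selection heuristic could be as simple as this sketch (the data shape, the 0.5 discount for intermittent-but-reliable hosts, and all names are made up for illustration):

```python
# Toy pin-selection: given per-file seed counts and sizes, pick the
# least-seeded files that fit in our space budget. Files whose only
# known seeds are intermittent-but-reliable hosts are deprioritized
# slightly rather than fetched immediately, per the idea above.

def pick_files_to_pin(files, space_budget):
    """files: list of dicts with 'name', 'size', 'seeds',
    'reliable_intermittent_seeds'. Returns names to pin."""
    def urgency(f):
        # Effective seed count: an intermittent-but-reliable host
        # still counts, just at a discount (assumed weight: 0.5).
        return f["seeds"] + 0.5 * f["reliable_intermittent_seeds"]

    chosen, used = [], 0
    for f in sorted(files, key=urgency):      # least-seeded first
        if used + f["size"] <= space_budget:
            chosen.append(f["name"])
            used += f["size"]
    return chosen

catalog = [
    {"name": "a.pdf", "size": 50, "seeds": 1, "reliable_intermittent_seeds": 0},
    {"name": "b.pdf", "size": 30, "seeds": 0, "reliable_intermittent_seeds": 3},
    {"name": "c.pdf", "size": 40, "seeds": 0, "reliable_intermittent_seeds": 0},
]
print(pick_files_to_pin(catalog, space_budget=100))  # → ['c.pdf', 'a.pdf']
```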
This is possible with current technology. We can do a simple version of this via a torrent client plugin, reading the list of files from RSS or a special text file.
I've actually seen communities do this manually. For hosting SciHub torrents, the community built a web page showing the current number of known seeds per torrent. Users were responsible for picking torrents, usually the lowest-seeded ones, and seeding them. We can automate away that tedious and error-prone work.
Doing this per IPFS file will probably take up too many resources. Perhaps we need a standard IPFS<->torrent correspondence. Something as simple as a text file in the root of the torrent file structure, a file that maps IPFS hash <-> file inside the torrent. This way an IPFS swarm and a torrent swarm can work together. You get the easy retrieval of IPFS and the increased durability of torrent swarms.
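That mapping file could be as plain as one CID and one path per line. A sketch of a possible format and parser (the filename "ipfs-map.txt", the format, and the CIDs are all hypothetical placeholders, not an existing convention):

```python
# Hypothetical "ipfs-map.txt" at the torrent root: one line per file,
# "<IPFS CID> <path inside torrent>". CIDs below are fake examples.
MAP_TEXT = """\
bafybeigdyrexample1 books/history/vol1.pdf
bafybeigdyrexample2 books/history/vol2.pdf
"""

def parse_ipfs_map(text):
    """Return {path-in-torrent: CID} from the mapping file."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # allow comments/blanks
            continue
        cid, path = line.split(maxsplit=1)
        mapping[path] = cid
    return mapping

print(parse_ipfs_map(MAP_TEXT))
```

With that file in place, an IPFS node can announce the CIDs while a torrent client keeps the same bytes seeded.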
> Doing this per IPFS file will probably take up too many resources
I think we can come up with some scheme where it doesn't have to be centrally hosted. Like if my public key is 18 mod 256, then it's up to me to pin all of the files I rely on whose CID is also 18 mod 256.
If you've got thousands of users doing this, each one of them has to bear only 1/256th the burden.
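The responsibility rule is a one-liner. A sketch, assuming we hash both the public key and the CID down to one of 256 buckets (the choice of SHA-256's first byte is just one way to do it):

```python
import hashlib

BUCKETS = 256

def bucket(data: bytes) -> int:
    """Map an identifier to one of 256 buckets via SHA-256."""
    return hashlib.sha256(data).digest()[0]   # first byte = value mod 256

def my_responsibility(my_pubkey: bytes, cids: list) -> list:
    """Pin only the CIDs that land in my bucket (~1/256 of the set)."""
    mine = bucket(my_pubkey)
    return [c for c in cids if bucket(c) == mine]
```

Everyone runs the same rule locally, so no central coordinator has to assign files to peers.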
I imagine incentive schemes where we keep track of which peers have gotten which files from our nodes, and then later randomly check to see if they're doing their part and pinning what they got from us.
We'd all put $5 into a pot at the beginning of the month, and at the end of the month we'd share our data re: who seeded and who leeched. Maybe the bottom 10% leechers get no money back, the top 10% seeders get $10 back and everybody else gets their $5 back.
So it's like it costs $5 to access the first time, but if you leave a node running (like maybe you have a phone with a cracked screen that you just leave plugged in) then you'll never have to pay that $5 again, and if you're lucky you'll get $5 from a leecher.
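The payout rule above, sketched in code (the seeding score, the 10% cutoffs, and the tie-breaking are all illustrative assumptions):

```python
def settle_pot(scores, stake=5, bonus=10):
    """scores: {peer: seeding score}. Bottom 10% forfeit their stake,
    top 10% get double back, everyone else gets the stake back."""
    ranked = sorted(scores, key=scores.get)   # worst seeders first
    n = len(ranked)
    cut = max(1, n // 10)                     # 10% at each end
    payouts = {}
    for i, peer in enumerate(ranked):
        if i < cut:
            payouts[peer] = 0                 # leecher: forfeits $5
        elif i >= n - cut:
            payouts[peer] = bonus             # top seeder: gets $10
        else:
            payouts[peer] = stake             # everyone else: $5 back
    return payouts
```

Note the pot balances by construction: each forfeited $5 from the bottom decile funds the extra $5 for a top-decile seeder.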
Of course this doesn't make sense for arbitrary files from anywhere; it works in the context of a project like Z-Library, where somebody curates the set of files in scope. Otherwise an attacker could flood the network with noise whose hashes all landed on one target's 1/256 slice, burdening that target and their slice of the community alone.