BitTorrent Site Magnet Database Archives Available via Torrent and Direct DL (archive.org)
160 points by metadat on May 21, 2022 | 84 comments


Let's not forget one of the best approaches for indexing magnet hashes: https://btdig.com . It basically connects to the DHT and listens to queries. When a new one is found, it downloads the torrent data and indexes it.


DHT work is fun - you can scrape a lot of interesting content. BitTorrent and Tor both use DHTs for content discovery.

If you’re interested in building a minimum viable scraper, this is a good reference: https://github.com/retrohacker/Taboo/blob/main/lib/index.js

The trick to content discovery on the DHT is to parrot any incoming request you receive. So if someone asks for a hash or announces a hash, you turn around and ask for that hash too. You can spin up multiple DHT nodes to increase your coverage of the table.
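
To make the "parrot" idea concrete, here is a minimal sketch of the listening half: pull the info hash out of an incoming DHT query so it can be re-requested. It assumes the KRPC message layout from BEP 5 (a bencoded dictionary with `y`, `q`, and `a` keys). The names `bdecode` and `harvest_info_hash` are made up for this example, and this is not the code used by btdig or Taboo.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        out = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            out[key] = value
        return out, i + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", i)
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def harvest_info_hash(packet: bytes):
    """Return the 20-byte info hash from a get_peers/announce_peer query, else None."""
    try:
        msg, _ = bdecode(packet)
    except (ValueError, IndexError, TypeError):
        return None  # not valid bencode
    if not isinstance(msg, dict) or msg.get(b"y") != b"q":
        return None  # only queries are interesting here
    if msg.get(b"q") not in (b"get_peers", b"announce_peer"):
        return None
    args = msg.get(b"a")
    if not isinstance(args, dict):
        return None
    info_hash = args.get(b"info_hash")
    if isinstance(info_hash, bytes) and len(info_hash) == 20:
        return info_hash
    return None
```

A crawler would call `harvest_info_hash` on every UDP packet its DHT node receives and queue each new hash for its own `get_peers` lookup.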

Some cool user experiences can be built off of this. I’ve discovered a tonne of interesting content using “BitTorrent Roulette” applications that download random hashes from the DHT - often times finding fascinating content from languages I don’t speak.


"BitTorrent Roulette" sounds like a wonderful way to end up with child porn on your computer.


Are people still so naive as to use torrent for that?

Well, I guess I might be the naive one, given the ample number of honeypots and such.


I’ve found that, no, they aren’t.

The ratio of this content on the DHT is so incredibly low you’re very unlikely to pull it. I never have, and I’ve resolved a metric tonne of random torrent hashes.

You do find content labeled as such - but it’s either password protected rars (I suspect the password is to remove plausible deniability? Idk what’s inside these rars, I obviously don’t have the password and I immediately shred them when they do fall out of the roulette wheel - I’m not interested in trying to break into them) or a honeypot with a government agency video advertising a support hotline. I’ve noticed a government in LatAm has a lot of honeypot content on the public BitTorrent DHT. It’s a guy sitting at a desk in front of the state flag advertising an anonymous support hotline.

I have, however, found some ethically questionable things. Dumps of credit card info, sus military schematics written in an Eastern European language, etc.

Ethics aside - if you live in the United States, more than 90% of what falls out of the Roulette Wheel is going to be strictly illegal because of copyright law. Downloading random hashes from the DHT is not something I’d suggest doing from a computer that can easily be traced back to your identity. I summarize this in the README of Taboo: https://github.com/retrohacker/taboo


Wow, thank you for that. That's very clever and very useful.

It's a shame they use reCAPTCHA; it means it's impossible to script searches with something like Jackett.


well, you could run your own instance: https://github.com/btdigg-org/dhtcrawler2


Am I missing something or is the source not actually available in that repo? I only see the compiled .beam files


Source code is available under the “src” branch.


what a strange way to use git


> This git branch maintain pre-compiled erlang files to start dhtcrawler2 directly. So you don't need to compile it yourself, just download it and run it to collect torrents and search a torrent by a keyword.


Ah, that is perfect. Thank you.


Well, one can write a bot that defeats recaptcha. Trivial, once you know what to do.


If "one" can write a bot that defeats recaptcha, what is the point of recaptcha? I think extraordinary claims require extraordinary evidence.


I presume by "defeat" they mean "use one of the libraries that allows you to pay $0.10 for each query, to route your CAPTCHA to people in developing countries who will solve them for you." Which is economically viable for individual use-cases, but still economically unviable for spambots.


It is much cheaper than 0,10 USD per captcha.


Another technique is to enable the accessibility mode, which serves an audio challenge that can be run through speech-to-text and is supposedly easier to bypass.


It would still limit "most" =p


Spot on!


Badass. Thank you sir, madame, or they.


> sir, madame, or they

Neither, I think it's a cylon.


I am assembling the grand mac daddy of all magnet link dbs / seeding systems. If you want to subscribe to updates or are otherwise interested please reply here so I can gauge interest. Will post to HN when ready. This is bigger than piracy, it's about free flow of information over time for humanity.

I am up for collaborating and sharing.


your default /var/www/html/index.html is showing ;)


Did I stumble upon a cylon hive?


Look left, look right. It appears so.


> https://btdig.com

First hint suggested: "interracial gangbang"

They might want to put those hints behind a NSFW warning; just my .2c.


Anyone who has spent more than 5 minutes in torrent-land has seen NSFW


that link doesn't work


I love what the Internet Archive does and I assume it pisses off many big fish. How do they get away with so much content? What guarantees are there that they're going to be able to preserve digital history in the long term?


Here is one of thepiratebay up to 2019: https://archive.org/details/ThePirateBayMarch2019


Thanks, the one in the link is from 2016


A different torrent metadata archive (.torrent files) that's somewhat more up to date:

https://archive.org/details/torrent_metadata_archive_sample

(Though in practice it's not as easy to download: you need to fetch a tarball for each month. The naming is predictable, so you can at least script the download.)
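
A rough sketch of what "script the download" could look like: generate one URL per month from a naming template. The template used in the usage note is a placeholder, not the archive's real naming scheme, which you would need to read off the item's file listing. `month_range` and `monthly_urls` are hypothetical helper names.

```python
def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1


def monthly_urls(template, start, end):
    """Fill a URL template once per month; the template itself is an assumption."""
    return [template.format(year=y, month=m) for y, m in month_range(start, end)]
```

For example, `monthly_urls("https://example.invalid/{year}-{month:02d}.tar", (2019, 11), (2020, 2))` yields four URLs, which can then be handed to any downloader.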


This is one of the biggest DHT search engines I know of.. though it'll just tell you if something exists or not, you have to find the magnet data somewhere else:

https://torrenthistory.org/


Any way to know which ones might be under surveillance?

Right now I have to choose between darkweb piracy (Slow downloads, hard to find things, search services are slow) and clearnet piracy (Lots of popular shows are monitored by copyright bots), so I've been pirating less and less


Pay $40 a year for a VPN. Use a Docker container that has iptables set up to only use the VPN, and put a torrent client in it. $40 a year and nobody will bother you. The kind of customer a VPN provider will risk its customer base to expose is a terrorist, not a torrenter.


Or use a seedbox


or set up openvpn on a dirt cheap vps based in romania or some shit


hello eu-central-1... :D


use Tribler? The torrent client with own Tor-fork for bulk tunneling and support for search.


I have fiber so a VPN just nukes my download speeds enough that I hated using it for torrenting.

Is this normal in your experience? I am/was a VPN noob.


Don't route everything through the VPN then; bind only your BitTorrent client to the VPN’s network interface. This also has the advantage that if the VPN connection accidentally drops, your BT client doesn’t start transmitting on your residential connection and exposing your IP.
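
At the socket level, "binding to the VPN interface" roughly means fixing the local address before connecting. A minimal sketch, assuming you know the VPN interface's IPv4 address (the function name `open_bound_socket` is made up for this example; real clients expose this as a setting):

```python
import socket


def open_bound_socket(local_ip: str) -> socket.socket:
    """Create a TCP socket that uses local_ip as its source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 lets the OS pick a free port. bind() raises OSError if
    # local_ip is not assigned to any interface on this machine.
    sock.bind((local_ip, 0))
    return sock
```

If the VPN drops and its address disappears, `bind` fails instead of silently falling back to the residential address. Binding a source address does not by itself control routing on every OS, so treat this as an illustration of the idea, not a complete kill switch.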


Get your own VPS and install OpenVPN server on it. Much faster than shared VPNs. I get gig speed on a $12/year VPS.


Doesn't that defeat the point? Most VPS providers will quickly ban you upon receiving copyright notices.


that sounds sweet. do you mind telling us which VPS are you using?


I on and off use mikrovps as they advertise they don’t care about dmca complaints. Looks like the first time I signed up was in 2018 and never heard a peep from them.

Usually I get the cheapest plan and run a torrent client on it then use scp from there. I almost got wireguard working once but had to get them to add a kernel module through a support ticket then reinstalled and didn’t feel like bugging them again.

Just a happy customer.


I thought Wireguard was the new, preferred, simpler way of running a VPN nowadays. Supposed to be lighter protocol and less complicated to set up wrong than OpenVPN.


> Any way to know which ones might be under surveillance?

Yes, the answer is that all of them "might" be under surveillance. There's no way to guarantee that any of them aren't.


Content piracy is one of the very few valid uses of a public VPN provider service.


I receive complaints from my ISP all the time for torrenting... they do nothing except frighten older people


Not universally true though, and in some countries (like e.g. Germany) it's a really bad idea to torrent without a seedbox or vpn in another country.


What's different about Germany when it comes to piracy? And what about using a PeerGuardian blacklist to block known trackers?


There are lawyers that specialize in sending cease and desist letters with fines attached, and German courts are very quick to issue court orders forcing ISPs to give such lawyers the name and address behind the offending IP address.

If you torrent in Germany without a VPN or a seedbox, so from a residential IP, there is an extremely high chance (if you torrent the latest blockbuster from Pirate Bay it will approach 100%) that you will get a so-called "Abmahnung" and have to either pay a few hundred or thousand euros or fight it in an actual court.

PeerGuardian could help, but it's risky and both VPN and Seedboxes are pretty cheap, so I would use one of the latter. :D


Those lawyers aren't going to bother to go to court for less than 1000€ though. They probably don't even check if their data is correct. They hope you incriminate yourself by paying the fee and signing their prewritten contract, which is massively stacked against you. You'll have to go to a lawyer just to write an actual cease and desist agreement, and most of the time you can dodge the entire bill.


Sure, but why even bother with this stuff if there are (more or less) safe ways (VPN, Seedbox in another country) available for cheap? :D


I wonder, can this be spoofed to essentially DoS them by making them send their BS letter to every single German resident?


Just the politicians in power and their entourage, please


I'm from Germany and I had an issue with this two times. The first time was when I torrented a Movie when I was like 17, and got a C&D which ordered me to pay a €1000 fine. Got a lawyer (which cost around €500) and never heard back. Two years later they sent a letter again, because of the same thing and demanded €2000, to which I just didn't respond.

The time window from me torrenting to the letter being sent was around 2 months, not that bad.

Nowadays, I just use a VPN & Torrent Caching, no issue at all (40 TB and counting) :)


If you're in the US I advise caution as ISPs are going to be increasingly aggressive about shutting off accounts. The media industry has demanded that repeated, unproven accusations alone should be enough to force your ISP to close your account and never allow you to open a new one. ISPs don't want to do that, but courts so far have been agreeing with the media industry on that. ISPs who don't do it risk literal billions in fines.

If you don't have multiple ISP options each offering reliable high speed connections in your area, you should be especially careful or you could find yourself with very limited access to the internet or even no internet access at all.


13 years ago, last I used them, Comcast would send an email with a list of the titles of the torrents it thought you downloaded.

Good enough to scare ~everyone by itself, though iirc they did threaten to halt your service, something I never decided to test out.


I tested it, it's legit, they will terminate your service.


I've always wondered about this. I'm assuming that they terminate you, so you stop paying (obviously), but can't you just get another service with someone else?

I'm aware that in the US the ISPs are essentially monopolies avoiding each other's areas, which always confused me for the original capitalist economy. Are you then on a blacklist? What's the recourse?


The last mile problem with internet and telecommunications to homes has been a battle going on now for decades.

Comcast, Spectrum, Charter, etc all lobbied for laws that said that they “own” the equipment (and poles) going all the way to your house. Not the property owner and certainly not the city/municipality. This is why community fiber has failed here in the US. It would have to be done with entirely new lines and infrastructure not owned by a “utility”.

Worse still is they have lobbied to make sure they AREN’T classified as a utility and be regulated as one.

Are you on a blacklist if you get kicked off? Yes. Though it’s easy to work around it.

I got kicked off because my IP address was the same IP address that someone saw on a torrent tracker. Despite the fact that cable has dynamic IPs. I eventually fought them and got back on their good side.

So to recap. A community foots the bill for infrastructure, that they’ll never own, to a cable company, who has guaranteed rights through legislation, to own all of the infrastructure including the coaxial cables running to your house. They still want to own your router and modem and your TV. Most people just give it to them. It’s absurd. Using Comcast’s router/modem, on Comcast coaxial, on Comcast utility poles, to a Comcast repeater station, and yet NOT A UTILITY.


In most places, there's at least 1 other option in my experience. But there's rarely more than 2 decent options, and there might only be the 1.

Starlink is coming into play, though, giving everyone a third option that is probably decent, unless you're a serious gamer or have other needs that require minimum latency.


There is no recourse for 90%+ of US households if they want wired internet. The company that owns the coaxial wire connection to your home is the only broadband option (download bandwidth only). Comcast is the biggest company, but there is always only 1 coaxial cable connection to each home so only 1 company you will be able to get it from.

If you are one of the lucky few, then you also have the option of fiber internet. But fiber actually provides broadband upload in addition to broadband download, so if you get cut off from fiber, then you are left with no option for broadband upload.


Private torrent trackers. Check out r/trackers.

It requires time and effort though.


How does that help? Your ISP is not looking at your HTTPS traffic, but your torrent traffic, right?


I don't think it's the ISP monitoring for torrent traffic, but copyright holders honeypotting torrents and keeping track of the IP of seeders, maybe leechers too I guess. If the IP holder isn't on the tracker they don't see that you're pirating their stuff.


Whoever is sniffing, is surely sniffing DHT too, right? I would not count on all private trackers users disabling that.


Private trackers' "business model", as it were, relies on a central tracker tracking the download and upload ratio of your client. They explicitly disable DHT as it defeats the point of the economy.
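
For context, the usual mechanism for this is the `private` flag from BEP 27: a torrent whose info dictionary contains `private` set to 1 asks clients to get peers only from its trackers. A minimal sketch of the check a client might make, assuming the info dictionary has already been bdecoded into a Python dict with bytes keys (`dht_allowed` is a hypothetical name):

```python
def dht_allowed(info: dict) -> bool:
    """Return False when the info dictionary marks the torrent as private."""
    # BEP 27: "private" == 1 means peers should come from the tracker only.
    return info.get(b"private", 0) != 1
```

Compliance is up to each client, so this shows the convention rather than a guarantee.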


The ISP in general isn’t going looking for your torrent traffic. The DMCA groups are, and when they find it, they contact your ISP to enforce it. If it’s a private tracker, there’s a decent chance the DMCA groups’ tracking bots aren’t able to see that traffic.


Aren't some ISPs in the US owned by media companies? For those there would be a clear incentive to scan their customers' traffic for copyright violations, right?


Yes as the above commenters say the copyright holders (their enforcers really) don't have access to the private trackers swarm and will be unable to report your IP address.


You can hire a 3rd party to do the torrenting and then stream from them (put.io, etc.), but there's no guarantee they can't be compelled to hand over their logs.


2016 database, someone needs to update it. And how does one even get all the magnet links like this?


One way is querying the dht
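
Once you have an info hash from the DHT, the magnet link itself is just string formatting. A minimal sketch for v1 (SHA-1, 20-byte) info hashes; `make_magnet` is a made-up helper name, and the optional display name would normally come from the torrent metadata fetched from peers:

```python
from urllib.parse import quote


def make_magnet(info_hash: bytes, name: str = "") -> str:
    """Build a magnet URI from a 20-byte info hash and an optional display name."""
    if len(info_hash) != 20:
        raise ValueError("expected a 20-byte SHA-1 info hash")
    uri = "magnet:?xt=urn:btih:" + info_hash.hex()
    if name:
        uri += "&dn=" + quote(name)  # percent-encode the display name
    return uri
```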


Funny, most of those torrents have a __padding_file, which comes from BitComet, a bad torrent client... I think?

Another thing: I've realized that there are fewer and fewer torrent indexes over the last 5 years; they all end up offline or full of ads.


It's part of BEP-47 [1], an extension meant to make it easier to handle single file downloads and multiple torrents serving the same file. BEP-47 aware clients, which is most of them these days, I think, hide these files from the user.

[1] https://www.bittorrent.org/beps/bep_0047.html


> Funny, most of those torrent got a __padding_file

Those are supposed to be hidden by clients. They represent "free space" between files, allowing clients to download individual small files without having to also download the files adjacent to them.
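
The arithmetic behind that "free space" is simple: each padding file is sized so the next real file starts on a piece boundary. A minimal sketch, assuming every file is aligned to the piece length and that no padding is added after the last file (both are assumptions of this example; the function names are made up):

```python
def padding_after(file_length: int, piece_length: int) -> int:
    """Bytes of padding needed so the next file starts on a piece boundary."""
    return -file_length % piece_length


def layout_with_padding(file_lengths, piece_length):
    """Return (kind, length) entries with pad entries inserted between files."""
    entries = []
    last = len(file_lengths) - 1
    for index, length in enumerate(file_lengths):
        entries.append(("file", length))
        pad = padding_after(length, piece_length)
        if pad and index < last:  # skip padding after the final file
            entries.append(("pad", pad))
    return entries
```

With 64-byte pieces, a 100-byte file is followed by 28 bytes of padding, so the next file begins at offset 128 and none of its pieces overlap the first file.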


yeah bitcomet had a similar, but incompatible scheme


How is this different from https://torrents-csv.ml/?


seems to have a whole lot more results, for one


The one posted here is a dump from 2016, and https://torrents-csv.ml/ has imported piratebay dumps from then (probably from the same source). The difference is that https://torrents-csv.ml/ does health checks on the torrents and removes dead ones, which means that a lot of the torrents in this dump will have been removed, since they probably died either before 2016 or in the roughly 6 years since.


Is the torrent link in the database?



