A friend of mine worked for two years at YouTube as a content admin.
Basically he was given videos to watch all day, especially ones coming from the Middle East (this was the ISIS era, so any video from the area had someone watching it as soon as it was uploaded).
Needless to say, according to him there's endless gold among the videos with no views.
It's also interesting that it was an open secret: already in 2018 they were all told that they were essentially training machines to do their job.
That would be an odd thing to do. HD is low resolution already, and 480 is noticeably worse.
If they really wanted to compress, take out every other frame, and regenerate those frames with a neural decoder. But I don't know why that would be worth the effort for a stable number of low res files either.
I wonder if that still holds true? The volume of videos increases exponentially, especially with AI slop. I wonder if at some point they will have to limit the storage per user, with a paid model if you surpass that limit. I guess many people who upload a lot of videos have some form of income from YouTube, so it wouldn't be that big of a deal.
What they said only holds true because the growth continues, so the old volume of videos doesn't matter as much: there are so many more new ones each year compared to the previous year. So the question is more about whether or not it will hold true in the long term, not today.
The framing here is really weird. The volume of videos increasing isn't 'growth.' Videos are inventory for Youtube. They're only good when people (without adblocks!) actually watch them.
Growth in this context is that there is a larger volume of videos each year. So each year a single video is an exponentially smaller and smaller percentage of the total.
For example, say that in year N YouTube gets f(N) new videos. Let's assume f(N) = cN^2. That's a crazy rate of growth, far faster than real-world YouTube, which grew rather linearly.
But the count of "videos that are older than 5 years" still grows faster than that, because it is a running total of f and would be cubic instead of quadratic. Unless the growth is really exponential (it isn't), "videos that are older than 5 years" will always surpass "new videos this year" eventually.
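To make the quadratic-versus-cubic argument concrete, here is a minimal sketch. It assumes years are numbered from 1, that "older than 5 years" in year N means uploaded in year N - 5 or earlier, and c = 1 (the constant cancels out of the comparison). The function names are made up for illustration, and the 2^N curve is only there as a stand-in for the "really exponential" case.

```python
from typing import Callable, Optional


def quadratic(year: int) -> int:
    """New videos uploaded in a given year under the assumed f(N) = N^2."""
    return year ** 2


def exponential(year: int) -> int:
    """New videos per year under a hypothetical doubling-every-year model."""
    return 2 ** year


def videos_older_than(year: int, growth: Callable[[int], int], age: int = 5) -> int:
    """Total videos uploaded in year (year - age) or earlier."""
    return sum(growth(k) for k in range(1, year - age + 1))


def crossover_year(growth: Callable[[int], int], age: int = 5,
                   max_year: int = 200) -> Optional[int]:
    """First year in which old videos outnumber that year's new uploads.

    Returns None if no such year is found up to max_year.
    """
    for year in range(1, max_year + 1):
        if videos_older_than(year, growth, age) > growth(year):
            return year
    return None


if __name__ == "__main__":
    print("quadratic crossover:", crossover_year(quadratic))
    print("exponential crossover:", crossover_year(exponential))
```

With quadratic growth the old videos overtake the new ones after a handful of years; with yearly doubling they never do within the searched range, which matches the "unless it's really exponential" caveat.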
Video sensors are continuously getting cheaper, better and more prevalent over time. The trend is towards capturing all angles of everything, everywhere, at increasingly higher resolutions.
Maybe it could be used to train a neural network. Maybe it contains dirt on a teenager, who might become a politician two decades from now. Maybe it contains an otherwise lost historical event.
Or it just helps to cement YouTube as the go-to place for uploading and sharing videos for almost any purpose, which has a long-term positive effect for user engagement and retention.
I assume it's an economics issue. As long as they continue making money off the uploads to a higher extent than it costs for storage, it works out for them.
One day, it will matter. Not even Google can escape the consequences of infinite growth. Kryder's Law is over. We cannot rely on storage getting cheaper faster than we can fill it, and orgs cannot rely on being able to extract more value from data than it costs to store it. Every other org knows this already. The only difference with Google is that they have used their ad cash generator to postpone their reality check moment.
One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.
You can see how Google rolls with how they deleted old Gmail accounts - years of notice, lots of warnings, etc. They finally started deletions recently, and I haven't heard a whimper from anyone (yet).
The problem is that some content creators have already passed away (and others will pass away by then), and their videos will likely be deleted forever.
That may be, but I assume for videos that had some viewership base, there may be a consideration. E.g. if a video was viewed 20 million times, it may be worth more than one that was viewed only 5 times.
I've stumbled upon very valuable content with very low view numbers - the algorithms revolve around spectacle and provocation, not quality or insight.
Goog is 100% not going to delete anything that is driving any advertising at all. The videos are also useful for training AI regardless, so I expect the set of stuff that's deleted will be a VERY small subset. The difference with email is that email can be deduplicated, since it's a broadcast medium, while video is already canonical.
I expect rather than deleting stuff, they'll just crank up the compression on storage of videos that are deemed "low value."
I met a user from an antique land
Who said: Two squares of a clip of video
Stand in at the end of the search. Near them,
Lossly compressed, a profile with a pfp, whose smile,
And vacant eyes, and shock of content baiting,
Tell that its creator well those passions read
Which yet survive, stamped on these unclicked things,
The hand that mocked them and the heart that fed:
And on the title these words appear:
"My name is Ozymandias, Top Youtuber of All Time:
Look on my works, ye Mighty, and like and subscribe!"
No other video beside remains. Round the decay
Of that empty profile, boundless and bare
The lone and level page stretch far away.
Would've been, once. These days I assume bentcorner asked their favourite LLM to generate a poem parodying Ozymandias about once-popular youtube videos.
It doesn't feel like it at all (I'd never expect an LLM to say 'pfp' like that, or 'lossly[sic] compressed', ASCII instead of fancy quotes) but who knows at this point.
I may have gotten incredibly neurotic about online text since 2022.
I actually considered using an LLM but in my experience they "warp" the content too much for anything like this. Getting them to produce something I would consider to my taste would take longer than just writing the poem myself. (Although tbf it's been a while since I've asked an LLM to do parody work, so I could be wrong)
Dropbox seems to be doing the same thing. After years of them whining about me being 2TB above the limit, I recently received a mail with a deadline to delete my files or they will.
It depends. At the roughly 2 PB of new data they get per day, that's about 10 sq ft of physical rack space per day. Each data center is something like 500,000 sq ft, so each data center can hold about 137 years of YouTube uploads (500,000 / 10 = 50,000 days). They're not going to have to restrict uploads anytime soon.
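As a quick check of that back-of-the-envelope estimate, here is a minimal sketch that takes the two figures in the comment at face value (about 10 sq ft of rack space per day and about 500,000 sq ft per data center). The constant upload rate and the idea that the whole floor area is usable for racks are simplifying assumptions, and the function name is made up for illustration. Under those assumptions the division comes out to 50,000 days, or somewhere around 137 years.

```python
# Figures quoted in the comment; both are rough estimates, not measured values.
RACK_SQFT_PER_DAY = 10        # floor space consumed by one day of uploads
DATACENTER_SQFT = 500_000     # floor space of one data center
DAYS_PER_YEAR = 365.25


def years_of_uploads(datacenter_sqft: float, sqft_per_day: float) -> float:
    """Years of uploads one data center could hold at a constant daily rate."""
    days = datacenter_sqft / sqft_per_day
    return days / DAYS_PER_YEAR


if __name__ == "__main__":
    years = years_of_uploads(DATACENTER_SQFT, RACK_SQFT_PER_DAY)
    print(f"{years:.1f} years of uploads per data center")
```

If uploads keep growing year over year, as discussed elsewhere in the thread, the real figure would be lower than this constant-rate estimate.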
Oh. I noticed in an AI music generation service I use that old pieces were severely degraded to the point that they were crackling really bad... And I remember thinking that it's a good thing I downloaded an mp3 of my favorites. I confirmed that the quality is very different by listening to the downloaded recording with the hosted version side-by-side.
The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.
It might really just be cheaper to keep buying new HDDs.
This is why they removed searching for older videos (by specific time) and why their search pushes certain algorithmic videos. Other, older videos, when found by direct link, are on long-term storage and take a while to start loading.
Well, the time filters (before/after:date) still seem to work, but for controversial / hot topics, somehow, more recent videos still tend to show up at the top. Try "scandal after:2010 before:2012".
Besides, with their search deteriorating to the point where a direct video title doesn't result in a match, nobody can see those videos anyway and they don't have to cache them.
It's not just the search deteriorating. The frontend is littered with bugs. If you write a comment and try to highlight and delete part of that comment, it'll often delete the part you didn't highlight. So apparently they implemented their own textfield for some reason and also fucked it up. It's been like that for years.
The youtube shorts thing is buggy as shit. It'll just stop working a lot of the time, just won't load a video. Sometimes you have to go back and forth a few times to get it to load. It'll often desync the comments from the video, so you're seeing comments from a different video. Sometimes the sound from one short plays over the visuals of another.
It only checks for notifications when you open the website from a new tab, so if you want to see if you have any notifications you have to open youtube in a new tab. Refreshing doesn't work.
Seems like all the competent developers have left.
Yeah, one that I forgot to mention is if you pause a youtube short and go to a different tab, the short will unpause in the background, or it might change to an entirely different short and start playing that.
They said it didn't matter, because the sheer volume of new data flowing in was growing so fast that it made the old data just a drop in the bucket.