Yeah, if your program has a natural notion of a "frame" (e.g. most video games), you can do memory management by simply incrementing an integer offset into a preallocated buffer (bump allocation). At the end of the frame, you reset the integer to zero. You can't really get any faster than that.
An additional benefit of this style of allocation over malloc/free is that objects of the same type end up contiguous in memory, so iterating over them is much faster because there are fewer cache misses.
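Roughly what that looks like, as a minimal C++ sketch (names and layout are mine; assumes power-of-two alignment):

    #include <cstddef>
    #include <cstdint>

    struct FrameArena {
        std::uint8_t* base;   // one big buffer allocated at startup
        std::size_t   cap;    // its size in bytes
        std::size_t   used = 0;

        // Allocation is just "align up, bump the integer".
        void* alloc(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
            std::size_t p = (used + align - 1) & ~(align - 1);
            if (p + n > cap) return nullptr;  // frame budget exhausted
            used = p + n;
            return base + p;
        }

        // End of frame: "free" everything at once.
        void reset() { used = 0; }
    };

Objects allocated back to back within one frame come out of the same contiguous buffer, which is where the cache-friendliness mentioned above comes from.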
> Before their papers, mathematicians had assumed that even though the number line might look like a continuous object, if you zoomed in far enough, you’d eventually find gaps.
I'll try to interpret this sentence.
We all have some mental imagery that comes to mind when we think about the number line. Before Cantor and Dedekind, this image was usually a series of infinitely many dots arranged along a horizontal line. Each dot corresponds to some quantity, like sqrt(2) or pi, that arises from mathematical manipulation of equations or geometric figures. If we ever find a gap between two dots, we can think of a new dot to place between them (an easy way is to take their average). However, we will also be adding two new gaps. So this mental image also has infinitely many gaps.
Dedekind and Cantor figured out a way to fill all the gaps simultaneously instead of dot by dot. This method created a new sort of infinity that mathematicians were unfamiliar with, and it was vastly larger than the gappy sort of infinity they were used to picturing.
We've known since Zeno that all of our ways of visualizing infinity in finite terms are incomplete and provably incorrect, despite being unavoidable in human thinking. In other words, we knew the "gaps" reflected incomplete reasoning, not real emptiness between "consecutive" numbers. If Dedekind and Cantor only changed how we visualize infinity, I don't understand why it would cause a stir.
> This method created a new sort of infinity that mathematicians were unfamiliar with, and it was vastly larger
I understand that the construction of the reals paved the way for the later revolutionary (and possibly disturbing, for people with strongly held philosophical beliefs about infinity) discovery that one infinity could be larger than another. But in the narrative laid out by the article, that comes later, and to me it's clear (unless I misread it) that the part I quoted is about the construction of the reals, before they worked out ways to compare the cardinality of the reals to the cardinality of the integers and the rationals.
"Knowing" something and proving it mathematically are two different beasts.
Zeno couldn't prove that there were no gaps; he showed that infinity was different from how we understood finite things, but that's not the same as proving there are no gaps.
Later, mathematicians proved the existence of irrational numbers. These were "gaps" in the rational numbers, but they weren't all the "same," if that makes sense. The square root of 2 and Euler's number are both irrational, but it's not immediately clear how you'd make a set that includes all the numbers like that.
I'm not sure everyone knew that gaps reflected incorrect reasoning. It would have been natural to assume that all infinite sets were qualitatively the same size, since uncountable infinity was not an idea that had been discovered yet. Zeno's own resolution wasn't that his reasoning was wrong, but that our perception of the world itself is wrong and the world is static and unchanging.
As for the importance of visualization (of the reals), I don't think you can cleanly separate it from formalism (as constructed in set theory).
I think we all have built-in pre-mathematical notions of concepts like number, point, and line. For some, the purpose of mathematics is to reify these pre-mathematical ideas into concrete formalism. These formalisms clarify our mental pictures, so that we can make deeper investigations without being led astray by confused intuitions. Zeno could not take his analysis further, because his mental imagery was not detailed enough.
From clarity we gain the ability to formalize even more of our pre-mathematical notions like infinitesimal, connectedness, and even computation. And so we have a feedback loop of visualization, formalism, visualization.
I think the article was saying that Dedekind and Cantor clarified what we should mean when we talk about the number line, and dispelled confusions that existed before then.
> If Dedekind and Cantor only changed how we visualize infinity, I don't understand why it would cause a stir.
Because scientific progress is explicitly the process of replacing the general mental model people use to approach a problem with a more broadly capable and repeatable set of operations.
I should have been more specific; I understand why it was a mathematical breakthrough. What I don't understand is why it would have triggered some kind of psychological horror or philosophical crisis. It was a new way of understanding numbers, but it didn't reveal numbers to be acting any differently than we had always assumed.
If anything, it seems like it would have been comforting to finally have mathematical constructions of the real numbers. It had been disturbing that our previous attempts, the rational and algebraic numbers, were known to be insufficient. The construction of the reals finally succeeded where previous attempts had failed.
I would invite you to be more open to the idea that people don't live in a world where they operate inside a theoretical framework with localized test actions.
Major breakthroughs tend to cause existential crises because most people don't have the full scope of their work needed to understand where it is broken.
Because painting those who objected to these definitions of mathematical infinity as "horrified" and "disturbed" was a form of character assassination, which was not uncommon at the time. The high moderns didn't play.
Extraordinary claims require extraordinary evidence. Can you cite any claims by mathematicians that there were "gaps"? It isn't even true for rational numbers that you can identify an unoccupied "gap".
Yeah, it took me a second, too. By "gaps" they mean numbers that can't be represented in a given construction. So irrational numbers are "gaps" in the rational numbers, and transcendental numbers are "gaps" in the algebraic numbers. Not the best spatial metaphor.
You're thinking of this with the benefit of Dedekind in your schooling - whether or not your calculus class told you about him.
Completeness - a gapless number line - was neither obvious nor easy to prove; the construction is usually elided even in undergraduate calculus unless you take an actual "real analysis" course.
The issue is this: for any given number you choose, I claim: you cannot tell me a number “touching” it. I can always find a number between your candidate and the first number. Ergo - the onus is on you to show that the number line is in fact continuous. What it looks like with the naive construction is something with an infinite number of holes.
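Concretely: whatever number q you offer as "touching" p, the midpoint sits strictly between them,

    p < \frac{p+q}{2} < q \quad \text{for any } p < q,

so no number has an immediate neighbor. Note, though, that this only shows density - the rationals already satisfy it - which is exactly why it can't settle whether the line still has holes like sqrt(2).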
I think you are getting away from my point, which pertains to the article's claim that mathematicians thought there were "gaps". Which mathematician? Can I see the original quote?
The linguistic sleight-of-hand is what I challenge. What is this "gap" in which there are no numbers?
- A reader would naturally assume the word refers to a range. But if that is the meaning, then mathematicians never believed there were gaps between numbers.
- Or could "gap" refer to a single number, like sqrt(2)? If so, it obviously isn't a gap with no number in it - sqrt(2) is itself a number.
- Or does it refer to gaps between rational numbers? In other words, not all numbers are rational? Mathematicians did in fact believe this, from antiquity even ... but that remains true!
Regarding this naive construction you are referring to: did it precede set theory? What definition of "gap" would explain the article's treatment of it?
I don't know the answers to all of your questions - but I believe you'd benefit from some mathematical history books around the formalization of real analysis; I'm not the best person to give you that history.
A couple of comments, though - first, all mathematics is linguistics, and arguably it is all sleight of hand. That said, the word "gaps" - which, as you've rightly pointed out, is vague - is a journalist's word standing in for a variety of concepts at different times.
The existence of the irrationals themselves was a secret in ancient Greece - and hence known for thousands of years - but the structure of the irrationals was not well understood until quite recently.
To talk precisely about these gaps, if you're not a mathematical historian, you have to borrow terminology from the tools that were used to describe and formalize the irrationals -> if former concepts about the line sound hand-wavy to you, it is because they WERE hand-wavy. And this hand-waviness is about infinity as well; the two are intimately connected. In modern terms, the measure of the rationals across any subset of the (real) number line is zero - that is the meaning of the "gaps". Between any two rationals there is a great unending sea where, if you were to choose a point completely at random, the odds of that point being another rational are zero.
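To spell out the measure-zero claim (the standard argument, my addition): list the rationals as q_1, q_2, q_3, ... and cover q_n with an interval of length \varepsilon / 2^n. The total length of the cover is at most

    \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon,

and \varepsilon can be made as small as you like, so the rationals have Lebesgue measure zero - they take up none of the line's "length" despite being dense in it.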
EDIT: for a light but engaging read about topics like this, David Foster Wallace’s Everything and More is excellent.
Of course a contractor could not decide to unilaterally shut off their missile system, because that would be a contract violation.
A contractor may try to negotiate that unilateral shut-off ability with the government, and the government should refuse those terms based on democratic principles, as Luckey said.
But suppose the contractor doesn’t want to give up that power. Is it okay for the government to not only reject the contract, but go a step further and label the contractor as a “supply chain risk?” It’s not clear that this part is still about upholding democratic principles. The term “supply chain risk” seems to have a very specific legal meaning. The government may not have the legal authority to make a supply chain risk designation in this case.
It sounds like the "supply chain risk" designation is just about anyone who works with the DoD not using them, so their code doesn't accidentally make it into any final products through some sub-sub-subcontractor. Since they've made it clear that they don't want to be a defense contractor (and accept the moral problems that go with it), the DoD is just making sure they don't inadvertently become one.
This really reminded me of the first part of Flowers for Algernon. The main character undergoes a treatment which improves his intelligence, and the story is narrated via a series of diary entries which become successively more fluent and sophisticated.
I read it decades ago, and from time to time I mention it to someone who has not read it and end up telling them the story... and I'm usually tearing up before getting to the end. Such a moving piece.
On the contrary, I think it demonstrates an inherent limit to the kind of tasks / datasets that human beings care about.
It's known that large neural networks can even memorize random data. The number of possible random datasets is unfathomably large, and the weights of networks trained on random data would probably not live in a low-dimensional subspace.
It's only the interesting-to-humans datasets, as far as I know, that drive the neural network weights into a low-dimensional subspace.
Each fine-tune drags the model weights away from the base model in a certain direction.
Given 500 fine-tune datasets, we might expect the 500 drag directions to span a 500-dimensional space. After all, 500 random vectors in a high-dimensional space are likely to be nearly orthogonal to one another.
The paper shows, however, that the 500 drag directions live in a ~40-dimensional subspace.
Another way to say it is that you can compress a fine-tune's weight delta into a vector of 40 floats.
Imagine if, one day, fine-tunes on Hugging Face were not measured in gigabytes, megabytes, or even kilobytes. Suppose you started to see listings like 160 bytes. Would that be surprising?
I'm leaving out the detail that the basis direction vectors themselves would have to be on your machine, and each basis direction is as big as the model itself. I'm also taking for granted that the subspace dimension will not increase as the number of fine-tune datasets increases.
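In symbols (my notation, not the paper's): with shared basis directions B_1, ..., B_40, each the size of the model, a given fine-tune's weight delta is approximately

    \Delta W_k = W_k - W_{\mathrm{base}} \approx \sum_{i=1}^{40} c_{k,i} B_i, \qquad c_k \in \mathbb{R}^{40},

so once the basis is on your machine, the per-fine-tune payload is just the coefficient vector c_k: 40 float32 values = 160 bytes, which is where the listing size above comes from.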
I agree that the authors' decision to use random models on Hugging Face is unfortunate. I'm hopeful that this paper will inspire follow-up works that train large models from scratch.
Agreed. What's surprising to me here isn't that the fine-tunes are compressible, it's the degree to which they're compressible. It seems like very little useful new information is being added by the fine-tune.
They're using SVD to throw away almost all of the "new information" and apparently getting solid results anyhow. Which of course raises interesting questions if replicable. The code doesn't seem to have been released yet though.
Some considerations can be similar, but the total set of details is different.
It also depends which lock free solutions you're evaluating.
Some are effectively higher-order spins (and share similar high-level problems); others have different secondary costs (such as copying). A common overlap is the inter-core, inter-thread, and inter-package side effects of synchronization points: for anything with a strong atomic in the middle, that means things like synchronization instruction costs, the pipeline impact of barriers/fences, etc.
In a very basic case, lock-free data structures make threads race instead of spin. A thread makes its own copy of the part of the list/tree/whatever it needs to update, applies its changes to that copy, and then tries to substitute its own pointer for the data structure's pointer if that pointer hasn't changed in the meantime (there is a CPU atomic instruction for exactly that, compare-and-swap). If the thread fails (someone changed the pointer in the meantime), it tries again.
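In C++ terms, that retry loop looks roughly like this (a sketch of a lock-free stack push; Node/Stack are my names, and real code would also need to handle memory reclamation and the ABA problem):

    #include <atomic>

    struct Node {
        int   value;
        Node* next;
    };

    struct Stack {
        std::atomic<Node*> head{nullptr};

        void push(int v) {
            Node* n = new Node{v, head.load(std::memory_order_relaxed)};
            // Install n only if head is still what we read into n->next.
            // On failure, compare_exchange_weak reloads head into n->next,
            // so we just try again - this is the "race instead of spin".
            while (!head.compare_exchange_weak(n->next, n,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
            }
        }
    };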
There are several classes of lock-free algorithms with different properties.
Lock-free algorithms for read-only access to shared data structures seldom have disadvantages (only when the shared data structure is modified so frequently by writers that readers never succeed in reading it between changes), so for read-only access they are typically the best choice.
On the other hand, lock-free algorithms for read/write access to shared data structures must be used with great caution, because they frequently have a higher cost than mutual exclusion. Such lock-free algorithms are based on the optimistic assumption that your thread will complete the access before the shared structure is touched by another competing thread. Whenever this assumption fails (which will happen under high contention), the transaction must be retried, which leads to much more wasted work than is wasted in a spinlock.
Lock-free algorithms for read/write access are normally preferable only when it is certain that there is low contention for the shared resource - but in that case a spinlock may also waste negligible time.
The term "lock-free" is properly applied only to the access methods based on optimistic access instead of mutual exclusion (which uses locks).
However, there is a third kind of access method, which uses neither optimistic access nor mutual exclusion with locks, so some authors may conflate such methods with the lock-free algorithms based on optimistic access.
This third kind of access method has properties very different from the other two, so it is better considered separately. It is based on partitioning the shared data structure between the threads that access it concurrently, and so it is applicable only to certain kinds of data structures, mainly arrays and queues.
Nevertheless, almost any kind of inter-thread communication can be reorganized around message queues and shared buffers, so most applications that use either mutual exclusion with locks or lock-free optimistic access can be rewritten to use concurrent access to dynamically partitioned shared data structures. There the access is deterministic, unlike with lock-free optimistic algorithms, and there are no explicit locks. The partitioning itself must be done with atomic instructions (usually atomic fetch-and-add), which contain implicit locks, but those are equivalent to extremely short locked critical sections.
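A minimal sketch of that partitioning step in C++ (my example; a real queue would also need a per-slot "ready" flag for consumers): each writer claims a unique slot with fetch-and-add, so no two threads ever touch the same element and there is no retry loop.

    #include <atomic>
    #include <cstddef>

    constexpr std::size_t N = 1024;
    int buffer[N];
    std::atomic<std::size_t> next_slot{0};

    // Reserve one index nobody else will get; deterministic, no retries.
    bool publish(int value) {
        std::size_t i = next_slot.fetch_add(1, std::memory_order_relaxed);
        if (i >= N) return false;   // buffer full
        buffer[i] = value;          // slot i is exclusively ours now
        return true;
    }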
Typically, lock-free algorithms do not retry when a read happens concurrently with a write [1], so they scale very well for read-mostly data or when data is partitioned between writers.
Scalability is always going to be poor when writers attempt to modify the same object, no matter the solution you implement.
[1] of course you could imagine a lock-free algorithm where reads actually mutate, but that's not common.
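A tiny illustration of [1] (my example): if writers publish immutable snapshots through an atomic pointer, readers just load the current snapshot - there is nothing to retry, no matter what writers are doing concurrently (reclamation of old snapshots elided):

    #include <atomic>

    struct Config { int a, b; };              // immutable once published
    std::atomic<const Config*> current{nullptr};

    int read_a() {
        const Config* c = current.load(std::memory_order_acquire);
        return c ? c->a : 0;                  // single load, no retry loop
    }

    void update(const Config* fresh) {        // writer side
        current.store(fresh, std::memory_order_release);
    }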