I don’t see how the GIL makes writing thread safe software any easier. The GIL m...

chacham15 · on Oct 17, 2021

There are certain classes of errors that it prevents. E.g:

Thread1: a = 0xFFFFFFFF00000000

Thread2: a = 0x00000000FFFFFFFF

One might think that the two possible values of a if those are run concurrently are 0xFFFFFFFF00000000 and 0x00000000FFFFFFFF. But actually 0x0000000000000000 and 0xFFFFFFFFFFFFFFFF are also possible because the load itself isnt atomic.

The GIL (AFAICT) will prevent the latter two possibilities.

adrian_b · on Oct 17, 2021

Most CPUs guarantee that aligned loads and stores up to the register size, i.e. now usually up to 64-bit, are atomic.

The compilers also take care to align most variables.

So while your scenario is not impossible, it would take some effort to force "a" to be not aligned, e.g. by being a member in a structure with inefficient layout.

Normally in a multithreaded program all shared variables should be aligned, which would guarantee atomic loads and stores.

knorker · on Oct 17, 2021

Well, thread safety is exactly about these cases of "well, it's hardly ever a problem".

Real life bugs have come from misapplication of correct parameters for memory barriers, even on x86. Python GIL removes a whole class of potential errors.

Not that I'm against getting rid of the GIL, but I'm more sceptical that it won't trigger bugs.

Though in my opinion python just isn't a good language for large programs for other reasons. But it'd be nice to be able to multithread some 50 line scripts.

Thorrez · on Oct 18, 2021

Python's integers are arbitrary precision. If large enough, that certainly won't be atomic on any normal CPU. I'm not sure how the arbitrary precision integers work internally, but it's possible they wouldn't be atomic for any value.

vvanders · on Oct 17, 2021

"Most" ends up being surprising(ex: ARM has a fairly weak memory model). I've seen a lot of code with aligned access and extensive use of "volatile" in MSVC/x86 explode once it was ported to other architectures.

adrian_b · on Oct 18, 2021

Older ARM CPUs did not have a well defined memory model, but all 64-bit ARM CPUs have a well defined behavior, which includes the atomicity of any loads and stores whose size is up to 64-bit and which are aligned, i.e. the same as Intel/AMD CPUs.

The current ARM memory model is more relaxed regarding the ordering of loads and stores, but not regarding the atomicity of single loads and stores.

ignoramous · on Oct 17, 2021

Related: Symmetric Multi-Processor primer for Android, https://developer.android.com/training/articles/smp (although for Android/ARM, it makes for a pretty good read on the topic).

intrepidhero · on Oct 17, 2021

But if you're writing to the same object from two different threads you're going to have undefined behavior regardless of the GIL, yes?

hexane360 · on Oct 17, 2021

It depends what you mean by 'undefined behavior'. The GIL makes operations atomic on the bytecode instruction level. Critically, this includes loading and storing of objects, meaning that refcounting is atomic. However, this doesn't extend to most other operations, which generally need to pull an object onto the stack, manipulate it, and store it back in separate opcodes.

So with Python concurrency, you can get unpredictable behavior (such as two threads losing values when incrementing a counter), but not undefined behavior in the C sense, such as use-after-free.

fulafel · on Oct 17, 2021

No. The L in GIL stands for lock. So only the thread that holds it can write or read from the object, and the behavior is well defined at the C level, because C lock acquire and release operations are defined to be memory barriers.

dkersten · on Oct 17, 2021

But when each thread reads the variable, you have no control over which value you see, since you don't control when each thread gets to run. So its undefined in the sense that you don't know which values you will get: a thread might get the value it wrote, or the value the other thread wrote. The threads might not get the same value either.

The GIL exists to protect the interpreters internal data, not your applications data. If you access mutable data from more than one thread, you still need to your own synchronisation.

Thorrez · on Oct 18, 2021

How is the GIL different from atomics in other languages? There are many cases where atomics are useful.

One example would be incrementing a counter for statistics purposes. If the counter is atomic, and the reader of the value is ok with a slightly out of date value, it's fine. If code is doing this in GIL Python, it's working now, and will break after the GIL is removed.

dkersten · on Oct 18, 2021

> If the counter is atomic

if

I know you came to the same conclusion in another comments, but here's a look at it using the `dis` module:

    a += 1

turns into

    LOAD_FAST 1    (loads b)
    LOAD_CONST 1   (loads the constant 1)
    INPLACE_ADD    (perform the addition)
    STORE_FAST 1   (store back into b)

So if the interpreter switches threads between LOAD_CONST and STORE_FAST (so either before or after the INPLACE_ADD), you could clobber the value another thread wrote to `b`.

From your other comment:

> so even though an increment on a loaded int is atomic, loading it and storing it aren't

Its always the loads and stores that are the problem.

But that's the problem with relying on the GIL: it has lock in the name, but it does not protect you, unless you understand the internals and know what you're doing. It protects the interpreter. This isn't much different from programming in other languages without a GIL: if you understand the internals what you're doing you may or may not need locks, because you will know what is and isn't atomic (and even when things are atomic, its still difficult to write thread-safe code! lock-free algorithms are much harder than using mutexes).

Thread safe code requires thinking hard about your code, the GIL does not protect you from that.

dkersten · on Oct 18, 2021

Too late to edit correction: between LOAD_FAST (not LOAD_CONST) and STORE_FAST

fulafel · on Oct 18, 2021

Numbers are immutable and incrementing one eg in an attribute is actuay many byte code ops. So doesn't work even currently unless you are fine with losing updates. But a version of this question using another example (eg using a list as a queue) is interesting.

Thorrez · on Oct 18, 2021

Hah, we both came to the same conclusion at the same time.

Regarding list, it sounds like it might actually keep working atomically without the GIL:

> A lot of work has gone into the list and dict implementations to make them thread-safe. And so on.

fulafel · on Oct 18, 2021

Yep. So in a final answer to the original question of backwards bug-compatibility (https://news.ycombinator.com/item?id=28897534), it seems that it will be retained under the current proposal.

detaro · on Oct 18, 2021

What atomic counter increment exists currently and would be broken by this proposal?

Thorrez · on Oct 18, 2021

Hmm, so even though an increment on a loaded int is atomic, loading it and storing it aren't.

https://stackoverflow.com/a/1717514

The Python documentation seems misleading to me on this:

> In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are.

    count = 0
    def inc():
      count += 1

sure "looks atomic" to me, so according to the documentation should be, but isn't.

On the other hand, I think you could build a horribly inefficient actual atomic counter with

    count = []
    def inc():
      count.append(None)
    def get_count():
      return len(count)

I think it's quite likely there's correct and non-horrible code relying on list append and length being atomic. Although it sounds like this might continue working without the GIL:

> A lot of work has gone into the list and dict implementations to make them thread-safe. And so on.

So removing the GIL might not be a problem.

detaro · on Oct 18, 2021

Indeed, the proposal explicitly covers maintaining existing guarantees, which is why I was confused that so many people just assumed it would break them.

dragonwriter · on Oct 18, 2021

> How is the GIL different from atomics in other languages?

Atomics are isolated to where they are needed, the GIL is global though C (but not Python) code can release it.

Thorrez · on Oct 18, 2021

My question was about from a program correctness standpoint, not an efficiency standpoint. But it does look there is a difference from a correctness standpoint, as other comments address.

fulafel · on Oct 18, 2021

I agree that in principle the GIL is there for the interpreter, but in practice in globally ensuring all Python objects are in a coherent and safe state for each running thread, it makes many things quite thread-safe. For example you could use a normal list as a fifo queue, if you were to rely on the current behaviour.

dkersten · on Oct 19, 2021

Sure, but you really need to understand the internals to be able to make these assumptions.

For instance, in your example, I know that the call to the C code to do the list operations is atomic (a single bytecode instruction), but I can't assume that all such calls to C code are safe because of this, unless I know for sure that the C code doesn't itself release the GIL. I assume that simple calls like list append/pop wouldn't have any reason to do this, but I can't assume this for any given function/method call that delegates to C, since some calls do release the GIL.

So, with or without GIL, you either really need to understand what's going on under the hood so you can avoid using locks in your code (GIL or atomics-based lock-free programming), or you use locks (mutex, semaphore, condition variables etc). No matter what you do, to write thread safe programs, you need to understand what you're doing and how things are synchronizing and operating. The GIL doesn't remove that need.

Of course, removing the GIL removes one possible implementation option, I just don't believe the GIL really makes it any easier. Once you know enough internals to know what is and isn't safe with the GIL, you could just as easily do your own syncrhonization.

ajkjk · on Oct 17, 2021

It prevents classes of errors, such as, as the parent mentioned, non-atomic writes to individual variables.

NovemberWhiskey · on Oct 17, 2021

Not really. If you're doing an atomic write to the same object from two different threads, you're going to have one win the race and the other lose. That may be a bug in your code, but it's not undefined behavior at the language level.

heavenlyblue · on Oct 18, 2021

Python doesn't care about these errors (because python's integers are not as simple as 64-bit operations), however if you generalise this error to a "transaction" of two operations (registers in this case), you'd end up with the same ability to see a view of a state that should not be seen by another thread.

snek_case · on Oct 17, 2021

AFAIK CPUs implement atomic load and store instructions and the performance overhead of these is very small compared to something like a software busy lock. So I think it's quite possible to take away the GIL while still making it impossible to load only half of a value.

dleslie · on Oct 17, 2021

> The GIL might prevent two Python threads executing simultaneously, but it doesn’t change the fact that a Python thread can be preempted, meaning your global state can change at any point during execution without warning.

That thread behavior is enough to reduce the likelihood of races and collisions; particularly if the critical sections are narrow.

avianlyric · on Oct 17, 2021

That just means the GIL is good at hiding concurrency bugs. It doesn’t make writing correct code any easier. Arguably you could say it makes writing correct concurrent code harder, because it’ll take significantly longer for concurrency bugs cause errors.

laserlight · on Oct 17, 2021

I wouldn’t call it thread-safe when race conditions are possible.

jldugger · on Oct 17, 2021

Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?

nuerow · on Oct 17, 2021

> Then we need a term for when code race conditions are possible but rare enough that nobody using the software notices. thread-timebomb?

There's already a term for that: not thread-safe.

The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occurr. It only assesses whether a specific class of problems is eliminated or not.

jldugger · on Oct 17, 2021

>The definition of thread safety does not include theoretical or practical assessments regarding how frequent a problem can occur.

Well, obviously.

The challenge I am putting forth on HN is to meaningfully describe _usable_ thread-unsafe software. If you've spent enough time outside university, you'll be aware that there are all kinds of theoretical race conditions that are not triggered in practical use.

klyrs · on Oct 17, 2021

If you've worked at industrial scale, you'll be aware that even the most theoretical-seeming race condition will be triggered frequently.

The_Colonel · on Oct 17, 2021

That reminds me how I was called to fix some Java service, which was successfully in production for 10 years with hardly any incident, but it suddenly started crashing hard, all the time. It was of course a thread safety issue (concurrent non-synchronized access to hashmap) which laid dormant for 10 years only to wreak havoc later.

Nothing obvious changed (it was still running a decade old JRE), perhaps it was a kernel security patch, perhaps a RAM was replaced or even just the runtime data increased/changed in some way which woke up this monster.

jldugger · on Oct 18, 2021

> If you've worked at industrial scale,

Fun fact, I actually do! It's from that perspective I wrote that: every time you perturb the software environment, a new set of bugs that didn't happen in the old env before arises.

dkersten · on Oct 17, 2021

That's not useful. If you have a race condition, you will eventually hit it and when you do, you may get incorrect results or corrupt data. Thread unsafe is thread unsafe, regardless how rare it appears to be.

Also, rare on one computer (or today's computer) might not be rare on another (tomorrows faster one for example).

These types of bugs are also very hard to detect. You might not know your data is corrupted. Reminds me of how bad calculations in excel has cost companies billions of dollars, except now, the calculations could be "correct" and the error sitting dormant, just waiting for the right timings to happen. Much better to not make assumptions about the safety and think about it up front: if you are using multiple threads, you need to carefully consider your thread safety.

phkahler · on Oct 18, 2021

>> then we need a term for when code race conditions are possible but rare enough

There is no such thing as "rare enough". Random or probabilistic bugs are one of the worst things software can have.

lazide · on Oct 18, 2021

While I agree - plenty of CIOs and managers vote the opposite with their wallets.

Why not just set it to auto reboot every week, that seems to fix it - right?

phkahler · on Oct 18, 2021

MCAS only made the planes crash sometimes and then only when a sensor failed

formerly_proven · on Oct 17, 2021

Heisenbug.

Brian_K_White · on Oct 17, 2021

There is no such thing as "likely". A thing is either possible or not possible.

kingofpandora · on Oct 17, 2021

Likelihood refers to probability not possibility.

Brian_K_White · on Oct 17, 2021

There is no such thing as probability. All there is is possible and not possible.

I don't know how the point of the comment could be missed, but what I am saying is, it is a mistake, a rookie baby not-a-programmer not even any kind of engineer in any field, to even think in those sorts of terms at all. At least not in the platonic ideal worlds of math or code or protocol or systems design or legal documents, etc.

Physical events have probability that is unavoidable. How fast does the gas burn? "Probably this fast"

There is no excuse for any coder to even utter the word "likely".

The ONLY answers to "Is this operation atomic?" or "Is this function correct?" or "Does this cpu perform division correctly?" Is either yes or no. There is no freaking "Most of the time."

"Likely" only exists in the realm of user data and where it is explicitly created as part of an algorythm.

yholio · on Oct 17, 2021

There are whole branches of computer science and IT dedicated to reducing the likelihood of unpleasant outcomes: cryptography, security, disaster recovery etc.

You cannot guarantee your public key algorithm is impossible to break, but you can use keys long enough that an attacker has an arbitrarily low chance of succes with the best known methods.

You cannot prove your program is bug free, outside of highly specialized fields like aircraft control, but you can build a multi-layered architecture that can reduce the likelihood of successful intrusion. You cannot prevent a EMP bomb from wiping all your hard-drives at once, but you will likely maintain integrity of your database for uncorrelated hardware errors.

"Likely" is a tool that works in the real world. If you will chase mathematic certainty, your competition will likely eat your lunch.

Where you might be correct is that "unlikely" is very close to "likely" in the particular topic of thread safety, you just need a sufficiently large userbase with workloads and environments sufficiently different from your test setup.

Brian_K_White · on Oct 18, 2021

I have already eaten my competitions lunch through not being afraid of a little rigor, and not leaving a wake of shit that only works on good days behind me.

stefan_ · on Oct 17, 2021

From running the same software on two moderately powerful embedded systems, one single-core and one multi-core, the latter is a lot more reliable in immediately exposing races and concurrency issues.

dehrmann · on Oct 17, 2021

It's more cpython than the GIL, but it lets you get away with using += and certain dict operations without locks.

hexane360 · on Oct 17, 2021

Is this true? It looks like += compiles to four bytecode instructions: two loads, an increment, and a store. It should be possible for a thread to get paused after the load but before the store, resulting in a stale read and lost write.

Some more discussion here: https://stackoverflow.com/questions/1717393/is-the-operator-...

pansa2 · on Oct 17, 2021

With the GIL, for an int i, `i += 1` is not thread-safe, but IIRC for a list l, `l += [1]` (i.e. extend) is.

Presumably this patch changes the list implementation in some way so that the extend operation remains thread-safe without the GIL.

dehrmann · on Oct 17, 2021

Maybe it's just certain collection operations, then.

detaro · on Oct 18, 2021

And the proposal clearly addresses adjusting those to maintain the expected guarantees.