
This is pretty interesting stuff, but one note:

> The correction explains away the failures of randomization as an error in translation; the authors now claim that they let participants self-select their condition. This is difficult for me to believe. The original article stressed multiple times its use of random assignment and described the design as a "true experiment."

> They also had perfectly equal samples per condition ("n = 1,524 students watched a 'violent' cartoon and n = 1,524 students watched a 'nonviolent' cartoon.") which is exceedingly unlikely to happen without random assignment.

This is actually unlikely to happen even with random assignment. The only way you reliably get equal numbers in each bin is if your process is intentionally constrained to do that. If each participant were assigned independently at random, the odds of putting exactly 1,524 in each bin would be C(3048, 1524) / 2^3048, or about 1.4%.
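To put a number on it, a quick back-of-the-envelope check (a Python sketch; the variable names are mine):

```python
from math import comb

# Chance that independent coin-flip assignment of 3,048 participants
# lands exactly 1,524 in each of the two groups.
n = 3048
p = comb(n, n // 2) / 2 ** n
print(f"{p:.3f}")  # about 0.014, i.e. roughly 1.4%
```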



Here, "random assignment" means randomly assigning half the participants to each of two bins, I think.


Right --- it's called block randomization.


How do participants get assigned to the respective halves?


The simplest possible algorithm would be:

1. Shuffle the list of participants.

2. Put the front half of the list into one half of the trial, and the back half of the list into the other half.

Generalizing this to more than two groups is straightforward. This algorithm is mentioned sidethread, by sterlind, with the (meaningless) modification of splitting the list even-and-odd instead of front-and-back. As I mentioned there, you can only do this if the list of participants is fixed before the beginning of the study, which is not in general the case.
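The two steps above can be sketched in a few lines (Python; the function name is my own):

```python
import random

def shuffle_and_split(participants, n_groups=2):
    """Shuffle the full participant list, then cut it into equal slices."""
    shuffled = list(participants)   # leave the caller's list untouched
    random.shuffle(shuffled)        # uniform permutation (Fisher-Yates)
    size = len(shuffled) // n_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(n_groups)]

violent, nonviolent = shuffle_and_split(range(3048))
```

Again, this only works when the whole roster is known up front, since the shuffle needs every participant before any assignment is made.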


Couldn't you just assign each student an ID, take a random permutation of the array of students, and assign violence to even indices and non-violence to odd? What am I missing here?


You can do that, but it requires all of the assignments to be done simultaneously at the beginning of the study, which will cause problems for e.g. medical trials where not everyone enrolls at once.

But why bother? There's no special statistical value in having two exactly equal buckets as opposed to one bucket with 1,621 people in it and another with 1,427.


If you did want an exactly even split, you could assign every even numbered student randomly and every odd numbered student to the opposite group of the student before them. That guarantees an even split and doesn't require all the participants to be known in advance.

It also guarantees that you split evenly any group of people arriving at similar times, so no correlation between arrival time and outcome will affect the study.
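That pairwise scheme can be sketched as a generator (Python; a hypothetical helper, my naming): each even-numbered arrival gets a coin flip, and the next arrival gets the opposite group.

```python
import random

def pairwise_assignments():
    """Yield group labels as participants arrive, balancing each pair."""
    while True:
        g = random.choice("AB")          # even arrival: fair coin flip
        yield g
        yield "A" if g == "B" else "B"   # odd arrival: opposite group

gen = pairwise_assignments()
labels = [next(gen) for _ in range(10)]
```

After any even number of arrivals the groups are exactly balanced, and any run of consecutive arrivals is split nearly evenly.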


How about the following process: each person gets randomly assigned to one of the two groups, and when one group is full, the rest go to the other group. Does this make sure every equal partition has the same probability of showing up?


> Does this make sure every equal partition has the same probability of showing up?

I'm not sure, but I wouldn't bet against it. But what is the value of having an exactly equal partition?

On second thought, the algorithm you describe processes people in a particular order, and it is much more likely to put two people who both occur near the end of the list into the same bucket than to put them in different buckets. So if that processing order is constant, the algorithm cannot produce every equal partition with equal probability.


I agree. It would be simpler to shuffle the list of people, then split the list in half.

Here's a proof by counterexample (N=6) that this algorithm doesn't work.

Consider a list of 6 elements split into two buckets of 3. Under a uniform equal split (e.g. shuffle-then-halve), elements 5 and 6 land in the same bucket 2/5 of the time: fix element 5's bucket, and only 2 of the remaining 5 slots are in it. Under this algorithm, if both buckets still have space after the first 4 elements (a 2-2 split), element 5's flip fills one bucket and element 6 is forced into the other, so they always differ; if one bucket is already full, both are forced into the same bucket. So to match a uniform split, one bucket would have to be full after 4 elements exactly 2/5 of the time.

Sequences of the first 4 coin flips where neither bucket is filled, followed by the possible ending sequences, and the odds of the prefix:

AABB(AB, BA) = 1/16th

ABAB(AB, BA) = 1/16th

ABBA(AB, BA) = 1/16th

BBAA(AB, BA) = 1/16th

BABA(AB, BA) = 1/16th

BAAB(AB, BA) = 1/16th

Total: 3/8ths

Sequences of the first 3-4 coin flips where one bucket is filled, followed by the forced ending sequence, and the odds of the prefix:

AAA(BBB) = 1/8th

BBB(AAA) = 1/8th

AABA(BB) = 1/16th

ABAA(BB) = 1/16th

ABBB(AA) = 1/16th

BBAB(AA) = 1/16th

BABB(AA) = 1/16th

BAAA(BB) = 1/16th

Total: 5/8ths

Since one bucket is filled 5/8ths of the time after 4 elements are processed according to this algorithm, the final two elements land in the same bucket 5/8ths of the time, not the 2/5ths of the time a uniform equal split would give.
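The enumeration can also be checked by brute force over all 64 coin sequences (a sketch, my naming; forced placements ignore the coin, so weighting every sequence equally is still correct):

```python
from fractions import Fraction
from itertools import product

def fill_up(flips, cap=3):
    """Place by coin flip until one bucket hits its cap, then force the rest."""
    counts = {"A": 0, "B": 0}
    placed = []
    for f in flips:
        if counts["A"] == cap:      # bucket A full: forced into B
            f = "B"
        elif counts["B"] == cap:    # bucket B full: forced into A
            f = "A"
        counts[f] += 1
        placed.append(f)
    return placed

same = 0
for flips in product("AB", repeat=6):
    placed = fill_up(flips)
    if placed[4] == placed[5]:      # elements 5 and 6 share a bucket?
        same += 1
print(Fraction(same, 2 ** 6))  # 5/8
```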


> This actually cannot happen with random assignment either. The only way you're going to get equal numbers in each bin is if your process is intentionally constrained to do that.

There are shuffle algorithms in CS, Fisher-Yates among them, that can do this: instead of calling the usual rand() over a fixed interval repeatedly, they select from the remaining elements (i.e. the draw is constrained).

https://dev.to/babak/an-algorithm-for-picking-random-numbers...
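For reference, the classic Fisher-Yates swap loop, which draws each pick from the shrinking set of remaining elements (a sketch; Python's random.shuffle already implements this internally):

```python
import random

def fisher_yates(items):
    """In-place uniform shuffle: swap position i with a uniformly
    chosen index j drawn from the not-yet-fixed prefix 0..i."""
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)
        items[i], items[j] = items[j], items[i]
    return items

deck = fisher_yates(list(range(10)))
```

Shuffling and then cutting the list in half gives the balanced assignment discussed upthread.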

But I'd expect CS people, not social scientists, to be aware of that, unless somebody has written a paper with examples for their field.

I've seen Fisher-Yates come up in an SRE interview, which is pedantic - it's just whiteboard hazing, at a very high cost to your recruiting and interviewing staff.



