We're getting a little off topic here, but the Federal government has published specific guidance on de-identification of medical records. You can construct some artificial scenarios where re-identification might be theoretically possible through record linkage with other data sources but in practice it's unlikely. In principle a similar approach could be used for student data, although I'm not familiar with the legal issues.
But all of this is orthogonal to the core issue of whether a state government should be allowed to prevent researchers from participating in lawsuits. There is no student privacy issue involved there. Witnesses in a civil suit still aren't allowed to violate student privacy laws regardless of the data they have access to, so it makes no sense to conflate those issues.
> We're getting a little off topic here, but the Federal government has published specific guidance on de-identification of medical records.
But release of non-deidentified PHI for research (even without patient consent, with an IRB waiver) is allowed, and this is specifically because deidentification necessarily destroys elements that would often be necessary in research.
> You can construct some artificial scenarios where re-identification might be theoretically possible through record linkage with other data sources but in practice it's unlikely.
It is explicitly part of the HIPAA safe harbor standard that, in addition to removing the required identifiers, you cannot come up with such a scenario, and if you can, the data is not deidentified. (The last criterion of the standard is “The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information”.)
What does any of that have to do with the legal issue of whether a state should be able to prohibit participation in certain lawsuits as a condition of gaining access to research data? Neither party has raised re-identification as a concern, nor have there been any allegations of privacy law violations.
The top-level concern is something like this: professors use their trusted relationship to schools in order to make bank on expert witness fees, which feels a bit corrupt and calls into question the researcher's motives.
A rebuttal to this concern is that we can side-step that issue entirely because these data sets should be public anyways (anonymized, of course!). This obviates the above concern, since the researchers won't need to compromise themselves in order to get exclusive access to data that allows them to be expert witnesses and rake in $$$$.
But the problem with that proposal is re-identification: if we can't make the data anonymous, then we all agree that it shouldn't be released (implicit in the "anonymized, of course!" caveat to the "just release all the data" proposal).
Then you pointed out that even for more important data like healthcare data, the federal government apparently has ways of allowing release of data that take into account the risk of re-identification (I didn't know this; thanks for sharing!)
Then dragonwriter and you got deep into the weeds on HIPAA stuff.
TBH I have no idea which of you is most correct here. But anyways, there are two ways for this conversation to go:
1. You are correct, good enough anonymization is possible: Stanford researchers should not be silenced; it is problematic that they have access to data other people cannot access, but the correct solution is to negate the originally problematic distinction between those researchers and the general public by making data public. Then there is no reason for the researchers to agree to these contract clauses, because they will have access to the data.
2. dragonwriter is correct, good enough anonymization is not possible: We can go back up to the top-level concern and observe that "just release all the data with anonymization" isn't a feasible solution to this problem. Or maybe there isn't actually a problem here at all. IDK. But in any case, "obviate the problem in the top-level post by releasing anonymized data" isn't a workable solution.
Again, not following closely enough to have an opinion, but that's where we are now.
I think a good compromise position is that we should have a law stating that K12 data should be available to certain education researchers -- subject to IRB approval and so on -- without any other strings attached. Including "don't sue me" clauses in releases of public data sets does feel like an inappropriate abuse of student privacy concerns.
The researchers don't have a trusted relationship with schools. They have a contractual relationship with the state government. The fundamental issues underlying the lawsuit are First Amendment freedom of expression and contract law; expert witness fees and researchers' motives are irrelevant.
Whether student data de-identification is good enough or not is a total red herring. No one has accused the researchers in this case of violating privacy rules. The comments here about such privacy issues are largely hypothetical and tangential.
If you think that California needs a new law expanding research access to educational data then feel free to suggest that to your state legislators, or sponsor a ballot initiative.
It's not a red herring. It's a side conversation about a different but related topic.
Someone proposed just releasing all the data.
Someone else replied with why that wouldn't work.
I.e., a conversation happened and the topic of discussion shifted.
FWIW I agree with you on the object level question. No idea why you're being so abrasive, especially when you're the one who initiated/continued the conversational thread about deanonymization and even prefaced with "We're getting a little off topic here".
Presumably at that point you understood that the topic of conversation had shifted, and people's agreement/disagreement didn't necessarily have anything to do with the original topic... since you literally said so and no one disagreed... so your reaction here is pretty odd and off-putting.
> What does any of that have to do with the legal issue of whether a state should be able to prohibit participation in certain lawsuits as a condition of gaining access to research data?
As you yourself noted upthread, you had already taken this subthread afield from that topic.
https://www.hhs.gov/hipaa/for-professionals/privacy/special-...