this post was submitted on 11 Nov 2024
579 points (99.2% liked)

Privacy

[–] [email protected] 43 points 1 month ago* (last edited 1 month ago) (2 children)

Their given reasons are "to keep backups" and "academic and clinical research with de-identified datasets".

They actually seem to do a fairly good job anonymizing the research datasets, unlike most "anonymized research data". For the raw data stored on their servers, though, they don't seem to use encryption properly, and their security model amounts to "the cloud host wouldn't spy on the data, right?" Hint: the data is stored on American servers, so American authorities can simply subpoena Amazon Web Services directly, bypassing all their "privacy guarantees". (The replacement for the EU-US Privacy Shield already seems to be on very uncertain legal ground, and that was before the election.)

[–] [email protected] 13 points 1 month ago

de-identified

Doubt.

[–] [email protected] 6 points 1 month ago* (last edited 1 month ago) (1 children)

De-identified data is an oxymoron. Basically any dataset that's in any way interesting is identifiable.

[–] [email protected] 4 points 1 month ago* (last edited 1 month ago)

No, it's not. If you reduce the information in the datapoints until none of them are unique, then it's obviously impossible to uniquely identify someone from them. And with millions of users, the data can definitely still be kept interesting.

(though there are pretty big pitfalls here, as their report seems to leave open the possibility of not doing it correctly)
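The "reduce information until no datapoint is unique" idea is essentially k-anonymity. Here's a minimal sketch in Python; the record layout (age plus ZIP code as quasi-identifiers) and the generalization steps are my own illustrative assumptions, not anything from the report being discussed:

```python
from collections import Counter

def generalize(record, level):
    """Coarsen a hypothetical (age, zip) record: wider age buckets and
    shorter ZIP prefixes at higher generalization levels."""
    age, zipcode = record
    bucket = 10 * (level + 1)                  # age bucket width: 10, 20, 30, ...
    age_range = (age // bucket) * bucket       # round age down to its bucket
    zip_prefix = zipcode[: max(0, 5 - level)]  # drop trailing ZIP digits
    return (age_range, zip_prefix)

def k_anonymize(records, k):
    """Raise the generalization level until every combination of
    quasi-identifiers appears at least k times, i.e. no record is unique."""
    for level in range(6):
        coarse = [generalize(r, level) for r in records]
        if min(Counter(coarse).values()) >= k:
            return coarse
    # Fully generalized fallback: every record looks identical.
    return [("*", "*")] * len(records)
```

This is also where the pitfalls live: pick the generalization levels wrong, or ignore auxiliary data an attacker already holds, and records become re-identifiable again even though no single row is unique in isolation.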