this post was submitted on 01 Jul 2023
129 points (97.1% liked)

Selfhosted


I'd be really keen to host a Lemmy instance, but I'm wondering, with GDPR and everything, whether there is anything else to consider outside of the technical setup and provisioning of hardware?

Lemmy stores users' data, so is there any requirement to do anything GDPR-wise?

Hope this is the right place for this. I've seen a lot of posts from people interested in hosting their own Lemmy instance, and this is an extension of that.

[–] [email protected] 8 points 1 year ago* (last edited 1 year ago) (3 children)

Actually, I wonder if the end result would essentially be that you can only federate with other GDPR-compliant instances that you trust to respect the GDPR and honor federated data deletion requests.

The core of the issue is that just by the virtue of running, an instance collects a stupid amount of data. I was baffled at how many user accounts my instance had discovered mere hours after starting it up.

Edit: row counts after just a week of running my private instance with only 3 users:

The profiling potential is scary, so users should be really careful with basically every interaction on the Fediverse, including votes. I bet the feds are having a field day monitoring what's going on on exploding-heads and lemmygrad.

[–] [email protected] 10 points 1 year ago (2 children)

IANAL but no, as instances do not share "personal data". There is a misconception that GDPR deletion requests apply to all data created by a user, but to my understanding it only applies to "personal data" as defined here: https://commission.europa.eu/law/law-topic/data-protection/reform/what-personal-data_en

[–] [email protected] 12 points 1 year ago (3 children)
[–] [email protected] 5 points 1 year ago

Don't they have Mastodon accounts?

[–] [email protected] 4 points 1 year ago

Interesting read, thank you!

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

However, this duplication mechanism renders content deletion or rectification more difficult. In case of deletion by the user, the platforms with duplicates receive usually an automated deletion request and must be trusted to comply and delete their duplicate.

Seems like sending the delete notice is all that's required?

[–] [email protected] 2 points 1 year ago

Seems like sending the delete notice is all that’s required?

Yes, but

and must be trusted to comply and delete their duplicate.

So because of that trust factor, if you really want to protect yourself and be 100% GDPR compliant, you'd probably want a legal contract with every instance you federate with, confirming that they are GDPR compliant too, so you can legally deflect blame if a data deletion request can't be fully honored.
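To make the "send the notice, then trust" part concrete, here is a rough sketch of fanning out a deletion to federated instances. The activity shape follows the ActivityStreams `Delete` type, but the `fan_out` and `fake_send` helpers, the example URLs, and the idea of tracking unconfirmed inboxes are my own assumptions for illustration, not how Lemmy actually does it.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_delete_activity(actor_id: str, object_id: str) -> dict:
    """Build a minimal ActivityStreams-style Delete activity (illustrative shape only)."""
    return {
        "@context": AS_CONTEXT,
        "type": "Delete",
        "actor": actor_id,
        "object": object_id,
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
    }


def fan_out(activity: dict, inboxes: list, send) -> dict:
    """Send the activity to every known inbox and record which ones accepted it.

    `send` is a hypothetical callable (inbox_url, payload) -> bool; a real
    server would do a signed HTTP POST here.
    """
    payload = json.dumps(activity)
    return {inbox: bool(send(inbox, payload)) for inbox in inboxes}


# Fake transport for the example: one instance accepts, one is unreachable.
def fake_send(inbox, payload):
    return "good.example" in inbox


activity = build_delete_activity(
    "https://home.example/u/alice",
    "https://home.example/comment/123",
)
results = fan_out(
    activity,
    ["https://good.example/inbox", "https://down.example/inbox"],
    fake_send,
)

# Inboxes that never accepted the delete; these would need a retry or follow-up.
unconfirmed = [inbox for inbox, ok in results.items() if not ok]
print(unconfirmed)
```

Note that even an accepted delivery only tells you the notice arrived, not that the remote copy was actually deleted, which is exactly the trust problem described above.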

[–] [email protected] 6 points 1 year ago (2 children)

Under GDPR, any piece of potentially identifying information is considered personal data. I had GDPR training at work. Under the GDPR you can't even count unique visitors to your website without consent, because you'd have to keep track of some identifier, even if it's just IP address and User-Agent, and even if it's done entirely client side.
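A rough sketch of why counting unique visitors needs an identifier at all: to know that two requests came from the same visitor, something stable has to be derived from them. The function names, the salt, and the sample requests below are made up for illustration. Hashing does not change the basic point, since the key is still derived from the IP address and User-Agent.

```python
import hashlib


def visitor_key(ip: str, user_agent: str, salt: str) -> str:
    """Derive a stable per-visitor key from IP and User-Agent."""
    raw = f"{salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def count_unique(requests, salt="example-salt"):
    """Count distinct visitors in an iterable of (ip, user_agent) pairs."""
    return len({visitor_key(ip, ua, salt) for ip, ua in requests})


hits = [
    ("203.0.113.7", "Firefox"),
    ("203.0.113.7", "Firefox"),   # same visitor again
    ("198.51.100.2", "Chrome"),
]
print(count_unique(hits))
```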

Even just community subscriptions are plenty of data to build a rather comprehensive profile of a user's interests, and if you throw in votes it quickly becomes scary.

This is everything you upvoted:

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago) (1 children)

Obviously IP addresses are personal data, but those are not shared to other instances.

You could probably argue that the federated ID is personal data, but I am not sure as it might also count as only an internal identifier required for operation. IANAL but I don't think votes can be considered personal data under the GDPR.

[–] [email protected] 1 points 1 year ago (2 children)

The question boils down to where the boundary is. Is an alias of your choosing, which uniquely identifies you across the fediverse, personally identifiable? I think we all would say yes. Do actions linked to that alias then count as personally identifiable? Well, even without correlating the ID to a real identity, it is still technically possible to map out who this user is and what their interests and preferences are, so maybe yes? That’s a hard grey area to determine IMO.

[–] [email protected] 2 points 1 year ago

Indeed, but I think email addresses for email providers (but not everyone else) are handled differently by the GDPR as they are necessary for providing the email service. I think this is similar to how functional cookies do not require consent under the GDPR if they are only used to keep you logged into the site etc.

[–] [email protected] 1 points 1 year ago (1 children)

I think, as @[email protected] commented slightly higher up, this might be considered pseudonymised data? The link he provided suggested it was considered personally identifying information. I'm (as per my question) definitely no expert in this, though.

[–] [email protected] 4 points 1 year ago (1 children)

The link I provided says that pseudonymous data can be used to hide personalized data.

If you are a DPO, you can see the appeal and benefits of pseudonymization. It makes data identifiable if needed, but inaccessible to unauthorized users and allows data processors and data controllers to lower the risk of a potential data breach and safeguard personal data.

GDPR requires you to take all appropriate technical and organizational measures to protect personal data, and pseudonymization can be an appropriate method of choice if you want to keep the data utility.

The owner of lemmy.one can map [email protected] to an IP and/or email address, which makes it personally identifiable data for them. But other instance owners can't map it to any personal data, so it is basically "anonymized data" for them.

You just have to provide a way to either

  • Delete the personally identifiable data, or
  • Unlink the personally identifiable data from the pseudonymized data on your local instance.
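
A minimal sketch of those two options, assuming a toy in-memory store. The `LocalInstance` class, its field names, and the sample account are all hypothetical and have nothing to do with Lemmy's real schema.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class LocalInstance:
    # actor id -> personal data only the home instance holds (email, IP)
    pii: dict = field(default_factory=dict)
    # post id -> content row, each pointing at an actor id
    posts: dict = field(default_factory=dict)

    def delete_pii(self, actor_id: str) -> None:
        """Option 1: drop the personal data, keep the pseudonymous content."""
        self.pii.pop(actor_id, None)

    def unlink(self, actor_id: str) -> str:
        """Option 2: re-key the user's content to a random token and drop the
        personal data, so nothing local maps the content back to the person."""
        token = "deleted-" + secrets.token_hex(8)
        for post in self.posts.values():
            if post["author"] == actor_id:
                post["author"] = token
        self.pii.pop(actor_id, None)
        return token


inst = LocalInstance()
inst.pii["alice@home.example"] = {"email": "alice@mail.example", "ip": "203.0.113.7"}
inst.posts[1] = {"author": "alice@home.example", "body": "hello"}

token = inst.unlink("alice@home.example")
print(inst.posts[1]["author"] == token, "alice@home.example" in inst.pii)
```

This only covers the local instance. Copies already federated elsewhere still carry the original alias, so they depend on the remote admins honoring the delete.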

Disclaimer, IANAL, YMMV, yaddy, yadda,...

[–] [email protected] 1 points 1 year ago

Understood, I missed that subtlety. The fact that emails aren't actually shared makes it very GDPR "friendly".

[–] [email protected] 1 points 1 year ago (1 children)

This is everything you upvoted:

How does that work? As the admin of lemmy.max-p.me, you have access to your server's db, which contains a replica of the db of all servers you receive federation from, including detailed per-user upvotes/downvotes. Correct?

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago) (1 children)

Yeah, pretty much, although not entirely. I only get pushed copies of the intersection between the communities my instance tracks and the victim's, and only from the time my server started federating those. I guess I could make a bot account that subscribes to every possible Lemmy community so that I get a complete copy. I could also patch the backend to ignore any deletion requests and stash everyone's deleted posts, and even go fetch linked images and store them forever.

It's not really a secret though. Some users in another thread were shocked to learn that kbin does publicly display that information. For example, picking the first post on kbin.social: https://kbin.social/m/tech/t/124303/Bluesky-temporarily-halts-sign-ups-because-so-many-people-are-joining/votes/up

Essentially, it's extremely public, so one's gotta be careful about every single interaction on here.

I only did this for example's sake, I respect people's privacy and have no intention of running a hostile instance. But point being, anyone can rather easily.

[–] [email protected] 1 points 1 year ago

Interesting - I had the feeling this was how the federation mechanism worked, I don't see how it could work without sacrificing privacy.

So a "bad" actor could just spin up their own instance, federate with a huge amount of other instances (I don't think other instances have a say in this, except if they explicitly, manually blacklist the "bad" instance?), and start profiling users based on their votes.
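
For a sense of how little it takes, here is a rough sketch of turning federated vote rows into a per-user interest profile. The rows, account names, and `interest_profile` helper are made up; a real instance would read the equivalent from its own database.

```python
from collections import Counter, defaultdict

# Made-up rows of (voter, community, score); +1 is an upvote, -1 a downvote.
votes = [
    ("user_a@one.example", "selfhosted", 1),
    ("user_a@one.example", "selfhosted", 1),
    ("user_a@one.example", "privacy", 1),
    ("user_a@one.example", "politics", -1),
    ("user_b@two.example", "gaming", 1),
]


def interest_profile(rows):
    """Count upvotes per community for each voter."""
    profile = defaultdict(Counter)
    for voter, community, score in rows:
        if score > 0:
            profile[voter][community] += 1
    return profile


profiles = interest_profile(votes)
print(profiles["user_a@one.example"].most_common())
```

The same aggregation grouped by community and time window instead of by voter is roughly what you'd use to spot brigading or bot farms, so it cuts both ways.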

The potential for global surveillance is enormous. But I can also see it being useful to detect and fight bot farms, spam, brigading and other bad stuff that has plagued Reddit for quite some time.

Lemmy could do a better job at informing users that basically everything you do here is public (including votes). On Kbin the /votes/up page makes it clear at least (I like that even comments have a /votes/up page).

[–] [email protected] 4 points 1 year ago

I believe this is probably what will happen if this ever becomes a big issue. GDPR was never intended to be navigable for anything except giant proprietary blob tech companies.
