instance owners have quite a lot more information on their users’ activities
Not really. Only thing additional that could be identified is browsing patterns while on the site itself. I don’t think it’s that valuable. You likely already gave up what you’re likely to see by commenting in communities. That’s going to be tracked best through a proxy or something, not lemmy itself. And can even be tracked externally through other means. Ex: This post has a tracking image on it and because you need to connect to me to load it I now see everyone that had loaded this comment. So this can be done externally without even being an instance owner. Click view source to see it at the end of the post.
Votes are federated, kbin instances see them as “likes” publicly. Messages are federated, sent in clear text. And posts that are loaded can be tracked via other means… Think of sites that display ads… They do this exact thing and collect information by the boatload because they can inject on every page that shows an ad. Without needing to be an admin on the site itself.
Edit: In theory someone could canvass/comment on every post with a bot and embed tracking images everywhere. Rotate usernames doing it from different servers and rotate through domains that are all CNAMEd back to the same tracking node and you could attack the whole fediverse with this type of tracking. Probably already being done… But it would be visible in that we have the ability to check source of each comment. But who the hell is going to take the time to do that?
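For what it's worth, checking the source doesn't have to be done by hand. Here's a rough sketch of how a script could flag comments that load images from somewhere other than your own instance. The function name, the regex, and the example hostnames are all made up for illustration, and the regex only covers the common Markdown image form, so treat it as a starting point rather than a complete parser.

```python
import re
from urllib.parse import urlparse

# Matches Markdown image syntax ![alt](url "optional title") and captures the URL.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)')


def external_image_hosts(comment_markdown, instance_host):
    """Return the sorted hosts, other than instance_host, that a comment loads images from."""
    hosts = set()
    for url in IMAGE_PATTERN.findall(comment_markdown):
        host = urlparse(url).hostname
        # Relative URLs have no hostname and are served by the instance itself.
        if host and host != instance_host:
            hosts.add(host)
    return sorted(hosts)
```

Anything this returns is a server that gets a request from every reader of the comment. That alone doesn't prove tracking, since ordinary image hosts show up too.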
Edit2: Here’s an example of what was collected with that embedded image. Keep in mind that this type of tracking can happen with REAL images as well, making it impossible to detect. And I’m specifically not tracking much of anything. But things like IP address used to access is on the backend. There’s also Browser, OS, referrers… etc…
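To make it concrete why no special access is needed: any server that hosts an image receives these fields with the plain GET request. Below is a minimal sketch using only Python's standard library. `describe_request` and `ImageHandler` are names I made up, and this is not the service used for the screenshot, just an illustration of what arrives with every request.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler


def describe_request(client_ip, headers, when=None):
    """Collect the fields any image host receives with an ordinary GET request."""
    when = when or datetime.now(timezone.utc)
    return {
        "time": when.isoformat(),
        "ip": client_ip,
        "user_agent": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
    }


class ImageHandler(BaseHTTPRequestHandler):
    """Logs each request, then answers with an empty response."""

    def do_GET(self):
        print(describe_request(self.client_address[0], self.headers))
        self.send_response(204)
        # Ask the browser not to cache, so repeat views trigger new requests.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
```

To try it locally you would pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), ImageHandler)` and call `serve_forever()`. The point is only that the IP, user agent, and referrer come for free with the request.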
In a recent Lemmy version they added support for proxying images. So for people worried about this, see if you can find an instance (or set up your own) that does image proxying.
Before you ask, I’m not aware of any but I’m sure there are some.
Yeah that was 0.19.4. It doesn’t proxy everything unless explicitly set to. Just thumbnails, I believe, but I could be wrong. And many instance owners would be allergic to that as it leaves them on the hook for storing content. For example… someone posts CSAM… a copy of that is now on your server. You get police raided and you’re fucked.
Certain (but not all) thumbnails have been sort of proxied for a while, but it’s complicated. But for example if someone posts a link to some questionable content on imgur, your instance will have a copy of that cached (and never delete it, because… Lemmy reasons). The recent changes just mean you can now enable other images to be proxied, though this is disabled by default. This proxy has an age (a day or a week or whatever you set) and content is deleted if it hasn’t been accessed in that timeframe - this is in contrast to the normal Lemmy image stuff that I believe still stays forever unless that was fixed recently.
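The "deleted if it hasn't been accessed in that timeframe" behaviour is basically idle-time eviction. Here's a small sketch of the idea in Python. This is not how Lemmy or pict-rs actually implement it. The class name and its methods are invented, and it keeps everything in memory purely to show the rule.

```python
import time


class ExpiringImageCache:
    """Keeps proxied images until they have gone unread for max_idle_seconds."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self._entries = {}  # url -> (image_bytes, last_access_time)

    def put(self, url, image_bytes, now=None):
        now = time.time() if now is None else now
        self._entries[url] = (image_bytes, now)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        # Reading an image resets its idle timer.
        self._entries[url] = (entry[0], now)
        return entry[0]

    def purge(self, now=None):
        """Delete and return the URLs that were idle for longer than the limit."""
        now = time.time() if now is None else now
        stale = [url for url, (_, last) in self._entries.items()
                 if now - last > self.max_idle_seconds]
        for url in stale:
            del self._entries[url]
        return stale
```

Running `purge()` on a schedule gives the "a day or a week or whatever you set" behaviour: popular images stay, forgotten ones disappear.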
And many instance owners would be allergic to that as it leaves them on the hook for storing content
This is already a risk whether via the existing thumbnail storage or via user uploads. It’s a pretty common recommendation that you should never host a website like Lemmy on a home server, always use a VPS for this reason. Then make sure you understand your local laws as well.
This is already a risk whether via the existing thumbnail storage
Not anymore. You can opt out of it for the most part.
# Leave images unchanged, don't generate any local thumbnails for post urls. Instead the
# Opengraph image is directly returned as thumbnail
"None"
you are (still) missing my point - but i might be wrong as well (i am not too familiar with ActivityPub).
my point is not that my public posts are in fact public and can be (and probably are) mined through unknown parties, but that instance owners have even more, probably more valuable info, like IP addresses from which not just geolocation but also wake times, device usage patterns and other gnarly stuff could be extracted, that could - together with other personalized surveillance info (like the usual adware stuff) - be aggregated to give a bigger picture.
just showing (as you did) that one can get some info about me through my (public) actions does not refute the point that instance owners have access to more, not-so-public information
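As an illustration of the "wake times" worry, here is a rough sketch of how access timestamps from a server log could be turned into hours of activity. The function name and the `min_hits` threshold are made up, and real logs would need per-user grouping first. It only shows how little data this kind of inference requires.

```python
from collections import Counter
from datetime import datetime, timezone


def active_hours(timestamps, min_hits=1):
    """Return the UTC hours of day (0-23) in which at least min_hits requests were seen."""
    per_hour = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return sorted(hour for hour, count in per_hour.items() if count >= min_hits)
```

The gap in the returned hours is a decent guess at when the account holder sleeps, which together with IP geolocation narrows down a timezone.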
but that instance owners have even more, probably more valuable info, like IP addresses from which not just geolocation but also wake times, device usage patterns and other gnarly stuff could be extracted, that could - together with other personalized surveillance info (like the usual adware stuff) - be aggregated to give a bigger picture.
I have the IP behind the geolocation. How do you think that I know the geolocation? It’s an IP lookup. The interface that I showed in the image just doesn’t publish it because I don’t care personally. What I use that service for is simply to track where sensitive emails/documents go. Not to track lemmy. I don’t need specific resolutions. Just to know if they leak outside of what I expected.
Device patterns? The app you use is the app you use. That would be given away via your browser header. I also collect that with the tracking image. Once again, it’s just not shown in the graph because I don’t care to track it personally (I’m only doing this as an example, not to actually aggregate data).
If you use lemmy over the web browser, browsers don’t really give up that much information unless you’re google themselves. In which case apparently chrome gives up a boatload of information to google’s domains.
not-so-public information
You’d have to give me an example of any of what you’re referencing. I can collect IP, web headers, and access times, and if I tag enough pages or mark the image as non-cacheable I could even see multiple views/accesses (you see views higher than actual visitors). I can track your movement across all of the fediverse.
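The "views higher than actual visitors" part is just counting requests versus counting distinct clients. A minimal sketch, assuming each hit has already been reduced to an (IP, user agent) pair. That pair is a crude stand-in for a visitor (people share IPs, and one person uses several devices), and the function name is invented.

```python
from collections import Counter


def summarize_hits(hits):
    """hits: iterable of (ip, user_agent) tuples, one per image request."""
    per_visitor = Counter(hits)
    return {
        "views": sum(per_visitor.values()),
        "visitors": len(per_visitor),
        "repeat_visitors": sum(1 for count in per_visitor.values() if count > 1),
    }
```

With a non-cacheable image every reload is a new hit, so `views` grows while `visitors` stays put.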
that one can get some info about me through my (public) actions
Simply “viewing” the page pulls the image, and viewing is not necessarily a “public” action, so this directly rebuts the idea that I can only obtain data that is “public”.
I’ve addressed the points you’ve brought up. I run my own instance. I can collect just about everything in the DB tables I’ve seen without being logged into the instance with some external work.
Are you trying to get my point? If you have a specific item that you believe is stored on a lemmy server and that you think isn’t possible to obtain, I’m all ears. Otherwise I think this conversation is done. This kind of response is pointless and I’m not interested in continuing if you’re going to act like that.
The hardest thing to collect would be private messages, and login information (which is hashed btw, so even your server operator doesn’t really know it). But messages are plaintext and openly federated. All the other information is really really easy to collect through other means.
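For anyone wondering what "hashed" means in practice: the server stores a salted one-way digest, and at login it hashes what you typed and compares. The sketch below uses PBKDF2 from Python's standard library purely as a stand-in. It is an assumption for illustration and not necessarily the algorithm or parameters Lemmy uses, and the function names are made up.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Return (salt, digest); only these are stored, never the password itself."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=600_000):
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

So an operator reading the database sees the salt and digest, not the password. Note this says nothing about what the server could see while you are typing the password in at login.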
First of all: my Not Sure Fry was intended as a joke.
So, just to understand you correctly:
I can collect just about everything in the DB tables I’ve seen without being logged into the instance with some external work.
Can you see which communities I follow? Which feeds I watch (and when I do that)? Who I interact with through DMs?
It’s all public anyways. Did you think you had privacy posting to a public forum?
you miss the point: instance owners have quite a lot more information on their users’ activities than what’s public.
or would you argue that reddit does not aggregate data because it’s all public?
In a recent Lemmy version they added support for proxying images. So for people worried about this, see if you can find an instance (or set up your own) that does image proxying.
Before you ask, I’m not aware of any but I’m sure there are some.
Yeah that was 0.19.4. It doesn’t proxy everything unless explicitly set to. Just thumbnails, I believe, but I could be wrong. And many instance owners would be allergic to that as it leaves them on the hook for storing content. For example… someone posts CSAM… a copy of that is now on your server. You get police raided and you’re fucked.
https://github.com/LemmyNet/lemmy/blob/705e86eb4c0079d0775f0c1490968f1183095fcc/config/defaults.hjson#L51
Actually, going over it briefly, it looks like it has a few available options for what it will cache…
I refuse to enable it myself for the above reason. I would venture 99% of instances out there would also refuse for liability and bandwidth costs.
are you trying to get my point?
The instance has 200 users active in the last month. Not quite a trove :D