I recently moved my files to a new ZFS pool and used that chance to properly configure my datasets.

This led me to discover ZFS deduplication.

As most of my storage is used by my Jellyfin library (~7-8 TB), which is mostly uncompressed Blu-ray rips, I thought I might be able to save some storage by using deduplication in addition to compression.

Has anyone here used that for similar files before? What was your experience with it?

I am not too worried about performance. The dataset in question rarely changes, basically only when I add more media every couple of months. I also overshot my CPU target when originally configuring my server, so there is a lot of headroom there. I have 32 GB of RAM, which is not fully utilized either (and I would not mind upgrading to 64 GB too much).
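For the RAM side, a back-of-the-envelope estimate of the dedup table (DDT) size can help. The sketch below assumes the frequently quoted rule of thumb of roughly 320 bytes of in-core memory per DDT entry and the default 128 KiB recordsize. Both are assumptions, not measurements from this pool, and the function name `ddt_ram_estimate` is made up for illustration.

```python
def ddt_ram_estimate(data_bytes, record_size=128 * 1024,
                     bytes_per_entry=320, dedup_ratio=1.0):
    """Rough in-core size of the dedup table, in bytes.

    bytes_per_entry=320 is a commonly cited rule of thumb (an assumption);
    the real figure depends on the ZFS version and pool layout.
    dedup_ratio=1.0 means "no duplicates found", the worst case for RAM.
    """
    total_blocks = data_bytes / record_size
    unique_blocks = total_blocks / dedup_ratio  # one DDT entry per unique block
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    library = 8 * 2**40  # 8 TiB of media, roughly the library size mentioned above
    print(f"{ddt_ram_estimate(library) / 2**30:.1f} GiB")  # prints "20.0 GiB"
```

Under those assumptions, 8 TiB with almost no duplicates works out to about 20 GiB of table, which would be a large share of 32 GB of RAM. A larger recordsize shrinks the table proportionally.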

My main concern is that I am unsure whether it is useful. I suspect that, just because of the amount of data and the similarity in type, there would statistically be a lot of block-level duplication, but I could not find any real-world data or experiences on that.
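One way to get real numbers without enabling anything is to hash the library in recordsize-sized blocks and count how many hashes repeat. The following is a minimal sketch of that idea, not how ZFS itself does it: the 128 KiB block size is the assumed recordsize, the function names are made up, and it ignores compression, so the result is only an approximation of what dedup would see. (`zdb -S <pool>` can also simulate dedup on an existing pool.)

```python
import hashlib
import os
import sys

RECORD_SIZE = 128 * 1024  # assumed recordsize (the ZFS default)


def iter_blocks(path, block_size=RECORD_SIZE):
    """Yield a file's contents in fixed-size blocks (the last one may be short)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block


def estimate_dedup(paths, block_size=RECORD_SIZE):
    """Return (total_blocks, unique_blocks, ratio) across all given files."""
    seen = set()
    total = 0
    for path in paths:
        for block in iter_blocks(path, block_size):
            total += 1
            seen.add(hashlib.sha256(block).digest())
    unique = len(seen)
    ratio = total / unique if unique else 1.0
    return total, unique, ratio


def walk_files(root):
    """Yield every regular file path below root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            yield os.path.join(dirpath, name)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique, ratio = estimate_dedup(walk_files(sys.argv[1]))
    print(f"{total} blocks, {unique} unique, estimated dedup ratio {ratio:.3f}x")
```

Run it as `python estimate.py /path/to/library` (the script name is arbitrary). Keeping every hash in memory costs a few GB for a multi-terabyte library, so trying it on a subset of the files first is a reasonable starting point.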

    • friend_of_satan@lemmy.world
      15 days ago

      I was also going to link this. I started using ZFS 10-ish years ago and used dedup when it came out, and it was really not worth it except for archiving a bunch of stuff I knew had gigs of duplicate data. Performance was so poor.

    • undefined@lemmy.hogru.ch
      14 days ago

      I’m in almost the exact same situation as OP, 8 TB of raw Blu-ray dumps except I’m on XFS. I ran duperemove and freed ~200 GB.

      • needanke@feddit.orgOP
        13 days ago

        I think I was a bit unclear on that. By uncompressed rips I meant that I ripped the relevant media to uncompressed MKVs; I didn’t save the entire disc dump. Most of my library is rips like that, but I also have a bit of media from other sources™ which is already compressed. So I suspect my results would be even worse.