Track_Shovel@slrpnk.net to Lemmy Shitpost@lemmy.world · English · 24 hours ago
Hexadecimal (slrpnk.net, external link) · 58 comments
morrowind@lemmy.ml · 14 hours ago
Not really a concern. It’s basically translation, which language models excel at. It just needs a mapping of the hex to bytes.
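To make the "mapping of the hex to bytes" concrete, here is a minimal Python sketch: every pair of hex digits encodes exactly one byte (high nibble × 16 + low nibble). The function name `hex_to_bytes` is hypothetical, chosen for illustration. The sketch shows the mapping itself, not how a language model would learn or apply it; Python's built-in `bytes.fromhex` does the same job.

```python
# Minimal sketch of a hex-to-bytes mapping (hex_to_bytes is a hypothetical name).
HEX_DIGITS = "0123456789abcdef"


def hex_to_bytes(hex_str: str) -> bytes:
    """Decode a hex string by looking up each digit's value by hand."""
    s = hex_str.strip().lower()
    if len(s) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    out = bytearray()
    for i in range(0, len(s), 2):
        high = HEX_DIGITS.index(s[i])      # first digit = high nibble (raises ValueError if not hex)
        low = HEX_DIGITS.index(s[i + 1])   # second digit = low nibble
        out.append(high * 16 + low)        # two digits -> one byte in 0..255
    return bytes(out)


print(hex_to_bytes("48656c6c6f"))                                  # b'Hello'
print(hex_to_bytes("48656c6c6f") == bytes.fromhex("48656c6c6f"))   # True
```

The mapping is strictly two characters per byte, which is why it matters whether a model "sees" those characters in aligned pairs.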
GissaMittJobb@lemmy.ml · 13 hours ago
It is a concern. Check out https://tiktokenizer.vercel.app/?model=deepseek-ai%2FDeepSeek-R1 and try entering some freeform hexadecimal data: you’ll notice that it does not cleanly segment the hexadecimal numbers into individual tokens.
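One way to see why the segmentation matters: if a tokenizer groups hex digits into chunks of varying length, some tokens start or end halfway through a byte. The sketch below uses a made-up segmentation of "Hello wo" in hex. It is NOT real DeepSeek-R1 tokenizer output (use the linked tool to see that). The function name `straddling_tokens` is also hypothetical. The sketch flags every token that does not begin and end on a two-digit byte boundary.

```python
# Illustrative sketch: the segmentation below is invented, NOT actual DeepSeek-R1 output.
def straddling_tokens(tokens: list[str]) -> list[str]:
    """Return tokens that start or end in the middle of a byte (odd digit offset)."""
    result = []
    offset = 0  # position, in hex digits, where the current token starts
    for tok in tokens:
        start, end = offset, offset + len(tok)
        # A byte spans two hex digits, so an odd boundary splits a byte.
        if start % 2 != 0 or end % 2 != 0:
            result.append(tok)
        offset = end
    return result


# Hypothetical tokenization of "48656c6c6f20776f" ("Hello wo").
hypothetical = ["4865", "6c6", "c6f", "20", "7", "76f"]
print(straddling_tokens(hypothetical))  # ['6c6', 'c6f', '7', '76f']
```

With a segmentation like this, a model cannot read each byte off a single token. It has to combine partial digits across neighbouring tokens, which is the crux of the disagreement in this thread.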
morrowind@lemmy.ml · 12 hours ago
I’m well aware, but you don’t necessarily need to see each character to translate it to bytes.