this post was submitted on 05 May 2024
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
founded 1 year ago
hmm. I can guess at a few reasons this could be happening: model coders "normalizing" everything to flat-ascii in training, or similar happening at training stage (because of the previously-referenced RLHF datamills employing only people with specific localized dialects, instead of wider context-local languages), etc.
wonder if this particular thing is a confluence of those, or just one specific set
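for anyone who hasn't seen what that kind of "normalize to flat ASCII" step does to text, here's a minimal sketch. This is only an illustration of the general technique, not a claim about any actual model's preprocessing; `flatten_to_ascii` is a hypothetical name made up for this example.

```python
import unicodedata


def flatten_to_ascii(text: str) -> str:
    """Hypothetical 'flat ASCII' cleanup step, for illustration only."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" silently drops everything outside 7-bit ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(flatten_to_ascii("café naïve"))  # accents stripped: cafe naive
print(flatten_to_ascii("Straße"))      # ß has no decomposition, so it vanishes: Strae
print(repr(flatten_to_ascii("Ελληνικά")))  # nothing survives: ''
```

the point being that Latin-script text with diacritics comes out looking "close enough", while anything in another script just disappears without an error.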
have you ever met an English-native dev who didn't need to be trained out of the world being 7-bit ascii?
@dgerard
7 bits were good enough for Jesus.
First efforts at bible digitization seem incredibly poorly documented online, and from a casual inspection in google scholar, not very well referenced. It's a pity, because it sounds like a fascinating topic, though 7 bits is likely for the first English versions, yes. (And according to this there are horrid 7-bit encodings for the ancient Greek.)
My Jesus wanted characters for drawing borders and playing card suits, which is why He handed down to us Code Page 437. Using the upper 128 characters for things like vowels with funny marks on them is catholic heresy (nuts to Latin 1, down with Unicode).
I got lucky and largely missed out on having to deal with those, at a guess largely because of location and age. the type I got to deal with instead were the php-/perl-brained "everything is just a string" types
hell, (circa 2010) I had beers with someone once who was really into Tcl and Second Life, and wanted to be uploaded as a digital consciousness. way before I knew about the other nutjobs, but in retrospect I now have a couple other questions I might've wanted to ask at the time...