Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
Almost all web traffic now uses the utf-8 encoding, a clever hack which works because ascii is a seven-bit code but web traffic uses 8-bit bytes.
multi-byte characters in utf-8 can officially be up to four bytes long, with 11 of those 32 bits used for tracking the size of the multi-byte block. That leaves 21 bits for the character itself, so 2^21 possible values, about two million in total (unicode itself stops at U+10FFFF, a bit over a million code points). That's easily enough for every alphabet you could need to write on a website, and all without breaking ascii.
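If you want to see the bookkeeping bits for yourself, here is a minimal Python sketch. The helper names `utf8_bits` and `overhead_bits` are made up for this example; the only real API used is `str.encode`.

```python
def utf8_bits(ch: str) -> list[str]:
    """Return each byte of ch's utf-8 encoding as an 8-digit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


def overhead_bits(ch: str) -> int:
    """Count the bits utf-8 spends on framing rather than on the code point."""
    n = len(ch.encode("utf-8"))
    if n == 1:
        return 1  # plain ascii: just the leading 0 bit
    # Lead byte: n ones followed by a zero. Each continuation byte: '10'.
    return (n + 1) + 2 * (n - 1)


# 'A', e-acute, euro sign, and an emoji: 1, 2, 3, and 4 bytes respectively.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    bits = utf8_bits(ch)
    total = 8 * len(bits)
    print(f"U+{ord(ch):04X}", bits, f"payload bits: {total - overhead_bits(ch)}")
```

For the four-byte case this reports 11 framing bits and 21 payload bits, matching the count above. The one-byte case is just the ascii code with a leading 0, which is why plain ascii text is already valid utf-8.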
Oh, I wondered about why there weren't more characters in the ASCII code set.
yep! the ascii standard was originally invented for teletypewriters, and includes four 'blocks' of 32 codes each, for 128 in total, so it only uses seven bits per code.
the first block, hex 00 - 1F, contains control codes for the typewriter. stuff like "newline", "backspace", and "ring bell" all go in here.
The second block has the digits in order, from hex 30 = '0' all the way to hex 39 = '9'.
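Because the digits sit in order starting at hex 30, turning a digit character into its numeric value is just a subtraction. A small sketch in Python, where `digit_value` is a hypothetical helper name for this example:

```python
def digit_value(ch: str) -> int:
    """Convert an ascii digit character to its numeric value."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:
        raise ValueError(f"{ch!r} is not an ascii digit")
    # '0' is hex 30, so subtracting 0x30 leaves the value.
    # Masking with 0x0F would work too, since the low four bits hold the digit.
    return code - 0x30


print([digit_value(c) for c in "0123456789"])
```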
The uppercase alphabet starts at hex 41 = 'A', and exactly one block later, the lowercase alphabet starts at hex 61 = 'a'. This means their binary codes are 100 0001 and 110 0001, differing only in a single bit! So you can easily convert between upper and lowercase ascii by flipping that bit.
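The bit in question is the one worth hex 20 (32, one block). Here is a quick sketch of the flip in Python; `CASE_BIT` and `flip_case` are names invented for this example, and the range check is there because the trick only makes sense for the 26 ascii letters.

```python
CASE_BIT = 0x20  # the single bit separating 'A' (100 0001) from 'a' (110 0001)


def flip_case(ch: str) -> str:
    """Swap the case of an ascii letter by toggling one bit."""
    code = ord(ch)
    is_upper = 0x41 <= code <= 0x5A
    is_lower = 0x61 <= code <= 0x7A
    if not (is_upper or is_lower):
        return ch  # digits, punctuation, and non-ascii are left alone
    return chr(code ^ CASE_BIT)


print(format(ord("A"), "07b"), format(ord("a"), "07b"))
print("".join(flip_case(c) for c in "Hello, World!"))
```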
The remaining space in the last three blocks is filled with various punctuation marks. I'm not sure if these are in any particular order.
The final ascii code, 7F, is reserved for "delete", because its binary representation is 111 1111, perfect for "deleting" data on punched paper tape by punching over it.
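One way to picture the punching-over trick: holes can be added but never removed, so overpunching behaves like a bitwise OR. This is a toy model in Python, not a description of any real hardware, and `punch_over` is a made-up name.

```python
DEL = 0x7F  # binary 111 1111: all seven holes punched


def punch_over(existing: int, new: int = DEL) -> int:
    """Model punching a new code on top of an existing one (holes only accumulate)."""
    return existing | new


# Whatever was there before, punching every hole always leaves DEL.
for ch in "Az0~":
    print(ch, format(ord(ch), "07b"), "->", format(punch_over(ord(ch)), "07b"))
```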