this post was submitted on 10 Aug 2023

536 points (97.9% liked)

Programmer Humor

Order (programming.dev)

submitted 1 year ago by [email protected] to c/[email protected]

[–] [email protected] 111 points 1 year ago (2 children)

Chaotic evil is encrypting, compressing, then encrypting again.

[–] [email protected] 49 points 1 year ago (1 children)

Then decompress after. Let fear be your cypher.

[–] InfiniteStruggle 32 points 1 year ago (2 children)

When playing football, to keep our socks from riding down our legs, we used to put loose rubber bands over them, near the top of the sock. Then, to keep the rubber bands from riding up above the sock line, we folded the sock down over them. Then, to keep the fold from coming undone during play, another rubber band had to go on top of the folded part.

Sounds similar to this. Just thought it was notable.

[–] [email protected] 11 points 1 year ago (1 children)

So, when your foot turned purple from the multitude of rubber bands, did it make you play any better?

[–] InfiniteStruggle 4 points 1 year ago

Nope. It was uncomfortable and I'd argue we played worse because of the discomfort. We were also pretty bad at the game so I think we wanted the socks with rubber bands as a scapegoat instead of accepting we were shit.

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago) (1 children)

Why didn't you guys just buy good socks? Or those sock suspenders?

[–] [email protected] 1 points 1 year ago

That's like md5(sha512("somefile.blah"))

[–] [email protected] 86 points 1 year ago (2 children)

The encryption: base64 encoding

[–] [email protected] 36 points 1 year ago* (last edited 1 year ago) (2 children)

Nah, you just XOR the data with itself and it becomes uncrackable.

Also, after encryption like this, the result can be compressed down to 4 bytes as long as the data is not larger than around 4 GB, or 8 bytes if you need more.

[–] [email protected] 16 points 1 year ago* (last edited 1 year ago)

My god, that is absolutely perfect encryption (completely uncrackable by brute force) and compression. This is genius and I'm gonna switch all my data to this encryption scheme. Now I just need somewhere to store the decryption keys...

[–] [email protected] 12 points 1 year ago (1 children)

You are truly a mastermind.

[–] [email protected] 9 points 1 year ago

What an excellent username for such a chain of comments

[–] kakes 10 points 1 year ago

SHA-256

[–] [email protected] 61 points 1 year ago* (last edited 1 year ago) (9 children)

Anyone willing to drop some learning on a lay person? Is encrypted data less compressible because it lacks the patterns compression relies on?

Or, is it less secure to encrypt first because smart people things? I know enough about cryptography to know I know fuck all about cryptography.

[–] Dogeek 60 points 1 year ago

ELI5 : Take the string AAAA.

A simple cipher would be to shift each letter forward in the alphabet, increasing the offset by 1 for each letter, so the message would encrypt to ABCD.

If you try to compress that, well you can't do it, otherwise you lose required information.

If you were to compress AAAA first, you could represent it as the string 4A. You can then encrypt that to 5B.

Encrypting is about adding entropy to a message. Compressing is about finding common groups and representing them differently so that the size is lower. Compressing an encrypted message is basically useless because you added so much entropy to the message that there are no more recognizable patterns to apply compression to.
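
For anyone who wants to try the AAAA example, here is a minimal Python sketch. It assumes run-length encoding as the "compression" and a toy position-shift cipher (uppercase A-Z only) as the "encryption"; the names `rle` and `position_shift` are made up for illustration, and neither is a real-world scheme.

```python
from itertools import groupby


def rle(text: str) -> str:
    """Run-length encode: each run becomes <count><char>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def position_shift(text: str) -> str:
    """Toy cipher: shift each letter A-Z forward by its index in the message."""
    return "".join(
        chr((ord(c) - ord("A") + i) % 26 + ord("A")) for i, c in enumerate(text)
    )


plain = "AAAA"
scrambled = position_shift(plain)

print(scrambled)        # ABCD
print(rle(plain))       # 4A
print(rle(scrambled))   # 1A1B1C1D
```

The plain message shrinks from 4 characters to 2, while the scrambled one grows from 4 to 8, because the cipher removed the run that the compressor depended on.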

[–] [email protected] 58 points 1 year ago (1 children)

It's not compressible at all. You will end up with a file that is larger than the original.

Whether compressing before you encrypt leaks information, or not compressing is what leaks, or it's irrelevant is complicated to decide and depends on the details of what you are doing. But encrypting and then compressing is a bit worse than useless, and always a mistake.

[–] [email protected] 26 points 1 year ago (2 children)

Also might be faster to encrypt a compressed file, since it will be smaller?

[–] [email protected] 13 points 1 year ago

Yes.

[–] [email protected] 5 points 1 year ago

Well, encryption tends to be either a very fast operation or something with a slow stage that doesn't depend on the size of your data. So although this is technically true, it's also not relevant.

[–] [email protected] 45 points 1 year ago (1 children)

It isn’t compressible at all, really. As far as a compression algorithm is concerned, it just looks like random data.

Imagine trying to compress a text file. Each letter normally takes 8 bits to represent. The computer looks at 8 bits at a time, and knows which character to display. Normally, the computer needs to look at all 8 bits even when those bits are “empty” simply because you have no way of marking when one letter stops and another begins. It’s all just 1’s and 0’s, so it’s not like you can insert “next letter” flags in that. But we can cut that down.

One of the easiest ways to do this is to count all the letters, then sort them from most to least common. Then we build a tree, with each character being a fork. You start at the top of the tree, and follow it down. You go down one fork for 0 and read the letter at your current fork on a 1. So for instance, if the letters are sorted “ABCDEF…” then “0001” would be D. Now D is represented with only 4 bits, instead of 8. And after reading the 1, you return to the top of the tree and start over again. So “01000101101” would be “BDBAB”. Normally that sequence would take 40 bits to represent, (because each character would be 8 bits long,) but we just did it in 11 bits total.

But notice that this also has the potential to produce letters that are MORE than 8 bits long. If we follow that same pattern I listed above, “I” would be 9 bits, “J” would be 10, etc… The reason we’re able to achieve compression is because we’re using the more common (shorter) letters a lot and the less common (longer) letters less.

Encryption undoes this completely, because (as far as compression is concerned) the data is completely random. And when you look at random data without any discernible pattern, counting the characters and sorting by frequency is basically an exercise in futility. All the letters will be used about the same, so even the “most frequent” characters are only more frequent by a little bit due to random chance. So now, even if the frequency order still corresponds to my earlier pattern, the number of Z’s is so close to the number of A’s that the file will end up even longer than before. Because remember, the compression only works when the most frequent characters are actually used most frequently. Since there are a lot of characters that are longer than 8 bits, and those characters are being used just as much as the shorter characters, our compression method fails and actually produces a file that is larger than the original.
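
The tree described above can be sketched in a few lines of Python. This is only an illustration of the scheme as written in this comment (in effect a unary code: the letter at rank n is n zeros followed by a 1), not how production compressors work; they typically use Huffman or arithmetic coding. The helper names and the `ORDER` string are assumptions for the demo.

```python
def build_codes(symbols_by_frequency: str) -> dict[str, str]:
    """Most common symbol gets '1', the next '01', then '001', and so on."""
    return {sym: "0" * rank + "1" for rank, sym in enumerate(symbols_by_frequency)}


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def decode(bits: str, symbols_by_frequency: str) -> str:
    out, zeros = [], 0
    for bit in bits:
        if bit == "0":
            zeros += 1                                # go one fork further down
        else:
            out.append(symbols_by_frequency[zeros])   # read the letter at this fork
            zeros = 0                                 # return to the top of the tree
    return "".join(out)


ORDER = "ABCDEFGHIJ"   # assumed frequency order, most common first
codes = build_codes(ORDER)

print(decode("01000101101", ORDER))       # BDBAB
print(len(encode("BDBAB", codes)))        # 11
print(len(codes["I"]), len(codes["J"]))   # 9 10

# If all 256 byte values are equally likely (as in encrypted data),
# the average code length is the mean of 1..256.
uniform_average = sum(rank + 1 for rank in range(256)) / 256
print(uniform_average)                    # 128.5
```

With uniformly distributed bytes, this particular scheme averages 128.5 bits per byte instead of 8, which is the "larger than the original" effect in numbers.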

[–] [email protected] 11 points 1 year ago

I understood this despite being drunk, thank you for the excellent explanation!

[–] [email protected] 33 points 1 year ago

Any good encryption should make data look random. Looking for patterns in encrypted data is one of the most basic steps in breaking an encryption scheme. Therefore, good encryption should make data almost incompressible, as in it's so random that compression does not reduce the size.

[–] [email protected] 30 points 1 year ago* (last edited 1 year ago) (1 children)

Lossless compression algorithms aren’t magical, they can’t make everything smaller (otherwise it would be possible to have two different bits of input data that compress to the same output). So they all make some data bigger and some data smaller, the trick is that the stuff they make smaller happens to match common patterns. Given truly random data, basically every lossless compression algorithm will make the data larger.

A good encryption algorithm will output data that’s effectively indistinguishable from randomness. It’s not the only consideration, but often the more random the output looks, the better the algorithm.

Put those two facts together and it’s pretty easy to see why you should compress first then encrypt.
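
A rough way to see both facts at once, using only the Python standard library. The `toy_encrypt` function below is a stand-in written for this demo (XOR with a SHA-256 counter keystream); it is NOT a secure cipher and only serves to produce random-looking bytes. The key and sample text are likewise made up.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


text = b"the quick brown fox jumps over the lazy dog. " * 200
key = b"hypothetical demo key"

compress_then_encrypt = toy_encrypt(zlib.compress(text), key)
encrypt_then_compress = zlib.compress(toy_encrypt(text, key))

print(len(text))                    # original size
print(len(compress_then_encrypt))   # far smaller than the original
print(len(encrypt_then_compress))   # slightly larger than the original
```

Compressing first shrinks the repetitive text dramatically, and the toy cipher keeps that length unchanged. Compressing the random-looking ciphertext only adds zlib's framing overhead.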

[–] bastian_5 7 points 1 year ago

And the fact that it can grow data means you should really add a check to make sure that the compressed data is actually smaller... I once had something refuse to let me upload a file that was well below their 8 MB file limit while claiming it was above the limit, and I'm assuming it was because they were testing the size after compression and that file grew from 6 MB to above the limit.
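
One way such a check could look, as a sketch only: compress, compare sizes, and fall back to storing the raw bytes. The one-byte `RAW`/`DEFLATED` markers and the `pack`/`unpack` names are a hypothetical format invented for this example, not something any particular service uses.

```python
import os
import zlib

RAW, DEFLATED = b"\x00", b"\x01"   # hypothetical one-byte format markers


def pack(data: bytes) -> bytes:
    """Compress only if it helps; otherwise store the bytes unchanged."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return DEFLATED + compressed
    return RAW + data


def unpack(blob: bytes) -> bytes:
    marker, body = blob[:1], blob[1:]
    return zlib.decompress(body) if marker == DEFLATED else body


random_like = os.urandom(4096)   # stands in for encrypted or already-compressed data
repetitive = b"abc" * 2000

for sample in (random_like, repetitive):
    packed = pack(sample)
    assert unpack(packed) == sample          # round trip is lossless either way
    print(len(sample), "->", len(packed))
```

With this approach the worst case costs one extra byte for the marker, rather than whatever the compressor happens to add.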

[–] [email protected] 9 points 1 year ago (1 children)

In an ideal encryption, the resulting data should be indistinguishable from random when doing statistical analysis.

So yes, such data will be really hard to compress, so typically compression is done before encryption.

Now here's a twist. Compressing before encrypting can reveal some details about the encrypted data. This is especially true if an attacker has a way to get messages encrypted that contain data they control alongside the secret information (for example, some kind of token).
There have been attacks on this, for example https://en.wikipedia.org/wiki/CRIME or https://en.wikipedia.org/wiki/BREACH (this was during that idiotic phase where vulnerabilities had those lame-ass names and they even created webpages)

Ideally compression would be done after encryption, but because of issues described earlier, that wouldn't give any benefit.

[–] [email protected] 4 points 1 year ago

idiotic phase where vulnerabilities had those lame-ass names and they even created webpages

Bro what are you talking about? These names are... badass! Like, let's do CRIME!

[–] [email protected] 5 points 1 year ago

Encrypted data don't compress well.

[–] [email protected] 4 points 1 year ago

Anyone willing to drop some learning on a lay person? Is encrypted data less compressible because it lacks the patterns compression relies on?

No, I can't because you just answered your own question perfectly there. If there are any patterns detectable at all in your ciphertext that's a very bad sign.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

Also a lay person. Maybe it's because of the time complexity. Let's say you have data of size 10 (any unit) which becomes size 8 after compression.

If you encrypt first: Encrypt size 10 of data -> Compress size 10 of data

If you compress first: Compress size 10 of data -> Encrypt size 8 of data

So the second way is faster, as the compression still takes the same size of input but the encryption takes a smaller input.

[–] [email protected] 23 points 1 year ago (2 children)

The real question is do you encrypt-and-sign or sign-and-encrypt?

[–] [email protected] 30 points 1 year ago

Encrypt then sign. Always authenticate before any other operations like decryption. Don't violate the cryptographic doom principle.

[–] [email protected] 29 points 1 year ago

Encrypt then sign. Verification is often much faster than (or at worst as fast as) decryption. The signature can also be verified without the decryption key, making it possible to verify the data along the way.
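
A minimal sketch of the "authenticate first, decrypt second" order, with loud caveats: it uses an HMAC (a symmetric MAC) as a stand-in for a signature, and `toy_stream_xor` is a toy cipher that is NOT secure. The function names and keys are hypothetical; real code should use a vetted authenticated-encryption library instead.

```python
import hashlib
import hmac


def toy_stream_xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(plaintext: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    ciphertext = toy_stream_xor(plaintext, enc_key)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()  # tag covers the ciphertext
    return tag + ciphertext


def open_sealed(blob: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
    tag, ciphertext = blob[:32], blob[32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # authenticate first...
        raise ValueError("authentication failed")
    return toy_stream_xor(ciphertext, enc_key)   # ...and only then decrypt


enc_key, mac_key = b"hypothetical enc key", b"hypothetical mac key"
blob = seal(b"attack at dawn", enc_key, mac_key)
print(open_sealed(blob, enc_key, mac_key))       # b'attack at dawn'

tampered = blob[:-1] + bytes([blob[-1] ^ 1])     # flip one bit of the ciphertext
try:
    open_sealed(tampered, enc_key, mac_key)
except ValueError as err:
    print(err)                                   # authentication failed
```

Note that the check in `open_sealed` only needs `mac_key`, so a party without `enc_key` could still verify the blob, which matches the point about verifying data along the way.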

[–] [email protected] 10 points 1 year ago* (last edited 1 year ago) (2 children)

Don't compress encrypted data since it opens you up to attacks like CRIME, unless it's at rest and static data.

[–] bastian_5 17 points 1 year ago (1 children)

If that's true, what's to stop someone else from just compressing it themself and opening the same attack vector?

[–] [email protected] 4 points 1 year ago (1 children)

Compressing what themselves? Compress then encrypt leaks information about the data being encrypted if an adversary can affect some part of the data being encrypted. If the data is at rest and no repeated encryptions are needed, then this isn't a concern.

[–] bastian_5 13 points 1 year ago (1 children)

Compress the encrypted data. You're talking about encrypting compressed data, this was talking about compressing encrypted data.

[–] [email protected] 2 points 1 year ago (1 children)

Technically you would be fine to compress the encrypted data, but encrypted data doesn't compress well so it's not really worth your time

[–] bastian_5 1 points 1 year ago (1 children)

Depends on if you're using lossless or lossy compression. Lossless compression will usually make it bigger, because it relies entirely on data being formatted so there are common patterns or elements that can be described with fewer parts. Like, an ok compression algorithm for a book written in English and stored as Unicode would be to convert it to ASCII and have a thing that will denote Unicode if there happens to be anything that can't convert. An encrypted version of that book would look indistinguishable from random characters, so compressing it at that point would just put that Unicode denoter before every single character, making the book end up taking more space.

[–] [email protected] 1 points 1 year ago (1 children)

The problem is that when you compress before you encrypt, the file size becomes a source of data about the contents. If an attacker has control of part of the data - say - a query string, they can use that to repeatedly add things to your data and see how the size changes as a result.
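
Here is a toy illustration of that size side channel in Python. Everything in it is hypothetical (the `SECRET` value, the request layout, the `observed_length` helper), and it skips the encryption step on the assumption that the cipher does not hide the message length, which holds for a plain stream cipher.

```python
import zlib

SECRET = "sessionid=7f3a9c2e1b5d4f60"   # hypothetical secret the attacker wants


def observed_length(attacker_query: str) -> int:
    """Size of the compressed request, which is what an eavesdropper could measure."""
    request = f"GET /search?q={attacker_query} HTTP/1.1\r\nCookie: {SECRET}\r\n"
    return len(zlib.compress(request.encode()))


right = observed_length("sessionid=7f3a9c2e1b5d4f60")   # guess equals the secret
wrong = observed_length("sessionid=q8w1z5x0k2m4n6p9")   # same length, wrong value

print(right, wrong)   # the matching guess produces the shorter request
```

When the guess matches, the compressor can replace the second copy with a back-reference, so the request gets shorter. A real attack would refine the guess a character or so at a time, where the difference per step is much smaller and noisier than in this all-or-nothing comparison.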

[–] bastian_5 2 points 1 year ago

So it sounds like compression before encryption should only be done in specific circumstances because it can be a security issue depending on use case, but encryption before compression should never be done because it will almost always increase the size of the file

[–] [email protected] 15 points 1 year ago (1 children)

Encrypted data cannot be compressed anyway

[–] [email protected] 0 points 1 year ago

It can. Just not losslessly. Which means it can't.

[–] astarob 1 points 1 year ago

Don’t know about gz but zip files can be encrypted using passwords