this post was submitted on 20 Jul 2024
158 points (97.6% liked)


Here are the details about what went wrong on Friday.

[–] gravitas_deficiency 43 points 1 month ago (5 children)

I feel like that’s not even close to what the real number is, considering the impact it had.

[–] [email protected] 28 points 1 month ago

If this figure is accurate, the massive impact was likely due to collateral damage. If this took down every server at an enterprise and left most of the workstations online, then those workstations were still basically paperweights.

[–] [email protected] 17 points 1 month ago* (last edited 1 month ago)

They have about 24,000 clients so that comes out to around 350 impacted machines per client which is reasonable. It only takes a few impacted machines for thousands of people to be impacted if they are important enough.
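
As a quick sanity check on the arithmetic above, here is a minimal Python sketch. Both inputs are the rough figures quoted in the comment, not exact counts, and the variable names are only illustrative.

```python
# Rough figures quoted in the comment above; neither is an exact count.
clients = 24_000
machines_per_client = 350

# Total number of impacted machines implied by those two estimates.
implied_total = clients * machines_per_client
print(f"{implied_total:,}")  # prints 8,400,000
```

So the two estimates together imply a total on the order of 8.4 million impacted machines.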

[–] [email protected] 8 points 1 month ago* (last edited 1 month ago) (1 children)

My brother’s work uses VMs, so if the server is down there’s probably 50k computers right there. But it’s only 1 affected computer.

[–] gravitas_deficiency 5 points 1 month ago (2 children)

As far as I know, none of the OSes used for virtualization hosts at scale by any of the major cloud infra players are Windows.

Not to mention: any company that uses any AWS, Azure, or GCP service is “using VMs” in one form or another (yes, I know I am hand-waving away the difference between VMs and containers). It’s basically what they build all of their other services on.

[–] [email protected] 3 points 1 month ago

No, but Hyper-V is used extensively in the SMB space.

VMware is popular for a reason, but it's also insanely expensive if you only need an AD server and a file share.

[–] [email protected] 2 points 1 month ago (1 children)

Banks use VMs, and banks were down, without access to their systems to log in to the VM so they could work. They were bricked by extension.

[–] gravitas_deficiency 1 points 1 month ago (1 children)

No, the clients were bricked. The VMs themselves were probably fine, and in fact probably automatically rolled back to a working savepoint after the update failed (assuming the VM infrastructure was properly set up).

[–] [email protected] 2 points 1 month ago* (last edited 1 month ago)

He couldn’t log in to the VM to access his work portals or emails. Call it what you will, but one bricked computer/server affected thousands.

It’s weird that you’re arguing when you asked how it was possible in the first place. VMs are the answer, dude. Argue all you want, but it’s making you look foolish for (A) not understanding and (B) arguing against the answer. Also, why this one thread? Multiple other people told you the exact same thing. You just looking for an argument here or something?

[–] [email protected] 6 points 1 month ago

I wonder if a large percentage of the impact is on internal-facing systems.

And we won't know until Monday.

[–] [email protected] 5 points 1 month ago (1 children)

That's how supply chains work. If a link in the chain is broken, the whole thing doesn't work. Also, 10% of major companies being affected is still giant. But you're here using online services, probably still buying bread, probably got fuel, probably playing video games. It's huge in the media, and it had massive effects, but there's heaps of things that just weren't even touched, including the things that information spread on. Like TV news networks seemingly kept going enough to report on it non-stop, unaffected. Tbh though, any good continuity and disaster recovery plan should handle this: impact, but continuity.

[–] [email protected] 3 points 1 month ago (1 children)

The only companies I have seen with workable BCDR plans are banks, and that is because they handle money for rich people. It wouldn't surprise me if many core banking systems are hyper-legacy as well.

I honestly think that a majority of our infrastructure didn't collapse because of the lack of security controls and shitty patch management programs.

Sure. Compliance programs work for some aspects of business but since the advent of "the cloud", BCDR plans have been a paperwork drill.

(There are probably some awesome places out there with quadruple-redundant networks with the ability to outlast a nuclear winter. I personally haven't seen them though.)

[–] [email protected] 3 points 1 month ago (1 children)

It's impossible to tell, and you're probably closer to the truth than not.

One fact alone: BCDR isn't an IT responsibility. Business continuity should be inclusive of things like: when your CNC machine no longer works, what do you do? Cause 1: power loss. Process: get the diesel generator backup running, following that SOP. Cause 2: broken. Process: get the mechanic over, or get the warranty action item list. Rely on the SLA for maintenance. Cause 3: network connectivity. Process: use USB, following the SOP.
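
To make the cause/process idea concrete, here is a minimal Python sketch of that CNC example as a lookup table. The names `CNC_RUNBOOK` and `next_step` are hypothetical, and the fallback message is an assumption added for illustration; a real plan would live in documented SOPs, not in code.

```python
# Hypothetical runbook for one asset, using the three causes from the comment above.
CNC_RUNBOOK = {
    "power loss": "Get the diesel generator backup running, following its SOP.",
    "broken": "Get the mechanic over, or get the warranty action item list; rely on the maintenance SLA.",
    "network connectivity": "Use USB, following the SOP.",
}


def next_step(cause: str) -> str:
    """Return the documented process for a cause, or flag that none exists."""
    # The fallback text is an assumption, not part of the original example.
    return CNC_RUNBOOK.get(cause, "No documented process: escalate and record the gap.")


print(next_step("power loss"))
print(next_step("flood"))
```

The useful part is the fallback: any cause that falls through to it is a gap in the plan that the business, not just IT, has to own.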

I've been a part of a half dozen or more of these over time, which is not that many for over 200 companies I've supported.

I've even done simulations, round-table "Dungeons & Dragons" style, with a person running the simulation, where different people have to follow the responsibilities in their documented process. Be it calling clients, customers, and vendors, or alerting their insurance, or posting to social media, all the way through to the warehouse manager using a Biro and a ruler to record incoming and outgoing stock by hand until systems are operational again.

So I only mention this because you talk about IT redundancy, but business continuity is not an IT responsibility, although it has a role. It's a business responsibility.

That further kind of proves your point, since anyone who's worked a decade without being part of a simulation, or at least contributing to improving one, has probably worked at companies that don't do them. That isn't their fault, but it's an indicator of how fragile business is and how little businesses are held accountable for it.

[–] [email protected] 2 points 1 month ago (1 children)

You aren't wrong about my description. My direct experience with compliance is limited to small/medium tech companies where IT is the business. As long as there is an alternate work location and tech redundancy, the business can chug along as usual. (Data centers are becoming rarer, so cloud redundancy is more important than ever.) Of course, there is still quite a bit that needs to be done depending on the type of emergency, as you described; it's just all IT-, customer-, and partner-centric.

Unfortunately, that does make compliance an IT function because a majority of the company is in some IT engineering function, less sales and marketing.

I can't speak to companies in different industries, whereas you can. When physical products and manufacturing are at stake, that is way outside the scope of what I could deal with.

[–] [email protected] 2 points 1 month ago (1 children)

Hmm, yeah. Thanks for sharing. Because of 15 odd years of IT Managed Services, I only have non-technical companies on the brain and in my world view I hadn't considered technology provider companies at all. They typically don't need managed service providers (right or wrong :p).

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

It gets worse. Tech companies are service providers that typically work with a chain of other service providers. About 40%-50% of the controls for the last SOC2 audit I ran were carved out and deferred to our service providers. (Also, there are limited applicable frameworks: SOC2, PCI, ISO 27001, HIPAA and HITRUST are common for me, but usually related to cloud services.)

Yeah, I tend to break the brains of auditors that have never dealt with startups and have been used to Fortune 500 mega-companies. What's funnier, is that I am just a lowly security engineer. A very experienced security engineer, but a lowly one nonetheless.

Auditor: So what is your documented process for this?

Me: Uhh, we don't have one?

Auditor: What about when X or Y catastrophic issue happens?

Me: Anyone just pushes this button and activates that widget.

Auditor: Ok. Uh. Is that process documented?

Me: Nope. We probably do it about 2-3 times a week anyway.

[–] [email protected] 2 points 1 month ago

Yeah, we do a lot around frameworks at my current place, and previously we worked directly with customers on the ISO and ACSC Essential Eight frameworks. For us, non-compliance = revenue opportunity. That means we are financially rewarded for aligning them and encouraged to do so. On that same note, I wrote up a checklist for "sysadmin best practices" aimed at driving reviews, checks, and remedial opportunities for small businesses, useful in that space. I got such an overwhelming amount of response in the MSP reddit from people asking in DMs about it (not hundreds, just dozens, too many for me though). It's quiet here on Lemmy. Happy to share my updated version of course, just I think if you're dealing in your sector it'll look like child's play lol. But I kind of want to encourage a bit of community among professionals here. I just don't want to spend time on it...

I feel you about the lowly experienced engineer bit though. An account manager or business development manager, or even a CTO, won't listen to me. I have a business degree; most of them don't. I try to apply critical decision making in my solutions and risk advisory. But the words fall on deaf ears. I take a small but very guilty pleasure in watching the very thing I warn against happen to both clients and my employers. Especially when the prevention was trivial and all it needed was any amount of attention.

After nearly 20 years of IT and about 15 in MSP I'm so tired. I'm very much resonating with that "lowly engineer" comment.