this post was submitted on 21 Jul 2024
199 points (78.2% liked)


This is an unpopular opinion, and I get why – people crave a scapegoat. CrowdStrike undeniably pushed a faulty update that demanded a low-level fix (booting each affected machine into recovery). However, this incident lays bare the fragility of corporate IT, particularly at companies entrusted with vast amounts of sensitive personal information.

Robust disaster recovery plans, including automated processes to remotely reboot and remediate thousands of machines, aren't revolutionary. They're basic hygiene, especially when considering the potential consequences of a breach. Yet, this incident highlights a systemic failure across many organizations. While CrowdStrike erred, the real culprit is a culture of shortcuts and misplaced priorities within corporate IT.
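
To make "basic hygiene" concrete, here's a rough sketch of what an automated remediation sweep could look like, assuming the fleet has some out-of-band management or MDM channel that still works when the OS won't boot. Everything here (the `FleetClient` class and its methods) is a hypothetical placeholder, not any vendor's real API; only the deleted channel-file path reflects the widely reported manual workaround.

```python
# Hypothetical sketch of an automated remediation sweep; FleetClient and its
# methods stand in for whatever out-of-band management / MDM API a fleet has.
import time
from concurrent.futures import ThreadPoolExecutor

# The widely reported manual workaround: boot to recovery and delete the
# faulty channel file under C:\Windows\System32\drivers\CrowdStrike.
REMOVE_BAD_FILE = r"del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"


class FleetClient:
    """Placeholder for a real fleet-management API (not a real library)."""

    def list_unhealthy_hosts(self) -> list[str]:
        raise NotImplementedError

    def reboot_to_recovery(self, host: str) -> None:
        raise NotImplementedError

    def run_recovery_command(self, host: str, command: str) -> bool:
        raise NotImplementedError


def remediate(client: FleetClient, host: str) -> bool:
    """Reboot one host into its recovery environment and remove the bad file."""
    client.reboot_to_recovery(host)
    time.sleep(120)  # crude wait for the recovery environment to come up
    return client.run_recovery_command(host, REMOVE_BAD_FILE)


def sweep(client: FleetClient, workers: int = 50) -> None:
    """Fan the fix out across every host the fleet reports as unhealthy."""
    hosts = client.list_unhealthy_hosts()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: remediate(client, h), hosts))
    print(f"remediated {sum(results)}/{len(hosts)} hosts")
```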

Too often, companies throw millions at vendor contracts, lured by flashy promises while neglecting the due diligence needed to ensure those solutions truly fit their needs. This is exacerbated by a corporate culture in which CEOs, vice presidents, and managers are more easily swayed by vendor kickbacks, gifts, and lavish trips than by innovative ideas with measurable outcomes.

This misguided approach not only results in bloated IT budgets but also leaves companies vulnerable to precisely the kind of disruptions caused by the CrowdStrike incident. When decision-makers prioritize personal gain over the long-term health and security of their IT infrastructure, it's ultimately the customers and their data that suffer.

[–] [email protected] 171 points 1 month ago (76 children)

Please, enlighten me how you'd remotely service a few thousand BitLocker-locked machines that won't boot far enough to get an internet connection, with non-tech-savvy users behind them. Pray tell what common "basic hygiene" practices would've helped, especially with CrowdStrike reportedly ignoring and bypassing the rollout policies set by their customers.

Not saying the rest of your post is wrong, but this stood out as easily glossed over.

[–] [email protected] 9 points 1 month ago (2 children)

Rollout policies are the answer, and CrowdStrike should be made an example of if they were truly overriding policies set by the customer.

It seems more likely to me that nobody expected a "fingerprint update" to have the potential to completely brick a device, and so none of the affected IT departments were setting staged rollout policies in the first place. Or if they were, they weren't adequately testing.

Then - after the fact - it's easy to claim that rollout policies were ignored when there's no way to prove it.

If there's some evidence that CS was indeed bypassing policies to force their updates I'll eat the egg on my face.
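
For anyone unfamiliar with the staged rollout policies being discussed here, the idea is just canary rings with a soak period before promoting an update further. A minimal sketch of that idea, with every name and threshold made up (this is not CrowdStrike's actual policy mechanism):

```python
# Illustrative ring-based rollout gate; every name and threshold is made up.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    name: str
    hosts: list[str]
    soak_hours: int  # how long the ring must run cleanly before promoting


RINGS = [
    Ring("canary", ["it-lab-01", "it-lab-02"], soak_hours=24),
    Ring("early", ["branch-office-hosts"], soak_hours=72),
    Ring("broad", ["everything-else"], soak_hours=0),
]


def deploy(update_id: str, hosts: list[str]) -> None:
    print(f"deploying {update_id} to {hosts}")  # placeholder for the real push


def rollback(update_id: str, hosts: list[str]) -> None:
    print(f"rolling back {update_id} on {hosts}")  # placeholder


def promote(update_id: str, healthy: Callable[[Ring, int], bool]) -> None:
    """Push an update ring by ring, halting at the first sign of trouble."""
    for ring in RINGS:
        deploy(update_id, ring.hosts)
        if not healthy(ring, ring.soak_hours):
            rollback(update_id, ring.hosts)
            raise RuntimeError(f"{update_id} failed in ring {ring.name}")


# e.g. promote("sensor-update-x", healthy=lambda ring, hours: True)
```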

[–] [email protected] 7 points 1 month ago* (last edited 1 month ago)

From what I've read and watched, that's the crux of the issue: did they push a 'content' update, i.e. signatures, or did they push a code update?

So you basically had a bunch of companies who absolutely do test all vendor code updates being slipped a code update they weren't aware of because it was labeled a 'content' update.

[–] [email protected] 1 points 1 month ago

I’m one of the admins who manage CrowdStrike at my company.

We have all automatic updates disabled, because when they were enabled (per the best-practices guide CrowdStrike gave us), CrowdStrike pushed out a version with a bug that overwhelmed our domain servers. Now we test everything through multiple environments before it reaches production, with at least two weeks of testing before we move a version to the next environment.

This was a channel file update, and per our TAM (technical account manager) and account managers in our meeting after this happened, there's no way to stop that file from being pushed, or to delay it. Supposedly they'll be adding that functionality now.
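
That two-weeks-per-environment rule boils down to a simple time-gated promotion check. A rough sketch of the idea (the environment names and example date are purely illustrative, not our actual tooling):

```python
# Sketch of a time-gated environment promotion check; names are illustrative.
from datetime import datetime, timedelta

ENVIRONMENTS = ["test", "staging", "production"]
MIN_SOAK = timedelta(weeks=2)  # the "at least two weeks" rule mentioned above


def can_promote(current_env: str, deployed_at: datetime, incidents: int) -> bool:
    """A version may advance only after a clean two-week soak in its environment."""
    if current_env == ENVIRONMENTS[-1]:
        return False  # already in production, nowhere left to promote
    soaked = datetime.now() - deployed_at >= MIN_SOAK
    return soaked and incidents == 0


# e.g. can_promote("test", deployed_at=datetime(2024, 7, 1), incidents=0)
```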
