Speex

joined 1 year ago
[–] Speex 1 points 1 year ago

It’s on the list of places for us to visit for sure.

[–] Speex 1 points 1 year ago
[–] Speex 2 points 1 year ago

Nope. Not a financially viable thing to do atm.

Good luck at your first event. Hope it’s smooth and filled with appropriate amounts of available hydration opportunities.

[–] Speex 2 points 1 year ago

I wonder if it’s an “Alonso knows who Dad is” thing. On the other hand, they all do seem to be working well together and making excellent progress. Either way, it’s good to see AM getting some podiums.

[–] Speex 2 points 1 year ago

:( sad but true.

[–] Speex 3 points 1 year ago

Agreed. It’s wild they have no consistency this year. I hope it changes. Either way still rooting for them.

I got the rookie joke mixed up with De Vries from earlier in the year. My bad, just ignore that one.

[–] Speex 3 points 1 year ago

NP.

I haven’t really looked. There is a DevOps community, I think. I haven’t seen any SRE (site reliability engineering) or monitoring communities; one will probably pop up sooner rather than later.

[–] Speex 6 points 1 year ago (2 children)

This is pretty accurate to what I do professionally.

The point made here about the average user experience is super, super important. It’s good to know what it is for several reasons, mainly performance tuning. But when it comes to preventing disasters, the middle isn’t useful: an average can look healthy while the slowest requests are failing users.
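To make the average vs. outliers point concrete, here’s a minimal Python sketch with made-up latency numbers (the function name `summarize_latency` and the data are illustrative, not from any particular tool). The mean looks tame while p99 shows the requests that are actually hurting.

```python
import statistics


def summarize_latency(samples_ms):
    """Return mean, median (p50), and p99 of latency samples in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": statistics.median(samples_ms),
        "p99": cuts[98],
    }


# Hypothetical data: 980 fast requests and 20 very slow ones (2% of traffic).
samples = [100.0] * 980 + [5000.0] * 20
print(summarize_latency(samples))
# → {'mean': 198.0, 'p50': 100.0, 'p99': 5000.0}
```

A 198 ms average wouldn’t wake anybody up, but one in fifty users is waiting five seconds. That’s the tail you watch for disasters; the average is for tuning.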

Another thing to add, and this came to me recently: there are two kinds of graphs and dashboards, those for technical folks and those for managers and non-technical folks. You want to develop both, or build one with variables that let you simplify the graphs/dashboard. Annotations and good titles are good, IMHO. Some folks prefer technical graph titles. I get the appeal, but I have to deal with multiple leads, C-levels, project managers, and managers who don’t care about the technical stat, just where it is compared to where it should be.

[–] Speex 16 points 1 year ago (4 children)

I can give a brief(ish) overview, sure.

Monitor everything :P

But really, monitor meaningfully. CPU usage matters, but high CPU usage on its own doesn’t indicate an issue, and neither does high load.
High CPU for a long period of time, or outside normal time frames, does mean something. High load outside normal usage times, or while the service isn’t even running, could indicate an issue. Understand your key metrics and what they mean for failures, end-user experience, and business expectations.
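Here’s a rough sketch of the “sustained, off-hours, or service stopped” idea in Python. All the thresholds, the business-hours window, and the names (`CpuSample`, `cpu_findings`) are assumptions for illustration; you’d derive real numbers from your own baseline.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical thresholds; derive real ones from your own baseline.
HIGH_CPU_PCT = 90.0            # "high" CPU
ELEVATED_CPU_PCT = 50.0        # noticeable load
SUSTAINED_SAMPLES = 5          # e.g. five one-minute samples in a row
BUSINESS_HOURS = range(8, 19)  # 08:00-18:59 local time, an assumption


@dataclass
class CpuSample:
    ts: datetime
    cpu_pct: float


def cpu_findings(samples, service_running=True):
    """Return reasons recent CPU readings look abnormal; an empty list means fine."""
    if not samples:
        return []
    findings = []
    recent = samples[-SUSTAINED_SAMPLES:]
    # A single spike is ignored; only a full window at or above the threshold counts.
    if len(recent) == SUSTAINED_SAMPLES and all(s.cpu_pct >= HIGH_CPU_PCT for s in recent):
        findings.append("sustained high CPU")
    latest = samples[-1]
    if latest.cpu_pct >= ELEVATED_CPU_PCT:
        if latest.ts.hour not in BUSINESS_HOURS:
            findings.append("high load outside normal hours")
        if not service_running:
            findings.append("load while the service is stopped")
    return findings
```

A lone 99% spike at noon comes back clean, while five straight minutes at 95%, or 60% at 3 a.m., gets flagged.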

Start all projects with monitoring in mind: the earlier you begin monitoring, the easier it is to implement. Reconfiguring code and infrastructure after the fact is a lot of technical debt. If you are willing, and can guarantee that debt will be handled at a later time, then good luck. But we know how projects go.

Assign flags to calls. If your application produces a response to a request that starts from and ends up at an end user, send an identifying flag (a request ID) with it. Let that flag travel through the entire call and you’ll be able to break down traces and find failures. Failures don’t have to be error-outs or timeouts: a call that takes 10x longer than the rest of the calls can cascade, and it exposes inefficiency and reliability problems.
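A minimal Python sketch of the flag idea, using only the standard library: a `contextvars` variable holds the request ID, a logging filter stamps it on every line, the same ID is handed to downstream calls, and calls 10x slower than a baseline get a warning. The `X-Request-ID` header name is a common convention, not a standard (W3C Trace Context’s `traceparent` is the standardized one), and `handle_request`/`baseline_s` are hypothetical names.

```python
import contextvars
import logging
import time
import uuid

# ID of the request being handled in the current context ("-" when none).
request_id_var = contextvars.ContextVar("request_id", default="-")
SLOW_FACTOR = 10  # flag calls at least 10x slower than the baseline


class RequestIdFilter(logging.Filter):
    """Stamp every record passing through the handler with the current request ID."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(name)s %(message)s"))
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)


def handle_request(headers, work, baseline_s):
    """Run `work` under a request ID, pass the ID downstream, and flag slow calls."""
    # Reuse the caller's ID if it sent one, otherwise mint a new one.
    rid = headers.get("X-Request-ID") or uuid.uuid4().hex
    token = request_id_var.set(rid)
    start = time.perf_counter()
    try:
        # Downstream calls get the same header so the whole trace can be stitched together.
        result = work({"X-Request-ID": rid})
        return rid, result
    except Exception:
        log.exception("request failed")
        raise
    finally:
        elapsed = time.perf_counter() - start
        if baseline_s > 0 and elapsed >= SLOW_FACTOR * baseline_s:
            log.warning("slow call: %.3fs vs baseline %.3fs", elapsed, baseline_s)
        request_id_var.reset(token)
```

In practice a tracing library (OpenTelemetry etc.) does this propagation for you; the sketch just shows what’s going on underneath.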

Spend time on logging and error handling. These are your gatekeepers to troubleshooting. The more time you spend up front making them valuable, the less time you’ll spend digging through them when shit hits the fan.
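One way to make logs valuable up front is structured (JSON) logging with the context you’ll need later. This is a sketch using Python’s standard `logging` module; the `context` field and the `charge` function are hypothetical examples, not any library’s convention.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object so logs can be parsed, not just grepped."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields passed as extra={"context": {...}}; "context" is our own convention.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        if record.exc_info:
            entry["error"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("orders")
log.addHandler(handler)
log.setLevel(logging.INFO)


def charge(order_id, amount):
    """Hypothetical business call showing what a useful error log carries."""
    try:
        if amount <= 0:
            raise ValueError(f"invalid amount {amount}")
        return True
    except ValueError:
        # Record which order, what input, and the traceback: what you'll want at 3 a.m.
        log.exception("charge failed", extra={"context": {"order_id": order_id, "amount": amount}})
        raise
```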

Alerts and monitors MUST mean something. Alert fatigue is real; you experience it every day, I’m sure. That email with some kind of daily/weekly status information that gets right-clicked and marked as read? That’s alert fatigue. Alerts should be made in a way that scales with severity:

  • Take a look as time allows - logs with potential issues
  • Investigate, something could be wrong - warnings
  • Shit’s down, fix it - alerts
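The three tiers above can be sketched as a simple severity router. The error-rate thresholds and destination names here are made up for illustration; the point is that only the top tier pages anyone.

```python
from enum import Enum


class Tier(Enum):
    LOOK = "take a look as time allows"  # logs with potential issues
    INVESTIGATE = "investigate"          # warnings
    FIX_IT = "fix it now"                # alerts


# Hypothetical destinations; use whatever your team actually watches.
ROUTES = {
    Tier.LOOK: "daily log review",
    Tier.INVESTIGATE: "team chat channel",
    Tier.FIX_IT: "on-call pager",
}


def classify(error_rate, service_up=True):
    """Map an error rate (0.0-1.0) and up/down status to a tier, or None if quiet."""
    if not service_up or error_rate >= 0.05:
        return Tier.FIX_IT
    if error_rate >= 0.01:
        return Tier.INVESTIGATE
    if error_rate > 0:
        return Tier.LOOK
    return None


def route(error_rate, service_up=True):
    """Return where a signal should go, or None if nothing needs attention."""
    tier = classify(error_rate, service_up)
    return ROUTES[tier] if tier is not None else None
```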

APM (application performance monitoring) matters. Collect that data: you want to see everything from processor usage to response times, latency, and performance. These metrics will help you identify not only alerting opportunities but also efficiency opportunities. We know users can be fickle. How long are people willing to sit and wait for a webpage to load? Unlike in the 1990s, 10-30 seconds is not groovy. Use the metrics and try to compare and marry them with business key performance indicators (KPIs). What is the business side looking for to show things are successful? How can you use application metrics and server metrics to match their KPIs?
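One established way to turn raw response times into a number the business side can track is the Apdex score (an open standard, and just one option): requests at or under a target time T count as satisfied, those up to 4T count half, and anything slower counts zero. The 0.5 s target and the sample times below are assumptions.

```python
def apdex(response_times_s, target_s=0.5):
    """Apdex = (satisfied + tolerating / 2) / total, with tolerating up to 4x the target."""
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= target_s)
    tolerating = sum(1 for t in response_times_s if target_s < t <= 4 * target_s)
    return (satisfied + tolerating / 2) / len(response_times_s)


# Made-up page loads: 6 fast, 2 meh, 2 "1990s dial-up" slow.
times = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 1.2, 1.8, 12.0, 30.0]
print(round(apdex(times), 2))
# → 0.7
```

A KPI like “Apdex stays at or above 0.9” (a hypothetical target) is something both the engineers and the C-levels can read off the same graph.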

Custom scripts are great. They are part of the cycle that companies go through:
Custom scripts to monitor -> too much to maintain, not enough staff -> SaaS and off-the-shelf solutions (Datadog, SolarWinds, Prometheus, Grafana, New Relic) -> company gets huge, SaaS costs get high, and the tools don’t accurately monitor our own custom applications -> and we’re back to custom scripts. Netflix, Google, and Twitter all have custom monitoring tools.
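For a taste of where the cycle usually starts, here’s a tiny custom check script in Python (standard library only). The thresholds and function names are hypothetical; the 0/1/2 return codes follow the Nagios plugin convention (OK/WARNING/CRITICAL), which a lot of schedulers and monitoring tools understand.

```python
import shutil

# Hypothetical thresholds, as percent of the disk used.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def check_disk(path="/", warn=WARN_PCT, crit=CRIT_PCT):
    """Return (exit_code, message) for disk usage at `path`: 0 OK, 1 WARNING, 2 CRITICAL."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit:
        return 2, f"CRITICAL: {path} {used_pct:.1f}% used"
    if used_pct >= warn:
        return 1, f"WARNING: {path} {used_pct:.1f}% used"
    return 0, f"OK: {path} {used_pct:.1f}% used"


def main(argv):
    """Check the path given as the first argument (default "/") and return the exit code."""
    code, message = check_disk(argv[0] if argv else "/")
    print(message)
    return code
```

Run it from cron or a scheduler with something like `sys.exit(main(sys.argv[1:]))`. Multiply this by a few hundred checks and you can see why teams eventually go looking for a SaaS tool.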

Many of the SaaS solutions are low cost, with flexible options and even free tiers. The open source solutions also include excellent, industry-grade tools. All solutions require the team to actively work on them in a collaborative way. Buy-in is required for successful monitoring, alerting, and incident response.

Log everything, parse it all, win.
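“Parse it all” in its smallest form: a Python sketch that counts log levels and errors per component, and keeps track of lines it couldn’t parse. The line format (`<timestamp> <LEVEL> <component> <message>`) is an assumption; adjust the regex to whatever your logs actually look like.

```python
import re
from collections import Counter

# Assumed line format: "<ISO timestamp> <LEVEL> <component> <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>\S+)(?:\s+(?P<msg>.*))?$"
)


def summarize(lines):
    """Count levels, errors per component, and lines that didn't match the format."""
    levels = Counter()
    errors_by_component = Counter()
    unparsed = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            unparsed += 1  # unparseable lines are a signal too
            continue
        levels[m["level"]] += 1
        if m["level"] in ("ERROR", "CRITICAL"):
            errors_by_component[m["component"]] += 1
    return levels, errors_by_component, unparsed
```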

[–] Speex 19 points 1 year ago (7 children)

You got some monitoring in place? I can offer some help with monitoring ideas if you need it; it’s part of what I do.

Also, take care of yourself. We can go outside if we can’t log in. Or go back to work.

[–] Speex 6 points 1 year ago (8 children)

2002 Hayabusa, 2018 BMW R1200GS Rallye, 2021 GasGas 280TXT

[–] Speex 1 points 1 year ago

Gotta get that oh-so-satisfying poutine.
