Forum Discussion

Mike_B
Neophyte
21 days ago

Alerting across multiple sites with different business hours

Hi team,

Got a dilemma I'm hoping someone can help with.

We have a customer with 200+ sites across multiple time zones.

Some sites in the same time zone share opening hours, but more of them do not.

In the past we have used the earliest opening and latest closing times in each time zone as the active window for the escalation chain.
But this has meant our engineers get alert callouts while many of the sites are closed and we cannot perform any troubleshooting on site.

We need to stop this without making the job too onerous for our admin team.

Our idea is to have the customer manage some properties on each site device that specify the business hours for each day.
Then we want alerting and escalation to reference those properties and ONLY alert and escalate within the specified business hours.
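To make the idea concrete, here is a minimal sketch of the check such properties would need to support. The property names (`site.timezone`, `site.hours.mon` and so on) and the `HH:MM-HH:MM` value format are assumptions for illustration only, not anything LogicMonitor defines, and the sketch does not handle opening periods that run past midnight.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def parse_hours(value):
    """Parse 'HH:MM-HH:MM' into (open, close); empty or 'closed' gives None."""
    value = value.strip().lower()
    if not value or value == "closed":
        return None
    start, end = value.split("-")
    return time.fromisoformat(start), time.fromisoformat(end)

def is_open(props, now_utc):
    """Return True if now_utc falls inside the site's hours for its local day."""
    # Convert to the site's own time zone first, so the weekday is the local one.
    local = now_utc.astimezone(ZoneInfo(props["site.timezone"]))
    hours = parse_hours(props.get(f"site.hours.{DAYS[local.weekday()]}", ""))
    if hours is None:
        return False
    open_t, close_t = hours
    return open_t <= local.time() < close_t

# Hypothetical properties the customer would maintain on a site device.
example_props = {
    "site.timezone": "Australia/Sydney",
    "site.hours.mon": "09:00-17:30",
    "site.hours.sun": "closed",
}
```

Calling `is_open(example_props, datetime.now(timezone.utc))` would answer "is this site open right now?"; where that check would run (a script, a group membership rule, or an external tool) is the open question in this thread.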

Anyone know how we can do this?

Thanks.

  • To throw it out there, perhaps a 3rd party service would work better for implementing that logic, something like OpsGenie or PagerDuty? Not endorsing either, but they sound like they were designed with notification scheduling in mind.

  • There's no way you can let a user edit some properties and not others. In fact, in order to edit any property, they'll need full RW access on the device. Alternatively, you could make a simple front end where they use a form to modify a record; you'd then need some code to sync that choice into the property on the device (not too complicated). Once you figure out how that property will get set...

    That property can drive group membership. You can have bespoke alert rules per time zone that apply to those dynamic groups. Each rule can point to a different escalation chain.

    • Mike_B
      Neophyte

      Thanks Stuart,

      I like the dynamic group idea and rules for each.
      I might play around with that idea a bit more.

      I was hoping to not have to do a heap of rules and escalation chains.

      The main issue is managing the rules and escalation chains.

      I've created a matrix and gone through every day's opening hours for every site.
      I've done this across every time zone.
      Then I've looked for and marked any duplicate open periods. This lets me consolidate some sites into a single rule / chain.

      But I'm still left with 151 unique sets of opening hours across those sites, each needing rules and chains based on the time zone it's in.
      And then we hit a problem if a site's opening hours change to something outside what we've created dynamic groups for. This is a likely possibility, as the customer is a large retail brand across multiple countries.
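The consolidation step described above (finding sites with identical opening hours in the same time zone) can be sketched roughly as follows. The input shape, a dict of site name to a time zone plus seven daily hour strings, is an assumption about how the matrix might be exported, not a description of the actual spreadsheet.

```python
from collections import defaultdict

def group_by_schedule(sites):
    """Group site names by (time zone, weekly hours) so each group needs one rule."""
    groups = defaultdict(list)
    for name, info in sites.items():
        # Two sites share a group only if the zone and all seven days match.
        key = (info["timezone"], tuple(info["hours"]))
        groups[key].append(name)
    return dict(groups)

# Hypothetical sample of the matrix: Monday to Sunday hours per site.
weekdays_9_to_5 = ["09:00-17:00"] * 5 + ["closed", "closed"]
sample_sites = {
    "store-001": {"timezone": "Europe/London", "hours": weekdays_9_to_5},
    "store-002": {"timezone": "Europe/London", "hours": weekdays_9_to_5},
    "store-003": {"timezone": "Europe/Paris", "hours": weekdays_9_to_5},
}
schedule_groups = group_by_schedule(sample_sites)
```

Re-running this whenever the hours change shows straight away whether a site has moved into a combination that has no group yet.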

      Tbh, I'm not liking my chances of simplifying the rules & chains....and am probably just putting off the inevitable.

      Keen to hear any other ideas about this.

      Cheers,

      • Stuart_Weenig
        Mastermind

        It wouldn't be too difficult to maintain those via script through the API. Your spreadsheet could be the input and your script could manage the alert rules based on what's in the spreadsheet.
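A rough sketch of the spreadsheet-driven approach suggested here: read the spreadsheet (exported as CSV), work out which rules should exist, and compare that with what exists today. The column names, the rule naming scheme and the `existing` dict are all hypothetical, and the sketch deliberately stops at producing a plan; the actual create/update/delete calls against the API are left out rather than guessed at.

```python
import csv
import io

def desired_rules(csv_text):
    """Build {rule_name: sorted site list} from rows with site,timezone,hours columns."""
    rules = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical naming scheme: one rule per time zone and hours combination.
        name = f"bh-{row['timezone']}-{row['hours']}"
        rules.setdefault(name, []).append(row["site"])
    return {name: sorted(sites) for name, sites in rules.items()}

def plan_changes(desired, existing):
    """Compare desired rules with existing ones and list what needs to change."""
    both = set(desired) & set(existing)
    return {
        "create": sorted(set(desired) - set(existing)),
        "update": sorted(n for n in both if desired[n] != existing[n]),
        "delete": sorted(set(existing) - set(desired)),
    }

# Hypothetical spreadsheet export and current state.
sheet = """site,timezone,hours
a,Europe/London,09:00-17:00
b,Europe/London,09:00-17:00
c,Europe/Paris,10:00-18:00
"""
existing = {"bh-Europe/London-09:00-17:00": ["a"], "bh-old": ["z"]}
plan = plan_changes(desired_rules(sheet), existing)
```

Keeping the diff separate from the API calls means the plan can be reviewed before anything is changed, which matters when 150+ rules are being managed by a script.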

  • Hi Mike, I had the same issue. I signed up for PagerDuty (free account), connected LM to the PagerDuty account, and created escalation policies in PagerDuty to forward the alerts to different teams (in different time zones). I also configured a backup, so that if no one is available or acknowledges the alert in the PagerDuty app, it is routed to someone else.