Recent Discussions
User defined "host dead" status
There are two ideas that I need help maturing before talking to LM about them. Both have to do with how LM uses server side logic to declare a device dead. We need the ability to designate what metric declares a device as dead/undead and when.
What
We have several customers who have devices, usually UPS, at remote sites all connected to a Meraki switch. The collector is not at the remote site, but connects over a VPN tunnel, which may be torn down due to inactivity or could be flaky for any other reason. When the VPN tunnel goes down, the devices alert that they have gone down. We have added monitoring to the tunnel and also get alerted when it goes down. However, we'd like to prevent the host down alerts when the only problem is that the VPN tunnel is down. RCA (or recently renamed DAM?) would likely solve this, but defining that mapping manually or through a topologysource is not scalable (plus visibility into the RCA logic has never been good).
Luckily, Meraki has an API where we can query the status of devices connected to the switch. During a tunnel outage, this API data shows that the device is still connected to the switch and online. Since it's a UPS, that's sufficient. We've built the datasource required to monitor the devices via the Meraki API. However, since it's a scripted datasource, it doesn't reset the idleInterval. (Insert link here to a really good training or support doc explaining how idleInterval works.) Since none of the qualifying datasources are working on the UPS during the VPN outage, the idleInterval eventually climbs high enough to trigger a host down alert. When the host is declared down, other alerts, like the alerts from this new Meraki Client Device Status DS, are suppressed.
How can this be remedied?
So, we need the supported and documented ability to use the successful execution of a collection script to reset the idleInterval. I know this is possible today as I've seen it in several of LM's modules.
However, I've never seen official documentation on how to do it. LM's probably worried someone will add it to all their scripts, which wouldn't be the right thing to do.
When
I know I'm not the only one. I need control over the server side logic that determines when the idleInterval declares a device dead. In the example above, we get a slew of host down alerts when the VPN tunnel goes down. However, usually within a few minutes, the VPN gets reestablished, the collector reestablishes connectivity to the device, and the idleInterval resets, thus clearing the alerts. With a normal datapoint, I'd just lengthen the alert trigger interval for the idleInterval datapoint. This would mean that the device would have to be down for 15 minutes, 20 minutes, or however long I want before generating the alert. What's great is that now we can do that on the group level, so I can target these devices specifically and not alert on them unless they've been down for a truly unacceptable amount of time (i.e. not just a VPN going down and coming right back up).
However, the idleInterval datapoint is an odd one. Two things happen. One happens when you surpass the threshold defined on the datapoint. I can't remember what the default is, but in my portal, that's > 300 seconds, or 5 minutes. The other happens at 6 minutes: server side logic, which has been inspecting the idleInterval, decides that the device is down, which has implications for suppressing other alerts on the device. As far as I can tell, lengthening the alert trigger interval on the idleInterval datapoint has no effect if the window would exceed the 6 minutes that the server side logic uses to declare the device down.
What do we need?
We need the ability to set the amount of time that the server side logic uses to declare the device down. We need to be able to set that for some devices and not others. So we need to be able to set it globally, on the group level, and on the device level.
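To make the ask concrete, here is a rough sketch of the override order described above: the most specific setting wins (device, then group, then global). The property name `host.dead.timeout.seconds` is made up for illustration and is not an existing LogicMonitor setting. The 360 second default is just the 6 minutes observed in my portal, the "greater than or equal" comparison is an assumption, and a device that belongs to several groups is simplified to a single set of group properties.

```python
# Hypothetical property name, used only for illustration.
PROP = "host.dead.timeout.seconds"

# The 6 minutes observed above, expressed in seconds (an observation, not a documented value).
GLOBAL_DEFAULT_SECONDS = 360


def effective_dead_timeout(device_props, group_props,
                           global_default=GLOBAL_DEFAULT_SECONDS):
    """Return the timeout in seconds; device overrides group, group overrides global."""
    for props in (device_props, group_props):
        value = props.get(PROP)
        if value is not None:
            return int(value)
    return global_default


def is_host_dead(idle_interval_seconds, timeout_seconds):
    """Assumed rule: the host is dead once idleInterval reaches the timeout."""
    return idle_interval_seconds >= timeout_seconds


# A UPS behind a flaky VPN, in a group that tolerates 20 minutes of silence.
ups_timeout = effective_dead_timeout({}, {PROP: "1200"})
print(ups_timeout, is_host_dead(400, ups_timeout))
```

With something like this, a short VPN blip (idleInterval of 400 seconds in the example) would not mark the UPS dead, while devices without the override would keep the existing 6 minute behavior.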
Preferably this could be set in the alert trigger interval on the idleInterval datapoint, since this mechanism already exists globally and on the group and device levels. Since that could be a confusing way of defining it (it's measured in poll cycles, not minutes/seconds), it could alternatively be done as a special property on the group(s)/device(s). I'm interested in hearing your thoughts, even if you are an LMer.
Anonymous, 40 minutes ago, 51 Views, 7 likes, 3 Comments
Any way to automate tasks via alerts?
Hi, I know LM doesn't support taking any action when there's an alert. However, I'm wondering if anyone has any neat ideas on how to accomplish something by way of some other automation program. Maybe Power Automate? Here's what I'm thinking. I have a module that monitors a service to see if it's running or not. If it's not, it generates an alert. This seems like the most basic thing to automate because all I'd want to do is run the start-service command to start it back up. So, I'm wondering if I can have the alert send an email to a certain address. That address could then watch for a certain email to arrive. It could then parse out the servername and service name and put that into a start-service -computername xxx -name yyy. Has anyone looked into doing anything like that and did you have any success? Thanks.
6 Views, 0 likes, 1 Comment
Bug Report: LogicModule Microsoft_SQLServer_GlobalPerformance
Active Discovery script
At the end, line 22 is missing ?: '.' ...meaning that it should be:
def jdbcConnectionString = hostProps.get("mssql.${instanceName.trim()}.mssql_url") ?: hostProps.get("auto.${instanceName.trim()}.mssql_url") ?: '.'
David_Bond, 2 days ago, Professor, 67 Views, 1 like, 8 Comments
UIv4 Lacks Parity
Thought I'd give UIv4 dashboards another try today. Lasted less than 30 seconds. Ops notes used to be visible on line graph widgets. They're not there anymore. You can get them to show up if you select the "show ops notes" button; it pops up a drawer from the bottom, covering up part of my dashboard. I can minimize it, but now I have a big bar of wasted space covering up part of my dashboard.
Anonymous, 2 days ago, 41 Views, 2 likes, 2 Comments
Resource Tree Deprecation?
I missed the recent virtual Roadmap Roadshow. When reviewing the summary notes from the session, I noticed the following statement regarding the resource tree.
ResourceExplorer: Say goodbye to the traditional resource tree! Our new Resource Explorer is designed for modern environments like Cloud and Kubernetes. It leverages metadata and tagging to help you quickly find and visualize resources.
I don't see how Resource Explorer can provide feature parity when it comes to filtering, setting custom/inherited properties, grouping devices, API query refinement, etc. If we lose this functionality, the product loses half its value. Is there anyone in the community that sat in for this session that can provide some more context?
Solved, Drelo, 2 days ago, Neophyte, 34 Views, 0 likes, 3 Comments
Excessive snmp requests with a community string I am not using
I have some switches that are getting hammered by a few of my collectors and I can't figure out why. The logs on them are full of this message:
snmp: ST1-CMDR: Security access violation from <Collector IP> for the community name or user name : public (813 times in 60 seconds)
I don't have "public" set for this set of switches anywhere and it is coming from my collectors. I don't have any netscans for the subnet they are on. In my portal everything looks normal for these switches. I'm not sure what else to be looking at to figure this out, anyone have any thoughts? Thank you!
pgordon, 3 days ago, Advisor, 53 Views, 1 like, 7 Comments
Can I monitor a Linux Process by something other than the name?
Hi, We have some servers with a certain process we need to monitor. It's a Java process that runs a specific Jar file. This is how it shows up in the "Add Other Monitoring" area: I have a LogicModule to monitor Java processes already, but it goes off the name. The name of this is just Java, but there could be lots of other things running in Java. I need to be able to just find the one with the imqbroker.jar part in there. Unfortunately, I don't know how to do that. Can it be done in LM easily so I can apply this to a bunch of servers? Thanks for any help.
20 Views, 0 likes, 1 Comment
New Support Portal Suggestions
Can the default sort order display the most recent tickets at the top? I do not need to see my oldest tickets at the top. Can a few date columns be added: Date Opened, Date Updated, Date Resolved? Then I want to be able to add them to or hide them from my list view. Can the ticket comments, correspondence and history be included in the email notification I receive when an update to a ticket has been made? I think it's important to be able to correspond on tickets via email. Can we have that back? For organizational tickets, I see a contact name field, but in my portal they are all blank. Will those be populated?
Shack, 4 days ago, Advisor, 186 Views, 8 likes, 8 Comments