13 May 2020 Office Hours - Alerts Take 2 - Recording and Q&A
Thanks to all the attendees at this week's Office Hours session. You can find the recording here: The Q&A transcript is at the bottom of this post. We'd love your feedback, so please take a moment to respond to our survey. Our next session is planned for May 27th. You can register here.
Q: How can I get the list of all unmonitored devices and data sources? And is there a way to prevent people from turning off the alerts?
A: For retrieving a list of unmonitored devices, I would recommend checking the minimal monitoring dynamic folder. This will generally contain all devices that have had nothing discovered or associated. Regarding preventing users from adjusting the tuning of alerts, this would be handled through our RBAC. Their user role will need to be restricted to prevent that. https://www.logicmonitor.com/support/settings/users-and-roles/roles
Q: Sorry - very new to LM... Can you set alerts to only trigger during "prime time", for example, like 9:00-5:00?
A: Yes, you can set a different threshold for different parts of the day. You need to open the wizard and select the Timeframe.
Q: Can time-based thresholds have days of the week? Not just time?
A: No, but escalation chains can be configured to behave differently depending on time of day and day of week. (See "Create time-based chain" on https://www.logicmonitor.com/support/alerts/alert-delivery/escalation-chains)
Q: I asked this on the invitation, but would really like to see the ability to route alerts based on datasource group. Multiple teams care about different things on a server.
A: We would like to talk more about this to better understand the use case. I will open a support case for you so we can get some more detail.
Q: Hello, please disregard if this isn't the correct forum, but I have a specific issue: configuring LM for EMC Avamar, and it only seems to work when giving LM the MCUser account credentials. When I give it a read-only account it throws errors. Is this safe, since MCUser has admin privileges?
A: This may be better to dive into over a support case, but generally we will require whatever permissions are needed to make a successful collection query to the device. If the device only presents this data when queried with administrative rights, then it may be a firm requirement. If you have another account that can make the same query externally, let us know and we can help narrow down what options might be available.
Q: Instead of a Stage 2 that's empty, why not simply make the Escalation Interval zero?
A: That's definitely another option that's completely valid.
Q: Is there an "on call" or schedule feature for alerting rules on the roadmap?
A: I don't think it's on the roadmap, but there is a workaround built by a former LogicMonitor engineer: https://www.youtube.com/watch?v=9CTfWJe8ujo
Q: How do you add an empty stage?
A: For completeness: Go to Settings >> Escalation Chains >> click the Gear icon to manage a specific escalation chain >> click the plus sign. Don't enter any recipients.
Q: For our organization, the advice from LM was to use a range of priorities per recipient group, meaning priority 1000-2000 for Network, 2000-3000 for Servers, 3000-4000 for Workstations, etc... but doesn't this create a level of priority between the groups? So if Network is concerned, then the next group will not receive the alerts?
A: live answered
Q: Might be worth noting that things like Dynamic Thresholds aren't available for certain LM license levels. I went digging for that after the last Alert Routing Office Hours and couldn't find it. I thought I was going crazy, until support explained that our license level didn't include those features!
A: live answered

Cisco UCS Manager
For those that are monitoring Cisco UCS using https://www.logicmonitor.com/support/monitoring/networking-firewalls/cisco-ucs-monitoring - are you finding that the default mapping sequence of minor and warning is out of order?

    def severity_map = [
        "cleared"  : 0,
        "info"     : 1,
        "condition": 2,
        "minor"    : 3,
        "warning"  : 4,
        "major"    : 5,
        "critical" : 6
    ]

And, perhaps making the above point somewhat moot, are the thresholds too low out-of-the-box? https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/ucs-manager/GUI-User-Guides/System-Monitoring/4-1/b_UCSM_GUI_System_Monitoring_Guide_4-1/b_UCSM_GUI_System_Monitoring_Guide_4-1_chapter_010.html
Quote: "You monitor all faults of either critical or major severity status, as immediate action is not required for minor faults and warnings. You monitor faults that are not of type Finite State Machine (FSM), as FSM faults will transition over time and resolve."
I know we can tune these and filter out the FSMs; curious if these defaults make sense to anyone. Thanks - Dave

29 April 2020 Office Hours - Dashboards - Invitation
At LogicMonitor, we recognize that your business and infrastructures are under increased strain due to COVID-19. To assist you in monitoring your vital infrastructure, we would like to invite you to join us at 11am PST (1pm CST), Wednesday April 29th, for another LogicMonitor Office Hours. This week's session will highlight best practices and strategies for dashboards (including our new "Work From Home Dashboard" templates) as well as some of the most commonly asked customer support questions received over the past week. Specific areas of focus include: the WFH Dashboard; dashboard basics; dashboard best practices. A Q&A will also be available, time permitting. Please use the link below to register: https://logicmonitor.zoom.us/webinar/register/6215880871322/WN_kBPpRBuZR3W3as2PcvYjcA Hope to see you there!

15 April 2020 Office Hours - Alerts & Routing - Recording and Q&A
Thanks to all the attendees at last week's Office Hours session. You can find the recording here: https://logicmonitor.wistia.com/medias/l1tusqhxg4 and the Q&A transcript below. If you would like to give feedback, go here: https://docs.google.com/forms/d/e/1FAIpQLScPWW5DzNxe2W5ieh6PjamLYWcP5AhDbUl1E3U7ZKryEgwEoA/viewform?usp=pp_url&entry.2118543627=2020-04-15. Our next session is planned for April 29th. You can register here: https://logicmonitor.zoom.us/webinar/register/1615849840118/WN_kBPpRBuZR3W3as2PcvYjcA?_ga=2.177082021.1444769543.1587337597-865086799.1576104046

Q&A Transcript
Q: Is there a way you can download or view alert history?
A: Hi Gary, you can configure an alert report to pull a historical view. https://www.logicmonitor.com/support/reports/report-types/alerts-report
Q: Can we extract a report to see what instances or datasources or groups have alerting off manually or have modified alert thresholds?
A: Yes, there is an alert thresholds report available in the report suite. https://www.logicmonitor.com/support/reports/report-types/alert-threshold-report
Q: Hi LM! We rely on alert thresholds to provide insights on capacity management for our Cloud Based Phone System.
One of our challenges is leveraging time-based thresholds to tell us if our traffic is greatly above, or greatly below, our typical usage. With that said, do you have any advice on handling time-based alert thresholds for Saturdays/Sundays, where our usage is vastly different from weekdays?
A: Time-based thresholds are a good way to deal with situations like this. I may not understand your question, but the principles are the same as normal alert tuning, with the difference that you take the time to think about the different thresholds that are appropriate at different times. You would follow the same workflow, but just categorize the times differently. I would recommend that you use the wizard to make time-based thresholds.
Q: RE: Time-based alerts cont. Hi Mike, my challenge is that an LM alert threshold for, say, 9am may work for weekdays, but not be what I need for 9am Saturday or Sunday.
A: Ah, that's correct. I was thinking of escalation chains, which do allow for day of week. I do not believe that time-based thresholds allow for this. Sorry about my confusion.
Q: Is there a way to still alert on something, but not route that alert anywhere if you're still utilizing a "catch-all" Alert Rule like "Error" or "Critical"? Basically, we'd want to see them in the Alerts Dashboard, but not get emails/texts/calls about it. It would be useful to have a button to not route the alert, perhaps under the "Alert Routing" screen. I understand we can create high-level Alert Rules to not escalate those alerts, but I'm curious if there's a more granular option at the datasource/datapoint level. (Sorry for the book, just trying to be clear!)
A: You can get as granular as you like with alert routing. The key is to remember that the alerts will be evaluated against the alert rules in order of alert rule priority, ascending. You can use this property to handle this kind of filtering.
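The priority-ascending, first-match behavior described here can be sketched in a few lines. This is a hypothetical illustration of the idea, not LogicMonitor's actual rule engine:

```python
# Hypothetical sketch of first-match alert routing: rules are evaluated in
# ascending priority order, and the first rule that matches decides routing.
def route(alert, rules):
    """Return the escalation chain of the first matching rule, or None."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["matches"](alert):
            return rule["chain"]
    return None

rules = [
    # Priority 50: match "web" alerts and route them to an empty chain,
    # so they show in the alert console but notify nobody.
    {"priority": 50, "matches": lambda a: a["group"] == "web", "chain": []},
    # Priority 100: the catch-all, which notifies the on-call stage.
    {"priority": 100, "matches": lambda a: True, "chain": ["oncall"]},
]

print(route({"group": "web"}, rules))  # [] - seen, but notifies nobody
print(route({"group": "db"}, rules))   # ['oncall']
```

Because the empty-chain rule sits at a lower (earlier) priority than the catch-all, the "web" alerts never reach it.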
If you want alerts to not be routed, even with a catch-all, simply make sure that the empty escalation chain is matched with a rule that comes before the routed catch-all. Does this make sense?
Q: If I put a device in SDT, and that device is also a collector, I need to configure SDT again in the collectors section. Is there a way that when I schedule a device or group down, the collector is also scheduled down?
A: Not at this time. Collector alerts are handled differently than resource alerts.
Q: How do you trigger an alert whenever a device configuration changes?
A: The page https://www.logicmonitor.com/support/logicmodules/articles/creating-a-configsource describes how to create a ConfigSource. The section that begins with "Check Type: Any Change (diff check)" shows how to set this up.
Q: When you have a device already managed in LogicMonitor that gets a hardware upgrade and the management IP of that host stays the same, the properties in LM remain tied to the old hardware, I've seen. For example, if we have a 2960X switch that gets replaced with a 9300 - obviously two different model devices with different software versions. Will LM automatically re-poll the PropertySource once the new host comes online (even though the monitored IP doesn't change), or do we have to go to Settings -> LogicModules -> PropertySources, find the PropertySource associated with the host, and manually run it?
A: Generally, if you're not replacing hardware with something that is "apples to apples", I would suggest adjusting the old hostname or removing the sunsetted device first. That said, our discovery will attempt to apply or remove any LogicModules that meet the appropriate discovery criteria. The reason is that some modules may leave behind instances that no longer apply, depending on their configuration.
Q: With dynamic thresholds, how far back does LM look to determine what is normal? Does it take into account regular, periodic oscillations of the metric value?
A: Hi, dynamic thresholds are determined by three days of prior data. For more information you can check out the following: https://www.logicmonitor.com/support/alerts/tuning-alerts/enabling-dynamic-thresholds-for-datapoints
Q: Is there a way to configure LM to do speed test checking?
A: While I don't believe we have anything out of the box, you could likely script something for this. We have an example on our communities page: https://communities.logicmonitor.com/topic/2282-ookla-speedtest/
Q: Can we insert a list of the top processes consuming CPU utilization in Linux into the CPU load alert mail?
A: It's probably doable. You'd have to create a Groovy-based complex datapoint that watches the CPU utilization datapoint. When that goes over threshold, the complex datapoint would run a Groovy script that could reach out to the device and gather the list of top processes. The script would then need to store that data as a property on the device. You'd then need to modify the alert message template so that the top-processes property is included in the alert message. This is very advanced stuff and I would recommend reaching out to your CSM for help engaging our Professional Services team.
Q: Hi LM, first, thank you for this webinar; I recently started to work with LM! I'm still learning about your platform and I have a problem with the dashboard. I'm getting information by SNMP on a network device and I get "NaN" when I want to use the data; my data is a string value, visible as raw data. Do you have any ideas how I can use a string value? Regards.
A: If it's a string, but contains numeric data, you can use a regex match to extract the numbers. If it's strictly text, you can use a regex to match it (returning 1 if found and 0 if not found).
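To illustrate the two approaches (extract the number, or return a 1/0 match flag), here is the equivalent logic in plain Python. LogicMonitor datapoints have their own regex configuration fields, so this is only a sketch of the idea:

```python
import re

def extract_number(raw):
    """Return the first numeric value found in a string, or None."""
    m = re.search(r"-?\d+(?:\.\d+)?", raw)
    return float(m.group()) if m else None

def match_flag(raw, pattern):
    """Return 1 if the pattern occurs in the string, else 0."""
    return 1 if re.search(pattern, raw) else 0

print(extract_number("Temperature: 42.5 C"))   # 42.5
print(match_flag("link state: DOWN", "DOWN"))  # 1
print(match_flag("link state: UP", "DOWN"))    # 0
```

The first form turns a string like "Temperature: 42.5 C" into a plottable number; the second turns a purely textual status into a 1/0 datapoint you can alert on.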
The help page that describes these options is here: https://www.logicmonitor.com/support/logicmodules/datasources/datapoints/normal-datapoints/
Q: Are there plans to post this talk? I missed portions of it in the middle and would like to review at my convenience.
A: Hi Dennis, these are recorded; we'll be sure to get a link your way once it's ready.
Q: Will there be recordings of these sessions?
A: Hi Mike, we definitely record these! I'll make sure we get a link your way when this is ready.
Q: AutoLogout when displaying a dashboard on a wall-mounted TV requires multiple re-logins throughout the day.
A: Turn on a slideshow. It should keep things refreshed.
Q: Integration with FreshService using Cluster Alerts is failing to provide the ##HOST## or ##SYSTEM.STATICGROUPS## details (leaving them blank in the subject line) at this time for our integration. Is there a known issue where cluster alerts will no longer provide the affected host in an alert subject line if they are part of cluster alerts?
A: This question was answered live. See https://www.logicmonitor.com/support/logicmodules/about-logicmodules/tokens-available-in-datasource-alert-messages#cluster-alerts for tokens that can be used in Cluster Alert notifications.
Q: Is there a way to audit how frequently dynamic thresholds decide to not send a notification? A high frequency of suppressions could indicate I should edit the static threshold.
A: That's a great idea. Be sure to submit feedback through the community or through your CSM to make sure that feature request gets added into the system.
Q: Hello, will this session be recorded for us to reference later with our teams?
A: Hi David, this will be recorded. We'll be sure to get you a link once it's ready.
Q: Hi, is there a way to get the historical data, such as once-per-minute readings of available free space and capacity?
A: Yes, simply expand the graph and you'll have the option to view and download the raw data.

PaloAlto firewall license monitoring
I created this a while back for a customer who wanted to view and alert on license expiry within their PaloAlto firewalls. It uses the same API key and API connection methods as our core PaloAlto_FW_* LogicModules, and calls the '<request><license><info></info></license></request>' endpoint. It monitors days until expiry (or whether the license is perpetual, or has already expired). Alert thresholds in this version are as requested by the original customer: at 60 and 30 days remaining, and for "has already expired". v1.0.0: FGMCN9 (Antony_Hawkins)

LMConfig flash contents tracking
Please add flash contents tracking to the standard LMConfig modules. Ideally, also many other elements a la RANCID. I don't know if you would get in trouble just for getting ideas from the code, but there are a lot of useful commands RANCID tracks (https://github.com/haussli/rancid). (mnagel)

Method to specify TITLE of Graph in Reports
We need a way to be able to specify the title of any graph when we are creating reports. The OOTB method uses a very cryptic title for something as simple as CPU usage. It displays: "WinCPU with instance * on datapoint CPUBusyPercent". Any client looking at this report is going to ask: WTH does this mean? We need a way to specify the %TITLE% on each datasource we specify when building a report, e.g. by adding a column to the DataPoints section called Title. In the report above, each category is titled as follows:
CPU Usage % title is: WinCPU with instance * on datapoint CPUBusyPercent
Memory Usage % title is: WinOS with instance * on datapoint MemoryUtilizationPercent
Disk Usage % title is: WinVolumeUsage- with instance * on datapoint PercentUsed
Bandwidth In: WinIf- with instance * on datapoint BytesReceivedPerSec
Bandwidth Out: WinVolumeUsage- with instance * on datapoint PercentUsed
None of these titles make any sense to anyone who's trying to read the graphs. Can we please just have a way to specify the title?

Dell DRAC slot occupancy DataSources
From a customer request: two datasources that track slot occupancy in Dell DRAC blade devices. One lists each slot; one just gives a vacant/occupied count. Sadly, the OIDs exposed by the device give virtually nothing of use for alerting (e.g., there is no reflection of the health of whatever's in any slot), but the value case here is for capacity planning. These will give you instant insight as to the occupancy or otherwise of your devices across all or any parts of your estate, so you'll be able to see quickly whether you already have space to add those six new blades you've determined that you need, whether you can consolidate into fewer lumps of tin, etc.
Dell_DRAC_SlotOccupancy: FAY4PX (v1.0.0) (Multi-instance; lists details per slot as instance properties, including occupant type, service tag, etc.)
Dell_DRAC_SlotOccupancyCounts: RJDT3Z (v1.0.0) (Single-instance; returns counts of total, occupied and vacant slots)
Example dashboard view, highlighting low occupancy. (Antony_Hawkins)

Modify alert notes en masse
The ability to modify alert notes en masse, the same way you can acknowledge multiple alerts at once, would be a nice thing to have. When multiple (20+) alerts have an incorrect note put into them, it is time-consuming to go back and manually fix them one by one. I see that in the new UI you can mass-tag them, but when you go to modify the note it tells you they're already acknowledged. Thanks!

need distinct RBAC control for groups versus group members
We have concluded after some time that using groups to manage SDT for clients is the only real option (compared to repetitive manual effort by our clients), and realized we can grant manager access to "SDT groups" to allow clients to add/remove devices on their own. Except... this also means they could remove the group itself. We need to be able to apply RBAC controls to group members only, with the parent group protected from change or deletion. (mnagel)

enable automatic FQDN in Netscan
Due to the lack of partitioning for MSP setups, we MUST use FQDNs for all devices, but when scans run, they put in the discovered bare names. Please allow scans to include an FQDN, either via the scan definition or a property. FQDNs break things like Windows Cluster checks, but it has to be done. (mnagel)

Table widget enhancement
I have a need to design a table with two columns using the "percentage bars", with datapoints from differing datasources. Currently the table widget only allows "percentage bars" to work in "Dynamic" tables, and "Dynamic" tables only allow you to pull datapoints from a single datasource. Can the "percentage bars" feature be added to "Custom" tables... or can "Dynamic" tables be enhanced to allow datapoints from multiple datasources?

VMWARE: SNAPSHOT LOCKING - CONSOLIDATION FAILURE
VMware ESXi can perform snapshots of virtual machines, which freeze a point in time of a virtual disk. A new disk (the delta disk) is created to record changes in the I/O since the snapshot was created. When a snapshot is deleted, all the changes are written down into the earlier disk. If you continue to delete snapshots until there are none left, all the changes are finally written down into the base VMDK disk. The delta disks are removed after this process, which is called "consolidation" of the delta disks. In situations where ESXi has the files locked, the consolidation process will not occur, or will be interrupted. This aborts the process; the VM continues to use the delta disk, and the changes are not committed to the earlier disk. This then creates a snapshot chain that continues to grow. In our situation we use a backup solution that utilizes snapshot technology to freeze (quiesce) the operating system and take a recoverable backup. Backups are taken every 15 minutes. In this scenario we have reached a situation where the disk chain is 255 delta disks in length. There are no snapshots in the GUI (LogicMonitor can see these); however, LogicMonitor cannot see the delta disks. When a snapshot is taken, it creates a delta disk named "vmdiskname-000001.vmdk", and if a second snapshot is taken it creates a delta disk called "vmdiskname-000002.vmdk". When you remove or delete the snapshots and the process completes, the delta disks no longer exist. We need to see if the number of delta disks exceeds the number of snapshots in the GUI. If it does, then we have to repair the VM manually. This situation is very dangerous for an ESX host, because with a disk chain at around 255 delta disks, you will start to see very strange delay behaviors. The ESX host has to track all the data through the chain.
Behaviors include the VM freezing, then releasing, then speeding up and processing faster than normal until all the data has been processed; then it will freeze again. The more delta disks, the worse the decay in performance for the ESX host.

Implement a taxonomy only resource group type
We organise our resources into groups that are our business services. We also use device groups to define a hierarchy of the business service groups, so we have device groups that are used as taxonomy groups, simply to do nothing other than organise our hierarchy, and which will never have any resources assigned to them. However, it's possible to accidentally drag and drop resources into these 'taxonomy' groups, or for a user to mistakenly select one of these groups. What I would like is a special resource group type which is used for defining resource group taxonomy only and cannot have resources assigned to it. The group's icon should also be different so that it is easy to know that this is a taxonomy group. (Mosh)

Alert when failover link is active
We have a Cisco router that has a primary link on G0/0 and a failover link on a cell link (Loopback0). I want to trigger an alarm when the Loopback0 interface starts passing traffic. Is there an easy way to set this up? I cloned the snmp64_If datasource and started trying to pull out the exclusion for loopback interfaces so it will show them in the interface list. Now I just need to figure out how to alert on traffic for those interfaces. Thanks, JW

StatusPage.IO Monitoring
I have built a generic StatusPage.IO datasource to allow for monitoring the status of various services we use. Since so many companies are using StatusPage.io, I figured it's a good idea to have a heads up in the event there is an outage with one of our many service providers. This has worked well as an early warning system for our service desk guys to know about issues before they start getting calls from end users. LogicMonitor actually uses StatusPage, but of course there are many, many others. Attached is a screenshot of the Box.com StatusPage data that we've collected from https://status.box.com. This datasource should be universal to any statuspage.io site. So far it has worked against every site I have tested it against. NYJG6J
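For a sense of the data such a datasource consumes: Statuspage-hosted sites conventionally publish their overall state as JSON at /api/v2/status.json. The snippet below is a hypothetical sketch of turning that payload into a numeric datapoint; it is not the shared module itself (locator NYJG6J):

```python
import json
from urllib.request import urlopen

# Statuspage "indicator" values, mapped to a number that is easy to
# graph and alert on.
SEVERITY = {"none": 0, "minor": 1, "major": 2, "critical": 3}

def severity_from_payload(payload):
    """Convert a status.json payload into a numeric severity datapoint."""
    indicator = payload.get("status", {}).get("indicator", "none")
    return SEVERITY.get(indicator, 0)

def check(page_url):
    """Fetch and score a live Statuspage site, e.g. https://status.box.com."""
    with urlopen(page_url.rstrip("/") + "/api/v2/status.json", timeout=10) as r:
        return severity_from_payload(json.load(r))

# Example payload in the shape a Statuspage site returns:
sample = {"status": {"indicator": "minor", "description": "Partial outage"}}
print(severity_from_payload(sample))  # 1
```

A non-zero value (or a jump to 2 or 3) is then a natural alert threshold for the service desk.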
Datasource for API Gateway Resources behind a stage
I have been using a custom datasource to collect the metrics for each resource and method (excluding OPTIONS) behind an API Gateway stage. It has been extremely useful in our production environments. I would share the datasource via the Exchange, but the discovery method I'm using will not be universal, so I think it would be best if that discovery were to work natively. If possible, could we please have a discovery method for AWS API Gateway Resources by Stage? *Something to note: this has the potential to discover quite a few resources and thus create a substantial number of CloudWatch calls, which might hit customer billing. For this reason, I added a custom property ##APIGW.stages## so that I could plug in the specific stages I wish to monitor instead of having each one automatically discovered. The Applies To looks like this: system.cloud.category == "AWS/APIGateway" && apigw.stages
Autodiscovery is currently written in PowerShell (hence why not everyone can take advantage of it):

    $apigwID = '##system.aws.resourceid##'
    $region = '##system.aws.region##'
    $stages = '##APIGW.Stages##'
    $resources = Get-AGResourceList -RestApiId $apigwID -region $region
    $stages.split(' ') | %{
        $stage = $_
        $resources | %{
            if($_.ResourceMethods) {
                $path = $_.Path
                $_.ResourceMethods.Keys | where{$_ -notmatch 'OPTIONS'} | %{
                    $wildvalue = "Stage=$stage>Resource=$Path>Method=$_"
                    Write-Host "$wildvalue##${Stage}: $_ $Path######auto.stage=$stage"
                }
            }
        }
    }

HTTP / HTTPS Remote Session
Hi All, We have really been enjoying the Remote Management feature of LogicMonitor. For sites that we don't have a direct interconnect with, it's great being able to quickly SSH onto our devices to make adjustments or check config without having to open up a separate VPN tunnel. However, with HTTP/HTTPS management becoming common with firewalls, controllers, routers, etc., I feel there is a huge opportunity for LogicMonitor to fit almost every management use case by implementing HTTP/HTTPS remote session functionality in the same way RDP and SSH remote sessions work. We as a company would primarily use this feature to help manage networking equipment, but the functionality would extend to printers, IP cameras, security systems, phone systems, UPSes and many more. Let me know your thoughts. Thanks, Will.

how to identify network devices that do not have config back-up config source
Hi all, We have a heap of network devices that we monitor. Sometimes I'm not sure if all of them have their config backups running via the ConfigSource. I'm looking for a way to identify network devices that do not have the specific "properties" set that would cause them to not be backed up correctly. E.g. some sort of report that identifies all Cisco switches that do not have ssh.user and ssh.pass correctly set. This is the reverse of the "applies to" button in the config or data source configuration. How can I do this? Simon.

Escalation Chain text ACK notification, send to all users in alert stage
Some teams operate primarily through text, and as such their escalation chains are set that way. The utility of an ACK notification to all members of the escalation chain stage gets lost when it is sent by a method (email) that is not even used in the escalation chain. This notification should use the same delivery method used in that stage to send the original alert (https://www.logicmonitor.com/support/alerts/responding-to-alerts/responding-to-native-sms-alert-notifications/ - see the description in the table for ACK). This has been tested, and email is the only method that will notify users; the request is to have functionality put in place to send all users in the alerting escalation chain the ACK notification instead of through email only. Thanks
Raspberry Pi Core Temperature (SSH DataSource)
I was recently attempting to use a Raspberry Pi 4 as a streaming/display mechanism for 1080p security camera footage - when I noticed a little temperature gauge flashing at the top of the display. Gee, I thought, it would sure be swell if: a) I knew what that meant, and b) I had some visibility into the temperature of this Pi, so I don't burn yet another hole in my workbench. Thus was born RaspberryPi_CoreTemperature, an SSH-based datasource that uses the VideoCore general command (vcgencmd) to monitor the core temperature of a Raspberry Pi. This module was developed on Buster; the command is also present in previous builds of Raspbian. Locator Code: Y6EY62 (Kerry_DeVilbiss)

Missing ConfigSources PropertySource
This one came up when a customer pondered how they'd know if ConfigSources weren't finding instances on devices where they should be (typically this would be due to an absence of valid credentials, for example). This PropertySource relies on API credentials (set as properties apiaccessid.key and apiaccesskey.key) and checks devices for any ConfigSource that applies to that device but has zero instances. As a PropertySource it *only* writes properties. Obviously this doesn't trigger any alerting as written, but you can very easily write a datasource that simply returns the auto.missing_configsources_count value and then alerts on anything non-zero. v1.0.0: ZGEG67
Note: if you're tempted to try the same with DataSources, remember that they're nothing like as clear-cut - a resource may have all sorts of DataSources applied to it (e.g. IIS for a Windows server) that may quite correctly have zero instances discovered. (Antony_Hawkins)

allow multiple http proxies in config
We believe we will soon move to a proxy architecture and allow only a few machines to reach out to the internet. For HA, I would like to request that "proxy.host=ip" be allowed to accept more than one proxy. The logic is that the collector should try the first and, if the HTTP connection is rejected, go on to the second one. It would be stupid for us to rely on one proxy, as it is a single point of failure for all collectors in that datacenter.

Ignore trailing part of oid
Hello all, I have a slight problem I'll try to explain as best I can. In a nutshell, I want to plot latency among our routers' system IPs (think loopback). I have an OID that begins "1.3.6.1.4.1.41916.9.2.1.6.4.64.68.161.220.4"; doing a walk on that will give the values of our routers' system IPs (2.21.184.1). Example:

    $ !snmpwalk [redacted] 1.3.6.1.4.1.41916.9.2.1.6.4.64.68.161.220.4
    Walking OID 1.3.6.1.4.1.41916.9.2.1.6.4.64.68.161.220.4 from [redacted]:161 for version v3 with securityName=[redacted] authProto=MD5 authToken=***(10) privProto=AES privToken=***(10) pdu.timeout=5s walk.timeout=5s
    107.199.135.25.2.12426.12386 => 2.21.184.2
    107.199.135.25.2.12426.62520 => 2.21.184.1

The OID for latency begins with "1.3.6.1.4.1.41916.9.2.1.10.4.10.0.41.2.4"; doing a walk on that gives correct values for latency. Example:

    !snmpwalk [redacted] 1.3.6.1.4.1.41916.9.2.1.10.4.10.0.41.2.4
    Walking OID 1.3.6.1.4.1.41916.9.2.1.10 from [redacted]:161 for version v3 with securityName=[redacted] authProto=MD5 authToken=***(10) privProto=AES privToken=***(10) pdu.timeout=5s walk.timeout=5s
    107.199.135.25.2.12386.12386 => 83
    107.199.135.25.2.12386.62520 => 82

If we compare the outputs, we see where the problem comes in:

    107.199.135.25.2.12426.12386 => 2.21.184.2
    107.199.135.25.2.12386.12386 => 83

The second-to-last group (sometimes the last three) does not match, so OID.##wildvalue## will not work: 12426 != 12386. What I need is a way to have ##wildvalue##.<ignore this>, since the 107.199.135.25 part will always match and the last three groups could change. Currently I have the datasource's SNMP OID parameter set to 1.3.6.1.4.1.41916.9.2.1.6.4.64.68.161.220.4; this gives the correct instance data and breaks it out by system IP. But like I mentioned, the ##wildvalue## does not work when trying to add 1.3.6.1.4.1.41916.9.2.1.10.4.10.0.41.2.4 as the datapoint to plot, because of the differing trailing OID sections. Has anyone needed to do this before?
If so, how was it done?
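One approach (e.g. inside a scripted datasource) is to walk both tables and join them on the stable part of the instance index, dropping the components that differ. A sketch of that matching logic; which positions are unstable is device-specific, and `drop_from_end=(2,)` (drop the second-to-last group) is an assumption that happens to fit the example output above:

```python
def join_key(index, drop_from_end=(2,)):
    """Build a join key from a dotted SNMP instance index by dropping
    the unstable components, counted from the end (1 = last group)."""
    parts = index.split(".")
    keep = [p for i, p in enumerate(parts)
            if len(parts) - i not in drop_from_end]
    return ".".join(keep)

def join_walks(system_ips, latencies, drop_from_end=(2,)):
    """Map system-ip -> latency by joining two walk results
    (dicts of instance index -> value) on their stable key."""
    by_key = {join_key(k, drop_from_end): v for k, v in latencies.items()}
    return {ip: by_key.get(join_key(k, drop_from_end))
            for k, ip in system_ips.items()}
```

Applied to the walks above, "107.199.135.25.2.12426.12386" and "107.199.135.25.2.12386.12386" both reduce to "107.199.135.25.2.12386", so 2.21.184.2 pairs with latency 83.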
Widget for ping monitoring
Hi everyone! I'm trying to figure out how to create a widget that will show the top ten servers that have a ping above 100ms. I can make the standard ping widget, but I don't see a way to limit the graph to those servers that have a high ping. Do I need to use a virtual datapoint? Or maybe I need clarification on what it means when I select top 10 on an average or maxrtt datapoint. Thanks!
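For clarity on what "top 10" means here: it is a sort-and-truncate on the selected datapoint's value, independent of any threshold. The combined selection being asked for (threshold filter first, then top 10) amounts to this, sketched with an illustrative dict of server -> ping:

```python
def top_slow_servers(pings_ms, threshold=100.0, limit=10):
    """Return up to `limit` (server, ping) pairs whose ping exceeds
    `threshold`, worst first: a threshold filter followed by the
    sort-and-truncate a 'top 10' selection performs."""
    over = [(srv, ms) for srv, ms in pings_ms.items() if ms > threshold]
    return sorted(over, key=lambda p: p[1], reverse=True)[:limit]
```

Selecting "top 10" on maxrtt alone would show the ten slowest servers even if all of them are under 100ms, which is why the threshold step matters.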
ServiceNow New York upgrade
We are looking to integrate LogicMonitor with our ServiceNow instance. At the same time, we are looking to upgrade to the New York release. The integration notes state that releases up to Madrid are supported. Will the integration continue to be supported on New York?
Server Reboots
Does LogicMonitor have the ability to schedule automated reboots of Windows servers AND provide notifications if/when a scheduled restart fails? We currently use Desktop Central to schedule groups of servers to reboot on different days/times, but it is limited with regard to notifications/reporting. If for whatever reason ServerA does not reboot, no one knows unless you check manually. It would be great if I could schedule a server (or group of servers) to be rebooted at a certain day and time, and if for some reason one of them doesn't or can't reboot, a notification is sent out informing someone that it did not reboot.
ESXi Datastore Capacity - Thin and Thick Provisioned
In our environment we have a mixture of thin and thick provisioned datastores, and my problem is that all of the thick-provisioned ones immediately trigger the percent-used thresholds set on the datapoint. Have any of you tinkered with splitting thin vs. thick into two datasources? I'm looking for ideas. Thanks.
(Shack, 5 years ago)
WinServer Instance ErrorsLogon Alert
Hi team, I have a server that is triggering an alert for the FileServer instance on the ErrorsLogon datapoint. I checked the Event Log and there are no events for failed logins. The WMI query is SELECT * FROM Win32_PerfRawData_PerfNet_Server. Anyone have any ideas on how I can troubleshoot this? Has anyone else experienced this issue? It is only one server; no others are having this issue. As I stated, there are no failed login attempts in Event Viewer. This triggers several times a day. Any help would be appreciated.
(joshlowit1, 5 years ago)
Dimensionless Cloudwatch Metrics
I've been creating datasources to collect our custom AWS CloudWatch metrics as per the docs: https://www.logicmonitor.com/support/monitoring/cloud/monitoring-custom-cloudwatch-metrics/ - mainly this is fine. However, it can't cope with dimensionless metrics: "Namespace>Dimensions>Metric>AggregationMethod, where Dimensions should be one or more key value pairs". I've tried creating datapoints without a dimension, but it returns NaN (probably because LM requires "one or more key value pairs" for dimensions). We currently use a Python script to collect most of our custom metrics, but it's resource intensive for our collectors and I'm trying to move away from it. Does anyone know of a way to use the 'AWS CLOUDWATCH' collector with dimensionless metrics?
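For the script route, note that the CloudWatch GetMetricStatistics API itself treats Dimensions as optional, so a scripted collector can request a dimensionless metric by simply omitting the key. A sketch of building the request parameters; the namespace and metric name are placeholders, and the boto3 call is shown only in a comment:

```python
from datetime import datetime, timedelta, timezone

def metric_request(namespace, metric_name, period=300, minutes=10):
    """Build GetMetricStatistics parameters for a metric with no
    dimensions: omit the Dimensions key entirely rather than sending
    an empty key/value pair."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average"],
    }

# With boto3 this would be passed straight through, e.g.:
# boto3.client("cloudwatch").get_metric_statistics(
#     **metric_request("MyApp", "QueueDepth"))
```

This doesn't change the built-in 'AWS CLOUDWATCH' collector's behavior, but it keeps a scripted fallback much lighter than a full collection framework.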
groovy code library support
I just cannot bring myself to paste the same complex code into multiple LogicModule scripts, leaving little land mines scattered randomly. I was working today on a general template for using the API from within LogicModules, using code I found scattered around different modules (we keep backups of everything, making it somewhat easy to search for those). Just a few things I noticed:
* all the code is different
* nothing I found so far accounts for API rate limiting
* various inefficiencies exist in at least some of what I found
The correct solution to all of this is a library feature, so we can maintain Groovy functions and the like in one place and call them from LogicModule scripts. It is very sad to see how little re-use is possible within the framework at all levels, and this one is especially bad in terms of maintenance and things breaking easily when changes are made in the API backend.
(mnagel, 5 years ago)
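On the rate-limiting point: one pattern such a shared library could centralize is retry-on-429 with a wait derived from the rate-limit response headers. A sketch of the logic, with the request callable injected for illustration; LogicMonitor's REST API documents X-Rate-Limit-* response headers, but verify the exact names against your portal's responses:

```python
import time

def call_with_backoff(request, max_retries=3, sleep=time.sleep):
    """Call `request()` and retry on HTTP 429, waiting out the rate
    window between attempts. `request` returns (status, headers, body);
    `sleep` is injectable so the retry logic can be tested without
    real delays."""
    status, headers, body = 429, {}, None
    for attempt in range(max_retries + 1):
        status, headers, body = request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Fall back to 60s if the window header is absent.
            sleep(int(headers.get("X-Rate-Limit-Window", 60)))
    return status, body
```

Keeping this in one place means a backend change to the rate-limit behavior requires one fix instead of a hunt through every pasted copy.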
Drag and Drop: End-user Control/Management
Good morning all, as you can imagine, we had some human error where an engineer accidentally dragged and dropped a group into an incorrect location. In working with support, I see that I could make a user account and modify that account to work as needed. However, an easier or more direct method of preventing users from moving groups, while still allowing engineers to do their tasks, would be a great benefit to the product and to the sanity of management. Thank you, Rudy
URL Provided in Alert based on DataPoint
Hi all, I have been tasked with providing specific URLs to our troubleshooting knowledge base based on the type of alert received. I know that we can add the ##DEVICEURL## token to alerts, but we want to be even more granular than device-level URLs. For example, say an alert comes in for high CPU utilization; we'd like the alert to provide the link to the page that instructs our service desk on the required steps to troubleshoot that particular issue, so there would be multiple URLs associated with one device. I hope I explained that correctly. I'm a bit of a novice in IT, so I'm hoping someone can point me in the right direction. Thanks!
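One common route is a custom HTTP integration that receives per-alert tokens (the token list includes datasource- and datapoint-level fields; check your version's token reference) and maps them to a runbook URL. A sketch of that mapping; every key and URL below is a hypothetical placeholder for your own knowledge base:

```python
# Hypothetical mapping maintained by your team; the datasource##datapoint
# keys and kb.example.com URLs are illustrative placeholders.
KB_URLS = {
    "WinCPU##CPUBusyPercent": "https://kb.example.com/high-cpu",
    "WinVolumeUsage-##PercentUsed": "https://kb.example.com/disk-full",
}

def kb_link(datasource, datapoint):
    """Resolve the runbook URL for an alert, falling back to a
    generic troubleshooting page when no specific article exists."""
    return KB_URLS.get(f"{datasource}##{datapoint}",
                       "https://kb.example.com/general-troubleshooting")
```

The resolved URL can then be injected into the ticket or notification body, giving the service desk a datapoint-specific page instead of a single device-level link.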
Add 'Threshold History' and 'LastModified' to Alert Thresholds report
Hi, can we please get 'Threshold History' and a 'Last Modified' date as optional columns in the Alert Thresholds report? This would be a huge time saving when reviewing modified thresholds. Currently we have to run the report, which results in hundreds of entries, each of which then requires someone to go to the corresponding device, locate the alert, and go into editing the thresholds in order to view the history and the time the change was made. This method is not practical now, when we've just started using LogicMonitor, let alone when we add hundreds more devices.
Make auditing work all the time
Hi guys, I've noticed that the audit logs don't capture all events all the time. E.g., when changing a threshold on an alert, the accompanying note is only captured by auditing some of the time. This makes me worry about what else is not being captured reliably. Support suggested I log this as a feature request, so please make auditing capture ALL events ALL the time. Thanks.
Dashboard Templates
I'm currently working on building out common dashboards for a large number of subgroups of server resources in our environment (we're a multi-tenant managed hosting company, so one for each of our customers' environments). It's a simple dashboard with 5 widgets. I have each of them driving device selection through a token with our internal unique customer ID number. The problem will come if I ever need to change anything in the dashboard: while I built a single dashboard initially and then cloned it 50 times, making any incremental change 50 times sounds like I'm guaranteed a day of annoyance in the future. I'd like to see a way to make a template dashboard that can be linked to, so you can change a single master dashboard and have an entire set of child dashboards change. In our example, that would be the server status dashboards for each of our clients.
(Cole_McDonald, 5 years ago)
Functionality to return the details of a Logic Monitor dashboard in json format via REST API
In order to integrate LogicMonitor with external reporting tools like Grafana, we need REST APIs from LogicMonitor that can return all the real-time details of a LogicMonitor dashboard's data in JSON format. As per our understanding, such REST APIs do not exist as of now, and the only option is to manually click the export option on the LogicMonitor portal and then manually import the data back into the external system, which is not a viable option. We therefore request REST APIs that can return the details of a LogicMonitor dashboard in JSON format.
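For what it's worth, the REST API does expose dashboard resources (e.g. GET /dashboard/dashboards/{id} under /santaba/rest returns the dashboard definition as JSON; check the current API docs for widget-data endpoints). Requests need LMv1 authentication, whose header can be built like this; the access ID/key and dashboard ID in the example are placeholders:

```python
import base64
import hashlib
import hmac
import time

def lmv1_auth(access_id, access_key, verb, resource_path, data="", epoch_ms=None):
    """Build the LMv1 Authorization header for a LogicMonitor REST call.
    The signature is the base64 encoding of the hex HMAC-SHA256 digest
    of verb + epoch(ms) + request body + resource path."""
    if epoch_ms is None:
        epoch_ms = int(time.time() * 1000)
    message = f"{verb}{epoch_ms}{data}{resource_path}"
    digest = hmac.new(access_key.encode(), message.encode(),
                      hashlib.sha256).hexdigest()
    signature = base64.b64encode(digest.encode()).decode()
    return f"LMv1 {access_id}:{signature}:{epoch_ms}"

# e.g. GET https://ACCOUNT.logicmonitor.com/santaba/rest/dashboard/dashboards/42
# with headers {"Authorization": lmv1_auth(ID, KEY, "GET", "/dashboard/dashboards/42")}
```

Note the resource path used in the signature excludes the /santaba/rest prefix and any query string.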
ADFS Certificate Expiration Monitor
Two quick monitors to check the days left on ADFS TOKEN SIGNING and TOKEN DECRYPTING certs. I ran into a situation where, even though they were set to auto-rollover, they did not roll over. This will also help you know about, and prepare to distribute, the new cert to external sources for ADFS. LM Exchange Locator: 3WYNWA YDWEHH
lmconfig control at device/group level
I have raised this numerous times with my CSM, account manager, etc., but it does not seem to be getting traction. It is a lawsuit waiting to happen, so it really deserves attention. Right now, the trigger for LMConfig is entirely arbitrary, depending on who wrote which code. Most often it is based on ssh.user and ssh.pass being defined on a device. The problem with this is that there are other reasons to have those properties (e.g., Err-Disabled port detection), so you can enable LMConfig across many devices and incur a large cost (especially with the new contract terms) without intending to. It should be required to check an "Enable LMConfig" option at the device or group level, and similarly for any other premium feature. Minimally, all the ConfigSources should be changed to require an "enable_lmconfig" or similar property in their AppliesTo logic.
(mnagel, 5 years ago)
LM "Actions"
I would love to see LM implement a new feature for taking a built-in, self-prescribed action on an alert. To minimize any exposure LM might have from an action gone awry, the actions taken could occur as the result of a script that one uploads into the escalation chain. Ideally you could define multiple actions, or multiple retries of an action, and whether they occur before or after the recipient notification in the notification chain. This would allow very basic alerts (disk, service restarts, etc.) to be resolved programmatically. Also, supporting various scripting languages such as PowerCLI, Ansible, etc. would allow some very creative ways to integrate with solutions such as VMware or Ansible Tower, letting more expert-level folks craft very complex actions.
Cisco c300v Cloud Email Security Appliance
Does anyone know how I can monitor this appliance in LM? I would like to show the following information in LM for monitoring:

<features>
  <feature name="McAfee" time_remaining="102892200" />
  <feature name="External Threat Feeds" time_remaining="102892200" />
  <feature name="Sophos" time_remaining="102892200" />
  <feature name="File Analysis" time_remaining="102892200" />
  <feature name="Bounce Verification" time_remaining="2712117032" />
  <feature name="Cloud Administration" time_remaining="2712117032" />
  <feature name="IronPort Anti-Spam" time_remaining="102892200" />
  <feature name="IronPort Email Encryption" time_remaining="102892200" />
  <feature name="Data Loss Prevention" time_remaining="102892200" />
  <feature name="Intelligent Multi-Scan" time_remaining="102892200" />
  <feature name="File Reputation" time_remaining="102892200" />
  <feature name="Incoming Mail Handling" time_remaining="2712287735" />
  <feature name="Graymail Safe Unsubscription" time_remaining="102892200" />
  <feature name="Outbreak Filters" time_remaining="102892200" />
</features>
<counters>
  <counter name="inj_msgs" reset="734118" uptime="734118" lifetime="734118" />
  <counter name="inj_recips" reset="734965" uptime="734965" lifetime="734965" />
  <counter name="gen_bounce_recips" reset="298748" uptime="298748" lifetime="298748" />
  <counter name="rejected_recips" reset="5087" uptime="5055" lifetime="5087" />
  <counter name="dropped_msgs" reset="48" uptime="48" lifetime="48" />
  <counter name="soft_bounced_evts" reset="143030" uptime="143030" lifetime="143030" />
  <counter name="completed_recips" reset="734896" uptime="734896" lifetime="734896" />
  <counter name="hard_bounced_recips" reset="495144" uptime="495144" lifetime="495144" />
  <counter name="dns_hard_bounced_recips" reset="6" uptime="6" lifetime="6" />
  <counter name="5xx_hard_bounced_recips" reset="495118" uptime="495118" lifetime="495118" />
  <counter name="filter_hard_bounced_recips" reset="0" uptime="0" lifetime="0" />
  <counter name="expired_hard_bounced_recips" reset="20" uptime="20" lifetime="20" />
  <counter name="other_hard_bounced_recips" reset="0" uptime="0" lifetime="0" />
  <counter name="delivered_recips" reset="239751" uptime="239751" lifetime="239751" />
  <counter name="deleted_recips" reset="1" uptime="1" lifetime="1" />
  <counter name="global_unsub_hits" reset="0" uptime="0" lifetime="0" />
</counters>
<current_ids message_id="734968" injection_conn_id="442587" delivery_conn_id="73052" />
<rates>
  <rate name="inj_msgs" last_1_min="0" last_5_min="0" last_15_min="4" />
  <rate name="inj_recips" last_1_min="0" last_5_min="0" last_15_min="4" />
  <rate name="soft_bounced_evts" last_1_min="0" last_5_min="0" last_15_min="0" />
  <rate name="completed_recips" last_1_min="0" last_5_min="0" last_15_min="0" />
  <rate name="hard_bounced_recips" last_1_min="0" last_5_min="0" last_15_min="0" />
  <rate name="delivered_recips" last_1_min="0" last_5_min="0" last_15_min="4" />
</rates>
<gauges>
  <gauge name="ram_utilization" current="2" />
  <gauge name="total_utilization" current="28" />
  <gauge name="cpu_utilization" current="9" />
  <gauge name="av_utilization" current="0" />
  <gauge name="case_utilization" current="11" />
  <gauge name="bm_utilization" current="0" />
  <gauge name="disk_utilization" current="0" />
  <gauge name="resource_conservation" current="0" />
  <gauge name="log_used" current="4" />
  <gauge name="log_available" current="352G" />
  <gauge name="conn_in" current="2" />
  <gauge name="conn_out" current="0" />
  <gauge name="active_recips" current="23" />
  <gauge name="unattempted_recips" current="23" />
  <gauge name="attempted_recips" current="0" />
  <gauge name="msgs_in_work_queue" current="0" />
  <gauge name="dests_in_memory" current="11" />
  <gauge name="kbytes_used" current="205" />
  <gauge name="kbytes_free" current="71302963" />
  <gauge name="msgs_in_policy_virus_outbreak_quarantine" current="0" />
  <gauge name="kbytes_in_policy_virus_outbreak_quarantine" current="0" />
  <gauge name="reporting_utilization" current="6" />
  <gauge name="quarantine_utilization" current="0" />
</gauges>
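Since the appliance exposes this data as XML, one option is a script datasource that fetches the XML status output and parses out the gauges and rates as datapoints. A minimal parsing sketch, assuming the sections shown above sit under a single root element:

```python
import xml.etree.ElementTree as ET

def parse_esa_status(xml_text):
    """Pull gauge and rate values out of the appliance's XML status
    output. Values that are not purely numeric (e.g. '352G') are left
    as strings for the caller to normalize."""
    root = ET.fromstring(xml_text)

    def val(s):
        return int(s) if s.isdigit() else s

    gauges = {g.get("name"): val(g.get("current"))
              for g in root.iter("gauge")}
    rates = {r.get("name"): int(r.get("last_15_min"))
             for r in root.iter("rate")}
    return gauges, rates
```

The same iteration pattern extends to the counter and feature elements (e.g. alerting when a feature's time_remaining drops below a threshold).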