Recent Discussions
What is regex and how to use it in LogicMonitor
Our Tech Support team occasionally receives customer questions about RegEx usage within the LM environment. Regex can be complicated if you do not know how to use it; however, it can be a very useful tool for you here in LM. I am going to cover 4 topics in this article:

1) Basic general examples on Regex
2) Regex text match for HTTP Datasource
3) Using Regex for dynamic groups
4) Using Regex to filter out results from Datasources

1) Basic general examples on Regex

How to use '^' and '$'

The symbols ^ and $ indicate the start and the end of a string, respectively.

"^Hello" matches any string that starts with "Hello".
"Percentage used$" matches a string that ends with "Percentage used".
"^def$" matches a string that starts and ends with "def", effectively an exact match comparison.
"Percentage Used" matches a string that has the text "Percentage Used" anywhere in it.

You can see that if you don't use either of these two characters, you're saying that the pattern may occur anywhere inside the string; you're not "hooking" it to any of the edges.

How to use '*', '+', and '?'

The symbols '*', '+', and '?' denote the number of times a character or a sequence of characters may occur. They mean "zero or more", "one or more", and "zero or one", respectively. Here are some examples:

"ab*" matches an a followed by zero or more b's (it finds a match in "ac", "abc", "abbc", etc.)
"ab+" is the same, but there must be at least one b (it finds a match in "abc", "abbc", etc., but not in "ac")
"ab?" matches an a followed by at most one b. Because the pattern is unanchored, it still finds a match in "abbc" (the leading "ab"); to reject "abbc" the pattern has to say what follows, e.g. "ab?c" matches "ac" and "abc" but not "abbc".
"a?b+$" matches an optional 'a' followed by one or more 'b's at the end of the string: any string ending with "ab", "abb", "abbb" etc. or "b", "bb" etc. Because there is no ^, "aab" and "aabb" also match through their trailing b's; use "^a?b+$" to reject them.
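The anchor and quantifier rules above can be checked with a short script. This is only a minimal sketch in Python, not LogicMonitor code: Python's `re` module is used here purely as a convenient regex engine (engines can differ in details), and the helper name `found` is a hypothetical name chosen for illustration.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text (an unanchored search)."""
    return re.search(pattern, text) is not None


# Anchors: ^ pins the match to the start of the string, $ to the end.
assert found(r"^Hello", "Hello world")
assert not found(r"^Hello", "Say Hello")
assert found(r"Percentage used$", "Disk Percentage used")
assert found(r"^def$", "def")
assert not found(r"^def$", "undefined")

# Quantifiers: * is zero or more, + is one or more, ? is zero or one.
assert found(r"ab*", "ac")        # zero b's is acceptable
assert not found(r"ab+", "ac")    # at least one b is required
assert found(r"ab+", "abbc")

# An unanchored pattern matches a substring, so "ab?" still finds "ab" in "abbc".
assert re.search(r"ab?", "abbc").group() == "ab"
# Stating what must follow (here the c) is what rejects "abbc".
assert not found(r"ab?c", "abbc")

# "a?b+$" has no ^, so "aab" matches through its trailing "ab".
assert found(r"a?b+$", "aab")
assert not found(r"^a?b+$", "aab")
```

The script prints nothing when every assertion holds, which makes it easy to add your own pattern and string pairs while experimenting.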
How to use Braces { }

You can also use bounds, which appear inside braces and indicate ranges in the number of occurrences:

"ab{2}" matches an a followed by exactly two b's ("abb")
"ab{2,}" there are at least two b's ("abb", "abbbb", etc.)
"ab{3,5}" from three to five b's ("abbb", "abbbb", or "abbbbb")

Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also, as you might have noticed, the symbols '*', '+', and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}", respectively. As before, these descriptions refer to the matched portion; anchor the pattern with ^ and $ if the whole string must conform.

Now, to quantify a sequence of characters, put them inside parentheses:

"a(bc)*" matches an a followed by zero or more copies of the sequence "bc"
"a(bc){1,5}" one through five copies of "bc"

How to use the '|' OR operator

There's also the '|' symbol, which works as an OR operator:

"hi|hello" matches a string that has either "hi" or "hello" in it
"(b|cd)ef" a string that has either "bef" or "cdef"
"(a|b)*c" a string that has any sequence of a's and b's (in any order, possibly empty) followed by a c

How to use Period ('.')

A period ('.') stands for any single character:

"a.[0-9]" matches a string that has an a followed by one character and a digit
"^.{3}$" a string with exactly 3 characters

How to use Bracket Expressions "[ ]"

Bracket expressions specify which characters are allowed in a single position of a string:

"[ab]" matches a string that has either "a" or "b" (that's the same as "a|b")
"[a-d]" a string that has one of the lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]")
"^[a-zA-Z]" a string that starts with a letter
"[0-9]%" a string that has a single digit before a percent sign
",[a-zA-Z0-9]$" a string that ends in a comma followed by an alphanumeric character

You can also list which characters you DON'T want; just use a '^' as the first symbol in a bracket expression (i.e., "%[^a-zA-Z]%" matches a string with a character that is not a letter between two percent signs).
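The bounds, alternation, period and bracket-expression examples above can be verified the same way. Again, this is a hedged sketch that uses Python's `re` module as a stand-in regex engine; the helper name `whole` is hypothetical, and it uses a whole-string match so that "exactly two b's" really means exactly two.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Bounds in braces
assert whole(r"ab{2}", "abb")
assert not whole(r"ab{2}", "abbb")
assert whole(r"ab{2,}", "abbbb")
assert whole(r"ab{3,5}", "abbbbb")
assert not whole(r"ab{3,5}", "abb")

# Parentheses quantify a whole sequence
assert whole(r"a(bc)*", "abcbc")
assert not whole(r"a(bc){1,5}", "a")   # at least one "bc" is required

# Alternation with |
assert whole(r"(b|cd)ef", "cdef")
assert whole(r"(a|b)*c", "abbac")      # a's and b's in any order, then c
assert whole(r"(a|b)*c", "c")          # the a/b sequence may be empty

# Period: any single character
assert whole(r".{3}", "a1%")
assert not whole(r".{3}", "ab")

# Bracket expressions, searched anywhere in the string
assert re.search(r"[0-9]%", "Usage 95%").group() == "5%"
assert re.search(r"%[^a-zA-Z]%", "%7%") is not None
assert re.search(r"%[^a-zA-Z]%", "%x%") is None
assert re.search(r",[a-zA-Z0-9]$", "value,9") is not None
```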
In order to be taken literally, the characters "^.[$()|*+?{\" must be escaped with a backslash ('\'), as they have special meaning. On top of that, you must escape the backslash character itself in PHP3 strings, so, for instance, the regular expression "(\$|A)[0-9]+" would have the function call: ereg("(\\$|A)[0-9]+", $str) (what string does that validate?)

Just don't forget that bracket expressions are an exception to that rule: inside them, all special characters, including the backslash ('\'), lose their special powers (i.e., "[*\+?{}.]" matches exactly any of the characters inside the brackets). This is the POSIX behavior; some other engines, such as Java's, still treat the backslash as an escape inside brackets. And, as the regex manual pages tell us: "To include a literal ']' in the list, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range."

---------------------------------------------------------
2) Regex text match for HTTP Datasource

Below is an example of a regex text match case I handled before. In this case, the datasource looks for the specific text in the webpage and returns 1 if the text exists or 0 if it does not.

---------------------------------------------------------
3) Using Regex for dynamic groups

You can create a group that selects a specific range of IP addresses based on the regex given:

[Image: Pic3.png]

Based on this expression, the group matches 7 devices. You can use a regex calculator to test the expression. However, do note that in LM it must be formatted as:

join(system.ips,",") =~ "10\\.15\\.20[01]\\."

Each backslash must be doubled (\\); we do not accept just a single \.

---------------------------------------------------------
4) Using Regex to filter out results from datasources

You can use regexMatch to filter different types of Windows services so that you do not need to display all the unwanted services that are not required.
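To see why the dots in the dynamic group expression from section 3 are escaped, the pattern can be tried outside LM. The sketch below is an assumption-laden illustration in Python, not how LogicMonitor evaluates the expression internally: `SUBNET_PATTERN` and `in_group` are hypothetical names, the IP addresses are made-up sample values, and the join-then-search step merely mimics `join(system.ips,",") =~ ...`. In a Python raw string a single backslash is enough, whereas LM requires the doubled form shown above.

```python
import re

# Assumed Python equivalent of the LM pattern "10\\.15\\.20[01]\\."
SUBNET_PATTERN = re.compile(r"10\.15\.20[01]\.")


def in_group(ips: list[str]) -> bool:
    """Mimic join(system.ips, ",") =~ pattern: join the IPs, then search."""
    return SUBNET_PATTERN.search(",".join(ips)) is not None


# Addresses in 10.15.200.x and 10.15.201.x match; neighbours do not.
assert in_group(["192.168.1.4", "10.15.200.17"])
assert in_group(["10.15.201.9"])
assert not in_group(["10.15.202.9"])
assert not in_group(["10.15.20.1"])

# Without the escapes, '.' means "any character" and matches unintended text.
assert re.search(r"10.15.20[01].", "10-15-200-7") is not None
assert SUBNET_PATTERN.search("10-15-200-7") is None

# Caveat: the pattern is unanchored, so it also matches inside 110.15.200.5.
assert in_group(["110.15.200.5"])
```

If that last case matters in your environment, one option is to require a start-of-string or a comma before the first octet, for example `(^|,)10\.15\.20[01]\.` in this Python form; test it against your own device list before relying on it.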
---------------------------------------------------------
200 Views, 2 likes, 0 Comments
Windows Drive Space Alerts

By default, LogicMonitor alerts on the percentage used on any drive. This in general is fine, but sometimes not. Let's imagine you have a 2.2 terabyte drive. You might have your critical threshold set at 90%, which sounds fine, until you realise that you are going to get a critical alert when you still have 220 GB free. In my case that would be a cause for some celebration, not really an urgent need to get up at 3 A.M. and delete files so the world doesn't end.

Now imagine your 2.2 TB drive is divided up as:

C: 10 GB (OS)
D: 500 GB (Mission critical applications)
E: 1 TB (Backups)
F: 510 GB (Other Applications)

A 90% alert will give you a critical at 1 GB, 50 GB, 100 GB and 51 GB free respectively. Now the C: drive may be a cause for concern, but the others not so much. For the two application drives you might only be concerned if they have less than 4 GB free, and for the backup drive less than 10 GB. So, we decide to alert on the following:

C: freespace is <1 GB
D: freespace is <4 GB
E: freespace is <10 GB
F: freespace is <4 GB

You could clone the datasource so you have four copies, one for each drive, but this is harder to maintain in the future and does not scale well. It would be better if you could somehow get the drive letter and assign a threshold based on that. LogicMonitor's scripted complex datapoint using Groovy to the rescue.

The disks datasource queries the class Win32_Volume. We need to use the raw drive letter output from the WMI class, so we would write a Groovy script like:

    Drive = output["DRIVELETTER"];
    return(Drive);

This returns C:, D:, E: and F:. That is not much use, as LogicMonitor doesn't deal with text, only metrics. Let's beef up the script.
    // Raw drive letter reported by Win32_Volume, e.g. 'C:'
    drive = output['DRIVELETTER'];
    // Default lower limit (in GB) for drives not listed below
    freeSpaceLowerLimitGigabyte = '0';
    if (drive == 'C:') {freeSpaceLowerLimitGigabyte = '1';}
    if (drive == 'D:' || drive == 'F:') {freeSpaceLowerLimitGigabyte = '4';}
    if (drive == 'E:') {freeSpaceLowerLimitGigabyte = '10';}
    return freeSpaceLowerLimitGigabyte;

This returns 1, 4, 10 and 4 for the four drives, so now we have a complex datapoint that returns the lower limit in GB for each drive, dependent on the drive letter. Again, we can't alert on this directly, so we need another datapoint that checks whether the free space is less than freeSpaceLowerLimitGigabyte. To do that, create a CapacityAlert datapoint using this expression:

    if ( lt (FreeSpace, FreeSpaceLowerLimitGigabyte * 1024000000) , 1, 0)

This breaks down as: if the free space is less than the assigned limit for that drive letter, return 1 (which you alert on); otherwise return 0. With the alert threshold set at = 1 1 1, we get critical alerts if:

C: freespace is <1 GB
D: freespace is <4 GB
E: freespace is <10 GB
F: freespace is <4 GB

David_Lee, 8 years ago, Former Employee
100 Views, 0 likes, 11 Comments

No WMI data is being collected from <host>
I get an error in LM on a few of our hosts:

Quote: "No WMI data is being collected from <host>. This started at <time>. This means the LogicMonitor agent does not have permissions to collect data from <host>, or the traffic is being blocked. If the host is in a domain, ensure the LogicMonitor agent service is running as a domain account that has local administrator privileges on the host, or running as LocalSystem on a domain controller. Ensure there are no firewalls preventing the agent from accessing the host."

These hosts don't really have anything particularly different on them. I have run through the WMI guide here: https://www.logicmonitor.com/support/monitoring/os-virtualization/troubleshooting-wmi/

After going to LM Tech Support (who are very helpful), they said that the only solution they can give is to disable UAC. This will not work for us, as it would introduce a security issue on those devices. Other machines in very similar configurations do not have this issue, so it seems odd that particular machines present this behavior even though others have UAC on as well.

Can further investigation be done in this area to diagnose the fault specifically, and can a different solution than UAC deactivation be provided?

Thanks

99 Views, 0 likes, 0 Comments

Common issues : High CPU usage on the Collector
This article provides information on high CPU usage on the Collector.

(1) General Best Practices

(a) First and foremost, we advise our customers to be on the latest General Release Collectors (unless advised not to). Further information on Collector versions is available at the link below:
https://www.logicmonitor.com/support/settings/collectors/collector-versions/
Also, in the release notes of each newer Collector version we indicate whether we have fixed any known issues:
https://www.logicmonitor.com/releasenotes/

(b) Please also view our Collector Capacity guide to get a full overview of how to optimise Collector performance:
https://www.logicmonitor.com/support/settings/collectors/collector-capacity/

(c) When providing information on high CPU usage, it would be useful if you can advise whether the high CPU usage occurs all the time or only during a certain timeframe (and whether any environmental changes were made on the physical machine that may have triggered this issue). Please also advise whether this occurred after adding newer devices to the Collector or after applying a certain version of the Collector.

(2) Common Issues

In this section I will go through some of the common issues which have been fixed or worked upon by our Development Teams:

(A) Check whether the CPU is used by the Collector (Java process), by SBProxy, or by other processes.

(i) To monitor the Collector Java process: use the datasource Collector JVM status to check the Collector (Java process) CPU usage.

(ii) To monitor the SBProxy usage: use the datasource WinProcessStats.xml for Windows Collectors. The equivalent datasource for Linux is still being developed.
(B) If the high CPU usage is caused by the Collector Java processes, below are some of the common causes:

(i) Collector Java process using high CPU

How to confirm whether this is the same issue: in the Collector wrapper.log, you can see a lot of log entries like the one below:

    DataQueueConsumers$DataQueueConsumer.run:338]Un-expected exception - Must be BUG, fix this, CONTEXT=, EXCEPTION=The third long is not valid version - 0
    java.lang.IllegalArgumentException: The third long is not valid version - 0
    at com.santaba.agent.reporter2.queue.QueueItem$Header.deserialize(QueueItem.java:66)
    at com.santaba.agent.reporter2.queue.impl.QueueItemSerializer.head(QueueItemSerializer.java:35)

This issue has been fixed in Collector version EA 23.200.

(ii) CPU load spikes on Linux Collectors

The CPU usage of the Collector Java process has a periodic spike (on an hourly basis). This issue has been fixed in Collector version EA 23.026.

(iii) Excessive CPU usage despite not having any devices running on it

In the Collector wrapper.log, you can see logs similar to the one below:

    [04-11 10:32:20.653 EDT] [MSG] [WARN] [pool-20-thread-1::sse.scheduler:sse.scheduler] [SSEChunkConnector.getStreamData:87] Failed to get SSEStreamData, CONTEXT=current=1491921140649(ms), timeout=10000, timeUnit=MILLISECONDS, EXCEPTION=null
    java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.logicmonitor.common.sse.connector.sseconnector.SSEChunkConnector.getStreamData(SSEChunkConnector.java:84)
    at com.logicmonitor.common.sse.processor.ProcessWrapper.doHandshaking(ProcessWrapper.java:326)
    at com.logicmonitor.common.sse.processor.ProcessorDb._addProcessWrapper(ProcessorDb.java:177)
    at com.logicmonitor.common.sse.processor.ProcessorDb.nextReadyProcessor(ProcessorDb.java:110)
    at com.logicmonitor.common.sse.scheduler.TaskScheduler$ScheduleTask.run(TaskScheduler.java:181)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This issue has been fixed in EA 24.085.

(iv) SSE process stdout and stderr stream not consumed in Windows

Please note this issue occurs only on Windows Collectors, and the CPU usage of the Windows operating system has a stair-step shape. This has been fixed in Collector EA 23.076.

(v) Collector goes down intermittently on a daily basis

In the Collector wrapper.log, you can see log lines similar to these:

    [12-21 13:10:48.661 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4741] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354646203, stack=
    Thread-40 BLOCKED
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info(LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.333)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)
    [12-21 13:11:16.597 PST] [MSG] [INFO] [pool-60-thread-1::heartbeat:check:4742] [Heartbeater._printStackTrace:265] Dumping HeartBeatTask stack, CONTEXT=startedAt=1482354647068, stack=
    Thread-46 RUNNABLE
    java.io.PrintStream.println (PrintStream.java.805)
    com.santaba.common.logger.Logger2$1.print (Logger2.java.65)
    com.santaba.common.logger.Logger2._log (Logger2.java.380)
    com.santaba.common.logger.Logger2._mesg (Logger2.java.284)
    com.santaba.common.logger.LogMsg.info(LogMsg.java.15)
    com.santaba.agent.util.Heartbeater$HeartBeatTask._run (Heartbeater.java.320)
    com.santaba.agent.util.Heartbeater$HeartBeatTask.run (Heartbeater.java.311)
    java.util.concurrent.Executors$RunnableAdapter.call (Executors.java.511)
    java.util.concurrent.FutureTask.run (FutureTask.java.266)
    gobler terminated ERROR 5296
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java.1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java.617)
    java.lang.Thread.run (Thread.java.745)

This issue has now been fixed in Collector EA 22.228.

(C) High CPU usage caused by SBProxy

(i) Collector CPU spikes up to 99%

In some cases, poor performance of WMI or PDH data collection causes too many retries, and this consumes a lot of CPU. In the Collector sbproxy.log, you can search for the log string shown below; you will see that the retry count is nearly 100 per request, which in turn consumes a lot of CPU.

    ,retry:

This is being investigated by our development team at this time and will be fixed in the near future.
(3) Steps to take when facing high CPU usage for the Collector

(i) Ensure the Collector has been added as a device and enabled for monitoring:
https://www.logicmonitor.com/support/settings/collectors/monitoring-your-collector/
There is a set of new DataSources for the Collector (LogicMonitor Collector Monitoring Suite, 24 DataSources). Please ensure they have been updated in your portal and applied to your Collectors, and also ensure the Linux CPU or Windows CPU datasources have been applied to the Collector.

(ii) Record a JFR (Java Flight Recorder) recording in the debug command window of the Collector. This can be done as follows:

    // unlock commercial features
    !jcmd unlockCommercialFeatures
    // start a JFR recording; in a real troubleshooting case, increase the duration to a reasonable value
    !jcmd duration=1m delay=5s filename=test.jfr name=testjfr jfrStart
    // stop the JFR recording
    !jcmd name=testjfr jfrStop
    // upload the JFR recording
    !uploadlog test.jfr

(iii) Upload the Collector logs: from the Manage dialog you can send your logs to LogicMonitor support. Select the manage gear icon for the desired Collector and then select 'Send logs to LogicMonitor'.

Credits: LogicMonitor Collector development team for providing valuable input in order to publish this article.

99 Views, 0 likes, 0 Comments

custom speed for interfaces
In some cases, it is important to be able to alter the speed of an interface (speed up or down or both) to accurately reflect the connected device speed (which itself may be opaque, like a cable modem that runs at 20/1 on a gigabit port). I asked TS about this a while back and was told "just change the interface", but when I explained how this is not always feasible (e.g., Cisco ASA), they recommended I open a feature request. So, here it is: I would like to be able to set speed up and speed down on an interface, overriding what comes back from ifSpeed. If I can do this another way (via a cloned DS), that could work, but I would like to make this as simple as possible since it comes up quite a lot for Internet-edge equipment.

Thanks,
Mark

mnagel, 9 years ago, Professor
99 Views, 0 likes, 29 Comments

Alert Can't connect https, but can curl from collector
I have an issue where I am getting a status "5" (can't connect on https/443) for a web server on AWS EC2. However, I can connect to the https site from a public browser, and with curl from the collector, without issue. Running diagnostics through LM shows a timeout. The "Poll Now" data looks fine, except that Normal Datapoints shows errors. These errors cannot be confirmed.

86 Views, 0 likes, 6 Comments

Getting Datapoint data via REST API
I've been trying to get data for individual datapoints within a single-instance datasource via the REST API. In this case there are ~twenty datapoints and I only want data for one of them.

Using get_data: https://www.logicmonitor.com/support/rest-api-developers-guide/data/get-data/ I can pick out my datasource; however, I can't find any information on filtering by datapoint. When I get data, the datapoints are all listed in an array, then the data is presented as an array of arrays for the last hour... I could get the position of the datapoint in the first array, then pick out the data using that position for each array listed in the data, but that feels like a bad road to travel. The query is essentially:

"resourcePath = '/device/devices/73023/devicedatasources/1573061/instances/1928726/data'"

Right now I'm adding the datapoints I want to a custom graph; I can then pull out the data for that custom graph, e.g.

"resourcePath = '/device/devices/73023/devicedatasources/1573061/instances/1928726/graphs/3133/data'"

(with multiple lines I can pick specific data by checking the line label). Although this works fine, it still feels like an additional step. What I'd really like is:

"resourcePath = '/device/devices/73023/devicedatasources/1573061/instances/1928726/ [datapoint/datapointID] /data'"

but I can't see an ID for datapoints or an option for putting a 'name' in. Have I missed something? Does anyone already do this in a simple way?

Solved
66 Views, 0 likes, 11 Comments

497 days and counting........
You might have received an alert saying your Linux-based device has just rebooted, but you know that it has been up a long time. A switch might have just sent an alert for every interface flapping when they have all been up solidly. The important question to ask here is: how long has the device been up? If it's been up for 497 days, 994 days, 1491 days or any multiple of 497, then you are seeing the 497 day bug, which hits almost every Linux-based device that is up for a good length of time.

Anything using a kernel less than 2.6 computes the system uptime based on the internal jiffies counter, which counts the time since boot in units of 10 milliseconds, or jiffies. This counter is a 32-bit counter, which can hold 2^32, or 4,294,967,296, distinct values. When the counter passes its maximum (after 497 days, 2 hours, 27 minutes, and 53 seconds, or approximately 16 months), it wraps back around to zero and continues to increment. This can result in alerts about reboots that didn't happen and cause switches to report a flap on all interfaces.

Systems that use a 2.6 kernel and properly supply a 64-bit counter will still alert incorrectly when the 64-bit counter wraps.

There are 8,640,000 jiffies of 10 milliseconds in a day, so:

A 32-bit counter can hold 4,294,967,295 (4,294,967,295 / 8,640,000 = 497.1 days).
A 64-bit counter can hold 18,446,744,073,709,551,615 (18,446,744,073,709,551,615 / 8,640,000 = 2,135,039,823,346 days, or 5,849,424,173 years).

Though I expect in 6,000 million years we will all have other things to worry over.

David_Lee, 8 years ago, Former Employee
66 Views, 0 likes, 5 Comments

PSA: Collect from windows systems without admin rights
Don't know if anyone else noticed, but MS released a pretty slick script that enables WMI access remotely without admin rights. I have done a brief test with LM and it seems to be working well.

https://blogs.technet.microsoft.com/askpfeplat/2018/04/30/delegate-wmi-access-to-domain-controllers/

That's the article. I created an AD group instead of a user to delegate, and I put the LM collector service in that group. Everything else I've followed as documented. I haven't tested anything else, but this alone is a huge step in the right direction.

61 Views, 3 likes, 7 Comments