
wfaulk
2 days ago

Collecting a very large number of datapoints

I have a need to collect data about CPU P-levels in VMware hosts.

The way that VMware is structured in LogicMonitor relies on a vCenter server, and then all of its hosts are created in Active Discovery. There does not seem to be a way to create a datasource that uses those AD-found hosts as target resources.

So I have a script that hits my vCenter, loops through each of a few dozen hosts, each of which has around 80 CPUs, each of which has around 16 P-levels. When you multiply that all up, that's about 30,000 instances. The script runs in the debug environment in about 20 seconds and completes without error, but the output is truncated and ends with "(data truncated)".
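As a rough sanity check on that multiplication, here is a minimal sketch. The host count of 24 is only an assumed stand-in for "a few dozen", and the 80 and 16 are the approximate figures above; `instance_count` is a hypothetical helper, not part of any LogicMonitor API.

```python
def instance_count(hosts, cpus_per_host, p_levels_per_cpu):
    """One instance per (host, CPU, P-level) combination."""
    return hosts * cpus_per_host * p_levels_per_cpu

# Assumed figures: 24 hosts ("a few dozen"), ~80 CPUs each, ~16 P-levels each.
total = instance_count(24, 80, 16)
print(total)  # 30720
```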

When I try to "Test Active Discovery", it just spins and spins, never completing. I've waited up to about 20 minutes, I think.

It seems likely that this is too much data for LogicMonitor to deal with all at once. However, I don't seem to have the option to be more precise in my target for the script. It would make more logical sense to collapse some of these instances down to multiple datapoints in fewer instances, but there isn't a set number of P-levels per CPU, and there isn't a set number of CPUs per host, so I don't see any way to do that.

There doesn't seem to be any facility to collect this data in batches.

What can I do?

  • I just discovered that I can't even pre-aggregate the data, because it's a counter; I need the previous value to make any sense of the new value.

  • Are you referring to the CPU's P-states, which control its running frequency? Would each state need to be a separate datapoint? I would think it would be just one PState datapoint that stores 0, 1, 2, etc. For 30 servers with 80 cores each, that would be 2,400 instances. Or, if you create a device per host, it would be only 80 instances per device.

    Otherwise, it does sound like you'll have to break it up into multiple devices, unless you can accept some sort of summary collection. Even if you got this working, you might run into the metrics-per-device limit (see https://www.logicmonitor.com/support/logicmonitor-limits-quotas-and-constraints), which I haven't hit myself, but this sounds like it would.

    You said you need to collect data, so I assume that setting up the DataSource to create instances only during an alert event is not enough. Otherwise, you could have the AD script create the instance only when it's important, but that also means you can only do that at 15-minute intervals, which is not really useful for graphing data.

    If you do need to collapse into multiple datapoints, you can just create as many as the maximum you need. So if there are 16 at most, create the 16 datapoints, and if a CPU doesn't have that many, just skip the ones that don't apply. They will report NaN and won't even show in a graph.
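To illustrate the collapse suggested above, here is a minimal sketch that turns one instance per P-level into one instance per CPU with up to 16 datapoints, simply emitting nothing for levels a CPU doesn't have. Everything named here is an assumption for illustration: the `samples` structure, the `host_cpuN` instance naming, the `P0`..`P15` datapoint names, and the `instance.datapoint=value` line format (modeled loosely on batch-script style output; check it against what your collector actually expects). The raw counter values are passed through unchanged, since a counter needs its previous value to be interpreted.

```python
MAX_P_LEVELS = 16  # assumed upper bound, from the "around 16" figure in the thread

def collapse_to_datapoints(samples):
    """Emit one line per (CPU instance, P-level datapoint).

    samples: {host: {cpu_index: [raw counter for P0, P1, ...]}}
    P-levels a CPU does not have are skipped, so they produce no data.
    """
    lines = []
    for host in sorted(samples):
        for cpu in sorted(samples[host]):
            instance = f"{host}_cpu{cpu}"  # hypothetical instance naming
            counters = samples[host][cpu][:MAX_P_LEVELS]
            for level, value in enumerate(counters):
                lines.append(f"{instance}.P{level}={value}")
    return lines

# Made-up example: two CPUs on one host, with 3 and 2 P-levels respectively.
example = {"esx01": {0: [120, 45, 7], 1: [300, 12]}}
for line in collapse_to_datapoints(example):
    print(line)
```

With this layout the instance count scales with hosts times CPUs rather than hosts times CPUs times P-levels, which is where the 2,400 figure above comes from.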