Neophyte

6 hours ago

Assistance Request to Resolve "No Data" Error on Devices Monitored via Batchscript in Collectors

Dear Engineers,
I am reaching out for help with an issue affecting multiple devices in our LogicMonitor portal (logicmonitor.com), where several datapoints display a "No Data" message. On review, we noticed that tasks executed under “batchscript” return the value "False". This indicates that, despite the adjustments and recommendations we have applied, the Collectors are not successfully executing all “batchscript” tasks, which affects the data available for reports, graphical event analysis, alerts, and so on.

We have examined the associated graphs, specifically the LogicMonitor_Collector_DataCollectingTasks-batchscript graphs for the five (5) assigned Collectors, and observed high rates of unexecuted tasks. Below are the actions taken so far:

Actions Taken:

- Adjusted properties for prioritization on the collectors:

| Property | Suggested Value | Benefit |
| --- | --- | --- |
| `collector.threads.poolsize` | 100 | Improves concurrency for general tasks. |
| `collector.threadpool.size` | 50 | A larger pool allows the collector to handle more tasks simultaneously. |
| `collector.cache.max_objects` | 15000-20000 | Increases temporary cache storage capacity. |
| `collector.enable_batching` | true | Optimizes data transmission in batches. |
| `collector.device.batch.size` | 30 | Processes more devices at once, improving performance. |
| `collector.batchscript.retry` | 3 | Enhances fault tolerance for script execution. |
| `collector.batchscript.timeout` | 30 | A longer timeout ensures complex scripts can execute without interruptions. |
| `collector.batchscript.pollinginterval` | 300 | Balances data collection frequency and resource usage. |
| `collector.batchscript.maxconcurrenttasks` | 10 | Allows parallel execution of scripts across multiple devices. |
| `collector.snmp.retries` | 3-4 attempts | Improves reliability in networks with packet loss. |
| `collector.snmp.timeout` | 5 seconds | Increases stability for SNMP queries in slower networks. |
| `collector.snmp.parallelrequests` | 10 | Improves monitoring speed by processing more requests in parallel. |
| `collector.snmp.fetchthreads` | 30 | Enhances concurrency for SNMP data collection. |
| `interface.snmp.method` | getconcurrent | Speeds up SNMP data collection. |
| `paloalto.threadpoolsize` | 10 | Adjusts thread count when monitoring Palo Alto firewall aspects. |
| `cisco.ipsec.aggtun.timeout` | 10 | Tweaks monitoring behavior for IPSec tunnels. |
| `cisco.ipsec.aggtun.threadpoolsize` | 10 | A larger pool allows the collector to handle more tasks in parallel. |


- Constant review and updates of: DataSource, EventSource, ConfigSource, PropertySource, AppliesTo, and SNMP SysOID Maps for each device brand.
- Adjustment of batchscript task processing for all collectors, updating `collector.batchscript.threadpool` and `collector.batchscript.timeout` values:

- (1) Collector-Large: 127 Devices
- `collector.batchscript.threadpool=200`, `collector.batchscript.timeout=360`

- (1) Collector-Medium: 15 Devices
- `collector.batchscript.threadpool=180`, `collector.batchscript.timeout=360`

- (1) Collector-Small: 8 Devices
- `collector.batchscript.threadpool=170`, `collector.batchscript.timeout=360`

- (1) Collector-Small: 14 Devices
- `collector.batchscript.threadpool=170`, `collector.batchscript.timeout=360`

- (1) Collector-Small: 20 Devices
- `collector.batchscript.threadpool=170`, `collector.batchscript.timeout=360`

Although these changes have partially improved the situation, the error persists in executing batchscript tasks for several devices. This suggests that the issue might be related to the collector size and the hardware resources allocated.
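To frame the sizing question, here is a minimal sketch of how the required batchscript concurrency could be estimated with Little's law (average concurrency = task arrival rate × average task duration). The function name `threads_needed`, the 20 tasks per device, and the 5-second average runtime are hypothetical placeholders, not values measured on our collectors; only the 127-device count and the 300-second interval come from the settings listed above.

```python
import math


def threads_needed(task_count: int, avg_task_seconds: float,
                   polling_interval_seconds: float) -> int:
    """Estimate how many tasks run concurrently on average (Little's law)."""
    if polling_interval_seconds <= 0:
        raise ValueError("polling_interval_seconds must be positive")
    # Little's law: average concurrency = arrival rate * average time per task.
    return math.ceil(task_count * avg_task_seconds / polling_interval_seconds)


# Hypothetical inputs for the 127-device collector: 20 batchscript tasks per
# device and a 5 s average runtime are assumed, not measured.
tasks = 127 * 20
print(threads_needed(tasks, avg_task_seconds=5, polling_interval_seconds=300))  # prints 43
```

If an estimate like this comes out well below the configured `collector.batchscript.threadpool`, that would suggest the thread pool itself is not the limit and that script runtime or host CPU and memory deserve a closer look.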

Since batchscript is the primary method used to gather data for most of our devices, we would appreciate your guidance on the following:

Key Questions:

What is the optimal resource allocation for each of the collectors to prevent task execution errors? We are currently monitoring over 200 devices, with an annual growth rate of 15%.

Is there a formula or method to determine how many batchscript tasks are successfully executed versus failed attempts? This would help us assign appropriate collector sizes and hardware resources to ensure tasks are completed successfully, considering the 15% growth rate.

We are monitoring devices under the following setup:

- CollectorOn-Premises1Site: 300+ devices
- CollectorOn-Premises2Site: 75+ devices
- Collector-AWS: 50+ devices
- Collector-GCP: 50+ devices
- Collector-Azure: 50+ devices
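For planning purposes, the 15% annual growth can be compounded per collector. The sketch below is only an illustration: it treats the "300+", "75+", and "50+" figures as exact starting counts, assumes growth is spread evenly across collectors, and uses a hypothetical helper name, `project_devices`.

```python
import math


def project_devices(current: int, annual_growth: float, years: int) -> int:
    """Compound growth: current * (1 + rate) ** years, rounded up to whole devices."""
    return math.ceil(current * (1 + annual_growth) ** years)


# Starting counts taken from the list above, treating "N+" as exactly N (assumption).
setup = {
    "CollectorOn-Premises1Site": 300,
    "CollectorOn-Premises2Site": 75,
    "Collector-AWS": 50,
    "Collector-GCP": 50,
    "Collector-Azure": 50,
}

for name, devices in setup.items():
    # Three-year horizon chosen only as an example.
    print(f"{name}: {devices} now, about {project_devices(devices, 0.15, 3)} in 3 years")
```

Under these assumptions the largest site would grow from 300 to roughly 457 devices in three years, which is the load any sizing recommendation would need to cover.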

How can we determine the appropriate collector size based on the number of devices to be monitored? This will help us allocate hardware resources to integrate more devices into each collector.

I look forward to your comments and technical recommendations for optimizing the collectors.
