Troubleshooting

This guide describes how to diagnose and resolve the most common operational issues.

System Status Indicator:

When the system encounters an abnormality, the status indicator changes to YELLOW or RED, depending on the severity of the issue. At this point system function may be degraded or interrupted. The steps below help diagnose the issue and restore the system to normal operation.

[Image: status bar]

Hover over the colored indicator to see the cause of the status change.

[Image: status bar]

The cause will be one of the following:

  • Database
  • Disk
  • Queue

If you do not have command line access to the system, you can report the above findings to Fluency Support for assistance with resolution.

Command Line Access

Further troubleshooting requires command line access to the system.

The "smonitor status" command will output various aspects of the current system status to the command prompt.

smonitor status <option>

Queue Error

If the system is reporting a queue error, run the following command for detailed status:

smonitor status queue

The output should be similar to the following:

Status Monitor (vers. 2.3.6)
Host : fluency-appliance
Status : NORMAL
Time : Thu-Nov-08-2018-12:06:08
Queue Length:
---------------
FusionService GREEN (0)
LoaderService GREEN (0)
SummaryService GREEN (0)
HistogramInput GREEN (0)
EventService GREEN (0)

A queue error can be triggered by one or more of the queues shown above. The affected queue is indicated by the color shown next to its name (YELLOW or RED instead of GREEN); the number in parentheses is the queue length.
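On a busy console it can help to filter the output down to the queues that need attention. The following is a minimal sketch, not part of the product: the function name `non_green_queues` is hypothetical, and it assumes the row layout seen in the sample output above (queue name, color, length in parentheses).

```shell
#!/bin/sh
# List queues whose status is not GREEN, reading "smonitor status queue"
# output on stdin. Assumes each queue row looks like the sample output:
#   <QueueName> <COLOR> (<length>)
non_green_queues() {
    awk '
        # Queue rows have exactly three fields and a parenthesized length.
        NF == 3 && $3 ~ /^\([0-9]+\)$/ && $2 != "GREEN" {
            gsub(/[()]/, "", $3)
            print $1, $2, $3
        }
    '
}

# Usage on the appliance:
#   smonitor status queue | non_green_queues
```

With the all-GREEN sample output above, the function prints nothing.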

Excessive queue lengths are usually caused by unanticipated data input bandwidth or by an internal service crash or stoppage. In the latter case, stopping and restarting the relevant service is likely to resolve the issue.

If the FusionService or EventService queues are showing YELLOW or RED, the problem is likely coming from the correlation engine:

To restart the service:

systemctl stop fusion_service
systemctl start fusion_service

If the LoaderService queue is showing YELLOW or RED, the problem is likely from the database loader:

To restart the service:

systemctl stop loader_service
systemctl start loader_service

If the SummaryService queue is showing YELLOW or RED, the problem is likely from the summary engine:

To restart the service:

/etc/init.d/lava_service stop
/etc/init.d/lava_service start

If the HistogramInput queue is showing YELLOW or RED, the problem is likely from the histogram engine:

To restart the service:

/etc/init.d/histogram_service stop
/etc/init.d/histogram_service start
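The queue-to-service mapping above can be kept in one place as a lookup. This is only a sketch: the function name `restart_commands_for` is hypothetical, and it prints the commands listed in this guide rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print the restart commands this guide lists for a given queue name.
# The function only prints; it does not stop or start anything.
restart_commands_for() {
    case "$1" in
        FusionService|EventService)
            printf '%s\n' 'systemctl stop fusion_service' \
                          'systemctl start fusion_service' ;;
        LoaderService)
            printf '%s\n' 'systemctl stop loader_service' \
                          'systemctl start loader_service' ;;
        SummaryService)
            printf '%s\n' '/etc/init.d/lava_service stop' \
                          '/etc/init.d/lava_service start' ;;
        HistogramInput)
            printf '%s\n' '/etc/init.d/histogram_service stop' \
                          '/etc/init.d/histogram_service start' ;;
        *)
            echo "unknown queue: $1" >&2
            return 1 ;;
    esac
}

# Usage:
#   restart_commands_for EventService
```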

Once the service has been restarted, wait 15-20 minutes, and then check the queue length and status with the following command:

smonitor status queue

If restarting the relevant service and waiting does not appear to have any effect, or if the queue length is still increasing, it is likely that the system is simply overloaded. Make a note of the queue length, along with the action(s) that were performed, and contact Fluency Support.
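To judge whether a queue is still growing, one option is to save the command output before and after the wait and compare the two lengths. The sketch below assumes the output layout shown in the sample earlier; the function names `queue_length` and `queue_trend` and the file names are hypothetical.

```shell
#!/bin/sh
# Extract the length of one queue from saved "smonitor status queue" output.
# $1 = queue name, $2 = file containing the saved output.
queue_length() {
    awk -v q="$1" '$1 == q && NF == 3 { gsub(/[()]/, "", $3); print $3 }' "$2"
}

# Report whether a queue grew, shrank, or stayed the same between captures.
# $1 = queue name, $2 = earlier capture, $3 = later capture.
queue_trend() {
    before=$(queue_length "$1" "$2")
    after=$(queue_length "$1" "$3")
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "unknown"
    elif [ "$after" -gt "$before" ]; then
        echo "increasing ($before -> $after)"
    elif [ "$after" -lt "$before" ]; then
        echo "decreasing ($before -> $after)"
    else
        echo "unchanged ($before)"
    fi
}

# Usage on the appliance:
#   smonitor status queue > before.txt
#   (restart the service and wait 15-20 minutes)
#   smonitor status queue > after.txt
#   queue_trend LoaderService before.txt after.txt
```

The two saved files also serve as the record of queue lengths to pass on to Fluency Support.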

Disk Error

Fluency runs periodic cron jobs in the background to remove data that is past its retention period, so disk errors are usually the result of misconfiguration.

If the system is reporting a disk error, run the following commands to examine the disk usage in detail:

smonitor status disk
df -h
du -hs /opt/sdo/*

Make a note of the resulting output and provide that information to Fluency Support. You will be instructed on how to proceed.
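Rather than copying the output by hand, the commands can be captured in a single file to attach to a support request. This is a sketch only: the helper name `collect_output` and the report file name are hypothetical, and the commands passed to it are the ones listed above.

```shell
#!/bin/sh
# Run each command given as an argument and write its output, preceded by a
# header line, to stdout. A failing command is noted and the run continues.
collect_output() {
    for cmd in "$@"; do
        echo "=== $cmd ==="
        sh -c "$cmd" 2>&1 || echo "(command exited with a non-zero status)"
        echo
    done
}

# Usage on the appliance (the report file name is arbitrary):
#   collect_output 'smonitor status disk' 'df -h' 'du -hs /opt/sdo/*' > disk-report.txt
```

The same helper could be used with the database commands in the next section.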

Database Error

If the system status is reporting a Database error, you will likely require further assistance from Fluency Support.

Run the following commands to examine the status of the database in detail:

es health -v
pgrep java

Make a note of the resulting output and provide that information to Fluency Support. You will be instructed on how to proceed.
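When reading the `pgrep java` result, note that `pgrep` prints the IDs of matching processes and exits with a non-zero status when nothing matches. The sketch below turns that into a one-line summary. The function name `java_process_state` is hypothetical, and the guide does not state which Java process is expected, so an empty result should be reported to Fluency Support rather than acted on.

```shell
#!/bin/sh
# Report whether any process matching "java" is running.
# pgrep exits 0 when at least one process matches and 1 when none do.
java_process_state() {
    if pids=$(pgrep java); then
        # Unquoted expansion joins the PIDs (one per line) with spaces.
        echo "java running, PID(s): $(echo $pids)"
    else
        echo "no java process found"
    fi
}

# Usage on the appliance:
#   java_process_state
```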