SCOM 2012: SCUtils APC Monitoring

Some time ago I found out that SCUtils offers a free management pack for monitoring APC UPS devices: the SCUtils APC Monitoring Management Pack.
When I wanted to test it, I realized that it was only available for SCOM 2012 R2, so I contacted support and asked whether they could also provide an SP1 version for me.
They did, and they were very responsive, which is a big plus!

So I was able to implement it in my test environment and check it out.
Here are my findings.

The management pack is well designed. The bundle consists of two MPs:

SCUtils.APC.UPS.mpb
SCUtils.APC.UPS.Dashboard.xml

It monitors APC UPS devices and APC EMUs (environmental monitoring units). APC PDUs are not covered yet, but support promised that this will be added in the near future.
All discoveries run on a 4-hour schedule, the rules run every 5 minutes and the monitors run every 5 to 15 minutes. That is fine.

It creates all necessary views, including a Diagram View:

APC Folder

UPS Diagram View

With the UPS Dashboard you get a good overview of your APC environment.

UPS Dashboard

Monitors:

APC Monitors

All monitors are enabled by default, but the MP also ships overrides that disable some EMU monitors:

APC Overrides

Rules:

APC Rules

Only one rule is disabled by default.

The MP successfully detected the low battery runtime (8 min), and you can see that the Description, Path and Source fields are always very descriptive.

APC Alert

They also added some nice reports:

APC Reports

So from what I see, it has all you need to monitor APC UPS devices. SCUtils promised to create documentation for the MP bundle soon, but there is not a lot you need to do to implement the management pack. You only have to add the APC devices to your environment through Network Monitoring and import the MPs. That's it.
Very easy. And it is free at the moment.
I am only waiting for the PDU monitoring to be added; then it will have everything I want.

Note: I created the monitor, rule and report overviews with MP Studio.

Update: The APC PDU monitoring pack has been released. Here is my review.

 

 

 


PowerShell: Temperature monitoring

If you want to monitor the temperature of your server rooms, you have a lot of options. One is a temperature module that is connected directly to your network and exposes the temperature value through an XML file such as: http://moduleIP/state.xml.

state.xml

We have used a solution from ControlByWeb, a PoE module with one sensor.

The idea is to have a System Center Orchestrator runbook that checks the temperature of all sensors and creates a SCOM alert when a temperature exceeds the threshold of 30°C.

CheckTemp.xps.1

Then we also wanted to have a view directly in SCOM with the current values for all sensors. I used the PowerShell Web Widget for this.

TempSensorSCOM

The core of all of this is a PowerShell script.

You can even use parts of the script and collect the data in SCOM.

Graph

For this, however, you will need one rule per sensor.

Functionality description:

The script reads a text file from a share with all IP addresses and names of the temperature modules.
Example:
192.168.10.110, Frankfurt
192.168.10.111, Paris

Then it connects to each module, loads the state.xml and reads the value of the first sensor.
With that data it creates an HTML table and writes it to an HTML file on a share on a web server.
As the last step, the web page can be loaded in the PowerShell Web Widget.
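
The steps described above can be sketched roughly as follows. This is not the script from the download, only a minimal illustration under stated assumptions: the element name sensor1temp, the sample XML and all function names are hypothetical, so check the state.xml of your own module before reusing any of it. The sketch works on sample data so that it runs without a module on the network.

```powershell
# Sketch only: 'sensor1temp', the sample XML and the function names are
# assumptions for illustration, not taken from the original script.

function ConvertFrom-SensorList {
    # Parses lines such as "192.168.10.110, Frankfurt" into objects.
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }
        $parts = @(($line -split ',', 2) | ForEach-Object { $_.Trim() })
        [pscustomobject]@{ IP = $parts[0]; Name = $parts[1] }
    }
}

function Get-SensorTemperature {
    # Returns the value of the first sensor from a state.xml document,
    # or $null if the element is missing.
    param([xml]$StateXml, [string]$Element = 'sensor1temp')
    $node = $StateXml.DocumentElement.SelectSingleNode($Element)
    if ($null -eq $node) { return $null }
    [double]::Parse($node.InnerText, [cultureinfo]::InvariantCulture)
}

function ConvertTo-TemperatureHtml {
    # Builds an HTML page with one table row per sensor and a status column.
    param([object[]]$Readings, [double]$Threshold = 30)
    $Readings |
        Select-Object Name, IP, Temperature,
            @{ Name = 'Status'; Expression = { if ($_.Temperature -gt $Threshold) { 'Too hot' } else { 'OK' } } } |
        ConvertTo-Html -Title 'Server room temperatures' |
        Out-String
}

# Sample run with inline data instead of a file share and live modules.
$sensors   = ConvertFrom-SensorList @('192.168.10.110, Frankfurt', '192.168.10.111, Paris')
$sampleXml = [xml]'<datavalues><units>C</units><sensor1temp>31.5</sensor1temp></datavalues>'

$readings = foreach ($s in $sensors) {
    # Live variant (assumption): $state = New-Object System.Xml.XmlDocument
    #                            $state.Load("http://$($s.IP)/state.xml")
    [pscustomobject]@{
        Name        = $s.Name
        IP          = $s.IP
        Temperature = Get-SensorTemperature $sampleXml
    }
}

$html = ConvertTo-TemperatureHtml -Readings $readings
```

In a real run, the sensor list would come from Get-Content on the text file and $html would be written to the web server share with Set-Content; both paths depend on your environment.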

You can download the script on TechNet Gallery.

 

 

 

SCOM 2012: Detect Event Storm

System Center Operations Manager collects a lot of events, but a single system with a flapping service can flood SCOM with events: an event storm. Operations Manager does not recognize this until the database is too full, which causes performance issues or even greyed out management servers because they cannot process the data anymore.

It is important to avoid that situation. There is one easy solution: a monitor based on a PowerShell script which checks the number of events written to the database on a predefined schedule. If the number of events is higher than a given threshold, an alert is created which shows the top 5 machines creating events. This makes it easy to find the cause of the problem.
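
The threshold check and the top 5 list could look roughly like the sketch below. It is not the script from the download: the function name, the LoggingComputer property, the threshold and the sample data are all assumptions for illustration, and it works on event records that have already been fetched rather than querying the SCOM database itself.

```powershell
# Sketch only: function name, property name and threshold are hypothetical.
function Test-EventStorm {
    param(
        [object[]]$EventRecords,    # events written during the last interval
        [int]$Threshold = 1000,     # assumed threshold, tune for your environment
        [int]$Top = 5               # number of top event sources to report
    )

    $count = @($EventRecords).Count

    # Count events per machine and keep the busiest ones.
    $topSources = @($EventRecords |
        Group-Object -Property LoggingComputer -NoElement |
        Sort-Object -Property Count -Descending |
        Select-Object -First $Top -Property Name, Count)

    [pscustomobject]@{
        EventCount = $count
        IsStorm    = $count -gt $Threshold
        TopSources = $topSources
    }
}

# Sample data: SRV01 produces most of the events.
$sample  = @()
$sample += 1..8 | ForEach-Object { [pscustomobject]@{ LoggingComputer = 'SRV01' } }
$sample += 1..3 | ForEach-Object { [pscustomobject]@{ LoggingComputer = 'SRV02' } }
$sample += [pscustomobject]@{ LoggingComputer = 'SRV03' }

$result = Test-EventStorm -EventRecords $sample -Threshold 10
$result.TopSources | Format-Table -AutoSize | Out-String | Write-Output
```

In a monitor, IsStorm would drive the health state and TopSources would go into the alert description.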

 
I have mentioned this situation in my presentation “Getting The Most From Operation Manager” at MMS 2015.

You can download the solution here. It also includes the rule to check greyed out agents.

A big thank-you to Thomas Peter from Vaserv EU, who helped with this solution.

SCOM 2012: SquaredUp SQL Query Dashboards

After SCU Europe 2015 I finally managed to install the latest version of Squared Up (2.1.x) in my development environment. There are already a lot of great posts about SquaredUp. Tao Yang is leading on that at the moment.

I really must say, I am amazed at how much the product has improved within the last year. It is now very easy to import and export dashboards, customize existing ones and create new rich dashboards. And there is a lot more to come; I have already received some information about the next version ;-).

This week I played around with the SQL Query PlugIn and created two dashboards for my environment, which I want to share here.

  1. OpsManager Settings:
    This dashboard shows the current database usage and the grooming/retention settings of the OpsManager database and the OpsManagerDW.
    OpsMgr settings
  2. Last Month:
    This dashboard shows the top 20 alerts and the number of alerts by severity, both from the last month.
    Last Month

In the past I took this information from SCOM reports. With SquaredUp you can see the data very quickly and share it easily with other teams.

You can download the dashboard templates here:
Last Month
OpsManager settings

Please be aware that you need to change the database connection string to match your environment.

Microsoft System Center Reporting Cookbook available soon

A new System Center book is on the horizon which covers the very important reporting topic. It will be published Friday 27th. You can find the link to the book and more information about it on the blog of Steve Buchanan, MVP and technical reviewer of the book.

Why is this book special?

Reporting is essential in the System Center world. What is, for example, SCCM without patch compliance reports? But where can you find good information about how to design System Center reports besides searching the web? This book gives you guidance with easy-to-follow recipes and a lot of useful information about setup, report design and options other than SSRS, such as PowerPivot.

A big thank-you from me goes to Sam Erskine, one of the authors, who had the idea for the book. He managed the publication from beginning to end, and it is really his baby. He made it possible for me to be a technical reviewer of this book and to watch it grow. I am as proud as a nurse who helped bring a baby to life that I had a small part in it.

So buy it, read it and share it ;-).

SCOM 2012: SC Orchestrator Additions Management Pack

There are already some management packs available to monitor System Center Orchestrator 2012 with System Center Operations Manager 2012.

What I miss in those is, for example, monitoring of the Orchestrator database. After I wrote my last post about the Policy_Publish_Queue filling up in Orchestrator, I decided to create a management pack to monitor that, and I also added some tasks that I thought could be useful.

You can find the management pack here

I would appreciate any comments or ideas for improvement.

 

SCOM 2012: Check greyed out agents

Greyed out agents can be a nightmare for a System Center Operations Manager admin. An agent gets greyed out if the Health Service is not communicating correctly with the Management Servers. Normally an alert with the name “Health Service Heartbeat Failure” should be created to indicate this status. But sometimes I see that the alert was created and then auto-resolved by the system after a while (because of an agent recovery etc.). The problem arises when the agent remains in an unhealthy state but no new alert gets created. I see that from time to time if the agent is stuck or has resource problems. This situation needs to be solved quickly because no monitoring takes place on the agent side during that time.

So how can this be resolved?

I implemented this solution: The management servers already know which agents are greyed out, so I have created a rule which runs on the “All Management Servers Resource Pool” every 5 min (you can select another interval if you like). It checks which agents are greyed out but not in maintenance mode, and then checks for each agent whether there is an open “Health Service Heartbeat Failure” alert. If no such alert is found, it adds the server to a list, which is then reported in a single alert with the name “Sample – greyed out agents”.

The main logic of the rule is based on a PowerShell script. Here is the part with the logic; I have skipped everything around it (log function, SCOM module, etc.).

$TotalCount = 0
$list = ""
$agentclass = Get-SCOMClass -Name "Microsoft.SystemCenter.Agent"
# Find greyed out agents which are not in maintenance mode
$agentobjects = Get-SCOMMonitoringObject -Class:$agentclass | Where-Object {($_.IsAvailable -eq $false) -and ($_.InMaintenanceMode -eq $false)}
if ($null -ne $agentobjects)
{
    $msg = "`r`nFound greyed out agents which are not in maintenance mode."
    Log -msg $msg -debug $debug -debugLog $debugLog
    # Load the Health Service Watcher instances once instead of once per agent
    $watchers = Get-SCOMClass -Name "Microsoft.SystemCenter.HealthServiceWatcher" | Get-SCOMClassInstance
    # Go through agent list
    foreach ($agent in $agentobjects)
    {
        $msg = "`r`n" + $agent.DisplayName
        Log -msg $msg -debug $debug -debugLog $debugLog
        # Go on if watcher state for the agent is unhealthy
        if (($watchers | Where-Object {$_.DisplayName -eq $agent.DisplayName}).HealthState -ne 'Success')
        {
            # Find open Health Service Heartbeat Failure alert for the agent (255 = closed)
            $alert = Get-SCOMAlert -Name 'Health Service Heartbeat Failure' | Where-Object {($_.ResolutionState -ne 255) -and ($_.MonitoringObjectDisplayName -eq $agent.DisplayName)}
            # No alert for greyed out agent found
            if ($null -eq $alert)
            {
                $list += "`r`n" + $agent.DisplayName
                $msg = "`r`nThe agent " + $agent.DisplayName + " has no open Health Service Heartbeat Failure alerts. Add to list."
                Log -msg $msg -debug $debug -debugLog $debugLog
                $TotalCount++
            }
        }
    }
}

You can find the rule in a small management pack called Sample.BaseMonitoring, which you can download here.
It is designed for SCOM 2012 SP1. Please test it in your development environment before you add it to production!