Orchestrator 2012: Too many queued policy instances caused Orchestrator to slow down dramatically

Recently I had a situation in my System Center Orchestrator 2012 SP1 environment where the Runbook Designer behaved strangely. When I started a runbook, the log was not updated while it ran; only the log history was updated once the runbook had finished. Runbooks also seemed to take longer than normal to finish.
I started to check some things in my environment:

  • I checked the size of my database: at 2 GB it was not too big.
  • I checked the performance of my Management and Runbook servers. All looked normal.
  • I restarted the services. That did not help.
  • I cleaned up some things in the DB: removed orphaned log entries from runbooks, deleted some old runbooks that were no longer required, and purged the logs.
  • Then I checked the logging settings for all runbooks. That way I found one runbook that had logging enabled and was currently running. But I could not stop it! It gave me an error like “Unable to un-deploy the runbook“. (Sorry, I forgot to take a screenshot of it 😉) I saw that the job history showed current entries and kept creating new ones. This runbook was invoked by another one, and this invocation filled up the queue.

I searched around and found some SQL queries I could use to investigate more. So I logged on to the SQL server with the Orchestrator instance on it and ran the following query:

SELECT * FROM POLICY_PUBLISH_QUEUE

This gave me all instances of policies which were queued right now. And I had 350000 in there! That was the problem. I looked through the results and saw that most entries came from one policy/runbook. So I used this query to find more details about it:

SELECT POLICYINSTANCES.PolicyID, POLICYINSTANCES.TimeStarted, POLICYINSTANCES.TimeEnded, POLICYINSTANCES.ProcessID, POLICYINSTANCES.SeqNumber, POLICIES.Name FROM POLICYINSTANCES INNER JOIN POLICIES ON POLICYINSTANCES.PolicyID = POLICIES.UniqueID WHERE POLICYINSTANCES.PolicyID = 'PolicyID'

With that I could verify that it was the runbook, which was not stopping. So I used the next query to delete the entries from this policy out of the queue:

DELETE FROM [POLICY_PUBLISH_QUEUE] WHERE [PolicyID] = 'PolicyID'

Now the queue only had 10 entries left in it :-).
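
With several hundred thousand rows, scrolling through the result to spot the dominant policy is tedious. A minimal sketch of how the rows could be summarised per policy in PowerShell, assuming they have already been loaded as objects with a PolicyID property; Get-QueueSummary and the sample rows are hypothetical and only stand in for the real query result:

```powershell
# Hypothetical helper: count queue rows per PolicyID, largest group first.
function Get-QueueSummary {
    param([object[]]$Rows)
    $Rows |
        Group-Object -Property PolicyID |
        Sort-Object -Property Count -Descending |
        Select-Object @{ Name = 'PolicyID'; Expression = { $_.Name } }, Count
}

# Sample rows standing in for the result of the POLICY_PUBLISH_QUEUE query.
$sample = @(
    [pscustomobject]@{ PolicyID = 'A' },
    [pscustomobject]@{ PolicyID = 'B' },
    [pscustomobject]@{ PolicyID = 'B' },
    [pscustomobject]@{ PolicyID = 'B' },
    [pscustomobject]@{ PolicyID = 'C' },
    [pscustomobject]@{ PolicyID = 'C' }
)
$summary = @(Get-QueueSummary -Rows $sample)
$summary | Format-Table -AutoSize
```

The first row of the summary is the policy with the most queued instances, which is the PolicyID to put into the detail query.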

I shrank the database, checked the Orchestrator performance again, and it was back to normal.

Wonderful!

Orchestrator 2012: SCOM activities are failing with error “Input string was not in a correct format”

I recently had a problem with my System Center Operations Manager 2012 (RTM) activities on my Orchestrator 2012 SP1 runbook servers.

All runbooks with SCOM activities failed. So I created a test runbook with only one SCOM activity (Get Monitor), enabled logging and checked what the error was.

The error text was: Failed to load the object properties. The exception was “Input string was not in a correct format.”.

[Screenshot: SCOM activity failure]

The web search did not help and I tried a lot: restarted server, redeployed SCOM integration pack, started SCOM console, which was working, tried with a new SCOM connection. Nothing helped.

So I opened a ticket with Microsoft support and they helped very quickly. Thanks!

The solution was this:

The problem was the Operations Manager Console cache, which was corrupted.

  1. To clean this up, recreate the SCOM connections with the same name.
  2. Start the SCOM console with the clear cache option: "C:\Program Files\System Center Operations Manager 2012\Console\Microsoft.EnterpriseManagement.Monitoring.Console.exe" /clearcache

 

Orchestrator 2012: Parallel reboot of server groups

As stated in my post “Orchestrator 2012: Patch a server with SCCM 2012”, we had a request to reboot and patch groups of servers in parallel. The requirements were: restart servers from different groups in parallel, allow a manual or scheduled start, and do not continue with the remaining servers in a group if one server in that group fails.

How can we do that? First of all: use System Center Orchestrator 2012 – the automation tool from Microsoft.

Then I use SQL to provide the server names and store the status of the process.

I have an OrchestratorTemp database with two tables in it (plus the table described for the patching – see my blog):

ServerStatus:
[Screenshot: ServerStatus table]

Service:
[Screenshot: Service table]

The ServerStatus table has some entries already filled, when the runbook starts:
Servername, Grouping.

The Grouping had the values “OWA” and “General”. So servers of these two groups can be rebooted in parallel.

The start workflow looks like this:

[Screenshot: start runbook]

It has to be started with the following parameter: Patch = Yes/No. This defines if patches should be applied or not.

If you need to schedule the reboots then you can add a schedule runbook in front of it which checks the date and initializes this runbook with the required start parameter.

It initiates the “Start groups” runbook and waits for completion. After the reboots it checks the patch status, checks the overall status and empties the tables (in the server status table it only deletes the fields which show the status).

Start Groups

This runbook provides the parallelism and can be extended with more groups.

[Screenshot: Start Groups runbook]
The “Get Server Groups” activity runs the following query: Select Distinct Grouping from dbo.ServerStatus.

The output will be used to start the “Control” runbook and fill the parameter “Grouping”. The parameter “Patch” is also provided to the sub runbook.

Control
[Screenshot: Control runbook]

This runbook helps to ensure that the rest of the servers in a group are skipped if one server fails.
It has a “Job concurrency” of the number of groups.

Details:

  • “Get failed Server in Group”: Select Servername from dbo.ServerStatus where Grouping = '%Grouping%' AND Status = 'Failed'
  • “Get next Server in Group”: Select Top 1 Servername from dbo.ServerStatus where Grouping = '%Grouping%' AND Status is NULL
  • When it finds a server, it initiates the “Maintenance” runbook with the parameters “Servername”, “Patch” and “Group”, and waits for completion.
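
The decision these two queries implement can be tried outside Orchestrator. A minimal sketch, assuming the ServerStatus rows are available as objects with Servername, Grouping and Status properties; Get-NextServerInGroup, the sample server names and the status value 'Done' are hypothetical, only 'Failed' and the empty (NULL) status come from the queries above:

```powershell
# Hypothetical helper mirroring "Get failed Server in Group" and "Get next Server in Group".
function Get-NextServerInGroup {
    param(
        [object[]]$ServerStatus,
        [string]$Grouping
    )
    $group = @($ServerStatus | Where-Object { $_.Grouping -eq $Grouping })

    # A failed server blocks the remaining servers of the same group.
    $failed = @($group | Where-Object { $_.Status -eq 'Failed' })
    if ($failed.Count -gt 0) { return $null }

    # Otherwise return the first server that has no status yet (NULL in the table).
    $group |
        Where-Object { $null -eq $_.Status } |
        Select-Object -First 1 -ExpandProperty Servername
}

# Sample data: the server names and the 'Done' status are made up.
$table = @(
    [pscustomobject]@{ Servername = 'owa01'; Grouping = 'OWA';     Status = 'Done' },
    [pscustomobject]@{ Servername = 'owa02'; Grouping = 'OWA';     Status = $null },
    [pscustomobject]@{ Servername = 'gen01'; Grouping = 'General'; Status = 'Failed' },
    [pscustomobject]@{ Servername = 'gen02'; Grouping = 'General'; Status = $null }
)
$nextOwa     = Get-NextServerInGroup -ServerStatus $table -Grouping 'OWA'
$nextGeneral = Get-NextServerInGroup -ServerStatus $table -Grouping 'General'
```

Because each group is evaluated on its own, a failure in one group stops only that group while the other group keeps going.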

Maintenance

This is the main reboot runbook. It could be split up to multiple sub runbooks, but I only took the patching part out of it. It can also be extended with pre or post activities to stop services or do other tasks around the reboot.
This runbook has a “Job concurrency” of the number of parallel groups.

[Screenshot: Maintenance runbook]

The easiest way to follow this workflow is to go straight from top to bottom. The entries on the left and right are only for logging.

The main things this workflow is doing are: ping the computer, start SCOM maintenance mode, install patches, reboot, check netlogon service to see that the system is up, check patch status, check services and restart if necessary, check general service status, stop maintenance. Additionally it logs the status of the steps and sends out emails.

Here are the details of the non standard activities:

  • Get NetLogon Service Status: This is a “Run Command” activity that runs locally on the runbook server: sc \\%Servername% query netlogon
  • Get Citrix Services: This is a “Run .Net” activity to get application-specific services, here for Citrix. It runs a PowerShell script and publishes the variable “output”:
    $output=@()
    $services = get-wmiobject win32_service -computername %Servername% | where {($_.displayname -like '*Citrix*') -and ($_.Startmode -eq 'Auto')}
    foreach ($service in $services)
    {
        $output+=$service.displayname
    }
  • The Get-FQDN activity is described here.
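
The output of the sc query has to be evaluated to decide whether the system is up again. A minimal sketch of such a check, assuming the text printed by sc is available as an array of lines and that it uses the English state name; Test-ServiceRunning is a hypothetical helper and the sample text only imitates typical sc output:

```powershell
# Hypothetical helper: true when the sc output reports state 4 (RUNNING).
function Test-ServiceRunning {
    param([string[]]$ScOutput)
    @($ScOutput -match 'STATE\s*:\s*4\s+RUNNING').Count -gt 0
}

# Sample text in the shape sc prints for a running service.
$runningSample = @(
    'SERVICE_NAME: netlogon',
    '        TYPE               : 20  WIN32_SHARE_PROCESS',
    '        STATE              : 4  RUNNING',
    '                                (STOPPABLE, PAUSABLE, IGNORES_SHUTDOWN)'
)
$stoppedSample = @(
    'SERVICE_NAME: netlogon',
    '        STATE              : 1  STOPPED'
)

Test-ServiceRunning -ScOutput $runningSample
Test-ServiceRunning -ScOutput $stoppedSample
```

In the runbook the same decision would normally be made with a link condition on the activity output; the function only makes the pattern explicit.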

Neil Peterson has also written a complex runbook about patching a Hyper-V cluster. He used some other methods to initialize the patching and presented the whole process at MMS 2013. You can get the details and watch the session here: http://blogs.technet.com/b/neilp/archive/2013/04/15/mms2013-session-now-on-channel-9-patching-a-hyper-v-cluster-with-orchestrator-configuration-manager-including-downloadable-runbook-exports.aspx

The Runbooks can be downloaded here.

Orchestrator 2012: Patch a server with SCCM 2012

You may be asking yourself: “Why initialize patching with Orchestrator?”

We had the request to restart and patch servers in groups on a recurring schedule, with pre and post tasks to check. You can do all of that in SCCM 2012 through task sequences, but can you also make SCCM stop when one of the servers in the group fails, and get a status at the end? Orchestrator can do that. It can run general tasks for all servers or special tasks for single servers, so you have more control.

I will also create another blog post to describe the reboot runbooks. Here I want to focus on the patching part, which can also be initialized separately, outside of the reboot process.

For our reboot scenario we only wanted to check which patches are available, install them, reboot, and after the reboot check which patches were installed successfully and whether additional patches are missing. We did not install those additional patches. You can extend that as you need.

We use System Center Orchestrator 2012 SP1. For my runbook I do not use the System Center Configuration Manager 2012 SP1 integration pack. I only use WMI queries to check which patches are available. But you still need SCCM 2012 to deploy the patches!

I use the following WMI classes:

CCM_SoftwareUpdate (http://msdn.microsoft.com/en-us/library/jj155451.aspx)
CCM_SoftwareUpdatesManager (http://msdn.microsoft.com/en-us/library/jj155384.aspx)
Win32_QuickFixEngineering (http://msdn.microsoft.com/en-us/library/windows/desktop/aa394391(v=vs.85).aspx)

We have one additional database in the same database instance as our Orchestrator database for logging. It is called OrchestratorTemp.

For this runbook we use a table called SoftwareUpdate to log the patch status.

[Screenshot: SoftwareUpdate table]

In the reboot runbooks we have another table which logs the general server status and also has the columns Servername and RBInstance. With these two columns we can later link both tables and clean up the columns at the end of the process.

I use three runbooks to patch the server.

  1. SCCM Dev – Check updates
  2. SCCM Dev – Install updates
  3. SCCM Dev – Check previous updates

SCCM Dev – Check updates

[Screenshot: SCCM Dev – Check updates runbook]

It has the following initialize data parameters:

  • Servername
  • Patch (in the reboot runbook you can decide if you want to patch or not, Values: “True/False”)
  • Previous Found (needed for the second run after the reboot, should be “False” at the beginning)
  • RBInstance (reference to the main reboot runbook, can be any number if called outside)

I will focus on the interesting details of the main activities.

  • Get Updates/Check for additional updates (Run .Net Activity):
    Runs the following PowerShell script:
    [Screenshot: getupdates]
    and publishes the following data:
    [Screenshot: getupdates-published]
  • Write Updates/Write additional Update Status (Write To Database Activity):
    Writes into the OrchestratorTemp database:
    [Screenshot: WriteUpdates]
  • Install Update (Invoke Runbook): Initializes the “SCCM Dev – Install updates” runbook and waits for its completion. Loops until Finished=True. Parameters passed: Servername, RBInstance.
  • Check previous updates (Invoke Runbook): Initializes the “SCCM Dev – Check previous updates” runbook and waits for its completion. Parameters passed: Servername, RBInstance.

SCCM Dev – Install updates

[Screenshot: SCCM Dev – Install updates runbook]

The install updates runbook is initialized once for each update that needs to be installed.

  • Get first missing update (Query Database Activity): Runs the following query:
    [Screenshot: get first update]
  • Install update (Run .Net Activity):
    Runs the following PowerShell script:
    [Screenshot: install update]
  • Check update (Run .Net Activity):
    Runs the following PowerShell script:
    [Screenshot: check update]
    and publishes the following data:
    [Screenshot: check update - published]
    Loops with a delay of 10 seconds and exits the loop when these conditions occur:
    [Screenshot: check update - loop]
    (pattern: 8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23)
    => waits 2 minutes for the patch to install. Can be extended by increasing the number of attempts!
  • Cancel Update (Run .Net Activity):
    Runs the following PowerShell script:
    [Screenshot: cancel update]
  • The Write Update activities set “ComplianceState” to 1 and “EvaluationState” to the output status when the update was installed successfully. Otherwise they set a different “ComplianceState” depending on the update status.
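
The exit pattern of the Check update loop can be tried outside Orchestrator. A minimal sketch, assuming the loop tests the EvaluationState value published by the Check update activity (the original condition is only shown as a screenshot) and that the pattern has to match the whole value; Test-InstallLoopFinished is a hypothetical helper name:

```powershell
# Hypothetical helper: true when the state value is one of 8..23, the values listed in the loop pattern.
function Test-InstallLoopFinished {
    param([int]$EvaluationState)
    # Anchored with ^ and $ so that a partial match is not counted.
    "$EvaluationState" -match '^(8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23)$'
}

# States below 8 keep the loop running, 8 to 23 end it.
foreach ($state in 6, 7, 8, 13, 23) {
    '{0,2} -> {1}' -f $state, (Test-InstallLoopFinished -EvaluationState $state)
}
```

Without the anchors a plain regular expression test would also accept values such as 80 or 123, because "8" or "12" occurs inside them. How strictly Orchestrator applies the pattern is not covered here.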

SCCM Dev – Check previous updates

[Screenshot: SCCM Dev – Check previous updates runbook]

This runbook should check if the update is listed in the installed updates after the reboot.

  • Get Compliance State (Query Database Activity): Runs the following query:
    [Screenshot: get compliance state]
  • Get ArticleID (Query Database Activity): Runs the following query:
    [Screenshot: get articleID]
  • Check install status (Run .Net Activity):
    Runs the following PowerShell script:
    [Screenshot: Check install status]
    and publishes the following data:
    [Screenshot: Check install status - published]
  • Write Update Compliance (Query Database Activity): Runs the following query:
    [Screenshot: Write update compliance]

Here is the link to the exported runbooks.

That’s it. Have fun!

Orchestrator 2012: Check SCCM maintenance window and set SCOM maintenance mode

Everyone who uses System Center Configuration Manager 2012 and System Center Operations Manager 2012 knows the problem of setting the server into maintenance mode when patching or software deployment needs to take place.

With System Center Orchestrator 2012 you get the integration packs for both systems and the option to create a workflow for this task. My intention was to use the maintenance windows which are defined on the collections. During this timeframe software updates and deployments can be performed on the servers, including reboots. So it would be good to set the servers into maintenance mode in SCOM. I only looked at general maintenance windows that are non-recurring, not at OSD windows.

Here is the summary of the workflow I have created:
The workflow runs every 2 minutes. It reads a text file on the runbook server with all collection IDs it should check, then checks whether the collection has a maintenance window defined that will start within the next 10-15 minutes. If yes, it gets the collection members in SCCM, gets the FQDN for each server and starts the maintenance mode in SCOM. If successful, it writes a log file; otherwise it tries again to set the maintenance mode with the NetBIOS name.

Diagram:

[Screenshot: set SCCM maintenance window runbook]

Most of the parts are standard activities, so I only describe the “Get Maintenance Window” activity, which runs a PowerShell script on the Runbook server. This activity needs to run with a user that has SCCM permissions, otherwise it will return no result. It only produces output data if the maintenance window starts within the next 10-15 minutes. So the link to the Get Collection Members activity should have the following include entry: Pure Output from Get Maintenance Window matches pattern .+

Here is the command line for the Get Maintenance Window activity:

cmd.exe /c | c:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe -c "function WMI-DateStringToDate($time) {  [System.Management.ManagementDateTimeconverter]::ToDateTime($time);};$collsettings = ([WMIClass] '\\SCCM Server FQDN\root\SMS\site_SCCMSiteCode:SMS_CollectionSettings').CreateInstance();if($collsettings -is [Object]){$collsettings.CollectionID = 'Link to Line Text of previous activity';$collsettings.get();$windows=$collsettings.ServiceWindows;if ($windows -is [Object]){$now=Get-Date;Foreach ($window in $windows){$Time=WMI-DateStringToDate($window.StartTime);if (($window.IsEnabled -eq $True) -and ($window.ServiceWindowType -eq '1') -and ($window.RecurrenceType -eq '1')){if (($now.AddMinutes(15).compareto($Time) -eq '1') -and ($now.AddMinutes(10).compareto($Time) -eq '-1')){$Duration=$window.Duration+15;write-host ($Time.ToString(),$Duration) -separator ';'}}}}};"

Attention! The command line should not have line breaks! Otherwise it will not work within this activity.
For better readability I post the script here also with line breaks and comments:

param($SMSSiteCode, $SMSManagementServer, $COLLECTION_ID)
# convert WMI date to DateTime format
function WMI-DateStringToDate($time)
{
    [System.Management.ManagementDateTimeConverter]::ToDateTime($time)
}
# get collection settings (incl. maintenance windows)
$collsettings = ([WMIClass] "\\$SMSManagementServer\root\SMS\site_$($SMSSiteCode):SMS_CollectionSettings").CreateInstance()
if ($collsettings -is [Object])
{
    $collsettings.CollectionID = $COLLECTION_ID
    $collsettings.get()
    $windows = $collsettings.ServiceWindows
    if ($windows -is [Object])
    {
        $now = Get-Date
        foreach ($window in $windows)
        {
            $Time = WMI-DateStringToDate($window.StartTime)
            # only check general maintenance and non recurring windows
            if (($window.IsEnabled -eq $True) -and ($window.ServiceWindowType -eq '1') -and ($window.RecurrenceType -eq '1'))
            {
                # check if start time is within the next 10-15 min.
                if (($now.AddMinutes(15).CompareTo($Time) -eq '1') -and ($now.AddMinutes(10).CompareTo($Time) -eq '-1'))
                {
                    # add 15 min to duration as buffer
                    $Duration = $window.Duration + 15
                    Write-Host ($Time.ToString(), $Duration) -Separator ';'
                }
            }
        }
    }
}
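
The 10 to 15 minute check is the part of the script that is easiest to get wrong, so it helps to try it in isolation. A minimal sketch with fixed dates instead of WMI data; Test-WindowStartsSoon is a hypothetical helper name and the dates are made up:

```powershell
# Hypothetical helper: true when StartTime lies strictly between Now+10 min and Now+15 min,
# which is what the two CompareTo checks (1 and -1) in the script express.
function Test-WindowStartsSoon {
    param(
        [datetime]$Now,
        [datetime]$StartTime
    )
    ($Now.AddMinutes(15) -gt $StartTime) -and ($Now.AddMinutes(10) -lt $StartTime)
}

$now = [datetime]'2013-05-01 12:00:00'
Test-WindowStartsSoon -Now $now -StartTime ([datetime]'2013-05-01 12:12:00')   # inside the range
Test-WindowStartsSoon -Now $now -StartTime ([datetime]'2013-05-01 12:05:00')   # too close
Test-WindowStartsSoon -Now $now -StartTime ([datetime]'2013-05-01 12:20:00')   # too far away
```

Both bounds are exclusive, so a window that starts exactly 10 or exactly 15 minutes from now is not reported in that run.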

Another thing to mention: Please add an exclude to the link between “Get Collection Member” and “Get FQDN” for your Management Servers: Member Name from Get Collection Member equals SCOMMGServerName.
Then they will not be set into maintenance mode if they are members of the checked collections.

Update

I found some problems with the daylight saving settings on the runbook server. We use UTC maintenance windows in SCCM. With daylight saving the local time of the runbook server gets adjusted, but the maintenance window stays in standard UTC. The script compares the local time with the maintenance window, so the old version sets the maintenance mode at the wrong time while daylight saving is active.

Therefore I had to adjust the script. Here is the new version. The new parts are the calculation of $universal and $diff and the AddHours($diff) calls in the time comparison.

cmd.exe /c | c:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe -c "function WMI-DateStringToDate($time) {  [System.Management.ManagementDateTimeconverter]::ToDateTime($time);};$collsettings = ([WMIClass] '\\SCCM Server FQDN\root\SMS\site_SCCMSiteCode:SMS_CollectionSettings').CreateInstance();if($collsettings -is [Object]){$collsettings.CollectionID = 'Link to Line Text of previous activity';$collsettings.get();$windows=$collsettings.ServiceWindows;if ($windows -is [Object]){$now=Get-Date;$universal=$now.ToUniversalTime().AddHours(([System.TimeZoneInfo]::Local).baseutcoffset.hours);$diff=($now.subtract($universal)).Hours;Foreach ($window in $windows){$Time=WMI-DateStringToDate($window.StartTime);if (($window.IsEnabled -eq $True) -and ($window.ServiceWindowType -eq '1') -and ($window.RecurrenceType -eq '1')){if (($now.AddMinutes(15).compareto($Time.AddHours($diff)) -eq '1') -and ($now.AddMinutes(10).compareto($Time.AddHours($diff)) -eq '-1')){$Duration=$window.Duration+15;write-host ($Time.ToString(),$Duration) -separator ';'}}}}}"
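
The value of $diff is the number of hours that daylight saving currently adds to the standard offset of the runbook server. A minimal sketch of an equivalent calculation that takes the time zone as a parameter so it can be tried with any zone; Get-DstOffsetHours is a hypothetical helper and, like the script, it assumes whole-hour offsets:

```powershell
# Hypothetical helper: current UTC offset of the zone minus its standard (base) offset, in hours.
# The result is 0 in standard time and typically 1 while daylight saving is active.
function Get-DstOffsetHours {
    param(
        [datetime]$LocalNow = (Get-Date),
        [System.TimeZoneInfo]$Zone = [System.TimeZoneInfo]::Local
    )
    ($Zone.GetUtcOffset($LocalNow) - $Zone.BaseUtcOffset).Hours
}

# For the runbook server's own zone and the current time:
Get-DstOffsetHours

# UTC never observes daylight saving, so the result here is always 0:
Get-DstOffsetHours -LocalNow ([datetime]'2013-07-01 12:00:00') -Zone ([System.TimeZoneInfo]::Utc)
```

Adding this value to the window start time before the comparison is what the AddHours($diff) calls in the command line do.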

Here is the link to the runbook.

Orchestrator 2012: Reset SCOM 2012 monitor for closed alert

Everyone who works with System Center Operations Manager 2012 knows the problem of closed alerts where the monitor has not been reset first. The monitor will stay in the unhealthy state and no new alerts will be created anymore until the monitor gets reset.

You can create a scheduled task with a script on a management server or use Orchestrator for it. I found this blog which describes how to use the “Monitor alert” activity and then run a script afterwards. http://blog.scomfaq.ch/2012/05/05/reset-monitor-using-scom-2012-and-orchestrator-a-must-have-runbook/
I like the “Monitor alert” activity but I would like to reduce the number of scripts which connect to the management group.

So I have created another runbook.

[Screenshot: reset monitor runbook]

The first activity “Check every 5 min” triggers the runbook every 5 min. I think that is a good timeframe to check for closed alerts.

The next activity “Reset Monitor” runs on the Runbook server. It uses PowerShell and imports the SCOM 2012 module, so this must be installed on the Runbook Servers and the execution policy should be set to remotesigned.

Here are the details of the activity:

[Screenshot: Run .Net Script activity details]

$Alertname=@();
$State=@();
$Displayname=@();
# Import Operations Manager Module and create Connection
Import-Module OperationsManager;
New-SCOMManagementGroupConnection %ManagementServerName%;
$alerts=get-scomalert -Criteria "Severity!=0 AND IsMonitorAlert=1 AND ResolutionState=255" | where {$_.LastModified -ge ((get-date).AddMinutes(-5)).ToUniversalTime()}
if ($alerts -is [object])
{
    foreach ($alert in $alerts)
    {
        $monitoringobject = Get-SCOMClassinstance -id $alert.MonitoringObjectId
        # Reset Monitor
        If (($monitoringobject.HealthState -eq 'Error') -or ($monitoringobject.HealthState -eq 'Warning'))
        {
            $monitoringobject.ResetMonitoringState()
            $State+=$monitoringobject.HealthState
            $Displayname+=$monitoringobject.displayname
            $Alertname+=$alert.Name
        }
    }
}

The script gets all closed alerts from monitors with severity ‘Warning’ or ‘Critical’ within the last 5 min and only resets the monitor if it is still in ‘Error’ or ‘Warning’ HealthState. You could use this script also for a scheduled task on a management server.

The published data is Alertname, State and Displayname. You could also publish other data, but that was what I needed for troubleshooting.

Orchestrator: Get FQDN activity

Sometimes you need to get the FQDN of a computer within a runbook for the following activity (example: SCOM – Start Maintenance Mode). Most activities provide only the Netbios name (example: Get Computer IP/Status).

I have a simple Run Program activity that utilizes PowerShell to get that information.

[Screenshot: Get FQDN activity]

[Screenshot: Get FQDN activity details]

It runs on the computer for which you would like to get the FQDN.

Command:
cmd.exe /c | c:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe -c "[System.Net.Dns]::GetHostEntry('%Netbios computername from previous activity%').hostname"

In the following activity you only need to use the Pure Output from “Get FQDN”, which is now the FQDN of the computer.

You can also use the Run .Net Script activity, which runs on the Runbook server.

Then you only need to select PowerShell as the script language and enter $FQDN=[System.Net.Dns]::GetHostEntry('%Netbios computername from previous activity%').hostname as the script. Publish the variable FQDN in Published Data and you can use this variable in the next activity.
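
GetHostEntry throws an exception when the name cannot be resolved. A minimal sketch of a variant that falls back to the name it was given in that case, so a following activity can still try the NetBIOS name; Resolve-Fqdn is a hypothetical helper name and the fallback behaviour is an assumption, not part of the original activity:

```powershell
# Hypothetical helper: return the DNS host name, or the input name when resolution fails.
function Resolve-Fqdn {
    param([string]$ComputerName)
    try {
        [System.Net.Dns]::GetHostEntry($ComputerName).HostName
    }
    catch {
        # Resolution failed: hand back the original name unchanged.
        $ComputerName
    }
}

# In a Run .Net Script activity the argument would be the published data placeholder.
$FQDN = Resolve-Fqdn -ComputerName 'localhost'
$FQDN
```

Whether a silent fallback is wanted depends on the runbook; publishing an extra flag that says if the lookup succeeded would make the difference visible to later activities.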