Fail Over Service
Not to be confused with the FOS feature of AMC v2, this is an example Config that uses SNMP to
set up fail-over handling that can be included in existing Configs.
When you unzip the enclosed package, you will need to open the Config(s) --
"FOS.xml" is the main one, but there are two test client Configs
as well -- and then edit the External Properties (ExtProps) to point
to the correct filepath for the "FOS.properties" file. Both the FOS
Host and Clients need these property values. You also have to
correct some other filepaths in the ExtProps (explained below).
Also, I am an SNMP newbie. It may be that I am misusing one or more parameters when I make the script call to send the SNMP trap. The solution still works, but if you see any unnecessary or incorrect coding here, PLEASE LET ME KNOW. [Thanks -Eddie].
The files include:
- 3 x GIFs used by the fosWebMonitor AL (simple web server)
- FOS.properties - External properties.
- FOS_Execute.bat - Batch file that is set up to run if an AL dies. Note that you configure this in the ExtProps.
- TestClient.xml - test client Config #1
- TestClient2.xml - test client Config #2
You use FOS by starting a TDI Server and pointing it at FOS.xml:
ibmdisrv -c examples/_merglets/fos_6.0/fos.xml %1 %2 %3 %4 %5 %6 %7 %8 %9
You can see that I have an FOS_6.0 sub-dir under examples/_merglets
in my TDI solutions directory. You'll adjust this to fit your needs.
Then you drop the fosClientHeartbeat FC into the ALs you want
to monitor (as shown in the test client Configs). You do this by
creating a new Include (XML type) that points to the FOS.xml
Config. Then either create a Library FC that inherits from the
included fosClientHeartbeat component, or add it
directly to your ALs.
You will also have to set up Cloudscape for networked mode
operation. This is done by editing the solution.properties file
in your TDI solution-directory (sol-dir), commenting out the embedded mode
lines and uncommenting the networked mode lines.
Check out the
Cloudscape/System Store cookbook for more info on this.
The ALs included in FOS.xml are:
- fosWebMonitor - web-based monitor (very simple) for FOS. It serves port 80, so browse to "http://localhost" to test. I was planning to go for XML/XSLT, but chickened out and did it simply by scripting the HTML.
- fosHostHeartbeatListener - Heartbeat listener process
- fosHostCheckForTimeouts - Checks Heartbeat Journal for deadbeats, and then calls fosHostHandleDeadClient as needed
- fosHostHandleDeadClient - AL to handle dead clients (prescribed actions as defined in ExtProps).
And here is a description of the ExtProps:
All props that start with FOS_Client are specific to the client.
They apply to all ALs in this Config, but can be tied to a
specific AL by appending "@" and the AL name (as shown
below - e.g. FOS_ClientAction@A_test )
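The "@" override can be pictured as a two-step lookup: try the AL-specific name first, then fall back to the Config-wide one. The sketch below is only an illustration of that idea, not code taken from FOS.xml; the resolveProp helper, the plain-object property map, and the "B_other" AL name are all assumptions made for the example.

```javascript
// Hypothetical helper (not part of FOS.xml): resolve an ExtProp for one AL,
// preferring the AL-specific "name@AL" entry over the Config-wide entry.
function resolveProp(props, name, alName) {
  var specific = name + "@" + alName;
  if (Object.prototype.hasOwnProperty.call(props, specific)) {
    return props[specific];
  }
  if (Object.prototype.hasOwnProperty.call(props, name)) {
    return props[name];
  }
  return null; // property not defined at all
}

// Stand-in for the contents of FOS.properties.
var props = {
  "FOS_ClientPulse": "4",
  "FOS_ClientPulse@A_test": "2"
};

var pulseForATest = resolveProp(props, "FOS_ClientPulse", "A_test");  // AL-specific value
var pulseForOther = resolveProp(props, "FOS_ClientPulse", "B_other"); // falls back to Config-wide value
```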
Log heartbeat messages to client log
FOS_ClientLogHeartbeats:true
Send heartbeat at AL initialization. This has been disabled,
as it caused AL startup problems once in a while for me.
FOS_ClientHeartbeatAtInit:true
Send a heartbeat when AL terminates normally.
FOS_ClientCloseAtShutdown:true
Comma separated list of Client actions. See the Info tab of
the fosClientHeartbeat FC for more details. You have two
types of actions:
- mailto: sends the messages defined in the FOS_ClientMessage ExtProp to this address
- execute: shells out and executes the FOS_ExecuteCommand batch-file/script with the value specified below as parameters
FOS_ClientAction:mailto:edbird@mac.com,execute:"eddie"
Action tied specifically to A_test AL. This overrides the previous action ExtProp.
FOS_ClientAction@A_test:mailto:edbird@mac.com,execute:"A_test60"
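To make the action-list format concrete, here is a minimal sketch of how such a value could be split into type/argument pairs. It is an assumption about the format based on the examples above, not the parsing code used by the fosClientHeartbeat FC; parseActions is a hypothetical name, and the sketch does not handle commas inside quoted arguments.

```javascript
// Hypothetical parser (not from FOS.xml) for a FOS_ClientAction value such as
//   mailto:edbird@mac.com,execute:"eddie"
// Each comma-separated item is split at its first colon into a type and an argument.
function parseActions(value) {
  var actions = [];
  var parts = value.split(",");
  for (var i = 0; i < parts.length; i++) {
    var part = parts[i].replace(/^\s+|\s+$/g, ""); // trim whitespace
    var colon = part.indexOf(":");
    if (part === "" || colon < 0) {
      continue; // skip empty items and items without a "type:" prefix
    }
    var type = part.substring(0, colon);
    // Drop one surrounding double quote on each side, if present.
    var arg = part.substring(colon + 1).replace(/^"|"$/g, "");
    actions.push({ type: type, arg: arg });
  }
  return actions;
}

var actions = parseActions('mailto:edbird@mac.com,execute:"eddie"');
// actions[0] is the mailto action, actions[1] is the execute action.
```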
Message(s) to send by the
mailto: action
FOS_ClientMessage:Eddie is now out of action,Call out the guards!
FOS_ClientMessage@A_test:Special message for A_test60
Name of (all) clients. Since this is not tied to an AL, it serves as a
user-specified name for the TDI Server running this Config
FOS_ClientName:eddie
FOS_ClientName@A_test:My Test AL
Client pulse (how often the FC should send a heartbeat). A_test AL has its
own pulse.
FOS_ClientPulse:4
FOS_ClientPulse@A_test:2
Timeout before client declared dead by FOS Host
FOS_ClientTimeout@Another_test:20
FOS_ClientTimeout:15
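The timeout check itself boils down to comparing the age of each client's last heartbeat against its timeout. The following sketch shows that comparison under some loud assumptions: that FOS_ClientTimeout is in seconds, that the journal can be read as a list of entries with a last-heartbeat timestamp in milliseconds, and that findDeadClients and the entry field names are hypothetical. It is not the logic from fosHostCheckForTimeouts.

```javascript
// Hypothetical sketch (not from FOS.xml). Assumes timeouts are in seconds and
// each journal entry holds the client's last heartbeat time in milliseconds.
function findDeadClients(journal, timeouts, defaultTimeout, nowMs) {
  var dead = [];
  for (var i = 0; i < journal.length; i++) {
    var entry = journal[i];
    // Use the client-specific timeout if one is defined, else the default.
    var timeout = Object.prototype.hasOwnProperty.call(timeouts, entry.client)
      ? timeouts[entry.client]
      : defaultTimeout;
    if (nowMs - entry.lastHeartbeatMs > timeout * 1000) {
      dead.push(entry.client);
    }
  }
  return dead;
}

var journal = [
  { client: "A_test", lastHeartbeatMs: 0 },
  { client: "Another_test", lastHeartbeatMs: 0 }
];

// 18 seconds after the last heartbeats: A_test (default timeout 15) has timed
// out, Another_test (timeout 20) has not.
var dead = findDeadClients(journal, { "Another_test": 20 }, 15, 18000);
```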
Command to execute (needed quotes here for Windows) on deceased client.
FOS_ExecuteCommand:"C:/Documents and Settings/NO010196/My Documents/TDI/examples/_Merglets/FOS_6.0/FOS_Execute.bat"
Name of Heartbeat Journal db in the System Store (Cloudscape),
plus auth params.
FOS_HeartbeatJournalName:HeartbeatJournal
FOS_LoginId:APP
FOS_LoginPwd:APP
SNMP OID for heartbeats, port and Host URL (Clients and FOS Host need these)
FOS_HeartbeatOID:1.1.1.1.1.1.1.3.0
FOS_HeartbeatPort:3609
FOS_HostURL:localhost
Location for HTML files (fosWebMonitor) and GIF images
FOS_HTMLPath:C:/Documents and Settings/NO010196/My Documents/TDI/examples/_Merglets/FOS_6.0/
How often the fosWebMonitor refreshes the browser.
FOS_WebClientRefresh:3
That should about do it.
NOTE:
If you get this error the first time you start FOS, or whenever you've
deleted the Cloudscape sub-directory:
com.ibm.db2.jcc.a.SQLException: Table/View 'HEARTBEATJOURNAL' already exists in Schema 'APP'.
It's because all the ALs are trying to create the table at the same
time. One way around this is to add system.sleep(nSeconds) to the
AL Prolog - Before Init Hook of two of the ALs. Once you get this error,
you need to stop and restart the Server running FOS.xml, since
one of the processes was unable to initialize.
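One way to organize the staggered start is a small table of per-AL delays, so that only one AL reaches table creation first. This is a sketch only: the choice of which AL goes first, the delay values, and the startDelayFor helper are all assumptions, and the system.sleep call is left as a comment because the system object only exists inside TDI.

```javascript
// Hypothetical stagger table (values and ordering are assumptions):
// the first AL creates the table, the other two wait before initializing.
var startDelays = {
  "fosHostHeartbeatListener": 0,
  "fosHostCheckForTimeouts": 5,
  "fosWebMonitor": 10
};

// Return the delay in seconds for an AL, or 0 if none is configured.
function startDelayFor(alName) {
  if (Object.prototype.hasOwnProperty.call(startDelays, alName)) {
    return startDelays[alName];
  }
  return 0;
}

// In the Prolog - Before Init Hook of each AL (inside TDI only):
//   system.sleep(startDelayFor("fosHostCheckForTimeouts"));
```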
You could also have three AutoStart ALs that simply have AL FCs
to launch the three worker ALs, together with an EternalIterator, so that
when a worker AL stops, it is started again on the next cycle. Be sure to use
the AL FC run mode "Run and wait for result".
Here is the zip file itself. The .doc file is out-of-date, although the message format information is correct. I hope to get some time to address this. Don't let it confuse you -- the stuff written here takes precedence.
Note also that a simpler approach can be found in the
Real World DI site.
--
EddieHartman - 19 Mar 2006