| Automated Fault Detection/Recovery | The FAR Voice |
Constantly Monitors All Internal Functions
All complex computer systems experience hardware and software problems of one type or another from time to time. If the system's software does not acknowledge this fact and provide comprehensive trapping and recovery facilities, such faults can be irritatingly difficult to find at best and can create major problems at worst. FAR Voice systems have a rich set of automated fault detection, logging, and recovery functions.
| System ERR and LOG Files |
The FAR Voice software constantly monitors all internal functions. Any exception event that is deemed serious is written to an appropriate log file (LOG extension). Any exception event that is considered fatal is written to an error file (ERR extension). Simply looking for files with LOG or ERR extensions therefore gives an instant overview of a system's health and function.
The Automated Alarm Function (see that section) can be set to automatically scan for the existence of specific ERR or LOG files, or for specific contents of such files (that is, specific strings within lines in such files), and upon their occurrence, trigger an alarm.
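The kind of scan described above can be illustrated with a minimal Python sketch. It is not the FAR Voice implementation: the function name `scan_health`, the directory layout, and the plain substring matching are all assumptions made for illustration.

```python
from pathlib import Path


def scan_health(directory, patterns=()):
    """Scan one directory for ERR and LOG files (hypothetical helper).

    Returns (fatal, warnings, matches):
      fatal    - names of files with an ERR extension
      warnings - names of files with a LOG extension
      matches  - (file name, line number, pattern) for each pattern hit
    """
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    fatal = [p.name for p in files if p.suffix.upper() == ".ERR"]
    warnings = [p.name for p in files if p.suffix.upper() == ".LOG"]

    matches = []
    for p in files:
        if p.suffix.upper() not in (".ERR", ".LOG"):
            continue  # only ERR and LOG files are searched for strings
        lines = p.read_text(errors="replace").splitlines()
        for line_no, line in enumerate(lines, 1):
            for pattern in patterns:
                if pattern in line:
                    matches.append((p.name, line_no, pattern))
    return fatal, warnings, matches
```

An alarm rule could then fire when `fatal` is non-empty, when a named file appears in `warnings`, or when `matches` contains a configured string.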
| The Master Watchdog |
Independent of the optional Automated Alarm Function, there is a system level 'Watchdog' thread that runs continuously during system operation to monitor each active telephone port.
The Watchdog is a program that runs at all times in a thread of the Master program. It watches each of the Phone programs that is processing a call. If an application fails to move from one action box to another in a reasonable time, Watchdog attempts to communicate with it. If the application appears to have become 'stuck,' Watchdog kills that process, resets the port hardware, and restarts the process. Thus application level errors cannot cause the system to 'freeze up.' If such an event does happen, a detailed log file is written, giving the date and time of the event, the phone port involved, the application involved, and the exact box in the application that last executed. The existence of this LOG file can cause the triggering of the Automated Alarm Function, if that option is installed.
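The detect, probe, and restart cycle described above can be sketched in Python as follows. This is an illustrative model only: the class names, the `timeout` value standing in for "a reasonable time", and the `responds` and `restart` callbacks are assumptions, not part of the FAR Voice software.

```python
from dataclasses import dataclass, field


@dataclass
class PortState:
    port: int             # phone port number
    application: str      # application running on the port
    last_box: str         # action box that last executed
    last_progress: float  # time the application last moved to a new box


@dataclass
class Watchdog:
    timeout: float  # hypothetical "reasonable time", in seconds
    log: list = field(default_factory=list)

    def check(self, ports, now, responds, restart):
        """Recover every port that has stalled and does not respond."""
        recovered = []
        for state in ports:
            if now - state.last_progress <= self.timeout:
                continue  # still moving between boxes
            if responds(state):
                continue  # slow, but answering the watchdog
            # Record the details before recovery, as the text describes.
            self.log.append({
                "time": now,
                "port": state.port,
                "application": state.application,
                "last_box": state.last_box,
            })
            restart(state)  # kill process, reset port, restart process
            state.last_progress = now
            recovered.append(state.port)
        return recovered
```

In this sketch, `restart` stands in for the three recovery steps (kill the process, reset the port hardware, restart the process), and each entry in `log` carries the fields the text lists for the LOG file.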
| Automated Alarm Function |
The optional Automated Alarm Function monitors all system software and hardware for faults that are considered fatal or potentially fatal. The conditions that can trigger such an alarm are set by a configuration file that is tailored to the requirements of each system and can include all hardware and software functions as listed below.
- All designated control software.
- All voice ports (see the section on the Master Watchdog).
- The mirror disk system.
- The UPS system.
- All aspects of system function.
Examples of the types of events that can be automatically monitored are:
- If a specific ERR or LOG file, or a specific string in a specific file appears, an alarm is triggered.
- If a designated control program fails to reset a designated watchdog, an alarm is triggered.
- If an external contact closes or opens (intrusion alarm, temperature alarm, etc.), an alarm is triggered.
- On a fault tolerant chassis, if an internal event occurs (power supply failure, power supply out of tolerance, internal temperature alarm, fan rotation failure), an alarm is triggered.
- If a disk in the mirror disk sub-system has a major problem (unrecoverable hard error or disk crash), an alarm is triggered.
- If the UPS system experiences a major problem (for example, battery needs replacement, or battery exhausted), an alarm is triggered.
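As a rough illustration of configuration-driven monitoring, the Python sketch below evaluates a list of rules against a snapshot of system status. The rule format, the field names, and the three rule kinds shown are hypothetical; the actual configuration file format is not described here.

```python
def evaluate_rules(rules, status):
    """Return the names of all rules whose alarm condition holds.

    rules  - list of dicts in a hypothetical rule format
    status - snapshot dict with "now", "files", "watchdog_resets", "contacts"
    """
    triggered = []
    for rule in rules:
        kind = rule["kind"]
        if kind == "file_exists":
            # A specific ERR or LOG file has appeared.
            hit = rule["file"] in status["files"]
        elif kind == "watchdog_expired":
            # A control program failed to reset its watchdog in time.
            last = status["watchdog_resets"].get(rule["program"])
            hit = last is None or status["now"] - last > rule["max_interval"]
        elif kind == "contact":
            # An external contact is in its alarm state ("open" or "closed").
            hit = status["contacts"].get(rule["contact"]) == rule["alarm_state"]
        else:
            raise ValueError(f"unknown rule kind: {kind}")
        if hit:
            triggered.append(rule["name"])
    return triggered
```

Keeping the conditions in data rather than in code is what allows the same monitoring logic to be tailored to each system.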
If an alarm condition exists, the remote alarm software can respond in a number of ways.
- It can write a detailed log of the event.
- It can place a voice message in one or more system administrator voice mailboxes, whereupon the system calls those persons at designated numbers or pages them, or both, to deliver a message regarding the failure sensed.
- It can close a relay contact to activate an external alarm system.
- It can activate an external, independently powered, automatic dialer box to provide a fully documented alarm to the FAR Systems 24-7 Emergency Technical Services Hotline.
- In the unlikely event of an operating system crash, a hardware watchdog can automatically perform a hardware reset of the system, causing a reboot and automatic reload of the system.
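Because several responses can be configured for the same alarm, one reasonable design is to run each response independently so that a failure in one (for example, an unreachable relay) does not prevent the others. The Python sketch below shows that idea; the function name `dispatch_alarm` and the responder callables are assumptions for illustration only.

```python
def dispatch_alarm(event, responders):
    """Run every configured responder for one alarm event.

    responders - dict mapping a response name to a callable taking the event
    Returns a dict mapping each response name to True (ran) or False (failed).
    """
    results = {}
    for name, action in responders.items():
        try:
            action(event)
            results[name] = True
        except Exception:
            # One failed response must not block the remaining ones.
            results[name] = False
    return results
```

The returned dictionary can itself be written to the event log, giving a record of which responses were actually delivered.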