Error Handling
- Definitions
- Exit Signals are Sent when Processes Crash
- Exit Signals Propagate through Links
- Processes can Trap Exit Signals
- Complex Exit Signal Propagation
- Exit Signal Propagation Semantics
- Robust Systems can be made by Layering
- Primitives for Exit Signal Handling
- A Robust Server
- Allocator with Error Recovery
- Allocator Utilities
Definitions
- Link - A bi-directional propagation path for exit signals.
- Exit Signal - A signal that transmits process termination information.
- Error trapping - The ability of a process to handle exit signals as if they were messages.
Exit Signals are Sent when Processes Crash
When a process crashes (e.g. because a BIF call or a pattern match fails), exit signals are sent to all processes to which the failing process is currently linked.
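A minimal sketch of this, assuming a hypothetical module crash_demo: the caller links to a child that calls list_to_atom/1 with a given argument and then reports the exit signal it receives. The caller traps exits only so it can observe the signal as a message instead of dying with the child.

```erlang
-module(crash_demo).
-export([run/1]).

%% Spawn a linked child that calls list_to_atom(Arg) and report how it ended.
%% Note: this leaves trap_exit set in the calling process.
run(Arg) ->
    process_flag(trap_exit, true),
    Child = spawn_link(fun() -> list_to_atom(Arg) end),
    receive
        {'EXIT', Child, normal} ->
            normal;
        %% A runtime error gives an exit reason of {Error, Stacktrace}.
        {'EXIT', Child, {Error, _Stack}} ->
            {crashed, Error}
    after 1000 ->
        timeout
    end.
```

Calling crash_demo:run("abc") returns normal, while crash_demo:run(42) makes the BIF fail with badarg and returns {crashed, badarg}.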

Exit Signals Propagate through Links
Suppose we have a number of processes which are linked together, as in the following diagram. Process A is linked to B, and B is linked to C (the links are shown by the arrows). Now suppose process A fails: exit signals start to propagate through the links.

These exit signals eventually reach all the processes which are linked together.
The rule for propagating errors is: if a process that receives an exit signal caused by an error is not trapping exits, then the process dies and sends exit signals to all its linked processes.
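The rule can be checked with a small sketch, assuming a hypothetical module chain_demo. It links A to B and B to C, crashes A with the non-normal reason boom, and uses monitors (which observe a process without being linked to it) to collect the reason each process died with.

```erlang
-module(chain_demo).
-export([run/0]).

run() ->
    Parent = self(),
    C = spawn(fun wait/0),
    %% Each process acknowledges once its link is in place.
    B = spawn(fun() -> link(C), Parent ! linked, wait() end),
    receive linked -> ok end,
    A = spawn(fun() -> link(B), Parent ! linked, wait() end),
    receive linked -> ok end,
    Refs = [{Name, erlang:monitor(process, Pid)}
            || {Name, Pid} <- [{a, A}, {b, B}, {c, C}]],
    A ! crash,
    [{Name, await_down(Ref)} || {Name, Ref} <- Refs].

%% Idle until told to crash; boom is a non-normal exit reason.
wait() ->
    receive
        crash -> exit(boom)
    end.

%% Return the exit reason reported by the monitor.
await_down(Ref) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} -> Reason
    after 1000 ->
        still_alive
    end.
```

Since neither B nor C traps exits, the signal travels the whole chain and run/0 returns [{a, boom}, {b, boom}, {c, boom}].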
Processes can Trap Exit Signals
In the following diagram P1 is linked to P2 and P2 is linked to P3. An error occurs in P1 - the error propagates to P2. P2 traps the error and the error is not propagated to P3.

P2 has the following code:
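%% P2 has called process_flag(trap_exit, true); P1 and P3 are bound
%% to the pids of the processes it is linked to.
receive
    {'EXIT', P1, Why} ->
        %% ... handle exit signals ...
        ok;
    {P3, Msg} ->
        %% ... handle normal messages ...
        ok
end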
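A runnable sketch of the P1 - P2 - P3 scenario, assuming a hypothetical module trap_demo. P2 traps exits and forwards what it received to the caller; the result shows the reason P2 saw and whether P2 and P3 are still alive.

```erlang
-module(trap_demo).
-export([run/0]).

run() ->
    Parent = self(),
    P3 = spawn(fun() -> receive stop -> ok end end),
    P2 = spawn(fun() ->
                   process_flag(trap_exit, true),
                   link(P3),
                   Parent ! {p2_ready, self()},
                   p2_loop(Parent)
               end),
    receive {p2_ready, P2} -> ok end,
    P1 = spawn(fun() -> link(P2), receive crash -> exit(boom) end end),
    P1 ! crash,
    Result = receive
                 {p2_got, Why} ->
                     {Why, is_process_alive(P2), is_process_alive(P3)}
             after 1000 ->
                 timeout
             end,
    %% Tidy up: kill cannot be trapped, and P3 dies along with P2.
    exit(P2, kill),
    Result.

%% P2 receives exit signals as ordinary {'EXIT', From, Why} messages.
p2_loop(Parent) ->
    receive
        {'EXIT', _From, Why} ->
            Parent ! {p2_got, Why},
            p2_loop(Parent)
    end.
```

run/0 returns {boom, true, true}: the error stops at P2, so P3 is never affected.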
Complex Exit Signal Propagation
Suppose we have the following set of processes and links:

The process marked with a double ring is an error trapping process.

If an error occurs in any of A, B, or C, then all of these processes will die (through propagation of errors). Process D will be unaffected.
Exit Signal Propagation Semantics
- When a process terminates it sends an exit signal, either normal or non-normal, to the processes in its link set.
- A process which is not trapping exit signals (a normal process) dies if it receives a non-normal exit signal. When it dies it sends a non-normal exit signal to the processes in its link set.
- A process which is trapping exit signals converts all incoming exit signals to conventional messages which it can receive in a receive statement.
- Errors in BIFs or pattern matching errors automatically send exit signals to the link set of the process where the error occurred.
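The difference between normal and non-normal exit signals can be seen in a short sketch, assuming a hypothetical module semantics_demo. A non-trapping Watcher is linked to a process that exits with Reason; a monitor reports whether the Watcher died.

```erlang
-module(semantics_demo).
-export([run/1]).

run(Reason) ->
    Watcher = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Watcher),
    spawn(fun() -> link(Watcher), exit(Reason) end),
    receive
        {'DOWN', Ref, process, Watcher, Why} ->
            {died, Why}
    after 500 ->
        %% No 'DOWN' message: the exit signal was ignored.
        erlang:demonitor(Ref, [flush]),
        Watcher ! stop,
        survived
    end.
```

semantics_demo:run(normal) returns survived, while semantics_demo:run(oops) returns {died, oops}: the Watcher dies and passes on the same reason.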
Robust Systems can be made by Layering
By building a system in layers we can make it robust. Level1 traps and corrects errors occurring in Level2. Level2 traps and corrects errors occurring in the application level. In a well designed system we can arrange that application programmers will not have to write any error handling code, since all error handling is isolated to deeper levels in the system.
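A minimal two-layer sketch of this idea, assuming a hypothetical module layer_demo: a lower level that traps exits and corrects errors by restarting an application-level worker. The worker itself contains no error handling code at all. This is only an illustration of the layering principle, not a full supervision scheme.

```erlang
-module(layer_demo).
-export([start/0, ask/2]).

%% The lower level traps exits and owns the worker.
start() ->
    spawn(fun() ->
              process_flag(trap_exit, true),
              level1(spawn_link(fun worker/0))
          end).

level1(Worker) ->
    receive
        {'EXIT', Worker, _Why} ->
            %% The worker crashed: correct the error with a fresh worker.
            level1(spawn_link(fun worker/0));
        {request, From, X} ->
            Worker ! {From, X},
            level1(Worker)
    end.

%% Application level: no error handling; X = 0 simply crashes it.
worker() ->
    receive
        {From, X} ->
            From ! {reply, 100 div X},
            worker()
    end.

ask(Level1, X) ->
    Level1 ! {request, self(), X},
    receive
        {reply, R} -> {ok, R}
    after 500 ->
        no_reply
    end.
```

With L = layer_demo:start(), the call ask(L, 2) gives {ok, 50}; ask(L, 0) crashes the worker and gives no_reply; and a later ask(L, 4) gives {ok, 25} from the restarted worker.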

Primitives for Exit Signal Handling
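- link(Pid) - Sets a bi-directional link between the current process and the process Pid.
- unlink(Pid) - Removes the link, if any, between the current process and the process Pid.
- process_flag(trap_exit, true) - Makes the current process convert exit signals to exit messages; these messages can then be received in a normal receive statement.
- exit(Reason) - Terminates the process and generates an exit signal where the process termination information is Reason.
- exit(Pid, Reason) - Sends an exit signal with termination information Reason to the process Pid.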
The receive ... end construct attempts to remove messages from the mailbox of the current process. Exit signals which arrive at a process either cause the process to crash (if the process is not trapping exit signals) or are treated as normal messages and placed in the process mailbox (if the process is trapping exit signals). Exit signals are sent implicitly (as a result of evaluating a BIF with incorrect arguments) or explicitly (using exit(Pid, Reason) or exit(Reason)).
If Reason is the atom normal, the receiving process ignores the signal (if it is not trapping exits). When a process terminates without an error, it sends normal exit signals to all linked processes. Don't say you didn't ask!
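The primitives can be combined in a sketch, assuming a hypothetical module exit_demo. It sends exit(Pid, Reason) to one process that traps exits and to one that does not, and reports what happened to each.

```erlang
-module(exit_demo).
-export([run/1]).

run(Reason) ->
    Parent = self(),
    Trapper = spawn(fun() ->
                        process_flag(trap_exit, true),
                        Parent ! ready,
                        %% The exit signal arrives as an ordinary message.
                        receive {'EXIT', _From, R} -> Parent ! {trapped, R} end
                    end),
    receive ready -> ok end,
    Plain = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Plain),
    exit(Trapper, Reason),
    exit(Plain, Reason),
    Trapped = receive {trapped, T} -> T after 500 -> nothing end,
    PlainFate = receive
                    {'DOWN', Ref, process, Plain, Why} -> {died, Why}
                after 500 ->
                    %% The signal was ignored: Plain is still running.
                    erlang:demonitor(Ref, [flush]),
                    Plain ! stop,
                    alive
                end,
    {Trapped, PlainFate}.
```

exit_demo:run(normal) returns {normal, alive}, whereas exit_demo:run(shutdown) returns {shutdown, {died, shutdown}}. In both cases the trapping process sees the reason as a message.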
A Robust Server
The following server assumes that a client process will send an alloc message to allocate a resource and then send a release message to deallocate the resource. This is unreliable: what happens if the client crashes before it sends the release message?
top(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            Allocated1 = delete({Resource, Pid}, Allocated),
            top([Resource|Free], Allocated1)
    end.

top_alloc([], Allocated, Pid) ->
    Pid ! no,
    top([], Allocated);
top_alloc([Resource|Free], Allocated, Pid) ->
    Pid ! {yes, Resource},
    top(Free, [{Resource, Pid}|Allocated]).
This is the top loop of an allocator with no error recovery. Free is a list of unreserved resources. Allocated is a list of pairs {Resource, Pid}, showing which resource has been allocated to which process.
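The weakness can be demonstrated with a sketch, assuming a hypothetical module alloc_demo that wraps the allocator above together with delete/2 from the utilities. A client takes the only resource and crashes without releasing it, so the next request is refused.

```erlang
-module(alloc_demo).
-export([start/1, leak_demo/0]).

start(Resources) ->
    spawn(fun() -> top(Resources, []) end).

%% The allocator from the text, unchanged apart from layout.
top(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            Allocated1 = delete({Resource, Pid}, Allocated),
            top([Resource|Free], Allocated1)
    end.

top_alloc([], Allocated, Pid) ->
    Pid ! no,
    top([], Allocated);
top_alloc([Resource|Free], Allocated, Pid) ->
    Pid ! {yes, Resource},
    top(Free, [{Resource, Pid}|Allocated]).

delete(H, [H|T]) -> T;
delete(X, [H|T]) -> [H|delete(X, T)].

%% A client takes r1 and crashes before releasing it.
leak_demo() ->
    Server = start([r1]),
    Parent = self(),
    spawn(fun() ->
              Server ! {self(), alloc},
              receive {yes, Res} -> Parent ! {took, Res} end,
              exit(crash)
          end),
    receive {took, r1} -> ok end,
    Server ! {self(), alloc},
    receive
        no -> leaked;
        {yes, Other} -> {got, Other}
    end.
```

leak_demo/0 returns leaked: the server never learns that the client died, so r1 stays allocated.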
Allocator with Error Recovery
The following is a reliable server. If a client crashes after it has allocated a resource but before it has released it, the server automatically releases the resource. The server is linked to the client during the time interval when the resource is allocated; if an exit message comes from the client during this time, the resource is released. For that exit message to arrive at all, the server process must be trapping exits (process_flag(trap_exit, true)) before it enters top_recover.
top_recover_alloc([], Allocated, Pid) ->
    Pid ! no,
    top_recover([], Allocated);
top_recover_alloc([Resource|Free], Allocated, Pid) ->
    %% Link to the client so that we are told if it dies.
    Pid ! {yes, Resource},
    link(Pid),
    top_recover(Free, [{Resource, Pid}|Allocated]).

top_recover(Free, Allocated) ->
    receive
        {Pid, alloc} ->
            top_recover_alloc(Free, Allocated, Pid);
        {Pid, {release, Resource}} ->
            unlink(Pid),
            Allocated1 = delete({Resource, Pid}, Allocated),
            top_recover([Resource|Free], Allocated1);
        {'EXIT', Pid, _Reason} ->
            %% No need to unlink: the link disappears when the client dies.
            Resource = lookup(Pid, Allocated),
            Allocated1 = delete({Resource, Pid}, Allocated),
            top_recover([Resource|Free], Allocated1)
    end.
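Not done: multiple allocations to the same process. Before doing the unlink(Pid) we should check that the process has not allocated more than one resource; otherwise the server stops hearing about the client's death while the client still holds resources. Similarly, the 'EXIT' clause only frees the first resource that lookup/2 finds for the dead process.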
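One way to handle the missing case, sketched as a hypothetical module recover_alloc: unlink only when the client holds no other resources, and on an 'EXIT' message reclaim every resource the dead client held. It uses lists:delete/2, lists:keymember/3 and lists:partition/2 from the standard library in place of the hand-written utilities. The alloc/1, release/2 and demo/0 functions are illustrative additions.

```erlang
-module(recover_alloc).
-export([start/1, alloc/1, release/2, demo/0]).

%% The server traps exits so a client's death arrives as a message.
start(Resources) ->
    spawn(fun() -> process_flag(trap_exit, true), loop(Resources, []) end).

loop(Free, Allocated) ->
    receive
        {Pid, alloc} when Free =:= [] ->
            Pid ! no,
            loop(Free, Allocated);
        {Pid, alloc} ->
            [Resource | Free1] = Free,
            link(Pid),  % linking an already-linked process is a no-op
            Pid ! {yes, Resource},
            loop(Free1, [{Resource, Pid} | Allocated]);
        {Pid, {release, Resource}} ->
            case lists:member({Resource, Pid}, Allocated) of
                false ->
                    loop(Free, Allocated);  % not held by Pid: ignore
                true ->
                    Allocated1 = lists:delete({Resource, Pid}, Allocated),
                    %% Unlink only when the client holds nothing else.
                    case lists:keymember(Pid, 2, Allocated1) of
                        true -> ok;
                        false -> unlink(Pid)
                    end,
                    loop([Resource | Free], Allocated1)
            end;
        {'EXIT', Pid, _Reason} ->
            %% Reclaim every resource the dead client still held.
            {Held, Rest} = lists:partition(fun({_, P}) -> P =:= Pid end,
                                           Allocated),
            loop([R || {R, _} <- Held] ++ Free, Rest)
    end.

alloc(Server) ->
    Server ! {self(), alloc},
    receive
        no -> no;
        {yes, Resource} -> {yes, Resource}
    end.

release(Server, Resource) ->
    Server ! {self(), {release, Resource}},
    ok.

%% A client allocates two resources, releases one, then crashes.
demo() ->
    Server = start([r1, r2]),
    {Client, Ref} = spawn_monitor(fun() ->
                                      {yes, A} = alloc(Server),
                                      {yes, _B} = alloc(Server),
                                      ok = release(Server, A),
                                      exit(crash)
                                  end),
    receive {'DOWN', Ref, process, Client, crash} -> ok end,
    %% Messages from different senders are not ordered, so pause briefly
    %% to let the server handle the client's 'EXIT' message first.
    timer:sleep(50),
    {yes, X} = alloc(Server),
    {yes, Y} = alloc(Server),
    lists:sort([X, Y]).
```

demo/0 returns [r1, r2]. The client stayed linked after its first release because it still held a second resource, so that resource was reclaimed when the client crashed.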
Allocator Utilities
delete(H, [H|T]) ->
    T;
delete(X, [H|T]) ->
    [H|delete(X, T)].

lookup(Pid, [{Resource, Pid}|_]) ->
    Resource;
lookup(Pid, [_|Allocated]) ->
    lookup(Pid, Allocated).
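The same utilities can be sketched with standard library calls, assuming a hypothetical module alloc_utils. One design difference matters here. The hand-written versions have no clause for the empty list, so they crash (function_clause) when the element or Pid is absent. By contrast, lists:delete/2 returns the list unchanged, and this lookup/2 returns the atom none.

```erlang
-module(alloc_utils).
-export([delete/2, lookup/2]).

%% Remove the first occurrence of H; a missing H leaves the list unchanged.
delete(H, List) ->
    lists:delete(H, List).

%% Find the resource allocated to Pid by matching the second tuple element.
lookup(Pid, Allocated) ->
    case lists:keyfind(Pid, 2, Allocated) of
        {Resource, Pid} -> Resource;
        false -> none
    end.
```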
