The Type System
Static type systems are useful for documentation, static verification of code, and optimization. It is common practice in stack languages to document the stack effects of each instructionwhat type of values are removed from the stack (the consumption), and what type of values are placed on the stack (the production). The Cat type system allows type annotations (aka "type signatures" or "stack diagrams") that express the effect of an expression in terms of the consumption and production. If a type annotation is omitted, Cat is able to infer a type automatically using a variant of the Hindley-Milner algorithm (research.microsoft.com/Users/luca /Papers/BasicTypechecking.pdf). Example 5 shows some type signatures. From the Cat interpreter, the metacommand #t gives you the type of any function on the stack (metacommands are commands intended for the interpreter and are not part of the language; by convention, they are affixed with a "#" character).
Type variables are denoted with a leading apostrophe ('). Type variable names starting with lowercase letters represent individual types (scalar type variables). Type variable names starting with uppercase letters represent type vectors. A type vector is zero or more types on the stack. A type vector can be left unbound in the type signature of an expression, but ultimately has to be bound to a concrete sequence of types for an expression to be executable. ([apply] apply is not a valid program because we cannot resolve all the type variables.)
The arrow is used to differentiate function types with side effects ('A ~>'B) from those without ('A->'B). The rule for determining whether a function has a side effect is simply whether it uses a function with a side effect.
42 : ( -> int) + : (int int -> int) dup : ('a -> 'a 'a) pop : ('a -> ) apply : ('A ('A -> 'B) -> 'B) if : ('A bool ('A -> 'B) ('A -> 'B) -> 'B) [1 +] : ( -> (int -> int)) quote : ('a -> ( -> 'a))
In intermediate languages, it is useful to be able to verify certain properties of code statically. The most important properties are:
- Whether the inputs of the correct type are available when a function is called.
- Whether an expression can underflow the stack or not.
To guarantee proper verification of these properties, intermediate languages with stack-based semantics (such as the JVML) usually have a requirement that conditional execution cannot lead to different stack configurations (the types of values on the stack have to agree). The same principle applies to Cat, so the following is illegal:
is_monday ["I hate Mondays"] [42] if
In this case, depending on whether it is Monday, you would have either a string or integer on the top of the stack. This violates the type of the if function, which requires that both arguments ["I hate Mondays"] and [42] yield the same vector of types (denoted by 'B).
A novel feature of the Cat type system is that all functions are row polymorphic. Each function takes an implicit type-vector variable (called a "row variable") as an argument representing the rest of the stack. The row variable is returned implicitly along with the rest of the results. Another nice feature of the Cat type system is that you can perform partial type inference and compute the type of a polymorphic expression without context. This feature is not present in the Hindley-Milner type inference algorithms.