Wednesday, February 19, 2014

Puppet Internals - Puppet Type System Ruby API

Puppet Type System Ruby API

One question I got several times about the Puppet Type System is how it can be used from Ruby. So, in this post I am going to show just that - how the type system can be put to good use from within functions and in types and providers. This post is all about the Puppet Type System's Ruby API.

Background

The Puppet Type System was first introduced in Puppet 3.4. The API and functionality has been extended since then, and this blog post describes what is available in Puppet 3.5. The simple, straight forward examples should also work in Puppet 3.4, but many of the more advanced types where implemented in 3.5.

The Type System is implemented using the rgen gem - so if you want to play with code that uses the type system you should have that installed, and turn on --parser future to get all the things needed loaded and ready.

The Type System Implementation

At the core, the type system is built on a Type Model. This is what defines the classes that describe a Type. This model is anemic (using a term I introduced in an earlier post) and the model is accompanied by two implementations - the TypeFactory, which is used to construct types, and TypeCalculator which performs computations involving type (is this object an instance of this type, are these types compatible, etc.).

The Type Factory

While it is possible to interact directly with the Type Model, it far more practical to do so via the TypeFactory. One important thing to remember is that almost all of the types are parameterized, and thus quite unique - the Integer type can describe a range, the String type can express min and max length, etc. Thus, while there will be many instances describing a type flowing around in the system that describe the same kind of thing, the actual type instances are individual objects. In modeling terms, the parameters of a Type are contained in their parent type Just like a specific wheel is mounted on one specific car (or no car) at any given point in time is a specific Type instance associated with a parent type (or no type). This means that whenever we want to use a type we must have a fresh instance that is not already contained in some other type. (This may change in the future, but complicates the modeling).

With that bit of theory taken care of, I can move on to showing examples.

Creating an instance of Integer:

int_t = Puppet::Pops::Types::TypeFactory.integer()

(From this point forward, I am going to simplify the examples by assuming that FACTORY is a reference to Puppet::Pops::Types::TypeFactory.)

And, just to complete the example, to test if an object is an instance of that type we just created:

Puppet::Pops::Types::TypeCalculator.instance?(int_t, 3)        # true
Puppet::Pops::Types::TypeCalculator.instance?(int_t, 'hello')  # false

Factory methods

method description
integer creates an integer type range from -Infinity to +Infinity
range(from, to) creates an integer type range with given from/to, where :default denotes Infinity
float creates a float type range from -Infinity to + Infinity
float_range(from, to) creates a float type range with given from/to, where :default denotes Infinity
string creates a string type with size from 0 to Infinity
enum(*strings) creates an enumeration type from the given set of strings
pattern(patterns) creates an enumeration type based on regular expressions
regexp(pattern=nil) creates a regexp type, optionally parameterized with a pattern
boolean creates a boolean type
scalar creates an abstract scalar type
object creates an abstract object type
numeric creates an abstract numeric type
array_of(o) creates an array type parameterized by the given argument, its size is 0 to Infinity
array_of_data creates an array parameterized by the abstract type Data, its size is 0 to Infinity
hash_of(value, key_scalar) creates a hash parameterized by the given arguments for value and key, the default is a scalar key, its size is 0 to Infinity
hash_of_data creates a hash parameterized with scalar key and Data value.
collection creates a collection abstract type of size 0 to Infinity
data creates the abstract Data type
resource(type_name=nil, title=nil) creates a resource type, optionally parameterized with resource type name, and title
host_class(class_name=nil) creates a host class resource type, optionally parameterized with a class name
catalog_entry creates the abstract CatalogEntry type
optional(t) creates a type that represents the given type t or undef
variant(*types) creates a type that represents 'one of' the given types
struct(type_hash) creates a Struct type that fully qualifies a hash
tuple(*types) creates a Type that fully qualifies an array
ruby(o) creates a ruby type from the given object or class
ruby_type(class_name) creates a ruby type representing the given Ruby class name
undef creates a type representing undef values
type_type(t=nil) creates a meta type, optionally parameterized with the type this is the meta type for

In addition to these type creation methods, the constrain_size method allows changing the size constraint (for types that supports this; String, Array, Hash, and the occurrence of the last type in the Tuple type's sequence. The type_of(o) method is used by the other methods when converting the given argument(s) to type. The label(t) method produces a string representation of the type.

method description
constrain_size(t, from, to) constrains the size of t, where from and to are the same as for an integer range
type_of(o) is produces a type for o using the rules shown below
label(t) produces a string representation of the type.

The type_of(o) method allows flexible specification of type parameters in Ruby, as specified by the following table

o is_a then
Class if the Ruby class corresponds to one of the Puppet types, e.g. String, Integer, and then that type is returned, else a Puppet Ruby type.
PAbstractType used as is (i.e. Puppet::Pops::Types::PAbstractType)
String the string is the classname of a Ruby class - the corresponding type is produced
any other the type is inferred using the type calculator (see below)

Example:

FACTORY.tuple(String, Integer)
FACTORY.struct({'a' => String, 'b' => Integer})

# i.e. instead of having to do this
FACTORY.tuple(FACTORY.string, FACTORY.integer)

In summary - the type factory creates types using convenient transformation from Ruby types to Puppet types. (For details about each method, see the yardoc for the TypeFactory). For more information about what the different types represents, see the earlier posts in this series about the Puppet Type System

The Type Calculator

The other major part of the Puppet Type implementation is the class Puppet::Pops::Types::TypeCalculator (from this point on referred to as CALCULATOR in examples). The type calculator is the type inference system.

The type calculator has the same set of methods available both as instance and class methods. If a long series of operations are to be performed, it is faster to call the singleton method to get an instance, and then use what is returned for multiple operations.

The set of methods are:

method description
assignable?(t, t2) is t the same, or a more general type than t2
instance?(t, o) is o an instance of the type t
equals(t, t2) is the type t equal to the type t2
enumerable(t) if the type is Enumerable an suitable Enumerator is produced, else nil (currently only for integer range)
infer(o) infers the type of o and produces a generalized type (see below)
infer_set(o) infers the type of o and produces a value dependent type set (see below)
singleton returns the single instance of the TypeCalculator
string(t) produces a s string representation of the given type. This is the same as calling to_s on a type instance

In addition to these methods, there are several utility methods, mostly for use by the type calculator itself - those are not considered to be API.

The set of methods makes it possible to perform all of the operations exposed in the Puppet Programming Language.

The use of most of these methods should be easy to grasp. The assignable? method performs a type check based on two types, and instance? on a type and an instance.

The infer method infers a generalization - e.g. given [1, 3.14] infers Array[Numeric], whereas the infer_set method infers a set of value dependent types, e.g. given [1, 3,14] it will produce an Array[Variant[Integer[1,1], Float[3.14, 3.14]]] where each value in the array is encoded as a unique type. This is what allows more detailed type-questions to be answered.

The Type Parser

The third and final component in the Type System is the TypeParser. It produces a type given its string representation - e.g. if you execute the example below, you get back the same type:

a_type = FACTORY.array_of(String)
Puppet::Pops::Types::TypeParser.new.parse(a_type.to_s)

This allows you to store / pass type information in String form in a Resource parameter and convert it back to a type again in a Provider. Same thing for facts, settings, or when you get data from a source that cannot produce type instances.

Other Operations on Type

The types themselves support equality (==, eql?), and they can be used as hash keys - two types that are equal hash to the same hash-key. You can also copy a type (and all of its parameters) using the copy method (which is important as you need to consider the containment rule).

The Regexp type has a method called regexp that returns a Ruby Regexp from the puppet type's pattern.

The Struct type has a method to obtain the name/type hash as a Ruby Hash.

All types are considered to be immutable once they have been fully constructed, but this is not (currently) enforced.

Typical Usage

Say we want to check the types of arguments given to a puppet function. In this case we can perform all the type checking in one go even for complex types (just calling instance?).

accepted_t = FACTORY.tuple(String, Integer)
unless CALCULATOR.instance?(accepted_t, arguments)
  raise ParseError, "Argument type mismatch. Expected: " + 
     accepted_t.map(&:to_s).join(', ') +
     ". Got: " +
     CALCULATOR.infer_set(arguments).map(&to_s).join(', ')

That is, if we accept two arguments of String and Integer type respectively this will print out what we expected and what we got. (Here I did go through the gymnastics of turning the types back to string even if this could have been written out directly as just "String, Integer").

If the function supports multiple signatures, we can obtain the type of the given argument by calling infer_set, and then test assignability of that against the signatures - this is faster than performing multiple instance? calls, as each call will need to infer the type of the given arguments from scratch.

given_t = CALCULATOR.infer_set(arguments)
case
when CALCULATOR.assignable?(signature_1_t, given_t)
  # process signature 1
when CALCULATOR.assignable?(signature_2_t, given_t)
  # process signature 2
else
  # error, not a supported signature
end

Now the gymnastics from before makes more sense since we may want to print the various signatures and state that the given did not match any of them.

Future Work

There is a new Function API in the works that will make use of the Type System, and it will also contain providing good error messages when there is a type mismatch. Until then, manual checking can be done as shown above.

There may be changes to how the containment of parameters work - right now the same type instance may have to be repeated multiple times, and it would be beneficial to be able to declare them by name and then reference rather than contain them.

Thursday, February 13, 2014

Puppet Internals - Location Processing

Rationale for Detailed Position Information

In this post about Puppet Internals I am going to cover how source location information is handled by the future lexer and parser. This also shows a concrete case of using the adapter pattern introduced in Puppet Internals - the Adapter Pattern.

Rationale for Detailed Position Information

It is important to have detailed position information when errors occur as the user programming in the Puppet Programming Language would otherwise have to guess about where in the source text the problem is.

Positioned information is not only wanted for syntax errors and other errors immediately detected by the parser, this information is perhaps even more valuable when errors occur later and it is important to know where a particular value originated.

To date, this output has consisted of file and line only. While this is enough in most situations there are also many cases where line alone is not enough (say using an operator like '+' the wrong way, and there are five '+' on the same line), just knowing that there is something wrong with one of the '+' on line 3 is not that great. What we want is to also know the character position on the line.

Implementation Background

In the 3x Puppet Language implementation, the positioning information was limited to only contain the name of the file and the line number. In many cases this information is wrong (or rather, quite imprecise) as it relies on the lexer's current lexing position (i.e. what it has consumed from the source text) as opposed to the position of the individual tokens. The lexer may very well have advanced past the particular point where the problem is when the problem is reported by the parser.

The first implantation of the future parser (available in Puppet 3.2) contained detailed positioning (file, line and position on line). It was calculated at the same time as the lexer did its regular processing - i.e intermixed with production of the tokens that it feeds to the parser.

This proved to be complicated as the lexer needs to look ahead and thus either maintain a complex state where it is, the location of points it will potentially backtrack to, etc. This also proved to be a performance hog. Part of the performance problem was the need to compute the length of an entire expression to enable future output of the corresponding source text.

The second implementation was much more performant. The idea was that detailed position information is really only needed when there is an error and that only a fraction of the total amount of tokens have positions that end up being recorded in the catalog). Thus, the position information could be computed lazily, only recording the most basic information; the offset and length of the significant parts of the source text. It was then possible to get rid of the complex state and intermixed position calculations and instead more efficiently scan the entire input for line breaks and building an index.

This implementation was introduced in the future parser in Puppet 3.4. Unfortunately, the idea that positioning is really only needed when there is an error was wrong in this case since the 3.4 implementation has to transform all parsed syntax tree into the 3.x AST equivalent tree, and there all position information is still computed up front and stored inside each AST node (even if never used). (Bummer).

Now, in Puppet 3.5, the new evaluator is responsible for evaluating most expressions and it will only trigger the position computations when it is required; when there is an error or when the expressions must be evaluated by the 3x evaluator (catalog operations fall in this category).

While working on the performance of the lexer / parser it was also observed that it would be beneficial to serialize the constructed syntax tree as deserialization of a tree is much faster in general than again parsing the same source text. In order to be able to support this, it was obvious that the detailed information needed to be included in the model (and not only computed and kept in structures in memory). Thus, in Puppet 3.5's future parser's AST model you will find a Program element that holds the entire source text, the line offset index, and the parse tree result. (This has other benefits as well - even a broken program can be returned - even if it can not be evaluated, it makes it easier to report what is wrong).

Requirements on Positioning Information

  • Be fast to compute when parsing
  • Serializable / De-serializeble
  • Not be too expensive to compute when details are required
  • Once computed it should continue to be available
  • Provide file, line, and position on line
  • Provide corresponding source text given an expression
  • Handle source strings in UTF-8 (i.e. multibyte encoding)

The Implementation in Puppet 3.5

A mix of techniques were used to meet the requirements.

The central concept is that of a Locator; an object that knows about the source text string; where all the lines start, and given an offset into the string can answer which line it is on, and the position on that line. This means that only the offset and length of tokens needs to be recorded and stored in the syntax nodes.

We could store a reference to the locator in every node, but that requires one extra slot per node, and would need to be handled in de-serialization (i.e. setting thousands of references when loading a single model). The offset and length are in contrast regular Integers and are fast to serialize/de-serialize.

The parser always produces an instance of Program, and it contains both the source text and the required line index. With these two, it can reconstruct the Locator (that was originally created by the lexer / parser when parsing the source). The Program is only a data container, it does not do any offset computation - that is always handled by an instance of Locator.

Here is a diagram that shows the relationship between the Program and the Locator. It also shows how individual nodes (Positioned) and their corresponding computational / cache of positioning (a SourcePosAdapter) relate to Program and Locator. Read on to learn how those work.

Positioned

All nodes in the constructed syntax tree model inherit from Positioned (except Program which is always the entire source). Being Positioned means that there is an offset and a length (but nothing more).

If we want to know the line number and the position on the line we need to find the Locator since it knows how to compute the information. We could have implemented that in the Positioned object itself, but it would clutter its implementation and it would be difficult to change the strategy for computing. This is where the SourcePosAdapter comes in.

The SourcePosAdapter

Being an Adapter (see earlier post (1) for details) means that it is bi-directionally associated with a particular object without the object knowing about it. The responsibility of the managing the relationship is entirely on the adapter side.

An Positioned object is adapted to a SourcePosAdapter by:

adapter = SourcePosAdapter.adapt(the_object)

The same instance of the adapter is always returned (it is created the first time). It is possible to ask if an object is adapted (and get the adapter) by:

adapter = SourcePosAdapter.get(the_object)

Once a SourcePosAdapter is obtained, it can answer all the questions about position. When it is created it performs a minimum of computation. When asked for something that requires a Locator is searches for the closest object that has knowledge of it and then caches this information. When this takes place for the first time, the search always goes up to the Program (root) node. On subsequent searches a parent node with a SourcePosAdapter may be encountered and the search can stop there.

The resulting structure is what is depicted in the graph.

It is worth noting that all model objects that are contained knows about their container via the somewhat mysterious method eContainer (how that works in more detail and what the difference is between a containment, and a reference is the topic for another blog post).

Example

Say we have something simple like this:

 $a = 1 + 2 * 3

The lexer produces a sequence of tokens:

 VARIABLE, 'a', offset = 0,  length = 2
 EQUAL,    '=', offset = 3,  length = 1
 NUMBER,   '1', offset = 5,  length = 1
 PLUS,     '+', offset = 7,  length = 1
 NUMBER,   '2', offset = 9,  length = 1
 TIMES,    '*', offset = 11, length = 1
 NUMBER,   '3', offset = 13, length = 1

The Parser arranges them into a tree:

When the parser parsed the expressions, it did so by evaluation rules. Here is an excerpt from the grammar.

expression
  : ...
  | expression PLUS     expression   { result = val[0] + val[2]; loc result, val[1] }
  | expression MINUS    expression   { result = val[0] - val[2]; loc result, val[1] }
  | expression DIV      expression   { result = val[0] / val[2]; loc result, val[1] }
  | expression TIMES    expression   { result = val[0] * val[2]; loc result, val[1] }
  | ...

While there is much to talk about how the grammar / parser works, this post focuses on handling of location, so the interesting part here are the calls to the loc method. It is called with the resulting model node (e.g. an ArithmeticExpression) and then one or two other nodes which may be model nodes or tokens produced by the lexer. All of the arithmetic expressions are located by their operator (loc is called with val[1] which is a reference to the operator i.e. '+', or '*' in the example).

Once the tree is built, since all of the nodes are Positioned it is possible to adapt them with a SourcePosAdapter to get the detailed computed information.

Output

The output when there is position information is simply line:pos where pos starts from 1 (the first character on the line).

Output of source excerpt (i.e. a snippet of source and a caret-pointer to the position) is not yet implemented. Maybe someone wants to take a stab at making an implementation?

UTF-8

And, lets talk about UTF-8, or rather about lexing multibyte strings.

The new implementation (in lexer2) handles multibyte source. It does this by recording the byte offsets and using a locator that is specific to single or multibyte runtime. This proved to be much faster than using the new mulitbyte support introduced in Ruby 1.9.3. You can look at the implementation in Puppet::Pops::Parser::Locator, where there are two different implementations.

Summary

This post shows both a concrete example of using the adapter pattern, as well as some high level examples about how the lexer gets its work done in a performant way.

In the Next Post

Well, to be honest, I don't really know what the next post should be about? What would you like to learn more about next - I am considering an overview of modeling as it is a fundamental concept, but there are many other topics to choose from - the implementation of the lexer, how the parser works, how error messages are generated and formatted, and then we have the evaluator to cover... - seems like this will be a very long series.

Monday, February 10, 2014

The Puppet Type System Blog Posts

The Puppet Type System

In Puppet 3.5's future parser there is a new type system that makes it much easier to write validation logic for parameters (and much more). I have written a series of blog posts about the new type system - and this post is just an index to the series.

It works best if they are read in the order they were published:

I will update this index blog post when there are new posts in the series. Also, if there are changes to the implementation, I will try to keep the blog posts updated.

There are also posts about the Puppet Type System's Ruby API. These posts are for those that contribute to Puppet itself or write plugins.

Adding Struct and Tuple to the Puppet Type System

Adding Struct and Tuple to the Puppet Type System

While working on ideas for a new API for Puppet Functions and Orchestration I found myself wanting a couple of new types. Given a Puppet Labs Hack-Day, and some spare hours during the weekend I am now happy to present two additions to the type system:

  • Struct - a hash with specified keys and type per key
  • Tuple - an array with specified sequence of types

These additional types will be available in the Puppet 3.5 release when using the --parser future option.

Struct Type

The Struct type fully specifies the content of a Hash. The type is parameterized with a hash where the keys must be non empty strings, and the values must be types.

Here is an example, where the hash must contain the keys mode and path, and mode must have a value that is one of the strings "read", "write", or "update", and the key path must have a String value that is at least 1 character in length.

Struct[{mode=>Enum[read, write, update], path=>String[1]}]

A Struct type is compatible with a Hash type both ways, given that the constraints they express are met. A Struct is a Collection (just like Hash), but its size is controlled by the specified named entries.

Struct supports optional values - this means that a matching hash may either have undef bound to a key, or that the key is missing. A hash that has keys not specified in the Struct will not match.

An unparameterized Struct matches all structs and all hashes.

Tuple Type

The Tuple type fully specifies the content of an Array. It is to Array what Struct is to Hash, with entries identified by their position instead of by name. There is also some flexibility allowed with a variable number of optional and trailing entries.

Tuple[T1, T2]                   # A tuple of exactly T1 and T2
Tuple[T1, T2, 1]                # A tuple with a variable number of T2 (>= 0)
Tuple[T1, T2, 1, 3]             # A tuple with a variable number of T2 (0-3 inclusive)
Tuple[T1, 5, 5]                 # A tuple with exactly 5 T1
Tuple[T1, 5, 10]                # A tuple 5 to 10 T1
Tuple[T1, T1, T2, 1, 3]         # A tuple of one T1, two T1, or two T1 followed by T3

All entries in the Tuple (except the optional size constraint min/max count) must be a type and denotes that there must be an occurrence of this type at this position. The tuple can be modified such that the min and max occurrences of the given types in the type sequence can be specified. The specification is made with one or two integer values or the keyword default. The min/max works the same way as for an Integer range. This way, if optional entries are wanted in the tuple the min is set to a value lower than the number of given types, and if the last type should repeat the max is given as a value higher than the number of given types. As an example, a size constraint entered as Tuple[T, 0, 1] means T occurs 0 or 1 time. If the max is unspecified, it defaults to infinity (which may also be spelled out with the keyword default).

 ["a", 1]     =~ Tuple[String, Integer]      # true
 ["a", 1,2,3] =~ Tuple[String, Integer, 1]   # true
 ["a", 1,2,3] =~ Tuple[String, Integer, 0]   # true
 ["a", 1,2,3] =~ Tuple[String, Integer, 0,2] # false
 ["a", 1,2,3] =~ Tuple[String, Integer, 4]   # true
 ["a", 1,2,3] =~ Tuple[String, Integer, 5]   # false

The Tuple type is a subtype of Collection. Its size is specified by the given sequence and the size constraint (which defaults to exactly the given sequence).

Summary

In this post you have seen the two new very useful types Struct, which fully qualifies a Hash, and Tuple which fully qualifies an Array.

You find the index page with all the posts in the series here

Friday, February 7, 2014

Puppet Internals - The Adapter Pattern

Puppet Internals - The Adapter Pattern

In this series of blog posts about the future parser/evaluator the turn has come to adapters, a technique used to implement a strategy pattern that attaches an instance of an Adapter to an object and where the original object's logic is kept unaware of this attachment. Part of the rationale for this is described in the previous post called "Separation of Concerns". While the adapter pattern also is a functional composition pattern its main use is to allow composition of state.

Using an Adapter

Let's start with an example creating and using a simple adapter.

The base implementation is found in the class Puppet::Pops::Adaptable::Adapter. Here is a simple example where a NickNameAdapter is defined and used to associate a nick name with any object.

# Defining the adapter
class NickNameAdapter < Puppet::Pops::Adaptable::Adapter
  attr_accessor :nick_name
end

# Using the adapter
d = Duck.new("Daphne")
NickNameAdapter.adapt(d).nick_name = "Daffy"
NickNameAdapter.get(d).nick_name
# => "Daffy"

Behind the scenes, the adapter will add one instance variable to the adapted object named after the adapter class. (Or put differently, each subclass of Adapter manages one instance of the adapter-class per adapted object). The naming ensures there are no clashes among the various kinds of adapters.

While it is possible to achieve almost the same kind of separation using Ruby techniques, the adapter pattern has one important difference when it comes to accidental coupling. The typical Ruby solution would either be to open the original class and include a module, or if sparse allocation is wanted to open/create the eigen class (a.k.a instance-class), and then add instance variables and / or logic there. While this Ruby construct keeps the source code free from mixing the concerns, the end result is an object that has the aggregated behavior. This creates two problems:

  • The names of attributes and methods may clash with some other concern - someone may later want to use a StarWarsNickName module that associates a nick name based on a Star Wars character. If this adapter also uses nick_name - they will clash.
  • Logic that is unaware of the extension may unintentionally invoke such operations - especially so if the extension is of a generic kind (e.g. using names like status, id, or, name). Such logic is both tedious and difficult to hunt down later when refactoring takes place.

Benefits of Using an Adapter

  • The storage for an adapter is allocated when needed. This is ideal when only some objects need to carry extra data. (Real example - associating on-demand cached information that is derived from the object's intrinsic attributes that is expensive to calculate).
  • The data and behavior captured in an adapter does not leak into the object itself (while it is possible to find the instance variable, the methods that operate on it are not found in the adapted object/class).
  • Does not require keeping a separate table of association between adapted instance and extra data. This is always problematic as it may cause memory leaks if objects are not managed in this table (they must be removed when they go out of scope).
    • An efficient table solution also requires that there is a unique key that allows a hash table to be used, or that the adapted object itself can work as a hash key. This may not always be the case.
  • Makes refactoring easier since logic that makes use of the extended functionality must do so via use of the specific adapter.
  • Should the logic ever be ported to another language then it is much easier to implement/use an adapter pattern then to try to add support for Ruby's open-and-extend concepts.

The Adapter API

The Adapter API is quite simple, and consists of:

  • adapt - creates adapter if needed, returns the adapter
  • adapt_new - creates a new adapter (removes any previous adapter of the same type)
  • get - returns the adapter or nil if not adapted
  • clear - completely removes the association of this adapter type

The following methods are intended to be defined / overridden by the implementor of an adapter if it has special needs.

  • create_adapter - if something more special than a standard new/initialize is needed
  • instance_var_name - if something special (like thread or context unique names) are needed
  • associate_adapter - associates an adapter with an object

adapt, adapt_new

The class method adapt produces an instance of the adapter class associated with the given object. If the object is not already adapted, a new adapter is created, otherwise the already associated adapter is returned.

MyAdapter.adapt(o)  

The adapt method optionally takes a block with one or two parameters. If one parameter is used, this is given the value of the adapter, and if a second is used, it is given the adapted object.

NickNameAdapter.adapt(o) { |a| a.nick_name = "Buddy!" }
NickNameAdapter.adapt(o) { |a, o| a.nick_name = "You're the best #{o.class.name} I ever met."}

The adapt_new works the same way, but it always produces a new instance of the adapter and drops the association with any previously associated adapter of the same type.

get

Simply returns the associated adapter if the given object has been adapted, and nil otherwise. This is a convenient way to both test if an object is adapted, and get the adapter at the same time.

MyAdapter.get(o)  # nil, or an adapter of type MyAdapter

clear

Simply removes the adapter from the given object. (Technically done by removing the instance variable). This is a no-op if the given object was not adapted.

MyAdapter.clear(o)

create_adapter

This method can be overridden if something more elaborate is needed in a subclass of Adapter when an instance of the adapter is created. The method is given the adapted object if the adapter wants to configure itself from the object it is associated with. The default create_adapter simply calls new without arguments.

associate_adapter

This method associates an adapter with the given object. It may be overridden if there are special needs - this method must bind the adapter to the expected instance variable, but a special implementation may also do other things.

instance_var_name

As shown earlier, the name of the created instance variable is based on the name of the class. This method may be overridden if there is the need to use the same kind of adapter to manage multiple states of the same kind, say to include the current thread. (An alternative solution is for the adapter itself to manage such a thread to data mapping).

Adapters and Models

I have yet to write a blog post about the ecore modeling technology used in Puppet (RGen), but here is a small tidbit that relates to adapters.

Since the runtime support for objects that are modeled makes it possible to navigate a complete structure (objects know if they are contained, in which object they are contained, and the role that it plays - e.g. "I am the left front wheel of the Car RQT-324"). This also goes the other way - it is possible to generically find "all contained objects", which for a Car could mean, the four wheels, the engine, the seats etc. and the logic we are writing can find these without knowing explicitly about the exact concepts (e.g. "wheels").

This makes it very easy to search for an adapter. As an example, we may have a TestReport adapter that contains information about the condition of a part of a Car. Further, if a part also can contain parts recursively and we are operating on one part somewhere in this hierarchy, we may want to know if there is a report associated with either the given part or a "parent part".

There is a small utility method for this in Puppet::Pops::Utils called find_adapter. It can be used like this:

part = ... # a reference to a car part
adapter = Puppet::Pops::Utils.find_adapter(part, TestReportAdapter)
if adapter
   # a report was associated with the part or one of its parent parts
   # ...
end

This only works with modeled classes (such as the AST model used in the future parser) since Ruby classes in general does not know anything about containment.

In a future blog post I will cover all the various ways a model can be navigated by making use of the ecore meta-model that is associated with modeled objects. (As an example, say how to write a method that finds any/all contained object that has an associated adapter).

When not to use an Adapter

There are scenarios where an Adapter is not the best choice. They are slightly slower than using directly implemented support for a concern - so in performance critical parts purity in design may need to be sacrificed.

Naturally, if something is really an intrinsic part of some other object, then it should probably be implemented directly, or via explicit composition - say when the system we are writing is all about producing test reports for car parts.

Summary

The adapter pattern is a mechanism that provides separation of both data and functional concerns. It is ideal for extending functionality with additional aspects where it is undesirable to mix the concern into targeted classes. It is also good for dealing with sparsely extended populations (where only some / few out of all instances needs to be extended).

In the Puppet runtime for the "future parser/evaluator" adapters are currently only used for caching of computed source locations, but is also intended to be used for things like association of validation results (i.e. errors), loader (which loader loaded the code), and perhaps documentation. There are many additional constructs in the Puppet runtime that would benefit from the use of the adapter pattern and you may be seeing them in additional places as more code is refactored to reduce coupling and increase cohesion in the Puppet code base.

In the Next Post

In the next post, I will cover how the SourcePosAdapter is used to cache detailed source position information associated with individual expressions parsed from puppet source code.

Tuesday, February 4, 2014

The Type Hierarchy and Scalars

The Type Hierarchy

In the previous post about the Puppet 3.5 experimental feature Puppet Types I covered the rationale behind having a type system, and exemplified by using a handful of types such as Integer, and Array to achieve simple tasks.

This time, I am going to present an overview of all the types in the type system and present the most fundamental type - the Scalar in more detail.

The Type Hierarchy

 Any
   |- Scalar
   |  |- Numeric
   |  |  |- Integer[from, to]
   |  |  |  |- (Integer with range inside another Integer)
   |  |  |
   |  |  |- Float[from, to]
   |  |  |  |- (Float with range inside another Float)
   |  |
   |  |- String
   |  |  |- Enum[*strings]
   |  |  |- Pattern[*patterns]
   |  |
   |  |- Boolean
   |  |- Regexp
   |
   |- Collection[size_from, size_to]
   |  |- Array[element_type, size_from, size_to]
   |  |  |- Tuple[*T, size_from, size_to]
   |  |
   |  |- Hash[key_type, value_type, size_from, size_to]
   |     |- Struct[ key_type_value_type_hash ]
   |
   |- Data
   |  |- Scalar
   |  |- Array[Data]
   |  |- Hash[Scalar, Data]
   |  |- Undef
   |
   |- CatalogEntry
   |  |- Resource[resource_type_name, title]
   |  |- Class[class_name]
   |
   |- Variant[*types]
   |- Optional[type]
   |- Default
   |- Undef
   |- NotUndef[type]
   |
   |- Type[type]
   |
   |- Runtime['ruby', class_name]

Meet the Scalars

The types under Scalar should not come as a surprise; they represent something of single value such as String, Integer, Float, Boolean and Regexp (regular expression).

 'hello' =~ String  # true
 '123'   =~ String  # true
 '123'   =~ Numeric # false
 123     =~ Numeric # true
 1       =~ Float   # false
 1.0     =~ Float   # true
 1       =~ Integer # true
 1.0     =~ Integer # false
 /.*/    =~ Regexp  # true
 '.*'    =~ Regexp  # false
 true    =~ Boolean # true
 false   =~ Boolean # true
 'true'  =~ Boolean # false

The Scalar type on its own is quite useful if you need to check that you have received a single value as a parameter (i.e. not an array or hash or something more complex).

Integer and Float Ranges

As you can see in the type hierarchy, the Integer and Float types can be parameterized with an optional range from - to. This range is inclusive of the from/to values and it may be given in the reverse order where from > to (it is still the same range). If to and from are equal the range is a single value. It is also possible to use a literal default which in the first (from) position means -Infinity, and +Infinity in the second (to) position. If only one value is given it is taken to mean from the given value to +Infinity.

 1     =~ Integer[0,10]       # true
 -1    =~ Integer[0,10]       # false
 100   =~ Integer[0]          # true
 100   =~ Integer[0, default] # true

Float ranges work the same way, range values may be given as integers or float values.

 1.0     =~ Float[0,10]       # true
 -1.0    =~ Float[0,10]       # false
 100.0   =~ Float[0]          # true
 100.0   =~ Float[0, default] # true

Iterating over Integer Range

An Integer range that is capped (i.e. not open ended at -Infinity or +Infinity) can be used to iterate with any of the iterative functions. It behaves as if it were an array with the values in order from from to to.

 Integer[1,10].each |$val| { notice $val }
 # notice 1
 # notice 2
 # ...
 # notice 10

For more details, see the respective iterative function how it works with respect to getting the index as well as the value, and what it returns.

(And no, it is not supported to iterate over a Float range).

As you will see later, an Integer range is also very useful when it comes to describing type constraints on a collection (array and hash) as it makes it easy to ensure that a given array is not empty.

String

A String unsurprisingly represents a text string. It can be parameterized with a length range that is specified the same way as the range parameters for an Integer range.

 'abc'   =~ String      # true
 'abc'   =~ String[1]   # true, it is >= 1 in length
 'abc'   =~ String[1,4] # true, it is between 1 and 4 in length
 ''      =~ String[1,4] # false
 'abcde' =~ String[1,4] # false, longer than 4

Strings Matching Patterns / Enums

In the Puppet Type System, the types Pattern and Enum are subtypes of String. They are used to describe strings with certain characteristics; those that match. In the case of Enum, matching is done against one of its string values, and for Pattern, one of its regular expressions.

Let's start with Enum which is useful in situations where there are certain "keywords" (i.e. allowed values) - say used as keys in a Hash.

 $colors = [blue, red, green, yellow, white, black]

 'blue'      =~ Enum[$colors]   # true
 'red'       =~ Enum[$colors]   # true
 'pink'      =~ Enum[$colors]   # false
 'deep-blue' =~ Enum[$colors]   # false

As you can see from the example above, the matching is strict, only strings that is equal in value to one of the values listed in the Enum match.

The Pattern type in contrast matches against a regular expression. When constructing a Pattern type, you can give it strings or regular expressions (mix as you like).

 $colors = [blue, red, green, /yellow/, /^white$/, black]

 'blue'       =~ Pattern[$colors]   # true
 'red'        =~ Pattern[$colors]   # true
 'pink'       =~ Pattern[$colors]   # false
 'deep-blue'  =~ Enum[$colors]      # true 
 'ocean-blue' =~ Enum[$colors]      # true
 'blueish'    =~ Enum[$colors]      # true
 'whiteish'   =~ Enum[$colors]      # false (the regexp was anchored with ^$)
 'sky-color'  =~ Enum[$colors]      # false (it is a type system, what did you expect)

Regexp Type

The Regexp type represents a Regular Expression. It matches all regular expressions when not parameterized. It can also be parameterized with a regular expression in string or literal regular expression form. When parameterized with a specific regular expression it works as if it was a regular expression.

 /.*/       =~ Regexp           # true, it is a regular expression
 /.*/       =~ Regexp[/.*/]     # runtime error, left is not a string
 'blueish'  =~ Regexp['blue']   # true

This is mostly valuable since it allows constructing a regular expression using string interpolation - i.e. you can do something like this:

 $prefix = 'http://myorg.com'
 $var =~ Regexp["${prefix}/index.html"]

Strictness

The type system is strict; you cannot trick it into accepting strings containing digits as being numeric, nor is an integer mistaken for a string. Empty strings are just that, a string with length zero. Oh, and undef most certainly is undefined and not matching any of the other types (unless you want to - but that is the topic of a later post).

In the Next Post

In the next post I plan to cover the rest of the general types (the Collection types Array and Hash, Data, Optional, and Variant. And Oh, I do have to talk about Undef then. Maybe it becomes two posts...

Puppet Internals - Polymorphic Dispatch

The Basic Problem

In this series of blog posts about the future parser/evaluator the turn has come to polymorphic dispatch, a technique used to implement a strategy pattern where the logic is kept separate from the data/content it is operating on. The rationale for this is described in the previous post on "Separation of Concerns".

The Basic Problem

The basic problem when implementing a Strategy pattern is that we essentially want to write strategy methods as if they were part of the classes it operates on. In Ruby this is easily done (if we can accept static late binding) since we can:

  • include a module in a class
  • reopened a class and add/modify methods

We can however not easily support several variants of the same strategy, and we have to make sure that what we apply does not introduce methods with the same name since they would override each other's contributions to the class otherwise.

Essentially, dynamically adding/modifying a class' behavior is Monkey Patching, which should be reserved as a last resort to temporarily fix problems in 3d party logic.

Polymorphic Dispatch

In the Puppet 3.x future parser/evaluator, polymorphic dispatch is implemented using a Visitor Pattern and I am going to jump straight into examples showing how this implementation is used.

A Simple Label Provider

Let's say we want to implement a LabelProvider strategy that produces a string that describes what something is and that is suitable for inclusion in an error message. The typical alternative would be to just output the name of the class - while clear to implementors, users will however have a hard time understanding references to implementation classes.

There are several options available when using a visitor and we are going to start with the simplest form of visitor and add an implementation that just outputs "Object (label provider is incomplete)" for all kinds of objects.

class LabelProvider

  def initialize
    @label_visitor = Puppet::Pops::Visitor.new(self, "label")
  end

  def label(o)
    @label_visitor.visit(o)
  end

  protected

  def label_Object(o)
    "Object (label provider is incomplete)"
  end
end

In this example, we created a Visitor that is configured to dispatch calls to self, and to make calls to methods that start with "label" followed by an underscore, and the last part of the class name of the object it is told to operate on. If no such method is implemented, a search is made up the class hierarchy of the given object until a method is found. If no method is found an error is raised.

We then use the label provider to produce output:

provider = LabelProvider.new
puts provider.label(3)

We can now add methods as needed to create labels for all kinds of objects.

def label_ArithmeticExpression(o)
  "#{o.operator} Expression"   # i.e. '+ Expression', '- Expression' etc.
end

def label_Regexp(o)
  "Regular Expression"
end

I typically use the letter 'o' for the argument to mean 'the visited object' (which is guaranteed to be an instance of the class) as it requires less effort to identify the visited object in the logic if it has the same name in all of the polymorphic methods.

A more Efficient Label Provider

While the simple label provider in the previous example does the job perfectly well it is also the slowest. Each time we need a label provider, the visitor needs to be recreated and we are not fully benefiting from its ability to cache the resolution from visited object to called method as this cache is lost when the label provider goes out of scope. We can change that by making the visitor a class instance variable. When doing this we also need to modify how the call is made. Here is the improved implementation:

class LabelProvider

  def initialize
    @@label_visitor ||= Puppet::Pops::Visitor.new(self, "label")
  end

  def label(o)
    @@label_visitor.visit_this(self, o)
  end

# the rest is the same...
end

We can make one further optimization; since the visitor can be used with any number of arguments, the generic form of calling visit (or visit_this) accepts a variable number of arguments, and requires some internal shuffling of the arguments. We can instead call an optimized version (~30% faster) - in this case visit_this_0, (it is '0' since we are not giving any additional arguments (except the visited object)).

  def label(o)
    @string_visitor.visit_this_0(self, o)
  end

There are optimized versions for 1, 2 or 3 arguments, called visit_this_1, etc.

The optimized (caching) version can be used when we know that the class hierarchy or the set of available visitable methods is not going to change. If that is the case, we would need the use the non caching variant. (As a side note, if we are dealing with a design where classes are redefined to be subclasses of something other than their current superclass we are really in trouble).

Min and Max number of arguments

By default, the visitor is created to allow a minimum of 0 and a maximum of infinity number of arguments. If something else is wanted, this can be specified when creating the visitor:

# only the visited
@@label_visitor ||= Puppet::Pops::Visitor.new(self, "label", 0, 0)

# the visited + one more
@@eval_visitor ||= Puppet::Pops::Visitor.new(self, "eval", 1, 1)

# the visited + at least one more
@@variable_visitor ||= Puppet::Pops::Visitor.new(self, "variable", 1, nil)

By using these, the visitor will issue errors if there are too few or too many arguments when calling visit. (If the optimized visit_this_0, visit_this_1, etc. are used, you will instead get a runtime error from Ruby if the argument count does not match up).

The Visitor's First Argument - the default receiver

The first argument to the Visitor is the default receiver. It only plays a role when using the simple visit method, and can be specified as nil when constructing the Visitor if the receiver is given in each call to visit_this.

Visiting With Additional Arguments

You probably already guessed that if you want to pass additional arguments, they are simple given in the call to visit. Here is an example from the evaluator:

def evaluate(target, scope)
  @@eval_visitor.visit_this_1(self, target, scope)
end

Multiple Visitors and Other Patterns

It is perfectly ok to have multiple visitors in the same strategy. As an example, the evaluator has visitors for "eval", "lvalue", "assign", and "string".

Something complex can be handled with delegation to other strategies as this allows composition of the wanted behavior. Again as an example, the evaluator delegates to CompareOperator, RelationshipOperator, and AccessOperator which are strategies for these operations. The first two were factored out into separate strategies simply because of readability of the code (the CompareOperator adds three visitors "equals", "compare", and "include"), and the RelationshipOperator needs to perform operations that are specific to it. Including those in the main evaluator would result in it dealing with mixed concerns.

As an example, here is how the ComparisonOperator is used in the evaluator:

  case o.operator
  when :'=='
    @@compare_operator.equals(left,right)
  when :'!='
    ! @@compare_operator.equals(left,right)
  when :'<'
    @@compare_operator.compare(left,right) < 0
  when :'<='
    @@compare_operator.compare(left,right) <= 0
  when :'>'
    @@compare_operator.compare(left,right) > 0
  when :'>='
    @@compare_operator.compare(left,right) >= 0
  else
    fail(Issues::UNSUPPORTED_OPERATOR, o, {:operator => o.operator})
  end

(The call to fail will be covered in a future post that covers error handling).

Lastly, the AccessOperator (which represents expressions in the Puppet Programming Language on the form expression [ expression, expression ...]) needs to maintain state in order to be able to provide good error messages in case of a failure that relates to one of the evaluated expressions passed as an argument. Here is an example of how it is used in the evaluator:

# Evaluates x[key, key, ...]
#
def eval_AccessExpression(o, scope)
  left = evaluate(o.left_expr, scope)
  keys = o.keys.nil? ? [] : o.keys.collect {|key| evaluate(key, scope) }
  Puppet::Pops::Evaluator::AccessOperator.new(o).access(left, scope, *keys)
end

This may be changed since it may be faster to not create the strategy and instead pass an extra argument in each call - more benchmarking will tell.

Calling the "super" Version

Sometimes it is required to call a super version of a polymorph method. Here is an example:

def doit_Base(o)
  ...
end

def doit_Special(o)
  doit_Base(o) ...
end

It is the responsibility of the implementor to ensure that the right "super" method is called (i.e. that Special is indeed a specialization of Base). It is also the responsibility of the implementor to check when method are added if there are any direct calls to "super" versions that needs to be rewired (say if the class hierarchy is changed).

Last Part of Class Name, Feature or Flaw?

A decision was made to make the Visitor only use the last part of the class name. This means it is incapable of differentiating between two classes in different name spaces if they share the same name. This is both a feature (a strategy can be compatible with two different implementations) and a flaw (when used with 3d party classes that were not designed with unique class names).

In practice, this has not been a limitation in the implementation of the parser/evaluator. If later required, the Visitor could be made aware of additional segments in the name. This would probably need to be a specialized Visitor as it would be slightly less performant.

Strategy Composition

Since our strategies are separate instances we can easily pass them around. We can compose the behavior we want and pass the strategies as arguments instead of having to rely on "the one and only possible implementation" (which is what we get if we reference a class or module by name). We may want to have say a debugging version of the label provider where we output additional information. Exactly how we compose strategy objects and wire them into the logic where we want them to be used is for a later post where I am going to be talking about "inversion of control", and "injection". We could do something simple like this:

 class SomeClass
   def initialize(label_provider = DefaultLabelProvider.new)
     @label_provider = label_provider
   end

   def some_work(x)
     # ooops
     raise SomeError, "There is something wrong with the #{@label_provider.label(x)}."
   end
 end

Summary

In this post, the concept of polymorphic dispatch was introduced; basically moving methods from a set of classes to a common strategy where a visitor is used to dispatch the calls as if the methods were in their original place inside the classes - or to be more accurate, to the same effect as if they were in their original places.

This organization ensures that strategies do not clash, and we get a design with low coupling, and high cohesion; two desirable measures of architectural quality.

In the Next Post

In the next post, I will explain the Adapter pattern which is another technique to separate concerns.