Fleet  0.0.9
Inference in the LOT
Fleet - Fast inference in the Language of Thought


Fleet is a C++ library for programming language of thought models. In these models, we specify a grammar of primitive operations which can be composed to form complex hypotheses. These hypotheses are best thought of as programs in a (mental) programming language, and the job of learners is to observe data (typically inputs and outputs of programs) and infer the most likely program to have generated the outputs from the inputs. This is accomplished in Fleet by using a fully-Bayesian setup, with a prior over programs typically defined thought a Probabilistic Context-Free Grammar (PCFG) and a likelihood model that typically says that the output of a program is observed with some noise.

Fleet is most similar to LOTlib (https://github.com/piantado/LOTlib3) but is considerably faster. LOTlib converts grammar productions into python expressions which are then evaled in python; this process is flexible and powerful, but slow. Fleet avoids this by implementing a lightweight stack-based virtual machine in which programs can be directly evaluated. This is especially advantageous when evaluating stochastic hypotheses (e.g. those using flip() or sample()) in which multiple execution paths must be evaluated. Fleet stores these multiple execution traces of a single program in a priority queue (sorted by probability) and allows you to rapidly explore the space of execution traces.

Fleet is structured to automatically create this virtual machine and a grammar for programs from just the type specification on primitives: a function signature yields a PCFG because the return type can be thought of as a nonterminal in a PCFG and its arguments can be thought of as children (right hand side of a PCFG rule). A PCFG expanded in this way will yield a correctly-typed expression, a program, where each function has arguments of the correct types. Note that in Fleet, these types in the PCFG must be C++ types (intrinsics, classes, or structs), and two types that are the same are always considered to have the same nonterminal type.

Fleet requires you to define a Grammar which includes all of the nonterminal types that are used anywhere in the grammar. Fleet makes heavy use of template metaprogramming, which prevents you from ever having to explicitly write the grammar itself, but it does implicitly construct a grammar using these operations.

In addition, Fleet has a number of built-in operations, which do special things to the virtual machine. These include Builtins::Flip, which stores multiple execution traces; Builtins::If which uses short-circuit evaluation; Builtins::Recurse, which handles recursives hypotheses; and Builtins::X which provides the argument to the expression.


Fleet is based on header-files, and requires no additional dependencies. Command line arguments are processed in CL11.hpp, which is included in src/dependencies/.

The easiest way to begin using Fleet is to modify one of the examples. For simple rational-rules style inference, try Models/RationalRules; for an example using stochastic operations, try Models/FormalLanguageTheory-Simple.

Fleet is developed using GNU g++ 10 and requires C++-20.


Fleet contains several implemented examples of existing program-induction models, including identical or close variants of:

as well as several other examples, including sorting and first order logical theories.


Fleet is currently written using a signle-file, header-only style. This allows for rapid refactoring and prototyping since declarations and implementations are not in separate files. This style may change in the future because it also leads to slower compilation times.


Fleet provides a number of simple inference routines to use. Examples of each can be found in Models/Sorting

Generally, we find that ParallelTempering is the fastest and most effective way to search these spaces of programs. These can be compared in Models/Sorting, which has examples of each.


To illustrate how Fleet works, let's walk through a simple example: the key parts of FormalLanguageTheory-Simple. This is a program induction model which takes a collection of strings and tries to find a stochastic program which can explain the observed strings. For example, it might see data like {"ab", "abab", "ababab"} and then infer that the language is (ab)^n.

Let's first walk through the grammar definition. Here, we first import Grammar.h and also Singleton.h. Singleton is a design pattern that permits only one copy of an object to be constructed, which is used here to ensure that we only have one grammar (and aren't, e.g. passing copies of it around). Typical to Fleet's code, we define our own grammar as MyGrammar. The first two template arguments are the input and output types of the function we are learning (here, strings to strings) and the next two are (variadic) template arguments for all of the nonterminal types used by the grammar (here, strings and bools). The input type determines the type of any arguments to productions in the grammar; the output type determines the type of the root (e.g. what evaluating a program generated from this grammar will return).

#include <string>
using S = std::string; // just for convenience
#include "Grammar.h"
#include "Singleton.h"
class MyGrammar : public Grammar<S,S, S,bool>,
public Singleton<MyGrammar> {
// compute the tail of a string: tail(wxyz) -> xyz
add("tail(%s)", +[](S s) -> S { return (s.empty() ? S("") : s.substr(1,S::npos)); });
// compute the head of a string: tail(wxyz) -> w
add("head(%s)", +[](S s) -> S { return (s.empty() ? S("") : S(1,s.at(0))); });
// concatenate strings pair(wx,yz) -> wxyz
add("pair(%s,%s)", +[](S a, S b) -> S {
if(a.length() + b.length() > MAX_LENGTH) throw VMSRuntimeError(); // caught inside VirtualMachineState::run
else return a+b;
// the empty string
add("\u00D8", +[]() -> S { return S(""); });
// check if two strings are equal
add("(%s==%s)", +[](S x, S y) -> bool { return x==y; });
// logical operations
add("and(%s,%s)", Builtins::And<MyGrammar>);
add("or(%s,%s)", Builtins::Or<MyGrammar>);
add("not(%s)", Builtins::Not<MyGrammar>);
add("if(%s,%s,%s)", Builtins::If<MyGrammar,S>);
// access the argument x to the program (defined to a string type above)
add("x", Builtins::X<MyGrammar>);
// random coin flip (this does something fancy in evaluation)
add("flip()", Builtins::Flip<MyGrammar>, 10.0);
// recursion
add("recurse(%s)", Builtins::Recurse<MyGrammar>);
// add all the characters in our alphabet (alphabet is global variable defined on the command line)
for(const char c : alphabet) {
add_terminal( Q(S(1,c)), S(1,c), 5.0/alphabet.length());

Then, in the constructor MyGrammar(), we call the member function Grammar::add which takes two arguments: a string for how to show the function (called a "format"), and a lambda expression for the function itself. The string formats are just for show – they have nothing to do with how the program is evaluated and expressions in these strings are never interpreted as code. Each string format uses "%s" to denote the string of its children. So, when Fleet is asked to display a program, it recursively substitutes into these format strings. Fleet checks for you that you have as many "%s"es as you have arguments to each function and will complain if it's wrong. (NOTE: if you need to display children in a different ordered sequence, Fleet also supports this via "%1", "%2", etc.).

The second argument to Grammar::add is a lambda expression which computes and returns the value of each function (e.g. for "tail(%s)" it compute the tail of a string – everything except the first character. Note that this function, as is generally the case in Fleet, attempts to be relatively fault-tolerant rather than throwing exceptions because that's faster. In this case, we just define the tail of an empty string to be the empty string, rather than an error.

The functions for computing the head of a string is defined similarly. The "pair(%s,%s)" function is meant to concatenate strings but it actually needs to do a little checking because otherwise it's very easy to write programs which create exponentially long strings. So this function checks if its two argument lengths are above a global variable MAX_LENGTH and if so it throws a VMSRuntimeError (VirtualMachineState error) which is caught and handled internally in Fleet. (Note that in the actual Model code, there is a more complex version of pair which is a bit faster, as it modifies strings stored in the VirtualMachineState stack rather than passing them around, as here).

Then, we have a function which takes no arguments and only returns the empty string. Here, we have given it a Unicode name which will appear/render in most terminals as epsilon (a convention for the empty string in formal language theory).

The "(%s==%s)" function computes whether two strings are equal and returns a bool. Note that generally it is helpful to enclose expressions like this in parentheses (i.e. "(...)") because without this, the printed programs can be ambiguous.

Note that in each of these lambda expressions, the input types (e.g. "S a, S b" in "pair") and return type (S) really matter. Fleet uses some fancy template magic to extract these types from the functions themselves and build the required grammar. That allows it to start with MyGrammar's return type and sample a composition of functions which yield this type.

MyGrammar also adds built-in logical operations. Builtins are functions that we do not define ourselves using lambdas. We certainly could, but these are buitlin because they are a bit fancier than what you might think. Specifically, in most computer science applications we want the versions of these functions which involve short circuit evaluation. For example, and(x,y) won't evaluate y if x is false, which for us ends up with a measurably faster implementation. This is maybe most important though for "if(%s,%s,%s)", where we only want to evaluate one branch, depending on which way the boolean comes out. The implementation of correct short-circuit evaluation depends on some of the internals of Fleet and so it's best to use the built-in versions of these functions.

We also have a special fucntion Builtins::X which provides the argument to the function/program we are learning. We defaulty give this the name "x".

The "flip()" primitive is somewhat fancy (and Buillt-in) because it is a stochastic operation, which acts like a coin flip, returning true half of the time and false half of the time (Fleet also a built-in biased coin, which takes a coin weight as a parameter). This is fancy because it means that when the program evaluates, there is no longer one single output, but rather a distribution of possible return values. Fleet will hand these back to you as a DiscreteDistribution of the output type of the function. Here, we have included 10.0 after the Builtins::Flip. This is an optional number which can be included for any primitive and it determines the probability in the PCFG of each each expansion (they are summed and renormalized for each nonterminal type). In this case, we need to upweight the probability of Flip in order to make a proper PCFG (e.g. one where no probabliity is given to infinite trees) because otherwise the and,or,not operatiosn above will lead to infinite trees. The default weight for all operations (when its not specified) is 1.0, meaning that 10.0 makes it ten times as likely that a bool will expand to a flip than the others. In general, you will want to upweight the terminal rules (those with no children) in order to keep grammars proper.

The next operation is Builtins::Recurse, which allows a defined function to call itself recursively. This is a special Builtin because it requires us to evaluate the entire program in which it occurs again, and use the output of that program. Note that in general as we search over programs, we will find things that recurse infinitely – Fleet's VirtualMachineState keeps track of how many recursive calls a single program has made (via VirtualMachineState::recursion_depth) and when it exceeds VirtualMachineControl::MAX_RECURSE or when the total runtime exceeds VirtualMachineControl::MAX_RUN_PROGRAM, Fleet will halt evaluation of the program.

Finally, this grammar adds one terminal rule for each character in the global string "alphabet." Here, these characters are converted to strings containing a single character because char is not a nonterminal type in the grammar we defined. The entire collection of characters is given an unnonormalized grammar probabiltiy of 5.0.

Next, we define a class, MyHypothesis, which defines the program hypotheses we are going to be learning. The simplest form for this is to use something of type LOTHypothesis, which defines built-in functions for computing the prior, copying, proposing, etc.

#include "LOTHypothesis.h"
// Declare a hypothesis class
class MyHypothesis : public LOTHypothesis<MyHypothesis,S,S,MyGrammar,&grammar> {
using Super::Super; // inherit the constructors
double compute_single_likelihood(const datum_t& x) override {
// MyHypothesis stores a program, and "call" will call it with a given input
// The output is a DiscreteDistribution on all of the outputs of that program
const auto out = call(x.input, "");
// we need this to compute the likelihood
const auto log_A = log(alphabet.size());
// Likelihood comes from all of the ways that we can delete from the end and the append to make the observed output.
// So this requires us to sum over all of the output values, for a single data point.
double lp = -infinity;
for(auto& o : out.values()) { // add up the probability from all of the strings
lp = logplusexp(lp, o.second + p_delete_append<strgamma,strgamma>(o.first, x.output, log_A));
return lp;
void print(std::string prefix="") override {
// Override the default "print" function by prepending on a view of the distribution
// that was output (Fleet normally would just show the program)
prefix = prefix+"#\n#" + this->call("", "<err>").string() + "\n";

Here, MyHypothesis uses the curiously recurring template pattern to pass itself, so that it knows how to make copies of itself. The other template arguments are the input and output types of its function (string to string) and the Grammar type it uses (MyGrammar). The address of the grammar itself &grammar is also part of the type. Then, we inherit LOTHypothesis' constructors (via Super::Super's insanity in C++).

Primarily, what we have to define here is the compute_single_likelihood function, which defines the likelihood of a single observed data string. To compute this, we first call the function (LOTHypothesis::call) on the observed input. This returns a DiscreteDistribution<std::string> of outputs. Note that the outputs which return errors are mapped to the second argument to call, in this case the empty string. Then, we loop over outputs and add up the likelihood of the observed output x.ouput given the program output o.first weighted by the program's probability of that output o.second. This part uses p_delete_append, which is a string edit likelihood that assigns nonzero probability of any string being corrupted to any other. It computes this by imagining that the output string was corrupted by noise that deleted from the end of the string, and then appended, where both operations take place with geometric probabilities. These probabilities are passed as template arguments since that lets us compute some key pieces of this at compile time, yielding much faster code (this is a hot part of any string-based inference model).

We also define a print function which allows us to say how we print out the program hypothesis. Defaultly, any LOTHypothesis will just print (using the format strings like "pair(%s,%s)" above) but here we would like to also show the distribution of outputs when we print, so this prints a line with a LOTHypothesis::call (which is a DiscreteDistribution) and then a .string() call on the returned value.

The last piece of code is the actual running: Here, we define a Fleet object, which basically manages the input/output. We tell it that we want to be able to specify the alphabet and a string for the data (comma separated strings) on the command line. We create a grammar and note that this must happend after Fleet.initialize, since creating the grammar object needs to know the alphabet. Then, we create a TopN object to store the top hypotheses we have found. This is basically a sorted collection of the best hypotheses found so far. We set a few control aspects of our VirtualMachine (e.g. how many steps we should run each program for, how many outputs we will consider for each program). We convert our datastr (specified on the command line) to type MyHypothesis::data_t, and do a check on whether the data contains anything not in the alphabet.

#include "Top.h"
#include "Fleet.h"
#include "Builtins.h"
// define this so we can use it
std::string alphabet = "abcd";
std::string datastr = "a:aabb,c:ccbb"; // input-output pairs in our observed data
int main(int argc, char** argv){
// define a Fleet object
Fleet fleet("A simple, one-factor formal language learner");
fleet.add_option("-a,--alphabet", alphabet, "Alphabet we will use"); // add my own args
fleet.add_option("-d,--data", datastr, "Comma separated list of input data strings");
fleet.initialize(argc, argv);
// top stores the top hypotheses we have found. We set the number of top to store
// via the command line --top=100. Internally, that sets FleetArgs::top, which is used
// as the constructor for TopN. So it only works after you call Fleet.initialize above.
// mydata stores the data for the inference model
// Note that Fleet has these nice string_to functions which will parse strings in a standard
// format into many of its standard data types. Here, MyHypothesis::data_t is a comma-separated
// list of MyHypothesis::datum_t's, and each of those is an input:output pair using the
// alphabet
auto mydata = string_to<MyHypothesis::data_t>(datastr);
// let's just check that everything in the data is in the alphabet
for(const auto& di: mydata) {
for(char c: di.output) { // only checking the outputs are all in the alphabet
// tell Fleet's top to print each best hypothesis it finds
// (this can be set with the command line --print-top-best=1)
top.print_best = true;
// Create a hypothesis to start our MCMC chain on
auto h0 = MyHypothesis::sample();
// Create an MCMC chain
MCMCChain c(h0, &mydata, top);
// Actually run -- note this uses C++'s new coroutines (like python generators)
// the | print(FleetArgs::print) here will print out every FleetArgs::print steps
// (which is set via --print=1000). Defaultly FleetArgs::print=0, which does not print.
for(auto& h : c.run(Control() | print(FleetArgs::print) ) {
top << h; // add h to top
// we can do other stuff with h in here if we want to.
// at the end, print the hypotheses we found

And finally, we run the actual model. We make an initial hypothesis using MyHypothesis::sample, which creates a random hypothesis from the prior, we make an MCMCChain, and we run it. This uses C++'s fancy coroutines to let us process the samples as a loop. The c.run function takes a Control object, which basically specifies the number of steps to run or amount of time to run etc. When Control() is called with no arguments, it gets its arguments from the command line. This means that automatically, we can give a commandline arugment like "--time=30s" and this fleet program will run for 30 seconds (times can be in days (d), hours (h), minutes (m), seconds (s) or miliseconds (q).

Note that there are several inference schemes, and most (including the examples in Models) use ParallelTempering, which seems to work best.

As the chain runs, it will place hypotheses into "top" (which defaultly reads the command line argument (e.g. "--top=25") to figure out how many of the best hypotheses to store. At the end of this code, we call top.print(), which will call the .print() function on everything in top and show them sorted by posterior probability score (remember above we overwrote MyHypothesis::print in order to have it run the program).

At the end of this program, as the Fleet object is destroyed, it will print out some statistics about the total number of MCMC steps, total runtime, etc. (This can be turned off by setting FleetArgs::print_header=false).