View on GitHub

Ü Programming Language

Lambdas are here!

Recently i added lambdas into Ü. This article describes motivation for that and how lambdas in Ü work.

Motivation

There are functions in standard library (and not only in it) with a parameter of some function-like type (function pointer, functional object). Examples of such functions - ust::sort - with comparator function, thread class constructor - with thread entry function. Also possible functions like search with given predicate, iterators with filter/mapping functions, etc.

It was already possible to use such functions - by passing a function pointer or an instance of class with overloaded () operator. But sometimes such approach is too verbose. It’s unlikely that such function/functional class are used anywhere also, so, creating them only for a single usage seems to be overkill. Another problem of such approach is that functions and classes are defined in global space, not where they are used, that decreases (sometimes) readability and code locality.

Lambdas as solution

The solution of the problem described above is obvious - add some sort of anonymous functions, defined directly at usage site. Such functions in Ü are named lambdas.

A simplest lambda looks like this:

fn SortInts( ust::array_view_mut</i32/> ints )
{
    ust::sort(
        ints,
        lambda( i32 x, i32 y ) : bool { return x > y; } );
}

A lambda is defined with usage of lambda keyword and following parameters and return value description.

The result of a lambda expression is an object of some class generated by the compiler, which has overloaded () operator, created from specified description and body. Because of overloaded () such object may be called like a function.

Captures

It is a good language improvement to have local anonymous functions. But it is not enough. Sometimes a functional object needs to have some internal state and not to be a (almost) pure function with result depending only on provided at call site arguments. In order to support such necessity lambdas in Ü may capture local variables and function arguments from the surrounding context.

A lambda with [=] captures all variables that are used inside it by making a copy.

var i32 x= 42;
auto f=
    lambda[=]( i32 y ) : i32
    {
        // Capture variable "x".
        return x * y;
    };

A lambda with [&] captures all variables that are used inside it by creating a reference.

var i32 x= 42;
auto f=
    lambda[&]( i32 y ) : i32
    {
        // Capture a reference to variable "x".
        return x * y;
    };

It’s even possible to capture mutable references to surrounding variables and change them in a lambda.

var i32 mut x= 0;
auto f=
    lambda[&]( i32 y )
    {
       // Capture mutable reference to variable "x" and mutate it in the lambda.
       x= y;
    };

Captured variables/references become lambda fields (reference fields for captured references). During lambda object construction such fields are properly initialized. For captured values copy constructor is called, or an error is produced is it’s not possible.

Explicit captures

Sometimes it’s good for better readability to specify captured variables explicitly. It’s possible via capture list in []. Variables needed to be captured are listed inside it, prefixed with & if they are needed to be captured by reference.

var i32 x= 42;
// Specify captured variable "x" explicitly.
auto f=
    lambda[x]( i32 y ) : i32
    {
        return x * y;
    };

One of the advantages of the usage of a capture list is the possibility to capture in the same lambda both values and references.

var i32 mut x= 0;
var i32 y= 33;
// Specify two captured variables, capture the first one by reference, the second one by copy.
auto f=
    lambda[&x, y]()
    {
        // Can mutate captured by reference variable.
        x= y;
    };

Capture expressions

Yet another way of variable capturing is to specify an initializer expression for a name in a capture list. If such expression exists, its result will be captured instead of a variable with the same name from the surrounding context.

auto f=
    lambda[x= 42]( i32 y ) : i32
    {
        return x * y;
    };

The main advantage of such expression is the possibility to move value into the lambda object. it is useful if copying is too expensive or if an object is non-copyable at all.

Lambda this

this in lambdas is inaccessible by a programmer. It is so in order to avoid confusion with possible this from the surrounding context.

There is possibility to specify lambda this mutability.

auto x= 0;
auto f=
    lambda[=] mut () : i32
    {
        ++x; // Mutate captured variable copy (not the source variable).
        // On each call this lambda will produce different result.
        return x;
    };

byval lambdas are also possible. It’s even possible to move captured variables in byval mut lambdas, which is (for now) not allowed in byval mut this methods of non-lambda structs and classes.

auto f=
    lambda[x= 0] byval mut () : i32
    {
       // Move captured by value variable.
       // For "i32" it is unpractical to do so, but it has sense for non-trivially-copyable types or non-copyable types.
       return move(x);
    };

Internal compiler workflow for lambdas

The compiler performs for lambdas preprocessing step - in order to find captured external variables. This is basically a compilation of a lambda function (with some tweaks), but later such function is removed.

After the preprocessing the lambda class is created and actual lambda function building is performed.

Because of preprocessing it’s possible to calculate reference notation for lambdas automatically, there is no need to specify it explicitly. And it’s even not possible to do so, because a programmer may need to know exact reference notation for the result lambda class, which is practically hard, since result class layout and reference notation isn’t so trivially predicted (but still is deterministic).

The approach with preprocessing has one significant downside - it doesn’t allow to implement template lambdas. So, if it is necessary, regular functional classes should be used instead, where operator () may be template.

Lambda class

For each lambda the compiler generates its own class. Even if two lambdas are practically identical, they have different types. And the result lambda class have no name accessible by a programmer, but this is not a huge problem, because auto and templates may be used for working with lambdas.

Lambdas defined inside type templates and function templates are different for different template arguments. It is necessary, since they depend on these arguments. Also different are lambdas on each iteration of for operator for tuples, since iteration variable type (and sometimes constexpr value) may be different on each iteration.

Conclusion

Lambdas do not add something fundamental into the language, but they allow to improve code readability, sometimes significantly.

A lot of other programming languages have lambdas or something similar. For programmers who have experience with such languages it will be easy to use lambdas in Ü. Perhaps this may make transition to Ü smoother.

Existence of lambdas simplifies further development and usage of second-order functions, like map, filter, fold and others, which are especially useful for classes like iterators.

Possible further work

Lambdas for now can’t be coroutines - generators and async functions. And i do not know, whatever there should be such lambdas. It seems for me to be better to have some similar to lambdas concept for coroutines, but not just lambdas which are coroutines. I mean something like async blocks in Rust, which result is directly a coroutine object and not a functional object which returns a coroutine object. Two levels of indirection are likely unpractical and perhaps unachievable because of Ü limitations for structs with references inside.

It would be nice to have possibility to cast lambdas without captures into function pointers. But right now it’s not possible to do this just by performing binary cast, because of different internal LLVM function signatures.