View on GitHub

oberien's Blog

Tech blabber about cryptography, rust, and other personal interests

Thoughts on return, break and continue

2022-01-04

Commonly, the keywords return, break and continue are used to influence the control flow of a program. Designing my own language rebo, I’m at a point where I want to implement a form of control-flow-changing operations. This post is primarily a braindump of ideas I had thinking about the implementation. Let’s embark on journey through those ideas in the form of thought experiments.

Only break

The keywords return, break and continue are only valid in some contexts.

return can only be used within functions, where it returns from the function it’s used in, either with a value or with void / unit.

fn my_function() {
    // return implicit unit
    return;
    // return expression (evaluating to unit in this case)
    return ();
}

The break keyword can only be used within loops (for, while, loop), where it aborts the loop and continues executing the code afterwards. (There are also languages where break is used within switch statements to abort execution of those and continue with the code after the switch, but we’ll ignore those here.)

// only "before 0" will be printed
for i in 0..10 {
    println!("before {}", i);
    break;
    println!("after {}", i);
}

Similarly, continue can be used within loops (for, while, loop) to abort the current execution of the loop and continue with the next iteration.

// only "before 0", "before 1", ..., "before 9" will be printed
for i in 0..10 {
    println!("before {}", i);
    continue;
    println!("after {}", i);
}

Some languages employ labels to allow breaking out of an outer loop:

'loop1: while true {
    while true {
        break 'loop1;
    }
}

With labels, we can get creative and start to explore new possibilities. We can “label” a function and use break to “break out of” that function, i.e., return from the function without having a return keyword:

'fun1: fn my_function() {
    break 'fun1;
}

However, that wouldn’t allow returning a value. You’d need to allow breaking with a value. Rust already has this feature, called loop-break-value:

let foo = loop {
    break 1337;
};

The same feature could allow you to “return” a value from a function via break:

'fun1: fn my_function() -> int {
    break 'fun1 1337;
}

You can go further and also allow “early-returning” from any block (which is being considered for rust):

let foo = 'block: {
    if true {
        break 'block 1337;
    }
    42
};

The loop expression can be seen as the loop keyword being followed by a block, which is repeated indefinitely. Allowing breaking out of blocks, you don’t need the continue keyword:

'loop: loop 'block: {
    // continue
    break 'block;
    // break
    break 'loop;
}

As a generalization, every block can have an optional label. That way function-labels could also just be the normal block label of the function body block:

fn foo() -> int 'label: {
    break 'label 1337;
}

The same is true for loops, which also don’t need special labels:

'loop: {
    loop 'block: {
        // continue
        break 'block;
        // break
        break 'loop;
    }
}

Using the keyword break for all of break, continue and return may feel weird. Instead, a different keyword like quit or exit could be used (or something else as those two are also somewhat taken).

Only return

Instead of using break-with-value, we could use the standard return for everything. This works especially well with closures. Loops don’t need to be keywords but could become recursive functions (given TCO). Those looping functions take a function representing the loop-body as argument. Loop-body-functions return a value indicating whether the loop should break or continue.

enum LoopResult {
    Break,
    Continue,
}

The simplest of these cases is loop:

// signature of the `loop` function
fn loop(body: impl Fn() -> LoopResult) { ... }

let mut i = 0;
loop(|| {
    if i > 3 {
        return LoopResult::Break;
    }
    println!("loop: {}", i);
    i += 1;
    LoopResult::Continue
});

while would look like this:

// signature of the `while` function
fn while(condition: impl Fn() -> bool, body: impl Fn() -> LoopResult) { ... }

let mut i = 0;
while(|| i <= 3, || {
    println!("while: {}", i);
    i += 1;
});

Finally, for could be used like this:

// signature of the `for` function
fn for<T>(iterator: impl IntoIterator<Item = T>, body: impl Fn(T) -> LoopResult) { ... }

for(0..=3, |i| { println!("for: {}", i); LoopResult::Continue });

This approach is somewhat similar to pure functional languages where loops are often represented as reduce operators on iterators / lists.

“Extreme” Continuation Passing

Another solution could be a variation of “only return”, also taking ideas from the continuation passing style. Similarly to above, looping operations are represented as functions. Functions never return. Instead of writing code after a function / loop, that code is wrapped in a function and passed as argument to the previous function as continuation. Every function has an implicit additional argument, which is the continuation. Loops themselves don’t exist. They are replaced with calling the current function again at the end of its body.

fn my_loop(i: i32, continuation) {
    if i > 3 {
        // will never return
        continuation();
    }
    println!("{}", i);
    my_loop(i + 1, continuation);
    // unreachable
}

…Never mind, I think I just invented a variation of functional languages with TCO…
NB: This is somewhat akin to (in)direct threading in interpreters.

Labels as Values / Scoped Labels

So far we’ve only looked at labels available in the current context. You can’t break out of a label which you haven’t seen yet during execution. For example you can’t “break out of” a label that is defined after the current loop:

'loop1: loop {
    // error
    break 'loop2;
}

'loop2: loop {}

Having such functionality would allow goto with extra steps:

// we want to jump to the code below
// create loop just for breaking out of it
loop { break 'label }

// define empty labeled loop to use for the break above
'label loop {}
// code we want to jump to

The main problem with goto and labels in C in my opinion is that labels are effectively global. This opens up to lots of gotchas and possibilities for unclean code.

But nothing stops us from making labels scoped. We can define labels to be globally available within the block / scope they are defined in (and all sub-blocks / sub-scopes). Labels could be their own built-in type, be passed around as special values and used as function arguments. return, break, continue and goto can take labels just as before.

fn return_to_label('to_return_to: Label<()>) {
    return 'to_return_to;
}

let 'label = 'outer_label;
return_to_label('label)

'outer_label;
// code

We also need to allow returning values to labels. For that, labels evaluate to a value. If the value is not unit, a default must be specified which is used if the label-expression is reached without being jumped to.

fn return_to_label('to_return_to: Label<i32>) {
    return 'to_return_to 1337;
}

return_to_label('outer_label);

let result = 'outer_label else 42;
assert_eq!(result, 1337);

Getting different values from labels based on how they were reached is similar to the φ (phi) function in llvm.

The idea of label values is possible in C already via the setjmp and longjmp functions.

In the end I just went with the standard return, break and continue.


Discussion on reddit.