Some programming language ideas
This post is me writing down some ideas for programming languages or programming language features. Mainly to get them out of my head into the world without having to write an actual compiler for them 😅 I start by providing some context and then present my ideas. Hopefully it’s interesting.
Explanation of existing concepts
Here are a few short explanations and references to existing concepts that I refer to later on. The explanations are meant to give you some intuition about them if you are not at all familiar with them.
Refinement types
A refinement type is a type with a predicate attached to it that holds for every value of that type.
Rust is not a language that has user-definable refinement types, but you can find some examples of refinements in the standard library, like NonZero and NonNull.
The documentation for NonZero mentions that it can result in memory optimizations.
For example, Option<NonZero<u32>> is the same size as u32 because the compiler is able to use 0 to mean None.
This is great! However, I think it is a shame that it is not possible to describe such optimizations ourselves.
It is not uncommon to have integer variables where not every value is valid.
The excess values could be used for something else, but you cannot express this.
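For reference, the niche optimization can be observed with a few assertions. Here is a small Rust snippet, using nothing beyond the standard library:

use std::mem::size_of;
use std::num::NonZero;

fn main() {
    // 0 is never a valid NonZero<u32>, so the compiler can use it to encode None.
    assert_eq!(size_of::<Option<NonZero<u32>>>(), size_of::<u32>());
    // Without such a niche the Option needs extra space for its tag.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}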
Strict compile time safety
Wuffs is a memory-safe programming language that does have refinement types. It is a small language, meant for writing decoders and encoders. It is safe with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences, and all of these checks are done at compile time. For an integer variable the compiler understands which values it can hold, and therefore whether an addition might result in an overflow or not.
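To get a feel for this, here is a small Rust sketch (my own illustration, not Wuffs code) of the kind of reasoning such a compiler performs: once a bound on the operands is established, the addition below cannot overflow, and a refinement-typed compiler can accept it without any runtime check.

fn add_bounded(x: u8, y: u8) -> u8 {
    // After this check x <= 127 and y <= 127, so x + y <= 254, which fits in a u8.
    // A compiler that tracks value ranges, like Wuffs does, can prove this at
    // compile time; in Rust the knowledge only exists as a runtime assert.
    assert!(x < 128 && y < 128);
    x + y
}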
Flow typing and union types
Flow-sensitive typing (or flow typing) is something I came into contact with at work using TypeScript. A type system with flow typing can refine the type of a variable through the flow of a program.
Below is a documented TypeScript snippet to showcase flow typing and union types. You can run it in the TypeScript playground here.
type NumberOrString = number | string;

function whatIsIt(x: NumberOrString) {
  // At this point x could be a number or a string, since we have not
  // inspected x yet.
  if (typeof x === 'string') {
    // In this branch TypeScript understands that x can only be a string.
    console.log('x is a string');
    return;
  }
  // If we reach this point then x was not a string, so x can only
  // be a number. TypeScript is able to strip part of the type since
  // we performed a runtime check to allow this; testing if the type
  // of x is a string or not.
  if (typeof x === 'number') {
    console.log('x is a number');
    return;
  }
  // If we reach this point then x was neither a string nor a number,
  // which should never happen if the program is well typed.
  // Below is an assignment to a variable with type never.
  // never is a special type.
  // It means that no option exists of what this type could be.
  // If we did not exhaustively check every possible variant
  // that x could be then the assignment below would be a type
  // checking error. You can try this yourself by making
  // NumberOrString = number | string | boolean
  const _exhaustivenessCheck: never = x;
  // In a properly typed program this should never print.
  console.log("What have you done?!");
}

const y: NumberOrString = 'hello';
whatIsIt(y);

// This is TypeScript for "Frick you. Just do the type cast."
whatIsIt(false as unknown as NumberOrString);
Some ideas!
Here are some ideas for features in programming languages.
Type refinement + union types on C-like structs
There is a PyPy blog post explaining their then-new dictionary design (it’s a post from 2015).
Their design is not important here; just know that it uses an array of integers (sparse_array) as indices into another array (compact_array).
The value of the indices can only be as large as the size of the dictionary (num_items).
So, if the dictionary has fewer than 256 entries, then using 8-bit indices is sufficient.
When the dictionary grows beyond this size you would need at least 16-bit integers, etc.
struct dict {
    long num_items;
    variable_int *sparse_array;
    dict_entry *compact_array;
};
This means that the actual type (an 8-bit, 16-bit or 32-bit integer) of variable_int depends on the value of num_items.
You cannot express this relationship between variables in C or C++, but you could with union types and refinement.
type small_dict = struct {
    u32(0..(2**8)) num_items;
    u8 *sparse_array;
    dict_entry *compact_array;
};

type medium_dict = struct {
    u32((2**8)..(2**16)) num_items;
    u16 *sparse_array;
    dict_entry *compact_array;
};

type big_dict = struct {
    u32((2**16)..(2**32)) num_items;
    u32 *sparse_array;
    dict_entry *compact_array;
};

type dict = small_dict | medium_dict | big_dict;
In the example the dict comes in three varieties, with either 8-bit, 16-bit or 32-bit indices.
You can differentiate between the three by inspecting the value of num_items.
The syntax u32(lo..hi) is meant as a refinement on u32, where its value must lie in the range lo..hi.
The double star ** means “to the power of”, so 2**8 is 256.
You still have to check at runtime what kind of dictionary you are dealing with, but you would have to do this anyway.
This does not incur extra cost, but the compiler can check that you are not accidentally assuming you are dealing with a small_dict without having done the necessary checks.
Other examples that fall into the same category of interpreting memory differently depending on some bit of data are NaN-boxing and Small String Optimization (SSO).
Also, this construction can be used to implement Rust-like enums yourself! Rust enums are implemented as tagged unions (as far as I know). Tagged unions are also used in TypeScript, and with flow typing you can check that all varieties are handled. No explicit compiler support has to be written for enums and exhaustiveness checking.
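For comparison, here is a hedged sketch of how the same three varieties could be expressed in today’s Rust with an enum, i.e. a tagged union. The field types are made up for illustration and the refinement on num_items is not enforced, but the match is checked for exhaustiveness by the compiler, much like the never trick in the TypeScript example.

struct DictEntry {
    key: u64,
    value: u64,
}

enum Dict {
    Small { num_items: u32, sparse_array: Vec<u8>, compact_array: Vec<DictEntry> },
    Medium { num_items: u32, sparse_array: Vec<u16>, compact_array: Vec<DictEntry> },
    Big { num_items: u32, sparse_array: Vec<u32>, compact_array: Vec<DictEntry> },
}

// How many bytes one index in sparse_array takes for each variant.
fn index_width(d: &Dict) -> usize {
    // Removing one of the arms below is a compile error: the match must be exhaustive.
    match d {
        Dict::Small { .. } => 1,
        Dict::Medium { .. } => 2,
        Dict::Big { .. } => 4,
    }
}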
Refinement based function overloading
This is an idea I had for Rust, or at least for a language that does some form of lifetime analysis.
The example below is for Rust and shows creating a Vec with a capacity of 8.
A few elements are pushed, a reference to an element is taken, and then an attempt is made to push again.
let mut a: Vec<i32> = Vec::with_capacity(8);
a.push(1);
a.push(2);
a.push(3);
let b: &i32 = &a[0];
a.push(4); // <- Highly illegal! You could go to jail for this (not really)
In Rust it is illegal to hold a reference to a Vec element and then push an element: you are holding an immutable reference into the Vec while the push needs to borrow it mutably.
The reason you need to mutably borrow the Vec for a push is that the underlying data where the elements are stored might move.
If a Vec has no space left for a new element, it has to move the elements to a place with more room for new elements.
However, in this example the push will not result in such a relocation, so the reference will still be valid afterwards.
Even if you check that the Vec has spare room, you are not allowed to push while holding a reference.
Annoying!
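For comparison, the usual workaround today is to hold an index instead of a reference and only re-borrow the element at the point of use. A small sketch of that pattern:

fn main() {
    let mut a: Vec<i32> = Vec::with_capacity(8);
    a.push(1);
    a.push(2);
    a.push(3);
    // Keep an index instead of a reference, so no borrow of `a` stays alive.
    let b: usize = 0;
    a.push(4); // Fine: `a` is not borrowed here.
    println!("{}", a[b]); // Re-borrow only when the element is actually needed.
}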
If you could overload a function based on a refinement, then you could define a version of push that does not require a mutable borrow, and the usage would look no different!
That is a big if however 😄
fn push(&self, x: T) where self.len < self.cap {
    // Yay, we can push x and all references are safe!
    // ...
}
// The ordinary push, which may reallocate and therefore needs a mutable borrow.
fn push(&mut self, x: T) {
    // ...
}
Even if overloading were not possible, you could still define another version of push that can only be called if you know that there is enough room.
fn push_safe(&self, x: T) where self.len < self.cap {
    // ...
}
Refinement on memory addresses
Alignment requirements can be expressed through refinement predicates on addresses.
In the pseudocode, address is an implicitly available variable that represents the address of a value of the type.
type Int64 = [8]u8 where (address % 8 == 0); // Must be 8 byte aligned
Maybe on some embedded systems you have ranges of memory that are reserved for certain purposes. For example, the Game Boy has 8 KiB of VRAM mapped to memory range 0x8000-0xA000. It is possible to define pointers that are only allowed to point to VRAM.
type VRAMByte = u8 where (address >= 0x8000 && address < 0xA000);
type VRAMPointer = *VRAMByte;
If the program is well typed, you could never accidentally read or write outside of VRAM when using a VRAMPointer.
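Without such refinements the closest you can get is a runtime check at construction time. Below is a hedged Rust sketch of that approximation; the VramPointer type and the address constants are made up, following the Game Boy example above.

/// A pointer that can only be constructed for addresses inside VRAM.
/// This is a runtime approximation of the compile-time refinement above.
struct VramPointer(*mut u8);

const VRAM_START: usize = 0x8000;
const VRAM_END: usize = 0xA000;

impl VramPointer {
    fn new(addr: usize) -> Option<VramPointer> {
        // The range check happens once at construction, instead of being
        // discharged by the type checker as in the refinement version.
        if addr >= VRAM_START && addr < VRAM_END {
            Some(VramPointer(addr as *mut u8))
        } else {
            None
        }
    }
}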
Checked bending of the rules
Assume you have a language like Wuffs that can guarantee compile time safety for certain issues like integer arithmetic overflow.
This safety does come at a cost.
In order to guarantee that an addition will not overflow you need to check that the integer is below a certain value, for example.
Or to be sure that indexing an array will not be out of bounds you need to check the index value.
In Rust, for example, indexing a Vec out of bounds is checked in both debug and release builds, while integer arithmetic overflow is only checked in debug builds by default.
What you could do is introduce a special function called assume that can be given a predicate, and the compiler will pretend that you did the necessary checks to make the predicate true.
fn index(&self, i: uint) -> T {
    assume(i < self.len);
    // We can now index data.
    return self.data[i];
}
When you add an assumption that turns out to not actually be true, you will have introduced a bug.
But you can find that bug more easily, because all your assumptions have been written down in code!
For debug builds you can turn the calls to assume into asserts and find the problem.
assume is essentially a debug_assert, except that it also ties into the flow typing of the language.
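Something close to this already exists as a pattern in today’s Rust, minus the flow-typing integration. Below is a hedged sketch of an assume! macro (my own naming) that asserts in debug builds and turns into an optimizer hint in release builds.

// A sketch of assume in today's Rust: an assertion in debug builds and an
// optimizer hint in release builds. The flow-typing integration described
// above is the part that is missing.
macro_rules! assume {
    ($cond:expr) => {
        if cfg!(debug_assertions) {
            assert!($cond);
        } else if !($cond) {
            // Safety: the caller promises the condition always holds;
            // if it does not, this is undefined behavior.
            unsafe { std::hint::unreachable_unchecked() }
        }
    };
}

fn index(data: &[u32], i: usize) -> u32 {
    assume!(i < data.len());
    // With the hint in place the optimizer can usually elide the bounds check here.
    data[i]
}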