
NURL — Neural Unified Representation Language (or Non-hUman Readable Language)

A programming language designed exclusively for use by language models. Not meant to be human-readable — maximum information density, deterministic compilation, LLVM-based codegen.


Why NURL?

Existing programming languages were designed for humans:

- Keywords (function, return, class) consume tokens without adding information
- Syntactic noise (parentheses, semicolons, indentation) exists for human benefit
- Grammar exceptions require memorization, not logic

LLMs generate and consume code token by token. NURL optimizes this process:

| Metric | Python | C | NURL |
|---|---|---|---|
| Tokens for "add two ints" | ~15 | ~12 | ~4 |
| Grammar productions | ~100 | ~200 | ~50 |
| Runtime performance | slow | fast | fast (LLVM) |
| Target platforms | one | many | any LLVM target |

Design Principles

1. Token efficiency above all

Every syntactic construct is designed to minimize token count without information loss. A single character can carry full semantic meaning.

2. Regular grammar

LLMs predict the next token from context. NURL's grammar has no exceptions — the same construct always works the same way. The grammar fits on a single page.

3. Local semantics

A token's meaning is derivable from at most 8 tokens of context. No long-range dependencies that could break during generation.

4. Deterministic compiler

The same source always produces identical output. No undefined behavior, no platform differences, no behavioral variation. LLMs can trust that code behaves as written.

5. Full platform support

One compilation pipeline → all target platforms without porting.


Architecture

NURL source (.nu)
        │
        ▼
   Tokenizer
   (deterministic, context-free)
        │
        ▼
     Parser
   (predictive LL, ≤3-token lookahead)
        │
        ▼
   LLVM IR (.ll)
        │
        ▼
      clang
        │
   ┌────┴────────────┐
   ▼                 ▼
native            wasm32-wasi
(Linux/Win/macOS) (via WASI SDK)

The compiler (nurlc.nu) is written in NURL itself. The bootstrap runs it twice over its own source and requires byte-identical LLVM IR on both rounds before the build is accepted.
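The fixed-point acceptance test is simple to state: stage 1's IR and stage 2's IR must match byte for byte. A minimal Python sketch of that check (file names and function names are illustrative; the real logic lives in build.sh):

```python
import hashlib
from pathlib import Path

def ir_fingerprint(path: str) -> str:
    """SHA-256 of an emitted LLVM IR file, read as raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def bootstrap_is_fixed_point(stage1_ll: str, stage2_ll: str) -> bool:
    """Accept the build only if both self-compilation rounds
    produced byte-identical IR."""
    return ir_fingerprint(stage1_ll) == ir_fingerprint(stage2_ll)
```

Hashing rather than diffing keeps the check O(1) in memory even for large IR files; a plain `cmp`/`diff` in the build script achieves the same thing.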


Editor support

Syntax highlighting for VS Code / Windsurf is available in tooling/vscode-nurl/.

Install from VSIX:

  1. Ctrl+Shift+P → "Extensions: Install from VSIX..."
  2. Select tooling/vscode-nurl/nurl-0.1.0.vsix

The browser-based playground (see below) ships a Monaco port of the same tokenizer — no install required.


HTTP API & browser playground

A FastAPI container under api/ exposes the compiler over HTTP and hosts a Monaco-based playground that builds and runs NURL programs as WebAssembly (wasm32-wasi) directly in the browser via @bjorn3/browser_wasi_shim.

Endpoints

Build & run the container

From the repository root (the build context must be the repo root so the Dockerfile can access build.sh, compiler/, stdlib/, examples/, spec/, README.md):

docker build -f api/Dockerfile -t nurl-api:dev .
docker run --rm -p 8000:8000 nurl-api:dev
# → http://localhost:8000/         (playground)
# → http://localhost:8000/docs     (Swagger UI)

Pipeline inside the container

  1. nurlc <file.nu> → LLVM IR on stdout.
  2. The API rewrites the IR to match the wasm32-wasi ABI (renames @main → @__main_argc_argv, injects the target triple, inserts i32/i64 shims for malloc/puts to match libc signatures).
  3. clang --target=wasm32-wasi -O2 <ir>.ll /opt/nurl/stdlib/runtime.wasm.o -o out.wasm using the WASI SDK (24.0) bundled into the image.

The wasm-compiled NURL runtime (stdlib/runtime.wasm.o) is baked into the image at build time. See api/README.md for local-dev instructions without Docker.


Syntax — overview

NURL uses prefix notation. The structure is always:

OP ARG1 ARG2 ... ARGN
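Because every operator has a fixed arity, prefix notation needs no precedence rules and no grouping parentheses: the token stream alone determines the parse tree. A toy Python evaluator (binary integer ops only, not part of NURL) shows why:

```python
import operator

# each operator consumes exactly two operand expressions
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def eval_prefix(tokens):
    """Evaluate a prefix expression, consuming tokens left to right."""
    it = iter(tokens)

    def expr():
        t = next(it)
        if t in OPS:
            a = expr()          # fixed arity: exactly two operands,
            b = expr()          # so no ambiguity and no lookahead
            return OPS[t](a, b)
        return int(t)

    return expr()

eval_prefix("+ 1 * 2 3".split())   # 1 + (2 * 3) = 7, no parentheses needed
```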

Types (single letter)

i  — integer (64-bit, signed)
u  — integer (64-bit, unsigned)
f  — float   (64-bit)
b  — boolean
s  — string  (UTF-8, immutable)
v  — void
*T — pointer to T

Operators

:  — binding (variable / struct / enum / const decl)
=  — assignment
@  — function definition / aggregate constructor
→  — return type / arrow
.  — member access / indexing
( ) — function call
?  — ternary conditional  /  ?T — option type
?? — pattern match (exhaustive)
~  — loop / for-each / mutability prefix / bitwise complement
&  — and (logical i1, bitwise i64) / FFI decl prefix
|  — or (logical i1, bitwise i64)  / enum-decl separator / slice-literal separator
!  — logical NOT / Result type prefix (! T E)
\  — try-propagate / closure (lambda)
^  — explicit return
#  — type cast
Z  — sizeof
%  — trait / impl decl
$  — import decl
`  — string literal

Example: add two integers

@ add i a i b → i { ^ + a b }

Example: conditional

? > x 0
  `positive`
  `non-positive`

Example: loop

: i n 0
~ < n 10 {
  = n + n 1
}

Example: struct and member function

: Point { i x  i y }

@ dist Point p → f {
  ^ + * . p x . p x
      * . p y . p y
}

Example: function call

( add 3 4 )
( dist myPoint )

Example: mutability (default immutable)

: i x 10            // immutable — reassignment is a compile error
: ~ i counter 0     // mutable — ~ prefix
= counter + counter 1

Example: enum + pattern match

: | Json { JNull  JBool b  JNum i  JStr s }

@ describe Json v → s {
  ^ ?? v {
      JNull  → `null`
      JBool x → ? x `true` `false`
      JNum n → ( nurl_str_int n )
      JStr s → s
    }
}

Example: slice literal + for-each

: [i nums [ i | 1 2 3 4 5 ]
: i total 0
~ n nums { = total + total n }

Example: closure (lambda)

: (@ i i) square \ i x → i { * x x }
( apply square 7 )          // 49

Example: Result type + try-propagate

@ parse s src → ! i ParseErr { ... }

@ sum_two s a s b → ! i ParseErr {
  : i x \ ( parse a )       // `\` unwraps Ok, propagates Err
  : i y \ ( parse b )
  ^ @ ! i ParseErr { + x y }
}

Example: trait with default method

% Shape [T] {
  @ area T obj → i                 // required
  @ describe T obj → i {           // default body
    ( nurl_print ( nurl_str_int ( area obj ) ) )
    ^ 0
  }
}

% Shape Rect { @ area Rect r → i { ^ * . r w . r h } }

Token efficiency in practice

Comparison: sum the numbers 1–100.

Python (~46 tokens):

def sum_to_hundred():
    total = 0
    for i in range(1, 101):
        total += i
    return total

NURL (~13 tokens):

@ sumto i n → i {
  : i acc 0
  : i k   1
  ~ <= k n { = acc + acc k  = k + k 1 }
  ^ acc
}

Memory model

The current compiler is deliberately minimal:


Type system


Target platforms

The compiler emits LLVM IR and delegates native codegen to clang, so any target clang supports is reachable in principle. Only the first two are exercised by the build scripts today.

| Platform | Backend | Status |
|---|---|---|
| Linux x86_64 | LLVM | primary dev target — build.sh + tests |
| Windows x86_64 | LLVM | fully supported — build.bat runs the same bootstrap + snapshot test suite as build.sh |
| macOS ARM64 | LLVM | should work via clang; untested |
| WebAssembly | wasm32-wasi | supported via the api/ container (WASI SDK 24.0); browser execution via browser_wasi_shim |
| Android / iOS | LLVM cross | planned |
| Embedded (no_std) | LLVM | planned |
| JVM | JVM bytecode | future |
| .NET CLR | CIL | future |

Project structure

nurl/
├── spec/                      — formal language specification
│   ├── grammar.ebnf           ✓ current (v1.1)
│   ├── grammar_v0.1.ebnf …    — historical snapshots (v0.1 → v1.0)
│   ├── types.md
│   ├── ir.md
│   └── bootstrapping.md
├── compiler/
│   ├── nurlc.nu               ✓ self-hosting compiler, written in NURL
│   ├── nurlc.py               — Python bootstrap compiler
│   ├── src/                   — Python compiler internals
│   │   ├── lexer.py
│   │   ├── parser.py
│   │   ├── typechecker.py
│   │   ├── ir_gen.py
│   │   └── llvm_gen.py
│   └── tests/                 — 80+ `.nu` test programs + snapshot runner
│       ├── run_tests.sh       — Linux/macOS test runner
│       ├── run_tests.bat      — Windows test runner
│       ├── correct.txt        — golden baseline (status + output per test)
│       └── *.nu               — positive and negative tests
├── stdlib/
│   ├── runtime.c              ✓ C runtime (I/O, string helpers, FFI surface)
│   ├── runtime.o              — native host build
│   └── runtime.wasm.o         — wasm32-wasi build (produced inside the API image)
├── examples/                  — curated `.nu` programs surfaced by the playground
│   ├── showcase.nu  calculator.nu  fizzbuzz.nu  collatz.nu  wordcount.nu
│   └── enigma.nu  slice_test.nu  test_05_closures_and_capture.nu …
├── api/                       — FastAPI container (compiler-as-a-service + playground)
│   ├── Dockerfile             — multi-stage build; installs WASI SDK; bootstraps nurlc
│   ├── app/main.py            — endpoints, IR-rewrite shims, docs rendering
│   ├── static/index.html      — Monaco-based playground, runs wasm in-browser
│   └── requirements.txt
├── tooling/
│   └── vscode-nurl/           — VS Code / Windsurf syntax-highlighting extension
├── build/                     — all bootstrap artefacts land here
│   ├── nurlc_py(.ll)          — stage 0: Python-compiled `nurlc.nu`
│   ├── nurlc_self(.ll)        — stage 1: self-compiled
│   ├── nurlc_self2(.ll)       — stage 2: fixed-point check
│   └── nurlc                  — final self-hosting binary
├── build.sh / build.bat       — full bootstrap + test-suite driver
├── clean.sh / clean.bat       — remove build artefacts
├── nurl.sh  / nurl.bat        — convenience wrapper to compile a `.nu` file
└── nurlc                      — symlink to build/nurlc (Linux/macOS)

Roadmap

Current language version: grammar v1.1 (spec/grammar.ebnf). Historical grammar snapshots are kept under spec/ (grammar_v0.1.ebnf through grammar_v1.0.ebnf).

Phase 8 — Grammar v0.7: type safety and error handling

Phase 9 — Grammar v0.8: closures and mutability

Phase 10 — Grammar v0.9 → v1.1: trait defaults, literal matches, fat pointers, auto-drop, modules


Building

Prerequisites

| Tool | Purpose |
|---|---|
| Python 3.8+ | Python reference compiler (compiler/nurlc.py) |
| clang / LLVM 14+ | Compile LLVM IR (.ll) to native binary |

Windows

Install LLVM from llvm.org/releases (choose the Windows installer for the latest stable release). The installer adds clang.exe and related tools to PATH.

You can use Command Prompt, PowerShell, or Git Bash for the commands below.

Linux (Debian / Ubuntu)

sudo apt install python3 clang

Linux (Fedora / RHEL)

sudo dnf install python3 clang

macOS

brew install llvm
# Add LLVM to PATH for this shell (add to ~/.zshrc or ~/.bash_profile to persist):
export PATH="$(brew --prefix llvm)/bin:$PATH"

Step 1 — Build the C runtime (once)

# Linux / macOS
clang -c stdlib/runtime.c -o stdlib/runtime.o

# Windows (CMD / PowerShell)
clang -c stdlib\runtime.c -o stdlib\runtime.o

stdlib/runtime.o is already checked in; rebuild it only if you modify runtime.c.


Step 2 — Bootstrap the self-hosting compiler

Use the automated build scripts to bootstrap the compiler and verify stability:

# Linux / macOS
./build.sh

# Windows (CMD / PowerShell)
build.bat

The build script performs a complete bootstrap process:

  1. Compiles nurlc.nu with the Python reference compiler → build/nurlc_py
  2. Compiles nurlc.nu with the stage-0 binary → build/nurlc_self (stage 1)
  3. Compiles nurlc.nu with stage 1 → build/nurlc_self2 (stage 2)
  4. Verifies stages 1 and 2 produce byte-identical LLVM IR (bootstrap fixed point)
  5. Copies stage 2 to build/nurlc and symlinks it at the repo root
  6. Runs the snapshot test suite (compiler/tests/run_tests.sh on Linux/macOS, compiler/tests/run_tests.bat on Windows) and diffs against correct.txt

All build artefacts are stored under build/. The run prints BUILD SUCCESS & TESTS PASSED on success, or the full log / diff on failure.

Clean build artifacts:

# Linux / macOS
./clean.sh

# Windows (CMD / PowerShell)  
clean.bat

Manual build (if needed):

# Create build directory
mkdir -p build  # Linux/macOS
mkdir build     # Windows

# Generate LLVM IR using Python compiler  
python compiler/nurlc.py --llvm compiler/nurlc.nu > build/nurlc.ll

# Link into native binary
clang build/nurlc.ll stdlib/runtime.o -o build/nurlc      # Linux/macOS
clang build\nurlc.ll stdlib\runtime.o -o build\nurlc.exe  # Windows

Compile any .nu file

Recommended (automated):

# Linux / macOS
./nurl.sh myprogram.nu              # Creates myprogram binary
./nurl.sh myprogram.nu myoutput     # Creates myoutput binary  

# Windows
nurl.bat myprogram.nu               # Creates myprogram.exe
nurl.bat myprogram.nu myoutput      # Creates myoutput.exe  

Manual (two-step):

# Linux / macOS
./nurlc myprogram.nu > myprogram.ll          # or build/nurlc
clang myprogram.ll stdlib/runtime.o -o myprogram
./myprogram

# Windows
nurlc.exe myprogram.nu > myprogram.ll        # or build\nurlc.exe
clang myprogram.ll stdlib\runtime.o -o myprogram.exe
myprogram.exe

Python reference compiler vs self-hosting compiler

The Python reference compiler (compiler/nurlc.py) exists solely to bootstrap the self-hosting compiler. It implements the subset of grammar v1.1 that nurlc.nu itself uses — structs, functions, the :/=/@/^/?/~/(/./# core, basic traits and impls — and omits most of the features added in Groups D–F:

Anything beyond the bootstrap subset must be compiled with the self-hosted build/nurlc binary. The Python compiler is not a user-facing tool.


Known Limitations

The following are known limitations of the current compiler (nurlc.nu, grammar v1.1). They reflect deliberate scope decisions rather than bugs, and are tracked for future work.

Type system

| Limitation | Workaround |
|---|---|
| Single-letter type keywords (i u f b s v) cannot be used as variable names with type inference | Use an explicit type annotation: : i n expr |
| No sized types (i8, u32, f64 …) — lexer emits i + 8 as two tokens | Use base types (i, f) and cast with # |
| zext / trunc casts not implemented — i1 cannot be widened to i64 directly | Use nurl_print_bool for boolean output; avoid mixing i1 and i64 |
| ! T E payload is stored as i64; complex T/E types (structs with payloads > 8 bytes) may not round-trip correctly through # cast | Use base types and simple enums (tag-only) as T and E |

Functions and calls

| Limitation | Workaround |
|---|---|
| Variadic functions (e.g. printf) cannot be declared via ffi_decl — LLVM IR varargs syntax (...) is not generated | Use nurl_print_* builtins; declare specific non-variadic wrappers in C |
| No tail-call optimisation — deep recursion may stack-overflow | Use explicit loops (~) |
| Closures capture by value (snapshot at construction); mutating an enclosing local after the closure is built does not affect the captured copy | Keep mutation explicit; pass the current value as a parameter or return the new state |
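Capture-by-value means a closure sees a snapshot, not the live variable. Python closures capture by reference, so the closest Python analogue of NURL's behaviour is the default-argument trick; the contrast below is illustrative only:

```python
def reference_capture():
    n = 1
    f = lambda: n          # captures the variable itself (Python's behaviour)
    n = 2
    return f()             # 2: the closure sees the later mutation

def snapshot_capture():
    n = 1
    f = lambda n=n: n      # value frozen at construction, like a NURL closure
    n = 2
    return f()             # 1: mutation after capture is invisible
```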

Enums

| Limitation | Workaround |
|---|---|
| Enum variants with a named-struct payload require the struct to be declared before the enum in the same file — forward references are not supported | Order declarations: structs first, enums after |
| Pattern matching binds at most 2 payload variables per arm — variants with 3+ payloads cannot fully destructure in a single arm | Access additional payload fields via separate . extraction after matching |

Imports

| Limitation | Workaround |
|---|---|
| import_decl is a static inline-include (like #include) — the imported file is compiled into the same LLVM module | Avoid importing files that define main; avoid circular imports |
| Import alias ($ `path` alias) is parsed but ignored — all imported names land in the global namespace | Prefix imported names manually (e.g. math_sin, math_cos) |
| No duplicate-include guard — importing the same file twice emits duplicate definitions | Import each file at most once |

Grammar

| Limitation | Workaround |
|---|---|
| Negative integer literals cannot be written directly — -1 tokenises as MINUS INT(1) | Use ~ 0 (bitwise complement) for -1; compute negatives as - 0 n |
| No automatic memory management — heap-allocated values (slice literals, strcat results, etc.) are not freed | Call free via FFI when needed; keep values on the stack where possible |
| Import is inline-include only: no namespaces, no duplicate guard, alias parsed but ignored | Import each file at most once; prefix names manually |
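Both negative-literal workarounds rest on ordinary two's-complement arithmetic, checkable in any language (Python shown here purely as a calculator):

```python
# bitwise complement of 0 is -1 in two's complement: ~x == -x - 1
assert ~0 == -1

# "compute negatives as - 0 n": negation is subtraction from zero
n = 42
assert 0 - n == -42
```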

LLM integration

NURL is designed so that:

  1. Generation is reliable — grammar regularity reduces hallucinations
  2. Errors are local — a bug in one expression does not propagate
  3. Context window is sufficient — a complete program fits in an LLM's context
  4. Diffing is easy — changes are small and localized
  5. Round-trips work — code → explanation → code preserves semantics

Name

NURL = Neural Unified Representation Language

Also: NURL = Non-hUman Readable Language

File extension: .nu