Code Size

How much code does it take to express the same idea?

We count three things: lines of code (LOC), tokens (whitespace-separated words), and compression ratio (gzip size / raw size — lower means more repetitive/boilerplate-y code).

Results

Language    Avg Lines   Avg Tokens   Compression Ratio
Clojure     8           36           0.7
Erlang      12          48           0.6
Objc        14          59           0.5
Ruby        14.3        41.6         0.7
Python      14.6        43.6         0.6
Javascript  15.7        54.3         0.6
Typescript  15.9        58.3         0.6
Csharp      16          54           0.6
Elixir      17.4        50.1         0.6
Kotlin      19.7        57           0.6
Haskell     21.9        107.4        0.5
Swift       22.3        76.7         0.5
Rust        22.6        67.3         0.5
Go          26.6        77.1         0.6
Java        26.6        81.4         0.5
Milo        26.9        101.3        0.5
Cpp         27.9        88.4         0.5
C           39.7        146          0.4
Zig         39.9        158.7        0.4

What drives the differences?

Ruby and Python rank near the top (second and third by token count, behind only Clojure) because:

  • No boilerplate (no main function, no imports for builtins, no type annotations required)
  • Rich standard library (Counter, tally, ThreadPoolExecutor — one-liners for complex operations)
  • Minimal syntax (whitespace blocks, implicit returns)
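To make the standard-library point concrete, here is a minimal Python sketch of two of the one-liners listed above. The word list and the square function are hypothetical stand-ins; in a real benchmark task the mapped function would more likely be an HTTP request or a file read. (Ruby's Enumerable#tally plays the same role as Counter.)

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Word frequencies in one line with collections.Counter.
words = "the cat and the hat and the bat".split()  # hypothetical input
counts = Counter(words)
print(counts.most_common(2))  # → [('the', 3), ('and', 2)]


def square(n: int) -> int:
    # Hypothetical stand-in for an I/O-bound task such as an HTTP request.
    return n * n


# Run tasks on a thread pool; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, range(5)))
print(squares)  # → [0, 1, 4, 9, 16]
```

The equivalent in C means writing a hash table and a thread pool by hand, which is where much of the line-count gap comes from.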

C/Zig lose because:

  • No built-in collections in C, and only allocator-bound ones in Zig (a hash map or dynamic array needs manual allocation or an explicitly passed allocator)
  • Manual memory management
  • Few string operations (parsing often proceeds character by character)
  • Manual threading/concurrency infrastructure

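The interesting middle: Kotlin (19.7 lines, 57 tokens) stays close to JavaScript (15.7 lines, 54.3 tokens) on token count despite being statically typed, and well below Java (26.6 lines, 81.4 tokens). Extension functions and expression-bodied syntax eliminate the ceremony you'd expect from a JVM language.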

The gap widens on real-world problems

On simple algorithmic problems, languages cluster within 2× of each other. On real-world problems with I/O, JSON, HTTP, and concurrency, the gap stretches to 3-5×. Real programs exercise the standard library, error model, and concurrency primitives — that's where languages diverge most.

How we count

LOC: Non-blank lines. Includes imports and main function boilerplate.
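A minimal sketch of the LOC rule in Python, assuming "non-blank" means a line containing at least one non-whitespace character. The page doesn't say whether comment-only lines are excluded; this sketch counts them.

```python
def count_loc(source: str) -> int:
    """Count non-blank lines; imports and main-function boilerplate count like any other line."""
    # A line is blank if it holds only whitespace (spaces, tabs, a stray \r).
    return sum(1 for line in source.splitlines() if line.strip())


example = """import sys

def main():
    print("hi")

main()
"""
print(count_loc(example))  # → 4
```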

Tokens: Whitespace-separated words, so "let x: i32 = 5;" counts as 5 tokens (let, x:, i32, =, 5;).
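The token rule amounts to a whitespace split. A sketch in Python, assuming any run of spaces, tabs, or newlines separates tokens:

```python
def count_tokens(source: str) -> int:
    # str.split() with no argument splits on runs of whitespace and drops
    # empty strings, so indentation and blank lines add no tokens.
    return len(source.split())


print(count_tokens("let x: i32 = 5;"))  # → 5
```

Punctuation stays attached to its neighbor, so "x:" and "5;" are single tokens. A side effect is that spacing style matters: "a = b" is 3 tokens, while "a=b" is 1.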

Compression Ratio: len(gzip(source)) / len(source). Lower means more repetitive code (boilerplate); higher means more information-dense code.
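A Python sketch of the ratio, assuming the source is measured as UTF-8 bytes and compressed with gzip.compress at its default level (the page specifies neither):

```python
import gzip


def compression_ratio(source: str) -> float:
    """Compressed size divided by raw size, both in bytes."""
    raw = source.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty source")
    return len(gzip.compress(raw)) / len(raw)


boilerplate = "x = 0\n" * 200  # highly repetitive
varied = "\n".join(f"v{i} = {i * 7919 % 1000}" for i in range(200))
print(compression_ratio(boilerplate) < compression_ratio(varied))  # → True
```

gzip adds a fixed 18-byte header and trailer, so very short snippets can score above 1; on whole programs that overhead is small.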