Code Size
How much code does it take to express the same idea?
We count three things: lines of code (LOC), tokens (whitespace-separated words), and compression ratio (gzip size divided by raw size; lower means more repetitive, boilerplate-heavy code).
Results
| Language | Avg Lines | Avg Tokens | Compression Ratio |
|---|---|---|---|
| Clojure | 8 | 36 | 0.7 |
| Erlang | 12 | 48 | 0.6 |
| Objective-C | 14 | 59 | 0.5 |
| Ruby | 14.3 | 41.6 | 0.7 |
| Python | 14.6 | 43.6 | 0.6 |
| JavaScript | 15.7 | 54.3 | 0.6 |
| TypeScript | 15.9 | 58.3 | 0.6 |
| C# | 16 | 54 | 0.6 |
| Elixir | 17.4 | 50.1 | 0.6 |
| Kotlin | 19.7 | 57 | 0.6 |
| Haskell | 21.9 | 107.4 | 0.5 |
| Swift | 22.3 | 76.7 | 0.5 |
| Rust | 22.6 | 67.3 | 0.5 |
| Go | 26.6 | 77.1 | 0.6 |
| Java | 26.6 | 81.4 | 0.5 |
| Milo | 26.9 | 101.3 | 0.5 |
| C++ | 27.9 | 88.4 | 0.5 |
| C | 39.7 | 146 | 0.4 |
| Zig | 39.9 | 158.7 | 0.4 |
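To put the Avg Lines column in ratio terms, here is a small Python sketch that divides each language's average by the smallest one in the table. The helper name `relative_size` is ours for illustration, not part of the benchmark; the numbers are copied from the table.

```python
# Avg Lines column copied from the Results table.
AVG_LINES = {
    "Clojure": 8, "Erlang": 12, "Objective-C": 14, "Ruby": 14.3,
    "Python": 14.6, "JavaScript": 15.7, "TypeScript": 15.9, "C#": 16,
    "Elixir": 17.4, "Kotlin": 19.7, "Haskell": 21.9, "Swift": 22.3,
    "Rust": 22.6, "Go": 26.6, "Java": 26.6, "Milo": 26.9, "C++": 27.9,
    "C": 39.7, "Zig": 39.9,
}


def relative_size(table: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: each average divided by the smallest average."""
    base = min(table.values())
    return {lang: round(lines / base, 2) for lang, lines in table.items()}


if __name__ == "__main__":
    rel = relative_size(AVG_LINES)
    # Print from most to least concise.
    for lang, ratio in sorted(rel.items(), key=lambda kv: kv[1]):
        print(f"{lang:12} {ratio:.2f}x")
```

With these numbers Zig needs roughly 5× the lines of Clojure, and about 2.8× the lines of Ruby.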
What drives the differences?
Ruby and Python sit near the top because:
- Little boilerplate (no `main` function, no imports for builtins, no required type annotations)
- Rich standard library (`Counter`, `tally`, `ThreadPoolExecutor`: one-liners for complex operations)
- Minimal syntax (Python's indentation-based blocks, Ruby's implicit returns)
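As an illustration of the standard-library point, the Python sketch below shows the kind of one-liner meant: `collections.Counter` tallies items in a single call (Ruby's `Enumerable#tally` plays the same role), and `concurrent.futures.ThreadPoolExecutor` fans work out to a thread pool without hand-written thread management. The function names `top_words` and `square_all` are hypothetical, and these are not programs from the benchmark itself.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    # Counter tallies the words; most_common sorts by count, breaking ties
    # by the order in which each word was first seen.
    return Counter(text.split()).most_common(n)


def square_all(nums: list[int]) -> list[int]:
    # pool.map runs the callable on worker threads and returns results in
    # input order. (Threads mainly help I/O-bound work; squaring is just a stand-in.)
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: x * x, nums))


if __name__ == "__main__":
    print(top_words("the cat and the hat and the bat"))
    print(square_all([1, 2, 3, 4]))
```

The equivalent in C would need a hand-rolled hash table and explicit thread creation and joining, which is exactly where the line counts diverge.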
C/Zig lose because:
- No built-in collections (C has no standard hash map or growable array, and Zig's `std.ArrayList` and `std.AutoHashMap` require an explicitly passed allocator)
- Manual memory management
- Minimal string operations (parsing often goes character by character)
- Manual threading/concurrency infrastructure
The interesting middle: Kotlin (19.7 lines) stays within about four lines of JavaScript (15.7) despite being statically typed, and well below Java (26.6) on the same JVM. Extension functions and expression-bodied syntax cut much of the ceremony you'd expect from a JVM language.
The gap widens on real-world problems
On simple algorithmic problems, languages cluster within 2× of each other. On real-world problems involving I/O, JSON, HTTP, and concurrency, the gap stretches to 3 to 5×. Real programs exercise the standard library, error model, and concurrency primitives, which is where languages diverge most.
How we count
LOC: Non-blank lines. Includes imports and main function boilerplate.
Tokens: Whitespace-separated words, so `let x: i32 = 5;` counts as 5 tokens (`let`, `x:`, `i32`, `=`, `5;`).
Compression Ratio: `len(gzip(source)) / len(source)`. Lower means more repetitive code (boilerplate); higher means more information-dense code.
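The three definitions above can be sketched in Python as follows. This is a minimal reading of the rules, not the benchmark's actual tooling: the function names are hypothetical, comment lines are counted like any other non-blank line (the write-up doesn't say whether comments are excluded), and the compression uses Python's default `gzip.compress` settings, whose fixed header and trailer (about 18 bytes) inflate the ratio for very short files.

```python
import gzip


def count_loc(source: str) -> int:
    # Non-blank lines; imports and main() boilerplate count like any other line.
    return sum(1 for line in source.splitlines() if line.strip())


def count_tokens(source: str) -> int:
    # str.split() with no argument splits on any run of whitespace.
    return len(source.split())


def compression_ratio(source: str) -> float:
    # Compressed size over raw size, both measured in UTF-8 bytes.
    raw = source.encode("utf-8")
    if not raw:
        raise ValueError("compression ratio is undefined for an empty file")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    snippet = "def main():\n\n    print('hi')\n\nmain()\n"
    print(count_loc(snippet), count_tokens(snippet))
    # Highly repetitive text compresses far better than varied text.
    print(round(compression_ratio("print('hello')\n" * 100), 3))
```

The last line shows why boilerplate pulls the ratio down: a hundred identical lines shrink to a small fraction of their raw size.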