Show HN: (bits) of a Libc, Optimized for Wasm

Apr 18, 2025 - 19:51

I develop a CGO-free Go SQLite driver. It works by compiling the SQLite amalgamation to Wasm, then loading the result with wazero (a CGO-free Wasm runtime).

To compile SQLite I use wasi-sdk, whose libc, wasi-libc, is based on musl. It's been said that musl is slow (or at least slower than glibc), which is true, to a point.

musl implements various string.h functions with SWAR (SIMD within a register), operating on size_t words. This is fine, except that on Wasm size_t is only 32 bits, so each step handles just 4 bytes.
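
To make the word-width point concrete, here is a minimal SWAR sketch in the style of musl's string.h code (not musl's actual source), assuming a 32-bit word as on wasm32. The name swar32_memchr is hypothetical. HASZERO is the classic bit trick: it is nonzero exactly when some byte of the word is zero, so XOR-ing the word with the target byte broadcast to every lane turns "find byte c" into "find a zero byte", 4 bytes per step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SWAR constants for a 32-bit word, the width of size_t on wasm32. */
#define ONES  UINT32_C(0x01010101)
#define HIGHS UINT32_C(0x80808080)
/* Nonzero iff some byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical memchr-style search that tests 4 bytes per step. */
const void *swar32_memchr(const void *src, int c, size_t n) {
    const unsigned char *s = src;
    unsigned char b = (unsigned char)c;
    uint32_t k = ONES * b; /* b broadcast into every byte of the word */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* unaligned load without aliasing UB */
        if (HASZERO(w ^ k)) break; /* some byte of this word equals b */
        s += 4;
        n -= 4;
    }
    /* Locate the match inside the flagged word, or scan the short tail. */
    for (; n; s++, n--)
        if (*s == b) return s;
    return NULL;
}
```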

I found that rewriting a few of those functions with Wasm SIMD128, which operates on 128-bit (16-byte) vectors, can make them around 4x faster.
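
A 16-byte version of the same search might look like the sketch below; it is not the post's actual code, and simd_memchr is a hypothetical name. It uses GCC/Clang generic vector extensions as a portable stand-in so it also compiles natively; with wasi-sdk one would more likely use wasm_simd128.h intrinsics such as wasm_v128_load, wasm_i8x16_splat, wasm_i8x16_eq and wasm_v128_any_true. Each step examines 16 bytes instead of 4, which is roughly in line with the ~4x figure over 32-bit SWAR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 16-byte vector, the same shape as a Wasm v128 viewed as i8x16.
   GCC/Clang generic vectors stand in for wasm_simd128.h here. */
typedef unsigned char u8x16 __attribute__((vector_size(16)));
typedef signed char   s8x16 __attribute__((vector_size(16)));

/* Hypothetical memchr-style search that tests 16 bytes per step. */
const void *simd_memchr(const void *src, int c, size_t n) {
    const unsigned char *s = src;
    unsigned char b = (unsigned char)c;
    u8x16 k;
    memset(&k, b, sizeof k);          /* splat b into all 16 lanes */
    while (n >= 16) {
        u8x16 v;
        memcpy(&v, s, sizeof v);      /* unaligned load without aliasing UB */
        s8x16 eq = (s8x16)(v == k);   /* lane is -1 where byte == b, else 0 */
        uint64_t half[2];
        memcpy(half, &eq, sizeof half);
        if (half[0] | half[1]) break; /* some lane matched */
        s += 16;
        n -= 16;
    }
    /* Locate the match inside the flagged chunk, or scan the short tail. */
    for (; n; s++, n--)
        if (*s == b) return s;
    return NULL;
}
```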

Other functions don't use SWAR at all; rewriting those can make them 16x faster.
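
The post doesn't name these functions. As a hypothetical illustration, consider a comparison that advances one byte per loop iteration, like a naive memcmp; handling 16 bytes per step is where a speedup approaching 16x could come from. simd_memcmp is a hypothetical name, and the vector type is again a GCC/Clang extension standing in for Wasm SIMD128.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 x 8-bit lanes; a portable stand-in for a Wasm v128. */
typedef unsigned char u8x16 __attribute__((vector_size(16)));

/* Hypothetical memcmp that skips equal 16-byte chunks in one step. */
int simd_memcmp(const void *a, const void *b, size_t n) {
    const unsigned char *l = a, *r = b;
    while (n >= 16) {
        u8x16 x, y;
        memcpy(&x, l, sizeof x);
        memcpy(&y, r, sizeof y);
        u8x16 d = x ^ y;              /* all lanes zero iff chunks are equal */
        uint64_t half[2];
        memcpy(half, &d, sizeof half);
        if (half[0] | half[1]) break; /* this chunk holds the first difference */
        l += 16;
        r += 16;
        n -= 16;
    }
    /* Byte-by-byte: find the first differing byte, or run out of input. */
    for (; n && *l == *r; n--, l++, r++);
    return n ? *l - *r : 0;
}
```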

Smoothsort, which musl uses for qsort, also struggles to pull its own weight: a Shell sort seems both simpler and faster, while likewise avoiding recursion, allocations, and the addressable stack.
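
A minimal qsort-compatible Shell sort might look like the sketch below. The gap sequence (Knuth's 1, 4, 13, 40, and so on) and the names shell_qsort and swap_bytes are assumptions, not necessarily what the post's code uses. The loops are iterative, nothing is allocated, and elements are swapped one byte at a time through a scalar temporary, so no element-sized buffer has to live on the addressable stack.

```c
#include <stddef.h>

/* Swap two elements byte by byte; the temporary is a scalar local,
   so no buffer needs an address in (Wasm linear-memory) stack space. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size) {
    while (size--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Hypothetical Shell sort with the same signature as qsort. */
void shell_qsort(void *base, size_t nel, size_t width,
                 int (*cmp)(const void *, const void *)) {
    unsigned char *a = base;
    size_t gap = 1;
    while (gap < nel / 3) gap = 3 * gap + 1; /* largest gap: 1, 4, 13, 40, ... */
    for (; gap > 0; gap /= 3) {              /* 3h+1 divided by 3 gives back h */
        for (size_t i = gap; i < nel; i++) {
            /* Gapped insertion sort, done with swaps instead of shifts. */
            for (size_t j = i; j >= gap; j -= gap) {
                unsigned char *p = a + (j - gap) * width;
                unsigned char *q = a + j * width;
                if (cmp(p, q) <= 0) break;
                swap_bytes(p, q, width);
            }
        }
    }
}

/* Example comparator for ints. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Swapping instead of shifting costs some extra writes, but it avoids needing a temporary element of arbitrary width, which would otherwise require either an allocation or a variable-length buffer on the stack.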

I found that using SIMD intrinsics (rather than SWAR) also makes it easier to avoid undefined behavior (UB), but the code would definitely benefit from more eyeballs.

Some benchmarks, on both x86-64 and AArch64, are here: https://github.com/ncruces/go-sqlite3/actions/runs/145169318...


Comments URL: https://news.ycombinator.com/item?id=43730458

Points: 10

# Comments: 3