Compiling an alpha idea: the stages between notebook and prod

Skelf-Research · November 12, 2025

compiler · workflow · production

A trading signal does not graduate from notebook to production in one step. It graduates in stages, and the stages are the same on every desk we have seen, in the same order, with the same questions asked at each one. The thing that varies is how much custom infrastructure each desk has to build to move between them.

sigc collapses that infrastructure into four compiler stages and one binary. This post walks through what each stage actually does, what it catches, and what it hands off to the next. It is meant to be a literal description of what runs when you type sigc run momentum.sig, then sigc daemon, then point a cron job at it.

Stage 0: the notebook

This is not a sigc stage; it is the input. A researcher has an idea — say, twelve-month minus one-month momentum, z-scored cross-sectionally, neutralised by sector, long the top quintile and short the bottom quintile, rebalanced monthly. They sketch it in Jupyter against the closest sample of prices they have. They get a Sharpe number. The number is plausible.

The notebook is where the idea lives. It is the wrong place for the implementation to live, because notebooks have no compiler, no cache, no version of the operator that anyone else will reach for, and no clean way to swap the data source out. The whole point of stages 1 through 4 is to extract the implementation from the notebook without losing the idea.

In sigc, the artefact that comes out of the notebook is a small .sig file. For the signal above, it looks roughly like this:

data:
  px:  load price from "s3://example/prices.parquet" adjust=split_div
  sec: load sector from "s3://example/sector.parquet" dtype=category

params:
  lookback = 252
  skip     = 21

signal momentum:
  total_ret = ret(px, lookback)
  skip_ret  = ret(px, skip)
  mom       = total_ret - skip_ret
  xs        = zscore(as_xs(mom))
  emit neutralize(xs, by=sec)

portfolio longshort:
  weights = rank(momentum).long_short(top=0.2, bottom=0.2, cap=0.02)
  costs   = tc.bps(5) + slippage.model("square-root", coef=0.1)
  backtest rebal=21d benchmark=SPY from 2015-01-01 to 2024-12-31

This is what the researcher hands off. It is the same artefact through the next four stages.
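For readers who want to see what the signal block computes on a single date, here is a minimal plain-Python sketch. It is not sigc code. The symbols, returns, and sector labels are made up, and two details are assumptions: that zscore uses the population standard deviation, and that neutralize subtracts each sector's mean.

```python
from statistics import mean, pstdev

def zscore_xs(values):
    """Cross-sectional z-score over one date: (x - mean) / std.
    Assumption: population standard deviation."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    return {sym: (v - mu) / sigma for sym, v in values.items()}

def neutralize(values, groups):
    """Subtract each group's mean so every group nets to zero.
    Assumption: neutralisation is simple group demeaning."""
    by_group = {}
    for sym, v in values.items():
        by_group.setdefault(groups[sym], []).append(v)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return {sym: v - group_mean[groups[sym]] for sym, v in values.items()}

# Hypothetical one-date snapshot: (12-month return, 1-month return) per symbol.
returns = {
    "AAA": (0.30, 0.02),
    "BBB": (0.10, 0.04),
    "CCC": (-0.05, -0.01),
    "DDD": (0.20, 0.00),
}
sector = {"AAA": "tech", "BBB": "tech", "CCC": "energy", "DDD": "energy"}

# mom = total_ret - skip_ret, as in the .sig file above.
mom = {sym: total - skip for sym, (total, skip) in returns.items()}
signal = neutralize(zscore_xs(mom), sector)
```

After neutralisation the scores inside each sector sum to zero, so the long-short portfolio built from them carries no net sector bet.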

Stage 1: parse

sig_compiler reads the file and emits an AST. This is the boring stage. It exists mostly so that the next three stages have a structured thing to chew on.

What it catches: syntax errors, mis-named blocks, malformed expressions, duplicate signal names. The errors are small, mechanical, fast to fix. They are also the kind of error that, in a notebook, would have been a stray : in the wrong place; here they raise before any data is touched.
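To make the duplicate-name check concrete, here is a toy sketch in Python. It is not the sig_compiler parser: the regular expression only recognises block headers shaped like the ones in the example file, and the function name and error wording are invented for illustration.

```python
import re

# Assumption: block headers look like "signal <name>:" or "portfolio <name>:".
BLOCK_HEADER = re.compile(r"^(signal|portfolio)\s+(\w+):\s*$")

def find_duplicate_blocks(source):
    """Return one error message per block name that is defined twice."""
    first_seen = {}  # (kind, name) -> line number of the first definition
    errors = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = BLOCK_HEADER.match(line)
        if match is None:
            continue
        kind, name = match.groups()
        if (kind, name) in first_seen:
            errors.append(
                f"line {lineno}: duplicate {kind} '{name}' "
                f"(first defined on line {first_seen[(kind, name)]})"
            )
        else:
            first_seen[(kind, name)] = lineno
    return errors

src = "signal momentum:\n  emit a\n\nsignal momentum:\n  emit b\n"
errors = find_duplicate_blocks(src)
```

The check needs only the text of the file, which is why it can run before any data is touched.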

Stage 2: type-check

This is where the real value of having a compiler shows up. sig_compiler traverses the AST and assigns types and shapes to every named expression. px is a Series<price> indexed by (date, symbol). sec is a Series<category> over the same index. ret(px, 252) returns a Series<float> over the same index, shifted by 252 trading days. as_xs(...) declares that the next operator should treat the input as cross-sectional. zscore, neutralize, and rank.long_short each carry typing rules about what input shape they accept and what output shape they produce.

A signal that references a column that does not exist fails here. A neutralize(xs, by=sec) where sec and xs have different indices fails here. A portfolio that emits weights against a benchmark with a different calendar fails here. These are the errors that, in a notebook, surface as a NaN row at 3am.

A failing type-check is fast to fix because the message names the operator, the expected shape, and the observed shape. A passing type-check is a precondition for everything downstream: the intermediate representation (IR) can be hashed, the cache can be keyed, the runtime can plan.
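A typing rule of this kind can be sketched in a few lines of Python. This is an illustration, not the sig_compiler implementation: SeriesType, TypeCheckError, and check_neutralize are hypothetical names, and the rule shown (float input, categorical grouping, matching index) is inferred from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SeriesType:
    dtype: str    # e.g. "float", "price", "category"
    index: tuple  # index level names, e.g. ("date", "symbol")

class TypeCheckError(Exception):
    """Raised before any data is loaded."""

def check_neutralize(xs, by):
    """Hypothetical typing rule for neutralize(xs, by=...)."""
    if xs.dtype != "float":
        raise TypeCheckError(
            f"neutralize: expected Series<float> input, observed Series<{xs.dtype}>"
        )
    if by.dtype != "category":
        raise TypeCheckError(
            f"neutralize: expected Series<category> for 'by', observed Series<{by.dtype}>"
        )
    if xs.index != by.index:
        raise TypeCheckError(
            f"neutralize: expected 'by' indexed by {xs.index}, observed {by.index}"
        )
    # The output keeps the input's dtype and index.
    return SeriesType("float", xs.index)

idx = ("date", "symbol")
out = check_neutralize(SeriesType("float", idx), SeriesType("category", idx))
```

Each message carries the operator name, the expected shape, and the observed shape, which is what makes the failure quick to act on.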

Stage 3: execute

The IR goes to sig_runtime. The runtime walks the operator graph in topological order, executes each node on Polars/Arrow columns, and writes intermediate results back into the cache. The execution model is lazy and memoised: a result is computed once, hashed, and reused for the rest of the run and for any future run with the same IR and inputs.
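The topological walk can be illustrated with Python's standard-library graphlib. This is a toy, not sig_runtime: the graph covers only part of the momentum signal, the operators work on a four-element price list instead of Arrow columns, and the lookbacks are shortened to 3 and 1 steps.

```python
from graphlib import TopologicalSorter

# Hypothetical operator graph: node -> ordered list of its inputs.
graph = {
    "px": [],
    "total_ret": ["px"],
    "skip_ret": ["px"],
    "mom": ["total_ret", "skip_ret"],
}

# Toy operators standing in for the real kernels.
ops = {
    "px": lambda: [100.0, 110.0, 121.0, 133.1],
    "total_ret": lambda px: px[-1] / px[-4] - 1.0,  # return over 3 steps
    "skip_ret": lambda px: px[-1] / px[-2] - 1.0,   # return over 1 step
    "mom": lambda total, skip: total - skip,
}

def execute(graph, ops):
    """Run every node after all of its inputs have been computed."""
    results = {}
    order = []
    for node in TopologicalSorter(graph).static_order():
        args = [results[dep] for dep in graph[node]]
        results[node] = ops[node](*args)
        order.append(node)
    return results, order

results, order = execute(graph, ops)
```

Because px is computed once and its result is looked up by both return nodes, the walk never evaluates a shared input twice.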

Parallelism happens along two axes. Time-series operators that are independent across symbols (like ret(px, n) per name) are dispatched across the Rayon thread pool. Cross-sectional operators (like zscore over a date) execute as a single vectorised pass per date but use SIMD kernels (AVX2 or AVX-512 where the CPU supports it) for the elementwise step.

The cache layer (sig_cache) is content-addressed: keys are blake3 hashes and entries are stored in sled. A cache key is the hash of the IR node plus the hashes of its inputs. A cache hit takes a few microseconds. A cache miss runs the operator and writes the result.
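The key construction can be sketched with the Python standard library. Two substitutions are made and should be read as assumptions: hashlib.blake2b stands in for blake3 (which is not in the standard library), and the way the operator and its parameters are serialised is invented. A real leaf key would presumably also cover the contents of the data, not just its path.

```python
import hashlib

def node_key(op, params, input_keys):
    """Key = hash of the node's own description plus the keys of its inputs."""
    h = hashlib.blake2b(digest_size=32)  # stand-in for blake3
    h.update(repr((op, sorted(params.items()))).encode())
    for key in input_keys:
        h.update(bytes.fromhex(key))
    return h.hexdigest()

def keys_for(lookback, skip):
    """Keys for part of the momentum graph under the given parameters."""
    px = node_key("load", {"path": "s3://example/prices.parquet"}, [])
    total_ret = node_key("ret", {"n": lookback}, [px])
    skip_ret = node_key("ret", {"n": skip}, [px])
    mom = node_key("sub", {}, [total_ret, skip_ret])
    return {"px": px, "total_ret": total_ret, "skip_ret": skip_ret, "mom": mom}

before = keys_for(lookback=252, skip=21)
after = keys_for(lookback=252, skip=42)

# Only the node that uses `skip` and the nodes downstream of it get new keys.
changed = sorted(name for name in before if before[name] != after[name])
```

Here changed comes out as mom and skip_ret. The keys for px and total_ret are identical in both runs, so those results would be served from the cache.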

The backtest harness consumes the emitted weights, applies the rebalance cadence, applies the transaction-cost model, and computes the standard summary statistics: Total Return, Sharpe Ratio, Max Drawdown, Turnover. Those are what sigc run prints to the terminal.
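For reference, the four statistics can be computed from a series of per-period returns and two weight vectors as in the sketch below. The conventions are assumptions and may differ from what sigc prints: a zero risk-free rate, 252 periods per year, sample standard deviation, and one-way turnover defined as half the sum of absolute weight changes.

```python
import math
from statistics import mean, stdev

def total_return(returns):
    """Compounded return over the whole series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

def sharpe_ratio(returns, periods_per_year=252):
    """Annualised mean over standard deviation, risk-free rate assumed zero."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

def turnover(old_weights, new_weights):
    """One-way turnover at a rebalance: half the sum of absolute weight changes."""
    symbols = set(old_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(s, 0.0) - old_weights.get(s, 0.0)) for s in symbols
    )
```

For example, rebalancing from {"A": 0.5, "B": -0.5} to {"A": 0.5, "C": -0.5} closes one position and opens another of the same size, which this definition counts as a turnover of 0.5.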

The first run of a new strategy might take 45 ms on five years of daily data for 500 securities. The second run, with the same .sig file and the same inputs, hits the cache and returns in microseconds. A run with one changed parameter recomputes only the operators downstream of that parameter, because everything else hashes the same.

Stage 4: serve

The same binary that ran the backtest can run as a daemon. sigc daemon starts a long-running process that owns the cache and listens on nng REQ/REP at tcp://127.0.0.1:7240. Clients send sigc request compile foo.sig, sigc request run foo.sig, or sigc request status and get answers without re-opening the cache.

The daemon is configured by a sigc.yaml. The file declares the production posture: circuit breakers (max drawdown 15%, max position 10%, kill switch), per-minute order rate limits, Slack webhooks, Prometheus /metrics, structured JSON audit logging, and a schedule block with cron-style jobs that name a strategy and a cadence. Slack alerts fire on anomalies that match a configured rule. Kubernetes can keep the daemon healthy and restart it on the same cache directory because the cache is persistent on disk.
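The circuit-breaker part of that posture amounts to a small set of checks run before orders go out. The sketch below uses the thresholds quoted above (15% drawdown, 10% position) but is otherwise hypothetical: the Breakers class, the tripped function, and the message wording are not taken from sigc or its sigc.yaml schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Breakers:
    max_drawdown: float = 0.15  # 15%, as quoted in the text
    max_position: float = 0.10  # 10% of the book in any single name
    kill_switch: bool = False   # manual stop

def tripped(breakers, drawdown, weights):
    """Return the reasons trading should halt; an empty list means proceed."""
    reasons = []
    if breakers.kill_switch:
        reasons.append("kill switch engaged")
    if drawdown > breakers.max_drawdown:
        reasons.append(
            f"drawdown {drawdown:.1%} exceeds limit {breakers.max_drawdown:.1%}"
        )
    for symbol, weight in sorted(weights.items()):
        if abs(weight) > breakers.max_position:
            reasons.append(
                f"position {symbol} at {abs(weight):.1%} "
                f"exceeds limit {breakers.max_position:.1%}"
            )
    return reasons

ok = tripped(Breakers(), drawdown=0.05, weights={"AAA": 0.02, "BBB": -0.02})
halt = tripped(Breakers(), drawdown=0.20, weights={"AAA": 0.12, "BBB": -0.02})
```

Returning every reason at once, instead of stopping at the first, gives the alert and the audit log the full picture in a single event.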

The signal that ran in the notebook now runs in production on the same binary, with the same operator implementations and the same cache hashing scheme. There is no second codebase to translate to. There is no Python-to-Java port. The thing that was tested is the thing that runs.

What you give up

Compilation is opinionated. You are writing in a DSL, not in Python. Operators are added by writing Rust, not by importing a pandas function. Data sources have to declare a type rather than have one inferred at runtime. The price of compile-time safety is that you cannot just paste any line of Python in.

That is the trade, and we think it is correct, because the thing the notebook is good at is sketching an idea, and the thing it is bad at is operating a strategy with money on the line. Two different artefacts solve two different problems. The .sig file is the artefact for the second problem.

Why staging matters

If you skip the compile stage, you push the type-check into the runtime, and the runtime catches the error at minute 47 of a long backtest instead of at second zero. If you skip the cache, you recompute the same z-score every time anyone iterates, and your team’s iteration loop slows down tenfold. If you skip the daemon, you re-open the cache every run, and on a busy desk that becomes the bottleneck. If you skip the production posture and bolt risk on later, every alert you write is a one-off.

Each stage exists because skipping it costs more than including it. sigc lays them out so they are the default path, not a heroic engineering project.

If you are about to move a strategy from a notebook to a real PnL, write the .sig file first. Let the compiler tell you what was missing.