Reading sigc syntax: a 5-minute tour

Skelf-Research
dsl · language · tutorial

If you have spent any time writing factor research in pandas, the sigc syntax should feel familiar within about five minutes. The DSL is small on purpose: four blocks, a handful of operators, no control flow worth mentioning. The point is that a strategy reads like a description of itself.

This post walks through the language by reading three real .sig files from the repository: a momentum strategy, a mean-reversion strategy, and a combination factor. By the end of it you should be able to read any of the strategies shipped in the examples/ directory and most of the ones in strategies/.

The four blocks

Every .sig file has up to four kinds of block. They appear in this order:

data:
  <name>: load <kind> from "<source>" [options...]

params:
  <name> = <value>

signal <name>:
  <name> = <expression>
  ...
  emit <expression>

portfolio <name>:
  weights = <weight constructor>
  costs   = <cost model>           # optional
  backtest [rebal=<cadence>] [benchmark=<symbol>] from <date> to <date>

data is what to load. params is what to tune. signal is the typed computation. portfolio is what to do with the signal. You can have several signals; the portfolio block references them by name.

Example one: momentum

Here is the full file:

data:
  px:  load price from "s3://example/prices.parquet" adjust=split_div
  sec: load sector from "s3://example/sector.parquet" dtype=category

params:
  lookback = 126
  hold     = 21

signal momentum:
  ret = log(px / lag(px, lookback))
  xs  = as_xs(ret)
  xs  = zscore(xs)
  xs  = neutralize(xs, by=sec)
  emit winsor(xs, p=0.01)

portfolio longshort:
  weights = rank(momentum).long_short(top=0.2, bottom=0.2, cap=0.02)
  costs   = tc.bps(5) + slippage.model("square-root", coef=0.1)
  backtest rebal=hold benchmark=SPY from 2015-01-01 to 2024-12-31

Read top to bottom. The data block loads two parquet files from S3. px holds split-and-dividend-adjusted prices; sec holds sector codes, declared with a categorical dtype so the compiler can validate neutralize(by=sec) downstream. The two datasets share an index, which the compiler requires for any operator that combines them.

The params block declares two constants: a 126-day momentum lookback and a 21-day hold period. The compiler surfaces each parameter as a knob: you can override it from the CLI (--param lookback=252) or from a Rust embedding with Strategy::with_param("lookback", 252).

The signal momentum block computes a series step by step. lag(px, lookback) returns prices from 126 days ago. px / lag(px, lookback) is the price ratio. log(...) is the log return. as_xs(...) is a tag that tells the compiler the next operator should treat its input as a cross-section per date (rather than as a per-name time series). zscore then standardises each date’s cross-section. neutralize(xs, by=sec) subtracts the sector mean so the resulting signal has no sector bias. winsor(xs, p=0.01) clips the top and bottom percentile to limit the influence of outliers. emit is the explicit “this is the output of this signal” statement.
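
For readers coming from Python, here is a rough sketch of what that chain computes for a single date. It uses plain dictionaries rather than sigc itself, and the helper functions are hypothetical stand-ins written for illustration. The runtime's own operators may differ in details such as the standard-deviation convention or how winsor picks its quantiles.

```python
import math
from statistics import mean, pstdev

def zscore(xs):
    """Standardise one date's cross-section (population std is an assumption)."""
    values = list(xs.values())
    mu, sigma = mean(values), pstdev(values)
    return {name: (v - mu) / sigma for name, v in xs.items()}

def neutralize(xs, by):
    """Subtract each group's mean from the members of that group."""
    groups = {}
    for name, v in xs.items():
        groups.setdefault(by[name], []).append(v)
    group_mean = {g: mean(vs) for g, vs in groups.items()}
    return {name: v - group_mean[by[name]] for name, v in xs.items()}

def winsor(xs, p):
    """Clip to the p and 1-p empirical quantiles (nearest-rank, an assumption)."""
    ordered = sorted(xs.values())
    n = len(ordered)
    lo = ordered[math.floor(p * (n - 1))]
    hi = ordered[math.ceil((1 - p) * (n - 1))]
    return {name: min(max(v, lo), hi) for name, v in xs.items()}

# Toy data: today's price and the price `lookback` rows earlier.
px_now = {"AAA": 110.0, "BBB": 95.0, "CCC": 130.0, "DDD": 100.0}
px_then = {"AAA": 100.0, "BBB": 100.0, "CCC": 100.0, "DDD": 100.0}
sector = {"AAA": "tech", "BBB": "tech", "CCC": "energy", "DDD": "energy"}

ret = {name: math.log(px_now[name] / px_then[name]) for name in px_now}
# With only four names, winsorising at 1% leaves the values unchanged.
signal = winsor(neutralize(zscore(ret), by=sector), p=0.01)
```

After neutralisation each sector's values sum to zero, so the signal only ranks names against their sector peers.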

The portfolio longshort block consumes the signal. rank(momentum).long_short(top=0.2, bottom=0.2, cap=0.02) ranks names cross-sectionally on the momentum signal, takes the top and bottom 20% as long and short books, and caps any single position at 2%. The cost model adds 5 basis points per trade plus a square-root market-impact slippage with coefficient 0.1. backtest rebal=hold benchmark=SPY from 2015-01-01 to 2024-12-31 says rebalance every 21 days, benchmark against SPY, and run the backtest over a decade of data.
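
One plausible reading of the weight constructor, sketched in Python. The post does not say how weights are assigned inside each book, so this sketch assumes equal weights within the long and short books and a cap applied by simple truncation, with no redistribution of the clipped weight. The function name mirrors the DSL, but the implementation is hypothetical.

```python
def long_short(scores, top, bottom, cap):
    """Equal-weight the top and bottom fractions of names, capped per position."""
    ordered = sorted(scores, key=scores.get, reverse=True)  # best score first
    n_long = max(1, round(len(ordered) * top))
    n_short = max(1, round(len(ordered) * bottom))
    weights = {name: 0.0 for name in ordered}
    for name in ordered[:n_long]:
        weights[name] = min(1.0 / n_long, cap)
    for name in ordered[-n_short:]:
        weights[name] = -min(1.0 / n_short, cap)
    return weights

# Ten toy names scored 0..9; the top and bottom 20% are two names each.
scores = {f"N{i:02d}": float(i) for i in range(10)}
w = long_short(scores, top=0.2, bottom=0.2, cap=0.02)
```

In a universe this small the 2% cap binds on every position. With several hundred names the equal weights would already sit below it.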

If you have done factor research in pandas, you have written this loop ten times. Here it is in twelve lines, not counting the block headers.

Example two: mean reversion

data:
  px: load price from "s3://example/prices.parquet" adjust=split_div

params:
  fast = 5
  slow = 60

signal meanrev:
  r_fast = ret(px, fast)
  r_slow = ret(px, slow)
  z      = -zscore(as_xs(r_fast - r_slow))
  emit clip(z, -3, 3)

portfolio vol_target:
  weights = scale_vol(meanrev, target_ann_vol=0.1, lookback=252)
  backtest rebal=5d benchmark=SPY from 2018-01-01 to 2024-12-31

Notice three things.

First, ret(px, n) is a built-in return operator, used twice with two windows; example one spelled its return out by hand as log(px / lag(px, lookback)). The difference between a five-day return and a sixty-day return is the residual mean-reversion signal. The unary minus on zscore(...) flips the sign so that recent under-performers are long candidates.

Second, clip(z, -3, 3) is a hard limit on the per-name signal magnitude, useful when you do not want to let one extreme name dominate the book.
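
The same computation can be sketched in plain Python for one date. The helpers below are hypothetical stand-ins, and ret is assumed to be a simple (not log) return, which the post does not specify. Short windows of 1 and 3 steps stand in for fast = 5 and slow = 60 to keep the toy data small.

```python
from statistics import mean, pstdev

def ret(prices, n):
    """Simple n-step return ending at the latest price (an assumption)."""
    return prices[-1] / prices[-1 - n] - 1.0

def zscore(xs):
    """Standardise one date's cross-section to mean 0 and unit std."""
    values = list(xs.values())
    mu, sigma = mean(values), pstdev(values)
    return {name: (v - mu) / sigma for name, v in xs.items()}

def clip(xs, lo, hi):
    """Bound every value to the interval [lo, hi]."""
    return {name: min(max(v, lo), hi) for name, v in xs.items()}

prices = {
    "A": [100.0, 102.0, 104.0, 100.0],  # fell sharply on the last step
    "B": [100.0, 100.0, 100.0, 104.0],  # jumped on the last step
    "C": [100.0, 101.0, 102.0, 103.0],  # steady drift
}
fast, slow = 1, 3

spread = {name: ret(p, fast) - ret(p, slow) for name, p in prices.items()}
z = clip({name: -v for name, v in zscore(spread).items()}, -3, 3)
```

The name with the weakest recent return relative to its longer-run return (A here) ends up with the largest positive score.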

Third, the portfolio uses scale_vol, not long_short. scale_vol(meanrev, target_ann_vol=0.1, lookback=252) produces weights such that, given a 252-day realised volatility estimate, the portfolio’s annualised volatility targets 10%. This is a different family of portfolio constructor from the long-short rank construction, and the compiler treats them as different but composable building blocks.
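
A minimal sketch of volatility targeting in Python, under stated assumptions: realised volatility is the population standard deviation of the unscaled portfolio's daily returns over the lookback window, annualised with a factor of sqrt(252), and the weights are rescaled by a single leverage number. The post does not describe how sigc estimates volatility, so treat this as an illustration of the idea rather than the runtime's method.

```python
import math
from statistics import pstdev

def scale_vol(raw_weights, daily_returns, target_ann_vol, lookback=252):
    """Rescale weights so the trailing annualised volatility hits the target."""
    window = daily_returns[-lookback:]
    realised_ann_vol = pstdev(window) * math.sqrt(252)  # annualisation is an assumption
    leverage = target_ann_vol / realised_ann_vol
    scaled = {name: w * leverage for name, w in raw_weights.items()}
    return scaled, leverage

# Toy inputs: raw signal weights and the daily returns they would have earned.
raw = {"AAA": 0.5, "BBB": -0.5}
daily = [0.01, -0.01] * 126  # 252 days alternating +1% and -1%

weights, leverage = scale_vol(raw, daily, target_ann_vol=0.1)
```

Here the unscaled portfolio runs at roughly 16% annualised volatility, so the leverage comes out below one and every weight shrinks proportionally.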

Example three: combining factors

data:
  earnings_yield: load feature from "s3://example/earnings_yield.parquet"
  roa:            load feature from "s3://example/roa.parquet"
  market_cap:     load feature from "s3://example/market_cap.parquet"

signal value:
  emit zscore(as_xs(earnings_yield))

signal quality:
  emit zscore(as_xs(roa))

signal size:
  emit -zscore(as_xs(log(market_cap)))

signal combo:
  emit 0.5 * value + 0.3 * quality + 0.2 * size

portfolio balanced:
  weights = rank(combo).long_short(top=0.25, bottom=0.25, cap=0.015)
  backtest rebal=21d benchmark=SPY from 2016-01-01 to 2024-12-31

Three things are worth pointing at.

First, you can declare several signals in one file and combine them with explicit arithmetic. combo is a linear combination of value, quality, and size. Weights live in source. There is no hidden optimisation step you have to inspect; the relationship between factors and portfolio is literally what the file says.

Second, size flips the sign of log market cap because the small-cap factor is “long small, short large”. The minus sign is in the source. The compiler does not guess.
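
To make the arithmetic concrete, here is the same combination written out in Python for three made-up names. The input numbers are invented for illustration and the zscore helper is a hypothetical stand-in for the sigc operator.

```python
import math
from statistics import mean, pstdev

def zscore(xs):
    """Standardise one date's cross-section to mean 0 and unit std."""
    values = list(xs.values())
    mu, sigma = mean(values), pstdev(values)
    return {name: (v - mu) / sigma for name, v in xs.items()}

# Invented inputs for a single date.
earnings_yield = {"AAA": 0.08, "BBB": 0.04, "CCC": 0.06}
roa = {"AAA": 0.10, "BBB": 0.05, "CCC": 0.15}
market_cap = {"AAA": 1e9, "BBB": 1e11, "CCC": 1e10}

value = zscore(earnings_yield)
quality = zscore(roa)
# Minus sign: small caps get the high score, large caps the low one.
size = {n: -z for n, z in zscore({n: math.log(c) for n, c in market_cap.items()}).items()}

combo = {n: 0.5 * value[n] + 0.3 * quality[n] + 0.2 * size[n] for n in value}
```

AAA is cheap and small, so it scores highest; BBB is expensive, low-quality and large, so it scores lowest.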

Third, the data block declares three “feature” sources and no prices at all. The portfolio still ranks the combined signal, builds long-short weights, and is backtested against SPY: the backtest harness pulls price data through the runtime’s data loader from the configured warehouse.

Operators you will use most

The runtime ships 120+ operators. The ones you will reach for first:

  • ret(series, n) and log(...) for returns.
  • lag(series, n) to shift a series back in time.
  • rolling_mean(series, n), rolling_std(series, n), ema(series, n) for trailing stats.
  • zscore(series) and as_xs(series) for cross-sectional standardisation.
  • rank(series) to convert continuous scores into ranks.
  • neutralize(series, by=group) to residualise against a categorical grouping or a factor.
  • winsor(series, p=...) and clip(series, lo, hi) for outlier control.
  • rsi, macd, atr, vwap for technical signals.
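
For intuition, the time-series operators in this list behave roughly like the Python sketches below on a single name's series. These are hypothetical re-implementations, not the runtime's code. The warm-up handling (None until a full window is available) and the EMA smoothing factor of 2 / (n + 1) are assumptions borrowed from common convention.

```python
def lag(series, n):
    """Shift a series back by n steps, padding the start with None."""
    if n <= 0:
        return list(series)
    return [None] * min(n, len(series)) + list(series[:-n])

def rolling_mean(series, n):
    """Trailing n-step mean; None until a full window is available."""
    out = []
    for i in range(len(series)):
        if i + 1 < n:
            out.append(None)
        else:
            window = series[i + 1 - n : i + 1]
            out.append(sum(window) / n)
    return out

def ema(series, n):
    """Exponential moving average seeded with the first observation."""
    alpha = 2.0 / (n + 1)  # span convention, an assumption
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

px = [10.0, 11.0, 12.0, 13.0]
lagged = lag(px, 1)
smooth = rolling_mean(px, 2)
```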

The portfolio side ships fewer constructors because there are fewer well-formed ways to turn a signal into weights:

  • rank(signal).long_short(top, bottom, cap) for percentile long-short books.
  • scale_vol(signal, target_ann_vol, lookback) for volatility-targeted portfolios.

The cost models are composable:

  • tc.bps(n) for a flat n-basis-point per-trade cost.
  • slippage.model("square-root", coef=c) for a square-root market-impact model.
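
A sketch of how two such cost terms might add up for a single trade, in Python. The post does not give the slippage formula, so this assumes one common square-root form: impact as a fraction of notional equals coef times daily volatility times the square root of the trade's share of average daily volume. All function names and inputs here are hypothetical.

```python
import math

def tc_bps(n):
    """Flat cost of n basis points of traded notional."""
    def cost(traded, adv, daily_vol):
        return traded * n * 1e-4
    return cost

def sqrt_slippage(coef):
    """Square-root market impact (assumed form, see the text above)."""
    def cost(traded, adv, daily_vol):
        return traded * coef * daily_vol * math.sqrt(traded / adv)
    return cost

def combine(*models):
    """Sum several cost models, mirroring the `+` in the DSL."""
    def cost(traded, adv, daily_vol):
        return sum(m(traded, adv, daily_vol) for m in models)
    return cost

costs = combine(tc_bps(5), sqrt_slippage(0.1))
# A $1m trade in a name with $100m average daily volume and 2% daily volatility.
total = costs(traded=1_000_000, adv=100_000_000, daily_vol=0.02)
```

Under these assumptions the flat fee contributes about $500 and the impact term about $200, so the impact term only starts to dominate once a trade is a larger share of daily volume.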

Where the language goes next

This is the surface. The full language reference and operator catalogue live at docs.skelfresearch.com/sigc/operators/. The strategy library in the GitHub repo has 23 ready-to-use examples covering momentum, mean reversion, multi-factor, statistical arbitrage, technical, and volatility families. If a factor you use is missing, the operator can be added in sig_runtime as a normal Rust function, with a shape contract, and the compiler will pick it up.

Five minutes is enough to read sigc. The cost of writing the next strategy in it is roughly twelve lines.