Optimizing Insider Trading Backtests: A Practical Guide, Sigma Journal


Why this split exists

The shortest explanation is that we want our /performance page to describe what happened after the rules were fixed, not what happened while we were still fiddling with them.

That distinction sounds obvious. It is not how much finance research is actually done.

A typical failure mode runs like this. A researcher starts with a sensible hypothesis about insider purchases, clusters, executive seniority, transaction size, or filing timeliness. They test many variants, adjust thresholds, remove awkward periods, add filters for liquidity, then discover that one specification looks excellent. The chart goes up. The write-up becomes confident. The strategy, when traded live, develops the personality of wet cardboard.

The problem is not that optimisation is forbidden. The problem is that optimisation must be paid for with honest evaluation. If you use historical data to choose the model, that same history can no longer serve as a clean test of whether the model generalises.

Our answer is a walk-forward protocol with an explicit temporal split:

Training period: 2022 through 2024
Test period: 2025 through 2026

Everything that can be learned, tuned, selected, ranked, clipped, winsorised, filtered or argued about belongs on the left side of that line. Everything reported as out-of-sample performance belongs on the right.
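As a minimal sketch of that line in code, the snippet below labels events as training or test by the date they became observable. The calendar-year boundaries are our reading of "2022 through 2024" and "2025 through 2026"; the function name, tickers and dates are hypothetical and exist only for illustration.

```python
from datetime import date

# Assumption: the split follows calendar years, inclusive at both ends.
TRAIN_START = date(2022, 1, 1)
TRAIN_END = date(2024, 12, 31)
TEST_START = date(2025, 1, 1)
TEST_END = date(2026, 12, 31)


def assign_period(available_on: date) -> str:
    """Label an event by the date it became observable, not the trade date."""
    if TRAIN_START <= available_on <= TRAIN_END:
        return "train"
    if TEST_START <= available_on <= TEST_END:
        return "test"
    return "out_of_scope"


# Hypothetical events: (ticker, date the filing became available).
events = [
    ("AAA", date(2023, 6, 14)),
    ("BBB", date(2024, 12, 31)),
    ("CCC", date(2025, 1, 2)),
    ("DDD", date(2021, 11, 30)),
]

labels = {ticker: assign_period(d) for ticker, d in events}
print(labels)
```

Keying the split on availability rather than transaction date means a trade executed on the last day of 2024 but disclosed in 2025 lands in the test set, which is where a live strategy would have met it.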

Why 2022 to 2024 for training

Three years is not a magical number. It is a compromise.

A training window must be long enough to contain different market regimes, enough corporate events, enough variation in insider activity, and enough examples for cross-sectional ranking rules to stabilise. It must also be recent enough that the market structure, filing behaviour, liquidity profile and disclosure plumbing still resemble the present.

For insider transactions, recency matters. Reporting systems change. Issuer populations change. Small-cap liquidity changes. The prevalence of buybacks, secondary offerings, SPAC remnants, founder-led governance structures and post-pandemic capital-market behaviour all affect how raw filings should be interpreted.

A 2022 to 2024 window captures a useful mix: a rising-rate environment, uneven equity leadership, and a market that had already absorbed the most acute distortions of the pandemic period. That does not make it ideal. It makes it defensible.

Why 2025 to 2026 for testing

A test period should be long enough to include disappointment.

That is not a joke, or not only a joke. A six-week paper test can flatter almost anything, especially in event-driven signals where return paths are lumpy and concentrated. A two-year test window is still modest, but it is long enough to expose whether a signal relies on one quarter of favourable tape or one narrow pocket of the market.

By fixing 2025 to 2026 as the out-of-sample window, we can ask a simple question: after the rules were set using 2022 to 2024 data, did the signal continue to behave in the direction and rough magnitude one would expect?

If yes, confidence should rise, cautiously. If no, that failure is still valuable, because it tells us the edge may have been regime-specific, overfit, implementation-sensitive, or simply imaginary, which is a crowded category in quantitative finance.

What walk-forward validation means in practice

Walk-forward validation is often described with more ceremony than content. The practical version is straightforward: use past data to set the rules, then apply those rules to later data exactly as they stand.

That sentence hides a lot of operational detail. In insider-transaction research, detail is where most of the cheating happens, usually by accident.

The unit of prediction must respect filing time

Insider signals are not observed when the trade happens. They are observed when the market can reasonably know about the filing.

This matters because many regulatory regimes allow a delay between transaction date and disclosure date. In the United States, Section 16 insiders generally report on Form 4 within two business days after the transaction, subject to the SEC framework for beneficial ownership reporting. In the European Union, under Article 19 of the Market Abuse Regulation (MAR), persons discharging managerial responsibilities (PDMRs) and closely associated persons must notify transactions promptly and no later than three business days after the transaction, with the issuer and competent authority then handling publication under the local implementation. Those deadlines are not trivia. They bound when a strategy can legally and operationally react.

A proper walk-forward test timestamps each event at the earliest moment the filing is available to the market in a machine-usable form, not at the transaction date, which the market could not yet have observed.
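One way to make that timestamp rule concrete is sketched below. It assumes a "next open" entry policy, a 09:30 exchange-local open, and a weekends-only calendar with no holidays; all three are illustrative assumptions, and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

MARKET_OPEN = time(9, 30)  # assumption: exchange-local opening time


def next_weekday(d: date) -> date:
    """Next Monday-to-Friday date after d (holidays ignored in this sketch)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d


def entry_session(filed_at: datetime) -> date:
    """Earliest session whose open falls after the filing became available."""
    d = filed_at.date()
    if d.weekday() < 5 and filed_at.time() < MARKET_OPEN:
        return d  # available before the open, so the same session is tradable
    return next_weekday(d)


# Hypothetical filing: traded Monday 3 March 2025, filed Wednesday evening.
transaction_date = date(2025, 3, 3)
filed_at = datetime(2025, 3, 5, 17, 45)
print(transaction_date, "->", entry_session(filed_at))
```

The transaction date never enters the calculation. It is kept only as a descriptive field, which is the whole point.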

The universe must be defined ex ante

Another easy way to flatter a backtest is to use today’s surviving securities as if they had always been the investable universe.

That introduces survivorship bias. Delisted names disappear from the sample, often for unpleasant reasons. Yet unpleasant reasons are precisely what a realistic test must include. If a small-cap issuer received bullish insider buying in 2025 and then vanished in 2026 after a financing spiral, the test does not get to pretend the ticker never existed.

The same applies to venue eligibility, minimum liquidity, free-float screens and market-cap cutoffs. These rules should be determined using information available at the time, then applied consistently through the test window.
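A point-in-time universe can be sketched as follows. The Listing record, the tickers and the dates are all hypothetical; a real pipeline would also need the liquidity and free-float screens mentioned above, evaluated as of the same date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    ticker: str
    listed_on: date
    delisted_on: Optional[date] = None  # None means still listed


def universe_as_of(listings, as_of: date) -> set:
    """Tickers that were actually listed on the given date."""
    return {
        l.ticker
        for l in listings
        if l.listed_on <= as_of and (l.delisted_on is None or as_of < l.delisted_on)
    }


# Hypothetical listings, including one that delists during the test window.
listings = [
    Listing("AAA", date(2019, 1, 2)),
    Listing("BBB", date(2020, 6, 1), delisted_on=date(2026, 5, 15)),
    Listing("CCC", date(2025, 9, 1)),
]

point_in_time = universe_as_of(listings, date(2025, 3, 3))
survivors_only = universe_as_of(listings, date(2026, 12, 31))
print(sorted(point_in_time), sorted(survivors_only))
```

Filtering on the end-of-window survivors would silently drop BBB from early 2025 and add CCC before it listed. Both errors flatter the result.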

Corporate actions are not optional housekeeping

Stock splits, reverse splits, ticker changes, mergers, spin-offs and rights issues can all distort both transaction values and subsequent returns if the data pipeline is sloppy.

Insider datasets are particularly exposed because filings often include share counts, prices and ownership percentages that need reconciliation against adjusted market data. A reverse split can make a transaction look absurdly large or absurdly cheap if one side of the pipeline is adjusted and the other is not. The result is not alpha. It is arithmetic negligence.
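The arithmetic is easy to check with a small example. The figures below are invented: a filing reports 10,000 shares bought at 2.00 before a 1-for-10 reverse split, while the market-data feed has already been adjusted to the post-split basis. The helper name is hypothetical.

```python
from decimal import Decimal


def restate_post_split(shares: Decimal, price: Decimal, old_per_new: int):
    """Restate a pre-split filing on the post-split basis.

    old_per_new is the number of old shares exchanged for one new share,
    so 10 for a 1-for-10 reverse split.
    """
    ratio = Decimal(old_per_new)
    return shares / ratio, price * ratio


filed_shares = Decimal("10000")           # as reported in the filing (pre-split)
filed_price = Decimal("2.00")             # as reported in the filing (pre-split)
adjusted_market_price = Decimal("20.00")  # price feed, already split-adjusted

true_notional = filed_shares * filed_price             # 20,000
mixed_notional = filed_shares * adjusted_market_price  # 200,000, ten times too large

adj_shares, adj_price = restate_post_split(filed_shares, filed_price, 10)
consistent_notional = adj_shares * adj_price           # 20,000 again

print(true_notional, mixed_notional, consistent_notional)
```

Either basis works, provided both sides of the multiplication use the same one.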

Parameter selection belongs in training, not in the epilogue

Suppose you test holding periods of 5, 10, 20 and 60 trading days, and 20 works best in 2025. You do not then get to announce that the strategy always used 20 days. You learned that from the test set. The same goes for signal buckets, z-score caps, role weights, sector neutralisation, and whether to exclude option-related transactions.

This is the central discipline of walk-forward work. The test set is not there to help write the model. It is there to judge the model after the writing is done.
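A sketch of that discipline for the holding-period example: the choice is made on training returns only, and the test figure is reported for that choice whether or not it flatters. The per-event returns below are invented, expressed in basis points, and deliberately arranged so that hindsight would prefer a different answer.

```python
from statistics import mean

# Hypothetical per-event returns in basis points, keyed by holding period (days).
train_returns = {
    5: [10, -20, 30],
    10: [40, 10, 40],
    20: [20, 0, 10],
    60: [-10, 50, -10],
}
test_returns = {
    5: [0, 10, -10],
    10: [10, -20, 40],
    20: [50, 30, 40],
    60: [20, -30, 10],
}


def select_holding_period(returns_by_period: dict) -> int:
    """Pick the holding period with the highest mean return."""
    return max(returns_by_period, key=lambda h: mean(returns_by_period[h]))


frozen_choice = select_holding_period(train_returns)    # decided before the test
reported_test_mean = mean(test_returns[frozen_choice])  # what gets published
hindsight_choice = select_holding_period(test_returns)  # what we must not use

print(frozen_choice, reported_test_mean, hindsight_choice)
```

Here training selects 10 days while the test period would have favoured 20. The published number is the 10-day result.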

Why insider-transaction strategies are unusually easy to overfit

Insider filings look clean on paper. A person with privileged information buys or sells shares, files a form, and the market reacts. Real data are less literary.

Not every insider trade means the same thing

A chief executive buying in the open market with cash is not the same as an automatic tax sale by a non-executive director. A founder increasing control through a related-party structure is not the same as a broad-based option exercise. A disposal linked to divorce, estate planning or margin-call mechanics is not a pure expression of information.

Researchers therefore create taxonomies. They classify transaction types, insider roles, ownership links and filing notes. That is necessary, but every classification choice creates a degree of freedom. Enough degrees of freedom, and a strategy can be tuned until it explains the past with suspicious eloquence.

The cure is not to avoid nuance. The cure is to decide the nuance on the training sample, document it, and then accept the consequences in the test sample.

Sparse but noisy events encourage story-telling

Compared with daily price data, insider events are sparse. That sparsity tempts researchers to inspect individual cases and infer patterns from memorable anecdotes. A cluster of purchases before a takeover. A CFO buying after a profit warning. A chairman selling before a capital raise. These stories are interesting, sometimes genuinely informative, and often statistically treacherous.

Small samples invite selective memory. Walk-forward validation imposes a useful boredom. It asks whether the rule works repeatedly, not whether one case made everyone in the room nod.

Filing delays create hidden look-ahead bias

This deserves repetition because it is one of the most common errors in event studies. If returns are measured from the transaction date rather than the filing-availability date, the strategy may be credited with gains that were already in the price before the market could have known about the filing.

That is not a subtle distinction. It can entirely reverse a result.
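A small invented price path shows how the sign can flip. The dates and closing prices below are hypothetical: the stock rallies between the trade and the filing, then drifts lower.

```python
from datetime import date

# Hypothetical closing prices around one insider purchase.
closes = {
    date(2025, 3, 3): 100.0,  # transaction date (not yet public)
    date(2025, 3, 5): 110.0,  # filing becomes available
    date(2025, 4, 2): 107.0,  # exit date
}


def simple_return(entry_price: float, exit_price: float) -> float:
    return exit_price / entry_price - 1.0


exit_price = closes[date(2025, 4, 2)]
from_transaction = simple_return(closes[date(2025, 3, 3)], exit_price)
from_filing = simple_return(closes[date(2025, 3, 5)], exit_price)

print(f"from transaction date: {from_transaction:+.2%}")
print(f"from filing date:      {from_filing:+.2%}")
```

Measured from the transaction date the trade looks profitable. Measured from the date anyone could have acted on it, it loses money.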

Market	Regulator	Rule	Deadline	Notes
US	SEC	Section 16 Form 4	T+2 business days	Applies to most reportable insider transactions by officers, directors and 10% holders.
FR	AMF	MAR Art 19	T+3 business days	PDMRs and closely associated persons must notify transactions; publication follows issuer and local process.
EU	ESMA and national competent authorities	MAR Art 19	T+3 business days	Framework is harmonised at EU level; implementation and dissemination mechanics vary by market.

Illustrative filing-deadline comparison for event timestamping. Operational availability can differ from legal deadline.
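The deadlines in the table can be turned into a simple late-filing check, sketched below. It counts Monday to Friday as business days and ignores public holidays, which a production calendar would not; the market keys and function names are hypothetical. The deadline is the latest compliant filing date, not the availability timestamp, so it serves as a data-quality flag, not as an entry rule.

```python
from datetime import date, timedelta

# Business days allowed after the transaction, per the table above.
DEADLINE_BUSINESS_DAYS = {"US": 2, "EU": 3}


def add_business_days(start: date, n: int) -> date:
    """Step forward n weekdays from start (holidays ignored in this sketch)."""
    d = start
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:
            n -= 1
    return d


def is_late(market: str, transaction_date: date, filed_on: date) -> bool:
    """True if the filing arrived after the legal deadline for that market."""
    deadline = add_business_days(transaction_date, DEADLINE_BUSINESS_DAYS[market])
    return filed_on > deadline


# Hypothetical transaction on Thursday 6 March 2025.
trade = date(2025, 3, 6)
print(add_business_days(trade, 2), add_business_days(trade, 3))
```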

Regime changes can make a good idea look immortal

A signal that works in one market regime can fail cleanly in another. Insider buying may be more informative when external financing is expensive, when small caps are neglected, or when management teams have unusually strong visibility into earnings inflections. It may be less informative when macro factors dominate everything or when disclosure is rapidly arbitraged by event-driven funds.

A training window that spans several conditions helps, but no finite sample can immunise a strategy against regime dependence. The best we can do is avoid pretending that one period’s success proves universal truth.

Our publication protocol for /performance

The point of publishing a protocol is to remove improvisation after the fact. If the rules are public, there is less room to tidy the furniture after the guests have arrived.

What we freeze before evaluating 2025 to 2026

For any insider-based signal family that appears on /performance, the following categories should be fixed using only 2022 to 2024 information:

Event definition: Which filings count, which transaction codes are included or excluded, how amendments are handled, and how duplicate or corrected records are reconciled.
Timestamp policy: Whether an event enters the strategy at filing date, filing time, next open, next close, or another explicitly documented implementation point.
Role taxonomy: How insiders are grouped, for example CEO, CFO, chair, director, founder, beneficial owner, related party, and whether those groups receive differential weights.
Signal transformation: Raw notional value, value relative to market cap, ownership change, cluster counts, recency decay, and any winsorisation or normalisation.
Portfolio construction: Long-only or market-neutral, equal weight or risk weight, sector constraints, position caps, liquidity filters, and overlap rules for repeated filings.
Cost model: Slippage, commissions, borrow assumptions if shorting is involved, and any delay between signal observation and execution.
Evaluation metrics: Return, volatility, hit rate, drawdown, turnover, capacity proxies, and benchmark choice.

A protocol that leaves these items vague is not a protocol. It is a mood board.
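One possible way to make a frozen specification tangible is an immutable record with a published fingerprint, sketched here. The field names and default values are hypothetical placeholders, not our actual specification, and the hashing scheme is simply one reasonable choice.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class SignalSpec:
    # All fields and values are hypothetical placeholders for illustration.
    event_codes: tuple = ("P",)
    timestamp_policy: str = "next_open"
    holding_days: int = 20
    winsor_pct: float = 0.01
    weighting: str = "equal"
    cost_bps_per_side: int = 25


def fingerprint(spec: SignalSpec) -> str:
    """Deterministic SHA-256 digest of the specification's contents."""
    payload = json.dumps(asdict(spec), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


spec = SignalSpec()
published = fingerprint(spec)

try:
    spec.holding_days = 10  # any attempt to edit in place is rejected
except FrozenInstanceError:
    print("spec is frozen; a change requires a new, separately versioned spec")

print(published[:12])
```

Publishing the digest before the test window opens lets a reader verify later that the specification being evaluated is the one that was declared.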

What we do not do

We do not use the 2025 to 2026 period to choose the best holding period, decide whether to include one awkward transaction code, or discover that a signal is “really” strongest in micro-caps after all. We also do not exclude ugly episodes unless the exclusion rule was itself pre-specified in training and justified on operational grounds.

That sounds strict because it is strict. It is also the minimum needed for out-of-sample claims to mean anything.

Why we still monitor live behaviour during the test window

Freezing the model does not mean ignoring reality. During 2025 to 2026 we still monitor data quality, late filings, symbol mapping errors, and implementation drift. If a regulator changes dissemination format, or an exchange changes market-data identifiers, the pipeline may require maintenance.

But maintenance is not model revision. Fixing a broken parser is allowed. Reweighting CFO purchases because they had a nice quarter is not.

Evidence from the literature, and its limits

Insider trading has a long academic literature, and much of it supports the broad proposition that some insider purchases contain information about future returns. The problem is that broad propositions are not tradable by themselves.

What the literature generally finds

Classic and subsequent studies have often found that insider purchases, especially open-market buys, are more informative than sales, and that firm characteristics such as size, valuation and information opacity can affect the strength of the signal. Researchers have also examined the importance of insider rank, trade clustering and transaction size.

These findings are useful because they narrow the search space. They do not eliminate the need for clean protocol design. Academic significance can evaporate after costs, delays, implementation constraints and publication bias are accounted for.

Why published anomalies need harsher testing

An anomaly reported in a paper has already survived one selection process, namely the process by which it became interesting enough to publish. That creates a subtle optimism bias. The strategy concept may be real, but the reported effect size is often inflated by specification search, sample luck, and the tendency of journals to prefer positive results over null findings.

A walk-forward framework is one way to apply harsher discipline. If a signal family still behaves sensibly after a pre-declared split, confidence should rise. If not, the literature may still be directionally right while the implementable edge is too weak, too crowded or too unstable.

Why we cite regulators as well as papers

For insider-event strategies, legal and operational details are not side notes. They are the difference between a valid event study and a fantasy novel with timestamps.

That is why our source stack leans on regulators and standard setters, not just academic papers. SEC filing rules, ESMA guidance, and national competent authority materials matter because they define what the market can know, and when.

What a good reader should ask of any backtest

Readers should be mildly suspicious by default. Not cynical, just professionally difficult.

Ask when the signal became observable

If the article cannot tell you whether the event clock starts at transaction date, filing date, filing timestamp or next tradable session, the result is not ready for adult supervision.

Ask what was tuned, and when

A serious piece of research should state which parameters were chosen in training and whether any choices were revisited after seeing test results. If the answer is vague, assume the model had more editorial assistance from hindsight than advertised.

Ask how ugly data were handled

Insider datasets contain amendments, duplicates, stale identifiers, corporate actions and occasional filing oddities that resist neat classification. A robust protocol explains how these were treated and whether the treatment was deterministic.
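As an illustration of deterministic treatment, the sketch below resolves duplicates and amendments by keeping, for each event, the latest version that had been filed as of a given cut-off. The record layout, identifiers and tie-break rule are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Filing:
    filing_id: str   # hypothetical unique identifier for the filing
    event_key: str   # hypothetical key linking an amendment to its original
    filed_at: datetime
    shares: int


def resolve(filings, as_of: datetime) -> dict:
    """Latest known version of each event as of the cut-off, order-independent."""
    latest = {}
    for f in sorted(filings, key=lambda f: (f.filed_at, f.filing_id)):
        if f.filed_at <= as_of:
            latest[f.event_key] = f
    return latest


original = Filing("F1", "E1", datetime(2025, 3, 5, 17, 0), 1000)
duplicate = Filing("F1", "E1", datetime(2025, 3, 5, 17, 0), 1000)
amendment = Filing("F2", "E1", datetime(2025, 3, 12, 9, 0), 1500)
records = [amendment, duplicate, original]

before = resolve(records, datetime(2025, 3, 6))
after = resolve(records, datetime(2025, 3, 13))
print(before["E1"].shares, after["E1"].shares)
```

Because the cut-off is explicit, the amended share count cannot leak into a signal computed before the amendment was filed.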

Ask whether costs and liquidity were considered

An event-driven signal in small or illiquid names can look marvellous before costs and impossible after them. Turnover, spread, market impact and capacity are not administrative details. They are the business model.
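The arithmetic behind that claim is short. The sketch below subtracts a round-trip cost from a gross per-trade return; every number is invented and in basis points, and the three cost components are a simplification of a real cost model.

```python
def net_bps(gross_bps: int, half_spread_bps: int,
            commission_bps: int, impact_bps: int) -> int:
    """Gross return minus costs paid on both entry and exit."""
    per_side = half_spread_bps + commission_bps + impact_bps
    return gross_bps - 2 * per_side


# Hypothetical liquid large cap: same gross edge, low frictions.
large_cap = net_bps(gross_bps=120, half_spread_bps=5,
                    commission_bps=1, impact_bps=4)

# Hypothetical illiquid small cap: same gross edge, heavy frictions.
small_cap = net_bps(gross_bps=120, half_spread_bps=40,
                    commission_bps=5, impact_bps=20)

print(large_cap, small_cap)
```

With identical gross returns, the liquid name keeps 100 basis points and the illiquid name loses 10.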

Ask whether the failure cases are shown

A strategy report that only presents aggregate success is marketing. Research should show drawdowns, weak subperiods, and the kinds of names or regimes where the signal struggled. If the method only works in flattering weather, that is useful to know before the storm.

What this means for our readers now

For Sigma Journal’s /performance work, the 2022 to 2024 training and 2025 to 2026 test split is less about page architecture than intellectual hygiene. It is our attempt to make every published result answer a fair question.

Did the signal, specified using the earlier period, continue to function in the later one?

That question is narrower than the grand claims often made for insider signals. It is also more useful. A narrow honest answer beats a sweeping contaminated one.

There is a final point worth making. Clean out-of-sample testing does not make a strategy good. It makes the evidence interpretable. If the result is weak, we learn something real. If the result is strong, we have earned the right to be only moderately pleased with ourselves, which is the correct amount in this business.

The next concrete step is straightforward: publish each signal family’s frozen specification alongside its 2025 to 2026 realised test behaviour, then let readers inspect where the protocol held firm and where market structure still raises open questions, especially around filing availability, costs and cross-market comparability.
