A large insider-filings dataset invites parameterisation. That is not a vice. The vice is selective memory about what was tried.
Step 1. Define the economic hypotheses before the fine grid
We start with higher-level hypotheses, for example:
- clustered open-market buys by senior executives may signal undervaluation,
- sales may be less informative because of diversification and tax motives,
- larger trades relative to prior holdings may carry more information,
- shorter post-filing windows may capture faster information diffusion,
- and microcaps may show stronger raw effects but worse implementability.
These are economic ideas. The parameter grid comes after, not before.
Step 2. Enumerate the full model family
Once the hypotheses are set, we generate the strategy family systematically. This includes all combinations of:
- event filters,
- role classifications,
- size thresholds,
- portfolio construction rules,
- entry timing,
- exit timing,
- rebalancing cadence,
- and cost assumptions.
The point is reproducibility and an honest denominator. If a specification was available to be chosen, it belongs in the count.
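Here is a minimal sketch of the enumeration step, assuming a plain dictionary of design dimensions. The dimension names and levels in `GRID` are hypothetical placeholders that mirror the list above. They are not the grid behind this article. What matters is that the trial count comes from the code, not from anyone's memory.

```python
from itertools import product

# Hypothetical design dimensions, one per item in the list above.
# Names and levels are illustrative only, not this article's actual grid.
GRID = {
    "event_filter": ["single_buy", "cluster_buy", "any_sale"],
    "role": ["ceo_cfo", "director", "any_officer"],
    "min_trade_pct_of_holdings": [0.0, 0.05, 0.10],
    "construction": ["equal_weight", "value_weight"],
    "entry_lag_days": [1, 2],
    "exit_days": [5, 21, 63],
    "rebalance": ["daily", "weekly"],
    "cost_bps": [10, 30],
}


def enumerate_family(grid):
    """Yield every specification in the full Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


specs = list(enumerate_family(GRID))
# Every generated specification counts as a trial, whether or not anyone inspected it.
n_trials = len(specs)
print(n_trials)  # 3 * 3 * 3 * 2 * 2 * 3 * 2 * 2 = 1296
```

Real studies have far more levels per dimension. Because the product grows multiplicatively, a handful of design choices can produce counts in the hundreds of thousands.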
Step 3. Partition the data honestly
At minimum, we separate:
- a discovery sample,
- a validation sample,
- and ideally a final holdout or forward period.
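For time-series and event-driven data, this partition should respect chronology. Random shuffling splits near-simultaneous filings, whose post-filing return windows overlap, across both samples, and it lets later market conditions inform earlier choices. The result can be tidy cross-validation metrics and untidy real-world disappointment.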
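One way to implement a chronological split is sketched below under stated assumptions. Events arrive as `(filing_date, payload)` pairs, and the boundary dates are hypothetical. An optional embargo drops events filed just before a boundary, because their holding windows would otherwise spill into the next sample. A natural choice is an embargo at least as long as the longest exit window in the grid. `chronological_split` is an illustrative helper, not an existing library function.

```python
from datetime import date, timedelta


def chronological_split(events, discovery_end, validation_end, embargo_days=0):
    """Split (filing_date, payload) events into discovery, validation and holdout by date.

    Events filed within `embargo_days` before a boundary are dropped, on the assumption
    that their post-filing return windows would otherwise overlap the next sample.
    """
    embargo = timedelta(days=embargo_days)
    samples = {"discovery": [], "validation": [], "holdout": []}
    # Sort on the date only, so payloads never need to be comparable.
    for filing_date, payload in sorted(events, key=lambda e: e[0]):
        if filing_date <= discovery_end:
            if filing_date > discovery_end - embargo:
                continue
            samples["discovery"].append((filing_date, payload))
        elif filing_date <= validation_end:
            if filing_date > validation_end - embargo:
                continue
            samples["validation"].append((filing_date, payload))
        else:
            samples["holdout"].append((filing_date, payload))
    return samples


# Hypothetical events and boundaries, for illustration only.
events = [
    (date(2016, 6, 1), "C"),
    (date(2012, 3, 1), "A"),
    (date(2020, 1, 15), "D"),
    (date(2014, 12, 20), "B"),  # inside the 30-day embargo before the first boundary
]
split = chronological_split(events, date(2014, 12, 31), date(2018, 12, 31), embargo_days=30)
```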
Step 4. Rank candidates using pre-declared criteria
We choose ranking criteria before looking at the leaderboard. Typical examples are:
- net Sharpe,
- t-statistic of alpha,
- drawdown-adjusted return,
- turnover-adjusted information ratio.
The criterion matters because changing it after seeing the results creates another layer of hidden search. One can data-mine the objective function as efficiently as the parameters.
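The sketch below shows what "pre-declared" can look like in code. `RankingProtocol`, `CRITERIA` and `rank_candidates` are hypothetical names. The protocol object is frozen, so it cannot be edited in place once results exist. The stronger safeguard is to record the protocol somewhere timestamped, such as version control, before any backtest runs. Net Sharpe is computed as the annualised ratio of the mean to the sample standard deviation. The sketch assumes the input returns are already net of costs and in excess of the risk-free rate.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingProtocol:
    """Ranking rule fixed before the leaderboard exists (hypothetical fields)."""
    criterion: str = "net_sharpe"
    periods_per_year: int = 252


def annualised_sharpe(net_excess_returns, periods_per_year):
    """Mean over sample standard deviation of per-period net excess returns, annualised."""
    mu = statistics.mean(net_excess_returns)
    sd = statistics.stdev(net_excess_returns)
    return mu / sd * math.sqrt(periods_per_year)


# Only criteria registered here can be chosen, and the protocol picks exactly one.
CRITERIA = {"net_sharpe": annualised_sharpe}


def rank_candidates(results, protocol):
    """Score every candidate with the declared criterion only, best first.

    `results` maps a specification id to its series of net excess returns.
    """
    score = CRITERIA[protocol.criterion]
    scores = {k: score(r, protocol.periods_per_year) for k, r in results.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


PROTOCOL = RankingProtocol()  # declared first; frozen, so it cannot be reassigned field by field
```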
Step 5. Apply search-aware inference
At this stage, the best candidate is not yet a finding. It is a nominee. We then apply search-aware evaluation, using one or more of the methods described above.
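The specific methods this article relies on are described earlier. As a self-contained stand-in, the sketch applies the simplest family-wise correction, the Šidák adjustment, to a nominee's t-statistic. It assumes independent trials and uses a normal approximation to the t-distribution. In a heavily correlated grid the adjustment tends to overstate the penalty, which is one reason the effective number of independent tests matters. All function names are hypothetical.

```python
import math


def one_sided_p_from_t(t_stat):
    """Normal approximation to the one-sided p-value of a t-statistic."""
    return 0.5 * math.erfc(t_stat / math.sqrt(2))


def sidak_adjusted_p(p_single, n_trials):
    """Chance that at least one of n independent null trials looks at least this good.

    Computes 1 - (1 - p)**n via expm1/log1p so tiny p and huge n stay accurate.
    """
    return -math.expm1(n_trials * math.log1p(-p_single))


def sidak_required_p(alpha, n_trials):
    """Single-test p-value a nominee must beat to hold family-wise error at alpha."""
    return -math.expm1(math.log1p(-alpha) / n_trials)


p = one_sided_p_from_t(4.0)              # about 3.2e-5 for a single test
p_family = sidak_adjusted_p(p, 583_000)  # close to 1 across the whole search
```

Under this model, a t-statistic of 4 looks decisive in isolation. Across 583,000 trials, however, at least one null specification is almost certain to exceed it.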
Step 6. Publish the range, not just the champion
A credible article shows:
- where the chosen specification sits in the distribution,
- whether nearby specifications behave similarly,
- how performance changes across subperiods,
- and what happens after costs and delays.
If the winner is isolated and fragile, readers should see that plainly.
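Two of the items above lend themselves to a short sketch. The first summarises the whole distribution of scores. The second locates the discovery-sample champion within the validation-sample distribution. Both functions are hypothetical helpers that operate on plain dictionaries or lists of specification scores. The summary takes its quartiles from Python's `statistics.quantiles`.

```python
import statistics


def summarise_family(scores):
    """Describe the whole distribution of specification scores, not only the maximum."""
    ordered = sorted(scores)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "share_positive": sum(s > 0 for s in ordered) / len(ordered),
    }


def champion_out_of_sample_percentile(discovery, validation):
    """Pick the discovery champion, then locate it in the validation distribution.

    Both arguments map a specification id to a score (for example net Sharpe).
    """
    champion = max(discovery, key=discovery.get)
    oos_score = validation[champion]
    at_or_below = sum(1 for s in validation.values() if s <= oos_score)
    return champion, oos_score, 100.0 * at_or_below / len(validation)
```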
What readers should demand from any article built on large-scale search
The burden is not only on researchers. Readers, allocators, and editors should ask better questions.
Show me the denominator
If someone presents a strategy with a handsome in-sample Sharpe, ask: out of how many trials was this selected? If the answer is vague, the result is too.
Show me the holdout
If there is no untouched sample, there is no clean evidence. There is only iterative persuasion.
Show me the neighbourhood
Does the strategy work across nearby parameter values, or only at one oddly specific setting? Robustness is not glamorous, but it is where genuine signal tends to live.
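A neighbourhood check can be sketched for parameters that have a natural order. The axes below (holding period and minimum trade size) and the score surface are hypothetical. The surface is deliberately flat with a single spike, which is the shape one would expect from a lucky winner rather than a robust one.

```python
import statistics


def neighbours(spec, axes):
    """Specifications one step away from `spec` along exactly one ordered axis.

    `spec` is a tuple of parameter values; `axes` lists the ordered values for each position.
    """
    result = []
    for i, values in enumerate(axes):
        j = values.index(spec[i])
        for k in (j - 1, j + 1):
            if 0 <= k < len(values):
                result.append(spec[:i] + (values[k],) + spec[i + 1:])
    return result


def neighbourhood_report(spec, axes, scores):
    """Compare a specification's score with the scores of its immediate neighbours."""
    nearby = [scores[n] for n in neighbours(spec, axes)]
    return {
        "own": scores[spec],
        "neighbour_min": min(nearby),
        "neighbour_median": statistics.median(nearby),
        "neighbour_max": max(nearby),
    }


# Hypothetical axes: holding period in days, minimum trade size as a share of holdings.
AXES = [[5, 10, 21, 42, 63], [0.0, 0.05, 0.10]]
# Hypothetical scores: a flat surface with one spike.
scores = {(h, m): 0.2 for h in AXES[0] for m in AXES[1]}
scores[(21, 0.05)] = 1.5
report = neighbourhood_report((21, 0.05), AXES, scores)
```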
Show me costs and operational constraints
Insider-based strategies can involve publication lags, liquidity issues, and concentration in small names. Gross returns are often the easy part. Net returns are where fiction goes to be audited.
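The sketch below shows how a publication lag and trading costs enter a single trade's return. It rests on simplifying assumptions: one price series per stock, entry a fixed number of trading days after the filing, and a round-trip cost in basis points subtracted from the simple return. The prices and the 50 bp cost are hypothetical, and microcaps may face considerably higher costs. `trade_net_return` is an illustrative helper.

```python
def trade_net_return(prices, filing_idx, entry_lag, hold, round_trip_cost_bps):
    """Simple return of one insider-signal trade after an entry delay and trading costs.

    `prices` is one stock's price series indexed by trading day. The buy happens
    `entry_lag` days after the filing and the sale `hold` days after that. Costs are
    subtracted additively from the simple return, an approximation that is close
    for small costs.
    """
    entry_idx = filing_idx + entry_lag
    exit_idx = entry_idx + hold
    if exit_idx >= len(prices):
        raise IndexError("holding window runs past the end of the price series")
    gross = prices[exit_idx] / prices[entry_idx] - 1.0
    return gross - round_trip_cost_bps / 10_000


# Hypothetical price path after a filing on day 0.
prices = [100.0, 104.0, 106.0, 107.0, 108.0]
ideal = trade_net_return(prices, 0, entry_lag=0, hold=3, round_trip_cost_bps=0)      # about 7%
realistic = trade_net_return(prices, 0, entry_lag=1, hold=3, round_trip_cost_bps=50)  # about 3.3%
```

In this toy path, a one-day delay plus costs roughly halves the return. With real filings, the size of that effect is an empirical question.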
Show me the failed siblings
A lone successful specification tells you less than the family portrait. If hundreds of close relatives failed, the survivor may be lucky rather than superior.
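A quick simulation makes the family-portrait point concrete. It assumes that every sibling has zero true edge, that daily returns are Gaussian, and that the seed is fixed. The family size and sample length are hypothetical. Each sibling is drawn independently here, which simplifies a real grid, where siblings are correlated.

```python
import math
import random
import statistics


def null_sibling_sharpes(n_siblings, n_days, seed=0, daily_vol=0.01):
    """Annualised Sharpe ratios for a family of strategies whose true mean return is zero."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_siblings):
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        mu = statistics.fmean(returns)
        sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n_days - 1))
        sharpes.append(mu / sd * math.sqrt(252))
    return sharpes


# Hypothetical family: 300 siblings, about three years of daily returns, no edge anywhere.
sharpes = null_sibling_sharpes(n_siblings=300, n_days=756)
best = max(sharpes)
typical = statistics.median(sharpes)
print(f"best null Sharpe {best:.2f}, median {typical:.2f}")
```

With three years of daily data, the best of 300 zero-edge siblings typically shows an annualised Sharpe above 1, while the median sits near zero.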
Why we are explicit about 583,000 trials
There is a temptation to think that disclosing a huge search count somehow weakens the article. It does the opposite. It tells the reader that the research process has not been airbrushed.
A broad search can be entirely legitimate when the underlying phenomenon is complex and the design space is large. Insider-transaction signals are exactly that sort of problem. But legitimacy requires accounting. If 583,000 combinations were explored, then 583,000 combinations belong in the interpretive frame.
The alternative is familiar. A polished chart appears. The article speaks confidently about "the strategy". The search process remains backstage. The reported significance assumes a world in which the chosen specification was ordained from the start. It was not. It won a tournament.
That does not make the strategy false. It makes unadjusted certainty false.
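Putting 583,000 combinations into the interpretive frame can start with simple arithmetic. Consider the worst case, in which no specification has any edge. The sketch counts how many specifications would still clear common thresholds by chance, using a one-sided test and a normal approximation to the t-distribution. The helper name is hypothetical.

```python
import math


def expected_null_passes(n_trials, t_threshold):
    """Expected number of zero-edge specifications whose one-sided t-stat clears the bar.

    By linearity of expectation this holds whether or not the trials are correlated;
    correlation changes how clumped the false passes are, not their expected count.
    """
    p_one_sided = 0.5 * math.erfc(t_threshold / math.sqrt(2))
    return n_trials * p_one_sided


at_5pct = expected_null_passes(583_000, 1.6448536)  # roughly 29,150
at_t3 = expected_null_passes(583_000, 3.0)          # roughly 787
```

Roughly 29,000 specifications would pass at the 5% level, and several hundred would clear a t-statistic of 3. All of them would be noise.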
A note on humility, which is cheaper than drawdowns
There is a cultural problem here as much as a statistical one. Quant research often rewards the reveal, not the restraint. The incentive is to present the best result with just enough caveat to satisfy the legal department and not enough to disturb the sales team.
We prefer a duller standard. If a signal survives broad search, out-of-sample testing, and search-aware adjustment, it is more interesting, not less. If it does not, the research was still useful. It mapped where the edge is not.
That is not failure. It is inventory control for bad ideas.
The concrete next step is straightforward: for every future insider-signal article built on broad parameter search, publish the trial count, the selection protocol, and at least one search-aware statistic alongside the headline backtest. The open question is the harder one, and the more valuable one: what is the effective number of independent tests in a highly correlated insider-strategy grid, and how stable is that estimate across market regimes?
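One possible starting point for the open question, though not an answer to it, is Nyholt's eigenvalue-based estimator of the effective number of tests. The estimator is borrowed from statistical genetics, and here it is applied to the correlation matrix of strategy return series. The sketch assumes the series are aligned and of equal length, and that Pearson correlation adequately summarises their dependence. A full 583,000 × 583,000 matrix (about 3.4 × 10^11 entries) is impractical, so in practice one would work on a sample or a clustered summary of the grid. Recomputing the estimate per subperiod is one way to probe regime stability. Both function names are hypothetical.

```python
import math
import statistics


def pearson_matrix(series):
    """Pearson correlation matrix of equal-length return series."""
    means = [statistics.fmean(s) for s in series]
    dev = [[x - m for x in s] for s, m in zip(series, means)]
    norms = [math.sqrt(sum(d * d for d in row)) for row in dev]
    m = len(series)
    return [
        [1.0 if i == j else sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
         for j in range(m)]
        for i in range(m)
    ]


def nyholt_effective_tests(corr):
    """Nyholt-style effective number of independent tests for an M x M correlation matrix.

    Nyholt's formula is M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M). The eigenvalues
    of a correlation matrix average to 1 and their squares sum to the sum of squared
    entries, so Var(eigenvalues) = (sum of squared entries - M) / (M - 1) without any
    eigendecomposition.
    """
    m = len(corr)
    if m < 2:
        return float(m)
    sum_sq = sum(r * r for row in corr for r in row)
    var_eig = (sum_sq - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_eig / m)
```

As sanity checks, four uncorrelated strategies give 4 effective tests, four perfectly correlated ones give 1, and three strategies with pairwise correlation 0.5 give 2.5.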