What a robust test would look like
Let us be concrete. If Sigma Journal were publishing this from a fully populated query, the minimum viable empirical design would separate descriptive and adjusted results.
Descriptive layer
Start with open-market purchases and sales separately. Report counts by gender, market, and role. Then show T+90 raw and benchmark-adjusted returns, equal-weighted and value-weighted, plus hit rates.
This layer answers the simple newsroom question: what happened after these trades? It is not enough for inference, but it grounds the discussion.
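A minimal sketch of what this descriptive table could look like in code follows. The `Trade` fields, the gender coding, and the choice of transaction value as the value weight are assumptions made for illustration, and the sample rows are made up rather than drawn from any dataset. Only gender and direction are grouped here; market and role could be added to the grouping key in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trade:
    gender: str        # "F" or "M" (hypothetical coding)
    direction: str     # "buy" or "sell"
    value: float       # transaction value, used here as the value weight
    adj_ret_90: float  # benchmark-adjusted return over T+90


def describe(trades):
    """Summarise by (gender, direction): count, EW and VW mean return, hit rate."""
    groups = defaultdict(list)
    for t in trades:
        groups[(t.gender, t.direction)].append(t)

    summary = {}
    for key, rows in groups.items():
        total_value = sum(r.value for r in rows)
        # A "hit" is a trade that went the insider's way:
        # positive adjusted return after a buy, negative after a sale.
        hits = [
            (r.adj_ret_90 > 0) if r.direction == "buy" else (r.adj_ret_90 < 0)
            for r in rows
        ]
        summary[key] = {
            "n": len(rows),
            "ew_ret": mean(r.adj_ret_90 for r in rows),
            "vw_ret": sum(r.value * r.adj_ret_90 for r in rows) / total_value,
            "hit_rate": sum(hits) / len(rows),
        }
    return summary


# Made-up illustrative rows, not real data.
sample = [
    Trade("F", "buy", 100.0, 0.04),
    Trade("F", "buy", 300.0, -0.02),
    Trade("M", "buy", 200.0, 0.01),
    Trade("M", "sell", 500.0, -0.03),
]
for key, stats in sorted(describe(sample).items()):
    print(key, stats)
```

In the made-up sample the two female purchases have a positive equal-weighted mean and a negative value-weighted mean, which is exactly why both weightings belong in the table.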
Adjusted layer
Then estimate a model along these lines:
\[
\mathrm{Return}_{i,T+90}^{adj} = \alpha + \beta_1 \mathrm{Female}_i + \beta_2 \mathrm{Buy}_i + \beta_3 (\mathrm{Female}_i \times \mathrm{Buy}_i) + \mathrm{Controls} + \mathrm{FE} + \varepsilon_i
\]
Here the controls include role, log market cap, transaction size, and perhaps a routine-trade proxy. Sector, market, and year are categorical, so they enter most naturally as fixed effects. Standard errors should be clustered by issuer, which already allows for correlation when multiple insiders trade around the same event. Clustering only at the finer issuer-date level would miss repeated trades in the same issuer over time.
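The coefficient of interest is not simply the one on Female. In this specification \(\beta_1\) is the gender gap among sales, \(\beta_3\) is how much that gap changes for purchases, and the gap among purchases is \(\beta_1 + \beta_3\). The purchase gap is usually the quantity to watch, because the informational content of buys is where one expects a cleaner signal. If women "outperform" only in pooled trades and not in buys, the result deserves suspicion.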
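The reading of the coefficients can be checked directly from the specification. This is a short derivation, assuming Female and Buy are 0/1 indicators and holding the controls and fixed effects constant so that they cancel in each difference:

```latex
\[
\begin{aligned}
E[\mathrm{Return}^{adj} \mid \mathrm{Female}=1, \mathrm{Buy}=0] - E[\mathrm{Return}^{adj} \mid \mathrm{Female}=0, \mathrm{Buy}=0]
  &= (\alpha + \beta_1) - \alpha = \beta_1 \\
E[\mathrm{Return}^{adj} \mid \mathrm{Female}=1, \mathrm{Buy}=1] - E[\mathrm{Return}^{adj} \mid \mathrm{Female}=0, \mathrm{Buy}=1]
  &= (\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2) = \beta_1 + \beta_3
\end{aligned}
\]
```

So the gender gap among sales is \(\beta_1\), the gap among purchases is \(\beta_1 + \beta_3\), and \(\beta_3\) on its own measures only the difference between those two gaps.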
Event clustering and duplicate information
Insider trades cluster around the same corporate information set. If three executives buy within days of each other, those are not independent observations in the way a textbook t-test would prefer. Nor are repeated trades by the same person in the same issuer over a short window.
A robust implementation should either aggregate clustered trades into a single event or account for the clustering statistically. Otherwise, significance will be overstated, because the effective number of independent observations is smaller than the number of trades. This is one of the oldest ways to manufacture a publishable p-value in empirical finance. The method is cheap. The confidence is expensive.
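The aggregation route can be sketched as follows. This is one possible rule, not a standard: trades in the same issuer and direction are chained into one event while each trade falls within `window_days` of the previous one. The `Trade` fields, the five-day window, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Trade:
    issuer: str
    insider: str
    trade_date: date
    direction: str  # "buy" or "sell"


def group_into_events(trades, window_days=5):
    """Group same-issuer, same-direction trades into events.

    A trade joins the current event if it falls within `window_days`
    (calendar days) of the previous trade in that event; otherwise it
    starts a new event.
    """
    gap = timedelta(days=window_days)
    ordered = sorted(trades, key=lambda t: (t.issuer, t.direction, t.trade_date))
    events = []
    for t in ordered:
        last = events[-1][-1] if events else None
        same_event = (
            last is not None
            and last.issuer == t.issuer
            and last.direction == t.direction
            and t.trade_date - last.trade_date <= gap
        )
        if same_event:
            events[-1].append(t)
        else:
            events.append([t])
    return events


# Made-up illustrative rows, not real data.
trades = [
    Trade("ACME", "cfo", date(2023, 3, 1), "buy"),
    Trade("ACME", "ceo", date(2023, 3, 3), "buy"),
    Trade("ACME", "chair", date(2023, 3, 6), "buy"),
    Trade("ACME", "ceo", date(2023, 6, 1), "buy"),
    Trade("BETA", "cfo", date(2023, 3, 2), "buy"),
]
events = group_into_events(trades)
print([len(e) for e in events])
```

Five trades collapse into three events here, and the event rather than the trade becomes the unit of observation for any subsequent test.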
Routine versus opportunistic
The routine-versus-opportunistic distinction introduced by Cohen, Malloy and Pomorski is especially relevant here. If men trade more routinely and women trade more selectively, then the average female trade may look better even if there is no intrinsic gender effect after conditioning on trade type.
That is not a problem. It is, arguably, the result. But one should name it correctly. The edge would then be trade selection behaviour associated with gender composition, not some essential property of women as insiders. Markets are complicated enough without importing astrology.
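One common proxy treats an insider as routine if they traded in the same calendar month in several consecutive years. The sketch below is a simplified version of that idea. The three-year threshold, the `(insider, date)` input shape, and the sample rows are assumptions, and it classifies on the full history, whereas a real backtest would use only trades known before each classification date.

```python
from collections import defaultdict
from datetime import date


def routine_insiders(trades, min_years=3):
    """Return insiders who traded in the same calendar month in at least
    `min_years` consecutive years (a simplified routine-trader proxy).

    `trades` is an iterable of (insider_id, trade_date) pairs.
    """
    years_by_month = defaultdict(set)  # (insider, month) -> years with a trade
    for insider, trade_date in trades:
        years_by_month[(insider, trade_date.month)].add(trade_date.year)

    routine = set()
    for (insider, _month), years in years_by_month.items():
        run, prev = 0, None
        for y in sorted(years):
            # Extend the run if this year directly follows the previous one.
            run = run + 1 if prev is not None and y == prev + 1 else 1
            prev = y
            if run >= min_years:
                routine.add(insider)
                break
    return routine


# Made-up illustrative rows, not real data.
history = [
    ("A", date(2019, 3, 12)), ("A", date(2020, 3, 9)), ("A", date(2021, 3, 15)),
    ("B", date(2019, 3, 4)), ("B", date(2020, 7, 21)), ("B", date(2021, 11, 2)),
]
print(routine_insiders(history))
```

Insider A, who trades every March, is flagged as routine; insider B, whose trades fall in different months, is left in the opportunistic group. The resulting flag is the kind of routine-trade proxy that could enter the adjusted model as a control.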
Why cross-market results can differ even if the underlying behaviour does not
Suppose the eventual analysis finds that women insiders outperform men in one market but not in another. That would be entirely plausible without requiring any dramatic story.
Disclosure timing changes investability
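Under MAR, persons discharging managerial responsibilities (PDMRs) and persons closely associated with them must notify transactions promptly, and no later than three business days after the transaction, once the annual threshold is exceeded; the issuer then makes the notification public. In the US, Form 4 is generally due within two business days of the transaction and is filed in a standardized electronic format, which makes it easier to parse. A strategy measured from transaction date may look stronger in one regime simply because public visibility arrives sooner relative to the trade.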
If your T+90 starts from disclosure date, this issue is mitigated. If it starts from transaction date, the cross-market comparison becomes partly a reporting-lag comparison.
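The difference between the two starting points is easy to see in a small sketch. It assumes T+90 means 90 calendar days, that prices are held in a dict keyed by date, and that a missing date rolls forward to the next available price; the function name and the prices themselves are made up for illustration.

```python
from datetime import date, timedelta


def window_return(prices, start, horizon_days=90):
    """Simple return from the first price on or after `start` to the first
    price on or after `start + horizon_days` (calendar days)."""
    def first_on_or_after(d):
        for day in sorted(prices):
            if day >= d:
                return prices[day]
        raise ValueError(f"no price on or after {d}")

    p0 = first_on_or_after(start)
    p1 = first_on_or_after(start + timedelta(days=horizon_days))
    return p1 / p0 - 1.0


# Made-up prices: the stock moves between the trade and its disclosure.
prices = {
    date(2023, 1, 2): 100.0,  # transaction date
    date(2023, 1, 5): 104.0,  # disclosure date
    date(2023, 4, 2): 110.0,  # transaction date + 90 days
    date(2023, 4, 5): 110.0,  # disclosure date + 90 days
}
from_trade = window_return(prices, date(2023, 1, 2))
from_disclosure = window_return(prices, date(2023, 1, 5))
print(round(from_trade, 4), round(from_disclosure, 4))
```

In this made-up case part of the return measured from the transaction date accrues before an outside investor could have seen the trade, so the return measured from disclosure is smaller. The longer the reporting lag, the larger that wedge can be.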
Corporate governance composition differs
Board structures and executive pipelines differ across markets. So does the representation of women in executive versus non-executive roles. If one market has a higher share of women in independent directorships and another has more women in operational executive roles, the informational content of their trades may not be directly comparable.
Sector mix matters more than people admit
A market heavy in financials and industrials is not the same as one heavy in healthcare or technology. Insider trades in R&D-heavy sectors can behave differently from those in regulated or asset-heavy sectors. If the gender composition varies by sector, and sector returns vary over the sample, one can mistake a sector effect for a gender effect with surprising ease.
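A toy example shows how this can happen. The rows below are made up so that women and men earn identical returns inside each sector, but women are concentrated in the sector that happened to do well. The sector labels, return values, and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (gender, sector, adjusted return): made-up rows for illustration only.
rows = (
    [("F", "tech", 0.06)] * 8 + [("F", "fin", 0.01)] * 2
    + [("M", "tech", 0.06)] * 2 + [("M", "fin", 0.01)] * 8
)


def gender_gap(rows):
    """Female-minus-male difference in mean return."""
    f = mean(r for g, _, r in rows if g == "F")
    m = mean(r for g, _, r in rows if g == "M")
    return f - m


def within_sector_gaps(rows):
    """Female-minus-male gap computed separately inside each sector."""
    by_sector = defaultdict(list)
    for row in rows:
        by_sector[row[1]].append(row)
    return {sector: gender_gap(rs) for sector, rs in by_sector.items()}


print("pooled gap:", round(gender_gap(rows), 4))
print("within-sector gaps:", within_sector_gaps(rows))
```

The pooled gap is about three percentage points in favour of women, while the gap inside each sector is zero. Sector fixed effects, or a sector-matched benchmark, are what remove this kind of composition effect.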