Do Women Insiders Outperform Men? A Data-Driven Analysis, Sigma Journal


The absence of article-specific query output is not fatal. It is, in fact, a useful discipline. It forces the article to separate what is known from what is merely plausible. That is rarer in finance publishing than it should be.

Why T+90 is the right horizon to argue about

A same-day or T+5 event study captures immediate information content, but it also captures filing mechanics, market microstructure, and the fact that some disclosures arrive when the market is already halfway to the answer. T+90 is different. It asks whether insiders are not only informed, but directionally right over a quarter.

That horizon has advantages. It smooths some noise. It aligns with the idea that insiders may trade ahead of business developments that take time to show up in earnings, guidance, contract wins, or strategic announcements. It is also long enough to observe whether a purchase was merely brave or actually prescient.

It has disadvantages too. Ninety days is enough time for many unrelated things to happen, and if the count is in trading days rather than calendar days, the window stretches to roughly four months. Macro shocks, sector rotations, index rebalancing, M&A rumours, and earnings cycles all crowd into the window. The longer the horizon, the less one can attribute returns cleanly to the insider's private information. T+90 is a good test of practical investability, but a less pure test of information revelation.

Buys and sells should not be thrown into one bucket

This point is old, but still neglected. Insider purchases are generally more informative than sales. A sale can reflect diversification, tax obligations, estate planning, divorce, school fees, or the sort of domestic budget item that never makes it into a factor model. A purchase is harder to explain away. It requires fresh capital and usually a stronger view.

Any gender comparison that pools buys and sells risks measuring composition rather than skill. If women insiders are underrepresented in large founder-led liquidity programmes, or if men dominate option-exercise-related sales, a pooled result can become nonsense with decimal places.
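A toy calculation makes the point. The numbers below are invented for illustration and come from no dataset: both hypothetical groups earn the same average raw T+90 return within buys and within sells, yet the pooled averages differ simply because one group's trades are 60% buys and the other's 20%.

```python
# Invented toy numbers for illustration only; they are not results from any dataset.
# Mean raw T+90 return by (group, trade type): identical across groups within each type.
MEAN_RETURN = {
    ("women", "buy"): 0.04,
    ("men", "buy"): 0.04,
    ("women", "sell"): -0.01,
    ("men", "sell"): -0.01,
}

# The only difference between the hypothetical groups is the buy/sell mix.
BUY_SHARE = {"women": 0.60, "men": 0.20}


def pooled_mean(group: str) -> float:
    """Average raw T+90 return when a group's buys and sells are pooled."""
    share = BUY_SHARE[group]
    return share * MEAN_RETURN[(group, "buy")] + (1 - share) * MEAN_RETURN[(group, "sell")]


def within_type_gap(trade_type: str) -> float:
    """Women-minus-men gap inside a single trade type."""
    return MEAN_RETURN[("women", trade_type)] - MEAN_RETURN[("men", trade_type)]


if __name__ == "__main__":
    gap = pooled_mean("women") - pooled_mean("men")
    print(f"pooled gap: {gap:.2%}")  # about 2 percentage points, all of it from the mix
    print(f"within buys: {within_type_gap('buy'):.2%}, within sells: {within_type_gap('sell'):.2%}")
```

The whole pooled "edge" is composition; splitting by trade type makes it disappear.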

T+90 should be benchmark-aware

Raw returns are a start. Excess returns are the point. A woman insider buying a small-cap industrial before a broad small-cap rally is not necessarily a better signal generator than a man insider buying a software name in a flat tape. At minimum, one wants market-adjusted returns. Better still, sector- and size-adjusted abnormal returns, with issuer clustering handled properly.

If the eventual internal query uses simple raw T+90 returns, the article should say so plainly. If it uses benchmark-adjusted returns, even better. If it uses a proper calendar-time portfolio or matched-firm approach, better still. The difference is not cosmetic. It changes whether the result is a signal or just a reflection of where each group tends to sit in the market.
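A minimal sketch of the raw and market-adjusted versions, assuming daily closing prices for the stock and its benchmark already aligned on the same trading calendar, and assuming T+90 means 90 trading days (rows) after the start index. The function names are hypothetical. A sector- or size-matched benchmark would simply replace the benchmark series; the arithmetic is unchanged.

```python
from typing import Optional, Sequence

HORIZON = 90  # assumed: T+90 counted in trading days, i.e. rows of the price list


def forward_return(prices: Sequence[float], start: int, horizon: int = HORIZON) -> Optional[float]:
    """Simple return from prices[start] to prices[start + horizon].

    Returns None when the window runs past the data, so recent trades are not
    silently scored on a shorter horizon.
    """
    end = start + horizon
    if start < 0 or end >= len(prices):
        return None
    return prices[end] / prices[start] - 1.0


def market_adjusted_return(stock: Sequence[float], benchmark: Sequence[float],
                           start: int, horizon: int = HORIZON) -> Optional[float]:
    """Raw T+90 stock return minus the benchmark's return over the same window."""
    raw = forward_return(stock, start, horizon)
    bench = forward_return(benchmark, start, horizon)
    if raw is None or bench is None:
        return None
    return raw - bench


if __name__ == "__main__":
    stock = [100.0] * 90 + [110.0]  # toy series: +10% over the window
    index = [50.0] * 90 + [54.0]    # toy benchmark: +8% over the same window
    print(forward_return(stock, 0), market_adjusted_return(stock, index, 0))
```

Returning None for incomplete windows, rather than truncating them, keeps late-sample trades from entering the comparison on a different horizon.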

The literature gives hints, not a verdict

The cleanest way to approach the gender question is to avoid pretending it starts with insiders. It starts with broader evidence on investment behaviour.

Barber and Odean's work on individual investors remains the classic reference point. Men, on average, traded more aggressively than women, and that excess trading hurt performance after costs. The mechanism proposed was overconfidence. That does not prove women insiders outperform men insiders. It does suggest a channel through which trade selectivity could differ.

In the corporate-insider literature, the most robust stylised fact is not about gender but about the informativeness of purchases, especially by senior executives and especially in clusters. Seyhun's long line of research established that insider trades contain information, though the magnitude and persistence vary by period, market, and implementation. Cohen, Malloy and Pomorski later sharpened the distinction between routine and opportunistic insider trading. Once trades are classified by pattern, the opportunistic subset carries more signal.

That matters for gender analysis because any observed difference may simply reflect trade type. If women insiders are more likely to trade opportunistically and less likely to trade routinely, then "women outperform men" is partly a statement about trade classification. Useful, yes. Fundamental, not quite.

The role mix problem

A cross-market dataset will contain directors, CEOs, CFOs, supervisory board members, non-executive directors, beneficial owners, and assorted persons closely associated. The gender distribution across those categories is not even remotely uniform. Neither is the information content of their trades.

CFO purchases may differ from non-executive director purchases. Founder-chair sales differ from independent director sales. If women are disproportionately represented in one role bucket and men in another, a naive comparison is contaminated from the start.

A proper test would stratify or control for at least:

role or title,
issuer size,
sector,
transaction type,
transaction value,
whether the trade is first after a quiet period or part of a sequence,
market and reporting regime,
and ideally whether the trade appears routine.

Without that, one can still report descriptive differences. One should not call them causal. Finance has enough causal language already.
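Stratification can start as simply as computing the gender gap inside each cell and refusing to report a gap where either group is too thin. A sketch, assuming each trade is a dict with hypothetical fields `gender` ("F" or "M"), `ret_adj` (benchmark-adjusted T+90 return) and whichever stratification keys are chosen; the minimum cell size of 2 is only for a toy example.

```python
from collections import defaultdict
from statistics import mean

MIN_PER_GROUP = 2  # toy threshold; a real study would set a much higher floor


def stratified_gaps(trades, keys=("role", "trade_type"), min_n=MIN_PER_GROUP):
    """Women-minus-men mean adjusted return inside each cell defined by `keys`.

    Assumes each trade has 'gender' ("F" or "M") and 'ret_adj'. Cells where
    either group has fewer than `min_n` trades keep their counts but get
    gap=None, so thin cells stay visible instead of silently disappearing.
    """
    cells = defaultdict(lambda: {"F": [], "M": []})
    for t in trades:
        cells[tuple(t[k] for k in keys)][t["gender"]].append(t["ret_adj"])
    report = {}
    for cell, groups in sorted(cells.items()):
        n_f, n_m = len(groups["F"]), len(groups["M"])
        gap = mean(groups["F"]) - mean(groups["M"]) if min(n_f, n_m) >= min_n else None
        report[cell] = {"n_women": n_f, "n_men": n_m, "gap": gap}
    return report
```

Adding sector, market or a size bucket to `keys` is how the checklist maps onto the code. Every extra key multiplies the cells and thins them, which is why the counts sit next to each gap.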

Cross-market means cross-regime

The phrase "cross-market" sounds grand and scientific until one remembers that insider disclosure rules differ materially. Europe operates under the Market Abuse Regulation framework, with PDMR reporting obligations under Article 19. The United States uses Section 16 reporting, with Forms 3, 4, and 5. Deadlines, thresholds, field formats, and the practical ease of extracting clean data all vary.

That matters because the investable signal depends on when the market learns about the trade. It also matters because certain transaction types are more clearly labelled in some jurisdictions than others. If one market records a transaction as an acquisition with rich metadata and another wraps several variants into a broad category, comparability suffers.

Market	Regulator	Rule	Deadline (business days)	Notes
FR	AMF	MAR Art 19	T+3	PDMRs and persons closely associated must notify transactions once the annual threshold is reached; public dissemination mechanics can vary by issuer.
EU	ESMA / national regulators	MAR Art 19	T+3	Harmonised framework in principle, uneven implementation details in practice.
US	SEC	Exchange Act Section 16, Form 4	T+2	Form 4 data are relatively structured, but transaction coding still requires care around derivatives, gifts, and planned trades.

Disclosure regimes shape signal timing before anyone starts talking about gender.

What a robust test would look like

Let us be concrete. If Sigma Journal were publishing this from a fully populated query, the minimum viable empirical design would separate descriptive and adjusted results.

Descriptive layer

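Start with open-market purchases and sales separately. Report counts by gender, market, and role. Then show T+90 raw and benchmark-adjusted returns, equal-weighted and value-weighted, plus hit rates: the share of purchases followed by a positive adjusted return and the share of sales followed by a negative one.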

This layer answers the simple newsroom question: what happened after these trades? It is not enough for inference, but it grounds the discussion.
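The descriptive layer fits in one small function. A sketch under stated assumptions: trades are dicts with hypothetical fields `gender`, `trade_type` ("buy" or "sell"), `ret_adj` and `value`, where value-weighting uses transaction value (market-cap weighting is an equally defensible choice), and a "hit" is a positive adjusted return after a buy or a negative one after a sale.

```python
from collections import defaultdict


def describe(trades):
    """Counts, equal- and value-weighted mean adjusted returns, and hit rates.

    Groups by (gender, trade_type). Assumes 'value' is the transaction value
    used as the weight, and that a hit is a positive adjusted return after a
    buy or a negative adjusted return after a sale.
    """
    groups = defaultdict(list)
    for t in trades:
        groups[(t["gender"], t["trade_type"])].append(t)
    table = {}
    for key, rows in sorted(groups.items()):
        rets = [r["ret_adj"] for r in rows]
        weights = [r["value"] for r in rows]
        sign = 1 if key[1] == "buy" else -1  # flip sales so "right" is always > 0
        table[key] = {
            "n": len(rows),
            "ew_mean": sum(rets) / len(rets),
            "vw_mean": sum(w * x for w, x in zip(weights, rets)) / sum(weights),
            "hit_rate": sum(1 for x in rets if sign * x > 0) / len(rets),
        }
    return table
```

Keeping `n` in every row is deliberate: the female cells will usually be the thin ones, and readers should see that before they see the spread.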

Adjusted layer

Then estimate a model along these lines:

\[ \text{Return}^{\text{adj}}_{i,T+90} = \alpha + \beta_1\,\text{Female}_i + \beta_2\,\text{Buy}_i + \beta_3\,(\text{Female}_i \times \text{Buy}_i) + \text{Controls}_i + \text{FE} + \varepsilon_i \]

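Here the controls include role, log market cap, sector, transaction size, market, year, and perhaps a routine-trade proxy; fixed effects can include market and time. Standard errors should be clustered by issuer. Clustering by issuer already absorbs correlation among several insiders trading the same stock around the same event, so the more useful second dimension is calendar time: overlapping T+90 windows share market-wide shocks, which argues for two-way clustering by issuer and date (or month).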

The coefficient of interest is not simply β₁ on Female, which in this specification measures the gender gap among sales (Buy = 0). Among purchases the gap is β₁ + β₃, and β₃ itself tests whether the purchase gap differs from the sales gap. Purchases are where one expects the cleaner signal. If women "outperform" only in pooled trades and not in buys, the result deserves suspicion.
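The specification can be sketched with plain NumPy. This is an illustration under assumptions, not a production pipeline: Female and Buy are 0/1 dummies, controls and fixed effects are omitted for brevity (in practice they enter as extra columns), the data are simulated with made-up coefficients, and `ols_cluster` and `simulate_trades` are hypothetical helper names. The covariance is the standard one-way cluster-robust sandwich with a common small-sample correction; statsmodels offers a comparable estimator through `fit(cov_type="cluster", cov_kwds={"groups": ...})`.

```python
import numpy as np


def ols_cluster(y, X, clusters):
    """OLS estimates with one-way cluster-robust (CR1) covariance.

    y: (n,) outcomes; X: (n, k) design matrix including a constant column;
    clusters: (n,) cluster labels, here issuer identifiers.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    # Meat of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'.
    labels = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in labels:
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    G = len(labels)
    # Common small-sample adjustment: G/(G-1) * (n-1)/(n-k).
    scale = G / (G - 1) * (n - 1) / (n - k)
    vcov = scale * xtx_inv @ meat @ xtx_inv
    return beta, vcov


def simulate_trades(seed=0, n_issuers=2000, per_issuer=5):
    """Synthetic trades with made-up coefficients, used only to exercise the estimator."""
    rng = np.random.default_rng(seed)
    issuer = np.repeat(np.arange(n_issuers), per_issuer)
    n = issuer.size
    female = rng.integers(0, 2, n)
    buy = rng.integers(0, 2, n)
    issuer_shock = rng.normal(0.0, 0.02, n_issuers)[issuer]  # shared by all trades in an issuer
    y = 0.02 * buy + 0.01 * female * buy + issuer_shock + rng.normal(0.0, 0.02, n)
    X = np.column_stack([np.ones(n), female, buy, female * buy])
    return y, X, issuer


if __name__ == "__main__":
    y, X, issuer = simulate_trades()
    beta, vcov = ols_cluster(y, X, issuer)
    # Gender gap among purchases is beta_1 + beta_3; its variance needs the covariance term.
    gap_buys = beta[1] + beta[3]
    se_gap = np.sqrt(vcov[1, 1] + vcov[3, 3] + 2 * vcov[1, 3])
    print("beta:", np.round(beta, 4))
    print(f"purchase gap: {gap_buys:.4f} (cluster SE {se_gap:.4f})")
```

The last lines show why the interaction needs care: the purchase gap is a sum of two coefficients, so its standard error depends on their covariance, not only on the two variances.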

Event clustering and duplicate information

Insider trades cluster around the same corporate information set. If three executives buy within days of each other, those are not independent observations in the way a textbook t-test would prefer. Nor are repeated trades by the same person in the same issuer over a short window.

A robust implementation should either aggregate clustered trades into a single event or account for clustering statistically. Otherwise, significance can be overstated. This is one of the oldest ways to manufacture a publishable p-value in market microstructure. The method is cheap. The confidence is expensive.
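One way to aggregate, sketched under assumptions: trades in the same issuer and direction are chained into one event whenever each falls within a hypothetical `window_days` (5 calendar days here) of the previous trade in the chain. Field names are illustrative. Because an event can mix women and men, each event carries a gender-mix label rather than a single gender, which is itself a design decision a final study would need to state.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Event:
    """One aggregated insider event: same issuer, same direction, chained dates."""
    issuer: str
    direction: str
    start: date
    end: date
    insiders: set = field(default_factory=set)
    genders: set = field(default_factory=set)
    value: float = 0.0

    @property
    def gender_mix(self) -> str:
        if self.genders == {"F"}:
            return "female-only"
        if self.genders == {"M"}:
            return "male-only"
        return "mixed"


def aggregate_events(trades, window_days: int = 5) -> List[Event]:
    """Chain same-issuer, same-direction trades into events.

    A trade joins the current event if it falls within `window_days` calendar
    days of that event's latest trade; otherwise it starts a new event.
    """
    events: List[Event] = []
    latest: Dict[Tuple[str, str], Event] = {}
    for t in sorted(trades, key=lambda t: (t["issuer"], t["direction"], t["date"])):
        key = (t["issuer"], t["direction"])
        ev = latest.get(key)
        if ev is None or (t["date"] - ev.end).days > window_days:
            ev = Event(t["issuer"], t["direction"], t["date"], t["date"])
            events.append(ev)
            latest[key] = ev
        ev.end = t["date"]
        ev.insiders.add(t["insider"])
        ev.genders.add(t["gender"])
        ev.value += t["value"]
    return events
```

Chaining on the latest trade, rather than on the first, lets a slow-building cluster stay one event; a fixed window from the first trade is the stricter alternative.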

Routine versus opportunistic

The routine-versus-opportunistic distinction is especially relevant here. If men trade more routinely and women trade more selectively, then the average female trade may look better even if there is no intrinsic gender effect after conditioning on trade type.

That is not a problem. It is, arguably, the result. But one should name it correctly. The edge would then be trade selection behaviour associated with gender composition, not some essential property of women as insiders. Markets are complicated enough without importing astrology.
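A routine-trade proxy can be built from trade dates alone. The rule below is an assumption of this sketch, similar in spirit to calendar-based rules in the literature but not a reproduction of any paper's exact definition: a trade is routine if the same insider also traded in the same calendar month in each of the previous three years, opportunistic if the insider traded in each of those years but not consistently in that month, and unclassified if the history is too short. Only prior years are consulted, so no label uses information from after the trade.

```python
from collections import defaultdict
from datetime import date

LOOKBACK_YEARS = 3  # assumed lookback window for this sketch


def classify_routine(trades):
    """Label each trade 'routine', 'opportunistic' or 'unclassified' (simplified proxy)."""
    months = defaultdict(set)  # insider -> {(year, month)} with at least one trade
    years = defaultdict(set)   # insider -> {year} with at least one trade
    for t in trades:
        d = t["date"]
        months[t["insider"]].add((d.year, d.month))
        years[t["insider"]].add(d.year)
    labels = []
    for t in trades:
        d, who = t["date"], t["insider"]
        prior = range(d.year - LOOKBACK_YEARS, d.year)
        if not all(y in years[who] for y in prior):
            labels.append("unclassified")  # not enough history to judge
        elif all((y, d.month) in months[who] for y in prior):
            labels.append("routine")
        else:
            labels.append("opportunistic")
    return labels


if __name__ == "__main__":
    history = [
        {"insider": "a", "date": date(2020, 3, 10)},
        {"insider": "a", "date": date(2021, 3, 12)},
        {"insider": "a", "date": date(2022, 3, 9)},
        {"insider": "a", "date": date(2023, 3, 15)},
    ]
    for trade, label in zip(history, classify_routine(history)):
        print(trade["insider"], trade["date"], label)
```

Feeding the label into the stratified comparison or the regression controls is what separates "women trade differently" from "women are different".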

Why cross-market results can differ even if the underlying behaviour does not

Suppose the eventual analysis finds women insiders outperform men in one market, not another. That would be entirely plausible without requiring any dramatic story.

Disclosure timing changes investability

Under MAR, PDMRs and persons closely associated must notify the issuer and the competent authority within three business days of a transaction once the annual threshold is reached, and the issuer then makes the information public. In the US, a Form 4 is due within two business days and arrives in a more structured, easier-to-parse format. A return measured from the transaction date credits the trade with any price move that happens before outsiders can see it, and that unearned credit is larger wherever the reporting lag is longer.

If your T+90 starts from disclosure date, this issue is mitigated. If it starts from transaction date, the cross-market comparison becomes partly a reporting-lag comparison.
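The gap between the two anchors is easy to measure directly. A sketch, assuming a price list indexed by trading day and a Monday-to-Friday calendar with no public holidays (a simplification): `business_days_between` gives the reporting lag, and `window_returns` compares a T+90 window anchored on the trade with one anchored on disclosure, plus the pre-disclosure move an outside investor could not have captured. Both helpers are hypothetical.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Weekdays after `start` up to and including `end`; public holidays are ignored."""
    count, day = 0, start
    while day < end:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            count += 1
    return count


def window_returns(prices, trade_idx, disclosure_idx, horizon=90):
    """Returns over trade-anchored and disclosure-anchored windows.

    `prices` holds closes indexed by trading day; the indices are positions in it.
    """
    if disclosure_idx < trade_idx or disclosure_idx + horizon >= len(prices):
        raise ValueError("need disclosure on/after trade and a complete disclosure-anchored window")

    def ret(a, b):
        return prices[b] / prices[a] - 1.0

    return {
        "from_trade": ret(trade_idx, trade_idx + horizon),
        "from_disclosure": ret(disclosure_idx, disclosure_idx + horizon),
        "pre_disclosure": ret(trade_idx, disclosure_idx),
    }
```

In a toy series where the whole move happens between trade and disclosure, a transaction-date measure reports the full gain while the investable, disclosure-date measure reports nothing.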

Corporate governance composition differs

Board structures and executive pipelines differ across markets. So does the representation of women in executive versus non-executive roles. If one market has a higher share of women in independent directorships and another has more women in operational executive roles, the informational content of their trades may not be directly comparable.

Sector mix matters more than people admit

A market heavy in financials and industrials is not the same as one heavy in healthcare or technology. Insider trades in R&D-heavy sectors can behave differently from those in regulated or asset-heavy sectors. If the gender composition varies by sector, and sector returns vary over the sample, one can mistake a sector effect for a gender effect with surprising ease.

What we can say now, and what we should wait to say

Without the live query output, the article cannot responsibly claim that women insiders do or do not outperform men in your dataset. It can, however, say what would make the claim credible.

A credible positive result would need these features

First, the spread should appear in purchases, not just in pooled transactions. Second, it should survive benchmark adjustment. Third, it should remain directionally similar after controls for role, size, sector, and market. Fourth, it should not be driven by a tiny number of outliers. Fifth, confidence intervals should be narrow enough to distinguish signal from anecdote.

If all of that holds, then the result is interesting. Not because it flatters a narrative, but because it suggests differences in trade selectivity or information use worth studying further.
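Two of those checks are cheap to automate. A sketch, assuming trades carry hypothetical `issuer`, `gender` and `ret_adj` fields: a percentile bootstrap that resamples whole issuers, so clustered trades move together, and a leave-extremes-out gap for the outlier test. The resampling scheme and the choice of k are assumptions; other bootstrap designs are possible.

```python
import random
from collections import defaultdict


def gap(trades):
    """Women-minus-men mean adjusted return, or None if either group is absent."""
    f = [t["ret_adj"] for t in trades if t["gender"] == "F"]
    m = [t["ret_adj"] for t in trades if t["gender"] == "M"]
    if not f or not m:
        return None
    return sum(f) / len(f) - sum(m) / len(m)


def issuer_bootstrap_ci(trades, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for `gap`, resampling whole issuers with replacement."""
    by_issuer = defaultdict(list)
    for t in trades:
        by_issuer[t["issuer"]].append(t)
    issuers = sorted(by_issuer)
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        picked = rng.choices(issuers, k=len(issuers))
        g = gap([t for i in picked for t in by_issuer[i]])
        if g is not None:  # skip the rare draw that loses one group entirely
            draws.append(g)
    if not draws:
        raise ValueError("no bootstrap draw contained both groups")
    draws.sort()
    lo = draws[int(alpha / 2 * len(draws))]
    hi = draws[int((1 - alpha / 2) * len(draws)) - 1]
    return lo, hi


def gap_without_extremes(trades, k=5):
    """`gap` after dropping the k trades with the largest absolute adjusted returns."""
    kept = sorted(trades, key=lambda t: abs(t["ret_adj"]))[: max(len(trades) - k, 0)]
    return gap(kept)
```

If the headline gap collapses once a handful of extreme trades are dropped, or the interval comfortably spans zero, the result is still anecdote.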

A null result would also be interesting

Null results are not failures. They are often the first honest thing to happen in a factor zoo. If women and men insiders show similar T+90 performance after controls, that would imply that any raw differences are mostly compositional. That is useful for practitioners. It tells them not to overfit a demographic filter where role and trade type already do the work.

A mixed result may be the most realistic outcome

The most plausible outcome, frankly, is heterogeneity. Women insiders may outperform in purchases but not sales. In some markets but not others. In small caps but not large caps. Among non-routine trades but not routine ones. The real world is fond of conditional statements. Quant readers should be too.

How we would publish the final version once the query arrives

The eventual article should include a compact results table and, ideally, a chart with T+90 benchmark-adjusted returns by gender for buys and sells separately, with confidence intervals. It should also show sample counts prominently. If there is a backtestable spread, the article can discuss implementation frictions, turnover, and whether the edge survives after realistic delays and liquidity filters.

A useful extension would be a decomposition:

composition effect: how much of the raw spread is explained by role, market, sector, and trade type,
selection effect: whether women appear to trade less often but more selectively,
timing effect: whether the spread is concentrated around earnings cycles or specific volatility regimes.

That would turn a broad cultural question into an investable research note. Which is, after all, the point.
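The composition piece has a simple exact form once trades are grouped into cells such as role, market, sector and trade type. A sketch with hypothetical field names: the raw women-minus-men gap splits into a composition term (a different cell mix, valued at men's cell means) and a within-cell term (the gap left inside cells, weighted by women's mix). Which group supplies the weights and the reference means is a choice; swapping them gives a different but equally valid split. Selection and timing would then be candidate explanations for the within-cell term, not something this arithmetic identifies.

```python
from collections import defaultdict


def decompose_gap(trades, keys=("role", "market", "sector", "trade_type")):
    """Split the raw women-minus-men mean gap into composition and within-cell parts.

    raw = sum_s (wF_s - wM_s) * muM_s      (composition: different cell mix)
        + sum_s wF_s * (muF_s - muM_s)     (within-cell gap, women's cell weights)
    where w are each group's cell shares and mu are cell mean adjusted returns.
    """
    cells = defaultdict(lambda: {"F": [], "M": []})
    for t in trades:
        cells[tuple(t[k] for k in keys)][t["gender"]].append(t["ret_adj"])
    n_f = sum(len(c["F"]) for c in cells.values())
    n_m = sum(len(c["M"]) for c in cells.values())
    if n_f == 0 or n_m == 0:
        raise ValueError("need trades from both groups")
    composition = within = 0.0
    for cell, c in cells.items():
        if not c["M"]:
            # men's cell mean is the reference, so a women-only cell cannot be split
            raise ValueError(f"cell {cell} has no men's trades to compare against")
        mu_m = sum(c["M"]) / len(c["M"])
        w_f, w_m = len(c["F"]) / n_f, len(c["M"]) / n_m
        composition += (w_f - w_m) * mu_m
        if c["F"]:
            within += w_f * (sum(c["F"]) / len(c["F"]) - mu_m)
    raw = (sum(sum(c["F"]) for c in cells.values()) / n_f
           - sum(sum(c["M"]) for c in cells.values()) / n_m)
    return {"raw": raw, "composition": composition, "within": within}
```

On invented toy data where both groups earn identical returns within buys and within sells but hold different buy/sell mixes, the whole raw gap lands in the composition term and the within-cell term is zero.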

A note on tone and temptation

This is one of those topics where the market commentary can become either sanctimonious or silly. Better to stay empirical. If women insiders outperform, report it with intervals and caveats. If they do not, report that too. The market does not owe us a morality play, only a distribution.

The payoff is straightforward. The next concrete step is to run the gender-stratified T+90 query with buys and sells separated, benchmark-adjust returns by market and sector, and publish the sample counts before the headline spread. The open question is not whether a narrative exists. It always does. The open question is whether the narrative survives issuer clustering, role controls, and the brutal arithmetic of a smaller female sample.

Analyzing Gender Performance Among Corporate Insiders

The question is better than most answers