The data problem is nastier than it looks
Matching a buy to a later sell sounds easy until you look at real filings. Names vary. Roles change. Transaction codes differ across markets. Holdings may be reported before and after the trade, or not at all. Some insiders transact through related entities. Others file corrections. The result is a puzzle with too many pieces and no picture on the box.
Disclosure regimes tell you events, not always inventory provenance
Under Article 19 of the EU Market Abuse Regulation, persons discharging managerial responsibilities, and persons closely associated with them, must notify transactions in the issuer’s shares or related instruments once their cumulative transactions in a calendar year cross a monetary threshold, with notification due promptly and no later than three business days after the transaction. In France, the AMF is the relevant regulator, but the rule is European in architecture. That is good for timeliness, less good for lot accounting: a notification describes the trade, not which earlier purchase the shares came from.
In the United States, Section 16 filings provide rich transaction-level disclosures for directors, officers, and beneficial owners of more than 10% of a registered class of equity securities. Form 4 can be very useful, particularly because transaction codes distinguish open-market trades, option exercises, gifts, and other events. But even there, reconstructing exact holding periods can be messy when insiders have multiple acquisition sources and multiple accounts.
Same person is not always the same string
A serious matching pipeline needs entity resolution. “Jean Dupont”, “M. Jean Dupont”, and “Dupont Jean” should not become three insiders. Nor should a spouse-controlled vehicle be silently merged into the executive’s personal account unless the legal reporting framework supports that linkage.
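A minimal sketch of the name-normalisation step, in Python, shows how the three spellings above can collapse to one candidate key. The honorific list and the function name `normalise_name` are illustrative assumptions, not part of any filing standard. Sorting tokens is deliberately aggressive, so the result should be treated as a candidate key to be confirmed against issuer and role, not as a final identity.

```python
import re
import unicodedata

# Hypothetical honorific list; extend it for the markets you cover.
# Caveat: a genuine middle initial "M" would also be dropped by this rule.
HONORIFICS = {"m", "mr", "mme", "mrs", "ms", "dr", "sir"}

def normalise_name(raw: str) -> str:
    """Return an order-insensitive, accent-free, lowercase key for a name."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep alphabetic tokens only and discard honorifics.
    tokens = re.findall(r"[a-z]+", stripped.lower())
    tokens = [t for t in tokens if t not in HONORIFICS]
    # Sorting makes "Dupont Jean" and "Jean Dupont" collide on purpose.
    return " ".join(sorted(tokens))

names = ["Jean Dupont", "M. Jean Dupont", "Dupont Jean"]
keys = {normalise_name(n) for n in names}
print(keys)
```

Related entities, such as a spouse-controlled vehicle, are intentionally left out of this step: linking them is a legal and design decision, not a string operation.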
At minimum, each record fed into the matching step should carry:
- normalised insider identity,
- issuer identifier,
- security identifier where available,
- transaction date,
- transaction type,
- quantity,
- and post-transaction holdings if disclosed.
Even then, ambiguity remains. The analyst’s job is to reduce it, not pretend it has vanished.
FIFO is a convenience, not a truth serum
Suppose an insider buys 10,000 shares in March and 5,000 more in June, then sells 8,000 in September. Which lot did they sell? Without explicit lot identification, you need an allocation rule. The common choices are FIFO, LIFO, and proportional matching.
FIFO is often the practical default because it is simple and stable. It also tends to align with a conservative notion of “how long did this acquired exposure remain outstanding?” But it is still an assumption. If the insider actually sold the June lot first, your estimated holding period is wrong.
That does not make the exercise useless. It means the article, and the backtest, should say so in plain English rather than hiding behind decimal places.
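To make the sensitivity concrete, the March, June, and September example can be worked through under each allocation rule. The sketch below is illustrative only: the text gives months, so the exact dates (the 15th of each month in 2023) are assumptions, as are the function names.

```python
from datetime import date

# Hypothetical dates; the example in the text only specifies months.
lots = [(date(2023, 3, 15), 10_000), (date(2023, 6, 15), 5_000)]
sell_date, sell_qty = date(2023, 9, 15), 8_000

def allocate(lots, qty, rule):
    """Return [(buy_date, quantity_taken)] for one sale under the given rule."""
    total = sum(q for _, q in lots)
    if qty > total:
        raise ValueError("sale exceeds outstanding lots")
    if rule == "proportional":
        # Every lot contributes in proportion to its size.
        return [(d, qty * q / total) for d, q in lots]
    # FIFO consumes the oldest lot first, LIFO the newest.
    ordered = lots if rule == "fifo" else list(reversed(lots))
    taken, remaining = [], qty
    for d, q in ordered:
        use = min(q, remaining)
        if use > 0:
            taken.append((d, use))
        remaining -= use
    return taken

def weighted_days(allocation, sell_date):
    """Quantity-weighted average holding period in calendar days."""
    total = sum(q for _, q in allocation)
    return sum((sell_date - d).days * q for d, q in allocation) / total

for rule in ("fifo", "lifo", "proportional"):
    alloc = allocate(lots, sell_qty, rule)
    print(rule, round(weighted_days(alloc, sell_date), 1))
```

With these assumed dates, the same sale has an estimated holding period of 184 days under FIFO, 126.5 under LIFO, and about 153 under proportional matching. That spread is the size of the assumption being made.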
A defensible matching framework
If you want a publishable distribution of insider holding periods, the methodology should be stricter than the average market-data blog and less mystical than a consultant’s appendix. The goal is not perfect truth. It is a transparent approximation that survives contact with sceptical readers.
Step 1, define the transaction universe narrowly
Start with the cleanest signal set:
- open-market purchases by insiders,
- ordinary shares only, unless derivatives are central to the market studied,
- exclude grants, vesting, option exercises, gifts, inheritances, pledges, and transfers,
- exclude transactions by closely associated persons unless you can identify them consistently and want them by design.
This will reduce sample size. Good. Better a smaller sample than a larger one full of compensation plumbing.
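One way to express these exclusions is as an explicit filter over a simplified filing record. The sketch below is a hedged illustration: the `Filing` fields and the labels such as `"open_market_buy"` are hypothetical, and real transaction codes would need to be mapped onto them first. It keeps open-market sells as well as buys, on the assumption that the same filter feeds the disposal side of the ledger.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Filing:
    insider: str            # normalised insider identity
    issuer: str             # issuer identifier
    trade_date: date
    kind: str               # hypothetical label, e.g. "open_market_buy"
    instrument: str         # hypothetical label, e.g. "ordinary_share"
    quantity: int
    closely_associated: bool = False

# Hypothetical labels; grants, vesting, exercises, gifts, inheritances,
# pledges, and transfers are excluded simply by not being listed here.
ELIGIBLE_KINDS = {"open_market_buy", "open_market_sell"}
ELIGIBLE_INSTRUMENTS = {"ordinary_share"}

def in_universe(f: Filing, include_associates: bool = False) -> bool:
    """Apply the narrow universe definition to one filing."""
    if f.kind not in ELIGIBLE_KINDS:
        return False
    if f.instrument not in ELIGIBLE_INSTRUMENTS:
        return False
    # Closely associated persons are included only by explicit design.
    if f.closely_associated and not include_associates:
        return False
    return True

filings = [
    Filing("dupont jean", "ISSUER_A", date(2023, 3, 15),
           "open_market_buy", "ordinary_share", 10_000),
    Filing("dupont jean", "ISSUER_A", date(2023, 4, 1),
           "option_exercise", "ordinary_share", 2_000),
    Filing("dupont jean", "ISSUER_A", date(2023, 5, 2),
           "open_market_buy", "ordinary_share", 500, closely_associated=True),
]
universe = [f for f in filings if in_universe(f)]
print(len(universe), "of", len(filings), "filings kept")
```

Keeping the rule as a whitelist rather than a blacklist means an unrecognised transaction type is dropped by default, which suits a study that prefers a small clean sample.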
Step 2, build insider-level inventory by issuer
For each insider and issuer, create a chronological ledger of eligible acquisitions and eligible disposals. Where filings report post-transaction holdings, use them as a consistency check. If a disposal would imply negative eligible inventory under your rules, that is a warning sign that the sell likely came from another source.
At this stage, you need a decision rule:
- either allow mixed-source inventory and accept more noise,
- or require that matched sells do not exceed the cumulative eligible open-market buys still outstanding.
The second option is stricter and usually better for a conviction study.
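The stricter rule can be sketched as a single pass over a chronological ledger for one insider and issuer. This is an illustration under assumptions: the tuple layout and the function name `walk_ledger` are hypothetical, and the ledger is presumed to contain only eligible open-market events.

```python
from datetime import date

def walk_ledger(events):
    """Cap each sell at the eligible inventory still outstanding.

    events: chronological list of (trade_date, side, quantity),
            where side is "buy" or "sell".
    Returns (matched, warnings, outstanding):
      matched     - [(trade_date, quantity matched to eligible buys)]
      warnings    - [(trade_date, excess quantity with no eligible source)]
      outstanding - eligible shares still held at the end
    """
    outstanding = 0
    matched, warnings = [], []
    for trade_date, side, qty in events:
        if side == "buy":
            outstanding += qty
            continue
        matchable = min(qty, outstanding)
        if matchable < qty:
            # The excess likely came from grants, exercises, or older holdings.
            warnings.append((trade_date, qty - matchable))
        if matchable > 0:
            matched.append((trade_date, matchable))
        outstanding -= matchable
    return matched, warnings, outstanding

# Hypothetical ledger: the second sell exceeds what open-market buys explain.
events = [
    (date(2023, 1, 10), "buy", 1_000),
    (date(2023, 2, 10), "sell", 400),
    (date(2023, 3, 10), "sell", 900),
]
matched, warnings, outstanding = walk_ledger(events)
print(matched, warnings, outstanding)
```

In this made-up ledger, 300 of the 900 shares in the second sell have no eligible source, so they are reported as a warning instead of being matched.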
Step 3, match sells to prior buys
Apply a lot-matching rule, usually FIFO, within each insider-issuer-security ledger. This produces estimated round trips and holding periods in calendar days or trading days.
Useful outputs include:
- median holding period,
- interquartile range,
- share of buys still unsold after 30, 90, 180, and 365 days,
- hazard rate of first sell over time,
- and role-level splits, such as CEO, CFO, chair, and non-executive director.
A survival-analysis frame is often cleaner than a simple histogram because many buys remain unsold at the end of the sample. Those are right-censored observations, not missing data.
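A small hand-rolled Kaplan-Meier estimator illustrates how right-censored buys enter the calculation. The data below are invented for illustration, and the function names are assumptions. A production study would more likely use an established survival-analysis library.

```python
def kaplan_meier(durations, sold):
    """Kaplan-Meier estimate of the share of buys still unsold.

    durations: days from buy to first sell, or to sample end if unsold.
    sold:      True if a sell was observed, False if right-censored.
    Returns [(day, survival)] at each day with at least one observed sell.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for day in sorted(set(durations)):
        events = sum(1 for d, s in zip(durations, sold) if d == day and s)
        leaving = sum(1 for d in durations if d == day)
        if events:
            # Censored buys on this day still count as at risk here.
            survival *= 1 - events / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve

def share_unsold(curve, horizon):
    """Estimated share of buys still unsold after `horizon` days."""
    share = 1.0
    for day, survival in curve:
        if day <= horizon:
            share = survival
    return share

# Hypothetical sample: eight buys, four of them unsold at sample end.
durations = [20, 45, 45, 100, 200, 300, 400, 400]
sold = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, sold)
for horizon in (30, 90, 180, 365):
    print(horizon, round(share_unsold(curve, horizon), 3))
```

Censored buys leave the at-risk set without counting as sells, so they lower the denominator for later days but never pull the curve down themselves. Dropping them instead would bias holding periods short.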
Step 4, separate full exits from partial trims
An insider who buys 50,000 shares and later sells 5,000 has not “exited” in the ordinary sense. Partial disposal may still matter, but it should not be conflated with complete reversal.
A robust study therefore reports at least two measures:
- time to first matched sell,
- time to full disposal of the purchased quantity, where observable.
The first captures when insiders begin reducing exposure. The second captures how long the commitment lasted in aggregate. They are not the same thing.
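The two measures can be computed per purchase lot with a FIFO queue. The sketch below assumes FIFO as the allocation rule and uses hypothetical dates around the 50,000-share example. The field names are illustrative, and any sell quantity that cannot be matched to an earlier buy is simply left unmatched.

```python
from collections import deque
from datetime import date

def fifo_lot_timings(buys, sells):
    """Track, per buy lot, days to first matched sell and to full disposal.

    buys, sells: chronological lists of (trade_date, quantity).
    A value of None means the event was not observed in the sample.
    """
    lots = [
        {"buy_date": d, "remaining": q,
         "first_sell_days": None, "full_exit_days": None}
        for d, q in buys
    ]
    queue = deque(lots)
    for sell_date, qty in sells:
        while qty > 0 and queue:
            lot = queue[0]
            if lot["buy_date"] > sell_date:
                break  # a sell cannot be matched to a later buy
            used = min(qty, lot["remaining"])
            lot["remaining"] -= used
            qty -= used
            held = (sell_date - lot["buy_date"]).days
            if lot["first_sell_days"] is None:
                lot["first_sell_days"] = held
            if lot["remaining"] == 0:
                lot["full_exit_days"] = held
                queue.popleft()
    return lots

# Hypothetical dates: a 5,000-share trim, then the rest a year after the buy.
buys = [(date(2023, 1, 10), 50_000)]
sells = [(date(2023, 4, 10), 5_000), (date(2024, 1, 10), 45_000)]
result = fifo_lot_timings(buys, sells)
print(result[0]["first_sell_days"], result[0]["full_exit_days"])

# With only the trim observed, full disposal stays unobserved (None).
trim_only = fifo_lot_timings(buys, sells[:1])
print(trim_only[0]["first_sell_days"], trim_only[0]["full_exit_days"])
```

With these assumed dates the first matched sell comes after 90 days and full disposal after 365. When only the trim is observed, the full-disposal measure is right-censored, not zero.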
Step 5, classify likely routine sales
Not every sale following a buy should be read as a change of mind. Some are likely routine. Clues include:
- sales clustered around tax dates,
- disposals immediately following vesting events,
- repeated small sales under a plan,
- and patterns inconsistent with open-market accumulation.
If the filings or market permit, flag these separately. The distribution of “all matched sells” is interesting. The distribution of “discretionary open-market reversals” is more interesting.
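Where vesting dates and prior sales are available, a rough flagging heuristic might look like the sketch below. Every threshold here (a five-day vesting window, three similar-sized prior sales, a 10% size tolerance) is an arbitrary assumption for illustration, and the function name `flag_routine` is hypothetical. Tax-date clustering is omitted because the relevant dates vary by jurisdiction.

```python
from datetime import date

def flag_routine(sale_date, sale_qty, vest_dates, prior_sale_qtys,
                 vest_window_days=5, min_repeats=3, size_tolerance=0.10):
    """Return a list of reasons a sale looks routine; empty if none apply."""
    reasons = []
    # Sale on or shortly after a vesting date.
    if any(0 <= (sale_date - v).days <= vest_window_days for v in vest_dates):
        reasons.append("near_vesting")
    # Several earlier sales of roughly the same size suggest a plan.
    similar = [q for q in prior_sale_qtys
               if abs(q - sale_qty) <= size_tolerance * sale_qty]
    if len(similar) >= min_repeats:
        reasons.append("repeated_similar_size")
    return reasons

# Hypothetical cases.
routine = flag_routine(date(2024, 3, 4), 1_000,
                       vest_dates=[date(2024, 3, 1)],
                       prior_sale_qtys=[1_000, 1_050, 980])
discretionary = flag_routine(date(2024, 6, 1), 8_000,
                             vest_dates=[date(2024, 3, 1)],
                             prior_sale_qtys=[1_000])
print(routine, discretionary)
```

Returning reasons instead of a single boolean keeps the classification auditable: a reader can see why a sale was set aside and rerun the distribution with different thresholds.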