Manager DNA · Behavioural Clustering

How Institutional Managers Actually Behave

Not all institutional investors are equal. A passive index giant buying 10,000 stocks and an activist hedge fund taking a concentrated 8% stake in one company both file a 13F -- but the signal value of their disclosures is fundamentally different. This page uses unsupervised machine learning to identify those differences at scale.

What the algorithm does
Step 1: Extract 14 behavioural features per manager-quarter from 13F data
Step 2: UMAP compresses 14 dimensions into a 2D map preserving local structure
Step 3: HDBSCAN finds density-connected clusters without a fixed number of groups
Step 4: Cosine similarity assigns each cluster a semantic archetype label
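Step 4 can be illustrated with a minimal sketch. It assumes each cluster is summarised by a centroid in z-scored feature space and compared against hand-written archetype prototype vectors; the prototype names, the three-feature subset, and all numbers below are hypothetical and are not taken from this page.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def label_clusters(centroids: dict, prototypes: dict) -> dict:
    """Give each cluster the prototype name with the highest cosine similarity."""
    return {
        cid: max(prototypes, key=lambda name: cosine_similarity(c, prototypes[name]))
        for cid, c in centroids.items()
    }

# Hypothetical z-scored features: [avg_turnover, avg_hhi, avg_holding_duration_qtrs]
prototypes = {
    "high-turnover trader": np.array([1.0, 0.0, -1.0]),
    "concentrated long-term holder": np.array([-1.0, 1.0, 1.0]),
}
# Hypothetical cluster centroids (mean feature vector of each HDBSCAN cluster)
centroids = {
    0: np.array([1.4, -0.2, -0.9]),
    1: np.array([-0.8, 1.1, 0.7]),
}

labels = label_clusters(centroids, prototypes)
print(labels)
```

Cosine similarity compares the direction of the feature profile rather than its magnitude, so a cluster with a mild tilt towards high turnover and one with an extreme tilt would receive the same label.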
Managers Profiled
8,934
Unique institutional filers with 4+ active quarters
Stable Archetypes
4
Density-connected groups found by HDBSCAN
Silhouette Score
0.412
Cluster separation quality (range -1 to 1): above 0.5 = strong structure
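For reference, the silhouette coefficient of a single point is s = (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The sketch below computes the mean over a toy dataset with Euclidean distance. Whether this page scores the 2D embedding or the 14-dimensional features, and whether noise points are excluded, is not stated, so treat the setup as an assumption.

```python
import numpy as np

def mean_silhouette(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    # Pairwise Euclidean distance matrix
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # Self-distance is 0, so dividing by (n_same - 1) excludes the point itself
        a = dists[i, same].sum() / (n_same - 1)
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy data: two tight pairs of points, four units apart
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = mean_silhouette(X, labels)
print(round(score, 3))  # about 0.754
```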
Best min_cluster_size
100
Tuned via sweep over 5 values

Manager Behavioural Embedding

UMAP 2D projection of 14 behavioural features across 8,934 institutional managers. Each dot is one manager. Proximity indicates behavioural similarity.

2,000 of 2,000 managers shown  ·  UMAP(n_components=2, n_neighbors=15)
How to read this chart
X-axis (UMAP 1)
Left = long-horizon holders · Right = high-turnover traders
Y-axis (UMAP 2)
Top = concentrated books · Bottom = diversified books (100+ positions)
Proximity = similarity
Closer dots = more behaviourally similar across all 14 features
Cluster colours
Colour = HDBSCAN archetype · assigned by cosine similarity
Grey dots (Noise)
No dense neighbourhood · typically 5-15% of universe

Behavioural Archetypes

Four distinct investor phenotypes emerge from unsupervised clustering. Each archetype card explains who these managers are, how they trade, and what their 13F filings actually signal.

Feature Space: 14 Dimensions

Each of these behavioural metrics is computed per manager across all available 13F quarters before being passed to UMAP and HDBSCAN. No label information is used -- the clusters emerge purely from the data.

avg_hhi
avg_put_ratio
log_avg_aum
avg_turnover
avg_conviction_delta
new_position_rate
exit_rate
avg_holding_duration_qtrs
top5_concentration
options_notional_ratio
shared_vote_ratio
amendment_rate
quarters_active
aum_volatility
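The page does not give formulas for these features. As a rough illustration, two of the concentration metrics could be computed from one quarter of holdings as follows, assuming the standard Herfindahl-Hirschman index (sum of squared portfolio weights) and a plain top-5 weight sum; the position values are made up, and averaging the per-quarter results over a manager's history to obtain `avg_hhi` is likewise an assumption.

```python
def portfolio_weights(position_values):
    """Convert reported position values into weights that sum to 1."""
    total = sum(position_values)
    if total <= 0:
        raise ValueError("total portfolio value must be positive")
    return [v / total for v in position_values]

def hhi(weights):
    """Herfindahl-Hirschman index: 1/N for an equal-weight book, 1.0 for one position."""
    return sum(w * w for w in weights)

def top_n_concentration(weights, n=5):
    """Share of the portfolio held in the n largest positions."""
    return sum(sorted(weights, reverse=True)[:n])

# Hypothetical single-quarter holdings (market value in $m)
values = [40.0, 30.0, 10.0, 10.0, 5.0, 5.0]
weights = portfolio_weights(values)

quarter_hhi = hhi(weights)                   # approximately 0.275
quarter_top5 = top_n_concentration(weights)  # approximately 0.95
print(round(quarter_hhi, 4), round(quarter_top5, 4))
```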

Features are z-score normalised before being passed to UMAP.
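Z-score normalisation rescales each feature column to zero mean and unit standard deviation, so that a large-valued feature such as log_avg_aum does not dominate the distance calculations. A minimal sketch, assuming the population standard deviation and a guard for constant columns (both choices are assumptions, as are the sample values):

```python
import numpy as np

def zscore(X: np.ndarray) -> np.ndarray:
    """Standardise each column to mean 0 and standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)                 # population standard deviation (ddof=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns at 0 instead of dividing by 0
    return (X - mean) / std

# Hypothetical rows = managers, columns = [log_avg_aum, avg_turnover]
X = np.array([
    [20.0, 0.10],
    [22.0, 0.50],
    [24.0, 0.90],
])
Z = zscore(X)
print(Z.round(3))
```

After scaling, both columns contribute on the same footing even though their raw ranges differ by more than an order of magnitude.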