I built a 545-asset scanner with Claude
Stoic walks through the entire build, including what went wrong
Today’s special edition features Stoic talking AI + trading.
Table of contents:
The dashboard
z-scores for dummies
How to actually build with Claude
Four ways to read the market
All models are wrong, some are useful
The 4-hour break
What is actionable, and what to explore next?
Limitations
Everyone is posting articles about what you can do with an LLM and how lines and lines of prompts will improve your entire life by 1000x, mostly for engagement.
Few walk through something they actually built: the process of building it, the thinking behind it, the limitations, and what was learned.
We’re going to do all of that and more here.
The problem with 545 charts…
BTC sold off 5% on the FOMC decision and you have no idea which altcoins are most affected, which are holding up, or where the actual statistical extremes are.
You’re guessing, because you either flip through charts on TradingView manually or go by raw percentages. But what do the percentages even mean if you don’t know what the baseline is?
There is only so much capacity to manually scan and decipher information quickly. It’s not scalable.
There are ~545 perpetuals on Binance. You can check maybe 30 before the setup is gone.
A conversation on VWAPs and value development sparked this effort:

1. The Dashboard

VWAP vs. Funding Rate Z-Score Scatter Plot
We’re going to work our way backwards.
For my purpose, the finished product is a Streamlit dashboard which runs locally and fetches live candle data for every USDT perpetual on Binance (545 tickers).
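A minimal sketch of how the ticker universe might be assembled, assuming the public Binance USDT-M futures `exchangeInfo` endpoint and its `contractType`, `quoteAsset` and `status` fields. The helper names and the sample payload are hypothetical, used only for illustration, and this is not the dashboard's actual code:

```python
import json
from urllib.request import urlopen

# Assumed endpoint: Binance USDT-M futures metadata (public, no API key).
EXCHANGE_INFO_URL = "https://fapi.binance.com/fapi/v1/exchangeInfo"


def usdt_perpetuals(exchange_info: dict) -> list[str]:
    """Return sorted tickers of USDT-quoted perpetuals that are trading."""
    return sorted(
        s["symbol"]
        for s in exchange_info.get("symbols", [])
        if s.get("contractType") == "PERPETUAL"
        and s.get("quoteAsset") == "USDT"
        and s.get("status") == "TRADING"
    )


def fetch_exchange_info(url: str = EXCHANGE_INFO_URL) -> dict:
    """Download the metadata (network call, not exercised in this sketch)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hypothetical, trimmed-down payload for illustration only.
sample = {
    "symbols": [
        {"symbol": "BTCUSDT", "contractType": "PERPETUAL",
         "quoteAsset": "USDT", "status": "TRADING"},
        {"symbol": "ETHUSDT", "contractType": "PERPETUAL",
         "quoteAsset": "USDT", "status": "TRADING"},
        {"symbol": "BTCUSDT_240628", "contractType": "CURRENT_QUARTER",
         "quoteAsset": "USDT", "status": "TRADING"},
        {"symbol": "OLDUSDT", "contractType": "PERPETUAL",
         "quoteAsset": "USDT", "status": "SETTLING"},
    ]
}
tickers = usdt_perpetuals(sample)
```

In a live run, `usdt_perpetuals(fetch_exchange_info())` would replace the sample payload, and candles for each ticker would be fetched separately.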
Here are some questions I want to answer:
Where are the outliers?
Which assets are holding up while everything else is weak?
Which ones have been extended far enough that they’re statistical anomalies?
To take it a step further:
How long has the asset remained extended?
What is the characteristic of the extension (i.e. crowded positioning with squeeze risk, slow grind, thin market)?

I also added alerts for set thresholds. Something that went from −1.2σ to −2.4σ in one cycle (5 or 15 mins) is a different situation than something that’s been grinding to −2.4σ over six cycles.
Quick note: σ (sigma) is the symbol for standard deviation, which measures the typical move size (volatility).
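The jump-versus-grind distinction can be sketched with the change in z-score per refresh cycle. The function name, the 2.0σ alert threshold and the 1.0σ-per-cycle jump size below are assumptions for illustration, not the dashboard's real settings:

```python
def classify_extension(z_history: list[float],
                       threshold: float = 2.0,
                       jump: float = 1.0) -> str:
    """Label the latest reading as 'none', 'spike' or 'grind'.

    z_history holds one z-score per refresh cycle, oldest first.
    threshold and jump are assumed values, chosen for illustration.
    """
    if not z_history or abs(z_history[-1]) < threshold:
        return "none"          # not extended enough to alert on
    if len(z_history) < 2:
        return "spike"         # no prior reading to compare against
    delta = z_history[-1] - z_history[-2]  # velocity: change in z per cycle
    return "spike" if abs(delta) >= jump else "grind"


# −1.2σ to −2.4σ in a single cycle
fast = classify_extension([-1.2, -2.4])
# Drifting to −2.4σ over six cycles
slow = classify_extension([-1.2, -1.5, -1.7, -1.9, -2.2, -2.4])
```

Both series end at the same −2.4σ, but only the first is labelled a spike.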
Clicking any scatter dot plots a z-score sparkline over time, distribution relative to its own past, and a hit rate table showing what happened the last hundred times it was this extended.
Under the hood, the dashboard computes rolling VWAP z-scores for each ticker and plots them all simultaneously on a single interactive scatter chart with a 5-minute refresh.

I wanted a single view of assets that are statistically over-extended relative to their own history at any given moment. Not relative to BTC or to the sector, but relative to their own history.

This isn’t for signals. The dashboard displays statistical anomalies and quantifies how unusual they are.
What you do with this information depends on how you process it, which requires regime awareness, risk management, and discretion.

TLDR: The scanner’s purpose is to tell me where things are, how unusual that is, and how long they’ve been there.
2. z-scores for dummies
Before moving forward, there’s one statistical concept that we should touch on. The dashboard relies heavily on z-scores and standard deviations (extension from the mean).

When you look at a chart and see an asset sitting 3% below its VWAP, that number means nothing without context.
Is 3% a lot for that asset?
BTC can move 3% in an afternoon without anyone blinking. For a low-volatility alt that usually drifts 0.5% from its VWAP, 3% can be extreme.
The raw percentage doesn’t tell you anything useful on its own, and it definitely doesn’t let you compare across assets.
A z-score solves this by asking a different question.
Not “how far is price from VWAP” but “how far is price from VWAP relative to how far it normally gets.”
The math is simple:
(current deviation − average deviation) ÷ typical swing
The result tells you how many standard deviations you are from the norm. Zero means exactly average. If deviations were roughly normally distributed, +2 would sit above about 97.5% of that asset’s own history and −2.5 would sit below about 99% of it.
Normalizing like this, BTC and a micro-cap alt are on the same relative scale. In other words, a −2σ move means the same thing for both: unusually extended, historically speaking.
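To make the formula concrete, here is a small worked sketch. The two deviation histories are made-up numbers chosen to mimic a volatile asset and a quiet one, and `z_score` is an illustrative helper, not the dashboard's code:

```python
from statistics import mean, pstdev


def z_score(current: float, history: list[float]) -> float:
    """(current deviation − average deviation) ÷ typical swing."""
    sigma = pstdev(history)  # typical swing = standard deviation of history
    if sigma == 0:
        raise ValueError("history has no variation")
    return (current - mean(history)) / sigma


# Hypothetical % deviations from VWAP (illustrative values only).
btc_history = [-3.0, -1.5, 0.0, 1.5, 3.0]     # routinely swings ±3%
alt_history = [-0.5, -0.25, 0.0, 0.25, 0.5]   # usually drifts ±0.5%

# The same raw reading, 3% below VWAP, for both assets.
btc_z = z_score(-3.0, btc_history)
alt_z = z_score(-3.0, alt_history)
```

With these inputs the same −3% reading comes out at about −1.4σ for the volatile series and beyond −8σ for the quiet one, which is the whole point of normalizing.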
We use two main parameters:
1. VWAP (volume weighted average price):

2. Funding Rates
These two inputs, applied across 545 assets, are the foundation of the simple dashboard I created.
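A rough sketch of a rolling VWAP z-score in plain Python follows. It assumes one price and one volume per candle, a window counted in candles, and that the latest reading is scored against earlier readings only. Those choices and the function names are assumptions for illustration, and the candle values are made up:

```python
from statistics import mean, pstdev


def rolling_vwap(prices, volumes, window):
    """VWAP over the trailing `window` candles; None until the window fills."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        p = prices[i + 1 - window:i + 1]
        v = volumes[i + 1 - window:i + 1]
        total_v = sum(v)
        out.append(sum(pi * vi for pi, vi in zip(p, v)) / total_v
                   if total_v else None)
    return out


def vwap_z_score(prices, volumes, window):
    """z-score of the latest % deviation from rolling VWAP vs. its history."""
    vwaps = rolling_vwap(prices, volumes, window)
    devs = [(p - w) / w * 100
            for p, w in zip(prices, vwaps) if w is not None]
    if len(devs) < 3:
        raise ValueError("not enough candles for a z-score")
    history, current = devs[:-1], devs[-1]  # score latest against the past
    sigma = pstdev(history)
    if sigma == 0:
        raise ValueError("deviation history has no variation")
    return (current - mean(history)) / sigma


# Made-up candles: a quiet series that ends with a sharp drop.
prices = [100, 101, 99, 100, 102, 101, 100, 94]
volumes = [10] * 8
z = vwap_z_score(prices, volumes, window=3)
```

Here the final candle lands several standard deviations below its recent VWAP deviation history, so it would show up as an outlier on the scatter.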
3. How to actually build with Claude

There is so much garbage posted on this subject for engagement.
I am showing you the entire process behind something I actually built, including the trial and error and the thought process as the dashboard evolved.
This is more valuable than just handing you a dashboard I made for my personal preferences and criteria. Teach a man to fish as they say.
I built the dashboard using Claude over the course of a week spending 30-45 mins a day.
Here are my qualifications:
Mechanical Engineering background.
I am a problem solver at heart and enjoy solving complex puzzles. I have some mathematics and statistics under my belt (far from a quant).
Not a professional developer.
I am a relentless tinkerer and am always curious. I understand some Python and have built some tools for myself using Claude over the past 6 months. In other words, I have enough coding knowledge to read and review what gets written, catch logic errors, and have some sense of when something looks wrong.
My process and rules
1. Build in slices
Define the context of what you’re building and provide the LLM with adequate boundaries and information. Otherwise you’ll have to go through endless loops of perceived “improvement”.
Every feature started as a spec, i.e. what does this do, what does it need to know, and what does it create.
Each slice was shippable before moving on to the next one.
Chronologically:
> The data fetch (Binance API) and chart came first
> Filters
> Inspect panel
> Duration tracking and clustering
This was done so that I understood what was happening at each phase well enough to know where the gaps were and what needed to be fixed before I moved on.
2. Ask for diffs, not complete rewrites.
Once the code gets past a few hundred lines, I do not want the entire file to be rewritten because it can lead to the creation of new bugs while fixing the original one.
3. Verify the output, don’t blindly trust it
The statistical math, like the z-score formula, rolling VWAP calculation, extension score, and k-means initialization, was correct the first time.
But the CSS light/dark mode, Streamlit session state ordering and semantic cluster labelling needed multiple rounds of review and correction.
You have to read everything.
4. Ask Claude for critiques to see what potential gaps exist in the approach
When I directed the LLM to point out the statistical problems and overall gaps in the approach, it identified them accurately, including pointing out the in-sample bias in the σ estimation, the window correlation problem, the fat tail issue and issues with the duration gap.
Worth noting: these weren’t flagged while the build was underway. You have to ask it to look for holes.
Integrate this into your workflow, where after each step you direct the LLM to point out any potential gaps in what you are building.
TLDR: Lay out context. Build in slices. Ask the LLM to provide critique. Don’t assume everything is correct.
Here is where you come in.
The LLM cannot decide what to build, what’s important, or when adding a feature makes the dashboard/tool better instead of adding more noise.
Establishing the domain and answering these questions is your job. Be the domain expert.
The quality of what you get is a function of how precisely you can describe what you want. If you don’t want garbage as the end result, you need to define the problem properly and understand it deeply.
Here is a summary of the build sequence, slice by slice for the dashboard I put together:
Slice 1: Binance API fetch + rolling VWAP z-score math + basic scatter chart
Slice 2: Filters (volume, extension threshold, quadrant)
Slice 3: Watchlist creation & alerts
Slice 4: Inspect panel — z-score sparkline, distribution histogram, hit rate backtest
Slice 5: Velocity tracking (Δz per refresh), duration counter, four axis modes
Slice 6: OI fetch, funding rate fetch, clustering, statistical audit
Slice 6 is where asking “what’s wrong with this?” produced the most value.
The LLM identified in-sample bias in the σ estimation, the window correlation problem between the two VWAP axes, and the missing duration dimension. All of these became the next things to build.
4. Four ways to read the market
The dashboard has four ways to look at the same market data, each answering a different question.
i. 2D/5D rolling VWAP scatter plot

March 26 snapshot
This is the default.
Each blip is a ticker (BTC, ETH, etc).
X = 2-day z-score, Y = 5-day z-score.
It’s a quick scan of regime. One glance shows you whether the market is broadly above or below its own (very) recent average on two timeframes.
Here is the issue with this view: the two timeframes share 40% of the same data and are heavily correlated (~0.72 correlation), so you’re not getting an independent read. It’s a measure of the same thing with slightly different rulers.
This is why this view is best for the regime question.
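One way to check how redundant two axes are is to compute the Pearson correlation between them across all tickers. The sketch below uses a hand-rolled `pearson` helper and made-up z-scores for six hypothetical tickers, so the resulting number is illustrative only and is not the ~0.72 measured on the real data:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a series has no variation")
    return cov / sqrt(vx * vy)


# Hypothetical 2-day and 5-day z-scores for six tickers (made-up values).
z_2d = [-2.1, -1.4, -0.3, 0.4, 1.2, -1.8]
z_5d = [-1.7, -1.5, -0.1, 0.2, 0.6, -1.1]
r = pearson(z_2d, z_5d)
```

A value near 1 means the Y-axis adds little beyond what the X-axis already shows; a value near 0 means the second axis is carrying new information.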
Case study:
March 18 FOMC dump. Fed holds rates, upgrades inflation to 2.7%. BTC drops 5%, $708M in single-day ETF outflows.
The mean extension reported at the time was −1.6σ market-wide with 22 assets at ±2.5σ or beyond.
The chart shows it instantly — 78% of the plot was sitting in the bottom-left quadrant.
ii. 2D/prior-week — making the axes a bit more uncorrelated

March 26 snapshot
In this view, we’re now comparing 2 days of prior value to last week’s developed value.
Correlation drops to ~0.18.
Now a dot in the bottom-left means it’s extended on two largely independent timeframes.
iii. Weekly anchored - 2D/weekly anchor

March 26 snapshot
The weekly open is the reference point in this view.
The Y-axis in this mode is how far price is from the cumulative volume-weighted average since the week opened.
The most useful question this mode answers: “is this a genuine weekly loser, or just a short-term pullback within a strong week?”
Best used Thursday–Friday when there’s enough weekly data accumulated.
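A minimal sketch of a weekly anchored VWAP, assuming the week opens at Monday 00:00 UTC and that candles arrive as `(timestamp, price, volume)` tuples. Both are assumptions, the function names are hypothetical, and the candle values are made up:

```python
from datetime import datetime, timedelta, timezone


def week_open(ts: datetime) -> datetime:
    """Monday 00:00 UTC of the week containing ts (assumed anchor)."""
    ts = ts.astimezone(timezone.utc)
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def anchored_vwap(candles, now: datetime):
    """Cumulative VWAP from the week open up to `now`; None if no volume."""
    start = week_open(now)
    pv = vol = 0.0
    for ts, price, volume in candles:
        if start <= ts <= now:      # ignore candles from before the anchor
            pv += price * volume
            vol += volume
    return pv / vol if vol else None


utc = timezone.utc
now = datetime(2024, 1, 4, 12, 0, tzinfo=utc)  # a Thursday
candles = [
    (datetime(2023, 12, 31, 23, 0, tzinfo=utc), 90.0, 100.0),  # prior week
    (datetime(2024, 1, 1, 0, 0, tzinfo=utc), 100.0, 10.0),
    (datetime(2024, 1, 3, 0, 0, tzinfo=utc), 110.0, 30.0),
]
anchor = anchored_vwap(candles, now)
last_price = 105.0
pct_from_anchor = (last_price - anchor) / anchor * 100
```

Early in the week only a few candles feed the average, which is one reason the view is more informative later in the week.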
iv. VWAP/funding (2D VWAP/Funding Rate) – the orthogonal view

March 26 snapshot
The most interesting mode.
X = 2D VWAP z-score.
Y = funding rate z-score.
These two axes are structurally independent (~0.08 correlation). One measures where price is vs “positioning consensus”. The other measures how crowded the positioning is vs its own history.
When combined together, they have the potential to answer the question:
“Where is price extended AND positioning overcrowded simultaneously?”
That combination is where the violent moves come from in both directions.
The two quadrants to quickly scan here:
the top-right (price extended above + funding elevated = potential crowded longs, liquidation risk)
bottom-left (price extended below + funding negative = potential crowded shorts, squeeze risk)
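The two-quadrant scan can be sketched as a simple classifier. The 2.0σ cut-off, the labels and the ticker names below are assumptions for illustration, not the dashboard's configuration:

```python
def positioning_quadrant(price_z: float, funding_z: float,
                         threshold: float = 2.0) -> str:
    """Flag the two quadrants worth scanning; threshold is an assumed value."""
    if price_z >= threshold and funding_z >= threshold:
        return "crowded longs"    # top-right: liquidation risk
    if price_z <= -threshold and funding_z <= -threshold:
        return "crowded shorts"   # bottom-left: squeeze risk
    return "unflagged"


# Hypothetical tickers with made-up (2D VWAP z, funding z) readings.
readings = {
    "AAAUSDT": (2.6, 2.2),
    "BBBUSDT": (-2.4, -2.9),
    "CCCUSDT": (2.5, -0.3),
}
flagged = {}
for symbol, (price_z, funding_z) in readings.items():
    label = positioning_quadrant(price_z, funding_z)
    if label != "unflagged":
        flagged[symbol] = label
```

Only tickers where both axes are extended in the same direction get flagged; a price extension without crowded funding is left alone.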
5. All models are wrong, some are useful
Everything under here is meant to show how to parse the data and what my thinking was as I refined the dashboard using data collected over a period of 5 days. These are not conclusive findings.
Duration & Clustering
This is where things got a bit more interesting as I became more involved in figuring out the optimal way to cluster the data.
Take this all with a grain of salt, because everything from here onwards rests on 4-5 days of data. I wanted to share what I learned, the process of building a tool, and how to dissect it to keep improving it.
First, why did I end up clustering the data?
A z-score tells you where something is, but not why it’s there or what it’s likely to do next.
Two assets can sit at the same −2.3σ and have completely different developing circumstances behind that number.
One got there in a single candle with no volume behind it and is already bouncing back. The other has been grinding there for six cycles with large open interest and no correlation to BTC.
Same z-score but completely different circumstances.
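The duration side of that contrast can be sketched as a count of consecutive refresh cycles spent beyond a threshold on the same side. The function name and the 2.0σ threshold are assumptions, and the z-score series are made up:

```python
def extension_duration(z_history: list[float], threshold: float = 2.0) -> int:
    """Consecutive most-recent cycles beyond threshold on the same side.

    z_history holds one z-score per refresh cycle, oldest first.
    """
    if not z_history or abs(z_history[-1]) < threshold:
        return 0
    sign = 1 if z_history[-1] > 0 else -1
    count = 0
    for z in reversed(z_history):
        if z * sign >= threshold:   # still extended in the same direction
            count += 1
        else:
            break
    return count


# Arrived in a single candle
single = extension_duration([-0.5, -2.3])
# Has been sitting there for six cycles
grinding = extension_duration([-1.0, -2.1, -2.2, -2.4, -2.3, -2.5, -2.3])
```

Both series end at −2.3σ, yet the counter separates the one-cycle arrival from the six-cycle grind.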
For the dashboard, clustering adds the second dimension. Instead of asking how extended something is, it asks what the full profile looks like across every dimension simultaneously: magnitude, duration, velocity, rarity, volume, BTC correlation, and volatility regime.
The cluster is the answer to what those seven numbers look like together, automatically, across all 545 assets, every five minutes.
It turns a list of z-scores into a set of distinct market situations you can actually filter and potentially act on.
The clustering evolution
Not going to dive into this part too much but it evolved through sharing and tinkering with some things.
I started with k-means, which gave me four clusters with semantic labels:
Noise spike, Slow grind, Crowded position, Thin market.
Then I migrated to DBSCAN which removed the fixed-K constraint and marked genuine outliers rather than forcing everything into a group, but required ε tuning that the k-dist plot couldn’t solve cleanly.
HDBSCAN removed ε entirely, found the most stable cluster structure automatically, and added soft membership probabilities. With 42 hours of data it found eight clusters rather than four.
6. The 4-hour break
Yes, 5 days of data is not enough in this case.
The point is to go through the exercise of how to develop a tool and iterate on it. Also, this paints a picture of how certain conclusions can be reached due to a limited or skewed dataset.
Quick overview on how the scanner works:
It runs every 5 minutes and logs every asset that has extended from its volume-weighted average price.
Each log entry is a snapshot that includes position, how long it’s been extended, how fast it got there, and how correlated it is with Bitcoin.
When the time window closes, we check whether the asset has moved back. If it has, we call it a “resolved outcome”.
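A minimal sketch of how resolved outcomes could be tallied per horizon. It assumes "moved back" means the z-score at the end of the window is back within ±1σ, which is an illustrative definition rather than the scanner's actual rule, and the log entries are made up and are not the results reported in this article:

```python
from collections import defaultdict


def is_resolved(exit_z: float, revert_to: float = 1.0) -> bool:
    """Assumed rule: resolved if |z| is back inside revert_to at window close."""
    return abs(exit_z) <= revert_to


def win_rates(log):
    """log: iterable of (horizon_label, z-score when the window closed)."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for horizon, exit_z in log:
        totals[horizon] += 1
        wins[horizon] += is_resolved(exit_z)   # True counts as 1
    return {h: wins[h] / totals[h] for h in totals}


# Made-up log entries for two horizons (illustration only).
log = [
    ("15m", -2.3), ("15m", -2.1), ("15m", -0.4), ("15m", 2.6),
    ("4h", 0.2), ("4h", -2.8),
]
rates = win_rates(log)
```

Changing `revert_to` changes every win rate, so the definition of "moved back" needs to be fixed before comparing horizons.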
Let’s say the hypothesis going into this exercise is that assets extended from their average should revert within a certain period of time.
Here is what we learned:

Win rates came in at 13% at 15 minutes, 15% at 30 minutes, 21% at 1 hour, 31% at 4 hours, 36% at 8 hours, and 33% at 12 hours.

In this particular 5-day sample, the most interesting finding wasn’t the win rates.
It was what happened to the predictive features at 4h.

This shows how the structure changed from the 4-hour mark onward, which makes intuitive sense.
Below 4 hours the tree splits on extension size and how long the asset has been sitting there.
Above 4 hours those factors lose their grip and macro structure takes over.
The 4h break.
Short-horizon extensions are noise playing out locally.
Longer ones are regime-related: the broader market condition matters more than the individual asset’s position, which shows how significant context is.
7. What is actionable, and what to explore next?
The z-score is useful for scanning and ranking but not for directional entries on its own.
The initial goal of the scanner is to work as a filter to identify potential candidates or point out outliers, not as a signal to act on directly.
What I will explore next
Context.
Testing extensions filtered by regime (ranging vs trending) could separate the cases where reversion actually occurs from the continuation moves that dominate this dataset.
The 4h structural break is an interesting thread to pull on.
The open questions are what specifically changes at that timescale, whether finer time buckets between 1h and 4h reveal where the shift happens, and whether BTC correlation filtering at 8h+ improves the subset win rate.
8. Limitations
Six days of data during a single market regime.
March 24–29 was a ranging, post-selloff period, not a trending market or a significant stress event.
The findings clearly reflect a specific regime rather than some kind of universal truth.
Win rates, feature importance, and the 4h structural break would all look different in a sustained uptrend or during a liquidation cascade.
More data across distinct regimes is needed before drawing hard conclusions.
There’s also a measurement gap that limits how accurately an extension is recorded: the scanner polls every 5 minutes, so the entry timing is approximate.
An asset could have first breached a threshold at any point in the roughly 5 minutes before the snapshot that logs it. The entry extension recorded is close but not exact.
Conclusion
“This article is useless, just tell AI to do it all.”
The LLM is a tool, and the tool is only as good as the person using it.
“There are no permanent solutions in a dynamic system.” - Naval
The Dashboard Improvement Loop

I distilled how I think about creating tools and the improvement loop in the image referenced above.
Following the scientific method is the best way to approach this process.
You start with a thesis, collect data, analyze the data, and modify the tool.
However, over the course of this loop, you must have a good understanding of whether the changes you’re making are going to overfit or underfit the data you have collected.
To get a handle on this aspect, it will take you some iterations to understand where the “cliff” is and what the bounds are of the problem you are trying to solve.

I really enjoy tinkering and building dashboards/tools for myself. I will continue to work on this in my spare time using this same process.
The best way to get better at doing anything is to just jump into it.
Hope this was helpful.
Hit 'reply' to this email and let us know what you liked, disliked, or if you have any questions.
P.S. Magus, Doc and Charlie cook up more sauce like this daily in The Paragon.


