About spall


Spall is a fast, simple, portable profiler, made to make tracing large programs fast and easy.


Recent Activity

&spall Baby steps on sampling on OSX today.

Got RIP samples plumbed through the trace-viz pipeline so I can see them.

No callstack unwinding or symbol resolution yet, but this has so many moving parts that are functioning. Hooking arbitrary child processes, grabbing registers/stack on an interval, checking to see when the child terminates, etc.

View original message on Discord

&spall I've got all sorts of cool stuff from the past few weeks!

Got DWARF symbol loading working to grab static function names on OSX and Linux, I got the symbol loader multi-threaded, and I've finally had a chance to play with Odin's auto-tracer for self-inspection, which is super exciting.

View original message on Discord

Got &spall off of SDL on Linux today!

Hard work by Will to swap from SDL_TTF to stb_truetype, and then a bunch of work on my end to shuffle over to talking to X11 directly.

Now I can finally start working on Linux sampling without SDL getting in the way.
Is it really handmade without working on platform layers at some point?

View original message on Discord

&spall I ran the same test I did last year for HMS again, to see if Perfetto had caught up, and/or if I'd gotten notably slower.

Nope. Spall is even faster now than it was then, even with all the extra features. (8s -> 6s) That's a good place to be.

Line info is shipped, and I'm puttering away on some UI overhaul stuff to get sampling ready to go. I'm hoping to have basic sampling on one of my target platforms (probably OSX) usable by HMS.

View original message on Discord

&spall Got line info for PDB auto-tracing working!
I need a round of cleanup, and then I should be able to ship line info for linux and osx (my DWARF stuff is running well), and windows auto-tracing.

It's a big step towards getting good info out of sampling, too.

View original message on Discord

&spall Little update to Spall today, got double-click to zoom working after a little debugging with @negate32!

It's a nice little feature that I've been meaning to do for ages, makes it much easier to see children under a parent function

View original message on Discord

&spall It was a rough few days (and I'm sure I'll see a looming bug-report or two in the next few days), but I've finally got some of my big paid-version changes from Spall-native, backported to the web version!

Spall-web, now with histograms, a smaller LOD tree, a better selection interface, and a redesigned timescale.

View original message on Discord

&spall My new batch of memory came in today, so now I can do serious load-testing.
I still have plenty more footprint to crunch down, but I'm now able to load 2.3 billion functions from Python, and dig through them at reasonable framerates

View original message on Discord

&spall It's a little silly, but I've been poking at load-times and memory usage again, and I figured it'd be interesting to see how that reflected on my older traces.

This is the JSON trace that took ~8s to load at HMS on the web, now loading in ~3.2s, from boot to usable on the native version.

(The binary version of this trace takes ~350ms)

View original message on Discord

&spall So, I decided to go spelunking in gdb today. Learned many things.
Auto-tracing gdb is a pain in the ass, because gdb's thread pool starts before main. Very C++, such wow.
(this trace video doesn't have the ~10+ other threads GDB spun up before main started, more research needed to inject spall earlier)

When GDB does symbol resolution, it runs resolvers for a pile of languages, one after another, for each symbol.

View original message on Discord

&spall it's silly, but I hit a new record tonight that excited me. I've used literally all of my memory and gone swapping (~32 GB), but I've got 500M functions from a complete auto-trace of python running build-tests, rendering at 60-80 fps. (actively crunching stats dips the framerate a little)

I need to do more work to chop memory usage down, but it's getting there.

View original message on Discord

&spall Needs more visual polish and a few UX issues ironed out, but I'm making headway on a process/thread filter for stat-crunching.

View original message on Discord

&spall Doing a little more tinkering today. This was an important one on my plate.
Stat-pane resizing!

This is a lead-in for better thread-selection stuff, in the new bottom tab-bar

View original message on Discord

&spall Messed with auto-tracing LuaJIT this evening, getting back into Spall after a bit of a hiatus.

local samples = 1000000
local T = {}
local CachedTable = {"abc", "def", "ghk"}

local function test1(times)
    T[times] = CachedTable
end

do
    for warm = 1, samples do
        test1(warm)
    end
end

Interesting to note, for simple bench tests, the more loop iterations you have, the longer the JIT seems to take to kick in.

for 100,000 iterations it takes >1ms to move to JITted code
for 1,000,000, it's 8ms
and for 10,000,000 it's ~60ms

The actual asm-gen part of LuaJIT is incredibly zippy. ~50-100us here
(the spiky part)

View original message on Discord

&spall It doesn't look like much, but this was a lot of learning-pain :P
Got the file open dialog working on OSX, which involved fussing with objc_msgsend and learning how to translate apple-docs to C-ish code in my head.

Big thanks to @Perlind for all the help getting this working!

View original message on Discord

&spall microevents using the new pdb resolver for Windows confirmed working today, thanks to @NeGate.
Cuik self-tracing, compiling sqlite3, with a ~166MB trace (down from ~280MB, I think). \o/

Negate has small function names, so things didn't shrink that much. :P

View original message on Discord

&spall So, uh, new milestone!
I'm tracing python running self-tests, getting 400 million functions, rendered at 60fps.
(that's 40 million functions per second average, captured)

It eats all of my memory, but it's huge progress.

cpython is somewhere around ~500kloc of C, according to tokei (take that with a major grain of salt, I'm sure)

View original message on Discord

&spall Got microevents working and resolving symbols on OSX. Still needs C++ name demangling, but the essence is there.

The Odin compiler on this machine spends 500ms+ (aggregating across threads) while compiling spall, calling memcpy. wild

View original message on Discord

&spall Threaded-writing for auto-tracing is in!

This is a seamless (no big disk-write gaps) trace of Odin compiling spall. All 120 million function calls at 60 fps on a meh intel mac laptop.

Now for the hard part, offline symbol resolution :P

View original message on Discord

&spall Writing a workstealing threadpool to even out a multi-producer, multi-consumer job DAG. Realized today I wrote a thing to make my life easier. Immediately spotted the mutex contention issue after plopping it in.

I should have written this thing ages ago. Thank you @bvisness
It's finally at a point where it's useful to me. <3

View original message on Discord

&spall Hashed out a lovely new feature for spall-native this evening with @philliptrudeau!
Histograms for functions! Lots still to do to make them shippable, but they're useful for us, even without the polish.

very handy for self-profiling the profiler's event emitter, and for figuring out WTF "average" for a function with huge tail latencies looks like.
There are also some big library changes on the way soon to get profile traces even faster and make library-building with spall much easier, hopefully those should be ready and in master in the next day or two.
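
To illustrate why a histogram says more than an "average" for a function with a long tail, here is a minimal Python sketch. It is not Spall's implementation: the sample durations and the `log2_histogram` helper are made up for illustration. It buckets call durations by power of two and compares the mean with the median.

```python
import math
from collections import Counter
from statistics import mean, median

def log2_histogram(durations_us):
    """Count durations into power-of-two buckets: bucket k covers [2**k, 2**(k+1))."""
    buckets = Counter()
    for d in durations_us:
        # Anything under 1 us goes into bucket 0 to avoid log2 of tiny or zero values.
        buckets[int(math.log2(d)) if d >= 1 else 0] += 1
    return dict(sorted(buckets.items()))

# Hypothetical samples: mostly fast calls, plus a few very slow ones.
durations_us = [10] * 95 + [5000] * 5

hist = log2_histogram(durations_us)
for k, count in hist.items():
    print(f"[{2**k:>5}, {2**(k + 1):>5}) us: {count} calls")

# The mean is dragged far away from what a typical call looks like.
print("mean  :", mean(durations_us))
print("median:", median(durations_us))
```

With data shaped like this, the mean lands in a range where no call actually falls, while the histogram shows the two clusters directly.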

View original message on Discord

&spall native can now launch with a trace-file passed as a command line argument.

I've been a little busy lately with demo-prep, but this one's for @NeGate's booth. Had to happen. :P

View original message on Discord

Ok, text isn't quite right yet, I don't have a loading screen, and selection doesn't quite work, but it's almost usable now.

It's hard to tell which one I'm using at this point, horribly broken multiselect animations aside. :P

View original message on Discord

still needs text rendering and a lot of platform normalization, but spall-native is coming along pretty quickly.

View original message on Discord

It's a small thing, but it helps a ton for big files.
Just pushed a change that builds self-times during parsing for binary files. Should cut a few seconds off spall load times, and also clean up a few weird edge-cases for begin events with no end.
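
As a rough illustration of building self-times while parsing, here is a minimal Python sketch. It assumes a single thread's begin/end events arrive sorted by timestamp; the tuple layout and the `self_times` name are hypothetical, and this is not Spall's actual parser. Self-time is a call's duration minus the time spent in its children, and begin events with no matching end are closed at the last timestamp seen.

```python
def self_times(events):
    """events: iterable of (kind, name, ts), kind is 'B' or 'E', sorted by ts.
    Returns {name: total self time} for one thread."""
    totals = {}
    stack = []   # each entry: [name, start_ts, time_spent_in_children]
    last_ts = 0

    def close_top(ts):
        name, start, child_time = stack.pop()
        duration = ts - start
        totals[name] = totals.get(name, 0) + duration - child_time
        if stack:
            # Charge the whole duration to the parent as child time.
            stack[-1][2] += duration

    for kind, name, ts in events:
        last_ts = ts
        if kind == "B":
            stack.append([name, ts, 0])
        elif kind == "E" and stack:
            # An end event closes whatever is on top of the stack.
            close_top(ts)

    # Edge case: begin events that never got an end.
    while stack:
        close_top(last_ts)
    return totals

events = [
    ("B", "main", 0),
    ("B", "parse", 10), ("E", "parse", 40),
    ("B", "render", 50), ("E", "render", 90),
    # "main" never ends in this trace
]
print(self_times(events))
```

Doing this in the same pass as parsing means no second walk over the events just to compute stats.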

View original message on Discord

Some good suggestions from @Phil H later, and a bunch of site work done, and spall is now live!

You can now vertically scroll with your scroll wheel by hovering over the mini-tree on the side, and there's a global scale to help you figure out where you are in your trace.

https://gravitymoth.com/spall?refresh_unfurl=plz

View original message on Discord

it's a small change, but it makes a big difference.
Thanks to @bvisness for the suggestion!

View original message on Discord

So, more little things today..
Spall can now show thread/process names from chrome's tracing format, scroll stats, pan while you've got things selected, and read/display the juicy bits of chrome's sampling profiler data.

If you're sick of waiting for the chrome performance tab's profiler to zoom at 0 fps, trying to see what's taking so long in your JS code, you can now load them into spall.

Thanks to @bvisness for doing some serious groundwork figuring out the format.

My sampling profile import is a little beta because it's an undocumented format, so results may definitely vary.
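
For reference, thread and process names in Chrome's trace event format come from metadata events (`"ph": "M"`) named `process_name` or `thread_name`, with the label in `args.name`. Here is a minimal Python sketch of collecting them. The `collect_names` helper and the sample trace are made up for illustration, and this is not how Spall itself does it.

```python
import json

def collect_names(trace_json):
    """Return ({pid: name}, {(pid, tid): name}) from a Chrome-style JSON trace."""
    data = json.loads(trace_json)
    # The format allows either a bare event array or an object with "traceEvents".
    events = data["traceEvents"] if isinstance(data, dict) else data
    process_names, thread_names = {}, {}
    for ev in events:
        if ev.get("ph") != "M":
            continue  # only metadata events carry names
        label = ev.get("args", {}).get("name")
        if ev.get("name") == "process_name":
            process_names[ev["pid"]] = label
        elif ev.get("name") == "thread_name":
            thread_names[(ev["pid"], ev["tid"])] = label
    return process_names, thread_names

# Hypothetical sample trace.
sample = json.dumps({"traceEvents": [
    {"ph": "M", "name": "process_name", "pid": 1, "args": {"name": "Browser"}},
    {"ph": "M", "name": "thread_name", "pid": 1, "tid": 7, "args": {"name": "Compositor"}},
    {"ph": "X", "name": "Draw", "pid": 1, "tid": 7, "ts": 100, "dur": 25},
]})

procs, threads = collect_names(sample)
print(procs, threads)
```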

View original message on Discord

Did some work this evening making JSON parsing a bit faster.
Doing around 500 MB in ~6 seconds now. (around 2x faster than it was)

I think we're ready to demo.

View original message on Discord

Starting to dig a little more of the ol' netsim code out while doing some visual polish today. Can't just be a boring old profiler, it's gotta feel good.

View original message on Discord

Ok, probably the last big feature before proper ship is in. We now have the ability to print self-time per function!

Also, because you probably want it, there's a lovely new button at the top left to crunch stats for your whole file.

Hopefully everything left now is cleanup, polish, and optimization.

View original message on Discord

a little under 1 GB of binary trace data taken from a 30 minute happenlance burn-test, loaded in 6 seconds.
You can now do stats without tanking the framerate too, which is nice.

Getting close to a proper launch. Needs another bug pass, but I'm hoping to get it up in beta in the next week or two

View original message on Discord

More usability features!
Added the top bar, so you can tell where you are on the x-axis while zoomed in, and you get a quick view of thread activity so you can spot program slow-points.

Not 100% sold on my current colors for the periphery views yet though.

View original message on Discord

So, I don't recommend this at all because the iPad WASM jit doesn't like to free memory when you refresh, but it does work for ~500 MB json files, mostly.

View original message on Discord

Working on rendering speed today. Still some lurking z-index issues, but we can now load and smoothly zoom/pan through 6 million events (300 MB of spall-binary, or ~700 MB of JSON) at 165 fps.

This is cuik processing a massive generated fibonacci program.

View original message on Discord

So, I felt like doing a little upgrade to my speed test. This is 540 MB, emitted from chrome's self-profiler. (the last one was 40 MB)
There's definitely some UI/UX polish left to do on my end (it's hard to squish so many profilers on the screen, so things are a little scrunched :P), I need to properly name PIDs and TIDs like perfetto/speedscope can, and I'm working with our resident JSON wizard, @demetrispanos to speed things up even more, but the numbers speak for themselves.

(chrome://tracing failed to load the file entirely)

View original message on Discord

One more big batch of changes, and it's finally feature-complete enough to feel real.
Needs polish for days and a big cleanup / optimization pass, but multiselect and stats are in.
Time for a nice long Zzz.

This time, we're featuring a lovely trace from Happenlance, killing it with incredible frametimes. Hopefully we'll get similar frametimes too after some tweaking.

View original message on Discord

After a long all-nighter with Philip and Jeroen, LOD is in!
This is a 530 MB JSON dump from chrome's chrome://tracing self-record feature that chrome://tracing's renderer fails to open, and takes a solid, laggy year to load and zoom around in perfetto.
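
The general idea behind level-of-detail for a trace view can be sketched in Python: at a given zoom, runs of events too narrow to see get merged into one summary block, so the renderer draws far fewer rectangles. This is a generic, single-level sketch under my own assumptions (the `lod_merge` name and the span layout are hypothetical), not Spall's actual LOD structure.

```python
def lod_merge(spans, min_width):
    """spans: (start, end) pairs for one depth row, sorted and non-overlapping.
    Merge neighbouring spans narrower than min_width into summary blocks.
    Returns (start, end, merged_count) tuples."""
    merged = []  # entries: (start, end, count, is_small)
    for start, end in spans:
        small = (end - start) < min_width
        prev = merged[-1] if merged else None
        if prev and small and prev[3] and start - prev[1] < min_width:
            # Extend the previous summary block instead of emitting a new one.
            merged[-1] = (prev[0], end, prev[2] + 1, True)
        else:
            merged.append((start, end, 1, small))
    return [(s, e, n) for s, e, n, _ in merged]

# Three tiny events, one big one, then another tiny one.
spans = [(0, 1), (2, 3), (4, 5), (100, 200), (201, 202)]
coarse = lod_merge(spans, min_width=10)
print(coarse)
```

Precomputing a few such levels at different widths lets the viewer pick the coarsest one that still looks right for the current zoom.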

View original message on Discord

After a bunch of optimization and TLC, we're loading ~530 MB JSON trace files in ~8 seconds. Plus, with some collab with @philliptrudeau, I've also added support for a binary ingest format that loads around 10x faster than that.
Still needs some LOD love, but it's coming soon, I swear. :P @bvisness hopefully, I'll be at a point where your 1 GB trace files are totally viable, soon. More UI/UX work to go, but load times are now in the ballpark of tolerable, especially if you don't need JSON specifically.
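
One plausible reason a binary format loads faster than JSON is that fixed-width records need no tokenizing or number parsing. The Python sketch below uses a made-up record layout (explicitly not Spall's actual binary format) to show the same events round-tripping through a packed encoding and to compare the sizes.

```python
import json
import struct

# Hypothetical record layout, NOT Spall's real format:
# little-endian u64 start_us, u64 duration_us, u32 name_id (20 bytes per event).
RECORD = struct.Struct("<QQI")

def encode(events):
    """Pack a list of event dicts into fixed-width binary records."""
    return b"".join(RECORD.pack(e["ts"], e["dur"], e["name_id"]) for e in events)

def decode(blob):
    """Unpack fixed-width records back into event dicts."""
    return [
        {"ts": ts, "dur": dur, "name_id": name_id}
        for ts, dur, name_id in RECORD.iter_unpack(blob)
    ]

events = [{"ts": i * 10, "dur": 7, "name_id": i % 3} for i in range(1000)]
blob = encode(events)
text = json.dumps(events)

print(len(blob), "bytes as binary vs", len(text), "bytes as JSON")
```

Names are stored as small integer ids here on the assumption that a separate string table exists; that detail is part of the sketch, not something taken from the messages above.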

View original message on Discord

With some huge help from our resident superhero, @philliptrudeau, we've got support for smooth scrolling, panning, and pan-to-zoom now!
Up next on my list is handling begin and end events, so I can process more config files, and improving my 1GB+ trace frametimes, but the core is now solid.

View original message on Discord

Got my Odin/WASM flamegraph tracer built and running well. Works on tablets, and (at least with my current ~900 KB of test data) boots faster than perfetto or chrome://tracing. Almost good enough to replace chrome://tracing for small files, just needs slightly better zoom + time window selection.

https://github.com/colrdavidson/tracey

View original message on Discord