01 · Roasts
The Ghost of GitHub Present
33 commits in a year, all jammed into a 10-week window. Your heatmap looks like an EKG flatline with one brief moment of consciousness before the patient gave up again.
Meta AI → 0 Stars
Former intern at Meta AI, Netflix, AND Amazon — and your repos have a combined 0 stars, 0 forks. The big-tech logos in your bio are working harder than your commit graph.
The Alignment Problem
wardstone is supposed to evaluate LLM alignment and it's 120 lines of code with a hardcoded example about weapons and MDMA. The attack surface of your own README is more dangerous than your toolkit.
Monolingual ML Tourist
96% Python, all three repos are AI/ML research sketches in the same domain. You've discovered one hammer and everything looks like a transformer.
Portfolio or Graveyard?
animus has no README, no tests, no CI — just vibes and typed Python. At least label it a graveyard so visitors know to bring flowers.
02 · Category breakdown
- Impact (25% weight): 30 (F)
- Consistency (20% weight): 20 (F)
- Quality (20% weight): 57 (D)
- Depth (15% weight): 55 (D)
- Breadth (10% weight): 30 (F)
- Community (10% weight): 40 (D)
03 · Stats
365-day commit heatmap
21 active days
Language distribution
- Python: 96%
- Go: 2%
- C++: 1%
- Other: 1%
04 · Numbers
- Owned repos (non-fork): 6
- Commits (last 12 months): 33
- Followers: 15
- Joined GitHub: Jun 2022
05 · Top repos
pxe3 / dominion
Personal RL infra exploration with distributed training framework (workers/inference/learner architecture), multiple algos (PPO, DPPO, diffusion), and test coverage. ~200KB, ~58 commits over 2 months, typed Python, structured layout. Early-stage but functional.
pxe3 / animus
LLM-based social agent simulator combining Tree-of-Thought and Reflexion techniques for agentic reasoning. Early-stage research project with typed Python core logic, memory systems, and actor/evaluator/reflection models, but minimal documentation and no tests/CI.
pxe3 / wardstone
Early-stage Python toolkit for LLM alignment evaluation with one implemented attack method (character bijection), basic test coverage, but minimal documentation and architectural scope (10 KB, ~120 LOC).
06 · Timeline
- Jun 1, 2022: Joined GitHub
- Oct 19, 2024: Created wardstone: "An open-source toolkit for language model alignment evaluations."
- Nov 16, 2024: Created animus: "agentic sim: reflexion + tot"
- Jan 12, 2026: Created dominion: "an exploration into rl infra"
- Mar 10, 2026: Most recent push to dominion
07 · Compare
08 · Rubric
How this score was produced
Overall = Σ (category × weight) + gentle top-end curve
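As a rough illustration of the weighted-sum part of this formula, the sketch below plugs in the category scores and weights listed in the category breakdown. The function name `weighted_overall` is hypothetical, and the top-end curve is deliberately left out because its exact shape is not given here.

```python
# Scores and weights copied from the category breakdown in this report.
SCORES = {"Impact": 30, "Consistency": 20, "Quality": 57,
          "Depth": 55, "Breadth": 30, "Community": 40}
WEIGHTS = {"Impact": 0.25, "Consistency": 0.20, "Quality": 0.20,
           "Depth": 0.15, "Breadth": 0.10, "Community": 0.10}


def weighted_overall(scores, weights):
    """Return sum(score * weight) over all categories (no curve applied)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


raw = weighted_overall(SCORES, WEIGHTS)
print(f"raw weighted score before curve: {raw:.2f}")
```

With these inputs the raw weighted sum comes out at roughly 38, well below the region where the curve is described as having any noticeable effect.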
Tier thresholds
▸ How the pipeline works
- 01 Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST & GraphQL APIs.
- 02 Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
- 03 Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
- 04 Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
- 05 Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.
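One of the server-computed stats named in the Aggregate step is language entropy. The pipeline's actual formula is not stated, so the sketch below assumes plain Shannon entropy (in bits) over the language shares; the function name `language_entropy` is hypothetical.

```python
import math


def language_entropy(shares):
    """Shannon entropy in bits of a language distribution.

    `shares` maps language name -> percentage or byte count; values are
    normalised first, and zero entries are skipped.
    """
    total = sum(shares.values())
    probs = [value / total for value in shares.values() if value > 0]
    return -sum(p * math.log2(p) for p in probs)


# Language distribution from the Stats section of this report.
profile = {"Python": 96, "Go": 2, "C++": 1, "Other": 1}
print(f"language entropy: {language_entropy(profile):.2f} bits")
```

Under this assumption, a profile split evenly across four languages would score 2 bits, while the 96% Python distribution above lands far lower, which is consistent with the low Breadth grade.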
▸ Data sources & caveats
- Heatmap & commit totals: GitHub GraphQL contributionsCollection, which covers the last 365 days and includes private repos when the user has opted in (default).
- Language %: byte totals across the top 30 owned non-fork repos.
- Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
- Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
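The curve and anchor corrections described above could be sketched as follows. Everything numeric here is an assumption made for illustration: the Gaussian shape of the nudge, its `height` and `width`, the `FLOOR_RULES` table, the signal name `public_commits`, and the floor value of 50 are all hypothetical and not taken from the actual pipeline.

```python
import math


def apply_curve(raw, center=70.0, height=3.0, width=15.0):
    """Add a small bump that peaks at `center`, then cap the result at 100.

    Assumed shape: a Gaussian bump, gentle enough to preserve score order.
    """
    bump = height * math.exp(-((raw - center) / width) ** 2)
    return min(100.0, raw + bump)


# Hypothetical rule table: (signal name, threshold, category, minimum score).
FLOOR_RULES = [("public_commits", 2000, "Consistency", 50)]


def enforce_floors(scores, signals, rules=FLOOR_RULES):
    """Raise a category to its floor when the measured signal clears the threshold."""
    corrected = dict(scores)  # leave the caller's dict untouched
    for signal, threshold, category, floor in rules:
        if signals.get(signal, 0) >= threshold:
            corrected[category] = max(corrected[category], floor)
    return corrected


print(apply_curve(70.0))
print(enforce_floors({"Consistency": 30}, {"public_commits": 2500}))
print(enforce_floors({"Consistency": 20}, {"public_commits": 33}))
```

The floors depend only on measured signals, so a profile with 33 commits is left unchanged, while one that clears the threshold has its category score raised to the minimum.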