01 · Roasts
CI? Only in One Zip Code
Six repos scored, and exactly ONE has CI — tempo. benchmark-mondays has 10+ test suites and still no pipeline. You know how to write tests; you just refuse to run them automatically.
344 KB Engine, 0 Stars, 0 README
benchmark-mondays is your most architecturally impressive repo — Box-Muller price simulation, full poker hand evaluator, multi-arena agent runner — and you didn't even write a README. It's a masterpiece in a locked room.
The 'tempo' Anomaly
20 stars on a Claude Code rate-limiter but zero stars on an ML training platform with 13 methods and GPU support. Your marketing strategy appears to be 'release and pray'.
Python Monolinguist with a TypeScript Side Hustle
79% Python, 14% TypeScript — you're not multi-lingual, you're bilingual at best. GDScript at 1% suggests a game project that never got past 'hello world' in Godot.
Night Owl Who Codes in Bursts
60% night-owl commits, with entire zero-commit weeks scattered across the heatmap. You're not a daily coder; you're a binge coder who disappears for a week, then drops level-4 commit days on a Saturday.
02 · Category breakdown
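- Impact: 25% weight, score 48, grade D
- Consistency: 20% weight, score 65, grade C
- Quality: 20% weight, score 58, grade D
- Depth: 15% weight, score 60, grade C
- Breadth: 10% weight, score 65, grade C
- Community: 10% weight, score 30, grade F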
03 · Stats
365-day commit heatmap
148 active days
Language distribution
- Python: 79%
- TypeScript: 14%
- GDScript: 1%
- JavaScript: 1%
- CSS: 1%
- HTML: 1%
- Other: 3%
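The rubric further down lists "language entropy" among the server-computed stats but does not say how it is calculated. A minimal sketch, assuming plain Shannon entropy over the byte shares above; the function name is hypothetical, and treating "Other" as a single language is a simplification that slightly understates the real spread.

```python
import math

# Language shares (percent of bytes) copied from the distribution above.
LANGUAGE_SHARE = {
    "Python": 79,
    "TypeScript": 14,
    "GDScript": 1,
    "JavaScript": 1,
    "CSS": 1,
    "HTML": 1,
    "Other": 3,  # lumped together here; really several languages
}


def shannon_entropy_bits(shares):
    """Return Shannon entropy in bits after normalizing the shares to sum to 1."""
    total = sum(shares.values())
    probs = [value / total for value in shares.values() if value > 0]
    return -sum(p * math.log2(p) for p in probs)


entropy = shannon_entropy_bits(LANGUAGE_SHARE)
# Upper bound: every bucket equally used.
max_entropy = math.log2(len(LANGUAGE_SHARE))
print(f"{entropy:.2f} bits out of a possible {max_entropy:.2f}")
```

A value far below the maximum is the numeric version of the "bilingual at best" roast: one language dominates the byte count.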
04 · Numbers
- Owned repos (non-fork): 31
- Commits (last 12 months): 699
- Followers: 8
- Joined GitHub: Oct 2019
05 · Top repos
tomalmog / crucible
Comprehensive ML training platform (13 methods, remote GPU, interp tools) with typed Python, structured architecture, and extensive docs. Active development but still nascent (0 stars, ~55 commits in 2 months).
tomalmog / tempo
Python CLI automation tool for Claude Code with rate-limit handling and session persistence. Typed, well-documented, structured codebase with CI/CD but no tests. 30 commits across ~3 months shows steady development.
tomalmog / personal
Personal portfolio website in TypeScript with Next.js, React, and integrated Groq AI chatbot. Well-structured, typed frontend with functional demo projects and experience showcase, but minimal tests/CI and limited architectural scope for a portfolio project.
tomalmog / benchmark-mondays
Monorepo for a multi-arena AI agent competition engine (stock trading + poker). TypeScript codebase, multi-package with test coverage, but zero public adoption, no README, and created within the last 3 weeks.
tomalmog / slide
Personal Expo/React Native prediction market app with TypeScript, structured multi-file layout, and live crypto integration. No tests, CI, or license; boilerplate-heavy README. ~40 days old, 11 recent commits.
tomalmog / musical
TypeScript Next.js "Heardle" clone that turns Last.fm data into a music-guessing game with YouTube playback. Demonstrates structured React hooks and API integration, but is brand-new (2 commits, 85 KB), lacks tests/CI, and is a tutorial/one-off learning project.
06 · Timeline
- Oct 29, 2019 · Joined GitHub
- Nov 13, 2025 · Created personal: Personal Website
- Dec 12, 2025 · Created tempo: Automated Claude Code runner with rate limit handling. Run long Claude Code tasks overnight. Tempo automatically detects rate limits, waits for reset, and continues your task with …
- Jan 10, 2026 · Created slide
- Feb 10, 2026 · Created crucible: LLM training suite
- Mar 13, 2026 · Created musical
- Mar 27, 2026 · Created benchmark-mondays
- Apr 16, 2026 · Most recent push to personal
08 · Rubric
How this score was produced
Overall = Σ (category × weight) + gentle top-end curve
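As a worked example of the formula above, here is a minimal sketch that computes only the weighted sum from the category breakdown in section 02. The top-end curve is left out because its exact shape is not given, and the function name is hypothetical.

```python
# (weight, score) per category, copied from the category breakdown in section 02.
CATEGORIES = {
    "Impact": (0.25, 48),
    "Consistency": (0.20, 65),
    "Quality": (0.20, 58),
    "Depth": (0.15, 60),
    "Breadth": (0.10, 65),
    "Community": (0.10, 30),
}


def raw_overall(categories):
    """Weighted sum of category scores, before any curve is applied."""
    total_weight = sum(weight for weight, _ in categories.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(weight * score for weight, score in categories.values())


raw = raw_overall(CATEGORIES)
print(round(raw, 1))  # 55.1
```

The raw figure sits well below the ≈ 70 mark where the curve is described as centered, so any nudge for this profile is presumably small.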
Tier thresholds
How the pipeline works
- 01 Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST and GraphQL APIs.
- 02 Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
- 03 Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
- 04 Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
- 05 Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.
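The Correct step is the only fully deterministic stage, so it is the easiest to sketch. The version below assumes a simple rule table; the signal name `public_commits`, the `FloorRule` type, and the floor value of 50 are all hypothetical, since the rubric only says that 2,000+ public commits rules out a Consistency score of 30.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorRule:
    category: str     # category whose score gets a minimum
    signal: str       # name of a server-measured signal
    threshold: float  # rule fires when the signal is >= threshold
    floor: int        # minimum category score once the rule fires


# Hypothetical rule: the floor of 50 is a placeholder, not the real value.
RULES = [FloorRule("Consistency", "public_commits", 2000, 50)]


def apply_floors(scores, signals, rules=RULES):
    """Return a copy of scores with every triggered floor enforced."""
    corrected = dict(scores)
    for rule in rules:
        triggered = signals.get(rule.signal, 0) >= rule.threshold
        if triggered and rule.category in corrected:
            corrected[rule.category] = max(corrected[rule.category], rule.floor)
    return corrected


# A model-assigned 30 is lifted to the floor when the signal is strong enough.
print(apply_floors({"Consistency": 30}, {"public_commits": 2500}))
```

Floors only ever raise a score: a profile that already beats the floor, or that never trips the signal, passes through unchanged.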
Data sources & caveats
- Heatmap & commit totals: GitHub GraphQL contributionsCollection, which covers the last 365 days and includes private repos when the user has opted in (default).
- Language %: byte totals across the top 30 owned non-fork repos.
- Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
- Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
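For the heatmap source listed above, here is a rough sketch of how stats like "active days" and zero-commit weeks could be derived from a contributionsCollection response. The query uses GitHub's documented contributionCalendar fields; the `summarize_calendar` helper and the two-week sample response are made up for illustration, and no network call is made.

```python
# GraphQL query against GitHub's API; send it with a token of your own.
CALENDAR_QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar {
        totalContributions
        weeks { contributionDays { date contributionCount } }
      }
    }
  }
}
"""


def summarize_calendar(response):
    """Count active days and zero-contribution weeks in a calendar response."""
    calendar = response["data"]["user"]["contributionsCollection"]["contributionCalendar"]
    weekly_counts = [
        [day["contributionCount"] for day in week["contributionDays"]]
        for week in calendar["weeks"]
    ]
    active_days = sum(1 for week in weekly_counts for count in week if count > 0)
    zero_weeks = sum(1 for week in weekly_counts if week and sum(week) == 0)
    return {
        "total": calendar["totalContributions"],  # taken as reported, not recomputed
        "active_days": active_days,
        "zero_weeks": zero_weeks,
    }


def _week(counts):
    # Helper for the fabricated sample below; real responses also carry dates.
    return {"contributionDays": [{"contributionCount": c} for c in counts]}


# Made-up two-week sample: one silent week, then a Saturday binge.
SAMPLE = {"data": {"user": {"contributionsCollection": {"contributionCalendar": {
    "totalContributions": 16,
    "weeks": [_week([0, 0, 0, 0, 0, 0, 0]), _week([3, 0, 1, 0, 0, 0, 12])],
}}}}}

summary = summarize_calendar(SAMPLE)
print(summary)
```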