pxe3 · 38.9/100 — Rate My GitHub

01 · Roasts

The Ghost of GitHub Present

33 commits in a year, all jammed into a 10-week window. Your heatmap looks like an EKG flatline with one brief moment of consciousness before the patient gave up again.

Meta AI → 0 Stars

Former intern at Meta AI, Netflix, AND Amazon — and your repos have a combined 0 stars, 0 forks. The big-tech logos in your bio are working harder than your commit graph.

The Alignment Problem

wardstone is supposed to evaluate LLM alignment and it's 120 lines of code with a hardcoded example about weapons and MDMA. The attack surface of your own README is more dangerous than your toolkit.

Monolingual ML Tourist

96% Python, all three repos are AI/ML research sketches in the same domain. You've discovered one hammer and everything looks like a transformer.

Portfolio or Graveyard?

animus has no README, no tests, no CI — just vibes and typed Python. At least label it a graveyard so visitors know to bring flowers.

02 · Category breakdown

Impact · 25% weight · 30 (F)
Consistency · 20% weight · 20 (F)
Quality · 20% weight · 57 (D)
Depth · 15% weight · 55 (D)
Breadth · 10% weight · 30 (F)
Community · 10% weight · 40 (D)

03 · Stats

365-day commit heatmap

21 active days

Language distribution

4 langs

Python · 96%
Go · 2%
C++ · 1%
Other · 1%
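
The pipeline description on this page lists "language entropy" among its server-computed stats but does not say how it is calculated. As a minimal sketch, assuming it means Shannon entropy over the language shares (an assumption, not something the page confirms), the distribution above can be scored like this. The function name `shannon_entropy_bits` is made up for illustration.

```python
import math

# Language shares as shown in the distribution above (percent of bytes).
shares = {"Python": 96, "Go": 2, "C++": 1, "Other": 1}


def shannon_entropy_bits(percentages):
    """Shannon entropy, in bits, of a distribution given as percentages.

    Assumption: the page's "language entropy" is Shannon entropy; the
    actual formula used by the site is not published.
    """
    total = sum(percentages.values())
    probs = [p / total for p in percentages.values() if p > 0]
    return -sum(p * math.log2(p) for p in probs)


entropy = shannon_entropy_bits(shares)
max_entropy = math.log2(len(shares))  # four equally used languages would give 2 bits

print(f"{entropy:.2f} bits out of a possible {max_entropy:.2f}")
# prints: 0.30 bits out of a possible 2.00
```

Under that assumption, a value this far below the maximum is what a near-monolingual profile looks like numerically.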

04 · Numbers

Owned repos (non-fork)

Commits (last 12 months)

Followers

Joined GitHub: Jun 2022

05 · Top repos

pxe3 / dominion · 47/100

Personal RL infra exploration with a distributed training framework (workers/inference/learner architecture), multiple algorithms (PPO, DPPO, diffusion), and test coverage. ~200 KB, ~58 commits over 2 months, typed Python, structured layout. Early-stage but functional.

Impact 25 · Quality 60 · Depth 55
README · Tests
Python · ★ 0 · 4mo ago

pxe3 / animus · 33/100

LLM-based social agent simulator combining Tree-of-Thought and Reflexion techniques for agentic reasoning. Early-stage research project with typed Python core logic, memory systems, and actor/evaluator/reflection models, but minimal documentation and no tests/CI.

Impact 25 · Quality 45 · Depth 30
Python · ★ 0 · 6mo ago

pxe3 / wardstone · 25/100

Early-stage Python toolkit for LLM alignment evaluation with one implemented attack method (character bijection) and basic test coverage, but minimal documentation and a narrow architectural scope (10 KB, ~120 LOC).

Impact 15 · Quality 35 · Depth 25
README · Tests
Python · ★ 0 · 1y ago

06 · Timeline

Jun 1, 2022 · Joined GitHub
Oct 19, 2024 · Created wardstone: "An open-source toolkit for language model alignment evaluations."
Nov 16, 2024 · Created animus: "agentic sim: reflexion + tot"
Jan 12, 2026 · Created dominion: "an exploration into rl infra"
Mar 10, 2026 · Most recent push to dominion

08 · Rubric

How this score was produced

Overall = Σ (category × weight) + gentle top-end curve

Category · Weight · Score · Contrib.

Raw total: 38.1

Top-end curve: +0.8

Final overall: 38.9
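
To make the arithmetic behind these totals concrete, here is a minimal sketch that recomputes the weighted sum from the scores and weights in the category breakdown. The curve formula is not published on the page, so the +0.8 is plugged in as a given constant rather than derived; `raw_total` and `TOP_END_CURVE` are illustrative names.

```python
# (score, weight in percent) per category, copied from the category breakdown.
categories = {
    "Impact": (30, 25),
    "Consistency": (20, 20),
    "Quality": (57, 20),
    "Depth": (55, 15),
    "Breadth": (30, 10),
    "Community": (40, 10),
}


def raw_total(cats):
    """Weighted sum: Σ (category score × weight)."""
    if sum(weight for _, weight in cats.values()) != 100:
        raise ValueError("weights must add up to 100%")
    # Multiply with integer percentages and divide once to limit float drift.
    return sum(score * weight for score, weight in cats.values()) / 100


# Taken from the page as-is; how the curve produces this value is not stated.
TOP_END_CURVE = 0.8

raw = raw_total(categories)
final = raw + TOP_END_CURVE

print(f"raw={raw:.2f} final={final:.2f}")
# prints: raw=38.15 final=38.95
```

The sketch lands on 38.15 and 38.95, while the page shows 38.1 and 38.9. That is consistent with the site truncating or rounding down to one decimal, though the exact display rule is an assumption.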

Tier thresholds

S · 90–100 · Mass-producing humans
A · 80–89 · Ship machine
B · 70–79 · Solid engineer
C · 60–69 · Getting there
D · 40–59 · README enthusiast
F · 0–39 · GitHub tourist
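
The thresholds are easy to express as a lookup. The sketch below assumes each tier's lower bound is inclusive and that fractional scores fall into the tier whose lower bound they have reached (so 39.5 is still F); the page only lists integer ranges, so that handling is a guess. `tier_for` is a hypothetical helper name.

```python
# Lower bound of each tier, highest first, taken from the thresholds above.
TIERS = [
    (90, "S"),
    (80, "A"),
    (70, "B"),
    (60, "C"),
    (40, "D"),
    (0, "F"),
]


def tier_for(score):
    """Return the tier letter for a score in the range 0 to 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Walk from the top tier down and stop at the first bound the score reaches.
    for lower_bound, letter in TIERS:
        if score >= lower_bound:
            return letter


print(tier_for(38.9))  # F, the overall score on this page
print(tier_for(57))    # D, the Quality score
```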

How the pipeline works

01 · Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST & GraphQL APIs.
02 · Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
03 · Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
04 · Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
05 · Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
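
The correction step in 05 can be pictured as a small rule table applied after the model has scored the profile. The sketch below is one possible shape, not the site's implementation: the `Signals` class, the rule table, and in particular the floor value of 60 are all hypothetical. The page only says that 2,000+ public commits rules out a Consistency score of 30.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Server-measured facts about a profile (hypothetical structure)."""
    public_commits: int


# Placeholder value: the page does not state what the real floor is.
HYPOTHETICAL_CONSISTENCY_FLOOR = 60

# Each rule: (condition on the signals, category it protects, minimum score).
FLOOR_RULES = [
    (lambda s: s.public_commits >= 2000, "Consistency", HYPOTHETICAL_CONSISTENCY_FLOOR),
]


def apply_floors(scores, signals):
    """Return a copy of the scores with every applicable floor enforced."""
    corrected = dict(scores)
    for applies, category, floor in FLOOR_RULES:
        if applies(signals):
            corrected[category] = max(corrected[category], floor)
    return corrected


# A prolific profile is lifted to the floor; a quiet one is left alone.
print(apply_floors({"Consistency": 30}, Signals(public_commits=2500)))
print(apply_floors({"Consistency": 20}, Signals(public_commits=33)))
```

Because the rules only ever raise a score, the final verdict has to be recomputed from the corrected category scores afterwards, which matches the "recompute the final verdict" wording in step 05.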

~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.

Data sources & caveats

Heatmap & commit totals: GitHub GraphQL contributionsCollection — covers the last 365 days, includes private repos when the user has opted in (default).
Language %: byte totals across the top 30 owned non-fork repos.
Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
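
For the heatmap source named above, a query against GitHub's GraphQL `contributionsCollection` might look like the following. This is a sketch under assumptions: the site's actual query is not shown, `summarize_calendar` is a hypothetical helper, and the sample response is invented to match the API's response shape. No network call is made here; sending the query requires an authenticated POST to GitHub's GraphQL endpoint.

```python
# GraphQL query for a user's contribution calendar (defaults to the past year).
CONTRIBUTIONS_QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar {
        totalContributions
        weeks {
          contributionDays {
            date
            contributionCount
          }
        }
      }
    }
  }
}
"""


def summarize_calendar(response):
    """Reduce a decoded GraphQL response to total contributions and active days."""
    calendar = response["data"]["user"]["contributionsCollection"]["contributionCalendar"]
    days = [day for week in calendar["weeks"] for day in week["contributionDays"]]
    active_days = sum(1 for day in days if day["contributionCount"] > 0)
    return {"total": calendar["totalContributions"], "active_days": active_days}


# Invented sample data in the API's response shape, for illustration only.
sample_response = {
    "data": {"user": {"contributionsCollection": {"contributionCalendar": {
        "totalContributions": 5,
        "weeks": [
            {"contributionDays": [
                {"date": "2026-03-09", "contributionCount": 0},
                {"date": "2026-03-10", "contributionCount": 3},
            ]},
            {"contributionDays": [
                {"date": "2026-03-16", "contributionCount": 2},
            ]},
        ],
    }}}}
}

print(summarize_calendar(sample_response))
# prints: {'total': 5, 'active_days': 2}
```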