01 · Roasts
Notebook Hoarder
85% of your codebase is Jupyter Notebooks. That's not a portfolio, that's a homework pile with a GitHub account attached. At least rename the cells.
One-and-Done Committer
TokenizerStats: one commit, lifetime under 60 seconds. datacamp: created and abandoned in 29 minutes. You're speedrunning the 'push and disappear' achievement.
Star-Starved Researcher
499 total stars across 50 repos sounds decent until you realize it averages to ~10 per repo and your FYP — your biggest project — has exactly zero. The work is there; the audience isn't.
Half-Life Problem
47% of your repos haven't been touched in over 2 years. Your GitHub is part active lab, part archaeological dig site.
Solo Artist, No Label
soloPct = 100%. Every single repo, just you, alone, in the dark. 35 PRs a year to other projects but nobody PRs back — you're contributing to the world but not inviting the world in.
02 · Category breakdown
- Impact: 25% weight, 36 (F)
- Consistency: 20% weight, 60 (C)
- Quality: 20% weight, 42 (D)
- Depth: 15% weight, 55 (D)
- Breadth: 10% weight, 65 (C)
- Community: 10% weight, 50 (D)
03 · Stats
365-day commit heatmap: 167 active days
Language distribution
- Jupyter Notebook: 85%
- C++: 4%
- Python: 4%
- HTML: 3%
- Julia: 1%
- Go: 1%
- Other: 2%
04 · Numbers
- Owned repos (non-fork): 30
- Commits (last 12 months): 253
- Followers: 109
- Joined GitHub: Jun 2017
05 · Top repos
ncduy0303 / molecule-tokenization
FYP research project benchmarking multiple molecule tokenization methods (SMIRK, BPE, APE, fragSMILES, t-SMILES) for MLM pretraining via HuggingFace Trainer, with downstream classification finetuning on MoleculeNet datasets. Typed, documented, and architecturally sound for scope (537 KB, ~11k LOC), but experimental in
ncduy0303 / ncduy0303.github.io
Personal portfolio and blog site built with Hugo and the PaperMod theme. Includes résumé/CV, experience, achievements, and about pages. CI/CD via GitHub Actions. Minimal impact but well-documented and structured.
ncduy0303 / TokenizerStats
One-shot research code for molecular tokenizer analysis, supporting the paper "Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models." Julia + Python hybrid project with a ~508 KB codebase, minimal tests, no CI, and a single-day lifetime.
ncduy0303 / ncduy0303
Personal profile repository with minimal substance: a README-only project serving as a GitHub landing page with links and statistics, with no code artifacts or meaningful project content.
ncduy0303 / datacamp
Single-notebook Datacamp coursework project with no README, tests, CI, license, or documentation. Created and last pushed on the same day (2026-01-26). Minimal codebase (~1.2 MB) implementing a basic PyTorch neural network for cybersecurity threat detection.
06 · Timeline
- Jun 26, 2017: Joined GitHub
- Nov 3, 2020: Created ncduy0303
- Mar 31, 2023: Created ncduy0303.github.io ("My personal Github Page")
- Jan 26, 2026: Created datacamp ("A repository to store my work of different Datacamp projects")
- Feb 6, 2026: Created molecule-tokenization ("FYP Project: Advanced Tokenization Methods for Molecular Foundation Models")
- Feb 10, 2026: Created TokenizerStats ("Taken from https://pubs.acs.org/doi/10.1021/acs.jcim.5c01856")
- Apr 21, 2026: Most recent push to ncduy0303
07 · Compare
08 · Rubric
How this score was produced
Overall = Σ (category × weight) + gentle top-end curve
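As a rough worked example of the weighted sum in that formula, the sketch below plugs in the weights and scores from the category breakdown. It computes only the raw Σ (category × weight) term. The top-end curve is left out because its exact shape is not given here, and the function and variable names are illustrative assumptions rather than the service's actual code.

```python
# Weights and scores copied from the category breakdown above.
WEIGHTS = {
    "Impact": 0.25,
    "Consistency": 0.20,
    "Quality": 0.20,
    "Depth": 0.15,
    "Breadth": 0.10,
    "Community": 0.10,
}
SCORES = {
    "Impact": 36,
    "Consistency": 60,
    "Quality": 42,
    "Depth": 55,
    "Breadth": 65,
    "Community": 50,
}


def raw_overall(scores, weights):
    """Weighted sum of category scores (hypothetical helper).

    The 'gentle top-end curve' is NOT applied; its formula is unspecified.
    """
    # Sanity check: the published weights should add up to 100%.
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(scores[name] * weight for name, weight in weights.items())


raw = raw_overall(SCORES, WEIGHTS)
print(f"raw overall (before curve): {raw:.2f}")
```

With these inputs the raw term works out to about 49.15, well below the ≈ 70 region where the curve is described as centered.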
Tier thresholds
▸ How the pipeline works
- 01 Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST & GraphQL APIs.
- 02 Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
- 03 Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
- 04 Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
- 05 Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.
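The triage and parallel-grading steps, and the ~240-file total, can be pictured with the sketch below. It is only an illustration under stated assumptions: `triage` and `grade_repo` are hypothetical placeholders standing in for the two models, the repo data is synthetic, and a thread pool is just one plausible way to run repos in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

FILES_PER_REPO = 20  # "picks the 20 files per repo" (step 02)
MAX_REPOS = 12       # "top-12 repos" from the summary line


def triage(file_tree):
    """Placeholder for the triage model: simply keeps the first N paths."""
    return file_tree[:FILES_PER_REPO]


def grade_repo(repo):
    """Placeholder for the scoring model: returns empty ratings per repo."""
    picked = triage(repo["files"])
    return {
        "repo": repo["name"],
        "files_read": len(picked),
        # A real scorer would fill these in; None marks "not computed here".
        "impact": None,
        "quality": None,
        "depth": None,
    }


def grade_all(repos):
    """Grade up to MAX_REPOS repos independently and in parallel."""
    with ThreadPoolExecutor(max_workers=MAX_REPOS) as pool:
        return list(pool.map(grade_repo, repos[:MAX_REPOS]))


# Synthetic input: 15 repos with 50 files each.
repos = [
    {"name": f"repo-{i}", "files": [f"file_{j}.py" for j in range(50)]}
    for i in range(15)
]
results = grade_all(repos)
total_files = sum(r["files_read"] for r in results)
print(len(results), "repos graded,", total_files, "files read")
```

Because each repo is graded independently, the per-repo calls need no shared state, which is what makes the parallel step straightforward. The file total is just 12 repos × 20 files = 240.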
▸ Data sources & caveats
- Heatmap & commit totals: GitHub GraphQL contributionsCollection, which covers the last 365 days and includes private repos when the user has opted in (default).
- Language %: byte totals across the top 30 owned non-fork repos.
- Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
- Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
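A signal-conditional floor of the kind described above could look roughly like this. It is a minimal sketch: the rule format, the `public_commits` signal name, and the floor value of 40 are all assumptions made for illustration. Only the 2,000+ commit threshold and the "can't score 30 Consistency" example come from the text.

```python
def apply_floors(scores, signals, floor_rules):
    """Raise category scores to a minimum when a signal condition holds.

    Scores are never lowered; rules that do not match leave them untouched.
    """
    corrected = dict(scores)
    for rule in floor_rules:
        if rule["when"](signals):
            category = rule["category"]
            corrected[category] = max(corrected[category], rule["min_score"])
    return corrected


# Hypothetical rule: the threshold follows the example in the text,
# but the floor value (40) is invented for this sketch.
FLOOR_RULES = [
    {
        "category": "Consistency",
        "when": lambda s: s.get("public_commits", 0) >= 2000,
        "min_score": 40,
    },
]

high_volume = apply_floors({"Consistency": 30}, {"public_commits": 2500}, FLOOR_RULES)
low_volume = apply_floors({"Consistency": 30}, {"public_commits": 253}, FLOOR_RULES)
print(high_volume, low_volume)
```

Keying each rule on a measured signal rather than on an account name is what makes the floor signal-conditional: any profile that meets the condition gets the same minimum.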