ncduy0303 · 51.6/100 — Rate My GitHub

01 · Roasts

Notebook Hoarder

85% of your codebase is Jupyter Notebooks. That's not a portfolio, that's a homework pile with a GitHub account attached. At least rename the cells.

One-and-Done Committer

TokenizerStats: one commit, lifetime under 60 seconds. datacamp: created and abandoned in 29 minutes. You're speedrunning the 'push and disappear' achievement.

Star-Starved Researcher

499 total stars across 50 repos sounds decent until you realize it averages to ~10 per repo and your FYP — your biggest project — has exactly zero. The work is there; the audience isn't.

Half-Life Problem

47% of your repos haven't been touched in over 2 years. Your GitHub is part active lab, part archaeological dig site.

Solo Artist, No Label

soloPct = 100%. Every single repo, just you, alone, in the dark. 35 PRs a year to other projects but nobody PRs back — you're contributing to the world but not inviting the world in.

02 · Category breakdown

Impact · 25% weight · 36 (F)
Consistency · 20% weight · 60 (C)
Quality · 20% weight · 42 (D)
Depth · 15% weight · 55 (D)
Breadth · 10% weight · 65 (C)
Community · 10% weight · 50 (D)

03 · Stats

365-day commit heatmap

167 active days

Language distribution

7 langs

Jupyter Notebook · 85%
C++ · 4%
Python · 4%
HTML · 3%
Julia · 1%
Go · 1%
Other · 2%

04 · Numbers

Owned repos (non-fork)
Commits (last 12 months) · 253
Followers · 109
Joined GitHub · Jun 2017

05 · Top repos

ncduy0303 / molecule-tokenization

40/100

FYP research project benchmarking multiple molecule tokenization methods (SMIRK, BPE, APE, fragSMILES, t-SMILES) for MLM pretraining via HuggingFace Trainer, with downstream classification finetuning on MoleculeNet datasets. Typed, documented, and architecturally sound for scope (537 KB, ~11k LOC), but experimental in

I 25 · Q 50 · D 45

README

Python · ★ 0 · 3mo ago

ncduy0303 / ncduy0303.github.io

38/100

Personal portfolio and blog site built with Hugo and PaperMod theme. Includes resume CV, experience, achievements, and about pages. CI/CD via GitHub Actions. Minimal impact but well-documented and structured.

I 15 · Q 50 · D 45

README · CI

HTML · ★ 0 · 5mo ago

ncduy0303 / TokenizerStats

30/100

One-shot research code for molecular tokenizer analysis, supporting the paper "Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models." Julia + Python hybrid project with ~508 KB codebase, minimal tests, no CI, and single-day lifetime.

I 15 · Q 50 · D 20

README · Tests

Julia · ★ 0 · 5mo ago

ncduy0303 / ncduy0303

12/100

Personal profile repository with minimal substance — a README-only project serving as a GitHub landing page with links and statistics, no code artifacts or meaningful project content.

I 5 · Q 10 · D 20

README · CI

Unknown · ★ 0 · 3mo ago

ncduy0303 / datacamp

12/100

Single Jupyter notebook Datacamp coursework project with no README, tests, CI, license, or documentation. Created and last pushed same day (2026-01-26). Minimal codebase (~1.2 MB) implementing a basic PyTorch neural network for cybersecurity threat detection.

I 5 · Q 25 · D 5

Jupyter Notebook · ★ 0 · 5mo ago

06 · Timeline

Jun 26, 2017 · Joined GitHub
Nov 3, 2020 · Created ncduy0303
Mar 31, 2023 · Created ncduy0303.github.io: My personal Github Page
Jan 26, 2026 · Created datacamp: A repository to store my work of different Datacamp projects
Feb 6, 2026 · Created molecule-tokenization: FYP Project: Advanced Tokenization Methods for Molecular Foundation Models
Feb 10, 2026 · Created TokenizerStats: Taken from https://pubs.acs.org/doi/10.1021/acs.jcim.5c01856
Apr 21, 2026 · Most recent push to ncduy0303

08 · Rubric

How this score was produced

Overall = Σ (category × weight) + gentle top-end curve

Category · Weight · Score · Contrib.

Raw total · 49.1

Top-end curve · +2.5

Final overall · 51.6
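
As a rough check of the formula above, the weighted sum can be reproduced from the category scores and weights listed in section 02. This is a minimal sketch, not the site's actual code: the names `CATEGORIES`, `raw_total`, and `final_overall` are hypothetical, and the curve bonus is passed in as the reported +2.5 rather than derived, since the curve's formula is not given.

```python
# Category scores and weights (in percent), copied from section 02 of this report.
# All names here are illustrative; they are not taken from the site's code.
CATEGORIES = {
    "Impact": (36, 25),
    "Consistency": (60, 20),
    "Quality": (42, 20),
    "Depth": (55, 15),
    "Breadth": (65, 10),
    "Community": (50, 10),
}


def raw_total(categories):
    """Weighted sum of category scores; weights are integer percentages."""
    if sum(weight for _, weight in categories.values()) != 100:
        raise ValueError("weights must sum to 100%")
    return sum(score * weight for score, weight in categories.values()) / 100


def final_overall(raw, curve_bonus):
    """Add the curve bonus and cap at 100. The bonus is an input, not computed."""
    return min(100.0, raw + curve_bonus)


raw = raw_total(CATEGORIES)
final = final_overall(raw, 2.5)
print(f"raw={raw:.2f} final={final:.2f}")
```

With these inputs the weighted sum comes to 49.15 and the curved total to 51.65, which lines up with the reported 49.1 and 51.6 if the page truncates to one decimal place (an assumption about how the values are displayed).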

Tier thresholds

S · 90–100 · Mass-producing humans
A · 80–89 · Ship machine
B · 70–79 · Solid engineer
C · 60–69 · Getting there
D · 40–59 · README enthusiast
F · 0–39 · GitHub tourist
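
The tier thresholds above amount to a simple lookup by lower bound. The sketch below is one way to express it; `TIERS` and `tier_for` are hypothetical names, and the handling of fractional scores (for example, 89.5 staying in tier A until it reaches 90) is an assumption, since the table only lists whole-number ranges.

```python
# Lower bounds from the tier table, highest first. Names are illustrative.
TIERS = [(90, "S"), (80, "A"), (70, "B"), (60, "C"), (40, "D"), (0, "F")]


def tier_for(score):
    """Return the tier letter for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in TIERS:
        if score >= lower_bound:
            return letter
    return "F"  # not reached for valid input; kept as a safe default


print(tier_for(51.6))
```

Under this reading, the final overall score of 51.6 falls in tier D, and the per-category letters in section 02 (36 as F, 60 as C) follow the same bounds.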

▸ How the pipeline works

01 Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST & GraphQL APIs.
02 Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
03 Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
04 Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
05 Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
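
The floor enforcement in step 05 could look roughly like the sketch below. Everything here is hypothetical: `FloorRule`, `apply_floors`, and the `public_commits` signal name are invented for illustration, and the minimum of 40 is a placeholder, since the page only says such a profile "can't score 30" without stating the actual floor.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FloorRule:
    """A signal-conditional minimum for one category (hypothetical structure)."""
    category: str
    applies: Callable[[Dict[str, float]], bool]
    minimum: float


# Trigger taken from the example in step 05; the minimum of 40 is a placeholder.
RULES = [
    FloorRule("Consistency", lambda s: s.get("public_commits", 0) >= 2000, 40.0),
]


def apply_floors(scores, signals, rules=RULES):
    """Return a copy of scores with each triggered floor enforced."""
    corrected = dict(scores)
    for rule in rules:
        if rule.category in corrected and rule.applies(signals):
            corrected[rule.category] = max(corrected[rule.category], rule.minimum)
    return corrected


print(apply_floors({"Consistency": 30.0}, {"public_commits": 2500}))
```

A floor only ever raises a score: a category already above its minimum, or a profile that does not trigger the rule, passes through unchanged.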

~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.

▸ Data sources & caveats

Heatmap & commit totals: GitHub GraphQL contributionsCollection — covers the last 365 days, includes private repos when the user has opted in (default).
Language %: byte totals across the top 30 owned non-fork repos.
Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
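
The language percentages and the "language entropy" stat mentioned above can be sketched from per-language byte totals. This assumes the entropy is plain Shannon entropy in bits, which the page does not confirm; `language_shares`, `shannon_entropy_bits`, and the byte counts in the example are hypothetical.

```python
import math


def language_shares(byte_totals):
    """Convert per-language byte totals into fractions of the overall total."""
    total = sum(byte_totals.values())
    if total <= 0:
        raise ValueError("no bytes counted")
    return {lang: count / total for lang, count in byte_totals.items()}


def shannon_entropy_bits(shares):
    """Shannon entropy (base 2) of a share distribution; zero shares are skipped."""
    return -sum(p * math.log2(p) for p in shares.values() if p > 0)


# Made-up byte counts chosen only to mirror the 85/4/4/3/1/1/2 split in section 03.
example_bytes = {
    "Jupyter Notebook": 8500, "C++": 400, "Python": 400,
    "HTML": 300, "Julia": 100, "Go": 100, "Other": 200,
}
shares = language_shares(example_bytes)
print(round(shares["Jupyter Notebook"] * 100), "%")
print(shannon_entropy_bits(shares))
```

A profile dominated by one language has low entropy, while an even split across many languages has high entropy, which is presumably why it feeds the Breadth score.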