01 · Roasts
One Domain, One Language, One Destiny
Python + MIMIC-IV + FastAPI on every single repo. You've found your niche so hard that your GitHub looks like a clinical trial with five identical arms. Throw in a JavaScript side-project before you become a single-point-of-failure.
940 Commits, Zero Test Suites
You pushed 940 commits this year and not one repo has HAS_TESTS=yes in a standard location. You're deploying BioClinicalBERT to Google Cloud Run on vibes and GitHub Actions prayers.
7 Stars, 0 Forks, Live on Three Cloud Platforms
Cloud Run, Render, deployed APIs, Swagger UIs — and the entire internet has collectively starred your work 7 times. You're shipping to prod for an audience of ghosts.
Type Hints for Pydantic, None for Your Own Code
You use Pydantic's Field constraints and FastAPI typed endpoints, yet every repo scores TYPED=no. The framework types your inputs; you type nothing yourself. Physician, annotate thyself.
Joined July 2025, Already Outpacing Most
Account is under a year old and you've already hit 940 commits, 4 deployed systems, and a GCP production endpoint. The trajectory is real — the visibility isn't. Yet.
02 · Category breakdown
- Impact: 25% weight, score 48 (D)
- Consistency: 20% weight, score 65 (C)
- Quality: 20% weight, score 62 (C)
- Depth: 15% weight, score 70 (B)
- Breadth: 10% weight, score 40 (D)
- Community: 10% weight, score 25 (F)
03 · Stats
365-day commit heatmap: 258 active days
Language distribution
- Python: 64%
- Jupyter Notebook: 36%
- Dockerfile: 0%
04 · Numbers
- Owned repos (non-fork): 5
- Commits (last 12 months): 940
- Followers: 5
- Joined GitHub: Jul 2025
05 · Top repos
SimonYip22/Rule-Based-Clinical-Symptom-Checker
Rule-based clinical symptom classifier with deterministic weighted scoring, dual CLI/API architecture, comprehensive domain documentation, automated tests and CI/CD. Typed language absent, but mature codebase demonstrates substantial structured work and domain modeling.
SimonYip22/Clinical-Entity-Extraction-Validation-System
Clinical NLP extraction-validation system built on BioClinicalBERT + rule-based regex for ICU notes. Typed Python, structured src/, comprehensive docs, CI/CD, but minimal adoption (2 stars, recent research project).
SimonYip22/NEWS2-Early-Warning-Monitoring-System
Python clinical monitoring system implementing NEWS2 scoring with FastAPI backend, CSV persistence, and dual CLI/API interfaces. Well-documented architecture with automated CI, typed inputs, but lacks production test coverage and formal license despite functional scope.
SimonYip22/Time-Series-ICU-Patient-Deterioration-Predictor
Specialized medical ML system using LightGBM and TCN for ICU patient risk prediction. Demonstrates substantial effort with structured pipelines, mixed documentation quality, and limited external adoption (2 stars, no external PRs or product reference).
SimonYip22/SimonYip22
Personal portfolio README showcasing clinical ML work with links to external projects; no actual code in this repo. Profile aggregator rather than functional software.
06 · Timeline
- Jul 30, 2025: Joined GitHub
- Aug 19, 2025: Created Rule-Based-Clinical-Symptom-Checker. Deterministic clinical inference engine mapping patient-reported symptoms to ranked differential diagnoses, with matched symptoms and structured management guidance. Includes input…
- Aug 22, 2025: Created NEWS2-Early-Warning-Monitoring-System. Clinical vitals monitoring system implementing real-time NEWS2 calculation from input patient vitals, tiered risk alert generation, temporal tracking with trend visualisations, and…
- Sep 3, 2025: Created SimonYip22
- Sep 11, 2025: Created Time-Series-ICU-Patient-Deterioration-Predictor. Early ICU deterioration detection system combining LightGBM and a Temporal Convolutional Network (TCN) for multi-dimensional clinical risk modeling
- Nov 28, 2025: Created Clinical-Entity-Extraction-Validation-System. Precision-first clinical NLP extraction-validation system converting ICU progress notes to structured entity outputs using rule-based regex extraction and BioClinicalBERT encoder v…
- May 15, 2026: Most recent push to SimonYip22
08 · Rubric
How this score was produced
Overall = Σ (category × weight) + gentle top-end curve
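As a rough worked example of the weighted-sum part of this formula, the sketch below plugs in the six category scores and weights shown in the category breakdown. The variable and function names are illustrative, and the top-end curve is deliberately left out because its exact form is not given here, so the result is only the raw pre-curve figure.

```python
# Category scores and weights copied from the category breakdown.
# Names below (CATEGORIES, raw_overall) are illustrative, not the site's code.
CATEGORIES = {
    "Impact": (48, 0.25),
    "Consistency": (65, 0.20),
    "Quality": (62, 0.20),
    "Depth": (70, 0.15),
    "Breadth": (40, 0.10),
    "Community": (25, 0.10),
}


def raw_overall(categories):
    """Return the sum of score * weight; the top-end curve is NOT applied."""
    return sum(score * weight for score, weight in categories.values())


total_weight = sum(weight for _, weight in CATEGORIES.values())
raw = raw_overall(CATEGORIES)

print(f"weights sum to {total_weight:.2f}")
print(f"raw overall (before curve) = {raw:.1f}")
```

With these inputs the weights add up to 1.00 and the raw weighted sum comes to about 54.4. The published overall score may differ once the curve and any anchor corrections are applied.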
▸ How the pipeline works
- 01 Scrape. Pull every non-fork repo pushed in the last 90 days, plus your contribution calendar, followers, and language byte counts, straight from GitHub's REST & GraphQL APIs.
- 02 Triage. A small model reads every repo's file tree + README and picks the 20 files per repo that actually reveal how you code.
- 03 Grade each repo. All repos run in parallel through a fast scoring model that reads the picked files and rates each one independently on Impact, Quality, and Depth, with evidence citations.
- 04 Aggregate. A larger reasoning model combines the per-repo scores with server-computed stats (heatmap, commit cadence, language entropy, follower count) to produce the 6-dimension profile score + roasts.
- 05 Correct. Deterministic server-side checks enforce anchor-scale floors (e.g. a profile with 2,000+ public commits can't score 30 Consistency) and recompute the final verdict.
~90 seconds per profile, ~$0.25 in compute. Total of ~240 files read across your top-12 repos. One rating per GitHub account per day.
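Step 04 lists language entropy among the server-computed stats without saying how it is calculated. One plausible reading is Shannon entropy over the language shares. The sketch below assumes that definition and uses the rounded percentages from the language distribution rather than raw byte counts, so treat it as an illustration and not as the pipeline's actual code.

```python
import math


def shannon_entropy_bits(shares):
    """Shannon entropy (in bits) of a mapping of language -> share.

    Shares may be percentages or byte counts; they are normalised first.
    Zero-share languages contribute nothing.
    """
    total = sum(shares.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for share in shares.values():
        if share > 0:
            p = share / total
            entropy -= p * math.log2(p)
    return entropy


# Rounded percentages from the language distribution (an approximation
# of the byte totals the pipeline says it uses).
language_shares = {"Python": 64, "Jupyter Notebook": 36, "Dockerfile": 0}

entropy = shannon_entropy_bits(language_shares)
print(f"language entropy = {entropy:.3f} bits")
```

For this profile the value is a little under 1 bit, which is what a near two-way split between Python and Jupyter Notebook would give. A profile spread evenly over many languages would score higher.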
▸ Data sources & caveats
- Heatmap & commit totals: GitHub GraphQL contributionsCollection, which covers the last 365 days and includes private repos when the user has opted in (default).
- Language %: byte totals across the top 30 owned non-fork repos.
- Curve: a small upward nudge centered on raw score ≈ 70, capping at 100. Prevents specialists from being unfairly penalised for narrow breadth.
- Anchor corrections: when server-measured signals (e.g. privateWorkLikely, multiRepoVolume, follower count) mandate a minimum category score, the aggregation step enforces it. These are signal-conditional, not identity-based floors.
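The anchor corrections described above amount to a signal-conditional minimum: if a measured signal crosses a threshold, the category score is raised to a floor. Here is a minimal sketch of that idea. The `FloorRule` type, the `public_commits` signal name, and the floor value of 50 are all hypothetical, since the only fact given is that 2,000+ public commits rules out a Consistency score of 30.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorRule:
    """Hypothetical rule: if signal >= threshold, category score >= floor."""
    category: str
    signal: str
    threshold: float
    floor: int


# Assumed values for illustration only; the real thresholds and floors
# are not published in this report.
RULES = [FloorRule("Consistency", "public_commits", 2000, 50)]


def apply_floors(scores, signals, rules):
    """Return a copy of scores with every triggered floor enforced."""
    corrected = dict(scores)
    for rule in rules:
        triggered = signals.get(rule.signal, 0) >= rule.threshold
        if triggered and rule.category in corrected:
            corrected[rule.category] = max(corrected[rule.category], rule.floor)
    return corrected


# A 2,100-commit profile scored 30 gets lifted to the assumed floor.
print(apply_floors({"Consistency": 30}, {"public_commits": 2100}, RULES))
# This profile (940 commits) does not trigger the rule, so 65 is unchanged.
print(apply_floors({"Consistency": 65}, {"public_commits": 940}, RULES))
```

Because the rule keys on a measured signal and not on who the account belongs to, it matches the "signal-conditional, not identity-based" description. Scores already above the floor are never lowered.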