0. TLDR: SCARCE-CXR
Self-supervised Cross-domain Adaptation for Rare Clinical Entities in Chest X-Rays.
1. Motivation for this Guide
What Do You Do When You Only Have 23 Labeled X-Rays?
2. Pretraining Setup for Robustness and Invariance
Optimizing X-Ray Training Per Dollar: RAM Caches, ResNets, and Augmentations
3. How SSL Works: MoCo + BarlowTwins (And How to Fix Collapse)
Contrastive Learning is Lazy. Why MoCo Collapsed and How VICReg Fixed It.
4. How SSL Works: DINO + SparK (And Dealing with Plateauing)
Stop Blindly Applying ImageNet SSL to Medical Data. What DINO and SparK Taught Me.
5. Probing and Finetuning
340GB of Manual Downloads, Disease Selection, Domain Gaps, and Probing vs. Finetuning with Grad-CAM Considerations.
6. Analysis + Future Directions
MoCo v2 vs. BarlowTwins vs. SparK. Results on Medical SSL (With Grad-CAMs).
0. TLDR: Ryan-GPT
Building distributed training primitives from scratch on a single consumer GPU.
1. Why Single-GPU Training Doesn't Scale
Understanding the constraints before building distributed systems.
2. Data Parallelism as the First Scaling Primitive
Explicit gradient synchronization and the cost of communication.
3. Bucketing and Sharding as the Second and Third Scaling Primitives
Reducing communication overhead and memory usage.
4. Tensor Parallelism as the Fourth Scaling Primitive
Splitting individual layers across GPUs.
5. Training Validation on a Single GPU
Distributed code without distributed hardware?
6. Test Results and Future Directions
Crunching the numbers and what comes next.