Sharp Performance on Apple Silicon: M4 Pro vs M4

Note: Staying true to the paper, I got SHARP running in ~200 ms on an H100. The total process (download, inference, compression, upload) takes about 3 seconds. Read more or reach out to spatial@fncore.com if you're interested. The MacBook Pro and Mac Mini timings below are 2-4 seconds higher than they should be for an implementation identical to Apple's paper; I'll update them as soon as I re-run the benchmarks.

I benchmarked SHARP on my MacBook Pro and a Mac Mini to compare performance against the NVIDIA GPU numbers reported in Apple's paper.

The Results

Testing 9 images on each machine:

Machine       Chip     GPU Cores   Inference   Total Time
MacBook Pro   M4 Pro   16-20       5.56 s      7.04 s
Mac Mini      M4       10          9.28 s      10.79 s

The Mac Mini runs about 1.7x slower than the MacBook Pro, which lines up closely with the GPU core ratio (10 vs. 16-20 cores).
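The ratio above is easy to verify from the table. A quick arithmetic check (the timings are from this post; the 16-20 range covers the two M4 Pro GPU configurations):

```python
# Sanity-check the scaling claim using the inference timings from the table.
m4_pro_inference = 5.56   # MacBook Pro (M4 Pro), seconds
m4_inference = 9.28       # Mac Mini (M4), seconds

time_ratio = m4_inference / m4_pro_inference
core_ratio_low = 16 / 10   # M4 Pro binned to 16 GPU cores
core_ratio_high = 20 / 10  # M4 Pro with the full 20 GPU cores

print(f"time ratio: {time_ratio:.2f}")  # ~1.67
print(f"core ratio: {core_ratio_low:.1f} to {core_ratio_high:.1f}")

# The observed slowdown falls inside the core-count ratio range.
assert core_ratio_low <= round(time_ratio, 1) <= core_ratio_high
```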

CoreML vs MPS

I also tested CoreML as an alternative to PyTorch's MPS backend on the MacBook Pro:

Backend         Inference   Speedup
MPS (PyTorch)   5.56 s      baseline
CoreML          4.04 s      1.4x faster

CoreML delivers a 1.4x speedup over the standard PyTorch MPS backend on the same hardware.
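For reference, the usual route from a PyTorch model to Core ML goes through coremltools: trace the module, then convert it. This is a minimal sketch, not SHARP's actual pipeline; the function name and input shape are placeholders, and the imports are guarded so the snippet degrades gracefully where torch or coremltools is unavailable.

```python
# Hedged sketch: converting a traced PyTorch module to Core ML.
# The model, input shape, and function name here are illustrative only.
try:
    import torch
    import coremltools as ct
    HAVE_DEPS = True
except ImportError:  # torch / coremltools not installed
    HAVE_DEPS = False

def convert_to_coreml(model, example_input):
    """Trace a PyTorch module and convert it to a Core ML model."""
    traced = torch.jit.trace(model.eval(), example_input)
    return ct.convert(
        traced,
        inputs=[ct.TensorType(name="image", shape=example_input.shape)],
        compute_units=ct.ComputeUnit.ALL,  # let Core ML use CPU, GPU, and ANE
    )

# Usage (macOS, with both packages installed):
#   net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
#   mlmodel = convert_to_coreml(net, torch.randn(1, 3, 64, 64))
#   mlmodel.save("model.mlpackage")
```

Letting `compute_units` default to `ALL` is likely where the speedup comes from: Core ML can schedule work onto the Neural Engine, which PyTorch's MPS backend cannot reach.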

Takeaway

On Apple Silicon, inference speed scales roughly linearly with GPU core count. The M4 Pro's extra cores translate directly into faster 3D processing, and CoreML can squeeze out roughly another 40% of throughput on top of that.