Resurrecting a flash card with f3 and dd
A 128 GB SD card I'd been using for homelab image staging started throwing the classic half-dead-flash cocktail into dmesg:
mmcblk0: error -110 requesting status
blk_update_request: I/O error, dev mmcblk0, sector 7340032
Buffer I/O error on dev mmcblk0p1, logical block ...
The card was ghosting the host mid-write. Could be worn cells, could be counterfeit flash, could be a corrupted FAT from an ungraceful eject. Symptoms alone are ambiguous. That's exactly what f3 is for.
Step 1 — accuse the card of fraud with f3
f3 stands for Fight Flash Fraud. f3write fills the mounted filesystem with deterministic 1 GB files; f3read reads them back and compares. Anything that lies about its capacity, silently corrupts data, or wraps writes around is caught.
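The core idea is just a round-trip check. A minimal sketch of it, run against a plain temp file so it's safe to try anywhere — the pattern and the 1 MiB size are illustrative, not what f3 itself uses (f3 really writes 1 GB files of a deterministic pseudo-random sequence, then regenerates the sequence on read and compares):

```shell
# Hypothetical miniature of f3's write-then-verify round trip.
target=$(mktemp)

# Write a pattern we can regenerate byte-for-byte later.
yes 'deterministic test pattern' | head -c 1048576 > "$target"

# Regenerate the same stream and compare it to what the medium stored.
# Healthy, honest flash round-trips every byte; wrapped-around or
# silently corrupting flash does not.
if yes 'deterministic test pattern' | head -c 1048576 | cmp -s - "$target"; then
    result="Data OK"
else
    result="Corrupted"
fi
echo "$result"
rm -f "$target"
```

Same verdict logic, minus the part where f3 does this across the entire advertised capacity — which is the part that catches counterfeits.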
Mount the card and point both tools at the mount point:
f3write /mnt/sdcard/
f3read /mnt/sdcard/
Output looked roughly like this:
Data OK: 119.84 GB (251339776 sectors)
Data LOST: 0.00 Byte (0 sectors)
Corrupted: 0.00 Byte (0 sectors)
Slightly changed: 0.00 Byte (0 sectors)
Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 42.11 MB/s
Verdict: not counterfeit. Real capacity matched advertised capacity. The dmesg errors were the controller tripping on specific bad physical blocks.
If the report had shown gigabytes of Data LOST, the card would be fake and the only "fix" would be to cap the usable region using f3probe and partition only the honest part of the device. Different story for a different day.
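For completeness, that counterfeit path would look roughly like this — the device path and the last-sec value below are hypothetical; f3probe prints the real number for your card and suggests the exact f3fix invocation:

```shell
# Destructive probe of the raw device (wipes data), then cap the
# partition to the honest region. Commands shown as comments because
# they need a real device; the numbers are made up for illustration.
#
#   sudo f3probe --destructive --time-ops /dev/mmcblk0
#   sudo f3fix --last-sec=60555263 /dev/mmcblk0
#
# Sanity-check what that last sector buys you: sectors are 512 bytes,
# so a hypothetical last-sec of 60555263 is about 28.9 GiB of honest
# flash on a card sold as 128 GB.
last_sec=60555263
usable_bytes=$(( (last_sec + 1) * 512 ))
echo "$usable_bytes"   # 31004295168 bytes ≈ 28.9 GiB
```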
Step 2 — force the controller to remap
Modern flash has an FTL (flash translation layer) controller that can retire bad blocks and substitute spares from a reserved pool — but only if you actually write to every address and give it the chance to notice. A mostly-read workload lets marginal blocks fester until they time out during a critical write. The cure is to overwrite the entire device and let the controller do its job.
Unmount first. Then work on the raw block device, not a partition:
sudo umount /dev/mmcblk0p1
sudo dd if=/dev/zero of=/dev/mmcblk0 bs=4M status=progress conv=fsync
Two things that matter here, and that beginners consistently get wrong. First, of=/dev/mmcblk0, not mmcblk0p1 — you want to wipe the device including its partition table, not just one filesystem. Second, bs=4M matches typical flash erase-block granularity far better than the default 512 B, which would take until the heat death of the universe.
For a modern card that honors TRIM, blkdiscard /dev/mmcblk0 is cleaner and faster than dd if=/dev/zero, because it tells the controller "all of this is free, do as you will" instead of dragging literal zeros across every cell. Try blkdiscard first and fall back to dd if the controller silently ignores discard (many cheap SD cards do).
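That fallback chain fits in one line of shell. A sketch, with a hypothetical wipe() helper (not a standard tool) demonstrated on a regular file — blkdiscard refuses to touch a non-block-device, which conveniently mimics a controller that rejects discard, so the dd branch kicks in. The helper takes an explicit byte count so the demo terminates; against a real block device you would drop count/iflag and let dd stop at the end of the device:

```shell
# Try TRIM first; if the discard path fails, drag zeros instead.
wipe() {
    blkdiscard "$1" 2>/dev/null ||
        dd if=/dev/zero of="$1" bs=4M count="$2" iflag=count_bytes \
           conv=fsync status=none
}

# Stand-in for the card: 1 MiB of random junk in a temp file.
fake_dev=$(mktemp)
head -c 1048576 /dev/urandom > "$fake_dev"
wipe "$fake_dev" 1048576

# Every byte should now be zero.
nonzero=$(tr -d '\0' < "$fake_dev" | wc -c)
echo "non-zero bytes left: $nonzero"
rm -f "$fake_dev"
```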
During the dd run I kept dmesg -w open in another pane. Two more -110 timeouts surfaced, then silence. The controller had hit the bad regions, remapped them to spare area, and moved on.
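If you only care about the one device, filter the stream — the real invocation is `sudo dmesg -w | grep mmcblk0`. The sample input below is canned (borrowed from the log at the top of this post, plus one unrelated line) so the filter can be shown without a live card:

```shell
# Three sample kernel lines; two mention the card, one doesn't.
log='mmcblk0: error -110 requesting status
blk_update_request: I/O error, dev mmcblk0, sector 7340032
usb 1-1: new high-speed USB device'

# grep -c counts matching lines, same filter you'd bolt onto dmesg -w.
hits=$(printf '%s\n' "$log" | grep -c 'mmcblk0')
echo "$hits"   # 2
```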
Step 3 — repartition and verify
Fresh GPT, fresh ext4, remount, rerun f3:
sudo parted /dev/mmcblk0 mklabel gpt
sudo parted /dev/mmcblk0 mkpart primary ext4 1MiB 100%
sudo mkfs.ext4 /dev/mmcblk0p1
sudo mount /dev/mmcblk0p1 /mnt/sdcard
f3write /mnt/sdcard/ && f3read /mnt/sdcard/
Second pass: zero errors, full capacity, no dmesg noise. The card has been back in rotation for two months carrying Proxmox ISOs and PXE images without complaint.
Why this actually works (and when it doesn't)
f3 diagnoses by brute force: if the device can't round-trip a byte, it's lying or dying, and no amount of smart-sounding SMART output will save it. dd (or blkdiscard) repairs by coercion: the FTL is a reactive creature; it won't remap blocks it doesn't know are bad, and it doesn't know until you write to them.
This recipe works when the card is genuine (f3 confirms real capacity), when the spare-block pool isn't already exhausted, and when the controller itself is still alive. It does not work when the controller is dead — dd will fail or hang on every sector, at which point replace the card — or when the card is counterfeit, because no amount of zeroing will manifest capacity that was never there. It also doesn't apply to full-fat SSDs reporting SMART failures; those have already spent their internal remap budget and are telling you, in standardized vocabulary, that they're finished.