You might also be interested in this AMA we held on r/databasedevelopment with two NVMe developers from Samsung.
https://www.reddit.com/r/databasedevelopment/comments/1afpez...
Thanks for hosting (and posting it here). I was reading the "What Modern NVMe Storage Can Do..." paper just yesterday, and this was a great followup.
Something useful to know that wasn't mentioned: SSDs will corrupt data if sitting for extended periods without being powered on and thus should never be used for cold storage.
"There are considerations which should be made if you are planning on shutting down an SSD based system for an extended period. The JEDEC spec for Enterprise SSD drives requires that the drives retain data for a minimum of 3 months at 40C. This means that after 3 months of a system being powered off in an environment that is at 40C or less, there is a potential of data loss and/or drive failures. This power off time limitation is due to the physical characteristics of Flash SSD media's gradual loss of electrical charge over an extended power down period. There is a potential for data loss and/or flash cell characteristic shift, leading to drive failure."
* https://www.ibm.com/support/pages/potential-ssd-data-loss-af...
Relevant: Nintendo 3DS and Switch game carts go bad unless played https://news.ycombinator.com/item?id=39367506
Note that retention increases exponentially with decreasing temperature, so literally "cold storage" might actually be OK for SSDs.
The retention also goes down with erase cycles, and is usually specified after the rated number of cycles have been reached; I expect those same SSDs to hold their data much longer than 3 months if they're still nearly new.
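To put rough numbers on the temperature effect, here is a back-of-the-envelope sketch using the usual Arrhenius acceleration model; the activation energy below is a commonly quoted ballpark for NAND charge loss, not a datasheet value, so treat it as an assumption:

AF = \exp\!\left[ \frac{E_a}{k_B}\left( \frac{1}{T_{\mathrm{cold}}} - \frac{1}{T_{\mathrm{hot}}} \right) \right]

With E_a ≈ 1.1 eV, k_B ≈ 8.62e-5 eV/K, and temperatures in kelvin, going from 40C (313 K) down to 25C (298 K) gives AF ≈ exp(2.05) ≈ 8, i.e. roughly 8x the rated retention time just from sitting at room temperature instead of 40C. Actual cold storage would stretch it much further.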
I still have USB drives that are over a decade old, and their contents are still intact. Then again, those haven't been written to much, and are SLC and older 2-bit MLC.
Those are drives that weren't plugged in for a decade? How do you know the contents are intact?
Probably by plugging it in and eyeballing all the contents. I have a similar experience with an ancient 1GB USB drive.
Did you literally never plug the drive in for over a decade or so, not once? Then you checked the contents for corruption and it was clean?
I ran into this for the first time a couple weeks ago. I tried to boot a system from an SSD that had sat for about 9 months at normal room temperature w/out power. It stumbled badly, repeatedly. I booted a live USB, ran a non-destructive `badblocks` scan on it, and reinstalled the OS. It's been working fine since.
I thought SSDs would last longer than 9 months w/out losing data when not powered.
> Something useful to know that wasn't mentioned: SSDs will corrupt data if sitting for extended periods without being powered on and thus should never be used for cold storage.
I got burnt by this the hard way. Even in the early 2000s, HDDs from the mid-90s were often still bootable after years of sitting in still contemplation. No such longevity for SSDs.
Makes me wonder if those external SSDs made by Samsung and WD/Sandisk take that into account and use different flash memory with better retention when unpowered.
Unlikely in the case of Sandisk, which is known for a recent bout of extremely unreliable external SSDs: https://arstechnica.com/gadgets/2023/08/sandisk-extreme-ssds...
If anything, I would expect consumer SSDs to use the cheapest/lowest grade of flash available.
Small USB flash drives get the worst NAND, followed by memory cards, then consumer SSDs, then enterprise SSDs. "Portable SSDs" that are physically much larger than USB thumb drives usually contain standard consumer SSDs in mSATA or M.2 form factor, plus a bridge chip.
I assume this is the same for all NAND? i.e. flash, SD cards, and USB drives?
I always thought 'flash' was a holdover from early reprogrammable ROM technology, non-volatile memory that you could erase by literally flashing a literal flashbulb over a little window on the chip. I would've sworn in a court of law that I recall this being called "flashable" memory, that erasing it was called "flashing" it, and reprogramming it in general was called "reflashing" in a sort of synecdoche. And I'd have assumed that this was the fundamental origin of what became _electronically_ erasable ROM (EEPROM), which led to all the various NVRAM technologies we have now, with "flashing" sticking as the term for reprogramming it, even after you could do it electronically.
It looks like the story these days is that someone at Toshiba thought up the name out of the blue. I'm skeptical!
> holdover from early reprogrammable ROM technology, non-volatile memory that you could erase by literally flashing a literal flashbulb
No. I can see how that might appear plausible but they're unrelated. Flash was a marketing term invented to differentiate a new EEPROM technology that allowed much higher density, and featured sector erase, from previous EEPROM tech. This was done because engineers saw EEPROM as esoteric expensive tech that had no place in low cost products. The Flash vendors wanted to position the new chips as replacing UVPROM, which was relatively cheap by comparison. So they came up with a name that was a) not EEPROM and b) conveyed that the devices were quickly reprogrammed vs UVPROMs (which had to be baked in an eraser then took some minutes to program in a special piece of equipment).
Flash's big advantage was that it could be programmed on-board, allowing soldered-down PROMs and surface-mount packages.
Source: I was a hardware design engineer when Flash was introduced and heard the marketing pitch first hand.
Using Flash chips for rewritable bulk persistent storage came much later. The first generation devices didn't have the necessary density.
I never heard of using a literal flash to reprogram EPROMs, but this Wikipedia entry[1] makes your story for the origin of the term "flashing" seem likely:
> EPROMs are easily recognizable by the transparent fused quartz (or on later models resin) window on the top of the package, through which the silicon chip is visible, and which permits exposure to ultraviolet light during erasing.
It does look like all the references I can find point to engineers at Toshiba in 1980 coining the name, although Google ngrams shows some references to "flash EPROM" prior to 1980, so I can't help but wonder if the idea existed at least in some form prior to 1980
Looking into it, those references look miscategorized. It's some Zambian national report that's talking about 0.18um processes (aka, 180nm) in the same paragraph, which wouldn't have come out until the very late 90s.
Heard from a greybeard that they had a demo night in university (this was probably the early '80s), and one of the demos was some sort of path-finding rover robot. Of course, it ended up being one of those projects that had show-stopping bugs up until a few hours before showtime.
During demo night, it was a big hit, until a stray camera flash got (un)lucky and wiped the microcontroller's EEPROM...
Feels a little apocryphal (I'd assume most flashes have/had UV filters?)
This still happens today. Some chip packaging doesn't provide enough protection from UV.
You would have put some tape over the window to keep that from happening.
Also, erasing took like 15+ minutes; it wasn't quick.
These things were used in a number of computers from the early 90s, I have a few lying around for my Amiga.
EPROMs were erased by many minutes of exposure to UV light, not a flashbulb, in a device called a UV eraser. I've never heard anyone refer to any operation on an EPROM as "flashing".
Yep, "UV eraser" confirmed by an elmer I know. A unit with a drawer and a timer knob; he says you'd typically set it for 30 minutes.
He says he does recall people calling it "flashing," but not until much later, by which time it would have been actual "flash" memory.
As for my own memory, I'm going to file this under Mandela Effect, cross-referenced under Things People Probably Told Me That I've Believed Since Before The Internet Was Available To Fact-Check!
Note that the Internet isn't always right, and sometimes the dead trees are the only reliable source.
The Mandela Effect is just a name we've come up with for identifying biological memory faults.
Still own and use a little 9V plastic “portable” UV EPROM eraser that I’ve had for 30 years. Never heard the term “flashing” in this use even when I used them commercially. The term really makes no sense here; it’s a steady gas-discharge lamp running for 10 to 20 minutes.
The process of programming an EPROM was colloquially called “burning”, whereas for Flash the programming is called “flashing”, even though the characteristic the name originally refers to is fast erase speed.
I always thought that "flash" memory was named for EEPROM's erase speed compared to EPROM erase speed (less than a second versus 20+ minutes).
EPROM has erase as the first word in the acronym, so everyone just said "erasing" the PROM when they put the chip in the UV eraser.
Wikipedia lists finer differentiation between flash memory and EEPROMs.
Programming the windowed EPROMs was called burning. Don't ever recall hearing anything about flashing them for programming or erasure.
Yeah, exactly. 'Burning' itself being a holdover from PROMs, where you'd literally burn certain fuses in the array to select bits.
/pedantic
ROM - Read Only Memory - programmed at the time of manufacture.
EPROM - Erasable Programmable Read Only Memory (AKA UVEPROM because they could be erased using UV light over a period of time, and they had a quartz window to admit the UV.)
EEPROM - Electrically Erasable Programmable Read Only Memory. IIRC it required a special device to erase.
It's been a while since I worked with this stuff but I don't ever recall hearing it called flash.
Actually no, it’s a Toshiba trademark for a then-new type of EEPROM that could be erased quickly in blocks rather than having to be set and unset bit by bit.
Intel had an italic stylized “FLASH” logo proudly printed on chips. I’ve come across it once.
Old-school windowed EPROMs need to be erased under a UV-C tube for something like 20 minutes. By comparison a flash memory block erase operation is practically instantaneous.
Anecdotally, this was also what was told to me by (now retired) electronics engineers who worked in automotive embedded systems in the '80s/'90s. The term flash was related to the process of UV exposure to erase memory. This was also humorously explained to me as the beginning of the end for system performance: you didn't need to prove out your system when you could just update your hardware after the fact. In a world of over-the-air flashing, we have come a long way from fixed design elements.
Two wonderful papers that are relevant: 1) https://pages.cs.wisc.edu/~jhe/eurosys17-he.pdf 2) https://www.usenix.org/system/files/hotstorage19-paper-wu-ka...
11 years ago, I did a 3-hour 'Introduction to Solid State Storage' talk at LOPSA-East 2013 that also covered how spinning disks work, if anyone is interested.
https://www.youtube.com/watch?v=G3wf1HMr6b0
The SSD content starts at around 1 hour in: https://youtu.be/G3wf1HMr6b0?si=5kdNeLGafrrU6Gmy&t=3573
Discussed at the time:
Everything I Know About SSDs - https://news.ycombinator.com/item?id=22054600 - Jan 2020 (185 comments)
It's a little hard to understand why it's worth explaining the details if you're going to gloss over the issue of endurance and erase-cycle limits.
If you do very little writing, you have nothing to worry about with SSD endurance, just read disturb.
Do you do very little writing?
There is a very good 5-part explanation from Branch Education:
https://www.youtube.com/playlist?list=PL6rx9p3tbsMuk0jnC-dBd...
I've heard SSDs are more likely to "fail fast".
Can anyone recommend utilities that monitor and warn before SSD failures?
For NVMe, if you get the SMART data with smartmontools/smartctl, you can inspect Percentage Used.
"Percentage Used: Contains a vendor specific estimate of the percentage of life used for the Endurance Group based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the Endurance Group has been consumed, but may not indicate an NVM failure. The value is allowed to exceed 100."
For SATA/SAS SSDs, there is "Media_Wearout_Indicator", which hasn't been a particularly reliable indicator in my experience.
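If you want to automate the NVMe check, here's a minimal sketch. It assumes smartmontools >= 7.0 (for the `-j` JSON output); the JSON field names match what recent smartctl versions emit for NVMe drives, but verify against your own drive's output, and the 80% threshold is just an arbitrary example, not a vendor recommendation:

```python
#!/usr/bin/env python3
"""Rough sketch: warn when an NVMe drive's SMART "Percentage Used" gets high.

Assumes smartmontools >= 7.0 (for `smartctl -j` JSON output) and root access.
The JSON field names below are what recent smartctl versions emit for NVMe
drives, but treat them as assumptions and check your own drive's output first.
"""
import json
import subprocess
import sys

DEVICE = sys.argv[1] if len(sys.argv) > 1 else "/dev/nvme0"
WARN_AT = 80  # arbitrary example threshold, not a vendor recommendation

# -a: all SMART info, -j: JSON output
out = subprocess.run(["smartctl", "-a", "-j", DEVICE],
                     capture_output=True, text=True, check=False)
data = json.loads(out.stdout)

health = data.get("nvme_smart_health_information_log", {})
used = health.get("percentage_used")

if used is None:
    print(f"{DEVICE}: no percentage_used field found (SATA drive or older smartctl?)")
elif used >= WARN_AT:
    print(f"WARNING: {DEVICE} reports Percentage Used = {used}% (>= {WARN_AT}%)")
else:
    print(f"{DEVICE}: Percentage Used = {used}%, looks fine")
```

Run it from cron (as root) against each NVMe device and wire the print up to whatever alerting you already use.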
> if you get the SMART data with smartmontools/smartctl, you can inspect Percentage Used.
CrystalDiskInfo[1] can be used for this purpose over on Windows. Some vendor-provided utilities like Samsung Magician will also provide this data with appropriate drives.
> CrystalDiskInfo[1] can be used for this purpose over on Windows
Only if you want a fancy GUI or need to guide some non-tech person through reading the values to you over the phone/IM.
Otherwise just use win32 port of smartmontools.
My SSDs show SMART attributes, which can be used as a rough indicator of health, but really the only strategy I've found to work well for my peace of mind is to use redundancy.
Concretely, I use ZFS with a zpool with 2 SSDs in a mirror configuration. When one dies, even if it's sudden, I can just swap it out for another one and that's it.
My vulnerability window starts when the first SSD fails and closes when the mirror is rebuilt. If something bad happens to the other SSD during that time, I'm toast and I have to start restoring from backup.
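For what it's worth, a minimal cron-able health check for a setup like this could look like the sketch below. The pool name "tank" and the print-as-alert are placeholders; `zpool list -H -o name,health` is the standard ZFS CLI, but check your platform's man page:

```python
#!/usr/bin/env python3
"""Minimal sketch: cron-able check that a ZFS mirror is still healthy.

The pool name and the "alert" (a print) are placeholders; wire it up to
whatever notification channel you actually use. Assumes the standard
`zpool` command-line tool is installed and in PATH.
"""
import subprocess

POOL = "tank"  # placeholder pool name

# `zpool list -H -o name,health` prints one tab-separated line per pool,
# e.g. "tank\tONLINE", or "tank\tDEGRADED" once a mirror member has failed.
out = subprocess.run(["zpool", "list", "-H", "-o", "name,health"],
                     capture_output=True, text=True, check=True)

for line in out.stdout.splitlines():
    name, health = line.split("\t")
    if name == POOL and health != "ONLINE":
        # The vulnerability window is open: one member is gone and the data
        # only lives on the surviving SSD until the mirror is resilvered.
        print(f"ALERT: pool {name} is {health} -- replace the failed SSD and resilver")
```

For a quick manual check, `zpool status -x` prints "all pools are healthy" when nothing is wrong.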
Did you stagger the power-on times? Otherwise you could get tightly correlated failures.
They are about 25 hours apart, which isn't a very large gap, I'll admit.
Thankfully, the serial numbers aren't too close to each other, so I'm hoping they aren't part of the same batch.
In my experience with enterprise SSDs (which, yeah, aren't the same, but that's what I have to offer), SSDs with sequential serial numbers and identical on-times, in the same RAID array, can have wildly different actual endurance. Some storage servers I used to admin had SSDs that outlasted two replacements of neighboring drives from the same original box, and this happened at least twice.
I stopped being worried about on-times after that for SSDs. HDDs are still quite correlated (on the order of months) but if you're building the server you have to put the disks in it at some point.
One surprising feature of "enterprise" class drives is that they often fail faster and harder than their consumer-class counterparts. The idea is that an enterprise-class drive will be found in a drive array, and the array will be much happier with a hard, fast, final failure than with a drive that tries to limp along, while the consumer with their single drive is a lot happier when their failing drive gives them every chance to get the data off.
Oftentimes you'll find that consumer drives that are limping along do fail SMART checks, and would absolutely flash a red light and send out a monitoring alert were they in an enterprise enclosure. While there is probably truth to what you're saying, I think the enterprise is also just way more proactive at testing drives and calling them dead the instant the SMART checks fail, whereas consumers don't typically use CrystalDiskInfo.