12

I am looking for the "right" way to dump the contents of audio CDs to hard disk without losing any information like CD identifiers, cue lists, etc...

I am not searching for an all-in-one solution from CD to compressed audio, like abcde for example, because I can't be certain at this time about all the audio formats and data structures that I will need in the future. It is also not necessary that online CD information sources, like CDDB or MusicBrainz, are queried at dump time. The idea is rather to get a full, perfect-quality, lossless (obviously) dump of the CDs, in a set of files that I can post-process as many times as I need, with different parameters and with various existing or future software, for batch-converting part or all of the library into a particular format. I mainly want to avoid having to play the physical disk jockey with well over one thousand CDs more than once.

What would be the optimal set of programs and options to get a binary dump of the whole audio data, as well as cue times, CD-Text data, CD identifiers, etc.; in short, anything that is on the disk?

I have programming skills and writing the necessary scripts to batch-process the contents of the dump is not an issue, as long as we are speaking about linear audio (.wav) and text files.

I am also wondering if it would be better to get whole-CD audio as a single track or individual tracks. I have many live recordings, for which it is probably more useful to have single-track, because it is usually the way I listen to them. Any advice on that would also be appreciated.

So far, I have experimented with cdda2wav and cdrdao, and I found that the following set of commands probably gives me a lot of the data I need:

cdda2wav -D /dev/cdr0 -B
cdda2wav -D /dev/cdr0 -t all -cuefile
cdda2wav -D /dev/cdr0 -J
cd-info -C /dev/cdr0
cdrdao read-cd toc_file

Running all these commands results in a lot of redundant information being dumped, and of course in reading the whole CD more than once. I wasn't able to clearly determine whether the data provided by one of these commands is a strict subset of that provided by another, hence my question.

I use Slackware Linux 15.0 on a desktop with 4 SATA CD drives. In addition to the above, do you think using more than one CD drive to dump up to 4 CDs in parallel (saving time) would result in a higher risk of errors (on scratched media, for example)?

5 Answers

11

To extract as much information as possible from a CD with audio tracks, on a current CD drive, you should use cdrdao with any subchannel information supported by your drive:

cdrdao read-cd --read-raw --read-subchan rw_raw tocfile

You may need to specify a different driver with the --driver option, depending on the drive you have; see the cdrdao README file for details.

This will include CD-TEXT data if your drive supports it. Note that if you want to write a CD with CD-TEXT data, you may need to explicitly enable driver option 0x10 if you’re using the generic-mmc driver. cdrdao has a database of known drives but it might not include the drive you’re using.

If the CD you’re reading isn’t in great condition, or the drive itself isn’t great, you may want to avoid rw_raw to at least have a chance of detecting errors.

In general you should read CDs in disk-at-once mode, not individual tracks; DAO will preserve the original tracks, with whatever gaps were present (if any), along with any extra information at the beginning or end of the CD.
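For batch work, the command above can be wrapped in a small helper. This is only a sketch: the function name `dump_cmd`, the device path `/dev/cdr0` (taken from the question) and the `disc0001` naming scheme are assumptions, and names are expected to contain no spaces. The function prints the cdrdao command rather than running it, so you can check it first; pass `rw` as the third argument to avoid `rw_raw` on poor media.

```shell
#!/bin/sh
# Sketch: print (not run) the cdrdao command for one whole-disc dump.
# $1 = device, $2 = base name for the .bin/.toc pair (no spaces),
# $3 = subchannel mode, rw_raw (default) or rw.
dump_cmd() {
    device=$1
    name=$2
    subchan=${3:-rw_raw}
    printf 'cdrdao read-cd --read-raw --read-subchan %s --device %s --datafile %s.bin %s.toc\n' \
        "$subchan" "$device" "$name" "$name"
}

# Dry run for the first disc in the first drive.
dump_cmd /dev/cdr0 disc0001
```

Once the printed command looks right, `dump_cmd /dev/cdr0 disc0001 | sh` would execute it. Add `--driver` by hand if your drive needs one.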

Stephen Kitt
  • Does the option --read-subchan make a difference on whether the CD-TEXT data will be included into the bitstream or not, or is that completely unrelated ? In such case, where does the CD-TEXT data lie within the resulting data stream (which is basically just a long audio track) ? The documentation of cdrdao only mentions error correction for read-subchan arg. – Patrick Aszody Jun 19 '22 at 11:47
  • They’re unrelated. If your CD drive supports reading CD-TEXT data, it ends up stored in the .toc file. – Stephen Kitt Jun 19 '22 at 13:00
  • Maybe with --driver generic-mmc-raw? This is supposed to do CD-TEXT automatically? – mirabilos Feb 27 '23 at 20:13
9

Wow, takes me back.

So, cdrdao has been around for quite some time, and I do think it's the tool you want to use; specifically¹

album="Nine Inch Nails – Broken"
cdrdao read-cd --read-raw --read-subchan rw_raw --device /dev/cdrom --datafile "${album}.bin" "${album}.toc"

Now, what to do with these two files, aside from burning them again? I honestly don't know. For CD-ROMs, you could use the relatively new raw2iso program to get an ISO image out of your cdrdao raw image. But it's useless for audio disks! Could you extend raw2iso such that it can deal with audio style content? Maybe!

As of now, however, the tool I'm aware of that makes the "most precise" Audio-CD copies (cdrdao) has no image format that allows other uses than re-writing to a disk :(

So, either you do that and work with the copy (which might be attractive for "rescue" purposes), or you will have to read the original twice: once with cdrdao read-cd --read-raw --read-subchan rw_raw, and once with e.g. cdda2wav or, honestly, abcde with FLAC compression. FLAC is lossless but heavily entropy-coded ("compressed"), so your redundant data at least doesn't take as much space, and it's much more useful to have actual audio files for your audio player programs. And, honestly, mass storage is cheap: 150€ gets you two 4TB drives (you're in luck, CD backups and surveillance camera recordings are practically the same thing from a hard drive's view).

Throw a file system with checksums and mirroring built in (ZFS?) on them, and you get some long-term storage if you replace one disk every 4 years or so. 4 TB are roughly 4000 fully-fledged maximum-length audio CD backups; if you have more than 4000 CDs, you might have a problem (worth solving with a tape drive).
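To sanity-check the "roughly 4000 CDs in 4 TB" figure, here is a back-of-the-envelope sketch. It assumes each sector is stored as 2352 bytes of audio plus 96 bytes of raw subchannel data, at 75 sectors per second; the function name `cd_image_bytes` is made up for illustration.

```shell
#!/bin/sh
# Sketch: approximate size of a raw dump with subchannel data.
# Assumption: 2352 audio bytes + 96 subchannel bytes per sector, 75 sectors/s.
cd_image_bytes() {
    minutes=$1
    echo $(( minutes * 60 * 75 * (2352 + 96) ))
}

# An 80-minute disc:
cd_image_bytes 80
```

For 80 minutes this prints 881280000, i.e. a bit under 0.9 GB per full disc, so 4000 maximum-length discs come to about 3.5 TB.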

Or honestly, if you want highly reliable backups, companies do sell that as a service; it tends to get cheaper the longer you're willing to wait to retrieve an image. For example, AWS will make you wait (up to) 12 hrs to get an image out of their cheapest archival storage class, but it will cost you a ridiculous $0.0018 per month to store 1 GB, and 9 ct to then download it. That means that a collection of 250 CDs of live media (that's a wild guess), at roughly 1 GB per CD, will cost $5.40 per year to archive redundantly in a high-reliability data center... and only cost you cents to retrieve an image, should you realize your FLAC audio isn't "complete" enough.
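The yearly figure is simple arithmetic that you can redo for your own collection. A sketch, assuming about 1 GB per CD and using the per-GB monthly price quoted above (check current pricing yourself); the function name `yearly_cost` is hypothetical.

```shell
#!/bin/sh
# Sketch: yearly archival cost = gigabytes * price per GB per month * 12.
# $1 = total gigabytes (assumed ~1 GB per CD), $2 = price per GB per month.
yearly_cost() {
    awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price * 12 }'
}

# 250 CDs at $0.0018 per GB per month:
yearly_cost 250 0.0018
```

This prints 5.40; for the question's "well over one thousand CDs", `yearly_cost 1000 0.0018` gives 21.60.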


¹ Why did I choose Nine Inch Nails – Broken? Because it's an annoying CD with 99 tracks, and a leadout! It also triggers bugs in some CD drives at 21 min 21 seconds sharp, if I remember correctly. It's been a while, as said.

  • Super nice! With that you could use images nearly as complete as those from cdrdao read-cd --read-raw --read-subchan rw_raw directly as image files, nice :) – Marcus Müller Jun 17 '22 at 20:28
  • @A.B thanks, but none of them is the same as the cdrdao rw_raw format (that's OK, it's really a very specific format that you'd never want to use aside from very raw disk archival reasons) – Marcus Müller Jun 17 '22 at 20:41
  • I honestly don't understand! Would that enable you to use that image? – Marcus Müller Jun 17 '22 at 22:55
  • so that means your kernel module can directly use these images? (sorry, I'm being thick here) – Marcus Müller Jun 17 '22 at 22:56
  • yeah but that means it can not currently read these images – Marcus Müller Jun 17 '22 at 23:00
  • but it's a viable way for OP – they say they can code enough, and adding an image parser does not sound absurdly complex. – Marcus Müller Jun 17 '22 at 23:01
  • I think you mean abcde, A Better CD Encoder. https://abcde.einval.com/wiki/. You left out the e. (And yeah, this takes me way back, too. My current and previous Linux desktops don't even have it installed, so well over a decade since I've used it I guess.) – Peter Cordes Jun 18 '22 at 05:43
  • @PeterCordes oops yeah. My current desktop has one, which I quite literally had to dig up, in order to rip some late-1980s CD, which I otherwise would have had a hard time enjoying.... – Marcus Müller Jun 18 '22 at 07:40
  • MPlayer can (or at least could, back in the day, if they didn't remove it) play CD audio tracks directly from bin/cue files. – R.. GitHub STOP HELPING ICE Jun 18 '22 at 18:11
  • Thanks for a wealth of information in this answer. Actually there is a quite easy way to use the raw data from the disk. I found some information at https://forums.freebsd.org/threads/how-to-dump-an-audio-cd-to-iso.18809/ . Using information on this page I am able to do some post-processing, splitting the image into individual track files. To be able to add metadata tags, I need to do an additional pass with cdda2wav -J, though. – Patrick Aszody Jun 19 '22 at 11:55
1

I'm sure there are hundreds of questions about this on the SO network, but I'm too lazy to search for them now. I would suggest you search for them.

This one in particular is on Unix. Unfortunately I had to resort to Windows AccurateRip programs because of the lack of reasonable ones for Linux.

In the past I used cdparanoia with a high paranoia setting. But it is very slow and still not as good as AccurateRip. For good-quality CDs it is fine, though.

I think your best bet, if you are not ready to use a spare Windows installation or a trial one in a VM, is to get the raw CD data as described in Stephen's answer and feed that data to the AccurateRip tools.

Honestly I never bothered to do this as I just wanted to dump my CDs with reasonable quality and forget about this outdated technology.

P.S. AccurateRip also depends on the hardware device you use and its offset measurement. So it is a complicated job if you want bit-by-bit accuracy. My personal take was that I didn't really need that, given I'm not going to hear minor issues with cdparanoia or a reasonable AccurateRip-capable device.

akostadinov
0

I have done this using 2 different tools on Linux.

The first ripping program I use, for CDs in good condition, is called K3b. You can rip a CD losslessly to FLAC, which will preserve titles, or to WAV, which will not. It is essentially a graphical interface for cdrdao. It has been around for a long time and it hasn't been updated, but it is convenient. It will not produce a cue file, or at least I haven't figured out how to make it do that, because I've never needed it. There are a couple of tools that will do that: shnsplit and cuegen.

For CDs in poor condition, I use cd-paranoia. Yes, it can be slow, as akostadinov says. But it will work on CDs that can't be read by cdrdao. A friend of mine lost a CD in his car and brought it to me. It had been underneath the floor mat for several months. It was unreadable. I put cd-paranoia to work on it on a spare laptop, and after a week it had recovered the 8 tracks of music and they sounded very good.

I have ripped and burned a couple of thousand CDs using these two tools and I recommend them as the simplest.

Wastrel
-1

Surely, the old dd command would do the job?

It would certainly dump all information!

Jeremy Boden