Copyright:
----------

Copyright 2009 by Zygo Blaxell

Disclaimer:
-----------

There is no warranty, not even an implied one.

Note to potential future employers:
-----------------------------------

This code is more than 10 years old, and downright fugly in places.
If asked directly, I will disavow any knowledge of having written any
of this code.  Especially the shell scripts.  ;-)

Dependencies (Debian packages):
-------------------------------

    awk
    bplay
    gnuplot
    mp3info
    mplayer
    perl
    sox
    tclsh
    vorbis-tools

Contents:
---------

There are currently two versions of the code in this repository:

    src     - the HEAD revision of the code I found in my CVS repository

    scripts - the version of the code I found in actual use on my laptop

Generally where there are differences between the two, the version in
'scripts' is more likely to work.

Currently some of the C code doesn't compile.  GCC was more forgiving
in the 90's than it is now.

Usage:
------

WARNING:  I haven't actually tried these yet, these are from memory:

How to compare mp3 (and ogg and maybe even wma) files in N easy steps:

1.  Use mp3sumtab to generate a table of signatures for all your files.
    Something like this:

        cd $HOME/mp3-collection &&
        find -type f -print0 | xargs -0 mp3sumtab > summary

2.  Once you have the signature table, process it with mp3sumcmp to
    produce a sorted distance list:

        mp3sumcmp $threshold $percentile < summary | sort -n > sorted

    I no longer remember what values make sense for $threshold or
    $percentile.  I'd guess 4000 and 90, respectively.

    $threshold is the maximum size delta that will make it to the
    output file, probably in the range 0..65535.  This is used to
    filter the output so that obviously different pairs of files are
    ignored.

    $percentile is the percentile of the difference values that is
    used when comparing two files.  50 is the median, 0 is the lowest
    difference value and (100-epsilon) is the highest.
    So if you said '90' for percentile, the comparison function would
    return a value for each pair of mp3 files such that 90% of the
    difference values between the files fall below the number returned.

    mp3sumcmp never bothers to compare pairs of files that are not
    within 10% of the same size.  This is currently hardcoded.

3.  Listen to similar files, and choose which to keep and which to
    discard.

    The sorted distance list has three fields per line (in Tcl list
    syntax):

        dv name1 name2

    where 'name1' and 'name2' are the names of two files, and 'dv' is
    what mp3sumcmp thinks is the distance between them.

    So now you just read pairs of files in sorted order, listen to
    them both, and decide which one you want to keep.  To help you
    with this, there is a script named 'mp3sieve':

        mp3sieve file1 file2 [file3...fileN]

    mp3sieve plays each pair of files using a script named
    'mp3compare' or 'mp3compare-seekable', and prompts for a response.

    'mp3compare' will attempt (often successfully) to downmix both
    input files to mono, then play one on each speaker in stereo.
    It's a really good idea to use headphones while you're doing this,
    partly because it ensures you get accurate channel mapping, but
    mostly because anyone who happens to be within earshot will want
    to kill you if you don't.

    The responses you can give to mp3sieve are:

        l  - trash the file played on the left channel.

        r  - trash the file played on the right channel.

        L  - trash the left file and rename the right file to the left

        R  - trash the right file and rename the left file to the right

        lr - trash both files (I used this surprisingly often on my
             mp3 collection!)

        rl - alias for lr

        q  - quit

        a  - play files again

        n  - restart outer loop at next file

        s  - switch to mp3compare-seekable instead of mp3compare.
             Currently there's no way to switch back.

    MAKE SURE YOU HAVE CORRECT STEREO CHANNEL MAPPINGS before you
    start deleting files!
    Copy a couple of obviously different files to 'left.mp3' and
    'right.mp3', then run 'mp3sieve left.mp3 right.mp3' and make sure
    it does what you expect.

    'mp3compare' plays two files at the same time.  Playback starts
    immediately, but the playback is not seekable.  This is the
    default used by 'mp3sieve'.

    'mp3compare-seekable' decodes both files to a temporary file, then
    plays the temporary file.  Playback starts only once both files
    are decoded, but the playback is seekable (i.e. mplayer left and
    right arrow keys work).

4.  Enjoy the miscellaneous utilities:

        mp3diffplot $left $right - plot the difference between two
        mp3 files.

        mp3compare $left $right - play two mp3 files at once, the
        first in the left channel, the second in the right channel.

        mp3erapmoc $left $right - play two mp3 files at once, the
        average of the two in the left channel, the difference in the
        right channel.

        mp3autosieve $regexp - run mp3sieve on all files in the
        current directory (or any subdirectory thereof) whose names
        match $regexp.
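
For anyone trying to get a feel for what $threshold and $percentile do
in step 2, here is a rough Python sketch of the comparison as described
there.  It is NOT the code in mp3sumcmp and was not derived from it.
The signature format (a plain list of numbers per file), the function
names, the exact meaning of "within 10% of the same size", and the
assumption that $threshold is applied to the resulting distance value
are all guesses made up for illustration.

```python
def percentile_value(diffs, percentile):
    """Pick the difference value at `percentile` (0 <= percentile < 100).

    0 gives the lowest value, 50 the median, and anything just under
    100 the highest, matching the description in step 2.
    """
    if not diffs:
        raise ValueError("no difference values")
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    ordered = sorted(diffs)
    index = int(len(ordered) * percentile / 100)
    return ordered[index]


def similar_size(size_a, size_b, tolerance=0.10):
    """Assumed reading of 'within 10% of the same size'."""
    larger = max(size_a, size_b)
    return larger > 0 and abs(size_a - size_b) <= tolerance * larger


def compare(sig_a, sig_b, size_a, size_b, threshold, percentile):
    """Return a distance value, or None if the pair would be filtered out.

    sig_a and sig_b are hypothetical signatures: equal-length lists of
    numbers, one list per file.
    """
    if not similar_size(size_a, size_b):
        return None  # sizes too different, pair is never compared
    diffs = [abs(a - b) for a, b in zip(sig_a, sig_b)]
    dv = percentile_value(diffs, percentile)
    # Assumption: pairs whose distance exceeds the threshold are dropped.
    return dv if dv <= threshold else None


if __name__ == "__main__":
    # Differences are [2, 0, 5, 1]; the median (percentile 50) is 2.
    print(compare([10, 20, 30, 40], [12, 20, 35, 41], 100, 98, 4000, 50))
```

With a percentile of 90 and ten difference values, this picks the
largest one, so nine of the ten values (90%) fall below the number
returned.  A lower percentile makes the comparison more tolerant of a
few badly mismatched spots in otherwise similar files.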