Download, slice and dice podcasts on Linux

I’m trying to replace my Windows applications with Linux applications. On Windows, I use Juice to download podcasts as MP3s. Recently I decided to switch over to Linux for receiving podcasts. After looking around at various podcast catchers (especially ones that run on the command line, so that I could automate them with a cron job), I came across Podracer. I decided to combine Podracer with a script that splits long MP3s into shorter MP3s so that I could play them more easily in my car. Here’s what I did on my Ubuntu Linux machine:

Step 1: Install and configure podracer

I used these commands:
sudo apt-get install podracer
mkdir ~/.podracer
vim ~/.podracer/subscriptions
Then add the URL of a podcast to that file, e.g. http://feeds.webmasterradio.fm/tdsc for The Daily SearchCast.
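
If you’d rather not open an editor for each feed, here is a minimal sketch of a helper that appends a URL to the subscriptions file. The function name add_subscription is my own invention, and I’m assuming the file holds one feed URL per line; the helper skips the URL if that exact line is already there.

```shell
#!/bin/bash

# add_subscription FEED_URL [SUBSCRIPTIONS_FILE]
# Hypothetical helper: appends a feed URL to the subscriptions file
# unless that exact line is already present.
# Assumption: the subscriptions file holds one feed URL per line.
add_subscription() {
    local url="$1"
    local file="${2:-$HOME/.podracer/subscriptions}"

    # Create the directory and file if they do not exist yet
    mkdir -p "$(dirname "$file")"
    touch "$file"

    # -x matches the whole line, -F treats the URL as a literal string
    if ! grep -qxF -- "$url" "$file"; then
        printf '%s\n' "$url" >> "$file"
    fi
}
```

After sourcing that, running “add_subscription http://feeds.webmasterradio.fm/tdsc” twice should still leave only one copy of the line in ~/.podracer/subscriptions.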

cp /etc/podracer.conf ~/.podracer/podracer.conf
Edit ~/.podracer/podracer.conf so that you can pick the download directory you want. I changed
#poddir=$HOME/podcasts/$(date +%Y-%m-%d)
to
poddir=$HOME/rawpodcasts
because I want all my podcasts in one directory where I can do a batch process over them afterwards. Go ahead and run “mkdir ~/rawpodcasts” to create the directory that podcasts will be stored in.
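
The same edit can be scripted. This is only a sketch: set_poddir is a made-up name, and it assumes the default line starts with “#poddir=” at the beginning of a line, as it did in my copy of the config file.

```shell
#!/bin/bash

# set_poddir CONF_FILE
# Hypothetical helper: replaces the commented-out poddir line with a fixed
# download directory. The single quotes keep $HOME literal, so the config
# file ends up with the same text as the hand-edited line.
# Assumption: the default line starts with "#poddir=" in column one.
set_poddir() {
    local conf="$1"
    local tmp
    tmp=$(mktemp) || return 1

    # Write to a temp file first, then copy back over the original
    if sed 's|^#poddir=.*|poddir=$HOME/rawpodcasts|' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

You would call it as “set_poddir ~/.podracer/podracer.conf”. It goes through a temporary file rather than editing in place, so it doesn’t depend on any particular version of sed.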

sudo vim /usr/bin/podracer
(it’s okay, Podracer is a shell script). Find the line that says
m3u=$(date +%Y-%m-%d)-podcasts.m3u
and comment it out so that podracer won’t automatically create an .m3u playlist as it downloads podcasts.

Run podracer in “catchup” mode to avoid downloading all the old podcasts from your subscriptions with “podracer -c”. podracer will create a file ~/.podracer/podcast.log to keep a record of all the podcasts that have been downloaded (the “-c” catchup mode creates this text file without actually downloading the MP3s). If you want to re-download a file (e.g. while you’re testing your configuration), you can edit the file ~/.podracer/podcast.log and just delete the line for any MP3 you want to re-download.
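
Deleting a log line by hand gets old quickly while testing, so here is a rough sketch that does it for you. forget_episode is a hypothetical name, and I’m assuming each line of podcast.log contains something recognizable for the episode (such as the MP3 file name); check your own log before trusting it.

```shell
#!/bin/bash

# forget_episode PATTERN [LOG_FILE]
# Hypothetical helper: drops every log line containing PATTERN (a literal
# string such as an MP3 file name) so the episode is downloaded again.
# Assumption: each downloaded MP3 has its own line in the log.
forget_episode() {
    local pattern="$1"
    local log="${2:-$HOME/.podracer/podcast.log}"
    local tmp
    local status=0
    tmp=$(mktemp) || return 1

    # grep -v exits with 1 when no lines are left over; only 2+ is an error
    grep -vF -- "$pattern" "$log" > "$tmp" || status=$?
    if [ "$status" -le 1 ]; then
        cat "$tmp" > "$log"
    fi
    rm -f "$tmp"
}
```

For example, “forget_episode some-episode.mp3” would remove any log line mentioning that (made-up) file name, and the next podracer run should fetch it again.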

Step 2: Install and configure mp3splt (optional)

At a terminal window, type “sudo apt-get install mp3splt”. In Step 1, we configured Podracer to download podcasts as MP3s into a “rawpodcasts” directory. In this step, we’re going to take those long MP3s and split them into individual segments in a new “finishedpodcasts” directory. Make the “finishedpodcasts” directory with the command “mkdir ~/finishedpodcasts”.

Make a file /home/username/download-mp3s-and-process.sh that looks like this.

#!/bin/bash

# Run podracer to download any new podcasts
/usr/bin/podracer

# Now split the podcasts into segments
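for i in /home/username/rawpodcasts/*.mp3
do
    # If no MP3s are present, the pattern is left as-is, so skip it
    [ -f "$i" ] || continue
    # Quote the variables so that file names with spaces still work
    nicename=$(basename "$i" .mp3)
    # Send both stderr and stdout to /dev/null so that this is a quiet cron job
    mp3splt -eqd /home/username/finishedpodcasts -o "${nicename}-@n" "$i" &> /dev/null
done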

This script will run podracer to download any new podcasts. Then we list all the MP3 files in the rawpodcasts directory and run mp3splt on each podcast. If you had a file test.mp3, you would be running the command

“mp3splt -eqd /home/username/finishedpodcasts -o test-@n test.mp3 &> /dev/null”

for example. What do the options to mp3splt mean?

-e means “split on sync errors.” If someone created an mp3 by concatenating multiple mp3s (e.g. with a program such as mp3wrap), that could cause sync errors. mp3splt looks at those sync errors to split the concatenated mp3 back into multiple mp3 files.

-q stands for “quiet.” It tells mp3splt not to ask the user any questions. Normally “-e” prints something like

Mp3Splt 2.1 (2004/Sep/28) by Matteo Trotta
THIS SOFTWARE COMES WITH ABSOLUTELY NO WARRANTY! USE AT YOUR OWN RISK!
MPEG 1 Layer 3 – 44100 Hz – Joint Stereo – 256 Kb/s – Total time: 35m.04s
Processing file to detect possible split points, please wait…
Total tracks found: 6
Is this a reasonable number of tracks for this file? (y/n)

Quiet mode suppresses this interactive question on the last two lines above.

-d sets the directory in which to place the split mp3s.

-o lets you specify the output file name. “@n” stands for the track number after splitting. So if test.mp3 were made out of two mp3 files, the output of the command above would be two files (in the finishedpodcasts directory) named test-001.mp3 and test-002.mp3 . It doesn’t hurt to run mp3splt again on mp3s it has already split because it will just overwrite any old files that had been created.
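
Before letting the loop loose on real files, it can help to preview what it would do. The sketch below doesn’t call mp3splt at all; it only prints, for each MP3 in a directory, the output pattern that would be passed to “-o”. The name preview_split and the “->” output format are mine, purely for illustration.

```shell
#!/bin/bash

# preview_split RAW_DIR OUT_DIR
# Hypothetical helper: prints one line per MP3 showing the output pattern
# the loop would hand to mp3splt, without running mp3splt itself.
preview_split() {
    local rawdir="$1"
    local outdir="$2"
    local f nicename

    for f in "$rawdir"/*.mp3; do
        # With no matches the glob stays literal, so skip non-files
        [ -f "$f" ] || continue
        # Strip the directory and the .mp3 suffix, as the real script does
        nicename=$(basename "$f" .mp3)
        printf '%s -> %s/%s-@n\n' "$f" "$outdir" "$nicename"
    done
}
```

Running “preview_split ~/rawpodcasts ~/finishedpodcasts” lists each source file next to its pattern, which is a quick way to spot file names with spaces or other odd characters before a cron job trips over them.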

Step 3: Periodically download and process podcasts

To download podcast files periodically and process them, make a crontab entry for podracer or your script. This will make the cron daemon run your script every few hours to download new mp3s.

I typed “crontab -e” and made the file look like this:

# At 3:03 am, 8:03 am, 10:03 am, 12:03 pm, and 4:03 pm, run this script
3 3,8,10,12,16 * * * /home/username/download-mp3s-and-process.sh
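
If you want to add that entry without opening an editor, here is one possible sketch. It reads an existing crontab on standard input and writes it back out with the entry appended, unless the entry is already there. The helper name append_cron_entry is hypothetical.

```shell
#!/bin/bash

# The schedule and script path from the crontab entry above
ENTRY='3 3,8,10,12,16 * * * /home/username/download-mp3s-and-process.sh'

# append_cron_entry (hypothetical helper): reads an existing crontab on
# stdin and writes it to stdout, adding ENTRY at the end only if that
# exact line is not already present.
append_cron_entry() {
    local current
    current=$(cat)

    # Echo the existing entries back unchanged
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi

    # -x matches the whole line, -F treats ENTRY as a literal string
    if ! grep -qxF -- "$ENTRY" <<< "$current"; then
        printf '%s\n' "$ENTRY"
    fi
}
```

On my understanding of the cron that ships with Ubuntu, you could wire it up as “crontab -l 2>/dev/null | append_cron_entry | crontab -”. Try it on a test account first, since the last command replaces your whole crontab with whatever it is given.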

Whenever you’re ready to put the podcasts on some type of media (SD Card, iPod, iPhone, whatever), just copy over anything from the finishedpodcasts directory (if you used mp3splt in step 2) or the rawpodcasts directory if you skipped step 2. Then delete anything left over in either directory.
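
The copy-then-delete step can be folded into a single move. This is only a sketch under my own assumptions: move_podcasts is a made-up name, and the destination is wherever your media happens to be mounted.

```shell
#!/bin/bash

# move_podcasts SRC_DIR DEST_DIR
# Hypothetical helper: moves every MP3 from SRC_DIR to DEST_DIR, so the
# source directory is emptied as the files are transferred.
# Assumption: DEST_DIR already exists (e.g. a mounted SD card).
move_podcasts() {
    local src="$1"
    local dest="$2"
    local f

    for f in "$src"/*.mp3; do
        # With no matches the glob stays literal, so skip non-files
        [ -f "$f" ] || continue
        mv -- "$f" "$dest"/
    done
}
```

For instance, “move_podcasts ~/finishedpodcasts /media/sdcard” (the mount point is just an example). If you split the files in step 2, the originals in rawpodcasts still need to be deleted separately.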

23 Responses to Download, slice and dice podcasts on Linux

  1. For those who got lost on the first line like me. Cron is a time based scheduling software, from the greek word chronos meaning time.

  2. Dave (original)

    Chronological 🙂

  3. Sometimes, it’s exactly this kind of stuff that makes me love Linux… the ability to get something to work because you have all the tools and information at your disposal. It’s a fun challenge.

    But at the same time, when you just NEED something to work quickly, it can be a pain in the butt. Thankfully, you’ve saved me that effort! 🙂

  4. Matt, Do you have two system on your machine?

  5. I am sorry, i mean two OS

  6. Xianhong, nope — I have one Windows computer and one Linux computer.

  7. Wot, no Mac? 😉

  8. Matthew Anderson

    Matt isn’t a Mac Person 🙂

    He wrote in July last year:

    “I decided to hold off on getting one to see how well the iPhone worked for my wife (she’s a Mac person and I’m… not).”

  9. And Harith wins the award for person spending most time reading Matt’s blog!

  10. Matthew Anderson,

    Thanks for the kind words. I read most but not all what Matt write.

    I guess, I “save” in mind some details of what I read under specific key words. When I need to recall those details, I use GOOG to search them under those specific keywords. One of the few advantages of being a SEO, you may say 🙂

    Matt discovered that already in May 2007 and wrote : “Harith, you have an amazing memory for details.” 🙂

  11. Nice tips, Chronological.. just like this blog.

  12. Matt have you ever thought of doing a post on why your not a Mac person? I would be interesting in reading that review. I have thought about switching to a Mac at times, but still have not done it. No real reason other than I am used to pc’s

  13. Bruce, I leave open the possibility that I may want to switch to a Mac at some point. The romantic in me hopes that Linux can still do well on the desktop though. 🙂

  14. Matt,

    Rumor has it that Emmy has left the house last Friday to the plex and started tinkering at the data centers. When do you expect her back home? 🙂

  15. Forget The Daily Search Cast… you need SEO 101. 😉

    Your instructions make it seem quick and painless. Nice. Just need some way to automatically delete those you’ve listened to like iTunes does and I’ll consider switching.

  16. Awesome tutorial. I’ve left my ubuntu box growing dust until just the other week when I upgraded to 7.10 from my old Dapper version. I hope to get my audio working in screencast mode so I can contribute a few tutorials focused at the entry level user/webmaster…eg adding gFTP, skype, some basic GIMP tutorials. Thanks again for this tutorial, it was very clear. I’ll give it a whirl tomorrow. :thumbsup

    – Sean

  17. “Rumor has it that Emmy has left the house last Friday to the plex and started tinkering at the data centers.”

    Nope nope. For one thing, she’s a stay-at-home cat, Harith. Emmy also has the ability to shut off a computer merely by walking across a keyboard, so we have to keep her far away from Google data centers.

    Added: Harith, you can see where Emmy might want to hide in a backpack to get to a data center though. I have to be careful that I don’t pick up a stowaway hiding in my backpack. 🙂

    Emmy in a backpack

  18. Matt,

    Ok. So she stayed home away from the DCs 🙂

    Emmy girl: You look great!

  19. Have a look at SongBird – it’s cross-platform and based on the Mozilla engine.

    Watch their little video, as it’s actually a really good introduction as to what it can do.

  20. I think stowaway cat takes the cake! I could work on tech issues all day with companionship like that. Mine does the same, likes to type out gibberish as she walks on the keyboard… perhaps IM buddies I don’t know about.

  21. Thanks for the post. One of Linux’s strong point is its fantastic cron. U can use cron to do a lot of things in this case u’re looking for and downloading new podcasts.


  22. # Run podracer to download any new podcasts
    /usr/bin/podracer

    is your -c option missing here?
    keep up the good work, rgds, W

  23. Hi there Matt and other readers. Just wanted to thank you for such a clear, concise, instructional post, that does EXACTLY what I needed! Up until now I have been manually downloading my podcasts using Firefox Live Feeds cut and pasted into a terminal ssh session, running aria2c all during my ISP “peak” time. No more! Now first thing in the morning, I can open my ~/Downloads/Podcasts directory I my already running session of mocp and there they all there from the previous night! And all downloaded in my ISPs offpeak hours! I know there is nothing revolutionary in these instructions, but it just worked for me! So thank you very much and keep up the terrific work!
    Bill
