Here's one challenge.
http://www.actsofsilence.com/netlabels/ hosts a big list of Creative Commons netlabels. These labels publish music that's free to download and share, but collecting it is difficult because it's spread across around 300 different publishers that host their music in different ways (Bandcamp, archive.org, direct download, etc.). I think what this would take is:
1. For each netlabel, find a way to automate the processes of:
1.a. Downloading all the music.
1.b. Tagging the music files appropriately.
1.c. Checking for new publications/uploaded music that you don't have locally.
2. Write a script that, when run, creates a directory named after the relevant netlabel and performs those three actions there.
3. Write a script that, when run, takes a bunch of said netlabel-specific scripts and runs them all in a certain directory, building an encompassing library.
4. As long as the internet exists: keep expanding the program by adding more CC netlabels to it.
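For what it's worth, here's a rough Python sketch of how steps 1.c, 2 and 3 could fit together. It's only one possible layout, not a finished design: the names (missing_releases, sync_label, sync_library, download) are all made up for illustration, and it assumes each netlabel-specific script can be boiled down to a "lister" function that returns (filename, url) pairs. Tagging (1.b) is left out because it depends on the file format and on whichever tagging library gets picked.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple
import urllib.request

# Hypothetical interface: a "lister" is the netlabel-specific part. It knows
# how one label hosts its music and returns (filename, url) pairs.
Release = Tuple[str, str]
Lister = Callable[[], Iterable[Release]]
Fetcher = Callable[[str, Path], None]


def download(url: str, dest: Path) -> None:
    """Plain HTTP download; enough for direct-download labels."""
    urllib.request.urlretrieve(url, str(dest))


def missing_releases(label_dir: Path, releases: Iterable[Release]) -> List[Release]:
    """Step 1.c: keep only the releases whose file is not in label_dir yet."""
    return [(name, url) for name, url in releases
            if not (label_dir / name).exists()]


def sync_label(library: Path, label: str, lister: Lister,
               fetch: Fetcher = download) -> List[str]:
    """Step 2: create a directory named after the label and fill it in."""
    label_dir = library / label
    label_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name, url in missing_releases(label_dir, lister()):
        fetch(url, label_dir / name)
        fetched.append(name)
    return fetched


def sync_library(library: Path, listers: Dict[str, Lister],
                 fetch: Fetcher = download) -> Dict[str, List[str]]:
    """Step 3: run every netlabel-specific lister under one library directory."""
    return {label: sync_label(library, label, lister, fetch)
            for label, lister in listers.items()}
```

Adding a netlabel (step 4) would then mean writing one more lister and adding it to the dict passed to sync_library. Running the same call again should only fetch what's new, since anything already on disk is skipped.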
I'm going to loop through step 1 myself. Other iterators should feel free to join me: pick a netlabel from the list and look under its hood. Hacky happing!