I’m currently in the process of migrating my old WordPress blog to Pelican. In WordPress it is easy to export all your posts as an XML file, but this does not export the referenced images. There is also no export-all functionality, no FTP access, or anything like that. Downloading all the individual files manually would be too much work, and I’m lazy. I prefer letting my computer do these stupid repetitive tasks. This is why computers were invented, by the way.

Luckily, the exported XML file contains all the links, so we can let wget download all the files for us.

Three easy steps combined in a shell one-liner can do all the work for us:

  • filter the lines containing a <wp:attachment_url> tag using grep
  • extract URLs using sed
  • download via wget
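The first two steps can be tried out on their own before anything is downloaded. The following is only a minimal sketch: it runs grep and sed against a tiny made-up export fragment. The file names sample.xml and urls.txt and the example.wordpress.com URLs are placeholders, not taken from a real export.

```shell
#!/bin/sh
# Hypothetical two-attachment fragment standing in for the real export file.
cat > sample.xml <<'EOF'
<item>
  <title>first</title>
  <wp:attachment_url>https://example.wordpress.com/2012/05/cat.png</wp:attachment_url>
</item>
<item>
  <title>second</title>
  <wp:attachment_url>https://example.wordpress.com/2013/01/dog.jpg</wp:attachment_url>
</item>
EOF

# Step 1: keep only the lines that carry an attachment URL.
# Step 2: drop everything before "https:" and everything from the closing tag on.
grep attachment_url sample.xml | sed 's/.*\(https:.*\)<\/.*/\1/' > urls.txt

# urls.txt now holds one URL per line and can be checked by eye.
cat urls.txt
```

Writing the list to a file first makes it easy to spot lines the sed expression did not clean up properly. The file could then be handed to wget, for example via its -i option, which reads URLs from a file.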

Example:

grep attachment_url ~/Downloads/<yourweblog>.wordpress.<date>.xml | sed 's/.*\(https:.*\)<\/.*/\1/' | wget -m -i -

The -i - option tells wget to read the list of URLs from standard input; without it wget ignores the pipe and complains about a missing URL. The -m option tells wget to mirror, which recreates the host and folder structure of each URL on disk. This way the converted blog entries (see pelican-import) can easily be updated to use the images on the new Pelican site, simply by substituting the domain part of the image URLs.
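The domain substitution itself can be done with sed as well. What follows is a sketch under assumptions, not part of the original one-liner: the old host example.wordpress.com, the new location https://blog.example.org/images, the content directory and the Markdown post are all hypothetical and need to be adapted.

```shell
#!/bin/sh
# Hypothetical values: the old blog host and the place the images live now.
OLD='https://example\.wordpress\.com'
NEW='https://blog.example.org/images'

# A made-up converted post standing in for the output of pelican-import.
mkdir -p content
printf '%s\n' 'Title: Cats' '' \
    '![cat](https://example.wordpress.com/2012/05/cat.png)' > content/cats.md

# Replace only the domain part in every post; the path behind it is kept,
# so it still matches the folder structure created by the mirror.
# A temporary file is used because "sed -i" is not portable.
for post in content/*.md; do
    sed "s|$OLD|$NEW|g" "$post" > "$post.tmp" && mv "$post.tmp" "$post"
done

cat content/cats.md
```

The | delimiter keeps the slashes in the URLs from having to be escaped. It is worth running this on a copy of the posts first.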

