Export Blogger archive to Google Calendar or iCal

Here is a way to export a Blogger archive file to your Google Calendar, or any iCal-based calendar program.


Background: Blogger .xml archive -> iCal .ics file


Basically, I was looking for a way to recreate the sort of "Timehop" feature where what you were doing on social media "On this Day..." resurfaces on that day years later. You know, like Facebook's "On this Day" feature or Google Photos' "Rediscover this Day". But I couldn't find any such thing for a Blogger blog, and you might have years of old embarrassing posts that you'd love to relive. Why leave them just sitting unread? Relive them each day by having those old posts appear (either as links or full entries) in your calendar.

So I haphazardly found a way to import those posts into Calendar, so that I can set them as recurring events. Now I'm no programmer or app author. Basically, I just manually edited my Blogger archive file with a bunch of Find/Replace commands, until it met the standards of an iCal file for importing to Google Calendar. In other words, I manually transformed the file from one type to another. It's a totally do-it-yourself way.

Now like I said, I'm no expert and I am 100% sure there are better ways to do this, and someone much smarter than me could probably write a self-contained program or macro to do this automatically. But I've never found a program like that to convert Blogger archive .xml files to iCal .ics files. So I had to make do with my own limited knowledge.

But I'm hoping that following these copy/paste instructions can save you the hassle of figuring out the code manually, and can help semi-automate this process so that you can convert over the file quickly and hopefully painlessly. Just copy/paste these strings into the Find/Replace box and in about 5~10 minutes you should be ready to go.

Tool: Notepad++


The only tool you will need is a text editor program called Notepad++.
I chose this because it's free, open-source, and works great with regular expressions. I used version 6.8.8.

Find/Replace dialog box in Notepad++

This is the Find/Replace dialog box in Notepad++. We are going to basically use this and only this. A lot. So get comfortable with it.

And before we start, double-check that the "Regular expression" search mode is selected and the ". matches newline" box is checked.



The iCal format


So a basic iCal file follows this format. I need to make my blog archive file look like this. This will be the template style that we're aiming for.

BEGIN:VCALENDAR
PRODID:<Test>
VERSION:2.0
BEGIN:VEVENT
DTSTART;VALUE=DATE:20160130
RRULE:FREQ=YEARLY
DESCRIPTION:here is some entry content. Looking good my man.
SUMMARY:here is the event title aka blog post title
END:VEVENT
END:VCALENDAR

As far as I could tell, Google will only successfully import the file if you have these items as a minimum. OK, let's start.
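To make the target concrete, here is a minimal Python sketch that builds the same template from a title, a date, and a description. It is only an illustration: the function names make_event and make_calendar are made up for this example, and the PRODID value is just the placeholder from the template above.

```python
from datetime import date


def make_event(title: str, day: date, description: str, yearly: bool = True) -> str:
    """Return one VEVENT block in the minimal shape shown above."""
    lines = ["BEGIN:VEVENT", f"DTSTART;VALUE=DATE:{day:%Y%m%d}"]
    if yearly:
        lines.append("RRULE:FREQ=YEARLY")
    lines.append(f"DESCRIPTION:{description}")
    lines.append(f"SUMMARY:{title}")
    lines.append("END:VEVENT")
    return "\n".join(lines)


def make_calendar(events: list[str]) -> str:
    """Wrap VEVENT blocks in the calendar header and footer."""
    header = ["BEGIN:VCALENDAR", "PRODID:<Test>", "VERSION:2.0"]
    return "\n".join(header + events + ["END:VCALENDAR"]) + "\n"
```

Calling make_calendar([make_event("My post", date(2016, 1, 30), "Some text")]) gives a string laid out like the template, which is handy for comparing against your converted file.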

Open the Blogger archive .xml file in Notepad++.

Part 1 - Preliminary file clean-up


Step 1.1 - Beautify (Optional)


You don't have to do this, but I recommend it. The big block of code on your screen is ugly, confusing, and unorganized, and it's difficult to see where items start and stop. So I suggest installing the "XML Tools" plugin via Notepad++'s Plugin Manager.

Once it's installed, find it in the plugins menu and choose the menu option "Pretty Print - XML only". Now the code looks organized and clear.

Step 1.2 - Clear the junk out


OK, now let's start the editing. Remember, for everything we do, make sure that in the Find/Replace box "Regular expression" is selected and ". matches newline" is checked.

[We just want your entry data, but the Blogger archive file includes lots of information about your settings, template, etc. It seems to store this data as unused blog "entries." Since we don't need that data and only want the post content, let's remove it all.]

Use the "Find..." command to find this in the file:
BLOG_USE_LIGHTBOX

You'll probably get back three results, but all are in the same blog entry (i.e. between a pair of <entry> and </entry> tags). Whichever entry includes this BLOG_USE_LIGHTBOX code will be the last of the useless, unused blog entries, meaning your first real blog post starts after this entry.

So with your eyes, look down a few lines from the last BLOG_USE_LIGHTBOX and find the first <entry> tag just below. For my test archive file, this was around line 3460. There are lots of <entry> tags, so make sure you've got the right one.

Now delete EVERYTHING above that <entry> tag.

You'll be left with just actual blog posts.
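If you'd rather not hunt for that <entry> tag by eye, the same cut can be sketched in Python. This assumes, as described above, that the last BLOG_USE_LIGHTBOX sits inside the final settings entry; the function name drop_settings_entries is made up for this example.

```python
def drop_settings_entries(xml_text: str, marker: str = "BLOG_USE_LIGHTBOX") -> str:
    """Keep everything from the first <entry> that follows the last marker hit."""
    last_hit = xml_text.rfind(marker)
    if last_hit == -1:
        return xml_text  # marker absent: leave the text untouched
    first_post = xml_text.find("<entry>", last_hit)
    if first_post == -1:
        return xml_text  # no post follows the marker
    return xml_text[first_post:]
```

If the marker is missing, the function returns the text unchanged, so it is worth checking that the result really starts with your first post before moving on.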

Step 1.3 - Delete useless blog post info 


Now we start using Find/Replace to remove the bits that are useless to us. So use the Find/Replace function to Find each of these items (yes, one at a time because I'm not a programmer) and Replace them with nothing (leave the Replace box blank).

These functions will find these tags and all the content between them, and remove it. Just paste each of these one at a time into the "Find" box, make sure the "Replace" box is empty, and hit the "Replace All" button. Repeat for each:


  • <id.*?</id>
  • <author.*?</author>
  • <updated.*?</updated>
  • <media.*?/>
  • <category.*?/>
  • </title>
  • <link rel='edit'.*?/>
  • <link rel='self'.*?/>
  • <link rel='replies'.*?/>
  • <thr.*?/thr:total>
  • <thr:in-reply-to.*?/>
  • <gd:extendedProperty.*?/>


Note: depending on your blog some of these items might not be found anyway. No problem.
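For anyone who wants to run the whole list in one go, here is a rough Python equivalent. The re.DOTALL flag plays the role of the ". matches newline" checkbox. One deliberate choice in this sketch: the updated tag is matched up to its own closing tag (<updated.*?</updated>) so the pattern cannot run on into a neighbouring tag. The names JUNK_PATTERNS and strip_junk are made up for this example.

```python
import re

# Find patterns for the tags we don't need, each replaced with nothing.
JUNK_PATTERNS = [
    r"<id.*?</id>",
    r"<author.*?</author>",
    r"<updated.*?</updated>",
    r"<media.*?/>",
    r"<category.*?/>",
    r"</title>",
    r"<link rel='edit'.*?/>",
    r"<link rel='self'.*?/>",
    r"<link rel='replies'.*?/>",
    r"<thr.*?/thr:total>",
    r"<thr:in-reply-to.*?/>",
    r"<gd:extendedProperty.*?/>",
]


def strip_junk(text: str) -> str:
    """Apply every pattern in order, like pressing Replace All for each one."""
    for pattern in JUNK_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text
```

Patterns that find nothing simply leave the text as it is, which matches the note above.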

Part 2 - Start replacing the tags


Step 2.1 - Location data


You should decide if you want your blog posts' geotag/location data kept and used as the location for the calendar events.

Step 2.1 A - Preserve it!


If you want this preserved, do this:

Find:
<georss:featurename>
and Replace it with:
LOCATION:

and

Find:
</georss:featurename>.*?</georss:box>
and Replace it with nothing (i.e. leave the box blank)


Step 2.1 B - Remove it


If you do not want locations included, or if you never geotagged your blog posts, then you can remove all location info with this:

Find:
<georss:featurename>.*?</georss:box>
and Replace it with nothing
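Both location options can be sketched in Python like this. It is only an illustration of the Find/Replace pairs above; the function names keep_location and drop_location are made up for this example.

```python
import re


def keep_location(text: str) -> str:
    """Option A: turn the feature name into a LOCATION line and drop the coordinates."""
    text = text.replace("<georss:featurename>", "LOCATION:")
    return re.sub(r"</georss:featurename>.*?</georss:box>", "", text, flags=re.DOTALL)


def drop_location(text: str) -> str:
    """Option B: remove the whole location block."""
    return re.sub(r"<georss:featurename>.*?</georss:box>", "", text, flags=re.DOTALL)
```

Run one or the other on the text, not both.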

Step 2.2 - Replace Blogger tags with iCal-friendly tags


Now it's time to do some replacement. Just Find/Replace these sets:

Find:
<published>
and Replace it with:
DTSTART;VALUE=DATE:

This will set the calendar event to the date of the blog post. I opted to use the post's set publish date (whether that's when you actually published it, or a date you manually back-dated or forward-dated it to) instead of the date the entry was last updated.

Find:
<entry>
and Replace with:
BEGIN:VEVENT

and

Find:
</entry>
and Replace with:
END:VEVENT

These will make each blog post its own event.

Find:
</feed>
and Replace with:
END:VCALENDAR

This will mark the end of the blog archive as the end of your imported calendar data.
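These four swaps are plain text replacements with no regular expressions involved, so in Python they could be sketched as a simple lookup table. The names TAG_MAP and swap_tags are made up for this example.

```python
# Blogger tag on the left, iCal text on the right.
TAG_MAP = {
    "<published>": "DTSTART;VALUE=DATE:",
    "<entry>": "BEGIN:VEVENT",
    "</entry>": "END:VEVENT",
    "</feed>": "END:VCALENDAR",
}


def swap_tags(text: str) -> str:
    """Replace each Blogger tag with its iCal counterpart."""
    for old, new in TAG_MAP.items():
        text = text.replace(old, new)
    return text
```

The closing </published> tag is left alone on purpose, because the date clean-up in Part 4 uses it to find the end of each timestamp.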

Step 2.3 - Blog post title as event title


Find:
<title type='text'>
and Replace with:
SUMMARY:

That will set the blog post's title as the event title. So what you see as the entry on your calendar will be this. You don't have to do this, of course. We're starting to get into the "what you feel like" part.

Optionally, you could add some sort of prefix here if you wanted. For example, instead of replacing with just SUMMARY: you could use SUMMARY: Blog Post- so your calendar entry is easier to tell apart from other normal calendar events.
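As a small sketch, the title swap with an optional prefix might look like this in Python. The function name title_to_summary and its prefix parameter are made up for this example.

```python
def title_to_summary(text: str, prefix: str = "") -> str:
    """Turn the opening title tag into SUMMARY:, optionally followed by a prefix."""
    return text.replace("<title type='text'>", f"SUMMARY:{prefix}")
```

For example, title_to_summary(text, "Blog Post- ") puts "Blog Post- " in front of every event title.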

Part 3 - Calendar entry content


Step 3.1 - Choose the entry content: Link or post?


Now we need to decide what's going to go in the event description.

  • Do you want the entire post's content in there, so that you can read the whole post in your calendar? 
  • Or do you want just a link to your original blog post?


Step 3.1 A - Blog content as Description


If you want the entirety of each post's content copied to each corresponding calendar entry, do this:

Find:
<content type='html'>
and Replace with:
DESCRIPTION:

Then Find the following items, Replacing each with nothing (i.e. leave the box blank):


  • </content>
  • <link rel='alternate' type='text/html' href='
  • ' title='.*?'/>


This will leave a link to the original post at the end of the entry. If your post was a draft in Blogger, it won't have a link because it was never published.

Step 3.1 B - A link back to original post as Description


If you just want a link back to the original post in your calendar event:

Find:
<link rel='alternate' type='text/html' href='
and Replace with:
DESCRIPTION:

Then Find the following items, Replacing each with nothing (i.e. leave the box blank):


  • ' title='.*?/>
  • <content type=.*?</content>
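Both choices from Step 3.1 can be sketched in Python as follows. Again this just mirrors the Find/Replace strings, with re.DOTALL standing in for ". matches newline"; the function names content_as_description and link_as_description are made up for this example.

```python
import re

LINK_START = "<link rel='alternate' type='text/html' href='"


def content_as_description(text: str) -> str:
    """Option A: the full post body becomes the description, with the link left at the end."""
    text = text.replace("<content type='html'>", "DESCRIPTION:")
    text = text.replace("</content>", "")
    text = text.replace(LINK_START, "")
    return re.sub(r"' title='.*?'/>", "", text, flags=re.DOTALL)


def link_as_description(text: str) -> str:
    """Option B: only the link to the original post becomes the description."""
    text = text.replace(LINK_START, "DESCRIPTION:")
    text = re.sub(r"' title='.*?/>", "", text, flags=re.DOTALL)
    return re.sub(r"<content type=.*?</content>", "", text, flags=re.DOTALL)
```

Pick one of the two functions for the whole file, just as you would pick one of the two steps.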

Part 4 - Clean Up


Now we need to clean up the number formats to make the Blogger timestamps fit well with a Calendar app.

One problem is that your blog archive file has whatever time zone setting your blog had. So the publish times are going to be off. But I don't really care about accurate hours, just accurate dates. So I'm going to make my life more simple and just remove the timestamps.

Optional: You could instead keep the timestamps. The easiest way is to change the time zone marker to "Z" so the calendar thinks the posting time was in GMT. You'd then also have to remove the colons separating the hours:minutes:seconds, and go back and remove ";VALUE=DATE".
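If you do want to keep the times, a script can go one step further than relabelling the zone as "Z" and actually convert each timestamp to GMT/UTC. The sketch below does that, which is a different technique from the shortcut just described. It assumes the timestamps look like 2016-01-30T08:15:00.001-08:00; the function name to_utc_stamp is made up for this example.

```python
from datetime import datetime, timezone


def to_utc_stamp(blogger_time: str) -> str:
    """Convert a timestamp with a UTC offset into iCal's compact UTC form."""
    moment = datetime.fromisoformat(blogger_time)
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
```

The result would go after a plain DTSTART: (without ;VALUE=DATE), for example DTSTART:20160130T161500Z.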

But I just want the dates (this will make the blog post an "all day" event on your calendar).

So let's remove the time-stamps and clean up the date-stamps. But before that, decide:

Step 4.1 - Repeat or Not?


Decide if you just want your blog posts exported to the calendar on just the dates when they were posted, or if you want them to repeat annually. I like that whole "time hop" or "On this Day" feeling, so I prefer to have them repeat annually.

Step 4.1 A - No repeat


For no repeat, just a proper archive, run this Find/Replace task:

Find:
T(\d+):.*?</published>
and Replace with nothing (i.e. leave the box blank)

Step 4.1 B - Repeat annually 


But if, like me, you like the whole "time hop" reminder, and would like to see each post repeat on its same day each year, run this task instead. I highly recommend this, as it's a great way to revisit your content.

Find:
T(\d+):.*?</published>
and Replace with:
\nRRULE:FREQ=YEARLY

Then, whichever option you chose in Step 4.1, run this one as well:

Find:
(\d+)-(\d+)-(\d+)
and Replace it with:
$1$2$3

This will remove the hyphens from the date format Blogger uses. We need just a pure series of numbers. Hat tip to http://stackoverflow.com/a/25627871
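Here is how the Step 4.1 clean-up might look as a Python sketch. Note that Python writes the captured groups as \1\2\3 where Notepad++ uses $1$2$3. The function name clean_dates and its repeat_yearly switch are made up for this example.

```python
import re


def clean_dates(text: str, repeat_yearly: bool = True) -> str:
    """Drop the time part of each timestamp, then remove the hyphens from the dates."""
    replacement = "\nRRULE:FREQ=YEARLY" if repeat_yearly else ""
    text = re.sub(r"T(\d+):.*?</published>", replacement, text, flags=re.DOTALL)
    return re.sub(r"(\d+)-(\d+)-(\d+)", r"\1\2\3", text)
```

One thing to keep in mind: the hyphen pattern matches any three hyphen-separated groups of digits anywhere in the file, not only the DTSTART lines, so a stricter variant could anchor it to the DTSTART;VALUE=DATE: prefix.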


Step 4.2 - Header


Congrats, we're almost done. Now just manually go add this to the very top of the page:

BEGIN:VCALENDAR
PRODID:<Test>
VERSION:2.0

Finally, it's time to clean out any messy tab spaces that are left over. It's important that each item be at the start of a new line. There can be extra blank lines between, but everything needs to be as far left as possible. So let's remove any errant tab spaces:

Find:
\t+
and Replace with nothing (i.e. leave the box blank)
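The header and the tab clean-up can be sketched together in Python. The header is written as three separate lines, as in the template from the start of this guide; the names HEADER and finish_file are made up for this example.

```python
import re

# The three header lines, each on its own line.
HEADER = "BEGIN:VCALENDAR\nPRODID:<Test>\nVERSION:2.0\n"


def finish_file(text: str) -> str:
    """Strip every tab character and put the calendar header on top."""
    return HEADER + re.sub(r"\t+", "", text)
```

This removes all tabs, including any inside post content, which is the same thing the Find/Replace above does.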

Step 4.3 - Save


Now just save the file as plain text ("Normal Text File" in the drop-down menu in the Save dialog box).

Before saving, rename the extension from .txt to .ics

Step 4.4 - Import file


You can now import the file into your Google Calendar or whatever calendar app.
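If the import fails, a quick sanity check of the file can help narrow things down. The sketch below only tests the few things this guide relies on (header, footer, matching event markers, 8-digit dates); it is not a full iCal validator, and the function name check_ics is made up for this example.

```python
import re


def check_ics(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "BEGIN:VCALENDAR":
        problems.append("file does not start with BEGIN:VCALENDAR")
    if not lines or lines[-1] != "END:VCALENDAR":
        problems.append("file does not end with END:VCALENDAR")
    if lines.count("BEGIN:VEVENT") != lines.count("END:VEVENT"):
        problems.append("BEGIN:VEVENT and END:VEVENT counts differ")
    for line in lines:
        # All-day dates must be exactly eight digits, e.g. 20160130.
        if line.startswith("DTSTART;VALUE=DATE:") and not re.fullmatch(
            r"DTSTART;VALUE=DATE:\d{8}", line
        ):
            problems.append(f"unexpected date format: {line}")
    return problems
```

You would read your saved .ics file into a string and pass it to check_ics, then fix whatever it reports with another round of Find/Replace.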

Final Thoughts


Play around with this and find the best method that works for you. For example you might want to better format the post content if you chose to show the whole original post inside the calendar. Links in your original blog post will stay in the calendar event description (if you chose to keep the full content) but images of course won't display (links to the images will be there though).

Just have fun and I hope you found this helpful. I'm no programmer, but just spent a few hours playing around with this. It's a good way to resurface old memories, and gives another back-up option besides just your hosted blog, or a dead archive file sitting on your hard drive.

Good luck and enjoy revisiting all those old blog memories.
