Using Regular Expressions and Search Regex to Remove the .html from Old Blogger Permalinks

So, you’ve migrated from Blogger to WordPress. How should you handle your internal links? The Search Regex plugin is a fast and easy way not only to change internal URLs that still contain .blogspot.com but also to get rid of the .html at the end of each URL (if you want).

Before using this tutorial, back up your database. Neither Search Regex nor I accept liability for use of the plugin or this tutorial.

Set Up Matching Permalinks

Let’s assume that you’re going to use the same year/month permalink structure that Blogger used (if not, see this other tutorial AFTER you finish this one). Go to Settings > Permalinks and select the year/month option (labeled “Month and name”).
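To see why matching the structure matters, here is a minimal Python sketch comparing the two URL shapes for the same post. The function names, date, and slug are hypothetical, and it assumes the slug is already identical on both platforms (that is what the plugin in the next step is for). Once the structures match, the only difference left is the .html suffix.

```python
from datetime import date

def blogger_path(published: date, slug: str) -> str:
    # Blogger style: /YYYY/MM/slug.html
    return f"/{published.year}/{published.month:02d}/{slug}.html"

def wordpress_path(published: date, slug: str) -> str:
    # WordPress "Month and name" structure: /%year%/%monthnum%/%postname%/
    return f"/{published.year}/{published.month:02d}/{slug}/"

post_date = date(2010, 5, 14)
print(blogger_path(post_date, "my-first-post"))    # prints: /2010/05/my-first-post.html
print(wordpress_path(post_date, "my-first-post"))  # prints: /2010/05/my-first-post/
```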

Run the Maintain Blogger Permalinks Plugin

Blogger structures the post-slug (post title) part of permalinks differently than WordPress: its slugs are shorter, and it drops some words and characters. Fortunately, the Maintain Blogger Permalinks plugin takes care of that. Be sure to back up your database first.

The Maintain Blogger Permalinks page also provides some important JavaScript to add to your old Blogger blog if you were using .blogspot.com, and some code to put in the .htaccess file on your new site (above or below the section WordPress designates as its own) if you were using your own domain.

If this is getting too complicated, this is also a service I offer, so contact me if you need someone else to do it for you.

Strip Out .blogspot.com

Skip this if you used your own domain the whole time, but if there are any internal links that point to .blogspot.com, strip those out first.

Simply search for yourdomain.blogspot.com and replace it with yournewdomain.com. You don’t have to click Regex. First, click the Replace button and scroll down to see if it looks right. If it does, then scroll up and click Replace and Save or else it won’t be permanent.
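The Replace / Replace and Save workflow amounts to "preview a plain substring replacement, then commit it." Here is a minimal Python sketch of that idea. It is a hypothetical model, not how the plugin is implemented: the post bodies, IDs, and `preview_replace` are made up for illustration.

```python
# Hypothetical post bodies keyed by post ID.
posts = {
    1: 'See <a href="http://yourdomain.blogspot.com/2010/05/a-post.html">this</a>.',
    2: "No internal links here.",
}

def preview_replace(posts: dict[int, str], old: str, new: str) -> dict[int, str]:
    # Like the Replace button: compute the changed posts but do not save them.
    # This is a plain substring replacement; no regex is involved.
    return {pid: body.replace(old, new) for pid, body in posts.items() if old in body}

changes = preview_replace(posts, "yourdomain.blogspot.com", "yournewdomain.com")
for pid, body in changes.items():
    print(pid, body)  # review each changed post before committing

# Like Replace and Save: only now are the changes written back.
posts.update(changes)
```

Note that the rewritten link still ends in .html; the next step deals with that.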

Now Strip .html From the End of the URL

In “Search Pattern,” paste:

|domain.com/(\d*)/(\d*)/(.{1,50})\.html|

(be sure to put your own domain where domain.com is!)

In “Replace Pattern,” paste:

domain.com/$1/$2/$3/

Click “Regex,” but you can ignore the other little boxes that then show up.

Click here to see what it should look like.

First, click the Replace button and scroll down to see if it looks right. If it does, then scroll up and click Replace and Save or else it won’t be permanent.
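If you want to try the pattern outside WordPress before touching your database, here is a minimal Python sketch of the same search and replace. It is an approximation, not what Search Regex itself runs (the plugin runs on PHP). `strip_html_suffix` and example.com are hypothetical, and the sketch escapes the literal dots so that `.` only matches a real period.

```python
import re

def strip_html_suffix(content: str, domain: str) -> str:
    # Equivalent of the Search Pattern domain.com/(\d*)/(\d*)/(.{1,50}).html
    # with the literal dots escaped (re.escape handles the domain part).
    pattern = re.compile(re.escape(domain) + r"/(\d*)/(\d*)/(.{1,50})\.html")
    # The Replace Pattern's $1/$2/$3 backreferences are written \1/\2/\3 in Python.
    return pattern.sub(domain + r"/\1/\2/\3/", content)

post = '<a href="http://example.com/2010/05/my-post.html">my post</a>'
print(strip_html_suffix(post, "example.com"))
# prints: <a href="http://example.com/2010/05/my-post/">my post</a>
```

Links that don't follow the /YYYY/MM/slug.html shape, such as example.com/about/, are left alone because the pattern requires the two numeric path segments.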

For those wondering what the {1,50} is for: it’s for posts with multiple URLs ending in .html. Because the dot matches any character, an unlimited pattern would select everything between the first link and the last .html in the post; capping that part of the match at 50 characters keeps each match to a single link. Blogger permalink slugs are fewer than 40 characters, but I set the limit at 50 just to play it safe. Here’s what it looks like without limits.
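The effect of the limit can be reproduced with a short Python sketch. The post text is hypothetical. It runs an unlimited `(.*)` slug group and the tutorial’s `(.{1,50})` over a post whose two links sit more than 50 characters apart, and counts the substitutions.

```python
import re

# Hypothetical post body: two internal links separated by ordinary text.
POST = (
    '<a href="http://example.com/2010/05/first-post.html">First</a> '
    "and then enough ordinary filler text to push the two links well over fifty characters apart, "
    '<a href="http://example.com/2011/02/second-post.html">Second</a>'
)

UNBOUNDED = r"example\.com/(\d*)/(\d*)/(.*)\.html"     # no length limit
BOUNDED = r"example\.com/(\d*)/(\d*)/(.{1,50})\.html"  # the tutorial's limit
REPLACE = r"example.com/\1/\2/\3/"

def rewrite(pattern: str, text: str) -> tuple[str, int]:
    # re.subn returns (new_text, number_of_substitutions)
    return re.subn(pattern, REPLACE, text)

greedy_text, greedy_count = rewrite(UNBOUNDED, POST)
bounded_text, bounded_count = rewrite(BOUNDED, POST)
print(greedy_count, bounded_count)  # prints: 1 2
```

Without the limit, one match runs from the first link to the last .html, so only the second link loses its suffix and the first keeps its .html. With the limit, each link is rewritten separately. Because `.{1,50}` is still greedy, it only protects you when the next .html is more than 50 characters past the start of the slug. For links inside double-quoted href attributes, a class such as `[^"]{1,50}`, which cannot cross a closing quote, is a stricter alternative.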

Again, if this is more complicated than you’re comfortable with, this is also a service I offer, so contact me if you need someone else to do it for you.

{ 2 trackbacks }

List of Sites from my recent RegEX Re(lated) Search « Web Development Journal
July 4, 2011 at 2:11 pm
Parsing HTML, regex expression’s in Automated Editor Rules | Automated Editor Wordpress Plugin
August 13, 2011 at 9:47 am

{ 3 comments }

1 ngadimin March 30, 2011 at 5:01 pm

Hi,
I was trying to remove all <div> tags from my post. I tried:

@(.*?)@ pattern, but it always returns nothing found. Can you help me with a working pattern?

Thank you

2 ngadimin March 30, 2011 at 5:04 pm

I’m sorry, the pattern got stripped out; here it is:
@<div .*?>(.*?)</div>@

3 john gruhler July 21, 2011 at 8:21 pm

I made a few permalink changes and I am now trying to clean up the mess I made… How can I use mod_rewrite to go from the following old permalinks to the new permalink? Google is causing me fits with the 404 and unreachable errors… can this even be done?.. jg

1. /%pagename%.html to: /%post_id%/%pagename%
2. /%pagename%/.html to: /%post_id%/%pagename%
3. /%pagename./html% to: /%post_id%/%pagename%
Please help me, I am frantic…Johng
