Wordpress to Contentful migration
Contentful is one of the most prominent headless CMSs at the moment. As such it provides you all the content management capabilities of traditional tools like WordPress while decoupling the delivery of this content via an API that can be used to retrieve content.
Headless CMS are quickly turning from the new kid on the block intro everyone's favourite. And I totally understand why: it makes thinking about building a frontend so much easier, for once you actually know what is happening and we have great separation of concerns. For me as a webdev this is great stuff!
But it is also work, after all these old and dusty WordPress sites need to be migrated to those new an shiny, Contentful powered React based single page applications ✨ I have been building for clients. Luckily Contentful provides a range of tools to make this easier for. While the contentful-export and contentful-import can help you migrate Content Models and Entries from space to space the Content Management API helps you to manage your content. Thus today, let us take a look at how you can utilize it to migrate a blog from Wordpress to Contentful.
You should understand this post as a guide and as basic examples. Code presented got my job done but will surely need adoption for your usecase.
Planning
Our end goal is to create blogposts, categories and assets in Contentful. For this we already have a content model set up. Many specifics of what follows will change according to the Content Model you are setting up.
Before we dive into the specifics of APIs lets make a plan on how to tackle this. The steps we are going to go through are:
- Get posts from Contentful
- Do some first processing on the posts
- Get categories from Wordpress
- Create a list of needed assets from Wordpress
- Create and publish assets in Contentful
- Create and publish categories in Contentful
- Create, link and publish blogposts.
1. Getting posts from Wordpress
To achieve this we will use the JSON based REST API Wordpress provides by default.
Using this we will first crawl all pages for posts, making requests to wp-json/wp/v2/posts?page=[pageNumber]
until we get a 400 back.
const exportBlogposts = (apiUrl, log) =>
new Promise((resolve) => {
const exportPageOfPosts = (apiUrl, page = 1, allPosts = []) => {
log(`Getting posts for page ${page}`)
const url = `${apiUrl}?page=${page}`
https
.get(url, (res) => {
// When we get a 404 back we went one page over those with posts.
// So we are done now.
if (res.statusCode === 400) {
return resolve(allPosts)
}
let result = ''
res.on('data', (d) => {
result += d.toString()
})
res.on('end', async () => {
blogPosts = JSON.parse(result)
return exportPageOfPosts(
apiUrl,
page + 1,
allPosts.concat(blogPosts)
)
})
})
.on('error', (e) => {
throw Error('Error while exporting blogposts', e)
})
}
exportPageOfPosts(apiUrl)
})
I am handing a log
function for logging here because I tied all pieces together using listr and wanted nicer integration for logs.
Once this function resolves we will have an array containing all blogposts from our Wordpress blog and are ready to do some first processing on them.
2. Prepare posts for usage
I generally like to prepare data for myself so that in later stages and during debugging I have a better overview. Wordpress gives us a lot of information that we don't really need so lets get rid of that.
At the same time there is some information we would like that this API call has not given us. Mainly images that are present within the post. We could make more API requests to get this information. For this solution however we decided to extract image source urls and alt texts from the posts body using regular expressions.
const transformPosts = (posts) =>
posts.map((post) => {
delete post._links
delete post.guid
delete post.excerpt
delete post.author
delete post.comment_status
delete post.ping_status
delete post.template
delete post.format
delete post.meta
delete post.status
delete post.type
post.publishDate = post.date_gmt + '+00:00'
delete post.date_gmt
delete post.date
delete post.modified
delete post.modified_gmt
delete post.tags
delete post.sticky
post.body = `<div>${post.content.rendered}</div>`
delete post.content
post.title = post.title.rendered
post.slug = post.slug
post.category = post.categories[0]
delete post.categories
return extractBodyImages(post)
})
const extractBodyImages = (post) => {
const regex = /<img.*?src="(.*?)"[\s\S]*?alt="(.*?)"/g
post.bodyImages = []
while ((foundImage = regex.exec(post.body))) {
const alt = foundImage[2] ? foundImage[2].replace(/_/g, ' ') : ''
post.bodyImages.push({
link: foundImage[1],
description: alt,
title: alt,
postId: post.id,
})
}
return post
}
In the above code we move information around and delete other as a simple clean up and to have nicer representation later on. Wrapping the body in a <div>
tag is a curiosity of our system down the line where we use marked to generate the output on my final website where marked has trouble if there is no ingle top level element.
3. Get categories from Wordpress
After all we will be doing some more requests to the REST API. To make the following code easier I have a helper function called getJSON
that given a URL will resolve with an object representing the JSON present at that URL.
const generateAssetsList = (posts, baseUrl, simpleLog = console.log) =>
new Promise(async (resolve) => {
const apiURL = `${baseUrl.replace(/\/$/, '')}/wp-json/wp/v2/categories`
// First reduce posts to an array of category numbers.
simpleLog('Reducing posts to category numbers')
const categories = await Promise.all(
posts
.reduce((all, post) => {
if (!post.category) return all
if (all.indexOf(post.category) > -1) return all
return all.concat([post.category])
}, [])
.map(async (categoryNumber) => {
simpleLog(`Getting information about categories`)
const categoryData = await getJSON(`${apiURL}/${categoryNumber}`)
return {
categoryNumber,
name: categoryData.name,
slug: categoryData.slug,
description: categoryData.description,
}
})
)
resolve(categories)
})
As shown above we first reduce all our blog posts to a list of categories that we need. This way we will not get information about the same category multiple times.
Once we have unique categories we fetch all of them from the API and resolve with the gained information.
4. List all needed assets
We have two types of images that we might care about:
- featured images used as teasers and header images for blogposts
- images within the body of posts
As a result of this step we are looking for an array that looks a bit like:
[{
link: 'link to wordpress iage.jpg',
description: 'describe the image',
title: 'and title it',
postId: 'because linking back is nice'
}, ...]
To create Assets in Contentful we need to pass in a link to get the Asset from as well as a title and description for the Asset. We already created these types of objects for all images in the body of our posts when we first processed them. Which leaves us the first case of features images and to create one nicely flattened array.
const generateAssetsList = (posts, baseUrl, simpleLog = console.log) =>
new Promise(async (resolve) => {
const apiURL = `${baseUrl.replace(/\/$/, '')}/wp-json/wp/v2/media`
simpleLog('Reducing posts to asset numbers')
let infosFetched = 0
// First add the featured_media images and get ther URLs.
const featuredAssets = await Promise.all(
posts
.reduce((all, post) => {
if (!post.featured_media) return all
return all.concat([
{
mediaNumber: post.featured_media,
postId: post.id,
},
])
}, [])
.map(async ({ mediaNumber, postId }, i, array) => {
const featuredMedia = await getJSON(`${apiURL}/${mediaNumber}`)
infosFetched += 1
simpleLog(`Getting info about assets ${infosFetched}/${array.length}`)
return {
mediaNumber,
link: featuredMedia.guid.rendered,
title: featuredMedia.title.rendered || `asset${i}`,
description: featuredMedia.alt_text || '',
postId,
}
// After all this we also add images from the body of posts.
})
)
const assets = featuredAssets.concat(
posts.reduce((all, post) => {
const images = post.bodyImages ? post.bodyImages : []
return all.concat(images)
}, [])
)
resolve(assets)
})
Here you can see how I did this one a bit later on and renamed log
to simpleLog
with a default to console.log
. Keep in mind that this is not meant to be perfect but working and inspiring code 😉
First we get the actual assets from Wordpress using their media numbers. After we get that we can also create objects representing featured assets and finally put them together with the body assets all the while doing some nice logging on the side.
Interlude: Contentful API
Good job 👍
We got halfway done. We have all the information we need to start creating assets and entries in Contentful. This is the perfect time to go and read up on the Management API. Or at least skim over Entry creation and Asset creation.
One thing you will notice for sure (since you went to those links and skimmed the code sample, didn't you?) is how Contentful always includes locales in their requests. In fact fields first have locale attributes which then hold the actual content.
Apart from that the API, especially with the SDK, is straight forward to use. Just keep in mind that we need to publish Assets and Entries after creating them.
5. Create and publish assets in Contentful
Ohh boy would it be fun to read all of this function right here. But let me save you from that and present you with some pseudo code while you can view the function in this gist.
const createAndPublishAssets = () => {
space = await getTheRightSpace()
const createAndPublishSingleAsset = () => {
let asset
try {
asset = await createAssetInContentful()
}
try {
asset = await asset.publish()
}
createAndPublishSingleAsset( nextAsset )
}
createAndPublishSingleAsset( firstAsset )
}
That should give you a feeling for the general workflow.
Once you let this part run, go and grab a hot chocolate because for our ~400 assets this took over ten minutes to finish.
6. Creating and publishing categories in Contentful
After mastering the Assets this will be a walk in the park. This code follows the same pattern as explained in the pseudo code for assets but is short enough to present you the entire thing right here:
const createAndPublishCategories = async (
categories,
spaceId,
managementToken,
simpleLog = console.log
) => {
const client = contentful.createClient({
accessToken: managementToken,
logHandler: (level, data) => simpleLog(`${level} | ${data}`),
})
const space = await client.getSpace(spaceId)
const createdCategories = await Promise.all(
categories.map(
(category) =>
new Promise(async (resolve) => {
let cmsCategory
try {
cmsCategory = await space.createEntry('blogCategory', {
fields: {
categoryName: {
'en-US': category.name,
},
},
})
} catch (e) {
throw Error(e)
}
try {
await cmsCategory.publish()
} catch (e) {
throw Error(e)
}
// Save mapping information to contentful.
cmsCategory.wpCategory = category
resolve(cmsCategory)
})
)
)
return createdCategories
}
Really not much magic involved. The important thing is to keep a reference between your Entries and Assets in Contentful on the one hand and the original posts and images from Wordpress on the other. I decided to achieve this by adding wpCategory
as an attribute of the Asset created in Contentful.
On to the final step!
7. Creating, linking and publishing blogposts
This is where it all comes together. Now that we have our images created as Assets in Contentful and our categories created as entries with the appropriate type our blogposts will tie all of this together.
Now keep in mind: we need to replace the links to our images with the one we got from the Assets we created in Contentful and we need to link the right category. Lets take a quick peek at linking entries through the Content Management SDK.
category: {
'en-US': {
sys: {
type: 'Link',
linkType: 'Entry',
id: categoryId
}
}
}
And with that piece of magic 🧙 in place, I present to you the creation of blogposts in this gist, because ain't nobody going to read a post with that much code in it.
Make the pieces work together
Now we have all the pieces to make our migration work.
I decided to create one script per step and have them work in two ways:
- From the commandline creating files in between
- Programatically together with listr as a taskrunner
This way I could easily run single steps during development and have a single command with pretty output as the final result.
Photo credits: Dog by Emerson Peters on Unsplash