Ein Baum und seine Abenteuer



Wordpress to Contentful migration

Contentful is one of the most prominent headless CMSs at the moment. As such it provides you with all the content management capabilities of traditional tools like WordPress while decoupling the delivery of that content, which is retrieved through an API.

Headless CMSs are quickly turning from the new kid on the block into everyone's favourite. And I totally understand why: it makes thinking about building a frontend so much easier, for once you actually know what is happening and we have great separation of concerns. For me as a webdev this is great stuff!

But it is also work. After all, these old and dusty WordPress sites need to be migrated to those new and shiny, Contentful powered, React based single page applications ✨ I have been building for clients. Luckily Contentful provides a range of tools to make this easier. While contentful-export and contentful-import can help you migrate Content Models and Entries from space to space, the Content Management API helps you manage your content. So today, let us take a look at how you can use it to migrate a blog from WordPress to Contentful.

Let's go on a journey.

You should understand this post as a guide with basic examples. The code presented here got my job done but will surely need adaptation for your use case.

Planning

Our end goal is to create blogposts, categories and assets in Contentful. For this we already have a content model set up. Many specifics of what follows will change according to the Content Model you are setting up.

Before we dive into the specifics of the APIs, let's make a plan for how to tackle this. The steps we are going to go through are:

  1. Get posts from WordPress
  2. Do some first processing on the posts
  3. Get categories from Wordpress
  4. Create a list of needed assets from Wordpress
  5. Create and publish assets in Contentful
  6. Create and publish categories in Contentful
  7. Create, link and publish blogposts.

1. Getting posts from Wordpress

To achieve this we will use the JSON based REST API Wordpress provides by default.

Using this we will first crawl all pages for posts, making requests to wp-json/wp/v2/posts?page=[pageNumber] until we get a 400 back.

const https = require('https')

const exportBlogposts = (apiUrl, log) =>
  new Promise((resolve, reject) => {
    const exportPageOfPosts = (page = 1, allPosts = []) => {
      log(`Getting posts for page ${page}`)
      const url = `${apiUrl}?page=${page}`
      https
        .get(url, (res) => {
          // When we get a 400 back we went one page over those with posts.
          // So we are done now.
          if (res.statusCode === 400) {
            res.resume() // discard the body so the socket is released
            return resolve(allPosts)
          }
          let result = ''

          res.on('data', (d) => {
            result += d.toString()
          })

          res.on('end', () => {
            try {
              const blogPosts = JSON.parse(result)
              exportPageOfPosts(page + 1, allPosts.concat(blogPosts))
            } catch (e) {
              reject(new Error(`Error while parsing page ${page}: ${e.message}`))
            }
          })
        })
        .on('error', (e) => {
          // Reject instead of throwing: a throw inside an event handler
          // cannot be caught by whoever awaits this promise.
          reject(new Error(`Error while exporting blogposts: ${e.message}`))
        })
    }
    exportPageOfPosts()
  })

I am passing in a log function here because I tied all pieces together using listr and wanted nicer integration for logs.

Once this function resolves we will have an array containing all blogposts from our Wordpress blog and are ready to do some first processing on them.

2. Prepare posts for usage

I generally like to prepare data for myself so that in later stages and during debugging I have a better overview. WordPress gives us a lot of information that we don't really need, so let's get rid of that.

At the same time there is some information we would like that this API call has not given us, mainly the images that are present within the post. We could make more API requests to get this information. For this solution, however, we decided to extract image source URLs and alt texts from the post's body using regular expressions.

const transformPosts = (posts) =>
  posts.map((post) => {
    delete post._links
    delete post.guid
    delete post.excerpt
    delete post.author
    delete post.comment_status
    delete post.ping_status
    delete post.template
    delete post.format
    delete post.meta
    delete post.status
    delete post.type
    post.publishDate = post.date_gmt + '+00:00'
    delete post.date_gmt
    delete post.date
    delete post.modified
    delete post.modified_gmt
    delete post.tags
    delete post.sticky
    post.body = `<div>${post.content.rendered}</div>`
    delete post.content
    post.title = post.title.rendered
    // post.id, post.slug and post.featured_media are kept as they are.
    post.category = post.categories[0]
    delete post.categories
    return extractBodyImages(post)
  })

const extractBodyImages = (post) => {
  const regex = /<img.*?src="(.*?)"[\s\S]*?alt="(.*?)"/g
  post.bodyImages = []
  let foundImage
  while ((foundImage = regex.exec(post.body))) {
    const alt = foundImage[2] ? foundImage[2].replace(/_/g, ' ') : ''
    post.bodyImages.push({
      link: foundImage[1],
      description: alt,
      title: alt,
      postId: post.id,
    })
  }
  return post
}

In the above code we move some information around and delete the rest, as a simple clean up and to have a nicer representation later on. Wrapping the body in a <div> tag is a curiosity of our system down the line: we use marked to generate the output on the final website, and marked has trouble if there is no single top level element.
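One thing to watch out for: the regular expression in extractBodyImages expects src to appear before alt inside each tag. If your WordPress markup is less predictable, something like the following might be a starting point. It is only a sketch, not what I ran for the migration. The names getAttribute and extractImages are made up for this example, and it assumes attribute values are double quoted and contain no > character.

```javascript
// Pull a single attribute value out of one <img ...> tag, or '' if absent.
// The leading \s keeps "data-src" from being mistaken for "src".
const getAttribute = (tag, name) => {
  const match = new RegExp(`\\s${name}="([^"]*)"`).exec(tag)
  return match ? match[1] : ''
}

// Find every <img> tag first, then read src and alt independently,
// so the order of the attributes no longer matters.
const extractImages = (html, postId) => {
  const tags = html.match(/<img\b[^>]*>/g) || []
  return tags
    .map((tag) => ({
      src: getAttribute(tag, 'src'),
      alt: getAttribute(tag, 'alt'),
    }))
    .filter(({ src }) => src !== '')
    .map(({ src, alt }) => {
      const text = alt.replace(/_/g, ' ')
      return { link: src, description: text, title: text, postId }
    })
}
```

The returned objects have the same shape as the entries in post.bodyImages, so they could be dropped into the later steps unchanged.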

3. Get categories from Wordpress

We will be making some more requests to the REST API. To make the following code easier I have a helper function called getJSON that, given a URL, resolves with an object representing the JSON present at that URL.
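The helper itself is not shown in this post. A minimal sketch of what such a getJSON could look like, using only Node's https module, is below. Treat it as an assumption about its shape rather than the original helper: parseJSONResponse is a name I made up for this example, and redirects are not followed.

```javascript
const https = require('https')

// Turn a finished HTTP response into a parsed object, or throw with context.
const parseJSONResponse = (url, statusCode, body) => {
  if (statusCode < 200 || statusCode >= 300) {
    throw new Error(`Request to ${url} failed with status ${statusCode}`)
  }
  return JSON.parse(body)
}

// Hypothetical stand-in for the getJSON helper mentioned above.
const getJSON = (url) =>
  new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = ''
        res.setEncoding('utf8')
        res.on('data', (chunk) => {
          body += chunk
        })
        res.on('end', () => {
          try {
            resolve(parseJSONResponse(url, res.statusCode, body))
          } catch (e) {
            // Covers both bad status codes and invalid JSON.
            reject(e)
          }
        })
      })
      .on('error', reject)
  })
```

Rejecting on non-2xx responses means a missing category or media item surfaces as an error instead of silently producing an object without a name or slug.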

const generateCategoriesList = (posts, baseUrl, simpleLog = console.log) =>
  new Promise(async (resolve) => {
    const apiURL = `${baseUrl.replace(/\/$/, '')}/wp-json/wp/v2/categories`
    // First reduce posts to an array of category numbers.
    simpleLog('Reducing posts to category numbers')
    const categories = await Promise.all(
      posts
        .reduce((all, post) => {
          if (!post.category) return all
          if (all.indexOf(post.category) > -1) return all
          return all.concat([post.category])
        }, [])
        .map(async (categoryNumber) => {
          simpleLog(`Getting information about categories`)
          const categoryData = await getJSON(`${apiURL}/${categoryNumber}`)
          return {
            categoryNumber,
            name: categoryData.name,
            slug: categoryData.slug,
            description: categoryData.description,
          }
        })
    )
    resolve(categories)
  })

As shown above we first reduce all our blog posts to a list of categories that we need. This way we will not get information about the same category multiple times.

Once we have unique categories we fetch all of them from the API and resolve with the gained information.
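If you prefer something shorter than the reduce, the deduplication step on its own could be sketched with a Set. The function name uniqueCategoryIds is made up for this example. It assumes, like the code above, that each post carries a single category number in post.category.

```javascript
// Collect every category number once, skipping posts without a category.
// A Set keeps insertion order, so the result follows the order of the posts.
const uniqueCategoryIds = (posts) => [
  ...new Set(posts.map((post) => post.category).filter(Boolean)),
]
```

For example, posts with the categories 3, 5 and 3 again would result in two requests instead of three.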

4. List all needed assets

We have two types of images that we might care about:

  1. featured images used as teasers and header images for blogposts
  2. images within the body of posts

As a result of this step we are looking for an array that looks a bit like:

[{
  link: 'link to wordpress image.jpg',
  description: 'describe the image',
  title: 'and title it',
  postId: 'because linking back is nice'
}, ...]

To create Assets in Contentful we need to pass in a link to get the Asset from as well as a title and description for the Asset. We already created these types of objects for all images in the body of our posts when we first processed them. That leaves us with the first case, the featured images, and with creating one nicely flattened array.

const generateAssetsList = (posts, baseUrl, simpleLog = console.log) =>
  new Promise(async (resolve) => {
    const apiURL = `${baseUrl.replace(/\/$/, '')}/wp-json/wp/v2/media`
    simpleLog('Reducing posts to asset numbers')
    let infosFetched = 0

    // First add the featured_media images and get their URLs.
    const featuredAssets = await Promise.all(
      posts
        .reduce((all, post) => {
          if (!post.featured_media) return all
          return all.concat([
            {
              mediaNumber: post.featured_media,
              postId: post.id,
            },
          ])
        }, [])
        .map(async ({ mediaNumber, postId }, i, array) => {
          const featuredMedia = await getJSON(`${apiURL}/${mediaNumber}`)
          infosFetched += 1
          simpleLog(`Getting info about assets ${infosFetched}/${array.length}`)
          return {
            mediaNumber,
            link: featuredMedia.guid.rendered,
            title: featuredMedia.title.rendered || `asset${i}`,
            description: featuredMedia.alt_text || '',
            postId,
          }
        })
    )
    // After all this we also add images from the body of posts.
    const assets = featuredAssets.concat(
      posts.reduce((all, post) => {
        const images = post.bodyImages ? post.bodyImages : []
        return all.concat(images)
      }, [])
    )

    resolve(assets)
  })

You can tell that I wrote this one a bit later: log is now called simpleLog and defaults to console.log. Keep in mind that this is not meant to be perfect but working and inspiring code 😉

First we get the actual assets from Wordpress using their media numbers. After we get that we can also create objects representing featured assets and finally put them together with the body assets all the while doing some nice logging on the side.

Interlude: Contentful API

Good job 👍

We got halfway done. We have all the information we need to start creating assets and entries in Contentful. This is the perfect time to go and read up on the Management API. Or at least skim over Entry creation and Asset creation.

One thing you will notice for sure (since you went to those links and skimmed the code sample, didn't you?) is how Contentful always includes locales in their requests. In fact fields first have locale attributes which then hold the actual content.
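To make that locale nesting concrete, here is a small sketch of a helper that wraps plain values into the locale keyed shape. The name localizeFields is made up for this example, and en-US is only assumed as the default locale because it is the one used later in this post. Your space may use a different one.

```javascript
// Assumption: the space's default locale is en-US.
const DEFAULT_LOCALE = 'en-US'

// Wrap every field value in an object keyed by locale,
// e.g. { title: 'Hello' } becomes { title: { 'en-US': 'Hello' } }.
const localizeFields = (fields, locale = DEFAULT_LOCALE) =>
  Object.fromEntries(
    Object.entries(fields).map(([name, value]) => [name, { [locale]: value }])
  )
```

The result is what goes under the fields key when creating an Entry or Asset.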

Apart from that the API, especially with the SDK, is straightforward to use. Just keep in mind that we need to publish Assets and Entries after creating them.

5. Create and publish assets in Contentful

Ohh boy would it be fun to read all of this function right here. But let me save you from that and present you with some pseudo code while you can view the function in this gist.

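// Pseudo code: getTheRightSpace and createAssetInContentful are placeholders.
const createAndPublishAssets = async (assets) => {
  const space = await getTheRightSpace()
  const publishedAssets = []

  // Work through the assets one at a time: create first, then publish.
  for (const wpAsset of assets) {
    let asset
    try {
      asset = await createAssetInContentful(space, wpAsset)
    } catch (e) {
      throw Error(`Could not create asset: ${e.message}`)
    }
    try {
      asset = await asset.publish()
    } catch (e) {
      throw Error(`Could not publish asset: ${e.message}`)
    }
    publishedAssets.push(asset)
  }

  return publishedAssets
}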

That should give you a feeling for the general workflow.

Once you let this part run, go and grab a hot chocolate because for our ~400 assets this took over ten minutes to finish.
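The part that differs most from creating an Entry is the file field of the Asset. As a sketch, and not the code from the gist, the payload for one of the asset objects from step 4 could be built like this. The name buildAssetPayload and the small content type table are assumptions for this example, as is the en-US locale.

```javascript
const path = require('path')

// Small lookup of MIME types for common image extensions; extend as needed.
const CONTENT_TYPES = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
  '.webp': 'image/webp',
}

// Build the localized fields for one asset from the list created in step 4.
const buildAssetPayload = (wpAsset, locale = 'en-US') => {
  // Use only the path of the URL so query strings do not end up in the name.
  const fileName = path.basename(new URL(wpAsset.link).pathname)
  const contentType =
    CONTENT_TYPES[path.extname(fileName).toLowerCase()] ||
    'application/octet-stream'
  return {
    fields: {
      title: { [locale]: wpAsset.title },
      description: { [locale]: wpAsset.description },
      file: {
        // Contentful fetches the file from the URL given as upload.
        [locale]: { contentType, fileName, upload: wpAsset.link },
      },
    },
  }
}
```

With the SDK this payload would be handed to createAsset. The created asset then has to be processed (processForAllLocales) before publish will succeed, which is part of why this step takes a while.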

6. Creating and publishing categories in Contentful

After mastering the Assets this will be a walk in the park. This code follows the same pattern as explained in the pseudo code for assets but is short enough to present you the entire thing right here:

const contentful = require('contentful-management')

const createAndPublishCategories = async (
  categories,
  spaceId,
  managementToken,
  simpleLog = console.log
) => {
  const client = contentful.createClient({
    accessToken: managementToken,
    logHandler: (level, data) => simpleLog(`${level} | ${data}`),
  })
  const space = await client.getSpace(spaceId)
  // An async callback already returns a promise, so an error thrown here
  // rejects the surrounding Promise.all instead of going unhandled.
  const createdCategories = await Promise.all(
    categories.map(async (category) => {
      const cmsCategory = await space.createEntry('blogCategory', {
        fields: {
          categoryName: {
            'en-US': category.name,
          },
        },
      })
      await cmsCategory.publish()

      // Keep the original WordPress category on the returned entry object
      // so posts can be mapped to it later.
      cmsCategory.wpCategory = category
      return cmsCategory
    })
  )
  return createdCategories
}

Really not much magic involved. The important thing is to keep a reference between your Entries and Assets in Contentful on the one hand and the original posts and images from WordPress on the other. I decided to achieve this by adding wpCategory as an attribute of the Entry object returned by Contentful.

On to the final step!

7. Creating, linking and publishing blogposts

This is where it all comes together. Now that we have our images created as Assets in Contentful and our categories created as entries with the appropriate type our blogposts will tie all of this together.

Now keep in mind: we need to replace the links to our images with the ones we got from the Assets we created in Contentful, and we need to link the right category. Let's take a quick peek at linking entries through the Content Management SDK.

category: {
    'en-US': {
        sys: {
            type: 'Link',
            linkType: 'Entry',
            id: categoryId
        }
    }
}

And with that piece of magic 🧙 in place, I present to you the creation of blogposts in this gist, because ain't nobody going to read a post with that much code in it.
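To give at least a taste of it, the two ingredients named above, building a link and swapping image URLs in the body, can be sketched as two small helpers. These are not taken from the gist. The names linkTo and replaceImageLinks are made up, en-US is assumed as locale, and urlMap is assumed to be an object you build yourself from the WordPress link of each asset to the URL of the published Contentful Asset.

```javascript
// Build a localized link field pointing at an Entry (default) or an Asset.
const linkTo = (id, linkType = 'Entry', locale = 'en-US') => ({
  [locale]: { sys: { type: 'Link', linkType, id } },
})

// Replace every occurrence of each WordPress image link in the post body.
// split/join avoids having to escape URLs for use in a regular expression.
const replaceImageLinks = (body, urlMap) =>
  Object.entries(urlMap).reduce(
    (html, [wpLink, contentfulUrl]) => html.split(wpLink).join(contentfulUrl),
    body
  )
```

A post's fields could then contain category: linkTo(categoryId) and, for a featured image, something like linkTo(assetId, 'Asset'), depending on your content model.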

Make the pieces work together

Now we have all the pieces to make our migration work.

I decided to create one script per step and have them work in two ways:

  1. From the command line, creating files in between
  2. Programmatically, together with listr as a task runner

This way I could easily run single steps during development and have a single command with pretty output as the final result.


Photo credits: Dog by Emerson Peters on Unsplash

Author

Portrait picture of Hendrik

I am a JavaScript and GenAI Enthusiast; developer for the fun of it!
Here I write about webdev, technology, personal thoughts and anything I find interesting.
