WP-CLI batch edit thousands of posts at once

A walkthrough of making a bash script to automate editing categories on thousands of WordPress blog posts using WP-CLI commands.

In this article, I will walk through how I made a bash script using WP-CLI to edit categories on thousands of posts at once, showing you the script at various stages of revision. Hopefully, this will demonstrate the problem-solving process, so that it may help you solve something like this quicker than me.

The issue at hand

One of my WordPress clients wanted to reorganise a bunch of their posts into different categories. They have a medium size website with about 20k posts.

The client’s post categories are structured as a parent with two sub-categories – let us call them “A”, “B” and “C” respectively. All the posts in the parent category “A” and sub-categories “B” and “C” total about 5,000 posts. One of these sub-categories, “B”, is rarely used (about 200 posts). The client wants to merge both of the sub-categories, “B” and “C”, back into their parent category, “A”. Their existing setup presents us with a few problems and considerations:

  1. Manually editing thousands of posts in the WordPress admin dashboard would take a very long time.
    WordPress Admin Dashboard allows bulk/batch editing of 20 posts per action. To do what the client wants using WordPress built-in tools, they would have to spend days clicking these buttons.
  2. Must add the parent category to the sub-category posts. 
    About 3.5k posts were assigned to a sub-category (“C”) but did not have its parent category attached. When removing this sub-category in WordPress, the post would have no category association. One must add the parent category to these posts.
  3. Broken links.
    Since the URL’s for WordPress posts on my client’s website are post-names and not based on categories, this won’t be much of a problem. The only redirects will be for the category archive pages that are removed. Elsewhere, on other WordPress sites were URL’s would include categories, this would negatively impact SEO and links of older articles. The old category URL’s would need to be redirected to the new URL for this project to be completed properly, which may require the use of a plugin or rewrite rules in your theme files, however that is outside the scope of this post.

The approach used in this walkthrough

What you will need to follow this article:

  • Access to the command line/terminal and user for WP-CLI on your website.
  • Names of the categories you want to change.

This post will talk you through my process for tackling these problems. I will use bash for everything and aim for readability in my code. Note to the bash nerds: efficiency / most optimised solution isn’t exactly an issue here because we are already saving the client on having to do it manually. So, I’d rather make it very easy for someone to read this post and maybe use it as a guide to approaching this in their way.

Why not execute an SQL command on the WordPress database to achieve the same result?

Writing and executing a custom SQL query is totally acceptable. By all means, go for it yourself if you prefer, there are many ways to solve problems!

In either case, be sure to make a backup of your database before you operate on it. Mistakes that lead to corrupted data can have their damage undone when you recover from this backup. Development and staging environments serve as testing grounds anyway, make it easy for yourself to experiment with things!

I want to show how to solve this using WP-CLI because I enjoy using command line tools and I am trusting in the WP-CLI API interworkings. An API endorsed by WordPress should do everything the way WordPress expects it to so that the data and relationships are correctly updated. The only possible faults using WP-CLI should be from mistakes you make when calling the commands.

Let’s start solving – gathering data

Let’s start by capturing all of our posts by the category we want. I love plaintext; it’s easy to manipulate. So let’s gather all the post ids into a big list.

wp post list --category='Category Name' --format=ids

This command returns a list to standard output. Useful. We could store it in a variable, or redirect it to a file.

wp post list --category='Category Name' --format=ids > post_ids.txt

This command outputs a plaintext file with our post ids separated by spaces. Excellent! With thousands of ids, this may cause an issue with some tools that have line/character length limits. For the moment, let’s plough ahead with our code sketch and use this list of post ids to add the new category and remove the old category.

wp post term add <id> <taxonomy> <term>

WordPress treats categories as term taxonomies, as it does with tags. (If you are following this post and want to add or remove tags instead of categories, you can simply replace category with post_tag in these commands.)

wp post term add <post_id> category <new category slug> --quiet

Simple. I replace the id with our post id, the taxonomy with category and the term with the slug of our new category. I also add the --quiet flag, so we don’t fill up our terminal with junk when we process thousands of these. Next is an example of how to remove a category from a post.

wp post term remove <post_id> category <unwanted category slug> --quiet

Fantastic. Just had to replace “add” with “remove” and change the slug to the one we want to remove.

Put it all in a script.

Make a new file. Make this file into a script by adding the following text to the first line of the file:

#!/bin/bash

We are working on lots of posts, so the first thing we can do is make a simple loop in bash. Here is the quick sketch to get me started, which of course doesn’t work – so do not copy it! Skip to the end of the article if you want to copy the finished thing.

for i in post_ids.txt; do
  wp post term add $i category new_category --quiet;
  wp post term remove $i category old_category --quiet;
done;

That looks nice and easy, but it won’t work, it is first draft pseudo-code where I am putting my intention down. Let’s fix it because we need to write bash and not some kind of broken pseudo code.

I tend to block things out quickly and iterate on them. In the first line of the for loop, we intend for bash to open post_ids.txt and iterate through it. To achieve this, we must provide id’s instead of simply a file name. One way to provide suitable data to iterate over is to execute the cat command on the file, which produces the file contents as a string in place within the for loop.

for i in $(cat post_ids.txt); do

This line of code may look odd to anyone who uses bash, as well as anyone new to bash. The $(cat post_ids.txt) part is executing the cat command on our file, which gives us the output from the file to use as if it was a big string. Bash easily interprets spaces as a separator, so our space-separated list of post ids is ideal. There are certainly many ways of doing this. However, I find this way to be nicely readable. The slowest part of our script will likely be the accessing and updating of information through the API, not this. If you are curious, you could profile it.

You can also run it without creating a file by calling the wp post list command in the for loop instead of the cat command, since it returns the space seperated list anyway. That would work nicely if we are planning on using this for loop to cover the entire lot. However, I’m planning on doing something peculiar with that file of id’s a bit later!

Next, we step inside the for loop, do our thing, and end the for loop with the done keyword.

Let’s save our file as update_cats.sh. Now change permissions so we can execute it as a script:

chmod +x update_cats.sh

Add in a short description of the script as a comment, and the date you made it. Your file should now look something like this:

#!/bin/bash
# Get posts by category and modify categories,
# + adding <new_category> and removing <old_category>
# Date: September 2019

for i in $(cat post_ids.txt); do
  wp post term add $i category new_category --quiet;
  wp post term remove $i category old_category --quiet;
done;

Let’s flesh it out a little more with variables for categories names, and a little echo statement for us to track the speed in real-time.

#!/bin/bash
# Get posts by category and modify categories,
# + adding <new_category> and removing <old_category>
# Date: September 2019

# store category names for easy reuse
old_category='old-category-slug';
new_category='new-category-slug';
# files we are working with
working_path="$(pwd)/scratch"
post_ids_path="$working_path/post_ids.txt"

# make a list of post id's that match our category
wp post list --category=$old_category --format=ids > post_list.log;

# loop through our list of ids and replace old category with new category
echo "Please wait; this may be slow, working...";
for i in $(cat $post_ids_path) do;
  wp post term add $i category $new_category --quiet;
  wp post term remove $i category $old_category --quiet;
  echo -ne '.'; # indicate progress
done;

echo "Done";

If you run this, you may find that it is slow! It runs slow on my local web dev VM at least. We could leave it running all night long, except I don’t want to wait. I want to see if I can speed things up a bit.

How can we speed things up?

Currently, we are using WP-CLI, which is loading WordPress core, plus our plugins and theme code. We can skip all of this functionality; our case does not require anything handled by theming or plugins. Skipping plugins and themes will make a significant difference to our speed.

To do so, we add these flags to our WP-CLI commands: --skip-plugins and --skip-themes. So now our for loop looks like this:

# ...
for i in $(cat $post_ids_path) do;
  wp post term add $i category $new_category --quiet --skip-plugins --skip-themes;
  wp post term remove $i category $old_category --quiet --skip-plugins --skip-themes;
  echo -ne '.'; # indicate progress
done;
# ...

Another way we might get some kind of improvement is by splitting the id list into smaller lists and run our commands on them all at once. This way, we are using more memory and more CPU threads for executing our script (many things at once), so it should reduce the total time required to complete. Then you can finish work before 6 pm and go home satisfied with a job well done! Faster is better.

Ok, let’s split our list, for we have a life to live! Let the experiment begin.

# ...

#
# split the post ids file into a few files
#
prepare_batches() {
  log "Preparing batches"
  let count=1
  let batch=1

  for id in $(cat $post_ids_path); do
    if [ $(($count % 100)) = 0 ]; then
      log "Prepared batch $batch."
      ((batch++));
      log -ne "Batch $batch"
    fi
    echo -ne " $id" >> $working_path/post_id_batch_$batch.txt
    ((count++))

    #  progress indicator
    log -ne "."
  done

  log "Batches prepared."
}

# ...

Why perform this on the live production server and not on a database server?

This client operates entirely on a small production server and makes changes to their site regularly. This is common for small-medium size clients and blogs. Happily, nothing untoward happened and the resource consumption of the script did not affect the quality of service for any website users.

The following iteration of the script was the state of the script when I executed it on our server. I tested the script in local development, then staging environments first. Then I made a backup of the database on the live server before executing the script.

#!/bin/bash
# Get posts by category and modify categories,
# + adding <new_category> and removing <old_category>
# Date: September 2019

# store category names for easy reuse
old_category='old'
new_category='new'
working_path="$(pwd)/scratch"
post_ids_path="$working_path/post_ids.txt"
log_path="reorganise_cats.log"
echo "Working path: $working_path"
echo "Post ids path: $post_ids_path"
verbose=true
max_batches_at_once=5

#
# echo console log std out
#
log() {
  echo $@ >> $log_path
  if [ $verbose = true ]; then
    echo $@
  fi
}


clean() {
  read -n1 -p "Clean working path [y/n]?\n" ans
  if [ $ans = 'y' ] ; then
    log "Cleaning."
    rm -r $working_path
  else
    log "'$ans' is not 'y'. Skipping clean."
  fi
}


#
# split the post ids file into a few files
#
prepare_batches() {
  log "Preparing batches"
  let count=1
  let batch=1

  for id in $(cat $post_ids_path); do
    if [ $(($count % 100)) = 0 ]; then
      log "Prepared batch $batch."
      ((batch++));
      log -ne "Batch $batch"
    fi
    echo -ne " $id" >> $working_path/post_id_batch_$batch.txt
    ((count++))

    #  progress indicator
    log -ne "."
  done

  log "Batches prepared."
}


#  loop through our list of ids
#+ replace old category with new category
#+ indicate time for processing
batch_proc() {
  local batch_start_time="$(date -u +%s)"

  log "Processing batch $1"
  for i in $(cat $1); do
    local start_time="$(date -u +%s)"

    # The actual commands I want to run
    wp post term add $i category $new_category --quiet --skip-plugins --skip-themes
    wp post term remove $i category $old_category --quiet --skip-plugins --skip-themes

    local end_time="$(date -u +%s)"
    local elapsed="$(($end_time-$start_time))"
    # indicate progress
    log "[ Post Updated ] Batch: $1 - ID: $i - Time to complete: $elapsed seconds"
  done;

  local batch_end_time="$(date -u +%s)"
  local batch_elapsed="$(($batch_end_time-$batch_start_time))"
  log "[ Batch Complete ] Batch: $1 - Time to complete: $batch_elapsed seconds"
}


#
#  Main script - call the batch processing function
#+ on each file in generated batch
#
main() {

  clean

  mkdir $working_path && log "Making directory"
  touch $post_ids_path

  # make a list of posts to operate on
  wp post list --category_name="$old_category" --format=ids > $post_ids_path

  # split our list into smaller lists
  prepare_batches

  read -n1 -p "process batch $f [y/n]? " ans
  if [ $ans = 'y' ] ; then
    log "Continuing..."
    local script_start_time="$(date -u +%s)"
  else
    log "Quiting. Goodbye!"
    exit
  fi

  # begin processing the lists. this may take several minutes
  count=1
  for f in $(ls $working_path/post_id_batch_*.txt); do

    batch_proc "$f" &

    # limit the number of batches to operate on at once.
    if [ $(($count%$max_batches_at_once)) = 0 ] ; then
      log "Waiting for batches..."
      wait
    fi

    ((count++))
  done;


  local script_end_time="$(date -u +%s)"
  local script_elapsed="$(($script_end_time-$script_start_time))"
  log "[ Complete ] Time to complete: $script_elapsed seconds"
}

main

log "Done";

In the script above I’ve added rudimentary logging and time tracking so I can get more information on the processing of these posts.

This script did the job in about 20 minutes on a small-tier server instance running Ubuntu with PHP 7.2 and MYSQL 5.7. All the while, it only consumed ~50% CPU resources, and our web server was able to serve the website without issue. We are saving the client hours, if not days, of needlessly repetitive clicking to change the structure of their posts.

Further development

If you want to take the script further, you could try making it reusable and generic. You could also try using different techniques to maximise efficiency.

One way to make it more generic is by replacing parts of the script with command line parameters. WP-CLI commands in the wp post list command and batch_proc() function would be the right places to start.

Author: Darren Kearney

Interested in computers, programming, games, psychology and music.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.