I recently needed to clean up a taxonomy vocabulary at work. The problem stemmed from messy data in a spreadsheet that would be the source for an ongoing Feeds import. This meant that I needed to provide a solution that allowed the clients to:
- Remove orphaned terms on an ongoing basis.
- Remove multiple terms at a time.
- See the terms they were removing, so the offending data could be fixed in the imported file as well.
My first step is always to see whether an existing module would meet my needs. My Google-fu occasionally fails me, so if there is a module that addresses this problem that I haven't mentioned here, please mention it in a comment! I found two modules: Taxonomy Delete all Terms and Taxonomy Orphanage.
(I ran across a third module, Taxonomy Manager (https://drupal.org/project/taxonomy_manager), but the issue queue indicates that it does not have this functionality, so I didn't consider it.)
I started with Taxonomy Delete all Terms, as it seemed most likely to meet my needs. I grabbed it, enabled it with Drush, and gave it a whirl in my sandbox. It seems to be a fairly powerful and flexible module, but it is very developer-oriented, and I was looking for something more user-friendly.
Taxonomy Orphanage was next. This is a nice lightweight module; however, the admin interface does not provide any way to see which terms are orphaned, or even how many terms are being deleted, so I moved on.
In my googling, I had run across the idea of using a view plus Views Bulk Operations (VBO), and there are even existing tutorials for this. I liked the idea of using modules that were already on the site, as my goal is always to keep a site as lightweight as possible. The existing tutorial is here: https://drupal.org/node/1472912. It wasn't going to work for me exactly as written, and I had to change some config to suit my needs, so here is the modified version.
- Create a new view. I named mine Taxonomy Term Cleanup.
- Display: choose taxonomy term. Choose the vocabulary that you wish to clean up. Add the page title, select create page, give it a path (taxonomy-cleanup), and click create and edit.
- In the Relationships section, choose Taxonomy Term: Content using “name of your vocabulary” from the list of options. The identifier in my case was field_subject, the field referencing my chosen vocabulary.
- In the Fields section, add Content: nid and Bulk Operations: Taxonomy Term. (There will already be a Term: name field; leave it as is.) Label the Content: nid field something like "Number of References". Label the Bulk Operations field "select". Ensure that “enable select all items” and “display processing result” are checked. In the selected operations, choose the delete operation. Check delete without confirmation if you want to delete the terms without an intermediate confirmation page.
- On the left side, in the Other section next to Use aggregation, click No and enable aggregation.
- Click on the aggregation settings beside the Content: nid field and choose COUNT from the aggregation type drop-down. This gives a count of all nodes that reference a specific term.
- Now add a filter on Content: nid, select Count in the aggregation settings, and choose the operator "is equal to" with a value of 0. This restricts the view to terms that are not referenced by any nodes.
- My next step was to format the view as a table and, in the settings, make the term titles sortable.
- The last, and very important, step was to add an access restriction to the view. You can restrict by role or by permission, whichever is more efficient for your site.
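To make the aggregation steps above a little more concrete, here is a minimal sketch of the kind of query the view ends up expressing: join terms to the content that references them, count the nodes per term, and keep only the terms whose count is 0. This is not the SQL that Views actually generates. The table and column names (term, node, field_subject) are simplified stand-ins I made up for illustration, not the real Drupal schema, and the sample data is hypothetical. It runs against an in-memory SQLite database using Python's standard library.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
# The real Drupal tables have different names and many more columns.
SCHEMA = """
CREATE TABLE term (tid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE field_subject (nid INTEGER NOT NULL, tid INTEGER NOT NULL);
"""

# Rough equivalent of the view configuration:
#   relationship            -> LEFT JOINs from term to the referencing nodes
#   COUNT on Content: nid   -> COUNT(node.nid) with GROUP BY on the term
#   filter "count equals 0" -> HAVING COUNT(node.nid) = 0
ORPHAN_QUERY = """
SELECT term.tid, term.name, COUNT(node.nid) AS reference_count
FROM term
LEFT JOIN field_subject ON field_subject.tid = term.tid
LEFT JOIN node ON node.nid = field_subject.nid
GROUP BY term.tid, term.name
HAVING COUNT(node.nid) = 0
ORDER BY term.name
"""


def find_orphaned_terms(conn):
    """Return (tid, name) for every term that no node references."""
    return [(tid, name) for tid, name, _count in conn.execute(ORPHAN_QUERY)]


def build_sample_db():
    """Create an in-memory database with made-up terms and nodes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO term (tid, name) VALUES (?, ?)",
        [(1, "History"), (2, "Histroy"), (3, "Science"), (4, "Sceince")],
    )
    conn.executemany(
        "INSERT INTO node (nid, title) VALUES (?, ?)",
        [(10, "First article"), (11, "Second article")],
    )
    # Only the correctly spelled terms are referenced by content.
    conn.executemany(
        "INSERT INTO field_subject (nid, tid) VALUES (?, ?)",
        [(10, 1), (11, 1), (11, 3)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for tid, name in find_orphaned_terms(build_sample_db()):
        print(tid, name)
```

The LEFT JOINs matter here: an inner join would drop unreferenced terms before they could be counted, so nothing would ever come back with a count of 0.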
Now I have an easily accessible and highly user-friendly solution that meets all of my needs. The client can remove multiple orphaned terms while still being able to see where the problem is in their original data, so it can be cleaned up at the source as well.