Scalable Tagging System in CakePHP

Tag clouds are a great way to allow users to navigate/search through your site but if you don’t design your application correctly from the ground up they can become a serious nightmare.

So - here’s my solution for creating a scalable tagging system in CakePHP (using MySQL in this instance but the theory applies to any other DB… yes… even you PostGRE’ers)

My example is a cookbook of recipes.

Most people that attempt a tagging system the first time around just go for a HABTM relationship - A recipe has and belongs to many tags.  So - you have you “recipes”, “tags”, and “recipes_tags” tables… all you have to do to find your tags is:

  1.  
  2. $this->Tag->find(‘all’);
  3.  

And to find what tags go with what recipes (assuming you setup your models correctly) is:

  1. id(int)
  2. tag(varchar)
  3. count(int)
  4. created(datetime)
  5. modified(datetime)

I pretty much always add fields for “created” and “modified” and set them to datetime - Cake fills them in automatically and it can be useful sometimes to have that history (you never know when a client wants to see something organized by the date they were added to the system - kinda sucks not to capture that info). Also - always, always, always create an “id” field that’s a unique, auto-incrementing, primary key for EVERY table… don’t call it something silly like uid or user or anything stupid like that. It’s just best practices. Period.

But, I digress;

So - your “recipes” table should look something like this:

  1. id(int)
  2. title(varchar)
  3. description(text)
  4. ingredients(text)
  5. method(text)
  6. tags(text)
  7. created(datetime)
  8. modified(datetime)

Notice that the “tags” field here is a text field - this is where we are going to store the tags, as written, by the user as one big block of text.

So - whenever a user creates a recipe we’re going to save all of the pertinent data to the database and then run a function called generate_tags:

  1.  
  2. function generate_tags($id=null)
  3. {
  4.    //get recipes by $id
  5.    $recipe = $this->Recipe->findById($id);
  6.          
  7.    //get all tags in the db
  8.    $tagsOld = $this->Tag->find(‘all’);
  9.          
  10.    $tagsOldArray = array();
  11.    $tagsOldArrayValues = array();
  12.          
  13.         //loop thru tags in db and create an array like so: $tag = array (’id’=>$id,’count’=>$count)
  14.    foreach($tagsOld as $key => $values)
  15.    {
  16.    $tagsOldArray[$values[‘Tag’][‘tag’]] = array(‘id’=>$values[‘Tag’][‘id’],‘count’=>$values[‘Tag’][‘count’]) ;
  17.    }
  18.             
  19.    $tagsArray = explode(‘,’,str_replace(‘, ‘,‘,’,$recipe[‘Recipe’][‘tags’]));
  20.    foreach($tagsArray as $tag)
  21.    {
  22.    $tags[] = $tag;
  23.    }
  24.             
  25.    $countArray = array_count_values($tags);
  26.          
  27.    $tagsOldArrayValues = array_keys($tagsOldArray);
  28.  
  29.    foreach($countArray as $tag => $count)
  30.    {                          
  31.         if(in_array($tag, $tagsOldArrayValues) == true)
  32.         {
  33.              $this->Tag->create();
  34.         $this->Tag->query(‘INSERT INTO `tags` (`tag`) VALUES (\’.$tag.\’) ON DUPLICATE KEY UPDATE `count` = `count`+1′);
  35.          } else {
  36.         $this->Tag->create();
  37.         $this->Tag->saveField(‘tag’,$tag);
  38.         $this->Tag->saveField(‘count’,$count);
  39.          }
  40.    }
  41. }
  42.  

What’s going on here? Well - first we find the recipe that you’re editing (or just inserted). Then we get all of the tags in the db. We foreach through the $tagsOldArray to format the array so that you have a nested array for each tag that contains the tag’s id and current count. We then explode the $tagsArray (the tags that are stored as plain text for the recipe at hand). We loop through the $tagsArray and then count the values of the keys. We then use array_keys to return the keys of $tagsOldArray which is used to determine whether or not a tag is already in the database. If it is in the array we execute the following query:

  1. $this->Tag->query(‘INSERT INTO `tags` (`tag`) VALUES (\’`.$tag.`\’) ON DUPLICATE KEY UPDATE `count` = `count`+1′);

This updates the count of the tag value by 1. If the tag is in the array we insert the tag and count.

There’s probably a much more elegant way to do all of this - maybe one of these days I’ll pair this code down (I threw it together tonight for a project - and looking at it now I can see some obvious redundancy… but alas there’s not enough time in the world to optimize code and actually launch a website…).

But the theory is sound…

Now it’s your turn to play along - we have to remember to do is decrement the `Tag`.`count` for all tags associated to a recipe if a user deletes that recipe… and we have to update the counts for any tags a user edits when they edit a recipe… Not as big a deal as it sounds…

My next post will focus on the view for a tag cloud using this setup.