We finally have all the pieces in place to put the Porter stemmer together. But it’s been so long that I’ve certainly forgotten what comes next, so let’s take a moment to remember where we are going with this.
Earlier, I outlined the five steps that the stemmer performs:
- Get rid of plurals, -ed, and -ing, and turn -y to -i, so that the ending will be recognized as a suffix in later steps;
- Collapse multiple suffixes, such as -ational, -ator, -iveness, and others, to a single suffix, such as -ate, -ate, and -ive, respectively;
- Collapse a different set of multiple suffixes or remove a small set of single suffixes;
- Remove a set of suffixes including -ance, -ic, and -ive; and
- Remove final -e and change -ll to -l in some circumstances.
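To make the overall shape concrete, here is a minimal Python sketch of the idea, not the stemmer this series builds. It assumes each step is a function from a word to a word, and it fills in only a toy version of the second step using the three suffix pairs named above. The names `SUFFIX_MAP`, `collapse_suffix`, and `stem` are made up for illustration, and the sketch skips the conditions the real algorithm places on the stem that is left behind.

```python
# Hypothetical sketch: only the three suffix pairs mentioned in the list,
# with no check on the remaining stem (the real algorithm has conditions).
SUFFIX_MAP = [
    ("ational", "ate"),
    ("iveness", "ive"),
    ("ator", "ate"),
]


def collapse_suffix(word):
    """Replace the first matching suffix in SUFFIX_MAP with its shorter form."""
    for suffix, replacement in SUFFIX_MAP:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def stem(word, steps):
    """Run the word through each step function in order."""
    for step in steps:
        word = step(word)
    return word


if __name__ == "__main__":
    for w in ["relational", "operator", "decisiveness", "cat"]:
        print(w, "->", stem(w, [collapse_suffix]))
```

With all five steps written, the list passed to `stem` would simply hold five functions in the order given above.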
In the next posting, we’ll pick apart what needs to be done for step 1.
(Sorry this posting isn’t longer. I’m still taking a breath after the macro death march.)