Wednesday, February 16, 2011

Pluralize English Nouns

By KodefuGuru, 9 December 2010 18:56
It’s a common problem: you have a noun that needs to be pluralized (or singularized!), so you end up rolling a custom solution or settling for the dreaded “(s)” suffix. I recently discovered that it is in fact easy: the .NET Framework will pluralize or singularize nouns for you.
I stumbled upon this after watching Jim Wooley’s presentation on Entity Framework 4. I was aware that EF4 got rid of the silly EntitySet name structure and had an intelligent way of pluralizing the entity sets, but I never really questioned it before. I had to discover what they were doing so I could use it myself.
The Entity Framework team did the right thing and exposed the requisite class. I disagree with its location given how general-purpose the class is, but it is part of the framework, so the first thing you must do is add a reference to System.Data.Entity.Design. Then, in your code, add a using directive for System.Data.Entity.Design.PluralizationServices.
[TestMethod]
public void EnglishPluralize()
{
    var service = PluralizationService.CreateService(CultureInfo.CurrentCulture);
    var plural = service.Pluralize("ninja");
    Assert.AreEqual("ninjas", plural);
}
 
[TestMethod]
public void EnglishSingularize()
{
    var service = PluralizationService.CreateService(CultureInfo.CurrentCulture);
    var singular = service.Singularize("ninjas");
    Assert.AreEqual("ninja", singular);
}
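The tests above rely on CultureInfo.CurrentCulture, so they only pass on a machine whose current culture is English. As a minimal sketch, assuming a .NET Framework 4 project that references System.Data.Entity.Design, the helper below requests English explicitly and reports what the service does with a word. The class name PluralizationDemo and the method Describe are made up for illustration.

```csharp
using System.Data.Entity.Design.PluralizationServices;
using System.Globalization;

// Hypothetical helper for illustration; not part of the framework.
public static class PluralizationDemo
{
    public static string Describe(string word)
    {
        // Ask for English explicitly so the result does not depend on
        // the culture the current thread happens to run under.
        var service = PluralizationService.CreateService(new CultureInfo("en-US"));

        // Pluralize and IsPlural are both members of PluralizationService.
        return string.Format(
            "{0} -> {1} (already plural: {2})",
            word,
            service.Pluralize(word),
            service.IsPlural(word));
    }
}
```

Calling PluralizationDemo.Describe with a handful of regular and irregular nouns is a quick way to see how far the built-in rules go before deciding whether to rely on them.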
Of course, the first thing I did after discovering this was try my hand at implementing a simple service for another language. I chose French because it was easy for me to find the rules, but be warned: this is far from complete. It handles only simple scenarios and ignores irregular nouns and the like. In the process of throwing words at it, I discovered a logic error that makes this solution incorrect for an entire class of words… can you spot it?
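using System.Data.Entity.Design.PluralizationServices;
using System.Diagnostics.Contracts;
using System.Linq;

public class FrenchPluralizationService : PluralizationService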
{
    public override bool IsPlural(string word)
    {
        Contract.Requires(!String.IsNullOrWhiteSpace(word));
 
        return word.EndsWith("eux") || word.EndsWith("eaux") || word.EndsWith("aux") ||
            word.EndsWith("aux") || word.EndsWith("s");
    }
 
    public override bool IsSingular(string word)
    {
        return !IsPlural(word);
    }
 
    public override string Pluralize(string word)
    {
        Contract.Requires(!String.IsNullOrWhiteSpace(word));
        
        // Pluralize each word of a multi-word phrase independently.
        var words = word.Trim().Split(' ');
        if (words.Length > 1)
        {
            return string.Join(" ", words.Select(w => Pluralize(w)));
        }
 
        if (IsPlural(word))
        {
            return word;
        }
 
        if (word.EndsWith("eu") || word.EndsWith("eau") || word.EndsWith("au"))
        {
            return word + "x";
        }
        else if (word.EndsWith("al"))
        {
            return word.Substring(0, word.Length - 2) + "aux";
        }
        else
        {
            return word + "s";
        }
    }
 
    public override string Singularize(string word)
    {
        Contract.Requires(!String.IsNullOrWhiteSpace(word));
 
        // Singularize each word of a multi-word phrase independently.
        var words = word.Trim().Split(' ');
        if (words.Length > 1)
        {
            return string.Join(" ", words.Select(w => Singularize(w)));
        }
 
        if (IsSingular(word))
        {
            return word;
        }
 
        if (word.EndsWith("eux") || word.EndsWith("eaux") || word.EndsWith("aux"))
        {
            return word.TrimEnd('x');
        }
        else if (word.EndsWith("aux"))
        {
            return word.Substring(0, word.Length - 2) + "l";
        }
        else if (word.EndsWith("s"))
        {
            return word.TrimEnd('s');
        }
 
        return word;
    }
}
Now that I had created a French pluralization service, the question turned to plugging it in. Without pointing the framework at it, I attempted to call the PluralizationService.CreateService() factory method with the French culture.
var service = PluralizationService.CreateService(new CultureInfo("fr-FR"));
And then I discovered the horrible truth… a NotImplementedException occurred.
We don't support locales other than english yet
I opened the method with reflection only to find that there’s no way around it: you cannot plug your own service in. If you’re going to use this, you will be forced to create your services for different languages in another manner.
I decided to directly initialize it.
[TestMethod]
public void FrenchPluralize()
{
    var service = new FrenchPluralizationService();
    var plural = service.Pluralize("le gâteau");
    Assert.AreEqual("les gâteaux", plural);
}
 
[TestMethod]
public void FrenchSingularize()
{
    var service = new FrenchPluralizationService();
    var singular = service.Singularize("les gâteaux");
    Assert.AreEqual("le gâteau", singular);
}
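Direct initialization works for one language, but with several custom services a small lookup can stand in for the extension point the factory lacks. The sketch below is one possible approach, not something the framework provides: PluralizationRegistry, Register, and Resolve are hypothetical names, and the class assumes a reference to System.Data.Entity.Design. It is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity.Design.PluralizationServices;
using System.Globalization;

// Hypothetical helper for illustration; not part of the framework.
public static class PluralizationRegistry
{
    // Factories for custom services, keyed by two-letter ISO language name ("fr", "de", ...).
    private static readonly Dictionary<string, Func<PluralizationService>> Factories =
        new Dictionary<string, Func<PluralizationService>>(StringComparer.OrdinalIgnoreCase);

    public static void Register(string twoLetterLanguage, Func<PluralizationService> factory)
    {
        if (string.IsNullOrWhiteSpace(twoLetterLanguage))
            throw new ArgumentException("A language name is required.", "twoLetterLanguage");
        if (factory == null)
            throw new ArgumentNullException("factory");

        Factories[twoLetterLanguage] = factory;
    }

    public static PluralizationService Resolve(CultureInfo culture)
    {
        if (culture == null)
            throw new ArgumentNullException("culture");

        Func<PluralizationService> factory;
        if (Factories.TryGetValue(culture.TwoLetterISOLanguageName, out factory))
        {
            return factory();
        }

        // Nothing registered: defer to the built-in factory, which (as seen
        // above) throws NotImplementedException for the French culture.
        return PluralizationService.CreateService(culture);
    }
}
```

With this in place, registering the French service once via PluralizationRegistry.Register("fr", () => new FrenchPluralizationService()) lets calling code use PluralizationRegistry.Resolve(new CultureInfo("fr-FR")) wherever it would otherwise have called CreateService.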
The EnglishPluralizationService in the .NET Framework is an excellent implementation. I’m not sure it’s possible to ever have a perfect implementation considering the fluid nature of natural language, but it certainly beats putting “(s)” on the screen. Of course, there isn’t much need for this with static text, which can be properly localized, but I know I’ve come across scenarios where EnglishPluralizationService would have, at the very least, spared me from the meetings that these sorts of problems seem to generate.
**UPDATE**
Although the factory method doesn’t have anything in the way of discovering and instantiating your service, you can apply your PluralizationService to the EntityModelSchemaGenerator. I still think that pluralization is a common problem not specific to the Entity Framework, and the class should be moved to a different assembly/namespace.
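As a rough sketch of that hookup, assuming a reference to System.Data.Entity.Design and a generator that has already been constructed from your store metadata, the assignment might look like the following. SchemaGeneratorSetup and UsePluralizer are hypothetical names, and the sketch assumes the service has to be set before the generator produces its metadata.

```csharp
using System;
using System.Data.Entity.Design;
using System.Data.Entity.Design.PluralizationServices;

// Hypothetical helper for illustration; not part of the framework.
public static class SchemaGeneratorSetup
{
    public static void UsePluralizer(
        EntityModelSchemaGenerator generator,
        PluralizationService pluralizer)
    {
        if (generator == null)
            throw new ArgumentNullException("generator");
        if (pluralizer == null)
            throw new ArgumentNullException("pluralizer");

        // Assumption: the generator consults this property when it names
        // entity sets, so assign it before asking for the generated metadata.
        generator.PluralizationService = pluralizer;
    }
}
```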
