
    Introduction

    Yesterday RandomPunter mentioned Piglet on Twitter.

    According to the site Piglet is:

    Piglet, the little friendly parser and lexer tool

    Piglet is a library for lexing and parsing text, in the spirit of those big parser and lexer generators such as bison, antlr and flex. While not as feature packed as those, it is also a whole lot leaner and much easier to understand.

    Today I had some time and set out to see what this is all about. I had never used a parser like this before, so I expected a steep learning curve.

    My goal was to parse a list of plant names (the Latin ones) so that I could get the genus, species and subspecies of each.

    The Tests

    First I wrote some tests, because that is the best way to see whether the code I was about to write was correct.

        Imports NUnit.Framework

        ' The object each Latin plant name should be parsed into.
        Public Class Plant
            Public Property Genus As String
            Public Property Species As String
            Public Property SubSpecies As String
            Public Property IsHybrid As Boolean
        End Class

        <TestFixture>
        Public Class ParserTests
            <Test>
            Public Sub IfGenusCanBeFoundWhenOnlyGenusAndSpiecesAreThere()
                Dim parser = New ParseLatinPlantName
                Dim result = parser.Parse("Salvia sylvatica")
                Assert.AreEqual("Salvia", result.Genus)
            End Sub

            <Test>
            Public Sub IfSpeciesCanBeFoundWhenOnlyGenusAndSpiecesAreThere()
                Dim parser = New ParseLatinPlantName
                Dim result = parser.Parse("Salvia sylvatica")
                Assert.AreEqual("sylvatica", result.Species)
            End Sub

            <Test>
            Public Sub IfSubSpeciesCanBeFoundWhenSubSpeciesIsProvided()
                Dim parser = New ParseLatinPlantName
                Dim result = parser.Parse("Salvia sylvatica sp. crimsonii")
                Assert.AreEqual("crimsonii", result.SubSpecies)
            End Sub

            <Test>
            Public Sub IfIsHybridIsTrueWhenxIsInNameCanBeFoundWhenSubSpeciesIsProvided()
                Dim parser = New ParseLatinPlantName
                Dim result = parser.Parse("Salvia x jamensis")
                Assert.IsTrue(result.IsHybrid)
            End Sub

        End Class

    And for the VB-challenged out there.

        using NUnit.Framework;

        // The object each Latin plant name should be parsed into.
        public class Plant
        {
            public string Genus { get; set; }
            public string Species { get; set; }
            public string SubSpecies { get; set; }
            public bool IsHybrid { get; set; }
        }

        [TestFixture]
        public class ParserTests
        {
            [Test]
            public void IfGenusCanBeFoundWhenOnlyGenusAndSpiecesAreThere()
            {
                var parser = new ParseLatinPlantName();
                var result = parser.Parse("Salvia sylvatica");
                Assert.AreEqual("Salvia", result.Genus);
            }

            [Test]
            public void IfSpeciesCanBeFoundWhenOnlyGenusAndSpiecesAreThere()
            {
                var parser = new ParseLatinPlantName();
                var result = parser.Parse("Salvia sylvatica");
                Assert.AreEqual("sylvatica", result.Species);
            }

            [Test]
            public void IfSubSpeciesCanBeFoundWhenSubSpeciesIsProvided()
            {
                var parser = new ParseLatinPlantName();
                var result = parser.Parse("Salvia sylvatica sp. crimsonii");
                Assert.AreEqual("crimsonii", result.SubSpecies);
            }

            [Test]
            public void IfIsHybridIsTrueWhenxIsInNameCanBeFoundWhenSubSpeciesIsProvided()
            {
                var parser = new ParseLatinPlantName();
                var result = parser.Parse("Salvia x jamensis");
                Assert.IsTrue(result.IsHybrid);
            }
        }

    In essence, I want to transform the string into an object of type Plant. When I have just two words, the first one is the genus and the second the species. If there is also an "sp." followed by a name, then that name is the subspecies. When there is an x after the genus, then it is a hybrid (if I remember correctly, school was 20+ years ago).
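    To make those rules concrete, here is a minimal sketch that applies them with plain string splitting and no parser library at all. It is not how Piglet does it and not the solution this post is working towards; `NaivePlantNameReader` and its `Read` method are names made up for this illustration, and it assumes the input is well formed.

```csharp
using System;

public class Plant
{
    public string Genus { get; set; }
    public string Species { get; set; }
    public string SubSpecies { get; set; }
    public bool IsHybrid { get; set; }
}

// Hypothetical helper for illustration only; not part of Piglet.
public static class NaivePlantNameReader
{
    public static Plant Read(string input)
    {
        // A null separator array means "split on whitespace".
        var parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var plant = new Plant();
        var index = 0;

        // The first word is the genus.
        if (index < parts.Length) plant.Genus = parts[index++];

        // A lone "x" right after the genus marks a hybrid.
        if (index < parts.Length && parts[index] == "x")
        {
            plant.IsHybrid = true;
            index++;
        }

        // The next word is the species.
        if (index < parts.Length) plant.Species = parts[index++];

        // "sp." followed by a name gives the subspecies.
        if (index + 1 < parts.Length && parts[index] == "sp.")
        {
            plant.SubSpecies = parts[index + 1];
        }

        return plant;
    }
}
```

    Hand-written code like this gets brittle as soon as the names vary more, which is the reason for trying a grammar-based tool in the first place.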

    The code

    Remember that this was my first time using a tool like this, and it was harder than I thought it would be.

    But I complained on Twitter, and RandomPunter came to the rescue and made the first two tests pass with this code.

        // Namespaces assumed from the Piglet library's layout; adjust them if your version differs.
        using Piglet.Parser;
        using Piglet.Parser.Configuration.Fluent;

        public class ParseLatinPlantName
        {
            public Plant Parse(string input)
            {
                IFluentParserConfigurator config = ParserFactory.Fluent();

                // A token made of one or more word characters, returned unchanged.
                var alphanumeric = config.Expression();
                alphanumeric.ThatMatches(@"\w+").AndReturns(f => f);

                // Genus followed by species, each captured under the name given to As.
                var rule = config.Rule();
                rule.IsMadeUp.By(alphanumeric).As("Genus")
                    .Followed.By(alphanumeric).As("Species")
                    .WhenFound(o => new Plant {Genus = o.Genus, Species = o.Species});

                IParser<object> parser = config.CreateParser();
                return (Plant) parser.Parse(input);
            }
        }

    And for the C#-challenged.

        ' Namespace assumed from the Piglet library's layout; adjust it if your version differs.
        ' f.Genus and f.Species are late-bound, so this needs Option Strict Off.
        Imports Piglet.Parser

        Public Class ParseLatinPlantName

            Public Function Parse(ByVal name As String) As Plant
                Dim config = ParserFactory.Fluent()
                Dim expr = config.Rule()

                ' A token made of one or more word characters, returned unchanged.
                Dim name1 = config.Expression()
                name1.ThatMatches("\w+").AndReturns(Function(f) f)

                ' Genus followed by species, each captured under the name given to As.
                expr.IsMadeUp.By(name1).As("Genus") _
                    .Followed.By(name1).As("Species") _
                    .WhenFound(Function(f) New Plant With {.Genus = f.Genus, .Species = f.Species})

                Dim parser = config.CreateParser()
                Dim result = DirectCast(parser.Parse(name), Plant)
                Return result
            End Function
        End Class

    As you can see, you still need some knowledge of regex to make this work, but it is still readable.

    This uses the fluent configuration style that Piglet provides. In essence, we are telling it to look for a string of word characters followed by another one, then to put the first in a property whose name is given by As ("Genus") and the second in a property named "Species".
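    As for the regex part, here is a small check of what the `\w+` pattern picks out of a plant name. It uses .NET's own Regex class purely as an illustration; Piglet runs its expressions through its own lexer, so its behaviour need not be identical. `WordTokens` is a made-up helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Piglet.
public static class WordTokens
{
    // Returns every run of word characters (letters, digits, underscore) in the input.
    public static string[] Find(string input)
    {
        return Regex.Matches(input, @"\w+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

    With this helper, `WordTokens.Find("Salvia sylvatica")` gives the two words "Salvia" and "sylvatica". For "Salvia sylvatica sp. crimsonii" it gives "Salvia", "sylvatica", "sp" and "crimsonii": the dot after "sp" is not a word character, so the pattern on its own does not cover it.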

    And then I got stuck.

    Tomorrow I will show you how this got solved with the help of Per Dervall, the creator of Piglet.

    About the Author

    Chris is awesome.
    c#, piglet, vb.net

    1 comment

    Comment from: Per Dervall [Visitor] · http://binarysculpting.com
    Awesome. I'm happy to see you trying Piglet out.

    The goal has always been to make grammar construction as easy as possible. This in itself is a significant challenge as grammars (the thing you're actually trying to build) are sensitive and delicate creatures that don't respond well to mucking things up. The nasty shift/reduce exception comes to mind. It tries to be as helpful as possible but it's really hard to make sense of a grammar that's badly formed :)

    It's usually a good idea to try to figure out the patterns you're trying to match and refactor the rules as much as possible. Looking forward to seeing the rest of your thoughts, and any suggestions are greatly appreciated.
    10/23/12 @ 09:13
