Monday, October 31, 2011

Parsing CSS for analysis

This time we’ll look into parsing CSS so we can use the result for analysis.

The story

As we are starting a new project using ASP.NET MVC3 one of the things I was thinking about was how to better manage our CSS and JavaScript. The first thing that I figured would be great was if you could to some static analysis on all of your CSS files to see if certain situations arise.

The issue with multiple people working on the same HTML based UI is that it can be hard to find out what CSS class to use. Because of that you might end up with classes doing the same thing, classes being defined more then once with different definitions, etc..

All this leads up to a less then ideal situation. Removing the cause of the problem, not knowing what CSS class to use, proves difficult. Helping counter the symptoms of the problem is proving a lot easier, so that’s the path I chose for, right now.

A two part solution

In order to do static analysis on something that is only available to me in plain text, I first need to parse the plain text into something that is easier to analyze. Then I can use the result to do the actual analysis on and communicate the result.

Analyzing how to parse CSS

To write a parser for any language requires a deep understanding of its syntax. Fortunately for me CSS is not a very complex language. W3Schools.com proves to be very helpful in providing us with an explanation on the CSS syntax. It comes down to this:

  • A CSS document consists of CSS rules
  • A CSS rule consists of a Selector and a set of Declarations
  • Each declaration consists of a property and a value
  • Comments can exist at the top level of the CSS document (outside of the CSS rules) or in between declarations

In order to do static analysis on this I came up with some interface definitions that allow me to query the structure of a CSS document:

   1: public enum SelectorType
   2: {
   3:     Tag,
   4:     Id,
   5:     Class
   6: }
   7:  
   8: public interface ICSSDocument
   9: {
  10:     string FilePath { get; set; }
  11:     IEnumerable<IRule> Rules { get; }
  12:     void AddRule(IRule rule);
  13: }
  14:  
  15: public interface IRule
  16: {
  17:     ISelector Selector { get; set; }
  18:     IEnumerable<IDeclaration> Declarations { get; }
  19:     void AddDeclaration(IDeclaration declaration);
  20: }
  21:  
  22: public interface ISelector
  23: {
  24:     string Name { get; set; }
  25:     SelectorType SelectorType { get; set; }
  26: }
  27:  
  28: public interface IDeclaration
  29: {
  30:     string Name { get; set; }
  31:     string Value { get; set; }
  32: }


Note that this might not be final as I might come up with requirements implementing the static analysis.



For parsing text like this there are always several approaches. In this case I decided that using plain and simple text parsing would be the most flexible, as I might want to add features to the parser in the future. Here is what I came up with for the main parse loop:




   1: public void Parse()
   2: {
   3:     string data = File.ReadAllText(FilePath);
   4:  
   5:     _position = 0;
   6:     _isInComment = false;
   7:     while (_position < data.Length)
   8:     {
   9:         if (IsEndOfFile(data))
  10:         {
  11:             break;
  12:         }
  13:         HandleBeginOfComment(data);
  14:         HandleEndOfComment(data);
  15:         if (!_isInComment)
  16:         {
  17:             HandleRule(data);
  18:         }
  19:         else
  20:         {
  21:             _position++;
  22:         }
  23:     }
  24: }


As you can see I have a (private) field to keep track of the position within the CSS document and another field to keep track of comments.



You might find the IsEndOfFile method weird as I have a condition within the while loop that should do the same thing. However I need to check ahead one position in case I’m still checking for comments (or are in a comment for that matter). The definition of the method is quite simple:





   1: private bool IsEndOfFile(string data)
   2: {
   3:     return _position == data.Length - 1;
   4: }



The HandleBeginOfComment method checks for the start of a comment:





   1: private void HandleBeginOfComment(string data)
   2: {
   3:     if (data[_position] == '/' && data[_position + 1] == '*')
   4:     {
   5:         _position += 2;
   6:         _isInComment = true;
   7:     }
   8: }



Basically it checks for the string /* and if it finds that string at the current position it moves the cursor by two characters and sets the _isInComment flag. HandleEndOfComment does the same thing for */ and sets the _isInComment flag to false again. Any comments are currently ignored, but it is easy to extend the main parse loop to allow for parsing comments as well.



The HandleRule method takes care of all the parsing magic, which makes sense as the Rule is the main component of a CSS document.





   1: private void HandleRule(string data)
   2: {
   3:     while (_position < data.Length && !StartOfRule(data[_position]))
   4:     {
   5:         HandleBeginOfComment(data);
   6:         if (_isInComment)
   7:         {
   8:             return;
   9:         }
  10:         _position++;
  11:     }
  12:     string selectorData = GetSelector(data);
  13:     string declarationsData = GetDeclarations(data);
  14:  
  15:     IRule rule = _kernel.Get<IRule>();
  16:     
  17:     ISelector selector = _kernel.Get<ISelector>();
  18:     selector.Name = selectorData;
  19:     selector.SelectorType = GetSelectorTypeFromName(selectorData);
  20:     rule.Selector = selector;
  21:  
  22:     HandleDeclarations(rule, declarationsData);
  23:  
  24:     AddRule(rule);
  25: }



The first loop deals with running into a comment later on in the document. If we do run into a comment we simply return to the main loop, which will then deal with finding the end of the comment. In the same loop it checks for the start of a Rule.



If we are still in the method after this loop, we have reached the start of a Rule. As we’ve noted earlier a Rule consists of a Selector and a set of Declarations. The methods GetSelector and GetDeclarations take care of parsing those portions of the CSS document. Once we have that data we can use it to create a rule. We use a Ninject Kernel to create instances of both the IRule and ISelector implementations.



Note that right now we handle the Selector like it’s a single entity. A future improvement might be to split up the Selector into parts and assign types to them individually.



The HandleDeclarations method takes the declarations text, parses it into IDeclaration implementations and adds them to the given IRule:





   1: private void HandleDeclarations(IRule rule, string declarationsData)
   2: {
   3:     string[] declarations = declarationsData.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
   4:     foreach (string declaration in declarations)
   5:     {
   6:         if (string.IsNullOrWhiteSpace(declaration))
   7:         {
   8:             continue;
   9:         }
  10:         int splitterIndex = declaration.IndexOf(":");
  11:         string declarationName = declaration.Substring(0, splitterIndex).Trim();
  12:         string declarationValue = declaration.Substring(splitterIndex + 1).Trim();
  13:  
  14:         IDeclaration declarationInstance = _kernel.Get<IDeclaration>();
  15:         declarationInstance.Name = declarationName;
  16:         declarationInstance.Value = declarationValue;
  17:         rule.AddDeclaration(declarationInstance);
  18:     }
  19: }



Note that I use String.Trim to make sure we don’t end up with white space in our declaration data, which could get in the way of our analysis (and is of no value any way in CSS).



So far so good. We can now parse CSS into an object model, which allows us to analyze the CSS in a structured way. I plan on writing a next post that shows the analysis based on this model.



No complete source included: Unfortunately, as this project is owned by my employer, I can not include full source code, however the parts I included should provide you with a good insight into parsing CSS.