Language grammars

Since I made beta 6 and onward unofficial, I've been quiet about new features, despite my plan (that, and being a lazy writer). Though seeing how 41% of users are using the “unofficial” betas, I'm going back to “normal” beta releases.

As for labeling these releases betas, at least two anonymous posts think I should call them alpha. For me alpha is something which crashes, and if alpha means feature incomplete and beta is feature complete, I'm afraid TextMate won't leave alpha for a long, long time. But I'll probably annotate them with some descriptive term sometime in the future, which would clarify the state and also allow me to release something equivalent to nightly builds.

But enough about that, let's take a look at one of the major new features (though for anyone with the time, I strongly encourage reading the full release notes for each new (beta) release)…

The ability to tell TextMate about your language so that you can, among other things:

  • turn strings blue
  • make return continue the current comment on next line
  • disable spell checking for HTML tags

TextMate debuted with a flexible format to define the language grammar, but the grammar was in a file on disk and the description for the format was hidden in the TextMate help book. Luckily though a lot of people did find these hidden items and have done a great job adding support for a lot of languages I hadn't heard about like Perl, Python, and Java (I have since learned that at least Perl is generally used by “hackers” who spend their time performing “DoS” attacks).

With the recent betas the grammar files can be inspected and edited by choosing (from the menus): View → Language → Edit Languages…

The language editor still uses the old-style plist format, something I do plan to improve upon, but hopefully the current one is enough to motivate people to play more with language grammars.

The rest of this post will detail only what's new since the help book was written.

Recursion

A grammar can now refer to itself. For example in the context of HTML we can create a rule like this:

{   name = "markup.bold.html";
    begin = "<b>"; end = "</b>";
    patterns = ( // really means “contains”
        { include = "$self"; }
    );
}

This rule tells TextMate that everything starting with <b> and ending with </b> should be named “markup.bold.html” and can contain any of the constructs that the current grammar has a rule for.
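
To see why $self matters, here is a rough sketch of a complete (tiny) grammar with two such rules. The grammar name, the scopeName and the italic rule are made up for illustration. Since both rules include $self, bold can nest inside italic and the other way around, to any depth:

{   name = "Mini HTML"; scopeName = "text.html.mini"; // hypothetical names
    patterns = (
        {   name = "markup.bold.html";
            begin = "<b>"; end = "</b>";
            patterns = (
                { include = "$self"; } // any rule of this grammar may match inside
            );
        },
        {   name = "markup.italic.html";
            begin = "<i>"; end = "</i>";
            patterns = (
                { include = "$self"; }
            );
        }
    );
}

With text like <b>one <i>two <b>three</b></i></b>, the innermost “three” would end up inside bold, inside italic, inside bold.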

Local rules

To avoid repetition it is now possible to define a rule once and refer to it in multiple places. This is done by placing the rule in the key/value container stored at the root level under the repository key, and referring to it with a # prefix.

It's probably best with an example:

{   name = "PHP"; scopeName = "source.php";
    patterns = (
        { include = "#string"; },
        { include = "#variable"; }
    );
    repository = {
        string = {
            name = "string.quoted.double.php";
            begin = "\""; end = "\"";
            patterns = (
                { include = "#escape"; },
                { include = "#variable"; }
            );
        };
        variable = {
            name = "variable.other.php";
            // this pattern is a huge simplification
            match = "\\$[A-Za-z]\\w*|\\$\\{[A-Za-z]\\w*\\}";
        };
        escape = {
            name = "constant.character.escape";
            match = "\\\\.";
        };
    };
}

Another use for this is when we want to recursively refer to only a subset of our grammar. For example I'm told that Perl has qq(…) strings in which it allows balanced parentheses. To match that, we'd have to create a rule like this:

{   name = "string.unquoted.qq.perl";
    begin = "qq\\("; end = "\\)";
    patterns = (
        { include = "#qq_string_content"; },
    );
},

And then in the repository add the definition for qq_string_content:

qq_string_content = {
    begin = "\\("; end = "\\)";
    patterns = (
        { include = "#qq_string_content"; },
    );
};
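
Put together, the two pieces might sit in a grammar roughly as sketched below. The grammar name and scopeName are assumed for illustration, and a real Perl grammar would of course have many more rules:

{   name = "Perl"; scopeName = "source.perl"; // assumed names
    patterns = (
        {   name = "string.unquoted.qq.perl";
            begin = "qq\\("; end = "\\)";
            patterns = (
                { include = "#qq_string_content"; }
            );
        }
    );
    repository = {
        // matches one balanced ( … ) pair and, via the include,
        // any balanced pairs nested inside it
        qq_string_content = {
            begin = "\\("; end = "\\)";
            patterns = (
                { include = "#qq_string_content"; }
            );
        };
    };
}

For qq(a (b (c)) d) the outer rule consumes qq( and the final ), while each inner pair is eaten by qq_string_content, so the string does not end at the first closing parenthesis.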

Back references

For tags, LaTeX environments and similar, the end pattern depends on how the construct began. Previously it wasn't possible to make a begin/end rule for that, but now you can refer to captures from the begin pattern in the end pattern using normal back-references. For example to match bash heredoc constructs (w/o stripped indent) one could use the following rule:

{   name = "string.unquoted.heredoc.bash";
    begin = "<<(\"|')(\\w+)\\1"; end = "^\\2$";
}

One could of course also create a patterns array to correctly mark up variables and similar, which do get expanded by bash when the heredoc token is unquoted.
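
As a sketch of that idea, a rule for the unquoted form could look something like this. The variable pattern and its name are simplifications chosen for illustration, and indent-stripping heredocs are ignored:

{   name = "string.unquoted.heredoc.bash";
    // no quotes around the token, so the token is capture 1
    begin = "<<(\\w+)"; end = "^\\1$";
    patterns = (
        {   name = "variable.other.bash"; // illustrative name
            // simplified: only matches the plain $name form
            match = "\\$[A-Za-z_]\\w*";
        }
    );
}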

Styling

Previously the color/font style for a construct was placed in the rule which matched the construct. This gave each language its own personality, but was strangely disliked by a lot of users.

Now the grammar files don't contain any styles. They only assign a name to the construct matched and it's then possible to associate styles with that name using View → Theme → Edit Themes…

This Theme Editor is still very rudimentary (one of the reasons I've kept the current betas as unofficial). But the gist of it is this:

There are four themes by default (All Hallow's Eve, Boring, iPlastic, and Pastels on Dark). Each theme includes one or more settings groups (consider a settings group like a CSS file). Each settings group then contains a number of settings items (consider each settings item as a CSS rule).

When you open the Theme Editor you'll see a list with all the settings groups on the left side. Above this list is a popup gadget which shows the current theme. The themes all have access to the same settings groups, and the check marks to the left of the settings group names control whether or not that settings group is included for the selected theme.

If you unfold a settings group you'll see the settings items it contains. If you click one of these you'll get a) the settings it “sets” and b) the scope. The scope is a bit like a CSS selector. It is based on the names given to constructs by rules.

So for example if we look above for the qq(…) rule defined for Perl, this assigns the name “string.unquoted.qq.perl” to these constructs, and we can use that name in the scope to style these strings.

The match is however prefix-based, so we don't have to enter the entire name; we could settle for “string.unquoted.qq” or even just “string”. The latter has the advantage of targeting all constructs that have “string” as the first part of their name.

For this reason it is important that rules derive their name from one of the standard names.

I mentioned that the scope was a bit like a CSS selector, and like a CSS selector, the scope can also target constructs based on their context. So for example, look above at the minimal PHP grammar. It names strings, and inside strings we can have variables and escape codes. Imagine we'd like to have variables in strings underlined, but variables outside strings should appear as normal. For this we'd use a scope like “string variable”. This targets all variables inside strings.
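
To make the two kinds of scope concrete, here is a rough sketch of two settings items in plist form. The key names (name, scope, settings, foreground, fontStyle) and the values are assumed for illustration, so check the textual mode of the Theme Editor for what it actually uses:

{   name = "Strings";
    scope = "string"; // prefix match: also hits string.unquoted.qq.perl
    settings = { foreground = "#0000FF"; };
},
{   name = "Variables in strings";
    scope = "string variable"; // a variable somewhere inside a string
    settings = { fontStyle = "underline"; };
}

A variable outside a string only has “variable…” in its context, so the second item would not apply to it.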

Named captures and content

Grammar rules assign a name to the entire thing matched, but it is now also possible to assign a name to each capture of the regular expression.

This is done with an additional captures key. For example if we make one rule to match a tag and want to assign a name to the namespace and tag name we could do:

{   match = "</?(?:([-_a-zA-Z0-9]+):)?([-_a-zA-Z0-9:]+).*?>";
    captures = {
        1 = { name = "entity.name.tag.namespace.xml"; };
        2 = { name = "entity.name.tag.xml"; };
    };
}

Here capture 1 is given the name "entity.name.tag.namespace.xml" and capture 2 is given the name "entity.name.tag.xml".

For rules which use begin/end keys, captures refer to both patterns. But one can instead use beginCaptures or endCaptures to refer to only the begin or end pattern.
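
For instance, a rule for a brace-delimited block could name the opening and closing brace differently. This is only a sketch, and the scope names are made up for the example:

{   name = "meta.block.c";
    begin = "(\\{)"; end = "(\\})";
    beginCaptures = { // applies to the begin pattern only
        1 = { name = "punctuation.section.block.begin.c"; };
    };
    endCaptures = { // applies to the end pattern only
        1 = { name = "punctuation.section.block.end.c"; };
    };
    patterns = (
        { include = "$self"; }
    );
}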

Additionally it might be useful to name only the stuff between the begin and end pattern. This is done using contentName instead of, or in addition to, the normal name. So we may want to revise the HTML bold rule above to:

{   contentName = "markup.bold.html";
    begin = "<(b)>"; end = "</(b)>";
    captures = { 
        1 = { name = "entity.name.tag.html"; };
    };
    patterns = (
        { include = "$self"; }
    );
}

Settings

The Theme Editor also allows you to set miscellaneous settings for matched constructs. I haven't composed a list of these yet, and I don't intend this to be settable in the Theme Editor for long (instead it'll go into the bundle editor, so settings can be structured like bundles), but for now you can unfold the “Settings: Basic” group and check out some of the stuff currently set.

You'll need to switch to textual mode which is done with the segmented control in the upper right corner of the theme editor (showing three lines of text and the standard fonts and colors icon).

Scopes

Scopes were already described above under styling. What wasn't mentioned is that these scopes can also be used for bundle items (in the bundle editor). This affects when the “activation” method is active.

So for example if you make a macro and give it return as key equivalent but set the scope to “string”, then your macro will only be executed when you press return while the caret is inside a string.

The scope is also used to decide which bundle item to execute, when there are multiple matches. It'll always use the one with the most exact scope. If there are several candidates, it'll show you a menu.

Multi threading

Previously parsing of your text happened lazily (before display) and was cached. But because parsing a line requires all lines above it to be parsed, going to the bottom of a document meant parsing the entire document first, and when pasting large chunks of text you'd have to wait for the parser to finish working on that text.

The parser now runs in its own thread, and display no longer requires the text to have been parsed (though it'll lack styles if it hasn't). So pasting large chunks of text, loading a huge file and going to the bottom etc. should no longer cause a noticeable delay.

Oniguruma

The Oniguruma regular expression library (version 3.8.2) is now used instead of the library I wrote myself. The main functional advantage is support of look-behind, but Oniguruma has a lot of other neat features. This is the library used in Ruby. The documentation for the regular expression syntax can be found here.
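
As a small illustration of look-behind in a grammar rule, the sketch below names only the identifier of a PHP-style variable, leaving out the $ in front of it. The rule itself is made up for the example and, like the earlier variable pattern, is a huge simplification:

{   name = "variable.other.php";
    // (?<=\$) is a look-behind: a $ must come right before the match,
    // but it is not part of the text that gets named
    match = "(?<=\\$)[A-Za-z]\\w*";
}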

I still use my own library for normal regular expression searches (so the help book entry on the syntax still applies here). I do plan to switch, but there are some technical problems, which is the main reason I wrote my own library in the first place.

Conclusion

This completes the functional revision of the language definition system for this time. Not that I lack ideas for further improvements, but I know some users are eager to see similar attention given to other parts of TextMate, and that's what I will do.

My plan is to finish the (graphical) interface to this stuff, wrap up the various loose ends, and then do a real non-beta 1.1 release. Then I'll start with the 1.1.x releases, where I'll take care of some of the other things I want to improve. The project window especially needs a major overhaul, including making the project drawer more like the Finder, supporting tabs for non-projects, and allowing split views.

As for the function popup (in a better form than the current Go to Symbol list), this will most likely appear before 1.1 final. But please stop nagging me about it!

23 Comments

  1. 24 Jun 2005 | # Gary Buckle wrote…

    Just downloaded TextMate. I use and have used just about every text editor known to man. UltraEdit (Win), BBEdit,Pepper,Nedit,SubEthaEdit,Jedit and so on. I will try to write some new language files and see how it performs. Very nice interface and a pleasure to use. Some features I would love to see. 1. FTP support. 2. File comparisons. 3. Splitting windows. 4. Support inside Transmit. 5. Easy creation of Language modules.

    I look forward to buying this product.

    Good luck

  2. 24 Jun 2005 | # Noel wrote…

    Hy

    I've tried a few other Code-Editors like these in the post before but TextMate is by far the best! I love its ability to auto-complete code and it's easy to use!

    But since i've installed TM 1.1b13 and i code with ActionScript, the syntax-highlighting has been lost! I tried to install 1.1b12 again but there is no AS anymore, too! Since there is no documentation to this issue, I'm a little bit puzzled… Is this a Bug or a "Feature"?

    Greets from Switzerland!

  3. 24 Jun 2005 | # Allan Odgaard wrote…

    Noel: well, the included ReadMe.txt does mention that b13 only has limited bundles and the rest can be found on the svn repository.

    Though I'll release b14 later today and include Action Script for that.

  4. 24 Jun 2005 | # Noel wrote…

    Sorry, i din't see that… But i've found the svn repository and I'm downloading the bundles now too :-) I had really problems to find proper access to svn… But with Apple+K its easy to download the bundles… But why doesn't it work with 1.1b12 anymore?

    Greets, Noël

  5. 24 Jun 2005 | # Allan Odgaard wrote…

    What doesn't work with 1.1b12 anymore?

  6. 24 Jun 2005 | # Danski wrote…

    Yep, I've been stung by the ol' missing actionscript bundle too. I'll make a go of accessing the SVN repo.

  7. 24 Jun 2005 | # Jeff P. wrote…

    Since I use text editors for writing and never coding, I'm probably in the minority of your users, but before I can use TextMate on a regular basis it needs to handle non-fixed-width fonts. I can't stand doing any lengthy composing in any of the fixed-width fonts. I like Georgia, Verdana, or Lucida Grande especially for composing.

    I keep crossing my fingers with each beta release you'll implement this. Just letting you know what I'm eagerly waiting for. :-)

  8. 24 Jun 2005 | # Jason wrote…

    The one bundle I miss the most is Smarty — but other than that, I love TextMate. Each beta has improved on the one before, and I'm now at the point where I can't code in any other app.

  9. 24 Jun 2005 | # Andrew Dupont wrote…

    Allan, I've gone back and forth from 1.1b5 to the bleeding-edge release over the last month or so. What's keeping me at 1.1b5 right now is both stability and the syntax highlighting — I like the styles much better than those in the bundles I get from svn. (I work with HTML, PHP, JS, and sometimes Ruby/Ruby on Rails.) I poked around the style editor, but changing all the values manually looked prohibitively time-consuming. Is there anywhere I can get bundles that use the new style system but give me the old color schemes?

  10. 24 Jun 2005 | # geekdreams wrote…

    I'd also like the old styles back, as I prefer them to the color schemes used in the latest release.

  11. 24 Jun 2005 | # Gary Bloom wrote…

    I also use text editors to write with and I am also waiting for support for proportional fonts. I decided to move on from BBEdit when they dropped smart insert/delete as an option back at 6.5, but haven't yet found the right replacement. Proportional fonts and smart insert/delete and I'm on TextMate.

  12. 25 Jun 2005 | # geekdreams wrote…

    I submitted a bug report for odd Soft Wrap behavior, but as an extra request, is it possible to set Soft Wrap prefs for all new files and/or across an entire project without having to select the menu item for each file?

  13. 26 Jun 2005 | # Alexander Romanovich wrote…

    I second geekdreams soft wrap comment. This has been driving me nuts for several betas now. I would love to set soft wrap as the default for all files I open. As it is, I have to select soft wrap each time I open a new file or even switch to a different tab and back again. Otherwise great app which I've purchased as my BBEdit replacement. big sigh of relief

  14. 26 Jun 2005 | # Jake wrote…

    I love it so far … but native ftp support would be great ! That's the only reason why we're still using BB(loat)Edit ;(

  15. 26 Jun 2005 | # Hannu Rajaniemi wrote…

    I agree with Gary Bloom above. I'm also a prose writer who likes text editors, and support for proportional fonts would make me never leave TextMate…

  16. 27 Jun 2005 | # paladin wrote…

    I would also like to see the old syntax highlighting available as an option. I was fond of some of them. You were expecting this, right?

  17. 29 Jun 2005 | # Roshambo wrote…

    I miss the old typewriter-robot icon. The new one looks rather generic.

  18. 04 Jul 2005 | # Mark Patterson wrote…

    Geekdreams: To get the soft-wrap working, just do act-cmd-W twice. It's a bit digitally gymnastic, but still quicker than using the menu.

  19. 04 Jul 2005 | # Allan Odgaard wrote…

    Replying a bit late to this, but let's see if I can manage to address all of the above…

    Jeff, Gary, Hannu: As for proportional fonts, I'm sorry to report that while it's on the to-do, it has a really low priority ATM (so don't expect this before 1.2).

    Gary: I don't know what smart insert/delete is — is that the thing responsible for inserting/removing an extra space on cut'n'paste operations?

    Jason: The PHP bundle should contain a grammar for Smarty.

    Andrew, geekdreams, paladin: The new style system is a superset of the old system. Editing of themes is likely to be slightly better in 1.1b15, but the system is here to stay, and I leave it up to users to come up with nicer colors than those included by default. The All Hallow's Eve theme should also have been updated for 1.1b15 (so it better resembles the HTML/Ruby style of 1.1b5), though I'm still waiting for DHH to checkin the update.

    Andrew: You mention stability as one of the concerns with using latest beta. I don't know if this is only a concern, or you actually do have stability problems. If it's the latter, please provide me with as much detail as possible.

    geekdreams, Alexander: initial soft wrap setting was broken in 1.1b14. I have released this minor update to fix it.

    As for setting the default value for file types, this is already possible, and might be what makes you think it has always been broken (it ships with soft wrap enabled for text files and disabled for source files). See this letter for more details. Scope specific overrides of various settings will be more apparent in the GUI in the future.

    paladin: as for expecting this; having done TM for approximately 10 months now, I'm still surprised by the ratio between feedback related to visual and non-visual things :)

    Roshambo: yes, the old icon had more personality, but was unfortunately disliked by a lot of users (and a lot of users submitted replacement icons). So the current was chosen from the pool of user contributions, and while it may appear more generic, it's still sufficiently different from other application icons for it to not be mixed up with a different application, and I think the rendering of it is very nice, plus it's a better fit with the Aqua icon guidelines.

  20. 05 Jul 2005 | # Geir wrote…

    The fact that you're making the test releases publicly available by definition makes them beta software, not alpha. See Jargon File:

    http://www.clueless.com/jargon3.0.0/beta.html

  21. 20 Jul 2005 | # Anonymous wrote…

    I agree that FTP support would be great, and wanted to add a request for SFTP as well.

    Also, as one of the anonymous submitters who originally talked about the "alpha" and "beta" thing– it doesn't really matter, of course. You can use the terms however you want, especially since all version numbers are rather suspect these days.

    Also, the alpha/beta/release candidate distinction makes more sense for a different development model, i.e. one which distributes physical media, has well-defined feature sets for a given version, tracks milestones, etc.

  22. 03 Aug 2005 | # Gary Bloom wrote…

    "Gary: I don’t know what smart insert/delete is – is that the thing responsible for inserting/removing an extra space on cut’n'paste operations?"

    Yes. All word processors have it. BBEdit used to have it. Siegel removed it as an option because "it confused users." Excuse me? Coders can't figure out a simple feature that every word processor has?

  23. 03 Aug 2005 | # Douglas wrote…

    It isn't betas which are feature complete, those are Release Candidates.
