EmPy 4.1 release announcement

I’m pleased to announce the release of EmPy 4.1.

The 4.x series is a modernization of the software and a revamp of the EmPy system to update its feature set and make it more consistent with the latest Python versions and practices. EmPy 4.x was also relicensed to BSD.

The 4.x series adds new markups, including inline comments, backquote literals, chained if-then-else extended expressions, functional expressions, support for modern Pythonic controls, stringized and multiline significators, named escapes, diacritics, icons, emojis, and customizable extension markups.

It also adds configuration objects (replacing options dictionaries); native support for Unicode and file buffering; reference-counted sys.stdout proxies; error dispatchers and handlers; a full set of unit and system tests; and an extensive builtin help system. Several serious bugs have been fixed, the online documentation has been rewritten and expanded, and the underlying interpreter core can now be customized.

Attempts have been made to make EmPy 4.x as backward compatible as is practical. Most common markup has not changed. The only changes are the removal of repr markup (in favor of backquote literals) and of the literal close parenthesis, bracket and brace markup; a new syntax for in-place markup (to make way for emojis); and more sensible parsing of extension/custom markup.

Most backward-incompatible changes are in the embedding interface. The Interpreter constructor and global expand function now require keyword arguments to prevent further backward compatibility problems, though effort has been made to make the behavior as backward compatible as possible. The supported environment variables have changed, as have the filter, diversion and hook APIs, and options dictionaries no longer exist (in favor of configurations).

For a comprehensive list of changes from 3.x to 4.x, see:
http://www.alcyone.com/software/empy/ANNOUNCE.html#changes

Introduction: Welcome to EmPy!

EmPy is a powerful, robust and mature templating system for inserting Python code in template text. EmPy takes a source document, processes it, and produces output. This is accomplished via expansions, which are signals to the EmPy system where to act and are indicated with markup. Markup is set off by a customizable prefix (by default the at sign, @). EmPy can expand arbitrary Python expressions, statements and control structures in this way, as well as a variety of additional special forms. The remaining textual data is sent to the output, allowing Python to be used in effect as a markup language.

EmPy also supports hooks, which can intercept and modify the behavior of a running interpreter; diversions, which allow recording and playback; filters, which are dynamic and can be chained together; and a dedicated user-customizable callback markup. The system is highly configurable via command line options, configuration files, and environment variables. An extensive API is also available for embedding EmPy functionality in your own Python programs.

EmPy also has a supplemental library for additional non-essential features (emlib), a documentation building library used to create this documentation (emdoc), and an extensive help system (emhelp) which can be queried from the command line with the main executable em.py (-h/--help, -H/--topics=TOPICS). The base EmPy interpreter can function with only the em.py/em file/module available.

EmPy can be used in a variety of roles, including as a templating system, a text processing system (preprocessing and/or postprocessing), a simple macro processor, a frontend for a content management system, a document annotation tool, a literate programming tool, a souped-up text encoding converter, a text beautifier (with macros and filters), and for many other purposes.

Markup overview

Expressions are embedded in text with the @(...) notation; variations include conditional expressions with @(...?...!...) and the ability to handle thrown exceptions with @(...$...). As a shortcut, simple variables and expressions can be abbreviated as @variable, @object.attribute, @sequence[index], @function(arguments...), @function{markup}{...} and combinations. Full-fledged statements are embedded with @{...}. Control flow in terms of conditional or repeated expansion is available with @[...]. A @ followed by any whitespace character (including a newline) expands to nothing, allowing string concatenations and line continuations. Line comments are indicated with @#... including the trailing newline. @*...* allows inline comments. Escapes are indicated with @\...; diacritics with @^...; icons with @|...; and emoji with @:...:. @%..., @%!..., @%%...%% and @%%!...%% indicate “significators,” which are distinctive forms of variable assignment intended to specify document metadata in a format easy to parse externally. In-place expressions are specified with @$...$...$. Context name and line number changes can be made with @?... and @!..., respectively. A set of markups (@((...)), @[[...]], @{{...}}, @<...>) are customizable by the user and can be used for any desired purpose. @`...` allows literal escaping of any EmPy markup. And finally, a @@ sequence (the prefix repeated once) expands to a single literal at sign.

The prefix defaults to @ but can be changed with the command line option -p/--prefix=CHAR (environment variable: EMPY_PREFIX, configuration variable: prefix).

Getting the software

The current version of EmPy is 4.1.

The official URL for this Web site is http://www.alcyone.com/software/empy/.

The latest version of the software is available in a tarball here:
http://www.alcyone.com/software/empy/empy-latest.tar.gz.

The software can be installed with pip via this shell command:

% python3 -m pip install empy

For information about upgrading from 3.x to 4.x, see
http://www.alcyone.com/software/empy/ANNOUNCE.html#changes.

Requirements

EmPy works with any modern version of Python. Python version 3.x is expected to be the default and all source file references to the Python interpreter (e.g., the bangpath of the .py scripts) use python3. EmPy also has legacy support for versions of Python going back all the way to 2.3, with special emphasis on 2.7 regardless of its end-of-life status. It has no dependency requirements on any third-party modules and can run directly off of a stock Python interpreter.

EmPy will run on any operating system with a full-featured Python interpreter; this includes, but is probably not limited to, Linux, Windows, and macOS (Darwin). Using EmPy requires knowledge of the Python language.

EmPy is also compatible with several different Python implementations:

Implementation  Supported versions      Description
CPython         2.3 to 2.7; 3.0 and up  Standard implementation in C
PyPy            2.7; 3.2 and up         Implementation with just-in-time compiler
IronPython      2.7; 3.4 and up         Implementation for .NET CLR and Mono
Jython          2.7 (and up?)           Implementation for JVM

It’s probable that EmPy is compatible with earlier versions than those listed here (potentially going all the way back to 2.3), but this has not been tested.

Only a few .py module files are needed to use EmPy; they can be installed system-wide through a distribution package, a third-party module/executable, or just dropped into any desired directory in the PYTHONPATH. A minimal installation need only install the em.py file, as an importable module, an executable, or both, depending on the user’s needs.

EmPy also has optional support for several third-party modules; see Emoji markup for details.

The included testing system (the test.sh script and the tests and suites directories) is intended to run on Unix-like systems with a Bourne-like shell (e.g., sh, bash, or zsh). EmPy is routinely tested with all supported versions of all available interpreters.

If you find an incompatibility with your Python interpreter or operating system, let me know.

License

This software is licensed under BSD (3-Clause).

Recent release history (since 3.x)

4.1 (2024 Mar 24)

Add support for extension markup @((...)), @[[...]], @{{...}}, @<...>, etc., with custom callbacks retained for backward compatibility; add @[match] control support; add interpreter cores for overriding interpreter behavior; add more command line option toggles; add notion of verbose/brief errors; more uniform error message formatting; various documentation updates.

4.0.1 (2023 Dec 24)

Add root context argument, serializers, and idents to interpreter; fix setContext... methods so they also modify the currents stack; better backward compatibility for expand function and CompatibilityError; fix inconsistent stack usage with expand method; add error dispatchers, cleaner error handling and ignoreErrors; have expand method/function raise exceptions to caller; eliminate need for FullContext class distinct from Context; support comments in “clean” controls; add --no-none-symbol option; add clearer errors for removed literal markup; add Container support class in emlib; hide non-standard proxy attributes and methods; support string errors (why not); update and expand tests; help subsystem and documentation updates.

4.0 (2023 Nov 29)

A major revamp, refresh, and modernization. Major new features include inline comments @*...*; backquote literals @`...`; chained if-then-else expressions; functional expressions @f{...}; full support for @[try], @[while ...] and @[with ...] control markup; @[defined ...] control markup; stringized and multiline significators; named escapes @\^{...}; diacritics @^...; icons @|...; emojis @:...:; configurations; full Unicode and file buffering support; proxy now reference counted; hooks can override behavior; many bug fixes; an extensive builtin help system (emhelp); and rewritten and expanded documentation in addition to a dedicated module (emdoc). Changes include relicensing to BSD; interpreter constructor now requires keyword arguments; -d/--delete-on-error instead of “fully buffered files”; cleaned up environment variables; “repr” markup replaced with backquote literal markup; in-place markup changed to @$...$...$ to make way for emoji markup; remove literal markups @), @], @}; context line markup @!... no longer pre-adjusts line; custom markup @<...> now parsed more sensibly; filter shortcuts removed; contexts now track column and character count; auxiliary classes moved to emlib module; use argv instead of args for interpreter arguments. See Full list of changes between EmPy 3.x and 4.x for a more comprehensive list.


Changes

Major new features added

Here is a list of the major new features introduced in EmPy 4.x. See the documentation for more information.

Added markup

Inline comments @*...*

EmPy now supports comments which can be embedded anywhere and do not consume the whole line, and can even span multiple lines.

Backquote literals: @`...`

EmPy now has a mechanism for quoting any literal text, including EmPy markup. Note that this markup syntax replaces the old “repr” markup.

Chained if-then-else expressions: @(...?...!...?...!...)

If-then-else extended expressions can now be chained indefinitely.
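
For readers who think in terms of Python's own conditional expressions, a chained extended expression reads much like a chain of "value if condition else ..." clauses. The following is only an analogy written in plain Python, not EmPy markup, and the function name classify is made up for illustration. Note that the EmPy form puts the condition first, whereas Python puts it in the middle.

```python
def classify(n):
    """Plain-Python analogue of a chained if-then-else expression.

    Roughly corresponds to a markup chain of the shape
    (n < 0 ? 'negative' ! n == 0 ? 'zero' ! 'positive').
    """
    return "negative" if n < 0 else "zero" if n == 0 else "positive"


for value in (-5, 0, 7):
    print(value, classify(value))
```

Each condition is tried in order and the final branch acts as the catch-all "else".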

Functional expressions: @f{...}

Simple expressions have been extended to support functional expressions, which allow calling objects whose arguments are expanded EmPy markup (and thus are strings), rather than Python expressions. The syntax can be repeated for multiple arguments, e.g., @f{argument1}{argument2}....

Full support for @[try] control with @[except ...], @[else], and @[finally]

All legal @[try] control markup equivalents of the Python try control structure are now supported.

Support for @[else] within @[while ...] control

All legal @[while ...] control markup equivalents of the Python while control structure are now supported.

@[with ...] control

The EmPy equivalent of the Python with control structure is now supported.

@[defined ...] control

There is now a new control markup which allows testing for the existence of variables in the globals or (optionally) locals dictionaries.

Stringized significators: @%!... NL

Significators now have a “stringized” form, which allows their values to be unquoted strings, rather than arbitrary Python expressions.

Multiline significators: @%%...%% NL, @%%!...%% NL

There are now multiline forms of significators, as well as stringized variants.

Named escapes: @\^{...}

There is now an extension of the escape markup which allows specifying escape control characters by name rather than having to use the ASCII/Unicode code point value, e.g., @\^{ESC} for the escape character.

Diacritics: @^...

There is now support for joining characters with Unicode combiners and normalizing the results, allowing the inclusion of accented characters without the need for a Unicode keyboard, e.g., @^e' is a lowercase E with an acute accent.
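
The underlying Unicode mechanism (a base character joined with a combining mark, then normalized) can be sketched with the standard unicodedata module. This is not EmPy's implementation; the helper name combine is hypothetical, and the choice of NFC normalization is an assumption made for the sake of the example.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT


def combine(base, *combiners):
    """Join a base character with combining marks and normalize to NFC."""
    return unicodedata.normalize("NFC", base + "".join(combiners))


accented = combine("e", COMBINING_ACUTE)
# After normalization the two code points collapse into one precomposed character.
print(accented, len(accented), unicodedata.name(accented))
```

Normalizing matters because the unnormalized string is two code points long, which can surprise code that counts or compares characters.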

Icons: @|...

There is now support for user-specified icons, a set of key-value pairs which can be used by the user for arbitrary means. A default set is included, e.g., @|:) represents the smiling face emoji.

Emojis: @:...: with third-party module support

There is now support for specifying emojis by name, as well as support for third-party emoji modules, e.g., @:VOLCANO: for the volcano emoji.
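
To give a feel for what looking up an emoji by name involves, here is a minimal sketch using only the standard unicodedata module. It is an illustration rather than EmPy's lookup logic: the function name emoji_by_name and the fallback behavior are assumptions, and the third-party modules mentioned above are not used here.

```python
import unicodedata


def emoji_by_name(name, default=None):
    """Return the character with the given Unicode name, or default if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for names it does not know.
        return default


print(emoji_by_name("VOLCANO"))
print(emoji_by_name("NO SUCH EMOJI NAME", default="?"))
```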

Extension markup: @((...)), @[[...]], @{{...}}, @<...>

Custom markup has been expanded and reengineered into extension markup, allowing the customization of more markups as well as the addition of completely new customizable markups.

@[match ...] control

A new EmPy control markup has been introduced to support the behavior of the now-standard match Python control structure.

Other additions

Configuration objects

Instead of a primitive options dictionary, there is now a full-fledged Configuration object which encapsulates all the configurable behavior of an EmPy interpreter. This can be created separately and shared between interpreters; if no configuration is specified, a default one is created.

Full support for Unicode

Previously “Unicode” support was awkward and strange and was specified with the -u/--unicode command line option. Now it is seamless (whether using open in text mode or codecs.open in binary mode) and that command line option is no different from specifying --binary for binary output.

Full support for file buffering

Proper file buffering (none, line, fixed, full) is now fully supported for both input and output. The default fixed buffering size has now also been significantly increased.

sys.stdout proxy now reference counted

The sys.stdout proxy is now reference counted when multiple interpreters are in use, rather than needing one proxy per interpreter.
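
The idea of sharing one reference-counted proxy between several users can be sketched as follows. This is a simplified illustration, not EmPy's proxy: the class StdoutProxy and the functions install_proxy and uninstall_proxy are hypothetical names, and the real proxy does considerably more (such as routing output to the active interpreter).

```python
import sys


class StdoutProxy:
    """Hypothetical stand-in for a shared sys.stdout proxy."""

    def __init__(self, original):
        self.original = original  # the stream that was in place before
        self.references = 0       # how many users currently need the proxy

    def write(self, data):
        return self.original.write(data)

    def flush(self):
        self.original.flush()


def install_proxy():
    """Install the proxy if needed and count one more user."""
    if not isinstance(sys.stdout, StdoutProxy):
        sys.stdout = StdoutProxy(sys.stdout)
    sys.stdout.references += 1
    return sys.stdout


def uninstall_proxy():
    """Drop one user; restore the original stream when nobody is left."""
    proxy = sys.stdout
    if not isinstance(proxy, StdoutProxy):
        raise RuntimeError("proxy is not installed")
    proxy.references -= 1
    if proxy.references == 0:
        sys.stdout = proxy.original
```

With this arrangement two users share a single proxy object, and the original stream comes back only after the second one releases it.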

Error dispatchers and handlers

There are now explicit error dispatchers and handlers, which let the user specify whether and how EmPy errors are handled when they occur.

Hooks expanded and now can be used to override default interpreter behavior

Hooks have been significantly overhauled and have now been extended to allow return values for pre... and before... methods to override the standard behavior of the EmPy interpreter.
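
The general pattern of a "before" hook whose return value can suppress the default behavior might look like the sketch below. All names here (Hook, RedactingHook, before_expand, expand) are invented for illustration and are not the EmPy hook API; consult the documentation for the real method names and signatures.

```python
class Hook:
    """Hypothetical base hook: by default it does not claim the event."""

    def before_expand(self, text):
        return False


class RedactingHook(Hook):
    """Claims any text containing 'secret' so the default step is skipped."""

    def __init__(self):
        self.claimed = []

    def before_expand(self, text):
        if "secret" in text:
            self.claimed.append(text)
            return True
        return False


def expand(text, hooks):
    """Run the hooks first; do the default work only if none claimed the event."""
    for hook in hooks:
        if hook.before_expand(text):
            return None  # a hook handled it, so skip normal processing
    return text.upper()  # stand-in for the interpreter's normal behavior
```

A hook that returns a false value leaves normal processing untouched, which keeps existing hooks working when they only observe events.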

Interpreter cores can override underlying interpreter behavior

By default, of course, EmPy markup is translated into Python code, but this can be completely overridden with the use of interpreter cores.

Serious bugs fixed

Several serious bugs have been fixed, including a nasty O(n^2) complexity problem when parsing EmPy files containing statement markup containing many lines of text.

Full unit and system test suite

There is now a full-fledged unit and system test regimen available via the test.sh script.

Extensive builtin help system

There is now a help system (implemented with the emhelp module if present) for getting help from the command line.

Documentation rewritten and expanded

The documentation has been completely rewritten and expanded to be as comprehensive as possible.

Upgrading from 3.x to 4.x

EmPy 4.x is largely compatible with 3.x, especially with regards to basic syntax, but some incompatibilities were necessary to move forward, particularly when using the embedding API. If you are upgrading from 3.x to 4.x, here are the changes in 4.x which may affect you. See the documentation for more information.

Changed markup

“repr” markup replaced with backquote literal markup: @`...`

The “repr” markup has been removed and replaced with backquote literal markup. If you were using “repr” markup, do so explicitly with the expression markup, e.g., @(repr(...)).

Removed literal close parenthesis, bracket and brace markup: @), @], @}

These served no real purpose and have been removed. Just use an actual close parenthesis, bracket or brace instead.

If-then-else expression no longer supports : for “else”: @(...?...!...)

The use of : for the else delimiter in extended expressions was previously deprecated and has been removed in EmPy 4.0; use ! instead.

In-place markup replaced with emoji markup; in-place markup is now @$...$...$

In-place markup has changed form; it is now @$...$...$. @:...: is now used for emoji markup.

Context line markup (@!...) no longer attempts to pre-adjust line

Specifying the context line via markup previously attempted to adjust the line number so that the next line was the one specified in the markup. This was error-prone and confusing; now no such attempt is made, and the context affected is the line containing the markup, rather than the next one.

Custom (and extension) markup now parsed more sensibly: @<...>

Custom markup previously did not support contents containing a right angle bracket > except if it was preceded by a backslash; now custom markup is parsed by matching the number of left angle brackets that start the markup with the same number of right angle brackets that end it, allowing both left and right angle brackets to appear in its contents, e.g., @<<<This contains <angle brackets>.>>>. This also applies to other extension markup (e.g., @((...))).
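
The matching rule can be sketched in a few lines of Python. This is a deliberately simplified illustration, not EmPy's parser: the function name parse_angle_markup is hypothetical, it expects the text to begin right after the prefix, and it takes the first run of closing brackets of the right length, so contents that themselves end in > are not handled.

```python
def parse_angle_markup(text, start=0):
    """Return (contents, end_index) for markup whose opening < run begins at start."""
    count = 0
    i = start
    # Count the run of opening angle brackets.
    while i < len(text) and text[i] == "<":
        count += 1
        i += 1
    if count == 0:
        raise ValueError("markup must start with '<'")
    # The markup ends at the first run of the same number of closing brackets.
    end = text.find(">" * count, i)
    if end < 0:
        raise ValueError("unterminated markup")
    return text[i:end], end + count


contents, end = parse_angle_markup("<<<This contains <angle brackets>.>>>")
print(repr(contents))
```

Using more brackets to open the markup is what makes it safe for the contents to include shorter bracket runs.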

Other changes

Relicensed to BSD

EmPy 4.0 was relicensed from LGPL to BSD. If your use of EmPy is affected by its license, you may need to take this into account.

Interpreter constructor and global expand function call now require keyword arguments

The arguments of the Interpreter(...) constructor (and of expand) have changed over time, causing confusion. As of EmPy 4.0, the use of keyword arguments is required. This will generate a clear error the first time the change is encountered, but will cause no further problems even with additional changes.
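
Why requiring keyword arguments gives one clear error up front can be seen with Python 3's keyword-only syntax. The function make_interpreter and its parameters are hypothetical and exist only for this illustration; this is not the Interpreter signature, and EmPy, which also supports older Pythons, does not necessarily enforce the rule this way.

```python
def make_interpreter(*, output=None, config=None):
    """Hypothetical factory; every argument after * must be passed by keyword."""
    return {"output": output, "config": config}


# Keyword call: unaffected if parameters are later added or reordered.
interp = make_interpreter(config="my-config")

# Positional call: rejected immediately with a TypeError.
try:
    make_interpreter("my-config")
except TypeError as error:
    print("rejected:", error)
```

Because callers never depend on argument order, later additions to the signature cannot silently change what an existing call means.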

Errors when calling global expand function now raise by default

When calling the standalone expand function, the behavior is now that exceptions raised during the expansion will be passed up to the caller. To change this behavior so that the ephemeral interpreter handles the exception, set dispatcher to True.

Specifying locals when calling expand has changed

Previously, extra keyword arguments to the standalone expand function were treated as a locals dictionary. Since expand has been changed to use keyword arguments to specify all the arguments for compatibility reasons, locals now need to be specified with an (optional) locals dictionary argument.

Use -d/--delete-on-error instead of “fully buffered files”

“Fully buffered files” was a method of deferring all output until the file was closed successfully to assist supporting the use of EmPy in a build system such as GNU Make. This was awkwardly named (it had nothing to do with actual file buffering) and was error-prone; use -d/--delete-on-error instead.
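
The reason this matters for build systems is that a partially written output file with a fresh timestamp can look up to date. The delete-on-error idea can be sketched in plain Python as below; this is an illustration of the concept only, not how EmPy implements the option, and the names write_or_delete and produce are hypothetical.

```python
import os


def write_or_delete(path, produce):
    """Write output via produce(file); remove the file if anything goes wrong."""
    try:
        with open(path, "w") as output:
            produce(output)
    except BaseException:
        # The file is already closed here; remove the partial output so a
        # build tool does not mistake it for a finished target, then re-raise.
        if os.path.exists(path):
            os.remove(path)
        raise
```

On success the file is left in place; on failure the caller still sees the original exception and no stale target remains.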

Cleaned up environment variable names

The environment variable names have been cleaned up and expanded. If you are using environment variables with EmPy, check to see whether the variables you are using have changed.

Options dictionary replaced with full-fledged configurations

If you are using the options dictionary, this has been replaced with a Configuration class starting in EmPy 4.0.

Filter shortcuts removed and filter API revised

Filter “shortcuts” (special objects representing certain types of filters) are un-Pythonic and have been removed. Also, the API has changed to be more clear.

Contexts now track name, line, column, and character (Unicode code point) count

The identify pseudomodule/interpreter method now returns a 4-tuple (name, line number, column number, and character count), and formatted contexts now include three items separated by colons (name, line number, and column number). Custom context formats can be specified if desired.

Diversions API method names changed to be more clear

The API for diversions has been slightly changed, in particular distinguishing between methods which apply to diversions vs. diversion names.

Hook API completely revised; many hook events added

The hook API has been substantially revised and rationalized; it now supports overriding standard behavior.

Hook, Filter classes are now in emlib

The auxiliary Hook and Filter classes are now in a dedicated emlib module.

New emdoc module

There’s a new emdoc module used for creating this documentation.

Exposed global attributes on interpreter simplified; now only version, major, minor and compat

Previously, the interpreter exposed many auxiliary attributes; this has been simplified to just the ones that identify the running EmPy system.

Use argv interpreter attribute instead of args

Previously the interpreter exposed both argv and args attributes to represent EmPy script arguments, with argv corresponding to sys.argv (i.e., it includes the script name as argv[0]) and args being equivalent to argv[1:]. This was redundant and so args has been removed; use argv instead.

Full list of changes between EmPy 3.x and 4.x

  • Re-licensed from LGPL to BSD

  • Completely rewrote and expanded this documentation

  • Some serious O(n^2) inefficiencies in parsing and re-parsing after transient errors have been fixed

  • Some environment variable name cleanup

  • Added an optional library module for non-essential support classes (emlib)

  • Added a full-fledged, but optional, help system (emhelp)

  • Added a module to assist with generating (this) documentation (emdoc)

  • Interpreter constructor redesigned; recommend always using keyword arguments when creating an Interpreter

  • Configuration objects: If unknown configuration attributes are set or if invalid configurations are detected, the interpreter will raise a ConfigurationError; this replaces the under-used “options” concept from the interpreter

  • Configuration resource files: -c/--config-file=FILENAME

  • The “Unicode subsystem” backend (originally needed for seamless Unicode compatibility when Python 2 was released) was completely reworked; the -u/--unicode option still exists but now means nothing more than --binary

  • Full support for selecting Unicode encodings and error handlers for both input files and output files; exceptions are raised in the event that incompatible options are detected

  • Added full support for specifying file buffering

  • Specifying no EmPy script on the command line, or using the -i/--interactive option, goes into interactive mode, which is always line-buffered

  • Cleaned up escape codes and added a few extensions

  • The context line token (@!...) no longer attempts to adjust the line number so that the following line is the specified line number; this was error prone and potentially confusing

  • The stdout proxy file object is now installed only when needed, is now reference counted, and is checked for consistency (calling out interfering interpreters)

  • Filter shortcuts have been removed as they were un-Pythonic; corresponding classes exist in emlib

  • Filters can now also be prepended

  • Context strings now include the filename, line number and the column number

  • Context identifiers (empy.identify()) are now a 4-tuple consisting of the filename, line number, column number, and character count

  • Context methods now properly affect both the context and currents stack

  • Added support for context formatting

  • Some interpreter methods were inconsistently handling the context stack; this was addressed and approved methods are now documented

  • “Fully buffered files” (itself something of a misnomer) have been removed; use -d/--delete-on-error instead

  • Added an option for no output (-q/--no-output)

  • Added -S/--string=STR option

  • Added -Q/--postprocess=FILENAME option

  • Added -G/--postfile=FILENAME option

  • Diversions now have a name attribute

  • Added support for diversions spooling to files

  • Pseudomodule routines which manipulate diversion names rather than diversions are now so named

  • Completely revised hook API

  • Added many hook events

  • Added interpreter cores

  • Having a missing custom callback function is now an error by default

  • Added error handlers and dispatchers

  • Added brief vs. verbose errors

  • Hooks named pre... or before... can return a true value to indicate they’ve handled the event and that the normal processing should be skipped

  • The ability to use : in if-then-else expression markup to delimit the “else” condition, which was previously deprecated, has been removed; use ! instead (e.g., @(...?...!...))

  • If-then-else expression markup has been extended so it can be chained indefinitely (if-then-if-then-…-else): @(...?...!...?...!...)

  • Newlines in expressions (say, due to word wrap) are replaced with spaces before evaluation if --no-replace-newlines is not specified

  • Removed literal close parenthesis, bracket, and brace markup: @), @], @}

  • expand interpreter method and global function now consistently reraise errors to caller

  • Added inline comments: @*...*

  • Changed unnecessary “repr” markup to a backquote literal markup: @`...` (for the equivalent of “repr” markup, just use @repr(...))

  • Added functional expressions: @f{...}

  • Added multiline significators: @%%...%%

  • Added stringized significators @%!... and multiline stringized significators @%%!...%%

  • In-place markup was changed from @:...:...: to @$...$...$

  • Added more escape codes (@\...)

  • Added named escapes (@\^{...})

  • Added diacritic markup: @^...

  • Added icon markup: @|...

  • Added emoji markup: @:...:

  • Added extension markup: @((...)), @[[...]], @{{...}}, @<...>, etc. with extensions

  • Added support for all legal usages of @[finally] and @[else] within @[try] control markup

  • Added support for @[else] within @[while ...] control markup

  • Added new @[dowhile ...] control markup

  • Added new @[with ...] control markup

  • Added new @[defined ...] control markup

  • Added new @[match ...] control markup

  • Newlines are allowed within control markup; they are treated as spaces

  • Custom markup (@<...>) and other extension markups are parsed more simply and usefully

  • Automatically swap markup symbols for an alternate prefix if there is a conflict (e.g., when specifying that the prefix is $ instead of @, @$...$...$ markup becomes $@...@...@)

  • Removed VERSION (EmPy version) interpreter attribute and replaced it with version; also added major, minor (detected Python version) and compat (list of compatibility features that needed to be enabled)

  • Removed Interpreter, Hook and Filter aliases as interpreter attributes; use em or emlib instead

  • Removed args interpreter attribute; use argv instead

  • Raise a ConsistencyError for problematic issues detected before and after running

  • Add system information reporting: -W/--info

  • Add a details system for submitting bug reports: -Z/--details

  • … And did a great deal of refactoring!