Syntax¶
This chapter describes how Hy source code is understood at the level of text,
as well as the abstract syntax objects that the reader (a.k.a. the parser)
turns text into, as when invoked with hy.read. The basic units of syntax at the
textual level are called forms, and the basic objects representing forms are
called models.
Following Python, Hy is in general case-sensitive. For example, foo and FOO are
different symbols, and the Python-level variables they refer to are also
different.
An introduction to models¶
Reading a Hy program produces a nested structure of model objects. Models can
be very similar to the kind of value they represent (such as Integer, which is
a subclass of int) or they can be somewhat different (such as Set, which is
ordered, unlike actual sets). All models inherit from Object, which stores
textual position information, so tracebacks can point to the right place in the
code. The compiler takes whatever models are left over after parsing and
macro-expansion and translates them into Python ast nodes (e.g., Integer
becomes ast.Constant), which can then be evaluated or rendered as Python code.
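To make the last step concrete, the following Python sketch builds an ast.Constant node by hand, then evaluates it and renders it as source. It only illustrates what "evaluated or rendered as Python code" means for an ast node; it is not how Hy's compiler is implemented, and the filename "<sketch>" is an arbitrary placeholder.

```python
import ast

# Wrap a constant in an Expression node, roughly what a compiler back end
# might emit for the integer literal 5.
tree = ast.Expression(body=ast.Constant(value=5))
ast.fix_missing_locations(tree)  # fill in the position info compile() needs

# The same tree can be evaluated...
value = eval(compile(tree, "<sketch>", "eval"))

# ...or rendered as Python source text (ast.unparse needs Python 3.9+).
source = ast.unparse(tree)

print(value, source)
```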
Macros (that is, regular macros, as opposed to reader macros) operate on the
model level, taking some models as arguments and returning more models for
compilation or further macro-expansion; they're free to do quite different
things with a given model than the compiler does, if it pleases them to, like
using an Integer to construct a Symbol.
In general, a model doesn't count as equal to the value it represents. For
example, (= (hy.models.String "foo") "foo") returns False. But you can promote
a value to its corresponding model with hy.as-model, or you can demote a model
with the usual Python constructors like str or int, or you can evaluate a model
as Hy code with hy.eval.
Models can be created with the constructors, with the quote or quasiquote
macros, or with hy.as-model. Explicit creation is often not necessary, because
the compiler will automatically promote (via hy.as-model) any object it's
trying to evaluate.
Note that when you want plain old data structures and don't intend to produce
runnable Hy source code, you'll usually be better off using Python's basic data
structures (tuple, list, dict, etc.) than models. Yes, "homoiconicity" is a fun
word, but a Hy List won't provide any advantage over a Python list when you're
managing a list of email addresses or something.
The default representation of models (via hy.repr) uses quoting for
readability, so (hy.models.Integer 5) is represented as '5. Python
representations (via repr()) use the constructors, and by default are
pretty-printed; you can disable this globally by setting hy.models.PRETTY to
False, or temporarily with the context manager hy.models.pretty.
- class hy.models.Object¶
An abstract base class for Hy models, which represent forms.
- class hy.models.Lazy(gen)¶
The output of hy.read-many. It represents a sequence of forms, and can be treated as an iterator. Reading each form lazily, only after evaluating the previous form, is necessary to handle reader macros correctly; see hy.read-many.
Non-form syntactic elements¶
Shebang¶
If a Hy program begins with #!, Hy assumes the first line is a shebang line and
ignores it. It's up to your OS to do something more interesting with it.
Shebangs aren't real Hy syntax, so hy.read-many only allows them if its option
skip_shebang is enabled.
Whitespace¶
Hy has lax whitespace rules less similar to Python's than to those of most
other programming languages. Whitespace can separate forms (e.g., a b is two
forms whereas ab is one) and it can occur inside some forms (like string
literals), but it's otherwise ignored by the reader, producing no models.
The reader only grants this special treatment to the ASCII whitespace characters, namely U+0009 (horizontal tab), U+000A (line feed), U+000B (vertical tab), U+000C (form feed), U+000D (carriage return), and U+0020 (space). Non-ASCII whitespace characters, such as U+2009 (THIN SPACE), are treated as any other character. So yes, you can have exotic whitespace characters in variable names, although this is only especially useful for obfuscated code contests.
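As a rough illustration of the ASCII-only rule, here is a Python sketch that separates text on the six characters listed above and nothing else. The helper name split_on_ascii_whitespace is hypothetical, and the sketch ignores everything else the reader does (string literals, comments, brackets); it is only meant to show how U+2009 ends up inside a name.

```python
import re

def split_on_ascii_whitespace(text):
    """Hypothetical helper: split text on runs of ASCII whitespace only."""
    # Tab, line feed, vertical tab, form feed, carriage return, and space.
    pieces = re.split(r"[\t\n\x0b\x0c\r ]+", text)
    return [piece for piece in pieces if piece]

# An ordinary space separates two pieces.
plain = split_on_ascii_whitespace("a b")

# U+2009 (THIN SPACE) is not ASCII whitespace, so it stays inside one piece.
exotic = split_on_ascii_whitespace("a\u2009b")

# Python's own str.split() treats U+2009 as whitespace, for comparison.
python_view = "a\u2009b".split()

print(plain, exotic, python_view)
```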
Discard prefix¶
Like Clojure, Hy supports the Extensible Data Notation discard prefix #_, which
is kind of like a structure-aware comment. When the reader encounters #_, it
reads and then discards the following form. Thus #_ is like ; except that
reader macros still get executed, and normal parsing resumes after the next
form ends rather than at the start of the next line: [dilly #_ and krunk] is
equivalent to [dilly krunk], whereas [dilly ; and krunk] is equivalent to just
[dilly. Comments indicated by ; can be nested within forms discarded by #_, but
#_ has no special meaning within a comment indicated by ;.
Identifiers¶
Identifiers are a broad class of syntax in Hy, comprising not only variable
names, but any nonempty sequence of characters that are neither ASCII
whitespace nor one of the following: ()[]{};"'`~. The reader will attempt to
read an identifier as each of the following types, in the given order:
1. Numeric literals
2. Keywords
3. Dotted identifiers
4. Symbols
Numeric literals¶
All of Python's syntax for numeric literals is supported in Hy, resulting in an
Integer, Float, or Complex. Hy also provides a few extensions:
- Commas (,) can be used like underscores (_) to separate digits without changing the result. Thus, 10_000_000_000 may also be written 10,000,000,000. Hy is also more permissive about the placement of separators than Python: several may be in a row, and they may be after all digits, after ., e, or j, or even inside a radix prefix. Separators before the first digit are still forbidden because e.g. _1 is a legal Python variable name, so it's a symbol in Hy rather than an integer.
- Integers can begin with leading zeroes, even without a radix prefix like 0x. Leading zeroes don't automatically cause the literal to be interpreted in octal like they do in C. For octal, use the prefix 0o, as in Python.
- NaN, Inf, and -Inf are understood as literals. Each produces a Float. These are case-sensitive, unlike other uses of letters in numeric literals (1E2, 0XFF, 5J, etc.).
- Hy allows complex literals as understood by the constructor for complex, such as 5+4j. (This is also legal Python, but Hy reads it as a single Complex, and doesn't otherwise support infix addition or subtraction, whereas Python parses it as an addition expression.)
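The values these extensions produce can be previewed with Python's own constructors. The sketch below is only an analogy under stated assumptions: it maps commas to underscores by hand because Python itself rejects commas, and Python's float() ignores case whereas Hy's NaN and Inf literals are case-sensitive.

```python
import math

# Commas act like underscores; Python's int() already accepts underscores.
ten_billion = int("10,000,000,000".replace(",", "_"))

# Leading zeroes do not switch to octal; the 0o prefix does.
decimal_ten = int("010")
octal_eight = int("0o10", 8)

# The complex() constructor reads "5+4j" as one number, not as an addition.
z = complex("5+4j")

# float() yields the special values (it ignores case, unlike Hy's reader).
nan = float("NaN")
neg_inf = float("-Inf")

print(ten_billion, decimal_ten, octal_eight, z, nan, neg_inf)
```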
Keywords¶
An identifier starting with a colon (:), such as :foo, is a Keyword.
Literal keywords are most often used for their special treatment in
expressions that aren't macro calls: they set keyword arguments, rather than
being passed in as values. For example, (f :foo 3) calls the function f with
the parameter foo set to 3. The keyword is also mangled at compile-time. To
prevent a literal keyword from being treated specially in an expression, you
can quote the keyword, or you can use it as the value for another keyword
argument, as in (f :foo :bar).
Otherwise, keywords are simple model objects that evaluate to themselves. Users
of other Lisps should note that it's often a better idea to use a string than a
keyword, because the rest of Python uses strings for cases in which other Lisps
would use keywords. In particular, strings are typically more appropriate than
keywords as the keys of a dictionary. Notice that (dict :a 1 :b 2) is
equivalent to {"a" 1 "b" 2}, which is different from {:a 1 :b 2} (see
Dictionary literals).
The empty keyword : is syntactically legal, but you can't compile a function
call with an empty keyword argument because of Python limitations. Thus
(foo : 3) must be rewritten to use runtime unpacking, as in (foo #** {"" 3}).
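The Python calls that these forms correspond to can be sketched as follows. The function f is hypothetical and simply returns the keyword arguments it receives; the comments give the Hy form each Python line is meant to mirror, based on the description above.

```python
def f(**kwargs):
    """Hypothetical function that returns whatever keyword arguments it got."""
    return kwargs

# (f :foo 3) sets the parameter foo, like this Python call.
named = f(foo=3)

# (dict :a 1 :b 2) builds string keys, the same as {"a" 1 "b" 2}.
by_keywords = dict(a=1, b=2)

# Python has no syntax for an empty parameter name, so the empty keyword
# needs runtime unpacking, as in (f #** {"" 3}).
empty_name = f(**{"": 3})

print(named, by_keywords, empty_name)
```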
- class hy.models.Keyword(value, from_parser=False)¶
Represents a keyword, such as :foo.
- Variables:
name -- The string content of the keyword, not including the leading :. No mangling is performed.
- __bool__()¶
The empty keyword : is false. All others are true.
- __call__(data, default=<object object>)¶
Get the element of data named (hy.mangle self.name). Thus, (:foo bar) is equivalent to (get bar "foo") (which is different from (get bar :foo); dictionary keys are typically strings, not hy.models.Keyword objects).
The optional second parameter is a default value; if provided, any KeyError from get will be caught, and the default returned instead.
Dotted identifiers¶
Dotted identifiers are named for their use of the dot character ., also known
as a period or full stop. They don't have their own model type because they're
actually syntactic sugar for expressions. Syntax like foo.bar.baz is equivalent
to (. foo bar baz). The general rule is that a dotted identifier looks like two
or more symbols (themselves not containing any dots) separated by single dots.
The result is an expression with the symbol . as its first element and the
constituent symbols as the remaining elements.
A dotted identifier may also begin with one or more dots, as in .foo.bar or
..foo.bar, in which case the resulting expression has the appropriate head (.
or .. or whatever) and the symbol None as the following element. Thus,
..foo.bar is equivalent to (.. None foo bar). In the leading-dot case, you may
also use only one constituent symbol. Thus, .foo is a legal dotted identifier,
and equivalent to (. None foo).
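The expansion rule can be sketched in a few lines of Python. This is a hypothetical helper, not the reader's actual code: plain strings stand in for symbols, a list stands in for the resulting expression, and anything that isn't a dotted identifier under the rules above raises ValueError.

```python
def expand_dotted(identifier):
    """Hypothetical helper: expand a dotted identifier into a list standing
    in for the resulting expression, with strings standing in for symbols."""
    rest = identifier.lstrip(".")
    n_leading = len(identifier) - len(rest)
    parts = rest.split(".")
    # Constituent symbols must be nonempty, so "a..b", "a.", and "..." fail.
    if not all(parts):
        raise ValueError(f"not a dotted identifier: {identifier!r}")
    # Without leading dots, at least two constituent symbols are required.
    if n_leading == 0 and len(parts) < 2:
        raise ValueError(f"not a dotted identifier: {identifier!r}")
    if n_leading:
        # The head is the run of leading dots, followed by the symbol None.
        return ["." * n_leading, "None", *parts]
    return [".", *parts]

print(expand_dotted("foo.bar.baz"))
print(expand_dotted("..foo.bar"))
print(expand_dotted(".foo"))
```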
See the dot macro for what these expressions typically compile to. See also the
special behavior for expressions that begin with a dotted identifier that
itself begins with a dot. Note that Hy provides definitions of . and ... by
default, but not .., ...., ....., etc., so ..foo.bar won't do anything useful
by default outside of macros that treat it specially, like import.
Symbols¶
Symbols are the catch-all category of identifiers. In most contexts, symbols
are compiled to Python variable names, after being mangled. You can create
symbol objects with the quote operator or by calling the Symbol constructor
(thus, Symbol plays a role similar to the intern function in other Lisps). Some
example symbols are hello, +++, 3fiddy, $40, just✈wrong, and 🦑.
Dots are only allowed in a symbol if every character in the symbol is a dot.
Thus, a..b and a. are neither dotted identifiers nor symbols; they're syntax
errors.
As a special case, the symbol ... compiles to the Ellipsis object, as in
Python.
Mangling¶
Since the rules for Hy symbols and keywords are much more permissive than the rules for Python identifiers, Hy uses a mangling algorithm to convert its own names to Python-legal names. The steps are as follows:
1. Remove any leading underscores. Underscores are typically the ASCII underscore _, but they may also be any Unicode character that normalizes (according to NFKC) to _. Leading underscores have special significance in Python, and Python normalizes all Unicode before this test, so we'll process the remainder of the name and then add the leading underscores back onto the final mangled name.
2. Convert ASCII hyphens (-) to underscores (_). Thus, foo-bar becomes foo_bar. If the name at this step starts with a hyphen, this first hyphen is not converted, so that we don't introduce a new leading underscore into the name. Thus --has-dashes? becomes -_has_dashes? at this step.
3. If the name still isn't Python-legal, make the following changes. A name could be Python-illegal because it contains a character that's never legal in a Python name or it contains a character that's illegal in that position.
   - Prepend hyx_ to the name.
   - Replace each illegal character with XfooX, where foo is the Unicode character name in lowercase, with spaces replaced by underscores and hyphens replaced by H. Replace leading hyphens and X itself the same way. If the character doesn't have a name, use U followed by its code point in lowercase hexadecimal.
   Thus, green☘ becomes hyx_greenXshamrockX and -_has_dashes becomes hyx_XhyphenHminusX_has_dashes.
4. Take any leading underscores removed in the first step, transliterate them to ASCII, and add them back to the mangled name. Thus, __green☘ becomes __hyx_greenXshamrockX.
5. Finally, normalize any leftover non-ASCII characters. The result may still not be ASCII (e.g., α is already Python-legal and normalized, so it passes through the whole mangling procedure unchanged), but it is now guaranteed that any names are equal as strings if and only if they refer to the same Python identifier.
You can invoke the mangler yourself with the function hy.mangle, and try to
undo this (perhaps not quite successfully) with hy.unmangle.
Mangling isn't something you should have to think about often, but you may see
mangled names in error messages, the output of hy2py, etc. A catch to be aware
of is that mangling, as well as the inverse "unmangling" operation offered by
hy.unmangle, isn't one-to-one. Two different symbols, like foo-bar and foo_bar,
can mangle to the same string and hence compile to the same Python variable.
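The mangling steps listed in this section can be approximated with Python's standard library alone. The function below, mangle_sketch, is a simplified illustration written from that description, not Hy's real hy.mangle: it leaves out any special cases not described here (Python reserved words and dotted names, for instance), so treat it as a reading aid rather than a reference.

```python
import unicodedata

def mangle_sketch(name):
    """Simplified sketch of the mangling steps; not Hy's real hy.mangle."""
    # Step 1: set aside leading underscores (anything NFKC-normalizing to "_").
    count = 0
    while (count < len(name)
           and unicodedata.normalize("NFKC", name[count]) == "_"):
        count += 1
    leading, name = "_" * count, name[count:]

    # Step 2: convert hyphens to underscores, except one in first position.
    name = name[:1] + name[1:].replace("-", "_")

    # Step 3: if the result isn't a legal Python name, spell out bad characters.
    if not (leading + name).isidentifier():
        def spell(char):
            # Prefixing "S" tests legality in a non-initial position.
            if char != "X" and ("S" + char).isidentifier():
                return char
            label = unicodedata.name(char, "").lower()
            label = label.replace("-", "H").replace(" ", "_")
            return "X" + (label or "U{:x}".format(ord(char))) + "X"
        name = "hyx_" + "".join(spell(char) for char in name)

    # Steps 4 and 5: restore the underscores as ASCII, then normalize.
    return unicodedata.normalize("NFKC", leading + name)

# Two different symbols can collide after mangling.
print(mangle_sketch("foo-bar"), mangle_sketch("foo_bar"))
print(mangle_sketch("__green\u2618"))
```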
String literals¶
Hy allows double-quoted strings (e.g., "hello"), but not single-quoted strings
like Python. The single-quote character ' is reserved for preventing the
evaluation of a form (e.g., '(+ 1 1)), as in most Lisps (see Additional sugar).
Python's so-called triple-quoted strings (e.g., '''hello''' and """hello""")
aren't supported, either. However, in Hy, unlike Python, any string literal can
contain newlines; furthermore, Hy has bracket strings. For consistency with
Python's triple-quoted strings, all literal newlines in literal strings are
read as in "\n" (U+000A, line feed) regardless of the newline style in the
actual code.
String literals support a variety of backslash escapes. Unrecognized escape
sequences are a syntax error. To create a "raw string" that interprets all
backslashes literally, prefix the string with r, as in r"slash\not".
By default, all string literals are regarded as sequences of Unicode
characters. The result is the model type String. You may prefix a string
literal with b to treat it as a sequence of bytes, producing Bytes instead.
Unlike Python, Hy only recognizes string prefixes (r, b, and f) in lowercase,
and doesn't allow the no-op prefix u.
F-strings are a string-like compound construct documented further below.
- class hy.models.String(s=None, brackets=None)¶
Represents a literal string (str).
- Variables:
brackets -- The custom delimiter used by the bracket string that parsed to this object, or None if it wasn't a bracket string. The outer square brackets and # aren't included, so the brackets attribute of the literal #[[hello]] is the empty string.
Bracket strings¶
Hy supports an alternative form of string literal called a "bracket string"
similar to Lua's long brackets. Bracket strings have customizable delimiters,
like the here-documents of other languages. A bracket string begins with
#[FOO[ and ends with ]FOO], where FOO is any string not containing [ or ],
including the empty string. (If FOO is exactly f or begins with f-, the bracket
string is interpreted as an f-string.) For example:
(print #[["That's very kind of yuo [sic]" Tom wrote back.]])
; "That's very kind of yuo [sic]" Tom wrote back.
(print #[==[1 + 1 = 2]==])
; 1 + 1 = 2
Bracket strings are always raw Unicode strings, and don't allow the r or b
prefixes.
A bracket string can contain newlines, but if it begins with one, the newline is removed, so you can begin the content of a bracket string on the line following the opening delimiter with no effect on the content. Any leading newlines past the first are preserved.
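The delimiter and leading-newline rules can be approximated with a regular expression. The Python sketch below is an assumption-laden illustration, not Hy's reader: the names BRACKET_STRING and read_bracket_string are hypothetical, the pattern only reads a bracket string at the start of the text, and it does nothing about the f-string case.

```python
import re

# #[FOO[ ... ]FOO], where FOO contains no square brackets. A single newline
# right after the opening delimiter is dropped from the content.
BRACKET_STRING = re.compile(r"#\[([^\[\]]*)\[\n?(.*?)\]\1\]", re.DOTALL)

def read_bracket_string(text):
    """Hypothetical helper: return (delimiter, content) for a bracket string
    found at the start of text."""
    match = BRACKET_STRING.match(text)
    if match is None:
        raise ValueError("not a bracket string")
    return match.group(1), match.group(2)

print(read_bracket_string("#[==[1 + 1 = 2]==]"))
print(read_bracket_string("#[[\n\nsecond newline kept]]"))
```

Because the content is raw, the sketch performs no backslash processing, and a closing ]] or ]FOO] is recognized only when it matches the opening delimiter.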
Sequential forms¶
Sequential forms (Sequence) are nested forms comprising any number of other
forms, in a defined order.
- class hy.models.Sequence(iterable=(), /)¶
An abstract base class for sequence-like forms. Sequence models can be operated on like tuples: you can iterate over them, index into them, and append them with +, but you can't add, remove, or replace elements. Appending a sequence to another iterable object reuses the class of the left-hand-side object, which is useful when e.g. you want to concatenate models in a macro.
When you're recursively descending through a tree of models, testing a model with (isinstance x hy.models.Sequence) is useful for deciding whether to iterate over x. You can also use the Hyrule function coll? for this purpose.
Expressions¶
Expressions (Expression) are denoted by parentheses: ( … ). The compiler
evaluates expressions by checking the first element, called the head.
1. If the head is a symbol, and the symbol is the name of a currently defined macro, the macro is called.
   - Exception: if the symbol is also the name of a function in hy.pyops, and one of the arguments is an unpack-iterable form, the pyops function is called instead of the macro. This makes reasonable-looking expressions work that would otherwise fail. For example, (+ #* summands) is understood as (hy.pyops.+ #* summands), because Python provides no way to sum a list of unknown length with a real addition expression.
2. If the head is itself an expression of the form (. None …) (typically produced with a dotted identifier like .add), it's used to construct a method call with the element after None as the object: thus, (.add my-set 5) is equivalent to ((. my-set add) 5), which becomes my_set.add(5) in Python.
   - Exception: expressions like ((. hy R module-name macro-name) …), or equivalently (hy.R.module-name.macro-name …), get special treatment. They require the module module-name and call its macro macro-name, so (hy.R.foo.bar 1) is equivalent to (require foo) (foo.bar 1), but without bringing foo or foo.bar into scope. Thus hy.R is convenient syntactic sugar for macros you'll only call once in a file, or for macros that you want to appear in the expansion of other macros without having to call require in the expansion. As with hy.I, dots in the module name must be replaced with slashes.
3. Otherwise, the expression is compiled into a Python-level call, with the head being the calling object. (So, you can call a function that has the same name as a macro with an expression like ((do setv) …).) The remaining forms are understood as arguments. Use unpack-iterable or unpack-mapping to break up data structures into individual arguments at runtime.
The empty expression () is legal at the reader level, but has no inherent
meaning. Trying to compile it is an error. For the empty tuple, use #().
- class hy.models.Expression(iterable=(), /)¶
Represents a parenthesized Hy expression.
List, tuple, and set literals¶
Dictionary literals¶
Literal dictionaries (dict, Dict) are denoted by { … }. Even-numbered child
forms (counting the first as 0) become the keys whereas odd-numbered child
forms become the values. For example, {"a" 1 "b" 2} produces a dictionary
mapping "a" to 1 and "b" to 2. Trying to compile a Dict with an odd number of
child models is an error.
As in Python, calling dict with keyword arguments is often more convenient than
using a literal dictionary.
- class hy.models.Dict(iterable=(), /)¶
Represents a literal dict. keys, values, and items methods are provided, each returning a list, although this model type does none of the normalization of a real dict. In the case of an odd number of child models, keys returns the last child whereas values and items ignore it.
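The odd-length behavior is easy to picture with plain Python lists. The sketch below mirrors the description using slicing and zip; it is an analogy built on ordinary lists, with a made-up list of child forms, and makes no claim about how hy.models.Dict is implemented.

```python
# Child forms of a hypothetical {"a" 1 "b" 2 "c"}, which has an odd count.
children = ["a", 1, "b", 2, "c"]

keys = children[0::2]            # even positions, including the unpaired "c"
values = children[1::2]          # odd positions
items = list(zip(keys, values))  # zip stops at the shorter list

print(keys, values, items)
```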
Format strings¶
A format string (or "f-string", or "formatted string literal") is a string
literal with embedded code, possibly accompanied by formatting commands. The
result is an FString. Hy f-strings work much like Python f-strings except that
the embedded code is in Hy rather than Python.
(print f"The sum is {(+ 1 1)}.") ; => The sum is 2.
Since =, !, and : are identifier characters in Hy, Hy decides where the code in
a replacement field ends (and any debugging =, conversion specifier, or format
specifier begins) by parsing exactly one form. You can use do to combine
several forms into one, as usual. Whitespace may be necessary to terminate the
form:
(setv foo "a")
(print f"{foo:x<5}") ; => NameError: name 'hyx_fooXcolonXxXlessHthan_signX5' is not defined
(print f"{foo :x<5}") ; => axxxx
Unlike Python, whitespace is allowed between a conversion and a format
specifier. Also unlike Python, comments and backslashes are allowed in
replacement fields. The same reader is used for the form to be evaluated as for
elsewhere in the language. Thus e.g. f"{"a"}" is legal, and equivalent to "a".
- class hy.models.FString(s=None, brackets=None)¶
Represents a format string as an iterable collection of hy.models.String and hy.models.FComponent. The design mimics ast.JoinedStr.
- Variables:
brackets -- As in hy.models.String.
- class hy.models.FComponent(s=None, conversion=None)¶
An analog of ast.FormattedValue. The first node in the contained sequence is the value being formatted. The rest of the sequence contains the nodes in the format spec (if any).
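Since these models are described as mimicking ast.JoinedStr and ast.FormattedValue, it can help to look at the Python nodes themselves. The sketch below parses a Python f-string (with Python code in the replacement field, unlike a Hy f-string) and inspects the result; the node names assume Python 3.8 or later, and "<sketch>" is an arbitrary placeholder filename.

```python
import ast

tree = ast.parse('f"The sum is {1 + 1}."', mode="eval")
joined = tree.body  # an ast.JoinedStr

# Literal text and replacement fields appear side by side as child nodes.
kinds = [type(node).__name__ for node in joined.values]

# The replacement field holds the expression to format; a conversion of -1
# means no !r, !s, or !a was given, and format_spec is None without a spec.
field = joined.values[1]
conversion = field.conversion

result = eval(compile(tree, "<sketch>", "eval"))

print(kinds, conversion, result)
```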
Additional sugar¶
Syntactic sugar is available to construct two-item expressions with certain
heads. When the sugary characters are encountered by the reader, a new
expression is created with the corresponding macro name as the first element
and the next parsed form as the second. No parentheses are required. Thus,
since ' is short for quote, 'FORM is read as (quote FORM). Whitespace is
allowed, as in ' FORM. This is all resolved at the reader level, so the model
that gets produced is the same whether you take your code with sugar or
without.
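Macro | Syntax |
---|---|
quote | 'FORM |
quasiquote | `FORM |
unquote | ~FORM |
unquote-splice | ~@FORM |
unpack-iterable | #* FORM |
unpack-mapping | #** FORM |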
Reader macros¶
A hash (#) followed by a symbol invokes the reader macro named by the symbol.
(Trying to call an undefined reader macro is a syntax error.) Parsing of the
remaining source code is under control of the reader macro until it returns.
Comments¶
Comments begin with a semicolon (;) and continue through the end of the line.
There are no multi-line comments in the style of C's /* … */, but you can use
the discard prefix or string literals for similar purposes.