BML Basics
For decades, common software engineering wisdom held that programs should work in exactly the same way under any and all conditions. The functional programming paradigm even defines the notion of a "pure function", which always produces the same result given the same arguments. On the surface this seems like a great idea: it significantly simplifies reasoning about program behavior, enables more efficient compilers, allows more aggressive optimizations, and so on. Unfortunately, it quickly loses its usefulness when we use computers not only for abstract computations but to process information received from the external physical world, and especially when we want the computer to interact with that world. Blindly applying abstract programming techniques to real-world applications turned out to be a terrible idea and opened a Pandora's box of very serious problems. Most of those problems relate to non-functional aspects of the system: security, scalability, fault tolerance, the ability to continuously evolve and adapt to ever-changing conditions without any downtime, and so on. This is why BML was created in the first place. The bigger, more complex, and more distributed a system becomes, the less we find ourselves focusing on its functional aspects (they become a small and obvious matter) and the more on those non-functional aspects, to the extent that they become the center point of the overall design.
This changes everything, and the old wisdom no longer applies, at least not in its original form. The technological goal, however, remains the same: we want a technology that is as easy to use and understand as possible. We still want to focus on the functional aspects of the system and not be burdened by all the rest. In other words, we want to program in terms of our intent rather than in terms of an exact sequence of algorithmic steps. There is nothing new about this desire. People have been trying to build specification-based programming systems for decades. Unfortunately, none of them were successful enough to go mainstream.
BML is an attempt to solve the problem, at least partially. The idea is to create a language with a familiar look-and-feel that is nevertheless capable of separating the intent from the actual execution strategy. It is also capable of evolution in every sense of the word: both syntax and semantics can be extended, new languages can be created on top of it, and so on.
Execution model
BML execution is a collection of collaborating BML processes. Each process executes a set of instructions in sequence (in order of appearance in the source code). This gives BML a look-and-feel very similar to mainstream programming languages. The similarity, however, is only skin-deep. BML uses the concept of fibers (we will use the terms "process" and "fiber" interchangeably) to implement cooperative multitasking between concurrent activities. That is, the VM dynamically assigns available CPU threads to processes (which is why they are called fibers) and reassigns them as necessary. Thus it is possible to have a very large number of fibers, much larger than the number of threads.
Fibers collaborate with other fibers, whether local or remote, via data flow variables and data flow streams. Unlike actor-based execution models, which use mutable state, or software transactional memory, bound data flow variables are immutable, and mutability is modeled as streams of immutable values. This allows for eventual consistency and for reasoning about a distributed system as a whole despite the constraints imposed by the CAP theorem.
Source code definition provides natural code fragmentation directly derived from the overall code structure. For example, the following snippet:
if($x > $y):
  $x = $y - 1;
defines two code fragments: the logical expression "$x > $y" and a nested scope with one assignment "$x = $y - 1;". Obviously, not all fragments are actually executed (e.g. if the logical condition is not satisfied, the nested scope is simply ignored).
Execution stages
Each fragment of a BML program is always executed in two stages: template processing and evaluation. This happens in a tick-tock fashion, where the tick is template processing and the tock is evaluation. These stages always work together, but never mix with each other.
Template processing (TP)
Everything in BML is a data structure unless and until we decide to interpret it as an executable set of instructions, which are evaluated one by one in sequence. Instructions may refer to or contain nested code fragments (e.g. the "if" instruction above refers to its logical expression and its nested scope). The template processing stage makes it possible to modify or even generate executable code right before evaluation. TP is also extensively used for general data structure manipulation.
"Deep" vs. "flat" template processing
Since most of the time we're dealing with nested data structures, TP can be done in two ways:
- "Deep" template processing will process all nested structures recursively
- "Flat" template processing will process only the first level of the structure
The need for "flat" processing comes from execution. Indeed, we need our code to be TP-processed in order for it to be executable, but we don't want to go deep in this case. First, we don't really need to, because we will evaluate only the top-level instructions of any given code fragment. Second, it would be premature and potentially wasteful to process nested code fragments. The main purpose of TP is to adjust the code to the current conditions, and we do not yet know those conditions for the nested code fragments. In fact, we don't even know whether those fragments will ever get executed at all.
Thus only the first level of any code fragment gets template processed. This guarantees that processing happens exactly once per evaluation and right before the fragment is evaluated, not any earlier.
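To make this concrete, here is the earlier snippet again with a hypothetical annotation of what gets template processed when:

```
if($x > $y):     # tick: this level is TP processed; tock: "if" is evaluated
  $x = $y - 1;   # nested fragment: left untouched until the condition holds
                 # and the nested scope itself comes up for execution
```

This is why processing is never premature: the nested scope pays its TP cost only if it actually runs.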
Evaluation
Once the template processing stage is completed, the BML VM executes instructions in sequence, just as any other imperative language would. However, despite the sequential semantics, the VM will decide the concrete execution order based on the actual data dependencies between fibers as well as the presence of lambda variables (lambda variables represent deferred computations and are described later in this document).
Sandboxing and injection of control
As mentioned earlier, having a program perform a static set of operations is generally a bad idea in a distributed concurrent system. Indeed, unless the program performs some local, well-defined operation, the external conditions (or program environment) are almost never the same: input messages may come out of order, not at all, or with duplicates; local computational resources may dwindle or be temporarily in excess, which may render some algorithms more efficient than others; and so on. The point is that we should never treat a given program as a static, fixed sequence of operators, but rather as an intent to execute a given sequence of operations with implied data dependencies between them.
For example, imagine we have an action which takes an input message, processes it and responds with an output message. This is very typical HTTP server functionality. In the case of an actual HTTP server, the input message would come from a protocol-specific pipeline associated with a TCP connection socket. The output message would need to be sent to the same pipeline. If all this were hardcoded into the action, it would create serious problems down the road. Imagine now that the client which posted the input message in the first place is local and runs in the same JVM. At the very least, communication would now incur the unnecessary overhead of a loopback TCP connection and message serialization/deserialization; it could also pose a security problem if the client, for example, had no permission to access network capabilities. This is a classic case of a confused-deputy attack, where a client elevates its own level of privilege by tricking a more privileged "deputy" into performing a restricted action on the client's behalf.
Each time we encounter a situation like this, the original action design has to be changed to introduce additional abstractions. Clearly, in this particular case, the business of processing the message must be well separated from the business of receiving and sending it. Thus we would need to create separate abstractions for receiving, processing, and sending the message, probably by splitting the original action into several and designing some API to use between them. This, in turn, comes at a price, as such changes are not free: they result in development and maintenance costs, API overhead, etc.
The situation becomes virtually hopeless when the roles are reversed. Imagine now that we need to invoke a piece of code which came from a less-than-trusted source, most probably dynamically created from some template. There is no way to tell what kind of harm it could do to our system if given full access to it. (Of course, in theory we could perform deep code analysis before invoking it, but that would be prohibitively expensive.)
What if we did not have to deal with any of this? Sure, we can't control (or even know) the internals of our callee, but we can control the environment it executes within. That is, instead of communicating to the callee our denial of access to the actual communication pipeline, we can simply give it a fake pipeline, so that it would not even be able to tell the difference. From its perspective it would be sending messages to a pipeline, but in reality they would go into our fake pipeline and then be inspected by us, or even simply discarded right away. As usual, a bit of cooperation and knowledge exchange goes a long way. For example, if we intend to discard the callee's messages anyway, it would be nice to prevent it from creating them in the first place, to save some memory and CPU cycles. If we know which action it will use to compose the message, we may substitute that too with an empty one. A simple naming convention can help us achieve this level of collaboration. Say we establish a convention that a "CreateOutput" action is generally used to create output messages. Then, by injecting our own implementation into the callee's environment, we can effectively change its behavior. This is a two-way collaboration, however, as we might just as well break the callee by violating its invariants (e.g. if the callee tried to access the message after its creation and instead encountered a null or empty message). Therefore it must be the developer's choice how tight the integration should be. The more we know about a component, the tighter and more efficient an integration we can establish with it. However, the price of not knowing and not integrating must always be limited to performance and resource consumption only, and must never result in compromised security or other vital system properties.
To summarize:
- The only capability any piece of code is ever given within the BML VM is access to its environment, which is expressed as a set of variables. Even its own code structure is subject to such access.
- The environment is a first-class entity in BML, and the caller has full and complete control over the environments of any invocations it makes, including invocations of its own code.
- Obviously, the caller can't give the callee anything it does not have itself in the first place. This allows for strict and mostly automatic enforcement of the Principle of Least Authority (POLA): a callee cannot possibly do anything which the caller itself cannot do.
Variables
A variable is a name possibly associated with (or bound to) a value. Variables are a fundamental concept of any programming language, as they allow us to give names to values stored in computer memory and then refer to them by those names. Some modern languages also allow reserving those names ahead of time, i.e. creating placeholders for future values; these are known as futures and promises. BML provides the same semantics via dataflow variables and streams. It also introduces lambda variables, discussed below.
Like everything else in BML, variable names are case-insensitive, e.g. $StdOut, $stdout and $sTdOuT represent the same variable and refer to the same value.
Variable sigil
BML does not require explicit variable declarations (except in some special cases discussed later). Instead it relies on the '$' (dollar sign) sigil to distinguish variable names from other language artifacts. That is, every time the engine encounters an unquoted string of alphanumeric characters with a leading dollar character (e.g. $abc), it will interpret it as a variable name.
Path variables
A variable name can itself be a composite construct and represent a path into some nested data structure. Dot notation is used for composite variable names. For example, given the following structure
$x:
  k1: v1;
  k2:
    k21: v21;
  k3: v3;
the variable $x refers to the whole structure, $x.k1 refers to value v1, $x.k2 refers to the nested structure under k2, $x.k2.k21 refers to value v21, and $x.k3 refers to value v3.
Path parameterization
Expressions can be used as path components as well. In this case the expressions must be enclosed in parentheses. For example, given that $y = k2;, the parameterized path $x.($y).k21 would evaluate to $x.k2.k21 and refer to value v21. Expressions are not limited to simple variable substitution and can be virtually anything; e.g. the path $x.("k" + "2").k21 would produce the same result, as the string concatenation expression "k" + "2" yields the string "k2" as a path component.
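The examples above, written out as a sketch (assuming the structure $x defined earlier):

```
$y = k2;
$v1 = $x.($y).k21;        # same as $x.k2.k21, i.e. v21
$v2 = $x.("k" + "2").k21; # "k" + "2" yields "k2", so again v21
```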
Important: Please note how the dots separating path components combine with expression parentheses. Parentheses after a name without a separating dot are the syntax for a function call: $x($y) is an invocation of a function named $x with a single parameter whose value is stored in variable $y, while $x.($y) is a two-component parameterized path variable (note the dot between 'x' and '(').
Path components are not required to be strings. They can be any Java objects suitable for use as keys in maps, such as integers, for example: $x.(5).y. Note that in order to be recognized as non-strings they need to be written as expressions. It is your responsibility to make sure that such non-string keys can be matched properly. For example, using double values as keys is usually a bad idea, since two doubles computed by different expressions are rarely equal to each other (this is why you should never compare them for equality, but instead compare their difference against some small epsilon value, e.g. 1.E-6 for floats). Another example would be confusing Integers with Longs or other numeric types.
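A sketch of why Double keys are fragile (the numeric literals are illustrative only):

```
$m.(5): v5;      # Integer key 5 (written as an expression, hence non-string)
$d = 0.1 + 0.2;  # Double arithmetic: the result is not exactly 0.3
$m.($d): oops;   # stored under a key slightly greater than 0.3...
$v = $m.(0.3);   # ...which this lookup will NOT match
```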
Note that, as per Java language guidelines, it is not recommended to use mutable objects as keys in maps, except in identity maps. BML provides a standard function to convert any [possibly mutable] Java instance into an identity key which can be used in any map (in order to access such a key you'd have to pass the same or another instance of the identity key for the same target to the map's get() method).
Selectors
Disclaimer: At the time of this writing selectors are not yet implemented in BML. This chapter will be updated once they are.
A selector is another way of path parameterization which allows selecting a target structure element not just by its name, but also by some properties of the element itself. Selectors are expressions enclosed in square brackets embedded in a path. For example
$x[4]                  # select the 4th entry of structure $x
$x[name == "myname"]   # select all sub-structures of $x which contain name equal to "myname"
Selector expressions may evaluate into
- Single integers. In this case the integer is used as an index to select a sub-element
- Range of integers. In this case the result is a collection of sub-elements between given indices
- Boolean. The result is a collection of sub-items satisfying given condition.
Performance tip: Selector expressions are rather expensive operations and usually exhibit linear performance with respect to the size of the structures they operate upon. Using, and especially cascading, them while working with large structures may greatly affect your application's performance. It is recommended to design your data structures in a way which reduces the need for selectors as much as possible (e.g. use integer keys instead of indices). This is the standard trade-off between speed and memory: numeric keys consume more memory, as the keys are stored explicitly within the structure, but they provide faster (i.e. constant-time) access versus an equivalent selector expression (linear time in many cases).
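A sketch of the trade-off (selector syntax, per the disclaimer above, is not yet implemented):

```
$v1 = $x[4];   # selector: linear scan over the entries of $x
$v2 = $x.(4);  # explicit Integer key: constant-time lookup, extra key storage
```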
Variable stores
Unlike most other languages, which place all variables into a single flat namespace, BML uses different variable stores for variables with different properties or different usage. Variable names can be prefixed with a variable store ID, which in most cases can be reduced to a single character, as in $P.x.
Important: Although all variable names are case-insensitive, variable store prefixes are not. They are case-sensitive and expressed as capital letters or letter combinations. This style helps avoid confusion and typos in single-character local variable names. For example, $p is a local variable name, while $P is probably a typo of a process variable name missing the actual name, like $P.x.
Note that there is no legal way to refer to a store as a whole; i.e. $P will be automatically coerced to the local variable $L.P. Hence it is recommended to use lower-case characters in variable names and reserve capitals for global constants and variable store prefixes.
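A short sketch of these conventions:

```
$P.counter = 1;  # variable "counter" in the process data store
$p = 0;          # lower-case local variable "p"; unrelated to $P.counter
$q = $P;         # NOT the process store as a whole: coerced to local $L.P
```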
Each variable store may have its own hierarchy of frames and scopes discussed below.
BML VM provides the following variable stores:
Local Data store ($L)
The local data store is where all local scope variables are kept. This is the default store, so the prefix can be omitted altogether: $L.x and $x mean exactly the same local variable "x". The local data store is intended for all process-local computations and therefore is not shared with child processes or anyone else. Since the local data store is bound to its process, it has the same lifetime as the originating process. The local data store can be addressed by the $L, $LD or $LocalData prefixes, or without any prefix.
Process Data store ($P)
The process data store is intended for data related to the whole process, including child processes, and is therefore shared with them. The lifetime of the process data store is the same as that of its process. The process data store can be addressed by the $P, $PD or $ProcessData prefixes.
Context Data store ($C)
The context data store is intended for data shared across all processes originated by the same context, which usually means by the same connection. For example, in the case of an HTTP server a new process would be created for each individual request/response, but all of them would be created for the same connection (i.e. the same context). Thus context data will be preserved between and shared across separate requests. The lifetime of the context data store is the same as that of its context (e.g. a connection). The context data store can be addressed by the $C, $CD or $ContextData prefixes.
Action Definition store ($A)
The sole purpose of the action variable store is to control the action registry of a process. As mentioned before, the caller has full control over all aspects of its own and downstream environments, including the action registry. Unless you want to change the way a callee operates, you will probably never use this store directly. The lifetime of this store is the same as that of its originating process. The action definition store can be addressed by the $A, $AD or $ActionDef prefixes.
Facet Data store ($F)
As will be discussed later, a facet is the smallest unit of deployment in A-stack and usually represents a distinct capability or endpoint provided by an application (an application is usually composed of multiple facets). Thus the facet data store is intended to store information shared across all [connection] contexts for a given facet. The lifetime of this store is the same as that of its originating facet. The facet data store can be addressed by the $F, $FD or $FacetData prefixes.
Template Data store ($T)
The template data store is intended to support advanced template processing, discussed later. It has the special property that variables from this store can be substituted into the result structure during template processing. At other times it behaves like the local data store and can be used for arbitrary purposes. The template data store can be addressed by the $T, $TD or $TemplateData prefixes.
Template Source Data store ($TS)
This store exists and can be used only during template processing. It provides access to the template source structure. Template source data store can be addressed by $TS or $TemplateSource prefixes.
Template Result Data store ($TR)
This store exists and can be used only during template processing. It provides access to the template result structure. Template result data store can be addressed by $TR or $TemplateResult prefixes.
System Data store ($$)
The system data store can be addressed by the $$ prefix. Variables in this store are populated by the engine itself and are called system variables. As mentioned earlier, these variables provide the interface between a BML process and native A-stack facilities such as communication pipelines. The engine may also treat system variables somewhat differently from "normal" variables. For example, since system variables should always be visible, the system data store does not allow the creation of frames, and the initial set of variables is read-only (that is, you can only assign or change the value of a system variable when you [re-]define it in your own scope).
Although the engine provides a dedicated system store, all variable names containing a dollar sign '$' are reserved as system variables and subject to possible special treatment and exemptions from general rules. It is not recommended to use such variable names for general purposes, as it may produce unexpected results.
Important: Please don't confuse the variable sigil (which is also a dollar sign) with a dollar sign used within a variable name. For example, in the variable name $T.$myvar the first dollar sign is the sigil and the second is part of the $myvar variable name. Local system variables can be expressed as $$myvar, where, again, the first dollar is the sigil and the second is part of the variable name. Please note the difference from $$.myvar (dot after "$$"): the latter is a reference to the system variable "myvar", while the former is a reference to the local variable $myvar. This is why it is not recommended to use dollar signs in variable names.
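The distinction laid out side by side:

```
$T.$myvar  # sigil + template-store variable named "$myvar"
$$myvar    # sigil + LOCAL variable named "$myvar"
$$.myvar   # sigil + SYSTEM variable "myvar" (note the dot after "$$")
```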
Global constants
There is a small number of global constants provided by the engine and accessible by their names. No sigils or other decorations are allowed on global constants.
Important: Since there are constants with one-character names, global constant names are case-sensitive and usually all capitals. This is the only exception from overall case-insensitivity.
Assignments
Assignments are the most basic operations in any programming language. Each assignment is composed of a left-hand-side (LHS) and a right-hand-side (RHS) expression. Unlike other programming languages, BML employs two distinct assignment types: passive and active.
Passive assignments
Passive assignments are denoted with the standard CDM key/value separator (a colon and a space) and are used to define static data structures and regular code. For example
X:
  Y: V
Here X is passively assigned a structural value, and Y is passively assigned the string value "V". Most code definitions and literal data structures use passive assignments, e.g.
if($info == null):
  $info:
    FirstName: Oleg;
    LastName: Danilov;
Here we assign a code fragment to the "if" statement and also assign a literal data structure to the variable $info within that code fragment. Both the code definition of the "if" statement and the variable assignment use passive assignments. In other words, a passive assignment is what you would expect from a "this key has this value" statement in some serialization data format.
Important: Passive assignments do not perform any actions on their right-hand-side values.
Active assignments
Active assignments are denoted with the "= " (equal sign and space) separator (or a colon and equal sign, :=) and are used to tell the engine that you want the right-hand-side value evaluated before the assignment. For example
$a: 3 + 1; # passive assignment of a string
$b = 3 + 1; # active assignment of a computed numeric value
The two assignments above have very different meanings. The first assignment is passive and thus assigns the variable $a the string value "3 + 1", while the second assignment is active, so its RHS is evaluated first and the variable $b is assigned the numeric value 4. In other words, an active assignment is what you would expect from a "variable equals expression" operator.
Another way of thinking about active assignments is that right-hand-side expression is "giving" its value to the left-hand-side expression. This way you can execute a piece of code by "giving" it to a scope or inline instruction, for example.
When the RHS is a structure, an active assignment will perform deep template processing first.
Important: Active assignments always evaluate their right-hand-side value first. In case of a structural value it means deep template processing.
Style suggestion: Although perfectly legal, passive assignments which look like expressions can be confusing. It is recommended to wrap them in quotes to explicitly express the fact that they are just strings, e.g.
$a: 3 + 1;   # legal, but not recommended style
$a: "3 + 1"; # recommended style: shows it's a string
Data types
Since the BML VM is written in Java, it provides the same basic data types as the JVM. As in most scripting languages, there is no distinction between primitive and boxed data types (the BML VM does not operate on primitive Java types, at least not directly). Type coercion and method signature selection rules are greatly simplified in order to achieve better performance (adhering to the same complex rules as the Java compiler would be prohibitively expensive):
- Boolean values are represented by two standard java objects Boolean.TRUE and Boolean.FALSE
- At the source level the CDM parser distinguishes between Integer, Long and Double types. It also supports unary minus and the standard hex prefix "0x..."
- At the expression level a special prefix "0c" is recognized for single Unicode character definitions (e.g. 0cX is parsed as the character code of 'X', represented as an instance of Short)
- A reasonable attempt is made to avoid type promotions unless necessary, so you can expect the result of adding two Integers to remain an Integer. When evaluating an expression, the engine will check the arguments and then promote narrow types to the widest required type (in terms of data representation ability). For example, when adding a Long and an Integer, the Integer will be promoted to Long; when adding an Integer and a String, the Integer will be promoted to String, etc.
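A hypothetical sketch of the promotion rules described above:

```
$i = 2 + 3;       # Integer + Integer stays Integer: 5
$s = "n = " + 2;  # Integer is promoted to String: "n = 2"
```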
Printing and logging
There is no dedicated BML operator or function to print to the console. However, the Log instruction can handle most if not all of your printing needs. The BML logging system supports integration with many popular Java logging systems and is a superset of all of them. The configuration and complete description of the logging system are beyond the scope of this document.
Basic usage is very simple, however: you can simply give the instruction a message and it will print it using the TEXT log level, which is similar to Java's System.out.println() facility. For example
log: My static text message;
log = "My " + "composed " + "message";
Note that we used an active assignment in the latter case in order to concatenate the strings.
You can also use a more involved variant where you can specify log parameters
log(level: INFO, message = "My " + "composed " + "message");
or combine the styles
log(level: INFO) = "My " + "composed " + "message";
Operators
Note: At the time of this writing the implemented set of operators is incomplete. This document will be updated as new operators are added
Unary operators
| Expression | Name | Description |
|---|---|---|
| -$x | negate | |
| ~$x | bitwise inverse | |
| typeof $x | get type string | "Safe" operator: it will never throw an exception or be converted to a lambda (except when part of a larger lambda expression). Evaluates to one of a fixed set of type strings. |
| entryof $x | get variable store entry | "Safe" operator to get the MlEntry object corresponding to the argument variable. |
| valueof $x | get value "as is" | "Safe" operator to get the argument variable's value "as is". It may return values corresponding to the internal representation of undefined or lambda values. |
| useof $x | get "useful" value | "Safe" operator to get the "useful" value of a variable or expression. Similar to "valueof", but converts an undefined value into a null value. |
Note that since the priority of any unary operator is higher than that of other operators, complex expressions need to be parenthesized. That is
-$x.($y)   # incorrect: minus is applied to $x only
-($x.($y)) # correct: minus is applied to the complex path access result
Arithmetic operators
| Expression | Name | Description |
|---|---|---|
| $x + $y | binary plus | performs addition |
| $x - $y | binary minus | performs subtraction |
| $x * $y | times | performs multiplication |
| $x / $y | divide | performs division |
Boolean operators
| Expression | Name | Description |
|---|---|---|
| !$x | negation | converts true to false and vice versa |
Bitwise operators
| Expression | Name | Description |
|---|---|---|
| $x & $y | bitwise AND | |
| $x \| $y | bitwise OR | |
| $x ^ $y | bitwise exclusive OR | |
Bit shift operators
| Expression | Name | Description |
|---|---|---|
| $x << $y | left shift | |
| $x >> $y | right signed shift | |
| $x >>> $y | right unsigned shift | |
Updating operators
TBD
Numeric comparisons
| Operator | Name |
|---|---|
| == | equality |
| != | inequality |
| <= | less than or equal |
| < | less than |
| > | greater than |
| >= | greater than or equal |
Operator precedence and associativity
Standard rules apply.
Functions
All registered functions can also be invoked as methods, with the first argument removed. That is, the function invocation expression valueBy($items, 2) can also be written in method invocation style as $items.valueBy(2).
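Side by side, using the valueBy function described in the table below:

```
$v1 = valueBy($items, 2);  # function invocation style
$v2 = $items.valueBy(2);   # equivalent method style: receiver becomes the first argument
```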
| Signature | Parameters | Description |
|---|---|---|
| String string(Object) | any object | Returns the string representation of the argument as per Java's String.valueOf() method. |
| Action action(Object) | any object | Returns an internal action representation which can be used for action assignment. Returns null if the argument is null; throws an error if the argument is not null and does not represent an action. |
| ML markCC(ML) | any structure | Marks the given structure as compiled without actually compiling it. Allows avoiding [attempted] compilation of structures which are known to be pure data. This improves performance and suppresses compilation errors when structures contain incompatible syntax. |
| boolean isStructure(Object) | any object | Tests whether the given argument is a structure. Returns a boolean value. |
| Object valueBy(Object, int) | | Selects a value from a structure or array by the given index. Throws an exception when no default value is given and the index is outside the container. Allows negative indices. |
| MlEntry entryBy(Object, int) | | Selects an entry from a structure, or wraps an array element into an entry (the index will be set as the key). Returns null when the index is outside the container. Allows negative indices. |
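A hypothetical usage sketch of some of the functions above ($items is an assumed example structure):

```
$s = string(42);          # "42"
$b = isStructure($s);     # false: a String is not a structure
$v = valueBy($items, -1); # negative index: assumed to count from the container's end
```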
Methods
BML provides seamless Java method invocation. That is, Java methods can be directly invoked from BML expressions. For example:
$text: "abcd";
$len = $text.length(); # assigns 4
As mentioned above, for performance reasons the method signature matching algorithm is greatly simplified in comparison with the one used by the Java compiler. Methods get registered with the BML VM either by initial configuration or as they are invoked by the code. All method signatures with the same name are registered at once.
The main selection is done on the method name and the number of arguments. When this does not yield a single result, subsequent selection is done on the receiver instance type and finally on the argument type match. The latter is simplified to matching boolean and numeric arguments, with all other arguments matched as objects. When simplified signatures collide, preference is given to the more generic method (i.e. the opposite of what the java compiler does). For example, if you have overloaded methods instance.method(SomeInterface arg) and instance.method(Object arg), the latter will be invoked, not the former.
The reasoning is that more generic methods are likely to do their own type checking internally, while narrower signatures mostly exist to utilize static type checking by the java compiler and avoid those checks at runtime.
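To illustrate this selection rule (a sketch; the Printer class and its methods are hypothetical):

```bml
# Suppose a hypothetical java class is registered with the BML VM:
#     class Printer {
#         void print(Object arg) { ... }
#         void print(CharSequence arg) { ... }
#     }
$p.print("some text"); # BML picks print(Object), the more generic overload,
                       # even though the java compiler would pick print(CharSequence)
```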
Control flow
Conditional evaluation
There are two forms of "if" statement in BML:
Multi-entry if
This is the most familiar form of "if" statement, shared by the majority of programming languages. It consists of the first "if" entry, optional consequent "else_if" entries and a final optional "else" entry. Note that "else_if" is a single keyword, not two independent ones. While "else if" (with exactly one space) would be a perfectly valid CDM key, variations such as "else  if" (two spaces) or "else\tif" (a tab) would not be recognized, leading to hard-to-detect problems; the underscore form avoids this.
if($x > 10):
    log: "$x > 10";
else_if($x > 5):
    log: "$x > 5";
else:
    log: "$x is too small";
"else_if" can also be written in camel case as "ElseIf" or simply "elseif" (recall case-insensitivity).
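For example, all of the following spellings are equivalent:

```bml
if($x > 10):
    log: "$x > 10";
ElseIf($x > 5): # camel case form
    log: "$x > 5";
elseif($x > 1): # single-word form
    log: "$x > 1";
else:
    log: "$x is too small";
```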
The advantage of this form is that it looks the most familiar.
Single-entry if
As the name suggests, single entry "if" packs the whole if structure into a single entry:
if($x > 10):
    then:
        log: "$x > 10";
    else:
        log: "$x is too small";
Note the indentation differences between the two forms and the presence of the "then" keyword in the latter.
The advantage of the single-entry "if" is that it executes just a tiny bit faster, as the engine does not need to skip inactive branches. It can also be configured as just that, a single entry (more on this later).
Repeated evaluation: loops
There are several constructs for repeated evaluation:
"While" loop
This should look very similar to other languages:
$i = 4; # initialize loop variable
while($i > 0): # check loop condition
    log = "$i = " + $i; # compose log message
    $i = $i - 1; # we don't have unary decrement yet
"For" loop
This should also look familiar:
for($i = 0, $j = 5; $i < $j; $i = $i + 1, $j = $j - 1):
    log = "for loop: $i= " + $i + ", $j= " + $j;
Recall that this is actually CDM syntax and therefore the difference between commas and semicolons is immaterial. You could just as well use all commas or all semicolons with the same result. The BML compiler actually distinguishes initializers (i.e. $i = 0, $j = 5;) from re-initializers (i.e. $i = $i + 1, $j = $j - 1) by their position relative to the logical loop condition (i.e. all expressions to the left of the loop condition are initializers and all to the right are re-initializers). However, it is recommended to use the familiar style for better readability.
"ForEach" loop
Like in many other languages, the foreach loop is intended for iteration over something "iterable", like a collection of items. The same is true here. The minimal syntax of a foreach loop is the following:
foreach(<value accessor>: <var name>, in: <collection reference>): ...
Here the value accessor is a keyword specifying how to extract the value from the object returned by the iterator:
- item – return the object as is
- entry – object is expected to be an ML entry, return as is
- key, FlatKey, flat_key – object is expected to be an ML entry, return entry key
- DeepKey, deep_key – current path of deep ML iterator
- value – object is expected to be an ML entry, return entry value
The value of the accessor in the loop definition is the name of the variable which will be assigned. For example:
foreach(value: $val, in: $s.a.e.h):
    log = "foreach(value: $val, in: $s.a.e.h): val= " + $val;
Here we are iterating over all values found in structure $s under deep path a.e.h.
Note that when given in this form it will iterate over all $s variables found in all visible scopes, so if you have defined $s in more than one scope, you will see the values across all those structures in all visible scopes.
It is possible to narrow the loop to a single structure by specifying optional "from" parameter:
foreach(value: $val, in: e.h, from: $s.a):
    log = 'foreach(value: $val, in: e.h, from: $s.a): val= ' + $val;
Here we first get the $s.a structure and then iterate over all e.h paths in this single structure only.
Note that in this case the "in" parameter contains just a partial path and not a complete variable reference.
When working with composite objects like an ML entry, you can use more than one accessor:
foreach(key: $key, value: $val, in: $s.a.e):
    log = "foreach(key,value, in: $s.a.e): key= " + $key + ", val= " + $val;
Here we are iterating over all entries of all visible variables $s.a.e and printing all key/value pairs.
We can also iterate in reverse order by specifying "in_reverse" or "in_reversed" (or using camel case InReverse or InReversed):
foreach(value: $val, in_reverse: e.h, from: $s.a):
    log = "foreach(value: $val, in_reverse: e.h, from: $s.a): val= " + $val;
By requesting a deep key, it is possible to "flatten" the structure and iterate over all nested substructures in the same loop:
foreach(deep_key: $key, value: $val, in: a, from: $s):
    log = "foreach deep_key: key= " + dotJoin($key, 1) + ", val= " + $val;
Here we are iterating over all substructures found under key "a" in structure $s, including all their substructures if any. To print all deep keys we convert them to dot-separated strings, dropping the first element (which is always "a"): dotJoin($key, 1)
It is also possible to leave the "in" parameter without a value, which indicates that we simply want to iterate over all entries in the "from" structure. Requesting a deep key would include iteration over all sub-structures as well:
foreach(DeepKey: $key, value: $val, in:, from: $s.a):
    log = "foreach deep_key: key= " + dotJoin($key) + ", val= " + $val;
You can request both deep and flat keys at the same time:
foreach(deepkey: $dk, key: $key, value: $val, in:, from: $s.a):
    log = "foreach deep_key: dk= " + dotJoin($dk) + ", key= " + $key;
TODO:
- The functionality of foreach loop will be extended to numeric sequences and time interval sequences (this will render SffSequenceFacet obsolete)
Break and continue
Any loop can be terminated using the "break" instruction; likewise, the rest of the current iteration can be skipped using "continue":
$s4 = 4;
while($s4 > 0):
    log = "1:$s4 = " + $s4;
    $s4 = $s4 - 1;
    if($s4 <= 2):
        continue;
    log = "2:$s4 = " + $s4;
It is also possible to "break" or "continue" multiple nested loops by assigning the loop a label and giving the label name as a value of the "break" or "continue" instruction:
for(label: myloop; $i = 0; $i < 5; $i = $i + 1):
    for($j = 0; $j < 3; $j = $j + 1):
        if($i + $j > 5):
            break: myloop;
"Return" keyword
The "return" instruction disregards the rest of the scope code and returns control to the enclosing scope. It may also return a value which will be available at the point of resume as the $$.Result system variable.
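As a sketch, returning a value might look as follows (the parameterless scope() instruction and the return: value form are assumptions modeled on syntax used elsewhere in this chapter):

```bml
scope():
    log: before return;
    return: 42; # disregard the rest of the scope and pass 42 to the parent
    log: never reached;
log = "returned: " + $$.Result; # the returned value is available at the point of resume
```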
Notifications
There are no such things as errors or exceptions in a distributed concurrent system. There are just different kinds of notifications.
Notification types
Indeed, errors and failures in a distributed system are just a part of normal operation. There is nothing "exceptional" about them. Also, the traditional model of exception handling is too limited to handle real-life scenarios. Instead of the two-way logic of success vs. failure we actually need the three-way logic of success vs. cancellation vs. failure.
Success notification
This is, obviously, the notification that everything went well and the process has completed as expected. This is the "happy path" notification. At any time a process can issue a success notification using the "success" instruction.
Cancellation or discard notification
This kind of notification is issued when a process is cancelled or discarded because its services are no longer required. A typical example is when multiple instances of the same process were initiated to perform the same operation in a fault-tolerant manner and the operation has been successfully completed by one of those processes. Obviously, all the remaining processes are no longer needed. Another example is [cascade] cancellation when input data becomes unavailable, obsolete or irrelevant in some other sense. This is not quite a "happy path", but not a failure either. This kind of situation is naturally unique to distributed systems and completely ignored by existing programming systems. A process can issue a discard notification using the "discard" instruction.
Failure notification
This is also obvious and matches standard exception handling. This is the "error occurred" path. A process can issue a failure notification using the "failure" instruction.
Notification handling
Any scope in BML can have notification handlers attached to it. A handler can be as simple as a keyword or as complex as a separate script which will be invoked in case of a notification. For example:
$s9 = 9;
while($s9 >= 0; OnDiscard: recover):
    log = "$s9 = " + $s9;
    if($s9 <= 3):
        log = "$s9 is low: " + $s9;
    else_if($s9 <= 6; OnFailure: discard):
        log = "$s9 is medium: " + $s9;
        log: before failure df1;
        Failure: df1;
        log: after failure df1;
    else:
        log = "$s9 is high: " + $s9;
    $s9 = $s9 - 1;
Here we are executing a loop with some logic inside and, as per that logic, at some point we issue a failure notification with the value "df1" (the value is just for information purposes and will be passed to the notification handler, if any). What happens next is the following:
- The notification signal will "bubble up" to the first enclosing scope which has a corresponding handler defined. In our case it is the very same scope, as the OnFailure handler is defined on the else_if scope itself.
- The BML engine will evaluate the handler. In our case the OnFailure: discard handler simply remaps one notification type to another. That is, the "failure" notification will be "downgraded" to a "discard" notification.
- The process will continue to "bubble up" the "discard" notification until it finds a suitable handler. In our case the OnDiscard: recover handler is defined on the enclosing loop.
- The "recover" handler means "ignore and continue from the next instruction". This is exactly what the engine will do: it will continue loop execution like nothing ever happened.
Had we no handlers defined, the "failure" notification would propagate up to the top level, and the whole process would be stopped and assigned the "failure" state. This state would then cascade to all dependent processes and cause them to fail as well.
Had we only the OnFailure: discard handler, the outcome would be similar but less severe, as this process and all dependent processes would be cancelled (i.e. discarded) instead.
Remapping notification types is, perhaps, the most useful kind of error handling. After all, a failure of one process is often just a cancellation for another. Sometimes the success of one process means failure or cancellation of another; this is why there is no difference whatsoever in the way these notifications are propagated and handled by the engine.
Available notification handler keywords are: "OnSuccess", "OnDiscard", "OnFailure".
Available default or "remap" handlers are: "success", "discard", "failure", "recover".
Here is an example of script-based notification handler:
# Failure handler example
$handler:
    inline: # does not create new scope
        inline: # does not create new scope
            log = "handler invoked: " + $$.Argument; # print reason
            return; # terminate target scope
scope(onFailure = $handler):
    log: before failure
    failure: reason; # issue failure with reason
    log: after failure
Here we define our handler script and store it in the $handler variable (the nested inline instructions are here to illustrate usage of the return instruction). Then we execute a scope with the handler passed as the onFailure handler. The scope generates a failure notification providing a reason (the reason can be any object or structure). As a result, the handler script gets executed. The invocation reason is available via the $$.Argument system variable.
Effectively, the handler is executed in place of the instruction which caused the notification and may perform the following actions:
- Complete normally. This indicates the "recover" resolution. Scope execution would continue after the failure command (i.e. in this case we would see both log printouts, before and after the failure).
- Issue a return instruction. This is the same as if return was issued in the scope code. It would terminate scope execution and return to its parent (i.e. in this case we would see only the "before failure" printout). Note that the return instruction must be issued from the handler level (i.e. if you define new scopes in your handler, it would be interpreted relative to those scopes). You can use [nested] "inline" instructions, however, as inline does not create any new scopes.
- Issue another notification (e.g. another failure). This would escalate to the next enclosing handler.
Variable scoping
As in most other languages, the visibility of variables and their values is controlled in BML by scoping rules. The scope of a variable is the region of code where the variable is visible. These regions are usually nested and reflect the overall structure of the source code. This is called "lexical scoping", as the perceived scope of a variable is defined by the lexical environment in which the variable is defined. It is opposed to "dynamic scoping", where variable visibility is determined not by the lexical environment but by the actual history of nested calls which led to the current scope block. Both strategies have merits, but most programming languages settle on lexical scoping as the easiest for a developer to understand.
BML also uses lexical scoping by default, but allows the use of both strategies by providing dedicated scope control operators discussed later.
Scope transparency
In the simplest case, various BML constructs like loops, ifs etc. will create nested scope blocks. For example:
$x = 10; # define variable in parent scope
if($x > 5): # create child scope
    $y = $x; # define variable in child scope
    $x = 0; # access existing variable from parent scope
Here we define a variable $x outside of the if statement and therefore in an enclosing scope. Since the "if" condition evaluates to true, the nested code fragment will be executed. This will create a new nested scope, just like the code block is nested within the "if" tag. Since scopes are "transparent" from the variable access point of view, we can assign a new value to $x and it will remain so after we exit the nested block. This, again, is due to the fact that we defined $x in the enclosing or parent scope. Our assignment to variable $y, however, will not be propagated to the enclosing scope because there is no $y variable defined in that scope. Thus it will be defined only in the current scope, which is our nested or child scope. So we can use $y within the nested block, but once we exit it, it will disappear. After the "if" completes we will still have only $x and no $y.
Now if we change our example to
$x = 10; # define variable in parent scope
$y; # define variable in parent scope
if($x > 5): # create child scope
    $y = $x; # access existing variable from parent scope
    $x = 0; # access existing variable from parent scope
then the effect will be different: now we have $y also defined in the parent scope (even though it has no value) and therefore it will get assigned a new value just like $x will.
The general "rule of thumb" about default scoping is: if an operator is lexically indented relative to another operator, they will likely be executed in different scopes (the more indented one will likely be executed in a nested scope). Conversely, operators at the same indentation level which are members of the same uninterrupted sequence will likely be executed in the same scope.
Now what if the situation was reversed and we did not want a variable assignment in a nested scope to be propagated to the enclosing scope? We need a way to make sure that the variable is defined (or redefined) in the current scope so it would not "leak" to the parent. This is the only case where we need a variable declaration facility.
Variable declaration (var)
Should we want $y to remain local to the nested scope, we can modify our example to use the "var" keyword to [re]define $y in the child scope and therefore hide the parent's version of the variable:
$x = 10; # define variable in parent scope
$y; # define variable in parent scope
if($x > 5): # create child scope
    var $y = $x; # define new variable in child scope
    $x = 0; # access existing variable from parent scope
Now we will have the same behaviour as in our first example. Despite the fact that $y exists in the parent scope, we have created a new version of it in the child scope, and therefore any assignments made within that scope will not be propagated outside it. Instead they will apply only to our local $y.
This effect is often used to prevent name clashes across different scopes. It ensures that any local [temporary] variables we want to use within the block will not interfere with any enclosing blocks. It is good practice to declare all your temporary variables in such a manner.
One BML-specific application of this effect is also called "hiding", but in a more literal sense. In BML, variables may be defined with no initial value assigned to them (see the $y definition in the second example above). We can also say that such variables have the value "undefined". For most practical purposes undefined variables behave like they do not exist at all. This quite literally allows hiding variables from parent scopes in their child scopes. If we redefine an existing variable as "undefined" in the current scope then, in most cases, in nested scopes we can pretend that this variable was never defined in the first place.
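For example, hiding can be sketched as follows ($secret and the condition are made up for illustration):

```bml
$secret = 42; # defined and visible in the parent scope
if($x > 5): # create child scope
    var $secret; # redefine as "undefined" in the child scope
    # from here on, and in further nested scopes, $secret
    # behaves as if it was never defined at all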
Of course, there are caveats to such tricks. A variable is but a single path into a data structure. The same structure can still be accessible through other variables, iterators etc. However, sometimes such an ability to hide things comes in very handy.
Variable addition (add)
Just as it is possible to have multiple instances of the same variable in different scopes, it is possible to have multiple instances of the same variable in the same scope. This can be used as a form of array of values under the same name. New instances of a variable can be added using the "add" keyword. For example:
$s.a: {} # create empty structure
add $s.a.m: m1; # add element
add $s.a.m: m2; # add element
add $s.a.m: m3; # add element
This is the equivalent of statically defining:
$s.a:
    m: m1;
    m: m2;
    m: m3;
Since variable addition, like variable assignment, can be done from within nested scopes, addition follows a similar rule: it finds the nearest scope containing the first part of the variable name and adds the variable to that scope. For example
$s.a: {} # create empty structure
if($x > 1): # create nested scope
    add $s.a.m: m1; # add element
if($x > 2): # create another nested scope
    add $s.a.m: m2; # add element
would produce the same result even though each "if" statement would create a new nested scope. Each "add" instruction will find the closest visible $s structure and add the new value to it.
Important: If no parent structure is found then, just like in the variable assignment case, the first value will be created in the current scope. So if you are going to incrementally collect a number of values over the course of your program execution, make sure that you set up the collection target structure before executing your program and that it remains visible to all parts which add values.
Variable deletion (del)
If we can add variables then surely we can also delete them from a scope, using the "del" keyword:
$s.a:
    m: m1;
    m: m2;
del $s.a.m; # delete m2 value
Important: Note that variable deletion is different from variable "hiding" by declaring variables without values. A "hidden" variable will appear to the code as non-existent even when values exist in enclosing scopes (i.e. it will "hide" those values). Variable deletion, on the other hand, is just that: the variable record will be physically removed from its variable store. This may reveal a previous value of the same variable, if any.
By default, "del" keyword will delete the last value for the given variable name. It is possible, however, to specify additional parameters. For example
del $s.a.m: all; # delete all values in all scopes of the frame
The following parameters are recognized:
- "last" – delete the last occurrence in order of definition (default, same as -1)
- "first" – delete first occurrence in order of definition (same as 0)
- "scope" – delete all occurrences in current scope
- "all" – delete all occurrences in current frame (i.e. all scopes)
- positive index – delete occurrence with given index starting from the beginning (0 is the first occurrence, 1 is the second etc.)
- negative index – delete occurrence with given index starting from the end (-1 is the last occurrence, -2 is next to last etc.)
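For example, the parameters above can be applied as follows:

```bml
$s.a:
    m: m1;
    m: m2;
    m: m3;
del $s.a.m: first; # deletes m1 (same as index 0)
del $s.a.m: -1; # deletes the last remaining value, m3
```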
Tips, caveats and known issues
There is currently one case where BML expression syntax may clash with CDM attribute definition syntax: when parentheses are used in an expression on the unquoted left-hand side. A typical example is a stand-alone method call such as $P.$stdout.putNextValue($msg); used as the LHS with no RHS at all (the same caveat applies to any stand-alone expression containing parentheses, e.g. $flag ? $obj.funcA() : $obj.funcB();). When used in such a manner, the CDM parser will confuse the parentheses with entry attribute syntax. To avoid this, either wrap the whole expression in quotes or move it to the right-hand side. That is:
$P.$stdout.putNextValue($msg); # will confuse CDM parser as tag(attr)
Handling parentheses in an unquoted LHS should be possible, but is not trivial, so future versions of the BML compiler will likely correct this problem. Compound assignment operators like $x += $y; are not implemented yet either, but are relatively easy to add (a TODO item): by CDM rules such an entry is stored as key "$x +" with value "= $y", so the compiler can recognize the trailing " +" (whitespace plus operator) suffix of the key.