Fundamentals

Purpose

Behavior Modeling Language (BML) is a semi-compiled, homoiconic, multi-syntax programming language primarily intended for the development and subsequent evolution of scalable distributed applications on the Atomiton A-Stack platform.

Availability

Atomiton A-Stack engine v2 provides limited BML support, mostly for demonstration and testing purposes. A-Stack engine v3 will provide a complete BML VM, as BML is intended to become the primary programming language of the A-Stack. This makes it possible to tell whether BML can be run on a given A-Stack just by looking at the version reported at startup: only FS can be run on v1, both FS and BML can be run on v2, and only BML can be run on v3.

Design philosophy

Complexity types

There are two types of programming complexity we have to deal with:

Essential complexity

This is the complexity inherent to the task at hand. For example, if we were to design a US tax preparation application (e.g. TurboTax), then the internal complexity of such an application cannot be less than the complexity of the US tax code. This is called essential complexity.

Accidental complexity

However, on top of that, we would have complexities imposed by our software development technology (e.g. developing TurboTax in C++ vs. Fortran or Assembler, etc.), complexities of organizing development process, funding etc. This is called accidental complexity.

Dealing with complexity

In most cases, it is possible to greatly reduce or even eliminate accidental complexity. Essential complexity, on the other hand, cannot be eliminated or even reduced. It can, however, be shifted from one place to another. This is exactly what programming abstractions do: they wrap complexities and shift them away, allowing us to deal with other complexities at higher levels of abstraction. The process is repeatable. Unfortunately, existing programming technologies mostly fall short of applying this principle consistently across the board. They either stop too low and do not allow building higher abstractions on top of each other, or the wrappers they impose are too restrictive and prevent moving freely from one abstraction level to another. Usually, this manifests itself as a trade-off between performance and development cost: you can choose a relatively low-level programming language (e.g. C++) and get decent performance, but your accidental complexity (and cost of development) will be very high, or you can go with a high-level language, if a suitable one even exists, but then you may end up with a sluggish product with little appeal. At the end of the day, the users of your product don't care how it was designed and built as long as it satisfies their needs.

Behavior modeling

BML attempts to apply complexity shifting principles consistently and across the board, starting, of course, with its own architecture and implementation. Hence the word "modeling" in the name. Significant effort was made to make BML look and feel like a regular programming language in order to lower the learning curve as much as possible (i.e. reduce accidental complexity), but "under the hood" things are different. BML is not designed to just define a sequence of operations like any other imperative language. Instead, it will capture your intent to perform those operations and then choose the best way to perform them, hiding all the essential complexity which may arise in an attempt to do so. So, despite the looks, what you write as a BML program is only a model of some behavior: a sequence of operations we intend to perform in reaction to some external stimuli. When you deploy and run your code, the BML system will instantiate your models, perhaps multiple instances of the same model, and only then evaluate them, each in its own environment. Each instance of the same model may, and most certainly will, execute a different set of instructions in a different order, but all those sequences will adhere to the original intent expressed in your source code.

Terminology

Everything in BML is a data structure unless and until we decide to execute it as a program.
Following the same complexity shifting principle, all BML internals are built on top of a single data structure called ML, which stands for Map+List. As the name suggests, it possesses the qualities of both Maps and Lists and hides all the complexities of deeply nested data structure manipulation, compounded with concurrent access and other shared data access problems. A comprehensive ML description is outside the scope of this document, but we will use some of the terminology provided by ML.

Entries

As ML provides the properties of Maps, it repurposes some of the standard HashMap implementation details. That is, an ML is a collection of entries where each entry holds its key, value and attribute references. Almost any line of BML code will end up as an entry stored within some ML.

X: Y # this will be stored as ML entry with key "X" and value "Y"
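The idea of an entry collection that behaves both like a map and like a list can be illustrated with a small Python sketch. It is not the actual ML implementation (which this document does not describe); the class and method names (Entry, ML, add, get_all) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Entry:
    """One entry: a key, a value and (hypothetically) a list of attribute entries."""
    key: str
    value: Any = None
    attributes: List["Entry"] = field(default_factory=list)


class ML:
    """Toy Map+List: keeps insertion order and allows repeated keys."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, key: str, value: Any = None) -> Entry:
        """List-like behaviour: always append a new entry."""
        entry = Entry(key, value)
        self.entries.append(entry)
        return entry

    def get_all(self, key: str) -> List[Any]:
        """Map-like behaviour: every value stored under the key, in order."""
        return [e.value for e in self.entries if e.key == key]


ml = ML()
ml.add("X", "Y")   # X: Y
ml.add("X", "Z")   # a second value for the same key
ml.add("$x", 5)    # $x: 5
print(ml.get_all("X"))
print([e.key for e in ml.entries])
```

The sketch ignores nesting, concurrency and attribute handling; it only shows why both key order and repeated keys can be preserved in one structure.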

Instructions

Once we decide to evaluate our data structure as executable code, each entry at the top level of the data structure will be considered an instruction. Each instruction performs a certain action within BML engine.

$x: 5; # this instruction will store number 5 in variable x

Architecture

Two-level syntax parsing

The laws of technical evolution dictate that any system should evolve from a loose set of components to a partially collapsed and then a fully collapsed state. However, fully collapsed mono-systems cannot evolve any further at the same system level, and thus full collapse is not advisable here. Also, it is practically impossible to cram all the disparate and dynamic sets of features into a single language. Therefore BML supports syntax nesting of two or more levels, with a shared structural syntax as the top-level syntax. That is, the shared top-level syntax defines the general structure, and nested structure elements may employ different nested syntaxes. For example:

Data: X = a + 3;


Here structure element "Data" defines sub-element "X" which, in turn, uses expression syntax "= a + 3" as its value.
The top-level syntax is handled by Code & Data Markup (CDM) parser and nested expression syntax is handled later by an expression parser. The same definition can be written with explicit quotes as

Data: X: "= a + 3";


or JSON-like

"Data":{"X":"= a + 3"};


In other words, using CDM as the top-level syntax we can make a structure definition look like familiar program source. Obviously, the same principle can be applied to elements with attributes, so the snippet

if(true): v = a + 3;


looks very similar to a Python source, for example. In reality, it is equivalent to the following XML

<if condition="true">
  <v>= a + 3;</v>
</if>
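To make the correspondence between the CDM form and the XML form concrete, here is a minimal Python sketch that renders a parsed entry as XML using the standard xml.etree.ElementTree module. The tuple layout (tag, attributes, content) is a hypothetical stand-in for the result structure, not the engine's real representation.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of `if(true): v = a + 3;`:
# (tag, attributes, content), where content is text or a list of child tuples.
# "condition" is the default attribute name of "if".
node = ("if", {"condition": "true"}, [("v", {}, "= a + 3;")])


def to_xml(entry):
    """Convert a (tag, attributes, content) tuple into an XML element."""
    tag, attrs, content = entry
    elem = ET.Element(tag, attrs)
    if isinstance(content, list):
        for child in content:
            elem.append(to_xml(child))
    else:
        elem.text = content
    return elem


print(ET.tostring(to_xml(node), encoding="unicode"))
# <if condition="true"><v>= a + 3;</v></if>
```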


Use of markers and sigils

Many languages use explicit variable declarations to distinguish variable names from other language constructs and keywords. However, given that we must be able to freely mix the executable code with complex data structure definitions (and for other reasons which will be discussed later), explicit variable declaration quickly becomes very taxing and cumbersome. Instead, BML uses sigils (and other decorations). That is, in the snippet below

Data:
  X: a # an element with a string literal
  Y = $a; # assignment from a variable


"X" and "Y" are structure elements, "a" is a string literal and "$a" is a variable distinguished from a literal by its sigil "$".

Plug-in languages

Presence of a general purpose configurable secondary parser with smart token support allows for a plug-in language hierarchy. That is, a new language can be defined by providing a grammar and a transpiler from custom language AST to the base language. An opaque token in parent language can be interpreted as a nested child language construct by a smart token which wraps nested language parser. Various smart token implementations and transpilers can be deployed on the system as any other component.

Code and Data Markup (CDM) syntax

As mentioned above, CDM syntax is the top level structural syntax for all BML sources and data structure definitions.

Design objectives

  • Ability to define data structures just as easily as executable code, because in our case they are one and the same. (This pretty much rules out most "standard" serialization formats like JSON and YAML due to their inability to preserve the order of keys and to deal with multiple values for the same key. On the surface, XML is capable of representing this kind of information, but it is way too verbose, restrictive and cumbersome for this purpose)
  • Be human developer friendly and at least resemble a "normal" programming language
  • Be flexible enough to deal with a very wide variety of data elements, including fragments in different languages (e.g. URL/URN/URI [escaped] formats, XML/JSON fragments etc.)
  • Look declarative, but allow for incremental structure composition and sub-structure reusability. (Reusability is offered by more than one format, like YAML or OpenDDL, but you can only assign labels to existing structures, which makes true components impossible, as all your "parts" must also remain explicitly present in the overall structure under some keys. At the end of the day, there is no escaping the fact that the source text is but a program for some parser VM describing how to generate the result structure out of the source. There is really no reason why we can't have local variables in such a program, as long as it does not look too imperative and fits into the same syntax)
  • Be compact and efficient enough to be used in communications between components


Parsing model

One very desirable property of a parser used in communications with remote systems is the ability to parse an input which is given as a sequence (a stream) of chunks and perhaps never even composed as a whole document. (This is how communications actually occur over network protocols like TCP.) For the parser it means that it never sees the whole document at once, only one chunk at a time, and thus, obviously, cannot use lookahead or any other parsing technique which would require looking at parts of the document not present in the parser's memory.
Therefore CDM parsing is based on the recognition of separators rather than indicators. Basically, the parser scans the input string until it recognizes a separator, which is a one- or two-character sequence. Once that happens, the parser consults its current state to determine what this separator means in this state, if anything (e.g. end of key, end of value, end of input etc.). Then it takes the accumulated piece of text (in most cases the text between two consecutive separators) and stores it into its result structure, making standard type conversions if necessary (e.g. numeric or boolean values will be converted from text into the corresponding Java objects).
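The chunk-by-chunk separator recognition described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the separator set is a small subset picked for the example, escapes, quoting and parser states are ignored, and the class name ChunkScanner and the "<end>" marker are hypothetical.

```python
from typing import Iterator, Tuple

# Simplified, assumed subset of CDM separators for this illustration only.
SEPARATORS = ("=:", ": ", "; ", "\n")
# Characters that may begin a two-character separator; if one of them is the
# last character of a chunk we must wait for the next chunk before deciding.
SEPARATOR_STARTS = {s[0] for s in SEPARATORS if len(s) == 2}


class ChunkScanner:
    """Splits a stream of text chunks into (text, separator) pairs."""

    def __init__(self) -> None:
        self._text = ""   # text accumulated since the last separator
        self._held = ""   # trailing character that may start a separator

    def feed(self, chunk: str) -> Iterator[Tuple[str, str]]:
        data = self._held + chunk
        self._held = ""
        i = 0
        while i < len(data):
            sep = next((s for s in SEPARATORS if data.startswith(s, i)), None)
            if sep is not None:
                yield self._text, sep
                self._text = ""
                i += len(sep)
            elif i == len(data) - 1 and data[i] in SEPARATOR_STARTS:
                self._held = data[i]   # decide when more input arrives
                i += 1
            else:
                self._text += data[i]
                i += 1

    def close(self) -> Iterator[Tuple[str, str]]:
        """End of input acts as the final separator."""
        self._text += self._held
        self._held = ""
        if self._text:
            yield self._text, "<end>"
            self._text = ""


scanner = ChunkScanner()
tokens = []
# Chunk boundaries deliberately fall in the middle of two separators.
for chunk in ["Key1=", ":Value1\nKey2:", " Value2"]:
    tokens.extend(scanner.feed(chunk))
tokens.extend(scanner.close())
print(tokens)
```

The scanner never looks ahead beyond the data it has already received; the only state carried between calls is the accumulated text and at most one held character.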

CDM syntax elements

  • Character escaping mechanism is pretty much the same as in YAML and other commonly used formats and programming languages:
    • Backslash character '\' starts an escape sequence
    • All YAML escapes are supported
    • You can escape any character by placing a backslash character in front of it (e.g. \, will escape the immediately following comma). Escaped characters lose their special meaning, if any, and are always considered a part of the text, never a separator. This allows using single-character separators in any string by escaping them (e.g. tag(attr: (paren\)): … here the attribute value is the "(paren)" string, where the last closing parenthesis is escaped to prevent confusing it with the end-of-attribute-list separator, which is also a closing parenthesis ')')
    • Unlike YAML, however, CDM treats all quoted and unquoted strings (except multiline blocks, which are described below) the same way and processes escapes everywhere (again, except multiline text blocks).
  • Quoted strings (whether single- or double-quoted) are considered just that, strings. Any separator sequences found within quotes are not considered as such, but just text (e.g. in tag(atr: "a, b, c") the atr value is "a, b, c", as commas with spaces are not considered separators; compare this with tag(atr: a, b, c), where atr has a value of "a" and there are two more attributes with default keys and values "b" and "c")
  • Like YAML and unlike Python, only spaces can be used in indentation. You're free to use any whitespace characters in any other places except indentation. New line \n character is used as entry terminator (except multiline blocks) therefore its usage as whitespace is somewhat limited.
  • Like YAML, CDM allows for two types of layout:
    • Block layout: all structure is expressed via indentation. This is the default layout. Block layout is started with indentation and terminated by line break character (i.e. \n) or end of the document
    • Flow layout: all structure is expressed via curly and/or square braces like in JSON. New line character has no special meaning except as a whitespace in flow layout
  • List of single-character separators (which are mostly used in flow layouts):
    • ( left parenthesis – starts flow layout of attributes
    • ) right parenthesis – closes flow layout of attributes
    • { opening curly bracket – starts flow layout of a mapping (associative array)
    • } closing curly bracket – closes flow layout of a mapping
    • [ opening square bracket – starts flow layout of a sequence (regular array)
    • ] closing square bracket – closes flow layout of a sequence
  • List of two-character separators (in the text below the '^' character is used in place of "begin of input", i.e. a "virtual character which precedes the first character of the input", and '_' is used in place of a whitespace character, i.e. space, tab or, in some cases, new line). The carriage return \r character is ignored everywhere except in quoted strings. Some sequences have more than one representation, all of which are listed under the same bullet point. Generally, whitespace is allowed around and within separator sequences, i.e. key1=>key2… can be written as key1 => key2… and ": can be written as " : i.e. with a space or tab between the quote and the colon. Obviously, this excludes cases where whitespace is a part of the separator to begin with (like colon+space :_). The parser always recognizes space, tab and newline as whitespace (unless confined within continuous text and surrounded by other non-separator characters) and makes sure that they are (or are not) used as part of separator sequences. Other exotic whitespace (e.g. vertical tab \v) may not be recognized as such in all cases. Newline is considered whitespace as long as its placement cannot be interpreted as an entry terminator (again, CDM uses indentation and line breaks to recognize nesting structure and entry termination points). So here is the list of separator sequences:
    • ^# (# is the first input character) or _# (whitespace and #) Start of a block comment string. Comments may be placed at the end of any string or on a separate string. Comment string is terminated by its end-of-the-line (i.e. \n) character. Flow-like comments (e.g. c-like //) are not supported, but support can be added later
    • =: (equal sign and colon). Alternatives are ": (closing double quote and colon), ': (closing single quote and colon) and :_ (colon and whitespace). This separator adds new key to the result. For example

      Key1=:Value1 # "canonical" form
      Key2:_Value2 # "usual" form: space or tab after colon is required!
      "Key3":Value3 # JSON-like form

    • -: (minus sign and colon). This separator sets the value of an existing key or adds a new key to the result if not present. This allows "overriding" the values of existing keys. For example

      Key1-:Value2 # reset existing Key1 to Value2
      Key4-:Value4 # add non-existing Key4 with Value4

    • => (equal sign and right angle bracket). Always add a new nested key. For example

      Key1=>Key2=>Key3=:Value3

      is equivalent to

      Key1:
        Key2:
          Key3: Value3

    • -> (minus sign and right angle bracket). Traverse existing nested key (or add if not defined). For example, given Key1..Key3 structure above

      Key1->Key2=>Key3: Value31

      will result in

      Key1: # existing Key1 is traversed by '->'
        Key2:
          Key3: Value3
        Key2: # new Key2 is added by '=>'
          Key3: Value31

      A "mnemonic rule" if you like, is that single dash (in -: or ->) means use existing entry and double dash (in =: or =>) means always add new entry.
      Attributes can also be defined "along the way" if required, e.g.

      K30->K302(xmlns: NS2)->K330: V330 # add attribute to existing K302
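The difference between the "always add" separators (=: and =>) and the "use existing" separators (-: and ->) can be sketched in Python on a toy structure of ordered [key, value] pairs. The function names are hypothetical, and the choice to traverse or overwrite the first matching key when several exist is an assumption; this document does not say which occurrence the parser picks.

```python
from typing import Any, List

Node = List[List[Any]]  # ordered [key, value] pairs; keys may repeat


def add_nested(node: Node, key: str) -> Node:
    """'=>': always append a new nested structure under key."""
    child: Node = []
    node.append([key, child])
    return child


def traverse(node: Node, key: str) -> Node:
    """'->': descend into an existing nested key, adding it only if absent."""
    for k, v in node:
        if k == key and isinstance(v, list):
            return v  # assumption: the first match is traversed
    return add_nested(node, key)


def add_value(node: Node, key: str, value: Any) -> None:
    """'=:' (or ': '): always append a new entry."""
    node.append([key, value])


def set_value(node: Node, key: str, value: Any) -> None:
    """'-:': overwrite the value of an existing key, or add the key."""
    for pair in node:
        if pair[0] == key:
            pair[1] = value  # assumption: the first match is overwritten
            return
    node.append([key, value])


root: Node = []
# Key1=>Key2=>Key3=:Value3
add_value(add_nested(add_nested(root, "Key1"), "Key2"), "Key3", "Value3")
# Key1->Key2=>Key3: Value31
add_value(add_nested(traverse(root, "Key1"), "Key2"), "Key3", "Value31")
print(root)
```

After both lines there is still a single Key1, which now holds two Key2 entries, matching the result shown for the '->' example.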

    • ; (semicolon and whitespace) or , (comma and whitespace). These can be used as value terminators or entry terminators when there is no value given. For example,

      Key1:;

      or simply

      Key1;

      create entries with no value. No value entries can be used to "hide" existing entries in data structures or simply used for better readability when used in executable code. They also can be used when multiple entries are specified on the same line. For example

      Key1: Value1; Key2: Value2 # space or tab after semicolon is required!

      is equivalent to

      Key1: Value1
      Key2: Value2

      Although, generally, this style is not recommended as it's harder to read especially when defining data structures, sometimes it makes sense when writing code and, of course, we need commas and semicolons in flow layout like when we define attributes, for example:

      key(atr1: val1, atr2: val2) # space or tab after comma is required!

      or

      key(atr1: val1; atr2: val2) # space or tab after semicolon is required!

  • As it was shown above, attributes are given using flow layout in parenthesis after their tag.
  • Like in YAML, CDM allows usage of labels (or anchors in YAML's terminology) so labeled parts of the structure can be reused elsewhere. For example

    Key1: &val1 Value1

    and

    Key2: *val1

    Here &val1 is a label definition and *val1 is label usage. As a result, Key2 will be assigned Value1.
    Note that labels are references so the Key2 will share the same exact instance of Value1 with Key1. This is especially important when sharing structures between multiple entries.
    Obviously, if you want to specify *val1 as a value, then you need to either quote the whole string (i.e. "*val1") or escape the leading star (i.e. \*val1).

  • YAML's "copy" operator (<<:) is not supported
  • Unlike YAML, CDM allows empty keys.

Entries with empty keys are parsed and processed as any other entry, but not stored in the result. You can use empty key values with labels as "local variables" to define reusable parts of a structure which will only exist during parse time and can be assigned (or not) to some other keys. For example

: &val07 # define no-key structure with a label
  Key071:
    Key171: Value171
  Key072: Value072
...
Key08: *val07 # assign labeled structure to a key

You can also define no-key with attributes:

(a1: A1; a2: A2): &val1 Value1

Since tag attributes are stored just as regular entries with a special "isAttribute" flag, the expression above is really defining a nested structure with attribute entries and a mixed content entry of Value1 (just like it would in XML, for example). When you use the label *val1 elsewhere, the reference to this whole structure, along with attributes and mixed content, will be assigned as a value.
Obviously, using no-key entries without labels is rather pointless as they will be simply discarded
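The behaviour of labels and no-key entries can be sketched in Python as below. It assumes a hypothetical parse-time label table (labels) and a flat result list; the helper names define and use, and the second key Key09, are invented for the example.

```python
labels = {}   # hypothetical parse-time label table: label name -> value
result = []   # ordered (key, value) entries of the result structure


def define(key, value, label=None):
    """Store an entry; remember its value under label if one is given."""
    if label is not None:
        labels[label] = value
    if key != "":  # entries with empty keys are not stored in the result
        result.append((key, value))


def use(key, label):
    """Assign the labeled value to key by reference (no copy is made)."""
    result.append((key, labels[label]))


# : &val07 ...   (no-key structure that exists only at parse time)
define("", [("Key071", [("Key171", "Value171")]), ("Key072", "Value072")],
       label="val07")
# Key08: *val07
use("Key08", "val07")
# Key09: *val07  (hypothetical second use, to show sharing)
use("Key09", "val07")

print([k for k, _ in result])         # ['Key08', 'Key09']
print(result[0][1] is result[1][1])   # True: both keys share one instance
```

Because the value is shared rather than copied, a change made through one key would be visible through the other, which is the point the note about references makes.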

  • Unlike YAML, CDM does not support structures as keys nor any YAML-like %TAG, !TAG, !!TAG or other directives or any "canonical" YAML formats. For the sake of compatibility, however, YAML's document separators "---" are parsed and ignored as we have no use for them
  • JSON-like flow layout can be used anywhere e.g.

    Key40: # block layout
      'A':'a\tb', # JSON-like entry
      'B': # JSON-like flow layout
      {
        B1: null,
        B2: '',
        B3: &vb3 3, # still can use labels and omit quotes
        B4: true,
        B5: 'a\u000Fb',
        'B6':'text',
        B7=:7
      },
      'C': # JSON-style array
      [
        25,
        'c2',
        {
          CB1: cb1,
          CB2: 'cb2',
          CB3: *vb3,
        },
        FALSE, # value conversion is case-insensitive
        'c4',
      ],
      'D':[aaa,[{x: []}],bbb,]
      'EO':{},
      'EA':[],
      'EAO':[{}],
      'EAA':[[]],
      'EAA2':[[],[]],
      'EAO2':[{},{}],
      'EAA3':[[],[],[],],
      'EAO3':[{},{},{},],
      'EAAOA':[[],{},[]],
      'EAOAO':[{},[],{}],
      'EAAOAO':[[],{},[],{}],
      'EAOAOA':[{},[],{},[]]

  • Like in YAML long value strings can be broken up by escaped line breaks which will be removed from the result, e.g.

    322: Value \
      322\
       AAA \
      BBB

    will be parsed as

    322: Value 322 AAA BBB

    Note how space after "Value_" is preserved and "AAA" is surrounded by spaces (YAML calls this "folding": indentation defined by the first continuation line is stripped and then [escaped] new line characters are removed).

  • Like in YAML multiline text can be defined via text header. For example (_ characters denote spaces)

    LastText: >+
    __
    __This is
    __multiline
    __
    __text
    __with bullet
    ____* bullet 1
    ____* bullet 2
    ____* bullet 3
    __list
    __and
    __
    __last line
    __
    __

    will be parsed as

    LastText: |2+
    __
    __
    __This is multiline
    __text with bullet
    ____* bullet 1
    ____* bullet 2
    ____* bullet 3
    __list and
    __last line
    __
    __

    i.e. the text was folded ('>' indicator) and trailing lines ('+' chomp indicator) preserved. The same text is rendered in JSON as

    "LastText":"\n\nThis is multiline\ntext with bullet\n * bullet 1\n * bullet 2\n * bullet 3\nlist and\nlast line\n\n\n"

    Note that, as of the time of this writing, neither comments nor escapes are processed within text blocks. Future versions may extend the text header syntax with flags to allow such processing.

Special provisions

    In order to make look-and-feel of CDM close to mainstream programming languages and to convey imperative semantics, CDM provides the following syntax extensions:
  • =_ (equal sign and space) or := (colon and equal sign) separators. This is equivalent to canonical =: separator with special treatment:
    • In case when value is a primitive, it is stored as a string with equal sign as the first character. That is

      $x = $y;

      is stored as

      "$x":"=$y"

    • In case when value is a nested structure a special flag is stored in the result structure to convey the assignment semantic

Parser will make an obvious exception in case when equal sign is immediately preceded with a character used in comparison operations like "==", ">=" or "<=" so text like if(x >= y) will be parsed correctly as if($: "x >= y") where "$" is the default attribute name.
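The exception for comparison operators can be sketched in Python as follows. This is an assumption-laden illustration, not the parser's code: the set of characters that suppress the assignment reading (COMPARISON_PREFIXES) includes "!" as a guess, the function names are hypothetical, and whitespace after the equal sign is simply stripped, which this document does not specify.

```python
COMPARISON_PREFIXES = "=<>!"  # assumed; the text names "==", ">=" and "<="


def find_assignment(text: str) -> int:
    """Return the index of the first '=' acting as an assignment separator, or -1."""
    for i in range(len(text) - 1):
        if text[i] == "=" and text[i + 1] in " \t":
            if i > 0 and text[i - 1] in COMPARISON_PREFIXES:
                continue  # part of "==", ">=", "<=" ...: not an assignment
            return i
    return -1


def split_assignment(text: str):
    """Split 'key = value' into (key, '=value'); None if there is no assignment."""
    i = find_assignment(text)
    if i < 0:
        return None
    return text[:i].strip(), "=" + text[i + 1:].strip()


print(split_assignment("$x = $y"))   # ('$x', '=$y')
print(split_assignment("x >= y"))    # None
```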

  • Parser will take into account opening and closing brackets situated outside of quoted strings. This allows to reduce the need for escaping. For example

    if(g() + h()): ...

    does not need escaping of the right parentheses in nested function calls, even though the right parenthesis is an attribute flow terminator in CDM. Please note, however, that this is done based on simple bracket counting (i.e. the number of opening brackets must match the number of closing brackets) and does not take the bracket type into account (e.g. the "({])" sequence would still be considered a correct bracket sequence). This is done for performance reasons: it should be sufficient for most if not all cases, while proper bracket matching would make the parser noticeably more complex with little gain in correctness.
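A minimal Python sketch of such type-blind bracket counting is shown below. The function name find_attribute_end is hypothetical, and the handling of quotes and backslash escapes is a simplified reading of the rules described earlier.

```python
OPENING = "({["
CLOSING = ")}]"


def find_attribute_end(text: str, start: int) -> int:
    """Return the index of the bracket closing the list opened at text[start].

    Brackets are only counted, not matched by type; brackets inside quoted
    strings or after a backslash are ignored. Returns -1 if never closed.
    """
    depth = 0
    quote = None
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in OPENING:
            depth += 1
        elif ch in CLOSING:
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


source = "if(g() + h()): ..."
end = find_attribute_end(source, source.index("("))
print(source[3:end])                    # g() + h()
print(find_attribute_end("({])", 0))    # 3: bracket types are not checked
```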

Automatic document format recognition

    The same principles apply as with JSON and XML. Unless the message/content type is explicitly specified (you can specify Content-Type: cdm or application/cdm etc.), the engine will try to recognize the content type by the leading and trailing symbols of the text (at least two characters are tested in each case):
  • XML content is always enclosed between '<' and '>' characters (or their escaped form)
  • JSON content is always enclosed in '{' and '}' or '[' and ']' characters (the engine does not support stand-alone JSON primitives)
  • CDM, by itself, does not impose any containing symbols. Therefore, by convention, the engine will recognize CDM content by a leading comment symbol '#' followed by either a whitespace or another '#' (that is, the recognizer will check at least two leading characters). This is why most of the examples in this document start with an empty comment line. The YAML document separator '---' can be used as well, but it makes no sense when you compose your document from multiple fragments via templating. A comment line is a much safer choice, as comments can be placed anywhere in the document, and a concatenation of multiple fragments, each starting with a comment, is still a valid and coherent document. It is recommended to use this comment line for its intended purpose and put some actual comment there, e.g. information about the following content. It's OK to give an empty comment though. This is especially important when communicating over websocket, where content type headers are not available. Without a leading comment line, your content will most likely be recognized as plain text instead of CDM (unless configured otherwise in pipeline properties).
  • Conversely, the CDM encoder will output a leading empty comment line by default (unless explicitly configured otherwise). As a "cherry on the cake", when you log or print your structure, CDM strings are usually clearly visible because of this: each CDM string value will start with a '#' character, which would not happen if the value was an actual structure. This is beneficial, as it gives you a clue about how your structure is composed even when the engine does text to CDM conversions automatically. Neither XML nor JSON provides such clarity, as a nested structure and a string value usually render the same way.
  • By default the CDM encoder will try to minimize output size and select escaping vs. quoting based on the estimated output length. It means that a string is more likely to be quoted rather than escaped only when it contains 2+ offending characters. That is, you can see find(only: abc\, xyz): ... instead of find(only: "abc, xyz"): ... because the escaped abc\, xyz string is shorter than the quoted "abc, xyz", even if only by one character. Arguably, however, the latter is more readable or at least looks more familiar. When you convert your sources to CDM, you might want to make a pass and change some of the escaped strings to quoted ones for better readability.
  • One thing to remember is that before any conversion the engine will always trim the text from both ends. With JSON or XML this usually does not cause any problems, as the content is always wrapped into something. With CDM, however, a last entry with trailing empty lines, like the one above, can get trimmed even before any conversion takes place, and thus the trailing lines are lost despite the "preserve trailing lines" '+' chomp indicator. This has nothing to do with CDM itself (it will preserve lines when given complete input), but rather with how the engine processes text. I think such situations are rare, but you can always add a trailing [empty] comment line to prevent trimming in the first place. (We can also change the engine implementation to preserve complete input, but I don't think it's urgent. Please let me know if you think otherwise.)
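The recognition rules listed above can be summarized in a short Python sketch. It is a simplified guess at the logic, not the engine's code: the function name recognize and the returned labels are hypothetical, and the check for escaped XML brackets assumes the usual &lt; and &gt; entities.

```python
def recognize(text: str) -> str:
    """Guess the content type from the leading and trailing characters."""
    t = text.strip()  # the engine trims the text before any conversion
    if len(t) < 2:
        return "text"
    first, last = t[0], t[-1]
    if first == "<" and last == ">":
        return "xml"
    if t.startswith("&lt;") and t.endswith("&gt;"):
        return "xml"  # escaped form of the XML brackets (assumed entities)
    if (first == "{" and last == "}") or (first == "[" and last == "]"):
        return "json"
    if first == "#" and (t[1].isspace() or t[1] == "#"):
        return "cdm"  # leading comment line, as recommended
    if t.startswith("---"):
        return "cdm"  # YAML document separator is accepted as well
    return "text"


print(recognize("<a><b/></a>"))            # xml
print(recognize('{"Key1": "Value1"}'))     # json
print(recognize("# config\nKey1: Value1")) # cdm
print(recognize("Key1: Value1"))           # text
```

The last call shows why the leading comment line matters: without it, perfectly valid CDM is treated as plain text.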

Comparison to other formats

XML

All features are supported except processing instructions which can be added later if required. Naturally, there is no need for weird XML quirks like:

  • Necessity of a root tag (this was never really a requirement with our XML parser, but it had imposed some extra weirdness like root tag stripping in Include)
  • Parsed entities like "&amp;" (with a few limited exceptions from the main rule, all CDM separators are two-character combinations and thus almost never require any escaping)

In addition, we support things that XML does not:

  • You can have multiple attributes with the same name just as with keys
  • Attribute value can be a structure, e.g. you can have MyKey(MyAttr1: {Key1: Val1, Key2: Val2}, MyAttr2: [A1, A2]);
  • Attributes may have default keys/names which, in turn, can be omitted. Compare if(condition: true): … vs. if(true): … where "condition" is attribute name.
  • Explicit empty container as a value (e.g. MyKey: {} or MyKey: []) or null value (e.g. MyKey: null) or no value at all (e.g. MyKey;).

Note that null value differs from no value the same way as null differs from undefined in JavaScript. Keys which have no values behave the same way as absent keys. In combination with multiple values for the same key it allows to "hide" elements from existing structures.

JSON

For all and any practical purposes CDM is a superset of JSON, meaning that any JSON string can be given as a value for any key in CDM. At the whole structure level the same requirement remains: input JSON string must represent a container (i.e. either map or array). That is, technically speaking, a stand-alone string or a number is a valid JSON string and can be parsed by a JSON parser, but obviously, it cannot be expressed as an ML (which is a container) in any obvious way, hence it's not supported.

YAML

In most common cases CDM and YAML are comparable and compatible (to the extent that you can set your language to YAML in Notepad++ and get reasonable syntax highlighting). It is recommended that you glance over the YAML specification (see http://yaml.org/). There are also plenty of examples on the Net. There are some important differences, however:

  • Despite the almost identical look and feel, the parsing concept of CDM is totally different from that of YAML (see the YAML spec for more details). A YAML parser assumes that all input text is given upfront, and therefore it can use lookahead. All our new parsers, on the other hand, are streaming parsers, meaning that the source string can be fed to a parser in chunks of arbitrary size in multiple independent calls, while the parser state is preserved between calls. Such a property is very much desired in environments like ours, where a message can come over a TCP pipeline in multiple chunks. The ability to parse an input stream incrementally allows avoiding message aggregation facilities altogether, for the benefit of lower memory consumption and better performance.

Thus while YAML parser looks for leading indicators which designate keys and values, CDM parser looks for trailing separators instead and only needs to look back if ever.

  • CDM supports node attributes and multiple values for the same keys which YAML does not.
  • For the sake of compatibility CDM also supports YAML's block sequence (aka array) syntax with the "-" array item prefix. CDM is lenient enough to ignore extra indentation of array element items, which seems to be used in some YAML sources despite the fact that, strictly speaking, it is a violation of indentation rules (i.e. that increased indentation always means a nested container). That is, IndentedArray below will be parsed the same way as RegularArray


RegularArray:
- # same indentation level as array key
  Explain: H6DL
  Priority: 8
-
  Explain: 7WCY
  Priority: 2
-
  Explain: QUZO
  Priority: 0
IndentedArray:
  - # additional indentation will be ignored
    Explain: H6DL
    Priority: 8
  -
    Explain: 7WCY
    Priority: 2
  -
    Explain: QUZO
    Priority: 0


and result in the following structures:

RegularArray:
  Explain: H6DL
  Priority: 8
RegularArray:
  Explain: 7WCY
  Priority: 2
RegularArray:
  Explain: QUZO
  Priority: 0
IndentedArray:
  Explain: H6DL
  Priority: 8
IndentedArray:
  Explain: 7WCY
  Priority: 2
IndentedArray:
  Explain: QUZO
  Priority: 0


Thus, if you actually want to define a nested array, you need to do it explicitly like in YAML (i.e. start the next indentation level within the array):

NestedArray:
- # enclosing array element
  - # nested array element (increased indentation level)
    Explain: H6DL
    Priority: 8
  - # nested array element
    Explain: 7WCY
    Priority: 2
  - # nested array element
    Explain: QUZO
    Priority: 0


This is how the same structures are rendered in JSON:

"RegularArray": [
  {"Explain":"H6DL","Priority":8 },
  {"Explain":"7WCY","Priority":2 },
  {"Explain":"QUZO","Priority":0 }
],
"IndentedArray": [
  {"Explain":"H6DL","Priority":8 },
  {"Explain":"7WCY","Priority":2 },
  {"Explain":"QUZO","Priority":0 }
],
"NestedArray": [
  [
    {"Explain":"H6DL","Priority":8 },
    {"Explain":"7WCY","Priority":2 },
    {"Explain":"QUZO","Priority":0 }
  ]
]


Given that we support multiple values for the same key which is how we represent arrays, I'm not sure if this syntax is very useful, (using explicit keys is, well, more explicit and recommended), but you can use it if you want to.
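How repeated keys map onto JSON arrays can be sketched in Python as below. The sketch assumes that a key occurring more than once becomes an array while a key occurring once stays a single value; this rule, the tuple-based entry layout and the function name to_json_object are assumptions made for the example, and explicitly nested arrays (as in NestedArray) are not covered.

```python
import json
from typing import Any, List, Tuple

Entries = List[Tuple[str, Any]]  # ordered entries; a key may occur repeatedly


def to_json_object(entries: Entries) -> dict:
    """Group repeated keys into arrays, converting nested entry lists too."""
    grouped: dict = {}
    for key, value in entries:
        if isinstance(value, list):
            value = to_json_object(value)
        grouped.setdefault(key, []).append(value)
    # Assumption: a single occurrence stays a plain value, not an array.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


entries: Entries = [
    ("RegularArray", [("Explain", "H6DL"), ("Priority", 8)]),
    ("RegularArray", [("Explain", "7WCY"), ("Priority", 2)]),
    ("RegularArray", [("Explain", "QUZO"), ("Priority", 0)]),
]
print(json.dumps(to_json_object(entries)))
```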

Python

CDM allows for an almost identical coding style, except that keyword parameters (e.g. condition in if(condition): ...) must be enclosed in parentheses (compare to Python's if condition: ...). CDM does not support some Python syntax features like triple quotes, as it uses the more flexible YAML-like syntax to work with multi-line strings.

CDM Takeaways

  • Whitespace matters!
  • All separators except brackets are two-character sequences often including a whitespace character (e.g. space)
  • Only spaces can be used for indentation
  • Quotes can be omitted almost everywhere
  • When in doubt about whether some text can be unexpectedly parsed as multiple entries do enclose it in quotes
  • The vast majority of parsing errors are caused by unbalanced quotes or brackets, so these are the first things to check. In case of an error the parser reports the line number and position