
SuperCSV v1.0 Specification

“a transformational reinterpretation of CSV” — self-proclaimed father of modern CSV

A minimal, typed, human-readable, machine-parseable tabular data format.
Designed for clarity, lossless read/write, and wide compatibility.

Overview

Core File Rules

Extension

SuperCSV files use the .supr extension.

Version Declaration

A SuperCSV file MUST begin with a version declaration:

((SuperCSV v1.0))

Rules:

File Structure

A SuperCSV file consists of:

  1. One header row (required)
  2. Zero or more data rows
  3. Zero or more comment lines
  4. Zero or more blank lines

Comment lines and blank lines may appear anywhere after the version declaration line.

Encoding

UTF-8 without BOM.
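
The encoding rule and the version declaration rule can be checked together before any parsing starts. The following is a minimal sketch, not a reference implementation: the function name check_preamble is hypothetical, and tolerating a trailing carriage return on the first line is an assumption, since this section does not define line-ending handling.

```python
VERSION_LINE = "((SuperCSV v1.0))"
UTF8_BOM = b"\xef\xbb\xbf"


def check_preamble(raw: bytes) -> str:
    """Return the decoded text if the bytes pass the preamble checks."""
    # Encoding rule: UTF-8 without BOM.
    if raw.startswith(UTF8_BOM):
        raise ValueError("BOM is not permitted")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("file is not valid UTF-8") from exc
    # Version rule: the first physical line must be the declaration.
    # Assumption: a trailing "\r" is tolerated; only space and tab are trimmed.
    first_line = text.split("\n", 1)[0].rstrip("\r")
    if first_line.strip(" \t") != VERSION_LINE:
        raise ValueError("missing version declaration")
    return text
```

For example, check_preamble(b"((SuperCSV v1.0))\nName:string\n") returns the decoded text, while input that starts with a BOM or with a header row raises ValueError.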

Whitespace

Structural whitespace

Only ASCII space (U+0020) and ASCII tab (U+0009) are treated as structural whitespace. These characters may appear around commas, around :, and at the edges of unquoted fields, and are trimmed or ignored according to the normal parsing rules. All other Unicode whitespace characters (for example U+00A0 NO-BREAK SPACE or U+2003 EM SPACE) are not structural and are never skipped by the parser.
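
The distinction matters in languages whose default trim removes all Unicode whitespace. A short sketch in Python, where trim_structural is a hypothetical helper name:

```python
STRUCTURAL_WS = " \t"  # U+0020 and U+0009 only


def trim_structural(field: str) -> str:
    """Trim only structural whitespace from both edges of a field."""
    return field.strip(STRUCTURAL_WS)


# Space and tab are removed at the edges.
trimmed = trim_structural(" \tAlice \t")

# A leading NO-BREAK SPACE is not structural, so it is left in place.
# Python's bare str.strip() would remove it, which is why the explicit
# character set is passed above.
kept = trim_structural("\u00a0Alice")
```

Here trimmed is "Alice" and kept still begins with U+00A0.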

Invisible and visually space-like characters inside values

Some characters are invisible or visually space-like at value edges and are hard to review reliably in unquoted text. These characters are allowed only when they are unambiguously part of the value:

Any character in the canonical edge-invalid set is invalid at the leading or trailing edge of an unquoted string. To represent such content at either edge, the value must be written as a quoted string.

Canonical edge-invalid set:

These characters remain valid inside quoted strings and remain valid inside unquoted strings when they are internal to the value rather than at either edge.

Empty fields

Unquoted empty fields are invalid in v1.0; the form is reserved.

(Implementations may, through options, allow unquoted empty fields to be treated as null to aid with reading legacy CSV files. This behaviour is outside the v1.0 specification.)

Lexical Constructs

Identifiers

Identifiers are used for:

Grammar:

identifier ::= [A-Za-z0-9][A-Za-z0-9_-]*

Rules:

Identifiers are not a value type.

Examples (valid)
Name
UserId
HTTPStatus
x
x1
snake_case
foo_
kebab-case
1Name
2026Data
6set
3dModel

Examples (invalid)
_            # reserved null literal
 Name        # leading whitespace
Name         # trailing whitespace
First Name   # internal whitespace
"Name"       # quoted
Name!        # invalid punctuation
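
The grammar above maps directly onto a regular expression. The sketch below checks the grammar only; is_identifier is a hypothetical name, and any further rules from the Rules list are outside its scope.

```python
import re

# identifier ::= [A-Za-z0-9][A-Za-z0-9_-]*
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def is_identifier(text: str) -> bool:
    """True if the whole string matches the identifier grammar."""
    # fullmatch rejects leading/trailing whitespace and quotes outright.
    return IDENTIFIER_RE.fullmatch(text) is not None


valid = ["Name", "x1", "snake_case", "foo_", "kebab-case", "1Name", "3dModel"]
invalid = ["_", " Name", "Name ", "First Name", '"Name"', "Name!"]

all_valid_pass = all(is_identifier(s) for s in valid)
all_invalid_fail = not any(is_identifier(s) for s in invalid)
```

The null literal _ fails because the first character class excludes the underscore.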

Delimiters

Columns are separated by commas.
Whitespace around commas is ignored.
Trailing commas are not permitted.

Newlines are row delimiters and terminate the current row, except when a newline appears after a comma and before the next value (i.e., between fields).
Newlines inside a quoted string are a valid part of that value.

Comments and Metadata

SuperCSV supports comments (human-facing) and metadata syntax (machine-facing), in both line-level and inline (field-level) forms.
Comments and metadata syntax never change row or column structure.

Comments are for human readability only. They do not affect validation, decoding, encoded output, or data meaning.

Metadata is reserved syntax in v1.0. The (( … )) form is recognised as metadata syntax so positional and structural rules can be applied correctly. In v1.0, only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block is recognised syntax in v1.0 but has no defined meaning.

In v1.0, comments and non-version metadata are part of the file syntax and are recognised during parsing, but they are not part of the decoded data model. They do not affect validation, data meaning, or encoded output, and implementations must not preserve or emit them.

The metadata form is reserved for future versions, which will define metadata rules.

Design Intent (for explanation)

v1.0 defines the syntax and positioning rules for comments and metadata up front so every implementation parses the format the same way. Comments are allowed in v1.0. Metadata syntax is recognised as part of that same parsing model, but only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning in v1.0. This keeps the parser stable now and leaves a clear path for future versions to add metadata rules without redesigning the core format.

Non-version metadata has no defined meaning in v1.0, but its positional and structural rules are defined in advance so future versions can introduce metadata in a consistent and backward-compatible way. If metadata is added in later versions, it will follow these established rules.

Examples and tests may include non-version metadata blocks to verify parser correctness under the defined positional and structural rules. Such cases do not represent meaningful metadata or any v1.0 semantic behaviour.

Line-Level Forms

Three line-level forms are supported:

A line is treated as a line-level comment or metadata when it meets one of the following criteria after trimming leading ASCII structural whitespace (space and tab):

Leading and trailing ASCII structural whitespace around a standalone line comment or line metadata block is ignored when determining whether the physical line is a line-level comment or metadata-syntax line. Internal whitespace inside the block is preserved exactly, subject to the normal block-content rules.

Line comment:

A line comment may take two forms:

# some text
(some text)

Style recommendation:

Line metadata syntax:

((SuperCSV v1.0))

Rules:

Inline Forms (Field-Level)

Inline comment blocks relate to a single field value in data rows, or to a single HeaderField in header rows.

Form:

( some text )

Examples (all valid):

42 (approx)
(approx) 42
() 42

Inline metadata (( … )) blocks follow the same structural and positional rules as the corresponding inline comment form where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other inline metadata block is recognised in v1.0 but has no defined meaning.

Forbidden Placements

Inline comment blocks cannot appear:

Inline comments never change field count, container shape, or type structure. The same restrictions apply to metadata blocks.

Bracketing Rules

Allowed / Forbidden Characters Inside Blocks

Allowed: All characters except those listed below.

Forbidden inside ( … ) and (( … )):

These characters terminate or invalidate the block.

Comment Scope

Comments relate to either rows or fields:

Constraints:

Metadata syntax (( … )) is recognised using the same positional rules where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Non-version metadata blocks are recognised in v1.0, but they do not define row-level or field-level metadata semantics.

Blank Lines

A blank line is empty or contains only whitespace.
Blank lines are ignored.

Header Definition

The header row is the first line after the version declaration that is not blank, a comment, or a metadata line.

Syntax

HeaderField ::= Identifier ":" Type
HeaderRow   ::= HeaderField { "," HeaderField }

Type names are always written in lowercase.

Whitespace around : and , is ignored.
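
Because type references such as enum<pending,active,done> and arr<T>[R,C] contain commas of their own, a header cannot be split on every comma. The sketch below splits only at commas outside < > and [ ] pairs; split_top_level and parse_header are hypothetical names, and inline comments and type validation are deliberately left out.

```python
from typing import List, Tuple


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside < > or [ ]."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "<[":
            depth += 1
        elif ch in ">]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_header(line: str) -> List[Tuple[str, str]]:
    """Return (column name, type reference) pairs for one logical header row."""
    fields = []
    for part in split_top_level(line):
        name, sep, type_ref = part.partition(":")
        if not sep:
            raise ValueError("header field is missing ':'")
        # Whitespace around ':' and ',' is ignored (space and tab only).
        fields.append((name.strip(" \t"), type_ref.strip(" \t")))
    return fields


columns = parse_header("Name:string, Status:enum<pending,active,done>, M:arr<int>[2,3]")
```

In this sketch columns holds three pairs, with the enum and array type references kept intact.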

Multi-Line Headers

Headers may span multiple lines for readability. If a header line ends with a comma (optionally followed by whitespace), the next line continues the same logical header row.

Rules:

Comments in headers

Metadata syntax (( … )) is recognised in the same structural positions as comments where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block in a header is recognised syntax but has no defined meaning.

Example
# Wide header definition split across lines
Id:int,
Name:string,
Tags:list<string>,
Scores:arr<float>[3],
Status:enum<pending,active,done>,
Notes:string
1,Alice,[work,urgent],[9.5,8.0,7.5],active,Needs review

Equivalent to:

Id:int,Name:string,Tags:list<string>,Scores:arr<float>[3],Status:enum<pending,active,done>,Notes:string
1,Alice,[work,urgent],[9.5,8.0,7.5],active,Needs review
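
The continuation rule can be sketched as a pre-pass that joins physical lines into logical rows. This is an illustration under simplifying assumptions: join_continuations is a hypothetical name, comment and metadata lines are assumed to have been removed already, and quoted strings that end in a comma are not handled.

```python
from typing import Iterable, List


def join_continuations(lines: Iterable[str]) -> List[str]:
    """Join physical lines that end with a comma into one logical row."""
    logical, buffer = [], ""
    for line in lines:
        stripped = line.strip(" \t")
        if not stripped:
            continue  # blank lines are ignored
        buffer += stripped
        if buffer.endswith(","):
            continue  # trailing comma: the next line continues this row
        logical.append(buffer)
        buffer = ""
    if buffer:
        raise ValueError("input ended while a continuation was pending")
    return logical


rows = join_continuations([
    "Id:int,",
    "Name:string,",
    "Notes:string",
    "1,Alice,Needs review",
])
```

Here rows contains two logical rows: the joined header Id:int,Name:string,Notes:string and the data row.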

Column Names

Column names identify fields in the header definition.

Rules:

Type References

Column types are defined in SuperCsvTypeTable v1.0.
The header row must reference only types defined in that table.
Type aliases are defined in the Type Table (see Type Aliases in SuperCsvTypeTable v1.0).

ScalarType, EnumType, and container types are defined as follows:

Value Definition

String Values

Unquoted strings are allowed only if, after trimming leading and trailing whitespace, the resulting value contains none of:

Unquoted strings may contain internal spaces. Any leading or trailing whitespace in an unquoted string is always trimmed. If leading or trailing whitespace must be preserved, a quoted string must be used.

In addition, an unquoted string must not begin or end with any character from the canonical edge-invalid set defined in Invisible and visually space-like characters inside values. Those characters are allowed inside quoted strings and when internal to an unquoted string, but they must be quoted if they appear at either edge.

Quoted strings use the standard double-quote form:

After trimming leading and trailing whitespace:

Numeric Values

Must be valid literals for their declared type.
Numeric values must not be quoted.
The null literal _ is allowed for all numeric types.

Boolean Values

Allowed forms are defined in the type table.

List Values

The null literal _ is allowed for lists.

Basic syntax:

[item1,item2,item3]

Rules:

Element type: T must be a ScalarType or EnumType.

Dynamic-size list:

list<T>

Fixed-size list:

list<T>[N]       # N > 0

Value-level prefix (non-empty only):

[N][item1,item2,item3]

Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.

Array Values

Null literal _ is allowed for Arrays.

1D array (arr<T>[N])

[1,2,3,4]

2D array (arr<T>[R,C])

[[1,2,3],[4,5,6],[7,8,9]]

Dynamic-size array (arr<T>)

[1,2,3]                     # valid 1D
[[1,2],[3,4],[5,6]]         # valid 2D (rectangular)

Rules:

Element type: T must be a ScalarType or EnumType.

Fixed-size:

arr<T>[N]    # N > 0
arr<T>[R,C]  # R > 0 AND C > 0

Dynamic-size:

arr<T>

Value-level prefix (non-empty only):

[N][1,2,3]
[R,C][[1,2],[3,4]]

Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.
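
Once an array literal has been parsed into nested lists, the rectangularity and fixed-size requirements reduce to a shape comparison. The sketch below assumes that parsing has already happened; shape_of, format_shape, and check_shape are hypothetical names, and the message wording is modelled on the shape error shown under Error Message Examples.

```python
from typing import Tuple


def shape_of(value: list) -> Tuple[int, ...]:
    """Return (N,) for a 1D array or (R, C) for a rectangular 2D array."""
    if not value:
        return (0,)
    if all(isinstance(item, list) for item in value):
        widths = {len(row) for row in value}
        if len(widths) != 1:
            raise ValueError("2D array is not rectangular")
        return (len(value), widths.pop())
    if any(isinstance(item, list) for item in value):
        raise ValueError("array mixes scalars and nested rows")
    return (len(value),)


def format_shape(shape: Tuple[int, ...]) -> str:
    return "[" + ",".join(str(n) for n in shape) + "]"


def check_shape(value: list, declared: Tuple[int, ...]) -> None:
    """Raise ValueError if the value does not match the declared fixed size."""
    actual = shape_of(value)
    if actual != declared:
        raise ValueError(
            f"expected shape {format_shape(declared)}, got {format_shape(actual)}"
        )


check_shape([[1, 2, 3], [4, 5, 6], [7, 8, 9]], (3, 3))  # passes silently
```

With this sketch an empty value has shape (0,), so it can never satisfy a fixed-size declaration, which matches the rule that [] is invalid for fixed-size containers.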

Enum Values

Enum terminology

An EnumType is defined by enum<...> in the header.

Each EnumItem inside the brackets is either:

An EnumValue in a data row may be either the name or the value of an EnumItem.

EnumValues in data rows must match the name or declared value of one of the column’s EnumItems.
The null literal _ represents missing data and is not a defined EnumValue.

Grammar

enum<item1,item2,...>
item ::= Identifier | Identifier=Identifier

Rules

Lookup and Decoding Rule

When a data row field is resolved against an EnumType:

  1. Name match is checked first. If the field matches any EnumItem name, that item is selected. Name matches are always unambiguous because names are unique.
  2. Value match is checked second. If no name matches, the field is compared against EnumItem values in declaration order. The first matching item is selected.

This guarantees deterministic decoding even when values are duplicated.
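
The two-step lookup can be sketched as follows. EnumItems are represented here as already-parsed (name, value) pairs in declaration order, which is an assumption of the sketch; resolve_enum is a hypothetical name and the error text follows the enum message under Error Message Examples.

```python
from typing import List, Optional, Tuple

# (name, value); value is None when the item declares no value.
EnumItem = Tuple[str, Optional[str]]


def resolve_enum(field: str, items: List[EnumItem]) -> EnumItem:
    """Resolve a data-row field against an EnumType's items."""
    # Step 1: name match is checked first.
    for item in items:
        if item[0] == field:
            return item
    # Step 2: value match, in declaration order; first match wins.
    for item in items:
        if item[1] is not None and item[1] == field:
            return item
    raise ValueError(f"invalid enum label: {field!r}")


# Two items share the value "1"; the earlier declaration is selected.
items = [("low", "0"), ("medium", "1"), ("high", "1")]
by_value = resolve_enum("1", items)
by_name = resolve_enum("high", items)
```

Here by_value is the medium item, because it is declared before high, and by_name is the high item, because name matches take priority.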

Encoding Rule

Example

Header:

Color:enum<0=low,1=medium,2=high>

Valid EnumValues in rows:

Header with identifier values:

Color:enum<L=low,M=medium,H=high>

Valid EnumValues in rows:

Null

Literal:

_

Represents missing or undefined value.
Valid anywhere a value is expected.
Never quoted.
Not allowed as an EnumItem name, EnumItem value, or type name.

Container Prefix Clarifications

Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.

Forbidden patterns:

Valid empty container:

For fixed-size containers, [] is invalid because the value must match the declared size.

Row Definition

Row Validation

Each row must:

Whitespace-only rows are considered blank lines (see Blank Lines) and are ignored.
Invalid rows cause a parse error.

Multi-Line Rows

Data rows may span multiple physical lines. When a newline appears after a comma and before the next value (i.e., between fields), it is treated as whitespace and does not terminate the row.

A newline inside a quoted string is always part of that string and is preserved.

A newline appearing after a value (outside a quoted string) terminates the row. If the row terminates before all columns are present, the row is invalid.

Rules

Comments in data rows

Within a data row (trailing comma pending):

Between data rows (no trailing comma pending):

Metadata syntax (( … )) is recognised in the same structural positions as comments where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block in a data row is recognised syntax but has no defined meaning.

Examples (valid)
# Multi-line row examples (valid)
Name:string, Age:int

Bob, 35

# trailing comma continuation example
Dan,
43 (completes row)

# inline metadata syntax example (metadata has no v1.0 meaning)
Mark,56 (comment) (( meta ))

Equivalent to:

Name:string, Age:int
Bob,35
Dan,43
Mark,56

Examples (structurally valid with interleaved lines)
# Multi-line row examples (structurally valid with interleaved lines)
Name:string, Age:int

Bob, 35

# standalone comment ok, previous blank line ok and ignored
Dan,
(comment line between continuation lines relates to the next field)
(( metadata syntax between continuation lines; no v1.0 meaning ))
43

# standalone metadata syntax after terminating line - allowed structurally
(( standalone metadata syntax after terminating line ))

Examples (invalid)
# Multi-line row examples (invalid)
Name:string, Age:int

John (invalid - newline after value and before continuation comma)
, 35 (not reached)

# recognised metadata syntax on Bob line; still no v1.0 meaning
Bob  (( meta tbc )),
(( recognised metadata syntax here, but invalid because no data follows the continuation ))

Comment and metadata scope examples (valid)
# Comment scope examples plus metadata-syntax placement examples
Name:string, Age:int, Score:float

# row-level comment (no trailing comma pending — relates to next row)
(( row-level metadata syntax uses the same placement rule, but has no v1.0 meaning ))
Alice,
(field comment relates to Age) 30,
(( field metadata syntax )) 95.5

# first-field comment must be inline
(field comment) Bob, 25, 88.0

# blank line mid-continuation does not change field association
Carol,
(relates to Age despite blank line below)

40,
99.9

Comment and metadata scope examples (invalid structural cases)
# Comment and metadata scope examples (invalid structural cases)
Name:string, Age:int

# invalid — # comment mid-data-row
Alice,
# not permitted mid-row — only ( … ) comments are allowed within data rows
30

# invalid — two comments on same field
Bob,
(first comment)
(second comment) 25

# invalid — two metadata blocks on the same field position
Carol,
(( meta1 ))
(( meta2 )) 35

Multi-Line Container Values

Container values (list<T> and arr<T>) may span multiple physical lines for readability. A newline is permitted at specific positions within the container literal and is treated as whitespace — it does not terminate the row.

A newline appearing outside a permitted position within a container terminates the row. If the row terminates before the container is closed, the value is invalid.

Rules

Examples (valid)
# Multi-line container examples (valid)
Name:string, Tags:list<string>, Matrix:arr<int>

Alice,
[
  work,
  urgent
],
[[1,2],[3,4]]

Bob, [a,b,c], [
  [1,2],
  [3,4]
]

Examples (invalid)
# Multi-line container examples (invalid)
Name:string, Tags:list<string>

Alice, [work
,urgent]  (invalid - newline after value, not after comma)

Bob, [
# comment not allowed inside container
  work, urgent
]

Error Model

Validators MUST emit each error as a three-field SuperCSV row:

Line:int, ErrorSection:string, ErrorMsg:string

Line

Physical line number (1-based) in the SuperCSV file. Blank lines and comment lines are counted.

ErrorSection

Identifies the header, row or column and, if applicable, the position inside its value.

Valid forms (v1.0):

Rules:

ErrorMsg

Human-readable message derived from the error code’s template.

Error Message Examples
8, Price, "invalid int value: 'abc'"
14, Tags(4), "invalid enum label: 'blueish'"
19, Scores(1), "expected 3 elements, got 2"
12, Matrix(2,3), "invalid int value: '/'"
3, Price, "int values must not be quoted"
5, Tags, "container values must not be quoted"
8, Matrix, "expected shape [3,3], got [3,2]"
9, rowErr, "expected 5 columns, got 6"
2, headerErr, "invalid identifier: ' Name'"
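
A validator could produce rows in this layout with a small formatter. The sketch below is illustrative only: format_error is a hypothetical name, and because the quoted-string escaping rules are not reproduced in this section, messages containing a double quote or a newline are rejected rather than escaped.

```python
def format_error(line: int, section: str, message: str) -> str:
    """Format one error as a Line, ErrorSection, ErrorMsg row."""
    if line < 1:
        raise ValueError("line numbers are 1-based")
    # Assumption: escaping is out of scope, so such messages are refused.
    if '"' in message or "\n" in message:
        raise ValueError("message needs escaping, which this sketch omits")
    return f'{line}, {section}, "{message}"'


row = format_error(8, "Price", "invalid int value: 'abc'")
```

The value of row is the same text as the first entry in the examples above.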

Example

A complete SuperCSV file with typed headers and two data rows:

((SuperCSV v1.0))
Name:string, Score:int, Flags:list<bool>, Matrix:arr<int>, Level:enum<0=low,1=medium,2=high>

Ras, 42, [true,false,true], [[1,2],[3,4]], medium
Alex, 29, [1,0,1], [[5,6],[7,8]], 2

Additional examples covering all types and features are in the examples/ directory.

Conformance

Required Behaviors

To be completed in a future editorial pass.

Forbidden Behaviors

To be completed in a future editorial pass.

Implementation Notes

To be completed in a future editorial pass.

External References

Grammar (Normative)

The formal grammar for SuperCSV v1.0 is defined in:

grammar.ebnf

This file provides a machine-readable EBNF definition of the SuperCSV syntax and serves as a cross-check for parser implementations.

Type Table (Normative)

All built-in types are defined in:

SuperCsvTypeTable v1.0

This file is the canonical, versioned registry of types.