“a transformational reinterpretation of CSV” — self-proclaimed father of modern CSV
A minimal, typed, human-readable, machine-parseable tabular data format.
Designed for clarity, lossless read/write, and wide compatibility.
_ represents a missing or undefined value and is valid anywhere a value is expected.
SuperCSV files use the .supr extension.
A SuperCSV file MUST begin with a version declaration:
((SuperCSV v1.0))
Rules:
SuperCSV is matched using ASCII case-insensitive comparison.
The version must be v1.0.
A SuperCSV file consists of:
Comment lines and blank lines may appear anywhere after the version directive line.
Encoding: UTF-8 without BOM.
Whitespace between tokens (for example around [, ], ,, or a value) is ignored, except that internal whitespace that is part of an unquoted string value is preserved. Whitespace between tokens in the header row is ignored.
Only ASCII space (U+0020) and ASCII tab (U+0009) are treated as structural whitespace. These characters may appear around commas, around :, and at the edges of unquoted fields, and are trimmed or ignored according to the normal parsing rules. All other Unicode whitespace characters (for example U+00A0 NO-BREAK SPACE or U+2003 EM SPACE) are not structural and are never skipped by the parser.
Some characters are invisible or visually space-like at value edges and are hard to review reliably in unquoted text. These characters are allowed only when they are unambiguously part of the value:
For example, Jean Paul written with a U+00A0 NO-BREAK SPACE between the two words is valid, and the NBSP is part of the value.
Any character in the canonical edge-invalid set is invalid at the leading or trailing edge of an unquoted string. To represent such content at either edge, the value must be written as a quoted string.
Canonical edge-invalid set:
Visually space-like: U+000B, U+000C, U+0085, U+00A0, U+1680, U+2000–U+200A, U+2028, U+2029, U+202F, U+205F, U+3000
Invisible: U+200B, U+200C, U+200D, U+2060, U+FEFF
These characters remain valid inside quoted strings, and inside unquoted strings when they are internal to the value rather than at either edge.
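A minimal Python sketch of the edge check, assuming the value has already been identified as an unquoted string and trimmed of ASCII space and tab. The names EDGE_INVALID and has_invalid_edge are illustrative, not part of the specification.

```python
# Canonical edge-invalid set, built from the two groups listed above.
EDGE_INVALID = frozenset(
    [0x000B, 0x000C, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
    + [0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF]
)


def has_invalid_edge(value: str) -> bool:
    """Return True if an unquoted value starts or ends with an edge-invalid character.

    Assumes structural whitespace (U+0020, U+0009) has already been trimmed.
    Internal occurrences are allowed, so only the first and last characters matter.
    """
    if not value:
        return False
    return ord(value[0]) in EDGE_INVALID or ord(value[-1]) in EDGE_INVALID


print(has_invalid_edge("Jean\u00A0Paul"))   # False: internal NBSP is allowed
print(has_invalid_edge("\u00A0Jean Paul"))  # True: leading NBSP must be quoted
```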
Unquoted empty fields are invalid in v1.0 (and also reserved).
(Implementations may, through options, allow unquoted empty fields to be treated as null to aid with reading legacy CSV files. This behaviour is outside the v1.0 specification.)
Identifiers are used for:
Grammar:
identifier ::= [A-Za-z0-9][A-Za-z0-9_-]*
Rules:
Identifiers cannot begin with _ or -.
_ is reserved as the null literal and cannot be used as an identifier or enum name.
Identifiers are not a value type.
Valid identifiers:
Name
UserId
HTTPStatus
x
x1
snake_case
foo_
kebab-case
1Name
2026Data
6set
3dModel
Invalid identifiers:
_ # reserved null literal
␣Name # leading whitespace (␣ marks a U+0020 space)
Name␣ # trailing whitespace
First Name # internal whitespace
"Name" # quoted
Name! # invalid punctuation
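The identifier grammar maps directly onto a regular expression. A short Python sketch (the helper name is_identifier is illustrative) that checks the examples above:

```python
import re

# Grammar from above: identifier ::= [A-Za-z0-9][A-Za-z0-9_-]*
IDENTIFIER = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def is_identifier(text: str) -> bool:
    # fullmatch rejects leading/trailing whitespace and stray punctuation;
    # "_" fails because the first character must be a letter or digit.
    return IDENTIFIER.fullmatch(text) is not None


valid = ["Name", "UserId", "HTTPStatus", "x", "x1", "snake_case", "foo_",
         "kebab-case", "1Name", "2026Data", "6set", "3dModel"]
invalid = ["_", " Name", "Name ", "First Name", '"Name"', "Name!"]

assert all(is_identifier(s) for s in valid)
assert not any(is_identifier(s) for s in invalid)
```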
Columns are separated by commas.
Whitespace around commas is ignored.
Trailing commas are not permitted.
Newlines are row delimiters and terminate the current row, except when a newline appears after a comma and before the next value (i.e., between fields).
Newlines inside a quoted string are a valid part of that value.
SuperCSV supports comments (human-facing) and metadata syntax (machine-facing), in both line-level and inline (field-level) forms.
Comments and metadata syntax never change row or column structure.
Comments are for human readability only. They do not affect validation, decoding, encoded output, or data meaning.
Metadata is reserved syntax in v1.0. The (( … )) form is recognised as metadata syntax so positional and structural rules can be applied correctly. In v1.0, only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block is recognised syntax in v1.0, but has no defined meaning.
In v1.0, comments and non-version metadata are part of the file syntax and are recognised during parsing, but they are not part of the decoded data model. They do not affect validation, data meaning, or encoded output, and implementations must not preserve or emit them.
The metadata form is reserved for future versions, which will define metadata rules.
v1.0 defines the syntax and positioning rules for comments and metadata up front so every implementation parses the format the same way. Comments are allowed in v1.0. Metadata syntax is recognised as part of that same parsing model, but only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning in v1.0. This keeps the parser stable now and leaves a clear path for future versions to add metadata rules without redesigning the core format.
Non-version metadata has no defined meaning in v1.0, but its positional and structural rules are defined in advance so future versions can introduce metadata in a consistent and backward-compatible way. If metadata is added in later versions, it will follow these established rules.
Examples and tests may include non-version metadata blocks to verify parser correctness under the defined positional and structural rules. Such cases do not represent meaningful metadata or any v1.0 semantic behaviour.
Three line-level forms are supported:
Line comment: # …
Line comment: ( … )
Line metadata syntax: (( … ))
A line is treated as a line-level comment or metadata line when, after trimming leading ASCII structural whitespace (space and tab), it meets one of the following criteria:
# … form: if the first non-whitespace character on the physical line is #, the line is a line-level comment regardless of any other content after the #.
( … ) form: if, after trimming leading and trailing ASCII structural whitespace, the physical line consists of only a single ( … ) block, it is a line-level comment. If additional non-whitespace content appears outside the block, the line is a data or header line with an inline prefix comment, not a line-level comment.
(( … )) form: if, after trimming leading and trailing ASCII structural whitespace, the physical line consists of only a single (( … )) block, it is treated as line-level metadata syntax. If additional non-whitespace content appears outside the block, the line is a data or header line with an inline prefix metadata block, not line-level metadata syntax.
Leading and trailing ASCII structural whitespace around a standalone line comment or line metadata block is ignored when determining whether the physical line is a line-level comment or metadata-syntax line. Internal whitespace inside the block is preserved exactly, subject to the normal block-content rules.
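The classification rule can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: classify_line and its return labels are hypothetical, and "a single block" is checked by requiring that no ( or ) appears between the outer delimiters, in line with the block-content rules.

```python
def classify_line(line: str) -> str:
    """Classify one physical line (without its newline) as
    'blank', 'comment', 'metadata', or 'other' (header/data content)."""
    s = line.strip(" \t")  # only ASCII space and tab are structural whitespace
    if not s:
        return "blank"
    if s.startswith("#"):
        return "comment"  # everything after # is ignored
    if s.startswith("((") and s.endswith("))"):
        if not any(c in "()" for c in s[2:-2]):
            return "metadata"  # exactly one (( … )) block on the line
    if s.startswith("(") and s.endswith(")"):
        if not any(c in "()" for c in s[1:-1]):
            return "comment"  # exactly one ( … ) block on the line
    # e.g. "(approx) 42" is a data line with an inline prefix comment
    return "other"
```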
Line comment:
A line comment may take two forms:
# some text
(some text)
Style recommendation:
Use # for section breaks or prominent comments.
Use ( … ) for softer, less intrusive comments, especially on continuation lines.
Line metadata syntax:
((SuperCSV v1.0))
Rules:
(( … )) metadata blocks follow defined positional rules aligned with the comment placement rules where applicable, but only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning.
Inline comment blocks relate to a single field value in data rows, or to a single HeaderField in header rows.
Form:
( some text )
Examples (all valid):
42 (approx)
(approx) 42
() 42
Inline metadata (( … )) blocks follow the same structural and positional rules as the corresponding inline comment form where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other inline metadata block is recognised in v1.0, but has no defined meaning.
Inline comment blocks cannot appear:
Inline comments never change field count, container shape, or type structure. The same restrictions apply to metadata blocks.
( begins an inline comment block.
(( begins an inline metadata block.
) closes ( … ).
)) closes (( … )).
A ( appearing before the block's closing delimiter makes the block invalid.
Allowed: all characters except those listed below.
Forbidden inside ( … ) and (( … )):
( and )
These characters terminate or invalidate the block.
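The delimiter rules can be expressed as a small scanner. This Python sketch (read_block is a hypothetical helper) reads one inline block starting at a given index. It rejects a nested (, and inside (( … )) it also rejects a lone ) that is not part of the closing )).

```python
def read_block(text: str, start: int) -> tuple:
    """Read one inline block beginning at text[start].

    Returns (kind, content, end) where kind is 'comment' or 'metadata',
    content is the raw text between the delimiters (internal whitespace kept),
    and end is the index just past the closing delimiter.
    """
    if text.startswith("((", start):
        kind, opener, closer = "metadata", "((", "))"
    elif text.startswith("(", start):
        kind, opener, closer = "comment", "(", ")"
    else:
        raise ValueError(f"no block at index {start}")

    i = start + len(opener)
    while i < len(text):
        if text[i] == "(":
            raise ValueError("'(' before the closing delimiter invalidates the block")
        if text[i] == ")":
            if text.startswith(closer, i):
                return kind, text[start + len(opener):i], i + len(closer)
            raise ValueError("single ')' inside an (( … )) block")
        i += 1
    raise ValueError("unterminated block")


print(read_block("42 (approx)", 3))    # ('comment', 'approx', 11)
print(read_block("(( meta )) 56", 0))  # ('metadata', ' meta ', 10)
```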
Comments relate to either rows or fields:
Constraints:
At most one inline comment ( … ) per field.
Metadata syntax (( … )) is recognised using the same positional rules where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Non-version metadata blocks are recognised in v1.0, but they do not define row-level or field-level metadata semantics.
A blank line is empty or contains only whitespace.
Blank lines are ignored.
The first non-blank, non-comment line is the header row.
HeaderField ::= Identifier ":" Type
HeaderRow ::= HeaderField { "," HeaderField }
Type names are always written in lowercase.
Whitespace around : and , is ignored.
Headers may span multiple lines for readability. If a header line ends with a comma (optionally followed by whitespace), the next line continues the same logical header row.
Rules:
A header line that ends with , (optionally followed by whitespace) continues on the next physical line.
All continuation lines together form one logical HeaderRow.
Line-level comments (# … and ( … )) may appear between header continuation lines and are skipped using the normal line-level classification rule; they relate to the next header field (see Comment Scope). Unlike data rows, both line-level comment forms (# … and ( … )) are permitted in headers.
Metadata syntax (( … )) is recognised in the same structural positions as comments where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block in a header is recognised syntax, but has no defined meaning.
# Wide header definition split across lines
Id:int,
Name:string,
Tags:list<string>,
Scores:arr<float>[3],
Status:enum<pending,active,done>,
Notes:string
1,Alice,[work,urgent],[9.5,8.0,7.5],active,Needs review
Equivalent to:
Id:int,Name:string,Tags:list<string>,Scores:arr<float>[3],Status:enum<pending,active,done>,Notes:string
1,Alice,[work,urgent],[9.5,8.0,7.5],active,Needs review
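A Python sketch of header handling. It assumes line-level comments and blank lines have already been skipped, and it checks structure only, not whether type names exist in the type table. The helper names are hypothetical. Commas inside enum<...> and [R,C] must not split fields, so the splitter tracks bracket depth:

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def join_header_lines(lines: list) -> str:
    """Concatenate physical header lines while each one ends with a comma."""
    parts = []
    for line in lines:
        parts.append(line.strip(" \t"))
        if not parts[-1].endswith(","):
            break  # this line completes the logical header row
    return "".join(parts)


def split_top_level(text: str) -> list:
    """Split on commas that are not inside <...> or [...]."""
    fields, depth, current = [], 0, []
    for ch in text:
        if ch in "<[":
            depth += 1
        elif ch in ">]":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_header(lines: list) -> list:
    """Return (name, type) pairs; whitespace around : and , is ignored."""
    result = []
    for raw in split_top_level(join_header_lines(lines)):
        name, sep, type_name = raw.partition(":")
        name, type_name = name.strip(" \t"), type_name.strip(" \t")
        if not sep or not type_name or not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid HeaderField: {raw!r}")
        result.append((name, type_name))
    return result
```

Applied to the two header forms above, both inputs join to the same logical row and yield the same six (name, type) pairs. A trailing comma on the last header line leaves an empty field and is rejected.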
Column names identify fields in the header definition.
Rules:
Column names must be valid identifiers and cannot begin with _ or -.
Column types are defined in SuperCsvTypeTable v1.0.
The header row must reference only types defined in that table.
Type aliases are defined in the Type Table (see Type Aliases in SuperCsvTypeTable v1.0).
ScalarType, EnumType, and container types are defined as follows:
ScalarType: one of int, float, decimal, bool, string, bytes<hex>, bytes<b64>, date, time, datetime, datetimetz, timestamp, duration, timezone, uuid.
EnumType: enum<...>.
Container types: list<T> and arr<T>; container types MUST NOT nest (i.e., T cannot itself be a list or array).
Unquoted strings are allowed only if, after trimming leading and trailing whitespace, the resulting value contains none of:
, # [ ] ( ) < > { } " ' ` ; : = ? / \ | @
Unquoted strings may contain internal spaces. Any leading or trailing whitespace in an unquoted string is always trimmed. If leading or trailing whitespace must be preserved, a quoted string must be used.
In addition, an unquoted string must not begin or end with any character from the canonical edge-invalid set defined in Invisible and visually space-like characters inside values. Those characters are allowed inside quoted strings and when internal to an unquoted string, but they must be quoted if they appear at either edge.
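A Python sketch of the unquoted-string rules (decode_unquoted and FORBIDDEN are hypothetical names). It assumes any inline ( … ) comment has already been removed from the field, and it leaves the edge-invalid check as a separate step:

```python
# Characters that may not appear anywhere in an unquoted string value.
FORBIDDEN = frozenset(",#[]()<>{}\"'`;:=?/\\|@")


def decode_unquoted(raw: str):
    """Decode one unquoted string field.

    Returns None for the null literal _, otherwise the trimmed string.
    """
    value = raw.strip(" \t")  # leading/trailing structural whitespace is always trimmed
    if value == "":
        raise ValueError("unquoted empty fields are invalid in v1.0")
    if value == "_":
        return None  # null literal, not the string "_"
    bad = sorted(set(value) & FORBIDDEN)
    if bad:
        raise ValueError(f"unquoted string contains {bad}; use a quoted string")
    return value
```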
Quoted strings use the standard double-quote form:
"value"
""""
After trimming leading and trailing whitespace:
An unquoted _ is the null literal.
"" is a valid empty string (distinct from null).
Numeric values must be valid literals for their declared type.
Numeric values must not be quoted.
Null literal _ allowed for all numeric items.
Allowed forms are defined in the type table.
Null literal _ allowed for Lists.
Basic syntax:
[item1,item2,item3]
Rules:
Empty list: [] (only for dynamic-size lists).
For fixed-size lists, [] is invalid because the value must match the declared size.
_ is allowed as an item.
Element type: T must be a ScalarType or EnumType.
Dynamic-size list:
list<T>
Fixed-size list:
list<T>[N] # N > 0
Value-level prefix (non-empty only):
[N][item1,item2,item3]
Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.
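A Python sketch of the list size and prefix rules. As a simplifying assumption, items are split on plain commas, so quoted items containing commas or brackets are not handled; parse_list is a hypothetical name.

```python
import re
from typing import Optional

# Value-level prefix form: [N][item1,item2,...]
PREFIX = re.compile(r"\[(\d+)\](\[.*\])", re.DOTALL)


def parse_list(text: str, fixed_size: Optional[int] = None) -> list:
    """Return raw item strings; fixed_size is N for list<T>[N], None for list<T>."""
    text = text.strip(" \t")
    match = PREFIX.fullmatch(text)
    prefix, body = (int(match.group(1)), match.group(2)) if match else (None, text)
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("list value must be enclosed in [ ]")
    inner = body[1:-1].strip(" \t")
    items = [item.strip(" \t") for item in inner.split(",")] if inner else []
    if "" in items:
        raise ValueError("empty list item")
    if prefix is not None:
        if fixed_size is not None:
            raise ValueError("prefix form is not permitted for fixed-size lists")
        if prefix == 0:
            raise ValueError("zero-count prefix is never allowed")
        # A non-zero prefix on [] also fails here, so empty lists cannot carry one.
        if prefix != len(items):
            raise ValueError(f"prefix says {prefix} items, found {len(items)}")
    if fixed_size is not None and len(items) != fixed_size:
        raise ValueError(f"expected {fixed_size} items, got {len(items)}")
    return items
```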
Null literal _ is allowed for Arrays.
1D fixed-size (arr<T>[N]): [1,2,3,4]
2D fixed-size (arr<T>[R,C]): [[1,2,3],[4,5,6],[7,8,9]]
Dynamic-size (arr<T>): [1,2,3] # valid 1D
[[1,2],[3,4],[5,6]] # valid 2D (rectangular)
Rules:
Empty array: [] (only for dynamic-size arrays).
For fixed-size arrays, [] is invalid because the value must match the declared size.
_ is allowed as an element.
Dynamic-size arrays (arr<T>) may contain either 1D or 2D values; 2D arrays must be rectangular.
Element type: T must be a ScalarType or EnumType.
Fixed-size:
arr<T>[N] # N > 0
arr<T>[R,C] # R > 0 AND C > 0
Dynamic-size:
arr<T>
Value-level prefix (non-empty only):
[N][1,2,3]
[R,C][[1,2],[3,4]]
Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.
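A Python sketch of the array shape rules. It operates on values that have already been parsed into nested Python lists (text parsing is out of scope), and the function names are hypothetical.

```python
def array_shape(value: list) -> tuple:
    """Return (n,) for a 1D value or (rows, cols) for a rectangular 2D value."""
    if not any(isinstance(x, list) for x in value):
        return (len(value),)
    if not all(isinstance(row, list) for row in value):
        raise ValueError("cannot mix scalars and rows")
    if any(isinstance(x, list) for row in value for x in row):
        raise ValueError("only 1D and 2D arrays are supported")
    widths = {len(row) for row in value}
    if len(widths) != 1:
        raise ValueError("2D arrays must be rectangular")
    if widths == {0}:
        raise ValueError("2D rows must not be empty")
    return (len(value), widths.pop())


def check_array(value: list, declared=None, prefix=None) -> tuple:
    """declared: (N,) or (R, C) from arr<T>[N] / arr<T>[R,C]; None for arr<T>.
    prefix: the optional value-level [N] or [R,C] prefix, as a tuple."""
    shape = array_shape(value)
    if prefix is not None:
        if declared is not None:
            raise ValueError("prefix form is not permitted for fixed-size arrays")
        if 0 in prefix:
            raise ValueError("zero dimension in prefix")
        if tuple(prefix) != shape:  # also rejects any prefix on []
            raise ValueError(f"prefix {prefix} does not match shape {shape}")
    if declared is not None and tuple(declared) != shape:
        raise ValueError(f"expected shape {list(declared)}, got {list(shape)}")
    return shape
```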
An EnumType is defined by enum<...> in the header.
Each EnumItem inside the brackets is either:
a name alone, e.g. red
a value=name pair, e.g. E=red or 1=red
An EnumValue in a data row may be either the name or the value of an EnumItem.
EnumValues in data rows must match the name or declared value of one of the column’s EnumItems.
The null literal _ represents missing data and is not a defined EnumValue.
_ is allowed as a field value.
_ cannot appear as an EnumItem name.
_ is not included in the EnumType’s allowed EnumValues.
enum<item1,item2,...>
item ::= Identifier | Identifier=Identifier
Numeric identifiers such as 0 and 42 are valid in EnumItems.
_ is not a valid EnumValue.
_ may appear instead of an EnumValue.
When a data row field is resolved against an EnumType:
This guarantees deterministic decoding even when values are duplicated.
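A Python sketch of enum header parsing and field validation, assuming the value=name reading of items such as 0=low (consistent with the examples that follow). Only membership is checked here; the resolution steps themselves are not modelled. The function names are hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def parse_enum(type_text: str) -> list:
    """Parse enum<...> into (name, value) pairs; value is None for name-only items."""
    if not (type_text.startswith("enum<") and type_text.endswith(">")):
        raise ValueError(f"not an enum type: {type_text!r}")
    items = []
    for raw in type_text[len("enum<"):-1].split(","):
        left, sep, right = raw.strip(" \t").partition("=")
        name, value = (right, left) if sep else (left, None)  # value=name, or name alone
        for ident in (name, value):
            # The identifier grammar also rejects _ as a name or value.
            if ident is not None and not IDENTIFIER.fullmatch(ident):
                raise ValueError(f"invalid EnumItem: {raw!r}")
        items.append((name, value))
    return items


def check_enum_field(field: str, items: list):
    """Return None for the null literal, else the token if it matches a name or value."""
    token = field.strip(" \t")
    if token == "_":
        return None  # allowed as a field value, but not an EnumValue
    if any(token in (name, value) for name, value in items):
        return token
    raise ValueError(f"invalid enum label: '{token}'")
```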
Header:
Color:enum<0=low,1=medium,2=high>
Valid EnumValues in rows:
low, medium, high
0, 1, 2
Header with identifier values:
Color:enum<L=low,M=medium,H=high>
Valid EnumValues in rows:
low, medium, high
L, M, H
Literal:
_
Represents missing or undefined value.
Valid anywhere a value is expected.
Never quoted.
Not allowed as an EnumItem name, EnumItem value, or type name.
Prefix form is not permitted for empty or fixed-size containers in SuperCSV v1.0.
Forbidden patterns:
[0][] → INVALID (zero-count prefix is never allowed)
[0][1,2,3] → INVALID (prefix count does not match element count)
[R,0][…] or [0,C][…] → INVALID (declares zero total elements via a zero dimension)
Valid empty container:
[] → VALID (only the bracket form may represent an empty list/array)
For fixed-size containers, [] is invalid because the value must match the declared size.
Each row must:
Whitespace-only rows are considered blank lines (see Blank Lines) and are ignored.
Invalid rows cause a parse error.
Data rows may span multiple physical lines when a newline appears after a comma and before the next value (i.e., between fields), and in this case the newline is treated as whitespace and does not terminate the row.
A newline inside a quoted string is always part of that string; it is allowed and preserved.
A newline appearing after a value (outside a quoted string) terminates the row. If the row terminates before all columns are present, the row is invalid.
Within a data row (trailing comma pending):
( … ) line comments may appear between continuation lines; they are skipped using the normal line-level classification rule and relate to the next field (see Comment Scope).
# … comments are not permitted within data rows, because # is a section-break marker and would be misleading mid-row.
Between data rows (no trailing comma pending):
A ( … ) line relates to the next row as a whole, not to the first field.
A comment on the first field must be inline: (comment) value, …
Metadata syntax (( … )) is recognised in the same structural positions as comments where applicable, but in v1.0 only the version declaration, ((SuperCSV v1.0)), has defined metadata meaning. Any other metadata block in a data row is recognised syntax, but has no defined meaning.
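A simplified Python sketch of how physical lines could be grouped into logical data rows under these rules. It assumes that quoted strings and container literals do not span lines (those cases need a real tokenizer) and that comments are discarded, since they are not part of the decoded data model. logical_rows is a hypothetical name; a standalone ( … ) or (( … )) line is recognised with a regular expression that forbids ( and ) inside the block.

```python
import re

# A standalone ( … ) or (( … )) line; neither block may contain ( or ).
BLOCK_LINE = re.compile(r"\(\([^()]*\)\)|\([^()]*\)")


def logical_rows(lines: list) -> list:
    """Group physical data lines (after the header) into logical rows."""
    rows, pending = [], None  # pending holds a row whose last line ended with ","
    for line in lines:
        s = line.strip(" \t")
        if not s or BLOCK_LINE.fullmatch(s):
            continue  # blank lines and ( … ) / (( … )) lines never change structure
        if s.startswith("#"):
            if pending is not None:
                raise ValueError("# comments are not permitted within a data row")
            continue  # a # line between rows is fine
        pending = s if pending is None else pending + s
        if not s.endswith(","):
            rows.append(pending)  # a newline after a value terminates the row
            pending = None
    if pending is not None:
        raise ValueError("input ended while a continuation comma was pending")
    return rows
```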
# Multi-line row examples (valid)
Name:string, Age:int
Bob, 35
# trailing comma continuation example
Dan,
43 (completes row)
# inline metadata syntax example (metadata has no v1.0 meaning)
Mark,56 (comment) (( meta ))
Equivalent to:
Name:string, Age:int
Bob,35
Dan,43
Mark,56
# Multi-line row examples (structurally valid with interleaved lines)
Name:string, Age:int
Bob, 35
# standalone comment ok, previous blank line ok and ignored
Dan,
(comment line between continuation lines relates to the next field)
(( metadata syntax between continuation lines; no v1.0 meaning ))
43
# standalone metadata syntax after terminating line - allowed structurally
(( standalone metadata syntax after terminating line ))
# Multi-line row examples (invalid)
Name:string, Age:int
John (invalid - newline after value and before continuation comma)
, 35 (not reached)
# recognised metadata syntax on Bob line; still no v1.0 meaning
Bob (( meta tbc )),
(( recognised metadata syntax here, but invalid because no data follows the continuation ))
# Comment scope examples plus metadata-syntax placement examples
Name:string, Age:int, Score:float
# row-level comment (no trailing comma pending — relates to next row)
(( row-level metadata syntax uses the same placement rule, but has no v1.0 meaning ))
Alice,
(field comment relates to Age) 30,
(( field metadata syntax )) 95.5
# first-field comment must be inline
(field comment) Bob, 25, 88.0
# blank line mid-continuation does not change field association
Carol,
(relates to Age despite blank line below)
40,
99.9
# Comment and metadata scope examples (invalid structural cases)
Name:string, Age:int
# invalid — # comment mid-data-row
Alice,
# not permitted mid-row — only ( … ) comments are allowed within data rows
30
# invalid — two comments on same field
Bob,
(first comment)
(second comment) 25
# invalid — two metadata blocks on the same field position
Carol,
(( meta1 ))
(( meta2 )) 35
Container values (list<T> and arr<T>) may span multiple physical lines for readability. A newline is permitted at specific positions within the container literal and is treated as whitespace — it does not terminate the row.
A newline appearing outside a permitted position within a container terminates the row. If the row terminates before the container is closed, the value is invalid.
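A newline after a , inside the container, or immediately before a closing ], continues the same container value.
A newline after [ continues the same container value.
A newline after the ] of an inner array (i.e. when still inside the outer container) continues the same container value.
A newline after the outermost ] terminates the row normally.
# Multi-line container examples (valid)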
Name:string, Tags:list<string>, Matrix:arr<int>
Alice,
[
work,
urgent
],
[[1,2],[3,4]]
Bob, [a,b,c], [
[1,2],
[3,4]
]
# Multi-line container examples (invalid)
Name:string, Tags:list<string>
Alice, [work
,urgent] (invalid - newline after value, not after comma)
Bob, [
# comment not allowed inside container
work, urgent
]
Validators MUST emit each error as a three-field SuperCSV row:
Line:int, ErrorSection:string, ErrorMsg:string
Line: physical CSV line number (1-based), including blank and comment lines.
ErrorSection: identifies the header, row, or column and, if applicable, the position inside its value.
Valid forms (v1.0):
Price: scalar column
Tags(4): 4th element of a list or 1D array
Matrix(2,3): element at row 2, column 3 of a 2D array
headerErr: header-level error (literal keyword)
rowErr: row-level error (literal keyword)
Rules:
ErrorMsg: human-readable message derived from the error code’s template.
8, Price, "invalid int value: 'abc'"
14, Tags(4), "invalid enum label: 'blueish'"
19, Scores(1), "expected 3 elements, got 2"
12, Matrix(2,3), "invalid int value: '/'"
3, Price, "int values must not be quoted"
5, Tags, "container values must not be quoted"
8, Matrix, "expected shape [3,3], got [3,2]"
9, rowErr, "expected 5 columns, got 6"
2, headerErr, "invalid identifier: ' Name'"
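A Python sketch of building ErrorSection values and error rows in the forms above. The helper names are hypothetical. Because this section does not restate how a double quote inside a quoted string is escaped, the sketch rejects messages that contain one rather than guessing.

```python
def error_section(column: str, *position: int) -> str:
    """Build Price, Tags(4), or Matrix(2,3); positions are 1-based."""
    if not position:
        return column
    return f"{column}({','.join(str(p) for p in position)})"


def error_row(line: int, section: str, message: str) -> str:
    """Format one Line, ErrorSection, ErrorMsg row with a quoted message."""
    if '"' in message:
        raise ValueError("escaping of embedded double quotes is not handled in this sketch")
    return f'{line}, {section}, "{message}"'


print(error_row(8, error_section("Price"), "invalid int value: 'abc'"))
print(error_row(12, error_section("Matrix", 2, 3), "invalid int value: '/'"))
print(error_row(9, "rowErr", "expected 5 columns, got 6"))
```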
A complete SuperCSV file with typed headers and two data rows:
((SuperCSV v1.0))
Name:string, Score:int, Flags:list<bool>, Matrix:arr<int>, Level:enum<0=low,1=medium,2=high>
Ras, 42, [true,false,true], [[1,2],[3,4]], medium
Alex, 29, [1,0,1], [[5,6],[7,8]], 2
Additional examples covering all types and features are in the examples/ directory.
To be completed in a future editorial pass.
The formal grammar for SuperCSV v1.0 is defined in:
grammar.ebnf
This file provides a machine-readable EBNF definition of the SuperCSV syntax and serves as a cross-check for parser implementations.
All built-in types are defined in:
SuperCsvTypeTable v1.0
This file is the canonical, versioned registry of types.