Package dev.toonformat.jtoon


JToon - Token-Oriented Object Notation encoder for Java.

Overview

This package provides a Java implementation of the TOON (Token-Oriented Object Notation) format, a compact, human-readable data format optimized for Large Language Model (LLM) contexts. TOON typically uses 30-60% fewer tokens than the equivalent JSON while remaining readable and structured.

Core Components

Public API

  • JToon - Main entry point for encoding Java objects to TOON format and decoding TOON back to Java objects
  • EncodeOptions - Configuration options for encoding (indent, delimiter, length marker)
  • Delimiter - Enum for array/tabular delimiter options (comma, tab, pipe)

Usage Examples

Basic Encoding

import dev.toonformat.jtoon.JToon;
import java.util.*;

record User(int id, String name, boolean active) {}

User user = new User(123, "Ada", true);
String jtoon = JToon.encode(user);
// Output:
// id: 123
// name: Ada
// active: true

Tabular Arrays

import dev.toonformat.jtoon.JToon;
import java.util.List;

record Item(String sku, int qty, double price) {}

record Order(List<Item> items) {}

Order order = new Order(List.of(
        new Item("A1", 2, 9.99),
        new Item("B2", 1, 14.5)));

String jtoon = JToon.encode(order);
// Output (rows are indented one level under the header):
// items[2]{sku,qty,price}:
//   A1,2,9.99
//   B2,1,14.5

Custom Options

import dev.toonformat.jtoon.*;

// Use tab delimiters and length markers (indent 2, tab delimiter, length marker on).
// `data` stands for any object you want to encode.
EncodeOptions options = new EncodeOptions(2, Delimiter.TAB, true);
String jtoon = JToon.encode(data, options);

// Or use the builder-style static factory methods, one option at a time
EncodeOptions opts1 = EncodeOptions.withIndent(4);
EncodeOptions opts2 = EncodeOptions.withDelimiter(Delimiter.PIPE);
EncodeOptions opts3 = EncodeOptions.withLengthMarker(true);

Type Conversions

The library automatically normalizes Java-specific types for LLM-safe output:

  • Numbers: Finite values in decimal form; NaN/Infinity → null; -0 → 0
  • BigInteger: Converted to Long if within range, otherwise string
  • BigDecimal: Preserved as decimal number
  • Temporal types: Converted to ISO-8601 strings (LocalDateTime, Instant, etc.)
  • Optional: Unwrapped to value or null
  • Stream: Materialized to array
  • Collections: Converted to arrays
  • Maps: Converted to objects with string keys

Format Features

Indentation-Based Structure

Uses YAML-like indentation (default 2 spaces) for nested objects.
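
As a rough illustration of indentation-based nesting, the sketch below writes a nested Map using a configurable indent size. IndentSketch is a hypothetical class, it handles only maps and scalar values, and it skips quoting entirely; it is not how JToon is implemented.

```java
import java.util.Map;

// Hypothetical illustration of indentation-based nesting; not the JToon encoder.
final class IndentSketch {
    private IndentSketch() {}

    /** Encodes a map of scalars and nested maps, one "key: value" per line. */
    static String encode(Map<String, Object> object, int indentSize) {
        StringBuilder out = new StringBuilder();
        write(object, 0, indentSize, out);
        return out.toString().stripTrailing();
    }

    private static void write(Map<String, Object> object, int depth, int indentSize, StringBuilder out) {
        String pad = " ".repeat(depth * indentSize);
        for (Map.Entry<String, Object> e : object.entrySet()) {
            if (e.getValue() instanceof Map<?, ?> nested) {
                // A nested object gets a bare "key:" line, then its fields one level deeper
                out.append(pad).append(e.getKey()).append(":\n");
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) nested;
                write(child, depth + 1, indentSize, out);
            } else {
                out.append(pad).append(e.getKey()).append(": ").append(e.getValue()).append('\n');
            }
        }
    }
}
```

With an insertion-ordered map holding id=123 and a nested address map with city=London, encode(map, 2) produces the lines "id: 123", "address:", and "  city: London".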

Tabular Arrays

Arrays of uniform objects with primitive values are encoded in CSV-like tabular format, declaring field names once in the header and then streaming rows.
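
One way to picture the uniformity check and the resulting layout is the sketch below. TabularSketch is a hypothetical class operating on plain maps; it assumes rows are indented by two spaces under the header and that no value needs quoting. The library's actual detection logic may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of tabular detection and encoding; not the JToon encoder.
final class TabularSketch {
    private TabularSketch() {}

    /** True when every row has the same keys and only primitive values: O(rows x fields). */
    static boolean isTabular(List<Map<String, Object>> rows) {
        if (rows.isEmpty()) {
            return false;
        }
        Set<String> header = rows.get(0).keySet();
        for (Map<String, Object> row : rows) {
            if (!row.keySet().equals(header)) {
                return false;
            }
            for (Object v : row.values()) {
                if (!isPrimitive(v)) {
                    return false;
                }
            }
        }
        return true;
    }

    static boolean isPrimitive(Object v) {
        return v == null || v instanceof String || v instanceof Number || v instanceof Boolean;
    }

    /** Writes the header once, then one comma-separated row per object. Assumes isTabular(rows). */
    static String encode(String key, List<Map<String, Object>> rows) {
        List<String> fields = List.copyOf(rows.get(0).keySet());
        StringBuilder out = new StringBuilder();
        out.append(key).append('[').append(rows.size()).append("]{")
           .append(String.join(",", fields)).append("}:");
        for (Map<String, Object> row : rows) {
            out.append("\n  ");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    out.append(',');
                }
                out.append(row.get(fields.get(i)));
            }
        }
        return out.toString();
    }
}
```

Field order follows the first row's key order, so insertion-ordered maps such as LinkedHashMap give a predictable header.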

Smart Quoting

Strings are quoted only when necessary (contains delimiters, special characters, looks like keywords/numbers, etc.). This minimizes token usage.
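
A quoting decision along these lines might look like the sketch below. QuotingSketch and the specific rule set (empty strings, surrounding whitespace, the keywords true/false/null, number-like text, the active delimiter, and a handful of structural characters) are assumptions chosen for illustration; the exact rules JToon applies may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of quote-only-when-necessary; not the JToon rule set.
final class QuotingSketch {
    // Patterns are compiled once and reused across calls
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+(\\.\\d+)?([eE][+-]?\\d+)?");
    private static final Pattern SPECIAL = Pattern.compile("[:\"\\\\\\[\\]{}\\n\\r\\t]");

    private QuotingSketch() {}

    static boolean needsQuotes(String s, char delimiter) {
        if (s.isEmpty()) return true;                          // empty string would vanish
        if (!s.equals(s.strip())) return true;                 // leading/trailing whitespace
        if (s.equals("true") || s.equals("false") || s.equals("null")) return true;
        if (NUMERIC.matcher(s).matches()) return true;         // would be read back as a number
        if (s.indexOf(delimiter) >= 0) return true;            // contains the active delimiter
        return SPECIAL.matcher(s).find();                      // structural or control characters
    }

    static String quoteIfNeeded(String s, char delimiter) {
        if (!needsQuotes(s, delimiter)) {
            return s;
        }
        String escaped = s.replace("\\", "\\\\")
                          .replace("\"", "\\\"")
                          .replace("\n", "\\n")
                          .replace("\r", "\\r")
                          .replace("\t", "\\t");
        return "\"" + escaped + "\"";
    }
}
```

Note that "a,b" needs quotes under the comma delimiter but not under the pipe delimiter, which is why the delimiter is a parameter here.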

Delimiter Options

Arrays and tabular rows support three delimiters:

  • Comma (default): Implicit in headers - items[3]: a,b,c
  • Pipe: Explicit in headers - items[3|]: a|b|c
  • Tab: Explicit in headers - items[3\t]: a\tb\tc

Length Markers

An optional # prefix on array lengths makes it explicit that the number is a count rather than an index: items[#3] instead of items[3].
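
The header forms shown for delimiters and length markers can be reproduced with a short sketch. HeaderSketch is a hypothetical class that handles inline arrays of already-safe strings. Placing # before the count and the delimiter after it when both are enabled is an assumption here, since the text above only shows each feature on its own.

```java
import java.util.List;

// Hypothetical illustration of array headers; not the JToon encoder.
final class HeaderSketch {
    private HeaderSketch() {}

    /** Formats key[N]: v1,v2,... with an optional # marker and a non-comma delimiter in the header. */
    static String inlineArray(String key, List<String> values, char delimiter, boolean lengthMarker) {
        StringBuilder out = new StringBuilder(key).append('[');
        if (lengthMarker) {
            out.append('#');
        }
        out.append(values.size());
        if (delimiter != ',') {
            // Comma is the implicit default; pipe and tab are spelled out in the header
            out.append(delimiter);
        }
        out.append("]: ").append(String.join(String.valueOf(delimiter), values));
        return out.toString();
    }
}
```

Called with the values a, b, c, this yields items[3]: a,b,c for the comma delimiter, items[3|]: a|b|c for pipe, and items[#3]: a,b,c when the length marker is on.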

Architecture

Encoding Pipeline

  1. Normalization: JsonNormalizer converts Java objects to JsonNode
  2. Encoding: ValueEncoder recursively encodes JsonNode to TOON
  3. Output: LineWriter accumulates formatted lines
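
To make the output stage concrete, here is a minimal sketch of a line accumulator in the spirit of step 3. LineWriterSketch and its push method are hypothetical; the real LineWriter's API is not shown in this document and may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the output stage; not the JToon LineWriter.
final class LineWriterSketch {
    private final List<String> lines = new ArrayList<>();
    private final String indentUnit;

    LineWriterSketch(int indentSize) {
        this.indentUnit = " ".repeat(indentSize);
    }

    /** Adds one line at the given nesting depth. */
    void push(int depth, String content) {
        lines.add(indentUnit.repeat(depth) + content);
    }

    /** Joins the accumulated lines with newlines and no trailing newline. */
    @Override
    public String toString() {
        return String.join("\n", lines);
    }
}
```

An encoder walking a normalized tree would call push once per emitted line, passing the current depth, and read the final document from toString().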

Design Principles

  • Single Responsibility: Each class has one clear purpose
  • Immutability: Configuration objects are immutable records
  • Utility Classes: Static utility classes with private constructors
  • Modern Java: Leverages Java 17 features (records, switch expressions)
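
The sketch below shows how these principles can combine in Java 17: an immutable record with static factories, an enum with a switch expression, and a static utility class with a private constructor. All names (OptionsSketch, DelimiterSketch, TextUtilSketch) are hypothetical and only mirror the style described above, not the actual JToon classes.

```java
// Hypothetical illustrations of the design principles; not the JToon classes.
enum DelimiterSketch {
    COMMA, TAB, PIPE;

    /** Switch expression: exhaustive over the enum, so no default branch is needed. */
    char symbol() {
        return switch (this) {
            case COMMA -> ',';
            case TAB -> '\t';
            case PIPE -> '|';
        };
    }
}

/** Immutable configuration: record components are final and have no setters. */
record OptionsSketch(int indent, DelimiterSketch delimiter, boolean lengthMarker) {
    static final OptionsSketch DEFAULT = new OptionsSketch(2, DelimiterSketch.COMMA, false);

    // Compact constructor validates input once, at creation time
    OptionsSketch {
        if (indent < 0) {
            throw new IllegalArgumentException("indent must be >= 0");
        }
    }

    /** Static factory: defaults everywhere except the indent size. */
    static OptionsSketch withIndent(int indent) {
        return new OptionsSketch(indent, DEFAULT.delimiter(), DEFAULT.lengthMarker());
    }
}

/** Static utility class: private constructor prevents instantiation. */
final class TextUtilSketch {
    private TextUtilSketch() {
        throw new AssertionError("no instances");
    }

    static String pad(int depth, int indentSize) {
        return " ".repeat(depth * indentSize);
    }
}
```

Because records implement equals by component, OptionsSketch.withIndent(4) equals new OptionsSketch(4, DelimiterSketch.COMMA, false).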

Performance Considerations

  • Tabular format detection is O(n×m) where n = rows, m = fields
  • String validation uses precompiled regex patterns
  • StringBuilder for efficient string concatenation
  • No hand-written reflection in hot paths (object introspection is delegated to Jackson's ObjectMapper)

Thread Safety

All public API methods are thread-safe. Internal classes are stateless utility classes or immutable configuration objects. The Jackson ObjectMapper instance is shared and thread-safe.
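
The practical consequence is that a stateless encode call can be shared across worker threads without locking. The sketch below demonstrates that pattern with a hypothetical stand-in method, ConcurrencySketch.encode, rather than the real JToon.encode, so that it runs without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of calling a stateless encoder from several threads.
final class ConcurrencySketch {
    private ConcurrencySketch() {}

    /** Stateless stand-in for an encode call: reads only its arguments. */
    static String encode(int id, String name) {
        return "id: " + id + "\nname: " + name;
    }

    /** Encodes `count` records on a small thread pool; results keep task order. */
    static List<String> encodeAll(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                tasks.add(() -> encode(id, "user" + id));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("encoding failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

No synchronization is needed because each call works only on its own arguments and local variables.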

Since: 0.1.0

Class          Description
DecodeOptions  Configuration options for decoding TOON format to Java objects.
Delimiter      Delimiter options for tabular array rows and inline primitive arrays.
EncodeOptions  Configuration options for encoding data to JToon format.
JToon          Main entry point for encoding and decoding TOON (Token-Oriented Object Notation) format.
PathExpansion  Path expansion mode for decoding dotted keys.