Generate Apache Avro schema for Kafka and Hadoop pipelines. This tool runs entirely in your browser — no data is ever sent to a server. Free to use, no account required.
Avro is a compact, fast binary serialization format that stores data alongside its schema — ideal for big data and streaming pipelines.
An Avro schema is written in JSON and defines the structure: the record name, a namespace, and a list of fields, each with a name and a type. The schema is required both to write and to read Avro binary data.
Avro binary data is 3–10x smaller than equivalent JSON because field names are stored in the schema rather than repeated in every record. It is also significantly faster to encode and decode, making it the preferred format for Kafka and Spark pipelines.
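A rough illustration of why schema-based encoding is smaller, using only Python's standard library. This mimics the idea (values only, field names in the schema) with `struct`; it is not the actual Avro wire format, which uses variable-length zig-zag integers and is more compact still.

```python
import json
import struct

# Three sample records as a JSON array: field names repeat in every record
records = [{"userId": i, "active": True} for i in range(3)]
as_json = json.dumps(records).encode()

# Schema-driven encoding: field names live in the schema, so each record
# is just its values (here a 4-byte int and a 1-byte bool per record)
as_binary = b"".join(struct.pack("<i?", r["userId"], r["active"]) for r in records)

print(len(as_json), len(as_binary))  # the binary form is several times smaller
```

The gap widens with more records and longer field names, since the JSON form pays for every name in every record while the binary form pays for it once.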
The tool generates a complete Avro schema from your JSON sample, handling type mapping and nullable field detection automatically.
JSON types map to Avro types: string → string, integer → int or long, decimal → float or double, boolean → boolean, object → record, array → array of the element type. The tool chooses the most appropriate Avro type based on actual values.
In JSON every field is implicitly nullable. In Avro, nullable fields require a union type: ["null", "string"]. The tool automatically generates union types for any field that contains a null value in your sample data.
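One way this detection can be sketched: scan all sample values for a field, infer a base Avro type, and wrap it in a union with "null" if any value is missing. The helper below is a hypothetical illustration of that logic, not the tool's actual code (it maps every whole number to "long" for simplicity).

```python
def infer_avro_type(values):
    """Infer an Avro type for one field from its sample values.

    Returns a ["null", ...] union if any sample value is None.
    """
    base = {str: "string", int: "long", float: "double"}
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "null"
    first = non_null[0]
    # bool must be checked before int: bool is a subclass of int in Python
    avro_type = "boolean" if isinstance(first, bool) else base[type(first)]
    if len(non_null) < len(values):
        return ["null", avro_type]  # nullable field -> union type
    return avro_type

print(infer_avro_type(["alice@example.com", None]))  # ['null', 'string']
print(infer_avro_type([1, 2, 3]))                    # 'long'
```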
Avro schemas also enable schema evolution: fields can be added or removed over time while readers and writers remain backward compatible.
// JSON Sample
{"userId": 1, "name": "Alice", "email": "alice@example.com", "active": true}
// Avro Schema
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "userId", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
| JSON Type | Avro Type | Notes |
|---|---|---|
| string | "string" | |
| integer | "int" or "long" | int=32-bit, long=64-bit |
| float | "float" or "double" | |
| boolean | "boolean" | |
| null | "null" | Usually in union: ["null","string"] |
| nullable field | ["null", "string"] | Union with null for optional |
| array | {"type":"array","items":"string"} | |
| object | {"type":"record",...} | Nested record definition |
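The object and array rows generalize recursively: a nested object becomes an inline record, and an array takes its item type from its elements. A minimal, hypothetical sketch of that recursion (the derived record names and the assumption of homogeneous arrays are illustrative, not the tool's exact behavior):

```python
def to_avro_schema(value, name="Root"):
    """Map a JSON-decoded Python value to an Avro schema fragment (sketch)."""
    if isinstance(value, bool):          # check bool before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # assume a homogeneous array; take the item type from the first element
        return {"type": "array", "items": to_avro_schema(value[0], name)}
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": k, "type": to_avro_schema(v, name + "_" + k)}
                for k, v in value.items()
            ],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")

schema = to_avro_schema({"user": {"id": 1, "tags": ["a", "b"]}})
```

Here `schema` is a record whose single `user` field is itself a nested record containing a `long` and an array of strings.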
# pip install fastavro
import json
from io import BytesIO

import fastavro

# Parse the schema generated by the tool (saved as user.avsc)
with open('user.avsc') as f:
    schema = fastavro.parse_schema(json.load(f))

records = [{"userId": 1, "name": "Alice", "email": "a@b.com", "active": True}]

buf = BytesIO()
fastavro.writer(buf, schema, records)  # encode records as Avro binary
buf.seek(0)
for record in fastavro.reader(buf):    # decode them back
    print(record)
Avro has a richer type system than JSON. The converter maps each JSON primitive type to the appropriate Avro type, and handles nullable fields using Avro unions.
| JSON Type | Avro Type | Avro Schema |
|---|---|---|
| string | string | {"type":"string"} |
| integer | int or long | {"type":"int"} |
| float/double | float or double | {"type":"double"} |
| boolean | boolean | {"type":"boolean"} |
| null | null | {"type":"null"} |
| object | record | {"type":"record","fields":[...]} |
| array | array | {"type":"array","items":...} |
| null or string | union | ["null","string"] |
// Input JSON
{"name": "Alice", "age": 30, "email": "alice@example.com", "active": true}
// Generated Avro Schema
{
  "type": "record",
  "name": "Root",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
Avro is the standard serialization format for Kafka with Schema Registry. The schema is registered once and referenced by ID in every message, keeping messages compact while maintaining full type safety.
# pip install confluent-kafka[avro]
from confluent_kafka.avro import AvroProducer
from confluent_kafka import avro

schema_str = open('user.avsc').read()
value_schema = avro.loads(schema_str)

producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081'
}, default_value_schema=value_schema)

producer.produce(
    topic='users',
    value={"name": "Alice", "age": 30, "email": "alice@example.com", "active": True}
)
producer.flush()