Generate Apache Avro schema for Kafka and Hadoop pipelines. This tool runs entirely in your browser — no data is ever sent to a server. Free to use, no account required.
Avro is a compact, fast binary serialization format that stores data alongside its schema — ideal for big data and streaming pipelines.
An Avro schema is written in JSON and defines the structure: the record name, a namespace, and a list of fields, each with a name and a type. The schema is required both to write and to read Avro binary data.
Avro binary data is 3–10x smaller than equivalent JSON because field names are stored in the schema rather than repeated in every record. It is also significantly faster to encode and decode, making it the preferred format for Kafka and Spark pipelines.
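A rough illustration of why schema-based encoding is smaller, using only Python's standard library. This mimics the idea (values only, field names in the schema) with `struct`; it is not the actual Avro wire format, which uses variable-length zig-zag integers and is more compact still.

```python
import json
import struct

# Three sample records as a JSON array: field names repeat in every record
records = [{"userId": i, "active": True} for i in range(3)]
as_json = json.dumps(records).encode()

# Schema-driven encoding: field names live in the schema, so each record
# is just its values (here a 4-byte int and a 1-byte bool per record)
as_binary = b"".join(struct.pack("<i?", r["userId"], r["active"]) for r in records)

print(len(as_json), len(as_binary))  # the binary form is several times smaller
```

The gap widens with more records and longer field names, since the JSON form pays for every name in every record while the binary form pays for it once.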
The tool generates a complete Avro schema from your JSON sample, handling type mapping and nullable field detection automatically.
JSON types map to Avro types: string → string, integer → int or long, decimal → float or double, boolean → boolean, object → record, array → array of the element type. The tool chooses the most appropriate Avro type based on actual values.
In JSON every field is implicitly nullable. In Avro, nullable fields require a union type: ["null", "string"]. The tool automatically generates union types for any field that contains a null value in your sample data.
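One way this detection can be sketched: scan all sample values for a field, infer a base Avro type, and wrap it in a union with "null" if any value is missing. The helper below is a hypothetical illustration of that logic, not the tool's actual code (it maps every whole number to "long" for simplicity).

```python
def infer_avro_type(values):
    """Infer an Avro type for one field from its sample values.

    Returns a ["null", ...] union if any sample value is None.
    """
    base = {str: "string", int: "long", float: "double"}
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "null"
    first = non_null[0]
    # bool must be checked before int: bool is a subclass of int in Python
    avro_type = "boolean" if isinstance(first, bool) else base[type(first)]
    if len(non_null) < len(values):
        return ["null", avro_type]  # nullable field -> union type
    return avro_type

print(infer_avro_type(["alice@example.com", None]))  # ['null', 'string']
print(infer_avro_type([1, 2, 3]))                    # 'long'
```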
Avro schemas also enable schema evolution: fields can be added or removed over time while readers and writers remain backward compatible.
// JSON Sample
{"userId": 1, "name": "Alice", "email": "alice@example.com", "active": true}
// Avro Schema
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "userId", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
| JSON Type | Avro Type | Notes |
|---|---|---|
| string | "string" | |
| integer | "int" or "long" | int=32-bit, long=64-bit |
| float | "float" or "double" | |
| boolean | "boolean" | |
| null | "null" | Usually in union: ["null","string"] |
| nullable field | ["null", "string"] | Union with null for optional |
| array | {"type":"array","items":"string"} | |
| object | {"type":"record",...} | Nested record definition |
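The object and array rows generalize recursively: a nested object becomes an inline record, and an array takes its item type from its elements. A minimal, hypothetical sketch of that recursion (the derived record names and the assumption of homogeneous arrays are illustrative, not the tool's exact behavior):

```python
def to_avro_schema(value, name="Root"):
    """Map a JSON-decoded Python value to an Avro schema fragment (sketch)."""
    if isinstance(value, bool):          # check bool before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # assume a homogeneous array; take the item type from the first element
        return {"type": "array", "items": to_avro_schema(value[0], name)}
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": k, "type": to_avro_schema(v, name + "_" + k)}
                for k, v in value.items()
            ],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")

schema = to_avro_schema({"user": {"id": 1, "tags": ["a", "b"]}})
```

Here `schema` is a record whose single `user` field is itself a nested record containing a `long` and an array of strings.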
# pip install fastavro
import json
from io import BytesIO

import fastavro

# Parse the schema generated by the tool (saved as user.avsc)
with open('user.avsc') as f:
    schema = fastavro.parse_schema(json.load(f))

records = [{"userId": 1, "name": "Alice", "email": "a@b.com", "active": True}]

buf = BytesIO()
fastavro.writer(buf, schema, records)  # encode records as Avro binary
buf.seek(0)
for record in fastavro.reader(buf):    # decode them back
    print(record)
Avro has a richer type system than JSON. The converter maps each JSON primitive type to the appropriate Avro type, and handles nullable fields using Avro unions.
| JSON Type | Avro Type | Avro Schema |
|---|---|---|
| string | string | {"type":"string"} |
| integer | int or long | {"type":"int"} |
| float/double | float or double | {"type":"double"} |
| boolean | boolean | {"type":"boolean"} |
| null | null | {"type":"null"} |
| object | record | {"type":"record","fields":[...]} |
| array | array | {"type":"array","items":...} |
| null or string | union | ["null","string"] |
// Input JSON
{"name": "Alice", "age": 30, "email": "alice@example.com", "active": true}
// Generated Avro Schema
{
  "type": "record",
  "name": "Root",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
Avro is the standard serialization format for Kafka with Schema Registry. The schema is registered once and referenced by ID in every message, keeping messages compact while maintaining full type safety.
# pip install confluent-kafka[avro]
from confluent_kafka.avro import AvroProducer
from confluent_kafka import avro

schema_str = open('user.avsc').read()
value_schema = avro.loads(schema_str)

producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081'
}, default_value_schema=value_schema)

producer.produce(
    topic='users',
    value={"name": "Alice", "age": 30, "email": "alice@example.com", "active": True}
)
producer.flush()