QR structure, fully mapped

Learn: QR Code Structure Guide

This page maps the QR symbol in detail, where symbol means the complete square QR code as a whole: not just the names of QR regions, but the exact placement logic, bit layouts, redundancy rules, and traversal behavior that make the symbol decodable in the real world.


Start Here: What a QR Code Actually Is

This guide assumes zero background. A QR code is just a square grid of tiny dark and light cells called modules. Some modules are fixed landmarks that help a camera orient itself. The remaining modules store a message and the redundancy needed to recover that message if part of the symbol is damaged.

A QR code is a grid: the smallest standard QR symbol is version 1, which is 21 by 21 modules. Larger versions add 4 modules in width and height each time.
Some parts never hold your message: finder, timing, alignment, format, and sometimes version information are structural patterns. Collectively these reserved structural regions are called function patterns: fixed parts of the symbol that help scanners locate and decode it, rather than carrying normal message data.
Your message becomes bits: the text or scanner-style payload is converted into bytes, grouped into codewords, protected with Reed-Solomon error correction, and then placed into the open modules.
The final pattern is intentionally scrambled: a mask pattern is one of eight fixed rules that flips selected data modules. Masking means applying that rule so the symbol avoids misleading stripes or blocks that are harder for scanners to read.

One-sentence mental model: a QR code is a carefully structured square where fixed landmarks help the camera, while the open space holds your message plus enough redundancy to survive damage.

Module: a single dark or light square in the grid.
Payload: the actual message string the QR code is meant to carry.
Version: the symbol size. Version 1 is 21x21, version 40 is 177x177.
Codeword: an 8-bit byte-sized chunk in the encoded QR bitstream.
ECC: error correction codewords that let scanners recover missing or damaged data.
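The version-to-size rule above (21 modules at version 1, plus 4 per version) can be written as a one-line formula. This is a minimal sketch; the function name `symbol_size` is chosen here for illustration only.

```python
def symbol_size(version):
    """Side length in modules for a QR version (1 through 40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    # Version 1 is 21 modules wide; each later version adds 4.
    return 17 + 4 * version


print(symbol_size(1), symbol_size(40))  # 21 177
```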

A Tiny Example Before the Heavy Details

Keep this beginner example in mind as you read the rest of the page. Every advanced section is explaining one part of this same journey. In QR terms, a mode is the rule that tells the decoder how to interpret the payload, and the mode indicator is the small field that stores that rule at the start of the data stream. UTF-8 is the text-to-bytes encoding rule that turns text into byte values, and byte mode is the QR mode for general byte data, which is why QR Peach uses it for normal text payloads.

Input: HELLO
Payload: HELLO
Bytes: UTF-8 gives 48 45 4C 4C 4F
Header: byte mode adds 0100 + 00000101
Result: fixed patterns + data codewords + ECC codewords + mask.

HELLO in a small byte-mode QR example

mode indicator      0100
byte count (5)      00000101
H                   01001000
E                   01000101
L                   01001100
L                   01001100
O                   01001111

...then terminator bits, pad bits, pad codewords, error correction, and placement into modules.
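The listing above can be reproduced with a few lines of Python. This is a sketch that assumes a symbol in versions 1 through 9, where the byte-mode count field is 8 bits; the helper name `byte_mode_fields` is hypothetical.

```python
def byte_mode_fields(text):
    """Return the byte-mode bit fields for a short payload (versions 1-9)."""
    payload = text.encode("utf-8")
    if len(payload) > 255:
        raise ValueError("an 8-bit count field cannot describe this payload")
    fields = ["0100"]                                # byte-mode indicator
    fields.append(format(len(payload), "08b"))       # byte count
    fields += [format(b, "08b") for b in payload]    # one field per byte
    return fields


for field in byte_mode_fields("HELLO"):
    print(field)
```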

Payload Types: Start with Plain Text, Then Add Scanner Conventions

Beginners often assume a QR code has separate low-level structures for links, email, Wi-Fi, and contacts. Usually it does not. Most of the time, the QR symbol still stores a plain byte stream. The scanner decides how to act based on the string pattern inside that stream.

Plain text: a payload like HELLO WORLD has no special prefix, so readers simply show the decoded text.
Generic URI: a payload like sip:ada@example.com still uses byte encoding, but scanners can hand the scheme to an app that understands it.
Web link: a payload like https://example.com still uses byte encoding, but scanners recognize URI syntax and offer to open it.
Email or phone: prefixes like mailto: and tel: are just part of the payload text. They tell the scanner which app action to launch.
Wi-Fi, contact card, or event: structured strings such as WIFI:, MECARD:, BEGIN:VCARD, or BEGIN:VCALENDAR package multiple fields into one byte stream.

Generic URI vs URL: a URL is a web-focused URI such as https://example.com. Use URL when the payload is a normal web address. Use Generic URI when the payload uses another scheme such as sip:, otpauth:, bitcoin:, or an app-specific deep link.

Type	Envelope Example	How Scanners Interpret It
Text	HELLO WORLD	No special prefix. The decoded text is displayed directly.
Generic URI	sip:ada@example.com	Readers pass the scheme to an app that recognizes that URI type.
URL	https://example.com	Readers see URI syntax and offer to open the link.
Email	mailto:ada@example.com?subject=Hello	The mailto: prefix triggers compose-email behavior.
Phone	tel:+15551234567	The tel: prefix triggers dial behavior.
SMS	SMSTO:+15551234567:Hello	Readers treat the envelope as a pre-filled text message.
Contact	MECARD:N:Ada Lovelace;TEL:...;	Readers parse the structured contact payload and offer to save it.
vCard	BEGIN:VCARD\nVERSION:3.0\nFN:Ada Lovelace\n...	Readers parse the vCard record and offer to import the contact details.
Event	BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\nDTSTART;TZID=America/New_York:20260601T090000\n...	Readers parse the iCalendar event record and offer to add it to a calendar.
Wi-Fi	WIFI:T:WPA;S:Cafe WiFi;P:secret123;;	Readers recognize the Wi-Fi envelope and offer network setup.
Geo	geo:37.7749,-122.4194?q=Coffee	Readers pass the coordinates into a map flow.

Important distinction: these are payload conventions, not alternate low-level QR symbol structures. QR Peach serializes these envelopes into UTF-8 byte-mode payloads for visualization. Calendar payloads can use DTSTART;VALUE=DATE for all-day events or DTSTART;TZID=... for local timed events.
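To make the "conventions, not structures" point concrete, here is a rough sketch of how a reader might classify a decoded string by its prefix. The prefix list mirrors the table above; the function name and the fallback rule for generic URIs are assumptions for illustration, and real scanner apps apply their own, often stricter, rules.

```python
import re

# Prefix conventions taken from the table above; matching is case-insensitive.
PREFIXES = [
    ("WIFI:", "Wi-Fi"),
    ("MECARD:", "Contact"),
    ("BEGIN:VCARD", "vCard"),
    ("BEGIN:VCALENDAR", "Event"),
    ("SMSTO:", "SMS"),
    ("mailto:", "Email"),
    ("tel:", "Phone"),
    ("geo:", "Geo"),
    ("http://", "URL"),
    ("https://", "URL"),
]


def classify_payload(text):
    """Guess the payload type from the decoded string alone."""
    lowered = text.lower()
    for prefix, label in PREFIXES:
        if lowered.startswith(prefix.lower()):
            return label
    # Anything else that starts with "scheme:" is treated as a generic URI.
    # This is deliberately naive: plain text such as "note: hi" also matches.
    if re.match(r"[a-z][a-z0-9+.\-]*:", lowered):
        return "Generic URI"
    return "Text"


print(classify_payload("sip:ada@example.com"))
```

The symbol itself is identical in every case: only the decoded string differs.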

From Pixels to Modules: How a Decoder Starts

A module is one square cell in the QR grid: the smallest black-or-white unit the symbol is built from. So yes, visually it is one little square. In the data region, a module usually carries one written bit after masking, but many modules in a QR code are not payload bits at all. Finder patterns, timing patterns, alignment patterns, format information, version information, separators, and the dark module are all made of modules too, but they serve structural jobs instead of carrying message content.

Decoder Pipeline

Before a decoder can talk about payload bytes, ECC bytes, or format bits, it has to complete a more basic job: turn an image into a clean QR module grid. In practice that means decoding the image file, locating the symbol, estimating the grid geometry, sampling each module, reading the format and version metadata, undoing the mask, and only then reconstructing codewords.

That is why debugging rows about image dimensions, quiet zone, contrast, and module pitch are not cosmetic. They describe the evidence the decoder had available before payload parsing could even begin.

Source Bytes and Raster Size

A QR image starts as some encoded file bytes: PNG, JPEG, SVG, or another image format. The decoder first interprets that file and turns it into a raster image, meaning a rectangular pixel grid with a width and height in pixels. File byte length and pixel dimensions are related but not interchangeable. A tiny SVG can scale into a large crisp raster, while a large JPEG can still decode into a blurry image with weak module edges.

Contrast and Thresholding

After decoding the file, the scanner has to decide which pixels count as dark modules and which count as light background. That step depends on contrast. Thresholding is the step where the decoder draws that dark-versus-light boundary. If dark modules are not dark enough, or the background is not bright enough, thresholding becomes unstable and the recovered grid can drift or fragment before payload decoding even starts.
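As a rough illustration, the simplest possible thresholding rule picks one cutoff halfway between the darkest and brightest pixel. This is a sketch only: production decoders generally use local or adaptive thresholds, and the function name `threshold` and the grayscale-list input format are assumptions made here.

```python
def threshold(gray):
    """Map a grid of grayscale values (0-255) to True for dark pixels.

    Uses a single global cutoff midway between the extremes, which is
    only a reasonable approximation for evenly lit, high-contrast images.
    """
    flat = [pixel for row in gray for pixel in row]
    cutoff = (min(flat) + max(flat)) / 2
    return [[pixel < cutoff for pixel in row] for row in gray]


print(threshold([[10, 240], [200, 30]]))  # [[True, False], [False, True]]
```

When contrast is weak, the minimum and maximum sit close together and small noise moves pixels across the cutoff, which is the instability described above.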

Module Pitch

Module pitch is the estimated pixel width and height of one QR module in the recovered image. A healthy scan usually has enough pixels per module to distinguish dark and light cells cleanly. If pitch is too small, compression and resampling can blur neighboring modules together. If horizontal and vertical pitch differ too much, that often hints at distortion, cropping, or perspective issues.

Quiet Zone as a Scanning Requirement

The quiet zone is the blank margin around the QR symbol. Beginners often think of it as wasted space, but decoders use it to separate the symbol from nearby graphics, text, or borders. If the dark region nearly touches the image edge, the scanner may misread the symbol boundary or fail to detect the finder structure cleanly at all.

Next idea: before a scanner can read any of those payloads, it must first find the symbol, determine orientation, estimate the grid, and compensate for distortion. That is what the fixed landmark patterns are for.

Finder Patterns

Purpose

Finder patterns are three identical 7x7 concentric square patterns placed at the top-left, top-right, and bottom-left corners of every QR symbol. They allow any scanner to detect that a QR code is present, determine the symbol orientation, and establish the module-grid coordinate system before any data is read.

Why Three, Not Four?

Three finder patterns create a unique L-shaped arrangement. Any three squares in those positions unambiguously define position and orientation; the missing fourth corner tells the decoder which way is up. A fourth identical pattern would make the four corners indistinguishable under every 90-degree rotation, so orientation would become ambiguous.

Internal Structure

Each finder pattern is a 7x7 grid with three concentric layers.

Layer 1: 7x7 dark outer border
Layer 2: 5x5 light ring
Layer 3: 3x3 dark center block

The 1:1:3:1:1 Ratio Rule

Any horizontal or vertical scan line crossing a finder pattern produces dark:light:dark:light:dark module widths in the ratio 1:1:3:1:1. This ratio is what scanners search for, and it survives changes in scale, angle, and rotation.
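The layer structure and the ratio rule can be checked together. The sketch below builds the 7x7 pattern from the three layers and measures run lengths along its center row; the helper names and the 50% tolerance are illustrative assumptions, not values taken from the specification.

```python
def finder_pattern():
    """7x7 finder: dark border, light ring, dark 3x3 center (True = dark)."""
    # Chebyshev distance from the center is 3 on the border, 2 on the ring.
    return [[max(abs(r - 3), abs(c - 3)) != 2 for c in range(7)]
            for r in range(7)]


def run_lengths(line):
    """Lengths of consecutive same-colored runs along one scan line."""
    runs = []
    for cell in line:
        if runs and runs[-1][0] == cell:
            runs[-1][1] += 1
        else:
            runs.append([cell, 1])
    return [length for _, length in runs]


def looks_like_finder(runs, tolerance=0.5):
    """Check five runs (assumed dark-light-dark-light-dark) against 1:1:3:1:1."""
    if len(runs) != 5:
        return False
    unit = sum(runs) / 7  # estimated width of one module
    return all(abs(run - expected * unit) <= tolerance * unit
               for run, expected in zip(runs, (1, 1, 3, 1, 1)))


center_row = finder_pattern()[3]
print(run_lengths(center_row))            # [1, 1, 3, 1, 1]
print(looks_like_finder([4, 4, 12, 4, 4]))  # True: same ratio at 4 px per module
```

Because the test is a ratio, the same check passes whether a module is 1 pixel or 40 pixels wide.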

Separator

Each finder pattern is surrounded by a 1-module-wide white separator on the sides facing the interior of the symbol. This prevents the finder from visually merging with nearby data modules. Separators are always white regardless of masking.

Placement Coordinates

Pattern	Top-left Corner	Row Range	Column Range
Top-left	(0, 0)	0-6	0-6
Top-right	(0, size-7)	0-6	size-7 to size-1
Bottom-left	(size-7, 0)	size-7 to size-1	0-6

Quiet Zone

Beyond the required 4-module quiet zone outside the symbol boundary, the internal separators act as an internal quiet buffer that keeps the finder patterns distinguishable from nearby data modules.

Timing Patterns

Purpose

Two alternating dark/light strips, one horizontal and one vertical, let the scanner determine module-grid density and compensate for perspective distortion or skew in the captured image.

Placement

Horizontal timing: row 6, columns 8 through size-9.
Vertical timing: column 6, rows 8 through size-9.

Pattern

The first module of each timing strip is always dark. After that, the strip alternates dark, light, dark, light, and so on. The strip length is size-16 modules.

Why Row and Column 6?

Row 6 and column 6 are fixed across all 40 versions. Because they sit between the finder patterns, scanners know exactly where to look before they know anything version-specific about the symbol.

Length by Version

Version	Symbol Size	Timing Strip Length
1	21x21	5 modules
5	37x37	21 modules
10	57x57	41 modules
20	97x97	81 modules
40	177x177	161 modules
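The placement and alternation rules can be sketched in a couple of lines. The function name `timing_strip` is hypothetical; the indices follow the "columns 8 through size-9" rule given above.

```python
def timing_strip(size):
    """Timing modules for columns (or rows) 8 .. size-9; True = dark."""
    # Even coordinates are dark, so the strip starts and ends on a dark module.
    return [index % 2 == 0 for index in range(8, size - 8)]


print(len(timing_strip(21)), len(timing_strip(177)))  # 5 161
```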

Distortion Compensation

When a QR code is photographed at an angle, modules appear warped. By counting transitions along both timing strips, the decoder can estimate the perspective transform and realign the grid before reading payload data.

Alignment Patterns

Purpose

Alignment patterns are small 5x5 concentric patterns placed throughout the data region of larger QR codes. They provide extra reference points for correcting warping and local distortion across the full symbol.

Structure

Border: 5x5 dark outer ring
Inner Ring: 3x3 light ring
Center: 1x1 dark module

Version Dependency

Version 1: no alignment patterns.
Version 2 and above: at least one alignment pattern, with count increasing by version.
Maximum: Version 40 uses 46 alignment patterns.

Center Coordinate Lists

The specification defines a list of center coordinates for each version. All combinations of (row, col) from that list are used as alignment centers, except any that would overlap a finder pattern or its separator.

Version	Center List	Pattern Count
2	[6, 18]	1
7	[6, 22, 38]	6
14	[6, 26, 46, 66]	13
21	[6, 28, 50, 72, 94]	22
40	[6, 30, 58, 86, 114, 142, 170]	46

Overlap Rules

If a computed center would place a 5x5 alignment pattern inside a finder corner region, that center pair is skipped. That is why the number of patterns is always lower than the raw square of the center-list length.
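A short sketch of that rule: take every (row, col) pair from the center list and drop the three pairs that land on finder corners. The function name is illustrative, and the center lists themselves are assumed to come from the specification's per-version table.

```python
def alignment_centers(center_list):
    """All (row, col) alignment centers for one version's center list."""
    last = len(center_list) - 1
    centers = []
    for i, row in enumerate(center_list):
        for j, col in enumerate(center_list):
            # These three index pairs fall on the top-left, top-right,
            # and bottom-left finder patterns, so they are skipped.
            if (i, j) in ((0, 0), (0, last), (last, 0)):
                continue
            centers.append((row, col))
    return centers


print(alignment_centers([6, 18]))                                  # [(18, 18)]
print(len(alignment_centers([6, 30, 58, 86, 114, 142, 170])))      # 46
```

With a list of n coordinates, exactly three of the n x n pairs are dropped, so the count is n squared minus 3.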

Why They Help

Large QR symbols can accumulate noticeable lens distortion and paper warping by the time the scanner reaches the middle of the code. Alignment patterns let the scanner continuously recalibrate instead of trusting a single global transform.

From Message to Bitstream: What the Data Region Ultimately Holds

What Is the Data Region?

Every module that is not part of a function pattern - finder, separator, timing, alignment, format information, version information, or the dark module - is part of the data region. These modules carry the full payload stream: mode indicator, character count, encoded data, error correction, and remainder bits.

Composition of the Data Bitstream

Before placement, the final bitstream follows a fixed structure. Beginners should read it left to right as: what mode am I in, how long is the payload, what are the payload bits, and what extra bits are needed to finish the QR structure cleanly and safely. A terminator is the short end marker that says the payload is over. Pad bits are zero bits added only to reach the next byte boundary. Pad codewords are full filler bytes added after that if the symbol still has unused data capacity.

Field	Size
Mode	4 bits
Char Count	variable
Encoded Data	variable
Terminator	up to 4 bits
Pad Bits	0-7 bits
Pad Codewords	alternating EC / 11 (hex)
ECC	Reed-Solomon bytes
Remainder	0-7 bits

Mode Indicator: 0001 numeric, 0010 alphanumeric, 0100 byte, 1000 Kanji
Character Count: variable length by version range and mode
Encoded Data: the actual payload bits
Closure: terminator, byte padding, pad codewords, ECC, then remainder bits

Beginner Example: HELLO in Byte Mode

For a short byte-mode example in versions 1 through 9, the character count field is 8 bits long. That means the encoder writes the 4-bit byte-mode marker, then an 8-bit byte count, then the UTF-8 bytes for each character.

HELLO example bitstream start

mode            0100
count (5)       00000101
H               01001000
E               01000101
L               01001100
L               01001100
O               01001111
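Continuing past the bitstream start, the sketch below finishes the data codewords: terminator, zero pad bits to the byte boundary, then alternating EC / 11 pad codewords. It assumes byte mode in versions 1 through 9 and defaults to 16 data codewords, the version 1 M figure; the function name `data_codewords` is hypothetical.

```python
def data_codewords(text, capacity=16):
    """Build byte-mode data codewords (versions 1-9) padded to `capacity` bytes."""
    payload = text.encode("utf-8")
    if len(payload) > 255:
        raise ValueError("an 8-bit count field cannot describe this payload")
    bits = "0100" + format(len(payload), "08b")
    bits += "".join(format(b, "08b") for b in payload)

    total_bits = capacity * 8
    if len(bits) > total_bits:
        raise ValueError("payload does not fit in this many data codewords")

    bits += "0" * min(4, total_bits - len(bits))  # terminator, up to 4 bits
    bits += "0" * (-len(bits) % 8)                # pad bits to a byte boundary

    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    pad_cycle = (0xEC, 0x11)
    pad_index = 0
    while len(codewords) < capacity:              # pad codewords
        codewords.append(pad_cycle[pad_index % 2])
        pad_index += 1
    return codewords


print(" ".join(format(cw, "02X") for cw in data_codewords("HELLO")))
```

For HELLO the header, payload, and terminator occupy exactly 7 bytes, so no pad bits are needed and the remaining 9 positions are filled with pad codewords.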

Block Structure and Interleaving

For most versions, data is split across multiple Reed-Solomon blocks before ECC is computed. The final stream interleaves data blocks first, then ECC blocks, then appends any remainder bits. That spreads local damage across many blocks instead of destroying one block completely.

Arrange all data blocks side by side.
Interleave codeword 1 from block 1, block 2, block 3, and so on.
Repeat for codeword 2, codeword 3, and the rest of the data codewords.
Then interleave ECC codewords the same way.
Append any remainder bits at the very end.
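The interleaving steps above can be sketched as follows. Blocks may differ in length by one codeword in some versions, so shorter blocks are simply skipped once they run out. The function names are illustrative.

```python
from itertools import zip_longest


def interleave(blocks):
    """Take codeword 1 from every block, then codeword 2, and so on."""
    out = []
    for column in zip_longest(*blocks):
        out.extend(cw for cw in column if cw is not None)
    return out


def final_codeword_stream(data_blocks, ecc_blocks):
    """Interleaved data codewords followed by interleaved ECC codewords."""
    return interleave(data_blocks) + interleave(ecc_blocks)


print(interleave([[1, 2, 3], [4, 5, 6, 7]]))  # [1, 4, 2, 5, 3, 6, 7]
```

Remainder bits are not codewords, so they are appended later at the bit level rather than inside this function.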

Data Module Count by Version

Version	Size	Total Modules	Function Pattern Modules	Data Modules
1	21x21	441	233	208 (= 26 CW x 8)
7	45x45	2,025	457	1,568 (= 196 CW x 8)
20	97x97	9,409	726	8,683 (= 1085 CW x 8 + 3 rem)
40	177x177	31,329	1,681	29,648 (= 3706 CW x 8)
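One way to sanity-check a data-module count is to mark every function module on a blank grid and count what is left. The sketch below does that under the definitions used on this page, with format information, version information, and the dark module counted as function modules. The function name is hypothetical, and the alignment center list is assumed to be supplied from the specification's table (empty for version 1).

```python
def count_data_modules(version, center_list):
    """Count modules not reserved by any function pattern."""
    size = 17 + 4 * version
    reserved = [[False] * size for _ in range(size)]

    def mark(top, left, height, width):
        for r in range(top, top + height):
            for c in range(left, left + width):
                reserved[r][c] = True

    # Finder patterns plus their separators: an 8x8 area in three corners.
    mark(0, 0, 8, 8)
    mark(0, size - 8, 8, 8)
    mark(size - 8, 0, 8, 8)
    # Timing patterns along row 6 and column 6.
    mark(6, 0, 1, size)
    mark(0, 6, size, 1)
    # Alignment patterns, skipping the three centers on finder corners.
    last = len(center_list) - 1
    for i, row in enumerate(center_list):
        for j, col in enumerate(center_list):
            if (i, j) not in ((0, 0), (0, last), (last, 0)):
                mark(row - 2, col - 2, 5, 5)
    # Format information (both copies) and the dark module.
    mark(8, 0, 1, 9)
    mark(0, 8, 9, 1)
    mark(8, size - 8, 1, 8)
    mark(size - 8, 8, 8, 1)
    # Version information blocks exist only from version 7 upward.
    if version >= 7:
        mark(0, size - 11, 6, 3)
        mark(size - 11, 0, 3, 6)

    return sum(not cell for row in reserved for cell in row)


print(count_data_modules(1, []))  # 208
```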

Remainder Bits

What They Are

Remainder bits are the final leftover data-region modules that remain after every data codeword and every ECC codeword has already been assigned. They are not payload bits, not padding codewords, and not error-correction bytes. They are simply extra module positions that exist because the usable data-region area of some QR versions is not an exact multiple of 8 bits.

Why They Exist

QR encoding works in bytes for data and ECC codewords, but the matrix is a grid of individual modules. After the function patterns are reserved, the remaining writable modules sometimes total a count like 8n + 3, 8n + 4, or 8n + 7. The QR specification still defines codeword capacity in whole bytes, so those extra 3, 4, or 7 modules cannot become another codeword. They are filled with zero-valued remainder bits instead.

remainder-bit count for a version

usable data modules        modules not reserved for function patterns
total codeword bits        8 * total codewords for that version

remainder bits             usable data modules - total codeword bits

That last subtraction is the whole idea. If a version has exactly enough writable modules for whole bytes, the remainder-bit count is zero. If it has a few extra modules beyond the byte-aligned codeword capacity, those modules become remainder bits.
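The subtraction is small enough to state directly in code. This sketch takes both counts as inputs rather than looking them up; the function name is illustrative.

```python
def remainder_bits(usable_data_modules, total_codewords):
    """Modules left over after every codeword has been given 8 modules."""
    leftover = usable_data_modules - 8 * total_codewords
    if not 0 <= leftover < 8:
        raise ValueError("counts are inconsistent: leftover must be 0-7 bits")
    return leftover


print(remainder_bits(208, 26))  # version 1: 0
print(remainder_bits(359, 44))  # version 2: 7
```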

Worked Example

Version 20 has 8,683 data modules, which can be written as 1085 CW x 8 + 3 rem. That means the version can carry 1085 total codewords across data and ECC, which consumes 8,680 modules. Three writable modules are still left over, so the encoder writes three zero remainder bits into them.

Example Version	Usable Data Modules	Total Codewords	Codeword Bits	Remainder Bits
1	208	26	208	0
2	359	44	352	7
14	4,651	581	4,648	3
20	8,683	1085	8,680	3

The important takeaway is that remainder bits are a property of the version's final writable-module count, not of the message, ECC level, or mask pattern. Once you choose a version, the remainder-bit count is fixed.

What Values They Hold

Remainder bits are always zero before masking. There is no choice here and there is no optimization step. After all interleaved data and ECC codewords have been placed, the encoder writes zeroes into the remaining eligible modules, and the data mask is then applied to them like any other module in the data region. A decoder does not interpret them as payload and does not feed them into Reed-Solomon decoding.

Where They Sit in the Stream

In the logical stream order, remainder bits come after everything else:

mode indicator
character count field
encoded payload bits
terminator and zero padding to byte boundary
pad codewords
ECC codewords
remainder bits

This matters because remainder bits are outside the Reed-Solomon-protected byte stream. They do not belong to any block, they are not interleaved as codewords, and they have no effect on payload recovery.

How They Are Used During Placement

The matrix walk does not need a special path for remainder bits. The encoder performs the same normal zig-zag traversal used for all writable modules. It places interleaved data and ECC bits first. If the traversal still has eligible modules left after the final codeword bit is consumed, those last modules receive zero remainder bits in the same traversal order.

How a Decoder Treats Them

A decoder reconstructs the sequence of writable modules, demasks them, and reassembles codewords until it has read the version's defined number of total codewords. If a few modules remain after that count, the decoder knows they are remainder bits and stops treating them as codeword data. They are effectively ignored after structural validation.

Version Groups

Remainder Bits	Versions
0 bits	1, 7-13, 35-40
7 bits	2-6
3 bits	14-20, 28-34
4 bits	21-27

Common Confusions

Remainder bits are not the same thing as the Reed-Solomon remainder that generates ECC codewords. The names are similar, but they describe different parts of the process.
Remainder bits are not pad codewords. Pad codewords are full bytes inserted before ECC generation; remainder bits are a few trailing modules after all codewords are done.
Remainder bits do not increase capacity. They exist precisely because there are not enough leftover modules to form a full extra byte.
Remainder-bit count depends on version only, not on the message contents.

Format Information

Purpose

Format information tells the decoder two things it must know before reading the payload: the error correction level and the data mask pattern used across the symbol.

The 15-Bit Format String

The format string is 15 bits total: 2 ECC bits, 3 mask bits, and 10 BCH error-correction bits. BCH is a short binary error-correcting code used here to protect small metadata strings like format and version information.

Bits 14-13: ECC level
Bits 12-10: mask pattern
Bits 9-0: BCH redundancy

Where the Mask Option Lives in the Diagram

The mask choice is not stored somewhere else in the QR symbol. It lives inside the 15-bit format string itself. The three mask-pattern bits (bits 12-10) are the exact place where the selected mask option is represented.


example format payload before BCH and XOR mask

ECC level Q         11
mask pattern 5      101

combined 5 bits     11101
stored as bits      [14 13][12 11 10] + 10 BCH bits

The BCH correction bits are computed with generator polynomial x^10 + x^8 + x^5 + x^4 + x^2 + x + 1. The full 15-bit string is then XORed with the fixed mask 101010000010010 so that no combination of ECC level and mask pattern, such as level M with mask 0, is stored as an all-zero format string.
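The whole format-string computation fits in a few lines. In this sketch the generator polynomial is written as the integer 0x537 and the fixed XOR mask as 0x5412, both direct binary readings of the values above; the function name `format_bits` is hypothetical.

```python
def format_bits(ecc_indicator, mask_pattern):
    """Return the 15-bit format string as an integer."""
    data = (ecc_indicator << 3) | mask_pattern   # 5 data bits
    remainder = data << 10
    # Long division by the generator 10100110111 (0x537) over GF(2).
    for shift in range(4, -1, -1):
        if remainder & (1 << (shift + 10)):
            remainder ^= 0x537 << shift
    return ((data << 10) | remainder) ^ 0x5412   # apply the fixed XOR mask


# ECC level Q (11) with mask pattern 5 (101), as in the example above.
print(format(format_bits(0b11, 5), "015b"))  # 010000110000011
```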

What Masking Means in Plain Language

Masking does not encrypt the message. It only flips selected data modules so the final symbol avoids patterns that are visually awkward for scanners, such as long runs, heavy clumps, or finder-like stripes in the wrong place.

Placement - Two Copies

Both copies encode the same information. If one copy is damaged, the other can still be decoded.

Copy 1 sits next to the top-left finder pattern.
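It occupies row 8 columns 0-5, row 8 column 7, row 8 column 8, and column 8 rows 7 and 5-0. Row 8 column 6 and column 8 row 6 are skipped because they belong to the timing patterns, which leaves exactly 15 modules.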
Copy 2 is split between row 8 near the top-right finder and column 8 near the bottom-left finder.

Bit Ordering

In Copy 1, bit 14 (MSB, most-significant bit) begins at row 8, col 0 and bit 0 (LSB, least-significant bit) ends at col 8, row 0. In Copy 2, bits 14 through 8 run up column 8 from row size-1 to row size-7 beside the bottom-left finder, and bits 7 through 0 continue along row 8 from column size-8 to column size-1 below the top-right finder.

ECC Level Reference

Level	Indicator Bits	Recovery Capacity	Use Case
L	01	~7%	Clean environments
M	00	~15%	General use
Q	11	~25%	Moderate damage tolerance
H	10	~30%	High-damage environments

Mask Pattern Reference

Pattern	Bits	Formula	Visual Effect
0	000	(row + col) mod 2 = 0	Checkerboard
1	001	row mod 2 = 0	Horizontal stripes
2	010	col mod 3 = 0	Vertical stripes
3	011	(row + col) mod 3 = 0	Diagonal stripes
4	100	(floor(row/2) + floor(col/3)) mod 2 = 0	Large blocks
5	101	(row*col) mod 2 + (row*col) mod 3 = 0	Sparse dots
6	110	((row*col) mod 2 + (row*col) mod 3) mod 2 = 0	Dense texture
7	111	((row + col) mod 2 + (row*col) mod 3) mod 2 = 0	Mixed pattern
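The eight formulas translate directly into predicates. In this sketch a module is flipped when its pattern's condition is true; the names `MASKS` and `apply_mask` are illustrative, and the caller is assumed to apply them only to data-region modules.

```python
# One predicate per mask pattern; True means "flip this module".
MASKS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(bit, row, col, pattern):
    """XOR a data-module bit (0 or 1) with the chosen mask at (row, col)."""
    return bit ^ int(MASKS[pattern](row, col))


masked = apply_mask(1, 4, 7, 6)
print(apply_mask(masked, 4, 7, 6))  # 1: applying the same mask twice undoes it
```

Because masking is a plain XOR, demasking in the decoder is the same operation with the same pattern number.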

Version Information

Purpose

Present only in versions 7-40. Version information encodes the version number as an 18-bit string so decoders can confirm symbol size without counting modules, which matters when large symbols are distorted or partially obscured.

The 18-Bit Version String

The version block is 18 bits total: 6 data bits for the version number and 12 BCH correction bits. The BCH generator used is x^12 + x^11 + x^10 + x^9 + x^8 + x^5 + x^2 + 1, and the code can correct up to 3 bit errors in the version block.

Bits 17-12: version number
Bits 11-0: BCH redundancy

Why Versions 1-6 Do Not Need It

Small symbols are easier to size from the finder patterns and timing lines alone. By the time the decoder sees version 1 through 6, it can infer the grid size reliably enough that a dedicated 18-bit version block would waste valuable space. Versions 7 and above are large enough that explicit confirmation becomes worth the reserved modules.

Example Version Strings

Version	Binary (6-bit)	Full 18-bit String
7	000111	000111 110010 010100
10	001010	001010 010011 010011
20	010100	010100 100110 100110
40	101000	101000 110001 101001
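The version string uses the same long-division idea as the format string, with a 12-bit remainder and no final XOR mask. In this sketch the generator polynomial above is written as the integer 0x1F25; the function name `version_bits` is hypothetical.

```python
def version_bits(version):
    """Return the 18-bit version string as an integer (versions 7-40)."""
    if not 7 <= version <= 40:
        raise ValueError("version information exists only for versions 7-40")
    remainder = version << 12
    # Long division by the generator 1111100100101 (0x1F25) over GF(2).
    for shift in range(5, -1, -1):
        if remainder & (1 << (shift + 12)):
            remainder ^= 0x1F25 << shift
    return (version << 12) | remainder


print(format(version_bits(7), "018b"))  # 000111110010010100
```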

Placement - Two Identical Copies

Both copies encode the same 18 bits. The string is arranged as a 6x3 block read column-by-column, top-to-bottom, left-to-right.

Copy 1: top-right block, rows 0-5 and columns size-11 through size-9.
Copy 2: bottom-left block, rows size-11 through size-9 and columns 0-5.
Copy 2 is effectively the transposed mirror of Copy 1.

Copy 1 Bit Layout

Copy 2 Bit Layout

Zig-Zag Bit Ordering

Overview

Once the payload and ECC bytes exist, the QR encoder must pour those bits into the remaining empty modules. It does that by sweeping right-to-left through 2-column-wide vertical strips, alternating travel direction up and down.

Step-by-Step Walk Algorithm

Step 1: Start at the bottom-right corner of the symbol.

Step 2: Move in a 2-column-wide strip, using the right column first and then the left column at each row.

Step 3: Alternate direction by strip: up, then down, then up again.

Step 4: Skip column 6 entirely because it is the vertical timing pattern.

Step 5: Skip function modules without advancing the bitstream index.

Step 6: Stop when all eligible data modules are filled.
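The six steps above can be sketched as a function that returns module coordinates in visiting order. The name `zigzag_order` and the `is_function` callback are assumptions for illustration: the callback is expected to return True for any module reserved by a function pattern.

```python
def zigzag_order(size, is_function):
    """(row, col) positions of data modules in placement order."""
    order = []
    right = size - 1      # right-hand column of the current 2-wide strip
    upward = True         # the first strip travels up from the bottom-right
    while right >= 1:
        if right == 6:    # column 6 is the vertical timing pattern
            right = 5
        rows = range(size - 1, -1, -1) if upward else range(size)
        for row in rows:
            for col in (right, right - 1):   # right column first, then left
                if not is_function(row, col):
                    order.append((row, col))
        upward = not upward
        right -= 2
    return order


# With no function modules marked, a 21x21 grid visits every column except 6.
order = zigzag_order(21, lambda row, col: False)
print(order[:4])   # [(20, 20), (20, 19), (19, 20), (19, 19)]
print(len(order))  # 420
```

Skipped function modules never appear in the list, so the bitstream index advances only when a data module is actually written.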

Visual Diagram of Strip Order

direction of column-pair sweep (right to left)

| Strip 4 | Strip 3 | Strip 2 | Strip 1 |
|  DOWN   |   UP    |  DOWN   |   UP    |
| col n-7 | col n-5 | col n-3 | col n-1 |
| col n-8 | col n-6 | col n-4 | col n-2 |
| skip 6  |         |         | start   |

Tiny Placement Example

Imagine codeword 0 is the bit sequence 1 0 1 1 0 0 1 0. Bit 7 goes into the first visited data module, bit 6 goes into the second, and so on. Only after every bit of every interleaved codeword is placed are any leftover remainder modules filled with zero bits.

Right-Column-First Rule

Within each 2-wide strip, the right column is placed first and the left column second. At a given row, the sequence is right-current, left-current, right-next, left-next.

MSB-First Codeword Placement

Codewords are placed most-significant-bit first. Bit 7 of codeword 0 goes into the first visited data module; bit 0 of the final codeword is placed just before any remainder bits.
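A minimal sketch of that ordering, using a hypothetical helper `codeword_bits` that flattens codewords into the bit sequence handed to the placement walk:

```python
def codeword_bits(codewords):
    """Flatten codewords into bits, most-significant bit first."""
    bits = []
    for codeword in codewords:
        for position in range(7, -1, -1):   # bit 7 down to bit 0
            bits.append((codeword >> position) & 1)
    return bits


print(codeword_bits([0xB2]))  # [1, 0, 1, 1, 0, 0, 1, 0]
```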

Remainder Bits

After all data and ECC codewords are placed, some versions still have a few leftover modules. Those are filled with zero-valued remainder bits and do not carry payload information.

Remainder Bits	Versions
0 bits	1, 7-13, 35-40
7 bits	2-6
3 bits	14-20, 28-34
4 bits	21-27

Why This Order?

Zig-zag placement distributes data across the full area of the symbol. Combined with block interleaving, that means local physical damage corrupts many different blocks a little instead of destroying one block completely, which is exactly what Reed-Solomon correction wants.


Version and ECC Tradeoffs

A larger version increases module count and capacity. A higher ECC level increases redundancy but reduces space available for payload bytes. Beginners can think of this as a trade between room and resilience.

ECC	Approx. Recovery	Main Tradeoff
L	~7%	Maximum payload capacity, least redundancy.
M	~15%	Balanced default setting.
Q	~25%	Higher resilience, lower payload budget.
H	~30%	Strongest redundancy, smallest payload budget.

ECC and Error Correction

ECC means error correction codewords: extra bytes added to a QR code so a scanner can still recover the message when part of the symbol is damaged, blurred, cropped, dirty, or hard to read. The big idea is simple: the QR does not only store your message. It also stores extra check bytes that describe the message in a recoverable way.

Start With the Big Idea

Imagine writing down a message and then writing down some extra helper notes about that message. If one part of the original message gets smudged, those helper notes can help you reconstruct what was lost. That is what ECC is doing. It is not making a full duplicate copy of the message. It is adding carefully chosen recovery bytes.

In QR, the important working unit is the codeword, which is one byte. Recovery happens at codeword level, not at raw pixel level. First the scanner turns the image back into dark and light modules, then into bits, then into codewords. Only after that does ECC begin helping.

Definitions You Need First

Data codeword: one byte that belongs to the real message stream.
ECC codeword: one byte added only for recovery.
Block: one chunk of codewords that gets its own local ECC protection.
Interleaving: mixing bytes from different blocks together before placement so local damage gets spread out.
Erasure: a byte position the decoder already knows is missing or unreliable.
Demasking: reversing the chosen mask rule so the decoder sees the real written data bits again.

What ECC Actually Protects

A QR symbol contains both fixed structure and message-carrying structure. ECC protects only the payload stream: mode indicator, character count, encoded payload bits, terminator bits, pad structure, and the final data codewords made from those bits. Finder patterns, timing patterns, alignment patterns, and format/version metadata are structural helpers; they are not part of the protected message polynomial.

A Toy Formula Before Real QR Math

Before looking at the real QR formulas, use a tiny toy example. Suppose we store three bytes A, B, and C, then create one helper byte with this rule:

toy recovery formula

P = A xor B xor C

If the stored values are A = 12, B = 5, and C = 9, then:

P = 12 xor 5 xor 9 = 0

Now imagine B is lost, but A, C, and P are still known. You can recover B by rearranging the same rule:

B = P xor A xor C
B = 0 xor 12 xor 9 = 5

QR does not use this exact toy formula. Real QR uses a much stronger system that can recover more complicated patterns of missing or wrong bytes. But the toy example shows the core idea correctly: extra bytes can make lost bytes recoverable.
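The toy rule is easy to try out. The function names below are illustrative, and the sketch implements only the single-parity toy scheme, not the Reed-Solomon code QR actually uses.

```python
from functools import reduce
from operator import xor


def make_parity(values):
    """Toy helper byte: XOR of all stored bytes."""
    return reduce(xor, values, 0)


def recover_missing(parity, known_values):
    """Recover the one missing byte from the parity and the surviving bytes."""
    return reduce(xor, known_values, parity)


parity = make_parity([12, 5, 9])
print(parity)                            # 0
print(recover_missing(parity, [12, 9]))  # 5
```

This scheme can rebuild exactly one byte and only when the decoder already knows which byte is missing, which is why a stronger code is needed in practice.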

Version 1 Example	Total Codewords	Data Codewords	ECC Codewords	Blocks
L	26	19	7	1
M	26	16	10	1
Q	26	13	13	1
H	26	9	17	1

Where the ECC Byte Counts Come From

The familiar levels L, M, Q, and H are table choices, not formulas that directly turn a percentage into a byte count. For each version and ECC level, the QR specification already defines the exact number of data codewords, the number of blocks, and the number of ECC codewords per block.

The first exact formula to learn is just a counting identity:

total codewords = data codewords + ECC codewords

For version 1 at level M, the table says:

26 total codewords = 16 data + 10 ECC

Because each codeword is one byte, you can also count bits:

total codeword bits = total codewords * 8
data bits          = data codewords * 8
ECC bits           = ECC codewords * 8

Using version 1 M again:

total bits = 26 * 8
data bits  = 16 * 8
ECC bits   = 10 * 8

Those are exact QR counts, not toy math. They tell you how much space belongs to message bytes and how much space belongs to recovery bytes.
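To see how those counts turn into usable payload room, the sketch below derives the byte-mode capacity of version 1 at each level from the table above. It assumes the 4-bit mode indicator and 8-bit count field that byte mode uses in versions 1 through 9; the names `VERSION_1` and `byte_mode_capacity` are illustrative.

```python
# (data codewords, ECC codewords) for version 1, from the table above.
VERSION_1 = {"L": (19, 7), "M": (16, 10), "Q": (13, 13), "H": (9, 17)}


def byte_mode_capacity(data_codewords):
    """Payload bytes that fit after the 4-bit mode and 8-bit count fields."""
    header_bits = 4 + 8
    return (data_codewords * 8 - header_bits) // 8


for level, (data, ecc) in VERSION_1.items():
    print(level, data + ecc, byte_mode_capacity(data))
```

Every level totals 26 codewords; only the split between message bytes and recovery bytes changes.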

As versions grow, the structure becomes more complex. Many versions split the payload across several blocks, and some versions use two block groups of slightly different data lengths. The encoder does not invent those counts on the fly; it looks them up from the QR version/ECC tables, then computes recovery bytes separately for each block.

A Simple Damage Formula

When people ask, “How much damage can ECC fix?”, the first exact rule is this:

guaranteed unknown-codeword correction per block = floor(ECC codewords in that block / 2)

The word floor just means “round down.” So if a block has 10 ECC codewords:

floor(10 / 2) = 5

That means the block can guarantee recovery from up to 5 unknown bad codewords in that block. If the decoder already knows which positions are unreliable, those are called erasures, and each erasure costs only one ECC codeword instead of two.
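A small budget check makes the rule concrete. It uses the general Reed-Solomon bound, in which an unknown error costs two ECC codewords and an erasure costs one; treat it as an upper limit, since a real decoder may reserve part of the budget for detecting mis-decodes. The function names are illustrative.

```python
def max_unknown_errors(ecc_codewords):
    """Guaranteed number of unknown bad codewords one block can repair."""
    return ecc_codewords // 2


def within_budget(errors, erasures, ecc_codewords):
    """General Reed-Solomon bound: 2 * errors + erasures <= ECC codewords."""
    return 2 * errors + erasures <= ecc_codewords


print(max_unknown_errors(10))    # 5
print(within_budget(3, 4, 10))   # True: 2*3 + 4 = 10
print(within_budget(4, 4, 10))   # False: 2*4 + 4 = 12
```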

How to Apply the Idea Across a Full Symbol

Now scale the simple ideas up from one toy group of bytes to a full QR symbol:

1. Choose version and ECC level: this fixes the total byte budget and the recovery budget.
2. Build the data bytes: mode bits, count field, payload bytes, terminator, zero padding, and pad codewords become the data codewords.
3. Split into blocks: some QR symbols use one block, others use many.
4. Compute ECC bytes per block: each block gets its own recovery bytes.
5. Interleave: mix bytes from different blocks together so one torn corner does not destroy one whole block all at once.
6. Place the bits into modules: the combined data and ECC stream is written into the data region of the symbol.
7. Decode in reverse: the scanner reads modules, demasks them, rebuilds codewords, deinterleaves blocks, and then uses the ECC bytes to repair damaged codewords.

That full-symbol flow is the practical version of ECC. The encoder adds recovery bytes block by block, then the decoder later uses those same block rules to repair missing or wrong codewords.
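The interleave and deinterleave steps can be sketched as follows. This is an illustration with hypothetical function names and made-up codeword values: it takes one codeword from each block in turn and lets shorter blocks run out first, which is how blocks of slightly different lengths are mixed. In a full encoder the same routine would be applied once to the data blocks and once to the ECC blocks, with the ECC result appended after the data result.

```python
from itertools import zip_longest

def interleave(blocks):
    """Take codeword 0 of every block, then codeword 1 of every block, and so on."""
    stream = []
    for column in zip_longest(*blocks):
        stream.extend(cw for cw in column if cw is not None)
    return stream

def deinterleave(stream, lengths):
    """Undo interleave(), given the original length of each block."""
    blocks = [[] for _ in lengths]
    it = iter(stream)
    for position in range(max(lengths)):
        for index, length in enumerate(lengths):
            if position < length:
                blocks[index].append(next(it))
    return blocks

# Three hypothetical data blocks; the last one is one codeword longer.
data_blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
mixed = interleave(data_blocks)
print(mixed)  # [1, 4, 7, 2, 5, 8, 3, 6, 9, 10]
print(deinterleave(mixed, [3, 3, 4]) == data_blocks)  # True
```

Because neighbouring codewords in the placed stream come from different blocks, a localized tear is spread across several blocks instead of exhausting the budget of one.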

A Full-Symbol Worked Example

Use version 1 M because it is small and has only one block:

version 1 M

data codewords     16
ECC codewords      10
blocks             1
total codewords    26

Here the “apply it to the whole symbol” story is straightforward:

The message is converted into 16 data codewords.
The encoder computes 10 ECC codewords from those 16 data codewords.
The final symbol stores all 26 codewords.
If a few codewords are later corrupted, the decoder can use the 10 ECC codewords to repair the damaged data, as long as the damage stays within the correction budget.

Larger symbols follow the same story, but with more blocks. The only new complication is that the bytes are mixed together before placement and unmixed again during decoding.
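The first step of the worked example, turning a message into exactly 16 data codewords, might look like the sketch below. It assumes byte mode only (mode indicator 0100 and an 8-bit count field, which applies to versions 1 through 9), and `build_data_codewords` is a hypothetical helper name. The ECC step is not shown here.

```python
def build_data_codewords(payload, data_capacity=16):
    """Byte-mode data codewords for a small symbol (default: version 1 M, 16 codewords)."""
    bits = "0100"                                   # byte-mode indicator
    bits += format(len(payload), "08b")             # character count, 8 bits
    bits += "".join(format(b, "08b") for b in payload)

    capacity_bits = data_capacity * 8
    if len(bits) > capacity_bits:
        raise ValueError("payload does not fit in this version and ECC level")

    bits += "0" * min(4, capacity_bits - len(bits))  # terminator, up to four zero bits
    bits += "0" * (-len(bits) % 8)                   # zero padding to a byte boundary

    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]

    # Fill the remaining space with the alternating pad codewords 0xEC and 0x11.
    pad_values = (0xEC, 0x11)
    pad_index = 0
    while len(codewords) < data_capacity:
        codewords.append(pad_values[pad_index % 2])
        pad_index += 1
    return codewords

cw = build_data_codewords(b"hi")
print([hex(c) for c in cw])
```

For the two-byte payload `hi`, the first four codewords are 0x40, 0x26, 0x86, 0x90, and the remaining twelve are pad codewords. The longest byte-mode payload that fits in 16 data codewords is 14 bytes.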

What the Four ECC Levels Really Mean in Practice

Level	Approx. Recovery Summary	What Actually Changes Under the Hood	Main Cost
L	~7%	Fewest ECC codewords in the version table.	Least resilience, most payload room.
M	~15%	More ECC codewords and sometimes different block structure.	Lower payload room than L.
Q	~25%	Substantially more parity bytes per symbol.	Often forces a larger version for the same message.
H	~30%	Highest parity allocation in the version table.	Smallest payload budget and denser output for the same message.

The important beginner takeaway is that a higher level means more recovery bytes and fewer message bytes. It does not mean the QR has magically become unbreakable.

Common Misunderstandings

ECC is not a second full copy of the message.
ECC helps with damaged codewords, not with every possible failure. If the scanner cannot even find the symbol structure, ECC never gets a chance to help.
More ECC is not always better. If it forces the modules to become too small for the printer, screen, or camera, scanning can get worse.
A logo works only if the codewords it covers, plus any other damage to the symbol, stay inside the recovery budget.

Advanced: What QR Actually Computes

Everything above is the zero-background version. The real QR algorithm uses Reed-Solomon coding over GF(256). Each block of data codewords is treated as a polynomial, meaning an algebraic expression whose coefficients are byte values. The encoder computes the ECC bytes as the remainder after dividing that polynomial by a generator polynomial. That is why you will see names like generator polynomial, syndromes, error locator polynomial, and Forney's formula in deeper references.

encoder-side Reed-Solomon outline for one block

input bytes                d0 d1 d2 ... d(k-1)
ECC bytes required         t

message polynomial         d(x)
shift for parity space     d(x) * x^t
generator polynomial       g(x)

divide in GF(256)          (d(x) * x^t) / g(x)
keep remainder             r(x)

ECC codewords              coefficients of r(x)
stored block               data bytes followed by ECC bytes
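The outline above can be turned into a short runnable sketch. The names `EXP`, `LOG`, `gf_mul`, `generator_poly`, and `rs_encode` are illustrative choices, not taken from any particular library, and the field tables assume the primitive polynomial 0x11D with generator roots a^0 through a^(t-1). Polynomials are stored with the highest-degree coefficient first.

```python
# Exponent and log tables for GF(256), primitive polynomial 0x11D.
EXP = [0] * 512
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]   # duplicate so LOG sums need no modulo

def gf_mul(a, b):
    """Field multiplication via log/antilog lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def generator_poly(t):
    """g(x) = (x - a^0)(x - a^1)...(x - a^(t-1)); subtraction is XOR here."""
    g = [1]
    for i in range(t):
        nxt = g + [0]                       # g(x) * x
        for j, coef in enumerate(g):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])   # plus g(x) * a^i
        g = nxt
    return g

def rs_encode(data, t):
    """Return the t ECC codewords: remainder of d(x) * x^t divided by g(x)."""
    g = generator_poly(t)
    rem = list(data) + [0] * t              # d(x) * x^t
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j, coef in enumerate(g):
                rem[i + j] ^= gf_mul(coef, factor)
    return rem[len(data):]

# 16 data codewords (byte-mode "hi" plus pad codewords) and 10 ECC codewords.
data = [0x40, 0x26, 0x86, 0x90] + [0xEC, 0x11] * 6
ecc = rs_encode(data, 10)
stored_block = data + ecc
print(len(stored_block), ecc)
```

A quick sanity check on any encoder like this is that the stored block, read as a polynomial, evaluates to zero at every generator root.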

What Arithmetic QR Uses

QR Reed-Solomon arithmetic happens in the finite field GF(256), using the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (hex 0x11D). Addition and subtraction are both XOR. Multiplication and division are field operations, usually implemented with log and antilog lookup tables instead of slow polynomial math at runtime.

GF(256) facts used by QR encoders

byte addition              XOR
byte subtraction           XOR
byte multiplication        field multiply modulo 0x11D
generator roots            a^0 through a^(t-1)

practical implementation   precompute exp/log tables
why this matters           parity bytes depend on field math, not normal integer division
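A minimal sketch of that arithmetic, assuming the 0x11D field described above. It builds the exp/log tables once and compares the table-based multiply against a slower shift-and-XOR version; all function names are hypothetical.

```python
# Precompute antilog (EXP) and log (LOG) tables for GF(256) with 0x11D.
EXP = [0] * 512
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1                 # multiply by a (the element 2)
    if value & 0x100:
        value ^= 0x11D          # reduce modulo the primitive polynomial
for power in range(255, 512):
    EXP[power] = EXP[power - 255]

def gf_add(a, b):
    """Addition and subtraction are the same operation: XOR."""
    return a ^ b

def gf_mul(a, b):
    """Table-based multiply: add the logs, then look up the antilog."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Table-based divide: subtract the logs modulo 255."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def gf_mul_slow(a, b):
    """Shift-and-XOR multiply with reduction by 0x11D, no tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

print(gf_mul(2, 128))   # 29 (0x1D): 0x100 reduced by 0x11D
print(all(gf_mul(a, b) == gf_mul_slow(a, b) for a in range(256) for b in range(256)))
```

The doubled `EXP` table is a common trick: the sum of two logs is at most 508, so multiplication needs no modulo operation.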

How the Decoder Uses ECC for Recovery

1. Read the symbol, undo masking, and reconstruct the interleaved stream of data and ECC codewords.
2. Deinterleave the stream back into the original Reed-Solomon blocks, meaning restore the mixed placement order into the original per-block groupings.
3. For each block, compute syndromes by evaluating the received polynomial at the generator roots. A syndrome is a compact consistency check; all-zero syndromes mean the block is consistent.
4. If any syndrome is nonzero, derive an error locator polynomial, commonly with Berlekamp-Massey or the extended Euclidean algorithm. The locator is an algebraic object whose roots identify the bad codeword positions.
5. Run a root search over field elements to locate the failing codeword positions.
6. Compute error magnitudes, commonly with Forney's formula. This standard step turns the locator information and syndromes into the amount each bad codeword must change. Then patch the bad codewords.
7. If correction succeeds for every block, reassemble the corrected data codewords and continue normal QR payload parsing.

decoder-side recovery outline for one block

received block             c'(x)
compute syndromes          S_i = c'(a^i), i = 0 .. t-1

all S_i = 0 ?              block is consistent
otherwise                  locate and solve errors

derive locator             sigma(x)
find bad positions         roots of sigma(x)
compute magnitudes         how much each bad byte must change
correct block              patch bytes so syndromes become zero
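The sketch below illustrates the syndrome step and the simplest possible repair: exactly one bad codeword in a block. It is deliberately not a full decoder. There is no Berlekamp-Massey, root search, or Forney step, because with a single error of magnitude e at power p the syndromes reduce to S_0 = e and S_1 = e * a^p, which can be solved directly. All names are illustrative, and a small encoder is included only so the example has a valid block to damage.

```python
EXP, LOG = [0] * 512, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(poly, point):
    """Horner evaluation; poly[0] is the highest-degree coefficient."""
    acc = 0
    for coef in poly:
        acc = gf_mul(acc, point) ^ coef
    return acc

def syndromes(block, t):
    """S_i = c'(a^i) for i = 0 .. t-1; all zero means the block is consistent."""
    return [poly_eval(block, EXP[i]) for i in range(t)]

def rs_encode(data, t):
    """Helper for the demo: ECC codewords as the remainder of d(x) * x^t / g(x)."""
    g = [1]
    for i in range(t):
        g = [a ^ gf_mul(b, EXP[i]) for a, b in zip(g + [0], [0] + g)]
    rem = list(data) + [0] * t
    for i in range(len(data)):
        factor = rem[i]
        for j in range(1, len(g)):
            rem[i + j] ^= gf_mul(g[j], factor)
    return rem[-t:]

def fix_single_error(block, t):
    """Repair at most one bad codeword; raise if the damage is not a single error."""
    s = syndromes(block, t)
    if not any(s):
        return list(block)                       # already consistent
    if s[0] == 0 or s[1] == 0:
        raise ValueError("not a single-codeword error")
    p = (LOG[s[1]] - LOG[s[0]]) % 255            # a^p = S_1 / S_0
    index = len(block) - 1 - p                   # power of x -> list index
    if not 0 <= index < len(block):
        raise ValueError("error position outside the block")
    fixed = list(block)
    fixed[index] ^= s[0]                         # magnitude e = S_0
    if any(syndromes(fixed, t)):
        raise ValueError("more than one codeword is damaged")
    return fixed

data = list(b"QR structure ok!")                 # 16 hypothetical data codewords
block = data + rs_encode(data, 10)
damaged = list(block)
damaged[3] ^= 0x5A                               # corrupt one codeword
print(any(syndromes(block, 10)), any(syndromes(damaged, 10)))  # False True
print(fix_single_error(damaged, 10) == block)                  # True
```

A general decoder replaces the direct solve with the locator, root search, and magnitude steps listed in the outline, but the final test is the same: after patching, every syndrome must be zero.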

How this connects to the rest of the page: the earlier sections explain how the symbol is built, where the data lives, and how the bits are placed. This section explains how the extra recovery bytes let the decoder repair damage after those earlier structures have been read correctly.