Learn: QR Code Structure Guide
This page maps the QR symbol in detail. Here, symbol means the complete square QR code as a whole. The page covers not just the names of QR regions but also the exact placement logic, bit layouts, redundancy rules, and traversal behavior that make the symbol decodable in the real world.
Start Here: What a QR Code Actually Is
This guide assumes zero background. A QR code is just a square grid of tiny dark and light cells called modules. Some modules are fixed landmarks that help a camera orient itself. The remaining modules store a message and the redundancy needed to recover that message if part of the symbol is damaged.
- A QR code is a grid: the smallest standard QR symbol is version 1, which is 21 by 21 modules. Larger versions add 4 modules in width and height each time.
- Some parts never hold your message: finder, timing, alignment, format, and sometimes version information are structural patterns. Collectively these reserved structural regions are called function patterns: fixed parts of the symbol that help scanners locate and decode it, rather than carrying normal message data.
- Your message becomes bits: the text or scanner-style payload is converted into bytes, grouped into codewords, protected with Reed-Solomon error correction, and then placed into the open modules.
- The final pattern is intentionally scrambled: a mask pattern is one of eight fixed rules that flips selected data modules. Masking means applying that rule so the symbol avoids misleading stripes or blocks that are harder for scanners to read.
One-sentence mental model: a QR code is a carefully structured square where fixed landmarks help the camera, while the open space holds your message plus enough redundancy to survive damage.
- Module: a single dark or light square in the grid.
- Payload: the actual message string the QR code is meant to carry.
- Version: the symbol size. Version 1 is 21x21, version 40 is 177x177.
- Codeword: an 8-bit byte-sized chunk in the encoded QR bitstream.
- ECC: error correction codewords that let scanners recover missing or damaged data.
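The version-to-size rule in the glossary can be written as one line of arithmetic. This is a minimal sketch; the function name `symbol_size` is made up for illustration.

```python
def symbol_size(version: int) -> int:
    """Return the side length, in modules, of a QR symbol for versions 1-40."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    # Version 1 is 21 modules wide, and each later version adds 4 modules.
    return 17 + 4 * version


print(symbol_size(1), symbol_size(40))
```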
A Tiny Example Before the Heavy Details
Keep this beginner example in mind as you read the rest of the page. Every advanced section is explaining one part of this same journey. In QR terms, a mode is the rule that tells the decoder how to interpret the payload, and the mode indicator is the small field that stores that rule at the start of the data stream. UTF-8 is the text-to-bytes encoding rule that turns text into byte values, and byte mode is the QR mode for general byte data, which is why QR Peach uses it for normal text payloads.
- Input: HELLO
- Payload: HELLO
- Bytes: UTF-8 gives 48 45 4C 4C 4F
- Header: byte mode adds 0100 + 00000101
- Result: fixed patterns + data codewords + ECC codewords + mask.
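The first steps of that journey can be reproduced in a few lines. The sketch below only covers the UTF-8 bytes and the byte-mode header, and it assumes the 8-bit count field used by small symbols; `byte_mode_header` is a hypothetical helper name.

```python
def byte_mode_header(text: str) -> tuple[str, str]:
    """Return (hex bytes, header bits) for a short byte-mode payload."""
    data = text.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    # 0100 is the byte-mode indicator; the count field is assumed to be 8 bits.
    header = "0100" + format(len(data), "08b")
    return hex_bytes, header


print(byte_mode_header("HELLO"))
```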
Payload Types: Start with Plain Text, Then Add Scanner Conventions
Beginners often assume a QR code has separate low-level structures for links, email, Wi-Fi, and contacts. Usually it does not. Most of the time, the QR symbol still stores a plain byte stream. The scanner decides how to act based on the string pattern inside that stream.
- Plain text: a payload like HELLO WORLD has no special prefix, so readers simply show the decoded text.
- Generic URI: a payload like sip:ada@example.com still uses byte encoding, but scanners can hand the scheme to an app that understands it.
- Web link: a payload like https://example.com still uses byte encoding, but scanners recognize URI syntax and offer to open it.
- Email or phone: prefixes like mailto: and tel: are just part of the payload text. They tell the scanner which app action to launch.
- Wi-Fi, contact card, or event: structured strings such as WIFI:, MECARD:, BEGIN:VCARD, or BEGIN:VCALENDAR package multiple fields into one byte stream.
| Type | Envelope Example | How Scanners Interpret It |
|---|---|---|
| Text | HELLO WORLD | No special prefix. The decoded text is displayed directly. |
| Generic URI | sip:ada@example.com | Readers pass the scheme to an app that recognizes that URI type. |
| URL | https://example.com | Readers see URI syntax and offer to open the link. |
| Email | mailto:ada@example.com?subject=Hello | The mailto: prefix triggers compose-email behavior. |
| Phone | tel:+15551234567 | The tel: prefix triggers dial behavior. |
| SMS | SMSTO:+15551234567:Hello | Readers treat the envelope as a pre-filled text message. |
| Contact | MECARD:N:Ada Lovelace;TEL:...; | Readers parse the structured contact payload and offer to save it. |
| vCard | BEGIN:VCARD\nVERSION:3.0\nFN:Ada Lovelace\n... | Readers parse the vCard record and offer to import the contact details. |
| Event | BEGIN:VCALENDAR\nVERSION:2.0\nBEGIN:VEVENT\nDTSTART;TZID=America/New_York:20260601T090000\n... | Readers parse the iCalendar event record and offer to add it to a calendar. |
| Wi-Fi | WIFI:T:WPA;S:Cafe WiFi;P:secret123;; | Readers recognize the Wi-Fi envelope and offer network setup. |
| Geo | geo:37.7749,-122.4194?q=Coffee | Readers pass the coordinates into a map flow. |
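Because the payload is only a string, the scanner-side decision can be pictured as prefix matching. The sketch below is a naive illustration, not how any particular scanner app works; the action labels and the `classify_payload` name are assumptions.

```python
# Known envelopes from the table above, checked case-insensitively.
PREFIX_ACTIONS = [
    ("BEGIN:VCARD", "contact (vCard)"),
    ("BEGIN:VCALENDAR", "calendar event"),
    ("MECARD:", "contact (MECARD)"),
    ("WIFI:", "wifi setup"),
    ("SMSTO:", "sms"),
    ("MAILTO:", "email"),
    ("TEL:", "phone"),
    ("GEO:", "map"),
    ("HTTP://", "url"),
    ("HTTPS://", "url"),
]


def classify_payload(payload: str) -> str:
    """Guess the action a scanner might offer for a decoded payload string."""
    upper = payload.upper()
    for prefix, action in PREFIX_ACTIONS:
        if upper.startswith(prefix):
            return action
    # Fallback heuristic: "scheme:rest" with no spaces looks like a generic URI.
    scheme, sep, _ = payload.partition(":")
    if sep and scheme.isalpha() and " " not in payload:
        return "generic uri"
    return "text"


print(classify_payload("sip:ada@example.com"))
```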
From Pixels to Modules: How a Decoder Starts
A module is one square cell in the QR grid: the smallest black-or-white unit the symbol is built from. So yes, visually it is one little square. In the data region, a module usually carries one written bit after masking, but many modules in a QR code are not payload bits at all. Finder patterns, timing patterns, alignment patterns, format information, version information, separators, and the dark module are all made of modules too, but they serve structural jobs instead of carrying message content.
Before a decoder can talk about payload bytes, ECC bytes, or format bits, it has to complete a more basic job: turn an image into a clean QR module grid. In practice that means decoding the image file, locating the symbol, estimating the grid geometry, sampling each module, reading the format and version metadata, undoing the mask, and only then reconstructing codewords.
That is why debugging rows about image dimensions, quiet zone, contrast, and module pitch are not cosmetic. They describe the evidence the decoder had available before payload parsing could even begin.
A QR image starts as some encoded file bytes: PNG, JPEG, SVG, or another image format. The decoder first interprets that file and turns it into a raster image, meaning a rectangular pixel grid with a width and height in pixels. File byte length and pixel dimensions are related but not interchangeable. A tiny SVG can scale into a large crisp raster, while a large JPEG can still decode into a blurry image with weak module edges.
After decoding the file, the scanner has to decide which pixels count as dark modules and which count as light background. That step depends on contrast. Thresholding is the step where the decoder draws that dark-versus-light boundary. If dark modules are not dark enough, or the background is not bright enough, thresholding becomes unstable and the recovered grid can drift or fragment before payload decoding even starts.
Module pitch is the estimated pixel width and height of one QR module in the recovered image. A healthy scan usually has enough pixels per module to distinguish dark and light cells cleanly. If pitch is too small, compression and resampling can blur neighboring modules together. If horizontal and vertical pitch differ too much, that often hints at distortion, cropping, or perspective issues.
The quiet zone is the blank margin around the QR symbol. Beginners often think of it as wasted space, but decoders use it to separate the symbol from nearby graphics, text, or borders. If the dark region nearly touches the image edge, the scanner may misread the symbol boundary or fail to detect the finder structure cleanly at all.
Next idea: before a scanner can read any of those payloads, it must first find the symbol, determine orientation, estimate the grid, and compensate for distortion. That is what the fixed landmark patterns are for.
Finder Patterns
Finder patterns are three identical 7x7 concentric square patterns placed at the top-left, top-right, and bottom-left corners of every QR symbol. They allow any scanner to detect that a QR code is present, determine the symbol orientation, and establish the module-grid coordinate system before any data is read.
Three finder patterns create a unique L-shaped arrangement. Any three squares in those positions unambiguously define position and orientation; the missing fourth corner tells the decoder which way is up. A fourth identical pattern would make all four corners look alike and leave the rotation ambiguous.
Each finder pattern is a 7x7 grid with three concentric layers.
- Layer 1: 7x7 dark outer border
- Layer 2: 5x5 light ring
- Layer 3: 3x3 dark center block
Any horizontal or vertical scan line crossing a finder pattern produces dark:light:dark:light:dark module widths in the ratio 1:1:3:1:1. This ratio is what scanners search for, and it survives changes in scale, angle, and rotation.
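A ratio test of this kind can be sketched as follows. It assumes the caller has already measured five consecutive run lengths in pixels (dark, light, dark, light, dark); the tolerance value and the function name are illustrative choices, not values from any specification.

```python
def looks_like_finder(runs: list[int], tolerance: float = 0.5) -> bool:
    """Check whether five run lengths roughly match the 1:1:3:1:1 ratio."""
    if len(runs) != 5 or min(runs) <= 0:
        return False
    unit = sum(runs) / 7.0  # 1 + 1 + 3 + 1 + 1 = 7 modules across the pattern
    expected = (1, 1, 3, 1, 1)
    # Every run must sit within `tolerance` modules of its expected width.
    return all(abs(run - e * unit) <= tolerance * unit
               for run, e in zip(runs, expected))


print(looks_like_finder([4, 4, 12, 4, 4]))
```

Because the test divides by the estimated module width, the same check works at any scale.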
Each finder pattern is surrounded by a 1-module-wide white separator on the sides facing the interior of the symbol. This prevents the finder from visually merging with nearby data modules. Separators are always white regardless of masking.
| Pattern | Top-left Corner | Row Range | Column Range |
|---|---|---|---|
| Top-left | (0, 0) | 0-6 | 0-6 |
| Top-right | (0, size-7) | 0-6 | size-7 to size-1 |
| Bottom-left | (size-7, 0) | size-7 to size-1 | 0-6 |
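The three layers and the placement table can be generated directly. This is a small sketch using 1 for dark and 0 for light; the helper names are hypothetical.

```python
def finder_pattern() -> list[list[int]]:
    """Build the 7x7 finder pattern: 1 = dark module, 0 = light module."""
    grid = []
    for r in range(7):
        row = []
        for c in range(7):
            # Ring index: 0-1 is the 3x3 center, 2 is the light ring, 3 is the border.
            ring = max(abs(r - 3), abs(c - 3))
            row.append(0 if ring == 2 else 1)
        grid.append(row)
    return grid


def finder_origins(size: int) -> dict[str, tuple[int, int]]:
    """Top-left (row, col) corner of each finder pattern for a given symbol size."""
    return {
        "top-left": (0, 0),
        "top-right": (0, size - 7),
        "bottom-left": (size - 7, 0),
    }


for row in finder_pattern():
    print("".join("#" if m else "." for m in row))
```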
Beyond the required 4-module quiet zone outside the symbol boundary, the internal separators act as an internal quiet buffer that keeps the finder patterns distinguishable from nearby data modules.
Timing Patterns
Two alternating dark/light strips, one horizontal and one vertical, let the scanner determine module-grid density and compensate for perspective distortion or skew in the captured image.
- Horizontal timing: row 6, columns 8 through size-9.
- Vertical timing: column 6, rows 8 through size-9.
The first module of each timing strip is always dark. After that, the strip alternates dark, light, dark, light, and so on. The strip length is size-16 modules.
Row 6 and column 6 are fixed across all 40 versions. Because they sit between the finder patterns, scanners know exactly where to look before they know anything version-specific about the symbol.
| Version | Symbol Size | Timing Strip Length |
|---|---|---|
| 1 | 21x21 | 5 modules |
| 5 | 37x37 | 21 modules |
| 10 | 57x57 | 41 modules |
| 20 | 97x97 | 81 modules |
| 40 | 177x177 | 161 modules |
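The strip contents and lengths in the table follow from the index range alone. A minimal sketch, with `timing_strip` as an illustrative name:

```python
def timing_strip(size: int) -> list[int]:
    """Modules along row 6 (or column 6) from index 8 to size-9; 1 = dark."""
    # Even indices are dark, so the strip starts and ends on a dark module.
    return [1 if i % 2 == 0 else 0 for i in range(8, size - 8)]


print(timing_strip(21))
```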
When a QR code is photographed at an angle, modules appear warped. By counting transitions along both timing strips, the decoder can estimate the perspective transform and realign the grid before reading payload data.
Alignment Patterns
Alignment patterns are small 5x5 concentric patterns placed throughout the data region of larger QR codes. They provide extra reference points for correcting warping and local distortion across the full symbol.
- Border: 5x5 dark outer ring
- Inner Ring: 3x3 light ring
- Center: 1x1 dark module
- Version 1: no alignment patterns.
- Version 2 and above: at least one alignment pattern, with count increasing by version.
- Maximum: Version 40 uses 46 alignment patterns.
The specification defines a list of center coordinates for each version. All combinations of (row, col) from that list are used as alignment centers, except any that would overlap a finder pattern or its separator.
| Version | Center List | Pattern Count |
|---|---|---|
| 2 | [6, 18] | 1 |
| 7 | [6, 22, 38] | 6 |
| 14 | [6, 26, 46, 66] | 13 |
| 21 | [6, 28, 50, 72, 94] | 22 |
| 40 | [6, 30, 58, 86, 114, 142, 170] | 46 |
If a computed center would place a 5x5 alignment pattern inside a finder corner region, that center pair is skipped. Exactly three pairs are affected, one for each finder corner, so the number of patterns is always the square of the center-list length minus three.
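The skip rule can be checked by enumerating the pairs. The sketch below takes a center list as input and drops the three pairs that land on finder corners; the function name is illustrative.

```python
from itertools import product


def alignment_centers(center_list: list[int]) -> list[tuple[int, int]]:
    """All (row, col) alignment centers, minus the three finder corners."""
    first, last = center_list[0], center_list[-1]
    # These three pairs fall on the top-left, top-right, and bottom-left finders.
    finder_corners = {(first, first), (first, last), (last, first)}
    return [rc for rc in product(center_list, repeat=2)
            if rc not in finder_corners]


print(len(alignment_centers([6, 22, 38])))
```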
Large QR symbols can accumulate noticeable lens distortion and paper warping by the time the scanner reaches the middle of the code. Alignment patterns let the scanner continuously recalibrate instead of trusting a single global transform.
From Message to Bitstream: What the Data Region Ultimately Holds
Every module that is not part of a function pattern - finder, separator, timing, alignment, format information, version information, or the dark module - is part of the data region. These modules carry the full payload stream: mode indicator, character count, encoded data, error correction, and remainder bits.
Before placement, the final bitstream follows a fixed structure. Beginners should read it left to right as: what mode am I in, how long is the payload, what are the payload bits, and what extra bits are needed to finish the QR structure cleanly and safely. A terminator is the short end marker that says the payload is over. Pad bits are zero bits added only to reach the next byte boundary. Pad codewords are full filler bytes added after that if the symbol still has unused data capacity.
- Mode Indicator: 0001 numeric, 0010 alphanumeric, 0100 byte, 1000 Kanji
- Character Count: variable length by version range and mode
- Encoded Data: the actual payload bits
- Closure: terminator, byte padding, pad codewords, ECC, then remainder bits
For a short byte-mode example in versions 1 through 9, the character count field is 8 bits long. That means the encoder writes the 4-bit byte-mode marker, then an 8-bit byte count, then the UTF-8 bytes for each character.
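The whole data-side structure for a short byte-mode payload can be sketched like this. It assumes versions 1 through 9 (8-bit count), a terminator of up to four zero bits, and the alternating pad codewords 0xEC and 0x11 that the QR specification uses; `data_capacity` is a hypothetical parameter holding the number of data codewords for the chosen version and ECC level.

```python
def byte_mode_codewords(text: str, data_capacity: int) -> list[int]:
    """Build the data codewords for a byte-mode payload (versions 1-9)."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("an 8-bit count field cannot describe this payload")
    bits = "0100" + format(len(data), "08b")          # mode indicator + count
    bits += "".join(format(b, "08b") for b in data)   # payload bytes
    capacity_bits = data_capacity * 8
    if len(bits) > capacity_bits:
        raise ValueError("payload does not fit in this capacity")
    bits += "0" * min(4, capacity_bits - len(bits))   # terminator
    bits += "0" * (-len(bits) % 8)                    # pad bits to a byte boundary
    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    pads = (0xEC, 0x11)                               # alternating pad codewords
    i = 0
    while len(codewords) < data_capacity:
        codewords.append(pads[i % 2])
        i += 1
    return codewords


print([f"{cw:02X}" for cw in byte_mode_codewords("HELLO", 16)])
```

With a capacity of 16 data codewords, HELLO occupies seven codewords and the remaining nine are filler.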
For most versions, data is split across multiple Reed-Solomon blocks before ECC is computed. The final stream interleaves data blocks first, then ECC blocks, then appends any remainder bits. That spreads local damage across many blocks instead of destroying one block completely.
- Arrange all data blocks side by side.
- Interleave codeword 1 from block 1, block 2, block 3, and so on.
- Repeat for codeword 2, codeword 3, and the rest of the data codewords.
- Then interleave ECC codewords the same way.
- Append any remainder bits at the very end.
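The interleaving steps above can be sketched with a column-wise walk over the blocks. Blocks may differ in length by one codeword, so shorter blocks simply run out earlier; the function names are illustrative.

```python
from itertools import zip_longest


def interleave(blocks: list[list[int]]) -> list[int]:
    """Take codeword 1 of every block, then codeword 2, and so on."""
    out = []
    for column in zip_longest(*blocks):
        # Shorter blocks contribute None once they are exhausted.
        out.extend(cw for cw in column if cw is not None)
    return out


def build_stream(data_blocks: list[list[int]],
                 ecc_blocks: list[list[int]]) -> list[int]:
    """Interleaved data codewords first, then interleaved ECC codewords."""
    return interleave(data_blocks) + interleave(ecc_blocks)


print(interleave([[1, 2, 3], [4, 5, 6, 7]]))
```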
| Version | Size | Total Modules | Function Pattern Modules | Data Modules |
|---|---|---|---|---|
| 1 | 21x21 | 441 | 233 | 208 (= 26 CW x 8) |
| 7 | 45x45 | 2,025 | 457 | 1,568 (= 196 CW x 8) |
| 20 | 97x97 | 9,409 | 726 | 8,683 (= 1085 CW x 8 + 3 rem) |
| 40 | 177x177 | 31,329 | 1,681 | 29,648 (= 3706 CW x 8) |
Remainder Bits
Remainder bits are the final leftover data-region modules that remain after every data codeword and every ECC codeword has already been assigned. They are not payload bits, not padding codewords, and not error-correction bytes. They are simply extra module positions that exist because the usable data-region area of some QR versions is not an exact multiple of 8 bits.
QR encoding works in bytes for data and ECC codewords, but the matrix is a grid of individual modules. After the function patterns are reserved, the remaining writable modules sometimes total a count like 8n + 3, 8n + 4, or 8n + 7. The QR specification still defines codeword capacity in whole bytes, so those extra 3, 4, or 7 modules cannot become another codeword. They are filled with zero-valued remainder bits instead.
The remainder-bit count is the usable data-module count minus eight times the total codeword count. That subtraction is the whole idea. If a version has exactly enough writable modules for whole bytes, the remainder-bit count is zero. If it has a few extra modules beyond the byte-aligned codeword capacity, those modules become remainder bits.
Take version 20 as an example. It has 8,683 data modules and carries 1085 total codewords across data and ECC, which consume 8,680 modules. Three writable modules are still left over, so the encoder writes three zero remainder bits into them.
| Example Version | Usable Data Modules | Total Codewords | Codeword Bits | Remainder Bits |
|---|---|---|---|---|
| 1 | 208 | 26 | 208 | 0 |
| 2 | 359 | 44 | 352 | 7 |
| 14 | 4,651 | 581 | 4,648 | 3 |
| 20 | 8,683 | 1085 | 8,680 | 3 |
The important takeaway is that remainder bits are a property of the version's final writable-module count, not of the message, ECC level, or mask pattern. Once you choose a version, the remainder-bit count is fixed.
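Because the count depends only on the version, it can be computed by subtracting every function-pattern module from the grid. The sketch below assumes the alignment center list has `version // 7 + 2` entries for versions 2 and up, which is how common open-source encoders count them; the function names are illustrative.

```python
def raw_data_modules(version: int) -> int:
    """Count the writable data-region modules for a QR version (1-40)."""
    size = 17 + 4 * version
    total = size * size
    total -= 3 * 64            # three finders, each 8x8 with its separator
    total -= 2 * (size - 16)   # horizontal and vertical timing strips
    total -= 31                # two 15-bit format copies plus the dark module
    if version >= 2:
        n = version // 7 + 2           # assumed length of the center list
        total -= (n * n - 3) * 25      # 5x5 alignment patterns
        total += 2 * (n - 2) * 5       # overlap with timing was subtracted twice
    if version >= 7:
        total -= 36                    # two 18-bit version blocks
    return total


def remainder_bits(version: int) -> int:
    """Leftover modules after all whole codewords are placed."""
    return raw_data_modules(version) % 8


print(raw_data_modules(20), remainder_bits(20))
```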
Remainder bits are always zero. There is no choice here and there is no optimization step. After all interleaved data and ECC codewords have been placed, the encoder writes zeroes into the remaining eligible modules. A decoder does not interpret them as payload and does not feed them into Reed-Solomon decoding.
In the logical stream order, remainder bits come after everything else:
- mode indicator
- character count field
- encoded payload bits
- terminator and zero padding to byte boundary
- pad codewords
- ECC codewords
- remainder bits
This matters because remainder bits are outside the Reed-Solomon-protected byte stream. They do not belong to any block, they are not interleaved as codewords, and they have no effect on payload recovery.
The matrix walk does not need a special path for remainder bits. The encoder performs the same normal zig-zag traversal used for all writable modules. It places interleaved data and ECC bits first. If the traversal still has eligible modules left after the final codeword bit is consumed, those last modules receive zero remainder bits in the same traversal order.
A decoder reconstructs the sequence of writable modules, demasks them, and reassembles codewords until it has read the version's defined number of total codewords. If a few modules remain after that count, the decoder knows they are remainder bits and stops treating them as codeword data. They are effectively ignored after structural validation.
| Remainder Bits | Versions |
|---|---|
| 0 bits | 1, 7-13, 35-40 |
| 7 bits | 2-6 |
| 3 bits | 14-20, 28-34 |
| 4 bits | 21-27 |
- Remainder bits are not the same thing as the Reed-Solomon remainder that generates ECC codewords. The names are similar, but they describe different parts of the process.
- Remainder bits are not pad codewords. Pad codewords are full bytes inserted before ECC generation; remainder bits are a few trailing modules after all codewords are done.
- Remainder bits do not increase capacity. They exist precisely because there are not enough leftover modules to form a full extra byte.
- Remainder-bit count depends on version only, not on the message contents.
Format Information
Format information tells the decoder two things it must know before reading the payload: the error correction level and the data mask pattern used across the symbol.
The format string is 15 bits total: 2 ECC bits, 3 mask bits, and 10 BCH error-correction bits. BCH is a short binary error-correcting code used here to protect small metadata strings like format and version information.
- Bits 14-13: ECC level
- Bits 12-10: mask pattern
- Bits 9-0: BCH redundancy
The mask choice is not stored somewhere else in the QR symbol. It lives inside the 15-bit format string itself, in bits 12-10, so this is the exact place where the selected mask option is represented.
The BCH correction bits are computed with generator polynomial x^10 + x^8 + x^5 + x^4 + x^2 + x + 1. The full 15-bit string is then XORed with the fixed mask 101010000010010 so that all-zero and all-one format strings do not occur.
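The computation described above fits in a short function. This is a sketch using the generator polynomial as the bit pattern 0x537 and the ECC indicator bits listed on this page; `format_bits` is an illustrative name.

```python
ECC_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}


def format_bits(ecc_level: str, mask: int) -> int:
    """Return the final 15-bit format string as an integer."""
    if not 0 <= mask <= 7:
        raise ValueError("mask pattern must be 0-7")
    data = (ECC_BITS[ecc_level] << 3) | mask       # 5 data bits
    rem = data
    for _ in range(10):
        # Polynomial division by x^10 + x^8 + x^5 + x^4 + x^2 + x + 1 (0x537).
        rem = (rem << 1) ^ ((rem >> 9) * 0x537)
    return ((data << 10) | rem) ^ 0b101010000010010  # apply the fixed XOR mask


print(format(format_bits("L", 0), "015b"))
```

Level M with mask 0 has all-zero data bits, so its final string is exactly the XOR mask.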
Masking does not encrypt the message. It only flips selected data modules so the final symbol avoids patterns that are visually awkward for scanners, such as long runs, heavy clumps, or finder-like stripes in the wrong place.
The format string is written into the symbol twice. Both copies encode the same information. If one copy is damaged, the other can still be decoded.
- Copy 1 sits next to the top-left finder pattern.
- It occupies row 8 columns 0-5, row 8 column 7, row 8 column 8, and column 8 rows 7 and 5-0. Row 6 and column 6 are skipped because they belong to the timing patterns.
- Copy 2 is split between row 8 near the top-right finder and column 8 near the bottom-left finder.
In Copy 1, bit 14 (MSB, most-significant bit) begins at row 8, col 0 and bit 0 (LSB, least-significant bit) ends at col 8, row 0. In Copy 2, bits 14 through 8 run up column 8 from row size-1 to row size-7, and bits 7 through 0 continue along row 8 from column size-8 to column size-1 beside the top-right finder.
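The two layouts can be written out as coordinate maps. The per-bit coordinates below follow the layout used by widely available open-source encoders and should be treated as an assumption to check against the specification; the function name is illustrative.

```python
def format_positions(size: int):
    """Map each format bit (0 = LSB, 14 = MSB) to (row, col) in both copies."""
    copy1, copy2 = {}, {}
    for bit in range(15):
        # Copy 1 wraps around the top-left finder, skipping row 6 and column 6.
        if bit <= 5:
            copy1[bit] = (bit, 8)
        elif bit == 6:
            copy1[bit] = (7, 8)
        elif bit == 7:
            copy1[bit] = (8, 8)
        elif bit == 8:
            copy1[bit] = (8, 7)
        else:
            copy1[bit] = (8, 14 - bit)
        # Copy 2 is split between the top-right and bottom-left finders.
        if bit <= 7:
            copy2[bit] = (8, size - 1 - bit)
        else:
            copy2[bit] = (size - 15 + bit, 8)
    return copy1, copy2


c1, c2 = format_positions(21)
print(c1[14], c1[0], c2[14], c2[0])
```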
| Level | Indicator Bits | Recovery Capacity | Use Case |
|---|---|---|---|
| L | 01 | ~7% | Clean environments |
| M | 00 | ~15% | General use |
| Q | 11 | ~25% | Moderate damage tolerance |
| H | 10 | ~30% | High-damage environments |
| Pattern | Bits | Formula | Visual Effect |
|---|---|---|---|
| 0 | 000 | (row + col) mod 2 = 0 | Checkerboard |
| 1 | 001 | row mod 2 = 0 | Horizontal stripes |
| 2 | 010 | col mod 3 = 0 | Vertical stripes |
| 3 | 011 | (row + col) mod 3 = 0 | Diagonal stripes |
| 4 | 100 | (floor(row/2) + floor(col/3)) mod 2 = 0 | Large blocks |
| 5 | 101 | (row*col) mod 2 + (row*col) mod 3 = 0 | Sparse dots |
| 6 | 110 | ((row*col) mod 2 + (row*col) mod 3) mod 2 = 0 | Dense texture |
| 7 | 111 | ((row + col) mod 2 + (row*col) mod 3) mod 2 = 0 | Mixed pattern |
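The eight formulas translate directly into predicates. In the sketch below a module is flipped when its formula is true and it is not a function module; `is_function` is a hypothetical callback the caller supplies to mark reserved modules.

```python
# One predicate per mask pattern, indexed by the 3-bit mask number.
MASKS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(grid, is_function, mask_id):
    """Return a copy of grid with the chosen mask applied to data modules."""
    flip = MASKS[mask_id]
    return [
        [bit ^ 1 if (not is_function(r, c) and flip(r, c)) else bit
         for c, bit in enumerate(row)]
        for r, row in enumerate(grid)
    ]


print(apply_mask([[0, 0], [0, 0]], lambda r, c: False, 0))
```

Applying the same mask twice restores the original grid, which is why demasking is the same operation as masking.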
Version Information
Present only in versions 7-40. Version information encodes the version number as an 18-bit string so decoders can confirm symbol size without counting modules, which matters when large symbols are distorted or partially obscured.
The version block is 18 bits total: 6 data bits for the version number and 12 BCH correction bits. The BCH generator used is x^12 + x^11 + x^10 + x^9 + x^8 + x^5 + x^2 + 1, and the code can correct up to 3 bit errors in the version block.
- Bits 17-12: version number
- Bits 11-0: BCH redundancy
Small symbols are easier to size from the finder patterns and timing lines alone. By the time the decoder sees version 1 through 6, it can infer the grid size reliably enough that a dedicated 18-bit version block would waste valuable space. Versions 7 and above are large enough that explicit confirmation becomes worth the reserved modules.
| Version | Binary (6-bit) | Full 18-bit String |
|---|---|---|
| 7 | 000111 | 000111 110010 010100 |
| 10 | 001010 | 001010 010011 010011 |
| 20 | 010100 | 010100 100110 100110 |
| 40 | 101000 | 101000 110001 101001 |
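The 18-bit strings can be derived from the generator polynomial rather than looked up. This is a sketch that uses the bit pattern 0x1F25 for the generator; the function names are illustrative.

```python
def version_bits(version: int) -> int:
    """Return the 18-bit version information string as an integer."""
    if not 7 <= version <= 40:
        raise ValueError("version information exists only for versions 7-40")
    rem = version
    for _ in range(12):
        # Division by x^12 + x^11 + x^10 + x^9 + x^8 + x^5 + x^2 + 1 (0x1F25).
        rem = (rem << 1) ^ ((rem >> 11) * 0x1F25)
    return (version << 12) | rem


def grouped(bits: int) -> str:
    """Format 18 bits as three groups of six for easier reading."""
    s = format(bits, "018b")
    return " ".join(s[i:i + 6] for i in range(0, 18, 6))


for v in (7, 10, 20, 40):
    print(v, grouped(version_bits(v)))
```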
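The version block is stored twice, and both copies encode the same 18 bits. In the top-right copy the bits fill a block 6 rows tall and 3 columns wide, starting from the least-significant bit: bit i sits at row floor(i/3), column size-11 + (i mod 3). The bottom-left copy uses the same rule with row and column swapped.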
- Copy 1: top-right block, rows 0-5 and columns size-11 through size-9.
- Copy 2: bottom-left block, rows size-11 through size-9 and columns 0-5.
- Copy 2 is the transpose of Copy 1: rows and columns are swapped.
Zig-Zag Bit Ordering
Once the payload and ECC bytes exist, the QR encoder must pour those bits into the remaining empty modules. It does that by sweeping right-to-left through 2-column-wide vertical strips, alternating travel direction up and down.
Step 1: Start at the bottom-right corner of the symbol.
Step 2: Move in a 2-column-wide strip, using the right column first and then the left column at each row.
Step 3: Alternate direction by strip: up, then down, then up again.
Step 4: Skip column 6 entirely because it is the vertical timing pattern.
Step 5: Skip function modules without advancing the bitstream index.
Step 6: Stop when all eligible data modules are filled.
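The six steps can be sketched as a function that lists module coordinates in visiting order. `is_function` is a hypothetical callback that returns True for reserved modules; the walk itself skips column 6 and alternates direction per strip.

```python
def zigzag_positions(size: int, is_function) -> list[tuple[int, int]]:
    """Return (row, col) of every data module in placement order."""
    positions = []
    right = size - 1      # right-hand column of the current 2-wide strip
    upward = True         # the first strip travels from bottom to top
    while right >= 1:
        if right == 6:    # column 6 is the vertical timing pattern
            right = 5
        rows = range(size - 1, -1, -1) if upward else range(size)
        for row in rows:
            for col in (right, right - 1):   # right column first, then left
                if not is_function(row, col):
                    positions.append((row, col))
        upward = not upward
        right -= 2
    return positions


order = zigzag_positions(21, lambda r, c: False)
print(order[:4])
```

Bits are then assigned to these positions one by one, so skipped function modules never consume a bit.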
Imagine codeword 0 begins with bits 1 0 1 1 0 0 1 0. Bit 7 goes into the first visited data module, bit 6 goes into the second, and so on. Only after every bit of every interleaved codeword is placed are any leftover remainder modules filled with zero bits.
Within each 2-wide strip, the right column is placed first and the left column second. At a given row, the sequence is right-current, left-current, right-next, left-next.
Codewords are placed most-significant-bit first. Bit 7 of codeword 0 goes into the first visited data module; bit 0 of the final codeword is placed just before any remainder bits.
After all data and ECC codewords are placed, some versions still have a few leftover modules. Those are filled with zero-valued remainder bits and do not carry payload information.
Zig-zag placement distributes data across the full area of the symbol. Combined with block interleaving, that means local physical damage corrupts many different blocks a little instead of destroying one block completely, which is exactly what Reed-Solomon correction wants.
Version and ECC Tradeoffs
A larger version increases module count and capacity. A higher ECC level increases redundancy but reduces space available for payload bytes. Beginners can think of this as a trade between room and resilience.
| ECC | Approx. Recovery | Main Tradeoff |
|---|---|---|
| L | ~7% | Maximum payload capacity, least redundancy. |
| M | ~15% | Balanced default setting. |
| Q | ~25% | Higher resilience, lower payload budget. |
| H | ~30% | Strongest redundancy, smallest payload budget. |
ECC and Error Correction
ECC means error correction codewords: extra bytes added to a QR code so a scanner can still recover the message when part of the symbol is damaged, blurred, cropped, dirty, or hard to read. The big idea is simple: the QR does not only store your message. It also stores extra check bytes that describe the message in a recoverable way.
Imagine writing down a message and then writing down some extra helper notes about that message. If one part of the original message gets smudged, those helper notes can help you reconstruct what was lost. That is what ECC is doing. It is not making a full duplicate copy of the message. It is adding carefully chosen recovery bytes.
In QR, the important working unit is the codeword, which is one byte. Recovery happens at codeword level, not at raw pixel level. First the scanner turns the image back into dark and light modules, then into bits, then into codewords. Only after that does ECC begin helping.
- Data codeword: one byte that belongs to the real message stream.
- ECC codeword: one byte added only for recovery.
- Block: one chunk of codewords that gets its own local ECC protection.
- Interleaving: mixing bytes from different blocks together before placement so local damage gets spread out.
- Erasure: a byte position the decoder already knows is missing or unreliable.
- Demasking: reversing the chosen mask rule so the decoder sees the real written data bits again.
A QR symbol contains both fixed structure and message-carrying structure. ECC protects only the payload stream: mode indicator, character count, encoded payload bits, terminator bits, pad structure, and the final data codewords made from those bits. Finder patterns, timing patterns, alignment patterns, and format/version metadata are structural helpers; they are not part of the protected message polynomial.
Before looking at the real QR formulas, use a tiny toy example. Suppose we store three bytes A, B, and C, then create one helper byte with this rule:
If the stored values are A = 12, B = 5, and C = 9, then:
Now imagine B is lost, but A, C, and P are still known. You can recover B by rearranging the same rule:
QR does not use this exact toy formula. Real QR uses a much stronger system that can recover more complicated patterns of missing or wrong bytes. But the toy example shows the core idea correctly: extra bytes can make lost bytes recoverable.
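The toy idea can be tried out in code. The exact helper rule is not shown in this text, so the sketch below assumes the simplest possible one, a plain sum P = A + B + C; both function names are made up for illustration.

```python
def helper_byte(a: int, b: int, c: int) -> int:
    """Assumed toy rule: the helper value is the sum of the three stored values."""
    return a + b + c


def recover_missing(p: int, known1: int, known2: int) -> int:
    """Rearrange the same rule to rebuild the one value that was lost."""
    return p - known1 - known2


p = helper_byte(12, 5, 9)
print(p, recover_missing(p, 12, 9))
```

With one helper value, only one lost value at a known position can be rebuilt, which is why real QR codes need a stronger scheme.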
| Version 1 Example | Total Codewords | Data Codewords | ECC Codewords | Blocks |
|---|---|---|---|---|
| L | 26 | 19 | 7 | 1 |
| M | 26 | 16 | 10 | 1 |
| Q | 26 | 13 | 13 | 1 |
| H | 26 | 9 | 17 | 1 |
The familiar levels L, M, Q, and H are table choices, not formulas that directly turn a percentage into a byte count. For each version and ECC level, the QR specification already defines the exact number of data codewords, the number of blocks, and the number of ECC codewords per block.
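The first exact formula to learn is just a counting identity: total codewords = data codewords + ECC codewords.
For version 1 at level M, the table says: 26 = 16 + 10.
Because each codeword is one byte, you can also count bits: total bits = total codewords x 8.
Using version 1 M again: 16 x 8 = 128 data bits, 10 x 8 = 80 ECC bits, and 26 x 8 = 208 bits in total.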
Those are exact QR counts, not toy math. They tell you how much space belongs to message bytes and how much space belongs to recovery bytes.
As versions grow, the structure becomes more complex. Many versions split the payload across several blocks, and some versions use two block groups of slightly different data lengths. The encoder does not invent those counts on the fly; it looks them up from the QR version/ECC tables, then computes recovery bytes separately for each block.
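When people ask, “How much damage can ECC fix?”, the first exact rule is this: correctable unknown errors per block = floor(ECC codewords in that block / 2).
The word floor just means “round down.” So if a block has 10 ECC codewords: floor(10 / 2) = 5.
That means the block can guarantee recovery from up to 5 unknown bad codewords in that block. If the decoder already knows which positions are unreliable, those are called erasures and the trade becomes more favorable: each erasure costs one ECC codeword instead of two, so the budget is 2 x errors + erasures <= ECC codewords.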
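The budget rule can be expressed as a small check. This sketch uses the generic Reed-Solomon bound, where an unknown error costs two ECC codewords and an erasure costs one; real decoders, and the specification's tables for some small symbols, may promise less. The function names are illustrative.

```python
def max_unknown_errors(ecc_codewords: int) -> int:
    """Largest number of unknown bad codewords a block can absorb."""
    return ecc_codewords // 2


def is_correctable(ecc_codewords: int, errors: int, erasures: int) -> bool:
    """Generic Reed-Solomon budget: errors cost two, erasures cost one."""
    return 2 * errors + erasures <= ecc_codewords


print(max_unknown_errors(10), is_correctable(10, 3, 4))
```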
Now scale the simple ideas up from one toy group of bytes to a full QR symbol:
- 1. Choose version and ECC level: this fixes the total byte budget and the recovery budget.
- 2. Build the data bytes: mode bits, count field, payload bytes, terminator, zero padding, and pad codewords become the data codewords.
- 3. Split into blocks: some QR symbols use one block, others use many.
- 4. Compute ECC bytes per block: each block gets its own recovery bytes.
- 5. Interleave: mix bytes from different blocks together so one torn corner does not destroy one whole block all at once.
- 6. Place the bits into modules: the combined data and ECC stream is written into the data region of the symbol.
- 7. Decode in reverse: the scanner reads modules, demasks them, rebuilds codewords, deinterleaves blocks, and then uses the ECC bytes to repair damaged codewords.
That full-symbol flow is the practical version of ECC. The encoder adds recovery bytes block by block, then the decoder later uses those same block rules to repair missing or wrong codewords.
Use version 1 M because it is small and has only one block: 26 total codewords, made of 16 data codewords and 10 ECC codewords.
Here the “apply it to the whole symbol” story is straightforward:
- The message is converted into 16 data codewords.
- The encoder computes 10 ECC codewords from those 16 data codewords.
- The final symbol stores all 26 codewords.
- If a few codewords are later corrupted, the decoder can use the 10 ECC codewords to repair the damaged data, as long as the damage stays within the correction budget.
Larger symbols follow the same story, but with more blocks. The only new complication is that the bytes are mixed together before placement and unmixed again during decoding.
| Level | Approx. Recovery Summary | What Actually Changes Under the Hood | Main Cost |
|---|---|---|---|
| L | ~7% | Fewest ECC codewords in the version table. | Least resilience, most payload room. |
| M | ~15% | More ECC codewords and sometimes different block structure. | Lower payload room than L. |
| Q | ~25% | Substantially more parity bytes per symbol. | Often forces a larger version for the same message. |
| H | ~30% | Highest parity allocation in the version table. | Smallest payload budget and denser output for the same message. |
The important beginner takeaway is that a higher level means more recovery bytes and fewer message bytes. It does not mean the QR has magically become unbreakable.
- ECC is not a second full copy of the message.
- ECC helps with damaged codewords, not with every possible failure. If the scanner cannot even find the symbol structure, ECC never gets a chance to help.
- More ECC is not always better. If it forces the modules to become too small for the printer, screen, or camera, scanning can get worse.
- A logo can work only if the remaining codeword damage stays inside the recovery budget.
Everything above is the zero-knowledge version. The real QR algorithm uses Reed-Solomon coding over GF(256). That means each block of data codewords is treated as a polynomial, meaning an algebraic expression whose coefficients are byte values, and the encoder computes the ECC bytes as a remainder after division by a generator polynomial. That is why you will see names like generator polynomial, syndromes, error locator polynomial, and Forney's formula in deeper references.
QR Reed-Solomon arithmetic happens in the finite field GF(256), using the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (hex 0x11D). Addition and subtraction are both XOR. Multiplication and division are field operations, usually implemented with log and antilog lookup tables instead of slow polynomial math at runtime.
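The table-driven arithmetic and the remainder-based ECC computation can be sketched together. This is a compact illustration rather than a tuned implementation; it assumes generator roots alpha^0 through alpha^(n-1), which is the usual QR convention, and all names are illustrative.

```python
# Log and antilog tables for GF(256) with primitive polynomial 0x11D.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]   # doubled table avoids a modulo in gf_mul


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using the lookup tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n_ecc: int) -> list[int]:
    """Product of (x + alpha^i) for i in 0..n_ecc-1, highest degree first."""
    g = [1]
    for i in range(n_ecc):
        nxt = g + [0]                       # g * x
        for j, coef in enumerate(g):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])
        g = nxt
    return g


def rs_ecc(data: list[int], n_ecc: int) -> list[int]:
    """ECC codewords = remainder of data * x^n_ecc divided by the generator."""
    g = generator_poly(n_ecc)
    rem = list(data) + [0] * n_ecc
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], factor)
    return rem[len(data):]


def poly_eval(poly: list[int], point: int) -> int:
    """Evaluate a polynomial (highest degree first) with Horner's rule."""
    y = 0
    for coef in poly:
        y = gf_mul(y, point) ^ coef
    return y


block = list(range(16))
codeword = block + rs_ecc(block, 10)
print([poly_eval(codeword, EXP[i]) for i in range(10)])
```

A valid codeword evaluates to zero at every generator root, which is the same consistency check a decoder runs first.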
- Read the symbol, undo masking, and reconstruct the interleaved stream of data and ECC codewords.
- Deinterleave the stream back into the original Reed-Solomon blocks, meaning restore the mixed placement order into the original per-block groupings.
- For each block, compute syndromes by evaluating the received polynomial at the generator roots. A syndrome is a compact consistency check; all-zero syndromes mean the block is consistent.
- If syndromes are nonzero, derive an error locator polynomial, an algebraic object whose roots identify the bad codeword positions, commonly with Berlekamp-Massey or the extended Euclidean algorithm.
- Run a root search over field elements to locate the failing codeword positions.
- Compute error magnitudes, commonly with Forney's formula, a standard step that turns the locator information and syndromes into the amount each bad codeword must change, then patch the bad codewords.
- If correction succeeds for every block, reassemble the corrected data codewords and continue normal QR payload parsing.
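The syndrome and correction steps can be seen in miniature with a single-error case. The sketch below is deliberately simplified: it assumes at most one bad codeword and uses only the first two syndromes, so it replaces Berlekamp-Massey and Forney's formula with their one-error special case. All names are illustrative.

```python
# Log and antilog tables for GF(256) with primitive polynomial 0x11D.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def syndromes(received: list[int], n_ecc: int) -> list[int]:
    """Evaluate the received polynomial at alpha^0 .. alpha^(n_ecc-1)."""
    out = []
    for i in range(n_ecc):
        y = 0
        for coef in received:            # Horner's rule, highest degree first
            y = gf_mul(y, EXP[i]) ^ coef
        out.append(y)
    return out


def correct_single_error(received: list[int], n_ecc: int = 2) -> list[int]:
    """Repair at most one bad codeword using the first two syndromes."""
    s = syndromes(received, n_ecc)
    if not any(s):
        return list(received)            # all-zero syndromes: block is consistent
    s0, s1 = s[0], s[1]
    if s0 == 0 or s1 == 0:
        raise ValueError("damage is not a single-codeword error")
    power = (LOG[s1] - LOG[s0]) % 255    # locator: alpha^power = s1 / s0
    index = len(received) - 1 - power
    if index < 0:
        raise ValueError("error location falls outside the block")
    fixed = list(received)
    fixed[index] ^= s0                   # magnitude of a single error is s0
    return fixed


print(syndromes([1, 3, 7], 2), correct_single_error([1, 3, 7]))
```

Here [1, 3, 2] is a valid three-codeword block with two ECC codewords, and the damaged copy [1, 3, 7] is repaired back to it.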
How this connects to the rest of the page: the earlier sections explain how the symbol is built, where the data lives, and how the bits are placed. This section explains how the extra recovery bytes let the decoder repair damage after those earlier structures have been read correctly.