updated 04:38 pm EDT, Tue August 6, 2013
Problem linked to JBIG2 compression algorithm, workaround available
A handful of Xerox devices have been found to randomly substitute characters while performing a copy action, but not when an optical character recognition (OCR) analysis is performed. Experiments have confirmed that both the Xerox WorkCentre 7535 and 7556 perform the swap, and as many as eight other Xerox devices may also manifest the issue. The researcher who found the problem discovered that "patches of the pixel data are randomly replaced in a very subtle and dangerous way: The scanned images look correct at first glance, even though numbers may actually be incorrect."
According to researcher David Kriesel, "the error does not occur if PDFs are scanned with OCR, or TIFs are scanned (the latter seems plausible, as the pure image data should be saved into the TIF). Additionally, there seems to be a correlation between font size and scan dpi used. I was able to reliably reproduce the error for 200 DPI PDF scans without OCR, of sheets with Arial 7pt and 8pt numbers."
Since the original discovery, the error has been linked to overzealous compression within the scanner and printer combination. The JBIG2 algorithm, when used in "normal" mode (but not at higher quality settings), has been found to make the substitution during copy or document saving operations when OCR is not being used.
The error goes beyond a simple "8 for 6" exchange, as seen in the third image below, because the JBIG2 routine "creates a dictionary of image patches it finds 'similar.' Those patches then get reused instead of the original image data, as long as the error generated by them is not 'too high'." Xerox confirmed the problem with the researcher in a conference call a few days after the discovery, and the substitution effect is seen in the second image below.
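The dictionary mechanism described above can be illustrated with a toy sketch. This is not the Xerox firmware or a real JBIG2 encoder. The 5x5 glyph bitmaps, the pixel-count error measure, the max_diff threshold, and all function names are assumptions made up for illustration. It only shows how reusing a "similar" patch can turn one digit into another once the tolerated error is loose enough.

```python
# Toy illustration of lossy patch-dictionary compression (NOT real JBIG2).
# Glyph bitmaps, the error measure, and max_diff are hypothetical.

def to_bitmap(rows):
    """Convert rows of '#'/'.' characters into a tuple of 0/1 pixel rows."""
    return tuple(tuple(1 if ch == "#" else 0 for ch in row) for row in rows)

def pixel_diff(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(patches, max_diff):
    """Store each patch as an index into a dictionary of 'similar' patches.

    A patch reuses the first dictionary entry whose pixel difference is
    at most max_diff; otherwise it is added as a new entry.
    """
    dictionary, indices = [], []
    for patch in patches:
        for i, known in enumerate(dictionary):
            if pixel_diff(patch, known) <= max_diff:
                indices.append(i)  # reuse the existing patch
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    """Rebuild the page from dictionary entries only."""
    return [dictionary[i] for i in indices]

# Two crude 5x5 glyphs that differ in only two pixels.
SIX = to_bitmap([".###.", "#....", "####.", "#...#", ".###."])
EIGHT = to_bitmap([".###.", "#...#", ".###.", "#...#", ".###."])

page = [SIX, EIGHT, SIX]

# Loose threshold: the 8 is judged "similar" to the stored 6 and replaced.
lossy = decompress(*compress(page, max_diff=2))
print(lossy == [SIX, SIX, SIX])

# Zero tolerance: every distinct patch gets its own entry, nothing changes.
exact = decompress(*compress(page, max_diff=0))
print(exact == page)
```

In this sketch the output still looks like clean, sharp digits, which is why this kind of substitution is harder to notice than ordinary compression blur.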
Responding to the issue, Xerox has said that the default print quality of "higher" does in fact prevent the problem from manifesting itself. In a statement, the company claims that "for data integrity purposes, we recommend the use of the factory defaults with a quality level set to 'higher.' In cases where lower quality/higher compression is desired for smaller file sizes, we provide the following message to our customers next to the quality settings within the device web user interface: 'The normal quality option produces small file sizes by using advanced compression techniques. Image quality is generally acceptable, however, text quality degradation and character substitution errors may occur with some originals.'" However, the alert is not given if the resolution is set at the printer at the time of the scan.
The models reportedly having the issue are the Xerox WorkCentre 7530, 7328, 7346, 7546, 7535, and 7556. The Xerox ColorQube 9203 and 9201 are also allegedly manifesting the problem, according to reader reports.