Baked beans or dog food? Find out, with talking barcodes

Let's imagine that you can't see to find out what's in that tin you just took out of the cupboard. This isn't a hypothetical scenario if you're visually impaired. Rice pudding, baked beans, or dog food? Doesn't bear thinking about, does it? Later on in this post I'll offer a simple solution for Android devices, but first here's a bit of background...

I have an interest in assistive technology, and for the last few years have been attending Sight Village to geek out on the latest gadgets for blind and partially sighted people. Historically there has been a running theme of hugely expensive software and/or hardware packages to do things like screen magnification and screen reading. This year I felt that things were substantially different - the buzz was about apps for iPhone and Android, and it was telling that there were lots of iPads in evidence. I think Apple have a particular mindshare here because of their VoiceOver extensions to iOS:

My experience of screen readers and magnifiers under older Symbian and Windows Mobile devices has been pretty dire - memory and CPU hogging, and prone to crashing the device due to being only tangentially integrated with the operating system. To add insult to injury these packages have tended to be extremely expensive due to the limited size of their market. The move with Android and iOS to support accessibility from the ground up is very encouraging. Let's see what sorts of apps it makes possible...

Here's a very powerful example. SayText is free OCR and text-to-speech software for the iPhone:

But let's head back towards baked beans and dog food... At Sight Village this year there were several systems for labelling things and then using a reader to identify them. The RNIB's PenFriend is probably the most impressive of these. You might expect that this uses RFID tags, but it's actually based on optical recognition. This makes the labels effectively disposable as the unit cost is very low. Here's John Godber from the RNIB explaining how this system works:

The PenFriend was pretty impressive to play with in real life, but I found myself thinking that many of the items you would be interested in (e.g. CDs/DVDs and food packaging) already have an optical tag on them - the humble barcode (UPC). This took me back to my first adventures with barcode scanning on mobile phones. Google Goggles for Android is probably the most impressive app in this category, as it will scan barcodes, carry out on-the-fly OCR of text, translate text into other languages, and carry out a Google search. Here's a video of Hartmut Neven from Google demoing recent enhancements:

I found myself thinking that a pared-down version of this could be very handy for a visually impaired Android user. Enter the Scripting Layer for Android (SL4A), which exposes the underlying Android APIs to a range of scripting languages. This makes the barrier to entry very low indeed for doing interesting things with your device. Now you can write trivial Python, Perl or Ruby scripts that use geo-location, the accelerometer, the compass, speech synthesis and speech recognition, scan barcodes with the camera, and interact with the Internet. Here's a video demoing some simple Python scripting on the phone itself:

The SL4A community has helpfully put together a large number of tutorials, including this nice example of barcode scanning for books from Google's Matt Cutts:

import android
droid = android.Android()
code = droid.scanBarcode()
isbn = int(code['result']['SCAN_RESULT'])
url = "http://books.google.com?q=%d" % isbn
droid.startActivity('android.intent.action.VIEW', url)

This six-line script is all it takes to start up the Android barcode scanner, read the book's barcode and then carry out a Google Books search in the Android web browser. Unfortunately, if you turn on the Android TalkBack screen reader, it insists on reading out everything it comes across, including the URL of the Google site being visited.

So, here goes with talking barcodes!

This script is a quick and dirty hack to scan a barcode, search the Google Products database for it, pull out some interesting attributes (title, price, description, etc.) and speak them.

import android
import urllib2
import re

from htmlentitydefs import name2codepoint
name2codepoint['#39'] = 39

def unescape(s):
  return re.sub('&(%s);' % '|'.join(name2codepoint),
    lambda m: unichr(name2codepoint[m.group(1)]), s)

droid = android.Android()
droid.ttsSpeak("Ready to scan bar code")

code = droid.scanBarcode()
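# keep the barcode as a string - int() would drop the leading zeros that many UPC and EAN codes start with
barcode = code.result['extras']['SCAN_RESULT']
print "Barcode: ", barcode

url = "http://www.google.com/products?q=%s" % barcode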
handler = urllib2.urlopen(url)
response = handler.read()
handler.close()

# clunky code to pull out the interesting bits - was hoping to use xml.dom.minidom instead
# (the non-greedy .*? stops the title match at the first </h3> rather than the last one on the page)
rtitle = re.compile(r'<h3 class="result-title">.*?</h3>', re.M|re.S).search(response)
rattrs = re.compile(r'<p class="result-attributes">([^<]+)</p>', re.M|re.S).search(response)
rdescr = re.compile(r'<p class="result-desc">([^<]+)</p>', re.M|re.S).search(response)
rprice = re.compile(r'<span class="main-price">([^<]+)</span>', re.M|re.S).search(response)

output = ""
if rtitle is not None:
  # keep only the link text from inside the title heading
  output = re.sub(r'.* >(.*)</a></h3>', r'\1', rtitle.group(0))
# group(1) is the text already captured between the tags;
# the leading space stops the pieces running together when spoken
if rattrs is not None:
  output += " " + rattrs.group(1)
if rdescr is not None:
  output += " " + rdescr.group(1)
if rprice is not None:
  output += " " + rprice.group(1)

print unescape(output)
droid.ttsSpeak(unescape(output))
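
One thing the script takes for granted is that every scan comes back with a numeric code. Here is a defensive sketch, assuming the scan payload has the same shape the script reads (a dictionary with 'extras' holding 'SCAN_RESULT'). What SL4A actually hands back for a cancelled scan is an assumption here, so the helper (whose name is made up for this post) just treats anything missing or non-numeric as "no barcode":

```python
def extract_barcode(scan_result):
    """Return the scanned barcode as a string of digits, or None.

    scan_result is assumed to look like the payload the script reads:
    {'extras': {'SCAN_RESULT': '...'}}. Anything else counts as no barcode.
    """
    if not isinstance(scan_result, dict):
        return None
    extras = scan_result.get('extras')
    if not isinstance(extras, dict):
        return None
    text = extras.get('SCAN_RESULT')
    if text is None:
        return None
    text = str(text).strip()
    # Keep the code as text: int() would drop the leading zeros
    # that many UPC and EAN codes start with.
    if not text.isdigit():
        return None
    return text
```

In the script this would slot in as barcode = extract_barcode(code.result), with a spoken "no barcode found" message whenever it returns None.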

With a little more finessing I think it should be possible to replace the crufty regular expressions with calls to the Python DOM library. I should mention that the HTML entity decoding above is courtesy of dluce's posting to Stack Overflow. A Python guru could probably replace most of this with a couple of well-chosen one-liners.
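
As a rough idea of what that might look like, here is a sketch that swaps the regular expressions for the standard library's HTMLParser. I've used HTMLParser rather than xml.dom.minidom because minidom needs well-formed XML, which real-world HTML rarely is. The class names are the ones the regular expressions above look for; ClassTextParser and extract_fields are names invented for this sketch, and the markup is assumed to match what the script expects.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from html.entities import name2codepoint
    unichr = chr
except ImportError:  # Python 2, as used by SL4A
    from HTMLParser import HTMLParser
    from htmlentitydefs import name2codepoint


class ClassTextParser(HTMLParser):
    """Collect the text inside the first element carrying each wanted class."""

    def __init__(self, wanted):
        HTMLParser.__init__(self)
        self.wanted = list(wanted)
        self.found = {}      # class name -> collected text
        self.current = None  # class currently being captured, if any
        self.tag = None      # tag name of the element being captured
        self.depth = 0       # nesting depth of that tag name
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            # Only count nested tags of the same name, so <br> etc. are harmless
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get('class') or '').split()
        for name in self.wanted:
            if name in classes and name not in self.found:
                self.current, self.tag = name, tag
                self.depth, self.parts = 1, []
                return

    def handle_endtag(self, tag):
        if self.current is None or tag != self.tag:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace so the text reads cleanly aloud
            self.found[self.current] = ' '.join(''.join(self.parts).split())
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.parts.append(data)

    def handle_entityref(self, name):
        # Only called where entities are not decoded automatically (Python 2)
        if self.current is not None and name in name2codepoint:
            self.parts.append(unichr(name2codepoint[name]))

    def handle_charref(self, name):
        # Likewise for numeric references such as &#39; or &#x27;
        if self.current is not None:
            base = 16 if name[0] in 'xX' else 10
            self.parts.append(unichr(int(name.lstrip('xX'), base)))


def extract_fields(html, classes=('result-title', 'result-attributes',
                                  'result-desc', 'main-price')):
    """Return the text of each class found, in the order given."""
    parser = ClassTextParser(classes)
    parser.feed(html)
    parser.close()
    return [parser.found[c] for c in classes if c in parser.found]
```

The list it returns could then be joined with spaces and passed straight to droid.ttsSpeak. Because the parser decodes entities as it goes, the separate unescape step would no longer be needed.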

Ironically, whilst the Google Products database search did just fine with a bunch of books and CDs, it didn't match my test case tin of baked beans. I think a useful next step would be to link the script up with the barcode database provided by upcdata.info, as a database of last resort. This loses the immediate potential for comparison shopping that we get through the Google database, though. I like to picture a blind person at the supermarket being asked if they need any help - "Thanks but I'm OK. Think I'll get this from the local shop down the road, my phone says it's cheaper there" :-)
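
The "database of last resort" idea boils down to trying each source in turn and stopping at the first hit. A minimal sketch, assuming each source is wrapped in a function that takes the barcode as a string and returns a description or None. The function name and the stand-in sources below are made up for illustration; a real version would wrap the Google Products search above and whatever interface upcdata.info offers, which isn't shown here.

```python
def lookup_with_fallback(barcode, sources):
    """Try each (name, lookup) pair in order.

    Returns (name, description) for the first source with a match,
    or (None, None) if none of them recognise the barcode.
    """
    for name, lookup in sources:
        try:
            description = lookup(barcode)
        except Exception:
            # A failing source (network error, changed markup) should not
            # stop us from trying the remaining ones
            description = None
        if description:
            return name, description
    return None, None


# Stand-in sources backed by made-up sample data, purely for illustration
primary = {'0000000000017': 'A paperback book'}.get
last_resort = {'0000000000024': 'Tin of baked beans'}.get

sources = [('primary', primary), ('last resort', last_resort)]
source, description = lookup_with_fallback('0000000000024', sources)
```

Keeping the source name alongside the description means the script can say where the match came from, and skip the price comparison when the answer came from the fallback.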

Now it's over to you to think of some other applications for SL4A, like this nice example of onboard rocket telemetry gathering...

5 comments:

Biker-X 17 August 2010 at 12:04
Great post Martin. Picking up on your early side issue re OCR to speech, are you aware of any apps that would take Word or PDF files and turn them into iTunes available Podcasts? I envisage using iTunes for the usual but adding in the ability to handle work reading (without doing the reading). It would be so nice to have the flexibility to listen to committee papers, project reports etc. while away from the laptop. Maybe your assistive technology knowledge could help out?
Martin Hamilton 17 August 2010 at 18:28
Funny you should mention that - I'd been thinking just recently about doing something like this using Google Docs. One of the recent enhancements to Google Docs is a feature to "OCR" PDFs...
http://googledocs.blogspot.com/2010/06/optical-character-recognition-ocr-in.html
(though it has some limitations on document size)

In other news, why stop at text-to-speech when you could translate on the fly as well :-) http://weston.ruter.net/projects/google-tts/ But note that this is limited to 100 characters at a time :-(

Now if we could drive the OCR of that PDF using an API, then there's a clear path from PDF to MP3. Aha, there is an API for the OCR stuff... http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#OCR

Here's a guy who has knocked up a nice little app to convert his ebooks to MP3 using the Google TTS, handling the 100 character chunking automatically... http://www.codeproject.com/KB/audio-video/GoogleTTS-Ebook-Reader.aspx
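
The chunking itself is simple enough. A rough sketch, assuming the 100 character limit mentioned above and breaking between words where possible (chunk_text is just a name I've made up):

```python
def chunk_text(text, limit=100):
    """Split text into pieces of at most `limit` characters, breaking between words."""
    chunks = []
    current = ''
    for word in text.split():
        # A single word longer than the limit has to be cut mid-word
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(word[:limit])
            word = word[limit:]
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= limit:
            current += ' ' + word
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the TTS service in turn and the resulting audio joined up.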
Notlistening 25 August 2010 at 09:55
Hey Martin,

Can you get in direct contact with me, as I think we are working towards similar goals here. I am doing some interesting work on TTS under Linux using SAPI. I have very much the same aims as you and it would be great to chat about ideas and stuff. Mail me thomas lloyd at yahoo dot com.
F.T 14 January 2012 at 07:29
This is a great great tutorial full of information. How to follow you on g+?
Martin Hamilton 24 January 2012 at 11:56
I'm at http://gplus.to/comth :)