The Seeing Eye Droid - Anticipating Google Glass


In a recent post I talked about using Google Glass as a sort of "seeing eye" for the blind, in a similar way to my talking barcode scanner hack. For a blind or partially sighted person, the world is full of obstacles like the small print on the postbox photo above. Let's take a moment to see what we can do to make them a thing of the past...

Google Glass hasn't come out yet, but we're expecting it to run a variant of Android. The good news is that there are some 750m Android devices already in circulation, and there is quite a lot we can do with that existing hardware. It won't be as convenient or serendipitous as Glass, but it does mean we can crack on and get hacking straight away.

In this post we'll see how to write a Seeing Eye app that uses your Android device's camera to take a picture, uploads it to a cloud service for optical character recognition, then downloads the results and speaks them aloud. We'll write both the app and the cloud service, and look at some sample output. If you take a sneaky look ahead, you will see that the app and the cloud service each come in at a mere 13 lines of code - an experienced hacker could shrink their footprint quite a bit further, but this is really a quick working demo to illustrate the principles at work.

For the barcode scanner hack I used something called SL4A, the Scripting Layer for Android. SL4A wraps up the Android APIs in a form that you can use from popular scripting languages like Ruby, Python and Perl - and less popular ones like Lua and JavaScript (via Rhino). For non-programmers, the point of this is that you can write a very powerful app in just a few lines of code. Your app can talk to all of the exotic hardware lurking in the typical Android phone or tablet, such as its barometer, camera, GPS, accelerometer and humidity sensors. It can also make data connections over WiFi or the mobile network to talk to cloud services.
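
To give a flavour of just how little ceremony SL4A involves, here's a throwaway sketch - nothing to do with the Seeing Eye app itself - that says hello out loud, pops up a toast notification, and then takes a single accelerometer reading. The ttsSpeak, makeToast and sensing calls are all standard SL4A facade methods, though the sensing API has shifted a little between SL4A releases, so treat this as illustrative:
#!/usr/bin/python

# Throwaway SL4A sketch: say hello, then take one accelerometer reading.
import android
import time

droid = android.Android()            # handle onto the SL4A facade
droid.ttsSpeak("Hello from SL4A")    # text to speech
droid.makeToast("Hello from SL4A")   # brief on-screen notification

droid.startSensingTimed(1, 250)      # 1 = all sensors, sampled every 250ms
time.sleep(1)                        # give the sensors a moment to report
reading = droid.sensorsReadAccelerometer().result
droid.stopSensing()
droid.ttsSpeak("The accelerometer reads %s" % reading)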

Here we'll use SL4A with Python and Chris AtLee's poster Python extension to take a picture, send it to our cloud service for optical character recognition, and then read out the results. Even if you are not a programmer, I think you will be able to figure out the main elements of the code without too much difficulty:
#!/usr/bin/python

import android
import urllib2

from poster.encode import multipart_encode
from poster.streaminghttp import register_openers

# Let poster register its streaming HTTP handlers with urllib2
register_openers()

droid = android.Android()
droid.ttsSpeak("Ready to take a picture")

# Fire up the camera and save the snapshot to the SD card
droid.cameraInteractiveCapturePicture("/sdcard/DCIM/sl4a/i_see.jpg")

# Build a multipart/form-data upload from the image (opened in binary mode)
datagen, headers = multipart_encode({"file": open("/sdcard/DCIM/sl4a/i_see.jpg", "rb")})

droid.ttsSpeak("Attempting to OCR the image")

# POST the image to the cloud service and speak whatever comes back
request = urllib2.Request("http://your.server.here/path/to/ocr.pl", datagen, headers)
response = urllib2.urlopen(request).read()
droid.ttsSpeak(response)
Now, here's that "cloud service" - actually a Perl script running under the Apache CGI handler ;-) All it really does is run the Tesseract open source Optical Character Recognition package and then return its results. As an aside, you might have noticed that a similar OCR facility is now available as part of document uploads in Google Docs/Drive.
#!/usr/bin/perl

# Plain text response, ready for the app to read aloud
print "Content-type: text/plain\r\n\r\n";

use CGI;
$c = new CGI;

$filename = $c->param('file');
$upload_filehandle = $c->upload('file');

# Save the uploaded image to a temporary file named after this process ID
open(UPLOADFILE, ">/data/tmp/$$.jpg") or die "$!";
binmode UPLOADFILE;
while(<$upload_filehandle>) { print UPLOADFILE; }
close UPLOADFILE;

# Run Tesseract over the image and return whatever text it finds
system("/usr/bin/tesseract /data/tmp/$$.jpg /data/tmp/$$");
print `cat /data/tmp/$$.txt`;

# Tidy up the temporary files
unlink("/data/tmp/$$.jpg");
unlink("/data/tmp/$$.txt");
Could it really be that simple? In a nutshell - yes. However, do be aware that the quality of the Tesseract results can vary dramatically depending on the input data. Here's what it made of my snapshot of the letter box at the top of this post:

41.:

"8457 74052.0

r tcoliectionris made at 7.00pm

0 nnghjm Road, A

, at collections may be made througlmut the day UF'r’»"”‘

‘ l

from the Postbox at

4.\ 1--

lmportant information:

From 28th October 2007 we
will no longer collect from

3 letter boxes on Sundays or

_ i Bank Holidays.

For further details of all our
products & services visit
www.royalmail.com or call

Customer Services on

O

8457 740740.

_ _ “ ‘ , , n 6‘ ..(~ . .vd6......now  K
‘_‘._,‘.,-,-.;_,..‘,_--.., ‘Q-3-zwgg-auiwsat-~  x  afwh ' ‘ 5‘: ‘W ' ‘
_ .

‘R
.., {t


You might be underwhelmed by this, but keep in mind that this was just a quick first stab, and that there is actually quite a bit of information content here that would be simply inaccessible to a blind or partially sighted person - the app did a surprisingly good job of decoding the "small print" on the sticker attached to the post box. Struggling through the line noise might also help to give you an insight into the world of the visually impaired.

Here's another example where the app was rather more successful:


And the output for this one:

Centre for
Biological Engineering

\ 11/ \ / . \
 -~1T":“:‘..V“_>
:«K,_)_ . x.“"~”
 \eiC_;“‘f-_>"‘>. .::~~""
,.


So where next for this? Right now it's a bit fiddly to launch the Seeing Eye on your phone, as it has to be driven via the SL4A parent app. We could make it into a standalone app in its own right, and create a lock screen shortcut or even a custom lock screen for it. Speed is another consideration: Tesseract wasn't the fastest, typically taking around 30 seconds to OCR the images we sent it, even on a moderately well specified Linux host VM. I'd also note that Tesseract is available as an Android library now, and if you have a high end phone or tablet it may well run as quickly locally as the "cloud" based version - with no need to worry about potential data charges.
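
One cheap way to chip away at both the speed and the accuracy problems - remember that the quality of Tesseract's output depends heavily on the input data - would be to shrink and greyscale the snapshot before it goes anywhere near the OCR step. Here's a rough sketch using the classic PIL imaging library; treat it purely as an illustration, since PIL may or may not be present in your Python for Android install, and you could just as easily run the same step server side before calling Tesseract:
#!/usr/bin/python

# Sketch: downscale and greyscale the snapshot before OCR or upload.
# Classic PIL import; with the newer Pillow fork use "from PIL import Image".
import Image

src = "/sdcard/DCIM/sl4a/i_see.jpg"        # picture taken by the app
dst = "/sdcard/DCIM/sl4a/i_see_small.jpg"  # what we would actually upload

img = Image.open(src).convert("L")   # "L" = 8 bit greyscale
img.thumbnail((1024, 1024))          # cap the longest side at 1024 pixels
img.save(dst, "JPEG", quality=85)
A smaller, cleaner image means less data over the mobile network and less work for Tesseract at the other end.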

It would also be interesting to explore crowdsourcing to train Tesseract to better recognise the text in these photos - whilst potentially providing a real time assistance service to the blind and partially sighted. You may recall Google doing something along similar lines with their reCAPTCHA project. There is probably a whole new business model based on micropayments lurking in here ;-)

I haven't thought of any nefarious applications for the Seeing Eye app off the top of my head, although it does strike me that someone could embed malicious code in a poster in the hope of getting it executed - are those backticks in the cloud service safe?! This has actually started to happen with malicious QR codes, so it should not be dismissed out of hand. I am also conscious that a lot of people are interested in Tesseract as a way of circumventing CAPTCHAs. There is something of an arms race going on there which (if the code is shared) could well be quite beneficial in the long run.
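
As it happens, the backticks above only run a fixed cat command on a file named after the CGI process ID, so the OCR'd text gets printed rather than executed - but it is exactly the sort of thing worth designing out. If that step were ported to Python, one shell-free way to do it might look like this (a sketch only, with the paths mirroring the Perl version):
#!/usr/bin/python

# Sketch: invoke Tesseract without going through a shell, so nothing in
# the OCR'd text (or a filename) can ever be treated as shell syntax.
# The paths mirror the Perl CGI version and are illustrative only.
import os
import subprocess

base = "/data/tmp/%d" % os.getpid()

# Argument-list form: no shell, so no quoting or backtick worries
subprocess.check_call(["/usr/bin/tesseract", base + ".jpg", base])

with open(base + ".txt") as f:
    print f.read()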

If anyone would like to pick up the code in this post and play with it or develop a production quality app inspired by it, please feel free to - I hereby declare it public domain. If you should do so, please leave a comment with a pointer to your app.
