(This is a series in Augmented Reality and you can find Part 1 here.)
Historically, when someone said “Augmented Reality” (AR), it typically conjured up HUDs (heads-up displays) or HMDs (head-mounted displays) and a bunch of very expensive hardware that produced accurate results. Oh, and everyone owned one of these… NOT.
Recently, AR has been getting quite a bit of buzz thanks to mobile devices such as iPhones and Android phones. “But why,” you might ask? Many AR use cases need mobility: questions like “What is that I’m seeing?” and “Where is X?” usually come up when the user is out and about. Today’s smartphones meet many of these needs in the palm of your hand. (And getting to my point,) applications such as Layar and Wikitude are being touted as “augmented reality” browsers.
But, is that reeeeeeally AR?
Many of the folks who’ve researched and invented AR might say this: “THAT IS NOT AUGMENTED REALITY!!!”
By definition, AR integrates or “blends” virtual computer graphics objects with the real world on your visual device, and it displays them in real time. But perhaps more importantly,
“Augmented reality does not simply mean the superimposition of a graphic object over a real world scene. This is technically an easy task. One difficulty in augmenting reality, as defined here, is the need to maintain accurate registration of the virtual objects with the real world image.”
(From Jim Vallino’s webpage.)
So, what exactly is the issue with Layar’s and Wikitude’s apps that keeps them from being AR? Using the smartphone’s camera, they can superimpose crude virtual overlays in real time. Isn’t that AR?
The problem lies in the accurate registration of the virtual objects with the real-world image.
Without going into too much detail (and being nerdy), neither Layar nor Wikitude uses the visual information (the video stream) to accurately register and display virtual objects with the real-world image. In fact, the video streams are not analyzed at all. They don’t track or use image recognition or vision algorithms to tell you what you are seeing. They simply use the device’s GPS and compass to determine where you are and which way you’re facing. Once they know that, they look up what’s nearby and display a “dot” on top of the video stream to tell you the general direction, distance, and description of that place. In fact, if you turned the video stream off and left the AR “layer” on, it would essentially still work. Functionally, there’s not much difference between this and Google Maps on your mobile phone. (Well, I think Google Maps is much more useful.)
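To make the point concrete, here’s a toy sketch of the whole GPS-plus-compass trick. The bearing and haversine formulas are the standard geodesy ones; the field of view, screen width, and function names are my own assumptions, not anything from Layar’s or Wikitude’s actual code:

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def distance_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in meters."""
    R = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def dot_screen_x(device_heading_deg, poi_bearing_deg, fov_deg=60.0, screen_w=480):
    """Place the POI 'dot' horizontally on screen from compass data alone,
    or return None if the POI is outside the assumed horizontal FOV.
    Note: the camera pixels are never consulted -- only GPS and compass."""
    rel = (poi_bearing_deg - device_heading_deg + 180.0) % 360.0 - 180.0
    if abs(rel) > fov_deg / 2:
        return None
    return int((rel / fov_deg + 0.5) * screen_w)
```

Notice that the video frame never appears anywhere in this pipeline, which is exactly why the overlay keeps “working” with the camera covered.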
So, when you are a block away from a POI (point of interest), they generally give you a decent-ish estimate. Otherwise, they are subject to GPS errors, magnetic interference on the compass, and a database that might be wrong or out of date.
So, I ask you, is this really AR? I think not — not in the “classical” sense, at least.
Now, let’s flip the coin and see the other side.
The term AR was coined a couple of decades ago and has been around since (and the idea, even longer). But it really hasn’t made much progress over those decades, in the sense that it hasn’t influenced our lives. Do you experience AR technology in your day-to-day? Most of us don’t (unless you’re a Top Gun flying a multi-million-dollar airplane).
So, where did it go wrong?
Here’s my guess: it focused too much on the accuracy of registering virtual objects to the real world.
The same freaking problem.
Let’s take a simple example. I want to place a virtual can of Diet Coke on any surface I see (it could be curved, like the top of my car). So I take my smartphone, point it at some surface, and tap the “place can” button. And my virtual Coke can is placed exactly where I indicated. Should be simple, right? Wrong.
This scenario is quite difficult technically. In fact, I’d say it’s not possible yet in general. For this to happen, your smartphone (or even a supercomputer) needs to first understand the geometry of the surface. From a single video stream alone, it’s hard to compute this robustly in general. You either need multiple cameras to determine the geometry via stereo (or to move around the (static, diffuse, and simple) surface sufficiently), or to laser-scan it, or to have some image understanding. None of these are robust enough for everyday use yet, and they are potentially decades away (unless it’s a very specific use case: you’re a Top Gun looking for enemy airplanes, flying multi-million-dollar hardware with top-of-the-line everything). There are applications out there that use markers or fiducials to tell you the orientation and scale of a flat surface, but you can’t have these markers everywhere. (This topic really deserves another blog post. I will talk more about it, since it is quite interesting to see these apps popping up more and more.)
At this rate, AR technology may never come to the consumer market if accuracy is the gating issue.
So, going back to my original question, I ask again: Are these mobile “AR” applications really AR? I would still say no, with a caveat that they have the right idea.
I think the right idea is not to place too much focus on accuracy but on use cases that influence our daily lives. Don’t worry too much about the brain-surgery use case, where 1 mm accuracy matters. When the constraints are relaxed, more solutions arise. By combining the visual sensor with other sensors, such as GPS, compass, accelerometers, gyros, RFID, Wi-Fi, markers, etc., I think AR can actually start tackling the consumer market (i.e., the long tail) and has the potential to produce a killer app for this tech (which really is sorely lacking).
What if every POI had a unique sensor that broadcast what and where it was? E.g., every Starbucks could have short-, medium-, and long-range sensors to tell you where it was. It could even be applied to dynamic things, such as your car, luggage, pets, and all kinds of inventory out there. It doesn’t have to be exact. It just needs to work. (And those are different things!)
So, I say to researchers and inventors — SCREW ACCURACY (for now)! Focus on what will make a difference in our lives! And make something that works!
And to the Layars and Wikitudes of the world: keep going, and don’t forget to push innovation.