Is That *Really* Augmented Reality?

Augmented Reality?

(This is part of a series on Augmented Reality; you can find Part 1 here.)

Historically, when someone used the words “Augmented Reality” (AR), they usually meant head-up displays (HUDs) or head-mounted displays (HMDs), plus a pile of very expensive hardware that produced accurate results.  Oh, and everyone owned one of these…  NOT.

Recently, AR has been getting quite a bit of buzz thanks to mobile devices such as iPhones and Android phones.  “But why,” you might ask?  Many AR use cases need mobility: questions like “What is that I’m seeing?” or “Where is X?” mostly come up when you’re out and about, and today’s smartphones put much of what those use cases need in the palm of your hand.  (And, getting to my point,) applications such as Layar and Wikitude are being touted as “augmented reality” browsers.

Layar screenshot

But, is that reeeeeeally AR?

Many of the folks who’ve researched and invented AR might say this: “THAT IS NOT AUGMENTED REALITY!!!”

By definition, AR integrates, or “blends,” virtual computer-graphics objects with the real world on your visual device, and it displays them in real time.  But perhaps more importantly,

“Augmented reality does not simply mean the superimposition of a graphic object over a real world scene. This is technically an easy task. One difficulty in augmenting reality, as defined here, is the need to maintain accurate registration of the virtual objects with the real world image.”

(From Jim Vallino’s webpage.)

So, what exactly is it about Layar’s and Wikitude’s apps that keeps them from being AR?  Using the smartphone’s camera, they superimpose crude virtual overlays in real time.  Isn’t that AR?

The problem lies in the accurate registration of the virtual objects with the real-world image.

Without going into too much detail (and being nerdy), neither Layar nor Wikitude uses the visual information (the video stream) to accurately register and display virtual objects with the real-world image.  In fact, the video streams are not analyzed at all.  They don’t track, recognize, or run vision algorithms to tell you what you are seeing.  They simply use the device’s GPS and compass to determine where you are and which way you’re facing.  Once they know that, they look up what’s nearby and draw a “dot” on top of the video stream showing the general direction, distance, and description of that place.  In fact, if you turned the video stream off and left the AR “layer” on, it would essentially still work.  Functionally, there’s not much difference between this and Google Maps on your mobile phone.  (Well, I think Google Maps is much more useful.)
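To make the nerdy version concrete: the whole GPS-plus-compass trick boils down to a bearing calculation.  Here is a minimal Python sketch of that pipeline; the function names and the dot-placement step are my own illustration, not anything from Layar’s or Wikitude’s actual code:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    R = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Compass bearing (degrees clockwise from true north) from you to the POI."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlam)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def dot_offset_deg(poi_bearing, compass_heading):
    """Signed angle of the POI relative to where the phone is pointing.
    The app maps this to a horizontal pixel offset for the 'dot'."""
    return (poi_bearing - compass_heading + 180.0) % 360.0 - 180.0
```

Notice that the camera frame never enters the computation; the dot’s position is pure geography, which is exactly why turning the video off changes nothing.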

So, when you are a block away from a POI (point of interest), they will generally give you a good-ish estimate.  But otherwise, they are subject to GPS errors, magnetic interference with the compass, and a database that might be wrong or out of date.
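A little trigonometry shows why the block-away case works and the up-close case falls apart.  A back-of-the-envelope sketch, assuming a typical ~10 m consumer GPS error (the numbers are illustrative):

```python
import math

def angular_error_deg(gps_error_m, poi_distance_m):
    """Worst-case angular error in the dot's placement when your GPS fix
    is off by gps_error_m and the POI is poi_distance_m away."""
    return math.degrees(math.atan2(gps_error_m, poi_distance_m))

# For a POI a block (~100 m) away, a 10 m fix error barely moves the dot...
block_away = angular_error_deg(10, 100)  # about 5.7 degrees
# ...but for a POI 15 m away, the dot can point at the wrong storefront.
up_close = angular_error_deg(10, 15)     # about 33.7 degrees
```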

So, I ask you, is this really AR?  I think not — not in the “classical” sense, at least.

Wikitude

Now, let’s flip the coin and see the other side.

The term “AR” was coined a couple of decades ago (and the idea is even older).  But AR really hasn’t made much progress over those decades, in the sense that it hasn’t influenced our lives.  Do you experience AR technology in your day-to-day?  Most of us don’t (unless you are a Top Gun flying a multi-million-dollar airplane).

So, where did it go wrong?

Here’s my guess:  It focused too much on accuracy of registering virtual objects to the real world.

The same freaking problem.

Let’s take a simple example.  I want to place my can of virtual diet coke on any surface that I see (it could be curved, like on top of my car).  So, I take my smartphone, point it at some surface, and click on the “place can” button.  And my virtual coke can is placed accurately as I’ve indicated.  Should be simple, right?  Wrong.

This scenario is quite difficult technically.  In fact, I’d say it’s not possible yet in general.  For this to happen, your smartphone (or even a supercomputer) first needs to understand the geometry of the surface.  From a single video stream alone, that’s hard to compute robustly in general.  You either need multiple cameras to determine the geometry via stereo (or to move around the static, diffuse, and simple surface sufficiently), or to laser-scan it, or to have some image understanding.  None of these are robust enough for everyday use yet, and they are potentially decades away (unless it’s a very specific use case: you are a Top Gun looking for enemy airplanes, flying multi-million-dollar hardware with top-of-the-line everything).  There are applications out there that use markers or fiducials to tell you the orientation and scale of a flat surface, but you can’t have these markers everywhere.  (This topic really deserves another blog post.  I will talk more about it, since it actually is quite interesting to see these apps popping up more and more.)
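To give a feel for why the stereo route is fragile, here is the triangulation at its core.  A sketch with made-up numbers (the focal length, baseline, and disparities are illustrative, not from any real device):

```python
def depth_from_stereo_m(focal_px, baseline_m, disparity_px):
    """Classic two-camera triangulation: depth = focal * baseline / disparity.
    Assumes rectified images and a correct pixel match between the two views,
    which is exactly the part that's hard to get robustly."""
    if disparity_px <= 0:
        raise ValueError("no disparity: point at infinity or a bad match")
    return focal_px * baseline_m / disparity_px

# With an ~800 px focal length and a 10 cm baseline, a feature matched
# 16 px apart between the views sits 5 m away...
d_good = depth_from_stereo_m(800, 0.10, 16)  # 5.0 m
# ...and a single-pixel matching error shifts the estimate by ~30 cm.
d_off = depth_from_stereo_m(800, 0.10, 17)   # about 4.7 m
```

And that’s the easy case of one point on a well-textured surface; recovering a whole (possibly curved) surface means solving this densely and reliably, which is where things fall over.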

At this rate, AR technology may never come to the consumer market if accuracy is the gating issue.

So, going back to my original question, I ask again:  Are these mobile “AR” applications really AR?  I would still say no, with a caveat that they have the right idea.

I think the right idea is not to place too much focus on accuracy, but on use cases that influence our daily lives. Don’t worry too much about the brain-surgery use case, where 1mm accuracy matters. When the constraints are relaxed, more solutions arise.  By combining the visual sensor with other sensors, such as GPS, compass, accelerometers, gyros, RFID, Wi-Fi, markers, etc., I think AR can actually start tackling the consumer market (i.e., the long tail), and it has the potential to produce a killer app for this tech (which really is sorely lacking).

What if every POI had a unique sensor that broadcast what and where it was?  E.g., every Starbucks had short-, medium-, and long-range sensors to tell you where it was.  It could even apply to dynamic things, such as your car, luggage, pets, and any inventory out there.  It doesn’t have to be exact.  It just needs to work.  (And these are different!)
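Here is a toy sketch of what such a broadcast could carry.  To be clear, the message format, the field names, and the make_beacon/parse_beacon helpers are entirely hypothetical; nothing like this is standardized:

```python
import json

def make_beacon(poi_id, name, lat, lon, range_m):
    """Serialize a self-describing POI broadcast: what it is, roughly where,
    and how far the signal is meant to reach."""
    return json.dumps({
        "id": poi_id,
        "name": name,
        "lat": round(lat, 4),   # "ish" coordinates are fine by design
        "lon": round(lon, 4),
        "range_m": range_m,
    })

def parse_beacon(msg):
    """Decode a received broadcast back into a dict the app can display."""
    return json.loads(msg)
```

The point of the sketch is that the receiver never has to infer anything from pixels: the POI announces itself, approximately, and approximate is good enough.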

So, I say to researchers and inventors — SCREW ACCURACY (for now)!  Focus on what will make a difference in our lives!  And make something that works!

And to Layars and Wikitudes of the world — keep going, and don’t forget to push innovation.


About Mok Oh

Chief Scientist, PayPal

25 responses to “Is That *Really* Augmented Reality?”

  • DJames Nickles

    -What if every POI had a unique sensor that broadcasted what and where it was? –
    Mesh Network.

  • Tim Walters

    Hi Mok – An interesting post. Like you (I think), I’m suspicious of refusals to stretch the nomenclature to apply to innovations on the grounds that they don’t meet the *established and traditional* definition of the space. And I agree with you we ought to stop worrying about whether mobile data overlays really are or are not AR. (This could turn into a playground argument: “AR too!” “AR not!”) The question, as you say, is are they useful? Do they enhance the user experience? Do they add value?

There’s a parallel here to the machine translation industry. For decades, MT disappointed its champions because it didn’t measure up to human translation. (According to criteria such as Fully Automated High Quality Translation (FAHQT) and the BLEU metric.) It still doesn’t. But in commercial exchanges, and especially in contexts like a self-help web site, the appropriate question is, “Does this answer your question?” Tested for FAUT (Fully Automated Useful Translation), a well-trained MT system can now get quite close to human translation. (I’ve written about MT for Forrester and will host an upcoming teleconference on the topic. There is a podcast available at no cost at http://www.forrester.com/podcasts/ikm.)

Still, the critics are absolutely right that the mobile examples do not incorporate accurate registration of the virtual objects with the real-world image. And I can think of a way to illustrate the impact of this flaw that doesn’t involve a virtual diet coke can. Say I’m standing at the point in Cambridge, MA shown in your first screenshot. However, due to some faulty knowledge management process over at the city planning bureau, Harvard University has recently been bulldozed to make room for an extension of the Big Dig. I hold up my iPhone camera, which accurately depicts the remaining carnage, but quite inaccurately thinks I’m still looking at quaint brick buildings in the vicinity of a Hudson News outlet.

    BTW, if a diet coke has no calories, does a virtual diet coke have negative calories?

    • Mok Oh

      LOL! AR too!

      Many thanks, Tim, for your wonderful insight! I like your Harvard bulldozing example much better. (Cigar and scotch, my usual blogging accoutrement, tend to stifle cogent thought..). To make things really realtime, image analysis is required with your example.

      Now, if someone can invent negative calories, that person would be rich.

      Hope to meet in person sometime, Tim.

  • Matthias Wagner

    I wouldn’t fault the simplistic GPS coordinates + compass reading version as being “fake AR.” It’s a good start, and it will be augmented (sorry) by imaging-based location/orientation recognition over time. I do worry that the hype surrounding it, and the redundancy/triviality of most of the applications (AR for AR’s sake) will cause an early backlash.

    I would also not narrow “augmented reality” to applications that involve a screen with markers superimposed on real-time video. There are other ways to augment experience based on the environment/location.

    Working on some concepts in this area — non-redundant, revenue-producing ones, of course! ;-) Would love to get together sometime over a coffee in Cambridge to brainstorm!


  • Rob

    Of course they’re augmented reality.

These applications are augmenting reality with additional, location-specific information. The fact that they’re using a compass and GPS instead of analysing the video stream is irrelevant.

    The average person, whose life will one day soon be enriched by augmented reality, will not care about that distinction. These are the people who will determine the meaning of new words and phrases as they enter the common vernacular by being used on a daily basis. And they will determine that based on the impact it has on their lives; not on the technology used to achieve it.


  • Matthew Papakipos

    Nice post Mok.
    Accurate registration would be really cool, but I agree that computer vision isn’t yet up to the task for most real-world cases.
    We need to adjust our expectations to what is possible.
    That said, I think we could make some very cool things today.
    -matt

  • You Know What Really Grinds My Gears? Augmented Reality! « All Things Visual

    [...] last post was more about being rational, but this post is more about being raw and emotional.  I wanna make [...]

  • Ron Dawes

    I would agree with Rob and also point out that most current HUD technology in fighters does not meet the strict AR definition stated above. HUDs superimpose information about the aircraft’s attitude and performance on top of the visual stream seen by the pilot, i.e altitude, attitude, airspeed, velocity vector, weapons status, etc. Recent innovations into synthetic vision add info from radar and IR sensors to enhance the visual stream in poor visual conditions. So I’d say that what is/is not AR is in the eye of the beholder. If it augments the reality that you are concerned about then to you it IS AR.

  • EddieC

    What happened to radar? (Build a sonic pinger into the handset?) Then you can rely on sight and sound, which is what humans do to a large extent.

  • Ben Hoyle

    Hi Mok,

    I think you are right to focus on workable rather than entirely accurate examples, especially those that combine many different data sources to “augment” reality in the form of a video/still image.

    One major problem with combining disparate data sources is of course interoperability. In order to really get off the ground we would need clear interfaces that enabled cross-platform communication, for example, a Starbucks Wifi signal emitted by a Belkin router would need to be combined with an iPhone compass and a large map-based database (e.g. GoogleMaps). We also need intelligent systems to make sense of the mess of data. I think currently these intelligent systems are not up to scratch.

    I also think Tim’s analogy with Machine Translation is apt: I can read a piece of good-enough Google Machine Translation and understand what is going on. Exact accuracy is not required; if it was required I would employ a professional (human) translator.

  • Jorge

Hi Mok, nice post.
    I am a student at the CICESE research center in B.C., Mexico.
    I agree with your comments.

    I wanna mix tagging technologies, ubiquitous technologies, etc. to create an AR platform.

    I’d like to get in contact with you to share ideas.

    Thanks

  • tech: Resetting Expectations: Some Augmented Reality Links | tech3bite

[...] scientist and Everyscape CTO/founder, @mok_oh, who’s also been blogging about AR. In the first of two posts, he points out that accurate object and image recognition remain formidable technical [...]

