Category Archives: mirror worlds

The Indoor Mall Local Search Problem

I’m sitting in a pretty sizable mall (Natick Mall in MA, near Boston) and thinking about how a mirror-world local search would work. In other words, how can a user browse to the Natick Mall, walk around the space immersively, walk into stores, and shop like in real life?  How would EveryScape implement this?  How would Bing or Google do this?

Natick Mall Partial Panorama

I snapped a few photos and created a partial panorama above within a few minutes using my iPhone 4 and AutoStitch.  You can see more than a dozen shops from a simple image like this, and it can give you a pretty good insight into what the space is like.

How can I get the “crowd” to do this for me?  Will a Foursquare/Gowalla-like approach work?  How well is photo crowd sourcing working?  How do we solve the GPS problem indoors?


Panoramas vs. 3D (Part 1): Introduction

Google Earth St. Patrick's Cathedral, New York, NY, at a "bird's eye" view.

Wade Roush made a great comment in Panoramas vs. Photosynth Part 4, not to forget that there are forces in 3D happening as well.  This blog is inspired by Wade.  Thanks, Wade!

Now, imagine an online web representation of our world in 3D…  You type in maps.bing.com to get to a place you are interested in.  You find a map, and you zoom down close to see the street.  Then you click on the “3D” viewing mode.  You zoom in some more.  Then you land on the street…  a photorealistic, real-time 3D street.  You can smoothly walk around, not just in the streets but on the sidewalk as well.  Hell, you can fly if you want to.  You find the store you like, and you navigate inside.  When you do, you are greeted by the store owner’s avatar saying hello.  And it’s actually a person behind the avatar.  You ask your questions about the store and whether they have what you are looking for.  You walk around to browse what they have.  Then you walk out and zoom to your favorite restaurant to see how crowded the place really is right now…  And cut. (Or wake up.)

I think this scenario, or something similar, has a real possibility in the future.  Assuming this can happen, then the question is, how far off are we?   It feels to me at least a decade away.  Most likely more.

Now let’s get back to reality and talk about what we have today.  Google Earth and Microsoft Bing both offer a 3D representation of our world.  There are other wonderful technologies, like those from C3 Technologies, but they have not yet proven to be scalable (by “proven,” I mean published with significant coverage).  It really is amazing to see cities like New York in full 3D glory, BUT from a “bird’s eye” view.  At a thousand feet above ground, these cities look amazingly real.

At a ground level, not so much.  (See below for a comparison.)

Photograph of St. Patrick's Cathedral, New York, NY

Google Earth St. Patrick's Cathedral, New York, NY

So, another relevant question is: What is it going to take to create a believable 3D on the web at the ground level?  I think quite a bit.  (To be discussed in the following blogs.)

From a bird’s eye point of view, the level of detail required to make a person believe what she’s seeing is “real” is much lower than at ground level.  Up there, the buildings are more or less boxes with photo textures.  (Don’t get me wrong: the feats accomplished by Google Earth and Microsoft Bing are incredible.)  But at ground level, I would argue that there is an order of magnitude more 3D shit you gotta model to convince people that they are actually at 43rd and 5th in New York City, or Times Square, or the Champs-Élysées.  You have all the street-level things that need to be represented realistically: people, cars, trees, news stands, lamp posts, signs, etc.  All these have to be there just like they are in real life to convince folks that they’re really there.

So, for the foreseeable future, we have panoramas and UGC photographs to represent our real world, aka mirror world, a la Google Street View and EveryScape.  These are what I call 2.5D representations, since they are not quite 3D, but more than 2D.

This series of my blog sounds like fun to write.  Want more?  Please let me know.


Tribe Sourcing, Crowd Sourcing, and Automation

Floating Drones!

UPDATE: A friend of mine told me about an article in the New York Times about a floating drone the army’s working on!  Automatic photo acquisition technology is closer than we think.

Cab Sourcing the Panoramic Photography Data Collection

When you are mass producing something repeatedly, you absolutely want as much automation as possible. Machines typically make fewer errors and are more consistent than humans for very specific tasks. But when you can’t automate, for one reason or another, going for either crowd sourcing or tribe sourcing makes a heap of sense. This doesn’t mean that automation has no place; it means automation takes on a different role.

Let’s be a bit more specific.

In my context, I’m talking about taking bazillions of photographs from a human perspective (as opposed to a satellite perspective) and making sense of them for users. There are companies like Google, EveryScape, Microsoft, Tele Atlas, and NavTeq that go around photographing the world for online use. Let’s focus on photographic data collection as a “case study” for this blog.

So the right set of questions in this context may be:  Can we automate the picture-taking process? If not, can we tap into the crowd?  How about creating a tribe?

Star Wars Darth Maul Droids

Well, automating the photography of the entire world would be tough. One future solution could be to create lots of robots that walk, drive, or fly around acquiring and geotagging pictures.  NERD ALERT!  Remember Star Wars Episode 1, where Darth Maul sent out floating droids on Tatooine to find Queen Amidala? Something like that.  Unfortunately, we don’t have these droids yet.  (Can someone get on that???)

So, when automation isn’t possible, the next question is: Can we crowd source?  I’m not certain we can crowd source this yet either, since car-mounted camera systems aren’t something we can buy at Best Buy. I don’t think it’s that far off, though.  We may see panoramic cameras mounted on taxis (what I call cab sourcing), for instance. There are a few logistical, business-related, and technical issues that need to be solved before this can happen, but why not?

Microsoft’s Photosynth harnesses the power of the crowd to make sense of a real place, but it remains to be seen whether this technology can conquer the world.  (In fact, I’m looking forward to a new research publication coming out this September.)

Google Street View Car

Tribe sourcing is the viable solution for now: find leadership, then enable and incentivize the tribe to go out and photograph the world according to some plan. Google has the cash to create (quite wonderfully equipped) cars and have folks drive them around. EveryScape has found a cost-effective way to tribe source this (a.k.a. The Ambassador Program).

Until automation can happen, photographic data collection happens with tribes (or crowds).  Automation plays a role in that pictures are taken, geotagged, oriented, stitched, and processed automatically.


Panoramas vs. Photosynth (Part 3): Technical Characteristics

Photographically capturing the WORLD!

This is Part 3 of this series.  (At least I didn’t pull a Lucas and start with part 4.)

Let’s compare technical characteristics/requirements as I’ve mentioned in Part 1 (and pls read Part 2 as well).

  • scalable
  • distributable
  • maintainable
  • extensible

Again, there may be more and these are not orthogonal or exclusive of each other.

Scalability

Now, remember that the context in which I’m comparing these two “methods” is trying to photographically capture the entire world at a human-level POV!  So imagine an online experience where you can go to a website (e.g. EveryScape or Google) and walk around just like you were there.  Yep. BIG idea.

So, being able to scalably capture, store, distribute, share, etc. the whole world is paramount.  If you can’t do this, then game over, man.

Companies like EveryScape, Google, Earthmine, Mapjack, Immersive Media, and a bunch of others have found a way to (cost) effectively drive around cities with car-mounted cameras.  EveryScape and Google especially have done this scalably in multiple cities all around the world, with thousands of miles of coverage.  (I’m sure there are others, but I haven’t seen this quantity of their content published yet.)  I think this is proof enough for me to say that panoramic images can scalably cover the world.

Photosynth has not quite done this yet.  I’ve seen a pretty extensive number of photographs used to represent a landmark or an area, but I have not yet seen an entire city done this way.  There are lots of brilliant minds at this, I’m sure, and it does feel feasible.  But if content publication is the standard…

Panorama 1, Photosynth 0.

Distributability

By this, I mean folks online can easily view and experience the content.  Again, going to everyscape.com or maps.google.com is proof enough.  Using Flash (and Flash did “change the world” in this sense) or Silverlight, users can experience the content, and the backend seems to have been implemented well.

Oh BTW, SeaDragon’s f’in brilliant!

Panorama 2, Photosynth 1.

Maintainability

We live in a dynamic world.  Things change all around us.  Tomorrow, a Starbucks could turn into a Dunkin Donuts (yes, I’m from the east coast).  By maintainability, I mean that these changes in the real world could easily be reflected in the mirror world online.

For any type of change in the real world, we (EveryScape) have a “self healing” backend, so the only real work is photo acquisition.  Assuming all other car-mounted systems are similar, this is technically solved.

For Photosynth, it seems like a similar approach will work.  Although there may be some ownership issues with Photosynth (if crowd sourced), it feels quite easy to make this assumption of maintainability.

Panorama 3, Photosynth 2.

Extensibility

Panorama 4, Photosynth 3.

Overview of Technical Characteristics

It seems like the main tech difference between Panoramas and Photosynth is scalability. One main issue with Photosynth is the image registration / pose estimation problem and how scalable this can be.  Basically, for each image added to the synth, features are detected, then matched against the rest of the point cloud, then the relative camera extrinsics are computed.  (Apologies for the tech lingo.)  I’m not fully convinced that this is the way to go when scaling up to what I want (da world!).  Perhaps supplementing the image with GPS and other sensors is a good way to solve this.  BUT, if the philosophy for Photosynth is still automation, consumer cameras, and crowd sourcing, I’m not sure I quite believe in scalability (yet).
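
To make the GPS idea a bit more concrete, here is a rough sketch of my own (not how Photosynth actually works, as far as I know).  It only counts how many image pairs you would have to run through the expensive feature matching step, with and without a GPS prior.  The function names, the 100 meter radius, and the sample geotags are all made up for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_pairs(photos, radius_m=None):
    """Return index pairs worth sending to feature matching.

    With radius_m=None there is no GPS prior, so every pair is a candidate.
    Otherwise only pairs whose geotags are within radius_m are kept.
    """
    pairs = []
    for i, j in combinations(range(len(photos)), 2):
        if radius_m is None or haversine_m(photos[i], photos[j]) <= radius_m:
            pairs.append((i, j))
    return pairs


# Hypothetical geotags: three photos near one spot in Manhattan, two in Boston.
photos = [
    (40.7585, -73.9760),
    (40.7586, -73.9761),
    (40.7584, -73.9759),
    (42.3601, -71.0589),
    (42.3602, -71.0590),
]

all_pairs = candidate_pairs(photos)                   # no GPS prior
near_pairs = candidate_pairs(photos, radius_m=100.0)  # assumed 100 m radius
print(len(all_pairs), len(near_pairs))  # prints "10 4"
```

The loop still touches every pair to compute a distance, but a distance check is trivially cheap next to feature detection and matching.  At world scale you would presumably want a spatial index instead of a brute-force loop, and indoor photos with no GPS fix would still need some other prior.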

Is the scalability issue overcome-able for Photosynth?  I think yes.  I just need to see it to believe it.

