The Indoor Mall Local Search Problem July 5, 2010

Posted by Mok Oh in Local Search, Photography, maps, mirror worlds, panorama.

I’m sitting in a pretty sizable mall (Natick Mall in MA, near Boston) and thinking about how a mirror-world local search would work. In other words, how can a user browse to the Natick Mall, walk around this space immersively, walk into stores, and shop like in real life? How would EveryScape implement this? How would Bing or Google do this?

Natick Mall Partial Panorama

I snapped a few photos and created a partial panorama above within a few minutes using my iPhone 4 and AutoStitch.  You can see more than a dozen shops from a simple image like this, and it can give you a pretty good insight into what the space is like.

How can I get the “crowd” to do this for me? Will a Foursquare/Gowalla-like approach work? How well is photo crowd sourcing working? How do we solve the GPS problem indoors?

Panoramas vs. 3D (Part 1): Introduction August 16, 2009

Posted by Mok Oh in 2.5D, 3D, Photography, mirror worlds, panorama.

Google Earth St. Patrick's Cathedral, New York, NY, at a "bird's eye" view.

Wade Roush made a great comment in Panoramas vs. Photosynth Part 4, not to forget that there are forces in 3D happening as well.  This blog is inspired by Wade.  Thanks, Wade!

Now, imagine an online web representation of our world in 3D… You type in maps.bing.com to get to a place you are interested in. You find a map, and you zoom down close to see the street. Then you click on the “3D” viewing mode. You zoom in some more. Then you land on the street… a photorealistic, real-time 3D street. You can smoothly walk around, not just in the streets but on the sidewalk as well. Hell, you can fly if you want to. You find the store you like, and you navigate inside. When you do, you are greeted by the store owner’s avatar saying hello. And there’s actually a person behind the avatar. You ask your questions about the store, and whether they have what you are looking for. You walk around to browse what they have. Then you walk out and zoom to your favorite restaurant to see how crowded the place really is right now… And cut. (Or wake up.)

I think this scenario, or something similar, has a real possibility in the future.  Assuming this can happen, then the question is, how far off are we?   It feels to me at least a decade away.  Most likely more.

Now let’s get back to reality and talk about what we have today. We have Google Earth and Microsoft Bing, which have a 3D representation of our world. There are wonderful other technologies, like C3 Technologies, but they have not yet proven to be scalable (by “proven,” I mean published with significant coverage). It really is amazing to see cities like New York in full 3D glory, BUT from a “bird’s eye” view. At a thousand feet above ground, these cities look amazingly real.

At a ground level, not so much.  (See below for a comparison.)

Photograph of St. Patrick's Cathedral, New York, NY

Google Earth St. Patrick's Cathedral, New York, NY

So, another relevant question is: What is it going to take to create a believable 3D on the web at the ground level?  I think quite a bit.  (To be discussed in the following blogs.)

At a bird’s eye point of view, the level of detail required to make a person believe what she’s seeing is “real” is much lower than at ground level. Up there, the buildings are more or less boxes with photo textures. (Don’t get me wrong: the feats accomplished by Google Earth and Microsoft Bing are incredible.) But at ground level, I would argue that there is an order of magnitude more 3D shit you gotta model to convince people that they are actually at 43rd and 5th in New York City, or Times Square, or the Champs-Élysées. You have all the street-level things that need to be represented realistically: people, cars, trees, news stands, lamp posts, signs, etc. All these have to be there just like they are in real life to convince folks that they’re really there.

So, for the foreseeable future, we have panoramas and UGC photographs to represent our real world, aka mirror world, a la Google Street View and EveryScape.  These are what I call 2.5D representations, since they are not quite 3D, but more than 2D.

This series of my blog sounds like fun to write.  Want more?  Please let me know.

Panoramas vs. Photosynth: Qualitative Comparison August 13, 2009

Posted by Mok Oh in 2.5D, panorama.

On Top of the Leaning Tower of Pisa

This is Part 4 of the Panoramas vs. Photosynth comparison. Parts 1, 2, and 3 cover the list of requirements, what the two methods are, and their technical characteristics.

In this section, we compare some of the qualitative characteristics of panoramas and photosynths.

Before I move on, though, many have asked, “How about combining both?”  And the answer is an enthusiastic Abso-f’in-lutely!  I think that is the ultimate goal for many.  I really do think there is a “best of both worlds” solution.  In fact, I would argue that it is imperative to combine both to capture the world.  With that said, let’s move on.

I think it makes sense to think about the qualitative aspects of this comparison in these steps: Acquisition, processing, viewing, user experience (UX) and sharing.

Acquisition

Man, oh man.. Photosynth soooo has the right idea here. Let users take photos with whatever cameras they have, put them into a “bucket” (a.k.a. the computer), and let it sort them out. Only a little know-how about overlapping the pictures is necessary to get your synths going. HOLY SHIT this is powerful. I saw Noah Snavely give this talk at SIGGRAPH 2006 and was freakin’ blown away. The idea of using UGC photos to make sense of a place was mind blowing for me.

On the other hand, panoramas are not as easy to acquire. I’ve been taking panoramas for about a decade now, and it still kinda sucks how much work it demands from the user.

Panorama 0, Photosynth 1

Processing

Processing for Photosynth is quite easy. You go to Photosynth.net, upload your photos, and wait a few minutes. It’s all web based, so no other software is necessary.

For panoramas, you need stitching software, e.g. PTGui, PanoTools, EasyPano. Some are free, and most are cheap. These programs have become quite easy to use over the years: you just have to load the photos, press a button to stitch, and wait a few minutes.

Here’s the dicey part. In Photosynth, there’s no easy way to control the processing. If you took some pictures and there’s not sufficient overlap, or the computer can’t find enough similarities between the photos, then you’re kinda screwed. You can’t explicitly tell the computer, “Hey stupid, this picture belongs here, and that belongs there.” Often the computer algorithm gets it wrong, and the only thing you can do is live with it or remove the photo (which can subsequently mess other things up).

In panoramas, the user can have finer controls by explicitly giving it guidance.  But this notion of what we call “correspondence” is not always an intuitive thing for some.

User control vs. simplicity..  Hm.  Which one wins?  Call it a draw.

Panorama 0, Photosynth 1 (same score as before).

Viewing

Viewing wise, I would give panoramas the advantage, just because the “standard” viewer now uses Flash, which I think is a huge f’in deal! Photosynth uses Silverlight. Flash has about 97% penetration, Silverlight has about 33%. This means that Photosynth works on only about a third of computers. That sucks. Even when you have the Microsoft brand, very few people wanna download any plugins. Users expect things to just work.

Panorama 1, Photosynth 1.

UX

UX wise, this is definitely subjective.  I have my biases with panoramas.  And with Google really spreading the panoramic viewing experience with their Street View, panorama viewers and user interface and experiences are becoming more common.  Besides, panoramas feel a lot more natural when looking around.

Photosynth has a weird user experience for me. I cannot move around a space the way I want to; you are limited, in many ways, to where the photos are. It’s somewhat frustrating, since it really feels like a 3D environment (it actually is), but the limitations of movement (and that annoying auto-snapping to pics) set my expectations wrong.

Due to my bias, I’m not scoring this part.

Panorama 1, Photosynth 1.

Sharing

Sharing is pretty easy for both (albeit somewhat limited).

Panorama 2, Photosynth 2.

Overall

It looks like a tie using this simplistic scoring system.  What do you think?

Tribe Sourcing, Crowd Sourcing, and Automation August 10, 2009

Posted by Mok Oh in 2.5D, maps, mirror worlds, panorama.

Floating Drones!

UPDATE: A friend of mine told me about an article in the New York Times about a floating drone the army’s working on! Automatic photo acquisition technology is closer than we think.

Cab Sourcing the Panoramic Photography Data Collection

When you are mass producing something repeatedly, you absolutely want automation as much as possible. Machines typically produce less error and are more consistent than humans for very specific tasks. But when you can’t, for one reason or another, going for either crowd sourcing or tribe sourcing makes a heap of sense. This doesn’t mean that automation has no place — it means automation takes on a different role.

Let’s be a bit more specific.

In my context, I’m talking about taking ba-zillions of photographs from a human perspective (as opposed to a satellite perspective) and making sense of them for users. There are companies like Google, EveryScape, Microsoft, Tele Atlas, and NAVTEQ that go around photographing the world for online use. Let’s focus on photographic data collection as a “case study” for this blog.

So the right set of questions in this context may be:  Can we automate the picture-taking process? If not, can we tap into the crowd?  How about creating a tribe?

Star Wars Darth Maul Droids

Well, automating the photography of the entire world would be tough. One future solution could be to create lots of robots that walk, drive, or fly around acquiring and geotagging pictures. NERD ALERT! Remember Star Wars Episode 1, where Darth Maul sent out floating droids on Tatooine to find Queen Amidala? Something like that. Unfortunately, we don’t have these droids yet. (Can someone get on that???)

So, when automation isn’t possible, the next question is: Can we crowd source? I’m not certain we can crowd source this yet either, since car-mounted camera systems aren’t something we can buy at Best Buy. I don’t think it’s that far off, though. We may see panoramic cameras mounted on taxis (what I call cab sourcing), for instance. There are a few logistical, business-related, and technical issues that need to be solved before this can happen, but why not?

Microsoft’s Photosynth harnesses the power of the crowd to make sense of a real place, but it’s yet to be seen that this technology can conquer the world.  (In fact, I’m looking forward to a new research publication coming out this September).

Google Street View Car

Tribe sourcing is the viable solution for now: find leadership, then enable and incentivize the tribe to go out and photograph the world according to some plan. Google has the cash to create (quite wonderfully equipped) cars and have folks drive around. EveryScape has found a cost-effective way to tribe source this (a.k.a. the Ambassador Program).

Until automation can happen, photographic data collection happens with tribes (or crowds).  Automation plays a role in that pictures are taken, geotagged, oriented, stitched, and processed automatically.

Why Peleng Lens Sucks for Panoramas August 6, 2009

Posted by Mok Oh in Cameras, panorama.

When taking images for panoramas, a lot of folks use circular fisheye lenses. The wide field of view means fewer pictures to cover the full 360 x 180 degrees, so you can acquire less and stitch faster.

I played around with various lenses on the market for panoramic photography, trying to balance price vs. quality. One of my favorites is the Sigma 8mm, but unfortunately, Sigma is really hiking up their prices: what used to cost around $500 is now up to about $900 (MSRP $1230!). Can you say “exploitation?” Something’s going on. (If anyone knows what’s up here, pls comment!)

Naturally, I looked for an alternative and found the Peleng 8mm at less than half the price! But in testing it out, it ended up not being acceptable for professional panoramic photography.

Peleng 8mm Fisheye Lens

Here is my main reason for saying this is unacceptable — two words: lens flare.

Peleng 8mm Lens Test

Notice the flares on the bottom right (from light source on upper left) and bottom left (from light source on upper right) of this image.  The resulting panorama is below:

Peleng 8mm Lens Test: Stitched Pano

Now notice the flares creeping into the stitched panorama above on the bottom left. (Pls click on the image to see a higher resolution panorama.) Although this issue can be “solved” by cropping tighter or Photoshopping, doing so messes up enough things that I would rather use a better (but more expensive) lens.

Other things like the lens not fitting in snugly to the camera, having a hard time taking the lens off, manual controls, etc. make me think that it’s not worth the savings for the risk of taking bad pictures for our customers.

Peleng, if you are reading, please let us know when you’ve improved and resolved some problems.  Until then, we are sticking to Sigmas.

Anyone out there have a similar experience?

Panoramas vs. Photosynth (Part 3): Technical Characteristics August 4, 2009

Posted by Mok Oh in 2.5D, Photography, mirror worlds, panorama, post processing.

Photographically capturing the WORLD!

This is Part 3 of this series.  (At least I didn’t pull a Lucas and start with part 4.)

Let’s compare technical characteristics/requirements as I’ve mentioned in Part 1 (and pls read Part 2 as well).

  • scalable
  • distributable
  • maintainable
  • extensible

Again, there may be more and these are not orthogonal or exclusive of each other.

Scalability

Now, remember that the context in which I’m comparing these two “methods” is trying to photographically capture the entire world at a human-level POV! So imagine an online experience where you can go to a website (e.g. EveryScape and Google) and walk around just like you were there. Yep. BIG idea.

So, being able to scalably capture, store, distribute, share, etc. the whole world is paramount. If you can’t do this, then game over, man.

Companies like EveryScape, Google, Earthmine, Mapjack, Immersive Media, and a bunch of others found a way to (cost) effectively drive around cities with car-mounted cameras. EveryScape and Google especially have done this scalably in multiple cities all around the world, with thousands of miles of coverage. (I’m sure there are others, but I haven’t seen this much of their content published yet.) I think this is proof enough for me to say that panoramic images can scalably cover the world.

Photosynth has not quite done this yet. I’ve seen a pretty extensive number of photographs used to represent a landmark or an area, but I have not yet seen an entire city done this way. There are lots of brilliant minds at this, I’m sure, and it does feel feasible. But if content publication is the standard…

Panorama 1, Photosynth 0.

Distributability

By this, I mean folks online can easily view and experience the content.  Again, going to everyscape.com or maps.google.com is proof enough.  Using Flash (and Flash did “change the world” in this sense) or Silverlight, users can experience the content, and the backend seems to have been implemented well.

Oh BTW, Seadragon’s f’in brilliant!

Panorama 2, Photosynth 1.

Maintainability

We live in a dynamic world.  Things change all around us.  Tomorrow, a Starbucks could turn into a Dunkin Donuts (yes, I’m from the east coast).  By maintainability, I mean that these changes in the real world could easily be reflected in the mirror world online.

For any type of change in the real world, we (EveryScape) have a “self healing” backend, so the only real work is photo acquisition. Assuming all other car-mounted systems are similar, this is technically solved.

For Photosynth, it seems like a similar approach will work.  Although there may be some ownership issues with Photosynth (if crowd sourced), it feels quite easy to make this assumption of maintainability.

Panorama 3, Photosynth 2.

Extensibility

Panorama 4, Photosynth 3.

Overview of Technical Characteristics

It seems like the main tech difference between Panoramas and Photosynth is scalability. One main issue with Photosynth is the image registration / pose estimation problem and how scalable this can be. Basically, for each image added to the synth, features are detected, then matched against the rest of the point cloud, then the relative camera extrinsics are computed. (Apologies for the tech lingo.) I’m not fully convinced that this is the way to go when scaling up to what I want (da world!). Perhaps supplementing the image with GPS and other sensors is a good way to solve this. BUT, if the philosophy for Photosynth is still automation, consumer cameras, and crowd sourcing, I’m not sure I quite believe in scalability (yet).
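To make the GPS idea a bit more concrete, here is a rough sketch of how a location tag could shrink the matching problem. It assumes a naive matcher that would otherwise try every photo against every other photo, which is my simplification and not a description of how Photosynth actually works. The function names, the coordinates, and the 100 m radius are all made up for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, good enough for a sketch


def all_pairs(n):
    """Number of photo pairs a naive all-against-all matcher would try."""
    return n * (n - 1) // 2


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_pairs(photos, radius_m):
    """Keep only the pairs whose GPS tags are within radius_m of each other.

    This still checks every pair's distance, but a distance check is far
    cheaper than feature matching; a spatial index could avoid even that.
    """
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(photos), 2)
            if haversine_m(p, q) <= radius_m]


if __name__ == "__main__":
    # Hypothetical geotags: two photos a few meters apart, one far away.
    photos = [(42.3000, -71.3000), (42.3001, -71.3000), (48.8738, 2.2950)]
    print("naive pairs:", all_pairs(len(photos)))
    print("pairs worth matching:", candidate_pairs(photos, radius_m=100))
```

The point is only that the naive pair count grows quadratically with the number of photos, while a location filter keeps the expensive matching local. Indoor photos without a usable GPS fix would still need something else.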

Is the scalability issue overcome-able for Photosynth? I think yes. I just need to see it to believe it.

Panoramas vs. Photosynth (Part 2): What Are They? August 3, 2009

Posted by Mok Oh in 2.5D, Photography, panorama.

This is a continuation from Panoramas vs. Photosynth Part 1.  In Part 1, we discussed some characteristics that we may use to determine which one is better.  But before we delve into that, let me describe a bit more about these methods of capturing our world.

Panoramas

Panoramas assume that pictures are taken from a common point in space (i.e. pivoting around the entrance pupil). That’s why panoramic heads (shown below) are important in minimizing errors caused by parallax, so we can stitch a nice panorama without many visual artifacts. Also, due to their immersive nature, many images are necessary to take a full 360 x 180 degree panorama. Many people resort to using wide-angle or fisheye lenses to lessen the quantity of images necessary to cover all view directions.

Panoramic tripod head

Panoramas are amazing in their immersive nature — you really do feel like you are there.  And the continuity of the experience is just fantastic in describing the scene captured.  But the constraints on the acquisition process make it more difficult to capture than just taking regular pictures.
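To get a feel for why the common-pivot constraint matters, here is a back-of-the-envelope sketch of parallax. It assumes the camera’s viewpoint slides sideways by some small offset between two shots (because it was not pivoting around the entrance pupil) and computes how far a near object appears to shift against a far background. The function name and the example distances are hypothetical.

```python
import math


def parallax_deg(offset_m, near_m, far_m=math.inf):
    """Apparent angular shift (degrees) of a near object against a far one.

    offset_m: sideways movement of the viewpoint between two shots
    near_m:   distance to the near object
    far_m:    distance to the background (infinity by default)
    """
    near_shift = math.atan2(offset_m, near_m)
    far_shift = math.atan2(offset_m, far_m)
    return math.degrees(near_shift - far_shift)


if __name__ == "__main__":
    # Hypothetical case: viewpoint drifts 5 cm between shots.
    for near in (0.5, 2.0, 10.0):
        shift = parallax_deg(0.05, near)
        print(f"object at {near} m shifts by {shift:.2f} degrees")
```

The shift shrinks quickly as objects get farther away, which is why sloppy pivoting hurts most indoors or with close foreground objects, and why it drops to zero when the offset is zero.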

Another limitation is that the users cannot move around in space.  But of course, I will argue that EveryScape has solved that.

Photosynth

Photosynth, on the other hand, has no such constraints when acquiring images.  You don’t have to be anal about the “entrance pupil,” or the “no-parallax point,” or the “pivot point” or all that BS.  Although technically rational, these constraints really do suck for the picture takers.  (That’s why there are gazillions more regular photographers than panoramic photographers.)  In taking pictures for Photosynth, there just needs to be sufficient visual overlap between them, such that the computer algorithm can automatically try to determine where the pictures were taken from.  This also means that you can move around in space!

But oftentimes, using just pictures does suck in the sense that users don’t quite feel immersed. Immersivity is one of the qualities I mentioned in Part 1. I was talking to a friend of mine a while back (@billwarner), and he said to me, “Don’t break reality.” In many ways, if you “break reality,” then you don’t gain as much confidence from your users about the space you are describing.

So, What Next?

There are pros and cons for both these methods and in the next part, we will start to compare them and grade them if we can.

Panoramas vs. Photosynth (Part 1) August 2, 2009

Posted by Mok Oh in 2.5D, Platforms, panorama.

Photosynth example

In this series, I’ll compare panoramas vs. Photosynth — methods that could be used to create a photographic platform of our world.  Well, actually more like initiate a conversation in comparing these two methods in representing our world (aka Mirror Worlds and the Metaverse Roadmap).  I’ll assume people have a decent understanding of what these are.  If not, please let me know and I can certainly explain.

So, how can we tell which one’s better?  Only time will tell.  Photosynth has a pretty huge backing and push from Microsoft, and personally I’m a huge fan.  On the other hand, panoramas have a big push as well from companies like EveryScape and Google.

One way to at least start a conversation about which one’s better is to list a set of requirements (i.e. “things you want”), weigh which one is more important, and rate how well either one does as a weighted sum (yes, I’m a f!@# nerd).

So what are these requirements?  Here’s what I think for starters:

Technical:

  • scalable
  • distributable
  • maintainable
  • extensible

Qualitative:

  • believable
  • photorealistic
  • interactive
  • immersive

Web 2.0:

  • annotatable
  • searchable
  • sharable
  • personalization-able

Operational:

  • cost
  • time
  • resources needed

Business:

  • how do you make $ from this?
  • how do you get more users?
  • what are some applications?

UI/UX:

  • how intuitive and simple to use is it?
  • can my parents use it?
  • how helpful is it?

I’ve mentioned some of them in my talk at ETech 2009 and Where 2.0 2008.  The list may vary depending on specific applications, but for general platforms it’s a decent list, IMO.
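Since I brought up a weighted sum, here is a minimal sketch of what that scoring could look like for the four technical requirements. Every number in it is a placeholder I made up to show the mechanics; the weights, the 0 to 10 ratings, and the function name are assumptions, not results.

```python
# Hypothetical importance weights for the technical requirements.
WEIGHTS = {"scalable": 5, "distributable": 3, "maintainable": 3, "extensible": 2}

# Hypothetical 0-10 ratings per method; placeholders, not measurements.
RATINGS = {
    "panoramas": {"scalable": 8, "distributable": 7, "maintainable": 7, "extensible": 6},
    "photosynth": {"scalable": 5, "distributable": 6, "maintainable": 6, "extensible": 7},
}


def weighted_score(ratings, weights):
    """Weighted average of the ratings, with weights acting as importance."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


if __name__ == "__main__":
    for method, ratings in RATINGS.items():
        print(f"{method}: {weighted_score(ratings, WEIGHTS):.2f}")
```

Dividing by the total weight keeps the score on the same 0 to 10 scale as the ratings, so adding more requirements later does not inflate the numbers. The other categories (qualitative, Web 2.0, operational, and so on) would just be more keys in the same dictionaries.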

Did I miss anything?  Please let me know, and stay tuned for the next set of blogs in this series.

Panoramic Equipment July 28, 2009

Posted by Mok Oh in Cameras, Photography, panorama.

I’ve been taking panoramic images for over 10 years, and I’ve used various gear for taking them: cameras, lenses, rotating heads, tripods, and GPS. Curious about what I use now? Here’s my list.

Lens

My lens of choice is the Sigma 8mm fisheye lens. In general, I prefer a fisheye lens since the field of view is very wide, i.e. I need to take fewer pictures to cover the full 360 x 180 degrees, which is faster.
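As a rough illustration of the “fewer pictures” point, here is a small sketch that estimates how many shots one horizontal ring takes for a given field of view. The field-of-view values and the 25% overlap are assumptions for illustration, not specs for any particular lens or camera, and the sketch ignores the extra shots for straight up and straight down.

```python
import math


def shots_per_row(hfov_deg, overlap=0.25):
    """Shots needed to cover 360 degrees horizontally.

    hfov_deg: horizontal field of view of one shot, in degrees
    overlap:  fraction of each shot shared with its neighbor (0 <= overlap < 1)
    """
    if hfov_deg <= 0:
        raise ValueError("field of view must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    new_coverage = hfov_deg * (1 - overlap)  # degrees each shot adds
    return math.ceil(360 / new_coverage)


if __name__ == "__main__":
    # Hypothetical fields of view: a normal-ish lens vs. a very wide one.
    for hfov in (60, 180):
        print(f"{hfov} degree lens: {shots_per_row(hfov)} shots per ring")
```

A narrower lens also needs several rings stacked vertically to reach 180 degrees, so the real gap between the two is bigger than this single-ring count suggests.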

The optics are quite good, and we’ve had very few of them fail. There’s some chromatic aberration, but typically stitching software takes care of that.

Sigma 8mm Fisheye Lens

Camera

My camera brand of choice is Canon.  We’ve tried Nikons but they failed a lot more for us under extreme conditions (ask me if you’re curious).  I currently use Canon T1i, which has a 1080p video recording capability. Awesome camera.

Canon T1i

Because the T1i is not a full frame digital SLR, when used with the Sigma 8mm, the circular fisheye image is cropped.  But I actually prefer the crop for better “resolution” of the scene.

Panoramic Tripod Head

My choice for panoramic tripod head is Nodal Ninja R1.  It’s light, compact, sturdy, and precise. Also, because the mount attaches to the ring-mounted lens (see images below), you don’t have to worry about messing up the focus of the lens — I initially had some trepidation about this, but not any more.

Nodal Ninja R1 Ring-Mounted Camera

So there.  What do you use to take your panoramas?  Care to share?

Arc de Triomphe Photography with My iPhone July 18, 2009

Posted by Mok Oh in Mobile, Photography, panorama, post processing.

One way to get more resolution or field of view is to create a panorama: take more photos and put them together. My previous two posts have been about this, and I am following up with a few more examples of the Arc de Triomphe in Paris.

As I’ve mentioned before, I used AutoStitch on my iPhone 3G S. Many of the panoramas were experiments in adding some time and positional elements, which resulted in pretty cool stitched photos.

To get what I call the time element, I stayed in the same place a few minutes waiting for dynamic elements of the scene to change — e.g. cars, people, clouds.  By doing this, things that are static remain more solid and things that move have a ghost-like quality to them.

To get what I call the positional elements, I tried to focus on a feature as I walked along a path.  In these examples, I focused on the Arc while moving towards it.  This tends to create an impressionist-painting-like effect.