Category Archives: Photography

The Indoor Mall Local Search Problem

I’m sitting in a pretty sizable mall (Natick Mall in MA near Boston) and thinking about how a mirror-world local search would work.  In other words, how could a user browse to the Natick Mall, walk around the space immersively, walk into stores, and shop like in real life?  How would EveryScape implement this?  How would Bing or Google do this?

Natick Mall Partial Panorama

I snapped a few photos and created a partial panorama above within a few minutes using my iPhone 4 and AutoStitch.  You can see more than a dozen shops from a simple image like this, and it can give you a pretty good insight into what the space is like.

How can I get the “crowd” to do this for me?  Will a Foursquare/Gowalla-like approach work?  How well is photo crowdsourcing working?  How do we solve the GPS problem indoors?


Panoramas vs. 3D (Part 1): Introduction

Google Earth St. Patrick's Cathedral, New York, NY, at a "bird's eye" view.

Wade Roush made a great comment in Panoramas vs. Photosynth Part 4: don’t forget that there are forces at work in 3D as well.  This blog is inspired by Wade.  Thanks, Wade!

Now, imagine an online web representation of our world in 3D…  You type in maps.bing.com to get to a place you are interested in.  You find a map, and you zoom down close to see the street.  Then you click on the “3D” viewing mode.  You zoom in some more.  Then you land on the street…  a photorealistic, real-time 3D street.  You can smoothly walk around, not just in the streets but on the sidewalk as well.  Hell, you can fly if you want to.  You find the store you like, and you navigate inside.  When you do, you are greeted by the store owner avatar saying hello.  And it’s actually a person behind the avatar.  You ask your questions about the store, if they have what you are looking for.  You walk around to browse what they have.  Then you walk out and zoom to your favorite restaurant to see how crowded the place really is currently…  And cut. (Or wake up.)

I think this scenario, or something similar, is a real possibility in the future.  Assuming this can happen, then the question is, how far off are we?   It feels to me like it’s at least a decade away.  Most likely more.

Now let’s get back to reality and talk about what we have today.  Google Earth and Microsoft Bing both have a 3D representation of our world.  There are other wonderful technologies, like C3 Technologies, but they have not yet proven to be scalable (by “proven,” I mean published with significant coverage).  It really is amazing to see cities like New York in full 3D glory, BUT from a “bird’s eye” view.  At a thousand feet above ground, these cities look amazingly real.

At a ground level, not so much.  (See below for a comparison.)

Photograph of St. Patrick's Cathedral, New York, NY

Google Earth St. Patrick's Cathedral, New York, NY

So, another relevant question is: What is it going to take to create a believable 3D on the web at the ground level?  I think quite a bit.  (To be discussed in the following blogs.)

From a bird’s eye point of view, the level of detail required to make a person believe what she’s seeing is “real” is much lower than at ground level.  Up there, the buildings are more or less boxes with photo textures.  (Don’t get me wrong: the feat accomplished by Google Earth and Microsoft Bing is incredible.)  But at ground level, I would argue that there is an order of magnitude more 3D shit you gotta model to convince people that they are actually at 43rd and 5th in New York City, or Times Square, or the Champs-Élysées.  You have all the street-level things that need to be represented realistically: people, cars, trees, news stands, lamp posts, signs, etc.  All these have to be there just like they are in real life to convince folks that they’re really there.

So, for the foreseeable future, we have panoramas and UGC photographs to represent our real world, aka mirror world, a la Google Street View and EveryScape.  These are what I call 2.5D representations, since they are not quite 3D, but more than 2D.

This series of my blog sounds like fun to write.  Want more?  Please let me know.


Panoramas vs. Photosynth (Part 3): Technical Characteristics

Photographically capturing the WORLD!

This is Part 3 of this series.  (At least I didn’t pull a Lucas and start with part 4.)

Let’s compare technical characteristics/requirements as I’ve mentioned in Part 1 (and pls read Part 2 as well).

  • scalable
  • distributable
  • maintainable
  • extensible

Again, there may be more and these are not orthogonal or exclusive of each other.

Scalability

Now, remember that the context in which I’m comparing these two “methods” is trying to photographically capture the entire world at a human-level POV!  So imagine an online experience where you can go to a website (e.g. EveryScape and Google) and walk around just like you were there.  Yep. BIG idea.

So, being able to scalably capture, store, distribute, share, etc. the whole world is paramount.  If you can’t do this, then game over, man.

Companies like EveryScape, Google, Earthmine, Mapjack, Immersive Media and a bunch of others found a way to (cost) effectively drive around cities with car-mounted cameras.  EveryScape and Google especially have done this scalably in multiple cities all around the world with thousands of miles of coverage.  (I’m sure there are others but I haven’t seen this much quantity of their content published yet.)  I think this is proof enough for me to say that panoramic images can scalably cover the world.

Photosynth has not quite done this yet.  I’ve seen a pretty extensive number of photographs used to represent a landmark or an area, but I have not yet seen an entire city done this way.  There are lots of brilliant minds at this, I’m sure, and it does feel feasible.  But if content publication is the standard…

Panorama 1, Photosynth 0.

Distributability

By this, I mean folks online can easily view and experience the content.  Again, going to everyscape.com or maps.google.com is proof enough.  Using Flash (and Flash did “change the world” in this sense) or Silverlight, users can experience the content, and the backend seems to have been implemented well.

Oh BTW, SeaDragon’s f’in brilliant!

Panorama 2, Photosynth 1.

Maintainability

We live in a dynamic world.  Things change all around us.  Tomorrow, a Starbucks could turn into a Dunkin Donuts (yes, I’m from the east coast).  By maintainability, I mean that these changes in the real world could easily be reflected in the mirror world online.

For any type of change in the real world, we (EveryScape) have a “self-healing” backend, so the only real work is photo acquisition.  Assuming all other car-mounted systems are similar, this is technically solved.

For Photosynth, it seems like a similar approach will work.  Although there may be some ownership issues with Photosynth (if crowd sourced), it feels quite easy to make this assumption of maintainability.

Panorama 3, Photosynth 2.

Extensibility

Panorama 4, Photosynth 3.

Overview of Technical Characteristics

It seems like the main tech difference between Panoramas and Photosynth is scalability.  One main issue with Photosynth is the image registration / pose estimation problem and how scalable this can be.  Basically, for each image added to the synth, features are detected, then matched against the rest of the point cloud, then the relative camera extrinsics are computed.  (Apologies for the tech lingo.)  I’m not fully convinced that this is the way to go when scaling up to what I want (da world!).  Perhaps supplementing the image with GPS and other sensors is a good way to solve this.  BUT, if the philosophy for Photosynth is still automation, consumer cameras, and crowd sourcing, I’m not sure I quite believe in scalability (yet).
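To put rough numbers on the registration worry, here is a back-of-the-envelope sketch in Python.  It assumes the most naive strategy, where every photo is feature-matched against every other photo, and a hypothetical GPS shortcut where each photo is only matched against a fixed number of nearby photos.  Both are illustrative assumptions, not a description of how Photosynth actually works, and the function names are made up for this example.

```python
def naive_pair_count(num_images: int) -> int:
    """Image pairs to match if every photo is compared with every other photo."""
    return num_images * (num_images - 1) // 2


def gps_pruned_pair_count(num_images: int, neighbors_per_image: int) -> int:
    """Upper bound on pairs if GPS limits each photo to a few nearby candidates.

    neighbors_per_image is a hypothetical tuning knob, not a known Photosynth setting.
    """
    return min(num_images * neighbors_per_image, naive_pair_count(num_images))


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, naive_pair_count(n), gps_pruned_pair_count(n, 50))
```

The naive count grows with the square of the number of photos, while the GPS-pruned count grows linearly, which is why extra sensors look attractive at world scale.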

Is the scalability issue overcome-able for Photosynth?  I think yes.  Just need to see it to believe it.


Panoramas vs. Photosynth (Part 2): What Are They?

This is a continuation from Panoramas vs. Photosynth Part 1.  In Part 1, we discussed some characteristics that we may use to determine which one is better.  But before we delve into that, let me describe a bit more about these methods of capturing our world.

Panoramas

Panoramas assume that pictures are taken from a common point in space (i.e. pivoting around the entrance pupil).  That’s why panoramic heads (shown below) are important in minimizing errors caused by parallax, so we can stitch a nice panorama without many visual artifacts.  Also, due to their immersive nature, many images are necessary to take a full 360 x 180 degree panorama.  Many people resort to using wide-angle or fisheye lenses to reduce the number of images necessary to cover all view directions.
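As a rough illustration of why wider lenses mean fewer shots, here is a small Python sketch that estimates how many pictures one horizontal row needs to wrap a full 360 degrees.  The field-of-view values and the 25% overlap are illustrative assumptions, not measurements of any particular lens, and a narrow lens would also need extra rows to cover the full 180 degrees vertically.

```python
import math


def shots_per_row(horizontal_fov_deg: float, overlap_fraction: float = 0.25) -> int:
    """Estimate the shots needed for one full 360-degree row.

    Each new shot only contributes the part of its field of view that does
    not overlap the previous shot, so the usable angle per shot shrinks.
    """
    if horizontal_fov_deg <= 0:
        raise ValueError("field of view must be positive")
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    usable_deg = horizontal_fov_deg * (1 - overlap_fraction)
    return math.ceil(360 / usable_deg)


if __name__ == "__main__":
    print(shots_per_row(60))   # a moderately wide lens (assumed 60 degrees)
    print(shots_per_row(180))  # a fisheye-like lens (assumed 180 degrees)
```

With these assumed numbers, the 60-degree lens needs 8 shots per row and the 180-degree lens needs 3.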

Panoramic tripod head

Panoramas are amazing in their immersive nature — you really do feel like you are there.  And the continuity of the experience is just fantastic in describing the scene captured.  But the constraints on the acquisition process make it more difficult to capture than just taking regular pictures.

Another limitation is that the users cannot move around in space.  But of course, I will argue that EveryScape has solved that.

Photosynth

Photosynth, on the other hand, has no such constraints when acquiring images.  You don’t have to be anal about the “entrance pupil,” or the “no-parallax point,” or the “pivot point” or all that BS.  Although technically rational, these constraints really do suck for the picture takers.  (That’s why there are gazillions more regular photographers than panoramic photographers.)  In taking pictures for Photosynth, there just needs to be sufficient visual overlap between them, such that the computer algorithm can automatically try to determine where the pictures were taken from.  This also means that you can move around in space!

But oftentimes, using just pictures does suck in the sense that users don’t quite feel immersed.  Immersivity is one of the qualities I mentioned in Part 1.  I was talking to a friend of mine a while back (@billwarner), and he said to me, “Don’t break reality.”  In many ways, if you “break reality,” then you don’t gain as much confidence from your users about the space you are describing.

So, What Next?

There are pros and cons for both these methods and in the next part, we will start to compare them and grade them if we can.


Panoramic Equipment

I’ve been taking panoramic images for over 10 years, and I’ve used various gear along the way: cameras, lenses, rotating heads, tripods, and GPS.  Curious about what I use now?  Here’s my list.

Lens

My lens of choice is the Sigma 8mm fisheye.  In general, I prefer a fisheye lens since the field of view is very wide, i.e. I need to take fewer pictures to cover the full 360 x 180 degrees, which is faster.

The optics are quite good, and we’ve had very few of them fail.  There’s some chromatic aberration, but typically stitching software takes care of that.

Sigma 8mm Fisheye Lens

Camera

My camera brand of choice is Canon.  We’ve tried Nikons but they failed a lot more for us under extreme conditions (ask me if you’re curious).  I currently use a Canon T1i, which has 1080p video recording capability.  Awesome camera.

Canon T1i

Because the T1i is not a full frame digital SLR, when used with the Sigma 8mm, the circular fisheye image is cropped.  But I actually prefer the crop for better “resolution” of the scene.

Panoramic Tripod Head

My choice for a panoramic tripod head is the Nodal Ninja R1.  It’s light, compact, sturdy, and precise.  Also, because the mount attaches to the ring-mounted lens (see images below), you don’t have to worry about messing up the focus of the lens.  I initially had some trepidation about this, but not any more.

Nodal Ninja R1 Ring-Mounted Camera

So there.  What do you use to take your panoramas?  Care to share?


A9 and Streetside: Why Did They Fail?

Amazon A9 Maps

Before EveryScape and Google Street View existed (and yes, we were doing this before Google was), there were a couple of attempts at street-level photography by companies you might have heard of: Amazon and Microsoft.

Amazon had their A9 Block View (shown above) and Microsoft  had (has?) their Streetside.

My question is: Why did they fail?

In some sense, their intentions were the same as EveryScape’s and Google’s: to enable users to virtually see places, businesses, and points of interest from the comfort of their browser for various use cases and applications.

One obvious “feature” difference is that they did not use panoramic imagery.  One could argue that panoramic imagery is more immersive and experiential.

Does panoramic imagery make that much of a difference?  Isn’t one of the beauties of the Web that “keep it simple, stupid” wins?

Or are the users really looking for richer online experiences?  A better UI/UX (a la Apple and iPhone)?  Was their approach limiting feature-wise?

More questions than answers, unfortunately.  Facts or biases, your feedback is appreciated.


HDR Part 2: Exposure Fusion

This blog is the second part of the previous blog on high dynamic range imagery.

Exposure Fusion (a.k.a. Enfuse) does not use HDR.  But it is related in the sense that it uses multiple exposures to create a nice “fused” image.  (So technically, “part 2” is a bit misleading.)

Exposure Fusion was a paper by Mertens, Kautz, and Van Reeth in 2007, and you can learn more about the work here.  This technique basically bypasses HDR creation altogether to create a wonderfully fused image.

Let’s just briefly discuss some issues with HDR (I will discuss some benefits of HDR in the next blog).  HDR “assembly” takes quite a bit of processing time and the file sizes bloat up big time, which also means a longer time to load into any program like Photoshop to do anything to it.  From there, you typically end up tone mapping the image anyway.  And don’t get some folks started on the pain-in-the-ass-ness of tone mapping.  Yeah, it generally sucks when you end up doing a lot of them by hand.

Exposure Fusion basically says, “that’s bullsh!t!”  There’s no need to convert a bunch of files to something you won’t use, then have to convert again, only to spend the next 2 hours tweaking some parameters you don’t understand, that were named by some ivory-tower researchers (sorry guys ;-) ).  Exposure fusion just creates a wonderfully “fused” image from your multiple-exposure set, which is the part I really like.
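For the curious, here is a stripped-down Python/NumPy sketch of the core idea: weight each pixel of each exposure by how well exposed it is, then take the weighted average.  This is a simplification for illustration only.  It keeps just the “well-exposedness” weight from the paper (a Gaussian centered on mid-gray), and it drops the contrast and saturation weights and the multi-resolution pyramid blending that the real method and Enfuse use.  The function names are made up for this example, and images are assumed to be float arrays in the 0 to 1 range.

```python
from typing import Sequence

import numpy as np


def well_exposedness(image: np.ndarray, sigma: float = 0.2) -> np.ndarray:
    """Per-pixel weight: near 1 for mid-gray pixels, near 0 for clipped ones.

    image has shape (H, W, 3) with values in [0, 1]; the Gaussian is applied
    to each channel and the three results are multiplied together.
    """
    per_channel = np.exp(-((image - 0.5) ** 2) / (2 * sigma ** 2))
    return np.prod(per_channel, axis=-1)


def naive_fuse(exposures: Sequence[np.ndarray]) -> np.ndarray:
    """Blend exposures with normalized per-pixel weights (no pyramid blending)."""
    stack = np.stack(exposures)                                   # (N, H, W, 3)
    weights = np.stack([well_exposedness(img) for img in exposures])
    weights = weights + 1e-12                                     # avoid divide-by-zero
    weights = weights / weights.sum(axis=0, keepdims=True)        # sum to 1 per pixel
    return (weights[..., None] * stack).sum(axis=0)


if __name__ == "__main__":
    under = np.full((2, 2, 3), 0.05)
    mid = np.full((2, 2, 3), 0.50)
    over = np.full((2, 2, 3), 0.95)
    print(naive_fuse([under, mid, over])[0, 0])
```

Note that nothing here builds an HDR file or runs a tone mapper: the output stays in the same 0 to 1 range as the inputs.  A flat per-pixel blend like this can show seams on real photos, which is the problem the pyramid blending in the full method addresses.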

So, gettin’ down to brass tacks, if you have a hard time going from HDR, then back to LDR using some tone mapping operator that doesn’t understand you, then use Enfuse.  It’s one of the most consistent ways to create an image from multiple exposures.  And, it’ll save you time and lots of disk space.

One caveat is that Enfuse is a command line tool.  If you don’t like that, you can find some GUI wrapper programs out there (e.g. Bracketeer).


High Dynamic Range Imagery (part 1) — What the Heck Is It?

So, you’ve heard some folks talk about “high-dynamic range imagery” or HDR, and you think you sort of understand it, or not really?  Well then, I hope to de-mystify it for you in a series of blogs.

In my previous blog, I talked (or more like bitched) about why cameras suck; and one of my reasons was that they lacked sufficient dynamic range in capturing light.

“…, let’s talk about dynamic range.  This is the whole problem of the images above.  We’re stuck with only 0-255 per RGB channel.  This means that we need to describe the brightness of what we see, from dark shadows to sunlight, within the integer range of 0 to 255.  Even RAWs don’t cover it since the dynamic range needed to describe what we see could be 0-1,000,000.  Yes, cameras suck.  There is high-dynamic-range imagery, and I will talk more about that soon.”

As shown above, there is a series of photos that describes this problem.  In beach shot 1, you see that the exposure was very long, so most pixels are washed out, but you can still see some contrast in the dark parts of the palm trees as well as the dark shadows on the sand ridges.

As the series of images gets darker, e.g. beach shot 2 and beach shot 3, you can see the scenery much better, but the bright area around the sun is still too washed out.  By beach shot 6, we can see some outline of the sun better but everything else is now too dark.

Somewhere in this series of variously-exposed images lies the “right” answer for the composite image — this is the dynamic range problem.  Our eyes can see much better contrast than any of these camera shots (also because we can dynamically adapt better too, but that’s some other blog).  You can imagine manually photoshopping these images to get the solution image you want.  A more automated way is to create a single high-dynamic range image from this series of images, then tone map it.
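If you do want a taste of the automated route, here is a minimal Python/NumPy sketch of “merge, then tone map.”  It leans on some loud assumptions: the camera response is treated as perfectly linear (real cameras are not), pixel values are floats in the 0 to 1 range, the exposure times are known, and the tone mapper is the simplest Reinhard-style global curve.  The function names are made up, and this is not what Photoshop or Photomatix do internally.

```python
import numpy as np


def merge_to_radiance(exposures, exposure_times):
    """Estimate relative scene radiance from bracketed shots.

    Assumes a linear sensor: pixel value = radiance * exposure time, clipped
    to [0, 1].  Each shot votes with a "hat" weight that trusts mid-tones
    and ignores pixels that are fully black or fully blown out.
    """
    numerator = np.zeros_like(exposures[0], dtype=float)
    denominator = np.zeros_like(exposures[0], dtype=float)
    for image, seconds in zip(exposures, exposure_times):
        weight = 1.0 - np.abs(2.0 * image - 1.0)   # 0 at the extremes, 1 at mid-gray
        numerator += weight * image / seconds
        denominator += weight
    return numerator / np.maximum(denominator, 1e-8)


def reinhard_tone_map(radiance):
    """Squeeze unbounded radiance into [0, 1) with the curve L / (1 + L)."""
    return radiance / (1.0 + radiance)


if __name__ == "__main__":
    scene = np.array([0.1, 1.0, 10.0])             # made-up shadow, mid-tone, highlight
    times = [0.05, 0.5, 4.0]                       # made-up exposure times in seconds
    shots = [np.clip(scene * t, 0.0, 1.0) for t in times]
    radiance = merge_to_radiance(shots, times)
    print(radiance, reinhard_tone_map(radiance))
```

In this toy scene the highlight is blown out in the two longer exposures, yet the merge recovers it from the shortest one, which is the whole point of bracketing.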

Putting aside the technical lingo bullsh!t, I hope I’ve convinced you that there is a way to combine these images somehow to get the final image you want. (And I won’t bore you with the tech details either — if you must know, let me know pls.)

There are nice software products to do just this: Photoshop, Photomatix, and Enfuse.  There are more, but these are the ones I like.  (If you have your favorite, please comment and share!)

Beach shot "combined solution" using Photoshop

Beach shot "combined solution" using Photomatix

I’m not showing Enfuse just yet since it really isn’t HDR.  But I’m gonna stop right here for now, since the blog is getting too long.  I will talk more about Enfuse and more of HDRI-related issues in part 2 of this series.


Why Do Cameras Suck (Compared to Our Eyes)?

Why can’t cameras capture what we see the way we see it?  In most cases, my digital camera pictures don’t correspond well with what I am actually seeing.  For instance, when I look out the window, I can see the bright outside as well as the inside just fine with my own eyes.  BUT when I take a picture, I’m stuck with either a blown-out window or a dark interior.

Why the f#@! is this, you may ask?  More technically in this case, it’s because cameras lack the dynamic range of our eyes.  More universally, it’s because cameras suck compared to human eyes (and we don’t even have the best eyes in the animal kingdom!).

My point is that cameras have a long way to go before we can start capturing images that fall within the standards of what we actually see every day.  Isn’t that the point?  I feel like we got stuck with the limitations of technology, and seem to have forgotten the whole point of reproducing how and what we see.  Yeah, cameras are getting better all the time, but not fast enough and not close enough yet.

So what are some “parameters” of the camera we can improve?  There’s been a lot of research in optics, vision, perception, etc., but I’m writing a blog and not a research paper (thank goodness it won’t be boring — hopefully).  I’m going to only talk about the following parameters:

  • Resolution
  • Focal length
  • Dynamic range

There’s definitely more, and I found a nice link here with interesting data.

As for human eye resolution, it is about 600 megapixels.  Man, cameras are not even close to that!  Sure, you can create a panorama, but I’m talking about doing this in a single shot (or at 30 fps!).

Focal length wise, the article puts our eyes at about 16-22 mm.  Basically, our eyes can see a lot all around.  Do a simple test: put your hands out straight, wiggle your fingers, then start to move your arms to the opposing sides while looking straight.  I can see to about 180 degrees horizontally.

Finally, let’s talk about dynamic range.  This is the whole problem of the images above.  We’re stuck with only 0-255 per RGB channel.  This means that we need to describe the brightness of what we see, from dark shadows to sunlight, within the integer range of 0 to 255.  Even RAWs don’t cover it since the dynamic range needed to describe what we see could be 0-1,000,000.  Yes, cameras suck.  There is high-dynamic-range imagery, and I will talk more about that soon.
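To make that gap concrete, here is a tiny Python sketch that converts a contrast ratio into photographic “stops” (each stop is a doubling of light).  It assumes the 0-255 range is treated as 256 linear levels and the 0-1,000,000 range as a 1,000,000:1 ratio.  Real 8-bit images are usually gamma encoded, so treat the 8-stop figure as a rough guide rather than a measurement.

```python
import math


def stops(contrast_ratio: float) -> float:
    """Number of doublings (stops) needed to span a given contrast ratio."""
    return math.log2(contrast_ratio)


if __name__ == "__main__":
    print(f"8-bit channel: {stops(256):.1f} stops")
    print(f"1,000,000:1 scene: {stops(1_000_000):.1f} stops")
```

Under these assumptions, an 8-bit channel spans 8 stops while a 1,000,000:1 scene needs about 20.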

These things basically mean, to me anyway, that our typical camera lenses are not wide enough, and that we need a lot more resolution and a lot more dynamic range.

Are there more parameters we can hope for?  Of course!  What do you wish for in the next gen camera?


Arc de Triomphe Photography with My iPhone

One way to get more resolution or field of view is to create a panorama: take more photos and put them together.  My previous two posts have been about this, and I’m following up with a few more examples of the Arc de Triomphe in Paris.

As I’ve mentioned before, I used AutoStitch on my iPhone 3G S.  Many of the panoramas were experiments in adding some time and positional elements, which resulted in pretty cool stitched photos.

To get what I call the time element, I stayed in the same place a few minutes waiting for dynamic elements of the scene to change — e.g. cars, people, clouds.  By doing this, things that are static remain more solid and things that move have a ghost-like quality to them.
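To see why this happens, here is a toy Python/NumPy sketch that simply averages a stack of aligned frames.  A plain average is an assumption used as a stand-in; AutoStitch’s actual blending is more sophisticated, and the tiny one-row “image” is made up for illustration.

```python
import numpy as np


def average_frames(frames):
    """Blend aligned frames of the same scene by taking the per-pixel mean."""
    return np.mean(np.stack(frames), axis=0)


if __name__ == "__main__":
    frames = []
    for position in range(4):
        frame = np.full((1, 5), 0.2)   # dull background
        frame[0, 4] = 0.9              # a static landmark, present in every frame
        frame[0, position] = 1.0       # a bright "car" that moves one pixel per frame
        frames.append(frame)
    print(average_frames(frames))
```

The landmark pixel keeps its full value of 0.9, while the car is smeared into a faint 0.4 trail across the four pixels it passed through.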

To get what I call the positional elements, I tried to focus on a feature as I walked along a path.  In these examples, I focused on the Arc while moving towards it.  This tends to create an impressionist-painting-like effect.

