Commoditization of Car-Mounted Immersive Imagery September 24, 2009
Posted by Mok Oh in Acquisition, Cameras. Tags: 2.5D, Acquisition, car-mounted, fisheye, panorama, Photography

PixKorea Car. They use digital SLRs with fisheye lenses. The mechanical rig can change heights. Cool.
Sorry it’s been a while since my last post. I’ve been busy with family, work, my dog ate my homework, had to wash my hair…
I was invited to give a talk at a conference in Seoul, Korea, called the National Spatial Data Infrastructure Expo 2009 (Sep. 9-11). I spoke about “How to Paint the World,” which stressed the importance of a framework for capturing, processing, storing and distributing photorealistic, immersive, interactive content of our world for various applications (e.g. local search). OK, that sounded fancier than it actually was (or perhaps more boring than it was?).
But that isn’t the gist of this blog. I wanted to write about how pleasantly surprised I was to see so many street-level, car-mounted camera acquisition systems on the show floor of the expo. I think I saw at least 5 companies doing that when I walked around half the show floor, with various configs and cameras.

This car uses Point Grey's Ladybug (red on top) as well as digital SLRs on the bottom. Not exactly sure why.

This car uses Point Grey's Ladybug and two GPS receivers to determine orientation. When asked how well that worked, the answer was ambiguous.
In general, I am seeing a bunch of companies being formed around the world that have a car-mounted system for street-level panoramic acquisition. I’m glad to see this, since it feels like another step towards this content type being useful and in demand. But ultimately, street content will be commoditized (even before it can be monetized — but that’s a whole other topic).
So, what does this mean? Well, it means the consumers win in the long run. It also means that the competition will hopefully improve the image quality of this exterior content (really led by Google Street View). The differentiation and innovation needed to win in a competitive market will push innovative minds to do a lot more than just display panoramas: enabling mashups and UGC, improving extensibility and maintainability, encoding a whole lot more geo info, getting INTERIORS (a-hem!), etc. will be necessary for survival. As I said, this should all be good for the consumers, if it pans out this way. Yay.
It’s still a bit early to tell who will win or lose, and how. And somewhat surprisingly (to a US-centric person), Google is not winning elsewhere around the world. Yay.
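One plausible way a rig like this gets orientation: put one antenna at the front of the roof and one at the back, and take the bearing between their two fixes. The sketch below is a minimal illustration under that assumption, using a flat-earth approximation that is fine over a meter or two; the function names are hypothetical, not anything the vendor described. It also hints at why the answer might have been ambiguous: if each fix carried an independent error of a couple of meters over a baseline of about 1.5 m, the heading could be off by tens of degrees. In practice much of that error is shared by both receivers and cancels, and dual-antenna heading systems generally rely on carrier-phase measurements to get useful accuracy.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def heading_from_two_fixes(lat_rear, lon_rear, lat_front, lon_front):
    """Heading in degrees clockwise from true north, rear antenna -> front antenna.

    Uses a local flat-earth (equirectangular) approximation, which is
    adequate for antennas only a meter or two apart.
    """
    lat0 = math.radians((lat_rear + lat_front) / 2.0)
    d_north = math.radians(lat_front - lat_rear) * EARTH_RADIUS_M
    d_east = math.radians(lon_front - lon_rear) * EARTH_RADIUS_M * math.cos(lat0)
    return math.degrees(math.atan2(d_east, d_north)) % 360.0

def worst_case_heading_error_deg(position_error_m, baseline_m):
    """Rough bound: a sideways position error e over a baseline b tilts the heading by atan(e / b)."""
    return math.degrees(math.atan2(position_error_m, baseline_m))
```

For example, `worst_case_heading_error_deg(2.0, 1.5)` is about 53°, while a 2 cm (carrier-phase-class) error over the same baseline gives well under 1°.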
You Know What Really Grinds My Gears? Augmented Reality! September 3, 2009
Posted by Mok Oh in augmented reality. Tags: augmented reality

AR really grinds my gears!
My last post was more about being rational, but this post is more about being raw and emotional. I wanna make two points that really grind my gears about the so-called “Augmented Reality” apps we’re seeing these days.
1. What really grinds my gears about “Augmented Reality”: It really fucking SUCKS!
There, I said it. By “Augmented Reality,” I mean those iPhone and Android apps I mentioned in the previous blog, in their current incarnation. And if you really think they’re useful, I will respectfully argue you’re full of shit. Get a REAL use case, and try a comparison with Google Maps. If you still think those “Augmented Reality” apps are more useful, then I’d again respectfully argue that you are a delusional fuck.
Please don’t get me wrong — I’d *love* to see those guys succeed. I’m just saying there’s a lot of work to be done on these apps in their current state, and they really need to differentiate their functionality from all the other local search and mapping applications beyond the video-overlay eye candy.
2. What really grinds my gears about “Augmented Reality”: HYPE will KILL the industry.
Just like “Artificial Intelligence” and “Virtual Reality,” or any other technical buzzwords that were hyped waaaaaaay beyond their technological capabilities, the current AR hype will kill the future of the AR industry.
Let’s take a look at what happened to Artificial Intelligence (AI). The term “AI” was coined in the mid-’50s, and from then on the field was researched by amazingly brilliant minds. For one reason or another, too much hype followed too much funding, and eventually too many of the promises and too much of the vision could not be realized, even today. It’s cuz the vision was great, but the technology didn’t exist yet!
Nowadays, we see AI-inspired applications everywhere, e.g. Pandora’s music recommendations, Amazon’s “you might also like,” and even facial recognition algorithms — these are in one form or another inspired by AI. But the problem is no one uses the word “AI” anymore. In fact, some avoid it like the plague.
AI is still not realized today according to Isaac Asimov’s definition. But this does not mean AI-inspired technologies aren’t useful. In fact, they are. I would even venture to say that if the hype had been minimized and expectations set properly, perhaps there would be a larger overall stream of funding to advance these technologies much beyond what we have today. (Rule of thumb: If you hear, “but we should be able to do that in 10 years,” then, shit, you ain’t got no solution.)
Similarly, AR is not realized today as defined by William Gibson or Bruce Sterling. But we should be able to do this in 10 years, right? I wouldn’t bet on it. Gibson and Sterling are futurists — they can beautifully write scenarios and use cases that are really quite useful and believable for the future. And these use cases really should drive technology to make our lives better. BUT that doesn’t mean that these technologies CAN be realized.
I would argue that the forefathers of AR did and do have the right idea (pls read the last blog post). I still think we need to continue to expand on vision algorithms (e.g. image tracking, image detection/recognition, etc.) and couple them with other sensors (e.g. Wifi, RFID, Bluetooth, accelerometers, gyros, GPS, compasses, etc.) to more precisely tell people what they’re seeing in an interactive and augmented sense.
The level of precision provided by current apps is good from a mapping perspective (i.e. the 2D “aerial” view), but not good enough from a first-person ground perspective. (I will definitely write another blog post on the technical shortcomings.)
I think that AR was overhyped many years ago, and I don’t want to see any more overhyping today or in the future. Perhaps we need to reset people’s expectations somehow, or rebrand the term as something else. Because I really do think that there’s plenty of use for AR-inspired technologies as defined by the Layars and Wikitudes of the world.
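To make "couple that with other sensors" a bit more concrete, here is one common, textbook way to blend two of them: a complementary filter that trusts the gyro for fast, smooth changes and lets the compass slowly correct the gyro's drift. This is a minimal sketch, not something the apps discussed here are known to use; the function names and the 0.98 weight are illustrative assumptions.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def fuse_heading(prev_heading, gyro_rate_dps, compass_heading, dt, alpha=0.98):
    """One step of a complementary filter for heading (degrees clockwise from north).

    The gyro rate is integrated for short-term changes; a small fraction
    (1 - alpha) of the compass disagreement is blended in each step to
    cancel long-term gyro drift. Wrapping makes 350 -> 10 a 20-degree turn.
    """
    predicted = prev_heading + gyro_rate_dps * dt
    correction = wrap_deg(compass_heading - predicted)
    return (predicted + (1.0 - alpha) * correction) % 360.0
```

Called once per sensor sample (say every 0.1 s), a noisy compass gets smoothed while a drifting gyro gets pulled back on course. The weight trades responsiveness against noise.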
Let’s not throw the baby out with the bath water.
Shit, I believe in AR. Just don’t fucking kill it… (Sorry about my fucks and shits. I told you it was going to be emotional..)
Is That *Really* Augmented Reality? August 23, 2009
Posted by Mok Oh in augmented reality. Tags: augmented reality, Layar, Wikitude

Augmented Reality?
(This is a series in Augmented Reality and you can find Part 1 here.)
Historically, when someone used the words “Augmented Reality” (AR), they typically involved HUDs or HMDs, and a bunch of very expensive hardware that produced accurate results. Oh, and everyone owned one of these… NOT.
Recently, AR has been getting quite a bit of buzz thanks to mobile devices, such as iPhones and Android devices. “But why,” you might ask? Many use cases of AR need mobility — from a simple question like, “What is that I’m seeing?” to “Where is X?” these questions in most cases require the user to be on the road. Smartphones these days meet many of the needs of these use cases in the palm of your hand. (And getting to my point,) applications such as Layar and Wikitude are being touted as “augmented reality” browsers.
But, is that reeeeeeally AR?
Many of the folks who’ve researched and invented AR might say this: “THAT IS NOT AUGMENTED REALITY!!!”
By definition, AR integrates or “blends” virtual computer graphics objects with the real world on your visual device, and it displays them in real time. But perhaps more importantly,
“Augmented reality does not simply mean the superimposition of a graphic object over a real world scene. This is technically an easy task. One difficulty in augmenting reality, as defined here, is the need to maintain accurate registration of the virtual objects with the real world image.”
(From Jim Vallino’s webpage.)
So, what exactly is the issue with Layar and Wikitude’s apps that does not make them AR? By using the smartphone’s camera, they can superimpose crude but virtual overlays in real time. Isn’t that AR?
The problem lies in the accurate registration of the virtual objects with the real-world image.
Without going into too much detail (and being nerdy), neither Layar nor Wikitude uses the visual information (the video stream) to accurately register and display virtual objects with the real-world image. In fact, the video streams are not analyzed at all to determine anything. They don’t track or use image recognition or vision algorithms to tell you what you are seeing. They are simply using the device’s GPS and compass to determine where you are and which way you’re facing. Once they know that, they find out what’s nearby and display a “dot” on top of the video stream to tell you the general direction, distance, and description of that place. In fact, if you turn the video stream off and the AR “layer” on, it would essentially still work. Functionally, there’s not much difference between this and Google Maps on your mobile phone. (Well, I think Google Maps is much more useful.)
So, when you are a block away from a POI (point of interest), they generally give you a good-“ish” estimate. But beyond that, they are subject to GPS errors, magnetic interference on the compasses, and a database that might be wrong or out of date.
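The GPS-plus-compass approach described above fits in a few lines: compute the distance and bearing from your fix to each POI, compare that bearing with the compass heading, and drop a dot at the matching horizontal position. This is a minimal sketch of the general technique, not Layar's or Wikitude's actual code; the 60° horizontal field of view and 320-pixel preview width are hypothetical defaults. Note that the camera frame never appears as an input.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine) and initial bearing in degrees from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    return dist, bearing

def screen_x(bearing, compass_heading, fov_deg=60.0, width_px=320):
    """Horizontal pixel for the POI 'dot', or None if the POI is outside the camera's view."""
    offset = (bearing - compass_heading + 180.0) % 360.0 - 180.0  # signed angle, -180..180
    if abs(offset) > fov_deg / 2:
        return None
    return round(width_px / 2 + offset / fov_deg * width_px)
```

A POI due east of you while you face 90° lands at the center of a 320-pixel preview (x = 160); one 15° to the right lands at x = 240.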
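A quick back-of-the-envelope check shows why those errors hurt more at street level than on a map. The numbers here are illustrative assumptions, not measurements: a 10° compass error from magnetic interference, a 60° field of view on a 320-pixel-wide preview, and a 5 m GPS error.

```python
import math

def lateral_offset_m(distance_m, heading_error_deg):
    """How far sideways a marker lands from the real object, given a compass heading error."""
    return distance_m * math.tan(math.radians(heading_error_deg))

def pixel_offset(heading_error_deg, fov_deg=60.0, width_px=320):
    """The same heading error in screen pixels (linear approximation across the field of view)."""
    return heading_error_deg / fov_deg * width_px

def bearing_error_from_position_deg(position_error_m, distance_m):
    """Worst-case bearing error to a POI caused by a sideways GPS position error."""
    return math.degrees(math.atan2(position_error_m, distance_m))
```

Under these assumptions, a 10° compass error puts the dot about 17.6 m to the side of a building 100 m away, roughly 53 pixels off on screen. A 5 m GPS error skews the bearing to a POI 20 m away by about 14°, but by only about 1.4° for one 200 m away. Roughly speaking, the closer you are, the more GPS error dominates.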
So, I ask you, is this really AR? I think not — not in the “classical” sense, at least.
Now, let’s flip the coin and see the other side.
The term AR has been around for a couple of decades (and the idea, even longer). But it really hasn’t made that much progress over the decades, in the sense that it really hasn’t influenced our lives. Do you experience AR technology in your day-to-day? Most of us don’t (unless you are a Top Gun flying multi-million-dollar airplanes).
So, where did it go wrong?
Here’s my guess: It focused too much on accuracy of registering virtual objects to the real world.
The same freaking problem.
Let’s take a simple example. I want to place my can of virtual Diet Coke on any surface that I see (it could be curved, like on top of my car). So, I take my smartphone, point it at some surface, and click on the “place can” button. And my virtual Coke can is placed accurately as I’ve indicated. Should be simple, right? Wrong.
This scenario is quite difficult technically. In fact, I’d say it’s not possible yet in general. For this to happen, your smartphone (or even a supercomputer) needs to first understand the geometry of the surface. Just from a single video stream, it’s hard to robustly compute this in general. You either need multiple cameras to determine the geometry via stereo (or to move around the (static, diffuse, and simple) surface sufficiently), or laser scan it, or have some image understanding. None of these is robust enough for our everyday use yet, and some are potentially decades away (unless it’s a very specific use case — you are a Top Gun looking for enemy airplanes, flying multi-million-dollar hardware with top-of-the-line everything). There are applications out there that use markers or fiducials to tell you the orientation and scale of a flat surface, but you can’t have these markers everywhere. (This topic really is another blog. I will talk more about this, since it actually is quite interesting to see these apps popping up more and more.)
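To see why the stereo route is still hard, look at the textbook relation it rests on: with two calibrated, rectified cameras, the depth of a point follows from how far it shifts between the two images (its disparity). This is a minimal sketch; the focal length, baseline, and disparities in the usage note are made-up numbers. The genuinely hard part, finding the correct match for every pixel on a curved, shiny, or textureless surface, is exactly what this snippet does not do.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at infinity or a bad match)")
    return focal_px * baseline_m / disparity_px
```

With a hypothetical 800-pixel focal length and a phone-sized 10 cm baseline, a 20-pixel disparity means 4.0 m. Mismatch by a single pixel (19 or 21) and the estimate jumps to about 4.21 m or 3.81 m, enough to float the virtual can off the car roof.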
At this rate, AR technology may never come to the consumer market if accuracy is the gating issue.
So, going back to my original question, I ask again: Are these mobile “AR” applications really AR? I would still say no, with a caveat that they have the right idea.
I think the right idea is not to place too much focus on accuracy but on use cases that influence our daily lives. Don’t worry too much about the brain surgery use case, where 1mm accuracy matters. When the constraints are relaxed, more solutions arise. Using other sensors combined with the visual sensor, such as GPS, compass, accelerometers, gyros, RFID, Wifi, markers, etc., I think AR can actually start tackling the consumer markets (i.e. the long tail), and has the potential to come up with a killer app for this tech (which really is sorely lacking).
What if every POI had a unique sensor that broadcast what and where it was? E.g. every Starbucks could have short/medium/long-range sensors to tell you where it was. It could even be applicable to dynamic things, such as your car, luggage, pets, and every inventory item out there. It doesn’t have to be exact. Just needs to work. (And these are different!)
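How would a phone tell how close it is to one of those beacons? One common approach is to estimate distance from received signal strength (RSSI) with the log-distance path-loss model. This is a minimal sketch; the calibration constants (expected RSSI at 1 m, path-loss exponent) and the 1 m / 10 m bucket thresholds are assumptions. Real readings are noisy enough that coarse buckets are about all you can trust, which fits the "doesn't have to be exact" point.

```python
def rssi_to_distance_m(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Rough distance from received signal strength (log-distance path-loss model).

    tx_power_dbm is the expected RSSI at 1 m; the exponent is about 2 in free
    space and usually higher indoors. Both defaults are assumptions.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

def proximity_bucket(distance_m):
    """Map a noisy distance estimate to coarse short/medium/long ranges."""
    if distance_m < 1.0:
        return "short"
    if distance_m < 10.0:
        return "medium"
    return "long"
```

With these defaults, a reading of -79 dBm maps to about 10 m ("long"), and -50 dBm maps to about 0.35 m ("short").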
So, I say to researchers and inventors — SCREW ACCURACY (for now)! Focus on what will make a difference in our lives! And make something that works!
And to Layars and Wikitudes of the world — keep going, and don’t forget to push innovation.
Panoramas vs. 3D (Part 1): Introduction August 16, 2009
Posted by Mok Oh in 2.5D, 3D, Photography, mirror worlds, panorama. Tags: 2.5D, 3D, EveryScape, Google Earth, Microsoft Bing, panoramas
Wade Roush made a great comment in Panoramas vs. Photosynth Part 4, not to forget that there are forces in 3D happening as well. This blog is inspired by Wade. Thanks, Wade!
Now, imagine an online web representation of our world in 3D… You type in maps.bing.com to get to a place you are interested in. You find a map, and you zoom down close to see the street. Then you click on the “3D” viewing mode. You zoom in some more. Then you land on the street… a photorealistic, real-time 3D street. You can smoothly walk around, not just in the streets but on the sidewalk as well. Hell, you can fly if you want to. You find the store you like, and you navigate inside. When you do, you are greeted by the store owner’s avatar saying hello. And there’s actually a person behind the avatar. You ask your questions about the store, whether they have what you are looking for. You walk around to browse what they have. Then you walk out and zoom to your favorite restaurant to see how crowded the place really is currently… And cut. (Or wake up.)
I think this scenario, or something similar, has a real possibility in the future. Assuming this can happen, then the question is, how far off are we? It feels to me at least a decade away. Most likely more.
Now let’s get back to reality and talk about what we have today. We have Google Earth and Microsoft Bing, which have a 3D representation of our world. There are other wonderful technologies, like C3 Technologies, but they have not yet proven to be scalable (by “proven,” I mean published and with significant coverage). It really is amazing to see cities like New York in full 3D glory, BUT from a “bird’s eye” view. At a thousand feet above ground, these cities look amazingly real.
At a ground level, not so much. (See below for a comparison.)
So, another relevant question is: What is it going to take to create a believable 3D on the web at the ground level? I think quite a bit. (To be discussed in the following blogs.)
From a bird’s eye point of view, the level of detail required to make a person believe what she’s seeing is “real” is much lower than at ground level. Up there, the buildings are more or less boxes with photo textures. (Don’t get me wrong — the feat accomplished by Google Earth and Microsoft Bing is incredible.) But at ground level, I would argue that there is an order of magnitude more 3D shit you gotta model to convince people that you are actually at 43rd and 5th in New York City, or Times Square, or the Champs-Élysées. You have all the street-level things that need to be represented realistically — people, cars, trees, news stands, lamp posts, signs, etc. All these have to be there just like they are in real life to convince folks that they’re really there.
So, for the foreseeable future, we have panoramas and UGC photographs to represent our real world, aka mirror world, a la Google Street View and EveryScape. These are what I call 2.5D representations, since they are not quite 3D, but more than 2D.
This series of my blog sounds like fun to write. Want more? Please let me know.
Panoramas vs. Photosynth: Qualitative Comparison August 13, 2009
Posted by Mok Oh in 2.5D, panorama. Tags: EveryScape, Google, Microsoft, panoramas, photosynth, qualitative comparison
This is Part 4 of the Panorama vs. Photosynth comparison. You can find the other parts here:
In this section, we compare some of the qualitative characteristics of panoramas and photosynths.
Before I move on, though, many have asked, “How about combining both?” And the answer is an enthusiastic Abso-f’in-lutely! I think that is the ultimate goal for many. I really do think there is a “best of both worlds” solution. In fact, I would argue that it is imperative to combine both to capture the world. With that said, let’s move on.
I think it makes sense to think about the qualitative aspects of this comparison in these steps: Acquisition, processing, viewing, user experience (UX) and sharing.
Acquisition
Man, oh man.. Photosynth soooo has the right idea here. Let users take photos with whatever cameras they have, put them into a “bucket” (a.k.a. the computer), and let it sort them out. Only a little know-how about overlapping the pictures is necessary to get your synths going. HOLY SHIT this is powerful. I saw Noah Snavely give this talk at SIGGRAPH 2006 and was freakin’ blown away. The idea of using UGC photos to make sense of this was mind-blowing for me.
On the other hand, panoramas are not as easy to acquire. I’ve been taking panoramas for about a decade now, and it still kinda sucks how much work it demands of the user.
Panorama 0, Photosynth 1
Processing
Processing for photosynth is quite easy. You go to Photosynth.net, upload your photos, and wait a few minutes. It’s all web based so no other software bit is necessary.
For panoramas, you need stitching software, e.g. PTGui, PanoTools, EasyPano. Some are free, and most are cheap. These tools have become quite easy to use over the years — you just have to load the photos, press a button to stitch, and wait a few minutes.
Here’s the dicey part. In photosynth, there’s no easy way to control the processing. If you took some pictures and there’s not sufficient overlap, or the computer can’t find common-enough similarities between the photos, then you’re kinda screwed. You can’t explicitly tell the computer, “Hey stupid, this picture belongs here, and that belongs there.” Often the computer algorithm gets it wrong, and the only thing you can do is live with it or remove the photo (which can subsequently mess other things up).
In panoramas, the user can have finer controls by explicitly giving it guidance. But this notion of what we call “correspondence” is not always an intuitive thing for some.
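Those manually picked control points are just point correspondences, and it helps to see why a handful of them is enough. Assuming a pinhole (non-fisheye) camera rotating about its no-parallax point, two overlapping photos are related by a 3x3 homography, and four or more correspondences pin it down. Below is a minimal NumPy sketch of the standard direct linear transform (DLT). Real stitchers such as PTGui also model lens distortion and optimize over all images at once, so treat this as an illustration of what correspondences buy you, not how any particular tool works. For real pixel coordinates you would normally normalize the points first (Hartley normalization) for numerical stability.

```python
import numpy as np

def homography_from_control_points(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the direct linear transform.

    src, dst: (N, 2) arrays of matching pixel coordinates, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 matching point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H, converting to and from homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

A badly placed control point skews this fit. That is why the extra control panorama stitchers give you is both a blessing and, for some users, a chore.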
User control vs. simplicity.. Hm. Which one wins? Call it a draw.
Panorama 0, Photosynth 1 (same score as before).
Viewing
Viewing-wise, I would give panoramas the advantage, just because the “standard” viewer now uses Flash, which I think is a huge f’in deal! Photosynth uses Silverlight. Flash has about 97% penetration; Silverlight has about 33%. This means that Photosynth works on only about a third of computers. That sucks. Even with the Microsoft brand, very few people wanna download any plugins. Users expect things to just work.
Panorama 1, Photosynth 1.
UX
UX-wise, this is definitely subjective. I have my biases toward panoramas. And with Google really spreading the panoramic viewing experience with their Street View, panorama viewers, user interfaces, and experiences are becoming more common. Besides, panoramas feel a lot more natural when looking around.
Photosynth has a weird user experience for me. I cannot move around a space the way I want to — you are limited, in many ways, to where the photos are. It’s somewhat frustrating, since it really feels like a 3D environment (it actually is), but the limitations of movement (and that annoying auto-snapping to pics) seem to mis-set my expectations.
Due to my bias, I’m not scoring this part.
Panorama 1, Photosynth 1.
Sharing
Sharing is pretty easy for both (albeit somewhat limited).
Panorama 2, Photosynth 2.
Overall
It looks like a tie using this simplistic scoring system. What do you think?
Tribe Sourcing, Crowd Sourcing, and Automation August 10, 2009
Posted by Mok Oh in 2.5D, maps, mirror worlds, panorama. Tags: automation, crowd sourcing, droids, EveryScape, geotagging, google street view, maps, panorama, star wars, tribe sourcing
UPDATE: A friend of mine told me about an article in the New York Times about a floating drone the army’s working on! Automatic photo acquisition technology is closer than we think.
When you are mass producing something repeatedly, you absolutely want automation as much as possible. Machines typically produce less error and are more consistent than humans for very specific tasks. But when you can’t, for one reason or another, going for either crowd sourcing or tribe sourcing makes a heap of sense. This doesn’t mean that automation has no place — it means automation takes on a different role.
Let’s be a bit more specific.
In my context, I’m talking about taking ba-zillitons of photographs from a human perspective (as opposed to a satellite perspective) and making sense of it for users. There are companies like Google, EveryScape, Microsoft, Tele Atlas, and NavTeq, that go around photographing the world for online use. Let’s focus on photographic data collection as a “case study” for this blog.
So the right set of questions in this context may be: Can we automate the picture-taking process? If not, can we tap into the crowd? How about creating a tribe?

Star Wars Darth Maul Droids
Well, automating the photography of the entire world would be tough. One future solution could be to create lots of robots that walk, drive, or fly around acquiring and geotagging pictures. NERD ALERT! Remember Star Wars Episode 1, where Darth Maul sent out floating droids on Tatooine to find Queen Amidala? Something like that. Unfortunately, we don’t have these droids yet. (Can someone get on that???)
So, when automation isn’t possible, the next question is: Can we crowd source? I’m not certain we can crowd source this yet either, since car-mounted camera systems aren’t something we can buy at Best Buy. But I don’t think it’s that far off. We may see panoramic cameras mounted on taxis — what I call cab sourcing — for instance. There are a few logistical, business-related, and technical issues that need to be solved before this can happen, but why not?
Microsoft’s Photosynth harnesses the power of the crowd to make sense of a real place, but it’s yet to be seen that this technology can conquer the world. (In fact, I’m looking forward to a new research publication coming out this September).

Google Street View Car
Tribe sourcing is the viable solution for now — find leadership, enable and incentivize the tribe to go out and photograph the world according to some plan. Google has the cash to create (quite wonderfully equipped) cars and have folks drive around. EveryScape’s found a cost-effective solution for this to tribe source (a.k.a. The Ambassador Program).
Until automation can happen, photographic data collection happens with tribes (or crowds). Automation plays a role in that pictures are taken, geotagged, oriented, stitched, and processed automatically.
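Here's what the automatic "geotagged" step might look like when the camera and the GPS log independently: look up each photo's timestamp in the GPS track and interpolate between the two nearest fixes. This is a minimal sketch assuming synchronized clocks and a hypothetical (timestamp, lat, lon) track format. A production rig may also fuse wheel odometry or inertial data to bridge GPS dropouts, which this does not attempt.

```python
from bisect import bisect_left

def geotag(photo_time, track):
    """Interpolate a (lat, lon) for a photo timestamp from a time-sorted GPS track.

    track: list of (timestamp_seconds, lat, lon), sorted by timestamp.
    Straight-line interpolation in lat/lon is fine between fixes a second or so apart.
    """
    times = [t for t, _, _ in track]
    if not times or photo_time < times[0] or photo_time > times[-1]:
        raise ValueError("photo was taken outside the GPS track")
    i = bisect_left(times, photo_time)
    if times[i] == photo_time:
        return track[i][1], track[i][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    w = (photo_time - t0) / (t1 - t0)  # fraction of the way from fix i-1 to fix i
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)
```

Orientation can be attached the same way, for example by interpolating a logged heading. That brings along the usual wrap-around-at-360° caveat.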
Why Peleng Lens Sucks for Panoramas August 6, 2009
Posted by Mok Oh in Cameras, panorama. Tags: camera, fisheye, panorama, Peleng 8mm, Sigma 8mm, test
When taking images for panoramas, a lot of folks use circular fisheye lenses — the wide field of view means fewer pictures to cover the full 360 x 180 degrees, so you can acquire less and stitch faster.
I played around with various lenses on the market for panoramic photography, trying to balance price vs. quality. One of my favorites is the Sigma 8mm, but unfortunately, Sigma is really hiking up their prices — what used to cost around $500 is now up to ~$900 (MSRP $1230!). Can you say “exploitation?” Something’s going on. (If anyone knows what’s up here, pls comment!)
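The arithmetic behind "fewer pictures" is simple. As a rough rule of thumb, assume each shot must overlap its neighbor by a fixed fraction so the stitcher has something to match (25% here is an illustrative assumption, not a figure from any stitching tool). Then each new shot only adds its field of view minus the overlap.

```python
import math

def shots_per_row(lens_fov_deg, overlap=0.25):
    """Rough count of shots to go 360 degrees around, keeping a fractional overlap between neighbors."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = lens_fov_deg * (1.0 - overlap)  # new coverage contributed by each additional shot
    return math.ceil(360.0 / step)
```

With this rule, a lens covering 180° needs 3 shots around, a 90° lens needs 6, and a 40° lens needs 12. That's one row only. A lens covering less than 180° vertically also needs extra rows or separate zenith/nadir shots, which multiplies the gap further.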
Naturally, I looked for an alternative and found Peleng 8mm at less than half the price! But in testing it out, it ended up being not acceptable for professional panoramic photography.
Here is my main reason for saying this is unacceptable — two words: lens flare.
Notice the flares on the bottom right (from light source on upper left) and bottom left (from light source on upper right) of this image. The resulting panorama is below:
Now notice the flares creeping into the stitched panorama above on the bottom left. Although this issue can be “solved” by cropping tighter or Photoshopping, that then messes up enough things to the point that I would rather use a better (but more expensive) lens.
Other things like the lens not fitting in snugly to the camera, having a hard time taking the lens off, manual controls, etc. make me think that it’s not worth the savings for the risk of taking bad pictures for our customers.
Peleng, if you are reading, please let us know when you’ve improved and resolved some problems. Until then, we are sticking to Sigmas.
Anyone out there have a similar experience?
Panoramas vs. Photosynth (Part 3): Technical Characteristics August 4, 2009
Posted by Mok Oh in 2.5D, Photography, mirror worlds, panorama, post processing. Tags: EveryScape, Google, ground, Lady Bug, maps, panorama, Photography, Point Grey

Photographically capturing the WORLD!
This is Part 3 of this series. (At least I didn’t pull a Lucas and start with part 4.)
Let’s compare technical characteristics/requirements as I’ve mentioned in Part 1 (and pls read Part 2 as well).
- scalable
- distributable
- maintainable
- extensible
Again, there may be more and these are not orthogonal or exclusive of each other.
Scalability
Now, remember that the context in which I’m comparing these two “methods” is trying to photographically capture the entire world from a human-level POV! So imagine an online experience where you can go to a website (e.g. EveryScape or Google) and be able to walk around just like you were there. Yep. BIG idea.
So, being able to scalably capture, store, distribute, share, etc. the whole world is paramount. If you can’t do this, then game over, man.
Companies like EveryScape, Google, Earthmine, Mapjack, Immersive Media and a bunch of others have found a way to (cost-)effectively drive around cities with car-mounted cameras. EveryScape and Google in particular have done this scalably in multiple cities all around the world, with thousands of miles of coverage. (I’m sure there are others, but I haven’t seen this much of their content published yet.) I think this is proof enough for me to say that panoramic images can scalably cover the world.
Photosynth has not quite done this yet. I’ve seen a pretty extensive number of photographs used to represent a landmark or an area, but I have not yet seen an entire city done this way. There are lots of brilliant minds at this, I’m sure, and it does feel feasible. But if content publication is the standard…
Panorama 1, Photosynth 0.
Distributability
By this, I mean folks online can easily view and experience the content. Again, going to everyscape.com or maps.google.com is proof enough. Using Flash (and Flash did “change the world” in this sense) or Silverlight, users can experience the content, and the backend seems to have been implemented well.
Oh BTW, SeaDragon’s f’in brilliant!
Panorama 2, Photosynth 1.
Maintainability
We live in a dynamic world. Things change all around us. Tomorrow, a Starbucks could turn into a Dunkin Donuts (yes, I’m from the east coast). By maintainability, I mean that these changes in the real world could easily be reflected in the mirror world online.
For any kind of change in the real world, we (EveryScape) have a “self-healing” backend, so the only real work is photo acquisition. Assuming all other car-mounted systems are similar, this is technically solved.
For Photosynth, it seems like a similar approach will work. Although there may be some ownership issues with Photosynth (if crowd sourced), it feels quite easy to make this assumption of maintainability.
Panorama 3, Photosynth 2.
Extensibility
Panorama 4, Photosynth 3.
Overview of Technical Characteristics
It seems like the main tech difference between Panoramas and Photosynth is scalability. One main issue with Photosynth is the image registration / pose estimation problem and how scalable it can be. Basically, for each image added to the synth, features are detected, then matched against the rest of the point cloud, then the relative camera extrinsics are computed. (Apologies for the tech lingo.) I’m not fully convinced that this is the way to go when scaling up to what I want (da world!). Perhaps supplementing the images with GPS and other sensors is a good way to solve this. BUT, if the philosophy for Photosynth is still automation, consumer cameras, and crowd sourcing, I’m not sure I quite believe in its scalability (yet).
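To put a rough number on the scaling worry: if nothing is known about where photos were taken, every photo is a candidate match for every other, so the number of image pairs to check grows quadratically. Below is a minimal sketch of how GPS tags could prune that search. It assumes positions already projected into local meters and uses a hypothetical search radius. It illustrates the general idea, not Photosynth's actual matching strategy.

```python
import math
from itertools import combinations

def all_pairs(n):
    """Image pairs to consider when nothing is known about where the photos were taken."""
    return n * (n - 1) // 2

def gps_candidate_pairs(positions, radius_m):
    """Only propose matching photos whose (x, y) positions, in meters, are within radius_m."""
    return [(i, j) for (i, p), (j, q) in combinations(enumerate(positions), 2)
            if math.dist(p, q) <= radius_m]
```

A thousand photos already means 499,500 blind pairs, and a million photos means roughly 5 x 10^11. With even coarse GPS, most of those pairs can be skipped because the photos are obviously too far apart to overlap.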
Is scalability issue overcome-able for Photosynth? I think yes. Just need to see it to believe it.
Panoramas vs. Photosynth (Part 2): What Are They? August 3, 2009
Posted by Mok Oh in 2.5D, Photography, panorama. Tags: comparision, entrance pupil, panorama, parallax, photosynth
This is a continuation from Panoramas vs. Photosynth Part 1. In Part 1, we discussed some characteristics that we may use to determine which one is better. But before we delve into that, let me describe a bit more about these methods of capturing our world.
Panoramas
Panoramas assume that pictures are taken from a common point in space (i.e. pivoting around the entrance pupil). That’s why panoramic heads (shown below) are important in minimizing errors caused by parallax, so we can stitch a nice panorama without many visual artifacts. Also, due to their immersive nature, many images are necessary to make a full 360 x 180 degree panorama. Many people resort to using wide-angle or fisheye lenses to reduce the number of images needed to cover all view directions.
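To get a feel for why the pivot point matters: if the rotation axis sits some distance away from the entrance pupil, rotating the camera also slides the pupil sideways. Near and far objects then shift by different angles, and no single alignment can match both. Below is a minimal sketch of a simplified worst-case geometric model that treats the pupil shift as perpendicular to the line of sight. The 5 cm offset and the distances in the usage note are hypothetical.

```python
import math

def parallax_misalignment_deg(pivot_offset_m, rotation_deg, near_m, far_m):
    """Approximate angular misalignment between a near and a far object after rotating the camera.

    Rotating by rotation_deg about an axis pivot_offset_m from the entrance pupil moves
    the pupil along a chord of length 2 * e * sin(theta / 2). Treating that shift as
    perpendicular to the line of sight (worst case), an object at distance d appears to
    move by about atan(shift / d); the near/far difference is what the stitcher can't fix.
    """
    shift = 2.0 * pivot_offset_m * math.sin(math.radians(rotation_deg) / 2.0)
    return math.degrees(math.atan(shift / near_m) - math.atan(shift / far_m))
```

Under this model, a 5 cm pivot error and a 60° turn misalign an object 1 m away against a background 100 m away by about 2.8°. The same error with nothing closer than 10 m gives only about 0.26°. This is why nodal-point fussiness matters most indoors and up close.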
Panoramas are amazing in their immersive nature — you really do feel like you are there. And the continuity of the experience is just fantastic in describing the scene captured. But the constraints on the acquisition process make it more difficult to capture than just taking regular pictures.
Another limitation is that the users cannot move around in space. But of course, I will argue that EveryScape has solved that.
Photosynth
Photosynth, on the other hand, has no such constraints when acquiring images. You don’t have to be anal about the “entrance pupil,” or the “no-parallax point,” or the “pivot point” or all that BS. Although technically rational, these constraints really do suck for the picture takers. (That’s why there are gazillions more regular photographers than panoramic photographers.) In taking pictures for Photosynth, there just needs to be sufficient visual overlap between them, such that the computer algorithm can automatically try to determine where the pictures were taken from. This also means that you can move around in space!
But oftentimes, using just pictures does suck, in the sense that users don’t quite feel immersed. Immersivity is one of the qualities I mentioned in Part 1. I was talking to a friend of mine a while back (@billwarner), and he said to me, “Don’t break reality.” In many ways, if you “break reality,” then you don’t gain as much confidence from your users about the space you are describing.
So, What Next?
There are pros and cons for both these methods and in the next part, we will start to compare them and grade them if we can.