Tag Archives: Google

Panoramas vs. Photosynth: Qualitative Comparison

On Top of the Leaning Tower of Pisa

This is Part 4 of the Panorama vs. Photosynth comparison.  You can find the other parts here:

In this section, we compare some of the qualitative characteristics of panoramas and photosynths.

Before I move on, though, many have asked, “How about combining both?”  And the answer is an enthusiastic Abso-f’in-lutely!  I think that is the ultimate goal for many.  I really do think there is a “best of both worlds” solution.  In fact, I would argue that it is imperative to combine both to capture the world.  With that said, let’s move on.

I think it makes sense to think about the qualitative aspects of this comparison in these steps: Acquisition, processing, viewing, user experience (UX) and sharing.

Acquisition

Man, oh man.. Photosynth soooo has the right idea here.  Let users take photos with whatever cameras they have, put them into a “bucket” (a.k.a. the computer), and let it sort things out for them.  Only a little know-how about overlapping the pictures is necessary to get your synths going.  HOLY SHIT this is powerful.  I saw Noah Snavely give this talk at SIGGRAPH 2006 and was freakin’ blown away.  The idea of using UGC photos to make sense of this was mind blowing for me.

On the other hand, panoramas are not as easy to acquire.  I’ve been taking panoramas for about a decade now, and it still kinda sucks how much work it demands from the user.

Panorama 0, Photosynth 1

Processing

Processing for photosynth is quite easy.  You go to Photosynth.net, upload your photos, and wait a few minutes.  It’s all web based so no other software bit is necessary.

For panoramas, you need stitching software, e.g. PTGui, PanoTools, EasyPano.  Some are free, and most are cheap.  These tools have become quite easy to use over the years: you just load the photos, press a button to stitch, and wait a few minutes.

Here’s the dicey part.  In Photosynth, there’s no easy way to control the processing.  If you took some pictures and there’s not enough overlap, or the computer can’t find enough similarities between the photos, then you’re kinda screwed.  You can’t explicitly tell the computer, “Hey stupid, this picture belongs here, and that belongs there.”  Often the computer algorithm gets it wrong, and the only thing you can do is live with it or remove the photo (which can subsequently mess other things up).

In panoramas, the user can have finer controls by explicitly giving it guidance.  But this notion of what we call “correspondence” is not always an intuitive thing for some.

User control vs. simplicity..  Hm.  Which one wins?  Call it a draw.

Panorama 0, Photosynth 1 (same score as before).

Viewing

Viewing wise, I would give panoramas the advantage, just because the “standard” viewer now uses Flash, which I think is a huge f’in deal!  Photosynth uses Silverlight.  Flash has about 97% penetration, Silverlight has about 33%.  This means that Photosynth works on only about a third of computers.  That sucks.  Even when you have the Microsoft brand, very few people wanna download any plugins.  Users expect things to just work.

Panorama 1, Photosynth 1.

UX

UX wise, this is definitely subjective.  I have my biases with panoramas.  And with Google really spreading the panoramic viewing experience with their Street View, panorama viewers and user interface and experiences are becoming more common.  Besides, panoramas feel a lot more natural when looking around.

Photosynth has a weird user experience for me.  I cannot move around a space the way I want to: you are limited, in many ways, to where the photos are.  It’s somewhat frustrating, since it really feels like a 3D environment (it actually is), but the limitations of movement (and that annoying auto-snapping to pics) clash with my expectations.

Due to my bias, I’m not scoring this part.

Panorama 1, Photosynth 1.

Sharing

Sharing is pretty easy for both (albeit somewhat limited).

Panorama 2, Photosynth 2.

Overall

It looks like a tie using this simplistic scoring system.  What do you think?


Panoramas vs. Photosynth (Part 3): Technical Characteristics

Photographically capturing the WORLD!

This is Part 3 of this series.  (At least I didn’t pull a Lucas and start with part 4.)

Let’s compare technical characteristics/requirements as I’ve mentioned in Part 1 (and pls read Part 2 as well).

  • scalable
  • distributable
  • maintainable
  • extensible

Again, there may be more and these are not orthogonal or exclusive of each other.

Scalability

Now, remember that the context in which I’m comparing these two “methods” is trying to photographically capture the entire world at a human-level POV!  So imagine an online experience where you can go to a website (e.g. EveryScape and Google) and walk around just like you were there.  Yep. BIG idea.

So, being able to scalably capture, store, distribute, share, etc. the whole world is paramount.  If you can’t do this, then game over man.

Companies like EveryScape, Google, Earthmine, Mapjack, Immersive Media and a bunch of others found a way to (cost) effectively drive around cities with car-mounted cameras.  EveryScape and Google especially have done this scalably in multiple cities all around the world, with thousands of miles of coverage.  (I’m sure there are others, but I haven’t seen this much of their content published yet.)  I think this is proof enough for me to say that panoramic images can scalably cover the world.

Photosynth has not quite done this yet.  I’ve seen a pretty extensive number of photographs used to represent a landmark or an area, but I have not yet seen an entire city done this way.  There are lots of brilliant minds at this, I’m sure, and it does feel feasible.  But if content publication is the standard…

Panorama 1, Photosynth 0.

Distributability

By this, I mean folks online can easily view and experience the content.  Again, going to everyscape.com or maps.google.com is proof enough.  Using Flash (and Flash did “change the world” in this sense) or Silverlight, users can experience the content, and the backend seems to have been implemented well.

Oh BTW, SeaDragon’s f’in brilliant!

Panorama 2, Photosynth 1.

Maintainability

We live in a dynamic world.  Things change all around us.  Tomorrow, a Starbucks could turn into a Dunkin Donuts (yes, I’m from the east coast).  By maintainability, I mean that these changes in the real world could easily be reflected in the mirror world online.

For any type of change in the real world, we (EveryScape) have a “self healing” backend, so the only real work is photo acquisition.  Assuming all other car-mounted systems are similar, this is technically solved.

For Photosynth, it seems like a similar approach will work.  Although there may be some ownership issues with Photosynth (if crowd sourced), it feels quite easy to make this assumption of maintainability.

Panorama 3, Photosynth 2.

Extensibility

Panorama 4, Photosynth 3.

Overview of Technical Characteristics

It seems like the main tech difference between Panoramas and Photosynth is scalability.  One main issue with Photosynth is the image registration / pose estimation problem and how scalable this can be.  Basically, for each image added to the synth, features are detected, then corresponded to the rest of the point cloud, then the relative camera extrinsics are computed.  (Apologies for the tech lingo.)  I’m not fully convinced that this is the way to go when scaling up to what I want (da world!).  Perhaps supplementing the image with GPS and other sensors is a good way to solve this.  BUT, if the philosophy for Photosynth is still automation, consumer cameras, and crowd sourcing, I’m not sure I quite believe in scalability (yet).

Is the scalability issue overcome-able for Photosynth?  I think yes.  Just need to see it to believe it.
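To make the scaling worry a bit more concrete, here is a minimal Python sketch of just the correspondence step, assuming a naive brute-force nearest-neighbor match.  This is NOT how Photosynth actually does it (real systems use much smarter approximate search), and the names `match_features` and `max_distance` plus the toy 2D descriptors are made up for illustration.  Feature detection and the extrinsics computation are left out entirely.

```python
def match_features(new_descriptors, cloud_descriptors, max_distance=0.5):
    """Match each new feature to its nearest descriptor in the point cloud.

    Hypothetical brute-force version: every new feature is compared against
    every descriptor already in the cloud, so the work per added image grows
    with the size of the cloud.
    """
    matches = []
    comparisons = 0
    for i, d_new in enumerate(new_descriptors):
        best_j, best_dist = None, float("inf")
        for j, d_cloud in enumerate(cloud_descriptors):
            comparisons += 1
            # Euclidean distance between the two descriptor vectors
            dist = sum((a - b) ** 2 for a, b in zip(d_new, d_cloud)) ** 0.5
            if dist < best_dist:
                best_j, best_dist = j, dist
        # Keep the match only if the nearest neighbor is close enough
        if best_j is not None and best_dist <= max_distance:
            matches.append((i, best_j))
    return matches, comparisons


# Toy data: 3 descriptors already in the cloud, 3 features from a new image
cloud = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
new_image = [(0.1, 0.0), (4.9, 5.2), (10.0, 10.0)]
matches, comparisons = match_features(new_image, cloud)
print(matches, comparisons)
```

In this naive form the comparison count is (features in the new image) × (descriptors in the cloud), which is the kind of growth that makes me nervous at world scale.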


Panoramas vs. Photosynth (Part 1)

Photosynth example

In this series, I’ll compare panoramas vs. Photosynth — methods that could be used to create a photographic platform of our world.  Well, actually more like initiate a conversation in comparing these two methods in representing our world (aka Mirror Worlds and the Metaverse Roadmap).  I’ll assume people have a decent understanding of what these are.  If not, please let me know and I can certainly explain.

So, how can we tell which one’s better?  Only time will tell.  Photosynth has a pretty huge backing and push from Microsoft, and personally I’m a huge fan.  On the other hand, panoramas have a big push as well from companies like EveryScape and Google.

One way to at least start a conversation about which one’s better is to list a set of requirements (i.e. “things you want”), weigh which one is more important, and rate how well either one does as a weighted sum (yes, I’m a f!@# nerd).

So what are these requirements?  Here’s what I think for starters:

Technical:

  • scalable
  • distributable
  • maintainable
  • extensible

Qualitative:

  • believable
  • photorealistic
  • interactive
  • immersive

Web 2.0:

  • annotatable
  • searchable
  • sharable
  • personalization-able

Operational:

  • cost
  • time
  • resources needed

Business:

  • how do you make $ from this?
  • how do you get more users?
  • what are some applications?

UI/UX:

  • how intuitive and simple to use is it?
  • can my parents use it?
  • how helpful is it?

I’ve mentioned some of them in my talk at ETech 2009 and Where 2.0 2008.  The list may vary depending on specific applications, but for general platforms it’s a decent list, IMO.
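For the fellow nerds, the weighted-sum idea can be sketched in a few lines of Python.  Everything numeric here is a placeholder: the weights, the 1-to-5 ratings, and the names `method_a` and `method_b` are hypothetical and are not my actual evaluation of panoramas or Photosynth.

```python
def weighted_score(weights, ratings):
    """Weighted sum of ratings, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


# Hypothetical weights: how much each requirement matters (higher = more)
weights = {"scalable": 4, "distributable": 3, "maintainable": 2, "extensible": 1}

# Hypothetical ratings on a 1-5 scale for two made-up methods
method_a = {"scalable": 5, "distributable": 4, "maintainable": 4, "extensible": 3}
method_b = {"scalable": 2, "distributable": 3, "maintainable": 4, "extensible": 4}

print(weighted_score(weights, method_a))
print(weighted_score(weights, method_b))
```

Normalizing by the total weight keeps the result on the same scale as the ratings, so changing the weights shifts the ranking without blowing up the numbers.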

Did I miss anything?  Please let me know, and stay tuned for the next set of blogs in this series.


A9 and Streetside: Why Did They Fail?

Amazon A9 Maps

Before EveryScape and Google Street View existed (and yes, we were doing this before Google was), there were a couple of attempts at street-level photography by companies you might have heard of: Amazon and Microsoft.

Amazon had their A9 Block View (shown above) and Microsoft had (has?) their Streetside.

My question is: Why did they fail?

In some sense, their intentions were the same as EveryScape’s and Google’s: to enable users to virtually see places, businesses, and points of interest from the comfort of their browser for various use cases and applications.

One obvious “feature” difference is that they did not use panoramic imagery.  One could argue that panoramic imagery is more immersive and experiential.

Does panoramic imagery make that much of a difference?  Isn’t one of the beauties of the Web that “keep it simple, stupid” wins?

Or are the users really looking for richer online experiences?  A better UI/UX (a la Apple and iPhone)?  Was their approach limiting feature-wise?

More questions than answers, unfortunately.  Facts or biases, your feedback is appreciated.


Crowd Sourcing vs. Tribe Sourcing

Google Maps new geolocation feature -- now you can share your location on Google Map

Yesterday, Google Maps launched a geolocation feature.  When you click on the small blue dot on the upper-left controls, it will try to figure out where you are using Wi-Fi.  It’s a pretty darn cool feature.  Well, Skyhook’s been doing that much longer than Google has and definitely has a better product at this point (your Google Map on your iPhone uses Skyhook!  Things that make you go hm…).  Hold this thought.  I’ll return to my point about this in a bit.

This blog is not about Google getting their tentacles into many different markets.  (We had that experience and Galen Moore of Mass High Tech quoted me quite well in his article.)  That’s definitely a multi-part blog for some other time.

I want to talk more about crowd sourcing vs. tribe sourcing in this blog.  I think people have a decent idea of what crowd sourcing is.  So, what is tribe sourcing?  Tribe sourcing is when not everyone is involved; a much smaller but focused set of folks does the sourcing.  Crowds can create lots and lots of data, but they have many different intentions.  Their “intention vectors,” if you will, are not aligned, hence creating lots of noise as well.  So, in order to gather what you want from this vast amount of information, you have to filter accordingly.  Meaning, make some assumptions, process, and potentially make some guesses as to what the data means.

Now, let’s take the example of what I initially mentioned about Google geolocation vs. Skyhook geolocation.  Sources say that Google’s geolocation feature is not as good.  It turns out that’s because they are crowd sourcing their info. From Wade Roush’s article:

“[Google] quietly gathers local readings every time someone uses a Google app on an iPhone or a Blackberry, or some other mobile device.”

As opposed to Skyhook’s tribe-sourced data:

“Skyhook’s own approach is to send Wi-Fi-sensing vehicles down every highway, street, and alley, methodically establishing the position and strength of every access point they pass.”

Skyhook may have far fewer people contributing to their data, but they have a very focused tribe gathering the right data.  Their intention vectors are very well aligned in collecting the data in a structured and optimal way for this particular application.

So, which one’s better?  It’s too early to tell, but my bias is Skyhook (and it has nothing to do with the fact that I know Ted Morgan and folks at Skyhook fairly well).  Is tribe sourcing better than crowd sourcing?  Vice versa?  More specifically, when will Google’s data/product be better than Skyhook’s?  I don’t know, but time will tell.

Yet another question:  Why not combine and do both?  Google’s everywhere (including Android) and seemingly has unlimited resources, so they can.  I think Skyhook can too.  Perhaps the answer lies somewhere in the balance between the crowd and the tribe.


2D, 3D… 2.5D??

EveryScape's "2.5D" Local Business Webscape Demo

When I say “2D,” people understand.  When I say “3D,” folks get that too.  But when I say “2.5D,” I either get a “huh?” or a “hm..”  Yes, dimensions are typically integers, so it’s a fuzzy description for sure.  When I say 2.5D, I mean visual representations that look almost 3D but not quite.  More specifically, in my context, I mean a connected series of immersive panoramas.

Ok, some nerdy stuff (but don’t fall asleep). Typically, 3D in our context means three orthogonal axes in space, let’s call them X, Y, and Z (hence, the 3 dimensions).  When a first-person camera or viewer is involved, we need to add a couple more dimensions, Phi and Theta, for looking up-down and side-to-side.  So position (x, y, and z) and viewing direction (phi, theta) together make up 5 dimensions (also called the extrinsics).  Yes, there’s something called the intrinsics as well, but that’s for some other discussion; it just means what type of camera and lens you’re using.

So, what’s my point?  My point is 2.5D really is just a figure of speech.  But more interestingly, I think that the 2.5D way of representing our world in a digital fashion is really useful.
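If it helps, those 5 dimensions can be written down as a tiny Python sketch.  The class name `CameraPose` and the angle conventions (radians, Z as the up axis, theta measured from the +X axis, phi measured up from the horizon) are my assumptions for illustration; other conventions work just as well.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Five extrinsic dimensions: position (x, y, z) plus view angles."""
    x: float
    y: float
    z: float
    phi: float    # up-down angle in radians (assumed: 0 = level, positive = up)
    theta: float  # side-to-side angle in radians (assumed: 0 = looking along +X)

    def view_direction(self):
        """Unit vector the viewer looks along, with Z as the up axis."""
        return (
            math.cos(self.phi) * math.cos(self.theta),
            math.cos(self.phi) * math.sin(self.theta),
            math.sin(self.phi),
        )


# A viewer standing at (1, 2) at eye height, looking level along +Y
pose = CameraPose(x=1.0, y=2.0, z=1.7, phi=0.0, theta=math.pi / 2)
print(pose.view_direction())
```

Note there is no roll angle here, which is why two angles are enough: the viewer can look anywhere but the horizon stays level.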

I gave a talk at an O’Reilly Emerging Technologies Conference early this year titled “2D, 3D… 2.5D?” The abstract was as follows:

“Historically, 3D on the Web has always been associated with difficulties. Although 3D has been around for decades, from research labs to gaming to visualization of a 3D earth, there are numerous reasons why 3D is still having majority adoption challenges. On the other hand, digital photography (and video) have blossomed well into the world-wide consumer market, from both hardware (e.g., cell phones with cameras) and software perspectives (e.g., Flickr, YouTube).

In this talk we delve deeper into the benefits of a “2.5D” representation of our world, leveraging both 2D photography and 3D graphics and vision techniques. We open up a discussion for why such difficulties in 3D realm exist, what/how we can benefit from digital point-and-shoot photography, and further discuss the benefits of creating a “2.5D” representation—more specifically from the mirror world and web perspective (e.g., Amazon A9, Google Street View, EveryScape).

We will discuss the pros/cons of 3D using specific examples (e.g. Google Earth, MS Virtual Earth, Sketchup, Maya, etc.), 2D (e.g. digital photography, photoshop, jpeg, flickr, etc.), and 2.5D (e.g. EveryScape, Google Street View, MS Photosynth). Below is a table where we compare each of 3D, 2.5D, and 2D on some parameters, all of which we argue must be in the “easy” category for mass adoption success. Mass adoption of 3D may be years (or even decades) away, and various 2.5D solutions are filling in some of the needs now. We further discuss EveryScape’s specific journey in research and development: how and why we ended up where we are today.”

The point is that while scalable 3D isn’t quite here yet for the web and mass adoption, 2.5D technologies are filling in the gap (e.g. EveryScape, Google Street View, Earthmine, Mapjack).  They are emerging indeed!

If there’s enough interest, I will put up my slides.  Pls let me know.

