
Panoramas vs. 3D (Part 1): Introduction August 16, 2009

Posted by Mok Oh in 2.5D, 3D, Photography, mirror worlds, panorama.

Google Earth St. Patrick's Cathedral, New York, NY, at a "bird's eye" view.

Wade Roush made a great comment on Panoramas vs. Photosynth Part 4: let's not forget that there are forces at work in 3D as well.  This blog is inspired by Wade.  Thanks, Wade!

Now, imagine an online representation of our world in 3D…  You type in maps.bing.com to get to a place you are interested in.  You find a map, and you zoom down close to see the street.  Then you click on the “3D” viewing mode.  You zoom in some more.  Then you land on the street…  a photorealistic, real-time 3D street.  You can smoothly walk around, not just in the streets but on the sidewalk as well.  Hell, you can fly if you want to.  You find the store you like, and you navigate inside.  When you do, you are greeted by the store owner's avatar saying hello.  And it’s actually a person behind the avatar.  You ask your questions about the store, and whether they have what you are looking for.  You walk around to browse what they have.  Then you walk out and zoom to your favorite restaurant to see how crowded the place really is right now…  And cut. (Or wake up.)

I think this scenario, or something similar, has a real possibility in the future.  Assuming this can happen, the question is, how far off are we?  It feels to me like it is at least a decade away.  Most likely more.

Now let’s get back to reality and talk about what we have today.  We have Google Earth and Microsoft Bing, both of which have a 3D representation of our world.  There are other wonderful technologies, like C3 Technologies, but they have not yet proven to be scalable (by “proven,” I mean published and with significant coverage).  It really is amazing to see cities like New York in full 3D glory, BUT only from a “bird’s eye” view.  At a thousand feet above ground, these cities look amazingly real.

At ground level, not so much.  (See below for a comparison.)

Photograph of St. Patrick's Cathedral, New York, NY

Google Earth St. Patrick's Cathedral, New York, NY

So, another relevant question is: What is it going to take to create believable 3D on the web at ground level?  I think quite a bit.  (To be discussed in the following blogs.)

From a bird’s eye point of view, the level of detail required to make a person believe what she’s seeing is “real” is much lower than at ground level.  Up there, the buildings are more or less boxes with photo textures.  (Don’t get me wrong: the feats accomplished by Google Earth and Microsoft Bing are incredible.)  But at ground level, I would argue that there is an order of magnitude more 3D shit you gotta model to convince people that they are actually at 43rd and 5th in New York City, or Times Square, or the Champs-Élysées.  You have all the street-level things that need to be represented realistically: people, cars, trees, newsstands, lamp posts, signs, etc.  All these have to be there just like they are in real life to convince folks that they’re really there.

So, for the foreseeable future, we have panoramas and UGC photographs to represent our real world, aka mirror world, a la Google Street View and EveryScape.  These are what I call 2.5D representations, since they are not quite 3D, but more than 2D.

This series of my blog sounds like fun to write.  Want more?  Please let me know.

Much Ado About Augmented Reality? July 26, 2009

Posted by Mok Oh in 3D, Mobile, augmented reality.

Augmented Reality Heads-Up Display View

My question to you is, do you believe in Augmented Reality?

In many ways, the notion of augmented reality is similar to 3D in the sense that expectations have been set high, but it hasn’t quite delivered.  We expected HUDs in cars that help us with directions and more; we expected augmented reality glasses that help us with what we are seeing.  And we expected this to happen a decade ago.  Hm…

Recently, there’s been a lot of talk about iPhone 3.1 enabling real-time overlays for applications like augmented reality.  I'm definitely looking forward to what will happen there, and to the kinds of innovations that will come out of this community.

Personally, I think a lot more innovation in technology and UI/UX is needed for AR to succeed.  Image recognition algorithms are not robust enough for general scenes (like in our everyday lives), but when supplemented with GPS, compass, accelerometers, and other sensors, they just might work and be useful.
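
To make the sensor idea a little more concrete, here is a minimal Python sketch (not any particular AR product's method) of how a GPS fix and a compass heading alone could decide whether a point of interest belongs on screen, with no image recognition involved.  The function names `bearing_deg` and `in_view`, and the 60-degree field of view, are assumptions made purely for illustration.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def in_view(heading_deg, target_bearing_deg, fov_deg=60.0):
    """True if the target lies inside the camera's horizontal field of view.
    heading_deg is the compass heading the device is pointing at;
    fov_deg is a hypothetical value, not a real device's spec."""
    # Signed angular difference, wrapped into [-180, 180)
    diff = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0
```

A real application would also need the accelerometer for tilt and some smoothing of noisy compass readings, which this sketch leaves out.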

What do you think?

2D, 3D… 2.5D?? July 8, 2009

Posted by Mok Oh in panorama.

EveryScape's "2.5D" Local Business Webscape Demo

When I say “2D,” people understand.  When I say “3D,” folks get that too.  But when I say “2.5D,” I either get a “huh?” or a “hm..”  Yes, dimensions are typically integers, so it’s a fuzzy description for sure.  When I say 2.5D, I mean visual representations that look almost 3D but not quite.  More specifically, in my context, I mean a connected series of immersive panoramas.

OK, some nerdy stuff (but don’t fall asleep).  Typically, 3D in our context means three orthogonal axes in space, let’s call them X, Y, and Z; hence, the 3 dimensions.  When a first-person viewer or a camera is involved, we need to add a couple more dimensions, Phi and Theta, for looking up-down and side-to-side.  So position (x, y, and z) and viewing direction (phi, theta) together make up 5 dimensions (also called the extrinsics; a full camera pose would add a sixth, roll).  Yes, there’s something called the intrinsics as well, but that’s for some other discussion; it just means what type of camera and lens you’re using.
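
For readers who think better in code, the five numbers can be sketched as a tiny Python class.  This is only an illustration under stated assumptions: the class name `CameraPose` is made up, Y is taken as "up," and a camera with phi = theta = 0 is assumed to look down the negative Z axis (a common graphics convention, but only one of several).

```python
import math
from dataclasses import dataclass

@dataclass
class CameraPose:
    """Five 'extrinsic' numbers: where the viewer is and where it looks."""
    x: float
    y: float
    z: float
    phi: float    # up-down angle (pitch), in radians
    theta: float  # side-to-side angle (yaw), in radians

    def view_direction(self):
        """Unit vector the camera looks along.
        Assumed convention: Y is up, phi = theta = 0 looks down -Z."""
        cos_phi = math.cos(self.phi)
        return (cos_phi * math.sin(self.theta),
                math.sin(self.phi),
                -cos_phi * math.cos(self.theta))
```

A panorama fixes x, y, and z and lets only phi and theta vary, which is one way to see why a connected series of panoramas sits somewhere between 2D and 3D.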

So, what’s my point?  My point is that 2.5D really is just a figure of speech.  But more interestingly, I think the 2.5D way of representing our world digitally is really useful.

I gave a talk at an O’Reilly Emerging Technologies Conference early this year titled “2D, 3D… 2.5D?” The abstract was as follows:

“Historically, 3D on the Web has always been associated with difficulties. Although 3D has been around for decades, from research labs to gaming to visualization of a 3D earth, there are numerous reasons why 3D is still having majority adoption challenges. On the other hand, digital photography (and video) have blossomed well into the world-wide consumer market, from both hardware (e.g., cell phones with cameras) and software perspectives (e.g., Flickr, YouTube).

In this talk we delve deeper into the benefits of a “2.5D” representation of our world, leveraging both 2D photography and 3D graphics and vision techniques. We open up a discussion for why such difficulties in 3D realm exist, what/how we can benefit from digital point-and-shoot photography, and further discuss the benefits of creating a “2.5D” representation—more specifically from the mirror world and web perspective (e.g., Amazon A9, Google Street View, EveryScape).

We will discuss the pros/cons of 3D using specific examples (e.g. Google Earth, MS Virtual Earth, Sketchup, Maya, etc.), 2D (e.g. digital photography, photoshop, jpeg, flickr, etc.), and 2.5D (e.g. EveryScape, Google Street View, MS Photosynth). Below is a table where we compare each of 3D, 2.5D, and 2D on some parameters, all of which we argue must be in the “easy” category for mass adoption success. Mass adoption of 3D may be years (or even decades) away, and various 2.5D solutions are filling in some of the needs now. We further discuss EveryScape’s specific journey in research and development: how and why we ended up where we are today.”

The point is that while scalable 3D isn’t quite here yet for the web and mass adoption, 2.5D technologies are filling in the gap (e.g. EveryScape, Google Street View, Earthmine, Mapjack).  They are emerging indeed!

If there’s enough interest, I will put up my slides.  Please let me know.

I Want My 3D! July 5, 2009

Posted by Mok Oh in 3D.

3D Operating System from Jurassic Park

Remember in Jurassic Park (yes, the first one) when the little girl character sees the computer system and says something like, “This is Unix!  I can do this!” and she navigates the database in 3D?  (Screenshot above.)  Well, to me, that was pretty darn cool.  I thought then, wow, this is what the future of OSes will look like.  That was around the time the first web browsers began to surface as well, and the future looked bright for 3D.

Fast forward to the present day.  Why aren’t we browsing the web in 3D?  Why aren’t Operating Systems more 3D?  Where the hell is the Virtual Reality that I was promised?  WTF happened???  I WANT MY 3D!!!

I’ve been in the 3D world (both in academia and industry) long enough to have an opinion, and I would be very interested if someone has done a survey or deep analysis on this.  Let’s go back to my last blog on 3D for a framework for this discussion.

Content authoring in 3D is hard.

Incorporating 3D into feature-length films costs millions of dollars and tremendous amounts of 3D brain power (lots of expensive PhDs).  3D games also take a long time to create, lots of $ and resources, and some very smart and experienced folks and tools.  I don’t think 3D authoring will get that much easier any time soon, but as long as some people are spending enough $ and putting enough resources and brain power into this problem, shouldn’t there be enough to bring this to the mainstream?

Content distribution for 3D is hard, but getting better

Way back when, we used to need to buy a set of CDs or DVDs to get 3D content (let’s say a game) onto our computers.  These days, pretty much everything digital is delivered via the internet, even 3D content.  Google Earth and Virtual Earth both have tremendous amounts of images and 3D content, but not all of it needs to be delivered at once.  In 3D lingo, level of detail algorithms enable “on demand” content delivery.  For example, if you are using Google Earth or Maps, the right images are delivered to your viewer depending on your zoom level.  But still, it takes quite a while to load a 3D city in Google Earth or Virtual Earth.
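
As a rough illustration of zoom-dependent, “on demand” delivery, here is a short Python sketch of the widely used Web Mercator tile pyramid, where each zoom level splits the world into four times as many image tiles as the level above and a viewer requests only the tiles it is looking at.  Whether Google Earth or Virtual Earth use exactly this scheme internally is an assumption here; the function names are hypothetical.

```python
import math

def tile_for(lat_deg, lon_deg, zoom):
    """(x, y) index of the Web Mercator tile containing a point at a zoom level.
    Zoom 0 is one tile for the whole world; each level doubles both axes."""
    n = 2 ** zoom
    # Web Mercator is only defined up to about +/-85.0511 degrees latitude
    lat = math.radians(max(-85.0511, min(85.0511, lat_deg)))
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp the far edges (e.g. longitude exactly 180) into the last tile
    return min(x, n - 1), min(y, n - 1)

def tiles_in_world(zoom):
    """Total number of tiles at a zoom level: 4 ** zoom."""
    return 4 ** zoom
```

At zoom 10 the whole world is already over a million tiles, yet a single view needs only a handful of them, which is why on-demand delivery makes this kind of content practical at all.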

Need a 3D viewer on the web browser

No one downloads a plugin.  Well, very few.  And without a “standard” 3D viewer in the browser, it’s still hard to see 3D content.  I think Flash will change everything.  3D can be displayed using Flash 10, and although it’s got some performance limitations so far, it can bring the 3D experience to your browser now, with much improved versions to come.  Very exciting.

3D User Experience (UX) is getting better

I think that one of the detrimental things that happened to 3D was that its user experience was initially designed by engineers (and was therefore usable only by engineers).  Yeah, sure, there are engineers with design chops, but that’s for some other blog.  3D UX in my opinion was hard, and I still think it’s too hard.  I’ve seen a recent 3D demo by one of the big players, and even the presenter had a hard time navigating the 3D scene.  I think we can get there by getting more designers in the loop (and engineers out of the loop ;-).  Think Apple.

Thoughts about 3D (part 1) — The Framework July 4, 2009

Posted by Mok Oh in 3D.

The 3D Axis -- Yep, 'Y' is up.

In the past few blogs, I’ve focused on the state of panoramas as a visual medium.  There’s still quite a bit to talk about there, but I’d like to shift gears to discussing 3D.  3D is a big topic and there is a lot to discuss, so I will break this blog series down into specific topics within 3D, hence the “part 1.”

Let’s first discuss a framework with which we can break down 3D into various subcomponents:

  • 3D content authoring – This refers to how the content is made (e.g. from manual tools to automation) and the file formats it is stored in
  • 3D distribution – Here, we’ll talk about how the content gets to the users, file formats, algorithms, etc.
  • 3D viewer/player – Once the data reaches the user, a viewer or a player is necessary to see and interact with the content
  • 3D application – Finally, there should be a reason for creation of the 3D content; we discuss UI, UX

So, moving forward, we can categorize our discussions into one (or more) of these buckets to give us a good basis for a decent framework.  There may well be more, and this is not an exhaustive list, but it’s a good start (and I ain’t waiting for this to be right to blog).

Mobile Reality Demo June 30, 2009

Posted by Mok Oh in 3D, Mobile, Photography, panorama, post processing.

I was privileged enough to be a part of a panel on Mobile Reality at the Where 2.0 Conference this year, chaired by Brady Forrest.  Here’s the short description of the panel:

“An emerging class of smartphones including location-based services and persistent data connections are lenses by which we can effectively view data layers atop physical space. What was once only available from tethered desktop computers is now possible from pocket-sized companion devices that travel with us. We are seeing examples of this in their earliest incarnations – social networking, gaming, reference and commerce.

Opposed to looking far into the future, this panel looks at examples of this technology in use and available today to consumers on a variety of smartphone platforms, including the Apple iPhone and Google Android. Panelists will provide short demonstrations of this technology, followed by a topic discussion and Q&A.”

The reason for sharing this is to show you EveryScape’s initiatives towards mobility.  I believe EveryScape has one of the coolest and most useful visual platforms around (in my unbiased opinion), and you can see a glimpse of what’s being worked on in the video below (starting at the 2:30 mark).

3D Photoshop-ery! June 26, 2009

Posted by Mok Oh in Photography, panorama, post processing.

Recently I wrote a blog about panoramic photography and about post processing.  I found the related video above, which was part of my research in collaboration with Max Chen and Fredo Durand at MIT around 2000.  It uses the panorama that I took of the Omni Parker House Hotel mentioned in this post, and it was created from a single panoramic image.  I will certainly write more about this topic, but I want to drive home the fact that post processing of photographs is not limited to pixels and colors; geometry is certainly a part of it.  Case in point: we live in a three-dimensional space, and photos are projections of the light captured from it.