Wednesday, 24 June 2015

Imagining your imagination

Given the news that is making such a splash recently - "dreaming A.I." and "machines with imagination" (http://googleresearch.blogspot.fr/2015/06/inceptionism-going-deeper-into-neural.html) - a few interesting questions are up for pondering...

An NN's (neural network's) "imagination" is a property of the data it has seen and the task it has been trained to do. So an NN trained to recognize buildings will hallucinate buildings in novel images it is given, an NN trained on YouTube videos will discover cats where no cats have ever been, and so on. So an NN trained on my experience, one that sees what I see every day (and provided it has the machinery to make similar generalizations), should be able to imagine what I would imagine, right?
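To make the mechanism concrete, here is a rough sketch of the inceptionism idea from the linked post: nudge the input image by gradient ascent so that a chosen layer of a trained network responds more strongly, and the network amplifies whatever it was trained to see. PyTorch, the stock pretrained VGG16, and every parameter value below are my own illustrative choices, not the tooling used in the Google post.

```python
# A rough sketch of the "inceptionism" idea, assuming PyTorch and a stock
# pretrained VGG16 (illustrative choices; not the tooling from the Google post).
# The input image is nudged by gradient ascent so that a chosen layer responds
# more strongly, amplifying whatever patterns the network was trained to see.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.vgg16(pretrained=True).features.eval()

def dream(image_path, layer_index=20, steps=20, lr=0.05):
    img = Image.open(image_path).convert("RGB")
    x = T.Compose([T.Resize((224, 224)), T.ToTensor()])(img).unsqueeze(0)
    # (For nicer results, ImageNet normalization is usually applied; omitted for brevity.)
    x.requires_grad_(True)

    for _ in range(steps):
        # Forward pass up to the chosen layer only.
        activ = x
        for i, layer in enumerate(model):
            activ = layer(activ)
            if i == layer_index:
                break
        loss = activ.norm()   # how strongly does this layer respond?
        loss.backward()
        with torch.no_grad():
            # Gradient *ascent* on the image itself (normalized step size).
            x += lr * x.grad / (x.grad.abs().mean() + 1e-8)
            x.grad.zero_()
            x.clamp_(0.0, 1.0)

    return T.ToPILImage()(x.detach().squeeze(0))

# Usage (hypothetical file): dream("my_photo.jpg").save("my_photo_dreamed.jpg")
```

An app like the one imagined below would essentially run something like this over a user's photo stream, ideally with a model trained or fine-tuned on that user's own data.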

Facebook, Google, and the other social services should be jumping on this right now to offer you an app that takes your uploaded photo streams and produces "figments of your imagined imagination", or "what your photos reveal about what might be in your mind" (the high-tech NN version of personality quizzes, perhaps). Basically, you can expect the output to be a bizarre juxtaposition of faces and objects and shapes (like in the news article), but customized just for you! Wait for it, I'm sure it's just around the corner.

So if we strap on our GoPros or our Google Glasses and run out into the world hungrily collecting every moment, every sight, and every experience that we live through, can we then hope that our very own personal A.I.s will learn from all this data to remember our dreams when we can't, guess the word on the tip of our tongue, make the same connections, parallels, and metaphors, and know what new thought our mind could have jumped to from the context of the previous conversation? As we envision that A.I. will one day augment us, do we take into account the fact that the augmentation will not be a simple division of labor? "I as the human being will keep the superior, heuristic, and creative tasks for myself, and leave my duller mechanical half to deal with all the storage and lookup and speed that I lack" may be an outdated thought; perhaps "your" A.I. will be able to make bigger generalizations, leap further, find more distant connections, to innovate and create. The correct question might then be: what can YOU contribute to your A.I.?

Thursday, 18 June 2015

CVPR recap and where we're going

The Computer Vision and Pattern Recognition (CVPR) conference was held last week in Boston. For the benefit of the computer vision folks (at least those in my group), I put together a summary/highlights document of selected papers here: http://web.mit.edu/zoya/www/CVPR2015brief.pdf

It takes an hour just to read the titles of all the sessions - over 120 posters per session, 2 sessions a day, 3 days... plus workshops. This field is MONSTROUS in terms of output (and this is only the 20% or so of papers that actually make it into the main conference).
Thus, a selection of papers, rather than all of them, is at least a tiny bit more manageable.

The selections I made are roughly grouped by topic area, although many papers fit more than one topic and some might not be optimally grouped - but hey, this is how my brain sees it.

The selection includes posters I went to see, so I can vouch that they are at least vaguely interesting. For some of them I also include a few point-form notes, which should help with navigation even more.

Here's my summary of the whole conference:

I saw a few main lines of work throughout this conference: CNNs applied to computer vision problem X, a metric for evaluating CNNs applied to computer vision problem X, a new dataset for problem X (many times larger than the previous one, to allow CNNs to be applied to problem X), and a new way of labeling the data for that new dataset.

In summary, CNNs are here to stay. At this conference I think everyone realized how many people are actually working on CNNs... there have been arxiv entries popping up all over, but once you actually find yourself in a room full of CNN-related posters, it really hits you. I think many people also realized how many other groups are working on the exact same problems, thinking about the exact same issues, and planning on the exact same approaches and datasets. It's become quite crowded.

So this year the CNN hammer was applied to just about any vision problem you can think of, setting new baselines and benchmarks left and right. You're working on an old/new problem? Have you tried CNNs? No? The crowd moves on to the next poster that has. Many papers have "deep" or "nets" somewhere in the title, with a cute name for a model applied to some standard problem (ShapeNets, DeepShape, DeepID, DevNet, DeepContour, DeepEdge, segDeep, ActivityNet). See a pattern? Are these people using vastly different approaches to solve similar problems? Who knows.
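For the non-vision reader, the recipe above is essentially transfer learning. A minimal sketch of "CNN applied to problem X", assuming PyTorch and a pretrained VGG16 (both illustrative choices; the dataset and class count for problem X are placeholders):

```python
# A minimal sketch of the "CNN hammer" recipe: take a network pretrained on
# ImageNet, swap its final layer to match problem X, and fine-tune on X's data.
# PyTorch, VGG16, and all hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader

NUM_CLASSES = 10  # placeholder: however many categories problem X has

model = torchvision.models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)  # swap the head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune(dataset_x, epochs=5):
    """dataset_x is assumed to yield (image_tensor, label) pairs for problem X."""
    loader = DataLoader(dataset_x, batch_size=32, shuffle=True)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

Swap in a different dataset and a different head, rerun, and you have a new baseline for problem X - which is roughly the pattern behind many of the titles above.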

So what is the field going to do next year? Solve the same problems with the next hottest architecture? R-CNNs? Even deeper networks? New networks with memory and attention modules? More importantly, do results become outdated the moment the papers are submitted, because the next best architecture has already been released somewhere on arXiv, waiting for new benchmarking efforts? How do we track whether the numbers we see reported are the latest numbers there are? Are papers really the best format to present this information and communicate progress?

These new trends in computer vision leave us with a lot of very hard questions. It's becoming increasingly hard to predict where the field is going in a year, let alone a few years from now.

I think there are two emerging trends right now: more industry influence (all the big names seem to be moving to Google and Facebook), and more neuroscience influence (can the networks tell us more about the brain, and what can we learn about the brain to build better networks?). These two forces are increasingly shaping the field, so closely watching what they have at their disposal might offer glimpses into where we might be going with all of this...





Wednesday, 17 June 2015

The Computer History Museum in SF

The Computer History Museum in SF was great! It was a bit of a random stumble during a trip along the West Coast a few weeks ago, but it left a lasting impression! The collection of artifacts is quite amazing: name just about any period in computer history (ancient history included) and any famous computer (the Babbage Engine, ENIAC, Enigma, UNIVAC, Cray, etc.) and some part of it is very likely at this museum. We totally assumed the museum would be a 2-hour stopover on the way to other SF sights, but ended up staying until closing, without even having covered all of it.



As a teaser, I include a few random bits of the museum that caught my attention (I may have been too engrossed in the rest of the museum to remember to take pictures).

One of the oldest "computers": the Antikythera mechanism - I had never heard of it before! The Ancient Greeks continue to impress! It also shows another timeless quality of humanity: our technological innovations are consistently driven by our need for entertainment (in the case of the Ancient Greeks, such innovations can be linked back to scheduling the Olympic Games). The museum has a full gallery devoted to old calculators and various mechanical computing implements from different cultures.


A fully working reconstruction of Babbage's Difference Engine - completed in 2008 according to Babbage's original designs (which apparently worked like a charm without any modification!). Museum staff crank this mechanical beast up a few times a day for the crowd to marvel at. Once set, the machine can compute logarithm tables, print them on a rolling receipt, and simultaneously stamp an imprint of the same values into a mold (for later reprinting!). Babbage even thought of what happens when the imprinting fills up the whole mold: a mechanism halts the whole process so that the mold can be replaced! That's some advanced UI, developed without any debugger or user studies.
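For anyone wondering what the crank is actually doing: the engine mechanizes the method of finite differences, so that once the first value and its differences are set, every subsequent table entry is produced by additions alone. A minimal sketch (the example polynomial is illustrative, not one of Babbage's actual tables):

```python
# A minimal sketch of the method of finite differences, which the Difference
# Engine mechanizes: once the first value and its differences are seeded,
# every later table entry is produced by additions alone (no multiplication).
# The example polynomial is illustrative, not one of Babbage's actual tables.

def difference_table(coeffs, start, count):
    """Tabulate p(x) = coeffs[0] + coeffs[1]*x + ... for x = start, start+1, ..."""
    def p(x):
        return sum(c * x**i for i, c in enumerate(coeffs))

    # Seed the engine: the first value and its finite differences at `start`.
    col = [p(start + i) for i in range(len(coeffs))]
    diffs = []
    while col:
        diffs.append(col[0])
        col = [b - a for a, b in zip(col, col[1:])]

    # Crank the engine: each turn adds every difference into the one above it.
    values = []
    for _ in range(count):
        values.append(diffs[0])
        for i in range(len(diffs) - 1):
            diffs[i] += diffs[i + 1]
    return values

# Example: x^2 + x + 41, a polynomial often used to demonstrate the method.
print(difference_table([41, 1, 1], 0, 5))  # [41, 43, 47, 53, 61]
```

The seeding is the only step that needs real arithmetic; after that, the crank just propagates additions, which is exactly what made the design mechanizable.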

Based on the over-representation of this Babbage Engine in this post, you can tell that quite a bit of time was spent gawking at it:



By the way, here's a real (previously functional) component from the UNIVAC - unlike the panel with lights and switches at the top of this post, which apparently did not do anything. That panel was there purely for marketing, for whenever the investors of the time came around to check out the "machine": it's much more believable that something real is happening when you have some kind of blinking dashboard and large buttons that serve no purpose but look "very computational". Looks like this continues to be a powerful marketing strategy to this day :)


Just a fun fact (no, this is not the origin of the word "bug", which is what I thought at first, but it does demonstrate some successful debugging):


The following describes quite a few computer scientists I know:


There was a gallery devoted to Supercomputers and another gallery devoted to computer graphics. Look at what I found there - every Graphics PhD student's rite of passage (by the way, the Intrinsic Images dataset is sitting in my office, no glass case, but we will soon start charging chocolate to see it):


There was also a whole gallery devoted to robots and A.I. (an impressive collection), a gallery devoted to computer games, and a gallery devoted to the Apple computer, to name just a few.


By the way, something I didn't know about the Apple computer - here is some awesome bit of marketing that came out in 1984:


There was a gallery devoted to the Google self-driving car. I like how this is in the Computer History museum, because really, you can't put any computer technology in a museum and assume it will remain current for very long. The drone in the corner of that room had a caption that mentioned something about possible future deliveries. Old news. I've seen bigger drones :) 


That's about the extent of the photos I took, because photos really fail to convey the atmosphere a museum surrounds you with. It is a museum I would gladly recommend!

As an afterthought, it's interesting to visit a "history" museum where you recognize many of the artifacts. It gives you a sense of the timescale of technological innovation, which keeps redefining what "history", "progression", and "timescale" really mean... notions that we have to regularly recalibrate.








Saturday, 13 June 2015

Google I/O Recap

Announcements from Google I/O are increasingly popping up all over the media.
Last year, after going to Google I/O, I compiled a series of slides about some of the top announcements and some of the other sessions I went to: http://web.mit.edu/zoya/www/googleIOrecap.pdf
This year, I watched many of the Google I/O videos online, and I've compiled a small summary here: http://web.mit.edu/zoya/www/googleIO2015_small.pdf
As a researcher, I find it instructive to look at where giants such as Google are moving, to get a sense of which research directions and developments will be especially in demand in the near future. Thus, I look at the talks from an academic perspective: what are the key research questions surrounding each product? I tried to include some of these in my latest slides.

Tuesday, 2 June 2015

Why Google has the smartest business strategy: openness and the invisible workforce

Google works on an input/output system. In other words, for everything that Google developers create, Google accepts input from users and developers around the world. Note that the latter community is orders of magnitude larger than the former, so by harnessing the resources and power of users and developers around the world, Google makes its global footprint significantly larger.

For instance, Google produces continuous output in the form of products and developer platforms, and accepts input in the form of development directions and, most importantly, apps. By creating platforms that developers can build on top of, Google harnesses the users who want the apps. The more Google releases (e.g. SDKs), the more developers are looped in to create new apps, and the more users get pulled in to use those apps, acquiring Google products in the process. Thus, the number of people around the world growing the consumer base for Google products far exceeds the number of Google employees.

In fact, the number of people indirectly working for Google is huge. Consider the Google Developer Groups (GDGs) found all around the world - independent organizations of developers and enthusiasts who get together to bond over Google's technology (they also give Google product-related talks and host help sessions for their local communities, all on their own time). What's in it for the members? The support and network of individuals with similar interests. Google wins by having a global network of communities that are self-sufficient and self-reinforcing and do not require Google support or investment.

Google Trusted Testers are non-employees who test beta products for Google. What's in it for the testers? First-hand experience with Google products. What's in it for Google? A workforce for whom being "first to try a product" is sufficient reward.

The Google Student Ambassador Program gives college students an opportunity to exhibit leadership by acting as a liaison between Google and their home institution, putting on Google-supported events (information sessions, hackathons, etc.) and forming student communities. The student ambassador's motivation is a nice line on their resume and great experience communicating with both industrial and institutional personnel and organizing events. Google wins by being promoted on college campuses and having easier avenues for student recruitment... all for the price of providing some Google-themed freebies at college events.

Then there are all the other, smaller organizations that are not directly supported by, but have an affiliation with, Google. For instance, the Google Anita Borg Alumni Planning Committee that I am part of is devoted to increasing visibility of and interest in computer science among minorities and to promoting diversity in computer science education. We, as a group of women distributed globally, start initiatives and put on events (such as the following) in our local communities to advance these missions. Google provides the branding. We win through affiliation with Google; Google wins through affiliation with philanthropic organizations.

These are just a few of the organizations and communities that are affiliated with, but not directly supported (at least financially) by, Google. In fact, Google does not need to directly support or control any of these communities precisely because they are self-sufficient and self-motivated - a big win for Google, given the limited investment.

Now consider the yearly Google I/O conference, which draws over 5,000 attendees. Many of these attendees are developers who come to hear first-hand about new product and platform releases (and to participate in hands-on workshops with the Google product developers themselves). These developers then bring this knowledge back to their communities and contribute their own apps and products to the Google ecosystem. Each year at this conference, Google announces new support infrastructure to make the use of Google products ever easier (this year, for instance, Google announced new OS and language support for the Internet of Things so that developers can more easily add mobile support to physical objects - think: the smart home). Correspondingly, the number of apps driven by Google products grows and expands. Users of those apps buy Google products and services and continuously provide feedback (either directly through surveys or indirectly by having their interactions and preferences logged on Google servers). Thus, we are all contributors to the growth of the Google footprint.

What can we infer from all of this? Google is firmly rooted in our societies and is here to stay. The number of people supporting, improving, and building on top of Google products is huge - it is Google's invisible workforce. Thus, Google will continue to grow and improve at great speeds.

What lesson can we learn from all of this? Being open (in terms of software and even hardware) allows a company to harness the power of outside developer and user communities, increasing the size of the effective workforce that builds the company's products and shapes its directions and reputation. Google has one heck of a business strategy.