Thursday 18 June 2015

CVPR recap and where we're going

The Computer Vision and Pattern Recognition (CVPR) conference was last week in Boston. For the sake of the computer vision folk (at least in my group), I created a summary/highlights document of some paper selections here:

It takes an hour just to read the titles of all the sessions - over 120 posters per session, 2 sessions a day, 3 days... and workshops. This field is MONSTROUS in terms of output (and this is only the 20% or so of submissions that actually make it into the main conference).
Thus, having a selection of papers, instead of all of them, becomes at least a tiny bit more manageable.

The selections I made are roughly grouped by topic area, although many papers fit in more than one topic, and some might not be optimally grouped - but hey, this is how my brain sees it.

The selection includes posters I went to see, so I can vouch that they are at least vaguely interesting. For some of them I also include a few point-form notes, which should help with navigation even more.

Here's my summary of the whole conference:

I saw a few main lines of work throughout this conference: CNNs applied to computer vision problem X; metrics for evaluating CNNs applied to problem X; new datasets for problem X (many times larger than previous ones, to allow CNNs to be applied to problem X); and new ways of labeling the data for those new datasets.

In summary, CNNs are here to stay. At this conference I think everyone realized just how many people are actually working on CNNs... arXiv entries have been popping up all over, but once you actually find yourself in a room full of CNN-related posters, it really hits you. I think many people also realized how many other groups are working on the exact same problems, thinking about the exact same issues, and planning the exact same approaches and datasets. It's become quite crowded.

So this year was the CNN hammer applied to just about any vision problem you can think of - setting new baselines and benchmarks left and right. You're working on an old/new problem? Have you tried CNNs? No? The crowd moves on to the next poster that has. Many papers have "deep" or "nets" somewhere in the title, with a cute way of naming models applied to some standard problem (ShapeNets, DeepShape, DeepID, DevNet, DeepContour, DeepEdge, segDeepM, ActivityNet). See a pattern? Are these people using vastly different approaches to solve similar problems? Who knows.

So what is the field going to do next year? Solve the same problems with the next hottest architecture? R-CNNs? Even deeper networks? New networks with memory and attention modules? More importantly, do results become outdated the moment the papers are submitted, because the next best architecture has already been released somewhere on arXiv, waiting for new benchmarking efforts? How do we track whether the numbers being reported are the latest ones out there? Are papers really the best format for presenting this information and communicating progress?

These new trends in computer vision leave us with a lot of very hard questions. It's becoming increasingly hard to predict where the field is going in a year, let alone a few years from now.

I think there are two emerging trends right now: more industry influence (all the big names seem to be moving to Google and Facebook), and more neuroscience influence (can the networks tell us more about the brain, and what can we learn about the brain to build better networks?). These two forces are increasingly shaping the field. Thus, closely watching what they have at their disposal might offer glimpses into where we might be going with all of this...


  1. Hey, I hope you are doing well. Thanks for the summary :)
    As an almost-newbie to computer vision research, your summary is useful for me.
    Also, it is mind-boggling to see so many things that have to be learned :P

  2. I wonder if any of these papers and experiments succeeded in bringing any idea about what nonlinear reality is? We all know that all of nature is governed by nonlinear processing, while, as time has passed, our brains have turned tremendously more linear. When deep learning appeared, less than five years ago, I was hopeful that humanity had finally found a tool to reach into the nonlinear. As a matter of fact, deep learning was such a tool, where multiple causes can be added hierarchically and can analyse, in parallel, and now in a nonlinear manner, the path from input to output. I know about an application of deep learning in genome research, and that is about it.
    I have the tendency to think that our linear science, which supervises and feeds our society with linear theories and justifications, is very much afraid to disclose an ugly truth about the linear artificiality we have created everywhere. Of course, it is much easier to create new consumerist goods of limited use than to have a glimpse into something that will indicate that all our current work is good for scrapping, is unsustainable, and, mostly, is alarmingly dangerous. The financial institutions, owning everything in our world, cannot face such an ugly truth. Our post-doc neuroscientists work in the stores, but not on disclosing the true face of nonlinearity.

  3. Hi Zoya,

    I appreciate your survey work on CVPR'15. Your conclusion is accurate, and I think the next big thing will be a fundamental change in the neural network itself, which will improve CV problem-solving tactics.