<h3>ECCV in a theatrical setting</h3>

<a href="https://carre.nl/en/page/getting-here">Koninklijk Theater Carré (in Amsterdam)</a>, where the main conference was held, is regularly used for theatrical and circus performances. The main stage was home to all the oral and most of the poster presentations during the week. This meant both that (i) speakers were performers, with their audience looming above them from all sides and balconies, and that (ii) poster sessions from a bird's-eye view looked like a <a href="https://youtu.be/cDcprgWiQEY">simulation of particles</a> moving through a viscous liquid, trapped within the confines of the stage (scroll to the end of this post for a demonstration).
<span style="color: #333333; font-family: "lato" , "helvetica neue" , "helvetica" , "helvetica" , "arial" , sans-serif; font-size: 14.4px; text-align: justify;"><br /></span>
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNrJJSHWmTyfHNKjifkEusdsMANqg_eZQcj0zkgGIxyN0Ee5ib-h0BSXtHiJGLp-M4YqY2uGcjPEV9nLm1M-VJRsIRRM0Mm-L-QY2tI0-J6m9FtXWhbouWrtLH4bvh8bfzo6dw8JAACD3X/s1600/IMG_20161011_175520244.jpg" imageanchor="1"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNrJJSHWmTyfHNKjifkEusdsMANqg_eZQcj0zkgGIxyN0Ee5ib-h0BSXtHiJGLp-M4YqY2uGcjPEV9nLm1M-VJRsIRRM0Mm-L-QY2tI0-J6m9FtXWhbouWrtLH4bvh8bfzo6dw8JAACD3X/s400/IMG_20161011_175520244.jpg" width="400" /></a><span style="color: #333333; font-family: "lato" , "helvetica neue" , "helvetica" , "helvetica" , "arial" , sans-serif; font-size: 14.4px; text-align: justify;"><br /></span></div>
<br />
Due to this unusual set-up, audience questions could not be solicited in the usual manner of line-ups in front of a microphone (try climbing over all those people, and out of a balcony). Instead, given a tech crowd, it was expected that technology could easily come to the rescue... the results of which can be summarized by comments made on separate occasions by the respective session chairs:
<span id="docs-internal-guid-049dcde1-d99d-934e-6546-45ee01a0bff3"><span style="vertical-align: baseline; white-space: pre-wrap;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small;"><i>"Please post your questions on twitter and we will ask them on your behalf [...] </i></span></span></span><br />
<span style="vertical-align: baseline; white-space: pre-wrap;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small;"><i>But neither of us have twitter, so we will ask our own questions in this session."</i></span></span><br />
<i><span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small;"><span style="vertical-align: baseline; white-space: pre-wrap;"><span style="font-size: xx-small;"><br /></span></span>
</span></i><br />
<div style="text-align: right;">
<span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small;"><i><span style="vertical-align: baseline; white-space: pre-wrap;">"</span><span style="white-space: pre-wrap;">It seems the community is composed of two groups: </span></i></span></div>
<div style="text-align: right;">
<span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small;"><i><span style="font-size: xx-small;"><span style="white-space: pre-wrap;">those that have questions, </span></span><span style="font-size: xx-small;"><span style="white-space: pre-wrap;">and those that know how to use twitter </span></span></i></span></div>
<div style="text-align: right;">
<span style="font-size: xx-small;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: x-small; white-space: pre-wrap;"><i>- we’re still hoping there will be an intersection at some point."</i></span></span></div>
<br />
There was little to complain about otherwise: the venue was quite beautiful; there were many comfortable corners all around the building that were quite favorable to getting some paper reading done; the little baked parmesan palmiers that waiters carried around on trays all throughout the day were impeccable; and the city surrounding the conference was bursting with energy and canals.
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZ1UOr_JRpim0uSjAIuZEbLwxJzrFMKmfv_zAFYDSOw6NjD5GMEd263-AOxo55vESTTe_yzC6pZtO-ILyT0-jWv3sAkkUiWLblSrzZNLzejggyAHE0-989KCF1QuMHcOmuHChKE7KQOh5c/s1600/IMG_20161010_110851839_HDR.jpg" imageanchor="1"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZ1UOr_JRpim0uSjAIuZEbLwxJzrFMKmfv_zAFYDSOw6NjD5GMEd263-AOxo55vESTTe_yzC6pZtO-ILyT0-jWv3sAkkUiWLblSrzZNLzejggyAHE0-989KCF1QuMHcOmuHChKE7KQOh5c/s400/IMG_20161010_110851839_HDR.jpg" width="400" /></a><br />
<br /></div>
<h4>Main topics:</h4>
During the welcome, the general chairs put up some statistics about the topic areas represented at ECCV this year. The top ones include:
<span id="docs-internal-guid-049dcde1-dec1-4bca-1a89-ee4f6253e681"></span><br />
<ul style="margin-bottom: 0pt; margin-top: 0pt;"><span id="docs-internal-guid-049dcde1-dec1-4bca-1a89-ee4f6253e681"><span style="font-family: "times" , "times new roman" , serif;">
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">deep learning</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">3D modeling</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">events, actions</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">object class detection</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">semantic image</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">object tracking</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">de-blurring</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">scene understanding</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">image indexing</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">face recognition</span></div>
</li>
<li dir="ltr" style="font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"><span style="font-size: 14.6667px; vertical-align: baseline; white-space: pre-wrap;">segmentation</span></li>
</span></span></ul>
<span id="docs-internal-guid-049dcde1-dec1-4bca-1a89-ee4f6253e681">
</span>
<span style="font-family: "times" , "times new roman" , serif;">Topics like sparse coding are going down in paper representation. High acceptance rate topics are confounded by the size of those topics: smaller topics have a larger relative percentage of that are accepted (e.g. model-based reconstruction, 3D representation, etc.). P<span style="white-space: pre-wrap;">opular reviewer subject areas mostly follow the top topic areas above - specifically: 3D modeling, deep learning, object class detection, events, face recognition, object class detection, scene understanding, etc. </span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: "times" , "times new roman" , serif;"><span style="white-space: pre-wrap;"><b>Summary notes:</b></span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="white-space: pre-wrap;"><b><br /></b></span></span>
<span style="font-family: "times" , "times new roman" , serif;"><span style="white-space: pre-wrap;">My summary notes on the presentations that I attended can be found here (covers ~70% of the oral sessions): <a href="https://docs.google.com/document/d/175ORVlLMdjOscJ7-93WIt0bieUiu21vtlL7J-7-7qBI/pub">https://docs.google.com/document/d/175ORVlLMdjOscJ7-93WIt0bieUiu21vtlL7J-7-7qBI/pub</a> </span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="white-space: pre-wrap;"><br /></span></span>
<br />
<h4>Some general research trends*:</h4>
<div style="text-align: right;">
<span style="font-size: x-small;">* <b>disclaimer:</b> very biased by my own interests, observations, and opinions </span></div>
<div style="text-align: right;">
<span style="font-size: x-small;">(which tend to revolve around perception, cognition, attention, and language)</span></div>
<div style="text-align: right;">
<span style="font-size: x-small;">for an objective summary, go instead to the <a href="https://docs.google.com/document/d/175ORVlLMdjOscJ7-93WIt0bieUiu21vtlL7J-7-7qBI/pub">summary notes</a> linked above</span></div>
<br />
Nobody asks anymore: <i>"is this done with CNNs too?"</i> - and more and more research is digging into the depths of the not-so-black* box of CNNs. The remaining fruit now hangs a little higher than it did before, and we are beginning to see more reaching - in the form of innovations in architectures, evaluations, internal representations, transfer learning, integration with new sensors/robotics, and unsupervised approaches. More about some of these below.
<div style="text-align: right;">
<span style="font-family: "times" , "times new roman" , serif; font-size: xx-small;"><span style="font-family: "arial"; white-space: pre-wrap;">* With some notable exceptions -> C</span><span style="white-space: pre-wrap;">hair: “did you train with stochastic gradient descent?” Speaker: “we trained with caffe”</span></span></div>
<br />
We're seeing old ideas come back in new architectural forms: new ways of encoding long-thought-about constraints and relations. If one can open an old vision paper, reformulate the proposed pipeline as an end-to-end network, encode constraints and heuristics as appropriate loss functions, and leverage different task knowledge by designing a corresponding training procedure, then a new paper is in the making (e.g. active vision for recognition).
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOVoUOYris0KyXLuvPRSG7HI7ner5rHqGM7fj5UUghd-feX4_UqkOiX-7QipQJNmBU9PzLdj8jpKAoQLUhBoDQ4WZjaRHx76XbNyxVFqFIiQADk_-ygybbpuwWKXr1A6WJAGCtkaisi2aF/s1600/Screen+Shot+2016-10-18+at+7.28.20+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="135" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiOVoUOYris0KyXLuvPRSG7HI7ner5rHqGM7fj5UUghd-feX4_UqkOiX-7QipQJNmBU9PzLdj8jpKAoQLUhBoDQ4WZjaRHx76XbNyxVFqFIiQADk_-ygybbpuwWKXr1A6WJAGCtkaisi2aF/s320/Screen+Shot+2016-10-18+at+7.28.20+PM.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.eccv2016.org/files/posters/S-1B-05.pdf</td></tr>
</tbody></table>
Themes that we are beginning to see more and more of: unsupervised learning, semi-supervised learning, and self-supervised learning (with varying degrees of overlap, depending on how you define them). The main idea is that with the deep and powerful architectures we have now, solving each new problem in an end-to-end fashion would require an Imagenet-scale dataset. Because this is not always possible, transferring knowledge, labels, and classifications across tasks, datasets, and individual frames/images is the sought-after approach.
Video is a popular modality: temporal information can provide a strong supervisory signal for propagating labels across frames or for learning to do object detection from unlabeled video (e.g., <a href="http://www.eccv2016.org/files/posters/P-3B-39.pdf">Walker et al.</a>, <a href="http://www.eccv2016.org/files/posters/S-4A-06.pdf">Long et al.</a>). Key frames of an action or an event can serve as targets for the rest of the frames. For instance, <a href="http://www.eccv2016.org/files/posters/P-1B-31.pdf">Zhao et al.</a> perform facial expression recognition using peak facial expressions as a supervisory signal, by matching the internal representations (i.e. network features) of peak and non-peak facial expressions in order to build more robustness and invariance into the recognition pipeline (a rough sketch of this idea follows the reference list below). Similarly, photo sequences or collections provide loose temporal relationships that can be harnessed as a self-supervisory cue for predicting relevant/future photos (e.g., <a href="http://www.eccv2016.org/files/posters/P-3A-26.pdf">Sigurdsson, Chen & Gupta</a>). As a side note, there is a lot more work on multi-dimensional inputs (3D, video, image sequences/collections) than on single images. Even with single images, there is a lot more temporal processing (e.g., via attention modules, more about this below). In other words, tasks that can be summarized as "image in" -> "single-label prediction out" have pretty much been exhausted.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-3B-39.pdf">An Uncertain Future: Forecasting from Static Images using Variational Autoencoders</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Jacob Walker, Carnegie Mellon University; Carl Doersch, Carnegie Mellon University; Abhinav Gupta, ; Martial Hebert, Carnegie Mellon University</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/S-4A-06.pdf">Learning Image Matching by Simply Watching Video</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Gucan Long, NUDT; Laurent Kneip, Australian National University; Jose M. Alvarez, Data61 / CSIRO; Hongdong Li, ; Xiaohu Zhang, NUDT; Qifeng Yu, NUDT</em></span></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-1B-31.pdf">Peak-Piloted Deep Network for Facial Expression Recognition</a>, <em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Xiangyun Zhao, University of California, San Diego; Xiaodan Liang, Sun Yat-sen University; Luoqi Liu, Qihoo/360; Teng Li, Anhui University; Yugang Han, 360 AI Institute; Nuno Vasconcelos, ; Shuicheng Yan</em></span> </span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-3A-26.pdf" style="font-size: x-small;">Learning Visual Storylines with Skipping Recurrent Neural Networks</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Gunnar Sigurdsson, Carnegie Mellon University; Xinlei Chen, CMU; Abhinav Gupta</em><span style="font-size: xx-small;"> </span></span></li>
</ul>
</blockquote>
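To make the key-frame supervision idea concrete, here is a minimal sketch (my own PyTorch illustration, not the authors' code) of a peak-piloted loss: the features of a non-peak frame are pulled toward the features of the peak frame from the same sequence, on top of the usual classification loss. <i>backbone</i> and <i>classifier</i> are placeholder modules.

<pre>
import torch.nn as nn

class PeakPilotedLoss(nn.Module):
    """Sketch: supervise non-peak frames with the peak frame's features."""
    def __init__(self, backbone, classifier, match_weight=0.5):
        super().__init__()
        self.backbone = backbone        # any CNN feature extractor
        self.classifier = classifier    # maps features to expression logits
        self.ce = nn.CrossEntropyLoss()
        self.match_weight = match_weight

    def forward(self, peak_img, nonpeak_img, label):
        f_peak = self.backbone(peak_img)
        f_nonpeak = self.backbone(nonpeak_img)
        # classification loss on both frames
        cls = (self.ce(self.classifier(f_peak), label) +
               self.ce(self.classifier(f_nonpeak), label))
        # pull non-peak features toward (detached) peak features
        match = ((f_nonpeak - f_peak.detach()) ** 2).mean()
        return cls + self.match_weight * match
</pre>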
Language is another powerful supervisory signal: images that share tags or words in their respective descriptions (think also: comments in the context of social media) can be used to train network representations to cluster such images closer together or further apart (e.g., <a href="http://www.eccv2016.org/files/posters/O-1B-03.pdf">Yang et al.</a>). Some further examples of self-supervision by language include the works of <a href="http://www.eccv2016.org/files/posters/O-1B-02.pdf">Rohrbach</a> and <a href="http://www.eccv2016.org/files/posters/O-1B-04.pdf">Lu</a>. Other examples of cues/tasks used as self-supervision to learn useful internal representations for other tasks: co-occurrence, denoising, colorization, sound, egomotion, context, and video. Taking existing images, modifying them, and then learning the mapping back to the original images amounts to free training data (e.g., colorization, discussed more below, or image scrambling: <a href="http://www.eccv2016.org/files/posters/P-4B-32.pdf">Noroozi & Favaro</a>); a small sketch of this recipe follows the reference list below.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-1B-03.pdf">Improving Multi-label Learning with Missing Labels by Structured Semantic Correlations</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Hao Yang, NTU; Joey Tianyi Zhou, IHPC; Jianfei Cai, NTU</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-1B-02.pdf">Grounding of Textual Phrases in Images by Reconstruction</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Anna Rohrbach; Marcus Rohrbach, UC Berkeley; Ronghang Hu, UC Berkeley; Trevor Darrell, UC Berkeley; Bernt Schiele</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-1B-04.pdf">Visual Relationship Detection with Language Priors</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Cewu Lu, Stanford University; Ranjay Krishna, Stanford University; Michael Bernstein, Stanford University; Fei-Fei Li, Stanford University</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-4B-32.pdf">Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Mehdi Noroozi, University of Bern; Paolo Favaro</em><span style="font-size: xx-small;"> </span></span></li>
</ul>
</blockquote>
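As a concrete instance of the "modify an image, learn the mapping back" recipe, here is a minimal sketch (illustrative Python/PIL code of my own, in the spirit of the colorization papers above, not taken from them) that turns any unlabeled photo into a free (input, target) training pair:

<pre>
from PIL import Image
from torchvision import transforms

to_tensor = transforms.ToTensor()

def make_training_pair(path):
    """The original image is the label; its grayscale version is the input."""
    img = Image.open(path).convert("RGB")
    target = to_tensor(img)               # 3-channel color target
    gray = to_tensor(img.convert("L"))    # 1-channel degraded input
    return gray, target

# gray, target = make_training_pair("photo.jpg")
# loss = ((model(gray.unsqueeze(0)) - target.unsqueeze(0)) ** 2).mean()
</pre>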
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3XfP4fTYuZBt673Lq1DCtkn3bX9lZzyZf43LNMiM0BwJ7eDoxlrI9FTaQnWnOLIvGt4uCVgc5g91BwD2JMFVgl6NL8X90_kc_S5s8FD2ZpqR3PTBcCVmLrGWEjadQ3ZUWla0Db9SxfUYB/s1600/Screen+Shot+2016-10-20+at+2.45.09+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="101" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3XfP4fTYuZBt673Lq1DCtkn3bX9lZzyZf43LNMiM0BwJ7eDoxlrI9FTaQnWnOLIvGt4uCVgc5g91BwD2JMFVgl6NL8X90_kc_S5s8FD2ZpqR3PTBcCVmLrGWEjadQ3ZUWla0Db9SxfUYB/s400/Screen+Shot+2016-10-20+at+2.45.09+PM.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.eccv2016.org/files/posters/O-1B-04.pdf</td></tr>
</tbody></table>
Works that demonstrate new unsupervised approaches will typically evaluate in one of the following ways: (i) show that useful intermediate features emerge, by visualizing what neurons learn to fire on (as in the work of <a href="http://www.eccv2016.org/files/posters/O-1B-01.pdf">Owens</a> and <a href="http://www.eccv2016.org/files/posters/O-2B-03.pdf">Zhang</a>, based on the approach introduced by <a href="https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf">Zhou et al.</a>), or (ii) show that the learned internal representation provides good initialization for other tasks - i.e. that it is amenable to transfer learning (see <a href="http://www.eccv2016.org/files/posters/O-3A-04.pdf">Larsson et al.</a> or <a href="http://www.eccv2016.org/files/posters/O-2B-03.pdf">Zhang's work</a> for more examples). A great example of this self-supervised learning approach is the work by <a href="http://www.eccv2016.org/files/posters/S-1B-05.pdf">Pinto et al.</a>, who showed that a physical robot that grasped, pushed, and poked a whole bunch of objects a whole bunch of times could learn useful visual representations for other tasks. Demonstrating that a learned representation is useful can be done by fixing the network and using the computed features to directly cluster/retrieve images, by learning a classifier on top of the computed features for a new task (see the linear-probe sketch after the reference list below), or by using the learned representation only as an initialization while retraining with new data. The latter approach is especially useful if the task for which the network needs to be retrained does not have enough training data for complete end-to-end learning, and the unsupervised approach can bootstrap some of the feature learning.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-1B-01.pdf" style="font-size: x-small;">Ambient sound provides supervision for visual learning</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Andrew Owens, MIT; Jiajun Wu, MIT; Josh Mcdermott, MIT; Antonio Torralba, MIT; William Freeman, MIT</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-2B-03.pdf" style="font-size: x-small;">Colorful Image Colorization</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Richard Zhang, UC Berkeley; Phillip Isola, MIT; Alexei Efros</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-3A-04.pdf">Learning Representations for Automatic Colorization</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Gustav Larsson, University of Chicago; Michael Maire, Toyota Technological Institute at Chicago; Greg Shakhnarovich, TTI Chicago, USA</em></span></em><span style="font-size: xx-small;"> </span></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/S-1B-05.pdf">The Curious Robot: Learning Visual Representations via Physical Interactions</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Lerrel Pinto, Carnegie Mellon University; Dhiraj Gandhi, ; Yuanfeng Han, ; Yong-Lae Park, ; Abhinav Gupta</em></span></em></em></li>
</ul>
</blockquote>
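The "classifier on top of fixed features" evaluation is often called a linear probe; below is a minimal sketch of the common recipe (my own assumption of the setup in PyTorch, not any specific paper's protocol): freeze the self-supervised backbone and train only a linear layer on its features for the new task.

<pre>
import torch
import torch.nn as nn

def linear_probe(backbone, feat_dim, num_classes, loader, epochs=10):
    """Sketch: train a linear classifier on frozen backbone features."""
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False          # keep the learned features fixed
    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=0.01, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = backbone(images)  # assumed shape (B, feat_dim)
            loss = ce(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
</pre>

How well the probe does on the new task is then read as a measure of how useful the unsupervised representation is.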
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfAaKiRNLCwyDOASE1KDYUemDwKmu9HiVahsU-WYpldzmihtWdolvkonLKIrTowqNrMxtOhakj1fHGzMxHTnG7kwr0RXNAPaJ39lxP5cstOwGJqZ385Knw1NDWMnT9obkVV09r6YsNrrNQ/s1600/Screen+Shot+2016-10-19+at+5.13.07+PM.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="148" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfAaKiRNLCwyDOASE1KDYUemDwKmu9HiVahsU-WYpldzmihtWdolvkonLKIrTowqNrMxtOhakj1fHGzMxHTnG7kwr0RXNAPaJ39lxP5cstOwGJqZ385Knw1NDWMnT9obkVV09r6YsNrrNQ/s320/Screen+Shot+2016-10-19+at+5.13.07+PM.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.eccv2016.org/files/posters/O-4B-04.pdf</td></tr>
</tbody></table>
This also touches on an important trend: we are starting to see more integration with robotics. We are coming back to active vision (e.g., <a href="http://www.eccv2016.org/files/posters/O-4B-04.pdf">Jayaraman & Grauman</a>). New architectures and compute power are giving us the capability to learn structure from (relatively unstructured) interactions. This area of research will likely see tremendous growth in the next few years. Deep is coming to a robotics lab near you.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-4B-04.pdf">Look-ahead before you leap: end-to-end active recognition by forecasting the effect of motion</a>, <em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Dinesh Jayaraman, UT Austin; Kristen Grauman, University of Texas at Austin</em></span></li>
</ul>
</blockquote>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5Q0cLzWXjHUBuiWOjrNPAhHQouIJzuN304IrkLlasMkC1nuzoG9GOG0ZKGul3gsPTa6IoUuLjBHnKdwrDGM4uVg2TXC-FpPll-vG3pneGDm2xCiVDqW_QpHNG5BsJFBqIRMoXYJC_YKdZ/s1600/Screen+Shot+2016-10-20+at+2.49.27+PM.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="146" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5Q0cLzWXjHUBuiWOjrNPAhHQouIJzuN304IrkLlasMkC1nuzoG9GOG0ZKGul3gsPTa6IoUuLjBHnKdwrDGM4uVg2TXC-FpPll-vG3pneGDm2xCiVDqW_QpHNG5BsJFBqIRMoXYJC_YKdZ/s400/Screen+Shot+2016-10-20+at+2.49.27+PM.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.eccv2016.org/files/posters/P-3B-17.pdf</td></tr>
</tbody></table>
Language continues to be a hot topic. This includes image captioning (and variants, like <a href="http://www.eccv2016.org/files/posters/P-1B-42.pdf">Zeng's "title generation"</a>), and related tasks like visual question answering - VQA (e.g., <a href="http://www.eccv2016.org/files/posters/P-1A-25.pdf">Mallya & Lazebnik</a>, <a href="http://www.eccv2016.org/files/posters/P-1B-21.pdf">Lin & Parikh</a>), referring expressions (e.g., <a href="http://www.eccv2016.org/files/posters/S-1A-07.pdf">Hu et al.</a>, <a href="http://www.eccv2016.org/files/posters/S-1B-09.pdf">Yu et al.</a>), explanation generation (<a href="http://www.eccv2016.org/files/posters/P-2B-17.pdf">Hendricks et al.</a>), semantic tagging, and leveraging language as a supervisory cue for other visual recognition tasks (as discussed above). Attention modules are also beginning to pop up more frequently: here, "attention" refers to a modulation of (visual) features - a reweighting of which features, at which spatial locations, are used most at a given timestep (e.g., <a href="http://www.eccv2016.org/files/posters/O-3A-02.pdf">Zhang et al.</a>; see the sketch after the reference list below). Often, attention modules go hand-in-hand with recurrent neural networks (RNNs, e.g., LSTMs) that can encode temporal relationships. In this case, processing of the visual input at one time step influences processing at the next time step. For instance, captioning systems may "attend" to different image regions in sequence, while generating caption words sequentially. VQA systems may use a similar iterative procedure to refine the location in the image that can provide an answer to the question, or to aid with localizing a referring expression (e.g. <a href="http://www.eccv2016.org/files/posters/O-1B-02.pdf">Rohrbach et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-3B-17.pdf">Xu & Saenko</a>).
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-1B-42.pdf" style="font-size: x-small;">Title Generation for User Generated Videos</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Kuo-Hao Zeng, National Tsing Hua University; Tseng-Hung Chen, National Tsing Hua University; Juan Carlos Niebles, Stanford University; Min Sun, National Tsing Hua University</em></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-1A-25.pdf" style="font-size: x-small;">Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Arun Mallya, UIUC; Svetlana Lazebnik</em><span style="font-size: xx-small;"> </span></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-1B-21.pdf" style="font-size: x-small;">Leveraging Visual Question Answering for Image-Caption Ranking</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Xiao Lin, Virginia Tech; Devi Parikh, Virginia Tech</em><span style="font-size: xx-small;"> </span></span></li>
<li><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/S-1A-07.pdf">Segmentation from Natural Language Expressions</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Ronghang Hu, UC Berkeley; Marcus Rohrbach, UC Berkeley; Trevor Darrell</em></span></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/S-1B-09.pdf" style="font-size: x-small;">Modeling Context in Referring Expressions</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Licheng Yu, University of North Carolina; Patrick Poirson, ; Shang Yang, ; Alex Berg, ; Tamara Berg, University on North Carolina</em><span style="font-size: xx-small;"> </span></span></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-2B-17.pdf" style="font-size: x-small;">Generating Visual Explanations</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Lisa Anne Hendricks, UC Berkeley; Zeynep Akata, ; Marcus Rohrbach, UC Berkeley; Jeff Donahue, UC Berkeley; Bernt Schiele, ; Trevor Darrell</em><span style="font-size: xx-small;"> </span></span></li>
<li><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-3A-02.pdf">Top-down Neural Attention by Excitation Backprop</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Jianming Zhang; Zhe Lin, Adobe Systems, Inc.; Jonathan Brandt; Xiaohui Shen, Adobe; Stan Sclaroff, Boston University</em></span></em></em></li>
<li><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-1B-02.pdf">Grounding of Textual Phrases in Images by Reconstruction</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Anna Rohrbach; Marcus Rohrbach, UC Berkeley; Ronghang Hu, UC Berkeley; Trevor Darrell, UC Berkeley; Bernt Schiele</em></span></em></em></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-3B-17.pdf" style="font-size: x-small;">Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering</a><span style="font-size: xx-small;">, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Huijuan Xu, UMass Lowell; Kate Saenko, University of Massachusetts Lowel</em></span></li>
</ul>
</blockquote>
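For intuition, here is a minimal sketch of question-guided spatial attention (my own PyTorch illustration of the general mechanism, not any particular paper's model): each spatial location of a conv feature map is weighted by how well it matches an embedding of the question.

<pre>
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Sketch: reweight conv features by their match to a question vector."""
    def __init__(self, feat_dim, question_dim):
        super().__init__()
        self.project = nn.Linear(question_dim, feat_dim)

    def forward(self, feats, question):
        # feats: (B, C, H, W) conv features; question: (B, question_dim)
        B, C, H, W = feats.shape
        q = self.project(question)                       # (B, C)
        scores = torch.einsum("bchw,bc->bhw", feats, q)  # match per location
        weights = F.softmax(scores.view(B, -1), dim=1).view(B, 1, H, W)
        attended = (feats * weights).sum(dim=(2, 3))     # (B, C) weighted summary
        return attended, weights
</pre>

Run iteratively, with the attended summary folded back into the next query, this gives the refinement loop described above.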
In general, many more works are using RNNs - and this is because some portion of the input or required output can be interpreted as a sequence: e.g. a sequence of frames, a sequence of images in a collection, or a sequence of words (in the input question or output caption). RNNs have also been shown to provide effective iterative refinement (e.g. <a href="http://www.eccv2016.org/files/posters/S-1A-08.pdf">Liang et al.</a>). An "attention module" can similarly be used to parse an image or image features as a sequence (e.g. <a href="http://www.eccv2016.org/files/posters/O-1A-04.pdf">Xiao et al.</a>, <a href="http://www.eccv2016.org/files/posters/O-1A-03.pdf">Peng et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-3C-19.pdf">Ye et al.</a>). What this accomplishes is a simulation of bottom-up combined with top-down reasoning (a minimal sequence-encoding sketch follows the reference list below). And by the way, we <a href="http://saliency.mit.edu/ECCVTutorial/towardsCognitiveSaliency_ECCV_notes.pdf#page=43">talked a bit about attention</a> and how it can be used to leverage other vision tasks in our <a href="http://saliency.mit.edu/ECCVTutorial/ECCV_saliency.htm">Saturday tutorial</a>.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/S-1A-08.pdf" style="font-size: x-small;">Semantic Object Parsing with Graph LSTM</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Xiaodan Liang, Sun Yat-sen University; Xiaohui Shen, Adobe; Jiashi Feng, NUS; Liang Lin, Sun Yat-sen University; Shuicheng Yan, NUS</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-1A-04.pdf">Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Shengtao Xiao, National University of Singapore; Jiashi Feng, NUS; Junliang Xing, Chinese Academy of Sciences; Hanjiang Lai, SUN YAT-SEN UNIVERSITY; Shuicheng Yan, National University of Singapore; Ashraf Kassim, National University of Singapore</em></span></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-1A-03.pdf">A Recurrent Encoder-Decoder Network for Sequential Face Alignment</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Xi Peng, Rutgers University; Rogerio Feris, IBM Research Center, USA; Xiaoyu Wang, Snapchat Research; Dimitris Metaxas, Rutgers University</em></span></em></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-3C-19.pdf" style="font-size: x-small;">Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Qi Ye, ; Shanxin Yuan, Imperial College London; Tae-Kyun Kim, Imperial College London</em></span></li>
</ul>
</blockquote>
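And a minimal sketch of the "visual input as a sequence" pattern (again my own PyTorch illustration, with assumed feature dimensions): per-frame CNN features are run through an LSTM, whose final state can feed a caption decoder, an answer classifier, or an iterative-refinement step.

<pre>
import torch
import torch.nn as nn

class FrameSequenceEncoder(nn.Module):
    """Sketch: summarize a sequence of per-frame features with an LSTM."""
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim), one CNN feature vector per frame
        outputs, (h, _) = self.lstm(frame_feats)
        return h[-1]    # (B, hidden_dim) summary of the whole sequence

# enc = FrameSequenceEncoder()
# summary = enc(torch.randn(2, 16, 2048))   # 2 clips of 16 frames each
</pre>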
With regards to image understanding and language, beyond scene recognition and object detection, we are also seeing increasing interest in interaction and relationship detection (e.g. <a href="http://www.eccv2016.org/files/posters/P-1A-25.pdf">Mallya & Lazebnik</a>, <a href="http://www.eccv2016.org/files/posters/O-1B-04.pdf">Lu et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-3A-17.pdf">Nagaraja et al.</a>). I also found the applications of language to non-natural images - specifically, diagrams - quite interesting (<a href="https://arxiv.org/abs/1603.07396">Kembhavi et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-3B-29.pdf">Siegel et al.</a>).
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/P-1A-25.pdf">Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering</a>, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Arun Mallya, UIUC; Svetlana Lazebnik</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-1B-04.pdf">Visual Relationship Detection with Language Priors</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Cewu Lu, Stanford University; Ranjay Krishna, Stanford University; Michael Bernstein, Stanford University; Fei-Fei Li, Stanford University</em></span></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/P-3A-17.pdf">Modeling Context Between Objects for Referring Expression Understanding</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Varun Nagaraja, University of Maryland; Vlad Morariu, University of Maryland; Larry Davis, University of Maryland</em></span></em></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://arxiv.org/abs/1603.07396">A Diagram Is Worth A Dozen Images</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Aniruddha Kembhavi, AI2; Michael Salvato, Allen Institute for Artificial; Eric Kolve, Allen Institute for AI; Minjoon Seo, University of Washington; Hannaneh Hajishirzi, University of Washington; Ali Farhadi, University of Washington</em></span></em></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/P-3B-29.pdf">FigureSeer: Parsing Result-Figures in Research Papers</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Noah Siegel, ; Zachary Horvitz, ; Roie Levin, ; Santosh Kumar Divvala, Allen Institute for Artificial Intelligence; Ali Farhadi, University of Washington</em></span></em></em></em></li>
</ul>
</blockquote>
<blockquote class="tr_bq">
</blockquote>
We continue to see interesting innovations in neural network architectures - for instance, alternatives to convolution filters (<a href="http://www.eccv2016.org/files/posters/O-3A-03.pdf">Liu et al.</a>, <a href="http://www.eccv2016.org/files/posters/O-4B-03.pdf">Danelljan et al.</a>), integration of CRFs with NNs (<a href="http://www.eccv2016.org/files/posters/P-1B-37.pdf">Arnab et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-1A-36.pdf">Gadde et al.</a>, <a href="http://www.eccv2016.org/files/posters/P-3B-14.pdf">Chandra & Kokkinos</a>), and nice tricks to facilitate training, like stochastic depth (<a href="http://www.eccv2016.org/files/posters/S-3A-08.pdf">Huang et al.</a>; sketched after the reference list below), to mention just a few.
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-3A-03.pdf" style="font-size: x-small;">Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Sifei Liu, UC Merced; Jinshan Pan, UC Merced; Ming-Hsuan Yang, UC Merced</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-4B-03.pdf">Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Martin Danelljan, Linköping University; Andreas Robinson, Linköping University; Fahad Khan, Linkoping University, Sweden; Michael Felsberg, Link_ping University</em></span></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/P-1B-37.pdf" style="font-size: x-small;">Higher Order Conditional Random Fields in Deep Neural Networks</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Anurag Arnab, University of Oxford; Sadeep Jayasumana, University of Oxford; Shuai Zheng, University of Oxford; Philip Torr, Oxford University</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/P-1A-36.pdf">Superpixel Convolutional Networks using Bilateral Inceptions</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Raghudeep Gadde, Ecole des Ponts Paris Tech; Varun Jampani, MPI-IS; Martin Kiefel, MPI for Intelligent Systems; Daniel Kappler, MPI Intelligent Systems; Peter Gehler</em></span></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/P-3B-14.pdf">Fast, Exact and Multi-Scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Siddhartha Chandra, INRIA; Iasonas Kokkinos, INRIA</em></span></em></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/S-3A-08.pdf">Deep Networks with Stochastic Depth</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Gao Huang, Cornell University; Yu Sun, Cornell University; Zhuang Liu, Tsinghua University; Daniel Sedra, Cornell University; Kilian Weinberger, Cornell University</em></span></em></em></li>
</ul>
</blockquote>
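Stochastic depth itself is simple enough to sketch; below is my reading of the idea in PyTorch (an illustration, not the authors' code): during training, each residual block is dropped entirely with some probability, so gradients flow through a shorter network, and at test time the block's contribution is scaled by its survival probability.

<pre>
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Sketch: a residual block that is randomly skipped during training."""
    def __init__(self, block, survival_prob=0.8):
        super().__init__()
        self.block = block               # any residual branch, e.g. conv-bn-relu stack
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() > self.survival_prob:
                return x                 # drop the block: identity shortcut only
            return x + self.block(x)
        # at test time, keep the block but scale its expected contribution
        return x + self.survival_prob * self.block(x)
</pre>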
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKexzxCqe5xQIktDOpu-M7XVr8JgLq3RdEIDaSO4N19nOzJZVY2X9gOMPzxO2lbP-lnmNnSWjR1ecXQbV1Yq7dUkIkknIvhVHtq5pNOsdUSASIc1ygZccoKcv4oiL366si-vHkKf8ZpgNg/s1600/Screen+Shot+2016-10-20+at+2.48.38+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="112" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKexzxCqe5xQIktDOpu-M7XVr8JgLq3RdEIDaSO4N19nOzJZVY2X9gOMPzxO2lbP-lnmNnSWjR1ecXQbV1Yq7dUkIkknIvhVHtq5pNOsdUSASIc1ygZccoKcv4oiL366si-vHkKf8ZpgNg/s320/Screen+Shot+2016-10-20+at+2.48.38+PM.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.eccv2016.org/files/posters/S-3A-08.pdf</td></tr>
</tbody></table>
Among the very specific topics with disproportionately many papers this year: 11 papers on person re-identification, 6 papers on object counting (5 of which use CNNs), 3 papers with colorization applications (<a href="http://www.eccv2016.org/files/posters/O-2B-03.pdf">Zhang</a>, <a href="http://www.eccv2016.org/files/posters/O-3A-04.pdf">Larsson</a>, <a href="http://www.eccv2016.org/files/posters/O-3A-03.pdf">Liu</a>), and over 20 papers on segmentation and variations on segmentation (like portrait or scene matting, e.g., <a href="http://www.eccv2016.org/files/posters/S-1A-06.pdf">Shen et al.</a>). For instance, there were many improvements in semantic segmentation, as well as some domain-specific (e.g. biomedical) segmentation approaches (e.g. <a href="http://www.eccv2016.org/files/posters/S-1A-09.pdf">Liu et al.</a>).
<blockquote class="tr_bq">
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/O-2B-03.pdf" style="font-size: x-small;">Colorful Image Colorization</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Richard Zhang, UC Berkeley; Phillip Isola, MIT; Alexei Efros</em></span></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-3A-04.pdf">Learning Representations for Automatic Colorization</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Gustav Larsson, University of Chicago; Michael Maire, Toyota Technological Institute at Chicago; Greg Shakhnarovich, TTI Chicago, USA</em></span></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/O-3A-03.pdf">Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Sifei Liu, UC Merced; Jinshan Pan, UC Merced; Ming-Hsuan Yang, UC Merced</em></span></em></li>
<li><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><span style="border: 0px; font-style: normal; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="http://www.eccv2016.org/files/posters/S-1A-09.pdf">SSHMT: Semi-supervised Hierarchical Merge Tree for Electron Microscopy Image Segmentation</a>, </span><em style="border: 0px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Ting Liu, University of Utah; Miaomiao Zhang, MIT; Mehran Javanmardi, University of Utah; Nisha Ramesh, University of Utah; Tolga Tasdizen, University of Utah </em></span></em></em></li>
<li><span style="font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><a href="http://www.eccv2016.org/files/posters/S-1A-06.pdf" style="font-size: x-small;">Deep Automatic Portrait Matting</a><span style="font-size: xx-small;">, </span><em style="border: 0px; color: #373737; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Xiaoyong Shen, CUHK; Xin Tao, CUHK; Hongyun Gao, CUHK; Chao Zhou, ; Jiaya Jia, Chinese University of Hong Kong</em><span style="font-size: xx-small;"> </span></span></li>
</ul>
</blockquote>
Interestingly, none of the award-winning papers were about neural networks.
<h4>The future of vision conferences?</h4>

It is interesting to observe how fast this field evolves, and the impact this has on researchers, research programs, the publishing pipeline, and the outcome of conferences. In particular, it is now common for papers to be up on arXiv for over half a year before they are presented at a conference. Occasionally this can lead to confusion, with researchers scratching their heads, surprised to stumble upon a particular paper at the conference (<i>hasn't this paper already been published for a while? hasn't it already appeared in the mass media?</i>). By the time the conference rolls around, other researchers may already be familiar with the paper, and may have even innovated on top of it.
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
With the speed of innovation, at the same conference you might find both papers that build upon previous architectures to improve their pitfalls, and other papers that completely replace the original architectures with large performance gains. Small improvements are likely to be quickly overstepped by more significant leaps that leave the small improvements quickly forgotten. Lasting work requires qualitatively new approaches. </div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
It was interesting to see that a number of researchers presented their original published results (from the camera ready version of the paper) alongside new results obtained since, in an attempt to stay current - after all, half a year of additional innovations can change many numbers. Some of these additional innovations are a result of building upon recently-arxived work. Some presenters even explicitly make reference to an extension of the presented work that is already available on arxiv or is published in another venue. </div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
This might explain some of the proliferation of computer vision research to other conferences. To get innovations out fast enough for them to remain relevant, it might make sense to publish them in the nearest upcoming venue than to wait for the next computer vision conference to roll around. We're seeing related papers pop up in satellite workshops, and other conferences in machine learning, graphics, robotics, and language (take your favorite computer vision researcher and check which venues they've most recently published in). </div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
It has become common to hear: <i>"This was state of the art at the time of submission... But since then, we have been surpassed by multiple methods"</i>.</div>
<div style="font-weight: normal;">
This leads to an interesting conundrum: arXived work is not peer-reviewed, but creeps into presentations of peer-reviewed work at conferences. This is one way that presented work is made more current and relevant. Is this a symptom of the progress in this field outrunning the current conference structure? In some other fields (physics, biology, neuroscience, etc.), conference presentations are based on submitted abstracts, and publications are disentangled from conferences. However, I don't believe there is precedent for a field moving this fast. This is a difficult question.</div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
But on the topic of modernizing conferences, something needs to be done about the overcrowding situation around posters (especially with attendance growing considerably). It's quite hard to find a spot to stand in front of a poster presenter, within audible distance and without occlusion. Up in the balcony of the Theater Carré, filming the craziness below, I daydreamed of staying comfortably seated while flying a drone to a perfectly-selected location in full view of a desired poster and presenter. Perhaps that kind of swarm behavior could be much more efficiently optimized using some future conference logistics software ;) In the meantime, here's my birds-eye-view:</div>
<div style="font-weight: normal;">
<br /></div>
<div class="separator" style="clear: both; font-weight: normal; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/jtSROe9AxDs/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/jtSROe9AxDs?feature=player_embedded" width="320"></iframe></div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
<br /></div>
<div style="font-weight: normal;">
<br /></div>
</h4>
Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com4tag:blogger.com,1999:blog-338911558031950356.post-38493202894168939162016-06-29T23:27:00.000-07:002016-06-29T23:27:55.597-07:00Diversifying bias: how dataset bias can hurt and what we can do about itA very important topic for consideration, the question of dataset bias has been getting into the mainstream more and more recently: e.g. <a href="https://www.technologyreview.com/s/601775/why-we-should-expect-algorithms-to-be-biased/">"Why we should expect algorithms to be biased"</a>, and<br />
<a href="http://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html?smid=fb-nytopinion&smtyp=cur&_r=0">"Artificial Intelligence's white guy problem"</a>.<br />
<br />
<a href="http://people.csail.mit.edu/torralba/publications/datasets_cvpr11.pdf">As a research curiosity, dataset bias</a> has been shown to affect model generalizability: a machine learning algorithm trained on one dataset, and receiving flying colors on a particular collection of test images, may have abysmal performance on a different dataset with different image statistics. You can think about this as the case of only ever seeing faces front and center in an image, and then being tested on off-center faces and realizing you are unexpectedly, but miserably, failing at detecting them. Some real-life examples: <a href="http://www.wired.com/2009/12/hp-notebooks-racist/">"HP investigates claims of racist computer"</a>, <a href="http://gizmodo.com/5256650/camera-misses-the-mark-on-racial-sensitivity">"Camera misses the mark on racial sensitivity</a>".<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUAXlxx5C9fu8xSlmrnUoGiV2mhH2iZ-vqqwZ8OLUjmYsFJSCKLs6hd90j7QBMWcrms_OC2Mp9z3bYQzDj-eVdSaypZhP849w_wq1LbEjGKYTOwTzam0QIlxZSvlHjmkHCFT2PLi0mitN_/s1600/CV-Dazzle-antiface.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="222" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUAXlxx5C9fu8xSlmrnUoGiV2mhH2iZ-vqqwZ8OLUjmYsFJSCKLs6hd90j7QBMWcrms_OC2Mp9z3bYQzDj-eVdSaypZhP849w_wq1LbEjGKYTOwTzam0QIlxZSvlHjmkHCFT2PLi0mitN_/s400/CV-Dazzle-antiface.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: xx-small;">http://www.selfieresearchers.com/wp-content/uploads/2014/09/CV-Dazzle-antiface.png</span></div>
<br />
<br />
A more relevant and pressing example concerns the self-driving car. Ultimately, if trained correctly, it will learn to avoid pedestrians, lamp posts, barriers, and whatever else was meticulously labeled and included in its training set. But will it know to avoid a kangaroo if the closest things it ever saw in its training data and prior experience are deer and cows? Expecting an algorithm to be capable of this type of generalization is reasonable. So even though the car might not be able to accurately determine what is happily hopping across the street, it will guess that it is a bit of a deer, a bit of a cow, and, since it is upright, maybe a bit of a pedestrian as well... but overall, and most importantly, whatever its identity, this happy creature should not be run over.<br />
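<br />
To make the intuition concrete, here is a minimal sketch (in Python) of such a cautious decision rule, assuming a hypothetical classifier that outputs a probability distribution over its known classes - the class names and threshold are invented for illustration:<br />
<br />
<pre># A toy decision rule, not any real self-driving stack: even when no
# single known class is a confident match, the total probability mass
# on "avoid-worthy" classes should still trigger avoidance.
AVOID = {"deer", "cow", "pedestrian", "lamp post", "barrier"}

def should_avoid(class_probs, threshold=0.5):
    """class_probs: dict mapping class name to predicted probability."""
    # Sum probability over everything we would never want to hit,
    # rather than trusting only the single most likely label.
    return sum(p for c, p in class_probs.items() if c in AVOID) >= threshold

# A kangaroo might look a bit like each of the known classes:
kangaroo_guess = {"deer": 0.35, "cow": 0.25, "pedestrian": 0.2, "road": 0.2}
print(should_avoid(kangaroo_guess))  # True: avoid it, whatever it is
</pre>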
<i><br /></i>
<i>Speaking of cows...</i><br />
<i><br /></i>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-8xCwgI9zd7Ke1f9KqefObZu4ljR6zcf6PmUlqFuTRCaUM_vZcXeEVEj5jbhmMot9jQ-tc77U0PBJ6B7QQnGv_AsiB9AS7Nsm83zfk-FktYTBRT3xY3g_zXHZbLnLHj7Zdp6zUPRB7OfJ/s1600/fitbit_for_cows.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-8xCwgI9zd7Ke1f9KqefObZu4ljR6zcf6PmUlqFuTRCaUM_vZcXeEVEj5jbhmMot9jQ-tc77U0PBJ6B7QQnGv_AsiB9AS7Nsm83zfk-FktYTBRT3xY3g_zXHZbLnLHj7Zdp6zUPRB7OfJ/s320/fitbit_for_cows.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="font-size: 12.8px; text-align: center;"><div>
<div class="" style="clear: both; font-size: medium;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-8xCwgI9zd7Ke1f9KqefObZu4ljR6zcf6PmUlqFuTRCaUM_vZcXeEVEj5jbhmMot9jQ-tc77U0PBJ6B7QQnGv_AsiB9AS7Nsm83zfk-FktYTBRT3xY3g_zXHZbLnLHj7Zdp6zUPRB7OfJ/s1600/fitbit_for_cows.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: xx-small;">https://www.insidescience.org/content/fitbit-cows/3076</span></a></div>
</div>
</td></tr>
</tbody></table>
Let me make an aside here: the mere fact that our machine learning algorithms (the ones behind the self-driving cars, autonomous appliances, and robots of the future) increasingly rely on data (are "data-driven"), and are increasingly likely to be neural networks (which happen to chug through, and learn from, large amounts of data very well), is not in itself a reason for concern. I do not buy the argument that we should fear our "black box algorithms" because they are parameter-bloated, connected and intertwined networks that are "hard to understand and harder to interpret". Until quite recently, when asked what computer vision was up to, I would sarcastically answer: <i>"detecting cows on fields... but only patched ones, on green fields, in the center of the image, and only if awake"</i>. We wouldn't even be having some of these conversations (in the media) about dataset bias if neural networks weren't performing this well (there are bigger problems at stake if even the cows can't be detected). Having large amounts of data, and the architectural machinery to deal with it, is precisely what is helping us learn and generalize better.<br />
<br />
With that aside, it is nevertheless crucial to start thinking more carefully about dataset bias and model generalization. This thinking should not pit us against data-driven algorithms in any way; rather, we should continue to remind ourselves that it is the great prediction potential of the algorithms that is granting us the opportunity to think about these questions in the first place.<br />
<br />
There is no doubt that our current, state-of-the-art models are suffering from bias in their datasets. Otherwise, Google wouldn't have made <a href="http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html">a headline that was centered around how well its algorithm could recognize cats</a> (as opposed to anything else, really) by learning from YouTube videos. DeepDream wouldn't be <a href="http://www.fastcodesign.com/3048941/why-googles-deep-dream-ai-hallucinates-in-dog-faces">imbuing every single photo with hallucinogenic dogs</a>. Microsoft's AI chatbot wouldn't have learned in the span of only a day that <a href="http://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist">being a racist asshole gets people's attention</a>. Researchers wouldn't be spending their time <a href="http://www.evolvingai.org/fooling">fooling the algorithms</a>. The list goes on.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/DgPaCWJL7XI/0.jpg" src="https://www.youtube.com/embed/DgPaCWJL7XI?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
The problem is that this kind of dataset bias is unavoidable - because people are biased and they are the source of the data that we're feeding to our algorithms (and if we can learn anything from the above, it's that people seem to be biased towards putting others down on social media, but then compensating by flooding the net with pictures of cute animals). This means that, unwillingly, we may be imbuing our algorithms with negative qualities, behavior, and biases. We may be perpetuating biases that we would otherwise like to remove from society (see <a href="http://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html">"When algorithms discriminate"</a>).<br />
<br />
And yet, when we try to actively interfere, <a href="http://www.wired.co.uk/article/instagram-pro-anorexia-search-terms">we can make the problem worse</a>. More frighteningly, if people know they can influence the data, they may use this to their benefit, with either positive or negative consequences for other members of society (as in this <a href="https://www.ted.com/talks/andreas_ekstrom_the_moral_bias_behind_your_search_results?language=en#t-373864">TED Talk on the "Moral bias behind your search results"</a>). We are all responsible for the data that we put on the net (what we upload and what we search), and need to recognize that the biases that are out there are our own.<br />
<br />
But if the biases do get out there, should we get rid of them? What would "un-biasing" the data even mean? Who has the right to say that something should or should not appear in a search result? Would a top-down filtering of content, that would change the data everyone sees, even be appropriate? Most parents would disagree with a single, universal parental control for all of the world's children, and this is not much different. Different individuals, cultures, societies have different preferences and norms, different beliefs and taboos.<br />
<br />
Which brings me to the importance of diversity in the data that we have. I would not argue for artificially nudging the numbers or tweaking the data to try to eliminate certain kinds of biases, as this can have all sorts of secondary and unintended consequences (as the <a href="http://www.wired.co.uk/article/instagram-pro-anorexia-search-terms">Instagram example</a> has shown). Instead, the more people participating in the data that is being harnessed for training algorithms, the better. This naturally adjusts the data balance to more accurately reflect the population that is using it. One solution is just to <a href="https://www.technologyreview.com/s/544861/how-facebook-and-googles-plans-to-boost-internet-accessadvanced-in-2015/">put more people on the web</a>, and I think we will get there. Another is to bring more humanities folks into the tech loop. It would be great to have the perspective of anthropologists, sociologists, historians, policy and law makers, and psychologists for insights on cultural sensitivities, historical trends, crowd mentality, virality, societal pressures, etc., so that we can have better expectations of what the data may bring before it's on our plates and we have to deal with the consequences. In this case, the suggested approach is to use this knowledge to adjust the data-collection procedures themselves rather than the data after the fact.<br />
<br />
If you were running a survey, a change in wording but not meaning might drastically change who would respond to it. Different cultures also look at concepts like success, individuality, and norms in crucially different ways, which affects how and what they communicate about these topics. Take this simple example: say you collect perceptions about an exam from a group of schoolchildren. You get two answers: "it was easy! [secondary reason: I passed]", "it was ok [I think I got only 98%... where did the 2% go?]". Without knowing the context of the cultures, societies, or families from which these two responses came, you would have a very biased dataset (I'm reminded of <a href="http://gladwell.com/outliers/">Malcolm Gladwell's book Outliers</a>; or <a href="https://www.youtube.com/watch?v=a4TXS7ck8bQ">this talk</a>). And this extends beyond surveys. The behaviors you elicit (and end up collecting) from a group of users can depend crucially on how information, a task, or a UI is presented. Psychologists and sociologists know this very well. But they are also less likely (currently) to be the ones collecting the large datasets that modern-day computational algorithms are trained on.<br />
<br />
Questions of labeling are key. <i>What do you call that thing?</i> If you give it one label over another, a different set of properties or attributes might be retrieved. Consider an example: the labeling of street scenes. Here's a pedestrian, and another one. Here's a bicyclist. What about that person in a wheelchair? Is that a pedestrian or a transportation device? How many body parts must be visible and moving for a pedestrian to be labeled as such? This labeling might affect how an algorithm analyzing the scene predicts the future movements of the participants and objects in it. This, in turn, might affect the decisions the algorithm (read: autonomous vehicle) makes.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPF6pu0ZvnLSI3yNykaTuH5sBfgcfoXVs_vH76hBrXcx_XBZ0IwU18w7x2WZeRMCDrzo2M6TtJnBa1WUhtJyNJwp4-xI3HaQLLSQqChpkLWQq6LgqBUDkqlz6KM-IKEKpLOo8RABoUZ86S/s1600/psychedelic-city-pop-art-new-york-city-street-scene-miriam-danar.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPF6pu0ZvnLSI3yNykaTuH5sBfgcfoXVs_vH76hBrXcx_XBZ0IwU18w7x2WZeRMCDrzo2M6TtJnBa1WUhtJyNJwp4-xI3HaQLLSQqChpkLWQq6LgqBUDkqlz6KM-IKEKpLOo8RABoUZ86S/s320/psychedelic-city-pop-art-new-york-city-street-scene-miriam-danar.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-size: xx-small;">http://fineartamerica.com/featured/psychedelic-city-pop-art-new-york-city-street-scene-miriam-danar.html</span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both;">
Every labeler is biased. Biased by their culture, their society, and their experience. Instead of attempting to unbias the labels, we should introduce even more biased labelers... to compensate (<i>and please, let's throw something intellectual up on the net, at least once for every 100 cats...</i>). We should increase the diversity of the bias until, on average, we get something reasonable. Ensembles work. That's the <a href="https://www.ted.com/speakers/james_surowiecki">wisdom of crowds</a>. </div>
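<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
Here is a toy simulation (in Python, all numbers invented) of why this averaging can work - under the admittedly optimistic assumption that the individual biases are diverse enough to spread roughly symmetrically around the truth:</div>
<pre>import random

# "Diversify the bias": many labelers, each with their own systematic
# skew, estimate the same quantity (say, the true fraction of images
# containing a cow, 0.30).
TRUE_VALUE = 0.30
random.seed(0)

def biased_labeler():
    bias = random.uniform(-0.2, 0.2)  # each labeler's own systematic skew
    noise = random.gauss(0, 0.02)     # plus a little per-judgment noise
    return TRUE_VALUE + bias + noise

single = biased_labeler()
crowd = [biased_labeler() for _ in range(1000)]
average = sum(crowd) / len(crowd)

print(round(single, 3))   # any one labeler can be far off
print(round(average, 3))  # the diverse crowd lands near 0.30
</pre>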
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
And what do we do in the meantime while we wait for the whole world's wisdom to accumulate on the net? We think harder about our data collection strategies and the tasks used; we spend more time debugging and visualizing the algorithms and the trends they pick up on; we consider how to present, display, and use the data; we brainstorm ways to annotate and make explicit whether certain labels, tags, or content are more likely to be controversial or subjective; and we treat predictions in this space with greater care, and importantly, less confidence. Just as we tend to dislike the individuals with the greatest bias but highest confidence, let's not fill our digital world with these personalities.</div>
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-35203891867068356292016-02-27T11:11:00.000-08:002016-02-27T11:11:17.356-08:00My free business lesson from an Uber driver<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Want
a free business class? Find out more about Uber (or try talking to
more Uber drivers). Every Uber driver has at least some kind of
opinion to share about the Uber business model (its upsides and
downsides for drivers and passengers alike), and some drivers (if you
are inquisitive) will provide you with additional business logistics.
Occasionally you will receive an outlook on the (potentially
sobering) present and future societal impacts of Uber. If you are
really lucky, you will have spent the whole ride entertained by a
detailed rundown of the business' history and how it has been
continuously adapting to novel locations, changing consumer demands,
and emerging competition. I found myself the lucky passenger of
precisely the latter kind of Uber driver on my last trip from SFO to
Palo Alto: a young Latin American named Marcos, with a square jaw and
an equally square baseball cap bill. I will endeavor to provide an
account of this conversation (really, a monologue punctuated by my
occasional requests for additional details). I aim also for my
recounting to have the properties of a conversation, in that
regardless of the factual accuracy of individual details or the exact
temporal sequence of events (potentially tainted by the knowledge of
my driver and my interpretation), the general outlines of the
high-level picture should nevertheless surface.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="http://johnbracamontes.com/wp-content/uploads/2015/09/Uber-v-Taxi.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://johnbracamontes.com/wp-content/uploads/2015/09/Uber-v-Taxi.jpg" height="158" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">source: http://johnbracamontes.com/</span></td></tr>
</tbody></table>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">From
my driver Marcos I learned that Uber sprouted up in SF to fill an
existing gap in the market: the need for a professional and,
importantly, reliable chauffeuring service. An emerging
sentiment at the time was one of dissatisfaction with cab services,
passengers having to deal with unreliable service and
rude or disrespectful drivers. The drivers seemed to have the
upper hand in this market and behaved accordingly. </span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Although I am
not sure how widespread this sentiment was, I can attest to the fact
that this is the reason I've never liked taking cabs.</span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;"></span></span></span><br /><span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;"></span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Naturally,
this was the sort of inconvenience and unpleasantness that the more
financially-privileged were willing to pay to avoid. Uber saw this
opportunity, and was perfectly positioned to take advantage of it: in
a city with (1) a dense population packed in a relatively small
(drivable) area, (2) large and growing tech companies providing a
continuous supply of financially-privileged individuals, and (3) a
traditionally startup-friendly environment, where bold new ideas
regularly surface and are picked up by the wave of tech hype. And so,
Uber was born (in 2010). As a professional and reliable chauffeuring
service with a convenient mobile app interface (and up-to-date
updates on driver location), passengers would be picked up in shiny
black cars by courteous drivers in formal attire, offering additional
frills like water and mints for the on-the-run businessman. Sure,
this was an expensive alternative to cabs, but to the users of
UberBLACK, it was well worth it. Behind the scenes the structure was
quite clever as well: Uber provided the cars and phones for the
drivers (equipped with Uber app and Google maps), and regular people
stepped up to serve as drivers, no formal interviewing
procedure required. Marcos dropped out of his community college to
take on this new, respectable job that required a full-time
commitment. Once a personalized on-demand transportation service had
proven successful, it opened up the market for new variants. And this
is where Lyft comes into the picture. </span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://tctechcrunch2011.files.wordpress.com/2014/08/lyft.jpg?w=738" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="202" src="https://tctechcrunch2011.files.wordpress.com/2014/08/lyft.jpg?w=738" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">source: techcrunch.com</span></td></tr>
</tbody></table>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Lyft
aimed to capture another SF-based market segment: the young crowd of
current students and recent graduates, now employed at local
startups. An alternative transportation solution was needed for the
kind of person who ate Mexican from a food truck and sported
giveaway t-shirts acquired at hackathons and career fairs. Lyft
was marketed as an affordable ride-share, the distinctive pink
mustache on car bumpers trumpeting the friendly, hip, and easy-going
atmosphere that customers would learn to expect from it. In stark
contrast to UberBLACK, passengers would sit in the front, engage in
conversations with their casually-dressed drivers, and ride in
whatever car the driver happened to own. Whereas Uber lent its
drivers cars and phones, Lyft sent them giant pink furry mustaches.
The latter was more financially viable, allowing prices to drop to
student standards, well below cab fees. Importantly, Lyft drivers
could work on flexible schedules, squeezing in rides in the free
moments of the day, morning, evening, and between activities. Marcos
could now go back to college and pick up passengers in his free
time. </span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Uber
wisely recognized that much of its infrastructure was already in
place to allow its service to be differentiated for different kinds
of customers. Uber then branched out to provide a new option: UberX.
Learning from the successes and failures of the Lyft model, UberX
allowed drivers to work flexible hours in flexible attire, operating
their own vehicles – provided, and this is important, that the
vehicles passed some minimal quality standards (Lyft passengers had
begun to complain about the run-down condition of some of the cars).
The water and mints were still there. Drivers were encouraged to be
friendly and hip. </span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Uber
had a first-mover advantage: it had been first in the market and
thus had enough time to acquire a good reputation and loyal customers
through its UberBLACK service. UberX brought in new customers and
gave the old ones a flexible option. Provided with the same
reliability and courteousness, some UberBLACK customers now opted
for the cheaper, more informal option. It is part of SF culture not
to flaunt financial well-being, as evidenced by the casual hoodies
and slacks regularly worn by some top tech executives. So black cars
became regular cars (that were nevertheless guaranteed not to be
run-downs). </span></span></span>
</div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">As
an aside, Uber now has a variant that is intermediate between
UberBLACK and UberX. Do you want to get picked up by a
casually-dressed Uber driver but in a brand-name car like a BMW or
Mercedes for an intermediate price? Well now you can with Uber
Select. And if you don't want a fancy car to pull up at your office
entrance in SF, you can stick to UberX. Different Uber options happen
to be dominant in different cities. For instance, perhaps
unsurprisingly, LA tends to prefer the luxurious option.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">After
the introduction of UberX, Uber's customer pool grew. This meant that
the density of ride requests was often higher on Uber than on Lyft.
Drivers had more customers overall and could cover smaller
distances between ride requests. Marcos and his friends signed
back on with Uber.</span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;"><br /></span></span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://tctechcrunch2011.files.wordpress.com/2014/08/rockem-uber-lyft1.jpg?w=600" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="179" src="https://tctechcrunch2011.files.wordpress.com/2014/08/rockem-uber-lyft1.jpg?w=600" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">source: techcrunch.com</span></td></tr>
</tbody></table>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">New
measures had to be taken. Lyft gave its drivers new incentives:
“complete X rides and receive a rebate on the hefty commissions
paid back to Lyft”. Uber followed <span style="color: black;"><span style="background: transparent;">suit</span></span>.
The new incentives served an additional purpose: having to complete a
minimum number of rides, many drivers could no longer afford enough
time to work for both companies and still complete enough rides with
each. Choices had to be made. Uber tried to give drivers incentives
for accepting all ride requests in a row. Drivers obliged and
accepted all that came their way. They accepted requests even if it
required going around the whole block just to pick up a passenger
directly on the opposite side of the street. Passenger wait times
increased. Passengers were not happy. Uber pivoted its incentive
structure.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">A
vicious price war ensued. The water and mints disappeared from Uber
cars. With few noticeable differences between the two services
from the customer perspective, customers went where prices were
lower. Lower prices meant more ride requests and a quicker way to hit
the incentive ride minimum. Drivers went where there were more
customers. </span></span></span>
</div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">As
Marcos prepared to drop me off in Palo Alto, he got his Lyft app
ready. He said he'd take the first request he got - Uber or Lyft.
Palo Alto has longer ride distances and fewer customers per unit
area than SF. Time is costly, and Marcos would not spend it
passenger-less. After all, he needed to be in class soon. He let me
out. My half-hour, 21-mile ride cost $37.78, including a
$3.85 airport surcharge. Uber would take <span style="background: transparent;">20-30%</span>,
gas would cost Marcos another few dollars, and car depreciation isn't
to be forgotten either. Marcos told me that the prices are more
expensive in SF than surrounding areas. (In fact, my trip back to SFO
from Palo Alto 2 days later cost $28.47). On my Uber app, I gave
Marcos 5 stars and left some feedback about what a knowledgeable
guide he turned out to be. Then again, I don't remember the last time
I gave a poor review.</span></span></span></div>
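<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Out of curiosity, here is the back-of-the-envelope arithmetic on Marcos' cut of that fare, as a small Python sketch. The 25% commission is just an assumption within the quoted 20-30% range, the gas figure is a guess at "another few dollars", and depreciation and the surcharge split are left out:</span></span></span></div>
<pre>fare = 37.78         # my half-hour, 21-mile ride
commission = 0.25    # assumed: somewhere in the 20-30% range
gas = 3.00           # assumed: "another few dollars"

driver_gross = fare * (1 - commission)  # what passes through to Marcos
driver_net = driver_gross - gas         # minus gas, before depreciation
print(round(driver_gross, 2), round(driver_net, 2))  # roughly 28.34 25.34
</pre>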
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://images.cdn.stuff.tv/sites/stuff.tv/files/styles/big-image/public/best-apps-of-the-year-uber.jpg?itok=FT3qL5U2" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://images.cdn.stuff.tv/sites/stuff.tv/files/styles/big-image/public/best-apps-of-the-year-uber.jpg?itok=FT3qL5U2" height="300" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">source: http://images.cdn.stuff.tv</span></td></tr>
</tbody></table>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Lowering
prices means even more burden on the drivers. Already a fraction of
cab fare, Uber fares are reaching new lows. Two days later, I logged
onto my Uber app at 5 a.m. to request a car back to the airport. I
could see some cars circling around the Googleplex, 15
minutes from where I was. After about 2 minutes, an Uber driver
accepted my request. Another minute later, he canceled the request.
He'd probably gotten a more conveniently-located ride request and
would make more money by keeping the distance driven without
passengers minimal (and 15 minutes was already pushing it). His car
stayed around the Googleplex. I placed another request,
finding myself irritated that it was taking me longer than 5 minutes
to get a car. My last dozen or so Uber trips involved instantaneous
request acceptance, with a car picking me up 1-2 minutes later. How
spoiled I had become. Finally, after another 3 minutes, my request
was accepted by a middle-aged Latin American gentleman named Juan
Carlos, and in 15 minutes, he was at my hotel.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">I
was really thankful to Juan for picking me up. He was surprised to
find out there wasn't a swarm of cars ready to take me. Uber cars
often outnumber passengers at this early time in the morning, he told
me. I was in turn surprised to hear this, having spent that night
tossing and turning in bed worried that no Uber drivers would be on
the roads so early (I didn't even consider cabs as an alternative
anymore). Our differing expectations of the Uber availability
situation that morning led me to think that there are
too many variables at play to fully predict driver behavior. Uber
drivers have to somehow optimize ride fares, company incentive
structures, passenger availability, and competition with other Uber
cars to figure out if a particular ride is going to bring them more
than it will cost. Earlier that morning Juan had driven another
passenger to San Jose airport - a 20-minute ride that cost the
passenger $10, of which Juan would probably get less than<span style="background: transparent;">
$6-7</span>.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">I
told Juan about one of my recent Uber experiences in Boston. I had
decided to try UberPOOL for the first time: a variant where multiple
passengers can share the same ride, with different initial and
destination locations, as long as the trips are relatively in the
same direction. Each passenger pays less in return for the
potentially longer ride. If multiple passengers are picked up, the
Uber driver can hope to make a sliver more in the same fraction of
time by combining the trips. The interesting catch is: you get a
guaranteed UberPOOL price regardless of whether another passenger is
taken. In other words, you pay a lower price (even lower than UberX)
just by agreeing to potentially share the ride. From talking to my other
friends in Boston, I gather it is pretty common for no additional passenger to
show up. So my friend and I took an UberPOOL. We counted as a single
passenger (it would be the same price if only one of us was there),
but didn't end up picking up a third passenger on the way. Our ride
was 10 minutes from Downtown Boston across the bridge to East
Cambridge, and cost us a total of $6. Splitting it, each of us paid
$3, almost the price of a subway ticket, but with the walking
distance (from subway to house) cut from 15 minutes down to zero.</span></span></span></div>
<div style="margin-bottom: 0in;">
<br />
</div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Who
takes the loss when no additional passenger request is made on
UberPOOL: the company or the driver? I asked Juan. Turns out, it's
the driver (in Lyft's case, the company pays the difference). So if
drivers are making so little money, how can Uber remain a viable
long-term business model? Without missing a beat, Juan replied that it
doesn't need to be viable for longer than a decade at the most.
"After all, Uber is building a fleet of self-driving cars. No
paid drivers will be needed." Juan paused. But there's a bigger
problem: Juan is concerned about the strawberry-picking robots that
are now working on farms day and night, 24 hours straight. Soon,
there'll be even more robotic farm hands. Juan's family back in South
America along with thousands of other people are going to be out of
the farm jobs that provided their livelihood. "What happens
then?"</span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;">Juan
Carlos got some fraction of the $28.47 I paid via my Uber app, and 5
stars.</span></span></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif;"><span style="font-size: small;"><br /></span></span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://www.econlife.com/wp-content/uploads/2015/04/grapes-vineyard-704x260.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://www.econlife.com/wp-content/uploads/2015/04/grapes-vineyard-704x260.jpg" height="118" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">source: http://www.econlife.com</span></td></tr>
</tbody></table>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><br /></span></div>
<div style="margin-bottom: 0in;">
<span style="color: black;"><span style="font-family: Arial, sans-serif; font-size: x-small;">Further reading:</span></span></div>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><br /></span></div>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><span style="font-family: Arial, sans-serif;"><span style="color: black;">Dated
sequence of events in Uber's history:
<a href="http://techcrunch.com/gallery/a-brief-history-of-uber/slide/26/">http://techcrunch.com/gallery/a-brief-history-of-uber/slide/26/</a>
</span></span>
</span></div>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><span style="font-family: Arial, sans-serif;"><span style="color: black;">More
about the Uber-Lyft battle:
<a href="http://www.npr.org/sections/alltechconsidered/2016/01/18/463473462/is-uber-good-to-drivers-it-s-relative">http://www.npr.org/sections/alltechconsidered/2016/01/18/463473462/is-uber-good-to-drivers-it-s-relative</a>
</span></span>
</span></div>
<div style="margin-bottom: 0in;">
<span style="font-family: Arial, sans-serif;"><span style="color: black; font-size: x-small;">How
much Uber drivers make:
<a href="http://www.buzzfeed.com/johanabhuiyan/what-uber-drivers-really-make-according-to-their-pay-stubs#.ropJdNaxxo">http://www.buzzfeed.com/johanabhuiyan/what-uber-drivers-really-make-according-to-their-pay-stubs#.ropJdNaxxo</a>
(compared to Lyft:
<a href="http://www.fastcompany.com/3048563/fast-feed/this-is-how-much-uber-and-lyft-drivers-make-in-different-cities">http://www.fastcompany.com/3048563/fast-feed/this-is-how-much-uber-and-lyft-drivers-make-in-different-cities</a>)</span></span></div>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><span style="font-family: Arial, sans-serif;"><span style="color: black;">Different
types of Uber services explained:
<a href="http://www.ridesharingdriver.com/whats-the-difference-between-uberx-xl-uberplus-and-black-car/">http://www.ridesharingdriver.com/whats-the-difference-between-uberx-xl-uberplus-and-black-car/</a>
</span></span>
</span></div>
<div style="margin-bottom: 0in;">
<span style="font-size: x-small;"><span style="font-family: Arial, sans-serif;"><span style="color: black;">Uber's
pricing and incentives model:
<a href="https://newsroom.uber.com/guest-post-a-deeper-look-at-ubers-dynamic-pricing-model/">https://newsroom.uber.com/guest-post-a-deeper-look-at-ubers-dynamic-pricing-model/</a>
</span></span>
</span></div>
<br />
<div style="margin-bottom: 0in;">
<span style="font-family: Arial, sans-serif;"><span style="font-size: x-small;">Quora discussion about Uber VS Lyft:
<a href="https://www.quora.com/What-is-the-difference-between-Uber-and-Lyft-1">https://www.quora.com/What-is-the-difference-between-Uber-and-Lyft-1</a></span>
</span>
</div>
Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com3tag:blogger.com,1999:blog-338911558031950356.post-38191847690661234272016-02-13T17:55:00.000-08:002016-02-13T17:55:08.645-08:00On effective communication: because it matters.I've been thinking quite a bit recently about effective communication, partly because there were 2 seminars last month at MIT about giving good talks (one by <a href="http://people.csail.mit.edu/phw/">Patrick Winston</a>, one by <a href="http://www.principiae.be/X0101.php">Jean-luc Doumont</a>), partly because we recently published a paper about <a href="http://news.mit.edu/2015/how-make-better-infographic-visualizations-1105">what makes visualizations effective</a> (for communicating messages), and partly because I've been TA-ing a <a href="https://news.mit.edu/2016/learning-to-solve-superurop-0201">research course for undergraduates</a> (with a large communication component to it).<br />
<br />
I'll summarize here some notes from the talks I went to, as well as my own thoughts and insights. Though I'm sure I'll have lots more to say on this topic in the future.<br />
<br />
Patrick Winston started off <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjXj4u-2_XKAhXEdT4KHUgGA4YQtwIIJDAB&url=https%3A%2F%2Fvimeo.com%2F101543862&usg=AFQjCNHFe90b0weHhDQQHp93KCbepmYkGA&sig2=MWF08JMDQ5dzNcnBgj_dZQ">his talk</a> with the following statement*: <span style="color: orange;">"you (the researcher) will be judged first by your speaking, then by your writing, and finally by your ideas"</span>. This is a common phenomenon: a great communicator can sell you on the simplest ideas and make you see beauty in them; a poor communicator can obscure the most beautiful of ideas. Both examples regularly occur in lectures, in research talks, and in business presentations (but I'll focus on the researchers, here). It really is a shame when beautiful ideas don't come to light because the researchers behind them lack in explanatory artistry. It <i>is</i> an art, this whole communication business - which is why it is not commonly taught in a formal manner. Aside from the occasional seminar, the <a href="https://people.csail.mit.edu/fredo/student.html">occasional resource</a> exchanged among students, and the occasional tip given by one researcher to another during a practice talk, aspiring researchers (e.g. students) get no formal coaching and are told to "just do good work". Feedback and tips from advisors can be quite uneven, depending on the experience of the advisors themselves. (luckily, MIT professors are very good at selling their research, judging by the content on the front page of <a href="http://news.mit.edu/">MIT news</a> every morning; as Winston puts it: <span style="color: orange;">"your ideas should have the wrapping that they deserve"</span>)<br />
<br />
The point is: <span style="color: orange;">many (esp. young) researchers need formal communication coaching, and often they underestimate how important it is for their careers</span> <i>(it pains me to hear yet another graduate student proclaim: "boy, these talks and posters I have to present are such a waste of my time"*)</i>. I would like to applaud MIT's initiative: the new <a href="https://www.eecs.mit.edu/node/6271#overlay-context=node/6271">EECS communication lab</a> (and similar ones in other departments) for providing resources, training and advisors to students, when they need them. Additionally, I think MIT's <a href="https://news.mit.edu/2016/learning-to-solve-superurop-0201">SuperUROP course</a> for undergraduates is a super valuable experience (essentially a how-to guide to being a researcher), where alongside a year's worth of academic research, students practice and receive feedback on important communication skills: writing research abstracts, proposals, and papers; performing peer reviews, creating academic posters, and giving research pitches and presentations. And yes, as a TA in the course, I sometimes hear the same excuses <i>("boy, all these written assignments are such a waste of my time, why can't I just do the research"</i>). But when you're in an environment where industry representatives, senior researchers, and MIT faculty are following what you're doing (as is the case for these students), being able to sell your work can mean a lot for your future career. Last semester, for instance, the students participated in a large <a href="https://www.youtube.com/watch?v=pdm-HWo5Flk">poster session</a>, where they presented their work to all the aforementioned parties. I gathered some advice, common mistakes, and helpful suggestions in the <a href="http://web.mit.edu/zoya/www/posterRecommendations.pdf">linked-to set of slides</a>.<br />
<br />
<span style="font-size: x-small;">* Yes, yes, groundbreaking ideas can speak for themselves, but I guarantee that <i>most</i> ideas need someone speaking for them (at least to get them off the ground).</span><br />
<br />
Note that from one set of communication-related slides to another, from one talk to the next, the same kind of advice surfaces again and again. Most often, the views and suggestions presented are not idiosyncratic, but common, accepted, guidelines. <span style="color: orange;">We've all been in the audience: we know what catches our interest and what bores us to death (and it's often not the content to blame).</span><br />
<br />
Let me summarize (and paraphrase in my own words) some of Winston's talk advice:<br />
<br />
<ul>
<li><span style="color: orange;">start with an empowerment promise</span>: give your audience a sense that they will walk away with something (e.g. some newfound knowledge or ideas) from your talk, so they know what to look forward to and why they should care</li>
<li><span style="color: orange;">get your idea out quickly</span>, and cycle back: don't expect that all your audience members will follow along with you until the end, and do not leave the most important to last ("avoid the crescendo, just blurt it out"); come back to, and reinforce your points</li>
<li><span style="color: orange;">use verbal punctuation</span>: people fog out, so bring them back once in a while, especially to accentuate a switching of topics, slides, etc. (kind of like an <i>"ehem, you can wake up now, even if you've missed the last few minutes, I'm starting a new thread..."</i>); </li>
<li><span style="color: orange;">avoid near-misses:</span> foresee what the audience could be confused about and clarify your contributions</li>
<li><span style="color: orange;">what you end with is the last impression</span>: make it count, clarify your contributions, show your audience what they're walking away with; and remember: the final slide will be there forever, "don't squander this real estate" (<i>is your final slide the infamous and content-less "thank you"?</i>)</li>
<li>whether a poster or a presentation, what should come clearly through are <span style="color: orange;">your vision, steps, and contributions </span>(Winston even advocates naming the relevant sections/slides accordingly)</li>
</ul>
<div>
When approached once by a young researcher looking to get advice on his job-talk slides, Winston proclaimed: <span style="color: orange;">"too many slides and too many words"</span>. </div>
<div>
"How do you know?" the researcher asked.</div>
<div>
"It's almost universally true."</div>
<div>
<span style="font-size: x-small;">(Winston later added that allowing powerpoint to have less than 30-point font is probably Bill Gates' biggest fault. When text has to shrink that much, there is too much of it on the slides.)</span></div>
<div>
<br /></div>
<div>
This is the kind of advice that will come up again and again. People have the tendency to cram as much as possible into very small (spatial or temporal) frames. Researchers want to talk about <u>all</u> the great work they've done (not realizing that they're drowning out the most important parts). Students put all the details of their projects on their posters (not realizing that the contributions get lost). Here's my suggestion: do one pass of the content from which you want to pull slides/talking points, and extract the most important points. Sleep on it. Then pick the most important points out of your selection, and scrap the rest. Repeat. <span style="color: orange;">With enough cycles, you will have cleaned away the debris, exposing the shine of the main ideas.</span></div>
<div>
<br /></div>
<div>
What I like about Winston's communication advice is that he comes at it from the perspective of a scientist (he is, after all, a computer science professor at MIT). Sprinkled throughout his talk are technical references and examples. Most of all, he emphasizes <span style="color: orange;">the importance of projections</span> - the way an idea or a piece of work is communicated to an audience: the context, the stance, the voice, the presentation style, all of it.</div>
<div>
<br /></div>
<div>
Another individual with a great technical take on communication advice is <a href="http://www.principiae.be/X0101.php">Jean-luc Doumont</a> (got a physics PhD from Stanford). Jean-luc (<i>he prefers to be called by his first name</i>) consistently refers to the <span style="color: orange;">importance of increasing signal and eliminating noise</span> in a presentation, whether visual or oral. This concept is ever-present in his book: <a href="http://www.treesmapsandtheorems.com/">Trees, Maps, and Theorems</a> - which I highly recommend. </div>
<div>
<br /></div>
<div>
Note that "noise" can refer to many things at once. In the case of presentations, the noise is everything that is tangential to your main points - it is the 'ums' and 'likes' in your speech, the nervous pacing and awkward hand fidgeting, the excessive details on your slides (<i>do you really need your institute's logo on every slide?)</i>. In the technical writing, noise includes all the superfluous words (why say it in 10 words when you can say it in 3? why talk like a politician?).</div>
<div>
<br /></div>
<div>
With regards to maximizing signal, Jean-luc also talks about <span style="color: orange;">maximizing effective redundancy</span> - which is to say helping to carry the message across despite noisy channels (those you have no control over, like the audience's attention or knowledge; whereas noisy channels that you <i>do </i>have control over should be minimized). Redundancy can be verbal or nonverbal. It can be complementary. For instance, your slides could contain your main points, but you're also there to describe them. If someone misses it in your speech, they see it on the screen*. You can also get the important messages across again later, in the same or different words (<i>remember the cycling that Winston referred to?</i>).</div>
<div>
<br /></div>
<div>
<span style="font-size: x-small;">* This does </span><i style="font-size: small;">not </i><span style="font-size: x-small;">mean that what is on the screen should be what is said. The slides complement, not replace, the oral presentation. If people are spending all their cognitive resources reading your slides, they'll fail to process what you're saying, and that is where the communication breaks down.</span><span style="font-size: x-small;"> </span></div>
<div>
<br /></div>
<div>
Jean-luc's three laws to optimize communication are:</div>
<div>
<ul>
<li>first law: adapt to your audience</li>
<li>second law: maximize the signal to noise ratio</li>
<li>third law: use effective redundancy</li>
</ul>
<div>
(but remember: second law > third law)</div>
</div>
<div>
<br /></div>
<div>
When studying information visualizations (graphs, charts, plots, etc.), our <a href="http://vcg.seas.harvard.edu/files/pfister/files/infovis_submission251-camera.pdf">research team</a> also found that redundant encodings help: when the message was presented in a number of ways (as a trend line, as an annotation of the trend line, as a description of the plot, in the title, etc.), human observers were more likely to recall the message correctly (different people might need to see things presented in different ways). Conversely, too many extra details, unrelated visuals, or metaphors led to worse recall and confusion: observers might recall only a piece of the main message, or misremember it entirely. The take-away? Make getting the signal across your priority, and scrap the rest. You can do so quite effectively using the title. Importantly, if your title contains your message, more observers will remember and recall it. </div>
<div>
<br /></div>
<div>
Here's a little piece of advice that also tends to repeat: <span style="color: orange;">make your titles count</span> - be it the titles of talks, slides, section headings, or visualizations/graphs. Jean-luc places a lot of emphasis on this in <a href="http://www.treesmapsandtheorems.com/">Trees, Maps, and Theorems</a>. He gives great examples of how scientists often caption their plots something like "Y as a function of X", where it is already clear from the axes that what is plotted is, unsurprisingly, Y as a function of X. You haven't told the reader anything new or useful! Consider instead <span style="color: orange;">using this valuable real estate to convey the message of the plot</span>, such as "Y peaks when X is at its lowest value due to the effect of...". After hearing all of Jean-luc's examples of the way scientists title their slides, figures, etc., I got to thinking. It's true, they do! </div>
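<div>
<br /></div>
<div>
To make this concrete, here is a small matplotlib sketch (with made-up data) that plots the same curve twice; only the titles differ, but they change what the reader walks away with:</div>
<pre>import matplotlib.pyplot as plt

# The same (made-up) data plotted twice; only the titles differ.
x = [1, 2, 3, 4, 5]
y = [9, 6, 2, 5, 8]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
for ax in (ax1, ax2):
    ax.plot(x, y, marker="o")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")

ax1.set_title("Y as a function of X")     # tells the reader nothing new
ax2.set_title("Y dips sharply at X = 3")  # the title carries the message
fig.tight_layout()
plt.show()
</pre>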
<div>
<br /></div>
<div>
I have since tried to be extra careful about my captions, my titles, my paper section headings, even my e-mail subject lines (I guess the current generations get a lot of twitter practice). I try to limit the noise, to imbue as much of the written text with meaning as possible, to carry across the most important points. In fact, when writing <a href="http://web.mit.edu/zoya/www/docs/main.pdf">my master's thesis</a>, I wanted the essence of the whole thesis to come through the list of contents, figures, and tables. I wanted the reader to walk away with the outlines of the story without even getting to the introduction. </div>
<div>
<br /></div>
<div>
Importantly, if the message can come across simply and quickly, <i>that is not a bad thing</i>. If there's an easier way to say something, why not say it? Jean-luc had great anecdotes at his lecture on "<a href="https://www.youtube.com/watch?v=IFu3jaLmse0&feature=youtu.be">Communicating science to nonscientists</a>" about how unnecessarily jargon-filled scientific communication can be. Here are a few of my favorite anecdotes (again, paraphrased):</div>
<div>
<ul>
<li>After a room full of experts took turns describing their own research topics to each other, they were asked: how many of those descriptions did you understand? Less than half. How many do you still remember? Maybe a few. And this is a room of scientists! Moreover, they consider this normal. How many talks do you remember from your last conference? How many were engaging from start to finish? (maybe... 1?)</li>
<li>When researchers are asked to describe what it is they do, and when they get to any specialized vocabulary, they tend to say it faster and to lower their voice. It is like they are trying to spare us the pain of trying to understand them by saying it fast and low. But that is exactly the opposite of what we need in order to understand! </li>
<li>A student shows Jean-luc a passage he has written. Jean-luc looks confused and asks the student to explain what he meant to say in the passage. The student says: "Well what I mean to say is [blabla]... but I just don't know how to say it." Well, the [blabla] was exactly the explanation!</li>
</ul>
<div>
Jean-luc advises scientists "not to write complicated out of the principle of revenge" (for other scientists who write this way). <span style="color: orange;">Do not try to prove to the whole world how complicated your research is. </span>Define technical words, avoid jargon, avoid synonyms, write simply. Provide reference points, comparisons, and examples. <span style="color: orange;">Give the why before the what</span>.</div>
</div>
<br />
<br />
I'll leave you with my favorite Jean-luc quote from <a href="http://www.treesmapsandtheorems.com/">Trees, Maps, and Theorems</a>: <span style="color: orange;">"Effective communication is getting messages across. Thus it implies someone else: it is about an audience, and it suggests that we get this audience to understand something. To ensure that they understand it, we must first get them to pay attention. In turn, getting them to understand is usually nothing but a means to an end: we may want them to remember the material communicated, be convinced of it, or ultimately, act or at least be able to act on the basis of it."</span><br />
<br />
And getting messages across first and foremost requires caring about the importance of getting those messages across. It is about recognizing and believing that effective communication matters. It is about adjusting your habits, your jargon, the amount of content on your slides, your projection, your figure captions and titles, and most importantly your awareness of all these things. Happy communicating!<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com2tag:blogger.com,1999:blog-338911558031950356.post-24789821765269672132016-01-19T21:43:00.001-08:002016-01-19T21:43:40.701-08:00Self-driving cars, internet balloons, and why Google is radicalWent to a talk about <b>[x]</b> today (formerly: Google[x]).<br />
<br />
Unlike other Silicon Valley companies that are "making the world a better place" (according to Silicon Valley, the TV series), [x] is "making the world a <i>radically</i> better place" (according to today's presenter).<br />
<br />
If you followed Google I/O this year, have gone to some other Google talk about [x] and its moonshots, or have just glued yourself to the aforementioned TV series, in all three cases you would have seen the following slide:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghwio8i_bUZSmzVv8rv2TDX5MTV7bFtiGt6a-6ZuqetQLyLJlN8V73ENMWmgSLeNPa9Fa_WTG26LS1WkTGHxXCCx0nZ4E-C_GMZQn9WODJH5jqJv48bBVDf8BXcUGRL3OVwiAxu77T6Mmc/s1600/googlex.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghwio8i_bUZSmzVv8rv2TDX5MTV7bFtiGt6a-6ZuqetQLyLJlN8V73ENMWmgSLeNPa9Fa_WTG26LS1WkTGHxXCCx0nZ4E-C_GMZQn9WODJH5jqJv48bBVDf8BXcUGRL3OVwiAxu77T6Mmc/s320/googlex.jpg" width="320" /></a></div>
<br />
At MIT today, we heard about 2 projects from [x] (out of the <a href="http://www.eweek.com/cloud/slideshows/10-bold-google-x-projects-aiming-for-tech-breakthroughs.html">10 or so</a> that are currently public): <b>self-driving cars</b> and <b>internet balloons</b>.<br />
<br />
-----<br />
<br />
<i>A huge problem?</i> The 1.2 million annual traffic accidents, 93% of which are due to human error.<br />
<i>The radical solution:</i> <a href="http://www.makeuseof.com/tag/how-self-driving-cars-work-the-nuts-and-bolts-behind-googles-autonomous-car-program/">self-driving cars</a> and the required road infrastructure.<br />
<i>Breakthrough tech:</i> software with realtime sensor processing.<br />
<br />
The principle on which this whole autonomous car thing hinges is an initial full laser mapping of the urban area in which the car is to drive (roads, buildings, and all), with which the car's real-time sensor data is then aligned for accurate positioning and localization within the lane.<br />
<br />
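(To make the alignment idea concrete: below is a toy sketch of my own - emphatically <i>not</i> Google's actual pipeline - of translation-only scan matching, the simplest flavor of aligning a live laser scan to a prior map. The point sets and parameters are made up.)<br />
<pre>
import numpy as np

def align_scan(scan, prior_map, iterations=10):
    """Estimate the 2D translation that best aligns a live scan
    (N x 2 points) to a prior map (M x 2 points): a translation-only
    variant of iterative closest point (ICP)."""
    offset = np.zeros(2)
    for _ in range(iterations):
        shifted = scan + offset
        # Match each scan point to its nearest neighbor in the map.
        d = np.linalg.norm(shifted[:, None, :] - prior_map[None, :, :], axis=2)
        nearest = prior_map[np.argmin(d, axis=1)]
        # Nudge the scan by the mean residual toward the matched points.
        offset += (nearest - shifted).mean(axis=0)
    return offset  # the vehicle's estimated displacement within the map

# Toy usage: a "scan" that is just the map shifted by (0.5, -0.2).
prior_map = np.random.rand(200, 2) * 10
scan = prior_map - np.array([0.5, -0.2])
print(align_scan(scan, prior_map))  # approximately [0.5, -0.2]
</pre>
<br />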
<i>Is it feasible to have to pre-map every urban environment, and then to update it when it changes?</i> Ah, well, on the one hand, Google Streetview cars have already shown us something about feasibility; on the other, in the long term, the self-driving cars will continue to collect data as they drive, driving and mapping simultaneously.<br />
<br />
In fact, already these self-driving cars can send alerts to the other self-driving cars on the road when things look different than expected (<i>lane closures on this street, sending you an updated map...</i>). Such alerts also get sent to the home base for potential updates to the whole system. The cars are connected to each other and their home base via the internet. (<i>Is this information transfer secure? </i>Not to worry, Google knows a thing or two about internet security.)<br />
<br />
So, some basic rules are hard-coded, but there's also a lot of machine learning from the many hours spent by these cars on the roads in all sorts of situations. These cars have to learn about the distribution of pedestrian movements (how fast they walk, how quickly they can switch direction, etc.), and the typical behaviors of pedestrians and bicyclists, and of pedestrians in response to bicyclists. They plot out the trajectories of all vehicles currently on the road and anticipate their next move.<br />
<br />
<i>The big challenge? </i>Achieving ridiculous recall and precision. A recall of 99% when pedestrians are involved is not going to do it (<i>you just can't afford to lose a few pedestrians here and there while you tweak your algorithm</i>). Recall is very much about the safety of the pedestrians, but precision is also about the safety of the vehicles (and their passengers). If the car freaks out at every road sign blowing in the wind, not only will the ride be very uncomfortably jerky, but the car might swerve into other cars to avoid hitting the mistakenly classified "pedestrian".<br />
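(For anyone fuzzy on the two terms, here is a quick sketch of my own - not from the talk - of how they would be computed for a pedestrian detector.)<br />
<pre>
def precision_recall(true_positives, false_positives, false_negatives):
    """Recall: the fraction of actual pedestrians the detector found.
    Precision: the fraction of detections that really were pedestrians."""
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)
    return precision, recall

# 990 pedestrians found, 10 missed, 50 wind-blown signs flagged as people:
print(precision_recall(990, 50, 10))  # precision ~0.95, recall 0.99
</pre>
<br />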
<br />
There are other behaviors built in for the comfort and safety of the passenger: for instance, shifting in the lane (all while avoiding other cars) when passing large trucks. Even if you have everything under control, you don't want your passenger getting antsy about a truck that's too close for comfort.<br />
<br />
These cars also slow down at railroads, near parked cars, and while passing bicyclists. Their long-range and short-range sensors ensure the car is <i>very</i> much aware of its surroundings. So much so that the 15 cm resolution of their sensors allows the cars to recognize the fine hand gestures of the traffic controller waving in the middle of the intersection or the bicyclist signaling to change lanes. In making decisions, the cars also make use of all sorts of contextual information: are other cars moving? Why have they stopped? Are there traffic cones around?<br />
<br />
And all of this computation and communication is happening on a <i>single CPU</i>. How's that for efficient resource sharing? (<i>but watch out for GPUs coming to a car near you...</i>)<br />
<br />
These cars have been designed for zero driver assistance. Are you going to see any sort of control device built into them like a wheel or a brake pedal? No chance. This is Google's approach. No need so far: of the 13 driverless car incidents to date, all were the fault of the other drivers. <i>(Side thought: what if the sheer sight of this car on the road is distracting?)</i><br />
<i><br /></i>
But these cars sure go through a lot of situational testing. And yes, they're good in harsh weather conditions too (confirmed by hardware reliability tests and buckets of water). The QA testing position for the self-driving car project must be damn awesome.<br />
<br />
-----<br />
<br />
<i>Another huge problem? </i>2/3 of the world does not have internet.<br />
<i>The radical solution? </i>Balloons!<br />
<i>Breakthrough tech: </i>large-scale dynamic optimization of balloon network.<br />
<br />
We're talking global<i> </i>optimization (<i>literally, global</i>). Consider a network of balloons that are distributed around the world that need to follow pre-planned flight paths, adapt to changing wind conditions, and deal with intermittent (sometimes flaky) instructions - all while providing continuity of internet service. This is <a href="http://www.wired.com/2013/08/googlex-project-loon/">Project Loon</a>.<br />
<br />
Communication with these balloons as they pass over the oceans is through satellite phones. In these conditions, instructions can be dropped, intermittent, or conflicting, and the balloons must nevertheless make good decisions based on limited information and changing wind gusts.<br />
<br />
<i>So how does it all work?</i> These balloons fly at an altitude of 20 km - twice as high as airplanes and the weather, so at least a few fewer problems to deal with. They follow air currents at different altitudes, and steer with vertical motion to end up in an air current moving in the desired direction. An internal balloon pumps air in and out, and with essentially the power of a fan, can move the exterior balloon up and down. Additional power comes from solar cells, but in most cases the wind currents are sufficient to propel the balloons.<br />
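(The steering logic, reduced to a toy sketch of my own - the real controller is surely far more involved: given wind estimates at a few candidate altitudes, pick the altitude whose current pushes you most strongly toward where you want to go.)<br />
<pre>
import numpy as np

def best_altitude(winds_by_altitude, desired_direction):
    """Pick the altitude whose wind current pushes the balloon most
    strongly toward the desired direction (a unit 2D vector).
    winds_by_altitude: dict of altitude (km) to a 2D wind vector."""
    scores = {alt: np.dot(wind, desired_direction)
              for alt, wind in winds_by_altitude.items()}
    return max(scores, key=scores.get)

# Toy usage: we want to head east; the 20 km current is the best match.
winds = {18.0: np.array([-3.0, 1.0]),   # blowing roughly west
         19.0: np.array([0.5, 4.0]),    # blowing north
         20.0: np.array([6.0, -0.5])}   # blowing east
print(best_altitude(winds, np.array([1.0, 0.0])))  # 20.0
</pre>
<br />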
<br />
A network of balloons thus moves through air currents, one displacing another, to provide continuous, seamless internet service to the cities below. It's kind of like how when you're moving, your service has to remain continuous despite shifting cell towers; but in this case, the city below is stationary, and it is the internet source that is moving above. This is the local optimization part.<br />
<br />
Sometimes, balloons also need to be dispatched to the location of a natural disaster, and this has to happen <i>fast</i>. Balloons also need to function in all kinds of harsh conditions, and with local repair most often unavailable, redundancy is key. <i>Redundancy, redundancy, redundancy.</i> Remember how the self-driving cars had 1 CPU? Well these babies have upwards of 40. And if something does go down, you have to go fetch it... wherever it ends up (<i>can you climb trees and mountains?</i>). Another damn awesome job.<br />
<br />
These projects, and all the rest in the [x] repository are driven in part by the <a href="http://money.cnn.com/2015/03/18/technology/google-x-astro-teller-sxsw/">slogan</a>: <i>"we need to fail faster". </i>Innovation comes from trying radically new things, and radically new things can often lead to failure. Failing faster means trying again sooner.<br />
<br />
I gotta hand it to you, Google sells it well. Another take-away? It seems Google likes the word <i>radical</i>.<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-3804434384009935732016-01-17T19:34:00.000-08:002016-02-03T09:05:55.045-08:00Pulling together the efforts of neuroscientists and computer scientists<br />
Had a great time last Friday at the intersection of Neuroscience and Computer Science: http://cbmm.mit.edu/science-engineering-vassar<br />
Heard from 6 MIT Superstars: <a href="https://billf.mit.edu/">Bill Freeman</a>, <a href="http://web.mit.edu/cocosci/josh.html">Joshua Tenenbaum</a>, <a href="http://syntheticneurobiology.org/people/display/71/11">Ed Boyden</a>, <a href="http://nancysbraintalks.mit.edu/">Nancy Kanwisher</a>, <a href="http://zlab.mit.edu/index.html">Feng Zhang</a>, and <a href="https://mcgovern.mit.edu/principal-investigators/tomaso-poggio">Tomaso Poggio</a>. Learned about how neuronal cells can be stretched for physical magnification and imaging, how CRISPR can bring us neurological therapeutics, and the specialized brain areas that we may have for visual words and music. But those are just the topics I will not talk about.<br />
Intrigued to learn more anyway?<br />
Here are some related articles:<br />
<a href="http://news.mit.edu/2014/crispr-technique-determines-gene-function-1210">http://news.mit.edu/2014/crispr-technique-determines-gene-function-1210</a><br />
<a href="http://news.mit.edu/2015/faculty-profile-edward-boyden-0522">http://news.mit.edu/2015/faculty-profile-edward-boyden-0522</a><br />
<a href="http://news.mit.edu/2015/neural-population-music-brain-1216">http://news.mit.edu/2015/neural-population-music-brain-1216</a><br />
<br />
In this post, I will focus instead on the common strands passing through the works of Bill Freeman, Joshua Tenenbaum, Josh McDermott, and Nancy Kanwisher - both to highlight the great interdisciplinary collaborations happening at MIT, and to give a broader sense of how neuroscience and computer science are informing each other, and leading to cool new insights and innovations.<br />
<br />
Bill Freeman presented the work spearheaded by his graduate student <a href="http://andrewowens.org/">Andrew Owens</a>: <a href="http://arxiv.org/abs/1512.08512">"Visually indicated sounds"</a>. Teaming up with Josh McDermott, who studies <a href="http://web.mit.edu/jhm/www/">computational audition</a> at the MIT Department of Brain and Cognitive Sciences, they linked sound to material properties and vice versa. Given a silent video as input (of a wooden stick hitting or scratching some surface), the team developed an algorithm that synthesizes realistic sound to go along with it. To do so they needed to convert videos of different scenes (with a mixture of materials) into some perceptually-meaningful space, and link them to sounds that were also represented in some perceptually-meaningful way. What does "perceptually-meaningful" refer to? The goal is to transform the complex mess that is colored pixels and audio waveforms into some stable representations that allow similar materials to be matched together and associated with the same material properties. For instance, pictures (and videos) of different foliage will look very different from each other (the shape and the color may have almost no pixel-overlap) and yet, somehow, the algorithm needs to discover the underlying material similarity.<br />
<br />
Here is one place where CNNs (convolutional neural nets) have been successful at transforming a set of pixels into some semantic representation (enough to perform scene recognition, object detection, and the other high-level computer vision tasks that the academic and industry communities have recently flooded the media outlets with). CNNs can learn almost human-like associations between images and semantics (like labels) or between images and other images. Owens and colleagues used CNNs to represent their silent video frames.<br />
<br />
On the sound side of things, waveforms were converted into "cochleagrams" - stable representations of sound that allow waveforms coming from similar sources (e.g. materials, objects) to be associated with each other even if individual timestamps of the waveforms have almost no overlap. Now to go from silent video frames to synthesized sounds, RNNs (recurrent neural nets) were used (RNNs are great for representing and learning sequences, by keeping around information from previous timesteps to make predictions for successive timesteps). The cochleagrams predicted by the RNNs could then be transformed back into sound, the final output of the algorithm. More details in <a href="http://arxiv.org/abs/1512.08512">their paper</a>.<br />
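(For the programmers: the overall recipe looks roughly like the PyTorch sketch below. The layer sizes, pooling, and band count are my own placeholders, not the architecture from the paper - see the paper itself for the real thing.)<br />
<pre>
import torch
import torch.nn as nn

class FramesToCochleagram(nn.Module):
    """Sketch of the general recipe: a CNN turns each video frame into a
    feature vector; an RNN maps the feature sequence to a predicted
    cochleagram (one vector of frequency-band energies per timestep)."""
    def __init__(self, feature_dim=128, hidden_dim=256, n_bands=42):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim))
        self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_bands)

    def forward(self, frames):          # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        hidden, _ = self.rnn(feats)     # carries context across timesteps
        return self.head(hidden)        # (batch, time, n_bands)

model = FramesToCochleagram()
video = torch.randn(2, 15, 3, 64, 64)   # 2 clips of 15 silent frames
print(model(video).shape)               # torch.Size([2, 15, 42])
</pre>
<br />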
<br />
This work is a great example of the creative new problems that computer vision researchers are tackling. With the powerful representational architectures that deep neural networks provide, higher and higher-level tasks can be achieved - tasks that we would typically associate with human imagination and creativity (e.g. inferring what sound is emitted by some object, what lies beyond the video frame, what is likely to happen next, etc.). In turn, these powerful architectures are interesting from a cognitive science perspective as well: how are the artificial neural networks representing different features? images? inputs? What kinds of associations, correlations, and relationships do they learn from unstructured visual data? Do they learn to meaningfully associate semantically-related concepts? Cognitive scientists can give computer scientists some ideas about which representations may be reasonable for different tasks, given what is known from decades of experiments on the human brain. But the other side of the story is that computer scientists can prod these artificial networks to learn about the representational choices that the networks have converged on, and then cognitive scientists can design experiments to check if the networks in the human brain do the same. This allows the exploration of a wide space of hypotheses at low cost (no poking human brains required), to narrow down the focus of cognitive scientists in asking whether the human brain has converged on similar representations (or if not, how can it be more optimal?)<br />
<br />
Nancy Kanwisher mentioned how advances in deep neural networks are helping to understand functional representation in the brain. Kanwisher has done pioneering work on functional specialization in the brain (which brain areas are responsible for which of our capabilities) - including discovering the fusiform face area (FFA). In her talk, she discussed how the "Principle of Modular Design" (Marr, 1982) just makes sense - it is more efficient. She mentioned some examples of work from MIT showing there are <a href="http://nancysbraintalks.mit.edu/video/neuroanatomy-lesson-directors-cut">specialized areas</a> for faces, language, visual words, even theory of mind. By giving human participants different tasks to do and scanning their brain (using fMRI), neuroscientists can test hypotheses about the function of different brain regions (they check whether the brain signal in those regions changes significantly as they give participants different tasks to do). Some <a href="http://www.cell.com/current-biology/abstract/S0960-9822(12)01074-3?_returnURL=http%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0960982212010743%3Fshowall%3Dtrue">experiments</a>, for instance, have demonstrated that certain language-specific areas of the brain are not involved during logic tasks, arithmetic, or music (tasks that are sometimes hypothesized to depend on language). <a href="http://web.mit.edu/zsaygin/www/publications.html">Experiments</a> have shown that the brain's specialization is not all natural selection, and that specialized brain areas can develop as a child learns. Other <a href="http://web.mit.edu/bcs/nklab/media/SVNH_NGK_JMD_2015.pdf">experiments</a> (with Josh McDermott) have shown that uniquely-human brain regions exist, like ones selective to music and human speech (but not other sounds). Other experiments probe causality: what happens if specific brain regions are stimulated or dysfunctional? How are the respective functions affected or impaired? <a href="http://nancysbraintalks.mit.edu/video/what-happens-when-you-stimulate-face-area">Interestingly</a>, stimulating the FFA using electrodes can cause people's representations of faces to change. <a href="https://www.ted.com/talks/rebecca_saxe_how_brains_make_moral_judgments?language=en">Correspondingly</a>, stimulating other areas of the brain using TMS can cause moral judgements to shift.<br />
<br />
Kanwisher is now working with Josh Tenenbaum to look for areas of the brain that might be responsible for intuitive physical inference. Initial findings are showing that the regions activated during intuitive physics reasoning are the same ones responsible for action planning and motor control. Knowing how various functional areas are laid out in the brain, how they communicate with each other, and which resources they pool together, can help provide insights for new artificial neural architectures. <a href="http://web.mit.edu/yamins/www/publications.html">Conversely</a>, artificial neural architectures can help us support or cast doubt on neuroscience hypotheses by replicating human performance on tasks using different architectures (not just the ones hypothesized).<br />
<br />
Josh Tenenbaum is working on artificial architectures that can make the same inferences humans make, but also make the same mistakes (for instance, <a href="http://www.theinquirer.net/inquirer/feature/2434242/facebook-s-ai-tech-mimics-how-humans-learn">Facebook's AI</a> that reasons about the stability of towers of blocks makes different incorrect predictions than humans). The best CNNs today are great at the tasks for which they are trained, sometimes even outperforming humans, but often also making very different mistakes. Why is it not enough to just get right what humans get right, without also having to get wrong what they get wrong? The mistakes humans make are often indicative of the types of broad inferences they are capable of, and uncover the generalizing power of the human mind. This is why one-shot learning is possible: humans can learn whole new concepts from a single example (and Tenenbaum has many <a href="http://videolectures.net/nips2010_tenenbaum_hgm/">demos</a> to prove it). This is why we can explain, imagine, problem solve, and plan. Tenenbaum says: "intelligence is not just pattern recognition. It is about modeling the world", and by this he means "analysis by synthesis".<br />
<br />
Tenenbaum wants to re-engineer "the game engine in your head". His group is working on probabilistic programs that can permit causal inference. For <a href="http://news.mit.edu/2015/better-probabilistic-programming-0413">example</a>, their algorithm can successfully recognize the parameters of a face (shape and texture; the layout and type of facial features) as well as the lighting and viewing angle used for the picture. Their algorithm does this by sampling from a generative model that iteratively creates and refines new faces, and then matching the result to the target face. Once a match is found, the parameters chosen for the synthesized face can be considered a good approximation for the parameters of the target face. Interestingly, this model, given similar tasks as humans (e.g. to determine if two faces are the same or different) takes similar amounts of time (corresponding to task difficulty) and makes similar mistakes. This is a good hint that the human brain might be engaged in a similar simulation/synthesis process during recognition tasks.<br />
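(In miniature, and with a made-up toy "renderer", the synthesize-and-compare loop looks something like this sketch of mine - the actual system uses probabilistic programs, not the naive hill climbing shown here.)<br />
<pre>
import numpy as np

def infer_parameters(target_image, render, n_steps=5000, step_size=0.1):
    """Analysis by synthesis, in miniature: propose parameter tweaks,
    render the candidate, and keep the tweak if the rendering matches
    the target better. `render` maps a parameter vector to an image."""
    params = np.zeros(4)               # e.g. shape, texture, light, pose
    best_err = np.sum((render(params) - target_image) ** 2)
    for _ in range(n_steps):
        proposal = params + step_size * np.random.randn(4)
        err = np.sum((render(proposal) - target_image) ** 2)
        if best_err > err:             # a better match: keep the proposal
            params, best_err = proposal, err
    return params   # a good approximation of the target's parameters

# Toy "renderer": a fixed linear projection of parameters onto pixels.
A = np.random.randn(16, 4)
true_params = np.array([0.8, -0.3, 0.5, 0.1])
print(infer_parameters(A @ true_params, lambda p: A @ p))  # near true_params
</pre>
<br />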
<br />
Tenenbaum and colleagues have made great strides in showing how "analysis by synthesis" can be used to solve and achieve state-of-the-art performance on difficult tasks like face recognition, pose estimation, and character identification (<a href="http://news.mit.edu/2015/computer-system-passes-visual-turing-test-1210">even passing the visual Turing test</a>). As is the case for much of current neural network research, the original inspiration comes from the 80s and 90s. In particular, Hinton's <a href="https://en.wikipedia.org/wiki/Helmholtz_machine">Helmholtz Machine</a> had a wake-sleep cycle where recognition tasks were interspersed with a type of self-reinforcement (during "sleep") that helped the model learn on its own, even when not given new input. This approach helps the model gain representational power, and might give some clues as well about human intelligence (what do we do when we sleep?).<br />
<br />
How does the human mind make the inferences it does? How does it jump to its conclusions? How does it transfer knowledge gained on one task and apply it to a novel one? How does it learn abstract concepts? How does it learn from a single example? How does the human mind represent the world around it, and what physical structures are in place in order to accomplish this? How is the brain wired? These questions are driving all of the research described here and will continue to pull together the efforts of neuroscientists and computer scientists in the coming years more than ever before. Our new and ever-developing tools for constructing artificial systems and probing into natural ones are establishing more and more points of contact between fields. Symposia such as these can give one a small hint of what the tip of the iceberg might look like.Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-11041407243858896902015-12-30T13:54:00.002-08:002015-12-30T14:05:02.483-08:00"Computer Behind Pixar": Teaching Computational Thinking for 3D Modeling and Representation<span style="color: #38761d;"><b>How do you teach a group of middle- or high-schoolers about computer graphics without setting them down in a computer lab or showing them code?</b></span> How do you teach them about 3D geometry without writing down a single mathematical formula on the board? And how, without doing all these things, can you nevertheless equip them with the vocabulary and intuition to be able to discuss and understand concepts in computer graphics, geometry, and representation?<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGyTRaLb7qL3PNlO3gv33886nIa5LCBrMYRe5GYXwwYfTEmUTf92KJiDSSNb31sX8t2N2vSN0QzvcxFJwovIKJE4nWK1B_C53yVnTRm5jODhNSgMfo6F-csFQPhmrHDGdOLTxO8HM6OVJl/s1600/770801-MLB20397196410_082015-O.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGyTRaLb7qL3PNlO3gv33886nIa5LCBrMYRe5GYXwwYfTEmUTf92KJiDSSNb31sX8t2N2vSN0QzvcxFJwovIKJE4nWK1B_C53yVnTRm5jODhNSgMfo6F-csFQPhmrHDGdOLTxO8HM6OVJl/s320/770801-MLB20397196410_082015-O.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: x-small;">Pinart toy: what better way to explain height fields?</span></td></tr>
</tbody></table>
<br />
<div style="font-style: normal; margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">That was our
goal, and our chosen plan of attack was to flood the senses: let our
students touch and explore physical models, work through group
activities, watch video clips, participate in class discussions, and
see demos. <b><span style="color: #38761d;">We filled our classroom with 3D printed models of various
materials, faceted animal shapes, wooden mannequins, pin art boards,
crayons and fuzzy pipe cleaners.</span></b> </span></div>
<div style="font-style: normal; margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">What do all these objects have in
common?</span></div>
<br />
<div style="margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">These physical
models serve as examples and applications of different
representational choices, including voxel grids, meshes, and height
fields. Having <span style="color: #bf9000;"><b>physical examples</b></span> to point to and explore can launch a discussion of different representational (3D modeling) choices.</span><br />
<span style="font-family: Times;"><br /></span></div>
<div style="font-style: normal; margin-bottom: 0in; orphans: 1;">
<h3>
<span style="color: black;"><span style="font-family: Times;"><b>Splash
2015 @ MIT</b></span></span></h3>
</div>
<div style="margin-bottom: 0in; orphans: 1;">
<span style="color: black;"><span style="font-family: Times;"><span style="font-style: normal;"><span style="font-weight: normal;">On
November 22, 2015, <a href="http://people.csail.mit.edu/hishin/">Hijung Valentina Shin</a>, <a href="http://people.csail.mit.edu/aschulz/">Adriana Schulz</a>, and I taught
a 2-hour high-school class as part of </span></span></span></span><a href="https://esp.mit.edu/learn/Splash/index.html">MIT's
yearly Splash! program</a><span style="color: black;"><span style="font-family: Times;"><span style="font-style: normal;"><span style="font-weight: normal;"> -
a Fall weekend during which thousands of high-schoolers flood hundreds
of MIT's classrooms to be taught anything and everything. </span></span></span></span></div>
<div style="margin-bottom: 0in; orphans: 1;">
<br /></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="color: black;"><span style="font-family: Times;">In our Splash!
classroom, we sought to ask and answer the following questions: How
is an animated character created? How can we represent different
types of 3D structures? What kind of modeling decisions are made for
a special effects film? What techniques do anthropological
reconstruction, 3D printing, and game design have in common?</span></span></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="color: black;"><span style="font-family: Times;"><br /></span></span></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">Importantly, we
believed that these questions could be answered <span style="color: #bf9000;"><b>on an intuitive
level</b></span>, with no mathematical prerequisites. What better way to
motivate the study of the mathematical and computational sciences
than to give students a faint whiff of the awesome things they would
be able to accomplish and think about in greater depth if armed with
the right tools? </span><br />
<span style="font-family: Times;"><br /></span>
<span style="color: #38761d; font-family: Times;"><b>Computational thinking to the rescue!</b></span></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<br /></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="color: black;"><span style="font-family: Times;">Here I will
briefly outline the structure of our 2-hour class and the decisions made along the way, to provide possible inspiration for
similar classroom activities and lessons. For the benefit of others, we have made all our slides <a href="http://web.mit.edu/zoya/www/representation_in_graphics.pdf">available online</a>.</span></span><br />
<span style="color: black;"><span style="font-family: Times;"><br /></span></span></div>
<div style="font-style: normal; margin-bottom: 0in; orphans: 1;">
<h3>
<span style="color: black;"><span style="font-family: Times;"><b>Coding
without coding</b></span></span></h3>
</div>
<div style="font-style: normal; margin-bottom: 0in; orphans: 1;">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV9q8InPZbhbThpZXTB71xEVUITwILgDe9Jzn9a-pF19XFcv4SkOYWLu6lQHz_iKzAfiwTHNUo4VNcAoI1ZDGmz07MWli6lWbQIEarXbBQ7nj3V02k5CC97paxhBG6dLeg9-CYahMet0Wz/s1600/Screen+Shot+2015-12-29+at+12.50.27+PM.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV9q8InPZbhbThpZXTB71xEVUITwILgDe9Jzn9a-pF19XFcv4SkOYWLu6lQHz_iKzAfiwTHNUo4VNcAoI1ZDGmz07MWli6lWbQIEarXbBQ7nj3V02k5CC97paxhBG6dLeg9-CYahMet0Wz/s320/Screen+Shot+2015-12-29+at+12.50.27+PM.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Target shape that one student described to the other using only<br />
a set of provided primitives: colored squares, line segments, or<br />
polygonal shapes.</td></tr>
</tbody></table>
Our <span style="color: #bf9000;"><b>ice-breaker
activity</b></span> first introduced the concepts of <b><span style="color: #38761d;">representational primitives</span></b>
and algorithmic decisions. Students split up into pairs, armed with grids and sketching utensils (colored crayons or pencils). One student
was given a target shape, a set of primitives, and instructions.
The goal was to supply one's partner with a sufficient and clear
recipe to reproduce the target shape as accurately as possible. Some
students could only specify one grid cell at a time with coordinates and a target color. Another set of instructions armed
students with a ruler and the ability to specify starting and ending
coordinates of line segments. A third group of students had <a href="http://www.amazon.com/gp/product/B0044A41FC?psc=1&redirect=true&ref_=oh_aui_detailpage_o00_s00">polygonal shape rulers</a> – e.g. triangles, squares, circles. Students could tell their partners to center a shape at specific coordinates.<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyvqB9-YACQ7vt6DUvGPYnl_IUCXcjYo0RSm70SsLBMyBcwB7GvLS8Qu8NWvnwevabbIQVufTYQLXx50lk2D33dywjYlORN_72krqT6LvkyG8ismeEf33kGz2YaGBhyphenhyphenKoV0fF6RmfxUEk0/s1600/Screen+Shot+2015-12-29+at+12.53.33+PM.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyvqB9-YACQ7vt6DUvGPYnl_IUCXcjYo0RSm70SsLBMyBcwB7GvLS8Qu8NWvnwevabbIQVufTYQLXx50lk2D33dywjYlORN_72krqT6LvkyG8ismeEf33kGz2YaGBhyphenhyphenKoV0fF6RmfxUEk0/s200/Screen+Shot+2015-12-29+at+12.53.33+PM.png" width="121" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Polygonal primitives<br />
(<a href="http://www.amazon.com/gp/product/B0044A41FC?psc=1&redirect=true&ref_=oh_aui_detailpage_o00_s00">ordered on Amazon</a>)</td></tr>
</tbody></table>
<br />
Overall, we gave different student pairs different primitives:<br />
<ul>
<li>pixels (colored squares)</li>
<li>line segments</li>
<li>polygonal shapes</li>
</ul>
</div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
</div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="color: black;"><span style="font-family: Times;">We gave all
students the same amount of time to complete this activity in pairs
(15 minutes), after which students showed off their creations to
their partners and other students in the class. These creations
were hung around the classroom, to the amusement of the students. </span></span>
</div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<br /></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">This gave us a
great launching pad for <span style="color: #bf9000;"><b>discussion</b></span> about the trade-offs between
representational accuracy and algorithmic efficiency. We asked
students: What did you find easy and hard? Were there parts of the
shape that were well represented by your primitives? Could everything
be represented by the primitives? What took you the longest? How many
individual primitives did you end up using?</span></div>
<div style="font-style: normal; font-weight: normal; margin-bottom: 0in; orphans: 1;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div style="margin-bottom: 0in; orphans: 1;">
<span style="font-family: Times;">This kind of
activity (or variants of it) is a good <span style="color: #bf9000;"><b>intro to programming activity</b></span>, as students have to think about formalizing clear step-by-step instructions for their partner to carry out. The full instructions and templates for our activity are <a href="http://web.mit.edu/zoya/www/representation_in_graphics.pdf" style="color: black;">included here</a>.</span><br />
<span style="font-family: Times;"><br /></span>
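(The parallel to programming can be made literal. Below is a small sketch of my own - with an invented instruction format, not something we used in class - of a program that "executes" a pixel-primitive recipe the way a partner in the activity would.)<br />
<pre>
def execute_recipe(instructions, width=10, height=10):
    """Interpret a recipe of pixel primitives, as a partner in the
    ice-breaker would: each instruction names a grid cell and a color.
    (The instruction format here is invented for illustration.)"""
    grid = [['.' for _ in range(width)] for _ in range(height)]
    for x, y, color in instructions:
        grid[y][x] = color[0].upper()   # first letter stands in for the color
    return '\n'.join(''.join(row) for row in grid)

# "Color cell (4, 2) green; color cell (5, 2) green; ..."
recipe = [(4, 2, 'green'), (5, 2, 'green'), (4, 3, 'blue'), (5, 3, 'blue')]
print(execute_recipe(recipe))
</pre>
<br />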
<br />
<h3>
Computer behind Pixar</h3>
<span style="text-align: center;">Inspired by the recent hype around Pixar* and particularly <a href="http://www.mos.org/">Boston Museum of Science</a>'s temporary Pixar exhibit, we called our class "Computer behind Pixar". </span><span style="text-align: center;">The common goal of the exhibit and other educational media about Pixar is to hook in the general public with the beloved animations and characters for the purpose of introducing and motivating the underlying mathematical and scientific concepts. </span><span style="text-align: center;">In fact, Mike from Monsters Inc. served as a repeating element throughout our activities, though we branched beyond Pixar, and beyond animation more generally. </span><br />
<span style="text-align: center;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2VjxYGHP11aH_QyCXQFXoHKldkL2SuQF4NFxCQ5rIvt2efIZja-jaX9mVamXOTUwTdChHtTf_gzN-88G3NE9njbau4tbwalTEq2YnPFSGg2EZQl4MRhYD_tdOuBTb5o3OqVP42aVe3B5b/s1600/Screen+Shot+2015-12-29+at+2.28.40+PM.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2VjxYGHP11aH_QyCXQFXoHKldkL2SuQF4NFxCQ5rIvt2efIZja-jaX9mVamXOTUwTdChHtTf_gzN-88G3NE9njbau4tbwalTEq2YnPFSGg2EZQl4MRhYD_tdOuBTb5o3OqVP42aVe3B5b/s320/Screen+Shot+2015-12-29+at+2.28.40+PM.png" width="320" /></a></div>
<span style="text-align: center;"><br /></span>
<span style="text-align: center;"><br /></span>
<br />
<div style="text-align: left;">
<span style="text-align: center;"><span style="font-size: x-small;">* Reference links on the topic of math behind Pixar:</span></span></div>
<div style="text-align: left;">
<span style="text-align: center;"><a href="https://www.khanacademy.org/partner-content/pixar"><span style="font-size: x-small;">Khan Academy's Pixar in a Box</span></a></span></div>
<div style="text-align: left;">
<a href="http://ed.ted.com/lessons/pixar-the-math-behind-the-movies-tony-derose"><span style="font-size: x-small;">Ted talk about math behind Pixar</span></a></div>
<div style="text-align: left;">
<a href="https://www.youtube.com/watch?v=3Iu1Z0h1i1Y"><span style="font-size: x-small;">Intro to Boston Museum of Science's Pixar exhibit</span></a></div>
<div style="text-align: left;">
<span style="font-size: x-small;"><a href="http://www.scientificcomputing.com/news/2015/07/building-computational-thinkers-exhibit-offers-unique-look-science-behind-pixar">Article about the science behind Pixar</a></span></div>
<div style="text-align: left;">
<a href="http://www.businessinsider.my/inside-out-review-2015-5/#dSgQ1fYR4uJX0ZmJ.97"><span style="font-size: x-small;">Business Insider article about "Inside Out"</span></a></div>
<div style="text-align: left;">
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
</div>
<div class="separator" style="clear: both; text-align: left;">
We described and <a href="https://youtu.be/3Iu1Z0h1i1Y?t=47">showed a video</a> about the rendering pipeline*, and drew attention to the importance of modeling at the core of this pipeline, as the initial step that all future steps crucially depend on. We defined <span style="color: #38761d;"><b>modeling as a mathematical representation composed of primitives</b></span>. </div>
<div class="separator" style="clear: both; text-align: left;">
The rest of our discussion centered around different representational choices and their properties.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFfSG5qibE5_To9csiJQsUxIZwoSmFz-hJcPIwDFOWFlhG-o6mQkvNvmUORVLORv7-1_UiIk3MSHrzOd6pFurk4Szu7ktMsTmZmPMp8Stn6EQBH-idA8PQryMz5TG81HeiAU9Z5b_230PJ/s1600/Screen+Shot+2015-12-30+at+11.20.19+AM.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFfSG5qibE5_To9csiJQsUxIZwoSmFz-hJcPIwDFOWFlhG-o6mQkvNvmUORVLORv7-1_UiIk3MSHrzOd6pFurk4Szu7ktMsTmZmPMp8Stn6EQBH-idA8PQryMz5TG81HeiAU9Z5b_230PJ/s320/Screen+Shot+2015-12-30+at+11.20.19+AM.png" width="320" /></a></div>
<span style="font-size: x-small;"><br /></span>
<span style="font-size: x-small;"><br /></span>
<span style="font-size: x-small;"><br /></span>
<span style="font-size: x-small;">* More rendering resources:</span><br />
<span style="font-size: x-small;"><a href="https://www.youtube.com/watch?v=3Iu1Z0h1i1Y&feature=youtu.be&t=47">Video about rendering in Pixar</a></span><br />
<span style="font-size: x-small;"><a href="https://www.fxguide.com/featured/inside-out-rendering/">Article about rendering in "Inside Out"</a></span><br />
<span style="font-size: x-small;"><a href="http://www.treddi.com/app/en/articoli/pagina/924-the-making-of-the-dark-knight">Character rendering (dark knight)</a></span><br />
<span style="font-size: x-small;"><a href="https://ev111426.wordpress.com/2014/09/26/5/">Rendering pipeline summary</a></span><br />
<div>
<br /></div>
<h3>
Tangible examples of 3D representations</h3>
<div>
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwQYzkFrzDv2jzj1_E8_U9tfwnRW8vzQUdNFOGJRBfXhOs-UT_ZDEha0CvONxxbVP6t_nlWcJr7U800nxel9mU4MbEHft8oxhVzhjFARenv1w4m0Q7z9209GVbB4QVk_OEzKEDI1LXa-3N/s1600/Screen+Shot+2015-12-30+at+11.42.31+AM.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="151" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwQYzkFrzDv2jzj1_E8_U9tfwnRW8vzQUdNFOGJRBfXhOs-UT_ZDEha0CvONxxbVP6t_nlWcJr7U800nxel9mU4MbEHft8oxhVzhjFARenv1w4m0Q7z9209GVbB4QVk_OEzKEDI1LXa-3N/s200/Screen+Shot+2015-12-30+at+11.42.31+AM.png" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">3D printed models are a tangible<br />
demonstration of discretization and<br />
the resolution issue.</td></tr>
</tbody></table>
<h4>
Voxel grids</h4>
<div>
We introduced the concept of <span style="color: #38761d;"><b>discretization</b></span>, necessary for the representation of shapes in digital computers: 2D shapes as <span style="color: #38761d;"><b>pixels</b></span> and 3D shapes as <span style="color: #38761d;"><b>voxels</b></span>. We reminded students of the ice-breaker activity where grid cells were used as primitives. </div>
<div>
We then discussed <span style="color: #38761d;"><b>voxel grids</b></span> as one form of representation for 3D objects, commonly used for 3D printing. We talked about the <span style="color: #38761d;"><b>resolution issue</b></span>: the trade-off between accuracy and efficiency. We passed around <span style="color: #bf9000;"><b>physical 3D printed models</b></span> at various resolutions, similar to the models pictured on the right.</div>
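<div>
(Discretization and the resolution trade-off are easy to see in code, too. A small sketch of mine, not part of the class materials: voxelize a sphere at a few resolutions and watch the storage cost grow as the cube of the resolution.)<br />
<pre>
import numpy as np

def voxelize_sphere(resolution):
    """Discretize a unit sphere onto a resolution^3 voxel grid:
    a voxel is filled if its center lies inside the sphere."""
    axis = np.linspace(-1, 1, resolution)
    x, y, z = np.meshgrid(axis, axis, axis)
    return (1.0 >= x**2 + y**2 + z**2)

for res in (8, 32, 128):
    filled = int(voxelize_sphere(res).sum())
    # Finer grids approximate the sphere better but store res^3 voxels.
    print(res, res**3, filled)
</pre>
</div>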
<div>
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVUQ0DKQf5zCMCph0ANjwowVAwHR6te1oC0A4oDivSPIZ7UfIf5rvTcxW4-RPKBenlMlU5fx1j0rb_mh8JfqLendLadT_VsPY6AVWPQehQlDAFPa1z4mHvbOAEMwqAh7DoAYmxZcQ-VZLN/s1600/IMG_20151217_131447464.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="137" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVUQ0DKQf5zCMCph0ANjwowVAwHR6te1oC0A4oDivSPIZ7UfIf5rvTcxW4-RPKBenlMlU5fx1j0rb_mh8JfqLendLadT_VsPY6AVWPQehQlDAFPa1z4mHvbOAEMwqAh7DoAYmxZcQ-VZLN/s200/IMG_20151217_131447464.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Physical models to demonstrate the<br />
differences between volumetric and<br />
boundary representations. One is much<br />
lighter! Why? It requires less material<br />
to represent (and store).</td></tr>
</tbody></table>
<h4>
Triangular meshes</h4>
<div>
In talking about efficiency, we introduced the notion of <span style="color: #38761d;"><b>boundary representations</b></span>, specifically <b><span style="color: #38761d;">meshes</span></b>, for representing 3D objects without having to represent and explicitly store all the internal voxels (the volume). </div>
<div>
We connected the boundary representation to the ice-breaker activity, where in 2D, line segments were used to represent the target shape's boundary. We then showed students a <b style="color: #bf9000;">demo of MeshLab</b>, and passed around physical examples of volumetric and boundary representations.<br />
<br /></div>
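<div>
(A mesh really is just two lists - vertices, and triangles indexing into them. A minimal sketch of mine, using the simplest closed mesh there is:)<br />
<pre>
import numpy as np

# A boundary representation: vertices plus triangles that index into them.
# Here, a tetrahedron - the simplest closed triangle mesh.
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

def surface_area(vertices, triangles):
    """Sum the areas of the boundary triangles (half the magnitude of a
    cross product per triangle); no interior volume is ever stored."""
    total = 0.0
    for a, b, c in triangles:
        total += 0.5 * np.linalg.norm(
            np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a]))
    return total

print(surface_area(vertices, triangles))  # about 2.37 for this tetrahedron
</pre>
</div>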
<h4>
CSG</h4>
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgic25l1EJLptDaXCfw2SPNla9G4Ma5vJVTRebw-AL576Yv0nXEu9mYv074jgKT2ZV7thoqJgVvrGeVoZDrBBZQRQ-XTZBTtKc05aRh8RJuKgH1eM-unAa1d_wxrEb8mBN82LKaESaRHmJK/s1600/Screen+Shot+2015-12-30+at+12.23.19+PM.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgic25l1EJLptDaXCfw2SPNla9G4Ma5vJVTRebw-AL576Yv0nXEu9mYv074jgKT2ZV7thoqJgVvrGeVoZDrBBZQRQ-XTZBTtKc05aRh8RJuKgH1eM-unAa1d_wxrEb8mBN82LKaESaRHmJK/s1600/Screen+Shot+2015-12-30+at+12.23.19+PM.png" /></a>We moved on to discuss how simple shapes can be combined with different operations to create more complex shapes, in 3D via <b><span style="color: #38761d;">constructive solid geometry (CSG)</span></b>. We reminded students that the ice-breaker activity also contained polygonal primitives in 2D. For 3D, we showed students a <span style="color: #bf9000; font-weight: bold;">demo of OpenScad</span> and discussed primitive operations (union, intersection, difference, ...) that can be performed on shapes. Applications in manufacturing were discussed. </div>
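<div class="separator" style="clear: both; text-align: left;">
(CSG is particularly pleasant to express in code. A toy sketch of mine - not the OpenScad demo itself - using implicit shapes, where each shape is just a function answering "is this point inside?":)<br />
<pre>
def sphere(cx, cy, cz, r):
    """An implicit primitive: True for points inside the sphere."""
    return lambda x, y, z: r**2 >= (x-cx)**2 + (y-cy)**2 + (z-cz)**2

def union(a, b):        return lambda x, y, z: a(x, y, z) or  b(x, y, z)
def intersection(a, b): return lambda x, y, z: a(x, y, z) and b(x, y, z)
def difference(a, b):   return lambda x, y, z: a(x, y, z) and not b(x, y, z)

# A sphere with a smaller, off-center sphere carved out of it.
shape = difference(sphere(0, 0, 0, 1.0), sphere(0.5, 0, 0, 0.7))
print(shape(-0.8, 0, 0))  # True: inside the big sphere, outside the bite
print(shape(0.5, 0, 0))   # False: carved away
</pre>
</div>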
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<h4 style="clear: both; text-align: left;">
Height Fields</h4>
<div class="" style="clear: both; text-align: left;">
<span style="color: #38761d;"><b>Height fields</b></span> were introduced with the help of <a href="http://www.amazon.com/Adorox-Plastic-Novelty-Multiple-Colors/dp/B00P6S4W0U/ref=sr_1_18?s=toys-and-games&ie=UTF8&qid=1446837415&sr=1-18&keywords=pin+art">pin art boards</a>, as pictured at the beginning of this article. Students played with the pin boards and considered again the concepts of discretization and the representation issue. We asked students: which kinds of shapes or surfaces can be represented this way and which cannot?<br />
<br /></div>
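<div class="" style="clear: both; text-align: left;">
(In code, a height field is nothing more than a 2D array - one height per grid point, exactly like the pins on the board. A tiny sketch of mine:)<br />
<pre>
import numpy as np

# One height per (x, y) grid point - a 2D array, like the pin art pins.
axis = np.linspace(-3, 3, 50)
x, y = np.meshgrid(axis, axis)
heights = np.exp(-(x**2 + y**2))   # a single smooth bump

# Why some shapes are out of reach: a cave or an overhang would need two
# different heights at the same (x, y), but the array can hold only one.
print(heights.shape)   # (50, 50): 2500 numbers describe the whole surface
</pre>
</div>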
<h4 style="clear: both; text-align: left;">
Procedural Modeling</h4>
<div class="separator" style="clear: both; text-align: left;">
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3pQbcSfSVpzdeFCF770d8TvWkjGzal_6K_geg1UNBCoK_23pNIg_vD6pY4UZLAfHamf05sD27FocYBp4_0qloYFDv4klIBGDUFt1-RQfJge2E2j3UujTtTL2HTjlKmsroKEOgWdq-yMd2/s1600/grassbrave.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3pQbcSfSVpzdeFCF770d8TvWkjGzal_6K_geg1UNBCoK_23pNIg_vD6pY4UZLAfHamf05sD27FocYBp4_0qloYFDv4klIBGDUFt1-RQfJge2E2j3UujTtTL2HTjlKmsroKEOgWdq-yMd2/s320/grassbrave.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The grass in Pixar's Brave was created with procedural modeling,<br />
using parametric curves and randomness. <br />
A great hands-on demo of this kind of modeling can be found on:<br />
<a href="https://www.khanacademy.org/partner-content/pixar/environment-modeling-2/animating-parabolas-ver2/a/start-here-em">Khan Academy's Pixar-in-a-Box</a>.</td></tr>
</tbody></table>
We discussed how shapes could be created by specifying <span style="color: #38761d;"><b>procedures on primitives</b></span> (aside from the primitive operations in CSG). We showed demos of <span style="color: #38761d;"><b>solids of revolution</b></span> (<i>what better way to motivate the concept that for most students appears for the first time only in college calculus?</i>). We discussed how procedures like <span style="color: #38761d;"><b>revolution</b></span> and <span style="color: #38761d;"><b>extrusion</b></span> can be performed along different paths to create all sorts of complex shapes. We discussed how these paths can be further <b><span style="color: #38761d;">parametrized</span></b> so that the revolution or extrusion procedure changes along the path. We introduced <span style="color: #38761d;"><b>randomness</b></span> as another concept that can be used to add variability to the representation.<br />
<div class="separator" style="clear: both; text-align: left;">
We discussed applications to modeling trees, forests, grassy fields, crowds, and cities.</div>
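<div class="separator" style="clear: both; text-align: left;">
(A surface of revolution is a one-liner's worth of math: sweep a 2D profile curve around an axis. A sketch of mine, not one of the class demos, that also throws in the randomness idea:)<br />
<pre>
import numpy as np

def surface_of_revolution(profile_r, profile_z, n_angles=60):
    """Sweep a 2D profile (radius as a function of height) around the
    z-axis, producing a grid of 3D points on the revolved surface."""
    angles = np.linspace(0, 2 * np.pi, n_angles)
    x = profile_r[:, None] * np.cos(angles)[None, :]
    y = profile_r[:, None] * np.sin(angles)[None, :]
    z = np.repeat(profile_z[:, None], n_angles, axis=1)
    return x, y, z

z = np.linspace(0, 2, 30)
r = 0.5 + 0.2 * np.sin(3 * z)          # a wavy, vase-like profile
r = r + 0.01 * np.random.randn(30)     # randomness: per-instance variation
x, y, zz = surface_of_revolution(r, z)
print(x.shape)  # (30, 60) surface points
</pre>
</div>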
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<table border="1" cellpadding="4" cellspacing="0"><tbody>
<tr><th><u>3D Representation</u></th><th><u>Primitives</u></th><th><u>Operations (recipe)</u></th></tr>
<tr><td>Voxel grids</td><td>Voxels</td><td>Material specification for each voxel</td></tr>
<tr><td>Triangle mesh</td><td>Triangles</td><td>List of triangles with locations</td></tr>
<tr><td>CSG</td><td>Basic shapes</td><td>CSG operations (union, intersection, etc.)</td></tr>
<tr><td>Height field</td><td>Points with height</td><td>Assignment of heights to points</td></tr>
<tr><td>Procedural model</td><td>Basic shapes</td><td>Procedure (e.g. extrusion along path)</td></tr>
</tbody></table>
</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<h3>
A new way to look at things</h3>
<div class="" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7FjayDjm0xNmEMgb8uMhJ9mmQ-0JrDwaLgl1eFGZHu2KjS6bn0OI40Qx2JFJvSM-VXEmoT83tWi7HaAz0jHDG3VBIt2r0nB6ULTP7q6rcFPQcgt-AB8LcK26ue5FQ8x2bubZ8XWvst0N1/s1600/Screen+Shot+2015-12-30+at+1.17.19+PM.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="149" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7FjayDjm0xNmEMgb8uMhJ9mmQ-0JrDwaLgl1eFGZHu2KjS6bn0OI40Qx2JFJvSM-VXEmoT83tWi7HaAz0jHDG3VBIt2r0nB6ULTP7q6rcFPQcgt-AB8LcK26ue5FQ8x2bubZ8XWvst0N1/s200/Screen+Shot+2015-12-30+at+1.17.19+PM.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKrU3OoW2waS5huoH3MlW4_S4p-aVA8MKBqm_jnBISl7I5eKmC_ymBSzgjyD6ZBxDrRq7l6nXWGBRaV0NFjntaBviG5N2G0O7tUr1qswapKkIXWRMkyHh25_nGEjpdiknRwDOmhyJlQvWX/s1600/Screen+Shot+2015-12-30+at+1.16.35+PM.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKrU3OoW2waS5huoH3MlW4_S4p-aVA8MKBqm_jnBISl7I5eKmC_ymBSzgjyD6ZBxDrRq7l6nXWGBRaV0NFjntaBviG5N2G0O7tUr1qswapKkIXWRMkyHh25_nGEjpdiknRwDOmhyJlQvWX/s200/Screen+Shot+2015-12-30+at+1.16.35+PM.png" width="166" /></a>With our class, we hoped to give students a look at the modeling decisions that underlie all the animated films, video games, and special effects they see on a daily basis. We wrapped up our class with a <span style="color: #bf9000; font-weight: bold;">thought exercise</span>, putting students in the position of making decisions about how to model different objects. We told them to think about the different representations we discussed: the <span style="color: #38761d;"><b>primitives and operations</b></span> required. We told them to consider the <span style="color: #38761d;"><b>trade-off between accuracy and efficiency</b></span>. Given a representation, we also told them to think about its <b><span style="color: #38761d;">usability</span></b> - what kind of use cases are being considered, e.g. whether the modeled object needs to be animated and how. Students were asked to brainstorm how they would model the following objects: buildings, cities, fabric, hair, grass, water. Along the way, we showed them <span style="color: #bf9000;"><b>image and video demos </b></span>(all these links can be found <a href="http://web.mit.edu/zoya/www/representation_in_graphics.pdf">in the slides</a>). We passed around more <span style="color: #bf9000;"><b>physical models</b></span>. Together, we watched <a href="https://www.youtube.com/watch?v=MnQLjZSX7xM">a video "behind special effects"</a> that showcased the kinds of 3D models used in movies, a great visual review of the many representations covered in our class. We told students to look around and realize that 3D modeling decisions underlie many other applications: special effects in films, video games, simulations, anthropological reconstructions, product design, urban planning, robotics, and 3D printing. To be reminded that they have been armed with a new way to look at things, students took home <a href="http://www.redbubble.com/shop/polygon+stickers">polygonal stickers</a>.</div>
<div class="" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmYNBGDqhyphenhyphenrJWvxd3jPCpJQtFYG3iY0kIVOMIqQEMRto2lr-i33dqTOTeRSB-AFRR70j8X9PGzO1sxs1MrLCr_IkYPQ2J4cmIBA5tNzNYpZdzuWVEW3TkHzTITsykWJ9O0kfM0MDmgH3si/s1600/Screen+Shot+2015-12-30+at+1.31.41+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmYNBGDqhyphenhyphenrJWvxd3jPCpJQtFYG3iY0kIVOMIqQEMRto2lr-i33dqTOTeRSB-AFRR70j8X9PGzO1sxs1MrLCr_IkYPQ2J4cmIBA5tNzNYpZdzuWVEW3TkHzTITsykWJ9O0kfM0MDmgH3si/s200/Screen+Shot+2015-12-30+at+1.31.41+PM.png" width="200" /></a></div>
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-42796811609668660522015-08-24T21:47:00.001-07:002015-08-24T21:47:56.870-07:00Hyperconnectedness leads to hyperactivityAlthough the term "hyperconnected" <a href="http://www.oxforddictionaries.com/us/definition/american_english/hyperconnected">already exists</a>, I will use the following user-centric definition: the embedding of an individual within the internet - the individual's omnipresence on the web. People have all sorts of outlets for posting, storing, and sharing all sorts of content: for example, you can post your photos on Facebook, Google Photos, Instagram, Flickr, Snapchat, etc.; you can blog on Blogger, Wordpress, Tumblr, etc.; you can write about articles, news, your day and your thoughts on Twitter, Facebook, Google+, etc.; you can exchange information on Quora, Reddit, etc. and links on Pinterest and Delicious; you can share your professional information on LinkedIn, your video creations on YouTube, Vimeo, and Vine. You get the point. Although there is some redundancy to some of these internet services, they also have enough of their own features that they can be tailored for particular use cases (not to mention slightly different communities and audiences). I've personally found that there are enough differentiating features (at this point at least) to warrant separate posts on separate sites. And what does this all lead to? Hyperactivity, I claim.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://d1lwft0f0qzya1.cloudfront.net/dims4/COKE/e5c0bcf/2147483647/thumbnail/596x334/quality/75/?url=http%3A%2F%2Fassets.coca-colacompany.com%2Fbd%2Fae%2Fe9124ad246a09a3e6bada599f629%2F5-tools-for-staying-tech-savvy-604.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://d1lwft0f0qzya1.cloudfront.net/dims4/COKE/e5c0bcf/2147483647/thumbnail/596x334/quality/75/?url=http%3A%2F%2Fassets.coca-colacompany.com%2Fbd%2Fae%2Fe9124ad246a09a3e6bada599f629%2F5-tools-for-staying-tech-savvy-604.jpg" height="179" width="320" /></a></div>
<div style="text-align: center;">
<span style="font-size: xx-small;">source: http://www.coca-colacompany.com/stories/5-tools-for-staying-tech-savvy-in-a-hyper-connected-world</span></div>
<br />
With so many ideas, thoughts, suggestions, and opinions swirling around, a whole world of possibilities opens up to the individual - from digesting all the content that is posted, to posting one's own content. The posts of others inspire one to create and do, and the social interconnectedness - the awareness that your content will be widely seen - drives one to post as well. This self-reinforcing virtuous cycle is the perfect breeding ground for creativity and content-creation. We live not just in the information age - we live in the creativity age*. Yes, people have always created, but now that creations are visible to the whole world, creators can stand on the shoulders of creative giants. Ideas are exchanged and evolve at the speed of fiber optics. People hyperactively create.<br />
<br />
* side note: because creativity correlates with content-creation here, we're generating significantly more data than ever before; stay tuned for a very intelligent (and creative) Internet of Things!<br />
<br />
At this point, the discussion portion of this blog post ends, and I share my excitement for some of the awesomeness on the creative web below. These are the reasons why there are never enough hours in the day, or years in a lifetime. I am constantly inspired by how many different things people master and how creative they can be in all forms of things and activities. The rest of this post can be summarized as #peopleareawesome.<br />
<br />
The activities I list below may at first glance seem like a random sampling, but they're united by two things: you can do them without being a total expert (with some practice you can already achieve something!), and you can do them on the side (as a hobby, for short periods of time, with limited equipment).<br />
<br />
<u>Electronics, robotics, RC</u><br />
<u><br /></u>
My 15-year-old brother has learned to put together electronics and build RC vehicles, planes, boats, and drones by watching lots of YouTube videos. This is the type of knowledge that no traditional education can deliver at such density and speed. This creative maker culture is largely driven by being part of a large community of like-minded individuals (no matter their age) who positively reinforce each other via posts, discussions, and likes. An individual not connected to the internet might have a very small community (if any) with much sparser positive reinforcement, which I claim would result in fewer amazing creations.<br />
<br />
<div style="text-align: center;">
more creations: <a href="http://rcgatorr.blogspot.ca/">http://rcgatorr.blogspot.ca/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/Wz3eV1NA0ns/0.jpg" src="https://www.youtube.com/embed/Wz3eV1NA0ns?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<br />
<u>New art styles</u><br />
<u><br /></u>
Art is a hobby of mine, and I'm big on constantly trying new things. There's always something new that the web coughs up in this regard, beyond the traditional styles of sketching and painting. For instance, consider these widely diverging artistic styles:<br />
<br />
check out the art of <b>wood burning</b><br />
<div style="text-align: center;">
<a href="http://www.pyrographyonline.com/free_pyro_pattern.html">http://www.pyrographyonline.com/free_pyro_pattern.html</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.pyrographyonline.com/files/temp/978_1_56523_482_6_567796126.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.pyrographyonline.com/files/temp/978_1_56523_482_6_567796126.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
and the art of <b>painting on birch bark</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://viola.bz/paintings-on-bark-by-sergey-surin/">http://viola.bz/paintings-on-bark-by-sergey-surin/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://viola.bz/wp-content/uploads/2012/12/artist-Sergey-Surin-7.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://viola.bz/wp-content/uploads/2012/12/artist-Sergey-Surin-7.jpg" height="256" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
and painting by <b>wet felting</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://livingfelt.wordpress.com/tag/felted-landscapes/">https://livingfelt.wordpress.com/tag/felted-landscapes/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://livingfelt.files.wordpress.com/2015/02/11-fly-me-to-the-moon.jpg?w=1494" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://livingfelt.files.wordpress.com/2015/02/11-fly-me-to-the-moon.jpg?w=1494" width="311" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
and check out this crazy video of <b>candle carving</b></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/5w9Dv6GT0ac/0.jpg" src="https://www.youtube.com/embed/5w9Dv6GT0ac?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
also check out: </div>
<div class="separator" style="clear: both; text-align: center;">
<b><a href="https://www.youtube.com/watch?v=VoJnfIScTOs">sand sculptures</a> </b>and<b> </b><b><a href="http://www.noupe.com/inspiration/showcases/40-insane-ice-sculptures.html">ice sculptures</a></b></div>
<br />
<u>Scrapbooking and crafting</u><br />
<u><br /></u>
<div style="text-align: center;">
personal memories and trips can be creatively captured in <b>scrapbooks</b></div>
<div style="text-align: center;">
in both physical form: <a href="http://diyready.com/cool-scrapbook-ideas-you-should-make/">http://diyready.com/cool-scrapbook-ideas-you-should-make/</a></div>
<div style="text-align: center;">
and via virtual tools: <a href="https://www.shutterfly.com/photo-books">https://www.shutterfly.com/photo-books</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2v5n3a6rmc32nu8zf3mhydc1.wpengine.netdna-cdn.com/wp-content/uploads/2015/06/Cool-Scrapbook-Ideas-You-Should-Make-Travel-Scrapbook.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2v5n3a6rmc32nu8zf3mhydc1.wpengine.netdna-cdn.com/wp-content/uploads/2015/06/Cool-Scrapbook-Ideas-You-Should-Make-Travel-Scrapbook.jpg" height="400" width="142" /></a></div>
<u><br /></u>
<u>Culinary masterpieces</u><br />
<u><br /></u>
Judging by the popularity of cooking channels, and food-, cooking-, and baking-related tags and posts on different social networks, people love to share the dishes, recipes, and culinary masterpieces that they create. I mean, just look at this:<br />
<br />
<div style="text-align: center;">
<a href="https://www.pinterest.com/juanice/awesome-decorated-cakes/">https://www.pinterest.com/juanice/awesome-decorated-cakes/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://s-media-cache-ak0.pinimg.com/736x/51/f3/d1/51f3d1dfd7e52f2fb12e6d24f97e39b8.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://s-media-cache-ak0.pinimg.com/736x/51/f3/d1/51f3d1dfd7e52f2fb12e6d24f97e39b8.jpg" width="212" /></a></div>
<br />
<div style="text-align: center;">
and themed foods for any occasion: </div>
<div style="text-align: center;">
<a href="https://www.pinterest.com/robinsweb/christmas-party-foods/">https://www.pinterest.com/robinsweb/christmas-party-foods/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://s-media-cache-ak0.pinimg.com/736x/3d/37/13/3d3713cd84ec2f115ec95906677118b3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://s-media-cache-ak0.pinimg.com/736x/3d/37/13/3d3713cd84ec2f115ec95906677118b3.jpg" width="86" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<u>Travel blogs and photography</u></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
I'm also hugely inspired by all the travel blogs people put together. Not only do they find the time to visit amazing places and capture them from all sorts of beautiful angles, they also blog about it: <a href="http://fathomaway.com/slideshow/fathom-2015-best-travel-blogs-and-websites1/">http://fathomaway.com/slideshow/fathom-2015-best-travel-blogs-and-websites1/</a></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The really creative ones also put together annotated, narrated, and musical slideshows and videos.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
I'm not even going to go into all the amazing photography people do. I will leave you with this:</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<a href="http://www.beforethey.com/">http://www.beforethey.com/</a></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/HaZyp0TSqWM/0.jpg" src="https://www.youtube.com/embed/HaZyp0TSqWM?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<u>Data visualization</u></div>
<div style="text-align: left;">
<u><br /></u></div>
<div style="text-align: left;">
Data visualization is both an art and a science, and it is highly relevant in this day and age. I'm inspired by the creativity, once again:</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<a href="http://flowingdata.com/2014/12/19/the-best-data-visualization-projects-of-2014-2/">http://flowingdata.com/2014/12/19/the-best-data-visualization-projects-of-2014-2/</a> </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i1.wp.com/flowingdata.com/wp-content/uploads/2014/12/Favorites-of-20141.png?zoom=2&fit=700%2C9999" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i1.wp.com/flowingdata.com/wp-content/uploads/2014/12/Favorites-of-20141.png?zoom=2&fit=700%2C9999" height="121" width="320" /></a></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
<u>Creative writing</u></div>
<div style="text-align: left;">
<u><br /></u></div>
<div style="text-align: left;">
Other than blog writing, I like the idea of creative writing on the side to de-stress and get some brain juices flowing - here are some things worth trying and checking out (and possibly submitting to if you're extra adventurous): <a href="http://www.nature.com/nature/focus/arts/futures/">short SF stories</a>, poetry, <a href="http://contest.newyorker.com/CaptionContest.aspx">funny captions</a>.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Another form of "creative writing" is putting together <a href="http://web.mit.edu/zoya/www/docs.html">tutorials</a>, explanations, etc. on all sorts of topics that interest you. It allows you to organize your thoughts and attempt to explain some content with a specific audience in mind. I love to write, explain, and write explanations, but if only there was more time in a day...</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
<u>How it all ties together</u></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
People inspire others by taking photos of their creations and posting them on photo-sharing sites; they create videos of the how-to process to motivate others to try; and they bookmark ideas/links they like. They then blog or tweet or chirp about their process and final products and otherwise share their creations with their social networks and the world. The resulting online interactions (sharing of ideas, discussions, comments, and likes) spark the next cycle of creativity, and on it goes. (I posted some of the pictures above with the intention of inspiring others to try some new things as well.)</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
In short, there is no shortage of activities to occupy oneself with if there is some time on the side. Of all the activities and links listed above, I've tried about 70%. I am definitely hyperactive when it comes to creating, and the internet age is fueling a lot of that for me by constantly feeding me new ideas. I believe that when you try new things, you expand your brain (perhaps via the number of new connections/associations you make), which benefits you in many more ways than you might first think. I believe that engaging in all manner of creative activities has long-lasting positive effects on intellectual capability and psychological well-being, and that instead of plopping down statically to watch something, creating something keeps your brain "better-exercised", so to speak. </div>
<div style="text-align: center;">
<br /></div>
Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-68268757007416653772015-07-14T11:06:00.003-07:002015-07-14T11:07:55.398-07:00The Experiencers: this is your last jobWith the rapid growth of what A.I. is capable of, the rapid advancement of technology (via Kurzweil's Law of Accelerating Returns), and the massive reach of the internet and the cloud, the obvious question is: what will the role of humans be when even the intellect can be mechanized? I offer my musings on a potential kind of future here: <a href="http://web.mit.edu/zoya/www/TheExperiencers_SF.pdf">http://web.mit.edu/zoya/www/TheExperiencers_SF.pdf</a>Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-21341869298772946292015-07-02T21:57:00.000-07:002015-07-02T21:59:00.475-07:00where is innovation, and who's pulling who along for the ride?In the modern landscape of giants like Google and Facebook, and the scurry of activity generated by tech start-ups in the SF and Boston areas and beyond, one of the big questions is: where does academia sit? And how do all these forces shape each other?<br />
<br />
Big companies are no longer shaping just the industry world - they are having massive impacts on academia - both directly (by acquiring the best and brightest academics) and indirectly (by influencing what kinds of research directions get funded).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQCKdqFuiMHBvlZOesSoiqaq54-No807xHHqG9fPLhgBCjbpSOhxbWeIfoLr4ZDEOprFe9mEJT1x0lnmjzzsPTCyHReT4wfMkyIP-wMmEBX0v5Oj8PYEZ5t69DmCfOVcqIN04XqEcZcoGQ/s1600/shaping.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQCKdqFuiMHBvlZOesSoiqaq54-No807xHHqG9fPLhgBCjbpSOhxbWeIfoLr4ZDEOprFe9mEJT1x0lnmjzzsPTCyHReT4wfMkyIP-wMmEBX0v5Oj8PYEZ5t69DmCfOVcqIN04XqEcZcoGQ/s400/shaping.jpg" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
This leaves a few hard questions for academics to think about:<br />
To what extent should industry drive academia and to what extent can academia affect where industry is going?<br />
<br />
We can follow, for instance, the big companies - sit closely on their heels, learn about their latest innovations, and project where they're likely to be 5-10 years from now. Then use this knowledge to appropriately tailor funding proposals, to direct research initiatives, and to count on the emerging technologies to fall into place. For instance, if you know that certain sensors are going to be in development in the next few years, does it not make sense to have ready, in advance, the applications for those sensors, the algorithms, and the methods for processing the data? Or does this build up an inappropriate dependence and turn academics into consumers? Taking this approach, you're likely to win financially in the long run - either via funding (because your proposed projects are tangible) or via having your projects, ideas, or you yourself acquired by the big guys (and all the advantages that go along with that). However, does this approach squelch innovation - the thinking outside-the-box, outside the tangible, and further into the future?<br />
<br />
Importantly, where is most innovation coming from these days? In one of the Google I/O talks this year, there was a projection that, in the near future, more than 50% of solutions will come from startups less than 3 years old. Why is this the case? I can think of a number of reasons. First, bright young graduates of universities like MIT and Stanford are taking their most innovative research ideas and turning them into companies, and this is becoming an increasingly hot trend. More and more of my friends are getting into the start-up sphere, and those that aren't are at least well aware of it. Second, many startups are discovering niches for new technologies: whether it's tuning computer vision algorithms to the accuracy required for certain medical applications, applying sensors to developing-world problems like sanitation monitoring, or using data mining for applications where data mining has not been used before. Tuning an application, an algorithm, or an approach to a particular niche requires utmost innovation - that is where you discover that you need to use a computer vision algorithm to achieve an accuracy that was never achieved before, to create a sensor with a lifespan that was not previously imaginable, to make things work fast, make them work on mobile, make them work over unstable network connections, make the batteries last. Academically, you rarely think of all of the required optimizations and corner cases, as long as the proof-of-concept exists (does it ever really?), but in these cases, you have to.<br />
<br />
Perhaps we can think of it this way: the big guys are developing the technologies that the others do not have the resources for; the small guys are applying the technologies to different niches; and the academics are scratching their heads over application areas for these technologies and the next-to-emerge technologies - never quite rooted in the "what we have now" and always stuck (or rather, comfortably seated) in the "what if". Who's shaping who? It looks like they're all pulling each other along, sometimes gradually, other times in abrupt jerks. At any given time you might be doing the pulling or be dragged along for the ride.<br />
<br />
So where does that leave us? Are big companies, little companies, and academia taking distinctly different routes, or stepping on each other's toes? At this point, I think there is a kind of melting pot without sharp boundaries - a research project slowly transitions into a start-up, which then comes under the ownership of a big company; or a research lab that transplants its headquarters into a big company directly; or the internal organizations like the research labs or advancements labs (Google Research, GoogleX, ATAP) that have the feel of start-ups with the security and backing of a large company. It's a unique time, with everything so malleable. But I'm not sure this triangle-of-a-relationship has reached any sort of equilibrium quite yet... We have yet to wait until the motions stabilize to see where the companies and the universities stand, and whether they will continue to compete in the same divisions, or end up in vastly different leagues.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com1tag:blogger.com,1999:blog-338911558031950356.post-20283357609622566552015-06-24T21:27:00.001-07:002015-06-24T21:30:07.723-07:00Imagining your imagination Given the news that is making such a splash recently - "dreaming A.I." and "machines with imagination" (<a href="http://globalnews.ca/news/2070281/googles-ai-creates-bizarre-yet-beautiful-images-from-its-dreams/">http://googleresearch.blogspot.fr/2015/06/inceptionism-going-deeper-into-neural.html</a>), a few interesting questions are up for pondering...<br />
<br />
An NN's (neural network's) "imagination" is a property of the data it has seen and the task it has been trained to do. So an NN trained to recognize buildings will hallucinate buildings in novel images it is given, an NN trained on YouTube videos will discover cats where no cats have ever been, etc. So, an NN trained on my experience, one that sees what I see every day (and provided it has the machinery to make similar generalizations), should be able to imagine what I would imagine, right?<br />
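<br />
For the technically inclined, the core trick behind these "dreams" is surprisingly small: run gradient ascent on the input image so that a chosen layer of a trained network fires harder. Here is a toy sketch (assuming PyTorch/torchvision are available; the network, layer, step count, and step size are arbitrary illustrative choices, not the actual Google recipe):<br />
<pre>
# Toy "dreaming": nudge an image so that one layer of a pretrained CNN
# responds more strongly. What gets hallucinated depends entirely on what
# the network was trained on - which is the whole point of the post above.
import torch
import torchvision.models as models

model = models.googlenet(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)          # we optimize the image, not the weights

feats = {}
model.inception4c.register_forward_hook(lambda m, i, o: feats.update(feat=o))

img = torch.rand(1, 3, 224, 224, requires_grad=True)   # start from noise
for step in range(100):
    model(img)                       # forward pass fills feats["feat"]
    loss = feats["feat"].norm()      # "how strongly does this layer fire?"
    loss.backward()
    with torch.no_grad():            # gradient ascent on the pixels
        img += 0.05 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
# img now holds whatever this layer "imagines"; train on your own photo
# stream instead of ImageNet, and it would hallucinate your visual world.
</pre>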
<br />
Facebook and Google and other social services should be jumping on this right now to offer you an app to upload all your photo streams and produce for you "figments of your imagined imagination" or "what your photos reveal about what might be in your mind" (the high-tech NN version of personality quizzes, perhaps). Basically, you can expect the output to be a bizarre juxtaposition of faces and objects and shapes (like in the news article) but customized just for you! Wait for it, I'm sure it's just around the corner.<br />
<br />
So if we strap on our GoPros or our Google Glasses and run out into the world hungrily collecting every moment, every sight, and every experience that we live through, can we then hope that our very own personal A.I.s will be able to learn from all this data to remember our dreams when we can't, guess a word off the tip of our tongue, make the same connections, parallels, and metaphors, and know what new thought our mind could have jumped to from the context of the previous conversation? As we envision that A.I. will one day augment us, do we take into account the fact that the augmentation will not be a simple division of labor? "I as the human being will leave the superior, heuristic, and creative tasks to myself, and leave my duller mechanical half to deal with all the storage and lookup and speed that I lack" -- this may be an outdated thought; perhaps "your" A.I. will be able to make bigger generalizations, leap further, find more distant connections - to innovate and create. The correct question should then be: what can YOU contribute to your A.I.?Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-84343959903678929992015-06-18T22:02:00.002-07:002015-06-18T22:02:41.102-07:00CVPR recap and where we're goingThe Computer Vision and Pattern Recognition (CVPR) conference was last week in Boston. For the sake of the computer vision folk (at least in my group), I created a summary/highlights document of some paper selections here: <a href="http://web.mit.edu/zoya/www/CVPR2015brief.pdf">http://web.mit.edu/zoya/www/CVPR2015brief.pdf</a><br />
<br />
It takes an hour just to read all the titles of all the sessions - over 120 posters/session, 2 sessions a day, 3 days... and workshops. This field is MONSTROUS in terms of output (and this is only the 20% or so of papers that actually make it to the main conference).<br />
Thus, having a selection of papers instead of all of them becomes at least a tiny bit more manageable.<br />
<br />
The selections I made are roughly grouped by topic area, although many papers fit in more than one topic, and some might not be optimally grouped - but hey, this is how my brain sees it.<br />
<br />
The selection includes posters I went to see, so I can vouch that they are at least vaguely interesting. For some of them I also include a few point-form notes, which are likely to help with navigation even more.<br />
<br />
<b>Here's my summary of the whole conference:</b><br />
<br />
I saw a few main lines of work throughout this conference: CNNs applied to computer vision problem X, a metric for evaluating CNNs applied to computer vision problem X, a new dataset for problem X (many times larger than the previous one, to allow for the application of CNNs to problem X), and a new way of labeling the data for that new dataset.<br />
<br />
In summary, CNNs are here to stay. At this conference I think everyone realized how many people are actually working on CNNs... there have been arxiv entries popping up all over, but once you actually find yourself in a room full of CNN-related posters, it really hits you. I think many people also realized how many other groups are working on the exact same problems, thinking about the exact same issues, and planning on the exact same approaches and datasets. It's become quite crowded.<br />
<br />
So this year it was the CNN hammer applied to just about any vision problem you can think of - setting new baselines and benchmarks left and right. You're working on an old/new problem? Have you tried CNNs? No? The crowd moves on to the next poster that has. Many papers have "deep" or "nets" somewhere in the title, with a cute way of naming models applied to some standard problem (ShapeNets, DeepShape, DeepID, DevNet, DeepContour, DeepEdge, segDeep, ActivityNet). See a pattern? Are these people using vastly different approaches to solve similar problems? Who knows.<br />
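<br />
To be concrete about what the "hammer" looks like in practice, here is a minimal sketch of the generic recipe (assuming PyTorch/torchvision; the backbone, class counts, and model names are placeholders - which is rather the point):<br />
<pre>
# The generic "CNN hammer": take a network pretrained on ImageNet,
# swap the final layer for problem X, fine-tune on the new dataset.
import torch.nn as nn
import torchvision.models as models

def cnn_for_problem_x(num_classes):
    model = models.alexnet(weights="DEFAULT")           # ImageNet features
    model.classifier[6] = nn.Linear(4096, num_classes)  # new head for problem X
    return model                                        # ...then fine-tune

# DeepThis, ThatNet: under the hood, often more or less these few lines.
edge_model = cnn_for_problem_x(num_classes=2)     # a "DeepEdge"-style model
scene_model = cnn_for_problem_x(num_classes=400)  # a scene-recognition model
</pre>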
<br />
So what is the field going to do next year? Solve the same problem with the next hottest architecture? R-CNNs? even deeper? Some new networks with memory and attention modules? More importantly, do results get outdated the moment the papers are submitted because the next best architecture has already been released somewhere on arxiv, waiting for new benchmarking efforts? How do we track whether the numbers we are seeing reported are the latest numbers there are? Are papers really the best format to present this information and communicate progress?<br />
<br />
These new trends in computer vision are leaving us with a lot of very hard questions to think about. It's becoming increasingly hard to predict where the field's going in a year, let alone a few years from now.<br />
<br />
I think there are two emerging trends right now: more industry influence (all the big names seem to be moving to Google and Facebook), and more neuroscience influence (can the networks tell us more about the brain, and what can we learn about the brain to build better networks?). These two forces are beginning to increasingly shape the field. Thus, closely watching what these two forces have at their disposal might offer glimpses into where we might be going with all of this...<br />
<br />
<br />
<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com3tag:blogger.com,1999:blog-338911558031950356.post-29227291129922276992015-06-17T00:03:00.000-07:002015-06-17T00:03:15.393-07:00The Computer History Museum in SFThe <a href="http://www.computerhistory.org/">Computer History Museum</a> (in Mountain View, near SF) was great! It was a bit of a random stumble during a trip along the West Coast a few weeks ago, but it left a lasting impression! The collection of artifacts is quite amazing: name just about any time in computer history (ancient history included) and any famous computer (Babbage Engine, ENIAC, Enigma, UNIVAC, Cray, etc.) and some part of it is very likely at this museum. We totally assumed the museum would be a 2-hour stopover on the way to other SF sights, but ended up staying until closing, without even having covered all of it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwm-Fc08QLG2eIIGw4cemLtmAwTXZE9Jwi7m38bIMoAghAZHz27dPKj7tob7uvc8w-rYdicXLiU_8xtG3fzpZoPS08kRtBcCterKejoVw6C39mP1JKZnwpmANOsQK_5KDZshdx8isFPzL7/s1600/20150529_143612.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwm-Fc08QLG2eIIGw4cemLtmAwTXZE9Jwi7m38bIMoAghAZHz27dPKj7tob7uvc8w-rYdicXLiU_8xtG3fzpZoPS08kRtBcCterKejoVw6C39mP1JKZnwpmANOsQK_5KDZshdx8isFPzL7/s320/20150529_143612.jpg" width="320" /></a></div>
<br />
<br />
As a teaser I include a few random bits of the museum that caught my attention (I may have been too engrossed in the rest of the museum to remember to take pictures).<br />
<br />
One of the oldest "computers": <a href="https://en.wikipedia.org/wiki/Antikythera_mechanism">Antikythera mechanism</a> - had never heard of it before! The Ancient Greeks continue to impress! Shows another timeless quality of humanity: our technological innovations are consistently driven by our need for entertainment (in the case of the Ancient Greeks, such innovations can be linked back to scheduling the Olympic Games). At this museum, there was a full gallery devoted to old calculators and various mechanical computing implements from different cultures.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB3ai4EmoSVi63B0HF3sTyXQ7lWFWZexrFs2c7WqFFafiP2nUE5S5EDTLtcrk5qN1kOU-v-pC05crf6DZod42plSZDF8JH2DfbK6z1FFQNHaoLEq7wwW0i-tj6z4MeEsKlCFXXwP6gGb5B/s1600/20150529_141834.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiB3ai4EmoSVi63B0HF3sTyXQ7lWFWZexrFs2c7WqFFafiP2nUE5S5EDTLtcrk5qN1kOU-v-pC05crf6DZod42plSZDF8JH2DfbK6z1FFQNHaoLEq7wwW0i-tj6z4MeEsKlCFXXwP6gGb5B/s320/20150529_141834.jpg" width="320" /></a></div>
<div>
<br /></div>
<div>
A fully-working constructed version of <a href="http://www.computerhistory.org/babbage/">Babbage's Difference Engine</a> - completed in 2008 according to Babbage's original designs (which apparently worked like a charm without any modification!) Museum workers crank this mechanical beast up a few times a day to the marvel of the crowd. Once set, this machine can compute logarithms, print them on a rolling receipt, and simultaneously stamp an imprint of the same values into a mold (for later reprinting!) Babbage also thought of what happens when the imprinting fills up the whole mold - a mechanical interlock halts the whole process, so that the tablet can be replaced! That's some advanced UI, developed without any debugger or user studies.</div>
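<div>
If you are wondering how a pile of gears can "compute logarithms": the principle is the method of finite differences - a smooth function is tabulated via a polynomial approximation, and a polynomial's values can be generated by nothing but repeated addition. A toy Python sketch of the idea (my simplification; the real engine handles 7th-order differences and 31-digit decimal numbers):<br />
<pre>
# A "difference engine" in software: given the initial column of finite
# differences [f(0), d1, d2, ...], generate f(0), f(1), f(2), ... using
# additions only - exactly what the mechanical columns of the engine do.
def difference_engine(diffs, steps):
    values, col = [], list(diffs)
    for _ in range(steps + 1):
        values.append(col[0])
        for i in range(len(col) - 1):
            col[i] += col[i + 1]   # each register adds its neighbour
    return values

# f(x) = x^2 has initial differences [0, 1, 2]:
print(difference_engine([0, 1, 2], 5))   # [0, 1, 4, 9, 16, 25]
</pre>
</div>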
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAUNbMIUBl5iYRVSO-8Hd7D8T4PbBAqCBbfEEW2ebvCjl68rriv63A9g5QbDFC2byNaVR8u5sFqX2BL8pQV_VdpOmOINVCivYULjmuBTvyyTsLixK1tGSaKCQakOkGGlL5THZnqwQlki5F/s1600/20150529_130239.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAUNbMIUBl5iYRVSO-8Hd7D8T4PbBAqCBbfEEW2ebvCjl68rriv63A9g5QbDFC2byNaVR8u5sFqX2BL8pQV_VdpOmOINVCivYULjmuBTvyyTsLixK1tGSaKCQakOkGGlL5THZnqwQlki5F/s320/20150529_130239.jpg" width="320" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu44TabcKRt8NA_uPIc7ZI4HJ_qL2vIt3-J-g3BhWDhoNAsKdoAXWdy90VLRICS9wIibIjb9tJrly0XiWymiK8Yf1ykWFfR8MVOXlWR_g4XwcfRM7zAnAtNhuURFMZsFfJPmMCYi8dsVzW/s1600/20150529_131200.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu44TabcKRt8NA_uPIc7ZI4HJ_qL2vIt3-J-g3BhWDhoNAsKdoAXWdy90VLRICS9wIibIjb9tJrly0XiWymiK8Yf1ykWFfR8MVOXlWR_g4XwcfRM7zAnAtNhuURFMZsFfJPmMCYi8dsVzW/s200/20150529_131200.jpg" width="112" /></a></div>
<div class="separator" style="clear: both; text-align: justify;">
Based on the over-representation of this Babbage Engine in this post, you can tell that quite a bit of time was spent gawking at it:</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /><iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/sI1izkDBb8E/0.jpg" src="https://www.youtube.com/embed/sI1izkDBb8E?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
By the way, here's a real (previously functional) component from the UNIVAC - unlike that panel with lights and switches at the top of this post, which apparently did not do anything. It was there purely for marketing purposes, for whenever the then-investors came around to check out this "machine" - it's much more believable that something real is happening when you have some kind of blinking dashboard and large buttons that serve no purpose but look "very computational". Looks like this continues to be a powerful marketing strategy to this day :)</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkeVtDctHUh8w0lgBiUhuCPimMS3wQznEhjH4C7_dEvOiJWQh_hGCCvZfi5QAdA_sYY2H5brJztcmxFdWY6W48qfIgEEXzGn1pWU3vQycgejf97s99xWmO_UGTHhJMNXmKQiYUAEcaPE7r/s1600/20150529_143618.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkeVtDctHUh8w0lgBiUhuCPimMS3wQznEhjH4C7_dEvOiJWQh_hGCCvZfi5QAdA_sYY2H5brJztcmxFdWY6W48qfIgEEXzGn1pWU3vQycgejf97s99xWmO_UGTHhJMNXmKQiYUAEcaPE7r/s320/20150529_143618.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Just a fun fact (no, this is not the origin of the word "bug", which is what I thought this was at first, but it does demonstrate some successful debugging):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9QQaGIMeVDGAWyOFfRrS8dCOhEgLWMVJnY7O9epbcSFiI5LPGJatNsx1SijZbJ6NBbRGmodyTQUhHi4DXx82Nm5I5j9LFSHRaGzizjhPTAhijXuGPgRRlk8cZ63WDkkDJsV1LNythKSFK/s1600/20150529_153839_001.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9QQaGIMeVDGAWyOFfRrS8dCOhEgLWMVJnY7O9epbcSFiI5LPGJatNsx1SijZbJ6NBbRGmodyTQUhHi4DXx82Nm5I5j9LFSHRaGzizjhPTAhijXuGPgRRlk8cZ63WDkkDJsV1LNythKSFK/s320/20150529_153839_001.jpg" width="320" /></a></div>
<br />
The following describes quite a few computer scientists I know:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNJ9c0yiXwyk3DlFv3TFOtSoIKa0j4gWrJnXRk77V5ndx4V0IRtsoEyctZXAKfib6u26RTvuLrtDSj0vgy378n5llUYqP1NxqcoX1X8FEMQ2ekcbUpwGq9HE7eckoNmVYDw6HzSF55WC4g/s1600/20150529_160440.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNJ9c0yiXwyk3DlFv3TFOtSoIKa0j4gWrJnXRk77V5ndx4V0IRtsoEyctZXAKfib6u26RTvuLrtDSj0vgy378n5llUYqP1NxqcoX1X8FEMQ2ekcbUpwGq9HE7eckoNmVYDw6HzSF55WC4g/s320/20150529_160440.jpg" width="320" /></a></div>
<br />
There was a gallery devoted to Supercomputers and another gallery devoted to computer graphics. Look at <a href="http://www.sjbaker.org/wiki/index.php?title=The_History_of_The_Teapot">what</a> I found there - every Graphics PhD student's rite of passage (by the way, the <a href="http://www.cs.toronto.edu/~rgrosse/intrinsic/gallery.html">Intrinsic Images dataset</a> is sitting in my office, no glass case, but we will soon start charging chocolate to see it):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj7UuWjxg-G8rEVgD-zg0CfHk2eaclj1EGjkBRyDOY5l6VdjFm-vENPp07dPSEwlnNILfIC-V_5Gc70q4xccddDB4qFpxRRe1UjAQx-88nmhsRJkyIv9VcKXB-sP1PcC1Q9-K6Kd1f3A6Q/s1600/20150529_162511.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj7UuWjxg-G8rEVgD-zg0CfHk2eaclj1EGjkBRyDOY5l6VdjFm-vENPp07dPSEwlnNILfIC-V_5Gc70q4xccddDB4qFpxRRe1UjAQx-88nmhsRJkyIv9VcKXB-sP1PcC1Q9-K6Kd1f3A6Q/s320/20150529_162511.jpg" width="180" /></a></div>
<br />
There was also a whole gallery devoted to robots and A.I. (an impressive collection), a gallery devoted to computer games, and a gallery devoted to the Apple computer just to name a few.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKWSQT-luwCjuLCnrDdrYDFlXYgSotQK6JegvupJXwQRJzsS2Ll8o1G8voP25Q0zDwArFqJfvD-7Dt5aESRWD3LQiuVc8Hy7BtnLgVx-bOMMmupqySWNl9AtiHPh-wEawX1epn-dGA5T82/s1600/20150529_162644.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKWSQT-luwCjuLCnrDdrYDFlXYgSotQK6JegvupJXwQRJzsS2Ll8o1G8voP25Q0zDwArFqJfvD-7Dt5aESRWD3LQiuVc8Hy7BtnLgVx-bOMMmupqySWNl9AtiHPh-wEawX1epn-dGA5T82/s320/20150529_162644.jpg" width="320" /></a></div>
<br />
By the way, something I didn't know about the Apple computer - here is an awesome bit of marketing that came out in 1984:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/axSnW-ygU5g/0.jpg" src="https://www.youtube.com/embed/axSnW-ygU5g?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<div class="separator" style="clear: both; text-align: justify;">
There was a gallery devoted to the Google self-driving car. I like how this is in the Computer History museum, because really, you can't put any computer technology in a museum and assume it will remain current for very long. The drone in the corner of that room had a caption that mentioned something about possible future deliveries. Old news. I've seen bigger drones :) </div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtFclrvOxYIWiR6UIw8YnQZPLFMAwhMzP0X7Zu3HqOrbGkYIEzP-ecOIZmtjjp5AYB51kEBMsvwbt3n825KLtGthrD3g5ZeoIuaCIyhOFz_KbmPeSFk9X-zoYRftx6Hby-OcEmFH2g9msZ/s1600/20150529_135257.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtFclrvOxYIWiR6UIw8YnQZPLFMAwhMzP0X7Zu3HqOrbGkYIEzP-ecOIZmtjjp5AYB51kEBMsvwbt3n825KLtGthrD3g5ZeoIuaCIyhOFz_KbmPeSFk9X-zoYRftx6Hby-OcEmFH2g9msZ/s320/20150529_135257.jpg" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
That's about the extent of the photos I took, because photos really fail to convey the environment that a museum surrounds you with. It is a museum I would gladly recommend!</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
As an after-thought, it's interesting to visit a "history" museum where you recognize many of the artifacts. Gives you a sense of the timescale of technological innovation which continues to redefine what "history", "progression" and "timescale" really mean... notions that we have to regularly recalibrate to.</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<br /></div>
Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-42549307694323946212015-06-13T09:38:00.001-07:002015-06-13T09:39:18.437-07:00Google I/O RecapAnnouncements from Google I/O are increasingly popping up all over the media.<br />
Last year, after going to Google I/O, I compiled a series of slides about some of the top announcements and some of the other sessions I went to: <a href="http://web.mit.edu/zoya/www/googleIOrecap.pdf">http://web.mit.edu/zoya/www/googleIOrecap.pdf</a><br />
This year, I watched many Google I/O videos online, and I've compiled a small summary here: <a href="http://web.mit.edu/zoya/www/googleIO2015_small.pdf">http://web.mit.edu/zoya/www/googleIO2015_small.pdf</a><br />
As a researcher, I find it instructive to look at where giants such as Google are moving in order to get a sense of which research directions and developments will be especially in demand in the near future. Thus, I look at the talks from an academic perspective: what are the key research questions surrounding every product? I tried to include some of these in my latest slides.Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-66695703822774823662015-06-02T14:49:00.001-07:002015-06-02T14:49:07.829-07:00Why Google has the smartest business strategy: openness and the invisible workforceGoogle works on an input/output system. In other words, for everything that Google developers create, Google accepts input from users and developers around the world. Note that the latter group/community is orders of magnitude larger than the former, so by harnessing the resources and power of users and developers around the world, Google's global footprint becomes significantly larger.<br />
<br />
For instance, Google produces continuous output in the form of products and developer platforms, and accepts input in the form of development directions and, most importantly, apps. By creating platforms that developers can build on top of, Google harnesses the users that want the apps. The more that Google releases (e.g. SDKs), the more developers are looped in to create new apps, and the more users get pulled in to use the apps, thus acquiring the Google products in the process. Thus, the number of people around the world that are increasing the consumer base for Google products far exceeds the number of Google employees.<br />
<br />
In fact, the number of people indirectly working for Google is huge. Consider the <a href="https://developers.google.com/groups/">Google Developer Groups (GDGs)</a> that can be found all around the world - independent organizations of developers and enthusiasts that get together to bond over Google's technology (they also give Google product-related talks and host help sessions for their local communities, all on their own time). What's in it for the members? Members of GDGs have the support and network of individuals with similar interests. Google wins by having a global network of communities that are self-sufficient and self-reinforcing and do not require Google support or investment. <a href="http://googlesystem.blogspot.ca/2013/03/google-trusted-testers-from-betas-to.html">Google Trusted Testers</a> are non-employees that test beta products for Google. What's in it for the testers? First-hand experience with Google products. What's in it for Google? A workforce for whom being "first to try a product" is sufficient reward. The <a href="https://www.google.com/edu/resources/programs/student-ambassador-program/">Google Student Ambassador Program</a> gives college students an opportunity to exhibit leadership by acting as a liaison between Google and their home institution, putting on Google-supported events (information sessions, hackathons, etc.) and forming student communities. The student ambassador's motivation is a nice line on their resume and great experience communicating with both industrial and institutional personnel and organizing events. Google wins by being promoted on college campuses and having easier avenues for student recruitment... all for the price of providing some Google-themed freebies at college events. Then there are all the other smaller organizations that are not directly supported by, but have affiliation with, Google. For instance, the <a href="https://www.facebook.com/pages/Anita-Borg-Scholarship-Alumni-Community/261363020730608">Google Anita Borg Alumni Planning Committee</a> that I am part of is devoted to increasing visibility and interest in computer science among minorities and to helping promote diversity in computer science education. We, as a group of females distributed globally, start initiatives and put on events (<a href="http://web.mit.edu/cs-visit-day/index.html">such as the following</a>) in our local communities to advance these missions. Google provides the branding. We win through affiliation with Google, and Google wins through affiliation with philanthropic organizations. These are just a few of the organizations and communities that are affiliated with but not directly supported (at least financially) by Google. In fact, Google does not need to directly support or control/govern any of these communities precisely because they are self-sufficient and self-motivated - a big win for Google, given the limited investment.<br />
<br />
Now consider the yearly Google I/O conference that draws over 5,000 attendees. Many of these attendees are developers who come to the conference to hear first-hand about new product and platform releases (and participate in hands-on workshops with the Google product developers themselves). These developers then bring this knowledge back to their communities, and contribute their own apps and products to the Google community. Each year, at this conference, Google announces new support infrastructures to make the use of Google products ever easier (this year, for instance, Google announced new OS and language support for the <a href="http://www.theverge.com/2015/5/28/8677119/google-project-brillo-iot-google-io-2015">Internet of Things</a> so that developers can more easily add mobile support to physical objects - think: the smart home). Correspondingly, the number of Google product-driven apps increases and expands. Users of apps buy Google products and services and continuously provide feedback (either directly through surveys or indirectly by having their interactions and preferences logged on Google servers). Thus, we are all contributors to the growth of the Google footprint.<br />
<br />
What can we infer from all of this? Google is firmly rooted in our societies and is here to stay. The number of people supporting, improving, and building on top of Google products is huge - it is Google's invisible workforce. Thus, Google will continue to grow and improve at great speeds.<br />
<br />
What lesson can we learn from all of this? Being open (in terms of both software and hardware) can allow a company to harness the power of other developer and user communities, thus increasing the size of the effective workforce that builds the company's products, directions, and reputation. Google has one heck of a business strategy.<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-64580958706369175942015-05-31T13:48:00.002-07:002015-05-31T13:48:55.856-07:00Freeing humans from redundancyThis has been, and should be, the ultimate goal of mechanizing and automating the world around us. The human mind is too precious a resource to waste on any kind of repeatable action, and we've been working on automating such actions since the Industrial Revolution. With modern A.I., more of this is becoming possible.<br />
<br />
Consider showing a robot once how to clean a window - defining the physical boundaries and indicating preferences for the action sequence: you specify the actions once, and the robot then repeats the sequence for a specified duration (e.g. an hour) at regular intervals (say, once a week). Consider, in such a way, seeding a large variety of actions - watering plants, making mashed potatoes for dinner, tuning the bike, dry-cleaning your suits, etc. I am not imagining a single robot with the A.I. to do all these tasks (not to mention knowing when to do them) - I am rather imagining an army of simple machines that can be individually programmed by their owners (programming not in the coding sense, but in the show-by-example sense).<br />
<br />
You put on your VR (virtual reality) headset while sitting on your hotel bed in SF, you log into your Boston home, seeding a bunch of actions through the FPV cameras on your machines (machine 1 will water the plants on your balcony, machine 2 will set some bread baking, machine 3 will scan some documents for you after finding them in the relevant folders on your shelf). You do the seeding for your country cottage on Long Island as well. In this way, without moving off your bed, you have now prepared your Boston home for your arrival tomorrow, and have checked in on your cottage. Here's the critical point: none of these machines has had to be hard-wired for your house or for any of the actions you have assigned them - they simply have the capacity to learn an action and a schedule for it (which does not require any complex A.I. and is completely feasible already). It is up to you to make the difficult human decisions of setting the schedule - when and how much to water the bonsai and the petunias, how long to wait before the bread has just the right crust for your taste, which of your clothes need special attention during dry cleaning, etc. Then the machine executes a repeatable sequence. If a condition arises for which the machine requires a decision to be made, you are pinged. Next time this condition arises, the machine has a stored solution. With time, your army of machines has been customized to all your preferences, and you have been taken out of the loop for anything that does not require an expert opinion (yours) or an indication of specific preferences (yours as well). Your mind becomes freed from anything at all repeatable or redundant, with its capacities available for the decision-making and heuristics that are the hallmark of human intelligence. You spend your time delivering instructions and managing outputs with utmost efficiency.<br />
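<br />
The control loop for each such machine is almost embarrassingly simple - here is a toy Python sketch of the idea (all task names and steps are invented for illustration; a real machine would attach actual sensing and actuation to these hooks):<br />
<pre>
import time

def check_soil():
    print("checking soil moisture")        # a demonstrated sensing step
    return "soil is bone dry"              # a condition the machine can't resolve

def water_step():
    print("watering the balcony plants")   # a demonstrated action step
    return None                            # no decision needed

def ask_owner(task, condition):
    """Ping the human once per novel condition; the answer gets stored."""
    print(f"[{task}] new condition: {condition} -- pinging owner")
    return lambda: print("owner's stored fix: water twice as long")

class SeededTask:
    """One show-by-example task: a recorded sequence plus a replay schedule."""
    def __init__(self, name, actions, interval_s):
        self.name, self.actions, self.interval_s = name, actions, interval_s
        self.decisions = {}    # condition -> owner's remembered decision
        self.next_run = 0.0    # due immediately the first time

    def run_if_due(self, now):
        if self.next_run > now:
            return
        for action in self.actions:
            condition = action()
            if condition is not None:
                if condition not in self.decisions:
                    self.decisions[condition] = ask_owner(self.name, condition)
                self.decisions[condition]()   # replay the stored solution
        self.next_run = now + self.interval_s

task = SeededTask("water plants", [check_soil, water_step], 7 * 24 * 3600)
task.run_if_due(time.time())   # weekly replay; the owner is pinged only once
</pre>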
<br />
I think we will much sooner see this type of future with simple customizable learning agents than one with the courteous all-in-one robotic butler you see in the movies. In fact, you can already remotely control your house heating via Nest and your music system via Sonos, as two examples, all from your Android devices (cell phone, watch). The next step is simply the augmentation of your control options (from clicks and menus) to actions with as many degrees of freedom as your arm gestures permit via a VR device. This puts a larger portion of your household devices and tasks at your virtual fingertips. The Internet of Things is here.Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-62132026920632388732015-04-30T15:17:00.001-07:002015-04-30T15:17:51.362-07:00Filling the internet with more cool scienceSo the second year of the extreme science SciEx video competition has come to an end, with a new series of cool videos to show for it: http://sciex.mit.edu/videos/. The goal of this initiative is to take more direct steps towards making science and engineering catchier and more (socially) shareable than, say... videos of cute kittens. Videos related to science should not be left to sit in some isolated corner of the internet, to be found only by people who were already looking for them... rather, they should be sprinkled around in every shape and form - extreme, cool, artistic, you name it. They should be able to affect, in some positive way, all sorts of individuals with diverse interests and personalities. Why? Because science teaches us to think, and individuals capable of thinking make society a better place. More sharing of science will lead to a greater appreciation of it, at least a little bit of extra understanding, and potentially less ignorance. If nothing else, by actively sharing science we promote positive attitudes towards it (even if we don't always succeed at sparking longer-term interest)... and positive attitudes lead to positive change... in people, in society, and in government. We need to educate the next generation of scientists, and we can contribute to this mission in the subtlest of ways and by taking small steps - including having more science and engineering videos circulating on the web. Let's show the world (the next generations) that science can be as extreme as an extreme sport, as beautiful as a work of art, as freeing as a dance. Let's celebrate scientists and engineers for the rockstars that they are. Let's cheer for them louder than we cheer for football players - because they are the ones changing the world we live in.<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-30603900668036196192015-02-15T10:24:00.003-08:002015-02-15T10:24:41.830-08:00Educational nuggetsQuite a while back I attended an MIT Task Force Retreat on Digital Learning. Numerous talks were given and discussions held (by various internal MIT groups and committees) about the future of online education and the issues surrounding it. One concept that stood out to me was that of "educational nuggets". I think this is very suitable (and sticky) terminology to describe the bite-sized educational modules - like the 5-10 minute lectures - that have become a popular medium for online courseware and educational websites such as Khan Academy.<br />
<br />
The idea of bite-sized lectures comes from educational research showing that a student's attention span does not extend much past about 10 minutes. A sad truth. That is not to say, however, that a student cannot internalize concepts past 10 minutes (if that were really the case, then school systems would not work at all). Rather, efficiency of learning goes down, and more mental effort needs to be expended to stay attentive - instead of, say, all that mental effort being channelled to learning the concepts.<br />
<br />
So, it seems to be most effective to present material to students for about 10 minutes at a time, and then break up the stream by giving students some time to think about the concepts, asking students questions (or asking for their questions), providing a quiz module, initiating a discussion (if applicable), etc. This allows students to more actively internalize the material, apply the concepts, and check that they have understood the past 10 minutes worth of content.<br />
<br />
The additional advantage of splitting educational material into nuggets is that it breaks a course up into little, self-encapsulated, independent units. To go back to the previous post, this provides a means for customization: both for the individual student, and for the individual course. Imagine the course of the future: you are a biologist looking to brush up on statistics. Instead of pointing you to a full course offered by the statistics department, or instead of having to specifically design a course on statistics in the biology department, you could be given a set of "nuggets" to complete. These nuggets could come from different places - from the statistics department, from the math department, from the biology department - such that when they all come together, they give you - the biologist - the statistics knowledge you need, in the right context, with maximal relevance.<br />
<br />
The concept of educational nuggets naturally raises some questions: Is everything really nugget-izable? What about foundational courses, like calculus, that need to be taken in full? Who will decide what goes into a nugget? Can many small nuggets really be equivalent to a course?<br />
I think if we become more accepting of this form of education, and of the benefits we can glean from it, the answers to these questions will start to emerge through discussion.<br />
<br />
The bigger philosophical question is whether we are changing too much, as a human species, and becoming too ADD with all the bite-sized facts, bite-sized tweets, bite-sized news, and potentially bite-sized education thrown at us. Like many things, this is a double-edged sword - and like the related notion of multitasking, it can either make or break productivity, long-term memory, and understanding. The related benefits/downsides of multitasking will be left for a future post...Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-49580154724946502282015-02-08T10:17:00.000-08:002015-02-08T10:17:10.888-08:00Education as customized paths through knowledge graphsLately I've frequently been involved in, and witness to, discussions about the upsides/downsides of online learning in comparison to traditional classroom learning. I'd like to summarize a few of my main views on this point.<br />
<br />
The traditional classroom has a 1:N ratio of teachers to students, where N grows large for many basic-level courses. Tutoring can provide a 1:1 ratio, and has been found (by multiple quantitative studies) to be more successful at getting concepts across to students. Why? Tutoring provides customization to the individual, and thus can build on the knowledge base of that individual. New information can hook onto whatever understanding the individual already has, and <u>this is what can let the concepts stick</u>. New concepts become more tightly intertwined with what the individual already knows (and perhaps cares about), and are thus more relevant than concepts presented in the most general setting, with no customization.<br />
<br />
In a recent talk of Peter Norvig's that I went to (Norvig is one of the originators of MOOCs: massive open online courses), he indicated that even artificial tutoring systems can have the same benefits as human tutors, with statistically significant gains over the traditional classroom. This is very promising, because artificial tutoring is a potentially infinite resource (unlike the finite number of good-quality human tutors). In the same talk, Norvig put up a slide of a dense knowledge graph of all the information that can be available to a student on a particular topic in a particular course (or courses). He drew some squiggly lines through this graph, standing in for the unique paths that could be taken through that material. This is the same visual representation of customized learning that I envision, and deeply believe in, for the future of education.<br />
<br />
There is no reason why different individuals should take the same paths through learning. Different types of information may be relevant to different people, and a different ordering of material may make more sense to some individuals but not others. Naturally, it should be possible to constrain which points an individual must definitely pass through for a particular course/subject (to cover the fundamentals), but the paths themselves should be less constrained (see the sketch below). This is the diversity that I referred to in my previous post, and it is why I believe that online education is the way forward.<br />
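As a rough illustration of those squiggly lines, here is a minimal Python sketch; the concepts, prerequisite edges, and relevance rule are all invented stand-ins, not Norvig's actual system.<br />
<pre>
# A toy knowledge graph: the graph is fixed, the walk through it is
# per-student. Prerequisite edges constrain the order; the student's
# interests choose among whatever is currently reachable.
prereqs = {                      # concept: the concepts it builds on
    "counting":         set(),
    "probability":      {"counting"},
    "regression":       {"probability"},
    "genetics-example": {"probability"},
    "clinical-trials":  {"regression"},
}
required = {"counting", "probability", "regression"}  # the fundamentals

def personal_path(interest):
    """Kahn-style topological walk: prerequisites always come first, but
    among the currently reachable concepts we pick the most relevant one
    (a stand-in for a learned model of the student's preferences)."""
    done, path = set(), []
    while len(done) != len(prereqs):
        ready = [c for c in prereqs
                 if c not in done and prereqs[c].issubset(done)]
        ready.sort(key=lambda c: (interest not in c, c))  # relevance first
        path.append(ready[0])
        done.add(ready[0])
    assert required.issubset(done)  # trivially true here: we visit everything
    return path

print(personal_path("genetics"))
# -> ['counting', 'probability', 'genetics-example', 'regression', 'clinical-trials']
</pre>
<br />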
<br />
We already have almost all the tools to make this a reality: (1) sophisticated machine learning algorithms that can pick up on trends in user data, detect clusters of similarly-behaving individuals, and make predictions about user preferences; (2) thorough user data through logging and cloud storage, the integration of physical and virtual presence and social networks, the integration of all of a user's applications and data (and the future "internet of things"), universal login systems, etc. A sketch of tool (1) follows.<br />
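As a small, hedged sketch of tool (1): the snippet below clusters students by behavior and assigns a new student to a group. The features and numbers are made up, and scikit-learn's KMeans is just one of many possible algorithms here.<br />
<pre>
# Cluster students by behavior, then recommend to a new student
# whatever worked for others in their cluster. All data invented.
import numpy as np
from sklearn.cluster import KMeans

# rows: students; columns: behavior features, e.g.
# (videos watched per week, quiz accuracy, forum posts per week)
X = np.array([[12, 0.9, 1], [11, 0.8, 0],    # video-first learners
              [2, 0.85, 9], [3, 0.7, 11]])   # forum-first learners

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)                   # e.g. [0 0 1 1]: two behavior groups
print(model.predict([[10, 0.75, 2]]))  # place a new student in a group
</pre>
<br />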
<br />
Thus, the question is only one of time.<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-84832657881248171852015-02-03T18:21:00.001-08:002015-02-03T18:21:23.021-08:00The Popping Rate of KnowledgeI use the term "popping rate" to refer to the amount of novel/interesting/useful material gained in a given time period. If you've ever made microwaveable popcorn, you know that once the popping rate drops past a certain point and the pops become rare, the popcorn will fry if you don't take it out of the microwave. I think the same goes for my brain when it is trying to suck up knowledge. If I'm watching a really interesting documentary, reading a good nonfiction book, or listening to a captivating talk, I can almost feel the new knowledge and facts pop and fill my brain. I consider the time well-spent if the popping rate is above a certain threshold... however, if the pops become too rare, I feel my brain frying under the lack of stimulation. That is when I know to turn off the TV, put down the book, or zone out of the talk... and pursue an activity with a higher popping rate.<br />
<br />
In fact, I've found that quantifying the informational/factual content of something using some notion of pops per minute or pops per hour (the number of novel bits of information or facts gained during that time) provides a useful frame of comparison between activities.Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-40684216541218400512015-01-31T09:25:00.000-08:002015-01-31T09:25:35.099-08:00Increasing diversity in computer scienceI recently organized a panel of MIT computer science researchers to answer questions about computer science from an audience of high-schoolers (http://web.mit.edu/cs-visit-day/qa.html). A lot of interesting discussions came out of it, and a lot to digest for panelists and audience alike.<br />
<br />
One of the things that did stick out to me was how many of the panelists did <u>not</u> like their first formal training in computer science (in school, in college). I can't say I was terribly surprised, and I'd be interested to see such a survey done across the broader CS community (e.g. at a research institution), polling for initial experiences and attitudes.<br />
<br />
Here is where I think the problem lies: computer science courses tend to cater to a very narrow audience - maybe the type of audience that likes computer/video games, or the type that likes tech gadgets, etc. Not that you can avoid it: you have to start <i>somewhere - </i>with some (salient) example or application or first program. But once you've settled on something, you might get one group of individuals hooked, but you'll also automatically repel a lot of other people (those not interested in the application area you've chosen). If that was the first and only opportunity those people had to learn computer science, they might decide they don't like it and never pick it up again - which would be an enormous shame!<br />
<br />
What's the solution? Introducing more variety and choice into the computer science curriculum - tailoring it to different tastes (and personalities!). Cater it to people who might like biology or psychology or architecture or design, and show them that computer science can provide them with a toolset, a simulation/virtual environment to test their ideas, a cool exploratory possibility. I believe this is the way forward for increasing the <u>diversity of people</u> in the field of computer science.<br />
<br />
In practice, having a lot of variety in a computer science curriculum may not be possible (consider a school with a single programming course and a single teacher to teach it)... in this case, I think online education, with its possibilities for individual customization, can come to the rescue... more about this later.Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com6tag:blogger.com,1999:blog-338911558031950356.post-86486433714476006662014-11-12T16:42:00.002-08:002014-11-12T16:42:43.487-08:00Externalizing creativityTo follow up on my previous post, I watched a relevant TED talk the other day: http://www.ted.com/talks/elizabeth_gilbert_on_genius<br />
<br />
A popular writer describes why being an artist - or, more generally, being creative - is so hard on a person: you get subjectively evaluated for your creations, you have to worry about not having enough creative ideas to fuel you further, and you continuously have to compete with yourself (specifically, with your own creativity and imagination). All of this can lead to depression (a depressed creative genius is not a rare sight).<br />
<br />
She mentions that the ancient Greeks and Romans got it right: they attributed creativity to an external spirit that either did a good job or didn't. This is the type of thinking that could shield an artist from his own creative genius, by making creativity an external factor. A successful artist need not suffer from narcissism, and an unsuccessful one need not blame himself (rather, blame the lame spirit).<br />
<br />
I came to the same conclusions about the difficulty of being a grad student, and for the same reasons. Grad students are creative individuals who first have to conjure up new problems... and then solve them. Learning to separate oneself from one's ideas can do wonders for sanity.<br />
<br />
Here's a possible approach: when you come up with an idea, step away from it (go as far as sending yourself an e-mail with the idea, or leaving an "anonymous" note on your own desk). Then pick up the note, read the idea, and critique it (praise the idea, or politely refuse to follow through with it). If the idea doesn't end up succeeding in the long run, shrug your shoulders: the anonymous note writer might leave you a better idea next time...<br />
<br />
----<br />
<i>post scriptum:</i> I often catch myself unable to fall asleep because all the good ideas start coming at 1 am. There are many of them, and they're all scattered. Paralleling Liz Gilbert's story about the poet on the road, I want to shake my fists at all this "external" creativity and yell: "just let me sleep once in a while, will ya? come back tomorrow morning when my brain can sort through all this"Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-25222579507347753822014-11-12T16:09:00.000-08:002014-11-12T16:09:04.435-08:00The double-think game of being a grad studentMany people have written about grad school, often in a comical (even satirical) light. Perhaps I will leave my own detailed grad-school comments for after I graduate (it's safer that way :))<br />
<br />
But my view of why it can be so difficult on a person is that the issues faced and the pressures that exist are more often internal rather than external. And internal issues are the worst kind to deal with. Why? You are your own boss; you know all the weak spots and how to press on them. Probably no one can be as severe to you as you can be to yourself. Nothing you do is enough.<br />
<br />
When an issue/pressure is external, you can distance yourself from it. An internal issue eats you from the inside, and there's no hard shell you can put up in your defense.<br />
<br />
As a grad student you have to engage in a very difficult kind of double-think game: you have to care enough about your problems to push through them, but you have to know how to distance yourself from them at the right time, not dwell, and not let them get to you.<br />
<br />
The good news is that grad students are the type of individuals who love challenges, and learning to play this double-think game is just one more challenge to be conquered.<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-40779100617148890742013-10-14T09:36:00.001-07:002013-10-16T18:57:05.177-07:00Rabbits have good eyesight... they don't stare at computersIt is no wonder that when you spend most of the day staring at the computer, your eyes get progressively worse at seeing distant objects: after all, you are increasingly training your eyes to read up close, and losing the need to see anything far away. Recently, this fact hit me hard when I went to the grocery store and realized I could no longer make out the aisle banners. Those banners are made for average people, with average eyesight, and I am not used to being below average!<br />
<br />
It turns out that exercising your eyes regularly can help prevent some deterioration (or at least the type that is caused by spending most of the time in close focus, without training a farther focus). For some ideas, check out this page: http://www.wikihow.com/Exercise-Your-Eyes<br />
<br />
In an effort to train my own eyes, I covered my office with those eye charts you would see at the eye doctor's. I put these on multiple walls, at different distances and different heights, all around my office and in the corridor across from it. I now force myself to take regular breaks from my computer, to look up and read out some of the letters on the various walls.<br />
<br />
When there is no horizon outside your window (only the wall of the nearest building), you have to improvise and create your own horizons to train your eyes on.<br />
<br />
It is ok that I eventually learn which letters are on which walls - because the point is not to read them out from memory, but to force one's eyes to trace the letter contours. Even if I know there's an "E" there, it's still hard to make out all its pieces.<br />
<br />
Ever since starting this, I've noticed my eyes getting tired of staring at the screen (something I hadn't noticed, or had probably ignored, previously) - a sign that my eyes know they should be doing something else as well. Moreover, I can track my progress by noting which letters my eyes can make out. Similarly, I can see how my eyesight degrades throughout the day (as my eyes get more tired) if I don't do the exercises.<br />
<br />
In any case, even if this type of eye training will not save me from the expected deterioration due to excessive computer usage, at least the regular breaks of looking away from the monitor will help my sanity.<br />
<br />
<br />Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0tag:blogger.com,1999:blog-338911558031950356.post-18415955689509878182012-09-17T16:00:00.000-07:002012-09-17T16:01:04.884-07:00"the web, the chain, the tree"Last week I went to a talk by Steven Pinker on elements of style (<a href="http://web.mit.edu/nse/events/communicating-science-and-technology.html">http://web.mit.edu/nse/events/communicating-science-and-technology.html</a>). Of the whole talk, what I found most memorable was his description of the data structures for the representation of information: as he puts it - 'the web, the chain, the tree'. Our ideas are stored as an interconnected web, and we are often faced with the dilemma of writing them down in a linear structure (one sentence at a time, transitioning between at most two ideas: the previous and the current sentence). To do this, we arrange our web into a hierarchy of ideas, which then dictates the overall order in which we can express it. In other words, we go from web to hierarchy, and then parse the hierarchy to arrive at a linear structure we can put down on paper. This is a great way to put it! I've always been faced with the dilemma of wanting to say too many things at once, transitioning from any one idea to ten others, but have always been constrained by the necessary linear ordering of paragraphs. By the time you're done explaining the connection between the first and second ideas, you've lost a possible transition from the first to the third (often requiring a return to the first idea and a new transition). I don't think this linear structure serves as a great representation of the ideas I want to express - I'd much rather hand in my essays in web form... (a toy rendering of web-to-chain follows below)<br />
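To make the web-to-tree-to-chain idea concrete, here is a minimal Python sketch of my own (the idea graph and the traversal choice are invented; Pinker's talk contained no code).<br />
<pre>
# Ideas start as a web (a graph of associations); a depth-first walk
# that visits each idea once implicitly carves a spanning tree out of
# the web, and the visit order is the chain we can write down.
web = {
    "thesis":       ["evidence", "counterpoint"],
    "evidence":     ["example", "counterpoint"],
    "counterpoint": ["thesis"],
    "example":      [],
}

def to_chain(root):
    seen, chain = set(), []
    def visit(idea):
        if idea in seen:    # a cross-link the linear order must drop
            return          # (or reintroduce as an explicit back-reference)
        seen.add(idea)
        chain.append(idea)
        for neighbor in web[idea]:
            visit(neighbor)
    visit(root)
    return chain

print(to_chain("thesis"))
# -> ['thesis', 'evidence', 'example', 'counterpoint']
</pre>
Zoyahttp://www.blogger.com/profile/14823753189268642715noreply@blogger.com0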