ICSI Works with Yahoo Labs and Lawrence Livermore Lab to Offer Analytics Tools for Over 100 Million Flickr Images and Videos

50TB computing program runs analysis on the entire Flickr Creative Commons dataset, one of the largest multimedia datasets ever released to the public

July 3, 2014
ICSI today announced a collaboration with Yahoo Labs and Lawrence Livermore National Laboratory to process and analyze the recently released Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset, a publicly available corpus of user-generated content comprising more than 100 million images and videos.

ICSI has developed a number of research tools to extract meaning from the vast amounts of multimedia data freely available online, giving researchers the ability to draw powerful conclusions from the data. Such work includes:

  • Audio and visual recognition techniques that can reliably identify the geographic location where a video or photo was captured.
  • Video concept detection, which uses acoustic analysis and segmentation of similar sounds to treat sounds like keywords, making it possible to reliably search for abstract concepts like “baby catching a ball” or “animal dancing to music.”

ICSI is collaborating directly with Lawrence Livermore Lab to process the massive dataset using the lab’s supercomputer, the Cray Catalyst.

“The media that people choose to upload with a Creative Commons License are full of information: they tell us about the people in them, where they are and what is happening, even if none of that is explicitly laid out,” said Gerald Friedland, research director of Audio and Multimedia at ICSI. “ICSI’s sophisticated computing tools help us make sense of that data at scale, and there is so much we can learn by fully leveraging the rich Creative Commons dataset that Flickr has amassed over the past decade.”

The dataset can be requested through Yahoo’s Webscope program, and ICSI’s research analytics tools will be hosted on an Amazon instance via ICSI’s web site in August of this year.

The development of ICSI’s research tools is supported by grants from the NSF, NGA, and IARPA’s ALADDIN program. Read more about the Audio and Multimedia group's work.