Natural Language Image Search with Yahoo Boss and Google App Engine

Natural language processing is partly used in text search today, but its use in image search is mostly unexplored. I did a quick hack, askBoss, which retrieves images in response to questions posed in natural language. askBoss attempts to enhance image results for factual question-answering queries. It uses the Yahoo Boss (Search) APIs through the Boss Mashup Framework and is deployed on Google App Engine.

This hack is an extension of Vik Singh’s qna service, which finds answers using the popular phrases in the top search results for a query. I run an image search for the best answers and blend those images with the regular image search results. The hack is a basic prototype, and natural language image search is triggered only for questions (queries that include who/what/which).
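The trigger and the blending step can be sketched in a few lines of shell. This is only an illustration, not the askBoss code: the function names `is_question` and `blend` are hypothetical, and interleaving the two result lists is an assumed blending strategy, since the post does not say how the results are mixed.

```shell
# is_question QUERY: succeed if the query contains who/what/which as a whole word.
is_question() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
        grep -Eq '(^|[^a-z])(who|what|which)([^a-z]|$)'
}

# blend ANSWER_FILE REGULAR_FILE: interleave two newline-separated result
# lists, answer-based results first, and drop duplicates while keeping the
# first occurrence. Interleaving is an assumption made for this sketch.
blend() {
    # paste alternates lines from both files; awk drops the empty lines left
    # by the shorter list and any line that was already printed.
    paste -d '\n' "$1" "$2" | awk 'NF && !seen[$0]++'
}
```

A caller would check `is_question "$query"` first and fall back to the plain image results when it fails.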

Try askBoss:

Below is a quick comparison of the search results returned by askBoss, Google image search and Y! image search for the query: who is batman in the dark knight?

askBoss results: who is batman in the dark knight?

[Screenshot: askBoss results]

Google image search results: who is batman in the dark knight?

[Screenshot: Google image search results]

Yahoo Image search results: who is batman in the dark knight?

[Screenshot: Yahoo image search results]

With the Yahoo Boss APIs and a deployment platform like Google App Engine, building a decent search service is pretty easy. I finished this hack within a few hours using the Boss Mashup Framework and App Engine. Apart from the qna service, other popular Boss API/App Engine integrations include 4hoursearch, aka YUIL.

Update 1: askBOSS got covered in TechCrunch and Yahoo Search Blog.

Update 2: askBOSS now runs on BOSS v2, and the code is open sourced on GitHub.

The new search era, where are we?

I have been damn lazy about writing this post, but after recently reading a RWW post, 11 Search Trends That May Disrupt Google, I decided to gather my thoughts here.

Adding to the RWW post, let me try to bring up some minuses and pluses of today's search era, ruled by Google and partly by Yahoo/MS.

Things that the popular search engines still do not handle well:

  • Natural language Processing:
    • We would like questions such as “Which is the world’s tallest mountain peak?” to be answered directly in search: Google/Yahoo could not answer it, but to my surprise did it! Still, we have to wait for a breakthrough.
    • Ignoring stop words, stemming words, etc. can change the meaning of a query significantly. For example, searching for Apples on Google returns results mainly for Apple Inc.
  • Multi-lingual search: With the web driven by a focus on the US market, the problems of the rest of the world (especially the eastern world) do not get sufficient attention. Today’s web search experience does not have multi-lingual features!
    I spent a couple of years during my master's at Media Lab Asia, IIT Bombay, under Prof. Krithi, with people working on multi-lingual search for a project. Multi-lingual search works pretty well there: try searching “onion” or “कांदा“, and you get identical results :) . Such a search experience on the whole web would be awesome!
  • Treatment of semantic data: Lots of standard formats have emerged, like RDF, microformats, RSS, etc., but they are still treated in almost the same way as other web pages.
  • Personalization & Data mining: There are a few signs of Google personalizing the results, but nothing significant yet!
  • Multimedia search: None of the search engines is doing a great job here, which is attributed to complex and computationally expensive image processing. But significant research on this is under way at Google, Yahoo and Microsoft. A recent WWW08 publication from Googlers proposed ImageRank, a concept similar to PageRank, which could actually work well.
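To make the stemming point from the natural language bullet concrete, here is a toy sketch. The `naive_stem` function is hypothetical and far cruder than what a real search engine does, but it shows how lowercasing and stripping a plural suffix make “Apples” indistinguishable from “Apple”.

```shell
# naive_stem WORD: lowercase the word and strip a single trailing "s".
# A deliberately crude stand-in for a real stemmer.
naive_stem() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/s$//'
}
```

Both `naive_stem Apples` and `naive_stem Apple` print the same token, so an index built on such tokens can no longer tell the fruit from the company.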

Some of the cool innovations in today's search:

  • Improved UI/visualizations: UI innovations are the most prominent of all. Here are a few examples:

  • Openness/APIs: Google/Yahoo have been pretty open in terms of providing search APIs, applications, etc. Want to experience Google search in a terminal? Try out
  • Specialized searches like local/maps: Local/maps and other focused searches, like publication search, patent search, etc., are doing pretty well. Directions are now available in India too, with Yahoo Maps the only provider :)

What else can be tried on search?

There are a couple of things which I think can work for search, but we need to overcome spam problems for these:

  • WikiSearch: Allow users to tag/rank search results. Something like digg/delicious for keywords…
  • Push-based update notification model: Search results are not up to date. Even for popular pages they lag by a few days. Would introducing a push-based model help?

Update: Nov 20: Google has released SearchWiki, my first suggestion/prediction comes true :)

Disclaimer: All opinions are solely mine and do not necessarily reflect the opinions of my employer.

Flickr Downloader

Yesterday, a friend of mine was looking for a way to download the original photos from one of his sets on Flickr. Since we could not find any Flickr downloader that runs on Linux (there are tools you can try on Windows), I quickly wrote a script using the Flickr APIs. The power of sed and awk made it very easy :)

Below is the script:

SET="<set-id>" # Enter the ID of the set to download photos from, e.g. SET="72157604130281022"
APPKEY="" # Your Flickr API key here
curl "
api_key=$APPKEY&photoset_id=$SET&extras=original_format&per_page=500" | sed 's/title="[a-zA-Z0-9_ :) ?(.]*//g' | awk '/id=/ {print ""$4"/"$2"_"$8"_o.jpg" }' | sed 's/\(server=\|originalsecret=\|id=\|"\)//g' > p
wget -i p
rm p
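The same idea can be sketched with the attributes pulled out by name instead of by field position. This is a hedged illustration, not the original script. The REST endpoint, the `flickr.photosets.getPhotos` method name and the `live.staticflickr.com` URL pattern are assumptions based on Flickr's public API documentation, and the helper names `attr`, `original_url` and `list_photos` are hypothetical.

```shell
# Assumed endpoint; not taken from the original script.
REST_URL="https://api.flickr.com/services/rest/"

# attr NAME: print the value of XML attribute NAME from the element on stdin.
attr() {
    sed -n "s/.* $1=\"\([^\"]*\)\".*/\1/p"
}

# original_url '<photo ... />': print the original-size image URL.
# The URL pattern is an assumption and may need adjusting.
original_url() {
    server=$(printf '%s\n' "$1" | attr server)
    id=$(printf '%s\n' "$1" | attr id)
    osecret=$(printf '%s\n' "$1" | attr originalsecret)
    oformat=$(printf '%s\n' "$1" | attr originalformat)
    printf 'https://live.staticflickr.com/%s/%s_%s_o.%s\n' \
        "$server" "$id" "$osecret" "$oformat"
}

# list_photos APPKEY SET_ID: print one <photo .../> element per line.
# Needs network access; titles containing ">" would be cut short.
list_photos() {
    curl -s "$REST_URL?method=flickr.photosets.getPhotos&api_key=$1&photoset_id=$2&extras=original_format&per_page=500" |
        grep -o '<photo [^>]*>'
}
```

Piping `list_photos "$APPKEY" "$SET"` through a `while IFS= read -r photo` loop that calls `original_url "$photo"` would produce a URL list suitable for `wget -i`. Reading `originalformat` also avoids hardcoding the `.jpg` extension.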

You can also download this script by clicking here

Update: Script does not work for video download.

Yahoo! acquires IndexTools

Early this week, Yahoo acquired IndexTools, an analytics company. IndexTools offers tools for monitoring and analyzing websites.

The interesting thing, which happened the very next day (10th April) after the acquisition, was an email sent by the Google Analytics team to all (or many) Analytics users, notifying them about the benchmarking feature, which had launched on March 5th!

The email stated:

“We are writing to let you know about a change in our service offerings. If you have logged into your account recently, you may have noticed that you can now choose to share your Google Analytics data. … We’re also happy to announce industry benchmarking as the first new feature available …. Benchmarking lets you compare your metrics against industry verticals….”

I feel that the email arriving the day after the acquisition was not a coincidence; rather, Google wanted to do its best to keep any chunk of users from migrating to Yahoo/IndexTools. If Google just had to notify Analytics users about the service change, it should have done so a month earlier.

Add more life to your photos!

Last week, Flickr introduced the long-awaited feature to add video clips.

Flickr videos come with a few restrictions:
1. Videos can be uploaded only by pro members, though anyone can see them
2. Videos can be at most 90 seconds long
3. No more than 150 MB per video. (Well, I do not see any 90-second video going beyond 150 MB)

The restrictions are seen more as a way to keep out illegal videos that infringe copyrights. The goal is not to have another YouTube but rather a place where you can upload videos that you have created.

As noted by Michael Arrington, the feature to play videos from the thumbnail screen is just awesome!

Videos can also be embedded in web pages, just like this:

We at MyBlogLog immediately updated “New with me“ to distinguish between Flickr videos and photos.

Videos have mostly been received as a positive move. But there are segments of Flickr users who oppose them. They have started groups on Flickr opposing videos (like No video on flickr) and have been posting photos to support their protest. TechCrunch has even started a poll: “Do you support video on flickr“? As of now, supporters of video outnumber opponents by 14%.