Monthly Archives: October 2020

first steps towards the first triangulation

hand detection through AI

The research question I am exploring in this first triangulation comes from the visual essay I produced at the end of last year, and it revolves around the possibility of interpreting the gestures of the handshake as prearranged signs. My original idea was to create an archive of sorts, storing a substantial number of journalistic photos and videos of politicians in the act of greeting each other and shaking hands. Through the archive it would be possible to highlight analogies and differences among the gestures and eventually organise a taxonomy.

The method I chose to adopt is to implement either a tool or an algorithm in order to create a series of subsequent iterations. This is the method I am least familiar with among the ones proposed; however, it is the one that interests me the most. Moreover, I believe this initial exploratory phase is a good moment to take some risks and possibly gain a new skill.

Last year I tried to mask a video of a meeting between Macron and Trump in order to highlight their hands and the gestures they were tracing in space. At the time I used motion tracking in After Effects, which soon turned out to be extremely time-consuming, so I set it aside.

Recently I found a fairly new JavaScript library called Handtrack.js.

“Handtrack.js is a library for prototyping realtime hand detection (bounding box), directly in the browser. Underneath, it uses a trained convolutional neural network that provides bounding box predictions for the location of hands in an image. The convolutional neural network (ssdlite, mobilenetv2) is trained using the tensorflow object detection api” (source: https://github.com/victordibia/handtrack.js/)

I delved into the library and, following some online resources, I managed to create an app that uses the webcam to detect the presence and position of hands in the video stream.
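The core of such an app can be sketched in a few lines. This is my own minimal reconstruction, assuming a page with a `video` and a `canvas` element and using the documented Handtrack.js calls (`handTrack.load`, `handTrack.startVideo`, `model.detect`, `model.renderPredictions`); the element ids and the score threshold are assumptions, not the exact code of my app:

```javascript
// Minimal Handtrack.js webcam sketch (assumed setup, not the exact app code).
// Requires <video id="video"> and <canvas id="canvas"> elements in the page,
// and the handtrack.js script loaded beforehand.

// Keep only predictions the model is reasonably confident about.
// Handtrack.js predictions look like { bbox: [x, y, w, h], class: ..., score: 0.9 }.
function filterHands(predictions, minScore = 0.6) {
  return predictions.filter((p) => p.score >= minScore);
}

async function run() {
  const video = document.getElementById("video");
  const canvas = document.getElementById("canvas");
  const ctx = canvas.getContext("2d");

  // Load the pre-trained model, then start the webcam stream.
  const model = await handTrack.load();
  await handTrack.startVideo(video);

  // Detect hands on every animation frame and draw the bounding boxes.
  async function loop() {
    const predictions = await model.detect(video);
    model.renderPredictions(filterHands(predictions), canvas, ctx, video);
    requestAnimationFrame(loop);
  }
  loop();
}
```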

The detection is pretty accurate, but it loses precision as the distance increases. This could depend on the quality of the video streamed through the webcam. For my purposes it would be necessary to verify whether the app can still detect hands from afar with a high-quality video, as my intention now is to upload videos into the app, isolate the hands of the politicians and finally export a series of still frames to enrich the archive. A further exploration from there could be to develop code that finds similarities among the gestures.
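The frame-export step I have in mind could work roughly like this: take the bounding box of a detected hand, crop that region out of the current video frame with a canvas, and save the result as a still image. This is a sketch of my assumption of how it could be done, not an existing feature of the app; the function names are hypothetical:

```javascript
// Sketch of the planned frame-export step (my own assumption, not yet built):
// crop each detected hand out of a video frame and export it as a still image.

// Clamp a Handtrack.js bbox ([x, y, width, height]) to the frame dimensions,
// so the crop never reads outside the video.
function clampBox([x, y, w, h], frameW, frameH) {
  const left = Math.max(0, Math.min(x, frameW));
  const top = Math.max(0, Math.min(y, frameH));
  return [left, top, Math.min(w, frameW - left), Math.min(h, frameH - top)];
}

// Draw the cropped region onto an offscreen canvas and return a data URL
// that can be downloaded or stored in the archive.
function exportHandStill(video, bbox) {
  const [x, y, w, h] = clampBox(bbox, video.videoWidth, video.videoHeight);
  const canvas = document.createElement("canvas");
  canvas.width = w;
  canvas.height = h;
  canvas.getContext("2d").drawImage(video, x, y, w, h, 0, 0, w, h);
  return canvas.toDataURL("image/png");
}
```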

So far, with my knowledge of coding, I have managed to pass the videos of the politicians to the app, but inexplicably no hands are detected. To make sure the problem was not the politicians' videos themselves, I also tried another video in which two hands are very close to the camera and perfectly visible; even in this case there was no detection. This makes me assume there is an error in the code, which I hope to solve soon.
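One cause I want to rule out (this is an assumption on my part, not a confirmed diagnosis) is running the detection before the uploaded video element has actually decoded any frames, in which case there would simply be nothing for the model to see. A small sketch of the check:

```javascript
// Debugging sketch (an assumed cause of the bug, not a confirmed fix):
// running model.detect() before the <video> element has decoded any frames
// would yield no predictions, so wait until the video is ready first.

// readyState >= 2 (HAVE_CURRENT_DATA) means at least the current frame is decoded.
function isVideoReady(video) {
  return video.readyState >= 2;
}

// Resolve once the uploaded video has frame data available for detection.
function whenReady(video) {
  return new Promise((resolve) => {
    if (isVideoReady(video)) return resolve();
    video.addEventListener("loadeddata", () => resolve(), { once: true });
  });
}

// Hypothetical usage:
//   await whenReady(video);
//   const predictions = await model.detect(video);
```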