Jerin Philip

Education

International Institute of Information Technology Hyderabad, India

Aug 2014 - Aug 2020

B.Tech and MS by Research in Computer Science and Engineering

Thesis: Neural and Multilingual approaches to Machine Translation in Indian languages

Work Experience

University of Edinburgh, Research Assistant, Edinburgh, United Kingdom

Sep 2020 - Present

Research engineering role as part of the Bergamot Project. Collaborating with academic and industry partners to build a browser extension that uses neural models to translate on the client machine.

Largest individual contributor to the bergamot-translator C++ library. Implemented multithreaded library code from scratch, operating under tight memory requirements. Integrated testing and documentation tooling, along with infrastructure supporting continuous integration (GitHub Actions, Jenkins), over the course of development.
Implemented optimized assembly using SIMD intrinsics on ARM to provide a fast inference backend following specifications from the Mozilla Firefox web browser. Currently working on integrating the feature as a backend to the open-source marian-dev library.
Built and currently maintain Python bindings for the C++ library, enabling faster experimentation on non-performance-critical features.

Advised by:
Kenneth Heafield

NAVER LABS Europe, Research Intern, Meylan, France

Feb 2020 - Jul 2020

Investigated parameter efficiency in multilingual neural machine translation. Explored the use of adapter layers in already trained large models as a means of domain adaptation and fine-tuning within a limited training budget, without degrading the original model's performance. Work published at EMNLP 2020; a patent application is in progress.

International Institute of Information Technology, Research Assistant, Hyderabad, India

Aug 2018 - Dec 2019

Research experience spread across machine learning, text recognition and machine translation.

Multiway Neural Machine Translation for Indian languages: Extended the fairseq library with multilingual support to train multiway NMT models for 10 Indian languages. Used the trained models to create a feedback loop to acquire more data. Published at LREC 2020 and ACM CODS-COMAD 2021.
Text Detection and Recognition: Worked on the digital library project in the text group at CVIT. Provisioned SWIG-based Python bindings for the C++ library rnnlib. Upgraded this OCR infrastructure to a modern deep-learning stack based on PyTorch. This was used in further experimentation on optimizing human annotation cost in a large digitization effort, published at ICDAR 2019.

Select Publications

Revisiting Low Resource Status of Indian Languages in Machine Translation (ACM CODS-COMAD 2021)
Monolingual Adapters for Zero Shot Neural Machine Translation (EMNLP 2020)
A Multilingual Parallel Corpora Collection Effort for Indian Languages (LREC 2020)
A Cost Efficient Approach to Correct OCR Errors in Large Document Collections (ICDAR 2019)
Towards Automatic Face to Face Translation (ACM-MM 2019)

Select Projects

bergamot-translator

C++ Python CMake SentencePiece pybind11 JavaScript WebAssembly marian

Neural Machine Translation (NMT) on the client machine. Built on top of the Marian NMT library. Powers Mozilla Firefox's privacy-focused local translation feature.

MozIntGemm

C++ Assembly ARM NEON

Integer GEMM wrapper library built on top of google/ruy to enable efficient matrix multiplication for neural models on ARM CPUs.

ilmulti

Python Flask PyTorch

Multilingual NMT models - data collection, training and inference for translating between 11 language pairs spoken in the Indian subcontinent.

lemonade

C++ iBus

Input Method Engine (IME) enabling real-time translation into a target language while typing in the source language. Application use-case of bergamot-translator.

Parallel SVD

C++ CUDA cuBLAS Jupyter Python

Singular Value Decomposition implementation for CPU and GPU using Householder reflections and Givens rotations. Implemented as part of the Introduction to Parallel and Scientific Computing course.

MaskGAN.PyTorch

Python PyTorch SentencePiece

A PyTorch reimplementation of "MaskGAN: Better Text Generation via Filling in the _______" by William Fedus, Ian Goodfellow, and Andrew M. Dai.

tagtransfer

Python Flask

Experimental Python code written to evaluate and aid the HTML translation feature of bergamot-translator. Implements a Google Translate-style web page translation feature to provide fast visual feedback on the upstream library's HTML translation capabilities.

Matching Handwritten Document Images

Python PyTorch OpenCV

Built a content-matching system for handwritten documents using representations learned by a Convolutional Neural Network trained to classify handwritten words. Worked with text-detection systems, followed by bipartite matching and IR noise removal techniques, to match regions of one document to another.

Multilingual Sentence Embeddings

Python PyTorch faiss

Stacked LSTM encoder and decoder with a bottleneck at the sentence encoding to obtain language-agnostic sentence representations. The PyTorch implementation supports multi-node, multi-GPU parallelism.

Movie Recommendation System

Django sklearn pandas

Low-rank approximation via SVD++, serving recommendations through a web application built in Django.

flat-b compiler

C++ yacc LLVM

Undergraduate course project. Parsing, interpreting, and codegen via LLVM. Implemented parsing using YACC. Built an interpreter. Further extended the implementation to emit LLVM IR to be compiled into machine code.

MiniSQL engine

Python SQL

Mini SQL engine that processes SQL queries. Implemented a parser using pyparsing that converts a subset of SQL into an abstract syntax tree (AST). Queries are executed by evaluating the parsed AST.

Volunteering

University of Edinburgh, GPU Cluster Administrator, Edinburgh, United Kingdom

Sep 2020 - Present

Help maintain the Valhalla GPU cluster of about 20 machines, managed by the group to run machine learning experiments. Responsibilities include replacing hardware such as GPUs and hard disks, and provisioning software updates to keep the cluster running smoothly.

The Wee Spoke Hub, Bicycle Mechanic, Edinburgh, United Kingdom

Sep 2020 - Present

Fixing donated bicycles that are in need of repair before they can be reused.

International Institute of Information Technology, Cluster System Administrator, Hyderabad, India

Jan 2017 - Dec 2019

Configured and deployed SLURM and its supporting ecosystem on a lab cluster of 20 High Performance Computing (HPC) nodes. Maintained the software needed to run deep learning experiments on the machines, supporting a community of ~100 research scholars. Also volunteered for moderation duties on the larger university HPC cluster.