Jerin Philip

Education

International Institute of Information Technology Hyderabad, India
Aug 2014 - Aug 2020
B.Tech and MS by Research in Computer Science and Engineering
Thesis: Neural and Multilingual approaches to Machine Translation in Indian languages

Work Experience

University of Edinburgh, Research Assistant, Edinburgh, United Kingdom
Sep 2020 - Present

Research engineering role as part of the Bergamot Project. Collaborating with academic and industry partners to build a browser extension that uses neural models to translate on the client machine.

  • Largest individual contributor to the bergamot-translator C++ library. Implemented multithreaded library code from scratch, operating under tight memory requirements. Integrated testing and documentation tooling, and infrastructure supporting continuous integration (GitHub Actions, Jenkins), over the course of development.
  • Implemented optimized code using SIMD intrinsics on ARM to provide a fast inference backend following specifications from the Mozilla Firefox web browser. Currently working on integrating the feature as a backend to the open-source marian-dev library.
  • Built and maintain Python bindings for the C++ library, enabling faster experimentation on non-performance-critical features.

NAVER LABS Europe, Research Intern, Meylan, France
Feb 2020 - Jul 2020

Investigated parameter efficiency in multilingual neural machine translation. Explored adding adapter layers to already trained large models as a way to perform domain adaptation and fine-tuning within a limited training budget, without degrading performance on the original task. Work published at EMNLP 2020; a patent application is in progress.

International Institute of Information Technology, Research Assistant, Hyderabad, India
Aug 2018 - Dec 2019

Research experience spanning machine learning, text recognition, and machine translation.

  • Multiway Neural Machine Translation for Indian languages: Extended the fairseq library with multilingual support to train multiway NMT models for 10 Indian languages. Used the trained models to create a feedback loop to acquire more data. Published at LREC 2020 and ACM CODS-COMAD 2021.
  • Text Detection and Recognition: Worked on the digital library project in the text group at CVIT. Provided SWIG-based Python bindings to the C++ rnnlib library. Upgraded this OCR infrastructure to a modern deep-learning stack based on PyTorch. The upgraded stack was used in further experiments on optimizing human annotation cost in a large digitization effort, published at ICDAR 2019.

Select Publications

Select Projects

C++ Python CMake SentencePiece pybind11 JavaScript WebAssembly marian

Neural Machine Translation (NMT) on the client machine. Built on top of the Marian NMT library. Powers Mozilla Firefox's privacy-focused local translation feature.

C++ Assembly ARM NEON

Integer GEMM wrapper library built on top of google/ruy to enable efficient matrix multiplication for neural models on ARM CPUs.

Python Flask PyTorch

Multilingual NMT models: data collection, training, and inference for translation across 11 language pairs covering languages spoken in the Indian subcontinent.

C++ iBus

Input Method Engine (IME) enabling real-time translation into a target language while typing in the source language. An application of bergamot-translator.

C++ CUDA cuBLAS Jupyter Python

Singular Value Decomposition implementation for CPU and GPU using Householder reflections and Givens rotations. Implemented as part of the Introduction to Parallel and Scientific Computing course.

Python PyTorch SentencePiece

A PyTorch reimplementation of "MaskGAN: Better Text Generation via Filling in the _______" by William Fedus, Ian Goodfellow, and Andrew M. Dai.

Python Flask

Experimental Python code written to evaluate and aid development of the HTML translation feature in bergamot-translator. Implements a web page translation feature in the style of Google Translate to provide fast visual feedback on the upstream library's HTML translation capabilities.

Python PyTorch OpenCV

Built a content-matching system for handwritten documents using representations learned by a Convolutional Neural Network trained to classify handwritten words. Used text-detection systems followed by bipartite matching and IR noise removal techniques to match regions of one document to another.

Python PyTorch faiss

Stacked LSTM encoder and decoder with a bottleneck at the sentence encoding to obtain language-agnostic sentence representations. The PyTorch implementation supports multi-node multi-GPU parallelism.

Django sklearn pandas

Low-rank approximation using SVD++ to generate recommendations, served through a web application built in Django.

C++ yacc LLVM

Undergraduate course project. Parsing, interpreting, and codegen via LLVM. Implemented parsing using YACC. Built an interpreter. Further extended the implementation to emit LLVM IR to be compiled into machine code.

Python SQL

Mini SQL engine that processes SQL queries. Implemented a parser using pyparsing that converts a subset of SQL into an abstract syntax tree (AST). Queries are executed by evaluating the parsed AST.

Volunteering

University of Edinburgh, GPU Cluster Administrator, Edinburgh, United Kingdom
Sep 2020 - Present

Helping maintain the Valhalla GPU cluster of about 20 machines, managed by the group to run machine learning experiments. Responsibilities include replacing hardware such as GPUs and hard disks and provisioning software updates to keep the cluster running smoothly.

The Wee Spoke Hub, Bicycle Mechanic, Edinburgh, United Kingdom
Sep 2020 - Present

Fixing donated bicycles that are in need of repair before they can be reused.

International Institute of Information Technology, Cluster System Administrator, Hyderabad, India
Jan 2017 - Dec 2019

Configured and deployed SLURM and its supporting ecosystem on a lab cluster of 20 High Performance Computing (HPC) nodes. Maintained the software needed to run deep learning experiments on the machines, supporting a community of ~100 research scholars. Also volunteered for moderation duties on the larger university HPC cluster.