This module was written from scratch and efficiently identifies the constituent words of a given agglutinated word. Identifying the split points is language-agnostic, while the splitting and joining steps are currently rule-based.
The module is hosted at libindic/sandhi-splitter.
Documentation, with examples of training, testing and using the API, is currently hosted on gh-pages and is available here.
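To make the division of labour described above more concrete, here is a minimal sketch of a language-agnostic split-point identifier followed by a rule-based splitter. It is not the sandhi-splitter API: the class `SplitPointModel`, the function `split_word`, the trigram-context scoring and the threshold are all assumptions made for illustration, and the single rule (Sanskrit-style a + i → e) is a toy stand-in for a real rule set.

```python
from collections import defaultdict


class SplitPointModel:
    """Hypothetical, language-agnostic model: it only counts how often a
    character trigram is centred on an annotated split point."""

    def __init__(self):
        self.split_counts = defaultdict(int)
        self.total_counts = defaultdict(int)

    @staticmethod
    def _contexts(word):
        # Pad the word so that every character has a left and right neighbour.
        padded = "^" + word + "$"
        return [padded[i:i + 3] for i in range(len(word))]

    def train(self, word, split_positions):
        # split_positions: indices of characters produced by joining two words.
        for i, ctx in enumerate(self._contexts(word)):
            self.total_counts[ctx] += 1
            if i in split_positions:
                self.split_counts[ctx] += 1

    def find_split_points(self, word, threshold=0.5):
        points = []
        for i, ctx in enumerate(self._contexts(word)):
            total = self.total_counts.get(ctx, 0)
            if total and self.split_counts.get(ctx, 0) / total > threshold:
                points.append(i)
        return points


# Toy rule table (assumption): joined character -> (end of left word, start of right word).
SPLIT_RULES = {"e": ("a", "i")}


def split_word(word, points, rules=SPLIT_RULES):
    """Rule-based step: undo the join at each identified split point."""
    parts, start, carry = [], 0, ""
    for p in points:
        # Without a matching rule, the character simply stays with the left part.
        left_end, right_start = rules.get(word[p], (word[p], ""))
        parts.append(carry + word[start:p] + left_end)
        carry, start = right_start, p + 1
    parts.append(carry + word[start:])
    return parts


model = SplitPointModel()
model.train("mahendra", {3})   # maha + indra
model.train("narendra", {3})   # nara + indra

points = model.find_split_points("surendra")
print(points, split_word("surendra", points))  # [3] ['sura', 'indra']
```

Only the rule table is language-specific in this sketch; the identification step never looks at anything other than character contexts, so it could be retrained on annotated data from another language without code changes.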
The next proposed addition was to enhance the spellchecker with sandhi-splitter.
The implementation inherits from the existing Malayalam spellchecker, which Balasankar C had improved to handle inflections; adding sandhi-splitter improves the results further.
The pull request corresponding to this can be found here.
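The inheritance approach described above might look roughly like the following sketch. The class names `BaseSpellChecker` and `SandhiSpellChecker`, the dictionary-lookup check and the injected `splitter` callable are hypothetical; they are not taken from the libindic spellchecker or from the pull request, and the toy splitter stands in for the real sandhi-splitter.

```python
class BaseSpellChecker:
    """Stand-in for an existing spellchecker (hypothetical): plain dictionary lookup."""

    def __init__(self, dictionary):
        self.dictionary = set(dictionary)

    def check(self, word):
        return word in self.dictionary


class SandhiSpellChecker(BaseSpellChecker):
    """Subclass that falls back to splitting a word when the base check fails."""

    def __init__(self, dictionary, splitter):
        super().__init__(dictionary)
        # splitter: any callable mapping a word to a list of constituent words.
        self.splitter = splitter

    def check(self, word):
        base_check = super().check
        if base_check(word):
            return True
        parts = self.splitter(word)
        # Accept a compound only if it really split and every part is known.
        return len(parts) > 1 and all(base_check(part) for part in parts)


def toy_splitter(word):
    # Hard-coded toy data standing in for a trained splitter (assumption).
    known_splits = {"mahendra": ["maha", "indra"]}
    return known_splits.get(word, [word])


checker = SandhiSpellChecker({"maha", "indra"}, toy_splitter)
print(checker.check("mahendra"))  # True: both constituents are in the dictionary
print(checker.check("mahendrx"))  # False: not in the dictionary and cannot be split
```

Passing the splitter in as a callable keeps the subclass independent of how splitting is implemented, so the base spellchecker's behaviour is unchanged for words it already accepts.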
WSME was used to provide REST services for the libindic modules. These services are plugged into the main libindic application as a modular component through Flask blueprints.