We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider gathered to hear what's happening with the project.
There were two presenters. To set the context for the audience, Steve Loughran, a member of technical staff at Hortonworks, delivered an extemporaneous high-level overview of Apache Slider within the Apache Hadoop YARN framework.
Running Dockerized Applications on YARN via Slider
Yu "Thomas" Liu gave a demo of his hot-off-the-IDE Docker deployment work. It lets you declare one or more Docker containers for each component type, as named Docker images hosted on Docker Hub. The Slider "Application Instance" configuration file resources.json is then configured with the number of instances of each Slider component to deploy, along with their YARN resource requirements. When the instance is started, Slider instructs YARN to create containers and runs the Python Slider Agent in each one. The agent builds the Docker commands to deploy the Docker images, sets the command line, and propagates values such as port assignments across Docker containers.
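As a rough illustration of the resources.json step, the sketch below generates a file of that shape from a short component list. The component names (WEB, DB) and their sizes are hypothetical, and the schema URL and `yarn.*` property names are written from memory of the Slider documentation, so treat them as assumptions and check them against the release you are using.

```python
import json


def build_resources(components):
    """Build a resources.json-style dict.

    components maps a component name to (instances, memory_mb, vcores).
    All names and values passed in here are illustrative only.
    """
    spec = {
        # Assumed schema identifier; verify against your Slider version.
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {"slider-appmaster": {}},
    }
    # Give each component a distinct priority, in name order.
    for priority, name in enumerate(sorted(components), start=1):
        instances, memory_mb, vcores = components[name]
        spec["components"][name] = {
            "yarn.role.priority": str(priority),
            "yarn.component.instances": str(instances),
            "yarn.memory": str(memory_mb),
            "yarn.vcores": str(vcores),
        }
    return spec


# Hypothetical application: two web containers and one database container.
resources = build_resources({"WEB": (2, 512, 1), "DB": (1, 1024, 2)})
print(json.dumps(resources, indent=2))
```

Generating the file from one small table keeps instance counts and YARN requirements for every component in a single place, which is convenient when the same application is deployed at different sizes.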
While this Docker support is a work in progress, it does show how YARN clusters can host complex applications, and that building Hadoop applications via Docker is something that will be possible in the future. Keep an eye on SLIDER-780 for this work, as well as the YARN-2466 "Yarn launched Docker Containers" work coming out of Altiscale. SLIDER-780 doesn't depend upon that work, but some members of the Altiscale team came to the meetup to ask hard questions.
Presentation slides for this topic are here.
You can also listen to the recording.
KOYA: Apache Kafka on Apache Hadoop YARN via Slider
In the second talk, Thomas Weise and Siyuan Hua of DataTorrent showed how they used Slider to run Apache Kafka on YARN. Essentially, the DataTorrent analytics platform runs atop YARN; deploying Kafka on YARN via Slider ensures the entire application runs this way, eliminating the need for dedicated Kafka hosts. Following the talk, a demo illustrated not just how Slider could start a Kafka cluster, but also how it detects failures and reacts by attempting to re-instantiate the failed instance.
The Kafka-on-Slider work has been ongoing for a while, with Thomas and Siyuan active on the Slider mailing list. Because restarting Kafka on a new node is expensive, Slider's optional "strict" placement policy ensures node affinity is observed. That is, Slider asks YARN to allocate containers on the same machines on which Kafka is hosted.
In SLIDER-799 we're building something more sophisticated, in which Slider decides when to relax a container request from a specific machine to elsewhere in the rack, thus allowing Kafka on YARN ("KOYA") and other applications to have placement relaxation policies on a component-by-component basis.
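To make the idea of placement relaxation concrete, here is a minimal sketch of the decision. It is not Slider's implementation. The `ContainerRequest` type, the `placement_target` function, and the time-based trigger are all hypothetical, chosen only to show how a strict component stays pinned to its node while a relaxed one falls back to the rack.

```python
from dataclasses import dataclass


@dataclass
class ContainerRequest:
    """Hypothetical record of an outstanding request for one component."""
    component: str
    node: str             # machine that previously hosted the instance
    rack: str             # rack containing that machine
    waited_s: float = 0.0  # how long the request has been outstanding


def placement_target(request, strict, relax_after_s):
    """Return (scope, location) for the next allocation attempt.

    Strict components always ask for the original node. Others ask for
    the node first and widen to the rack once the wait exceeds the
    (assumed) per-component relaxation delay.
    """
    if strict or request.waited_s < relax_after_s:
        return ("node", request.node)
    return ("rack", request.rack)


req = ContainerRequest("KAFKA_BROKER", "node17", "/rack2", waited_s=45.0)
pinned = placement_target(req, strict=True, relax_after_s=30.0)
relaxed = placement_target(req, strict=False, relax_after_s=30.0)
print(pinned)   # ('node', 'node17')
print(relaxed)  # ('rack', '/rack2')
```

Passing `strict` and `relax_after_s` per call mirrors the component-by-component nature of the policy: a Kafka broker could stay pinned while a stateless component in the same application relaxes quickly.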
KOYA has some other needs too, such as per-instance configuration, which again will keep us busy in the future. What is encouraging is how people are using Slider and how they benefit from it. For anyone else who wants to run Kafka on YARN, the source is on GitHub at DataTorrent/koya.
Finally, the Apache Slider (incubating) project is in the voting process for the 0.70-incubating release. This release adds lots of improvements and fixes to the 0.60-incubating version. Stay tuned for its release.
Presentation slides for this topic are here.