{"id":43679,"date":"2016-09-19T21:04:43","date_gmt":"2016-09-19T18:04:43","guid":{"rendered":"https:\/\/www.altoros.com\/blog\/?p=43679"},"modified":"2019-05-30T02:37:50","modified_gmt":"2019-05-29T23:37:50","slug":"using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix","status":"publish","type":"post","link":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/","title":{"rendered":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/stream-processing-apache-spark-kafka-on-ibm-bluemix.png\" alt=\"stream-processing-apache-spark-kafka-on-ibm-bluemix\" width=\"220\" style=\"margin: 0px 0px 10px 20px;\" class=\"alignright size-full wp-image-43680\" \/><\/p>\n<p>Some of the services provided by IBM Bluemix enable you to significantly speed up the implementation of IoT use cases. With Bluemix, you do not have to deploy and configure Hadoop, Apache Kafka, or other big data tools yourself; instead, you can launch service instances in a few clicks.<\/p>\n<p>In this tutorial, we explain how to integrate and use popular open-source tools for stream processing. 
We explore <a href=\"https:\/\/www.ibm.com\/software\/products\/en\/ibm-message-hub\" target=\"_blank\" rel=\"noopener noreferrer\">IBM Message Hub<\/a> (for collecting streams), the <a href=\"https:\/\/spark.apache.org\" target=\"_blank\" rel=\"noopener noreferrer\">Apache Spark<\/a> service (for processing events), and <a href=\"https:\/\/cloud.ibm.com\/catalog\/services\/object-storage\/\" target=\"_blank\" rel=\"noopener noreferrer\">IBM Object Storage<\/a> (for storing results).<\/p>\n<p>&nbsp;<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 
ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Sample_scenario\" >Sample scenario<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Apache_Spark_on_Bluemix\" >Apache Spark on Bluemix<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Integrating_Spark_with_Message_Hub\" >Integrating Spark with Message Hub<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Integrating_Spark_Streaming_with_Bluemix_Object_Storage\" >Integrating Spark Streaming with Bluemix Object Storage<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Generating_and_processing_events\" >Generating and processing events<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#Further_reading\" >Further 
reading<\/a><\/li><\/ul><\/nav><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Sample_scenario\"><\/span>Sample scenario<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Below is a diagram of the stream processing flow that we implement in this post.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/stream-processing-with-ibm-bluemix-message-hab-apache-kafka-spark.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/stream-processing-with-ibm-bluemix-message-hab-apache-kafka-spark-1024x539.png\" alt=\"\" width=\"640\" class=\"aligncenter size-large wp-image-43681\" \/><\/a><small>A sample stream processing workflow<\/small><\/center><\/p>\n<p>The event producer generates sample messages and sends them to Message Hub. Spark jobs pick the messages up from Message Hub, process them, and store the results as files in Object Storage.<\/p>\n<p><a href=\"https:\/\/gist.github.com\/idrabenia\/29fba930170de438ee8842c5182f6f91\" target=\"_blank\" rel=\"noopener noreferrer\">Here is the code<\/a> of our message producer, which sends real-time data to Bluemix Message Hub.<\/p>\n<p>IBM Message Hub for Bluemix supports two message queuing systems: Apache Kafka and IBM MQ Light. 
In our sample flow, we use Apache Kafka to collect the event stream.<\/p>\n<p>To configure a connection to IBM Message Hub, check out <a href=\"https:\/\/github.com\/ibm-messaging\/message-hub-samples\/tree\/master\/java\/message-hub-kafka-ssl\/src\/com\/messagehub\/samples\" target=\"_blank\" rel=\"noopener noreferrer\">this sample<\/a> and the <a href=\"https:\/\/cloud.ibm.com\/docs\/services\/MessageHub\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">documentation<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Apache_Spark_on_Bluemix\"><\/span>Apache Spark on Bluemix<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To enable Spark Streaming support for Kafka, add <a href=\"https:\/\/github.com\/ibm-messaging\/iot-messgehub-spark-samples\/releases\/download\/v0.1\/streaming-kafka.jar\" target=\"_blank\" rel=\"noopener noreferrer\">this library<\/a> to your Spark dependencies.<\/p>\n<p>Now, you can create a StreamingContext with a two-second batch interval:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">import org.apache.spark.streaming.{Seconds, StreamingContext}\r\nval ssc = new StreamingContext(sc, Seconds(2))<\/pre>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix.png\" alt=\"apache-spark-on-ibm-bluemix\" width=\"640\" class=\"aligncenter size-full wp-image-43682\" \/><\/a><small>Developing a Spark job<\/small><\/center><\/p>\n<p>To debug our Spark job, we used Jupyter, a tool provided by the Bluemix Spark service for interactive job development. 
We first developed the functionality in Jupyter and then copied it into the job that we later submitted to the Apache Spark service.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png\" alt=\"apache-spark-on-ibm-bluemix-jupyter\" width=\"640\" class=\"aligncenter size-full wp-image-43683\" \/><\/a><small>Debugging a Spark job<\/small><\/center><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Integrating_Spark_with_Message_Hub\"><\/span>Integrating Spark with Message Hub<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To integrate Apache Spark with Message Hub:<\/p>\n<ol>\n<li>Add <a href=\"https:\/\/github.com\/ibm-messaging\/event-streams-samples\/blob\/master\/java\/message-hub-kafka-ssl\/lib-message-hub\/messagehub.login-1.0.0.jar\" target=\"_blank\" rel=\"noopener noreferrer\">this additional IBM-specific library<\/a> to your Spark job configuration.<\/li>\n<li>In your Spark job, provide the following configuration:<\/li>\n<\/ol>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">import org.apache.spark.SparkContext\r\nimport org.apache.spark.streaming.{Seconds, StreamingContext}\r\n\r\n\/\/ Connection settings from the Credentials section of the Message Hub service\r\nval kafkaProps = new MessageHubConfig\r\n\r\nkafkaProps.setConfig(&quot;bootstrap.servers&quot;, &quot;kafka01-prod01.messagehub.services.us-south.bluemix.net:9093&quot;)\r\nkafkaProps.setConfig(&quot;kafka.user.name&quot;, &quot;XXXXXXXXXXXXXXXXXX&quot;)\r\nkafkaProps.setConfig(&quot;kafka.user.password&quot;, &quot;**************&quot;)\r\nkafkaProps.setConfig(&quot;kafka.topic&quot;, &quot;mytopic&quot;)\r\nkafkaProps.setConfig(&quot;api_key&quot;, &quot;*******************************&quot;)\r\nkafkaProps.setConfig(&quot;kafka_rest_url&quot;, &quot;https:\/\/kafka-rest-prod01.messagehub.services.us-south.bluemix.net:443&quot;)\r\n\r\nkafkaProps.createConfiguration()\r\n\r\nval sc = new SparkContext()\r\nval ssc = new 
StreamingContext(sc, Seconds(2))<\/pre>\n<p>You can find all the credentials needed to connect to Kafka in the Credentials section of the Message Hub service in the Bluemix console.<\/p>\n<p>Now, you should be able to connect to Kafka and receive new events in real time:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">import org.apache.kafka.common.serialization.StringDeserializer\r\n\r\n\/\/ Subscribe to the configured topic; keys and values are decoded as strings\r\nval stream = ssc.createKafkaStream&#x5B;String, String, StringDeserializer, StringDeserializer](\r\n  kafkaProps,\r\n  List(kafkaProps.getConfig(&quot;kafka.topic&quot;))\r\n)\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Integrating_Spark_Streaming_with_Bluemix_Object_Storage\"><\/span>Integrating Spark Streaming with Bluemix Object Storage<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>You might want to check out <a href=\"https:\/\/www.altoros.com\/blog\/using-ibm-analytics-for-apache-spark-in-java-scala-apps-on-bluemix\/\" target=\"_blank\" rel=\"noopener noreferrer\">this post<\/a>, which explains in detail how to integrate the Spark service with Object Storage. 
To connect to Object Storage, provide the following configuration in your Spark job:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">val pfx = &quot;fs.swift.service.&quot; + name\r\n\r\n\/\/ The Swift file system reads its settings from the Hadoop configuration;\r\n\/\/ sc.getConf would only return a copy of the Spark configuration\r\nval conf = sc.hadoopConfiguration\r\nconf.set(pfx + &quot;.auth.url&quot;, &quot;https:\/\/identity.open.softlayer.com&quot;)\r\nconf.set(pfx + &quot;.tenant&quot;, &quot;sf56-d54664602866ee-20565106c03e&quot;)\r\nconf.set(pfx + &quot;.username&quot;, &quot;Admin_58ad00f71fbcbebe819624b6d70df9ec6a494887&quot;)\r\nconf.set(pfx + &quot;.auth.endpoint.prefix&quot;, &quot;endpoints&quot;)\r\nconf.set(pfx + &quot;.password&quot;, &quot;************&quot;)\r\nconf.set(pfx + &quot;.apikey&quot;, &quot;************&quot;)\r\nconf.set(pfx + &quot;.region&quot;, &quot;dallas&quot;)\r\nconf.set(pfx + &quot;.hostname&quot;, &quot;notebooks&quot;)<\/pre>\n<p>In this configuration, <code style=\"color: #222222; background-color: #e6e6e6; padding: 1px 2px;\">name<\/code> identifies this set of Object Storage credentials (we used the job name); you refer to it in URLs of the form <code style=\"color: #222222; background-color: #e6e6e6; padding: 1px 2px;\">swift:\/\/&lt;container&gt;.&lt;name&gt;\/&lt;path&gt;<\/code>. You can find the corresponding configuration settings in the Credentials section of your Object Storage service. 
Also, keep in mind that the <code style=\"color: #222222; background-color: #e6e6e6; padding: 1px 2px;\">username<\/code> option must be set to the <code style=\"color: #222222; background-color: #e6e6e6; padding: 1px 2px;\">user_id<\/code> property from the Credentials section.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Generating_and_processing_events\"><\/span>Generating and processing events<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Here is the event producer code, which we copied with small modifications from the IBM Message Hub samples:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">public void run() {\r\n    logger.log(Level.INFO, ProducerRunnable.class.toString() + &quot; is starting.&quot;);\r\n\r\n    while (!closing) {\r\n        String fieldName = &quot;records&quot;;\r\n        \/\/ Push a message into the list to be sent.\r\n        MessageList list = new MessageList();\r\n        list.push(&quot;This is a test message &quot; + producedMessages);\r\n\r\n        try {\r\n            \/\/ Create a producer record which will be sent\r\n            \/\/ to the Message Hub service, providing the topic\r\n            \/\/ name, field name and message. 
The field name and\r\n            \/\/ message are converted to UTF-8.\r\n            ProducerRecord&lt;byte&#x5B;], byte&#x5B;]&gt; record = new ProducerRecord&lt;byte&#x5B;], byte&#x5B;]&gt;(\r\n                topic,\r\n                fieldName.getBytes(&quot;UTF-8&quot;),\r\n                list.toString().getBytes(&quot;UTF-8&quot;));\r\n\r\n            \/\/ Synchronously wait for a response from Message Hub \/ Kafka.\r\n            RecordMetadata m = kafkaProducer.send(record).get();\r\n            producedMessages++;\r\n\r\n            logger.log(Level.INFO, &quot;Message produced, offset: &quot; + m.offset());\r\n\r\n            Thread.sleep(1000);\r\n        } catch (final Exception e) {\r\n            e.printStackTrace();\r\n            shutdown();\r\n            \/\/ Consumer will hang forever, so exit program.\r\n            System.exit(-1);\r\n        }\r\n    }\r\n\r\n    logger.log(Level.INFO, ProducerRunnable.class.toString() + &quot; is shutting down.&quot;);\r\n}<\/pre>\n<p>Here is the code for the stream processing job:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">import org.apache.kafka.common.serialization.StringDeserializer\r\nimport org.apache.spark.SparkContext\r\nimport org.apache.spark.streaming.{Seconds, StreamingContext}\r\n\/\/ MessageHubConfig and ssc.createKafkaStream are provided by the IBM libraries added earlier;\r\n\/\/ import them according to the library version you use.\r\n\r\nobject StreamProcessor {\r\n\r\n  def main(args: Array&#x5B;String]): Unit = {\r\n    val kafkaConfig: MessageHubConfig = kafkaProps\r\n\r\n    val sc = new SparkContext()\r\n    configureObjectStore(sc, &quot;test3&quot;)\r\n\r\n    val ssc = new StreamingContext(sc, Seconds(2))\r\n\r\n    val stream = ssc.createKafkaStream&#x5B;String, String, StringDeserializer, StringDeserializer](\r\n      kafkaConfig,\r\n      List(kafkaConfig.getConfig(&quot;kafka.topic&quot;))\r\n    )\r\n\r\n    \/\/ notebook is the container and test3 the service name set in configureObjectStore;\r\n    \/\/ each batch is saved to a directory named result.csv-&lt;batch time in ms&gt;\r\n    stream.saveAsTextFiles(&quot;swift:\/\/notebook.test3\/result.csv&quot;)\r\n\r\n    ssc.start()\r\n    ssc.awaitTermination()\r\n  }\r\n\r\n  def kafkaProps: MessageHubConfig = {\r\n    val kafkaProps = new MessageHubConfig\r\n\r\n    kafkaProps.setConfig(&quot;bootstrap.servers&quot;, &quot;kafka01-prod01.messagehub.services.us-south.bluemix.net:9093&quot;)\r\n    
kafkaProps.setConfig(&quot;kafka.user.name&quot;, &quot;*******************&quot;)\r\n    kafkaProps.setConfig(&quot;kafka.user.password&quot;, &quot;********************&quot;)\r\n    kafkaProps.setConfig(&quot;kafka.topic&quot;, &quot;mytopic&quot;)\r\n    kafkaProps.setConfig(&quot;api_key&quot;, &quot;***********************&quot;)\r\n    kafkaProps.setConfig(&quot;kafka_rest_url&quot;, &quot;https:\/\/kafka-rest-prod01.messagehub.services.us-south.bluemix.net:443&quot;)\r\n\r\n    kafkaProps.createConfiguration()\r\n    kafkaProps\r\n  }\r\n\r\n  def configureObjectStore(sc: SparkContext, name: String): Unit = {\r\n    val pfx = &quot;fs.swift.service.&quot; + name\r\n\r\n    \/\/ Swift settings belong in the Hadoop configuration (sc.getConf returns only a copy)\r\n    val conf = sc.hadoopConfiguration\r\n    conf.set(pfx + &quot;.auth.url&quot;, &quot;https:\/\/identity.open.softlayer.com&quot;)\r\n    conf.set(pfx + &quot;.tenant&quot;, &quot;sf56-d54664602866ee-20565106c03e&quot;)\r\n    conf.set(pfx + &quot;.username&quot;, &quot;Admin_58ad00f71fbcbebe819624b6d70df9ec6a494887&quot;)\r\n    conf.set(pfx + &quot;.auth.endpoint.prefix&quot;, &quot;endpoints&quot;)\r\n    conf.set(pfx + &quot;.password&quot;, &quot;****************&quot;)\r\n    conf.set(pfx + &quot;.apikey&quot;, &quot;****************&quot;)\r\n    conf.set(pfx + &quot;.region&quot;, &quot;dallas&quot;)\r\n    conf.set(pfx + &quot;.hostname&quot;, &quot;notebooks&quot;)\r\n  }\r\n\r\n}<\/pre>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>During development, we found that IBM had replaced ZooKeeper, the standard tool for managing Kafka, with a custom REST API. As a result, Message Hub is incompatible with many tools built for typical Kafka deployments.<\/p>\n<p>Also, authenticating and authorizing access to Message Hub is not trivial: the process includes multiple steps and works with files, which makes it somewhat unsuitable for a cloud environment.<\/p>\n<p>We also found that the 
integration of the Spark service with other Bluemix services is quite challenging. For example, to integrate Spark with Message Hub, you need an additional dependency from IBM.<\/p>\n<p>In our case, using IBM Bluemix saved about one man-month of work, since we avoided installing, configuring, and integrating multiple big data tools. However, development could be simplified even further if Bluemix offered an easier way to integrate its services with each other. Let\u2019s hope that IBM will implement it someday.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Further_reading\"><\/span>Further reading<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/processing-data-on-ibm-bluemix-streaming-analytics-apache-spark-and-biginsights\/\">Processing Data on IBM Bluemix: Streaming Analytics, Apache Spark, and BigInsights<\/a><\/li>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/iot-platform-on-bluemix-experimenting-with-ibm-watson-and-gobot\/\">An IoT Platform on Bluemix: Experimenting with IBM Watson and Gobot<\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<hr\/>\n<p><center><small>The post was written by Ilya Drabenia; edited and published by Victoria Fedzkovich and <a href=\"https:\/\/www.altoros.com\/blog\/author\/alex\/\">Alex Khizhniak<\/a>.<\/small><\/center><\/p>\n","protected":false},"excerpt":{"rendered":"<\/p>\n<p>Some of the services provided by IBM Bluemix enable you to significantly speed up the implementation of the IoT use cases. With Bluemix, you are not required to deploy and configure Hadoop, Apache Kafka, or other big data tools. 
It allows you to launch service instances in a few clicks.<\/p>\n<p>In [&#8230;]<\/p>\n","protected":false},"author":71,"featured_media":43683,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","_links_to":"","_links_to_target":""},"categories":[214],"tags":[873,187,117],"class_list":["post-43679","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-cloud-native","tag-ibm-bluemix","tag-iot"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | Altoros<\/title>\n<meta name=\"description\" content=\"One of the key points in Industry 4.0 is data stream processing\u2014for equipment fault monitoring, predictive maintenance, real-time diagnostics, etc.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | Altoros\" \/>\n<meta property=\"og:description\" content=\"Some of the services provided by IBM Bluemix enable you to significantly speed up the implementation of the IoT use cases. With Bluemix, you are not required to deploy and configure Hadoop, Apache Kafka, or other big data tools. It allows you to launch service instances in a few clicks. 
In [...]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/\" \/>\n<meta property=\"og:site_name\" content=\"Altoros\" \/>\n<meta property=\"article:published_time\" content=\"2016-09-19T18:04:43+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-05-29T23:37:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1999\" \/>\n\t<meta property=\"og:image:height\" content=\"1338\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Ilya Drabenia\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ilya Drabenia\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/\"},\"author\":{\"name\":\"Ilya Drabenia\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/#\\\/schema\\\/person\\\/48c2eaf6d86abc9fa945bc3860fa2dc2\"},\"headline\":\"Using Spark Streaming, Apache Kafka, and Object Storage on IBM 
Bluemix\",\"datePublished\":\"2016-09-19T18:04:43+00:00\",\"dateModified\":\"2019-05-29T23:37:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/\"},\"wordCount\":1298,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/05\\\/apache-spark-on-ibm-bluemix-jupyter.png\",\"keywords\":[\"Cloud-Native\",\"IBM Bluemix\",\"IoT\"],\"articleSection\":[\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/\",\"url\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/\",\"name\":\"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | 
Altoros\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/05\\\/apache-spark-on-ibm-bluemix-jupyter.png\",\"datePublished\":\"2016-09-19T18:04:43+00:00\",\"dateModified\":\"2019-05-29T23:37:50+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/#\\\/schema\\\/person\\\/48c2eaf6d86abc9fa945bc3860fa2dc2\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/05\\\/apache-spark-on-ibm-bluemix-jupyter.png\",\"contentUrl\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/05\\\/apache-spark-on-ibm-bluemix-jupyter.png\",\"width\":1999,\"height\":1338},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.al
toros.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/\",\"name\":\"Altoros\",\"description\":\"Insight\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/#\\\/schema\\\/person\\\/48c2eaf6d86abc9fa945bc3860fa2dc2\",\"name\":\"Ilya Drabenia\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2017\\\/01\\\/1108467-150x150.jpg\",\"url\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2017\\\/01\\\/1108467-150x150.jpg\",\"contentUrl\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/wp-content\\\/uploads\\\/2017\\\/01\\\/1108467-150x150.jpg\",\"caption\":\"Ilya Drabenia\"},\"description\":\"Ilya Drabenia is a Technical Lead at Altoros. He has broad experience in building software architectures, including design and development of complex solutions. Ilya is passionate about microservices, domain-driven design, as well as scalable and parallel algorithms. He also holds an MSc degree in Computer Science. See his profile on GitHub.\",\"url\":\"https:\\\/\\\/www.altoros.com\\\/blog\\\/author\\\/ilya-drabenia\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | Altoros","description":"One of the key points in Industry 4.0 is data stream processing\u2014for equipment fault monitoring, predictive maintenance, real-time diagnostics, etc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/","og_locale":"en_US","og_type":"article","og_title":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | Altoros","og_description":"Some of the services provided by IBM Bluemix enable you to significantly speed up the implementation of the IoT use cases. With Bluemix, you are not required to deploy and configure Hadoop, Apache Kafka, or other big data tools. It allows you to launch service instances in a few clicks. In [...]","og_url":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/","og_site_name":"Altoros","article_published_time":"2016-09-19T18:04:43+00:00","article_modified_time":"2019-05-29T23:37:50+00:00","og_image":[{"width":1999,"height":1338,"url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png","type":"image\/png"}],"author":"Ilya Drabenia","twitter_misc":{"Written by":"Ilya Drabenia","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#article","isPartOf":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/"},"author":{"name":"Ilya Drabenia","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/48c2eaf6d86abc9fa945bc3860fa2dc2"},"headline":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix","datePublished":"2016-09-19T18:04:43+00:00","dateModified":"2019-05-29T23:37:50+00:00","mainEntityOfPage":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/"},"wordCount":1298,"commentCount":0,"image":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#primaryimage"},"thumbnailUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png","keywords":["Cloud-Native","IBM Bluemix","IoT"],"articleSection":["Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/","url":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/","name":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM Bluemix | 
Altoros","isPartOf":{"@id":"https:\/\/www.altoros.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#primaryimage"},"image":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#primaryimage"},"thumbnailUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png","datePublished":"2016-09-19T18:04:43+00:00","dateModified":"2019-05-29T23:37:50+00:00","author":{"@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/48c2eaf6d86abc9fa945bc3860fa2dc2"},"breadcrumb":{"@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#primaryimage","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/apache-spark-on-ibm-bluemix-jupyter.png","width":1999,"height":1338},{"@type":"BreadcrumbList","@id":"https:\/\/www.altoros.com\/blog\/using-spark-streaming-apache-kafka-and-object-storage-for-stream-processing-on-bluemix\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.altoros.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Using Spark Streaming, Apache Kafka, and Object Storage on IBM 
Bluemix"}]},{"@type":"WebSite","@id":"https:\/\/www.altoros.com\/blog\/#website","url":"https:\/\/www.altoros.com\/blog\/","name":"Altoros","description":"Insight","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.altoros.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/48c2eaf6d86abc9fa945bc3860fa2dc2","name":"Ilya Drabenia","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/01\/1108467-150x150.jpg","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/01\/1108467-150x150.jpg","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/01\/1108467-150x150.jpg","caption":"Ilya Drabenia"},"description":"Ilya Drabenia is a Technical Lead at Altoros. He has broad experience in building software architectures, including design and development of complex solutions. Ilya is passionate about microservices, domain-driven design, as well as scalable and parallel algorithms. He also holds an MSc degree in Computer Science. 
See his profile on GitHub.","url":"https:\/\/www.altoros.com\/blog\/author\/ilya-drabenia\/"}]}},"_links":{"self":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/43679","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/users\/71"}],"replies":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/comments?post=43679"}],"version-history":[{"count":18,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/43679\/revisions"}],"predecessor-version":[{"id":43701,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/43679\/revisions\/43701"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media\/43683"}],"wp:attachment":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media?parent=43679"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/categories?post=43679"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/tags?post=43679"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}