{"id":39211,"date":"2017-05-19T19:23:56","date_gmt":"2017-05-19T16:23:56","guid":{"rendered":"https:\/\/www.altoros.com\/blog\/?p=39211"},"modified":"2018-11-29T14:37:22","modified_gmt":"2018-11-29T11:37:22","slug":"deep-q-networks-and-practical-reinforcement-learning-with-tensorflow","status":"publish","type":"post","link":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/","title":{"rendered":"Deep Q-networks and Practical Reinforcement Learning with TensorFlow"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li 
class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Making_reinforcement_learning_work\" >Making reinforcement learning work<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Deep_Q-learning\" >Deep Q-learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Asynchronous_Advantage_Actor-Critic\" >Asynchronous Advantage Actor-Critic<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Want_details_Watch_the_video\" >Want details? 
Watch the video!<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Related_session\" >Related session<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#Further_reading\" >Further reading<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#About_the_expert\" >About the expert<\/a><\/li><\/ul><\/nav><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Making_reinforcement_learning_work\"><\/span>Making reinforcement learning work<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>What does it take to make reinforcement learning work with TensorFlow? <a href=\"https:\/\/www.linkedin.com\/in\/illia-polosukhin-77b6538\/\" target=\"_blank\">Illia Polosukhin<\/a>, a co-founder of XIX.ai, answered this question and shared practical insights at <a href=\"https:\/\/www.altoros.com\/blog\/tag\/tensorbeat\/\">TensorBeat 2017<\/a>.<\/p>\n<p>As Illia puts it, in <a href=\"https:\/\/www.altoros.com\/blog\/what-is-behind-deep-reinforcement-learning-and-transfer-learning-with-tensorflow\/\">reinforcement learning<\/a> one does not train on a fixed data set; instead, an agent draws observations from an environment, performs actions, and collects rewards.<\/p>\n<p>To do so, one can employ <a href=\"https:\/\/gym.openai.com\/\" target=\"_blank\">OpenAI Gym<\/a>, a toolkit for developing and comparing reinforcement learning algorithms. It features a library of environments for games, classical control systems, etc. 
to aid developers in creating algorithms of their own. Each of the environments exposes the same API, and the library enables users to compare and share results.<\/p>\n<p>Illia demonstrated sample code of an agent acting in an environment.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-acting-in-an-environment.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-acting-in-an-environment.png\" alt=\"reinforcement-learning-with-tensorflow-acting-in-an-environment\" width=\"640\" class=\"aligncenter size-full wp-image-39213\" \/><\/a><\/center><\/p>\n<p>He also showed the code behind a random agent.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-a-random-agent.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-a-random-agent.png\" alt=\"reinforcement-learning-with-tensorflow-a-random-agent\" width=\"640\" class=\"aligncenter size-full wp-image-39214\" \/><\/a><\/center><\/p>\n<p>So, what makes it all work?<\/p>\n<p>The set of states and actions, coupled with the rules for transitioning from one state to another, makes up a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Markov_decision_process\" target=\"_blank\">Markov decision process<\/a> (MDP). 
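<\/p>
<p>These pieces can be sketched in plain Python. The toy two-state MDP below is a made-up illustration (its states, actions, and rewards are not from the talk); it rolls out one episode under a policy and computes the return as the total discounted reward:<\/p>

```python
import random

# A toy two-state MDP (made up for illustration): each (state, action)
# pair maps to a (next_state, reward) outcome.
TRANSITIONS = {
    ('s0', 'go'): ('s1', 1.0),
    ('s0', 'stay'): ('s0', 0.0),
    ('s1', 'go'): ('terminal', 10.0),
    ('s1', 'stay'): ('s1', 0.0),
}

def rollout(policy, max_steps=20):
    '''Play one episode and return its (state, action, reward) sequence.'''
    state, episode = 's0', []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward = TRANSITIONS[(state, action)]
        episode.append((state, action, reward))
        if next_state == 'terminal':
            break
        state = next_state
    return episode

def discounted_return(rewards, gamma=0.99):
    '''The return: total discounted reward, sum over t of gamma**t * r_t.'''
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A stochastic policy: pick a random action in every state.
random_policy = lambda state: random.choice(['go', 'stay'])
episode = rollout(random_policy)
ret = discounted_return([r for _, _, r in episode])
```

<p>With a deterministic policy that always picks 'go', the episode visits s0 and s1 and, with gamma = 0.9, earns a return of 1 + 0.9 * 10 = 10.<\/p>
<p>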
One episode of this process (e.g., a single game) produces a finite sequence of states, actions, and rewards.<\/p>\n<p>What one has to define is:<\/p>\n<ul>\n<li>a return (the total discounted reward)<\/li>\n<li>a policy: the agent\u2019s behavior (deterministic or stochastic)<\/li>\n<li>the expected return starting from a particular state (the state-value and action-value functions)<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Deep_Q-learning\"><\/span>Deep Q-learning<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>One of the ways to approach reinforcement learning is <a href=\"https:\/\/www.altoros.com\/blog\/learning-game-control-strategies-with-deep-q-networks-and-tensorflow\/\">deep Q-learning<\/a>\u2014a model-free, off-policy technique. Being model-free means the agent builds no explicit approximation of the MDP. Observations are stored in a replay buffer and later used as training data for the model. Being off-policy means the policy being learned is independent of the behavior policy that generates the agent\u2019s actions.<\/p>\n<p>Illia then demonstrated what the Q-network code looks like.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-q-network-sample-code.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-q-network-sample-code.png\" alt=\"reinforcement-learning-with-tensorflow-q-network-sample-code\" width=\"640\" class=\"aligncenter size-full wp-image-39215\" \/><\/a><\/center><\/p>\n<p>He also showed how to run the optimization.<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-optimization-of-a-q-network.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-optimization-of-a-q-network.png\" 
alt=\"reinforcement-learning-with-tensorflow-optimization-of-a-q-network\" width=\"640\" class=\"aligncenter size-full wp-image-39216\" \/><\/a><\/center><\/p>\n<p>More examples can be found in <a href=\"https:\/\/github.com\/ilblackdragon\/tensorflow-rl\/blob\/master\/examples\/atari-rl.py\" target=\"_blank\">this GitHub repo<\/a>.<\/p>\n<p>As one of the tricks at hand when training a TensorFlow model, <a href=\"https:\/\/github.com\/tensorflow\/tensorflow\/blob\/master\/tensorflow\/python\/training\/monitored_session.py\" target=\"_blank\">MonitoredSession<\/a> can be employed for:<\/p>\n<ul>\n<li>handling pitfalls of distributed training<\/li>\n<li>saving and restoring checkpoints<\/li>\n<li>injecting computation into TensorFlow training loop via hooks<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Asynchronous_Advantage_Actor-Critic\"><\/span>Asynchronous Advantage Actor-Critic<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>To enhance reinforcement learning, the <a href=\"https:\/\/arxiv.org\/pdf\/1602.01783v1.pdf\" target=\"_blank\">Asynchronous Advantage Actor-Critic<\/a> (A3C) algorithm can be used. In contrast to a deep Q-learning network, it makes use of multiple agents represented by multiple neural networks, which interact with multiple environments. Each of the agents interacts with its own copy of the environment and is independent of the experience of the other agents.<\/p>\n<p>Furthermore, this algorithm allows for estimating both a value function and a policy (a set of action probability outputs). The agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than traditional policy gradient methods. 
Finally, one can estimate how much better or worse an action turned out than the critic expected (the advantage).<\/p>\n<p><center><a href=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-asynchronous-advantage-actor-critic-algorithm.png\"><img decoding=\"async\" src=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2018\/11\/reinforcement-learning-with-tensorflow-asynchronous-advantage-actor-critic-algorithm.png\" alt=\"reinforcement-learning-with-tensorflow-asynchronous-advantage-actor-critic-algorithm\" width=\"640\" class=\"aligncenter size-full wp-image-39217\" \/><\/a><\/center><\/p>\n<p>All of the above can be applied in areas such as robotics, finance, industrial optimization, and predictive assistance.<\/p>\n<p>Join <a href=\"https:\/\/www.meetup.com\/TensorFlow\/\" target=\"_blank\">our group<\/a> to stay tuned for upcoming events.<\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Want_details_Watch_the_video\"><\/span>Want details? 
Watch the video!<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><center><script src=\"https:\/\/fast.wistia.com\/embed\/medias\/a4sm9h4ea4.jsonp\" async><\/script><script src=\"https:\/\/fast.wistia.com\/assets\/external\/E-v1.js\" async><\/script><\/p>\n<div class=\"wistia_embed wistia_async_a4sm9h4ea4\" style=\"height:360px;width:640px\">&nbsp;<\/div>\n<p><\/center><\/p>\n<p>&nbsp;<br \/>\n<small>You can also check out the full presentation by Illia Polosukhin below.<\/small><\/p>\n<p><center><iframe loading=\"lazy\" src=\"\/\/www.slideshare.net\/slideshow\/embed_code\/key\/6U40U5C1zAkP57\" width=\"595\" height=\"485\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen><\/iframe><\/center><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Related_session\"><\/span>Related session<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><small>At another <a href=\"https:\/\/www.meetup.com\/ru-RU\/TensorFlow-London\/events\/254471478\/\" target=\"_blank\">TensorFlow meetup<\/a> in London, <a href=\"https:\/\/www.linkedin.com\/in\/leonardo-de-marchi\/\" target=\"_blank\">Leonardo De Marchi<\/a> (Lead Data Scientist at Badoo) also shared how to apply reinforcement learning within the gaming industry.<\/small><\/p>\n<p><center><iframe loading=\"lazy\" width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/xLYIeeWTJ0U\" frameborder=\"0\" allow=\"autoplay; encrypted-media\" allowfullscreen><\/iframe><\/center><\/p>\n<p><small>Below, you will find the slides by Leonardo De Marchi.<\/small><\/p>\n<p><center><iframe loading=\"lazy\" src=\"\/\/www.slideshare.net\/slideshow\/embed_code\/key\/o9okMzZmtx4fT5\" width=\"595\" height=\"485\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> 
<\/iframe><\/center><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Further_reading\"><\/span>Further reading<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/what-is-behind-deep-reinforcement-learning-and-transfer-learning-with-tensorflow\/\">What Is Behind Deep Reinforcement Learning and Transfer Learning with TensorFlow?<\/a><\/li>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/learning-game-control-strategies-with-deep-q-networks-and-tensorflow\/\">Learning Game Control Strategies with Deep Q-Networks and TensorFlow<\/a><\/li>\n<li><a href=\"https:\/\/www.altoros.com\/blog\/tensorflow-in-action-tensorboard-training-a-model-and-deep-q-learning\/\">TensorFlow in Action: TensorBoard, Training a Model, and Deep Q-learning<\/a><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"About_the_expert\"><\/span>About the expert<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><small><a href=\"https:\/\/www.linkedin.com\/in\/illia-polosukhin-77b6538\/\" target=\"_blank\">Illia Polosukhin<\/a> is a chief scientist and a co-founder at XIX.ai. Prior to that, he worked as an engineering manager at Google. Illia is passionate about all things artificial intelligence and machine learning. He holds a master&#8217;s degree in Applied Math and Computer Science from Kharkiv Polytechnic Institute. You can check out <a href=\"https:\/\/github.com\/ilblackdragon\/\" target=\"_blank\">his GitHub profile<\/a>.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Making reinforcement learning work<\/p>\n<p>What are the things-to-know while enabling reinforcement learning with TensorFlow? 
Illia Polosukhin, a co-founder of XIX.ai, provided certain answers to this question, as well as delivered some practical insights at TensorBeat 2017.<\/p>\n<p>As Illia puts it, one doesn\u2019t actually have to train data as part of reinforcement learning, [&#8230;]<\/p>\n","protected":false},"author":3,"featured_media":39223,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","_links_to":"","_links_to_target":""},"categories":[214],"tags":[748,916,749],"class_list":["post-39211","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-machine-learning","tag-tensorbeat","tag-tensorflow"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros<\/title>\n<meta name=\"description\" content=\"Learn how to handle pitfalls of distributed learning, boost performance across multiple environments, and what toolkits come helpful on the way.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros\" \/>\n<meta property=\"og:description\" content=\"Making reinforcement learning work What are the things-to-know while enabling reinforcement learning with TensorFlow? Illia Polosukhin, a co-founder of XIX.ai, provided certain answers to this question, as well as delivered some practical insights at TensorBeat 2017. 
As Illia puts it, one doesn\u2019t actually have to train data as part of reinforcement learning, [...]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/\" \/>\n<meta property=\"og:site_name\" content=\"Altoros\" \/>\n<meta property=\"article:published_time\" content=\"2017-05-19T16:23:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-29T11:37:22+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif\" \/>\n\t<meta property=\"og:image:width\" content=\"640\" \/>\n\t<meta property=\"og:image:height\" content=\"360\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"author\" content=\"Sophia Turol\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sophia Turol\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/\",\"url\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/\",\"name\":\"Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros\",\"isPartOf\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif\",\"datePublished\":\"2017-05-19T16:23:56+00:00\",\"dateModified\":\"2018-11-29T11:37:22+00:00\",\"author\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/58194952af19fe7b2b830846e077a58e\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage\",\"url\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif\",\"contentUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content
\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif\",\"width\":640,\"height\":360},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.altoros.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Q-networks and Practical Reinforcement Learning with TensorFlow\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#website\",\"url\":\"https:\/\/www.altoros.com\/blog\/\",\"name\":\"Altoros\",\"description\":\"Insight\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.altoros.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/58194952af19fe7b2b830846e077a58e\",\"name\":\"Sophia Turol\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/trello_card-96x96.jpg\",\"contentUrl\":\"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/trello_card-96x96.jpg\",\"caption\":\"Sophia Turol\"},\"description\":\"Sophia Turol is passionate about delivering well-structured articles that cater for picky technical audience. With 3+ years in technical writing and 5+ years in editorship, she enjoys collaboration with developers to create insightful, yet intelligible technical tutorials, overviews, and case studies. 
Sophie is enthusiastic about deep learning solutions\u2014TensorFlow in particular\u2014and PaaS systems, such as Cloud Foundry.\",\"url\":\"https:\/\/www.altoros.com\/blog\/author\/sophie-turol\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros","description":"Learn how to handle pitfalls of distributed learning, boost performance across multiple environments, and what toolkits come helpful on the way.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/","og_locale":"en_US","og_type":"article","og_title":"Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros","og_description":"Making reinforcement learning work What are the things-to-know while enabling reinforcement learning with TensorFlow? Illia Polosukhin, a co-founder of XIX.ai, provided certain answers to this question, as well as delivered some practical insights at TensorBeat 2017. As Illia puts it, one doesn\u2019t actually have to train data as part of reinforcement learning, [...]","og_url":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/","og_site_name":"Altoros","article_published_time":"2017-05-19T16:23:56+00:00","article_modified_time":"2018-11-29T11:37:22+00:00","og_image":[{"width":640,"height":360,"url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif","type":"image\/gif"}],"author":"Sophia Turol","twitter_misc":{"Written by":"Sophia Turol","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/","url":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/","name":"Deep Q-networks and Practical Reinforcement Learning with TensorFlow | Altoros","isPartOf":{"@id":"https:\/\/www.altoros.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage"},"image":{"@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage"},"thumbnailUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif","datePublished":"2017-05-19T16:23:56+00:00","dateModified":"2018-11-29T11:37:22+00:00","author":{"@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/58194952af19fe7b2b830846e077a58e"},"breadcrumb":{"@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow\/#primaryimage","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2017\/05\/deep-q-networks-and-practical-reinforcement-learning-with-tensorflow.gif","width":640,"height":360},{"@type":"BreadcrumbList","@id":"https:\/\/www.altoros.com\/blog\/deep-q-networks-and
-practical-reinforcement-learning-with-tensorflow\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.altoros.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Deep Q-networks and Practical Reinforcement Learning with TensorFlow"}]},{"@type":"WebSite","@id":"https:\/\/www.altoros.com\/blog\/#website","url":"https:\/\/www.altoros.com\/blog\/","name":"Altoros","description":"Insight","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.altoros.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/58194952af19fe7b2b830846e077a58e","name":"Sophia Turol","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.altoros.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/trello_card-96x96.jpg","contentUrl":"https:\/\/www.altoros.com\/blog\/wp-content\/uploads\/2019\/05\/trello_card-96x96.jpg","caption":"Sophia Turol"},"description":"Sophia Turol is passionate about delivering well-structured articles that cater for picky technical audience. With 3+ years in technical writing and 5+ years in editorship, she enjoys collaboration with developers to create insightful, yet intelligible technical tutorials, overviews, and case studies. 
Sophie is enthusiastic about deep learning solutions\u2014TensorFlow in particular\u2014and PaaS systems, such as Cloud Foundry.","url":"https:\/\/www.altoros.com\/blog\/author\/sophie-turol\/"}]}},"_links":{"self":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/39211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/comments?post=39211"}],"version-history":[{"count":16,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/39211\/revisions"}],"predecessor-version":[{"id":39653,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/posts\/39211\/revisions\/39653"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media\/39223"}],"wp:attachment":[{"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/media?parent=39211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/categories?post=39211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.altoros.com\/blog\/wp-json\/wp\/v2\/tags?post=39211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}