{"id":851,"date":"2019-03-06T20:21:23","date_gmt":"2019-03-06T20:21:23","guid":{"rendered":"http:\/\/janbosch.com\/blog\/?p=851"},"modified":"2019-03-06T20:21:32","modified_gmt":"2019-03-06T20:21:32","slug":"machine-deep-learning-experimentation-stage","status":"publish","type":"post","link":"https:\/\/janbosch.com\/blog\/index.php\/2019\/03\/06\/machine-deep-learning-experimentation-stage\/","title":{"rendered":"Machine &#038; Deep Learning: Experimentation Stage"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/janbosch.com\/blog\/wp-content\/uploads\/2019\/03\/algorithm-3859537_1920-1024x683.jpg\" alt=\"\" class=\"wp-image-852\" srcset=\"https:\/\/janbosch.com\/blog\/wp-content\/uploads\/2019\/03\/algorithm-3859537_1920-1024x683.jpg 1024w, https:\/\/janbosch.com\/blog\/wp-content\/uploads\/2019\/03\/algorithm-3859537_1920-300x200.jpg 300w, https:\/\/janbosch.com\/blog\/wp-content\/uploads\/2019\/03\/algorithm-3859537_1920-768x512.jpg 768w, https:\/\/janbosch.com\/blog\/wp-content\/uploads\/2019\/03\/algorithm-3859537_1920.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Photo by geralt on Pixabay<\/figcaption><\/figure>\n\n\n\n<p>This week I got the opportunity to speak at the <a href=\"https:\/\/www.chalmers.se\/en\/areas-of-advance\/ict\/events\/initiative-seminar-AI2019\/Pages\/default.aspx\">initiative seminar<\/a> organized by the Chalmers AI Research center (<a href=\"https:\/\/www.chalmers.se\/en\/centres\/chair\/Pages\/default.aspx\">CHAIR<\/a>). The key message in my <a href=\"https:\/\/chalmersuniversity.box.com\/s\/f1huwa2z4y03zh3jm1vz0aaxm3vhi5uj\">presentation<\/a> was that working with artificial intelligence (AI) and specifically machine &amp; deep learning (ML\/DL) constitutes a major software engineering challenge that is severely underestimated by companies that start to experiment with machine and deep learning. 
<br><\/p>\n\n\n\n<p>Although I have discussed some of the challenges in an <a href=\"https:\/\/janbosch.com\/blog\/index.php\/2018\/09\/01\/engineering-deep-learning-systems-is-hard\/\">earlier blog post<\/a>, we have continued to conduct research in this area and have collected additional data concerning the specific challenges. In addition, we have developed a model that captures how companies typically evolve in their adoption of AI\/ML\/DL. The figure below shows the steps that companies typically evolve through. In this and the upcoming posts, I intend to discuss the challenges associated specifically with each step. This is based on an article that was recently accepted for publication in the proceedings of the <a href=\"https:\/\/www.agilealliance.org\/xp2019\/\">XP 2019<\/a> conference. <br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/YS5bu78A4-_vX8_RjRrIns7TW_9bcFCvKGZjNBeVq6aFTlyVLFalV49aI1Gt4dqEcg1V1pypTXJsOC3hvX6xW0ks5Zm7lxhh-W8SjS6c02DQ59llb5gFI808qFrmviA0NjtH7rEs\" alt=\"\"\/><figcaption>How use of AI\/ML\/DL evolves in industry<\/figcaption><\/figure>\n\n\n\n<p>As the figure illustrates, the first step that most companies engage in is experimentation and prototyping. At this stage, the work on machine &amp; deep learning models is conducted purely in-house and without any connection to the products and services that the company offers to its customers. Work with virtually any ML\/DL approach follows the process shown in the figure below. There are four stages: assemble datasets (or data pipes), create models, train &amp; evaluate and, finally, deployment. There are two iterative processes. The inner loop is concerned with the typical activity of creating a model, training it, evaluating it and then tweaking the model with the intent of increasing accuracy and reducing error rates. 
The outer loop illustrates the periodic or continuous retraining of models based on the most recent data and the subsequent continuous deployment of models into operation. <br><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/ofFbPz6TpYDm_qEpdfsLVhBJx9lgHNw1-aJ4PHZ3UPzeqPJzuE07t8E9m3CWQdTwH1I8DJ4G7ZvFtkl4Yw1YDwqK_3kz1QZRusaSFGRm8Y13KLt5S379zpQCY-iREFpi4fo5uzEU\" alt=\"\"\/><figcaption>The basic ML\/DL development process<\/figcaption><\/figure>\n\n\n\n<p>Our research shows that companies experience various challenges in each of the steps of the process and that these challenges depend on the evolution stage where the company finds itself. For companies in the experimentation and prototyping stage, I\u2019ll describe the key challenge in each process step.<br><\/p>\n\n\n\n<p>For the \u201cassemble datasets\u201d step (which in later stages becomes the data pipelines step), the very activity of assembling the right datasets for training and validation purposes often proves to be a significant challenge. Although all companies tend to drown in data, this data often has unclear semantics and it is often unclear how it was collected, resulting in datasets that are not necessarily representative of the data that the models would encounter during operations. As a data point: a company that I visited recently claimed that more than 90% of all effort in the data analytics team went to assembling datasets and setting up reliable data pipelines. Although easily underestimated, this is a major challenge.<br><\/p>\n\n\n\n<p>The \u201ccreate models\u201d step is concerned with creating ML\/DL models that perform well on the data that characterises the problem domain. As a well-performing model is highly dependent on the characteristics of the input data, any issues during the previous step automatically affect the quality of the model. 
In addition, especially in this early stage, companies often lack experienced talent, which exacerbates the situation.<br><\/p>\n\n\n\n<p>The \u201ctrain &amp; evaluate\u201d step typically struggles with the fact that it is hard to establish the problem specification and desired outcome, as well as to obtain datasets that capture a solid ground truth that can be used as a reference for training and evaluating models. As a consequence, it can prove difficult to determine which model is superior as well as whether any of the models is of sufficient accuracy.<br><\/p>\n\n\n\n<p>Due to the nature of this stage, there is no deployment mechanism yet. The challenges with setting up a deployment mechanism are discussed in future articles covering the higher stages in the evolution model.<br><\/p>\n\n\n\n<p>Concluding, the first stage in adopting ML\/DL in your products, systems and solutions is concerned with experimentation and prototyping. During this stage, the predominant challenge is the establishment of datasets of sufficient quality as a basis for model creation, training and evaluation. These datasets need to be representative of the data that will, during operations, come through the data pipelines. Our research shows that companies struggle with data quality in this stage and that the subsequent steps in the development process are negatively affected as a result. 
So, get going with ML\/DL yesterday, but focus your energy where it counts: high-quality data sets.<br><\/p>\n\n\n\n<p><strong>Reference<\/strong>: Lucy Ellen Lwakatare, Aiswarya Raj, Jan Bosch, Helena Holmstr\u00f6m Olsson and Ivica Crnkovic, A taxonomy of software engineering challenges for machine learning systems: An empirical investigation, XP 2019 (forthcoming), 2019.<\/p>\n\n\n\n<p><em>To get more insights earlier, sign up for my newsletter at <\/em><a href=\"https:\/\/mailto:jan@janbosch.com\/\"><em>jan@janbosch.com<\/em><\/a><em> or follow me on<\/em><a href=\"https:\/\/janbosch.com\/blog\"> <em>janbosch.com\/blog<\/em><\/a><em>, LinkedIn (<\/em><a href=\"https:\/\/www.linkedin.com\/in\/janbosch\/\"><em>linkedin.com\/in\/janbosch<\/em><\/a><em>) or Twitter (<\/em><a href=\"https:\/\/twitter.com\/JanBosch\"><em>@JanBosch<\/em><\/a><em>).<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week I got the opportunity to speak at the initiative seminar organized by the Chalmers AI Research center (CHAIR). 
The key message in my presentation was that working with artificial intelligence (AI) and specifically machine &amp; deep learning (ML\/DL) constitutes a major software engineering challenge that is severely underestimated by companies that start to &#8230; <a title=\"Machine &#038; Deep Learning: Experimentation Stage\" class=\"read-more\" href=\"https:\/\/janbosch.com\/blog\/index.php\/2019\/03\/06\/machine-deep-learning-experimentation-stage\/\" aria-label=\"Read more about Machine &#038; Deep Learning: Experimentation Stage\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"generate_page_header":"","footnotes":""},"categories":[15,4,9],"tags":[],"_links":{"self":[{"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/851"}],"collection":[{"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=851"}],"version-history":[{"count":2,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/851\/revisions"}],"predecessor-version":[{"id":854,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/851\/revisions\/854"}],"wp:attachment":[{"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=851"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=851"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/janbosch.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=851"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}