{"id":258,"date":"2022-04-11T09:00:00","date_gmt":"2022-04-11T09:00:00","guid":{"rendered":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/?p=258"},"modified":"2024-03-10T16:44:42","modified_gmt":"2024-03-10T16:44:42","slug":"metric-learning-for-simulation-analytics","status":"publish","type":"post","link":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/2022\/04\/11\/metric-learning-for-simulation-analytics\/","title":{"rendered":"Metric Learning For Simulation Analytics"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<p class=\"wp-block-paragraph\">Usual output analysis of simulations, which is done at an aggregate level, gives limited insight on how a system and its performance change throughout the simulation. To gain greater insight regarding this, you can think of a simulation as a generator of dynamic sample paths. When we consider that we are in the age of &#8220;big data&#8221;, it&#8217;s now pretty reasonable to keep the full sample path data and to explore how to use it for deeper analysis. This can be done in a way that supports real-time predictions and reveals the factors that drive the dynamic performance. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this post, we&#8217;ll look at the emerging field of <strong>simulation analytics<\/strong>.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>What is simulation analytics?<\/li><li>Metric learning for simulation<\/li><li>A simple example<\/li><li>Some final thoughts<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">1. 
What is Simulation Analytics?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The idea of <em>simulation analytics<\/em> was first described by <a rel=\"noreferrer noopener\" href=\"https:\/\/doi.org\/10.1057\/jos.2015.22\" data-type=\"URL\" data-id=\"https:\/\/doi.org\/10.1057\/jos.2015.22\" target=\"_blank\">Barry Nelson<\/a>. It is not just &#8220;saving all the simulation data&#8221; and then applying modern data-analysis tools. It explores the differences between real and simulated data. Nelson states that the objectives of simulation analytics are to generate the following:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>dynamic conditional statements<\/strong>: relationships of inputs and system state to outputs, and of outputs to other (possibly time-lagged) outputs<\/li><li><strong>inverse conditional statements<\/strong>: relationships of outputs to inputs or the system state<\/li><li><strong>dynamic distributional statements<\/strong>: full characterization of the observed output behaviour<\/li><li><strong>statements on multiple time scales<\/strong>: both high-level aggregation and individual event times<\/li><li><strong>comparative statements<\/strong>: how and why alternative system designs differ<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">2. Metric Learning for Simulation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The remainder of this post is a discussion of the <a href=\"https:\/\/ieeexplore.ieee.org\/document\/9383904\" data-type=\"URL\" data-id=\"https:\/\/ieeexplore.ieee.org\/document\/9383904\" target=\"_blank\" rel=\"noreferrer noopener\">work done<\/a> by one of my STOR-i colleagues, Graham Laidler, and his supervisors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can use the available sample path data to build a predictive model of the dynamic system response. 
In particular, they use <strong>k-nearest-neighbour<\/strong> <strong>classification <\/strong>of the system state with metric learning to define the measure of distance <a id=\"_ftn1\" href=\"#_ftnref1\">[1]<\/a>. In kNN classification, a simple rule is used to classify instances according to the labels of their k nearest neighbours. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Under this definition, the paper uses <\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>binary labels <span class=\"wp-katex-eq\" data-display=\"false\"> y_i \\in \\{0,1\\} <\/span> <\/li><li>instances <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span>, where each instance is the system state at time <span class=\"wp-katex-eq\" data-display=\"false\"> t_i <\/span>. More specifically, this refers to some subset of the information generated by the simulation up to time <span class=\"wp-katex-eq\" data-display=\"false\"> t_i <\/span>.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The classification for an instance <span class=\"wp-katex-eq\" data-display=\"false\"> x^* <\/span> is<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><span class=\"wp-katex-eq katex-display\" data-display=\"true\"> \\hat{y}^* = \\begin{cases} 1, &amp; \\text{if } \\sum_{i=1}^k y^{*(i)} \\geq c, \\\\ 0, &amp; \\text{otherwise}, \\end{cases}<\/span><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where <span class=\"wp-katex-eq\" data-display=\"false\"> c \\in [0, \\infty) <\/span> is some threshold and <span class=\"wp-katex-eq\" data-display=\"false\"> y^{*(i)} \\text{ for } i = 1, \\dots, k <\/span>  are the observed classification labels that correspond to the k instances nearest to <span class=\"wp-katex-eq\" data-display=\"false\"> x^* <\/span>. In words, <strong>if c or more of the k nearest neighbours to <span class=\"wp-katex-eq\" data-display=\"false\"> x^* <\/span> are observed to be 1, then <span class=\"wp-katex-eq\" data-display=\"false\"> x^* <\/span> is classified as 1 by the model. 
<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The paper then turns to quantifying the similarity of instances, since nearest-neighbour classifiers assume that instances that are similar in terms of <span class=\"wp-katex-eq\" data-display=\"false\"> x <\/span> are also similar in terms of <span class=\"wp-katex-eq\" data-display=\"false\"> y <\/span>. The authors attempt to fully characterise the system by including multiple predictors in their kNN model. Because <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span> is multi-dimensional, not all variables may be comparable with respect to scale or interpretation, so the plain Euclidean distance is not appropriate. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So we now look at <strong>metric learning<\/strong>, which automates the process of defining a suitable distance metric.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The aim of metric learning is to adapt a distance function over the space of <span class=\"wp-katex-eq\" data-display=\"false\"> x <\/span>. The paper uses <em>Mahalanobis metric learning<\/em>, which has a distance function parametrized by <span class=\"wp-katex-eq\" data-display=\"false\"> M <\/span>, a symmetric positive semi-definite matrix; that is, <span class=\"wp-katex-eq\" data-display=\"false\"> d_M(x_i, x_j) = \\sqrt{(x_i - x_j)^\\top M (x_i - x_j)} <\/span>. The metric learning problem is an optimization that minimizes, with respect to <span class=\"wp-katex-eq\" data-display=\"false\"> M <\/span>, the sum of a loss function that penalizes violations of the training constraints under the distance metric and a function that regularizes the values of <span class=\"wp-katex-eq\" data-display=\"false\"> M <\/span>. The training constraints are similarity, dissimilarity and relative similarity constraints, which are set using prior knowledge about the instances or using the class labels.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
A Simple Example<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To evaluate the model, the authors set up a specific formulation of the metric learning problem. In this formulation, the similarity and dissimilarity constraints are partly based on large margin nearest neighbour (LMNN) classification<a href=\"#_ftn1\">[2]<\/a>. Because of the high-dimensional input, a global clustering of each class may not be appropriate, so a local neighbourhood approach was used when defining these constraint sets. The local neighbourhood of an instance <span class=\"wp-katex-eq\" data-display=\"false\"> x_i <\/span> was defined as the <em>q<\/em> nearest points in Euclidean distance. Points in that local neighbourhood were classified as similar if they had the same <span class=\"wp-katex-eq\" data-display=\"false\"> y <\/span> value and dissimilar if they did not. The aim was to minimise the sum of squared distances between instances classified as similar while keeping the average distance between dissimilar instances greater than 1. They set the local neighbourhood size to <em>q = 20 <\/em>and used <em>k = 50 <\/em>nearest neighbours. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One of the illustrations they applied it to was a simple <strong>stochastic activity network<\/strong>. The input space was the 5 activity times and the output was whether the longest path length was greater than 5. The activity times were i.i.d. <span class=\"wp-katex-eq\" data-display=\"false\"> X_i \\sim \\text{Exp}(1) <\/span>. A total of 10,000 replications of the network were run. Because the data-generating mechanism is known exactly, this example was useful for evaluating the model, since the authors understood what the output <em>M <\/em>should reveal. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-191830.png\" alt=\"\" class=\"wp-image-270\" width=\"556\" height=\"312\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-191830.png 910w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-191830-300x168.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-191830-768x431.png 768w\" sizes=\"auto, (max-width: 556px) 100vw, 556px\" \/><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The diagonal elements of <em>M<\/em> indicate the weight given to the difference in each variable when deciding whether instances are similar or not. From the results, <span class=\"wp-katex-eq\" data-display=\"false\"> X_1, X_3, X_5 <\/span> were the most relevant, as was expected from the intuition of the problem. The off-diagonal terms of <em>M <\/em>indicate the impact of interactions between variables. 
Using 2-5 fold cross-validation (CV), the metric kNN model was a better classifier than a logistic regression model.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"392\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355-1024x392.png\" alt=\"\" class=\"wp-image-271\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355-1024x392.png 1024w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355-300x115.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355-768x294.png 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355-1536x589.png 1536w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192355.png 1639w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Visualisation of M (left), ROC curves for the classification (right)<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The authors then added noise variables to the input data. This makes the setting more realistic, since multi-dimensional characterizations are likely to include variables that have little or no relationship to the output variable. 
Metric learning was able to filter out the noise variables while still detecting the relationship between the 5 initial variables.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"393\" src=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626-1024x393.png\" alt=\"\" class=\"wp-image-272\" srcset=\"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626-1024x393.png 1024w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626-300x115.png 300w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626-768x295.png 768w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626-1536x589.png 1536w, https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-content\/uploads\/sites\/38\/2022\/04\/Screenshot-2022-04-11-192626.png 1582w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>M for the noise-augmented data (left), ROC curves for classification (right)<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">4. Some Final Thoughts<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I believe this solution is valuable for 2 main reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>It proposes a method for more in-depth analysis of simulation results which may be useful for real-time predictions and identifying drivers of system performance. 
The method is useful for revealing relationships between different components of the system and their effect on performance.<\/li><li>The method allows us to apply kNN to high-dimensional input data without needing to manually trim the state space. This allows analysis to be done without prior knowledge about which variables may or may not be relevant, as they can all be included and the metric learning will reveal their relevance.<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Learn More<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a id=\"_ftn1\" href=\"#_ftnref1\">[1]<\/a> Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"#_ftnref1\">[2]<\/a> Weinberger, K. Q., and L. K. Saul. 2009. \u201cDistance Metric Learning for Large Margin Nearest Neighbor Classification\u201d. Journal of Machine Learning Research 10(9):207\u2013244.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>The idea of simulation analytics was first described by Barry Nelson. It is not just &#8220;saving all the simulation data&#8221; and then applying modern data-analysis tools. 
In this post, we look at this emerging field.<\/p>\n","protected":false},"author":41,"featured_media":260,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[14],"tags":[18,16,17],"class_list":["post-258","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-random-research","tag-knn","tag-operations-research","tag-simulation","post-with-thumbnail","post-with-thumbnail-large"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/comments?post=258"}],"version-history":[{"count":11,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/258\/revisions"}],"predecessor-version":[{"id":276,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/posts\/258\/revisions\/276"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/media\/260"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/media?parent=258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\
/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/categories?post=258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/danielle-notice\/wp-json\/wp\/v2\/tags?post=258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}