adding listening eng, contact update

2022-09-13 22:13:10 +01:00 · edf5c6ace2
commit edf5c6ace2
parent 63c6bc4fd9
13 changed files with 122 additions and 8 deletions
--- a/config.toml
+++ b/config.toml
@@ -5,6 +5,16 @@ theme = 'hugo-coder'

 paginate = 40

+[taxonomies]
+  category = 'categories'
+  tag = 'tags'
+  art = 'art'
+
+[markup]
+  [markup.goldmark]
+    [markup.goldmark.renderer]
+      unsafe = true
+
 [params]
  author = "sarsoo"
  info = "dev & engineering"
@@ -50,12 +60,22 @@ paginate = 40
  weight = 2
  url  = "dev-engineering/"

+#[[menu.main]]
+#  name = "Music"
+#  weight = 3
+#  url  = "music/"
+
+#[[menu.main]]
+#  name = "Art"
+#  weight = 4
+#  url  = "art/"
+
 [[menu.main]]
  name = "Posts"
-  weight = 3
+  weight = 5
  url  = "posts/"

 [[menu.main]]
  name = "Contact"
-  weight = 4
+  weight = 6
  url  = "contact/"
--- a/content/contact.md
+++ b/content/contact.md
@@ -3,8 +3,12 @@ title: "Contact"
 date: 2020-12-25T00:04:40+00:00
 ---

+{{% avatar "/images/holo-avatar.jpg" %}}
+
 UK-based, award-winning post-grad [electronic engineering](/dev-engineering) student & previous Disney intern

 Multilingual programmer working from [embedded systems](/posts/iot) to [holoportation](/holo) and [full-stack web-dev](/mixonomer)

-I draw sometimes too
+[I draw sometimes too](https://www.instagram.com/pack_it_in_/)
+
+Give us a bell at [hello@sarsoo.xyz](mailto:hello@sarsoo.xyz)
--- a/content/holo/index.md
+++ b/content/holo/index.md
@@ -12,6 +12,8 @@ draft: false
 "LiveScan3D: A Fast and Inexpensive 3D Data Acquisition System for Multiple Kinect v2 Sensors". in 3D Vision (3DV), 2015 International Conference on, Lyon, France, 2015
 `

+{{% giphy l2JJmXRcFoEJNXyEM %}}
+
 The app works by capturing what is called a [_point cloud_](https://en.wikipedia.org/wiki/Point_cloud), a cluster of coloured points. These points act as the pixels of the image with an extra third coordinate for depth. The coordinates of these points are sent with their RGB values; a good enough resolution allows rendering these points to create a decent real-time image for AR/VR streaming. The original version of the software used the Xbox Kinect camera for the Xbox One but it also supports the new Azure Kinect. 

 ## On This Page
--- a/content/posts/draught/index.md
+++ b/content/posts/draught/index.md
@@ -6,11 +6,9 @@ draft: false

 ![ci](https://github.com/sarsoo/draught/actions/workflows/test.yml/badge.svg)

-Rust is a great language for low-level work. Its memory model and syntactic sugar make it attractive for new projects, eliminating many types of bugs by design.
+_I wrote a checkers game with a computer AI player that runs locally in the browser using compiled Rust + WASM. An AI can take a long time to make a move if you let it look many moves into the future; using compiled Rust allows a smarter computer player than if the player were run in JavaScript._

-Likewise, WebAssembly is an attractive prospect for speeding up client-side web code. Combining the compiled performance, type safety and memory safety of Rust with the mature GUI development ecosystem of HTML/Javascript/CSS extends the possibilities for what can be completed client-side.
-
-It's these ideas that made me want to explore Rust + WASM - prior to this, I knew little about either beyond the basic tutorials. In my masters AI module, I studied adversarial game models including the [MiniMax algorithm](https://en.wikipedia.org/wiki/Minimax). This algorithm generates a tree of possible moves and compares scores to decide which move should be made. Generating this tree can be expensive as it explodes exponentially, as such it would be a good candidate to try and speed up using compiled Rust over interpreted Javascript.
+I wanted to play with Rust + WASM to see what could be done in the browser without JavaScript; previously I knew little about either beyond the basic tutorials. In my master's AI module, I studied adversarial game models including the [MiniMax algorithm](https://en.wikipedia.org/wiki/Minimax). This algorithm generates a tree of possible moves and compares scores to decide which move should be made. Generating this tree can be expensive as it grows exponentially, so it would be a good candidate to try and speed up using compiled Rust over interpreted JavaScript. If I'm honest, I'm not crazy passionate about checkers but I thought it would be a cool application of some of my uni theory.

 ![checkers board](checkers-board.png)
 ###### Standard checkers board rendered on an HTML canvas using Rust
--- a/content/posts/listening-analysis/hrsperday.png
+++ b/content/posts/listening-analysis/hrsperday.png
--- a/content/posts/listening-analysis/index.md
+++ b/content/posts/listening-analysis/index.md
@@ -4,3 +4,84 @@ date: 2021-02-20T12:22:40+00:00
 draft: false
 ---

+[Source Code](https://github.com/Sarsoo/listening-analysis)
+
+As my [Music Tools](https://sarsoo.xyz/music-tools/) project progressed, I found myself with a cloud environment and a growing dataset of my listening habits to explore. __Spotify__ provides audio features for all of the tracks on its service. These features describe qualities of the track such as how instrumental it is and how much energy it has. I wanted to investigate whether the features that describe my larger genre-playlists were coherent enough to use as the classes of a classifier. I compared the performance of SVMs with shallow multi-layer perceptrons.
+
+All of these investigations are part of my [`listening-analysis`](https://github.com/Sarsoo/listening-analysis) repo; the work is spread out over a couple of different notebooks.
+
+## [`analysis`](https://github.com/Sarsoo/listening-analysis/blob/master/analysis.ipynb)
+
+Introducing the dataset, high-level explorations including average Spotify descriptor over time and hours per day of music visualisations
+
+## [`artist`](https://github.com/Sarsoo/listening-analysis/blob/master/artist.ipynb), [`album`](https://github.com/Sarsoo/listening-analysis/blob/master/album.ipynb), [`track`](http://github.com/Sarsoo/listening-analysis/blob/master/track.ipynb), [`playlist`](https://github.com/Sarsoo/listening-analysis/blob/master/playlist.ipynb)
+
+Per-object investigations such as how much I have listened to each item over time, its average descriptor and comparisons of the most-listened-to items
+
+## `playlist classifier` ([`SVM`](https://github.com/Sarsoo/listening-analysis/blob/master/playlist-svm.ipynb)/[`MLP`](https://github.com/Sarsoo/listening-analysis/blob/master/playlist-nn.ipynb))
+
+Investigations into whether my large genre-playlists can be used as classes for a genre classifier. Comparing and evaluating different types of support-vector machine and neural networks
+
+## [`stats`](https://github.com/Sarsoo/listening-analysis/blob/master/stats.ipynb)
+
+Dataset statistics including the number of __Last.fm__ scrobbles that have an associated __Spotify__ URI (critical for attaching a __Spotify__ descriptor)
+
+## On This Page
+
+1. Dataset
+2. Playlist Classifier
+    - Class Weighting
+    - Data Stratification
+
+{{< figure src="svm-1.png" caption="Confusion matrix for a SVM playlist classifier" alt="svm" >}}
+
+## Dataset
+
+The dataset I wanted to explore was a combination of my __Spotify__ and __Last.fm__ data. __Last.fm__ records the music that you listen to on __Spotify__; I’ve been collating the information since November 2017. __Spotify__ was used primarily for two sources of data:
+
+1. Playlist tracklists
+    - Initially for exploring habits such as which playlists I listen to the most, the data then formed the classes for applied machine learning models
+2. Audio features
+    - For each track on the Spotify service, you can query for its audio features. This is a data structure describing information about the track including the key it’s in and the tempo. Additionally, there are 7 fields describing subjective qualities of the track including how instrumental a track is and how much energy it has
+
+These two sides of the dataset were joined using the __Spotify__ URI. As the __Last.fm__ dataset identifies tracks, albums and artists by name alone, these were mapped to __Spotify__ objects using the search API endpoint. With __Spotify__ URIs attached to the majority of my __Last.fm__ scrobbles, these scrobbles could then easily have their __Spotify__ audio features attached. This was completed using Google’s BigQuery service to store the data with a SQL interface.
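The join described in the paragraph above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the post says the real join was done in SQL on BigQuery, and every name, URI and feature value below is hypothetical.

```python
# Hypothetical scrobbles: Last.fm identifies tracks by name alone
scrobbles = [
    {"track": "Track A", "artist": "Artist 1"},
    {"track": "Track B", "artist": "Artist 2"},
    {"track": "Track C", "artist": "Artist 3"},
]

# Hypothetical outcome of searching Spotify for each (track, artist) pair;
# "Track C" had no match, so it gets no URI
uri_by_name = {
    ("Track A", "Artist 1"): "spotify:track:aaa",
    ("Track B", "Artist 2"): "spotify:track:bbb",
}

# Hypothetical audio features keyed by Spotify URI
features_by_uri = {
    "spotify:track:aaa": {"energy": 0.8, "instrumentalness": 0.1},
    "spotify:track:bbb": {"energy": 0.3, "instrumentalness": 0.9},
}


def attach_features(scrobbles, uri_by_name, features_by_uri):
    """Attach a URI and audio features to each scrobble, or None where no match exists."""
    enriched = []
    for scrobble in scrobbles:
        uri = uri_by_name.get((scrobble["track"], scrobble["artist"]))
        enriched.append({**scrobble, "uri": uri, "features": features_by_uri.get(uri)})
    return enriched


enriched = attach_features(scrobbles, uri_by_name, features_by_uri)
matched = sum(1 for s in enriched if s["uri"] is not None)
print(f"{matched} of {len(enriched)} scrobbles have a Spotify URI")
```

Scrobbles without a URI are kept with `None` for their features, which makes the coverage easy to count before any modelling.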
+
+{{< figure src="hrsperday.png" caption="Average time spent listening to music each day. Per-year polynomial line of best fit also visualised" alt="hours listening per day graph" >}}
+
+## Playlist Classifier
+
+My large genre playlists describe my tastes in genres across Rap, Rock, Metal and EDM. They’re some of my go-to playlists and they can be quite long. With these, I wanted to see how useful they could be from a classification perspective.
+
+The premise was this: could arbitrary tracks be correctly classified as one of these genres, as I would describe them through my playlist tracklists?
+
+{{< figure src="playlist-descriptor.png" caption="Average Spotify descriptor for each genre playlist being investigated and modelled" alt="average descriptor by playlist graph" >}}
+
+The scikit-learn library makes beginning to explore a dataset using ML models really fast. I began by using a support-vector machine. SVMs of differing kernels were evaluated and compared to see which type of boundaries best discriminated between the genres. The differences can be seen below.
+
+{{< figure src="svm-classes.png" caption="Confusion matrices for the different types of SVM evaluated" alt="playlist classifier svm by class" >}}
+
+| SVM Kernel  | RBF | Linear | Poly | Sigmoid |
+|-------------|-----|--------|------|---------|
+| Accuracy, % | 71% | 68%    | 70%  | 29%     |
+
+###### `.score()` for each SVM model
+
+From these, it can be seen that the _Radial Basis Function_ (RBF) and _polynomial_ kernels were the best performing with the _Sigmoid_ function being just awful. When implementing one of these models in __Music Tools__ playlist generation, these two kernels will be considered.
+
+### Class Weighting
+
+The playlists that I’m using aren’t all of the same length. My Rap and EDM playlists are around 1,000 tracks long while my Pop playlist is only around 100. This poses an issue when attempting to create models from these playlists. The Rap class, for example, will be much larger than the Pop class and take up more volume in the descriptor space. This can make it much harder to correctly classify tracks as the under-represented classes when larger ones dominate.
+
+This issue can be seen visualised below. In the left matrix no tracks were correctly classified as Pop; instead, half were classified as EDM and 20% as Rock.
+
+{{< figure src="svm.png" caption="Difference in classification accuracy when weighting the genres based on proportion" alt="svm classes" >}}
+
+There are many ways to begin mitigating this issue. One way is to penalise misclassifying under-represented classes more than the larger ones. This can be implemented in __scikit-learn__ by initialising the model with the `class_weight` parameter equal to `'balanced'`. This was the method used in the matrix on the right, above. It was highly effective in rebalancing the classes: all of them have comparable accuracy afterwards.
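A minimal sketch of the weighting that `class_weight='balanced'` applies, assuming the heuristic given in the scikit-learn documentation, `n_samples / (n_classes * count)`. The playlist sizes are hypothetical, rounded from the figures in the post, and `balanced_class_weights` is an illustrative helper rather than part of the repo.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical class sizes: Rap and EDM around 1,000 tracks, Pop around 100
labels = ["rap"] * 1000 + ["edm"] * 1000 + ["pop"] * 100
weights = balanced_class_weights(labels)

for genre, weight in weights.items():
    print(f"{genre}: {weight:.2f}")
```

With these sizes Pop receives a weight ten times that of Rap, so a misclassified Pop track costs proportionally more during training.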
+
+### Data Stratification
+
+Similar to class rebalancing, the dataset also required processing. Before using a model, a dataset is split between a _training_ set and a _test_ set. A default way to do this is to just take a random subset of the data for each set. It is crucial for properly evaluating the model that these sets are distinct, without overlap. However, a random split doesn’t take into account the relative occurrence of each class in either set. For example, it could leave more of one class or genre in one set than the other. As mentioned, the Pop class is much smaller than the other classes, so it is feasible that the whole genre could end up in either the training or test set instead of being split properly.
+
+Instead of allowing this to be determined by a random split, the dataset was _stratified_ when splitting. This applies the given ratio of training to test data to each class during the split, so that each class makes up the same proportion of both sets.
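A rough, dependency-free sketch of a stratified split, to show what the paragraph above describes. The post does not say which call was used; scikit-learn's `train_test_split` accepts a `stratify` argument for this purpose. The helper `stratified_split`, the track IDs and the class sizes here are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, test_fraction=0.25, seed=0):
    """Split (item, label) pairs so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)

    train, test = [], []
    for label, members in by_class.items():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(m, label) for m in members[:n_test]]
        train += [(m, label) for m in members[n_test:]]
    return train, test


# Hypothetical track IDs: a large Rap class and a small Pop class
labels = ["rap"] * 1000 + ["pop"] * 100
tracks = [f"{label}-{i}" for i, label in enumerate(labels)]

train, test = stratified_split(tracks, labels, test_fraction=0.2)
pop_in_test = sum(1 for _, label in test if label == "pop")
pop_in_train = sum(1 for _, label in train if label == "pop")
print(f"pop: {pop_in_train} train / {pop_in_test} test")
```

Shuffling and slicing within each class guarantees that the small Pop class appears in both sets, which a single random split over the whole dataset does not.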
+
+[Source Code](https://github.com/Sarsoo/listening-analysis)
--- a/content/posts/listening-analysis/playlist-descriptor.png
+++ b/content/posts/listening-analysis/playlist-descriptor.png
--- a/content/posts/listening-analysis/svm-1.png
+++ b/content/posts/listening-analysis/svm-1.png
--- a/content/posts/listening-analysis/svm-classes.png
+++ b/content/posts/listening-analysis/svm-classes.png
--- a/content/posts/listening-analysis/svm.png
+++ b/content/posts/listening-analysis/svm.png
--- a/layouts/partials/footer.html
+++ b/layouts/partials/footer.html
@@ -1,6 +1,6 @@
 <footer class="footer">
  <section class="container">
-    <!-- ©
+    <!--
    {{ if (and .Site.Params.since (lt .Site.Params.since now.Year)) }}
      {{ .Site.Params.since }} -
    {{ end }}
@@ -19,5 +19,6 @@
    {{ end }}

    <img src="/images/andy.png" width="80px" />
+    <span style="color: #b0b0b0">©</span>
  </section>
 </footer>
--- a/layouts/shortcodes/avatar.html
+++ b/layouts/shortcodes/avatar.html
@@ -0,0 +1 @@
+<img src="{{ .Get 0 }}" style="border-radius: 50%; width: 40rem; margin-left: auto; margin-right: auto; display: block;"/>
--- a/layouts/shortcodes/giphy.html
+++ b/layouts/shortcodes/giphy.html
@@ -0,0 +1,7 @@
+<!-- https://todayilearned.jm3.net/learnings/hugo-shortcode-for-giphy-embeds/ -->
+<div style="width:100%;height:0;padding-bottom:40%;position:relative;">
+    <iframe src="https://giphy.com/embed/{{ (index .Params 0) }}"
+      width="100%" height="100%" style="position:absolute"
+      frameBorder="0" allowFullScreen>
+    </iframe>
+</div>