
node packages updating

tl;dr

  1. > npm install depcheck -g - install depcheck globally
  2. > depcheck - check for redundant packages
  3. > npm un this-redundant-package - uninstall redundant packages (repeat for all redundant packages)
  4. Create a pull-request remove-redundant-packages

  1. > npm i - tidy up node_modules
  2. > npm audit - see vulnerability issues
  3. > npm audit fix - fix vulnerability issues that don't require attention
  4. Create a pull-request fix-vulnerability-issues

  1. > npm i npm-check-updates -g - install npm-check-updates globally
  2. > npm-check-updates - see how outdated packages are
  3. > npm outdated - see how outdated packages are
  4. > npm update --save - update packages respecting your semver constraints from package.json
  5. If you have packages that use major version 0.*.* you'll need to manually update these now
    • > npm install that-one-package@latest
  6. Create a pull-request update-packages-minor

If you're brave and can test/run your project easily:

  1. ncu -u - updates package.json to all latest versions as shown by npm-check-updates
    • this might introduce breaking changes
  2. npm i - update package-lock.json
  3. Test your project.
  4. Create a pull-request update-packages-major

If you're not brave or can't just YOLO and update all major versions:

  1. npm-check-updates - check again what is left to update
  2. npm i that-package@latest - update the major version of that-package
  3. Test your project.
    • JS is dynamically typed, so you might have just updated a package that breaks your project, but you won't know until you run your code
  4. Repeat for all packages.
  5. Create a pull-request update-packages-major

longer read

Need to update dependencies in a Node.js project? Here are my notes on this.

> npm i (npm install)

> npm i

added 60 packages, removed 124 packages, changed 191 packages, and audited 522 packages in 13s

96 packages are looking for funding
  run `npm fund` for details

10 vulnerabilities (2 low, 7 moderate, 1 high)

To address issues that do not require attention, run:
  npm audit fix

To address all issues possible (including breaking changes), run:
  npm audit fix --force

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.
npm i:

  • installs missing packages in node_modules
  • removes redundant packages from node_modules
  • installs correct versions of mismatched packages (if package-lock.json wants a different version than what is found in node_modules)
  • shows what is going on with packages in your project

> npm audit - shows a report on vulnerability issues in your dependencies

> npm audit fix - updates packages to address vulnerability issues (updates that do not require attention)

> npm outdated - shows a table with your packages and versions

$ npm outdated
Package      Current   Wanted   Latest  Location                  Depended by
glob          5.0.15   5.0.15    6.0.1  node_modules/glob         dependent-package-name
nothingness    0.0.3      git      git  node_modules/nothingness  dependent-package-name
npm            3.5.1    3.5.2    3.5.1  node_modules/npm          dependent-package-name
local-dev      0.0.3   linked   linked  local-dev                 dependent-package-name
once           1.3.2    1.3.3    1.3.3  node_modules/once         dependent-package-name

  • Current - what is in node_modules
  • Wanted - the most recent version that respects the version constraint from package.json
  • Latest - latest version from npm registry

To update to the latest minor+patch versions of your dependencies (Wanted) - npm outdated shows all you need to know, but I prefer the output of npm-check-updates.

> npm i npm-check-updates -g (-g -> global mode - package will be available on your whole machine)

> npm-check-updates - shows where an update will be a major/minor/patch update (I like the colors)

Checking C:\git\blog\package.json
[====================] 39/39 100%

 @azure/storage-blob         ^12.5.0  →      ^12.17.0
 adm-zip                     ^0.4.16  →       ^0.5.12
 axios                       ^0.27.2  →        ^1.6.8
 basic-ftp                    ^5.0.1  →        ^5.0.5
 cheerio                 ^1.0.0-rc.6  →  ^1.0.0-rc.12
 eslint                      ^8.12.0  →        ^9.2.0
 eslint-config-prettier       ^8.5.0  →        ^9.1.0
 eslint-plugin-import        ^2.25.4  →       ^2.29.1
 fast-xml-parser              ^4.2.4  →        ^4.3.6
 humanize-duration           ^3.27.3  →       ^3.32.0
 iconv                        ^3.0.0  →        ^3.0.1
 jsonwebtoken                 ^9.0.0  →        ^9.0.2
 luxon                        ^3.4.3  →        ^3.4.4

Let us update something

> npm update - perform updates respecting your semver constraints and update package-lock.json

> npm update --save - same as above but also updates package.json; always use this one

The behavior for packages with major version 0.*.* is different from that for versions >=1.0.0 (see npm help update).

npm update will most likely bump all minor and patch versions for you.

You can run npm update --save often.

What do the symbols in package.json mean?

https://stackoverflow.com/questions/22343224/whats-the-difference-between-tilde-and-caret-in-package-json/25861938#25861938
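In short:

~1.2.3  allows patch updates only          (>=1.2.3 <1.3.0)
^1.2.3  allows minor + patch updates       (>=1.2.3 <2.0.0)
^0.2.3  allows patch updates only          (>=0.2.3 <0.3.0) - for 0.x.x versions the caret is stricter, which is why npm update won't bump those majors for you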

npm update --save vs npm audit fix

npm audit fix will only update packages to fix vulnerability issues

npm update --save will update all packages it can (respecting semver constraints)

Do I have unused dependencies?

> npm install depcheck -g

> depcheck - shows unused dependencies. depcheck scans for require/import statements in your code so you might be utilizing a package differently but depcheck will consider it unused (ex. when you import packages using importLazy).
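For example, with import-lazy there is no plain require('lodash') call for a static scanner to find (a sketch using import-lazy's documented API):

const importLazy = require('import-lazy')(require);

// lodash is only actually required on first use -
// per the note above, depcheck may report it as an unused dependency
const _ = importLazy('lodash');
console.log(_.chunk([1, 2, 3, 4], 2));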

npm-check

> npm i npm-check -g

> npm-check - a different tool to help with dependencies (I didn't use it)

honorable mentions

> npm ls - list installed packages (from node_modules)

> npm ls axios - show all versions of axios and why we have them

npm ls will not show you the origin of not-installed optional dependencies.

Consider this - you develop on a Windows machine and deploy your solution to a Linux box. On Windows (see below) you might think node-gyp-build is not used in your solution.

> npm ls node-gyp-build
test-npm@1.0.0 C:\git\test-npm
`-- (empty)

But on a Linux box it will be used:

> npm ls node-gyp-build
npm-test-proj@1.0.0 /git/npm-test-proj
└─┬ kafka-lz4-lite@1.0.5
  └─┬ piscina@3.2.0
    └─┬ nice-napi@1.0.2
      └── node-gyp-build@4.8.1

axios, cookies & more

axios

axios - promise-based HTTP client for node.js

  • when used in node.js axios uses http module (https://nodejs.org/api/http.html)
  • in node axios does not support cookies by itself (https://github.com/axios/axios/issues/5742)
    • there are npm packages that add cookies support to axios
  • when used in browsers it uses XMLHttpRequest
  • when used in browsers cookies work by default

Why would you use axios over the plain http module from node?

Axios makes http requests much easier. Try using the plain http module and you'll convince yourself.


Are there other packages like axios?

Yes - for example node-fetch https://github.com/node-fetch/node-fetch


When making a request axios creates a default http and https agent - https://axios-http.com/docs/req_config (axios probably uses global agents). You can specify custom agents for a specific request or set custom agents as default agents to use with an axios instance.

const a = require('axios');
const http = require('node:http');

(async () => {
    // configure your agent as needed
    const myCustomAgent = new http.Agent({ keepAlive: true });

    // use your custom agent for a specific request
    // (httpAgent is used for http:// URLs; https:// URLs use the httpsAgent option)
    const x = await a.get('http://example.com/', { httpAgent: myCustomAgent });
    console.log(x.status);

    // set your agent as the default for all requests
    a.defaults.httpAgent = myCustomAgent;
})();

What are http/s agents responsible for?

http/s agents handle creating/closing sockets, TCP connections, etc. They talk to the OS and manage connections to hosts.


cookies

Without extra packages you need to write code that reads response headers, looks for Set-Cookie headers, stores cookies somewhere, and adds Cookie headers to subsequent requests.
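A minimal sketch of doing it by hand (no attribute parsing, no expiry, no domain/path matching - real cookie jars do much more; the URLs are made up):

const axios = require('axios');

(async () => {
  const res = await axios.get('http://example.com/login');
  // in node axios exposes Set-Cookie response headers as an array
  const setCookies = res.headers['set-cookie'] ?? [];
  // keep only the name=value pairs, drop attributes like Path/Expires
  const cookieHeader = setCookies.map(c => c.split(';')[0]).join('; ');

  // send the stored cookies back with the next request
  await axios.get('http://example.com/profile', {
    headers: { Cookie: cookieHeader },
  });
})();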

https://www.npmjs.com/package/http-cookie-agent

Manages cookies for node.js HTTP clients (e.g. Node.js global fetch, undici, axios, node-fetch). http-cookie-agent implements an http/s agent that inspects request headers and does the cookie-related magic for you. It uses the CookieJar class from the tough-cookie package to parse & store cookies.

import axios from 'axios';
import { CookieJar } from 'tough-cookie';
import { HttpCookieAgent, HttpsCookieAgent } from 'http-cookie-agent/http';

const jar = new CookieJar();

const a = axios.create({
  httpAgent: new HttpCookieAgent({ cookies: { jar } }),
  httpsAgent: new HttpsCookieAgent({ cookies: { jar } }),
});
// now we have an axios instance supporting cookies
await a.get('https://example.com');

axios-cookiejar-support

https://www.npmjs.com/package/axios-cookiejar-support

Depends on http-cookie-agent and tough-cookie. Does the same as http-cookie-agent but you don't have to create the http/s agents yourself. This is a small package that just intercepts axios requests and makes sure the custom http/s agents are used (see the source).

It saves you a bit of typing but you can't use your own custom agents. If you need to configure your http/s agents (ex. with a certificate) - use http-cookie-agent directly (see the related github issues).

import axios from 'axios';
import { wrapper } from 'axios-cookiejar-support';
import { CookieJar } from 'tough-cookie';

const jar = new CookieJar();
const client = wrapper(axios.create({ jar }));

await client.get('https://example.com');

https://www.npmjs.com/package/tough-cookie

npm package - cookie parsing/storage/retrieval (tough-cookie itself does nothing with http requests).
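A tiny sketch of tough-cookie used standalone (v4 has a promise API; the URL is made up):

const { CookieJar } = require('tough-cookie');

(async () => {
  const jar = new CookieJar();
  // store a cookie as if http://example.com had sent it
  await jar.setCookie('session=abc123; Path=/', 'http://example.com');
  // retrieve the Cookie header value that applies to a given URL
  console.log(await jar.getCookieString('http://example.com')); // session=abc123
})();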

A bit about cookies

https://datatracker.ietf.org/doc/html/rfc6265 - RFC describing cookies.

https://datatracker.ietf.org/doc/html/rfc6265#page-28 - concise paragraph on Third-party cookies.

A server responds with a Set-Cookie header. The client can store the cookie and send it back. Cookies have a specific format described in this document.
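For example (a made-up exchange):

server -> client:   Set-Cookie: session=abc123; Path=/; HttpOnly; Secure
client -> server:   Cookie: session=abc123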

Random stuff

https://npmtrends.com/cookie-vs-cookiejar-vs-cookies-vs-tough-cookie

interesting - cookie (for servers) has been more popular than tough-cookie (for clients) since ~2023.

Is this due to more server-side apps being written in node?

Packages we don't use

  • cookie - npm package - cookies for servers
  • cookies - npm package - cookies for servers (different from cookie)
  • cookiejar - npm package - a different cookie jar for clients

fetch & fetch & node-fetch

fetch - standard created by WHATWG meant to replace XMLHttpRequest - https://fetch.spec.whatwg.org/

fetch - an old npm package to fetch web content - don't use it

node-fetch - community implemented fetch standard as a npm package - go ahead and use it

fetch - node's native implementation of the fetch standard - https://nodejs.org/dist/latest-v21.x/docs/api/globals.html#fetch

Since fetch is the standard for both browsers and node, Chrome has a neat feature to export captured requests as fetch calls.
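In node >= 18 the native fetch works out of the box (a minimal sketch):

// fetch is a global in node >= 18 - no imports needed
(async () => {
  const res = await fetch('https://example.com/');
  console.log(res.status);
})();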

chat-gpt crap

When researching I came across some chat-gpt generated content. You read it thinking it will be something but it's trash.

https://www.dhiwise.com/post/managing-secure-cookies-via-axios-interceptors -> this article from 2024 tells you to implement cookies yourself and doesn't even mention the words "package" or "module"

https://medium.com/@stheodorejohn/managing-cookies-with-axios-simplifying-cookie-based-authentication-911e53c23c8a -> doesn't mention that cookies don't work in axios run in node without extra packages (at least this one admits that chat-gpt helped, though I bet it's fully written by chat-gpt)


inco note - our http client is misleading; it uses the same agent for http and https, so it should maybe be called customAgent

axios and fiddler

Using a request interceptor (proxy) like fiddler helps during development and debugging.

To make fiddler intercept axios requests we have to tell axios that there is a proxy to which all requests should go. The proxy then forwards those requests to the actual destination.

http_proxy=... // set proxy for http requests
https_proxy=... // set proxy for https requests
no_proxy=domain1.com,domain2.com // comma separated list of domains that should not be proxied

The proxy for both http and https can be the same url.
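For example, with fiddler's default endpoint (fiddler listens on port 8888 by default; adjust if you changed it):

http_proxy=http://127.0.0.1:8888
https_proxy=http://127.0.0.1:8888
no_proxy=localhost,127.0.0.1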

Read more - https://axios-http.com/docs/req_config

When using fiddler on windows I suggest going to Network & internet > Proxy and disabling the proxies there (fiddler sets them by default). This way fiddler will only receive requests from the process where we set the http(s)_proxy env vars.

fiddler and client certificates

I was not able to make fiddler work with client certificates. It should be done like this - https://docs.telerik.com/fiddler/configure-fiddler/tasks/respondwithclientcert - but I couldn't get it to work.

honorable mentions

I would like to try out - https://www.npmjs.com/package/proxy-agent at some point

I don't fully understand withCredentials

axios & cookies demo

> npm i
> node server.mjs
open browser and go to http://127.0.0.1:3000
cookies are supported
> node test.js (from another console)
cookies are not supported

axios, certificates, etc

To use axios with a client certificate you need to configure the https agent with the key and cert. The key and cert need to be in PEM format. They can both be in the same PEM file or in separate PEM files. (I did not try it but you should be able to merge and split your PEM files.)

https://nodejs.org/api/tls.html#tlscreatesecurecontextoptions
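A minimal sketch (the file names are made up - point them at your own PEM files):

const axios = require('axios');
const https = require('node:https');
const fs = require('node:fs');

// key.pem/cert.pem are placeholders for your client certificate files
const httpsAgent = new https.Agent({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
});

(async () => {
  const res = await axios.get('https://example.com/', { httpsAgent });
  console.log(res.status);
})();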

to try out - https://www.npmjs.com/package/proxy-agent

Short post on premature optimization and error handling

Premature optimization is the root of all evil ~Donald Knuth

Don't pretend to handle more than you handle ~me just now

My team took over some scrapers. After a few months an issue was reported stating that accounting is missing data from one of the scrapers. No errors were logged or seen by our team.

My colleague investigates the issue. Findings:

  • most recent data is indeed missing in our database (it is already available in the API)
  • the data is often delayed (compared to when it's available in the remote API)
  • the data is not time critical but a delay of hours or days is vexing (remember folks - talk to your users or customers)
  • the scraper is using parallelism to send all requests at once (probably to get the data faster)
  • the API doesn't like our intense scraping and bans us from accessing the API, sometimes for hours
  • we never saw any errors because the error handling looks like this:
try {
    data = hit the REST API using multiple parallel requests
    persist(data)
} catch {
    log.info("No data found")
}

Take away

  • talk to your users - in this case to learn that this data is not time critical
  • don't optimize prematurely
  • don't catch all exceptions pretending you handle them (see the sketch below)
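A sketch of what that handler could look like instead, in the same pseudocode (NoDataFound is a made-up case - the point is to only swallow what you truly expect):

try {
    data = hit the REST API, sequentially
    persist(data)
} catch (e) {
    if (e is NoDataFound) {
        log.info("No data found")
    } else {
        log.error("Scraping failed", e) // surface everything else
        throw
    }
}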

More venting

I have seen my share of premature optimizations. AFAIR I always managed to have a conversation about the (un)necessity of an optimization and agree to prefer readability/simplicity over premature optimization.

If you see premature optimization my advice is "talk to the perp". People do what they consider necessary and right. They might optimize code hoping to save the company money or time.

If you have the experience to know that saving 2kB of RAM in an invoicing app run once a month is not worth using that obscure data structure - talk to those who don't yet know it. Their intentions are good.

I'm pretty sure I'm also guilty of premature optimization, just can't recall any instance as my brain is probably protecting my ego by erasing any memories of such mistakes from my past.

An example

One example of premature optimization stuck with me. I recall reviewing code as below

foreach(var gasPoint in gasPoints)
{
    if (gasPoint.Properties.Any())
    {
        foreach (var x in gasPoint.Properties)
        {
            // do sth with x
        }
    }
}

The review went something like this:

me: drop the if, foreach handles empty collections just fine

author: but this is better

me: why?

author: if the collection is empty we don't even use the iterator

me: how often is this code run?

author: currently only for a single entry in gasPoints, but there can be more

me: how many more and when?

author: users might create an entry for every gas pipeline connection in Europe

me: ok, how many is that?

We agreed to drop the if after realizing that:

We have ~30 countries in Europe; even if they all connect with each other there will be at most ~400 gas connections to handle here. We don't know that the if is faster than the iterator. 400 is extremely optimistic. We have 1 entry now, and realistically we will have 10 gasPoints in 5 years.

The conversation wasn't as smooth as I pretend here but we managed.

https://wiki.c2.com/?PrematureOptimization

https://wiki.c2.com/?ProfileBeforeOptimizing

https://youtube.com/CodeAesthetics/PrematureOptimization

Exercises in bash/shell/scripting

Being fluent in shell/scripting lets you improve your work by 20%. It doesn't take you to another level - you don't suddenly possess the knowledge to implement flawless distributed transactions - but some things get done much faster and with no frustration.

Here is my collection of shell/scripting exercises for others to practice shell skills.

A side note - I'm still not sure if I should learn more PowerShell, try out a different shell or do everything in F# fsx. PowerShell is just so ugly ;(

Scroll down for answers

Exercise 1

What were the arguments of the DetectOrientationScript function in https://github.com/tesseract-ocr/tesseract when it was first introduced?

Exercise 2

Get the Hadoop distributed file system log from https://github.com/logpai/loghub?tab=readme-ov-file

Find the ratio of (failed block serving)/(failed block serving + successful block serving) for each IP

The result should look like:

...
10.251.43.210  0.452453987730061
10.251.65.203  0.464609355865785
10.251.65.237  0.455237129089526
10.251.66.102  0.452124935995904
...

Exercise 3

This happened to me once - I had to find all http/s links to specific domains in an export of our company's messages, as someone had shared proprietary code on publicly available websites.

Exercise - find all distinct http/s links in https://github.com/tesseract-ocr/tesseract

Exercise 4

Task - remove the string "42" from each line of multiple CSV files.

You can use this to generate the input CSV files:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

Exercise 5

Just like me you created tens of repositories while writing code katas. Now you would like to keep all katas in a single repository. Write a script to move several repositories to a single repository. Each repo's content will end up in a dedicated directory in the new "master" repo. Remember to merge unrelated histories in the "master" repo.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Exercise 1 - answer

Answer:

bool DetectOrientationScript(int& orient_deg, float& orient_conf, std::string& script, float& script_conf);

[PowerShell]
> git log -S DetectOrientationScript # get sha of oldest commit
> git show bc95798e011a39acf9778b95c8d8c5847774cc47 | sls DetectOrientationScript

[bash]
> git log -S DetectOrientationScript # get sha of oldest commit
> git show bc95798e011a39acf9778b95c8d8c5847774cc47 | grep DetectOrientationScript

One-liner:

[PowerShell]
> git log -S " DetectOrientationScript" -p | sls DetectOrientationScript | select -Last 1

[bash]
> git log -S " DetectOrientationScript" -p | grep DetectOrientationScript | tail -1

Bonus - execution times

[PowerShell 7.4]
> measure-command { git log -S " DetectOrientationScript" -p | sls DetectOrientationScript | select -Last 1 }
...
TotalSeconds      : 3.47
...

[bash]
> time git log -S " DetectOrientationScript" -p | grep DetectOrientationScript | tail -1
...
real    0m3.471s
...

Without git log -S doing the heavy lifting the times look different:

[PowerShell 7.4]
> @(1..10) | % { Measure-Command { git log -p | sls "^\+.*\sDetectOrientationScript" } } | % { $_.TotalSeconds } | Measure-Object -Average

Count    : 10
Average  : 9.27122774
[PowerShell 5.1]
> @(1..10) | % { Measure-Command { git log -p | sls "^\+.*\sDetectOrientationScript" } } | % { $_.TotalSeconds } | Measure-Object -Average

Count    : 10
Average  : 27.33900077
[bash]
> seq 10 | xargs -I '{}' bash -c "TIMEFORMAT='%3E' ; time git log -p | grep -E '^\+.*\sDetectOrientationScript' > /dev/null" 2> times
> awk '{s+=$1} END {print s}' times
6.7249 # awk printed the sum of 10 runs; I moved the dot one place to the left to get the average

Reflections

Bash is faster than PowerShell. PowerShell 7 is much faster than PowerShell 5. It was surprisingly easy to get the average with Measure-Object in PowerShell and surprisingly difficult in bash.

Exercise 2 - answer

[PowerShell 7.4]
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort > sorted
> cat .\sorted | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, ($g.Length/$_.count)) } | write-host

This is how I got to the answer:

> sls "Served block" -Path .\HDFS.log | select -first 10
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 10
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 100
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select matches -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select Matches -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select Matches
> $a = sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw
> $a[0]
> get-type $a[0]
> Get-TypeData $a
> $a[0]
> $a[0].Matches[0].Value
> $a = sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log
> $a[0]
> $a[0].Matches[0].Value
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" }
> "asdf" -replace "a","b"
> "asdf" -replace "a","b" -replace "d","x"
> "asdf" -replace "a.","b" -replace "d","x"
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk" }
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" }
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" }
> "aaxxaa" -replace "a.","b"
> "aaxxaa" -replace "a.","b$0"
> "aaxxaa" -replace "a.","b$1"
> "aaxxaa" -replace "a.","b${1}"
> "aaxxaa" -replace "a.","b${0}"
> "aaxxaa" -replace "a.","b`${0}"
> "okaaxxokaa" -replace "(ok|no)aa","_`{$1}_"
> "okaaxxokaa" -replace "(ok|no)aa","_`${1}_"
> "okaaxxokaa" -replace "(ok|no)aa","_`${1}_`${0}"
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"}
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort > sorted
> cat .\sorted -First 10
> cat | group
> cat | group -Property {$_}
> cat .\sorted | group -Property {$_}
> cat .\sorted -Head 10 | group -Property {$_}
> cat .\sorted -Head 100 | group -Property {$_}
> cat .\sorted -Head 1000 | group -Property {$_}
> cat .\sorted -Head 10000 | group -Property {$_}
> cat .\sorted -Head 10000 | group -Property {$_} | select name,count
> cat .\sorted | group -Property {$_} | select name,count
> cat .\sorted | group -Property {$_ -replace "nk|ok",""}
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""}
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length / $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length / $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $__
> $__[0]
> $__[1]
> $__[2]
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $a[0]
> $a[1]
> $a[2]
> $a[1].GetType()
> $a[2].GetType()
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, ($g.Length) / ($_.count) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, (($g.Length) / ($_.count)) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,$_.name, (($g.Length) / ($_.count)) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; @($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; [Array] ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; [Array]@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,$_.name, (($g.Length) / ($_.count)) }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,$_.name, (($g.Length) / ($_.count)) }
> $a[0]
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); $x }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); ,$x }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x }
> $a[0]
> $a[0][0]
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { wirte-output "$_[0]" }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_" }
> cat .\sorted | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_" }

[F#]
open System.IO
open System.Text.RegularExpressions

let lines = File.ReadAllLines("HDFS.log")

let a =
    lines
    |> Array.filter (fun x -> x.Contains("Served block") || x.Contains("Got exception while serving"))

a
// |> Array.take 10000
|> Array.map (fun x ->
    let m = Regex.Match(x, "(Served block|Got exception while serving).*/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
    m.Groups[2].Value,
    match m.Groups[1].Value with
    | "Served block"                -> true
    | "Got exception while serving" -> false )
|> Array.groupBy fst
|> Array.map (fun (key, group) ->
    let total = group.Length
    let failed = group |> Array.map snd |> Array.filter not |> Array.length
    key, (decimal failed)/(decimal total)
    )
|> Array.sortBy fst
|> Array.map (fun (i,m) -> sprintf "%s  %.15f" i m)
|> fun x -> File.AppendAllLines("fsout", x)

Exercise 3 - answer

[PowerShell 7.4]
> ls -r -file | % { sls -path $_.FullName -pattern https?:.* -CaseSensitive } | % { $_.Matches[0].Value } | sort | select -Unique

# finds 234 links
[bash]
> find . -type f -not -path './.git/*' | xargs grep -E https?:.* -ho | sort | uniq

# finds 234 links

Exercise 4 - answer

[PowerShell 7.4]
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ }

[bash]
> sed -i 's/42//g' *.csv
> sed -ibackup 's/42//g' *.csv # creates backup files

This is neat - perhaps unix people had wisdom that is lost now.

Exercise 5 - answer

$repos = @(
    @("https://github.com/inwenis/kata.sortingitout", "sortingitout", "kata_sorting_it_out"  ),
    @("https://github.com/inwenis/anagrams_kata2",    "anagrams2",    "kata_anagrams2"  ),
    @("https://github.com/inwenis/anagram_kata",      "anagrams",     "kata"  )
)

$repos | ForEach-Object {
    $repo, $branch, $dir = $_
    $repoName = $repo.Split("/")[-1]
    git clone $repo
    pushd $repoName
    git checkout -b $branch
    $all = Get-ChildItem
    mkdir $dir
    $all | ForEach-Object {
        Move-Item $_ -Destination $dir
    }
    git add -A
    git commit -am "move"
    git remote add kata https://github.com/inwenis/kata
    git push -u kata $branch
    popd
    Remove-Item $repoName -Recurse -Force
    Read-Host "Press Enter to continue"
}

F# async - be mindful of what you put in async {}

open System

let r = Random()

let m () =
  let random_num = r.Next()
  async {
    printfn "%i" random_num
  }

m () |> Async.RunSynchronously // prints a random number
m () |> Async.RunSynchronously // prints another random number
let x = m ()
x |> Async.RunSynchronously // prints another random number
x |> Async.RunSynchronously // prints same number as above

Why does it matter that the last two lines print the same number?

Let's consider the following code:

// We're sending http requests and if they fail we'd like to retry them

#r "System.Net.Http"
open System.Net.Http

let HTTP_CLIENT = new HttpClient()

let send url =
  let httpRequest = new HttpRequestMessage()
  httpRequest.RequestUri <- Uri url

  async {
    let! r =
      HTTP_CLIENT.SendAsync httpRequest
      |> Async.AwaitTask
    return r
  }

send "http://test" |> Async.RunSynchronously
send "http://test" |> Async.RunSynchronously
let y = send "http://test"
y |> Async.RunSynchronously
y |> Async.RunSynchronously

let retry computation =
  async {
    try
      let! r = computation
      return r
    with
    | e ->
      printf "ups, err, let's retry"
      let! r2 = computation
      return r2
  }

send "http://test" |> retry |> Async.RunSynchronously
// retrying will fail always with "The request message was already sent. Cannot send the same request message multiple times."
// This is because, just like the last two lines of the first snippet printed the same number, here we send the exact same request object and that's not allowed

The fix

let send2 url =
  async {
    let httpRequest = new HttpRequestMessage()
    httpRequest.RequestUri <- Uri url
    let! r =
      HTTP_CLIENT.SendAsync httpRequest
      |> Async.AwaitTask
    return r
  }

send2 "http://test" |> retry |> Async.RunSynchronously