
Blog

My blog about programming & other stuff

If you find a typo or would like to improve my posts feel free to submit a pull request to https://github.com/inwenis/blog.

Check out my other projects on my github like https://github.com/inwenis/collider where I tried to simulate Brownian motions.

Enjoy!

You can reach me at inwenis at gmail.com

PowerShell "Oopsie"

Task - remove a specific string from each line of multiple CSV files.

This task was added to the scripting exercise list.

First - let's generate some CSV files to work with:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

The "Oopsie"

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file $_ -Append }

This command will never finish. Run it for a moment (and then kill it), see the result, and try to figure out what happens. Explanation below.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

The explanation

Get-Content (aka. cat) keeps the file open and reads the content that our command is appending, thus creating an infinite loop.

The fix

There are many ways to fix this "oopsie".

Perhaps the simplest one is to not write to and read from the exact same file. A sensible rule: when processing files, always write to a different file:

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file -path "fixed$($_.Name)" }

Knowing the reason for our command hanging we can make sure the whole file is read before we overwrite it:

ls *.csv | % { (cat $_ ) | % { $_ -replace "42","" } | out-file $_ }
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ } # we can also use -replace as an array operator

I'm amazed by GitHub Copilot's answer for "powershell one liner to remove a specific text from multiple CSV files":

Get-ChildItem -Filter "*.csv" | ForEach-Object { (Get-Content $_.FullName) -replace "string_to_replace", "replacement_string" | Set-Content $_.FullName }

How the way I work/code/investigate/debug changed with time & experience

I use the "big bang vs baby steps" metaphor (see the end of this post) when describing how I work these days.

TL;DR;

  1. Quick feedback is king
    • unit tests
    • quick test in another way
    • reproduce issues locally
    • try things out in a small project on the side, not in the project you're working on
  2. One thing at a time
    • experimenting
    • refactoring preparing for feature addition
    • feature coding
    • cleaning up after feature coding
  3. Divide problems into smaller problems
    • and remember - one thing (problem) at a time

Example

You're working with code that talks to a remote API and you want to test different API calls against it.

don't - change API parameters in code and run the project each time you test something. It takes too long.

do - write a piece of code to send an HTTP request, fiddle with this code

do - intercept requests with Fiddler/Postman/another interceptor and reissue them with different parameters


Example

Something fails in the CI pipeline.

don't - make a change, commit, wait for remote CI to trigger, see result

do - reproduce issue locally


Longer read

  1. Quick feedback
    • do - write a test for it
    • do - isolate your issue/suspect/the piece of code you're working with
      • it is helpful if you can run just a module/sub-system/piece of your project/system
      • partial execution helps - like in Python/Jupyter or F# fsx
    • if you rely on external data and it takes time to retrieve it (even a 5-second delay can be annoying) - dump the data to a file and read it from the file instead of hitting an external API or a DB every time you run your code
    • don't try to understand how List.foldBack() works while debugging a big project - do it on the side
    • spin up a new solution/project on the side to test things
    • occasionally juniors ask "does this work this way?" - you can easily test it yourself if you do it on the side
  2. One thing at a time
    • separate refactoring from feature addition
    • fiddle first, find the walls/obstacles
    • git reset --hard
    • refactor preparing for the new feature (can become a separate PR)
    • code the feature
    • if during coding you find something that needs refactoring/renaming/cleaning up - any kind of "WTF is this? I need to fix this!" - try a) or b)
      • a) make a note to fix it later
      • b) fix it immediately:
        > git stash
        > git checkout master
        > git checkout -b fix-typo
        fix stuff, merge or create a PR
        > git checkout feature
        > git merge fix-typo (or git rebase fix-typo)
        continue work
    • always have a paper notepad on your desk
      • note things you would like to come back to or investigate
      • it gives me great satisfaction to go through a list of "side quests" I have noted and strike through all of them, knowing I have dealt with each one before starting a new task
      • when investigating something I also note questions I would like to be able to answer after I'm done investigating
        • example: while working with Axios and cookies I found conflicting information about whether Axios supports cookies. After the investigation, I knew that Axios supports cookies by default in a browser but not in Node.js
  3. Divide problems into smaller problems
    • example - coding the logic for a new feature in a CLI tool and designing the CLI arguments - these can be 2 sub-tasks

Big bang vs baby steps

The old me often ended up doing the big bang. Rewriting large chunks of code at once. Starting things from scratch. Working for hours or days with a codebase that can't even compile.

Downsides: for a long time the project doesn't even compile, I lose motivation, I feel like I'm walking in the dark and don't see errors for a long time. It requires keeping a lot of context in my mind since I've ripped the project apart, and if I abandon the work for a few days I sometimes forget everything and progress is lost.

The new me prefers baby steps

Fiddle with the code knowing I'll git reset --hard. Try renaming some stuff - helps me understand the codebase better. Try out different things and abandon them. At this point, I usually get an idea/feeling of what needs to be done. Plan a few smaller refactorings. After them, I am usually closer to the solution and am able to code it without a big bang.

My recommendations

Terminal etc.

git

node

http

other

PowerShell quirk

tl;dr

In PowerShell, if you want to return an array instead of one element of the array at a time, do this:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong! will pipe/return 1 element at a time
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct! will pipe/return pairs
Beware! The results of both snippets will be displayed in exactly the same way even though they have different types! See below:
> @(1,2,3,4)
1
2
3
4
> @((1,2),(3,4))
1
2
3
4

To check actual types:

> $x = @(1..2) | % { $a = "a" * $_; @($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
> $x = @(1..2) | % { $a = "a" * $_; ,@($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
# Alternatively
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType
# Get-Member only shows output for every distinct type

Longer read

Occasionally I have a fist fight with PS to get it to return an array instead of one element at a time. PS is a tough opponent. I think I get it now though.

The comma , in PS is both a unary and a binary operator. You can use it with one argument or with two.

> ,1 # as a unary operator the comma creates an array with 1 member
1
> 1,2 # as a binary operator the comma creates an array with 2 members
1
2

Beware that both an array[] and an array[][] will be displayed the same way. $y is an array[][], yet it is printed to the output the same way as $x:

> $x = @(1,2,3,4) ; $x.GetType().Name ; $x[0].GetType().Name ; $x
Object[]
Int32
1
2
3
4
> $y = @((1,2),(3,4)) ; $y.GetType().Name ; $y[0].GetType().Name ; $y
Object[]
Object[]
1
2
3
4

If you're trying to return an array of pairs:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong
a
1
aa
2
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct!
a
1
aa
2
# Even though the result looks like a flat array, this time it's an array of arrays
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType # we get strings and ints

   TypeName: System.String

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

   TypeName: System.Int32

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType # we get arrays

   TypeName: System.Object[]

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

More on printing your arrays of pairs:

> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-output # write-output will "unwind" your array
a
1
aa
2
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-host
a 1
aa 2
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | % { write-output "$_" }
a 1
aa 2
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-output -NoEnumerate # returns an array of arrays but it's printed as if it's a flat array
a
1
aa
2

This https://stackoverflow.com/a/29985418/2377787 explains how @() works in PS.

> $a='A','B','C'
> $b=@($a;)
> $a
A
B
C
> $b
A
B
C
> [Object]::ReferenceEquals($a, $b)
False
Above, $a; is understood as: $a is a collection, collections should be enumerated, and each item is passed to the pipeline. @($a;) therefore sees 3 elements (not the original array) and creates a new array from those 3 elements. In PS, @($collection) creates a copy of $collection, while @(,$collection) creates an array with a single element - $collection itself.
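A quick way to check that last claim:

> $a = 'A','B','C'
> @($a).Count     # 3 - a copy of the original collection
> @(,$a).Count    # 1 - an array whose single element is $a
> @(,$a)[0].Count # 3 - the wrapped element is the original collection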

It will be great, set it up, don't use it, remove it

2024-05-02

Today, I removed something I allowed to be created despite initially feeling it was redundant.

A year ago, a student found a tool to auto-generate documentation for our internal SDK from code annotations. They proposed embedding this documentation in our Continuous Integration pipeline. Although the idea sounded good on paper, I felt it wouldn’t be used by our team. However, the team was enthusiastic.

Everyone in our team has the SDK's git repo on their machine, and we rely on IntelliSense and the code for documentation. It seemed unlikely that we would change our habits since the new documentation wouldn’t be easier to use than just using F12 to view the source code.

Despite my doubts, I allowed this feature as the team lead. I had already rejected a few initiatives from that student and didn’t want to kill their motivation. I wanted to let them work on something they found interesting. There was a slight chance I was wrong and the documentation might be used.

Today, a year later, I noticed the docs website no longer works. It had been down for some time, and no one noticed because no one used it. I removed any trace of the published documentation.

This made me reflect: was it wrong to allow something to be created that never paid off?

In this case, the investment was small. The gain was that the student got to work on something interesting. So, I think it was right to let our team try it out and remove it once we were certain it wasn't used.

Notes on keys, certs, certificates, HTTPS, SSL, SSH, TLS

key != cert (a key is different from a certificate)

Keys are used to encrypt connections, certs are used to verify that the key owner is who he says he is.

Certificate aka cert

A certificate proves that a public key belongs to a given entity. The cert includes:

  • public key
  • information about the key
  • CA's signature validating the cert

CA's signature basically says "I confirm that this public key belongs to this person/entity". The signature is made using CA's private key.

This is Wikipedia's certificate, which I exported from my Chrome browser. This is the certificate in PEM format.

-----BEGIN CERTIFICATE-----
MIIISzCCB9GgAwIBAgIQB0GeOVg6THbPHqFDR/pfOjAKBggqhkjOPQQDAzBWMQsw
CQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMTAwLgYDVQQDEydEaWdp
Q2VydCBUTFMgSHlicmlkIEVDQyBTSEEzODQgMjAyMCBDQTEwHhcNMjMxMDE4MDAw
MDAwWhcNMjQxMDE2MjM1OTU5WjB5MQswCQYDVQQGEwJVUzETMBEGA1UECBMKQ2Fs
aWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZyYW5jaXNjbzEjMCEGA1UEChMaV2lraW1l
ZGlhIEZvdW5kYXRpb24sIEluYy4xGDAWBgNVBAMMDyoud2lraXBlZGlhLm9yZzBZ
MBMGByqGSM49AgEGCCqGSM49AwEHA0IABDVh9CEa/2rEO/oGR8YZbr5wOPHcFrG8
OBQS1BQrHAsxgVn1Z/bnKtE8Hvqup+0GXdZvXYlMa8iw4A+Dz/XTitqjggZcMIIG
WDAfBgNVHSMEGDAWgBQKvAgpF4ylOW16Ds4zxy6z7fvDejAdBgNVHQ4EFgQUyqwM
Z6LjhkM/u0PnQdmhhzp43TMwggLtBgNVHREEggLkMIIC4IIPKi53aWtpcGVkaWEu
b3Jngg13aWtpbWVkaWEub3Jngg1tZWRpYXdpa2kub3Jngg13aWtpYm9va3Mub3Jn
ggx3aWtpZGF0YS5vcmeCDHdpa2luZXdzLm9yZ4INd2lraXF1b3RlLm9yZ4IOd2lr
aXNvdXJjZS5vcmeCD3dpa2l2ZXJzaXR5Lm9yZ4IOd2lraXZveWFnZS5vcmeCDndp
a3Rpb25hcnkub3Jnghd3aWtpbWVkaWFmb3VuZGF0aW9uLm9yZ4IGdy53aWtpghJ3
bWZ1c2VyY29udGVudC5vcmeCESoubS53aWtpcGVkaWEub3Jngg8qLndpa2ltZWRp
YS5vcmeCESoubS53aWtpbWVkaWEub3JnghYqLnBsYW5ldC53aWtpbWVkaWEub3Jn
gg8qLm1lZGlhd2lraS5vcmeCESoubS5tZWRpYXdpa2kub3Jngg8qLndpa2lib29r
cy5vcmeCESoubS53aWtpYm9va3Mub3Jngg4qLndpa2lkYXRhLm9yZ4IQKi5tLndp
a2lkYXRhLm9yZ4IOKi53aWtpbmV3cy5vcmeCECoubS53aWtpbmV3cy5vcmeCDyou
d2lraXF1b3RlLm9yZ4IRKi5tLndpa2lxdW90ZS5vcmeCECoud2lraXNvdXJjZS5v
cmeCEioubS53aWtpc291cmNlLm9yZ4IRKi53aWtpdmVyc2l0eS5vcmeCEyoubS53
aWtpdmVyc2l0eS5vcmeCECoud2lraXZveWFnZS5vcmeCEioubS53aWtpdm95YWdl
Lm9yZ4IQKi53aWt0aW9uYXJ5Lm9yZ4ISKi5tLndpa3Rpb25hcnkub3JnghkqLndp
a2ltZWRpYWZvdW5kYXRpb24ub3JnghQqLndtZnVzZXJjb250ZW50Lm9yZ4INd2lr
aXBlZGlhLm9yZ4IRd2lraWZ1bmN0aW9ucy5vcmeCEyoud2lraWZ1bmN0aW9ucy5v
cmcwPgYDVR0gBDcwNTAzBgZngQwBAgIwKTAnBggrBgEFBQcCARYbaHR0cDovL3d3
dy5kaWdpY2VydC5jb20vQ1BTMA4GA1UdDwEB/wQEAwIDiDAdBgNVHSUEFjAUBggr
BgEFBQcDAQYIKwYBBQUHAwIwgZsGA1UdHwSBkzCBkDBGoESgQoZAaHR0cDovL2Ny
bDMuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0VExTSHlicmlkRUNDU0hBMzg0MjAyMENB
MS0xLmNybDBGoESgQoZAaHR0cDovL2NybDQuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0
VExTSHlicmlkRUNDU0hBMzg0MjAyMENBMS0xLmNybDCBhQYIKwYBBQUHAQEEeTB3
MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wTwYIKwYBBQUH
MAKGQ2h0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRMU0h5YnJp
ZEVDQ1NIQTM4NDIwMjBDQTEtMS5jcnQwDAYDVR0TAQH/BAIwADCCAYAGCisGAQQB
1nkCBAIEggFwBIIBbAFqAHcA7s3QZNXbGs7FXLedtM0TojKHRny87N7DUUhZRnEf
tZsAAAGLQ1J6cgAABAMASDBGAiEA5reeeuLSzGPvJQ5hT3Bd8aOVxmIltXMTLhY6
19qDGWUCIQDO0LMbF3s42tyxgFIOt7rVOpsHe9Sy0wFQQj8BWO0LIQB2AEiw42va
pkc0D+VqAvqdMOscUgHLVt0sgdm7v6s52IRzAAABi0NSegEAAAQDAEcwRQIgYJdu
BrioIun6FTeQhDxqK2eyZehguOkxScS3nwsGSakCIQC1FyuCpm+QQBRJFSTAnStR
iP+hgGIhgzyZ837usahB0QB3ANq2v2s/tbYin5vCu1xr6HCRcWy7UYSFNL2kPTBI
1/urAAABi0NSeg8AAAQDAEgwRgIhAOm1GvY8M4V+tUyjV9/PCj8rcWHUOvfY0a/o
nsKg/bitAiEA1Vm1pP8CDp7hGcQzBBTscpCVebzWCe8DK231mtv97QUwCgYIKoZI
zj0EAwMDaAAwZQIwKuOOLjmwGgtjG6SASF4W2e8KtQZANRsYXMXJDGwBCi9fM7Qy
S9dvlFLwrcDg1gxlAjEA5XwJikbpk/qyQerzeUspuZKhqh1KPuj2uBdp8vicuBxu
TJUd1W+d3LmikOUgGzil
-----END CERTIFICATE-----

We can double click the cert file to view it (on Windows) or use many other different tools to view its content.

https://stackoverflow.com/questions/9758238/how-to-view-the-contents-of-a-pem-certificate

> $cert = New-Object Security.Cryptography.X509Certificates.X509Certificate2([string]"C:\Users\inwen\Downloads\_.wikipedia.org.crt")
> $cert | select *

EnhancedKeyUsageList : {Server Authentication (1.3.6.1.5.5.7.3.1), Client Authentication (1.3.6.1.5.5.7.3.2)}
DnsNameList          : {*.wikipedia.org, wikimedia.org, mediawiki.org, wikibooks.org...}
SendAsTrustedIssuer  : False
Archived             : False
Extensions           : {System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid...}
FriendlyName         :
IssuerName           : System.Security.Cryptography.X509Certificates.X500DistinguishedName
NotAfter             : 17/10/2024 01:59:59
NotBefore            : 18/10/2023 02:00:00
HasPrivateKey        : False
PrivateKey           :
PublicKey            : System.Security.Cryptography.X509Certificates.PublicKey
RawData              : {48, 130, 8, 75...}
SerialNumber         : 07419E39583A4C76CF1EA14347FA5F3A
SubjectName          : System.Security.Cryptography.X509Certificates.X500DistinguishedName
SignatureAlgorithm   : System.Security.Cryptography.Oid
Thumbprint           : 483F0C71F34AE0EA30D99BD60463DCDAA8F49DFB
Version              : 3
Handle               : 2140299849504
Issuer               : CN=DigiCert TLS Hybrid ECC SHA384 2020 CA1, O=DigiCert Inc, C=US
Subject              : CN=*.wikipedia.org, O="Wikimedia Foundation, Inc.", L=San Francisco, S=California, C=US

SSL & TLS & SSH & SFTP

SSH (Secure SHell protocol) - a protocol that lets you execute shell commands over a secure connection.

SFTP is an extension of SSH. SFTP != FTP over SSH. To connect to an SFTP server you typically authenticate with a private ssh key. The public ssh key (your private key's counterpart) is stored on the server.

TLS & SSL - think of SSL as the older/first protocol for secure communication. SSL was phased out in favour of TLS. TLS is THE protocol used by HTTPS for secure connections.

Clients can be anonymous in TLS - usually the case on the web - the server provides a cert to your browser but you don't need a cert of your own. TLS can also be mutual - if the client has a cert, the server can validate it.

PuTTy

PuTTY is free, open-source software that can do SSH. PuTTY has its own format of key files -> .ppk

ppk - PuTTY private key (a ppk can be converted to pem with some software). A PPK file stores a private key and the corresponding public key; both are contained in the same file. https://tartarus.org/~simon/putty-snapshots/htmldoc/AppendixC.html

PEM

Privacy-Enhanced Mail (PEM) is THE file format for exchanging keys and certificates.

  • .cer & .crt - PEM file with a certificate
  • .key - PEM file with a private or public key

The file extension doesn't really matter. Just open the file and look at the headers to be sure what it is.
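For example, in PowerShell you can simply grep for the headers (mystery.pem is a hypothetical file name):

> Get-Content .\mystery.pem | Select-String "-----BEGIN"

Run against the combined key + certificate file shown further below, this prints both the BEGIN RSA PRIVATE KEY and the BEGIN CERTIFICATE lines.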

To view a pem certificate on Windows - rename it to .crt and double click.

You can open a .pem file as plain text and see its content:

// pem parsers ignore stuff outside the BEGIN/END blocks so you can put comments here

-----BEGIN RSA PRIVATE KEY-----
izfrNTmQLnfsLzi2Wb9xPz2Qj9fQYGgeug3N2MkDuVHwpPcgkhHkJgCQuuvT+qZI
MbS2U6wTS24SZk5RunJIUkitRKeWWMS28SLGfkDs1bBYlSPa5smAd3/q1OePi4ae
dU6YgWuDxzBAKEKVSUu6pA2HOdyQ9N4F1dI+F8w9J990zE93EgyNqZFBBa2L70h4
M7DrB0gJBWMdUMoxGnun5glLiCMo2JrHZ9RkMiallS1sHMhELx2UAlP8I1+0Mav8
iMlHGyUW8EJy0paVf09MPpceEcVwDBeX0+G4UQlO551GTFtOSRjcD8U+GkCzka9W
/SFQrSGe3Gh3SDaOw/4JEMAjWPDLiCglwh0rLIO4VwU6AxzTCuCw3d1ZxQsU6VFQ
PqHA8haOUATZIrp3886PBThVqALBk9p1Nqn51bXLh13Zy9DZIVx4Z5Ioz/EGuzgR
d68VW5wybLjYE2r6Q9nHpitSZ4ZderwjIZRes67HdxYFw8unm4Wo6kuGnb5jSSag
vwBxKzAf3Omn+J6IthTJKuDd13rKZGMcRpQQ6VstwihYt1TahQ/qfJUWPjPcU5ML
9LkgVwA8Ndi1wp1/sEPe+UlL16L6vO9jUHcueWN7+zSUOE/cDSJyMd9x/ZL8QASA
ETd5dujVIqlINL2vJKr1o4T+i0RsnpfFiqFmBKlFqww/SKzJeChdyEtpa/dJMrt2
8S86b6zEmkser+SDYgGketS2DZ4hB+vh2ujSXmS8Gkwrn+BfHMzkbtio8lWbGw0l
eM1tfdFZ6wMTLkxRhBkBK4JiMiUMvpERyPib6a2L6iXTfH+3RUDS6A==
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIICMzCCAZygAwIBAgIJALiPnVsvq8dsMA0GCSqGSIb3DQEBBQUAMFMxCzAJBgNV
BAYTAlVTMQwwCgYDVQQIEwNmb28xDDAKBgNVBAcTA2ZvbzEMMAoGA1UEChMDZm9v
MQwwCgYDVQQLEwNmb28xDDAKBgNVBAMTA2ZvbzAeFw0xMzAzMTkxNTQwMTlaFw0x
ODAzMTgxNTQwMTlaMFMxCzAJBgNVBAYTAlVTMQwwCgYDVQQIEwNmb28xDDAKBgNV
BAcTA2ZvbzEMMAoGA1UEChMDZm9vMQwwCgYDVQQLEwNmb28xDDAKBgNVBAMTA2Zv
bzCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAzdGfxi9CNbMf1UUcvDQh7MYB
OveIHyc0E0KIbhjK5FkCBU4CiZrbfHagaW7ZEcN0tt3EvpbOMxxc/ZQU2WN/s/wP
xph0pSfsfFsTKM4RhTWD2v4fgk+xZiKd1p0+L4hTtpwnEw0uXRVd0ki6muwV5y/P
+5FHUeldq+pgTcgzuK8CAwEAAaMPMA0wCwYDVR0PBAQDAgLkMA0GCSqGSIb3DQEB
BQUAA4GBAJiDAAtY0mQQeuxWdzLRzXmjvdSuL9GoyT3BF/jSnpxz5/58dba8pWen
v3pj4P3w5DoOso0rzkZy2jEsEitlVM2mLSbQpMM+MUVQCQoiG6W9xuCFuxSrwPIS
pAqEAuV4DNoxQKKWmhVv+J0ptMWD25Pnpxeq5sXzghfJnslJlQND
-----END CERTIFICATE-----

The content between the header and footer (-----BEGIN CERTIFICATE----- + -----END CERTIFICATE-----) is base64 encoded. The decoded content can be DER binary data.

DER

Distinguished Encoding Rules - a way of encoding data structures. A certificate is a data structure containing various entries like validity dates, issuer, etc. For certificates to work you need to store and transfer this information. DER encodes this information in a binary format. This is then base64-encoded and goes into a PEM file.
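To see the DER bytes hiding inside a PEM file, you can strip the headers and base64-decode the rest - a minimal PowerShell sketch, assuming a single-certificate file named cert.pem:

> $pem = Get-Content .\cert.pem -Raw
> $b64 = $pem -replace '-----[^-]+-----','' -replace '\s',''  # drop the BEGIN/END lines and whitespace
> $der = [Convert]::FromBase64String($b64)                    # raw DER bytes
> $der.Length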

X.509

X.509 is the standard defining public key certificates for TLS/SSL (HTTPS)

PFX/P12/PKCS12

PFX seems to be Microsoft's complicated file format for storing cryptographic data.

P12/PKCS12 is the successor to PFX. Sometimes the terms PFX/P12/PKCS12 are used interchangeably.


base64 offline decoder: https://www.glezen.org/Base64Decoder.html

Nice description of certs vs key: https://superuser.com/questions/620121/what-is-the-difference-between-a-certificate-and-a-key-with-respect-to-ssl

Generate yourself a certificate: https://getacert.com/index.html

Important info on rejectUnauthorized: false and certificates in axios/node: https://stackoverflow.com/questions/51363855/how-to-configure-axios-to-use-ssl-certificate

convention - propose - specify format in secret name - use plain - not base64 encoded

Lazy websites

Websites' certificates are usually signed by an intermediate CA, which in turn is signed by a trusted root CA. The idea is that the server you connect to sends you its certificate together with all the intermediate certificates. Your app/machine should have the root CA certificate stored so it can validate the chain of certificates it received from the server (by checking the topmost intermediate against its own root CA).

Some servers are misconfigured and do not send the intermediate certificates. You don't notice because browsers fill in the gaps for a better browsing experience. However, when you try to scrape the same website with e.g. node, your connection will be rejected.

don'ts (for node)

Several answers on SO suggest:

  • NODE_TLS_REJECT_UNAUTHORIZED=0 or
  • const httpsAgent = new https.Agent({ rejectUnauthorized: false });

Both are terrible ideas - they make your app accept unauthorized connections. They are the equivalent of this conversation:

"I can't verify this certificate, we can not be sure who we are connecting to" - says Node with care in its voice

"Doesn't matter, YOLO, carry on" - you reply shrugging your shoulders

Read more here

do's (for node)

Use NODE_EXTRA_CA_CERTS. Alternatively use a library to programmatically give node the missing certificate link
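A minimal sketch of the NODE_EXTRA_CA_CERTS route in PowerShell (the path and script name are hypothetical):

> $env:NODE_EXTRA_CA_CERTS = "C:\certs\missing-intermediate.pem"  # PEM file with the certificate(s) the server forgot to send
> node .\scraper.js  # this node process now trusts the extra certs in addition to its built-in root CAs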

Good read - https://stackoverflow.com/questions/31673587/error-unable-to-verify-the-first-certificate-in-nodejs

root CA stores

Node

It seems everyone has their own root CA store these days. Node has a hardcoded list of root CAs, see:

  • https://github.com/nodejs/node/blob/main/src/node_root_certs.h
  • https://github.com/nodejs/node/issues/4175

Windows

You can view Windows certificates with PowerShell:

Get-ChildItem -Recurse Cert:
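To look at just the trusted root CAs (the Windows counterpart of Node's hardcoded list), narrow it down to the Root stores:

Get-ChildItem Cert:\LocalMachine\Root | Select-Object Subject, NotAfter
Get-ChildItem Cert:\CurrentUser\Root | Select-Object Subject, NotAfter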

Chrome

https://chromium.googlesource.com/chromium/src/+/main/net/data/ssl/chrome_root_store/root_store.md

If you would like to become chrome's trusted CA - https://www.chromium.org/Home/chromium-security/root-ca-policy/

https://blog.chromium.org/2022/09/announcing-launch-of-chrome-root-program.html

node packages updating

tl;dr

  1. > npm install depcheck -g - install depcheck globally
  2. > depcheck - check for redundant packages
  3. > npm un this-redundant-package - uninstall redundant packages (repeat for all redundant packages)
  4. Create a pull-request remove-redundant-packages

  1. > npm i - tidy up node_modules
  2. > npm audit - see vulnerability issues
  3. > npm audit fix - fix vulnerability issues that don't require attention
  4. Create a pull-request fix-vulnerability-issues

  1. > npm i npm-check-updates -g - install npm-check-updates globally
  2. > npm-check-updates - see how outdated packages are
  3. > npm outdated - see how outdated packages are
  4. > npm update --save - update packages respecting your semver constraints from packages.json
  5. If you have packages that use major version 0.*.* you'll need to manually update these now
    • > npm install that-one-package@latest
  6. Create a pull-request update-packages-minor

If you're brave and can test/run your project easily:

  1. ncu -u - updates package.json to the latest versions of all packages, as shown by npm-check-updates
    • this might introduce breaking changes
  2. npm i - update package-lock.json
  3. Test your project.
  4. Create a pull-request update-packages-major

If you're not brave or can't just YOLO and update all major versions:

  1. npm-check-updates - check again what is left to update
  2. npm i that-package@latest - update the major version of that-package
  3. Test your project.
    • .js is dynamically typed so you might have just updated a package that breaks your project but you won't know until you run your code
  4. Repeat for all packages.
  5. Create a pull-request update-packages-major

longer read

Need to update dependencies in a node js project? Here are my notes on this.

> npm i (npm install)

> npm i

added 60 packages, removed 124 packages, changed 191 packages, and audited 522 packages in 13s

96 packages are looking for funding
  run `npm fund` for details

10 vulnerabilities (2 low, 7 moderate, 1 high)

To address issues that do not require attention, run:
  npm audit fix

To address all issues possible (including breaking changes), run:
  npm audit fix --force

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.
npm i:
- installs missing packages in node_modules
- removes redundant packages from node_modules
- installs correct versions of mismatched packages (if package-lock.json wants a different version than found in node_modules)
- shows what is going on with the packages in your project

> npm audit - shows a report on vulnerability issues in your dependencies

> npm audit fix - updates packages to address vulnerability issues (updates that do not require attention)

> npm outdated - shows a table with your packages and versions

$ npm outdated
Package      Current   Wanted   Latest  Location                  Depended by
glob          5.0.15   5.0.15    6.0.1  node_modules/glob         dependent-package-name
nothingness    0.0.3      git      git  node_modules/nothingness  dependent-package-name
npm            3.5.1    3.5.2    3.5.1  node_modules/npm          dependent-package-name
local-dev      0.0.3   linked   linked  local-dev                 dependent-package-name
once           1.3.2    1.3.3    1.3.3  node_modules/once         dependent-package-name

  • Current - what is in node_modules
  • Wanted - the most recent version that respects the version constraint from package.json
  • Latest - latest version from npm registry

To update to latest minor+patch versions of your dependencies (Wanted) - npm outdated shows all you need to know but I prefer the output of npm-check-updates

> npm i npm-check-updates -g (-g -> global mode - package will be available on your whole machine)

> npm-check-updates - shows where an update will be a major/minor/patch update (I like the colors)

Checking C:\git\blog\package.json
[====================] 39/39 100%

 @azure/storage-blob         ^12.5.0  →      ^12.17.0
 adm-zip                     ^0.4.16  →       ^0.5.12
 axios                       ^0.27.2  →        ^1.6.8
 basic-ftp                    ^5.0.1  →        ^5.0.5
 cheerio                 ^1.0.0-rc.6  →  ^1.0.0-rc.12
 eslint                      ^8.12.0  →        ^9.2.0
 eslint-config-prettier       ^8.5.0  →        ^9.1.0
 eslint-plugin-import        ^2.25.4  →       ^2.29.1
 fast-xml-parser              ^4.2.4  →        ^4.3.6
 humanize-duration           ^3.27.3  →       ^3.32.0
 iconv                        ^3.0.0  →        ^3.0.1
 jsonwebtoken                 ^9.0.0  →        ^9.0.2
 luxon                        ^3.4.3  →        ^3.4.4

Let us update something

> npm update - perform updates respecting your semver constraints and update package-lock.json

> npm update --save - same as above but also updates package.json, use this one always

The behavior for packages with major version 0.*.* is different than for versions >=1.0.0 (see npm help update)

npm update will most likely bump all minor and patch versions for you.

You can run npm update --save often.

What do the symbols in package.json mean?

https://stackoverflow.com/questions/22343224/whats-the-difference-between-tilde-and-caret-in-package-json/25861938#25861938

npm update --save vs npm audit fix

npm audit fix will only update packages to fix vulnerability issues

npm update --save will update all packages it can (respecting semver constraints)

Do I have unused dependencies?

> npm install depcheck -g

> depcheck - shows unused dependencies. depcheck scans for require/import statements in your code so you might be utilizing a package differently but depcheck will consider it unused (ex. when you import packages using importLazy).

npm-check

> npm i npm-check -g

> npm-check - a different tool to help with dependencies (I didn't use it)

honorable mentions

> npm ls - list installed packages (from node_modules)

> npm ls axios - show all versions of axios and why we have them

npm ls will not show you the origin of not-installed optional dependencies.

Consider this - you develop on a Windows machine and deploy your solution to a linux box. On Windows (see below) you might think node-gyp-build is not used in your solution.

> npm ls node-gyp-build
test-npm@1.0.0 C:\git\test-npm
`-- (empty)

But on a linux box it will be used:

> npm ls node-gyp-build
npm-test-proj@1.0.0 /git/npm-test-proj
└─┬ kafka-lz4-lite@1.0.5
  └─┬ piscina@3.2.0
    └─┬ nice-napi@1.0.2
      └── node-gyp-build@4.8.1

axios, cookies & more

axios

axios - promise-based HTTP client for node.js

  • when used in node.js axios uses http module (https://nodejs.org/api/http.html)
  • in node axios does not support cookies by itself (https://github.com/axios/axios/issues/5742)
    • there are npm packages that add cookies support to axios
  • when used in browsers it uses XMLHttpRequest
  • when used in browsers cookies work by default

Why would you use axios over plain http module from node?

Axios makes http requests much easier. Try using plain http and you'll convince yourself.


Are there other packages like axios?

Yes - for example node-fetch https://github.com/node-fetch/node-fetch


When making a request axios creates a default http and https agent - https://axios-http.com/docs/req_config (axios probably uses global agents). You can specify custom agents for a specific request or set custom agents as default agents to use with an axios instance.

const a = require('axios');
const http = require('node:http');

(async () => {
    // configure your agent as needed
    const myCustomAgent = new http.Agent({ keepAlive: true });

    // use your custom agent for a specific request
    // (httpAgent is used for http:// URLs; use httpsAgent with an https.Agent for https:// URLs)
    const x = await a.get('http://example.com/', { httpAgent: myCustomAgent });
    console.log(x.status);

    // set your agent as the default for all requests made with this axios instance
    a.defaults.httpAgent = myCustomAgent;
})();

What are http/s agents responsible for?

http/s agents handle creating/closing sockets, TCP, etc. They talk to the OS and manage connections to hosts.


cookies

Without extra packages you need to write code that reads response headers looking for Set-Cookie headers, stores the cookies somewhere, and adds Cookie headers to subsequent requests.

https://www.npmjs.com/package/http-cookie-agent

Manages cookies for node.js HTTP clients (e.g. Node.js global fetch, undici, axios, node-fetch). http-cookie-agent implements an http/s agent that hooks into your requests/responses and does the cookie-related magic for you. It uses the CookieJar class from the tough-cookie package to parse & store cookies.

import axios from 'axios';
import { CookieJar } from 'tough-cookie';
import { HttpCookieAgent, HttpsCookieAgent } from 'http-cookie-agent/http';

const jar = new CookieJar();

const a = axios.create({
  httpAgent: new HttpCookieAgent({ cookies: { jar } }),
  httpsAgent: new HttpsCookieAgent({ cookies: { jar } }),
});
// now we have an axios instance supporting cookies
await a.get('https://example.com');

axios-cookiejar-support

https://www.npmjs.com/package/axios-cookiejar-support

Depends on http-cookie-agent and tough-cookie. Does the same as http-cookie-agent but you don't have to create the http/s agents yourself. This is a small package that just intercepts axios requests and makes sure the custom http/s agents are used (see the source).

Saves you a bit of typing but you can't use your own custom agents. If you need to configure your http/s agents (e.g. with a certificate) - use http-cookie-agent (see github issue and github issue)

import axios from 'axios';
import { wrapper } from 'axios-cookiejar-support';
import { CookieJar } from 'tough-cookie';

const jar = new CookieJar();
const client = wrapper(axios.create({ jar }));

await client.get('https://example.com');

https://www.npmjs.com/package/tough-cookie

npm package - cookie parsing/storage/retrieval (tough-cookie itself does nothing with http requests).

A bit about cookies

https://datatracker.ietf.org/doc/html/rfc6265 - RFC describing cookies.

https://datatracker.ietf.org/doc/html/rfc6265#page-28 - concise paragraph on Third-party cookies.

Servers respond with a Set-Cookie header. The client can store the requested cookie and send it back with subsequent requests. Cookies have a specific format described in this document.

Random stuff

https://npmtrends.com/cookie-vs-cookiejar-vs-cookies-vs-tough-cookie

interesting - cookie (for servers) has been more popular than tough-cookie (for clients) since ~2023.

Is this due to more server-side apps being written in node?

Packages we don't use

  • cookie - npm package - cookies for servers
  • cookies - npm package - cookies for servers (different than cookie)
  • cookiejar - npm package - a different cookie jar for clients

fetch & fetch & node-fetch

fetch - standard created by WHATWG meant to replace XMLHttpRequest - https://fetch.spec.whatwg.org/

fetch - an old npm package to fetch web content - don't use it

node-fetch - a community implementation of the fetch standard as an npm package - go ahead and use it

fetch - node's native implementation of the fetch standard - https://nodejs.org/dist/latest-v21.x/docs/api/globals.html#fetch

Since fetch is the standard for both browsers and node, Chrome has a neat feature to export requests as fetch calls.

chat-gpt crap

When researching I came across some chat-gpt generated content. You read it thinking it will be something but it's trash.

https://www.dhiwise.com/post/managing-secure-cookies-via-axios-interceptors -> this article from 2024 tells you to implement cookies yourself and doesn't even mention the words "package" or "module"

https://medium.com/@stheodorejohn/managing-cookies-with-axios-simplifying-cookie-based-authentication-911e53c23c8a -> doesn't mention that cookies don't work in axios run in node without extra packages (at least this one mentions that chat-gpt helped, though I bet it's fully written by chat-gpt)


inco note - our http client is misleading, it uses same agent for http and https, it should maybe be called customAgent

axios and fiddler

Using a request interceptor (proxy) like fiddler helps during development and debugging.

To make fiddler intercept axios requests we have to tell axios that there is a proxy where all requests should go. The proxy forwards those requests to the actual destination.

http_proxy=... // set proxy for http requests
https_proxy=... // set proxy for https requests
no_proxy=domain1.com,domain2.com // comma separated list of domains that should not be proxied

The proxy for both http and https can be the same url.
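For example, with Fiddler listening on its default 127.0.0.1:8888, you could set the variables in PowerShell before starting your node process (a sketch; app.js is a hypothetical script):

> $env:http_proxy = "http://127.0.0.1:8888"
> $env:https_proxy = "http://127.0.0.1:8888"
> $env:no_proxy = "localhost,127.0.0.1"
> node .\app.js  # axios requests made by this process now go through Fiddler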

Read more - https://axios-http.com/docs/req_config

When using fiddler on Windows I suggest going to Network & internet > Proxy and disabling proxies there (fiddler sets this by default). This way fiddler will only receive requests from the process where we set the http(s)_proxy env vars.

fiddler and client certificates

I was not able to make fiddler work with client certificates. It should be done like this - https://docs.telerik.com/fiddler/configure-fiddler/tasks/respondwithclientcert but I couldn't get it to work

honorable mentions

I would like to try out - https://www.npmjs.com/package/proxy-agent at some point

I don't fully understand withCredentials

axios & cookies demo

> npm i
> node server.mjs
open browser and go to http://127.0.0.1:3000
cookies are supported
> node test.js (from another console)
cookies are not supported

axios, certificates, etc

To use axios with a client certificate you need to configure the https agent with the key and cert. The key and cert need to be in PEM format. They can both be in the same PEM file or in separate PEM files. I did not try it, but you should be able to merge and split your PEM.

https://nodejs.org/api/tls.html#tlscreatesecurecontextoptions

to try out - https://www.npmjs.com/package/proxy-agent

Short post on premature optimization and error handling

Premature optimization is the root of all evil ~Donald Knuth

Don't pretend to handle more than you handle ~me just now

My team took over some scrapers. After a few months an issue was reported stating that accounting is missing data from one of the scrapers. No errors were logged or seen by our team.

My colleague investigates the issue. Findings:

  • most recent data is indeed missing in our data base (it is already available in the API)
  • the data is often delayed (compared to when it's available in the remote API)
  • the data is not time critical but a delay of hours or days is vexing (remember folks - talk to your users or customers)
  • the scraper is using parallelism to send all requests at once (probably to get the data faster)
  • the API doesn't like our intense scraping and bans us from accessing the API, sometimes for hours
  • we never saw any errors as the error handling looks like this:
try {
    data = hit the REST API using multiple parallel requests
    persist(data)
} catch {
    log.info("No data found")
}

Take away

  • talk to your users - in this case to learn that this data is not time critical
  • don't optimize prematurely
  • don't catch all exception pretending you handle them

More venting

I have seen my share of premature optimizations. AFAIR I always managed to have a conversation about the (un)necessity of an optimization and agree to prefer readability/simplicity over premature optimization.

If you see premature optimization my advice is "talk to the perp". People do what they consider necessary and right. They might optimize code hoping to save the company money or time.

If you have the experience to know that saving 2kB of RAM in an invoicing app run once a month is not worth using that obscure data structure - talk to those who don't yet know it. Their intentions are good.

I'm pretty sure I'm also guilty of premature optimization, just can't recall any instance as my brain is probably protecting my ego by erasing any memories of such mistakes from my past.

An example

One example of premature optimization stuck with me. I recall reviewing code as below

foreach(var gasPoint in gasPoints)
{
    if (gasPoint.Properties.Any())
    {
        foreach (var x in gasPoint.Properties)
        {
            // do sth with x
        }
    }
}

The review went something like this:

me: drop the if, foreach handles empty collections just fine

author: but this is better

me: why?

author: if the collection is empty we don't even use the iterator

me: how often is this code run?

author: currently only for a single entry in gasPoints, but there can be more

me: how many more and when?

author: users might create an entry for every gas pipeline connection in Europe

me: ok, how many is that?

We agreed to drop the if after realizing that:

We have ~30 countries in Europe, so even if they all connect with each other there will be at most ~400 gas connections to handle here. We don't know that the if is faster than the iterator. 400 is extremely optimistic. We have 1 entry now, and realistically we will have 10 gasPoints in 5 years.

The conversation wasn't as smooth as I pretend here but we managed.

https://wiki.c2.com/?PrematureOptimization

https://wiki.c2.com/?ProfileBeforeOptimizing

https://youtube.com/CodeAesthetics/PrematureOptimization

Exercises in bash/shell/scripting

Being fluent in shell/scripting allows you to improve your work by 20%. It doesn't take you to another level. You don't suddenly possess the knowledge to implement flawless distributed transactions, but some things get done much faster with no frustration.

Here is my collection of shell/scripting exercises for others to practice shell skills.

A side note - I'm still not sure if I should learn more PowerShell, try out a different shell or do everything in F# fsx. PowerShell is just so ugly ;(

Scroll down for answers

Exercise 1

What were the arguments of DetectOrientationScript function in https://github.com/tesseract-ocr/tesseract when it was first introduced?

Exercise 2

Get Hadoop distributed file system log from https://github.com/logpai/loghub?tab=readme-ov-file

Find the ratio of (failed block serving)/(failed block serving + successful block serving) for each IP

The result should look like:

...
10.251.43.210  0.452453987730061
10.251.65.203  0.464609355865785
10.251.65.237  0.455237129089526
10.251.66.102  0.452124935995904
...

Exercise 3

This happened to me once - I had to find all http/s links to specific domains in the export of our company's messages, as someone had shared proprietary code on publicly available websites.

Exercise - find all distinct http/s links in https://github.com/tesseract-ocr/tesseract

Exercise 4

Task - remove the string "42" from each line of multiple CSV files.

You can use this to generate the input CSV files:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Exercise 1 - answer

Answer:

bool DetectOrientationScript(int& orient_deg, float& orient_conf, std::string& script, float& script_conf);

[PowerShell]
> git log -S DetectOrientationScript # get sha of oldest commit
> git show bc95798e011a39acf9778b95c8d8c5847774cc47 | sls DetectOrientationScript

[bash]
> git log -S DetectOrientationScript # get sha of oldest commit
> git show bc95798e011a39acf9778b95c8d8c5847774cc47 | grep DetectOrientationScript

One-liner:

[PowerShell]
> git log -S " DetectOrientationScript" -p | sls DetectOrientationScript | select -Last 1

[bash]
> git log -S " DetectOrientationScript" -p | grep DetectOrientationScript | tail -1

Bonus - execution times

[PowerShell 7.4]
> measure-command { git log -S " DetectOrientationScript" -p | sls DetectOrientationScript | select -Last 1 }
...
TotalSeconds      : 3.47
...

[bash]
> time git log -S " DetectOrientationScript" -p | grep DetectOrientationScript | tail -1
...
real    0m3.471s
...

Without git log -S doing the heavy lifting, the times look different:

[PowerShell 7.4]
> @(1..10) | % { Measure-Command { git log -p | sls "^\+.*\sDetectOrientationScript" } } | % { $_.TotalSeconds } | Measure-Object -Average

Count    : 10
Average  : 9.27122774
[PowerShell 5.1]
> @(1..10) | % { Measure-Command { git log -p | sls "^\+.*\sDetectOrientationScript" } } | % { $_.TotalSeconds } | Measure-Object -Average

Count    : 10
Average  : 27.33900077
[bash]
> seq 10 | xargs -I '{}' bash -c "TIMEFORMAT='%3E' ; time git log -p | grep -E '^\+.*\sDetectOrientationScript' > /dev/null" 2> times
> awk '{s+=$1} END {print s}' times
6.7249 # for convenience I moved the dot one place to the left (the awk sum of 10 runs divided by 10 = the average)

Reflections

Bash is faster than PowerShell. PowerShell 7 is much faster than PowerShell 5. It was surprisingly easy to get the average with Measure-Object in PowerShell and surprisingly difficult in bash.

Exercise 2 - answer

[PowerShell 7.4]
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort > sorted
> cat .\sorted | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, ($g.Length/$_.count)) } | write-host

This is how I got to the answer:

> sls "Served block" -Path .\HDFS.log | select -first 10
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 10
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 100
> sls "Served block|Got exception while serving" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select matches -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select Matches -first 1000
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw | select Matches
> $a = sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log -raw
> $a[0]
> get-type $a[0]
> Get-TypeData $a
> $a[0]
> $a[0].Matches[0].Value
> $a = sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log
> $a[0]
> $a[0].Matches[0].Value
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" }
> "asdf" -replace "a","b"
> "asdf" -replace "a","b" -replace "d","x"
> "asdf" -replace "a.","b" -replace "d","x"
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk" }
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" }
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" }
> "aaxxaa" -replace "a.","b"
> "aaxxaa" -replace "a.","b$0"
> "aaxxaa" -replace "a.","b$1"
> "aaxxaa" -replace "a.","b${1}"
> "aaxxaa" -replace "a.","b${0}"
> "aaxxaa" -replace "a.","b`${0}"
> "okaaxxokaa" -replace "(ok|no)aa","_`{$1}_"
> "okaaxxokaa" -replace "(ok|no)aa","_`${1}_"
> "okaaxxokaa" -replace "(ok|no)aa","_`${1}_`${0}"
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"}
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort
> sls "Served block.*|Got exception while serving.*" -Path .\HDFS.log | % { $_.Matches[0].Value -replace "Served block.*/","ok/" -replace "Got exception while serving.*/","nk/" -replace ":","" } | % { $_ -replace "(ok|nk)/(.*)", "`${2} `${1}"} | sort > sorted
> cat .\sorted -First 10
> cat | group
> cat | group -Property {$_}
> cat .\sorted | group -Property {$_}
> cat .\sorted -Head 10 | group -Property {$_}
> cat .\sorted -Head 100 | group -Property {$_}
> cat .\sorted -Head 1000 | group -Property {$_}
> cat .\sorted -Head 10000 | group -Property {$_}
> cat .\sorted -Head 10000 | group -Property {$_} | select name,count
> cat .\sorted | group -Property {$_} | select name,count
> cat .\sorted | group -Property {$_ -replace "nk|ok",""}
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""}
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length / $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length / $_.count }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $__
> $__[0]
> $__[1]
> $__[2]
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, $g.Length, $_.count }
> $a[0]
> $a[1]
> $a[2]
> $a[1].GetType()
> $a[2].GetType()
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, ($g.Length) / ($_.count) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $_.name, (($g.Length) / ($_.count)) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,$_.name, (($g.Length) / ($_.count)) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; @($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; [Array] ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; [Array]@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,$_.name, (($g.Length) / ($_.count)) }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,$_.name, (($g.Length) / ($_.count)) }
> $a[0]
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; return ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; ,@($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))) }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); $x }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); ,$x }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x }
> $a[0]
> $a[0][0]
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { wirte-output "$_[0]" }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> $a = cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_[0]" }
> cat .\sorted -Head 10000 | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_" }
> cat .\sorted | group -Property {$_ -replace "nk|ok",""} | % { $g = $_.group | ? {$_.contains("nk") }; $x = @($_.name, (($g.Length) / ($_.count))); return ,$x } | % { write-output "$_" }

[F#]
open System.IO
open System.Text.RegularExpressions

let lines = File.ReadAllLines("HDFS.log")

let a =
    lines
    |> Array.filter (fun x -> x.Contains("Served block") || x.Contains("Got exception while serving"))

a
// |> Array.take 10000
|> Array.map (fun x ->
    let m = Regex.Match(x, "(Served block|Got exception while serving).*/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
    m.Groups[2].Value,
    match m.Groups[1].Value with
    | "Served block"                -> true
    | "Got exception while serving" -> false )
|> Array.groupBy fst
|> Array.map (fun (key, group) ->
    let total = group.Length
    let failed = group |> Array.map snd |> Array.filter not |> Array.length
    key, (decimal failed)/(decimal total)
    )
|> Array.sortBy fst
|> Array.map (fun (i,m) -> sprintf "%s  %.15f" i m)
|> fun x -> File.AppendAllLines("fsout", x)

Exercise 3 - answer

[PowerShell 7.4]
> ls -r -file | % { sls -path $_.FullName -pattern https?:.* -CaseSensitive } | % { $_.Matches[0].Value } | sort | select -Unique

# finds 234 links
[bash]
> find . -type f -not -path './.git/*' | xargs grep -E https?:.* -ho | sort | uniq

# finds 234 links

Exercise 4 - answer

[PowerShell 7.4]
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ }

[bash]
> sed -i 's/42//g' *.csv
> sed -ibackup 's/42//g' *.csv # creates backup files
This is neat - perhaps the unix people had wisdom that is now lost.