
2024

Json

Should I use System.Text.Json (STJ) or Newtonsoft.Json (previously Json.NET)?

Use STJ. Newtonsoft.Json no longer gets new features; its author now works at Microsoft on non-JSON projects.

JamesNK reddit comment

Terms

marshal - assemble and arrange (a group of people, especially troops) in order.

"the general marshalled his troops"

marshalling (UK) (in computer science) (marshal US) - getting parameters from here to there

serialization - transforming something (data) to a format usable for storage or transmission over the network

https://stackoverflow.com/questions/770474/what-is-the-difference-between-serialization-and-marshaling

JSON - JavaScript Object Notation - data interchange format. https://www.json.org/json-en.html

Why this post

While analysing some logs I used FSharp.Data's JsonProvider. Only a few properties were relevant, but JsonProvider keeps the whole JSON in memory. With 10GB of logs to analyse I quickly ran out of memory.

Let's do some testing!

open System
open System.IO
open System.Text.Json

fsi.AddPrinter<DateTimeOffset>(fun dt -> dt.ToString("O"))

Environment.CurrentDirectory <- __SOURCE_DIRECTORY__ // ensures the script runs from the directory it's located in
// -------------------------------------------------------------------------

// sample log entry for testing
type LogEntry = {
    Timestamp       : DateTimeOffset
    Level           : string
    Message         : string
}

// only the properties we're interested in
type LogEntryRecord = {
    Timestamp: DateTimeOffset
    Level    : string
}

let random = Random()
let levels = [ "INFO"; "WARN"; "ERROR"; "DEBUG" ]

let generateLogEntry () =
    {
        Timestamp = DateTimeOffset.Now.AddSeconds(-random.Next(0, 10000))
        Level     = levels.[random.Next(levels.Length)]
        Message   = String.replicate(random.Next(10, 100)) "x" // random string to simulate redundant content
    }

List.init 7_000_000 (fun _ -> generateLogEntry()) // 7M entries is around 1GB of data
|> List.map (fun entry -> JsonSerializer.Serialize(entry))
|> fun lines -> File.WriteAllLines("./logs.json", lines)

let lines = File.ReadAllLines "./logs.json"

let runWithMemoryCheck lines singleLineParser =
    GC.Collect()
    let before = GC.GetTotalMemory(true)
    let x = lines |> Array.map singleLineParser
    GC.Collect()
    let after = GC.GetTotalMemory(true)
    let m = ((after - before) |> float) / 1024. / 1024. / 1024. // GB
    x, m

#time
// -------------------------------------------------------------------------

#r "nuget: FSharp.Data"
open FSharp.Data
open System.Text.Json.Nodes

type LogEntryJsonProvider = JsonProvider<"""
{
    "Timestamp"        : "2024-12-23T20:51:18.2020753+01:00",
    "Level"            : "ERROR",
    "Message"          : "File not found"
}""">

let fSharpDataJsonProvider = LogEntryJsonProvider.Parse
let fSharpDataJsonNode (x:string) =
    let line = x |> FSharp.Data.JsonValue.Parse
    let t = line.GetProperty("Timestamp").AsDateTimeOffset()
    let l = line.GetProperty("Level").AsString()
    { Timestamp = t; Level = l }
let jsonSerializer (x:string) = JsonSerializer.Deserialize<LogEntryRecord>(x)
let jsonNode (line:string) =
    let line = line |> JsonNode.Parse
    let t = line.["Timestamp"].GetValue<DateTimeOffset>()
    let l = line.["Level"].GetValue<string>()
    { Timestamp = t; Level = l }
let jsonDocument (x:string) =
    use doc = x |> JsonDocument.Parse
    let t = doc.RootElement.GetProperty("Timestamp").GetDateTimeOffset()
    let l = doc.RootElement.GetProperty("Level").GetString()
    { Timestamp = t; Level = l }

runWithMemoryCheck lines fSharpDataJsonProvider |> snd |> printfn "Memory used: %f GB"
// Memory used: 4.420363 GB
// Real: 00:00:35.829, CPU: 00:02:07.312, GC gen0: 84, gen1: 25, gen2: 8

runWithMemoryCheck lines fSharpDataJsonNode     |> snd |> printfn "Memory used: %f GB"
//Memory used: 0.521624 GB
//Real: 00:00:16.557, CPU: 00:00:35.281, GC gen0: 29, gen1: 10, gen2: 4

runWithMemoryCheck lines jsonSerializer         |> snd |> printfn "Memory used: %f GB"
// Memory used: 0.521555 GB
// Real: 00:00:10.823, CPU: 00:00:44.453, GC gen0: 11, gen1: 6, gen2: 4

runWithMemoryCheck lines jsonNode               |> snd |> printfn "Memory used: %f GB"
// Memory used: 0.521419 GB
// Real: 00:00:09.533, CPU: 00:00:27.359, GC gen0: 16, gen1: 7, gen2: 4

runWithMemoryCheck lines jsonDocument           |> snd |> printfn "Memory used: %f GB"
// Memory used: 0.521525 GB
// Real: 00:00:06.208, CPU: 00:00:17.546, GC gen0: 5, gen1: 4, gen2: 4

Conclusion

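  • FSharp.Data.JsonProvider is terrible compared to every alternative: about 36 s and 4.4 GB, versus 6-17 s and roughly 0.52 GB for the others.
  • STJ.JsonDocument is the speed winner (about 6 s). Memory use is practically identical for all four non-provider parsers.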
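The test above still loads every line with File.ReadAllLines, which won't scale to 10GB either. Below is a minimal sketch of a streaming variant, assuming a file with one JSON object per line like the generated logs.json. The helper name `countByLevel` is made up for this example; File.ReadLines and JsonDocument are the real APIs.

```fsharp
open System.IO
open System.Text.Json

// Hypothetical helper: count log entries per level.
// Only the counters stay in memory, not the lines or the parsed documents.
let countByLevel (path: string) =
    File.ReadLines path                          // lazy: yields one line at a time
    |> Seq.map (fun (line: string) ->
        use doc = JsonDocument.Parse(line)       // disposed right after the level is read
        doc.RootElement.GetProperty("Level").GetString())
    |> Seq.countBy id
    |> Seq.toList
```

Usage would be something like `countByLevel "./logs.json"`, which returns a list of (level, count) pairs.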

System.Text.Json cheat sheet

open System
open System.Text.Json

// The System.Text.Json namespace contains all the entry points and the main types.
// The System.Text.Json.Serialization namespace contains attributes and APIs for advanced scenarios and customization specific to serialization and deserialization.

fsi.AddPrinter<DateTimeOffset>(fun dt -> dt.ToString("O"))

// System.Text.Json.JsonSerializer -> is a static class
//                                 -> you can instantiate and reuse JsonSerializerOptions

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [
        42,
        11
    ]
}"""

type InnerType = {
    NestedProperty: int
}

type DummyType = {
    PropertyName1: string
    PropertyName2: int
    PropertyName3: DateTimeOffset
    PropertyName4: InnerType
    PropertyName5: int list
}

type LogEntryRecord = {
    Timestamp: DateTimeOffset
    Level    : string
}


// # JsonSerializer.Deserialize

// JsonSerializer.Deserialize<'Type>(jsonString)
// JsonSerializer.Deserialize<'Type>(jsonString, options)
// JsonSerializer.DeserializeAsync(stream, ...) <- only streams can be parsed async cuz parsing string is purely CPU bound

// Deserialization behaviour:
//  - By default, property name matching is case-sensitive. You can specify case-insensitivity.
//  - Non-public constructors are ignored by the serializer.
//  - Deserialization to immutable objects or properties that don't have public set accessors is supported but not enabled by default.
//    ^ F# records still work without extra setup: a record compiles to a class with a single public constructor and STJ deserializes through it

JsonSerializer.Deserialize<LogEntryRecord>(jsonString)
// { Timestamp = 0001-01-01T00:00:00.0000000+00:00 Level = null }
// no properties match but JsonSerializer just returns default values

JsonSerializer.Deserialize<DummyType>(jsonString)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 2024-12-29T10:31:36.3774099+01:00
//                       PropertyName4 = { NestedProperty = 42 }
//                       PropertyName5 = [42; 11] }

// Deserialization is case sensitive by default!
let jsonString2 = """{
    "propertyName1" : "dummyValue",
    "propertyName2" : 42
}"""
JsonSerializer.Deserialize<DummyType>(jsonString2)
// val it: DummyType = { PropertyName1 = null
//                       PropertyName2 = 0
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }
let options = new JsonSerializerOptions()
options.PropertyNameCaseInsensitive <- true
JsonSerializer.Deserialize<DummyType>(jsonString2, options)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }


// # JsonSerializer.Serialize

// let's pretty print during testing
// by default the json is minified
let options = new JsonSerializerOptions()
options.WriteIndented <- true

JsonSerializer.Serialize(options, options)
//val it: string =
//  "{
//  "Converters": [],
//  "TypeInfoResolver": {},
//  "TypeInfoResolverChain": [
//    {}
//  ],
//  "AllowOutOfOrderMetadataProperties": false,
//  "AllowTrailingCommas": false,
//  "DefaultBufferSize": 16384,
//  "Encoder": null,
//  "DictionaryKeyPolicy": null,
//  "IgnoreNullValues": false,
//  "DefaultIgnoreCondition": 0,
//  ...

// Serialization behaviour:
//  - by default, all public properties are serialized. You can specify properties to ignore. You can also include private members.
//  - by default, JSON is minified. You can pretty-print the JSON.
//  - by default, casing of JSON names matches the .NET names. You can customize JSON name casing.
//  - by default, fields are ignored. You can include fields.


// # JsonNode and JsonDocument

// Should you use JsonNode or JsonDocument? see link below
// https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom#json-dom-choices

// JsonDocument -> immutable
// JsonDocument -> faster, IDisposable, uses some shared memory pool
// https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom
// JsonNode     -> mutable

open System.Text.Json.Nodes
let x = JsonNode.Parse(jsonString) // JsonObject
x.ToJsonString()
x.["PropertyName3"].GetValue<DateTimeOffset>()
x.["PropertyName3"].GetPath()
x.["PropertyName4"].["NestedProperty"].GetPath()
x.["PropertyName2"] |> int
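// x.["PropertyName3"] |> DateTimeOffset // doesn't compile: `int` is a conversion function that resolves op_Explicit, DateTimeOffset is only a type name - use GetValue<DateTimeOffset>() instead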

x["PropertyName4"].GetValueKind() |> string // "Object"
x["NonExistingProperty"] // null
x["NonExistingProperty"].GetValue<int>() // err - System.NullReferenceException
x["PropertyName5"].AsArray() |> Seq.map (fun a -> a.GetValue<int>()) // ok
x["PropertyName5"].AsArray() |> Seq.map int // ok
x["PropertyName5"].[0].GetValue<int>() // ok

// create a json object
let m = new JsonObject()
m["TimeStamp"] <- DateTimeOffset.Now
m.ToJsonString() // {"TimeStamp":"2024-12-29T16:06:17.046746+01:00"}
m["SampleProperty"] <- new JsonArray(1,2)
m.Remove("TimeStamp")

let a = JsonNode.Parse("""{"x":{"y":[1,2,3]}}""")
a.["x"] // this is a JsonNode
a.["x"].AsObject() // this returns a JsonObject
a.["x"].AsObject() |> Seq.iter (fun x -> printfn "%A" x) // iterate over properties of the object (Seq.iter, because Seq.map is lazy and would not print anything by itself)
a.["x"].ToJsonString() // you can serialize subsection of the json
// {"y":[1,2,3]}

JsonNode.DeepEquals(x, a) // comparison
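The snippets above only show JsonNode, where a missing property gives null and then a NullReferenceException. With JsonDocument the same situation can be handled with TryGetProperty. A small sketch follows; `tryGetInt` is a hypothetical helper name, and it assumes the root of the JSON is an object (TryGetProperty throws otherwise).

```fsharp
open System.Text.Json

// Hypothetical helper: read an int property from a JSON object.
// Returns None when the property is missing or is not an int.
let tryGetInt (name: string) (json: string) =
    use doc = JsonDocument.Parse(json)
    match doc.RootElement.TryGetProperty(name) with
    | true, value when value.ValueKind = JsonValueKind.Number ->
        match value.TryGetInt32() with
        | true, n -> Some n
        | _ -> None
    | _ -> None

let found   = tryGetInt "PropertyName2" """{"PropertyName2": 42}"""        // Some 42
let missing = tryGetInt "NonExistingProperty" """{"PropertyName2": 42}"""  // None
```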

F# types

open System.Text.Json

// Record - OK
type DummyRecord = {
    Text: string
    Num:  int
    }

let r = { Text = "asdf"; Num = 1 }

JsonSerializer.Serialize(r) |> JsonSerializer.Deserialize<DummyRecord>

let tuple = (42, "asdf")
JsonSerializer.Serialize(tuple) |> JsonSerializer.Deserialize<int * string>

type TupleAlias = int * string
let tuple2 = (43, "sfdg") : TupleAlias
JsonSerializer.Serialize(tuple2) |> JsonSerializer.Deserialize<TupleAlias>

// Discriminated Union :(
type SampleDiscriminatedUnion =
    | A of int
    | B of string
    | C of int * string
let x = A 1
JsonSerializer.Serialize(x) // error! throws NotSupportedException - STJ has no built-in support for F# discriminated unions

// Option - OK
JsonSerializer.Serialize(Some 42) |> JsonSerializer.Deserialize<int option>
JsonSerializer.Serialize(None) |> JsonSerializer.Deserialize<int option>
open System
type RecordTest2 = {
    Timestamp: DateTimeOffset
    Level: string
    TestOp: int option
    }

// Discriminated Union is supported in FSharp.Json
// https://github.com/fsprojects/FSharp.Json
#r "nuget: FSharp.Json"
open FSharp.Json
let data = C (42, "The string")
let json = Json.serialize data
// val json: string = "{
//   "C": [
//     42,
//     "The string"
//   ]
// }

let deserialized = Json.deserialize<SampleDiscriminatedUnion> json
// val deserialized: SampleDiscriminatedUnion = C (42, "The string")
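If you'd rather stay with System.Text.Json, a custom JsonConverter is one way to handle a union. The sketch below is only an illustration: the `Shape` type, the `ShapeConverter` class and the `{"CaseName": value}` layout are all made up for this example and are not what FSharp.Json produces.

```fsharp
open System
open System.Text.Json
open System.Text.Json.Serialization

// Hypothetical union used only for this sketch
type Shape =
    | Circle of radius: float
    | Square of side: float

// Writes {"Circle":1.5} or {"Square":2}; the case name is the only property
type ShapeConverter() =
    inherit JsonConverter<Shape>()

    override _.Write(writer: Utf8JsonWriter, value: Shape, _options: JsonSerializerOptions) =
        writer.WriteStartObject()
        match value with
        | Circle r -> writer.WriteNumber("Circle", r)
        | Square s -> writer.WriteNumber("Square", s)
        writer.WriteEndObject()

    override _.Read(reader: byref<Utf8JsonReader>, _typeToConvert: Type, _options: JsonSerializerOptions) =
        use doc = JsonDocument.ParseValue(&reader)
        let prop = doc.RootElement.EnumerateObject() |> Seq.head
        match prop.Name with
        | "Circle" -> Circle(prop.Value.GetDouble())
        | "Square" -> Square(prop.Value.GetDouble())
        | other -> raise (JsonException(sprintf "Unknown case: %s" other))

let shapeOptions = JsonSerializerOptions()
shapeOptions.Converters.Add(ShapeConverter())

let shapeJson = JsonSerializer.Serialize(Circle 1.5, shapeOptions)          // {"Circle":1.5}
let shapeBack = JsonSerializer.Deserialize<Shape>(shapeJson, shapeOptions)  // Circle 1.5
```

A converter like this has to be written per union, so for many unions a library is probably less work.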

More on FSharp.Data JsonValue

#r "nuget:FSharp.Data"
open FSharp.Data

let j = JsonValue.Parse("""{"x":{"y":[1,2,3]}}""")
j.Properties()
// val it: (string * JsonValue) array =
//   [|("x", {
//   "y": [
//     1,
//     2,
//     3
//   ]
// })|]
j.["x"].["y"].AsArray()
j.TryGetProperty "x"

// JsonValue is a discriminated union
// union JsonValue =
//   | String  of string
//   | Number  of decimal
//   | Float   of float
//   | Record  of properties: (string * JsonValue) array
//   | Array   of elements: JsonValue array
//   | Boolean of bool
//   | Null
//
// docs:
// https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-jsonvalue.html
// https://fsprojects.github.io/FSharp.Data/library/JsonValue.html <- if you'll be working with JsonValue read this
//
// there are also extension methods:
// https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-jsonextensions.html
//
// AsArray doesn't fail if the value is not an array (it returns an empty array), as opposed to the other As* methods, which throw
// See below how extension methods are defined
// source: https://github.com/fsprojects/FSharp.Data/blob/main/src/FSharp.Data.Json.Core/JsonExtensions.fs
open System.Globalization
open System.Runtime.CompilerServices
open System.Runtime.InteropServices
open FSharp.Data.Runtime
open FSharp.Core

[<Extension>]
type JsonExtensions =
    /// Get all the elements of a JSON value.
    /// Returns an empty array if the value is not a JSON array.
    [<Extension>]
    static member AsArray(x: JsonValue) =
        match x with
        | (JsonValue.Array elements) -> elements
        | _ -> [||]

    /// Get a number as an integer (assuming that the value fits in integer)
    [<Extension>]
    static member AsInteger(x, [<Optional>] ?cultureInfo) =
        let cultureInfo = defaultArg cultureInfo CultureInfo.InvariantCulture

        match JsonConversions.AsInteger cultureInfo x with
        | Some i -> i
        | _ ->
            failwithf "Not an int: %s"
            <| x.ToString(JsonSaveOptions.DisableFormatting)

// construct a json object
let d =
    JsonValue.Record [|
        "event",      JsonValue.String "asdf"
        "properties", JsonValue.Record [|
            "token",       JsonValue.String "tokenId"
            "distinct_id", JsonValue.String "123123"
        |]
    |]

d.ToString().Replace("\r\n", "").Replace(" ", "")

// if you want to process the json object
for (k, v) in d.Properties() do
    printfn "Property: %s" k
    match v with
    | JsonValue.Record props -> printfn "\t%A" props
    | JsonValue.String s     -> printfn "\t%A" s
    | JsonValue.Number n     -> printfn "\t%A" n
    | JsonValue.Float f      -> printfn "\t%A" f
    | JsonValue.Array a      -> printfn "\t%A" a
    | JsonValue.Boolean b    -> printfn "\t%A" b
    | JsonValue.Null         -> printfn "\tnull"

Serialize straight to UTF-8

JsonSerializer.SerializeToUtf8Bytes(value, options) <- why does this one exist?

Strings in .NET are stored in memory as UTF-16, so if you don't need a string you can use this method and serialize straight to UTF-8 bytes (it's 5-10% faster, see link): https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/how-to#serialize-to-utf-8
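A small sketch to make the difference concrete. The `Point` record and the `parsePoint` helper are hypothetical names for this example; SerializeToUtf8Bytes and the ReadOnlySpan<byte> overload of Deserialize are the real STJ methods.

```fsharp
open System
open System.Text
open System.Text.Json

type Point = { X: int; Y: int }   // hypothetical sample type

// straight to UTF-8, no intermediate UTF-16 string
let bytes : byte[] = JsonSerializer.SerializeToUtf8Bytes({ X = 1; Y = 2 })

// the detour: UTF-16 string first, then re-encode
let viaString : byte[] = Encoding.UTF8.GetBytes(JsonSerializer.Serialize({ X = 1; Y = 2 }))

// UTF-8 bytes can be deserialized directly as well
let parsePoint (utf8: byte[]) = JsonSerializer.Deserialize<Point>(ReadOnlySpan<byte>(utf8))
```

Both arrays hold the same payload, so the bytes can go to a file or a socket as they are.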

https://stu.dev/a-look-at-jsondocument/

https://blog.ploeh.dk/2023/12/18/serializing-restaurant-tables-in-f/

https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-apis/?ref=stu.dev

a post from when they introduced the new json API

TODO for myself - watch these maybe

<3 regex

https://regex101.com/r/RdCR7j/1 - set the global flag (g) to get all matches

https://www.debuggex.com/ - haven't played with this a lot but I might give it a try, looks like a decent learning tool

regex - use static Regex.Matches() or instantiate Regex()?

By default use the static methods.

The .NET regex engine caches the regexes used in static method calls (15 by default, see Regex.CacheSize).

Are you using more than 15 regexes, using them frequently, they're complex and you care about performance?

Investigate Regex() instances and RegexOptions.Compiled (Regex.CompileToAssembly is only available on .NET Framework).

Test performance before you optimize

https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-regex#static-regular-expressions

https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regexoptions?view=net-9.0
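A minimal sketch of both styles side by side. The function names and the `\w+` pattern are just placeholders for this example.

```fsharp
open System
open System.Text.RegularExpressions

// Static call: the parsed pattern is kept in the static regex cache between calls.
let countWordsStatic (text: string) = Regex.Matches(text, @"\w+").Count

// Instance: built once and reused explicitly.
// RegexOptions.Compiled costs more at construction and is meant to pay off on repeated matching.
let wordRegex = Regex(@"\w+", RegexOptions.Compiled, TimeSpan.FromSeconds(1.0))
let countWordsInstance (text: string) = wordRegex.Matches(text).Count

// The size of the static cache can be read (and changed) here.
let cacheSize = Regex.CacheSize
```

Which one is faster for a given workload is something to measure, not to assume.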

What is the whole fuss about backtracking?

Microsoft's documentation does a bad job explaining backtracking.

Read about backtracking here - https://www.regular-expressions.info/catastrophic.html

To experience backtracking yourself - https://regex101.com/r/1rWKNN/1 - keep on adding "x" to the input and see how the execution time increases - with 35*"x" it takes 5 seconds for the regex to find out it doesn't match!
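The same experiment can be tried in .NET with a match timeout as a safety net. This is a sketch under assumptions: `tryIsMatch` is a hypothetical helper, and `(x+x+)+y` is the classic example from the regular-expressions.info article, which may differ from the pattern behind the regex101 link.

```fsharp
open System
open System.Text.RegularExpressions

// Hypothetical helper: Some result, or None when matching exceeds the timeout.
let tryIsMatch (pattern: string) (timeout: TimeSpan) (input: string) =
    try
        Some (Regex.IsMatch(input, pattern, RegexOptions.None, timeout))
    with :? RegexMatchTimeoutException -> None

let evil = "(x+x+)+y"

// short matching input: answers immediately
let quick = tryIsMatch evil (TimeSpan.FromMilliseconds(200.0)) "xxxy"

// 40 x's and no y: a backtracking engine has to try a huge number of ways to split the x's,
// so I'd expect the timeout to kick in and give None here
let slow = tryIsMatch evil (TimeSpan.FromMilliseconds(200.0)) (String.replicate 40 "x")
```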

Code

These are the methods you need:

open System
open System.Text.RegularExpressions


Regex.Matches("input", "pattern")
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline)
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline, TimeSpan.FromSeconds(10.)) // you can use a timeout to prevent a DoS attack with malicious inputs
// the methods below take (input, pattern, ...) in the same way:
// Regex.Match()
// Regex.IsMatch()
// Regex.Replace()
// Regex.Split()
// Regex.Count()

let r = new Regex("pattern") // instance Regex offers the same methods
r.Matches("input")
Regex class - https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-9.0

Sample:

let matches = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)o")
matches |> Seq.iter (fun x -> printfn "%s" x.Value)
matches |> Seq.iter (fun x -> printfn "%A" x.Groups)
matches.[0].Groups.[1].Value |> printfn "%s"

// Lo             // these are the whole matches
// do             //
// lo             //
// co             //
// seq [Lo; L]    // group 0 is the whole match, group 1 is the (\w)
// seq [do; d]    //
// seq [lo; l]    //
// seq [co; c]    //
// L              // this is the letter captured by (\w)

let matches2 = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)+o")
matches2.[1].Groups.[1].Value |> printfn "%A"
matches2.[1].Groups.[1].Captures |> Seq.iter (fun c -> printfn "%s" c.Value)
// l              // gotcha! the value of the group is the last thing captured by that group
// d              // here the (\w)+ group captures 3 times
// o              //
// l              //
Match object properties:
Match.Success -> bool   | true      | false        |
Match.Value   -> string | the match | String.Empty |
let match3 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[a-z ]+i")
match3.Success |> printfn "%A"
match3.Value   |> printfn "%A"
// true
// "Lorem ipsum dolor si"

let match4 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[A-Z ]+i")
match4.Success            |> printfn "%A"
match4.Value              |> printfn "%A"
match4.Groups.Count       |> printfn "%A"
match4.Groups.[0].Success |> printfn "%A"
// false
// ""    // notice this is String.Empty not <null>
// 1     // even for a failed match there is always at least one group
// false

let mutable m = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "\wo")
while m.Success do
    printfn "%s" m.Value
    m <- m.NextMatch()

let lines = [
    "The next day the children were ready to go to the plum thicket in the"
    "peach orchard as soon as they had their breakfast, but while they were"
    "talking about it a new trouble arose. It grew out of a question asked by"
    "Drusilla."
]

lines
|> List.filter (fun line -> Regex.IsMatch(line, "the"))
|> List.map    (fun line -> Regex.Replace(line, "(\w+) the", "the $1"))

let text =
    "don't we all love\n" +
    "dealing with different\r\n" +
    "line endings\n" +
    "it's so much fun"
Regex.Split(text, "\r?\n")
|> Array.iter (printfn "%s")

open System.Net.Http
let book = (new HttpClient()).GetStringAsync("https://www.gutenberg.org/cache/epub/74886/pg74886.txt").Result
Regex.Count(book, "[^\w]\w{3}[^\w]") |> printfn "%d" // count 3 letter words

regex - Quick Reference (Microsoft)

https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

Cheat sheet

Character escapes

\t     matches a tab \u0009
\r     match a carriage return \u000D
\n     new line \u000A
\unnnn match a unicode character by hexadecimal representation, exactly 4 digits
\.     match a dot (not any character) aka. match literally
\*     match an asterisk (don't interpret * as a regex special quantifier)

Character classes

[character_group]       /[ae]/ will match "a" in "gray"
[^not_character_group]
[a-z] [A-Z] [a-z0-9A-Z] character ranges
.                       wildcard - any character except \n (with the Singleline option it matches \n too)
\w                      word character - upper/lower case letters, digits and underscore
\W                      non word character
\s                      white-space character
\S                      non whitespace character
\d                      digit
\D                      non digit

Anchors

^   $ beginning and end of a string (in multiline mode beginning and end of a line)

Grouping

(subexpression)               (\w)\1 - match a character and the same character again - "aa" in "xaax"
(?<name>subexpression)        named group (?<double>\w)\k<double> - same as above
(?:subexpression)             noncapturing group - Write(?:Line)? - will match both Write and WriteLine in a string
                              (?:Mr\. |Ms\. |Mrs\. )?\w+\s\w+ -> match first name, last name and optional preceding title
(?imnsx-imnsx: subexpression) turn options on or off for a group
(?=subexp)                    zero-width positive lookahead assertion
(?!subexp)                    negative lookahead
(?<=subexp)
(?<!subexp)                   look behind assertions
                              make sure a subexp is/is not following (but don't match it, ie. don't consume the characters)

Quantifiers

*     0...n (all these are greedy by default -> match as many as possible)
+     1...n
?     0...1
{n}   exactly n
{n,}  at least n
{n,m} n...m
*?
+?
??
{n,}?
{n,m}? question mark makes the match nongreedy (match as few as possible)

Backreference

\number   match the value of a previous subexpression - (\w)\1 - matches the same \w character twice
\k<name>  backreference using group name

Alternation Constructs

| - any element separated by | - th(e|is|at) and the|this|that both match "the" "this" "that"
    ala|ma|kota - match "ala" or "ma" or "kota"
    ala ma (kota|psa) - match "ala ma kota" or "ala ma psa"
TODO - match yes if expresion else match no

Substitution

$number use numbered group
${name} use named group
$$      literal $
$&      whole match
$`      text before the match
$'      text after the match
$+      last group
$_      entire input string

Inline options

(?imnsx-imnsx)               use it like this at the beginning
(?imnsx-imnsx:subexpression) use for a group
i                            case insensitive
m                            multiline - ^ and $ match beginning and end of a line
n                            do not capture unnamed groups
s                            single line - . matches \n also
x                            ignore unescaped white space in the pattern
More options are available using RegexOptions enum
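A few of the cheat sheet constructs in action. The inputs are made up for this sketch; the expected values are in the comments.

```fsharp
open System.Text.RegularExpressions

// named group + backreference: the same word character twice
let doubled = Regex.Match("xaax", @"(?<double>\w)\k<double>").Value      // "aa"

// noncapturing optional group
let writes =
    Regex.Matches("Write WriteLine", @"Write(?:Line)?")
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq                                                        // ["Write"; "WriteLine"]

// greedy vs nongreedy quantifier
let greedyMatch = Regex.Match("<a><b>", "<.+>").Value                    // "<a><b>"
let lazyMatch   = Regex.Match("<a><b>", "<.+?>").Value                   // "<a>"

// positive lookahead: digits followed by "px", the "px" itself is not consumed
let px = Regex.Match("12em 34px", @"\d+(?=px)").Value                    // "34"

// substitution with named groups
let swapped =
    Regex.Replace("John Smith", @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}")  // "Smith, John"

// inline option: case insensitive
let ci = Regex.IsMatch("LOREM", "(?i)lorem")                             // true
```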

Practice regex

https://regex101.com/quiz

https://regexcrossword.com/

https://alf.nu/RegexGolf

Tutorial:

I recall reading this tutorial years ago and I liked it - https://www.regular-expressions.info/tutorial.html

Misc

https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

I love regex.

However I used to say "if you solve a problem with regex now you have 2 problems"

Not knowing how this quote came to be I repeated it for years. I'll smack the next person to repeat this quote without elaborating.

If regex did not exist, it would be necessary to invent it.

Why does .Matches() return a custom collection instead of List<Match>?

Historic reasons. Regex was made in .NET 1.0, before generics were a thing.

https://github.com/dotnet/runtime/discussions/74919?utm_source=chatgpt.com

I used (?<!\[.*?)(?<!\(")https?://\S+ with replace [$&]($&) to linkify links in this post

My lovely regex helpers

let regexExtract  regex                      text = Regex.Match(text, regex).Value
let regexExtractg regex                      text = Regex.Match(text, regex).Groups.[1].Value
let regexExtracts regex                      text = Regex.Matches(text, regex) |> Seq.map (fun x -> x.Value)
let regexReplace  regex (replacement:string) text = Regex.Replace(text, regex, replacement)
let regexRemove   regex                      text = Regex.Replace(text, regex, String.Empty)

PowerShell "Oopsie"

Task - remove a specific string from each line of multiple CSV files.

This task was added to the scripting exercise list.

First - let's generate some CSV files to work with:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

The "Oopsie"

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file $_ -Append }

This command will never finish. Run it for a moment (and then kill it), see the result, and try to figure out what happens. Explanation below.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

The explanation

Get-Content (aka. cat) keeps the file open and reads the content that our command is appending, thus creating an infinite loop.

The fix

There are many ways to fix this "oopsie".

Perhaps the simplest one is to not write to and read from the exact same file. A sensible rule is when processing files always write to a different file:

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file -path "fixed$($_.Name)" }

Knowing the reason for our command hanging we can make sure the whole file is read before we overwrite it:

ls *.csv | % { (cat $_ ) | % { $_ -replace "42","" } | out-file $_ }
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ } # we can also use -replace as an array operator

I'm amazed by GitHub Copilot's answer for "powershell one liner to remove a specific text from multiple CSV files":

Get-ChildItem -Filter "*.csv" | ForEach-Object { (Get-Content $_.FullName) -replace "string_to_replace", "replacement_string" | Set-Content $_.FullName }

How the way I work/code/investigate/debug changed with time & experience

I use this metaphor when describing how I work these days.

TL;DR;

  1. Quick feedback is king
    • unit tests
    • quick test in another way
    • reproduce issues locally
    • try things out in a small project on the side, not in the project you're working on
  2. One thing at a time
    • experimenting
    • refactoring preparing for feature addition
    • feature coding
    • cleaning up after feature coding
  3. Divide problems into smaller problems
    • and remember - one thing (problem) at a time

Example

You're working with code that talks to a remote API, you want to test different API calls to the remote API.

don't - change API parameters in code and run the project each time you test something. It takes too long.

do - write a piece of code to send an HTTP request, fiddle with this code

do - intercept request with Fiddler/Postman/other interceptor and reissue requests with different parameters


Example

Something fails in the CI pipeline.

don't - make a change, commit, wait for remote CI to trigger, see result

do - reproduce issue locally


Longer read

  1. Quick feedback
    • do - write a test for it
    • do - isolate your issue/suspect/the piece of code you're working with
      • it is helpful if you can run just a module/sub-system/piece of your project/system
      • partial execution helps - like in Python/Jupyter or F# fsx
    • if you rely on external data and it takes time to retrieve it (even a 5-second delay can be annoying) - dump data to a file and read it from the file instead of hitting an external API or a DB every time you run your code
    • don't try to understand how List.foldBack() works while debugging a big project. Do it on the side.
    • spin up a new solution/project on the side to test things
    • occasionally juniors ask "does this work this way" - you can test it yourself easily if you do it on the side
  2. One thing at a time
    • separate refactoring from feature addition
    • fiddle first, find the walls/obstacles
    • git reset --hard
    • refactor preparing for a new feature (can become a separate PR)
    • code feature
    • if during coding you find something that needs refactoring/renaming/cleaning up - any kind of "WTF is this? I need to fix this!" try a) or b)
      • a) make a note to fix it later
      • b) fix immediately
        > git stash
        > git checkout master
        > git checkout -b fix-typo
        fix stuff
        merge or create a PR
        > git checkout feature
        > git merge fix-typo or git rebase fix-typo
        > git stash pop
        continue work
    • always have a paper notepad on your desk
      • note things you would like to come back to or investigate
      • it gives me great satisfaction to go through a list of "side quests" I have noted and strike through all of them, knowing I have dealt with each one before starting a new task
      • when investigating something I also note questions I would like to be able to answer after I'm done investigating
        • example: while working with Axios and cookies I found conflicting information about whether Axios supports cookies. After the investigation, I knew that Axios supports cookies by default in a browser but not in Node.js
  3. Divide problems into smaller problems
    • example - coding logic for a new feature in a CLI tool and designing the CLI arguments - these can be 2 sub-tasks
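The "dump data to a file and read it from the file" advice can be as small as the sketch below. `cached` is a hypothetical helper name, and `fetch` stands for whatever slow call (API, DB) you are trying to avoid repeating.

```fsharp
open System.IO

// Hypothetical helper: run `fetch` only when `cachePath` does not exist yet,
// otherwise return what was dumped to the file on a previous run.
let cached (cachePath: string) (fetch: unit -> string) =
    if File.Exists cachePath then
        File.ReadAllText cachePath
    else
        let data = fetch ()
        File.WriteAllText(cachePath, data)
        data
```

Delete the file when you want fresh data. For quick feedback while fiddling that is usually good enough.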

Big bang vs baby steps

The old me often ended up doing the big bang. Rewriting large chunks of code at once. Starting things from scratch. Working for hours or days with a codebase that can't even compile.

Downsides - for a long time the project doesn't even compile, I lose motivation, I feel like I'm walking in the dark, I don't see errors for a long time - requires a lot of context keeping in my mind since I've ripped the project apart - if I abandon work for a few days sometimes I forget everything and progress is lost

The new me prefers baby steps

Fiddle with the code knowing I'll git reset --hard. Try renaming some stuff - helps me understand the codebase better. Try out different things and abandon them. At this point, I usually get an idea/feeling of what needs to be done. Plan a few smaller refactorings. After them, I am usually closer to the solution and am able to code it without a big bang.

My recommendations

Terminal etc.

git

node

http

other

PowerShell quirk

tl;dr

In PowerShell, if you want to return an array instead of one element of the array at a time, do this:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong! will pipe/return 1 element at a time
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct! will pipe/return pairs
Beware! Result of both snippets will be displayed in the exact same way even though they have different types! See below:
> @(1,2,3,4)
1
2
3
4
> @((1,2),(3,4))
1
2
3
4

To check actual types:

> $x = @(1..2) | % { $a = "a" * $_; @($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
> $x = @(1..2) | % { $a = "a" * $_; ,@($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
# Alternatively
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType
# Get-Member only shows output for every distinct type

Longer read

Occasionally I have a fist fight with PS to return an Array instead of one element at a time. PS is a tough opponent. I think I get it now though.

The comma , in PS is a binary and unary operator. You can use it with a single or 2 arguments.

> ,1 # as a unary operator the comma creates an array with 1 member
1
> 1,2 # as a binary operator the comma creates an array with 2 members
1
2

Beware that a flat array (array[]) and an array of arrays (array[][]) are displayed the same way. Below, $y is an array of arrays, yet it is printed exactly like the flat array $x.

> $x = @(1,2,3,4) ; $x.GetType().Name ; $x[0].GetType().Name ; $x
Object[]
Int32
1
2
3
4
> $y = @((1,2),(3,4)) ; $y.GetType().Name ; $y[0].GetType().Name ; $y
Object[]
Object[]
1
2
3
4

If you're trying to return an array of pairs:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong
a
1
aa
2
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct!
a
1
aa
2
# Even though the result looks like a flat array, this time it's an array of arrays
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType # we get strings and ints

   TypeName: System.String

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

   TypeName: System.Int32

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType # we get arrays

   TypeName: System.Object[]

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

More on printing your arrays of pairs:

> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | write-output # write-output will "unwind" your array
a
1
aa
2
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | write-host
a 1
aa 2
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | % { write-output "$_" }
a 1
aa 2
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | write-output -NoEnumerate # returns an array of arrays but it's printed as if it's a flat array
a
1
aa
2

This https://stackoverflow.com/a/29985418/2377787 explains how @() works in PS.

> $a='A','B','C'
> $b=@($a;)
> $a
A
B
C
> $b
A
B
C
> [Object]::ReferenceEquals($a, $b)
False
Above, $a; is a statement whose value is a collection. Collections are enumerated and each item is passed to the pipeline, so @($a;) receives the 3 elements rather than the original array and creates a new array from them. In PS @($collection) creates a copy of $collection, while @(,$collection) creates an array with a single element, $collection.

It will be great, set it up, don't use it, remove it

2024-05-02

Today, I removed something I allowed to be created despite initially feeling it was redundant.

A year ago, a student found a tool to auto-generate documentation for our internal SDK from code annotations. They proposed embedding this documentation in our Continuous Integration pipeline. Although the idea sounded good on paper, I felt it wouldn’t be used by our team. However, the team was enthusiastic.

Everyone in our team has the SDK's git repo on their machine, and we rely on IntelliSense and the code for documentation. It seemed unlikely that we would change our habits since the new documentation wouldn’t be easier to use than just using F12 to view the source code.

Despite my doubts, I allowed this feature as the team lead. I had already rejected a few initiatives from that student and didn’t want to kill their motivation. I wanted to let them work on something they found interesting. There was a slight chance I was wrong and the documentation might be used.

Today, a year later, I noticed the docs website no longer works. It had been down for some time, and no one noticed because no one used it. I removed any trace of the published documentation.

This made me reflect: was it wrong to allow something to be created that never paid off?

In this case, the investment was small. The gain was that the student got to work on something interesting. So, I think it was right to let our team try it out and remove it once we were certain it wasn’t used.

Notes on keys, certs, certificates, HTTPS, SSL, SSH, TLS

key != cert (a key is different from a certificate)

Keys are used to encrypt connections, certs are used to verify that the key owner is who they say they are.

Certificate aka cert

A certificate proves that a public key belongs to a given entity. The cert includes:

  • public key
  • information about the key
  • CA's signature validating the cert

CA's signature basically says "I confirm that this public key belongs to this person/entity". The signature is made using CA's private key.
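The signing idea can be tried out with node's crypto module. This is only a sketch under loud assumptions: the "certificate body" is plain JSON instead of X.509/DER, and the CA key pair, the site key pair and the subject example.com are all made up for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical CA key pair (a real CA guards its private key very carefully).
const ca = crypto.generateKeyPairSync('ec', { namedCurve: 'prime256v1' });
// Key pair of the entity that wants a certificate.
const site = crypto.generateKeyPairSync('ec', { namedCurve: 'prime256v1' });

// Simplified "certificate body": the public key plus some information about it.
// Real certificates use X.509 encoded as DER, not JSON.
const certBody = Buffer.from(JSON.stringify({
  subject: 'example.com', // made-up subject
  publicKey: site.publicKey.export({ type: 'spki', format: 'pem' }),
}));

// The CA signs the body with its PRIVATE key...
const signature = crypto.sign('sha256', certBody, ca.privateKey);

// ...and anyone holding the CA's PUBLIC key can verify the signature.
const isValid = crypto.verify('sha256', certBody, ca.publicKey, signature);

// Changing a single bit of the body invalidates the signature.
const tampered = Buffer.from(certBody);
tampered[0] ^= 1;
const isTamperedValid = crypto.verify('sha256', tampered, ca.publicKey, signature);

console.log({ isValid, isTamperedValid });
```

Note that the site's private key never takes part in the signing. The CA only vouches for the public key.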

This is Wikipedia's certificate, which I've exported from my Chrome browser, in PEM format.

-----BEGIN CERTIFICATE-----
MIIISzCCB9GgAwIBAgIQB0GeOVg6THbPHqFDR/pfOjAKBggqhkjOPQQDAzBWMQsw
CQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMTAwLgYDVQQDEydEaWdp
Q2VydCBUTFMgSHlicmlkIEVDQyBTSEEzODQgMjAyMCBDQTEwHhcNMjMxMDE4MDAw
MDAwWhcNMjQxMDE2MjM1OTU5WjB5MQswCQYDVQQGEwJVUzETMBEGA1UECBMKQ2Fs
aWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZyYW5jaXNjbzEjMCEGA1UEChMaV2lraW1l
ZGlhIEZvdW5kYXRpb24sIEluYy4xGDAWBgNVBAMMDyoud2lraXBlZGlhLm9yZzBZ
MBMGByqGSM49AgEGCCqGSM49AwEHA0IABDVh9CEa/2rEO/oGR8YZbr5wOPHcFrG8
OBQS1BQrHAsxgVn1Z/bnKtE8Hvqup+0GXdZvXYlMa8iw4A+Dz/XTitqjggZcMIIG
WDAfBgNVHSMEGDAWgBQKvAgpF4ylOW16Ds4zxy6z7fvDejAdBgNVHQ4EFgQUyqwM
Z6LjhkM/u0PnQdmhhzp43TMwggLtBgNVHREEggLkMIIC4IIPKi53aWtpcGVkaWEu
b3Jngg13aWtpbWVkaWEub3Jngg1tZWRpYXdpa2kub3Jngg13aWtpYm9va3Mub3Jn
ggx3aWtpZGF0YS5vcmeCDHdpa2luZXdzLm9yZ4INd2lraXF1b3RlLm9yZ4IOd2lr
aXNvdXJjZS5vcmeCD3dpa2l2ZXJzaXR5Lm9yZ4IOd2lraXZveWFnZS5vcmeCDndp
a3Rpb25hcnkub3Jnghd3aWtpbWVkaWFmb3VuZGF0aW9uLm9yZ4IGdy53aWtpghJ3
bWZ1c2VyY29udGVudC5vcmeCESoubS53aWtpcGVkaWEub3Jngg8qLndpa2ltZWRp
YS5vcmeCESoubS53aWtpbWVkaWEub3JnghYqLnBsYW5ldC53aWtpbWVkaWEub3Jn
gg8qLm1lZGlhd2lraS5vcmeCESoubS5tZWRpYXdpa2kub3Jngg8qLndpa2lib29r
cy5vcmeCESoubS53aWtpYm9va3Mub3Jngg4qLndpa2lkYXRhLm9yZ4IQKi5tLndp
a2lkYXRhLm9yZ4IOKi53aWtpbmV3cy5vcmeCECoubS53aWtpbmV3cy5vcmeCDyou
d2lraXF1b3RlLm9yZ4IRKi5tLndpa2lxdW90ZS5vcmeCECoud2lraXNvdXJjZS5v
cmeCEioubS53aWtpc291cmNlLm9yZ4IRKi53aWtpdmVyc2l0eS5vcmeCEyoubS53
aWtpdmVyc2l0eS5vcmeCECoud2lraXZveWFnZS5vcmeCEioubS53aWtpdm95YWdl
Lm9yZ4IQKi53aWt0aW9uYXJ5Lm9yZ4ISKi5tLndpa3Rpb25hcnkub3JnghkqLndp
a2ltZWRpYWZvdW5kYXRpb24ub3JnghQqLndtZnVzZXJjb250ZW50Lm9yZ4INd2lr
aXBlZGlhLm9yZ4IRd2lraWZ1bmN0aW9ucy5vcmeCEyoud2lraWZ1bmN0aW9ucy5v
cmcwPgYDVR0gBDcwNTAzBgZngQwBAgIwKTAnBggrBgEFBQcCARYbaHR0cDovL3d3
dy5kaWdpY2VydC5jb20vQ1BTMA4GA1UdDwEB/wQEAwIDiDAdBgNVHSUEFjAUBggr
BgEFBQcDAQYIKwYBBQUHAwIwgZsGA1UdHwSBkzCBkDBGoESgQoZAaHR0cDovL2Ny
bDMuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0VExTSHlicmlkRUNDU0hBMzg0MjAyMENB
MS0xLmNybDBGoESgQoZAaHR0cDovL2NybDQuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0
VExTSHlicmlkRUNDU0hBMzg0MjAyMENBMS0xLmNybDCBhQYIKwYBBQUHAQEEeTB3
MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wTwYIKwYBBQUH
MAKGQ2h0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRMU0h5YnJp
ZEVDQ1NIQTM4NDIwMjBDQTEtMS5jcnQwDAYDVR0TAQH/BAIwADCCAYAGCisGAQQB
1nkCBAIEggFwBIIBbAFqAHcA7s3QZNXbGs7FXLedtM0TojKHRny87N7DUUhZRnEf
tZsAAAGLQ1J6cgAABAMASDBGAiEA5reeeuLSzGPvJQ5hT3Bd8aOVxmIltXMTLhY6
19qDGWUCIQDO0LMbF3s42tyxgFIOt7rVOpsHe9Sy0wFQQj8BWO0LIQB2AEiw42va
pkc0D+VqAvqdMOscUgHLVt0sgdm7v6s52IRzAAABi0NSegEAAAQDAEcwRQIgYJdu
BrioIun6FTeQhDxqK2eyZehguOkxScS3nwsGSakCIQC1FyuCpm+QQBRJFSTAnStR
iP+hgGIhgzyZ837usahB0QB3ANq2v2s/tbYin5vCu1xr6HCRcWy7UYSFNL2kPTBI
1/urAAABi0NSeg8AAAQDAEgwRgIhAOm1GvY8M4V+tUyjV9/PCj8rcWHUOvfY0a/o
nsKg/bitAiEA1Vm1pP8CDp7hGcQzBBTscpCVebzWCe8DK231mtv97QUwCgYIKoZI
zj0EAwMDaAAwZQIwKuOOLjmwGgtjG6SASF4W2e8KtQZANRsYXMXJDGwBCi9fM7Qy
S9dvlFLwrcDg1gxlAjEA5XwJikbpk/qyQerzeUspuZKhqh1KPuj2uBdp8vicuBxu
TJUd1W+d3LmikOUgGzil
-----END CERTIFICATE-----

We can double click the cert file to view it (on Windows) or use many other different tools to view its content.

https://stackoverflow.com/questions/9758238/how-to-view-the-contents-of-a-pem-certificate

> $cert = New-Object Security.Cryptography.X509Certificates.X509Certificate2([string]"C:\Users\inwen\Downloads\_.wikipedia.org.crt")
> $cert | select *

EnhancedKeyUsageList : {Server Authentication (1.3.6.1.5.5.7.3.1), Client Authentication (1.3.6.1.5.5.7.3.2)}
DnsNameList          : {*.wikipedia.org, wikimedia.org, mediawiki.org, wikibooks.org...}
SendAsTrustedIssuer  : False
Archived             : False
Extensions           : {System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid...}
FriendlyName         :
IssuerName           : System.Security.Cryptography.X509Certificates.X500DistinguishedName
NotAfter             : 17/10/2024 01:59:59
NotBefore            : 18/10/2023 02:00:00
HasPrivateKey        : False
PrivateKey           :
PublicKey            : System.Security.Cryptography.X509Certificates.PublicKey
RawData              : {48, 130, 8, 75...}
SerialNumber         : 07419E39583A4C76CF1EA14347FA5F3A
SubjectName          : System.Security.Cryptography.X509Certificates.X500DistinguishedName
SignatureAlgorithm   : System.Security.Cryptography.Oid
Thumbprint           : 483F0C71F34AE0EA30D99BD60463DCDAA8F49DFB
Version              : 3
Handle               : 2140299849504
Issuer               : CN=DigiCert TLS Hybrid ECC SHA384 2020 CA1, O=DigiCert Inc, C=US
Subject              : CN=*.wikipedia.org, O="Wikimedia Foundation, Inc.", L=San Francisco, S=California, C=US
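Node can show roughly the same fields as the PowerShell snippet above. A minimal sketch, assuming Node 15.6 or newer (where crypto.X509Certificate exists). The function name describeCertificate and the file path in the usage comment are my own, not part of any API.

```javascript
const fs = require('node:fs');
const { X509Certificate } = require('node:crypto');

// Reads a PEM or DER encoded certificate file and returns a few of its fields.
function describeCertificate(path) {
  const cert = new X509Certificate(fs.readFileSync(path));
  return {
    subject: cert.subject,
    issuer: cert.issuer,
    validFrom: cert.validFrom,
    validTo: cert.validTo,
    serialNumber: cert.serialNumber,
    subjectAltName: cert.subjectAltName,
    fingerprint: cert.fingerprint, // SHA-1 fingerprint, what Windows calls Thumbprint
  };
}

// Usage (hypothetical path):
// console.log(describeCertificate('./_.wikipedia.org.crt'));
```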

SSL & TLS & SSH & SFTP

SSH (Secure SHell protocol) - a protocol that lets you execute shell commands over a secure connection.

SFTP is an extension of SSH. SFTP != FTP over SSH. To connect to an SFTP server with key-based authentication you need a private SSH key. The public SSH key (your private key's counterpart) is stored on the server.

TLS & SSL - think of SSL as the older/first protocol for secure communication. SSL was superseded by TLS. TLS is THE protocol used by HTTPS for secure connections.

Clients can be anonymous in TLS - usually the case on the web - the server provides a cert to your browser but you don't need a cert of your own. TLS can be mutual - if the client has a cert the server can validate it.

PuTTy

PuTTY is free + open source software that can do SSH. PuTTY has its own format of key files -> .ppk

ppk - PuTTY private key (a ppk can be converted to pem with some software). A PPK file stores a private key and the corresponding public key. Both are contained in the same file. https://tartarus.org/~simon/putty-snapshots/htmldoc/AppendixC.html

PEM

Privacy-Enhanced Mail (PEM) is THE file format for exchanging keys and certificates.

  • .cer & .crt - PEM file with a certificate
  • .key - PEM file with a private or public key

The file extension doesn't really matter. Just open the file and look at the headers to be sure what it is.

To view a pem certificate on Windows - rename it to .crt and double click.

You can open a .pem file as plain text and see its content:

// PEM parsers generally ignore text outside the BEGIN/END blocks, so you can put comments here

-----BEGIN RSA PRIVATE KEY-----
izfrNTmQLnfsLzi2Wb9xPz2Qj9fQYGgeug3N2MkDuVHwpPcgkhHkJgCQuuvT+qZI
MbS2U6wTS24SZk5RunJIUkitRKeWWMS28SLGfkDs1bBYlSPa5smAd3/q1OePi4ae
dU6YgWuDxzBAKEKVSUu6pA2HOdyQ9N4F1dI+F8w9J990zE93EgyNqZFBBa2L70h4
M7DrB0gJBWMdUMoxGnun5glLiCMo2JrHZ9RkMiallS1sHMhELx2UAlP8I1+0Mav8
iMlHGyUW8EJy0paVf09MPpceEcVwDBeX0+G4UQlO551GTFtOSRjcD8U+GkCzka9W
/SFQrSGe3Gh3SDaOw/4JEMAjWPDLiCglwh0rLIO4VwU6AxzTCuCw3d1ZxQsU6VFQ
PqHA8haOUATZIrp3886PBThVqALBk9p1Nqn51bXLh13Zy9DZIVx4Z5Ioz/EGuzgR
d68VW5wybLjYE2r6Q9nHpitSZ4ZderwjIZRes67HdxYFw8unm4Wo6kuGnb5jSSag
vwBxKzAf3Omn+J6IthTJKuDd13rKZGMcRpQQ6VstwihYt1TahQ/qfJUWPjPcU5ML
9LkgVwA8Ndi1wp1/sEPe+UlL16L6vO9jUHcueWN7+zSUOE/cDSJyMd9x/ZL8QASA
ETd5dujVIqlINL2vJKr1o4T+i0RsnpfFiqFmBKlFqww/SKzJeChdyEtpa/dJMrt2
8S86b6zEmkser+SDYgGketS2DZ4hB+vh2ujSXmS8Gkwrn+BfHMzkbtio8lWbGw0l
eM1tfdFZ6wMTLkxRhBkBK4JiMiUMvpERyPib6a2L6iXTfH+3RUDS6A==
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIICMzCCAZygAwIBAgIJALiPnVsvq8dsMA0GCSqGSIb3DQEBBQUAMFMxCzAJBgNV
BAYTAlVTMQwwCgYDVQQIEwNmb28xDDAKBgNVBAcTA2ZvbzEMMAoGA1UEChMDZm9v
MQwwCgYDVQQLEwNmb28xDDAKBgNVBAMTA2ZvbzAeFw0xMzAzMTkxNTQwMTlaFw0x
ODAzMTgxNTQwMTlaMFMxCzAJBgNVBAYTAlVTMQwwCgYDVQQIEwNmb28xDDAKBgNV
BAcTA2ZvbzEMMAoGA1UEChMDZm9vMQwwCgYDVQQLEwNmb28xDDAKBgNVBAMTA2Zv
bzCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAzdGfxi9CNbMf1UUcvDQh7MYB
OveIHyc0E0KIbhjK5FkCBU4CiZrbfHagaW7ZEcN0tt3EvpbOMxxc/ZQU2WN/s/wP
xph0pSfsfFsTKM4RhTWD2v4fgk+xZiKd1p0+L4hTtpwnEw0uXRVd0ki6muwV5y/P
+5FHUeldq+pgTcgzuK8CAwEAAaMPMA0wCwYDVR0PBAQDAgLkMA0GCSqGSIb3DQEB
BQUAA4GBAJiDAAtY0mQQeuxWdzLRzXmjvdSuL9GoyT3BF/jSnpxz5/58dba8pWen
v3pj4P3w5DoOso0rzkZy2jEsEitlVM2mLSbQpMM+MUVQCQoiG6W9xuCFuxSrwPIS
pAqEAuV4DNoxQKKWmhVv+J0ptMWD25Pnpxeq5sXzghfJnslJlQND
-----END CERTIFICATE-----

The content between the header and footer (-----BEGIN CERTIFICATE----- + -----END CERTIFICATE-----) is base64 encoded. Once decoded, it is DER binary data.

DER

Distinguished Encoding Rules - a way of encoding data structures. A certificate is a data structure containing various entries like validity dates, issuer, etc. For certificates to work you need to store this information and transfer it. DER encodes this information in a binary format. The binary data is then base64 encoded and put into a PEM file.
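The PEM <-> DER relationship is easy to check in node. A minimal sketch: the helper names derToPem and pemToDer are made up, and the 5 bytes used as input only stand in for a real certificate (they happen to be a valid tiny DER structure, a SEQUENCE containing the INTEGER 5).

```javascript
// Wraps DER bytes in a PEM envelope: base64 text, 64 characters per line.
function derToPem(der, label) {
  const lines = der.toString('base64').match(/.{1,64}/g) || [];
  return [`-----BEGIN ${label}-----`, ...lines, `-----END ${label}-----`].join('\n');
}

// Extracts the label and the DER bytes of the first PEM block in the given text.
function pemToDer(pem) {
  const match = pem.match(/-----BEGIN ([A-Z0-9 ]+)-----([\s\S]*?)-----END \1-----/);
  if (!match) throw new Error('no PEM block found');
  return {
    label: match[1],
    der: Buffer.from(match[2].replace(/\s+/g, ''), 'base64'),
  };
}

// Made-up bytes standing in for a real certificate (0x30 is the DER tag for SEQUENCE).
const fakeDer = Buffer.from([0x30, 0x03, 0x02, 0x01, 0x05]);
const pem = derToPem(fakeDer, 'CERTIFICATE');

// Text outside the BEGIN/END block is skipped by pemToDer.
const decoded = pemToDer('a comment outside the block\n' + pem);

console.log(pem);
console.log(decoded.label, decoded.der.equals(fakeDer));
```

Real certificates start with the same 0x30 byte, which is why their base64 text so often begins with "MII".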

X.509

X.509 is the standard defining public key certificates for TLS/SSL (HTTPS)

PFX/P12/PKCS12

PFX seems to be Microsoft's complicated file format for storing cryptographic data.

P12/PKCS12 is the successor to PFX. Sometimes the terms PFX/P12/PKCS12 are used interchangeably.


base64 offline decoder: https://www.glezen.org/Base64Decoder.html

Nice description of certs vs key: https://superuser.com/questions/620121/what-is-the-difference-between-a-certificate-and-a-key-with-respect-to-ssl

Generate yourself a certificate: https://getacert.com/index.html

Important info on rejectUnauthorized: false and certificates in axios/node: https://stackoverflow.com/questions/51363855/how-to-configure-axios-to-use-ssl-certificate

Convention (proposal) - specify the format in the secret name and store the value as plain text, not base64 encoded.

Converting

# PFX/pkcs12 to PEM
openssl pkcs12 -in cert.pfx -out cert.pem -nodes

# PFX/pkcs12 to PEM, password given on the command line (-nodes leaves the private key in the output unencrypted)
openssl pkcs12 -in cert.p12 -out cert_without_pwd.pem -nodes -password pass:1234

# PEM to PFX/pkcs12 (both have passwords)
openssl pkcs12 -export -out cert.pfx -in cert.pem -inkey cert.pem -passin pass:1234 -passout pass:1234

# PEM to PFX/pkcs12 (when key and cert are in separate .pem files)
openssl pkcs12 -export -out bob_pfx.pfx -inkey bob_key.pem -in bob_cert.cert

# if openssl hangs try running it using winpty
winpty openssl pkcs12 -in cert.pfx -out cert.pem -nodes

Sources:

https://stackoverflow.com/questions/15413646/converting-pfx-to-pem-using-openssl

https://stackoverflow.com/questions/808669/convert-a-cert-pem-certificate-to-a-pfx-certificate

https://stackoverflow.com/questions/9450120/openssl-hangs-and-does-not-exit

Lazy websites

A website's certificate is usually signed by an intermediate CA, whose certificate is in turn signed by a trusted root CA. The idea is that the server you connect to sends you its certificate together with all the intermediate certificates. Your app/machine should have the root CA certificate stored, so it can validate the chain it received from the server (each certificate is checked against its issuer, ending at the locally stored root CA certificate).

Some servers are misconfigured and do not send the intermediate certificates. You do not notice because browsers fill in the gaps for a better browsing experience. However, when you try to scrape the same website with e.g. node, your connection will be rejected.

don'ts (for node)

Several answers on SO suggest:

  • NODE_TLS_REJECT_UNAUTHORIZED=0 or
  • const httpsAgent = new https.Agent({ rejectUnauthorized: false });

Both are terrible ideas - they make your app accept unauthorized connections. They are the equivalent of this conversation:

"I can't verify this certificate, we can not be sure who we are connecting to" - says Node with care in its voice

"Doesn't matter, YOLO, carry on" - you reply shrugging your shoulders

Read more here

dos (for node)

Use NODE_EXTRA_CA_CERTS. Alternatively use a library to programmatically give node the missing certificate link
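For the programmatic route, here is a minimal sketch using only node's own modules. The function name agentWithExtraCa, the file name and the host in the usage comment are hypothetical. The one thing to remember is that passing ca replaces node's default CA list, so the bundled roots (tls.rootCertificates) are added back explicitly.

```javascript
const https = require('node:https');
const tls = require('node:tls');

// Builds an https agent that trusts node's bundled root CAs plus one extra
// certificate, for example the intermediate a server forgot to send.
// `ca` REPLACES the default list, hence the spread of tls.rootCertificates.
function agentWithExtraCa(extraCaPem) {
  return new https.Agent({ ca: [...tls.rootCertificates, extraCaPem] });
}

// Usage (hypothetical file name and host):
// const fs = require('node:fs');
// const agent = agentWithExtraCa(fs.readFileSync('./intermediate.pem', 'utf8'));
// https.get('https://misconfigured.example', { agent }, (res) => console.log(res.statusCode));
```

Unlike rejectUnauthorized: false, certificate validation stays switched on. Only the list of trusted certificates gets longer.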

Good read - https://stackoverflow.com/questions/31673587/error-unable-to-verify-the-first-certificate-in-nodejs

root CA stores

Node

It seems everyone has their own root CA store these days. Node has a hardcoded list of root CAs, see:

  • https://github.com/nodejs/node/blob/main/src/node_root_certs.h
  • https://github.com/nodejs/node/issues/4175

Windows

You can view Windows certificates with PowerShell:

Get-ChildItem -Recurse Cert:

Chrome

https://chromium.googlesource.com/chromium/src/+/main/net/data/ssl/chrome_root_store/root_store.md

If you would like to become chrome's trusted CA - https://www.chromium.org/Home/chromium-security/root-ca-policy/

https://blog.chromium.org/2022/09/announcing-launch-of-chrome-root-program.html

node packages updating

tl;dr

  1. > npm install depcheck -g - install depcheck globally
  2. > depcheck - check for redundant packages
  3. > npm un this-redundant-package - uninstall redundant packages (repeat for all redundant packages)
  4. Create a pull-request remove-redundant-packages

  1. > npm i - make order in node_modules
  2. > npm audit - see vulnerability issues
  3. > npm audit fix - fix vulnerability issues that don't require attention
  4. Create a pull-request fix-vulnerability-issues

  1. > npm i npm-check-updates -g - install npm-check-updates globally
  2. > npm-check-updates - see how outdated packages are
  3. > npm outdated - see how outdated packages are
  4. > npm update --save - update packages respecting your semver constraints from package.json
  5. If you have packages that use major version 0.*.* you'll need to manually update these now
    • > npm install that-one-package@latest
  6. Create a pull-request update-packages-minor

If you're brave and can test/run your project easily:

  1. ncu -u - updates package.json to all latest versions as shown by npm-check-updates
    • this might introduce breaking changes
  2. npm i - update package-lock.json
  3. Test your project.
  4. Create a pull-request update-packages-major

If you're not brave or can't just YOLO and update all major versions:

  1. npm-check-updates - check again what is left to update
  2. npm i that-package@latest - update major version of that-package
  3. Test your project.
    • JavaScript is dynamically typed so you might have just updated a package that breaks your project but you'll not know until you run your code
  4. Repeat for all packages.
  5. Create a pull-request update-packages-major

longer read

Need to update dependencies in a node js project? Here are my notes on this.

> npm i (npm install)

> npm i

added 60 packages, removed 124 packages, changed 191 packages, and audited 522 packages in 13s

96 packages are looking for funding
  run `npm fund` for details

10 vulnerabilities (2 low, 7 moderate, 1 high)

To address issues that do not require attention, run:
  npm audit fix

To address all issues possible (including breaking changes), run:
  npm audit fix --force

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.

npm i:

  • installs missing packages in node_modules
  • removes redundant packages in node_modules
  • installs correct versions of mismatched packages (if package-lock.json wants a different version than found in node_modules)
  • shows what is going on with packages in your project

> npm audit - shows a report on vulnerability issues in your dependencies

> npm audit fix - updates packages to address vulnerability issues (updates that do not require attention)

> npm outdated - shows a table with your packages and versions

$ npm outdated
Package      Current   Wanted   Latest  Location                  Depended by
glob          5.0.15   5.0.15    6.0.1  node_modules/glob         dependent-package-name
nothingness    0.0.3      git      git  node_modules/nothingness  dependent-package-name
npm            3.5.1    3.5.2    3.5.1  node_modules/npm          dependent-package-name
local-dev      0.0.3   linked   linked  local-dev                 dependent-package-name
once           1.3.2    1.3.3    1.3.3  node_modules/once         dependent-package-name

  • Current - what is in node_modules
  • Wanted - most recent version that respects the version constraint from package.json
  • Latest - latest version from npm registry

To update to latest minor+patch versions of your dependencies (Wanted) - npm outdated shows all you need to know but I prefer the output of npm-check-updates

> npm i npm-check-updates -g (-g -> global mode - package will be available on your whole machine)

> npm-check-updates - shows where an update will be a major/minor/patch update (I like the colors)

Checking C:\git\blog\package.json
[====================] 39/39 100%

 @azure/storage-blob         ^12.5.0  →      ^12.17.0
 adm-zip                     ^0.4.16  →       ^0.5.12
 axios                       ^0.27.2  →        ^1.6.8
 basic-ftp                    ^5.0.1  →        ^5.0.5
 cheerio                 ^1.0.0-rc.6  →  ^1.0.0-rc.12
 eslint                      ^8.12.0  →        ^9.2.0
 eslint-config-prettier       ^8.5.0  →        ^9.1.0
 eslint-plugin-import        ^2.25.4  →       ^2.29.1
 fast-xml-parser              ^4.2.4  →        ^4.3.6
 humanize-duration           ^3.27.3  →       ^3.32.0
 iconv                        ^3.0.0  →        ^3.0.1
 jsonwebtoken                 ^9.0.0  →        ^9.0.2
 luxon                        ^3.4.3  →        ^3.4.4

Let us update something

> npm update - perform updates respecting your semver constraints and update package-lock.json

> npm update --save - same as above but also update package.json, use this one always

The behavior for packages with major version 0.*.* is different than for versions >=1.0.0 (see npm help update)

npm update will most likely bump all minor and patch versions for you.

You can run npm update --save often.

What do the symbols in package.json mean?

https://stackoverflow.com/questions/22343224/whats-the-difference-between-tilde-and-caret-in-package-json/25861938#25861938
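To see why npm update treats 0.*.* packages differently, here is a simplified sketch of how a caret range is evaluated. It is not the semver package npm actually uses: it ignores prerelease tags (like -rc.6) and every range syntax other than ^x.y.z, and all function names are made up.

```javascript
// Parses "1.2.3" into [1, 2, 3]. Prerelease tags are not handled.
function parse(version) {
  return version.split('.').map(Number);
}

// Compares two parsed versions: negative, zero or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// True when `version` satisfies the caret range `range`, e.g. "^5.0.1".
function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ''));
  const candidate = parse(version);
  return compare(candidate, base) >= 0 && compare(candidate, caretUpperBound(base)) < 0;
}

console.log(satisfiesCaret('5.0.5', '^5.0.1'));   // true
console.log(satisfiesCaret('0.5.12', '^0.4.16')); // false
```

So ^5.0.1 lets basic-ftp move to 5.0.5, but ^0.4.16 keeps adm-zip below 0.5.0, which is why such packages need a manual npm install that-one-package@latest.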

npm update --save vs npm audit fix

npm audit fix will only update packages to fix vulnerability issues

npm update --save will update all packages it can (respecting semver constraints)

Do I have unused dependencies?

> npm install depcheck -g

> depcheck - shows unused dependencies. depcheck scans for require/import statements in your code, so if you load a package in some other way depcheck will still consider it unused (e.g. when you import packages using importLazy).

npm-check

> npm i npm-check -g

> npm-check - a different tool to help with dependencies (I didn't use it)

honorable mentions

> npm ls - list installed packages (from node_modules)

> npm ls axios - show all versions of axios and why we have them

npm ls will not show you the origin of not-installed optional dependencies.

Consider this - you develop on a Windows machine and deploy your solution to a Linux box. On Windows (see below) you might think node-gyp-build is not used in your solution.

> npm ls node-gyp-build
test-npm@1.0.0 C:\git\test-npm
`-- (empty)

But on a linux box it will be used:

> npm ls node-gyp-build
npm-test-proj@1.0.0 /git/npm-test-proj
└─┬ kafka-lz4-lite@1.0.5
  └─┬ piscina@3.2.0
    └─┬ nice-napi@1.0.2
      └── node-gyp-build@4.8.1

axios, cookies & more

axios

axios - promise-based HTTP client for node.js

  • when used in node.js axios uses http module (https://nodejs.org/api/http.html)
  • in node axios does not support cookies by itself (https://github.com/axios/axios/issues/5742)
    • there are npm packages that add cookies support to axios
  • when used in browsers it uses XMLHttpRequest
  • when used in browsers cookies work by default

Why would you use axios over plain http module from node?

Axios makes http requests much easier. Try using plain http and you'll be convinced.


Are there other packages like axios?

Yes - for example node-fetch https://github.com/node-fetch/node-fetch


When making a request axios creates a default http and https agent - https://axios-http.com/docs/req_config (axios probably uses global agents). You can specify custom agents for a specific request or set custom agents as default agents to use with an axios instance.

const a = require('axios');
const https = require('node:https');

(async () => {
    // configure your agent as needed
    const myCustomAgent = new https.Agent({ keepAlive: true });

    // use your custom agent for a specific request
    // (httpsAgent is used for https:// URLs, httpAgent for http:// URLs)
    const x = await a.get('https://example.com/', { httpsAgent: myCustomAgent });
    console.log(x);

    // set your agent as default for all requests
    a.defaults.httpsAgent = myCustomAgent;
})();

What are http/s agents responsible for?

http/s agents handle creating/closing sockets, TCP, etc. They talk to the OS, manage connection to hosts.


cookies

Without extra packages you need to write the code yourself: read the response headers, look for Set-Cookie headers, store the cookies somewhere and add a Cookie header to subsequent requests.
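A rough sketch of what that hand-written code could look like. It is deliberately naive: attributes such as Domain, Path, Expires, Secure and HttpOnly are ignored, and the function names and sample cookies are made up. This is the part the packages below do properly.

```javascript
// Minimal in-memory cookie store: name -> value.
const cookies = new Map();

// Stores cookies from a response's Set-Cookie header values
// (node exposes them as an array of strings in headers['set-cookie']).
function storeCookies(setCookieHeaders = []) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0];   // "name=value" always comes first
    const eq = pair.indexOf('=');
    if (eq < 1) continue;                // skip malformed entries
    cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
}

// Builds the value of the Cookie header for the next request.
function cookieHeader() {
  return [...cookies].map(([name, value]) => `${name}=${value}`).join('; ');
}

storeCookies(['sessionId=abc123; Path=/; HttpOnly', 'theme=dark']);
console.log(cookieHeader()); // sessionId=abc123; theme=dark
```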

https://www.npmjs.com/package/http-cookie-agent

Manages cookies for node.js HTTP clients (e.g. Node.js global fetch, undici, axios, node-fetch). http-cookie-agent implements a http/s agent that inspects request headers and does cookie related magic for you. It uses the class CookieJar from package tough-cookie to parse&store cookies.

import axios from 'axios';
import { CookieJar } from 'tough-cookie';
import { HttpCookieAgent, HttpsCookieAgent } from 'http-cookie-agent/http';

const jar = new CookieJar();

const a = axios.create({
  httpAgent: new HttpCookieAgent({ cookies: { jar } }),
  httpsAgent: new HttpsCookieAgent({ cookies: { jar } }),
});
// now we have an axios instance supporting cookies
await a.get('https://example.com');

axios-cookiejar-support

https://www.npmjs.com/package/axios-cookiejar-support

Depends on http-cookie-agent and tough-cookie. Does the same as http-cookie-agent but you don't have to create http/s agents yourself. This is a small package that just intercepts axios requests and makes sure custom http/s agents are used source.

Saves you a bit of typing but you can't use your own custom agents. If you need to configure your http/s agents (ex. with a certificate) - use http-cookie-agent (see github issue and github issue)

import axios from 'axios';
import { wrapper } from 'axios-cookiejar-support';
import { CookieJar } from 'tough-cookie';

const jar = new CookieJar();
const client = wrapper(axios.create({ jar }));

await client.get('https://example.com');

https://www.npmjs.com/package/tough-cookie

npm package - cookie parsing/storage/retrieval (tough-cookie itself does nothing with http request).

A bit about cookies

https://datatracker.ietf.org/doc/html/rfc6265 - RFC describing cookies.

https://datatracker.ietf.org/doc/html/rfc6265#page-28 - concise paragraph on Third-party cookies.

A server responds with a Set-Cookie header. The client can then store the requested cookie and send it back with subsequent requests. Cookies have a specific format described in this document.

Random stuff

https://npmtrends.com/cookie-vs-cookiejar-vs-cookies-vs-tough-cookie

interesting - cookie for servers has been more popular than tough-cookie for clients since ~2023.

Is this due to more server-side apps being written in node?

Packages we don't use

  • cookie - npm package - cookies for servers
  • cookies - npm package - cookies for servers (different than cookie)
  • cookiejar - npm package - a different cookie jar for clients

fetch & fetch & node-fetch

fetch - standard created by WHATWG meant to replace XMLHttpRequest - https://fetch.spec.whatwg.org/

fetch - an old npm package to fetch web content - don't use it

node-fetch - community implementation of the fetch standard as an npm package - go ahead and use it

fetch - node's native implementation of the fetch standard - https://nodejs.org/dist/latest-v21.x/docs/api/globals.html#fetch

Since the fetch standard is shared by browsers and node, Chrome has a neat feature to export requests as fetch calls.

chat-gpt crap

When researching I came across some chat-gpt generated content. You read it thinking it will be something but it's trash.

https://www.dhiwise.com/post/managing-secure-cookies-via-axios-interceptors -> this article from 2024 tells you to implement cookies yourself and doesn't even mention the words "package" or "module"

https://medium.com/@stheodorejohn/managing-cookies-with-axios-simplifying-cookie-based-authentication-911e53c23c8a -> doesn't mention that cookies don't work in axios run in node without extra packages (at least this one mentions that chat-gpt helped, though I bet it's fully written by chat-gpt)


inco note - our http client is misleading, it uses same agent for http and https, it should maybe be called customAgent

axios and fiddler

Using a request interceptor (proxy) like fiddler helps during development and debugging.

To make fiddler intercept axios requests we have to tell axios that there is a proxy through which all requests should go. The proxy forwards those requests to the actual destination.

http_proxy=... // set proxy for http requests
https_proxy=... // set proxy for https requests
no_proxy=domain1.com,domain2.com // comma separated list of domains that should not be proxied

The proxy for both http and https can be the same url.
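If you prefer to set the proxy in code, axios also accepts a proxy request option of the shape { protocol, host, port }. Below is a minimal sketch that turns a proxy URL into that shape and mimics a very simplified no_proxy check. The helper names are made up, and 127.0.0.1:8888 is only my assumption of the address fiddler listens on. axios itself is not imported here.

```javascript
// Converts a proxy URL (as found in http_proxy/https_proxy) into an object
// shaped like the axios `proxy` request option.
function proxyFromUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    protocol: url.protocol.replace(':', ''),
    host: url.hostname,
    port: Number(url.port),
  };
}

// True when `hostname` is listed in a comma separated no_proxy value.
// Simplified: exact matches and sub-domains only, no wildcards or ports.
function isExcluded(hostname, noProxy = '') {
  return noProxy
    .split(',')
    .map((entry) => entry.trim())
    .filter(Boolean)
    .some((entry) => hostname === entry || hostname.endsWith(`.${entry}`));
}

// Assumed fiddler address, adjust to your setup.
const proxy = proxyFromUrl('http://127.0.0.1:8888');
console.log(proxy);
console.log(isExcluded('api.domain1.com', 'domain1.com,domain2.com'));

// Usage with axios (not imported in this sketch):
// await axios.get('https://example.com', { proxy });
```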

Read more - https://axios-http.com/docs/req_config

When using fiddler on Windows I suggest going to Network & internet > Proxy and disabling proxies there (fiddler by default sets this). This way fiddler will only receive requests from the process where we set http(s)_proxy env vars.

fiddler and client certificates

I was not able to make fiddler work with client certificates. It should be done like this - https://docs.telerik.com/fiddler/configure-fiddler/tasks/respondwithclientcert but I couldn't get it to work

honorable mentions

I would like to try out - https://www.npmjs.com/package/proxy-agent at some point

I don't fully understand withCredentials

axios & cookies demo

> npm i
> node server.mjs
open browser and go to http://127.0.0.1:3000
cookies are supported
> node test.js (from another console)
cookies are not supported

axios, certificates, etc

To use axios with a client certificate you need to configure the https agent with the key and cert. The key and cert need to be in PEM format. They can both be in the same PEM file or in separate PEM files. I did not try it, but you should be able to merge and split your PEM files.
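A minimal sketch of that setup, assuming the key and the cert live in one PEM bundle. The helper names splitPem and clientCertAgent and the file name in the usage comment are hypothetical. key and cert are regular options of node's https.Agent (they are passed on to tls.createSecureContext, see the link below). Encrypted private keys and certificate chains are not handled.

```javascript
const https = require('node:https');

// Splits a combined PEM bundle into its private key and certificate blocks.
function splitPem(bundle) {
  const block = (label) => {
    const pattern = new RegExp(`-----BEGIN ${label}-----[\\s\\S]*?-----END ${label}-----`);
    const match = bundle.match(pattern);
    return match ? match[0] : undefined;
  };
  return {
    key: block('(?:RSA |EC )?PRIVATE KEY'),
    cert: block('CERTIFICATE'),
  };
}

// Creates an https agent that presents the client certificate during the TLS handshake.
function clientCertAgent(bundle) {
  const { key, cert } = splitPem(bundle);
  if (!key || !cert) {
    throw new Error('bundle must contain a private key and a certificate');
  }
  return new https.Agent({ key, cert });
}

// Usage (hypothetical file name, axios is not imported in this sketch):
// const fs = require('node:fs');
// const httpsAgent = clientCertAgent(fs.readFileSync('./client.pem', 'utf8'));
// await axios.get('https://example.com', { httpsAgent });
```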

https://nodejs.org/api/tls.html#tlscreatesecurecontextoptions

to try out - https://www.npmjs.com/package/proxy-agent