Should I use System.Text.Json (STJ) or Newtonsoft.Json (previously Json.NET)?

Use STJ. Newtonsoft.Json no longer gets new features, and its author now works at Microsoft on non-JSON projects.

marshal - assemble and arrange (a group of people, especially troops) in order.

"the general marshalled his troops"

marshalling (UK) / marshaling (US) (in computer science) - getting parameters from here to there

serialization - transforming something (data) to a format usable for storage or transmission over the network

JSON - JavaScript Object Notation - a data interchange format.

Why this post

While analysing some logs I used FSharp.Data's JsonProvider. Only a few properties were relevant, but JsonProvider stores the whole JSON in memory. With 10GB of logs to analyse I quickly ran out of memory.

Let's do some testing!

open System
open System.IO
open System.Text.Json

fsi.AddPrinter<DateTimeOffset>(fun dt -> dt.ToString("O"))

Environment.CurrentDirectory <- __SOURCE_DIRECTORY__ // ensures the script runs from the directory it's located in
// -------------------------------------------------------------------------

// sample log entry for testing
type LogEntry = {
    Timestamp : DateTimeOffset
    Level     : string
    Message   : string
}

// only the properties we're interested in
type LogEntryRecord = {
    Timestamp : DateTimeOffset
    Level     : string
}

let random = Random()
let levels = [ "INFO"; "WARN"; "ERROR"; "DEBUG" ]

let generateLogEntry () =
    {
        Timestamp = DateTimeOffset.Now.AddSeconds(-random.Next(0, 10000))
        Level     = levels.[random.Next(levels.Length)]
        Message   = String.replicate(random.Next(10, 100)) "x" // random string to simulate redundant content
    }

List.init 7_000_000 (fun _ -> generateLogEntry()) // 7M entries is around 1GB of data
|> List.map (fun entry -> JsonSerializer.Serialize(entry))
|> fun lines -> File.WriteAllLines("./logs.json", lines)

let lines = File.ReadAllLines "./logs.json"

let runWithMemoryCheck lines singleLineParser =
    let before = GC.GetTotalMemory(true)
    let x = lines |> Array.map singleLineParser
    let after = GC.GetTotalMemory(true)
    let m = ((after - before) |> float) / 1024. / 1024. / 1024. // GB
    x, m

// -------------------------------------------------------------------------

open System.Text.Json.Nodes

#r "nuget: FSharp.Data"
open FSharp.Data

#r "nuget: FSharp.Json"
open FSharp.Json

type LogEntryJsonProvider = JsonProvider<"""
{
    "Timestamp"        : "2024-12-23T20:51:18.2020753+01:00",
    "Level"            : "ERROR",
    "Message"          : "File not found"
}""">

let fSharpDataJsonProvider = LogEntryJsonProvider.Parse

let fSharpDataJsonValue (x:string) =
    let line = x |> FSharp.Data.JsonValue.Parse
    let t = line.GetProperty("Timestamp").AsDateTimeOffset()
    let l = line.GetProperty("Level").AsString()
    { Timestamp = t; Level = l }

let stjJsonSerializer (x:string) = JsonSerializer.Deserialize<LogEntryRecord>(x)

let stjJsonNode (line:string) =
    let line = line |> JsonNode.Parse
    let t = line.["Timestamp"].GetValue<DateTimeOffset>()
    let l = line.["Level"].GetValue<string>()
    { Timestamp = t; Level = l }

let stjJsonDocument (x:string) =
    use doc = x |> JsonDocument.Parse
    let t = doc.RootElement.GetProperty("Timestamp").GetDateTimeOffset()
    let l = doc.RootElement.GetProperty("Level").GetString()
    { Timestamp = t; Level = l }

let sharpJson (x:string) = Json.deserialize<LogEntryRecord> x

runWithMemoryCheck lines fSharpDataJsonProvider |> snd |> printfn "Memory used: %f GB" // Memory used: 4.420363 GB | Real: 00:00:35.829, CPU: 00:02:07.312, GC gen0: 84,   gen1: 25,  gen2: 8
runWithMemoryCheck lines fSharpDataJsonValue    |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521624 GB | Real: 00:00:16.557, CPU: 00:00:35.281, GC gen0: 29,   gen1: 10,  gen2: 4
runWithMemoryCheck lines stjJsonSerializer      |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521555 GB | Real: 00:00:10.823, CPU: 00:00:44.453, GC gen0: 11,   gen1: 6,   gen2: 4
runWithMemoryCheck lines stjJsonNode            |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521419 GB | Real: 00:00:09.533, CPU: 00:00:27.359, GC gen0: 16,   gen1: 7,   gen2: 4
runWithMemoryCheck lines stjJsonDocument        |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521525 GB | Real: 00:00:06.208, CPU: 00:00:17.546, GC gen0: 5,    gen1: 4,   gen2: 4
runWithMemoryCheck lines sharpJson              |> snd |> printfn "Memory used: %f GB" // Memory used: 0.520846 GB | Real: 00:01:02.761, CPU: 00:01:20.578, GC gen0: 1022, gen1: 260, gen2: 4


  • FSharp.Data.JsonProvider is terrible compared to the alternatives: it is slow and uses far more memory (4.4 GB versus roughly 0.5 GB for everything else)
  • STJ.JsonDocument is the speed winner (about 6 seconds).
  • FSharp.Json supports F# types but it is the slowest (over a minute).
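
Since only a couple of properties matter, the fastest option above can also be combined with lazy line reading so that the whole file never has to sit in memory. This is only a minimal sketch, assuming one JSON object per line; `levelOf` and `countLevels` are hypothetical helper names, and the in-memory `sample` stands in for the real log file.

```fsharp
open System.IO
open System.Text.Json

// Hypothetical helper: pull just the "Level" property out of one JSON line.
// GetString() copies the value, so disposing the document afterwards is safe.
let levelOf (line: string) =
    use doc = JsonDocument.Parse(line)
    doc.RootElement.GetProperty("Level").GetString()

// Seq functions pull one line at a time, so only the counts stay in memory.
let countLevels (lines: seq<string>) =
    lines |> Seq.map levelOf |> Seq.countBy id |> Seq.toList

let sample = [
    """{"Timestamp":"2024-12-23T20:51:18+01:00","Level":"ERROR","Message":"xxx"}"""
    """{"Timestamp":"2024-12-23T20:51:19+01:00","Level":"INFO","Message":"xx"}"""
    """{"Timestamp":"2024-12-23T20:51:20+01:00","Level":"ERROR","Message":"x"}"""
]

countLevels sample |> printfn "%A" // [("ERROR", 2); ("INFO", 1)]

// For a real file (the path is an assumption):
// File.ReadLines "./logs.json" |> countLevels
```

`File.ReadLines` (unlike `File.ReadAllLines`) yields lines lazily, which is what keeps memory flat here.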

System.Text.Json cheat sheet

System.Text.Json types

  • JsonSerializer -> deserialize into fixed type
  • JsonDocument -> immutable (for reading only)
  • JsonDocument -> faster, IDisposable, uses shared memory pool
  • JsonNode -> mutable (you can construct json)

JsonNode vs JsonDocument see


open System
open System.Text.Json

// The System.Text.Json namespace contains all the entry points and the main types.
// The System.Text.Json.Serialization namespace contains attributes and APIs for advanced scenarios and customization specific to serialization and deserialization.

// System.Text.Json.JsonSerializer -> is a static class
//                                 -> you can instantiate and reuse the JsonSerialization options

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [42, 11]
}"""

type InnerType = {
    NestedProperty: int
}

type DummyType = {
    PropertyName1: string
    PropertyName2: int
    PropertyName3: DateTimeOffset
    PropertyName4: InnerType
    PropertyName5: int list
}

type LogEntryRecord = {
    Timestamp: DateTimeOffset
    Level    : string
}

// # JsonSerializer.Deserialize

// JsonSerializer.Deserialize<'Type>(jsonString)
// JsonSerializer.Deserialize<'Type>(jsonString, options)
// JsonSerializer.DeserializeAsync(stream, ...) <- only streams can be parsed async cuz parsing string is purely CPU bound

// Deserialization behaviour:
//  - By default, property name matching is case-sensitive. You can specify case-insensitivity.
//  - Non-public constructors are ignored by the serializer.
//  - Deserialization to immutable objects or properties that don't have public set accessors is supported but not enabled by default.
//    ^ I'm not sure about this cuz F# records seem to work just fine

JsonSerializer.Deserialize<LogEntryRecord>(jsonString)
// { Timestamp = 0001-01-01T00:00:00.0000000+00:00 Level = null }
// no properties match but JsonSerializer just returns default values

JsonSerializer.Deserialize<DummyType>(jsonString)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 2024-12-29T10:31:36.3774099+01:00
//                       PropertyName4 = { NestedProperty = 42 }
//                       PropertyName5 = [42; 11] }

// Deserialization is case sensitive by default!
let jsonString2 = """{
    "propertyName1" : "dummyValue",
    "propertyName2" : 42
}"""
JsonSerializer.Deserialize<DummyType>(jsonString2)
// val it: DummyType = { PropertyName1 = null
//                       PropertyName2 = 0
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }
let options = new JsonSerializerOptions()
options.PropertyNameCaseInsensitive <- true
JsonSerializer.Deserialize<DummyType>(jsonString2, options)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }

// # JsonSerializer.Serialize

// let's pretty print during testing
// by default the json is minified
let options = new JsonSerializerOptions()
options.WriteIndented <- true

JsonSerializer.Serialize(options, options)
//val it: string =
//  "{
//  "Converters": [],
//  "TypeInfoResolver": {},
//  "TypeInfoResolverChain": [
//    {}
//  ],
//  "AllowOutOfOrderMetadataProperties": false,
//  "AllowTrailingCommas": false,
//  "DefaultBufferSize": 16384,
//  "Encoder": null,
//  "DictionaryKeyPolicy": null,
//  "IgnoreNullValues": false,
//  "DefaultIgnoreCondition": 0,
//  ...

// Serialization behaviour:
//  - by default, all public properties are serialized. You can specify properties to ignore. You can also include private members.
//  - by default, JSON is minified. You can pretty-print the JSON.
//  - by default, casing of JSON names matches the .NET names. You can customize JSON name casing.
//  - by default, fields are ignored. You can include fields.
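
To illustrate the name-casing point above, here is a small sketch that serializes a record with camelCase JSON names and reads it back with the same options instance. The `Person` type and the `camelCase` value are made up for the example.

```fsharp
open System.Text.Json

type Person = { FirstName: string; Age: int }

// One options instance, created once and reused for every call.
let camelCase = JsonSerializerOptions(PropertyNamingPolicy = JsonNamingPolicy.CamelCase)

let json = JsonSerializer.Serialize({ FirstName = "Ala"; Age = 42 }, camelCase)
printfn "%s" json // {"firstName":"Ala","age":42}

// The same options map the camelCase names back onto the record fields.
let back = JsonSerializer.Deserialize<Person>(json, camelCase)
printfn "%A" back
```

Using the same options for both directions avoids the case-sensitivity surprise shown earlier.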


open System
open System.Text.Json.Nodes

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [42, 11]
}"""

let x = JsonNode.Parse(jsonString) // type(x) = JsonNode
x.["PropertyName2"] |> int
// x.["PropertyName3"] |> DateTimeOffset // TODO - why can't I use this explicit conversion?

x["PropertyName4"].GetValueKind() |> string // "Object"
x["NonExistingProperty"] // null
x["NonExistingProperty"].GetValue<int>() // err - System.NullReferenceException
x["PropertyName5"].AsArray() |> Seq.map (fun a -> a.GetValue<int>()) // ok
x["PropertyName5"].AsArray() |> Seq.map int // ok
x["PropertyName5"].[0].GetValue<int>() // ok

// create a json object
let m = new JsonObject()
m["TimeStamp"] <- DateTimeOffset.Now
m.ToJsonString() // {"TimeStamp":"2024-12-29T16:06:17.046746+01:00"}
m["SampleProperty"] <- new JsonArray(1,2)

let a = JsonNode.Parse("""{"x":{"y":[1,2,3]}}""")
a.["x"] // this is a JsonNode
a.["x"].AsObject() // this returns a JsonObject
a.["x"].AsObject() |> Seq.iter (fun x -> printfn "%A" x) // iterate over properties of the object
a.["x"].ToJsonString() // you can serialize subsection of the json
// {"y":[1,2,3]}

JsonNode.DeepEquals(x, a) // comparison
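
On the TODO above: a likely explanation (an assumption, not verified against the compiler) is that `int` is an F# conversion function that can pick up the explicit conversion operator JsonNode defines, whereas `DateTimeOffset` is a type name and constructor rather than a conversion function. `GetValue<'T>()` sidesteps the question. The sketch below also guards against the null returned for a missing property; `tryGetInt` is a hypothetical helper.

```fsharp
open System
open System.Text.Json.Nodes

let node = JsonNode.Parse("""{"When":"2024-12-29T10:31:36+01:00","Count":42}""")

let count     = node.["Count"].GetValue<int>()
let timestamp = node.["When"].GetValue<DateTimeOffset>()

// Hypothetical helper: a missing property yields null, so check before calling GetValue.
let tryGetInt (name: string) (n: JsonNode) =
    match n.[name] with
    | null -> None
    | value -> Some (value.GetValue<int>())

printfn "%d %O" count timestamp
printfn "%A" (tryGetInt "Count" node)   // Some 42
printfn "%A" (tryGetInt "Missing" node) // None
```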


open System
open System.Text.Json

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [42, 11]
}"""

use x = JsonDocument.Parse(jsonString) // remember this is an IDisposable
x.RootElement.GetProperty("PropertyName5").EnumerateArray() |> Seq.map (fun x -> x.GetInt32())

for i in x.RootElement.GetProperty("PropertyName5").EnumerateArray() do
    printfn "%A" i

// you could also write a generic helper

type JsonElement with
  member x.Get<'T>(name:string) : 'T =
    let p = x.GetProperty(name)
    match typeof<'T> with
    | t when t = typeof<string> -> p.GetString() |> unbox
    | t when t = typeof<int> -> p.GetInt32() |> unbox
    | t when t = typeof<DateTime> -> p.GetDateTime() |> unbox
    | t when t = typeof<JsonElement> -> p |> unbox
    | t when t = typeof<int[]> -> p.EnumerateArray() |> Seq.map (fun x -> x.GetInt32()) |> Seq.toArray |> unbox
    | _ -> failwith "unsupported type"


F# types and json serialization

open System.Text.Json

// Record - OK
type DummyRecord = {
    Text: string
    Num:  int
}

let r = { Text = "asdf"; Num = 1 }

JsonSerializer.Serialize(r) |> JsonSerializer.Deserialize<DummyRecord>

let tuple = (42, "asdf")
JsonSerializer.Serialize(tuple) |> JsonSerializer.Deserialize<int * string>

type TupleAlias = int * string
let tuple2 = (43, "sfdg") : TupleAlias
JsonSerializer.Serialize(tuple2) |> JsonSerializer.Deserialize<TupleAlias>

// Discriminated Union :(
type SampleDiscriminatedUnion =
    | A of int
    | B of string
    | C of int * string
let x = A 1
JsonSerializer.Serialize(x) // error! System.Text.Json has no built-in support for F# discriminated unions

// Option - OK
JsonSerializer.Serialize(Some 42) |> JsonSerializer.Deserialize<int option>
JsonSerializer.Serialize(None) |> JsonSerializer.Deserialize<int option>
open System
type RecordTest2 = {
    Timestamp: DateTimeOffset
    Level: string
    TestOp: int option
}

// Discriminated Union is supported in FSharp.Json
#r "nuget: FSharp.Json"
open FSharp.Json
let data = C (42, "The string")
let json = Json.serialize data
// val json: string = "{
//   "C": [
//     42,
//     "The string"
//   ]
// }

let deserialized = Json.deserialize<SampleDiscriminatedUnion> json
// val deserialized: SampleDiscriminatedUnion = C (42, "The string")

More on FSharp.Data's JsonValue

#r "nuget:FSharp.Data"
open FSharp.Data

let j = JsonValue.Parse("""{"x":{"y":[1,2,3]}}""")
j.Properties()
// val it: (string * JsonValue) array =
//   [|("x", {
//   "y": [
//     1,
//     2,
//     3
//   ]
// })|]
j.TryGetProperty "x"

// JsonValue is a discriminated union
// union JsonValue =
//   | String  of string
//   | Number  of decimal
//   | Float   of float
//   | Record  of properties: (string * JsonValue) array
//   | Array   of elements: JsonValue array
//   | Boolean of bool
//   | Null
// docs:
// <- if you'll be working with JsonValue read this
// there are also extension methods:
// AsArray doesn't fail if the value is not an array, as opposed to other AsSth methods
// See below how extension methods are defined
// source:
open System.Globalization
open System.Runtime.CompilerServices
open System.Runtime.InteropServices
open FSharp.Data.Runtime
open FSharp.Core

type JsonExtensions =
    /// Get all the elements of a JSON value.
    /// Returns an empty array if the value is not a JSON array.
    static member AsArray(x: JsonValue) =
        match x with
        | (JsonValue.Array elements) -> elements
        | _ -> [||]

    /// Get a number as an integer (assuming that the value fits in integer)
    static member AsInteger(x, [<Optional>] ?cultureInfo) =
        let cultureInfo = defaultArg cultureInfo CultureInfo.InvariantCulture

        match JsonConversions.AsInteger cultureInfo x with
        | Some i -> i
        | _ ->
            failwithf "Not an int: %s"
            <| x.ToString(JsonSaveOptions.DisableFormatting)

// construct a json object
let d =
    JsonValue.Record [|
        "event",      JsonValue.String "asdf"
        "properties", JsonValue.Record [|
            "token",       JsonValue.String "tokenId"
            "distinct_id", JsonValue.String "123123"
        |]
    |]

d.ToString().Replace("\r\n", "").Replace(" ", "")

// if you want to process the json object
for (k, v) in d.Properties() do
    printfn "Property: %s" k
    match v with
    | JsonValue.Record props -> printfn "\t%A" props
    | JsonValue.String s     -> printfn "\t%A" s
    | JsonValue.Number n     -> printfn "\t%A" n
    | JsonValue.Float f      -> printfn "\t%A" f
    | JsonValue.Array a      -> printfn "\t%A" a
    | JsonValue.Boolean b    -> printfn "\t%A" b
    | JsonValue.Null         -> printfn "\tnull"

Serialize straight to UTF-8

JsonSerializer.SerializeToUtf8Bytes(value, options) <- why does this one exist?

Strings in .NET are stored in memory as UTF-16, so if you don't need a string you can use this method and serialize straight to UTF-8 bytes (it's 5-10% faster, see link) - a post from when they introduced the new JSON API
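
A small sketch of the two routes, assuming a made-up `Entry` record. It only shows that both produce the same bytes; it does not measure the speed difference.

```fsharp
open System.Text
open System.Text.Json

type Entry = { Level: string; Count: int }
let entry = { Level = "INFO"; Count = 3 }

// Straight to UTF-8 bytes, no intermediate UTF-16 string.
let direct : byte[] = JsonSerializer.SerializeToUtf8Bytes(entry)

// The two-step route: UTF-16 string first, then re-encode.
let viaString : byte[] = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(entry))

printfn "%b" (direct = viaString) // true
```

The direct route is the natural fit when the result goes to a file, a socket or an HTTP body, which all want bytes anyway.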

regex - use static Regex.Matches() or instantiate Regex()?

By default use static method.

.NET regex engine caches regexes (by default 15).

Do you use more than 15 regexes, use them frequently, they're complex, and you care about performance?

Investigate Regex() with RegexOptions.Compiled (and Regex.CompileToAssembly, which is a method rather than a RegexOptions value and only works on .NET Framework).

Test performance before you optimize

What is the whole fuss about backtracking?

Microsoft's documentation does a bad job explaining backtracking.

Read about backtracking here -

To experience backtracking yourself, keep adding "x" to the input and watch the execution time increase: with 35 "x" characters the regex takes 5 seconds to find out that it doesn't match!
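
A rough way to try this locally. The pattern here is an assumption (a textbook nested-quantifier example, not necessarily the one from the link), and `tryIsMatch` is a hypothetical helper. Timings depend heavily on the runtime version and its optimizations, so no output is shown; the timeout keeps the experiment from hanging.

```fsharp
open System
open System.Diagnostics
open System.Text.RegularExpressions

// Assumed demo pattern: nested quantifiers are a classic source of heavy backtracking.
let pattern = "^(x+)+y$"

// Returns Ok isMatch, or Error when the timeout fires first.
let tryIsMatch (input: string) (timeout: TimeSpan) =
    try
        Ok (Regex.IsMatch(input, pattern, RegexOptions.None, timeout))
    with :? RegexMatchTimeoutException ->
        Error "timed out"

for n in [ 10; 20; 30 ] do
    let sw = Stopwatch.StartNew()
    let result = tryIsMatch (String.replicate n "x") (TimeSpan.FromSeconds(1.0))
    printfn "%d x: %A in %d ms" n result sw.ElapsedMilliseconds
```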


These are the methods you need:

open System
open System.Text.RegularExpressions

Regex.Matches("input", "pattern")
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline)
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline, TimeSpan.FromSeconds(10.)) // you can use a timeout to prevent a DoS attack with malicous inputs

let r = new Regex("pattern") // instance Regex offers the same methods
Regex class -


let matches = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)o")
matches |> Seq.iter (fun x -> printfn "%s" x.Value)
matches |> Seq.iter (fun x -> printfn "%A" x.Groups)
matches.[0].Groups.[1].Value |> printfn "%s"

// Lo             // these are the whole matches
// do             //
// lo             //
// co             //
// seq [Lo; L]    // group 0 is the whole match, group 1 is the (\w)
// seq [do; d]    //
// seq [lo; l]    //
// seq [co; c]    //
// L              // this is the letter captured by (\w)

let matches2 = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)+o")
matches2.[1].Groups.[1].Value |> printfn "%A"
matches2.[1].Groups.[1].Captures |> Seq.iter (fun c -> printfn "%s" c.Value)
// l              // gotcha! the value of the group is the last thing captured by that group
// d              // here the (\w)+ group captures 3 times
// o              //
// l              //
Match object properties:
Match.Success -> bool   | true      | false        |
Match.Value   -> string | the match | String.Empty |
let match3 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[a-z ]+i")
match3.Success |> printfn "%A"
match3.Value   |> printfn "%A"
// true
// "Lorem ipsum dolor si"

let match4 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[A-Z ]+i")
match4.Success            |> printfn "%A"
match4.Value              |> printfn "%A"
match4.Groups.Count       |> printfn "%A"
match4.Groups.[0].Success |> printfn "%A"
// false
// ""    // notice this is String.Empty, not <null>
// 1     // even for a failed match there is always at least one group
// false

let mutable m = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "\wo")
while m.Success do
    printfn "%s" m.Value
    m <- m.NextMatch()

let lines = [
    "The next day the children were ready to go to the plum thicket in the"
    "peach orchard as soon as they had their breakfast, but while they were"
    "talking about it a new trouble arose. It grew out of a question asked by"
]

lines
|> List.filter (fun line -> Regex.IsMatch(line, "the"))
|> List.map    (fun line -> Regex.Replace(line, "(\w+) the", "the $1"))

let text =
    "don't we all love\n" +
    "dealing with different\r\n" +
    "line endings\n" +
    "it's so much fun"
Regex.Split(text, "\r?\n")
|> Array.iter (printfn "%s")

open System.Net.Http
let book = (new HttpClient()).GetStringAsync("").Result
Regex.Count(book, "[^\w]\w{3}[^\w]") |> printfn "%d" // count 3 letter words

regex - Quick Reference (Microsoft)

Cheat sheet

Character escapes

\t     matches a tab \u0009
\r     match a carriage return \u000D
\n     new line \u000A
\unnnn match a unicode character by hexadecimal representation, exactly 4 digits
\.     match a dot (not any character) aka. match literally
\*     match an asterisk (don't interpret * as a regex special quantifier)

Character classes

[character_group]       /[ae]/ will match "a" in "gray"
[a-z] [A-Z] [a-z0-9A-Z] character ranges
.                       wildcard - any character except \n (with the Singleline option it matches \n too)
\w                      word character - upper/lower case letters, digits and underscore
\W                      non word character
\s                      white-space character
\S                      non whitespace character
\d                      digit
\D                      non digit

Anchors

^   $ beginning and end of a string (in multiline mode beginning and end of a line)

Grouping constructs

(subexpression)               (\w)\1 - match a character and the same character again - "aa" in "xaax"
(?<name>subexpression)        named group (?<double>\w)\k<double> - same as above
(?:subexpression)             noncapturing group - Write(?:Line)? - will match both Write and WriteLine in a string
                              (?:Mr\. |Ms\. |Mrs\. )?\w+\s\w+ -> match first name, last name and optional preceding title
(?imnsx-imnsx: subexpression) turn options on or off for a group
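(?=subexp)                    zero-width positive lookahead assertion
(?!subexp)                    zero-width negative lookahead assertion
(?<=subexp)                   zero-width positive lookbehind assertion
(?<!subexp)                   zero-width negative lookbehind assertion
                              make sure a subexp is/is not following (lookahead) or preceding (lookbehind), but don't match it, i.e. don't consume the characters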
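
A few of the grouping constructs in action. This is an illustrative sketch; the inputs are made up.

```fsharp
open System.Text.RegularExpressions

// Named group plus backreference: a word character followed by itself.
let m = Regex.Match("xaax", @"(?<double>\w)\k<double>")
printfn "%s %s" m.Value m.Groups.["double"].Value // aa a

// Non-capturing group: the optional "Line" suffix is matched but not captured.
let w = Regex.Match("Console.WriteLine", @"Write(?:Line)?")
printfn "%s %d" w.Value w.Groups.Count // WriteLine 1

// Lookahead and lookbehind check the surroundings without consuming them.
let ahead    = Regex.Match("width: 120px", @"\d+(?=px)").Value    // digits followed by "px"
let behind   = Regex.Match("price: $42", @"(?<=\$)\d+").Value     // digits preceded by "$"
let negative = Regex.Match("a1 b2", @"[a-z](?!1)\d").Value        // letter not followed by "1"
printfn "%s %s %s" ahead behind negative // 120 42 b2
```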

Quantifiers

*     0...n (all these are greedy by default -> match as many as possible)
+     1...n
?     0...1
{n}   exactly n
{n,}  at least n
{n,m} n...m
{n,m}? question mark makes the match nongreedy (match as few as possible)
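
Greedy versus nongreedy, shown on two made-up inputs:

```fsharp
open System.Text.RegularExpressions

let html = "<b>bold</b>"
let greedy = Regex.Match(html, "<.+>").Value   // as much as possible
let lazy'  = Regex.Match(html, "<.+?>").Value  // as little as possible
printfn "%s | %s" greedy lazy' // <b>bold</b> | <b>

let greedyRange = Regex.Match("aaaa", "a{2,3}").Value
let lazyRange   = Regex.Match("aaaa", "a{2,3}?").Value
printfn "%s | %s" greedyRange lazyRange // aaa | aa
```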

Backreference constructs

\number   match the value of a previous subexpression - (\w)\1 - matches the same \w character twice
\k<name>  backreference using group name

Alternation Constructs

| - any element separated by | - th(e|is|at) and the|this|that both match "the" "this" "that"
    ala|ma|kota - match "ala" or "ma" or "kota"
    ala ma (kota|psa) - match "ala ma kota" or "ala ma psa"
TODO - match yes if expression else match no

Substitutions

$number use numbered group
${name} use named group
$$      literal $
$&      whole match
$`      text before the match
$'      text after the match
$+      last group
$_      entire input string
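
The substitutions are used in the replacement string of Regex.Replace. A short sketch with made-up inputs:

```fsharp
open System.Text.RegularExpressions

// ${name}: named groups
let swapped  = Regex.Replace("John Smith", @"(?<first>\w+) (?<last>\w+)", "${last}, ${first}")
// $number: numbered groups
let byNumber = Regex.Replace("John Smith", @"(\w+) (\w+)", "$2 $1")
// $&: the whole match
let wrapped  = Regex.Replace("price 42", @"\d+", "[$&]")
// $$: a literal dollar sign, followed here by the whole match
let dollars  = Regex.Replace("cost 5", @"\d+", "$$$&")

printfn "%s | %s | %s | %s" swapped byNumber wrapped dollars
// Smith, John | Smith John | price [42] | cost $5
```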

Inline options

(?imnsx-imnsx)               use it like this at the beginning
(?imnsx-imnsx:subexpression) use for a group
i                            case insensitive
m                            multiline - ^ and $ match the beginning and end of each line
n                            do not capture unnamed groups
s                            single line - . matches \n also
More options are available using RegexOptions enum
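
Inline options next to their RegexOptions counterparts, as a quick sketch on made-up inputs:

```fsharp
open System.Text.RegularExpressions

let inlineIgnoreCase = Regex.IsMatch("HELLO", "(?i)hello")                      // true
let caseSensitive    = Regex.IsMatch("HELLO", "hello")                          // false
let enumIgnoreCase   = Regex.IsMatch("HELLO", "hello", RegexOptions.IgnoreCase) // same effect as (?i)

let dotDefault    = Regex.IsMatch("a\nb", "a.b")     // false: . does not match \n
let dotSingleLine = Regex.IsMatch("a\nb", "(?s)a.b") // true:  . matches \n too

let perLine = Regex.Matches("one\ntwo", @"(?m)^\w+$").Count // 2: ^ and $ work per line

printfn "%b %b %b %b %b %d" inlineIgnoreCase caseSensitive enumIgnoreCase dotDefault dotSingleLine perLine
```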

Practice regex


I love regex.



However I used to say "if you solve a problem with regex now you have 2 problems"

Not knowing how this quote came to be I repeated it for years. I'll smack the next person to repeat this quote without elaborating.

If regex did not exist, it would be necessary to invent it.

Why does .Matches() return a custom collection instead of List<Match>?

Historic reasons. Regex was made in .NET 1.0, before generics were a thing.
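
In practice this mostly matters on older frameworks, where the collection only exposes the non-generic IEnumerable. As far as I know, newer .NET versions also implement the generic interfaces on MatchCollection, which is why Seq functions work on it directly. A sketch that works either way:

```fsharp
open System.Text.RegularExpressions

let found = Regex.Matches("Lorem ipsum dolor", @"\wo")

// Seq.cast turns the non-generic collection into seq<Match>.
let values =
    found
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> Seq.toList

printfn "%A" values // ["Lo"; "do"; "lo"]
```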

I used (?<!\[.*?)(?<!\(")https?://\S+ with replace [$&]($&) to linkify links in this post

My lovely regex helpers

let regexExtract  regex                      text = Regex.Match(text, regex).Value
let regexExtractg regex                      text = Regex.Match(text, regex).Groups.[1].Value
let regexExtracts regex                      text = Regex.Matches(text, regex) |> Seq.map (fun x -> x.Value)
let regexReplace  regex (replacement:string) text = Regex.Replace(text, regex, replacement)
let regexRemove   regex                      text = Regex.Replace(text, regex, String.Empty)

PowerShell "Oopsie"

Task - remove a specific string from each line of multiple CSV files.

This task was added to the scripting exercise list.

First - let's generate some CSV files to work with:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

The "Oopsie"

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file $_ -Append }

This command will never finish. Run it for a moment (and then kill it), see the result, and try to figure out what happens. Explanation below.





































The explanation

Get-Content (aka. cat) keeps the file open and reads the content that our command is appending, thus creating an infinite loop.

The fix

There are many ways to fix this "oopsie".

Perhaps the simplest one is to not write to and read from the exact same file. A sensible rule is when processing files always write to a different file:

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file -FilePath "fixed$($_.Name)" }

Knowing the reason for our command hanging we can make sure the whole file is read before we overwrite it:

ls *.csv | % { (cat $_ ) | % { $_ -replace "42","" } | out-file $_ }
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ } # we can also use -replace as an array operator
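
The same read-everything-first idea carries over to other languages. A minimal F# sketch, where `removeFromFile` is a hypothetical helper and the demo runs on a temporary file:

```fsharp
open System.IO

// Hypothetical helper: remove `text` from every line of the file at `path`, in place.
let removeFromFile (text: string) (path: string) =
    let cleaned =
        File.ReadAllLines path                         // the whole file is read (and closed) here
        |> Array.map (fun line -> line.Replace(text, ""))
    File.WriteAllLines(path, cleaned)                  // only now is the file overwritten

// Demo on a temporary file
let tmp = Path.GetTempFileName()
File.WriteAllLines(tmp, [| "Value 42,Value 84"; "Value 1,Value 2" |])
removeFromFile "42" tmp
File.ReadAllLines tmp |> printfn "%A" // [|"Value ,Value 84"; "Value 1,Value 2"|]
```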

I'm amazed by GitHub Copilot's answer for "powershell one liner to remove a specific text from multiple CSV files":

Get-ChildItem -Filter "*.csv" | ForEach-Object { (Get-Content $_.FullName) -replace "string_to_replace", "replacement_string" | Set-Content $_.FullName }

How the way I work/code/investigate/debug changed with time & experience

I use this metaphor when describing how I work these days.


  1. Quick feedback is king
    • unit tests
    • quick test in another way
    • reproduce issues locally
    • try things out in a small project on the side, not in the project you're working on
  2. One thing at a time
    • experimenting
    • refactoring preparing for feature addition
    • feature coding
    • cleaning up after feature coding
  3. Divide problems into smaller problems
    • and remember - one thing (problem) at a time


You're working with code that talks to a remote API, you want to test different API calls to the remote API.

don't - change API parameters in code and run the project each time you test something. It takes too long.

do - write a piece of code to send an HTTP request, fiddle with this code

do - intercept request with Fiddler/Postman/other interceptor and reissue requests with different parameters


Something fails in the CI pipeline.

don't - make a change, commit, wait for remote CI to trigger, see result

do - reproduce issue locally

Longer read

  1. Quick feedback
    • do - write a test for it
    • do - isolate your issue/suspect/the piece of code you're working with
      • it is helpful if you can run just a module/sub-system/piece of your project/system
      • partial execution helps - like in Python/Jupyter or F# fsx
    • if you rely on external data and it takes time to retrieve it (even a 5-second delay can be annoying) - dump data to a file and read it from the file instead of hitting an external API or a DB every time you run your code
    • don't try to understand how List.foldBack() works while debugging a big project. Do it on the side.
    • spin up a new solution/project on the side to test things
    • occasionally juniors ask "does this work this way" - you can test it yourself easily if you do it on the side
  2. One thing at a time
    • separate refactoring from feature addition
    • fiddle first, find the walls/obstacles
    • git reset --hard
    • refactor preparing for a new feature (can become a separate PR)
    • code feature
    • if during coding you find something that needs refactoring/renaming/cleaning up - any kind of "WTF is this? I need to fix this!" try a) or b)
      • a) make a note to fix it later
      • b) fix immediately
        > git stash
        > git checkout master
        > git checkout -b fix-typo
        fix stuff
        merge or create a PR
        > git checkout feature
        > git merge fix-typo or git rebase fix-typo
        continue work
    • always have a paper notepad on your desk
      • note things you would like to come back to or investigate
      • it gives me great satisfaction to go through a list of "side quests" I have noted and strike through all of them, knowing I have dealt with each one before starting a new task
      • when investigating something I also note questions I would like to be able to answer after I'm done investigating
        • example: while working with Axios and cookies I found conflicting information about whether Axios supports cookies. After the investigation, I knew that Axios supports cookies by default in a browser but not in Node.js
  3. Divide problems into smaller problems
    • example - coding logic for a new feature in a CLI tool and designing the CLI arguments - these can be 2 sub-tasks
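
The "dump data to a file" tip can be as small as the sketch below. `cached` is a hypothetical helper, and `slowFetch` stands in for whatever API or DB call is slow; the cache file location is an arbitrary choice.

```fsharp
open System.IO

// Hypothetical helper: run `fetch` once, then serve the saved copy on later runs.
let cached (path: string) (fetch: unit -> string) =
    if File.Exists path then
        File.ReadAllText path
    else
        let data = fetch ()
        File.WriteAllText(path, data)
        data

// Demo: the slow call is a stand-in for an API or DB query.
let mutable calls = 0
let slowFetch () =
    calls <- calls + 1
    """{"answer":42}"""

let cacheFile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
let first  = cached cacheFile slowFetch
let second = cached cacheFile slowFetch
printfn "%s %s calls=%d" first second calls // the fetch ran only once
```

Delete the cache file whenever fresh data is needed.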

Big bang vs baby steps

The old me often ended up doing the big bang. Rewriting large chunks of code at once. Starting things from scratch. Working for hours or days with a codebase that can't even compile.

Downsides:

  • for a long time the project doesn't even compile, I lose motivation, I feel like I'm walking in the dark, I don't see errors for a long time
  • requires a lot of context keeping in my mind since I've ripped the project apart
  • if I abandon work for a few days sometimes I forget everything and progress is lost

The new me prefers baby steps

Fiddle with the code knowing I'll git reset --hard. Try renaming some stuff - helps me understand the codebase better. Try out different things and abandon them. At this point, I usually get an idea/feeling of what needs to be done. Plan a few smaller refactorings. After them, I am usually closer to the solution and am able to code it without a big bang.

My recommendations

Terminal etc.





In PowerShell, if you want to return an array instead of one element of the array at a time, do this:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong! will pipe/return 1 element at a time
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct! will pipe/return pairs
Beware! Result of both snippets will be displayed in the exact same way even though they have different types! See below:
> @(1,2,3,4)
> @((1,2),(3,4))

To check actual types:

> $x = @(1..2) | % { $a = "a" * $_; @($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
> $x = @(1..2) | % { $a = "a" * $_; ,@($a,$_) } ; $x.GetType().Name ; $x[0].GetType().Name ; $x
# Alternatively
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType
# Get-Member only shows output for every distinct type

Longer read

Occasionally I have a fist fight with PS to return an array instead of one element at a time. PS is a tough opponent. I think I get it now though.

The comma , in PS is both a binary and a unary operator. You can use it with one or two operands.

> ,1 # as a unary operator the comma creates an array with 1 member
> 1,2 # as a binary operator the comma creates an array with 2 members

Beware that both an array[] and array[][] will be displayed the same way. $y is an array[][], it is printed the same way to the output as $x

> $x = @(1,2,3,4) ; $x.GetType().Name ; $x[0].GetType().Name ; $x
> $y = @((1,2),(3,4)) ; $y.GetType().Name ; $y[0].GetType().Name ; $y

If you're trying to return an array of pairs:

> @(1..2) | % { $a = "a" * $_; @($a,$_) } # wrong
> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } # correct!
# Even though the result looks like a flat array, this time it's an array of arrays
> @(1..2) | % { $a = "a" * $_; @($a,$_) } | Get-Member -name GetType # we get strings and ints

   TypeName: System.String

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

   TypeName: System.Int32

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

> @(1..2) | % { $a = "a" * $_; ,@($a,$_) } | Get-Member -name GetType # we get arrays

   TypeName: System.Object[]

Name    MemberType Definition
----    ---------- ----------
GetType Method     type GetType()

More on printing your arrays of pairs:

> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-output # write-output will "unwind" your array
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-host
a 1
aa 2
aaa 3
aaaa 4
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | % { write-output "$_" }
a 1
aa 2
aaa 3
aaaa 4
> @(1..4) | % { $a = "a" * $_; ,@($a,$_) } | write-output -NoEnumerate # returns an array of arrays but it's printed as if it's a flat array

This explains how @() works in PS.

> $a='A','B','C'
> $b=@($a;)
> $a
> $b
> [Object]::ReferenceEquals($a, $b)
Above, $a; is treated as a statement that outputs the collection $a. Collections are enumerated and each item is passed to the pipeline, so @($a;) sees 3 elements rather than the original array and creates a new array from those 3 elements. In PS @($collection) creates a copy of $collection, while @(,$collection) creates an array with a single element, $collection.

Notes on keys, certs, certificates, HTTPS, SSL, SSH, TLS

key != cert (a key is different from a certificate)

Keys are used to encrypt connections, certs are used to verify that the key owner is who he says he is.

Certificate aka cert

A certificate proves that a public key belongs to a given entity. The cert includes:

  • public key
  • information about the key
  • CA's signature validating the cert

CA's signature basically says "I confirm that this public key belongs to this person/entity". The signature is made using CA's private key.

This is the Wikipedia certificate, which I exported from my Chrome browser in PEM format.


We can double click the cert file to view it (on Windows) or use many other different tools to view its content.

> $cert = New-Object Security.Cryptography.X509Certificates.X509Certificate2([string]"C:\Users\inwen\Downloads\")
> $cert | select *

EnhancedKeyUsageList : {Server Authentication (, Client Authentication (}
DnsNameList          : {*,,,}
SendAsTrustedIssuer  : False
Archived             : False
Extensions           : {System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid, System.Security.Cryptography.Oid...}
FriendlyName         :
IssuerName           : System.Security.Cryptography.X509Certificates.X500DistinguishedName
NotAfter             : 17/10/2024 01:59:59
NotBefore            : 18/10/2023 02:00:00
HasPrivateKey        : False
PrivateKey           :
PublicKey            : System.Security.Cryptography.X509Certificates.PublicKey
RawData              : {48, 130, 8, 75...}
SerialNumber         : 07419E39583A4C76CF1EA14347FA5F3A
SubjectName          : System.Security.Cryptography.X509Certificates.X500DistinguishedName
SignatureAlgorithm   : System.Security.Cryptography.Oid
Thumbprint           : 483F0C71F34AE0EA30D99BD60463DCDAA8F49DFB
Version              : 3
Handle               : 2140299849504
Issuer               : CN=DigiCert TLS Hybrid ECC SHA384 2020 CA1, O=DigiCert Inc, C=US
Subject              : CN=*, O="Wikimedia Foundation, Inc.", L=San Francisco, S=California, C=US


SSH (Secure SHell protocol) - a protocol that lets you execute shell commands over a secure connection.

SFTP is an extension of SSH. SFTP != FTP over SSH. To connect to an SFTP server you typically need a private SSH key. The public SSH key (your private key's counterpart) is stored on the server.

TLS & SSL - think of SSL as the older/first protocol for secure communication. SSL was phased out in favor of TLS. TLS is THE protocol used by HTTPS for secure connections.

Clients can be anonymous in TLS. This is usually the case on the web: the server provides a cert to your browser but you don't need a cert of your own. TLS can also be mutual: if the client has a cert, the server can validate it.


PuTTY is free, open-source software that can do SSH. PuTTY has its own key file format -> .ppk

ppk - PuTTY private key (a ppk can be converted to pem with some software). A PPK file stores a private key and the corresponding public key in the same file.


Privacy-Enhanced Mail (PEM) is THE file format for exchanging keys and certificates.

  • .cer & .crt - PEM file with a certificate
  • .key - PEM file with a private or public key

The file extension doesn't really matter. Just open the file and look at the headers to be sure what it is.

To view a pem certificate on Windows - rename it to .crt and double click.

You can open a .pem file as plain text and see its content:

// pem ignores stuff between the headers so you can put comments here


The content between the header and footer (-----BEGIN CERTIFICATE----- and -----END CERTIFICATE-----) is base64 encoded. Once decoded, the content can be DER binary data.


Distinguished Encoding Rules (DER) - a way of encoding data structures. A certificate is a data structure containing various entries like validity date, issuer, etc. For certificates to work you need to store this information and transfer it. DER encodes this information in a binary format. The binary data is then base64 encoded and placed in a PEM file.
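The PEM -> base64 -> DER layering described above can be sketched in Node. This is only an illustration: pemToDer is a hypothetical helper name, and the sketch assumes a single well-formed PEM block and ignores any text outside it.

```javascript
// Sketch: recover the DER bytes from the first PEM block in a text.
// `pemToDer` is a hypothetical helper, not part of any library.
function pemToDer(pemText) {
  const match = pemText.match(
    /-----BEGIN ([A-Z0-9 ]+)-----([\s\S]*?)-----END \1-----/
  );
  if (!match) throw new Error('no PEM block found');
  const label = match[1]; // e.g. "CERTIFICATE"
  // Drop line breaks and other whitespace, then decode the base64 body.
  const der = Buffer.from(match[2].replace(/\s+/g, ''), 'base64');
  return { label, der };
}

// Possible usage (the file name is an assumption):
// const { label, der } = pemToDer(require('node:fs').readFileSync('cert.pem', 'utf8'));
// console.log(label, der.length);
```

For a certificate, the first DER byte is 0x30 (decimal 48, an ASN.1 SEQUENCE), which matches the RawData value {48, 130, ...} in the PowerShell output earlier.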


X.509 is the standard defining public key certificates for TLS/SSL (HTTPS)


PFX seems to be Microsoft's complicated file format for storing cryptographic data.

P12/PKCS12 is the successor to PFX. Sometimes the terms PFX/P12/PKCS12 are used interchangeably.

base64 offline decoder:

Nice description of certs vs key:

Generate yourself a certificate:

Important info on rejectUnauthorized: false and certificates in axios/node:

convention - propose - specify format in secret name - use plain - not base64 encoded


# PFX/pkcs12 to PEM
openssl pkcs12 -in cert.pfx -out cert.pem -nodes

# PFX/pkcs12 to PEM without a password prompt (password passed on the command line)
openssl pkcs12 -in cert.p12 -out cert_without_pwd.pem -nodes -password pass:1234

# PEM to PFX/pkcs12 (both have passwords)
openssl pkcs12 -export -out cert.pfx -in cert.pem -inkey cert.pem -passin pass:1234 -passout pass:1234

# PEM to PFX/pkcs12 (when key and cert are in separate .pem files)
openssl pkcs12 -export -out bob_pfx.pfx -inkey bob_key.pem -in bob_cert.cert

# if openssl hangs try running it using winpty
winpty openssl pkcs12 -in cert.pfx -out cert.pem -nodes


Lazy websites

A website's certificate is usually signed by an intermediate CA, which in turn is signed by a trusted root CA. The idea is that the server you connect to sends you its certificate together with all the intermediate certificates. Your app/machine should have the root CA certificate stored, so it can validate the chain of certificates it received from the server (by checking that the chain ends at a root CA it already trusts).

Some servers are misconfigured and do not send the intermediate certificates. You do not notice because browsers fill in the gaps for a better browsing experience. However, when you try to scrape the same website with e.g. Node, your connection will be rejected.

don't's (for node)

Several answers on SO suggest:

  • const httpsAgent = new https.Agent({ rejectUnauthorized: false });

Both are terrible ideas - they make your app accept unauthorized connections. They are the equivalent of this conversation:

"I can't verify this certificate, we can not be sure who we are connecting to" - says Node with care in its voice

"Doesn't matter, YOLO, carry on" - you reply shrugging your shoulders

Read more here

does (for node)

Use NODE_EXTRA_CA_CERTS. Alternatively, use a library to programmatically give Node the missing certificate.
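A minimal sketch of the programmatic route, using only Node's own https and tls modules instead of a library. The helper names caListWith and agentTrusting are hypothetical, and the path in the usage comment is an assumption. Note that the ca option replaces Node's default trust list, so the sketch keeps the bundled roots explicitly.

```javascript
const https = require('node:https');
const tls = require('node:tls');

// Hypothetical helper: Node's bundled root CAs plus one extra PEM certificate.
// Passing `ca` replaces the default trust list, so the bundled roots are kept explicitly.
function caListWith(extraCaPem) {
  return [...tls.rootCertificates, extraCaPem];
}

// Hypothetical helper: an https agent that also trusts the extra certificate.
function agentTrusting(extraCaPem) {
  return new https.Agent({ ca: caListWith(extraCaPem) });
}

// Possible usage with axios (the path is an assumption):
// const pem = require('node:fs').readFileSync('./intermediate.pem', 'utf8');
// const client = axios.create({ httpsAgent: agentTrusting(pem) });
```

The environment variable route needs no code at all: NODE_EXTRA_CA_CERTS points at a PEM file and is read when the node process starts.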

Good read -

root CA stores


It seems everyone has their own root CA store these days. Node has a hardcoded list of root CAs, see:
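To peek at that hardcoded list from a running process, something like the sketch below should work. It assumes a Node version that ships crypto.X509Certificate (15.6 or later), and bundledRootSubjects is a hypothetical helper name.

```javascript
const tls = require('node:tls');
const { X509Certificate } = require('node:crypto');

// Hypothetical helper: subject lines of the root CAs compiled into this Node build.
function bundledRootSubjects() {
  return tls.rootCertificates.map((pem) => new X509Certificate(pem).subject);
}

// Possible usage:
// console.log(bundledRootSubjects().length, 'bundled root CAs');
```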



You can view Windows certificates with PowerShell:

Get-ChildItem -Recurse Cert:


If you would like to become chrome's trusted CA -

node packages updating


  1. > npm install depcheck -g - install depcheck globally
  2. > depcheck - check for redundant packages
  3. > npm un this-redundant-package - uninstall redundant packages (repeat for all redundant packages)
  4. Create a pull-request remove-redundant-packages

  1. > npm i - tidy up node_modules
  2. > npm audit - see vulnerability issues
  3. > npm audit fix - fix vulnerability issues that don't require attention
  4. Create a pull-request fix-vulnerability-issues

  1. > npm i npm-check-updates -g - install npm-check-updates globally
  2. > npm-check-updates - see how outdated packages are
  3. > npm outdated - see how outdated packages are
  4. > npm update --save - update packages respecting your semver constraints from package.json
  5. If you have packages that use major version 0.*.* you'll need to manually update these now
    • > npm install that-one-package@latest
  6. Create a pull-request update-packages-minor

If you're brave and can test/run your project easily:

  1. ncu -u - updates package.json to all latest versions as shown by npm-check-updates
    • this might introduce breaking changes
  2. npm i - update package-lock.json
  3. Test your project.
  4. Create a pull-request update-packages-major

If you're not brave or can't just YOLO and update all major versions:

  1. npm-check-updates - check again what is left to update
  2. npm i that-package@latest - update the major version of that-package
  3. Test your project.
    • .js is dynamically typed so you might have just updated a package that breaks your project but you'll not know until you run your code
  4. Repeat for all packages.
  5. Create a pull-request update-packages-major

longer read

Need to update dependencies in a node js project? Here are my notes on this.

> npm i (npm install)

> npm i

added 60 packages, removed 124 packages, changed 191 packages, and audited 522 packages in 13s

96 packages are looking for funding
  run `npm fund` for details

10 vulnerabilities (2 low, 7 moderate, 1 high)

To address issues that do not require attention, run:
  npm audit fix

To address all issues possible (including breaking changes), run:
  npm audit fix --force

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.
In short, npm i:

  • installs missing packages in node_modules
  • removes redundant packages in node_modules
  • installs correct versions of mismatched packages (if package-lock.json wants a different version than found in node_modules)
  • shows what is going on with packages in your project

> npm audit - shows a report on vulnerability issues in your dependencies

> npm audit fix - updates packages to address vulnerability issues (updates that do not require attention)

> npm outdated - shows a table with your packages and versions

$ npm outdated
Package      Current   Wanted   Latest  Location                  Depended by
glob          5.0.15   5.0.15    6.0.1  node_modules/glob         dependent-package-name
nothingness    0.0.3      git      git  node_modules/nothingness  dependent-package-name
npm            3.5.1    3.5.2    3.5.1  node_modules/npm          dependent-package-name
local-dev      0.0.3   linked   linked  local-dev                 dependent-package-name
once           1.3.2    1.3.3    1.3.3  node_modules/once         dependent-package-name

  • Current - what is in node_modules
  • Wanted - most recent version that respects the version constraint from package.json
  • Latest - latest version from npm registry

To update to latest minor+patch versions of your dependencies (Wanted) - npm outdated shows all you need to know but I prefer the output of npm-check-updates

> npm i npm-check-updates -g (-g -> global mode - package will be available on your whole machine)

> npm-check-updates - shows where an update will be a major/minor/patch update (I like the colors)

Checking C:\git\blog\package.json
[====================] 39/39 100%

 @azure/storage-blob         ^12.5.0  →      ^12.17.0
 adm-zip                     ^0.4.16  →       ^0.5.12
 axios                       ^0.27.2  →        ^1.6.8
 basic-ftp                    ^5.0.1  →        ^5.0.5
 cheerio                 ^1.0.0-rc.6  →  ^1.0.0-rc.12
 eslint                      ^8.12.0  →        ^9.2.0
 eslint-config-prettier       ^8.5.0  →        ^9.1.0
 eslint-plugin-import        ^2.25.4  →       ^2.29.1
 fast-xml-parser              ^4.2.4  →        ^4.3.6
 humanize-duration           ^3.27.3  →       ^3.32.0
 iconv                        ^3.0.0  →        ^3.0.1
 jsonwebtoken                 ^9.0.0  →        ^9.0.2
 luxon                        ^3.4.3  →        ^3.4.4

Let us update something

> npm update - perform updates respecting your semver constraints and update package-lock.json

> npm update --save - same as above but also updates package.json, use this one always

The behavior for packages with major version 0.*.* is different than for versions >=1.0.0 (see npm help update)
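The difference comes from how the caret (^) constraint is interpreted for 0.x versions. A rough sketch of that rule follows. caretRange is a hypothetical helper, it only handles plain ^MAJOR.MINOR.PATCH constraints, and it is not how npm implements it internally.

```javascript
// Hypothetical helper: the half-open range [min, max) that a caret constraint allows.
// Only plain "^MAJOR.MINOR.PATCH" is handled; prerelease tags and other operators are out of scope.
function caretRange(constraint) {
  const m = /^\^(\d+)\.(\d+)\.(\d+)$/.exec(constraint);
  if (!m) throw new Error(`unsupported constraint: ${constraint}`);
  const [major, minor, patch] = m.slice(1).map(Number);
  let max;
  if (major > 0) max = [major + 1, 0, 0];      // ^1.2.3  allows anything below 2.0.0
  else if (minor > 0) max = [0, minor + 1, 0]; // ^0.4.16 allows anything below 0.5.0
  else max = [0, 0, patch + 1];                // ^0.0.3  allows anything below 0.0.4
  return { min: [major, minor, patch].join('.'), max: max.join('.') };
}

// Possible usage:
// console.log(caretRange('^12.5.0'));
// console.log(caretRange('^0.4.16'));
```

With the adm-zip entry from the npm-check-updates output above, caretRange('^0.4.16') gives a range that ends before 0.5.0, which is why 0.5.12 is out of reach for npm update.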

npm update will most likely bump all minor and patch versions for you.

You can run npm update --save often.

What do the symbols in package.json mean?

npm update --save vs npm audit fix

npm audit fix will only update packages to fix vulnerability issues

npm update --save will update all packages it can (respecting semver constraints)

Do I have unused dependencies?

> npm install depcheck -g

> depcheck - shows unused dependencies. depcheck scans for require/import statements in your code, so a package you load in some other way (e.g. via importLazy) will be reported as unused even though you use it.


> npm i npm-check -g

> npm-check - a different tool to help with dependencies (I didn't use it)

honorable mentions

> npm ls - list installed packages (from node_modules)

> npm ls axios - show all versions of axios and why we have them

npm ls will not show you the origin of optional dependencies that are not installed.

Consider this - you develop on a Windows machine and deploy your solution to a Linux box. On Windows (see below) you might think node-gyp-build is not used in your solution.

> npm ls node-gyp-build
test-npm@1.0.0 C:\git\test-npm
`-- (empty)

But on a linux box it will be used:

> npm ls node-gyp-build
npm-test-proj@1.0.0 /git/npm-test-proj
└─┬ kafka-lz4-lite@1.0.5
  └─┬ piscina@3.2.0
    └─┬ nice-napi@1.0.2
      └── node-gyp-build@4.8.1

axios, cookies & more


axios - promise-based HTTP client for node.js

  • when used in node.js axios uses the http module
  • in node axios does not support cookies by itself
    • there are npm packages that add cookies support to axios
  • when used in browsers it uses XMLHttpRequest
  • when used in browsers cookies work by default

Why would you use axios over plain http module from node?

Axios makes http requests much easier. Try using the plain http module and you'll be convinced.

Are there other packages like axios?

Yes - for example node-fetch

When making a request axios creates a default http and https agent - (axios probably uses global agents). You can specify custom agents for a specific request or set custom agents as default agents to use with an axios instance.

const a = require('axios');
const http = require('node:http');

(async () => {
    // configure your agent as needed
    const myCustomAgent = new http.Agent({ keepAlive: true });

    // use your custom agent for a specific request
    const x = await a.get('', { httpAgent: myCustomAgent });

    // set your agent as the default for all requests
    a.defaults.httpAgent = myCustomAgent;
})();

What are http/s agents responsible for?

http/s agents handle creating/closing sockets, TCP, etc. They talk to the OS, manage connection to hosts.
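One way to see an agent managing connections is to count how many TCP connections a server receives for two sequential requests. The sketch below uses only the node:http module and a throwaway local server. countConnections is a hypothetical helper, and binding to 127.0.0.1 on a random port is an assumption about the environment.

```javascript
const http = require('node:http');

// Sends two sequential GET requests through `agent` to a throwaway local server
// and resolves with the number of TCP connections the server saw.
async function countConnections(agent) {
  let connections = 0;
  const server = http.createServer((req, res) => res.end('ok'));
  server.on('connection', () => { connections += 1; });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();

  const get = () => new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent }, (res) => {
      res.resume();            // drain the body so the socket can be released
      res.on('end', resolve);
    }).on('error', reject);
  });

  try {
    await get();
    await new Promise((resolve) => setImmediate(resolve)); // give the agent a chance to reclaim the socket
    await get();
    return connections;
  } finally {
    agent.destroy();           // close any pooled sockets
    await new Promise((resolve) => server.close(resolve));
  }
}

// Possible usage:
// countConnections(new http.Agent({ keepAlive: true })).then(console.log);
// countConnections(new http.Agent({ keepAlive: false })).then(console.log);
```

A keep-alive agent should reuse one socket for both requests, while an agent without keep-alive opens a new connection per request.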


Without extra packages you need to write code that reads the response headers, looks for Set-Cookie headers, stores the cookies somewhere, and adds Cookie headers to subsequent requests.
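To give an idea of what that hand-written code might look like, here is a deliberately naive sketch. cookieHeaderFrom is a hypothetical helper. It keeps only name=value pairs and ignores attributes such as Path, Domain, Expires, Secure and HttpOnly, so it is no substitute for a real cookie jar.

```javascript
// Hypothetical helper: fold the Set-Cookie headers of a response into a jar (a Map)
// and build the Cookie header for the next request.
// Cookie attributes (Path, Domain, Expires, ...) are ignored in this sketch.
function cookieHeaderFrom(setCookieHeaders, jar = new Map()) {
  for (const header of setCookieHeaders ?? []) {
    const pair = header.split(';')[0];   // "name=value" comes before the first ';'
    const eq = pair.indexOf('=');
    if (eq < 1) continue;                // skip malformed entries
    jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
}

// Possible usage with a Node http response, where 'set-cookie' is an array:
// const jar = new Map();
// const cookie = cookieHeaderFrom(res.headers['set-cookie'], jar);
// nextRequestOptions.headers = { Cookie: cookie };
```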

http-cookie-agent - manages cookies for node.js HTTP clients (e.g. Node.js global fetch, undici, axios, node-fetch). It implements an http/s agent that inspects request headers and does the cookie-related magic for you. It uses the CookieJar class from the tough-cookie package to parse & store cookies.

import axios from 'axios';
import { CookieJar } from 'tough-cookie';
import { HttpCookieAgent, HttpsCookieAgent } from 'http-cookie-agent/http';

const jar = new CookieJar();

const a = axios.create({
  httpAgent: new HttpCookieAgent({ cookies: { jar } }),
  httpsAgent: new HttpsCookieAgent({ cookies: { jar } }),
});

// now we have an axios instance supporting cookies
await a.get('');


axios-cookiejar-support - depends on http-cookie-agent and tough-cookie. Does the same as http-cookie-agent but you don't have to create http/s agents yourself. This is a small package that just intercepts axios requests and makes sure custom http/s agents are used (source).

Saves you a bit of typing but you can't use your own custom agents. If you need to configure your http/s agents (ex. with a certificate) - use http-cookie-agent (see github issue and github issue)

import axios from 'axios';
import { wrapper } from 'axios-cookiejar-support';
import { CookieJar } from 'tough-cookie';

const jar = new CookieJar();
const client = wrapper(axios.create({ jar }));

await client.get('');

tough-cookie - npm package - cookie parsing/storage/retrieval (tough-cookie itself does nothing with http requests).

A bit about cookies - RFC describing cookies. - concise paragraph on Third-party cookies.

Servers respond with a Set-Cookie header. The client can then store the requested cookie. Cookies have a specific format described in this document.

Random stuff

Packages we don't use

  • cookie - npm package - cookies for servers
  • cookies - npm package - cookies for servers (different than cookie)
  • cookiejar - npm package - a different cookie jar for clients

fetch & fetch & node-fetch

fetch - standard created by WHATWG meant to replace XMLHttpRequest -

fetch - an old npm package to fetch web content - don't use it

node-fetch - community implemented fetch standard as a npm package - go ahead and use it

fetch - node's native implementation of the fetch standard -

Since the fetch standard is the standard for both browsers and Node, Chrome has a neat feature to export requests as fetch code.

chat-gpt crap

axios and fiddler

Using a request interceptor (proxy) like fiddler helps during development and debugging.

To make fiddler intercept axios requests we have to tell axios that there is a proxy where all requests should go. The proxy forwards those requests to the actual destination.

http_proxy=... // set proxy for http requests
https_proxy=... // set proxy for https requests
no_proxy=... // comma separated list of domains that should not be proxied

The proxy for both http and https can be the same url.
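axios reads these variables on its own, so no code is needed for the basic case. If you prefer to set the proxy explicitly, a sketch like the one below turns the same variables into the { protocol, host, port } shape that axios accepts as its proxy option. proxyConfigFromEnv is a hypothetical helper, and the fiddler address in the usage comment is an assumption about a default local setup.

```javascript
// Hypothetical helper: turn a proxy URL from the environment into the
// { protocol, host, port } shape that axios accepts as its `proxy` option.
function proxyConfigFromEnv(env = process.env) {
  const raw = env.https_proxy || env.HTTPS_PROXY || env.http_proxy || env.HTTP_PROXY;
  if (!raw) return false;                // `proxy: false` tells axios not to use a proxy
  const url = new URL(raw);
  return {
    protocol: url.protocol.replace(':', ''),
    host: url.hostname,
    // URL.port is empty when the port is omitted or is the scheme's default
    port: Number(url.port) || (url.protocol === 'https:' ? 443 : 80),
  };
}

// Possible usage (assumes fiddler listens on 127.0.0.1:8888):
// const proxy = proxyConfigFromEnv({ http_proxy: 'http://127.0.0.1:8888' });
// const client = axios.create({ proxy });
```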

Read more -

When using fiddler on Windows I suggest going to Network & internet > Proxy and disabling proxies there (fiddler sets this by default). This way fiddler will only receive requests from the process where we set the http(s)_proxy env vars.

fiddler and client certificates

I was not able to make fiddler work with client certificates. It should be done like this - but I couldn't get it to work

honorable mentions

I would like to try out - at some point

axios & cookies demo

> npm i
> node server.mjs
open browser and go to
cookies are supported
> node test.js (from another console)
cookies are not supported

axios, certificates, etc

To use axios with a client certificate you need to configure the https agent with the key and cert. The key and cert need to be in PEM format. They can both be in the same PEM file or in separate PEM files. I did not try it, but you should be able to merge and split your PEM.
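A sketch of how this might look when the key and cert live in one PEM file (for example the output of the openssl pkcs12 ... -nodes command earlier). The helper names keyAndCertFromPem and clientCertAgent are hypothetical, the file name in the usage comment is an assumption, and an encrypted private key would additionally need a passphrase option.

```javascript
const https = require('node:https');

// Hypothetical helper: pull the private key and the certificate(s) out of a
// PEM text that may contain both, in any order, with other text around them.
function keyAndCertFromPem(pemText) {
  const blocks = pemText.match(/-----BEGIN ([A-Z0-9 ]+)-----[\s\S]*?-----END \1-----/g) ?? [];
  const key = blocks.find((b) => /^-----BEGIN [A-Z ]*PRIVATE KEY-----/.test(b));
  const cert = blocks.filter((b) => b.startsWith('-----BEGIN CERTIFICATE-----')).join('\n');
  if (!key || !cert) throw new Error('expected a private key and at least one certificate');
  return { key, cert };
}

// Sketch of the agent setup; the agent is then handed to axios as `httpsAgent`.
function clientCertAgent(pemText) {
  return new https.Agent(keyAndCertFromPem(pemText));
}

// Possible usage (the file name is an assumption):
// const pem = require('node:fs').readFileSync('./cert.pem', 'utf8');
// const client = axios.create({ httpsAgent: clientCertAgent(pem) });
```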

