
Blog

My blog about programming & other stuff

If you find a typo or would like to improve my posts feel free to submit a pull request to https://github.com/inwenis/blog.

Check out my other projects on my GitHub, like https://github.com/inwenis/collider where I tried to simulate Brownian motion.

My collection of the various katas I've done: https://github.com/inwenis/kata

I used to teach programming; out of sentiment I kept these two repos:
- https://github.com/inwenis/sda.javawwa13.prog1.day5.complexity
- https://github.com/inwenis/sda.javawwa13.prog1.day3.array_vs_hashtable

Enjoy!

You can reach me at inwenis at gmail.com

Me gusta

I don't have a better place for this yet so here's a list of links I like:
- dict in F# - https://www.fssnip.net/fy/title/Typeinference-friendly-division-and-multiplication

Post25

1024 * 1024 - this many bytes is a mebibyte (MiB).

A megabyte, like a megametre, uses the SI prefix mega = 10^6, so a megabyte is 1,000,000 bytes.

We all frequently say megabyte when we mean mebibyte. Likewise a kilobyte != a kibibyte.

Unit Abbreviation Size in Bytes
Kibibyte KiB 1,024
Mebibyte MiB 1,048,576
Gibibyte GiB 1,073,741,824
Kilobyte KB 1,000
Megabyte MB 1,000,000
Gigabyte GB 1,000,000,000

Network speeds are measured in Mbps - that is megabits per second - that is 1,000,000 bits per second.

MB - Megabyte (SI)
MiB - Mebibyte (IEC)
Mb - Megabit (SI)
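The gap between the prefixes is easy to check in F# - a quick FSI sketch (pure arithmetic, nothing assumed beyond the definitions above):

```fsharp
// binary (IEC) prefixes are powers of 2, decimal (SI) prefixes are powers of 10
let kib = pown 2L 10 // 1_024
let mib = pown 2L 20 // 1_048_576
let gib = pown 2L 30 // 1_073_741_824
let gb  = 1_000_000_000L

// a marketing "gigabyte" is about 6.9% smaller than a gibibyte
let diffPercent = float (gib - gb) / float gib * 100.0

// 100 Mbps = 100,000,000 bits per second = 12.5 megabytes per second
let mbPerSecond = 100.0 * 1_000_000.0 / 8.0 / 1_000_000.0
```

Note how the gap widens with every prefix step: roughly 2.3% for kilo, 4.6% for mega, 6.9% for giga.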

https://www.iec.ch/prefixes-binary-multiples

Some neat fsx F#

My company had a hackathon focused on data scraping/processing.

Each team had to scrape 3 endpoints. I came up with something similar to this:

open System
open System.Net.Http
open System.Text

let c = new HttpClient()
c.Timeout <- TimeSpan.FromSeconds(5.0)

let lockObject = new obj()
let printSync text =
    let now = DateTimeOffset.Now.ToString("O")
    lock lockObject (fun _ -> printfn "[%s] %s" now text)

let s = new HttpClient()
s.Timeout <- TimeSpan.FromSeconds(5.0)
s.DefaultRequestHeaders.Add("X-Sender", "this is me, Mario!")
let sendToDestination stream response = async {
    let template = """{
    "CreatedAt": "xxXCreatedAtXxx",
    "Stream": "xxXStreamXxx",
    "Data": [
        xxXDataXxx
    ]
}"""
    let payload = template.Replace("xxXCreatedAtXxx", DateTimeOffset.Now.ToString("O"))
                          .Replace("xxXStreamXxx", stream)
                          .Replace("xxXDataXxx", response)
    let! response = s.PostAsync("http://localhost:8080", new StringContent(payload, Encoding.UTF8, "application/json") ) |> Async.AwaitTask
    sprintf "%s done sending response code %A" stream response.StatusCode |> printSync
}

let scraper (url:string) stream = async {
    while true do
        try
            let! response = c.GetStringAsync(url) |> Async.AwaitTask
            do! sendToDestination stream response
            sprintf "scraped %40s sendTo %s" url stream |> printSync
        with
        | _ -> sprintf "failed to scrape/or send %40s" url |> printSync

        do! Async.Sleep 1000
}

let urls = [
    "https://jsonplaceholder.typicode.com/posts", "123"
    "https://jsonplaceholder.typicode.com/posts", "124"
    "https://jsonplaceholder.typicode.com/posts", "125"
]

urls
|> List.map (fun (url, stream) -> scraper url stream)
|> Async.Parallel
|> Async.Ignore
|> Async.Start

// Async.CancelDefaultToken()

Things to keep in mind:

  • always have a try/catch that catches all exceptions in asyncs/tasks/threads
    • you don't want your thread to die without you knowing
  • always set a timeout when scraping (the default HttpClient timeout in .NET is 100 s, which is excessive for this script)

A minimalistic http server to listen to our scrapers:

open System.Net
open System.Text

// https://sergeytihon.com/2013/05/18/three-easy-ways-to-create-simple-web-server-with-f/
// run with `fsi --load:ws.fsx`
// visit http://localhost:8080

let host = "http://localhost:8080/"

let listener (handler:(HttpListenerRequest->HttpListenerResponse->Async<unit>)) =
    let hl = new HttpListener()
    hl.Prefixes.Add host
    hl.Start()
    let task = Async.FromBeginEnd(hl.BeginGetContext, hl.EndGetContext)
    async {
        while true do
            let! context = task
            Async.Start(handler context.Request context.Response)
    } |> Async.Start

listener (fun req response ->
    async {
        response.ContentType <- "text/html"
        let bytes = UTF8Encoding.UTF8.GetBytes("thanks!")
        response.OutputStream.Write(bytes, 0, bytes.Length)
        response.OutputStream.Close()
    })

PowerShell Gotcha! - dynamic scoping

PowerShell uses dynamic scoping. Yet the about_Scopes page doesn't mention the word "dynamic".

Wird (wird - so weird that you need to misspell weird to get your point across).

tl;dr;

In PowerShell, variables are copied into the stack frame created for the function you're calling. The "child" function can use your variables but can only modify its own copies. You can avoid this by declaring your variable as private ($private:varName = ...) and using Set-StrictMode -Version Latest to throw an error when "child" functions try to access an undefined variable.

PowerShell uses dynamic scoping. What we know from most programming languages is lexical scoping.


function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
hello

Weird! (this is dynamic scoping)


Set-StrictMode -Version Latest
function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
Set-StrictMode -Off # remember to turn strict mode off for further testing
hello

Weird! (but makes sense since in PowerShell's world this is perfectly legal hence "strict" changes nothing here)


function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $private:t = "hello"
    Do-InnerFunction
}

Do-OutterFunction

Output is empty. No errors but at least $t behaves more like a variable we know from C#/F#.


Set-StrictMode -Version Latest
function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $private:t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
InvalidOperation: C:\Users\...\Temp\44f5ff41-4105-482b-a134-b505049d2c61\test3.ps1:2
Line |
   2 |      Write-Host $t
     |                 ~~
     | The variable '$t' cannot be retrieved because it has not been set.

Finally!


function Do-InnerFunction {
    Write-Host $t
    $t = "world"
    Write-Host $t
}

function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
    Write-Host $t
}

Do-OutterFunction
hello
world
hello

Ah! So variables are copied to the next "scope".


function Do-InnerFunction {
    Write-Host $t
    $global:t = "world"
    Write-Host $t
}

function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
    Write-Host $t
}

Do-OutterFunction
Write-Host $t
hello
hello
hello
world

Now we have created a global $t variable.

This https://ig2600.blogspot.com/2010/01/powershell-is-dynamically-scoped-and.html explains it nicely.

Post22

W chatce w lesie siedzi Pan
Nie odzywa się do nikogo bo jest sam
Myśli ciężkie, głowa pogrążona w chorobie
Zaraz zawiśnie na grobie
Wspomnienia zaplątane same w sobie
Siedzi, mruga, własną głowę zruga
Pora zaraz będzie na spanie
A on wciąż, o matko Boska, gdzie jego posłanie?
Poradzić nic nie może, bo siedzi wciąż na dworze
Robaczki, wykałaczki go wkurwiają
Chciałby uciec jak ten zając
co poradzić temu Panu?
Myślę że to cud że doszedł aż tu
Drogi miał w brud
Co zrobić? Kto pomoże
Matko boska on wciąż siedzi na dworze
Siedzi, mruga
Fajkę pyka
Tytoń słaby
Jest już cały osiwiały

Post21

blood stains in the snow
you left a few
jumping home
for me to remember the last walk
snow will melt soon
this memory
I will not let fade
you were loved
and you loved us too
of that I'm sure
it's a tough call
to let you sleep
don't fear wherever you go
remember wide beaches
you used to love
we will be there someday too
Plamy krwi na śniegu
zostawiłaś kilka,
wracając do domu,
abym zapamiętał ostatni spacer.
Śnieg wkrótce stopnieje,
ale to wspomnienie
nie pozwolę mu odejść.
Byłaś kochana
i kochałaś też nas,
tego jestem pewien.
To trudna decyzja,
pozwolić ci zasnąć.
Nie bój się, dokądkolwiek zmierzasz,
pamiętaj o szerokich plażach,
które tak kochałaś.
Czekaj tam na nas,
W końcu przyjdziemy.

Environment variable

but only in a specific directory

The idea - use the Prompt function to check if you're in a specific dir and set/unset an env var:

function Prompt {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction Ignore # no error when the variable isn't set
    }
}

Extract the special env setting/unsetting to a function:

function SetOrUnSet-DirectoryDependent-EnvironmentVariables {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction Ignore # no error when the variable isn't set
    }
}

function Prompt {
    SetOrUnSet-DirectoryDependent-EnvironmentVariables
}

If your Prompt function is already overwritten, e.g. by oh-my-posh:

function SetOrUnSet-DirectoryDependent-EnvironmentVariables {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction Ignore # no error when the variable isn't set
    }
}

$promptFunction = (Get-Command Prompt).ScriptBlock

function Prompt {
    SetOrUnSet-DirectoryDependent-EnvironmentVariables
    $promptFunction.Invoke()
}

Why did I need this?

In a repository with several JS scrapers run by Node, a few scrape data from misconfigured websites. These websites don't serve the intermediate certificate for HTTPS. Your browser automatically fills in the gap for convenience, but a simple HTTP client like axios will rightfully reject the connection as it can't verify who it is talking to (see more here).

Solution?

Use NODE_EXTRA_CA_CERTS

  • You configure your production server with NODE_EXTRA_CA_CERTS.
  • When testing locally you get tired of remembering to set NODE_EXTRA_CA_CERTS.
  • You add NODE_EXTRA_CA_CERTS to your PowerShell profile. Now every time you run anything using Node (like VS Code) you see
    Warning: Ignoring extra certs from `./extra.cer`, load failed: error:02000002:system library:OPENSSL_internal:No such file or directory
    
  • You get annoyed and you ask yourself how to set an environment variable but only in a specific directory

I use this myself here -> the public part of my powershell-profile

Json

Should I use System.Text.Json (STJ) or Newtonsoft.Json (previously Json.NET)?

Use STJ; Newtonsoft.Json is no longer being enhanced with new features. Its author now works at Microsoft on some non-JSON stuff.

JamesNK reddit comment

Terms

marshal - assemble and arrange (a group of people, especially troops) in order.

"the general marshalled his troops"

marshalling (UK spelling, marshaling in the US) - (in computer science) getting parameters from here to there

serialization - transforming something (data) to a format usable for storage or transmission over the network

https://stackoverflow.com/questions/770474/what-is-the-difference-between-serialization-and-marshaling

JSON - JavaScript Object Notation - a data interchange format. https://www.json.org/json-en.html

Why this post

While analysing some logs I used FSharp.Data's JsonProvider. Only a few properties were relevant, but JsonProvider stores the whole json in memory. With 10GB of logs to analyse I quickly ran out of memory.

Let's do some testing!

open System
open System.IO
open System.Text.Json

fsi.AddPrinter<DateTimeOffset>(fun dt -> dt.ToString("O"))

Environment.CurrentDirectory <- __SOURCE_DIRECTORY__ // ensures the script runs from the directory it's located in
// -------------------------------------------------------------------------

// sample log entry for testing
type LogEntry = {
    Timestamp : DateTimeOffset
    Level     : string
    Message   : string
}

// only the properties we're interested in
type LogEntryRecord = {
    Timestamp : DateTimeOffset
    Level     : string
}

let random = Random()
let levels = [ "INFO"; "WARN"; "ERROR"; "DEBUG" ]

let generateLogEntry () =
    {
        Timestamp = DateTimeOffset.Now.AddSeconds(-random.Next(0, 10000))
        Level     = levels.[random.Next(levels.Length)]
        Message   = String.replicate (random.Next(10, 100)) "x" // filler of random length to simulate message content
    }

List.init 7_000_000 (fun _ -> generateLogEntry()) // 7M entries is around 1GB of data
|> List.map (fun entry -> JsonSerializer.Serialize(entry))
|> fun lines -> File.WriteAllLines("./logs.json", lines)

let lines = File.ReadAllLines "./logs.json"

let runWithMemoryCheck lines singleLineParser =
    GC.Collect()
    let before = GC.GetTotalMemory(true)
    let x = lines |> Array.map singleLineParser
    GC.Collect()
    let after = GC.GetTotalMemory(true)
    let m = ((after - before) |> float) / 1024. / 1024. / 1024. // GB
    x, m

#time
// -------------------------------------------------------------------------

open System.Text.Json.Nodes

#r "nuget: FSharp.Data"
open FSharp.Data

#r "nuget: FSharp.Json"
open FSharp.Json

type LogEntryJsonProvider = JsonProvider<"""
{
    "Timestamp"        : "2024-12-23T20:51:18.2020753+01:00",
    "Level"            : "ERROR",
    "Message"          : "File not found"
}""">

let fSharpDataJsonProvider = LogEntryJsonProvider.Parse

let fSharpDataJsonValue (x:string) =
    let line = x |> FSharp.Data.JsonValue.Parse
    let t = line.GetProperty("Timestamp").AsDateTimeOffset()
    let l = line.GetProperty("Level").AsString()
    { Timestamp = t; Level = l }

let stjJsonSerializer (x:string) = JsonSerializer.Deserialize<LogEntryRecord>(x)

let stjJsonNode (line:string) =
    let line = line |> JsonNode.Parse
    let t = line.["Timestamp"].GetValue<DateTimeOffset>()
    let l = line.["Level"].GetValue<string>()
    { Timestamp = t; Level = l }

let stjJsonDocument (x:string) =
    use doc = x |> JsonDocument.Parse
    let t = doc.RootElement.GetProperty("Timestamp").GetDateTimeOffset()
    let l = doc.RootElement.GetProperty("Level").GetString()
    { Timestamp = t; Level = l }

let sharpJson (x:string) = Json.deserialize<LogEntryRecord> x

runWithMemoryCheck lines fSharpDataJsonProvider |> snd |> printfn "Memory used: %f GB" // Memory used: 4.420363 GB | Real: 00:00:35.829, CPU: 00:02:07.312, GC gen0: 84,   gen1: 25,  gen2: 8
runWithMemoryCheck lines fSharpDataJsonValue    |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521624 GB | Real: 00:00:16.557, CPU: 00:00:35.281, GC gen0: 29,   gen1: 10,  gen2: 4
runWithMemoryCheck lines stjJsonSerializer      |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521555 GB | Real: 00:00:10.823, CPU: 00:00:44.453, GC gen0: 11,   gen1: 6,   gen2: 4
runWithMemoryCheck lines stjJsonNode            |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521419 GB | Real: 00:00:09.533, CPU: 00:00:27.359, GC gen0: 16,   gen1: 7,   gen2: 4
runWithMemoryCheck lines stjJsonDocument        |> snd |> printfn "Memory used: %f GB" // Memory used: 0.521525 GB | Real: 00:00:06.208, CPU: 00:00:17.546, GC gen0: 5,    gen1: 4,   gen2: 4
runWithMemoryCheck lines sharpJson              |> snd |> printfn "Memory used: %f GB" // Memory used: 0.520846 GB | Real: 00:01:02.761, CPU: 00:01:20.578, GC gen0: 1022, gen1: 260, gen2: 4

Conclusion

  • FSharp.Data.JsonProvider is terrible compared to any other alternative (slow and uses lots more memory)
  • STJ.JsonDocument is the speed winner.
  • FSharp.Json supports F# types but is quite slow

System.Text.Json cheat sheet

System.Text.Json namespaces

  • JsonSerializer -> deserialize into fixed type
  • JsonDocument -> immutable (for reading only); faster, IDisposable, uses a shared memory pool
  • JsonNode -> mutable (you can construct json)

JsonNode vs JsonDocument see https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom#json-dom-choices

System.Text.Json.JsonSerializer

open System
open System.Text.Json

// The System.Text.Json namespace contains all the entry points and the main types.
// The System.Text.Json.Serialization namespace contains attributes and APIs for advanced scenarios and customization specific to serialization and deserialization.

// System.Text.Json.JsonSerializer -> is a static class
//                                 -> you can instantiate and reuse the JsonSerialization options

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [
        42,
        11
    ]
}"""

type InnerType = {
    NestedProperty: int
}

type DummyType = {
    PropertyName1: string
    PropertyName2: int
    PropertyName3: DateTimeOffset
    PropertyName4: InnerType
    PropertyName5: int list
}

type LogEntryRecord = {
    Timestamp: DateTimeOffset
    Level    : string
}


// # JsonSerializer.Deserialize

// JsonSerializer.Deserialize<'Type>(jsonString)
// JsonSerializer.Deserialize<'Type>(jsonString, options)
// JsonSerializer.DeserializeAsync(stream, ...) <- only streams can be parsed async cuz parsing string is purely CPU bound

// Deserialization behaviour:
//  - By default, property name matching is case-sensitive. You can specify case-insensitivity.
//  - Non-public constructors are ignored by the serializer.
//  - Deserialization to immutable objects or properties that don't have public set accessors is supported but not enabled by default.
//    ^ I'm not sure about this cuz F# records seem to work just fine

JsonSerializer.Deserialize<LogEntryRecord>(jsonString)
// { Timestamp = 0001-01-01T00:00:00.0000000+00:00 Level = null }
// no properties match but JsonSerializer just returns default values

JsonSerializer.Deserialize<DummyType>(jsonString)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 2024-12-29T10:31:36.3774099+01:00
//                       PropertyName4 = { NestedProperty = 42 }
//                       PropertyName5 = [42; 11] }

// Deserialization is case sensitive by default!
let jsonString2 = """{
    "propertyName1" : "dummyValue",
    "propertyName2" : 42
}"""
JsonSerializer.Deserialize<DummyType>(jsonString2)
// val it: DummyType = { PropertyName1 = null
//                       PropertyName2 = 0
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }
let options = new JsonSerializerOptions()
options.PropertyNameCaseInsensitive <- true
JsonSerializer.Deserialize<DummyType>(jsonString2, options)
// val it: DummyType = { PropertyName1 = "dummyValue"
//                       PropertyName2 = 42
//                       PropertyName3 = 0001-01-01T00:00:00.0000000+00:00
//                       PropertyName4 = null
//                       PropertyName5 = null }


// # JsonSerializer.Serialize

// let's pretty print during testing
// by default the json is minified
let options = new JsonSerializerOptions()
options.WriteIndented <- true

JsonSerializer.Serialize(options, options)
//val it: string =
//  "{
//  "Converters": [],
//  "TypeInfoResolver": {},
//  "TypeInfoResolverChain": [
//    {}
//  ],
//  "AllowOutOfOrderMetadataProperties": false,
//  "AllowTrailingCommas": false,
//  "DefaultBufferSize": 16384,
//  "Encoder": null,
//  "DictionaryKeyPolicy": null,
//  "IgnoreNullValues": false,
//  "DefaultIgnoreCondition": 0,
//  ...

// Serialization behaviour:
//  - by default, all public properties are serialized. You can specify properties to ignore. You can also include private members.
//  - by default, JSON is minified. You can pretty-print the JSON.
//  - by default, casing of JSON names matches the .NET names. You can customize JSON name casing.
//  - by default, fields are ignored. You can include fields.

System.Text.Json.JsonNode

open System
open System.Text.Json.Nodes

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [
        42,
        11
    ]
}"""

let x = JsonNode.Parse(jsonString) // type(x) = JsonNode
x.ToJsonString()
x.["PropertyName3"].GetValue<DateTimeOffset>()
x.["PropertyName3"].GetPath()
x.["PropertyName4"].["NestedProperty"].GetPath()
x.["PropertyName2"] |> int
// x.["PropertyName3"] |> DateTimeOffset // TODO - why can't I use this explicit conversion?

x["PropertyName4"].GetValueKind() |> string // "Object"
x["NonExistingProperty"] // null
x["NonExistingProperty"].GetValue<int>() // err - System.NullReferenceException
x["PropertyName5"].AsArray() |> Seq.map (fun a -> a.GetValue<int>()) // ok
x["PropertyName5"].AsArray() |> Seq.map int // ok
x["PropertyName5"].[0].GetValue<int>() // ok

// create a json object
let m = new JsonObject()
m["TimeStamp"] <- DateTimeOffset.Now
m.ToJsonString() // {"TimeStamp":"2024-12-29T16:06:17.046746+01:00"}
m["SampleProperty"] <- new JsonArray(1,2)
m.Remove("TimeStamp")

let a = JsonNode.Parse("""{"x":{"y":[1,2,3]}}""")
a.["x"] // this is a JsonNode
a.["x"].AsObject() // this returns a JsonObject
a.["x"].AsObject() |> Seq.map (fun x -> printfn "%A" x) // iterate over properties of the object
a.["x"].ToJsonString() // you can serialize subsection of the json
// {"y":[1,2,3]}

JsonNode.DeepEquals(x, a) // comparison

System.Text.Json.JsonDocument

open System
open System.Text.Json

let jsonString = """{
    "PropertyName1" : "dummyValue",
    "PropertyName2" : 42,
    "PropertyName3" : "2024-12-29T10:31:36.3774099+01:00",
    "PropertyName4" : {"NestedProperty" : 42},
    "PropertyName5" : [
        42,
        11
    ]
}"""

use x = JsonDocument.Parse(jsonString) // remember this is an IDisposable
x.RootElement.GetProperty("PropertyName1").GetString()
x.RootElement.GetProperty("PropertyName2").GetInt32()
x.RootElement.GetProperty("PropertyName3").GetDateTime()
x.RootElement.GetProperty("PropertyName4").GetProperty("NestedProperty").GetInt32()
x.RootElement.GetProperty("PropertyName5").EnumerateArray() |> Seq.map (fun x -> x.GetInt32())

for i in x.RootElement.GetProperty("PropertyName5").EnumerateArray() do
    printfn "%A" i

// you could also write a generic helper

type JsonElement with
  member x.Get<'T>(name:string) : 'T =
    let p = x.GetProperty(name)
    match typeof<'T> with
    | t when t = typeof<string> -> p.GetString() |> unbox
    | t when t = typeof<int> -> p.GetInt32() |> unbox
    | t when t = typeof<DateTime> -> p.GetDateTime() |> unbox
    | t when t = typeof<JsonElement> -> p |> unbox
    | t when t = typeof<int[]> -> p.EnumerateArray() |> Seq.map (fun x -> x.GetInt32()) |> Seq.toArray |> unbox
    | _ -> failwith "unsupported type"

x.RootElement.Get<string>("PropertyName1")

F# types and json serialization

open System.Text.Json

// Record - OK
type DummyRecord = {
    Text: string
    Num:  int
    }

let r = { Text = "asdf"; Num = 1 }

JsonSerializer.Serialize(r) |> JsonSerializer.Deserialize<DummyRecord>

let tuple = (42, "asdf")
JsonSerializer.Serialize(tuple) |> JsonSerializer.Deserialize<int * string>

type TupleAlias = int * string
let tuple2 = (43, "sfdg") : TupleAlias
JsonSerializer.Serialize(tuple2) |> JsonSerializer.Deserialize<TupleAlias>

// Discriminated Union :(
type SampleDiscriminatedUnion =
    | A of int
    | B of string
    | C of int * string
let x = A 1
JsonSerializer.Serialize(x) // eeeeeeeeeeeeee !

// Option - OK
JsonSerializer.Serialize(Some 42) |> JsonSerializer.Deserialize<int option>
JsonSerializer.Serialize(None) |> JsonSerializer.Deserialize<int option>
open System
type RecordTest2 = {
    Timestamp: DateTimeOffset
    Level: string
    TestOp: int option
    }

// Discriminated Union is supported in FSharp.Json
// https://github.com/fsprojects/FSharp.Json
#r "nuget: FSharp.Json"
open FSharp.Json
let data = C (42, "The string")
let json = Json.serialize data
// val json: string = "{
//   "C": [
//     42,
//     "The string"
//   ]
// }

let deserialized = Json.deserialize<SampleDiscriminatedUnion> json
// val deserialized: SampleDiscriminatedUnion = C (42, "The string")

More on FSharp.Data's JsonValue

#r "nuget:FSharp.Data"
open FSharp.Data

let j = JsonValue.Parse("""{"x":{"y":[1,2,3]}}""")
j.Properties()
// val it: (string * JsonValue) array =
//   [|("x", {
//   "y": [
//     1,
//     2,
//     3
//   ]
// })|]
j.["x"].["y"].AsArray()
j.TryGetProperty "x"

// JsonValue is a discriminated union
// union JsonValue =
//   | String  of string
//   | Number  of decimal
//   | Float   of float
//   | Record  of properties: (string * JsonValue) array
//   | Array   of elements: JsonValue array
//   | Boolean of bool
//   | Null
//
// docs:
// https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-jsonvalue.html
// https://fsprojects.github.io/FSharp.Data/library/JsonValue.html <- if you'll be working with JsonValue read this
//
// there are also extension methods:
// https://fsprojects.github.io/FSharp.Data/reference/fsharp-data-jsonextensions.html
//
// AsArray doesn't fail if the value is not an array, as opposed to other AsSth methods
// See below how extension methods are defined
// source: https://github.com/fsprojects/FSharp.Data/blob/main/src/FSharp.Data.Json.Core/JsonExtensions.fs
open System.Globalization
open System.Runtime.CompilerServices
open System.Runtime.InteropServices
open FSharp.Data.Runtime
open FSharp.Core

[<Extension>]
type JsonExtensions =
    /// Get all the elements of a JSON value.
    /// Returns an empty array if the value is not a JSON array.
    [<Extension>]
    static member AsArray(x: JsonValue) =
        match x with
        | (JsonValue.Array elements) -> elements
        | _ -> [||]

    /// Get a number as an integer (assuming that the value fits in integer)
    [<Extension>]
    static member AsInteger(x, [<Optional>] ?cultureInfo) =
        let cultureInfo = defaultArg cultureInfo CultureInfo.InvariantCulture

        match JsonConversions.AsInteger cultureInfo x with
        | Some i -> i
        | _ ->
            failwithf "Not an int: %s"
            <| x.ToString(JsonSaveOptions.DisableFormatting)

// construct a json object
let d =
    JsonValue.Record [|
        "event",      JsonValue.String "asdf"
        "properties", JsonValue.Record [|
            "token",       JsonValue.String "tokenId"
            "distinct_id", JsonValue.String "123123"
        |]
    |]

d.ToString().Replace("\r\n", "").Replace(" ", "")

// if you want to process the json object
for (k, v) in d.Properties() do
    printfn "Property: %s" k
    match v with
    | JsonValue.Record props -> printfn "\t%A" props
    | JsonValue.String s     -> printfn "\t%A" s
    | JsonValue.Number n     -> printfn "\t%A" n
    | JsonValue.Float f      -> printfn "\t%A" f
    | JsonValue.Array a      -> printfn "\t%A" a
    | JsonValue.Boolean b    -> printfn "\t%A" b
    | JsonValue.Null         -> printfn "\tnull"

Serialize straight to UTF-8

JsonSerializer.SerializeToUtf8Bytes(value, options) <- why does this one exist?

Strings in .NET are stored in memory as UTF-16, so if you don't need the intermediate string you can use this method and serialize straight to UTF-8 bytes (it's 5-10% faster, see link) https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/how-to#serialize-to-utf-8
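For example, a minimal sketch (the record shape is just an example):

```fsharp
open System.Text
open System.Text.Json

type Entry = { Level: string }

// serialize straight to UTF-8 - no intermediate UTF-16 string is allocated
let bytes = JsonSerializer.SerializeToUtf8Bytes({ Level = "INFO" })

// the bytes are exactly what you'd get by serializing to a string and encoding it yourself
let viaString = JsonSerializer.Serialize({ Level = "INFO" }) |> Encoding.UTF8.GetBytes

bytes = viaString // true - F# compares arrays structurally
```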

https://stu.dev/a-look-at-jsondocument/

https://blog.ploeh.dk/2023/12/18/serializing-restaurant-tables-in-f/

https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-apis/?ref=stu.dev - a post from when they introduced the new json API

TODO for myself - watch these maybe

<3 regex

https://regex101.com/r/RdCR7j/1 - set the global flag (g) to get all matches

https://www.debuggex.com/ - haven't played with this a lot but I might give it a try, looks like a decent learning tool

regex - use static Regex.Matches() or instantiate Regex()?

By default, use the static methods.

The .NET regex engine caches regexes used through the static methods (15 by default).

Do you use more than 15 regexes, use them frequently, are they complex, and do you care about performance?

Then investigate Regex() instances and RegexOptions.Compiled (on .NET 7+ there is also the [GeneratedRegex] source generator).

Test performance before you optimize.
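Both options in one sketch (Regex.CacheSize and RegexOptions.Compiled are the real APIs; the date pattern is just an example):

```fsharp
open System.Text.RegularExpressions

// the static Regex methods cache their compiled patterns - 15 by default
printfn "default cache size: %d" Regex.CacheSize
Regex.CacheSize <- 30 // can be raised if you hit the static methods with many patterns

// holding an instance keeps its pattern alive regardless of the cache;
// RegexOptions.Compiled trades slower construction for faster matching
let isoDate = Regex(@"^\d{4}-\d{2}-\d{2}$", RegexOptions.Compiled)
isoDate.IsMatch("2024-12-29") // true
```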

https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-regex#static-regular-expressions

https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regexoptions?view=net-9.0

What is the whole fuss about backtracking?

Microsoft's documentation does a bad job of explaining backtracking.

Read about backtracking here - https://www.regular-expressions.info/catastrophic.html

To experience backtracking yourself - https://regex101.com/r/1rWKNN/1 - keep adding "x" to the input and watch the execution time increase - with 35 "x"s it takes 5 seconds for the regex to find out it doesn't match!
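A defensive sketch in .NET: pass a timeout so a pathological pattern/input pair can't hang your process (whether this particular pair actually blows up depends on the engine optimizations in your .NET version):

```fsharp
open System
open System.Text.RegularExpressions

// (x+x+)+y can split a run of x's between the two x+ groups in exponentially many
// ways; on a non-match a naive backtracking engine tries every one of them
let evil = String.replicate 35 "x"

let matched =
    try
        Regex.IsMatch(evil, "(x+x+)+y", RegexOptions.None, TimeSpan.FromMilliseconds 100.0)
        |> Some
    with :? RegexMatchTimeoutException -> None // gave up instead of hanging

// matched is Some false (the engine shortcut the search) or None (it timed out)
```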

Code

These are the methods you need:

open System
open System.Text.RegularExpressions


Regex.Matches("input", "pattern")
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline)
Regex.Matches("input", "pattern", RegexOptions.IgnoreCase ||| RegexOptions.Singleline, TimeSpan.FromSeconds(10.)) // you can use a timeout to prevent a DoS attack with malicous inputs
Regex.Match()
Regex.IsMatch()
Regex.Replace()
Regex.Split()
Regex.Count()

let r = new Regex("pattern") // instance Regex offers the same methods
r.Matches("input")
Regex class - https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex?view=net-9.0

Sample:

let matches = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)o")
matches |> Seq.iter (fun x -> printfn "%s" x.Value)
matches |> Seq.iter (fun x -> printfn "%A" x.Groups)
matches.[0].Groups.[1].Value |> printfn "%s"

// Lo             // these are the whole matches
// do             //
// lo             //
// co             //
// seq [Lo; L]    // group 0 is the whole match, group 1 is the (\w)
// seq [do; d]    //
// seq [lo; l]    //
// seq [co; c]    //
// L              // this is the letter captured by (\w)

let matches2 = Regex.Matches("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "(\w)+o")
matches2.[1].Groups.[1].Value |> printfn "%A"
matches2.[1].Groups.[1].Captures |> Seq.iter (fun c -> printfn "%s" c.Value)
// l              // gotcha! the value of the group is the last thing captured by that group
// d              // here the (\w)+ group captures 3 times
// o              //
// l              //
Match object properties:

Property      | Type   | on match  | on no match
Match.Success | bool   | true      | false
Match.Value   | string | the match | String.Empty

let match3 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[a-z ]+i")
match3.Success |> printfn "%A"
match3.Value   |> printfn "%A"
// true
// "Lorem ipsum dolor si"

let match4 = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "Lorem i[A-Z ]+i")
match4.Success            |> printfn "%A"
match4.Value              |> printfn "%A"
match4.Groups.Count       |> printfn "%A"
match4.Groups.[0].Success |> printfn "%A"
// false
// ""    // notice this is String.Empty not <null>
// 1     // even for a failed match there is always at least one group
// false

let mutable m = Regex.Match("Lorem ipsum dolor sit amet, consectetur adipiscing elit", "\wo")
while m.Success do
    printfn "%s" m.Value
    m <- m.NextMatch()

let lines = [
    "The next day the children were ready to go to the plum thicket in the"
    "peach orchard as soon as they had their breakfast, but while they were"
    "talking about it a new trouble arose. It grew out of a question asked by"
    "Drusilla."
]

lines
|> List.filter (fun line -> Regex.IsMatch(line, "the"))
|> List.map    (fun line -> Regex.Replace(line, "(\w+) the", "the $1"))

let text =
    "don't we all love\n" +
    "dealing with different\r\n" +
    "line endings\n" +
    "it's so much fun"
Regex.Split(text, "\r?\n")
|> Array.iter (printfn "%s")

open System.Net.Http
let book = (new HttpClient()).GetStringAsync("https://www.gutenberg.org/cache/epub/74886/pg74886.txt").Result
Regex.Count(book, "[^\w]\w{3}[^\w]") |> printfn "%d" // count 3 letter words

regex - Quick Reference (Microsoft)

https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

Cheat sheet

Character escapes

\t     match a tab \u0009
\r     match a carriage return \u000D
\n     match a new line \u000A
\unnnn match a unicode character by its hexadecimal representation, exactly 4 digits
\.     match a dot (not any character), i.e. match it literally
\*     match an asterisk (don't interpret * as a regex quantifier)
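A quick fsx-style sanity check of the escapes above (my examples, not from the quick reference):

```fsharp
open System.Text.RegularExpressions

// \u0041 is "A", \u0009 is a tab - characters can be matched by their hex codes
Regex.IsMatch("A", @"\u0041")      |> printfn "%b" // true
Regex.IsMatch("a\tb", @"a\u0009b") |> printfn "%b" // true
// \. matches a literal dot
Regex.Replace("a.b.c", @"\.", "-") |> printfn "%s" // a-b-c
```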

Character classes

[character_group]       [ae] will match the "a" in "gray"
[^character_group]      negation - match any character not in the group
[a-z] [A-Z] [a-z0-9A-Z] character ranges
.                       wildcard - any character except \n (unless the Singleline option is used)
\w                      word character - letters, digits and the underscore
\W                      non word character
\s                      white-space character
\S                      non whitespace character
\d                      digit
\D                      non digit
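A quick demo of the classes above (my example string):

```fsharp
open System.Text.RegularExpressions

// \d+ pulls out runs of digits
Regex.Matches("user_42 logged in at 09:15", @"\d+")
|> Seq.map (fun m -> m.Value)
|> List.ofSeq
|> printfn "%A" // ["42"; "09"; "15"]

// \w also matches the underscore
Regex.Match("user_42", @"\w+").Value |> printfn "%s" // user_42
```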

Anchors

^   $ beginning and end of a string (in multiline mode beginning and end of a line)
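To see the multiline behavior in action (my example):

```fsharp
open System.Text.RegularExpressions

let text = "one\ntwo"
// default - ^ anchors only at the very beginning of the string
Regex.Matches(text, @"^\w+").Count                        |> printfn "%d" // 1
// Multiline - ^ anchors at the beginning of every line
Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count |> printfn "%d" // 2
```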

Grouping

(subexpression)               (\w)\1 - match a character and the same character again - "aa" in "xaax"
(?<name>subexpression)        named group (?<double>\w)\k<double> - same as above
(?:subexpression)             noncapturing group - Write(?:Line)? - will match both Write and WriteLine in a string
                              (?:Mr\. |Ms\. |Mrs\. )?\w+\s\w+ -> match first name, last name and an optional preceding title
(?imnsx-imnsx: subexpression) turn options on or off for a group
(?=subexp)                    zero-width positive lookahead assertion
(?!subexp)                    negative lookahead
(?<=subexp)                   positive lookbehind
(?<!subexp)                   negative lookbehind
                              lookaheads check what follows, lookbehinds check what precedes (but don't match it, ie. don't consume the characters)
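A short demo of the constructs above - named groups and a lookahead (my example strings):

```fsharp
open System.Text.RegularExpressions

// named groups - the optional title is a noncapturing group
let m = Regex.Match("Mr. John Smith", @"(?:Mr\. |Ms\. |Mrs\. )?(?<first>\w+)\s(?<last>\w+)")
m.Groups.["first"].Value |> printfn "%s" // John
m.Groups.["last"].Value  |> printfn "%s" // Smith

// lookahead - match digits only when "kg" follows, without consuming it
Regex.Match("weight: 75kg", @"\d+(?=kg)").Value |> printfn "%s" // 75
```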

Quantifiers

*     0...n (all these are greedy by default -> match as many as possible)
+     1...n
?     0...1
{n}   exactly n
{n,}  at least n
{n,m} n...m
*?
+?
??
{n,}?
{n,m}? question mark makes the match nongreedy (match as few as possible)
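The classic greedy vs lazy gotcha (my example):

```fsharp
open System.Text.RegularExpressions

let html = "<b>bold</b> and <i>italic</i>"
// greedy - .* grabs as much as possible, so the match runs to the last >
Regex.Match(html, "<.*>").Value  |> printfn "%s" // <b>bold</b> and <i>italic</i>
// lazy - .*? stops at the first >
Regex.Match(html, "<.*?>").Value |> printfn "%s" // <b>
```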

Backreference

\number   match the value of a previous subexpression - (\w)\1 - matches the same \w character twice
\k<name>  backreference using group name
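Both backreference forms side by side (my example string):

```fsharp
open System.Text.RegularExpressions

// (\w)\1 - a word character followed by the same character again
Regex.Match("bookkeeper", @"(\w)\1").Value        |> printfn "%s" // oo
// the same with a named group and \k<name>
Regex.Match("bookkeeper", @"(?<c>\w)\k<c>").Value |> printfn "%s" // oo
```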

Alternation Constructs

| - any element separated by | - th(e|is|at) and the|this|that both match "the" "this" "that"
    ala|ma|kota - match "ala" or "ma" or "kota"
    ala ma (kota|psa) - match "ala ma kota" or "ala ma psa"
(?(expression)yes|no)         conditional - match yes if expression matches, otherwise match no
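The conditional construct `(?(expression)yes|no)` can, for example, require a closing paren only when an opening one was captured (my example, not from the quick reference):

```fsharp
open System.Text.RegularExpressions

// require ")" only if "(" was captured by the "open" group
let pattern = @"(?<open>\()?\d+(?(open)\))"
Regex.Match("(42)", pattern).Value |> printfn "%s" // (42)
Regex.Match("42", pattern).Value   |> printfn "%s" // 42
```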

Substitution

$number use numbered group
${name} use named group
$$      literal $
$&      whole match
$`      text before the match
$'      text after the match
$+      last group
$_      entire input string
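The substitution patterns above in action with Regex.Replace (my examples):

```fsharp
open System.Text.RegularExpressions

// $1 $2 - numbered groups; ${name} - named groups; $$ - literal $; $& - whole match
Regex.Replace("John Smith", @"(\w+) (\w+)", "$2 $1")              |> printfn "%s" // Smith John
Regex.Replace("John Smith", @"(?<f>\w+) (?<l>\w+)", "${l}, ${f}") |> printfn "%s" // Smith, John
Regex.Replace("price: 10", @"\d+", "$$$&")                        |> printfn "%s" // price: $10
```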

Inline options

(?imnsx-imnsx)               use it like this at the beginning
(?imnsx-imnsx:subexpression) use for a group
i                            case insensitive
m                            multiline - ^ and $ match at the beginning and end of each line
n                            do not capture unnamed groups
s                            single line - . matches \n too
More options are available via the RegexOptions enum
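For instance, `(?i)` applies to the whole pattern while `(?i:...)` is scoped to a group (my examples):

```fsharp
open System.Text.RegularExpressions

// whole pattern case insensitive
Regex.IsMatch("LOREM IPSUM", "(?i)lorem")        |> printfn "%b" // true
// only "ipsum" is matched case-insensitively
Regex.IsMatch("lorem IPSUM", "lorem (?i:ipsum)") |> printfn "%b" // true
Regex.IsMatch("LOREM ipsum", "lorem (?i:ipsum)") |> printfn "%b" // false
```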

Practice regex

https://regex101.com/quiz

https://regexcrossword.com/

https://alf.nu/RegexGolf

Tutorial

I recall reading this tutorial years ago and I liked it - https://www.regular-expressions.info/tutorial.html

Misc

https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

I love regex.

However, I used to say "if you solve a problem with regex, now you have 2 problems".

Not knowing how this quote came to be, I repeated it for years. I'll smack the next person who repeats this quote without elaborating.

If regex did not exist, it would be necessary to invent it.

Why does .Matches() return a custom collection instead of List<Match>?

Historic reasons. Regex was added in .NET 1.0, before generics were a thing.

https://github.com/dotnet/runtime/discussions/74919

I used (?<!\[.*?)(?<!\(")https?://\S+ with replace [$&]($&) to linkify links in this post

My lovely regex helpers

let regexExtract  regex                      text = Regex.Match(text, regex).Value
let regexExtractg regex                      text = Regex.Match(text, regex).Groups.[1].Value
let regexExtracts regex                      text = Regex.Matches(text, regex) |> Seq.map (fun x -> x.Value)
let regexReplace  regex (replacement:string) text = Regex.Replace(text, regex, replacement)
let regexRemove   regex                      text = Regex.Replace(text, regex, String.Empty)
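Typical usage looks like this (hypothetical input string; two of the helpers redefined so the snippet stands alone):

```fsharp
open System
open System.Text.RegularExpressions

let regexExtractg regex text = Regex.Match(text, regex).Groups.[1].Value
let regexRemove   regex text = Regex.Replace(text, regex, String.Empty)

"order #4521 shipped" |> regexExtractg @"#(\d+)"   |> printfn "%s" // 4521
"order #4521 shipped" |> regexRemove   @" shipped" |> printfn "%s" // order #4521
```

The helpers take the text as the last argument so they pipe nicely with `|>`.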

PowerShell "Oopsie"

Task - remove a specific string from each line of multiple CSV files.

This task was added to the scripting exercise list.

First - let's generate some CSV files to work with:

$numberOfFiles = 10
$numberOfRows = 100

$fileNames = 1..$numberOfFiles | % { "file$_.csv" }
$csvData = 1..$numberOfRows | ForEach-Object {
    [PSCustomObject]@{
        Column1 = "Value $_"
        Column2 = "Value $($_ * 2)"
        Column3 = "Value $($_ * 3)"
    }
}

$fileNames | % { $csvData | Export-Csv -Path $_ }

The "Oopsie"

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file $_ -Append }

This command will never finish. Run it for a moment (and then kill it), see the result, and try to figure out what happens. Explanation below.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

The explanation

Get-Content (aka. cat) keeps the file open and reads the content that our command is appending, thus creating an infinite loop.

The fix

There are many ways to fix this "oopsie".

Perhaps the simplest one is to not read from and write to the exact same file. A sensible rule: when processing files, always write to a different file:

ls *.csv | % { cat $_ | % { $_ -replace "42","" } | out-file -path "fixed$($_.Name)" }

Knowing the reason for our command hanging we can make sure the whole file is read before we overwrite it:

ls *.csv | % { (cat $_ ) | % { $_ -replace "42","" } | out-file $_ }
ls *.csv | % { (cat $_ ) -replace "42","" | out-file $_ } # we can also use -replace as an array operator

I'm amazed by GitHub Copilot's answer to "powershell one liner to remove a specific text from multiple CSV files":

Get-ChildItem -Filter "*.csv" | ForEach-Object { (Get-Content $_.FullName) -replace "string_to_replace", "replacement_string" | Set-Content $_.FullName }

How the way I work/code/investigate/debug changed with time & experience

I use this metaphor when describing how I work these days.

TL;DR

  1. Quick feedback is king
    • unit tests
    • quick test in another way
    • reproduce issues locally
    • try things out in a small project on the side, not in the project you're working on
  2. One thing at a time
    • experimenting
    • refactoring preparing for feature addition
    • feature coding
    • cleaning up after feature coding
  3. Divide problems into smaller problems
    • and remember - one thing (problem) at a time

Example

You're working with code that talks to a remote API, you want to test different API calls to the remote API.

don't - change API parameters in code and run the project each time you test something. It takes too long.

do - write a piece of code to send an HTTP request, fiddle with this code

do - intercept request with Fiddler/Postman/other interceptor and reissue requests with different parameters


Example

Something fails in the CI pipeline.

don't - make a change, commit, wait for remote CI to trigger, see result

do - reproduce issue locally


Longer read

  1. Quick feedback
    • do - write a test for it
    • do - isolate your issue/suspect/the piece of code you're working with
      • it is helpful if you can run just a module/sub-system/piece of your project/system
      • partial execution helps - like in Python/Jupyter or F# fsx
    • if you rely on external data and it takes time to retrieve it (even a 5-second delay can be annoying) - dump the data to a file and read it from the file instead of hitting an external API or a DB every time you run your code
    • don't try to understand how List.foldBack() works while debugging a big project. Do it on the side.
    • spin up a new solution/project on the side to test things
    • occasionally juniors ask "does this work this way?" - you can easily test it yourself if you do it on the side
  2. One thing at a time
    • separate refactoring from feature addition
    • fiddle first, find the walls/obstacles
    • git reset --hard
    • refactor preparing for a new feature (can become a separate PR)
    • code the feature
    • if during coding you find something that needs refactoring/renaming/cleaning up - any kind of "WTF is this? I need to fix this!" - try a) or b)
      • a) make a note to fix it later
      • b) fix it immediately:
        > git stash
        > git checkout master
        > git checkout -b fix-typo
        fix stuff
        merge or create a PR
        > git checkout feature
        > git merge fix-typo or git rebase fix-typo
        continue work
    • always have a paper notepad on your desk
      • note things you would like to come back to or investigate
      • it gives me great satisfaction to go through a list of "side quests" I have noted and strike through all of them, knowing I have dealt with each one before starting a new task
      • when investigating something I also note questions I would like to be able to answer after I'm done investigating
        • example: while working with Axios and cookies I found conflicting information about whether Axios supports cookies. After the investigation, I knew that Axios supports cookies by default in a browser but not in Node.js
  3. Divide problems into smaller problems
    • example - coding logic for a new feature in a CLI tool and designing the CLI arguments - these can be 2 sub-tasks

Big bang vs baby steps

The old me often ended up doing the big bang: rewriting large chunks of code at once, starting things from scratch, working for hours or days with a codebase that can't even compile.

Downsides:
  • for a long time the project doesn't even compile - I lose motivation, I feel like I'm walking in the dark, and I don't see errors for a long time
  • it requires keeping a lot of context in my mind since I've ripped the project apart - if I abandon the work for a few days I sometimes forget everything and progress is lost

The new me prefers baby steps

Fiddle with the code knowing I'll git reset --hard. Try renaming some stuff - it helps me understand the codebase better. Try out different things and abandon them. At this point I usually get an idea/feeling of what needs to be done. Plan a few smaller refactorings. After them I am usually closer to the solution and am able to code it without a big bang.