2025

Some neat fsx F#

My company had a hackathon focused on data scraping/processing.

Each team had to scrape 3 endpoints. I came up with something similar to this:

open System
open System.Net.Http
open System.Text

let c = new HttpClient()
c.Timeout <- TimeSpan.FromSeconds(5.0)

let lockObject = new obj()
let printSync text =
    let now = DateTimeOffset.Now.ToString("O")
    lock lockObject (fun _ -> printfn "[%s] %s" now text)

let s = new HttpClient()
s.Timeout <- TimeSpan.FromSeconds(5.0)
s.DefaultRequestHeaders.Add("X-Sender", "this is me, Mario!")
let sendToDestination stream response = async {
    let template = """{
    "CreatedAt": "xxXCreatedAtXxx",
    "Stream": "xxXStreamXxx",
    "Data": [
        xxXDataXxx
    ]
}"""
    let payload = template.Replace("xxXCreatedAtXxx", DateTimeOffset.Now.ToString("O"))
                          .Replace("xxXStreamXxx", stream)
                          .Replace("xxXDataXxx", response)
    let! response = s.PostAsync("http://localhost:8080", new StringContent(payload, Encoding.UTF8, "application/json") ) |> Async.AwaitTask
    sprintf "%s done sending response code %A" stream response.StatusCode |> printSync
}

let scraper (url:string) stream = async {
    while true do
        try
            let! response = c.GetStringAsync(url) |> Async.AwaitTask
            do! sendToDestination stream response
            sprintf "scraped %40s sendTo %s" url stream |> printSync
        with
        | _ -> sprintf "failed to scrape/or send %40s" url |> printSync

        do! Async.Sleep 1000
}

let urls = [
    "https://jsonplaceholder.typicode.com/posts", "123"
    "https://jsonplaceholder.typicode.com/posts", "124"
    "https://jsonplaceholder.typicode.com/posts", "125"
]

urls
|> List.map (fun (url, stream) -> scraper url stream)
|> Async.Parallel
|> Async.Ignore
|> Async.Start

// To stop all scrapers, run: Async.CancelDefaultToken()

Things to keep in mind:

  • always wrap the body of every async workflow/task/thread in a catch-all try/with
    • you don't want your thread to die without you knowing
  • always set a timeout when scraping (the default HttpClient timeout in .NET is 100 s, which is excessive for this script)

A minimalistic http server to listen to our scrapers:

open System.Net
open System.Text

// https://sergeytihon.com/2013/05/18/three-easy-ways-to-create-simple-web-server-with-f/
// run with `fsi --load:ws.fsx`
// visit http://localhost:8080

let host = "http://localhost:8080/"

let listener (handler:(HttpListenerRequest->HttpListenerResponse->Async<unit>)) =
    let hl = new HttpListener()
    hl.Prefixes.Add host
    hl.Start()
    let task = Async.FromBeginEnd(hl.BeginGetContext, hl.EndGetContext)
    async {
        while true do
            let! context = task
            Async.Start(handler context.Request context.Response)
    } |> Async.Start

listener (fun req response ->
    async {
        response.ContentType <- "text/html"
        let bytes = Encoding.UTF8.GetBytes("thanks!")
        response.OutputStream.Write(bytes, 0, bytes.Length)
        response.OutputStream.Close()
    })

PowerShell Gotcha! - dynamic scoping

PowerShell uses dynamic scoping. Yet the about_Scopes page doesn't mention the word "dynamic".

Wird (wird - so weird that you need to misspell weird to get your point across).

tl;dr;

In PowerShell a function can read the variables of its caller: it behaves as if they were copied into the stack frame created for the function you're calling. So the "child" function can use your variables but can only modify its own copies. You can avoid this by making your variable private ($private:varName = ...) and using Set-StrictMode -Version Latest to throw an error if "child" functions try to access an undefined variable.

PowerShell uses dynamic scoping. What we know from most programming languages is lexical scoping.


function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
hello

Weird! (this is dynamic scoping)


Set-StrictMode -Version Latest
function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
Set-StrictMode -Off # remember to turn strict mode off for further testing
hello

Weird! (But it makes sense: in PowerShell's world this is perfectly legal, hence "strict" changes nothing here.)


function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $private:t = "hello"
    Do-InnerFunction
}

Do-OutterFunction

Output is empty. No errors but at least $t behaves more like a variable we know from C#/F#.


Set-StrictMode -Version Latest
function Do-InnerFunction  { Write-Host $t }
function Do-OutterFunction {
    $private:t = "hello"
    Do-InnerFunction
}

Do-OutterFunction
InvalidOperation: C:\Users\...\Temp\44f5ff41-4105-482b-a134-b505049d2c61\test3.ps1:2
Line |
   2 |      Write-Host $t
     |                 ~~
     | The variable '$t' cannot be retrieved because it has not been set.

Finally!


function Do-InnerFunction {
    Write-Host $t
    $t = "world"
    Write-Host $t
}

function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
    Write-Host $t
}

Do-OutterFunction
hello
world
hello

Ah! So variables are copied to the next "scope".
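Strictly speaking nothing is copied: the callee looks a variable up through the chain of calling scopes, and a plain assignment creates a new local variable that shadows the caller's one. A minimal sketch of that, reusing the function names from above: Set-Variable with -Scope 1 targets the parent (calling) scope explicitly, so the caller's $t does change. Write-Output is used instead of Write-Host only so that the result can be captured in $result (a name introduced here for illustration).

```powershell
function Do-InnerFunction {
    # A plain `$t = "world"` would create a new local $t in this scope.
    # -Scope 1 writes to the parent (calling) scope instead.
    Set-Variable -Name t -Value "world" -Scope 1
}

function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
    Write-Output $t
}

$result = Do-OutterFunction
$result   # prints "world"
```

If $t were declared as $private:t in Do-OutterFunction, the callee could no longer see it at all, which is the behaviour shown in the earlier examples.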


function Do-InnerFunction {
    Write-Host $t
    $global:t = "world"
    Write-Host $t
}

function Do-OutterFunction {
    $t = "hello"
    Do-InnerFunction
    Write-Host $t
}

Do-OutterFunction
Write-Host $t
hello
hello
hello
world

Now we have created a global $t variable.

This post explains it nicely: https://ig2600.blogspot.com/2010/01/powershell-is-dynamically-scoped-and.html
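If you want something closer to lexical scoping, a script block can snapshot the variables of the scope it was created in by calling GetNewClosure(). A minimal sketch for comparison; New-Greeter, New-DynamicGreeter, $captured and $dynamic are hypothetical names used only for this illustration:

```powershell
function New-Greeter {
    $t = "hello"
    # GetNewClosure() binds the current value of $t to the script block
    return { Write-Output $t }.GetNewClosure()
}

function New-DynamicGreeter {
    $t = "hello"
    # No closure: $t is resolved at call time, in the caller's scope chain
    return { Write-Output $t }
}

$t = "world"
$captured = & (New-Greeter)          # "hello" - the value seen where the block was defined
$dynamic  = & (New-DynamicGreeter)   # "world" - the value seen where the block is called
```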

Post22

In a hut in the woods sits a man
He speaks to no one, for he is alone
Heavy thoughts, his head sunk in illness
Soon he will hang over the grave
Memories tangled up in themselves
He sits, he blinks, he scolds his own head
Soon it will be time to sleep
And still, oh Mother of God, where is his bed?
He can't help it, for he is still sitting outside
Little bugs and toothpicks piss him off
He would like to run away like a hare
What advice is there for this man?
I think it's a miracle he made it this far
He had roads aplenty
What to do? Who will help?
Mother of God, he is still sitting outside
He sits, he blinks
Puffs his pipe
The tobacco is weak
He has already gone completely gray

Post21

blood stains in the snow
you left a few
jumping home
for me to remember the last walk
snow will melt soon
this memory
I will not let fade
you were loved
and you loved us too
of that I'm sure
it's a tough call
to let you sleep
don't fear wherever you go
remember wide beaches
you used to love
we will be there someday too

Plamy krwi na śniegu
zostawiłaś kilka,
wracając do domu,
abym zapamiętał ostatni spacer.
Śnieg wkrótce stopnieje,
ale to wspomnienie
nie pozwolę mu odejść.
Byłaś kochana
i kochałeś też nas,
tego jestem pewien.
To trudna decyzja,
pozwolić ci zasnąć.
Nie bój się, dokądkolwiek zmierzasz,
pamiętaj o szerokich plażach,
które tak kochałaś.
Czekaj tam na nas,
W końcu przyjdziemy.

Environment variable

but only in a specific directory

The idea - use the Prompt function to check if you're in a specific dir and set/unset an env var:

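function Prompt {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        # Don't print an error on every prompt when the variable is already unset
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction SilentlyContinue
    }
    # Return a prompt string; otherwise PowerShell falls back to a bare "PS>"
    "PS $currentDir> "
}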

Extract the special env setting/unsetting to a function:

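function SetOrUnSet-DirectoryDependent-EnvironmentVariables {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        # Stay quiet when the variable is already unset
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction SilentlyContinue
    }
}

function Prompt {
    SetOrUnSet-DirectoryDependent-EnvironmentVariables
    # Return a prompt string; otherwise PowerShell falls back to a bare "PS>"
    "PS $(Get-Location)> "
}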

If your Prompt function is already overridden, e.g. by oh-my-posh:

function SetOrUnSet-DirectoryDependent-EnvironmentVariables {
    $currentDir = Get-Location
    if ("C:\git\that-special-dir" -eq $currentDir) {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction SilentlyContinue
    }
}

$promptFunction = (Get-Command Prompt).ScriptBlock

function Prompt {
    SetOrUnSet-DirectoryDependent-EnvironmentVariables
    $promptFunction.Invoke()
}
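The -eq comparison above matches only the directory itself, not its subfolders. If the variable should also be set while you are somewhere below that directory, a prefix check could be used instead. This is a sketch under assumptions: Test-IsInsideDirectory is a hypothetical helper name, and the comparison is case-insensitive and treats "\" and "/" alike, which fits Windows paths.

```powershell
# Hypothetical helper: true when $Current is $Root itself or any folder below it
function Test-IsInsideDirectory {
    param(
        [string]$Current,
        [string]$Root
    )
    # Normalize separators and append one, so "C:\git\foo" does not match "C:\git\foobar"
    $normalizedRoot    = $Root.Replace('\', '/').TrimEnd('/') + '/'
    $normalizedCurrent = $Current.Replace('\', '/').TrimEnd('/') + '/'
    return $normalizedCurrent.StartsWith($normalizedRoot, [StringComparison]::OrdinalIgnoreCase)
}

function SetOrUnSet-DirectoryDependent-EnvironmentVariables {
    $currentDir = (Get-Location).Path
    if (Test-IsInsideDirectory -Current $currentDir -Root "C:\git\that-special-dir") {
        $env:THAT_SPECIAL_ENV_VAR = "./extra.cer"
    }
    else {
        Remove-Item Env:\THAT_SPECIAL_ENV_VAR -ErrorAction SilentlyContinue
    }
}
```

Keep in mind that a relative value like "./extra.cer" is resolved against the current directory, so an absolute path may be the safer choice once subfolders are included.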

Why did I need this?

In a repository with several JS scrapers run by Node, a few scrape data from misconfigured websites. These websites don't serve the intermediate certificate for HTTPS. Your browser fills in the gap automatically for convenience, but a simple HTTP client like axios will rightfully reject the connection because it can't verify who it is talking to (see more here).

Solution?

Use NODE_EXTRA_CA_CERTS

  • You configure your production server with NODE_EXTRA_CA_CERTS.
  • When testing locally you get tired of remembering to set NODE_EXTRA_CA_CERTS.
  • You add NODE_EXTRA_CA_CERTS to your PowerShell profile. Now every time you run anything using Node (like VS Code) you see
    Warning: Ignoring extra certs from `./extra.cer`, load failed: error:02000002:system library:OPENSSL_internal:No such file or directory
    
  • You get annoyed and ask yourself how to set an environment variable, but only in a specific directory.

I use this myself here -> the public part of my powershell-profile
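For one-off runs there is also a simpler alternative to the Prompt hook: set the variable only for the duration of a single command and restore the previous state afterwards. This is a sketch, not something taken from the profile mentioned above; Invoke-WithEnvironmentVariable is a hypothetical helper name.

```powershell
# Hypothetical helper: run a script block with an environment variable set,
# then put the variable back the way it was.
function Invoke-WithEnvironmentVariable {
    param(
        [string]$Name,
        [string]$Value,
        [scriptblock]$ScriptBlock
    )
    # $null means the variable was not set before
    $previous = [Environment]::GetEnvironmentVariable($Name)
    [Environment]::SetEnvironmentVariable($Name, $Value)
    try {
        & $ScriptBlock
    }
    finally {
        if ($null -ne $previous) {
            [Environment]::SetEnvironmentVariable($Name, $previous)
        }
        else {
            Remove-Item "Env:$Name" -ErrorAction SilentlyContinue
        }
    }
}
```

It could then be called as Invoke-WithEnvironmentVariable -Name NODE_EXTRA_CA_CERTS -Value "./extra.cer" -ScriptBlock { ... } with the Node command inside the script block.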