Major Hayden (https://major.io/)
Recent content on Major Hayden, by major@mhtx.net (Major Hayden). All content licensed [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) 💜
Last updated: Mon, 06 Oct 2025 21:46:10 +0000

# Fun with docling

https://major.io/p/fun-with-docling/ · Fri, 26 Sep 2025 · major@mhtx.net (Major Hayden)

My team at work does a lot with retrieval-augmented generation, or RAG, and parsing documents is really painful. It's so painful that I recently delivered a talk on this very topic at [DevConf.US 2025](https://major.io/p/devconf-rag/)!

Another one of the talks at DevConf.US this year was about [docling](https://docling-project.github.io/docling/). We've been using it on our team for a while, and we really enjoy how it takes challenging documents in various formats and parses them into a single, common document format.

I'll walk you through setting up docling in your project and show you some fun things you can do with it.

# Adding docling to a project

I usually use [uv](https://github.com/astral-sh/uv) for my Python projects. You can start a new project like this:

```bash
pipx install uv
mkdir fun-with-docling
cd fun-with-docling
uv init --package --name doclingfun
```

Now we can add docling to the project.
But first, I prefer to install `torch-cpu` to avoid downloading lots of unnecessary CUDA libraries if I don't need them.

Start by adding a `torch-cpu` source in your `pyproject.toml` file:

```toml
[[tool.uv.index]]
name = "torch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "torch-cpu" }
torchvision = { index = "torch-cpu" }
```

Then add `torch`, `torchvision`, and `docling` to your dependencies:

```bash
uv add torch torchvision docling
```

Verify the installation:

```bash
> uv run docling --version
2025-09-26 11:02:48,742 - INFO - Loading plugin 'docling_defaults'
2025-09-26 11:02:48,743 - INFO - Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Docling version: 2.54.0
Docling Core version: 2.48.2
Docling IBM Models version: 3.9.1
Docling Parse version: 4.5.0
Python: cpython-313 (3.13.3)
Platform: Linux-6.16.8-200.fc42.x86_64-x86_64-with-glibc2.41
```

## Parsing a document

Financial markets are a hobby of mine, so I always enjoy reading research on patterns in the market. There's a [great document](https://arxiv.org/abs/2509.16137) that we can start with. Download the PDF and parse it with docling:

```bash
curl -L -o enhancing_ohlc_data.pdf https://arxiv.org/pdf/2509.16137
> uv run docling enhancing_ohlc_data.pdf --from pdf --to json --output .
2025-09-26 11:07:59,958 - INFO - Loading plugin 'docling_defaults'
2025-09-26 11:07:59,959 - INFO - Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
2025-09-26 11:07:59,964 - INFO - paths: [PosixPath('/tmp/tmp243lux62/enhancing_ohlc_data.pdf')]
2025-09-26 11:07:59,964 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-09-26 11:07:59,971 - INFO - Going to convert document batch...
2025-09-26 11:07:59,971 - INFO - Initializing pipeline for StandardPdfPipeline with options hash f1301fa0db91f613a1f4baa1a2a11518
2025-09-26 11:07:59,973 - INFO - Loading plugin 'docling_defaults'
2025-09-26 11:07:59,974 - INFO - Registered picture descriptions: ['vlm', 'api']
2025-09-26 11:08:00,241 - INFO - Accelerator device: 'cpu'
2025-09-26 11:08:01,278 - INFO - Accelerator device: 'cpu'
2025-09-26 11:08:05,085 - INFO - Accelerator device: 'cpu'
2025-09-26 11:08:05,381 - INFO - Processing document enhancing_ohlc_data.pdf
2025-09-26 11:08:51,604 - INFO - Finished converting document enhancing_ohlc_data.pdf in 51.64 sec.
2025-09-26 11:08:51,604 - INFO - writing JSON output to enhancing_ohlc_data.json
2025-09-26 11:08:51,667 - INFO - Processed 1 docs, of which 0 failed
2025-09-26 11:08:51,672 - INFO - All documents were converted in 51.71 seconds.
```

Success! We now have a JSON file that contains the parsed document. (If you're in a hurry and just want to view the JSON, I've [uploaded it here](enhancing_ohlc_data.json).)
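
As an aside, the CLI isn't limited to JSON output. Asking for Markdown from the same input file should work as well; this is a quick sketch, so adjust the filename and flags to match your own setup:

```bash
# Convert the same PDF straight to Markdown instead of JSON
uv run docling enhancing_ohlc_data.pdf --from pdf --to md --output .
```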

Let's break down how these documents work.

# Groups and texts

Groups are collections of related content. Here's an example:

```json
"groups": [
  {
    "self_ref": "#/groups/0",
    "parent": {
      "$ref": "#/body"
    },
    "children": [
      {
        "$ref": "#/texts/53"
      },
      {
        "$ref": "#/texts/54"
      },
      {
        "$ref": "#/texts/55"
      },
      {
        "$ref": "#/texts/56"
      }
    ],
    "content_layer": "body",
    "name": "list",
    "label": "list"
  },
```

This group has four children, which are called "texts", and the `list` label tells us that this is a list of some sort.

If we look for `#/texts/53`, we can see what the first item in the list is:

```json
{
  "self_ref": "#/texts/53",
  "parent": {
    "$ref": "#/groups/0"
  },
  "children": [],
  "content_layer": "body",
  "label": "list_item",
  "prov": [
    {
      "page_no": 4,
      "bbox": {
        "l": 86.945,
        "t": 669.104,
        "r": 540.004,
        "b": 644.616,
        "coord_origin": "BOTTOMLEFT"
      },
      "charspan": [
        0,
        187
      ]
    }
  ],
  "orig": "\u00b7 Market relevance : VWAP is widely followed by institutional investors, trading algorithms, and benchmark providers [6] [2]. As such, it is a meaningful and valuable quantity to predict.",
  "text": "Market relevance : VWAP is widely followed by institutional investors, trading algorithms, and benchmark providers [6] [2]. As such, it is a meaningful and valuable quantity to predict.",
  "enumerated": false,
  "marker": "\u00b7"
},
```

This is a single line of text from the document, but it's part of a list. You can see the parent relationship back to `#/groups/0`, and the `list_item` label confirms that it belongs to a list. There's also a `marker` field so you can extract the bullet point character if you want to.

What does this look like in the original document?

![list-in-document.png](https://major.io/p/fun-with-docling/list-in-document.png)

So if you find a piece of text that interests you, you can walk backwards to its group and then higher in the document if needed.
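
Before we break out Python, you can chase these references by hand from the shell. This is a quick sketch against the JSON file we generated earlier, and it assumes you have `jq` installed; the indices come from the group shown above:

```bash
# List the references held by the first group
jq '.groups[0].children' enhancing_ohlc_data.json

# Follow one of those references to its text
jq -r '.texts[53].text' enhancing_ohlc_data.json
```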

# Wandering around documents

Let's break out some Python and see what we can do with documents.

## Getting child items

First off, let's find that `"#/groups/0"` from earlier and get the texts inside of it:

```python
from docling_core.types.doc.document import DoclingDocument
from rich import print

doc = DoclingDocument.load_from_json("enhancing_ohlc_data.json")

# Extract group 0 and find its children.
my_group = doc.groups[0]
refs = my_group.children

# Loop through the children,
# resolve their location,
# and print the original text.
for x in refs:
    text = x.resolve(doc=doc)
    print(text.orig)
```

After running this, I get the text from the bulleted list:

```
· Market relevance : VWAP is widely followed by institutional investors, trading algorithms, and benchmark providers [6] [2]. As such, it is a meaningful and valuable quantity to predict.
· Interval-wide robustness : Unlike point-in-time prices such as the open or close, VWAP summarizes trading activity across the full interval, making it a more stable and representative measure of price.
· Reduced discretization noise : VWAP is a continuous, volume-weighted price and is therefore less affected by the rounding effects of penny-level price changes. In contrast, returns computed from last-trade prices (e.g., close-to-close) are often either exactly zero or artificially large due to the $0.01 minimum tick increment.
· Weaker arbitrage incentives : Predicting the direction of VWAP changes does not lend itself to straightforward arbitrage. A trader who knows that the next bar's VWAP will be higher than the current one cannot directly profit from this knowledge unless they had already executed near the current VWAP. As a result, patterns in VWAP returns may persist even in broadly efficient markets.
```

## Getting parent items

What if we wanted to go the other way around? Perhaps there was something really good in `#/texts/55` that we wanted to find in the document.

```python
from docling_core.types.doc.document import DoclingDocument
from rich import print

doc = DoclingDocument.load_from_json("enhancing_ohlc_data.json")

# Extract text 55 and get the parent
my_text = doc.texts[55]
parent_ref = my_text.parent

# Resolve the parent reference
parent_item = parent_ref.resolve(doc=doc)
print(parent_item)
```

Run this and we get the details of `#/groups/0`:

```
ListGroup(
    self_ref='#/groups/0',
    parent=RefItem(cref='#/body'),
    children=[
        RefItem(cref='#/texts/53'),
        RefItem(cref='#/texts/54'),
        RefItem(cref='#/texts/55'),
        RefItem(cref='#/texts/56')
    ],
    content_layer=<ContentLayer.BODY: 'body'>,
    name='list',
    label=<GroupLabel.LIST: 'list'>
)
```

## Removing items

You can also remove individual items from a document or certain classes of items. Let's assume that you don't want any lists in your document.
You can remove them like this:

```python
from docling_core.types.doc.document import DoclingDocument, ListItem
from rich import print

doc = DoclingDocument.load_from_json("enhancing_ohlc_data.json")

total_texts = len(doc.texts)
print(f"Total texts in document: {total_texts}")

list_items = [x for x in doc.texts if isinstance(x, ListItem)]
doc.delete_items(node_items=list_items)

total_texts_after_deletion = len(doc.texts)
print(f"Total texts after deletion: {total_texts_after_deletion}")
```

Running this shows that the list items were removed:

```
> uv run python src/doclingfun/main.py
Total texts in document: 338
Total texts after deletion: 324
```

## Converting to other formats

Finally, you can convert documents to other formats. You can certainly do something simple like this:

```python
doc.save_as_markdown("enhancing_ohlc_data_modified.md")
```

But what if we wanted just our group from earlier serialized to Markdown, without any of the other document contents? You can do that, too!

```python
from docling_core.transforms.serializer.markdown import MarkdownDocSerializer
from docling_core.types.doc.document import DoclingDocument
from rich import print

doc = DoclingDocument.load_from_json("enhancing_ohlc_data.json")

my_group = doc.groups[0]
serializer = MarkdownDocSerializer(doc=doc)
ser_res = serializer.serialize(item=my_group)
print(ser_res.text)
```

That results in the same text as before, but now the bullets are removed and replaced with Markdown-style dashes:

```
- Market relevance : VWAP is widely followed by institutional investors, trading algorithms, and benchmark providers [6] [2]. As such, it is a meaningful and valuable quantity to predict.
- Interval-wide robustness : Unlike point-in-time prices such as the open or close, VWAP summarizes trading activity across the full interval, making it a more stable and representative measure of price.
- Reduced discretization noise : VWAP is a continuous, volume-weighted price and is therefore less affected by the rounding effects of penny-level price changes. In contrast, returns computed from last-trade prices (e.g., close-to-close) are often either exactly zero or artificially large due to the $0.01 minimum tick increment.
- Weaker arbitrage incentives : Predicting the direction of VWAP changes does not lend itself to straightforward arbitrage. A trader who knows that the next bar's VWAP will be higher than the current one cannot directly profit from this knowledge unless they had already executed near the current VWAP. As a result, patterns in VWAP returns may persist even in broadly efficient markets.
```

# More to explore

Docling has *lots more features* that I haven't covered here. For example, it can do [OCR on images](https://docling-project.github.io/docling/examples/tesseract_lang_detection/). You can also [use LLMs to help with parsing](https://docling-project.github.io/docling/examples/minimal_vlm_pipeline/). It also comes with a great [hybrid chunker](https://docling-project.github.io/docling/examples/hybrid_chunking/#basic-usage) to break documents into smaller pieces for embedding into a RAG database.

# Getting podman quadlets talking to each other

https://major.io/p/quadlet-networking/ · Thu, 25 Sep 2025 · major@mhtx.net (Major Hayden)

Quadlets are a handy way to manage containers using systemd unit files. Containers running via quadlets have access to the external network by default, but they don't automatically communicate with each other like they do in a `docker-compose` setup. Adding networking only requires a few extra steps.

## Setting up some quadlets

I often need a postgres server lying around on my local machine for quick tasks or testing something I'm working on. Lately, I've been focused on RAG databases, and that usually involves [pgvector](https://github.com/pgvector/pgvector).

The pgvector extension adds vector data types and functions to PostgreSQL, which is great for storing embeddings from machine learning models.
You can search via all of the usual SQL queries that you're used to, but pgvector adds new capabilities for searching rows based on vector similarity.

Here's the quadlet for pgvector in `~/.config/containers/systemd/pgvector.container`:

```ini
[Unit]
Description=pgvector container
After=network-online.target

[Container]
Image=docker.io/pgvector/pgvector:pg17
Volume=pgvector:/var/lib/postgresql/data
Environment=POSTGRES_USER=postgres
Environment=POSTGRES_PASSWORD=secrete
Environment=POSTGRES_DB=postgres
PublishPort=5432:5432

[Service]
Restart=unless-stopped
```

This gets a postgres server with pgvector up and running with a persistent volume. It's listening on the default port 5432.
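
Once the container is running (we'll start it in a moment), you can kick the tires on those vector features directly. This is a minimal sketch, assuming the `psql` client is installed on the host; it uses the credentials and published port from the quadlet above, and the table and vectors are just an illustration:

```bash
export PGPASSWORD=secrete
psql -h 127.0.0.1 -p 5432 -U postgres <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- Find the row closest to a query vector by L2 distance
SELECT id, embedding FROM items ORDER BY embedding <-> '[2,3,4]' LIMIT 1;
SQL
```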

Sometimes I'm in a hurry and [pgadmin4](https://www.pgadmin.org/) is a quick way to poke around the database. It's also a good example here since it needs to talk to the pgvector container. Here's the quadlet for pgadmin4 in `~/.config/containers/systemd/pgadmin4.container`:

```ini
[Unit]
Description=pgAdmin4 container
After=network-online.target

[Container]
Image=docker.io/dpage/pgadmin4:latest
Volume=pgadmin4:/var/lib/pgadmin
Environment=PGADMIN_DEFAULT_EMAIL=major@mhtx.net
Environment=PGADMIN_DEFAULT_PASSWORD=secrete
Environment=PGADMIN_CONFIG_SERVER_MODE=False
Environment=PGADMIN_CONFIG_MASTER_PASSWORD_REQUIRED=False
PublishPort=8080:80

[Service]
Restart=unless-stopped
```

Awesome!
Let's reload the systemd configuration for my user account and start these containers:

```bash
systemctl --user daemon-reload
systemctl --user start pgvector
systemctl --user start pgadmin4
```

We can check the running containers:

```
> podman ps --format "table {{.ID}}\t{{.Names}}"
CONTAINER ID  NAMES
b099fdaa6b18  valkey
f8ab764c299c  systemd-pgadmin4
052c160fb45b  systemd-pgvector
```

## Testing communication

Let's hop into the pgadmin4 container and see if we can connect to the pgvector database:

```
> podman exec -it systemd-pgadmin4 ping pgvector -c 4
ping: bad address 'pgvector'
> podman exec -it systemd-pgadmin4 ping systemd-pgvector -c 4
ping: bad address 'systemd-pgvector'
```

This isn't great. There are two problems here:

1. The containers aren't on the same network
2. I want to refer to the pgvector container as `pgvector`, not `systemd-pgvector`

Let's fix that.

## Fixing communication

Open up the `~/.config/containers/systemd/pgvector.container` file and make the two changes noted below with comments:

```ini
[Unit]
Description=pgvector container
After=network-online.target

[Container]
# Use a consistent name 👇
ContainerName=pgvector
Image=docker.io/pgvector/pgvector:pg17
Volume=pgvector:/var/lib/postgresql/data
Environment=POSTGRES_USER=postgres
Environment=POSTGRES_PASSWORD=secrete
Environment=POSTGRES_DB=postgres
# Add the container to a network 👇
Network=db-network
PublishPort=5432:5432

[Service]
Restart=unless-stopped
```

Also open the `~/.config/containers/systemd/pgadmin4.container` file and make the same network change:

```ini
[Unit]
Description=pgAdmin4 container
After=network-online.target

[Container]
Image=docker.io/dpage/pgadmin4:latest
Volume=pgadmin4:/var/lib/pgadmin
Environment=PGADMIN_DEFAULT_EMAIL=major@mhtx.net
Environment=PGADMIN_DEFAULT_PASSWORD=secrete
Environment=PGADMIN_CONFIG_SERVER_MODE=False
Environment=PGADMIN_CONFIG_MASTER_PASSWORD_REQUIRED=False
# Add the container to a network 👇
Network=db-network
PublishPort=8080:80

[Service]
Restart=unless-stopped
```

Create the network:

```bash
podman network create db-network
```

Now, reload the systemd configuration and restart the containers:

```bash
systemctl --user daemon-reload
systemctl --user restart pgvector
systemctl --user restart pgadmin4
```

## Testing communication again

Now, let's hop into the pgadmin4 container and see if we can connect to the pgvector database:

```
> podman exec -it systemd-pgadmin4 ping pgvector -c 4
PING pgvector (10.89.5.6): 56 data bytes
64 bytes from 10.89.5.6: seq=0 ttl=42 time=0.026 ms
64 bytes from 10.89.5.6: seq=1 ttl=42 time=0.036 ms
64 bytes from 10.89.5.6: seq=2 ttl=42 time=0.034 ms
64 bytes from 10.89.5.6: seq=3 ttl=42 time=0.088 ms

--- pgvector ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.026/0.046/0.088 ms
```

**Perfect!** 🎉 🎉 🎉
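
If you'd rather confirm the wiring without hopping into a container, podman can also show you the network itself. A quick check (the exact output varies a bit between podman versions):

```bash
# List the networks podman knows about
podman network ls

# Inspect the network we created for the containers
podman network inspect db-network
```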

## Extra credit

If you want to deploy your system with automation and avoid the manual network creation, you can add one extra file to your `~/.config/containers/systemd/` directory. Save this as `db-network.network`:

```ini
[Network]
Label=app=db-network
```

# Monitor system and GPU performance with Performance Co-Pilot

https://major.io/p/performance-copilot-fedora-gpu/ · Tue, 23 Sep 2025 · major@mhtx.net (Major Hayden)

I've used so many performance monitoring tools and systems over the years. When you need to know information right now, tools like [btop](https://github.com/aristocratos/btop) and [glances](https://nicolargo.github.io/glances/) are great for quick overviews. Historical data is fairly easy to pick through with [sysstat](https://github.com/sysstat/sysstat).

However, when you want a comprehensive view of system performance over time, especially with GPU metrics for machine learning workloads, [Performance Co-Pilot (PCP)](https://pcp.io/) is an excellent choice. It has some handy integrations with [Cockpit](https://cockpit-project.org/) for web-based monitoring, but I prefer using the command line tools directly.

This post explains how to set up PCP on Fedora and enable some very basic GPU monitoring for both NVIDIA and AMD GPUs.

## Installing Performance Co-Pilot

Install the core packages and command line tools:

```bash
sudo dnf install pcp pcp-system-tools
```

Enable and start the PCP services:

```bash
sudo systemctl enable --now pmcd pmlogger
sudo systemctl status pmcd
```

These two services work together like a team:

- `pmcd` (Performance Metrics Collection Daemon) gathers real-time metrics from various sources on your system when you request them.
- `pmlogger` records these metrics to log files for historical analysis.

You can verify that the services are working as expected:

```bash
# Check available metrics
pminfo | head -20

# View current CPU utilization
pmval kernel.all.cpu.user

# Show memory statistics
pmstat -s 5
```

## Adding GPU metrics collection

I do a lot of LLM work locally and I'd like to keep track of my GPU usage over time. Fortunately, PCP supports popular GPUs through something called a PMDA (Performance Metrics Domain Agent).
These are packaged in Fedora, but they have an interesting installation process.

### NVIDIA GPUs

> **Unverified instructions:** I only have an AMD GPU, but I pulled this NVIDIA information from various places on the internet. Please let me know if you find any issues and I'll update the post!

For NVIDIA GPUs, ensure you have the NVIDIA drivers and `nvidia-ml` library:

```bash
# Check if nvidia-smi works
nvidia-smi

# Install the NVIDIA management library if needed
sudo dnf install nvidia-driver-cuda-libs
```

Now install the NVIDIA PMDA:

```bash
cd /var/lib/pcp/pmdas/nvidia
sudo ./Install
```

The installer will prompt you for configuration options. Accept the defaults unless you have specific requirements.

> Thanks to Will Cohen for helping me get these NVIDIA steps corrected! 👍
๐Ÿ‘</span> </div> <p>After installation, verify GPU metrics are available:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># List all NVIDIA metrics</span> </span></span><span class="line"><span class="cl">pminfo nvidia </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Check GPU utilization</span> </span></span><span class="line"><span class="cl">pmval nvidia.gpuactive </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Monitor GPU memory usage</span> </span></span><span class="line"><span class="cl">pmval nvidia.memused </span></span></code></pre></div><h3 id="amd-gpus" class="relative group">AMD GPUs <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#amd-gpus" aria-label="Anchor">#</a></span></h3><p>For AMD GPUs, PCP provides the <code>amdgpu</code> PMDA that works with the ROCm stack:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Ensure rocm-smi is installed and working</span> </span></span><span class="line"><span class="cl">rocm-smi </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Install the AMD GPU PMDA package</span> </span></span><span class="line"><span class="cl">sudo dnf install pcp-pmda-amdgpu </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Install the PMDA</span> </span></span><span class="line"><span class="cl"><span class="nb">cd</span> /var/lib/pcp/pmdas/amdgpu </span></span><span class="line"><span class="cl">sudo ./Install </span></span></code></pre></div><p>After installation, verify AMD GPU metrics:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># List all AMD GPU metrics</span> </span></span><span class="line"><span class="cl">pminfo amdgpu </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Check GPU utilization</span> </span></span><span class="line"><span class="cl">pmval amdgpu.gpu.load </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Monitor GPU memory usage</span> </span></span><span class="line"><span class="cl">pmval amdgpu.memory.used </span></span></code></pre></div><h2 id="querying-performance-data" class="relative group">Querying performance data <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#querying-performance-data" aria-label="Anchor">#</a></span></h2><p>There are lots of handy tools for querying PCP data depending on whether you need information about something happening now or want to analyze historical trends.</p> <h3 id="real-time-monitoring-with-pmrep" class="relative group">Real-time monitoring with pmrep <span class="absolute top-0 w-6 
transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#real-time-monitoring-with-pmrep" aria-label="Anchor">#</a></span></h3><p>The <code>pmrep</code> tool provides formatted output perfect for dashboards or scripts. It&rsquo;s great for situations where you need to see what&rsquo;s happening right now. It&rsquo;s much like <code>iostat</code> or <code>vmstat</code> from the sysstat package, but you get a lot more flexibility.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># System overview with 1-second updates</span> </span></span><span class="line"><span class="cl">pmrep --space-scale<span class="o">=</span>MB -t <span class="m">1</span> kernel.all.load kernel.all.cpu.user mem.util.used </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># GPU metrics for LLM monitoring (NVIDIA)</span> </span></span><span class="line"><span class="cl">pmrep --space-scale<span class="o">=</span>MB -t1 nvidia.gpuactive nvidia.memused nvidia.temperature </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># GPU metrics for LLM monitoring (AMD)</span> </span></span><span class="line"><span class="cl">pmrep --space-scale<span class="o">=</span>MB -t <span class="m">1</span> amdgpu.gpu.load amdgpu.memory.used amdgpu.gpu.temperature </span></span></code></pre></div><h3 id="historical-analysis-with-pmlogsummary" class="relative group">Historical analysis with pmlogsummary <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#historical-analysis-with-pmlogsummary" aria-label="Anchor">#</a></span></h3><p>If you&rsquo;re used to running <code>sar</code> commands from the sysstat package, you&rsquo;ll find <code>pmlogsummary</code> very familiar. 
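</p> <p>To make the comparison concrete, here&rsquo;s a rough side-by-side (the <code>sar</code> line needs sysstat&rsquo;s own data collection enabled, and the archive path follows the same convention as the examples below):</p> <pre tabindex="0"><code># sysstat: average CPU utilization from today&#39;s collected data
sar -u

# PCP: summarize any logged metric from today&#39;s archive
pmlogsummary /var/log/pcp/pmlogger/$(hostname)/$(date +%Y%m%d) kernel.all.cpu.user
</code></pre><p>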
Again, you can do a lot more with <code>pmlogsummary</code> than with <code>sar</code>, but the basic concepts are similar.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Summarize yesterday&#39;s GPU utilization (NVIDIA)</span> </span></span><span class="line"><span class="cl">pmlogsummary -S @yesterday -T @today /var/log/pcp/pmlogger/<span class="k">$(</span>hostname<span class="k">)</span>/<span class="k">$(</span>date -d yesterday +%Y%m%d<span class="k">)</span> nvidia.gpuactive </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Summarize yesterday&#39;s GPU utilization (AMD)</span> </span></span><span class="line"><span class="cl">pmlogsummary -S @yesterday -T @today /var/log/pcp/pmlogger/<span class="k">$(</span>hostname<span class="k">)</span>/<span class="k">$(</span>date -d yesterday +%Y%m%d<span class="k">)</span> amdgpu.gpu.load </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Find peak memory usage over the last hour</span> </span></span><span class="line"><span class="cl">pmlogsummary -S -1hour /var/log/pcp/pmlogger/<span class="k">$(</span>hostname<span class="k">)</span>/<span class="k">$(</span>date +%Y%m%d<span class="k">)</span> mem.util.used </span></span></code></pre></div><h2 id="troubleshooting-tips" class="relative group">Troubleshooting tips <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#troubleshooting-tips" aria-label="Anchor">#</a></span></h2><p>If GPU metrics aren&rsquo;t showing up:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Check if the PMDA is properly installed</span> </span></span><span class="line"><span class="cl">pminfo -f pmcd.agent <span class="p">|</span> grep -E <span class="s2">&#34;amdgpu|nvidia&#34;</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Restart PMCD to reload PMDAs</span> </span></span><span class="line"><span class="cl">sudo systemctl restart pmcd </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Check PMDA logs for errors</span> </span></span><span class="line"><span class="cl">sudo journalctl -u pmcd -n <span class="m">50</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Verify GPU drivers are working</span> </span></span><span class="line"><span class="cl">rocm-smi <span class="c1"># for AMD</span> </span></span><span class="line"><span class="cl">nvidia-smi <span class="c1"># for NVIDIA</span> </span></span></code></pre></div><h2 id="further-reading" class="relative group">Further reading <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#further-reading" aria-label="Anchor">#</a></span></h2><ul> <li><a href="https://pcp.io/documentation.html" target="_blank" rel="noreferrer">Performance 
Co-Pilot documentation</a> - Official PCP documentation and quick reference guides</li> <li><a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/monitoring_and_managing_system_status_and_performance/monitoring-performance-with-performance-co-pilot_monitoring-and-managing-system-status-and-performance" target="_blank" rel="noreferrer">Red Hat&rsquo;s PCP guide</a> - Enterprise deployment patterns and best practices</li> <li><a href="https://pcp.readthedocs.io/en/latest/PG/PMAPI.html" target="_blank" rel="noreferrer">PMAPI</a> - Performance metrics API</li> </ul>Summarize YouTube videos with Fabrichttps://major.io/p/summarize-youtube-videos-fabric/Mon, 22 Sep 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/summarize-youtube-videos-fabric/<p>I watch plenty of YouTube videos with instructional content. Some are related to my work while many others involve my hobbies, such as ham radio and financial markets. Lots of them have really useful information in them, but I struggle with taking notes while I&rsquo;m watching.</p> <p>It turns out there&rsquo;s a tool for that! During some face to face work meetings last week, a coworker showed off <a href="https://github.com/danielmiessler/Fabric" target="_blank" rel="noreferrer">fabric</a>.</p> <p>Let&rsquo;s go through how to use it with some examples.</p> <h2 id="installation--setup" class="relative group">Installation &amp; setup <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#installation--setup" aria-label="Anchor">#</a></span></h2><p>The author, Daniel Miessler, offers lots of installation methods in the <a href="https://github.com/danielmiessler/Fabric?tab=readme-ov-file#installation" target="_blank" rel="noreferrer">README</a>. 
Fedora users will also need <code>yt-dlp</code> to download YouTube videos:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf install yt-dlp </span></span></code></pre></div><p>Next, run <code>fabric setup</code> to enter the configuration menu:</p> <pre tabindex="0"><code>&gt; fabric --setup Available plugins (please configure all required plugins):: AI Vendors [at least one, required] [1] AIML [2] Anthropic (configured) [3] Azure [4] Cerebras [5] DeepSeek [6] Exolab [7] Gemini [8] GrokAI [9] Groq [10] Langdock [11] LiteLLM [12] LM Studio [13] Mistral [14] Ollama [15] OpenAI [16] OpenRouter [17] Perplexity [18] SiliconCloud [19] Together [20] Venice AI Tools [21] Custom Patterns - Set directory for your custom patterns (optional) [22] Default AI Vendor and Model [required] (configured) [23] Jina AI Service - to grab a webpage as clean, LLM-friendly text (configured) [24] Language - Default AI Vendor Output Language (configured) [25] Patterns - Downloads patterns [required] (configured) [26] Strategies - Downloads Prompting Strategies (like chain of thought) [required] [27] YouTube - to grab video transcripts (via yt-dlp) and comments/metadata (via YouTube API) (configured) [Plugin Number] Enter the number of the plugin to setup (leave empty to skip): </code></pre><p>In my case, I did the following:</p> <ul> <li>Option 2: Anthropic (Claude Opus)</li> <li>Option 21: Download the patterns (prompts for the LLM)</li> <li>Option 22: Set Anthropic as the default AI vendor</li> <li>Option 27: YouTube setup with an <a href="https://developers.google.com/youtube/v3/getting-started" target="_blank" rel="noreferrer">API key from Google</a></li> </ul> <p>Now it&rsquo;s time to summarize some videos!</p> <h2 id="basic-summarization" class="relative group">Basic summarization <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#basic-summarization" aria-label="Anchor">#</a></span></h2><p>There are <a href="https://github.com/danielmiessler/Fabric/tree/main/data/patterns" target="_blank" rel="noreferrer">lots of patterns</a> in the repository to choose from, but let&rsquo;s just use the <code>summarize</code> pattern for now. One of my favorite people on YouTube for information on financial markets is <a href="https://www.youtube.com/user/CiovaccoCapital" target="_blank" rel="noreferrer">Chris Ciovacco</a>. He publishes videos weekly on Fridays with lots of inter-market analysis and some helpful trading psychology reminders.</p> <p>His most recent video has this URL:</p> <pre tabindex="0"><code>https://www.youtube.com/watch?v=eFiLaRNw8XM </code></pre><p>We can use the video URL with fabric to get a summary:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">&gt; fabric -y <span class="s2">&#34;https://youtube.com/watch?v=eFiLaRNw8XM&#34;</span> <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> --stream --pattern summarize </span></span></code></pre></div><p>And the summary:</p> <pre tabindex="0"><code>ONE SENTENCE SUMMARY: Market analysis shows strong uptrend intact with multiple bullish signals supporting potential continued gains through 2029-2033. MAIN POINTS: 1. 
Fed cut rates 25 basis points with additional cuts likely in pipeline 2. Market could fall 3-14% without damaging existing uptrend from April low 3. Fed cuts within 1% of highs historically led to gains 16/16 times 4. Dow stocks breadth thrust signal triggered, historically very bullish long-term indicator 5. NASDAQ breadth signals show consistent positive returns over 1-5 year periods 6. Multiple technical indicators suggest keeping open mind about strong outcomes 7. S&amp;P 500 RSI above 45 for 102 days indicates short-term overbought conditions 8. Bank of America predicts secular bull market lasting until 2029-2033 9. Current uptrend remains intact based on volume-weighted average price levels 10. Weight of evidence approach favors patience over frequent trading decisions TAKEAWAYS: 1. Stay patient during normal 3-14% pullbacks as they won&#39;t damage the uptrend structure 2. Multiple rare breadth thrust signals historically produced excellent 1-5 year returns 3. Fed rate cuts near market highs have perfect historical track record of gains 4. Short-term overbought conditions may cause 2% drop but shouldn&#39;t affect long-term strategy 5. Secular bull market could extend to 2029-2033 based on historical data patterns </code></pre><p>This is accurate and it gives me some good tidbits of information as I think about placing trades for the next week.</p> <p>What actually happened here?</p> <ol> <li>Fabric downloaded the transcript of the video using <code>yt-dlp</code>.</li> <li>It sent the transcript to Claude Opus with a prompt asking for a summary.</li> <li>Claude Opus returned the summary, which fabric printed to the terminal.</li> </ol> <h2 id="extracting-wisdom" class="relative group">Extracting wisdom <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#extracting-wisdom" aria-label="Anchor">#</a></span></h2><p>There are so many patterns that come with fabric, but one of my other favorites is <code>extract_wisdom</code>. Here&rsquo;s the same video as before, but now with the new pattern:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">&gt; fabric -y <span class="s2">&#34;https://youtube.com/watch?v=eFiLaRNw8XM&#34;</span> <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> --stream --pattern extract_wisdom </span></span></code></pre></div><p>There&rsquo;s much, much more detail here:</p> <pre tabindex="0"><code># SUMMARY Stock market analyst reviews Fed rate cuts, technical analysis, and historical data suggesting strong upside potential through 2029-2035 secular bull market. 
# IDEAS - Fed cut rates 25 basis points with additional cuts likely in pipeline ahead - Market could fall 3-14% without damaging existing uptrend from April low significantly - When Fed cuts within 1% of S&amp;P highs, market higher year later 16/16 times - Average gain nearly 15% when Fed cuts near all-time market highs historically - Dow stocks breadth thrust above 85% after oversold readings signals strong performance ahead - After similar breadth signals, S&amp;P 500 higher one year later in every instance recorded - Median gain 16.2% one year after Dow breadth thrust signals trigger historically - Three-year returns average 41.45% after rare Dow breadth thrust signals trigger successfully - Five-year returns average 82% after Dow breadth thrust signals with 76% median gain - NASDAQ breadth thrust signals just triggered in September 2025 for first time recently - After NASDAQ breadth signals, one-year median gains reach 18% with 17% average historically - Three years after NASDAQ signals, every case shows gains with 48% average return - Five years after NASDAQ signals, every case higher with 84% average gain recorded - NASDAQ 200-day breadth moves rare but produce 15% one-year gains on average - Two years after NASDAQ 200-day signals, market gains average over 34% consistently - Four years after NASDAQ signals, every case higher with 56% average and median - S&amp;P 500 RSI above 45 for 102 days represents one of longest streaks ever - After similar RSI streaks, market drops 2.17% average over following two weeks - Bank of America predicts secular bull market lasting until 2029 or 2033 timeframe - Secular bull market could extend to 2034-2035 based on historical data patterns - Midcap relative performance made new 52-week low despite bullish breadth signals recently - Midcap underperformance by 73% week-to-date following Fed meeting and rate cut - Volatility remains necessary evil for capturing satisfying long-term investment gains and compounding - Weight of evidence approach requires flexible, unbiased and open mind for decisions - Patient investors holding during uptrends benefit when markets make new higher highs # INSIGHTS - Historical Fed rate cuts near market peaks consistently predict positive one-year market performance - Breadth thrust signals across multiple indices suggest broad market participation and strength ahead - Short-term overbought conditions create minor pullbacks within larger secular bull market trends - Weight of evidence approach balances bullish signals with realistic volatility expectations for investors - Secular bull markets driven by demographics can extend much longer than typical expectations - Technical analysis provides objective framework for distinguishing volatility from trend damage assessment - Multiple independent signals converging suggests higher probability of sustained market uptrend continuation - Patient long-term investors benefit most from riding out normal volatility within uptrends - Historical data patterns repeat with remarkable consistency across different market cycle periods - Professional investment firms reach similar conclusions when analyzing same objective historical data sets # QUOTES - &#34;The market could backtrack quite a bit here. 
It wouldn&#39;t do a lot of damage to the existing uptrend.&#34; - &#34;When the Fed cuts rates within 1% of an S&amp;P 500 all-time high, they&#39;ve done so 16 times historically.&#34; - &#34;In every single case, 16 for 16, the S&amp;P 500 was higher a year later from the date of the cut.&#34; - &#34;It&#39;s very very easy to see green tables and believe that the market is never going to go down again.&#34; - &#34;Volatility is a necessary evil. The ability to ride out volatility is a necessary evil.&#34; - &#34;We&#39;re thinking in probabilities rather than certainties.&#34; - &#34;Bank of America came out recently and said that we&#39;re in a secular bull market that could last until 2029 or 2033.&#34; - &#34;The objective of all of this is to make money.&#34; - &#34;We don&#39;t want to get caught off guard if the market falls 2% over the next two weeks.&#34; - &#34;We want to continue to make decisions based on the weight of the evidence.&#34; - &#34;The only way that we can use the weight of the evidence effectively is if we head into next week and every week with that flexible, unbiased and open mind.&#34; # HABITS - Monitor health of existing market trends using multiple technical analysis reference points daily - Use anchored volume weighted average price lines to assess uptrend integrity continuously - Prepare mitigation strategies in advance for various market pullback scenarios and percentage levels - Focus on weight of evidence approach rather than single indicators for investment decisions - Maintain fully invested positions during confirmed uptrends to maximize long-term compound growth - Trade less rather than more in bull markets to minimize taxable events - Study historical precedents and patterns to maintain realistic expectations about market volatility - Review breadth data across multiple indices to assess broad market participation levels - Analyze relative performance between asset classes to identify leadership and lagging sectors - Maintain flexible, unbiased mindset when interpreting conflicting market signals and data points - Prepare mentally for normal volatility to avoid emotional decision-making during market stress - Use multiple timeframes from weeks to years when analyzing investment performance expectations - Document and track rare market signals to build database of historical precedents - Balance bullish signals with bearish data to maintain realistic market outlook perspectives - Consult multiple professional sources to validate independent analysis and investment thesis conclusions # FACTS - Fed has cut rates within 1% of S&amp;P highs 16 times with 100% success rate - S&amp;P 500 RSI stayed above 45 for 102 straight days, longest streak in history - After similar RSI streaks, market drops average 2.17% over following two weeks historically - Dow breadth thrust signals occurred only handful of times since 2001 market data available - NASDAQ breadth signals show 86% success rate one year later with 18% median gains - After NASDAQ 200-day signals, four-year performance shows 100% success rate with 56% gains - Midcap performance relative to S&amp;P 500 recently made new 52-week low this year - Bank of America predicts secular bull market lasting until 2029 or 2033 based on data - Dotcom bear market lasted approximately three years from 2000 peak to 2003 retest - S&amp;P 400 midcap new high/low data available going back only to 2010 currently - NASDAQ 100 breadth data extends back to 2002 for historical comparison analysis purposes - Current market 
uptrend began from April low based on technical analysis reference points - Fed meeting occurred Wednesday September 17th with 25 basis point rate cut announced - Wall Street Journal reported all major indices performed well following Fed rate cut decision - Professional investment analysis reaches similar 2034-2035 secular bull market target dates consistently # REFERENCES - JP Morgan historical data on Fed rate cuts near market highs - Wall Street Journal headlines following Fed meeting coverage - Bank of America secular bull market research and predictions - August 8th video covering annotated chart analysis - Last week&#39;s video on six recent stock market signals - S&amp;P 500, S&amp;P 400, NASDAQ 100, and Dow Jones index data - RSI (Relative Strength Index) technical indicator analysis - Anchored Volume Weighted Average Price (AWOP) technical analysis tool - 150-day and 200-day simple moving averages for breadth analysis - Christmas Eve market low historical reference point - COVID market crash and recovery period analysis - Dotcom bubble and bear market historical comparison - Financial crisis market low and recovery analysis - 2010, 2011, 2015, 2016 market corrections and recovery patterns # ONE-SENTENCE TAKEAWAY Multiple historical signals suggest secular bull market continues through 2029-2035 despite normal volatility. # RECOMMENDATIONS - Maintain fully invested positions during confirmed uptrends to maximize long-term compound growth potential - Use weight of evidence approach combining multiple signals rather than single indicator decisions - Prepare mitigation strategies in advance for 3-14% pullbacks without panicking or selling positions - Focus on one to five-year performance expectations rather than short-term two-week market movements - Study historical precedents to maintain realistic expectations about normal market volatility patterns ahead - Monitor breadth data across multiple indices to assess broad market participation and strength - Keep flexible, unbiased mindset when interpreting conflicting market signals and data points consistently - Trade less frequently in bull markets to minimize taxable events and maximize position holding - Use technical analysis reference points to distinguish between volatility and actual trend damage - Maintain patient approach during normal pullbacks within context of existing secular bull trends - Analyze relative performance between asset classes to identify leadership and lagging sector rotation - Document rare market signals to build historical database for future investment decision making - Balance bullish signals with bearish data to maintain realistic market outlook and expectations - Consult multiple professional sources to validate independent analysis and investment thesis development conclusions - Prepare mentally for normal volatility to avoid emotional decision-making during temporary market stress </code></pre><h2 id="getting-clarification" class="relative group">Getting clarification <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#getting-clarification" aria-label="Anchor">#</a></span></h2><p>You can take any piece of the summary and query the LLM for a further explanation. 
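</p> <p>One easy way to keep the full output around for follow-up questions is plain shell redirection (the filename here is just an example):</p> <pre tabindex="0"><code># save the extract_wisdom output to a file for later
fabric -y &#34;https://youtube.com/watch?v=eFiLaRNw8XM&#34; \
    --stream --pattern extract_wisdom &gt; ciovacco-wisdom.md
</code></pre><p>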
If you saved the output to a file, then you can send the output right back to the LLM using the <code>-a</code> option.</p> <p>Or, you can take a small piece and ask for more detail with a pipe:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; echo &#34;How do I do this myself on TradingView? Analyze relative performance between asset classes to identify leadership and lagging sector rotation&#34; | fabric --stream </span></span><span class="line"><span class="cl">Here&#39;s how to analyze relative performance and sector rotation on TradingView: </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">## 1. **Set Up Multi-Asset Comparison Charts** </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">### Basic Relative Performance Setup: </span></span><span class="line"><span class="cl">1. Open a new chart </span></span><span class="line"><span class="cl">2. Click the &#34;Compare&#34; button (or press &#34;+&#34; next to symbol) </span></span><span class="line"><span class="cl">3. Add multiple assets you want to compare </span></span><span class="line"><span class="cl">4. Switch to &#34;Percentage&#34; scale (right-click Y-axis โ†’ Percentage) </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">### Key Asset Classes to Compare: </span></span><span class="line"><span class="cl">- **Equities**: SPY, QQQ, IWM, EFA, EEM </span></span><span class="line"><span class="cl">- **Bonds**: TLT, IEF, HYG, LQD </span></span><span class="line"><span class="cl">- **Commodities**: GLD, SLV, DBC, USO </span></span><span class="line"><span class="cl">- **Sectors**: XLF, XLK, XLE, XLV, XLI, XLU, etc. </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl">--- SNIP --- </span></span></code></pre></div>RAG talk recap from DevConf.US 2025https://major.io/p/devconf-rag/Sat, 20 Sep 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/devconf-rag/<p>Hello from DevConf.US in Boston, Massachusetts!</p> <p>I presented yesterday about the challenges of developing systems for retrieval-augmented generation (RAG) with large language models (LLMs). If you&rsquo;ve been reading tech blogs lately, or the <a href="https://www.reddit.com/r/rag/" target="_blank" rel="noreferrer">r/rag subreddit</a>, you might think that implementing RAG is as simple as tossing documents into a database and watching the magic happen. <em>(Spoiler alert: it&rsquo;s not.)</em></p> <p>I&rsquo;ll quickly recap the presentation in this post, but you can also download the <a href="slides.pdf">slides as a PDF</a> or watch the <a href="https://www.youtube.com/live/i2H6tOu4Jyw?feature=shared&amp;t=8709" target="_blank" rel="noreferrer">video on YouTube</a>.</p> <h2 id="what-rag-actually-is-and-isnt" class="relative group">What RAG actually is (and isn&rsquo;t) <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-rag-actually-is-and-isnt" aria-label="Anchor">#</a></span></h2><p>Using RAG with an LLM is a lot like taking an open-note exam. If you know the concepts really well, but you need to quickly reference specific details, an open-note exam isn&rsquo;t too bad. 
However, if you don&rsquo;t know the material at all, having notes won&rsquo;t help you much. This means that the LLM that you pair with RAG needs some relevant training in the domain you&rsquo;re working in.</p> <p>But the big question is how to increase your odds of successfully delivering coherent, complete, and correct answers when you&rsquo;re up against this challenge:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="960" height="540" class="mx-auto my-0 rounded-md" alt="what-is-llm.png" loading="lazy" decoding="async" src="https://major.io/p/devconf-rag/what-is-llm_hu_23fd8f1c2a4b68e2.png" srcset="https://major.io/p/devconf-rag/what-is-llm_hu_26669c56b3a132c1.png 330w,https://major.io/p/devconf-rag/what-is-llm_hu_23fd8f1c2a4b68e2.png 660w ,https://major.io/p/devconf-rag/what-is-llm.png 960w ,https://major.io/p/devconf-rag/what-is-llm.png 960w " sizes="100vw" /> </picture> </figure> </p> <p>This is where RAG can improve your odds.</p> <h2 id="the-fellowship-of-the-rag" class="relative group">The Fellowship of the RAG <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-fellowship-of-the-rag" aria-label="Anchor">#</a></span></h2><p>Building a RAG system isn&rsquo;t a solo quest. You need a diverse team, each bringing their own perspectives:</p> <p>Your senior engineers worry about security and data compliance. They&rsquo;re asking the tough questions: Is sensitive data protected? Can users access only what they should? Has the data been tampered with?</p> <p>Junior developers get overwhelmed by the document chaos. Where are documents stored? In what formats? How often do they change? Are they even accurate?</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="960" height="540" class="mx-auto my-0 rounded-md" alt="junior-developers.png" loading="lazy" decoding="async" src="https://major.io/p/devconf-rag/junior-developers_hu_fe339e504b6d5771.png" srcset="https://major.io/p/devconf-rag/junior-developers_hu_c660836df54a93d8.png 330w,https://major.io/p/devconf-rag/junior-developers_hu_fe339e504b6d5771.png 660w ,https://major.io/p/devconf-rag/junior-developers.png 960w ,https://major.io/p/devconf-rag/junior-developers.png 960w " sizes="100vw" /> </picture> </figure> </p> <p>Quality engineers face a new paradigm. Traditional deterministic testing doesn&rsquo;t work with AI systems that might give different answers to the same question.</p> <p>And then there&rsquo;s the AI enthusiast who just read 20 HackerNews articles and wants to implement every new technique. (We all know one.)</p> <p>The lesson? 
Your fellowship matters more than your technology stack.</p> <h2 id="common-pitfalls-in-the-mines-of-moria" class="relative group">Common pitfalls in the Mines of Moria <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#common-pitfalls-in-the-mines-of-moria" aria-label="Anchor">#</a></span></h2><p>Just like the Fellowship&rsquo;s journey through the Mines, RAG implementation has its share of monsters lurking in the dark.</p> <p><strong>Document parsing is harder than it looks.</strong> That 10,000-page PDF full of Excel tables, charts, and multi-column text written 15 years ago by someone who no longer works there? Yeah, that&rsquo;s your Balrog. Tools like Docling can help, but sometimes you need to accept that certain documents should stay buried.</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="960" height="540" class="mx-auto my-0 rounded-md" alt="balrog.png" loading="lazy" decoding="async" src="https://major.io/p/devconf-rag/balrog_hu_f96953c45b25b58b.png" srcset="https://major.io/p/devconf-rag/balrog_hu_a23bf7182441e56b.png 330w,https://major.io/p/devconf-rag/balrog_hu_f96953c45b25b58b.png 660w ,https://major.io/p/devconf-rag/balrog.png 960w ,https://major.io/p/devconf-rag/balrog.png 960w " sizes="100vw" /> </picture> </figure> </p> <p><strong>Search strategy matters.</strong> You&rsquo;ll need to choose between keyword search (fast but misses semantic meaning), vector search (understands context but requires expensive embedding), hybrid approaches, or graph-based methods. Start simple โ€“ you can always evolve your approach.</p> <p><strong>Model size affects everything.</strong> Smaller models need more accurate RAG context because they have less training to fall back on. Frontier models like Claude Opus or GPT-4o can compensate for lower-quality RAG, but they&rsquo;re expensive. Choose wisely based on your use case and budget.</p> <h2 id="the-road-forward" class="relative group">The road forward <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-road-forward" aria-label="Anchor">#</a></span></h2><p>After navigating the challenges, here are the critical lessons for building production RAG:</p> <p><strong>Start with a clear user story.</strong> Define who will use your system, what they&rsquo;ll do with it, and what benefit they&rsquo;ll get. This becomes your north star when making tough decisions.</p> <p><strong>Build a continuous improvement pipeline.</strong> Set up a process to identify knowledge gaps, refine documents, score results, and only promote good content to production. RAG isn&rsquo;t a deploy-and-forget solution.</p> <p><strong>Measure everything that matters.</strong> Log queries, responses, similarity scores, and user feedback. You can&rsquo;t improve what you don&rsquo;t measure.</p> <p><strong>Documentation quality trumps RAG sophistication.</strong> The best RAG system in the world can&rsquo;t fix terrible documentation. 
Garbage in, garbage out โ€“ always.</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="960" height="540" class="mx-auto my-0 rounded-md" alt="takeaways.png" loading="lazy" decoding="async" src="https://major.io/p/devconf-rag/takeaways_hu_e1764c90eae580cc.png" srcset="https://major.io/p/devconf-rag/takeaways_hu_166e0ff969a5fed0.png 330w,https://major.io/p/devconf-rag/takeaways_hu_e1764c90eae580cc.png 660w ,https://major.io/p/devconf-rag/takeaways.png 960w ,https://major.io/p/devconf-rag/takeaways.png 960w " sizes="100vw" /> </picture> </figure> </p> <h2 id="key-takeaway" class="relative group">Key takeaway <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#key-takeaway" aria-label="Anchor">#</a></span></h2><p>If there&rsquo;s one thing to remember from my talk, it&rsquo;s this: RAG is not a destination but an ongoing quest. The perfect solution is the enemy of progress. Start small, fail fast, learn constantly, and iterate relentlessly.</p> <p>The slides from my presentation are available in the PDF file in this repository, and I encourage you to check them out for the full Lord of the Rings journey through RAG implementation.</p> <p>Building production RAG systems taught me that success comes not from following the latest HackerNews trends, but from understanding your users, respecting the complexity, and assembling the right fellowship for the journey.</p> <p>Stay on the path, and may your RAG responses be ever accurate.</p>Automatic container updates with Podman quadletshttps://major.io/p/podman-quadlet-automatic-updates/Fri, 19 Sep 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/podman-quadlet-automatic-updates/<p>Running containers at home or in production often means juggling updates across multiple services. While orchestration platforms like Kubernetes handle this automatically, what about those simple deployments on a single host?</p> <p>Podman&rsquo;s quadlet system integrates containers directly with systemd, and when combined with automatic updates, you get a robust solution that keeps your containers fresh without manual intervention.</p> <p>Let&rsquo;s explore how to set up automatic container updates using Podman quadlets on Fedora, turning container management into a hands-off operation that just works.</p> <h2 id="setting-up-a-basic-quadlet" class="relative group">Setting up a basic quadlet <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#setting-up-a-basic-quadlet" aria-label="Anchor">#</a></span></h2><p>First, let&rsquo;s create a simple quadlet for running a Valkey database service under a user account. 
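</p> <p>Quadlet support arrived in Podman 4.4, so it&rsquo;s worth confirming that your Podman is new enough before going any further:</p> <pre tabindex="0"><code># quadlets require Podman 4.4 or newer
podman --version
</code></pre><p>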
Quadlet files for user services live in <code>~/.config/containers/systemd/</code>.</p> <p>Create the directory if it doesn&rsquo;t exist:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir -p ~/.config/containers/systemd/ </span></span></code></pre></div><p>Then create your quadlet file:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="c1"># ~/.config/containers/systemd/valkey.container</span> </span></span><span class="line"><span class="cl"><span class="k">[Container]</span> </span></span><span class="line"><span class="cl"><span class="na">ContainerName</span><span class="o">=</span><span class="s">valkey</span> </span></span><span class="line"><span class="cl"><span class="na">Image</span><span class="o">=</span><span class="s">docker.io/valkey/valkey:latest</span> </span></span><span class="line"><span class="cl"><span class="na">Label</span><span class="o">=</span><span class="s">io.containers.autoupdate=registry</span> </span></span><span class="line"><span class="cl"><span class="na">PublishPort</span><span class="o">=</span><span class="s">16379:6379</span> </span></span><span class="line"><span class="cl"><span class="na">Volume</span><span class="o">=</span><span class="s">valkey_data:/data</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">[Service]</span> </span></span><span class="line"><span class="cl"><span class="na">Restart</span><span class="o">=</span><span class="s">always</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">[Install]</span> </span></span><span class="line"><span class="cl"><span class="na">WantedBy</span><span class="o">=</span><span class="s">default.target</span> </span></span></code></pre></div><p>The magic happens with the <code>Label=io.containers.autoupdate=registry</code> line. This label tells Podman that this container should be automatically updated when a newer image is available in the registry.</p> <p>After creating the file, reload systemd and start your container:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">systemctl --user daemon-reload </span></span><span class="line"><span class="cl">systemctl --user start valkey.service </span></span></code></pre></div><p>Your container is now running as a user systemd service! Check its status with:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">systemctl --user status valkey.service </span></span><span class="line"><span class="cl">podman ps </span></span></code></pre></div><h2 id="enabling-automatic-updates" class="relative group">Enabling automatic updates <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#enabling-automatic-updates" aria-label="Anchor">#</a></span></h2><p>Podman ships with a systemd timer that checks for container updates. 
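</p> <p>Before relying on the timer, you can ask Podman which containers it would refresh right now. The <code>--dry-run</code> flag reports update candidates without pulling images or restarting anything:</p> <pre tabindex="0"><code># show which containers would be updated, without changing anything
podman auto-update --dry-run
</code></pre><p>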
The <code>podman-auto-update.timer</code> runs daily by default, but you need to enable it for your user:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">systemctl --user <span class="nb">enable</span> --now podman-auto-update.timer </span></span></code></pre></div><p>You can check when the next update will run:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">systemctl --user list-timers podman-auto-update.timer </span></span></code></pre></div><p>When the timer triggers, it runs <code>podman auto-update</code>, which:</p> <ol> <li>Checks all containers with the <code>io.containers.autoupdate</code> label</li> <li>Pulls newer images if available</li> <li>Restarts containers with the new image</li> <li>Keeps the old image in case you need to roll back</li> </ol> <h2 id="customizing-update-behavior" class="relative group">Customizing update behavior <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#customizing-update-behavior" aria-label="Anchor">#</a></span></h2><p>The <code>io.containers.autoupdate</code> label supports different values for various update strategies:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="c1"># Always pull the latest image from the registry</span> </span></span><span class="line"><span class="cl"><span class="na">Label</span><span class="o">=</span><span class="s">io.containers.autoupdate=registry</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Only update if the local image changes (useful for locally built images)</span> </span></span><span class="line"><span class="cl"><span class="na">Label</span><span class="o">=</span><span class="s">io.containers.autoupdate=local</span> </span></span></code></pre></div><p>You can also customize when updates occur by creating a timer override:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">systemctl --user edit podman-auto-update.timer </span></span></code></pre></div><p>Add these lines to run updates every 6 hours instead of daily:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="k">[Timer]</span> </span></span><span class="line"><span class="cl"><span class="na">OnCalendar</span><span class="o">=</span> </span></span><span class="line"><span class="cl"><span class="na">OnCalendar</span><span class="o">=</span><span class="s">*-*-* 00,06,12,18:00:00</span> </span></span></code></pre></div><h2 id="monitoring-updates" class="relative group">Monitoring updates <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#monitoring-updates" aria-label="Anchor">#</a></span></h2><p>Track what&rsquo;s happening with your automatic updates using journalctl:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code 
class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># View recent auto-update logs</span> </span></span><span class="line"><span class="cl">journalctl --user -u podman-auto-update.service -n <span class="m">50</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Follow updates in real-time</span> </span></span><span class="line"><span class="cl">journalctl --user -u podman-auto-update.service -f </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Check a specific container&#39;s restart history</span> </span></span><span class="line"><span class="cl">journalctl --user -u valkey.service <span class="p">|</span> grep Started </span></span></code></pre></div><h2 id="further-reading" class="relative group">Further reading <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#further-reading" aria-label="Anchor">#</a></span></h2><ul> <li><a href="https://docs.podman.io/en/latest/markdown/podman-auto-update.1.html" target="_blank" rel="noreferrer">Podman documentation on auto-updates</a> - Official documentation for the auto-update feature</li> <li><a href="https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html" target="_blank" rel="noreferrer">Systemd Quadlet documentation</a> - Complete reference for quadlet unit files</li> <li><a href="https://www.redhat.com/sysadmin/quadlet-podman" target="_blank" rel="noreferrer">Red Hat&rsquo;s guide to Podman quadlets</a> - Excellent introduction with more examples</li> </ul>Date driven developmenthttps://major.io/p/date-driven-development/Sun, 24 Aug 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/date-driven-development/<p>Scrum, kanban, and waterfall seem familiar to most of us who work in software development, but date driven development is a complicated beast. These are situations where something <em>must be completed</em> by a certain date that cannot be moved.</p> <p>For some teams, this could be a product launch at a big company event. An executive might need to present something to investors or analysts. The team might need to have something ready for multiple other teams to use in their own work.</p> <p>Although these situations can be stressful, it&rsquo;s one of those great opportunities where everyone has a chance to shine! This post covers some techniques and strategies that you can use (at any level) to help your team reach the finish line.</p> <h1 id="what-goes-on-the-truck" class="relative group">What goes on the truck? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-goes-on-the-truck" aria-label="Anchor">#</a></span></h1><p>A talented executive once took me under their wing and explained how to inspire a team to deliver results with a difficult deadline. He would always ask, <strong>&ldquo;What goes on the truck?&rdquo;</strong></p> <p>Imagine a truck delivering your product to the big event. 
There are lots of different outcomes, but I&rsquo;ll boil them down to these three:</p> <ol> <li><strong>The truck arrives on time with everything needed to be successful.</strong> We all want this outcome, but it&rsquo;s not always possible. This should be your team&rsquo;s main goal and focus.</li> <li><strong>The truck arrives on time with all of the required items, but some extras are missing.</strong> Most of the teams I&rsquo;ve worked with have landed here. Everything that the customer needs is on the truck, but some of the nice-to-haves are missing. Some of these items could be a &ldquo;fast follow&rdquo; that arrives after the event.</li> <li><strong>The truck doesn&rsquo;t arrive.</strong> You don&rsquo;t want this.</li> </ol> <p>I once went to a fast food restaurant to order a burger and fries. After sitting down with my drink, I waited for my food, and it felt like a long time. Someone came over and said:</p> <blockquote> <p>&ldquo;Here&rsquo;s your burger, but the fries are running behind. We have a new fry cook and she&rsquo;s still learning. I&rsquo;ll have your fries over as soon as they&rsquo;re done.&rdquo;</p></blockquote> <p>What happened there? I received the most important item (the burger), but not the extras (the fries). That was fine since I came to the restaurant for a burger anyway and the fries were just a nice-to-have. The fries were on the way, but it didn&rsquo;t interrupt the enjoyment of eating the burger. The fries came along shortly and I was happy.</p> <p>You can do the same thing with your product:</p> <ol> <li>Deliver the base functionality that provides value for your customer.</li> <li>Ship as many extras as you can. Identify the extras that deliver the most value for the least amount of effort. Work on those first.</li> <li>Be willing to drop an extra or say &ldquo;no&rdquo; to something that could jeopardize the delivery of the base product.</li> </ol> <p>All of this works a lot better if you communicate often about the extras, the value they provide, and the development effort required. That&rsquo;s the next section!</p> <h1 id="clustered-candid-communication" class="relative group">Clustered, candid communication <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#clustered-candid-communication" aria-label="Anchor">#</a></span></h1><p>Everyone on the team and everyone the team depends on must communicate effectively &ndash; <strong>and often.</strong> One of the biggest downfalls for any team under stress is a lack of communication. This causes a mismatch in expectations within the group.</p> <p>Here are a few simple things you can do to improve your team&rsquo;s communication:</p> <ol> <li><strong>Talk about all deadlines often.</strong> This keeps everyone on the team in the loop about important dates, especially those they might not be aware of. For example, development teams might not know when quality engineering teams need final builds to test. Product managers need early builds to share with pilot customers and sales teams.</li> <li><strong>Cluster your communication.</strong> Interruptions throw a wrench into a team&rsquo;s productivity. Cluster communications to a certain time of day or a certain day of the week. Every meeting should have a detailed agenda and clear outcomes. 
Each attendee must know what is expected of them at each meeting.</li> <li><strong>Prioritize blockers.</strong> If someone on the team is blocked, they need to communicate that immediately. Treat these problems with the highest priority and identify what you need to resolve the issue. Blockers are extremely demotivating and can cause issues to fall through the cracks.</li> <li><strong>If you see something, say something.</strong> Encourage everyone to ask questions and raise their concerns. I&rsquo;ve had good experiences with an &ldquo;around-the-horn&rdquo; approach at the end of a meeting where the organizer calls out attendees one by one to see how they&rsquo;re feeling.</li> </ol> <p>Don&rsquo;t forget about communication outside the team!</p> <p>For outbound communication, find ways to share meaningful status updates with other groups and leaders. This can head off unnecessary interruptions by allowing people to quickly scan a status update.</p> <p>Assign a point person to handle inbound communications from other teams and rotate that role often (weekly or bi-weekly). This person can triage issues and raise concerns with the team when necessary without disrupting the team&rsquo;s focus.</p> <h1 id="set-the-ground-rules" class="relative group">Set the ground rules <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#set-the-ground-rules" aria-label="Anchor">#</a></span></h1><p>This one is often forgotten. It&rsquo;s important to specify what is negotiable and what isn&rsquo;t as you start the project.</p> <p>For example, during the early days of the project, teams might be allowed to merge code without full reviews or complete test coverage. As time goes on, the team might tighten the rules, requiring stricter reviews and increased test coverage. Teams might also forego performance optimizations during the early stages so that the base functionality gets done quickly.</p> <p>Some good ground rule strategies include:</p> <ol> <li><strong>Get agreement before starting.</strong> Set some milestones and the requirements at each step. Be sure that each team member knows what is expected of them whether they&rsquo;re proposing a change or reviewing that change before a merge.</li> <li><strong>Set build versus buy requirements.</strong> It&rsquo;s often much easier to add modules or libraries written by other people to speed up the development of a project. Some of these are simple, such as HTTP libraries or logging frameworks. However, if a team is going to adopt something significant, such as a database or a UI framework, set some ground rules about evaluating those options.</li> <li><strong>Identify who is responsible for what.</strong> A <a href="https://en.wikipedia.org/wiki/Responsibility_assignment_matrix" target="_blank" rel="noreferrer">RACI matrix</a> is extremely helpful here. These simple charts identify everyone&rsquo;s role in a project, or for a piece of a project. Setting this up early avoids turf battles or situations where someone feels like they might be stepping on someone else&rsquo;s toes. 
It also identifies the right people for a potential escalation or decision.</li> </ol> <p>Speaking of escalation, let&rsquo;s get to the last section!</p> <h1 id="escalate-early-and-effectively" class="relative group">Escalate early and effectively <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#escalate-early-and-effectively" aria-label="Anchor">#</a></span></h1><p>Most date driven development projects have a lot of visibility, especially with leaders. It&rsquo;s just as critical for leaders to manage the people on the project as it is for those people to manage their own leaders.</p> <p>The last thing any VP wants to hear is that a project is in trouble because the team needed something trivial to complete it. These upward communications are tricky. Every company has its own protocols and culture, and these are sometimes specific to certain parts of a company. However, in my experience, there are some universal things that work well:</p> <ol> <li><strong>Escalate early.</strong> Do not wait. Seriously. If there&rsquo;s a blocker that might prevent the team from delivering, let your leaders know about it. It&rsquo;s much better to escalate several silly things early than it is to escalate one critical thing late.</li> <li><strong>Give a little context.</strong> Explain exactly what is needed and any costs involved. Be sure to include what will happen if the issue is or is not resolved. Although this might require a little explanation of the background, keep it brief and focus on the aspects your leaders care the most about.</li> <li><strong>Prove your assessment.</strong> A good friend of mine said &ldquo;lead with the outcome you want.&rdquo; Once an executive knows the context, they&rsquo;re looking for your expert guidance on what they should do. Empower them to make the right decision by sharing your assessment of the best course of action along with some alternatives and the disadvantages of each.</li> <li><strong>Set a deadline.</strong> Your leaders have tons of decisions to make with varying timelines and priorities. There&rsquo;s a big difference between &ldquo;we need this by the end of the day&rdquo; and &ldquo;we need this by the end of the month.&rdquo;</li> </ol> <p>The <a href="https://major.io/p/raise-the-bar-with-an-sbar/">SBAR</a> document gives you a great way to structure your communication in a concise way that leaders can quickly understand. It gives them an understanding of what you&rsquo;re doing, the problem you&rsquo;re up against, and your assessment of the available options. Just remember to keep it to one page and focus on the things your leaders care about the most.</p> <h1 id="stuff-happens" class="relative group">Stuff happens <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#stuff-happens" aria-label="Anchor">#</a></span></h1><p>No matter what you do, stuff happens. Systems break. People get sick. Plans change.</p> <p>Do your best to be flexible and set the right expectations. One of my children&rsquo;s preschools had a great motto around snack time that applies here:</p> <blockquote> <p>You get what you get and you don&rsquo;t throw a fit.</p></blockquote> <p>Your team can only deliver so much.
Take any shortcomings or feedback as a learning opportunity for next time. Set up retrospective meetings to identify what went well and what didn&rsquo;t. Be sure to share these learnings with other teams, too!</p> <p>Finally, <strong>take time to celebrate the win when you finish.</strong> I recently worked with a great team on a product that launched during a big keynote presentation at a company event and it was an amazing feeling. Sure, our product had plenty of rough edges and room for improvement, but at that moment, we delivered.</p>Scrum, sprints, and outcomeshttps://major.io/p/scrum-sprints-outcomes/Sun, 01 Jun 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/scrum-sprints-outcomes/<p>Most software developers have come across <a href="https://en.wikipedia.org/wiki/Agile_software_development" target="_blank" rel="noreferrer">agile software methodologies</a> such as <a href="https://en.wikipedia.org/wiki/Scrum_%28software_development%29" target="_blank" rel="noreferrer">Scrum</a>. At its core, Scrum&rsquo;s goal is to help teams deliver software in smaller chunks over a set period of time, called a sprint. Teams should be able to work better together, deliver more frequently, and adapt to changes more easily.</p> <p><strong>However, Scrum often becomes a theater of activity &ndash; story points, velocity charts, and ceremony compliance &ndash; that distracts teams from what customers actually need.</strong></p> <h1 id="what-is-scrum" class="relative group">What is Scrum? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-is-scrum" aria-label="Anchor">#</a></span></h1><p>This could be a whole post in itself, but in my experience, Scrum revolves around a few core tenets:</p> <ul> <li><strong>The work.</strong> Development, quality, and research work is time-boxed into a sprint, most often lasting two weeks. You cobble together a list of tasks (or tickets) that you believe you can complete within the sprint.</li> <li><strong>The team.</strong> There&rsquo;s obviously a development team, but there are two other roles involved. The product owner helps with prioritizing work, organizing the backlog of to-do items, and ensuring that the team is working on the right things to deliver value for customers. A scrum master is almost like a specialized project manager who helps the team stay on track, removes roadblocks, and ensures that the team is following Scrum practices.</li> <li><strong>The ceremonies.</strong> I use <em>ceremony</em> here slightly facetiously, but most Scrum teams have a set of meetings that they hold regularly. There are daily standups, sprint planning meetings, sprint reviews, and retrospectives. The goal is to make the next sprint better than the previous one and surface problems.</li> </ul> <p>When this works well, most people on the team are aware of what others are working on and where they might be struggling.
Other adjacent teams, such as marketing or sales, can align their work so that they&rsquo;re fully prepared to bring those products or improvements into customers&rsquo; hands.</p> <p><strong>It&rsquo;s not all a panacea.</strong> Let&rsquo;s discuss why.</p> <h1 id="estimations-and-exploratory-work" class="relative group">Estimations and exploratory work <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#estimations-and-exploratory-work" aria-label="Anchor">#</a></span></h1><p>The world has many, many trades where you can estimate the time and complexity of your work before getting started. If you call a plumber for a leaky pipe, they can estimate the cost and time required to fix it with reasonable accuracy. When your car has a flat tire, the mechanic can tell you how long it will take to fix it and how much it will cost.</p> <p><strong>Software development is inherently <em>exploratory</em>.</strong> That&rsquo;s because it involves building something that didn&rsquo;t exist previously. Sometimes there are tools and libraries that speed up development, but these often have limitations that require research.</p> <p>As an example, I recently worked on a project that involved adding lots of documents to a retrieval-augmented generation (RAG) system. This is a method for giving an AI model access to information to answer questions that it didn&rsquo;t know already.</p> <p>Another team had great results with a particular database and mechanism for adding documents, so I was able to utilize most of their strategy. Everything looked like it would be straightforward. An easy ticket with a low estimate; who doesn&rsquo;t love that?</p> <p>Then things changed.</p> <ul> <li>The database worked well for their use case, but I had no idea that they were working with a small number of high quality documents in a consistent format. My documents had varying structure, varying quality, and we had a lot more of them.</li> <li>Our RAG database build times were much longer than theirs, so we had to search for infrastructure that would allow us to build ours faster.</li> <li>Their embedding model for turning text into vectors for a semantic search worked well for their documents but it didn&rsquo;t work for ours.</li> <li>We found a better database, but then we had to find out how we could deploy and manage it following our company&rsquo;s procedures.</li> </ul> <p>What we thought would be done in a couple of weeks suddenly took a couple of months. We had to come up with new strategies and also think differently about what we would do before and after the RAG search to improve its quality.</p> <p><strong>Time-boxing this work into a sprint was extremely difficult.</strong> Estimation was even more difficult because we didn&rsquo;t know the complexity involved and how long it would take. 
Some of the complexity questions couldn&rsquo;t even be answered because we had dependencies on other teams to complete the work.</p> <p>This means we were breaking Scrum&rsquo;s core tenets constantly to deliver the features:</p> <ul> <li>We added tickets to sprints after the sprint opened</li> <li>We carried lots of issues over to the next sprint as we added new tickets to that sprint</li> <li>Estimates were wildly inaccurate</li> </ul> <p>In the end, we did deliver what was needed, but we often wondered why we were still putting such a heavy emphasis on Scrum.</p> <h1 id="interruptions" class="relative group">Interruptions <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#interruptions" aria-label="Anchor">#</a></span></h1><p>Interruptions are a fact of life. The Linux kernel deals with this via aptly named interrupts, and it receives signals that something needs attention. It could be events coming from a keyboard or mouse, network packets waiting to be processed, or any number of timers running on the system. The kernel is well suited to handle many of these smaller interrupts. Larger ones lead to heavy context switching and these strain the system.</p> <p>Software developers are no different. I work with brand new software developers and interns who need guidance from time to time on how to attack a problem. These interruptions are great! Someone gains a new skill and can do more than they previously could. I also reinforce my own knowledge by teaching them. It&rsquo;s usually on a topic that I know well and often tangentially related to what I&rsquo;m working on anyway.</p> <p>Getting into a meeting with multiple engineers on Google Meet to argue about story points, burn down charts, sprint velocity, and other metrics is a different story entirely.</p> <p>During a two week sprint at various companies, I&rsquo;ve found myself spending a decent amount of time doing things other than software development:</p> <ul> <li><strong>Daily standups.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></strong> Everyone shares what they completed yesterday, what they plan to complete today, and any blockers they have.</li> <li><strong>Sprint planning meetings.</strong> The team brings together their tickets to plan out what will be included in the next sprint. This is often when a sprint is closed and the next one is opened. You have these every two weeks if you&rsquo;re doing two week sprints.</li> <li><strong>Sprint reviews and demonstrations (demos).</strong> Some organizations combine these two, but the goal is to share what was completed during the sprint and demonstrate how any new features work.</li> <li><strong>Retrospective.</strong> Once a sprint has finished, the team meets to discuss what worked, what didn&rsquo;t work, and what needs to be changed in the next sprint. These often occur once the next sprint has already started, so the team is already in the middle of the next sprint while discussing the previous one.</li> </ul> <p>Standups are excellent, especially for teams with new developers. The other meetings can quickly become a burden.</p> <p>If we assume standups are 10 minutes per day and the remaining meetings all last an hour each, that&rsquo;s just under five hours of meetings per two week sprint. That doesn&rsquo;t include other meetings, such as team meetings, company wide meetings, one-on-ones, and mentorship.</p>
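<p>If you want to check that math, here&rsquo;s the quick back-of-the-envelope version (assuming ten working days per sprint and hour-long planning, review/demo, and retrospective sessions):</p> <pre tabindex="0"><code># Rough ceremony overhead for a two-week (ten working day) sprint
standup_minutes = 10 * 10       # a ten-minute standup every working day
ceremony_minutes = 3 * 60       # planning, review/demo, and retro at an hour each
total_minutes = standup_minutes + ceremony_minutes
print(f"{total_minutes} minutes, or about {total_minutes / 60:.1f} hours per sprint")
# 280 minutes, or about 4.7 hours per sprint
</code></pre>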
<p>You might think that&rsquo;s not terrible, but consider that these tickets must be written, estimated, reviewed, and prioritized prior to the sprint planning meeting. Demos must be built and prepared for the sprint review. Retrospectives require the team to think ahead of time about their work before the meeting and then think about how to implement changes after the meeting.</p> <p>These interruptions can quickly stack up. <strong>When you combine software development&rsquo;s exploratory nature with constant ceremony interruptions, you create a recipe for burnout.</strong> We should be focused on delivering an outcome or helping a teammate deliver an outcome. Scheduling that work around sprint ceremonies wastes time and energy.</p> <h1 id="activity-over-outcomes" class="relative group">Activity over outcomes <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#activity-over-outcomes" aria-label="Anchor">#</a></span></h1><p>When you sit down in your new car for the first time or hold that latest smartphone, you&rsquo;re probably not wondering how many sprints it took to build it. The look of the company&rsquo;s velocity or burn down charts probably won&rsquo;t cross your mind.</p> <p><strong>As the customer, you&rsquo;re focused on the outcome.</strong></p> <p>You name it, I&rsquo;ve probably done it at one company or another. Scrum. Waterfall. Continuous flow. Kanban.</p> <p>Any of these turns toxic as soon as the focus is on the activity rather than the outcome. Focusing on the activity means putting tons of weight on the processes, the meetings, and the metrics. It means you <em>say you&rsquo;re interested in the outcomes</em>, but you don&rsquo;t practice that from day to day.</p> <p>If you want to ensure teams deliver process compliance instead of customer value, lean in really hard on the agile process. Developers work around these processes by doing quite a few unhelpful things:</p> <ul> <li>Avoiding ticket creation entirely or creating far too many tickets.</li> <li>Locking themselves into a solution prematurely to ensure something gets done within the sprint.</li> <li>Sandbagging estimates to buy time or make things fit into a sprint.</li> <li>Redefining what &ldquo;done&rdquo; means and then adding bugs or refinements to later sprints.</li> <li>Turning Scrum processes into &ldquo;meeting theater&rdquo; where everyone goes through the motions but nobody really cares about the outcome.</li> </ul> <p>None of these benefit the team, the leaders, or the end customer. They also push developers to look for other teams or other companies that are more focused on outcomes.</p> <p><strong>The honest answer is that nobody knows when software will ship.</strong> No matter what methodology you use or how hard you push on the agile process, you can&rsquo;t predict the future<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>. Software development is complex, customer demands change constantly, and new technologies emerge daily. What makes sense on day one of the sprint may not make sense on day 14.</p> <p>It doesn&rsquo;t have to be this way.</p> <h1 id="what-do-we-do" class="relative group">What do we do?
<span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-do-we-do" aria-label="Anchor">#</a></span></h1><p><strong>Everything that a team does must be focused on outcomes.</strong> This isn&rsquo;t the activity that delivers an outcome, but the outcome itself.</p> <p>In the past, I worked on a team where we ran Kanban instead of Scrum and we had a &ldquo;theme&rdquo; that we worked toward. Kanban is more of a continuous flow methodology with a limit on work in progress items and without a defined time for work. It wasn&rsquo;t the methodology that made us successful. It was the focus on the theme.</p> <p>For example, we had some themes such as &ldquo;deliver feature X&rdquo; or &ldquo;improve cost efficiency of Y to x%&rdquo;. Everyone on the team, including developers, quality engineers, and documentation experts all knew the goal. We could work on whatever we needed to in order to achieve that goal but we could not exceed our work in progress (WIP) limits.</p> <p>As you might expect, someone said <em>&ldquo;Hey, what happens if we hit the WIP limit?&rdquo;</em> Our astute manager at the time knew she had hired talented people who are great at solving tough problems and she was ready with her answer: <em>&ldquo;That&rsquo;s for you to figure out.&rdquo;</em></p> <p>Something interesting happened when we hit the WIP limit for the first time. Sure, the column in Jira turned red and someone mentioned it in Slack, but that isn&rsquo;t what I&rsquo;m talking about. Someone was freed up on their task and realized they couldn&rsquo;t pull anything else into the &ldquo;in progress&rdquo; column.</p> <p>They looked in the column for a minute and discovered an issue that was really familiar to them, but it was assigned to someone else. They asked the person working on it if they needed help and the person assigned to the ticket said: <em>&ldquo;Yeah, I think I&rsquo;m stuck!&rdquo;</em> It turned out to be a great teaching and mentorship opportunity.</p> <p>We saw several benefits from being focused on the outcome and not the activity:</p> <ul> <li>The &ldquo;stuck&rdquo; issue didn&rsquo;t appear in the daily standup because the developer was afraid to raise it. That fear was solved by another developer joining in when the WIP limit was hit.</li> <li>Team members focused heavily on the &ldquo;in progress&rdquo; column and we discovered that we didn&rsquo;t need standups as often. The goal changed to &ldquo;let&rsquo;s figure out these stuck issues&rdquo; organically.</li> <li>The theme was our &ldquo;rally cry&rdquo; going forward. We all knew what we were working towards and why we were doing it.</li> <li>Estimating issues turned into more of a discussion of what was involved in solving the issue instead of an exercise in futility.</li> <li>Our product owner only needed to ensure the &ldquo;to do&rdquo; column was cleaned up and prioritized. Everyone knew where they needed to pull work from first.</li> </ul> <p>We still had date constraints, but the <strong>dates were related to when we needed to deliver something to the customer</strong>, not arbitrary dates that we set for ourselves. We all knew the dates, the goal, and the overall mission. 
We knew the what, the how, and the why.</p> <h1 id="summary" class="relative group">Summary <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#summary" aria-label="Anchor">#</a></span></h1><p>Scrum often transforms software teams into process performers rather than problem solvers. Lighter processes that reinforce the <em>right behaviors</em> deliver more value for teams and help them focus on outcomes. When teams are focused on outcomes, they can adapt to changes, solve problems, and deliver more value to customers. When teams know their &ldquo;why,&rdquo; they&rsquo;ll figure out the &ldquo;what&rdquo; and &ldquo;how&rdquo; without rigid processes.</p> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>There are synchronous standups (everyone gets in a meeting together at the same time) and asynchronous ones (everyone puts their updates in a central place for later review). The async standups definitely save time, but then you lose the ability to ask questions and many developers forget to read the updates.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2"> <p>There is a concept of &ldquo;date driven development&rdquo; where something must ship on time, and in that case, you can drop features or capabilities to ensure on time delivery. You just can&rsquo;t be sure how many features and capabilities the product will have when it ships.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>Vibe-free coding with AIhttps://major.io/p/vibe-free-coding-with-ai/Wed, 07 May 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/vibe-free-coding-with-ai/<p>The internet has been in quite a ruckus about <em>vibe coding</em> recently. Heck, there&rsquo;s already a <a href="https://en.wikipedia.org/wiki/Vibe_coding" target="_blank" rel="noreferrer">Wikipedia</a> page about it! It must be real if it has a Wikipedia page! 😆</p> <p>Long story short, vibe coding involves asking a large language model (LLM), the foundation of an AI platform, to write code for you. LLMs are actually quite good at writing code. It turns out that they&rsquo;re often terrible at understanding nuance or stitching complex code relationships together.</p> <p>I use AI all the time when I write software, but I wouldn&rsquo;t call it <em>vibe coding</em>. LLMs in my development environment help me catch errors more quickly, highlight improvements, and reduce time spent on very tedious tasks.</p> <p>This post covers my use cases for AI while writing code and my current setup.</p> <h1 id="my-setup" class="relative group">My setup <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#my-setup" aria-label="Anchor">#</a></span></h1><p>About 90% of my development happens in Visual Studio Code, or VS Code. The rest is in vim.
There&rsquo;s a helpful <a href="https://code.visualstudio.com/docs/copilot/overview" target="_blank" rel="noreferrer">GitHub Copilot extension for VS Code</a> that integrates really well to help with errors, give auto-complete suggestions, and answer questions.</p> <p>GitHub offers a totally free GitHub Copilot subscription, but if you&rsquo;re a maintainer of a popular open source project or an avid open source contributor, they currently offer <a href="https://docs.github.com/en/copilot/managing-copilot/managing-copilot-as-an-individual-subscriber/getting-started-with-copilot-on-your-personal-account/about-individual-copilot-plans-and-benefits" target="_blank" rel="noreferrer">Pro subscriptions</a> at no cost. This free Pro offering could change at any time, so be prepared for that. ๐Ÿ˜‰</p> <h1 id="ai-use-cases" class="relative group">AI use cases <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#ai-use-cases" aria-label="Anchor">#</a></span></h1><p>I use AI in several different ways when I write code.</p> <h2 id="fixing-errors" class="relative group">Fixing errors <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#fixing-errors" aria-label="Anchor">#</a></span></h2><p>Most of my development is done in Python and I&rsquo;m almost always coding with <code>ruff</code> and <code>mypy</code>. This means that I run into some linting problems from time to time and sometimes I forget to put a type annotation for functions or arguments. Existing non-AI plugins catch most of these mistakes and usually the fixes are easy. Sometimes they&rsquo;re difficult.</p> <p>For those difficult times, it&rsquo;s handy to ask for some AI help, especially when I haven&rsquo;t used a specific python module before. VS Code gives me a little sparkle emoji underneath the function name and offers to fix the problem for me.</p> <p>This has helped recently at work as I&rsquo;ve worked with llama-index extensively and determining which type is returned from a function can be challenging. Sometimes it&rsquo;s a base model being returned, but sometimes it&rsquo;s a different class that inherits a base class. Sure, I could dig through documentation or wade through the llama-index code for that, but that&rsquo;s tedious work. I&rsquo;d rather get a suggestion from Copilot and confirm that it&rsquo;s correct. That&rsquo;s a lot easier than digging through the code to find the right answer.</p> <h2 id="understanding" class="relative group">Understanding <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#understanding" aria-label="Anchor">#</a></span></h2><p>How many times have you been working on a complex project and you think:</p> <blockquote> <p>Who wrote this? I have no idea what&rsquo;s going on. Why is this even here?</p></blockquote> <p>I&rsquo;ll often highlight the code in question and ask for an interpretation from the LLM. 
Since Copilot has access to more files in the project, it&rsquo;s able to connect the dots with functions and methods from other files.</p> <p>This helps me better understand the project in less time. It also helps me learn new methods for doing things (even if those methods might be terrible).</p> <h2 id="testing" class="relative group">Testing <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#testing" aria-label="Anchor">#</a></span></h2><p>Errors sometimes still sneak into the code even with careful linting and strict type checking. I&rsquo;ll often ask Copilot to write a test for a function for me with branch coverage. Branch coverage catches those situations where code might be skipped based on an <code>if</code>/<code>else</code> clause and it ensures you&rsquo;re testing all possible code paths. Sometimes Copilot will write a test that checks a condition I didn&rsquo;t consider. That&rsquo;s a great opportunity to return to the original code and think through the logic once more.</p> <p>There are other situations where a test fails and it&rsquo;s difficult to understand why. I&rsquo;ll usually bring up the test on the right and the code on the left to think through the code path in my head. Then there are those times where I ask Copilot to give a suggestion and it points to the exact spot in my code where I didn&rsquo;t consider a specific condition. Strings and byte strings catch me offsides quite often. 🤭</p> <h2 id="improvements" class="relative group">Improvements <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#improvements" aria-label="Anchor">#</a></span></h2><p>We&rsquo;ve all been in that situation where we know what needs to be done, we write a few functions, and then think &ldquo;Gosh, that seems like too much code&rdquo; or &ldquo;That looks convoluted&rdquo;. I&rsquo;ll sometimes ask Copilot for a suggestion to simplify a function or block of code. I rarely take the whole suggestion, but it reminds me of patterns I&rsquo;ve forgotten or it introduces me to new ones I haven&rsquo;t seen before.</p> <p>As an example, I was working on some async Python code that needed to be wrapped in a timeout. The timeout seemed to be working fine, but then the original function being awaited kept running until it timed out later. That caused exceptions to be reported in Sentry and it was extremely annoying.</p> <p>What I really needed was a way to stop the awaited function as soon as the timeout was reached.</p> <p>I asked Copilot for help with something like:</p> <blockquote> <p>This function keeps running after the timeout is reached and it causes another exception. I need to kill the awaited function as soon as the timeout is reached.</p></blockquote> <p>Sure enough, Copilot came back with a suggestion to use <a href="https://docs.python.org/3/library/asyncio-task.html#asyncio.wait_for" target="_blank" rel="noreferrer"><code>asyncio.wait_for</code></a>. I&rsquo;d never seen that before! The docs highlighted the big difference between <code>wait</code> and <code>wait_for</code>:</p> <blockquote> <p>If a timeout occurs, it cancels the task and raises TimeoutError.</p></blockquote> <p>Perfect! 🎉</p>
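<p>Here&rsquo;s a minimal sketch of that pattern. The <code>slow_call</code> function is just a stand-in for the real awaited work, not code from the actual project:</p> <pre tabindex="0"><code>import asyncio

async def slow_call():
    # Stand-in for the awaited function that kept running too long.
    await asyncio.sleep(60)
    return "done"

async def main():
    try:
        # wait_for cancels slow_call() if it isn't finished within 5 seconds,
        # so nothing keeps running in the background after the timeout.
        print(await asyncio.wait_for(slow_call(), timeout=5))
    except asyncio.TimeoutError:
        print("timed out and the awaited task was cancelled")

asyncio.run(main())
</code></pre>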
<h2 id="tedious-tasks" class="relative group">Tedious tasks <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#tedious-tasks" aria-label="Anchor">#</a></span></h2><p>There are many occasions where Copilot guesses what I&rsquo;m thinking next, especially as I build out scaffolding for a new class or write documentation for a function. As an example, I was writing an OpenShift template last month and I was bringing over some templated <code>Deployment</code> and <code>Service</code> definitions. I customized the template with lots of variables for the image source, environment variables, and volume mounts.</p> <p>OpenShift templates have a long section at the bottom where you define the default values for the variables used in your template. As I began typing the first variable name, Copilot suggested the variable name, a short definition, and the default value. The default value was correct, but the description was a little off. After a quick fix of the description, I moved to the next variable and Copilot started filling in the next one from the template. I gradually just tab completed the remainder of the file.</p> <p>Many of the descriptions needed some tweaks here and there and the default values needed to be updated. However, all of that structure that I&rsquo;d be copying and pasting repeatedly was done for me. That saved me plenty of time and reduced the chance of making a syntax error.</p> <h1 id="conclusion" class="relative group">Conclusion <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#conclusion" aria-label="Anchor">#</a></span></h1><p>GitHub Copilot in VS Code feels like a partner that is looking over my shoulder as I work. I can choose when to engage with a suggestion or ask for additional help. It&rsquo;s also a great way to get un-stuck when you&rsquo;re in a tight spot.</p> <p><strong>None of this is a replacement for fully understanding what you&rsquo;re being asked to write and being knowledgeable about how your application fits together.</strong></p> <p>Some might argue that an AI coding assistant is some kind of <em>crutch</em><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. I&rsquo;d argue that this depends completely on how you use it. If you use it to write code for you without understanding the code, then yes, it&rsquo;s a crutch.</p> <p>If you use it to help you understand the code, catch errors, and improve your code, then it&rsquo;s a useful virtual partner in your coding adventures. 🧗‍♂️</p> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>In the realm of US English, a crutch is a device that helps you walk when you have an injury. Many people use it as a metaphor for something that helps you do something you can&rsquo;t do on your own. I&rsquo;ve had people tell me that using VS Code is a crutch.
Some people even say that <strong>vim</strong> is a crutch.</p> <p><em>&ldquo;You should just use <code>ed</code> and enjoy it!&rdquo;</em> 😆&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>Don't tell me RAG is easyhttps://major.io/p/dont-tell-me-rag-is-easy/Fri, 18 Apr 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/dont-tell-me-rag-is-easy/<p>Blog posts have been moving slowly here lately and much of that is due to work demands since the end of 2024. I&rsquo;ve been working on an AI-related product with a talented team of people and we learned plenty of lessons about retrieval-augmented generation, or RAG.</p> <p>This post covers the basics of RAG, some assumptions I made, and what I&rsquo;ve learned.</p> <h2 id="what-is-rag" class="relative group">What is RAG? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-is-rag" aria-label="Anchor">#</a></span></h2><p>Large language models, or LLMs, are trained on huge amounts of information. This information could come from just about anywhere, including books, online resources, and even this blog! There are plenty of ethical questions here, especially around LLM providers that train their models on copyrighted or otherwise restricted material. They gain ground on their competitors in the short run, but this is not ideal in the long run.</p> <p>Sometimes training a model isn&rsquo;t feasible. That&rsquo;s where RAG comes in.</p> <p><strong>Training a model is <em>expensive</em>.</strong> It requires lots of very expensive hardware that consumes a significant amount of electricity. This makes RAG ideal for speeding up development at a lower cost. Developers can quickly update or change RAG data for information that changes rapidly, such as sports statistics, and it avoids the hassle of constantly training a model on new information.</p> <p>A very simple workflow for RAG would be something like this:</p> <ol> <li>Someone asks a question</li> <li>Search for relevant information in your RAG database</li> <li>Add the question and the RAG context to a prompt for the LLM</li> <li>Send the whole prompt to the LLM for inference</li> </ol> <p><strong>Step two is incredibly difficult.</strong></p> <p>If you are embarking on the RAG journey for the first time, there&rsquo;s a lot you need to know at a high level. Let&rsquo;s get started.</p> <h2 id="start-with-high-quality-documents" class="relative group">Start with high quality documents <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#start-with-high-quality-documents" aria-label="Anchor">#</a></span></h2><p>Imagine that you&rsquo;re one of the top chefs in the world. You work in a restaurant with <a href="https://en.wikipedia.org/wiki/Michelin_Guide" target="_blank" rel="noreferrer">Michelin stars</a>. The restaurant is opening for a special Saturday night dinner and your plan is to serve a delicious piece of fish for each guest. You already know how you plan to season the fish and how you&rsquo;re going to cook it. Your sauces are all ready.</p> <p>The fish arrives and when the cooler opens, you gasp.
<em><strong>&ldquo;What is that smell?&rdquo;</strong></em> ๐Ÿ˜ฑ</p> <p><strong>What do you do?</strong> You have hungry guests on the way. Your sauce is exquisite and you know you can cook the fish perfectly. But how are you supposed to deal with fish that has gone bad along the way?</p> <p>This is likely the first step in your RAG journey: <strong>source document quality.</strong></p> <p>You might run into quality issues like these:</p> <ul> <li>Widely varying document structures or markup</li> <li>Documents written in other languages, or written in a language that isn&rsquo;t the author&rsquo;s primary language</li> <li>Large blocks of low readability text, such as kernel core dumps, command line output, or diagrams, that are difficult to parse</li> <li>Boilerplate language across multiple documents</li> <li>Metadata at the front of the document or scattered throughout</li> <li>No structure, markup, or boundaries whatsoever</li> <li>Incorrect, outdated, or problematic information</li> </ul> <p>This is a <strong>garbage in, garbage out</strong> problem. RAG can help you match documents to user questions, but if the documents steer the user in the wrong direction, the outcome is terrible.</p> <p>You have a few options:</p> <ul> <li>Engage with the groups who created or currently maintain the information to make improvements</li> <li>Use an LLM to summarize or extract information from the documents (some LLMs are good at building FAQs from documents)</li> <li>Find the highest quality documents in the group and start with those</li> </ul> <p>Once you have a document quality plan together, it&rsquo;s time for the next step.</p> <h2 id="getting-documents-ready-for-search" class="relative group">Getting documents ready for search <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#getting-documents-ready-for-search" aria-label="Anchor">#</a></span></h2><p>You have plenty of options at this step, but I&rsquo;ve had a good amount of luck with a hybrid search approach. This combines a vector (semantic) search along with a traditional full text search.</p> <p><strong>Vector searches aim to capture the <em>meaning</em> behind the search rather than just looking for keywords.</strong> They examine how words are positioned in a sentence and which words are closest together. These searches require a step where you convert a string of text into vectors and this can be time consuming on slower hardware.</p> <p><strong>Keyword, or full-text, searches are cheaper and easier to run.</strong> They&rsquo;re great for matching specific keywords or an exact phrase.</p> <p>When you combine both of these together, you get the best of both worlds.</p> <p>The challenge here is that you need an embedding model to convert strings into a list of vectors. Every embedding model, like an inference model, has a limit on how much text that it can turn into vectors in one shot. This is called the <em>context window</em> and it varies from model to model.</p> <p>If a model has a 350 token context window, that means it can only handle 350 tokens (close to 350 words) before the model overflows. If you put 450 tokens into this example model, it vectorizes the first 350 and skips the remaining 100. 
This means you can only do a vector search across the first 350 tokens.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p> <p>This is where chunking comes in. You need to split your documents into chunks so that they fit within the context window of the embedding model. However, there are advantages and disadvantages to larger or smaller chunks:</p> <ul> <li>Larger chunks preserve more of the text from your documents and make it easier to find document/chunk relationships. Your vector database is a little smaller and you can get better results from queries that are more broad.</li> <li>Smaller chunks give you a more precise retrieval and save you money at inference time since you&rsquo;re sending a little less context to the LLM. They&rsquo;re better for specific questions and they lower the risk of hallucination since you&rsquo;re providing context that is more specific.</li> </ul> <p>There are <em>plenty of options</em> to review here, especially with how you create chunks and set overlaps between the chunks. I really like where <a href="https://docs.unstructured.io/open-source/core-functionality/overview" target="_blank" rel="noreferrer">unstructured</a> is headed with their open source library. You can partition via different methods depending on the document type and then split within the partitions.</p> <p>This avoids situations where you might put a piece of chapter one with chapter two just because that&rsquo;s where the chunks happened to split. Partitioning on chapters first and then splitting into chunks keeps the relevant information together better.</p> <h2 id="time-to-search" class="relative group">Time to search <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#time-to-search" aria-label="Anchor">#</a></span></h2><p>I&rsquo;ve talked about hybrid searches already, but there&rsquo;s an <a href="https://pamelafox.github.io/my-py-talks/pgvector-python/" target="_blank" rel="noreferrer">excellent guide from Pamela Fox</a> that gives you a deep dive into RAG from &ldquo;I know nothing&rdquo; to &ldquo;I can do things!&rdquo; in 17 slides. This is a great way to visualize what is actually happening behind the scenes for each search type. Be prepared for calculus! ๐Ÿค“</p> <p>In a perfect world, your search results should:</p> <ul> <li>Return the smallest amount of matching chunks to avoid confusing the LLM (and to consume fewer tokens)</li> <li>Only provide chunks that are relevant to the user&rsquo;s question</li> <li>Contain all of the needed steps for a process (give all steps of a recipe instead of just the second half)</li> <li>Complete very quickly to avoid keeping the user waiting</li> </ul> <p>This can be tricky with certain languages, especially English. For example, if someone asks &ldquo;How do I keep a bat from flying away?&rdquo;, what are they talking about?</p> <ul> <li>Are they near a bat (the flying mammal) that they want to trap and keep from flying away? ๐Ÿฆ‡</li> <li>Are they playing a game of baseball and the bat keeps slipping from their hands as they swing? โšพ</li> </ul> <p>Vector searches help a lot with these confusing situations, but they&rsquo;re not perfect. 
Let&rsquo;s look at a way to improve them next.</p> <h2 id="refine-the-users-question" class="relative group">Refine the user&rsquo;s question <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#refine-the-users-question" aria-label="Anchor">#</a></span></h2><p>If you can budget some extra tokens to refine the user&rsquo;s question prior to searching, you can improve your RAG searches substantially.</p> <p>Let&rsquo;s keep the last example going and assume you have an AI service that handles baseball questions. In your case, you could take the user&rsquo;s question and clarify it using an LLM. Here&rsquo;s an example prompt (that is totally untested):</p> <pre tabindex="0"><code>You are a helpful AI assistant and an expert in the game of baseball. Use the question provided below to create five questions which are more specific and relevant to baseball terminology. Question: How do I keep a bat from flying away? </code></pre><p>The LLM might reply with something like this:</p> <pre tabindex="0"><code>1. How can I keep my hands tacky so the bat will stay in my hands when I swing? 2. How can I keep a better grip on a bat when my hands are sweaty? 3. What coverings or other materials can I add to a bat to get a better grip? 4. Can I adjust my swing to avoid losing my grip on the bat? 5. Are there exercises I can do to avoid letting the bat go when I swing? </code></pre><p>These refined questions offer more ways for vector searches to match documents since there is more meaning to search through. LLMs often add more relevant keywords to the questions and that can be helpful for keyword searches, too.</p> <p>You can then adjust your workflow to something like this:</p> <ol> <li>Receive a question from the user</li> <li>Refine the question using an LLM</li> <li>Use the refined questions to search the RAG database</li> <li>Add the context and the <strong>original question</strong> to the prompt (we want to maintain the user&rsquo;s original intent!)</li> <li>Send the prompt to the LLM</li> </ol> <p>LLMs can hallucinate and make bad choices, so always send the user&rsquo;s original question to the LLM rather than the refined questions. If the LLM hallucinates on the question refinement step, the issue is contained to the RAG search rather than the RAG search and inference.</p>
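<p>If it helps to see the shape of that flow, here&rsquo;s a rough sketch. The <code>refine</code>, <code>search</code>, and <code>generate</code> callables are stand-ins for whatever LLM client and vector database you already use, not any specific library:</p> <pre tabindex="0"><code>def answer(question, refine, search, generate):
    """Refine-then-search RAG flow.

    refine(question)            - one LLM call, returns a list of rewrites
    search(question)            - hybrid search, returns a list of text chunks
    generate(question, context) - final LLM call with the retrieved context
    """
    refined_questions = refine(question)

    chunks = []
    for refined in refined_questions:
        chunks.extend(search(refined))

    # De-duplicate while keeping order (assumes chunks are plain strings).
    context = list(dict.fromkeys(chunks))

    # Send the ORIGINAL question, not the rewrites, so the user's intent
    # survives even if the refinement step hallucinated.
    return generate(question, context)
</code></pre>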
<h2 id="extra-credit" class="relative group">Extra credit <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#extra-credit" aria-label="Anchor">#</a></span></h2><p>There are plenty of ways to tweak this process depending on your time and budget.</p> <h3 id="categories" class="relative group">Categories <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#categories" aria-label="Anchor">#</a></span></h3><p>You can break your documents into categories and infer what the user is asking about to narrow your search. Imagine you needed to answer questions for all sports. You might break up your documents into categories for each sport and you could limit your searches to that sport. An LLM might be able to help you quickly determine which sport the user is asking about and then you can narrow your RAG search to only that sport.</p> <h3 id="get-the-whole-document" class="relative group">Get the whole document <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#get-the-whole-document" aria-label="Anchor">#</a></span></h3><p>Let&rsquo;s say you do a RAG hybrid search and your top 10 results bring back 7 chunks from the same document. In that case, that entire document or portion of the document is likely really useful for the user. You might want to build in some functionality that retrieves the whole document, or perhaps the whole chapter/section, in these situations and sends it to the LLM.</p> <h3 id="prioritize-documents" class="relative group">Prioritize documents <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#prioritize-documents" aria-label="Anchor">#</a></span></h3><p>Certain documents in your collection might have a higher priority than others. For example, if you have support teams that track which documents they refer customers to most often, those documents should be weighted higher in your results. You might be able to examine web traffic to tell you which articles on your site are accessed most often. Those documents would be great for a higher weight.</p>
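<p>One simple way to apply that kind of boost is a small re-ranking step after the search. The hit structure and the <code>boost</code> field here are made up for illustration; use whatever shape your search results already have:</p> <pre tabindex="0"><code># Toy re-ranking step: nudge frequently referenced documents up the list.
# Each hit is assumed to look like {"doc_id": ..., "score": ..., "boost": ...},
# where "boost" comes from support usage, web traffic, or another signal.

def rerank(hits, boost_weight=0.25):
    def boosted_score(hit):
        return hit["score"] * (1 + boost_weight * hit.get("boost", 0))
    return sorted(hits, key=boosted_score, reverse=True)
</code></pre>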
<h3 id="links-to-source" class="relative group">Links to source <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#links-to-source" aria-label="Anchor">#</a></span></h3><p>Document quality issues might lead you to put short summaries of documents in your RAG database. Adding a source link to the original page is helpful because an LLM would be able to answer the question at a high level and refer the user to a specific page for a deeper dive.</p> <h3 id="raft" class="relative group">RAFT <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#raft" aria-label="Anchor">#</a></span></h3><p>RAG and fine tuning (RAFT) is another good option if your budget allows for it. You can train your LLM on your high quality documents (quality is more important with RAFT) and provide additional documents via RAG when needed. For example, you might fine tune your model on sports data from last season and earlier. You could then provide the current season&rsquo;s data via RAG. This gives you great responses on historical data while allowing you to quickly update your current data with the latest games.</p> <h2 id="summary" class="relative group">Summary <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#summary" aria-label="Anchor">#</a></span></h2><p><strong>RAG isn&rsquo;t easy.</strong> 😂</p> <p>Don&rsquo;t let anyone tell you it&rsquo;s easy. You would never take a stack of documents, cut them into pieces, throw them into a box, and put them in front of a brand new hire at your company. Don&rsquo;t try to do the same thing with RAG. RAG is a quick way to find all of the places in your organization where old data has been ignored. 🤭</p> <p>Start with high quality documents that have meaningful, relevant information for your users. Get them into a database where you can search them quickly and efficiently.</p> <p>Once they&rsquo;re in the database, get creative about how you search them. Simply returning relevant chunks is a good start. From there, look to see how you can take those results and expand upon them.</p> <p>Good luck in your adventure! 🧗‍♂️</p> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>Lots of hand-waving happening here. 😆 Every model counts tokens differently and you may need to look for a specific tokenizer to know how many tokens can fit within the window.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>Viewing Xorg logs with journalctl in Fedorahttps://major.io/p/xorg-logs-fedora-journalctl/Sun, 16 Feb 2025 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/xorg-logs-fedora-journalctl/<p>I love being an early adopter and trudging off into the unknown. After all, that&rsquo;s one of the best ways to learn new things and you end up improving the experience for everyone who comes behind you. However, things can get a little frustrating from time to time, especially when your daily work dictates that your desktop works really well. 😉</p> <p>Sway has been my desktop of choice for a few years and although it seems to work well, I ran into lots of issues with Wayland. It was easy to plot a course around most of these problems, but not all of them.</p> <p>I&rsquo;ve recently run back to safety with my old, trusty i3 window manager in Xorg. Then I realized a few of my Xorg configurations weren&rsquo;t taking effect and I couldn&rsquo;t figure out how to isolate the Xorg logs in the system journal to narrow down the problem.</p> <p>Skip to the end if you&rsquo;re short on time or peruse the next section if you haven&rsquo;t been deep in the innards of your system journal in a while. 🔍</p> <h2 id="journal-metadata" class="relative group">Journal metadata <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#journal-metadata" aria-label="Anchor">#</a></span></h2><p>Every journal entry in journald has metadata attached to it which you can use to filter the logs.
Most people know about filtering based on systemd services, like this:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; journalctl --boot --unit chronyd.service | head </span></span><span class="line"><span class="cl">systemd[1]: Starting chronyd.service - NTP client/server... </span></span><span class="line"><span class="cl">chronyd[2662]: chronyd version 4.6.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 +DEBUG) </span></span><span class="line"><span class="cl">chronyd[2662]: Using leap second list /usr/share/zoneinfo/leap-seconds.list </span></span><span class="line"><span class="cl">chronyd[2662]: Frequency -3.595 +/- 6.086 ppm read from /var/lib/chrony/drift </span></span><span class="line"><span class="cl">chronyd[2662]: Loaded seccomp filter (level 2) </span></span><span class="line"><span class="cl">systemd[1]: Started chronyd.service - NTP client/server. </span></span><span class="line"><span class="cl">chronyd[2662]: Added source 192.168.10.1 </span></span><span class="line"><span class="cl">chronyd[2662]: Selected source 208.67.72.50 (2.fedora.pool.ntp.org) </span></span><span class="line"><span class="cl">chronyd[2662]: System clock TAI offset set to 37 seconds </span></span><span class="line"><span class="cl">chronyd[2662]: Selected source 173.73.96.68 (2.fedora.pool.ntp.org) </span></span></code></pre></div><p>This command shows all of the messages from the <code>chronyd</code> service since the last boot. However, we can get much more specific with our filtering using other criteria.</p> <h2 id="examining-metadata" class="relative group">Examining metadata <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#examining-metadata" aria-label="Anchor">#</a></span></h2><p>You can examine the metadata behind each log line with the json output in journalctl:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; journalctl --boot --unit chronyd.service -o json -n 1 | jq </span></span><span class="line"><span class="cl">{ </span></span><span class="line"><span class="cl"> &#34;_CMDLINE&#34;: &#34;/usr/sbin/chronyd -F 2&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_CGROUP&#34;: &#34;/system.slice/chronyd.service&#34;, </span></span><span class="line"><span class="cl"> &#34;_MACHINE_ID&#34;: &#34;xxxxx&#34;, </span></span><span class="line"><span class="cl"> &#34;_UID&#34;: &#34;990&#34;, </span></span><span class="line"><span class="cl"> &#34;SYSLOG_TIMESTAMP&#34;: &#34;Feb 16 13:55:59 &#34;, </span></span><span class="line"><span class="cl"> &#34;__SEQNUM_ID&#34;: &#34;c94633ee6da2480ca4602ca6ab47f82a&#34;, </span></span><span class="line"><span class="cl"> &#34;_PID&#34;: &#34;2662&#34;, </span></span><span class="line"><span class="cl"> &#34;PRIORITY&#34;: &#34;6&#34;, </span></span><span class="line"><span class="cl"> &#34;_HOSTNAME&#34;: &#34;zorro&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_SLICE&#34;: &#34;system.slice&#34;, </span></span><span class="line"><span class="cl"> &#34;SYSLOG_FACILITY&#34;: &#34;3&#34;, </span></span><span class="line"><span class="cl"> &#34;_GID&#34;: &#34;989&#34;, </span></span><span 
class="line"><span class="cl"> &#34;_SYSTEMD_INVOCATION_ID&#34;: &#34;156fce8836374564b01aeb6628160ccb&#34;, </span></span><span class="line"><span class="cl"> &#34;__CURSOR&#34;: &#34;s=c94633ee6da2480ca4602ca6ab47f82a;i=19fb3a;b=ed9e1fccb1744136a3d726bbf2425388;m=33abfdf67;t=62e47cbf2b72f;x=f9a8d4e3bc4fef30&#34;, </span></span><span class="line"><span class="cl"> &#34;__MONOTONIC_TIMESTAMP&#34;: &#34;13870554983&#34;, </span></span><span class="line"><span class="cl"> &#34;_SOURCE_REALTIME_TIMESTAMP&#34;: &#34;1739735759501027&#34;, </span></span><span class="line"><span class="cl"> &#34;_TRANSPORT&#34;: &#34;syslog&#34;, </span></span><span class="line"><span class="cl"> &#34;_EXE&#34;: &#34;/usr/sbin/chronyd&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_UNIT&#34;: &#34;chronyd.service&#34;, </span></span><span class="line"><span class="cl"> &#34;SYSLOG_IDENTIFIER&#34;: &#34;chronyd&#34;, </span></span><span class="line"><span class="cl"> &#34;_BOOT_ID&#34;: &#34;ed9e1fccb1744136a3d726bbf2425388&#34;, </span></span><span class="line"><span class="cl"> &#34;__REALTIME_TIMESTAMP&#34;: &#34;1739735759501103&#34;, </span></span><span class="line"><span class="cl"> &#34;__SEQNUM&#34;: &#34;1702714&#34;, </span></span><span class="line"><span class="cl"> &#34;_RUNTIME_SCOPE&#34;: &#34;system&#34;, </span></span><span class="line"><span class="cl"> &#34;SYSLOG_PID&#34;: &#34;2662&#34;, </span></span><span class="line"><span class="cl"> &#34;_CAP_EFFECTIVE&#34;: &#34;2000400&#34;, </span></span><span class="line"><span class="cl"> &#34;MESSAGE&#34;: &#34;Selected source 173.73.96.68 (2.fedora.pool.ntp.org)&#34;, </span></span><span class="line"><span class="cl"> &#34;_COMM&#34;: &#34;chronyd&#34;, </span></span><span class="line"><span class="cl"> &#34;_SELINUX_CONTEXT&#34;: &#34;system_u:system_r:chronyd_t:s0&#34; </span></span><span class="line"><span class="cl">} </span></span></code></pre></div><p>The most helpful one for us is <code>_COMM_</code>. We can use it to limit our search solely to Xorg logs.</p> <p>Every Xorg startup has a line with the Xorg version that looks like this: <code>X.Org X Server 1.21.1.15</code>. 
Let&rsquo;s search for that:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; journalctl --boot -o json | grep -i &#34;x.org x server&#34; | jq </span></span><span class="line"><span class="cl">{ </span></span><span class="line"><span class="cl"> &#34;_HOSTNAME&#34;: &#34;zorro&#34;, </span></span><span class="line"><span class="cl"> &#34;_BOOT_ID&#34;: &#34;ed9e1fccb1744136a3d726bbf2425388&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_INVOCATION_ID&#34;: &#34;5cf933063b1246909d4ea15e7154bff4&#34;, </span></span><span class="line"><span class="cl"> &#34;_MACHINE_ID&#34;: &#34;xxxxx&#34;, </span></span><span class="line"><span class="cl"> &#34;__CURSOR&#34;: &#34;s=c94633ee6da2480ca4602ca6ab47f82a;i=19de06;b=ed9e1fccb1744136a3d726bbf2425388;m=4597347;t=62e2f4fcc855d;x=b4f482006e1523ff&#34;, </span></span><span class="line"><span class="cl"> &#34;__MONOTONIC_TIMESTAMP&#34;: &#34;72971079&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_SESSION&#34;: &#34;2&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_SLICE&#34;: &#34;user-1000.slice&#34;, </span></span><span class="line"><span class="cl"> &#34;_SELINUX_CONTEXT&#34;: &#34;system_u:system_r:xdm_t:s0-s0:c0.c1023&#34;, </span></span><span class="line"><span class="cl"> &#34;__SEQNUM_ID&#34;: &#34;c94633ee6da2480ca4602ca6ab47f82a&#34;, </span></span><span class="line"><span class="cl"> &#34;_AUDIT_LOGINUID&#34;: &#34;1000&#34;, </span></span><span class="line"><span class="cl"> &#34;__REALTIME_TIMESTAMP&#34;: &#34;1739630597408093&#34;, </span></span><span class="line"><span class="cl"> &#34;_CMDLINE&#34;: &#34;/usr/libexec/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -novtswitch -verbose 3&#34;, </span></span><span class="line"><span class="cl"> &#34;_AUDIT_SESSION&#34;: &#34;2&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_USER_SLICE&#34;: &#34;-.slice&#34;, </span></span><span class="line"><span class="cl"> &#34;__SEQNUM&#34;: &#34;1695238&#34;, </span></span><span class="line"><span class="cl"> &#34;_GID&#34;: &#34;1000&#34;, </span></span><span class="line"><span class="cl"> &#34;_RUNTIME_SCOPE&#34;: &#34;system&#34;, </span></span><span class="line"><span class="cl"> &#34;SYSLOG_IDENTIFIER&#34;: &#34;/usr/libexec/gdm-x-session&#34;, </span></span><span class="line"><span class="cl"> &#34;MESSAGE&#34;: &#34;X.Org X Server 1.21.1.15&#34;, </span></span><span class="line"><span class="cl"> &#34;_STREAM_ID&#34;: &#34;7f35e3ce14d44dc8b589be76d4d355d9&#34;, </span></span><span class="line"><span class="cl"> &#34;_TRANSPORT&#34;: &#34;stdout&#34;, </span></span><span class="line"><span class="cl"> &#34;_CAP_EFFECTIVE&#34;: &#34;0&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_UNIT&#34;: &#34;session-2.scope&#34;, </span></span><span class="line"><span class="cl"> &#34;_UID&#34;: &#34;1000&#34;, </span></span><span class="line"><span class="cl"> &#34;_EXE&#34;: &#34;/usr/libexec/Xorg&#34;, </span></span><span class="line"><span class="cl"> &#34;PRIORITY&#34;: &#34;4&#34;, </span></span><span class="line"><span class="cl"> &#34;_COMM&#34;: &#34;Xorg&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_CGROUP&#34;: &#34;/user.slice/user-1000.slice/session-2.scope&#34;, </span></span><span class="line"><span class="cl"> &#34;_SYSTEMD_OWNER_UID&#34;: 
&#34;1000&#34;, </span></span><span class="line"><span class="cl"> &#34;_PID&#34;: &#34;3929&#34; </span></span><span class="line"><span class="cl">} </span></span></code></pre></div><p>Note that the value for <code>_COMM</code> is <code>Xorg</code>. We can use that to search our logs with ease using the <code>cat</code> output from journalctl, which makes the output as terse as possible. It removes all the headers and make it look like you&rsquo;re reading a plain old text log file:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; journalctl --output cat --boot _COMM=Xorg | head </span></span><span class="line"><span class="cl">(--) Log file renamed from &#34;/home/major/.local/share/xorg/Xorg.pid-3929.log&#34; to &#34;/home/major/.local/share/xorg/Xorg.0.log&#34; </span></span><span class="line"><span class="cl">X.Org X Server 1.21.1.15 </span></span><span class="line"><span class="cl">X Protocol Version 11, Revision 0 </span></span><span class="line"><span class="cl">Current Operating System: Linux zorro 6.12.13-200.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Feb 8 20:05:26 UTC 2025 x86_64 </span></span><span class="line"><span class="cl">Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.12.13-200.fc41.x86_64 root=UUID=bae22798-ce48-43e9-ac24-7bf7f7158e90 ro rootflags=subvol=root rd.luks.uuid=luks-defea11e-374c-48ab-83df-4f06c4c02186 rhgb quiet </span></span><span class="line"><span class="cl">Build ID: xorg-x11-server 21.1.15-1.fc41 </span></span><span class="line"><span class="cl">Current version of pixman: 0.44.2 </span></span><span class="line"><span class="cl"> Before reporting problems, check http://wiki.x.org </span></span><span class="line"><span class="cl"> to make sure that you have the latest version. </span></span><span class="line"><span class="cl">Markers: (--) probed, (**) from config file, (==) default setting, </span></span></code></pre></div><p>In my particular case, I was missing the amdgpu driver for Xorg. 
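</p> <p>One way to check for that on Fedora is to ask the package database which Xorg driver packages are present (a quick sketch; adjust the package glob for your distribution):</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Show the Xorg driver packages that are currently installed
rpm -qa 'xorg-x11-drv-*' | sort

# Pull in the AMD driver for Xorg if it isn't there
sudo dnf install xorg-x11-drv-amdgpu
</code></pre></div><p>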
I installed the <code>xorg-x11-drv-amdgpu</code> package, rebooted, and now my logs showed the driver being loaded on startup:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt; journalctl --output cat --boot _COMM=Xorg | grep -i amdgpu | head </span></span><span class="line"><span class="cl">(II) Applying OutputClass &#34;AMDgpu&#34; to /dev/dri/card1 </span></span><span class="line"><span class="cl"> loading driver: amdgpu </span></span><span class="line"><span class="cl">(==) Matched amdgpu as autoconfigured driver 0 </span></span><span class="line"><span class="cl">(II) LoadModule: &#34;amdgpu&#34; </span></span><span class="line"><span class="cl">(II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so </span></span><span class="line"><span class="cl">(II) Module amdgpu: vendor=&#34;X.Org Foundation&#34; </span></span><span class="line"><span class="cl">(II) AMDGPU: Driver for AMD Radeon: </span></span><span class="line"><span class="cl"> All GPUs supported by the amdgpu kernel driver </span></span><span class="line"><span class="cl">(II) AMDGPU(0): Creating default Display subsection in Screen section </span></span><span class="line"><span class="cl">(==) AMDGPU(0): Depth 24, (--) framebuffer bpp 32 </span></span></code></pre></div><h2 id="further-reading" class="relative group">Further reading <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#further-reading" aria-label="Anchor">#</a></span></h2><p>There are <em>tons</em> of ways to filter journald logs and one of the best resources for learning about all of them is the <a href="https://www.freedesktop.org/software/systemd/man/latest/journalctl.html" target="_blank" rel="noreferrer">journalctl man page</a>. There&rsquo;s also a <a href="https://gist.github.com/sergeyklay/f401dbc8286f732783e05072f03ecb61" target="_blank" rel="noreferrer">helpful journalctl cheat sheet</a> on GitHub.</p>Repairing 4Runner skid plate boltshttps://major.io/p/4runner-skid-plate-bolt-repair/Tue, 08 Oct 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/4runner-skid-plate-bolt-repair/<p>I replaced my old Toyota 4Runner with a new one so I could snag the last run of the 5th generation. It&rsquo;s a tough, reliable vehicle with just enough space for my family and our pets.</p> <p>However, this new one came with the same problem as my old one. All of the bolts that hold in the front skid plate were mostly stripped. Getting them out was difficult and I knew getting them back in would be worse.</p> <p>I&rsquo;ll cover how to fix it in this post. <strong>If you&rsquo;re in a hurry, scroll past the next section!</strong> Otherwise, let&rsquo;s get a little backstory.</p> <h2 id="backstory" class="relative group">Backstory <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#backstory" aria-label="Anchor">#</a></span></h2><p>All <a href="https://en.wikipedia.org/wiki/Toyota_4Runner" target="_blank" rel="noreferrer">4Runner models</a> are made in Japan in the <a href="https://en.wikipedia.org/wiki/Toyota_Motor_Corporation_Tahara_plant" target="_blank" rel="noreferrer">Tahara Plant</a>. 
The build quality is fantastic and you can tell that they&rsquo;re built with care. You can even find brief notes in Japanese inside the fenders or underneath the car where someone jotted some notes about something.</p> <p>After careful assembly in Japan, they head to various US ports where American workers add on any extra items that come with the trim level. That could include an upgraded exhaust, different wheels, or in my case, skid plates.</p> <p>The skid plate is a sturdy piece of steel that mounts under the front of the vehicle and protects lots of important components from damage. You can see it under the front bumper in this photo:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="2048" height="1449" class="mx-auto my-0 rounded-md" alt="4runner-skid.jpg" loading="lazy" decoding="async" src="https://major.io/p/4runner-skid-plate-bolt-repair/4runner-skid_hu_8651f0c9f3a4f7c1.jpg" srcset="https://major.io/p/4runner-skid-plate-bolt-repair/4runner-skid_hu_e0b6e4056924ae06.jpg 330w,https://major.io/p/4runner-skid-plate-bolt-repair/4runner-skid_hu_8651f0c9f3a4f7c1.jpg 660w ,https://major.io/p/4runner-skid-plate-bolt-repair/4runner-skid_hu_94c625500a942ddc.jpg 1024w ,https://major.io/p/4runner-skid-plate-bolt-repair/4runner-skid_hu_b59d9bf49f151b44.jpg 1320w " sizes="100vw" /> </picture> <figcaption class="text-center">Skid plate on a white 4Runner</figcaption> </figure> </p> <p>If you&rsquo;ve ever owned a Toyota, you know that the factory is strict about torque applied to various bolts all over the vehicle. All of that gets thrown out the window when the workers at the US ports add on accessories.</p> <p>Based on all the complaints I&rsquo;ve seen across various 4Runner forums, they must use air wrenches or some kind of impact wrench to put on the bolts. If the bolt isn&rsquo;t in straight when they start, it destroys the bolt and causes problems with the mount holes. They also tighten the bolts <em>far past the acceptable torque specs.</em></p> <p>To make matters worse, if you take your car in for an oil change at most places, <strong>they&rsquo;ll have an impact wrench handy to ruin the bolts a bit more for you.</strong></p> <h2 id="root-cause" class="relative group">Root cause <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#root-cause" aria-label="Anchor">#</a></span></h2><p>If you have bolts that are getting stripped in the mount holes or they&rsquo;re getting stuck as you try to bring the bolts in or out, you likely have chunks of metal from the bolts wedged in the threads of the bolt holes.</p> <h2 id="ingredients" class="relative group">Ingredients <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#ingredients" aria-label="Anchor">#</a></span></h2><p>The fix is quite cheap but very tedious. You&rsquo;ll need a few parts to get started:</p> <ul> <li> <p><a href="https://a.co/d/8EkxZxh" target="_blank" rel="noreferrer">Irwin Hanson 12002 T-Handle Tap Wrench (1/4&quot; to 1/2&quot;)</a>: This T-Handle wrench allows you to easily spin the tap screw to clear the metal fragments from your bolt holes. 
<strong>Don&rsquo;t use a socket set, drill, screwdriver, or anything powerful!</strong> You want to take this <em>slow</em>.</p> </li> <li> <p><a href="https://a.co/d/gy02vOu" target="_blank" rel="noreferrer">Irwin Tap 10-1 25mm Plug</a>: The bolts that go into the holes are M10 bolts with a 25mm thread, so this tap should fit perfectly.</p> </li> <li> <p><a href="https://a.co/d/grHmNR4" target="_blank" rel="noreferrer">Toyota part PT938-00140-AA</a>: This includes four new bolts with spacers and retaining washers to replace your stripped bolts.</p> </li> <li> <p><strong>14mm socket and socket wrench OR a 14mm wrench:</strong> You&rsquo;ll need this for removing the bolts and dealing with the hardware for the skid plate.</p> </li> <li> <p><strong>Some type of lubricant.</strong> I used WD-40, but don&rsquo;t tell anyone. People love to fight about whether WD-40 is a solvent, a grease, or a lubricant. ๐Ÿคทโ€โ™‚๏ธ</p> </li> </ul> <p>What&rsquo;s hilarious is that if you load the Amazon page for the Toyota part, it shows that everyone is buying tap screws and T-Handle wrenches:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="720" height="301" class="mx-auto my-0 rounded-md" alt="buy-together.jpg" loading="lazy" decoding="async" src="https://major.io/p/4runner-skid-plate-bolt-repair/buy-together_hu_6aa39399c36a2baa.jpg" srcset="https://major.io/p/4runner-skid-plate-bolt-repair/buy-together_hu_766820810bf54acf.jpg 330w,https://major.io/p/4runner-skid-plate-bolt-repair/buy-together_hu_6aa39399c36a2baa.jpg 660w ,https://major.io/p/4runner-skid-plate-bolt-repair/buy-together.jpg 720w ,https://major.io/p/4runner-skid-plate-bolt-repair/buy-together.jpg 720w " sizes="100vw" /> </picture> <figcaption class="text-center">Even Amazon knows these bolts are a problem! ๐Ÿ˜†</figcaption> </figure> </p> <h2 id="fix-the-bolt-holes" class="relative group">Fix the bolt holes <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#fix-the-bolt-holes" aria-label="Anchor">#</a></span></h2><p>First things first, you&rsquo;ll need to get that skid plate off. <strong>I strongly recommend taking the rear bolts off first.</strong> If the bolts get stuck on the way out, take your time. I found that rocking in the other direction briefly and then trying to loosen them again seemed to work.</p> <p>With the rear bolts out, move to the bolts closest to the front of the car. Put a box or something sturdy underneath the skid plate that allows it to drop when it&rsquo;s loose but prevents it from knocking out one of your teeth when it falls. ๐Ÿค•</p> <p>When you loosen the front bolts, try loosening one of them 4-5 turns and then go to the other one. Keep going back and forth loosening the bolts until they loosen from the frame of the car. There are retaining washers on the top side of the bolt and removing those bolts aggressively will slide the retaining washers right off the bolt.</p> <p>Now you&rsquo;re ready to tap! ๐Ÿ‘</p> <p>Start in with the bolt holes in the rear and spray a decent amount of lubricant in the bottom and top of the bolt hole. Get your tap into the T-Handle wrench and slowly start turning it in the bolt hole like you were installing one of the bolts.</p> <p>๐Ÿ›‘ <strong>When you hit resistance, only go 1/2 to 1 turn further.</strong> Then back up 2-3 turns. 
This means you&rsquo;ve dislodged some metal fragments in the threads!</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="3072" height="4080" class="mx-auto my-0 rounded-md" alt="tapping.jpg" loading="lazy" decoding="async" src="https://major.io/p/4runner-skid-plate-bolt-repair/tapping_hu_7d3f917a9e638bcb.jpg" srcset="https://major.io/p/4runner-skid-plate-bolt-repair/tapping_hu_cb92a5a6c47260bb.jpg 330w,https://major.io/p/4runner-skid-plate-bolt-repair/tapping_hu_7d3f917a9e638bcb.jpg 660w ,https://major.io/p/4runner-skid-plate-bolt-repair/tapping_hu_ae7465d5a71826ed.jpg 1024w ,https://major.io/p/4runner-skid-plate-bolt-repair/tapping_hu_edbcbbcf9c6867c9.jpg 1320w " sizes="100vw" /> </picture> <figcaption class="text-center">Working on one of the back holes ๐Ÿ’ช</figcaption> </figure> </p> <p>After backing up a bit, keep screwing it in further until you hit more resistance. Only go 1/2 to 1 turn more, then back out 2-3 turns. Keep doing this until your tap shows up out of the top side of the bolt hole.</p> <p>With your tap sticking out of the top of the hole, grab a shop towel or paper towel and clear all of the metal filings away from the top of the hole. Then back the tap all the way out and clean your tap screw. It&rsquo;s likely going to be covered in black shavings.</p> <p>If you want to be really thorough, lubricate the hole once more and keep working the tap until the threads feel really smooth. I added some lubricant and fed a <strong>new bolt</strong> in through the top <strong>using finger strength only</strong> until I knew the threads were clear.</p> <p>You can test feed a <strong>new bolt</strong> through the bottom to verify that you&rsquo;ve done a good job. Don&rsquo;t use the old bolts for this. You&rsquo;ll just get more junk in the threads again. ๐Ÿ˜ญ</p> <p>Repeat these steps for the other four holes and you should be all set.</p> <h2 id="replace-the-skid-plate" class="relative group">Replace the skid plate <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#replace-the-skid-plate" aria-label="Anchor">#</a></span></h2><p>Be sure to discard the old bolts to avoid causing more problems for yourself. Re-assemble the new front bolts with the spacer and retaining washer just like they were when you took the skid plate down.</p> <p>Get the back bolts going in first and get them in about halfway. Start working on the front bolts after that.</p> <p>When all four bolts are in, grab your torque wrench and tighten them to <strong>22 ft/lbs or 30 Nm</strong> as specified in the <a href="trd_skid_plate_install_instructions.pdf">manual</a>:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="667" height="387" class="mx-auto my-0 rounded-md" alt="skid-plate-torque.png" loading="lazy" decoding="async" src="https://major.io/p/4runner-skid-plate-bolt-repair/skid-plate-torque_hu_d154465ae73d83ea.png" srcset="https://major.io/p/4runner-skid-plate-bolt-repair/skid-plate-torque_hu_a98a38e5ccde5837.png 330w,https://major.io/p/4runner-skid-plate-bolt-repair/skid-plate-torque_hu_d154465ae73d83ea.png 660w ,https://major.io/p/4runner-skid-plate-bolt-repair/skid-plate-torque.png 667w ,https://major.io/p/4runner-skid-plate-bolt-repair/skid-plate-torque.png 667w " sizes="100vw" /> </picture> <figcaption class="text-center">Always check the instructions for torque specs! 
๐Ÿ”ง</figcaption> </figure> </p> <p>If you don&rsquo;t own a torque wrench, I wouldn&rsquo;t tighten them much past finger tight with a socket wrench. Then drive to the hardware store and get a basic torque wrench. ๐Ÿ˜œ</p> <h2 id="prevention" class="relative group">Prevention <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#prevention" aria-label="Anchor">#</a></span></h2><p>This is easy but annoying: <strong>always remove the skid plate yourself before any kind of maintenance trip.</strong></p> <p>Yes, this sounds silly, but these mount points are finicky and there&rsquo;s no guarantee that the dreaded impact wrench will not show up again to ruin your bolts. I remove mine before any oil changes or scheduled maintenance at the dealer.</p> <p>Dealers have asked me in the past &ldquo;Where&rsquo;s your skid plate anyway?&rdquo; and I let them know I don&rsquo;t want my bolts stripped. ๐Ÿ˜…</p>Spell check in multiple languages with Firefoxhttps://major.io/p/firefox-multi-language-spell-check/Sun, 25 Aug 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/firefox-multi-language-spell-check/<p><strong>Bienvenidos!</strong><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> I&rsquo;ve been learning Spanish for just over a year and I often type messages in either Spanish or English (my native language) with coworkers and friends. Just like most people, I make spelling mistakes in both languages. ๐Ÿ™ƒ</p> <p>Firefox offers a feature for multi-language spell checking and translations but it can be a bit challenging to set up. This post explains how to load languages into Firefox and use them for spell checking.</p> <h2 id="installing-languages" class="relative group">Installing languages <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#installing-languages" aria-label="Anchor">#</a></span></h2><p>Take a trip over to [Dictionaries and Languages Packs] on Mozilla&rsquo;s site. Note that there are <strong>two columns</strong> available to you here:</p> <ul> <li><strong>Language packs</strong> give you the option to change your interface language to something different than your system&rsquo;s default language.</li> <li><strong>Dictionaries</strong> help with checking spelling.</li> </ul> <p>In the second column, click on the language you want to add for checking spelling. In my case, I picked the <a href="https://addons.mozilla.org/en-US/firefox/addon/diccionario-de-espa%C3%B1ol-espa%C3%B1a/" target="_blank" rel="noreferrer">Spanish (Spain) Dictionary</a> along with the <a href="https://addons.mozilla.org/en-US/firefox/addon/spanish-mexico-dictionary/" target="_blank" rel="noreferrer">Spanish (Mexico) Dictionary</a>. 
Install the dictionaries you want just like any other add-on!</p> <p>Go to the <code>about:addons</code> page in Firefox and you should see your languages under <strong>Languages</strong> and <strong>Dictionaries</strong> on the left side.</p> <h2 id="enable-the-language" class="relative group">Enable the language <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#enable-the-language" aria-label="Anchor">#</a></span></h2><p>Find an input field and right click inside the field. You should see a <strong>Languages</strong> context menu appear. Roll over that menu and a new menu pops out to the side:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="502" height="502" class="mx-auto my-0 rounded-md" alt="context_menu.png" loading="lazy" decoding="async" src="https://major.io/p/firefox-multi-language-spell-check/contextmenu.png" /> </picture> <figcaption class="text-center">Firefox context menu showing multiple languages</figcaption> </figure> </p> <p>Click the checkbox to enable the languages that you want to use with the spell checker. That takes effect immediately!</p> <p>Gracias por leer hasta aquรญ!<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> ๐Ÿ˜œ</p> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>Welcome!&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2"> <p>Thank you for reading this far.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>My meeting hackshttps://major.io/p/meeting-hacks/Thu, 22 Aug 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/meeting-hacks/<p>Ask anyone about the toughest part of their workday and it usually comes down to one thing: meetings. There are plenty of reasons:</p> <ul> <li>The meeting could have been an email</li> <li>Nobody notices when I attend the meeting, but they notice when I don&rsquo;t</li> <li>The meeting is recurring whether there&rsquo;s something important to talk about or not</li> <li>There&rsquo;s no time for questions after everyone presents in a meeting</li> <li>Someone dominates the conversation</li> </ul> <p>This was a central problem in my <a href="https://txlf24-tech-career.major.io/#/" target="_blank" rel="noreferrer">&ldquo;Five tips for a thriving technology career&rdquo;</a> talk that I delivered this year. I wrote a <a href="https://major.io/p/texas-linux-fest-2024-recap/">recap</a> on the blog earlier this summer as well.</p> <p>I came up with some more ideas since then, so let&rsquo;s go!</p> <h2 id="use-headphones-or-earbuds" class="relative group">Use headphones or earbuds <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#use-headphones-or-earbuds" aria-label="Anchor">#</a></span></h2><p>I find it much easier to understand people in meetings when I have the audio closer to my ears. This helps a lot with understanding non-native English speakers or some native English speakers with thick accents. 
It reduces the noise from various things in my house (kids, pets, appliances) and allows me to focus on the small sounds that are important for understanding someone else.</p> <p>How many times have you been in a meeting with someone who talks constantly without earbuds or headphones and you can&rsquo;t break through with your own voice? Some computers will mute the incoming audio to avoid feedback sounds and you&rsquo;ll totally miss it when someone is trying to get your attention.</p> <p>I was once in a meeting where an attendee spoke at length about a topic that was already covered and multiple people tried to speak to let him know that he could stop. He was completely oblivious. The situation improved a lot recently with the addition of &ldquo;raised hands&rdquo; indicators in most meeting applications, but it&rsquo;s still not perfect.</p> <h2 id="background-music" class="relative group">Background music <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#background-music" aria-label="Anchor">#</a></span></h2><p>If you typically join meetings without earbuds or headphones, then this suggestion isn&rsquo;t for you. <strong>Also, you should go back and re-read the previous section.</strong> ๏ธ๐Ÿ˜œ</p> <p>Everyone has their own music preferences, but I find that playing some relaxing music at a low volume really helps me stay focused during meetings. I change the genre of music between different days depending on my mood. No matter what you choose, consider music without vocals to avoid distractions.</p> <p>A good place to start is Lofi Girl&rsquo;s &ldquo;beats to relax/study to&rdquo; playlist. You can listen on <a href="https://open.spotify.com/playlist/0vvXsWCC9xrXsKd4FyS8kM?si=dba7e37978e246bf" target="_blank" rel="noreferrer">Spotify</a> or on <a href="https://www.youtube.com/watch?v=jfKfPfyJRdk" target="_blank" rel="noreferrer">YouTube</a>. Very few songs have vocals, and if they do, it&rsquo;s barely noticeable. I&rsquo;ve found that I can keep this playlist on for hours without getting bored of it.</p> <p>This can be especially helpful for those marathon half or full day meetings. ๐Ÿ‘”</p> <h2 id="ask-about-taking-notes" class="relative group">Ask about taking notes <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#ask-about-taking-notes" aria-label="Anchor">#</a></span></h2><p>Most meeting platforms offer transcription and audio/video recording already, but transcriptions are difficult to read and recordings aren&rsquo;t usually fun to watch. I love it when someone takes some concise notes about the points that were raised, who raised them, and who holds the action items to solve them.</p> <p><strong>If nobody&rsquo;s taking notes, ask if you can!</strong></p> <p>It&rsquo;s a great way to ensure you pay attention and the people who missed the meeting will thank you later. Nobody has turned me down yet when I&rsquo;ve asked.</p> <p>This can also be helpful if someone likes to talk over everyone else during the meeting or if someone birdwalks into other topics. 
Un-mute yourself and ask:</p> <blockquote> <p>Wait, are we still on that previous topic or have we moved to something else?</p></blockquote> <p>Another favorite question of mine is:</p> <blockquote> <p>Did we get an action item for that previous topic? Who owns that item?</p></blockquote> <p>Your note taking keeps speakers on track and ensures there is accountability and ownership for problems that need to be solved. It&rsquo;s also a great way to get your name in front of other people during larger meetings.</p> <h2 id="decline-that-meeting" class="relative group">Decline that meeting! <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#decline-that-meeting" aria-label="Anchor">#</a></span></h2><p>This one got the biggest reaction during my talk at Texas Linux Fest. Sometimes you just need to decline a meeting. Certain aspects of a meeting will push me to the &ldquo;No&rdquo; button faster than others, but here&rsquo;s my two biggest red flags:</p> <ul> <li> <p><strong>More than three attendees:</strong> It&rsquo;s difficult to get much done with a meeting that has 25 people in it. If someone sends me a calendar invitation unannounced and there are more than 3-5 people in the meeting, I ask them on Slack what I&rsquo;m expected to bring to the meeting. Often times, I hear <em>&ldquo;Oh, we wanted to be sure you were informed, but there are no action items for you.&rdquo;</em> That&rsquo;s a great time to say: <em>&ldquo;Can you send me the recording or the notes when it&rsquo;s over?&rdquo;</em></p> </li> <li> <p><strong>Missing agenda:</strong> If I&rsquo;m taking time out of my day to meet, I want to know about the meeting&rsquo;s goals. What should we have as we leave the meeting? Will we leave with a plan to do something? A set of decisions? Questions for another team?</p> </li> </ul> <p><strong>You are the only one that can advocate for your own time.</strong> Nobody else is going to do that for you<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. A very talented executive once told me:</p> <blockquote> <p>Time is the most valuable thing you bring to work every day. You can&rsquo;t get more of it, but you can waste it. Your experience and knowledge means nothing if you don&rsquo;t have time to use it. Treat your time as your most precious asset.</p></blockquote> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>An administrative assistant can help but I&rsquo;ve never had one before. ๐Ÿ˜œ&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>Rub some AI on ithttps://major.io/p/rub-some-ai-on-it/Wed, 21 Aug 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/rub-some-ai-on-it/<p><em>Author&rsquo;s note: This post is all about my personal thoughts on artificial intelligence (AI) and they don&rsquo;t represent the views of any employer or group.</em></p> <hr> <p>You can&rsquo;t escape the clutches of AI lately.</p> <p>It&rsquo;s in my smartphone <a href="https://support.google.com/messages/answer/14599070?hl=en" target="_blank" rel="noreferrer">nestled</a> next to my text messages. It&rsquo;s in my <a href="https://slack.com/features/ai" target="_blank" rel="noreferrer">work chats</a>. 
It&rsquo;s <a href="https://blog.duolingo.com/large-language-model-duolingo-lessons/" target="_blank" rel="noreferrer">reading my Spanish</a> in Duolingo. It&rsquo;s in my photo albums <a href="https://blog.adobe.com/en/publish/2023/04/18/new-adobe-lightroom-ai-innovations-empower-everyone-edit-like-pro" target="_blank" rel="noreferrer">retouching my images</a>.</p> <p>Sometimes we know that there&rsquo;s AI involved in something and sometimes we don&rsquo;t.</p> <p>However, it seems like so many are in a rush to implement some kind of AI offering without a full idea of why they&rsquo;re doing it. Here an excerpt from the <a href="https://hbr.org/2024/09/ai-wont-give-you-a-new-sustainable-advantage" target="_blank" rel="noreferrer">September 2024 issue</a> of Harvard Business Review that explains it well:</p> <blockquote> <p>Smart early movers in sectors adopting gen AI have certainly captured some of this value in the short term. But relatively soon all surviving companies in those sectors will have applied gen AI, and it wonโ€™t be a source of competitive advantage for any one of them, even where its impact on business and business practices will probably be profound. In fact, it will be more likely to remove a competitive advantage than to confer one. <strong>But hereโ€™s a silver lining: If you already have a competitive advantage that rivals cannot replicate using AI, the technology may serve to amplify the value you derive from that advantage.</strong></p></blockquote> <p>AI can help you only if:</p> <ol> <li>You have a product or service your customers value.</li> <li>You can leverage AI for specific improvements to that product or service that make it more valuable.</li> </ol> <h2 id="ai-is-not-valuable-alone" class="relative group">AI is not valuable alone <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#ai-is-not-valuable-alone" aria-label="Anchor">#</a></span></h2><p>I&rsquo;m reminded of a time in the past where I was working hard on OpenStack public clouds. Kubernetes gained more traction day by day. Lots of customers told us: <em>&ldquo;I&rsquo;ve got to get on kubernetes so I can move faster.&rdquo;</em></p> <p>As we asked more about their challenges, they listed lots of things that should look familiar:</p> <ul> <li>Developers throwing code over the wall to Q/E and Q/E delays the release</li> <li>Software passes tests in development and staging, but fails miserably in production</li> <li>Monolithic applications were crushed by load spikes</li> <li>Operations teams struggled to deploy software efficiently and reliably</li> </ul> <p>They had a serious problem with delivering their software, but kubernetes couldn&rsquo;t make any of these better. <strong>Adding kubernetes would just give them two problems instead of one.</strong></p> <p>The running joke whenever someone ran into a problem with a server, a piece of code, or a service was to say <em>&ldquo;Why don&rsquo;t you rub a little kubernetes on it?&rdquo;</em> ๐Ÿคฃ</p> <p>I&rsquo;m seeing much of the same with AI as companies scramble to get their hands on the best hardware they can find and access to the highest quality large language models (LLMs) they can find. Cloud budgets are blown wide open. 
When someone asks about the AI effort, the reply is often: <em>&ldquo;We have to get it before our competitors do, or we&rsquo;re sunk!&rdquo;</em></p> <p>In February 2024, 36% of company earnings reports <a href="https://markets.businessinsider.com/news/stocks/ai-stocks-sp500-4q-tech-earnings-artificial-intelligence-goldman-sachs-2024-2?op=1" target="_blank" rel="noreferrer">mentioned AI</a> &ndash; a record high:</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="700" height="441" class="mx-auto my-0 rounded-md" alt="ai_mentions.webp" loading="lazy" decoding="async" src="https://major.io/p/rub-some-ai-on-it/ai_mentions_hu_7651662808290ed1.webp" srcset="https://major.io/p/rub-some-ai-on-it/ai_mentions_hu_47b5fda0bbee4736.webp 330w,https://major.io/p/rub-some-ai-on-it/ai_mentions_hu_7651662808290ed1.webp 660w ,https://major.io/p/rub-some-ai-on-it/ai_mentions.webp 700w ,https://major.io/p/rub-some-ai-on-it/ai_mentions.webp 700w " sizes="100vw" /> </picture> </figure> </p> <p>How many of them are actually doing something meaningful for their employees or customers with AI?</p> <h2 id="work-backwards-from-the-experience" class="relative group">Work backwards from the experience <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#work-backwards-from-the-experience" aria-label="Anchor">#</a></span></h2><p>One of my coworkers, Scott McCarty, wrote a great post on InfoWorld titled <a href="https://www.infoworld.com/article/3482087/what-generative-ai-can-do-for-sysadmins.html" target="_blank" rel="noreferrer">&ldquo;What generative AI can do for sysadmins&rdquo;</a>. What I love most about this article is that Scott remains laser-focused on the <em>experiences</em> and <em>challenges</em> that AI could improve.</p> <p>There are plenty of challenging situations that every sysadmin faces. The worst of these are when you&rsquo;re under incredible pressure to bring a system back into a working state and you need to pick through tons of information to identify the problem. You can sometimes spot these issues easily, such as a failing storage drive in a server. Other situations are much more difficult.</p> <p>The key is to <strong>start with the experience.</strong> Then work backwards from there.</p> <p>As an example, one pattern I often see is companies putting AI chatbots in front of their documentation. Sometimes the chatbot will help you find the right documentation faster, but sometimes it&rsquo;s not much better than a CTRL-F or a quick look at the documentation&rsquo;s table of contents.</p> <p>If your documentation is so complicated that you need to spend the time and money to put an AI chatbot in front of it, why not make your documentation better instead?</p> <p>When something does fail, why not put a link to the documentation in the log message itself? This pattern shows up a lot in modern software lately. 
If I try to enable a <a href="https://major.io/p/build-tailscale-exit-node-firewalld/">Tailscale exit node</a> but I haven&rsquo;t forwarded packets on an interface, I get quick instructions on the console with a link to documentation that explains it in more detail.</p> <p><strong>You cannot use AI to paper over a poor experience.</strong> Your customers will see right through it.</p> <h2 id="remember-the-human-side" class="relative group">Remember the human side <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#remember-the-human-side" aria-label="Anchor">#</a></span></h2><p>Sometimes companies simply try to take AI much too far and upset the human nature in all of us.</p> <p>A great example of this was Google&rsquo;s awful Olympics ad where it shows a girl&rsquo;s father using Google Gemini to <a href="https://www.cnn.com/2024/08/02/tech/google-olympics-ai-ad-artificial-intelligence/index.html" target="_blank" rel="noreferrer">write a letter to her hero</a>. The reaction at my house when we saw it was: <em>&ldquo;Wait, you&rsquo;re taking the time to write a letter to your hero and you&rsquo;re letting an AI write it? Could that be any more impersonal?&rdquo;</em> If I&rsquo;m writing a letter or email to someone I admire, I&rsquo;m taking the time to write it myself with my own voice.</p> <p>Another example is a Microsoft ad showing someone turning a long document into a long slide deck instead. If nobody wanted to read the document in the first place, why would they want to read your long slide deck? Also, how would they feel if they know you just jammed a document into a LLM to make a slide deck and then held them hostage in a conference room as you walked through a voiceless set of slides?</p> <p>This goes back to the last section, but if you&rsquo;re trying to add AI to replace a human interaction, think that through. Are you papering over a bad experience? Are you looking to cut costs without considering the customer reaction? What&rsquo;s your plan if the AI interactions backfire?</p> <h2 id="so-what-do-we-do" class="relative group">So what do we do? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#so-what-do-we-do" aria-label="Anchor">#</a></span></h2><p>If you work backwards from the customer experience and land on an LLM as the best way to solve a problem or enhance a product, that&rsquo;s great. However, the experience you are enabling should be so good that:</p> <ol> <li>Customers are genuinely delighted with the experience without knowing AI is involved</li> <li>You don&rsquo;t have to mention &ldquo;AI&rdquo; for the experience to feel innovative and delightful</li> <li>You have plans in place for when customers want more from the experience later</li> </ol> <p>Getting hardware or cloud infrastructure together and <a href="https://cfp.fedoraproject.org/flock-2024/talk/HM9Y8U/" target="_blank" rel="noreferrer">running an LLM on top is boring</a>. Even going retrieval augmented generation (RAG) is boring. We will soon live in a world where running an LLM is the same level of difficulty as running a web server or a container. That&rsquo;s not where the value lives.</p> <p>AI isn&rsquo;t the king. 
<strong>The experience is.</strong> If you forget that, you&rsquo;re just taking a problem and rubbing some AI on it.</p>AMD GPU missing from btophttps://major.io/p/amd-gpu-missing-btop/Tue, 20 Aug 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/amd-gpu-missing-btop/<p>I recently built a new PC for my birthday and I splurged a bit with a new AMD Radeon 7900 XTX GPU. Although I&rsquo;m not a heavy gamer, I&rsquo;m working with <a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank" rel="noreferrer">LLMs</a> more often and I&rsquo;m interested to do some of this work at home.</p> <p><a href="https://github.com/aristocratos/btop" target="_blank" rel="noreferrer">btop</a> is my go-to tracker for all kinds of data about my system, including CPU usage, memory usage, disk I/O, and network throughput. It&rsquo;s a great way to track down bottlenecks and find out why your CPU fan is spinning at max speed. ๐Ÿ˜œ</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="1908" height="1053" class="mx-auto my-0 rounded-md" alt="btop.png" loading="lazy" decoding="async" src="https://major.io/p/amd-gpu-missing-btop/btop_hu_9cc3191795d4c42.png" srcset="https://major.io/p/amd-gpu-missing-btop/btop_hu_a2ebb882e9f1f96b.png 330w,https://major.io/p/amd-gpu-missing-btop/btop_hu_9cc3191795d4c42.png 660w ,https://major.io/p/amd-gpu-missing-btop/btop_hu_f1b75f050de37643.png 1024w ,https://major.io/p/amd-gpu-missing-btop/btop_hu_19d3d13ca6b167c1.png 1320w " sizes="100vw" /> </picture> <figcaption class="text-center">Screenshot of btop running on my system</figcaption> </figure> </p> <h2 id="the-problem" class="relative group">The problem <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-problem" aria-label="Anchor">#</a></span></h2><p>My GPU wasn&rsquo;t showing up in btop after rebuilding the system. Normally, there&rsquo;s a bar for the GPU right underneath the CPU usage and it tracks the GPU usage as well as memory usage. 
Some cards report thermals there, too.</p> <p>The <a href="https://github.com/clbr/radeontop" target="_blank" rel="noreferrer">radeontop</a> tool worked fine and I can see the device in the hardware monitoring subsystem:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> cat /sys/class/hwmon/hwmon1/name </span></span><span class="line"><span class="cl"><span class="go">amdgpu </span></span></span></code></pre></div><h2 id="the-solution" class="relative group">The solution <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-solution" aria-label="Anchor">#</a></span></h2><p>I installed plenty of AMD packages, but I missed a critical one: <code>rocm-smi</code>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> dnf install rocm-smi </span></span><span class="line"><span class="cl"><span class="gp">&gt;</span> rocm-smi </span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">======================================== ROCm System Management Interface ======================================== </span></span></span><span class="line"><span class="cl"><span class="go">================================================== Concise Info ================================================== </span></span></span><span class="line"><span class="cl"><span class="go">Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% </span></span></span><span class="line"><span class="cl"><span class="go"> (DID, GUID) (Edge) (Avg) (Mem, Compute, ID) </span></span></span><span class="line"><span class="cl"><span class="go">================================================================================================================== </span></span></span><span class="line"><span class="cl"><span class="go">0 1 0x744c, 55924 47.0ยฐC 21.0W N/A, N/A, 0 218Mhz 96Mhz 0% auto 327.0W 11% 10% </span></span></span><span class="line"><span class="cl"><span class="go">================================================================================================================== </span></span></span><span class="line"><span class="cl"><span class="go">============================================== End of ROCm SMI Log =============================================== </span></span></span></code></pre></div><p>That&rsquo;s the ticket! 
Now my btop data is complete.</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="526" height="308" class="mx-auto my-0 rounded-md" alt="btop-magnified.png" loading="lazy" decoding="async" src="https://major.io/p/amd-gpu-missing-btop/btop-magnified.png" /> </picture> <figcaption class="text-center">btop showing my GPU stats</figcaption> </figure> </p>Running ollama with an AMD Radeon 6600 XThttps://major.io/p/ollama-with-amd-radeon-6600xt/Thu, 08 Aug 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/ollama-with-amd-radeon-6600xt/<p>I&rsquo;m splitting time between two roles at work now and one of the roles has a heavy focus on <a href="https://en.wikipedia.org/wiki/Large_language_model" target="_blank" rel="noreferrer">LLMs</a>. Much like many of you, I&rsquo;ve given ChatGPT a try with questions from time to time. I&rsquo;ve also used GitHub Copilot within Visual Studio Code.</p> <p>They&rsquo;re all great, but I was really hoping to run something locally on my machine at home.</p> <p>Then I stumbled upon a great post on All Things Open titled &ldquo;<a href="https://allthingsopen.org/articles/build-a-local-ai-co-pilot" target="_blank" rel="noreferrer">Build a local AI co-pilot using IBM Granite Code, Ollama, and Continue</a>&rdquo; that started me down a path with <a href="https://ollama.com/" target="_blank" rel="noreferrer">ollama</a>. The ollama project gets you started with a local LLM and makes it easy to serve it for other applications to use.</p> <h2 id="its-so-slow-" class="relative group">It&rsquo;s so slow ๐ŸŒ <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#its-so-slow-" aria-label="Anchor">#</a></span></h2><p>When I first began connecting vscode to ollama, I noticed that the responses were incredibly slow. A quick check with <a href="https://github.com/aristocratos/btop" target="_blank" rel="noreferrer">btop</a> showed that my CPU was maxed out at 100% utilization and my GPU was entirely idle. That&rsquo;s not good.</p> <p>My first thought was to check the system journal with <code>sudo journalctl --boot -u ollama</code>. 
That gets me all the messages from ollama since I last booted the machine.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">source=images.go:781 msg=&#34;total blobs: 0&#34; </span></span><span class="line"><span class="cl">source=images.go:788 msg=&#34;total unused blobs removed: 0&#34; </span></span><span class="line"><span class="cl">source=routes.go:1155 msg=&#34;Listening on 127.0.0.1:11434 (version 0.3.4)&#34; </span></span><span class="line"><span class="cl">source=payload.go:30 msg=&#34;extracting embedded files&#34; dir=/tmp/ollama1586759388/runners </span></span><span class="line"><span class="cl">source=payload.go:44 msg=&#34;Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]&#34; </span></span><span class="line"><span class="cl">source=gpu.go:204 msg=&#34;looking for compatible GPUs&#34; </span></span><span class="line"><span class="cl">source=amd_linux.go:59 msg=&#34;ollama recommends running the https://www.amd.com/en/support/linux-drivers&#34; error=&#34;amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory&#34; </span></span><span class="line"><span class="cl">source=amd_linux.go:340 msg=&#34;amdgpu is not supported&#34; gpu=0 gpu_type=gfx1032 library=/usr/lib64 supported_types=&#34;[gfx1030 gfx1100 gfx1101 gfx1102]&#34; </span></span><span class="line"><span class="cl">source=amd_linux.go:342 msg=&#34;See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage&#34; </span></span><span class="line"><span class="cl">source=amd_linux.go:360 msg=&#34;no compatible amdgpu devices detected&#34; </span></span></code></pre></div><p>A couple of things in the output stood out to me:</p> <ul> <li><code>stat /sys/module/amdgpu/version: no such file or directory</code></li> <li><code>msg=&quot;amdgpu is not supported&quot; gpu=0 gpu_type=gfx1032 library=/usr/lib64 supported_types=&quot;[gfx1030 gfx1100 gfx1101 gfx1102]&quot;</code></li> <li><code>&quot;See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage&quot;</code></li> </ul> <p>Sure enough, the version was missing:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> stat /sys/module/amdgpu/version </span></span><span class="line"><span class="cl"><span class="go">stat: cannot statx &#39;/sys/module/amdgpu/version&#39;: No such file or directory </span></span></span></code></pre></div><p>And my AMD GPU is indeed an AMD Navi 23 chipset (gfx1032):</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> lspci <span class="p">|</span> grep -i VGA </span></span><span class="line"><span class="cl"><span class="go">0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c7) </span></span></span></code></pre></div><p>I went over to the <a href="https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides" target="_blank" rel="noreferrer">linked overrides documentation</a> to figure out what <code>HSA_OVERRIDE_GFX_VERSION</code> is all about:</p> <blockquote> <p>Ollama leverages the AMD ROCm library, which does not support all AMD GPUs. 
In some cases you can force the system to try to use a similar LLVM target that is close. For example The Radeon RX 5400 is gfx1034 (also known as 10.3.4) however, ROCm does not currently support this target. The closest support is gfx1030. You can use the environment variable HSA_OVERRIDE_GFX_VERSION with x.y.z syntax. So for example, to force the system to run on the RX 5400, you would set HSA_OVERRIDE_GFX_VERSION=&ldquo;10.3.0&rdquo; as an environment variable for the server. If you have an unsupported AMD GPU you can experiment using the list of supported types below.</p></blockquote> <h2 id="the-fix" class="relative group">The fix <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-fix" aria-label="Anchor">#</a></span></h2><p>The docs recommended setting <code>HSA_OVERRIDE_GFX_VERSION=&quot;10.3.0&quot;</code> to see if my card will work. Let&rsquo;s edit the systemd unit file for ollama to drop in some additional configuration:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> sudo systemctl edit ollama.service </span></span></code></pre></div><p>An editor appeared with text in it:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="c1">### Editing /etc/systemd/system/ollama.service.d/override.conf</span> </span></span><span class="line"><span class="cl"><span class="c1">### Anything between here and the comment below will become the contents of the drop-in file</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1">### Edits below this comment will be discarded</span> </span></span></code></pre></div><p>So I added the suggested override along with the path to my AMD ROCm directory:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="c1">### Editing /etc/systemd/system/ollama.service.d/override.conf</span> </span></span><span class="line"><span class="cl"><span class="c1">### Anything between here and the comment below will become the contents of the drop-in file</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="k">[Service]</span> </span></span><span class="line"><span class="cl"><span class="na">Environment</span><span class="o">=</span><span class="s">&#34;HSA_OVERRIDE_GFX_VERSION=10.3.0&#34;</span> </span></span><span class="line"><span class="cl"><span class="na">Environment</span><span class="o">=</span><span class="s">&#34;ROCM_PATH=/opt/rocm&#34;</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1">### Edits below this comment will be discarded</span> </span></span></code></pre></div><p>Then I can tell systemd to reload the unit and restart ollama:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> sudo systemctl daemon-reload </span></span><span class="line"><span class="cl"><span class="gp">&gt;</span> sudo systemctl stop ollama 
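# Optional sanity check: confirm the drop-in shows up in the merged unit before starting it again
> systemctl cat ollama.service | grep HSA_OVERRIDE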
</span></span><span class="line"><span class="cl"><span class="gp">&gt;</span> sudo systemctl start ollama </span></span></code></pre></div><p>Back to the system journal for another look:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">source=amd_linux.go:348 msg=&#34;skipping rocm gfx compatibility check&#34; HSA_OVERRIDE_GFX_VERSION=10.3.0 </span></span><span class="line"><span class="cl">source=types.go:105 msg=&#34;inference compute&#34; id=0 library=rocm compute=gfx1032 driver=0.0 name=1002:73ff total=&#34;8.0 GiB&#34; available=&#34;5.9 GiB&#34; </span></span></code></pre></div><p>Success! ๐ŸŽ‰</p> <h2 id="giving-it-another-try" class="relative group">Giving it another try <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#giving-it-another-try" aria-label="Anchor">#</a></span></h2><p>I went back to vscode and tried some code completions, but they were only slightly faster than using the CPU. Each time I&rsquo;d wait for completion, I&rsquo;d watch btop and the GPU would spike, then the CPU, then the GPU spikes again, and so on.</p> <p>After talking with a coworker, it looks like my Radeon 6600 XT is great for games, but it lacks the RAM needed to load the model into the GPU. ๐Ÿ˜ญ From what I&rsquo;ve read, 24GB is the suggested minimum and that&rsquo;s the largest amount of RAM you&rsquo;ll find in most GeForce/Radeon consumer graphics cards.</p>Jellyfin fatal player errorhttps://major.io/p/jellyfin-fatal-player-error/Tue, 02 Jul 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/jellyfin-fatal-player-error/<p>Plex has been a mainstay for serving up media at home but it seems to have changed lately towards a more and more commercial offering. A friend recommended <a href="https://jellyfin.org/" target="_blank" rel="noreferrer">Jellyfin</a> and I deployed it on my Synology NAS in a Docker container.</p> <p>I did a few quick tests in a web browser and everything looked good. But then my Jellyfin android app told me:</p> <blockquote> <p>Playback failed due to a fatal player error</p></blockquote> <p>Everything looked fine in the browser, so it was time to dig in.</p> <h2 id="checking-the-logs" class="relative group">Checking the logs <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#checking-the-logs" aria-label="Anchor">#</a></span></h2><p>I opened up an ssh connection on the Synology to check the logs and found something unhelpful:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plain" data-lang="plain"><span class="line"><span class="cl">Jellyfin.Api.Helpers.TranscodingJobHelper: FFmpeg exited with code 1 </span></span></code></pre></div><p>Running a few searches led me down rabbit holes to plenty of GitHub issues. 
None of them fixed the issue.</p> <h2 id="checking-the-browser-again" class="relative group">Checking the browser again <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#checking-the-browser-again" aria-label="Anchor">#</a></span></h2><p>I went through a few different videos from the Synology and played each. They all looked fine in Firefox until I reached one that seemed to stutter. The frame rate looked as if at least half of the frames were bring dropped.</p> <p>That particular video was in 4K with a high bit rate. Back on the synology, the CPU usage was through the roof.</p> <p>I configured graphics acceleration when I deployed Jellyfin. Perhaps it wasn&rsquo;t working?</p> <h2 id="jellyfin-deployment" class="relative group">Jellyfin deployment <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#jellyfin-deployment" aria-label="Anchor">#</a></span></h2><p>I deployed Jellyfin using the upstream guides with docker-compose:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">jellyfin</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">docker.io/jellyfin/jellyfin:latest</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">jellyfin</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">user</span><span class="p">:</span><span class="w"> </span><span class="m">1026</span><span class="p">:</span><span class="m">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">network_mode</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;host&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">devices</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">/dev/dri</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">volumes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># removed</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;unless-stopped&#34;</span><span class="w"> </span></span></span></code></pre></div><p>One of the GitHub issues I stumbled upon suggested being specific about the video devices that are mounted inside the container.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span 
class="gp">$</span> ls -al /dev/dri </span></span><span class="line"><span class="cl"><span class="go">total 0 </span></span></span><span class="line"><span class="cl"><span class="go">drwxr-xr-x 2 root root 80 Jun 10 20:23 . </span></span></span><span class="line"><span class="cl"><span class="go">drwxr-xr-x 12 root root 14140 Jun 10 20:24 .. </span></span></span><span class="line"><span class="cl"><span class="go">crw------- 1 root root 226, 0 Jun 10 20:24 card0 </span></span></span><span class="line"><span class="cl"><span class="go">crw-rw---- 1 root videodriver 226, 128 Jun 10 20:24 renderD128 </span></span></span></code></pre></div><p>I adjusted the deployment in <code>docker-compose.yaml</code> and tried again:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">jellyfin</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">docker.io/jellyfin/jellyfin:latest</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">jellyfin</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">user</span><span class="p">:</span><span class="w"> </span><span class="m">1026</span><span class="p">:</span><span class="m">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">network_mode</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;host&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">devices</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">/dev/dri/renderD128:/dev/dri/renderD128</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">/dev/dri/card0:/dev/dri/card0</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">volumes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># removed</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;unless-stopped&#34;</span><span class="w"> </span></span></span></code></pre></div><p>I redeployed jellyfin:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> docker-compose up -d jellyfin </span></span></code></pre></div><p>The Android app still had the fatal player error.</p> <h2 id="users-and-groups" class="relative group">Users and groups <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#users-and-groups" 
aria-label="Anchor">#</a></span></h2><p>Most of my Synology containers use the uid/gid pair of <code>1026:100</code> so allow them to read and write to my storage volume. The <code>/dev/dri/renderD128</code> is owned by the <code>videodriver</code> group:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> grep videodriver /etc/group </span></span><span class="line"><span class="cl"><span class="go">videodriver::937:PlexMediaServer </span></span></span></code></pre></div><p>This likely came from a time when I installed Plex on Synology via one of the Synology applications rather than from a container. <em>(I&rsquo;m not sure, but that&rsquo;s my guess.)</em></p> <p>I added that group to the container:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">jellyfin</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">docker.io/jellyfin/jellyfin:latest</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">jellyfin</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">user</span><span class="p">:</span><span class="w"> </span><span class="m">1026</span><span class="p">:</span><span class="m">100</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">network_mode</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;host&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">group_add</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;937&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">devices</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">/dev/dri/renderD128:/dev/dri/renderD128</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">/dev/dri/card0:/dev/dri/card0</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">volumes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># removed</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;unless-stopped&#34;</span><span class="w"> </span></span></span></code></pre></div><p>After redeploying the container, the Android app worked just fine! Also, the video stuttering disappeared when viewing the 4K video from the browser. 
๐ŸŽ‰</p>Redirect local ports with firewalldhttps://major.io/p/firewalld-port-redirection/Fri, 28 Jun 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/firewalld-port-redirection/<p>Linux networking and firewalls give us plenty of options for redirecting traffic from one port to another. We can allow people outside our home to reach a web server we run in our internal network. That&rsquo;s called destination NAT, ot <a href="https://en.wikipedia.org/wiki/Network_address_translation#DNAT" target="_blank" rel="noreferrer">DNAT</a>.</p> <p>You can also redirect traffic to different ports on the same host. For example, if you have a daemon listening on port 3000, but you want people to reach that service on port 80, you can redirect traffic from 80 to 3000 on the same host (without network address translation).</p> <p>But how do we do this with <a href="https://firewalld.org/" target="_blank" rel="noreferrer">firewalld</a>? ๐Ÿค”</p> <h2 id="old-school-iptables-methods" class="relative group">Old-school iptables methods <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#old-school-iptables-methods" aria-label="Anchor">#</a></span></h2><p>Let&rsquo;s say you have a service running on port 3000 and you want to expose it to other computers on your same network as port 80. With iptables, you would typically start by enabling IP forwarding:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="go">sudo sysctl -w net.ipv4.ip_forward=1 </span></span></span></code></pre></div><p>Add two iptables rules to handle packets coming in from the outside as well as any locally generated packets:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">#</span> Handle locally-generated packets on the same machine. </span></span><span class="line"><span class="cl"><span class="go">sudo iptables -t nat -A PREROUTING -s 127.0.0.1 -p tcp --dport 80 -j REDIRECT --to 3000` </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="gp">#</span> Handle packets coming from outside the current machine. </span></span><span class="line"><span class="cl"><span class="go">sudo iptables -t nat -A OUTPUT -s 127.0.0.1 -p tcp --dport 80 -j REDIRECT --to 3000` </span></span></span></code></pre></div><p>There&rsquo;s a weird situation that happens on certain machines with certain network configurations where packets are not properly routed when they are destined for the local network adapter. 
To fix that, set one more sysctl configuration:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="go">sudo sysctl -w net.ipv4.conf.all.route_localnet=1 </span></span></span></code></pre></div><p>Remember to make these sysctl configurations permanent:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="go">sudo mkdir /etc/sysctl.conf.d/ </span></span></span><span class="line"><span class="cl"><span class="go">echo &#34;net.ipv4.ip_forward=1&#34; | sudo tee &gt;&gt; /etc/sysctl.conf.d/redirect.conf </span></span></span><span class="line"><span class="cl"><span class="go">echo &#34;net.ipv4.conf.all.route_localnet&#34; | sudo tee &gt;&gt; /etc/sysctl.conf.d/redirect.conf </span></span></span></code></pre></div><h2 id="why-consider-firewalld" class="relative group">Why consider firewalld? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#why-consider-firewalld" aria-label="Anchor">#</a></span></h2><p>I like firewalld because I can manage lots of settings for different firewall zones and allow access from one zone to another. It also allows me to put certain interfaces in trusted zones so they automatically get more access.</p> <p>Another nice aspect about firewalld is that it supports iptables and nftables backends. You don&rsquo;t have to think about the differences between the backends. All of that is taken care of for you.</p> <h2 id="port-redirections-in-firewalld" class="relative group">Port redirections in firewalld <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#port-redirections-in-firewalld" aria-label="Anchor">#</a></span></h2><p>Let&rsquo;s start by checking our default firewalld zone:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo firewall-cmd --list-all </span></span><span class="line"><span class="cl"><span class="go">FedoraServer (default, active) </span></span></span><span class="line"><span class="cl"><span class="go"> target: default </span></span></span><span class="line"><span class="cl"><span class="go"> ingress-priority: 0 </span></span></span><span class="line"><span class="cl"><span class="go"> egress-priority: 0 </span></span></span><span class="line"><span class="cl"><span class="go"> icmp-block-inversion: no </span></span></span><span class="line"><span class="cl"><span class="go"> interfaces: bond0 eno1 eno2 </span></span></span><span class="line"><span class="cl"><span class="go"> sources: </span></span></span><span class="line"><span class="cl"><span class="go"> services: dhcpv6-client http https </span></span></span><span class="line"><span class="cl"><span class="go"> ports: 51820/udp </span></span></span><span class="line"><span class="cl"><span class="go"> protocols: </span></span></span><span class="line"><span class="cl"><span class="go"> forward: yes </span></span></span><span class="line"><span class="cl"><span class="go"> masquerade: yes 
</span></span></span><span class="line"><span class="cl"><span class="go"> forward-ports: </span></span></span><span class="line"><span class="cl"><span class="go"> source-ports: </span></span></span><span class="line"><span class="cl"><span class="go"> icmp-blocks: </span></span></span><span class="line"><span class="cl"><span class="go"> rich rules: </span></span></span></code></pre></div><p>This output shows that my external network interfaces are attached to the zone and forwarding is already on in my case. If you see <code>forward: no</code> here, just run this command:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo firewall-cmd --add-forward </span></span><span class="line"><span class="cl"><span class="go">success </span></span></span></code></pre></div><p>Now firewalld will manage your <code>forwarding</code> sysctl variables for you on these interfaces. That&rsquo;s handy. ๐Ÿ˜‰</p> <p>Next, let&rsquo;s get the redirect working. We want to take external packets on port 80 and send them to 3000: on the local machine.</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo firewall-cmd <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span><span class="go"> --add-forward-port=port=80:proto=tcp:toport=3000:toaddr=127.0.0.1 </span></span></span><span class="line"><span class="cl"><span class="go">success </span></span></span></code></pre></div><p>In this command, we told firewalld to take 80/tcp from the outside and send it to port 3000 on the local host (127.0.0.1). 
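</p> <p>If you ever need to back the redirect out, the matching remove command takes the same arguments (a sketch; swap in your own ports):</p> <pre tabindex="0"><code>$ sudo firewall-cmd --remove-forward-port=port=80:proto=tcp:toport=3000:toaddr=127.0.0.1
</code></pre> <p>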
Let&rsquo;s double check our current configuration:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo firewall-cmd --list-all </span></span><span class="line"><span class="cl"><span class="go">FedoraServer (default, active) </span></span></span><span class="line"><span class="cl"><span class="go"> target: default </span></span></span><span class="line"><span class="cl"><span class="go"> ingress-priority: 0 </span></span></span><span class="line"><span class="cl"><span class="go"> egress-priority: 0 </span></span></span><span class="line"><span class="cl"><span class="go"> icmp-block-inversion: no </span></span></span><span class="line"><span class="cl"><span class="go"> interfaces: bond0 eno1 eno2 </span></span></span><span class="line"><span class="cl"><span class="go"> sources: </span></span></span><span class="line"><span class="cl"><span class="go"> services: dhcpv6-client http https </span></span></span><span class="line"><span class="cl"><span class="go"> ports: 51820/udp </span></span></span><span class="line"><span class="cl"><span class="go"> protocols: </span></span></span><span class="line"><span class="cl"><span class="go"> forward: yes </span></span></span><span class="line"><span class="cl"><span class="go"> masquerade: yes </span></span></span><span class="line"><span class="cl"><span class="go"> forward-ports: </span></span></span><span class="line"><span class="cl"><span class="go"> port=80:proto=tcp:toport=3000:toaddr=127.0.0.1 </span></span></span><span class="line"><span class="cl"><span class="go"> source-ports: </span></span></span><span class="line"><span class="cl"><span class="go"> icmp-blocks: </span></span></span><span class="line"><span class="cl"><span class="go"> rich rules: </span></span></span></code></pre></div><p>Test a connection to port 80 with <code>curl</code> and it should redirect to the service on port 3000.</p> <p>๐Ÿšจ <strong>If everything works, remember to save the firewalld configuration:</strong></p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo firewall-cmd --runtime-to-permanent </span></span><span class="line"><span class="cl"><span class="go">success </span></span></span></code></pre></div><h2 id="extra-credit" class="relative group">Extra credit <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#extra-credit" aria-label="Anchor">#</a></span></h2><p>We can inspect the nftables rules to see the firewall rules that firewalld set for us. 
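</p> <p>Since firewalld&rsquo;s nftables backend keeps its rules in a dedicated table, you can also list just that table instead of dumping everything (a sketch; the table name assumes the nftables backend is in use):</p> <pre tabindex="0"><code>$ sudo nft list table inet firewalld
</code></pre> <p>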
The <a href="https://wiki.archlinux.org/title/Nftables" target="_blank" rel="noreferrer">Arch Linux nftables wiki page</a> is superb for looking up those commands.</p> <p>If we dump the current ruleset, we see the rule we created in firewalld:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo nft list ruleset </span></span><span class="line"><span class="cl"><span class="go">---SNIP--- </span></span></span><span class="line"><span class="cl"><span class="go">chain nat_PRE_FedoraServer_allow { </span></span></span><span class="line"><span class="cl"><span class="go"> meta nfproto ipv4 tcp dport 80 dnat ip to 127.0.0.1:3000 </span></span></span><span class="line"><span class="cl"><span class="go">} </span></span></span><span class="line"><span class="cl"><span class="go">---SNIP--- </span></span></span></code></pre></div>amazon-ec2-utils in Fedorahttps://major.io/p/amazon-ec2-utils-fedora/Wed, 08 May 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/amazon-ec2-utils-fedora/<p>We&rsquo;ve all been in that situation where we see a device in Linux and wonder which physical device it corresponds to. I remember when I built my first NAS and received an alert that a drive had failed. It took me a while to figure out which physical drive actually needed to be replaced.</p> <p>This happens with network devices, too, and I <a href="https://major.io/p/understanding-systemds-predictable-network-device-names/">wrote a post</a> about systemd&rsquo;s predictable network device names back in 2015.</p> <p>Cloud instances often make it even more confusing because storage devices are fully virtualized and show up differently depending on the cloud provider. 
I recently packaged <a href="https://github.com/amazonlinux/amazon-ec2-utils" target="_blank" rel="noreferrer">amazon-ec2-utils</a> in Fedora to make this a little easier on AWS.</p> <h2 id="the-problem" class="relative group">The problem <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#the-problem" aria-label="Anchor">#</a></span></h2><p>I just built a test instance of Fedora 40 in AWS and the AWS API shows the block device mappings like this:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> aws ec2 describe-instances <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span><span class="go"> --instance-ids i-0687448a184ab0a9e | \ </span></span></span><span class="line"><span class="cl"><span class="go"> jq &#39;.Reservations[0].Instances[0].BlockDeviceMappings&#39; </span></span></span><span class="line"><span class="cl"><span class="go">[ </span></span></span><span class="line"><span class="cl"><span class="go"> { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;DeviceName&#34;: &#34;/dev/sda1&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Ebs&#34;: { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;AttachTime&#34;: &#34;2024-05-08T15:24:03+00:00&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;DeleteOnTermination&#34;: true, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Status&#34;: &#34;attached&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;VolumeId&#34;: &#34;vol-0832569729b6c5ea6&#34; </span></span></span><span class="line"><span class="cl"><span class="go"> } </span></span></span><span class="line"><span class="cl"><span class="go"> } </span></span></span><span class="line"><span class="cl"><span class="go">] </span></span></span></code></pre></div><p>However, if I check these devices inside the instance itself, I get something totally different:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">[fedora@f40 ~]$</span> sudo fdisk -l </span></span><span class="line"><span class="cl"><span class="go">Disk /dev/nvme0n1: 10 GiB, 10737418240 bytes, 20971520 sectors </span></span></span><span class="line"><span class="cl"><span class="go">Disk model: Amazon Elastic Block Store </span></span></span><span class="line"><span class="cl"><span class="go">Units: sectors of 1 * 512 = 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Sector size (logical/physical): 512 bytes / 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">I/O size (minimum/optimal): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Disklabel type: gpt </span></span></span><span class="line"><span class="cl"><span class="go">Disk identifier: 9FB58ED7-7581-4469-BEB7-64F069151EAF </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span 
class="err"></span><span class="go">Device Start End Sectors Size Type </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p1 2048 206847 204800 100M EFI System </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p2 206848 2254847 2048000 1000M Linux extended boot </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p3 2254848 20971484 18716637 8.9G Linux root (ARM-64) </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Disk /dev/zram0: 1.78 GiB, 1909456896 bytes, 466176 sectors </span></span></span><span class="line"><span class="cl"><span class="go">Units: sectors of 1 * 4096 = 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Sector size (logical/physical): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">I/O size (minimum/optimal): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="gp">[fedora@f40 ~]$</span> ls -al /dev/sd* </span></span><span class="line"><span class="cl"><span class="go">ls: cannot access &#39;/dev/sd*&#39;: No such file or directory </span></span></span></code></pre></div><p>One disk isn&rsquo;t so bad, but what if we add more storage? The API tells me one thing:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">&gt;</span> aws ec2 describe-instances --instance-ids i-0687448a184ab0a9e <span class="p">|</span> jq <span class="s1">&#39;.Reservations[0].Instances[0].BlockDeviceMappings&#39;</span> </span></span><span class="line"><span class="cl"><span class="go">[ </span></span></span><span class="line"><span class="cl"><span class="go"> { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;DeviceName&#34;: &#34;/dev/sda1&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Ebs&#34;: { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;AttachTime&#34;: &#34;2024-05-08T15:24:03+00:00&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;DeleteOnTermination&#34;: true, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Status&#34;: &#34;attached&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;VolumeId&#34;: &#34;vol-0832569729b6c5ea6&#34; </span></span></span><span class="line"><span class="cl"><span class="go"> } </span></span></span><span class="line"><span class="cl"><span class="go"> }, </span></span></span><span class="line"><span class="cl"><span class="go"> { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;DeviceName&#34;: &#34;/dev/sde&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Ebs&#34;: { </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;AttachTime&#34;: &#34;2024-05-08T15:38:29.754000+00:00&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> 
&#34;DeleteOnTermination&#34;: false, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;Status&#34;: &#34;attached&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;VolumeId&#34;: &#34;vol-0a7ba05c5270d7aa3&#34;, </span></span></span><span class="line"><span class="cl"><span class="go"> &#34;VolumeOwnerId&#34;: &#34;xxx&#34; </span></span></span><span class="line"><span class="cl"><span class="go"> } </span></span></span><span class="line"><span class="cl"><span class="go"> } </span></span></span><span class="line"><span class="cl"><span class="go">] </span></span></span></code></pre></div><p>But then the instance tells me something else entirely:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">[fedora@f40 ~]$</span> sudo fdisk -l </span></span><span class="line"><span class="cl"><span class="go">Disk /dev/nvme0n1: 10 GiB, 10737418240 bytes, 20971520 sectors </span></span></span><span class="line"><span class="cl"><span class="go">Disk model: Amazon Elastic Block Store </span></span></span><span class="line"><span class="cl"><span class="go">Units: sectors of 1 * 512 = 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Sector size (logical/physical): 512 bytes / 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">I/O size (minimum/optimal): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Disklabel type: gpt </span></span></span><span class="line"><span class="cl"><span class="go">Disk identifier: 9FB58ED7-7581-4469-BEB7-64F069151EAF </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Device Start End Sectors Size Type </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p1 2048 206847 204800 100M EFI System </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p2 206848 2254847 2048000 1000M Linux extended boot </span></span></span><span class="line"><span class="cl"><span class="go">/dev/nvme0n1p3 2254848 20971484 18716637 8.9G Linux root (ARM-64) </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Disk /dev/zram0: 1.78 GiB, 1909456896 bytes, 466176 sectors </span></span></span><span class="line"><span class="cl"><span class="go">Units: sectors of 1 * 4096 = 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Sector size (logical/physical): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go">I/O size (minimum/optimal): 4096 bytes / 4096 bytes </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Disk /dev/nvme1n1: 10 GiB, 10737418240 bytes, 20971520 sectors </span></span></span><span class="line"><span class="cl"><span class="go">Disk model: Amazon Elastic Block Store 
</span></span></span><span class="line"><span class="cl"><span class="go">Units: sectors of 1 * 512 = 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">Sector size (logical/physical): 512 bytes / 512 bytes </span></span></span><span class="line"><span class="cl"><span class="go">I/O size (minimum/optimal): 4096 bytes / 4096 bytes </span></span></span></code></pre></div><h2 id="udev-rules-to-the-rescue" class="relative group">udev rules to the rescue <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#udev-rules-to-the-rescue" aria-label="Anchor">#</a></span></h2><p>The amazon-ec2-utils package provides some helpful udev rules and scripts to make it easier to identify these devices. This package is on the way to Fedora as I write this post, but it hasn&rsquo;t reached the stable repos yet. Once it does, you should be able to install it:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo dnf install amazon-ec2-utils </span></span></code></pre></div><p>In the meantime, you can download the latest build and install it on your instance:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo dnf install /usr/bin/koji </span></span><span class="line"><span class="cl"><span class="gp">$</span> koji download-build amazon-ec2-utils-2.2.0-2.fc40 </span></span><span class="line"><span class="cl"><span class="go">Downloading [1/2]: amazon-ec2-utils-2.2.0-2.fc40.src.rpm </span></span></span><span class="line"><span class="cl"><span class="go">[====================================] 100% 24.01 KiB / 24.01 KiB </span></span></span><span class="line"><span class="cl"><span class="go">Downloading [2/2]: amazon-ec2-utils-2.2.0-2.fc40.noarch.rpm </span></span></span><span class="line"><span class="cl"><span class="go">[====================================] 100% 20.53 KiB / 20.53 KiB </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="gp">$</span> sudo dnf install amazon-ec2-utils-2.2.0-2.fc40.noarch.rpm </span></span></code></pre></div><p>The cleanest method to get these new udev rules working is to reboot, but if you&rsquo;re in a hurry, there&rsquo;s an option to reload these rules without a reboot:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo udevadm control --reload-rules </span></span><span class="line"><span class="cl"><span class="gp">$</span> sudo udevadm trigger </span></span></code></pre></div><p>What do we have in <code>/dev/</code> now?</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">[fedora@f40 ~]$</span> ls -al /dev/sd* </span></span><span class="line"><span class="cl"><span class="go">lrwxrwxrwx. 1 root root 7 May 8 15:44 /dev/sda1 -&gt; nvme0n1 </span></span></span><span class="line"><span class="cl"><span class="go">lrwxrwxrwx. 
1 root root 9 May 8 15:44 /dev/sda11 -&gt; nvme0n1p1 </span></span></span><span class="line"><span class="cl"><span class="go">lrwxrwxrwx. 1 root root 9 May 8 15:44 /dev/sda12 -&gt; nvme0n1p2 </span></span></span><span class="line"><span class="cl"><span class="go">lrwxrwxrwx. 1 root root 9 May 8 15:44 /dev/sda13 -&gt; nvme0n1p3 </span></span></span><span class="line"><span class="cl"><span class="go">lrwxrwxrwx. 1 root root 7 May 8 15:44 /dev/sde -&gt; nvme1n1 </span></span></span></code></pre></div><p>We can put a filesystem down on the new device using the same name as the API presents:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> sudo dnf install /usr/sbin/mkfs.btrfs </span></span><span class="line"><span class="cl"><span class="gp">$</span> sudo mkfs.btrfs /dev/sde </span></span><span class="line"><span class="cl"><span class="go">btrfs-progs v6.8.1 </span></span></span><span class="line"><span class="cl"><span class="go">See https://btrfs.readthedocs.io for more information. </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Performing full device TRIM /dev/sde (10.00GiB) ... </span></span></span><span class="line"><span class="cl"><span class="go">NOTE: several default settings have changed in version 5.15, please make sure </span></span></span><span class="line"><span class="cl"><span class="go"> this does not affect your deployments: </span></span></span><span class="line"><span class="cl"><span class="go"> - DUP for metadata (-m dup) </span></span></span><span class="line"><span class="cl"><span class="go"> - enabled no-holes (-O no-holes) </span></span></span><span class="line"><span class="cl"><span class="go"> - enabled free-space-tree (-R free-space-tree) </span></span></span><span class="line"><span class="cl"><span class="go"></span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="go">Label: (null) </span></span></span><span class="line"><span class="cl"><span class="go">UUID: c2fb9e33-3bf6-4b5b-aa80-44e315f499de </span></span></span><span class="line"><span class="cl"><span class="go">Node size: 16384 </span></span></span><span class="line"><span class="cl"><span class="go">Sector size: 4096 (CPU page size: 4096) </span></span></span><span class="line"><span class="cl"><span class="go">Filesystem size: 10.00GiB </span></span></span><span class="line"><span class="cl"><span class="go">Block group profiles: </span></span></span><span class="line"><span class="cl"><span class="go"> Data: single 8.00MiB </span></span></span><span class="line"><span class="cl"><span class="go"> Metadata: DUP 256.00MiB </span></span></span><span class="line"><span class="cl"><span class="go"> System: DUP 8.00MiB </span></span></span><span class="line"><span class="cl"><span class="go">SSD detected: yes </span></span></span><span class="line"><span class="cl"><span class="go">Zoned device: no </span></span></span><span class="line"><span class="cl"><span class="go">Features: extref, skinny-metadata, no-holes, free-space-tree </span></span></span><span class="line"><span class="cl"><span class="go">Checksum: crc32c </span></span></span><span class="line"><span class="cl"><span class="go">Number of devices: 1 </span></span></span><span class="line"><span 
class="cl"><span class="go">Devices: </span></span></span><span class="line"><span class="cl"><span class="go"> ID SIZE PATH </span></span></span><span class="line"><span class="cl"><span class="go"> 1 10.00GiB /dev/sde </span></span></span></code></pre></div><p>Being able to know these device names during the instance launch or during storage operations makes it much easier to write automation for these devices. There&rsquo;s no guess work required to translate the device that an instance shows you to what you see via the API.</p>Fix big cursors in Java applications in Waylandhttps://major.io/p/java-big-cursors-wayland/Fri, 26 Apr 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/java-big-cursors-wayland/<p>Scroll through the list of <a href="https://major.io/tags/wayland/">Wayland posts</a> posts on the blog and you&rsquo;ll see that I&rsquo;ve solved plenty of weird problems with Wayland and the <a href="https://swaywm.org/" target="_blank" rel="noreferrer">Sway</a> compositor. Most are pretty easy to fix but some are a bit trickier.</p> <p>Java applications are notoriously unpredictable and Wayland takes unpredictability to the next level. One particular application on my desktop always seems to start with massive cursors.</p> <p>This post is about how I fixed and then discovered something interesting along the way.</p> <h2 id="fixing-big-cursors" class="relative group">Fixing big cursors <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#fixing-big-cursors" aria-label="Anchor">#</a></span></h2><p>I recently moved some investment and trading accounts from TD Ameritrade to <a href="https://tastytrade.com/" target="_blank" rel="noreferrer">Tastytrade</a>. Both offer Java applications that make trading easier, but Tastytrade&rsquo;s application always started with massive cursors.</p> <p>To make matters worse, sometimes the cursor looked lined up on the screen but then the click landed on the wrong buttons in the application! Errors are annoying. Errors that cost you money and time must be fixed. ๐Ÿ˜œ</p> <p>Some web searches eventually led me to Arch Linux&rsquo;s excellent <a href="https://wiki.archlinux.org/title/Wayland" target="_blank" rel="noreferrer">Wayland wiki page</a>. None of the adjustments or environment variables there had any effect on my cursors.</p> <p>I eventually landed on a page that suggested setting <code>XCURSOR_SIZE</code>. I don&rsquo;t remember ever setting that, but it was being set by <em>something</em>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> <span class="nb">echo</span> <span class="nv">$XCURSOR_SIZE</span> </span></span><span class="line"><span class="cl"><span class="go">24 </span></span></span></code></pre></div><p>One of the suggestions was to decrease it, so I decided to give <code>20</code> a try. 
That was too big, but <code>16</code> was perfect and it matched all of my other applications:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> <span class="nb">export</span> <span class="nv">XCURSOR_SIZE</span><span class="o">=</span><span class="m">20</span> </span></span><span class="line"><span class="cl"><span class="gp">#</span> /opt/tastytrade/bin/tastytrade </span></span></code></pre></div><p>That works fine when I start my application via the terminal, but how do I set it for the application when I start it from ulauncher in sway? ๐Ÿค”</p> <h2 id="desktop-file" class="relative group">Desktop file <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#desktop-file" aria-label="Anchor">#</a></span></h2><p>The Tastytade RPM comes with a <code>.desktop</code> file for launching the application. I copied that over to my local applications directory:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">cp /opt/tastytrade/lib/tastytrade-tastytrade.desktop <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> ~/.local/share/applications/ </span></span></code></pre></div><p>Then I opened the copied <code>~/.local/share/applications/tastytrade-tastytrade.desktop</code> file in a text editor:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="k">[Desktop Entry]</span> </span></span><span class="line"><span class="cl"><span class="na">Name</span><span class="o">=</span><span class="s">tastytrade</span> </span></span><span class="line"><span class="cl"><span class="na">Comment</span><span class="o">=</span><span class="s">tastytrade</span> </span></span><span class="line"><span class="cl"><span class="na">Exec</span><span class="o">=</span><span class="s">/opt/tastytrade/bin/tastytrade</span> </span></span><span class="line"><span class="cl"><span class="na">Icon</span><span class="o">=</span><span class="s">/opt/tastytrade/lib/tastytrade.png</span> </span></span><span class="line"><span class="cl"><span class="na">Terminal</span><span class="o">=</span><span class="s">false</span> </span></span><span class="line"><span class="cl"><span class="na">Type</span><span class="o">=</span><span class="s">Application</span> </span></span><span class="line"><span class="cl"><span class="na">Categories</span><span class="o">=</span><span class="s">tastyworks</span> </span></span><span class="line"><span class="cl"><span class="na">MimeType</span><span class="o">=</span> </span></span></code></pre></div><p>I changed the <code>Exec</code> line to be:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">Exec</span><span class="o">=</span><span class="s">env XCURSOR_SIZE=16 /opt/tastytrade/bin/tastytrade</span> </span></span></code></pre></div><p>I launched the application again after making that change, but the cursors were still huge! There has to be another way. 
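</p> <p>One way to confirm whether the variable is actually reaching the app is to peek at the environment of the running process (a debugging sketch; it assumes a single matching tastytrade process):</p> <pre tabindex="0"><code>$ tr '\0' '\n' &lt; /proc/$(pgrep -f tastytrade)/environ | grep XCURSOR_SIZE
</code></pre> <p>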
๐Ÿค”</p> <h2 id="systemd-does-everything-" class="relative group">systemd does everything ๐Ÿ˜† <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#systemd-does-everything-" aria-label="Anchor">#</a></span></h2><p>After more searching and digging, I discovered that systemd has a capability to <a href="https://www.freedesktop.org/software/systemd/man/latest/environment.d.html" target="_blank" rel="noreferrer">set environment variables for user sessions</a>:</p> <blockquote> <p>Configuration files in the environment.d/ directories contain lists of environment variable assignments passed to services started by the systemd user instance. systemd-environment-d-generator(8) parses them and updates the environment exported by the systemd user instance. See below for an discussion of which processes inherit those variables.</p> <p>It is recommended to use numerical prefixes for file names to simplify ordering.</p> <p>For backwards compatibility, a symlink to /etc/environment is installed, so this file is also parsed.</p></blockquote> <p>Let&rsquo;s give that a try:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="gp">$</span> mkdir -p ~/.config/environment.d/ </span></span><span class="line"><span class="cl"><span class="gp">$</span> vim ~/.config/environment.d/wayland.conf </span></span></code></pre></div><p>In the file, I added one line with a comment (because you will soon forget why you added it ๐Ÿ˜„):</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># Fix big cursors in Java apps in Wayland</span> </span></span><span class="line"><span class="cl"><span class="nv">XCURSOR_SIZE</span><span class="o">=</span><span class="m">16</span> </span></span></code></pre></div><p><strong>After a reboot, I launched my Java application and boom &ndash; the cursors were perfect!</strong> ๐ŸŽ‰</p> <p>I went back and cleaned up some other hacks I had applied and added them to that <code>wayland.conf</code> file:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># This was important at some point but I&#39;m afraid to remove it.</span> </span></span><span class="line"><span class="cl"><span class="c1"># Note to self: make detailed comments when adding lines here.</span> </span></span><span class="line"><span class="cl"><span class="nv">SDL_VIDEODRIVER</span><span class="o">=</span>wayland </span></span><span class="line"><span class="cl"><span class="nv">QT_QPA_PLATFORM</span><span class="o">=</span>wayland </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Reduce window decorations for VLC</span> </span></span><span class="line"><span class="cl"><span class="nv">QT_WAYLAND_DISABLE_WINDOWDECORATION</span><span class="o">=</span><span class="s2">&#34;1&#34;</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Fix weird window handling when Java apps do certain pop-ups</span> </span></span><span class="line"><span class="cl"><span 
class="nv">_JAVA_AWT_WM_NONREPARENTING</span><span class="o">=</span><span class="m">1</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Ensure Firefox is using Wayland code (not needed any more)</span> </span></span><span class="line"><span class="cl"><span class="nv">MOZ_ENABLE_WAYLAND</span><span class="o">=</span><span class="m">1</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Disable HiDPI</span> </span></span><span class="line"><span class="cl"><span class="nv">GDK_SCALE</span><span class="o">=</span><span class="m">1</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Fix big cursors in Java apps in Wayland</span> </span></span><span class="line"><span class="cl"><span class="nv">XCURSOR_SIZE</span><span class="o">=</span><span class="m">16</span> </span></span></code></pre></div><p>I&rsquo;m told there are some caveats with this solution, especially if your Wayland desktop doesn&rsquo;t use systemd to start. This is working for me with GDM launching Sway on Fedora 40.</p>cloud-init and dhcpcdhttps://major.io/p/fedora-cloud-init-dhcpcd/Thu, 18 Apr 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/fedora-cloud-init-dhcpcd/<p>We&rsquo;re all familiar with the trusty old <code>dhclient</code> on our Linux systems, but <a href="https://github.com/isc-projects/dhcp" target="_blank" rel="noreferrer">it went end-of-life in 2022</a>:</p> <pre tabindex="0"><code>NOTE: This software is now End-Of-Life. 4.4.3 is the final release planned. We will continue to keep the public issue tracker and user mailing list open. You should read this file carefully before trying to install or use the ISC DHCP Distribution. </code></pre><p>Most Linux distributions use <code>dhclient</code> along with cloud-init for the initial dhcp request during the first part of cloud-init&rsquo;s work. I set off to switch Fedora&rsquo;s cloud-init package to <code>dhcpcd</code> instead.</p> <h2 id="whats-new-with-dhcpcd" class="relative group">What&rsquo;s new with dhcpcd? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#whats-new-with-dhcpcd" aria-label="Anchor">#</a></span></h2><p>There are some nice things about <code>dhcpcd</code> that you can find in the <a href="https://github.com/NetworkConfiguration/dhcpcd" target="_blank" rel="noreferrer">GitHub repository</a>:</p> <ul> <li>Very small footprint with almost no dependencies on Fedora</li> <li>It can do DHCP and DHCPv6</li> <li>It can also be a <a href="https://en.wikipedia.org/wiki/Zeroconf" target="_blank" rel="noreferrer">ZeroConf</a> client</li> </ul> <p>The project had its last release back in December 2023 and had commits as recently as this week.</p> <h2 id="but-i-use-networkmanager" class="relative group">But I use NetworkManager <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#but-i-use-networkmanager" aria-label="Anchor">#</a></span></h2><p>That&rsquo;s great! 
A switch from <code>dhclient</code> to <code>dhcpcd</code> for cloud-init won&rsquo;t affect you.</p> <p>When cloud-init starts, it does an initial dhcp request to get just enough networking to reach the cloud&rsquo;s metadata service. This service provides all kinds of information for cloud-init, including network setup instructions and initial scripts to run.</p> <p>NetworkManager doesn&rsquo;t start taking action until cloud-init has written the network configuration to the system.</p> <h2 id="but-i-use-systemd-networkd" class="relative group">But I use systemd-networkd <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#but-i-use-systemd-networkd" aria-label="Anchor">#</a></span></h2><p>Same as with NetworkManager, this change applies to the <em>very</em> early boot and you won&rsquo;t notice a difference when deploying new cloud systems.</p> <h2 id="how-can-i-get-it-right-now" class="relative group">How can I get it right now? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#how-can-i-get-it-right-now" aria-label="Anchor">#</a></span></h2><p>If you&rsquo;re using a recent build of Fedora rawhide (the unstable release under development), you likely have it right now on your cloud instance. Just run <code>journalctl --boot</code>, search for <code>dhcpcd</code>, and you should see these lines:</p> <pre tabindex="0"><code>cloud-init[725]: Cloud-init v. 24.1.4 running &#39;init-local&#39; at Wed, 17 Apr 2024 14:39:36 +0000. Up 6.13 seconds. dhcpcd[727]: dhcpcd-10.0.6 starting kernel: 8021q: 802.1Q VLAN Support v1.8 dhcpcd[730]: DUID 00:01:00:01:2d:b2:9b:a9:06:eb:18:e7:22:dd dhcpcd[730]: eth0: IAID 18:e7:22:dd dhcpcd[730]: eth0: soliciting a DHCP lease dhcpcd[730]: eth0: offered 172.31.26.195 from 172.31.16.1 dhcpcd[730]: eth0: leased 172.31.26.195 for 3600 seconds avahi-daemon[706]: Joining mDNS multicast group on interface eth0.IPv4 with address 172.31.26.195. avahi-daemon[706]: New relevant interface eth0.IPv4 for mDNS. avahi-daemon[706]: Registering new address record for 172.31.26.195 on eth0.IPv4. dhcpcd[730]: eth0: adding route to 172.31.16.0/20 dhcpcd[730]: eth0: adding default route via 172.31.16.1 dhcpcd[730]: control command: /usr/sbin/dhcpcd --dumplease --ipv4only eth0 </code></pre><p>There&rsquo;s also an <a href="https://bodhi.fedoraproject.org/updates/FEDORA-2024-51d7f6b005" target="_blank" rel="noreferrer">update pending for Fedora 40</a>, but it&rsquo;s currently held up by the beta freeze. That should appear as an update as soon as Fedora 40 is released.</p> <p>Keep in mind that if you have a system deployed already, cloud-init won&rsquo;t need to run again. Updating to Fedora 40 will update your cloud-init and pull in <code>dhcpcd</code>, but it won&rsquo;t need to run again since your configuration is already set.</p>Texas Linux Fest 2024 recap ๐Ÿค https://major.io/p/texas-linux-fest-2024-recap/Tue, 16 Apr 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/texas-linux-fest-2024-recap/<p>The 2024 <a href="https://2024.texaslinuxfest.org/" target="_blank" rel="noreferrer">Texas Linux Festival</a> just ended last weekend and it was a fun event as always. 
It&rsquo;s one of my favorite events to attend because it&rsquo;s really casual. You have plenty of opportunities to see old friends, meet new people, and learn a few things along the way.</p> <p>I was fortunate enough to have two talks accepted for this year&rsquo;s event. One was focused on containers while the other was a (very belated) addition to my <a href="https://major.io/p/impostor-syndrome-talk-faqs-and-follow-ups/">impostor syndrome talk</a> from 2015.</p> <p>This was also my first time building slides with <a href="https://github.com/webpro/reveal-md" target="_blank" rel="noreferrer">reveal-md</a>, a &ldquo;batteries included&rdquo; package for making <a href="https://revealjs.com/" target="_blank" rel="noreferrer">reveal.js</a> slides. Nothing broke too badly and that was a relief.</p> <h2 id="containers-talk" class="relative group">Containers talk <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#containers-talk" aria-label="Anchor">#</a></span></h2><p>I&rsquo;ve wanted to share more of what I&rsquo;ve done with CoreOS in low-budget container deployments and this seemed like a good time to share it with the world out loud. My talk, <a href="https://txlf24-containers.major.io/#/" target="_blank" rel="noreferrer">Automated container updates with GitHub and CoreOS</a>, walked the audience through how to deploy containers on CoreOS, keep them updated, and update the container image source.</p> <p>My goal was to keep it as low on budget as possible. Much of it was centered around a stack of <a href="https://major.io/p/caddy-porkbun/">caddy</a>, librespeed, and docker-compose. All of it was kept up to date with <a href="https://major.io/p/watchtower/">watchtower</a>.</p> <p>My custom Caddy container needed support for <a href="https://porkbun.com/" target="_blank" rel="noreferrer">Porkbun&rsquo;s</a> DNS API and I used GitHub Actions to build that container and serve it to the internet using GitHub&rsquo;s package hosting. <em>This also gave me the opportunity to share how awesome Porkbun is for registering domains, including their <a href="https://porkbun.com/tld/jobs" target="_blank" rel="noreferrer">customized pig artwork</a> for every TLD imaginable.</em> ๐Ÿท</p> <p>We had a great discussion afterwards about how CoreOS <strong>does indeed live on</strong> as <a href="https://fedoraproject.org/coreos/" target="_blank" rel="noreferrer">Fedora CoreOS</a>.</p> <h2 id="tech-career-talk" class="relative group">Tech career talk <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#tech-career-talk" aria-label="Anchor">#</a></span></h2><p>This talk made me nervous because it had a lot of slides to cover, but I also wanted to leave plenty of time for questions. <a href="https://txlf24-tech-career.major.io/#/" target="_blank" rel="noreferrer">Five tips for a thriving technology career</a> built upon my old impostor syndrome talk by sharing some of the things I&rsquo;ve learned over the years that helped me succeed in my career.</p> <p>I managed to end early with time for questions, and boy did the audience have questions! 
๐Ÿ“ฃ Some audience members helped me answer some questions, too!</p> <p>We talked a lot about office politics, tribal knowledge, and toxic workplaces. The audience generally agreed that most businesses tried to rub copious amounts of Confluence on their tribal knowledge problem, but it never improved. ๐Ÿ˜œ</p> <p>The room was full with people standing in the back and I&rsquo;m tremendously humbled by everyone who came. I received plenty of feedback afterwards and that&rsquo;s the best gift I could ever get. ๐ŸŽ</p> <h2 id="other-talks" class="relative group">Other talks <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#other-talks" aria-label="Anchor">#</a></span></h2><p><a href="https://github.com/anitazha" target="_blank" rel="noreferrer">Anita Zhang</a> had an excellent keynote talk on the second day about her unusual path into the world of technology. Her slides were pictures of her dog that lined up with various parts of her story. That was a great idea.</p> <p><a href="https://www.linkedin.com/in/kyle-davis-linux/?originalSubdomain=ca" target="_blank" rel="noreferrer">Kyle Davis</a> offered talks on <a href="https://github.com/valkey-io/valkey" target="_blank" rel="noreferrer">valkey</a> and <a href="https://github.com/bottlerocket-os/bottlerocket" target="_blank" rel="noreferrer">bottlerocket</a>. There was plenty about the redis and valkey story that I didn&rsquo;t know and the context was useful. It looks like you can simply drop valkey into most redis environments without much disruption.</p> <p><a href="https://www.linkedin.com/in/thomascameron/" target="_blank" rel="noreferrer">Thomas Cameron</a> talked about running OKD on Fedora CoreOS in his home lab. There were quite a few steps, but he did a great job of connecting the dots between what needed to be done and why.</p> <h2 id="around-the-exhibit-hall" class="relative group">Around the exhibit hall <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#around-the-exhibit-hall" aria-label="Anchor">#</a></span></h2><p>I helped staff the Fedora/CoreOS booth and we had plenty of questions. Most questions were around the M1 Macbook running <a href="https://asahilinux.org/" target="_blank" rel="noreferrer">Asahi Linux</a> that was on the table. ๐Ÿ˜‰</p> <p>There were still quite a few misconceptions around the CentOS Stream changes, as well as how AlmaLinux and Rocky Linux fit into the picture. Our booth was right next to the AlmaLinux booth and I had the opportunity to meet <a href="https://jonathanspw.com/about/" target="_blank" rel="noreferrer">Jonathan Wright</a>. That was awesome!</p> <p><strong>I can&rsquo;t wait for next year&rsquo;s event.</strong></p>Roll your own static blog analyticshttps://major.io/p/static-blog-analytics/Thu, 04 Apr 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/static-blog-analytics/<p>Static blogs come with tons of advantages. They&rsquo;re cheap to serve. You store all your changes in git. 
People with spotty internet connections can clone your blog and run it locally.</p> <p><strong>However, one of the challenges that I&rsquo;ve run into over the years is around analytics.</strong></p> <p>I could quickly add Google Analytics to the site and call it a day, but is that a good idea? Many browsers have ad blocking these days and the analytics wouldn&rsquo;t even run. For those that don&rsquo;t have an ad blocker, do I want to send more data about them to Google? ๐Ÿ™ƒ</p> <p>How about running my own self-hosted analytics platform? That&rsquo;s pretty easy with containers, but most ad blockers know about those, too.</p> <p>This post talks about how to host a static blog in a container behind a <a href="https://caddyserver.com/" target="_blank" rel="noreferrer">Caddy</a> web server. We will use <a href="https://goaccess.io/" target="_blank" rel="noreferrer">goaccess</a> to analyze the log files on the server itself to avoid dragging in an analytics platform.</p> <h2 id="why-do-you-need-analytics" class="relative group">Why do you need analytics? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#why-do-you-need-analytics" aria-label="Anchor">#</a></span></h2><p>Yes, yes, I know this comes from the guy who wrote a post about <a href="https://major.io/p/how-i-write-blog-posts/">writing for yourself</a>, but sometimes I like to know which posts are popular with other people. I also like to know if something&rsquo;s misconfigured and visitors are seeing 404 errors for pages which should be working.</p> <p>It can also be handy to know when someone else is <a href="https://major.io/p/puppy-linux-icanhazip-and-tin-foil-hats/">writing about you</a>, especially when those things are incorrect. ๐Ÿ˜‰</p> <p>So my goals here are these:</p> <ul> <li>Get some basic data on what&rsquo;s resonating with people and what isn&rsquo;t</li> <li>Find configuration errors that are leading visitors to error pages</li> <li>Learn more about who is linking to the site</li> <li>Do all this without impacting user privacy through heavy javascript trackers</li> </ul> <h2 id="what-are-the-ingredients" class="relative group">What are the ingredients? <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-are-the-ingredients" aria-label="Anchor">#</a></span></h2><p>There are three main pieces:</p> <ol> <li>Caddy, a small web server that runs really well in containers</li> <li>This blog, which is written with <a href="https://gohugo.io/" target="_blank" rel="noreferrer">Hugo</a> and <a href="https://github.com/major/major.io" target="_blank" rel="noreferrer">stored in GitHub</a></li> <li>Goaccess, a log analyzer with a capability to do live updates via websockets</li> </ol> <p>Caddy will write logs to a location that goaccess can read. In turn, goaccess will write log analysis to an HTML file that caddy can serve. The HTML file served by caddy will open a websocket to goaccess for live analytics.</p> <h2 id="a-static-blog-in-a-container" class="relative group">A static blog in a container? 
<span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#a-static-blog-in-a-container" aria-label="Anchor">#</a></span></h2><p>We can pack a static blog into a very thin container with an extremely lightweight web server. After all, caddy can handle automatic TLS certificate installation, logging, and caching. That just means we need the most basic webserver in the container itself.</p> <p>I was considering a second caddy container with the blog content in it until I stumbled upon a great post by Florin Lipan about <a href="https://lipanski.com/posts/smallest-docker-image-static-website" target="_blank" rel="noreferrer">The smallest Docker image to serve static websites</a>. He went down a rabbit hole to make the smallest possible web server container with busybox.</p> <p>His first stop led to a 1.25MB container, and that&rsquo;s tiny enough for me.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> ๐Ÿค</p> <p>I built a <a href="https://github.com/major/major.io/blob/main/.github/workflows/container.yml" target="_blank" rel="noreferrer">container workflow</a> in GitHub Actions that builds a container, puts the blog in it, and <a href="https://github.com/major/major.io/pkgs/container/major.io" target="_blank" rel="noreferrer">stores that container as a package</a> in the GitHub repository. It all starts with a brief Dockerfile:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-Dockerfile" data-lang="Dockerfile"><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="s">docker.io/library/busybox:1.36.1</span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">RUN</span> adduser -D static<span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">USER</span><span class="w"> </span><span class="s">static</span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">WORKDIR</span><span class="w"> </span><span class="s">/home/static</span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">COPY</span> ./public/ /home/static<span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">CMD</span> <span class="p">[</span><span class="s2">&#34;busybox&#34;</span><span class="p">,</span> <span class="s2">&#34;httpd&#34;</span><span class="p">,</span> <span class="s2">&#34;-f&#34;</span><span class="p">,</span> <span class="s2">&#34;-p&#34;</span><span class="p">,</span> <span class="s2">&#34;3000&#34;</span><span class="p">]</span><span class="err"> </span></span></span></code></pre></div><p>We start with busybox, add a user, put the website content into the user&rsquo;s home directory, and start busybox&rsquo;s <code>httpd</code> server. 
The container starts up and serves the static content on port 3000.</p> <h2 id="caddy-logs" class="relative group">Caddy logs <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#caddy-logs" aria-label="Anchor">#</a></span></h2><p>Caddy writes its logs in a JSON format and goaccess already knows how to parse caddy logs. Our first step is to get caddy writing some logs. In my case, I have a directory called <code>caddy/logs/</code> in my home directory where those logs are written.</p> <p>I&rsquo;ll mount the log storage into the caddy container and mount one extra directory to hold the HTML file that goaccess will write. Here&rsquo;s my <code>docker-compose.yaml</code> excerpt:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span><span class="nt">caddy</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">ghcr.io/major/caddy:main</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">caddy</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">ports</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;80:80/tcp&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;443:443/tcp&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;443:443/udp&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">unless-stopped</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">volumes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">./caddy/Caddyfile:/etc/caddy/Caddyfile:Z</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">caddy_data:/data</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">caddy_config:/config</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Caddy writes logs here ๐Ÿ‘‡</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">./caddy/logs:/logs:z</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># This is for goaccess to write its HTML file ๐Ÿ‘‡</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span 
class="l">./storage/goaccess_major_io:/var/www/goaccess_major_io:z</span><span class="w"> </span></span></span></code></pre></div><p>Now we need to update the <code>Caddyfile</code> to tell caddy where to place the logs and add a <code>reverse_proxy</code> configuration for our new container that serves the blog:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-Caddyfile" data-lang="Caddyfile"><span class="line"><span class="cl"><span class="gh">major.io</span> <span class="p">{</span><span class="c1"> </span></span></span><span class="line"><span class="cl"><span class="c1"> # We will set up this container in a moment ๐Ÿ‘‡ </span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="k">reverse_proxy</span> <span class="s">major_io:3000</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">lb_try_duration</span> <span class="mi">30s</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span><span class="c1"> </span></span></span><span class="line"><span class="cl"><span class="c1"> </span></span></span><span class="line"><span class="cl"><span class="c1"> # Tell Caddy to write logs to `/logs` which </span></span></span><span class="line"><span class="cl"><span class="c1"> # is `storage/logs` on the host: </span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="k">log</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">output</span> <span class="s">file</span> <span class="s">/logs/major.io-access.log</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">roll_size</span> <span class="s">1024mb</span> </span></span><span class="line"><span class="cl"> <span class="k">roll_keep</span> <span class="mi">20</span> </span></span><span class="line"><span class="cl"> <span class="k">roll_keep_for</span> <span class="mi">720h</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div><p>Great! 
We now have the configuration in place for caddy to write the logs and the caddy container can mount the log and analytics storage.</p> <h2 id="enabling-analytics" class="relative group">Enabling analytics <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#enabling-analytics" aria-label="Anchor">#</a></span></h2><p>We&rsquo;re heading back to the <code>docker-compose.yml</code> file once more, this time to set up a goaccess container:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span><span class="nt">goaccess_major_io</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">docker.io/allinurl/goaccess:latest</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">goaccess_major_io</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">always</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">volumes</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Mount caddy&#39;s log files ๐Ÿ‘‡</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;./caddy/logs:/var/log/caddy:z&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># Mount the directory where goaccess writes the analytics HTML ๐Ÿ‘‡</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="s2">&#34;./storage/goaccess_major_io:/var/www/goaccess:rw&#34;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;/var/log/caddy/major.io-access.log --log-format=CADDY -o /var/www/goaccess/index.html --real-time-html --ws-url=wss://stats.major.io:443/ws --port=7890 --anonymize-ip --ignore-crawlers --real-os&#34;</span><span class="w"> </span></span></span></code></pre></div><p>This gets us a goaccess container to parse the logs from caddy. 
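</p> <p>If goaccess happens to be installed on the host, a one-off run against the same log file is a handy way to confirm the <code>CADDY</code> log format parses cleanly before dealing with the live websocket. This is just a sketch (with an arbitrary output path) and not part of the compose stack:</p> <pre tabindex="0"><code># Generate a static report once, with no websocket involved
goaccess ./caddy/logs/major.io-access.log \
  --log-format=CADDY \
  -o /tmp/goaccess-test.html \
  --anonymize-ip --ignore-crawlers
</code></pre><p>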
We need to update the caddy configuration so that we can reach the goaccess websocket for live updates:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-Caddyfile" data-lang="Caddyfile"><span class="line"><span class="cl"><span class="gh">stats.major.io</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">root</span> <span class="nd">*</span> <span class="s">/var/www/goaccess_major_io</span> </span></span><span class="line"><span class="cl"> <span class="k">file_server</span> </span></span><span class="line"><span class="cl"> <span class="k">reverse_proxy</span> <span class="nd">/ws</span> <span class="s">goaccess_major_io:7890</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div><p>At this point, we have caddy writing logs in the right place, goaccess can read them, and the analytics output is written to a place where caddy can serve it. We&rsquo;ve also exposed the websocket from goaccess for live updates.</p> <h2 id="serving-the-blog" class="relative group">Serving the blog <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#serving-the-blog" aria-label="Anchor">#</a></span></h2><p>We&rsquo;ve reached the most important part!</p> <p>We added the caddy configuration to reach the blog container earlier, but now it&rsquo;s time to deploy the container itself. As a reminder, this is the container with busybox and the blog content that comes from GitHub Actions.</p> <p>The <code>docker-compose.yml</code> configuration here is <em>very basic</em>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span><span class="nt">major_io</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">ghcr.io/major/major.io:main</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">major_io</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">always</span><span class="w"> </span></span></span></code></pre></div><p>Caddy will connect to this container on port 3000 to serve the blog. (We set port 3000 in the original <code>Dockerfile</code>).</p> <p>At this point, everything should be set to go. Make it live with:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-console" data-lang="console"><span class="line"><span class="cl"><span class="go">docker-compose up -d </span></span></span></code></pre></div><p>This should bring up the goaccess and blog containers while also restarting caddy. The website should be visible now at <a href="https://major.io/" target="_blank" rel="noreferrer">major.io</a> (and that&rsquo;s how you&rsquo;re reading this today).</p> <h2 id="what-about-new-posts" class="relative group">What about new posts? 
<span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#what-about-new-posts" aria-label="Anchor">#</a></span></h2><p>I&rsquo;m glad you asked! That was something I wondered about as well. <strong>How do we get the new blog content down to the container when a new post is written?</strong> ๐Ÿค”</p> <p>As I&rsquo;ve <a href="https://major.io/p/watchtower/">written in the past</a>, I like using <a href="https://containrrr.dev/watchtower/" target="_blank" rel="noreferrer">watchtower</a> to keep containers updated. Watchtower offers an HTTP API interface for webhooks to initiate container updates. We can trigger that update via a simple curl request from GitHub Actions when our container pipeline runs.</p> <p>My <a href="https://github.com/major/major.io/blob/main/.github/workflows/container.yml" target="_blank" rel="noreferrer">container workflow</a> has a brief bit at the end that does this:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="w"> </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Update the blog container</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">github.event_name != &#39;pull_request&#39;</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd"> </span></span></span><span class="line"><span class="cl"><span class="sd"> curl -s -H &#34;Authorization: Bearer ${WATCHTOWER_TOKEN}&#34; \ </span></span></span><span class="line"><span class="cl"><span class="sd"> https://watchtower.thetanerd.com/v1/update</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">env</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">WATCHTOWER_TOKEN</span><span class="p">:</span><span class="w"> </span><span class="l">${{ secrets.WATCHTOWER_TOKEN }}</span><span class="w"> </span></span></span></code></pre></div><p>You can enable this in watchtower with a few new environment variables in your <code>docker-compose.yml</code>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-YAML" data-lang="YAML"><span class="line"><span class="cl"><span class="w"> </span><span class="nt">watchtower</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="c"># New environment variables ๐Ÿ‘‡</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">WATCHTOWER_HTTP_API_UPDATE=true</span><span class="w"> </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">WATCHTOWER_HTTP_API_TOKEN=SUPER-SECRET-TOKEN-PASSWORD</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">WATCHTOWER_HTTP_API_PERIODIC_POLLS=true</span><span class="w"> </span></span></span></code></pre></div><p><code>WATCHTOWER_HTTP_API_UPDATE</code> enables the updating via API and <code>WATCHTOWER_HTTP_API_TOKEN</code> sets the token required when making the API request. If you set <code>WATCHTOWER_HTTP_API_PERIODIC_POLLS</code> to <code>true</code>, watchtower will still periodically look for updates to containers even if an API request never appeared. By default, watchtower will stop doing periodic updates if you enable the API.</p> <p>This is working on my site right now and you can view my public blog stats on <a href="https://stats.major.io" target="_blank" rel="noreferrer">stats.major.io</a>. ๐ŸŽ‰</p> <div class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1"> <p>Florin went all the way down to 154KB and I was extremely impressed. However, I&rsquo;m not too worried about an extra megabyte here. ๐Ÿ˜‰&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </div>Connect Caddy to Porkbunhttps://major.io/p/caddy-porkbun/Thu, 29 Feb 2024 00:00:00 +0000major@mhtx.net (Major Hayden)https://major.io/p/caddy-porkbun/<p>I recently told a coworker about <a href="https://caddyserver.com/" target="_blank" rel="noreferrer">Caddy</a>, a small web and proxy server with a very simple configuration. It also has a handy feature where it manages your TLS certificate for you automatically.</p> <p>However, one problem I had at home with my <a href="https://fedoraproject.org/coreos/" target="_blank" rel="noreferrer">CoreOS</a> deployment is that I don&rsquo;t have inbound network access to handle the certificate verification process. Most automated certificate vendors need to reach your web server to verify that you have control over your domain.</p> <p>This post talks about how to work around this problem with domains registered at <a href="https://porkbun.com/" target="_blank" rel="noreferrer">Porkbun</a>.</p> <h2 id="dns-validation" class="relative group">DNS validation <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#dns-validation" aria-label="Anchor">#</a></span></h2><p>Certificate providers usually default to verifying domains by making a request to your server and retrieving a validation code. 
If your systems are all behind a firewall without inbound access from the internet, you can use DNS validation instead.</p> <p>The process looks something like this:</p> <ol> <li>You tell the certificate provider the domain names you want on your certificate</li> <li>The certificate provider gives you some DNS records to add wherever you host your DNS records</li> <li>You add the DNS records</li> <li>You get your certificates once the certificate provider verifies the records.</li> </ol> <p>You can do this manually with something like <a href="https://github.com/acmesh-official/acme.sh" target="_blank" rel="noreferrer">acme.sh</a> today, but it&rsquo;s <strong>painful</strong>:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl"><span class="c1"># Make the initial certificate request</span> </span></span><span class="line"><span class="cl">acme.sh --issue --dns -d example.com <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> --yes-I-know-dns-manual-mode-enough-go-ahead-please </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Add your DNS records manually.</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Verify the DNS records and issue the certificates.</span> </span></span><span class="line"><span class="cl">acme.sh --issue --dns -d example.com <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> --yes-I-know-dns-manual-mode-enough-go-ahead-please --renew </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="c1"># Copy the keys/certificates and configure your webserver.</span> </span></span></code></pre></div><p>We don&rsquo;t want to live this way.</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="581" height="355" class="mx-auto my-0 rounded-md" alt="do-not-want.gif" loading="lazy" decoding="async" src="https://major.io/p/caddy-porkbun/do-not-want.gif" /> </picture> </figure> </p> <p>Let&rsquo;s talk about how Caddy can help.</p> <h2 id="adding-porkbun-support-to-caddy" class="relative group">Adding Porkbun support to Caddy <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#adding-porkbun-support-to-caddy" aria-label="Anchor">#</a></span></h2><p>Caddy is a minimal webserver and <a href="https://github.com/caddy-dns/porkbun" target="_blank" rel="noreferrer">Porkbun support</a> doesn&rsquo;t get included by default. 
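</p> <p>You can check which DNS provider modules a particular <code>caddy</code> binary was compiled with &ndash; a stock build won&rsquo;t show a Porkbun entry here:</p> <pre tabindex="0"><code># List compiled-in modules and filter for DNS providers
caddy list-modules | grep dns.providers
</code></pre><p>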
However, we can quickly add it via a simple container build:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-Dockerfile" data-lang="Dockerfile"><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="s">caddy:2.7.6-builder</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s">builder</span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">RUN</span> xcaddy build <span class="se">\ </span></span></span><span class="line"><span class="cl"><span class="se"></span> --with github.com/caddy-dns/porkbun<span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">FROM</span><span class="w"> </span><span class="s">caddy:2.7.6</span><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"> </span></span></span><span class="line"><span class="cl"><span class="err"></span><span class="k">COPY</span> --from<span class="o">=</span>builder /usr/bin/caddy /usr/bin/caddy<span class="err"> </span></span></span></code></pre></div><p>This is a two stage container build where we compile the Porkbun support and then use that new <code>caddy</code> binary in the final container.</p> <p>We&rsquo;re not done yet!</p> <h2 id="automated-caddy-builds-with-updates" class="relative group">Automated Caddy builds with updates <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#automated-caddy-builds-with-updates" aria-label="Anchor">#</a></span></h2><p>I created a <a href="https://github.com/major/caddy" target="_blank" rel="noreferrer">GitHub repository</a> that builds the Caddy container for me and keeps it updated. There&rsquo;s a <a href="https://github.com/major/caddy/blob/main/.github/workflows/docker-publish.yml" target="_blank" rel="noreferrer">workflow to publish a container</a> to GitHub&rsquo;s container repository and I can pull containers from there on my various CoreOS machines.</p> <p>In addition, I use <a href="https://github.com/apps/renovate" target="_blank" rel="noreferrer">Renovate</a> to watch for Caddy updates. 
New updates come through a <a href="https://github.com/major/caddy/pull/10" target="_blank" rel="noreferrer">regular pull request</a> and I can apply them whenever I want.</p> <p> <figure> <picture class="mx-auto my-0 rounded-md" > <img width="918" height="461" class="mx-auto my-0 rounded-md" alt="Renovate pull request" loading="lazy" decoding="async" src="https://major.io/p/caddy-porkbun/pr_hu_86416cea2f9ab5b1.png" srcset="https://major.io/p/caddy-porkbun/pr_hu_3ca6ce3fc7071c7e.png 330w,https://major.io/p/caddy-porkbun/pr_hu_86416cea2f9ab5b1.png 660w ,https://major.io/p/caddy-porkbun/pr.png 918w ,https://major.io/p/caddy-porkbun/pr.png 918w " sizes="100vw" /> </picture> <figcaption class="text-center">Example pull request from Renovate</figcaption> </figure> </p> <h2 id="connecting-to-porkbun" class="relative group">Connecting to Porkbun <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#connecting-to-porkbun" aria-label="Anchor">#</a></span></h2><p>We start here by getting an API key to manage the domain at Porkbun.</p> <ol> <li>Log into your <a href="https://porkbun.com/account/domainsSpeedy" target="_blank" rel="noreferrer">Porkbun dashboard</a>.</li> <li>Click <strong>Details</strong> to the right of the domain you want to manage.</li> <li>Look for <strong>API Access</strong> in the leftmost column and turn it on.</li> <li>At the top right of the dashboard, click <strong>Account</strong> and then <strong>API Access</strong>.</li> <li>Add a title for your new API key, such as <em>Caddy</em>, and click <strong>Create API Key</strong>.</li> <li>Save the API key and secret key that are displayed.</li> </ol> <p>Open up your Caddy configuration file (the <em>Caddyfile</em>) and add some configuration:</p> <div class="highlight"><pre tabindex="0" class="chroma"><code class="language-caddyfile" data-lang="caddyfile"><span class="line"><span class="cl"><span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">email</span> <span class="s">me@example.com</span><span class="c1"> </span></span></span><span class="line"><span class="cl"><span class="c1"> </span></span></span><span class="line"><span class="cl"><span class="c1"> # Uncomment this next line if you want to get </span></span></span><span class="line"><span class="cl"><span class="c1"> # some test certificates first. 
</span></span></span><span class="line"><span class="cl"><span class="c1"> # acme_ca https://acme-staging-v02.api.letsencrypt.org/directory </span></span></span><span class="line"><span class="cl"><span class="c1"></span> </span></span><span class="line"><span class="cl"> <span class="k">acme_dns</span> <span class="s">porkbun</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">api_key</span> <span class="s">pk1_******</span> </span></span><span class="line"><span class="cl"> <span class="k">api_secret_key</span> <span class="s">sk1_******</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span><span class="line"><span class="cl"> </span></span><span class="line"><span class="cl"><span class="gh">example.com</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">handle</span> <span class="p">{</span> </span></span><span class="line"><span class="cl"> <span class="k">respond</span> <span class="s2">&#34;Hello world!&#34;</span> </span></span><span class="line"><span class="cl"> <span class="p">}</span> </span></span><span class="line"><span class="cl"><span class="p">}</span> </span></span></code></pre></div><p>Save the Caddyfile and restart your Caddy server or container. Caddy will immediately begin requesting your TLS certificates and managing your DNS records for those certificates. This normally finishes in less than 30 seconds or so during the first run.</p> <p>If you don&rsquo;t see the HTTPS endpoint working within a minute or two, be sure to check the Caddy logs. You might have a typo in a Porkbun API key or the domain you&rsquo;re trying to modify doesn&rsquo;t have the <strong>API Access</strong> switch enabled.</p> <div class="flex rounded-md bg-primary-100 px-4 py-3 dark:bg-primary-900"> <span class="pe-3 text-primary-400"> <span class="icon relative inline-block px-1 align-text-bottom"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M256 0C114.6 0 0 114.6 0 256s114.6 256 256 256s256-114.6 256-256S397.4 0 256 0zM256 128c17.67 0 32 14.33 32 32c0 17.67-14.33 32-32 32S224 177.7 224 160C224 142.3 238.3 128 256 128zM296 384h-80C202.8 384 192 373.3 192 360s10.75-24 24-24h16v-64H224c-13.25 0-24-10.75-24-24S210.8 224 224 224h32c13.25 0 24 10.75 24 24v88h16c13.25 0 24 10.75 24 24S309.3 384 296 384z"/></svg> </span> </span> <span class="dark:text-neutral-300">Remember that Porkbun requires you to enable API access for each domain. API access is disabled at Porkbun by default.</span> </div> <p><strong>That&rsquo;s it!</strong> ๐ŸŽ‰</p> <h2 id="renewals" class="relative group">Renewals <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#renewals" aria-label="Anchor">#</a></span></h2><p>Caddy will keep watch over the certificates and begin the renewal process as the expiration approaches. 
Caddy has a very careful retry mechanism that ensures your certificates are updated without tripping any rate limits at the certificate provider.</p> <h2 id="further-reading" class="relative group">Further reading <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#further-reading" aria-label="Anchor">#</a></span></h2><p>Caddy&rsquo;s detailed documentation about <a href="https://caddyserver.com/docs/automatic-https" target="_blank" rel="noreferrer">Automatic HTTPS</a> and the <a href="https://caddyserver.com/docs/caddyfile/directives/tls" target="_blank" rel="noreferrer">tls configuration directive</a> should answer most questions about how the process works.</p>