Migrating from WordPress to Hugo

For more than 10 years, this website has been running on WordPress. Over time, I have come to dislike it for multiple reasons.

LLMs: great for business but bad business

The true value proposition of LLMs lies in their ability to convert unstructured data from sources like websites and documents into structured information with reasonably high accuracy. Yet, the real profit lies in the products built on top of LLM technology. Each year, approximately 4 million books are published worldwide. On average, a book contains fewer than 120,000 words, translating to less than 160,000 tokens in LLM (Large Language Model) terms. Imagine if every single one of these books were generated by GPT-4: it would amount to an astounding 640 billion tokens. At $5 per million tokens, generating all these books would tally up to about $3.2 million! Let's say the book market represents only about 1% of the total LLM text generation opportunity. Even then, the total addressable market of LLM text generation is approximately $300 million annually, a modest figure when compared to AWS, which raked in $90 billion in 2023 as the cloud market leader.
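The estimate above can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, and the inputs are the rough figures quoted in the paragraph, not measured data.

```python
# Back-of-the-envelope market estimate using the figures quoted in the text.
# All names here are illustrative; the inputs are rough assumptions.
BOOKS_PER_YEAR = 4_000_000          # books published worldwide per year
TOKENS_PER_BOOK = 160_000           # upper bound on tokens per book
PRICE_PER_MILLION_TOKENS_USD = 5    # assumed price per million tokens
BOOK_SHARE_PERCENT = 1              # books as a share of all LLM text generation


def estimate_market():
    """Return (total tokens, cost of all books in USD, implied total market in USD)."""
    total_tokens = BOOKS_PER_YEAR * TOKENS_PER_BOOK
    book_cost_usd = total_tokens // 1_000_000 * PRICE_PER_MILLION_TOKENS_USD
    total_market_usd = book_cost_usd * 100 // BOOK_SHARE_PERCENT
    return total_tokens, book_cost_usd, total_market_usd


if __name__ == "__main__":
    tokens, book_cost, market = estimate_market()
    print(f"Tokens for all books: {tokens:,}")
    print(f"Cost to generate all books: ${book_cost:,}")
    print(f"Implied total market: ${market:,}")
```

With these inputs the implied total comes out at $320 million, which the text rounds to approximately $300 million.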

When to commit generated code to version control

Ideally, generated code should not be committed to version control. Committing generated code can sometimes speed up testing and code generation, but it is a design smell. It is better to cache generated code via CI caching. Committing generated code to version control is the worst option, as it is hard to even detect what changed. However, there are a few specific circumstances where committing generated code/config/data to version control is worth it....

Abstractions should be deep not wide

Let's say you are building a git analytics product. Your product supports GitHub and GitLab for now. It might support more products in the future. 90% of the codebase that supports GitHub and GitLab is identical. The remaining 10% is specific to either GitHub or GitLab. There are two ways to build software abstractions here. The easy trap to fall into is to have unified objects that take care of both GitHub and GitLab data....

Some data on podcasting

A few years back, I scraped data on podcasters from iTunes. The data was a bit underwhelming and made me realize that podcasters can't be a potential market. The data is a bit dated, but I believe it is still relevant.

API services should always have usage limits

Every public-facing API service should have API usage limits. If this seems like overkill, ask yourself whether it would be OK if a single IP sent a million requests a second. This applies not just to publicly documented services but also to undocumented services that are publicly accessible.
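One common way to enforce such a limit is a per-client token bucket. The sketch below is a minimal in-memory illustration, not a production design: the class name, the parameters, and the choice to key on the client IP are all assumptions made for this example.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket: each client may burst up to
    `burst` requests and is refilled at `rate_per_second` tokens per second."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second
        self.burst = burst
        self.clock = clock      # injectable so the limiter can be tested
        self._buckets = {}      # client_ip -> (tokens, last_refill_time)

    def allow(self, client_ip):
        """Return True if the request is allowed, False if it should be rejected."""
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(client_ip, (self.burst, now))
        # Refill in proportion to the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[client_ip] = (tokens - 1, now)
            return True
        self._buckets[client_ip] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate_per_second=5, burst=10)
    results = [limiter.allow("203.0.113.7") for _ in range(12)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

A rejected request would typically be answered with HTTP 429 (Too Many Requests). Note that this sketch keeps one entry per client forever, so a real service would also need to evict idle entries or keep the counters in a shared store.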

Using Python & Poetry inside Docker

Poetry is a great build system, and in 2023, I believe no one should use pip for a private Python codebase. Getting it right inside Docker is a different matter, however. Consider a simple Flask-based web server as an example:

    # Install poetry
    $ pip3 install poetry==1....

Timing

Two cryptocurrency exchanges came out of Y Combinator early on. One in 2012. One in 2013. One returned 1500X to early investors. The other ceased to exist after 2 years. What happened?

Real vs Theoretical Engineering Productivity

Some engineering productivity is real. Some is theoretical.

Too much documentation is harmful

As code changes, documentation becomes stale over time. This happens at big companies. This happens at small companies. Unlike code, documentation is not compiled or tested. The code is executed. If the code execution fails or produces incorrect results, it is fixed with much higher urgency.