A week full of interviews - I had one every day this week. I have another slate of interviews for the coming week.

Fall starts today (9/22), and the fascinating thing about the weather can be seen in the weather app. Yesterday, 9/21, the high temperature was 88F; today it is 77F. During my walk I saw that so many leaves have already fallen, and the trees have already taken on a two-color look.

Rereading “Striking Thoughts” by Bruce Lee.

Programming

I am fond of the PostScript programming language. In response to a question by X/shrihacker, I suggested that PostScript is a good option for generating “good looking” documents, especially since the “esoteric” aspects of PostScript can be smoothed over with the help of AI assistants like Claude. See this example.
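To give a flavor of what generating a document with PostScript involves, here is a minimal sketch in Python that emits a one-page PostScript program. It is not the example linked above; the function names (`ps_escape`, `make_postscript`) and the layout defaults (72pt left margin, 18pt leading) are my own assumptions. It only uses basic operators: `findfont`, `scalefont`, `setfont`, `moveto`, `show`, and `showpage`.

```python
def ps_escape(text: str) -> str:
    """Escape the characters that are special inside a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_postscript(lines, font="Helvetica", size=14, left=72, top=720, leading=18):
    """Return a one-page PostScript program that prints `lines` top to bottom.

    Coordinates are in points (1/72 inch) with the origin at the bottom-left
    of the page, so each new line moves *down* by subtracting `leading`.
    """
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = top
    for line in lines:
        out.append(f"{left} {y} moveto ({ps_escape(line)}) show")
        y -= leading
    out.append("showpage")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical content; pipe the output to a .ps file to view or print it.
    print(make_postscript(["Weekly notes", "Fall starts today (9/22)"]), end="")
```

Saving the output to a `.ps` file and opening it with a PostScript viewer (or converting it to PDF) should show the two lines near the top-left of the page. The escaping step matters because unbalanced parentheses inside a string would otherwise break the program.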

Why Scrum is Stressing You Out - by Adam Ard

  1. Sprints never stop
  2. Sprints are involuntary
  3. Sprints neglect key supporting activities
  4. Scrumfall is worse than waterfall because it creates chronic tension

Nile: Started using TheNile.DEV, a new player in the hosted PostgreSQL space. “Nile is a Postgres platform to ship multi-tenant AI applications - fast, safe, and limitless” (X/sriramsubram). Announcement blog post: “It is time for Postgres to care about customers!” Interesting choice of pricing: “unlimited” databases, but queries are charged (the first 50M query “tokens” are in the free tier). Also supports branching, vector embeddings, multiple tenants, etc.

Jeff Dean Facts 😆 via Debunking the ‘Three Pillars of Observability’ Myth via Is It Time To Version Observability? (Signs Point To Yes) – charity.wtf

Observability

Saving $10k/month on Analytics - Snowplow Serverless Alternative (Agon Data); see the Observability-Ecosystem page with notes on Buz etc. This blog post mentions Benthos (now part of Redpanda) and uses SST for automated deployments.

Is It Time To Version Observability? (Signs Point To Yes) – charity.wtf

All you need is Wide Events, not “Metrics, Logs and Traces”

The above talks about an o11y 2.0-type tool at Meta called Scuba (Abraham et al., 2013).

Scuba is the data management system Facebook uses for most real-time analysis. Scuba is a fast, scalable, distributed, in-memory database built at Facebook. It currently ingests millions of rows (events) per second and expires data at the same rate.

The basic idea of Scuba is extremely simple and doesn’t require a glossary page for people to grasp.

Such events called wide, because it’s encouraged to dump to them all the information one can think of. Anything that might be relevant in the context of a certain data - just put it there, it might be useful later. This approach is laying the groundwork for dealing with unknown unknowns - something you can’t think of now that may be revealed later during an incident investigation.

An OpenTelemetry span is a wide event.

My understanding from reading the above is that “Wide Events” are just a “denormalized” form of event data. See also axiom.co. Some comments about Scuba at HN (2017).
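To make the “denormalized” reading concrete, here is a small sketch in Python. The field names and values are made up for illustration, and `group_stats` is a hypothetical helper, not how Scuba or any of the tools above work internally. The point is only that when every request is stored as one flat record with all of its context, you can slice by any field after the fact without having planned that query in advance.

```python
from collections import defaultdict

# One wide event per request: everything that might matter lives on the same record.
events = [
    {"service": "checkout", "endpoint": "/pay", "status": 200, "duration_ms": 120,
     "user_tier": "free", "region": "us-east", "build": "a1f3"},
    {"service": "checkout", "endpoint": "/pay", "status": 500, "duration_ms": 900,
     "user_tier": "paid", "region": "eu-west", "build": "b2c4"},
    {"service": "checkout", "endpoint": "/pay", "status": 500, "duration_ms": 1100,
     "user_tier": "paid", "region": "eu-west", "build": "b2c4"},
    {"service": "search", "endpoint": "/q", "status": 200, "duration_ms": 40,
     "user_tier": "free", "region": "us-east", "build": "a1f3"},
]


def group_stats(events, by, where=lambda e: True):
    """Group matching events by any field; return count and mean duration per group."""
    buckets = defaultdict(list)
    for e in events:
        if where(e):
            buckets[e.get(by)].append(e["duration_ms"])
    return {k: {"count": len(v), "avg_ms": sum(v) / len(v)} for k, v in buckets.items()}


if __name__ == "__main__":
    # "Unknown unknowns": nobody predefined a metric for errors per region or per build.
    print(group_stats(events, by="region", where=lambda e: e["status"] >= 500))
    print(group_stats(events, by="build"))
```

With separate metrics, logs, and traces, the same question would need a join across systems; here it is a filter and a group-by over one table.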

Kraken (PDF) is the successor to Scuba (Harizopoulos et al., 2022).

The developers behind Scuba went on to start Interana (Lior Abraham), which developed Scuba.io, a commercial offering, while Okay Zed developed Snorkel. Snorkel uses a custom append-only database called Sybil; see the difference between Scuba and Snorkel. It is likely not very current, as development seems to have stopped around 2019.

LLM

Observability

LLMs break due to hallucinations, irrelevant data, infinite chain-of-thought loops, biases, inconsistency, context limitations, lack of common sense, prompt sensitivity, and confidently wrong answers.

  • Langfuse
    • Traces and spans for each step in your LLM pipeline
    • Managing prompt versions. Can be used with separate environments like dev, prod, etc.
    • Evals: Scoring your LLM and/or RAG pipelines for correctness
  • Traceloop - LLM Application Observability
    • Uses OpenTelemetry, which lets you reuse existing o11y infra like Grafana, Dynatrace, etc.
    • Works with user feedback for LLM evals
    • Support for models, frameworks (Langchain etc.) and vector DBs
  • Arize Phoenix
    • OpenTelemetry support
    • Support for frameworks and SDKs
    • Evals: Evals are hard for LLMs. Phoenix addresses this by providing eval templates and running them at the span/trace level.
    • Experiments and datasets: Change the input prompt as well as the dataset to check for LLM regressions or issues.
    • Arize Copilot: Ask it questions regarding your LLM events
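
What these tools have in common (traces and spans per pipeline step, prompt versions as attributes, eval scores attached to a trace) can be sketched with nothing but the Python standard library. This is not the Langfuse, Traceloop, or Phoenix API; `Tracer`, `exact_match`, `fake_llm`, and `answer` are hypothetical names, and the LLM call is a stub that always returns the same string.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Collects spans as plain dicts; nesting is tracked with a stack of open span ids."""

    def __init__(self):
        self.spans = []    # finished spans, in completion order
        self._stack = []   # ids of spans that are currently open

    @contextmanager
    def span(self, name, **attrs):
        span = {
            "id": uuid.uuid4().hex,
            "name": name,
            "parent_id": self._stack[-1] if self._stack else None,
            "attrs": dict(attrs),
        }
        self._stack.append(span["id"])
        start = time.perf_counter()
        try:
            yield span
        except Exception as exc:
            span["error"] = repr(exc)  # record the failure, then let it propagate
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(span)


def exact_match(expected, actual):
    """A deliberately simple eval: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def fake_llm(prompt):
    """Stand-in for a real model call (assumption: always answers 'Paris')."""
    return "Paris"


def answer(tracer, question, expected="paris"):
    with tracer.span("pipeline", question=question) as root:
        with tracer.span("retrieve") as s:
            docs = ["Paris is the capital of France."]
            s["attrs"]["num_docs"] = len(docs)
        with tracer.span("generate", prompt_version="v1") as s:
            reply = fake_llm(question + "\n" + docs[0])
            s["attrs"]["output"] = reply
        root["attrs"]["score"] = exact_match(expected, reply)  # eval at trace level
    return reply


if __name__ == "__main__":
    tracer = Tracer()
    answer(tracer, "What is the capital of France?")
    for s in tracer.spans:
        print(s["name"], s["parent_id"], s["attrs"])
```

Each span here is effectively a wide event: one record per step carrying its timing, parent, prompt version, and score. The hosted tools add storage, UIs, and much richer evals on top of that same shape.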

From around the web

Abraham, L., Allen, J., Barykin, O., Borkar, V., Chopra, B., Gerea, C., Merl, D., Metzler, J., Reiss, D., Subramanian, S., Wiener, J. L., & Zed, O. (2013). Scuba: Diving into Data at Facebook. Proceedings of the VLDB Endowment, 6(11), 1057–1067. https://doi.org/10.14778/2536222.2536231
Harizopoulos, S., Hopper, T., Mo, M., Chandrasekaran, S. S., Chen, T., Cui, Y., Ganesh, N., Helmling, G., Pham, H., & Wong, S. (2022). Meta’s Next-Generation Realtime Monitoring and Analytics Platform. Proceedings of the VLDB Endowment, 15(12), 3522–3534. https://www.vldb.org/pvldb/vol15/p3522-mo.pdf