Datadog and distributed tracing in Java
Any service on the path to production readiness, regardless of tech stack, has to check the boxes around observability and instrumentation. If you can't see into your application and understand how it operates under load, you can't set standards around SLAs (service level agreements) or SLOs (service level objectives), or even identify SLIs (service level indicators).
Tracing is just one facet of observability and instrumentation, but for the effort invested I feel it has one of the highest returns. Datadog's offering in the observability space is outstanding - easy to wire together, with mostly comprehensive guides and strong support for their chosen languages and stacks. That said, during a recently completed implementation of distributed tracing across all of our microservices, we ran into a few hiccups with Datadog and OpenTracing that I think are worth calling out.
For all of our services that currently integrate with Datadog, we run the Datadog Java agent alongside each application. Going just that far gets you basic APM and disjointed tracing, where each service only displays trace data for its own portion of the full call stack for a given request. When you start updating everything for true distributed tracing (and the real value that it brings), pay close attention to the Datadog Java APM guide.
In the "Manual Instrumentation" section, they highlight how to get more detailed and complex tracing through the use of OpenTracing libraries. That's all well and good until you see that there really are no concrete implementations for anything in the opentracing-api library. When you see that, your next step is probably going to be reading the Datadog APM guide previously mentioned, and seeing that they import a dd-trace-ot library. Here's where you might get caught up - you can't actually use anything in that library alongside the Java agent.
The Java agent shadows all of the dd-trace-ot packages and classes, so the DDSpanContext your code compiles against is not the same class as the one the agent hands you at runtime. When you retrieve your SpanContext and try to cast it to a DDSpanContext to get at all that detailed information, you're going to end up with a very disappointing ClassCastException. Trust me, it's frustrating.
How to get around the problem
After re-reading the Datadog Java APM guide for the nth time, you may see a one-liner comment in the code sample for the library dependencies in the sample pom.xml. The line reads:
<!-- Datadog Tracer (only needed if you do not use dd-java-agent) -->
So how do you get that wonderful, detailed Datadog tracing implementation if you can't use the dd-trace-ot library and you can't access any of the classes that are rolled into the agent? Well, you just rely on two OpenTracing-standard methods - inject and extract.
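On the build side, that means a service running under the agent can leave dd-trace-ot out entirely and depend only on the OpenTracing artifacts. Here's a rough sketch of what the dependency section might look like, assuming Maven. The opentracing.version property is a placeholder of mine, not something from the Datadog guide, so set it to whatever OpenTracing version your agent release supports.

```xml
<!-- OpenTracing interfaces: Tracer, Span, SpanContext, Format, ... -->
<dependency>
    <groupId>io.opentracing</groupId>
    <artifactId>opentracing-api</artifactId>
    <!-- hypothetical property; define it in <properties> -->
    <version>${opentracing.version}</version>
</dependency>

<!-- OpenTracing utilities, including GlobalTracer -->
<dependency>
    <groupId>io.opentracing</groupId>
    <artifactId>opentracing-util</artifactId>
    <version>${opentracing.version}</version>
</dependency>

<!-- dd-trace-ot intentionally omitted: the Java agent supplies the tracer at runtime -->
```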
tracer.inject() takes a SpanContext (usually the active span's), a format, and a carrier, and writes the context into that carrier, which you can then send along however you please. tracer.extract() takes the carrier on the receiving side and re-hydrates the trace context so the tracer can use it, typically as the parent of a new span. If you stick to nothing but classes in the io.opentracing package, the Datadog agent will take care of the rest and give you everything you need for a beautiful flame graph of a distributed trace in your service's Datadog dashboard.
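To see why those two calls are enough to stitch services together, it helps to model what happens to the carrier. The sketch below is a library-free toy, not the OpenTracing or Datadog API: ToyContext, ToyPropagation, and the header names are all hypothetical and exist only for illustration. In real code the equivalent steps are tracer.inject(span.context(), Format.Builtin.HTTP_HEADERS, carrier) on the way out and tracer.extract(Format.Builtin.HTTP_HEADERS, carrier) on the way in, where carrier is a TextMap wrapping your headers.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Library-free toy model of trace-context propagation. Nothing here is the
 * OpenTracing or Datadog API; every class and header name is hypothetical.
 */
final class ToyContext {
    final long traceId; // shared by every span in one distributed trace
    final long spanId;  // identifies the span this context belongs to

    ToyContext(long traceId, long spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

final class ToyPropagation {
    // Hypothetical header names; a real tracer chooses its own.
    static final String TRACE_ID = "toy-trace-id";
    static final String PARENT_ID = "toy-parent-id";

    /** Caller side: copy the context into an outgoing carrier (e.g. HTTP headers). */
    static void inject(ToyContext ctx, Map<String, String> carrier) {
        carrier.put(TRACE_ID, Long.toString(ctx.traceId));
        carrier.put(PARENT_ID, Long.toString(ctx.spanId));
    }

    /** Callee side: rebuild the caller's context, or return null if none was sent. */
    static ToyContext extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID);
        String parentId = carrier.get(PARENT_ID);
        if (traceId == null || parentId == null) {
            return null; // no upstream trace: the callee would start a new one
        }
        return new ToyContext(Long.parseLong(traceId), Long.parseLong(parentId));
    }

    /** Callee side: start a child span that stays inside the caller's trace. */
    static ToyContext childOf(ToyContext parent, long newSpanId) {
        return new ToyContext(parent.traceId, newSpanId);
    }
}

class PropagationDemo {
    public static void main(String[] args) {
        // Service A: an active span is about to call service B.
        ToyContext callerSpan = new ToyContext(42L, 1L);
        Map<String, String> headers = new HashMap<>();
        ToyPropagation.inject(callerSpan, headers);

        // Service B: sees only the headers, never the caller's objects.
        ToyContext upstream = ToyPropagation.extract(headers);
        ToyContext serverSpan = ToyPropagation.childOf(upstream, 2L);

        System.out.println("same trace: " + (serverSpan.traceId == callerSpan.traceId));
    }
}
```

The point of the toy is that the receiving service never touches the caller's classes. Everything it needs arrives as plain strings in the carrier, which is why you can get by without casting anything to a Datadog-specific type.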