How to Get a TikTok Transcript with Code (2026 Guide)

TikTok transcripts are trickier to get than YouTube transcripts. Most TikToks have no clean, downloadable caption track, so "just read the captions" usually doesn't work; you need actual speech-to-text. Here are three ways to get the text out, from zero-code to fully DIY.

Option 1: No code (paste a URL)

If you just need the text once or twice, skip the code entirely. Paste the link into the free TikTok transcript generator, copy the result, done. Good for one-offs; not for automation.

Option 2: One API call (recommended for builders)

For anything programmatic — a script, a backend job, a pipeline — a single API call is the least painful path. You send the TikTok URL, you get structured text back, and you don't maintain any scraping or ASR infrastructure.

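# -G puts the data in the query string; --data-urlencode escapes the nested TikTok link
curl -G "https://scriptbase.app/api/v1/transcribe" \
  --data-urlencode "url=https://www.tiktok.com/@user/video/12345" \
  -H "X-API-Key: YOUR_API_KEY"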

The response is the same shape you'd get for any other platform:

{
  "success": true,
  "data": {
    "platform": "tiktok",
    "language": "en",
    "duration_sec": 31,
    "segments": [
      { "start": 0.0, "end": 0.40, "text": "Okay" },
      { "start": 0.40, "end": 0.92, "text": "listen" }
    ],
    "full_text": "Okay listen..."
  },
  "meta": { "format": "json", "credits_used": 1, "credits_remaining": 24 }
}

In JavaScript:

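// tiktokUrl is the full video link, e.g. "https://www.tiktok.com/@user/video/12345"
const res = await fetch(
  `https://scriptbase.app/api/v1/transcribe?url=${encodeURIComponent(tiktokUrl)}`,
  { headers: { "X-API-Key": process.env.SCRIPTBASE_API_KEY } }
);
// Fail loudly on HTTP errors instead of treating an error body as a transcript
if (!res.ok) throw new Error(`Transcribe request failed: HTTP ${res.status}`);
const { data } = await res.json();
console.log(data.full_text);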
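The segments in the sample response look word-level, which is awkward to display directly. A minimal sketch of merging them into caption-sized lines follows; groupSegments, maxChars, and maxGapSec are hypothetical names and defaults chosen for illustration, not part of the API.

```javascript
// Group word-level segments into caption-sized lines.
// maxChars and maxGapSec are illustrative defaults, not values from the API.
function groupSegments(segments, { maxChars = 40, maxGapSec = 0.6 } = {}) {
  const lines = [];
  let current = null;

  for (const seg of segments) {
    const gap = current ? seg.start - current.end : 0;
    const tooLong =
      current !== null && current.text.length + 1 + seg.text.length > maxChars;

    if (current === null || gap > maxGapSec || tooLong) {
      // Start a new line on the first word, after a pause, or when the line is full
      current = { start: seg.start, end: seg.end, text: seg.text };
      lines.push(current);
    } else {
      // Otherwise append the word and extend the line's end time
      current.text += " " + seg.text;
      current.end = seg.end;
    }
  }
  return lines;
}

// Sample input in the same shape as data.segments (third entry is made up)
const sampleSegments = [
  { start: 0.0, end: 0.4, text: "Okay" },
  { start: 0.4, end: 0.92, text: "listen" },
  { start: 2.1, end: 2.5, text: "this" },
];
console.log(groupSegments(sampleSegments));
```

In a real script you would pass data.segments instead of the sample array. The pause threshold is a judgment call: a smaller value breaks lines more often, which may suit fast social audio.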

Need subtitles instead of JSON? Add &format=srt. That's the whole integration — grab a free API key (25 credits, no card) to run it.
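If you call the endpoint from more than one place, a small wrapper keeps the URL encoding and error checks together. This is a sketch under assumptions: buildTranscribeUrl and transcribe are hypothetical helper names, and the guess that non-JSON formats such as srt come back as plain text is not confirmed by anything above, so check it against a real response.

```javascript
const API_BASE = "https://scriptbase.app/api/v1/transcribe";

// Build the request URL; URLSearchParams handles encoding of the nested link,
// including any query string the TikTok URL carries itself.
function buildTranscribeUrl(videoUrl, format) {
  const url = new URL(API_BASE);
  url.searchParams.set("url", videoUrl);
  if (format) url.searchParams.set("format", format);
  return url.toString();
}

// Fetch a transcript. Assumptions: JSON responses carry the `success` flag from
// the sample response, and non-JSON formats are returned as plain text.
async function transcribe(videoUrl, apiKey, format) {
  const res = await fetch(buildTranscribeUrl(videoUrl, format), {
    headers: { "X-API-Key": apiKey },
  });
  if (!res.ok) throw new Error(`Transcribe request failed: HTTP ${res.status}`);
  if (format && format !== "json") return res.text();
  const body = await res.json();
  if (!body.success) throw new Error("API reported failure");
  return body.data;
}

console.log(buildTranscribeUrl("https://www.tiktok.com/@user/video/12345", "srt"));
```

Usage would look like transcribe(tiktokUrl, process.env.SCRIPTBASE_API_KEY, "srt"), awaited inside an async function.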

Why an API for TikTok specifically

No caption track to parse. TikTok rarely exposes usable captions, so you need ASR — and running your own speech model means a GPU bill and an audio pipeline.
TikTok changes constantly. Any scraper you write becomes a maintenance treadmill. A managed API absorbs those breakages.
Music and fast speech. Social audio is noisy; a model tuned for it tends to handle it better than a generic one.

Option 3: Fully DIY

If you want full control (or to avoid per-request cost at very high volume), you can build it yourself:

Resolve and download the video with a tool like yt-dlp.
Extract the audio with ffmpeg.
Transcribe with a speech model — Whisper locally, or an ASR API like Deepgram or AssemblyAI.

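# -x extracts the audio track (yt-dlp calls ffmpeg for this, so ffmpeg must be installed)
yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://www.tiktok.com/@user/video/12345"
# writes audio.mp3; then send it to your speech-to-text model of choice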
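The download and transcription steps can be chained from a Node script. The sketch below assumes yt-dlp, ffmpeg, and the openai-whisper command-line tool are installed and on PATH. The helper names are hypothetical and the "small" model is an arbitrary choice.

```javascript
const { execFileSync } = require("node:child_process");

// Arguments for yt-dlp: extract audio as mp3 and let yt-dlp fill in the extension.
function ytDlpArgs(videoUrl, baseName) {
  return ["-x", "--audio-format", "mp3", "-o", `${baseName}.%(ext)s`, videoUrl];
}

// Arguments for the openai-whisper CLI; "small" is an arbitrary model choice.
function whisperArgs(audioPath, outDir) {
  return [audioPath, "--model", "small", "--output_format", "json", "--output_dir", outDir];
}

// Run both steps in order. execFileSync throws if either tool exits non-zero,
// which is where TikTok-side breakage would surface.
function transcribeLocally(videoUrl, baseName = "audio", outDir = ".") {
  execFileSync("yt-dlp", ytDlpArgs(videoUrl, baseName), { stdio: "inherit" });
  execFileSync("whisper", whisperArgs(`${baseName}.mp3`, outDir), { stdio: "inherit" });
}

// Show the commands that would run, without executing them
console.log("yt-dlp", ytDlpArgs("https://www.tiktok.com/@user/video/12345", "audio").join(" "));
console.log("whisper", whisperArgs("audio.mp3", ".").join(" "));
```

Passing arguments as an array to execFileSync avoids shell quoting problems with the video URL. Whisper writes its JSON output into outDir; the exact output file name can vary by version, so the sketch leaves reading it to you.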

Trade-off: maximum control, maximum maintenance. You own the proxy rotation, the breakage when TikTok changes, the GPU costs, and the audio plumbing. Worth it only at scale where per-request pricing genuinely adds up — and even then, do the math first.
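"Do the math" can be as simple as a break-even estimate: the monthly volume at which DIY's fixed costs are paid back by its lower per-request cost. The function below is a rough sketch, and every number in the example is hypothetical, not real pricing for any service.

```javascript
// Monthly request volume above which DIY is cheaper than paying per request.
// All amounts must use the same currency unit (cents in the example below).
// Returns Infinity when DIY never wins, i.e. its per-request cost is not lower.
function breakEvenRequests({ apiPricePerRequest, diyFixedMonthly, diyCostPerRequest }) {
  const savingPerRequest = apiPricePerRequest - diyCostPerRequest;
  if (savingPerRequest <= 0) return Infinity;
  return Math.ceil(diyFixedMonthly / savingPerRequest);
}

// Hypothetical inputs, in cents: 2 per API request, 0.5 per DIY request,
// and 40000 per month of fixed DIY cost (GPU, proxies, and so on)
const breakEven = breakEvenRequests({
  apiPricePerRequest: 2,
  diyFixedMonthly: 40000,
  diyCostPerRequest: 0.5,
});
console.log(`DIY pays off above roughly ${breakEven} requests per month`);
```

The estimate ignores engineering time spent on maintenance, which is often the larger cost, so treat the result as a lower bound on the volume you need.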

Which should you pick?

Need	Best option
One-off, no code	Free web tool
Script / backend / pipeline	One API call
Extreme volume, full control	DIY with yt-dlp + ASR

For most builders, the one-call API is the right answer: it's the only programmatic option here with no infrastructure to maintain, and it is built to return text even for caption-less videos. Start with the free tier and only graduate to DIY if your volume ever truly demands it.