tutorial·ScriptBase Team·3 min read

How to Get a TikTok Transcript with Code (2026 Guide)

Three ways to turn a TikTok video into text — a no-code tool, a one-call API, and a DIY pipeline — with working code samples and the trade-offs of each.

Tags: tiktok, api, tutorial, transcripts

TikTok transcripts are trickier than YouTube transcripts. Most TikToks have no clean, downloadable caption track, so "just read the captions" usually doesn't work; you need actual speech-to-text. Here are three ways to get the text out, from zero-code to fully DIY.

Option 1: No code (paste a URL)

If you just need the text once or twice, skip the code entirely. Paste the link into the free TikTok transcript generator, copy the result, done. Good for one-offs; not for automation.

Option 2: One API call

For anything programmatic (a script, a backend job, a pipeline), a single API call is the least painful path. You send the TikTok URL, you get structured text back, and you don't maintain any scraping or ASR infrastructure.

curl "https://scriptbase.app/api/v1/transcribe?url=https://www.tiktok.com/@user/video/12345" \
  -H "X-API-Key: YOUR_API_KEY"

The response is the same shape you'd get for any other platform:

{
  "success": true,
  "data": {
    "platform": "tiktok",
    "language": "en",
    "duration_sec": 31,
    "segments": [
      { "start": 0.0, "end": 0.40, "text": "Okay" },
      { "start": 0.40, "end": 0.92, "text": "listen" }
    ],
    "full_text": "Okay listen..."
  },
  "meta": { "format": "json", "credits_used": 1, "credits_remaining": 24 }
}
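
The word-level segments can be regrouped into readable, timestamped lines on the client. The sketch below assumes the response shape shown above (`data.segments` with `start`, `end`, and `text`). The helper names `formatTimestamp` and `groupSegments`, the 5-second window, and the third sample segment are illustrative choices, not part of the ScriptBase API.

```javascript
// Hypothetical helpers; only the segment fields (start, end, text) come from the response above.

// Format seconds as M:SS, e.g. 75.2 -> "1:15".
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const m = Math.floor(total / 60);
  const s = String(total % 60).padStart(2, "0");
  return `${m}:${s}`;
}

// Merge consecutive segments into lines that each span at most `windowSec` seconds.
function groupSegments(segments, windowSec = 5) {
  const lines = [];
  let current = null;
  for (const seg of segments) {
    if (current === null || seg.end - current.start > windowSec) {
      // Start a new line when there is none yet or the window would be exceeded.
      current = { start: seg.start, end: seg.end, text: seg.text };
      lines.push(current);
    } else {
      current.end = seg.end;
      current.text += " " + seg.text;
    }
  }
  return lines.map((l) => `[${formatTimestamp(l.start)}] ${l.text}`);
}

// The first two segments mirror the sample response; the third is made up for illustration.
const sample = [
  { start: 0.0, end: 0.4, text: "Okay" },
  { start: 0.4, end: 0.92, text: "listen" },
  { start: 6.1, end: 6.5, text: "today" },
];
console.log(groupSegments(sample).join("\n"));
```

A fixed time window is the simplest grouping rule; splitting on punctuation or pauses between segments would be a reasonable alternative if the transcript includes them.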

In JavaScript:

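// Requires Node 18+ (global fetch) and an ES module, since this uses top-level await.
const tiktokUrl = "https://www.tiktok.com/@user/video/12345";
const res = await fetch(
  `https://scriptbase.app/api/v1/transcribe?url=${encodeURIComponent(tiktokUrl)}`,
  { headers: { "X-API-Key": process.env.SCRIPTBASE_API_KEY } }
);
// Fail loudly on HTTP errors instead of parsing an error body as a transcript.
if (!res.ok) throw new Error(`Transcription request failed: HTTP ${res.status}`);
const { data } = await res.json();
console.log(data.full_text);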

Need subtitles instead of JSON? Add &format=srt. That's the whole integration — grab a free API key (25 credits, no card) to run it.
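
Building the request URL with `URLSearchParams` keeps the encoding correct once there is more than one parameter. This is a sketch: the endpoint, the `url` parameter, the `format=srt` option, and the `X-API-Key` header come from the examples above, while the names `buildTranscribeUrl` and `fetchSrt`, and the assumption that an SRT response arrives as a plain-text body, are mine.

```javascript
const API_BASE = "https://scriptbase.app/api/v1/transcribe";

// Build the request URL; URLSearchParams percent-encodes the nested TikTok URL.
function buildTranscribeUrl(tiktokUrl, format) {
  const params = new URLSearchParams({ url: tiktokUrl });
  if (format) params.set("format", format);
  return `${API_BASE}?${params.toString()}`;
}

// Fetch subtitles as text. Assumption: with format=srt the body is the raw SRT file.
async function fetchSrt(tiktokUrl, apiKey) {
  const res = await fetch(buildTranscribeUrl(tiktokUrl, "srt"), {
    headers: { "X-API-Key": apiKey },
  });
  if (!res.ok) throw new Error(`Transcription request failed: HTTP ${res.status}`);
  return res.text();
}

console.log(buildTranscribeUrl("https://www.tiktok.com/@user/video/12345", "srt"));
```

Usage might look like `const srt = await fetchSrt(tiktokUrl, process.env.SCRIPTBASE_API_KEY)`, followed by writing the string to a `.srt` file.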

Why an API for TikTok specifically

  • No caption track to parse. TikTok rarely exposes usable captions, so you need ASR — and running your own speech model means a GPU bill and an audio pipeline.
  • TikTok changes constantly. Any scraper you write becomes a maintenance treadmill. A managed API absorbs those breakages.
  • Music and fast speech. Social audio is noisy, so a model tuned for it tends to beat a generic one.

Option 3: Fully DIY

If you want full control (or to avoid per-request cost at very high volume), you can build it yourself:

  1. Resolve and download the video with a tool like yt-dlp.
  2. Extract the audio with ffmpeg.
  3. Transcribe with a speech model — Whisper locally, or an ASR API like Deepgram or AssemblyAI.

yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://www.tiktok.com/@user/video/12345"
# -x extracts the audio track (yt-dlp calls ffmpeg for this), producing audio.mp3
# then send audio.mp3 to your speech-to-text model of choice
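
The three steps can be chained from a Node script. This is a minimal sketch that assumes `yt-dlp`, `ffmpeg`, and the open-source `whisper` command-line tool (from the `openai-whisper` Python package) are installed and on your PATH. The function names, the `base` model choice, and the output paths are illustrative.

```javascript
// Arguments for yt-dlp: extract the audio as mp3 into <outBase>.mp3.
function ytDlpArgs(videoUrl, outBase) {
  return ["-x", "--audio-format", "mp3", "-o", `${outBase}.%(ext)s`, videoUrl];
}

// Arguments for the whisper CLI: write a plain-text transcript into outDir.
function whisperArgs(audioPath, outDir, model = "base") {
  return [audioPath, "--model", model, "--output_format", "txt", "--output_dir", outDir];
}

// Run both tools in sequence and return the transcript text.
async function transcribeLocally(videoUrl, workDir = ".") {
  // Dynamic imports work in both CommonJS and ES module files.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { readFile } = await import("node:fs/promises");
  const path = await import("node:path");
  const run = promisify(execFile);

  const outBase = path.join(workDir, "audio");
  // Steps 1 and 2: download the video and extract audio (yt-dlp calls ffmpeg).
  await run("yt-dlp", ytDlpArgs(videoUrl, outBase));
  // Step 3: transcribe; whisper names its output after the input (audio.mp3 -> audio.txt).
  await run("whisper", whisperArgs(`${outBase}.mp3`, workDir));
  return readFile(`${outBase}.txt`, "utf8");
}
```

Passing arguments as an array to `execFile` avoids shell quoting problems with URLs. Retries, proxies, and cleanup of the intermediate files are left out and would be yours to add.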

Trade-off: maximum control, maximum maintenance. You own the proxy rotation, the breakage when TikTok changes, the GPU costs, and the audio plumbing. Worth it only at scale where per-request pricing genuinely adds up — and even then, do the math first.
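
"Doing the math" can be as simple as a break-even estimate. Every number below is a made-up placeholder, not real ScriptBase, GPU, or proxy pricing, so substitute your own figures. The model assumes DIY has a fixed monthly cost (infrastructure plus maintenance time) and a small per-video cost, while the API has only a per-video price.

```javascript
// Monthly volume above which DIY becomes cheaper than a per-request API.
// Returns Infinity when DIY's per-video cost is not lower than the API price.
function breakEvenVideosPerMonth({ diyFixedMonthly, diyPerVideo, apiPerVideo }) {
  const savingPerVideo = apiPerVideo - diyPerVideo;
  if (savingPerVideo <= 0) return Infinity;
  return Math.ceil(diyFixedMonthly / savingPerVideo);
}

// Placeholder inputs only (hypothetical dollar amounts).
const estimate = breakEvenVideosPerMonth({
  diyFixedMonthly: 500, // servers, proxies, engineer time
  diyPerVideo: 0.005,   // compute per transcription
  apiPerVideo: 0.02,    // hypothetical API price per video
});
console.log(`DIY breaks even at about ${estimate} videos per month`);
```

Below that volume the fixed DIY cost dominates and the API is cheaper. Above it, DIY starts to pay for itself, provided your maintenance estimate is honest.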

Which should you pick?

Need | Best option
One-off, no code | Free web tool
Script / backend / pipeline | One API call
Extreme volume, full control | DIY with yt-dlp + ASR

For most builders, the one-call API is the right answer: it's the only programmatic option here with no infrastructure to maintain, and it handles caption-less videos. Start with the free tier and only graduate to DIY if your volume truly demands it.