How to Get a TikTok Transcript with Code (2026 Guide)
Three ways to turn a TikTok video into text — a no-code tool, a one-call API, and a DIY pipeline — with working code samples and the trade-offs of each.
TikTok transcripts are trickier than YouTube's. Most TikToks have no clean, downloadable caption track, so "just read the captions" usually doesn't work; you need actual speech-to-text. Here are three ways to get the text out, from zero-code to fully DIY.
Option 1: No code (paste a URL)
If you just need the text once or twice, skip the code entirely. Paste the link into the free TikTok transcript generator, copy the result, done. Good for one-offs; not for automation.
Option 2: One API call (recommended for builders)
For anything programmatic — a script, a backend job, a pipeline — a single API call is the least painful path. You send the TikTok URL, you get structured text back, and you don't maintain any scraping or ASR infrastructure.
```bash
curl "https://scriptbase.app/api/v1/transcribe?url=https://www.tiktok.com/@user/video/12345" \
  -H "X-API-Key: YOUR_API_KEY"
```

The response is the same shape you'd get for any other platform:
```json
{
  "success": true,
  "data": {
    "platform": "tiktok",
    "language": "en",
    "duration_sec": 31,
    "segments": [
      { "start": 0.0, "end": 0.40, "text": "Okay" },
      { "start": 0.40, "end": 0.92, "text": "listen" }
    ],
    "full_text": "Okay listen..."
  },
  "meta": {
    "format": "json",
    "credits_used": 1,
    "credits_remaining": 24
  }
}
```

In JavaScript:
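```javascript
// Top-level await: run this as an ES module (for example, a .mjs file).
// Example link; swap in the video you want transcribed.
const tiktokUrl = "https://www.tiktok.com/@user/video/12345";

const res = await fetch(
  `https://scriptbase.app/api/v1/transcribe?url=${encodeURIComponent(tiktokUrl)}`,
  { headers: { "X-API-Key": process.env.SCRIPTBASE_API_KEY } }
);
const { data } = await res.json();
console.log(data.full_text);
```

Need subtitles instead of JSON? Add `&format=srt`. That's the whole integration. Grab a free API key (25 credits, no card) to run it.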
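For anything beyond a quick test, it may help to wrap the call in a few small helpers. The sketch below is one possible shape, not the API's official client: the function names (`buildTranscribeUrl`, `groupSegments`, `getTranscript`) and the 0.6-second pause threshold are made up for illustration, and it assumes a failed request comes back with `success: false`, which the response sample above does not confirm.

```javascript
// Base endpoint taken from the curl example above.
const ENDPOINT = "https://scriptbase.app/api/v1/transcribe";

// Build the request URL. URLSearchParams percent-encodes the TikTok link,
// so a query string inside it cannot leak into the API's own parameters.
function buildTranscribeUrl(videoUrl, format) {
  const params = new URLSearchParams({ url: videoUrl });
  if (format) params.set("format", format); // e.g. "srt"
  return `${ENDPOINT}?${params.toString()}`;
}

// Fold short segments into longer lines, starting a new line whenever the
// pause between two segments exceeds maxGapSec (an arbitrary default).
function groupSegments(segments, maxGapSec = 0.6) {
  const lines = [];
  for (const seg of segments) {
    const last = lines[lines.length - 1];
    if (last && seg.start - last.end <= maxGapSec) {
      last.text += ` ${seg.text}`;
      last.end = seg.end;
    } else {
      // Copy the fields so the caller's segment objects are never mutated.
      lines.push({ start: seg.start, end: seg.end, text: seg.text });
    }
  }
  return lines;
}

// Fetch a JSON transcript and fail loudly on HTTP or API-level errors.
// Assumption: unsuccessful requests are reported as `success: false`.
async function getTranscript(videoUrl, apiKey) {
  const res = await fetch(buildTranscribeUrl(videoUrl), {
    headers: { "X-API-Key": apiKey },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = await res.json();
  if (!body.success) throw new Error("API reported failure");
  return body.data;
}
```

A caller would then run something like `const data = await getTranscript(tiktokUrl, process.env.SCRIPTBASE_API_KEY)` and pass `data.segments` to `groupSegments` to get readable, timestamped lines. The global `fetch` used here requires Node 18 or later.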
Why an API for TikTok specifically
- No caption track to parse. TikTok rarely exposes usable captions, so you need ASR — and running your own speech model means a GPU bill and an audio pipeline.
- TikTok changes constantly. Any scraper you write becomes a maintenance treadmill. A managed API absorbs those breakages.
- Music and fast speech. Social audio is noisy; a model tuned for it beats a generic one.
Option 3: Fully DIY
If you want full control (or to avoid per-request cost at very high volume), you can build it yourself:
- Resolve and download the video with a tool like `yt-dlp`.
- Extract the audio with `ffmpeg`.
- Transcribe with a speech model: Whisper locally, or an ASR API like Deepgram or AssemblyAI.
```bash
# -x extracts the audio track (yt-dlp calls ffmpeg for this, so ffmpeg must be installed)
# %(ext)s lets yt-dlp fill in the final extension, so the result is audio.mp3
yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://www.tiktok.com/@user/video/12345"
# then send audio.mp3 to your speech-to-text model of choice
```

Trade-off: maximum control, maximum maintenance. You own the proxy rotation, the breakage when TikTok changes, the GPU costs, and the audio plumbing. Worth it only at scale where per-request pricing genuinely adds up, and even then, do the math first.
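To keep the examples in one language, the DIY steps can also be chained from a Node script. This is a rough sketch under several assumptions: `yt-dlp`, `ffmpeg`, and the open-source `whisper` command-line tool are all installed and on the PATH, the helper names are invented for illustration, and recent Whisper releases name the text output after the audio file's base name (older ones may differ). It has no retries, proxy handling, or cleanup.

```javascript
// Arguments for yt-dlp: -x extracts audio, --audio-format picks the codec,
// and the %(ext)s placeholder lets yt-dlp fill in the final extension.
function ytDlpArgs(videoUrl, baseName) {
  return ["-x", "--audio-format", "mp3", "-o", `${baseName}.%(ext)s`, videoUrl];
}

// Arguments for the whisper CLI: write a plain-text transcript into the
// current directory using the given model size.
function whisperArgs(audioPath, model = "base") {
  return [audioPath, "--model", model, "--output_format", "txt", "--output_dir", "."];
}

// Run one command and reject if it cannot start or exits non-zero.
async function run(cmd, args) {
  const { execFile } = await import("node:child_process");
  return new Promise((resolve, reject) => {
    execFile(cmd, args, (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}

// Download the audio, transcribe it, and return the transcript text.
// Assumption: whisper writes `${baseName}.txt` next to the script.
async function transcribeLocally(videoUrl, baseName = "audio") {
  await run("yt-dlp", ytDlpArgs(videoUrl, baseName));
  await run("whisper", whisperArgs(`${baseName}.mp3`));
  const { readFile } = await import("node:fs/promises");
  return readFile(`${baseName}.txt`, "utf8");
}
```

Passing arguments as an array to `execFile` avoids shell quoting problems with URLs that contain `&` or `?`. Swapping the Whisper step for a hosted ASR API would replace only the second `run` call.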
Which should you pick?
| Need | Best option |
|---|---|
| One-off, no code | Free web tool |
| Script / backend / pipeline | One API call |
| Extreme volume, full control | DIY with yt-dlp + ASR |
For most builders, the one-call API is the right answer: it's the only programmatic option here with no infrastructure to maintain, and it runs speech-to-text instead of depending on a caption track, so caption-less videos still come back as text. Start with the free tier and only graduate to DIY if your volume ever truly demands it.
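One way to decide whether volume "truly demands it" is a quick break-even estimate. The sketch below is only arithmetic: the function name is invented, and every figure in the example call is a placeholder rather than a real price from any provider. Fixed DIY costs should include servers or GPUs, proxies, and a fair estimate of engineering time.

```javascript
// Smallest monthly request count at which DIY costs no more than the API.
// All three inputs must use the same currency unit (cents here).
// Returns Infinity when DIY is never cheaper per request.
function breakEvenRequests(apiPricePerRequest, diyFixedMonthly, diyCostPerRequest) {
  const savingPerRequest = apiPricePerRequest - diyCostPerRequest;
  if (savingPerRequest <= 0) return Infinity;
  return Math.ceil(diyFixedMonthly / savingPerRequest);
}

// Placeholder figures in cents, not real prices:
// 1 cent per API request, 40,000 cents of fixed DIY cost per month,
// and 0.25 cents of variable DIY cost per request.
const threshold = breakEvenRequests(1, 40000, 0.25);
console.log(`DIY breaks even at about ${threshold} requests per month`);
```

Below that volume, the managed API is cheaper even before counting the maintenance effort described above.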