
ADVANCED Transcription


  • Lifetime GitHub repo access
  • Access to future uploaded Inference Scripts
  • Ability to post Issues

Suitable for:

  • Adding support for uncommon/new words or phrases
  • Improving performance on specific accents
  • Improving performance on specific languages

The repo contains all you need to fine-tune speech-to-text models:

  • Dataset Generation
  • Fine-tuning Whisper
  • Evaluation

Purchase Options

Individual Access/License

Join 67+ lifetime members

Provides individual access to all of the ADVANCED-transcription content above.

Individual Access/License – Repo Bundle

Provides individual access to all ADVANCED-transcription content above, plus the ADVANCED-inference, ADVANCED-fine-tuning and ADVANCED-vision repositories.

For TEAM access, kindly post a comment below.

Video Tutorials

16 thoughts on “ADVANCED Transcription”

  1. Hi my friend!

    My name is Pedro Henrique, and I am a Brazil-based radiologist interested in enhancing the efficiency of imaging diagnostics through advanced technology. I came across your work with Whisper and became excited about the possibility of applying a customized model in my daily practice. The aim is to use automated transcription to improve the documentation of diagnoses, without any commercial intent.

    I possess basic to intermediate knowledge of Python and JavaScript, primarily used to develop personal scripts that facilitate my professional activities. Your expertise in customizing Whisper, especially in fine-tuning, could be precisely what I need to integrate this innovation into my work.

    Could you provide details about how your product can be adapted to a radiologist’s needs? Given my limited experience with fine-tuning, is it feasible for me to apply your customized scripts and achieve an effective model for transcription in radiology?

    Furthermore, I am interested in learning more about post-purchase support. How are inquiries and technical assistance managed after buying the product? This information will be crucial to ensure a smooth and efficient transition to using this technology in my work environment.

    I am confident that your solution could represent a significant advancement in my medical practice. I look forward to your response and am available to discuss any additional details.


    Pedro Henrique

    1. Hi Pedro,

      I suppose you could fine-tune a Whisper model on a recording of some key radiologist terminology. You would need to record yourself (or obtain a recording) and then follow the YouTube video demo along with the scripts in the repository.

      If you have issues after purchase, you can post an “Issue” in the GitHub repository – I typically reply back within a few days with my suggestions.
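A recording plus its transcript is typically paired up as the first data-preparation step before fine-tuning. As a rough sketch of that step (the JSONL manifest format and the file names here are illustrative assumptions, not necessarily what the repo's scripts use):

```python
import json
from pathlib import Path


def write_manifest(pairs, out_path):
    """Write (audio_path, transcript) pairs as a JSONL manifest, one record per line."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as f:
        for audio, text in pairs:
            f.write(json.dumps({"audio": str(audio), "text": text}) + "\n")
    return out


# Example: one recording of radiology terminology plus its transcript
# (hypothetical file name and contents).
manifest = write_manifest(
    [("recordings/terms_01.wav", "pneumothorax, consolidation, ground-glass opacity")],
    "train_manifest.jsonl",
)
```

Hugging Face `datasets`, for example, can load such a file with `load_dataset("json", data_files="train_manifest.jsonl")`.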

  2. Hi there,
    I don’t have enough money to purchase your repo; I am a student, and I am keen to learn how to fine-tune STT models.

    Could you kindly make an exception and help me, please 🥺🥺

    1. Howdy, my recommendation is to go through the channel. By following along and using the free materials I link, you’ll be able to learn. It will take more time than purchasing the repo, but you’ll gain a deeper understanding, and that will be great for you as a student. I’m wishing you the best!

  3. I have a few questions about speech-to-text; please help me answer them.
    1. I used the original Whisper for the Azerbaijani language, but in many cases it does not transcribe some words correctly. Can this be improved by fine-tuning with the code in your repository?
    2. If I buy the repository, can I easily develop it on my laptop using my own audio recordings and your code? I am new to this field and not fully informed, which is why I have such questions.

      1. NVIDIA GeForce RTX 4060 Laptop GPU (driver 528.66) with 8 GB VRAM, 1 TB SSD, 32 GB RAM. Will it work? Can you show me how to produce a transcript in .vtt format from audio? I am willing to buy it if I can see this in your video. Maybe my questions are very simple; I ask because I don’t know much.

  4. Hi, I purchased the script and had a few questions on the documentation. For my native language, I used the fine-tuning model “openai/whisper-small” as you show in your script, but it didn’t work as you said. At the end of the run, the validation transcriptions are very messy and the accuracy is low. What else can you recommend?
    I tested 5, 6, and 7 epochs, but the results are not good: the WER is over 0.70.

    1. My apologies, I thought I had responded earlier.

      The best place to post is in the GitHub repo, since you purchased access. You can create an issue there and I tend to respond quickly.
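For context on the WER figure quoted in the question above: word error rate is the word-level Levenshtein (edit) distance between the reference transcript and the model's hypothesis, divided by the reference length. A minimal self-contained implementation (libraries such as `jiwer` provide the same metric):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the scan shows no lesion", "the scan shows a lesion"))  # 1 substitution / 5 words -> 0.2
```

A WER above 0.70 means roughly seven of every ten reference words are wrong, so the model output is close to unusable and the training setup (data, learning rate, or epochs) is worth revisiting.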

  5. Hey, please, I have not received a response yet. Should I fine-tune “openai/whisper-medium” even though it demands more GPU memory, or should I train the “openai/whisper-small” model?

  6. Hi Ronan, great job on the repo! Two questions:
    i) What’s the difference between this repo and the free one that you mentioned on Hugging Face?
    ii) I watched your demo on YouTube where you used a single file to fine-tune. How would I change this to feed in an entire folder of 10 hours (or eventually 100s of hours) of audio files with their respective transcriptions?

    1. Howdy:
      i) This repo includes more detail and options around specifying the LoRA, but the biggest difference is that I have put together scripts (shown in the video) for data preparation.
      ii) The script splits audio (and transcripts) into very short chunks for training. You would adjust the code to loop through this for each of your input files.
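The folder-level loop described in (ii) might look like the following sketch. It assumes each audio file sits next to a transcript with the same stem (e.g. `scan1.wav` / `scan1.txt`); that pairing convention is an assumption, not the repo's actual layout:

```python
from pathlib import Path


def collect_pairs(folder):
    """Pair each .wav file in a folder with its same-stem .txt transcript."""
    folder = Path(folder)
    pairs = []
    for audio in sorted(folder.glob("*.wav")):
        transcript = audio.with_suffix(".txt")
        if transcript.exists():  # skip audio that has no transcript
            pairs.append((audio, transcript.read_text(encoding="utf-8").strip()))
    return pairs
```

Each returned (audio, transcript) pair can then be fed through the single-file chunking step in turn.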
