
Advanced Fine-tuning Scripts

Features:

  • Lifetime GitHub repo access
  • Access to future uploaded fine-tuning scripts
  • Ability to post Issues

Repo Content = all you need to fine-tune LLMs:

DEPRECATED SCRIPTS:

  • Fine-tuning for Function Calling and Structured Responses v2 script (i.e. for the Trelis/function_calling_extended dataset). Purchase here.

Notes:

  • Price for new buyers increases periodically as new content is added.
  • This repo is for fine-tuning scripts. Datasets are not included.
  • After gaining access to the repo, you’ll have the option to buy extra seats (i.e. GitHub access for other users in your organization).

Team access includes repo access and a license for up to 5 team members from the same organization.


Video Tutorials

Advanced Fine Tuning Scripts Overview


Long Context Summarization:


Embeddings:


Supervised Fine-Tuning:


Unsupervised Fine-Tuning:


Preparing Fine-tuning Datasets:


Quantization:



72 thoughts on “Advanced Fine-tuning Scripts”

    1. Adeel, thanks for writing here and by email. Your email went to spam but I have fished it out and responded.

      Your question related to doing training on a Mac M1 or M2. While the same high-level approaches can be used, the scripts are quite different, as the best way is to use Llama.cpp – which is written in C/C++, not Python. I have just added some notes to this ADVANCED fine-tuning repo to help people get started, but detailed scripts for fine-tuning on a Mac are not included.

  1. Hi, I’ve purchased the supervised learning script, but it doesn’t seem to have the Q&A generation Python files you show in the video. Will you open the corresponding GitHub permissions for me?

      1. Do you at least provide an example of how the training dataset should be structured? The function calling notebook is completely unusable otherwise

    1. The scripts are for causal language models (e.g. GPT-type). M2M models would use different loading and often different dataset formats. If you’re only doing M2M models, you might learn something from the repo, but I can’t strongly recommend it.

      1. Hi,
        thank you for the feedback. No, I don’t need M2M only, so I’ll buy the repo. I am a beginner and everything is explained beautifully in the videos, so I’m looking forward to it – great job! Do you have any plans to expand your scripts repo with M2M soon? Alternatively, is it possible to order the creation of such a script for M2M fine-tuning separately? If so, and you are open to it, where can I contact you? Thank you and have a nice day.

        1. Currently I don’t have plans to do M2M; it’s not something I’ve had much demand for.

          And yes, you’ll see my email with the purchase receipt so feel free to respond there if there are issues.

  2. Hello, when I use QLoRA_Fine_Tuning_SUPERVISED.ipynb, I use my own dataset and the model is set to: model_id = "meta-llama/Llama-2-7b-chat-hf". But I get this error:
    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

    Can you provide some solutions?
    I have invited you to visit my colab.

    Thank you for your help

    1. Howdy, since you purchased repo access, you can post there by creating an issue. That’s my preferred way to respond as it helps all others who have purchased the repo.

      Please include a code snippet for replication in your post. Thanks
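      A minimal sketch of the usual QLoRA preparation that this class of error points to, assuming the standard transformers + PEFT stack rather than the repo’s actual notebook: the frozen 4-bit base model needs to be prepared so the loss has a grad_fn before the LoRA adapters are attached.

      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

      model_id = "meta-llama/Llama-2-7b-chat-hf"

      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

      # Enables gradient checkpointing safely and makes the embedding outputs require grads,
      # which avoids "element 0 of tensors does not require grad" even though the
      # quantized base weights are frozen.
      model = prepare_model_for_kbit_training(model)

      lora_config = LoraConfig(
          r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
      )
      model = get_peft_model(model, lora_config)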

  3. Hi Trelis,

    I bought the training script and function-calling-v3 dataset. When training on runpod, how much container disk and volume disk do I need to train a 34B-size model? And when choosing the model, the original one should be chosen, not the GPTQ version, am I right?

    btw, your YouTube clip https://www.youtube.com/watch?v=hHn_cV5WUDI is very helpful.

    Best,
    Zack

    1. To train 34B in full precision (bf16) on runpod, I find I need two A6000s or two A100s. I usually put 500 GB of container and volume disk, just to have loads of room, and that doesn’t add much cost.

      Yes, if you are choosing, use the base model, not GPTQ. GPTQ does not allow LoRA adapters to be merged.
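      As a rough illustration of that setup (a sketch, not the repo’s script, and assuming the SUSTech/SUS-Chat-34B base model mentioned later in this thread): load the base weights in bf16 and let transformers shard them across the two GPUs.

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "SUSTech/SUS-Chat-34B"  # the unquantized base model, not a GPTQ variant

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.bfloat16,   # "full precision" here means unquantized bf16 training
          device_map="auto",            # shards the ~68 GB of bf16 weights across both A6000s/A100s
      )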

  4. One more question: if I want to use TheBloke/SUS-Chat-34B-GPTQ to train the function-calling version, could I use this GPTQ version as the base model? In your YouTube video, you mentioned quantization_config – does it relate to this? And normally, how long will it take to train a 34B model?

    Thanks,
    Zack

    1. Yes, you can train a GPTQ model with LoRA. However, the adapters cannot be merged, which is very inconvenient because it means you have to apply an adapter every time you load the model, PLUS inferencing a model with an adapter is slower if the weights aren’t merged. Hence why everyone trains the base model.
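      For context, a minimal sketch of the merge step that is possible with an unquantized base model but not with a GPTQ checkpoint (PEFT API; the adapter path is a placeholder):

      import torch
      from transformers import AutoModelForCausalLM
      from peft import PeftModel

      base = AutoModelForCausalLM.from_pretrained(
          "SUSTech/SUS-Chat-34B", torch_dtype=torch.bfloat16
      )
      model = PeftModel.from_pretrained(base, "path/to/your-lora-adapter")  # placeholder path
      merged = model.merge_and_unload()  # folds the LoRA weights into the base weights
      merged.save_pretrained("sus-chat-34b-merged")  # with a GPTQ base, this merge is unavailable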

  5. I bought the Fine-tuning for Function Calling and Structured Responses v3 script already. If I buy this advanced one, which includes the function-calling script, could I get the previous payment refunded?

    Thanks,
    Zack

  6. I’ve used the fine-tuning script to train the SUSTech/SUS-Chat-34B model on runpod, but failed. Do you have an email or some way to connect to get help? Another thing is that, in the script, there is no way to see the training progress – if you can add one, that would be great!

  7. Errors as below:
    ---------------------------------------------------------------------------
    NotImplementedError                       Traceback (most recent call last)
    Cell In[34], line 1

    trainer = CustomTrainer(
        model=model,
        train_dataset=train_dataset,
        eval_dataset=validation_dataset,  # turn on the eval dataset if you want to use evaluation functionality versus the test dataset (not provided in this script)
        args=transformers.TrainingArguments(
            # max_steps=1,
            num_train_epochs=1,  # stronger models typically only need 1 epoch; if the validation loss is still dropping, you can train more
            per_device_train_batch_size=1,
            gradient_accumulation_steps=1,
            evaluation_strategy="steps",
            max_grad_norm=1,
            warmup_ratio=0.1,
            eval_steps=0.2,
            learning_rate=1e-4,
            # fp16=True,
            bf16=True,
            logging_steps=1,
            output_dir="outputs",
            # optim="paged_adamw_8bit",  # for training in 4bit (quantized)
            optim="adamw_torch",  # for training in full fp16/bf16 precision
            lr_scheduler_type="constant",
        ),
        data_collator=data_collator,
        # data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

    File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:481, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
        # Bnb quantized models don't support the `.to` operation.
        --> 481 self._move_model_to_device(model, args.device)

    File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:716, in Trainer._move_model_to_device(self, model, device)
        --> 716 model = model.to(device)

    File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1160, in Module.to(self, *args, **kwargs)
        --> 1160 return self._apply(convert)

    [... repeated Module._apply frames as .to() recurses through the child modules ...]

    File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:833, in Module._apply(self, fn, recurse)
        --> 833 param_applied = fn(param)

    File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1158, in Module.to.<locals>.convert(t)
        --> 1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

    NotImplementedError: Cannot copy out of meta tensor; no data!
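    A hedged note on this traceback (not a confirmed fix for this script): it typically means some weights were never materialised, for example when the model was loaded with device_map="auto" plus CPU/disk offload and the Trainer then tries to call model.to(device). One common check, as a sketch under that assumption:

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "SUSTech/SUS-Chat-34B",
        torch_dtype=torch.bfloat16,
        device_map="auto",  # ensure there is enough GPU memory that nothing is offloaded to disk
    )
    # If any parameter is still on the meta device, moving the model later will raise
    # "Cannot copy out of meta tensor"; more GPU/CPU memory (or a smaller model) is needed.
    assert not any(p.is_meta for p in model.parameters())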

  8. I purchased access to the Advanced-fine-tuning scripts. I am looking at QLoRA_Fine_Tuning_function_calling_Zephyr.ipynb on the function-calling branch. I cannot run the notebook as it is trying to load the dataset “Trelis/function_calling_extended” (which is a separate paid product). How can I run the notebook? Is it possible to get a sample dataset, even if only with a few rows of usable data, so that I can train a LoRA using my Windows-based RTX 4090 system?

  9. Hello There,

    I would like to know – if we purchase access to the full repo today, will we also get access to the upcoming code/scripts that you add to this repo?

    Thanks.

      1. Hello Ronan,

        Thanks for the reply.

        When I click on the Purchase Full Repo Access button on the Advanced Fine-tuning Scripts page, it takes me to the Stripe checkout page. However, on the Stripe checkout page it shows “Advanced Inference and Server Setup Repo” as the description (on the left side).

        I want to purchase the full repo access for “Advanced Fine-tuning Scripts”.

        Ronan, is the above link correct, or do I need to use some other link for purchasing the Advanced Fine-tuning Scripts? Please confirm and I will go ahead with the purchase.

        Thanks Ronan.

          1. Thanks Ronan for the clarification and the resolution.

            I have made the purchase and have received access to the repo.

            Way to go!

  10. Hiya

    Got access to your repo, great stuff. I’m a viewer of your videos and enjoy the content. I would be interested in a video on dataset creation and what you have had to put into those.

    Good man

    Hoops

      1. I want to add to this. Super interested in dataset creation – especially how to think about it, how to do it, how to test it iteratively, etc.
        There is absolutely no tutorial out there that covers this topic exhaustively.

        Super happy to pay for it.

  11. Hi,
    I have purchased the advanced fine-tuning scripts. Will the fine-tuning dataset v3 be enough to interact with a Postgres database? Functions are working with GPT function calling. If I fine-tune Mixtral, will I have the same results?

    1. Howdy, I don’t know for sure as I haven’t tested Postgres options. My recommendations for you:
      1. Train Mixtral as you said (OR buy the out-of-the-box fine-tuned Mixtral function calling model from mart.trelis.com)
      2. Train OpenChat 3.5 or DeepSeek Coder 33B (again, out-of-the-box versions are available from mart.trelis.com if you prefer). These models are stronger for function calling than Mixtral.

      If the v3 dataset doesn’t work sufficiently well for Postgres queries (although I think there’s a reasonable chance it will), you can append some Postgres examples to the v3 dataset when you train.
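      As a sketch of that last suggestion (assuming the Hugging Face datasets library; the column names are placeholders, so match them to the actual columns of Trelis/function_calling_v3 before training):

      from datasets import Dataset, concatenate_datasets, load_dataset

      base = load_dataset("Trelis/function_calling_v3", split="train")  # gated/paid dataset

      extra = Dataset.from_list([
          {
              "prompt": "List all rows in the customers table.",
              "response": '{"name": "run_sql", "arguments": {"query": "SELECT * FROM customers;"}}',
          },
      ])

      train_dataset = concatenate_datasets([base, extra])  # columns/features must match exactly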

  12. Hi, your YouTube videos are really good.

    I wanted to ask if the fine-tuning repo also contains all the content of the inference repo, or only the future inference scripts. I know it may be a dumb question, because all the fine-tuning scripts will contain inference code as well once the model is fine-tuned.

    Also, it would help me a great deal if you could provide me with a promo code for a good discount. Thanks

    1. Howdy!

      The fine-tuning repo is focused on fine-tuning. Yes, most scripts do have a generation function, because that is needed to test performance.

      If you want scripts to do more complex inference (speed tests, API setup, server setup, long context, concurrent requests), then there is a different repo/product available called Advanced Inference.

      Re discounts, buying lifetime access to a repo includes access to future scripts I upload. Meanwhile, as new scripts are added, the price of each repo increases for new buyers of lifetime access. So, while there isn’t a discount offered in percentage or money terms, there is a benefit to those who buy while the product is young.

    1. oooh, nice, advanced paper! I need to read it in full.

      Can you give an example of what you mean by “generate synthetic data using the function calling approach”? Do you mean getting the LLM to call a function with some text as the arguments, and then the function formats that info into a table? If so, then I say yes!

      1. Actually a simpler ask – function calling is based on the LLM being able to format structured data according to a strict schema.

        So I just ask the LLM to generate synthetic data according to a strict schema (and maybe strict rules – “don’t make this column zero”, for example).

          1. Let me try:
            1. https://github.com/sdv-dev/SDV -> a tool to generate synthetic tabular data
            2. https://arxiv.org/pdf/2210.06280.pdf -> a paper on how to fine-tune LLMs to generate synthetic tabular data
            3. Aha moment! – tabular data is the same as function calling. “Function calling is a feature that was largely popularized by OpenAI that enables LLMs to extract structured data from unstructured text”
            4. This means that I can probably do something like “hey OpenAI, generate 10000 simulated function calls of the form fn(int, string, float) where int should always be greater than 50”

            1. The short answer is yes, you can use prompting to get structured data from an LLM. Some models work better than others. A function call is a specific type of request for structured data, because you only want that structured data when it is appropriate to call a function.

              Most data extraction requests are easier than this because you are explicitly asking for data to be extracted (there’s no uncertainty about whether to call a function or not). So, you don’t necessarily need a function calling model if you are extracting data (especially structured data). On this topic, I’ll have a video out soon on data extraction.
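              As a sketch of the schema-plus-validation idea discussed above (the generate() call is a placeholder for whichever LLM API or local model you use):

              import json

              schema_prompt = (
                  "Generate 10 synthetic function calls as a JSON list. "
                  'Each item must have the form {"fn": [int, string, float]} '
                  "and the int must always be greater than 50. Return only JSON."
              )

              def validate(rows):
                  # Keep only rows that actually satisfy the schema and the rule on the int.
                  kept = []
                  for row in rows:
                      args = row.get("fn", [])
                      if (
                          len(args) == 3
                          and isinstance(args[0], int) and args[0] > 50
                          and isinstance(args[1], str)
                          and isinstance(args[2], float)
                      ):
                          kept.append(row)
                  return kept

              raw = generate(schema_prompt)  # placeholder: call your LLM of choice here
              rows = validate(json.loads(raw))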

    2. Hi,
      I am interested in this. Just a quick question – are your fine-tuning scripts compatible with Axolotl and DeepSpeed? Because that’s generally what everyone has been using these days.

      1. I’m using the basic transformers library (and unsupervised and supervised fine-tuning have options to speed up with Unsloth). I think adding DeepSpeed is a small step, but it’s not there. Axolotl isn’t there either. I’ll add them to my research list.

        1. So primarily the reason I was asking for DeepSpeed and Axolotl is compatibility with Kubernetes.
          Most of us have Kubernetes clusters *already*, and it becomes table stakes. Asking for single machines for training becomes one of those problematic things in most teams that are already on k8s.

    3. What does Full Repo access contain?
      Does it contain all the scripts?
      What about datasets, if those are required?

      Apart from that, excellent work by you.

    4. Hi,
      I hope that my message finds you well.
      Your videos are very informative and thank you for sharing your knowledge.
      I would like to ask if there is any student discount for buying this repository; it is expensive for students like me.

      Best regards,
      Manpreet

    5. Hi,
      first basic question – should we buy the function calling dataset separately (https://huggingface.co/datasets/Trelis/function_calling_v3), or is it included in this scripts package?

      Second genuine question (not trolling) – how would you compare your scripts to something like this: https://twitter.com/NexusflowX/status/1732041385455624256 ?

      Why I’m asking this question – my interest is to fine-tune for my use case, for my functions (and possibly perform on par with GPT-4).

      1. Howdy! Appreciate the questions.

        Yes, the function calling dataset is separate from any of the repos (e.g. ADVANCED-fine-tuning or ADVANCED-inference).

        When comparing to Nexus Raven, I suppose you mean comparing with some of the Trelis function calling models (as opposed to the scripts)? I haven’t dug in on Nexus Raven, but my guess is that the OpenChat function calling model is in the ballpark; it’s very strong – see the function calling video. The Trelis function calling models are easy to use with runpod one-click templates and the ADVANCED-inference repo, and the YouTube video can be a helpful guide. Probably Nexus Raven has a different set of compatibilities/toolboxes that make it easy to use there.

        Re your use case: I would say start and see what performance you get with open-source models. Then try out OpenChat function calling if you want to push performance. Lastly, try fine-tuning with a dataset like Trelis function calling and maybe the associated training scripts (or one you build yourself).

    6. Hello,
      I want to purchase the supervised repo – does it contain the Q&A data generation script?
      I am a student, and the cost is a bit expensive for me. If there is a student discount, where can I apply it?

      1. Yes, this contains the Q&A data generation script. If you are a student, you can use the STUDENT discount code. After purchasing, please respond to your receipt with a link to your LinkedIn page so I can confirm student status. Thanks.

    7. Hello Ronan Trelis,

      I came across your website offering scripts for AI training, and I’m reaching out for your guidance in navigating through the scripts you have developed. My objective is to train a Large Language Model (LLM) on a specialized topic, specifically “dog training,” in a language that is not well-supported by smaller models and even poorly by larger ones, which in my case is Hungarian. I possess digitized books for this purpose but require assistance in preparing the text, converting it into a chat format, and ultimately teaching it to the model. My aim is to automate this process as much as possible, which is why I am interested in subscribing to your scripts.

      Could you please recommend which scripts I should purchase to make this project feasible? I would appreciate it if you could respond via email, but I will also keep an eye on your website for a reply.

      Thank you for your assistance.

      Best regards,
      Győző

      1. The best option is the ADVANCED-fine-tuning repo. I suggest doing a combination of unsupervised fine-tuning (with the books as input data) followed by some supervised fine-tuning on Q&A pairs that you can generate synthetically using the repo guidance.
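        A sketch of the first (unsupervised) stage under those suggestions, assuming the Hugging Face datasets library; the file name and chunk length are arbitrary placeholders, not the repo’s script:

        from datasets import Dataset

        with open("dog_training_books_hu.txt", encoding="utf-8") as f:
            text = f.read()

        chunk_len = 2000  # characters per training chunk; tune to your model's context length
        chunks = [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]

        train_dataset = Dataset.from_dict({"text": chunks})  # raw text for unsupervised fine-tuning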

    8. I’ve developed a streamlined backend API for managing tasks, where some actions depend on others.

      Example routes:

      /workspace/{workspace_name}
      /workspace/{workspace_name}/stack/{stack_name}/{action}
      /workspace/{workspace_name}/stack/{stack_name}/layer/{layer_name}/{action}

      Example Request: create stack1 in workspace1

      I need to create the workspace first, and then create the stack.

      What LLM is best for this? And does advanced repo access give me access to the associated training scripts?

      1. This looks like a case of structured fine-tuning, so you would follow the tutorials/YouTube video for that. See trelis.com/function-calling for options on buying the script, OR the advanced fine-tuning repo, which has that script and others.
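        Purely as an illustration of the idea (the JSON layout below is an assumption, not the repo’s dataset format; follow the repo’s docs for the real schema), one training example for the workspace/stack case might pair a request with the ordered sequence of routes to call:

        example = {
            "prompt": "Create stack1 in workspace1",
            "response": (
                '[{"method": "POST", "route": "/workspace/workspace1"}, '
                '{"method": "POST", "route": "/workspace/workspace1/stack/stack1/create"}]'
            ),
        }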
