
Enterprise Server, API and Inference Guide


  • Deploy a language model to a GPU server (e.g. AWS EC2).
  • Deploy a Llama 70B API with one click.
  • Deploy a long-context (100k+) Yi API with one click.
  • Includes prompt formats and guidance for Llama 2, Mistral, Yi, and DeepSeek models.
  • Run inference with function-calling models.
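To illustrate the prompt-format differences the guide covers, here is a minimal sketch of the Llama 2 chat and Mistral Instruct templates (the helper names are mine, not from the repo; consult each model card for the authoritative template):

```python
def llama2_prompt(user_msg: str, system_msg: str = "") -> str:
    # Llama 2 chat format: an optional <<SYS>> block sits inside the first [INST]
    sys_block = f"<<SYS>>\n{system_msg}\n<</SYS>>\n\n" if system_msg else ""
    return f"<s>[INST] {sys_block}{user_msg} [/INST]"

def mistral_prompt(user_msg: str) -> str:
    # Mistral Instruct uses the same [INST] markers but has no system block
    return f"<s>[INST] {user_msg} [/INST]"

print(llama2_prompt("What is AWQ?", "You are a helpful assistant."))
```

Sending a prompt in the wrong template usually still produces output, just noticeably worse output, which is why the repo documents the format per model family.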

Video Tutorials

Deploy Llama 2 on an EC2 server

Run Llama 2 with 32k context length

Deploy a Llama 2 70B API in five clicks with AWQ
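Once deployed, an API like the 70B AWQ endpoint above is queried over HTTP. A minimal sketch using only the standard library, assuming an OpenAI-compatible `/v1/completions` route (the URL, route, and model name here are assumptions for illustration, not taken from the guide):

```python
import json
from urllib import request

def build_completion_request(base_url: str, prompt: str, max_tokens: int = 256) -> request.Request:
    # Build (but do not send) a POST request for an OpenAI-style completions route.
    payload = {"model": "llama-2-70b-awq", "prompt": prompt, "max_tokens": max_tokens}
    return request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("http://localhost:8000", "Hello")
# response = request.urlopen(req)  # uncomment to call a live endpoint
```

Swap in your server's address and model name; servers such as vLLM and TGI expose routes of this general shape, but check your deployment's docs for the exact path.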

4 thoughts on “Enterprise Server, API and Inference Guide”

  1. Yes, if you are running an EC2 server on AWS that has a GPU, this repo provides install instructions.

     Separately, the repo explains how you can use RunPod (which is cheaper per hour, unless you have lots of free AWS credits).

  2. Hello Ronan,

     Hope you are doing well. Quick question: if I were to purchase all of your repos, would you be willing to offer a bundle/better price? I believe your work will expedite our research.

