MatX designs hardware tailored for the world’s best AI models: We dedicate every transistor to maximizing performance for large models.
Other products put large models and small models on equal footing; MatX makes no such compromises. For the world’s largest models, we deliver 10× more computing power, enabling AI labs to make models an order of magnitude smarter and more useful.
Our product
We focus on cost efficiency for high-volume pretraining and production inference for large models. This means:
- We’ll support training and inference. Inference first.
- We optimize for performance-per-dollar first, and for latency second.
- We’ll offer the best performance-per-dollar by far.
- We’ll provide competitive latency, e.g. <10ms/token for 70B-class models.
- Our target workloads, where we expect to achieve peak performance:
  - Transformer-based models with at least 7B (ideally 20B+) activated parameters, including both dense and MoE models.
  - Thanks to an excellent interconnect, we can scale up to the largest (e.g. 10T-class) models.
  - For inference: peak performance requires at least thousands of simultaneous users and scales to many millions.
  - For training: peak performance requires at least 10^22 total training FLOPs (7B-class) and scales well up to very large runs, e.g. 10^29 total training FLOPs (10T-class); see the rough compute estimate after this list.
- We offer excellent scale-out performance, supporting clusters with hundreds of thousands of chips.
- We give you low-level control over the hardware; we know that expert users want that.
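For a rough sense of where those FLOP thresholds come from, the sketch below applies the common ~6 × parameters × tokens rule of thumb for dense-transformer training compute. The token counts are illustrative assumptions, not MatX figures.

```python
# Rough training-compute estimate using the common ~6 * N * D rule of thumb
# (N = activated parameters, D = training tokens). Token counts below are
# illustrative assumptions, not MatX figures.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a transformer: ~6 * N * D."""
    return 6.0 * params * tokens

# ~7B-class model trained on ~20 tokens per parameter (~140B tokens):
print(f"{training_flops(7e9, 1.4e11):.1e}")   # ~5.9e+21, i.e. roughly 10^22

# ~10T-class model trained on far more data, e.g. ~1.7e15 tokens:
print(f"{training_flops(1e13, 1.7e15):.1e}")  # ~1.0e+29
```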
What our product enables
With our hardware:
- The world’s best models will be available 3-5 years sooner.
- Individual researchers can train 7B-class models from scratch every day, and 70B-class models multiple times per month (see the throughput arithmetic after this list).
- Any seed-stage startup can afford to train a GPT-4-class model from scratch and serve it at ChatGPT levels of traffic.
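As a back-of-the-envelope check on the daily 7B-class claim, dividing the ~10^22-FLOP budget from the estimate above by one day gives the sustained training throughput such a run implies. This is our own arithmetic, not a MatX hardware specification.

```python
# Sustained throughput implied by finishing a ~1e22-FLOP (7B-class) run in one day.
# Back-of-the-envelope arithmetic only; not a MatX hardware specification.

flops_total = 1e22          # ~7B-class training run (see rule of thumb above)
seconds_per_day = 86_400

sustained_flops_per_sec = flops_total / seconds_per_day
print(f"{sustained_flops_per_sec:.1e} FLOP/s")  # ~1.2e+17 FLOP/s sustained
```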