docs(bench): update charts · huggingface/optimum-quanto@285862b

@@ -12,25 +12,25 @@ Note: the language modeling head (lm_head) of the tested models is not quantized
 The paragraphs below display results for some popular models on a NVIDIA A10 GPU.
-meta-llama/Meta-Llama-3-8B
+meta-llama/Meta-Llama-3.1-8B
-meta-llama/Meta-llama-3-8B Lambada prediction accuracy
+meta-llama/Meta-llama-3.1-8B Lambada prediction accuracy
-meta-llama/Meta-Llama-3-8B WikiText perplexity
+meta-llama/Meta-Llama-3.1-8B WikiText perplexity

@@ -39,21 +39,21 @@ The paragraphs below display results for some popular models on a NVIDIA A10 GPU
 mistralai/Mistral-7B-Instruct-v0.3 Lambada prediction accuracy
 mistralai/Mistral-7B-Instruct-v0.3 WikiText perplexity
 mistralai/Mistral-7B-Instruct-v0.3 Latency

@@ -62,67 +62,21 @@ The paragraphs below display results for some popular models on a NVIDIA A10 GPU
 google-gemma-2b Lambada prediction accuracy
-EleutherAI-pythia-1b
-EleutherAI-pythia-1b Lambada prediction accuracy
-EleutherAI-pythia-1b WikiText perplexity
-princeton-nlp/Sheared-LLaMA-1.3B
-princeton-nlp/Sheared-LLaMA-1.3B Lambada prediction accuracy
-princeton-nlp/Sheared-LLaMA-1.3B WikiText perplexity
-princeton-nlp/Sheared-LLaMA-1.3B Latency