GitHub - at-aaims/forge

FORGE: Pre-training Open Foundation Models for Science

Contributions

FORGE models

| Model     | #Params | #Tokens | Link     |
|-----------|---------|---------|----------|
| Forge-bio | 1.44B   | 38B     | download |
| Forge-che | 1.44B   | 41B     | download |
| Forge-eng | 1.44B   | 29B     | download |
| Forge-mat | 1.44B   | 15B     | download |
| Forge-phy | 1.44B   | 32B     | download |
| Forge-soc | 1.44B   | 90B     | download |
| Forge-s1  | 1.44B   | 10B     | download |
| Forge-s2  | 1.44B   | 20B     | download |
| Forge-s3  | 1.44B   | 30B     | download |
| Forge-s4  | 1.44B   | 257B    | download |
| Forge-m1  | 13B     | 30B     | download |
| Forge-m2  | 13B     | 257B    | download |
| Forge-l   | 22.4B   | 257B    | download |
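
To compare checkpoints programmatically, the table can be mirrored as a small lookup. This is only a sketch: `FORGE_MODELS`, `tokens_per_param`, and `models_with_params` are illustrative names and are not part of the repository. The figures are copied from the table above.

```python
# Illustrative lookup built from the model table; not part of the FORGE repository.
# Each value is (parameters in billions, training tokens in billions).
FORGE_MODELS = {
    "Forge-bio": (1.44, 38),
    "Forge-che": (1.44, 41),
    "Forge-eng": (1.44, 29),
    "Forge-mat": (1.44, 15),
    "Forge-phy": (1.44, 32),
    "Forge-soc": (1.44, 90),
    "Forge-s1": (1.44, 10),
    "Forge-s2": (1.44, 20),
    "Forge-s3": (1.44, 30),
    "Forge-s4": (1.44, 257),
    "Forge-m1": (13, 30),
    "Forge-m2": (13, 257),
    "Forge-l": (22.4, 257),
}


def tokens_per_param(name):
    """Return the ratio of training tokens to parameters for a listed model."""
    params_b, tokens_b = FORGE_MODELS[name]
    return tokens_b / params_b


def models_with_params(params_b):
    """Return the names of all listed models with the given size (in billions)."""
    return [name for name, (p, _) in FORGE_MODELS.items() if p == params_b]
```

For example, `models_with_params(1.44)` returns the ten 1.44B checkpoints in table order, which makes it easy to iterate over the domain-specific and scaling variants.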

Data sources

Example usages

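from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Load a downloaded FORGE checkpoint and its tokenizer from a local path
model = GPTNeoXForCausalLM.from_pretrained("path_to_forge_model")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("path_to_forge_model")

# Tokenize the prompt into PyTorch tensors
prompt = "high entropy alloy applications include"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of up to 100 tokens (prompt included)
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.7, max_length=100)

# Decode the first (and only) generated sequence back to text
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)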

high entropy alloy applications include high strength steels, alloys, composites, as well some metal alloys. In recent years, there has been much interest the use of such materials for manufacturing parts, components, machinery. For example, automotive sector an increasing number applications. most widely used is steels.
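
The snippet above can be wrapped in a small reusable helper. The following is a minimal sketch, not code from the repository: `generate_text` is a hypothetical name, and it assumes `model` and `tokenizer` are the objects loaded with `from_pretrained` as shown earlier. Compared with the original call, it passes the tokenizer's full output (including the attention mask) to `generate`, bounds the continuation with `max_new_tokens` rather than the total length, and drops special tokens when decoding.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=80, temperature=0.7):
    """Sample one continuation of `prompt` and return it as a string.

    Hypothetical helper: `model` and `tokenizer` are assumed to be a
    GPTNeoXForCausalLM and a GPTNeoXTokenizerFast loaded beforehand.
    """
    # The tokenizer returns a mapping with input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt")

    # Unpack the mapping so generate() receives the attention mask as well
    gen_tokens = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
    )

    # Decode the first sequence, omitting special tokens such as end-of-text
    return tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)[0]
```

A call such as `generate_text(model, tokenizer, "high entropy alloy applications include")` would then return sampled text; because sampling is enabled, the output differs from run to run.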

Pre-processing

Training

Scientific downstream tasks

Raw performance data and plots

Reference

@INPROCEEDINGS{10.1145/3581784.3613215,
  author={Junqi Yin and Sajal Dash and Feiyi Wang and Mallikarjun Shankar},
  title={FORGE: Pre-training Open Foundation Models for Science}, 
  booktitle={SC23: International Conference for High Performance Computing, Networking, Storage and Analysis}, 
  year={2023},
  doi={10.1145/3581784.3613215}}