# GenerRNA

> GenerRNA is a generative pre-trained language model for de novo RNA design: a decoder-only Transformer (GPT-style) that generates novel RNA sequences without requiring structural input, functional labels, or sequence alignments. Published in PLOS ONE (2024). To our knowledge, it is the first application of a generative language model to RNA generation.

## Key facts

- Type: generative language model (decoder-only Transformer, GPT-style).
- Size: ~350 million parameters, 24 layers, model dimension 1280, 1024-token context window (~4000 nucleotides), BPE tokenizer with vocabulary size 1024.
- Training data: ~16 million RNA sequences (~17.4 billion nucleotides) derived from RNAcentral release 22, deduplicated with MMseqs2 at 80% sequence identity.
- Capabilities: zero-shot de novo RNA generation; fine-tuning for specific families or functions (e.g., RNA with high binding affinity to the proteins ELAVL1 and SRSF1).
- License: MIT (open source). Weights hosted on Hugging Face; code and docs on GitHub.

## Links

- Model and weights (Hugging Face): https://huggingface.co/pfnet/GenerRNA
- Code and documentation (GitHub): https://github.com/ekkkkki/GenerRNA
- Paper (PLOS ONE): https://doi.org/10.1371/journal.pone.0310814
- Preprint (bioRxiv): https://doi.org/10.1101/2024.02.01.578496
- PubMed: https://pubmed.ncbi.nlm.nih.gov/39352899/
- Project page: https://ekkkkki.github.io/GenerRNA/

## Citation

Zhao Y, Oono K, Takizawa H, Kotera M (2024) GenerRNA: A generative pre-trained language model for de novo RNA design. PLOS ONE 19(10): e0310814. https://doi.org/10.1371/journal.pone.0310814
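
## Worked arithmetic for the Key facts figures

The approximate figures listed under Key facts imply two ratios that are useful when deciding whether a sequence fits in the context window: the average number of nucleotides per BPE token, and the mean length of a training sequence. The sketch below derives both from the quoted numbers only. The function names are hypothetical helpers, not part of the GenerRNA code base, and the per-token ratio is an average: real BPE tokens vary in length, so the actual token count for a given sequence depends on the tokenizer.

```python
import math

# Approximate figures as quoted under "Key facts".
CONTEXT_TOKENS = 1024                # context window, in tokens
CONTEXT_NUCLEOTIDES = 4_000          # same window, in nucleotides (approximate)
TRAIN_SEQUENCES = 16_000_000         # number of training sequences (approximate)
TRAIN_NUCLEOTIDES = 17_400_000_000   # total training nucleotides (approximate)


def nucleotides_per_token(context_nt: float, context_tokens: int) -> float:
    """Average nucleotides covered by one token, implied by the window sizes."""
    return context_nt / context_tokens


def mean_sequence_length(total_nt: float, n_sequences: int) -> float:
    """Mean training-sequence length in nucleotides."""
    return total_nt / n_sequences


def approx_tokens(length_nt: float, nt_per_token: float) -> int:
    """Rough token count for a sequence, assuming the average ratio holds."""
    return math.ceil(length_nt / nt_per_token)


if __name__ == "__main__":
    ratio = nucleotides_per_token(CONTEXT_NUCLEOTIDES, CONTEXT_TOKENS)
    mean_len = mean_sequence_length(TRAIN_NUCLEOTIDES, TRAIN_SEQUENCES)
    print(f"nucleotides per token (average): {ratio:.2f}")
    print(f"mean training sequence length:   {mean_len:.1f} nt")
    print(f"tokens for a mean-length sequence: {approx_tokens(mean_len, ratio)}")
```

With the quoted figures this works out to roughly 3.9 nucleotides per token and a mean training sequence of about 1,088 nucleotides, or around 279 tokens, which is well inside the 1024-token window.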