---
id: "alpaca-to-chatml"
title: "Convert Alpaca to Conversation Format"
slug: "alpaca-to-chatml"
description: "Convert Alpaca format to ChatML Conversation format"
code: |
  -- Convert Alpaca format to Conversation format
  WITH
  source_view AS (
    SELECT * FROM train  -- Change 'train' to your desired view name here
  )
  SELECT
    [
      struct_pack(
        "from" := 'system',
        "value" := instruction
      ),
      struct_pack(
        "from" := 'user',
        "value" := input
      ),
      struct_pack(
        "from" := 'assistant',
        "value" := output
      )
    ] AS conversation
  FROM source_view
  WHERE instruction IS NOT NULL
    AND input IS NOT NULL
    AND output IS NOT NULL;
---

# Converting Alpaca to ChatML Conversation Format

```sql
-- Convert Alpaca format to Conversation format
WITH
source_view AS (
  SELECT * FROM train  -- Change 'train' to your desired view name here
)
SELECT
  [
    struct_pack(
      "from" := 'system',
      "value" := instruction
    ),
    struct_pack(
      "from" := 'user',
      "value" := input
    ),
    struct_pack(
      "from" := 'assistant',
      "value" := output
    )
  ] AS conversation
FROM source_view
WHERE instruction IS NOT NULL
  AND input IS NOT NULL
  AND output IS NOT NULL;
```

## Why?

Differences between Alpaca and ChatML Conversation Format:

1. **Alpaca Format**:
   - The Alpaca format usually has three columns: `instruction`, `input`, and `output`.

2. **ChatML Conversation Format**:
   - The ChatML Conversation format is a JSON format that contains a list of messages.
   - Each message has a `from` field, which can be either `system`, `user`, or `assistant`.
   - The `value` field contains the message content.
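
To make the difference between the two formats concrete, the same mapping can be sketched for a single record outside SQL. This is a minimal illustration in Python, not part of the original snippet: the function name `alpaca_to_conversation` and the sample row are hypothetical, and the `None` check is meant to mirror the query's `IS NOT NULL` filter.

```python
import json


def alpaca_to_conversation(row):
    """Map one Alpaca-style row to a list of from/value messages.

    Returns None when any of the three columns is missing, mirroring
    the WHERE ... IS NOT NULL filter in the SQL query.
    """
    fields = (row.get("instruction"), row.get("input"), row.get("output"))
    if any(field is None for field in fields):
        return None
    instruction, user_input, output = fields
    return [
        {"from": "system", "value": instruction},
        {"from": "user", "value": user_input},
        {"from": "assistant", "value": output},
    ]


# Hypothetical sample row for illustration; not taken from any dataset.
row = {
    "instruction": "Translate the sentence to French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}

conversation = alpaca_to_conversation(row)
print(json.dumps(conversation, indent=2))
```

Note that an empty string is not NULL, so a row whose `input` is `''` passes the filter and produces a `user` message with an empty `value`. If that is not wanted, an extra condition such as `input <> ''` could be added to the `WHERE` clause.
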
## Example

### `yahma/alpaca-cleaned`

You can run this query via the `sql_console` on the Hugging Face Hub [here](https://huggingface.co/datasets/yahma/alpaca-cleaned?sql_console=true&sql=WITH+%0Asource_view+AS+%28%0A++SELECT+*+FROM+train++--+Change+%27train%27+to+your+desired+view+name+here%0A%29%0ASELECT+%0A++%5B%0A++++struct_pack%28%0A++++++%22from%22+%3A%3D+%27system%27%2C%0A++++++%22value%22+%3A%3D+instruction%0A++++%29%2C%0A++++struct_pack%28%0A++++++%22from%22+%3A%3D+%27user%27%2C%0A++++++%22value%22+%3A%3D+input%0A++++%29%2C%0A++++struct_pack%28%0A++++++%22from%22+%3A%3D+%27assistant%27%2C%0A++++++%22value%22+%3A%3D+output%0A++++%29%0A++%5D+AS+conversation%0AFROM+source_view%0AWHERE+instruction+IS+NOT+NULL+%0A++AND+input+IS+NOT+NULL+%0A++AND+output+IS+NOT+NULL%3B).

![Alpaca to ChatML](./alpaca-to-conversation.png)