The library was designed with two strong goals in mind:

Be as easy and fast to use as possible:

We strongly limited the number of user-facing abstractions to learn, in fact, there are almost no abstractions,
    just three standard classes required to use each model: configuration,
    models, and a preprocessing class (tokenizer for NLP, image processor for vision, feature extractor for audio, and processor for multimodal inputs).