GPT-3 and images

Is it possible to make some parts of the GPT-3 neural network specialize in image classification? Are there any architectures that combine text and images in a single neural network, similar to how the human brain has distinct areas that specialize in different tasks?