The most impressive thing about AI systems these days is that machines can be taught to perform a wide variety of tasks, from translating speech in real time to accurately telling Chihuahuas apart from blueberry muffins. That process, however, still requires a good deal of hand-holding and data curation by the people who train them. The emerging self-supervised learning (SSL) method, which has already revolutionized natural language processing, may hold the key to giving AI some much-needed common sense. Facebook’s AI research division (FAIR) has now, for the first time, applied SSL to computer vision training.
“We have developed SEER (SElf-supERvised), a new billion-parameter self-supervised computer vision model that can learn from any random group of images on the internet, without the need for the careful curation and labeling that goes into most computer vision training today,” Facebook AI researchers wrote in a blog post on Thursday. In SEER’s case, Facebook showed it more than one billion random, unlabeled and uncurated public Instagram pictures.
Under a supervised learning scheme, Facebook AI chief scientist Yann LeCun told Engadget, “to recognize speech, you need to label the words that were pronounced; if you want to translate, you need parallel text. To recognize images, you need a label on each image.”
Unsupervised learning, on the other hand, is “the problem of trying to train the system to represent images in an appropriate way without the need to label images,” LeCun explained. One such method is joint embedding, in which the neural network is presented with a pair of nearly identical images: the original and a slightly modified, distorted copy. “You train the system so that whatever vectors are produced by those two elements should be as close to each other as possible,” LeCun said. “Then, the problem is to make sure that when the system is shown two different images, it produces different vectors, different ‘embeddings,’ as we call them. A very natural way to do this is to randomly pick millions of pairs of images that you know are different, run them through the network and hope for the best.” However, given the scale of the training data required, contrastive methods such as this tend to consume a great deal of resources and time.
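The joint-embedding idea LeCun describes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not SEER's actual training objective: the `embed` function is a hypothetical toy encoder (a fixed random linear projection), the "images" are short lists of random numbers, and the loss is a generic InfoNCE-style contrastive loss swapped in for illustration.

```python
import math
import random

def embed(image, weights):
    """Toy stand-in for an encoder: linear projection plus L2 normalization."""
    vec = [sum(w * x for w, x in zip(row, image)) for row in weights]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is close to the positive
    embedding and far from the negative embeddings."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

rng = random.Random(0)
dim_in, dim_out = 16, 4
# Hypothetical encoder weights; a real system would learn these.
weights = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]

original = [rng.gauss(0, 1) for _ in range(dim_in)]
distorted = [x + rng.gauss(0, 0.01) for x in original]  # slightly modified copy
others = [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(8)]

z_orig = embed(original, weights)
z_dist = embed(distorted, weights)
z_others = [embed(o, weights) for o in others]

# Pairing the original with its distorted copy should cost less than
# pairing it with an unrelated image.
loss_matched = contrastive_loss(z_orig, z_dist, z_others)
loss_mismatched = contrastive_loss(z_orig, z_others[0], z_others[1:] + [z_dist])
print(loss_matched < loss_mismatched)
```

Training would adjust the encoder to push the matched loss down across many such pairs. The need to compare each image against many "different" ones is what makes contrastive approaches expensive at scale.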
Applying the same SSL techniques used in NLP to computer vision poses additional challenges. As LeCun notes, semantic concepts in language are easily broken down into words and discrete phrases. “But for images, the algorithm must decide which pixel belongs to which concept. In addition, the same concept can vary widely between images, such as a cat seen in different poses or from different angles,” he wrote. “We need to look at a large number of images to grasp the variation around a single concept.”
For this training method to be effective, researchers need both an algorithm flexible enough to learn from a large number of unannotated images and a network complex enough to classify the data that algorithm produces. Facebook found the former in a recently published algorithm of its own, which works by “using online clustering to quickly group images with similar visual concepts and take advantage of their similarity,” making it six times faster than the previous state of the art, according to LeCun. The latter came in the form of RegNets, a family of convolutional networks that can scale to billions (if not trillions) of parameters while being optimized for the available computing resources.
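To give a rough sense of what "online clustering" means, here is a minimal sketch using plain online k-means. It is not Facebook's algorithm, only a generic stand-in; the 2-D points, the two blobs and the starting centroids are all hypothetical. Each incoming embedding is assigned to its nearest cluster center, and that center is nudged toward it immediately, with no need to store the whole data set or make repeated passes over it.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def online_kmeans(stream, centroids):
    """Assign each incoming point to a cluster, then move that cluster's
    centroid toward the point (running-mean update)."""
    centroids = [list(c) for c in centroids]
    counts = [0] * len(centroids)
    labels = []
    for point in stream:
        k = nearest(point, centroids)
        counts[k] += 1
        rate = 1.0 / counts[k]  # step size shrinks as the cluster grows
        centroids[k] = [c + rate * (p - c) for p, c in zip(point, centroids[k])]
        labels.append(k)
    return labels, centroids

rng = random.Random(0)
# Two hypothetical "visual concepts" as well-separated blobs of embeddings.
blob_a = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(50)]
blob_b = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(50)]
stream = blob_a + blob_b
rng.shuffle(stream)

labels, centroids = online_kmeans(stream, [[1.0, 1.0], [4.0, 4.0]])
print(labels.count(0), labels.count(1))
print(centroids)
```

Because every point is processed once and then discarded, this style of update suits very large, unlabeled image collections.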
The results of this new system are impressive. After pre-training on a billion random Instagram images, SEER outperforms the most advanced self-supervised systems on ImageNet, reaching 84.2% top-1 accuracy. Even when only 10% of the original data set is used to train it, SEER still achieves 77.9% accuracy. And when only 1% of the original data set is used, SEER still manages a top-1 accuracy of 60.5%.
In essence, this research shows that, as in NLP, unsupervised learning methods can be applied effectively to computer vision. With that added flexibility, Facebook and other social media platforms should be better equipped to handle prohibited content.
“What we would like to have, and already have to some extent but need to improve, is a universal image understanding system,” LeCun said. “So whenever you upload a photo or image on Facebook, the system will compute one of those embeddings, and from that we can tell you that this is a photo of a cat, or, you know, that it is terrorist propaganda.”
As with its other AI research, LeCun’s team is releasing both its research and SEER’s training library, called VISSL, under an open source license. If you are interested in giving the system a spin, additional documentation and the code are available from the project on GitHub.