Digg has rolled out major updates to its dupe detection technology and content submission process, mainly aimed at eliminating duplicate submissions on digg.com.
So what is this dupe detection technology, after all? Let's see. As you know, Digg.com gets its new stories from users. The most common duplicates are the same stories from the same site, submitted by different Digg users under different URLs. To analyze the types of duplicate stories being submitted, Digg's R&D team came up with a solution that identifies these duplicates using a document similarity algorithm. According to Digg, it has proven to be a reliable way of identifying identical content from the same source.
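Digg hasn't published the details of its algorithm yet (a tech blog post is promised later), but to give a feel for how document similarity can catch the same article submitted under different URLs, here's a minimal sketch of one common approach: breaking each document into word shingles and comparing them with Jaccard similarity. This is an illustration of the general idea, not Digg's actual implementation.

```python
# Minimal sketch of one common document-similarity approach (word shingles +
# Jaccard similarity). This is NOT Digg's actual algorithm, just an illustration.

def shingles(text, size=3):
    """Break a document into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def jaccard_similarity(doc_a, doc_b, shingle_size=3):
    """Ratio of shared shingles to total shingles (0.0 = unrelated, 1.0 = identical)."""
    a, b = shingles(doc_a, shingle_size), shingles(doc_b, shingle_size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    story_a = "Digg rolls out new duplicate detection for story submissions"
    story_b = "Digg rolls out new duplicate detection for all story submissions"
    # Higher score means the two submissions are more likely the same article.
    print(jaccard_similarity(story_a, story_b))
```

Two copies of the same article pasted from the same source would score close to 1.0 and could be flagged as likely duplicates above some chosen threshold.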
Another common type of duplicate is the same or similar story covered on different sites. These were harder to catch at submission time because the same content arrives from different sources, and because this enters more subjective territory, Digg focused on doing a better job of detecting dupes with similar descriptive information. With the help of Digg's improved search technology, released a couple of months back, Digg can now match stories with similar titles and descriptions with much higher accuracy than before.
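Again, Digg hasn't said how its search technology scores these matches, but the rough idea of comparing titles and descriptions can be sketched with Python's standard difflib. The dict shape used for a story here is a hypothetical stand-in, not Digg's actual data model.

```python
# Illustrative only: Digg uses its own (undocumented) search technology for this.
# This sketch just shows the general idea of scoring title/description similarity.
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Return a 0.0-1.0 similarity ratio for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_like_dupe(new_story, existing_story, threshold=0.75):
    """Flag a likely duplicate when both title and description are close.

    `new_story` / `existing_story` are assumed to be dicts with 'title' and
    'description' keys -- a hypothetical shape for this example.
    """
    title_score = text_similarity(new_story["title"], existing_story["title"])
    desc_score = text_similarity(new_story["description"], existing_story["description"])
    return (title_score + desc_score) / 2 >= threshold
```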
So with the combination of its dupe detection and search technology, the site can now surface more comprehensive and unique content from users. To do this, Digg made changes to its submission process.
While in pilot, the new dupe detection system will continue to block only submissions of the exact same URL within a 30-day period. Digg is also monitoring when certain Diggers choose to bypass high-confidence duplicates and will use that data to keep improving the process going forward.
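The "block the exact same URL for 30 days" rule is simple enough to sketch. The storage and URL normalization details below are assumptions for illustration; Digg hasn't described how its submission pipeline actually implements this.

```python
# Sketch of an exact-URL block within a 30-day window, as described above.
# The in-memory store and normalization are assumptions, not Digg's pipeline.
from datetime import datetime, timedelta

RECENT_WINDOW = timedelta(days=30)

# hypothetical store: normalized URL -> time it was last submitted
recent_submissions = {}

def normalize(url):
    """Very rough normalization: lowercase and strip a trailing slash."""
    return url.strip().lower().rstrip("/")

def try_submit(url, now=None):
    """Reject a URL that was already submitted within the last 30 days."""
    now = now or datetime.utcnow()
    key = normalize(url)
    last_seen = recent_submissions.get(key)
    if last_seen and now - last_seen < RECENT_WINDOW:
        return False  # exact-URL duplicate within the window -> blocked
    recent_submissions[key] = now
    return True  # accepted; similarity-based dupe checks happen separately
```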
Digg says it will explain how its dupe detection technology works in a separate tech blog post later.