For the attention of each of us!
I dare say I’m not the only one who stumbles upon a duplicate post from time to time. I don’t know how others deal with this, if at all, but I for one always report them to either Hunter or Monk for resolution, regardless of who made the posts (myself included).
As the site grows it’s inevitable that this will happen, but a simple search or two can tell us whether an item is already here or not. I can understand it when an item originally appeared in a bundle and may not be individually searchable; one can easily be led to believe that it hasn’t been posted.
Even then, when we’re filling in the submission boxes and asked for links, is it not easy enough to double-check whether it was part of a larger bundle at that point?
All I can suggest is that we all take the time to do a proper search, thus avoiding any bad blood further down the line when you suddenly realise that no-one is downloading what you shared because it’s no longer here! No-one likes to waste their time.
Please use the original title of the goods in question, copy/pasted directly from the source where possible; this will make items more easily searchable and help avoid duplicate posts caused by abbreviations or spelling mistakes.
I follow a simple process when I’m browsing the blog that I respectfully suggest everyone should consider. I always skip ahead a number of pages into the blog, and keep going until I see a page that I recognize from a previous visit. I don’t open posts yet, and certainly don’t download anything. Then, I start reading backwards towards the first page and open the first (and only the first) post of any interesting item that I encounter. Of course, when I say first, what I really mean is earliest, because I’m reading backwards.
Even if we assume that the vast majority of duplicates are caused by carelessness, not by a deliberate attempt to usurp someone else’s post, this approach still provides an incentive to avoid duplicates. It takes time to put a post together. If people check for duplicates a bit more carefully before they create a post, they can avoid wasting their time on a post that no-one will read. This won’t stop duplication (there will always be honest mistakes), but it might help to reduce it.
I don’t want to get into a debate about whether we should have points on Zone; let’s just take it as read that we do. I would like to see duplication reduced, and anything that might help is worth thinking about.
@hunter, et al.
Hi all, duplicates are definitely a real problem with any organization’s data, so I’ll offer a simple IT solution I built at the company I work for. It just requires SQL and a scripting language that works on your installation.
When I cleaned data for a client’s WordPress site, we had a MySQL database with membership and product tables. A member could have entered the same product more than once, but we needed to find, for each product, the first person who entered it into the catalog, so we could award bonus credits to that first entrant. I think the same pseudocode logic can help solve the duplicates problem here at ZGFX.
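For illustration, the heart of that cleanup was a query along these lines. The table and column names here (entries, product_id, member_id, entered_at) are placeholders, not the client’s actual schema, and ties on the timestamp would need a tie-breaker:

```php
<?php
// Sketch: for each product, find the member who entered it first.
// All names below are placeholder assumptions, not a real schema.
$db = new PDO('mysql:host=localhost;dbname=catalog', 'user', 'pass',
              [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$firsts = $db->query("
    SELECT e.product_id, e.member_id, e.entered_at
    FROM   entries e
    JOIN  (SELECT product_id, MIN(entered_at) AS first_at
           FROM   entries
           GROUP  BY product_id) f
      ON  f.product_id = e.product_id
     AND  f.first_at   = e.entered_at
")->fetchAll(PDO::FETCH_ASSOC);

foreach ($firsts as $row) {
    // $row identifies the first entrant: award the bonus credits here.
    printf("product %d: first entered by member %d\n",
           $row['product_id'], $row['member_id']);
}
```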
Step 1. Use SQL to list every post ID, entry date, and original source hyperlink where the same source link appears more than once. I know I said ‘where’, but the query will actually use GROUP BY and HAVING clauses to find the duplicated links. ORDER BY the entry date, ascending, then dump the list to a delimited text file. (A consolidated sketch of all five steps follows Step 5.)
Step 2. Strip the list of the first instance of each post’s original link on Daz/Rendo/wherever. The first instance of a hyperlink is the first post that made it into the system. You can automate this with your scripting language of choice; I use Perl since it is the bomb at text handling. Once you remove each first instance, you are left with only the duplicate records posted after the original.
Step 3. Collect the remaining post IDs, again with your scripting language of choice.
Step 4. Build a text file of the collected IDs, arranged comma-separated and encapsulated by ‘()’. You see where I am going next, right? 😉
Step 5. Build the DELETE SQL that removes the collected IDs, i.e. a DELETE with a WHERE … IN clause fed by the list from Step 4.
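To make the steps concrete, here is a minimal end-to-end sketch, written in PHP since that is what the blog itself runs on. The table and column names (posts, id, entry_date, source_link) are placeholders to adapt to the real schema, and the text-file hand-offs between steps are folded into in-memory arrays for brevity:

```php
<?php
// Minimal sketch of Steps 1-5 in one script; names are placeholders.
$db = new PDO('mysql:host=localhost;dbname=zgfx', 'user', 'pass',
              [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// Step 1: every post whose source link appears more than once,
// earliest entry first (GROUP BY / HAVING find the duplicated links).
$rows = $db->query("
    SELECT p.id, p.entry_date, p.source_link
    FROM   posts p
    JOIN  (SELECT source_link
           FROM   posts
           GROUP  BY source_link
           HAVING COUNT(*) > 1) d
      ON  d.source_link = p.source_link
    ORDER  BY p.source_link, p.entry_date ASC
")->fetchAll(PDO::FETCH_ASSOC);

// Steps 2-3: skip the first (earliest) post for each link, since that is
// the original poster's entry, and collect the IDs of the later duplicates.
$seen = [];
$dupeIds = [];
foreach ($rows as $row) {
    if (isset($seen[$row['source_link']])) {
        $dupeIds[] = (int) $row['id'];      // a later duplicate
    } else {
        $seen[$row['source_link']] = true;  // the original: keep it
    }
}

// Steps 4-5: build the comma-separated '(...)' list and delete in one go.
// Casting each ID to int above keeps the interpolated SQL safe.
if ($dupeIds) {
    $db->exec('DELETE FROM posts WHERE id IN (' . implode(',', $dupeIds) . ')');
}
```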
Yes, this solution is long-winded, and you could do all of it in one gigantic, spaghetti-coded SQL call, but for maintainability the work is broken down into five easy chunks. These five steps can also be executed through a cronjob on your web server, so the cleanup runs once or twice a week on a schedule.
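For example, a crontab entry along these lines (the script path is hypothetical) would run the cleanup at 03:00 every Monday and Thursday:

```
0 3 * * 1,4 /usr/bin/php /path/to/dedupe_posts.php
```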
If you want to implement something like this and need assistance, do not hesitate to PM me. It is a simple process to wrap code around…
I noticed a very obvious dupe on the blog today, and the approval page also has several of them on it. It almost seems like the uploaders don’t even care, and even worse, the dupes are getting approved.
Right now there are two Genevieve 7 Pro Bundles approved almost next to each other on the blog. I have no clue how the approval process works, and I am very grateful for the work that goes into this page, but that one should have been easy to spot.
I have a system to eliminate dupes from my own collection, but it runs with a one-to-two-day delay. On other sites that rarely mattered, as the items don’t cost anything, but here they do, so it is annoying to spend points on duplicates.
As a band-aid fix for the duplicate problem going forward:
Construct some PHP that takes the original source link of the product and executes SQL to query the database. If the source link already exists there, the Add Post page code stops the post from being added to the ZGFX blog database and gives the poster a nice ‘Sorry, product already exists’ message.
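A minimal sketch of that guard, assuming the same placeholder posts table with a source_link column (the form field name is made up too):

```php
<?php
// Hypothetical duplicate guard for the Add Post page.
$db = new PDO('mysql:host=localhost;dbname=zgfx', 'user', 'pass',
              [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$link = trim($_POST['source_link'] ?? '');

$stmt = $db->prepare('SELECT 1 FROM posts WHERE source_link = ? LIMIT 1');
$stmt->execute([$link]);

if ($stmt->fetchColumn()) {
    // Duplicate: refuse the submission before anything is written.
    exit('Sorry, product already exists.');
}
// Otherwise fall through to the normal insert code.
```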