Defining AnTuTu Benchmarks – Samsung Faking Scores

Nikolas Nikolaou, September 2, 2013 (Mobile Devices, Tech News)

For those of you who do not know, AnTuTu is a benchmark-scoring app used by tech journalists and analysts alike to compare smartphones and tablets on the market. Have you ever seen blog posts with screenshots of device scores on a black background? If so, then you’ve seen the AnTuTu user interface. The app is free from the Google Play Store, but it is not offered for iOS in Apple’s App Store. In other words, AnTuTu only scores Android devices; no iPhones or iPads are included.

The goal of AnTuTu is to run diagnostic tests on your smartphone or tablet and provide an overall score for its performance. That is, you receive a score, a number, that you can then compare and contrast with the scores of other Android devices.

Let’s say that you run the AnTuTu benchmark app on your phone and come up with a score of 10,000 (as I did on my Verizon Galaxy S3). Once you get your score, you can view a list that shows the performance of other devices. I can look at the chart and compare my GS3 with, say, the GS4 or the Xperia Z. Since I’ve done this kind of testing before, I can tell you that the Xperia Z tends to come out with a score of 19,000 or more, making it a rather high-scoring device, which fits with its arrival on the market just a few months ago. The GS3, on the other hand, is a handset with a dual-core processor (not the quad-core processor of the Xperia Z) and does not have the juice of the Xperia Z. I wouldn’t expect the GS3’s score to beat the Xperia Z’s.

Keep in mind that the score is just a number, and the app offers no guide telling you that scores over 12,000 are impressive, for example, or that scores under 10,000 mean that the device is too slow.
Many consumers download these benchmark-scoring apps to see what they’re all about, then shrug and say, “I have no clue as to what these numbers mean.” What do the numbers mean anyway? Well, they’re comparative: when stacked up against other numbers, your device performs either better or worse than other devices. How, then, can you compare your device to others on the market if you cannot interpret the meaning of the numeric score?

Interpreting Your Numeric Score from AnTuTu

How can you interpret your score from AnTuTu? The secret to interpretation is to find a point of reference against which you can measure all other devices. Some say you don’t need a single point of reference, and that you can still compare and contrast devices without one. That is true; at the same time, however, you cannot understand how well your device performs if you do not know the highest score that current devices can reach.

One good example of this pertains to vehicles and miles per gallon (MPG). If your vehicle gets 20 miles to the gallon, you may feel good about it and think, “My vehicle’s reliable.” However, what would you say if you discovered that the best vehicles on the market (within your preferred price range) get 40 MPG? Your car may be reliable, but it looks average (and more like a gas guzzler) in light of this additional information. Without the additional information, you may feel confident about your vehicle, but you may be alone in your confidence. Knowing the best available MPG helps you to know whether other consumers share your confidence in your vehicle. It is no different with restaurants or movies, which is why reviews have become a huge deal to consumers on their smartphones.
When Apple introduced iOS 6, it included Rotten Tomatoes reviews of movies and Yelp reviews of restaurants so that you could decide whether to see a movie or visit a restaurant based on what the majority of other people think of it. The Google Play Store does the same thing: when you prepare to download an app, you can read numerous consumer reviews of it, with the most recently submitted reviews presented first. This means that you are not stuck with the guy who complained about the app crashing 2 years ago; the app may be more stable now and deserve better reviews than his, and you get to read the newest reviews first. Often, when an app receives terrible ratings, its developer will respond and tell unhappy users to get in touch for help with getting the app to work. The developer’s willingness to cooperate is based not on general kindness but on the desire to see the app receive excellent reviews. If a developer appears not to care about an individual’s complaint, the app will receive terrible reviews and no one will purchase it.

In short, reviews provide a general consensus about things such as vehicles, movies, and restaurants, allowing us to understand something’s reputation among the general public. Therefore, it doesn’t make sense to run AnTuTu’s diagnostic tests, receive a number, and have no way to interpret it. You can only interpret your AnTuTu score properly when you have some other device’s score with which to compare your own. The next question you are likely to ask is, “How do I find the optimum numeric score?” The answer is coming up next.

How to Find the Top Numeric Score in AnTuTu

Once you run your diagnostic tests in AnTuTu, you will receive a score on one page, with other pages to glance at afterwards. If you look at the bottom of the app, you will see a list of categories: Test, Chart, Info, and Forum.
The Test section finds the numeric score for your phone or tablet’s performance; the Chart section shows you how your device stacks up against others currently on the market. Info provides information on your phone and/or carrier if your phone is activated, and the Forum allows you to post comments and ask questions. The Forum feature, however, is still listed as “coming soon,” so the app is still a work in progress. In any case, the Chart section will provide you with the device comparisons you need to see how well your device performs when stacked up against other popular devices.

To find out how your score stacks up against other phones you may be interested in, simply tap one of the phones listed. The list is not exhaustive, and AnTuTu has far more phones than tablets available for benchmark comparisons, a gap that the developer of the AnTuTu app needs to close. The Nexus 7, Nexus 10, and Galaxy Tab 7.0 seem to be the only three tablets added into the mix of smartphones. This year, Google announced its 2013 Nexus 7 and Samsung announced the Galaxy Tab 3 7.0, 8.0, and 10.1. Sony announced its Xperia Tablet Z (not to be confused with the Xperia Z smartphone), and looks to announce its Xperia Z Ultra phone/tablet on September 4th. These devices should be factored into the equation as well, seeing that consumers will buy them and tech enthusiasts will want an idea of how well their device performs (on average).

Not only is the selection of devices small, but there are no benchmark scores for iPhones and iPads running iOS. While I realize that AnTuTu may be an Android tech geek’s app, I still think that AnTuTu can make the effort, particularly when you consider that Apple recently approved the addition of Geekbench 3 to its App Store. Geekbench is another benchmark-scoring app that I will cover soon.
Back to finding the top numeric score: if you look at the top of the Chart section, you will see the highest-scoring devices currently listed, with the GS4 and the HTC One at the top. If you touch the “compare” box to the right of a phone’s name, you can see how your score looks against these top phones. Do you see the numbers and the green light above the names? These are category designations, and the top designation consists of the phones that score from 20,000 to 40,000 points on AnTuTu. In this category, the HTC One and Galaxy S4 sit alone. Beneath the top category is the 15,000-20,000-point category, which is where most of the phones on the market currently sit. The Galaxy S4’s AnTuTu benchmark score comes in at 25,000-28,000, while the HTC One’s score runs from 23,000 and up. In AnTuTu’s scoring, then, the Galaxy S4 reigns supreme. Thus, when you compare your Nexus 4’s AnTuTu score with that of the GS4, you will find that the Nexus 4 performs decently but is quite a ways behind. The GS4 has the potential to do far more than the 2012 Nexus 4 can. This makes sense when you consider that Google’s Nexus 4 is almost twelve months old.

AnTuTu’s Current Benchmark Scores

While the GS4 and the HTC One reign supreme on AnTuTu, the application is now a bit behind in its smartphone stats. After all, LG Electronics just announced its LG G2, and this phone (along with Samsung’s GS4 variants and Google’s Moto X) has not yet been included in AnTuTu’s current Android smartphone comparisons. AnTuTu needs to update frequently, seeing that new smartphones emerge every few months. This is one of the complaints I have with the app, although I love the interface and it seems easy enough to use.
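To make the chart categories concrete, here is a minimal Python sketch of how a score could be placed into the two categories described in this section and expressed as a share of a reference score. The function names are hypothetical and are not part of AnTuTu; only the category ranges and the example scores come from this post, and counting a score of exactly 20,000 as top-category is my own assumption.

```python
# Hypothetical helpers for reading an AnTuTu score; not part of the AnTuTu app.
# The two ranges below are the chart categories described in this post.

def category_for(score: int) -> str:
    """Return the chart category a score falls into."""
    if 20_000 <= score <= 40_000:
        # Assumption: a score of exactly 20,000 counts as top category.
        return "20,000-40,000"
    if 15_000 <= score < 20_000:
        return "15,000-20,000"
    return "outside the two categories described"


def percent_of_reference(score: int, reference: int) -> float:
    """Express a score as a percentage of a chosen reference score."""
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return round(100 * score / reference, 1)


if __name__ == "__main__":
    gs3_score = 10_000      # my Verizon Galaxy S3
    gs4_low_end = 25_000    # low end of the GS4 range quoted in this post
    print(category_for(gs4_low_end))
    print(category_for(gs3_score))
    print(percent_of_reference(gs3_score, gs4_low_end))
```

Using the low end of the GS4’s range as the reference, my 10,000-point GS3 comes out at 40 percent of the top phone, which says far more than the raw number does on its own.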
What many consumers may not know about the LG G2 is that its AnTuTu benchmark score is remarkable: a device labeled the “LG-F320” has leaked in AnTuTu benchmark results with a score of 32,002 points. The LG-F320 appeared on July 1st, and the LG G2 was unveiled earlier this month. Since the LG G2 is the successor to the LG Optimus G, we believe that the LG-F320 in the benchmark results is the G2. This makes sense when you consider that the LG G2 is the first smartphone to arrive on the market with Qualcomm’s 2.3GHz Snapdragon 800 processor (and the LG-F320 clocked in with a Snapdragon 800 processor on AnTuTu), and that its score is roughly 4,000 to 5,000 points above that of Samsung’s GS4, which clocks in in the 27,000-28,000-point range. Depending on your network and so on, your GS4 may clock in closer to 30,000 points.

In AnTuTu’s benchmarks of other phones, we’ve also been able to see how Sony’s upcoming Xperia Z Ultra performs, and the new Sony phablet (6.4 inches, to be exact) scores a whopping 34,758 on AnTuTu. The new LTE-Advanced variant of Samsung’s acclaimed Galaxy S4 scored an impressive 31,491, which means that these new devices will all topple Samsung’s original GS4 as well as HTC’s One. If you thought your 25,000-point GS4 was amazing on AnTuTu, these scores will make you feel “human” again.

So far, I’ve covered what AnTuTu is, how it can help you with a tech gadget purchase, and how to find the point of reference that will help you compare your device’s score with other devices. I will cover the attack made against Samsung, and how AnTuTu fits into all of this, in my next post. Stay tuned…

AnTuTu ties in to Samsung and the recent accusation made against the Korean manufacturer. It may be obvious to you (or not), but I am a huge fan of Samsung products. I own a Galaxy S3 as well as a new waterproof Galaxy S4 Active.
I do not care as much for Samsung’s plastic backplates as some may, but I like the look and feel of Samsung’s products and appreciate why the company relies on its plastic finish so much. It’s interesting that, for all the complaints Apple fans have made about plastic, they’re sure interested in Apple’s new plastic iPhone 5C. A little odd, don’t ya think?

Samsung’s Intentional Error

Anyway, back to the issue at hand. Samsung has been accused in recent weeks of forging its benchmark scores on AnTuTu in order to push the company’s smartphones above the competition. The situation is complex, and each side has something to offer regarding the problem. First, I’ll start with the tech site that “discovered” the problem, AnandTech.

AnandTech tested the performance of the international Galaxy S4, which uses the Exynos 5 octa-core processor, with a few top-of-the-line benchmark apps. In a nutshell, AnandTech discovered that Samsung has reserved the highest CPU (central processing unit) and GPU (graphics processing unit) frequencies for certain apps; this means that you only get the higher frequencies when running certain benchmark applications. AnTuTu and Quadrant, two benchmark applications on Android, record higher scores overall because the GS4’s GPU runs at 532MHz in them; in other benchmark applications such as GFXBench 2.7 and Epic Citadel, the GPU is held to a lower 480MHz. As a result, you will record different scores depending on which benchmark application you use.

What makes the situation even worse for Samsung is that AnandTech found not only that different benchmark apps run at high or low frequencies, but also that these differences were written into the code as commands, with the words “process_changed” following the specific apps (AnTuTu, Quadrant, Linpack, etc.).
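As a rough illustration only, the behavior AnandTech describes amounts to a lookup keyed on the running app. The Python sketch below is not Samsung’s code: the function and constant names are hypothetical, and it identifies apps by display name rather than by whatever identifiers the real system uses. The app names and the 532MHz and 480MHz figures are the ones reported in this post.

```python
# Illustrative model of a per-app GPU frequency cap; NOT Samsung's actual code.
# All names here are hypothetical. Frequencies are the ones AnandTech reported.

BOOSTED_MAX_MHZ = 532   # cap seen while the listed benchmark apps run
DEFAULT_MAX_MHZ = 480   # cap seen in GFXBench 2.7 and Epic Citadel

# Apps named in this post as receiving the higher cap.
BOOSTED_APPS = frozenset({"AnTuTu", "Quadrant", "Linpack"})


def max_gpu_mhz(app_name: str) -> int:
    """Return the maximum GPU frequency allowed while app_name is running."""
    return BOOSTED_MAX_MHZ if app_name in BOOSTED_APPS else DEFAULT_MAX_MHZ


def headroom_percent() -> float:
    """Percentage by which the boosted cap exceeds the default cap."""
    return round(100 * (BOOSTED_MAX_MHZ - DEFAULT_MAX_MHZ) / DEFAULT_MAX_MHZ, 1)


if __name__ == "__main__":
    for app in ("AnTuTu", "Quadrant", "GFXBench 2.7", "Epic Citadel"):
        print(app, max_gpu_mhz(app))
    print(headroom_percent())
```

Under these figures the listed apps get a GPU ceiling about 10.8 percent higher than everything else, which is why the choice of benchmark app changes the score you record.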
Written code does not lie, so it seems as though Samsung really did treat these apps differently and intentionally built the maximum GPU clock speeds for them into the system. This was a deliberate action, and there is no way to explain it away. The question that many consumers would ask is “Why?” Why would Samsung do such a thing? AnandTech claims that Samsung did this “to produce repeatable (and high) results in CPU tests, and deliver the highest possible GPU performance benchmarks.” Klug and Shimpi, the authors of the AnandTech piece, said that this has been done with PCs in the past as a way to limit device overheating and possible electrical blowout during stress tests. I agree with AnandTech in this regard: this seems intentional to me from Anand’s work on the subject, and seeing the words “process_changed” doesn’t help the matter. I doubt that the written code here would lie about the process.

This still hasn’t stopped Samsung from responding to Klug and Shimpi’s claim that the company “doctored” the benchmark results. A short time after AnandTech’s claim, Samsung responded with the following, according to The Telegraph:

“The maximum GPU frequency is lowered to 480MHz for certain gaming apps that may cause an overload, when they are used for a prolonged period of time in full-screen mode. Meanwhile, a maximum GPU frequency of 533MHz is applicable for running apps that are usually used in full-screen mode, such as the S Browser, Gallery, Camera, Video Player, and certain benchmarking apps, which also demand substantial performance. The maximum GPU frequencies for the GALAXY S4 have been varied to provide optimal user experience for our customers, and were not intended to improve certain benchmark results.”

I quote the statement in full to show that, from Samsung’s perspective, maximum GPU frequencies were changed in order to prevent the device from overloading.
This does not mean that you cannot access the 533MHz GPU frequency when using your video player, however. I applaud Samsung for this response, but doesn’t it make more sense not to advertise the maximum frequency if most applications use the lower one of 480MHz? And, if the company advertises the maximum GPU frequency, should it not also provide a qualifying statement about the frequency used for different tasks on the GS4?

While the GPU and CPU frequencies were altered on the international GS4, The Telegraph also notes that Samsung tampered with the frequencies on the American version of the GS4, noting that “it [Anandtech] claimed that the US S4 model’s processor jumped to the highest performance mode and stayed there, regardless of actual usage.” If AnandTech’s claim is true (and the site is well known for its accurate claims and tests), then Samsung has a problem on its hands. While the company’s statement may address the international GS4 with its octa-core processor (Exynos 5), what explains the constant 450MHz GPU frequency recorded on the American version? If the international version has its GPU frequency restricted to prevent device overload, what about the quad-core processor of the American version? The truth is that the highest frequency reached by the American GS4 is around 450MHz, and AnandTech says that, in its tests, the American GS4 remained at 450MHz whether usage was high or low.

Unfortunately, this doesn’t bode well for Samsung. If Samsung claims that it is forced to limit the GPU frequency on international devices, why not limit those same tasks on American GS4 models? Why is it that the company had to limit the tasks on the international model, but maintains the highest frequencies on the American model? One could claim that the additional cores of the international model may drive the device to overload (while the American version does not).
At the same time, however, the American model can also overload, and a number of Americans have experienced a Galaxy phone explosion in the past. A friend of my cousin recalls that she was watching a movie and playing a game on her newly purchased device when it got too hot in her hands; she left it in her bathroom to cool off, and it exploded there.

At the end of the day, Samsung does not have an answer for why the company’s GS4 shows high benchmark scores in famous benchmark apps for certain activities and not others, at least once you count the fact that Samsung does not limit the maximum frequencies on the American GS4. Samsung could say that the American model does not reach 533MHz (but only 450MHz) and doesn’t need to be limited. If this is the case, however, it would do Samsung some good to acknowledge this publicly instead of waiting until a reputable site such as AnandTech comes along and exposes Samsung for what many perceive as intentional lying.

Whether Samsung is right or wrong here remains to be seen, but it is an issue worth discussing. Do you think that Samsung should limit the maximum frequency on its devices and still report the high score (even if most tasks on the smartphone or tablet will not reach that top frequency)?

You will now understand the problem with optimized scores in benchmark tests. When it comes to Samsung, at least, you can tell that the company uses its highest frequency in some tests and the lower frequency in others. This does seem dishonest, but we have to remember that Samsung claims the 533MHz GPU frequency on the international GS4 is attainable for certain tasks. The rub comes when you consider that the top GPU frequency is not available for all tasks.
Samsung’s so-called “forgery” of its benchmark results has led many in the consumer community to put a blanket claim over all benchmark scores and label them “useless.” It doesn’t take a rocket scientist to see this: just visit a few tech blogs and user forums with lots of comments on this Samsung issue, and you will see that there are those who think that benchmarks are unreliable and not worth much.

Where does AnTuTu come in, you ask? AnTuTu is one of the benchmark apps in question that runs at the 533MHz frequency on the international GS4; thus, when you run diagnostic tests on the international GS4 using AnTuTu, you will likely receive a score of 25,000+. However, benchmark apps such as AnTuTu show you the maximum potential of the device, which differs from the actual real-time results. With this said, the question on the table is the following: is 533MHz the maximum performance of the GS4’s GPU?

The answer is a tricky one, for you can only reach 533MHz in certain tasks without overloading the smartphone. Therefore, Samsung should qualify its statements about 533MHz in the future. The best way to do this is to say, “Note: 533MHz can only be reached when performing certain tasks; your performance is fixed at the lower GPU frequency of 480MHz in performing other tasks that could cause your device to overload.” This may help prevent a repeat of the AnandTech incident. Whether or not the current incident occurred because Samsung forged the GS4’s performance, Samsung should take a proper course of action moving forward so that this does not happen again. The next time it happens, tech writers and enthusiasts will respond more severely than before.

The Samsung forgery issue has brought a question about benchmark apps to the forefront: are benchmark apps useful? Are benchmarks beneficial?
Most benchmarks on the market show you the maximum potential of a device and, until recently, were the only kinds of evaluation apps available. Now, however, Geekbench 3 has come to the rescue.