Calculate Trending Topics and Sentiment Trends from Live Stream of Messages

The conceptual motive behind the project is to quantify the information that flows through Twitter by finding Trending Topics in the live Twitter stream. The hidden technical motive is to build a scalable system, and to face and solve the challenges encountered during its construction.

Building OSX presentation

Parts:

Guides:

TonyMacX86 has long been the go-to website for Hackintosh builds, and moderator ammulder does not disappoint with one of the most comprehensive guides I’ve ever seen. I based my build off of his instructions (which he continues to update/maintain) and I experienced zero issues. If you’re going to build a Hackintosh, this is as easy as it gets. Guide is found here.

 

How To Register Millions of Fake Accounts With Ease

iPhones Charging

 

Fake accounts are a bigger problem than ever. With so many new security measures, why are they still so prevalent? Recent studies show that approximately 10 percent of accounts on social media sites are fake [1,2]. Other reports are more drastic: Instagram’s crackdown on spam accounts in December of last year exposed 18.9 million followers (29 percent) of Instagram’s official account as fake [3].

Really, is it that easy to register so many fake accounts? Sounds too good to be true. The reality is that there are many “helper” tools that enable bad actors to evade traditional security measures. Free voicemail services like K7 and Laser Voicemail provide disposable numbers to bypass phone verification. Guerrilla Mail, Mailinator, and Fake Mail Generator are just a few of the providers of anonymous, temporary email addresses. Captcha solver services, many manned by human labor in Southeast Asia (see Figure 1), can cost as little as $0.50 per 1,000 images. Anonymous proxies, VPNs (e.g., HideMyAss, FilterBypass, ZenMate), and cloud hosting services allow traffic to appear to come from different locations, defeating blacklisting or IP-based rules.

Figure 1: Workers Distribution by Countries

To make it even easier for attackers, there is all-in-one account creation software that automates all of the above for you, such as the $2,500 (two PC license) deal from spamvilla.com, and there are “click farms” where fake accounts are registered manually and resold for different purposes [4]. Even dedicated hardware, i.e., jailbroken iPhones, has emerged in China. Each phone comes complete with not only account creation capabilities for multiple online services (WeChat, Momo, Bilin, iAround, Weju, and Moca), but also automated messaging scripts and IP changer software, for $550 – $700. The title image at the top of this post is a screenshot of the jailbroken iPhones being programmed by the seller.

Taobao ad for all-in-one “fraud” phones.

The table below summarizes the security solutions commonly used at online services, and the attack techniques to defeat them.

Security Solution & Attack Techniques Table

Why are fake accounts so attractive? The sophistication of online services today has opened up lucrative opportunities for criminals. As mentioned in our earlier blog post, many service features including social reputation, ad impressions, promotional/reward points, and in-game virtual items can be converted into real-world gains. If account creation software alone costs $2,500, the profit that can be milked out of the fake accounts must be many, many times greater – at the cost of the online service.
References
[1] Emil Protalinski. “Facebook estimates that between 5.5% and 11.2% of accounts are fake.” The Next Web 3 Feb 2014. http://thenextweb.com/facebook/2014/02/03/facebook-estimates-5-5-11-2-accounts-fake/
[2] Lara O’Reilly. “8% of Instagram accounts are fakes and 30% are inactive, study says.” Business Insider 2 Jul 2015. http://www.businessinsider.com/italian-security-researchers-find-8-percent-of-instagram-accounts-are-fake-2015-7
[3] Vindu Goel. “Millions of fake Instagram users disappear in purge.” The New York Times 18 Dec. 2014. http://bits.blogs.nytimes.com/2014/12/18/millions-of-fake-instagram-users-disappear-in-purge/
[4] Doug Bock Clark. “How click farms have inflated social media currency.” New Republic 20 Apr. 2015. http://www.newrepublic.com/article/121551/bot-bubble-click-farms-have-inflated-social-media-currency

CouchDB-Lucene with Tomcat

I had fun trying to get CouchDB-Lucene working with Tomcat Server (7.X). I wanted it running in Tomcat since I’m quite familiar with Tomcat’s care and feeding (i.e. install/configure/upgrade routines). It also helps that I’ve always had terrible luck getting any CLI utility, like the one provided by CouchDB-Lucene, running reliably as a service. I finally got it working; here are my notes from that endeavor.

Background


The way CouchDB-Lucene (I’ll call it CDBLucene here for brevity) works is that you configure your CouchDB instance to proxy certain HTTP requests, specifically requests for the _fti path, to the CDBLucene instance. CDBLucene is then responsible for returning the search results.

CDBLucene, of course, can’t do this alone. It has to be configured to know where the parent CouchDB instance is located and how to connect to it.

My Crude Illustration of the CouchDB-Lucene Request/Response Flow

Note that it’s entirely possible (and probably desirable in many situations) to query CDBLucene directly for the results and bypass the additional load and overhead on the CouchDB instance.

Alternate (Direct) Use of CDBLucene- Bypasses CouchDB HTTP Proxying

I wanted to have our primary CouchDB 1.6.1 instance run on a separate VM from the CDBLucene instance. Since they run as separate processes and communicate using HTTP, this should be technically possible.

To do so, we need to:

  • Configure CouchDB → CDBLucene: Configure CouchDB to forward requests to CDBLucene, and
  • Configure CDBLucene → CouchDB: Configure CDBLucene to know how to reach back to CouchDB to read the appropriate index

Again, note that the first item (proxying requests through CouchDB) is entirely optional. You can simply query CDBLucene directly, since in this setup CouchDB is nothing more than a glorified HTTP proxy for Lucene searches.

Assumptions

We had two Ubuntu VMs already set up and running in AWS. One had our existing installation of CouchDB, complete with an existing database and design view with a Lucene index already created. The second VM, which will host the CDBLucene instance, was freshly created and running Tomcat as an upstart service (installed courtesy of apt-get with the tomcat7 package).

Configuring CDBLucene → CouchDB

This is what gave me a minor headache. The CDBLucene documentation is non-existent, so I had to jump through the source a bit to figure out what’s going on. Here’s what I ended up doing:

From the machine you intend to host the CDBLucene instance, clone the GitHub repository.

git clone https://github.com/rnewson/couchdb-lucene.git

This will create a clone in a new couchdb-lucene directory. Once completed, we need to make some modifications since CDBLucene depends on an embedded couchdb-lucene.ini that gets rolled up into the war file. Edit the couchdb-lucene/src/main/resources/couchdb-lucene.ini file to reflect your configuration, making sure the url parameter points back to your CouchDB instance.

[lucene]
port=8080
host=[cdb_lucene-instance]
dir=lucene-indexes
timeout=10000
limit=25
allowLeadingWildcard=false

[couchdb-lucene]
url=http://[couchdb-instance]:5984

The [lucene] section refers to the CDBLucene instance, so include the host and port that you’ve configured Tomcat to use. Also note that the dir path uses [TOMCAT_HOME] as its base path, so you may want to change it or create the [TOMCAT_HOME]/lucene-indexes directory manually.

You now should build the package. Change into the couchdb-lucene directory and start a maven build including the war:war goal:

cd couchdb-lucene
mvn clean install war:war

This will build a deployable war file in the /target directory. Since the build names the war file with the full version by default, I renamed it so that it’s a bit more memorable when you go to deploy it:

mv target/couchdb-lucene-1.1.0-SNAPSHOT.war couchdb-lucene.war

Now you can deploy the war file to Tomcat by moving it to the [TOMCAT_HOME]/webapps directory. Assuming that you installed Tomcat using apt-get, this is as simple as:

mv couchdb-lucene.war /var/lib/tomcat7/webapps/

Tomcat should immediately detect the new file and hot-deploy it. Check the Tomcat instance to make sure the deployment occurred:

curl -L localhost:8080/couchdb-lucene

You should be able to connect but will receive a ‘bad_request’ reason:

{
	"reason":"bad_request",
	"code":400
}

The bad_request reason is expected since we didn’t provide any useful parameters. You should also check the couchdb-lucene logs to make sure the indexing was started correctly:

cat /var/log/tomcat7/couchdb-lucene.log

If successful, you should see an entry like:

Index output goes to: /var/lib/tomcat7/lucene-indexes

Great! Now back over to the CouchDB instance to finish the installation:

Configuring CouchDB → CDBLucene

This is actually pretty straightforward. As described in the CouchDB-Lucene docs, you’ll want to go into your CouchDB local.ini file (/etc/couchdb/local.ini on Ubuntu) and add an httpd_global_handlers entry that points to your CDBLucene Tomcat instance.

[httpd_global_handlers]

_fti = {couch_httpd_proxy, handle_proxy_req, <<"http://[TOMCAT_INSTANCE]/couchdb-lucene">>}

Once done, restart CouchDB and try to run a search:

curl -L http://[couchdb]:5984/_fti/[database]/_design/[view]/[search]?q=[search_query]

Note that the request may time out the first time because CDBLucene is building its index.

Summary

Conclusion: The primary trick to this is to realize that most of the settings you need to separate CDBLucene and CouchDB reside in the CDBLucene packaged war. The easiest way to make necessary configuration changes is to edit the /src/main/resources/couchdb-lucene.ini file prior to packaging with maven.

If you forget or need to change it later, you can do so post-deployment by going into the [TOMCAT_HOME]/webapps/couchdb-lucene/WEB-INF/classes folder and editing couchdb-lucene.ini there. Restart tomcat and you should be ready to go (just be careful since any redeployment will overwrite this file).

Medication Identifiers and How to Check for Potential Problems

Let’s take a look at the various identifiers in this post. We’ll also review how you can use that data to check for related warnings & alerts using the PillFill API services.

Prescription Structure

Let’s begin by looking at an example of a prescription collected by the PillFill RX aggregator (with some fields omitted):

{
  "rxNumber": "995575",
  "medicationName": "OMEPRAZOLE DR 40 MG CAPSULE",
  "pharmacyStoreId": "CVS #1183",
  "daysSupply": 30,
  "quantityRemaining": 30,
  "quantityPerDose": 1,
  "dosesPerDay": 1,
  "dispenseDate": "2014-04-30",
  "computedInactiveAfterDate": "2013-04-29",
  "previousDispenseDates": [
    "2013-04-29",
    "2013-03-01"
  ],
  "brandNames": [
    "Prilosec 40 MG Delayed Release Oral Capsule"
  ],
  "quantity": "30",
  "ndc": "00378522293",
  "rxNormId": "200329",
  "ndfrtNui": "N0000156769",
  "ndfrtName": "OMEPRAZOLE 40MG CAP,SA",
  "splId": "44260509-A91C-4906-BCC7-4EB5D3465DED",
  "uuid": "00153170CDEC5571782287505711C59EB63C"
}

The three most important identifiers included in the prescription data are:

  • The FDA’s National Drug Code (ndc)
  • The FDA’s Structured Product Label (splId)
  • NIH’s RxNorm ID (rxNormId)

We also make liberal use of:

  • The VA’s National Drug File Reference Terminology ID (NUI)
  • The FDA’s UNique Ingredient Identifier (UNII)

Identifier Relationships

The natural question to ask at this point is “Why so many identifiers?” Each identification system provides a different level of understanding of what the medication actually is, a question that can be surprisingly hard to answer at times. Let’s consider Ibuprofen as an example. We have to consider it from multiple perspectives:

  • Drug Ingredients: Which drugs are included / what are the active ingredients? (e.g. Ibuprofen, Acetaminophen & Caffeine)
  • Drug Concept: What’s the strength & form of the drug? (e.g. 200mg Ibuprofen pill, 500mg Acetaminophen pill)
  • Drug Package: How is the drug packaged or grouped together? Who manufactured it? (e.g. Equate 500ct Ibuprofen, Equate 200ct Ibuprofen 2-Pack, Equate 12ct Ibuprofen Convenience Pack)
  • Drug Product: Which one of the specific drug packages did the patient receive? (Equate 12ct Ibuprofen Convenience Pack)

Each of the above identifiers can help answer those questions:


 

A very general hierarchy for OTC Ibuprofen (slightly different for RX drugs)

  • Ingredient = UNique Ingredient Identifier (UNII)
  • Concept = RxNorm ID
  • Package = Structured Product Label (SPL)
  • Product = National Drug Code (NDC)

Like with most hierarchical relationships, it’s important for accuracy to start with the most specific identifier available when trying to figure out where something belongs. That’s why PillFill focuses on gathering NDCs whenever possible, since all other relationships can be easily and accurately derived from there. If the NDC is not available, PillFill will associate at least RxNorm/NDFRT identifiers with each prescription.
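
To make the "start with the most specific identifier" rule concrete, here is a minimal Java sketch. The Identifiers holder, the Level enum, and the mostSpecific method are hypothetical names invented for illustration; they are not part of the PillFill API. The sketch simply walks the hierarchy from product (NDC) down to ingredient (UNII) and returns the first level that has a value.

```java
import java.util.Optional;

public class MostSpecificIdentifier {

    /** Hierarchy levels from the list above, ordered most specific first. */
    enum Level { PRODUCT_NDC, PACKAGE_SPL, CONCEPT_RXNORM, INGREDIENT_UNII }

    /** Hypothetical holder for the identifiers found on one prescription; null means "not available". */
    static final class Identifiers {
        final String ndc;
        final String splId;
        final String rxNormId;
        final String unii;

        Identifiers(String ndc, String splId, String rxNormId, String unii) {
            this.ndc = ndc;
            this.splId = splId;
            this.rxNormId = rxNormId;
            this.unii = unii;
        }
    }

    /** Returns the most specific level that has a non-empty value, or empty if none is set. */
    static Optional<Level> mostSpecific(Identifiers ids) {
        if (isSet(ids.ndc)) return Optional.of(Level.PRODUCT_NDC);
        if (isSet(ids.splId)) return Optional.of(Level.PACKAGE_SPL);
        if (isSet(ids.rxNormId)) return Optional.of(Level.CONCEPT_RXNORM);
        if (isSet(ids.unii)) return Optional.of(Level.INGREDIENT_UNII);
        return Optional.empty();
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Values taken from the example prescription earlier in the post.
        Identifiers withNdc = new Identifiers(
                "00378522293", "44260509-A91C-4906-BCC7-4EB5D3465DED", "200329", null);
        // The same prescription if only the RxNorm ID had been collected.
        Identifiers rxNormOnly = new Identifiers(null, null, "200329", null);

        System.out.println(mostSpecific(withNdc));
        System.out.println(mostSpecific(rxNormOnly));
    }
}
```

Deriving the other identifiers from the chosen one would still need a lookup service; this sketch only covers the "which one do I start from" decision.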

Medication Warnings & Alerts

So now that we have a working understanding of the medication identifiers, how can we use them to check for potential problems and gain additional insights?

FDA Recalls & Shortage Alerts

The FDA issues recall & shortage alerts primarily based on NDCs. PillFill provides a RESTful service for checking for such alerts. It’s as simple as you might hope: an HTTP GET call to an endpoint with the 11-digit NDC from the prescription:

https://developer.pillfill.com/service/v1/alerts/fdaAlerts?ids=00603388821

If the drug is found to have any alerts, you’ll get them back in the result set:

{ 
	"ndc": ["00603388821"], 
	"type": "recall", 
	"reason": [ "Qualitest Issues Voluntary, Nationwide Recall for One Lot of Hydrocodone Bitartrate and Acetaminophen Tablets, USP 10 mg/500 mg Due to the Potential for Oversized Tablets" ], 
	"resolution": [], 
	"additionalInfoUrl":  "http://www.fda.gov/Safety/Recalls/ucm318827.htm"
}

Drug Ingredient Overdose Warnings

The FDA has published a Maximum Recommended Therapeutic Dose (MRTD) guideline that identifies the maximum amount of each drug ingredient considered safe based on the weight of the individual (using mg/kg). PillFill offers another RESTful service to handle these calculations for solid oral (pill-based) medications:

https://developer.pillfill.com/service/v1/interactions/mrtd?ids=[RX-ID_0]&ids=[RX-ID_1]…&weightInKgs=68

For this service to operate correctly, each prescription included must have its splId available and its quantityPerDose / dosesPerDay fields set. The service will then calculate the ingredient dose levels across all products and provide feedback if the value is over 90% of the MRTD level.

{
     "unii": "KG60484QX9",
     "ingredientName": "Omeprazole",
     "currentLoad": 91,
     "mrtd": 2,
     "relatedRxs": ["RX-ID_0","RX-ID_1"]
}
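
The kind of arithmetic involved can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the service's actual formula, and the units of currentLoad and mrtd in the response above, are not documented in this post. The sketch assumes the MRTD is expressed in mg per kg of body weight per day and that the load is the summed daily intake divided by the weight-adjusted limit. The class, the Dose type, and the numbers in main are all hypothetical and are not real MRTD values.

```java
import java.util.Arrays;
import java.util.List;

public class MrtdLoadSketch {

    /** One prescription's contribution of a single ingredient (hypothetical type). */
    static final class Dose {
        final double ingredientMgPerUnit; // strength of one pill, e.g. read from the product label
        final double quantityPerDose;     // pills per dose (quantityPerDose on the prescription)
        final double dosesPerDay;         // doses per day (dosesPerDay on the prescription)

        Dose(double ingredientMgPerUnit, double quantityPerDose, double dosesPerDay) {
            this.ingredientMgPerUnit = ingredientMgPerUnit;
            this.quantityPerDose = quantityPerDose;
            this.dosesPerDay = dosesPerDay;
        }

        double dailyMg() {
            return ingredientMgPerUnit * quantityPerDose * dosesPerDay;
        }
    }

    /** Summed daily intake as a percentage of the weight-adjusted limit (assumed formula). */
    static double loadPercent(List<Dose> doses, double mrtdMgPerKgPerDay, double weightInKgs) {
        double totalDailyMg = 0.0;
        for (Dose dose : doses) {
            totalDailyMg += dose.dailyMg();
        }
        double dailyLimitMg = mrtdMgPerKgPerDay * weightInKgs;
        return 100.0 * totalDailyMg / dailyLimitMg;
    }

    /** Mirrors the 90% feedback threshold mentioned above. */
    static boolean overThreshold(double loadPercent) {
        return loadPercent > 90.0;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: two products sharing one ingredient, a made-up limit of 50 mg/kg/day.
        List<Dose> doses = Arrays.asList(new Dose(500, 2, 3), new Dose(325, 1, 1));
        double load = loadPercent(doses, 50, 68);
        System.out.printf("load = %.1f%%, over threshold = %b%n", load, overThreshold(load));
    }
}
```

The point of summing across prescriptions is that the same ingredient often appears in several products, which is why the response lists relatedRxs rather than a single prescription.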

Drug/Drug Interactions

To check for potential drug interactions, we’re going to use an NIH-provided RESTful service. It requires the RxNorm ID of each drug to be considered. Again, it’s a relatively simple RESTful GET request to find potential interactions:

http://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis=207106+152923+656659

When an interaction is found, a response is generated with some (fairly basic) detail about why the interaction is relevant:

{
	"fullInteractionType":[
	{
		"minConcept":[
		{
			"rxcui":"152923",
			"name":"Simvastatin 40 MG Oral Tablet [Zocor]","tty":"SBD"
		},
		{
			"rxcui":"656659",
			"name":"bosentan 125 MG Oral Tablet","tty":"SCD"
		}],
		"description":"Bosentan may decrease the serum concentration of simvastatin by increasing its metabolism. Monitor for changes in the therapeutic and adverse effects of simvastatin if bosentan is initiated, discontinued or dose changed."
	},{
		"minConcept":[
		{
			"rxcui":"152923",
			"name":"Simvastatin 40 MG Oral Tablet [Zocor]",
			"tty":"SBD"
		},
		{
			"rxcui":"207106",
			"name":"Fluconazole 50 MG Oral Tablet [Diflucan]",
			"tty":"SBD"
		}],
		"description":"Increased risk of myopathy/rhabdomyolysis"
	}]
}
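
As a rough illustration of how the request URL above is put together, the following Java sketch joins a list of RxNorm IDs with "+" and appends them to the rxcuis parameter. The class and method names are hypothetical. The sketch only builds the URL, it does not call the service, and it assumes the endpoint is still available in the form shown in this post, which is worth checking before relying on it.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class InteractionRequest {

    // Endpoint exactly as it appears in the example request above.
    static final String ENDPOINT = "http://rxnav.nlm.nih.gov/REST/interaction/list.json";

    /** Builds the GET URL for a list of RxNorm IDs (rxcuis), joined with '+'. */
    static URI interactionUri(List<String> rxcuis) {
        if (rxcuis.size() < 2) {
            throw new IllegalArgumentException("An interaction check needs at least two RxNorm IDs");
        }
        for (String rxcui : rxcuis) {
            if (!rxcui.matches("\\d+")) {
                throw new IllegalArgumentException("RxNorm IDs are expected to be numeric: " + rxcui);
            }
        }
        return URI.create(ENDPOINT + "?rxcuis=" + String.join("+", rxcuis));
    }

    public static void main(String[] args) {
        // The three RxNorm IDs from the example request.
        System.out.println(interactionUri(Arrays.asList("207106", "152923", "656659")));
    }
}
```

Validating the IDs before building the URL keeps malformed input from ever reaching the query string.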

Summary

Each identifier is useful in different ways, especially considering that each service you’ll want to use will require a different level of specificity. Once you understand how they interrelate, though, you’ll realize the power associated with each of the different identifiers and their terminology/information models.

That said, there are countless other checks you can also perform given the specific information found in the prescription: ingredient allergies using UNIIs, Drug & Condition/Disease Contraindication Checks, etc. Each can help a patient avoid a serious problem that they would otherwise never have considered. For PHRs to ever go mainstream, they must help to that end. It’s no longer enough to simply track pills.

This is an intervention. Stop storing secrets in your apps.

This is a repost of a submission I made to Reddit’s /r/androiddev subreddit. You can pull up the thread to see the ensuing discussion.


Reading the comments on the cloned app post nearly gave me a nervous breakdown. Fellow devs, please sit down and let’s have a frank conversation about your android app’s security.

It’s time you hear and accept this: Your app doesn’t love you and it will happily betray your trust in the right hands. Don’t believe me? Check out the countless posts from /r/netsec, /r/reverseengineering, /r/pwned about breaking into android apps. A skilled RE (Reverse Engineering) hacker will quickly convince your app to give up all of its closely guarded secrets and do all of those nasty little actions that you took such great pains (and unit tests) to prevent.

Now I know this is hard to hear and I can already anticipate what you’re going to say:

  • But hacking is impossible without access to my apk! Getting access to an apk is simple. Once downloaded to a rooted phone, pulling it via ADB is one adb pull /data/app/[your_apk_name] command away. You can even download an APK from Google Play with the right help.
  • But I used Proguard on my code! Great! Did you include any sensitive URLs, strings, keys, passwords, or other resources in your app? Then you’re still screwed. Proguard only makes your app’s code “harder to reverse engineer”– extracting all of those juicy unprotected strings and resources from your code is still insanely easy. Even modifying a proguard-protected app to bypass a boolean security check is the first thing that you learn to do when patching an app.
  • But I cleverly disguised and encrypted my strings/keys/urls! Do you ever communicate those back to a server? There are plenty of tools that will readily intercept and display any network communication and endpoints- yes, even if it’s SSL/TLS protected. Even if you were careful to never reveal them on the wire, it’s still possible to yank the sensitive bits directly from the memory of a running application.
  • But I implemented certificate pinning! That’s a good thing for your users’ security, but it gives you no assurances regarding the trustworthiness of your app. Certificate pinning is still easily bypassed if someone intends to do so.

Bottom line: Just like with testing, you should never assume that your mobile app will only operate the way you originally intended it to. Ever. Here are the harsh realities of android development (or any mobile app for that matter):

  • How do I prevent someone from subverting/cloning my app? You can’t. You can make it harder through obfuscation, indirection, etc. In most cases it’ll only provide an annoyance and delay to a determined reverse engineer- a scenario that you absolutely will encounter if your app is at all successful. Also remember that each deterrent you bake in makes your app more difficult to manage and is another opportunity to anger an otherwise legitimate user when something goes wrong. Even for companies like Google that do prevent most client-side exploitations, it’s a never-ending, incredibly expensive arms race between their vast security teams and individual hackers.
  • How do I perform business-sensitive operations then? If you need to do something that you must be absolutely sure is not subverted, only do it on a server you own and control. This is especially true of any financial transactions, including verification of in-app purchases.
  • But I don’t have the money for a server. Stand up a free AWS instance, heroku dyno, or a $5/month DigitalOcean droplet. If you’re serious enough about your app to try and make money from it, keeping your most sensitive operations on systems you own and control is a simple and cheap safeguard.
  • This all sounds a bit nihilistic. Why do anything then? Realize that there are two different types of security you should be considering: User Security and Business Security. You should be focused on user security with your mobile apps. It’s about making sure that your users don’t accidentally do anything that would negatively impact them. Things like cert pinning, TLS/SSL, verifying inputs, etc. are all important for user security. But, just like in life, you can’t protect people from hurting themselves or doing bad things. So for your own security, you must always assume that you have malicious users out there (i.e. users of your mobile app) that will lie, cheat, and otherwise ignore all of those virtuous safeguards you put in place for them.

Conclusion – Your app is not your trusted friend. Never include anything in it (e.g. passwords, secret URLs, API keys, etc) that you wouldn’t want someone to see. Securing your app should be about protecting your users’ security and privacy. Any sensitive operations or information that you or your business depend on being secret or done absolutely correctly should be handled exclusively on systems you own and control.

Cracking Secrets in Android Apps

As a follow up on my somewhat incoherent rant about developers hiding passwords, keys, and other sensitive information in Android apps, I wanted to go through a semi-realistic example and explain the thought behind some of these strategies and why they may not be as effective as you might initially hope.

While not a comprehensive review, we’ll take a look at the most common secret-stashing strategies (and how they can go wrong):

  • Embedded in strings.xml
  • Hidden in Source Code
  • Hidden in BuildConfigs
  • Using Proguard
  • Disguised/Encrypted Strings
  • Hidden in Native Libraries

Common Hiding Strategies

To help illustrate some of these concepts, I created an example Android app on Github that we’ll analyze in this post. The full source code is available for review, but be sure to also take a look at the decompiled source. It’s important that you appreciate the perspective of both the developer and the reverse-engineer as you look for potential vulnerabilities.

0. Including Secrets in strings.xml

As an Android developer, your first instinct is probably to include any secrets, such as an API key, in your XML resources as you would with any other assets. We’ve done just that as well in our res/values/strings.xml file:


<resources>
    <string name="app_name">HidingPasswords</string>
    <string name="hello_world">Hello world!</string>
    <string name="action_settings">Settings</string>
    <string name="server_password">My_S3cr3t_P@$$W0rD</string>
</resources>
  

While tidy, it’s also probably the easiest to subvert and extract. To see how we can do so, start by downloading our app’s APK. You can download it manually from GitHub or use wget from the command line:

$ wget https://github.com/pillfill/hiding-passwords-android/releases/download/1.0/app-x86-universal-debug.apk

Now let’s run strings, the go-to tool for finding interesting things in binaries:

$ strings app-x86-universal-debug.apk
  …(Lots of output)

You should see all kinds of interesting values here. If you look closely, you’ll even see our key/password included:

$ strings app-x86-universal-debug.apk | grep My
    My_S3cr3t_P@$$W0rD

The strings command makes smash-and-grab style API key theft very easy. It works on all kinds of binaries- not just Android apps.
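
If you’re wondering why this is so easy, the core idea behind strings fits in a few lines. The Java sketch below is not the real utility (which has more options and handles more encodings); it is a simplified illustration, with hypothetical class and method names, that scans a file for runs of at least four printable ASCII bytes and prints them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class StringsSketch {

    /** Returns every run of at least minLength printable ASCII bytes found in data. */
    static List<String> printableRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                // Printable ASCII: extend the current run.
                current.append((char) b);
            } else {
                // Anything else ends the run; keep it only if it is long enough.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StringsSketch <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : printableRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Nothing here needs to understand the file format, which is exactly why an unencoded secret stored anywhere in a binary is so exposed.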

1. Including Secrets in Your Source Code

This is another common starting point for many developers tackling an API integration. To demonstrate, we’ve included a private static final String field and even a byte[] array with our hard-coded keys inside our example app’s MainActivity:

public class MainActivity extends AppCompatActivity {
  //A simple static field to store sensitive keys
  private static final String myNaivePasswordHidingKey = "My_S3cr3t_P@$$W0rD";
  //A marginally better effort to store a key in a byte array (to avoid string analysis)
  private static final byte[] mySlightlyCleverHidingKey = new byte[]{
    'M','y','_','S','3','c','r','3','t','_','P','@','$','$','W','0','r','D','_','2'
  };
  //(rest of the activity omitted)
}

While the strings utility won’t find these quite as easily as with our XML resources, it still can work with a little more digging. Since APKs are actually compressed/zipped files under the covers, we can extract the APK contents and still find both passwords:

$ unzip app-x86-universal-debug.apk
$ strings classes.dex | grep My
	  My_S3cr3t_P@$$W0rD_2
	  My_S3cr3t_P@$$W0rD

Again, strings was able to find both values (our password string and byte array!) without breaking a sweat. We told it to look in the classes.dex file- the file that ultimately contains your compiled java code.

2. Including Secrets in Your Build Config

Another suggestion from last week’s Reddit discussion was to manage the key in the BuildConfig from the Android Gradle plugin. There’s definitely some merit to this approach since it can minimize the risk of leaving secrets exposed in your version control system (especially important if you use a public DVCS like GitHub):

buildTypes {
  debug {
    minifyEnabled true
    buildConfigField "String", "hiddenPassword", "\"${hiddenPassword}\""
  }
}

You can then set this value in a .gitignore’d local.properties or a checked-in gradle.properties as shown here:

hiddenPassword=My_S3cr3t_P@$$W0rD

Unfortunately this doesn’t improve on the secret-in-source-code situation described above since these values are emitted as BuildConfig code. It can be inspected and extracted exactly in the same manner.

3. Protecting Secrets with Proguard

So we’re losing the battle with strings. Okay, no problem! We can just throw a little proguard at our app, have it obfuscate our source code, and it should solve our little strings problem. Right?

Not quite. Let’s take a look at proguard-rules.pro in our project:

# Just change our classes (to make things easier)
  -keep class !com.apothesource.** { *; }

We’re already telling proguard to obfuscate all of the code in our package (com.apothesource.**). I can also say with confidence that Proguard worked as instructed. So why are we still able to see the passwords?

Proguard explicitly does not do anything to protect or encrypt strings. The reason makes sense too- It can’t just change the value of a string that your app depends on without the risk of significant side effects. You can see exactly what proguard did by reviewing the mapping.txt file in our build output:

com.apothesource.hidingpasswords.HidingUtil -> com.apothesource.hidingpasswords.a:
    java.lang.String hide(java.lang.String) -> a
    java.lang.String unhide(java.lang.String) -> b
    void doHiding(byte[],byte[],boolean) -> a
com.apothesource.hidingpasswords.MainActivity -> .hidingpasswords.MainActivity:
    byte[] mySlightlyCleverHidingKey -> a
    java.lang.String[] myCompositeKey -> b

So you can see that it renamed our classes, methods, and member/field names as expected. It just didn’t help us at all when it comes to our strings problem. You can also look at the output of the compiler to see the effect of proguard. Here are the normal vs. proguard outputs on our MainActivity static fields, for example:

Normal Output:

#static fields
.field private static final TAG:Ljava/lang/String; = "HidingActivity"
.field private static final myCompositeKey:[Ljava/lang/String;
.field private static final myNaivePasswordHidingKey:Ljava/lang/String; = "My_S3cr3t_P@$$W0rD"
.field private static final mySlightlyCleverHidingKey:[B

Proguard Output:

#static fields
.field private static final n:[B
.field private static final o:[Ljava/lang/String;

Proguard does a good job here of detecting that it can replace variable names and even inline our password to make it a local variable. When you inspect the generated method implementation, though, our password is still there in raw form:

.method public b(Ljava/lang/String;)V 
  
  move-result-object v0
  const-string v1, "My_S3cr3t_P@$$W0rD"

While not a silver bullet, Proguard is still an important tool if you intend to hinder reverse engineering. It is highly effective in stripping valuable context like variable, method, and class names from the compiled output, making detailed analysis tasks much more difficult. If you’d like to compare the decompiled outputs of a Proguard-protected vs. unprotected application, we’ve included both versions of our app on GitHub.

4. Hiding Your Secret Strings

Since proguard isn’t hiding your strings, why not do it yourself?

You can hide secret strings by transforming them through various encoding or encryption methods, base64 being a very common one. In our app, we do this through some lightweight XOR operations:

//A more complicated effort to store the XOR'ed halves of a key (instead of the key itself)
private static final String[] myCompositeKey = new String[]{
  "oNQavjbaNNSgEqoCkT9Em4imeQQ=","3o8eFOX4ri/F8fgHgiy/BS47"
};

This is still our My_S3cr3t_P@$$W0rD secret. We’ve just done some hiding by XORing the value with a randomly generated value. You can inspect the simple HidingUtil implementation if you’d like to see how this value was generated. Note that while this naive method generates a random XOR key for each call, there’s no reason you couldn’t use the same key for all values in your app that you’d like to protect.
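
For readers who don’t want to open the repository, here is a minimal sketch of the general idea. It is not the HidingUtil implementation from the example app, and the class and method names are hypothetical. It assumes the simplest variant of the scheme: generate a random pad as long as the secret, XOR the two together, and store both pieces as Base64 strings. It uses java.util.Base64 rather than Android’s android.util.Base64 so that it runs on a plain JVM.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class XorSplitSketch {

    /** Splits a secret into two Base64 strings whose decoded bytes XOR back to the secret. */
    static String[] split(String secret) {
        byte[] plain = secret.getBytes(StandardCharsets.UTF_8);
        byte[] pad = new byte[plain.length];
        new SecureRandom().nextBytes(pad); // fresh random pad on every call
        byte[] masked = new byte[plain.length];
        for (int i = 0; i < plain.length; i++) {
            masked[i] = (byte) (plain[i] ^ pad[i]);
        }
        Base64.Encoder encoder = Base64.getEncoder();
        return new String[]{encoder.encodeToString(pad), encoder.encodeToString(masked)};
    }

    /** Reverses split(): decodes both halves and XORs them byte by byte. */
    static String combine(String[] parts) {
        byte[] pad = Base64.getDecoder().decode(parts[0]);
        byte[] masked = Base64.getDecoder().decode(parts[1]);
        byte[] plain = new byte[masked.length];
        for (int i = 0; i < masked.length; i++) {
            plain[i] = (byte) (pad[i] ^ masked[i]);
        }
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] parts = split("My_S3cr3t_P@$$W0rD");
        // Neither half looks like the secret, so a plain strings scan no longer finds it.
        System.out.println(parts[0]);
        System.out.println(parts[1]);
        System.out.println(combine(parts));
    }
}
```

Keep in mind that both halves still ship inside the app, so this only hides the value from casual string scans. It is obfuscation, not encryption with a key the attacker lacks.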

When you’re ready to use your ‘hidden’ key, you simply reverse the process:

public void useXorStringHiding(String myHiddenMessage) {
  byte[] xorParts0 = Base64.decode(myCompositeKey[0],0);
  byte[] xorParts1 = Base64.decode(myCompositeKey[1], 0);

  byte[] xorKey = new byte[xorParts0.length];
  for(int i = 0; i < xorParts1.length; i++){
    xorKey[i] = (byte) (xorParts0[i] ^ xorParts1[i]);
  }
  HidingUtil.doHiding(myHiddenMessage.getBytes(), xorKey, false);
}

While not terribly clever (or optimized), this is a step in the right direction since it effectively neuters the strings-based analysis. It forces anyone still analyzing your app to dive deeper, which normally involves 1) studying your app’s compiled output to figure out your hiding scheme, and/or 2) attempting to patch your app. The bad news is that neither is particularly difficult to do.

4a. Studying Smali Output

Smali is an assembler/disassembler for Android’s Dalvik VM. It disassembles compiled Android dex code into a human-readable syntax. Utilities like APKTool build on smali, resulting in a powerful tool to inspect compiled applications, including those from the Google Play Store.

Consider again, for example, our useXorStringHiding method that combines the XOR key components that we described above. Now compare that with the smali instructions generated by APKTool. There are important clues that can quickly indicate our strategy for hiding strings, like our loop to XOR the values:

:goto_0
  
  array-length v5, v3
  if-ge v0, v5, :cond_0
  aget-byte v5, v2, v0
  aget-byte v6, v3, v0
  
  xor-int/2addr v5, v6

  int-to-byte v5, v5
  aput-byte v5, v4, v0
  add-int/lit8 v0, v0, 0x1

  goto :goto_0

Even if you’re not fluent in reading the generated instructions, simply knowing that an XOR operation is involved gives us most of what we need to start pulling things apart.

4b. Patching Binaries

Let’s say I didn’t want to or couldn’t figure out the encoding scheme by just studying the above output. What other options do I have?

Plenty. Let’s say that we’re not able to figure out the above loop, but we are pretty confident that the key we want is available at the end of the loop:

invoke-static {v0, v4, v1}, Lcom/apothesource/hidingpasswords/HidingUtil;->a([B[BZ)V

Instead of trying to figure out what permutations we take along the way, we can simply modify the generated instructions to log the values out to the console at the end. While I won’t try to cover all of the nuances of patching binaries here, rest assured that after patching our app with the new logging statement, every key that passes through this method will be dutifully written out to the console, negating all of our hard work.

5. Native C/C++ JNI Secret Hiding

The strategy of moving sensitive operations out of Java and into native libraries was a common mitigation suggested in the /r/androiddev discussion. It certainly is one of the more effective strategies to thwart reverse engineering attempts since it adds several layers of complexity. To demonstrate this approach, our app includes JNI calls to a custom C function that XORs our keys just like we did in our Java-based implementation. The native/JNI hook is in the HidingUtil class:

/**
 * Our hook to the JNI hiding method.
 * @param plainText Text to hide (XOR key is hard-coded in the JNI app)
 * @return A {@link Base64#encode} encoded value of (plainText XOR key)
 */
public native String hide(String plainText);

/**
 * Our hook to the JNI unhiding method.
 * @param cipherText {@link Base64}-encoded text to unhide (XOR key is hard-coded in the JNI app)
 * @return A string with the original plaintext (cipherText XOR key)
 */
public native String unhide(String cipherText);

The C source for the function isn’t terribly interesting: it’s a C-language rehash of our same XOR-based Java functions.

As expected, decompiling the output doesn’t yield anything useful:

.method public native hide(Ljava/lang/String;)Ljava/lang/String;
.end method

.method public native unhide(Ljava/lang/String;)Ljava/lang/String;
.end method

Our native code compiles into platform-specific shared object (.so) libraries. This additional layer of protection comes at a fairly high cost though, especially if you’re not using JNI hooks already. Builds and testing become significantly more complicated, and standard troubleshooting/crash analysis tools won’t work at this level.

Even if you are comfortable attaching a JNI interface to your app for this purpose, it’s also important to remember that it is still not foolproof. Our naive implementation of the C functions is vulnerable to the same tool that gave us such heartburn in the first place: strings.

$ strings libhidingutil.so | grep My
  My_S3cr3t_P@$$W0rD

Back to where we started!
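Why does this work? The strings tool simply reports runs of printable characters (four or more by default), so any literal compiled in as-is will surface. The sketch below mimics that scan in Java over an in-memory byte array standing in for a library's data. The class name, the helper methods, the zero-byte padding, and the fixed single-byte XOR key are all illustrative assumptions, not taken from the app or from the real tool's source.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a strings-style scan; not the real tool.
final class StringsScanSketch {

    /** Returns every run of at least minLen printable ASCII bytes. */
    public static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {          // printable ASCII
                current.append((char) b);
            } else {                               // run ends here
                if (current.length() >= minLen) runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) runs.add(current.toString());
        return runs;
    }

    /** Surrounds the payload with zero bytes, standing in for binary data. */
    public static byte[] embed(byte[] payload) {
        byte[] image = new byte[payload.length + 4];
        System.arraycopy(payload, 0, image, 2, payload.length);
        return image;
    }

    /** XORs every byte with one fixed key byte (deliberately simplistic). */
    public static byte[] mask(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) out[i] = (byte) (data[i] ^ key);
        return out;
    }

    public static void main(String[] args) {
        byte[] secret = "My_S3cr3t_P@$$W0rD".getBytes(StandardCharsets.US_ASCII);

        // A plain literal is found immediately.
        System.out.println(printableRuns(embed(secret), 4));
        // prints [My_S3cr3t_P@$$W0rD]

        // XORing with 0x80 sets the high bit of every byte, so nothing is printable.
        System.out.println(printableRuns(embed(mask(secret, (byte) 0x80)), 4));
        // prints []
    }
}
```

Masking the literal defeats this particular scan, but the key and the unmasking routine still live in the same binary.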

To be fair, I’m not implying that this is the end of the rabbit hole: you can add layers of indirection and string hiding in the native library as well. Just remember that native libraries have their own reverse engineering tools. So long as you hide secrets in the bits you give to your users, rest assured that someone is out there patiently trying to extract them back out.

Summary

The best way to protect secrets is to never reveal them. Compartmentalizing sensitive information and operations on your own backend server/service should always be your first choice. If you do have to consider a hiding scheme, do so with the realization that you can only make the reverse engineering process harder (i.e., not impossible), and that you will add significant complexity to the development, testing, and maintenance of your app in doing so.