Boss Asked for a Monitoring System, I Built an App

This entry is part 3 of 4 in the series TMQ - Less Words, More Facts

original link: http://tmq.qq.com/2017/07/app-2/

Recently I joined a new project that requires a backend API monitoring system. It is an Android project built around a chatbot, and HTTP was deemed unsuitable as the communication protocol.

IM apps such as Mobile QQ and WeChat use their own private binary protocols without published specifications, so it is hard to simulate their network communication without direct support from their dev teams.

Likewise, the new project adopts a proprietary protocol called “Harley”. Technically, I need to simulate the Harley protocol for the purpose of backend API monitoring.

Harley Protocol

Catering to the mobile environment, Harley provides a fully optimized network API layer, real-time push notifications, common resource downloading, and incremental self-update for all sorts of mobile apps.

The Harley SDK is optimized for interactive traffic (high frequency, small payloads). The business backend supports HTTPS, TAF, or jce-based protocols.

As Harley is mobile-oriented, the official SDK supports only Android and iOS.

Traditional Method

Traditionally, a monitoring system runs on a PC, simulates the network requests, and verifies the responses.

Applicability

  1. A PC-based system is convenient for standard protocols (e.g., HTTP, SMTP) because plenty of open source components are available
  2. On the other hand, it is hard to build such a system on a proprietary protocol; in our case, I only have the Android and iOS SDKs for Harley.

Pros

  1. A PC-based monitoring system is stable
  2. PCs are performant
  3. A PC-based system is extensible

Cons

  1. It requires a substantial development effort (translator: that is, re-implementing the whole SDK)
  2. The runtime differs from the real-world environment in which the app runs
  3. The code base of the monitoring system is segregated from that of the real product, and keeping the two in sync in a timely manner is costly

Brewing Something New…

In fact, issues 1 and 3 are the major ones for the traditional PC-based plan:

  1. development effort: building a Harley simulation utility is such a big task that developers are reluctant to take it on
  2. maintenance effort: whenever the backend changes or upgrades the protocol fields, I need to adjust the simulation utility as well

So I decided to brainstorm with myself:

  1. Why do we need a backend monitor in the first place? Basically, a backend API monitor is a periodically triggered automated test that checks the backend’s functionality in near real time, so we can notice a problem within 10 minutes.
  2. Why can’t the backend API monitor be replaced by UI automation tests? UI automation takes too long, and its stability and coverage are too limited for real-time monitoring.
  3. Can the mobile end only be used for UI automation? The performance and extensibility of mobile devices (Android and iOS) are comparable to PCs.
  4. Backend API monitoring is pure network interaction and involves no UI; is the mobile end still applicable? I can’t see any reason why not, as an app of pure network interaction is enough.
  5. How about development effort? We can use the SDK directly. If the product uses MVP, we can reuse the code of the Model and Presenter layers directly.
  6. How can the monitoring charts be displayed on a small phone? There are numerous ways to transfer data from a phone to a PC; we can send the results to a PC for display.
  7. We need to run periodic tasks over the long term; how about stability? No one has done this before, so we do not know. If a real device turns out not to be stable, we can theoretically use an emulator.
  8. Can problems reported from the mobile end be easily pinpointed? Of course. We use the product code directly, so if there is any problem, most likely it is a problem in the source code itself!
  9. If the system is implemented on the mobile end, is it hard to maintain and upgrade? Whenever the product upgrades, we copy the code over directly :)

After the brainstorm, I decided to give it a go.

Finalized Plan

By largely copying the real code from the target product, I arrived at the following architecture for the monitoring system:

Mobile-end layer

As recommended by Google, the MVP pattern is widely used in Android development.

Model: the data layer. Unlike in MVC, in MVP the Model is responsible for database reads/writes, network requests/responses, etc. So I copy all of this layer’s source code.

View: for UI presentation. Not needed here.

Presenter: the logic layer bridging the View and Model. In MVP, the Model does not communicate with the View directly; instead, the Presenter fetches data from the Model, processes it, and delivers the result to the View for presentation. This decouples the View from the Model. I copy this layer selectively.

Theoretically, the copied source code = all of Model + a portion of Presenter (see the sketch below).
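A language-neutral sketch of this split, written in Python for brevity (all names are illustrative, not the product’s):

    class Model:
        # Data layer: network request/response, persistence. Copied wholesale.
        def fetch_data(self):
            return {"status": 0}  # stand-in for a real backend call

    class Presenter:
        # Bridges Model and View. Copied selectively: keep the data-processing
        # logic, drop anything that only feeds the UI.
        def __init__(self, model, view=None):
            self.model, self.view = model, view

        def refresh(self):
            data = self.model.fetch_data()
            ok = data["status"] == 0   # a processing step the monitor can assert on
            if self.view is not None:  # the View is simply absent in the monitor app
                self.view.show(ok)
            return ok

    assert Presenter(Model()).refresh()  # runs headless: no View required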

“Theoretically” is far from enough; I still need to pin down the specific source code:

Finally, copied source code = NetService (all) + Manager (portion) + Engine (portion)

App Implementation

1. Jce module: Jce is a Tencent component used for defining internal communication protocols, i.e., the fields used in RPC. Please refer to the relevant Tencent open source projects (translator: the Tencent counterpart of Protocol Buffers).

For example, the request message is defined as a jce struct with numbered, typed fields; the concrete definition is internal and not reproduced here.
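Purely as a hypothetical illustration (not the real message, and rendered in Python rather than in the jce IDL), such a request carries fields along these lines:

    from dataclasses import dataclass

    # Hypothetical illustration only. In a real jce definition, each field also
    # carries a numeric tag and a require/optional modifier, much like fields
    # in Protocol Buffers.
    @dataclass
    class ChatRequest:
        uid: str        # e.g., tag 0, require: caller identity
        seq: int        # e.g., tag 1, require: request sequence number
        payload: bytes  # e.g., tag 2, optional: business payload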

2. The copied source code includes: Harley SDK initialization, packet assembly, network request, and packet check.

Initialization (from the NetService module), network request (from NetService & Engine), and response check (from Manager): these snippets involve sensitive information and are omitted here.

3. adb communication: we expose an Activity (DO NOT expose an Activity in an ordinary app, for security reasons). It receives the boot instruction from adb, starts a worker thread, and sends the request.

The relevant adb command (issued from the PC side) is sketched below.
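A minimal sketch of the trigger, assuming a hypothetical component name (the real package and Activity are internal); am start with an --es string extra tells the app which case to run:

    import subprocess

    # Equivalent shell form:
    #   adb shell am start -n com.example.monitor/.MonitorActivity --es case login
    subprocess.run(
        ["adb", "shell", "am", "start",
         "-n", "com.example.monitor/.MonitorActivity",  # hypothetical component
         "--es", "case", "login"],                      # which test case to run
        check=True,
    )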

4. Log the result: the outcome of each check is written to the Android log, so the PC side can collect it via logcat.

PC Layer Implementation

Easy: a few dozen lines of Python suffice.

1. Trigger task: execute the adb trigger command shown earlier (the am start sketch) from the script.

2. Capture log using adb, as sketched below:
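The original snippet is not reproduced here; below is a minimal reconstruction from the description (the names Poplog and f follow the text, while the log tag "Monitor" is a hypothetical filter):

    import subprocess

    def start_log_capture(log_path):
        # adb logcat blocks and runs until killed, so it is started as a child
        # *process* whose handle we keep in order to terminate it later.
        subprocess.run(["adb", "logcat", "-c"], check=True)  # clear stale log lines
        f = open(log_path, "w")
        Poplog = subprocess.Popen(["adb", "logcat", "-s", "Monitor"], stdout=f)
        return Poplog, f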

Caution:

Here I use multiple processes instead of multiple threads for the following reason: log capture is a blocking operation that keeps running until terminated, and Python has no way to abort a thread.

The return values, Poplog and f, represent the opened process and file. They are used later to terminate the process and close the file, respectively.

3. Log analysis:

1) check whether the test is finished, by looking for the marker “test_is_end” in the log (see the sketch after this list)

2) check the result: this mainly analyzes the log and checks the critical fields. It involves sensitive information, so it is omitted here.
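A minimal sketch of the end-of-test check in step 1); the field checks of step 2) stay omitted:

    import time

    def wait_for_end(log_path, timeout=60):
        # Poll the captured log until the app prints the end marker or we time out.
        deadline = time.time() + timeout
        while time.time() < deadline:
            with open(log_path) as f:
                if "test_is_end" in f.read():
                    return True
            time.sleep(1)
        return False  # timed out: treat the case as failed and alert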

4. Report & alert

Generate a report based on the data from step 3, and alert if anything abnormal is noticed.

5. Scheduling

It is worth noting that the script can be invoked directly by Jenkins and other CI platforms; a self-contained sketch combining the steps above follows.
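Putting the sketches above together, a script that a Jenkins timer job (or cron) could invoke periodically might look like this; the component name and log tag remain hypothetical:

    import subprocess
    import time

    def run_once(case, log_path="monitor.log", timeout=60):
        subprocess.run(["adb", "logcat", "-c"], check=True)  # drop stale markers
        with open(log_path, "w") as f:
            # capture the app log in a child process (step 2)
            poplog = subprocess.Popen(["adb", "logcat", "-s", "Monitor"], stdout=f)
            try:
                # trigger the case via the exposed Activity (step 1)
                subprocess.run(
                    ["adb", "shell", "am", "start",
                     "-n", "com.example.monitor/.MonitorActivity",
                     "--es", "case", case],
                    check=True,
                )
                # wait for the end marker (step 3, part 1)
                deadline = time.time() + timeout
                while time.time() < deadline:
                    with open(log_path) as log:
                        if "test_is_end" in log.read():
                            return True
                    time.sleep(1)
                return False
            finally:
                poplog.terminate()

    if __name__ == "__main__":
        # reporting and alerting (step 4) are product-specific; print the verdict only
        print("finished" if run_once("login") else "TIMEOUT, raise an alert")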

Result

  1. Efficiency: each case (one request), covering network request + log printing + log capture + log analysis, finishes in seconds, which is sufficient for backend API monitoring.
  2. Stability: initially a physical phone was used; after some incidents with unstable USB connections and a problematic battery, an emulator is used instead.

Based on our tests, an x86 Android 7.0 emulator is the most stable and performant option (host: i5 + 8 GB RAM + VT enabled).

To conclude, task accomplished!

Postscript

From this case, I summarized a methodology for situations with multiple options carrying different risks, pros, and cons:

Critical points:

  1. First, we need to understand the END GOAL. For API testing, the end is quality; automation and backend API testing are purely means, not ends.
  2. We need to investigate infeasible options as well as feasible ones, as an intuitively infeasible option might turn out to be the optimal one.
  3. We need to enumerate the concrete reasons why an option is not feasible, and check repeatedly whether those reasons actually hold.
  4. Through the lens of the END GOAL, we notice that sometimes we cannot find the optimal plan simply because someone else has already done it in a less efficient way; sometimes we think a solution is impossible simply because no one has done it before. This is how a precedent hardens into a stereotype, so we need to keep in mind that as technology advances, things change.
  5. Think when confronting a problem, and invite others into the brainstorm if necessary.
  6. Record the thinking for later retrospection.

Post-postscript

The benefits and prerequisites of the practice used in this monitoring system can be summarized as follows:

benefits

  1. Less code, as most of the source code is copied directly
  2. With a deepened understanding of the source code, I can point out bugs in it directly when talking to developers.
  3. The client logic (in Model and Presenter) is verified while we monitor the backend API. One stone, one Goliath, and one bird.

prerequisites

  1. The client source code complies with MVP.
  2. Testers can read, compile, and debug the source code.


Passionate Testing – The Expedition of the Legion of Smoke

This entry is part 4 of 4 in the series TMQ - Less Words, More Facts

original link: http://tmq.qq.com/2016/07/the-passion-test-smoke-regiment-expedition/

I. The Origin of Smoke

The test team has used smoke testing for a long time. After we restructured into FTs (feature teams), smoke testing is managed by each FT.
Against a background of fast version iteration, it is hard to test multiple versions in parallel and release on time using the orthodox process of “ready for test -> in test -> regression test”.
Thus we decided to try smoke testing, to relieve the testing pressure as well as improve testing quality.
One day I said: “I will treat anyone to a meal who finds more than 10 bugs in one smoke”, and the smoke began.

(N.B.: a feature team groups together the roles responsible for the same feature module, including developers, product managers, testers, ops, designers, etc.)

II. Everyday Scenes

1. 23 more “testers” have joined our team!

Every day on the 10th floor of the Nantong building, you can see a bunch of developers, ops, and product managers turned testers, tapping on their phones with smiles and recording bugs one by one.

“Another crash; this one was developed by A”

A: “Gimme the phone, I’ll collect the crash log.” Then A grabbed the phone and rushed to his computer……

Before the smoke test finished, A came back: “Solved, it was because of #$%^&*.” People looked up at him while thinking, “I’ll give you another one”……

2. The minimal adaptation base

Dev A: “A UI component is misplaced on my phone, and the text overlaps!”

Tester B: “This plugin cannot be turned on on my phone”

Product manager C: “Why is everything normal on mine?”

Smoke organizer: “Record those adaptation issues.” The rich product manager C pulled several phones out of his pocket: “Try these”……

3. A bug explosion even before the product is ready for test

After half an hour, there was a full page of bugs recorded on the laptop on the pool table. A tester clicked the “import” button, and the corresponding developers received the bug notifications.

Developer A: “I am about to code this portion of the logic, so I can fix all these bugs along the way. Good to know about them beforehand!”

Developer B: “I found several bugs of my own; I’ll fix them before anyone notices.”

Tester C: “The developers are going to fix the bugs before the test; I’ll have much less to worry about.”

Developer D: “So many bugs……”

4. Everyone is a product manager

After lunch, the product manager felt pleased to see developer D busy, as he always was, until D looked up at him: “The bugs have all been reassigned to you.”

“What…?” The product manager rushed to his computer and opened the 10 unread tickets. All of them were UX issues.

5. A sad history

Pitfall 1, (smoke) test release

Smoke organizer: “We are going to have a smoke test today, could you release a test version please?”

Developer A: “Sure thing. I’ll do it now”

Fifteen minutes later: “How is it going?”

Developer A: “The release platform has a problem; I have to solve that first”

Another fifteen minutes later: “How is it going?”

“XXX just submitted some code; I will do the release again”

……

Pitfall 2, I am not a born leader 🙁

Smoke organizer: “Let’s install the app and start the smoke”

Developer A: “OK, after I fix this issue”

Tester B: “OK, after I finish this test case”

Product manager C: “Sure, could you install it for me please?”

Ten minutes later, everyone was still doing their own thing at their own seats……

Pitfall 3, I will take the Karma I started

Initially, we used an A4 sheet of paper to record bugs, and we testers had to record dozens of bugs by hand like file clerks. Moreover, we had to do the regression testing for all the bugs we reported, from the normal tests as well as the smoke tests.

6. Countermeasures

Countermeasure 1

We rotate the developer responsible for the test release, and we start the release process 30 minutes before the smoke.

Countermeasure 2

We rotate the smoke organizer, so everyone comes to understand the organizer’s difficulties instead of fixating only on their own matters.

Countermeasure 3

We use a template to record bugs on a laptop, which can be imported into TAPD (translator: the Tencent internal version of JIRA) with one click. Moreover, each bug reporter is responsible for the regression testing of their own reports.

Then we testers are really happy.

After a few iterations, smoke testing is now efficient and pleasant. Below I list the tips and reflections from this process.

III. Benefits

a) UX flaws: dogfooding and showcases come a bit too late; an early-stage smoke test can expose lots of UX issues.

b) Adaptation issues: with more people involved, more kinds of devices can be covered.

c) Bug shooting: a lot of bugs can be found at the pre-test stage.

d) More perspectives: with various roles involved, we get suggestions for fixes and optimizations from different dimensions.

IV. Our Approach

Like “accurate testing”, smoke testing is not a silver bullet and has inevitable drawbacks, for instance: the validity of the bugs discovered, duplicated bugs, the time cost for developers to deal with a large number of bugs, etc.

As mentioned above, we did face some problems in practice. However, honed over several iterations, every FT has recognized the value of smoke testing and agrees that its advantages outweigh its disadvantages. The whole team is becoming more and more active in smoke testing, and developers sometimes propose a round of smoke themselves.

Below I list some statistics and the improvement process of smoke testing in Tencent Mobile Manager 6.2 for your reference:

1. Bug validity rate

Validity rate: 64%

Advice: Better wrong than missed

Explanation: Despite the relatively low validity rate, we still encourage people to report new bugs; developers and product managers can evaluate them.

2. Duplicated bugs:

Explanation: Among the bugs rejected in the smoke test, the duplication rate is relatively high.

Advice: Generally, we consider that the more times a bug is duplicated, the more critical it is, since more people noticed it.

3. Optimization approach

It is critical that all roles put their thoughts into the optimization practice.

Smoke guide:

1. Testers list the testing routes for important features.

2. Developers mark the routes that deserve extra attention, according to the newly submitted code.

3. Product managers highlight the features that reflect the spirit of the product.

Bug recording:

1. mark the issue type (bug, UX, logic, requirement, etc.)

2. format the content for the import

3. record bugs right after the test.

The voice of the crowd:

“One thing: I noticed that recent issues are all bugs. Just want to remind everyone that we can focus more on UX issues, and please use the app from the point of view of an ordinary user”

“Another one: we have recorded a large volume of issues. Do you think it would be a good idea to categorize them by marking their type (bug, UX, logic)?”

“Better to ask the developers to highlight the features that deserve extra attention”

“Yes. Every time you commit code to SVN, think about the test route from the tester’s point of view”

V. Smoke In Practice

Test preparation:

1. device: a laptop for bug recording

2. time: 5:30 – 6:00 pm

Developers:

1. rotate: each time we appoint a developer to release the test version

2. archive: the build package should be ready in RTX (translator: the internal Slack or HipChat of Tencent) before 5:30

3. collect: collect the testing routes (provided by testers and developers, as mentioned above)

4. rendezvous: at 5:30

5. guide: developers highlight the important testing routes

Testers:

1. collect: collect all the bugs and batch-import them into TAPD

2. regression: print out the bugs (from previous rounds) that require regression testing, and remind the corresponding people to conduct the regression

3. prioritize: with the developers’ assistance, prioritize the reported bugs

4. solved bugs: keep note of the bugs verified as solved during regression

5. guide: testers highlight the important testing routes from the testers’ point of view

Product managers:

1. showcase: during the smoke test, product managers answer questions regarding UX; reasonable feedback is recorded directly into TAPD.

How to record:

1. Title format: [version+module+smoke] [BUG/UX/logic/requirements] ***-reporter

e.g., [6.4 throughput App Smoke] [UX] The spinning loading bar needs optimization: it is not distinct and is ugly.

2. Import to TAPD

VI. Practical Tips

Today I’m the lead: rotate developers into the lead role to enhance their proactivity and understanding

I’ll finish it today: record all the bugs on the same day, so developers can react faster

Transfer to the product manager in one second: UX issues are transferred to the product manager instantly; the product manager then evaluates those issues and changes the requirements if necessary

Better wrong than missed: however big or trivial an issue seems, record it

Faster regression: regress quickly, leaving nothing behind

Big and full: collect everyone’s phone models and provision the devices that are missing

Foodie: prepare some snacks as rewards

VII. Conclusion

Against the background of agile development and fast iteration, smoke testing gives other roles a means to focus on quality, and it is an efficient supplement to normal testing. The optimization process requires everyone’s involvement to produce good results in practice.