Hack Reactor | SDC | Journal Entry 6
Categories: hack reactor SDC
Overview
DBMS and API Server Stress-Testing / Optimizations Part 3
Challenge/Motivation
Local server stress testing metrics and results after adding indexing to my MongoDB database
Actions Taken
My local server went above the 2000ms SLA surprisingly early in my testing, with some responses taking close to 15 seconds and some never firing at all (showing zeroes in the metrics row), likely because I was querying the end of a database with more than 1 million entries:
10 RPS
At 500 RPS, my computer froze and had to be restarted:
500 RPS
After adding indexes to both the product_id and style_id (results.style_id) in my database, I only went above the 2000ms SLA after getting to 1100 RPS:
1100 RPS
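The exact scripts aren't shown here, but k6 can drive a fixed request rate (rather than a fixed number of virtual users) with its constant-arrival-rate executor; the sketch below shows the general shape of that kind of config, with the port and VU counts as placeholder values rather than my actual settings:

```js
// rps-test.js -- run with: k6 run rps-test.js
import http from 'k6/http';

export const options = {
  scenarios: {
    fixed_rps: {
      executor: 'constant-arrival-rate',
      rate: 1100,           // requests started per timeUnit
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 200, // VUs k6 keeps ready to hit the target rate
      maxVUs: 2000,         // upper bound if responses slow down
    },
  },
};

export default function () {
  // Same end-of-database product route used in my other tests
  http.get('http://localhost:3000/products/1000000');
}
```

Stepping the load up between runs is then just a matter of changing the rate value.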
Results Observed
I was surprised at how much of a positive impact indexing had on my local stress tests, with the failure point jumping from 10 RPS to 1100 RPS; it just goes to show that even simple optimizations can make a huge difference. I decided to keep my local stress testing and analysis relatively simple, since the most important metrics will be the post-deployment ones, so my next few journal entries will cover the deployment process and then stress testing the deployed server.
Hack Reactor | SDC | Journal Entry 5
Categories: hack reactor SDC
Overview
DBMS and API Server Stress-Testing / Optimizations Part 2
Challenge/Motivation
Researching and implementing a way to perform stress tests on my server, locally. Make changes to optimize, then test again. Repeat until response time is greater than 2000ms or error rate is greater than 1%.
Actions Taken
I decided to use k6 at the suggestion of our Technical Mentor. New Relic is also a popular choice, but it wasn't recommended because it is prone to the observer effect (the logging of data can make your tests less efficient) and apparently has some memory-leak issues.
In order to test my local server (as opposed to the deployed version), I had to download and install k6 on my machine, and I will be creating and running the tests via JavaScript files. The first suggested test is the "smoke test," a simple test to make sure your server returns the expected data under a very small amount of load. I set it to test a single server route, products/1000000 (using a larger product ID so as to test data at the end of my Mongo database), looping a single request for one minute at 1 request per second, and got the following results:
I made notes of the data points that will be most useful for determining whether changes had positive (or negative) effects on the performance of my server, such as http_req_duration (total time for the request), http_req_waiting (time spent waiting for the response, otherwise known as "time to first byte"), and http_req_failed (the rate of failed requests).
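For reference, the smoke test itself is only a handful of lines; a sketch along these lines (the port and the body check are assumptions on my part):

```js
// smoke-test.js -- run with: k6 run smoke-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 1,          // one virtual user
  duration: '1m',  // loop for one minute
};

export default function () {
  const res = http.get('http://localhost:3000/products/1000000');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'body is not empty': (r) => r.body && r.body.length > 0,
  });
  sleep(1); // roughly 1 request per second
}
```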
Now that I know that my server routes are sending back the correct data and status codes in the smoke test, I can start to create some more robust tests that gradually scale the number of requests per second until my response time and/or error % goes above the SLA.
Results Observed
I was happy to get my first k6 test working, even if it is only a smoke test so far. My next entry will cover my first load test, which runs in stages (for example, ramping up to 100 users over the first 5 minutes, holding at 100 users for the next 10 minutes, and then ramping back down to 0 users over the last 5 minutes).
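A sketch of what that staged load test might look like, with the SLA numbers from above wired in as k6 thresholds (the route and port are assumptions):

```js
// load-test.js -- staged ramp described above
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 100 },  // ramp up to 100 virtual users
    { duration: '10m', target: 100 }, // hold at 100
    { duration: '5m', target: 0 },    // ramp back down to 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<2000'], // SLA: 95% of requests under 2000ms
    http_req_failed: ['rate<0.01'],    // SLA: error rate under 1%
  },
};

export default function () {
  http.get('http://localhost:3000/products/1000000');
  sleep(1);
}
```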
Hack Reactor | SDC | Journal Entry 4
Categories: hack reactor SDC
Overview
DBMS and API Server Stress-Testing / Optimizations Part 1
Challenge/Motivation
Our goal for DBMS optimization is to make sure all of our read queries complete in under 50ms, including queries for items at the end of the database (or, in my case, documents at the end of my collection). For the API server, we needed to test requests per second (first locally, and then again after deployment), scaling up to (and possibly beyond, depending on the hardware the server is running on) 1000 requests per second, to the point where the server begins to fail. Following testing, I will optimize both the database and the server as best I can and then test again so I can reflect on the improvements.
Actions Taken
DBMS
Although I had already added some indexes to make the ETL process move quicker, I decided to delete them so I could really see how effective MongoDB indexing is at optimizing queries. Without indexing, a few of my queries were reaching close to 1000ms, well above the 50ms goal:
Results Observed
After indexing, every test (except for the first, which I suspect was due to my test server or mongoose connection warming up) came in under 50ms:
Adding indexes to a MongoDB database seems to improve query speed roughly tenfold, which is a pretty good return on investment for only a few seconds of work. I'll talk about server stress testing and optimization in my next entry, as I still have some research to do on the best methods.
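For anyone who wants to double-check this kind of result, the explain output in the mongo shell is one way to confirm a query is actually hitting the index and to read its execution time (the product ID below is just an example):

```js
// mongo shell: inspect how the query is executed
db.products.find({ product_id: 999999 }).explain('executionStats');

// Useful fields in the output:
//   executionStats.executionTimeMillis -- total query time in ms
//   the winning plan's stage           -- IXSCAN once the index exists,
//                                         COLLSCAN (full scan) without it
```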
Hack Reactor | SDC | Journal Entry 3
Categories: hack reactor SDC
Overview
REST API Node.js Server Testing with Supertest
Challenge/Motivation
My testing goal was to test happy paths and sad paths while making sure that my code coverage percentage was as high as I could get it without being redundant or testing unnecessary code.
Actions Taken
I decided to use the Supertest assertion library, which allows you to run your server in a test environment without starting the actual server. For setup and teardown, I added a mongoose connection to my database in the beforeAll function, and made sure to close the server and mongoose connection in the afterAll function.
I made a test for each GET and POST request my server accepts, including two "sad path" tests for each request: a product ID that does not exist (0) and a product ID of the wrong type (a string), to make sure that my error status codes and messages are correct in the responses. I also created a test specifically for the case where a product has no styles associated with it. Because the front end we were provided does not account for this case, I had to either add some dummy style data to the database or append it to the data before sending the response, and I decided that adding it to the server response would be the simplest solution.
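A rough sketch of how that setup, teardown, and a couple of the tests look with Supertest and Jest (the app import path, database URL, and the exact status codes asserted here are placeholders rather than my project's real values):

```js
// __tests__/server.test.js
const request = require('supertest');
const mongoose = require('mongoose');
const app = require('../server/app'); // Express app exported without calling .listen()

beforeAll(async () => {
  // Open the mongoose connection before any tests run
  await mongoose.connect('mongodb://localhost:27017/sdc');
});

afterAll(async () => {
  // Close the connection so Jest can exit cleanly
  await mongoose.connection.close();
});

describe('GET /products/:id', () => {
  test('happy path: returns the product with a 200', async () => {
    const res = await request(app).get('/products/1000000');
    expect(res.statusCode).toBe(200);
    expect(res.body).toBeDefined();
  });

  test('sad path: nonexistent product ID responds with an error status', async () => {
    const res = await request(app).get('/products/0');
    expect(res.statusCode).toBeGreaterThanOrEqual(400);
  });

  test('sad path: non-numeric product ID responds with an error status', async () => {
    const res = await request(app).get('/products/notanumber');
    expect(res.statusCode).toBeGreaterThanOrEqual(400);
  });
});
```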
The only two lines of code I was unable to cover were the error responses for the GET and POST requests to my /cart route, as there isn't really any kind of data that would throw an error there (the data format in the POST request is hardcoded in the front-end code). I ended up with 95% coverage, which I am happy with.
Results Observed
Supertest was easy to implement and I definitely recommend it as an assertion library for testing server and API routes/responses.
Hack Reactor | SDC | Journal Entry 2
Categories: hack reactor SDC
Overview
Performing an ETL process - CSV files to MongoDB
Challenge/Motivation
Now that I've decided that my primary database will be MongoDB, my next tasks were to design the schema and populate the database with data from provided CSV files, a few of them with more than 1 million lines of data.
Actions Taken
Because the data given by the previous API was JSON and MongoDB uses BSON for its document structure, creating the schema was a pretty quick process. I decided to keep all of the product data in a single collection called 'products' and the cart data in a separate collection called 'cart', since only the cart collection would need to handle POST and DELETE requests. Once my schema was fleshed out, it was time to do some research on best practices for ETL.
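A stripped-down sketch of what those two schemas might look like in mongoose; the field names beyond product_id, results, style_id, and photos are illustrative guesses rather than my exact schema:

```js
// db/models.js -- simplified sketch, not the full schema
const mongoose = require('mongoose');

const productSchema = new mongoose.Schema({
  product_id: Number,
  name: String,
  // Each style lives as an embedded document inside the product,
  // with its photos nested inside the style
  results: [
    {
      style_id: Number,
      name: String,
      photos: [{ thumbnail_url: String, url: String }],
    },
  ],
});

const cartSchema = new mongoose.Schema({
  product_id: Number,
  user_session: Number, // illustrative field; cart entries need some owner key
});

module.exports = {
  // The third argument pins the collection names to 'products' and 'cart'
  Product: mongoose.model('Product', productSchema, 'products'),
  Cart: mongoose.model('Cart', cartSchema, 'cart'),
};
```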
Although I wasn't sure exactly what my ETL code would look like yet, I knew there were a few steps I would likely need to take in order to complete the ETL process:
- Read the data from the CSV file(s) using JavaScript/Node, likely line by line.
- Get each line of data into a format that can be used in an update query with mongoose and sent to my mongo database.
- Repeat this process for each CSV file.
I started out by doing some research on a node library I could use to read the CSV files and translate each line to JSON, and I came across the library csvtojson. From there, I was able to get each line of data into a JSON object with the headers as the key and the row data as the values, and then it was simply a matter of using a mongoose query to add the data into my MongoDB collection.
My first roadblock was that Node seems to only be able to handle about 1.5GB of memory usage, and reading the entire CSV file at once went over this threshold, which gave me a memory error in the console. I needed to figure out how to hold only the current line in memory rather than the entire file, and after some research I discovered that I could use Node's createReadStream() method to read the file data, csvtojson to format the data into JSON, and some well-placed pause() and resume() calls, which solved the memory issues I was dealing with.
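A sketch of that streaming approach, using the hypothetical Product model from the schema sketch above (file paths and the database URL are placeholders):

```js
// etl/loadProducts.js -- stream one CSV row at a time into Mongo
const fs = require('fs');
const csv = require('csvtojson');
const mongoose = require('mongoose');
const { Product } = require('../db/models');

mongoose.connect('mongodb://localhost:27017/sdc');

const readStream = fs.createReadStream('./data/product.csv');

csv()
  .fromStream(readStream)
  .subscribe(
    async (row) => {
      // Pause the file stream while the write is in flight so rows
      // aren't read off disk faster than Mongo can accept them...
      readStream.pause();
      await Product.create(row);
      // ...then resume to pull the next line.
      readStream.resume();
    },
    (err) => console.error(err),
    () => {
      console.log('product.csv loaded');
      mongoose.connection.close();
    }
  );
```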
The second roadblock came when I was no longer simply creating documents and adding them to a collection, but instead updating documents I had already created (I needed to first add the products.csv data as documents in my products collection and then add data from the other CSV files to those documents); that process was incredibly slow. At one point I had let the code run for 2 hours and only gotten through 200,000 documents, with 800,000+ to go. I felt like there were probably things I could do to speed up this process, and that was when I did a little research and learned about creating indexes for my MongoDB database.
Creating an index on a specific key tells MongoDB to maintain a sorted data structure of that key and its associated values, which allows for much quicker lookups. I created an index for both the product_id and style_id in my collection, and my ETL process immediately started going faster. I even created these indexes in the middle of some currently running ETL code (the aforementioned 2 hours for 200k documents), and less than 30 minutes later I had already processed approximately double that amount. Some of the updates still took quite a long time, such as updating the 1.9 million lines of photos related to each style_id in my collection, but things were still going smoothly. After many hours of processing CSV files, I finally had all of the required data in my database, and from there it was only small tweaks.
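For reference, creating those two indexes is a couple of one-liners in the mongo shell (the dotted path reflects style_id living inside the embedded results array):

```js
// mongo shell: index the top-level product_id and the nested style_id
db.products.createIndex({ product_id: 1 });
db.products.createIndex({ 'results.style_id': 1 });
```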
Results Observed
Overall, I enjoyed the challenge of the ETL process and became much more comfortable with mongoose and MongoDB queries, and I'm excited to see the fruits of my labor once I get the Front End Client we selected working with my data.