Sunday, May 10, 2020

Entity Framework (EF), SQLite and UWP

What I learned by using Entity Framework (EF) in a UWP application

I'm writing a super nifty e-book reader program (because, why not?). Along the way I discovered
that Entity Framework (EF), the easy-to-use object-relation manager layer, is in fact a poor match to UWP.

How so? Let me count the ways!

Bundled and non-bundled SQLite

Let's start with this page: Getting Started with EF Core.  In it, we discover that there's two ways to use SQLite: we can either use the SQLite that's bundled in with Windows. Or we can, for no good reason, bundle in a copy of SQLite into our app.

Except: using the bundled SQLite, is actually a terrible, terrible idea. Why? Because thanks to other libraries, chances are you'll already have a Sqlite in your app. And using the NuGet package to use the bundled Sqlite in fact just causes validation errors down the line when you switch to Release mode to make an actual release.

Tracking and Non-Tracking

Moving on to tracking: when you build an EF query, you can be either tracking or non-tracking. The value to non-tracking is described here: a non-tracking query is faster. Except that it really, really isn't, especially in my scenario.

Here's the scoop: I'm building a book database. There's a list of books; each book has some "flat" per-book information (like a string for the title) and also has both embedded information (like an entire class with the book's current download status) and lists of embedded information (like the set of people associated with the book aka the author, publisher, illustrator).

When you make a query for books, you only get the "flat" data. If you want the embedded classes like the download data, you have to include that in the query with a ".Include(DownloadData)" in the query.

When I search, I want the search query to be as slim as possible. Every embedded thing that's added just makes the search slower (and it's already too slow!). But EF has a hidden saving grace: you can do a query with a "slim" search, and then for individual specific items, do a thicker query. EF will automatically return the same actual object for the thin and thick queries, just with most stuff filled in.

That's a total win! I can slim down my queries as much as possible to find books for the user, and then plump them back up when I'm displaying specific individual books.

Yeah, but remember that advice about tracking and no-tracking? Turns out no-tracking queries don't do any of the above. And they aren't actually any faster (at least as far as I can tell).

Release Mode is Really slow

This took a long time to figure out: for a while in my e-book app, pressing the "next page" button was slow. And it was 100% my fault; I did a little bit of investigation (following several red herrings along the way!), and made it totally zippy.

Then I compiled in Release mode. Release mode is normally much faster than Debug mode, but (surprise!), that's now how EF rolls. Release mode is very much significantly slower than Debug mode.

There's tons of information about this on-line, most of it too old to be useful.

And then I discovered that my next page button was taking 15 seconds and spiking the CPU to 25% usage. That's completely unacceptable for an e-book reader; next page should be 0.1 seconds at most. My assumption, of course, is that my fixes (see earlier) were the cause, but they weren't.

All of the slowdown was simply saving the current book position. It turns out that one of the things that really slow in Release mode is saving database changes to the database. Like, 15 seconds slow.

Change Tracking (INotifyPropertyChanged) is fast

The SaveChanges advice is a real winner. It turns out that SaveChanges is super slow in release mode even though it's blink-of-an-eye in debug mode.

Using the default SaveChanges strategy, AFAICT EF has to keep a shadow copy of all object, and then when you do a SaveChanges it has to traverse everything to see what's changed. Then it can write out just the changes.

I don't know why this is fast in Debug mode and slow in Release mode.

With the change, EF can track all of your object changes; database updates are then super fast.

The only downside is that the advice doesn't actually say how to set the ChangeTrackingStrategy. You do that by created an override of the OnModelCreating method in you DbContext class.

   protected override void OnModelCreating(ModelBuilder mb)


Understanding that there's queries and there's queries

A surprise (to me) in EF is despite this not being documented, it's really important to know where a query is happening. EF is an abstraction layer on a database; you have to know what is happening on the database (an .AsQueryable()) versus what's happening in the EF layer (an .AsEnumerable()).

Need to make sure that your search is returning a "plump" object? That has to happen on the Database side, not the EF side. But need to do a complex search (like people do!)? That's strictly on the EF side of the house. Searches are faster on the Database side, but on the other hand, they are also super flaky.

But wait, it's just a little harder. EF converts queries at run-time. It's easy to make a LINQ query that looks right, and matches a simple, intuitive understanding of how everything is working, but will fail at run time.

What's all this about a flaky search?

Sometimes people want to find stuff by an author ("Dickens"). To do that, you make a search string ("%Dickens%") and then use EF.Functions.Like(b.Title, likeSearch) for the compare. Unless, of course, your search string is blank; if you do that, then the likeSearch string is plain "%%" and for some reason, EF (or maybe SQLite) really, really hates search strings like that.

As matching for "%%" is hard to understand. Yes, in general you shouldn't do weird searches. But this one is pretty simple and unambiguous.

How big is my database?

The main book database starts off with 60K entries (because I'm including the entire Project Gutenberg catalog! It's awesome!). Each entry has 6 sub-entries (list of People, list of Files, Review, DownloadData, NavigationData and a list of Notes).

A typical book has about 4 People and 12 Files associated with it. The database when compressed is 32 megabytes and expands to 158 megabytes when installed.

Getting started: making the database

To make your first database, EF needs to scan your app, looking for the data context so it can build a bunch of scaffolding. This doesn't work with a UWP app. Your solution needs a second project which exists only to be a regular .NET app that EF can figure out.

Yes, this sucks.

Things to avoid: splitting the database

My e-book program has a big, read-only database of all Project Gutenberg books. First thought: use two database files, one for stuff that really is read-only, and then use a different database file for everything that the user enters.

So a book title, and the files that can be download is database #1, and the notes the user wrote is database #2.

Don't do this. EF has no support for it, and it will only make your life more complex without being actually useful. Stick with a single database.

TL/DR: things to watch for with EF, SQLite and UWP

  1. Run in Release mode early; it will catch surprise performance issues
  2. Track which things are Queryable (database) and which are Enumerable (EF)
  3.  Use the embedded database, not the built-in database
  4. Don't bother with .NoTracking
  5. Absolutely use INotifyPropertyChanged and ..Changing to go fast
  6. Complex searches and queries should be run on the EF side

No comments: