[PSP] Project Diva 2/Extend CPK repacking?
by MrFreeman at 9:38 PM EST on December 1, 2011
Greetings,

I'm not sure if this is the right place, but I do know that hcs has created an extractor for almost all CPK archives that exist. I was wondering if there is any specific documentation of the format, aside from browsing the source code.

hcs's program appears to be modular enough to find the TOC and follow its rules in order to extract the files, but it seems the TOC, and even the entire header, is larger than in CPK files from other games.

What I'd like to accomplish is to create a repacker in order to create user-end, importable DLC for the game. Until other formats, such as camera movement and other gameplay mechanics, are fully understood, levels from either game can be exported and imported into each other, giving Diva 2/Extend users a chance to play Extend/2's levels.

While I've documented some of the formatting, I'd like to know whether anything is currently available, so I can check whether I missed anything before I dive further into the project.
by hcs at 3:19 AM EST on December 2, 2011
This was recently discussed in another thread.

There I provide the original notes I wrote up while I was RE'ing the files. They're really more about .csb than .cpk, though. If you have any questions about the code I'd be happy to help.
by MrFreeman at 1:37 AM EST on December 3, 2011
Thank you very much! That kind of helps, but I'm much more of a visual person, and put this little chart together of the information I currently have regarding the Project Diva 2/Extend TOC.

Edit: Fixed some information in the chart.

Edit2: Discovered what some more values were. It now looks as if all I really need to figure out is #19. I'm going to assume #17 is some sort of constant that increments, which can be faked easily, especially if all the values in the first File Information chunks start as 0x60.

One TOC chunk down, two to go, I'd say. Unless you might see something here which is obviously wrong.

edited 3:47 PM EST December 3, 2011
by MrFreeman at 12:39 AM EST on December 6, 2011
It appears I can only edit posts in my own thread so many times before it makes me post again. THIS is the most current TOC chart.

I'm not sure why I assumed #16 was not used, or why I didn't realize #19 was still a part of a file's information.

It seems there is no particular formula for the File Order numbers, except to create some kind of new array list for memory use in the game. I'll have to ask codestation to confirm this for me.

For example: Diva2ExSound.cpk:
First file name: BGM_EDIT_00.aix
File Count Number: 37

7th file name: BGM_PV_37.aix
File Count Number: 0

Since there is no formula to this, I will need to create an internal list in the program with each file's order number and filename, to ensure file repacking will be successful.

DLC will be an exception because it looks like it follows a formula for what files have specific File Order numbers.

edited 12:50 AM EST December 6, 2011
by hcs at 1:44 PM EST on December 6, 2011
I think it is more useful to recognize the overall database table structure being used, and from that things make more sense. Look at the "general table structure" portion of the description, and consider running utf_view (needs the offset of the @UTF) to get a feeling for what the tables look like. That should fully explain most things, or at the very least it will give you the names and types of all the fields.

I think you will find that besides the TOC at the start of the file, there may also be ETOC, ITOC, and GTOC tables elsewhere, which may or may not also be important to modify if you are regenerating the file. The master table at the top of the CPK describes where these are located. I highly recommend giving utf_view a go at these tables.

edited 1:54 PM EST December 6, 2011
by MrFreeman at 9:04 PM EST on December 6, 2011
While I've always been using the "general table structure" as a guideline, I did not know that utf_view worked like that (admittedly, I've figured out quite a bit without it and I seem to have been pretty accurate). It seems to be pulling things out of thin air, however. If anything, running utf_view has created more questions than answers for me, as it seemingly grabs data from nonexistent values.

I was also already aware of the ETOC and ITOC that the Project Diva CPK files contain, though I hadn't really looked at the master table and made any connections with it yet, but a lot of the data is quite obvious with the structure spelled out in ASCII.

The viewer, however, seems to omit some data, which does not seem like it would do me much good in recreating CPK files.
by MrFreeman at 6:24 PM EST on December 11, 2011
Alright, at this point I am able to successfully recreate a 1:1 CPK file of the DLC from Project Diva 2. However, there is one piece of header data that I cannot seem to comprehend, and it is currently the only thing I am omitting: the "EnabledPacketSize" and "EnabledDataSize" values, which are both the same value.

The values are larger than the file length itself, and I can't find any combination of numbers that multiplies out to this value.

Any ideas?

EDIT1: Well, this is odd. The DLC runs just fine without those values in the header...

However, due to the strict StringID allocation and odd file orders assigned to files, I don't believe there will be any modular way to create CPK files from scratch. I am borrowing a lot of existing data from a CPK file and regenerating it on the fly, which would not work for any other type of CPK file Project Diva uses.

edited 6:34 PM EST December 11, 2011
by Suyo at 5:28 PM EDT on April 3, 2012
Sorry for this extreme bump, but I found the meaning of a new value. I don't know whether the OP is still here, but I'm posting anyway.
We've recently been successful in creating Custom DLC, not from scratch but by editing files. We are currently looking into this, and also into types of DLC not officially released by SEGA.

The last 4 bytes in #11 are actually a part of the file information. That means #17 is not the last field of each file's entry (and dropped for the last file), but actually the first one, and it is also present in the last set.
It represents the offset where the related filename is. The count starts from the beginning of "<NULL>.CpkTocInfo.DirName". Example: The first value of the first file in the sample in the OP's picture is 52. Counting 52 from "<NULL>" we end up at 0x16D, the first 0 in "00_attest.txt". From there, it is read until the terminating 00 stops it.
(I hope I explained that in a way that can be understood.)
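
The lookup described above can be sketched in C roughly as follows. This is only an illustration: `utf_string_at` is a hypothetical helper name, and it assumes the caller already knows where the string table starts and how long it is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the NUL-terminated string that starts
 * `offset` bytes into the string table (offset 0 is "<NULL>"), or NULL
 * if the offset is out of range or no terminating 00 is found. */
static const char *utf_string_at(const unsigned char *table, size_t table_len,
                                 size_t offset)
{
    assert(table != NULL);
    if (offset >= table_len)
        return NULL;
    /* The name is read until a 00 byte stops it; make sure one exists. */
    if (memchr(table + offset, 0, table_len - offset) == NULL)
        return NULL;
    return (const char *)(table + offset);
}
```

With a table that begins `<NULL>`, 00, `CpkTocInfo`, 00, an offset of 0 yields "<NULL>" and an offset of 7 yields "CpkTocInfo".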

I don't know whether I am allowed to post a link, so I won't, but if you are interested in joining a few beginners like me, drop me a mail at su,per,yo,shi,1@gmx.de (remove the ,).
by hcs at 6:00 PM EDT on April 3, 2012
Feel free to post a link.

The thing you are referring to is the string table, and yes it begins at the start of <NULL>, which is the string you'd get if you had an offset 0 into the string table (so it is largely a debugging aid). 19 and 20 in that chart MrFreeman posted are both the string table.

I'll try to lay it out with more detail here:

The overall structure of the @UTF table is, in order:
1. The header:
0x00-0x03: @UTF
0x04-0x07: table size
0x08-0x0b: offset of rows (from 8)
0x0c-0x0f: offset of string table (from 8)
0x10-0x13: offset of data/contents (from 8)
0x14-0x17: string table offset of table name
0x18-0x19: number of fields
0x1a-0x1b: size of each row (in the rows section)
0x1c-0x1f: number of rows
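
As a rough sketch, the header layout above could be read like this in C. The struct and function names are made up for this example, and it assumes the multi-byte fields are stored big-endian, which the listing above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names are descriptive labels chosen for this sketch. */
struct utf_header {
    uint32_t table_size;
    uint32_t rows_offset;     /* counted from byte 8 */
    uint32_t strings_offset;  /* counted from byte 8 */
    uint32_t data_offset;     /* counted from byte 8 */
    uint32_t name_offset;     /* offset into the string table */
    uint16_t field_count;
    uint16_t row_size;
    uint32_t row_count;
};

/* ASSUMPTION: values are big-endian. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Returns 0 on success, -1 if the buffer is shorter than the 0x20-byte
 * header or does not start with the "@UTF" signature. */
static int utf_parse_header(const unsigned char *buf, size_t len,
                            struct utf_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 0x20 || memcmp(buf, "@UTF", 4) != 0)
        return -1;
    out->table_size     = be32(buf + 0x04);
    out->rows_offset    = be32(buf + 0x08);
    out->strings_offset = be32(buf + 0x0c);
    out->data_offset    = be32(buf + 0x10);
    out->name_offset    = be32(buf + 0x14);
    out->field_count    = be16(buf + 0x18);
    out->row_size       = be16(buf + 0x1a);
    out->row_count      = be32(buf + 0x1c);
    return 0;
}

/* Convert an offset counted "from 8" into a position relative to the
 * '@' of the "@UTF" signature. */
static size_t utf_abs(uint32_t rel)
{
    return (size_t)rel + 8;
}
```

The schema then starts at byte 0x20 of the table, directly after these fields.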

2. The schema immediately follows the header; it has a specification for each field:
0x00: type
0x01-0x04: string table offset of field name
0x05-: (optional) constant value

The type is a bitfield. It is constructed from one of these values, which specify the storage mode:

0x50: there is an individual value for each row
0x30: there is a constant value for all rows
0x10: the value is never set (missing, assume it is zero)

The storage mode is then ORed with a data type:

#define COLUMN_TYPE_DATA 0x0b
#define COLUMN_TYPE_STRING 0x0a
#define COLUMN_TYPE_FLOAT 0x08
#define COLUMN_TYPE_8BYTE 0x06
#define COLUMN_TYPE_4BYTE2 0x05
#define COLUMN_TYPE_4BYTE 0x04
#define COLUMN_TYPE_2BYTE2 0x03
#define COLUMN_TYPE_2BYTE 0x02
#define COLUMN_TYPE_1BYTE2 0x01
#define COLUMN_TYPE_1BYTE 0x00


If you had a 2 byte constant value, for example, it would be 0x32 (or 0x33).

I believe that the "2" values (COLUMN_TYPE_1BYTE2 etc) are signed integers and the others are unsigned.

COLUMN_TYPE_DATA indicates a 32 bit offset into the data section, followed by a 32 bit size. This is not used in many cases where the data is not actually in the table, but is rather somewhere else in the file. This is like the "blob" type in most databases.

COLUMN_TYPE_STRING indicates a 32 bit offset into the string table.
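
To make the bitfield concrete, here is a small C sketch that builds and splits a type byte. The `COLUMN_STORAGE_*` and mask names are chosen for this example, and the 0xf0/0x0f split is an assumption inferred from the values listed above (storage mode in the high nibble, data type in the low nibble).

```c
#include <assert.h>
#include <stdint.h>

/* Names and masks below are assumptions made for this sketch. */
#define COLUMN_STORAGE_MASK     0xf0u
#define COLUMN_STORAGE_PERROW   0x50u  /* individual value for each row */
#define COLUMN_STORAGE_CONSTANT 0x30u  /* one constant value for all rows */
#define COLUMN_STORAGE_ZERO     0x10u  /* never set, assume zero */
#define COLUMN_TYPE_MASK        0x0fu

/* Build a type byte by ORing a storage mode with a data type. */
static uint8_t column_make(unsigned storage, unsigned type)
{
    assert((storage & ~COLUMN_STORAGE_MASK) == 0);
    assert((type & ~COLUMN_TYPE_MASK) == 0);
    return (uint8_t)(storage | type);
}

/* Extract the storage mode (0x50, 0x30 or 0x10) from a type byte. */
static unsigned column_storage(uint8_t type_byte)
{
    return type_byte & COLUMN_STORAGE_MASK;
}

/* Extract the data type (0x00 through 0x0b) from a type byte. */
static unsigned column_type(uint8_t type_byte)
{
    return type_byte & COLUMN_TYPE_MASK;
}
```

For instance, `column_make(COLUMN_STORAGE_CONSTANT, 0x02)` gives 0x32, the 2 byte constant case mentioned above, and `column_type(0x5a)` gives 0x0a, a per-row string.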

3. The string table (at an offset specified by the header)

4. The rows (at an offset specified by the header)
This consists of the data for each row in the table, if the data was not constant or missing, so pretty much only with storage type 0x50 above. The data is given in the same order as the column listing in 2 above. The size of each row was specified in the header, and they are all the same size (as the varying size data and strings are stored elsewhere).
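
One way to sanity-check a table when regenerating it is to add up the widths of the per-row fields and compare the sum with the row size from the header. The C sketch below does that; the function names are hypothetical, the widths follow the type descriptions above, and the 4 byte width for the float type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes one value of the given data type occupies in a row,
 * or 0 for a type this sketch does not know. */
static size_t column_value_size(unsigned type_byte)
{
    switch (type_byte & 0x0f) {
    case 0x00: case 0x01: return 1;
    case 0x02: case 0x03: return 2;
    case 0x04: case 0x05: return 4;
    case 0x06:            return 8;
    case 0x08:            return 4; /* float: ASSUMED to be 32 bits */
    case 0x0a:            return 4; /* string: 32 bit string table offset */
    case 0x0b:            return 8; /* data: 32 bit offset + 32 bit size */
    default:              return 0;
    }
}

/* Sum the widths of all fields stored per row (storage mode 0x50).
 * Constant (0x30) and missing (0x10) fields take no space in a row. */
static size_t expected_row_size(const uint8_t *type_bytes, size_t field_count)
{
    size_t total = 0;
    size_t i;

    assert(type_bytes != NULL || field_count == 0);
    for (i = 0; i < field_count; i++) {
        if ((type_bytes[i] & 0xf0) == 0x50)
            total += column_value_size(type_bytes[i]);
    }
    return total;
}
```

If the result differs from the header's row size, either the schema was misread or one of the assumed widths is wrong.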

5. The data (at an offset specified by the header)

---

This should give you enough info to fully understand how the tables are laid out in the file.

If you have any specific questions about that and how it relates to given files, I'd be happy to answer them.

edited 6:48 PM EDT April 3, 2012
by Suyo at 8:27 AM EDT on April 4, 2012
Alright:
Original thread on ProjectDIVA.fr, a French PD forum, by SonicDX
Thread on our site, ProjectDIVA.net, recording my horrible attempts at figuring out how to create new types of DLC

Right now I am able to change the structure of a DLC, and that is all I needed for what I'm currently trying to do - maybe I will later create a tool to easily pack Custom DLC without the need to use hex editing, so this is going to be useful then - thanks!

